# Analyzing Data From Helicopter Prison Escapes

## Background

There have been multiple prison escapes in which an inmate fled by helicopter. It may sound like a scene from a movie, but it has really happened. One of the earliest instances was the escape of Joel David Kaplan, nicknamed "*Man Fan*". On August 19, 1971, Kaplan escaped from the Santa Martha Acatitla prison in Mexico.

In the course of this project, I intend to use my current skill set to carry out the following:

1. Obtain real data from the Internet and prepare it for analysis.
2. Explore and analyze the data using Python.
3. Practice and familiarize myself with using Jupyter notebooks.

## About the Data

Data for this analysis will be obtained from the [List of helicopter prison escapes](https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes) Wikipedia article. The article collates information on helicopter prison escape attempts from 1971 to 2020.

Each column provides unique information on the escape attempts, such as the `Date`, `Prison Name`, `Succeeded` (whether the attempt succeeded), `Escapees` (escapee names) and `Details` (an explanation of the events surrounding that particular escape attempt).

## Importing Useful Libraries and Functions

First, we will import the helper functions defined in the `helper.py` file. We will also import `plotly.express`, a very helpful visualisation library.

In [1]:
from helper import *
import plotly.express as px

## Getting the Data

In [None]:
# Store the link to the data into a variable
url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'

# The function below extracts data from a link into a list of lists
data = data_from_url(url)
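
The body of `data_from_url` lives in `helper.py` and is not shown here. As a rough illustration of how an HTML table can be turned into a list of lists, here is a minimal sketch that uses only the standard library. The names `TableRowParser` and `rows_from_html` are hypothetical, and the sample HTML is a hand-written stand-in for the Wikipedia page, so this is not the helper's actual implementation.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_html(html):
    """Return the table rows found in an HTML string as a list of lists."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


# Hand-written sample standing in for the downloaded page
sample_html = """
<table>
  <tr><th>Date</th><th>Prison name</th><th>Country</th></tr>
  <tr><td>August 19, 1971</td><td>Santa Martha Acatitla</td><td>Mexico</td></tr>
</table>
"""
rows = rows_from_html(sample_html)
print(rows)
```

A real page would first have to be downloaded (for example with `urllib.request.urlopen`) and would need extra handling for nested tags, footnotes and multiple tables.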

Let's look at the first three rows of the imported data:

In [None]:
for row in data[:3]:
    print(row)

While the `Details` column can be helpful for understanding each escape attempt, it makes our outputs difficult to read. We will remove the `Details` column for now.

In [None]:
# Keep every column except the last one ('Details')
for index, row in enumerate(data):
    data[index] = row[:-1]

print(data[:3])

Now, our displays appear much cleaner.

## Analysis Objectives

In the course of this analysis, I'll attempt to answer the following questions:

1. In which year did the most attempts at breaking out of prison with a helicopter occur?

2. In which countries do the most attempted helicopter prison escapes occur?

3. Which countries have the highest chances of helicopter escape success?

To answer them, we will evaluate the escape attempts by year, count the escape attempts in each country, and compute the chances of these attempts succeeding in each country.

## Evaluating Attempts by Year

From each row in the dataset, we will extract only the _year_ from the date:

In [None]:
for row in data:
    row[0] = fetch_year(row[0])
    
print(data[:3])
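
`fetch_year` is another helper from `helper.py` whose body is not shown. A minimal sketch of how such a function could work, assuming each date is a string such as `'August 19, 1971'` that contains exactly one four-digit year, might look like this. The name `extract_year` is hypothetical.

```python
import re


def extract_year(date_text):
    """Return the first four-digit number in date_text as an int."""
    match = re.search(r"\b(\d{4})\b", date_text)
    if match is None:
        raise ValueError(f"no four-digit year found in {date_text!r}")
    return int(match.group(1))


print(extract_year("August 19, 1971"))  # 1971
```

Returning an `int` rather than a string matters for the next steps, because the years are later compared with `min`/`max` and passed to `range`.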

Let's identify the **earliest** and **latest** years in the dataset, then store the entire range of years in a variable called `years`.

In [None]:
min_year = min(data, key=lambda x: x[0])[0] # identify minimum year
max_year = max(data, key=lambda x: x[0])[0] # identify maximum year

In [None]:
# collect all the years between the minimum and maximum years (inclusive)
years = list(range(min_year, max_year + 1))

We will now initialize a list of lists that will record the number of attempts per year:

In [None]:
attempts_per_year = []

for year in years:
    attempts_per_year.append([year, 0])
    
print(attempts_per_year)

Count the number of escape attempts per year:

In [None]:
# For each [year, count] pair, count the rows whose year matches
for year_count in attempts_per_year:
    for row in data:
        if year_count[0] == row[0]:
            year_count[1] += 1

print(attempts_per_year)
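
The same tally can also be built in a single pass with `collections.Counter`. The sketch below uses a few hypothetical rows (placeholder prisons and countries, not real records) shaped like `data` after the year has been extracted.

```python
from collections import Counter

# Hypothetical rows in the shape [year, prison, country, succeeded]
sample_rows = [
    [1971, "Prison A", "Country X", "Yes"],
    [1973, "Prison B", "Country Y", "No"],
    [1973, "Prison C", "Country X", "Yes"],
]

# Count how often each year appears in the first column
counts = Counter(row[0] for row in sample_rows)

# Fill in the years with no attempts so the range has no gaps
first_year, last_year = min(counts), max(counts)
attempts = [[year, counts[year]] for year in range(first_year, last_year + 1)]

print(attempts)  # [[1971, 1], [1972, 0], [1973, 2]]
```

A `Counter` returns 0 for keys it has never seen, which is why 1972 appears with a count of 0 without any special handling.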

### Analysis Question One

In which year did the most attempts at breaking out of prison with a helicopter occur?

In [None]:
%matplotlib inline
# `barplot` is already defined in the 'helper.py' file
barplot(attempts_per_year)

**Comments:** _The most helicopter prison break attempts occurred in 1986, 2001, 2007 and 2009, with three attempts each._
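
The peak years can also be read off programmatically instead of from the chart. This is a minimal sketch on a short list of illustrative `[year, count]` pairs (not the full results), in the same shape as `attempts_per_year`.

```python
# Illustrative [year, count] pairs, not the full dataset
sample_attempts = [[1985, 2], [1986, 3], [1987, 1], [2001, 3]]

# Find the highest count, then keep every year that reaches it
peak = max(count for _, count in sample_attempts)
peak_years = [year for year, count in sample_attempts if count == peak]

print(peak, peak_years)  # 3 [1986, 2001]
```

Collecting every year that matches the maximum, rather than calling `max` once on the pairs, keeps ties visible.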

## Evaluating Attempts by Country

In [None]:
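# `df` is not defined in this notebook; it is assumed to be the DataFrame
# of escape attempts made available by `from helper import *`
countries_frequency = df["Country"].value_counts()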

In [None]:
print_pretty_table(countries_frequency)

### Analysis Question Two

In which countries do the most attempted helicopter prison escapes occur?

In [None]:
# Visualize the data above in a bar chart using plotly

fig = px.bar(countries_frequency, x=countries_frequency.index, 
             y= countries_frequency.values, 
             title = 'Helicopter prison escape attempts by country (1971 - 2020)',
             labels={
                 'y': 'Number of Occurrences',
                 'index': 'Countries'
             },
             text= countries_frequency.values,
             template= 'none')

fig.update_yaxes(showticklabels=False)

fig.show('png', width=900)

**Comments:** _The highest number of helicopter escape attempts was recorded in France (15 attempts). The United States follows with 8 attempts, while Greece, Canada and Belgium recorded 4 escape attempts each._

## Computing the Chances of Success

First, we will collate records of countries and their success information from the `data` variable:

In [None]:
success_info = []

for row in data:
    success_info.append([row[2], row[3]])

print(success_info) 

Then, we will identify each unique country using the country frequency table:

In [None]:
countries = list(countries_frequency.index)

Finally, let's compute the success-to-failure ratio for each country and store it in a new list of lists called `success_chances`:

In [None]:
success_chances = []

for country in countries:
    
    ratio = [0,0]
    
    # count successes ('yes') and failures ('no') for this country
    for row in success_info:
        if row[1].lower() == 'yes' and row[0] == country:
            ratio[0] +=1
        elif row[1].lower() == 'no' and row[0] == country:
            ratio[-1] += 1
    
    # compute the success-to-failure ratio; with no recorded failures,
    # fall back to the number of successes to avoid dividing by zero
    if ratio[-1] == 0:
        ratio = ratio[0]/1
    else:
        ratio = ratio[0]/ratio[-1]
            
    success_chances.append([country, ratio])

success_chances

### Analysis Question Three
Based on the recorded successes and failures, in which countries do helicopter prison breaks have a higher chance of success?

Again, we will visualize this data in a bar chart using `plotly`:

In [None]:
success_chances = dict(success_chances)

x_label = list(success_chances.keys())
y_label = list(success_chances.values())

fig = px.bar(x=x_label, 
             y= y_label, 
             title = 'Ratio of Successful to Failed Escape Attempts',
             labels={
                 'y': 'Success/Failure Ratio',
                 'x': 'Countries'
             },
             text= y_label,
             template= 'none')

fig.update_yaxes(showticklabels=False)
fig.update_traces(textangle=0)
fig.show('png', width=900)

**Comments:** _Prison breaks in Canada and the US have the highest success-to-failure ratio (3.0, that is, three successes for every failure). France (2.75) and Brazil (2.0) follow closely._
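
A success-to-failure ratio is not the same thing as a probability of success. As a hedged illustration, the sketch below converts the ratio into a success rate (successes divided by all attempts). The counts are inferred from the figures reported in this notebook (for example, 15 attempts at a ratio of 2.75 implies 11 successes and 4 failures), assuming every attempt is recorded as either `Yes` or `No`. The function `success_rate` is a hypothetical addition, not part of `helper.py`.

```python
def success_rate(successes, failures):
    """Share of attempts that succeeded, as a value between 0 and 1."""
    attempts = successes + failures
    if attempts == 0:
        raise ValueError("no recorded attempts")
    return successes / attempts


# (successes, failures) inferred from the reported totals and ratios,
# assuming each attempt is marked either 'Yes' or 'No'
inferred_counts = {
    "France": (11, 4),
    "United States": (6, 2),
    "Canada": (3, 1),
}

for country, (wins, losses) in inferred_counts.items():
    ratio = wins / losses
    rate = success_rate(wins, losses)
    print(f"{country}: ratio {ratio:.2f}, success rate {rate:.2f}")
```

The rate also handles countries with no failures more naturally. Under the ratio used above, two successes and no failures score 2.0, the same as two successes and one failure, whereas the success rates are 1.0 and about 0.67 respectively.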

### Conclusion

_**From analysing this dataset, we have been able to observe that:**_

* The highest number of attempted helicopter prison breaks occurred in the years 1986, 2001, 2007 and 2009.

* Between 1971 and 2020, France recorded the highest number of attempted helicopter prison breaks (15).

* Although France recorded the highest number of helicopter prison break attempts, the success-to-failure ratio of these attempts is higher in other countries, such as the US and Canada.

### Prompts for future exploration
* Could there be a relationship between the number of escapees and the success of an escape attempt?
* Are there any escapees that have tried escaping more than once?

# Credits
DataQuest: [Data Science with Python](https://www.dataquest.io)