# Prison Break
(not the show)

---

This project explores helicopter prison breaks, with the aim of practicing the following:
- Obtaining real data from the internet and preparing it for analysis
- Analyzing the data using Python

Specifically, we'll answer the following questions:
- In which year did the most helicopter prison break attempts occur?
- In which countries do the most attempted helicopter prison breaks occur?

We'll use Python to create a frequency table showing our results. As seen in the table below, France leads with the highest number of attempted helicopter prison breaks:

| Country | Number of Occurrences |
| --- | --- |
| France | 15 |
| United States | 8 |
| Belgium | 4 |
| Canada | 4 |
| Greece | 4 |
| Australia | 2 |
| Brazil | 2 |
| United Kingdom | 2 |
| Mexico | 1 |
| Ireland | 1 |
| Italy | 1 |
| Puerto Rico | 1 |
| Chile | 1 |
| Netherlands | 1 |
| Russia | 1 |

As mentioned earlier, this project works with real-world data, taken straight from a frequently updated [Wikipedia article](https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes#Actual_attempts).
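
To give a sense of how a frequency table like the one above can be built, here is a minimal sketch using `collections.Counter`. The `sample_rows` list, the `COUNTRY_INDEX` constant, and the `country_frequency` function are illustrative names, not part of `helper.py`. The rows are just the first three entries of the dataset, so the counts here are all 1.

```python
from collections import Counter

# Illustrative rows only: year, prison, country, success flag.
sample_rows = [
    [1971, "Santa Martha Acatitla", "Mexico", "Yes"],
    [1973, "Mountjoy Jail", "Ireland", "Yes"],
    [1978, "United States Penitentiary, Marion", "United States", "No"],
]

COUNTRY_INDEX = 2  # assumed position of the country column in each row


def country_frequency(rows):
    """Return (country, count) pairs sorted from most to least frequent."""
    counts = Counter(row[COUNTRY_INDEX] for row in rows)
    return counts.most_common()


frequency_table = country_frequency(sample_rows)
for country, count in frequency_table:
    print(f"{country}: {count}")
```

Run over the full dataset instead of `sample_rows`, the same function would yield one `(country, count)` pair per country, already sorted for display.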

`helper.py` is a module containing useful "helper" functions that we'll need for this project. Let's get into it.

## Helper functions
We begin by importing the helper functions.

In [1]:
from helper import *

## Get the Data
Now, let's get the data from the [List of helicopter prison escapes](https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes) Wikipedia article and print the first three rows.

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'
data = data_from_url(url)

for row in data[:3]:
    print(row)

['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', "Joel David Kaplan was a New York businessman who had been arrested for murder in 1962 in Mexico City and was incarcerated at the Santa Martha Acatitla prison in the Iztapalapa borough of Mexico City. Joel's sister, Judy Kaplan, arranged the means to help Kaplan escape, and on August 19, 1971, a helicopter landed in the prison yard. The guards mistakenly thought this was an official visit. In two minutes, Kaplan and his cellmate Carlos Antonio Contreras, a Venezuelan counterfeiter, were able to board the craft and were piloted away, before any shots were fired.[9] Both men were flown to Texas and then different planes flew Kaplan to California and Contreras to Guatemala.[3] The Mexican government never initiated extradition proceedings against Kaplan.[9] The escape is told in a book, The 10-Second Jailbreak: The Helicopter Escape of Joel David Kaplan.[4] It also inspired t
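
The implementation of `data_from_url` is not shown here. As a rough idea of what a helper like it has to do, the sketch below pulls the cell text out of an HTML table using only the standard library. The class `TableRowParser`, the function `rows_from_html`, and the inline HTML snippet are hypothetical stand-ins, not the contents of `helper.py`, and the real page would first have to be downloaded.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which contain only <th> cells
                self.rows.append(self._row)
            self._row = None


def rows_from_html(html):
    """Return the table body as a list of lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


# A tiny hand-written table standing in for the downloaded page.
sample_html = """
<table>
  <tr><th>Date</th><th>Prison</th><th>Country</th></tr>
  <tr><td>August 19, 1971</td><td>Santa Martha Acatitla</td><td>Mexico</td></tr>
  <tr><td>October 31, 1973</td><td>Mountjoy Jail</td><td>Ireland</td></tr>
</table>
"""

sample_rows = rows_from_html(sample_html)
for row in sample_rows:
    print(row)
```

The result has the same list-of-lists shape as `data`, which is why the cleaning steps that follow can work with plain list operations.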

## Clean the Data
### 1. Removing the details
If you've visited the [Wikipedia article](https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes), you'll have noticed that most of the space on the screen is taken up by the last element of each row, the "Details" column. To make our data easier to look at, we're going to get rid of it.

In [3]:
for row in data:
    row.pop()

print(data[:3])

[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus Twomey Kevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock Trapnell Martin Joseph McNally James Kenneth Johnson']]


Alternatively, you could:
1. Create a new variable called `index` and set its value to `0`.

2. Iterate over each row in `data` using the iteration variable `row`.

    - Inside the loop, replace the current row in `data` with `row` minus its last element, using the expression `data[index] = row[:-1]`.

    - Increment `index` by 1 to move to the next row.

```py
index = 0

for row in data:
    row = row[:-1]
    data[index] = row
    index += 1
```

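However, my version is more concise: called without arguments, the list method `pop()` removes the last element of a list in place. If the element we wanted to remove were not the last one in each list, we could pass its index to `pop()` (for example, `row.pop(1)`) or resort to slicing as shown above. Note that `.remove()` is not a substitute here, because it deletes the first element that matches a given value rather than the element at a given position.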
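
The differences between these list operations are easy to check on a single row. The sketch below uses the first row of the dataset with a placeholder string in place of the long details text. The variable names are illustrative.

```python
# First row of the dataset, with a placeholder instead of the long details text.
row = ["August 19, 1971", "Santa Martha Acatitla", "Mexico", "Yes", "(details)"]

# Slicing builds a new list and leaves the original untouched.
trimmed_copy = row[:-1]
print(len(row), len(trimmed_copy))

# pop() with no argument removes the last element in place and returns it.
removed = row.pop()
print(removed, len(row))

# pop(index) removes the element at a given position, here the prison name.
without_prison = row.copy()
without_prison.pop(1)
print(without_prison)

# remove(value) searches by value and deletes the first match.
without_flag = row.copy()
without_flag.remove("Yes")
print(without_flag)
```

Because `pop()` mutates the list it is called on, the loop over `data` needs no index bookkeeping, which is what makes that version shorter.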

### 2. Extracting the year
You might have also observed that the dates in the dataset are strings in the format `Month Day, Year` (e.g. `'August 19, 1971'`). We're only interested in the year as a number, which is easier to analyze statistically. To extract it, we'll use the helper function `fetch_year()`.

`fetch_year('August 19, 1971')` => `1971`

In [4]:
for row in data:
    row[0] = fetch_year(row[0])

print(data[:3])

[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus Twomey Kevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock Trapnell Martin Joseph McNally James Kenneth Johnson']]
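
The body of `fetch_year()` lives in `helper.py` and is not reproduced here. One plausible way to get the same behavior, assuming every date string contains a four-digit year, is a small regular expression. The name `fetch_year_sketch` is hypothetical and chosen to avoid clashing with the real helper.

```python
import re


def fetch_year_sketch(date_string):
    """Return the first standalone four-digit number as an int, or None."""
    match = re.search(r"\b(\d{4})\b", date_string)
    return int(match.group(1)) if match else None


print(fetch_year_sketch("August 19, 1971"))
print(fetch_year_sketch("October 31, 1973"))
```

Returning `None` when no year is found is a design choice of this sketch. It lets the caller decide how to handle a malformed date instead of failing inside the helper.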


## Attempts per year
Our first objective was to find out in which year the most helicopter prison break attempts occurred. To do this, we must first know how many attempts occurred in each year of our dataset. We'll do this by creating a list of lists, where each inner list contains two elements:
1. A year
2. The number of attempts that occurred in that corresponding year

We'll achieve this by:
- Creating a dummy list of lists with one `[year, 0]` entry for each year
- Populating the list of lists by counting the attempts in each year

In [7]:
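# Step 1: build the dummy list [[year, 0], ...] covering every year in the dataset
min_year = min(row[0] for row in data)
max_year = max(row[0] for row in data)
attempts_per_year = [[year, 0] for year in range(min_year, max_year + 1)]

# Step 2: populate it by adding one attempt to the matching year for each row
for row in data:
    for year_attempt in attempts_per_year:
        if year_attempt[0] == row[0]:
            year_attempt[1] += 1
            break  # each row belongs to exactly one year

print(attempts_per_year)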
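
As an alternative sketch of the same idea, `collections.Counter` can do the counting, and `max()` with a `key` function can then pick out the busiest year. The `sample_years` list below is made up for illustration. In the project, the years would come from `row[0]` for each row in `data`.

```python
from collections import Counter

# Made-up years for illustration only, not the real dataset.
sample_years = [1971, 1973, 1978, 1978, 1981]

# Count how often each year appears; missing years count as 0.
counts = Counter(sample_years)
attempts_per_year_sketch = [
    [year, counts[year]]
    for year in range(min(sample_years), max(sample_years) + 1)
]

# Pick the [year, attempts] pair with the largest attempt count.
busiest_year, most_attempts = max(attempts_per_year_sketch, key=lambda pair: pair[1])
print(busiest_year, most_attempts)
```

If several years share the highest count, `max()` returns the first of them, so a tie would need an explicit check.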