#### Code by Larissa Evaldt

#### Importing libraries and reading dataset

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('data/garda.csv', index_col=[0])

In [3]:
df.shape

(1476, 1)

#### The dataset has 1476 rows and 1 column. Let's have a look at the first rows:

In [4]:
df.head()

|   | Garda Station |
|---|---|
| 0 | 63101 Balbriggan, D.M.R. Northern Division |
| 1 | 63101 Balbriggan, D.M.R. Northern Division |
| 2 | 63101 Balbriggan, D.M.R. Northern Division |
| 3 | 63101 Balbriggan, D.M.R. Northern Division |
| 4 | 63101 Balbriggan, D.M.R. Northern Division |


#### There are many duplicated entries because the original dataset was divided by crime type: it had one row for each type of crime at each station, together with the number of times that crime occurred there. Let's get only the unique entries:
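A minimal sketch of the situation described above, using a small hypothetical long-format frame (the `Crime Type` labels and `Count` values are made up for illustration and are not taken from the dataset): each station repeats once per crime type, and `unique()` / `nunique()` collapse the repeats.

```python
import pandas as pd

# Hypothetical long-format data: one row per (station, crime type) pair.
toy = pd.DataFrame({
    "Garda Station": [
        "63101 Balbriggan, D.M.R. Northern Division",
        "63101 Balbriggan, D.M.R. Northern Division",
        "66201 Ballyfermot, D.M.R. Western Division",
        "66201 Ballyfermot, D.M.R. Western Division",
    ],
    "Crime Type": ["Burglary", "Theft", "Burglary", "Theft"],  # made-up labels
    "Count": [5, 12, 3, 9],  # made-up counts
})

stations = toy["Garda Station"].unique()     # distinct names, in first-seen order
n_stations = toy["Garda Station"].nunique()  # number of distinct names
print(stations)
print(n_stations)
```

`nunique()` gives the same count as `len(df['Garda Station'].unique())` without building the array first.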

In [5]:
df['Garda Station'].unique()

array(['63101 Balbriggan, D.M.R. Northern Division',
       '66201 Ballyfermot, D.M.R. Western Division',
       '63201 Ballymun, D.M.R. Northern Division',
       '65101 Blackrock, Co Dublin, D.M.R. Eastern Division',
       '66101 Blanchardstown, D.M.R. Western Division',
       '62101 Bridewell Dublin, D.M.R. North Central Division',
       '65201 Cabinteely, D.M.R. Eastern Division',
       '66102 Cabra, D.M.R. Western Division',
       '66202 Clondalkin, D.M.R. Western Division',
       '63401 Clontarf, D.M.R. Northern Division',
       '63301 Coolock, D.M.R. Northern Division',
       '64101 Crumlin, D.M.R. Southern Division',
       '61101 Donnybrook, D.M.R. South Central Division',
       '63202 Dublin Airport, D.M.R. Northern Division',
       '65203 Dun Laoghaire, D.M.R. Eastern Division',
       '65102 Dundrum, D.M.R. Eastern Division',
       '66103 Finglas, D.M.R. Western Division',
       '62202 Fitzgibbon Street, D.M.R. North Central Division',
       '63102 Garristown, 

In [6]:
len(df['Garda Station'].unique())

41

#### We have data for 41 Garda stations. Let's drop the duplicates and keep only the unique entries:

In [7]:
df = df.drop_duplicates()

In [8]:
df

|   | Garda Station |
|---|---|
| 0 | 63101 Balbriggan, D.M.R. Northern Division |
| 12 | 66201 Ballyfermot, D.M.R. Western Division |
| 24 | 63201 Ballymun, D.M.R. Northern Division |
| 36 | 65101 Blackrock, Co Dublin, D.M.R. Eastern Div... |
| 48 | 66101 Blanchardstown, D.M.R. Western Division |
| 60 | 62101 Bridewell Dublin, D.M.R. North Central D... |
| 72 | 65201 Cabinteely, D.M.R. Eastern Division |
| 84 | 66102 Cabra, D.M.R. Western Division |
| 96 | 66202 Clondalkin, D.M.R. Western Division |
| 108 | 63401 Clontarf, D.M.R. Northern Division |
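The index above jumps in steps (0, 12, 24, ...) because `drop_duplicates()` keeps the first occurrence of each row (`keep='first'` is the default) together with its original index label. A small sketch on made-up data shows the same effect; on a frame with more than one column, `subset=['Garda Station']` would restrict the comparison to that column.

```python
import pandas as pd

# Made-up data: each name repeated three times in a row.
toy = pd.DataFrame({"Garda Station": ["A", "A", "A", "B", "B", "B"]})

deduped = toy.drop_duplicates()  # keeps the first occurrence of each row
print(deduped.index.tolist())    # original labels of the kept rows survive
```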


#### Resetting the index

In [9]:
df = df.reset_index(drop=True)

In [10]:
df

|   | Garda Station |
|---|---|
| 0 | 63101 Balbriggan, D.M.R. Northern Division |
| 1 | 66201 Ballyfermot, D.M.R. Western Division |
| 2 | 63201 Ballymun, D.M.R. Northern Division |
| 3 | 65101 Blackrock, Co Dublin, D.M.R. Eastern Div... |
| 4 | 66101 Blanchardstown, D.M.R. Western Division |
| 5 | 62101 Bridewell Dublin, D.M.R. North Central D... |
| 6 | 65201 Cabinteely, D.M.R. Eastern Division |
| 7 | 66102 Cabra, D.M.R. Western Division |
| 8 | 66202 Clondalkin, D.M.R. Western Division |
| 9 | 63401 Clontarf, D.M.R. Northern Division |
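In pandas 1.0 and later, the two steps (dropping duplicates, then resetting the index) can also be done in one call with `ignore_index=True`. The sketch below, on made-up data, checks that both routes give the same frame.

```python
import pandas as pd

toy = pd.DataFrame({"Garda Station": ["A", "A", "B", "B", "C"]})  # made-up data

two_step = toy.drop_duplicates().reset_index(drop=True)  # as in the notebook
one_step = toy.drop_duplicates(ignore_index=True)        # relabels rows 0..n-1 directly

pd.testing.assert_frame_equal(two_step, one_step)  # raises if they differ
print(one_step.index.tolist())
```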


### We need to get the latitude and longitude of these Garda stations! 

### However, the Google Maps API is not able to find the stations when they are written in the format used in the Irish census data. 
### For example, if we search Google Maps for:
* 63101 Balbriggan, D.M.R. Northern Division
#### it will not find the station. But if we search for:
* Balbriggan Garda Station
#### it finds it! So let's convert our entries to this new format:

#### Looping through every row of the dataset
* For each row, select the Garda station name. For example, on the first iteration `original` is "63101 Balbriggan, D.M.R. Northern Division".
* Remove the leading digits and space. The string becomes: "Balbriggan, D.M.R. Northern Division"
* Remove everything from the first comma onwards. The string is now just: "Balbriggan"
* Append " Garda Station" to get the final name: "Balbriggan Garda Station"
* Update the dataframe with the new name, which the API can geocode much more easily.
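The steps above can be sketched as a small standalone function, which makes the transformation easy to check on a few names before running it over the whole dataframe. `to_search_name` is a hypothetical helper name, not part of the original notebook. Note that `lstrip` removes a leading *set* of characters, so a station whose name itself started with a digit would also lose it.

```python
def to_search_name(original: str) -> str:
    """Turn a census-style station label into a query Google Maps can resolve."""
    without_numbers = original.lstrip("0123456789 ")  # drop the leading station code and space
    without_end = without_numbers.split(",")[0]       # keep only the text before the first comma
    return without_end + " Garda Station"             # append the suffix

print(to_search_name("63101 Balbriggan, D.M.R. Northern Division"))
print(to_search_name("65101 Blackrock, Co Dublin, D.M.R. Eastern Division"))
```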

In [15]:
for x in range(len(df)):
    original = df.loc[x, 'Garda Station']  # select the original name
    without_numbers = original.lstrip('0123456789 ')  # delete the numbers and space from the front
    without_end = without_numbers.split(',')[0]  # delete everything from the first comma onwards
    final = without_end + ' Garda Station'  # e.g. 'Balbriggan Garda Station'
    df.loc[x, 'Garda Station'] = final  # update row x via .loc (chained df[col][x] = ... may not write back)

In [16]:
df

|   | Garda Station |
|---|---|
| 0 | Balbriggan Garda Station |
| 1 | Ballyfermot Garda Station |
| 2 | Ballymun Garda Station |
| 3 | Blackrock Garda Station |
| 4 | Blanchardstown Garda Station |
| 5 | Bridewell Dublin Garda Station |
| 6 | Cabinteely Garda Station |
| 7 | Cabra Garda Station |
| 8 | Clondalkin Garda Station |
| 9 | Clontarf Garda Station |
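The same three string steps can also be applied to the whole column at once with pandas' vectorised `.str` methods instead of a Python loop. This is an alternative sketch, not what the notebook runs; it uses a three-row sample frame.

```python
import pandas as pd

# Sample of the original station labels.
toy = pd.DataFrame({"Garda Station": [
    "63101 Balbriggan, D.M.R. Northern Division",
    "65101 Blackrock, Co Dublin, D.M.R. Eastern Division",
    "62101 Bridewell Dublin, D.M.R. North Central Division",
]})

toy["Garda Station"] = (
    toy["Garda Station"]
    .str.lstrip("0123456789 ")  # remove the leading station code and space
    .str.split(",").str[0]      # keep the text before the first comma
    + " Garda Station"          # append the suffix
)
print(toy["Garda Station"].tolist())
```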


### Getting Latitude and Longitude 

In [17]:
from googlemaps import Client as GoogleMaps
from time import sleep

In [18]:
df['Longitude'] = ""
df['Latitude'] = ""

In [19]:
df.head(1)

|   | Garda Station | Longitude | Latitude |
|---|---|---|---|
| 0 | Balbriggan Garda Station | | |


#### Getting latitude and longitude from Google Maps

In [20]:
gmaps = GoogleMaps('API_KEY_WAS_HERE')

In [21]:
for x in range(len(df)):
    try:
        sleep(1)  # pause between requests to stay under the API rate limits
        geocode_result = gmaps.geocode(df.loc[x, 'Garda Station'])
        location = geocode_result[0]['geometry']['location']  # first match returned
        df.loc[x, 'Latitude'] = location['lat']
        df.loc[x, 'Longitude'] = location['lng']
    except IndexError:
        # geocode() returned an empty list: no match for this name
        print("No result found for:", df.loc[x, 'Garda Station'])
    except Exception as e:
        print("Unexpected error occurred:", e)
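A sketch of the same geocoding step, restructured so the API call is passed in as a function. That makes the logic testable without an API key: with the real client one would call `add_coordinates(df, gmaps.geocode)`, while the usage below plugs in a fake. `extract_lat_lng`, `add_coordinates` and `fake_geocode` are hypothetical names; the fake's return value only mirrors the `[0]['geometry']['location']` shape that the loop above reads, and its coordinates are copied from the Balbriggan row of the output below. Missing results are left as NaN rather than empty strings.

```python
import time
from typing import Callable, Optional, Tuple

import pandas as pd


def extract_lat_lng(results: list) -> Tuple[Optional[float], Optional[float]]:
    """Return (lat, lng) of the first geocoding result, or (None, None) if there is none."""
    if not results:
        return None, None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


def add_coordinates(df: pd.DataFrame, geocode: Callable[[str], list],
                    delay: float = 1.0) -> pd.DataFrame:
    """Geocode each 'Garda Station' name and store the coordinates with .loc."""
    out = df.copy()
    out["Latitude"] = float("nan")
    out["Longitude"] = float("nan")
    for idx, name in out["Garda Station"].items():
        lat, lng = extract_lat_lng(geocode(name))
        if lat is None:
            print(f"No result found for {name!r}")
        else:
            out.loc[idx, ["Latitude", "Longitude"]] = [lat, lng]
        time.sleep(delay)  # pause between requests
    return out


def fake_geocode(name: str) -> list:
    """Stand-in for gmaps.geocode, used only for this illustration."""
    if name == "Balbriggan Garda Station":
        return [{"geometry": {"location": {"lat": 53.614378, "lng": -6.19098}}}]
    return []


toy = pd.DataFrame({"Garda Station": ["Balbriggan Garda Station", "Nowhere Garda Station"]})
result = add_coordinates(toy, fake_geocode, delay=0)
print(result)
```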

In [22]:
df

|   | Garda Station | Longitude | Latitude |
|---|---|---|---|
| 0 | Balbriggan Garda Station | -6.19098 | 53.614378 |
| 1 | Ballyfermot Garda Station | -6.358053 | 53.344754 |
| 2 | Ballymun Garda Station | -6.263922 | 53.394374 |
| 3 | Blackrock Garda Station | -6.177503 | 53.299793 |
| 4 | Blanchardstown Garda Station | -6.380952 | 53.389924 |
| 5 | Bridewell Dublin Garda Station | -6.274125 | 53.347081 |
| 6 | Cabinteely Garda Station | -6.150646 | 53.26079 |
| 7 | Cabra Garda Station | -6.307702 | 53.365012 |
| 8 | Clondalkin Garda Station | -6.394952 | 53.323146 |
| 9 | Clontarf Garda Station | -6.22021 | 53.363411 |


In [23]:
df.to_csv('garda_coords.csv')
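`to_csv` writes the index as an unnamed first column by default, so the file should be read back with `index_col=0`, as in cell [2]. Otherwise that column reappears as `Unnamed: 0`. A short round-trip sketch on two rows of the output above, using a temporary directory so nothing is left behind:

```python
import os
import tempfile

import pandas as pd

coords = pd.DataFrame({
    "Garda Station": ["Balbriggan Garda Station", "Ballyfermot Garda Station"],
    "Longitude": [-6.19098, -6.358053],
    "Latitude": [53.614378, 53.344754],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "garda_coords.csv")
    coords.to_csv(path)                        # index written as an unnamed first column
    reloaded = pd.read_csv(path, index_col=0)  # read that column back as the index

pd.testing.assert_frame_equal(reloaded, coords)  # raises if the round trip changed anything
print(reloaded)
```

Writing with `index=False` instead would avoid the extra column entirely, at the cost of needing no `index_col` when reading.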