# Segmenting and clustering neighborhoods in Toronto

This notebook is **part 2** of the course's third-week assignment. It builds on the results of part 1 by loading them from a file.


# Assignment, Part 2

Assignment description:

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to **get the latitude and the longitude coordinates of each neighborhood**.

We will use the Geocoder Python package: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a postal code may return None, and repeating the same call may then return the coordinates. To make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:

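```python
import geocoder

postal_code = 'M5G'  # example postal code
lat_lng_coords = None

# Keep calling the geocoding service until it returns coordinates.
# Note: this loops forever if the service never answers for this postal code.
while lat_lng_coords is None:
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng  # [latitude, longitude], or None on failure

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
```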
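
The loop above never gives up, so a postal code the service cannot resolve stalls the notebook. A minimal sketch of a bounded variant, assuming you pass in any lookup function that returns `[lat, lng]` or `None` (for example a lambda around `geocoder.arcgis(...).latlng`). The names `geocode_with_retries`, `max_tries` and `delay_seconds` are hypothetical, not part of the geocoder package:

```python
import time
from typing import Callable, List, Optional


def geocode_with_retries(
    lookup: Callable[[str], Optional[List[float]]],
    query: str,
    max_tries: int = 10,
    delay_seconds: float = 1.0,
) -> Optional[List[float]]:
    """Call lookup(query) until it returns coordinates or max_tries is reached.

    Returns [lat, lng] on success, or None if every attempt failed.
    (Hypothetical helper, not part of the geocoder package.)
    """
    for attempt in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        # Pause before the next attempt so we don't hammer the service
        if attempt < max_tries - 1:
            time.sleep(delay_seconds)
    return None


# Possible usage with the geocoder package (requires network access):
# import geocoder
# coords = geocode_with_retries(lambda q: geocoder.arcgis(q).latlng,
#                               "M5G, Toronto, Ontario")
```

Returning `None` after the last attempt lets the caller decide whether to skip the postal code or fall back to the CSV file.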

Because this package can be very unreliable, if you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a CSV file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

Use the Geocoder package or the CSV file to create a **dataframe which contains part 1 data and the latitude and longitude coordinates for each postal code (columns _Latitude_ and _Longitude_)**.


Important Note: There is a limit on how many times you can call the geocoder.google function: 2500 times per day. This should be more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.

Once you are able to create the above dataframe, submit a link to the new Notebook on your GitHub repository. (2 marks)

## Step 0 - Import libraries


In [1]:
```python
import pandas as pd
import requests

# Uncomment the following line if importing geocoder gives an error
#!pip install geocoder
import geocoder

print("Libraries imported.")
```

```text
Libraries imported.
```


## Step 1

Load the Toronto postal code and neighborhood data from the previous part into a dataframe.

In [2]:
```python
toronto_data_filename = "toronto_postal_cleaned.csv"
toronto_df = pd.read_csv(toronto_data_filename)

print("\n\nRead", toronto_df.shape[0], "rows of data into toronto_df\n")
toronto_df.head()
```

```text
Read 103 rows of data into toronto_df
```

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M7A | Queen's Park | Queen's Park |
| 3 | M9A | Etobicoke | Islington Avenue |
| 4 | M3B | North York | Don Mills North |
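
Since `PostalCode` is the key for the later merge, it can help to confirm that it is a clean, unique column. A minimal sketch, assuming the three-character codes follow the pattern seen in the rows above (the letter M, a digit, a letter); `check_postal_codes` is a hypothetical helper name:

```python
import pandas as pd


def check_postal_codes(df: pd.DataFrame, column: str = "PostalCode") -> list:
    """Return a list of problems found in the postal-code column (empty if none)."""
    problems = []
    codes = df[column]
    if codes.isna().any():
        problems.append("missing postal codes")
    if not codes.is_unique:
        dupes = sorted(codes[codes.duplicated()].dropna().unique())
        problems.append("duplicate postal codes: {}".format(dupes))
    # Assumed Toronto format, e.g. 'M3A': letter M, one digit, one capital letter
    present = codes.dropna()
    bad = present[~present.str.fullmatch(r"M\d[A-Z]")]
    if len(bad) > 0:
        problems.append("unexpected format: {}".format(sorted(bad.unique())))
    return problems
```

For example, `check_postal_codes(toronto_df)` should return an empty list before the merge is attempted.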


## Step 2

Get the geographical coordinates (latitude and longitude) for each postal code.

### Step 2, try 1 (load geo data using geocoder)

Here is code that loads the coordinate data with geocoder. However, after downloading the coordinates, I noticed that they are not exactly the same as Google's, so I decided to use the already collected coordinates from the provided file _Geospatial_Coordinates.csv_, as the assignment allows.

The code that downloads each coordinate from the online service is kept here for archival purposes.

In [3]:
```python
# Change this to True if you want to run this code
do_for_REAL = False

if do_for_REAL:

    # Make changes to a copy, so we don't need to reload the original toronto_df data
    toronto_coords_df = toronto_df.copy()

    # Read postal codes into a list
    postal_codes = toronto_coords_df["PostalCode"].tolist()

    # Read location coordinates from the ArcGIS service. The Google service didn't work
    # but ArcGIS did, so it is used here.
    # Only later found that the coordinate values are not exactly the same.
    latitudes_list = []
    longitudes_list = []

    for pc in postal_codes:
        request = '{}, Toronto, Ontario'.format(pc)
        lat_lng_coords = None
        while lat_lng_coords is None:
            # Show some progress, because this takes some time.
            print("trying request", request)
            #g = geocoder.google(request)
            g = geocoder.arcgis(request)
            lat_lng_coords = g.latlng

        latitudes_list.append(lat_lng_coords[0])
        longitudes_list.append(lat_lng_coords[1])

    print("\n\nReading coordinates done, read", len(latitudes_list), "latitudes and",
          len(longitudes_list), "longitudes\n\n")

    toronto_coords_df["Latitude"] = latitudes_list
    toronto_coords_df["Longitude"] = longitudes_list
    # head() is not displayed automatically inside an if block, so print it
    print(toronto_coords_df.head())

    # Save data to file
    output_file_name = "toronto_postal_plus_arcgis.csv"
    toronto_coords_df.to_csv(output_file_name, index=False)
    print("\n\ntoronto_coords_df written to file", output_file_name, "\n\n")

else:
    print("Code was not run")

# That's all for that rehearsal...
```

```text
Code was not run
```
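
To quantify how far the ArcGIS coordinates differ from those in *Geospatial_Coordinates.csv*, one option is the haversine great-circle distance. A minimal sketch, assuming a spherical Earth with a mean radius of 6371 km and two dataframes that both have `PostalCode`, `Latitude` and `Longitude` columns; `haversine_km` and `coordinate_differences` are hypothetical names:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Works element-wise on scalars, NumPy arrays, or pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def coordinate_differences(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Join two coordinate tables on PostalCode and compute the distance (km) per code."""
    both = pd.merge(df_a, df_b, on="PostalCode", suffixes=("_a", "_b"))
    both["DistanceKm"] = haversine_km(
        both["Latitude_a"], both["Longitude_a"],
        both["Latitude_b"], both["Longitude_b"],
    )
    return both[["PostalCode", "DistanceKm"]]
```

After the column rename further down, a comparison could look like `coordinate_differences(pd.read_csv("toronto_postal_plus_arcgis.csv"), geo_df)`.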


### Step 2, try 2 (use the coordinates given in a file)

Here we use the coordinates from the *Geospatial_Coordinates.csv* file.

In [4]:
```python
geo_df = pd.read_csv("Geospatial_Coordinates.csv")

print("\n\nRead", geo_df.shape[0], "rows of data into geo_df\n")
geo_df.head()
```

```text
Read 103 rows of data into geo_df
```

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


The geo data from the file needs one small fix: its 'Postal Code' column must be renamed to 'PostalCode' so that it matches the key column of toronto_df for the merge.

In [5]:
```python
# Clean geo_df data: column name must be changed from 'Postal Code' to 'PostalCode'
geo_df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
geo_df.head()
```

|   | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
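
Renaming is one option. pandas `merge` can also join on differently named key columns through its `left_on` and `right_on` parameters. A minimal sketch on two rows taken from this notebook's final result; because both key columns are kept, the redundant one is dropped afterwards:

```python
import pandas as pd

# Two rows taken from the notebook's final result
toronto_part = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village"],
})
geo_part = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],  # original column name, with a space
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Join on differently named key columns, then drop the redundant right-hand key
merged = pd.merge(toronto_part, geo_part, left_on="PostalCode", right_on="Postal Code")
merged = merged.drop(columns="Postal Code")
print(merged)
```

This leaves `geo_df` untouched, at the cost of an extra `drop` after the merge.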


Now we can merge the two dataframes, *toronto_df* and *geo_df* into a new *toronto_geo_df*.

In [6]:
```python
# Merge the two dataframes on their shared PostalCode column
toronto_geo_df = pd.merge(toronto_df, geo_df, on='PostalCode')
```
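
`pd.merge` defaults to an inner join, so a postal code missing from either file would be dropped silently. A minimal sketch of a stricter merge using pandas' `how="left"`, `validate="one_to_one"` (raises `MergeError` if a key is duplicated on either side) and `indicator=True` (adds a `_merge` column); `merge_with_checks` is a hypothetical helper name:

```python
import pandas as pd


def merge_with_checks(left: pd.DataFrame, right: pd.DataFrame,
                      key: str = "PostalCode") -> pd.DataFrame:
    """Left-join right onto left, failing loudly on duplicate or unmatched keys."""
    merged = pd.merge(left, right, on=key, how="left",
                      validate="one_to_one", indicator=True)
    # Rows marked 'left_only' found no coordinates in the right-hand table
    unmatched = merged.loc[merged["_merge"] != "both", key].tolist()
    if unmatched:
        raise ValueError("No coordinates for postal codes: {}".format(unmatched))
    return merged.drop(columns="_merge")
```

With this helper the merge cell would read `toronto_geo_df = merge_with_checks(toronto_df, geo_df)`, and the result is guaranteed to have one row per postal code of `toronto_df`.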

## Final result

In [7]:
```python
toronto_geo_df
```

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 |
| 3 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 4 | M3B | North York | Don Mills North | 43.745906 | -79.352188 |
| 5 | M6B | North York | Glencairn | 43.709577 | -79.445073 |
| 6 | M4C | East York | Woodbine Heights | 43.695344 | -79.318389 |
| 7 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 8 | M6C | York | Humewood-Cedarvale | 43.693781 | -79.428191 |
| 9 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
