# Exploring & Clustering the Neighborhoods in Toronto

- **Problem 1: In the first section of the notebook, we collect data from Wikipedia and clean it.**  
- **Problem 2: In the second section, we use 'https://cocl.us/Geospatial_data' to collect the latitude and longitude of each postal code and add them to the existing table.**  
- **Problem 3: In the third section, we explore and cluster the neighborhoods in Toronto and generate a map to visualize them.**

Wikipedia Page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

## Solution of Problem 1:

In [1]:
#importing required libraries
import pandas as pd
import numpy as np
print("Packages have been imported!")

Packages have been imported!


**We use the pandas `read_html` function to fetch the page and index the returned list to grab the first table. Passing `na_values` turns every "Not assigned" cell into NaN.**

In [2]:
data = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M", na_values="Not assigned")[0]

In [3]:
#showing the first five results:
data.head()

|  | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | NaN | NaN |
| 1 | M2A | NaN | NaN |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park / Harbourfront |


In [4]:
#showing the last five results:
data.tail()

|  | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 175 | M5Z | NaN | NaN |
| 176 | M6Z | NaN | NaN |
| 177 | M7Z | NaN | NaN |
| 178 | M8Z | Etobicoke | Mimico NW / The Queensway West / South of Bloo... |
| 179 | M9Z | NaN | NaN |


**Next, we drop the rows that have NaN in the 'Borough' column and store the result in a new variable, 'data_process'.**

In [5]:
#Drop the rows with NaN in 'Borough' and reset the index in the same chain.
#'drop=True' discards the old index instead of keeping it as a new column.
data_process = data.dropna(subset=["Borough"]).reset_index(drop=True)
data_process

|  | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government |
| ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway / Montgomery Road / Old Mill North |
| 99 | M4Y | Downtown Toronto | Church and Wellesley |
| 100 | M7Y | East Toronto | Business reply mail Processing CentrE |
| 101 | M8Y | Etobicoke | Old Mill South / King's Mill Park / Sunnylea /... |
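
The effect of dropping rows and then resetting the index can be checked on a tiny frame. This is a minimal sketch: the four rows are copied from the outputs above, and the names `toy`, `kept`, and `clean` are ours, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the real table comes from the Wikipedia page.
toy = pd.DataFrame({
    "Postal code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": [np.nan, np.nan, "North York", "North York"],
    "Neighborhood": [np.nan, np.nan, "Parkwoods", "Victoria Village"],
})

# Keep only the rows whose 'Borough' is present; the old labels 2 and 3 survive.
kept = toy.dropna(subset=["Borough"])
print(list(kept.index))

# reset_index(drop=True) renumbers from 0 and discards the old labels.
clean = kept.reset_index(drop=True)
print(list(clean.index))
```

Without the reset, the first remaining row would still be labelled 2, which matters later when another frame is attached side by side.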


**To replace every "/" with ",", we use the replace method on the 'Neighborhood' column and print the result:**

In [7]:
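#Assign the result back to the column: calling replace(..., inplace=True) on a selected
#column may modify a temporary copy and leave data_process unchanged.
data_process["Neighborhood"] = data_process["Neighborhood"].str.replace("/", ",", regex=False)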
data_process

|  | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park , Harbourfront |
| 3 | M6A | North York | Lawrence Manor , Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park , Ontario Provincial Government |
| ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway , Montgomery Road , Old Mill North |
| 99 | M4Y | Downtown Toronto | Church and Wellesley |
| 100 | M7Y | East Toronto | Business reply mail Processing CentrE |
| 101 | M8Y | Etobicoke | Old Mill South , King's Mill Park , Sunnylea ,... |
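
The output above keeps a space before each comma, because only the "/" character itself is replaced. The sketch below contrasts that with a hypothetical variant that also removes the surrounding spaces. The pattern `\s*/\s*` and the names `names`, `literal`, and `tidy` are our assumptions, not something the notebook uses.

```python
import pandas as pd

names = pd.Series(["Parkwoods", "Regent Park / Harbourfront"])

# Plain replacement of "/" by ",": the spaces around the separator remain.
literal = names.str.replace("/", ",", regex=False)

# Hypothetical variant: a regex that also swallows the surrounding spaces.
tidy = names.str.replace(r"\s*/\s*", ", ", regex=True)

print(literal.tolist())
print(tidy.tolist())
```

Either form works for the rest of the notebook; the second one only changes how the names read in the map popups.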


**Print the shape of the data_process DataFrame:**

In [8]:
print("Shape of data_process is: {}".format(data_process.shape))

Shape of data_process is: (103, 3)


**Now, save the result to a CSV file for later use:**

In [9]:
data_process.to_csv("Wikipedia_Scrapping.csv")
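
By default, `to_csv` also writes the index as an unnamed first column, and that column comes back as "Unnamed: 0" when the file is read again. A minimal sketch of the difference, using an in-memory buffer instead of a file (the frame and variable names here are illustrative only):

```python
import io
import pandas as pd

frame = pd.DataFrame({
    "Postal code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```

If the saved file is only ever read back with `index_col=0`, the default is fine; otherwise passing `index=False` avoids the extra column.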

## Solution of Problem 2:

- In this part, we use the latitude and longitude CSV file provided by the instructor.

In [10]:
#load the latitude & longitude of each postal code
#from the CSV provided by the instructor: https://cocl.us/Geospatial_data
lat = pd.read_csv("https://cocl.us/Geospatial_data")

In [11]:
#For each postal code in data_process, look up its [Latitude, Longitude] pair in 'lat'
result = [list(lat.loc[lat["Postal Code"] == x, ["Latitude", "Longitude"]].values[0])
          for x in data_process["Postal code"].values]

In [12]:
data_with_location = pd.concat([data_process, pd.DataFrame(columns=["Latitude", "Longitude"], data=result)], axis=1)
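
`pd.concat(..., axis=1)` lines rows up by index label, not by position, so this step relies on `data_process` having a fresh 0-based index. A minimal sketch of what happens otherwise, with two rows taken from the tables in this notebook (the names `left`, `coords`, `misaligned`, and `aligned` are ours):

```python
import pandas as pd

# Left frame whose index was NOT reset after dropping rows (labels 2 and 3).
left = pd.DataFrame({"Postal code": ["M3A", "M4A"]}, index=[2, 3])

# A freshly built frame always starts at label 0.
coords = pd.DataFrame({"Latitude": [43.753259, 43.725882]})

# Labels do not match, so concat pads both sides with NaN.
misaligned = pd.concat([left, coords], axis=1)

# After resetting the index, the rows pair up as intended.
aligned = pd.concat([left.reset_index(drop=True), coords], axis=1)

print(misaligned.shape, aligned.shape)
```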

In [13]:
data_with_location.head()

|  | Postal code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park , Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor , Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park , Ontario Provincial Government | 43.662301 | -79.389494 |
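
The same table can probably be built in one step with a join on the postal code. The sketch below is an alternative we have not run against the full data: it uses two illustrative rows, and the names `neighborhoods`, `geo`, and `merged` are hypothetical. Note that the key column is spelled "Postal code" in one frame and "Postal Code" in the other.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    "Postal code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
})

# Deliberately in a different row order than 'neighborhoods'.
geo = pd.DataFrame({
    "Postal Code": ["M4A", "M3A"],
    "Latitude": [43.725882, 43.753259],
    "Longitude": [-79.315572, -79.329656],
})

# A left join keeps every neighborhood row, in its original order,
# and matches coordinates by key rather than by position.
merged = (
    neighborhoods
    .merge(geo, left_on="Postal code", right_on="Postal Code", how="left")
    .drop(columns="Postal Code")
)
print(merged)
```

A left join also leaves NaN for a postal code that is missing from the coordinate file, where the list-based lookup would raise an `IndexError`.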


## Solution of Problem 3

In [14]:
#importing required libraries:
import folium

#To center the map, we first pick an arbitrary location in Toronto
address_string = "Downtown, Toronto"
lat = 43.654599
long = -79.386306

#Printing the location, and address we have picked
print("Address String is: {} \nlatitude: {} \nLongitude: {}".format(address_string, lat, long))

Address String is: Downtown, Toronto 
latitude: 43.654599 
Longitude: -79.386306
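
Instead of a hand-picked point, the map center could also be derived from the data. A minimal sketch, assuming the mean of the coordinates is a reasonable center; it uses only the five rows shown in `data_with_location.head()`, and the names `points`, `center_lat`, and `center_long` are ours.

```python
from statistics import mean

# (latitude, longitude) of the first five rows of data_with_location
points = [
    (43.753259, -79.329656),
    (43.725882, -79.315572),
    (43.654260, -79.360636),
    (43.718518, -79.464763),
    (43.662301, -79.389494),
]

# Average each coordinate separately to get a rough center point.
center_lat = mean(p[0] for p in points)
center_long = mean(p[1] for p in points)
print(round(center_lat, 4), round(center_long, 4))
```

On the full table the same idea would be `data_with_location["Latitude"].mean()` and `data_with_location["Longitude"].mean()`.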


In [15]:
toronto_map = folium.Map(location=[lat, long], zoom_start=11)

toronto_feature_group = folium.map.FeatureGroup()

#Add a circle marker (to the feature group) and a popup marker (to the map) for each row
for addr, lat, long in data_with_location[["Neighborhood", "Latitude", "Longitude"]].values:
    toronto_feature_group.add_child(folium.CircleMarker(location=[lat, long], radius=3, color="blue", fill_color="blue",
                                                       fill_opacity=0.4, fill=True))
    folium.Marker([lat, long], popup=str(addr)).add_to(toronto_map)

#Attach the feature group to the map; otherwise the circle markers are never drawn
toronto_map.add_child(toronto_feature_group)

print("Saving the Map in HTML!")
toronto_map.save("Exploring_clustering.html")

Saving the Map in HTML!
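
A row-by-row loop over `data_with_location` can also be written with `DataFrame.itertuples`, which yields one tuple per row that unpacks directly into names. A minimal sketch on two rows from the table above, without folium so that it runs anywhere (the names `rows` and `unpacked` are ours):

```python
import pandas as pd

rows = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Victoria Village"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# index=False leaves the row label out, so each tuple has exactly three fields.
unpacked = [(name, lat, lon) for name, lat, lon in rows.itertuples(index=False)]
print(unpacked[0])
```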


**We save the map as HTML so that we can display it in the Jupyter notebook. (Note: this also helps the output show up on GitHub.)**

**To display the map in the notebook, we will use the IPython library:**

In [16]:
#importing required libraries

from IPython.display import HTML, display

In [17]:
#read the html file from the current directory;
#the 'with' block closes the file even if reading fails
with open("Exploring_clustering.html", "r", encoding="utf-8") as doc:
    docs = doc.read()

In [19]:
#code to embed the map in jupyter notebook:
#html.escape makes the page safe to place inside the double-quoted srcdoc attribute
import html

embed = HTML('<iframe srcdoc="{}" '
             'style="height: {}px; display: block; width: 90%; margin: 0 auto; '
             'border: none"></iframe>'.format(html.escape(docs), 500))

embed
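
Placing a whole HTML page inside the `srcdoc` attribute depends on escaping every character that could end the attribute or be read as an entity. A minimal sketch with the standard library's `html.escape`, using a made-up page string in place of the saved map (the names `page`, `escaped`, and `iframe` are ours):

```python
import html

# Stand-in for the saved map's HTML; the real string is read from the file.
page = '<div id="map" title="Tom & Jerry\'s">Toronto</div>'

# Escape &, <, > and both quote characters.
escaped = html.escape(page, quote=True)
iframe = (
    '<iframe srcdoc="{}" style="width: 90%; height: 500px; border: none">'
    '</iframe>'.format(escaped)
)

# The browser reverses the escaping when it parses the attribute value.
print(html.unescape(escaped) == page)
```

The only double quotes left in `iframe` are the ones that delimit the two attributes, so the embedded page cannot close `srcdoc` early.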