## IBM DATA SCIENCE PROFESSIONAL CERTIFICATE - FINAL CAPSTONE PROJECT

## The Battle of Neighborhoods in Toronto, Canada

#### This project needs data on the localities and townships of Toronto, Canada. This notebook demonstrates the data preparation process in the following steps:

#### 1 - Scraping the Wikipedia page [Postal Codes of Canada](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) to get the postal codes of Toronto, Canada.
#### 2 - Getting the latitude and longitude coordinates of each postal code (the Geocoder package is one option; this notebook reads them from a CSV file).
#### 3 - Querying the Foursquare API to explore the neighborhoods.
#### 4 - Segmenting and clustering the neighborhoods in Toronto.

#### Importing the required libraries for data scraping and data manipulation.

In [2]:
import pandas as pd
import numpy as np
import json
from geopy.geocoders import Nominatim
import requests
# pandas >= 1.0 exposes json_normalize at the top level;
# the older pandas.io.json import path is deprecated.
from pandas import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
from bs4 import BeautifulSoup
import lxml  # parser backend used by BeautifulSoup below
import csv
import geocoder

In [6]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#### Importing the dataset and performing required transformation.

In [8]:
source_page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
page_object = BeautifulSoup(source_page, 'lxml')

#print(page_object.prettify())

In [9]:
postal_table = page_object.find('table',class_='wikitable sortable')
#print(postal_table.prettify())

In [10]:
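file_content = []
for row in postal_table.find_all('tr'):
    # Collect the text of every data cell in this row. The header row has
    # only <th> cells, so it contributes an empty list.
    cells = []
    for cell in row.find_all('td'):
        cells.append(cell.text.replace('\n', ''))
    file_content.append(cells)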
            
#print(file_content)

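The row-and-cell loop above can be tried offline on a small inline table. The sketch below assumes a hypothetical three-row HTML fragment (`sample_html`) shaped like the Wikipedia table, and it uses Python's built-in `html.parser` backend so that neither network access nor `lxml` is needed. Unlike the notebook, it skips the header row up front instead of removing it later with `drop(0)`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the Wikipedia postal code table.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
table = soup.find('table', class_='wikitable sortable')

rows = []
for tr in table.find_all('tr'):
    # get_text(strip=True) removes surrounding whitespace and newlines.
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:  # the header row has only <th> cells, so its list is empty
        rows.append(cells)

print(rows)
```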
In [28]:
postal_code_df = pd.DataFrame(file_content,columns=['Postcode','Borough','Neighborhood'])

In [29]:
postal_code_df.head()

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | | | |
| 1 | M1A | Not assigned | Not assigned |
| 2 | M2A | Not assigned | Not assigned |
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |


In [30]:
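# Drop the empty first row produced by the table's header row.
postal_code_df.drop(0, inplace=True)
# Drop postcodes that have no borough assigned.
postal_code_df.drop(postal_code_df[postal_code_df['Borough'] == 'Not assigned'].index, inplace=True)
# Where a neighborhood is not assigned, fall back to the postcode.
postal_code_df['Neighborhood'] = np.where(postal_code_df['Neighborhood'] == 'Not assigned', postal_code_df['Postcode'], postal_code_df['Neighborhood'])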

In [31]:
postal_code_df.head()

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |
| 5 | M5A | Downtown Toronto | Harbourfront |
| 6 | M6A | North York | Lawrence Heights |
| 7 | M6A | North York | Lawrence Manor |

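To see what the cleaning step does without scraping, here is a small sketch on a hypothetical four-row frame (`toy_df`). It uses a boolean filter and `Series.mask` in place of `drop(..., inplace=True)` and `np.where`; this is an equivalent alternative for illustration, not the notebook's own code.

```python
import pandas as pd

# Hypothetical rows covering the three cases the cleaning step handles.
toy_df = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M7A', 'M5A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park", 'Downtown Toronto'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned', 'Harbourfront'],
})

# Keep only rows whose borough is known.
cleaned = toy_df[toy_df['Borough'] != 'Not assigned'].copy()

# Where the neighborhood is unknown, fall back to the postcode.
missing = cleaned['Neighborhood'] == 'Not assigned'
cleaned['Neighborhood'] = cleaned['Neighborhood'].mask(missing, cleaned['Postcode'])

print(cleaned)
```

Filtering into a new frame leaves `toy_df` untouched, which makes the step easy to re-run while experimenting.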

In [32]:
# Combine rows that share a postcode, joining their neighborhoods with ', '.
postal_code_df = postal_code_df.groupby('Postcode').agg({'Borough':'first','Neighborhood':', '.join}).reset_index()

In [33]:
postal_code_df.head()

| | Postcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |

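The grouping step collapses a postcode that appears on several rows into a single row with a comma-separated neighborhood list. A minimal sketch on a hypothetical three-row frame (`rows_df`), assuming the same column names as the notebook:

```python
import pandas as pd

# M6A appears twice, once per neighborhood; M3A appears once.
rows_df = pd.DataFrame({
    'Postcode': ['M6A', 'M6A', 'M3A'],
    'Borough': ['North York', 'North York', 'North York'],
    'Neighborhood': ['Lawrence Heights', 'Lawrence Manor', 'Parkwoods'],
})

grouped = (
    rows_df.groupby('Postcode')
    # Keep the first borough per postcode and join the neighborhood names.
    .agg({'Borough': 'first', 'Neighborhood': ', '.join})
    .reset_index()
)

print(grouped)
```

Because `groupby` sorts its keys by default, the result is ordered by postcode rather than by first appearance.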

In [17]:
postal_code_df.shape

(103, 3)

In [34]:
csv_path = 'D:/IBM Data Science Professional Course/Geospatial_Coordinates.csv'
coord_df = pd.read_csv(csv_path)
coord_df.rename(columns={'Postal Code':'Postcode'}, inplace=True)
coord_df.head()

| | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [40]:
postal_location_df = pd.merge(postal_code_df, coord_df, on='Postcode')
postal_location_df.head()

| | Postcode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |

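`pd.merge` performs an inner join by default, so a postcode missing from the coordinates file would be dropped without any warning. One way to check for that, sketched here on hypothetical frames (`left_df`, `right_df`; the `M9Z` code and the `Example` borough are made up for illustration), is a left join with `indicator=True`:

```python
import pandas as pd

left_df = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M9Z'],  # 'M9Z' is a made-up code
    'Borough': ['Scarborough', 'Scarborough', 'Example'],
})
right_df = pd.DataFrame({
    'Postcode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# how='left' keeps every postcode; indicator=True adds a '_merge' column
# saying whether each row was found in both frames or only in the left one.
merged = pd.merge(left_df, right_df, on='Postcode', how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postcode'].tolist()

print(unmatched)
```

An empty `unmatched` list would confirm that the inner join in the notebook loses no rows.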

In [50]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.

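Nominatim's `geocode` returns `None` when it finds no match, in which case reading `location.latitude` raises an `AttributeError`. The sketch below wraps the lookup in a hypothetical helper, `lookup_coordinates`, that fails with a clearer message. So that it runs offline, it is exercised with a stand-in geocoder (`fake_geocode`), which is not part of geopy.

```python
from collections import namedtuple


def lookup_coordinates(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing was found.

    `geocode` is any callable shaped like Nominatim's `geocode` method: it
    takes an address string and returns an object with `latitude` and
    `longitude` attributes, or None when there is no match.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


# Stand-in geocoder so the sketch runs without network access (hypothetical).
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(address):
    known = {'Toronto, Ontario': FakeLocation(43.653963, -79.387207)}
    return known.get(address)


print(lookup_coordinates(fake_geocode, 'Toronto, Ontario'))
```

With the notebook's objects, the call would be `lookup_coordinates(geolocator.geocode, address)`.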

In [49]:
map_toronto = folium.Map(location=[latitude, longitude], tiles='OpenStreetMap', zoom_start=10)

# Add one circle marker per postal code to the map.
for lat, lng, borough, neighborhood in zip(
        postal_location_df['Latitude'],
        postal_location_df['Longitude'],
        postal_location_df['Borough'],
        postal_location_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    # parse_html belongs to Popup, not CircleMarker, so it is set only here.
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto.save('map_toronto.html')

#### Now, let's define the Foursquare API credentials used to fetch venue information.

In [51]:
import os

# Read the credentials from environment variables instead of hardcoding them.
# The variable names are a convention chosen here, not a Foursquare requirement.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.

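These credentials are typically sent as query parameters with each request. The sketch below assumes the legacy Foursquare v2 `venues/explore` endpoint, which may not be available to newer accounts. The helper name `build_explore_url`, the `radius` and `limit` defaults, and the placeholder credentials are illustrative choices, not values taken from this notebook.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build a request URL for the legacy Foursquare v2 venues/explore endpoint."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                    # API version date, e.g. '20180605'
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search centre
        'radius': radius,                # search radius in metres (assumed default)
        'limit': limit,                  # maximum number of venues (assumed default)
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# Placeholder credentials; coordinates are those of postcode M1B.
url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605', 43.806686, -79.194353)
print(url)
```

The resulting URL could then be fetched with `requests.get(url).json()`, using the real `CLIENT_ID`, `CLIENT_SECRET` and `VERSION` values.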