### 1. Start by creating a new Notebook for this assignment.

done

### 2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe


In [1]:
import pandas as pd
import numpy as np
from urllib.request import urlopen
from bs4 import BeautifulSoup

In [2]:
html = urlopen('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
content = BeautifulSoup(html, 'html.parser')
table = content.find('table')
# print(table.prettify())

In [3]:
items = []
# skip the header row, then collect the stripped text of every <td> cell in each row
for tr in table.find_all('tr')[1:]:
    row_data = tr.find_all('td')
    items.append([cell.text.strip() for cell in row_data])
df = pd.DataFrame(items, columns=['PostalCode', 'Borough', 'Neighborhood'])
df.head(5)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | |
| 1 | M2A | Not assigned | |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
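For readers without BeautifulSoup, the row-collecting loop in In [3] can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not what the notebook uses; the class name `TableRowParser` and the sample HTML are hypothetical, and the sketch assumes every cell is closed with `</td>` (HTML allows omitting it). Instead of slicing off the first row with `[1:]`, it skips any row that contains no `<td>` cells, which covers a `<th>`-only header.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # header rows made only of <th> end up empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # text inside nested tags such as <a> is still collected for the current cell
        if self._cell is not None:
            self._cell.append(data)


# hypothetical sample shaped like the Wikipedia table
SAMPLE = (
    '<table>'
    '<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>'
    '<tr><td>M1A</td><td>Not assigned</td><td></td></tr>'
    '<tr><td>M3A</td><td>North York</td><td><a href="#">Parkwoods</a></td></tr>'
    '</table>'
)

parser = TableRowParser()
parser.feed(SAMPLE)
parser.close()
print(parser.rows)  # [['M1A', 'Not assigned', ''], ['M3A', 'North York', 'Parkwoods']]
```

The resulting `parser.rows` list has the same shape as `items` in In [3], so it could be passed to `pd.DataFrame` the same way.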


### 3. To create the above dataframe:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [4]:
# keep only rows whose borough is assigned, then renumber the index from 0
df = df[df['Borough'] != 'Not assigned']
df.reset_index(drop=True, inplace=True)

- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows should be combined into one row, with the neighborhoods separated by a comma, as M5A ("Regent Park, Harbourfront") already appears in the output above.

##### There are no such records; every postal code appears only once:

In [5]:
df[df['PostalCode'].duplicated()]

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
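The duplicate check above comes back empty, so nothing needs merging in this version of the table. If the page ever lists a postal code more than once again, a sketch along these lines could combine the rows. The toy rows are hypothetical, and `groupby(...).agg(', '.join)` is one common pandas idiom rather than the only option.

```python
import pandas as pd

# hypothetical rows in which M5A is listed twice, as the assignment describes
raw = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# one row per (PostalCode, Borough); sort=False keeps the order of first appearance,
# and the neighborhoods of each group are joined with ", " in their original order
combined = (
    raw.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```

Grouping on both `PostalCode` and `Borough` assumes a postal code never spans two boroughs; if it could, the borough would need its own aggregation.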


- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.


In [6]:
# where the neighborhood is 'Not assigned', fall back to the borough name
df['new_Neighborhood'] = np.where(df['Neighborhood'] == 'Not assigned', df['Borough'], df['Neighborhood'])
df.head(10)


|   | PostalCode | Borough | Neighborhood | new_Neighborhood |
|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | Parkwoods |
| 1 | M4A | North York | Victoria Village | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge | Malvern, Rouge |
| 7 | M3B | North York | Don Mills | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | Garden District, Ryerson |
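The temporary `new_Neighborhood` column is one way to apply the fallback rule; `Series.mask` can also replace the matching values directly in the `Neighborhood` column. A minimal sketch on hypothetical toy rows (the "Not assigned" neighborhood below is made up for illustration):

```python
import pandas as pd

# hypothetical toy rows: the second one has a borough but no assigned neighborhood
toy = pd.DataFrame({
    'PostalCode': ['M3A', 'M7A'],
    'Borough': ['North York', "Queen's Park"],
    'Neighborhood': ['Parkwoods', 'Not assigned'],
})

# mask() swaps in the value from `other` (aligned on the index) wherever the condition is True
toy['Neighborhood'] = toy['Neighborhood'].mask(toy['Neighborhood'] == 'Not assigned', toy['Borough'])
print(toy)
```

This gives the same result as the `np.where` cell without the extra column to drop afterwards.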


- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.

In [7]:
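# replace the original column with the filled-in one and drop the helper column
df['Neighborhood'] = df['new_Neighborhood']
df.drop(columns=['new_Neighborhood'], inplace=True)
df.reset_index(drop=True, inplace=True)
# save the cleaned table to data.csv
df.to_csv('data.csv', index=False)
df.head()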

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


- In the last cell of your notebook, use the `.shape` attribute to print the number of rows of your dataframe.

In [8]:
df.shape

(103, 3)

### 4. Submit a link to your Notebook on your Github repository. 

done

## PART 2

##### Install and import the geocoder library

In [9]:
!conda install -c conda-forge geocoder --yes
import geocoder

Collecting package metadata (current_repodata.json): done

# All requested packages already installed.



#### Create a helper function to retrieve the latitude and longitude of a postal code

In [10]:
def get_coords(postal_code):
    """Return (latitude, longitude) for one postal code using the ArcGIS geocoder."""
    coordinates = None
    # retry until the geocoder returns a result (g.latlng is None when a lookup fails)
    while coordinates is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code.strip()))
        coordinates = g.latlng  # [lat, lng], or None
    # unpack only after a result has arrived; indexing None would raise a TypeError
    latitude, longitude = coordinates
    return latitude, longitude
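A variant of `get_coords` that gives up after a fixed number of attempts, instead of looping until a result arrives, might look like the sketch below. The name `get_coords_with_retry` and the `max_tries`/`delay` parameters are hypothetical; the geocoding call is passed in as a plain function so the retry logic can be exercised without network access. In the notebook it could be `lambda q: geocoder.arcgis(q).latlng`.

```python
import time


def get_coords_with_retry(postal_code, geocode, max_tries=5, delay=1.0):
    """Return (lat, lng) for postal_code, or None after max_tries failed lookups.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] on success or None on failure (hypothetical interface).
    """
    query = '{}, Toronto, Ontario'.format(postal_code.strip())
    for attempt in range(max_tries):
        latlng = geocode(query)
        if latlng is not None:
            return latlng[0], latlng[1]
        if attempt < max_tries - 1:
            time.sleep(delay)  # pause before asking the service again
    return None
```

Rows that come back as `None` can then be inspected or retried later instead of blocking the whole `apply` call.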

#### Get the latitude and longitude coordinates of each postal code area

In [None]:
df['Latitude'], df['Longitude'] = zip(*df['PostalCode'].apply(get_coords))
df.head()
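The one-liner above relies on `apply` returning a Series of `(latitude, longitude)` tuples, which `zip(*...)` transposes into one sequence of latitudes and one of longitudes. The sketch below shows the same idiom offline; `fake_get_coords` and its dummy coordinates are hypothetical stand-ins for the geocoder, not real locations.

```python
import pandas as pd

# hypothetical stand-in for get_coords: dummy (lat, lng) pairs, not real coordinates
FAKE_COORDS = {'M3A': (1.0, 2.0), 'M4A': (3.0, 4.0)}


def fake_get_coords(postal_code):
    return FAKE_COORDS[postal_code]


toy = pd.DataFrame({'PostalCode': ['M3A', 'M4A']})
pairs = toy['PostalCode'].apply(fake_get_coords)  # Series of (lat, lng) tuples
# zip(*pairs) yields (all latitudes) and (all longitudes)
toy['Latitude'], toy['Longitude'] = zip(*pairs)
print(toy)
```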

#### Install geopy and folium

In [None]:
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium --yes

#### Fetch the latitude and longitude of 'Toronto, Ontario' (used as the map center)

In [None]:
from geopy.geocoders import Nominatim
import folium

address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_ontario")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
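If the Nominatim lookup is unavailable, the map could instead be centered on the mean of the geocoded postal-code coordinates. This is an alternative under stated assumptions, not the notebook's approach; the helper name `centroid` is hypothetical, and a plain arithmetic mean is only reasonable for a city-sized area that does not cross the antimeridian.

```python
def centroid(latitudes, longitudes):
    """Arithmetic mean of latitudes and longitudes (hypothetical helper)."""
    lats = list(latitudes)
    lngs = list(longitudes)
    if not lats or len(lats) != len(lngs):
        raise ValueError('need the same, non-zero number of latitudes and longitudes')
    return sum(lats) / len(lats), sum(lngs) / len(lngs)
```

In the notebook this could be used as `folium.Map(location=list(centroid(df['Latitude'], df['Longitude'])), zoom_start=11)`, assuming the `Latitude` and `Longitude` columns have been filled.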

#### Show the neighborhoods on a map

In [None]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add one circle marker per postal code, labelled "Borough (PostalCode): Neighborhoods"
for lat, lng, post, borough, neigh in zip(df['Latitude'], df['Longitude'], df['PostalCode'], df['Borough'], df['Neighborhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto