The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
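The three rules above can also be applied with pandas alone, without building parallel lists. This sketch assumes the scraped rows are already in a DataFrame with columns `Postal Code`, `Borough` and `Neighborhood`. The sample rows and the helper name `clean_postal_codes` are hypothetical. Grouping on `Borough` as well as `Postal Code` assumes that each postal code belongs to a single borough.

```python
import pandas as pd

# Hypothetical sample rows in the raw Wikipedia layout (illustrative only)
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})

def clean_postal_codes(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the assignment's three cleaning rules."""
    # Drop rows whose borough is 'Not assigned'
    df = raw[raw['Borough'] != 'Not assigned'].copy()
    # A 'Not assigned' neighborhood takes the name of its borough
    unassigned = df['Neighborhood'] == 'Not assigned'
    df.loc[unassigned, 'Neighborhood'] = df.loc[unassigned, 'Borough']
    # One row per postal code, with its neighborhoods joined by a comma
    return (df.groupby(['Postal Code', 'Borough'], sort=False)['Neighborhood']
              .agg(','.join)
              .reset_index())

cleaned = clean_postal_codes(raw)
print(cleaned)
print(cleaned.shape)  # .shape is an attribute: a (rows, columns) tuple
```

`sort=False` keeps postal codes in order of first appearance, which matches the order produced by the list-based scraping loop.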

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').content
soup = BeautifulSoup(page, 'lxml')
tables = soup.find_all('table', class_='sortable')
postcode_list = []
borough_list = []
neighborhood_list = []

In [3]:
for table in tables:
    for tr in table.find_all('tr'):
        tds = tr.find_all('td')
        # Skip header rows (they contain only <th> cells) and any row with fewer than three cells
        if len(tds) < 3:
            continue

        postcode, borough, neighborhood = [td.text.strip() for td in tds[:3]]

        # Ignore rows whose borough is 'Not assigned'
        if borough == 'Not assigned':
            continue

        # A 'Not assigned' neighborhood takes the name of its borough
        if neighborhood == 'Not assigned':
            neighborhood = borough

        # Combine neighborhoods that share a postal code into one comma-separated entry
        if postcode in postcode_list:
            neighborhood_list[postcode_list.index(postcode)] += ',' + neighborhood
        else:
            postcode_list.append(postcode)
            borough_list.append(borough)
            neighborhood_list.append(neighborhood)

# Name the columns explicitly; 'Postal Code' is also the key used to merge with the geospatial data in part 2
data_tuples = list(zip(postcode_list, borough_list, neighborhood_list))
df = pd.DataFrame(data_tuples, columns=['Postal Code', 'Borough', 'Neighborhood'])
print(df.head())

  Postal Code           Borough                     Neighborhood
0         M3A        North York                        Parkwoods
1         M4A        North York                 Victoria Village
2         M5A  Downtown Toronto         Harbourfront,Regent Park
3         M6A        North York  Lawrence Heights,Lawrence Manor
4         M7A      Queen's Park                     Queen's Park
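The scraping logic can be checked offline by feeding it a small inline HTML table instead of the live Wikipedia page. The markup below is hypothetical and only mimics the three-column layout the loop assumes; the helper name `scrape_postal_codes` is likewise illustrative. As a design alternative to `postcode_list.index(...)`, which scans the list once per row, this sketch keys a dict on the postal code so each lookup is constant time. It uses Python's built-in `html.parser`, so `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the three-column sortable table the scraper assumes
HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Regent Park</td></tr>
  <tr><td>M7A</td><td>Queen's Park</td><td>Not assigned</td></tr>
</table>
"""

def scrape_postal_codes(html):
    """Return {postcode: [borough, neighborhoods]} after applying the three cleaning rules."""
    soup = BeautifulSoup(html, 'html.parser')
    merged = {}  # dicts preserve insertion order (Python 3.7+), so row order is kept
    for table in soup.find_all('table', class_='sortable'):
        for tr in table.find_all('tr'):
            tds = tr.find_all('td')
            if len(tds) < 3:  # header row (only <th> cells) or malformed row
                continue
            postcode, borough, neighborhood = (td.text.strip() for td in tds[:3])
            if borough == 'Not assigned':
                continue
            if neighborhood == 'Not assigned':
                neighborhood = borough
            if postcode in merged:
                merged[postcode][1] += ',' + neighborhood
            else:
                merged[postcode] = [borough, neighborhood]
    return merged

result = scrape_postal_codes(HTML)
print(result)
```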


In [5]:
print(df.shape)

(103, 3)


# 2. Add latitude and longitude coordinates

In [6]:
geo_data = pd.read_csv('https://cocl.us/Geospatial_data')
print(geo_data.shape)

(103, 3)


In [7]:
df2 = pd.merge(df, geo_data, how='inner', on='Postal Code')
print(df2.head())

  Postal Code           Borough                     Neighborhood   Latitude  \
0         M3A        North York                        Parkwoods  43.753259   
1         M4A        North York                 Victoria Village  43.725882   
2         M5A  Downtown Toronto         Harbourfront,Regent Park  43.654260   
3         M6A        North York  Lawrence Heights,Lawrence Manor  43.718518   
4         M7A      Queen's Park                     Queen's Park  43.662301   

   Longitude  
0 -79.329656  
1 -79.315572  
2 -79.360636  
3 -79.464763  
4 -79.389494  
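An inner join silently drops postal codes that appear in only one of the two tables, so both inputs having 103 rows suggests, but does not prove, that every code matched. One way to check is an outer join with `indicator=True`, which adds a `_merge` column labelling each row `both`, `left_only` or `right_only`, combined with `validate='one_to_one'`, which raises `pandas.errors.MergeError` if a postal code is duplicated on either side. The frames below are small hypothetical stand-ins for `df` and `geo_data`; `M9Z` is an invented code used only to produce a mismatch.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped table and the geospatial CSV
df = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Harbourfront,Regent Park'],
})
geo_data = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],  # 'M9Z' is invented so one row fails to match
    'Latitude': [43.753259, 43.725882, 43.700000],
    'Longitude': [-79.329656, -79.315572, -79.500000],
})

# Outer join keeps every row from both sides and records where each one came from
check = pd.merge(df, geo_data, how='outer', on='Postal Code',
                 indicator=True, validate='one_to_one')

# Rows not labelled 'both' would have been dropped by the inner join
unmatched = check[check['_merge'] != 'both']
print(unmatched[['Postal Code', '_merge']])
```

If `unmatched` is empty for the real data, the inner join in the notebook lost nothing.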


# 3. Map the neighborhoods of Toronto

In [9]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library


Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge



In [10]:
# Approximate coordinates of downtown Toronto, used as the initial map centre
LATITUDE = 43.653908
LONGITUDE = -79.384293

toronto_map = folium.Map(location=[LATITUDE, LONGITUDE], zoom_start=10)

for latitude, longitude, borough, neighborhood in zip(df2['Latitude'], df2['Longitude'],
                                                      df2['Borough'], df2['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
    ).add_to(toronto_map)

toronto_map
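The map centre above is a fixed coordinate. An alternative is to derive the centre and a bounding box from the plotted points themselves. The sketch below uses only the five merged rows printed earlier, so its numbers are illustrative rather than the centre of all 103 postal codes; with the full `df2` the same expressions apply unchanged.

```python
import pandas as pd

# The five merged rows printed earlier (a subset of the full df2)
df2 = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A', 'M6A', 'M7A'],
    'Latitude': [43.753259, 43.725882, 43.654260, 43.718518, 43.662301],
    'Longitude': [-79.329656, -79.315572, -79.360636, -79.464763, -79.389494],
})

# Mean position of the points, usable as a data-driven map centre
center = [df2['Latitude'].mean(), df2['Longitude'].mean()]

# South-west and north-east corners, in the [[lat, lon], [lat, lon]] form folium uses for bounds
bounds = [[df2['Latitude'].min(), df2['Longitude'].min()],
          [df2['Latitude'].max(), df2['Longitude'].max()]]

print(center)
print(bounds)
```

With folium, `center` could be passed as `folium.Map(location=center, zoom_start=10)`, or `toronto_map.fit_bounds(bounds)` could be called so the zoom level adapts to the data instead of being fixed.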