    The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
    Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
    More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.

    If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
    Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
    In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.
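
The rules above can be sketched on a few toy rows. This is only an illustration under assumptions: the sample values and the names `raw` and `clean` are made up here and are not the scraped Wikipedia data.

```python
import pandas as pd

# Toy rows standing in for the scraped table (hypothetical sample, not the real data).
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# Keep only rows with an assigned borough.
clean = raw[raw["Borough"] != "Not assigned"].copy()

# An unassigned neighborhood falls back to the borough name.
missing = clean["Neighborhood"] == "Not assigned"
clean.loc[missing, "Neighborhood"] = clean.loc[missing, "Borough"]

# Rows sharing a postal code are combined into one comma-separated row.
clean = (
    clean.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

# .shape is an attribute, so it is read without parentheses.
print(clean.shape)
```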

# Data Scraping from Wikipedia in order to create a notebook

### Libraries and packages that were used
1. urllib
2. BeautifulSoup
3. NumPy
4. pandas

The first step is to install the Beautiful Soup package, because I had not used it before.

In [1]:
conda install -c anaconda beautifulsoup4

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


***
After installing the package, the needed libraries have to be imported before the data can be processed.
***
In this code the data is extracted from Wikipedia and then assigned to separate lists through **_append_** inside a for loop, so that each value goes to its corresponding column.

In [2]:
import urllib.request
from bs4 import BeautifulSoup
import numpy as np

# download and parse the Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")

# one list per dataframe column
Postal = []
Borough = []
Neighborhood = []

# the postal code table is the one with class "wikitable sortable"
right_table = soup.find('table', class_='wikitable sortable')
for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    # keep only rows with exactly three data cells
    if len(cells) == 3:
        Postal.append(cells[0].find(text=True))
        Borough.append(cells[1].find(text=True))
        Neighborhood.append(cells[2].find(text=True))
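
The same row-by-row extraction can be tried offline on a small inline table. This is just a sketch under assumptions: the `html` string is a made-up miniature of the Wikipedia table, and it uses the built-in `html.parser` and `get_text(strip=True)` instead of the `lxml` parser and `find(text=True)` used in the cell above.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the Wikipedia table, inlined so the sketch runs offline.
html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
  <tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) == 3:  # the header row only has <th> cells, so it is skipped
        # get_text(strip=True) returns the cell text without the trailing newline
        rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

Stripping the text at this point would make a later clean-up of the _\n_ characters unnecessary.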



The pandas library is imported here, now that the lists are filled with the scraped information.
The Not assigned values are then replaced with NaN so that the corresponding rows can be dropped.

In [3]:
import pandas as pd
df=pd.DataFrame(Postal,columns=['PostalCode'])
df['Borough']=Borough
df['Neighborhood']=Neighborhood
# replace 'Not assigned' with NaN using NumPy
df.replace(to_replace='Not assigned\n',value=np.nan,inplace=True)
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A\n,,\n
1,M2A\n,,\n
2,M3A\n,North York\n,Parkwoods\n
3,M4A\n,North York\n,Victoria Village\n
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
...,...,...,...
175,M5Z\n,,\n
176,M6Z\n,,\n
177,M7Z\n,,\n
178,M8Z\n,Etobicoke\n,"Mimico NW, The Queensway West, South of Bloor,..."


After the replacement, it's time to drop the rows that contain NaN (the former Not assigned values).

In [4]:
# drop NaN values
df.dropna(inplace=True)
df

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A\n,North York\n,Parkwoods\n
3,M4A\n,North York\n,Victoria Village\n
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
5,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights\n"
6,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government\n"
...,...,...,...
160,M8X\n,Etobicoke\n,"The Kingsway, Montgomery Road, Old Mill North\n"
165,M4Y\n,Downtown Toronto\n,Church and Wellesley\n
168,M7Y\n,East Toronto\n,Business reply mail Processing Centre\n
169,M8Y\n,Etobicoke\n,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
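
Note that `dropna()` without arguments drops a row if any of its columns is NaN. A possible variation, sketched here on made-up rows (the name `sample` and its values are assumptions, not the scraped data), restricts the check to the Borough column so that a row with a borough but no neighborhood is kept.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second row has a borough but no neighborhood.
sample = pd.DataFrame({
    "PostalCode": ["M1A", "M7A", "M3A"],
    "Borough": ["Not assigned", "Downtown Toronto", "North York"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods"],
})

sample = sample.replace("Not assigned", np.nan)

# Drop a row only when its Borough is missing; a missing Neighborhood is tolerated.
sample = sample.dropna(subset=["Borough"])

print(sample)
```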


When the data was imported from **Wikipedia** it came with a trailing _\n_ in each cell, so I'll just delete them by targeting that character and replacing it with an empty string.

In [5]:
#replace those \n that were imported from the Wikipedia site.
df = df.replace('\n','', regex=True)
df

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,Business reply mail Processing Centre
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
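
An alternative to the regex replacement, sketched on made-up rows (the frame `sample` is an assumption for illustration), is to strip leading and trailing whitespace column by column with `str.strip()`.

```python
import pandas as pd

# Hypothetical rows with the same trailing newline as the scraped cells.
sample = pd.DataFrame({
    "PostalCode": ["M3A\n", "M4A\n"],
    "Borough": ["North York\n", "North York\n"],
    "Neighborhood": ["Parkwoods\n", "Victoria Village\n"],
})

# Strip leading/trailing whitespace (including "\n") from every text column.
for col in sample.columns:
    sample[col] = sample[col].str.strip()

print(sample)
```

Unlike the regex replacement, this only touches the ends of each string, so a newline in the middle of a value would be left alone.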


### The final dataframe is obtained by resetting the index, because dropping rows in the step above left gaps in the index.

In [6]:
df.reset_index(drop=True,inplace=True)
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
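
As a small illustration of why the reset is needed, here is a sketch on made-up rows (the names `s` and `kept` are assumptions): after `dropna` the surviving rows keep their old labels, and `reset_index(drop=True)` renumbers them from zero.

```python
import pandas as pd

# Hypothetical sample where the first row has no borough.
s = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M4A"],
    "Borough": [None, "North York", "North York"],
})

kept = s.dropna()
labels_before = kept.index.tolist()  # the old labels survive the drop

# drop=True discards the old labels instead of adding them as a column
kept = kept.reset_index(drop=True)
labels_after = kept.index.tolist()

print(labels_before, labels_after)
```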


### Importing the dataset that contains latitude and longitude info.

In [7]:
df2= pd.read_csv("http://cocl.us/Geospatial_data")

#### Extracting the geographical information from the second dataset and using the **assign** method to add both columns.
At the end we have achieved the second 

In [10]:
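# assign aligns the new columns with df by row index, not by postal code,
# so the pairing is only correct if df2 lists postal codes in the same row order as df
Lat = df2['Latitude']
Lon = df2['Longitude']
df = df.assign(Latitude=Lat, Longitude=Lon)
df.head(10)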


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.806686,-79.194353
1,M4A,North York,Victoria Village,43.784535,-79.160497
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.763573,-79.188711
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.770992,-79.216917
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.773136,-79.239476
5,M9A,Etobicoke,Islington Avenue,43.744734,-79.239476
6,M1B,Scarborough,"Malvern, Rouge",43.727929,-79.262029
7,M3B,North York,Don Mills,43.711112,-79.284577
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.716316,-79.239476
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.692657,-79.264848
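
Because `assign` pairs rows by index, the coordinates only match if both dataframes are in the same order. A safer option is probably to join on the postal code. The sketch below uses made-up placeholder coordinates and assumes the key column in the second dataset is called `Postal Code`; both are assumptions to check against the real file.

```python
import pandas as pd

# Hypothetical neighborhoods frame, in scraped (not sorted) order.
neighborhoods = pd.DataFrame({
    "PostalCode": ["M3A", "M1B"],
    "Borough": ["North York", "Scarborough"],
})

# Hypothetical coordinates frame, sorted by postal code.
# The column name "Postal Code" is an assumption and the values are placeholders.
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M3A"],
    "Latitude": [1.0, 3.0],
    "Longitude": [-1.0, -3.0],
})

# Rename the key so both frames share it, then join row by postal code.
merged = neighborhoods.merge(
    coords.rename(columns={"Postal Code": "PostalCode"}),
    on="PostalCode",
    how="left",
)

print(merged)
```

With `how="left"`, every row of the neighborhoods frame is kept and a postal code without coordinates gets NaN instead of the values of some other row.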
