## Web Scraper

This scraper collects data on the different neighborhoods of Toronto from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, loads it into a pandas DataFrame, and then finds the latitude and longitude coordinates of each neighborhood.

The following libraries were used:

<li>BeautifulSoup for scraping data</li>
<li>geocoder for finding the coordinates</li>

    

## 1. Scraping the data

#### Importing the necessary libraries

In [93]:
import pandas as pd
from bs4 import BeautifulSoup as soup
import requests



#### Scraping into a pandas DataFrame

In [94]:
# Creating an empty pandas DataFrame with the target columns
df = pd.DataFrame(columns=["PostalCode", "Borough", "Neighborhood", "Latitude", "Longitude"])

# Downloading and parsing the page's HTML
my_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page_html = requests.get(my_url).text
page_soup = soup(page_html, "html.parser")

# The first sortable wikitable on the page is assumed to hold the postal codes
containers = page_soup.find_all("table", {"class": "wikitable sortable"})
container = containers[0]

rows = container.find_all("tr")

# Adding the data of each row to the DataFrame
for i in range(len(rows)):
    cells = rows[i].find_all("td")

    # Skip the header row and any row with fewer than three cells
    if len(cells) < 3:
        continue

    # Skip postal codes that have no borough assigned
    if cells[1].text.strip() == "Not assigned":
        continue

    df.loc[i, "PostalCode"] = cells[0].text.strip()
    df.loc[i, "Borough"] = cells[1].text.strip()
    df.loc[i, "Neighborhood"] = cells[2].text.strip()

# Resetting the index; the original row numbers are kept in an "index" column
df.reset_index(inplace=True)

In [95]:
df.head()
    

|   | index | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|-------|------------|---------|--------------|----------|-----------|
| 0 | 3 | M3A | North York | Parkwoods | | |
| 1 | 4 | M4A | North York | Victoria Village | | |
| 2 | 5 | M5A | Downtown Toronto | Regent Park, Harbourfront | | |
| 3 | 6 | M6A | North York | Lawrence Manor, Lawrence Heights | | |
| 4 | 7 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | | |
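
The row-filtering logic can be tried without a network request. The sketch below is only an illustration, not the notebook's own code: `SAMPLE_HTML` is a made-up snippet that mimics the three-cell row layout the scraper expects, and `parse_postal_table` is a hypothetical helper name. It collects the rows into a list of records and builds the DataFrame once at the end.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample: one header row, then three <td> cells per data row.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""


def parse_postal_table(html):
    """Return a DataFrame of the assigned postal codes in the first wikitable."""
    table = BeautifulSoup(html, "html.parser").find("table", class_="wikitable")
    records = []
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Header rows contain no <td> cells; skip them and any short rows.
        if len(cells) < 3:
            continue
        postal_code, borough, neighborhood = cells[:3]
        # Skip postal codes that have no borough assigned.
        if borough == "Not assigned":
            continue
        records.append(
            {"PostalCode": postal_code, "Borough": borough, "Neighborhood": neighborhood}
        )
    return pd.DataFrame(records, columns=["PostalCode", "Borough", "Neighborhood"])


sample_df = parse_postal_table(SAMPLE_HTML)
print(sample_df)
```

Because the DataFrame is built from a list, its index already runs from 0 upward, so no index reset (and no extra "index" column) is needed in this variant.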


## 2. Finding Coordinates using geocoder

In [65]:
import geocoder

MAX_ATTEMPTS = 5  # cap the retries so a failing lookup cannot loop forever

for i in range(df.shape[0]):
    postal_code = df.loc[i, "PostalCode"]

    # Retry the lookup a few times, stopping as soon as coordinates come back
    lat_lng_coords = None
    for _ in range(MAX_ATTEMPTS):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        if lat_lng_coords:
            break

    # Leave the coordinates empty if every attempt failed
    if lat_lng_coords:
        df.loc[i, "Latitude"] = lat_lng_coords[0]
        df.loc[i, "Longitude"] = lat_lng_coords[1]
    

In [101]:
df.head()

|   | index | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|-------|------------|---------|--------------|----------|-----------|
| 0 | 3 | M3A | North York | Parkwoods | 43.7533 | -79.3297 |
| 1 | 4 | M4A | North York | Victoria Village | 43.7259 | -79.3156 |
| 2 | 5 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.6543 | -79.3606 |
| 3 | 6 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.7185 | -79.4648 |
| 4 | 7 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.6623 | -79.3895 |
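
Geocoding requests can fail intermittently, which is why the lookup is retried. The retry pattern can be isolated and tested on its own, as in the sketch below. The names `lookup_with_retry` and `flaky_lookup` are hypothetical and not part of geocoder; `flaky_lookup` is a stand-in that fails twice before returning coordinates.

```python
import time


def lookup_with_retry(lookup, query, max_attempts=3, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is any callable that returns a [lat, lng] pair, or None (or an
    empty list) when the request fails.
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords:
            return coords
        # Wait a little longer after each failure, except after the last one.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds * (attempt + 1))
    return None


# Stand-in for a real geocoding call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [43.7533, -79.3297]


result = lookup_with_retry(flaky_lookup, "M3A, Toronto, Ontario")
always_failing = lookup_with_retry(lambda q: None, "M3A, Toronto, Ontario", max_attempts=2)
print(result, always_failing)
```

With the real service, the callable could be something like `lambda q: geocoder.google(q).latlng`. A non-zero `delay_seconds` spaces the requests out, and a `None` result lets the caller leave that row's coordinates empty instead of looping indefinitely.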


## 3. Exporting to CSV format

In [None]:
df.to_csv("neighborhood.csv")


A CSV file named "neighborhood.csv" is created with all the relevant information.
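
By default, `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. The sketch below illustrates the difference using an in-memory buffer and a small made-up DataFrame (`df_demo`) instead of the real file; passing `index=False` is an optional variation, not what the cell above does.

```python
import io

import pandas as pd

df_demo = pd.DataFrame(
    {"PostalCode": ["M3A", "M4A"], "Borough": ["North York", "North York"]}
)

# Default: the row index is written as an extra, unnamed first column.
with_index = io.StringIO()
df_demo.to_csv(with_index)
with_index.seek(0)
cols_default = list(pd.read_csv(with_index).columns)

# index=False writes only the data columns.
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)
without_index.seek(0)
cols_clean = list(pd.read_csv(without_index).columns)

print(cols_default)  # ['Unnamed: 0', 'PostalCode', 'Borough']
print(cols_clean)    # ['PostalCode', 'Borough']
```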