# Capstone Project - The Battle of Neighborhoods Part 1
## Boris Chepikov

### 1. A description of the problem and a discussion of the background.

**One of the most common questions entrepreneurs face at the earliest stage is where to start a business and what kind of business to start. To answer the 'where' and 'what' questions, it is necessary to survey which businesses and financial facilities already operate around the town. Doing this by hand, however, takes a great deal of time and effort. Analyzing business types or categories from location data is a more efficient approach and helps entrepreneurs move forward.**

**It is also important to analyze where people live. A business started in a place that few people visit is likely to shut down even if it has no competitors. Therefore, this project will also examine the population of each postal code.**

### 2. A description of the data and how it will be used to solve the problem.

##### Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

**For this project, I will use two datasets. The first comes from the previous project, Neighborhood Segmentation and Clustering. The second comes from Statistics Canada (https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Table.cfm?Lang=Eng&T=1201&S=22&O=A) and gives the 2016 population by postal code in Canada.**

**The first dataset provides the number of businesses by business type in each neighborhood of Toronto. The second dataset, from Statistics Canada, is used to analyze the population by postal code in Canada.**

In [2]:
!conda install -c conda-forge bs4 -y

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.3

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - bs4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.8.2       |   py36h9f0ad1d_1         157 KB  conda-forge
    openssl-1.1.1e             |       h516909a_0         2.1 MB  conda-forge
    soupsieve-1.9.4            |   py36h9f0ad1d_1          58 KB  conda-forge
    bs4-4.8.2                  |                1           4 KB  conda-forge
    certifi-2019.11.28         |   py36h9f0ad1d_1         149 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ------------------------------------------------------------
                                           Tot

In [2]:
!conda install -c conda-forge geopy -y

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.3

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.21.0               |             py_0          58 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          92 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.50-py_0   conda-forge
    geopy:         1.21.0-py_0 conda-forge


Downloading and Extracting Packages
geopy-1.21.0         | 58 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ###

In [3]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
from geopy.geocoders import Nominatim

In [7]:
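# First dataset: scrape the Toronto postal code table from Wikipedia
wikipedia_link='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
raw_wikipedia_page= requests.get(wikipedia_link).text
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(raw_wikipedia_page, 'html.parser')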
#print(soup.prettify())

In [10]:
table = soup.find('table')
column_names = ['Postalcode','Borough','Neighborhood']
df = pd.DataFrame(columns = column_names)
#print(table)

In [11]:
for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data
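
The row-by-row `df.loc[len(df)] = row_data` pattern above works, but every assignment grows the DataFrame. A minimal sketch of an alternative, assuming the same three-column table layout, collects the rows in a list and builds the DataFrame once. The HTML snippet and the helper name `parse_postal_table` are illustrative only and are not part of the original notebook.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Illustrative stand-in for the Wikipedia table (not the real page content).
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

def parse_postal_table(html):
    """Hypothetical helper: return the first table's 3-cell rows as a DataFrame."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.text.strip() for td in tr.find_all('td')]
        if len(cells) == 3:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    # Build the DataFrame once from the collected rows.
    return pd.DataFrame(rows, columns=['Postalcode', 'Borough', 'Neighborhood'])

sample_df = parse_postal_table(SAMPLE_HTML)
```

The resulting frame can then be filtered exactly as in the following cells.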

In [12]:
df=df[df['Borough']!='Not assigned']

In [13]:
# Where a neighborhood is not assigned, use the borough name instead
df.loc[df['Neighborhood']=='Not assigned', 'Neighborhood'] = df['Borough']
df.head()

| | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park / Harbourfront |
| 5 | M6A | North York | Lawrence Manor / Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government |


In [14]:
temp_df=df.groupby('Postalcode')['Neighborhood'].apply(lambda x: "%s" % ', '.join(x))
temp_df=temp_df.reset_index(drop=False)
temp_df.rename(columns={'Neighborhood':'Neighborhood_joined'},inplace=True)    

In [15]:
df_merge = pd.merge(df, temp_df, on='Postalcode')

In [None]:
df_merge.drop(['Neighborhood'],axis=1,inplace=True)

In [16]:
df_merge.drop_duplicates(inplace=True)

In [17]:
df_merge.rename(columns={'Neighborhood_joined':'Neighborhood'},inplace=True)
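
The group, merge, drop, de-duplicate and rename steps above can also be written as a single grouped aggregation. The following is a sketch on a small made-up frame, assuming each postal code belongs to exactly one borough; `combine_neighborhoods` is a hypothetical name, not something defined in the notebook.

```python
import pandas as pd

def combine_neighborhoods(df):
    """Hypothetical helper: one row per postal code, neighborhoods joined by ', '."""
    return (
        df.groupby(['Postalcode', 'Borough'])['Neighborhood']
          .agg(', '.join)      # concatenate all neighborhoods sharing a postal code
          .reset_index()       # turn the group keys back into ordinary columns
    )

# Made-up sample rows for illustration.
sample = pd.DataFrame({
    'Postalcode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})
combined = combine_neighborhoods(sample)
```

Because the result has exactly one `Neighborhood` column, it avoids carrying two columns with the same name into later cells.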

In [18]:
df_merge.head()

| | Postalcode | Borough | Neighborhood | Neighborhood.1 |
|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | Parkwoods |
| 1 | M4A | North York | Victoria Village | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront | Regent Park / Harbourfront |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights | Lawrence Manor / Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government | Queen's Park / Ontario Provincial Government |


In [19]:
df_merge.shape

(103, 4)

In [20]:
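import geocoder  # separate third-party package; not installed by the cells above

def get_geocode(postal_code, max_attempts=10):
    # The geocoding call can return None, so retry a bounded number of times
    # instead of looping forever.
    lat_lng_coords = None
    attempts = 0
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        attempts += 1
    if lat_lng_coords is None:
        return None, None
    latitude, longitude = lat_lng_coords
    return latitude, longitude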

In [21]:
geo_df=pd.read_csv('http://cocl.us/Geospatial_data')

In [22]:
geo_df.rename(columns={'Postal Code':'Postalcode'},inplace=True)
geo_merged = pd.merge(geo_df, df_merge, on='Postalcode')

In [23]:
geo_data=geo_merged[['Postalcode', 'Latitude', 'Longitude', 'Borough', 'Neighborhood']]

In [24]:
toronto_data=geo_data[geo_data['Borough'].str.contains("Toronto")]

In [25]:
CLIENT_ID = 'SYJAJB1RT5OAPCY3PLNEKDOT2UDQHGSBTLXBJPTNBZGOAGUS' 
CLIENT_SECRET = 'WYP4CHP3QS35XFVFXCRBKW52WU4XV343JAQPZC4PO3HTV3YA' 
VERSION = '20180604'
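
API credentials written directly into a notebook are visible to anyone who can read it. One common alternative is to read them from environment variables. The sketch below is an assumption-laden example: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are invented for illustration and are not defined by Foursquare or by this notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Hypothetical helper: read the client id and secret from a mapping.

    The variable names used here are assumptions made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        # Fail with a clear message rather than sending an empty credential.
        raise RuntimeError('Missing environment variable: {}'.format(missing))
```

With both variables set in the shell that launches Jupyter, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the hard-coded assignments.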

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Return a DataFrame of Foursquare venues within `radius` meters of each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # Build the Foursquare "explore" request for this neighborhood's coordinates.
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # Keep only the list of recommended items from the JSON response.
        results = requests.get(url).json()['response']['groups'][0]['items']

        # One tuple per venue: neighborhood info, venue name, coordinates, first category.
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighborhood lists into a single table.
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
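
The function above indexes `categories[0]` directly, which raises an `IndexError` if a venue comes back without a category. A hedged sketch of a more defensive extraction step follows. It assumes only the response shape the notebook already relies on (`response`, then `groups[0]`, then `items`, each holding a `venue`); `extract_venues` and the sample payload are hypothetical.

```python
def extract_venues(payload):
    """Hypothetical helper: pull (name, lat, lng, category) tuples from a response dict."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        venues.append((
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return venues

# Made-up payload shaped like the structure indexed in getNearbyVenues.
sample_payload = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Cafe A',
                           'location': {'lat': 43.65, 'lng': -79.38},
                           'categories': [{'name': 'Coffee Shop'}]}},
                {'venue': {'name': 'Unnamed Spot',
                           'location': {'lat': 43.66, 'lng': -79.39},
                           'categories': []}},
            ]
        }]
    }
}
sample_venues = extract_venues(sample_payload)
```

Keeping the parsing separate from the HTTP request also makes it possible to check the extraction logic without calling the API.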

In [27]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Neighborhood
Neighborhood


In [29]:
toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Neighborhood | 43.676357 | -79.293031 | Glen Manor Ravine | 43.676821 | -79.293942 | Trail |
| 1 | Neighborhood | 43.676357 | -79.293031 | The Big Carrot Natural Food Market | 43.678879 | -79.297734 | Health Food Store |
| 2 | Neighborhood | 43.676357 | -79.293031 | Grover Pub and Grub | 43.679181 | -79.297215 | Pub |
| 3 | Neighborhood | 43.676357 | -79.293031 | Upper Beaches | 43.680563 | -79.292869 | Neighborhood |
| 4 | Neighborhood | 43.679557 | -79.352188 | Pantheon | 43.677621 | -79.351434 | Greek Restaurant |


In [30]:
#Second Dataset
population_canada = pd.read_csv("http://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/File.cfm?T=1201&SR=1&RPP=9999&PR=0&CMA=0&CSD=0&S=22&O=A&Lang=Eng&OFT=CSV", encoding= 'unicode_escape')

In [31]:
population_canada.head()

| | Geographic code | Geographic name | Province or territory | Incompletely enumerated Indian reserves and Indian settlements, 2016 | Population, 2016 | Total private dwellings, 2016 | Private dwellings occupied by usual residents, 2016 |
|---|---|---|---|---|---|---|---|
| 0 | 01 | Canada | | T | 35151728.0 | 15412443.0 | 14072079.0 |
| 1 | A0A | A0A | Newfoundland and Labrador | | 46587.0 | 26155.0 | 19426.0 |
| 2 | A0B | A0B | Newfoundland and Labrador | | 19792.0 | 13658.0 | 8792.0 |
| 3 | A0C | A0C | Newfoundland and Labrador | | 12587.0 | 8010.0 | 5606.0 |
| 4 | A0E | A0E | Newfoundland and Labrador | | 22294.0 | 12293.0 | 9603.0 |


In [32]:
population_canada = population_canada.drop(0, axis = 0)
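
To connect the population table to the Toronto neighborhoods, one option is to keep only the rows whose geographic code appears among the Toronto postal codes and then join on that code. The sketch below uses small made-up frames with the same column names as `population_canada` and `toronto_data`; the population figures are placeholders, and `population_for_postcodes` is a hypothetical name.

```python
import pandas as pd

def population_for_postcodes(population, postcodes):
    """Hypothetical helper: population rows restricted to the given postal codes."""
    subset = population[population['Geographic code'].isin(postcodes)]
    return (
        subset[['Geographic code', 'Population, 2016']]
        .rename(columns={'Geographic code': 'Postalcode',
                         'Population, 2016': 'Population'})
        .reset_index(drop=True)
    )

# Placeholder data for illustration only (not real census figures).
sample_population = pd.DataFrame({
    'Geographic code': ['A0A', 'M4E', 'M5A'],
    'Population, 2016': [100.0, 200.0, 300.0],
})
sample_toronto = pd.DataFrame({
    'Postalcode': ['M4E', 'M5A'],
    'Neighborhood': ['X', 'Y'],
})

# A left join keeps every Toronto postal code, even one with no population row.
toronto_population = sample_toronto.merge(
    population_for_postcodes(sample_population, sample_toronto['Postalcode']),
    on='Postalcode',
    how='left',
)
```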