## Capstone Project

**Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem**

Due to the COVID-19 pandemic, many people have been moving between cities to avoid virus hotspots. New York is one state that has seen a major spike in COVID-19 cases since the pandemic began. To escape the spread, one popular plan is to move to Toronto. Although it is in another country, Canada, it is closer to New York than some U.S. states are.

The only issue is that we must assess which amenities we would lose and gain by moving to Toronto. New York is known for its wide variety of stores, restaurants, and supermarkets, so how does Toronto compare? This project compares the frequency and distribution of stores in New York and Toronto, so that we have a clear understanding of what we gain and lose by moving.

**Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.**

We will use location datasets for both New York and Toronto. The sources are:  
New York: https://geo.nyu.edu/catalog/nyu_2451_34572  
Toronto: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M  
We will then pass the coordinates from these datasets to the Foursquare API to retrieve nearby venues. The venues for each city will be marked and analyzed side by side, giving us a better understanding of how the two locations compare.
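As a rough illustration of the side-by-side analysis described above, the sketch below computes the share of each venue category within each city. The `venues` table and its rows are made-up placeholders, not Foursquare results; in the project these rows would come from the API responses, and the column names `City` and `Category` are assumptions.

```python
import pandas as pd

# Hypothetical venue rows standing in for Foursquare results (not real data)
venues = pd.DataFrame({
    'City': ['New York', 'New York', 'New York', 'New York',
             'Toronto', 'Toronto', 'Toronto', 'Toronto'],
    'Category': ['Coffee Shop', 'Coffee Shop', 'Pizza Place', 'Supermarket',
                 'Coffee Shop', 'Park', 'Park', 'Supermarket'],
})

# Share of each category within each city (every column sums to 1)
category_share = pd.crosstab(venues['Category'], venues['City'],
                             normalize='columns')
print(category_share)
```

Normalizing within each city keeps the comparison fair when one city returns more venues than the other.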

In [1]:
import pandas as pd
import numpy as np
import json
from bs4 import BeautifulSoup
import requests

New York Neighborhoods

In [2]:
with open('ny_neighborhood.json') as json_data:
    newyork_data = json.load(json_data)
    
nyneighborhoods = newyork_data['features']

# define the dataframe with five columns: City, Borough, Neighborhood, Latitude, Longitude
column_names = ['City', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in nyneighborhoods:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'City': 'New York',
                 'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

ny_neighborhoods = pd.DataFrame(rows, columns=column_names)
ny_neighborhoods.head(10)

|   | City | Borough | Neighborhood | Latitude | Longitude |
|---|------|---------|--------------|----------|-----------|
| 0 | New York | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | New York | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | New York | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | New York | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | New York | Bronx | Riverdale | 40.890834 | -73.912585 |
| 5 | New York | Bronx | Kingsbridge | 40.881687 | -73.902818 |
| 6 | New York | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 7 | New York | Bronx | Woodlawn | 40.898273 | -73.867315 |
| 8 | New York | Bronx | Norwood | 40.877224 | -73.879391 |
| 9 | New York | Bronx | Williamsbridge | 40.881039 | -73.857446 |
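
The loop above reads latitude from index 1 and longitude from index 0 because GeoJSON points store coordinates as [longitude, latitude]. A small stand-alone sketch shows the extraction on a single feature. The `feature_text` string is hand-written in the assumed shape of the dataset, using the values from the first output row; it is not an excerpt of the file.

```python
import json

# Hand-written GeoJSON feature in the assumed shape of the dataset (illustrative only)
feature_text = '''
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]},
  "properties": {"name": "Wakefield", "borough": "Bronx"}
}
'''
feature = json.loads(feature_text)

# GeoJSON order is [longitude, latitude], so unpack in that order
lon, lat = feature['geometry']['coordinates']

row = {'City': 'New York',
       'Borough': feature['properties']['borough'],
       'Neighborhood': feature['properties']['name'],
       'Latitude': lat,
       'Longitude': lon}
print(row)
```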


In [3]:
List_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(List_url).text
# the Wikipedia page is HTML, so use an HTML parser rather than the XML one
soup = BeautifulSoup(source, 'html.parser')
table=soup.find('table')

Toronto Boroughs/Neighborhoods

In [4]:
column_names = ['Postalcode','Borough','Neighborhood']
df = pd.DataFrame(columns = column_names)
for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data 
df.head(10)

|   | Postalcode | Borough | Neighborhood |
|---|------------|---------|--------------|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned | Not assigned |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |
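
Several of the rows above have no borough assigned. One possible next step is to drop those rows before further analysis. This is an assumption about how the cleaning might proceed, not a step shown in the notebook, and the `toronto` name is hypothetical. The sketch uses a hand-copied subset of the output.

```python
import pandas as pd

# Subset of the scraped rows, copied from the output above
df = pd.DataFrame(
    [['M1A', 'Not assigned', 'Not assigned'],
     ['M3A', 'North York', 'Parkwoods'],
     ['M4A', 'North York', 'Victoria Village'],
     ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront'],
     ['M8A', 'Not assigned', 'Not assigned']],
    columns=['Postalcode', 'Borough', 'Neighborhood'])

# Keep only postal codes that belong to a borough, then renumber the index
toronto = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
print(toronto)
```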
