Name: Christian Pompa

# Capstone Project - Battle of the Neighborhoods
### Applied Data Science Capstone - IBM Coursera

### Table of Contents

* [1. Introduction: Business Problem](#introduction)
* [2. Data Description](#data)
* [3. Methodology](#methodology)
* [4. Analysis](#analysis)
* [5. Results and Discussion](#results)
* [6. Conclusion](#conclusion)

<a id="introduction"></a>

### 1. Introduction: Business Problem - Opening an Argentinean Restaurant or Cafe

This project will focus around two major cities. Cities known for diversity. With diversity, brings diverse restaurants, cafes and more, which bring people together. If you're a stakeholder, entrepreneur, or chef interested in opening an Argentinean restaurant or cafe in New York city or Toronto, this will be for you. 

Restaurant industry is huge in both of these cities, so we will try to identify the best location. To find the best location, we will use a number of data points such as: popular visiting locations, population percentage per borough, amount of current vertical competition, and no Argentinian restaurants in the area. 

Using efficient data science tools, we will generate two promising locations for both major cities. Each location will have the benefits explained and can be decided by the stakeholder.


<a id="data"></a>

### 2. Data Description

The expected data to be applied for our decision are: 
* Top five venues visited per neighborhood(borough)
* Population percentage per borough
* Amount of competition (Restaurant & Cafes) per neighborhood
* Current location and foursquare information for Argentinean restaurant

We will attempt to separate neighborhoods using folium maps and/or by radius. 

The following data sources will be utilized to extract and generate the required information. The list below represents the order in which the data will be generated. 

* New York City Census Information. CSV.
* Canada Census 2016: Population & Density information
* Wiki: Toronto Postal Codes
* Foursquare API: Retrieve results for amount of restaurants and cafes, check in information,  popularity to provide competition insight.
* Google Maps API Geocode: Product location information, visualize locations for comparison, provide results.

In [2]:
########################################
#### Libraries used in this section ####
########################################

import numpy as np
import pandas as pd # library for data analsysis
import requests
from bs4 import BeautifulSoup
import config
import json # library to handle JSON files
from pandas.io.json import json_normalize # tranforming json file into a pandas dataframe library
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import folium # plotting library
from sklearn.cluster import KMeans # import k-means from clustering stage
import matplotlib
import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors # Matplotlib and associated plotting modules

#### 2a. New York City Census Information

New York City Census Information will be used to gather the population data for each county located in New York City. County population data will help determine which area we select for a future restaurant location. 

In [5]:
# Import: NYC Census Data CSV
links={'pop':'C:/Users/pompa/gitProjects/Coursera_Capstone/Coursera_Capstone/QuickFactsFeb2020.csv'}
# Create: DataFrame with Census Data
df_POP = pd.read_csv(links["pop"]) 

In [6]:
# Verify Data: Identify what needs to be removed.
df_POP.head()

Unnamed: 0,Fact,Fact Note,"New York city, New York","Value Note for New York city, New York","Bronx County (Bronx Borough), New York","Value Note for Bronx County (Bronx Borough), New York","Kings County (Brooklyn Borough), New York","Value Note for Kings County (Brooklyn Borough), New York","New York County (Manhattan Borough), New York","Value Note for New York County (Manhattan Borough), New York","Queens County (Queens Borough), New York","Value Note for Queens County (Queens Borough), New York","Richmond County (Staten Island Borough), New York","Value Note for Richmond County (Staten Island Borough), New York"
0,"Population estimates, July 1, 2019, (V2019)",,,,,,,,,,,,,
1,"Population estimates, July 1, 2018, (V2018)",,8398748.0,,1432132.0,,2582830.0,,1628701.0,,2278906.0,,476179.0,
2,"Population estimates base, April 1, 2010, (V2...",,,,,,,,,,,,,
3,"Population estimates base, April 1, 2010, (V2...",,8174988.0,,1384603.0,,2504717.0,,1586360.0,,2230578.0,,468730.0,
4,"Population, percent change - April 1, 2010 (es...",,,,,,,,,,,,,


In [7]:
# Clean Data: Drop NaN columns not needed.
df_POP.drop("Fact Note", axis = 1, inplace=True)
df_POP.drop("Value Note for New York city, New York", axis = 1, inplace=True)
df_POP.drop("Value Note for Bronx County (Bronx Borough), New York", axis = 1, inplace=True)
df_POP.drop("Value Note for Kings County (Brooklyn Borough), New York", axis = 1, inplace=True)
df_POP.drop("Value Note for New York County (Manhattan Borough), New York", axis = 1, inplace=True)
df_POP.drop("Value Note for Queens County (Queens Borough), New York", axis = 1, inplace=True)
df_POP.drop("Value Note for Richmond County (Staten Island Borough), New York", axis = 1, inplace=True)
df_POP.drop([2, 4], axis = 0, inplace=True)

In [8]:
# Clean Data: Drop extra row
df_POP.drop(0, axis = 0, inplace=True)
df_POP.head()

Unnamed: 0,Fact,"New York city, New York","Bronx County (Bronx Borough), New York","Kings County (Brooklyn Borough), New York","New York County (Manhattan Borough), New York","Queens County (Queens Borough), New York","Richmond County (Staten Island Borough), New York"
1,"Population estimates, July 1, 2018, (V2018)",8398748,1432132,2582830,1628701,2278906,476179
3,"Population estimates base, April 1, 2010, (V2...",8174988,1384603,2504717,1586360,2230578,468730
5,"Population, percent change - April 1, 2010 (es...",2.7%,3.4%,3.1%,2.7%,2.2%,1.6%
6,"Population, Census, April 1, 2010",8175133,1385108,2504700,1585873,2230722,468730
7,"Persons under 5 years, percent",6.5%,7.2%,7.2%,4.7%,6.2%,5.8%


In [9]:
# Clean Data: Rename Columns, drop data.
df_POP.columns = ['2018', 'New York city', 'Bronx County', 'Kings County', 'New York County', 'Queens County', 'Richmond County']
df_POP2 = df_POP
df_POP2 = df_POP2[:1]
df_POP2.drop("2018", axis = 1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


#### New York City population & population per county.

In [10]:
# Second df with only population
df_POP2

Unnamed: 0,New York city,Bronx County,Kings County,New York County,Queens County,Richmond County
1,8398748,1432132,2582830,1628701,2278906,476179


In [11]:
# Filtered: pop by county in NYC.
nyc_Totalpop = df_POP2['New York city'].str.replace(',', '').astype(int)
bronxCounty = df_POP2['Bronx County'].str.replace(',', '').astype(int)
kingsCounty = df_POP2['Kings County'].str.replace(',', '').astype(int)
nyCounty = df_POP2['New York County'].str.replace(',', '').astype(int)
queensCounty = df_POP2['Queens County'].str.replace(',', '').astype(int)
richmondCounty = df_POP2['Richmond County'].str.replace(',', '').astype(int)

In [12]:
# Filtered: Percentage of population per County
bronxCounty_popPercentage = int(bronxCounty) / int(nyc_Totalpop)
kingsCounty_popPercentage = int(kingsCounty) /  int(nyc_Totalpop)
nyCounty_popPercentage = int(nyCounty) / int(nyc_Totalpop)
queensCounty_popPercentage = int(queensCounty) / int(nyc_Totalpop)
richmondCounty_popPercentage = int(richmondCounty) / int(nyc_Totalpop)

In [13]:
# Clean: Initialise data to lists. 
data = [{'Bronx County': bronxCounty_popPercentage, 'Kings County': kingsCounty_popPercentage, 'New York County': nyCounty_popPercentage, 'Queens County': queensCounty_popPercentage, 'Richmond County': richmondCounty_popPercentage}] 

# Creates Percentage of population per County DataFrame. 
df_popPercentage = pd.DataFrame(data)
df_popPercentage

Unnamed: 0,Bronx County,Kings County,New York County,Queens County,Richmond County
0,0.170517,0.307526,0.193922,0.271339,0.056696


#### 2b. Latitude/Longitude for NYC counties.

In [14]:
### Lat Lng of Boroughs in New York City
# Filter: Used 'address'
address = 'Queens, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7498243 -73.7976337


In [15]:
# Neighborhoods in NYC
neighborhood_names = ['Bronx County', 'Kings County', 'New York County', 'Queens County', 'Richmond County'] 
nyc_neighborhoods = pd.DataFrame({'Neighborhood': neighborhood_names,
                             'Latitude': [40.850485, 40.645309, 40.78962, 40.7498, 40.564209],
                             'Longitude': [-73.840403, -73.955023, -73.7976, -73.7976, -74.125304]})

In [16]:
nyc_neighborhoods

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Bronx County,40.850485,-73.840403
1,Kings County,40.645309,-73.955023
2,New York County,40.78962,-73.7976
3,Queens County,40.7498,-73.7976
4,Richmond County,40.564209,-74.125304


In [17]:
###################################################################################

#### 2c. Toronto Census Information

Toronto population retrieved from Wiki. Population by Boroughs and total.

In [18]:
df_colhead = ['SubCities', 'Population']

In [19]:
# Toronto & York population: Import Wiki table information.
# 9-12: 9
weblink = requests.get("https://en.wikipedia.org/wiki/Demographics_of_Toronto")
soup = BeautifulSoup(weblink.content,'lxml')
table = soup.find_all('tbody')[9]

# Select Table with data
#table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

### Data for table
data = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text for tr in td]
    data.append(row)

### Convert imported data to pandas DataFrame
tor_eyork = pd.DataFrame(data)

In [20]:
# Remove unncessary rows
tor_eyork = tor_eyork.loc[:, tor_eyork.columns.intersection([0,1])]
# Clean: Reset Index, drop Null, replace /n, replace ','
tor_eyork = tor_eyork.reset_index(drop=True)
tor_eyork = tor_eyork.dropna(how='all')
tor_eyork = tor_eyork.replace(r'\n','', regex=True)
tor_eyork = tor_eyork.replace(r',','', regex=True)
# Clean: Drop extra columns
tor_eyork.rename(columns={0:'SubCities', 1:'Population'}, inplace=True)

total = sum(tor_eyork['Population'].astype(float))
# Total population column. Make totals row.
tor_eyork.loc['Total'] = pd.Series(total, index = ['Population'])
tor_eyork = tor_eyork.replace(np.nan, '', regex=True)
tor_eyork

Unnamed: 0,SubCities,Population
1,Spadina-Fort York,114315
2,Beaches-East York,108435
3,Davenport,107395
4,Parkdale-High Park,106445
5,Toronto-Danforth,105395
6,Toronto-St. Paul's,104940
7,University-Rosedale,100520
8,Toronto Centre,99590
Total,,847035


In [21]:
# North York population: Import Wiki table information.
# 9-12: 10
weblink = requests.get("https://en.wikipedia.org/wiki/Demographics_of_Toronto")
soup = BeautifulSoup(weblink.content,'lxml')
table = soup.find_all('tbody')[10]

# Select Table with data
#table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

### Data for table
data = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text for tr in td]
    data.append(row)

### Convert imported data to pandas DataFrame
tor_nyork = pd.DataFrame(data)

In [22]:
# Remove unncessary rows
tor_nyork = tor_nyork.loc[:, tor_nyork.columns.intersection([0,1])]
# Clean: Reset Index, drop Null, replace /n, replace ','
tor_nyork = tor_nyork.reset_index(drop=True)
tor_nyork = tor_nyork.dropna(how='all')
tor_nyork = tor_nyork.replace(r'\n','', regex=True)
tor_nyork = tor_nyork.replace(r',','', regex=True)
# Clean: Drop extra columns
tor_nyork.rename(columns={0:'SubCities', 1:'Population'}, inplace=True)

total = sum(tor_nyork['Population'].astype(float))
# Total population column. Make totals row.
tor_nyork.loc['Total'] = pd.Series(total, index = ['Population'])
tor_nyork = tor_nyork.replace(np.nan, '', regex=True)
tor_nyork

Unnamed: 0,SubCities,Population
1,Willowdale,117405
2,Eglinton-Lawrence,112925
3,Don Valley North,109060
4,Humber River-Black Creek,107725
5,York Centre,103760
6,Don Valley West,101790
7,Don Valley East,93170
Total,,745835


In [23]:
# SCARBOROUGH population: Import Wiki table information.
# 9-12: 11
weblink = requests.get("https://en.wikipedia.org/wiki/Demographics_of_Toronto")
soup = BeautifulSoup(weblink.content,'lxml')
table = soup.find_all('tbody')[11]

# Select Table with data
#table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

### Data for table
data = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text for tr in td]
    data.append(row)

### Convert imported data to pandas DataFrame
tor_scarborough = pd.DataFrame(data)

In [24]:
# Remove unncessary rows
tor_scarborough = tor_scarborough.loc[:, tor_scarborough.columns.intersection([0,1])]
# Clean: Reset Index, drop Null, replace /n, replace ','
tor_scarborough = tor_scarborough.reset_index(drop=True)
tor_scarborough = tor_scarborough.dropna(how='all')
tor_scarborough = tor_scarborough.replace(r'\n','', regex=True)
tor_scarborough = tor_scarborough.replace(r',','', regex=True)
# Clean: Drop extra columns
tor_scarborough.rename(columns={0:'SubCities', 1:'Population'}, inplace=True)

total = sum(tor_scarborough['Population'].astype(float))
# Total population column. Make totals row.
tor_scarborough.loc['Total'] = pd.Series(total, index = ['Population'])
tor_scarborough = tor_scarborough.replace(np.nan, '', regex=True)
tor_scarborough

Unnamed: 0,SubCities,Population
1,Scarborough Centre,110450
2,Scarborough Southwest,108295
3,Scarborough-Agincourt,104225
4,Scarborough-Rouge Park,101445
5,Scarborough-Guildwood,101115
6,Scarborough North,97610
Total,,623140


In [25]:
# ETOBICOKE population: Import Wiki table information.
# 9-12: 12
weblink = requests.get("https://en.wikipedia.org/wiki/Demographics_of_Toronto")
soup = BeautifulSoup(weblink.content,'lxml')
table = soup.find_all('tbody')[12]

# Select Table with data
#table = soup.find('table',{'class':'wikitable sortable'})
table_rows = table.find_all('tr')

### Data for table
data = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text for tr in td]
    data.append(row)

### Convert imported data to pandas DataFrame
tor_etobicoke = pd.DataFrame(data)

In [26]:
# Remove unncessary rows
tor_etobicoke = tor_etobicoke.loc[:, tor_etobicoke.columns.intersection([0,1])]
# Clean: Reset Index, drop Null, replace /n, replace ','
tor_etobicoke = tor_etobicoke.reset_index(drop=True)
tor_etobicoke = tor_etobicoke.dropna(how='all')
tor_etobicoke = tor_etobicoke.replace(r'\n','', regex=True)
tor_etobicoke = tor_etobicoke.replace(r',','', regex=True)
# Clean: Drop extra columns
tor_etobicoke.rename(columns={0:'SubCities', 1:'Population'}, inplace=True)

total = sum(tor_etobicoke['Population'].astype(float))
# Total population column. Make totals row.
tor_etobicoke.loc['Total'] = pd.Series(total, index = ['Population'])
tor_etobicoke = tor_etobicoke.replace(np.nan, '', regex=True)
tor_etobicoke

Unnamed: 0,SubCities,Population
1,Etobicoke-Lakeshore,127520
2,Etobicoke North,116960
3,Etobicoke Centre,116055
4,York South-Weston,115130
Total,,475665


In [27]:
df_Toronto_pop = pd.DataFrame({"EastYork":[tor_eyork.loc['Total', 'Population']],
                       "NorthYork":[tor_nyork.loc['Total', 'Population']],
                       "Scarborough":[tor_scarborough.loc['Total', 'Population']],
                        "Etobicoke":[tor_etobicoke.loc['Total', 'Population']]
                       })
index_ = ['Total']
df_Toronto_pop.index = index_
df_Toronto_pop['Total'] = df_Toronto_pop.sum(axis=1)
df_Toronto_pop

Unnamed: 0,EastYork,NorthYork,Scarborough,Etobicoke,Total
Total,847035.0,745835.0,623140.0,475665.0,2691675.0


#### 2d. Search for a Neighborhood in Toronto

In [28]:
# Import Wiki table information.
weblink = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(weblink.content,'lxml')
table = soup.find_all('tbody')[0]

# Select Table with data
table = soup.find('table', attrs={'class':'wikitable sortable'})
table_rows = table.find_all('tr')

### Data for table
data = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text for tr in td]
    data.append(row)
#print(data)
#print(df[0].to_json(orient='records'))

### Convert imported data to pandas DataFrame
df = pd.DataFrame(data, columns=["Postcode", "Borough", "Neighbourhood"])

#### Clean Data

In [29]:
# Remove \n
df = df.replace(r'\n','', regex=True)
df.columns = df.columns.str.strip()
# verify column headers
#df.columns

# verify column headers
df.columns

# Group columns by postal code and borough, removing duplicate postcode value
# and moving the duplicate data in borough to Neighbourhood.
df = df.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()

# Rename headers
df.rename(columns ={'Postcode': 'PostalCode'}, inplace=True)
df.rename(columns ={'Neighbourhood': 'Neighborhood'}, inplace=True)

# Verify neighbourhood has values in all rows.
df.isin(['Not Available']).any().any()

False

#### Generating Lat & Lng by Postal Code.

In [30]:
# Main list to run
postal_code = df['PostalCode'].tolist()

In [31]:
# # Main list
# list1 = postal_code
# # Getting length of list 
# api_key = config.api_key
# i = 0
# # Iterating using while loop 
# while i < len(list1):  
#     element = list1[i]
#     geoCodeUrl = "https://maps.googleapis.com/maps/api/geocode/json?components=postal_code:{}|country:CA&key={}".format(
#         element,
#         api_key) 
    
#     # make the GET request
#     lookup = requests.get(geoCodeUrl)
#     data = lookup.json()  
    
#     # If no results are found for postal code, skip and move on.
#     if (data['status'] == 'ZERO_RESULTS'):
#         i += 1
#         continue
#     # Instantiate an empty dict
#     latlng = {}
#     # latlng information
#     latlng['PostalCode'] = data['results'][0]['address_components'][0]['long_name']
#     latlng['Longname'] = data['results'][0]['address_components'][1]['long_name']
#     latlng['Latitude'] = data['results'][0]['geometry']['location']['lat']
#     latlng['Longitude'] = data['results'][0]['geometry']['location']['lng']

#     with open('data.json', 'r') as j:
#         json_data = json.load(j)
#     # convert data to list if not
#         if type(json_data) is dict:
#             json_data = [json_data]

#     # use append() to add to list
#     json_data.append(latlng)    

#     #write list to file
#     with open('data.json', 'w') as outfile:
#         json.dump(json_data, outfile)
#     # Normalize data. Flatten JSON.
#     data_normalized = pd.json_normalize(json_data) # flatten JSON
#     i += 1 
       
#     if i == len(list1):
#         break

# print(data_normalized)

In [35]:
# Open new json file created above.
with open('data.json', 'r') as j:
    json_data = json.load(j)

In [36]:
# Rename longname to borough on new generated dataframe
data_df = pd.DataFrame(json_data)
data_df.rename(columns = {'Longname':'Borough'}, inplace=True)


In [37]:
# Clean data: Remove dups, NaN
data_df = data_df.drop_duplicates(subset='Latitude', keep='first')
data_df = data_df.drop(['venues'], axis=1)
data_df = data_df.dropna(how='all')

In [38]:
data_df.head()

Unnamed: 0,PostalCode,Borough,Latitude,Longitude
0,M1J,Scarborough,43.744734,-79.239476
1,M1B,Scarborough,43.806686,-79.194353
2,M1C,Scarborough,43.784535,-79.160497
3,M1E,Scarborough,43.763573,-79.188711
4,M1G,Scarborough,43.770992,-79.216917


In [39]:
####################################################################
#### Combine Dataframes: Postal Code, Neighborhood, with Latlng ####
####################################################################

In [40]:
# Merge df and df_data
# Postal code with 
df_merge_col = pd.merge(df, data_df, on='PostalCode')

# Clean df_merge_col
df_merge_col.rename(columns = {'Borough_x':'Borough'}, inplace=True)
df_merge_col.rename(columns = {'Neighbourhood':'Neighborhood'}, inplace=True)
df_merge_col = df_merge_col.drop(['Borough_y'], axis=1)
df_merge_col.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


<a id="methodology"></a>

### 3. Methodology

The goal is to provide two possible locations to open an Argentinian restaurant. The two locations will revolve around New York City, New York and Toronto, Canada.  

In the first step, we collected required data from U.S. Census and Canadian Census to retrieve population metrics in both cities. Latitude and Longitude information will be gathered for neighborhoods in both cities and will be used to search locations along with the postal code if available. 

Second, we will location Argentinian restaurants or restaurants similar to Argentinian restaurants. similar revolves around Latin American/South American categorized restaurants. One located, we will map to show the locations and distance between each restaurant found. 

Third, we will combine all data points found, per city, and map them. Using visualization, we can determine the most promising areas to consider a restaurant. Using K-Means clustering, we will present a a map with the following locations found.  We will recommend locations with low Argentinian styled restaurants, high population percentage, and moderate to high Foursquare check-in locations for restaurants as Foursquare will provide active users in those locations. 

<a id="analysis"></a>

### 4. Analysis

Analysis will combine the census data acquired in section #2, location data with Four Square search and explore queries, and top visited location information via foursquare. Solutions include: Argentinian restaurants per county in NYC and Toronto, competition in each area, hottest spots per city neighborhoods, and population percentage per county.

* [4b. Visualize: All NYC Argentinian restaurants](#4bvisualize)
* [4b. Visualize: NYC Argentinian restaurants by City](#4bvisualizecity)
* [4d. Visualize: All Toronto Argentinian restaurants](#4dvisualize)
* [4d. Visualize: Toronto Argentinian restaurants by City](#4dvisualizecity)
* [4e. NYC Top 10 most common visited locations](#nyctop10)
    * [4e. Result summary](#4eresults)
* [4f. Toronto Top 10 most common visited locations](#torontotop10)
    * [4f. Result summary](#4fresults)


Using the location data acquired in section #2, lets use the Latitude, Longitude, and County name to <b>locate the number of Argentinian and empanada restaurants in NYC County.</b>  Afterwards, we will do the same for Toronto. 

Lastly, we will use Foursquare category data to gather the top 10 locations per borough/neighborhood.

In [41]:
# Fourquare Information. 
VERSION = '20180604'
LIMIT = 30
radius = 6000
# Counties and Queeries being searched. 
## a / x = Counties ; b / y = queries
counties = ['Bronx', 'Kings', 'Brooklyn', 'Manhattan', 'Queens', 'Richmond']
queries = ['chimichurri', 'empanadas', 'Argentinian']
CLIENT_ID = config.CLIENT_ID
CLIENT_SECRET = config.CLIENT_SECRET

#### 4a. Search all counties in NYC. Search each queries in all counties. Generate an empty dataframe to import data into. Create nested df with all queries. In the end, we combine and clean up data. 

In [42]:
# Create an empty dataFrame to fill in next step. All counties will have their own DF and will
# be nested into one main. 
dataframe = {}
for city in counties:
    dataframe[city] = pd.DataFrame()

In [43]:
# Data: Loop through a number of search queries. We call the Four Square API to search 
# and query specific keywords related to Argentinian restaurants.
# We will use this loop for each county to find data by specific keywords. 
# 'dataframe' is populated on every loop and will provide results of query per location requested.
i=0
for x, y in [(x,y) for x in counties for y in queries]:      
    for i in range(len(x)):
        element2 = x
        element = y
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&near={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, element2, VERSION, element, radius, LIMIT)
        results = requests.get(url).json()

        result = {}                                #list that will store the results of all Json 
        result = results['response']['venues']     # filter specific location in results.json   

        
        with open('databackup.json', 'r') as j:
            json_data = json.load(j)
        # convert data to list if not
            if type(json_data) is dict:
                json_data = [json_data]
        # use append() to add to list
        json_data.append(result)    
        #write list to file
        with open('databackup.json', 'w') as outfile:
            json.dump(json_data, outfile)    



        #dataframe[element2] = pd.DataFrame() 
        dataframe[element2] = dataframe[element2].append(result, ignore_index=True) # add filtered JSON data to empty dataframe.

        i += 1    # add 1 to i and restart loop
            
print('completed')

completed


In [44]:
# Single out nested dataFrames to concat after.
df_bronx = pd.DataFrame(dataframe['Bronx'])
df_kings = pd.DataFrame(dataframe['Kings'])
df_brooklyn = pd.DataFrame(dataframe['Brooklyn'])
df_nycounty = pd.DataFrame(dataframe['Manhattan'])
df_queens = pd.DataFrame(dataframe['Queens']) 

#df_richmond = pd.DataFrame(dataframe['Richmond']) no searches found

In [45]:
counties2 = [df_bronx, df_kings, df_brooklyn, df_nycounty, df_queens]
# Add City to each nested DataFrame in dataframe
i = 0
for i in range(len(counties2)):
    element2 = counties2[i]

    def get_city_type(row):
        try:
            city_list = row['location']      
        except:
            city_list = row['venue.location']        
        if len(city_list) == 0:
            return None
        else:
            return city_list['city']    
    element2['City'] = element2.apply(get_city_type, axis=1)
    i += 1
print('Completed')

Completed


In [46]:
# Remove unwanted neighborhoods from county dataframes
i = 0
for i in range(len(counties2)):
    element2 = counties2[i]
    element2 = element2.drop_duplicates(subset='id', keep='first')
    
    bronxNames = df_bronx[ df_bronx['City'] == 'New York' ].index
    kingsNames = df_kings[ (df_kings['City'] == 'New York') | (df_kings['City'] == 'Ridgewood') ].index
    brooklynNames = df_brooklyn[ (df_brooklyn['City'] == 'New York') | (df_brooklyn['City'] == 'Ridgewood') ].index
    nycountyNames = df_nycounty[ (df_nycounty['City'] == 'Union City') | (df_nycounty['City'] == 'Astoria') | (df_nycounty['City'] == 'Bronx') | (df_nycounty['City'] == 'North Bergen') | (df_nycounty['City'] == 'Brooklyn') | (df_nycounty['City'] == 'West New York') | (df_nycounty['City'] == 'Queens') ].index
    queensNames = df_queens[ (df_queens['City'] == 'Brooklyn') | (df_queens['City'] == 'New York') ].index

    if i == 0:
        #delete extra neighborhoods not in bronx counties
        df_bronx.drop(bronxNames , inplace=True)
        df_bronx['Cluster Labels']='0'
    if i == 1:
        #delete extra neighborhoods not in kings/brooklyn counties
        df_kings.drop(kingsNames , inplace=True)
        df_kings['Cluster Labels']='1'
    if i == 2:
        #delete extra neighborhoods not in brooklyn/kings counties
        df_brooklyn.drop(brooklynNames , inplace=True)
        df_brooklyn['Cluster Labels']='1'
    if i == 3:
        #delete extra neighborhoods not in NYC/Manhattan counties
        df_nycounty.drop(nycountyNames , inplace=True)
        df_nycounty['Cluster Labels']='2'
    if i == 4:
        #delete extra neighborhoods not in Queens counties
        df_queens.drop(queensNames , inplace=True)
        df_queens['Cluster Labels']='3'
        
    i += 1

print('Completed')    

Completed


In [47]:
## Group df_kings & df_brooklyn (same county/neighborhood)
## Delete duplicates

kings_brooklyn = [df_kings, df_brooklyn]
df_kingsBrooklyn = pd.concat(kings_brooklyn, ignore_index=True)

# Remove duplicates from concat df's
df_kingsBrooklyn = df_kingsBrooklyn.drop_duplicates(subset='id', keep='first')

In [48]:
# Group updated dataframes

Group_frames = [df_bronx, df_kingsBrooklyn, df_nycounty, df_queens]
result = pd.concat(Group_frames, ignore_index=True)

2    288
0     55
3     36
1      7
Name: Cluster Labels, dtype: int64

In [49]:
# Clean: data is arranged to provide name, categorie titles, latitude and longitude.
# 'dataframe' is filtered with def to extract information from values.

filtered_columns = ['name', 'categories', 'location', 'City', 'Cluster Labels'] + ['id']
dataframe_filtered = result.loc[:, filtered_columns]
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']      
    except:
        categories_list = row['venue.categories']        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
def get_lat_type(row):
    try:
        lat_list = row['location']      
    except:
        lat_list = row['venue.location']        
    if len(lat_list) == 0:
        return None
    else:
        return lat_list['lat']

def get_lng_type(row):
    try:
        lng_list = row['location']      
    except:
        lng_list = row['venue.location']        
    if len(lng_list) == 0:
        return None
    else:
        return lng_list['lng']

In [50]:
# filter the category for each row

dataframe_filtered['name'] = dataframe_filtered['name'].str.replace(r'\.', '').str.replace(r'\$', '').str.replace(r'\'', '').str.replace(r'\&', 'and').str.replace(r'\?', '')
dataframe_filtered['name'] = dataframe_filtered['name'].str.replace(r'\$', '')
dataframe_filtered['name'] = dataframe_filtered['name'].str.replace(r'\'', '')


dataframe_filtered['name'].str.replace('.', '').astype(object)
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
dataframe_filtered['Latitude'] = dataframe_filtered.apply(get_lat_type, axis=1)
dataframe_filtered['Longitude'] = dataframe_filtered.apply(get_lng_type, axis=1)

dataframe_filtered = dataframe_filtered[['name', 'categories', 'location', 'Latitude', 'Longitude', 'City', 'Cluster Labels', 'id']]
dataframe_filtered = dataframe_filtered.drop_duplicates(subset='id', keep='first')

In [68]:
dataframe_filtered['Cluster Labels'].value_counts()

2    30
0    11
1     5
3     5
Name: Cluster Labels, dtype: int64

In [52]:
dataframe_filtered['categories'].value_counts()

Empanada Restaurant          23
Food Truck                   13
Spanish Restaurant            3
Latin American Restaurant     2
Mexican Restaurant            2
Steakhouse                    2
Breakfast Spot                1
Food Stand                    1
Street Food Gathering         1
Food                          1
Paella Restaurant             1
Pizza Place                   1
Cuban Restaurant              1
Embassy / Consulate           1
Name: categories, dtype: int64

In [53]:
#Removing unwanted category titles: 'Cuban Restaurant', 'Caribbean Restaurant', 'Mexican Restaurant' , 'Embassy / Consulate', 'Street Food Gathering'

indexNames = dataframe_filtered[ (dataframe_filtered['categories'] == 'Embassy / Consulate') | (dataframe_filtered['categories'] == 'Mexican Restaurant') | (dataframe_filtered['categories'] == 'Cuban Restaurant') | (dataframe_filtered['categories'] == 'Caribbean Restaurant') | (dataframe_filtered['categories'] == 'Street Food Gathering') ].index
dataframe_filtered.drop(indexNames, inplace=True)

In [54]:
#dataframe_filtered['name'].value_counts()

The follow queries were searched throughout New York City neighborhoods to identify Argentinian and Argentinian styled restaurants. 
* Argentinian 
* Empanadas 
* Chimichurri

New York City, New York has a local chain named "Empanada Monumental." The total number of empanadas restaurants is 23 and locations are spread out throughout NYC. Also among Argentinian restaurants are a 12 food trucks, 3 Spanish and Latin American restaurants, followed by a steakhouse and more. 

Next, we have visualized the data points found below.

<a id="4bvisualize"></a>

#### 4b. Visualize: All counties combined.

In [55]:
# NYC County Lat/lng
nyclat = 40.7831 
nyclng = -73.9712

In [56]:
venues_map_nyc = folium.Map(location=[nyclat, nyclng], zoom_start=12) # generate map centred around the Conrad Hotel

# add a red circle marker to represent the New York City
folium.features.CircleMarker(
    [nyclat, nyclng],
    radius=10,
    color='red',
    popup='New York City, NY',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map_nyc)

# add all restaurant results as blue circle markers
for lat, lng, label in zip(dataframe_filtered.Latitude, dataframe_filtered.Longitude, dataframe_filtered.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.5
    ).add_to(venues_map_nyc)

# display map
venues_map_nyc

Folium Map of all queried restaurants found. 

<a id="4bvisualizecity"></a>

#### Cluster neighborhoods in NYC. Generate map and visualize clusters. 

In [57]:
# set number of clusters
kclusters = 4
# Lat/Lngfor 
latitude = dataframe_filtered['Latitude'].tolist()
longitude = dataframe_filtered['Longitude'].tolist()

In [58]:
# create map
map_clusters_nyc = folium.Map(location=[nyclat, nyclng], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.seismic(np.linspace(0, 1, len(ys)))
seismic = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dataframe_filtered['Latitude'], dataframe_filtered['Longitude'], dataframe_filtered['name'], dataframe_filtered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=seismic[int(cluster)-1],
        fill=True,
        fill_color=seismic[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters_nyc)
       
map_clusters_nyc

In [69]:
dataframe_filtered['Cluster Labels'].value_counts()

2    30
0    11
1     5
3     5
Name: Cluster Labels, dtype: int64

Folium map: Restaurant data clustered by city. You may view what is in each city cluster below.

In [None]:
####################

### 4c. Toronto restaurant search requests. 

Toronto restaurant search query. We will search specific boroughs in Toronto then map the results.

In [73]:
# Fourquare Information. 
VERSION = '20180604'
LIMIT = 30
radius = 6000
# Counties and Queeries being searched. 
## a / x = Counties ; b / y = queries
counties = ['Central_Toronto','Downtown_Toronto','East_Toronto','East_York','Etobicoke','North_York','Scarborough','West_Toronto','York']
queries = ['chimichurri', 'empanadas', 'Argentinian']
CLIENT_ID = config.CLIENT_ID
CLIENT_SECRET = config.CLIENT_SECRET

In [74]:
toronto_df = {}
for city in counties:
    toronto_df[city] = pd.DataFrame()

In [75]:
# Data: Loop through a number of search queries. We call the Four Square API to search 
# and query specific keywords related to Argentinian restauraunts.
# We will use this loop for each county to find data by specific keywords. 
# 'dataframe' is populated on every loop and will provide results of query per location requested.\
i=0
for x, y in [(x,y) for x in counties for y in queries]:   
    for i in range(len(x)):
        element = x
        element2 = y
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&near={}&v={}&query={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            element,
            VERSION, 
            element2, 
            radius, 
            LIMIT)

        # make the GET request
        results = requests.get(url).json()
        
        # Error handlers
        try:
            a = results['response']['venues']
            b = results['meta']['code']   
        except KeyError:
            continue
        
        
        if b == 400:
            break
        elif a == []:
            break

        toronto_data = {}                                #list that will store the results of all Json 
        toronto_data = results['response']['venues']     # filter specific location in results.json   

        
        with open('tor_databackup.json', 'r') as j:
            json_data = json.load(j)
        # convert data to list if not
            if type(json_data) is dict:
                json_data = [json_data]
        # use append() to add to list
        json_data.append(toronto_data)    
        #write list to file
        with open('tor_databackup.json', 'w') as outfile:
            json.dump(json_data, outfile)    

        #toronto_df[element] = pd.DataFrame() 
        toronto_df[element] = toronto_df[element].append(toronto_data, ignore_index=True) # add filtered JSON data to empty dataframe.

    i += 1   # add 1 to i and restart loop

print('completed')

completed


In [76]:
#################################################
# Single out nested dataFrames to concat after. #
#################################################

#df_Central_Toronto = pd.DataFrame(toronto_df['Central_Toronto'])
df_Downtown_Toronto = pd.DataFrame(toronto_df['Downtown_Toronto'])
#df_East_Toronto = pd.DataFrame(toronto_df['East_Toronto'])
#df_East_York = pd.DataFrame(toronto_df['East_York'])
df_Etobicoke = pd.DataFrame(toronto_df['Etobicoke']) 
#df_North_York = pd.DataFrame(toronto_df['North_York'])
#df_Scarborough = pd.DataFrame(toronto_df['Scarborough'])
df_West_Toronto = pd.DataFrame(toronto_df['West_Toronto'])
#df_York = pd.DataFrame(toronto_df['York']) 

In [77]:
tcounties = [df_Downtown_Toronto, df_Etobicoke, df_West_Toronto]

In [78]:
##################################################
# Add City to each nested DataFrame in dataframe #
##################################################

i = 0
for i in range(len(tcounties)):
    element2 = tcounties[i]

    def get_city_type(row):
        try:
            city_list = row['location']      
        except:
            city_list = row['venue.location']        
        if len(city_list) == 0:
            return None
        else:
            return city_list['city']    
    element2['City'] = element2.apply(get_city_type, axis=1)
    i += 1
print('Completed')

Completed


In [79]:
# add cluster numbers to tcounties
i = 0
for i in range(len(tcounties)):
    if i == 0:
        #add cluster number
        df_Downtown_Toronto['Cluster Labels']='0'
    if i == 1:
        #add cluster number 
        df_Etobicoke['Cluster Labels']='1'
    if i == 2:
        #add cluster number 
        df_West_Toronto['Cluster Labels']='2'
    i += 1

print('Completed')   

Completed


In [80]:
###############################################
# Drop duplicates for each boroughs dataframe #
###############################################

df_Downtown_Toronto = df_Downtown_Toronto.drop_duplicates(subset='id', keep='first', inplace=False)
df_Etobicoke = df_Etobicoke.drop_duplicates(subset='id', keep='first', inplace=False)
df_West_Toronto = df_West_Toronto.drop_duplicates(subset='id', keep='first', inplace=False)

In [81]:
## Concat cleaned dataframes 
## Delete duplicates after

df_toronto_all = pd.concat(tcounties, ignore_index=True)

# Remove duplicates from concat df's
df_toronto_all = df_toronto_all.drop_duplicates(subset='id', keep='first')

In [82]:
# Clean: data is arranged to provide name, categorie titles, latitude and longitude.
# 'dataframe' is filtered with def to extract information from values.

filtered_columns = ['name', 'categories', 'location', 'City', 'Cluster Labels' ] + ['id']
dataframe_filtered = df_toronto_all.loc[:, filtered_columns]
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']      
    except:
        categories_list = row['venue.categories']        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
def get_lat_type(row):
    try:
        lat_list = row['location']      
    except:
        lat_list = row['venue.location']        
    if len(lat_list) == 0:
        return None
    else:
        return lat_list['lat']

def get_lng_type(row):
    try:
        lng_list = row['location']      
    except:
        lng_list = row['venue.location']        
    if len(lng_list) == 0:
        return None
    else:
        return lng_list['lng']

In [83]:
# Apply various filters to clean dataframe. Cleaning revolves around folium standards.

dataframe_filtered['name'] = dataframe_filtered['name'].str.replace(r'\.', '').str.replace(r'\$', '').str.replace(r'\'', '').str.replace(r'\&', 'and').str.replace(r'\?', '')
dataframe_filtered['name'] = dataframe_filtered['name'].str.replace(r'\$', '')
dataframe_filtered['name'] = dataframe_filtered['name'].str.replace(r'\'', '')


dataframe_filtered['name'].str.replace('.', '').astype(object)
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
dataframe_filtered['Latitude'] = dataframe_filtered.apply(get_lat_type, axis=1)
dataframe_filtered['Longitude'] = dataframe_filtered.apply(get_lng_type, axis=1)

dataframe_filtered = dataframe_filtered[['name', 'categories', 'location', 'Latitude', 'Longitude', 'City', 'id', 'Cluster Labels']]
dataframe_filtered = dataframe_filtered.drop_duplicates(subset='id', keep='first')

In [84]:
### Lat Lng of Boroughs in Toronto
# Filter: Used 'address'
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.653963 -79.387207


In [85]:
#Toronto lat,lng
############################### 
torlat = 43.65396 
torlng = -79.38720

<a id="4dvisualize"></a>

#### 4d. Visualize data - Toronto

In [86]:
venues_map_tor = folium.Map(location=[torlat, torlng], zoom_start=12) # generate map centred around the Conrad Hotel

# add a red circle marker to represent the New York City
folium.features.CircleMarker(
    [torlat, torlng],
    radius=10,
    color='red',
    popup='Toronto, CA',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map_tor)

# add all restaurant results as blue circle markers
for lat, lng, label in zip(dataframe_filtered.Latitude, dataframe_filtered.Longitude, dataframe_filtered.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.5
    ).add_to(venues_map_tor)

# display map
venues_map_tor

In [87]:
from sklearn.cluster import KMeans # import k-means from clustering stage

In [88]:
# set number of clusters
kclusters = 3


In [89]:
# Lat/Lng list for data before visualizing
latitude = dataframe_filtered['Latitude'].tolist()
longitude = dataframe_filtered['Longitude'].tolist()

In [102]:
dataframe_filtered


Unnamed: 0,name,categories,location,Latitude,Longitude,City,id,Cluster Labels
0,Jumbo Empanadas,Empanada Restaurant,"{'address': '245 Augusta Ave', 'crossStreet': ...",43.654831,-79.402098,Toronto,4ad4c05cf964a520e6f520e3,0
16,Argentinian BBQ Den,,"{'address': '857 Shaw St', 'lat': 43.665842, '...",43.665842,-79.424114,Toronto,4e07720c6284d9ee92d57f48,0
32,Empanadas DelSUR,Latin American Restaurant,"{'address': '639b The Queensway', 'lat': 43.62...",43.627464,-79.497858,Etobicoke,5025a382e4b064c42bc70ba7,1
42,Delicious Empanadas and More,Colombian Restaurant,"{'lat': 43.698172, 'lng': -79.451371, 'labeled...",43.698172,-79.451371,Toronto,5bde1d5edb1d81002c7a97fd,2


<a id="4dvisualizecity"></a>

#### Toronto Argentinian restaurants by borough

In [91]:
# create map
map_clusters_tor = folium.Map(location=[torlat, torlng], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.seismic(np.linspace(0, 1, len(ys)))
seismic = [colors.rgb2hex(i) for i in colors_array]
#cmap = matplotlib.cm.get_cmap('Spectral')


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dataframe_filtered['Latitude'], dataframe_filtered['Longitude'], dataframe_filtered['name'], dataframe_filtered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=seismic[int(cluster)-1],
        fill=True,
        fill_color=seismic[int(cluster)-1],
        fill_opacity=0.8).add_to(map_clusters_tor)
       
map_clusters_tor

Folium map: Toronto search data by neighborhood. 

Summary: Not many restaurants were found when searching all Toronto postal codes. This leads me to believe there are not many Argentinian styled restaurants in the Toronto city area. 

<a id="nyctop10"></a>

### 4e. NYC foursquare top check-in points by neighborhood lat/lng values. 
We will use previous work to retrieve the information needed.

In [85]:
neighborhood_latitude = nyc_neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = nyc_neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = nyc_neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bronx County are 40.850485, -73.840403.


In [86]:
# type your answer here
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 6000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
#url # display URL

In [87]:
results = requests.get(url).json()
#results

In [88]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [89]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Residence Inn by Marriott New York The Bronx a...,Hotel,40.849917,-73.842152
1,LA Fitness,Gym / Fitness Center,40.849739,-73.841949
2,Zeppieri & Sons Italian Bakery,Bakery,40.847119,-73.832057
3,Starbucks,Coffee Shop,40.851371,-73.844087
4,iLoveKickboxing,Gym,40.852871,-73.828085


In [90]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [91]:
nyc_neighborhoods_venues = getNearbyVenues(names=nyc_neighborhoods['Neighborhood'],
                                           latitudes=nyc_neighborhoods['Latitude'],
                                           longitudes=nyc_neighborhoods['Longitude']
                                          )

Bronx County
Kings County
New York County
Queens County
Richmond County


In [92]:
print(nyc_neighborhoods_venues.shape)
nyc_neighborhoods_venues.head()

(61, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bronx County,40.850485,-73.840403,Residence Inn by Marriott New York The Bronx a...,40.849917,-73.842152,Hotel
1,Bronx County,40.850485,-73.840403,LA Fitness,40.849739,-73.841949,Gym / Fitness Center
2,Bronx County,40.850485,-73.840403,Starbucks,40.851371,-73.844087,Coffee Shop
3,Bronx County,40.850485,-73.840403,Skyline Bar & Lounge,40.852904,-73.842612,Lounge
4,Bronx County,40.850485,-73.840403,Stop & Shop,40.847089,-73.843463,Supermarket


In [93]:
nyc_neighborhoods_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bronx County,25,25,25,25,25,25
Kings County,18,18,18,18,18,18
New York County,6,6,6,6,6,6
Queens County,4,4,4,4,4,4
Richmond County,8,8,8,8,8,8


In [94]:
# one hot encoding
nyc_onehot = pd.get_dummies(nyc_neighborhoods_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
nyc_onehot['Neighborhood'] = nyc_neighborhoods_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nyc_onehot.columns[-1]] + list(nyc_onehot.columns[:-1])
nyc_onehot = nyc_onehot[fixed_columns]

nyc_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Bagel Shop,Bank,Bar,Buffet,Bus Station,Business Service,Caribbean Restaurant,Chinese Restaurant,...,Recording Studio,Rental Car Location,Restaurant,Sandwich Place,Sports Club,Supermarket,Sushi Restaurant,Thai Restaurant,Theater,Video Game Store
0,Bronx County,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Bronx County,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bronx County,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bronx County,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bronx County,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0


In [95]:
nyc_grouped = nyc_onehot.groupby('Neighborhood').mean().reset_index()
nyc_grouped

Unnamed: 0,Neighborhood,American Restaurant,Bagel Shop,Bank,Bar,Buffet,Bus Station,Business Service,Caribbean Restaurant,Chinese Restaurant,...,Recording Studio,Rental Car Location,Restaurant,Sandwich Place,Sports Club,Supermarket,Sushi Restaurant,Thai Restaurant,Theater,Video Game Store
0,Bronx County,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,...,0.04,0.08,0.0,0.04,0.0,0.04,0.04,0.04,0.0,0.0
1,Kings County,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.166667,0.055556,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556
2,New York County,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Queens County,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,Richmond County,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0


In [96]:
# Acquire top 5 venues by neighborhood
num_top_venues = 5

for hood in nyc_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = nyc_grouped[nyc_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bronx County----
                 venue  freq
0           Donut Shop  0.08
1  Rental Car Location  0.08
2                  Gym  0.08
3           Food Court  0.08
4          Music Venue  0.04


----Kings County----
                       venue  freq
0       Caribbean Restaurant  0.17
1           Video Game Store  0.06
2             Discount Store  0.06
3               Liquor Store  0.06
4  Latin American Restaurant  0.06


----New York County----
                  venue  freq
0          Intersection  0.17
1               Dog Run  0.17
2  Gym / Fitness Center  0.17
3           Bus Station  0.17
4                   Gym  0.17


----Queens County----
                 venue  freq
0          Bus Station  0.25
1     Business Service  0.25
2          Sports Club  0.25
3    Korean Restaurant  0.25
4  American Restaurant  0.00


----Richmond County----
                venue  freq
0  Italian Restaurant  0.25
1      Sandwich Place  0.12
2          Bagel Shop  0.12
3       Grocery Store  0.12
4 

In [97]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Create a dataframe with top 10 most common venues

In [100]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = nyc_grouped['Neighborhood']

for ind in np.arange(nyc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nyc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bronx County,Donut Shop,Food Court,Gym,Rental Car Location,Buffet,Fast Food Restaurant,Grocery Store,Gym / Fitness Center,Hotel,Deli / Bodega
1,Kings County,Caribbean Restaurant,Video Game Store,Miscellaneous Shop,Gym / Fitness Center,Electronics Store,Theater,Latin American Restaurant,Liquor Store,Lounge,Discount Store
2,New York County,Intersection,Bus Station,Gym,Dog Run,Park,Gym / Fitness Center,Bar,Bagel Shop,Grocery Store,Food Court
3,Queens County,Korean Restaurant,Sports Club,Bus Station,Business Service,Video Game Store,Deli / Bodega,Grocery Store,Food Court,Fast Food Restaurant,Electronics Store
4,Richmond County,Italian Restaurant,Grocery Store,Bagel Shop,Sandwich Place,Restaurant,Donut Shop,Park,Coffee Shop,Food Court,Fast Food Restaurant


<a id="4eresults"></a>

The list below represents the county with the most top 10 common venue check-ins for a type of restaurant based off category. This information will help identify the competitiveness in each borough. 

* <b>Bronx County: 5</b> (Buffet, Coffee Shop, Deli, Fast Food, Food Truck)
* <b>King County: 3</b> (Caribbean restaurant, Chinese restaurant, Mexican restaurant)
* <b>New York County: 2</b> (Buffet, Fast Food)
* <b>Queens County: 3</b> (Korean restaurant, Coffee Shop, Fast Food)
* <b>Richmond County: 6</b> (Asian restaurant, Sandwich, Restaurant, Italian restaurant, Mexican restaurant, Coffee shop)

We will compare this data to population and provide the best two possible locations.

<a id="torontotop10"></a>

#### 4f. Toronto foursquare top check-in points by neighborhood lat/lng values. 
We will use previous work to retrieve the information needed.

In [101]:
tor_neighborhood_latitude = df_merge_col.loc[0, 'Latitude'] # neighborhood latitude value
tor_neighborhood_longitude = df_merge_col.loc[0, 'Longitude'] # neighborhood longitude value

tor_neighborhood_name = df_merge_col.loc[0, 'Borough'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(tor_neighborhood_name, 
                                                               tor_neighborhood_latitude, 
                                                               tor_neighborhood_longitude))

Latitude and longitude values of Scarborough are 43.8066863, -79.1943534.


In [102]:
# type your answer here
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 6000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    tor_neighborhood_latitude, 
    tor_neighborhood_longitude, 
    radius, 
    LIMIT)
#url # display URL

In [103]:
results = requests.get(url).json()
#results

In [104]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    

In [105]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Toronto Pan Am Sports Centre,Athletics & Sports,43.790623,-79.193869
1,African Rainforest Pavilion,Zoo Exhibit,43.817725,-79.183433
2,Toronto Zoo,Zoo,43.820582,-79.181551
3,Polar Bear Exhibit,Zoo,43.823372,-79.185145
4,Australasia Pavillion,Zoo Exhibit,43.822563,-79.183286


In [106]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # Error handlers
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
  
        except KeyError:
            continue
        
        
#         if b == 400:
#             break
        if results == []:
        
            continue
        
        # make the GET request
#         results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    return(nearby_venues)

In [107]:
# type your answer here

toronto_venues = getNearbyVenues(names=df_merge_col['Borough'],
                                   latitudes=df_merge_col['Latitude'],
                                   longitudes=df_merge_col['Longitude']
                                  )

Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
Scarborough
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
North York
East York
East York
East Toronto
East York
East York
East York
East Toronto
East Toronto
East Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
North York
Central Toronto
Central Toronto
Central Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
Downtown Toronto
North York
North York
York
York
Downtown Toronto
West Toronto
W

In [108]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Scarborough,43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,Scarborough,43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,Scarborough,43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,Scarborough,43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,Scarborough,43.763573,-79.188711,Marina Spa,43.766,-79.191,Spa


In [109]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Central Toronto,116,116,116,116,116,116
Downtown Toronto,1318,1318,1318,1318,1318,1318
East Toronto,104,104,104,104,104,104
East York,73,73,73,73,73,73
Etobicoke,58,58,58,58,58,58
North York,238,238,238,238,238,238
Scarborough,85,85,85,85,85,85
West Toronto,169,169,169,169,169,169
York,18,18,18,18,18,18


In [110]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [111]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Central Toronto,0.008621,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.008621,0.0,0.0,0.008621,0.0,0.0,0.0,0.0,0.0
1,Downtown Toronto,0.003794,0.0,0.000759,0.000759,0.000759,0.000759,0.001517,0.002276,0.001517,...,0.002276,0.010622,0.001517,0.0,0.005311,0.0,0.006829,0.000759,0.001517,0.000759
2,East Toronto,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.009615,0.0,0.0,0.0
3,East York,0.013699,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0
4,Etobicoke,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0


In [112]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
            venue  freq
0     Coffee Shop  0.07
1  Sandwich Place  0.06
2            Park  0.05
3            Café  0.05
4     Pizza Place  0.04


----Downtown Toronto----
                 venue  freq
0          Coffee Shop  0.10
1                 Café  0.05
2           Restaurant  0.04
3  Japanese Restaurant  0.03
4             Beer Bar  0.02


----East Toronto----
                venue  freq
0    Greek Restaurant  0.09
1         Coffee Shop  0.07
2  Italian Restaurant  0.06
3                Café  0.05
4      Ice Cream Shop  0.04


----East York----
                 venue  freq
0          Coffee Shop  0.05
1                 Park  0.05
2             Pharmacy  0.04
3  Sporting Goods Shop  0.04
4                 Bank  0.04


----Etobicoke----
            venue  freq
0     Pizza Place  0.12
1  Sandwich Place  0.09
2   Grocery Store  0.05
3     Coffee Shop  0.05
4      Beer Store  0.03


----North York----
                    venue  freq
0             Coffee Shop  0.

In [113]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [114]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
tor_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
tor_neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    tor_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

tor_neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Sandwich Place,Park,Café,Pizza Place,Restaurant,Gym,Sushi Restaurant,Clothing Store,Dessert Shop
1,Downtown Toronto,Coffee Shop,Café,Restaurant,Japanese Restaurant,Hotel,Bakery,Italian Restaurant,Bar,Park,Seafood Restaurant
2,East Toronto,Greek Restaurant,Coffee Shop,Italian Restaurant,Café,Ice Cream Shop,Brewery,Bakery,American Restaurant,Pub,Pet Store
3,East York,Park,Coffee Shop,Pizza Place,Bank,Pharmacy,Sporting Goods Shop,Burger Joint,Grocery Store,Furniture / Home Store,Restaurant
4,Etobicoke,Pizza Place,Sandwich Place,Coffee Shop,Grocery Store,Gym,Home Service,Fast Food Restaurant,Pool,Liquor Store,Convenience Store
5,North York,Coffee Shop,Clothing Store,Pizza Place,Fast Food Restaurant,Restaurant,Japanese Restaurant,Furniture / Home Store,Grocery Store,Sandwich Place,Café
6,Scarborough,Chinese Restaurant,Coffee Shop,Breakfast Spot,Fast Food Restaurant,Bakery,Pizza Place,Intersection,Bus Line,Playground,Skating Rink
7,West Toronto,Bar,Café,Coffee Shop,Restaurant,Italian Restaurant,Bakery,Pizza Place,Breakfast Spot,Diner,Bookstore
8,York,Park,Fast Food Restaurant,Women's Store,Sandwich Place,Brewery,Bus Line,Coffee Shop,Convenience Store,Field,Hockey Arena


<a id="4fresults"></a>

The list below represents the county with the most top 10 common venue check-ins for a type of restaurant based off category. This information will help identify the competitiveness in each borough. 

* <b>Central Toronto: 6</b> (Coffee Shop, Sandwich, Cafe, Pizza Place, Restaurant, Sushi)
* <b>Downtown Toronto: 6</b> (Coffee Shop, Cafe, Restaurant, Japanese restaurant, Italian restaurant, Seafood restaurant)
* <b>East Toronto: 5</b> (Greek, Coffee Shop, Italian, Cafe, American)
* <b>East York 	: 4</b> (Coffee Shop, Pizza, Burger, Sandwich)
* <b>Etobicoke: 5</b> (Pizza, Sandwich, Coffee, Fast Food, Cafe)
* <b>North York: 6</b> (Coffee, Fast Food, Japanese, Restaurant, Sandwich, Cafe)
* <b>Scarborough: 6</b> (Coffee, Chinese, Breakfast, Fast food, Pizza, Middle Eastern)
* <b>West Toronto: 7</b> (Cafe, Coffee, Italian, Restaurant, Pizza, breakfast, Diner)
* <b>York: 2</b> (Fast Food, Caribbean)

We will compare this data to population and provide the best two possible locations.

<a id="results"></a>

### 5. Results & Discussion 

After reviewing the data, it was noticeable that the number of Argentinian styled restaurants in New York City greatly outnumbered the amount in Toronto. Although both cities are known for their diversity, the amount of Argentinian restaurants is very limited.

With the given research, we cannot draw a conclusion on why the numbers are different. We used the same formulas for searching Argentinian restaurants. 

All Toronto postal codes were searched within a 6000m radius with the same search queries used in New York Boroughs. We used a 6000m radius for New York City boroughs as well. 

New York City has a total population of 8,398,748 (2016 Census). A map cluster showing results of all queries by city shows Kings and Queens county have the least amount of Argentinian restaurants while having the highest amount of population.

In [92]:
map_clusters_nyc

In [93]:
df_POP2

Unnamed: 0,New York city,Bronx County,Kings County,New York County,Queens County,Richmond County
1,8398748,1432132,2582830,1628701,2278906,476179


Queens county has 2,278,906 people while having 5 Argentinian styled restaurants. Queens also 3 of the top 10 most common venues relating to food. This result, along with continued population growth, allows us to recommend Queens County as a good county for future Argentinian Restaurants. Other Counties in the area have more than 5 and continue to show positive results. 

* <b>Bronx County: 5</b> (Buffet, Coffee Shop, Deli, Fast Food, Food Truck)
* <b>King County: 3</b> (Caribbean restaurant, Chinese restaurant, Mexican restaurant)
* <b>New York County: 2</b> (Buffet, Fast Food)
* <b>Queens County: 3</b> (Korean restaurant, Coffee Shop, Fast Food)
* <b>Richmond County: 6</b> (Asian restaurant, Sandwich, Restaurant, Italian restaurant, Mexican restaurant, Coffee shop)

Kings county has a population of 2,582,830 people while having 5 Argentinian styled restaurants. Kings county also 3 of the top 10 most common venues relating to food. The restaurant range between Caribbean, Chinese, and Mexican restaurants. This result, along with continued population growth, allows us to recommend Kings County as a good county for future Argentinian Restaurants. Other Counties in the area have more than 5 results with lower population count. Kings and Queens county lead with the most population in New York City. 

Toronto Foursquare search results found different answers in the data in comparison to New York City. Toronto boroughs had more restaurants in the top 10 most common check-in locations compared to NYC. This may indicate the people of Toronto frequently check-in and eat at more restaurants. These restaurants are quiet diverse--ranging from sandwich, pizza, Italian, Japanese, middle eastern and many other cultural restaurants. 

* <b>Central Toronto: 6</b> (Coffee Shop, Sandwich, Cafe, Pizza Place, Restaurant, Sushi)
* <b>Downtown Toronto: 6</b> (Coffee Shop, Cafe, Restaurant, Japanese restaurant, Italian 
restaurant, Seafood restaurant)
* <b>West Toronto: 7</b> (Cafe, Coffee, Italian, Restaurant, Pizza, breakfast, Diner)
* <b>East Toronto: 5</b> (Greek, Coffee Shop, Italian, Cafe, American)
* <b>East York 	: 4</b> (Coffee Shop, Pizza, Burger, Sandwich)<br><br>

* <b>North York: 6</b> (Coffee, Fast Food, Japanese, Restaurant, Sandwich, Cafe)

* <b>Scarborough: 6</b> (Coffee, Chinese, Breakfast, Fast food, Pizza, Middle Eastern)<br><br>

* <b>Etobicoke: 5</b> (Pizza, Sandwich, Coffee, Fast Food, Cafe)
* <b>York: 2</b> (Fast Food, Caribbean)

The city of Toronto has a population of 2,691,675 (Canadian Census). According to Canadian Census, Toronto city & East York have a population of 874,035. North York is close second with 745,835.

In [94]:
df_Toronto_pop

Unnamed: 0,EastYork,NorthYork,Scarborough,Etobicoke,Total
Total,847035.0,745835.0,623140.0,475665.0,2691675.0


In [95]:
map_clusters_tor

Toronto & East York make up a good percentage of Toronto's population but also the amount of check-in's within the top 10 most common venues that happen to be restaurants/cafes. The diversity found in the searches leads me to believe an Argentinian would be accepted in the East York & Toronto city region. Most notably in the East York region as there are no Argentinian style restaurants as shown below. 

North York would be the next best location as there are no Argentinian locations found during our foursquare searches. 

<a id="conclusion"></a>

### 6. Conclusion

The purpose of this project was to identify and recommend two neighborhood cities within two diversely popular counties. The results would aid prospective restaurant entrepreneurs to open an Argentinian styled restaurant/cafe. Using the United States and Canadian Census, we located highly populated and culturally diverse boroughs within two major cities. Google API was used to find the latitude and longitude of the selected boroughs for each city. Foursquare API gave us access to further analysis on by calculating and providing the top 10 most common venues per borough; which provided information on competitors and consumer trends in the area. Foursquare also provided information such as restaurant name, latitude, longitude, and category name of the Argentinian restaurant queries we requested. 

We then clustered, by major county, the results of the queries and provided a visual map representing each result by its respected borough. This map will help aid future investors and entrepreneurs in locating a future location and can be used to find competitor locations to aim for competitor clients or avoid locations all together. 

Finally, we pinpointed two boroughs within New York City and Toronto which aligned with our purpose in providing solid locations for a future Argentinian restaurant. Each location was chosen based off low or no amounts of current Argentinian restaurants in a 6000m radius along with the highest population in selected borough. 

Toronto & East York make up a good percentage of Toronto's population, at 847,035. The amount of check-in's on the top 10 most common venues that happen to be restaurants/cafes also landed on Toronto & East York. The diversity found in the search leads me to believe an Argentinian restaurant would be a competitive location in the East York & Toronto city region. Most notably in the East York region as there are no Argentinian style restaurants as shown below. North York would be the next best location as there are no Argentinian locations found during our foursquare searches. 

Kings county, New York has a population of 2,582,830 while having five Argentinian styled restaurants. Kings county is provided three of the top 10 most common venues relating to food. The variety range between Caribbean, Chinese, and Mexican restaurants. This result, along with continued population growth, allows us to recommend Kings County as a good county for future Argentinian Restaurants. Other Counties in the area have more than 5 results with lower population count. Kings and Queens county lead with the most population in New York City. 