# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# 1. Introduction <a name="introduction"></a>

## 1.1 Background

Coronavirus disease 2019 (COVID-19) has spread across the world and has severely changed how people live and work. Most people now spend more time at home than before. To reduce their risk of infection, more and more people order delivery from restaurants, markets, and similar businesses, which has put delivery workers in high demand.

## 1.2 Problem

Working across boroughs is impractical because of the long distances between them and the time lost to travel. The problem is therefore to find the best borough and neighborhoods to serve as a main working area. This project analyzes New York City (NYC) to identify good locations for delivery work.

## 1.3 Interest

This project should help people who have lost their jobs during the pandemic and are willing to work in delivery, by giving them a simple reference for choosing where to work. It assumes that the delivery worker is healthy and takes proper precautions to protect themselves.

# 2. Data<a name="data"></a>

The data sources are:
* COVID-19 information from https://github.com/nychealth/coronavirus-data/blob/master/data-by-modzcta.csv
* NYC neighborhoods geo dataset from https://geo.nyu.edu/catalog/nyu_2451_34572
* Geographic coordinates of candidate locations will be obtained by geocoding with the **geopy Nominatim geocoder**.
* The number, type, and location of venues will be obtained from the **Foursquare API**.


The data to collect are:
* COVID-19 case information about NYC.
* Geographic data of neighborhoods and boroughs in NYC.
* Venues information in the candidate borough and neighborhoods.

In [1]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install BeautifulSoup4
from bs4 import BeautifulSoup as Soup
import requests # library to handle requests



### Get the latest COVID-19 information of NYC


In [2]:
res = requests.get("https://github.com/nychealth/coronavirus-data/blob/master/data-by-modzcta.csv")
soup = Soup(res.content,'html.parser')
table = soup.find_all('table')[0] 
df_covid = pd.read_html(str(table))[0]
print(df_covid.shape)
df_covid.head()

(178, 11)


Unnamed: 0.1,Unnamed: 0,MODIFIED_ZCTA,NEIGHBORHOOD_NAME,BOROUGH_GROUP,COVID_CASE_COUNT,COVID_CASE_RATE,POP_DENOMINATOR,COVID_DEATH_COUNT,COVID_DEATH_RATE,PERCENT_POSITIVE,TOTAL_COVID_TESTS
0,,10001,Chelsea/NoMad/West Chelsea,Manhattan,412,1748.5,23563.03,24,101.85,8.1,5087
1,,10002,Chinatown/Lower East Side,Manhattan,1205,1569.92,76755.41,160,208.45,11.35,10620
2,,10003,East Village/Gramercy/Greenwich Village,Manhattan,501,931.2,53801.62,34,63.2,6.14,8164
3,,10004,Financial District,Manhattan,36,986.14,3650.61,1,27.39,6.55,550
4,,10005,Financial District,Manhattan,75,893.27,8396.11,2,23.82,5.87,1277
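
Scraping the rendered GitHub page depends on that page's HTML layout. A possibly more robust alternative, sketched here as an assumption rather than what this notebook ran, is to rewrite the `blob` URL into its `raw.githubusercontent.com` form and hand it to `pd.read_csv`. The helper name `github_blob_to_raw` is hypothetical, and the file's path inside the repository may have changed since this notebook was written.

```python
def github_blob_to_raw(blob_url):
    """Convert a github.com '/blob/' file URL into its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError("not a GitHub blob URL: " + blob_url)
    # Split 'owner/repo/blob/branch/path' into the repository part and the rest
    repo_part, rest = blob_url[len(prefix):].split("/blob/", 1)
    return "https://raw.githubusercontent.com/{}/{}".format(repo_part, rest)


raw_url = github_blob_to_raw(
    "https://github.com/nychealth/coronavirus-data/blob/master/data-by-modzcta.csv"
)
print(raw_url)
# With network access, the CSV could then be loaded directly:
#   df_covid = pd.read_csv(raw_url)
```

Reading the raw CSV would also avoid the extra `Unnamed: 0` column that the HTML table introduces.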


Clean up the dataframe to make it easier to read.

In [3]:
df_covid.drop(['Unnamed: 0','MODIFIED_ZCTA'], axis = 1, inplace = True)
df_covid.columns = ['Neighborhood', 'Borough','Case_count','Case_rate','Pop_denominator','Death_count','Death_rate','Percent_positive','Total_test']
df_covid.head()

Unnamed: 0,Neighborhood,Borough,Case_count,Case_rate,Pop_denominator,Death_count,Death_rate,Percent_positive,Total_test
0,Chelsea/NoMad/West Chelsea,Manhattan,412,1748.5,23563.03,24,101.85,8.1,5087
1,Chinatown/Lower East Side,Manhattan,1205,1569.92,76755.41,160,208.45,11.35,10620
2,East Village/Gramercy/Greenwich Village,Manhattan,501,931.2,53801.62,34,63.2,6.14,8164
3,Financial District,Manhattan,36,986.14,3650.61,1,27.39,6.55,550
4,Financial District,Manhattan,75,893.27,8396.11,2,23.82,5.87,1277


### Get the neighborhoods and boroughs geographical information

In [4]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


Load the data

In [5]:
import json # library to handle JSON files

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

Transform the data into a pandas dataframe

In [6]:
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Loop through the data, collect one row per neighborhood, and fill the dataframe.

In [7]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step; DataFrame.append was removed in pandas 2.0
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [8]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
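
The `json_normalize` import is never actually used by the loop above. As a minimal sketch of how it could replace the loop, assuming pandas 1.0 or later (where it is available as `pd.json_normalize`), the example below flattens two hand-written features that mimic the structure of `newyork_data['features']`. The names `sample_features` and `neighborhoods_sketch` are illustrative only.

```python
import pandas as pd

# Two hand-written features that mimic the structure the loop above reads
# (properties.borough, properties.name, geometry.coordinates as [lon, lat]).
sample_features = [
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.847201, 40.894705]}},
    {"properties": {"borough": "Bronx", "name": "Co-op City"},
     "geometry": {"coordinates": [-73.829939, 40.874294]}},
]

# Nested dictionary keys become dotted column names; lists are kept as-is
flat = pd.json_normalize(sample_features)

neighborhoods_sketch = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": flat["geometry.coordinates"].apply(lambda c: c[1]),
    "Longitude": flat["geometry.coordinates"].apply(lambda c: c[0]),
})
print(neighborhoods_sketch)
```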


In [9]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


### Merge the two dataframes and keep only rows with COVID-19 information

In [10]:
df = pd.merge(df_covid, neighborhoods, on=["Borough", "Neighborhood"], how="left")
df.dropna(inplace=True)
print(df.shape)
df.head()

(50, 11)


Unnamed: 0,Neighborhood,Borough,Case_count,Case_rate,Pop_denominator,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude
3,Financial District,Manhattan,36,986.14,3650.61,1,27.39,6.55,550,40.707107,-74.010665
4,Financial District,Manhattan,75,893.27,8396.11,2,23.82,5.87,1277,40.707107,-74.010665
5,Financial District,Manhattan,34,983.29,3457.77,0,0.0,5.87,579,40.707107,-74.010665
9,Chelsea,Manhattan,564,1139.38,49500.52,43,86.87,6.67,8455,40.744035,-74.003116
19,Lincoln Square,Manhattan,603,1000.42,60274.81,53,87.93,6.03,10006,40.773529,-73.985338
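
Only 50 of the 178 COVID-19 rows survive this merge. A likely reason is that many COVID-19 rows carry combined names such as "Chelsea/NoMad/West Chelsea", which do not match any single neighborhood name exactly. One possible refinement, which this notebook does not apply, is to split combined names on "/" before merging. The sketch below uses small toy frames (`covid_toy`, `geo_toy`) built from values shown in the tables above.

```python
import pandas as pd

covid_toy = pd.DataFrame({
    "Neighborhood": ["Chelsea/NoMad/West Chelsea", "Financial District"],
    "Borough": ["Manhattan", "Manhattan"],
    "Case_count": [412, 36],
})
geo_toy = pd.DataFrame({
    "Borough": ["Manhattan", "Manhattan"],
    "Neighborhood": ["Chelsea", "Financial District"],
    "Latitude": [40.744035, 40.707107],
    "Longitude": [-74.003116, -74.010665],
})

# Exact-name merge, as in the cell above: the combined name finds no partner
exact = pd.merge(covid_toy, geo_toy, on=["Borough", "Neighborhood"], how="left").dropna()

# Split combined names on "/" and give each part its own row before merging
exploded = covid_toy.assign(
    Neighborhood=covid_toy["Neighborhood"].str.split("/")
).explode("Neighborhood").reset_index(drop=True)
exploded["Neighborhood"] = exploded["Neighborhood"].str.strip()
split_merge = pd.merge(exploded, geo_toy, on=["Borough", "Neighborhood"], how="left").dropna()

print(len(exact), len(split_merge))
```

Note that after splitting, each part inherits the counts of the whole combined area, so the counts should not be summed across the split rows.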


### Group by borough and order by case count

In [11]:
df_group = df_covid.groupby(['Borough']).sum()
df_group.sort_values(by=['Case_count'])

Unnamed: 0_level_0,Case_count,Case_rate,Pop_denominator,Death_count,Death_rate,Percent_positive,Total_test
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Staten Island,14125,35747.23,476179.01,895,2124.51,217.88,78710
Manhattan,26542,68345.47,1611943.49,2476,5993.91,401.83,275231
Bronx,48089,83919.05,1434692.65,3877,6641.04,472.2,251352
Brooklyn,57878,81826.43,2582829.99,5568,8185.72,522.18,401978
Queens,64999,162064.71,2288709.82,5914,14085.55,1107.89,369567
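
In the table above, the count columns are meaningful totals, but `Case_rate`, `Death_rate`, and `Percent_positive` are sums of per-area rates and are not themselves rates. Assuming the rate columns are expressed per 100,000 residents (which is consistent with the first COVID-19 row: 412 / 23563.03 × 100000 ≈ 1748.5), a borough-level rate can be recomputed from the summed counts and population. This is a sketch on three rows copied from the COVID-19 table, not the notebook's actual aggregation.

```python
import pandas as pd

# Three Manhattan rows copied from the COVID-19 table earlier in the notebook
toy = pd.DataFrame({
    "Borough": ["Manhattan", "Manhattan", "Manhattan"],
    "Case_count": [412, 1205, 501],
    "Pop_denominator": [23563.03, 76755.41, 53801.62],
    "Death_count": [24, 160, 34],
})

# Sum only the additive columns, then derive rates per 100,000 residents
totals = toy.groupby("Borough")[["Case_count", "Pop_denominator", "Death_count"]].sum()
totals["Case_rate"] = totals["Case_count"] / totals["Pop_denominator"] * 100000
totals["Death_rate"] = totals["Death_count"] / totals["Pop_denominator"] * 100000
print(totals.round(2))
```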


In [12]:
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values



Use the geopy library to get the latitude and longitude values of New York City.

In [13]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


### Create a map of New York showing the neighborhoods that have COVID-19 information

In [14]:
!pip install folium
import folium # map rendering library
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork



#### According to the information above, Queens has the highest case and death counts of the five boroughs. This situation may increase public anxiety and make residents more inclined to stay at home, which in turn creates more delivery work. The analysis therefore focuses on Queens.

### Explore neighborhoods in Queens

Create a new dataframe of the Queens data.

In [15]:
queens_data = df[df['Borough'] == 'Queens'].reset_index(drop=True)
queens_data.head()

Unnamed: 0,Neighborhood,Borough,Case_count,Case_rate,Pop_denominator,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude
0,Sunnyside,Queens,522,2106.9,24775.7,42,169.52,11.64,4484,40.740176,-73.926916
1,Long Island City,Queens,58,1143.84,5070.64,2,39.44,5.89,985,40.750217,-73.939202
2,College Point,Queens,602,2581.17,23322.74,42,180.08,17.31,3478,40.784903,-73.843045
3,Whitestone,Queens,871,2178.83,39975.56,104,260.16,14.41,6046,40.781291,-73.814202
4,Jackson Heights,Queens,2687,4294.88,62562.88,254,405.99,19.75,13604,40.751981,-73.882821


Get the geographical coordinates of Queens.

In [16]:
address = 'Queens, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Queens are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Queens are 40.7498243, -73.7976337.


In [17]:
# create map of Queens using latitude and longitude values
map_queens = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(queens_data['Latitude'], queens_data['Longitude'], queens_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_queens)  
    
map_queens

Define Foursquare Credentials and Version

In [18]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (never publish the real value)
VERSION = '20200719' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


#### Create a new dataframe with up to 100 venues within 500 m of each neighborhood in Queens.

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
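
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so it would raise an `IndexError` if a venue ever came back with an empty category list. The sketch below shows one way to tolerate that case. The helper name `extract_venue_rows`, the `"Unknown"` placeholder, and the hand-made `fake_items` are assumptions for illustration; they mirror only the fields the function above reads and are not a real API response.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'explore' items into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of raising IndexError on an empty list
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-made items shaped like the fields getNearbyVenues reads (not a real API response)
fake_items = [
    {"venue": {"name": "Marabella's Pizza",
               "location": {"lat": 40.740174, "lng": -73.923919},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 40.74, "lng": -73.92},
               "categories": []}},
]
rows = extract_venue_rows("Sunnyside", 40.740176, -73.926916, fake_items)
print(rows)
```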

In [20]:
LIMIT = 100
queens_venues = getNearbyVenues(names=queens_data['Neighborhood'],
                                   latitudes=queens_data['Latitude'],
                                   longitudes=queens_data['Longitude']
                                  )

Sunnyside
Long Island City
College Point
Whitestone
Jackson Heights
Elmhurst
Rego Park
Forest Hills
Woodside
Maspeth
Middle Village
Cambria Heights
St. Albans
Kew Gardens
Ozone Park
Ozone Park
Richmond Hill
South Ozone Park
Woodhaven
Rosedale
Bellerose
Queens Village
Queens Village
Breezy Point


In [21]:
print(queens_venues.shape)
queens_venues.head()

(757, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Sunnyside,40.740176,-73.926916,Marabella's Pizza,40.740174,-73.923919,Pizza Place
1,Sunnyside,40.740176,-73.926916,Fish House,40.740322,-73.923142,Seafood Restaurant
2,Sunnyside,40.740176,-73.926916,Nita's European Bakery,40.739681,-73.924769,Bakery
3,Sunnyside,40.740176,-73.926916,Don Pollo II,40.740049,-73.923763,Peruvian Restaurant
4,Sunnyside,40.740176,-73.926916,I Love Paraguay,40.741087,-73.92149,South American Restaurant


Check how many venues were returned for each neighborhood

In [22]:
queens_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bellerose,19,19,19,19,19,19
Breezy Point,4,4,4,4,4,4
Cambria Heights,14,14,14,14,14,14
College Point,44,44,44,44,44,44
Elmhurst,32,32,32,32,32,32
Forest Hills,38,38,38,38,38,38
Jackson Heights,84,84,84,84,84,84
Kew Gardens,47,47,47,47,47,47
Long Island City,70,70,70,70,70,70
Maspeth,33,33,33,33,33,33


### Analyze each neighborhood in Queens

In [23]:
# one hot encoding
queens_onehot = pd.get_dummies(queens_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
queens_onehot['Neighborhood'] = queens_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [queens_onehot.columns[-1]] + list(queens_onehot.columns[:-1])
queens_onehot = queens_onehot[fixed_columns]

queens_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Colombian Restaurant,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Empanada Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Truck,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Health Food Store,Himalayan Restaurant,Home Service,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kosher Restaurant,Latin American Restaurant,Laundromat,Laundry Service,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Mattress Store,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Moving Target,Nail Salon,New American Restaurant,Nightclub,Office,Optical Shop,Park,Pedestrian Plaza,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Post Office,Pub,Restaurant,Romanian Restaurant,Salon / Barbershop,Sandwich Place,School,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Tea Room,Thai Restaurant,Theater,Tibetan Restaurant,Track,Trail,Train Station,Video Game Store,Video Store,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,Sunnyside,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Sunnyside,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Sunnyside,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Sunnyside,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Sunnyside,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Group rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [24]:
queens_grouped = queens_onehot.groupby('Neighborhood').mean().reset_index()
queens_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Colombian Restaurant,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Empanada Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Truck,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Health Food Store,Himalayan Restaurant,Home Service,Hookah Bar,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Kosher Restaurant,Latin American Restaurant,Laundromat,Laundry Service,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Mattress Store,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Moving Target,Nail Salon,New American Restaurant,Nightclub,Office,Optical Shop,Park,Pedestrian Plaza,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Post Office,Pub,Restaurant,Romanian Restaurant,Salon / Barbershop,Sandwich Place,School,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Tea Room,Thai Restaurant,Theater,Tibetan Restaurant,Track,Trail,Train Station,Video Game Store,Video Store,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,Bellerose,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0
1,Breezy Point,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
2,Cambria Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,College Point,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.090909,0.0,0.022727,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.022727,0.0,0.022727,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0
4,Elmhurst,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.21875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0
5,Forest Hills,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078947,0.078947,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.052632,0.0,0.026316,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.052632
6,Jackson Heights,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.059524,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.011905,0.011905,0.0,0.0,0.0,0.0,0.02381,0.011905,0.011905,0.011905,0.0,0.011905,0.011905,0.0,0.011905,0.0,0.011905,0.02381,0.0,0.011905,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.02381,0.0,0.107143,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.02381,0.0,0.0,0.0,0.0,0.059524,0.011905,0.02381,0.0,0.0,0.011905,0.02381,0.02381,0.011905,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0
7,Kew Gardens,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.042553,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.06383,0.0,0.0,0.0,0.0,0.0,0.021277,0.06383,0.0,0.0,0.042553,0.0,0.0,0.021277,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.042553,0.021277,0.042553,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0
8,Long Island City,0.0,0.0,0.0,0.0,0.014286,0.014286,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.014286,0.057143,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.014286,0.014286,0.0,0.014286,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.014286,0.014286,0.0,0.014286,0.114286,0.0,0.014286,0.0,0.0,0.0,0.028571,0.0,0.014286,0.0,0.0,0.028571,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.114286,0.0,0.0,0.014286,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.014286,0.0,0.042857,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.057143,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.028571,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Maspeth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.090909,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.090909,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
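
To make the two steps above concrete, here is a minimal sketch on a six-venue toy table (the neighborhoods "A" and "B" and the frame `venues_toy` are made up for illustration). After one-hot encoding, each venue row has a single 1 in its category column, so the per-neighborhood mean of a column is the fraction of that neighborhood's venues falling in the category.

```python
import pandas as pd

venues_toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Bakery", "Bar", "Beach", "Trail"],
})

# One column per category; dtype=float keeps the 0/1 values numeric across pandas versions
onehot = pd.get_dummies(venues_toy[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "Neighborhood", venues_toy["Neighborhood"])

# The mean of a 0/1 column is the share of the neighborhood's venues in that category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is why the values can be read as relative frequencies.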


Write a function to sort the venue categories in descending order of frequency.

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues for each neighborhood.

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = queens_grouped['Neighborhood']

for ind in np.arange(queens_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(queens_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bellerose,Deli / Bodega,Pizza Place,Italian Restaurant,Pub,Chinese Restaurant,Mobile Phone Shop,Seafood Restaurant,Motel,Gas Station,Donut Shop
1,Breezy Point,Beach,Monument / Landmark,Trail,Yoga Studio,Dumpling Restaurant,Flower Shop,Fish Market,Filipino Restaurant,Fast Food Restaurant,Farmers Market
2,Cambria Heights,Caribbean Restaurant,Restaurant,Cosmetics Shop,Liquor Store,Moving Target,Bakery,Chinese Restaurant,Pharmacy,Nightclub,Gym / Fitness Center
3,College Point,Deli / Bodega,Asian Restaurant,Bar,Pizza Place,Bakery,Chinese Restaurant,Seafood Restaurant,Latin American Restaurant,Caribbean Restaurant,Fried Chicken Joint
4,Elmhurst,Thai Restaurant,Mexican Restaurant,Chinese Restaurant,Vietnamese Restaurant,South American Restaurant,Bakery,Ice Cream Shop,Indonesian Restaurant,Malay Restaurant,Colombian Restaurant
5,Forest Hills,Gym / Fitness Center,Gym,Thai Restaurant,Convenience Store,Park,Pharmacy,Pizza Place,Yoga Studio,Optical Shop,Snack Place
6,Jackson Heights,Latin American Restaurant,Peruvian Restaurant,South American Restaurant,Bakery,Mobile Phone Shop,Mexican Restaurant,Thai Restaurant,Grocery Store,Diner,Supermarket
7,Kew Gardens,Cosmetics Shop,Chinese Restaurant,Bar,Bank,Pizza Place,Pet Store,Donut Shop,Indian Restaurant,Deli / Bodega,Park
8,Long Island City,Hotel,Coffee Shop,Pizza Place,Bar,Mexican Restaurant,Café,Supermarket,Donut Shop,Deli / Bodega,Market
9,Maspeth,Diner,Pizza Place,Bank,Grocery Store,Mobile Phone Shop,Chinese Restaurant,Bakery,Donut Shop,Lounge,Flower Shop
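
The ranking and column-naming step above can also be illustrated without pandas. This is a minimal sketch, not the notebook's code: the `ordinal` and `top_venues` helpers and the toy frequencies are hypothetical. The notebook's suffix list is sufficient for the 10 ranks used here; the `ordinal` helper additionally handles cases such as 11th and 21st.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_venues(frequencies, num_top_venues):
    """Return category names sorted by descending frequency, cut to num_top_venues."""
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    return [category for category, _ in ranked[:num_top_venues]]


# Toy mean frequencies for one neighborhood (made-up numbers, for illustration only)
toy_frequencies = {'Pizza Place': 0.20, 'Bakery': 0.10, 'Deli / Bodega': 0.30, 'Bank': 0.05}

# Build one output row: column name -> venue category
row = {'{} Most Common Venue'.format(ordinal(rank)): category
       for rank, category in enumerate(top_venues(toy_frequencies, 3), start=1)}
print(row)
```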


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [27]:
# set number of clusters
kclusters = 5

queens_grouped_clustering = queens_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(queens_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 3, 1, 4, 1, 1, 1, 1, 1], dtype=int32)
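
To show what the clustering step does to the venue-frequency vectors, here is a minimal sketch of Lloyd's algorithm, the iteration behind k-means. It is not scikit-learn's implementation: the `kmeans` and `squared_distance` helpers and the toy vectors are hypothetical, and the centroids are seeded with the first `k` points for simplicity, whereas scikit-learn's `KMeans` defaults to k-means++ initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Simplification (assumption): seed the centroids with the first k points
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy frequency vectors: [restaurants, shops, outdoor venues] (made-up numbers)
toy_points = [
    [0.8, 0.2, 0.0],
    [0.1, 0.1, 0.8],
    [0.7, 0.3, 0.0],
    [0.0, 0.2, 0.8],
    [0.9, 0.1, 0.0],
]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

The restaurant-heavy rows end up in one cluster and the outdoor-heavy rows in the other, which mirrors how a neighborhood dominated by beaches and trails can separate from restaurant-dominated neighborhoods.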

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [28]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

queens_merged = queens_data

# join neighborhoods_venues_sorted onto queens_data to add the cluster label and top venues for each neighborhood
queens_merged = queens_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

queens_merged.head()

Unnamed: 0,Neighborhood,Borough,Case_count,Case_rate,Pop_denominator,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Sunnyside,Queens,522,2106.9,24775.7,42,169.52,11.64,4484,40.740176,-73.926916,1,Pizza Place,Chinese Restaurant,Bakery,Discount Store,Coffee Shop,Deli / Bodega,South American Restaurant,Italian Restaurant,Mexican Restaurant,Diner
1,Long Island City,Queens,58,1143.84,5070.64,2,39.44,5.89,985,40.750217,-73.939202,1,Hotel,Coffee Shop,Pizza Place,Bar,Mexican Restaurant,Café,Supermarket,Donut Shop,Deli / Bodega,Market
2,College Point,Queens,602,2581.17,23322.74,42,180.08,17.31,3478,40.784903,-73.843045,1,Deli / Bodega,Asian Restaurant,Bar,Pizza Place,Bakery,Chinese Restaurant,Seafood Restaurant,Latin American Restaurant,Caribbean Restaurant,Fried Chicken Joint
3,Whitestone,Queens,871,2178.83,39975.56,104,260.16,14.41,6046,40.781291,-73.814202,0,Dance Studio,Deli / Bodega,Bubble Tea Shop,Candy Store,Yoga Studio,Farmers Market,Food,Flower Shop,Fish Market,Filipino Restaurant
4,Jackson Heights,Queens,2687,4294.88,62562.88,254,405.99,19.75,13604,40.751981,-73.882821,1,Latin American Restaurant,Peruvian Restaurant,South American Restaurant,Bakery,Mobile Phone Shop,Mexican Restaurant,Thai Restaurant,Grocery Store,Diner,Supermarket
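
`DataFrame.join` performs a left join by default, so every row of `queens_data` is kept, and a neighborhood with no venue results would receive missing values in `Cluster Labels` and the venue columns. The sketch below illustrates that behavior with plain dictionaries; the `left_join` helper and the `Hypothetical Park` row are assumptions made up for the example, not part of the notebook.

```python
def left_join(left_rows, right_by_key, key, right_columns):
    """Keep every left row; fill right-hand columns with None when the key has no match."""
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key], {})
        merged_row = dict(row)
        for column in right_columns:
            merged_row[column] = match.get(column)
        joined.append(merged_row)
    return joined


covid_rows = [
    {'Neighborhood': 'Sunnyside', 'Case_count': 522},
    {'Neighborhood': 'Hypothetical Park', 'Case_count': 0},  # made-up row with no venue data
]
venue_rows = {'Sunnyside': {'Cluster Labels': 1, '1st Most Common Venue': 'Pizza Place'}}

merged = left_join(covid_rows, venue_rows, 'Neighborhood',
                   ['Cluster Labels', '1st Most Common Venue'])

# Rows without a cluster label would need to be dropped or handled before mapping
unmatched = [row['Neighborhood'] for row in merged if row['Cluster Labels'] is None]
print(unmatched)
```

In the notebook itself, a check such as `queens_merged['Cluster Labels'].isna().sum()` before drawing the map would reveal whether any such rows exist.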


In [29]:
queens_merged.sort_values(by=['Case_count'])

Unnamed: 0,Neighborhood,Borough,Case_count,Case_rate,Pop_denominator,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Long Island City,Queens,58,1143.84,5070.64,2,39.44,5.89,985,40.750217,-73.939202,1,Hotel,Coffee Shop,Pizza Place,Bar,Mexican Restaurant,Café,Supermarket,Donut Shop,Deli / Bodega,Market
23,Breezy Point,Queens,111,3138.53,3536.69,4,113.1,8.36,1327,40.557401,-73.925512,2,Beach,Monument / Landmark,Trail,Yoga Studio,Dumpling Restaurant,Flower Shop,Fish Market,Filipino Restaurant,Fast Food Restaurant,Farmers Market
20,Bellerose,Queens,515,2529.06,20363.28,34,166.97,19.17,2687,40.728573,-73.720128,1,Deli / Bodega,Pizza Place,Italian Restaurant,Pub,Chinese Restaurant,Mobile Phone Shop,Seafood Restaurant,Motel,Gas Station,Donut Shop
0,Sunnyside,Queens,522,2106.9,24775.7,42,169.52,11.64,4484,40.740176,-73.926916,1,Pizza Place,Chinese Restaurant,Bakery,Discount Store,Coffee Shop,Deli / Bodega,South American Restaurant,Italian Restaurant,Mexican Restaurant,Diner
13,Kew Gardens,Queens,554,2880.28,19234.25,51,265.15,17.32,3199,40.705179,-73.829819,1,Cosmetics Shop,Chinese Restaurant,Bar,Bank,Pizza Place,Pet Store,Donut Shop,Indian Restaurant,Deli / Bodega,Park
2,College Point,Queens,602,2581.17,23322.74,42,180.08,17.31,3478,40.784903,-73.843045,1,Deli / Bodega,Asian Restaurant,Bar,Pizza Place,Bakery,Chinese Restaurant,Seafood Restaurant,Latin American Restaurant,Caribbean Restaurant,Fried Chicken Joint
21,Queens Village,Queens,693,3585.57,19327.45,54,279.4,22.89,3027,40.718893,-73.738715,1,Bus Stop,Bank,Pedestrian Plaza,Fish Market,Mexican Restaurant,Mobile Phone Shop,Fried Chicken Joint,Sandwich Place,Bakery,Salon / Barbershop
11,Cambria Heights,Queens,715,3488.78,20494.26,50,243.97,23.34,3063,40.692775,-73.735269,3,Caribbean Restaurant,Restaurant,Cosmetics Shop,Liquor Store,Moving Target,Bakery,Chinese Restaurant,Pharmacy,Nightclub,Gym / Fitness Center
14,Ozone Park,Queens,749,2814.26,26614.42,55,206.65,21.13,3545,40.680708,-73.843203,1,Gym,Furniture / Home Store,Pizza Place,Pharmacy,Bank,Diner,Chinese Restaurant,Mobile Phone Shop,Breakfast Spot,Sandwich Place
9,Maspeth,Queens,785,2269.22,34593.45,73,211.02,14.71,5335,40.725427,-73.896217,1,Diner,Pizza Place,Bank,Grocery Store,Mobile Phone Shop,Chinese Restaurant,Bakery,Donut Shop,Lounge,Flower Shop


In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(queens_merged['Latitude'], queens_merged['Longitude'], queens_merged['Neighborhood'], queens_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
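
The color setup above assigns one color per cluster and then looks it up with `rainbow[cluster-1]`. The sketch below illustrates the same idea using only the standard library; it swaps matplotlib's `cm.rainbow` for evenly spaced hues from `colorsys`, and the `cluster_palette` helper is hypothetical. It also shows why the `cluster-1` lookup still gives every cluster its own color: for label 0 the index is -1, which in Python selects the last entry.

```python
import colorsys


def cluster_palette(k):
    """Return k hex color strings with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)

# Same lookup pattern as the notebook: label 0 maps to index -1, the last color
assigned = {label: palette[label - 1] for label in range(5)}
print(assigned)
```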

# 3. Methodology <a name="methodology"></a>

In the first step, we collected the required data: COVID-19 case counts and geographical information for NYC.  
In the second step, we identify the borough with the most potential for delivery work, based on that data. Queens has high case and death counts, which may cause public concern and a stronger inclination to stay at home, and in turn more demand for delivery. We therefore select Queens as the borough with the most potential for delivery work.
In the third and final step, we focus on Queens. We explore all of its neighborhoods with the Foursquare API. The top 10 venues in each neighborhood are listed by taking the mean frequency of occurrence of each venue category, for qualitative analysis, and the neighborhoods are grouped with k-means clustering and shown on a map.
Overall, the project uses a quantitative method to select the principal borough, visualization to narrow the range to neighborhoods, and finally a qualitative method to analyze feasibility.
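
The "mean frequency of occurrence" in the third step can be made concrete. Assuming the usual one-hot encoding of venue categories, the mean of each indicator column within a neighborhood equals the share of that neighborhood's venues that fall in the category. The sketch below computes that share directly; the `category_frequencies` helper and the toy venue list are hypothetical.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {category: n / sum(counter.values())
                       for category, n in counter.items()}
        for neighborhood, counter in counts.items()
    }


# Toy (neighborhood, category) pairs, made up for illustration
toy_venues = [
    ('Sunnyside', 'Pizza Place'),
    ('Sunnyside', 'Pizza Place'),
    ('Sunnyside', 'Bakery'),
    ('Sunnyside', 'Diner'),
    ('Breezy Point', 'Beach'),
    ('Breezy Point', 'Trail'),
]
frequencies = category_frequencies(toy_venues)
print(frequencies['Sunnyside'])
```

Each neighborhood's shares sum to 1, so neighborhoods with very different venue counts can still be compared and clustered on the same scale.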

# 4. Analysis<a name="analysis"></a>

We can inspect the details of each cluster, such as the COVID-19 information and the top 10 venues in each neighborhood. Restaurants and a variety of shops and markets are the leading venues in most clusters, which means there are many merchants who may need delivery service.
From the cluster map, we see that cluster 2 is far away from the other clusters. From the venue tables below, we see that the four most common venues in cluster 2 (beach, monument/landmark, trail, and yoga studio) cannot provide delivery service. So, the optimal locations to work as a delivery man are in clusters 0, 1, 3, and 4.
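
The qualitative judgment about cluster 2 can be expressed as a simple check. This is only a sketch under stated assumptions: the `DELIVERY_CATEGORIES` set and the rule that any category ending in "Restaurant" can deliver are hypothetical choices made for the example, not part of the notebook. The venue lists are the first four venues of clusters 2 and 4 from the notebook's cluster tables.

```python
# Hypothetical set of non-restaurant categories assumed able to offer delivery
DELIVERY_CATEGORIES = {'Pizza Place', 'Deli / Bodega', 'Bakery', 'Grocery Store',
                       'Supermarket', 'Pharmacy', 'Diner'}


def is_deliverable(category):
    """Assumption: listed categories and anything ending in 'Restaurant' can deliver."""
    return category in DELIVERY_CATEGORIES or category.endswith('Restaurant')


def deliverable_count(top_venues, first_n=4):
    """Count how many of the first_n most common venues could offer delivery."""
    return sum(is_deliverable(category) for category in top_venues[:first_n])


# First four most common venues per cluster, taken from the notebook's cluster tables
cluster_top4 = {
    2: ['Beach', 'Monument / Landmark', 'Trail', 'Yoga Studio'],
    4: ['Thai Restaurant', 'Mexican Restaurant', 'Chinese Restaurant', 'Vietnamese Restaurant'],
}

for cluster, venues in cluster_top4.items():
    print(cluster, deliverable_count(venues))
```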

In [31]:
queens_merged.loc[queens_merged['Cluster Labels'] == 0, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Borough,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Queens,104,260.16,14.41,6046,40.781291,-73.814202,0,Dance Studio,Deli / Bodega,Bubble Tea Shop,Candy Store,Yoga Studio,Farmers Market,Food,Flower Shop,Fish Market,Filipino Restaurant


In [32]:
queens_merged.loc[queens_merged['Cluster Labels'] == 1, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Borough,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Queens,42,169.52,11.64,4484,40.740176,-73.926916,1,Pizza Place,Chinese Restaurant,Bakery,Discount Store,Coffee Shop,Deli / Bodega,South American Restaurant,Italian Restaurant,Mexican Restaurant,Diner
1,Queens,2,39.44,5.89,985,40.750217,-73.939202,1,Hotel,Coffee Shop,Pizza Place,Bar,Mexican Restaurant,Café,Supermarket,Donut Shop,Deli / Bodega,Market
2,Queens,42,180.08,17.31,3478,40.784903,-73.843045,1,Deli / Bodega,Asian Restaurant,Bar,Pizza Place,Bakery,Chinese Restaurant,Seafood Restaurant,Latin American Restaurant,Caribbean Restaurant,Fried Chicken Joint
4,Queens,254,405.99,19.75,13604,40.751981,-73.882821,1,Latin American Restaurant,Peruvian Restaurant,South American Restaurant,Bakery,Mobile Phone Shop,Mexican Restaurant,Thai Restaurant,Grocery Store,Diner,Supermarket
6,Queens,99,235.45,16.69,6736,40.728974,-73.857827,1,Bakery,Sandwich Place,Grocery Store,Bagel Shop,Pizza Place,Pharmacy,Donut Shop,Sushi Restaurant,Kosher Restaurant,Liquor Store
7,Queens,181,256.54,14.4,10425,40.725264,-73.844475,1,Gym / Fitness Center,Gym,Thai Restaurant,Convenience Store,Park,Pharmacy,Pizza Place,Yoga Studio,Optical Shop,Snack Place
8,Queens,194,224.36,15.45,15544,40.746349,-73.901842,1,Grocery Store,Filipino Restaurant,Latin American Restaurant,Bakery,Thai Restaurant,American Restaurant,Pub,Bar,Donut Shop,Pizza Place
9,Queens,73,211.02,14.71,5335,40.725427,-73.896217,1,Diner,Pizza Place,Bank,Grocery Store,Mobile Phone Shop,Chinese Restaurant,Bakery,Donut Shop,Lounge,Flower Shop
10,Queens,97,267.99,15.1,5681,40.716415,-73.881143,1,South American Restaurant,Sandwich Place,Bank,Sushi Restaurant,Bakery,Diner,Dessert Shop,Italian Restaurant,Farmers Market,Playground
13,Queens,51,265.15,17.32,3199,40.705179,-73.829819,1,Cosmetics Shop,Chinese Restaurant,Bar,Bank,Pizza Place,Pet Store,Donut Shop,Indian Restaurant,Deli / Bodega,Park


In [33]:
queens_merged.loc[queens_merged['Cluster Labels'] == 2, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Borough,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Queens,4,113.1,8.36,1327,40.557401,-73.925512,2,Beach,Monument / Landmark,Trail,Yoga Studio,Dumpling Restaurant,Flower Shop,Fish Market,Filipino Restaurant,Fast Food Restaurant,Farmers Market


In [34]:
queens_merged.loc[queens_merged['Cluster Labels'] == 4, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Borough,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Queens,295,317.63,19.09,17767,40.744049,-73.881656,4,Thai Restaurant,Mexican Restaurant,Chinese Restaurant,Vietnamese Restaurant,South American Restaurant,Bakery,Ice Cream Shop,Indonesian Restaurant,Malay Restaurant,Colombian Restaurant


In [35]:
queens_merged.loc[queens_merged['Cluster Labels'] == 3, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Borough,Death_count,Death_rate,Percent_positive,Total_test,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Queens,50,243.97,23.34,3063,40.692775,-73.735269,3,Caribbean Restaurant,Restaurant,Cosmetics Shop,Liquor Store,Moving Target,Bakery,Chinese Restaurant,Pharmacy,Nightclub,Gym / Fitness Center
12,Queens,91,231.28,22.28,5809,40.694445,-73.758676,3,Caribbean Restaurant,Convenience Store,Grocery Store,Shopping Mall,Fast Food Restaurant,Fried Chicken Joint,Motorcycle Shop,Café,Donut Shop,Discount Store


# 5. Results and Discussion<a name="results"></a>

The result is the set of neighborhoods in clusters 0, 1, 3, and 4 identified in the analysis. Each neighborhood has different common venues, and its top 10 venues are listed. More restaurants, shops, and markets mean more potential delivery work. The COVID-19 information for each neighborhood can help a delivery man stay aware of the local situation.

# 6. Conclusion<a name="conclusion"></a>

The purpose of this project was to combine existing COVID-19 data with data obtained from the Foursquare API to provide a guide for people who want to find work as a delivery man, since many people have lost their jobs and delivery has become a job in high demand.
The project is simplified because some COVID-19 data have no corresponding neighborhoods, so neighborhoods without COVID-19 data and COVID-19 records without geographic coordinates were removed from the candidates.
For further study, the project could be improved by geocoding all COVID-19 data so that more data can be used.
