<h1><center>Studying COVID-19 Cases in New York</center></h1>

# Introduction to the Problem

Day by day, COVID-19 cases are increasing and everyone is concerned about going out. Nobody knows whether they are carrying the virus, because a strong immune system can keep it at bay or at least keep them from feeling sick. Are popular shops the reason? Do they have to be closed until the pandemic is over? From my own perspective, popular shops are more likely to be linked to cases than unpopular ones. The shops that most people visit are the ones that should be closed, because most cases most likely came from shops like them.

The target audience is shop owners, or anyone interested in knowing which shops have a high chance of being closed.

# Solution

What is the solution? Simply check which shops are popular in New York and predict which of them have a high chance of being closed. One way to do this is to cluster neighborhoods by the venue categories that are most common in them.

The popular venues will be fetched from the Foursquare API.

# Data

In addition to using the Foursquare API to get the nearby venues in New York, I'll use the dataset from the previous labs to get the location coordinates. For more information about this data, see the description below.

### Description

As mentioned, I'll be using the dataset from the previous labs. Each row contains a borough, a neighborhood, a latitude and a longitude. Using the Foursquare API, I'll then look up the nearby venues from the coordinates of each neighborhood.

# Importing Libraries

Let's import the libraries we need for this project.

In [1]:
import json
import time
import folium
import requests
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.colors as colors

from sklearn.cluster import KMeans

# Cleaning the data

Before digging into the problem and trying to predict anything, let's get our dataset ready and cleaned up.

### 1. Read our dataset.

In [2]:
# Read the JSON file content
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)["features"]
# Defining the column names that we'll be using
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
# Collect one row per neighborhood
rows = []
# Loop through all newyork data
for data in newyork_data:
    # Extract the borough and the neighborhood name
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    # Define the coordinates (stored as [longitude, latitude])
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    # Collect the row
    rows.append({'Borough': borough, 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat, 'Longitude': neighborhood_lon})
# Build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)
# Display the first five rows
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


### 2. Check the data types.

In [3]:
neighborhoods.dtypes

Borough          object
Neighborhood     object
Latitude        float64
Longitude       float64
dtype: object

This looks exactly like what we need. Borough and Neighborhood are objects, meaning they are strings; Latitude and Longitude, on the other hand, are floats, so we are ready to go with this part!

# Analyzing the Data

Let's have a look at how our data is structured and what it contains.

### 1. Check how many unique boroughs were extracted.

In [4]:
unique_boroughs = neighborhoods["Borough"].unique()
print(f'Found {len(unique_boroughs)} unique boroughs: {", ".join(unique_boroughs)}')

Found 5 unique boroughs: Bronx, Manhattan, Brooklyn, Queens, Staten Island


Cool, we have 5 unique boroughs. Let's check how many unique neighborhoods we got.

### 2. Check how many unique neighborhoods.

In [5]:
unique_neighborhoods = neighborhoods["Neighborhood"].unique()
print(f'Found {len(unique_neighborhoods)} unique neighborhoods.')

Found 302 unique neighborhoods.


Wow, that's a lot: 302 unique neighborhoods were extracted. After seeing this, I wonder how many rows we have.

### 3. Check how many records the dataset contains.

In [6]:
print(f"Found {neighborhoods.shape[0]} records in the dataset.")

Found 306 records in the dataset.
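
The gap between 302 unique names and 306 records suggests that a few neighborhood names appear more than once. Below is a minimal sketch of how such repeats could be listed. The frame `sample` and its rows are made up for illustration; in the notebook the same call would be applied to `neighborhoods`.

```python
import pandas as pd

# Illustrative rows only; the names below are placeholders, not real data
sample = pd.DataFrame({
    "Borough": ["Bronx", "Queens", "Manhattan", "Queens"],
    "Neighborhood": ["Alpha", "Alpha", "Beta", "Gamma"],
})

# keep=False marks every occurrence of a repeated name, not just the later ones
repeated = sample[sample.duplicated("Neighborhood", keep=False)]
print(repeated.sort_values("Neighborhood"))
```

If the repeats turn out to sit in different boroughs, grouping or joining on the neighborhood name alone would mix them together, so it is worth checking before relying on the name as a key.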


### 4. Check how many neighborhoods each borough has.

In [7]:
neighborhoods.groupby("Borough").count()

Borough,Neighborhood,Latitude,Longitude
Bronx,52,52,52
Brooklyn,70,70,70
Manhattan,40,40,40
Queens,81,81,81
Staten Island,63,63,63


### 5. Create a new dataframe with only "Queens" borough.

In [8]:
# Create a new dataframe with only the Queens borough (I wish I could do this for all of New York, but the rate limits would be annoying)
queens_df = neighborhoods[neighborhoods['Borough'] == 'Queens']
# The original row labels are kept here; the index is reset later, after merging
queens_df.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
129,Queens,Astoria,40.768509,-73.915654
130,Queens,Woodside,40.746349,-73.901842
131,Queens,Jackson Heights,40.751981,-73.882821
132,Queens,Elmhurst,40.744049,-73.881656
133,Queens,Howard Beach,40.654225,-73.838138


# Foursquare

Time to set up the Foursquare API. We need it to fetch the nearby venues of the neighborhoods in our dataset. For this, I'll create a class that handles the Foursquare calls, in case I want it later for other purposes. Ultimate reusability!

### 1. Set up Foursquare class.

In [9]:
def handle_timeout(func):
    def wrapper(*args, **kwargs):
        while True:
            try:
                # Return as soon as a call succeeds
                return func(*args, **kwargs)
            except Exception:
                # Sleep for 3 seconds if an error was encountered, then try again
                time.sleep(3)
    return wrapper

class Foursquare:
    BASE_URI = "https://api.foursquare.com/v2"
    
    def __init__(self, client_id, client_secret, version):
        self.client_id = client_id
        self.client_secret = client_secret
        self.version = version
    
    @handle_timeout
    def explore(self, latitude, longitude, radius=500, limit=100):
        nearby_venues = []
        params = self.get_params(ll=f"{latitude},{longitude}", radius=radius, limit=limit)
        url = f"{self.BASE_URI}/venues/explore"
        result = requests.get(url, params=params)
        items = result.json().get("response", {}).get("groups", [{}])[0].get("items", []) or []
        for v in items:
            if v:
                nearby_venues.append([
                    v['venue']['name'], 
                    v['venue']['location']['lat'], 
                    v['venue']['location']['lng'],  
                    v['venue']['categories'][0]['name']])
        return nearby_venues
    
    def get_params(self, **kwargs):
        params = kwargs
        params.update({
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            "v": self.version
        })
        return params
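
A retry wrapper like `handle_timeout` can also be given an upper bound on attempts, so that a persistent failure (for example, bad credentials) surfaces as an error instead of looping. The sketch below is one possible variant, not the notebook's own code; `retry`, `max_attempts`, `delay` and `flaky` are names chosen for illustration.

```python
import functools
import time

def retry(max_attempts=3, delay=0.01):
    """Retry the wrapped function up to max_attempts times, sleeping between tries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Give up and re-raise once the last attempt has failed
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

calls = {"count": 0}

@retry(max_attempts=3, delay=0.01)
def flaky():
    # Fails twice, then succeeds, to exercise the retry path
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

result = flaky()
print(result, calls["count"])
```

For a real API the delay would likely be a few seconds rather than the tiny value used here to keep the demonstration fast.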

### 2. Set up Foursquare credentials.

In [10]:
# Defining Foursquare credentials
CLIENT_ID = "REDACTED"
CLIENT_SECRET = "REDACTED"
VERSION = "20180605"
# Initialize the foursquare instance
foursquare = Foursquare(CLIENT_ID, CLIENT_SECRET, VERSION)
# Let's check how the output looks
foursquare.explore(40.7128, -74.0060, limit=1)

[['The Bar Room at Temple Court',
  40.7114477287544,
  -74.00680157032005,
  'Hotel Bar']]

### 3. Define New York coordinates.

In [11]:
newyork_address = 'New York City, NY'
newyork_lat = 40.7128
newyork_lng = -74.0060

### 4. Visualize Queens borough map.

In [12]:
queens_map = folium.Map(location=[newyork_lat, newyork_lng], zoom_start=10)
# Loop through neighborhoods to add markers
for lat, lng, borough, neighborhood in zip(queens_df['Latitude'], queens_df['Longitude'], queens_df['Borough'], queens_df['Neighborhood']):
    label = folium.Popup(f"{neighborhood}, {borough}", parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc',
                        fill_opacity=0.7,
                        parse_html=False).add_to(queens_map)  
    
queens_map

In case you can't see this map: https://imgur.com/gtjaiyr

### 5. Fetch the nearby venues.

In [13]:
venues_list = []
for name, lat, lng in zip(neighborhoods["Neighborhood"], neighborhoods["Latitude"], neighborhoods["Longitude"]):
    print(name)
    venues = foursquare.explore(lat, lng)
    for venue in venues:
        venues_list.append([(name, lat, lng, venue[0], venue[1], venue[2], venue[3])])
queens_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
queens_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 
                         'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker
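
The printed names show that the loop visits neighborhoods outside Queens as well, because it iterates over `neighborhoods`. If only one borough is wanted, the frame can be filtered before looping. The sketch below illustrates the idea under stated assumptions: `neighborhoods_demo` holds placeholder rows and `fake_explore` is a stand-in for `foursquare.explore`, so no network call is made.

```python
import pandas as pd

# Illustrative rows; names and coordinates are placeholders, not real values
neighborhoods_demo = pd.DataFrame({
    "Borough": ["Queens", "Bronx", "Queens"],
    "Neighborhood": ["A", "B", "C"],
    "Latitude": [40.70, 40.80, 40.75],
    "Longitude": [-73.90, -73.85, -73.80],
})

def fake_explore(lat, lng):
    # Stand-in for foursquare.explore; returns one made-up venue per call
    return [["Demo Venue", lat, lng, "Demo Category"]]

# Filter to a single borough before fetching anything
subset = neighborhoods_demo[neighborhoods_demo["Borough"] == "Queens"]

rows = []
for name, lat, lng in zip(subset["Neighborhood"], subset["Latitude"], subset["Longitude"]):
    for venue in fake_explore(lat, lng):
        rows.append((name, lat, lng, *venue))

venues_demo = pd.DataFrame(rows, columns=[
    "Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
    "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"])
print(venues_demo)
```

Appending flat tuples also lets the dataframe be built directly, without the extra flattening step.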

To check that everything worked out, let's look at the dataframe and its number of rows.

In [14]:
print(f"Fetched {queens_venues.shape[0]} venues.")
queens_venues

Fetched 10112 venues.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Walgreens,40.896528,-73.844700,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
4,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop
...,...,...,...,...,...,...,...
10107,Fox Hills,40.617311,-74.081740,Bums Chicken N Ribs Joint,40.618192,-74.085506,BBQ Joint
10108,Fox Hills,40.617311,-74.081740,Bums Backyard,40.618083,-74.085603,Cocktail Bar
10109,Fox Hills,40.617311,-74.081740,Nettys playhouse,40.616856,-74.077566,Playground
10110,Fox Hills,40.617311,-74.081740,MTA Bus - Targee St & Vanderbilt Av (S74/S76),40.614856,-74.084598,Bus Stop


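Wow, 10112 venues. Note that the loop above iterates over `neighborhoods` rather than `queens_df`, so this total covers all five boroughs, not only Queens. Let's also see how many venues each neighborhood has.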

In [15]:
queens_venues.groupby("Neighborhood").count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Allerton,33,33,33,33,33,33
Annadale,12,12,12,12,12,12
Arden Heights,5,5,5,5,5,5
Arlington,6,6,6,6,6,6
Arrochar,21,21,21,21,21,21
...,...,...,...,...,...,...
Woodhaven,23,23,23,23,23,23
Woodlawn,26,26,26,26,26,26
Woodrow,20,20,20,20,20,20
Woodside,77,77,77,77,77,77


That's a lot of venues! Let's also check how many unique categories there are.

In [16]:
print(f"Fetched {len(queens_venues['Venue Category'].unique())} unique venue category.")

Fetched 430 unique venue category.


Great, everything looks fine now. Let's carry on.

# Analyze Venues Dataframe

The next step is to analyze the venues dataframe. First, we one-hot encode the categories, then build a new dataframe with the frequency of each category per neighborhood.

### 1. One-hot encoding.

In [17]:
# one hot encoding
queens_onehot = pd.get_dummies(queens_venues[['Venue Category']], prefix="", prefix_sep="")
# Add neighborhood column to the dataframe
queens_onehot['Neighborhood'] = queens_venues['Neighborhood'] 
# Move neighborhood column to be the first column
fixed_columns = [queens_onehot.columns[-1]] + list(queens_onehot.columns[:-1])
# Create a new dataframe with the categories plus the neighborhood column
queens_onehot = queens_onehot[fixed_columns]
# Display the first five rows
queens_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### 2. Group by neighborhood and compute the frequency of each category.

In [18]:
queens_grouped = queens_onehot.groupby('Neighborhood').mean().reset_index()
queens_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Arcade,...,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
1,Annadale,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
2,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
3,Arlington,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
4,Arrochar,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
296,Woodhaven,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
297,Woodlawn,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
298,Woodrow,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
299,Woodside,0.0,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,0.0
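
To make the meaning of these numbers concrete, here is a toy version of the same two steps. The venues and neighborhood names are invented; the point is only that the mean of a one-hot column within a neighborhood equals the share of that neighborhood's venues falling in that category.

```python
import pandas as pd

# Made-up venues for two hypothetical neighborhoods
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bakery", "Bakery", "Bar", "Park", "Bar", "Bar"],
})

# One indicator column per category (dtype=float so the mean is a plain fraction)
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of each indicator column is the share of venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of the result sums to 1, which is why a value such as 0.166667 in the real table can be read as "one venue in six".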


### 3. Create a new dataframe with the most common venues.

First, we create a function (borrowed from the labs) to return the most common venues in descending order.

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[:num_top_venues]
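
As a quick illustration of what the function returns, the sketch below applies it to a single invented row. The function is repeated so the block runs on its own; the frequencies are made up.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the rest
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[:num_top_venues]

# Made-up frequencies for one hypothetical neighborhood
row = pd.Series({"Neighborhood": "A", "Bakery": 0.2, "Bar": 0.5, "Park": 0.3})
top_two = list(return_most_common_venues(row, 2))
print(top_two)
```

The function returns category names rather than frequencies, so ties and near-zero categories are not visible in the resulting table.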

Next we create the actual dataframe with the top 5 venues of each neighborhood (also borrowed from the labs).

In [20]:
# Initialize the number of top venues we want
num_top_venues = 5
indicators = ['st', 'nd', 'rd']
# Create columns according to number of top venues
columns = ['Neighborhood']
# Repeat the same number of top venues
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# Create a new sorted dataframe
queens_venues_sorted = pd.DataFrame(columns=columns)
# Add neighborhood column back to the dataframe
queens_venues_sorted['Neighborhood'] = queens_grouped['Neighborhood']
# Loop through the number of rows of the dataframe
for ind in np.arange(queens_grouped.shape[0]):
    queens_venues_sorted.iloc[ind, 1:] = return_most_common_venues(queens_grouped.iloc[ind, :], num_top_venues)
# Display the first five rows
queens_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Allerton,Deli / Bodega,Pizza Place,Bakery,Chinese Restaurant,Supermarket
1,Annadale,Pizza Place,American Restaurant,Bar,Restaurant,Bakery
2,Arden Heights,Pizza Place,Deli / Bodega,Rental Car Location,Pharmacy,Coffee Shop
3,Arlington,Bus Stop,Deli / Bodega,American Restaurant,Home Service,Boat or Ferry
4,Arrochar,Deli / Bodega,Italian Restaurant,Bus Stop,Pizza Place,Hotel


# Clustering Neighborhoods

Clustering time! We cluster neighborhoods using k-Means into 3 clusters.

In [21]:
# Initialize the number of clusters
n_clusters = 3
# Create a new dataframe with neighborhood dropped
queens_grouped_clustering = queens_grouped.drop(columns='Neighborhood')
# Run k-Means!
kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(queens_grouped_clustering)
# Check first 5 cluster labels that were generated via kMeans algorithm
kmeans.labels_[:5] 

array([2, 2, 2, 0, 0])
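
The notebook fixes the number of clusters at 3. One common way to sanity-check such a choice, shown here only as a sketch and not as something the notebook itself does, is to compare silhouette scores for several values of k. The data below is synthetic (three well-separated blobs) and stands in for the frequency matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic points around three centres; a stand-in for the real feature matrix
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centres])

# Fit k-Means for a range of k and record the silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score indicates tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real venue frequencies the scores are usually much less clear-cut than on this toy data, so the result is a hint rather than a decision rule.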

Then we create a new dataframe to include those labels.

In [22]:
# Add the cluster labels as the first column
queens_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)
# Copy queens_df into queens_merged so the original is left untouched
queens_merged = queens_df.copy()
# Join the sorted venues (with cluster labels) onto the Queens neighborhoods
queens_merged = queens_merged.join(queens_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# Reset index
queens_merged.reset_index(inplace=True, drop=True)
# Display first five rows
queens_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Queens,Astoria,40.768509,-73.915654,0,Bar,Middle Eastern Restaurant,Hookah Bar,Greek Restaurant,Mediterranean Restaurant
1,Queens,Woodside,40.746349,-73.901842,0,Grocery Store,Thai Restaurant,Bakery,Latin American Restaurant,Deli / Bodega
2,Queens,Jackson Heights,40.751981,-73.882821,0,Latin American Restaurant,Peruvian Restaurant,South American Restaurant,Mobile Phone Shop,Bakery
3,Queens,Elmhurst,40.744049,-73.881656,0,Thai Restaurant,Mexican Restaurant,Chinese Restaurant,Bubble Tea Shop,South American Restaurant
4,Queens,Howard Beach,40.654225,-73.838138,0,Italian Restaurant,Bagel Shop,Pharmacy,Sandwich Place,Fast Food Restaurant


Next we visualize our results on a map!

In [23]:
# Creating the queens map
queens_venues_map = folium.Map(location=[queens_merged["Latitude"][0], queens_merged['Longitude'][0]], zoom_start=10)
# Choose different colors for each cluster to differentiate
x = np.arange(n_clusters)
ys = [i + x + (i*x)**2 for i in range(n_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# Adding markers to each neighborhood
markers_colors = []
for lat, lng, neighborhood, cluster_label in zip(queens_merged['Latitude'], queens_merged['Longitude'], queens_merged['Neighborhood'], queens_merged['Cluster Label']):
    label = folium.Popup(f"{neighborhood} Cluster {cluster_label}", parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster_label-1],
        fill=True,
        fill_color=rainbow[cluster_label-1],
        fill_opacity=0.7).add_to(queens_venues_map)
# Showing the map
queens_venues_map

In case you can't see this map: https://imgur.com/l15bf62

# Analyze Clusters

Let's see if there is any relation in the clusters that the algorithm generated for us.

### Cluster 1

In [24]:
queens_merged.loc[queens_merged['Cluster Label'] == 0, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Astoria,Bar,Middle Eastern Restaurant,Hookah Bar,Greek Restaurant,Mediterranean Restaurant
1,Woodside,Grocery Store,Thai Restaurant,Bakery,Latin American Restaurant,Deli / Bodega
2,Jackson Heights,Latin American Restaurant,Peruvian Restaurant,South American Restaurant,Mobile Phone Shop,Bakery
3,Elmhurst,Thai Restaurant,Mexican Restaurant,Chinese Restaurant,Bubble Tea Shop,South American Restaurant
4,Howard Beach,Italian Restaurant,Bagel Shop,Pharmacy,Sandwich Place,Fast Food Restaurant
9,Flushing,Hotpot Restaurant,Bubble Tea Shop,Chinese Restaurant,Bakery,Korean Restaurant
10,Long Island City,Hotel,Coffee Shop,Pizza Place,Bar,Café
19,South Ozone Park,Park,Deli / Bodega,Bar,Food,Home Service
21,Whitestone,Dance Studio,Deli / Bodega,Bubble Tea Shop,Candy Store,Fish Market
22,Bayside,Bar,Indian Restaurant,American Restaurant,Pizza Place,Sushi Restaurant


### Cluster 2

In [25]:
queens_merged.loc[queens_merged['Cluster Label'] == 1, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
63,Somerville,Park,Women's Store,Event Space,Eye Doctor,Factory


### Cluster 3

In [26]:
queens_merged.loc[queens_merged['Cluster Label'] == 2, queens_merged.columns[[1] + list(range(5, queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Corona,Chinese Restaurant,Supermarket,Mexican Restaurant,Park,Bakery
6,Forest Hills,Gym / Fitness Center,Gym,Yoga Studio,Pizza Place,Convenience Store
7,Kew Gardens,Chinese Restaurant,Indian Restaurant,Donut Shop,Bank,Cosmetics Shop
8,Richmond Hill,Deli / Bodega,Discount Store,Bank,Lounge,Pizza Place
11,Sunnyside,Pizza Place,Chinese Restaurant,South American Restaurant,Bar,Grocery Store
12,East Elmhurst,Donut Shop,Ice Cream Shop,Home Service,Pizza Place,Bakery
13,Maspeth,Diner,Pizza Place,Bank,Grocery Store,Mobile Phone Shop
14,Ridgewood,Pharmacy,Bakery,Greek Restaurant,Grocery Store,Bank
15,Glendale,Deli / Bodega,Food & Drink Shop,Pizza Place,Arts & Crafts Store,Brewery
16,Rego Park,Bakery,Sandwich Place,Grocery Store,Chinese Restaurant,Donut Shop


# Results

Our analysis shows that Cluster 1 and Cluster 3 are dominated by the most common venue categories in New York, such as restaurants, bars, bakeries and pizza places, so the venues in them have a high chance of being closed (unless they are already closed) to prevent further increases in COVID-19 cases. Although these clusters are attractive zones, that is exactly why they draw crowds, and crowds help the virus spread exponentially, which in turn increases the number of COVID-19 cases.
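
One way to put numbers behind a reading like this is to count how many neighborhoods fall in each cluster and which top-ranked category is most frequent within it. The sketch below uses invented labels and categories; in the notebook the same calls would be applied to `queens_merged`.

```python
import pandas as pd

# Made-up cluster assignments and top venue categories
demo = pd.DataFrame({
    "Cluster Label": [0, 0, 0, 1, 2, 2],
    "1st Most Common Venue": ["Bar", "Bar", "Bakery", "Park", "Pizza Place", "Pizza Place"],
})

# Number of neighborhoods in each cluster
sizes = demo["Cluster Label"].value_counts().sort_index()

# Most frequent top-ranked category within each cluster
top_category = demo.groupby("Cluster Label")["1st Most Common Venue"].agg(
    lambda s: s.value_counts().idxmax()
)
print(sizes)
print(top_category)
```

A cluster with a single member, like Cluster 2 in the tables above, is better treated as an outlier than as a pattern.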

# Summary

The main goal of this project was to identify the popular, most common venues that most likely have a high impact on COVID-19 cases. We first cleaned the dataset to get everything ready for analysis, then narrowed the focus to the Queens borough to avoid hitting Foursquare rate limits. Next, we set up Foursquare to gather venues around the coordinates of each neighborhood, which produced a total of 10112 venues. After that, we grouped the dataset by the frequency of each venue category to find the most common venues. Finally, we used k-Means to cluster the neighborhoods, so that we can see which zones are in high demand and therefore where COVID-19 cases are most likely to increase.

# Thanks for reviewing my project!

I hope you found something useful in this notebook! This notebook was created by [devkarim](https://github.com/devkarim).