<h1 align=center><font size = 5>Coursera Applied Data Science Capstone Project: Battle of the Neighborhoods</font></h1>

## Introduction

*Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.*

**The COVID-19 virus has spread across the globe, causing a pandemic. Since March 2020, New York City has seen a rapid increase in confirmed cases, leading state officials to issue a "shelter in place" order. Under this order, all New Yorkers, healthy or sick, must stay home unless they are essential workers or need urgent health care, and all non-essential businesses that are normally open to the public must remain closed. Which neighborhoods are most affected by the order? Understanding which businesses are closed and which remain open would help direct resources to support each neighborhood.**

## Dataset

*Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.*

**In this exercise, Foursquare API data for New York City will be used. The explore function will be used to obtain the venue types and counts in each neighborhood. Based on the venue categories, we can then assess the impact of a "shelter in place" order on each neighborhood, mainly by checking whether the area has essential businesses (pharmacies and grocery stores, for example).**
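
The assessment described above can be sketched in a few lines, assuming each venue has already been reduced to a (neighborhood, category) pair. The category names in `ESSENTIAL_CATEGORIES`, the helper `essential_share`, and the sample rows are hypothetical illustrations; they are not an official list of essential businesses and are not part of the Foursquare API.

```python
from collections import defaultdict

# Hypothetical set of categories treated as essential, for illustration only;
# a real analysis should follow the official state guidance.
ESSENTIAL_CATEGORIES = {"Pharmacy", "Grocery Store", "Supermarket"}


def essential_share(venues):
    """Return {neighborhood: fraction of its venues that are essential}."""
    totals = defaultdict(int)
    essential = defaultdict(int)
    for neighborhood, category in venues:
        totals[neighborhood] += 1
        if category in ESSENTIAL_CATEGORIES:
            essential[neighborhood] += 1
    return {name: essential[name] / totals[name] for name in totals}


# Made-up sample rows, not real Foursquare results.
sample_venues = [
    ("Neighborhood A", "Pharmacy"),
    ("Neighborhood A", "Bar"),
    ("Neighborhood B", "Gym"),
]
shares = essential_share(sample_venues)
print(shares)
```

A neighborhood with a low share has few venues that can stay open under the order, which is one possible way to rank how strongly it is affected.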

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')


We then need access to a dataset that lists the New York City boroughs and the neighborhoods in each borough, along with their latitude and longitude coordinates, so that we can segment them.

In [None]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

In [None]:
# load the data
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
# preview data set
newyork_data

In [None]:
# the neighborhood records are stored under the 'features' key
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

The next task is to transform this data of nested Python dictionaries into a pandas dataframe.

In [None]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

# confirm the empty dataframe has the expected columns
neighborhoods

In [None]:
# loop through the data and collect one row per neighborhood
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)
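
The per-feature extraction can also be isolated in a small function, which makes it easy to check on a single record. This is a minimal sketch, assuming each feature has the structure the loop above relies on: a `properties` dict with `borough` and `name`, and a point geometry whose `coordinates` are ordered longitude, latitude. The helper name `feature_to_row` is hypothetical, and the sample feature uses invented placeholder values.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON-style feature into a flat dict."""
    props = feature["properties"]
    # GeoJSON point coordinates are ordered longitude first, then latitude.
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "Borough": props["borough"],
        "Neighborhood": props["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


# Placeholder feature with invented values, not taken from the dataset.
sample_feature = {
    "properties": {"borough": "Example Borough", "name": "Example Neighborhood"},
    "geometry": {"type": "Point", "coordinates": [-73.0, 40.0]},
}
row = feature_to_row(sample_feature)
print(row)
```

A list of such dicts can be passed directly to `pd.DataFrame(rows, columns=column_names)`.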

In [None]:
# preview the dataframe
neighborhoods.head()

In [None]:
# confirm the counts of boroughs and neighborhoods
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

Use the geopy library to get the latitude and longitude values of New York City.

In [None]:
# In order to define an instance of the geocoder, we need to define a user_agent.
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))
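
Note that `geocode` returns `None` when the service finds no match, in which case `location.latitude` raises an `AttributeError`. A minimal sketch of a guard follows, written against any callable with the same behavior so it can be tried without a network connection. The names `lookup_coordinates`, `FakeLocation`, and `fake_geocode` are hypothetical stand-ins, not part of geopy.

```python
from collections import namedtuple

# Stand-in for a geocoder result: only the two attributes used here.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def lookup_coordinates(geocode, address):
    """Return (latitude, longitude) or raise ValueError if nothing is found."""
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude


def fake_geocode(address):
    """Offline stand-in that knows a single made-up address."""
    known = {"Known Place": FakeLocation(1.5, -2.5)}
    return known.get(address)


coords = lookup_coordinates(fake_geocode, "Known Place")
print(coords)
```

With geopy, the same helper would be called as `lookup_coordinates(geolocator.geocode, address)`.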