<h1 align='center'> Finding the Best Neighborhood to Open a Restaurant in NYC</h1>

# 1. Introduction

New York City, often called 'The City' or simply 'New York', is the most populous city in the United States. It has been described as the cultural, financial, and media capital of the world, and it significantly influences commerce, entertainment, research, technology, education, politics, tourism, art, fashion, and sports. With an estimated 2019 population of 8,336,817 spread over about 302.6 square miles, New York is also the most densely populated major city in the United States.
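As a quick check on the density figure, dividing the stated 2019 population by the stated land area gives roughly:

$$\frac{8{,}336{,}817\ \text{people}}{302.6\ \text{mi}^2} \approx 27{,}551\ \text{people per square mile}$$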

Because the city is so densely populated, New York has restaurants in almost every neighborhood, which makes it difficult for a prospective owner to decide where to open a new one.

In this project, we use publicly available data from the internet to help a restaurant owner find the neighborhood that offers the best prospects for steady business.

# 2. Data

## 2.1 Source

To carry out this project, we need to gather the required data from the internet. In general, the more complete the data, the more reliably we can compare neighborhoods in New York.

One of the main datasets we use is the neighborhood dataset for New York City. <a href='https://cocl.us/new_york_dataset'>This</a> NYC dataset provides all the NYC neighborhoods and boroughs, along with the coordinates of each neighborhood.

We also use the <a href='https://developer.foursquare.com/'>Foursquare API</a> to get data on the venues, including restaurants, in each neighborhood.
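The Foursquare v2 `venues/explore` endpoint takes its inputs as URL query parameters. The sketch below shows one way to assemble such a URL with the standard library's `urllib.parse.urlencode`, which escapes characters like the comma in the `ll` parameter. The helper name `build_explore_url`, the placeholder credentials, the example coordinates and the version date are assumptions for illustration only.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, client_id, client_secret, version,
                      radius=500, limit=100):
    """Return the explore URL for venues within `radius` meters of (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,              # API version date, YYYYMMDD
        'll': f'{lat},{lng}',      # "latitude,longitude"
        'radius': radius,          # search radius in meters
        'limit': limit,            # maximum number of venues to return
    }
    # urlencode escapes special characters such as the comma in 'll'
    return EXPLORE_URL + '?' + urlencode(params)

# Placeholder credentials and an assumed version date, for illustration only
url = build_explore_url(40.8949, -73.8472, 'YOUR_CLIENT_ID',
                        'YOUR_CLIENT_SECRET', '20180605')
print(url)
```

Building the query from a dict keeps each parameter named in one place, which is easier to audit than positional `str.format` placeholders.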

## 2.2 Acquisition

**Let's fetch and load the NYC dataset with all the neighborhoods and boroughs in NYC.**

```python
import json  # library for handling .json files
import pandas as pd  # library for working with tabular data in DataFrames
```

```python
import urllib.request

# Download the NYC neighborhood GeoJSON file from a remote cloud server
urllib.request.urlretrieve('https://cocl.us/new_york_dataset', 'newyork_data.json')

# Open the .json file and load its contents into a variable
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

# All the data we need is stored under the 'features' key of the raw JSON
neighborhoods_data = newyork_data['features']

# Collect one row per neighborhood in a list first; DataFrame.append
# was removed in pandas 2.0 and was slow inside a loop anyway
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    neighborhood_latlon = data['geometry']['coordinates']
    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
neighborhoods = pd.DataFrame(rows, columns=column_names)

print('New York City Neighborhood Data Loaded with {} boroughs and {} neighborhoods!'.format(
    neighborhoods['Borough'].nunique(), neighborhoods.shape[0]))
```
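The key detail in the loading step is that GeoJSON lists a point's coordinates as `[longitude, latitude]`, so latitude is the second element. The sketch below checks that mapping offline on a tiny hand-made sample shaped like the NYC file's `features`; the sample values and the helper name `features_to_frame` are hypothetical.

```python
import pandas as pd

# Hand-made sample mimicking the structure of the NYC neighborhood GeoJSON
# (hypothetical values, not taken from the real file)
sample = {
    'features': [
        {'properties': {'borough': 'Bronx', 'name': 'Sample A'},
         'geometry': {'coordinates': [-73.85, 40.89]}},
        {'properties': {'borough': 'Manhattan', 'name': 'Sample B'},
         'geometry': {'coordinates': [-73.97, 40.78]}},
    ]
}

def features_to_frame(features):
    """Flatten GeoJSON point features into Borough/Neighborhood/Latitude/Longitude rows."""
    rows = []
    for feature in features:
        coords = feature['geometry']['coordinates']  # [longitude, latitude]
        rows.append({'Borough': feature['properties']['borough'],
                     'Neighborhood': feature['properties']['name'],
                     'Latitude': coords[1],
                     'Longitude': coords[0]})
    return pd.DataFrame(rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])

df = features_to_frame(sample['features'])
print(df)
```

Swapping the two indices would silently place every neighborhood in the wrong spot, so a small check like this is cheap insurance before calling the venue API.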

**Now that the NYC dataset is loaded, we use the Foursquare API to fetch the venues in each NYC neighborhood. Here we create a function that, given the neighborhoods' coordinates, returns up to 100 venues within a 500-meter radius of each one; we can call it from wherever we need it.**

```python
import requests
```

```python
# This cell defined the Foursquare credentials CLIENT_ID, CLIENT_SECRET and VERSION;
# their values were removed by Watson Studio for sharing.
```

```python
# Function for retrieving up to LIMIT venues around every neighborhood, for use when required
LIMIT = 100   # maximum number of venues returned per neighborhood
radius = 500  # search radius in meters

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a DataFrame of Foursquare venues near each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # Query the Foursquare v2 'explore' endpoint around this neighborhood
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        response = requests.get('https://api.foursquare.com/v2/venues/explore',
                                params=params, timeout=30)
        response.raise_for_status()  # fail loudly on bad credentials or quota errors
        results = response.json()['response']['groups'][0]['items']

        for v in results:
            venue = v['venue']
            categories = venue.get('categories') or []
            venues_list.append((
                name,
                lat,
                lng,
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng'],
                # Some venues have no category; record None instead of raising IndexError
                categories[0]['name'] if categories else None))

    # Passing the columns here also works when no venues were returned at all
    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues
```
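Because `getNearbyVenues` needs live credentials and network access, it can help to test the flattening step on its own. The sketch below applies the same row layout to a hand-made list shaped like the `response['groups'][0]['items']` part of an explore response. The helper `items_to_rows` and the sample venues are hypothetical, and recording `None` for a venue without a category is a design choice of this sketch.

```python
import pandas as pd

# Hypothetical excerpt shaped like response['groups'][0]['items']
# from the Foursquare v2 explore endpoint (made-up venues)
sample_items = [
    {'venue': {'name': 'Cafe One',
               'location': {'lat': 40.7801, 'lng': -73.9701},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 40.7805, 'lng': -73.9699},
               'categories': []}},  # a venue with no category
]

def items_to_rows(neighborhood, lat, lng, items):
    """Turn explore 'items' into tuples matching getNearbyVenues' columns."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((neighborhood, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name'] if categories else None))
    return rows

columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
venues = pd.DataFrame(items_to_rows('Sample B', 40.78, -73.97, sample_items),
                      columns=columns)
print(venues)
```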

## 2.3 Cleaning

**The NYC dataset used in this project is well structured: every feature provides a borough, a neighborhood name and a pair of coordinates in a consistent format. We therefore use it as is, without cleaning steps such as removing missing values.**
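One way to confirm that the neighborhood data needs no cleaning is to count missing values and duplicate entries. The hypothetical helper `summarize_quality` below does this for any DataFrame with `Borough` and `Neighborhood` columns; it keys duplicates on both columns because a neighborhood name on its own may repeat across boroughs. The sample rows are made up; on the real data you would call `summarize_quality(neighborhoods)`.

```python
import pandas as pd

def summarize_quality(df, key_cols=('Borough', 'Neighborhood')):
    """Return simple data-quality counts for a neighborhoods-style DataFrame."""
    return {
        'rows': len(df),
        'missing_values': int(df.isna().sum().sum()),
        'duplicate_keys': int(df.duplicated(subset=list(key_cols)).sum()),
    }

# Hypothetical sample containing one duplicate row and one missing latitude
sample = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Queens'],
    'Neighborhood': ['Sample A', 'Sample A', 'Sample C'],
    'Latitude': [40.89, 40.89, None],
    'Longitude': [-73.85, -73.85, -73.80],
})
report = summarize_quality(sample)
print(report)
```

If both counts come back as zero on the real `neighborhoods` DataFrame, that supports using the dataset as is.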

**The Foursquare API returns venue data as structured JSON with a consistent layout for every venue, so we likewise use the Foursquare data without a separate cleaning step.**
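Because every neighborhood is searched within a fixed radius, neighborhoods whose centers lie close together may return the same venue, and some venues may lack a category. The sketch below, using a hypothetical helper `venue_overlap_report` and made-up rows with the columns produced by `getNearbyVenues`, counts both cases so the no-cleaning assumption can be checked on the real results.

```python
import pandas as pd

def venue_overlap_report(venues):
    """Summarize venues listed under several neighborhoods or lacking a category."""
    key = ['Venue', 'Venue Latitude', 'Venue Longitude']
    # keep=False marks every row whose venue appears more than once
    shared = venues[venues.duplicated(subset=key, keep=False)]
    return {
        'shared_rows': len(shared),
        'unique_shared_venues': int(shared.drop_duplicates(subset=key).shape[0]),
        'missing_category': int(venues['Venue Category'].isna().sum()),
    }

# Hypothetical rows shaped like getNearbyVenues' output
venues = pd.DataFrame({
    'Neighborhood': ['Sample A', 'Sample B', 'Sample B'],
    'Venue': ['Pizza Place X', 'Pizza Place X', 'Diner Y'],
    'Venue Latitude': [40.7000, 40.7000, 40.7010],
    'Venue Longitude': [-73.9000, -73.9000, -73.9010],
    'Venue Category': ['Pizza Place', 'Pizza Place', None],
})
overlap = venue_overlap_report(venues)
print(overlap)
```

Whether a shared venue should count toward both neighborhoods is an analysis choice; this report only measures how often it happens.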

# 3. Methodology & Code

# 4. Results

# 5. Discussion

# 6. Conclusion