# Introduction

New York City's demographics show that it is a large and ethnically diverse metropolis. It is the largest city in the United States with a long history of international immigration. New York City was home to nearly 8.5 million people in 2018, accounting for over 40% of the population of New York State and a slightly lower percentage of the New York metropolitan area, home to approximately 23.6 million. Over the last decade the city has been growing faster than the region. The New York region continues to be by far the leading metropolitan gateway for legal immigrants admitted into the United States.

This final project explores the best locations for Nepalese restaurants throughout New York City. As New York is often described as the most diverse city in the world (as many as 800 languages are spoken there), it has a long tradition of ethnic restaurants. Now that the idea of a healthy lifestyle has taken hold across the country, Nepalese restaurants have become very popular, as they offer a healthy alternative to typical American eating habits. The owner of a new Nepalese restaurant could therefore see great success and consistent profit. However, as with any business, opening a new restaurant requires serious consideration and is more complicated than it seems at first glance. In particular, the location of the restaurant is one of the most important factors in whether it succeeds or fails. So our project will attempt to answer the questions “Where should the investor open a Nepalese restaurant?” and “Where should I go if I want great Nepalese food?”

## Data

To answer the above questions, we need data on New York City neighborhoods and boroughs, including their boundaries, latitudes, and longitudes, as well as data on restaurants and their ratings and tips.

New York City data containing the neighborhoods and boroughs, latitudes, and longitudes will be obtained from the data source: https://cocl.us/new_york_dataset

All data related to the locations and quality of Nepalese restaurants will be obtained from the Foursquare API, accessed through the requests library in Python.

## Methodology

• Data will be collected from https://cocl.us/new_york_dataset and cleaned and processed into a dataframe.

• Foursquare will be used to locate all venues, which will then be filtered to Nepalese restaurants. Ratings, tips, and likes by users will be counted and added to the dataframe.

• Data will be sorted based on rankings.

• Finally, the data will be assessed visually using Python plotting libraries.
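
The steps above can be sketched on a small hand-made table. The venue names and ratings below are invented purely for illustration; only the column names follow the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical venues standing in for Foursquare results (made-up values).
venues = pd.DataFrame(
    [
        ["a1", "Example Momo House", "Nepalese Restaurant", 8.4],
        ["b2", "Example Pizza Place", "Pizza Place", 7.9],
        ["c3", "Example Himalayan Kitchen", "Nepalese Restaurant", 9.1],
    ],
    columns=["ID", "Name", "Category", "Rating"],
)

# Keep only the category of interest.
nepalese = venues[venues["Category"] == "Nepalese Restaurant"]

# Rank by rating, best first.
ranked = nepalese.sort_values("Rating", ascending=False).reset_index(drop=True)
print(ranked[["Name", "Rating"]])
```

The real notebook performs the same filter and sort, but on venues returned by the API rather than on a hard-coded table.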

## Problem Statement

1. What is / are the best location(s) for Nepalese cuisine in New York City?
2. In what Neighborhood and/or borough should the investor open a Nepalese restaurant to have the best chance of being successful?
3. Where would I go in New York City to have the best Nepalese food?


## Before we get the data and start exploring it, let's import all required libraries.

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests # library to handle requests

#from bs4 import BeautifulSoup
import os

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

import seaborn as sns

print('Libraries imported.')  

## My variables

In [None]:
# Define Foursquare Credentials and Version

CLIENT_ID = 'MXF2MFT3425WEAWBBMYZ4UK2XOZMCXKGIAYDSFAIEJXWB234' # your Foursquare ID
CLIENT_SECRET = 'OTHEKBG2CRTMGUP55D4TSFWPEXPWLBY30CZQE2YOAESDALKK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
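
Hard-coding API keys in a notebook makes them visible to anyone who sees the file. A minimal sketch of one alternative is to read them from environment variables. The function name and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this example, not names required by Foursquare.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names used here are hypothetical; pick any names you like
    and export them in your shell before starting the notebook.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending empty keys.
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

With both variables exported, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would set the same two names used in the rest of the notebook.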

## Now let's define the functions we are going to use later in this project

In [None]:
def geo_location(address):
    # get geo location of address
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    return latitude,longitude


def get_venues(lat,lng):
    #set variables
    radius=400
    LIMIT=100
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
    # get all the data
    results = requests.get(url).json()
    venue_data=results['response']['groups'][0]['items']
    venue_details=[]
    for row in venue_data:
        try:
            venue_id=row['venue']['id']
            venue_name=row['venue']['name']
            venue_category=row['venue']['categories'][0]['name']
            venue_details.append([venue_id,venue_name,venue_category])
        except KeyError:
            pass
    column_names=['ID','Name','Category']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df


def get_venue_details(venue_id):
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            venue_id,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
    # get all the data
    results = requests.get(url).json()
    print(results)
    venue_data=results['response']['venue']
    venue_details=[]
    try:
        venue_id=venue_data['id']
        venue_name=venue_data['name']
        venue_likes=venue_data['likes']['count']
        venue_rating=venue_data['rating']
        venue_tips=venue_data['tips']['count']
        venue_details.append([venue_id,venue_name,venue_likes,venue_rating,venue_tips])
    except KeyError:
        pass
    column_names=['ID','Name','Likes','Rating','Tips']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df


def get_new_york_data():
    url='https://cocl.us/new_york_dataset'
    resp=requests.get(url).json()
    # all data is present in features label
    features=resp['features']
    # define the dataframe columns
    column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
    # collect one record per neighborhood and build the dataframe once at the end
    # (DataFrame.append was removed in pandas 2.0)
    rows = []
    for data in features:
        borough = data['properties']['borough'] 
        neighborhood_name = data['properties']['name']
        # GeoJSON coordinates are ordered [longitude, latitude]
        neighborhood_latlon = data['geometry']['coordinates']
        neighborhood_lat = neighborhood_latlon[1]
        neighborhood_lon = neighborhood_latlon[0]
        rows.append({'Borough': borough,
                     'Neighborhood': neighborhood_name,
                     'Latitude': neighborhood_lat,
                     'Longitude': neighborhood_lon})
    new_york_data = pd.DataFrame(rows, columns=column_names)
    return new_york_data
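
`get_venues` assumes every response contains `response -> groups[0] -> items`, and it only catches `KeyError`, so a venue with an empty category list or a failed API call (for example, an exhausted quota) would raise. Below is a sketch of a more defensive parser that works on an already-decoded response dictionary. The function name `parse_explore_items` and the sample payload are made up for illustration; only the nesting mirrors what `get_venues` reads.

```python
def parse_explore_items(payload):
    """Extract [id, name, category] rows from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        # Skip venues that lack an id, a name, or any category.
        if "id" in venue and "name" in venue and categories:
            rows.append([venue["id"], venue["name"], categories[0]["name"]])
    return rows


# Hand-written payload for illustration; this is not real API output.
sample = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"id": "v1", "name": "Example Momo House",
                               "categories": [{"name": "Nepalese Restaurant"}]}},
                    {"venue": {"id": "v2", "name": "No Category Cafe",
                               "categories": []}},
                ]
            }
        ]
    }
}

print(parse_explore_items(sample))
print(parse_explore_items({"meta": {"code": 429}}))  # error-style payload
```

Returning an empty list for a malformed payload keeps a long neighborhood loop running, at the cost of hiding failures, so logging the skipped responses may be worthwhile.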

## Now let's use the functions above to get our initial NYC data

In [None]:
ny_data = get_new_york_data()
ny_data.head()

In [None]:
ny_data.shape

So there are a total of 306 different neighborhoods in New York City.

## Initial Data Analysis

Now let's analyze our initial data. 

In [None]:
clr = "green"
ny_data.groupby('Borough')['Neighborhood'].count().plot.bar(figsize=(10,5), color=clr)
plt.title('Neighborhoods per Borough: NYC', fontsize = 20)
plt.xlabel('Borough', fontsize = 15)
plt.ylabel('No. Neighborhoods',fontsize = 15)
plt.xticks(rotation = 'horizontal')
plt.legend()
plt.show()

## Let's analyze further and see how many Nepalese restaurants there are in each neighborhood and borough, then graph the results.

In [None]:
# Queens has the most neighborhoods
# build the list of Nepalese restaurants found in each neighborhood
column_names=['Borough', 'Neighborhood', 'ID','Name']
restaurant_rows=[] # collect rows first; DataFrame.append was removed in pandas 2.0
count=1
for row in ny_data.values.tolist():
    Borough, Neighborhood, Latitude, Longitude=row
    venues = get_venues(Latitude,Longitude)
    nepalese_restaurants=venues[venues['Category']=='Nepalese Restaurant']   
    print('(',count,'/',len(ny_data),')','Nepalese Restaurants in '+Neighborhood+', '+Borough+':'+str(len(nepalese_restaurants)))
    print(row)
    for restaurant_detail in nepalese_restaurants.values.tolist():
        # venue_id avoids shadowing the built-in id()
        venue_id, name, category=restaurant_detail
        restaurant_rows.append({'Borough': Borough,
                                'Neighborhood': Neighborhood, 
                                'ID': venue_id,
                                'Name' : name
                               })
    count+=1
nepalese_rest_ny=pd.DataFrame(restaurant_rows, columns=column_names)

In [None]:
nepalese_rest_ny.to_csv('nepalese_rest_ny_tocsv1.csv') # Save the information so far to a .csv file due to limited calls on FourSquare

In [None]:
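# reload the saved results into the working dataframe (the first CSV column is the saved index)
nepalese_rest_ny = pd.read_csv('nepalese_rest_ny_tocsv1.csv', index_col=0)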
nepalese_rest_ny.tail()

In [None]:
nepalese_rest_ny.shape

We found 75 Nepalese restaurants across New York City.

As we continue our analysis, we see below that although Manhattan has the fewest neighborhoods, it has the highest number of Nepalese restaurants. We also see how many restaurants each of the top 6 neighborhoods has. Murray Hill has the highest number of Nepalese restaurants in all of NYC; note that the name Murray Hill appears in both Manhattan and Queens, so a count grouped by neighborhood name alone combines the two.

In [None]:
nepalese_rest_ny.groupby('Borough')['ID'].count().plot.bar(figsize=(10,5), color=clr)
plt.title('Nepalese Restaurants per Borough: NYC', fontsize = 20)
plt.xlabel('Borough', fontsize = 15)
plt.ylabel('No. of Nepalese Restaurants', fontsize=15)
plt.xticks(rotation = 'horizontal')
plt.legend()
plt.show()

In [None]:
NOofNeigh = 6 # number of top neighborhoods to plot; the counts are all the same past the top 6
nepalese_rest_ny.groupby('Neighborhood')['ID'].count().nlargest(NOofNeigh).plot.bar(figsize=(10,5), color=clr)
plt.title('Nepalese Restaurants per Neighborhood: NYC', fontsize = 20)
plt.xlabel('Neighborhood', fontsize = 15)
plt.ylabel('No. of Nepalese Restaurants', fontsize=15)
plt.xticks(rotation = 'horizontal')
plt.legend()
plt.show()

In [None]:
nepalese_rest_ny[nepalese_rest_ny['Neighborhood']=='Murray Hill']

So Murray Hill in Manhattan has the highest number of Nepalese restaurants, with a total count of 5.
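
Because the same neighborhood name can occur in more than one borough, grouping by name alone can merge separate places. The sketch below uses made-up rows to show the difference between grouping by `Neighborhood` and grouping by `Borough` and `Neighborhood` together; the restaurant IDs and counts are illustrative only.

```python
import pandas as pd

# Made-up rows: two different neighborhoods share the name "Murray Hill".
restaurants = pd.DataFrame(
    {
        "Borough": ["Manhattan", "Manhattan", "Queens", "Queens", "Queens"],
        "Neighborhood": ["Murray Hill", "Murray Hill", "Murray Hill",
                         "Murray Hill", "Jackson Heights"],
        "ID": ["r1", "r2", "r3", "r4", "r5"],
    }
)

# Grouping by name alone adds the Manhattan and Queens rows together.
by_name = restaurants.groupby("Neighborhood")["ID"].count()

# Grouping by both keys keeps the two places apart.
by_borough_and_name = restaurants.groupby(["Borough", "Neighborhood"])["ID"].count()

print(by_name)
print(by_borough_and_name)
```

Applying the two-key grouping to `nepalese_rest_ny` would show how the Murray Hill total splits between the two boroughs.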

Now we will get the ranking of each restaurant for further analysis.

In [None]:
column_names=['Borough', 'Neighborhood', 'ID','Name','Likes','Rating','Tips']
stats_rows=[] # collect rows first; DataFrame.append was removed in pandas 2.0
count=1
for row in nepalese_rest_ny.values.tolist():
    Borough,Neighborhood,ID,Name=row
    try:
        venue_details=get_venue_details(ID)
        print(venue_details)
        venue_id,name,likes,rating,tips=venue_details.values.tolist()[0]
    except IndexError:
        print('No data available for id=',ID)
        # assign 0 to these restaurants, as they may have opened recently
        # or their details may not exist in the Foursquare database
        venue_id,name,likes,rating,tips=[0]*5
    print('(',count,'/',len(nepalese_rest_ny),')','processed')
    stats_rows.append({'Borough': Borough,
                       'Neighborhood': Neighborhood, 
                       'ID': venue_id,
                       'Name' : name,
                       'Likes' : likes,
                       'Rating' : rating,
                       'Tips' : tips
                      })
    count+=1
nepalese_rest_stats_ny=pd.DataFrame(stats_rows, columns=column_names)
nepalese_rest_stats_ny.tail()

In [None]:
nepalese_rest_stats_ny.to_csv('nepalese_rest_stats_ny_csv.csv') # As I move through this project I continue to save data to a .csv file 

In [None]:
nepalese_rest_stats_ny.shape

We now have statistics for all 77 Nepalese restaurants in New York City.

Let's check what values we have in our DataFrame.

In [None]:
nepalese_rest_stats_ny.info()

Columns such as Likes and Tips may not be stored as numeric values, so we convert them to float for further analysis.

In [None]:
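nepalese_rest_stats_ny['Likes'] = nepalese_rest_stats_ny['Likes'].astype('float64')
nepalese_rest_stats_ny['Tips'] = nepalese_rest_stats_ny['Tips'].astype('float64')
# Rating is also used for idxmax and averages below, so make it numeric as well
nepalese_rest_stats_ny['Rating'] = nepalese_rest_stats_ny['Rating'].astype('float64')
nepalese_rest_stats_ny.info()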
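
`astype('float64')` raises an error if a column contains a value that cannot be parsed as a number. As a hedged alternative, `pd.to_numeric` with `errors="coerce"` turns such values into `NaN` instead. The small table below is invented for illustration and is not taken from the Foursquare results.

```python
import pandas as pd

# Made-up values mixing numbers, numeric strings, and unparseable entries.
stats = pd.DataFrame(
    {"Likes": ["12", 7, "n/a"], "Rating": [8.1, "7.5", None]},
    dtype=object,
)

for column in ["Likes", "Rating"]:
    # Values that cannot be parsed become NaN rather than raising.
    stats[column] = pd.to_numeric(stats[column], errors="coerce")

print(stats.dtypes)
print(stats)
```

Whether silently producing `NaN` is preferable to failing loudly depends on how the missing values are handled later in the analysis.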

## Now that the data types look correct, let's continue our analysis.

In [None]:
nepalese_rest_stats_ny.describe()

In [None]:
# Restaurant with maximum Likes
nepalese_rest_stats_ny.iloc[nepalese_rest_stats_ny['Likes'].idxmax()]

In [None]:
# Restaurant with maximum Rating
nepalese_rest_stats_ny.iloc[nepalese_rest_stats_ny['Rating'].idxmax()]

In [None]:
# Restaurant with maximum Tips
nepalese_rest_stats_ny.iloc[nepalese_rest_stats_ny['Tips'].idxmax()]
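
A note on the three lookups above: `idxmax()` returns an index label, while `.iloc` expects an integer position. The two coincide only when the dataframe has the default 0, 1, 2, ... index, as it does here. The sketch below, on made-up data with a non-default index, shows why `.loc` is the safer pairing.

```python
import pandas as pd

stats = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Likes": [5.0, 40.0, 12.0]},
    index=[10, 20, 30],  # labels that differ from positions 0, 1, 2
)

best_label = stats["Likes"].idxmax()  # an index label, not a position
best_row = stats.loc[best_label]      # label-based lookup matches idxmax
print(best_label, best_row["Name"])

# stats.iloc[best_label] would raise IndexError here, because position 20
# does not exist in a three-row dataframe.
```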

## Now let's identify and visualize the neighborhoods with the highest average restaurant rating

In [None]:
# select Rating before averaging so that text columns (ID, Name) are not passed to mean()
ny_neighborhood_stats=nepalese_rest_stats_ny.groupby('Neighborhood',as_index=False)['Rating'].mean()
ny_neighborhood_stats.columns=['Neighborhood','Average Rating']
ny_neighborhood_stats.sort_values(['Average Rating'],ascending=False).head(10)

Above are the top neighborhoods with the highest average ratings of Nepalese restaurants.

In [None]:
# select Rating before averaging so that text columns (ID, Name) are not passed to mean()
ny_borough_stats=nepalese_rest_stats_ny.groupby('Borough',as_index=False)['Rating'].mean()
ny_borough_stats.columns=['Borough','Average Rating']
ny_borough_stats.sort_values(['Average Rating'],ascending=False).head()

Similarly, these are the average ratings of Nepalese restaurants for each borough.
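
Two things can distort these averages: restaurants without Foursquare details were given a rating of 0 in the loop above, and a neighborhood with a single restaurant can top the list on one rating alone. The sketch below, on made-up numbers, drops the 0 placeholders and reports how many restaurants stand behind each average. The column name `Restaurants` is introduced here only for illustration.

```python
import pandas as pd

stats = pd.DataFrame(
    {
        "Neighborhood": ["X", "X", "X", "Y"],
        "Rating": [8.0, 9.0, 0.0, 9.5],  # 0.0 marks "no data available"
    }
)

# Ignore placeholder ratings so they do not pull the average down.
rated = stats[stats["Rating"] > 0]

summary = (
    rated.groupby("Neighborhood")["Rating"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "Average Rating", "count": "Restaurants"})
    .sort_values("Average Rating", ascending=False)
)
print(summary)
```

Reading the average next to the count makes it easier to judge whether a high-ranking neighborhood rests on several restaurants or on just one.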

## Let's visualize the results

In [None]:
plt.figure(figsize=(9,5), dpi = 100)
plt.title('Average rating of Nepalese Restaurants for each Borough')
plt.xlabel('Borough', fontsize = 15)
plt.ylabel('Average Rating', fontsize=15)
nepalese_rest_stats_ny.groupby('Borough')['Rating'].mean().plot(kind='bar', color=clr)
plt.legend()
plt.show()

We will consider all neighborhoods with an average rating greater than or equal to 8.0 for visualization on a map.

In [None]:
ny_neighborhood_stats=ny_neighborhood_stats[ny_neighborhood_stats['Average Rating']>=8.0]
ny_neighborhood_stats

We will join this dataset with the original New York data to get the longitude and latitude of each neighborhood.

In [None]:
ny_neighborhood_stats=pd.merge(ny_neighborhood_stats,ny_data, on='Neighborhood')
ny_neighborhood_stats=ny_neighborhood_stats[['Borough','Neighborhood','Latitude','Longitude','Average Rating']]
ny_neighborhood_stats

Now we will show this data on a map.

In [None]:
# create map and display it
ny_map = folium.Map(location=geo_location('New York'), zoom_start=12)
# instantiate a feature group for the ratings in the dataframe
rating = folium.map.FeatureGroup()

# loop through the ratings and add each to the neighborhood feature group
for lat, lng in ny_neighborhood_stats[['Latitude','Longitude']].values:
    rating.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=10, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

Let's add a new field to the dataframe for labeling purposes.

In [None]:
ny_neighborhood_stats['Label']=ny_neighborhood_stats['Neighborhood']+', '+ny_neighborhood_stats['Borough']+'('+ny_neighborhood_stats['Average Rating'].map(str)+')'
# add pop-up text to each marker on the map
for lat, lng, label in ny_neighborhood_stats[['Latitude','Longitude','Label']].values:
    folium.Marker([lat, lng], popup=label).add_to(ny_map)        
# add ratings to map
ny_map.add_child(rating)

## Results / Conclusion 

Manhattan and Brooklyn have the best-rated Nepalese restaurants on average. Staten Island and The Bronx have the fewest Nepalese restaurants of any borough. Of note, Murray Hill in Manhattan has the highest number of Nepalese restaurants in all of NYC. Despite having the fewest neighborhoods of the five boroughs, Manhattan has the most Nepalese restaurants. Based on the above information, I would state that Manhattan and Brooklyn are the best locations for Nepalese cuisine in NYC. To have the best shot at success, I would open a Nepalese restaurant in Brooklyn. Brooklyn has multiple neighborhoods with average ratings exceeding 8.0 on a scale of 1.0 to 10.0 and has fewer Nepalese restaurants than Manhattan, which should mean less competition. We should also keep in mind that real estate in Brooklyn is generally cheaper than in Manhattan. Finally, I would go to OOTOYA in Manhattan for the best Nepalese food, based on its 1213 likes. As a final note, all of the above analysis depends on the accuracy of Foursquare data. A more comprehensive analysis and future work would need to incorporate data from other external databases.