# Capstone Project - Battle of the Neighborhoods 
## IBM Data Science Professional Certificate


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)
* [Recommendation](#recommendation)


## Introduction: Business Problem <a name="introduction"></a>

The goal of this project is to find the optimal location in New York City for a fitness restaurant named Fit and Food. The venue combines a gym with a restaurant where tasty, healthy food is the staple. It is meant to be a place where people can eat after working out, grab a quick healthy lunch, or order take-away.

The optimal location is determined by comparing the neighborhoods of NYC on two criteria. First, the neighborhood should not be crowded with restaurants. Second, the neighborhood should have a health-centered attitude, because health-minded individuals are more likely to use Fit and Food, which should result in more revenue over time.

## Data <a name="data"></a>

The data used for this project consists of two parts. The first part is the NYC neighborhood data provided as course data by IBM.[1] In case the IBM course link is no longer available, similar data is publicly available on the web.[2] The second part comes from the Foursquare API; this data is freely available (with restrictions) to registered accounts.[3] Foursquare's location data contains the location of most venues as well as their user ratings. Foursquare is a large commercial provider of location data that is also used by third-party applications.

[1]: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
[2]: https://data.cityofnewyork.us/City-Government/Neighborhood-Tabulation-Areas-NTA-/cpf4-rkhq
[3]: https://foursquare.com/

### Imports of libraries
In the section below all the necessary libraries are imported.

In [1]:
import numpy as np 
import pandas as pd
import json 

# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 

# Library to handle requests
import requests

# Transform JSON data into a pandas dataframe
# (json_normalize is exposed at the top level of pandas since version 1.0)
from pandas import json_normalize 

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import sklearn packages
from sklearn.cluster import KMeans

# Map rendering library
import folium 

# Package used to obtain working directory paths
import os

# Save dataframes for later use
import pickle



print('Libraries have been imported.')

Libraries have been imported.


### NYC Neighborhood data

As mentioned before, the NYC neighborhood data is obtained from a link provided in the IBM Data Science course.

In [2]:
url_nyc_json = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json'
data_nyc_json = requests.get(url_nyc_json).json()

print('Data downloaded!')

Data downloaded!


From the JSON file obtained above, a dataframe is constructed for the analysis later on. For each neighborhood the corresponding borough, latitude and longitude are stored.

In [3]:
# The neighborhoods are stored in the 'features' list of the JSON file
neighborhood_data = data_nyc_json['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood
rows = []
for data in neighborhood_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [4]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
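
The structure that the loop relies on can be illustrated with a small hand-written sample. The two features below mimic the assumed layout of `newyork_data.json` (a `properties` dict with `borough` and `name`, and a `geometry` whose `coordinates` are `[longitude, latitude]`). The `sample` dict is made up for illustration; only the coordinate values are copied from the neighborhood dataframe in this notebook.

```python
import pandas as pd

# Hand-written sample that mimics the assumed layout of the neighborhood file
sample = {
    "features": [
        {"properties": {"borough": "Bronx", "name": "Wakefield"},
         "geometry": {"coordinates": [-73.847201, 40.894705]}},
        {"properties": {"borough": "Bronx", "name": "Co-op City"},
         "geometry": {"coordinates": [-73.829939, 40.874294]}},
    ]
}

# Coordinates are stored as [longitude, latitude], so the order is swapped here
rows = [
    {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": feature["geometry"]["coordinates"][1],
        "Longitude": feature["geometry"]["coordinates"][0],
    }
    for feature in sample["features"]
]

sample_neighborhoods = pd.DataFrame(rows)
print(sample_neighborhoods)
```

Building a list of dicts first and creating the dataframe once avoids growing a dataframe row by row, which is slow for larger inputs.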


#### Use Geocoder to obtain the latitude and longitude for New York City.

The latitude and longitude of NYC are needed to construct a map of all the neighborhoods in New York City.

In [5]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_nyc = location.latitude
longitude_nyc = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude_nyc, longitude_nyc))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Create the base map for NYC  
The base map shows the neighborhoods. These neighborhoods will later be combined with the data obtained from Foursquare.

In [6]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude_nyc, longitude_nyc], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### Foursquare location Data 
Below, the data from Foursquare is obtained. First the credentials have to be defined in order to make the API requests. Afterwards a for loop is used to obtain the fitness and restaurant data for each neighborhood.

#### Define FourSquare credentials

In [None]:
# Fill in your own credentials; never publish real keys in a shared notebook
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20201019' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

### Get requests for all Neighborhoods of NYC
In this section a GET request is made for each NYC neighborhood to retrieve all gyms/fitness centers and restaurants through the Foursquare API. All neighborhoods are taken from the NYC dataset. To mitigate possible performance issues, at most 100 results are requested per neighborhood.

To avoid receiving every possible venue in the response, a category ID is specified for restaurants as well as for gyms. Both IDs are taken from the Foursquare API documentation.
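
As a rough sketch of how such a request is put together, the snippet below builds the query string for a single neighborhood without sending anything. The endpoint, parameter names, version string and gym category ID are the ones used in this notebook; the credential values are placeholders, and the variable names `base_url`, `params` and `url` are chosen for illustration only.

```python
from urllib.parse import urlencode

base_url = "https://api.foursquare.com/v2/venues/explore"

# Placeholder credentials; the coordinates are those of Wakefield, Bronx
params = {
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "v": "20201019",
    "ll": "40.894705,-73.847201",
    "radius": 1000,   # search radius in meters
    "limit": 100,     # maximum number of venues per request
    "categoryId": "4bf58dd8d48988d175941735",  # gym category ID
}

# urlencode escapes special characters, e.g. the comma in 'll' becomes %2C
url = base_url + "?" + urlencode(params)
print(url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` lets `requests` do the encoding itself, which is less error-prone than formatting the URL by hand.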

In [7]:
neighborhoods

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
| ... | ... | ... | ... | ... |
| 301 | Manhattan | Hudson Yards | 40.756658 | -74.000111 |
| 302 | Queens | Hammels | 40.587338 | -73.805530 |
| 303 | Queens | Bayswater | 40.611322 | -73.765968 |
| 304 | Queens | Queensbridge | 40.756091 | -73.945631 |


In [8]:
# Index range used by the for loop to visit every neighborhood
Range = neighborhoods.index

# Foursquare category IDs
categoryId_restaurant = "4d4b7105d754a06374d81259"
categoryId_gym = "4bf58dd8d48988d175941735"

# Initialize the response dataframes
restaurants_df = pd.DataFrame()
gyms_df = pd.DataFrame()



In [9]:
# function that extracts the category of the venue from the Foursquare data
def get_category_type(row):
    # the column is called 'categories' or 'venue.categories', depending on the endpoint
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Retrieving the data for all neighborhoods is wrapped in a function, because it is done separately for gyms and for restaurants. The reason is that an area likely has fewer gyms than restaurants, so a combined query would return far fewer gyms than restaurants.

Each GET request is made for a specific neighborhood and category ID. The results are then appended to the response dataframe.

In [10]:
def get_venues(category_ids,Range,response_df,CLIENT_ID,VERSION,neighborhoods):
    # Note: CLIENT_SECRET is read from the notebook's global scope
    LIMIT = 100    # maximum number of venues per request
    radius = 1000  # search radius in meters
    neighborhood_latitude = neighborhoods["Latitude"]
    neighborhood_longitude = neighborhoods["Longitude"]
    neighborhood_name = neighborhoods["Neighborhood"]

    # Collect one dataframe per neighborhood and concatenate them once at the end
    frames = []
    for i in Range:
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            neighborhood_latitude[i], 
            neighborhood_longitude[i], 
            radius, 
            LIMIT,
            category_ids
            )
        
        results = requests.get(url).json()
        
        # Skip neighborhoods for which the response contains no venues
        try:
            venues = results['response']['groups'][0]['items']
        except (KeyError, IndexError):
            continue
        if not venues:
            continue

        nearby_venues = json_normalize(venues) # flatten JSON

        # filter columns
        filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
        nearby_venues = nearby_venues.loc[:, filtered_columns]

        # filter the category for each row
        nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

        # clean columns: keep only the last part of the dotted column names
        nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
        nearby_venues["neighborhood"] = neighborhood_name[i]
        frames.append(nearby_venues)

    # Nothing retrieved: return the input dataframe unchanged
    if not frames:
        return response_df
    return pd.concat([response_df] + frames, ignore_index=True)
    

In [11]:
# Obtain the path to the directory in which data is saved
# (os.path.join avoids backslash escapes and works on every operating system)
cwd = os.path.dirname(os.getcwd())
data_path = os.path.join(cwd, "src", "data", "datasets")

In [12]:
# Don't query Foursquare if the data already exists.
# If the data already exists load it, otherwise run the get_venues function defined above.
dataLoaded = False
try:
    with open(os.path.join(data_path, 'restaurants.pkl'), 'rb') as f:
        restaurants_df = pickle.load(f)
    with open(os.path.join(data_path, 'gyms.pkl'), 'rb') as f:
        gyms_df = pickle.load(f)
    print('Data loaded.')
    dataLoaded = True
except FileNotFoundError:
    # No cached data found; it is requested from Foursquare below
    pass

# When no data exists run the function that gets data from Foursquare
if not dataLoaded:
    restaurants_df = get_venues(categoryId_restaurant,Range,restaurants_df,CLIENT_ID,VERSION,neighborhoods)
    gyms_df = get_venues(categoryId_gym,Range,gyms_df,CLIENT_ID,VERSION,neighborhoods)
    
    # Persist the results in the local file system
    os.makedirs(data_path, exist_ok=True)
    with open(os.path.join(data_path, 'restaurants.pkl'), 'wb') as f:
        pickle.dump(restaurants_df, f)
    with open(os.path.join(data_path, 'gyms.pkl'), 'wb') as f:
        pickle.dump(gyms_df, f)


Data loaded.


Inspect the results as obtained from the FourSquare API.

In [13]:
gyms_df.head()

| | name | categories | lat | lng | neighborhood |
|---|---|---|---|---|---|
| 0 | Boxingology LLC | Boxing Gym | 40.898556 | -73.838979 | Wakefield |
| 1 | Mount St Michael's Academy (track) | Track | 40.898688 | -73.840709 | Wakefield |
| 0 | My Gym | Gym / Fitness Center | 40.872681 | -73.8294 | Co-op City |
| 1 | 24 Hour Fitness | Gym / Fitness Center | 40.867818 | -73.824984 | Co-op City |
| 2 | bally's total fitness bartow ave | Gym / Fitness Center | 40.86729 | -73.832603 | Co-op City |


In [14]:
restaurants_df.head()

| | name | categories | lat | lng | neighborhood |
|---|---|---|---|---|---|
| 0 | Ripe Kitchen & Bar | Caribbean Restaurant | 40.898152 | -73.838875 | Wakefield |
| 1 | Jackie's West Indian Bakery | Caribbean Restaurant | 40.889283 | -73.84331 | Wakefield |
| 2 | Ali's Roti Shop | Caribbean Restaurant | 40.894036 | -73.856935 | Wakefield |
| 3 | Jimbo's | Burger Joint | 40.89174 | -73.858226 | Wakefield |
| 4 | Dunkin' | Donut Shop | 40.890459 | -73.849089 | Wakefield |


### Data manipulations

Several more manipulations are needed before the data can be analysed: a general restaurant tag is added to the restaurants dataframe and a gym tag to the gyms dataframe. After this the data is combined with the neighborhood data.

In [15]:
# Restaurant tag
restaurants_df["category"] = "Restaurant"
# Gym tag
gyms_df["category"] = "Gym"

In [16]:
# Combine restaurants and gyms into one dataframe
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df_overall = pd.concat([restaurants_df, gyms_df])
df_overall

| | name | categories | lat | lng | neighborhood | category |
|---|---|---|---|---|---|---|
| 0 | Ripe Kitchen & Bar | Caribbean Restaurant | 40.898152 | -73.838875 | Wakefield | Restaurant |
| 1 | Jackie's West Indian Bakery | Caribbean Restaurant | 40.889283 | -73.843310 | Wakefield | Restaurant |
| 2 | Ali's Roti Shop | Caribbean Restaurant | 40.894036 | -73.856935 | Wakefield | Restaurant |
| 3 | Jimbo's | Burger Joint | 40.891740 | -73.858226 | Wakefield | Restaurant |
| 4 | Dunkin' | Donut Shop | 40.890459 | -73.849089 | Wakefield | Restaurant |
| ... | ... | ... | ... | ... | ... | ... |
| 31 | Brooklyn Boulders Queensbridge | Climbing Gym | 40.752649 | -73.940010 | Queensbridge | Gym |
| 32 | The Cliffs at Long Island City | Climbing Gym | 40.748627 | -73.948892 | Queensbridge | Gym |
| 0 | Full Focus | Gym / Fitness Center | 40.613092 | -74.087140 | Fox Hills | Gym |
| 1 | Zumba | Gym | 40.624850 | -74.081725 | Fox Hills | Gym |


#### One hot encoding
One-hot encoding is used to turn categorical values into a numeric representation: each distinct value gets its own indicator column.
Initially this is done for the neighborhood and category columns.

In [17]:
#Get dummies for category
dummies_category = pd.get_dummies(df_overall["category"])
dummies_neighborhoods = pd.get_dummies(df_overall["neighborhood"])
df_dummy_neighborhood = pd.concat([df_overall,dummies_neighborhoods],axis = 1)
df_dummy_all = pd.concat([df_dummy_neighborhood,dummies_category],axis = 1)
neighborhoods_venues_dummied = df_dummy_all.copy()
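
To illustrate what `pd.get_dummies` produces, the sketch below encodes the `category` column of a three-row toy dataframe. The venue names are hypothetical; the column names follow the ones used in this notebook.

```python
import pandas as pd

# Toy data with hypothetical venue names
toy = pd.DataFrame({
    "name": ["Venue A", "Venue B", "Venue C"],
    "neighborhood": ["Wakefield", "Wakefield", "Co-op City"],
    "category": ["Restaurant", "Gym", "Gym"],
})

# One indicator column per distinct category value
dummies = pd.get_dummies(toy["category"])
encoded = pd.concat([toy, dummies], axis=1)

# Summing the indicator columns per neighborhood yields venue counts
counts = encoded.groupby("neighborhood")[["Gym", "Restaurant"]].sum()
print(counts)
```

Because the indicator columns are 0/1 (or boolean) values, summing them per group is a compact way to count venues of each type.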

## Methodology <a name="methodology"></a>

As a quick reminder, the data available at this point is the NYC neighborhood data, the gyms in each area and the restaurants in each area.

The first part of the analysis consists of an exploratory data analysis, in which the number of restaurants and gyms in each neighborhood is evaluated. This is expected to show which neighborhoods are crowded with restaurants or gyms and which are not.

From the information obtained in the first part of the analysis, additional features are constructed. Examples are the ratio of restaurants to gyms in a region or the total number of gyms in a neighborhood. Another possible feature is the number of fast-food restaurants in a neighborhood, which could indicate a less healthy environment.
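
A fast-food feature of this kind could be derived from the `categories` column, as in the minimal sketch below. Which category names count as fast food is an assumption made here for illustration (the set `FAST_FOOD_CATEGORIES` is not part of the original analysis), and the toy data is made up.

```python
import pandas as pd

# Assumption: this selection of category names is a judgment call
FAST_FOOD_CATEGORIES = {
    "Fast Food Restaurant",
    "Burger Joint",
    "Donut Shop",
    "Fried Chicken Joint",
}

# Made-up restaurants for two hypothetical neighborhoods
toy_restaurants = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "categories": [
        "Caribbean Restaurant", "Burger Joint", "Donut Shop", "Caribbean Restaurant",
        "Salad Place", "Burger Joint", "Salad Place", "Salad Place",
    ],
})

# Flag fast-food venues, then take the mean of the flag per neighborhood
toy_restaurants["is_fast_food"] = toy_restaurants["categories"].isin(FAST_FOOD_CATEGORIES)
fast_food_share = toy_restaurants.groupby("neighborhood")["is_fast_food"].mean()
print(fast_food_share)
```

The mean of a boolean column is the fraction of `True` values, so the result is the share of fast-food venues among all restaurants in each neighborhood.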

## Analysis <a name="analysis"></a>

The analysis is done by clustering the neighborhoods. The aim of the clustering is to identify neighborhoods that are healthy and at the same time not too crowded with restaurants or gyms. To achieve this, the features that are selected or created relate to health and to the density of restaurants and fitness venues in an area.

### Feature selection

### Group by neighborhood
For each neighborhood the following features are calculated:
* **Ratio of gyms to restaurants:** A high ratio indicates a healthy region
* **Total number of gyms:** Many gyms also indicate a healthier region; however, we should strive for a healthy region with the lowest number of gyms
* **Number of restaurants:** We want to minimize the number of restaurants
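
The three features listed above can be sketched compactly with `pd.crosstab` on a toy dataframe. This is an alternative formulation for illustration, not the code used in the notebook, and the neighborhoods `A`, `B` and `C` are made up.

```python
import pandas as pd

# Made-up venues: A has 4 restaurants and 1 gym, B has 1 and 2, C has 2 and 0
toy_venues = pd.DataFrame({
    "neighborhood": ["A"] * 5 + ["B"] * 3 + ["C"] * 2,
    "category": ["Restaurant"] * 4 + ["Gym"]
                + ["Restaurant", "Gym", "Gym"]
                + ["Restaurant", "Restaurant"],
})

# Rows: neighborhoods, columns: categories, values: number of venues
counts = pd.crosstab(toy_venues["neighborhood"], toy_venues["category"])

toy_features = pd.DataFrame({
    "gym_count": counts["Gym"],
    "restaurant_count": counts["Restaurant"],
})
toy_features["gr_ratio"] = toy_features["gym_count"] / toy_features["restaurant_count"]
print(toy_features)
```

Unlike separate group-by counts, `crosstab` fills missing combinations with 0, so a neighborhood without gyms gets a ratio of 0 instead of a missing value. A neighborhood without restaurants would produce a division by zero (`inf`) and needs separate handling.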


In [18]:
df = neighborhoods_venues_dummied
gym_mask = (df["category"] == "Gym")
restaurant_mask = (df["category"] == "Restaurant")

# Count the venues of each type per neighborhood
neighborhood_gym_count = df[gym_mask].groupby('neighborhood').count()
neighborhood_restaurant_count = df[restaurant_mask].groupby('neighborhood').count()

# Gym to restaurant ratio (NaN when a neighborhood has no gyms or no restaurants)
neighborhood_gr_ratio = neighborhood_gym_count["category"]/neighborhood_restaurant_count['category']
gr_ratio = neighborhood_gr_ratio.reset_index()

# Total number of gyms per neighborhood
neighborhood_gym_count = neighborhood_gym_count["category"]
gym_count = neighborhood_gym_count.reset_index()

# Total number of restaurants per neighborhood
neighborhood_restaurant_count = neighborhood_restaurant_count['category'].reset_index()
restaurant_count = neighborhood_restaurant_count.reset_index(drop= True)


In [19]:
# We reset the indices to make them continuous
neighborhoods_venues_dummied.reset_index(drop = True, inplace = True)

In [20]:
# Set proper datatypes and column names for merging

# Gym Restaurant Ratio
gr_ratio['neighborhood'] = gr_ratio['neighborhood'].astype("string")
gr_ratio.columns = ['neighborhood', 'gr_ratio']

# Gym count
gym_count['neighborhood'] = gym_count['neighborhood'].astype("string")
gym_count.columns = ['neighborhood', 'gym_count']

# Restaurant Count
restaurant_count['neighborhood'] = restaurant_count['neighborhood'].astype("string")
restaurant_count.columns = ['neighborhood', 'restaurant_count']

# Overall DF
neighborhoods_venues_dummied['neighborhood'] = neighborhoods_venues_dummied['neighborhood'].astype("string")



In [21]:
#Join values to the original dataframe

#make sure you start with empty dataframe
neighborhoods_venues = pd.DataFrame()
neighborhoods_features = pd.DataFrame()

#Join neighborhoods, gr ratio, gym count and restaurant count respectively
neighborhoods_venues = pd.merge(neighborhoods_venues_dummied,gr_ratio, on = "neighborhood", how = "left")
neighborhoods_venues = pd.merge(neighborhoods_venues,gym_count, on = "neighborhood", how = "left")
neighborhoods_venues = pd.merge(neighborhoods_venues,restaurant_count, on = "neighborhood", how = "left")

# Join total restaurants, gyms and ratios for each neighborhood without dummying all neighborhoods
neighborhoods_features = pd.merge(gr_ratio,gym_count, on = "neighborhood", how = "left")
neighborhoods_features = pd.merge(neighborhoods_features,restaurant_count, on = "neighborhood", how = "left")

#Drop missing values
neighborhoods_features.dropna(inplace = True)

In [22]:
neighborhoods_features.tail()

| | neighborhood | gr_ratio | gym_count | restaurant_count |
|---|---|---|---|---|
| 297 | Woodhaven | 0.05 | 3.0 | 60.0 |
| 298 | Woodlawn | 0.069767 | 3.0 | 43.0 |
| 299 | Woodrow | 0.3 | 3.0 | 10.0 |
| 300 | Woodside | 0.1 | 10.0 | 100.0 |
| 301 | Yorkville | 1.0 | 100.0 | 100.0 |


### Clustering

The K-Means algorithm is used for the clustering, because it allows the number of clusters k to be chosen in advance. The goal of the clustering is to divide the neighborhoods into "Healthy" and "Un-healthy", so the number of clusters is set to 2.

In [23]:
# Number of clusters: one for "Healthy" and one for "Un-healthy" neighborhoods
k_clusters = 2

# Drop the neighborhood name and the restaurant count; the clustering uses gr_ratio and gym_count
neighborhoods_grouped_clustering = neighborhoods_features.drop(["neighborhood",'restaurant_count'], axis = 1)

# Note: without a fixed random_state the numbering of the cluster labels can differ between runs
kmeans = KMeans(n_clusters = k_clusters).fit(neighborhoods_grouped_clustering)

# Check labels for the first 10 values
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
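
One point to keep in mind is that K-Means works with Euclidean distances, so a feature with a large numeric range (here `gym_count`, up to 100) weighs much more than one with a small range (`gr_ratio`, roughly 0 to 1). The sketch below shows one possible variation that is not part of the original analysis: standardizing the features with `StandardScaler` and fixing `random_state` for reproducibility. The six rows are copied from dataframes shown in this notebook; the variable names are chosen for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Six neighborhoods with the feature values shown in this notebook
toy_features = pd.DataFrame(
    {
        "gr_ratio": [0.05, 0.069767, 0.1, 0.76, 1.0, 0.893617],
        "gym_count": [3, 3, 10, 76, 100, 42],
    },
    index=["Woodhaven", "Woodlawn", "Woodside",
           "Upper West Side", "Upper East Side", "Vinegar Hill"],
)

# Rescale every feature to zero mean and unit variance before clustering
scaled = StandardScaler().fit_transform(toy_features)

# A fixed random_state makes the label numbering reproducible between runs
kmeans_toy = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
toy_features["cluster"] = kmeans_toy.labels_
print(toy_features)
```

On this small sample the three low-ratio neighborhoods end up in one cluster and the three high-ratio neighborhoods in the other. Whether scaling is desirable for the full dataset depends on how much weight the absolute number of gyms should carry.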

In [25]:
# add clustering labels
#neighborhoods_features = neighborhoods_features.drop(['Cluster Labels'], axis = 1)
neighborhoods_features.insert(0, 'Cluster Labels', kmeans.labels_)


In [26]:
# Join neighborhoods_features and neighborhoods to get latitude and longitude of each neighborhood.
neighborhoods_clustered = pd.merge(neighborhoods_features , neighborhoods , left_on = "neighborhood", right_on = "Neighborhood" ,how = "left")

## Results and Discussion <a name="results"></a>

The results obtained from the clustering are shown on a map of New York City. On this map each neighborhood is colored according to its cluster label. In the conclusion, neighborhoods are selected from this map as possible locations for Fit and Food.

### Create the map for NYC with clustered neighborhoods

In [27]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude_nyc, longitude_nyc], zoom_start=10)

# set color scheme for the clusters
x = np.arange(k_clusters)
ys = [i + x + (i*x)**2 for i in range(k_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map
for lat, lng, cluster, borough, neighborhood in zip(neighborhoods_clustered['Latitude'], neighborhoods_clustered['Longitude'], neighborhoods_clustered['Cluster Labels'], neighborhoods_clustered['Borough'] , neighborhoods_clustered['neighborhood']):
    label = '{}, {}'.format(neighborhood,borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

The map above shows all the neighborhoods of NYC with their corresponding cluster. The purple color corresponds to the "Healthy" neighborhoods and the red color to the "Less-Healthy" neighborhoods. As a reminder, healthier neighborhoods contain more gyms and/or have a higher gym-to-restaurant (G/R) ratio. Several ways to increase the accuracy of this clustering are given in the recommendation section.

The results shown below confirm that neighborhoods with high G/R ratios are indeed clustered together. A high G/R ratio has a further benefit for Fit and Food: it implies relatively few restaurants, which could make up for a potential surplus of gyms.

In [28]:
#Show a part of the results
neighborhoods_clustered.tail(25)

| | Cluster Labels | neighborhood | gr_ratio | gym_count | restaurant_count | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|
| 269 | 0 | University Heights | 0.033333 | 2.0 | 60.0 | Bronx | University Heights | 40.855727 | -73.910416 |
| 270 | 1 | Upper East Side | 1.0 | 100.0 | 100.0 | Manhattan | Upper East Side | 40.775639 | -73.960508 |
| 271 | 1 | Upper West Side | 0.76 | 76.0 | 100.0 | Manhattan | Upper West Side | 40.787658 | -73.977059 |
| 272 | 0 | Utopia | 0.102041 | 5.0 | 49.0 | Queens | Utopia | 40.7335 | -73.796717 |
| 273 | 0 | Van Nest | 0.046154 | 6.0 | 130.0 | Bronx | Van Nest | 40.843608 | -73.866299 |
| 274 | 0 | Vinegar Hill | 0.893617 | 42.0 | 47.0 | Brooklyn | Vinegar Hill | 40.703321 | -73.981116 |
| 275 | 0 | Wakefield | 0.043478 | 2.0 | 46.0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 276 | 0 | Washington Heights | 0.16 | 16.0 | 100.0 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 277 | 0 | Weeksville | 0.127907 | 11.0 | 86.0 | Brooklyn | Weeksville | 40.67504 | -73.930531 |
| 278 | 0 | West Brighton | 0.08 | 4.0 | 50.0 | Staten Island | West Brighton | 40.631879 | -74.107182 |
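
A quick way to check which cluster label corresponds to the "Healthy" group is to compare the mean feature values per cluster. The sketch below does this for five rows copied from the results shown above; the names `toy_clustered`, `summary` and `healthy_label` are chosen for illustration.

```python
import pandas as pd

# Five rows copied from the clustered results
toy_clustered = pd.DataFrame({
    "Cluster Labels": [0, 1, 1, 0, 0],
    "neighborhood": ["University Heights", "Upper East Side",
                     "Upper West Side", "Utopia", "Wakefield"],
    "gr_ratio": [0.033333, 1.0, 0.76, 0.102041, 0.043478],
    "gym_count": [2.0, 100.0, 76.0, 5.0, 2.0],
})

# Mean of each feature per cluster
summary = toy_clustered.groupby("Cluster Labels")[["gr_ratio", "gym_count"]].mean()
print(summary)

# The cluster with the highest mean gym-to-restaurant ratio is taken as "Healthy"
healthy_label = summary["gr_ratio"].idxmax()
print("Healthy cluster label:", healthy_label)
```

Deriving the label from the data in this way avoids relying on a fixed label number, which K-Means does not guarantee to be stable between runs.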


## Conclusion <a name="conclusion"></a>

When looking at the results obtained from the clustering, it becomes evident that some regions in NYC are more likely to be healthy than others. However, simply picking one of the purple neighborhoods to start Fit and Food would be unwise, because rent prices vary greatly among neighborhoods. For example, rent in Manhattan will be a lot higher than in the Bronx. Because of this, Murray Hill, Queens and Chelsea, Staten Island are picked as the neighborhoods where Fit and Food is most likely to succeed. Please note, however, that a multitude of other factors come into play when finalizing the location. Thoughts on how to improve this analysis, and thus mitigate the aforementioned problems, are given in the recommendation section.

## Recommendation <a name="recommendation"></a>

Several recommendations can be made to improve the analysis in this project. When comparing the number of gyms across areas, one could take the population density of each neighborhood into account, since a low density could explain a low number of gyms. Another improvement, if such data can be obtained, is to use public health data to cluster the neighborhoods, because residents of a healthier neighborhood are more likely to make use of Fit and Food.

If health data is unavailable, another way to estimate the health of a neighborhood is the percentage of fast-food restaurants among all restaurants in that neighborhood.

Another way of improving the model is to add the cost of rent as well as the population density of each neighborhood. Population density can be used to normalize the number of gyms or restaurants in a neighborhood.
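
Such a normalization could look like the sketch below, which expresses venue counts per 10,000 residents. The `population` column and all numbers are made up for illustration; real figures would have to come from an external source such as census data.

```python
import pandas as pd

# Hypothetical neighborhoods with made-up population figures
toy_population = pd.DataFrame({
    "neighborhood": ["A", "B"],
    "gym_count": [10, 10],
    "restaurant_count": [100, 50],
    "population": [50_000, 20_000],
})

# Venues per 10,000 residents
scale = 10_000 / toy_population["population"]
toy_population["gyms_per_10k"] = toy_population["gym_count"] * scale
toy_population["restaurants_per_10k"] = toy_population["restaurant_count"] * scale
print(toy_population)
```

In this toy example both neighborhoods have 10 gyms, but relative to its population neighborhood B is served 2.5 times as densely as neighborhood A.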

Furthermore, during the development of the Fit and Food concept and franchise it is important to continuously re-evaluate the assessments made in this analysis based on changing circumstances or business findings.