# The Best Possible Location to Open a New Restaurant in Asheville, NC

# 1. Introduction

Asheville is a city in North Carolina with a population of approximately 92,000 people, according to US Census Bureau estimates as of July 1, 2018. I chose this city for this project based on information from the city council's web page, the city's demographics, and its geographic location.

In January 2016, the Asheville City Council established a 20-year vision plan for the city. To me, this is important because it signals a commitment to future investment in the city. The plan's topics include being an equitable and diverse community, a clean and healthy environment, and a connected and engaged community, among others. However, the commitment I find most relevant to business activity, such as a new restaurant, is the city council's goal of being a financially resilient city.

The US Census Bureau estimates describe a diverse city. As of July 1, 2018, the estimated population includes Asian, Black/African American, Native Hawaiian and Other Pacific Islander, White, and Hispanic/Latino residents. This diversity, together with the relatively small total population, suggests that Asheville is a promising location for a new restaurant. It is also worth considering that the total population may well be higher once the 2020 census results are published.

Finally, Asheville's geographic location makes it a promising place for new restaurants: it has one of the fastest-growing regional airports. According to the airport's 2019 annual report, this growth reflects the vitality of western North Carolina and the commitment of the region’s travelers to “fly local.” This is another indicator of potential tourism growth in the city and, with it, another good reason to open a new restaurant.

This project is intended to help an entrepreneur or investor who is considering opening a new restaurant in Asheville, North Carolina.

# 2. Data Section

The data used to find the best location for a new restaurant in Asheville, North Carolina will be taken from the City of Asheville GIS web page. The data is organized by neighborhood and is available as a GeoJSON file containing the official boundary coordinates of each neighborhood. My analysis will then describe the current venues in Asheville's neighborhoods using data obtained through the Foursquare API.

Using the Foursquare API, I will retrieve information about the venues in Asheville's neighborhoods and use it to recommend the best neighborhood for a new restaurant. The recommendation will be based on the potential customers the neighborhood would offer, the potential competitors, and the diversity of the chosen neighborhood.

The final goal of the project is not only to identify the best neighborhood in Asheville but also to recommend a type of cuisine for the new restaurant. This recommendation will be based on the most common restaurant types found in Asheville's neighborhoods, which will show the most and the least popular ones.

# 3. Methodology

## 3.1. Extract the Data

The first step is to import the libraries needed to process the data, plot the maps, and apply the K-Means machine learning algorithm to find clusters among Asheville's neighborhoods.

In [1]:
# Data processing and analysis libraries
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated

# Geolocation libraries
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import requests

# Plots and graphics libraries
import matplotlib.cm as cm
import matplotlib.colors as colors

# ML KMeans library
from sklearn.cluster import KMeans

# Map rendering library
import folium

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


After importing the libraries, the next step is to download the GeoJSON file containing the neighborhood names and their boundary coordinates from the City of Asheville GIS website (https://data-avl.opendata.arcgis.com/datasets/3450b18c20bf432eb8db7a002e631046_0). The following cell reads the GeoJSON file with the **json** module, which parses it into nested Python dictionaries and lists that can later be converted into a pandas dataframe.

In [2]:
with open('Asheville_Neighborhoods.geojson') as json_data:
    asheville_data = json.load(json_data)
asheville_data

{'type': 'FeatureCollection',
 'name': '44ba7682-ec6f-4727-9d7f-c2adbb47753d2020413-1-a8xa3w.jvprs',
 'crs': {'type': 'name',
  'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}},
 'features': [{'type': 'Feature',
   'properties': {'edit_date': '2016/04/29 16:05:28+00',
    'objectid': 4059,
    'name': 'Lucerne Park',
    'nbhd_id': 'NBHD6',
    'abbreviation': None,
    'narrative': 'Inactive',
    'edit_by': 'stephanieosbourn',
    'st_area(shape)': 1420141.6178838,
    'st_length(shape)': 0},
   'geometry': {'type': 'Polygon',
    'coordinates': [[[-82.60649570121961, 35.5827537112121],
      [-82.60803489031152, 35.584233234210785],
      [-82.60989432459202, 35.58471416945937],
      [-82.6106388120887, 35.583416992243635],
      [-82.61121261278123, 35.58090942935675],
      [-82.61030571196483, 35.58085299393836],
      [-82.60936888222429, 35.58045189093828],
      [-82.60819024620452, 35.58018902763426],
      [-82.60730220633238, 35.58219352452337],
      [-82.60649570

Like other GeoJSON files, the data is a set of nested Python dictionaries, and it has to be transformed into a pandas dataframe before we can work with it. The relevant key in this nested dictionary is *'features'*, so let's define a new variable that holds the data under this key.

In [3]:
neighborhood_data=asheville_data['features']

Each element of this new variable is one neighborhood feature. The first one looks like this:

In [4]:
neighborhood_data[0]

{'type': 'Feature',
 'properties': {'edit_date': '2016/04/29 16:05:28+00',
  'objectid': 4059,
  'name': 'Lucerne Park',
  'nbhd_id': 'NBHD6',
  'abbreviation': None,
  'narrative': 'Inactive',
  'edit_by': 'stephanieosbourn',
  'st_area(shape)': 1420141.6178838,
  'st_length(shape)': 0},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-82.60649570121961, 35.5827537112121],
    [-82.60803489031152, 35.584233234210785],
    [-82.60989432459202, 35.58471416945937],
    [-82.6106388120887, 35.583416992243635],
    [-82.61121261278123, 35.58090942935675],
    [-82.61030571196483, 35.58085299393836],
    [-82.60936888222429, 35.58045189093828],
    [-82.60819024620452, 35.58018902763426],
    [-82.60730220633238, 35.58219352452337],
    [-82.60649570121961, 35.5827537112121]]]}}

The data from the City of Asheville GIS website contains several fields, including name, nbhd_id, narrative, and so on. For this project, only the neighborhood name and its coordinates will be used. The coordinates, however, are given as a list, since each neighborhood is represented by a polygon outlining its boundary. To build a pandas dataframe from the nested dictionary, let's first define the column names: one for the neighborhood name and two for the coordinate pairs.

In [5]:
col_names=['Neighborhood','1','2']

The new dataframe with the coordinates will be called **neighborhoods**.

In [6]:
neighborhoods=pd.DataFrame(columns=col_names)

Using a loop, we iterate over **neighborhood_data**, extracting the name of each neighborhood and the coordinates of its boundary. This information is used to build the **neighborhoods** dataframe.

In [7]:
frames = []
for data in neighborhood_data:

    neighborhood_name = data['properties']['name']
    # The first ring of a GeoJSON Polygon is its outer boundary: a list of [longitude, latitude] pairs
    outer_ring = data['geometry']['coordinates'][0]
    temp_df = pd.DataFrame(outer_ring, columns=['1','2'])

    temp_df['Neighborhood'] = neighborhood_name

    frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate all the pieces at once
neighborhoods = pd.concat(frames).reset_index(drop=True)

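The loop above relies on the GeoJSON convention that a Polygon's `coordinates` value is a list of rings, each a list of `[longitude, latitude]` positions, with the first ring being the outer boundary. As a minimal sketch, here is the same flattening applied to a tiny, made-up FeatureCollection (the neighborhood names and coordinates are hypothetical), so that each boundary vertex becomes one row:

```python
import pandas as pd

# A tiny, hypothetical FeatureCollection with the same shape as the Asheville file
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example North"},
            "geometry": {
                "type": "Polygon",
                # One closed outer ring; GeoJSON positions are [longitude, latitude]
                "coordinates": [[[-82.60, 35.58], [-82.59, 35.58],
                                 [-82.59, 35.59], [-82.60, 35.58]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"name": "Example South"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-82.55, 35.50], [-82.54, 35.50],
                                 [-82.54, 35.51], [-82.55, 35.50]]],
            },
        },
    ],
}

frames = []
for feature in feature_collection["features"]:
    outer_ring = feature["geometry"]["coordinates"][0]  # outer boundary only
    frame = pd.DataFrame(outer_ring, columns=["Longitude", "Latitude"])
    frame["Neighborhood"] = feature["properties"]["name"]
    frames.append(frame)

vertices = pd.concat(frames, ignore_index=True)
print(vertices)
```

Note that a MultiPolygon feature nests its coordinates one level deeper, so a file containing them would need separate handling.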


## 3.2. Cleaning the Data & Plotting the Map of Asheville's Neighborhoods

Now that the data is in a pandas dataframe called **neighborhoods**, we can continue preparing it.

In [8]:
neighborhoods.head(2)

|   | 1 | 2 | Neighborhood |
|---|---|---|---|
| 0 | -82.606496 | 35.582754 | Lucerne Park |
| 1 | -82.608035 | 35.584233 | Lucerne Park |


GeoJSON lists each position as [longitude, latitude], so column '1' holds longitudes and column '2' holds latitudes. Let's rename them accordingly.

In [9]:
neighborhoods.rename(columns={'1':'Longitude','2':'Latitude'},inplace=True)

In [10]:
neighborhoods.head(2)

|   | Longitude | Latitude | Neighborhood |
|---|---|---|---|
| 0 | -82.606496 | 35.582754 | Lucerne Park |
| 1 | -82.608035 | 35.584233 | Lucerne Park |


Now let's reorder the columns so that latitude comes before longitude.

In [11]:
a_neigh=neighborhoods[['Neighborhood','Latitude','Longitude']]

In [12]:
a_neigh.head(2)

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Lucerne Park | 35.582754 | -82.606496 |
| 1 | Lucerne Park | 35.584233 | -82.608035 |


As explained above, the City of Asheville GIS data lists the coordinates of the polygon that outlines each neighborhood. To get an approximate center for each neighborhood, which will later be used to query the Foursquare API, I will group the rows by neighborhood with the groupby method and average the coordinates in each group. The result is a new dataframe called **neigh_coord** containing each neighborhood and its approximate coordinates.

In [13]:
neigh_coord=a_neigh.groupby(['Neighborhood']).mean().reset_index()

Let's look at the resulting dataframe, which will be used to plot a map of Asheville, NC, and later to get the restaurants in each neighborhood through the Foursquare API.

In [14]:
neigh_coord.head(5)

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Albemarle Park | 35.607864 | -82.543094 |
| 1 | Altamont Apts | 35.598241 | -82.551363 |
| 2 | Aston Park Tower | 35.589132 | -82.559754 |
| 3 | Ballantree | 35.511863 | -82.514638 |
| 4 | Bartlett Arms Apts | 35.583293 | -82.561437 |
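Averaging the boundary vertices is a quick approximation: vertices are rarely spaced evenly along a boundary, and GeoJSON rings repeat the first vertex as the last one, so the mean can drift away from the polygon's geometric center. As a hedged sketch, not the method used in this project, the area-weighted centroid can be computed with the shoelace formula. The helpers `ring_centroid` and `vertex_mean` are hypothetical, and they treat longitude and latitude as planar coordinates, which is a reasonable approximation for areas the size of a neighborhood:

```python
def ring_centroid(ring):
    """Area-weighted centroid of a closed ring of (x, y) points (shoelace formula).

    Assumes the ring is closed (last point equals first) and has non-zero area;
    coordinates are treated as planar.
    """
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def vertex_mean(ring):
    """Plain average of the listed vertices (what groupby().mean() computes)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


# A unit square with extra vertices crowded along its top edge, closed GeoJSON-style
ring = [(0, 0), (1, 0), (1, 1), (0.75, 1), (0.5, 1), (0.25, 1), (0, 1), (0, 0)]
print(ring_centroid(ring))  # (0.5, 0.5)
print(vertex_mean(ring))    # (0.4375, 0.625): pulled toward the crowded top edge
```

For 600-meter Foursquare searches the difference is usually small, but for irregular neighborhoods a geometry library such as Shapely (`Polygon.centroid`) would give the area-weighted center directly.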


Each row of **neigh_coord** should represent one neighborhood, so the number of rows (the header is not counted) should match the number of neighborhoods in the City of Asheville GIS data. Let's check the shape:

In [15]:
print('The shape of the dataframe is: ',neigh_coord.shape)

The shape of the dataframe is:  (71, 3)


Using the **Nominatim** geocoder from **geopy**, we can get the coordinates of Asheville, NC.

In [16]:
address = 'Asheville, NC'

geolocator = Nominatim(user_agent="nc_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Asheville City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Asheville City are 35.6009498, -82.5540161.


With these coordinates stored in variables, we can use **folium** to plot a map of Asheville, NC, and then use the **neigh_coord** dataframe to mark each neighborhood on it.

In [17]:
# Create a map of Asheville using folium
map_asheville = folium.Map(location=[latitude, longitude], zoom_start=11.5)

# add markers to map
for lat, lng, neigh in zip(neigh_coord['Latitude'], neigh_coord['Longitude'], neigh_coord['Neighborhood']):
    # show the neighborhood name when a marker is clicked
    label = folium.Popup(str(neigh), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_asheville)
    
map_asheville

## 3.3. Leveraging Foursquare API

To query Foursquare for the neighborhoods in **neigh_coord**, we first need to define the variables used to access the API:

In [18]:
# Replace the placeholders with your own Foursquare credentials; never publish real keys
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'  # Foursquare API version date

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)


Next, let's define the parameters for the Foursquare queries: a limit of 100 venues within a radius of 600 meters (roughly three city blocks).

In [19]:
LIMIT = 100
radius = 600

Now let's define a function to get the venues near the center of each of Asheville's neighborhoods.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=600):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
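Because every call to `getNearbyVenues` hits the network, it can help to test the parsing step on its own. The sketch below applies the same extraction to a hand-written dictionary that mimics only the keys the function reads (`response`, `groups`, `items`, `venue`); the payload, venue names, and coordinates are made up, and the structure is inferred from the function above rather than from the full API documentation. The hypothetical helper `parse_explore_items` also skips venues with an empty `categories` list, an assumption added here to avoid an `IndexError`:

```python
import pandas as pd

# Hypothetical, trimmed-down payload containing only the keys getNearbyVenues reads
fake_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Example Taqueria",
                               "location": {"lat": 35.6041, "lng": -82.5466},
                               "categories": [{"name": "Mexican Restaurant"}]}},
                    {"venue": {"name": "Example Bookshop",
                               "location": {"lat": 35.6050, "lng": -82.5470},
                               "categories": [{"name": "Bookstore"}]}},
                    {"venue": {"name": "Unlabeled Spot",
                               "location": {"lat": 35.6060, "lng": -82.5480},
                               "categories": []}},
                ]
            }
        ]
    }
}


def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore-style payload into rows shaped like getNearbyVenues output."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        if not venue["categories"]:  # skip venues without any category
            continue
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     venue["categories"][0]["name"]))
    return pd.DataFrame(rows, columns=["Neighborhood", "Neighborhood Latitude",
                                       "Neighborhood Longitude", "Venue",
                                       "Venue Latitude", "Venue Longitude",
                                       "Venue Category"])


venues = parse_explore_items(fake_payload, "Example Park", 35.6079, -82.5431)
print(venues[["Venue", "Venue Category"]])
```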

To get the venues for each neighborhood in Asheville, we call the function above with the names and coordinates from the **neigh_coord** dataframe.

In [21]:
asheville_venues = getNearbyVenues(names=neigh_coord['Neighborhood'],
                                   latitudes=neigh_coord['Latitude'],
                                   longitudes=neigh_coord['Longitude']
                                  )

Albemarle Park
Altamont Apts
Aston Park Tower
Ballantree
Bartlett Arms Apts
Beaverdam Run
Beverly Hills
Biltmore Park
Birch Forest
Blake Mountain
Brucemont/Louisiana 
Bull Mountain
Burton Street
Charlotte Street
Chestnut Hills
Cimarron
Cloister Condominiums
Crowfields Condominiums
Deaverview
Deerwood
Devonshire 
Downtown
East End/Valley Street
East West Asheville
Enka Village
Erskine-Walton
Falconhurst
Five Points
Gaia Village
Givens Estates
Grace
Grove Park/Sunset
Haw Creek
Heart of Chestnut Hills
Hillcrest
Hills of Beaverdam
Hillside Terrace
Hollybrook
Jackson Park
Kenilworth
Kenilworth Forest
Klondyke
Lake View Park
Lakeshore Heights
Lee Walker Heights
Livingston Heights
Lucerne Park
Malvern Hills
Montford
North Downtown
Norwood Park
Oak Forest
Oakhurst
Oakley
Park Avenue
Parkway Forest
Pebble Creek
Pisgah View 
Racquet Club Village
Redwood Forest
Royal Pines Village
Shiloh
South French Broad
South Oaks Townhomes
South Slope
Spears-Henrietta
The Views of Asheville
View Point
WECAN
W

Let's check the size of the **asheville_venues** dataframe and look at its first five rows:

In [22]:
print(asheville_venues.shape)
asheville_venues.head()

(1231, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Albemarle Park | 35.607864 | -82.543094 | Ultimate Ice Cream Charlotte St. | 35.606444 | -82.546105 | Ice Cream Shop |
| 1 | Albemarle Park | 35.607864 | -82.543094 | Charlotte Street Computers | 35.608181 | -82.546275 | Electronics Store |
| 2 | Albemarle Park | 35.607864 | -82.543094 | Princess Anne Hotel | 35.602668 | -82.544241 | Bed & Breakfast |
| 3 | Albemarle Park | 35.607864 | -82.543094 | Taco Temple | 35.603995 | -82.546819 | Mexican Restaurant |
| 4 | Albemarle Park | 35.607864 | -82.543094 | Gàn Shān Station | 35.604543 | -82.546518 | Asian Restaurant |


The function returns many venue categories besides restaurants, but this project focuses only on restaurants. Identifying the most and least common restaurant types will help recommend the neighborhood in which to open a new restaurant and the type of cuisine it should offer. To do so, let's filter the **asheville_venues** dataframe to keep only venues whose category contains the word *restaurant*, and assign the result to a new dataframe called **df_restaurants**. Note that food venues whose category names do not include this word, such as *Pizza Place* or *Burger Joint*, are excluded by this filter.

In [23]:
df_restaurants= asheville_venues[asheville_venues['Venue Category'].str.lower().str.contains('restaurant')]
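A small sketch with made-up venues shows what this filter keeps and drops: the match is a case-insensitive substring test, so the generic *Restaurant* category is kept while a food venue such as *Pizza Place* is dropped:

```python
import pandas as pd

# Hypothetical venue categories mixing restaurants and other venue types
venues = pd.DataFrame({
    "Venue": ["A", "B", "C", "D", "E"],
    "Venue Category": ["Mexican Restaurant", "Ice Cream Shop", "Restaurant",
                       "Pizza Place", "FAST FOOD RESTAURANT"],
})

# Same idea as the cell above: lower-case the category, then test for the substring
is_restaurant = venues["Venue Category"].str.lower().str.contains("restaurant")
restaurants = venues[is_restaurant]
print(restaurants["Venue"].tolist())  # ['A', 'C', 'E']
```

`str.contains("restaurant", case=False)` would be an equivalent way to write the same test.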

In [24]:
df_restaurants.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Albemarle Park | 2 | 2 | 2 | 2 | 2 | 2 |
| Altamont Apts | 26 | 26 | 26 | 26 | 26 | 26 |
| Aston Park Tower | 1 | 1 | 1 | 1 | 1 | 1 |
| Bartlett Arms Apts | 2 | 2 | 2 | 2 | 2 | 2 |
| Birch Forest | 10 | 10 | 10 | 10 | 10 | 10 |
| Blake Mountain | 3 | 3 | 3 | 3 | 3 | 3 |
| Brucemont/Louisiana | 7 | 7 | 7 | 7 | 7 | 7 |
| Bull Mountain | 1 | 1 | 1 | 1 | 1 | 1 |
| Burton Street | 8 | 8 | 8 | 8 | 8 | 8 |
| Charlotte Street | 2 | 2 | 2 | 2 | 2 | 2 |


We can also count how many unique venue categories containing the word *restaurant* are in the **df_restaurants** dataframe.

In [25]:
print('There are {} unique categories.'.format(df_restaurants['Venue Category'].nunique()))

There are 28 unique categories.


## 3.4. Analyze Each of Asheville's Neighborhoods

To analyze each neighborhood, I will use one-hot encoding to represent the restaurant venues as a matrix of 0s and 1s, as follows:

In [26]:
# one hot encoding
asheville_onehot = pd.get_dummies(df_restaurants[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
asheville_onehot['Neighborhood'] = df_restaurants['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [asheville_onehot.columns[-1]] + list(asheville_onehot.columns[:-1])
asheville_onehot = asheville_onehot[fixed_columns]

asheville_onehot.head()

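|   | Neighborhood | American Restaurant | Asian Restaurant | Brazilian Restaurant | Cajun / Creole Restaurant | Caribbean Restaurant | Chinese Restaurant | Falafel Restaurant | Fast Food Restaurant | French Restaurant | German Restaurant | Greek Restaurant | Indian Restaurant | Italian Restaurant | Korean Restaurant | Mediterranean Restaurant | Mexican Restaurant | Middle Eastern Restaurant | Modern European Restaurant | New American Restaurant | Restaurant | Seafood Restaurant | Southern / Soul Food Restaurant | Spanish Restaurant | Sushi Restaurant | Tex-Mex Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Albemarle Park | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Albemarle Park | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | Altamont Apts | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | Altamont Apts | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | Altamont Apts | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |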
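To make the encoding concrete, here is a minimal sketch on three made-up venues: `pd.get_dummies` creates one column per category (sorted alphabetically), and each row has a single 1 marking that venue's category. The explicit `dtype=int` is an addition here, since recent pandas versions return booleans by default:

```python
import pandas as pd

# Hypothetical restaurants: one row per venue
restaurants = pd.DataFrame({
    "Neighborhood": ["North", "North", "South"],
    "Venue Category": ["Mexican Restaurant", "Asian Restaurant", "Mexican Restaurant"],
})

# One column per category; prefix="" and prefix_sep="" keep the plain category names
onehot = pd.get_dummies(restaurants[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", restaurants["Neighborhood"])
print(onehot)
```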


Exploring the new dataframe, we can see a column named *Restaurant*. Since it does not specify the type of restaurant, it provides no useful information for this project, so I will drop it from the dataframe.

In [27]:
asheville_onehot.shape

(237, 29)

In [28]:
asheville_onehot.drop(['Restaurant'],axis=1,inplace=True)
asheville_onehot.shape

(237, 28)

Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of each restaurant category within the neighborhood:

In [29]:
asheville_grouped = asheville_onehot.groupby('Neighborhood').mean().reset_index()
asheville_grouped

|   | Neighborhood | American Restaurant | Asian Restaurant | Brazilian Restaurant | Cajun / Creole Restaurant | Caribbean Restaurant | Chinese Restaurant | Falafel Restaurant | Fast Food Restaurant | French Restaurant | German Restaurant | Greek Restaurant | Indian Restaurant | Italian Restaurant | Korean Restaurant | Mediterranean Restaurant | Mexican Restaurant | Middle Eastern Restaurant | Modern European Restaurant | New American Restaurant | Seafood Restaurant | Southern / Soul Food Restaurant | Spanish Restaurant | Sushi Restaurant | Tex-Mex Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albemarle Park | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Altamont Apts | 0.115385 | 0.0 | 0.038462 | 0.038462 | 0.0 | 0.0 | 0.0 | 0.038462 | 0.076923 | 0.0 | 0.038462 | 0.115385 | 0.076923 | 0.038462 | 0.0 | 0.076923 | 0.038462 | 0.0 | 0.0 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.0 | 0.0 | 0.153846 | 0.0 |
| 2 | Aston Park Tower | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Bartlett Arms Apts | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Birch Forest | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Blake Mountain | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Brucemont/Louisiana | 0.142857 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.428571 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.142857 | 0.0 | 0.0 | 0.142857 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Bull Mountain | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Burton Street | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 | 0.125 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Charlotte Street | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
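The mean of a 0/1 column within a group is simply the share of that group's venues falling in the category. A minimal sketch with made-up venues shows the arithmetic: the North neighborhood below has four restaurants, two of them Mexican, so its Mexican share is 2/4 = 0.5:

```python
import pandas as pd

# Hypothetical restaurants: four in "North", one in "South"
restaurants = pd.DataFrame({
    "Neighborhood": ["North", "North", "North", "North", "South"],
    "Venue Category": ["Mexican Restaurant", "Mexican Restaurant",
                       "Asian Restaurant", "Thai Restaurant", "Thai Restaurant"],
})
onehot = pd.get_dummies(restaurants[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", restaurants["Neighborhood"])

# Averaging the 0/1 columns per neighborhood gives each category's share of venues
shares = onehot.groupby("Neighborhood").mean()
print(shares)
```

Here each row sums to 1. In the table above that is not always true: after the generic *Restaurant* column was dropped, venues in that category contribute only zeros, so, for example, the Bartlett Arms Apts row sums to 0.5.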


We can now print the three most common restaurant types in each of Asheville's neighborhoods.

In [30]:
num_top_restaurants = 3

for hood in asheville_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = asheville_grouped[asheville_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_restaurants))
    print('\n')

----Albemarle Park----
                 venue  freq
0   Mexican Restaurant   0.5
1     Asian Restaurant   0.5
2  American Restaurant   0.0


----Altamont Apts----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.15
1            American Restaurant  0.12
2              Indian Restaurant  0.12


----Aston Park Tower----
                      venue  freq
0          Sushi Restaurant   1.0
1       American Restaurant   0.0
2  Mediterranean Restaurant   0.0


----Bartlett Arms Apts----
                        venue  freq
0  Modern European Restaurant   0.5
1         American Restaurant   0.0
2    Mediterranean Restaurant   0.0


----Birch Forest----
                  venue  freq
0    Mexican Restaurant   0.3
1  Fast Food Restaurant   0.2
2   American Restaurant   0.1


----Blake Mountain----
                  venue  freq
0  Fast Food Restaurant  0.67
1    Chinese Restaurant  0.33
2   American Restaurant  0.00


----Brucemont/Louisiana ----
                     venue

Using this information, we can build a pandas dataframe. First, let's define a function that sorts a neighborhood's restaurant types in descending order of frequency and returns the top ones.

In [31]:
def return_most_common_venues(row, num_top_restaurants):
    # skip the 'Neighborhood' column and sort the category frequencies from highest to lowest
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_restaurants]

Now let's create the new dataframe and display the top 3 restaurant types for each neighborhood.

In [32]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Restaurant'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Restaurant'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = asheville_grouped['Neighborhood']

for ind in np.arange(asheville_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(asheville_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(5)

|   | Neighborhood | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|
| 0 | Albemarle Park | Asian Restaurant | Mexican Restaurant | Vietnamese Restaurant |
| 1 | Altamont Apts | Vegetarian / Vegan Restaurant | American Restaurant | Indian Restaurant |
| 2 | Aston Park Tower | Sushi Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 3 | Bartlett Arms Apts | Modern European Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 4 | Birch Forest | Mexican Restaurant | Fast Food Restaurant | American Restaurant |
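In this table, the slots after the last category that actually occurs in a neighborhood are filled with categories whose frequency is 0. For example, Aston Park Tower's 2nd and 3rd entries come from its 0.0 columns in the grouped table, so they are not really "common". A hedged variant, `top_categories` (a hypothetical helper, not part of the original notebook), pads those slots with `NaN` instead; the row below is made up:

```python
import numpy as np
import pandas as pd

# Hypothetical category shares for one neighborhood (shaped like a row of asheville_grouped)
row = pd.Series({"Neighborhood": "Example Park",
                 "Asian Restaurant": 0.25, "Mexican Restaurant": 0.75,
                 "Thai Restaurant": 0.0, "Vietnamese Restaurant": 0.0})


def top_categories(row, n):
    """Return the n most frequent categories, padding with NaN instead of
    listing categories that never occur (frequency 0)."""
    shares = row.iloc[1:].astype(float)            # skip the neighborhood name
    shares = shares[shares > 0].sort_values(ascending=False)
    top = list(shares.index[:n])
    return top + [np.nan] * (n - len(top))


print(top_categories(row, 3))  # ['Mexican Restaurant', 'Asian Restaurant', nan]
```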


## 3.5. Final Dataframe

The following steps summarize the results above in a single dataframe, showing which neighborhoods in Asheville, NC have the most restaurants and which types of cuisine they offer.

First, let's create a new dataframe called **df_1** containing the total number of restaurants returned by the Foursquare query for each neighborhood. Then, let's rename its columns so that they do not clash with those of the **neighborhoods_venues_sorted** dataframe when the two are combined.

In [33]:
df_1=df_restaurants[['Neighborhood','Venue']].groupby('Neighborhood').count().reset_index()
df_1.rename(columns={'Neighborhood':'nbname','Venue':'Number of Restaurants'},inplace=True)

Let's check the head of this new **df_1**:

In [34]:
df_1.head(5)

|   | nbname | Number of Restaurants |
|---|---|---|
| 0 | Albemarle Park | 2 |
| 1 | Altamont Apts | 26 |
| 2 | Aston Park Tower | 1 |
| 3 | Bartlett Arms Apts | 2 |
| 4 | Birch Forest | 10 |


Now, let's merge **df_1** and **neighborhoods_venues_sorted** into a final dataframe called **df_result**:

In [35]:
# match rows on the neighborhood name rather than on row position
df_result = df_1.merge(neighborhoods_venues_sorted, left_on='nbname', right_on='Neighborhood')

Let's check the first rows of **df_result** to confirm that the neighborhood names match between the two dataframes:

In [36]:
df_result.head(5)

|   | nbname | Number of Restaurants | Neighborhood | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|---|---|
| 0 | Albemarle Park | 2 | Albemarle Park | Asian Restaurant | Mexican Restaurant | Vietnamese Restaurant |
| 1 | Altamont Apts | 26 | Altamont Apts | Vegetarian / Vegan Restaurant | American Restaurant | Indian Restaurant |
| 2 | Aston Park Tower | 1 | Aston Park Tower | Sushi Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 3 | Bartlett Arms Apts | 2 | Bartlett Arms Apts | Modern European Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 4 | Birch Forest | 10 | Birch Forest | Mexican Restaurant | Fast Food Restaurant | American Restaurant |
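Combining the two dataframes with `pd.concat(..., axis=1)` would pair rows by position, which only gives correct results when both list the neighborhoods in the same order (true here, since both come from a `groupby` on the neighborhood name). A small sketch with made-up data shows why matching on the name with `merge` is the safer choice:

```python
import pandas as pd

counts = pd.DataFrame({"nbname": ["North", "South"], "Number of Restaurants": [4, 1]})
# Same hypothetical neighborhoods, listed in a different order
top = pd.DataFrame({"Neighborhood": ["South", "North"],
                    "1st Most Common Restaurant": ["Thai Restaurant", "Mexican Restaurant"]})

# Positional concat pairs row 0 with row 0, regardless of the names
by_position = pd.concat([counts, top], axis=1)
# A key-based merge pairs rows whose names match
by_key = counts.merge(top, left_on="nbname", right_on="Neighborhood")

print((by_position["nbname"] == by_position["Neighborhood"]).all())  # False
print((by_key["nbname"] == by_key["Neighborhood"]).all())            # True
```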


Finally, let's drop the duplicate name column and use the neighborhood name as the index:

In [37]:
df_result.drop(columns=['nbname'], inplace=True)
df_result.set_index('Neighborhood', inplace=True)

Here are the top 5 neighborhoods by number of restaurants, along with the most common types of cuisine in each one:

In [38]:
df_result.sort_values(by=['Number of Restaurants'],ascending=False,inplace=True)
df_result.head(5)

| Neighborhood | Number of Restaurants | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|
| Altamont Apts | 26 | Vegetarian / Vegan Restaurant | American Restaurant | Indian Restaurant |
| Downtown | 18 | American Restaurant | Mexican Restaurant | Indian Restaurant |
| South Slope | 12 | American Restaurant | Mexican Restaurant | Seafood Restaurant |
| East End/Valley Street | 12 | American Restaurant | Mexican Restaurant | Greek Restaurant |
| Birch Forest | 10 | Mexican Restaurant | Fast Food Restaurant | American Restaurant |


## 3.6. Cluster Neighborhoods using KMeans

To finish the project, let's cluster Asheville's neighborhoods with the K-Means algorithm to identify potential customers' cuisine preferences in each cluster. For this analysis, let's set *K = 5*:

In [39]:
# set number of clusters
kclusters = 5

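# positional axis arguments to drop() were removed in pandas 2.0
asheville_grouped_clustering = asheville_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 set explicitly to avoid version-dependent defaults)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(asheville_grouped_clustering)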

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 1, 1, 2, 1, 0, 1, 0, 1, 3], dtype=int32)
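The project fixes K = 5 without comparing alternatives. One common, hedged way to sanity-check such a choice is the silhouette score from scikit-learn (`sklearn.metrics.silhouette_score`), computed for several values of K; in the notebook, `X` would be `asheville_grouped_clustering`, but the sketch below uses made-up points arranged in three tight groups so that the result is easy to verify:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three tight, well-separated groups of points
X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
    [10.0, 10.0], [10.0, 10.1], [10.1, 10.0],
    [20.0, 0.0], [20.0, 0.1], [20.1, 0.0],
])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # closer to 1 means better-separated clusters

best_k = max(scores, key=scores.get)
print(scores)
print('Best K:', best_k)  # Best K: 3
```

On real, less clearly separated data the scores are often close to each other, so they are best read as one input to the choice of K rather than a definitive answer.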

Now we can create a new dataframe called **asheville_merged** containing the cluster labels and the three most common cuisine types for each neighborhood.

In [40]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

asheville_merged = neigh_coord

# join the cluster labels and top restaurant types to neigh_coord, which holds each neighborhood's latitude/longitude
asheville_merged = asheville_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

asheville_merged.head(5) 

|   | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|---|---|---|
| 0 | Albemarle Park | 35.607864 | -82.543094 | 3.0 | Asian Restaurant | Mexican Restaurant | Vietnamese Restaurant |
| 1 | Altamont Apts | 35.598241 | -82.551363 | 1.0 | Vegetarian / Vegan Restaurant | American Restaurant | Indian Restaurant |
| 2 | Aston Park Tower | 35.589132 | -82.559754 | 1.0 | Sushi Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 3 | Ballantree | 35.511863 | -82.514638 | NaN | NaN | NaN | NaN |
| 4 | Bartlett Arms Apts | 35.583293 | -82.561437 | 2.0 | Modern European Restaurant | Vietnamese Restaurant | Italian Restaurant |


The dataframe above shows NaN entries for the Ballantree neighborhood. This is most likely because the neighborhood is mostly residential, so the Foursquare query returned no restaurants with the parameters set previously. Since rows like this provide no relevant data for this project, I will drop the NaN values. First, let's confirm the shape of **asheville_merged**:

In [41]:
asheville_merged.shape

(71, 7)

Now let's drop the NaN values and confirm the new shape:

In [42]:
cluster_df=asheville_merged.dropna()

In [43]:
cluster_df.shape

(49, 7)

Now let's visualize the clusters:

In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(cluster_df['Latitude'], cluster_df['Longitude'], cluster_df['Neighborhood'], cluster_df['Cluster Labels']):
    # cluster labels are floats here (the NaN rows made the column float), so cast to int
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 3.6.1. Examine Clusters

#### 3.6.1.1. Cluster 1 (Cluster Label 0)

In [45]:
clus_1=cluster_df.loc[cluster_df['Cluster Labels'] == 0]
clus_1

|   | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|---|---|---|
| 9 | Blake Mountain | 35.497273 | -82.52786 | 0.0 | Fast Food Restaurant | Chinese Restaurant | Vietnamese Restaurant |
| 11 | Bull Mountain | 35.592481 | -82.496277 | 0.0 | Fast Food Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 41 | Klondyke | 35.611654 | -82.572316 | 0.0 | Fast Food Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 63 | South Oaks Townhomes | 35.508555 | -82.531077 | 0.0 | Fast Food Restaurant | American Restaurant | Italian Restaurant |


In [46]:
print('The cluster shape is: ',clus_1.shape)

The cluster shape is:  (4, 7)


#### 3.6.1.2. Cluster 2

In [47]:
clus_2=cluster_df.loc[cluster_df['Cluster Labels'] == 1]
clus_2

| | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|---|---|---|
| 1 | Altamont Apts | 35.598241 | -82.551363 | 1.0 | Vegetarian / Vegan Restaurant | American Restaurant | Indian Restaurant |
| 2 | Aston Park Tower | 35.589132 | -82.559754 | 1.0 | Sushi Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 8 | Birch Forest | 35.502952 | -82.527538 | 1.0 | Mexican Restaurant | Fast Food Restaurant | American Restaurant |
| 10 | Brucemont/Louisiana | 35.5819 | -82.589464 | 1.0 | Fast Food Restaurant | American Restaurant | New American Restaurant |
| 12 | Burton Street | 35.584289 | -82.585827 | 1.0 | Fast Food Restaurant | Mexican Restaurant | Chinese Restaurant |
| 14 | Chestnut Hills | 35.606203 | -82.55142 | 1.0 | Fast Food Restaurant | Asian Restaurant | Mexican Restaurant |
| 17 | Crowfields Condominiums | 35.504588 | -82.529527 | 1.0 | Fast Food Restaurant | Mexican Restaurant | Asian Restaurant |
| 19 | Deerwood | 35.516957 | -82.525068 | 1.0 | American Restaurant | Fast Food Restaurant | Mexican Restaurant |
| 20 | Devonshire | 35.518469 | -82.5254 | 1.0 | American Restaurant | Fast Food Restaurant | Mexican Restaurant |
| 21 | Downtown | 35.593439 | -82.551936 | 1.0 | American Restaurant | Mexican Restaurant | Indian Restaurant |


In [48]:
print('The cluster shape is: ',clus_2.shape)

The cluster shape is:  (36, 7)


#### 3.6.1.3. Cluster 3

In [49]:
clus_3=cluster_df.loc[cluster_df['Cluster Labels'] == 2]
clus_3

| | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|---|---|---|
| 4 | Bartlett Arms Apts | 35.583293 | -82.561437 | 2.0 | Modern European Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 25 | Erskine-Walton | 35.577399 | -82.562636 | 2.0 | Modern European Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 62 | South French Broad | 35.584998 | -82.559766 | 2.0 | Modern European Restaurant | Vietnamese Restaurant | Italian Restaurant |


In [50]:
print('The cluster shape is: ',clus_3.shape)

The cluster shape is:  (3, 7)


#### 3.6.1.4. Cluster 4

In [51]:
clus_4=cluster_df.loc[cluster_df['Cluster Labels'] == 3]
clus_4

| | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|---|---|---|
| 0 | Albemarle Park | 35.607864 | -82.543094 | 3.0 | Asian Restaurant | Mexican Restaurant | Vietnamese Restaurant |
| 13 | Charlotte Street | 35.603519 | -82.544806 | 3.0 | Asian Restaurant | Mexican Restaurant | Vietnamese Restaurant |
| 47 | Malvern Hills | 35.569925 | -82.612753 | 3.0 | Mexican Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 55 | Parkway Forest | 35.581453 | -82.4811 | 3.0 | Mexican Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 60 | Royal Pines Village | 35.469727 | -82.516636 | 3.0 | Chinese Restaurant | Mexican Restaurant | Vietnamese Restaurant |


In [52]:
print('The cluster shape is: ',clus_4.shape)

The cluster shape is:  (5, 7)


#### 3.6.1.5. Cluster 5

In [53]:
clus_5=cluster_df.loc[cluster_df['Cluster Labels'] == 4]
clus_5

| | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant |
|---|---|---|---|---|---|---|---|
| 51 | Oak Forest | 35.494335 | -82.527777 | 4.0 | Vietnamese Restaurant | Italian Restaurant | Asian Restaurant |


In [54]:
print('The cluster shape is: ',clus_5.shape)

The cluster shape is:  (1, 7)
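The five cells above repeat the same filter-and-shape pattern. A more compact sketch loops over the labels with `groupby`; the frame below copies only the rows of clusters 1, 3 and 5 from the tables above, so its counts match those printed shapes. Note that the headings number clusters from 1 while `Cluster Labels` starts at 0.

```python
import pandas as pd

# Rows copied from the Cluster 1, 3 and 5 tables above (labels are 0-based floats)
sample_clusters = pd.DataFrame({
    'Neighborhood': ['Blake Mountain', 'Bull Mountain', 'Klondyke', 'South Oaks Townhomes',
                     'Bartlett Arms Apts', 'Erskine-Walton', 'South French Broad',
                     'Oak Forest'],
    'Cluster Labels': [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 4.0],
})

# One pass over all clusters instead of one cell per cluster
sizes = {}
for label, group in sample_clusters.groupby('Cluster Labels'):
    heading_number = int(label) + 1  # +1 to match the "Cluster N" headings
    sizes[heading_number] = group.shape[0]
    print('Cluster', heading_number, 'shape is:', group.shape)
```

Applied to the full **cluster_df**, the same loop would report the 4, 36, 3, 5 and 1 neighborhoods shown in the shapes above.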


# 4. Results & Discussion

Based on the table **df_result**, the neighborhoods with the most restaurants in Asheville are Altamont Apts with 26 restaurants, Downtown with 18, South Slope with 12, East End / Valley Street with 12, and Birch Forest with 10. The most common types of cuisine in these neighborhoods are American and Mexican.
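The ranking above comes from **df_result**, which is built earlier in the notebook. As a hedged sketch, assuming a venue-level table with one row per restaurant (the `Venue` column and the rows below are hypothetical), the restaurant counts and the top neighborhoods can be obtained with `value_counts`:

```python
import pandas as pd

# Hypothetical venue-level table: one row per restaurant returned by the Foursquare query
venues = pd.DataFrame({
    'Neighborhood': ['Downtown', 'Downtown', 'Downtown', 'South Slope', 'South Slope', 'Birch Forest'],
    'Venue': ['Venue A', 'Venue B', 'Venue C', 'Venue D', 'Venue E', 'Venue F'],
})

# Count restaurants per neighborhood; value_counts sorts in descending order of count
restaurant_counts = venues['Neighborhood'].value_counts()

# Keep the neighborhoods with the most restaurants (the analysis uses the top 5)
top = restaurant_counts.head(5)
print(top)
```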

It is worth mentioning that the most common restaurant type in Altamont Apts, the neighborhood with the most restaurants, is Vegetarian / Vegan, and this type of cuisine is not found in any other neighborhood on the top 5 list.

Clustering the neighborhoods with k-means, I found that cluster number 2 contains all of the top 5 neighborhoods. This cluster includes 36 neighborhoods with similar restaurant profiles according to the algorithm.
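Which cluster holds the top neighborhoods can be checked directly by filtering on their names. The minimal sketch below uses only neighborhood/label pairs printed in the cluster tables, so of the top 5 it includes just Altamont Apts, Downtown and Birch Forest (the rows for South Slope and East End / Valley Street are not printed above).

```python
import pandas as pd

# Neighborhood/label pairs copied from the cluster tables above (0-based labels)
sample_clusters = pd.DataFrame({
    'Neighborhood': ['Altamont Apts', 'Birch Forest', 'Downtown',
                     'Blake Mountain', 'Bartlett Arms Apts', 'Albemarle Park', 'Oak Forest'],
    'Cluster Labels': [1.0, 1.0, 1.0, 0.0, 2.0, 3.0, 4.0],
})

top_neighborhoods = ['Altamont Apts', 'Downtown', 'Birch Forest']

# Distinct cluster labels among the top neighborhoods
in_top = sample_clusters['Neighborhood'].isin(top_neighborhoods)
labels = sample_clusters.loc[in_top, 'Cluster Labels'].unique()

# A single label means they all fall in the same cluster; +1 converts to the 1-based headings
if len(labels) == 1:
    print('All top neighborhoods are in cluster', int(labels[0]) + 1)
```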

Across the clusters, American cuisine is one of the most popular types. Clusters 1, 3, 4 and 5 each include five or fewer neighborhoods. The most popular cuisine is Fast Food for cluster 1, Modern European for cluster 3, Mexican and Asian for cluster 4, and Vietnamese for cluster 5.
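The most popular cuisine per cluster can be read off as the mode of the `1st Most Common Restaurant` column within each cluster. In the sketch below, which copies the complete tables for clusters 1, 3, 4 and 5, `mode()` returns every tied value, which is why cluster 4 yields both Asian and Mexican.

```python
import pandas as pd

# 1st Most Common Restaurant per neighborhood, copied from the tables for clusters 1, 3, 4 and 5
sample_clusters = pd.DataFrame({
    'Cluster Labels': [0, 0, 0, 0, 2, 2, 2, 3, 3, 3, 3, 3, 4],
    '1st Most Common Restaurant': [
        'Fast Food Restaurant', 'Fast Food Restaurant', 'Fast Food Restaurant', 'Fast Food Restaurant',
        'Modern European Restaurant', 'Modern European Restaurant', 'Modern European Restaurant',
        'Asian Restaurant', 'Asian Restaurant', 'Mexican Restaurant', 'Mexican Restaurant',
        'Chinese Restaurant',
        'Vietnamese Restaurant',
    ],
})

# mode() keeps all tied values, so each cluster maps to a list of cuisines
top_cuisines = {
    int(label) + 1: group['1st Most Common Restaurant'].mode().tolist()  # +1 for the 1-based headings
    for label, group in sample_clusters.groupby('Cluster Labels')
}
for cluster, cuisines in top_cuisines.items():
    print('Cluster', cluster, ':', ', '.join(cuisines))
```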

# 5. Conclusion

The purpose of this project was to identify the restaurants in all of the official neighborhoods registered in Asheville, NC, in order to help investors or entrepreneurs looking to open a new restaurant in the city. Based on the number of restaurants already established and the most common types of cuisine among them, suitable neighborhoods for a new restaurant would be Altamont Apts, Downtown or South Slope.

Opening a Vegetarian / Vegan restaurant in Downtown or South Slope would also be worth considering: it is the most common type of cuisine in Altamont Apts, the neighborhood with the most restaurants in Asheville, but it is not commonly found in other neighborhoods. A restaurant serving American cuisine also appears to be a safe bet for an investor.
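Whether a cuisine appears among a neighborhood's three most common restaurant types can be checked across the ranking columns at once. A minimal sketch using the three top-5 neighborhoods whose rows are printed in the Cluster 2 table; `neighborhoods_with` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Rows copied from the Cluster 2 table above (top-3 restaurant types per neighborhood)
top_rows = pd.DataFrame({
    'Neighborhood': ['Altamont Apts', 'Birch Forest', 'Downtown'],
    '1st Most Common Restaurant': ['Vegetarian / Vegan Restaurant', 'Mexican Restaurant', 'American Restaurant'],
    '2nd Most Common Restaurant': ['American Restaurant', 'Fast Food Restaurant', 'Mexican Restaurant'],
    '3rd Most Common Restaurant': ['Indian Restaurant', 'American Restaurant', 'Indian Restaurant'],
})

rank_cols = ['1st Most Common Restaurant', '2nd Most Common Restaurant', '3rd Most Common Restaurant']

def neighborhoods_with(cuisine):
    """Hypothetical helper: neighborhoods where `cuisine` is one of the three most common types."""
    has_cuisine = top_rows[rank_cols].isin([cuisine]).any(axis=1)
    return top_rows.loc[has_cuisine, 'Neighborhood'].tolist()

vegan = neighborhoods_with('Vegetarian / Vegan Restaurant')
american = neighborhoods_with('American Restaurant')
print(vegan)
print(american)
```

Vegetarian / Vegan shows up only in Altamont Apts, while American cuisine appears in all three rows, which is consistent with the two recommendations above.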

Clustering was then performed on these neighborhoods to group them by their restaurant profiles, as a way to segment potential customers by their preferences in cuisine. Cluster number 2, which includes all of the top 5 neighborhoods, contains other potential locations for a new restaurant: it includes 36 neighborhoods in total, 31 of them outside the top 5. American cuisine is among the most common types in this cluster.

The final location and type of cuisine will depend on the investor's or entrepreneur's preferences, taking into account other factors such as the ease of obtaining operating permits, real estate availability and prices, and the social and economic dynamics of each neighborhood.