# Memphis restaurant hunt


## Introduction
### 1.1 Background
Memphis is a city on the Mississippi River in southwest Tennessee, famous for the influential strains of blues, soul and rock 'n' roll that originated there. Elvis Presley, B.B. King and Johnny Cash recorded albums at the legendary Sun Studio, and Presley's Graceland mansion is a popular attraction. The city itself has a population of 646,889. Memphis played a prominent role in the American civil rights movement and was the site of Martin Luther King Jr.'s 1968 assassination. It is a regional center for commerce, education, media, art, and entertainment. Memphis has many restaurants spanning categories such as Chinese, Italian and French. As part of this project, we will list and visualize the major restaurants of Memphis.

### 1.2 Business Problem
How can entrepreneurs better understand the city's demographics and restaurant landscape before opening a new restaurant?\
To address this business problem, we will cluster Memphis neighborhoods in order to recommend where entrepreneurs can open new venues. We will recommend areas and highlight the facilities surrounding such venues, such as elementary schools, high schools, hospitals and grocery stores.\
To explore and target the recommended locations, we will access data through the Foursquare API and arrange it in a DataFrame for visualization.\
Questions that can be asked using the above-mentioned datasets:\
What area has the highest number of restaurants in Memphis?\
Which areas have fewer restaurants?\
Which areas have higher populations?\
What is the best location in Memphis for an Italian restaurant?

## Data sources 
To explore and target recommended locations for Memphis restaurants, we will access data through the Foursquare API and arrange it in a DataFrame for visualization.\
For this project we need the following data:\
Data source: Foursquare API: https://developer.foursquare.com/ \
Description: Memphis restaurant data containing the locality, restaurant name and rating, along with latitude and longitude.\
Data source: https://www.zip-codes.com/city/tn-memphis.asp#demographics \
Description: From this source I will obtain demographic information for the area around the restaurants in each locality.

In [14]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

#### Download, scrape, convert into a DataFrame and clean the data
Not all of the data in the scraped table will be used in later work.\
I dropped the columns with irrelevant information: Country, Area Code and Type.\
Because the scraped values are stored in the DataFrame as objects (strings), we need to remove the "," from the Population column and convert the values to integers so that we can do calculations later.

In [2]:
List_url = "https://www.zip-codes.com/city/tn-memphis.asp#demographics"
source = requests.get(List_url).text

soup = BeautifulSoup(source, 'html.parser')
table = soup.find('table', {"class": "statTable"})
# Collect one dict per table row and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = []
for row in table.find_all("tr"):
    col = row.find_all("td")
    if len(col) < 5:
        continue  # skip rows that do not have the five expected cells
    rows.append({
        "Zipcode": col[0].text.strip("ZIP Code"),
        "Type": col[1].text,
        "Country": col[2].text,
        "Population": col[3].text,
        "Area Code": col[4].text,
    })
memphis_data = pd.DataFrame(rows, columns=["Zipcode", "Type", "Country", "Population", "Area Code"])
# drop the header row that was scraped along with the data
memphis_data = memphis_data[memphis_data['Type'] != 'Type']
M_data = memphis_data.drop(columns=['Type', 'Country', 'Area Code'])
M_data = M_data.replace(',', '', regex=True)
M_data = M_data.astype(int)
M_data = M_data[M_data['Population'] !=0]

M_data.head()

Unnamed: 0,Zipcode,Population
4,38103,12180
5,38104,23409
6,38105,6184
7,38106,27222
8,38107,17698


### Now I will get the latitude and longitude coordinates of each ZIP code area, which serves as the neighborhood unit.

In [3]:
ZIP = pd.read_csv('https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/?format=csv&timezone=Europe/Helsinki&lang=en&use_labels_for_header=true&csv_separator=%3B', sep=';')
ZIP.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,48834,Fenwick,MI,43.141649,-85.04948,-5,1,"43.141649,-85.04948"
1,55304,Andover,MN,45.254715,-93.28652,-6,1,"45.254715,-93.28652"
2,55422,Minneapolis,MN,45.014764,-93.33965,-6,1,"45.014764,-93.33965"
3,29079,Lydia,SC,34.296064,-80.11319,-5,1,"34.296064,-80.11319"
4,29390,Duncan,SC,34.888237,-81.96902,-5,1,"34.888237,-81.96902"


### The dataframe consists of the columns: Zip, City, State, Latitude, Longitude, Timezone, Daylight savings time flag, geopoint.
### For the purposes of this assignment I will drop the columns I do not need: City, State, geopoint, Timezone, Daylight savings time flag.
### Then I will merge the geo data with the population data on the ZIP code.

In [4]:
ZIP1 = ZIP.drop(columns=['geopoint', 'Timezone', 'Daylight savings time flag', 'State', 'City'])
ZIP1.rename(columns={'Zip':'Zipcode'},inplace=True)
geo_merged = pd.merge(ZIP1, M_data, on='Zipcode')
geo_data=geo_merged[['Zipcode','Population','Latitude','Longitude']]
geo_data.head()


Unnamed: 0,Zipcode,Population,Latitude,Longitude
0,38116,40404,35.03319,-90.01128
1,38111,41742,35.10935,-89.94363
2,38141,22462,35.016803,-89.84701
3,38126,7334,35.126469,-90.04359
4,38119,22330,35.082936,-89.84892


The merged table keeps only the Memphis ZIP codes that appear in both the population data and the coordinates data.



## Problem 3
### Explore and cluster the neighborhoods in Memphis.


In [11]:
import numpy as np
import os
from sklearn.cluster import KMeans
!pip install folium
import folium 
from geopy.geocoders import Nominatim 
import matplotlib.cm as cm
import matplotlib.colors as colors




In [12]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'

### Now I will create a list and a dataframe of the venues within a radius of 500 meters of each ZIP code's coordinates.

In [76]:
def getNearbyVenues(names, latitudes, longitudes):
    radius=500
    LIMIT=100
    search_query = 'Italian Restaurant'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            search_query,
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
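
The request URL above is assembled with `str.format`, so the space in `'Italian Restaurant'` goes into the string unescaped and the HTTP library is relied on to encode it. As a minimal sketch of an alternative, assuming the same endpoint and parameter names used in `getNearbyVenues`, the query string can be built with the standard library's `urlencode`. The helper name `build_explore_url` and the placeholder credentials are hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode

# Same endpoint as in getNearbyVenues above
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, query, lat, lng,
                      radius=500, limit=100):
    """Hypothetical helper: return the explore URL with an encoded query string."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'query': query,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes spaces, commas and other reserved characters
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials; coordinates of ZIP code 38117 from the output in this notebook
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180604',
                                'Italian Restaurant', 35.112929, -89.90389)
print(example_url)
```

Passing the same dictionary as `params=` to `requests.get` would be another option; the sketch only makes the encoding step visible.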

In [77]:
Memphis_venues = getNearbyVenues(names=geo_data['Zipcode'],
                                   latitudes=geo_data['Latitude'],
                                   longitudes=geo_data['Longitude']
                                  )


38116
38111
38141
38126
38119
38117
38134
38127
38115
38122
38128
38107
38152
38109
38118
38104
38139
38135
38103
38125
38138
38105
38120
38114
38133
38132
38108
38112
38106


In [78]:
Memphis_venues['Venue Category'].unique()

array(['Italian Restaurant'], dtype=object)

In [79]:
Italian_Restaurant = Memphis_venues[Memphis_venues['Venue Category']=='Italian Restaurant']
print(Italian_Restaurant)

   Neighborhood  Neighborhood Latitude  Neighborhood Longitude  \
0         38117              35.112929               -89.90389   
1         38117              35.112929               -89.90389   
2         38104              35.133825               -90.00463   
3         38103              35.146131               -90.05340   

                     Venue  Venue Latitude  Venue Longitude  \
0               Luchessi's       35.111544       -89.902169   
1           Carmela's Cafe       35.114898       -89.899666   
2  Tamboli’s Pasta & Pizza       35.137827       -90.002600   
3          Capriccio Grill       35.142578       -90.052283   

       Venue Category  
0  Italian Restaurant  
1  Italian Restaurant  
2  Italian Restaurant  
3  Italian Restaurant  


### Now I can count them.

In [26]:
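# getNearbyVenues stores the ZIP code in the 'Neighborhood' column, so rename it before grouping
Memphis_venues.rename(columns={'Neighborhood': 'Zipcode'}).groupby('Zipcode').count()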

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Zipcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
38103,40,40,40,40,40,40
38104,30,30,30,30,30,30
38105,2,2,2,2,2,2
38106,2,2,2,2,2,2
38108,2,2,2,2,2,2
38111,14,14,14,14,14,14
38112,24,24,24,24,24,24
38114,11,11,11,11,11,11
38115,4,4,4,4,4,4
38116,2,2,2,2,2,2
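
Raw counts do not account for how many people live in each ZIP code, which matters for the question of where a new restaurant might fit. The following is a rough sketch, not part of the original analysis, that relates the venue counts above to the populations shown in the earlier tables. Only ZIP codes whose population appears in the printed outputs are included, and the helper name `venues_per_10k` is hypothetical.

```python
# Population per ZIP code, copied from the tables printed earlier in this notebook
population = {38103: 12180, 38104: 23409, 38105: 6184,
              38106: 27222, 38111: 41742, 38116: 40404}
# Venue counts per ZIP code, copied from the groupby output above
venue_count = {38103: 40, 38104: 30, 38105: 2,
               38106: 2, 38111: 14, 38116: 2}

def venues_per_10k(counts, pops):
    """Hypothetical helper: venues per 10,000 residents for ZIP codes present in both dicts."""
    return {z: counts[z] / pops[z] * 10000 for z in counts if z in pops}

density = venues_per_10k(venue_count, population)

# Rank ZIP codes from most to fewest venues per resident
ranked = sorted(density.items(), key=lambda item: item[1], reverse=True)
for zipcode, value in ranked:
    print(zipcode, round(value, 1))
```

A low value suggests few venues relative to the number of residents, which may indicate an underserved area but could equally reflect zoning or a search radius that misses the commercial streets.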


### Now I will group the venues by ZIP code and identify the most common venue categories.

In [19]:
Memphis_onehot = pd.get_dummies(Memphis_venues[['Venue Category']], prefix="", prefix_sep="")

# the ZIP code is stored in the 'Neighborhood' column of Memphis_venues
Memphis_onehot.insert(loc=0, column='Zipcode', value=Memphis_venues['Neighborhood'])
Memphis_grouped = Memphis_onehot.groupby('Zipcode').mean().reset_index()
Memphis_grouped.head()

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zipcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'st', 'nd', 'rd' get the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Zipcode'] = Memphis_grouped['Zipcode']

for ind in np.arange(Memphis_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Memphis_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,38103,Hotel,Harbor / Marina,Salon / Barbershop,Bar,Park,Burger Joint,Grocery Store,Pub,Pizza Place,Nightclub
1,38104,Mobile Phone Shop,Coffee Shop,Wings Joint,Ice Cream Shop,Sandwich Place,Pizza Place,Pharmacy,Middle Eastern Restaurant,Lounge,Japanese Restaurant
2,38105,Gym / Fitness Center,BBQ Joint,Wings Joint,Food,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store
3,38106,Convenience Store,Park,Home Service,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Disc Golf
4,38108,Light Rail Station,Wings Joint,Food,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Disc Golf
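
To make the ranking step easier to follow, here is a small sketch on made-up toy data (the venues below are invented, not taken from Foursquare). It shows that the mean of the one-hot columns per ZIP code is the share of that ZIP code's venues in each category, and that sorting those shares gives the "most common" order. The `dtype=float` argument is an addition for the sketch so that the indicator columns are numeric regardless of the pandas version.

```python
import pandas as pd

# Toy data (made up for illustration): three venues in one ZIP code, one in another
toy = pd.DataFrame({
    'Zipcode': [38103, 38103, 38103, 38104],
    'Venue Category': ['Hotel', 'Hotel', 'Bar', 'Coffee Shop'],
})

# One indicator column per category, as in the cell above
onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="", dtype=float)
onehot.insert(loc=0, column='Zipcode', value=toy['Zipcode'])

# Mean of the 0/1 indicators = fraction of the ZIP code's venues in each category
freq = onehot.groupby('Zipcode').mean()

# Sorting one row in descending order gives that ZIP code's most common categories
top = freq.loc[38103].sort_values(ascending=False)
print(top)
```

Categories that never occur in a ZIP code have a share of zero but still appear in the sorted row. For ZIP codes with only a few venues, this may explain the alphabetical runs (Cosmetics Shop, Cuban Restaurant, Cupcake Shop, ...) that fill the later columns of the table above.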


### Now I will distribute the neighborhoods into different clusters based on their venues.

In [20]:
# set number of clusters
kclusters = 5

# drop the ZIP code so that only the category frequencies are used as features
Memphis_grouped_clustering = Memphis_grouped.drop(columns='Zipcode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Memphis_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 4, 1, 0, 0, 0, 4, 2], dtype=int32)
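
A quick way to judge whether five clusters split the ZIP codes in a useful way is to count how many fall into each one. The sketch below uses only the first ten labels printed above, not the full label array, so the sizes it reports are partial.

```python
from collections import Counter

# First ten labels, copied from the output above (not the full label array)
first_labels = [0, 0, 0, 4, 1, 0, 0, 0, 4, 2]

# Count how often each cluster label occurs
cluster_sizes = Counter(first_labels)
for cluster, size in cluster_sizes.most_common():
    print('cluster {}: {} ZIP codes'.format(cluster, size))
```

On the full result, `Counter(kmeans.labels_.tolist())` would give the complete sizes. If one cluster holds most ZIP codes and the others hold one or two each, a different `kclusters` value may be worth trying.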

### I will add the cluster labels to the dataframe.

In [21]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Memphis_merged = geo_data

# join the sorted venues and cluster labels onto geo_data, which holds latitude/longitude for each ZIP code
Memphis_merged = Memphis_merged.join(neighborhoods_venues_sorted.set_index('Zipcode'), on='Zipcode')

Memphis_merged.head()

Unnamed: 0,Zipcode,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,38116,40404,35.03319,-90.01128,2.0,Bus Stop,Wings Joint,Construction & Landscaping,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Disc Golf
1,38111,41742,35.10935,-89.94363,0.0,Pharmacy,Fried Chicken Joint,Ethiopian Restaurant,Video Store,Coffee Shop,Grocery Store,Asian Restaurant,Burger Joint,Gas Station,Seafood Restaurant
2,38141,22462,35.016803,-89.84701,,,,,,,,,,,
3,38126,7334,35.126469,-90.04359,0.0,Southern / Soul Food Restaurant,Gas Station,Dance Studio,Buffet,Music Venue,Fast Food Restaurant,Convenience Store,Gastropub,Gift Shop,Cosmetics Shop
4,38119,22330,35.082936,-89.84892,0.0,Home Service,Photography Studio,Shop & Service,Mobile Phone Shop,Fast Food Restaurant,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega
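
Row 2 (ZIP code 38141) in the output above has no cluster label, and the remaining labels print as 2.0 and 0.0 rather than as integers. The sketch below, on made-up toy data, reproduces this: a ZIP code that is missing from the right-hand table gets NaN after the join, and pandas then stores the whole column as floats. A float label cannot be used directly as a list index, so one option (an assumption about what is wanted here, not the only choice) is to keep just the labeled rows and convert the labels back to integers.

```python
import pandas as pd

# Toy data (made up): the second ZIP code has no row in the labels table
geo = pd.DataFrame({'Zipcode': [38116, 38141], 'Population': [40404, 22462]})
labels = pd.DataFrame({'Zipcode': [38116], 'Cluster Labels': [2]})

# Same join pattern as in the cell above
merged = geo.join(labels.set_index('Zipcode'), on='Zipcode')
print(merged['Cluster Labels'].dtype)  # float64, because NaN cannot be stored in an int64 column

# One option: keep only labeled rows and restore the integer type
labeled = merged.dropna(subset=['Cluster Labels']).copy()
labeled['Cluster Labels'] = labeled['Cluster Labels'].astype(int)
print(labeled)
```

Dropping the unlabeled rows removes those ZIP codes from any later map. Keeping them and drawing them in a neutral color would be an alternative.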


#### and look at the sorted venues table, now with its cluster labels

In [22]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,38103,Hotel,Harbor / Marina,Salon / Barbershop,Bar,Park,Burger Joint,Grocery Store,Pub,Pizza Place,Nightclub
1,0,38104,Mobile Phone Shop,Coffee Shop,Wings Joint,Ice Cream Shop,Sandwich Place,Pizza Place,Pharmacy,Middle Eastern Restaurant,Lounge,Japanese Restaurant
2,0,38105,Gym / Fitness Center,BBQ Joint,Wings Joint,Food,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store
3,4,38106,Convenience Store,Park,Home Service,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Disc Golf
4,1,38108,Light Rail Station,Wings Joint,Food,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Disc Golf


In [23]:
address = 'Memphis, TN'

geolocator = Nominatim(user_agent="Mem_finder")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of downtown Memphis are {}, {}.'.format(latitude, longitude))

The geographical coordinates of downtown Memphis are 35.1490215, -90.0516285.


### I will put all the ZIP code areas on the map, with a different color for each cluster.

In [29]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
# ZIP codes without venues have no cluster label (NaN) after the join, so skip them
clustered = Memphis_merged.dropna(subset=['Cluster Labels'])
for lat, lon, poi, cluster in zip(clustered['Latitude'], clustered['Longitude'], clustered['Zipcode'], clustered['Cluster Labels']):
    cluster = int(cluster)  # the join stored the labels as floats; list indices must be integers
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters