# Memphis restaurant hunt. 


## Introduction
### 1.1 Background
Memphis is a city on the Mississippi River in southwest Tennessee, famous for the influential strains of blues, soul and rock 'n' roll that originated there. Elvis Presley, B.B. King and Johnny Cash recorded albums at the legendary Sun Studio, and Presley’s Graceland mansion is a popular attraction. The city itself has a population of 646,889. Memphis played a prominent role in the American civil rights movement and was the site of Martin Luther King Jr.'s 1968 assassination. Memphis is a regional center for commerce, education, media, art, and entertainment. There are many restaurants in Memphis, each belonging to different categories like Chinese, Italian , French etc.  As part of this project, we will list and visualize all major restaurants of Memphis.

### 1.2 Business Problem
How can entrepreneurs better understand city demographics and competative landscape, before opening a new restaurant?
To solve this business problem, we are going to cluster Memphis neighborhoods in order to recommend where entrepreneurs can open new venues. We will recommend arrears and highlight neighborhoods restaurants.
To explore and target recommended locations we will access data through FourSquare API interface and arrange them as a DataFrame for visualization.
Questions that can be asked using the above mentioned datasets
What area has the highest number of restaurants in Memphis? 
Which areas have fewer restaurants? 
Which areas have higher populations? 
What is the best location in Memphis for American restaurant?

## Data sources 
To explore and target recommended locations in Memphis restaurants we will access data through FourSquare API interface and arrange them as a DataFrame for visualization.\
For this project we need the following data:\
Data source: Fousquare API : https://developer.foursquare.com/ \
Description: Memphis restaurants data that contains list Locality, restaurant name, rating along with their latitude and longitude.\
Data source: https://www.zip-codes.com/city/tn-memphis.asp#demographics \
Description: From this source I will receive information about demographics for nearby restaurants in each locality.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

#### Download, scrape, convert into Dataframe and clean the data
In the scraped table not all data will be used in the future work.\
I had to drop columns with irelevant information like: Country, Area code and Type.\
As scraped information was converted in Data Frame as objects we need to delete "," from the column Populations and convert objects in Integers, so we can make calculations later.

In [2]:
List_url = "https://www.zip-codes.com/city/tn-memphis.asp#demographics"
source = requests.get(List_url).text

soup = BeautifulSoup(source, 'html.parser')
memphis_data = pd.DataFrame(columns=["Zipcode", "Type", "Country", "Population", "Area Code"])
table  = soup.find('table', { "class" : "statTable"})
#print(table)
for row in table.find_all("tr"):
    col = row.find_all("td")
    #print(col)
    Zipcode =col[0].text.strip("ZIP Code")
    Type =col[1].text
    Country =col[2].text
    Population =col[3].text
    AreaCode =col[4].text
    
    
    memphis_data = memphis_data.append({"Zipcode":Zipcode, "Type":Type, "Country":Country, "Population":Population, "Area Code":AreaCode}, ignore_index=True)
memphis_data=memphis_data[memphis_data['Type']!='Type']
M_data = memphis_data.drop(columns=['Type', 'Country', 'Area Code'])
M_data = M_data.replace(',', '', regex=True)
M_data = M_data.astype(int)
M_data = M_data[M_data['Population'] !=0]

M_data.head()

Unnamed: 0,Zipcode,Population
4,38103,12180
5,38104,23409
6,38105,6184
7,38106,27222
8,38107,17698


In [5]:
List_url1 = "http://zipatlas.com/us/tn/memphis/zip-code-comparison/median-household-income.htm"
source = requests.get(List_url1).text

soup = BeautifulSoup(source, 'html.parser')
memphis_income = pd.DataFrame(columns=["Zip Code", "Location", "City", "Population", "AvgIncome", "NR"])
table  = soup.find('table')
#print(table)
for row in table.find_all("tr"):
    col = row.find_all("td")
    #print(col)
    Zipcode =col[0].text.strip("ZIP Code")
    Location =col[1].text
    City =col[2].text
    Population =col[3].text
    AvgIncome =col[4].text
    NR =col[5].text

memphis_income.head()

IndexError: list index out of range

### Now I will get the latitude and the longitude coordinates of each neighborhoods and combine different neighborhoods that exist in one postal code area.

In [4]:
ZIP = pd.read_csv('https://sites.google.com/site/breathe42/zip_to_lat_lon_North%20America.csv', sep=',')
ZIP.head()

  interactivity=interactivity, compiler=compiler, result=result)


Unnamed: 0,country code,postal code,place,state,statecode,province_or_county,province_or_countycode,community,communitycode,latitude,longitude,accuracy,Country,Continent
0,BM,DV 01,Devonshire,Devonshire Parish,1,,,,,32.3028,-64.7558,,Bermuda,North America
1,BM,DV 02,Devonshire,Devonshire Parish,1,,,,,32.3028,-64.7558,,Bermuda,North America
2,BM,DV 03,Devonshire,Devonshire Parish,1,,,,,32.3028,-64.7558,,Bermuda,North America
3,BM,DV 04,Devonshire,Devonshire Parish,1,,,,,32.3028,-64.7558,,Bermuda,North America
4,BM,DV 05,Devonshire,Devonshire Parish,1,,,,,32.3028,-64.7558,,Bermuda,North America


In [None]:
https://github.com/Artemkuchaev/Coursera_Capstone/blob/main/uszips.csv  
    4bf58dd8d48988d110941735 

### For the mean of this assighnment I will drop columns with the information I do not need: City, State, geopoint, Timezone, Daylight savings time flag.
### And I will merge to geo data with ZIP Codes.

In [5]:
ZIP1 = ZIP.drop(columns=['country code', 'statecode', 'province_or_county', 'province_or_countycode', 'community', 'Country', 'Continent', 'accuracy'])
ZIP1.rename(columns={'postal code':'Zipcode', 'latitude':'Latitude', 'longitude':'Longitude'},inplace=True)
geo_merged = pd.merge(ZIP1, M_data, on='Zipcode')
geo_data=geo_merged[['Zipcode','Population','Latitude','Longitude']]
geo_data.head()


Unnamed: 0,Zipcode,Population,Latitude,Longitude
0,38103,12180,35.144,-90.048
1,38104,23409,35.1334,-90.0046
2,38105,6184,35.1497,-90.033
3,38106,27222,35.1021,-90.033
4,38107,17698,35.1831,-90.0201


I choose only boroughs that contain the word Toronto.



## Problem 3
### Explore and cluster the neighborhoods in Toronto.


In [6]:
import numpy as np
import os
from sklearn.cluster import KMeans
!pip install folium
import folium 
from geopy.geocoders import Nominatim 
import matplotlib.cm as cm
import matplotlib.colors as colors


  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
[K     |████████████████████████████████| 94 kB 6.1 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


In [7]:
CLIENT_ID = 'G1IRT315WKKXPWVBY50H0KTZ0IVLULFI1HGRKRK4OXN55CWB' # your Foursquare ID
CLIENT_SECRET = 'I3QMFSSLQF5OGVHSVASJ5VXMDIK1IPVETVIMBNUZMR0RJVOT' # your Foursquare Secret
VERSION = '20180604'

### Now I will create a list and dataframe with the only food venues in the Memphis.

In [8]:
def getNearbyVenues(names, latitudes, longitudes):
    radius=500
    LIMIT=100
    categoryId="4d4b7105d754a06374d81259"
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            categoryId,
            lat, 
            lng, 
            radius, 
            LIMIT)

        
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zipcode', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [9]:
Memphis_venues = getNearbyVenues(names=geo_data['Zipcode'],
                                   latitudes=geo_data['Latitude'],
                                   longitudes=geo_data['Longitude']
                                  )


38103
38104
38105
38106
38107
38108
38109
38111
38112
38114
38115
38116
38117
38118
38119
38120
38122
38125
38126
38127
38128
38132
38133
38134
38135
38138
38139
38141
38152


### We can check what kind of venue categories are in the dataframe.

In [10]:
Memphis_venues['Venue Category'].unique()

array(['Cuban Restaurant', 'Mexican Restaurant', 'French Restaurant',
       'Seafood Restaurant', 'Breakfast Spot', 'Tapas Restaurant',
       'BBQ Joint', 'American Restaurant', 'Burger Joint',
       'New American Restaurant', 'Southern / Soul Food Restaurant',
       'Fast Food Restaurant', 'Bakery', 'Sushi Restaurant', 'Donut Shop',
       'Japanese Restaurant', 'Wings Joint', 'Pizza Place',
       'Fried Chicken Joint', 'Asian Restaurant', 'Food Truck', 'Café',
       'Steakhouse', 'Ethiopian Restaurant', 'Restaurant', 'Food',
       'Chinese Restaurant', 'Sandwich Place', 'Deli / Bodega',
       'Salad Place', 'Italian Restaurant', 'Mediterranean Restaurant',
       'Tex-Mex Restaurant', 'Korean Restaurant', 'Indian Restaurant'],
      dtype=object)

In [11]:
Memphis_venues.head()

Unnamed: 0,Zipcode,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,38103,35.144,-90.048,Havana's Pilón,35.144764,-90.051266,Cuban Restaurant
1,38103,35.144,-90.048,Maciel's Tortas & Tacos,35.144,-90.053038,Mexican Restaurant
2,38103,35.144,-90.048,Chez Philippe,35.14248,-90.05142,French Restaurant
3,38103,35.144,-90.048,Flying Fish,35.142064,-90.052735,Seafood Restaurant
4,38103,35.144,-90.048,Cockadoos,35.142581,-90.052344,Breakfast Spot


### Now I can count them.

In [12]:
Memphis_venues.groupby('Zipcode').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Zipcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
38103,25,25,25,25,25,25
38104,12,12,12,12,12,12
38105,4,4,4,4,4,4
38108,1,1,1,1,1,1
38109,1,1,1,1,1,1
38111,8,8,8,8,8,8
38112,8,8,8,8,8,8
38114,1,1,1,1,1,1
38115,5,5,5,5,5,5
38116,1,1,1,1,1,1


### and now I group them and identify the most common venue

In [13]:
Memphis_onehot = pd.get_dummies(Memphis_venues[['Venue Category']], prefix="", prefix_sep="")

Memphis_onehot.insert(loc=0, column='Zipcode', value=Memphis_venues['Zipcode'] )
Memphis_grouped = Memphis_onehot.groupby('Zipcode').mean().reset_index()
Memphis_grouped.head()

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zipcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Zipcode'] = Memphis_grouped['Zipcode']

for ind in np.arange(Memphis_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Memphis_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,38103,American Restaurant,BBQ Joint,Mexican Restaurant,Bakery,Burger Joint,Southern / Soul Food Restaurant,Seafood Restaurant,Sushi Restaurant,Breakfast Spot,New American Restaurant
1,38104,Burger Joint,Donut Shop,Wings Joint,Fast Food Restaurant,Japanese Restaurant,Pizza Place,Food Truck,Fried Chicken Joint,Seafood Restaurant,Asian Restaurant
2,38105,American Restaurant,Fast Food Restaurant,Burger Joint,Café,Deli / Bodega,Food Truck,Food,Ethiopian Restaurant,Donut Shop,Chinese Restaurant
3,38108,Steakhouse,Wings Joint,Cuban Restaurant,Food Truck,Food,Fast Food Restaurant,Ethiopian Restaurant,Donut Shop,Deli / Bodega,Chinese Restaurant
4,38109,Fast Food Restaurant,Wings Joint,Cuban Restaurant,Food Truck,Food,Ethiopian Restaurant,Donut Shop,Deli / Bodega,Chinese Restaurant,Fried Chicken Joint


### Now I will disstribute venues in different Clusters for Neighborhoods

In [14]:
# set number of clusters
kclusters = 5

Memphis_grouped_clustering = Memphis_grouped.drop('Zipcode', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Memphis_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 0, 1, 1, 3, 1, 2], dtype=int32)

### I will add clusters in dataframe

In [15]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Memphis_merged = geo_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
Memphis_merged = Memphis_merged.join(neighborhoods_venues_sorted.set_index('Zipcode'), on='Zipcode')

Memphis_merged.head()

Unnamed: 0,Zipcode,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,38103,12180,35.144,-90.048,1.0,American Restaurant,BBQ Joint,Mexican Restaurant,Bakery,Burger Joint,Southern / Soul Food Restaurant,Seafood Restaurant,Sushi Restaurant,Breakfast Spot,New American Restaurant
1,38104,23409,35.1334,-90.0046,1.0,Burger Joint,Donut Shop,Wings Joint,Fast Food Restaurant,Japanese Restaurant,Pizza Place,Food Truck,Fried Chicken Joint,Seafood Restaurant,Asian Restaurant
2,38105,6184,35.1497,-90.033,1.0,American Restaurant,Fast Food Restaurant,Burger Joint,Café,Deli / Bodega,Food Truck,Food,Ethiopian Restaurant,Donut Shop,Chinese Restaurant
3,38106,27222,35.1021,-90.033,,,,,,,,,,,
4,38107,17698,35.1831,-90.0201,,,,,,,,,,,


#### and sort them

In [16]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,38103,American Restaurant,BBQ Joint,Mexican Restaurant,Bakery,Burger Joint,Southern / Soul Food Restaurant,Seafood Restaurant,Sushi Restaurant,Breakfast Spot,New American Restaurant
1,1,38104,Burger Joint,Donut Shop,Wings Joint,Fast Food Restaurant,Japanese Restaurant,Pizza Place,Food Truck,Fried Chicken Joint,Seafood Restaurant,Asian Restaurant
2,1,38105,American Restaurant,Fast Food Restaurant,Burger Joint,Café,Deli / Bodega,Food Truck,Food,Ethiopian Restaurant,Donut Shop,Chinese Restaurant
3,1,38108,Steakhouse,Wings Joint,Cuban Restaurant,Food Truck,Food,Fast Food Restaurant,Ethiopian Restaurant,Donut Shop,Deli / Bodega,Chinese Restaurant
4,0,38109,Fast Food Restaurant,Wings Joint,Cuban Restaurant,Food Truck,Food,Ethiopian Restaurant,Donut Shop,Deli / Bodega,Chinese Restaurant,Fried Chicken Joint


In [17]:
address = 'Memphis, TN'

geolocator = Nominatim(user_agent="Mem_finder")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Memphis downtown are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Memphis downtown are 35.1490215, -90.0516285.


### I will put all the venuse on the map with different colors.

In [19]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Memphis_merged['Latitude'], Memphis_merged['Longitude'], Memphis_merged['Zipcode'], Memphis_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

TypeError: list indices must be integers or slices, not float