# Opening a Restaurant in the South Indian City of Chennai

## Capstone Project_IBM_Coursera_Apoorva Bajaj_Week 1_Notebook

### Section Index (Week 1)

1. [Section 1: Introduction: Background and Problem Discussion](#intro)
2. [Section 2: Data Description and Requirements](#data)

## Introduction: Background and Problem Discussion <a name="intro"></a>

I am from India, and one of my close friends lives in Chennai. He wants to open a restaurant there but is not sure which parameters to consider when choosing a location. I am helping him make a data-driven decision using data science. This capstone project explores the neighbourhoods of Chennai, a city in southern India. I believe its results and conclusions will help my friend, as well as anyone else thinking of opening a similar food outlet in Chennai.

For this project, I will use the **Foursquare API** to explore the neighbourhoods of Chennai. Specifically, I am interested in the neighbourhoods for which several venues can be obtained. These neighbourhoods will be clustered based on their venues using the **k-means clustering algorithm**, with the number of clusters chosen using the silhouette score. I will use the **Folium visualization library** to display the clusters superimposed on a map of Chennai. The clusters can then be analysed to give my friend, or other small-scale businesses such as restaurants or hotels, data-driven and visual insights for selecting a suitable location.
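
As a rough illustration of the silhouette-based choice of the number of clusters described above, here is a minimal sketch. It runs on synthetic two-dimensional points generated with `make_blobs`, not on the Chennai venue data, and the candidate range of 2 to 6 clusters is an assumption made only for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated groups of points.
# This is NOT the Chennai dataset; it only demonstrates the selection step.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette score is taken as the best choice.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

A higher silhouette score indicates clusters that are compact and well separated, which is why the maximum over the candidate values is used here.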

#### Importing the libraries relevant for this project

In [1]:
import re
import json
import requests
import numpy as np
from bs4 import BeautifulSoup

import pandas as pd
#display all rows
pd.set_option('display.max_rows', None)
#display all columns
pd.set_option('display.max_columns', None)

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

from geopy.geocoders import Nominatim

import folium

import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
%matplotlib inline

print('Libraries imported.')

Libraries imported.


## Data Description and Requirements <a name="data"></a>

Chennai has several neighbourhoods. I found this dataset, https://chennaiiq.com/chennai/latitude_longitude_areas.asp, which lists locations in Chennai city along with their latitude and longitude coordinates.
Note that the latitude and longitude data on this page are in degrees, minutes and seconds (DMS) format and must be converted to decimal degrees before starting the analysis.
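
To make the conversion concrete, here is a small worked sketch of the arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for South or West. The helper name `dms_to_decimal` is hypothetical and used only for this illustration; the input values are those of the first location in the dataset.

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds plus a compass letter to decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West coordinates are negative by convention.
    return -decimal if direction in ("S", "W") else decimal

# 12°59'50" N, 80°15'25" E
lat = dms_to_decimal(12, 59, 50, "N")
lon = dms_to_decimal(80, 15, 25, "E")
print(round(lat, 6), round(lon, 6))  # 12.997222 80.256944
```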

In [2]:
url = 'https://chennaiiq.com/chennai/latitude_longitude_areas.asp'

html = requests.get(url)
print(html)

<Response [200]>


A response status of 200 means that the request was successful. Next, I will clean the page to extract the neighborhoods and their locations. Using the BeautifulSoup library, I will parse the HTML to make it easier to access.

In [3]:
soup = BeautifulSoup(html.text, 'html.parser')
table = soup.find("table", attrs={"class": "TBox"})

The HTML has been parsed and can be used to build the dataset. As a next step, I will extract the location data and store it in a pandas DataFrame, chennai_data.

#### Data Collection:

In [4]:
table_data = []
index = ['S.No.', 'Location', 'Latitude', 'Longitude']
for tr in table.find_all("tr", attrs={"class": "tab"}):
    t_row = {}
    for td, th in zip(tr.find_all("td"), index): 
        t_row[th] = td.text.replace('\n', '').strip()
    table_data.append(t_row)

chennai_data = pd.DataFrame(table_data[:-1], columns=index)
chennai_data.drop(columns=['S.No.'], inplace=True)
chennai_data.at[0,'Location'] = 'Adyar Bus Debot'
chennai_data.rename(columns={'Location': 'Neighborhood'}, inplace=True)
print(chennai_data.shape)
chennai_data.head()

(105, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Adyar Bus Debot | 12°59'50" N | 80°15'25" E |
| 1 | Adyar Signal | 13°00'23" N | 80°15'27" E |
| 2 | Alandur | 13°00'28" N | 80°12'35" E |
| 3 | Ambattur | 13°06'36" N | 80°10'12" E |
| 4 | Anna Arch | 13°04'28" N | 80°13'06" E |


#### Data Conversion: Converting Degrees, Minutes and Seconds to Decimal Degrees

There are a total of 105 neighborhoods. As noted earlier, the latitude and longitude data must be converted from degrees, minutes and seconds to decimal degrees. The dms2dd function below performs this conversion.

In [5]:
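def dms2dd(s):
    # split a string such as 12°59'50" N into degrees, minutes, seconds and direction
    degrees, minutes, seconds, direction = re.split('[°\'"]+', s)
    dd = float(degrees) + float(minutes)/60 + float(seconds)/(60*60)
    # the direction keeps a leading space after the split, so strip it before comparing
    if direction.strip() in ('S','W'):
        dd *= -1
    return dd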

chennai_data['Latitude'] = chennai_data['Latitude'].apply(dms2dd)
chennai_data['Longitude'] = chennai_data['Longitude'].apply(dms2dd)
print(chennai_data.shape)
chennai_data.head()

(105, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Adyar Bus Debot | 12.997222 | 80.256944 |
| 1 | Adyar Signal | 13.006389 | 80.2575 |
| 2 | Alandur | 13.007778 | 80.209722 |
| 3 | Ambattur | 13.11 | 80.17 |
| 4 | Anna Arch | 13.074444 | 80.218333 |


Now that we have the neighborhoods dataset, let's visualize it using the Folium library. First, let's create a map of Chennai. The latitude and longitude of Chennai can be obtained using the Nominatim geocoder from the geopy library.

In [6]:
address = 'Chennai, Tamil Nadu'

geolocator = Nominatim(user_agent="chennai_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chennai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chennai are 13.0801721, 80.2838331.


#### Chennai Map:

In [7]:
# create map of Chennai using latitude and longitude values
chennai_map = folium.Map(location=[latitude, longitude], zoom_start=11)    
chennai_map

#### Chennai Map with the neighborhoods superimposed on top:

In [8]:
# add neighborhood markers to map
# use a separate loop variable so the geocoded `location` above is not overwritten
for lat, lng, neighborhood in zip(chennai_data['Latitude'], chennai_data['Longitude'], chennai_data['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(chennai_map)

chennai_map

#### Define Foursquare Credentials and Version

In [9]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20210627' # Foursquare API version

#### Explore neighborhoods in Chennai

The following function sends an explore request for each neighborhood and returns up to 100 venues within a 500-meter radius of the neighborhood's coordinates.

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
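        # make the GET request, retrying up to 4 times if it fails
        results = []  # stays empty if every attempt fails, instead of reusing stale data
        for attempt in range(4):
            try:
                results = requests.get(url).json()["response"]['groups'][0]['items']
                break
            except (requests.RequestException, ValueError, KeyError, IndexError):
                continue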
    
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
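
The function above indexes into a nested JSON response. The sketch below walks through that flattening step on a hand-written payload so the structure is easier to see. The payload, the venue names and the helper `flatten_items` are hypothetical and only mirror the keys that `getNearbyVenues` reads; a real response contains many more fields. The sketch also falls back to `None` when a venue has no categories, which is an added precaution rather than something the function above does.

```python
# Hypothetical payload: hand-written to mirror the keys getNearbyVenues reads.
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Example Cafe",
                               "location": {"lat": 13.0, "lng": 80.25},
                               "categories": [{"name": "Café"}]}},
                    {"venue": {"name": "Example Store",
                               "location": {"lat": 13.001, "lng": 80.251},
                               "categories": []}},
                ]
            }
        ]
    }
}

def flatten_items(payload, neighborhood, lat, lng):
    """Turn one explore-style payload into a list of flat venue rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use None when the venue carries no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

rows = flatten_items(sample_payload, "Adyar Signal", 13.006389, 80.2575)
for row in rows:
    print(row)
```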

Let's use the above function on the chennai_data neighborhoods DataFrame and store the returned venue data in the chennai_venues pandas DataFrame.

In [11]:
chennai_venues = getNearbyVenues(names = chennai_data['Neighborhood'],
                                 latitudes = chennai_data['Latitude'],
                                 longitudes = chennai_data['Longitude']
                                 )

print(chennai_venues.shape)
chennai_venues.head()

(1130, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Adyar Bus Debot | 12.997222 | 80.256944 | Zaitoon Restaurant | 12.996861 | 80.256178 | Middle Eastern Restaurant |
| 1 | Adyar Bus Debot | 12.997222 | 80.256944 | Kuttanadu Restaurant | 12.99701 | 80.257799 | Asian Restaurant |
| 2 | Adyar Bus Debot | 12.997222 | 80.256944 | Zha Cafe | 12.99973 | 80.254806 | Café |
| 3 | Adyar Bus Debot | 12.997222 | 80.256944 | Adyar Ananda Bhavan, Besant Nagar | 12.996678 | 80.258275 | Fast Food Restaurant |
| 4 | Adyar Bus Debot | 12.997222 | 80.256944 | Kovai Pazhamudir Nilayam | 12.996522 | 80.259776 | Fruit & Vegetable Store |


A total of 1130 venues were obtained. Now let's check the number of venues returned per neighborhood.

In [12]:
chennai_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| AVM Studio | 4 | 4 | 4 | 4 | 4 | 4 |
| Adyar Bus Debot | 16 | 16 | 16 | 16 | 16 | 16 |
| Adyar Signal | 32 | 32 | 32 | 32 | 32 | 32 |
| Alandur | 11 | 11 | 11 | 11 | 11 | 11 |
| Ambattur | 1 | 1 | 1 | 1 | 1 | 1 |
| Anna Arch | 14 | 14 | 14 | 14 | 14 | 14 |
| Anna Nagar Roundana | 21 | 21 | 21 | 21 | 21 | 21 |
| Anna Nagar West Terminus | 7 | 7 | 7 | 7 | 7 | 7 |
| Anna Statue | 11 | 11 | 11 | 11 | 11 | 11 |
| Anna University Entrance | 4 | 4 | 4 | 4 | 4 | 4 |
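
Because the per-neighborhood counts are listed alphabetically, the largest neighborhoods are hard to spot by eye. One possible way to rank them is to count rows per group and sort in descending order. The sketch below uses a small stand-in frame, `sample_venues`, built from the first five counts in the table above; it is not the real chennai_venues data, and the "placeholder" venue names are made up.

```python
import pandas as pd

# Stand-in data: one row per venue, using the first five counts from the table above.
counts = {"AVM Studio": 4, "Adyar Bus Debot": 16, "Adyar Signal": 32,
          "Alandur": 11, "Ambattur": 1}
sample_venues = pd.DataFrame(
    {"Neighborhood": [name for name, n in counts.items() for _ in range(n)]}
)
sample_venues["Venue"] = "placeholder"  # hypothetical venue names

# Number of venue rows per neighborhood, largest first.
venues_per_neighborhood = (
    sample_venues.groupby("Neighborhood").size().sort_values(ascending=False)
)
print(venues_per_neighborhood.head(3))
```

Applied to the full venues DataFrame, the same `groupby(...).size().sort_values(ascending=False)` chain would put the neighborhood with the most venues in the first row.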


From the full dataframe (only its first rows are shown above), we can see that Taj Coromandal returned the highest number of venues, i.e. 50. Now let's check the unique categories of all the venues returned.

In [13]:
print('There are {} unique categories.'.format(len(chennai_venues['Venue Category'].unique())))

There are 145 unique categories.
