<a href="https://colab.research.google.com/github/ankit311/Coursera_Capstone/blob/main/Applied_Capstone_Assignment_Week_4_The_Battle_of_Neighbourhoods_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***London or New York: Where Can I Find Good Indian Food?***

# **Introduction**

London and New York offer a wide variety of food. Both cities are popular tourist and vacation destinations for people from all around the world; they are diverse and multicultural and offer a wide range of sought-after experiences. Indian people who already live there, as well as tourists visiting from India, look for good, authentic Indian food. A lot has changed over the years, and we now look at how the hotels and restaurants of these two cities have grown in terms of bringing new Indian food to their menus.

In this project, let's try to help those people find the perfect place to eat and enjoy their time with the best Indian food these cities have to offer.

# **Business Problem**

The aim is to help tourists from India who visit London and New York, and Indian people who already live there, find suitable hotels and restaurants that serve good, authentic Indian food. This will also help people who are considering a vacation in these beautiful cities make their decision.

# **Data Description**

We require geographical location data for both London and New York.

**London**

To derive our solution, we scrape our data from the Wikipedia page [List of areas of London](https://en.wikipedia.org/wiki/List_of_areas_of_London).

This Wikipedia page has information about all the neighbourhoods of Greater London; we limit it to those whose post town is London.

Data Column Details -
*   borough : Name of borough
*   town : Name of town

This Wikipedia page does not provide latitude and longitude for the London locations. To solve this problem, we will use the Python library [geopy](https://pypi.org/project/geopy/).

**New York**

To derive our solution, we scrape our data from the Wikipedia page [New York City](https://en.wikipedia.org/wiki/New_York_City).

This Wikipedia page has information about all the boroughs of New York City.

Data Column Details -

*   borough : Name of borough
*   town : Name of town

This Wikipedia page does not provide latitude and longitude for the New York City boroughs. To solve this problem, we will use the Python library [geopy](https://pypi.org/project/geopy/).

**Foursquare API Data**

We will need data about the hotels and restaurants in the different neighbourhoods of each borough. To obtain that information, we will use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus and even photos. The Foursquare location platform will therefore be used as the sole source of venue data, since all the required information can be obtained through its API.

The data retrieved from Foursquare contains information about venues within a specified distance of the latitude and longitude of each geocoded location. Based on all the information collected for both London and New York City, we have sufficient data to build our model. We then present our observations and findings. Using this data, our stakeholders can make the necessary decisions.

# **Methodology**

We will build our model in Python, so we start by importing all the required Python packages.

In [None]:
# importing libraries
import pandas as pd
import numpy as np
import requests
# for data extraction
from bs4 import BeautifulSoup
# for plotting maps
import folium
# for clustering using the KMeans algorithm
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
# transforming JSON into a pandas dataframe
# (pandas >= 1.0 exposes json_normalize at the top level; the pandas.io.json path is deprecated)
from pandas import json_normalize

The approach taken here is to explore each of the cities individually, plot a map to show the neighbourhoods being considered, build our model by clustering similar neighbourhoods together, and finally plot a new map with the clustered neighbourhoods. We then draw insights and compare and discuss our findings.

# **Exploring London**

**Neighbourhoods of London**

We begin by collecting and refining the data needed for our business solution.

**Data Collection**

To get the neighbourhoods in London, we start by scraping the Wikipedia page that lists the areas of London.

In [None]:
# reading data from wiki page
london_data_url = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
response = requests.get(london_data_url)
response

<Response [200]>

A response code of 200 means that the request succeeded.

In [None]:
# using Beautiful soup library to parse HTML data
soup_obj = BeautifulSoup(response.text, 'xml')
table_obj=soup_obj.findAll('table')
london_data_table = table_obj[1]

In [None]:
# parsing table data row and column wise
table_contents=[]
for row in london_data_table.findAll('tr'):
    cell = {}
    col_number = 0
    for col in row.findAll('td'):
        cell['col_'+str(col_number)] = str.strip(col.text)
        col_number += 1
    if len(cell) > 0:
        table_contents.append(cell)

In [None]:
# creating dataframe
london_data_df=pd.DataFrame(table_contents)
london_data_df.head(10)

Unnamed: 0,col_0,col_1,col_2,col_3,col_4,col_5
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728
5,Aldborough Hatch,Redbridge[9],ILFORD,IG2,20,TQ455895
6,Aldgate,City[10],LONDON,EC3,20,TQ334813
7,Aldwych,Westminster[10],LONDON,WC2,20,TQ307810
8,Alperton,Brent[11],WEMBLEY,HA0,20,TQ185835
9,Anerley,Bromley[11],LONDON,SE20,20,TQ345695


In [None]:
# renaming columns of dataframe
london_data_df.columns = ['location', 'london_borough', 'post_town', 'postcode', 'dial_code', 'os_grid_ref']
london_data_df.head(5)

Unnamed: 0,location,london_borough,post_town,postcode,dial_code,os_grid_ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


**Feature Selection**

We need only the london_borough, post_town and postcode columns for further steps. We can drop location, dial_code and os_grid_ref.

In [None]:
london_data_df = london_data_df.drop(['location', 'dial_code', 'os_grid_ref'], axis = 1)
london_data_df.head(10)

Unnamed: 0,london_borough,post_town,postcode
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
5,Redbridge[9],ILFORD,IG2
6,City[10],LONDON,EC3
7,Westminster[10],LONDON,WC2
8,Brent[11],WEMBLEY,HA0
9,Bromley[11],LONDON,SE20


Let's remove the square brackets [ ] and the footnote numbers from the london_borough column.

In [None]:
london_data_df['london_borough'] = london_data_df['london_borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
london_data_df.head(10)

Unnamed: 0,london_borough,post_town,postcode
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Croydon,CROYDON,CR0
3,Croydon,CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
5,Redbridge,ILFORD,IG2
6,City,LONDON,EC3
7,Westminster,LONDON,WC2
8,Brent,WEMBLEY,HA0
9,Bromley,LONDON,SE20
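
The chained `rstrip` calls above only handle a single marker at the very end of the string and can leave a trailing space behind (for example in "Bexley, Greenwich [7]"). As a possible alternative, here is a minimal sketch using a regular expression. It assumes that the markers always have the form `[n]`; the helper name `strip_footnotes` is hypothetical and not part of the notebook.

```python
import re

# Matches Wikipedia-style footnote markers such as "[8]" or " [7]",
# together with any whitespace directly in front of them.
FOOTNOTE = re.compile(r"\s*\[\d+\]")


def strip_footnotes(text):
    """Remove every footnote marker and trim surrounding whitespace."""
    return FOOTNOTE.sub("", text).strip()


# Values taken from the london_borough column shown above.
for raw in ["Bexley, Greenwich [7]", "Ealing, Hammersmith and Fulham[8]", "Bexley"]:
    print(repr(strip_footnotes(raw)))
```

In the notebook this could be applied with `london_data_df['london_borough'].map(strip_footnotes)`, which would replace the lambda used above.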


Check the dimensions of the dataframe.

In [None]:
london_data_df.shape

(531, 3)

We currently have 531 records and 3 columns of data. It's time to perform feature engineering.

**Feature Engineering**

We focus only on the neighbourhoods whose post town includes London, so we filter the dataframe accordingly.

In [None]:
london_data_df = london_data_df[london_data_df['post_town'].str.contains('LONDON')]
london_data_df.head(10)

Unnamed: 0,london_borough,post_town,postcode
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
6,City,LONDON,EC3
7,Westminster,LONDON,WC2
9,Bromley,LONDON,SE20
10,Islington,LONDON,"EC1, N1"
12,Islington,LONDON,N19
14,Barnet,"BARNET, LONDON","EN5, NW7"
15,Enfield,LONDON,"N11, N14"
16,Wandsworth,LONDON,SW12


In [None]:
london_data_df.shape

(308, 3)

Let's drop the duplicate rows, which reduces the amount of work in the following steps.

In [None]:
london_data_df = london_data_df.drop_duplicates()

In [None]:
london_data_df.shape

(193, 3)

We now have only 193 rows and can proceed with the next steps. Let's get a summary of the dataframe.

In [None]:
london_data_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 193 entries, 0 to 522
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   london_borough  193 non-null    object
 1   post_town       193 non-null    object
 2   postcode        193 non-null    object
dtypes: object(3)
memory usage: 6.0+ KB


Let's start by converting each address to its latitude and longitude coordinates.

**Method name** - get_lat_long

**Purpose** - It converts a string address into geographical coordinates.

**Input** - Dataframe row whose first two columns (borough and post town) form the address string

**Output** - The same row with latitude and longitude coordinates added

In [None]:
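def get_lat_long(row):
    # build the address from the first two columns (borough, post town)
    address = row.iloc[0] + ', ' + row.iloc[1]
    try:
        geolocator = Nominatim(user_agent="foursquare_agent")
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
    except Exception:
        # geocode() returns None for unknown addresses and may raise on network errors;
        # 0.0 marks the row as failed so that it can be filtered out afterwards
        latitude = 0.0
        longitude = 0.0
    row['latitude'] = latitude
    row['longitude'] = longitude
    return row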
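
Nominatim's usage policy asks for at most one request per second, and `get_lat_long` is called once per row without any pause. The following is a minimal sketch of a throttled lookup, not the notebook's own method. The function name `geocode_politely` and its parameters are hypothetical; `geocode` stands for any callable that behaves like a geopy geocoder's `geocode` method (returns an object with `.latitude` and `.longitude`, or `None`).

```python
import time
from types import SimpleNamespace


def geocode_politely(addresses, geocode, delay_seconds=1.0, sleep=time.sleep):
    """Geocode addresses one at a time, pausing between consecutive calls.

    Returns a dict mapping each address to (latitude, longitude),
    or to None when the lookup fails or finds nothing.
    """
    results = {}
    for i, address in enumerate(addresses):
        if i > 0:
            sleep(delay_seconds)  # wait before every call except the first
        try:
            location = geocode(address)
        except Exception:
            location = None  # treat network or service errors as "not found"
        if location is None:
            results[address] = None
        else:
            results[address] = (location.latitude, location.longitude)
    return results


# Demo with a stub geocoder (hypothetical, no network access needed).
def stub_geocode(address):
    known = {"City, LONDON": SimpleNamespace(latitude=51.515618, longitude=-0.091998)}
    return known.get(address)


print(geocode_politely(["City, LONDON", "Nowhere, LONDON"], stub_geocode, delay_seconds=0))
```

With geopy one would pass `Nominatim(user_agent=...).geocode` as the `geocode` argument. geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that serves the same purpose.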


In [None]:
# above method is applied on london borough dataframe
london_data_df = london_data_df.apply(get_lat_long, axis=1)
index_names = london_data_df[ london_data_df['latitude'] == 0.0 ].index
london_data_df.drop(index_names, inplace = True)
london_data_df.head(100)

Unnamed: 0,london_borough,post_town,postcode,latitude,longitude
0,"Bexley, Greenwich",LONDON,SE2,51.451053,0.079100
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.491537,-0.214971
6,City,LONDON,EC3,51.515618,-0.091998
7,Westminster,LONDON,WC2,51.500444,-0.126540
9,Bromley,LONDON,SE20,51.402805,0.014814
...,...,...,...,...,...
214,Ealing,LONDON,W7,51.512655,-0.305195
223,Haringey,LONDON,"N4, N8, N15",51.601474,-0.111782
234,Lambeth,LONDON,SE24,51.501301,-0.117287
237,Islington,LONDON,N5,51.538429,-0.099905


# **Exploring New York**

**Neighbourhoods of New York**

We begin by collecting and refining the data needed for our business solution.

**Data Collection**

To get the boroughs of New York, we start by scraping the New York City Wikipedia page.

In [None]:
#reading data from wiki page
nyc_data_url = "https://en.wikipedia.org/wiki/New_York_City"
response = requests.get(nyc_data_url)
response

<Response [200]>

A response code of 200 means that the request succeeded.

In [None]:
# using Beautiful soup library to parse HTML data
soup_obj = BeautifulSoup(response.text, 'xml')
table_obj=soup_obj.findAll('table')
nyc_data_table = table_obj[2]

In [None]:
# parsing table data row and column wise
table_contents=[]
for row in nyc_data_table.findAll('tr'):
    cell = {}
    col_number = 0
    for col in row.findAll('td'):
        cell['col_'+str(col_number)] = str.strip(col.text)
        col_number += 1
    if len(cell) > 0:
        table_contents.append(cell)

In [None]:
# creating dataframe
nyc_data_df=pd.DataFrame(table_contents)
nyc_data_df.head(10)

Unnamed: 0,col_0,col_1,col_2,col_3,col_4,col_5,col_6,col_7
0,The Bronx,Bronx,1418207.0,42.695,42.1,109.04,33867.0,13006.0
1,Brooklyn,Kings,2559903.0,91.559,70.82,183.42,36147.0,13957.0
2,Manhattan,New York,1628706.0,600.244,22.83,59.13,71341.0,27544.0
3,Queens,Queens,2253858.0,93.31,108.53,281.09,20767.0,8018.0
4,Staten Island,Richmond,476143.0,14.514,58.37,151.18,8157.0,3150.0
5,City of New York,8336817,842.343,302.64,783.83,27547.0,10636.0,
6,State of New York,19453561,1731.91,47126.4,122056.82,412.0,159.0,
7,Sources:[163][164][165] and see individual bor...,,,,,,,


**Feature Selection**

We need only the borough name, so we drop the rest of the columns.

In [None]:
nyc_data_df = nyc_data_df.drop(['col_'+str(x) for x in range(1,8)], axis = 1)
nyc_data_df.head(10)

Unnamed: 0,col_0
0,The Bronx
1,Brooklyn
2,Manhattan
3,Queens
4,Staten Island
5,City of New York
6,State of New York
7,Sources:[163][164][165] and see individual bor...


In [None]:
# renaming columns of dataframe
nyc_data_df.columns = ['nyc_borough']
nyc_data_df.head(10)

Unnamed: 0,nyc_borough
0,The Bronx
1,Brooklyn
2,Manhattan
3,Queens
4,Staten Island
5,City of New York
6,State of New York
7,Sources:[163][164][165] and see individual bor...


In [None]:
# keep only the five borough rows (the remaining rows are totals and footnotes)
nyc_data_df = nyc_data_df.iloc[:5].copy()
nyc_data_df['post_town'] = 'New York'
nyc_data_df.head(10)

Unnamed: 0,nyc_borough,post_town
0,The Bronx,New York
1,Brooklyn,New York
2,Manhattan,New York
3,Queens,New York
4,Staten Island,New York


Check the dimensions of the dataframe.

In [None]:
nyc_data_df.shape

(5, 2)

We now have only 5 rows and can proceed with the next steps. Let's get a summary of the dataframe.

In [None]:
nyc_data_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   nyc_borough  5 non-null      object
 1   post_town    5 non-null      object
dtypes: object(2)
memory usage: 208.0+ bytes


Let's start by converting each address to its latitude and longitude coordinates.

In [None]:
nyc_data_df = nyc_data_df.apply(get_lat_long, axis=1)
nyc_data_df.head(100)

Unnamed: 0,nyc_borough,post_town,latitude,longitude
0,The Bronx,New York,40.846651,-73.878594
1,Brooklyn,New York,40.650104,-73.949582
2,Manhattan,New York,40.789624,-73.959894
3,Queens,New York,40.749824,-73.797634
4,Staten Island,New York,40.583456,-74.149605


**Co-ordinates for London**

We define a helper that geocodes an address; it is used to centre each map on its city.

In [None]:
def get_location(address):
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print(f'Latitude - {latitude}, Longitude - {longitude}')
    return [latitude, longitude]

**Visualize the Map of London**

To visualize the map of London and its neighbourhoods, we use the folium package.

In [None]:
# visualizing all the neighborhoods of London from the above data frame using folium
# london map
map_london = folium.Map(location=get_location('London, England'),zoom_start=10)

# adding marker to map
for lat, lng, borough, neighbourhood in zip(london_data_df['latitude'], london_data_df['longitude'], london_data_df['london_borough'], london_data_df['post_town']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)
map_london

Latitude - 51.5073219, Longitude - -0.1276474


**Visualize the Map of New York**

To visualize the map of New York and its boroughs, we use the folium package.

In [None]:
# visualizing all the New York neighborhoods from the above data frame using folium
# New York map
map_nyc = folium.Map(location=get_location('New York, USA'),zoom_start=10)

# adding marker to map
for lat, lng, borough, neighbourhood in zip(nyc_data_df['latitude'], nyc_data_df['longitude'], nyc_data_df['nyc_borough'], nyc_data_df['post_town']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_nyc)
map_nyc

Latitude - 40.7127281, Longitude - -74.0060152


Let's now use the Foursquare API to get Indian restaurant details for both London and New York.

**Foursquare API Credential**

In [None]:
# foursquare api credentials (redacted; replace the placeholders with your own keys)
CLIENT_ID = 'YOUR_CLIENT_ID' # Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # Foursquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


**Method name** - api_request

**Purpose** - Sends a request to a URL

**Input** - URL (string)

**Response** - JSON output

In [None]:
def api_request(url):
    response = requests.get(url).json()
    return response

## Search API URL for a specific venue category

> `https://api.foursquare.com/v2/venues/`**search**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&query=`**QUERY**`&radius=`**RADIUS**`&limit=`**LIMIT**
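
The URL template above can also be assembled from a dictionary of parameters, which takes care of escaping the query string. This is a minimal sketch using only the standard library; the helper name `build_search_url` is hypothetical, the credentials are placeholders, and the parameter names are the ones shown in the template.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(latitude, longitude, query, radius, limit,
                     client_id, client_secret, version):
    """Build a venue search URL from the parameters in the template above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{latitude},{longitude}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # safe=',' keeps the comma in "ll" readable instead of encoding it as %2C
    return f"{SEARCH_ENDPOINT}?{urlencode(params, safe=',')}"


url = build_search_url(51.5073219, -0.1276474, "Indian", 500, 30,
                       "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180604")
print(url)
```

Building the query string this way also escapes search terms that contain spaces or special characters, which a plain f-string does not.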

Let's find Indian restaurants in London and its boroughs through the Foursquare API.

**Method name** - get_restaurant_london

**Purpose** - It lists the venues matching the query (here, Indian restaurants) around one London location

**Input** - Dataframe row which contains latitude and longitude details, query, radius, client_id, client_secret, access_token, version, limit

**Response** - Restaurant details in JSON format

In [None]:
def get_restaurant_london(row, query, radius, client_id, client_secret, access_token, version, limit):
    # read the coordinates by column name rather than by position
    latitude = row['latitude']
    longitude = row['longitude']
    url = f'https://api.foursquare.com/v2/venues/search?client_id={client_id}&client_secret={client_secret}&ll={latitude},{longitude}&oauth_token={access_token}&v={version}&query={query}&radius={radius}&limit={limit}'
    print(f'Final URL - {url}')
    resp = api_request(url)
    if resp['meta']['code'] == 200:
        print(resp['response'])
        # assign relevant part of JSON to venues
        venues = resp['response']['venues']
        return venues
    else:
        print(resp)
        # return an empty list so that a failed request is skipped below instead of raising on len(None)
        return []

search_query = 'Indian'
radius = 500
print(f'query - {search_query}')
#london_data_df = london_data_df[:5][:]
london_restaurant_series = london_data_df.apply(get_restaurant_london, args=(search_query, radius, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION, LIMIT), axis=1)
# flatten the per-location venue lists into one list
london_restaurant_series_updated = []
for x in london_restaurant_series:
    if len(x) > 0:
        london_restaurant_series_updated.extend(x)
london_restaurant_df = json_normalize(london_restaurant_series_updated)

query - Indian
Final URL - https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=51.4510531,0.0790997&oauth_token=YOUR_ACCESS_TOKEN&v=20180604&query=Indian&radius=500&limit=30
{'venues': []}
Final URL - https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=51.49153695,-0.21497089652160112&oauth_token=YOUR_ACCESS_TOKEN&v=20180604&query=Indian&radius=500&limit=30
{'venues': [{'id': '51cf30bf498e2c23fed00261', 'name': 'Indian Express', 'location': {'address': '3 North End Parade', 'lat': 51.49302784342529, 'lng': -0.20823223710944824, 'labeledLatLngs': [{'label': 'display', 'lat': 51.49302784342529, 'lng': -0.20823223710944824}], 'distance': 495, 'postalCode': 'W14 0SJ', 'cc': 'GB', 'city': 'London', '
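
Each venue in the response carries a `distance` field in metres. As a sanity check, that distance can be recomputed from the query point and the venue coordinates with the haversine formula. The sketch below is an illustration only: the function name `haversine_m` is hypothetical and the Earth is assumed to be a sphere with a mean radius of 6,371 km. The coordinates are those of the second request and its first venue shown above, for which Foursquare reports a distance of 495.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# query point of the second request and the venue "Indian Express"
query_point = (51.49153695, -0.21497089652160112)
venue_point = (51.49302784342529, -0.20823223710944824)
print(round(haversine_m(*query_point, *venue_point)))
```

The result should land within a few metres of the reported value, since the spherical model is only an approximation.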



In [None]:
# printing dataframe containing details of Indian restaurant present in London
display(london_restaurant_df)

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet
0,51cf30bf498e2c23fed00261,Indian Express,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",v-1620732779,False,3 North End Parade,51.493028,-0.208232,"[{'label': 'display', 'lat': 51.49302784342529...",495,W14 0SJ,GB,London,Greater London,United Kingdom,"[3 North End Parade, London, Greater London, W...",
1,5b7aa96cf96b2c002c2aab72,Indian Street Food,"[{'id': '4bf58dd8d48988d1cb941735', 'name': 'F...",v-1620732779,False,,51.497345,-0.133849,"[{'label': 'display', 'lat': 51.49734481108781...",612,SW1P 2HZ,GB,London,Greater London,United Kingdom,"[London, Greater London, SW1P 2HZ, United King...",
2,4ce83bacf3bda1437526b8e4,Blue Ginger,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",v-1620732780,False,7 E Barnet Rd,51.652021,-0.175503,"[{'label': 'display', 'lat': 51.65202147816886...",461,EN 4 8,GB,Hertfordshire,Hertfordshire,United Kingdom,"[7 E Barnet Rd, Hertfordshire, EN 4 8, United ...",
3,4ce83bacf3bda1437526b8e4,Blue Ginger,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",v-1620732780,False,7 E Barnet Rd,51.652021,-0.175503,"[{'label': 'display', 'lat': 51.65202147816886...",461,EN 4 8,GB,Hertfordshire,Hertfordshire,United Kingdom,"[7 E Barnet Rd, Hertfordshire, EN 4 8, United ...",
4,5b7aa96cf96b2c002c2aab72,Indian Street Food,"[{'id': '4bf58dd8d48988d1cb941735', 'name': 'F...",v-1620732781,False,,51.497345,-0.133849,"[{'label': 'display', 'lat': 51.49734481108781...",612,SW1P 2HZ,GB,London,Greater London,United Kingdom,"[London, Greater London, SW1P 2HZ, United King...",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,5a9721a7872f7d0c0e60cb66,Indian Singles UK,"[{'id': '56aa371be4b08b9a8d573554', 'name': 'E...",v-1620732797,False,207 Regent Street,51.513239,-0.141124,"[{'label': 'display', 'lat': 51.51323852, 'lng...",482,W1B 3HH,GB,London,Greater London,United Kingdom,"[207 Regent Street, London, Greater London, W1...",
69,5f23ae7fffd2c04a6efc0e27,Indian Visa Online - London Office,"[{'id': '4bf58dd8d48988d1f6931735', 'name': 'G...",v-1620732797,False,"India House, Aldwych,",51.519799,-0.134196,"[{'label': 'display', 'lat': 51.51979918312931...",542,WC2B 4NA,GB,London,Greater London,United Kingdom,"[India House, Aldwych,, London, Greater London...",
70,4b7aababf964a5206f362fe3,KK Private Indian Bar,[],v-1620732797,False,Pinner Road,51.513619,-0.144432,"[{'label': 'display', 'lat': 51.513619, 'lng':...",494,,GB,Harrow,,United Kingdom,"[Pinner Road, Harrow, United Kingdom]",
71,5638ff9660b2753ece75b9f4,Flava's Fine Indian Cuisine,[],v-1620732798,False,"1 Mattock Lane, Ealing, Ealing London W5 5BG",51.511206,-0.308858,"[{'label': 'display', 'lat': 51.5112056, 'lng'...",300,W5 5BG,GB,,,United Kingdom,"[1 Mattock Lane, Ealing, Ealing London W5 5BG,...",
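
The table above contains repeated venues (for example "Blue Ginger") and entries that match the query text without being restaurants (for example "Indian Visa Online - London Office"). One possible clean-up, sketched here on a few hand-copied rows, is to drop duplicate venue ids and keep only venues that carry a given category id. The helper name `keep_indian_restaurants` is hypothetical. The category id is the one shown above for "Indian Express" and "Blue Ginger", and it is an assumption that it denotes Indian restaurants.

```python
# Category id copied from the output above; assumed to denote Indian restaurants.
INDIAN_RESTAURANT_CATEGORY_ID = "4bf58dd8d48988d10f941735"


def keep_indian_restaurants(venues, category_id=INDIAN_RESTAURANT_CATEGORY_ID):
    """Return venues that have the given category, without repeated venue ids."""
    seen_ids = set()
    kept = []
    for venue in venues:
        if venue["id"] in seen_ids:
            continue  # the same venue was already returned for another location
        category_ids = {category["id"] for category in venue.get("categories", [])}
        if category_id in category_ids:
            seen_ids.add(venue["id"])
            kept.append(venue)
    return kept


# A few venues abbreviated from the output above (only the fields used here).
sample_venues = [
    {"id": "51cf30bf498e2c23fed00261", "name": "Indian Express",
     "categories": [{"id": "4bf58dd8d48988d10f941735"}]},
    {"id": "5b7aa96cf96b2c002c2aab72", "name": "Indian Street Food",
     "categories": [{"id": "4bf58dd8d48988d1cb941735"}]},
    {"id": "4ce83bacf3bda1437526b8e4", "name": "Blue Ginger",
     "categories": [{"id": "4bf58dd8d48988d10f941735"}]},
    {"id": "4ce83bacf3bda1437526b8e4", "name": "Blue Ginger",
     "categories": [{"id": "4bf58dd8d48988d10f941735"}]},
    {"id": "4b7aababf964a5206f362fe3", "name": "KK Private Indian Bar",
     "categories": []},
]

print([venue["name"] for venue in keep_indian_restaurants(sample_venues)])
```

A strict category filter also discards venues with an empty category list, such as "Flava's Fine Indian Cuisine" above, so it trades recall for precision.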


In [None]:
# dropping unnecessary column
london_restaurant_df = london_restaurant_df.drop(['id', 'categories', 'referralId', 'hasPerk', 'location.labeledLatLngs', 'location.formattedAddress'], axis=1)
display(london_restaurant_df)

Unnamed: 0,name,location.address,location.lat,location.lng,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.crossStreet
0,Indian Express,3 North End Parade,51.493028,-0.208232,495,W14 0SJ,GB,London,Greater London,United Kingdom,
1,Indian Street Food,,51.497345,-0.133849,612,SW1P 2HZ,GB,London,Greater London,United Kingdom,
2,Blue Ginger,7 E Barnet Rd,51.652021,-0.175503,461,EN 4 8,GB,Hertfordshire,Hertfordshire,United Kingdom,
3,Blue Ginger,7 E Barnet Rd,51.652021,-0.175503,461,EN 4 8,GB,Hertfordshire,Hertfordshire,United Kingdom,
4,Indian Street Food,,51.497345,-0.133849,612,SW1P 2HZ,GB,London,Greater London,United Kingdom,
...,...,...,...,...,...,...,...,...,...,...,...
68,Indian Singles UK,207 Regent Street,51.513239,-0.141124,482,W1B 3HH,GB,London,Greater London,United Kingdom,
69,Indian Visa Online - London Office,"India House, Aldwych,",51.519799,-0.134196,542,WC2B 4NA,GB,London,Greater London,United Kingdom,
70,KK Private Indian Bar,Pinner Road,51.513619,-0.144432,494,,GB,Harrow,,United Kingdom,
71,Flava's Fine Indian Cuisine,"1 Mattock Lane, Ealing, Ealing London W5 5BG",51.511206,-0.308858,300,W5 5BG,GB,,,United Kingdom,


**Visualization of all the Indian restaurants located in London**

In [None]:
# visualizing all the Indian restaurant in London neighborhoods from the above data frame using folium
# Indian Restaurant London map
map_rest_london = folium.Map(location=get_location('London, England'),zoom_start=10)

# adding marker to map
for lat, lng, rest_name in zip(london_restaurant_df['location.lat'], london_restaurant_df['location.lng'], london_restaurant_df['name']):
    label = '{}'.format(rest_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_rest_london)
map_rest_london

Latitude - 51.5073219, Longitude - -0.1276474


Clustering all the Indian restaurants located in London on the basis of their geographical location

**Algorithm used - *KMeans*** 

In [None]:
# using the K-Means clustering algorithm to cluster the Indian restaurants in London neighborhoods

k=5 # number of clusters

# cluster on the coordinates only
london_cluster_data = london_restaurant_df[['location.lat', 'location.lng']]
kmeans = KMeans(n_clusters=k, random_state=0).fit(london_cluster_data)
# store the cluster label of each restaurant as the first column
london_restaurant_df.insert(0, 'Cluster Labels', kmeans.labels_)
display(london_restaurant_df)

Unnamed: 0,Cluster Labels,name,location.address,location.lat,location.lng,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.crossStreet
0,2,Indian Express,3 North End Parade,51.493028,-0.208232,495,W14 0SJ,GB,London,Greater London,United Kingdom,
1,2,Indian Street Food,,51.497345,-0.133849,612,SW1P 2HZ,GB,London,Greater London,United Kingdom,
2,0,Blue Ginger,7 E Barnet Rd,51.652021,-0.175503,461,EN 4 8,GB,Hertfordshire,Hertfordshire,United Kingdom,
3,0,Blue Ginger,7 E Barnet Rd,51.652021,-0.175503,461,EN 4 8,GB,Hertfordshire,Hertfordshire,United Kingdom,
4,2,Indian Street Food,,51.497345,-0.133849,612,SW1P 2HZ,GB,London,Greater London,United Kingdom,
...,...,...,...,...,...,...,...,...,...,...,...,...
68,2,Indian Singles UK,207 Regent Street,51.513239,-0.141124,482,W1B 3HH,GB,London,Greater London,United Kingdom,
69,2,Indian Visa Online - London Office,"India House, Aldwych,",51.519799,-0.134196,542,WC2B 4NA,GB,London,Greater London,United Kingdom,
70,2,KK Private Indian Bar,Pinner Road,51.513619,-0.144432,494,,GB,Harrow,,United Kingdom,
71,3,Flava's Fine Indian Cuisine,"1 Mattock Lane, Ealing, Ealing London W5 5BG",51.511206,-0.308858,300,W5 5BG,GB,,,United Kingdom,
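
To make the clustering step more concrete, here is a minimal pure-Python sketch of the two alternating steps of k-means (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is an illustration only and not scikit-learn's implementation: the function name `simple_kmeans` is hypothetical, the initial centroids are supplied by hand, and latitude and longitude are treated as plain planar coordinates, as in the cell above.

```python
from math import dist


def simple_kmeans(points, centroids, max_iterations=10):
    """Run k-means from the given initial centroids; return (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iterations):
        # assignment step: index of the nearest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))
            for point in points
        ]
        # update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


# Four restaurant coordinates taken from the table above.
restaurants = [
    (51.493028, -0.208232),  # Indian Express
    (51.497345, -0.133849),  # Indian Street Food
    (51.652021, -0.175503),  # Blue Ginger
    (51.511206, -0.308858),  # Flava's Fine Indian Cuisine
]
labels, centres = simple_kmeans(restaurants, centroids=[restaurants[0], restaurants[2]])
print(labels)
```

With two clusters seeded at the first and third venue, the venue near Barnet ends up in a cluster of its own while the other three share a cluster. scikit-learn's `KMeans` chooses its starting centroids automatically and repeats the procedure from several starts.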


***Visualization of all the Indian restaurants located in London, divided into clusters***

In [None]:
# visualizing all the 5 Indian restaurant neighborhoods cluster from the above data frame using folium
# London map
map_london_clusters = folium.Map(location=get_location('London, England'),zoom_start=10)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# adding markers to the map
for lat, lon, rest_name, cluster in zip(london_restaurant_df['location.lat'], london_restaurant_df['location.lng'], london_restaurant_df['name'], london_restaurant_df['Cluster Labels']):
    label = folium.Popup(' Cluster - ' + str(cluster) + ', ' + rest_name, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # cluster labels run from 0 to k-1, so they index the color list directly
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_london_clusters)
map_london_clusters

Latitude - 51.5073219, Longitude - -0.1276474


Let's find Indian restaurants in ***New York*** and its boroughs through the Foursquare API.

**Method name** - get_restaurant_nyc

**Purpose** - It lists the venues matching the query (here, Indian restaurants) around one New York location

**Input** - Dataframe row which contains latitude and longitude details, query, radius, client_id, client_secret, access_token, version, limit

**Response** - Restaurant details in JSON format

In [None]:
def get_restaurant_nyc(row, query, radius, client_id, client_secret, access_token, version, limit):
    # read the coordinates by column name rather than by position
    latitude = row['latitude']
    longitude = row['longitude']
    url = f'https://api.foursquare.com/v2/venues/search?client_id={client_id}&client_secret={client_secret}&ll={latitude},{longitude}&oauth_token={access_token}&v={version}&query={query}&radius={radius}&limit={limit}'
    print(f'Final URL - {url}')
    resp = api_request(url)
    if resp['meta']['code'] == 200:
        print(resp['response'])
        # assign relevant part of JSON to venues
        venues = resp['response']['venues']
        return venues
    else:
        print(resp)
        # return an empty list so that a failed request is skipped below instead of raising on len(None)
        return []

search_query = 'Indian'
radius = 500
print(f'query - {search_query}')
nyc_restaurant_series = nyc_data_df.apply(get_restaurant_nyc, args=(search_query, radius, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION, LIMIT), axis=1)
# flatten the per-borough venue lists into one list
nyc_restaurant_series_updated = []
for x in nyc_restaurant_series:
    if len(x) > 0:
        nyc_restaurant_series_updated.extend(x)
nyc_restaurant_df = json_normalize(nyc_restaurant_series_updated)

query - Indian
Final URL - https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=40.8466508,-73.8785937&oauth_token=YOUR_ACCESS_TOKEN&v=20180604&query=Indian&radius=500&limit=30
{'venues': []}
Final URL - https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=40.6501038,-73.9495823&oauth_token=YOUR_ACCESS_TOKEN&v=20180604&query=Indian&radius=500&limit=30
{'venues': [{'id': '4f19c785e4b0c9f57df2cbaf', 'name': 'West Indian Ave.', 'location': {'lat': 40.64673565381504, 'lng': -73.94871135351131, 'labeledLatLngs': [{'label': 'display', 'lat': 40.64673565381504, 'lng': -73.94871135351131}], 'distance': 382, 'postalCode': '11226', 'cc': 'US', 'city': 'Brooklyn', 'state': 'NY', 'country': 'United States', '



In [None]:
# printing New York Indian restaurant detail in a dataframe
display(nyc_restaurant_df)

Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.address,location.crossStreet
0,4f19c785e4b0c9f57df2cbaf,West Indian Ave.,"[{'id': '4bf58dd8d48988d1f9931735', 'name': 'R...",v-1620732800,False,40.646736,-73.948711,"[{'label': 'display', 'lat': 40.64673565381504...",382,11226,US,Brooklyn,NY,United States,"[Brooklyn, NY 11226, United States]",,
1,4f32549519836c91c7cd23e2,C B Finger Licking Jamaican and West Indian Re...,"[{'id': '4d4b7105d754a06374d81259', 'name': 'F...",v-1620732800,False,40.646622,-73.948807,"[{'label': 'entrance', 'lat': 40.646603, 'lng'...",393,11226,US,Brooklyn,NY,United States,"[1617 Nostrand Ave, Brooklyn, NY 11226, United...",1617 Nostrand Ave,
2,4cdc6246c409b60ccf30da1a,"Lisa's Pub, Indian Head","[{'id': '4bf58dd8d48988d11b941735', 'name': 'P...",v-1620732800,False,40.652583,-73.95466,"[{'label': 'display', 'lat': 40.652583, 'lng':...",509,11226,US,Brooklyn,NY,United States,"[Brooklyn, NY 11226, United States]",,
3,4a611b29f964a520dec11fe3,Nio's Trinidad Roti Shop,"[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",v-1620732800,False,40.650638,-73.952338,"[{'label': 'display', 'lat': 40.65063843199367...",240,11226,US,Brooklyn,NY,United States,"[2702 Church Ave (at Rogers Ave.), Brooklyn, N...",2702 Church Ave,at Rogers Ave.
4,5b17c9f935d3fc002c2c09db,"Indian Egg Donors DGA, Inc.","[{'id': '54541900498ea6ccd0202697', 'name': 'H...",v-1620732800,False,40.787468,-73.955444,"[{'label': 'display', 'lat': 40.78746795654297...",445,10128,US,New York,NY,United States,"[1148 5th Ave (Suite 1B), New York, NY 10128, ...",1148 5th Ave,Suite 1B


In [None]:
# drop the columns that are not needed for the analysis
nyc_restaurant_df = nyc_restaurant_df.drop(['id', 'categories', 'referralId', 'hasPerk', 'location.labeledLatLngs', 'location.formattedAddress'], axis=1)
display(nyc_restaurant_df)

Unnamed: 0,name,location.lat,location.lng,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.address,location.crossStreet
0,West Indian Ave.,40.646736,-73.948711,382,11226,US,Brooklyn,NY,United States,,
1,C B Finger Licking Jamaican and West Indian Re...,40.646622,-73.948807,393,11226,US,Brooklyn,NY,United States,1617 Nostrand Ave,
2,"Lisa's Pub, Indian Head",40.652583,-73.95466,509,11226,US,Brooklyn,NY,United States,,
3,Nio's Trinidad Roti Shop,40.650638,-73.952338,240,11226,US,Brooklyn,NY,United States,2702 Church Ave,at Rogers Ave.
4,"Indian Egg Donors DGA, Inc.",40.787468,-73.955444,445,10128,US,New York,NY,United States,1148 5th Ave,Suite 1B


**Visualization of all the Indian restaurants located in New York**

In [None]:
# visualize all the Indian restaurants in New York from the above dataframe using folium
map_rest_nyc = folium.Map(location=get_location('New York, USA'), zoom_start=10)

# add one circle marker per restaurant, with the restaurant name as the popup
for lat, lng, rest_name in zip(nyc_restaurant_df['location.lat'], nyc_restaurant_df['location.lng'], nyc_restaurant_df['name']):
    label = folium.Popup(str(rest_name), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_rest_nyc)
map_rest_nyc

Latitude - 40.7127281, Longitude - -74.0060152


Clustering all the Indian restaurants located in New York on the basis of their geographical location

**Algorithm used - *KMeans***

In [None]:
# cluster the Indian restaurants in New York by location using the K-Means algorithm
k = 3

# use only the latitude and longitude columns as clustering features
nyc_cluster_data = nyc_restaurant_df[['location.lat', 'location.lng']]
kmeans = KMeans(n_clusters=k, random_state=0).fit(nyc_cluster_data)

# add the cluster label of each restaurant as the first column
nyc_restaurant_df.insert(0, 'Cluster Labels', kmeans.labels_)
display(nyc_restaurant_df)

Unnamed: 0,Cluster Labels,name,location.lat,location.lng,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.address,location.crossStreet
0,1,West Indian Ave.,40.646736,-73.948711,382,11226,US,Brooklyn,NY,United States,,
1,1,C B Finger Licking Jamaican and West Indian Re...,40.646622,-73.948807,393,11226,US,Brooklyn,NY,United States,1617 Nostrand Ave,
2,2,"Lisa's Pub, Indian Head",40.652583,-73.95466,509,11226,US,Brooklyn,NY,United States,,
3,2,Nio's Trinidad Roti Shop,40.650638,-73.952338,240,11226,US,Brooklyn,NY,United States,2702 Church Ave,at Rogers Ave.
4,0,"Indian Egg Donors DGA, Inc.",40.787468,-73.955444,445,10128,US,New York,NY,United States,1148 5th Ave,Suite 1B
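
To make the clustering step more concrete, here is a plain-Python sketch of the basic k-means iteration (assign each point to its nearest centre, then move each centre to the mean of its points), run on the five coordinates from the table above. This is not the scikit-learn implementation: the function name `simple_kmeans` and the farthest-point seeding are assumptions chosen to keep the example deterministic, whereas `KMeans` uses k-means++ seeding by default. Like the notebook, it treats latitude and longitude as plain planar coordinates.

```python
from math import dist  # Euclidean distance, Python 3.8+

def simple_kmeans(points, k, iterations=20):
    """Illustrative k-means: returns (labels, centers)."""
    # Seeding (assumption): start with the first point, then repeatedly
    # add the point farthest from all centres chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centre.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# (location.lat, location.lng) of the five rows in the table above
points = [
    (40.646736, -73.948711),
    (40.646622, -73.948807),
    (40.652583, -73.954660),
    (40.650638, -73.952338),
    (40.787468, -73.955444),
]

labels, centers = simple_kmeans(points, k=3)
print(labels)
```

The label numbers are arbitrary and need not match the `Cluster Labels` column, but the grouping is the same: rows 0 and 1 share a cluster, rows 2 and 3 share a cluster, and row 4 is on its own.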


***Visualization of all the Indian restaurants located in New York, divided into clusters***

In [None]:
# visualize the k Indian restaurant clusters from the above dataframe using folium
map_nyc_clusters = folium.Map(location=get_location('New York, USA'), zoom_start=10)

# set the color scheme: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per restaurant, colored by its cluster label
for lat, lon, rest_name, cluster in zip(nyc_restaurant_df['location.lat'], nyc_restaurant_df['location.lng'], nyc_restaurant_df['name'], nyc_restaurant_df['Cluster Labels']):
    label = folium.Popup(' Cluster - ' + str(cluster) + ', ' + str(rest_name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_nyc_clusters)
map_nyc_clusters

Latitude - 40.7127281, Longitude - -74.0060152
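
The cluster map needs one distinct color per cluster label. As a small standalone sketch of that idea, the snippet below swaps in the standard-library `colorsys` module for the matplotlib colormap used in the notebook; `cluster_colors` is a hypothetical helper name.

```python
import colorsys

def cluster_colors(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread the hues evenly around the color wheel at full saturation.
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_colors(3)
print(palette)
```

A marker for a restaurant with cluster label `c` would then use `palette[c]` as its color.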


# **Results and Discussion**
*   From the above analysis, we can see that London has more Indian restaurants than New York.
*   After applying the KMeans clustering algorithm to the restaurant data of both cities, it can be seen that the **Central London** area has the highest number of Indian restaurants among all the boroughs of London.

**Analysis Details-**


*   Collected borough data from Wikipedia for both cities, London and New York.
*   Used the geocoder Python library to find the geographical location (latitude and longitude) of each borough of both cities.
*   Used the folium Python library to plot maps of the London and New York borough locations.
*   Used the Foursquare API to collect the details of the Indian restaurants in each borough of both cities.
*   Used the KMeans clustering algorithm to cluster the Indian restaurants in both cities by location.
*   Used folium to plot an individual map for each city showing all of its clusters.



# **Conclusion**
The purpose of this project was to explore the cities of London and New York and see how attractive they are to potential tourists and residents. We explored both cities by borough and found the number of Indian restaurants located there, and finally clustered the restaurants based on their location.

We could see that London has a wide variety of experiences to offer in terms of Indian restaurants, which makes it unique in its own way compared to New York. The cultural diversity in food is quite evident, which also gives Indian tourists and residents a sense of inclusion.

Both New York and London offer Indian restaurants, but in numbers London wins the race. **Overall, my suggestion would be to go to London if you are looking for good Indian food, because it has around 73 Indian restaurants, and especially to Central London, because it has more Indian restaurants than the other boroughs.**

# **Click on the Open in Colab button at the top of the notebook to see the Folium maps**