Data Science Capstone

Introduction/Business Problem

The recent Covid-19 pandemic has implications not only for public health but also for economic growth, particularly in the hospitality industry.  Small food and takeaway outlets, for example, have been affected by changes in commuter traffic.

The aim of this project is to investigate how economic, demographic, and location factors affect where a new food outlet should be located in London and its boroughs.  For simplicity, I will consider the optimal location of an Indian restaurant, drawing on demographic, economic, and location data.  In particular, this report will focus on the following points.

1. Median household income;
2. Average house prices;
3. Proximity of factories and other workplaces;
4. Foot traffic data (Foursquare data);
5. Concentration and popularity of similar food outlets (Foursquare data);
6. Crime levels by borough, which often correlate with economic prosperity and education;
7. Popularity of similar types of restaurants in London boroughs.

Data sources

1. Open-source datasets, for example country-of-birth datasets by region (demographics);
2. Average household income by region/borough in London (spending power);
3. Foursquare data to visualize types of workplaces;
4. Foursquare foot traffic data;
5. Foursquare data to visualize pockets of similar food outlets by region/borough in London.

How the data will be used

1. Folium choropleth maps to visualise household income per capita across London boroughs;
2. Folium maps to visualise where the highest concentrations of similar restaurants are (based on longitude/latitude coordinates);
3. Foursquare data to analyse user data, for example ratings;
4. Heatmap of crime rates.

In [1]:
!pip install geopandas


In [2]:
import geopandas as gpd

Data processing stage

Data were pre-processed to make them more suitable for analysis.  For example, the point geometry column of a GeoJSON file of Middle Layer Super Output Area (MSOA) population-weighted centroids was split into separate longitude and latitude columns (using the lambda functions shown below).

In [3]:
geoData = gpd.read_file('Middle_Layer_Super_Output_Areas_(December_2011)_Population_Weighted_Centroids.geojson')
print(geoData.head())

   objectid   msoa11cd                  msoa11nm                   geometry
0         1  E02002536      Stockton-on-Tees 002  POINT (-1.29577 54.61069)
1         2  E02002537      Stockton-on-Tees 003  POINT (-1.27726 54.61131)
2         3  E02002534  Redcar and Cleveland 020  POINT (-1.05346 54.52765)
3         4  E02002535      Stockton-on-Tees 001  POINT (-1.28729 54.62215)
4         5  E02002532  Redcar and Cleveland 018  POINT (-1.05793 54.53718)


In [4]:
import pandas as pd

In [5]:
from geopandas import GeoDataFrame as gdf

In [6]:
df1 = pd.DataFrame(geoData)

In [7]:
df1.head()

Unnamed: 0,objectid,msoa11cd,msoa11nm,geometry
0,1,E02002536,Stockton-on-Tees 002,POINT (-1.29577 54.61069)
1,2,E02002537,Stockton-on-Tees 003,POINT (-1.27726 54.61131)
2,3,E02002534,Redcar and Cleveland 020,POINT (-1.05346 54.52765)
3,4,E02002535,Stockton-on-Tees 001,POINT (-1.28729 54.62215)
4,5,E02002532,Redcar and Cleveland 018,POINT (-1.05793 54.53718)


In [8]:
pd.DataFrame(df1['geometry'].tolist(), index=df1.index)

Unnamed: 0,0
0,POINT (-1.295773277600421 54.61069062106589)
1,POINT (-1.277262994489106 54.6113131901005)
2,POINT (-1.05345763288785 54.52764636018806)
3,POINT (-1.287294649101765 54.62214794778174)
4,POINT (-1.057931283237247 54.53717611590013)
...,...
7196,POINT (-3.582973692236147 51.60628436279884)
7197,POINT (-3.572921174293492 51.52123325638538)
7198,POINT (-3.596584857026458 51.51307902167358)
7199,POINT (-3.753911835882974 51.78076807891709)


In [9]:
df1['lon'] = df1.geometry.apply(lambda p: p.x)
df1['lat'] = df1.geometry.apply(lambda p: p.y)

In [10]:
df1.head()

Unnamed: 0,objectid,msoa11cd,msoa11nm,geometry,lon,lat
0,1,E02002536,Stockton-on-Tees 002,POINT (-1.29577 54.61069),-1.295773,54.610691
1,2,E02002537,Stockton-on-Tees 003,POINT (-1.27726 54.61131),-1.277263,54.611313
2,3,E02002534,Redcar and Cleveland 020,POINT (-1.05346 54.52765),-1.053458,54.527646
3,4,E02002535,Stockton-on-Tees 001,POINT (-1.28729 54.62215),-1.287295,54.622148
4,5,E02002532,Redcar and Cleveland 018,POINT (-1.05793 54.53718),-1.057931,54.537176
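
If the centroids are ever loaded without geopandas (for instance from a CSV export in which the geometry column is plain text), the same longitude and latitude could be recovered by parsing the well-known-text string directly. The following is a minimal sketch under that assumption; `parse_wkt_point` is a hypothetical helper that is not part of the notebook, and it only handles plain decimal coordinates.

```python
import re

# Matches WKT such as "POINT (-1.29577 54.61069)": longitude first, then latitude.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def parse_wkt_point(wkt):
    """Return (lon, lat) as floats from a WKT POINT string."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError("not a WKT POINT: {!r}".format(wkt))
    lon, lat = (float(group) for group in match.groups())
    return lon, lat


# Stockton-on-Tees 002, as printed in the table above
lon, lat = parse_wkt_point("POINT (-1.29577 54.61069)")
```

Note that WKT lists the x coordinate (longitude) before the y coordinate (latitude), which is the reverse of the order folium expects.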


In [11]:
df1['msoa11nm'].str.contains('Barking', na=False)

0       False
1       False
2       False
3       False
4       False
        ...  
7196    False
7197    False
7198    False
7199    False
7200    False
Name: msoa11nm, Length: 7201, dtype: bool

In [12]:
df1['msoa11nm'].str.contains('Westminster', na=False)

0       False
1       False
2       False
3       False
4       False
        ...  
7196    False
7197    False
7198    False
7199    False
7200    False
Name: msoa11nm, Length: 7201, dtype: bool

In [13]:
df1[df1['msoa11nm'].str.contains('Ealing 001', na=False)]

Unnamed: 0,objectid,msoa11cd,msoa11nm,geometry,lon,lat
4408,4409,E02000238,Ealing 001,POINT (-0.34854 51.55385),-0.348536,51.553848


In [14]:
df1[df1['msoa11nm'].str.contains('Tower Hamlets', na=False)]

Unnamed: 0,objectid,msoa11cd,msoa11nm,geometry,lon,lat
4080,4081,E02000876,Tower Hamlets 013,POINT (-0.05864 51.52228),-0.058644,51.522276
4081,4082,E02000877,Tower Hamlets 014,POINT (-0.02918 51.51966),-0.029178,51.519658
4082,4083,E02000874,Tower Hamlets 011,POINT (-0.05253 51.52431),-0.052534,51.524307
4083,4084,E02000875,Tower Hamlets 012,POINT (-0.02202 51.52236),-0.022019,51.522358
4084,4085,E02000872,Tower Hamlets 009,POINT (-0.06874 51.52535),-0.068743,51.525349
4085,4086,E02000873,Tower Hamlets 010,POINT (-0.04236 51.52529),-0.042356,51.525293
4086,4087,E02000870,Tower Hamlets 007,POINT (-0.04874 51.52793),-0.048735,51.527933
4087,4088,E02000871,Tower Hamlets 008,POINT (-0.01543 51.52577),-0.015432,51.52577
4094,4095,E02000878,Tower Hamlets 015,POINT (-0.07030 51.51889),-0.0703,51.518885
4096,4097,E02000879,Tower Hamlets 016,POINT (-0.04895 51.51678),-0.048948,51.516782
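
The substring filters above work one borough at a time, and a pattern such as 'Barking' relies on a partial match. One possible alternative, sketched here on a toy frame, is to strip the trailing three-digit MSOA number and compare the remainder against a list of borough names. The `boroughs` list is an illustrative subset and the `borough` column is an assumed name, neither of which comes from the notebook.

```python
import pandas as pd

# Toy frame mirroring the msoa11nm column; the rows are copied from the outputs above.
names = pd.DataFrame({
    "msoa11nm": [
        "Stockton-on-Tees 002",
        "Ealing 001",
        "Tower Hamlets 013",
        "Tower Hamlets 014",
    ]
})

boroughs = ["Ealing", "Tower Hamlets"]  # illustrative subset, not the full list of boroughs

# Strip the trailing three-digit MSOA number to recover the local authority name.
names["borough"] = names["msoa11nm"].str.replace(r"\s\d{3}$", "", regex=True)

# Keep only the rows whose borough is in the list (exact match, not substring).
london_subset = names[names["borough"].isin(boroughs)]
```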


In [15]:
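# E12000007 is the London region code; msoa11cd holds MSOA codes (e.g. E02000238), so this filter returns no rows
df1[df1['msoa11cd'].str.contains('E12000007', na=False)]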

Unnamed: 0,objectid,msoa11cd,msoa11nm,geometry,lon,lat


Data were processed to output the counts for the groups that fall under the Indian section of the population across London boroughs.  This was done by iterating through the column names and limiting the output to those columns that contain the word "Indian".

In [16]:
censusLondon = pd.read_excel('R2_2_EW__RT__Table_QS211__OA_London_v1.xlsx', 'MSOA', skiprows=9)

In [17]:
censusLondon.head()

Unnamed: 0,Region code,Region name,Local authority code,Local authority name,MSOA Code,MSOA Name,Unnamed: 6,Persons,Persons.1,Persons.2,...,Persons.241,Persons.242,Persons.243,Persons.244,Persons.245,Persons.246,Persons.247,Persons.248,Persons.249,Persons.250
0,,,,,,,,All categories: Ethnic group (detailed),White: English/Welsh/Scottish/Northern Irish/B...,White: Irish,...,Other ethnic group: Punjabi,Other ethnic group: Somali,Other ethnic group: Somalilander,Other ethnic group: Sri Lankan,Other ethnic group: Tamil,Other ethnic group: Thai,Other ethnic group: Turkish,Other ethnic group: Turkish Cypriot,Other ethnic group: Vietnamese,Other ethnic group: Any other ethnic group
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,E12000007,LONDON,E09000007,Camden,E02000166,Camden 001,,7924,4764,317,...,0,0,0,0,0,0,4,2,0,51
4,E12000007,LONDON,E09000007,Camden,E02000167,Camden 002,,7944,4606,162,...,0,0,0,0,0,0,1,0,0,31


In [18]:
censusLondon

Unnamed: 0,Region code,Region name,Local authority code,Local authority name,MSOA Code,MSOA Name,Unnamed: 6,Persons,Persons.1,Persons.2,...,Persons.241,Persons.242,Persons.243,Persons.244,Persons.245,Persons.246,Persons.247,Persons.248,Persons.249,Persons.250
0,,,,,,,,All categories: Ethnic group (detailed),White: English/Welsh/Scottish/Northern Irish/B...,White: Irish,...,Other ethnic group: Punjabi,Other ethnic group: Somali,Other ethnic group: Somalilander,Other ethnic group: Sri Lankan,Other ethnic group: Tamil,Other ethnic group: Thai,Other ethnic group: Turkish,Other ethnic group: Turkish Cypriot,Other ethnic group: Vietnamese,Other ethnic group: Any other ethnic group
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,E12000007,LONDON,E09000007,Camden,E02000166,Camden 001,,7924,4764,317,...,0,0,0,0,0,0,4,2,0,51
4,E12000007,LONDON,E09000007,Camden,E02000167,Camden 002,,7944,4606,162,...,0,0,0,0,0,0,1,0,0,31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
981,E12000007,LONDON,E09000031,Waltham Forest,E02000918,Waltham Forest 024,,9126,2734,149,...,0,25,0,0,0,2,37,6,0,47
982,E12000007,LONDON,E09000031,Waltham Forest,E02000919,Waltham Forest 025,,9221,1871,162,...,0,4,0,0,0,0,18,5,0,33
983,E12000007,LONDON,E09000031,Waltham Forest,E02000920,Waltham Forest 026,,9479,1690,111,...,0,7,0,0,0,3,57,10,0,41
984,E12000007,LONDON,E09000031,Waltham Forest,E02000921,Waltham Forest 027,,11001,2369,143,...,0,11,0,0,0,0,82,9,1,42


In [19]:
print(censusLondon.head())

  Region code Region name Local authority code Local authority name  \
0         NaN         NaN                  NaN                  NaN   
1         NaN         NaN                  NaN                  NaN   
2         NaN         NaN                  NaN                  NaN   
3   E12000007      LONDON            E09000007               Camden   
4   E12000007      LONDON            E09000007               Camden   

   MSOA Code   MSOA Name  Unnamed: 6                                  Persons  \
0        NaN         NaN         NaN  All categories: Ethnic group (detailed)   
1        NaN         NaN         NaN                                      NaN   
2        NaN         NaN         NaN                                      NaN   
3  E02000166  Camden 001         NaN                                     7924   
4  E02000167  Camden 002         NaN                                     7944   

                                           Persons.1     Persons.2  ...  \
0  White: E

In [20]:
column_names_first = censusLondon.columns[0:6].tolist()

In [21]:
print(column_names_first)

['Region code', 'Region name', 'Local authority code', 'Local authority name', 'MSOA Code', 'MSOA Name']


In [22]:
column_names_second = censusLondon.iloc[0,6:].tolist()

In [23]:
print(column_names_second)

[nan, 'All categories: Ethnic group (detailed)', 'White: English/Welsh/Scottish/Northern Irish/British', 'White: Irish', 'White: Gypsy or Irish Traveller', 'White: Afghan', 'White: Albanian', 'White: Anglo Indian', 'White: Argentinian', 'White: Australian/New Zealander', 'White: Baltic States', 'White: Bosnian', 'White: Brazilian', 'White: British Asian', 'White: Burmese', 'White: Chilean', 'White: Colombian', 'White: Commonwealth of (Russian) Independent States', 'White: Croatian', 'White: Cuban', 'White: Cypriot (part not stated)', 'White: Ecuadorian', 'White: European Mixed', 'White: Filipino', 'White: Greek', 'White: Greek Cypriot', 'White: Iranian', 'White: Israeli', 'White: Italian', 'White: Japanese', 'White: Kashmiri', 'White: Kosovan', 'White: Kurdish', 'White: Latin/South/Central American', 'White: Malaysian', 'White: Mexican', 'White: Moroccan', 'White: Multi-ethnic islands', 'White: Nepalese (includes Gurkha)', 'White: Nigerian', 'White: North African', 'White: North Americ

In [24]:
censusLondon.drop(censusLondon.columns[6], axis=1, inplace=True)

In [25]:
column_names_first = censusLondon.columns[0:6].tolist()

In [26]:
column_names_second = censusLondon.iloc[0,6:].tolist()

In [27]:
NewColumns = column_names_first + column_names_second

In [28]:
censusLondon.columns = NewColumns

In [29]:
print(censusLondon.head())

  Region code Region name Local authority code Local authority name  \
0         NaN         NaN                  NaN                  NaN   
1         NaN         NaN                  NaN                  NaN   
2         NaN         NaN                  NaN                  NaN   
3   E12000007      LONDON            E09000007               Camden   
4   E12000007      LONDON            E09000007               Camden   

   MSOA Code   MSOA Name  All categories: Ethnic group (detailed)  \
0        NaN         NaN  All categories: Ethnic group (detailed)   
1        NaN         NaN                                      NaN   
2        NaN         NaN                                      NaN   
3  E02000166  Camden 001                                     7924   
4  E02000167  Camden 002                                     7944   

  White: English/Welsh/Scottish/Northern Irish/British  White: Irish  \
0  White: English/Welsh/Scottish/Northern Irish/B...    White: Irish   
1             

In [30]:
censusLondon.drop([0,1,2])

Unnamed: 0,Region code,Region name,Local authority code,Local authority name,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: English/Welsh/Scottish/Northern Irish/British,White: Irish,White: Gypsy or Irish Traveller,...,Other ethnic group: Punjabi,Other ethnic group: Somali,Other ethnic group: Somalilander,Other ethnic group: Sri Lankan,Other ethnic group: Tamil,Other ethnic group: Thai,Other ethnic group: Turkish,Other ethnic group: Turkish Cypriot,Other ethnic group: Vietnamese,Other ethnic group: Any other ethnic group
3,E12000007,LONDON,E09000007,Camden,E02000166,Camden 001,7924,4764,317,7,...,0,0,0,0,0,0,4,2,0,51
4,E12000007,LONDON,E09000007,Camden,E02000167,Camden 002,7944,4606,162,7,...,0,0,0,0,0,0,1,0,0,31
5,E12000007,LONDON,E09000007,Camden,E02000168,Camden 003,8172,4879,346,2,...,0,6,0,0,0,1,3,0,1,32
6,E12000007,LONDON,E09000007,Camden,E02000169,Camden 004,7637,3663,174,1,...,0,0,1,0,0,0,4,0,0,48
7,E12000007,LONDON,E09000007,Camden,E02000170,Camden 005,8338,4041,336,2,...,0,5,0,0,0,0,2,0,0,39
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
981,E12000007,LONDON,E09000031,Waltham Forest,E02000918,Waltham Forest 024,9126,2734,149,5,...,0,25,0,0,0,2,37,6,0,47
982,E12000007,LONDON,E09000031,Waltham Forest,E02000919,Waltham Forest 025,9221,1871,162,11,...,0,4,0,0,0,0,18,5,0,33
983,E12000007,LONDON,E09000031,Waltham Forest,E02000920,Waltham Forest 026,9479,1690,111,5,...,0,7,0,0,0,3,57,10,0,41
984,E12000007,LONDON,E09000031,Waltham Forest,E02000921,Waltham Forest 027,11001,2369,143,16,...,0,11,0,0,0,0,82,9,1,42


In [31]:
censusLondon.head()

Unnamed: 0,Region code,Region name,Local authority code,Local authority name,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: English/Welsh/Scottish/Northern Irish/British,White: Irish,White: Gypsy or Irish Traveller,...,Other ethnic group: Punjabi,Other ethnic group: Somali,Other ethnic group: Somalilander,Other ethnic group: Sri Lankan,Other ethnic group: Tamil,Other ethnic group: Thai,Other ethnic group: Turkish,Other ethnic group: Turkish Cypriot,Other ethnic group: Vietnamese,Other ethnic group: Any other ethnic group
0,,,,,,,All categories: Ethnic group (detailed),White: English/Welsh/Scottish/Northern Irish/B...,White: Irish,White: Gypsy or Irish Traveller,...,Other ethnic group: Punjabi,Other ethnic group: Somali,Other ethnic group: Somalilander,Other ethnic group: Sri Lankan,Other ethnic group: Tamil,Other ethnic group: Thai,Other ethnic group: Turkish,Other ethnic group: Turkish Cypriot,Other ethnic group: Vietnamese,Other ethnic group: Any other ethnic group
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,E12000007,LONDON,E09000007,Camden,E02000166,Camden 001,7924,4764,317,7,...,0,0,0,0,0,0,4,2,0,51
4,E12000007,LONDON,E09000007,Camden,E02000167,Camden 002,7944,4606,162,7,...,0,0,0,0,0,0,1,0,0,31


In [32]:
censusLondon = censusLondon.drop([0,1,2])

In [33]:
censusLondon.head()

Unnamed: 0,Region code,Region name,Local authority code,Local authority name,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: English/Welsh/Scottish/Northern Irish/British,White: Irish,White: Gypsy or Irish Traveller,...,Other ethnic group: Punjabi,Other ethnic group: Somali,Other ethnic group: Somalilander,Other ethnic group: Sri Lankan,Other ethnic group: Tamil,Other ethnic group: Thai,Other ethnic group: Turkish,Other ethnic group: Turkish Cypriot,Other ethnic group: Vietnamese,Other ethnic group: Any other ethnic group
3,E12000007,LONDON,E09000007,Camden,E02000166,Camden 001,7924,4764,317,7,...,0,0,0,0,0,0,4,2,0,51
4,E12000007,LONDON,E09000007,Camden,E02000167,Camden 002,7944,4606,162,7,...,0,0,0,0,0,0,1,0,0,31
5,E12000007,LONDON,E09000007,Camden,E02000168,Camden 003,8172,4879,346,2,...,0,6,0,0,0,1,3,0,1,32
6,E12000007,LONDON,E09000007,Camden,E02000169,Camden 004,7637,3663,174,1,...,0,0,1,0,0,0,4,0,0,48
7,E12000007,LONDON,E09000007,Camden,E02000170,Camden 005,8338,4041,336,2,...,0,5,0,0,0,0,2,0,0,39
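
The header repair carried out in cells In [20] to In [32] can be condensed into a few lines. The sketch below reproduces the idea on a small toy frame: the real labels sit in the first data row, the identifier columns already have usable names, and rows without an MSOA code are discarded. The frame `raw`, the shortened label "All categories" and the variable `n_id` are assumptions made for illustration.

```python
import pandas as pd

# Toy version of the sheet: row 0 carries the real labels, row 1 is a blank spacer.
raw = pd.DataFrame(
    [
        [None, None, "All categories", "White: Irish"],
        [None, None, None, None],
        ["E02000166", "Camden 001", 7924, 317],
    ],
    columns=["MSOA Code", "MSOA Name", "Persons", "Persons.1"],
)

n_id = 2  # number of leading identifier columns that already have usable names

# Keep the identifier names and take the remaining names from the first data row.
new_cols = raw.columns[:n_id].tolist() + raw.iloc[0, n_id:].tolist()

tidy = raw.copy()
tidy.columns = new_cols

# Drop rows with no MSOA code (the label row and the spacer row), then renumber.
tidy = tidy.dropna(subset=["MSOA Code"]).reset_index(drop=True)
```

Filtering on a missing MSOA code rather than on fixed row positions means the step does not depend on how many spacer rows the sheet happens to contain.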


In [34]:
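# str.contains is case-sensitive by default, so the lowercase pattern 'chinese' matches no column names (see output)
colNames = censusLondon.columns[censusLondon.columns.str.contains(pat = 'chinese')] 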

In [35]:
print(colNames)

Index([], dtype='object')


In [36]:
indianPop_cols =[x for x in censusLondon.columns[censusLondon.columns.str.contains('Indian')]]


In [37]:
print(indianPop_cols)

['White: Anglo Indian', 'Mixed/multiple ethnic group: Anglo Indian', 'Asian/Asian British: Indian or British Indian', 'Asian/Asian British: Anglo Indian', 'Other ethnic group: Anglo Indian']
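
The empty result for 'chinese' and the five matches for 'Indian' both follow from `str.contains` being case-sensitive by default. A small sketch on a hand-made index shows the difference; the label "Asian/Asian British: Chinese" is assumed here purely for illustration.

```python
import pandas as pd

cols = pd.Index([
    "MSOA Code",
    "Asian/Asian British: Indian or British Indian",
    "Asian/Asian British: Chinese",  # assumed label, for illustration only
    "White: Anglo Indian",
])

# Case-sensitive (default): the lowercase pattern finds nothing.
exact = cols[cols.str.contains("chinese")]

# Case-insensitive: the same pattern now matches.
relaxed = cols[cols.str.contains("chinese", case=False)]

# Capitalised pattern matching the labels as written.
indian = cols[cols.str.contains("Indian")].tolist()
```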


In [38]:
ind_df = censusLondon[['MSOA Code', 'MSOA Name', 'All categories: Ethnic group (detailed)'] + indianPop_cols]


In [39]:
print(ind_df)

     MSOA Code           MSOA Name All categories: Ethnic group (detailed)  \
3    E02000166          Camden 001                                    7924   
4    E02000167          Camden 002                                    7944   
5    E02000168          Camden 003                                    8172   
6    E02000169          Camden 004                                    7637   
7    E02000170          Camden 005                                    8338   
..         ...                 ...                                     ...   
981  E02000918  Waltham Forest 024                                    9126   
982  E02000919  Waltham Forest 025                                    9221   
983  E02000920  Waltham Forest 026                                    9479   
984  E02000921  Waltham Forest 027                                   11001   
985  E02000922  Waltham Forest 028                                   10523   

    White: Anglo Indian Mixed/multiple ethnic group: Anglo Indi

In [40]:
ind_df[:1]

Unnamed: 0,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian
3,E02000166,Camden 001,7924,0,2,76,0,0


In [41]:

ind_columns = ind_df.columns.tolist()

In [42]:
print(ind_columns)

['MSOA Code', 'MSOA Name', 'All categories: Ethnic group (detailed)', 'White: Anglo Indian', 'Mixed/multiple ethnic group: Anglo Indian', 'Asian/Asian British: Indian or British Indian', 'Asian/Asian British: Anglo Indian', 'Other ethnic group: Anglo Indian']


In [43]:
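# Summing every column, including the text identifier columns, does not give the intended totals (all zeros below);
# In [46] restricts the sum to the count columns.
sum_row = ind_df.sum(axis=1)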

In [44]:
print(sum_row)

3      0.0
4      0.0
5      0.0
6      0.0
7      0.0
      ... 
981    0.0
982    0.0
983    0.0
984    0.0
985    0.0
Length: 983, dtype: float64


In [45]:
ind_df.iloc[:,3:8]

Unnamed: 0,White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian
3,0,2,76,0,0
4,0,1,163,4,0
5,0,1,85,1,0
6,0,1,324,1,0
7,0,0,284,0,0
...,...,...,...,...,...
981,0,0,830,0,0
982,0,1,574,1,0
983,0,1,436,0,1
984,0,0,479,0,1


In [46]:
sum_row = ind_df.iloc[:,3:8].sum(axis=1)

In [47]:
print(sum_row)

3       78.0
4      168.0
5       87.0
6      326.0
7      284.0
       ...  
981    830.0
982    576.0
983    438.0
984    480.0
985    419.0
Length: 983, dtype: float64


In [48]:
ind_df['total_indian'] = ind_df.iloc[:,3:8].sum(axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [49]:
# .loc is label-based and always needs a row indexer; slicing by position requires .iloc
ind_df['total_indian'] = ind_df.iloc[:,3:8].sum(axis=1)

In [50]:
ind_df.loc[:,'Total'] = ind_df.iloc[:,3:8].sum(axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = _infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(ilocs[0], value)
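
The warning above appears because `ind_df` was created by selecting columns from `censusLondon`, so pandas cannot tell whether the new column is meant for the subset or for the original frame. One common way to avoid it, sketched here on a toy frame, is to take an explicit copy of the subset before adding columns. The frame `census` and its reduced set of columns are assumptions for illustration; the counts are those of Camden 001 and Camden 002.

```python
import pandas as pd

# object dtype mimics columns read from a sheet that also contained a text label row
census = pd.DataFrame({
    "MSOA Code": ["E02000166", "E02000167"],
    "Asian/Asian British: Indian or British Indian": [76, 163],
    "Asian/Asian British: Anglo Indian": [0, 4],
    "Mixed/multiple ethnic group: Anglo Indian": [2, 1],
}, dtype=object)

indian_cols = [c for c in census.columns if "Indian" in c]

# .copy() makes the subset an independent frame, so adding a column is unambiguous.
subset = census[["MSOA Code"] + indian_cols].copy()

# Convert the counts to a numeric dtype before summing.
for col in indian_cols:
    subset[col] = pd.to_numeric(subset[col])

subset["total_indian"] = subset[indian_cols].sum(axis=1)
```

Selecting the count columns by name rather than by position (`iloc[:, 3:8]`) also keeps the total correct if further columns are added to the frame later.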


In [51]:
ind_df.head()

Unnamed: 0,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian,total_indian,Total
3,E02000166,Camden 001,7924,0,2,76,0,0,78.0,78.0
4,E02000167,Camden 002,7944,0,1,163,4,0,168.0,168.0
5,E02000168,Camden 003,8172,0,1,85,1,0,87.0,87.0
6,E02000169,Camden 004,7637,0,1,324,1,0,326.0,326.0
7,E02000170,Camden 005,8338,0,0,284,0,0,284.0,284.0


In [52]:
df1.head()

Unnamed: 0,objectid,msoa11cd,msoa11nm,geometry,lon,lat
0,1,E02002536,Stockton-on-Tees 002,POINT (-1.29577 54.61069),-1.295773,54.610691
1,2,E02002537,Stockton-on-Tees 003,POINT (-1.27726 54.61131),-1.277263,54.611313
2,3,E02002534,Redcar and Cleveland 020,POINT (-1.05346 54.52765),-1.053458,54.527646
3,4,E02002535,Stockton-on-Tees 001,POINT (-1.28729 54.62215),-1.287295,54.622148
4,5,E02002532,Redcar and Cleveland 018,POINT (-1.05793 54.53718),-1.057931,54.537176


In [53]:
ind_df.head()

Unnamed: 0,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian,total_indian,Total
3,E02000166,Camden 001,7924,0,2,76,0,0,78.0,78.0
4,E02000167,Camden 002,7944,0,1,163,4,0,168.0,168.0
5,E02000168,Camden 003,8172,0,1,85,1,0,87.0,87.0
6,E02000169,Camden 004,7637,0,1,324,1,0,326.0,326.0
7,E02000170,Camden 005,8338,0,0,284,0,0,284.0,284.0


In [54]:
merged_ind_df = pd.merge(ind_df, df1, how='left', left_on='MSOA Code', right_on='msoa11cd')

In [55]:
merged_ind_df.head()

Unnamed: 0,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian,total_indian,Total,objectid,msoa11cd,msoa11nm,geometry,lon,lat
0,E02000166,Camden 001,7924,0,2,76,0,0,78.0,78.0,4694,E02000166,Camden 001,POINT (-0.14705 51.56296),-0.147053,51.562958
1,E02000167,Camden 002,7944,0,1,163,4,0,168.0,168.0,4695,E02000167,Camden 002,POINT (-0.17330 51.55726),-0.173302,51.55726
2,E02000168,Camden 003,8172,0,1,85,1,0,87.0,87.0,4701,E02000168,Camden 003,POINT (-0.14154 51.55567),-0.141545,51.555674
3,E02000169,Camden 004,7637,0,1,324,1,0,326.0,326.0,4702,E02000169,Camden 004,POINT (-0.18638 51.55603),-0.18638,51.556033
4,E02000170,Camden 005,8338,0,0,284,0,0,284.0,284.0,4820,E02000170,Camden 005,POINT (-0.19996 51.55346),-0.19996,51.553458
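
A left merge silently leaves missing coordinates for any census MSOA code that has no centroid. A hedged sketch of how this could be checked uses the `indicator` and `validate` options of `DataFrame.merge`; the toy frames and the code E02999999 are made up for illustration.

```python
import pandas as pd

census = pd.DataFrame({
    "MSOA Code": ["E02000166", "E02000167", "E02999999"],  # last code is made up
    "total_indian": [78, 168, 0],
})
centroids = pd.DataFrame({
    "msoa11cd": ["E02000166", "E02000167"],
    "lon": [-0.147053, -0.173302],
    "lat": [51.562958, 51.55726],
})

merged = census.merge(
    centroids,
    how="left",
    left_on="MSOA Code",
    right_on="msoa11cd",
    indicator=True,          # adds a _merge column recording where each row came from
    validate="one_to_one",   # raises if either key column contains duplicates
)

# Census rows that found no matching centroid
unmatched = merged.loc[merged["_merge"] == "left_only", "MSOA Code"].tolist()
```

An empty `unmatched` list would confirm that every MSOA plotted later has coordinates.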


In [56]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
try:
    from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize # fallback for older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [57]:
address = 'London, UK'

geolocator = Nominatim(user_agent="Ldn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [58]:
map_ldn = folium.Map(location=[latitude, longitude], zoom_start=11)

In [59]:
for lat, lng, label in zip(merged_ind_df['lat'], merged_ind_df['lon'], merged_ind_df['MSOA Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_ldn) 

In [60]:
map_ldn

In [61]:
localAuthorityLookup = censusLondon[['Local authority name', 'MSOA Name']]

In [62]:
localAuthorityLookup.head()

Unnamed: 0,Local authority name,MSOA Name
3,Camden,Camden 001
4,Camden,Camden 002
5,Camden,Camden 003
6,Camden,Camden 004
7,Camden,Camden 005


In [63]:
neighborhood_latitude = merged_ind_df.loc[0, 'lat'] # neighborhood latitude value
neighborhood_longitude = merged_ind_df.loc[0, 'lon'] # neighborhood longitude value

neighborhood_name = merged_ind_df.loc[0, 'MSOA Name'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Camden 001 are 51.562958220745536, -0.147053479241724.


In [64]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them in the notebook.
# The variable names below are an assumption; use whichever names your environment defines.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20210530' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Foursquare credentials loaded.')


radius=500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

Foursquare credentials loaded.


In [65]:
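def get_category_type(row):
    """Return the name of the first category for a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # rows flattened by json_normalize use the prefixed column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']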

In [66]:
import requests
results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  


Unnamed: 0,name,categories,lat,lng
0,GAIL's Bakery,Bakery,51.561942,-0.149528
1,Cricks Corner Coffee Shop,Coffee Shop,51.563154,-0.140623
2,Bistro Laz,Mediterranean Restaurant,51.561333,-0.150528
3,The Bull & Last,Gastropub,51.558842,-0.148741
4,The Star,Pub,51.563564,-0.142665


In [67]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

12 venues were returned by Foursquare.


In [68]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request (the timeout stops one stalled call from hanging the whole loop)
        results = requests.get(url, timeout=10).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
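
Since the function above is called once per MSOA, it can help to separate building the request from flattening the response, so that a response without the expected keys does not stop the whole loop. The following is a minimal sketch under that assumption, using only the standard library. `build_explore_url` and `extract_venues` are hypothetical helper names, and the response shape is taken from the keys the notebook already reads.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"  # endpoint used in the notebook


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def extract_venues(name, lat, lng, payload):
    """Flatten one explore response into rows, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows
```

With helpers like these, the loop could append `extract_venues(...)` for each MSOA and simply receive an empty list when a response carries no venue groups.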

In [69]:
London_venues = getNearbyVenues(names=merged_ind_df['MSOA Name'],
                                   latitudes=merged_ind_df['lat'],
                                   longitudes=merged_ind_df['lon']
                                  )

Camden 001
Camden 002
Camden 003
Camden 004
Camden 005
Camden 006
Camden 007
Camden 008
Camden 009
Camden 010
Camden 011
Camden 012
Camden 013
Camden 014
Camden 015
Camden 016
Camden 017


KeyboardInterrupt: 

In [None]:
results.keys()  # results is the parsed JSON dict from In [66]; a dict has no .head()

In [None]:
df_2 = pd.DataFrame.from_dict(results)


In [None]:
df_2.head()

In [None]:
df_2.tail()

In [70]:
# The variable defined in In [69] is London_venues (capital L); it only exists once that cell has run to completion.
London_venues.head()

In [71]:
merged_ind_df2 = pd.merge(merged_ind_df, localAuthorityLookup, how='left', on='MSOA Name')

In [72]:
merged_ind_df2.head()

Unnamed: 0,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian,total_indian,Total,objectid,msoa11cd,msoa11nm,geometry,lon,lat,Local authority name
0,E02000166,Camden 001,7924,0,2,76,0,0,78.0,78.0,4694,E02000166,Camden 001,POINT (-0.14705 51.56296),-0.147053,51.562958,Camden
1,E02000167,Camden 002,7944,0,1,163,4,0,168.0,168.0,4695,E02000167,Camden 002,POINT (-0.17330 51.55726),-0.173302,51.55726,Camden
2,E02000168,Camden 003,8172,0,1,85,1,0,87.0,87.0,4701,E02000168,Camden 003,POINT (-0.14154 51.55567),-0.141545,51.555674,Camden
3,E02000169,Camden 004,7637,0,1,324,1,0,326.0,326.0,4702,E02000169,Camden 004,POINT (-0.18638 51.55603),-0.18638,51.556033,Camden
4,E02000170,Camden 005,8338,0,0,284,0,0,284.0,284.0,4820,E02000170,Camden 005,POINT (-0.19996 51.55346),-0.19996,51.553458,Camden


In [74]:
merged_ind_df_camden = merged_ind_df2[merged_ind_df2['Local authority name'] == 'Camden']

In [73]:
merged_ind_df_camden.tail()

NameError: name 'merged_ind_df_camden' is not defined

In [75]:
merged_ind_df_westminster = merged_ind_df2[merged_ind_df2['Local authority name'] == 'Westminster']

In [76]:
merged_ind_df_westminster.head()

Unnamed: 0,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian,total_indian,Total,objectid,msoa11cd,msoa11nm,geometry,lon,lat,Local authority name
372,E02000960,Westminster 001,6620,0,0,519,2,0,521.0,521.0,4933,E02000960,Westminster 001,POINT (-0.16908 51.53453),-0.169084,51.534527,Westminster
373,E02000961,Westminster 002,9442,0,0,396,0,0,396.0,396.0,4934,E02000961,Westminster 002,POINT (-0.18461 51.53357),-0.184606,51.533567,Westminster
374,E02000962,Westminster 003,8624,0,0,586,1,0,587.0,587.0,4931,E02000962,Westminster 003,POINT (-0.17557 51.52902),-0.175573,51.529016,Westminster
375,E02000963,Westminster 004,10363,1,0,116,0,0,117.0,117.0,4932,E02000963,Westminster 004,POINT (-0.20840 51.52927),-0.208403,51.529275,Westminster
376,E02000964,Westminster 005,12549,0,2,201,1,0,204.0,204.0,4929,E02000964,Westminster 005,POINT (-0.20077 51.52761),-0.200774,51.527608,Westminster


In [77]:
merged_ind_df_westminster.tail()

Unnamed: 0,MSOA Code,MSOA Name,All categories: Ethnic group (detailed),White: Anglo Indian,Mixed/multiple ethnic group: Anglo Indian,Asian/Asian British: Indian or British Indian,Asian/Asian British: Anglo Indian,Other ethnic group: Anglo Indian,total_indian,Total,objectid,msoa11cd,msoa11nm,geometry,lon,lat,Local authority name
391,E02000979,Westminster 020,8081,0,1,165,1,0,167.0,167.0,4125,E02000979,Westminster 020,POINT (-0.13473 51.49630),-0.134734,51.496304,Westminster
392,E02000980,Westminster 021,8684,0,1,211,4,0,216.0,216.0,4629,E02000980,Westminster 021,POINT (-0.13209 51.49236),-0.132091,51.492358,Westminster
393,E02000981,Westminster 022,8991,1,1,146,0,0,148.0,148.0,4627,E02000981,Westminster 022,POINT (-0.14130 51.48998),-0.141298,51.489981,Westminster
394,E02000982,Westminster 023,8226,0,0,136,0,0,136.0,136.0,4625,E02000982,Westminster 023,POINT (-0.14832 51.48901),-0.148317,51.489013,Westminster
395,E02000983,Westminster 024,8434,0,1,107,0,0,108.0,108.0,4623,E02000983,Westminster 024,POINT (-0.13669 51.48683),-0.136692,51.486832,Westminster


In [78]:
westminster_venues = getNearbyVenues(names=merged_ind_df_westminster['MSOA Name'],
                                   latitudes=merged_ind_df_westminster['lat'],
                                   longitudes=merged_ind_df_westminster['lon']
                                  )

Westminster 001
Westminster 002
Westminster 003
Westminster 004
Westminster 005
Westminster 006
Westminster 007
Westminster 008
Westminster 009
Westminster 010
Westminster 011
Westminster 012
Westminster 013
Westminster 014
Westminster 015
Westminster 016
Westminster 017
Westminster 018
Westminster 019
Westminster 020
Westminster 021
Westminster 022
Westminster 023
Westminster 024


In [None]:
results

In [79]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted; never publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20210530' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


In [None]:
westminster_venues = getNearbyVenues(names=merged_ind_df_westminster['MSOA Name'],
                                   latitudes=merged_ind_df_westminster['lat'],
                                   longitudes=merged_ind_df_westminster['lon']
                                  )

In [80]:
westminster_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Westminster 001,51.534527,-0.169084,Panzer's,51.533646,-0.1723,Deli / Bodega
1,Westminster 001,51.534527,-0.169084,GAIL's Bakery,51.533885,-0.171737,Bakery
2,Westminster 001,51.534527,-0.169084,Drunch,51.535029,-0.168829,Restaurant
3,Westminster 001,51.534527,-0.169084,Chicken Shop & Dirty Burger,51.533543,-0.170232,Fast Food Restaurant
4,Westminster 001,51.534527,-0.169084,Oslo Court,51.533364,-0.166508,French Restaurant


In [81]:
venue_counts = westminster_venues.groupby('Neighborhood').count()

In [83]:
westminster_onehot = pd.get_dummies(westminster_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
westminster_onehot['Neighborhood'] = westminster_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [westminster_onehot.columns[-1]] + list(westminster_onehot.columns[:-1])
westminster_onehot = westminster_onehot[fixed_columns]

westminster_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Bagel Shop,Bakery,Bar,Bed & Breakfast,Beer Bar,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Camera Store,Canal,Canal Lock,Candy Store,Casino,Caucasian Restaurant,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dry Cleaner,Electronics Store,English Restaurant,Event Space,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gelato Shop,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Iraqi Restaurant,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Lebanese Restaurant,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Neighborhood,Noodle House,North Indian Restaurant,Opera House,Organic Grocery,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pharmacy,Pier,Pilates Studio,Pizza Place,Plaza,Portuguese Restaurant,Pub,Radio Station,Ramen Restaurant,Recording Studio,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Road,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shoe Store,Skating Rink,Soup Place,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sri Lankan Restaurant,Stables,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Court,Thai Restaurant,Theater,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Westminster 001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Westminster 001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Westminster 001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Westminster 001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Westminster 001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [84]:
westminster_grouped = westminster_onehot.groupby('Neighborhood').mean().reset_index()

In [85]:
num_top_venues = 10

for hood in westminster_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = westminster_grouped[westminster_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Westminster 001----
                  venue  freq
0         Deli / Bodega  0.08
1                   Pub  0.08
2           Coffee Shop  0.08
3     French Restaurant  0.08
4   Japanese Restaurant  0.04
5  Gym / Fitness Center  0.04
6        Pilates Studio  0.04
7        Cricket Ground  0.04
8  Fast Food Restaurant  0.04
9            Restaurant  0.04


----Westminster 002----
                       venue  freq
0                       Café  0.13
1            Thai Restaurant  0.10
2                        Pub  0.10
3  Middle Eastern Restaurant  0.07
4                     Bakery  0.07
5         Italian Restaurant  0.07
6                      Hotel  0.03
7                     Lounge  0.03
8                  Gift Shop  0.03
9                     Garden  0.03


----Westminster 003----
                 venue  freq
0       Cricket Ground  0.15
1                  Pub  0.11
2        Grocery Store  0.11
3                 Café  0.07
4          Coffee Shop  0.07
5  Lebanese Restaurant  0.04
6     

#According to the results above, Westminster 014 seems to be a good location for an Indian restaurant.
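
One way to make a comparison like this explicit is to compute, for each area, how many venues are Indian restaurants and what share of all venues they represent. The sketch below uses a small made-up frame (`toy_venues`) with the same two columns as `westminster_venues`; the area names and venues are hypothetical. Matching on the substring 'Indian' is an assumption made so that categories such as 'North Indian Restaurant' are counted as well.

```python
import pandas as pd

# toy stand-in for westminster_venues (hypothetical areas and venues)
toy_venues = pd.DataFrame({
    'Neighborhood': ['Area A', 'Area A', 'Area A', 'Area A', 'Area B', 'Area B'],
    'Venue Category': ['Indian Restaurant', 'Pub', 'Café',
                       'Indian Restaurant', 'Pub', 'Bakery'],
})

# flag every venue whose category mentions 'Indian'
is_indian = toy_venues['Venue Category'].str.contains('Indian', na=False)

# per-area count of Indian restaurants, total venues, and share of venues
indian_share = (
    is_indian.groupby(toy_venues['Neighborhood'])
    .agg(indian_count='sum', venue_count='size', indian_share='mean')
    .sort_values('indian_share', ascending=False)
)
print(indian_share)
```

Whether a high or a low share marks a good location is a judgment call: a high share suggests proven demand, a low share suggests less competition.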

In [82]:
venue_counts

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Westminster 001,25,25,25,25,25,25
Westminster 002,30,30,30,30,30,30
Westminster 003,27,27,27,27,27,27
Westminster 004,7,7,7,7,7,7
Westminster 005,25,25,25,25,25,25
Westminster 006,25,25,25,25,25,25
Westminster 007,49,49,49,49,49,49
Westminster 008,43,43,43,43,43,43
Westminster 009,23,23,23,23,23,23
Westminster 010,15,15,15,15,15,15


In [86]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [117]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = westminster_grouped['Neighborhood']

for ind in np.arange(westminster_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(westminster_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Westminster 001,Coffee Shop,Deli / Bodega,Pub,French Restaurant,Pharmacy,Bakery,Pilates Studio,Restaurant,Salad Place,Modern European Restaurant
1,Westminster 002,Café,Thai Restaurant,Pub,Bakery,Middle Eastern Restaurant,Italian Restaurant,Yoga Studio,Recording Studio,Boat or Ferry,Garden
2,Westminster 003,Cricket Ground,Pub,Grocery Store,Café,Coffee Shop,Gourmet Shop,Garden,Lebanese Restaurant,Gift Shop,Bakery
3,Westminster 004,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio
4,Westminster 005,Coffee Shop,Grocery Store,Lebanese Restaurant,Pizza Place,Bus Stop,Indian Restaurant,Boxing Gym,Gym,French Restaurant,Middle Eastern Restaurant


#Westminster 005 above has the "Indian Restaurant" category among its ten most common venue categories.
#Below, k-means clustering is used to group together areas in the borough of Westminster with similar venue categories.
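
The number of clusters has to be chosen before k-means is run. A common check is to compare silhouette scores for a few candidate values of k. The sketch below does this on a small made-up matrix of venue-category frequencies (`toy_freq`, hypothetical values with two obvious groups); it illustrates the technique and is not the procedure used in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# toy frequency matrix: rows are areas, columns are venue-category shares
toy_freq = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
    [0.15, 0.10, 0.75],
])

# silhouette score for each candidate number of clusters
scores = {}
for k in range(2, 5):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(toy_freq)
    scores[k] = silhouette_score(toy_freq, labels_k)

# keep the k with the highest score and refit
best_k = max(scores, key=scores.get)
best_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(toy_freq)
print(scores)
print(best_k, best_labels)
```

On real data the scores are rarely this clear-cut, so the result is best read alongside the cluster contents themselves.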

In [118]:
# set number of clusters
kclusters = 5

westminster_grouped_clustering = westminster_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(westminster_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 1, 3, 1, 1, 0, 0, 1, 4], dtype=int32)

In [119]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head()

# load the ONS model-based income estimates by MSOA
incomeLondon = pd.read_excel('ons-model-based-income-estimates-msoa.xls', sheet_name='2015-16 (annual income)')

# attach cluster labels and top venues to every venue row, then add income by MSOA name
westminster_merged = westminster_venues.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
westminster_merged = pd.merge(westminster_merged, incomeLondon, how='left', left_on='Neighborhood', right_on='MSOA name')
westminster_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MSOA code,MSOA name,Local authority code,Local authority name,Region code,Region name,Total annual income (£),Upper confidence limit (£),Lower confidence limit (£),Confidence interval (£),Net annual income (£),Upper confidence limit (£).1,Lower confidence limit (£).1,Confidence interval (£).1,Net income before housing costs (£),Upper confidence limit (£).2,Lower confidence limit (£).2,Confidence interval (£).2,Net income after housing costs (£),Upper confidence limit (£).3,Lower confidence limit (£).3,Confidence interval (£).3
0,Westminster 001,51.534527,-0.169084,Panzer's,51.533646,-0.1723,Deli / Bodega,2,Coffee Shop,Deli / Bodega,Pub,French Restaurant,Pharmacy,Bakery,Pilates Studio,Restaurant,Salad Place,Modern European Restaurant,E02000960,Westminster 001,E09000033,Westminster,E12000007,London,70500,89600,55500,34100,51200,63700,41100,22600,52400,63400,43300,20200,40600,49600,33300,16300
1,Westminster 001,51.534527,-0.169084,GAIL's Bakery,51.533885,-0.171737,Bakery,2,Coffee Shop,Deli / Bodega,Pub,French Restaurant,Pharmacy,Bakery,Pilates Studio,Restaurant,Salad Place,Modern European Restaurant,E02000960,Westminster 001,E09000033,Westminster,E12000007,London,70500,89600,55500,34100,51200,63700,41100,22600,52400,63400,43300,20200,40600,49600,33300,16300
2,Westminster 001,51.534527,-0.169084,Drunch,51.535029,-0.168829,Restaurant,2,Coffee Shop,Deli / Bodega,Pub,French Restaurant,Pharmacy,Bakery,Pilates Studio,Restaurant,Salad Place,Modern European Restaurant,E02000960,Westminster 001,E09000033,Westminster,E12000007,London,70500,89600,55500,34100,51200,63700,41100,22600,52400,63400,43300,20200,40600,49600,33300,16300
3,Westminster 001,51.534527,-0.169084,Chicken Shop & Dirty Burger,51.533543,-0.170232,Fast Food Restaurant,2,Coffee Shop,Deli / Bodega,Pub,French Restaurant,Pharmacy,Bakery,Pilates Studio,Restaurant,Salad Place,Modern European Restaurant,E02000960,Westminster 001,E09000033,Westminster,E12000007,London,70500,89600,55500,34100,51200,63700,41100,22600,52400,63400,43300,20200,40600,49600,33300,16300
4,Westminster 001,51.534527,-0.169084,Oslo Court,51.533364,-0.166508,French Restaurant,2,Coffee Shop,Deli / Bodega,Pub,French Restaurant,Pharmacy,Bakery,Pilates Studio,Restaurant,Salad Place,Modern European Restaurant,E02000960,Westminster 001,E09000033,Westminster,E12000007,London,70500,89600,55500,34100,51200,63700,41100,22600,52400,63400,43300,20200,40600,49600,33300,16300


In [121]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(westminster_merged['Neighborhood Latitude'], westminster_merged['Neighborhood Longitude'], westminster_merged['Neighborhood'], westminster_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Cluster analysis to look at the make-up of each cluster below.
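
A compact way to compare clusters is a per-cluster summary table. The sketch below assumes a frame shaped like `westminster_merged` (one row per venue) and uses made-up values (`toy_merged`); the areas, categories, and incomes are hypothetical. Because income is repeated on every venue row, areas are de-duplicated before averaging.

```python
import pandas as pd

# toy stand-in for westminster_merged: one row per venue (hypothetical values)
toy_merged = pd.DataFrame({
    'Neighborhood': ['Area A', 'Area A', 'Area B', 'Area C', 'Area C', 'Area C'],
    'Cluster Labels': [0, 0, 0, 1, 1, 1],
    'Venue Category': ['Pub', 'Café', 'Pub', 'Gym', 'Gym', 'Yoga Studio'],
    'Net income after housing costs (£)': [40000, 40000, 30000, 22000, 22000, 22000],
})

# one row per area, so each area's income is counted once
per_area = toy_merged.drop_duplicates('Neighborhood')
cluster_summary = per_area.groupby('Cluster Labels').agg(
    n_areas=('Neighborhood', 'size'),
    mean_income=('Net income after housing costs (£)', 'mean'),
)

# most frequent venue category across all venue rows in each cluster
cluster_summary['top_category'] = (
    toy_merged.groupby('Cluster Labels')['Venue Category']
    .agg(lambda s: s.value_counts().idxmax())
)
print(cluster_summary)
```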

In [129]:


incomeLondon = pd.read_excel('ons-model-based-income-estimates-msoa.xls', '2015-16 (annual income)')
cluster_1 = westminster_merged.loc[westminster_merged['Cluster Labels'] == 1, westminster_merged.columns[[1] + list(range(5, westminster_merged.shape[1]))]]
cluster_0 = westminster_merged.loc[westminster_merged['Cluster Labels'] == 0, westminster_merged.columns[[1] + list(range(5, westminster_merged.shape[1]))]]
cluster_2 = westminster_merged.loc[westminster_merged['Cluster Labels'] == 2, westminster_merged.columns[[1] + list(range(5, westminster_merged.shape[1]))]]
cluster_3 = westminster_merged.loc[westminster_merged['Cluster Labels'] == 3, westminster_merged.columns[[1] + list(range(5, westminster_merged.shape[1]))]]
cluster_4 = westminster_merged.loc[westminster_merged['Cluster Labels'] == 4, westminster_merged.columns[[1] + list(range(5, westminster_merged.shape[1]))]]
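# k-means labels run from 0 to kclusters-1 (0 to 4 here), so label 5 matches no rows and cluster_5 is empty
cluster_5 = westminster_merged.loc[westminster_merged['Cluster Labels'] == 5, westminster_merged.columns[[1] + list(range(5, westminster_merged.shape[1]))]]
print(cluster_5)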
cluster_3

Empty DataFrame
Columns: [Neighborhood Latitude, Venue Longitude, Venue Category, Cluster Labels, 1st Most Common Venue, 2nd Most Common Venue, 3rd Most Common Venue, 4th Most Common Venue, 5th Most Common Venue, 6th Most Common Venue, 7th Most Common Venue, 8th Most Common Venue, 9th Most Common Venue, 10th Most Common Venue, MSOA code, MSOA name, Local authority code, Local authority name, Region code, Region name, Total annual income (£), Upper confidence limit (£), Lower confidence limit (£), Confidence interval (£), Net annual income (£), Upper confidence limit (£).1, Lower confidence limit (£).1, Confidence interval (£).1, Net income before housing costs (£), Upper confidence limit (£).2, Lower confidence limit (£).2, Confidence interval (£).2, Net income after housing costs (£), Upper confidence limit (£).3, Lower confidence limit (£).3, Confidence interval (£).3]
Index: []


Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MSOA code,MSOA name,Local authority code,Local authority name,Region code,Region name,Total annual income (£),Upper confidence limit (£),Lower confidence limit (£),Confidence interval (£),Net annual income (£),Upper confidence limit (£).1,Lower confidence limit (£).1,Confidence interval (£).1,Net income before housing costs (£),Upper confidence limit (£).2,Lower confidence limit (£).2,Confidence interval (£).2,Net income after housing costs (£),Upper confidence limit (£).3,Lower confidence limit (£).3,Confidence interval (£).3
82,51.529275,-0.209414,Gym,3,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio,E02000963,Westminster 004,E09000033,Westminster,E12000007,London,43700,55400,34500,20900,33500,41500,27000,14400,30100,36500,24900,11500,22800,27600,18900,8700
83,51.529275,-0.209008,Yoga Studio,3,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio,E02000963,Westminster 004,E09000033,Westminster,E12000007,London,43700,55400,34500,20900,33500,41500,27000,14400,30100,36500,24900,11500,22800,27600,18900,8700
84,51.529275,-0.210925,Italian Restaurant,3,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio,E02000963,Westminster 004,E09000033,Westminster,E12000007,London,43700,55400,34500,20900,33500,41500,27000,14400,30100,36500,24900,11500,22800,27600,18900,8700
85,51.529275,-0.21247,Garden,3,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio,E02000963,Westminster 004,E09000033,Westminster,E12000007,London,43700,55400,34500,20900,33500,41500,27000,14400,30100,36500,24900,11500,22800,27600,18900,8700
86,51.529275,-0.210857,Yoga Studio,3,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio,E02000963,Westminster 004,E09000033,Westminster,E12000007,London,43700,55400,34500,20900,33500,41500,27000,14400,30100,36500,24900,11500,22800,27600,18900,8700
87,51.529275,-0.205941,Gym,3,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio,E02000963,Westminster 004,E09000033,Westminster,E12000007,London,43700,55400,34500,20900,33500,41500,27000,14400,30100,36500,24900,11500,22800,27600,18900,8700
88,51.529275,-0.205866,Boxing Gym,3,Yoga Studio,Gym,Italian Restaurant,Boxing Gym,Garden,Event Space,Food Court,Food & Drink Shop,Fish & Chips Shop,Film Studio,E02000963,Westminster 004,E09000033,Westminster,E12000007,London,43700,55400,34500,20900,33500,41500,27000,14400,30100,36500,24900,11500,22800,27600,18900,8700


In [116]:
incomeLondon.head()

Unnamed: 0,MSOA code,MSOA name,Local authority code,Local authority name,Region code,Region name,Total annual income (£),Upper confidence limit (£),Lower confidence limit (£),Confidence interval (£),Net annual income (£),Upper confidence limit (£).1,Lower confidence limit (£).1,Confidence interval (£).1,Net income before housing costs (£),Upper confidence limit (£).2,Lower confidence limit (£).2,Confidence interval (£).2,Net income after housing costs (£),Upper confidence limit (£).3,Lower confidence limit (£).3,Confidence interval (£).3
0,E02004297,County Durham 001,E06000047,County Durham,E12000001,North East,35900,45200,28500,16700,27300,33700,22100,11700,27600,33300,22800,10400,25600,31000,21200,9800
1,E02004290,County Durham 002,E06000047,County Durham,E12000001,North East,42500,53600,33700,19900,29800,37000,23900,13100,28600,34500,23700,10800,27500,33200,22700,10500
2,E02004298,County Durham 003,E06000047,County Durham,E12000001,North East,38000,47700,30200,17600,28300,35100,22800,12300,28200,34100,23400,10700,26700,32300,22100,10200
3,E02004299,County Durham 004,E06000047,County Durham,E12000001,North East,33500,42200,26700,15500,26600,32800,21600,11200,25500,30800,21100,9700,22400,27100,18500,8700
4,E02004291,County Durham 005,E06000047,County Durham,E12000001,North East,31700,39800,25200,14600,25500,31400,20700,10800,25100,30200,20800,9500,20900,25300,17200,8000


In [None]:
mergedWestminster = pd.merge(merged_ind_df_westminster, incomeLondon, how='left', left_on='MSOA Name', right_on = 'MSOA name' )

In [None]:
merged_ind_df_westminster.head()

In [None]:
incomeLondon.head()

In [None]:
mergedWestminster.head()

In [None]:
mergedWestminster

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# histogram of the Indian population per MSOA, in bins of 20 people
max_pop = merged_ind_df_westminster['total_indian'].max()

fig, ax = plt.subplots(1, figsize=(20,5))
ax.set_xticks(np.arange(0, max_pop+100, 100))
n, bins, patches = ax.hist(x=merged_ind_df_westminster['total_indian'],
                            fc='blue',
                            ec='black',
                            bins=np.arange(0, max_pop+100, 20),
                            alpha=0.5,
                            zorder=3)
ax.grid(color='white', linestyle='solid', zorder=0)
ax.set_facecolor('#EEEEEE')
ax.set_xlabel('Indian population per MSOA')
ax.set_ylabel('Number of MSOAs')
ax.set_title('Histogram of Indian population in Westminster MSOAs')

In [None]:
merged_ind_df_westminster

In [None]:
mergedWestminster.head()

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# histogram of net household income (before housing costs) per MSOA
income_col = 'Net income before housing costs (£)'

fig, ax = plt.subplots(1, figsize=(20,5))
# tick every £5,000; a step of 100 would draw hundreds of overlapping ticks
ax.set_xticks(np.arange(0, mergedWestminster[income_col].max()+5000, 5000))
n, bins, patches = ax.hist(x=mergedWestminster[income_col],
                            fc='blue',
                            ec='black',
                            bins=20,
                            alpha=0.5,
                            zorder=3)
ax.grid(color='white', linestyle='solid', zorder=0)
ax.set_facecolor('#EEEEEE')
ax.set_xlabel(income_col)
ax.set_ylabel('Number of MSOAs')
ax.set_title('Histogram of household income in Westminster MSOAs')

In [None]:
import folium
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="cn_london")
location = geolocator.geocode("London")
# creating map
m = folium.Map(location=[location.latitude, location.longitude], tiles='cartodbpositron', zoom_start=11)

# choropleth map of the Indian population in MSOAs
folium.Choropleth(
    geo_data=mergedWestminster.to_json(),
    name='Choropleth',
    data=final_pop_df,
    columns=['MSOA Code', 'total_indian'],
    key_on='feature.properties.msoa11cd',
    fill_color='YlGn',
    fill_opacity=0.8,
    line_opacity=0.2,
    nan_fill_color='grey',
    nan_fill_opacity=0.4,
    legend_name='Indian population'
).add_to(m)
m

In [None]:

choropleth = folium.Choropleth(
    geo_data=state_geo,
    name='choropleth',
    data=state_data,
    columns=['District Name', 'Arrival'],
    key_on='feature.properties.NAME_2',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Arecanut Arrival(in Quintal)',
    highlight=True,
    line_color='black'
).add_to(m)

Discussion: In this project, the analysis was limited to the borough of Westminster.  Based on k-means clustering, I found that the best location for an Indian restaurant was in cluster 2.  One of the areas that appears to have the most Indian restaurants is Westminster 005.
    
Future direction: use multivariate analysis to examine whether the percentage of Indian restaurants in a given area correlates with spending power (household income from the dataset above) and with other economic/demographic factors.
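
A minimal sketch of what such an analysis could look like is given below. All values are synthetic: the column names `income_k` (income in £ thousands), `indian_pop_h` (Indian population in hundreds), and `indian_share` are hypothetical, and the target is built to be exactly linear in the two predictors so the fitted coefficients can be checked by eye. Real data would be noisy and would need diagnostics beyond this.

```python
import numpy as np
import pandas as pd

# synthetic per-MSOA data (hypothetical values, not taken from the datasets above)
toy = pd.DataFrame({
    'income_k': [30, 40, 50, 60, 70, 45],
    'indian_pop_h': [1, 5, 2, 6, 3, 4],
})
# share of Indian restaurants, constructed as an exact linear function
toy['indian_share'] = 0.02 + 0.001 * toy['income_k'] + 0.01 * toy['indian_pop_h']

# pairwise Pearson correlations between all three variables
corr = toy.corr()
print(corr)

# ordinary least squares with an intercept: share ~ income + population
X = np.column_stack([np.ones(len(toy)), toy['income_k'], toy['indian_pop_h']])
y = toy['indian_share'].to_numpy()
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # intercept, income coefficient, population coefficient
```

The correlation matrix shows pairwise relationships only, while the regression estimates each factor's effect with the other held fixed, which is the multivariate question posed above.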