# GCSE Results in London

### Background

GCSEs, short for General Certificate of Secondary Education, are academic qualifications which are generally taken by pupils in secondary education (aged 15-16) in England, Wales and Northern Ireland. Young students usually gain a number of GCSEs in subjects such as English, mathematics and the sciences, which are marked from 9 to 1, with 9 being the highest grade and 1 being the lowest. The grades achieved by pupils can have a major influence on their lives, as they are seen as foundational education one must attain. The impact of good/bad GCSE results can be substantial, as these can effect which further education institutions pupils get accepted into, what universities they can attend and what jobs they can land. All these factors play an important role in the future life of pupils.  

### Problem

This project will focus analysing what areas of London achieve high GCSE grades and which areas achieve low grades. We will use machine learning tools to try and find patterns to see which areas parents should send their kinds. 

Our goal is see if areas of high achievers can areas of low achievers have clear disparities when looking at the Foursquare location data. Is the presence of certain facilities/venues in an area an important factor for the average GCSE grade in that area?  

### Data

The data we will be using is taken from the UK department for education, a link to this data is available here: https://data.london.gov.uk/download/gcse-results-by-borough/12a95356-81d3-49d6-8a13-e41b62f5e5c4/gcse-results.csv.
This data gives us information about the "Attainment" of pupils. This variable gives us the average pupils grade across eight subjects for each London Borough (Area). We will use the most recent data entries (2018/2019) for the different London Boroughs to see how these compare. We will also limit our analysis to coeducational schools only.

### Methodology

Firstly, we need to acquire the GCSE results data and load it to a pandas data frame. Next we do basic data cleaning processes such as removing irrelevant data entries/columns. After which we use folium to visualise where the different London Boroughs are located on a map of London. Then we can use the foursquare API to attain geospatial data on the different venues located in the area. Using this data we proceed with cluster analysis to find out how different clusters of London Boroughs compare with respect to GCSE results and surrounding venues. The last step is to summarise our findings in the concluding section.  

In [1]:
import os # Operating System
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium #import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         394 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.0.0-pyh9f0ad1d_0

The following packages will be

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


In [3]:
data = "https://data.london.gov.uk/download/gcse-results-by-borough/12a95356-81d3-49d6-8a13-e41b62f5e5c4/gcse-results.csv"

df = pd.read_csv(data)
df.head()

Unnamed: 0,Code,Area,Year,Sex,Pupils,Attainment8,Progress8
0,E09000001,City of London,2018/19,All,.,.,.
1,E09000002,Barking and Dagenham,2018/19,All,2353,46.4,0.16
2,E09000003,Barnet,2018/19,All,3804,57.1,0.57
3,E09000004,Bexley,2018/19,All,3115,49.6,-0.09
4,E09000005,Brent,2018/19,All,3038,50.2,0.47


## Data Cleaning

In [4]:
df = df[df['Year']=='2018/19'] #only interested in 2018/19 data set

In [5]:
df = df[df['Sex']=='All'] #only interested in coeducational schools

In [6]:
df=df[:-12] #only interested in London boroughs

In [7]:
#City of London has no schools, so we delete this entry
df=df.iloc[1:]

In [8]:
df.shape

(32, 7)

In [9]:
#We are only interested in variables Area and Attainment8 so we delete the other columns
df.drop(['Code', 'Year','Sex','Pupils','Progress8'], axis=1, inplace=True)

In [10]:
geolocator = Nominatim(user_agent='my_email@server.com')


In [11]:
df['city_coord'] = df['Area'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))

In [12]:
df[['Latitude', 'Longitude']] = df['city_coord'].apply(pd.Series)

In [13]:
df.drop(['city_coord'], axis=1, inplace = True)

In [14]:
df.head()

Unnamed: 0,Area,Attainment8,Latitude,Longitude
1,Barking and Dagenham,46.4,51.554117,0.150504
2,Barnet,57.1,51.65309,-0.200226
3,Bexley,49.6,39.969238,-82.936864
4,Brent,50.2,32.937346,-87.164718
5,Bromley,50.8,51.402805,0.014814


In [15]:
address = 'London, UK'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of London City are 51.5073219, -0.1276474.


In [16]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Area'], df['Attainment8']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

On the map we can see that 7 entries have got the wrong location coordinates as these are outside of London.
We set these to be correct:

In [19]:
df

Unnamed: 0,Area,Attainment8,Latitude,Longitude
1,Barking and Dagenham,46.4,51.554117,0.150504
2,Barnet,57.1,51.65309,-0.200226
3,Bexley,49.6,39.969238,0.1505
4,Brent,50.2,32.937346,-87.164718
5,Bromley,50.8,51.402805,0.014814
6,Camden,48.6,39.94484,-75.119891
7,Croydon,45.5,51.371305,-0.101957
8,Ealing,50.9,51.512655,-0.305195
9,Enfield,46.5,51.652085,-0.081018
10,Greenwich,45.3,51.482084,-0.004542


In [37]:
#Bexley
df.loc[3,'Latitude'] = 51.4549
df.loc[3,'Longitude'] = 0.1505
#Brent
df.loc[4,'Latitude'] = 51.5588
df.loc[4,'Longitude'] = 0.2817
#Camden
df.loc[6,'Latitude'] = 51.5290
df.loc[6,'Longitude'] = 0.1255
#Havering
df.loc[15,'Latitude'] = 51.5812
df.loc[15,'Longitude'] = 0.1837
#Sutton
df.loc[28,'Latitude'] = 51.3618
df.loc[28,'Longitude'] = 0.1945
#Tower Hamlets
df.loc[29,'Latitude'] = 51.5099
df.loc[29,'Longitude'] = 0.0059
#Waltham Forest
df.loc[30,'Latitude'] = 51.5908
df.loc[30,'Longitude'] = 0.0134

df

Unnamed: 0,Area,Attainment8,Latitude,Longitude
1,Barking and Dagenham,46.4,51.554117,0.150504
2,Barnet,57.1,51.65309,-0.200226
3,Bexley,49.6,51.4549,0.1505
4,Brent,50.2,51.5588,0.2817
5,Bromley,50.8,51.402805,0.014814
6,Camden,48.6,51.529,0.1255
7,Croydon,45.5,51.371305,-0.101957
8,Ealing,50.9,51.512655,-0.305195
9,Enfield,46.5,51.652085,-0.081018
10,Greenwich,45.3,51.482084,-0.004542


Now if we create another map we should find all locations to be inside London.

In [38]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Area'], df['Attainment8']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

All good!

#### We access the API

In [39]:
#Define Foursquare Credentials and Version
CLIENT_ID = 'YR5FP4OAJUNF5RC02NU0AE5RMAYLWJEWSZXU4VRJZOJZIUXR' 
CLIENT_SECRET = 'WZZ0RIS40PEJ0IUCHRUJ5YV3XDKQPXVN1FIDVUX3YCDV02TG'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: YR5FP4OAJUNF5RC02NU0AE5RMAYLWJEWSZXU4VRJZOJZIUXR
CLIENT_SECRET:WZZ0RIS40PEJ0IUCHRUJ5YV3XDKQPXVN1FIDVUX3YCDV02TG


In [40]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [41]:
location_venues = getNearbyVenues(names=df['Area'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


In [42]:
location_venues.groupby('Street').count()

Unnamed: 0_level_0,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Barking and Dagenham,5,5,5,5,5,5
Barnet,31,31,31,31,31,31
Bexley,30,30,30,30,30,30
Brent,2,2,2,2,2,2
Bromley,43,43,43,43,43,43
Camden,4,4,4,4,4,4
Croydon,25,25,25,25,25,25
Ealing,96,96,96,96,96,96
Enfield,56,56,56,56,56,56
Greenwich,61,61,61,61,61,61


In [43]:
# get the List of Unique Categories
print('There are {} uniques categories.'.format(len(location_venues['Venue Category'].unique())))

There are 206 uniques categories.


In [44]:
location_venues.shape

(1174, 7)

In [45]:
# one hot encoding
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_onehot['Street'] = location_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Street,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Austrian Restaurant,Auto Workshop,Bagel Shop,Bakery,Bar,Beach,Beer Bar,Beer Store,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Bulgarian Restaurant,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Casino,Chaat Place,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Roaster,Coffee Shop,College Cafeteria,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cricket Ground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Event Service,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Kebab Restaurant,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Lighthouse,Liquor Store,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoor Supply Store,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pie Shop,Pier,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Restaurant,River,Road,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoothie Shop,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tourist Information Center,Trail,Train Station,Tram Station,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [46]:
london_grouped = venues_onehot.groupby('Street').mean().reset_index()
london_grouped

Unnamed: 0,Street,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Austrian Restaurant,Auto Workshop,Bagel Shop,Bakery,Bar,Beach,Beer Bar,Beer Store,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Bulgarian Restaurant,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Casino,Chaat Place,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Roaster,Coffee Shop,College Cafeteria,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cricket Ground,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Event Service,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Kebab Restaurant,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Lighthouse,Liquor Store,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoor Supply Store,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pie Shop,Pier,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Restaurant,River,Road,Rugby Pitch,Salad Place,Sandwich Place,Scenic Lookout,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoothie Shop,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tourist Information Center,Trail,Train Station,Tram Station,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0
3,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.116279,0.0,0.0,0.116279,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.069767,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.046512,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Camden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Croydon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ealing,0.0,0.0,0.0,0.0,0.010417,0.0,0.010417,0.0,0.0,0.010417,0.03125,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.052083,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.104167,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.010417,0.0,0.010417,0.010417,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.020833,0.010417,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.020833,0.0,0.0,0.010417,0.010417,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,0.010417,0.0,0.010417,0.0,0.0,0.03125,0.0,0.010417,0.010417,0.0,0.0,0.020833,0.0625,0.0,0.0,0.010417,0.010417,0.072917,0.0,0.010417,0.0,0.020833,0.0,0.0,0.0,0.010417,0.010417,0.0,0.0,0.010417,0.0,0.0,0.0,0.010417,0.0,0.010417,0.010417,0.0,0.0,0.0,0.010417,0.0,0.0,0.010417,0.010417,0.0,0.0,0.0,0.020833,0.010417,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.020833,0.0,0.010417,0.0,0.0,0.0,0.0
8,Enfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.089286,0.0,0.0,0.089286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.017857,0.017857,0.0,0.0,0.035714,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.053571,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.017857,0.0
9,Greenwich,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.032787,0.016393,0.016393,0.0,0.0,0.0,0.0,0.081967,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.032787,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.016393,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.016393,0.016393,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.032787,0.032787,0.0,0.016393,0.0,0.0,0.016393,0.081967,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.016393,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0


In [47]:
london_grouped.shape

(31, 207)

In [48]:

num_top_venues = 5

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barking and Dagenham----
               venue  freq
0           Bus Stop   0.4
1  Convenience Store   0.2
2       Liquor Store   0.2
3      Grocery Store   0.2
4         Playground   0.0


----Barnet----
                  venue  freq
0           Coffee Shop  0.13
1  Fast Food Restaurant  0.06
2                  Park  0.06
3    Italian Restaurant  0.06
4                   Pub  0.06


----Bexley----
                venue  freq
0                 Pub  0.10
1         Coffee Shop  0.10
2      Clothing Store  0.10
3         Supermarket  0.07
4  Italian Restaurant  0.07


----Brent----
               venue  freq
0        Golf Course   0.5
1                Pub   0.5
2  Afghan Restaurant   0.0
3          Multiplex   0.0
4             Museum   0.0


----Bromley----
                   venue  freq
0            Coffee Shop  0.12
1         Clothing Store  0.12
2   Gym / Fitness Center  0.07
3            Pizza Place  0.05
4  Portuguese Restaurant  0.05


----Camden----
               venue  freq
0

In [49]:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [50]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

In [51]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

In [52]:
london_grouped=df

In [53]:
from sklearn.cluster import KMeans

In [54]:
#Distribute in 5 Clusters

# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop('Area', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([0, 1, 2, 4, 4, 2, 0, 4, 0, 0, 2, 3, 0, 4, 2, 2, 2, 0, 3, 1, 0, 0,
       4, 2, 3, 3, 2, 1, 2, 0, 2, 3], dtype=int32)

In [55]:
#Dataframe to include Clusters

london_grouped_clustering=df
london_grouped_clustering.head()

Unnamed: 0,Area,Attainment8,Latitude,Longitude
1,Barking and Dagenham,46.4,51.554117,0.150504
2,Barnet,57.1,51.65309,-0.200226
3,Bexley,49.6,51.4549,0.1505
4,Brent,50.2,51.5588,0.2817
5,Bromley,50.8,51.402805,0.014814


In [56]:

london_grouped_clustering.shape

(32, 4)

In [57]:
df.shape

(32, 4)

In [58]:
london_grouped_clustering.dtypes

Area            object
Attainment8     object
Latitude       float64
Longitude      float64
dtype: object

In [59]:
df.dtypes

Area            object
Attainment8     object
Latitude       float64
Longitude      float64
dtype: object

In [60]:
# add clustering labels
london_grouped_clustering['Cluster Labels'] = kmeans.labels_

# merge london_grouped with london_data to add latitude/longitude for each neighborhood
london_grouped_clustering = london_grouped_clustering.join(venues_sorted.set_index('Street'), on='Area')

london_grouped_clustering.head(30) # check the last columns!

Unnamed: 0,Area,Attainment8,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Barking and Dagenham,46.4,51.554117,0.150504,0,Bus Stop,Convenience Store,Liquor Store,Grocery Store,English Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
2,Barnet,57.1,51.65309,-0.200226,1,Coffee Shop,Park,Pharmacy,Convenience Store,Fast Food Restaurant,Restaurant,Bookstore,Pub,Italian Restaurant,Supermarket
3,Bexley,49.6,51.4549,0.1505,2,Coffee Shop,Clothing Store,Pub,Supermarket,Fast Food Restaurant,Pharmacy,Italian Restaurant,American Restaurant,Bakery,Furniture / Home Store
4,Brent,50.2,51.5588,0.2817,4,Pub,Golf Course,Yoga Studio,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
5,Bromley,50.8,51.402805,0.014814,4,Clothing Store,Coffee Shop,Gym / Fitness Center,Burger Joint,Pub,Portuguese Restaurant,Pizza Place,Electronics Store,Movie Theater,Chocolate Shop
6,Camden,48.6,51.529,0.1255,2,Business Service,Skate Park,Gym,Rugby Pitch,Yoga Studio,English Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
7,Croydon,45.5,51.371305,-0.101957,0,Pub,Coffee Shop,Malay Restaurant,Sushi Restaurant,Bookstore,Museum,Gaming Cafe,Mediterranean Restaurant,Clothing Store,Italian Restaurant
8,Ealing,50.9,51.512655,-0.305195,4,Coffee Shop,Pub,Platform,Café,Clothing Store,Burger Joint,Park,Italian Restaurant,Bakery,Restaurant
9,Enfield,46.5,51.652085,-0.081018,0,Coffee Shop,Clothing Store,Pub,Café,Department Store,Bookstore,Pharmacy,Fish & Chips Shop,Supermarket,Gift Shop
10,Greenwich,45.3,51.482084,-0.004542,0,Boat or Ferry,Pub,Market,Bakery,History Museum,Garden,Burger Joint,Café,Grocery Store,Pizza Place


In [61]:
# Create Map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], london_grouped_clustering['Longitude'], london_grouped_clustering['Area'], london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 0

In [67]:

london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]

Unnamed: 0,Attainment8,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,46.4,Bus Stop,Convenience Store,Liquor Store,Grocery Store,English Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
7,45.5,Pub,Coffee Shop,Malay Restaurant,Sushi Restaurant,Bookstore,Museum,Gaming Cafe,Mediterranean Restaurant,Clothing Store,Italian Restaurant
9,46.5,Coffee Shop,Clothing Store,Pub,Café,Department Store,Bookstore,Pharmacy,Fish & Chips Shop,Supermarket,Gift Shop
10,45.3,Boat or Ferry,Pub,Market,Bakery,History Museum,Garden,Burger Joint,Café,Grocery Store,Pizza Place
13,46.9,Park,Café,Bus Stop,Coffee Shop,Light Rail Station,Platform,Malay Restaurant,Fast Food Restaurant,Middle Eastern Restaurant,Bulgarian Restaurant
18,45.8,Pub,Bakery,Mediterranean Restaurant,Burger Joint,Restaurant,Café,Japanese Restaurant,Theater,Park,Ice Cream Shop
21,44.1,Bar,Coffee Shop,Hotel,Korean Restaurant,Sandwich Place,Event Space,Movie Theater,Café,Bakery,Beer Bar
22,43.7,Clothing Store,Fast Food Restaurant,Coffee Shop,Platform,Café,Bus Stop,Supermarket,Grocery Store,Optical Shop,Pharmacy
30,46.2,,,,,,,,,,


### Cluster 1

In [77]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]

Unnamed: 0,Attainment8,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,57.1,Coffee Shop,Park,Pharmacy,Convenience Store,Fast Food Restaurant,Restaurant,Bookstore,Pub,Italian Restaurant,Supermarket
20,56.9,Coffee Shop,Café,Pub,Clothing Store,Italian Restaurant,Bakery,Department Store,Sandwich Place,Ice Cream Shop,Hotel
28,58.6,Historic Site,Tennis Court,Yoga Studio,Electronics Store,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


### Cluster 2

In [69]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]

Unnamed: 0,Attainment8,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,49.6,Coffee Shop,Clothing Store,Pub,Supermarket,Fast Food Restaurant,Pharmacy,Italian Restaurant,American Restaurant,Bakery,Furniture / Home Store
6,48.6,Business Service,Skate Park,Gym,Rugby Pitch,Yoga Studio,English Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
11,49.2,Coffee Shop,Pub,Café,Brewery,Supermarket,Yoga Studio,Theater,Coffee Roaster,Cocktail Bar,Movie Theater
15,48.5,Coffee Shop,Clothing Store,Shopping Mall,Department Store,Hotel,Bookstore,Café,Pub,Fast Food Restaurant,Bakery
16,47.7,Pub,Fast Food Restaurant,Park,Chinese Restaurant,Yoga Studio,Electronics Store,Fish Market,Fish & Chips Shop,Farmers Market,Farm
17,49.3,Clothing Store,Coffee Shop,Fast Food Restaurant,Grocery Store,Indian Restaurant,Hotel,Bakery,Pharmacy,Café,Sandwich Place
24,48.8,Convenience Store,Pub,Café,Park,Electronics Store,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
27,49.5,Hotel,Pub,Coffee Shop,Gym / Fitness Center,Bar,Cocktail Bar,Café,Burger Joint,Sandwich Place,Italian Restaurant
29,48.4,Platform,Coffee Shop,Gym / Fitness Center,Park,Nature Preserve,Lighthouse,Diner,Italian Restaurant,Food & Drink Shop,Dance Studio
31,49.4,Clothing Store,Coffee Shop,Pub,Pizza Place,Asian Restaurant,Burger Joint,Supermarket,Gym / Fitness Center,Café,Bookstore


### Cluster 3

In [70]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]

Unnamed: 0,Attainment8,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,53.9,Pub,Café,Coffee Shop,Clothing Store,Sandwich Place,Gym / Fitness Center,Hotel,Pharmacy,Pizza Place,Burger Joint
19,53.6,Bakery,Italian Restaurant,Pub,French Restaurant,Park,Ice Cream Shop,Burger Joint,English Restaurant,Coffee Shop,Pizza Place
25,54.0,Hotel,Eastern European Restaurant,Pizza Place,Pub,Metro Station,Historic Site,Health & Beauty Service,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
26,54.1,Construction & Landscaping,Home Service,Hobby Shop,Bus Station,Pub,Electronics Store,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
32,53.4,Outdoor Sculpture,Pub,Coffee Shop,Historic Site,Café,Sandwich Place,Plaza,Garden,Monument / Landmark,Hotel


### Cluster 4

In [71]:

london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]

Unnamed: 0,Attainment8,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,50.2,Pub,Golf Course,Yoga Studio,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
5,50.8,Clothing Store,Coffee Shop,Gym / Fitness Center,Burger Joint,Pub,Portuguese Restaurant,Pizza Place,Electronics Store,Movie Theater,Chocolate Shop
8,50.9,Coffee Shop,Pub,Platform,Café,Clothing Store,Burger Joint,Park,Italian Restaurant,Bakery,Restaurant
14,50.9,Indian Restaurant,Afghan Restaurant,Sandwich Place,Fast Food Restaurant,Coffee Shop,Grocery Store,Fish Market,Fish & Chips Shop,Farmers Market,Farm
23,51.1,Tram Station,Pub,Thai Restaurant,Cricket Ground,Sushi Restaurant,Hardware Store,Deli / Bodega,Park,Flea Market,Dessert Shop


## Results

Clusters 1 and 3 have got the highest average attainments (GCSE grades), these locations are mostly located on the outskirts of London. Clusters 0 and 2 have got the lowest GCSE results, these are mostly located in east London and closer to the centre of London. 

## Discussion

If we look at data entry 28 (Sutton) which has the highest average attainment score, we will find that the most common venues at that location are places which we would associate with high culture and prestige, such as historic sights, tennis courts and yoga studios. We also find that Sutton is located on the outskirts of the city. Comparing this with Lewisham (data entry 22) which has the lowest grades, we find that the venues surrounding that location are retailers, coffee shops/cafes and public transport facilities, places with one can associate with a busy city. Additionally Lewisham is located near the centre of the city, this further adds to the idea that the further from the centre the higher the grades.

## Conclusion



A final comment would be to use the data to access the level of urbanization of the area to access how well a student would perform in their GCSEs. Highly urbanized areas seem to have a negative impact on the GCSE grades. Of course there are several other factors at play which effect the GCSE results of pupils that this basic analysis has not taken into account, and it would be ludicrous to think otherwise.