# Covid-19 Cases and Neighbourhood Venues Analysis for Manchester, UK
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction/Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## 1. Introduction/Problem
Manchester is one of the hotspots of the Covid-19 cases compared to the rest of the UK. The UK Government has tighten social distancing measures in the area to stop the spread of the virus.

This project aims to provide understanding of relationship between the number of confirmed Covid-19 cases and neighbourhoods in Manchester. In particular, it will focus on how the infection rates differ by neighbourhoods and their composition of venue types to understand if neighbourhoods which have the majority of a certain type of venues are likely to have higher number of cases. This report will target policy makers and healthcare-sector workers who may be interested in effectiveness of measures to stop the spread of the virus by restraining people from particular activities.

## 2. Data
To solve the defined problem above, Middle-Layer Super Output Areas (MSOAs) are used as a unit of neighbourhood data. MSOAs are a statistical geography created for the 2011 Census of England and Wales. A typycal population is 7,000-10,000 people per MSOA. 

The folloing list of data will be used for a MSOA-level analysis.

- 2011 Population weighted centroid location by MSOA from the ONS Geography Open Data for coordinates of MSOAs (https://geoportal.statistics.gov.uk/datasets/middle-layer-super-output-areas-december-2011-population-weighted-centroids?geometry=-2.466%2C53.438%2C-2.032%2C53.510);
- Neighbourhood names by MSOA from the House of Commons Library MSOA Names (https://visual.parliament.uk/msoanames);
- Number of Covid-19 confirmed cases by Middle Super Output Area (MSOA) from the Government's Coronavirus webstie (https://coronavirus.data.gov.uk/cases retrieved on 23rd August 2020);
- 2018 Population estimates by MSOA from the National Statistics to calcualte infection rates (https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/middlesuperoutputareamidyearpopulationestimatesnationalstatistics);
- Forsquare API to get the most common venues of the neighbourhoods in Manchester.

Using the data, neighbourhoods in Manchester will be clustered based on similariy of common venue types. Afterwards infection rates will be compared by the neighbourhood clusters. 

### 2.1 Neighbourhood Locations
Firstly, I need latitude and longitude coordinates for the neighbourhoods in Manchester. As the ONS Geography Open Data provides 2011 population weighted centroid locations by MSOA for England and Wales, I will use this data and matches neighbourhood names with the MSOAs. Afterwards I will filter the MSOAs only in Manchester.

In [3]:
import pandas as pd 
import os
from pyproj import Proj, transform # X,Y to Longitude,Latitude

In [4]:
#Local location to save all data required for the analysis
location = r"C:\Users\Iseul.Song\PythonProjects\training\AppliedDataScience\Covid19-Infection-and-Venues-Data-Analysis-of-Manchester-UK\data"

Let's look at how the coordinate data looks.

In [5]:
centroid = pd.read_csv(os.path.join(location, 'Middle_Layer_Super_Output_Areas__December_2011__Population_Weighted_Centroids.csv'))
centroid.head()

Unnamed: 0,X,Y,objectid,msoa11cd,msoa11nm
0,445583.305,524174.111,1,E02002536,Stockton-on-Tees 002
1,446778.119,524255.508,2,E02002537,Stockton-on-Tees 003
2,461357.951,515117.478,3,E02002534,Redcar and Cleveland 020
3,446118.0,525454.53,4,E02002535,Stockton-on-Tees 001
4,461054.231,516173.954,5,E02002532,Redcar and Cleveland 018


Let's load the data for the neighbourhood names and join it with the coordinate data using MSOA codes.

In [6]:
msoa_name = pd.read_csv(os.path.join(location, 'MSOA-Names-1.4.0.csv'))
centroid = centroid.merge(msoa_name, on = 'msoa11cd') # join neighbourhood names for each msoa code
centroid.head()

Unnamed: 0,X,Y,objectid,msoa11cd,msoa11nm_x,msoa11nm_y,msoa11nmw,msoa11hclnm,msoa11hclnmw,Laname
0,445583.305,524174.111,1,E02002536,Stockton-on-Tees 002,Stockton-on-Tees 002,Stockton-on-Tees 002,Billingham Central,,Stockton-on-Tees
1,446778.119,524255.508,2,E02002537,Stockton-on-Tees 003,Stockton-on-Tees 003,Stockton-on-Tees 003,Billingham East & Haverton Hill,,Stockton-on-Tees
2,461357.951,515117.478,3,E02002534,Redcar and Cleveland 020,Redcar and Cleveland 020,Redcar and Cleveland 020,Guisborough Outer & Upleatham,,Redcar and Cleveland
3,446118.0,525454.53,4,E02002535,Stockton-on-Tees 001,Stockton-on-Tees 001,Stockton-on-Tees 001,Billingham North & Wolviston,,Stockton-on-Tees
4,461054.231,516173.954,5,E02002532,Redcar and Cleveland 018,Redcar and Cleveland 018,Redcar and Cleveland 018,Guisborough North,,Redcar and Cleveland


Let's filter Manchester MSOAs only. Afterwards, I'm going to remove unnecessary columns and rename the remaining columns with more meaningful names.

In [7]:
centroid_mcr = centroid[centroid['msoa11nm_x'].str.contains('Manchester')]
centroid_mcr.drop(columns = ['objectid', 'msoa11nm_y', 'msoa11nmw', 'msoa11hclnmw','Laname'], inplace = True)
centroid_mcr.rename(columns = {'msoa11cd': 'MSOA code', 'msoa11nm_x': 'MSOA name', 'msoa11hclnm': 'Neighbourhood'}, inplace = True)
centroid_mcr.reset_index(drop=True).head(16)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood
0,388328.79,399090.429,E02001056,Manchester 012,Clayton Vale
1,385709.138,398630.708,E02001057,Manchester 013,New Islington & Miles Platting
2,388304.872,400243.228,E02001055,Manchester 011,Newton Heath
3,384344.95,401603.394,E02001052,Manchester 008,Crumpsall South
4,386211.496,401029.976,E02001053,Manchester 009,Hapurhey South & Monsall
5,385857.998,402087.339,E02001050,Manchester 006,Hapurhey North
6,387155.798,401800.764,E02001051,Manchester 007,Moston West
7,387418.28,398157.945,E02001059,Manchester 015,"Beswick, Eastlands & Openshaw Park"
8,384567.12,398957.002,E02006902,Manchester 054,City Centre North & Collyhurst
9,382798.593,397122.179,E02006916,Manchester 059,Hulme Park & St George's


Let's check if there is any missing values in the Neighbourhood column to make sure we have all pairs between MSOAs and neighbourhood names.

In [8]:
missing_value = centroid_mcr['Neighbourhood'].isnull()
missing_value.value_counts()

False    57
Name: Neighbourhood, dtype: int64

In [9]:
centroid_mcr.shape[0]

57

Great, there is no missing value! We also know there are 67 MSOAs in Manchester.

Now, we need to convert X, Y coordinates into longitude and latitude coordinates.

In [10]:
p84 = Proj(proj="latlong",towgs84="0,0,0",ellps="WGS84")
p36 = Proj(proj="latlong", k=0.9996012717, ellps="airy", towgs84="446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894")
vgrid = Proj(init="world:bng")

def Cov_EN_LL(easting, northing):
    """Returns (longitude, latitude) tuple
    """
    lon36, lat36 = vgrid(easting, northing, inverse=True)
    return transform(p36, p84, lon36, lat36)

longitude = []
latitude = []

for x, y in zip(centroid_mcr['X'], centroid_mcr['Y']):
    lon, lat = Cov_EN_LL(x, y)
    longitude.append(lon)
    latitude.append(lat)
    
centroid_mcr['Longitude'] = longitude
centroid_mcr['Latitude'] = latitude
centroid_mcr.reset_index(drop = True).head()

  return _prepare_from_string(" ".join(pjargs))
  projstring = _prepare_from_string(" ".join((projstring, projkwargs)))
  if __name__ == '__main__':
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude
0,388328.79,399090.429,E02001056,Manchester 012,Clayton Vale,-2.177365,53.488394
1,385709.138,398630.708,E02001057,Manchester 013,New Islington & Miles Platting,-2.216825,53.484197
2,388304.872,400243.228,E02001055,Manchester 011,Newton Heath,-2.177769,53.498755
3,384344.95,401603.394,E02001052,Manchester 008,Crumpsall South,-2.237532,53.510877
4,386211.496,401029.976,E02001053,Manchester 009,Hapurhey South & Monsall,-2.209361,53.505776


Now we have longitude and latitude coordinates for each neighbourhood.

### 2.2 Covid-19 Infection Rates by MSOA
To analyse infection rates by neighbourhood cluster in the later stage, let's prepare the number of confirmed cases by MSOA. The most recent data at the time I was analysing was the 21st of August 2020.

In [11]:
df_covid = pd.read_excel(os.path.join(location, 'MSOAs_latest.xlsx'), sheet_name = 'MSOAs-21-08-2020' ,skiprows = 8)
df_covid.head()

Unnamed: 0,Region code,Region name,Upper tier local authority code,Upper tier local authority name,Local authority district code,Local authority district name,MSOA code,House of Commons Library MSOA Name,wk_05,wk_06,...,wk_26,wk_27,wk_28,wk_29,wk_30,wk_31,wk_32,wk_33,Column1,Latest_7_days
0,E12000007,London,E09000001,City of London,E09000001,City of London,E02000001,City of London,..,..,...,..,..,..,..,..,..,..,..,,..
1,E12000007,London,E09000002,Barking and Dagenham,E09000002,Barking and Dagenham,E02000002,Marks Gate,..,..,...,..,..,..,..,3,..,..,..,,..
2,E12000007,London,E09000002,Barking and Dagenham,E09000002,Barking and Dagenham,E02000003,Chadwell Heath East,..,..,...,..,..,..,..,6,..,4,4,,..
3,E12000007,London,E09000002,Barking and Dagenham,E09000002,Barking and Dagenham,E02000004,Eastbrookend,..,..,...,..,..,..,..,..,..,..,..,,..
4,E12000007,London,E09000002,Barking and Dagenham,E09000002,Barking and Dagenham,E02000005,Becontree Heath,..,..,...,..,..,..,..,..,..,..,..,,..


Let's calculate the total number of cases by MSOA.

In [12]:
weekly_case = df_covid.iloc[:, 8:37]
weekly_case.replace('..', 1, inplace= True) # 'Numbers from 0 to 2 (inclusive) are represented by ".." , therefore, replace with the average.
weekly_case['Total'] = weekly_case.iloc[:,:-1].sum(axis = 1)
weekly_case.head()

Unnamed: 0,wk_05,wk_06,wk_07,wk_08,wk_09,wk_10,wk_11,wk_12,wk_13,wk_14,...,wk_25,wk_26,wk_27,wk_28,wk_29,wk_30,wk_31,wk_32,wk_33,Total
0,1,1,1,1,1,1,1,4,3,3,...,1,1,1,1,1,1,1,1,1,37
1,1,1,1,1,1,1,1,1,1,5,...,1,1,1,1,1,3,1,1,1,41
2,1,1,1,1,1,1,1,1,3,10,...,1,1,1,1,1,6,1,4,4,65
3,1,1,1,1,1,1,1,1,6,10,...,1,1,1,1,1,1,1,1,1,55
4,1,1,1,1,1,1,1,1,4,6,...,1,1,1,1,1,1,1,1,1,47


Now we are going to create a dataframe for Manchester to show the number of cases by MSOA.

In [13]:
columns = ['Local authority district name', 'MSOA code', 'Neighbourhood','Total case']
df_covid_mcr = pd.DataFrame(columns = columns)
df_covid_mcr['MSOA code'] = df_covid['MSOA code']
df_covid_mcr['Neighbourhood'] = df_covid['House of Commons Library MSOA Name']
df_covid_mcr['Local authority district name'] = df_covid['Local authority district name']
df_covid_mcr['Total case'] = weekly_case['Total']
df_covid_mcr = df_covid_mcr[df_covid_mcr['Local authority district name'] == 'Manchester'].reset_index(drop=True)
df_covid_mcr.head()

Unnamed: 0,Local authority district name,MSOA code,Neighbourhood,Total case
0,Manchester,E02001045,Boothroyden & Higher Blackley,64
1,Manchester,E02001046,Blackley,98
2,Manchester,E02001047,Charlestown,98
3,Manchester,E02001048,Crumpsall North & Heaton Park,93
4,Manchester,E02001049,New Moston,75


Next, we need to calculate infection rate using the total number of cases and population by MSOA. Therefore, let's load population data.

In [14]:
df_pop = pd.read_excel(os.path.join(location, 'SAPE21DT14a-mid-2018-msoa-on-2019-LA-quinary-estimates-formatted.xlsx'), sheet_name = 'Mid-2018 Persons', skiprows = 4)
df_pop = df_pop.iloc[:,:4]
df_pop.head()

Unnamed: 0,Area Codes,LA (2019 boundaries),MSOA,All Ages
0,E06000047,County Durham,,526980
1,E02004297,,County Durham 001,8099
2,E02004290,,County Durham 002,5808
3,E02004298,,County Durham 003,10031
4,E02004299,,County Durham 004,8588


Let's clean the data and filter Manchester data.

In [15]:
df_pop.dropna(subset = ['MSOA'], inplace = True)
df_pop.rename(columns = {'Area Codes' : 'MSOA code'}, inplace = True)
df_pop_mcr = df_pop[df_pop['MSOA'].str.contains('Manchester')]
df_pop_mcr.reset_index(drop = True).head()

Unnamed: 0,MSOA code,LA (2019 boundaries),MSOA,All Ages
0,E02001045,,Manchester 001,8269
1,E02001046,,Manchester 002,10377
2,E02001047,,Manchester 003,10002
3,E02001048,,Manchester 004,8841
4,E02001049,,Manchester 005,9764


Now, I'm going to combine the population data and the covid-19 data and calcuate the number of confirmed cases per hundread people.

In [16]:
df_covid_mcr=  df_covid_mcr.merge(df_pop, on='MSOA code')
df_covid_mcr.drop(columns = ['LA (2019 boundaries)', 'MSOA'], axis = 0, inplace = True)
df_covid_mcr['Number of confirmed cases per hundred people'] = df_covid_mcr['Total case']/df_covid_mcr['All Ages']*100
df_covid_mcr.head()

Unnamed: 0,Local authority district name,MSOA code,Neighbourhood,Total case,All Ages,Number of confirmed cases per hundred people
0,Manchester,E02001045,Boothroyden & Higher Blackley,64,8269,0.773975
1,Manchester,E02001046,Blackley,98,10377,0.944396
2,Manchester,E02001047,Charlestown,98,10002,0.979804
3,Manchester,E02001048,Crumpsall North & Heaton Park,93,8841,1.051917
4,Manchester,E02001049,New Moston,75,9764,0.768128


### Venues in neighbourhoods
Using the longitude and latitude coordinates of the centroids of each neighbourhood, let's search venues in 800 metres from the centrolds.

## 3. Methodology
In this project, we will cluster the neighbourhoods based on their similarity of common venue types and compare infection rate between the neighbourhood clusters. It is assumed that venues are most likely to be used by usual residents who live in walking distance from the venues in the neighbourhoods rather than people from other neighbourhoods. This is a resonable assumption under the social distancing rules and the government's guide on people should work from home if they can. 

The first step is exploration of frequency of venue types across the neighbouorhoods in Manchester, using Foursquare API. The radius for counting venues will be set as 800 metres, which is widely considered as a ten minute-walking distance. The limit will be set as 200 venues in this analysis. The venues will be analysed by venue type and the number of venues by venue type will be calculated for each neighbourhood. This analysis will provide common types of venues in order by neighbourhood.

Secondly, neighbourhoods will be clustered using unsupervised learning K-means algorithm, which is one of the most common cluster method. In this analysis, the number of 1st most common venues by cluster will be provided. 

Lastly, Covid-19 infection rate will be compared between neighbourhood clusters. This will provide understanding of potential relationship between neighbourhoods' venue composition and Covid-19 infection rate. 


Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learnings were used and why.

### Venues in neighbourhoods
Using the longitude and latitude coordinates of the centroids of each neighbourhood, let's plot them in a map first.

In [17]:
import numpy as np
from geopy.geocoders import Nominatim

import requests

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

In [18]:
address = 'Manchester, UK'

geolocator = Nominatim(user_agent="mcr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

53.4794892 -2.2451148


In [19]:
# create map of Manchester using latitude and longitude values
map_mcr = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, msoa_name, neighbourhood in zip(centroid_mcr['Latitude'], centroid_mcr['Longitude'], centroid_mcr['MSOA name'], centroid_mcr['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, msoa_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mcr)  
    
map_mcr

Using Foursquare API, let's search up to 200 venues in 800 metres from the centrolds of the neighbourhoods.

In [670]:
CLIENT_ID = 'XXXX' 
CLIENT_SECRET = 'YYYY'
VERSION = '23200822' 
LIMIT = 200

In [671]:
def getNearbyVenues(names, latitudes, longitudes, radius=800): # radius 800metre is 10min walking distance
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [672]:
mcr_venues = getNearbyVenues(names=centroid_mcr['Neighbourhood'],
                                   latitudes=centroid_mcr['Latitude'],
                                   longitudes=centroid_mcr['Longitude']
                                  )

Clayton Vale
New Islington & Miles Platting
Newton Heath
Crumpsall South
Hapurhey South & Monsall
Hapurhey North
Moston West
Beswick, Eastlands & Openshaw Park
City Centre North & Collyhurst
Hulme Park & St George's
Castlefield & Deansgate
University North & Whitworth Street
Strangeways
Piccadilly & Ancoats
Didsbury Village
Merseybank & Barlow Moor
Burnage South
Withington East
West Didsbury
Withington West
Beech Road & Chorlton Meadows
Ladybarn
East Didsbury
Northern Moor
Woodhouse Park & Airport
Wythenshawe East & Peel Hall
Newall Green
Benchill South & Wythenshawe Central
Benchill North & Sharston
Baguley East & Wythenshawe Park
Baguley West & Brooklands
Northenden
Victoria Park
Gorton South
Belle Vue & West Gorton
Abbey Hey
Ardwick
Hulme & University
Openshaw & Gorton North
Moss Side West
Rusholme West & Moss Side East
Fallowfield Central
Chorlton South
Fallowfield West & Whalley Range South
Levenshulme Central
Whalley Range North
Chorlton North
Rusholme East
Levenshulme North
Leve

In [673]:
print(mcr_venues.shape)
mcr_venues.head()

(1379, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Clayton Vale,53.488394,-2.177365,Clayton Vale,53.484735,-2.177158,Park
1,Clayton Vale,53.488394,-2.177365,Brewers Fayre,53.486428,-2.173513,Pub
2,Clayton Vale,53.488394,-2.177365,Clayton Hall Metrolink Station,53.485007,-2.177443,Tram Station
3,Clayton Vale,53.488394,-2.177365,Hewlet Johnson Playing Fields,53.483562,-2.168788,Park
4,New Islington & Miles Platting,53.484197,-2.216825,Pollen Bakery,53.483487,-2.224372,Bakery


As you can see above, 1379 venues are returned by Foursquare. Let's group the data by neighoubrhood so that we can understand how many venues are captured in each neighbourhood.

In [675]:
mcr_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abbey Hey,10,10,10,10,10,10
Ardwick,12,12,12,12,12,12
Baguley East & Wythenshawe Park,5,5,5,5,5,5
Baguley West & Brooklands,13,13,13,13,13,13
Beech Road & Chorlton Meadows,54,54,54,54,54,54
Belle Vue & West Gorton,12,12,12,12,12,12
Benchill North & Sharston,4,4,4,4,4,4
Benchill South & Wythenshawe Central,18,18,18,18,18,18
"Beswick, Eastlands & Openshaw Park",20,20,20,20,20,20
Blackley,6,6,6,6,6,6


Let's check how many venus types exist in the data and the average frequency of each type by neighbourhood usig one-hot encoding.

In [674]:
print(len(mcr_venues['Venue Category'].unique()))

196


In [676]:
# one hot encoding
mcr_onehot = pd.get_dummies(mcr_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mcr_onehot['Neighbourhood'] = mcr_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [mcr_onehot.columns[-1]] + list(mcr_onehot.columns[:-1])
mcr_onehot = mcr_onehot[fixed_columns]

mcr_onehot.head()

Unnamed: 0,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport Service,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Track Stadium,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Women's Store
0,Clayton Vale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Clayton Vale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Clayton Vale,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
3,Clayton Vale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,New Islington & Miles Platting,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [677]:
mcr_onehot.shape

(1379, 197)

In [678]:
mcr_grouped = mcr_onehot.groupby('Neighbourhood').mean().reset_index()
mcr_grouped.head()

Unnamed: 0,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport Service,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Track Stadium,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Women's Store
0,Abbey Hey,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ardwick,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Baguley East & Wythenshawe Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Baguley West & Brooklands,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Beech Road & Chorlton Meadows,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.018519,0.037037,0.0,0.0,0.0,0.0,0.0,0.0


In [679]:
mcr_grouped.shape

(57, 197)

Now we have the average frequency of each venue type for 57 neighbourhoods in Manchester.
Let's created a new dataframe which is sorted based on the most common venue types.

In [680]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [681]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = mcr_grouped['Neighbourhood']

for ind in np.arange(mcr_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mcr_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Hey,Gym / Fitness Center,Park,Fast Food Restaurant,Hotel,Bus Station,Train Station,Market,General College & University,Bakery,Electronics Store
1,Ardwick,Music Venue,Chinese Restaurant,Concert Hall,Museum,Café,Beer Bar,Bar,Theater,Korean Restaurant,Discount Store
2,Baguley East & Wythenshawe Park,Tram Station,Fast Food Restaurant,Photography Studio,Gym / Fitness Center,Women's Store,Donut Shop,Fish & Chips Shop,Farmers Market,Falafel Restaurant,Fabric Shop
3,Baguley West & Brooklands,Tram Station,Supermarket,Coffee Shop,Donut Shop,Hardware Store,Clothing Store,Grocery Store,Pet Store,Furniture / Home Store,Farmers Market
4,Beech Road & Chorlton Meadows,Pub,Bar,Pizza Place,Café,Grocery Store,Coffee Shop,Fast Food Restaurant,Park,Turkish Restaurant,Food & Drink Shop


# FROM HERE TO DO TODAY

In [682]:
# set number of clusters
kclusters = 10

mcr_grouped_clustering = mcr_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mcr_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 1, 0, 0, 1, 4, 5, 0, 1, 1])

In [683]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mcr_merged = centroid_mcr

# merge mcr_grouped with mcr_data to add latitude/longitude for each neighborhood
mcr_merged = mcr_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

mcr_merged.head() # check the last columns!

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
399,388328.79,399090.429,E02001056,Manchester 012,Clayton Vale,-2.177365,53.488394,3,Park,Pub,Tram Station,Women's Store,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Fabric Shop,Ethiopian Restaurant
400,385709.138,398630.708,E02001057,Manchester 013,New Islington & Miles Platting,-2.216825,53.484197,1,Coffee Shop,Italian Restaurant,Tram Station,Pizza Place,Brewery,Residential Building (Apartment / Condo),Café,Canal Lock,Beer Bar,Supermarket
402,388304.872,400243.228,E02001055,Manchester 011,Newton Heath,-2.177769,53.498755,5,Supermarket,Fast Food Restaurant,Shoe Store,Tram Station,Park,Bus Stop,Women's Store,Electronics Store,Farmers Market,Falafel Restaurant
403,384344.95,401603.394,E02001052,Manchester 008,Crumpsall South,-2.237532,53.510877,0,Tram Station,Coffee Shop,Bakery,Grocery Store,Ice Cream Shop,Middle Eastern Restaurant,Fast Food Restaurant,Park,Sandwich Place,Burger Joint
404,386211.496,401029.976,E02001053,Manchester 009,Hapurhey South & Monsall,-2.209361,53.505776,5,Supermarket,Hotel,Tram Station,Flea Market,Women's Store,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant


In [684]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mcr_merged['Latitude'], mcr_merged['Longitude'], mcr_merged['Neighbourhood'], mcr_merged['Cluster Labels'].astype('int')):
    
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Cluster 0

In [685]:
cluster0 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 0, :]
cluster0

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
403,384344.95,401603.394,E02001052,Manchester 008,Crumpsall South,-2.237532,53.510877,0,Tram Station,Coffee Shop,Bakery,Grocery Store,Ice Cream Shop,Middle Eastern Restaurant,Fast Food Restaurant,Park,Sandwich Place,Burger Joint
1036,383474.343,386806.221,E02001096,Manchester 052,Wythenshawe East & Peel Hall,-2.249877,53.377847,0,Tram Station,Fast Food Restaurant,Market,Discount Store,Tanning Salon,Bakery,Hotel,Fish & Chips Shop,Warehouse Store,Farmers Market
1040,382429.418,387665.207,E02001094,Manchester 050,Benchill South & Wythenshawe Central,-2.265632,53.385534,0,Tram Station,Fast Food Restaurant,Bakery,Pub,Fish & Chips Shop,Market,Sandwich Place,Bus Station,Supermarket,Tanning Salon
1044,381325.456,388933.665,E02001092,Manchester 048,Baguley East & Wythenshawe Park,-2.282305,53.396898,0,Tram Station,Fast Food Restaurant,Photography Studio,Gym / Fitness Center,Women's Store,Donut Shop,Fish & Chips Shop,Farmers Market,Falafel Restaurant,Fabric Shop
1046,379874.571,389672.839,E02001091,Manchester 047,Baguley West & Brooklands,-2.304172,53.403488,0,Tram Station,Supermarket,Coffee Shop,Donut Shop,Hardware Store,Clothing Store,Grocery Store,Pet Store,Furniture / Home Store,Farmers Market
1414,384045.066,402858.028,E02001048,Manchester 004,Crumpsall North & Heaton Park,-2.242118,53.522145,0,Coffee Shop,Tram Station,Indian Restaurant,River,Lawyer,Women's Store,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [686]:
cluster1 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 1, :]
cluster1

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
400,385709.138,398630.708,E02001057,Manchester 013,New Islington & Miles Platting,-2.216825,53.484197,1,Coffee Shop,Italian Restaurant,Tram Station,Pizza Place,Brewery,Residential Building (Apartment / Condo),Café,Canal Lock,Beer Bar,Supermarket
411,387418.28,398157.945,E02001059,Manchester 015,"Beswick, Eastlands & Openshaw Park",-2.19105,53.479991,1,Track Stadium,Gym / Fitness Center,Bar,Pharmacy,Park,Pub,Café,Bus Stop,Restaurant,Fast Food Restaurant
627,384567.12,398957.002,E02006902,Manchester 054,City Centre North & Collyhurst,-2.234051,53.487097,1,Coffee Shop,Bar,Pub,Italian Restaurant,Tea Room,Indian Restaurant,Record Shop,Bookstore,Café,Burger Joint
688,382798.593,397122.179,E02006916,Manchester 059,Hulme Park & St George's,-2.260602,53.47055,1,Bar,Steakhouse,Hotel,Gastropub,Train Station,Fast Food Restaurant,Cocktail Bar,Performing Arts Venue,Canal Lock,Pub
690,383357.653,397648.015,E02006917,Manchester 060,Castlefield & Deansgate,-2.252208,53.475294,1,Bar,Pub,Hotel,Plaza,Italian Restaurant,Indian Restaurant,Coffee Shop,Steakhouse,Cocktail Bar,Asian Restaurant
692,383952.522,397190.05,E02006914,Manchester 057,University North & Whitworth Street,-2.243222,53.471197,1,Pub,Hotel,Bar,Gay Bar,Coffee Shop,Indian Restaurant,Chinese Restaurant,Bakery,Cocktail Bar,Café
696,384642.184,397847.245,E02006912,Manchester 055,Piccadilly & Ancoats,-2.232865,53.477124,1,Coffee Shop,Hotel,Gay Bar,Bar,Tea Room,Beer Bar,Record Shop,Sushi Restaurant,Plaza,Brewery
708,384699.684,391244.743,E02001087,Manchester 043,Didsbury Village,-2.231675,53.41778,1,Pub,Italian Restaurant,Indian Restaurant,Grocery Store,Gym / Fitness Center,Bar,Pharmacy,Coffee Shop,Fish & Chips Shop,Cheese Shop
716,383797.807,392308.573,E02001083,Manchester 039,West Didsbury,-2.245299,53.427315,1,Pub,Bus Stop,Bar,Italian Restaurant,Indian Restaurant,Deli / Bodega,Tennis Court,Restaurant,Café,Persian Restaurant
718,384490.212,392636.529,E02001082,Manchester 038,Withington West,-2.234895,53.430284,1,Pub,Grocery Store,Bar,Italian Restaurant,Indian Restaurant,Tennis Court,Restaurant,Café,Asian Restaurant,Vegetarian / Vegan Restaurant


In [687]:
cluster2 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 2, :]
cluster2

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
694,383772.242,400556.76,E02006915,Manchester 058,Strangeways,-2.246113,53.501452,2,Fast Food Restaurant,Pizza Place,Shopping Plaza,Electronics Store,Sushi Restaurant,Bakery,Casino,Auto Garage,Restaurant,Supermarket
710,382840.968,391839.785,E02001086,Manchester 042,Merseybank & Barlow Moor,-2.259673,53.423071,2,Fast Food Restaurant,Middle Eastern Restaurant,Hotel,Bus Station,Grocery Store,Golf Course,Outdoor Supply Store,Lake,Women's Store,English Restaurant
714,385324.717,392392.146,E02001084,Manchester 040,Withington East,-2.222324,53.428111,2,Restaurant,Convenience Store,Middle Eastern Restaurant,Grocery Store,Japanese Restaurant,Sandwich Place,Park,Café,Lake,Women's Store
726,381170.894,390493.368,E02001088,Manchester 044,Northern Moor,-2.284723,53.410912,2,Grocery Store,Park,Fast Food Restaurant,Convenience Store,Bowling Green,Monument / Landmark,Tram Station,Lebanese Restaurant,Chinese Restaurant,Soccer Field
1048,382940.386,389949.081,E02001090,Manchester 046,Northenden,-2.258074,53.40608,2,Middle Eastern Restaurant,Vietnamese Restaurant,Pub,Sandwich Place,Park,Fish & Chips Shop,Café,Women's Store,Donut Shop,Farmers Market
1223,386051.115,395692.2,E02001066,Manchester 022,Victoria Park,-2.21154,53.457794,2,Grocery Store,Market,Bakery,Pharmacy,Pastry Shop,Park,Electronics Store,Sandwich Place,Middle Eastern Restaurant,Sports Club
1226,389361.524,396367.211,E02001065,Manchester 021,Abbey Hey,-2.161708,53.463939,2,Gym / Fitness Center,Park,Fast Food Restaurant,Hotel,Bus Station,Train Station,Market,General College & University,Bakery,Electronics Store
1232,383853.987,395414.4,E02001068,Manchester 024,Moss Side West,-2.244614,53.455233,2,Tea Room,Park,Food,Café,Supermarket,Lake,Discount Store,Brewery,Gym Pool,Pizza Place
1234,384726.482,395207.815,E02001069,Manchester 025,Rusholme West & Moss Side East,-2.231465,53.453403,2,Middle Eastern Restaurant,Indian Restaurant,Hookah Bar,Café,Grocery Store,Park,Halal Restaurant,Ice Cream Shop,Fast Food Restaurant,Dessert Shop
1318,383943.346,394347.33,E02001074,Manchester 030,Fallowfield West & Whalley Range South,-2.243214,53.445645,2,Bus Stop,Bakery,Sandwich Place,Park,Supermarket,Lake,Basketball Court,Middle Eastern Restaurant,Tea Room,Escape Room


In [688]:
cluster3 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 3, :]
cluster3

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
399,388328.79,399090.429,E02001056,Manchester 012,Clayton Vale,-2.177365,53.488394,3,Park,Pub,Tram Station,Women's Store,Dive Bar,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Fabric Shop,Ethiopian Restaurant


In [689]:
cluster4 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 4, :]
cluster4

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1225,387224.811,396211.022,E02001064,Manchester 020,Belle Vue & West Gorton,-2.193885,53.462487,4,Grocery Store,Supermarket,Park,Electronics Store,Hotel,Movie Theater,Gym,Track Stadium,Bus Stop,Discount Store
1405,385008.629,404186.798,E02001045,Manchester 001,Boothroyden & Higher Blackley,-2.227649,53.534117,4,Clothing Store,Grocery Store,Supermarket,Discount Store,Women's Store,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant


In [690]:
cluster5 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 5, :]
cluster5

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
402,388304.872,400243.228,E02001055,Manchester 011,Newton Heath,-2.177769,53.498755,5,Supermarket,Fast Food Restaurant,Shoe Store,Tram Station,Park,Bus Stop,Women's Store,Electronics Store,Farmers Market,Falafel Restaurant
404,386211.496,401029.976,E02001053,Manchester 009,Hapurhey South & Monsall,-2.209361,53.505776,5,Supermarket,Hotel,Tram Station,Flea Market,Women's Store,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant
1042,382978.424,388285.745,E02001093,Manchester 049,Benchill North & Sharston,-2.257412,53.39113,5,Airport Service,Tram Station,Supermarket,Bus Stop,Women's Store,English Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [691]:
cluster6 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 6, :]
cluster6

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
406,387155.798,401800.764,E02001051,Manchester 007,Moston West,-2.195155,53.512728,6,Bakery,Building,Pub,Chinese Restaurant,Women's Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Fabric Shop


In [692]:
cluster7 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 7, :]
cluster7

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1403,387778.975,403089.749,E02001047,Manchester 003,Charlestown,-2.185809,53.524329,7,Breakfast Spot,Canal,Food Truck,Gym / Fitness Center,Golf Course,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant


In [693]:
cluster8 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 8, :]
cluster8

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
712,386349.277,392246.319,E02001085,Manchester 041,Burnage South,-2.206899,53.426828,8,Supermarket,Pizza Place,Train Station,Discount Store,Liquor Store,Park,Toy / Game Store,Indian Restaurant,Chinese Restaurant,Bus Station
1034,382400.969,386301.412,E02001097,Manchester 053,Woodhouse Park & Airport,-2.265983,53.373275,8,Discount Store,Tram Station,Sandwich Place,Business Service,Soccer Field,Park,Deli / Bodega,Fish & Chips Shop,Coffee Shop,Hotel
1224,388395.195,395556.625,E02001067,Manchester 023,Gorton South,-2.176233,53.456632,8,Train Station,Gymnastics Gym,Soccer Stadium,Market,Sandwich Place,Racetrack,Supermarket,Women's Store,Donut Shop,Farmers Market
1229,388982.912,397040.883,E02001061,Manchester 017,Openshaw & Gorton North,-2.167435,53.469986,8,Supermarket,Coffee Shop,Bakery,Pizza Place,Pet Store,Clothing Store,Discount Store,Bus Stop,Sandwich Place,Farmers Market


In [694]:
cluster9 = mcr_merged.loc[mcr_merged['Cluster Labels'] == 9, :]
cluster9

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
405,385857.998,402087.339,E02001050,Manchester 006,Hapurhey North,-2.214739,53.51527,9,Hotel,Gym / Fitness Center,Shopping Mall,Supermarket,Sandwich Place,Park,Women's Store,Donut Shop,Farmers Market,Falafel Restaurant


# Comebine covid cases

In [695]:
mcr_merged_covid = mcr_merged.merge(df_covid_mcr, on='MSOA code')

mcr_merged_covid.head() # check the last columns!

Unnamed: 0,X,Y,MSOA code,MSOA name,Neighbourhood_x,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,...,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Local authority district name,Neighbourhood_y,Weekly average case,Total case,All Ages,Number of confirmed cases per hundred people
0,388328.79,399090.429,E02001056,Manchester 012,Clayton Vale,-2.177365,53.488394,3,Park,Pub,...,Farmers Market,Falafel Restaurant,Fabric Shop,Ethiopian Restaurant,Manchester,Clayton Vale,2.793103,81,8319,0.973675
1,385709.138,398630.708,E02001057,Manchester 013,New Islington & Miles Platting,-2.216825,53.484197,1,Coffee Shop,Italian Restaurant,...,Café,Canal Lock,Beer Bar,Supermarket,Manchester,New Islington & Miles Platting,3.137931,91,11195,0.812863
2,388304.872,400243.228,E02001055,Manchester 011,Newton Heath,-2.177769,53.498755,5,Supermarket,Fast Food Restaurant,...,Women's Store,Electronics Store,Farmers Market,Falafel Restaurant,Manchester,Newton Heath,2.965517,86,9189,0.935902
3,384344.95,401603.394,E02001052,Manchester 008,Crumpsall South,-2.237532,53.510877,0,Tram Station,Coffee Shop,...,Fast Food Restaurant,Park,Sandwich Place,Burger Joint,Manchester,Crumpsall South,5.172414,150,14132,1.061421
4,386211.496,401029.976,E02001053,Manchester 009,Hapurhey South & Monsall,-2.209361,53.505776,5,Supermarket,Hotel,...,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Manchester,Hapurhey South & Monsall,3.586207,104,11093,0.937528


In [696]:
mcr_merged_covid.groupby('Cluster Labels').mean()

Unnamed: 0_level_0,X,Y,Longitude,Latitude,Weekly average case,Total case,All Ages,Number of confirmed cases per hundred people
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,382582.300667,392923.225667,-2.263606,53.432798,2.890805,83.833333,9424.0,0.874598
1,384335.913273,395544.515182,-2.237352,53.456414,2.302508,66.772727,9502.0,0.730002
2,385032.401063,394609.037812,-2.226825,53.448026,2.637931,76.5,10064.9375,0.778031
3,388328.79,399090.429,-2.177365,53.488394,2.793103,81.0,8319.0,0.973675
4,386116.72,400198.91,-2.210767,53.498302,2.706897,78.5,9178.5,0.851202
5,385831.597333,396519.649667,-2.214847,53.46522,3.666667,106.333333,9878.333333,1.084222
6,387155.798,401800.764,-2.195155,53.512728,2.448276,71.0,8995.0,0.789327
7,387778.975,403089.749,-2.185809,53.524329,3.517241,102.0,10002.0,1.019796
8,386532.08825,392786.30975,-2.204137,53.431681,2.655172,77.0,9323.75,0.845574
9,385857.998,402087.339,-2.214739,53.51527,2.137931,62.0,8397.0,0.738359


### discussion: 
limitation, observation, recommandation.
### conclution: 
For the next step in the future, if there is a trend, wider geographcial study