## Estimating Nightlife Activity per District

Goal: create, for each postcode district in Greater London, a normalized index that can characterize the nature, dynamics, activity, and trends in food and nightlife.


To estimate how "hot" the nightlife is in a given postcode district in London, we can look at a number of factors:

* Population access to the area (TfL)
* Number and quality of restaurants in the area (Foursquare, Yelp, Google Places, TripAdvisor, Food Hygiene, Open Street Map, Zomato)
* Number of bars and clubs in the area (Licenses, Place Search as above)

We should start by defining the area around the district. As most APIs take a distance around a central point, it makes sense to use a circular area around the postcode district centroid.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import datetime as dt

Import the list of postcode districts from the output of the Mortgage exercise.

In [2]:
pcd = pd.read_csv('/home/alessandro/Documents/placemake/norm_index.csv')

In [3]:
pcd.head()

Unnamed: 0.1,Unnamed: 0,index,lat,long
0,BR1,0.21656,51.410753,0.01942
1,BR2,0.197952,51.390385,0.021641
2,BR3,0.158689,51.403509,-0.031492
3,BR4,0.168705,51.375654,-0.009797
4,BR5,0.13293,51.389225,0.102537


In [4]:
pcd = pcd.rename(columns={'Unnamed: 0': 'District'})

### District areas

We need a distance matrix holding the distance between every pair of district centroids. For each district, the distance to its nearest neighbour is taken as the diameter of the circle around its centroid, so the radius is half that distance. This is only an approximation, but it at least minimises overlap between districts.

In [5]:
from geopy.distance import vincenty  # removed in geopy 2.0; geopy.distance.geodesic is the replacement there


# Pair up each district centroid as a (lat, long) tuple.
pcd['coords'] = list(zip(pcd.lat, pcd.long))
# Start from a square DataFrame of zeros (one row and one column per district).
square = pd.DataFrame(
    np.zeros(len(pcd) ** 2).reshape(len(pcd), len(pcd)),
    index=pcd.index, columns=pcd.index)

def get_distance(col):
    # Distance from the district identified by col.name to every district.
    end = pcd.loc[col.name, 'coords']
    return pcd['coords'].apply(vincenty, args=(end,), ellipsoid='WGS-84')

distance_matrix = square.apply(get_distance, axis=1).T

def units(input_instance):
    # Convert a geopy distance object to metres.
    return input_instance.meters

distance_matrix = distance_matrix.applymap(units)

In [6]:
distance_matrix.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,253,254,255,256,257,258,259,260,261,262
0,0.0,2271.313684,3633.067259,4402.743871,6260.590948,6968.601748,2820.476281,8570.491282,7957.067636,10839.127995,...,15265.714954,15811.214348,15004.506271,15153.470679,15072.945961,15355.034576,14805.414835,34773.959855,42819.662399,32135.008298
1,2271.313684,0.0,3975.719387,2734.277534,5632.46494,5506.914831,3573.628114,8284.537237,7185.11983,9589.019854,...,17179.480273,17759.685945,16921.647412,17041.197054,16934.153684,17206.211362,16689.034779,36520.75965,44253.521808,34071.711727
2,3633.067259,3975.719387,0.0,3447.355595,9462.536324,9469.777765,6427.259116,11985.614133,4670.122331,7890.171455,...,14062.525294,14707.643665,13816.70188,13873.692195,13717.420174,13966.110638,13520.012049,32991.963616,40422.990052,30857.97181
3,4402.743871,2734.277534,3447.355595,0.0,7964.983108,7127.497989,6302.012366,10694.066692,4744.675621,6869.393353,...,17509.693127,18153.118691,17263.587922,17320.900485,17163.314375,17410.648535,16967.256743,36352.880271,43542.354707,34296.594583
4,6260.590948,5632.46494,9462.536324,7964.983108,0.0,2540.199837,3885.727777,2737.670657,12683.327927,14635.15368,...,20969.125792,21435.723022,20705.671625,20906.356917,20871.968374,21166.300573,20570.774563,40620.597568,48968.324554,37635.034586


In [7]:
min_distances = distance_matrix[distance_matrix>0].min()

In [8]:
pcd['radius'] = min_distances/2
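
As a cross-check on the distance matrix above, the same nearest-neighbour radius can be sketched with a vectorised haversine formula. This is a spherical approximation, not the Vincenty ellipsoidal distance used in the notebook, so the values will differ slightly. The names `EARTH_RADIUS_M`, `haversine_matrix`, and `nearest_neighbour_radius` are introduced here for illustration only.

```python
import numpy as np

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical assumption)

def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in metres between all points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def nearest_neighbour_radius(lat_deg, lon_deg):
    """Half the distance from each point to its nearest distinct neighbour."""
    d = haversine_matrix(lat_deg, lon_deg)
    d = np.where(d > 0, d, np.inf)  # ignore zero distances, as the notebook does
    return d.min(axis=1) / 2
```

With the notebook's frame this would be called as `nearest_neighbour_radius(pcd.lat, pcd.long)`.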

### Government Data - Licenses

There are a number of government datasets that can be used to answer some of the questions in the problem. One of the most useful ones is the dataset for licenses for bars, pubs, etc. The data is a bit patchy but can aid in normalising the indexes and can help understand what kind of trends occur (data is available, with some years missing, from 2008 to 2016).

It is available here:

https://data.london.gov.uk/dataset/number-premises-licences-and-club-premises-certificates

In [9]:
licenses = pd.read_excel('/home/alessandro/Documents/placemake/nightlife/number-premises-licences-club-premises-certificates.xls', sheet_name='31 Mar 16', skiprows=1)

In [10]:
licenses.head()

Unnamed: 0,LA Code,LA Code (old format),Licensing Authority,Total number of premises licences in force authorising the provision of some form of regulated entertainment,Plays,Films,Indoor sporting events,Boxing or wrestling,Live music,Recorded music,...,"Entertainment similar to live music, recorded music or dance",Total number of club premises certificates in force authorising the provision of some form of regulated entertainment,Plays.1,Films.1,Indoor sporting events.1,Boxing or wrestling.1,Live music.1,Recorded music.1,Performance of dance.1,"Entertainment similar to live music, recorded music or dance.1"
0,E09000001,00AA,City of London Corporation,619,88,197,74,13,274,606,...,0,1,0,0,0,0,1,0,1,1
1,E09000002,00AB,Barking and Dagenham,86,12,30,22,2,60,83,...,19,4,2,3,3,0,3,4,2,2
2,E09000003,00AC,Barnet,-,-,-,-,-,-,-,...,-,-,-,-,-,-,-,-,-,-
3,E09000004,00AD,Bexley,174,37,68,52,3,142,157,...,7,44,10,10,10,0,42,34,23,0
4,E09000005,00AE,Brent,252,52,94,47,16,217,252,...,140,20,5,4,9,1,19,20,18,8


In [11]:
licenses = licenses.iloc[:-3,:]

Next, each "licensing authority" needs to be matched to postcode districts. This can be done with the ONS postcode directory:

https://data.london.gov.uk/dataset/postcode-directory-for-london

In [12]:
postcode_ons = pd.read_csv('/home/alessandro/Documents/placemake/London_postcode-ONS-postcode-Directory-May15.csv')

In [13]:
postcode_la = postcode_ons[['pcd', 'oslaua']].copy()  # copy so a column can be added safely
del postcode_ons

In [14]:
postcode_la['district'] = postcode_la.pcd.apply(lambda x: x[:-4])
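
The slice `x[:-4]` assumes exactly one separator character before the three-character inward code. If the `pcd` column turns out to use a different spacing (an assumption worth checking against the file), a whitespace-independent extraction could look like the sketch below; `outward_code` is a hypothetical helper, not part of the original notebook.

```python
def outward_code(postcode):
    """Return the district (outward code) of a full UK postcode string."""
    compact = "".join(postcode.split()).upper()  # drop all whitespace
    if len(compact) < 5:
        raise ValueError("not a full postcode: %r" % postcode)
    return compact[:-3]  # the inward code is always the last 3 characters

print(outward_code("BR1 1AA"))
print(outward_code("SW1A1AA"))
```

It could be applied with `postcode_la.pcd.apply(outward_code)` in place of the lambda.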

In [15]:
postcode_la = postcode_la.groupby(postcode_la.district).first().reset_index(drop=True)

In [16]:
pcd['LocalAuthority'] = postcode_la.oslaua
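
The assignment above aligns the two frames by row position, so it relies on both listing the same districts in the same order. A more defensive sketch matches on the district label instead. The toy frames and their values below are illustrative stand-ins, not the notebook's data.

```python
import pandas as pd

# Toy stand-ins for the notebook's frames (values are illustrative only)
pcd_demo = pd.DataFrame({"District": ["BR1", "E1", "WC2H"]})
postcode_la_demo = pd.DataFrame({
    "district": ["WC2H", "BR1", "BR1", "E1"],
    "oslaua": ["E09000007", "E09000006", "E09000006", "E09000030"],
})

# One local authority per district (first occurrence), keyed by district label
lookup = postcode_la_demo.drop_duplicates("district").set_index("district")["oslaua"]

# map() matches on the District value, so row order no longer matters
pcd_demo["LocalAuthority"] = pcd_demo["District"].map(lookup)
```

Districts missing from the lookup come back as NaN, which makes mismatches easy to spot.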

In [17]:
pcd.head()

Unnamed: 0,District,index,lat,long,coords,radius,LocalAuthority
0,BR1,0.21656,51.410753,0.01942,"(51.4107531862, 0.0194202509747)",1135.656842,E09000006
1,BR2,0.197952,51.390385,0.021641,"(51.3903853052, 0.0216414455587)",1135.656842,E09000006
2,BR3,0.158689,51.403509,-0.031492,"(51.4035087892, -0.0314917635659)",1066.962303,E09000006
3,BR4,0.168705,51.375654,-0.009797,"(51.375654222, -0.00979661456483)",1367.138767,E09000006
4,BR5,0.13293,51.389225,0.102537,"(51.3892249387, 0.10253728911)",1270.099918,E09000006


In [18]:
licenses_short = pd.DataFrame(data=[licenses['LA Code'], licenses['Total number of premises licences in force authorising the provision of some form of regulated entertainment'], licenses['Total number of club premises certificates in force authorising the provision of some form of regulated entertainment']])

In [19]:
licenses_short = licenses_short.T
licenses_short.columns = ['LA Code', 'Premises', 'Certificates']

In [20]:
licenses_short['Premises'] = pd.to_numeric(licenses_short['Premises'], errors='coerce')
licenses_short['Certificates'] = pd.to_numeric(licenses_short['Certificates'], errors='coerce')

Let's distribute these among the postcode districts. As we have no prior knowledge of how licences are spread within an authority, each authority's total is split evenly across its districts.

In [21]:
nlix = pd.DataFrame(index=pcd.District, columns=['Premises', 'Certificates'])

In [22]:
la_group = pcd.groupby('LocalAuthority')
for i in range(0, len(licenses_short)):
    if licenses_short['LA Code'][i] in la_group.groups:
        df = la_group.get_group(licenses_short['LA Code'][i])
        # nlix is indexed by district name, so assign by label
        for distr in df['District']:
            nlix.loc[distr, 'Premises'] = licenses_short['Premises'][i]/df.shape[0]
            nlix.loc[distr, 'Certificates'] = licenses_short['Certificates'][i]/df.shape[0]

In [23]:
nlix.head()

Unnamed: 0_level_0,Premises,Certificates
District,Unnamed: 1_level_1,Unnamed: 2_level_1
BR1,,
BR2,,
BR3,,
BR4,,
BR5,,
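
Every row shown in the output above is NaN in both columns, which suggests the allocation did not populate `nlix` in the recorded run. One way to make the even split easier to verify is a merge followed by a per-authority count, sketched here on toy data. The frame names `districts` and `totals` and all values are assumptions for illustration.

```python
import pandas as pd

# Toy inputs; the real frames would be `pcd` and `licenses_short`
districts = pd.DataFrame({
    "District": ["A1", "A2", "B1"],
    "LocalAuthority": ["LA_A", "LA_A", "LA_B"],
})
totals = pd.DataFrame({
    "LA Code": ["LA_A", "LA_B"],
    "Premises": [100.0, 30.0],
    "Certificates": [10.0, float("nan")],
})

merged = districts.merge(totals, left_on="LocalAuthority",
                         right_on="LA Code", how="left")
# Number of districts sharing each authority
n_districts = merged.groupby("LocalAuthority")["District"].transform("count")
# Split each authority's totals evenly across its districts
shares = merged[["Premises", "Certificates"]].div(n_districts, axis=0)
shares.index = merged["District"]
```

Authorities with missing figures (reported as `-` in the spreadsheet) simply stay NaN after the division.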


### Popularity - TfL

Another good way of capturing the popularity of an area is to look at historic travel to and from it. TfL provides a couple of datasets that can help with this:

https://api-portal.tfl.gov.uk/docs

We're particularly interested in evening to early-morning entries and exits in each area.

In [24]:
weekday_entries = pd.read_csv('/home/alessandro/Documents/placemake/nightlife/counts/En16week.csv', skiprows=6)

In [25]:
weekday_entries.head()

Unnamed: 0,nlc,Station,Date,Note,0200-0215,0215-0230,0230-0245,0245-0300,0300-0315,0315-0330,...,0000-0015,0015-0030,0030-0045,0045-0100,0100-0115,0115-0130,0130-0145,0145-0200,Unnamed: 20,Total
0,500.0,Acton Town,Nov-15,,0,0,0,0,0,0,...,15,9,6,4,2,0,0,0,,9994
1,502.0,Aldgate,Nov-16,,0,0,0,0,0,0,...,15,6,2,0,0,0,0,0,,14212
2,503.0,Aldgate East,Nov-16,,0,0,0,0,0,0,...,67,37,13,4,0,0,0,0,,21468
3,505.0,Alperton,Nov-16,,0,0,0,0,0,0,...,3,2,2,1,0,0,0,0,,4821
4,506.0,Amersham,Nov-16,,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,,3899


Once again, we need to convert between the Stations and the postcode districts. Luckily Doogal has provided for us here, with a handy conversion list.

https://www.doogal.co.uk/london_stations.php


In [26]:
tube_postcodes = pd.read_csv('/home/alessandro/Documents/placemake/nightlife/London stations.csv')

In [27]:
tube_postcodes = tube_postcodes.drop(['OS X', 'OS Y', 'Latitude', 'Longitude', 'Zone'], axis=1)

In [28]:
tube_postcodes['Postcode'] = tube_postcodes['Postcode'].apply(lambda x: x[:-4])

In [29]:
weekday_entries = weekday_entries.rename(columns = lambda x : str(x)[1:]) # drop the first character of each column name (most start with a space; names without one are truncated, e.g. 'nlc' becomes 'lc')

In [30]:
weekday_entries = weekday_entries.merge(tube_postcodes, on='Station', how='left')

In [31]:
weekday_entries[weekday_entries['Postcode'].isnull()].head()

Unnamed: 0,lc,Station,Date,Note,200-0215,0215-0230,0230-0245,0245-0300,0300-0315,0315-0330,...,0015-0030,0030-0045,0045-0100,0100-0115,0115-0130,0130-0145,0145-0200,Unnamed: 19,Total,Postcode
11,513.0,Bank & Monument,Nov-16,,0,0,0,0,0,0,...,128,41,1,0,2,0,0,,117640,
40,539.0,Chalfont & Latimer,Nov-15,,0,0,0,0,0,0,...,0,0,0,0,0,0,0,,2543,
62,562.0,Earl's Court,Nov-16,,0,0,0,0,0,0,...,49,21,8,2,0,0,0,,30499,
69,774.0,Edgware Road (Bak),Nov-16,,0,0,0,0,0,0,...,5,1,0,0,0,0,0,,8014,
70,569.0,Edgware Road (Cir),Nov-16,,0,0,0,0,0,0,...,11,3,0,0,0,0,0,,11643,


Didn't quite work. Need to clean up the station names on TfL's side.

In [32]:
import re

weekday_entries = pd.read_csv('/home/alessandro/Documents/placemake/nightlife/counts/En16week.csv', skiprows=6)
weekday_entries = weekday_entries.rename(columns = lambda x : str(x)[1:]) # drop the first character (a leading space) of each column name
weekday_entries.Station = weekday_entries.Station.apply(lambda x: x.replace("&", "and"))
weekday_entries.Station = weekday_entries.Station.apply(lambda x: x.replace("'", ""))
# strip bracketed suffixes such as " (Bak)"; raw string avoids invalid-escape warnings
weekday_entries.Station = weekday_entries.Station.apply(lambda x: re.sub(r" [\(\[].*?[\)\]]", "", x))
weekday_entries.Station = weekday_entries.Station.apply(lambda x: x.replace(" and Monument", ""))
weekday_entries.Station = weekday_entries.Station.apply(lambda x: x.replace("123", "1 2 3"))

tube_postcodes.Station = tube_postcodes.Station.apply(lambda x: x.replace("&", "and"))
tube_postcodes.Station = tube_postcodes.Station.apply(lambda x: x.replace("'", ""))
tube_postcodes.Station = tube_postcodes.Station.apply(lambda x: re.sub(r" [\(\[].*?[\)\]]", "", x))

In [33]:
weekday_entries = weekday_entries.merge(tube_postcodes, on='Station', how='left')

In [34]:
weekday_entries[weekday_entries['Postcode'].isnull()].head()

Unnamed: 0,lc,Station,Date,Note,200-0215,0215-0230,0230-0245,0245-0300,0300-0315,0315-0330,...,0015-0030,0030-0045,0045-0100,0100-0115,0115-0130,0130-0145,0145-0200,Unnamed: 19,Total,Postcode
272,,Total,,,0,0,0,0,0,0,...,5662,2235,440,109,14,5,0,,4731801,
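
The TfL-side clean-up rules used above can also be collected into a single helper, which makes them easy to test on the names that failed to match. `normalise_station` is a name introduced here for illustration.

```python
import re

_BRACKETED = re.compile(r" [\(\[].*?[\)\]]")  # e.g. " (Bak)" or " [Cir]"

def normalise_station(name):
    """Apply the clean-up rules used above to one station name."""
    name = name.replace("&", "and").replace("'", "")
    name = _BRACKETED.sub("", name)
    name = name.replace(" and Monument", "")
    return name.replace("123", "1 2 3")

for raw in ["Bank & Monument", "Earl's Court", "Edgware Road (Bak)"]:
    print(raw, "->", normalise_station(raw))
```

Note that stripping the bracketed suffix gives both Edgware Road stations the same name, so they can no longer be told apart after the merge.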


In fact, since we need to repeat this multiple times, let's put it into a function, and drop all irrelevant data while we're at it:

In [35]:
def tube(filepath, tube2pc):
    tube2pc = tube2pc.copy()  # avoid modifying the caller's lookup table
    w = pd.read_csv(filepath, skiprows=6)
    w = w.rename(columns = lambda x : str(x)[1:]) # drop the first character (a leading space) of each column name
    if w['Station'][0] != 'Acton Town':
        w = w.drop(0)
    w.Station = w.Station.apply(lambda x: x.replace("&", "and"))
    w.Station = w.Station.apply(lambda x: x.replace("'", ""))
    w.Station = w.Station.apply(lambda x: re.sub(r" [\(\[].*?[\)\]]", "", x))
    w.Station = w.Station.apply(lambda x: x.replace(" and Monument", ""))
    w.Station = w.Station.apply(lambda x: x.replace("123", "1 2 3"))

    tube2pc.Station = tube2pc.Station.apply(lambda x: x.replace("&", "and"))
    tube2pc.Station = tube2pc.Station.apply(lambda x: x.replace("'", ""))
    tube2pc.Station = tube2pc.Station.apply(lambda x: re.sub(r" [\(\[].*?[\)\]]", "", x))
    
    # Merge against the lookup passed in, not the global tube_postcodes
    w = w.merge(tube2pc, on='Station', how='left')
    
    # Keep only the 15-minute slots from 9pm onwards, plus the postcode column
    w = w[w.columns[80:]]
    w = w.drop(['', 'Total'], axis=1)
    
    # Delete the last row, as it reports totals
    w = w.iloc[:-1,:]
    
    return w

In [36]:
weekday_entries = tube('/home/alessandro/Documents/placemake/nightlife/counts/En16week.csv', tube_postcodes)
sat_entries = tube('/home/alessandro/Documents/placemake/nightlife/counts/En16sat.csv', tube_postcodes)
sun_entries = tube('/home/alessandro/Documents/placemake/nightlife/counts/En16sun.csv', tube_postcodes)
weekday_exits = tube('/home/alessandro/Documents/placemake/nightlife/counts/Ex16week.csv', tube_postcodes)
sat_exits = tube('/home/alessandro/Documents/placemake/nightlife/counts/Ex16sat.csv', tube_postcodes)
sun_exits = tube('/home/alessandro/Documents/placemake/nightlife/counts/Ex16sun.csv', tube_postcodes)


Now we can summarise the data by total entries and exits on weekdays, Saturdays, and Sundays, and compute a lateness score: the traffic-weighted mean index of the 15-minute slots kept above, from 0 (the first slot, starting at 9pm) to 19 (the last slot).

In [37]:
def tube_summary(entry, exits):
    df = pd.DataFrame(columns=['District', 'Total', 'Lateness'])
    df['District'] = entry.Postcode
    # Sum entries and exits per 15-minute slot (the last column is Postcode)
    tube_week = entry.iloc[:,:-1] + exits.iloc[:,:-1]
    # Traffic-weighted mean slot index: 0 is the first slot, 19 the last
    df['Lateness'] = np.sum(tube_week.divide(tube_week.sum(axis=1), axis=0)*np.tile(range(0,20),[len(tube_week),1]),axis=1)
    df['Total'] = tube_week.sum(axis=1)
    return df
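
To make the lateness score concrete, here is a small pure-Python sketch of the same weighted mean for a single station, plus a conversion back to a clock time. It assumes slot 0 starts at 21:00, as implied by the "9pm onwards" filter; `lateness` and `slot_to_clock` are hypothetical helper names.

```python
def lateness(counts):
    """Traffic-weighted mean slot index for one station's 15-minute counts."""
    total = sum(counts)
    if total == 0:
        return float("nan")  # no traffic, so no meaningful average
    return sum(k * n for k, n in enumerate(counts)) / total

def slot_to_clock(score, start_hour=21):
    """Convert a slot index to a clock time, assuming slot 0 starts at 21:00."""
    minutes = int(round(score * 15))
    hour = (start_hour + minutes // 60) % 24
    return "%02d:%02d" % (hour, minutes % 60)

print(lateness([1] * 20))      # uniform traffic across all 20 slots
print(slot_to_clock(5.681166))
```

Under that assumption, a score of about 5.68 corresponds to an average travel time of roughly 22:25.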

In [38]:
tube_week = tube_summary(weekday_entries, weekday_exits)
tube_sat = tube_summary(sat_entries, sat_exits)
tube_sun = tube_summary(sun_entries, sun_exits)

These numbers can now be aggregated by postcode district and joined onto the nightlife index matrix, nlix.

In [39]:
tube_week = tube_week.groupby('District').agg({'Lateness': 'mean', 'Total':'sum'})
tube_week = tube_week.rename(columns = lambda x : x+'_week')
nlix = nlix.join(tube_week, how='left')

tube_sat = tube_sat.groupby('District').agg({'Lateness': 'mean', 'Total':'sum'})
tube_sat = tube_sat.rename(columns = lambda x : x+'_sat')
nlix = nlix.join(tube_sat, how='left')

tube_sun = tube_sun.groupby('District').agg({'Lateness': 'mean', 'Total':'sum'})
tube_sun = tube_sun.rename(columns = lambda x : x+'_sun')
nlix = nlix.join(tube_sun, how='left')

In [40]:
nlix.tail(8)

Unnamed: 0_level_0,Premises,Certificates,Total_week,Lateness_week,Total_sat,Lateness_sat,Total_sun,Lateness_sun
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
WC2A,,,,,,,,
WC2B,,,8987.0,5.102926,10336.0,7.316273,3378.0,5.119597
WC2E,,,6485.0,5.639784,9046.0,5.648906,3005.0,4.252246
WC2H,,,20001.0,5.681166,33278.0,6.226246,11054.0,4.365388
WC2R,,,1770.0,4.948588,1670.0,6.097006,451.0,4.67184
WD23,,,,,,,,
WD3,,,1024.0,6.079515,934.0,7.170662,353.0,5.463242
WD6,,,,,,,,


### Hotness by restaurants

Another way to measure the nightlife in an area is to look at its restaurants. One way of getting a list of them is the OpenStreetMap Overpass API, accessed here through the overpy client:

https://python-overpy.readthedocs.io/en/latest/index.html

In [41]:
import overpy

api = overpy.Overpass()
result = api.query("""<osm-script>
    <query type="node">
      <has-kv k="amenity" v="restaurant"/>
      <bbox-query s="51.27" n="51.687" w="-0.488" e="0.235"/>
    </query>
    <print/>
</osm-script>""")

In [42]:
len(result.nodes)

3397

In [43]:
node = result.nodes[2]

In [44]:
node

<overpy.Node id=26544484 lat=51.3980144 lon=-0.1722345>

In [45]:
node.tags

{u'addr:housename': u'The Crown Inn',
 u'addr:housenumber': u'407',
 u'addr:postcode': u'CR4 4BG',
 u'addr:street': u'London Road',
 u'amenity': u'restaurant',
 u'cuisine': u'indian',
 u'fhrs:id': u'200005',
 u'name': u'Casuarina Tree',
 u'old_name': u'The Crown Inn',
 u'toilets': u'yes',
 u'toilets:access': u'customers'}

One issue is that although this node has its postcode under the tag "addr:postcode", tagging varies significantly: sometimes the postcode is missing altogether, other times it is stored under a different tag name. We could address this by relying on the latitude and longitude instead and converting these back into a postcode, e.g. via:

http://postcodes.io/docs


In [46]:
import requests

r = requests.get("https://api.postcodes.io/postcodes?lon="+str(node.lon)+"&lat="+str(node.lat))

In [47]:
r.content[37:40]

'CR4'
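
Slicing the raw response at fixed positions only works when the district is exactly three characters long. A sketch that parses the JSON instead is shown below. The payload shape is inferred from the slice above (the `postcode` field comes first in the first result); treating `result` as possibly null and the first entry as the nearest postcode are assumptions about the postcodes.io reply, and `outcode_from_payload` is a hypothetical helper.

```python
import json

def outcode_from_payload(payload):
    """Extract the outward code from a postcodes.io reverse-geocoding reply."""
    results = payload.get("result") or []  # assumed: null/empty when nothing is nearby
    if not results:
        return None
    # Assumed: the first entry is the nearest postcode; its district precedes the space
    return results[0]["postcode"].split(" ")[0]

# Minimal stand-in for a reply; the real one carries many more fields
sample = json.loads('{"status":200,"result":[{"postcode":"CR4 4BG"}]}')
print(outcode_from_payload(sample))
```

With the request above this would be called as `outcode_from_payload(r.json())`.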

The following code took around 5 mins for 500 results, so a bit slow for my tastes:

```
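nlix['n_restaurants'] = 0

for node in result.nodes:
    r = requests.get("https://api.postcodes.io/postcodes?lon="+str(node.lon)+"&lat="+str(node.lat))
    matches = r.json().get('result') or []
    # district = outward part of the first postcode returned, e.g. 'CR4' from 'CR4 4BG'
    postcode = matches[0]['postcode'].split(' ')[0] if matches else None
    if postcode in nlix.index:
        nlix.loc[postcode,'n_restaurants'] += 1
    r.close()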
```
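
Much of that time is likely spent on one HTTP round trip per restaurant. As far as I understand the postcodes.io documentation, it also offers a bulk reverse-geocoding endpoint that accepts a list of `geolocations` in a single POST, with a cap of around 100 points per request; both the body format and the cap should be checked against the current docs. The sketch below only builds the request bodies; `geolocation_batches` is a hypothetical helper.

```python
def geolocation_batches(points, batch_size=100):
    """Yield request bodies of at most `batch_size` (lat, lon) points each."""
    for start in range(0, len(points), batch_size):
        chunk = points[start:start + batch_size]
        yield {
            "geolocations": [
                {"latitude": float(lat), "longitude": float(lon)}
                for lat, lon in chunk
            ]
        }

# Illustrative coordinates only
points = [(51.0 + i * 0.001, -0.1) for i in range(250)]
batches = list(geolocation_batches(points))
print(len(batches))
```

Each body could then be sent with something like `requests.post("https://api.postcodes.io/postcodes", json=body)`, turning roughly 3,400 requests into a few dozen.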

Instead, we opt for a quicker fix: count only the restaurants that carry a postcode tag and use those to estimate the density.

In [48]:
nlix['n_rest_osm'] = 0

In [49]:
for node in result.nodes:
    if 'addr:postcode' in node.tags.keys():
        if node.tags['addr:postcode'][:-4] in nlix.index:
            nlix.loc[node.tags['addr:postcode'][:-4], 'n_rest_osm'] += 1
    elif 'postal_code' in node.tags.keys():
        if node.tags['postal_code'][:-4] in nlix.index:
            nlix.loc[node.tags['postal_code'][:-4], 'n_rest_osm'] += 1

In [50]:
nlix.n_rest_osm.sum()

1215

This is only about a third of the restaurants returned by the query (1,215 of 3,397). Even then, the number of restaurants identified seems too low: I would expect London to have around 30,000.
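
One possible reason for the low count is that the query only requests nodes, while OpenStreetMap also records many venues as ways (for example, building outlines). A sketch of an Overpass QL query covering both is shown below; `restaurant_query` is a hypothetical helper, and whether this closes the gap is untested here.

```python
def restaurant_query(south, west, north, east):
    """Build an Overpass QL query for restaurants mapped as nodes or ways."""
    template = (
        "[out:json];"
        "("
        'node["amenity"="restaurant"]({s},{w},{n},{e});'
        'way["amenity"="restaurant"]({s},{w},{n},{e});'
        ");"
        "out center;"  # ask for a centre point for each way
    )
    return template.format(s=south, w=west, n=north, e=east)

# Same bounding box as the XML query above
query = restaurant_query(51.27, -0.488, 51.687, 0.235)
print(query)
```

The resulting string could be passed to `api.query(query)` in place of the XML version.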

In [51]:
nlix.tail()

Unnamed: 0_level_0,Premises,Certificates,Total_week,Lateness_week,Total_sat,Lateness_sat,Total_sun,Lateness_sun,n_rest_osm
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
WC2H,,,20001.0,5.681166,33278.0,6.226246,11054.0,4.365388,21
WC2R,,,1770.0,4.948588,1670.0,6.097006,451.0,4.67184,5
WD23,,,,,,,,,0
WD3,,,1024.0,6.079515,934.0,7.170662,353.0,5.463242,0
WD6,,,,,,,,,0


### More APIs - Zomato

We can try other APIs such as Zomato to get a measure of quality, as well as an estimate of restaurant numbers, around each district centroid.

In [52]:
api_keys = pd.read_csv('/home/alessandro/code/personal_api_keys.csv')

In [53]:
pcd.loc[2,:]

District                                        BR3
index                                      0.158689
lat                                         51.4035
long                                     -0.0314918
coords            (51.4035087892, -0.0314917635659)
radius                                      1066.96
LocalAuthority                            E09000006
Name: 2, dtype: object

In [54]:
locationUrlFromLatLong = "https://developers.zomato.com/api/v2.1/geocode?lat=51.4035652709&lon=-0.0313445208231"
header = {"User-agent": "curl/7.43.0", "Accept": "application/json", "user_key": api_keys['zomato'][0]}

response = requests.get(locationUrlFromLatLong, headers=header)

In [55]:
rr = response.json()

In [56]:
rr['popularity']

{u'city': u'London',
 u'nearby_res': [u'6122774',
  u'6110729',
  u'6117661',
  u'6115752',
  u'6117297',
  u'6102645',
  u'6116923',
  u'6111241',
  u'6111392'],
 u'nightlife_index': u'2.00',
 u'nightlife_res': u'10',
 u'popularity': u'3.39',
 u'popularity_res': u'100',
 u'subzone': u'Beckenham',
 u'subzone_id': 61499,
 u'top_cuisines': [u'Cafe', u'Curry', u'British', u'Italian', u'Indian']}
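
Since the numeric fields in this block arrive as strings, and a response may not always contain them, a defensive parser can be sketched as follows. The field names are taken from the output above; `parse_popularity` and `FIELDS` are names introduced for illustration.

```python
import math

FIELDS = ("nightlife_index", "nightlife_res", "popularity", "popularity_res")

def parse_popularity(jresp):
    """Return the four popularity fields as floats, NaN where absent or malformed."""
    block = jresp.get("popularity") or {}
    parsed = {}
    for field in FIELDS:
        try:
            parsed[field] = float(block[field])
        except (KeyError, TypeError, ValueError):
            parsed[field] = math.nan
    return parsed

# Values copied from the response shown above
sample = {"popularity": {"nightlife_index": "2.00", "nightlife_res": "10",
                         "popularity": "3.39", "popularity_res": "100"}}
print(parse_popularity(sample))
```

Returning NaN instead of raising means one failed district lookup would not abort a loop over all districts.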

In [57]:
nlix['nl_ix_zomato'] = 0.0
nlix['nl_res_zomato'] = 0.0
nlix['pop_ix_zomato'] = 0.0
nlix['pop_res_zomato'] = 0.0

header = {"User-agent": "curl/7.43.0", "Accept": "application/json", "user_key": api_keys['zomato'][0]}
for i in range(0, len(pcd)):
    district = pcd.District[i]  # nlix is indexed by district name
    locationUrlFromLatLong = "https://developers.zomato.com/api/v2.1/geocode?lat="+str(pcd.lat[i])+"&lon="+str(pcd.long[i])
    response = requests.get(locationUrlFromLatLong, headers=header)
    jresp = response.json()
    nlix.loc[district,'nl_ix_zomato'] = float(jresp['popularity']['nightlife_index'])
    nlix.loc[district,'nl_res_zomato'] = float(jresp['popularity']['nightlife_res'])
    nlix.loc[district,'pop_ix_zomato'] = float(jresp['popularity']['popularity'])
    nlix.loc[district,'pop_res_zomato'] = float(jresp['popularity']['popularity_res'])
    

In [58]:
nlix.tail()

Unnamed: 0_level_0,Premises,Certificates,Total_week,Lateness_week,Total_sat,Lateness_sat,Total_sun,Lateness_sun,n_rest_osm,nl_ix_zomato,nl_res_zomato,pop_ix_zomato,pop_res_zomato
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
WC2H,,,20001.0,5.681166,33278.0,6.226246,11054.0,4.365388,21,5.0,10.0,5.0,100.0
WC2R,,,1770.0,4.948588,1670.0,6.097006,451.0,4.67184,5,5.0,10.0,5.0,100.0
WD23,,,,,,,,,0,0.76,10.0,1.09,100.0
WD3,,,1024.0,6.079515,934.0,7.170662,353.0,5.463242,0,0.67,10.0,1.3,100.0
WD6,,,,,,,,,0,1.88,10.0,2.78,100.0


In [59]:
nlix.to_pickle('/home/alessandro/Documents/placemake/nightlife/nightlife_index.pkl')

In [60]:
len(nlix)

263

In [61]:
nlix_max = nlix/nlix.max(axis=0)

In [62]:
# Single overall score; replaced by the per-theme sub-indices built in the following cells
nightlifeix = nlix_max[['Premises', 'Certificates', 'Lateness_week', 'Total_week', 'Total_sat', 'Lateness_sat', 'nl_ix_zomato']].fillna(0).mean(axis=1)

In [63]:
nightlifeix = pd.DataFrame(index=nlix_max.index, columns=['Popularity', 'Partying', 'Food', 'LateNight'])

In [64]:
nightlifeix['Partying'] = nlix_max[['Premises', 'Certificates', 'Lateness_week', 'Lateness_sat']].fillna(0).mean(axis=1)

In [65]:
nightlifeix['Food'] = nlix_max[['Total_week', 'Total_sat', 'Total_sun', 'nl_ix_zomato']].fillna(0).mean(axis=1)

In [66]:
nightlifeix['Popularity'] = nlix_max[['Total_week', 'Total_sat', 'Total_sun', 'pop_ix_zomato']].fillna(0).mean(axis=1)

In [67]:
nightlifeix['LateNight'] = nlix_max[['Lateness_week', 'Lateness_sat']].fillna(0).mean(axis=1)
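
The sub-indices above divide each column by its maximum and then average, with missing values filled as zero. Districts without Tube data (WD23, for example) are therefore scored as if they had zero traffic. The toy example below contrasts that with skipping missing values; the frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    {"Lateness_week": [4.0, 8.0, np.nan], "Lateness_sat": [5.0, 10.0, 5.0]},
    index=["X1", "X2", "X3"],
)
demo_max = demo / demo.max(axis=0)  # scale each column to a maximum of 1

zero_filled = demo_max.fillna(0).mean(axis=1)  # as in the cells above
nan_skipped = demo_max.mean(axis=1)            # pandas skips NaN by default

print(pd.DataFrame({"zero_filled": zero_filled, "nan_skipped": nan_skipped}))
```

Which behaviour is preferable depends on whether a missing value should count as "no activity" or "unknown"; the notebook takes the former view.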

In [69]:
nightlifeix['Latitude'] = pcd['lat'].values
nightlifeix['Longitude'] = pcd['long'].values
nightlifeix['sme_index'] = pcd['index'].values

In [70]:
nightlifeix.to_csv('/home/alessandro/Documents/placemake/nightlife/nightlife_index.csv')

In [71]:
nightlifeix.corr()

Unnamed: 0,Popularity,Partying,Food,LateNight,Latitude,Longitude,sme_index
Popularity,1.0,0.332663,0.969851,0.411197,0.065,0.0136,-0.193029
Partying,0.332663,1.0,0.289326,0.920662,0.347187,-0.056075,-0.014259
Food,0.969851,0.289326,1.0,0.37605,0.024182,-0.012988,-0.212399
LateNight,0.411197,0.920662,0.37605,1.0,0.350244,-0.102286,-0.043261
Latitude,0.065,0.347187,0.024182,0.350244,1.0,0.14657,0.160182
Longitude,0.0136,-0.056075,-0.012988,-0.102286,0.14657,1.0,-0.070447
sme_index,-0.193029,-0.014259,-0.212399,-0.043261,0.160182,-0.070447,1.0


Finally, all the links were added so that the results can be visualised on Google Maps:

https://drive.google.com/open?id=1TjAnbGpEY1kBMQV-lYeJOl6bnqk&usp=sharing