# Priority Places Index

## Index domains and indices:

The index focuses on identifying the places most in need. Following discussions with Which?, this concept was adopted as the organising focus, in part to avoid too much multicollinearity between variables.

Format:
- Domain:
    - Indices

Priority Places Index:
- Proximity to and density of grocery retail facilities
    - Distance to the nearest largest store (km) (proximity)
    - Count of stores within 1km (density)
- Transport to and accessibility of grocery retail facilities
    - Average travel distance (km) (SIM)
    - Accessibility via public transport (2017-2020)
- E-commerce access? *

- Proximity to and density of Non-supermarket food provision
    - Ethnic food stores and/or convenience stores
        - Distance to nearest store (km)
        - Count of stores within 1km  
    - Market access:
        - Distance to nearest market (km)
        - Count of markets within 1km  
    - <span style="color:red"> Food banks are excluded from this domain to avoid including them twice, and because food banks are not viewed as a tenable solution to food insecurity </span>
- Neighbourhood socio-economic and demographic
    - Proportion of population experiencing income deprivation (2011)
    - Proportion of population with no car access (2011)
    - Proportion of population who are pensioners (2011)
- Food for Families
    - Free school meal uptake (eligibility if uptake data are not available)
    - Healthy start voucher usage
    - Food banks:
        - Distance to nearest foodbank (km)
        - Count of foodbanks within 1km ***       
- Fuel poverty pressures
    - Proportion of households in fuel poverty (2017-20)
    - Prepayment meter prevalence 
- Low cost grocery retail food provision **
    - Number of retailers within 1 km providing budget line 
    - Distance to nearest retailer providing budget line
    - <span style="color:red"> Discussion of some sort of moderation of these values based on the quality of the offering (e.g. cost saving and breadth of useful products; how we'd quantify this might be based on general consensus or an in-house ranking of supermarket provision) </span>


\* There was discussion about whether to include this in the index or to use it as a toggle to identify whether in-store or online budget lines would be most useful (online shopping is not regularly used by the most deprived, but could be used for budgeting by struggling families).

\*\* Whilst Which? seem sure we will get this data, we are sceptical; this is the only domain not relying on open data.

\*** It might be worth doing some sensitivity testing: we want to capture areas with high density/demand, but not over too large a geography (considering people's ability to travel to the food bank). However, 1 km might be too small a spatial scale.
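
The radius sensitivity test suggested in the last footnote could be sketched roughly as below. This is only an illustration with made-up coordinates, not the project's method: `haversine_km` and `count_within` are hypothetical helpers, and the 1, 2 and 5 km radii are arbitrary choices.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def count_within(origins, sites, radius_km):
    """For each origin (lat, lon), count the sites within radius_km."""
    origins = np.asarray(origins, dtype=float)
    sites = np.asarray(sites, dtype=float)
    # Broadcast to an (n_origins, n_sites) distance matrix
    d = haversine_km(origins[:, None, 0], origins[:, None, 1],
                     sites[None, :, 0], sites[None, :, 1])
    return (d <= radius_km).sum(axis=1)


# Made-up coordinates, for illustration only
origins = [(53.80, -1.55), (53.90, -1.40)]
sites = [(53.80, -1.55), (53.81, -1.55), (53.83, -1.55)]

# One count per origin for each candidate radius
counts_by_radius = {r: count_within(origins, sites, r) for r in (1, 2, 5)}
```

Comparing how the counts (and the resulting ranks) change between radii would show how sensitive the indicator is to the 1 km choice.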


# Set-up
## Import required packages

In [1]:
import pandas as pd
import numpy as np
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so sp.stats.norm.ppf is available

## Set data directory

In [2]:
data_directory = '/workspaces/priority-places-calculator/data/'

## Read in data

In [None]:
df = pd.read_csv(data_directory + 'infuse_lsoa_lyr_2011/infuse_lsoa_lyr_2011.csv')

# LSOA level
#lsoa_centroids = pd.read_csv(data_directory + 'Lower_layer_Super_Output_Areas_(December_2011)_Population_Weighted_Centroids_WGS.csv')

# efdi merging
efdi = pd.read_csv(data_directory + 'EFDI_IndividualVariables.csv')
df = df.merge(efdi, left_on='geo_code', right_on='id', how='left', indicator=True)
df.rename({'_merge': 'efdi_merge', 'Propn':'pensioners'}, inplace=True, axis=1)

# fuel poverty merging
fuel_poverty = pd.read_csv(data_directory + 'fuel_poverty.csv')
df = df.merge(fuel_poverty, left_on='geo_code', right_on='lsoa11cd', how='left', indicator=True)
df.rename({'_merge':'fuel_merge', 'Percent of households in fuel poverty':'fuel_poverty_pct'}, inplace=True, axis=1)

# other retailers distance
other_retailers_distance = pd.read_csv(data_directory + 'lsoa_non-supermarket_dist.csv')
df = df.merge(other_retailers_distance, left_on='geo_code', right_on='lsoa11cd', how='left', indicator=True)
df.rename({'_merge': 'other_retailers_distance_merge', '0': 'other_retailers_distance'}, inplace=True, axis=1)

# other retailers 1km count
other_retailers_1km_count = pd.read_csv(data_directory + 'lsoa_non-supermarket_1km_count.csv')
df = df.merge(other_retailers_1km_count, left_on='geo_code', right_on='lsoa11cd', how='left', indicator=True)
df.rename({'_merge': 'other_retailers_1km_count_merge', 'FHRSID':'other_retailers_1km_count'}, inplace=True, axis=1)

# prepayment_meters merging
prepayment_meters = pd.read_csv(data_directory + 'Prepayment_meter_2017_with_household_count.csv')
df = df.merge(prepayment_meters, left_on='geo_code', right_on='Lower Layer Super Output Area (LSOA) Code', how='left', indicator=True)
df.rename({'_merge': 'prepayment_merge'}, inplace=True, axis=1)
df['prepayment_prevalence'] = df['Total meters'] / df['Occupied_Households']

# postcode level lookup
postcode_centroids = pd.read_csv(data_directory + 'ONS_Postcode_Directory_(Latest)_Centroids.csv')
df = df.merge(postcode_centroids.groupby('lsoa11')['oslaua'].first(), left_on='geo_code', right_on='lsoa11', how='left', indicator=True)
df.rename({'_merge': 'postcode_lad_merge'}, inplace=True, axis=1)

# Impute prepayment prevalence
lad_prepayment_median = df[['geo_code', 'oslaua', 'prepayment_prevalence']].groupby('oslaua')['prepayment_prevalence'].median()
df = df.merge(lad_prepayment_median, left_on='oslaua', right_index=True, how='inner', suffixes=('', '_lad'))
df['prepayment_prevalence'] = df['prepayment_prevalence'].fillna(df['prepayment_prevalence_lad'])

# Food bank distance merging
foodbank_distance = pd.read_csv(data_directory + 'postcode_to_nearest_foodbank_distance.csv')
foodbank_distance = foodbank_distance.merge(postcode_centroids[['pcd', 'lsoa11']], left_on='pcd', right_on='pcd', how='left', indicator=True)
foodbank_distance = foodbank_distance.groupby('lsoa11')['0'].mean()
df = df.merge(foodbank_distance, left_on='geo_code', right_index=True, how='left', indicator=True)
df.rename({'_merge': 'foodbank_distance_merge', '0': 'foodbank_distance'}, inplace=True, axis=1)

# market distance merging (postcode-level distances averaged to LSOA; codes starting with '9' or 'S' are excluded)
market_distance = pd.read_csv(data_directory + 'postcode_to_nearest_market_distances.csv')
market_distance = market_distance.merge(postcode_centroids[['pcd', 'lsoa11']], left_on='pcd', right_on='pcd', how='left', indicator=True)
market_distance = market_distance.groupby('lsoa11')['0'].mean()
market_distance = market_distance.reset_index()
market_distance = market_distance[~market_distance['lsoa11'].str[0].isin(['9', 'S'])]
df = df.merge(market_distance, left_on='geo_code', right_on='lsoa11', how='left', indicator=True)
df.rename({'_merge': 'market_distance_merge', '0': 'market_distance'}, inplace=True, axis=1)

market_1km_count = pd.read_csv(data_directory + 'postcode_market_1km_count.csv')
market_1km_count = market_1km_count.merge(postcode_centroids[['pcd', 'lsoa11']], left_on='pcd', right_on='pcd', how='left', indicator=True)
market_1km_count = market_1km_count.groupby('lsoa11')['overlap_count'].mean()
market_1km_count = market_1km_count.reset_index()
market_1km_count = market_1km_count[~market_1km_count['lsoa11'].str[0].isin(['9', 'S'])]
df = df.merge(market_1km_count, left_on='geo_code', right_on='lsoa11', how='left', indicator=True)
df.rename({'_merge': 'market_1km_count_merge', 'overlap_count': 'market_1km_count'}, inplace=True, axis=1)


# healthy start voucher uptake
hsv = pd.read_csv(data_directory + 'healthy_start_voucher_uptake.csv')
hsv = hsv[~hsv['Uptake (%)'].isna()][['lsoa11cd', 'Uptake (%)']]
df = df.merge(hsv, left_on='geo_code', right_on='lsoa11cd', how='left', indicator=True)
df.rename({'_merge': 'hsv_merge', 'Uptake (%)': 'healthy_start_voucher_uptake'}, inplace=True, axis=1)

# free school meals
fsm = pd.read_csv(data_directory + 'Free_school_meals.csv')
df = df.merge(fsm, left_on='geo_code', right_on='lsoa11cd', how='left', indicator=True)
df.rename({'_merge': 'fsm_merge'}, inplace=True, axis=1)

# NI supermarket distance
NI_supermarket_distance = pd.read_csv(data_directory + 'NI_postcode_to_nearest_large_supermarket_distance.csv')
NI_supermarket_distance.rename({'0':'NI_nearest_lstore'}, axis=1, inplace=True)
NI_supermarket_distance = NI_supermarket_distance.merge(postcode_centroids[['pcd', 'lsoa11']], left_on='pcd', right_on='pcd', how='left', indicator=True)
NI_supermarket_distance = NI_supermarket_distance.groupby('lsoa11')['NI_nearest_lstore'].mean()
df = df.merge(NI_supermarket_distance, how='left', left_on='geo_code', right_on='lsoa11')
df['nearest lstore'] = df['nearest lstore'].fillna(df['NI_nearest_lstore'])

# NI supermarket count
NI_supermarket_count = pd.read_csv(data_directory + 'NI_postcode_to_nearest_large_supermarket_1km_count.csv')
NI_supermarket_count.rename({'overlap_count':'NI_stores1km'}, axis=1, inplace=True)
NI_supermarket_count = NI_supermarket_count.merge(postcode_centroids[['pcd', 'lsoa11']], left_on='pcd', right_on='pcd', how='left', indicator=True)
NI_supermarket_count = NI_supermarket_count.groupby('lsoa11')['NI_stores1km'].mean()
df = df.merge(NI_supermarket_count, how='left', left_on='geo_code', right_on='lsoa11')
df['stores1km'] = df['stores1km'].fillna(df['NI_stores1km'])

# NI income dep 
NI_income = pd.read_csv(data_directory + 'NIMDM17_SOA_income.csv')
NI_income = NI_income.drop(['Unnamed: 6', 'Unnamed: 7'], axis=1)
NI_income = NI_income.dropna()
NI_income['NI_income_dep'] = pd.to_numeric(NI_income['Proportion of the population living in households whose equivalised income is below 60 per cent of the NI median \n(%)'].str.replace('%', '')) / 100.0
df = df.merge(NI_income[['SOA2001', 'NI_income_dep']], how='left', left_on='geo_code', right_on='SOA2001')
df['income deprivation'] = df['income deprivation'].fillna(df['NI_income_dep'])

# NI car access
# (the file covers the whole UK, 2011 Census; the no-car proportion is computed for every area)
car_access = pd.read_csv(data_directory + 'UK_car_access_2011census.csv', skiprows=7)
car_access = car_access[~car_access.Area.isna()]
car_access['geography_type'] = car_access['Area'].str.split(':').apply(lambda x: x[0])
car_access['geo_code'] = car_access['Area'].str.split(':').apply(lambda x: x[1]).str.replace(' ', '')
car_access['proportion no car'] = car_access['No cars or vans in household']/ pd.to_numeric(car_access['All categories: Car or van availability'])
df = df.merge(car_access[['geo_code', 'proportion no car']], how='left', left_on='geo_code', right_on='geo_code')

# NI FSM
ni_fsm = pd.read_csv(data_directory + 'NI_FSM_cleaned.csv')
df = df.merge(ni_fsm[['LSOA11CD', 'prop fsme']], how='left', left_on='geo_code', right_on='LSOA11CD')
df['fsm_eligible_percent'] = df['fsm_eligible_percent'].fillna(df['prop fsme'])
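
The cell above keeps a `*_merge` indicator column for most joins but never inspects them. A minimal sketch of how match rates might be summarised is shown below, using toy data in place of the real files; `merge_coverage` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def merge_coverage(frame):
    """Share of rows matched ('both') for every column ending in '_merge'."""
    merge_cols = [c for c in frame.columns if c.endswith('_merge')]
    return pd.Series({c: (frame[c] == 'both').mean() for c in merge_cols})


# Toy data standing in for the LSOA table and one source file
lsoas = pd.DataFrame({'geo_code': ['E01', 'E02', 'W01', 'S01']})
fuel = pd.DataFrame({'lsoa11cd': ['E01', 'E02', 'W01'],
                     'fuel_poverty_pct': [10.0, 12.5, 8.0]})

# Same merge pattern as the cell above: left join with an indicator column
toy = lsoas.merge(fuel, left_on='geo_code', right_on='lsoa11cd',
                  how='left', indicator=True)
toy = toy.rename(columns={'_merge': 'fuel_merge'})

coverage = merge_coverage(toy)
print(coverage)
```

Running something like `merge_coverage(df)` on the real table would flag sources with low coverage before missing values are filled with zeros later on.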

In [None]:
indicator_cols = [
    # Proximity to and density of retail facilities
    'stores1km', 
    'nearest lstore', 
    
    # Transport to and accessibility of grocery retail facilities
    'pt-retail', 
    'Average trip distance', 
    
    # E-commerce access
    'zshoponline', 
    'totaldeliv', 
    
    # Neighbourhood socio-economic and demographic
    'proportion no car', 
    'income deprivation', 
    
    # Proximity to and density of non-supermarket food provision
    'other_retailers_distance',
    'other_retailers_1km_count',
    'market_1km_count', 
    'market_distance', 
    
    # Food for families
    'fsm_eligible_percent',
    'healthy_start_voucher_uptake', 
    'foodbank_distance', 
    
    #Fuel poverty pressures
    'fuel_poverty_pct',
    'prepayment_prevalence']


priority_places = df[['geo_code'] + indicator_cols].copy()

priority_places = priority_places.drop_duplicates()

# Drop Isles of Scilly
priority_places.drop(priority_places[priority_places['geo_code']=='E01019077'].index, inplace=True)

# Now we have all the raw data, we can transform it into our index. 
# First set the geo_code as the index
priority_places.set_index('geo_code', inplace=True)

# The first task is to orient each indicator in the correct direction
# i.e. so that high values correspond to higher priority places
priority_places = pd.concat([1 * priority_places[[
                    'nearest lstore', 
                    'pt-retail', 
                    'Average trip distance', 
                    'proportion no car', 
                    'income deprivation', 
                    'market_distance', 
                    'fuel_poverty_pct', 
                    'prepayment_prevalence', 
                    'other_retailers_distance', 
                    'fsm_eligible_percent', 
                    'healthy_start_voucher_uptake']], 
                  -1 * priority_places[[
                      'foodbank_distance', 
                      'stores1km', 
                      'zshoponline', 
                      'totaldeliv', 
                      'market_1km_count', 
                      'other_retailers_1km_count']]], axis=1)

# Find our country-level denominators
priority_places['country'] = priority_places.index.str[0]
country_counts = priority_places.reset_index().groupby('country')['geo_code'].count()
priority_places = priority_places.merge(country_counts, left_on='country', right_index=True, how='inner')
priority_places.rename({'geo_code': 'country_denominator'}, inplace=True, axis=1)


# Perform ranking of each indicator
priority_places.fillna(0, inplace=True)
priority_places_ranked = priority_places.groupby('country').rank(method='min', ascending=False).astype(int)
for c in priority_places_ranked[indicator_cols].columns: 
    priority_places_ranked[c] = (priority_places_ranked[c] - 0.5) / priority_places['country_denominator']
    priority_places_ranked[c] = sp.stats.norm.ppf(priority_places_ranked[c],loc=0,scale=1)
    
priority_places_ranked['country'] = priority_places_ranked.index.str[0]

#Combine transformed indicators into domains
priority_places_ranked['domain_supermarket_proximity'] = 0.5 * priority_places_ranked[['nearest lstore', 'stores1km']].sum(axis=1)
priority_places_ranked['domain_supermarket_accessibility'] = 0.5 * priority_places_ranked[['pt-retail', 'Average trip distance']].sum(axis=1)
priority_places_ranked['domain_ecommerce_access'] = 0.5 * priority_places_ranked[['zshoponline', 'totaldeliv']].sum(axis=1)
priority_places_ranked['domain_socio_demographic'] = (1./2.) * priority_places_ranked[[ 'proportion no car', 'income deprivation']].sum(axis=1)
priority_places_ranked['domain_nonsupermarket_proximity'] = (1./4.) * priority_places_ranked[['other_retailers_distance','other_retailers_1km_count','market_1km_count', 'market_distance']].sum(axis=1)
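# Three indicators in this domain, so weight each by 1/3 (domains are ranked afterwards, so the ordering is unaffected)
priority_places_ranked['domain_food_for_families'] = (1./3.) * priority_places_ranked[['foodbank_distance', 'healthy_start_voucher_uptake', 'fsm_eligible_percent']].sum(axis=1)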
priority_places_ranked['domain_fuel_poverty'] = 0.5 * priority_places_ranked[['fuel_poverty_pct','prepayment_prevalence']].sum(axis=1)

domain_columns = ['domain_supermarket_proximity', 
                  'domain_supermarket_accessibility', 
                  'domain_ecommerce_access', 
                  'domain_socio_demographic', 
                  'domain_nonsupermarket_proximity', 
                  'domain_food_for_families', 
                  'domain_fuel_poverty']

# Rank the domains
priority_places_domains = priority_places_ranked[domain_columns + ['country']].groupby('country').rank(method='min').astype(int)
priority_places_domains['country'] = priority_places_domains.index.str[0]

priority_places_domains = priority_places_domains.merge(country_counts, left_on='country', right_index=True, how='inner')
priority_places_domains.rename({'geo_code': 'country_denominator'}, inplace=True, axis=1)

priority_places_domains_normalised = pd.DataFrame(columns=priority_places_domains[domain_columns].columns)
for c in priority_places_domains[domain_columns].columns:
    priority_places_domains_normalised[c] = -23 * np.log(1 - (priority_places_domains[c] / priority_places_domains['country_denominator']) * (1 - np.exp(- 100 / 23)))

priority_places_domains['combined'] = (1./8.) * priority_places_domains_normalised['domain_supermarket_proximity'] + \
(1./8.) * priority_places_domains_normalised['domain_supermarket_accessibility'] + \
(1./8.) * priority_places_domains_normalised['domain_ecommerce_access'] + \
(1./8.) * priority_places_domains_normalised['domain_nonsupermarket_proximity'] + \
(1./6.) * priority_places_domains_normalised['domain_socio_demographic'] + \
(1./6.) * priority_places_domains_normalised['domain_food_for_families'] + \
(1./6.) * priority_places_domains_normalised['domain_fuel_poverty']

priority_places_domains['combined'] = priority_places_domains[['country', 'combined']].groupby('country').rank(method='min').astype(int)

priority_places_deciles = priority_places_domains.copy()
for country in ['E', 'S', 'W', '9']:
    for col in domain_columns + ['combined']:
        if country == '9' and col in ['domain_ecommerce_access', 'domain_supermarket_accessibility', 'domain_fuel_poverty']:
            priority_places_deciles.loc[priority_places_deciles['country']==country, col] = 0
        else:
            priority_places_deciles.loc[priority_places_deciles['country']==country, col] = pd.to_numeric(pd.qcut(priority_places_domains.loc[priority_places_deciles['country']==country, col], 10, duplicates='drop', labels=range(1,11)))
            
priority_places_full = priority_places_domains.merge(priority_places_deciles, left_index=True, right_index=True, suffixes=('', '_decile'))
priority_places_full.drop(['country_decile', 'country_denominator_decile'], axis=1, inplace=True)           

priority_places_full.loc[priority_places.index.str.startswith('9'), 
                        ['domain_supermarket_accessibility', 
                         'domain_ecommerce_access', 
                         'domain_fuel_poverty', 
                         'domain_supermarket_accessibility_decile', 
                         'domain_ecommerce_access_decile', 
                         'domain_fuel_poverty_decile']] = pd.NA

priority_places_full.loc[priority_places.index.str.startswith('9'), ['country']] = 'NI'

priority_places_full.to_csv(data_directory + 'priority_places_new.csv')
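
The two transformations used in the cell above (ranks to normal scores for indicators, then the exponential transformation for domain ranks) can be followed on a toy example. This sketch uses a single made-up indicator for one country; the variable names are illustrative and the numbers are not real data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy indicator for one country: higher value = higher priority
values = pd.Series([0.9, 0.1, 0.5, 0.3], index=['a', 'b', 'c', 'd'])
n = len(values)

# Step 1: rank descending (rank 1 = highest priority), rescale the ranks
# into (0, 1) and map them onto standard normal scores
ranks = values.rank(method='min', ascending=False)
normal_scores = pd.Series(stats.norm.ppf((ranks - 0.5) / n), index=values.index)

# Step 2: treat the normal scores as a domain score, rank ascending and
# apply the same exponential transformation as the cell above
domain_rank = normal_scores.rank(method='min')
exp_score = -23 * np.log(1 - (domain_rank / n) * (1 - np.exp(-100 / 23)))

print(pd.DataFrame({'rank': ranks, 'normal': normal_scores, 'exp': exp_score}))
```

With this orientation the highest-priority place (`a`) gets the most negative normal score, domain rank 1 and the smallest transformed score, while the lowest-priority place (`b`) is mapped to 100.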

In [None]:
# Reduce weighting on accessibility domains to test London specific version of the index
priority_places_domains['combined'] = (1./12.) * priority_places_domains_normalised['domain_supermarket_proximity'] + \
(1./12.) * priority_places_domains_normalised['domain_supermarket_accessibility'] + \
(1./12.) * priority_places_domains_normalised['domain_ecommerce_access'] + \
(1./12.) * priority_places_domains_normalised['domain_nonsupermarket_proximity'] + \
(2./9.) * priority_places_domains_normalised['domain_socio_demographic'] + \
(2./9.) * priority_places_domains_normalised['domain_food_for_families'] + \
(2./9.) * priority_places_domains_normalised['domain_fuel_poverty']

priority_places_domains['combined'] = priority_places_domains[['country', 'combined']].groupby('country').rank(method='min').astype(int)

priority_places_deciles = priority_places_domains.copy()
for country in ['E', 'S', 'W', '9']:
    for col in domain_columns + ['combined']:
        if country == '9' and col in ['domain_ecommerce_access', 'domain_supermarket_accessibility', 'domain_fuel_poverty']:
            priority_places_deciles.loc[priority_places_deciles['country']==country, col] = 0
        else:
            priority_places_deciles.loc[priority_places_deciles['country']==country, col] = pd.to_numeric(pd.qcut(priority_places_domains.loc[priority_places_deciles['country']==country, col], 10, duplicates='drop', labels=range(1,11)))
            
priority_places_full = priority_places_domains.merge(priority_places_deciles, left_index=True, right_index=True, suffixes=('', '_decile'))
priority_places_full.drop(['country_decile', 'country_denominator_decile'], axis=1, inplace=True)           

priority_places_full.loc[priority_places.index.str.startswith('9'), 
                        ['domain_supermarket_accessibility', 
                         'domain_ecommerce_access', 
                         'domain_fuel_poverty', 
                         'domain_supermarket_accessibility_decile', 
                         'domain_ecommerce_access_decile', 
                         'domain_fuel_poverty_decile']] = pd.NA

priority_places_full.loc[priority_places.index.str.startswith('9'), ['country']] = 'NI'

priority_places_full.to_csv(data_directory + 'priority_places_london_weighting.csv')
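
The default and London-specific cells differ only in their domain weights, so the weighted sum could be factored into one function. The sketch below is one possible way to do that, under the assumption that the weights should always sum to 1; `combine_domains` and the two weight dictionaries are hypothetical names, with the weights copied from the cells above.

```python
import pandas as pd

# Domain weights copied from the two cells above
DEFAULT_WEIGHTS = {
    'domain_supermarket_proximity': 1 / 8,
    'domain_supermarket_accessibility': 1 / 8,
    'domain_ecommerce_access': 1 / 8,
    'domain_nonsupermarket_proximity': 1 / 8,
    'domain_socio_demographic': 1 / 6,
    'domain_food_for_families': 1 / 6,
    'domain_fuel_poverty': 1 / 6,
}
LONDON_WEIGHTS = {
    'domain_supermarket_proximity': 1 / 12,
    'domain_supermarket_accessibility': 1 / 12,
    'domain_ecommerce_access': 1 / 12,
    'domain_nonsupermarket_proximity': 1 / 12,
    'domain_socio_demographic': 2 / 9,
    'domain_food_for_families': 2 / 9,
    'domain_fuel_poverty': 2 / 9,
}


def combine_domains(scores, weights):
    """Weighted sum of domain score columns; the weights must sum to 1."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f'weights sum to {total}, expected 1')
    return sum(w * scores[col] for col, w in weights.items())


# Toy normalised domain scores for two places (made-up values)
toy_scores = pd.DataFrame({col: [10.0, 50.0] for col in DEFAULT_WEIGHTS},
                          index=['x', 'y'])
combined_default = combine_domains(toy_scores, DEFAULT_WEIGHTS)
combined_london = combine_domains(toy_scores, LONDON_WEIGHTS)
```

In the notebook, `toy_scores` would be replaced by `priority_places_domains_normalised`, and the result ranked within country as before.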