# Geographic data formatting for Priority Places Explorer

This notebook prepares the geographic data used by the [Priority Places Explorer](https://priorityplaces.cdrc.ac.uk/) tool. The associated visualisation plots one centroid for each geographic area in the [Priority Places for Food Index](https://data.cdrc.ac.uk/dataset/priority-places-food-index): population-weighted centroids for Scotland, England and Wales, and centroids computed from the area boundaries for Northern Ireland.

The source centroid and boundary data can be downloaded from the following links:

1. [Scotland Data Zone 2011 population weighted centroids](https://www.data.gov.uk/dataset/8aabd120-6e15-41bf-be7c-2536cbc4b2e5/data-zone-centroids-2011)
2. [England and Wales LSOA 2011 population weighted centroids](https://geoportal.statistics.gov.uk/datasets/ons::lsoa-dec-2011-population-weighted-centroids-in-england-and-wales/)
3. [Northern Ireland Super Output Area boundaries](https://www.nisra.gov.uk/publications/super-output-area-boundaries-gis-format)

The code below also requires the Priority Places for Food Index data to be stored in the directory given by the `data_directory` variable.

This notebook also formats the [GEOLYTIX UK Retail Points](https://geolytix.com/blog/supermarket-retail-points/) data for the retailer toggle button in the Priority Places Explorer tool.

Before running the code below, ensure that the file names in your `data_directory` match the file names used in the code.
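
One way to check the file names up front is sketched below. The list of paths is copied from the loading code in this notebook; the helper name `find_missing_files` is hypothetical and is not part of the original workflow.

```python
from pathlib import Path

# Relative paths as they appear in the loading code of this notebook.
EXPECTED_FILES = [
    'SG_DataZoneCent_2011/SG_DataZone_Cent_2011.shp',
    'Lower_layer_Super_Output_Areas_(December_2011)_Population_Weighted_Centroids.csv',
    'SOA2011_Esri_Shapefile_0.zip',
    'priority_places_for_food_oct22.csv',
    'GEOLYTIX - UK RetailPoints/uk_glx_open_retail_points_v24_202206.csv',
]


def find_missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under `directory`.

    Hypothetical helper for illustration only.
    """
    base = Path(directory)
    return [name for name in expected if not (base / name).exists()]


# '/data/' mirrors the default `data_directory` used below.
missing = find_missing_files('/data/')
if missing:
    print('Missing input files:', missing)
```

An empty list means every expected input is present under the directory.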


In [None]:
import pandas as pd
import geopandas as gpd

# Configure the data directory where the source data has been downloaded.
data_directory = '/data/'

In [None]:
# Load Scotland data
gdf_scotland = gpd.read_file(data_directory + 'SG_DataZoneCent_2011/SG_DataZone_Cent_2011.shp')
gdf_scotland = gdf_scotland.to_crs('EPSG:4326')
gdf_scotland['longitude'] = gdf_scotland.geometry.x
gdf_scotland['latitude'] = gdf_scotland.geometry.y

# Load England and Wales data
df_ew = pd.read_csv(data_directory + 'Lower_layer_Super_Output_Areas_(December_2011)_Population_Weighted_Centroids.csv')
gdf_ew = gpd.GeoDataFrame(df_ew, geometry=gpd.points_from_xy(df_ew['X'], df_ew['Y']), crs='EPSG:27700')
gdf_ew = gdf_ew.to_crs(4326)
gdf_ew['longitude'] = gdf_ew.geometry.x
gdf_ew['latitude'] = gdf_ew.geometry.y

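# Load Northern Ireland data (Super Output Area boundary polygons).
# The source provides boundaries rather than centroids, so take each polygon's
# geometric centroid before reprojecting to WGS84.
gdf_ni = gpd.read_file(data_directory + 'SOA2011_Esri_Shapefile_0.zip')
gdf_ni.geometry = gdf_ni.geometry.centroid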
gdf_ni = gdf_ni.to_crs(4326)
gdf_ni['lon'] = gdf_ni.geometry.x
gdf_ni['lat'] = gdf_ni.geometry.y

# Merge dataframes
pp_data = pd.read_csv(data_directory + 'priority_places_for_food_oct22.csv', index_col=0)
pp_data = pp_data.reset_index().rename({'index':'geo_code'}, axis=1)

pp_data = pp_data.merge(gdf_ew[['longitude', 'latitude', 'objectid', 'lsoa11cd', 'lsoa11nm']], left_on='geo_code', right_on='lsoa11cd', how='left')
pp_data = pp_data.merge(gdf_scotland, left_on='geo_code', right_on='DataZone', how='left')
pp_data = pp_data.merge(gdf_ni, left_on='geo_code', right_on='SOA_CODE', how='left')

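# Coalesce area labels and coordinates from the three sources into single columns
# (longitude_x/latitude_x come from England and Wales, longitude_y/latitude_y from Scotland)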
pp_data['geo_label'] = pp_data['lsoa11nm'].fillna(pp_data['Name']).fillna(pp_data['SOA_LABEL'])
pp_data['longitude'] = pp_data['longitude_x'].fillna(pp_data['longitude_y'])
pp_data['latitude'] = pp_data['latitude_x'].fillna(pp_data['latitude_y'])
pp_data['longitude'] = pp_data['longitude'].fillna(pp_data['lon'])
pp_data['latitude'] = pp_data['latitude'].fillna(pp_data['lat'])

# Filter to columns of interest
decile_cols = pp_data.columns[pp_data.columns.str.startswith('pp_dec')].tolist()
pp_data = pp_data[['geo_code', 'geo_label', 'longitude', 'latitude'] + decile_cols].copy()

# Convert deciles to nullable integer datatypes (rather than float).
# Assigning whole columns, rather than through .loc[:, cols], lets the new dtype take effect.
pp_data[decile_cols] = pp_data[decile_cols].astype('Int64')

# Save resulting dataframe
pp_data.to_csv(data_directory + 'priority_places_Oct2022_decile_domains_WGS.csv')
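
The left merges and `fillna` chain above can be illustrated on toy data. This is a minimal sketch, not the notebook's actual inputs: the codes, coordinates and decile values are made up, and only two centroid tables are used to keep it short. It also shows one way to list any `geo_code` that matched no centroid table.

```python
import pandas as pd

# Toy stand-ins for the index table and two centroid tables (all values are made up).
index_df = pd.DataFrame({'geo_code': ['E01', 'S01', 'X99'],
                         'pp_dec_combined': [1.0, 7.0, 3.0]})
ew = pd.DataFrame({'lsoa11cd': ['E01'], 'longitude': [-1.5], 'latitude': [53.8]})
scot = pd.DataFrame({'DataZone': ['S01'], 'longitude': [-3.2], 'latitude': [55.9]})

merged = index_df.merge(ew, left_on='geo_code', right_on='lsoa11cd', how='left')
merged = merged.merge(scot, left_on='geo_code', right_on='DataZone', how='left')

# Overlapping column names receive the default _x/_y suffixes;
# take the first non-missing value across the two sources.
merged['longitude'] = merged['longitude_x'].fillna(merged['longitude_y'])
merged['latitude'] = merged['latitude_x'].fillna(merged['latitude_y'])

# Codes that matched no centroid table still have missing coordinates.
unmatched = merged.loc[merged['longitude'].isna(), 'geo_code'].tolist()
print(unmatched)
```

Running a check like this on the real merged table before saving would show whether any area is left without coordinates.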

In [None]:
# Load Geolytix retail points data
retail_df = pd.read_csv(data_directory + 'GEOLYTIX - UK RetailPoints/uk_glx_open_retail_points_v24_202206.csv')
retail_df.loc[retail_df['size_band']=='< 3,013 ft2 (280m2)','size_code'] = 'Small convenience'
retail_df.loc[retail_df['size_band']=='3,013 < 15,069 ft2 (280 < 1,400 m2)', 'size_code'] = 'Mid-size'
retail_df.loc[retail_df['size_band']=='15,069 < 30,138 ft2 (1,400 < 2,800 m2)', 'size_code'] = 'Large'
retail_df.loc[retail_df['size_band']=='30,138 ft2 > (2,800 m2)', 'size_code'] = 'Very large'

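# Drop rows without a county, stores whose name contains 'Scilly' and one named store, then save.
keep = (
    retail_df['county'].notna()
    & ~retail_df['store_name'].str.contains('Scilly', na=False)
    & (retail_df['store_name'] != 'Spar Old Town Store')
)
output_cols = ['id', 'retailer', 'long_wgs', 'lat_wgs', 'size_band', 'size_code']
retail_df.loc[keep, output_cols].to_csv(data_directory + 'retail_locations_glxv24_202206.csv', index=False)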
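
The four `size_band` assignments above amount to a lookup table. As a sketch of an alternative, the same mapping can be written as a dictionary and applied with `Series.map`, which also makes it easy to spot bands with no label. The band strings are copied from the code above; the toy dataframe and the `'unknown band'` value are invented for illustration.

```python
import pandas as pd

# Band strings copied from the assignments above.
SIZE_CODES = {
    '< 3,013 ft2 (280m2)': 'Small convenience',
    '3,013 < 15,069 ft2 (280 < 1,400 m2)': 'Mid-size',
    '15,069 < 30,138 ft2 (1,400 < 2,800 m2)': 'Large',
    '30,138 ft2 > (2,800 m2)': 'Very large',
}

# Toy data; 'unknown band' is an invented value used to show the unmapped case.
toy = pd.DataFrame({'size_band': ['< 3,013 ft2 (280m2)',
                                  '30,138 ft2 > (2,800 m2)',
                                  'unknown band']})

# Bands missing from the dictionary map to NaN.
toy['size_code'] = toy['size_band'].map(SIZE_CODES)

# List any bands that received no label.
unmapped = toy.loc[toy['size_code'].isna(), 'size_band'].unique().tolist()
print(unmapped)
```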