# Data processing for the SIM model

Initial data handling to prepare the inputs used in [SpatInteModel.ipynb](SpatInteModel.ipynb).

NOTE: the initial development targets the student population, with the idea of extending the methodology to the other categories of interest (residents, workers, international visitors).

# Initialisation

Loading libraries and data

In [44]:

# Loading libraries
import os
import pandas as pd
import geopandas as gpd
from pathlib import Path
import matplotlib.pyplot as plt

# Dealing with the data
## Paths definition 
base_dir = str(Path(os.getcwd()).parent)  # Get main directory (should be parent to this file)
data_dir = os.path.join(base_dir,
                        "data",
                        "national_data") 
results_dir = os.path.join(base_dir,
                           "output")

if os.path.basename(base_dir) != "FutureHighStreets":
    raise Exception("The base directory should point to the main project directory, "
                    f"but it points to {base_dir}")
    
## Variables definition
origin_data_folder = "LSOA"
origin_data_filename = "Lower_Layer_Super_Output_Areas_(December_2011)_Population_Weighted_Centroids"  # change depending on custom edits
origin_shp_path = os.path.join(data_dir,
                               origin_data_folder,
                               origin_data_filename) + ".shp"
destination_data_folder = "Retail_Centres"
destination_data_filename = "Retail_Centres_UK"  # change depending on custom edits
destination_gpkg_path = os.path.join(data_dir,
                                     destination_data_folder,
                                     destination_data_filename) + ".gpkg"

# Data to generate destination attractiveness
retailLUT_filename = "RetailCentres_LUT.csv"
retailLUT_path = os.path.join(data_dir,
                              retailLUT_filename)

In [45]:

## Loading data
origin_shp = gpd.read_file(origin_shp_path)
origin_shp.head()

Unnamed: 0,objectid,lsoa11cd,lsoa11nm,x_coord,y_coord,geometry
0,1,E01012007,Middlesbrough 012A,449119.175,517017.509,POINT (449119.175 517017.509)
1,2,E01012085,Middlesbrough 010D,451722.55,517577.735,POINT (451722.550 517577.735)
2,3,E01012005,Hartlepool 014G,448657.056,533984.633,POINT (448657.056 533984.633)
3,4,E01012084,Middlesbrough 010C,451977.348,517832.468,POINT (451977.348 517832.468)
4,5,E01012002,Hartlepool 006D,449565.447,533268.699,POINT (449565.447 533268.699)


In [46]:
destination_gpkg = gpd.read_file(destination_gpkg_path,
                    layer = destination_data_filename)
destination_gpkg.head()

Unnamed: 0,RC_ID,RC_Name,Classification,Country,RegionNM,geometry
0,RC_EW_101,Hessle Road; Dairycoates; City of Kingston upo...,District Centre,England,Yorkshire and The Humber,"POLYGON ((507139.480 427542.820, 507119.354 42..."
1,RC_EW_1010,Belgrave Road; Belgrave; Leicester (East Midla...,District Centre,England,East Midlands,"POLYGON ((459267.618 305738.275, 459272.557 30..."
2,RC_EW_1014,Leicester; Leicester (East Midlands; England),Major Town Centre,England,East Midlands,"POLYGON ((458740.247 304138.277, 458745.186 30..."
3,RC_EW_1023,Nottingham; Nottingham (East Midlands; England),Regional Centre,England,East Midlands,"POLYGON ((458012.374 339846.724, 457992.010 33..."
4,RC_EW_1032,West Bridgford; Rushcliffe (East Midlands; Eng...,Town Centre,England,East Midlands,"POLYGON ((458803.075 337394.991, 458808.008 33..."


In [47]:
retailLUT = pd.read_csv(retailLUT_path)
retailLUT.head()

Unnamed: 0,Classification,Rank
0,District Centre,3
1,Local Centre,5
2,Major Town Centre,2
3,Market Town,9
4,Out of Town Shopping Centres,8


# Actual data processing

Three main processes:
1. Generate destination attractiveness;
2. Generate destination centroids;
3. Generate origin production/demand (population)

NOTE: this must be repeated for the other categories (not only students).

1. Generate destination attractiveness

We need to generate a column for the attractiveness of the retail centres. This will depend on the type of centre (see the Classification field) and on its size (which we will derive from the polygon area).

A simple way to obtain a combined value is to multiply the two, since we rank the retail centres by importance and we assume that bigger retail centres are more attractive.

NOTE: we do not have a count of actual retail units per retail centre, so we are left with measuring size by area.
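The combination described above could be sketched as follows. This is only an illustration on made-up values, not the notebook's final method: the column names `rank_score` and `attractiveness` are hypothetical, and the sketch assumes that a lower `Rank` number means a more important class (in the LUT shown earlier, Regional Centre has Rank 1). Under that assumption a direct `Rank * area` product would penalise the most important centres, so the rank is inverted before multiplying.

```python
import pandas as pd

# Toy stand-in for the retail centres table after the merge (illustrative values only)
centres = pd.DataFrame({
    "RC_ID": ["A", "B", "C"],
    "Rank": [1, 3, 4],
    "area": [0.7, 0.25, 0.08],  # sq km
})

# ASSUMPTION: Rank 1 is the most important class, so invert the rank
# so that a higher score means a more important centre.
# Here max_rank comes from the toy data; in practice it would come from the LUT.
max_rank = centres["Rank"].max()
centres["rank_score"] = max_rank + 1 - centres["Rank"]

# Hypothetical combined attractiveness: importance score times size
centres["attractiveness"] = centres["rank_score"] * centres["area"]

print(centres.sort_values("attractiveness", ascending=False))
```

If the ranking convention turns out to be the opposite, the inversion step can simply be dropped.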

In [48]:
## Add 'rank' column from LUT (from the Classification)
# NOTE: the merge is possible because both tables have a common field (column) name 'Classification'

destination_gpkg = destination_gpkg.merge(retailLUT,
                                          on = 'Classification',
                                          how = 'left')
## Add 'size' column, calculated from the spatial extent
destination_gpkg["area"] = destination_gpkg['geometry'].area / 10**6  # .area gives the area in square metres; dividing by 10**6 gives sq km

destination_gpkg.head(5)

## TODO: generate combined column for attractiveness

Unnamed: 0,RC_ID,RC_Name,Classification,Country,RegionNM,geometry,Rank,area
0,RC_EW_101,Hessle Road; Dairycoates; City of Kingston upo...,District Centre,England,Yorkshire and The Humber,"POLYGON ((507139.480 427542.820, 507119.354 42...",3,0.245969
1,RC_EW_1010,Belgrave Road; Belgrave; Leicester (East Midla...,District Centre,England,East Midlands,"POLYGON ((459267.618 305738.275, 459272.557 30...",3,0.082949
2,RC_EW_1014,Leicester; Leicester (East Midlands; England),Major Town Centre,England,East Midlands,"POLYGON ((458740.247 304138.277, 458745.186 30...",2,0.515804
3,RC_EW_1023,Nottingham; Nottingham (East Midlands; England),Regional Centre,England,East Midlands,"POLYGON ((458012.374 339846.724, 457992.010 33...",1,0.712908
4,RC_EW_1032,West Bridgford; Rushcliffe (East Midlands; Eng...,Town Centre,England,East Midlands,"POLYGON ((458803.075 337394.991, 458808.008 33...",4,0.080642
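Because the merge above is a left join, any `Classification` value missing from the LUT would silently end up with an empty `Rank`. A minimal check could look like the sketch below. It uses toy tables rather than the real data, and the class name "Unknown Class" is invented for the example; `indicator` and `validate` are standard `pandas.DataFrame.merge` parameters.

```python
import pandas as pd

# Toy stand-ins for the retail centres table and the LUT (illustrative values only)
centres = pd.DataFrame({
    "RC_ID": ["A", "B", "C"],
    "Classification": ["Town Centre", "District Centre", "Unknown Class"],
})
lut = pd.DataFrame({
    "Classification": ["Town Centre", "District Centre"],
    "Rank": [4, 3],
})

merged = centres.merge(
    lut,
    on="Classification",
    how="left",
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
    validate="many_to_one",  # raises if a Classification is duplicated in the LUT
)

# Classifications present in the centres table but absent from the LUT
unmatched = merged.loc[merged["_merge"] == "left_only", "Classification"].unique().tolist()
print("Classifications without a rank:", unmatched)
```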


2. Generate destination centroids

Having added the attractiveness information, we can now generate the retail centre centroids from the polygon file (CDRC).

In [49]:
## Centroids of the Retail centres shp (destination shp)

# copy the original gpkg file, to change geometry after
centroids_destinations = destination_gpkg.copy()
# change the geometry
centroids_destinations.geometry = centroids_destinations['geometry'].centroid

centroids_destinations.head()


Unnamed: 0,RC_ID,RC_Name,Classification,Country,RegionNM,geometry,Rank,area
0,RC_EW_101,Hessle Road; Dairycoates; City of Kingston upo...,District Centre,England,Yorkshire and The Humber,POINT (507664.225 427664.830),3,0.245969
1,RC_EW_1010,Belgrave Road; Belgrave; Leicester (East Midla...,District Centre,England,East Midlands,POINT (459411.296 306107.108),3,0.082949
2,RC_EW_1014,Leicester; Leicester (East Midlands; England),Major Town Centre,England,East Midlands,POINT (458788.687 304569.653),2,0.515804
3,RC_EW_1023,Nottingham; Nottingham (East Midlands; England),Regional Centre,England,East Midlands,POINT (457363.648 339965.649),1,0.712908
4,RC_EW_1032,West Bridgford; Rushcliffe (East Midlands; Eng...,Town Centre,England,East Midlands,POINT (458787.427 337481.247),4,0.080642
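To make the `area` and centroid values above easier to interpret, the plain-Python sketch below computes both for a simple polygon with the shoelace formula. It is an illustration of the geometry only, not how geopandas implements `.area` or `.centroid`; the function name `polygon_area_centroid` and the rectangle are made up, and coordinates are assumed to be in metres.

```python
def polygon_area_centroid(coords):
    """Return (area, (cx, cy)) for a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed_area = twice_area / 2.0
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), centroid


# Hypothetical 1000 m x 500 m rectangle
rectangle = [(0, 0), (1000, 0), (1000, 500), (0, 500)]
area_m2, centroid = polygon_area_centroid(rectangle)
area_km2 = area_m2 / 10**6  # same conversion as the 'area' column

print(area_km2, centroid)
```

For strongly concave polygons the centroid can fall outside the shape; if that matters for the model, `GeoSeries.representative_point()` is one alternative worth considering.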


3. Generate origin production/demand (population)

We need to join the number of students (for now; other categories will follow) to the LSOA centroids shapefile.

In [50]:
## Simple join of the selected column
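One way this join could look is sketched below, using plain pandas tables in place of the real files. The student table, its `n_students` column and the counts are hypothetical, since the source of the student data is not defined in this notebook; only the `lsoa11cd` key and the LSOA codes and names come from the origin data shown earlier.

```python
import pandas as pd

# Stand-in for the LSOA centroids table (codes and names from the output above)
origins = pd.DataFrame({
    "lsoa11cd": ["E01012007", "E01012085", "E01012005"],
    "lsoa11nm": ["Middlesbrough 012A", "Middlesbrough 010D", "Hartlepool 014G"],
})

# HYPOTHETICAL student counts per LSOA (made-up numbers and column name)
students = pd.DataFrame({
    "lsoa11cd": ["E01012007", "E01012005"],
    "n_students": [120, 45],
})

# Left join keeps every LSOA; validate guards against duplicated LSOA codes
origins = origins.merge(
    students[["lsoa11cd", "n_students"]],
    on="lsoa11cd",
    how="left",
    validate="one_to_one",
)

# ASSUMPTION: an LSOA missing from the student table has zero students
origins["n_students"] = origins["n_students"].fillna(0).astype(int)

print(origins)
```

Whether a missing LSOA should count as zero or be flagged as missing data is a modelling choice to confirm once the real student table is available.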

# Generating local data for the model

Create separate 'regional' folders in which to run the SIM for a specific area of interest, for example a LAD. These folders can then be called from within the model, that is, [SpatInteModel.ipynb](SpatInteModel.ipynb).

In [51]:
## First select LSOAs (our geographical unit) by ID from the national list... use specific LUT?


## Output selection of the processed data (origin and destination) in the specific folder
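The two steps above could be sketched as follows. This is a rough illustration under stated assumptions: it derives the local authority name from the `lsoa11nm` prefix (a dedicated lookup table would likely be more robust), the `lad_name` column and the `origins.csv` filename are hypothetical, and a temporary directory stands in for `results_dir`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the national LSOA table (rows from the output shown earlier)
origins = pd.DataFrame({
    "lsoa11cd": ["E01012007", "E01012085", "E01012005", "E01012084", "E01012002"],
    "lsoa11nm": ["Middlesbrough 012A", "Middlesbrough 010D", "Hartlepool 014G",
                 "Middlesbrough 010C", "Hartlepool 006D"],
})

# ASSUMPTION: the name before the last space is the local authority name
origins["lad_name"] = origins["lsoa11nm"].str.rsplit(" ", n=1).str[0]

# Select the LSOAs of one area of interest
region = "Middlesbrough"
selection = origins[origins["lad_name"] == region]

# Write the selection into its own regional folder
# (a temporary directory is used here instead of results_dir)
out_dir = Path(tempfile.mkdtemp()) / region
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / "origins.csv"
selection.to_csv(out_path, index=False)

print(f"{len(selection)} LSOAs written to {out_path}")
```

The same pattern would apply to the destination centroids, written alongside the origins in the regional folder.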
