# Data processing for the SIM model

Initial data handling for preparing material to be used in [SpatInteModel.ipynb](SpatInteModel.ipynb).

NOTE: initial development targets the student population, with the idea of extending the methodology to the other categories of interest (residents, workers, international visitors).

# Initialisation

Loading libraries and data

In [30]:

# Loading libraries
import os
import pandas as pd
import geopandas as gpd
from pathlib import Path
import matplotlib.pyplot as plt

# Dealing with the data
## Paths definition 
base_dir = str(Path(os.getcwd()).parent)  # Get main directory (should be parent to this file)
data_dir = os.path.join(base_dir,
                        "data",
                        "national_data") 
results_dir = os.path.join(base_dir,
                           "output")

# Stop early if the notebook is not run from inside the project tree
if os.path.basename(base_dir) != "FutureHighStreets":
    raise Exception("The base directory should point to the main project directory, "
                    f"but it points to {base_dir}")
    
## Variables definition
origin_data_folder = "LSOA"
origin_data_filename = "Lower_Layer_Super_Output_Areas_(December_2011)_Population_Weighted_Centroids"  # change depending on custom edits
origin_shp_path = os.path.join(data_dir,
                               origin_data_folder,
                               origin_data_filename) + ".shp"
destination_data_folder = "Retail_Centres"
destination_data_filename = "Retail_Centres_UK"  # change depending on custom edits
destination_gpkg_path = os.path.join(data_dir,
                                     destination_data_folder,
                                     destination_data_filename) + ".gpkg"

# Data to generate destination attractiveness
retailLUT_filename = "RetailCentres_LUT.csv"
retailLUT_path = os.path.join(data_dir,
                              retailLUT_filename)

## Loading data
origin_shp = gpd.read_file(origin_shp_path)
origin_shp.head()

destination_gpkg = gpd.read_file(destination_gpkg_path,
                                 layer=destination_data_filename)
destination_gpkg.head()

retailLUT = pd.read_csv(retailLUT_path)
retailLUT.head()

Unnamed: 0,classification,Rank
0,District Centre,3
1,Local Centre,5
2,Major Town Centre,2
3,Market Town,9
4,Out of Town Shopping Centres,8


# Actual data processing

Three main processes:
1. Generate destination centroids;
2. Generate destination attractiveness;
3. Generate origin production/demand (population)

NOTE: this must be repeated for the other categories (not only students).

1. Generate destination centroids

We need to generate the Retail centres centroids from the polygons file (CDRC

In [29]:
# Centroids of the Retail centres shp (destination shp)

# copy the original gpkg file, to change geometry after
centroids_destinations = destination_gpkg.copy()
# change the geometry
centroids_destinations.geometry = centroids_destinations['geometry'].centroid

centroids_destinations.head()


Unnamed: 0,RC_ID,RC_Name,Classification,Country,RegionNM,geometry
0,RC_EW_101,Hessle Road; Dairycoates; City of Kingston upo...,District Centre,England,Yorkshire and The Humber,POINT (507664.225 427664.830)
1,RC_EW_1010,Belgrave Road; Belgrave; Leicester (East Midla...,District Centre,England,East Midlands,POINT (459411.296 306107.108)
2,RC_EW_1014,Leicester; Leicester (East Midlands; England),Major Town Centre,England,East Midlands,POINT (458788.687 304569.653)
3,RC_EW_1023,Nottingham; Nottingham (East Midlands; England),Regional Centre,England,East Midlands,POINT (457363.648 339965.649)
4,RC_EW_1032,West Bridgford; Rushcliffe (East Midlands; Eng...,Town Centre,England,East Midlands,POINT (458787.427 337481.247)


2. Generate destination attractiveness

We need to generate a column for the attractiveness of the retail centres. This will depend on the type (see the Classification field) and on the size (which we'll derive from their areal extent).

A combined value could be obtained by simply multiplying the two (to be confirmed).
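A minimal sketch of that multiplication, using a toy table in place of the real retail centres layer. The `area_m2` column and the `RC_A`-style identifiers are hypothetical; in the notebook the area would presumably come from the polygon geometries (for example `destination_gpkg.geometry.area`, before they are replaced by centroids). The notebook does not state whether a low `Rank` means a more or a less important centre, so the raw product below may need the rank inverted or rescaled.

```python
import pandas as pd

# Toy stand-in for the retail centres table (hypothetical IDs and areas).
centres = pd.DataFrame({
    "RC_ID": ["RC_A", "RC_B", "RC_C"],
    "Classification": ["District Centre", "Local Centre", "Major Town Centre"],
    "area_m2": [120000.0, 40000.0, 300000.0],
})

# Subset of the look-up table loaded earlier (classification -> Rank).
lut = pd.DataFrame({
    "classification": ["District Centre", "Local Centre", "Major Town Centre"],
    "Rank": [3, 5, 2],
})

# Add the 'Rank' column by matching Classification against the LUT.
# validate="many_to_one" raises if a classification appears twice in the LUT.
centres = centres.merge(
    lut,
    left_on="Classification",
    right_on="classification",
    how="left",
    validate="many_to_one",
).drop(columns="classification")

# Combined attractiveness as the plain product of type rank and size.
# Assumption: the direction of Rank is not documented, so treat this as a placeholder.
centres["attractiveness"] = centres["Rank"] * centres["area_m2"]

print(centres[["RC_ID", "Rank", "area_m2", "attractiveness"]])
```

Any classification missing from the LUT would end up with a missing `Rank` after the left join, which is worth checking before the product is used in the model.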

In [None]:
# Generate destinations' attractiveness column

## Add 'rank' column from LUT (from the Classification)


## Add 'size' column as calculated from spatial extent



3. Generate origin production/demand (population)

We need to join the number of students (for now, then other categories) to the LSOA centroids shapefile.
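The join could look roughly like the sketch below, which uses plain DataFrames in place of the real layers. The key column `lsoa11cd`, the `students` column and the LSOA codes are assumptions for illustration; the actual field names depend on the files used. Filling unmatched LSOAs with zero is also an assumption, not something the notebook specifies.

```python
import pandas as pd

# Toy stand-in for the LSOA centroids table (hypothetical key column and codes).
origins = pd.DataFrame({"lsoa11cd": ["E01000001", "E01000002", "E01000003"]})

# Toy stand-in for a population table holding the student counts.
students = pd.DataFrame({
    "lsoa11cd": ["E01000001", "E01000003"],
    "students": [120, 45],
})

# Left join keeps every LSOA, including those without a student count.
# validate="one_to_one" raises if an LSOA code is duplicated on either side.
origins = origins.merge(
    students[["lsoa11cd", "students"]],
    on="lsoa11cd",
    how="left",
    validate="one_to_one",
)

# Assumption: an LSOA absent from the population table has no students.
origins["students"] = origins["students"].fillna(0).astype(int)

print(origins)
```

When the left-hand table is a GeoDataFrame, calling `merge` on it keeps the geometry column, so the same pattern should carry over to `origin_shp`.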

In [None]:
# Add population (students) to the LSOA table

## Simple join of the selected column

# Generating local data for the model

Create separate 'regional' folders in which to run the SIM for a specific area of interest, for example LADs. Each folder can then be called from within the model, that is [SpatInteModel.ipynb](SpatInteModel.ipynb).
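One possible shape for this step is sketched below. The helper `export_region`, the key column `lsoa11cd`, the output file name `origins.csv` and the region name are all hypothetical, and a temporary directory stands in for `results_dir`. The sketch writes a CSV to stay self-contained; for the real GeoDataFrames a spatial format written with `to_file` would likely be more appropriate.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_region(origins, lsoa_ids, region_name, results_dir, id_col="lsoa11cd"):
    """Select the rows whose ID is in lsoa_ids and write them to a regional folder."""
    region_dir = Path(results_dir) / region_name
    region_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    selection = origins[origins[id_col].isin(lsoa_ids)]
    out_path = region_dir / "origins.csv"
    selection.to_csv(out_path, index=False)
    return out_path


# Toy national table (hypothetical codes and counts).
origins = pd.DataFrame({
    "lsoa11cd": ["E01000001", "E01000002", "E01000003"],
    "students": [120, 0, 45],
})

# Temporary directory used here in place of the notebook's results_dir.
results_dir = tempfile.mkdtemp()

# IDs belonging to the area of interest; in practice these might come from an LSOA-to-LAD look-up.
out_path = export_region(origins, ["E01000001", "E01000003"], "ExampleLAD", results_dir)
print(out_path)
```

The same helper could be reused for the destination table by passing a different `id_col` and file name, provided the retail centres have first been assigned to the area of interest.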

In [None]:
## First select LSOAs (our geographical unit) by ID from the national list... use specific LUT?


## Output selection of the processed data (origin and destination) in the specific folder
