# GEOG5990M Final Project

Student ID number: 201702946



## What are the spatial patterns of socio-economic deprivation across Lower Super Output Areas (LSOAs) in Leeds, and how can these patterns best be utilised to target a support programme?

### Introduction

Socio-economic deprivation exhibits significant spatial heterogeneity across urban neighbourhoods, with pockets of severe deprivation often concentrated in particular areas (Townsend, 1987). Identifying and understanding these spatial patterns is crucial for targeted interventions and resource allocation. This study investigates the spatial distribution of socio-economic deprivation across LSOAs in Leeds, UK, using data from the 2011 Census. By combining several census indicators into a composite deprivation index and mapping it, the research seeks to uncover the underlying patterns and identify the most deprived areas. Furthermore, it explores how these spatial insights can inform the design and implementation of support programmes that target and address the specific needs of the most disadvantaged communities (Talen and Anselin, 1998; Pampalon et al., 2009). The findings have the potential to guide policymakers and stakeholders in developing evidence-based strategies for tackling socio-economic inequalities and promoting inclusive urban development.

#### Loading packages

In [None]:
# importing required packages

import pandas as pd                # for creating and handling dataframes
import numpy as np                 # for numerical and scientific computing tasks
import matplotlib.pyplot as plt    # for creating visualizations, data exploration, analysis and presentation
import matplotlib.patheffects as path_effects   # for adding effects to text in Matplotlib plots
import seaborn as sns              # for creating informative and visually appealing statistical graphics

import geopandas as gpd            # for working with geospatial data
import pyproj                      # for geospatial transformations and projections
import contextily as ctx           # for fetching web map tiles for spatial visualizations.
import geoplot as gplt             # for geospatial plotting
import geoplot.crs as gcrs         # for setting coordinate reference systems
import scipy.stats as stats        # for providing statistical analysis functions and distributions

from matplotlib_scalebar.scalebar import ScaleBar # for adding scale bars to plots
from sklearn.preprocessing import StandardScaler  # for standardising features by removing the mean and scaling to unit variance

import warnings
warnings.filterwarnings('ignore')  # for suppressing warning messages


#### Loading dataset 
Please note that these datasets are available through the following link, which is set to expire on 16/06/2024: https://leeds365-my.sharepoint.com/:f:/g/personal/gy23c2b_leeds_ac_uk/Eq7xR6dcWrRKgkgOtEe_AowBaTjYdYLDcf7WmHTydBINRQ?e=0FS6mj

In [None]:
# Base folder holding the project data (adjust this path to your own machine)
data_dir = "C:/Users/Chifuniro Baluwa/Documents/LEEDS/SEM 2/Programming for GIA (GEOG5990M)/Assignment 2/Data"

# Data of deprivation variables for Lower Super Output Areas
# This is 2011 Census data sourced from https://www.nomisweb.co.uk/
leeds_variables = pd.read_csv(f"{data_dir}/Deprivation-Variables-LSOA-2011-Census.csv")

# England LSOA boundaries sourced from https://geoportal.statistics.gov.uk/
england_LSOA = gpd.read_file(f"{data_dir}/Lower_layer_Super_Output_Areas_Dec_2011_Boundaries.geojson")

# Leeds Wards shapefile sourced from https://infuse.ukdataservice.ac.uk/
# (forward slashes used throughout; the original path mixed in a backslash)
leeds_wards = gpd.read_file(f"{data_dir}/LeedsWards/LeedsWards.shp")

#### Justification of variable selection for Leeds socio-economic deprivation
The variables selected here to show spatial patterns of socio-economic deprivation are based on the Townsend index, which focuses on a small number of direct indicators of 'lack' or 'want' (Townsend et al., 1988). Townsend et al. (1988) recommend using no more than six variables to capture the key dimensions of deprivation. The six selected variables from the 2011 Census data for Leeds are:

1. Unemployed: A high unemployment rate is a direct indicator of deprivation and lack of economic resources (Townsend et al., 1988; Norman, 2010).

2. Households with No Cars: Lack of car ownership is a marker of material deprivation and limited mobility (Townsend et al., 1988; Norman, 2010).

3. Overcrowded Households: Overcrowding is a direct measure of poor living conditions and lack of adequate housing (Townsend et al., 1988; Norman, 2010).

4. Lone Parent Households: Lone parenthood is associated with higher risks of poverty and deprivation (Townsend et al., 1988; Norman, 2010).

5. Low Social Class: Low occupational social class is a proxy for low income and economic deprivation (Townsend et al., 1988; Norman, 2010).

6. Social Rented Housing: High levels of social housing are indicative of disadvantaged areas with concentrations of deprivation (Townsend et al., 1988; Norman, 2010).

### Data preprocessing

#### Getting to know the selected Leeds census socio-economic variables

In [None]:
leeds_variables.head()    # Getting the picture of the census variables data set by looking at the first 5 rows

                           # The code has displayed that the Leeds census socio-economic variables have 10 columns
                           # and each value of the variable is assigned a Lower Super Output Area (LSOA) code

In [None]:
leeds_variables.index      # Getting to know the row index labels

                           # The output is RangeIndex(start=0, stop=32844, step=1): row labels run from 0 to 32843,
                           # i.e. 32,844 rows, one per LSOA. This matches the number of 2011 LSOAs in England,
                           # so the file covers all of England, not only Leeds.

In [None]:
leeds_variables.info()      # Getting the overall summary of the Leeds census variable dataset

                           # The socio-economic count columns are float, while identifier columns such as LSOA_Code are object.
                           # Overall, the dataset has 32,844 entries and uses 2.5+ MB of memory.

In [None]:
leeds_variables.describe()    # Understanding summary statistics for the numeric columns. 
                              # Knowing how much variability there is for these variables.

In [None]:
leeds_variables.isnull().sum() # Checking which columns have missing values and how many they are in the Leeds variables dataset

                               # There are no missing values in the Leeds variables.
                               # However, the minimum of zero in the descriptive summary shows that some LSOAs record
                               # zero counts for certain variables; these are zero counts, not missing values.

In [None]:
leeds_variables.nunique()      # Checking the number of unique values in each column of the dataset

                               # The code returns how many unique values there are in each column of the Leeds census
                               # variable dataset, giving information on the variety of data making up each feature.

### Data Cleaning, Data Exploratory and Data Manipulation

In [None]:
# The deprivation index is calculated from percentages rather than raw counts,
# so each Leeds socio-economic variable is converted to a percentage of 'Persons'

count_columns = {
    'unemployed': '% unemployed',
    'nocar': '% no car',
    'overcrowded': '% overcrowded',
    'loneparents': '% lone parents',
    'lowclass': '% low class',
    'socialrent': '% social rent',
}

for raw_col, pct_col in count_columns.items():   # Calculating percentages for each variable
    leeds_variables[pct_col] = (leeds_variables[raw_col] / leeds_variables['Persons']) * 100

leeds_variables.head()    # Checking if the calculation has been performed 


In [None]:
leeds_variables.info()  # Checking the data types of the newly calculated percentage columns

                        # The new percentage columns are float, which is the expected format

In [None]:
# Rounding the Leeds percentage socio-economic variables to 2 decimal places for easier reading
pct_columns = ['% unemployed', '% no car', '% overcrowded',
               '% lone parents', '% low class', '% social rent']
leeds_variables[pct_columns] = leeds_variables[pct_columns].round(2)

leeds_variables.head()    # checking if the values have been rounded to 2 decimal places

In [None]:
# Dropping the columns of raw socio-economic variables as they will not be used in subsequent analysis

leeds_variables.drop(columns =['Persons', 'unemployed','nocar', 'overcrowded', 'loneparents', 'lowclass', 'socialrent'],
                     inplace=True)                                                                      # Dropping columns

leeds_variables.head()  # Checking if the operation is successful

#### Explore the association between socio-economic variables 

In [None]:
# Plotting a pairplot; height is kept small so the variable names fit on the axes

sns.pairplot(leeds_variables[['% unemployed', '% no car',
                              '% overcrowded', '% lone parents',    # Subsetting DataFrame to selected variables for pairplot
                              '% low class', '% social rent']],
             palette='Dark2',  # Colour palette (only takes effect when a hue variable is given)
             height=2) # Setting the height of each subplot

The scatter plots suggest positive associations among the socio-economic variables: LSOAs with high values on one indicator tend to have high values on the others.

#### Quantify the association between socio-economic variables using Spearman's rank correlation

In [None]:
## Calculating Spearman's rank correlation to check how variables correlate to each other

leeds_variables_corr = leeds_variables[['% unemployed', '% no car',
                                        '% overcrowded', '% lone parents',
                                        '% low class', '% social rent']].corr(method = 'spearman') # Calculating Spearman's correlation

leeds_variables_corr   # checking the executed Spearman correlation

#### Modelling Leeds socio-economic variables using Spearman's rank correlation

In [None]:
###### Visualising the Spearman's rank correlation of socio-economic variables

fig, ax = plt.subplots(figsize=(8, 8))   # Defining plot size

upper_triangle_mask = np.triu(np.ones_like(leeds_variables_corr))  # Defining mask to apply to upper right-hand corner of the plot

# Plotting a heatmap of the correlation dataframe
sns.heatmap(leeds_variables_corr, 
            annot=True,   # Annotating with Spearman's rank correlation values
            cmap='RdBu',  # Defining a diverging red-blue colour map
            vmin=-1,  # Defining minimum color on color bar
            vmax=1,  # Defining maximum color on color bar
            mask=upper_triangle_mask,  # Apply mask to upper triangle
            cbar_kws={'label': "Spearman's Rank Correlation"},  # Labelling for color bar
            ax=ax)  # Plotting on the defined axis

ax.set_xlabel("Socio-economic Deprivation Variables", fontsize=11) # Setting axis labels x with font size 11
ax.set_ylabel("Socio-economic Deprivation Variables", fontsize=11) # Setting axis labels y with font size 11

ax.set_title('Correlation of Socio-economic Deprivation Variables', fontsize=14)  # Setting title with font size of 14

plt.show() # Displaying the plot

A Spearman's rank correlation is chosen for modelling the association between socio-economic variables for LSOAs because:

1. It is computed on ranks, so it captures monotonic (not only linear) associations and suits ordinal data, such as deprivation measures used to rank areas from most to least deprived.

2. It is resistant to outliers, which is important when dealing with socio-economic data that may have extreme values in certain LSOAs.

3. It does not assume normality of the variables, which is often violated in socio-economic data.

These reasons are supported by Vyas and Kumaranayake (2006), who stated that non-parametric methods like Spearman's rank correlation are more appropriate for ordinal or non-normally distributed variables when describing associations.
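Points 1 and 2 above can be checked on a small example. Spearman's rho is Pearson's r computed on the ranks of the values, so an extreme value that does not change the ordering leaves rho untouched while Pearson's r shifts. The sketch below uses hypothetical percentages for six areas (not the Leeds data) and pandas' `DataFrame.corr`, the same method used in this notebook.

```python
import pandas as pd

# Hypothetical percentages for six areas (illustrative only, not Leeds data)
toy = pd.DataFrame({
    "pct_unemployed": [2.0, 3.5, 4.1, 5.0, 6.2, 7.3],
    "pct_no_car": [10.0, 14.0, 15.5, 21.0, 24.0, 30.0],
})

def spearman(df):
    """Spearman's rho between the two columns, via DataFrame.corr."""
    return df.corr(method="spearman").loc["pct_unemployed", "pct_no_car"]

def pearson(df):
    """Pearson's r between the two columns, via DataFrame.corr."""
    return df.corr(method="pearson").loc["pct_unemployed", "pct_no_car"]

rho = spearman(toy)

# Spearman's rho equals Pearson's r computed on the ranks of each column
rho_from_ranks = pearson(toy.rank())

# Make one area extreme on '% no car'; its rank (the largest) does not change
with_outlier = toy.copy()
with_outlier.loc[5, "pct_no_car"] = 90.0

pearson_before, pearson_after = pearson(toy), pearson(with_outlier)
spearman_after = spearman(with_outlier)

print(f"Spearman: {rho:.3f}, Pearson on ranks: {rho_from_ranks:.3f}")
print(f"Pearson before/after outlier: {pearson_before:.3f}/{pearson_after:.3f}")
print(f"Spearman after outlier: {spearman_after:.3f}")
```

Because both toy columns increase together, rho stays at 1 with or without the extreme value, whereas Pearson's r drops noticeably once the outlier is added.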

#### Standardising socio-economic variables

In [None]:
leeds_variables.describe()  # re-checking the summary statistics of leeds variables.

                            # The descriptive statistics indicate that the variables are not on comparable scales,
                            # so they need to be standardised for each variable to be weighted equally in the deprivation index.

In [None]:
# Displaying how variables look when unstandardised and not on a comparable scale

socio_economic_variables = leeds_variables[['% unemployed', '% no car',
                                            '% overcrowded', '% lone parents',  # Subsetting the DataFrame based on selected variables
                                            '% low class', '% social rent']]

# Plot boxplot
plt.figure(figsize=(10, 8))  # Setting the size of the figure
sns.boxplot(data=socio_economic_variables, showfliers=True)  # Creating a boxplot for the socio-economic variables, showing outliers
   
plt.title('Unstandardised Socio-economic Variables') # Setting the title of the plot
plt.xlabel('Socio-economic variables') # Setting the label for the x-axis

plt.grid(axis='y')  # Adding gridlines to y-axis

plt.show()     # Displaying the plot

In [None]:
# Standardize the socio-economic variables - to make the variables be on the same comparable scale

scaler = StandardScaler()   # Initialising StandardScaler object
standardized_variables = scaler.fit_transform(socio_economic_variables) # Standardizing the socio-economic variables

# Creating a DataFrame for standardized variables
standardized_df = pd.DataFrame(standardized_variables, columns=socio_economic_variables.columns)

# Adding standardized variables to the original DataFrame
leeds_variables = pd.concat([leeds_variables, standardized_df.add_prefix('std_')], axis=1)

leeds_variables.head() # Checking that standardisation has worked and the variables were added to the original leeds_variables DataFrame

In [None]:
### Plotting boxplot of standardized variables

plt.figure(figsize=(10, 8))   # Setting the size of the figure
sns.boxplot(data=standardized_df, showfliers=True)  # Creating a boxplot for the standardized socio-economic variables showing outlier

plt.title('Standardized Socio-economic Variables')  # Setting the title of the plot
plt.xlabel('Socio-economic variables')  # Setting the label for the x-axis

plt.grid(axis='y')  # Adding gridlines to y-axis

plt.show()  # Displaying the plot

The plot shows that the variables are now on a comparable scale. Positive scores indicate that the value for an LSOA is above the (national) average and negative scores that it is below average, with the magnitude giving the distance from the average in standard deviations (Noble et al., 2006).
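Each standardised value is a z-score, z = (x - mean) / sd, calculated per variable across all rows of the dataset. scikit-learn's `StandardScaler` divides by the population standard deviation (ddof=0), whereas pandas' `.std()` defaults to ddof=1, so the minimal sketch below sets ddof=0 explicitly to mirror the scaler. The five values are hypothetical.

```python
import pandas as pd

# Hypothetical percentages for five areas (illustrative only)
pct_unemployed = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0], name="% unemployed")

# z-score using the population standard deviation (ddof=0),
# which matches how sklearn's StandardScaler scales each column
mean = pct_unemployed.mean()
sd = pct_unemployed.std(ddof=0)
z = (pct_unemployed - mean) / sd

print(z.round(3).tolist())
# A standardised column has mean 0 and population standard deviation 1
print(round(z.mean(), 10), round(z.std(ddof=0), 10))
```

Here an area with 10% unemployment sits about 1.41 standard deviations above the average of the five areas.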


#### Combining standardised values to be a deprivation index

In [None]:
#### Creating a new variable, "deprivation_index"

# Creating a column called "deprivation_index" as the sum of the six standardised socio-economic variables
std_columns = ['std_% unemployed', 'std_% no car', 'std_% overcrowded',
               'std_% lone parents', 'std_% low class', 'std_% social rent']
leeds_variables['deprivation_index'] = leeds_variables[std_columns].sum(axis=1, skipna=False)

leeds_variables['deprivation_index'] = leeds_variables['deprivation_index'].round(2)  # Rounding deprivation index values to 2 decimal places

leeds_variables.head()   # Checking if the operations have been executed
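The index is an unweighted sum of the standardised variables, so every indicator contributes equally and an LSOA only scores highly when it is above average on several indicators at once. The minimal sketch below shows the sum and a ranking (1 = most deprived) on three hypothetical areas, using two of the six variables for brevity; the column names mirror the notebook's but the values are invented.

```python
import pandas as pd

# Hypothetical z-scores for three areas (illustrative values only)
z_scores = pd.DataFrame(
    {
        "std_% unemployed": [1.2, -0.5, 0.3],
        "std_% no car": [0.8, -1.0, 2.1],
    },
    index=["area_A", "area_B", "area_C"],
)

# Unweighted sum of z-scores, as in a Townsend-style index
deprivation_index = z_scores.sum(axis=1)

# Rank areas so that 1 = most deprived (highest index value)
rank = deprivation_index.rank(ascending=False).astype(int)

print(deprivation_index.round(2).to_dict())
print(rank.to_dict())
```

Area C ranks first even though it is only slightly above average on unemployment, because its very high value on the second indicator dominates the sum.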

In [None]:
## Dropping unnecessary columns from the Leeds variables before merging with Leeds shapefile

leeds_variables.drop(columns =['LSOA_Name','lgd'], inplace=True) # Dropping duplicates or unnecessary columns

leeds_variables.head()  # Checking if the operation is successful

#### Getting to know the geography of England Lower Super Output Area (LSOA)

In [None]:
england_LSOA.head() # Getting the picture of the england LSOA shapefile by looking at the first 5 rows

                    # The code displayed that england_LSOA has 10 columns having LSOA codes and geometry.

In [None]:
england_LSOA.info() # Getting the overall summary of the england LSOA shapefile

                    # Here it is noted that england_LSOA contains float, geometry, integer and string columns.
                    # Overall, the dataset has 34,753 entries and uses 2.7+ MB of memory. 34,753 is the number of
                    # 2011 LSOAs in England and Wales combined, so the file also includes Welsh LSOAs.

In [None]:
england_LSOA.isnull().sum()  # Checking which columns have missing values and how many they are in the england LSOA shapefile

                             # It shows there are no missing values in the england LSOA shapefile

In [None]:
england_LSOA.columns # Checking the columns of the england LSOA shapefile

In [None]:
print(england_LSOA.crs) # Checking that the CRS is British National Grid (EPSG:27700)

The CRS is British National Grid (EPSG:27700), a projected coordinate system in metres, so it is suitable for further processing.

#### Subsetting the England LSOAs to the extent of Leeds

In [None]:
# Selecting rows where the 'LSOA11NM' column starts with 'Leeds' from the England LSOA dataset
# (an attribute-based selection; .copy() avoids later in-place changes acting on a view of england_LSOA)
leeds_LSOA = england_LSOA[england_LSOA['LSOA11NM'].str.startswith('Leeds')].copy()

leeds_LSOA.head()  # Checking if the selection has been done successfully

In [None]:
leeds_LSOA.info()  # Getting the overall summary of Leeds LSOA shapefile

                   # Here Leeds LSOA has 482 entries with data types of float, geometry, integer and string. The shapefile occupies 41.4+ KB of space

In [None]:
leeds_LSOA.explore()    # Visualising what the Leeds LSOA boundaries look like and checking that demarcation is at LSOA level

In [None]:
# Before merging the Leeds LSOA shapefile to Leeds socio-economic variables, duplicates or unnecessary columns should be removed from the shapefile
leeds_LSOA.drop(columns =['OBJECTID','LSOA11NM','BNG_E','BNG_N','LONG_','LAT','Shape_Leng','GlobalID'], inplace=True) # Dropping duplicates or unnecessary columns

leeds_LSOA.head()  # Checking if the operation is successful

##### Joining a non-spatial (Leeds socio-economic variables) dataset to a spatial (Leeds LSOA) dataset

In [None]:
# Joining a non-spatial (Leeds socio-economic variables) dataset to a spatial (Leeds LSOA) dataset.
# Calling .merge on the GeoDataFrame keeps the result a GeoDataFrame; the left join keeps every Leeds LSOA.
leeds_deprivation = leeds_LSOA.merge(leeds_variables, left_on='LSOA11CD', right_on='LSOA_Code', how='left')

leeds_deprivation.head()    # Checking if the join is successful

                            # The join is successful; however, the LSOA11CD and LSOA_Code columns hold the same codes,
                            # so one of them should be dropped.
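Because the join is a left join from the Leeds LSOAs, any LSOA code without a match in the census table would silently receive missing values. pandas' `merge` can check this through its `validate` and `indicator` arguments. The sketch below uses small hypothetical tables (plain DataFrames standing in for the GeoDataFrame, with invented codes) rather than the real files.

```python
import pandas as pd

# Hypothetical stand-ins: boundary codes (left) and census values (right)
boundaries = pd.DataFrame({"LSOA11CD": ["E01000001", "E01000002", "E01000003"]})
census = pd.DataFrame({
    "LSOA_Code": ["E01000001", "E01000002", "E01000999"],
    "deprivation_index": [1.5, -2.0, 0.4],
})

merged = boundaries.merge(
    census,
    left_on="LSOA11CD",
    right_on="LSOA_Code",
    how="left",
    validate="one_to_one",  # raises MergeError if a code is duplicated on either side
    indicator=True,         # adds a '_merge' column: 'both' or 'left_only'
)

# Boundary polygons that found no census record
unmatched = merged.loc[merged["_merge"] == "left_only", "LSOA11CD"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that every Leeds LSOA received a deprivation index.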

In [None]:
leeds_deprivation.columns  # understanding what columns are included in the leeds deprivation file

In [None]:
leeds_deprivation.drop(columns =['LSOA11CD'], inplace=True)   # dropping the duplicate column which gives same information

leeds_deprivation.head() # Checking if the operation is successful

In [None]:
# mapping the Leeds deprivation index to get a picture of how the data looks like in space
leeds_deprivation.explore('deprivation_index', cmap='Reds')  # darker red = higher index = more deprived

### Data Visualisation

#### non-spatial data visualisation

##### Using pairwise scatter plot

In [None]:
leeds_variables.head() # Re-checking what is in Leeds variable dataset

In [None]:
leeds_variables.drop(columns =['% unemployed','% no car','% overcrowded',
                               '% lone parents','% low class','% social rent'],
                     inplace=True) # Dropping the original percentage variables, since only the standardised ones are visualised below


leeds_variables.head() # Checking if the operation is successful


In [None]:
# Although these variables are standardised percentages, they are renamed with plain percentage names for easier readability

new_column_names = {
    'std_% unemployed': '% unemployed',
    'std_% no car': '% no car',
    'std_% overcrowded': '% overcrowded',           # Renaming columns of variables
    'std_% lone parents': '% lone parents',
    'std_% low class': '% low class',
    'std_% social rent': '% social rent',
    
}

leeds_variables.rename(columns=new_column_names, inplace=True) # Renaming the columns in leeds_variables according to the mappings in 'new_column_names'

leeds_variables.head()   # checking if the operation is successful

In [None]:
# Plotting a pairwise grid to explore relationships between the standardised variables and the deprivation index

selected_variables = ['% unemployed', '% no car', '% overcrowded',
                      '% lone parents', '% low class', '% social rent',       # Selecting variables of interest to be plotted
                      'deprivation_index']

# PairGrid starts empty, so each triangle is drawn exactly once
# (sns.pairplot would already draw scatter plots and histograms underneath the mapped layers)
pairgrid = sns.PairGrid(leeds_variables[selected_variables],
                        height=2.5, aspect=1.2)  # Adjusting height and aspect ratio for better readability

# Mapping bivariate density plots to the lower triangle
pairgrid.map_lower(sns.kdeplot, cmap="cividis")  # cividis is a colour-blind-friendly colormap

# Histograms with a KDE line on the diagonal
pairgrid.map_diag(sns.histplot, kde=True, linewidth=0.5, edgecolor='k')

# Mapping scatter plots to the upper triangle (singular 'edgecolor' so seaborn's white default is overridden)
pairgrid.map_upper(sns.scatterplot, s=50, edgecolor='k', linewidth=0.5, alpha=0.7)

# Adding Pearson correlation coefficients above the scatter plots
def annotate_corr(x, y, **kwargs):
    r = round(x.corr(y), 2)              # Pearson's r between the two plotted variables
    ax = plt.gca()                       # Current Axes, set by PairGrid before each call
    ax.annotate(f"r = {r}", xy=(0.5, 0.95), xycoords=ax.transAxes, ha='center', fontsize=17)

pairgrid.map_upper(annotate_corr)

# Increasing the font size of the x and y axis labels
for a in pairgrid.axes.flat:
    a.xaxis.label.set_fontsize(17)
    a.yaxis.label.set_fontsize(17)

plt.subplots_adjust(top=0.93)  # Making space at the top for the title

# Plotting the title of the plot and adjusting its font size
plt.suptitle("Pair Plot of Socio-economic Variables and the Calculated Deprivation Index for Leeds LSOAs", fontsize=28)

plt.show()  # Displaying the plot


#### Description of data visualisation choice made

To communicate insights about the relationship between the deprivation index and the socio-economic variables in Leeds to welfare stakeholders, a pairwise plot matrix is used; this is a widely used technique for exploring relationships between multiple variables (Chambers, 1983). A colour-blind-friendly colour map keeps the visualisation accessible to people with colour vision deficiencies (Harrower and Brewer, 2003). The histograms on the diagonal divide the matrix into two triangles, allowing a clear separation of the distribution and correlation components (Friendly, 2002).

In the lower triangle, bivariate density plots are drawn with the colour-blind-friendly cividis colour map, giving an intuitive picture of how each pair of variables is jointly distributed (Wilkinson, 2012). In the upper triangle, scatter plots are accompanied by Pearson correlation coefficients, which quantify the strength and direction of the linear relationships between variables (Lee Rodgers and Nicewander, 1988).

For welfare stakeholders, the combination of density and scatter plots is desirable as it provides a comprehensive understanding of the data. The diagonal histograms and density curves show the distribution of each variable, allowing stakeholders to identify potential outliers or skewness. The scatter plots, on the other hand, reveal patterns and relationships between variables, which can inform decision-making and policy formulation related to socio-economic deprivation in Leeds.

#### Spatial data visualisation

##### Using Choropleth mapping to visualise deprived areas in Leeds

In [None]:
leeds_wards.head()  # Checking leeds wards shapefile that would be used to add context to the visualisation

In [None]:
print(leeds_wards.crs) # Checking whether the CRS is British National Grid

In [None]:
## Visualising deprivation pattern based on socio-economic variables in Leeds, UK

f, ax = plt.subplots(1, figsize=(16, 10))   # Creating a figure with a single subplot of specific size

leeds_deprivation.plot(column='deprivation_index',  # Plotting the 'deprivation_index' column for Leeds LSOAs
                       legend=True,        # Creating a legend for the Leeds deprivation index
                       cmap='flare',       # Sequential seaborn colour map: darker = more deprived
                       scheme='quantiles', # Quantile classification (5 classes by default; requires mapclassify)
                       edgecolor='black',  # Setting the edge colour of the LSOA polygons to black
                       linewidth=0.03,     # Very thin LSOA outlines
                       ax=ax)              # Specifying the axes on which the plot will be drawn

# Plotting ward boundaries after the LSOAs so they are drawn on top rather than hidden underneath
leeds_wards.boundary.plot(color='black', linewidth=0.3, ax=ax)
leeds_wards.dissolve().boundary.plot(color='black', linewidth=2.5, ax=ax)   # Highlighting the outer boundary of Leeds

ctx.add_basemap(ax, crs=27700, alpha=0.5)   # Adding a basemap at 50% transparency (27700 = British National Grid)

legend = ax.get_legend()                  # Retrieving the legend associated with the Leeds plot
legend.set_title('Deprivation index')     # Setting the title of the legend
legend.get_frame().set_edgecolor('black') # Setting the colour of the legend frame
legend.set_bbox_to_anchor((0.2, 0.23))    # Moving the legend to the bottom left within the plot

for handle in legend.legend_handles:      # 'legend_handles' needs Matplotlib >= 3.7 (older versions: 'legendHandles')
    handle.set_marker('s')                # Setting legend markers to squares

scalebar = ScaleBar(1, location='lower right')  # dx=1: one map unit equals 1 metre (British National Grid units),
                                                # positioned at the lower right corner of the plot
ax.add_artist(scalebar)                         # Adding the scale bar to the plot

ax.annotate('N', xy=(0.05, 0.95), xycoords='axes fraction',    # North arrow pointing up to the letter 'N'
            xytext=(0.045, 1.01), textcoords='axes fraction',  # Position of the 'N' text
            arrowprops=dict(color='black', arrowstyle='<-'))   # Arrow properties

ax.grid(False)              # Removing the grid lines from the plot

# Plot title and axis labels (British National Grid coordinates are eastings and northings, not longitude/latitude)
plt.title('Deprivation map for Lower Super Output Areas in Leeds derived from 2011 census data', fontsize=13)
plt.ylabel('Northing [metres]', fontsize=12)
plt.xlabel('Easting [metres]', fontsize=12)

# Plotting ward names with a dark halo so they stay legible on any fill colour
for idx, row in leeds_wards.iterrows():
    label = row['name']                              # Ward name used as the label
    point = row['geometry'].representative_point()   # A point guaranteed to lie inside the ward (unlike the centroid)
    ax.text(point.x, point.y, label, fontsize=8, ha='center', va='center', color='white',
            path_effects=[path_effects.withStroke(linewidth=2, foreground='black')])

plt.show()      # Displaying the plot

#### Description of data visualisation choice made

The visualisation aims to communicate the spatial patterns of socio-economic deprivation across LSOAs in Leeds to social welfare stakeholders. A choropleth mapping technique is employed, which is widely used for visualising geographic data and highlighting spatial variations (Brewer and Pickle, 2002). The map uses a quantile classification scheme with five classes, so each class contains roughly the same number of LSOAs (about one fifth of the total), which makes the most and least deprived groups directly comparable in size (Slocum et al., 2022).

The colour scheme is seaborn's sequential "flare" colour ramp, in which darker shades represent higher levels of deprivation and lighter shades lower levels. A sequential light-to-dark scheme suits ordered data such as a deprivation index, and its variation in lightness keeps the classes distinguishable for readers with colour vision deficiencies (Harrower and Brewer, 2003). The map is overlaid with Leeds ward boundaries, providing spatial context and enabling easier interpretation of the patterns by associating them with familiar administrative units.

To enhance readability and prevent clutter, the ward names are drawn with a dark outline (halo), and a basemap with 50% transparency provides geographic context without overwhelming the thematic data (Buckley, 2012). A scale bar, north arrow and legend aid map interpretation and follow cartographic conventions (Slocum et al., 2022). Grid lines have been removed to declutter the map, and the axes have been labelled for clarity. The boundary of Leeds has been highlighted to delineate the study area, and text elements have been adjusted to improve readability.
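A quantile scheme sorts the LSOAs by index value and cuts them into k classes of (roughly) equal size, so with five classes each colour covers about a fifth of the LSOAs regardless of how the values are spread. GeoPandas hands `scheme='quantiles'` to the mapclassify package; the sketch below reproduces the idea with `pandas.qcut` on ten hypothetical values, a swapped-in equivalent for illustration rather than the exact code path the map uses.

```python
import pandas as pd

# Hypothetical deprivation index values for ten areas (illustrative only)
index_values = pd.Series([-6.0, -4.5, -3.0, -2.2, -1.0, 0.1, 1.3, 2.8, 5.0, 12.0])

# Five quantile classes: 0 = least deprived fifth, 4 = most deprived fifth
classes = pd.qcut(index_values, q=5, labels=False)

# Each class holds the same number of areas (10 / 5 = 2)
print(classes.value_counts().sort_index().tolist())

# The most deprived class, which a targeted programme might prioritise
print(index_values[classes == 4].tolist())
```

Note that the top class spans 5.0 to 12.0 while the others are much narrower: equal counts come at the price of unequal value ranges, which is also visible in the wide top class of the Leeds legend.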

#### Interpretation of results

The pairwise plot suggests that unemployment, lone parent households and social rented housing are the factors most strongly associated with socio-economic deprivation in Leeds, as they have the highest correlation coefficients with the deprivation index (0.92, 0.85 and 0.85, respectively). Because the index is the sum of the six standardised variables, these correlations partly reflect each variable's own contribution to the index, so they show which indicators drive the index rather than independent causal effects. The shapes of the density plots, which show the joint distributions of these variables with the index, are consistent with this interpretation. These findings align with previous research that has established strong links between unemployment, single-parent households and social housing and increased levels of deprivation. For instance, a study by Friedman and Rosenbaum (2004) found that unemployment and lack of economic resources are among the primary drivers of socio-economic deprivation, as they limit access to essential goods and services. Furthermore, the high correlation between social rented housing and deprivation is consistent with the findings of Burrows (1999), who demonstrated that areas with a higher concentration of social housing often experience higher levels of deprivation due to the complex interplay of various socio-economic factors, such as low income levels, limited educational attainment and poor health outcomes.

The map shows that the most deprived areas, with deprivation index values from 1.07 to 18.26, are concentrated in the city centre and inner-city areas. Specific areas of concern include Burmantofts, Gipton and Harehills, Killingbeck and Seacroft, and Richmond Hill, located in the eastern and south-eastern parts of the city. These neighbourhoods are likely to face significant challenges related to poverty, unemployment, poor housing conditions and limited access to resources and services, as highlighted by Norman (2010). In contrast, the least deprived areas, with deprivation index values from -7.18 to -4.3, are predominantly located in the outer suburbs to the north of Leeds, including Adel and Wharfedale, Alwoodley and Harewood, which tend to have higher income levels, better housing conditions and better access to amenities, according to findings from Higgs et al. (2015).
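One way to turn these patterns into a target list for a support programme is to rank LSOAs by the index, take the most deprived share (for example the top fifth, matching the highest quantile class) and count how many fall in each ward. The sketch below does this on a hypothetical table: the codes, the `ward` column and all values are invented, and on the real data each LSOA's ward would first have to be attached (for instance with a spatial join), which this notebook does not do.

```python
import pandas as pd

# Hypothetical LSOA-level results (codes, wards and values are invented)
lsoas = pd.DataFrame({
    "LSOA_Code": [f"LSOA_{i:02d}" for i in range(1, 11)],
    "ward": ["East", "East", "East", "South", "South",
             "North", "North", "North", "West", "West"],
    "deprivation_index": [15.2, 9.8, 7.1, 6.4, 1.0, -5.5, -6.8, -4.9, 0.3, -1.2],
})

# Select the most deprived 20 per cent of LSOAs (highest index values)
n_target = max(1, round(len(lsoas) * 0.20))
targets = lsoas.nlargest(n_target, "deprivation_index")

# Count targeted LSOAs per ward to see where a programme would concentrate
print(targets["LSOA_Code"].tolist())
print(targets["ward"].value_counts().to_dict())
```

Working at LSOA level and only then summarising by ward keeps small pockets of deprivation visible that a ward-level average could hide.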

#### Conclusion

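The spatial analysis of socio-economic deprivation in Leeds provides valuable insights for social welfare stakeholders seeking to target interventions and allocate resources. Deprivation is concentrated in the inner city and in eastern and south-eastern areas such as Burmantofts, Gipton and Harehills, Killingbeck and Seacroft, and Richmond Hill, so a support programme could prioritise the LSOAs in the most deprived class in these areas. By understanding where deprivation is geographically concentrated, stakeholders can design tailored programmes that address the specific needs of disadvantaged communities, ultimately promoting inclusive development and social equity.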


## References

Brewer, C.A. and Pickle, L. 2002. Evaluation of methods for classifying epidemiological data on choropleth maps in series. Annals of the Association of American Geographers. 92(4), pp.662-681.

Buckley, A. 2012. Make maps people want to look at: five primary design principles for cartography. ArcUser. Winter, pp.46-51.

Burrows, R. 1999. Residential mobility and residualisation in social housing in England. Journal of Social Policy. 28(1), pp.27-52.

Carstairs, V. and Morris, R. 1990. Deprivation and health in Scotland. Health Bulletin. 48(4), pp.162-175.

Chambers, J.M. 1983. Graphical methods for data analysis. Chapman and Hall/CRC.

Friedman, S. and Rosenbaum, E. 2004. Nativity status and racial/ethnic differences in access to quality housing: Does homeownership bring greater parity? Housing Policy Debate. 15(4), pp.865-901.

Friendly, M. 2002. Corrgrams: Exploratory displays for correlation matrices. The American Statistician. 56(4), pp.316-324.

Harrower, M. and Brewer, C.A. 2003. ColorBrewer.org: an online tool for selecting colour schemes for maps. The Cartographic Journal. 40(1), pp.27-37.

Higgs, G., Langford, M. and Norman, P. 2015. Accessibility to sport facilities in Wales: A GIS-based analysis of socio-economic variations in provision. Geoforum. 62, pp.105-120.

Lee Rodgers, J. and Nicewander, W.A. 1988. Thirteen ways to look at the correlation coefficient. The American Statistician. 42(1), pp.59-66.

Noble, M., Wright, G., Smith, G. and Dibben, C. 2006. Measuring multiple deprivation at the small-area level. Environment and Planning A. 38(1), pp.169-185.

Norman, P. 2010. Identifying change over time in small area socio-economic deprivation. Applied Spatial Analysis and Policy. 3, pp.107-138.

Pampalon, R., Hamel, D., Gamache, P. and Raymond, G. 2009. A deprivation index for health planning in Canada. Chronic Diseases in Canada. 29(4), pp.178-191.

Slocum, T.A., McMaster, R.B., Kessler, F.C. and Howard, H.H. 2022. Thematic cartography and geovisualization. CRC Press.

Talen, E. and Anselin, L. 1998. Assessing spatial equity: an evaluation of measures of accessibility to public playgrounds. Environment and Planning A. 30(4), pp.595-613.

Townsend, P. 1987. Deprivation. Journal of Social Policy. 16(2), pp.125-146.

Townsend, P., Phillimore, P. and Beattie, A. 1988. Health and deprivation: inequality and the North. Routledge.

Vyas, S. and Kumaranayake, L. 2006. Constructing socio-economic status indices: how to use principal components analysis. Health Policy and Planning. 21(6), pp.459-468.

Wilkinson, L. 2012. The grammar of graphics. Springer.