Utilising a Python script to investigate deprivation locations and their relationships to football violence in England

First, we import the external packages that the script relies on. Together, they provide the data handling, geometry, raster and plotting functions needed for the analysis.

In [None]:
import pandas as pd
import folium
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt
import rasterio as rio
import rasterio.features
import numpy as np
from matplotlib.patches import Patch

Next, we load the England deprivation areas shapefile, which gives an insight into the locations with varying levels of deprivation. For an initial insight, we define deprived_locations, which sorts the locations by deprivation rank (RAvgRank) in ascending order, so that the most deprived location (rank 1) comes first.

In [None]:
locations = gpd.read_file('data_files/England_Deprivation_Areas.shp')
deprived_locations = locations.sort_values(by='RAvgRank')

In [None]:
print(deprived_locations.head(152))

Running print(deprived_locations.head(152)) requests the first 152 rows, but pandas truncates long DataFrames when printing them, so only the first 5 and last 5 locations are displayed. A quick initial analysis shows that 4 of the most deprived locations are in the North of England, with 3 of these in the North West. By contrast, the least deprived locations are in the South of England.
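To make the sort order and the truncated view explicit, here is a minimal sketch on a small made-up table. The Name column and its values are hypothetical; only RAvgRank comes from the shapefile. Sorting in ascending order puts rank 1 first, head and tail then pick out the two extremes directly, and pd.option_context can lift the display limit temporarily when every row is wanted.

```python
import pandas as pd

# Hypothetical sample standing in for the deprivation attribute table;
# the 'Name' column and all values are illustrative only.
sample = pd.DataFrame({
    'Name': ['Area A', 'Area B', 'Area C', 'Area D', 'Area E', 'Area F'],
    'RAvgRank': [4, 1, 6, 2, 5, 3],
})

# Ascending sort: rank 1 (the most deprived) comes first
ranked = sample.sort_values(by='RAvgRank')

n = 2  # the notebook's truncated print corresponds to n = 5
most_deprived = ranked.head(n)    # lowest ranks, i.e. most deprived
least_deprived = ranked.tail(n)   # highest ranks, i.e. least deprived
print(most_deprived)
print(least_deprived)

# To print every row instead of the truncated view, remove the row limit temporarily
with pd.option_context('display.max_rows', None):
    print(ranked)
```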

Now, we create the stadiums data from a CSV file in the data_files folder. We read the file into a pandas DataFrame, build a point geometry for each stadium from its longitude and latitude columns, and convert the result into a GeoDataFrame. Once the geometry is set up, we set the coordinate reference system to EPSG:4326 (WGS 84 longitude/latitude).

In [None]:
df = pd.read_csv('data_files/Stadiums_FBO.csv')
df['geometry'] = [Point(lon, lat) for lon, lat in zip(df['LONGITUDE'], df['LATITUDE'])]
stadiums = gpd.GeoDataFrame(df, geometry='geometry')
stadiums.crs = 'EPSG:4326'
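geopandas also offers points_from_xy, which builds the point geometries from the two coordinate columns in a single call and lets the CRS be passed to the GeoDataFrame constructor rather than assigned afterwards. A minimal sketch, assuming two made-up rows; only the LONGITUDE and LATITUDE column names come from Stadiums_FBO.csv:

```python
import pandas as pd
import geopandas as gpd

# Hypothetical rows; only the column names match Stadiums_FBO.csv
df = pd.DataFrame({
    'LONGITUDE': [-2.0, -0.1],
    'LATITUDE': [53.5, 51.5],
})

# points_from_xy takes x (longitude) first, then y (latitude)
stadiums = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['LONGITUDE'], df['LATITUDE']),
    crs='EPSG:4326',
)
print(stadiums)
```

The argument order matters: passing latitude first would silently swap the axes and place every stadium in the wrong part of the world.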

After running this cell, we can then run the next cell to show the data that was created.

In [None]:
stadiums.head(20)

Here we can see the stadium data for the 20 current Premier League teams, including the attributes used throughout the analysis.

Reviewing the deprivation areas data, the individual deprivation ranks can be seen. To produce a clearer analysis, we can derive a new shapefile with an extra column that groups the ranks into three bands: the most deprived areas (ranks 1 to 50), the moderately deprived areas (ranks 51 to 100) and the least deprived areas (ranks 101 to 151). Each location's category is written to this new column, and the result is saved in the data_files folder. These are the first steps towards rasterizing the vector data from the data_files folder.

In [None]:
locations_categorised = gpd.read_file('data_files/England_Deprivation_Areas.shp')

# Define boundaries for each category
most_deprived_boundary = (1, 50)
moderately_deprived_boundary = (51, 100)
least_deprived_boundary = (101, 151)

# Create a new column to store the deprivation category
locations_categorised['DepCat'] = ''

# Assign each location to its corresponding category
for idx, row in locations_categorised.iterrows():
    deprivation_rank = row['RAvgRank']
    if deprivation_rank >= most_deprived_boundary[0] and deprivation_rank <= most_deprived_boundary[1]:
        locations_categorised.at[idx, 'DepCat'] = 'Most Deprived'
    elif deprivation_rank >= moderately_deprived_boundary[0] and deprivation_rank <= moderately_deprived_boundary[1]:
        locations_categorised.at[idx, 'DepCat'] = 'Moderately Deprived'
    else:
        locations_categorised.at[idx, 'DepCat'] = 'Least Deprived'

# Save the updated shapefile
locations_categorised.to_file('data_files/Categorised_England_Deprivation.shp')
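The row-by-row loop can also be written as a vectorised expression with numpy.select, which evaluates each condition across the whole column at once. A sketch on a few hypothetical ranks; with the real data, the same call could be assigned to locations_categorised['DepCat'], using locations_categorised['RAvgRank'] in place of ranks.

```python
import numpy as np
import pandas as pd

# Hypothetical ranks standing in for the RAvgRank column
ranks = pd.Series([1, 50, 51, 100, 101, 151])

conditions = [
    ranks.between(1, 50),     # inclusive on both ends, like the loop's >= / <= tests
    ranks.between(51, 100),
]
choices = ['Most Deprived', 'Moderately Deprived']

# Ranks matching neither condition fall through to the default,
# mirroring the loop's else branch
dep_cat = np.select(conditions, choices, default='Least Deprived')
print(dep_cat)
```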

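We then use rasterio to define a raster grid over the data. The resolution sets the size of each cell (0.01, in the units of the data's coordinate reference system, i.e. degrees if the data uses EPSG:4326), and the affine transform anchors the grid at the top-left (north-west) corner of the deprivation areas' bounding box.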

In [None]:
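# Cell size in the units of the data's CRS (degrees if the data uses EPSG:4326)
resolution = 0.01

# from_origin expects the top-left corner of the grid: (west, north)
west, south, east, north = locations.total_bounds
affine_tfm = rasterio.transform.from_origin(west, north, resolution, resolution)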

Next, we can plot the categorised map. Note that this plot draws the categorised polygons directly with geopandas and matplotlib; it does not use the affine transform defined above. Using matplotlib, we first set the figure size. The outline of England is plotted from the deprivation areas data, and the categorised areas (Most Deprived, Moderately Deprived and Least Deprived) are layered on top. We then overlay the stadiums data; for the points to land in the right places, both layers must share the same coordinate reference system. Finally, we add the legend to the map.

In [None]:
from matplotlib.lines import Line2D

# Plot the categorised deprivation map
fig, ax = plt.subplots(figsize=(10, 8))

# Plot the outline of the England deprivation areas
locations_categorised.plot(ax=ax, edgecolor='black', facecolor='none')

# Plot the categorised areas and build matching legend handles
handles = []
labels = []
for category, color in zip(['Most Deprived', 'Moderately Deprived', 'Least Deprived'], ['red', 'yellow', 'green']):
    locations_categorised[locations_categorised['DepCat'] == category].plot(ax=ax, color=color, alpha=0.5)
    handles.append(Patch(color=color, alpha=0.5))
    labels.append(category)

# Overlay the stadiums, reprojected to the areas' CRS so the two layers line up
stadiums.to_crs(locations_categorised.crs).plot(ax=ax, color='navy', markersize=20)

# The custom legend replaces automatic labels, so add a marker handle for the stadiums
handles.append(Line2D([], [], marker='o', color='navy', linestyle='none'))
labels.append('Stadiums')

# Add the custom legend
ax.legend(handles, labels, loc='upper right')

ax.set_title('England Deprivation Locations with Stadiums')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
plt.show()

And so, the categorised map is produced from the original vector data. It shows where each stadium is located relative to the deprivation categories. To quantify how many stadiums fall into each category, we can count them and plot the counts on a bar graph, using the cell below.

In [None]:
# Match each stadium to the deprivation area it falls within (spatial join).
# Both layers must share a CRS, so the stadiums are reprojected first.
stadiums_in_areas = gpd.sjoin(
    stadiums.to_crs(locations_categorised.crs),
    locations_categorised[['DepCat', 'geometry']],
    how='left',
    predicate='within',
)

categories = ['Most Deprived', 'Moderately Deprived', 'Least Deprived']

# Count stadiums per category, keeping categories with no stadiums as 0
counts = (
    stadiums_in_areas['DepCat']
    .value_counts()
    .reindex(categories, fill_value=0)
    .tolist()
)

# Plot the counts on a bar graph
plt.figure(figsize=(8, 6))
bars = plt.bar(categories, counts, color=['red', 'yellow', 'green'])
plt.xlabel('Deprivation Category')
plt.ylabel('Number of Stadiums')
plt.title('Count of Stadiums in Deprivation Categories')

# Add counts on top of each bar
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2.0, height, '%d' % int(height), ha='center', va='bottom')

plt.show()
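Counting stadiums per deprivation category is a point-in-polygon problem, which geopandas can solve with a spatial join (gpd.sjoin). A minimal sketch on toy layers (hypothetical squares and points, not the real data), where the expected counts are easy to check by eye: two points fall in the left square, one in the right square, and no area is labelled Moderately Deprived.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy layers, both in the same CRS
areas = gpd.GeoDataFrame(
    {'DepCat': ['Most Deprived', 'Least Deprived']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs='EPSG:4326',
)
points = gpd.GeoDataFrame(
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5), Point(1.5, 0.5)],
    crs='EPSG:4326',
)

categories = ['Most Deprived', 'Moderately Deprived', 'Least Deprived']

# Each point picks up the DepCat of the polygon that contains it
joined = gpd.sjoin(points, areas, how='left', predicate='within')

# reindex keeps empty categories in the result with a count of 0
counts = joined['DepCat'].value_counts().reindex(categories, fill_value=0)
print(counts)
```

With how='left', a point outside every polygon keeps a missing DepCat and is simply not counted, which makes stray coordinates easy to spot by comparing the total with the number of points.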