Utilising a Python script to investigate deprivation locations and their relationships to football violence in England

First, we import the external packages that the script depends on. These provide the data handling, spatial and plotting 
functions needed for our analysis. 

In [None]:
import pandas as pd
import folium
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt
import rasterio as rio
import rasterio.features
import numpy as np
from matplotlib.patches import Patch

Next, we access the England deprivation areas shapefile. This will be utilised to provide an insight into the locations with varying levels of deprivation. To gain 
an initial insight, we will define deprived_locations, which sorts the locations by deprivation rank in ascending order, so that the location ranked 1 comes first.

In [None]:
def load_deprivation_data(filepath):
    """
    Load shapefile containing England deprivation areas and create a sorted GeoDataFrame.

    Parameters:
    - filepath (str): The file path to the shapefile containing England deprivation areas.

    Returns:
    - GeoDataFrame: A GeoDataFrame containing deprivation areas.

    This function reads the shapefile containing England deprivation areas.
    It creates a GeoDataFrame containing the deprivation areas and sorts it by deprivation rank.
    """

In [None]:
locations = gpd.read_file('data_files/England_Deprivation_Areas.shp')
deprived_locations = locations.sort_values(by='RAvgRank')

In [None]:
print(deprived_locations.head(152))

After running print(deprived_locations.head(152)), we can see the first 5 and last 5 locations. An initial quick analysis shows that 4 of the most deprived locations are in the North of England, with 3 located in the North West of England. By comparison, the least deprived locations are in the South of England.

Now, we will create the Stadiums data using a csv file which is located in the data_files folder. To do this, we will utilise a pandas DataFrame to generate
our point data. We will also define the geometry using the longitude and latitude data in the csv file. After setting up the geometry, we will set the coordinate reference system to EPSG:4326.

In [None]:
def create_geo_dataframe_from_csv(file_path):
    """
    Create a GeoDataFrame from a CSV file containing stadium data.

    Parameters:
    - file_path (str): The file path to the CSV file containing stadium data.
    Returns:
    - GeoDataFrame: A GeoDataFrame containing the Stadiums data. 

    This function reads the stadiums data from a CSV file.
    It converts longitude and latitude coordinates into Shapely Point geometries and creates a GeoDataFrame with these geometries. 
    The resulting GeoDataFrame is then assigned to the EPSG:4326 CRS.
    """

In [None]:
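# Read the stadium attributes, including the longitude and latitude columns
df = pd.read_csv('data_files/Stadiums_FBO.csv')
# Build a Shapely Point (x=longitude, y=latitude) for each stadium
df['geometry'] = [Point(lon, lat) for lon, lat in zip(df['LONGITUDE'], df['LATITUDE'])]
# Create the GeoDataFrame and declare the coordinates as EPSG:4326
stadiums = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')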

After running this cell, we can then run the next cell to show the data that was created.

In [None]:
stadiums.head(20)

Here we can see the stadiums data for the 20 current Premier League teams, including the columns that we will use throughout the analysis.

Reviewing the deprivation areas data, the individual ranks for deprivation can be seen. However, to produce an appropriate analysis, we can modify the shapefile to produce a new shapefile. In the new shapefile, we create a new column that sorts the deprivation ranks into thirds. The first contains the most deprived areas, with ranks between 1 and 50. The second contains the moderately deprived areas, with ranks between 51 and 100. The third contains the least deprived areas, with ranks between 101 and 151. Each area is assigned its category in the new column, and the result is saved into the data files folder. This is the initial step towards rasterizing the vector data from the data files folder.

In [None]:
def categorise_deprivation_areas():
    """
    Categorise the England deprivation areas based on their deprivation rank.

    This function reads a shapefile containing England deprivation areas and categorises each area based on its deprivation rank. Three categories are defined: 
    'Most Deprived', 'Moderately Deprived' and 'Least Deprived'. The function assigns each area to one of these categories and adds a new column 'DepCat' 
    to the GeoDataFrame to store the data. The categorised GeoDataFrame is then saved to a new shapefile.
    """

In [None]:
# Load a fresh copy of the deprivation areas
locations_categorised = gpd.read_file('data_files/England_Deprivation_Areas.shp')

# Inclusive (lowest rank, highest rank) bounds for each third
most_deprived_boundary = (1, 50)
moderately_deprived_boundary = (51, 100)
least_deprived_boundary = (101, 151)

# Column that will hold the category label
locations_categorised['DepCat'] = ''

# Assign a category to each area based on its average deprivation rank
for idx, row in locations_categorised.iterrows():
    deprivation_rank = row['RAvgRank']
    if most_deprived_boundary[0] <= deprivation_rank <= most_deprived_boundary[1]:
        locations_categorised.at[idx, 'DepCat'] = 'Most Deprived'
    elif moderately_deprived_boundary[0] <= deprivation_rank <= moderately_deprived_boundary[1]:
        locations_categorised.at[idx, 'DepCat'] = 'Moderately Deprived'
    else:
        # Every remaining rank falls here (101 to 151 for this dataset)
        locations_categorised.at[idx, 'DepCat'] = 'Least Deprived'

# Save the categorised areas as a new shapefile
locations_categorised.to_file('data_files/Categorised_England_Deprivation.shp')
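
The same classification can be written as a small standalone function, which makes the boundaries easier to check. This is a sketch rather than the notebook's own code: the name `categorise_rank` is hypothetical, and, unlike the loop above, it raises an error for ranks outside 1 to 151 instead of labelling them 'Least Deprived'.

```python
# Inclusive rank bounds for each category, taken from the thirds described above
BOUNDARIES = [
    ('Most Deprived', 1, 50),
    ('Moderately Deprived', 51, 100),
    ('Least Deprived', 101, 151),
]


def categorise_rank(rank):
    """Return the deprivation category for a rank between 1 and 151."""
    for label, low, high in BOUNDARIES:
        if low <= rank <= high:
            return label
    raise ValueError(f'rank {rank} is outside the expected range 1 to 151')


# Check the edges of each third
for rank in (1, 50, 51, 100, 101, 151):
    print(rank, categorise_rank(rank))
```

With the GeoDataFrame loaded, a function like this could be applied with locations_categorised['RAvgRank'].apply(categorise_rank).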

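We then utilise rasterio to prepare the raster grid. We set the resolution to our desired cell size and build an affine transform, which sets the coordinates of the raster by mapping its rows and columns to map coordinates.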

In [None]:
def create_affine_transform(locations, resolution):
    """
    Create an affine transformation for raster data.

    Parameters:
    - locations (GeoDataFrame): A GeoDataFrame containing the locations.
    - resolution (float): The resolution of the raster data to be made.

    Returns:
    - affine_tfm (Affine): An affine transformation matrix.

    This function creates an affine transformation matrix for raster data based on the total bounds
    of the provided locations GeoDataFrame and the specified resolution.
    """

In [None]:
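resolution = 0.01
# total_bounds is (minx, miny, maxx, maxy); from_origin expects the top-left corner (west, north)
minx, miny, maxx, maxy = locations.total_bounds
affine_tfm = rasterio.transform.from_origin(minx, maxy, resolution, resolution)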
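
To make the role of the transform concrete, the sketch below reproduces in plain Python the mapping that a north-up affine transform of this kind describes: column and row indices are converted to x and y coordinates, starting from the top-left corner of the grid. The function name `pixel_to_coords` and the example origin values are illustrative assumptions, not part of rasterio or of the data used here.

```python
def pixel_to_coords(col, row, west, north, xsize, ysize):
    """Return the map coordinates of the top-left corner of a raster cell.

    Assumes a north-up grid: x grows with the column index and
    y shrinks with the row index, starting from (west, north).
    """
    x = west + col * xsize
    y = north - row * ysize
    return x, y


# Hypothetical origin and cell size, chosen only for illustration
west, north = -6.0, 56.0
cell_size = 0.5

print(pixel_to_coords(0, 0, west, north, cell_size, cell_size))
print(pixel_to_coords(2, 4, west, north, cell_size, cell_size))
```

The first cell sits exactly on the origin, and moving down the rows decreases y, which is why the origin must be the northern edge of the bounding box rather than the southern one.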

Next we can plot the new rasterized map that has been created. Using matplotlib, we first set the figure size. Then the outline of England is plotted from the deprivation areas data. Next, we add the categorised areas which were previously created: Most Deprived, Moderately Deprived and Least Deprived. After this, we overlay the stadiums data, which plots the stadiums onto the map at their coordinates. Finally, we add the legend to the map.

In [None]:
def plot_deprivation_map(locations_categorised, stadiums):
    """
    Plot a map of England deprivation areas with the stadium locations.

    Parameters:
    - locations_categorised (GeoDataFrame): A GeoDataFrame containing categorised deprivation areas.
    - stadiums (GeoDataFrame): A GeoDataFrame containing stadium locations.

    This function generates a visualisation of England deprivation areas with categorised polygons plotted
    in different colors representing different levels of deprivation. It overlays stadium locations on the map.
    The function uses Matplotlib to create the plot, with custom legend and styling options applied.
    The resulting map is displayed to visualise the spatial distribution of deprivation areas and stadium locations.
    """

In [None]:
plt.figure(figsize=(10, 8))


locations_categorised.plot(ax=plt.gca(), edgecolor='black', facecolor='none')


handles = []
labels = []
# Shade each category and build a matching legend entry
for category, color in zip(['Most Deprived', 'Moderately Deprived', 'Least Deprived'], ['red', 'yellow', 'green']):
    locations_categorised[locations_categorised['DepCat'] == category].plot(ax=plt.gca(), color=color, alpha=0.5)
    handles.append(Patch(color=color, alpha=0.5))
    labels.append(category)


stadiums.plot(ax=plt.gca(), color='navy', markersize=20, label='Stadiums')  


plt.legend(handles, labels, loc='upper right')


plt.title('Stadiums Located in Deprivation Locations')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

And so, the raster map is created from the initially used vector data. The new raster visually represents the stadiums and the deprivation category in which each one falls. To quantify the number of stadiums that fall into each category, we can plot the data onto a graph. This can be created using the function below.

In [None]:
def plot_stadium_counts_by_category(locations_categorised, stadiums):
    """ 
    Produce a bar plot to outline the number of stadiums that fall into each category. 

    Parameters:
    - locations_categorised (GeoDataFrame): A GeoDataFrame containing categorised deprivation areas.
    - stadiums (GeoDataFrame): A GeoDataFrame containing stadium locations.

    This function produces a bar plot which calculates the counts of the stadiums which are within each category.
    The bar plot presents the results visually for each deprivation category. The counts are then plotted onto the
    bar plot using Matplotlib, with customisation of the bars. The resulting bar plot is displayed for further analysis 
    of the results generated. 
    """

In [None]:
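# Reproject the stadiums to the CRS of the deprivation areas so the geometries are comparable
stadiums_projected = stadiums.to_crs(locations_categorised.crs)

# Attach to each stadium the category of the area that contains it
# (predicate= needs geopandas 0.10 or later; older versions use op=)
stadiums_categorised = gpd.sjoin(stadiums_projected, locations_categorised[['DepCat', 'geometry']], how='left', predicate='within')

most_deprived_count = (stadiums_categorised['DepCat'] == 'Most Deprived').sum()
moderately_deprived_count = (stadiums_categorised['DepCat'] == 'Moderately Deprived').sum()
least_deprived_count = (stadiums_categorised['DepCat'] == 'Least Deprived').sum()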

categories = ['Most Deprived', 'Moderately Deprived', 'Least Deprived']
counts = [most_deprived_count, moderately_deprived_count, least_deprived_count]


plt.figure(figsize=(8, 6))
bars = plt.bar(categories, counts, color=['red', 'yellow', 'green'])
plt.xlabel('Deprivation Category')
plt.ylabel('Number of Stadiums')
plt.title('Count of Stadiums in Deprivation Categories')


for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2.0, height, '%d' % int(height), ha='center', va='bottom')

plt.show()

We first define the deprivation counts and calculate them for representation on the graph. Next, we plot the counts onto the graph, setting the colours to match the deprivation categories seen in the raster map. To ensure the count is clear on the graph, we add the count for each category on top of its bar. The resulting graph shows us that more football stadiums are located in the more deprived areas.
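
The counting step can also be illustrated without any spatial libraries. The sketch below assumes each stadium has already been given a category label, and counts the labels in a fixed category order so that a category with no stadiums still appears with a count of zero. The helper name `count_by_category` is hypothetical and the list `stadium_categories` is made-up placeholder data, not the results of this analysis.

```python
from collections import Counter

CATEGORIES = ['Most Deprived', 'Moderately Deprived', 'Least Deprived']


def count_by_category(labels, categories=CATEGORIES):
    """Count labels per category, keeping the given category order."""
    tally = Counter(labels)
    return [tally.get(category, 0) for category in categories]


# Placeholder labels for illustration only (not real results)
stadium_categories = ['Most Deprived', 'Least Deprived', 'Most Deprived']
counts = count_by_category(stadium_categories)
print(dict(zip(CATEGORIES, counts)))
```

Keeping the order fixed means the counts always line up with the red, yellow and green bar colours, even when one category is empty.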

To further build upon the results of our graphs, we can utilise folium to generate an interactive map which displays both the deprivation data and the stadium data.

To begin with, we define the map for the locations data which was previously defined. Utilising .explore allows us to create an interactive map. We pass the average rank column, RAvgRank, as the value used to colour the map. This ensures that as we navigate the interactive map, each location is displayed according to its deprivation rank.

In [None]:
dep_m = locations.explore('RAvgRank', cmap='viridis')
dep_m

Next, we add the stadiums data onto the interactive map. We first iterate over each row in stadiums. For each row, we generate the popup information that will display when clicking on the marker, which provides the key details for that stadium. We then create the marker and customise it, in this case with a football icon, to ensure it is appropriate for the interactive map.

In [None]:
def add_stadium_markers_to_map(stadiums, deprived_locations, dep_m):
    """
    Add stadium markers with popup content to an interactive Folium map.

    Parameters:
    - stadiums (DataFrame): DataFrame containing stadium information.
    - deprived_locations (DataFrame): DataFrame containing deprivation locations.
    - dep_m (folium.Map): Folium map object to which stadium markers will be added.

    This function iterates over rows in the stadiums DataFrame and 
    creates popup content for each stadium. It then adds a marker 
    for each stadium with the corresponding popup content to the interactive map. 
    """

In [None]:
for idx, row in stadiums.iterrows():

    popup_content = f"<b>Stadium:</b> {row['STADIUM']}<br>"
    popup_content += f"<b>Location:</b> {row['CTYUA19NM']}<br>"
    popup_content += f"<b>Club:</b> {row['CLUB']}<br>"
    popup_content += f"<b>Banning Orders (2019):</b> {row['BANNING_ORDERS_19']}<br>"
    popup_content += f"<b>Banning Orders (2022):</b> {row['BANNING_ORDERS_22']}<br>"
   


    folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=popup_content,
        icon = folium.Icon(icon='futbol', prefix='fa', color='red')
    ).add_to(dep_m)


dep_m

The interactive map now allows us to navigate the data as we wish. We can display the deprivation information alongside the stadiums data, which crucially includes the football banning order counts from 2019 and 2022 that are key to our analysis.
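
Because each popup is assembled as an HTML string, a stadium or club name containing characters such as & or < would be interpreted as markup. The sketch below shows one way to build the same popup text from a row while escaping the values with the standard library's `html.escape`. The helper name `build_popup` and the sample row are illustrative assumptions; the column names are those used in the loop above.

```python
from html import escape

# (label shown in the popup, column name in the stadiums data)
POPUP_FIELDS = [
    ('Stadium', 'STADIUM'),
    ('Location', 'CTYUA19NM'),
    ('Club', 'CLUB'),
    ('Banning Orders (2019)', 'BANNING_ORDERS_19'),
    ('Banning Orders (2022)', 'BANNING_ORDERS_22'),
]


def build_popup(row):
    """Return popup HTML for one stadium, escaping every value."""
    return ''.join(
        f'<b>{label}:</b> {escape(str(row[column]))}<br>'
        for label, column in POPUP_FIELDS
    )


# Made-up row for illustration only
sample_row = {
    'STADIUM': 'Example Park',
    'CTYUA19NM': 'Exampleshire',
    'CLUB': 'Example & District FC',
    'BANNING_ORDERS_19': 0,
    'BANNING_ORDERS_22': 0,
}
print(build_popup(sample_row))
```

Since a row from stadiums.iterrows() supports the same row[column] lookup as a dictionary, a helper like this could be passed each row in place of the repeated f-strings.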

Next, we will write a function that plots the football banning order data from 2019 for each club in our stadiums data. This will help inform our analysis.

In [None]:
def plot_banning_orders_2019(df):
    """
    Plot the number of banning orders per club in 2019.

    Parameters:
    - df (DataFrame): DataFrame containing banning order information.

    This function extracts club names and the corresponding number of banning orders 
    in 2019 from the DataFrame, df. It then generates a bar plot to visualise the 
    distribution of banning orders across the individual clubs. The resulting plot provides 
    insight into the prevalence of banning orders among different clubs in 2019.
    """

In [None]:
clubs = df['CLUB'].tolist()  
banning_orders_19 = df['BANNING_ORDERS_19']


plt.figure(figsize=(10, 6))
bars = plt.bar(clubs, banning_orders_19, color='skyblue')
plt.xlabel('Clubs')
plt.ylabel('Banning Orders 2019')
plt.title('Banning Orders in 2019')
plt.xticks(rotation=45, ha='right')

for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2.0, height, '%d' % int(height), ha='center', va='bottom')

plt.tight_layout()
plt.show()

The bar plot that has been produced provides an insight into the conditions of violence in 2019 for the different stadium locations, which can provide a wider reflection on the community and its issues with violence.

Similarly, we can write an equivalent function to produce a bar plot representing the data from 2022. We expect a similar output, but displaying the data from 2022 instead of 2019.

In [None]:
def plot_banning_orders_2022(df):
    """
    Plot the number of banning orders per club in 2022.

    Parameters:
    - df (DataFrame): DataFrame containing banning order information.

    This function extracts club names and the corresponding number of banning orders 
    in 2022 from the DataFrame, df. It then generates a bar plot to visualise the 
    distribution of banning orders across the individual clubs. The resulting plot provides 
    insight into the prevalence of banning orders among different clubs in 2022.
    """

In [None]:
clubs = df['CLUB'].tolist()  
banning_orders_22 = df['BANNING_ORDERS_22']


plt.figure(figsize=(10, 6))
bars = plt.bar(clubs, banning_orders_22, color='red')
plt.xlabel('Clubs')
plt.ylabel('Banning Orders 2022')
plt.title('Banning Orders in 2022')
plt.xticks(rotation=45, ha='right')

for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2.0, height, '%d' % int(height), ha='center', va='bottom')

plt.tight_layout()
plt.show()

Now that we have created individual bar plots for the years 2019 and 2022, we can begin to compare and contrast the results. Our next function simplifies this task by overlaying the 2019 and 2022 bar plots, which allows us to see the differences in banning orders between the two years. Firstly, we plot the results from 2019 and 2022, giving each year a different colour to help us distinguish the results. After this, we add the counts above each club's bars to show the exact number of banning orders for each year.

In [None]:
def plot_comparison_banning_orders(clubs, banning_orders_19, banning_orders_22):
    """
    Plot a comparison of banning orders between 2019 and 2022 for different clubs.

    Parameters:
    - clubs (list): List of club names.
    - banning_orders_19 (list): List of banning orders counts for 2019.
    - banning_orders_22 (list): List of banning orders counts for 2022.

    This function generates a bar plot to compare the number of banning orders 
    for each club between 2019 and 2022. The resulting plot provides a visual 
    comparison of banning order trends between the two years.
    """

In [None]:
plt.figure(figsize=(10, 6))
bars_2019 = plt.bar(clubs, banning_orders_19, color='skyblue', label='2019')

bars_2022 = plt.bar(clubs, banning_orders_22, color='red', alpha=0.5, label='2022')

plt.xlabel('Clubs')
plt.ylabel('Banning Orders')
plt.title('Comparison of Banning Orders in 2019 and 2022')
plt.xticks(rotation=45, ha='right')
plt.legend()

for bar in bars_2019:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2.0, height, '%d' % int(height), ha='center', va='bottom')

for bar in bars_2022:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2.0, height, '%d' % int(height), ha='center', va='bottom')

plt.tight_layout()
plt.show()
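
Overlaid bars show the two years together, but the size and direction of the change for each club has to be read off by eye. As a complementary sketch, the function below computes the change between the two years and sorts the clubs by it. The function name `banning_order_changes` and the club names and counts are placeholders for illustration, not values from the dataset.

```python
def banning_order_changes(clubs, orders_19, orders_22):
    """Return (club, change) pairs sorted from largest increase to largest decrease."""
    changes = [
        (club, after - before)
        for club, before, after in zip(clubs, orders_19, orders_22)
    ]
    return sorted(changes, key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only
example_clubs = ['Club A', 'Club B', 'Club C']
example_19 = [10, 4, 7]
example_22 = [6, 9, 7]

for club, change in banning_order_changes(example_clubs, example_19, example_22):
    print(f'{club}: {change:+d}')
```

With the notebook's own variables, the call would take clubs, banning_orders_19 and banning_orders_22 as defined in the earlier cells.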

Finally, with each function complete, we can continue with our analysis. The results provide insights into the relationship between football violence and deprivation. There are also evident disparities across the "stereotypical" North and South divide within England. We can now utilise these results to build upon previous findings and research into this subject matter.