# Student Details

***

**Name:** Hoai Nhan Nguyen <br>
**Student Number:** sba24098 <br>
**Course:** Higher Diploma in Science in Artificial Intelligence Applications 

***

# Assessment Task

Students are advised to review and adhere to the submission requirements documented after the assessment task.

# Assessment details

You are required to use the attached dataset “mye22final.xlsx”, which contains population estimates for the UK, England, Wales, Scotland, and Northern Ireland for the years 2011 and 2022.

This data is licensed under the Open Government Licence v3.0.

https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

Create an interactive Dashboard aimed at younger adults (18 - 35 years).

The dashboard may be created using any Pythonic method within a Jupyter Notebook. Python files may be used after they have been generated from the Jupyter Notebook or as helper files, but the main work MUST be completed in Jupyter Notebook. (You may, if you wish, host the dashboard using Streamlit or similar.) The dashboard may be viewed in HTML format, within the notebook, or as a standalone application based on the techniques covered in class (Tkinter, Panel, Dash, Jupyter-Dash, etc.). All final code and files MUST be uploaded to Moodle and progress shown in your GitHub Classroom commits.

<b>YOU MAY NOT USE POWERBI, TABLEAU, LOOKER, or any other generalized tool for this assignment.</b>

The Dashboard must detail the following requirements:

* Be Interactive and include Hover functionality.
* Contain at least 2 Visualizations.
* Display population densities by Geographic location.
* Allow the user to select and display population densities by age AND by Gender.
* Display a comparison between the 2011 figures and the 2022 figures.

(70 Marks)

Discuss in detail your rationale and justification for all stages of data preparation for your visualizations and how your dashboard is designed with this demographic (younger adults (18 - 35 years)) in mind.  

(30 marks)
***

## Data Analysis and Preprocessing

**Importing libraries**

In [None]:
import pandas as pd

**Loading Excel file and printing sheet names**

In [None]:
# Load the Excel file
xls = pd.ExcelFile("data/mye22final.xlsx")

# Print the names of all sheets
print(xls.sheet_names)

**Reviewing the MYEB1 and MYE5 sheets in the Excel file**

In [None]:
df_uk_population = pd.read_excel('data/mye22final.xlsx', sheet_name='MYEB1')
df_uk_density = pd.read_excel('data/mye22final.xlsx', sheet_name='MYE5', skiprows=7)

In [None]:
# Check the head of the dataset
df_uk_population.head()

In [None]:
# Check the head of the dataset
df_uk_density.head()

**Checking the data types of the dataframes**

In [None]:
df_uk_population.info()

In [None]:
df_uk_density.info()

**Adjusting the column names for df_uk_density dataframe**

In [None]:
df_uk_density.columns = [
    'code',
    'name',
    'geography',
    'area_sq_km',
    'population_2022',
    'person_per_sq_km_2022',
    'population_2011',
    'person_per_sq_km_2011'
]


**Checking for null values and duplicate rows in the dataframes**

In [None]:
def checking_null_and_duplicate(df):
    """Print whether the DataFrame contains any null values or duplicate rows."""
    # Check for null values
    if df.isnull().values.any():
        print("There is at least one null value in the DataFrame.")
    else:
        print("There are no null values in the DataFrame.")

    # Check for duplicate rows
    if df.duplicated().any():
        print("There is at least one duplicate row in the DataFrame.\n")
    else:
        print("There are no duplicate rows in the DataFrame.\n")

# Checking MYEB1 Sheet
print("== MYEB1 Sheet ==")
checking_null_and_duplicate(df_uk_population)

# Checking MYE5 Sheet
print("== MYE5 Sheet ==")
checking_null_and_duplicate(df_uk_density)
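
The check above only reports whether any null or duplicate exists. As a minimal sketch, the hypothetical helper `summarise_quality` below also reports which columns contain nulls and how many rows are duplicated. The toy DataFrame is invented for illustration and is not taken from `mye22final.xlsx`.

```python
import pandas as pd


def summarise_quality(df: pd.DataFrame) -> dict:
    """Return null counts for columns that have nulls, plus the duplicate row count."""
    null_counts = df.isnull().sum()
    return {
        "nulls_per_column": null_counts[null_counts > 0].to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy data (assumption): one repeated row and one missing population value
toy = pd.DataFrame({
    "code": ["A1", "A2", "A2", "A3"],
    "population": [100.0, 200.0, 200.0, None],
})

report = summarise_quality(toy)
print(report)
```

The same helper could be called on `df_uk_population` and `df_uk_density` to see where any flagged values sit before deciding how to handle them.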



**Checking the number of unique values in each column**

In [None]:
df_uk_population.nunique()


In [None]:
df_uk_density.nunique()

**Checking Unique values for the column "name".**

In [None]:
df_uk_population['name'].unique()

In [None]:
df_uk_density['name'].unique()

**Checking Unique values for the column "geography".**

In [None]:
df_uk_population['geography'].unique()

In [None]:
df_uk_density['geography'].unique()

# Interactive Visualisation One - Geographical Data Choropleth Map

**Importing libraries**

In [None]:
import plotly.express as px
import streamlit as st
import json

**Splitting the df_uk_density dataframe into 2022 and 2011**

In [None]:
# Defining the columns based on the information from 2022
df_uk_density_2022 = df_uk_density[['code', 'name','geography','area_sq_km','population_2022','person_per_sq_km_2022']]

# Changing the name for the columns
df_uk_density_2022 = df_uk_density_2022.rename(columns={
    'population_2022': 'population',
    'person_per_sq_km_2022': 'person_per_sq_km'
})

# Defining the columns based on the information from 2011
df_uk_density_2011 = df_uk_density[['code', 'name','geography','area_sq_km','population_2011','person_per_sq_km_2011']]

# Changing the name for the columns
df_uk_density_2011 = df_uk_density_2011.rename(columns={
    'population_2011': 'population',
    'person_per_sq_km_2011': 'person_per_sq_km'
})
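
The two selections above follow the same pattern, so they could be expressed once. The sketch below assumes the renamed column layout shown earlier; `density_for_year` and `BASE_COLUMNS` are hypothetical names, and the toy row contains made-up values.

```python
import pandas as pd

# Columns shared by both years (taken from the renamed df_uk_density layout)
BASE_COLUMNS = ["code", "name", "geography", "area_sq_km"]


def density_for_year(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Select the columns for one year and drop the year suffix from their names."""
    year_columns = {
        f"population_{year}": "population",
        f"person_per_sq_km_{year}": "person_per_sq_km",
    }
    return df[BASE_COLUMNS + list(year_columns)].rename(columns=year_columns)


# Toy row with invented values, used only to show the resulting shape
toy_density = pd.DataFrame({
    "code": ["X1"],
    "name": ["Example District"],
    "geography": ["Unitary Authority"],
    "area_sq_km": [50.0],
    "population_2022": [12000],
    "person_per_sq_km_2022": [240.0],
    "population_2011": [10000],
    "person_per_sq_km_2011": [200.0],
})

toy_2011 = density_for_year(toy_density, 2011)
toy_2022 = density_for_year(toy_density, 2022)
print(toy_2011)
```

Both results share the same column names, which is what lets the dashboard switch between years without changing the plotting code.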


**Defining the geographical data location**

**Creating the interactive choropleth map**

In [None]:
# List of geography types to keep for the visualisation
geographies = [
    'Unitary Authority',
    'London Borough',
    'Metropolitan District',
    'Non-metropolitan District',
    'Council Area',
    'Local Government District'
]

# Filtering the dataframes to only use the above geography types
df_uk_density_2022 = df_uk_density_2022[df_uk_density_2022['geography'].isin(geographies)]
# Saving the dataframe as a CSV file
df_uk_density_2022.to_csv("data/uk_density_2022.csv", index=False)

# Filtering the dataframes to only use the above geography types
df_uk_density_2011 = df_uk_density_2011[df_uk_density_2011['geography'].isin(geographies)]
# Saving the dataframe as a CSV file
df_uk_density_2011.to_csv("data/uk_density_2011.csv", index=False)
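
A choropleth can only colour an area when the value in `code` matches a feature identifier in the GeoJSON file. Because the boundary file used later is dated May 2024 while the estimates are for 2011 and 2022, it may be worth checking for codes with no matching feature. The sketch below is one possible check; `unmatched_codes` is a hypothetical helper, and the toy GeoJSON and codes are invented.

```python
import pandas as pd


def unmatched_codes(df: pd.DataFrame, geojson: dict, property_name: str = "LAD24CD") -> list:
    """Return the sorted codes in df that have no feature with a matching property."""
    feature_codes = {
        feature["properties"][property_name] for feature in geojson["features"]
    }
    return sorted(set(df["code"]) - feature_codes)


# Toy inputs (assumption): the GeoJSON covers X1 and X2 only
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"LAD24CD": "X1"}, "geometry": None},
        {"type": "Feature", "properties": {"LAD24CD": "X2"}, "geometry": None},
    ],
}
toy_codes = pd.DataFrame({"code": ["X1", "X2", "X3"]})

missing = unmatched_codes(toy_codes, toy_geojson)
print(missing)
```

Any codes returned would appear as blank areas on the map, so an empty list is the result to aim for.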


In [1]:
%%writefile uk_population_map.py

# Importing Library for uk_population_map.py
import pandas as pd
import plotly.express as px
import streamlit as st
import json

# Loading the dataset based on the uk density for 2022 and 2011
df_uk_density_2022 = pd.read_csv("data/uk_density_2022.csv")
df_uk_density_2011 = pd.read_csv("data/uk_density_2011.csv")

# Loading the GeoJSON file for the UK boundaries
with open("data/Local_Authority_Districts_May_2024_Boundaries_UK.geojson", "r") as f:
    geojson = json.load(f)

# Streamlit user input options
st.sidebar.title('Options')
year = st.sidebar.selectbox("Select Year:", [2011, 2022])

# Load the appropriate DataFrame based on the selected year
if year == 2011:
    df_selected = df_uk_density_2011
else:
    df_selected = df_uk_density_2022

# Create the choropleth map
fig = px.choropleth_map(
    df_selected,
    geojson=geojson,
    locations='code',
    color='person_per_sq_km',
    featureidkey="properties.LAD24CD",
    color_continuous_scale=px.colors.sequential.Plasma,
    center={"lat": 55.09, "lon": -4.03},
    custom_data=['name','code','population', 'area_sq_km', 'person_per_sq_km'],
    labels={'person_per_sq_km': 'Population<br>Density'},
    zoom=4
)

# Updating the hover template for the choropleth map.
fig.update_traces(
    hovertemplate="""
    <br><b>%{customdata[0]}</b><br>
    <br><b>Code: </b> %{customdata[1]}
    <br><b>Population: </b> %{customdata[2]:,.0f}
    <br><b>Area (km²): </b> %{customdata[3]:,.0f}
    <br><b>People per km²: </b> %{customdata[4]:,.0f}<br>
    """
)
    
# Updating layout for the choropleth map.
fig.update_layout(
    title={
        'text': f"UK Population Density by Local Authority District - {year}",
        'y': 0.95,  # Slightly lower than default
        'x': 0.0,
        'xanchor': 'left',
        'yanchor': 'top'
    },
    hoverlabel=dict(
        bgcolor="white",
        font_size=14,
        font_color="grey",
    ),
    margin={"r": 150, "t": 50, "l": 0, "b": 0}
)

# Display the choropleth map
st.plotly_chart(fig, use_container_width=True)

# Execute 'streamlit run uk_population_map.py' on the terminal

Overwriting uk_population_map.py


# Interactive Visualisation Two - Population Data Bar Chart

In [None]:
# List of geography types to keep for the visualisation
geographies = [
    'Unitary Authority',
    'London Borough',
    'Metropolitan District',
    'Non-metropolitan District',
    'Council Area',
    'Local Government District'
]

# Filtering the dataframes to only use the above geography types
df_uk_population = df_uk_population[df_uk_population['geography'].isin(geographies)]
# Saving the dataframe as a CSV file
df_uk_population.to_csv("data/df_uk_population.csv", index=False)

In [2]:
%%writefile uk_population_bar_chart.py

# Importing Library for uk_population_bar_chart.py
import pandas as pd
import plotly.express as px
import streamlit as st

# Loading the dataset based on the uk population
df_uk_population = pd.read_csv("data/df_uk_population.csv")

# Streamlit user input options
st.sidebar.title('Options')
selected_year = st.sidebar.selectbox("Select Year", options=["2011", "2022"], index=1)
selected_gender = st.sidebar.radio("Select Gender", options=["All", "M", "F"], index=0)

# Selecting appropriate population column based on year
if selected_year == "2022":
    df_uk_population["population"] = df_uk_population["population_2022"]
else:
    df_uk_population["population"] = df_uk_population["population_2011"]

# Filtering by sex when Male or Female is selected ("All" keeps both)
if selected_gender in ["M", "F"]:
    df_uk_population = df_uk_population[df_uk_population["sex"] == selected_gender]

# Summing the population for each age
df_selected = df_uk_population.groupby("age")["population"].sum().reset_index()

# groupby sorts the result by age by default, so no explicit sort is needed

# Colour mapping based on gender
gender_color_map = {
    "M": "#1f77b4",   # blue
    "F": "#e377c2",   # pink
    "All": "#7f7f7f"  # grey
}
# Setting the bar colour based on the gender
bar_color = gender_color_map.get(selected_gender)

# String mapping based on gender
gender_map = {"M": "Male", "F": "Female"}
# Setting the title based on the gender
gender_title = gender_map.get(selected_gender, "All Genders")


# Creating the bar chart
fig = px.bar(
    df_selected,
    x="age",
    y="population",
    labels={"age": "Age", "population": "Population"},
    title=f"UK Population by Age - {gender_title} ({selected_year})",
    hover_data={"age": True, "population": True}
)

# Updating layout for the bar chart
fig.update_layout(bargap=0.2)

# Updating colour for the bar chart
fig.update_traces(marker_color=bar_color)

# Display the bar chart
st.plotly_chart(fig, use_container_width=True)

# Execute 'streamlit run uk_population_bar_chart.py' on the terminal

Overwriting uk_population_bar_chart.py
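
The year and gender selection in the script can be tried outside Streamlit on a small table. The sketch below assumes the MYEB1 column names used above (`sex`, `age`, `population_2011`, `population_2022`); `population_by_age` is a hypothetical helper, and the toy figures are invented.

```python
import pandas as pd


def population_by_age(df: pd.DataFrame, year: str, gender: str = "All") -> pd.DataFrame:
    """Sum the chosen year's population per age, optionally for one sex only."""
    if gender in ("M", "F"):
        df = df[df["sex"] == gender]
    column = f"population_{year}"
    return (
        df.groupby("age", as_index=False)[column]
        .sum()
        .rename(columns={column: "population"})
    )


# Toy data with invented figures for two ages and both sexes
toy_population = pd.DataFrame({
    "sex": ["M", "F", "M", "F"],
    "age": [18, 18, 19, 19],
    "population_2011": [10, 12, 20, 22],
    "population_2022": [11, 13, 21, 23],
})

all_2022 = population_by_age(toy_population, "2022")
female_2011 = population_by_age(toy_population, "2011", "F")
print(all_2022)
print(female_2011)
```

Unlike the script, this version leaves the input DataFrame unchanged, which makes it easier to call repeatedly with different selections.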
