In [None]:
# https://youtu.be/rD94i2sV-Uw 

In [19]:
import dash
from dash import Dash, html, dcc, callback, Output, Input
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import json

In [20]:
# Main Dataset
data = pd.read_csv(r"C:\Users\ADMIN\Downloads\CP 321 - Data Visualization\Project\employment_data.csv")
data.columns = data.columns.str.replace(r'\.\d+$', '', regex=True)
data = data.set_index('Geography').transpose()
data.reset_index(inplace=True)
data.rename(columns={'index': 'Geography'}, inplace=True)
data_col = data.drop(columns=['Geography', 'Gender (3) 5 6'])
for col in data_col:
    data[col] = data[col].str.replace(',', '', regex=True)
    data[col] = pd.to_numeric(data[col], errors='coerce')
data = data.apply(lambda x: x.astype('int64') if x.dtype == 'float' else x)
data.columns = data.columns.str.strip()
data.head()

Geography,Geography.1,Gender (3) 5 6,Total - Occupation - Unit group - National Occupational Classification (NOC) 2021 12,Occupation - not applicable 13,All occupations 14,0 Legislative and senior management occupations,00 Legislative and senior managers,000 Legislative and senior managers,0001 Legislative and senior managers,00010 Legislators,...,"9510 Labourers in processing, manufacturing and utilities",95100 Labourers in mineral and metal processing,95101 Labourers in metal fabrication,95102 Labourers in chemical products processing and utilities,"95103 Labourers in wood, pulp and paper processing",95104 Labourers in rubber and plastic products manufacturing,95105 Labourers in textile processing and cutting,95106 Labourers in food and beverage processing,95107 Labourers in fish and seafood processing,"95109 Other labourers in processing, manufacturing and utilities"
0,Newfoundland and Labrador,Total,433955,165980,267975,2180,2180,2180,2180,75,...,4745,75,105,95,180,35,50,675,2585,950
1,Newfoundland and Labrador,Men,211180,74715,136465,1270,1270,1270,1270,25,...,3050,65,105,70,165,25,15,410,1520,670
2,Newfoundland and Labrador,Women,222775,91270,131510,915,915,915,915,45,...,1705,10,0,25,15,0,35,270,1065,285
3,Prince Edward Island,Total,126900,36600,90300,875,875,875,875,50,...,1705,10,20,55,25,0,20,585,645,340
4,Prince Edward Island,Men,61450,15335,46115,540,540,540,540,45,...,1080,10,20,35,25,0,10,395,365,220


## Main Dataset Overview
The primary dataset used in this dashboard is sourced from **Statistics Canada**, specifically the **Labour Force Survey (LFS)**, with occupations classified under the **National Occupational Classification (NOC) 2021** framework. 
It contains detailed employment counts by **occupation**, **province/territory**, and **gender**.

### Dataset Structure
Each row represents a unique combination of:
- **Geography**: One of the 13 Canadian provinces or territories.
- **Gender**: Men, Women, or Total.

Each **occupation** is a separate column, labelled with its **NOC 2021** code and title. The most detailed level is the **unit group** (5-digit codes).
  
The columns include:
- **Total employment** figures.
- **Breakdowns for over 500 occupational unit groups**, such as:
  - `00010 Legislators`
  - `21100 Physicists and astronomers`
  - `95106 Labourers in food and beverage processing`
- **Granularity**: The dataset goes down to very specific occupation titles nested within broader categories (e.g., "00 Legislative and senior managers" > "00010 Legislators").
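
The nesting described above can be read off the column names themselves, since the length of the leading code indicates the NOC 2021 level. The following is a minimal sketch, not part of the dashboard code; the helper name `noc_level` and the `NOC_LEVELS` mapping are introduced here for illustration only.

```python
import re

# NOC 2021 hierarchy levels, keyed by the number of digits in the code.
NOC_LEVELS = {
    1: "broad occupational category",
    2: "major group",
    3: "sub-major group",
    4: "minor group",
    5: "unit group",
}

def noc_level(column_name):
    """Return the NOC 2021 level implied by the leading numeric code, or None."""
    match = re.match(r"(\d{1,5})\s", column_name)
    if match is None:
        return None  # e.g. "All occupations 14" has no leading code
    return NOC_LEVELS[len(match.group(1))]

columns = [
    "0 Legislative and senior management occupations",
    "00 Legislative and senior managers",
    "00010 Legislators",
    "All occupations 14",
]
levels = {name: noc_level(name) for name in columns}
for name, level in levels.items():
    print(f"{name!r}: {level}")
```

A helper like this could be used to select only the 1-digit columns (as Task 2 does by name) or only the 5-digit unit groups.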

### Data Insights Enabled
- Allows fine-grained analysis of labor distribution by field and region.
- Supports gender-based comparisons within specific job groups.
- Useful for identifying occupational shortages or overrepresentations across provinces.
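
As an illustration of a gender-based comparison, the sketch below computes the share of women in one occupation column. It uses only the two Newfoundland and Labrador rows visible in the `data.head()` output above; the variable names `sample` and `by_gender` are assumptions made for this example.

```python
import pandas as pd

# Two rows copied from the data.head() output above (Newfoundland and Labrador).
occupation = "0 Legislative and senior management occupations"
sample = pd.DataFrame({
    "Geography": ["Newfoundland and Labrador", "Newfoundland and Labrador"],
    "Gender (3) 5 6": ["Men", "Women"],
    occupation: [1270, 915],
})

# One row per geography, one column per gender.
by_gender = sample.pivot(index="Geography", columns="Gender (3) 5 6", values=occupation)

# Share of women among men + women in this occupation.
by_gender["Women share"] = by_gender["Women"] / (by_gender["Men"] + by_gender["Women"])
print(by_gender.round(3))
```

The share is computed from the Men and Women rows rather than the Total row, because in the output above the published total (2180) differs slightly from the sum of the two (2185).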

### Source:
**Statistics Canada**  
- Table: Labour Force Survey data by occupation (NOC 2021), province/territory, and gender  
- URL: [https://www.statcan.gc.ca](https://www.statcan.gc.ca) *(insert direct link to the table if available)*

### Application in Dashboard
This dataset powers all four visualization tasks in the dashboard, enabling interactive analysis of essential services per capita, gender distribution by occupation, engineering manpower for EV production, and natural science workforce gaps.


In [6]:
#Task 1 Complementary Dataset
population = pd.read_csv(r"C:\Users\ADMIN\Downloads\CP 321 - Data Visualization\Project\canada_population.csv")
population.columns = population.columns.str.strip()
data_col_1 = population.columns.drop('Geography')
for col in data_col_1:
    population[col] = population[col].str.replace(',', '', regex=True)
    population[col] = pd.to_numeric(population[col], errors='coerce')
population['Average'] = population.iloc[:, 1:].mean(axis=1)
population['Average'] = population['Average'].round(0)
population['Average'] = population['Average'].astype('int64')
population = population.apply(lambda x: x.astype('Int64') if x.dtype == 'float' else x)
population.head()

Unnamed: 0,Geography,Q1 2023,Q2 2023,Q3 2023,Q4 2023,Average
0,Canada,39527986,39748878,40083484,40513781,39968532
1,Newfoundland and Labrador,535298,536686,538907,541000,537973
2,Prince Edward Island,170077,171856,173713,175871,172879
3,Nova Scotia,1040187,1047129,1056486,1064297,1052025
4,New Brunswick,820576,825417,832190,840578,829690


## Complementary Dataset: Quarterly Population Estimates
This secondary dataset provides **quarterly population estimates** for Canada and its provinces and territories, and is also sourced from Statistics Canada. It complements the occupational dataset by allowing **per capita analysis** of employment in essential services and other workforce categories.

### Dataset Structure
Each row contains:
- **Geography**: Canada or one of the provinces/territories
- **Population Estimates**: For **Q1 to Q4 of 2023**
- **Average**: The mean population across the four quarters

### Data Insights Enabled
- Used in **Task 1 (Essential Services per Capita)** to calculate accurate personnel-to-population ratios.
- Helps identify **relative resource availability** across regions.
- Supports more **equitable workforce planning** by normalizing values to population size.
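
The normalization can be sketched as follows. The quarterly figures are copied from the `population.head()` output above, while the `Workers` column holds hypothetical counts and the `per_100k` helper is an assumption introduced for this example (the dashboard itself stores the raw ratio).

```python
import pandas as pd

# Quarterly estimates copied from the population.head() output above.
population_sample = pd.DataFrame({
    "Geography": ["Newfoundland and Labrador", "Prince Edward Island"],
    "Q1 2023": [535298, 170077],
    "Q2 2023": [536686, 171856],
    "Q3 2023": [538907, 173713],
    "Q4 2023": [541000, 175871],
})
quarters = ["Q1 2023", "Q2 2023", "Q3 2023", "Q4 2023"]
population_sample["Average"] = (
    population_sample[quarters].mean(axis=1).round(0).astype("int64")
)

# Hypothetical worker counts, for illustration only.
population_sample["Workers"] = [5000, 2000]

def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 residents."""
    return count / population * 100_000

population_sample["Workers per 100k"] = per_100k(
    population_sample["Workers"], population_sample["Average"]
)
print(population_sample[["Geography", "Average", "Workers per 100k"]])
```

Expressing the ratio per 100,000 residents is one option for making small per-person fractions easier to read in hover labels and legends.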

### Source:
**Statistics Canada**  
- Table: Quarterly population estimates, Canada, provinces, and territories  
- URL: [https://www.statcan.gc.ca](https://www.statcan.gc.ca) *(insert direct link to the table if available)*
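
Because the per-capita ratios depend on joining the two tables on `Geography`, any spelling difference between them would silently drop a region from an inner merge. A hedged sketch of a pre-merge check is shown below; both frames are hypothetical, and the "Yukon" / "Yukon Territory" mismatch is deliberate and not a claim about the actual files.

```python
import pandas as pd

# Hypothetical frames; the spelling mismatch is deliberate.
employment = pd.DataFrame({
    "Geography": ["Ontario", "Yukon Territory"],
    "Total Nurses": [100, 10],
})
population_avg = pd.DataFrame({
    "Geography": ["Ontario", "Yukon"],
    "Average": [1000, 50],
})

# An outer merge with indicator=True records which side each row came from.
check = employment.merge(population_avg, on="Geography", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Geography", "_merge"]]
print(unmatched)
```

If `unmatched` is empty, every region appears in both tables and the inner merge loses nothing.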

In [7]:
# ========================
# Part 1: Task 1 Data Cleaning
# ========================
task1data = data[data['Gender (3) 5 6'] == 'Total'].copy()
task1data = task1data[[  
    'Geography',  
    '31301 Registered nurses and registered psychiatric nurses',  
    '31302 Nurse practitioners',  
    '32101 Licensed practical nurses',  
    '33102 Nurse aides, orderlies and patient service associates',  
    '40040 Commissioned police officers and related occupations in public protection services',  
    '41310 Police investigators and other investigative occupations',  
    '42100 Police officers (except commissioned)',  
    '40041 Fire chiefs and senior firefighting officers',  
    '42101 Firefighters'  
]].copy()

task1data["Total Nurses"] = (
    task1data["31301 Registered nurses and registered psychiatric nurses"] +
    task1data["31302 Nurse practitioners"] +
    task1data["32101 Licensed practical nurses"] +
    task1data["33102 Nurse aides, orderlies and patient service associates"]
)
task1data["Total Police"] = (
    task1data["40040 Commissioned police officers and related occupations in public protection services"] +
    task1data["41310 Police investigators and other investigative occupations"] +
    task1data["42100 Police officers (except commissioned)"]
)
task1data["Total Firefighters"] = (
    task1data["40041 Fire chiefs and senior firefighting officers"] +
    task1data["42101 Firefighters"]
)
task1data["Total"] = (
    task1data["Total Nurses"] +
    task1data["Total Police"] +
    task1data["Total Firefighters"]
)

task1data = task1data[[
    'Geography',
    'Total Nurses',
    'Total Police',
    'Total Firefighters',
    'Total'
]]

average_data = population[['Geography', 'Average']]
merged_data = task1data.merge(average_data, on="Geography")

merged_data["Nurses/Population"] = merged_data["Total Nurses"] / merged_data["Average"]
merged_data["Police/Population"] = merged_data["Total Police"] / merged_data["Average"]
merged_data["Firefighters/Population"] = merged_data["Total Firefighters"] / merged_data["Average"]
merged_data["Total/Population"] = merged_data["Total"] / merged_data["Average"]

merged_data = merged_data[[
    "Geography", 
    "Nurses/Population", 
    "Police/Population", 
    "Firefighters/Population", 
    "Total/Population"
]]

In [8]:
# ========================
# Part 2: Task 2 Data Cleaning
# ========================
task2data = data[data['Gender (3) 5 6'].isin(['Men', 'Women'])].copy()
columns_to_include = [
    'Geography',
    'Gender (3) 5 6',
    "0 Legislative and senior management occupations",
    "1 Business, finance and administration occupations",
    "2 Natural and applied sciences and related occupations",
    "3 Health occupations",
    "4 Occupations in education, law and social, community and government services",
    "5 Occupations in art, culture, recreation and sport",
    "6 Sales and service occupations",
    "7 Trades, transport and equipment operators and related occupations",
    "8 Natural resources, agriculture and related production occupations",
    "9 Occupations in manufacturing and utilities"
]
task2data = task2data[columns_to_include]

task2data = task2data.rename(columns={
    "0 Legislative and senior management occupations": "Mgmt",
    "1 Business, finance and administration occupations": "BFA",
    "2 Natural and applied sciences and related occupations": "STEM",
    "3 Health occupations": "Health",
    "4 Occupations in education, law and social, community and government services": "Ed/Soc",
    "5 Occupations in art, culture, recreation and sport": "Arts",
    "6 Sales and service occupations": "Sales",
    "7 Trades, transport and equipment operators and related occupations": "Trades",
    "8 Natural resources, agriculture and related production occupations": "Agri",
    "9 Occupations in manufacturing and utilities": "Mfg"
})
task2data = task2data.set_index('Geography').transpose()
task2data.reset_index(inplace=True)
task2data = task2data.rename(columns={'Geography': 'Occupation'})
territories = task2data.columns[1:].unique().tolist()

In [9]:
# ========================
# Part 3: Task 3 Data Cleaning
# ========================
task3data = data[data['Gender (3) 5 6'] == 'Total']
task3data = task3data[[  
    'Geography',
    '21310 Electrical and electronics engineers',
    '21311 Computer engineers (except software engineers and designers)',
    '2231 Technical occupations in electronics and electrical engineering',
    '21301 Mechanical engineers',
    '22301 Mechanical engineering technologists and technicians'
]].copy()

task3data["Computer"] = (task3data['21311 Computer engineers (except software engineers and designers)'])
task3data["Mechanical"] = (
    task3data['21301 Mechanical engineers'] +
    task3data['22301 Mechanical engineering technologists and technicians']
)
task3data["Electrical"] = (
    task3data['21310 Electrical and electronics engineers'] +
    task3data['2231 Technical occupations in electronics and electrical engineering']
)
task3data = task3data[[
    "Geography",
    "Computer",
    "Electrical",
    "Mechanical"
]]

In [10]:
# ========================
# Part 4: Task 4 Data Cleaning
# ========================
task4data = data[[
    'Geography',
    'Gender (3) 5 6',
    '21100 Physicists and astronomers', 
    '21101 Chemists', 
    '21102 Geoscientists and oceanographers', 
    '21103 Meteorologists and climatologists', 
    '21109 Other professional occupations in physical sciences',  
    '21110 Biologists and related scientists', 
    '21111 Forestry professionals', 
    '21112 Agricultural representatives, consultants and specialists', 
    '22110 Biological technologists and technicians', 
    '22111 Agricultural and fish products inspectors', 
    '22112 Forestry technologists and technicians', 
    '22113 Conservation and fishery officers', 
    '22114 Landscape and horticulture technicians and specialists',  
    '21331 Geological engineers', 
    '21332 Petroleum engineers', 
    '22101 Geological and mineral technologists and technicians', 
    '21210 Mathematicians, statisticians and actuaries', 
    '21211 Data scientists' 
]].copy()
task4data['Gender'] = (task4data['Gender (3) 5 6'])
task4data['Physical Sciences'] = (
    task4data['21100 Physicists and astronomers'] + 
    task4data['21101 Chemists'] + 
    task4data['21102 Geoscientists and oceanographers'] + 
    task4data['21103 Meteorologists and climatologists'] + 
    task4data['21109 Other professional occupations in physical sciences']
)
task4data['Biological Sciences'] = (
    task4data['21110 Biologists and related scientists'] + 
    task4data['21111 Forestry professionals'] + 
    task4data['21112 Agricultural representatives, consultants and specialists'] + 
    task4data['22110 Biological technologists and technicians'] + 
    task4data['22111 Agricultural and fish products inspectors'] + 
    task4data['22112 Forestry technologists and technicians'] + 
    task4data['22113 Conservation and fishery officers'] + 
    task4data['22114 Landscape and horticulture technicians and specialists']
)
task4data['Geophysical Sciences'] = (
    task4data['21331 Geological engineers'] + 
    task4data['21332 Petroleum engineers'] + 
    task4data['22101 Geological and mineral technologists and technicians']
)
task4data['Mathematical Sciences'] = (
    task4data['21210 Mathematicians, statisticians and actuaries'] + 
    task4data['21211 Data scientists']
)
province_abbreviations = {
    'Newfoundland and Labrador': 'NL',
    'Prince Edward Island': 'PE',
    'Nova Scotia': 'NS',
    'New Brunswick': 'NB',
    'Quebec': 'QC',
    'Ontario': 'ON',
    'Manitoba': 'MB',
    'Saskatchewan': 'SK',
    'Alberta': 'AB',
    'British Columbia': 'BC',
    'Yukon Territory': 'YT',
    'Northwest Territories': 'NT',
    'Nunavut': 'NU'
}
task4data['Geography'] = task4data['Geography'].replace(province_abbreviations)

In [12]:
# ========================
# Part 5: GeoJSON Loading
# ========================
with open(r"C:\Users\ADMIN\Downloads\CP 321 - Data Visualization\Project\canada.geojson") as f:
    canada_geojson = json.load(f)

In [15]:
# ===================== Task 1 App: Essential Services Choropleth =====================
app1 = dash.Dash(__name__, title="Task 1 - Essential Services Choropleth")

app1.layout = html.Div([
    html.H1("Distribution of Essential Services in Canada", style={'textAlign': 'center'}),
    dcc.Dropdown(
        id='dropdown',
        options=[
            {"label": "Nurses", "value": "Nurses/Population"},
            {"label": "Police", "value": "Police/Population"},
            {"label": "Firefighters", "value": "Firefighters/Population"},
            {"label": "Total", "value": "Total/Population"},
        ],
        value="Total/Population",  
        clearable=False,
        style={'width': '50%', 'margin': 'auto', 'display': 'block'}
    ),
    dcc.Graph(id='choropleth-map')
])

@app1.callback(
    Output("choropleth-map", "figure"),
    Input("dropdown", "value")
)
def update_map(column):
    fig = px.choropleth(
        merged_data,
        geojson=canada_geojson,
        featureidkey="properties.name",  
        locations='Geography',
        color=column,
        hover_name="Geography",
        title=f"Essential Services Distribution: {column}",
        color_continuous_scale='Reds',
        scope="north america", 
    )
    fig.update_geos(fitbounds='geojson', visible=False)
    return fig
    
if __name__ == '__main__':
    app1.run(debug=True)

### **Task 1**
#### Goal:
To determine whether essential service workers (nurses, police, firefighters) are uniformly distributed across provinces and territories.

#### Visualization Used:
- Choropleth Map (interactive)
- Dropdown widget to switch between views (Nurses, Police, Firefighters, Total)

#### Insights:
The choropleth map shows essential service workers per capita for each province and territory, helping users spot disparities. For example, if a territory like Nunavut shows a lower “Total/Population” ratio than Ontario, it may indicate a potential service gap. Such findings can inform resource reallocation or recruitment planning.

#### How it solves the question:
- Users can instantly compare provincial ratios
- Reveals imbalances in service personnel deployment
- Informs resource equity initiatives
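
`px.choropleth` matches each value in `locations` against the GeoJSON property named by `featureidkey` (here `properties.name`), and regions without a match are not drawn. The sketch below shows one way to list such mismatches before plotting. The helper `unmatched_locations` and the geometry-free `toy_geojson` are hypothetical stand-ins, not the contents of `canada.geojson`.

```python
def unmatched_locations(locations, geojson, key="name"):
    """Return the locations that have no feature with a matching properties[key]."""
    feature_names = {feature["properties"][key] for feature in geojson["features"]}
    return sorted(set(locations) - feature_names)

# Hypothetical, geometry-free stand-in for canada.geojson.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Ontario"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Yukon"}, "geometry": None},
    ],
}

missing = unmatched_locations(["Ontario", "Yukon Territory"], toy_geojson)
print(missing)
```

In the notebook, the same check could be run as `unmatched_locations(merged_data["Geography"], canada_geojson)`.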

In [16]:
# ===================== Task 2 App: Gender Distribution by Occupation =====================
app2 = dash.Dash(__name__, title="Task 2 - Gender Distribution by Occupation")

app2.layout = html.Div([
    html.H1("Gender Distribution by Occupations by Province", style={'textAlign': 'center'}),
    dcc.Dropdown(
        id='territory-dropdown',
        options=[{'label': t, 'value': t} for t in territories],
        value=territories[0],
        style={'width': '50%', 'margin': 'auto', 'display': 'block'}
    ),
    dcc.Graph(id='grouped-bar')
])

@app2.callback(
    Output('grouped-bar', 'figure'),
    Input('territory-dropdown', 'value')
)
def update_bar(territory):
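    # Both gender columns share the territory's name, so this selects
    # 'Occupation' plus the Men and Women columns for that territory.
    temp = task2data[['Occupation', territory]].copy()
    # The first row holds the gender labels; promote it to the header.
    temp.columns = temp.iloc[0]
    temp = temp.drop(0).reset_index(drop=True)
    temp = temp.melt(id_vars=["Gender (3) 5 6"], 
                     value_vars=["Men", "Women"],
                     var_name="Gender", 
                     value_name="Employment")
    temp = temp.rename(columns={'Gender (3) 5 6': 'Occupation'})
    # Transposing left the counts as object dtype; convert them back to numbers.
    temp["Employment"] = pd.to_numeric(temp["Employment"])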
    fig = px.bar(temp, 
                 x="Occupation", 
                 y="Employment",
                 color="Gender", 
                 barmode="group",
                 title=f"Gender Distribution Across Occupations in {territory}",
                 color_discrete_sequence=["#FF6F61", "#3E72A0"])
    fig.update_layout(xaxis=dict(tickangle=0))
    return fig
    
if __name__ == '__main__':
    app2.run(debug=True)

### **Task 2**   
#### Goal: 
To analyze gender representation in top-level NOC occupation categories across provinces and territories.

#### Visualization Used:
- Grouped Bar Chart  
- Dropdown to select a province/territory  

#### Insights:
This visualization highlights employment disparities between men and women across various occupational sectors. For instance, selecting Alberta may reveal a high concentration of men in “Trades, Transport and Equipment Operators,” while women may be more represented in “Business, Finance and Administration.” These patterns indicate where gender gaps exist and where targeted outreach or policy changes may be needed to foster workplace equality.

#### How it solves the question:
- Allows direct gender comparison within each occupation category  
- Identifies significant disparities across regions  
- Informs gender equity efforts and employment support programs  
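
The grouped bar chart needs one row per occupation and gender. A minimal sketch of that wide-to-long reshaping is shown below, together with one possible gap measure (women per 100 men). All counts are hypothetical, and the names `wide`, `long`, and `women_per_100_men` are assumptions for this example rather than part of the dashboard code.

```python
import pandas as pd

# Hypothetical counts for one province, in the shape task2data has
# after renaming and before transposing.
wide = pd.DataFrame({
    "Geography": ["Alberta", "Alberta"],
    "Gender (3) 5 6": ["Men", "Women"],
    "Trades": [300, 20],
    "BFA": [100, 250],
})

# Wide to long: one row per (gender, occupation) pair.
long = wide.melt(
    id_vars=["Geography", "Gender (3) 5 6"],
    var_name="Occupation",
    value_name="Employment",
).rename(columns={"Gender (3) 5 6": "Gender"})

# Women per 100 men in each occupation, as one possible gap measure.
pivoted = long.pivot(index="Occupation", columns="Gender", values="Employment")
women_per_100_men = (pivoted["Women"] / pivoted["Men"] * 100).round(1)
print(women_per_100_men)
```

A frame shaped like `long` can be passed directly to `px.bar` with `x="Occupation"`, `y="Employment"`, and `color="Gender"`.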

In [17]:
# ===================== Task 3 App: EV Manpower by Engineering =====================
app3 = dash.Dash(__name__, title="Task 3 - EV Manpower by Province")

app3.layout = html.Div([
    html.H1("Manpower Availability for EV Production by Province", style={'textAlign': 'center'}),
    dcc.Dropdown(
        id='bar-dropdown',
        options=[{'label': t, 'value': t} for t in territories],
        value=territories[0],
        style={'width': '50%', 'margin': 'auto', 'display': 'block'}
    ),
    dcc.Graph(id='barchart')
])

@app3.callback(
    Output('barchart', 'figure'),
    Input('bar-dropdown', 'value')
)
def update_barchart(province):
    tempo = task3data[task3data['Geography'] == province]
    tempo = tempo[["Computer", "Electrical", "Mechanical"]]
    tempo = tempo.T.reset_index()
    tempo.columns = ['Occupation', 'Value']
    fig = px.bar(tempo, 
                 x="Occupation", 
                 y="Value",
                 color="Occupation", 
                 title=f"Manpower Availability for EV Production in {province}",
                 color_discrete_sequence=px.colors.diverging.RdBu)
    return fig

if __name__ == '__main__':
    app3.run(debug=True)

### **Task 3**  
#### Goal: 
To let users explore and decide which province has enough **Computer, Mechanical, and Electrical engineers** to support an EV manufacturing facility.

#### Visualization Used:  
- Bar Chart  
- Dropdown to choose a province  

#### Insights:  
This visualization provides a snapshot of the engineering talent pool in each province. For example, Ontario may show strong numbers across all three engineering fields, making it a prime location for EV infrastructure development. Users can assess workforce readiness themselves, based on data rather than predefined thresholds.

#### How it solves the question:  
- Presents manpower levels in key engineering fields  
- Enables user-defined assessments of “readiness”  
- Supports strategic decision-making for industrial development  
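
A user-defined notion of "readiness" can also be expressed as minimum head counts per field. The sketch below is illustrative only: the counts, the province labels, the thresholds, and the helper `meets_thresholds` are all hypothetical.

```python
import pandas as pd

# Hypothetical counts in the shape of task3data.
engineers = pd.DataFrame({
    "Geography": ["Province A", "Province B", "Province C"],
    "Computer": [500, 50, 300],
    "Electrical": [900, 80, 100],
    "Mechanical": [700, 60, 400],
})

def meets_thresholds(frame, minimums):
    """Return the geographies whose counts reach every user-chosen minimum."""
    mask = pd.Series(True, index=frame.index)
    for field, minimum in minimums.items():
        mask &= frame[field] >= minimum
    return frame.loc[mask, "Geography"].tolist()

ready = meets_thresholds(
    engineers, {"Computer": 200, "Electrical": 200, "Mechanical": 200}
)
print(ready)
```

Here only "Province A" meets all three minimums, since "Province C" falls short in the Electrical field.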

In [18]:
# ===================== Task 4 App: Natural Science Graduates =====================
app4 = dash.Dash(__name__, title="Task 4 - Natural Science Graduates")

app4.layout = html.Div([
    html.H1("Distribution and Composition of Natural Science Graduates in Canada", style={'textAlign': 'center'}),
    dcc.Dropdown(
        id='stacked-dropdown',
        options=[
            {"label": "Men", "value": "Men"},
            {"label": "Women", "value": "Women"},
            {"label": "Total", "value": "Total"},
        ],
        value="Total",  
        clearable=False,
        style={'width': '50%', 'margin': 'auto', 'display': 'block'}
    ),
    dcc.Graph(id='stacked-bar')
])

@app4.callback(
    Output('stacked-bar', 'figure'),
    Input("stacked-dropdown", "value")
)
def update_stacked(gender):
    tempor = task4data[task4data['Gender'] == gender] 
    tempor = tempor[[
        'Geography',
        'Physical Sciences', 
        'Biological Sciences', 
        'Geophysical Sciences', 
        'Mathematical Sciences']]
    tempor = tempor.melt(id_vars=['Geography'], 
                         value_vars=['Physical Sciences', 'Biological Sciences', 'Geophysical Sciences', 'Mathematical Sciences'],
                         var_name='Degree', value_name='Count')
    tempor['Degree'] = tempor['Degree'].replace({
        'Physical Sciences': 'Physics',
        'Biological Sciences': 'Biology',
        'Geophysical Sciences': 'Geography',
        'Mathematical Sciences': 'Mathematics'
    })
    fig = px.bar(tempor, 
                 x="Geography", 
                 y="Count", 
                 color="Degree",
                 title=f"Distribution and Composition of Natural Science Graduates in Canada ({gender})",
                 color_discrete_sequence=px.colors.sequential.Turbo)
    return fig

if __name__ == '__main__':
    app4.run(debug=True)

### **Task 4**  
#### Goal: 
To explore the composition and distribution of Natural Science occupations across Canadian provinces and territories, and to compare them by gender. The ultimate aim is to identify which provinces lack workers in specific disciplines (**Physical, Biological, Geophysical, and Mathematical Sciences**) and so inform which university degrees should be prioritized for hiring or incentivization.

#### Visualization Used:  
- Stacked Bar Chart  
- Dropdown to filter by gender (Men, Women, Total)  

#### Insights:  
This visualization enables governments or institutions to assess whether provinces have sufficient talent in key scientific fields. For instance, a province like Saskatchewan may have fewer workers in Mathematical Sciences than British Columbia. Because the bars show absolute counts, comparisons between provinces of different sizes should take population into account. Combined with the gender filter, the chart may also highlight where gender gaps exist, e.g., fewer women in Physical Sciences in Alberta. This dual analysis helps prioritize recruitment or educational incentives tailored to specific needs.

#### How it solves the question:  
- Shows workforce composition in four core Natural Science disciplines  
- Allows gender-wise comparisons to assess diversity gaps  
- Reveals regional shortages, guiding degree-based hiring strategies and funding decisions  
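
Since the stacked bars show absolute counts, it can help to look at each discipline's share of a province's natural science workforce as well. The sketch below assumes hypothetical counts and introduces the names `sciences`, `shares`, and `smallest` for illustration.

```python
import pandas as pd

# Hypothetical counts, using the four discipline columns built for task4data.
sciences = pd.DataFrame({
    "Geography": ["SK", "BC"],
    "Physical Sciences": [100, 300],
    "Biological Sciences": [300, 400],
    "Geophysical Sciences": [500, 100],
    "Mathematical Sciences": [50, 200],
}).set_index("Geography")

# Divide each row by its row total so every province sums to 1.
shares = sciences.div(sciences.sum(axis=1), axis=0)

# Discipline with the smallest share in each province.
smallest = shares.idxmin(axis=1)
print(shares.round(3))
print(smallest)
```

The same `shares` frame could feed a 100% stacked bar chart, which makes composition comparable across provinces of very different sizes.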