In [19]:
import pandas as pd
import plotly.graph_objects as go
import nbformat  

# Path is relative to the notebook's working directory; adjust it if the CSV lives elsewhere
df = pd.read_csv("1314-012_traffic_violation_numbers_csv_2008-12_mod.csv",
    encoding="latin1")

# info() prints its summary directly and returns None, so no print() is needed
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1344 entries, 0 to 1343
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Geography    1344 non-null   object 
 1   Violations   1344 non-null   object 
 2   2008         1009 non-null   float64
 3   2009         1013 non-null   float64
 4   2010         1344 non-null   int64  
 5   2011         1344 non-null   int64  
 6   2012         1344 non-null   int64  
dtypes: float64(2), int64(3), object(2)
memory usage: 73.6+ KB


In [20]:
# Visualization 1

# Filter required rows in raw data
filtered = df[
    (df["Geography"].astype(str).str.strip() == "Ontario (7)") &
    (df["Violations "].astype(str).str.strip() == "Total, all violations")
].copy()

# Pick the year columns (2008–2012) and extract the values for the matching row as lists
year_cols = [c for c in filtered.columns if c.strip().isdigit()]  # "2008"..."2012"
years = [int(c) for c in year_cols]
totals = pd.to_numeric(filtered[year_cols].iloc[0], errors="coerce").tolist()

# Build interactive plot
fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=years,
        y=totals,
        mode="lines+markers",
        name="Total violations",
        hovertemplate="Year: %{x}<br>Total violations: %{y:,}<extra></extra>"
    )
)

# Format axes + title
fig.update_layout(
    title="Ontario: Total Traffic Violations (2008–2012)",
    xaxis_title="Year",
    yaxis_title="Total violations",
    template="plotly_white"
)

# X axis: show each year, no decimals
fig.update_xaxes(
    tickmode="linear",
    dtick=1,
    tickformat="d"
)

# Y axis: thousands separators
fig.update_yaxes(
    tickformat=","
)

fig.show()

In [21]:
fig.write_html("visualization1.html", include_plotlyjs="cdn")

>Who is your intended audience?
The intended audience includes Ontario policymakers, Ministry of Transportation officials, road safety analysts, and municipal planners.

>What information or message are you trying to convey with your visualization?
The visualization communicates the overall trend in total recorded traffic violations across Ontario between 2008 and 2012. The key message is whether traffic violations increased, decreased, or remained relatively stable during this period. Observing trends over time can help interpret changes in enforcement intensity, driver behaviour, or the impact of road safety initiatives.
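
To state the trend in numbers as well as in the chart, the year-over-year percentage change can be computed from the same `years` and `totals` lists. The following is a minimal sketch; the helper name `year_over_year_change` is hypothetical and the example values are illustrative, not taken from the dataset.

```python
import math


def _is_missing(value):
    """True for None or NaN, which pd.to_numeric(..., errors='coerce') can produce."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def year_over_year_change(years, totals):
    """Return (year, percent change from the previous year) pairs.

    The change is None when either value is missing or the previous value is 0.
    """
    changes = []
    for i in range(1, len(totals)):
        prev, curr = totals[i - 1], totals[i]
        if _is_missing(prev) or _is_missing(curr) or prev == 0:
            changes.append((years[i], None))
        else:
            changes.append((years[i], (curr - prev) / prev * 100))
    return changes


# Illustrative numbers only, NOT values from the traffic violations dataset
example_years = [2008, 2009, 2010]
example_totals = [1000, 1100, 990]
for year, pct in year_over_year_change(example_years, example_totals):
    print(year, None if pct is None else round(pct, 1))
```

A positive value indicates an increase over the previous year and a negative value a decrease, which makes "increased, decreased, or remained stable" checkable year by year.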

>What aspects of design did you consider when making your visualization? How did you apply them?
Chart Type Selection: A line chart was chosen because trends over time are best communicated using position along a common scale, which is perceptually strong.

Simplicity and Clarity: A single consistent line color was used to reduce cognitive load and avoid unnecessary distractions.

Markers: Circular markers highlight each discrete year without overcrowding the visual.

>How did you ensure that your data visualizations are reproducible? If the tool you used is not reproducible, how will this impact your data visualization?

Reproducibility was ensured by using open-source Python libraries (pandas and Plotly). Because the entire process is programmatic, anyone with the dataset and the script can reproduce the same chart. The code is commented to explain each step.
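
One way to keep the loading step portable across machines is to wrap it in a small function and point it at a path relative to the project. This is a sketch under assumptions: the function name `load_violations` and the `data/` folder are hypothetical, and stripping the column names means later code would refer to `"Violations"` without the trailing space.

```python
from pathlib import Path

import pandas as pd


def load_violations(csv_path):
    """Read the violations CSV and normalise its column names."""
    df = pd.read_csv(Path(csv_path), encoding="latin1")
    # Strip stray whitespace so a header such as "Violations " becomes "Violations"
    df.columns = [str(c).strip() for c in df.columns]
    return df


# Hypothetical relative location; adjust to wherever the file is actually saved
DATA_PATH = Path("data") / "1314-012_traffic_violation_numbers_csv_2008-12_mod.csv"
if DATA_PATH.exists():
    df = load_violations(DATA_PATH)
    print(df.columns.tolist())
```

Recording the installed pandas and Plotly versions alongside the notebook would further help others reproduce the same output.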


>How did you ensure that your data visualization is accessible?
High contrast between the line and background.

Clear axis titles and descriptive chart title. Avoidance of reliance on multiple colors (minimizing barriers for colorblind users). Interactive hover tooltips that display exact yearly values.

>Who are the individuals and communities who might be impacted by your visualization?

Ontario drivers, whose behaviours influence violation rates. Law enforcement agencies adjusting enforcement strategies. Insurance companies evaluating risk patterns.

>How did you choose which features of your chosen dataset to include or exclude from your visualization?

By filtering specifically for Geography = “Ontario (7)” and Violations = “Total, all violations”, the chart reflects aggregate provincial totals rather than offence-specific or regional figures.

>What ‘underwater labour’ contributed to your final data visualization product?
Inspecting the dataset structure and column names. Identifying the correct aggregate row. 
Filtering accurately for Ontario. Converting year columns to numeric values.
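
The row-selection and conversion steps listed above can be sketched as one helper that also checks that exactly one row matches, so a mistyped label fails loudly instead of raising an opaque `IndexError`. The function name `select_series` and the toy data are hypothetical; the column names follow the notebook, including the trailing space in `"Violations "`.

```python
import pandas as pd


def select_series(df, geography, violation, violation_col="Violations "):
    """Return (years, values) for the single row matching both labels."""
    mask = (
        (df["Geography"].astype(str).str.strip() == geography)
        & (df[violation_col].astype(str).str.strip() == violation)
    )
    rows = df[mask]
    if len(rows) != 1:
        raise ValueError(f"expected exactly 1 matching row, found {len(rows)}")
    # Year columns are the ones whose (stripped) name is all digits
    year_cols = [c for c in df.columns if str(c).strip().isdigit()]
    values = pd.to_numeric(rows[year_cols].iloc[0], errors="coerce").tolist()
    return [int(str(c).strip()) for c in year_cols], values


# Toy data for illustration only, NOT values from the dataset
toy = pd.DataFrame({
    "Geography": ["Ontario (7)", "Ontario (7) "],
    "Violations ": ["Total, all violations", "Some other offence"],
    "2008": [10.0, 1.0],
    "2009": [12.0, 2.0],
})
years, totals = select_series(toy, "Ontario (7)", "Total, all violations")
print(years, totals)
```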


In [22]:
# Visualization 2

# Filter for Canada (50) and exclude total row
filtered = df[
    (df["Geography"].astype(str).str.strip() == "Canada (50)") &
    (df["Violations "].astype(str).str.strip() != "Total, all violations")
].copy()

# Identify year columns
year_cols = [c for c in filtered.columns if c.strip().isdigit()]

# Convert to numeric
for col in year_cols:
    filtered[col] = pd.to_numeric(filtered[col], errors="coerce")

# Sum totals from 2008–2012
filtered["Total_2008_2012"] = filtered[year_cols].sum(axis=1)

# Select top 5
top5 = filtered.sort_values("Total_2008_2012", ascending=False).head(5)

# Palette
colors = ["#4C78A8", "#F58518", "#54A24B", "#E45756", "#72B7B2"]

# Create bar chart
fig = go.Figure()

fig.add_trace(
    go.Bar(
        x=top5["Total_2008_2012"],
        y=top5["Violations "],
        orientation="h",
        marker=dict(color=colors),
        hovertemplate="Offence: %{y}<br>Total (2008–2012): %{x:,}<extra></extra>"
    )
)

fig.update_layout(
    title="Top 5 Traffic Offences in Canada (2008–2012)",
    xaxis_title="Total Violations (5-Year Sum)",
    yaxis_title="Offence Type",
    template="plotly_white"
)

fig.update_xaxes(tickformat=",")
fig.update_yaxes(autorange="reversed")

fig.show()

In [23]:
fig.write_html("visualization2.html", include_plotlyjs="cdn")

>Who is your intended audience?

The intended audience includes federal transportation agencies and national road safety organizations.

>What information or message are you trying to convey with your visualization?

This visualization identifies the five most frequent traffic offences in Canada between 2008 and 2012.

>What aspects of design did you consider when making your visualization? How did you apply them?

Horizontal Bar Chart: Length comparison along a common axis is perceptually strong.
Sorted Ranking: Offences are ordered descending, allowing viewers to immediately interpret rank.
Limited Categories: Only the top five offences are displayed to prevent cognitive overload.
Interactive Hover: Tooltips display exact totals, improving transparency and usability.

>How did you ensure that your data visualizations are reproducible? If the tool you used is not reproducible, how will this impact your data visualization?

Reproducibility was ensured by using open-source Python libraries (pandas and Plotly). Because the entire process is programmatic, anyone with the dataset and the script can reproduce the same chart. The code is commented to explain each step.

>How did you ensure that your data visualization is accessible?

High contrast between bars and background. Color coding for bars. Clear axis labels and descriptive title. Horizontal layout for improved readability. Interactive hover tooltips that present formatted totals.

>Who are the individuals and communities who might be impacted by your visualization?

Drivers whose behaviour contributes to violation frequency. National road safety organizations. Federal policymakers.

>How did you choose which features of your chosen dataset to include or exclude from your visualization?

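The visualization focuses on Canada-wide statistics and keeps only the five offences with the highest 2008–2012 totals. The aggregate “Total, all violations” row is excluded so that it does not appear as an offence.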
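
The `df.info()` output shows that the 2008 and 2009 columns contain missing values, and `sum(axis=1)` skips missing values by default, so an offence with missing years is ranked on fewer years than the others. A stricter variant, sketched below, ranks only offences with all years present and reports how many years each row is missing. The function name `top_offences` and the toy data are hypothetical, and this is an alternative to the notebook's approach rather than a description of it.

```python
import pandas as pd


def top_offences(df, year_cols, n=5, label_col="Violations "):
    """Rank rows by their multi-year total, counting only complete rows."""
    out = df.copy()
    for col in year_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Number of years with no value for each row
    out["missing_years"] = out[year_cols].isna().sum(axis=1)
    # min_count makes the total NaN unless every year has a value
    out["total"] = out[year_cols].sum(axis=1, min_count=len(year_cols))
    # sort_values places NaN totals last by default
    ranked = out.sort_values("total", ascending=False)
    return ranked[[label_col, "total", "missing_years"]].head(n)


# Toy data for illustration only, NOT values from the dataset
toy = pd.DataFrame({
    "Violations ": ["A", "B", "C"],
    "2008": [5, None, 1],
    "2009": [5, 100, 1],
})
print(top_offences(toy, ["2008", "2009"], n=2))
```

In the toy data, offence "B" has the largest single-year value but is ranked last because its 2008 value is missing. Whether to exclude such rows or to sum the available years is a judgment call that should be stated next to the chart.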

>What ‘underwater labour’ contributed to your final data visualization product?

Inspecting the dataset structure and column names. Filtering accurately for Canada and excluding the aggregate “Total, all violations” row.
Converting year columns to numeric values and summing them across 2008–2012 to rank the offences.


>Link to dataset

https://files.ontario.ca/opendata/1314-012_traffic_violation_numbers_csv_2008-12.csv