# **Hands-On Activity 14 | Telling the Truth with Data Visualization**





---



Name : <br>
Course Code and Title : <br>
Date Submitted : <br>
Instructor :


---



**1. Objectives:**

This activity aims to demonstrate students' ability to visualize data truthfully and ethically. Students will identify missing or biased data, correct misleading visualizations, and apply techniques to ensure integrity in data presentation.

**2. Intended Learning Outcomes (ILOs):**

By the end of this activity, students should be able to:

1. Analyze datasets to detect missing values, errors, and biases.

2. Evaluate the accuracy and fairness of different data visualization designs.

3. Create ethical and truthful charts by correcting deceptive visualizations.

**3. Discussions:**

Telling the truth with data visualization means ensuring that every visual accurately represents the data and context without distortion.
Misleading charts can manipulate interpretation through poor scaling, selective data, or biased representation.

Missing Data and Data Errors:
Missing values or outliers can lead to incorrect conclusions if ignored. A visualization should either mark where data is missing or handle the gap with a stated method such as interpolation or removal.
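
As a minimal sketch of the two handling methods named above (removal and interpolation), the snippet below uses a small made-up series. The values and the variable names `sales`, `dropped`, and `filled` are illustrative assumptions, not part of the activity's dataset.

```python
import numpy as np
import pandas as pd

# Illustrative values only; the gap in 2020 is deliberate.
sales = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "Sales": [170.0, 200.0, np.nan, 240.0],
})

# Option 1: removal. The 2020 row disappears, leaving a gap in the timeline.
dropped = sales.dropna(subset=["Sales"])

# Option 2: linear interpolation. 2020 becomes the midpoint of 2019 and 2021.
filled = sales.assign(Sales=sales["Sales"].interpolate())

print(dropped["Year"].tolist())
print(filled["Sales"].tolist())
```

Removal keeps only observed values but hides a year from the timeline. Interpolation keeps the timeline intact but introduces a value that was never measured, so whichever method is used should be stated next to the chart.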

Biased Data:
Data can be biased through selection bias (only certain data is collected) or survivorship bias (failures or dropouts are excluded). Identifying these biases prevents misleading visuals.
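
To illustrate how excluding failures shifts a summary, here is a small sketch with hypothetical stores. The `stores` table, its column names, and all of its values are invented for this example.

```python
import pandas as pd

# Hypothetical stores: monthly revenue and whether the store is still open.
stores = pd.DataFrame({
    "Store": ["A", "B", "C", "D", "E", "F"],
    "Revenue": [90, 85, 40, 30, 95, 20],
    "StillOpen": [True, True, False, False, True, False],
})

# Survivor-only view: closed stores are silently excluded.
survivor_mean = stores.loc[stores["StillOpen"], "Revenue"].mean()

# Full view: every store that ever existed is counted.
full_mean = stores["Revenue"].mean()

print(f"Survivors only: {survivor_mean:.1f}")
print(f"All stores:     {full_mean:.1f}")
```

In this made-up case the survivor-only average is 90 while the average over all stores is 60, so a chart built only from stores that are still open would overstate typical performance.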

Adjusting for Inflation:
When comparing values over time (e.g., prices, income), data should be adjusted for inflation to reflect real value changes.
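
One common way to make this adjustment is with a price index: the real value in base-year money equals the nominal value multiplied by the base-year index and divided by that year's index. The sketch below assumes a hypothetical index with 2015 = 100; the index values are invented and do not come from any official source.

```python
import pandas as pd

# Hypothetical nominal prices and a hypothetical price index (2015 = 100).
prices = pd.DataFrame({
    "Year": [2015, 2020, 2024],
    "NominalPrice": [50.0, 63.0, 78.0],
    "PriceIndex": [100.0, 110.0, 125.0],
})

# Index value of the chosen base year.
base_index = prices.loc[prices["Year"] == 2015, "PriceIndex"].iloc[0]

# Real value in base-year money = nominal value * (base index / that year's index).
prices["RealPrice"] = prices["NominalPrice"] * base_index / prices["PriceIndex"]

print(prices.round(2))
```

With these assumed numbers the nominal price rises from 50 to 78, while the real price rises only from 50 to 62.4, which is the change a reader should compare across years.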

Deceptive Design:
Visualization design choices such as truncated axes, dual-axis charts, or selective time frames can distort perception. Ethical visualization maintains consistent scales and transparency.
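
One way to put a number on axis truncation is Edward Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data. The helper below is a sketch of that idea applied to bar heights; the function name `lie_factor` and the sample values are assumptions made for illustration.

```python
def lie_factor(first, last, axis_start):
    """Ratio of the change a bar chart displays to the change in the data."""
    # Bar heights as drawn: each bar starts at axis_start, not at zero.
    shown_first = first - axis_start
    shown_last = last - axis_start
    shown_effect = (shown_last - shown_first) / shown_first
    data_effect = (last - first) / first
    return shown_effect / data_effect

# Illustrative values: 190 -> 220 is roughly a 16% increase in the data.
print(round(lie_factor(190, 220, axis_start=0), 2))    # zero baseline
print(round(lie_factor(190, 220, axis_start=180), 2))  # truncated baseline
```

A value of 1 means the bars change in proportion to the data. With the baseline moved to 180, the last bar is drawn four times as tall as the first, and the factor comes out at about 19.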

**4. Procedures:**

Step 1: Import Libraries

In [1]:
!pip install pandas plotly numpy
import pandas as pd
import numpy as np
import plotly.express as px



Step 2: Create a Sample Dataset

This dataset simulates product prices, sales, and inflation across years.

In [2]:
# Sample data
years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)
df.head()

| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |


Step 3: Identify Missing Data and Errors

In [3]:
# Count missing values in each column
print("Missing Data per Column:")
print(df.isna().sum())

# Fill the missing 2020 Sales value by linear interpolation between its neighbours
df["Sales"] = df["Sales"].interpolate()
df

Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64


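| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |
| 5 | 2020 | 220.0 | 63 | 1.03 |
| 6 | 2021 | 240.0 | 65 | 1.02 |
| 7 | 2022 | 260.0 | 70 | 1.03 |
| 8 | 2023 | 290.0 | 75 | 1.02 |
| 9 | 2024 | 320.0 | 78 | 1.02 |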
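
The cell above counts missing values only. A sketch of a few additional validity checks follows, for data errors other than gaps. The rules in `checks` (no duplicate years, no negative sales, strictly positive prices) are assumptions chosen for illustration and should be adapted to the dataset at hand.

```python
import numpy as np
import pandas as pd

# Same columns as the sample dataset, rebuilt here so the snippet runs on its own.
df_check = pd.DataFrame({
    "Year": np.arange(2015, 2025),
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
})

# Hypothetical validity rules; each entry counts the rows that break a rule.
checks = {
    "duplicate_years": int(df_check["Year"].duplicated().sum()),
    "negative_sales": int((df_check["Sales"] < 0).sum()),
    "non_positive_price": int((df_check["Price"] <= 0).sum()),
    "missing_sales": int(df_check["Sales"].isna().sum()),
}
print(checks)
```

For the sample data only `missing_sales` is non-zero, which matches the single gap reported for 2020.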


Step 4: Adjust for Inflation

In [4]:
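# Adjust sales for inflation. InflationRate is treated here as a yearly
# multiplicative factor (e.g. 1.02), so its running product acts as a deflator.
df["Adjusted_Sales"] = df["Sales"] / df["InflationRate"].cumprod()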
fig = px.line(df, x="Year", y=["Sales", "Adjusted_Sales"],
              title="Sales Over Time (Adjusted for Inflation)",
              labels={"value": "Sales", "variable": "Metric"})
fig.show()
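
To make the `cumprod()` step easier to follow, the sketch below repeats the calculation for the first three rows of the sample data and keeps the running product in its own column. The names `check` and `Deflator` are introduced here for illustration only.

```python
import pandas as pd

# First three rows of the activity's sample dataset.
check = pd.DataFrame({
    "Year": [2015, 2016, 2017],
    "Sales": [120.0, 130.0, 150.0],
    "InflationRate": [1.02, 1.03, 1.01],
})

# Running product of the yearly factors: 1.02, 1.02*1.03, 1.02*1.03*1.01
check["Deflator"] = check["InflationRate"].cumprod()
check["Adjusted_Sales"] = check["Sales"] / check["Deflator"]

print(check.round(3))
```

Because the 2015 factor is part of the running product, even the first year is scaled down. If each factor is read as the price change from the previous year, the adjusted values are effectively expressed in the prices of the year before the series starts; dividing the deflator by its first value would instead use 2015 as the base year.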

Step 5: Demonstrate Deceptive Design

Bad Example (Truncated Axis):

In [5]:
bad_chart = px.bar(df, x="Year", y="Sales", title="Deceptive Chart (Truncated Axis)")
bad_chart.update_yaxes(range=[150, 350])  # starts too high
bad_chart.show()

Good Example (Honest Axis):

In [6]:
good_chart = px.bar(df, x="Year", y="Sales", title="Truthful Chart (Proper Scale)")
good_chart.update_yaxes(range=[0, 350])
good_chart.show()

**Task 1:** Handling Missing and Erroneous Data

Identify missing or inconsistent data points in your own dataset (or this one).

Apply at least one correction method (interpolation, imputation, or exclusion).

Visualize the corrected dataset.

In [9]:
#Code Here

years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)

print("Missing Data per Column:")
print(df.isna().sum())

df["Sales"] = df["Sales"].interpolate()

print("\nAfter Correction:")
print(df.isna().sum())

fig = px.line(df, x="Year", y="Sales",
              title="Sales Over Time (After Handling Missing Data)",
              labels={"Sales": "Sales (Corrected)"})
fig.show()


Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64

After Correction:
Year             0
Sales            0
Price            0
InflationRate    0
dtype: int64


**Task 2:** Detecting and Correcting Bias

Create or simulate a biased dataset (e.g., only showing top-performing products or regions).

1. Visualize the biased data.

2. Then, include the full dataset and create a truthful comparison chart.

3. Briefly explain how bias affected interpretation.

In [11]:
#Code Here

# Full dataset: sales for all eight regions
data = {
    "Region": ["North", "South", "East", "West", "Central", "Northeast", "Southeast", "Northwest"],
    "Sales": [320, 150, 210, 180, 130, 300, 280, 160]
}
df_full = pd.DataFrame(data)

# Biased subset: keep only the three best-performing regions
df_biased = df_full.nlargest(3, "Sales")

# Biased view: the weaker regions are left out of the chart
fig_biased = px.bar(df_biased, x="Region", y="Sales",
                    title="Biased View: Only Top 3 Regions Shown",
                    labels={"Sales": "Sales (in units)"})
fig_biased.show()

# Truthful view: every region is shown on the same scale
fig_full = px.bar(df_full, x="Region", y="Sales",
                  title="Truthful View: All Regions Included",
                  labels={"Sales": "Sales (in units)"})
fig_full.show()
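
As a possible starting point for the explanation asked for in item 3, the sketch below compares simple summaries of the biased subset and the full dataset. It rebuilds the same two DataFrames so that it runs on its own; the summary names (`biased_mean`, `full_mean`, `share_shown`) are introduced here for illustration.

```python
import pandas as pd

df_full = pd.DataFrame({
    "Region": ["North", "South", "East", "West", "Central",
               "Northeast", "Southeast", "Northwest"],
    "Sales": [320, 150, 210, 180, 130, 300, 280, 160],
})
df_biased = df_full.nlargest(3, "Sales")

biased_mean = df_biased["Sales"].mean()   # average of the top 3 only
full_mean = df_full["Sales"].mean()       # average of all 8 regions
share_shown = df_biased["Sales"].sum() / df_full["Sales"].sum()

print(f"Mean of top 3: {biased_mean:.2f}")
print(f"Mean of all:   {full_mean:.2f}")
print(f"Share of total sales shown: {share_shown:.1%}")
```

The top-three view suggests a typical region sells 300 units, while the average across all eight regions is 216.25, and the biased chart accounts for only about half of total sales.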


**Task 3:** Deceptive vs. Truthful Visualization

Create one misleading chart using axis manipulation or selective data range.

Create a corrected version that shows the same data honestly.

Explain the difference in interpretation between the two visuals.

In [13]:
#Code Here

data = {
    "Year": [2018, 2019, 2020, 2021, 2022, 2023],
    "Sales": [190, 200, 205, 210, 215, 220]
}
df = pd.DataFrame(data)

# Misleading chart
fig_deceptive = px.bar(df, x="Year", y="Sales",
                       title="Deceptive Chart: Axis Manipulation Makes Growth Look Dramatic",
                       labels={"Sales": "Sales (in units)"})

# Truncate Y-axis to start close to lowest value
fig_deceptive.update_yaxes(range=[180, 225])
fig_deceptive.show()

# Truthful chart
fig_truthful = px.bar(df, x="Year", y="Sales",
                      title="Truthful Chart: Proper Scale Reveals Modest Growth",
                      labels={"Sales": "Sales (in units)"})

fig_truthful.update_yaxes(range=[0, 225])
fig_truthful.show()




---


**5. Supplementary Activity:**

Visual Truth Challenge

Create a small project where you visualize a real-world dataset (e.g., population, income, environmental data).

1. Detect and correct at least two forms of distortion (missing data, bias, or misleading scaling).

2. Annotate your charts with titles and labels explaining your corrections.

3. Reflect on how ethical visualization improves trust and understanding.

In [17]:
years = np.arange(2010, 2025)
population = [6.9, 7.0, 7.1, 7.2, np.nan, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, np.nan, 8.1, 8.2, 8.3]

df = pd.DataFrame({"Year": years, "Population": population})

# Count the gaps before filling them
print("Missing Data per Column:")
print(df.isna().sum())

# Fill the two missing years by linear interpolation
df["Population"] = df["Population"].interpolate()

# Deceptive version: the y-axis starts at 7.0, which exaggerates the slope
# and also clips the 2010 value of 6.9
fig_deceptive = px.line(df, x="Year", y="Population",
                        title="Deceptive Visualization: Truncated Y-Axis Exaggerates Growth",
                        labels={"Population": "World Population (Billions)"})
fig_deceptive.update_yaxes(range=[7.0, 8.4])
fig_deceptive.show()

# Truthful version: a zero-based axis shows growth in proportion to the total
fig_truthful = px.line(df, x="Year", y="Population",
                       title="Truthful Visualization: Corrected Missing Data & Proper Scale",
                       labels={"Population": "World Population (Billions)"})
fig_truthful.update_yaxes(range=[0, 8.5])
fig_truthful.show()

print("Reflection: \nEthical visualization shows data honestly by fixing errors and avoiding exaggeration. \nAccurate charts build trust and help people understand the real message behind the numbers.")


Missing Data per Column:
Year          0
Population    2
dtype: int64


Reflection: 
Ethical visualization shows data honestly by fixing errors and avoiding exaggeration. 
Accurate charts build trust and help people understand the real message behind the numbers.
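
Item 2 asks for annotations that explain the corrections. One possible approach, sketched below, is to record which years were gaps before interpolating and turn that list into a short disclosure note. The variable names `imputed_years` and `note` are assumptions, and how the note is attached to the figure is left open.

```python
import numpy as np
import pandas as pd

years = np.arange(2010, 2025)
population = [6.9, 7.0, 7.1, 7.2, np.nan, 7.4, 7.5, 7.6, 7.7, 7.8,
              7.9, np.nan, 8.1, 8.2, 8.3]
df_pop = pd.DataFrame({"Year": years, "Population": population})

# Remember which years were gaps before they are filled.
imputed_years = df_pop.loc[df_pop["Population"].isna(), "Year"].tolist()
df_pop["Population"] = df_pop["Population"].interpolate()

# A short note that could be placed in a chart subtitle or annotation.
note = "Interpolated values: " + ", ".join(str(y) for y in imputed_years)
print(note)
```

Stating which points were estimated lets readers distinguish observed values from filled ones, which supports the trust mentioned in the reflection.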


**6. Conclusion/Learnings/Analysis:**

Through handling missing data, correcting bias, and avoiding misleading visuals, we learned how data presentation can greatly influence interpretation.
By applying honest scaling, complete datasets, and clear labeling, visualizations become more accurate and trustworthy.
Ethical data visualization not only prevents misinterpretation but also strengthens credibility and supports better decision-making.