# **Hands-On Activity 14 | Telling the Truth with Data Visualization**





---



Name : Aicon H. Keliste & Kenzo N. Lim

Course Code and Title : CPE031 / Visualizations and Data Analysis

Date Submitted : 11-10-2025

Instructor : Mrs Maria Rizette Sayo


---



**1. Objectives:**

This activity aims to demonstrate students' ability to visualize data truthfully and ethically. Students will identify missing or biased data, correct misleading visualizations, and apply techniques to ensure integrity in data presentation.

**2. Intended Learning Outcomes (ILOs):**

By the end of this activity, students should be able to:

1. Analyze datasets to detect missing values, errors, and biases.

2. Evaluate the accuracy and fairness of different data visualization designs.

3. Create ethical and truthful charts by correcting deceptive visualizations.

**3. Discussions:**

Telling the truth with data visualization means ensuring that every visual accurately represents the data and context without distortion.
Misleading charts can manipulate interpretation through poor scaling, selective data, or biased representation.

Missing Data and Data Errors:
Missing values or outliers can lead to incorrect conclusions if ignored. Visualizations should either mark missing data explicitly or handle it with a documented method such as interpolation or removal.
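
As a minimal sketch of the "indicate missing data" option, the snippet below records which rows were missing before interpolating, so the estimated points can still be marked later. The small frame and the `Imputed` column name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly values with one gap
sample = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "Sales": [170.0, 200.0, np.nan, 240.0],
})

# Record which rows were missing BEFORE filling them
sample["Imputed"] = sample["Sales"].isna()

# Linear interpolation fills the gap with the midpoint of its neighbors
sample["Sales"] = sample["Sales"].interpolate()

print(sample)
```

The flag column can then be mapped to a visual cue, for example through the `symbol` argument of `px.scatter`, so readers can tell estimated points from observed ones.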

Biased Data:
Data can be biased through selection bias (only part of the population is collected or shown) or survivorship bias (failures or dropouts are excluded). Identifying these biases before charting prevents misleading visuals.
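
A small sketch of survivorship bias, assuming a hypothetical product table in which two poorly performing products were discontinued. All names and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical products: D and E were discontinued after poor results
products = pd.DataFrame({
    "Product": ["A", "B", "C", "D", "E"],
    "Return_pct": [12.0, 8.0, 10.0, -20.0, -15.0],
    "Still_sold": [True, True, True, False, False],
})

# Averaging only the survivors ignores the failures
survivors_mean = products.loc[products["Still_sold"], "Return_pct"].mean()

# Averaging every product gives the full picture
full_mean = products["Return_pct"].mean()

print(f"Survivors only: {survivors_mean:.1f}%")
print(f"All products:   {full_mean:.1f}%")
```

In this made-up case the survivors-only average is positive while the average over all products is negative, which is the kind of gap a chart built on surviving items alone would hide.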

Adjusting for Inflation:
When comparing values over time (e.g., prices, income), data should be adjusted for inflation to reflect real value changes.
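
The arithmetic can be sketched as follows, assuming inflation is given as yearly factors (1.10 means prices rose 10% over the previous year). The real value is the nominal value divided by the cumulative price index. The frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical nominal prices and yearly inflation factors
prices = pd.DataFrame({
    "Year": [2021, 2022, 2023],
    "Nominal": [100.0, 110.0, 121.0],
    "Factor": [1.00, 1.10, 1.10],  # 2021 is the base year, so its factor is 1.00
})

# Cumulative price index relative to the base year
prices["Index"] = prices["Factor"].cumprod()

# Real value = nominal value / price index
prices["Real"] = prices["Nominal"] / prices["Index"]

print(prices)
```

Here the nominal price rises by 21% over two years, yet the real price stays at about 100: the whole increase is explained by inflation.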

Deceptive Design:
Visualization design choices such as truncated axes, dual-axis charts, or selective time frames can distort perception. Ethical visualization maintains consistent scales and transparency.

**4. Procedures:**

Step 1: Import Libraries

In [None]:
!pip install pandas plotly numpy
import pandas as pd
import numpy as np
import plotly.express as px



Step 2: Create a Sample Dataset

This dataset simulates product prices, sales, and inflation across years.

In [None]:
# Sample data
years = np.arange(2015, 2025)
data = {
    "Year": years,
    "Sales": [120, 130, 150, 170, 200, np.nan, 240, 260, 290, 320],
    "Price": [50, 52, 55, 57, 60, 63, 65, 70, 75, 78],
    "InflationRate": [1.02, 1.03, 1.01, 1.05, 1.04, 1.03, 1.02, 1.03, 1.02, 1.02]
}

df = pd.DataFrame(data)
df.head()

| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |


Step 3: Identify Missing Data and Errors

In [None]:
# Check missing and invalid data
print("Missing Data per Column:")
print(df.isna().sum())

# Fill or interpolate missing sales values
df["Sales"] = df["Sales"].interpolate()
df

Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64


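| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |
| 5 | 2020 | 220.0 | 63 | 1.03 |
| 6 | 2021 | 240.0 | 65 | 1.02 |
| 7 | 2022 | 260.0 | 70 | 1.03 |
| 8 | 2023 | 290.0 | 75 | 1.02 |
| 9 | 2024 | 320.0 | 78 | 1.02 |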


Step 4: Adjust for Inflation

In [None]:
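# Adjust sales for inflation: divide by the cumulative price index.
# Each InflationRate value is treated as a yearly factor (1.02 = 2% inflation).
# The 2015 factor is included in the product, so values are in pre-2015 price terms.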
df["Adjusted_Sales"] = df["Sales"] / df["InflationRate"].cumprod()
fig = px.line(df, x="Year", y=["Sales", "Adjusted_Sales"],
              title="Sales Over Time (Adjusted for Inflation)",
              labels={"value": "Sales", "variable": "Metric"})
fig.show()

Step 5: Demonstrate Deceptive Design

Bad Example (Truncated Axis):

In [None]:
bad_chart = px.bar(df, x="Year", y="Sales", title="Deceptive Chart (Truncated Axis)")
bad_chart.update_yaxes(range=[150, 350])  # starts too high
bad_chart.show()

Good Example (Honest Axis):

In [None]:
good_chart = px.bar(df, x="Year", y="Sales", title="Truthful Chart (Proper Scale)")
good_chart.update_yaxes(range=[0, 350])
good_chart.show()

**Task 1:** Handling Missing and Erroneous Data

Identify missing or inconsistent data points in your own dataset (or this one).

Apply at least one correction method (interpolation, imputation, or exclusion).

Visualize the corrected dataset.

In [None]:
# Check missing and invalid data
print("Missing Data per Column:")
print(df.isna().sum())

# Fill or interpolate missing sales values
df["Sales"] = df["Sales"].interpolate()
print("\nDataFrame after interpolation:")
display(df)

# Visualize the corrected dataset
fig = px.line(df, x="Year", y="Sales", title="Sales Over Time (Missing Data Handled)",
              labels={"Sales": "Sales"})
fig.show()

Missing Data per Column:
Year             0
Sales            1
Price            0
InflationRate    0
dtype: int64

DataFrame after interpolation:


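| | Year | Sales | Price | InflationRate |
|---|---|---|---|---|
| 0 | 2015 | 120.0 | 50 | 1.02 |
| 1 | 2016 | 130.0 | 52 | 1.03 |
| 2 | 2017 | 150.0 | 55 | 1.01 |
| 3 | 2018 | 170.0 | 57 | 1.05 |
| 4 | 2019 | 200.0 | 60 | 1.04 |
| 5 | 2020 | 220.0 | 63 | 1.03 |
| 6 | 2021 | 240.0 | 65 | 1.02 |
| 7 | 2022 | 260.0 | 70 | 1.03 |
| 8 | 2023 | 290.0 | 75 | 1.02 |
| 9 | 2024 | 320.0 | 78 | 1.02 |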


**Task 2:** Detecting and Correcting Bias

Create or simulate a biased dataset (e.g., only showing top-performing products or regions).

1. Visualize the biased data.

2. Then, include the full dataset and create a truthful comparison chart.

3. Briefly explain how bias affected interpretation.

In [None]:
# Simulate a biased dataset (e.g., showing only later years which have higher sales)
biased_df = df[df['Year'] > 2020].copy()

# Visualize the biased data
fig_biased = px.line(biased_df, x="Year", y="Sales", title="Biased Sales Data (Later Years Only)",
                     labels={"Sales": "Sales"})
fig_biased.show()

# Visualize the full dataset for comparison
fig_full = px.line(df, x="Year", y="Sales", title="Full Sales Data",
                   labels={"Sales": "Sales"})
fig_full.show()

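# Create a truthful comparison chart: full data with the biased subset overlaid
fig_comparison = px.line(df, x="Year", y="Sales", title="Comparison of Full vs. Biased Sales Data",
                         labels={"Sales": "Sales"})
# Single-trace px figures hide the legend entry, so name the trace and show it explicitly
fig_comparison.update_traces(name="Full Data", showlegend=True)
# Name and restyle the biased trace so the two lines can be told apart
biased_trace = px.line(biased_df, x="Year", y="Sales").data[0]
biased_trace.update(name="Biased Data", showlegend=True, line=dict(color="red", dash="dash"))
fig_comparison.add_trace(biased_trace)
fig_comparison.show()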

**Explanation of Bias Effect:**

The biased visualization shows only 2021 to 2024, so the viewer sees sales between 240 and 320 with no earlier baseline. The full dataset reveals that sales started at 120 in 2015 and that the 2020 value was missing and had to be interpolated, context that the biased chart hides entirely. Selectively choosing the time window therefore distorts both the typical sales level and the overall trend, while the full dataset gives a more truthful picture of sales performance over time.
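
The effect of the selective window can also be put in numbers. The sketch below re-enters the sales values from this activity (with 2020 as the interpolated value) and compares simple summaries of the biased subset and the full series. The variable names are assumptions for illustration.

```python
import pandas as pd

# Sales values from this activity (2020 is the interpolated value)
sales = pd.Series(
    [120, 130, 150, 170, 200, 220, 240, 260, 290, 320],
    index=range(2015, 2025), name="Sales", dtype=float,
)

# Same filter as the biased subset: keep only years after 2020
later_only = sales[sales.index > 2020]

print(f"Mean, 2021-2024 only: {later_only.mean():.1f}")
print(f"Mean, 2015-2024:      {sales.mean():.1f}")
print(f"Lowest value shown:   {later_only.min():.0f} vs {sales.min():.0f}")
```

The biased window raises the average sales level from 210 to 277.5 and hides every value below 240.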

**Task 3:** Deceptive vs. Truthful Visualization

Create one misleading chart using axis manipulation or selective data range.

Create a corrected version that shows the same data honestly.

Explain the difference in interpretation between the two visuals.

In [None]:
# Create a deceptive chart with a truncated y-axis
deceptive_chart = px.bar(df, x="Year", y="Sales", title="Deceptive Sales Chart (Truncated Axis)")
deceptive_chart.update_yaxes(range=[150, 350])  # Truncated axis
deceptive_chart.show()

# Create a truthful chart with a proper y-axis
truthful_chart = px.bar(df, x="Year", y="Sales", title="Truthful Sales Chart (Proper Scale)")
truthful_chart.update_yaxes(range=[0, 350]) # Proper scale starting from 0
truthful_chart.show()

**Explanation of Deceptive vs. Truthful Visualization:**

In the deceptive chart, the y-axis starts at 150 instead of 0, so each bar shows only the portion of its value above 150. This makes the differences between years look much larger than they really are: the bars for 2015 to 2017 (120 to 150) disappear entirely, and the 2024 bar looks many times taller than the 2018 bar even though 320 is less than twice 170.

On the other hand, the truthful chart starts the y-axis at 0, showing the data in proper proportion. This version accurately reflects the gradual increase in sales and gives a fair and honest view of the trend. This comparison shows how simply adjusting the axis can completely change how people interpret the same data.
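
The exaggeration can be quantified. The sketch below, in the spirit of Edward Tufte's "lie factor", divides the relative change a reader sees in the visible bar heights by the relative change in the underlying values. The function names and the choice of the 2018 and 2024 bars are assumptions made for illustration, and the calculation only applies to bars that are taller than the axis floor.

```python
def relative_change(first: float, second: float) -> float:
    """Relative change from first to second, as a fraction of first."""
    return (second - first) / first


def lie_factor(first: float, second: float, axis_floor: float) -> float:
    """Relative change shown by visible bar heights divided by the change in the data."""
    shown = relative_change(first - axis_floor, second - axis_floor)
    actual = relative_change(first, second)
    return shown / actual


# Values from this activity: 2018 sales = 170, 2024 sales = 320
print(f"Axis from 150: {lie_factor(170, 320, 150):.1f}")
print(f"Axis from 0:   {lie_factor(170, 320, 0):.1f}")
```

A value near 1 means the chart shows the change in proportion to the data. With the axis starting at 150, the visible change between these two bars is about 8.5 times larger than the real one.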




---


**5. Supplementary Activity:**

Visual Truth Challenge

Create a small project where you visualize a real-world dataset (e.g., population, income, environmental data).

1. Detect and correct at least two forms of distortion (missing data, bias, or misleading scaling).

2. Annotate your charts with titles and labels explaining your corrections.

3. Reflect on how ethical visualization improves trust and understanding.

In [None]:
#sample real-world-like dataset
data = {
    'Year': np.repeat(np.arange(2015, 2025), 3),
    'Region': ['North', 'South', 'East'] * 10,
    'Sales': [
        100, 150, 120, 110, 160, 130, 120, 170, 140, 130, 180, 150, 140, 190, 160,
        150, 200, 170, 160, 210, 180, 170, np.nan, 190, 180, 230, 200, 190, 240, 210
    ],
    'Profit': [
        20, 30, 25, 22, 32, 27, 24, 35, 30, 26, 38, 33, 28, 40, 35,
        30, 42, 37, 32, 45, 40, 35, 48, 43, 38, 50, 45, 40, 52, 47
    ]
}

real_world_df = pd.DataFrame(data)
display(real_world_df.head())

# Identify missing data
print("Missing Data per Column:")
print(real_world_df.isnull().sum())

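# Handle missing data: interpolate 'Sales' within each region. Rows alternate between
# regions, so a plain interpolate() would fill the gap from other regions' values.
real_world_df['Sales'] = real_world_df.groupby('Region')['Sales'].transform(lambda s: s.interpolate())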

print("\nDataFrame after handling missing data:")
display(real_world_df.head())

# misleading chart
deceptive_sales_chart = px.line(real_world_df, x='Year', y='Sales', color='Region',
                                title='Deceptive Sales Trend by Region (Truncated Axis)')
deceptive_sales_chart.update_yaxes(range=[100, real_world_df['Sales'].max() + 10]) # Truncate the y-axis
deceptive_sales_chart.show()

#  truthful chart
truthful_sales_chart = px.line(real_world_df, x='Year', y='Sales', color='Region',
                               title='Truthful Sales Trend by Region (Proper Scale)')
truthful_sales_chart.update_yaxes(range=[0, real_world_df['Sales'].max() + 10]) # Start y-axis from 0
truthful_sales_chart.show()

| | Year | Region | Sales | Profit |
|---|---|---|---|---|
| 0 | 2015 | North | 100.0 | 20 |
| 1 | 2015 | South | 150.0 | 30 |
| 2 | 2015 | East | 120.0 | 25 |
| 3 | 2016 | North | 110.0 | 22 |
| 4 | 2016 | South | 160.0 | 32 |


Missing Data per Column:
Year      0
Region    0
Sales     1
Profit    0
dtype: int64

DataFrame after handling missing data:


| | Year | Region | Sales | Profit |
|---|---|---|---|---|
| 0 | 2015 | North | 100.0 | 20 |
| 1 | 2015 | South | 150.0 | 30 |
| 2 | 2015 | East | 120.0 | 25 |
| 3 | 2016 | North | 110.0 | 22 |
| 4 | 2016 | South | 160.0 | 32 |
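
One point to watch when interpolating long-format data like this: the rows alternate between regions, so the neighbors of a missing row belong to other regions. A minimal sketch on a small hypothetical frame compares row-wise and group-wise interpolation. The frame and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: rows alternate between regions
long_df = pd.DataFrame({
    "Year": [2021, 2021, 2022, 2022, 2023, 2023],
    "Region": ["North", "South", "North", "South", "North", "South"],
    "Sales": [160.0, 210.0, 170.0, np.nan, 180.0, 230.0],
})

# Row-wise interpolation averages the neighboring ROWS (North 170 and North 180)
naive = long_df["Sales"].interpolate()

# Group-wise interpolation uses only South's own values (210 and 230)
grouped = long_df.groupby("Region")["Sales"].transform(lambda s: s.interpolate())

print("Row-wise fill:  ", naive[3])
print("Group-wise fill:", grouped[3])
```

The row-wise estimate (175) sits well below South's own trend, while the group-wise estimate (220) falls between South's 2021 and 2023 values.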


Reflect on Ethical Visualization

Ethical visualization is important for building trust and ensuring accurate understanding of data. By identifying and correcting distortions like missing data, bias, and misleading scaling, we are able to present a more truthful representation of the information. This transparency allows viewers to make informed decisions and prevents manipulation of perception. When visualizations are honest and clear, they enhance credibility and facilitate a deeper and more accurate understanding of complex data.

**6. Conclusion/Learnings/Analysis:**

This activity demonstrated the importance of telling the truth with data visualization by highlighting potential sources of distortion and methods for creating ethical and truthful charts. We learned to identify and handle missing data through techniques like interpolation, understanding how such omissions can otherwise mislead interpretation. The tasks also illustrated how bias, whether through selective data inclusion or other means, can significantly skew perceptions of trends and comparisons. Furthermore, we explored how deceptive design choices, particularly axis manipulation, can visually distort the magnitude of changes in data. By creating corrected, truthful visualizations alongside the deceptive ones, the activity reinforced the impact of proper scaling and complete data representation. The supplementary activity provided an opportunity to apply these concepts to a real-world-like dataset, emphasizing the practical application of detecting and correcting distortions. Ultimately, this activity underscored that ethical visualization is not just about presenting data, but about doing so in a manner that builds trust and facilitates accurate understanding, preventing the manipulation of insights through visual means.