Programming Assignment 4 – Data Wrangling and Visualization

Author: SAEZ, Eljenzal Hoper U.
Course: Advanced Computer Programming and Algorithms / ECE2112

📌 Description

This Jupyter Notebook presents solutions to a data-driven storytelling task focused on data wrangling and data visualization using Python and Pandas. The experiment explores how structured data can be cleaned, filtered, and visualized to reveal meaningful insights—especially in the context of academic performance.

Key objectives include:

- Constructing multiple filtered DataFrames based on conditions such as track, gender, hometown, and subject scores
- Applying logical indexing and conditional selection to extract relevant subsets
- Visualizing how track, gender, and hometown relate to average grades
- Saving processed data for reproducibility and future analysis

This task emphasizes the power of data visualization — turning raw exam data into structured insights that can inform decisions and highlight trends.

⚙️ Requirements

- Python 3.8 or higher
- Jupyter Notebook
- NumPy
- Pandas
- Matplotlib (for basic plotting)
- Seaborn (for enhanced statistical visualizations)

▶️ How to Run

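- Install Jupyter Notebook and the packages listed above (for example, with pip) if they are not already installed.
- Open the notebook file: SAEZ_PA4.ipynb
- Run the cells in order from top to bottom, since later cells reuse variables (such as boards and Mindy) defined in earlier ones.
- Check the tables and graphs displayed below each cell.
- Look in the project folder for any saved output files, such as charts or cleaned data.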
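The README does not show the notebook's saving step, so the following is only a minimal sketch, assuming the results are pandas DataFrames. The sample rows, the helper save_and_reload, and the file name filtered_students.csv are hypothetical. It writes a DataFrame to CSV and reads it back to check that nothing was lost:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical filtered result (made-up values); in the notebook this
# would be a DataFrame such as x or Mindydata.
filtered = pd.DataFrame({
    "Name": ["Student A", "Student B"],
    "GEAS": [72, 81],
    "Electronics": [75, 90],
})


def save_and_reload(df: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Hypothetical helper: write df to CSV without its index, then read it back."""
    df.to_csv(path, index=False)
    return pd.read_csv(path)


# Use a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "filtered_students.csv"  # hypothetical file name
    reloaded = save_and_reload(filtered, out_path)

print(reloaded.equals(filtered))
```

Passing index=False keeps pandas from writing the row index as an extra unnamed column, which is why the reloaded frame compares equal to the original.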

💡 Code Explanation

01.a ECE BOARD EXAM PROBLEM

# DataFrame Representation
import pandas as pd

# Keep the five columns this problem needs from `boards`
# (the board-exam data defined earlier in the notebook).
instru = pd.DataFrame(boards, columns=['Name', 'GEAS', 'Electronics', 'Track', 'Hometown'])

# Keep Instrumentation-track students from Luzon with Electronics > 70.
# Track and Hometown serve only as filters (they are constant in the result),
# so only Name, GEAS, and Electronics are displayed.
x = instru.loc[(instru['Electronics'] > 70)
               & (instru['Hometown'] == 'Luzon')
               & (instru['Track'] == 'Instrumentation'),
               ['Name', 'GEAS', 'Electronics']]

x

instru = pd.DataFrame(boards, columns=['Name','GEAS','Electronics','Track','Hometown'])

Creates a new DataFrame named instru that keeps only the Name, GEAS, Electronics, Track, and Hometown columns of the existing boards data.
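As a small illustration (the rows below are made-up stand-ins for boards, not the real exam data), passing an existing DataFrame together with columns= to the pd.DataFrame constructor keeps just those columns, in the listed order:

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `boards` data (made-up values).
boards = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C"],
    "Gender": ["Female", "Male", "Female"],
    "GEAS": [72, 65, 80],
    "Electronics": [75, 68, 90],
    "Track": ["Instrumentation", "Communication", "Instrumentation"],
    "Hometown": ["Luzon", "Visayas", "Luzon"],
})

cols = ["Name", "GEAS", "Electronics", "Track", "Hometown"]

# Passing an existing DataFrame plus `columns` keeps only those columns, in that order.
instru = pd.DataFrame(boards, columns=cols)
print(list(instru.columns))

# For columns that exist, the result matches ordinary column selection.
print(instru.equals(boards[cols]))

# A name that is not in `boards` becomes an all-NaN column instead of raising KeyError.
with_typo = pd.DataFrame(boards, columns=["Name", "Electronic"])
print(with_typo["Electronic"].isna().all())
```

For existing columns this behaves like boards[cols]. The one difference is that a misspelled column name passes silently as NaNs rather than failing loudly, so the column list is worth double-checking.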

x = instru.loc[
    (instru['Electronics'] > 70) &
    (instru['Hometown'] == 'Luzon') &
    (instru['Track'] == 'Instrumentation'),
    ['Name','GEAS','Electronics']
]

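Filters the instru DataFrame to find specific students based on three conditions joined with & (element-wise AND). Each condition sits in its own parentheses because & binds more tightly than comparison operators such as > and ==.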

- Condition 1: instru['Electronics'] > 70 → selects students with Electronics scores above 70.

- Condition 2: instru['Hometown'] == 'Luzon' → limits results to students from Luzon.

- Condition 3: instru['Track'] == 'Instrumentation' → focuses only on those in the Instrumentation track.

- Final Output Columns: Only shows 'Name', 'GEAS', and 'Electronics' in the result.
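The three-condition filter can be traced step by step on a few hypothetical rows (made-up values, not the real board-exam data). Each comparison produces a boolean Series, & keeps only the rows where all three are True, and .loc applies that mask to the rows and the column list to the columns. Storing each condition in its own variable is just one readable way to write it:

```python
import pandas as pd

# Hypothetical sample rows; not the actual board-exam data.
instru = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C", "Student D"],
    "GEAS": [72, 65, 80, 58],
    "Electronics": [75, 68, 90, 71],
    "Track": ["Instrumentation", "Instrumentation", "Communication", "Instrumentation"],
    "Hometown": ["Luzon", "Luzon", "Luzon", "Visayas"],
})

# Each comparison yields a boolean Series aligned on the row index.
high_elec = instru["Electronics"] > 70
from_luzon = instru["Hometown"] == "Luzon"
instrumentation = instru["Track"] == "Instrumentation"

# & is an element-wise AND: a row survives only if all three masks are True.
mask = high_elec & from_luzon & instrumentation

# .loc[row_mask, column_list] filters rows and selects columns in one step.
x = instru.loc[mask, ["Name", "GEAS", "Electronics"]]
print(x)
```

Here Student B fails the Electronics test, Student C is in another track, and Student D is from Visayas, so only Student A remains.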

01.b ECE BOARD EXAM PROBLEM

import pandas as pd

# Keep the columns needed for this problem from `boards`.
# Hometown and Gender serve only as filters and are dropped from the result.
Mindy = pd.DataFrame(boards, columns=['Name', 'Gender', 'Track', 'Math', 'Electronics', 'GEAS', 'Communication', 'Hometown'])

# New column: each student's mean of Math, Electronics, GEAS, and Communication
# (axis=1 averages across the columns within each row).
Mindy['Average'] = Mindy[['Math', 'Electronics', 'GEAS', 'Communication']].mean(axis=1)

# Female students from Mindanao with an average of at least 55;
# display only Name, Track, Electronics, and Average.
Mindydata = Mindy.loc[(Mindy['Average'] >= 55)
                      & (Mindy['Hometown'] == 'Mindanao')
                      & (Mindy['Gender'] == 'Female'),
                      ['Name', 'Track', 'Electronics', 'Average']]

Mindydata

Mindy = pd.DataFrame(boards, columns=['Name','Gender','Track','Math','Electronics','GEAS','Communication','Hometown'])

Creates a new DataFrame named Mindy that keeps the Name, Gender, Track, Math, Electronics, GEAS, Communication, and Hometown columns of the original boards data.

Mindy['Average'] = Mindy[['Math', 'Electronics', 'GEAS', 'Communication']].mean(axis=1)

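Adds a new column called 'Average' to the Mindy DataFrame, holding each student's mean score across Math, Electronics, GEAS, and Communication (axis=1 averages across the columns within each row).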
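To see what axis=1 does, here is a sketch on two hypothetical students with made-up scores: the row-wise mean gives one average per student, while the default column-wise mean gives one average per subject.

```python
import pandas as pd

# Hypothetical scores for two students (made-up values).
Mindy = pd.DataFrame({
    "Name": ["Student A", "Student B"],
    "Math": [60, 50],
    "Electronics": [70, 40],
    "GEAS": [80, 60],
    "Communication": [50, 50],
})

subjects = ["Math", "Electronics", "GEAS", "Communication"]

# axis=1: average across the four subject columns within each row (one value per student).
Mindy["Average"] = Mindy[subjects].mean(axis=1)

# Default axis=0: average down each column instead (one value per subject).
by_subject = Mindy[subjects].mean()

print(Mindy[["Name", "Average"]])
print(by_subject)
```

Student A's average is (60 + 70 + 80 + 50) / 4 = 65.0 and Student B's is (50 + 40 + 60 + 50) / 4 = 50.0. By default mean skips missing values (skipna=True), so a student with a blank score would be averaged over the remaining subjects rather than getting NaN.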

Mindydata = Mindy.loc[
    (Mindy['Average'] >= 55) &
    (Mindy['Hometown'] == 'Mindanao') &
    (Mindy['Gender'] == 'Female'),
    ['Name','Track','Electronics','Average']
]

Filters the Mindy DataFrame to find specific students based on three conditions.

- Condition 1: Average >= 55 → selects students whose four-subject average is at least 55.

- Condition 2: Hometown == 'Mindanao' → focuses on students from Mindanao.

- Condition 3: Gender == 'Female' → limits results to female students.

- Final Output Columns: Displays only 'Name', 'Track', 'Electronics', and 'Average'.
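The same filter can also be written with DataFrame.query, which takes the conditions as a single string. This is an alternative formulation, not what the notebook uses; the rows below are hypothetical and include a ready-made Average column:

```python
import pandas as pd

# Hypothetical rows including the computed 'Average' column (made-up values).
Mindy = pd.DataFrame({
    "Name": ["Student A", "Student B", "Student C"],
    "Gender": ["Female", "Female", "Male"],
    "Track": ["Communication", "Instrumentation", "Communication"],
    "Hometown": ["Mindanao", "Mindanao", "Mindanao"],
    "Electronics": [70, 40, 80],
    "Average": [65.0, 50.0, 72.5],
})

cols = ["Name", "Track", "Electronics", "Average"]

# Boolean-mask version, as in the README.
via_loc = Mindy.loc[(Mindy["Average"] >= 55)
                    & (Mindy["Hometown"] == "Mindanao")
                    & (Mindy["Gender"] == "Female"), cols]

# Same conditions written as a query expression string.
via_query = Mindy.query(
    "Average >= 55 and Hometown == 'Mindanao' and Gender == 'Female'"
)[cols]

print(via_loc.equals(via_query))
```

The query string reads close to the plain-English conditions listed above, while the boolean-mask form needs no extra quoting for column names that are not valid Python identifiers (query requires backticks around those).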

02 ECE BOARD EXAM PROBLEM VISUALIZATION

import matplotlib.pyplot as plt

# Mean 'Average' per gender, track, and hometown (all students in Mindy).
gender = Mindy.groupby('Gender')['Average'].mean()
track = Mindy.groupby('Track')['Average'].mean()
hometown = Mindy.groupby('Hometown')['Average'].mean()

plt.figure(figsize=(12, 6))  # widen the figure so all category labels are readable

# One bar per category. groupby sorts its keys alphabetically, so the labels
# are read from each Series' index to keep every label paired with its value.
bars = plt.bar(list(gender.index) + list(track.index) + list(hometown.index),
               list(gender.values) + list(track.values) + list(hometown.values))

plt.xlabel('Categories')  # x-axis: category names
plt.ylabel('Average Score')  # y-axis: mean average score
plt.title('Graph for Tracks, Gender, and Hometown')

# Tighten the layout so labels are not clipped, then show the plot.
plt.tight_layout()
plt.show()

gender = Mindy.groupby('Gender')['Average'].mean()
track = Mindy.groupby('Track')['Average'].mean()
hometown = Mindy.groupby('Hometown')['Average'].mean()

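Calculates the mean of the Average column for each gender.
Calculates the mean of the Average column for each track.
Calculates the mean of the Average column for each hometown region.
In each result, groupby sorts the group names alphabetically and stores them as the Series index.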
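Because groupby sorts group names alphabetically by default (sort=True), the order of each result is fixed by the names themselves, not by the order in which they first appear in the data. A short sketch with made-up rows:

```python
import pandas as pd

# Hypothetical averages (made-up values) to show how groupby orders its result.
Mindy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Hometown": ["Visayas", "Luzon", "Mindanao", "Luzon"],
    "Average": [60.0, 70.0, 50.0, 80.0],
})

gender = Mindy.groupby("Gender")["Average"].mean()
hometown = Mindy.groupby("Hometown")["Average"].mean()

# The index is alphabetical even though 'Male' and 'Visayas' appear first in the data.
print(list(gender.index))    # ['Female', 'Male']
print(list(hometown.index))  # ['Luzon', 'Mindanao', 'Visayas']

# Iterating with items() pairs each label with its own value.
for label, value in gender.items():
    print(label, value)
```

Any hand-typed label list has to follow this order to stay aligned with the values; reading the labels from the index avoids the problem.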

plt.figure(figsize=(12, 6))

Sets the figure size to 12 by 6 inches so that all eight category labels fit along the x-axis.

bars = plt.bar(
    list(gender.index) + list(track.index) + list(hometown.index),
    list(gender.values) + list(track.values) + list(hometown.values)
)

Draws one bar chart that combines the three category sets.
The x-axis labels are the group names taken from each Series' index (genders, then tracks, then hometowns), so each label stays paired with its own value. Typing the labels by hand, for example starting with 'Male', 'Female', mislabels the bars whenever the typed order differs from groupby's alphabetical order.
The bar heights are the corresponding mean values from gender, track, and hometown.
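One way to keep labels and heights together is to concatenate the three results into a single Series with pd.concat. This is a sketch with made-up means standing in for the real gender, track, and hometown Series, not the notebook's own code:

```python
import pandas as pd

# Hypothetical per-category means (made-up values), shaped like the
# groupby results gender, track, and hometown.
gender = pd.Series({"Female": 75.0, "Male": 55.0})
track = pd.Series({"Communication": 62.0, "Instrumentation": 68.0})
hometown = pd.Series({"Luzon": 75.0, "Mindanao": 50.0, "Visayas": 60.0})

# Concatenating the Series keeps each label attached to its own value,
# so one Series supplies both the bar labels (index) and the heights (values).
combined = pd.concat([gender, track, hometown])

labels = list(combined.index)
heights = combined.tolist()
print(labels)
print(heights)
```

With matplotlib.pyplot imported as plt, plt.bar(labels, heights) would then draw the combined chart from a single source of labels and values.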

- 🌱“Failure is not the opposite of success; it’s part of success.”

📝 Commitments

v1.0 – Initial draft
- Loaded and cleaned the dataset
- Built initial filtered DataFrames
v1.1 – Rechecking
- Calculated average scores and created the combined bar chart
v1.2 – Final polish
- Added clear explanations and finalized the README layout

Repository: HopeTechDev/Programming-Assignment-4
