TODO: Click on the badge, then `Copy to Drive` before continuing.


TODO: double click and replace this text with your name

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/allegheny-college-cmpsc-105-spring-2025/data-interactivity-starter/blob/main/data_interactivity.ipynb)

# Data Interactivity Lab

---

In this lab, you will explore [World Happiness Data](https://zenodo.org/records/14826965). Each year, the World Happiness Report Team collects data from individuals around the world about their level of happiness and other judgements they make about their lives. The team analyzes those data and produces reports that relate happiness ratings to other life factors. See their work for further details: https://worldhappiness.report/

The data in this CMPSC 105 lab have already been de-identified and aggregated by country of residence. While these are not the raw data, there are still many mysteries contained within.

Your job is to make interactive displays that facilitate data exploration and lead to insights through visual comparisons.

In this lab, you have access to happiness data for approximately 155 locations over 5 years, 2015-2019. For each location and each year, there are several additional variables such as the Happiness Rank, Happiness Score, GDP per capita, Average social support, Average healthy life expectancy, Average freedom to make life choices, Average generosity level, and Average perceptions of corruption.

The variables for the five years likely have similar definitions to those quoted below from https://worldhappiness.report/ed/2023/world-happiness-trust-and-social-connections-in-times-of-crisis/:


>GDP per capita is in terms of Purchasing Power Parity (PPP) adjusted to constant 2017 international dollars, taken from the World Development Indicators (WDI) by the World Bank (version 17, metadata last updated on January 22, 2023). See Statistical Appendix 1 for more details. GDP data for 2022 are not yet available, so we extend the GDP time series from 2021 to 2022 using country-specific forecasts of real GDP growth from the OECD Economic Outlook No. 112 (November 2022) or, if missing, from the World Bank’s Global Economic Prospects (last updated: January 10, 2023), after adjustment for population growth. The equation uses the natural log of GDP per capita, as this form fits the data significantly better than GDP per capita.

>The time series for healthy life expectancy at birth are constructed based on data from the World Health Organization (WHO) Global Health Observatory data repository, with data available for 2005, 2010, 2015, 2016, and 2019. To match this report’s sample period (2005-2022), interpolation and extrapolation are used. See Statistical Appendix 1 for more details.

>Social support is the national average of the binary responses (0=no, 1=yes) to the Gallup World Poll (GWP) question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

>Freedom to make life choices is the national average of binary responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

>Generosity is the residual of regressing the national average of GWP responses to the donation question “Have you donated money to a charity in the past month?” on log GDP per capita.

>Perceptions of corruption are the average of binary answers to two GWP questions: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure.
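
The first quoted definition says the report's equation uses the natural log of GDP per capita. As a rough illustration of what that transformation does, the sketch below applies `math.log` to a few made-up dollar amounts. The values and variable names are assumptions for illustration and are not taken from the lab dataset.

```python
import math

# Made-up GDP per capita values in dollars (not from the lab dataset).
gdp_per_capita = [1_000, 10_000, 100_000]

# Natural log of each value.
log_gdp = [math.log(value) for value in gdp_per_capita]

# Difference between neighboring log values.
# Each tenfold increase in GDP adds the same amount, ln(10), on the log scale.
steps = [later - earlier for earlier, later in zip(log_gdp, log_gdp[1:])]

for value, logged in zip(gdp_per_capita, log_gdp):
    print(f"GDP {value:>7} -> ln(GDP) {logged:.3f}")
```

On the log scale, large differences between wealthy locations are compressed, which is one reason a log-transformed variable can be easier to compare visually.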

Code is provided to load the dataset into a Pandas dataframe called `dfwh`. The column names for the dataframe are `Country,Happiness Rank,Happiness Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Year`. The raw data can be viewed here:

- https://raw.githubusercontent.com/allegheny-college-cmpsc-105-spring-2025/data-interactivity-starter/refs/heads/main/data/world-happiness-2015-2019-zenodo-14826965.csv

## Goals

Your goal is to use data filtering and interactivity techniques to isolate specific parts of the data, and then make visual comparisons.

1. As a warm-up, explore the precoded figure.
2. Then do the follow-up explorations to gain additional understanding of the relationship between GDP and happiness using two chart types.
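
As a minimal sketch of the filtering idea, the example below builds a tiny made-up dataframe (the name `toy` and its values are assumptions, not lab data) and combines two boolean masks with `&` to isolate matching rows.

```python
import pandas as pd

# A tiny made-up dataframe that borrows three of the lab's column names.
toy = pd.DataFrame({
    "Country": ["A", "B", "A", "B"],
    "Happiness Score": [7.1, 4.2, 7.3, 4.0],
    "Year": [2015, 2015, 2016, 2016],
})

# Each mask holds True for rows that satisfy the condition, False otherwise.
mask_2016 = toy["Year"] == 2016
mask_happy = toy["Happiness Score"] > 5

# Combining masks with & keeps only rows where both conditions are True.
selected = toy.loc[mask_2016 & mask_happy, "Country"]
print(selected.tolist())
```

The same pattern scales to any column: build one mask per condition, combine them, and pass the result to `.loc` along with the column you want back.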

Don't forget to answer all `T O D O` markers and to remove the markers.

## Deadline

This lab should be completed during the lab meeting time, but modifications may be made until the official deadline of 2:30pm on Feb 20th.

## Learning Outcomes

By completing this lab you will

- explore a dataset with interactive filtering
- write a report in a markdown cell
- update a GitHub Repository
- check gatorgrade in GitHub Actions

## Reminders

- If needed, one of four available automatic extension tokens can be applied by using the form found on the course syllabus.
- If you have a question, post in the 105 Discord channel.
- Refer to previous materials to learn how to use Notebooks in Google Colab, make your work accessible, submit, and check your work.

## Saving Your Work

Saving your work and making it accessible is a required part of all labs.

### Accessibility Part 1

Update the Colab file name to be `data_interactivity.ipynb`.

Make sure that all cells have been executed and are SHOWING their outputs.

### Saving

I would recommend a workflow of `Copy to Drive --> editing/running/testing --> downloading`. You should have already hit `Copy to Drive`, but if you haven't, then please do so now. Then go ahead and download the file from the menu at the top of the Colab interface: `File > Download > Download .ipynb`.

### Accessibility Part 2

If you are satisfied with your work, you can now upload the downloaded file to your GitHub repo!

- Use GitHub to upload the file.
  - Next to the green `Code` button in GitHub, you will see an `Add file` button. Click that, choose `Upload files`, then choose the downloaded notebook from your computer and commit the change.

## Checking Your Work

### Check your file output

Once your file is uploaded, please open it on GitHub. All the outputs and plots should be showing. If they are not showing, this means that not all code cells were executed before you downloaded your file.
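
If you would like to double-check before uploading, the sketch below scans a notebook's parsed JSON for code cells that have source but no saved outputs. It assumes the standard `.ipynb` layout (a top-level `cells` list whose entries carry `cell_type`, `source`, and `outputs`). The function name and the toy notebook are hypothetical. A flagged cell is only a hint, since cells that just import libraries or define functions legitimately produce no output.

```python
def cells_missing_output(notebook: dict) -> list[int]:
    """Return indexes of non-empty code cells that have no saved outputs."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown cells never have outputs
        source = "".join(cell.get("source", []))
        if source.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing


# A toy notebook structure standing in for a real downloaded file.
toy_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["print(1)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1"]}]},
        {"cell_type": "code", "source": ["print(2)"], "outputs": []},
    ]
}

print(cells_missing_output(toy_notebook))
```

For a real file, the dictionary could be obtained by opening the downloaded notebook and parsing it with `json.load`.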

### Check GitHub Actions

In GitHub, go to the Actions tab. Click on the top Action to see a report about your T O D Os. If the T O D Os have been done and deleted, then the actions will show that gatorgrade reports 100%. This is not your grade, but it is a good indication that you did not forget to do something.

If you see remaining T O D Os, then you may want to edit and update your file.

## Warmup

In [None]:
# This code cell makes pre-existing tools available for data exploration

import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact # pre-existing code for interaction

In [None]:
# TODO: Run this code cell that creates a pandas dataframe for the World Happiness data

dfwh = pd.read_csv(
    'https://raw.githubusercontent.com/allegheny-college-cmpsc-105-spring-2025/data-interactivity-starter/refs/heads/main/data/world-happiness-2015-2019-zenodo-14826965.csv',
    sep=',',
    names = ['Country',
             'Happiness Rank',
             'Happiness Score',
             'GDP per capita',
             'Social support',
             'Healthy life expectancy',
             'Freedom to make life choices',
             'Generosity',
             'Perceptions of corruption',
             'Year'],
    skiprows=1
)

print(dfwh.head()) # Display the first few rows of the DataFrame

In [None]:
### Please answer these questions before continuing.

# TODO: How many rows of data are there for dfwh? (HINT: The answer is not 5.)
# TODO: How many columns of data are there for dfwh?
# TODO: What is the range of the GDP column?
# TODO: According to the definition given above, what transformation has been done to GDP?

In [None]:
# The code below defines a plotting function that makes two subplots.
# TODO: Run this code. Briefly explain why nothing happens.
# TODO: Looking at the code and the axis labels, fill in a suptitle (super title) for the figure


# Clear out memory (doesn't impact the visualization, but keeps the computer happy)
plt.close('all')

# predefine some useful masks
mask_2015 = dfwh['Year'] == 2015 # this marks rows from 2015 as True, all others False
mask_2016 = dfwh['Year'] == 2016
mask_2017 = dfwh['Year'] == 2017
mask_2018 = dfwh['Year'] == 2018
mask_2019 = dfwh['Year'] == 2019

def plot_happiness_rank(hrank1, hrank2):

  mask_hrank1 = dfwh['Happiness Rank'] == hrank1

  mask_hrank2 = dfwh['Happiness Rank'] == hrank2

  # make a blank canvas
  plt.figure(figsize=(12, 4))
  # TODO: make a super title starting with 'Figure 1:' Fill in a relevant title
  plt.suptitle('Figure 1: TODO')

  # subplot 1
  plt.subplot(1,2,1)
  plt.title(f'Ranking {hrank1}')
  plt.bar(dfwh.loc[mask_hrank1 & mask_2015,'Year'],
          dfwh.loc[mask_hrank1 & mask_2015,'Happiness Score'],
          label=dfwh.loc[mask_hrank1 & mask_2015,'Country']) # layer 1
  plt.bar(dfwh.loc[mask_hrank1 & mask_2016,'Year'],
          dfwh.loc[mask_hrank1 & mask_2016,'Happiness Score'],
          label=dfwh.loc[mask_hrank1 & mask_2016,'Country']) # layer 2
  plt.bar(dfwh.loc[mask_hrank1 & mask_2017,'Year'],
          dfwh.loc[mask_hrank1 & mask_2017,'Happiness Score'],
          label=dfwh.loc[mask_hrank1 & mask_2017,'Country']) # layer 3
  plt.bar(dfwh.loc[mask_hrank1 & mask_2018,'Year'],
          dfwh.loc[mask_hrank1 & mask_2018,'Happiness Score'],
          label=dfwh.loc[mask_hrank1 & mask_2018,'Country']) # layer 4
  plt.bar(dfwh.loc[mask_hrank1 & mask_2019,'Year'],
          dfwh.loc[mask_hrank1 & mask_2019,'Happiness Score'],
          label=dfwh.loc[mask_hrank1 & mask_2019,'Country']) # layer 5

  plt.legend()
  plt.xlabel('Year')
  plt.ylabel('Happiness Score')
  plt.ylim(0, 10)


  # subplot 2
  plt.subplot(1,2,2)
  plt.title(f'Ranking {hrank2}')
  plt.bar(dfwh.loc[mask_hrank2 & mask_2015,'Year'],
          dfwh.loc[mask_hrank2 & mask_2015,'Happiness Score'],
          label=dfwh.loc[mask_hrank2 & mask_2015,'Country']) # layer 1
  plt.bar(dfwh.loc[mask_hrank2 & mask_2016,'Year'],
          dfwh.loc[mask_hrank2 & mask_2016,'Happiness Score'],
          label=dfwh.loc[mask_hrank2 & mask_2016,'Country']) # layer 2
  plt.bar(dfwh.loc[mask_hrank2 & mask_2017,'Year'],
          dfwh.loc[mask_hrank2 & mask_2017,'Happiness Score'],
          label=dfwh.loc[mask_hrank2 & mask_2017,'Country']) # layer 3
  plt.bar(dfwh.loc[mask_hrank2 & mask_2018,'Year'],
          dfwh.loc[mask_hrank2 & mask_2018,'Happiness Score'],
          label=dfwh.loc[mask_hrank2 & mask_2018,'Country']) # layer 4
  plt.bar(dfwh.loc[mask_hrank2 & mask_2019,'Year'],
          dfwh.loc[mask_hrank2 & mask_2019,'Happiness Score'],
          label=dfwh.loc[mask_hrank2 & mask_2019,'Country']) # layer 5

  plt.legend()
  plt.xlabel('Year')
  plt.ylabel('Happiness Score')
  plt.ylim(0, 10)

  # reveal everything whenever the plotting function is called
  plt.show()

In [None]:
# This code sets up ranges over which hrank1 and hrank2 are adjustable!
# TODO: Run this cell

interact(plot_happiness_rank, hrank1=(1,156,1), hrank2=(1,156,1))

In [None]:
### Please answer these questions before continuing.

# Based on the code above:
# TODO: What column does mask_2015 base the comparison on and why?
# TODO: What column does mask_2016 base the comparison on and why?
# TODO: Why could those masks not reference a different column?
# TODO: What is the difference in code between subplot 1 and subplot 2?
# TODO: What visually happens if you remove the y axis limits in the code from both subplots?
# TODO: The data are still accurate, but explain why the figure is less impactful.
# TODO: Bring back the plt.ylim(0,10) for both subplots.

# Interact:
# TODO: Can you find any data inconsistencies? What?
# TODO: Adjust the hrank sliders until you find an interesting visual comparison
# TODO: Explain what you see, why what you see is interesting, and the slider settings required
# TODO: Move this explanation into your report later.

In [None]:
### Please ensure that you have participated on Discord this week.

# TODO: Please go to discord and see if your colleagues have posted questions.
# If you know the answer to an unanswered question, please respond!
# Otherwise, please post a question you have about this lab so far or a short description
# of what you found to be the most challenging part so far.

## Follow-up Exploration 1
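
When a histogram is given an explicit list of bin edges, the number of bins is one fewer than the number of edges. The sketch below shows this with `numpy.histogram`, which accepts bin edges in the same form as `plt.hist`. The scores are made up for illustration and are not taken from the lab dataset.

```python
import numpy as np

# Made-up Happiness Scores (not from the lab dataset).
scores = [3.2, 4.5, 4.9, 5.1, 6.8, 7.3, 7.5]

# Ten bin edges from 1 to 10.
edges = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# counts[i] is the number of scores between edges[i] and edges[i + 1].
counts, returned_edges = np.histogram(scores, bins=edges)

print(len(edges), "edges give", len(counts), "bins")
print(counts.tolist())
```

Fixing the edges ahead of time, instead of letting them adapt to each sample, keeps the bars of two histograms directly comparable.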

In [None]:
# Now let's do some follow-up investigations.

# TODO: reuse/restructure/modify the previous code to explore how Happiness Score changes based on GDP.

# For the plotting function
# Make sure that you have given this plotting function a different name from the one above.
# Indented under the function,
# Make a mask to select countries with the natural log of GDP greater than a certain threshold.
# Make a mask to select countries with the natural log of GDP smaller or equal to some other threshold.
# The "certain threshold" and the "other threshold" should become adjustable parameters in the plotting function.

# In subplot 1
# Apply the first mask to retrieve the Happiness Score from those countries.
# Make a histogram of those countries' Happiness Score using unit-width bins.
# You should use plt.hist and, inside it, specify bins=[1,2,3,4,5,6,7,8,9,10]
# (these 10 bin edges define 9 bins).
# This histogram is the only layer!
# Label the axes for a histogram
# No legend is needed for this subplot

# In subplot 2
# Apply the second mask to retrieve the Happiness Score from those countries.
# Make a histogram of those countries' Happiness Score using unit-width bins.
# You should use plt.hist and, inside it, specify bins=[1,2,3,4,5,6,7,8,9,10]
# (these 10 bin edges define 9 bins).
# This histogram is the only layer!
# Label the axes for a histogram
# No legend is needed for this subplot




# TODO: declare the plotting function

  # TODO: mask_largeGDP = ...

  # TODO: mask_smallGDP = ...


  # TODO: make a blank canvas

  # TODO: make a super title starting with 'Figure 2:' Fill in a relevant title

  # subplot 1
  plt.subplot(1,2,1)
  # TODO: make the plot
  # TODO: don't forget labels

  # subplot 2
  plt.subplot(1,2,2)
  # TODO: make the plot
  # TODO: don't forget labels

  # TODO: reveal everything whenever the plotting function is called



In [None]:
# After the function definition
# Use the `interact` function to run your plotting function interactively!
# You will have to pass in the name of the plotting function
# You will also have to pass in the two threshold parameters and appropriate ranges
# You can find the range by making a simple plot of the GDP column, or by looking at the raw data


# Take a look at your output, and make any adjustments needed in the plotting function or sliders
# For valid comparison, be sure to set x limits and y limits to avoid showing misleading visuals!

# TODO: Adjust the threshold sliders until you find an interesting visual comparison
# TODO: Explain what you see, why what you see is interesting, and the slider settings required
# TODO: Move this explanation into your report later.


## Follow-up Exploration 2
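
A box plot is built around a sample's quartiles: the box spans the first to third quartile and a line marks the median. As a rough sketch of those numbers, the example below computes them with Python's `statistics` module on made-up scores. The choice of `method="inclusive"` is an assumption made here so that the quartiles fall on or between the observed values. A plotting library may use a slightly different rule.

```python
import statistics

# Made-up Happiness Scores (not from the lab dataset).
scores = [4.0, 5.0, 6.0, 7.0, 8.0]

# n=4 splits the sorted data into four parts and returns the three cut points.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# The interquartile range is the height of the box.
iqr = q3 - q1

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
```

Comparing these summary values between two groups gives the same kind of comparison that two side-by-side box plots show visually.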

In [None]:
# Now repeat the code above, but make a box plot instead.
# This requires the plt.boxplot function.
# Bins are no longer needed
# Adjust the axis labels


In [None]:
# After the function definition
# Use the `interact` function to run your plotting function interactively!
# You will have to pass in the name of the plotting function
# You will also have to pass in the two threshold parameters and appropriate ranges
# You can find the range by making a simple plot of the GDP column, or by looking at the raw data

# Take a look at your output, and make any adjustments needed in the plotting function or sliders
# For valid comparison, be sure to set x limits and y limits as needed to avoid showing misleading visuals!

# TODO: Adjust the threshold sliders until you get the same visual comparison as with the histogram.
# TODO: Explain what you see, the slider settings required, and how it connects to the histogram representation
# TODO: Move this explanation into your report later.

## Report

### What this lab was about

TODO: Briefly summarize what this lab was about and what you did at a high level.

### Methods

TODO: Briefly talk about the methods you used to explore the data and any specific functions that helped reveal the patterns.

### What was found

TODO: Briefly restate what you found out about World Happiness by exploring the data. Please refer to the figures you made above. For example, "As shown in figure 1..."

### Future directions

TODO: Briefly state what could be explored within the data in the future, building on what you already did. Your answer must be something that you could actually do with the data, related to this lab.

