jupytext

kernelspec

orphan

text_representation

extension	format_name
.md	myst

display_name	language	name
Python 3	python	python3

true

Problem Set 7

import matplotlib.colors as mplc
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import pandas as pd
import qeds

Question 1

From {doc}Data Visualization: Rules and Guidelines <../applications/visualization_rules>

Create a bar chart of the below data on Canadian GDP growth. Use a non-red color for the years 2000 to 2008, red for 2009, and the first color again for 2010 to 2018.

ca_gdp = pd.Series(
    [5.2, 1.8, 3.0, 1.9, 3.1, 3.2, 2.8, 2.2, 1.0, -2.8, 3.2, 3.1, 1.7, 2.5, 2.9, 1.0, 1.4, 3.0],
    index=list(range(2000, 2018))
)

fig, ax = plt.subplots()

for side in ["right", "top", "left", "bottom"]:
    ax.spines[side].set_visible(False)

Question 2

From {doc}Data Visualization: Rules and Guidelines <../applications/visualization_rules>

Draft another way to organize time and education by modifying the code below. That is, have two subplots (one for each education level) and four groups of points (one for each year).

Why do you think they chose to organize the information the way they did rather than this way?

# Read in data
df = pd.read_csv("https://datascience.quantecon.org/assets/data/density_wage_data.csv")
df["year"] = df.year.astype(int)  # Convert year to int


def single_scatter_plot(df, year, educ, ax, color):
    """
    This function creates a single year's and education level's
    log density to log wage plot
    """
    # Filter data to keep only the data of interest
    _df = df.query("(year == @year) & (group == @educ)")
    _df.plot(
        kind="scatter", x="density_log", y="wages_logs", ax=ax, color=color
    )

    return ax

# Create initial plot
fig, ax = plt.subplots(1, 4, figsize=(16, 6), sharey=True)

for (i, year) in enumerate(df.year.unique()):
    single_scatter_plot(df, year, "college", ax[i], "b")
    single_scatter_plot(df, year, "noncollege", ax[i], "r")
    ax[i].set_title(str(year))

Questions 3-5

These question uses a dataset from the Bureau of Transportation Statistics that describes the cause for all US domestic flight delays in November 2016. We used the same data in the previous problem set.

air_perf = qeds.load("airline_performance_dec16") #[["CRSDepTime", "Carrier", "CarrierDelay", "ArrDelay"]]
air_perf.info()
air_perf.head

The following questions are intentionally somewhat open-ended. For each one, carefully choose the type of visualization you'll create. Put some effort into choosing colors, labels, and other formatting.

Question 3

Create a visualization of the relationship between airline (carrier) and delays.

Question 4

Create a visualization of the relationship between date and delays.

Question 5

Create a visualization of the relationship between location (origin and/or destination) and delays.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

problem_set_7.md

problem_set_7.md

Problem Set 7

Question 1

Question 2

Questions 3-5

Question 3

Question 4

Question 5

Files

problem_set_7.md

Latest commit

History

problem_set_7.md

File metadata and controls

Problem Set 7

Question 1

Question 2

Questions 3-5

Question 3

Question 4

Question 5