![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=oil-gas-prices/oil-gas-prices.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto’s Weekly Data Visualization

## Oil Gas Prices 

### Recommended Grade levels: 6-12
<br>

### Instructions
#### “Run” the cells to see the graphs
Click “Cell” and select “Run All”.<br> This will import the data and run all the code, so you can see this week's data visualization. Scroll to the top after you’ve run the cells.<br> 

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don’t need to do any coding to view the visualizations**.
The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

# Question

**Are fuel costs in Canada reaching *record* highs?**

On March 3rd, 2022, [CTV](https://atlantic.ctvnews.ca/gas-prices-reach-record-highs-across-canada-1.5805183) reported that gas prices had reached record highs across the country. Is this claim true? If so, how dramatic has the increase in fuel costs been over the last few years?


### Goal
Our goal is to determine whether fuel costs have reached record highs across all provinces and territories, for three types of fuel: regular gasoline, premium gasoline, and diesel.

We will use line and bar graphs to visually represent the data in an informative way. 

# Gather

### Code:
The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
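## import libraries
# Install the packages used below (pyodide_http is for browser-based, Pyodide Jupyter environments)
%pip install -q pyodide_http plotly nbformat
import pyodide_http
# Patch the HTTP libraries so web requests work inside the browser
pyodide_http.patch_all()
import pandas as pd                  # tables (dataframes)
import plotly.graph_objects as go    # interactive charts
import plotly.subplots as sp         # charts with several panels
import os                            # building file paths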

### Data:

There are several online sources for fuel costs, including [Statistics Canada](https://www.statcan.gc.ca/en/start). The dataset is found here: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1810000101. 

The Statistics Canada website provides monthly fuel costs for cities in 12 provinces and territories from 1979 to 2022. The data can be downloaded as CSV (comma-separated values) files. To download the data, click **Download options** and choose **CSV (Download selected data (for database loading))**.

For your convenience, we have already downloaded the following files to accompany this notebook: 
- regular_gas_df.csv
- premium_gas_df.csv
- diesel_gas_df.csv

The pre-downloaded data covers 1990 to March 2022. Some data between 1979 and 1990 was missing, so that period was excluded from the analysis.


To see how the price of fuel has changed relative to other products, we compare changes in fuel prices with changes in **Big Mac** prices in Canada. Big Mac price records are available [here](https://github.com/TheEconomist/big-mac-data), and we use Big Mac price data from 2011 to 2022.

For your convenience, we have already downloaded the following files to accompany this notebook: 
- big_mac_index.csv 

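The Big Mac data comes from The Economist's Big Mac index, which also includes GDP-adjusted figures. The price we use is the `local_price` column: the cost of one Big Mac in Canadian dollars.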
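One common way to compare how two prices change, even when they are measured in different units (dollars per burger, cents per litre), is to rescale each series so that a chosen base year equals 100. The notebook itself compares the two prices through "litres per Big Mac" instead; the sketch below is an alternative view, using made-up prices and a hypothetical helper named `index_to_base_year`.

```python
import pandas as pd

def index_to_base_year(prices, base_year):
    """Hypothetical helper: rescale a yearly price series so the base year equals 100."""
    return prices * 100 / prices.loc[base_year]

# Made-up yearly prices, for illustration only (not real data)
big_mac = pd.Series([5.00, 5.50, 6.00], index=[2011, 2016, 2021])  # dollars
gas = pd.Series([120.0, 100.0, 150.0], index=[2011, 2016, 2021])   # cents per litre

print(index_to_base_year(big_mac, 2011).tolist())          # [100.0, 110.0, 120.0]
print(index_to_base_year(gas, 2011).round(1).tolist())     # [100.0, 83.3, 125.0]
```

A value of 125 means the price is 25% higher than in the base year, so both series can be drawn on one shared axis.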

### Import the data

In [None]:
## import data
gas_path = os.path.join('data', 'regular_gas_df.csv')
#read the csv file in and save it as a pandas dataframe
df = pd.read_csv(gas_path)
df

### Comment on the data
The dataframe above contains more columns than we need, and its rows are very fine-grained: each row is one month's fuel price for one city. The rows are not grouped by province or by year, so we need to reorganize the data before we can compare prices across years.


# Organize

The code below will arrange the data cleanly so that we can do analysis on it. This is a quality control step for our data and involves examining the data to detect anything odd with the data (e.g. structure, missing values), fixing the oddities, and checking if the fixes worked. 

First, we pick out a few variables that help us answer our question. We are interested in how fuel costs change over the years, so we choose `REF_DATE` (the month), `GEO` (the location), and `VALUE` (the price) as our variables of interest. This process is called **feature selection**, and it makes the data much easier to read.
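The feature-selection step can be sketched on a tiny, made-up table laid out like the Statistics Canada download (one row per city per month). Apart from `REF_DATE`, `GEO`, and `VALUE`, the column names and all of the values below are hypothetical placeholders.

```python
import io
import pandas as pd

# Made-up excerpt in a long layout: one row per city per month (values are not real)
raw_csv = """REF_DATE,GEO,Type of product,UOM,VALUE
1990-01,"Calgary, Alberta",Regular gasoline,Cents per litre,45.0
1990-01,"Halifax, Nova Scotia",Regular gasoline,Cents per litre,52.0
1990-02,"Calgary, Alberta",Regular gasoline,Cents per litre,46.0
"""
raw = pd.read_csv(io.StringIO(raw_csv))

# Feature selection: keep only the date, the location and the price
selected = raw[["REF_DATE", "GEO", "VALUE"]]
print(selected.columns.tolist())  # ['REF_DATE', 'GEO', 'VALUE']
print(selected.shape)             # (3, 3)
```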

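After choosing the variables of interest, we clean the data by **province**. The dataset gives monthly average fuel costs for cities, not provinces. For Alberta, British Columbia, Ontario, Quebec, and Saskatchewan, which have more than one city listed, we average the cities' prices. Each remaining province or territory is represented by its one listed city. The result covers 12 provinces and territories.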
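Here is a minimal sketch of the averaging idea on made-up prices: pivot the table so each city becomes a column, then take the row-wise mean of the columns whose name contains the province. `DataFrame.mean(axis=1)` skips missing values, so a month where one city has no reading is averaged over the cities that do.

```python
import pandas as pd

# Made-up monthly prices (cents per litre); None marks a missing reading
long_df = pd.DataFrame({
    "REF_DATE": ["1990-01"] * 3 + ["1990-02"] * 3,
    "GEO": ["Calgary, Alberta", "Edmonton, Alberta", "Winnipeg, Manitoba"] * 2,
    "VALUE": [44.0, 46.0, 50.0, 45.0, None, 51.0],
})

# One row per month, one column per city
wide = long_df.pivot(index="REF_DATE", columns="GEO", values="VALUE")

# Average the columns (cities) whose name mentions the province
alberta_cols = wide.columns[wide.columns.str.contains("Alberta")]
wide["Alberta"] = wide[alberta_cols].mean(axis=1)  # skips the missing Edmonton value

print(wide["Alberta"].tolist())  # [45.0, 45.0]
```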

In [None]:
# data cleaning
def provinceAverage(df):
    # Feature selection: keep only the date, location, and price columns
    df = df[["REF_DATE", "GEO", "VALUE"]]
    df = df.set_index(["REF_DATE"])
    # Reshape so that each row is one month and each column is one city
    df = df.pivot(columns = "GEO")
    df.columns = df.columns.droplevel()
    del df["Ottawa-Gatineau, Ontario part, Ontario/Quebec"]
    
    province = ["Alberta", "British Columbia", "Ontario", "Quebec", "Saskatchewan"]
    
    # Calculate provincial means of gas/oil prices from the cities in each province
    # (mean() skips missing values instead of counting them as zero)
    for p in province:
        prov_cols = df.columns[df.columns.str.contains(p)]
        df[p] = df.loc[:, prov_cols].mean(axis=1)
    
    # Remove the national average and Thunder Bay (already included in the Ontario mean), if present
    df = df.drop(columns=["Canada", "Thunder Bay, Ontario"], errors="ignore")
    
    # Reorganize data; delete the cities that were averaged into their province
    df = df.drop(["Calgary, Alberta", "Edmonton, Alberta", "Montréal, Quebec", "Québec, Quebec",
           "Regina, Saskatchewan", "Vancouver, British Columbia", "Victoria, British Columbia",
           "Toronto, Ontario", "Saskatoon, Saskatchewan"], axis=1)
    
    # Rename remaining cities as provincial representatives 
    df = df.rename(columns = {"Charlottetown and Summerside, Prince Edward Island":"Prince Edward Island", 
                              "Halifax, Nova Scotia":"Nova Scotia", 
                              "Saint John, New Brunswick":"New Brunswick", 
                              "St. John's, Newfoundland and Labrador":"Newfoundland and Labrador", 
                              "Whitehorse, Yukon":"Yukon", 
                              "Winnipeg, Manitoba":"Manitoba", 
                              "Yellowknife, Northwest Territories":"Northwest Territories"})
    
    # Back to a long table: one row per month and province
    df = df.reset_index(level=0)
    df = df.melt(id_vars = ["REF_DATE"],
                      var_name = "GEO",
                      value_name = "Value")
    
    return df

provAvg = provinceAverage(df)
provAvg

Second, we calculate the **yearly average** for each province. Averaging the monthly prices gives a clearer picture of how fuel costs change from year to year.
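A sketch of the yearly averaging on made-up monthly prices for a single province. It also counts how many months go into each yearly average, which makes a partial year (such as 2022 in this notebook's data) easy to spot.

```python
import pandas as pd

# Made-up monthly prices (cents per litre) for one province, not real data
monthly = pd.DataFrame({
    "REF_DATE": ["2021-01", "2021-02", "2021-03", "2021-04",
                 "2022-01", "2022-02", "2022-03"],
    "GEO": ["Alberta"] * 7,
    "Value": [100.0, 102.0, 104.0, 106.0, 130.0, 140.0, 150.0],
})

# Convert "YYYY-MM" strings to years, then average the months within each year
monthly["Year"] = pd.to_datetime(monthly["REF_DATE"]).dt.year
yearly = monthly.groupby(["Year", "GEO"], as_index=False).agg(
    Price=("Value", "mean"),    # yearly average price
    Months=("Value", "size"),   # how many monthly values were averaged
)
print(yearly)
# 2021 averages 4 months (103.0); 2022 averages only 3 months (140.0)
```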

In [None]:
def yearlyAverage(df):
    df["Year"] = pd.to_datetime(df["REF_DATE"]).dt.year
    df = df.drop(columns=["REF_DATE"])
    yearlyAvg = df.groupby(["Year","GEO"]).mean()
    price_yearly = yearlyAvg.sort_values(by=["Year"])
    price_yearly.rename(columns = {"Value":"Price"}, inplace=True)
    price_yearly = price_yearly.reset_index()
    return price_yearly

yearlyAvg = yearlyAverage(provAvg)
yearlyAvg

### Comment on the data
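We now have 396 rows and 3 columns: one row for each of the 12 provinces and territories in each of the 33 years from 1990 to 2022 (12 × 33 = 396). Each row holds that year's average price for one province or territory. Note that the 2022 values average only the months available so far (up to March).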
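The row count can also be checked in code as part of quality control. The helper `check_yearly_table` below is hypothetical (a sketch, not part of the original notebook); it confirms that there is exactly one row per year and region.

```python
import pandas as pd

def check_yearly_table(yearly, n_regions, first_year, last_year):
    """Hypothetical QC helper: expect exactly one row per (Year, GEO) pair."""
    expected_rows = n_regions * (last_year - first_year + 1)
    assert len(yearly) == expected_rows, f"expected {expected_rows} rows, got {len(yearly)}"
    assert not yearly.duplicated(subset=["Year", "GEO"]).any(), "found duplicate (Year, GEO) rows"
    return expected_rows

# 12 provinces and territories, 1990 to 2022 inclusive (33 years)
print(12 * (2022 - 1990 + 1))  # 396

# Made-up table: 2 regions x 3 years
toy = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021, 2022, 2022],
    "GEO": ["Alberta", "Manitoba"] * 3,
    "Price": [80.0, 85.0, 120.0, 118.0, 150.0, 148.0],
})
print(check_yearly_table(toy, n_regions=2, first_year=2020, last_year=2022))  # 6
```

In this notebook it could be called as `check_yearly_table(yearlyAvg, 12, 1990, 2022)`, which expects 396 rows.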

# Explore

The code below will be used to help us look for evidence to answer our question. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations to represent our data.

We start by plotting the change in regular gas prices by province. Only Alberta is shown at first; click a province in the **legend** to show or hide it.

In [None]:
prov = yearlyAvg["GEO"].unique()
fig = go.Figure()

markers = ["circle","square","diamond","cross","x","triangle-up","triangle-down","triangle-left","triangle-right","star","hexagram","hourglass"]
dash_styles = ["solid","dash","dot"]

for i, province in enumerate(prov):
    fig.add_trace(go.Scatter(x=yearlyAvg[yearlyAvg["GEO"].str.contains(str(province))]["Year"],
                            y=yearlyAvg[yearlyAvg["GEO"].str.contains(str(province))]["Price"],
                            name=str(province), mode="lines+markers", line={"dash":dash_styles[i%3]}, 
                            marker_symbol=markers[i], marker_size=10))

fig.for_each_trace(lambda trace: trace.update(visible=True) if trace.name=="Alberta" else (trace.update(visible="legendonly")))

fig.update_layout(xaxis_title="Year", yaxis_title="Cents per Litre (CAD)",
                    title_text="Gas Prices Across Canada, by Province", title_x=0.5, title_y = 0.9,
                    hovermode="x unified", showlegend=True)
fig.update_yaxes(range=(yearlyAvg["Price"].min()-10, yearlyAvg["Price"].max()))

#fig.write_html("./visualizations/fig1-regularGasByProvince.html")
fig.show()

In [None]:
def highestPrice(df):
    # For each province, find the row (year) with the highest yearly average price
    for province, group in df.groupby("GEO"):
        maxRow = group.loc[group["Price"].idxmax()]
        print(province, "has the highest fuel cost of", round(maxRow["Price"]),
              "cents per litre in", str(maxRow["Year"]) + ".")
    
highestPrice(yearlyAvg)

For every province and territory, the average price so far in 2022 (which covers only the first three months of the year) is the **highest yearly average** since 1990.
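The same check can be written with `groupby` and `idxmax`, which find the row of the highest yearly price for each region; comparing each record year with the latest year then answers "is the latest year a record?" directly. This is a sketch on made-up numbers, and the helper name is hypothetical.

```python
import pandas as pd

def record_year_by_region(yearly):
    """Hypothetical helper: for each region (GEO), the year and price of its highest yearly average."""
    best_rows = yearly.groupby("GEO")["Price"].idxmax()  # row label of each region's maximum
    return yearly.loc[best_rows.to_numpy(), ["GEO", "Year", "Price"]].set_index("GEO")

# Made-up yearly averages (cents per litre), not real data
toy = pd.DataFrame({
    "Year": [2020, 2021, 2022] * 2,
    "GEO": ["Alberta"] * 3 + ["Manitoba"] * 3,
    "Price": [80.0, 120.0, 150.0, 90.0, 130.0, 125.0],
})

records = record_year_by_region(toy)
latest = toy["Year"].max()
# Regions whose record price falls in the most recent year
print(records[records["Year"] == latest].index.tolist())  # ['Alberta']
```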

# Interpret

We want to confirm that fuel costs are reaching record highs in 2022. The graph and the code above show that regular fuel costs reached their peak in 2022 in every province. What about the other two types of fuel, *premium* and *diesel*?

Next, we look at all three fuel types and check whether the national average price of each one has reached a record high. The national average for a year is the mean of the provincial averages.
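A sketch of the national-average step on made-up yearly prices for two provinces and two fuel types. Like the notebook's `nationalAverage`, it takes an unweighted mean, so each province counts equally regardless of how many people live there. The result is then reshaped with `melt` into one row per year and fuel type, which suits plotting. The helper `national_average` is hypothetical.

```python
import pandas as pd

# Made-up yearly provincial averages (cents per litre), not real data
regular = pd.DataFrame({"Year": [2021, 2021, 2022, 2022],
                        "GEO": ["Alberta", "Manitoba"] * 2,
                        "Price": [120.0, 130.0, 150.0, 160.0]})
diesel = regular.assign(Price=[125.0, 135.0, 170.0, 180.0])

def national_average(yearly, name):
    """Hypothetical helper: unweighted mean of the provincial averages for each year."""
    return yearly.groupby("Year")["Price"].mean().rename(name)

# One column per fuel type, one row per year
combined = pd.concat([national_average(regular, "Regular"),
                      national_average(diesel, "Diesel")], axis=1).reset_index()

# Long format: one row per (Year, Type) pair
long_prices = combined.melt(id_vars="Year", var_name="Type", value_name="Price")
print(long_prices)
```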

In [None]:
# Get the national average for one fuel type: the mean of the provincial averages for each year.
def nationalAverage(df, name):
    df = df.groupby(["Year"])[["Price"]].mean()
    df.columns = [name]
    return df

regularGasAverage = nationalAverage(yearlyAvg, "Regular")
regularGasAverage.head()

In [None]:
# Get the national average for each gas type (regular, premium, and diesel) across all provinces.
def nationalAverageByGas(df,name):
    provincialAverage=provinceAverage(df)
    yearlyprovincialAverage=yearlyAverage(provincialAverage)
    nationalYearlyAverage=nationalAverage(yearlyprovincialAverage, name)
    return nationalYearlyAverage

regular_df = nationalAverageByGas(df, "Regular")
premium_df = nationalAverageByGas(pd.read_csv("data/premium_gas_df.csv"), "Premium")
diesel_df = nationalAverageByGas(pd.read_csv("data/diesel_gas_df.csv"), "Diesel")

# Combine all three types of gases into a single dataframe. 
allThreeGas = pd.concat([regular_df, premium_df, diesel_df], axis=1)
allThreeGas = allThreeGas.reset_index()
allThreeGas.head()

In [None]:
# Organize the dataframe to make it easier to be plotted. 
allThreeGas = allThreeGas.melt(id_vars = ["Year"],
                               var_name = "Type",
                               value_name = "Price")
allThreeGas

In [None]:
fig1 = go.Figure()
# Add one bar trace per gas type, using only that type's rows for both x and y.
# Only "Regular" is visible until another type is picked from the dropdown menu.
for gasType in ["Regular", "Premium", "Diesel"]:
    gasRows = allThreeGas[allThreeGas["Type"] == gasType]
    fig1.add_trace(go.Bar(x=gasRows["Year"], y=gasRows["Price"],
                          name=gasType, visible=(gasType == "Regular")))
fig1.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=list([
                dict(
                    args=[{"visible":[True, False, False]},
                          {"title": "Yearly Average Regular Gas Price"}],
                    label="Regular",
                    method="update"
                ),
                dict(
                    args=[{"visible":[False, True, False]},
                          {"title": "Yearly Average Premium Gas Price"}],
                    label="Premium",
                    method="update"
                ),
                dict(
                    args=[{"visible":[False, False, True]},
                          {"title": "Yearly Average Diesel Gas Price"}],
                    label="Diesel",
                    method="update"
                ),
                dict(
                    args=[{"visible":[True, True, True]},
                          {"title": "Yearly Change In Gas Price by Type"}],
                    label="View All",
                    method="update"
                )
            ]),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.11,
            xanchor="left",
            y=1.15,
            yanchor="top"
        ),
    ], 
    showlegend=True,legend=dict(yanchor="top", y=1, xanchor="left", x=0.01, bgcolor='rgba(0,0,0,0)'),
    annotations=[dict(text="Gas type:",x=0, y=1.09, yref="paper", xref="paper", align="left", showarrow=False)],
    xaxis_title="Year", yaxis_title="Cents per Litre (CAD)",
    title_text="National Average In Gas Price by Type", title_x=0.5,
    hovermode="x unified",barmode="group")
fig1.update_yaxes(range=(allThreeGas["Price"].min()-10, allThreeGas["Price"].max()+10))
#fig1.write_html("./visualizations/fig2-allGasTypes.html")
fig1.show()

Notice that the national average annual price is highest in 2022 for all three fuel types (regular gasoline, premium gasoline, and diesel).

## Compare with Big Mac Index

In [None]:
# Load the Big Mac data and keep only Canada's rows (feature selection)
bigmac_df = pd.read_csv("data/big_mac_index.csv")
bigmac_df = bigmac_df[bigmac_df["iso_a3"] == "CAN"].copy()
# Extract the year from each date and average Canada's Big Mac price within each year
bigmac_df["Year"] = pd.to_datetime(bigmac_df["date"]).dt.year
bigmac = bigmac_df.groupby("Year", as_index=False)["local_price"].mean()

# Combine Big Mac data and regular gas price data from 2011 to 2022,
# matching rows by year rather than by position
gas = regular_df.reset_index().astype({"Year": int})
gas_shortened = gas.loc[gas["Year"] > 2010, ["Year", "Regular"]]
bigmacC = bigmac.merge(gas_shortened, on="Year")
bigmacC.columns = ["Year", "BigMac", "Gas"]

# Calculate how many litres of regular fuel you can buy with that year's Big Mac price
bigmacC["Gas(1L)"] = bigmacC["Gas"] / 100  # cents per litre to dollars per litre
bigmacC["numLiters"] = (bigmacC["BigMac"] / bigmacC["Gas(1L)"]).astype(int)
bigmacC = bigmacC[["Year", "BigMac", "Gas(1L)", "numLiters"]]
bigmacC

In [None]:
fig2 = sp.make_subplots(
    rows=2, cols=1,
    specs=[[{"secondary_y": True}],
          [{"secondary_y": False}]],
    subplot_titles=("Big Mac vs. Regular Gas Price in Canada","Number of Litres of Gas that One Big Mac Can Buy"),
    vertical_spacing = 0.15)

fig2.add_trace(go.Scatter(x=bigmacC["Year"], y=bigmacC["BigMac"], name="Big Mac Price", showlegend=True, legendgroup="group1"),
                 row=1, col=1, secondary_y=False)
fig2.add_trace(go.Scatter(x=bigmacC["Year"], y=bigmacC["Gas(1L)"], name="Gasoline Price", showlegend=True, legendgroup="group2", marker_symbol='square'),
                 row=1, col=1, secondary_y=True)
fig2.add_trace(go.Bar(x=bigmacC["Year"], y=bigmacC["numLiters"], name="# of Litres", legendgroup="group3", showlegend=False),
                 row=2, col=1, secondary_y=False)

fig2.update_xaxes(title_text="Year", row=1, col=1)
fig2.update_xaxes(title_text="Year", row=2, col=1)
fig2.update_yaxes(title_text="Big Mac Price (CAD)", row=1, col=1, secondary_y=False)
fig2.update_yaxes(title_text="Gasoline Price per Litre (CAD)", row=1, col=1, secondary_y=True)
fig2.update_yaxes(title_text="Litres", row=2, col=1)

fig2.add_hline(y=3, line_width=1.5, line_dash="dot", line_color="gray",row=2, col=1)

fig2.update_annotations(font_size=15)


fig2.update_layout(hovermode="x unified", font=dict(size=10),
                  legend=dict(yanchor="top", y=1, xanchor="left", x=0.01, bgcolor='rgba(0,0,0,0)'),
                  showlegend= True, title_x = 0.5, height=700
                  )
#fig2.write_html("./visualizations/fig3-compareWithBigMac.html")
fig2.show()

Notice that between 2011 and 2014, the regular gas price stayed fairly steady while the Big Mac price rose sharply. Overall, the Big Mac price increased steadily, whereas the regular gas price fluctuated more.

In 2022, the price of **one Big Mac** buys about **4 L of regular fuel**, but in 2020 it bought up to 6 L. Because the Big Mac price kept rising during this time, the drop in litres means that the price of regular fuel rose rapidly over the last two years.
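The litres-per-Big-Mac figure is a single division once the gas price is converted from cents to dollars. A sketch with made-up prices (the helper name is hypothetical) shows why the number falls when gas gets more expensive: at the same Big Mac price, a 50% higher gas price buys two-thirds as many litres.

```python
def litres_per_big_mac(big_mac_dollars, gas_cents_per_litre):
    """Hypothetical helper: how many litres of gas one Big Mac's price would buy."""
    gas_dollars_per_litre = gas_cents_per_litre / 100  # convert cents to dollars
    return big_mac_dollars / gas_dollars_per_litre

# Made-up prices, for illustration only
print(litres_per_big_mac(6.00, 100.0))  # 6.0
print(litres_per_big_mac(6.00, 150.0))  # 4.0
```

The notebook's table rounds this number down to a whole number of litres with `astype(int)`.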

# Communicate

Below are some writing prompts to help you reflect on the new information that is presented from the data. When we look at the evidence, think about what you perceive about the information. Is this perception based on what the evidence shows? If others were to view it, what perceptions might they have?

- I used to think ____________________but now I know____________________. 
- I wish I knew more about ____________________. 
- This visualization reminds me of ____________________. 
- I really like ____________________.


[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)