# Project Milestone Template

### Step 1a: Planning 
#### Identify the information in the file your program will read

Describe (all) the information that is available. Be sure to note any surprising or unusual features. (For example, some information sources have missing data, which may be blank or flagged using values like -99, NaN, or something else.)

<font color="blue">
    
Our information source is publicly available data on the Consumer Price Index (CPI) for Canada and its 8 components: "food", "shelter", "household operations, furnishings, and equipment", "clothing and footwear", "transportation", "health and personal care", "recreation, education and reading", and "alcoholic beverages, tobacco products, and recreational cannabis". The data covers every month from Jan 2013 to Dec 2023 and comes from Statistics Canada: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1810000401. As requested, I have uploaded the file with this submission; it contains the full set of data with no missing values that could cause errors. CPI measures the average price change for the basket of goods and services that consumers buy in Canada and is a widely used indicator of inflation. Our question is: how much does the overall CPI change from month to month, and how does this change compare with that of two major components of CPI, shelter and food, which are necessities? All three are fields in our information source. This information is important for calibrating monetary policy, making economic decisions, and gauging economic health.

</font>

### Step 1b: Planning 
#### Brainstorm ideas for what your program will produce
#### Select the idea you will build on for subsequent steps

You must brainstorm at least three ideas for graphs or charts that your program could produce and choose the one that you'd like to work on. You can choose between a line chart, histogram, bar chart, scatterplot, or pie chart.

If you would like to change your project idea from what was described in the proposal, you will need to get permission from your project TA. This is intended to help ensure that your new project idea will meet the requirements of the project. Please see the project proposal for things to be aware of when communicating with your project TA.

<font color="blue">

Idea 1: Line graph.

Idea 2: Bar chart.

Idea 3: Scatterplot.

Which one would be the best to represent the data? 

For the bar chart, we would show one bar per year, highlighting food and housing in different colors while the other 6 components share a single color. The full height of each bar would be the CPI, and the colored segments within each bar would be its components.

Our goals for the visuals: show trends and changes clearly, and make the relationship between food, housing, and overall CPI visible.

The line chart can show trends and changes over time, including major shifts in the data, and it displays the relationship between the variables visually. We can compare the series month by month by drawing one line for CPI, one for food, and one for housing on the same graph. The tradeoff is that individual data points are harder to see.
We would display CPI, food, and housing monthly over the full Jan 2013 to Dec 2023 period.

Project approach: We plan to make 2 line graphs. The data definitions provide the framework and structure for the data points that we plot on the x and y axes. In the first graph, the x-axis is time, covering every month from Jan 2013 to Dec 2023 (132 monthly observations), and the y-axis shows the monthly change in CPI and in two of its components, shelter and food, for a total of three lines. The second graph uses the same x-axis, and its y-axis shows the CPI values themselves for overall CPI, shelter, and food, again as three lines.

We plan to show how CPI, shelter, and food change over this period. We will standardize our data using lessons from data science, with one column per month from Jan 2013 to Dec 2023 (132 columns) and one row for each of food, shelter, and the other 6 components of CPI. We will filter for the shelter and food rows independently. To find the monthly change in CPI or in one of its 8 components, we will subtract the prior month's value from the current month's value and divide the result by the prior month's value, e.g. (Feb 2013 CPI - Jan 2013 CPI) / Jan 2013 CPI.
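As a minimal sketch of the calculation described above, the month-over-month change can be computed from consecutive index values. The function name `monthly_changes` and the sample values are illustrative assumptions and are not part of the project code.

```python
from typing import List


def monthly_changes(values: List[float]) -> List[float]:
    """Return (current - previous) / previous for each consecutive pair."""
    changes = []  # type: List[float]
    for previous, current in zip(values, values[1:]):
        changes.append((current - previous) / previous)
    return changes


# Illustrative index values only (not taken from the data file)
sample_cpi = [100.0, 102.0, 101.49]
print(monthly_changes(sample_cpi))
```

A series of n monthly values yields n - 1 changes, because the first month has no prior month to compare against.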

The data definition would be a compound for CPI that links its 8 components. The helper functions would parse the data, filter it for food and housing, and aggregate it for plotting after calculating the average. The graphing functions would draw the line graph, with data points and lines connecting related points.

How the main program might work: it opens the CPI file, reads it, extracts the relevant components (CPI, food, and housing), organizes them into CPIHistorical, and plots a line chart.

The bar chart is best used for comparing the magnitudes of different categories. It is less relevant here, because all of the data is related to CPI and the categories being compared are related. It is a strong, clear visual, and different colors can be used to emphasize ideas. It can show a greater number of categories without feeling cluttered or overwhelming. A proportional bar graph can show all the components of CPI, each in a different color. However, it is difficult to show individual data points or trends over time, and it is harder to show relationships between variables than with the line graph.
We could compare the average CPI for the food and housing components over the period using two bar charts, or use proportional bar charts on an annual basis, one per year.

Project approach: We plan to make one proportional bar graph per year, showing food and housing as components of CPI in different colors while the other 6 categories are greyed out. The x-axis would be time and the y-axis would be the change in CPI for food and housing. Changes in CPI are calculated monthly and then averaged over 12 months to plot. The data definition would be a list of tuples.

cpi_data = [
    ("Jan", {"Food": 131.60, "Housing": 27.80, "CPI": 121.35}),
    ("Feb", {"Food": 32.922, "Housing": 27.912, "CPI": 122.718}),
    ("Mar", {"Food": 32.4124, "Housing": 28.0115, "CPI": 2.9121}),
]

How the main program might work: it opens the CPI file, reads it, extracts the relevant components (CPI, food, and housing), organizes them into CPIHistorical and the right data structure, and plots a bar chart.
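The averaging step for the bar chart could look roughly like the sketch below. It assumes the list-of-tuples layout above, filled with hypothetical monthly change values; `MonthRecord` and `average_change` are illustrative names, not part of the project code.

```python
from typing import Dict, List, Tuple

# One record per month: (month label, changes keyed by component name)
MonthRecord = Tuple[str, Dict[str, float]]


def average_change(records: List[MonthRecord], component: str) -> float:
    """Average one component's monthly change across all records."""
    if not records:
        return 0.0
    total = 0.0
    for _month, changes in records:
        total += changes[component]
    return total / len(records)


# Hypothetical monthly changes in percent, for illustration only
sample_changes = [
    ("Jan", {"Food": 0.5, "Housing": 0.25, "CPI": 0.25}),
    ("Feb", {"Food": 1.5, "Housing": 0.75, "CPI": 0.75}),
]
print(average_change(sample_changes, "Food"))
```

Returning 0.0 for an empty list is a design choice made here to avoid dividing by zero; a real implementation might prefer to treat that case as an error.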

A scatterplot is best used to show the relationship between two variables. It is best for finding outliers and identifying patterns in data. We have three variables, which is not ideal for a two-variable scatterplot; plotting all three would overcrowd the data points and make the plot ineffective at showing what we want.
A scatterplot would allow us to visualize how the food and housing components of CPI are related. We would plot food and housing CPI values to identify trends and any outliers. The outliers and patterns can be looked at more closely to identify inflation trends.

The data definition compound CPIHistorical is created as for the line graph. The x-axis shows the month and the y-axis shows the change in CPI for overall CPI or its components food and shelter.

How the main program might work: it opens the CPI file, reads it, extracts the relevant components (CPI, food, and housing), organizes them into CPIHistorical, and plots a scatterplot showing the relationship between CPI, food, and housing.

</font>

### Step 1c: Planning 
#### Write or draw examples of what your program will produce

You must include a **hand-drawn image** that shows what your chart or plot will look like. You can insert an image using Edit -> Insert Image.

![Screenshot 2024-03-18 221834.png](attachment:b098bdfb-2891-438a-b24b-f18c95463222.png)

![Screenshot 2024-03-18 221853.png](attachment:a62cfa6f-6315-49cf-bbf3-032db8a664b4.png)

### Step 2a: Building
#### Document which information you will represent in your data definitions

Before you design data definitions in the code cell below, you must explicitly document here which information in the file you chose to represent and why that information is crucial to the chart or graph that you'll produce when you complete step 2c.

<font color="blue">
    
In recent years, especially since the height of the pandemic passed, inflation in food and housing has risen dramatically, outpacing wage growth. Before the pandemic, home price increases were limited to major Canadian cities such as Toronto or Vancouver, where many immigrants and younger people gravitate for better opportunities. During the pandemic, supply chains became constrained, causing delays and shortages, with port prices increasing to as much as 10x the normal price. This raised the cost of goods, including construction materials, making homes more expensive to build. With a rising population and limited goods, greater demand chased fewer goods. Of the goods in the basket the average Canadian buys, food, shelter, and clothing are human necessities, and of these three, food and shelter are needed for basic survival.

We saw increases in both home prices and rental costs in major cities and across Canada, from Halifax, Nova Scotia to small towns. There are few places one can go to escape the effects of inflation. Moreover, extreme weather linked to climate change, from droughts to floods to cold snaps after crops have budded, has devastated crops. Combined with higher transportation costs from higher oil prices (partly related to wars happening globally), this has caused grocery chains to pass on the extra cost to consumers.

We wish to compare the monthly change in overall CPI, which reflects the basket of goods the average Canadian purchases, with the changes in Food and Housing, which more strongly affect price-sensitive consumers, those with little disposable income, and the vulnerable population. We wish to compare these from Jan 2013 to Dec 2023, with months on the x-axis (132 data points) and overall CPI, food, and housing on the y-axis. Moreover, Food and Housing are considered volatile goods subject to spikes, which are often taken out of core CPI calculations. Time is monthly, represented either as a string (Jan 2013, Feb 2013, ..., Dec 2023) or numerically (01-2013 to 12-2023).
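The string form of the time axis described above could be generated as in the following sketch. `MONTH_NAMES` and `month_labels` are hypothetical names that do not appear in the project code.

```python
from typing import List

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_labels(first_year: int, last_year: int) -> List[str]:
    """Build labels such as 'Jan 2013' for every month in the year range."""
    labels = []  # type: List[str]
    for year in range(first_year, last_year + 1):
        for name in MONTH_NAMES:
            labels.append(name + " " + str(year))
    return labels


labels = month_labels(2013, 2023)
print(len(labels), labels[0], labels[-1])  # prints: 132 Jan 2013 Dec 2023
```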

Hypothesis: Because of the factors above, we predict greater fluctuations in Food and Housing CPI than in overall CPI. These could include periods of deep decline relative to overall CPI, but also rapid price increases, showing up as the inflation we have experienced since the pandemic.

</font>

#### Design the data definitions

In [1]:
from cs103 import *
from typing import NamedTuple, List

In [15]:
##################
# Data Definitions

CanadianCpi = NamedTuple('CanadianCpi',[('product_group',str),
                                       ('month',int), # in range[1,12]
                                       ('year',int), # in range[2013,2023]
                                       ('cpi',float)])

# interp. The Canadian CPI data recorded by Statistics Canada, with product group, month, year, and CPI in each observation

CC1 = CanadianCpi("All-items", 1, 2013, 121.3)
CC2 = CanadianCpi("Food", 5, 2016, 143.3)

@typecheck
def fn_for_canadian_cpi(cc:CanadianCpi)->...: #template based on Compound
    return ...(cc.product_group,
               cc.month,
               cc.year,
               cc.cpi)

#List[CanadianCpi]
# interp. a list of CanadianCpi

LOCC0 = []
LOCC1 = [CC1]
LOCC2 = [CC1, CC2]

@typecheck
def fn_for_locc(locc: List[CanadianCpi]) -> ...: # template based on Arbitrary-sized and reference rule
    # description of acc
    acc = ... # type: ...
    for cc in locc:
        acc = ... (acc, fn_for_canadian_cpi(cc))

    return ...(acc)

MonthlyChange = NamedTuple('MonthlyChange', [('product_group', str),
                                         ('month', int), # in range [1,12]
                                         ('year', int), # in range [2013,2023]
                                        ('cpi_change',float)])
# interp. The MonthlyChange shows the change in CPI monthly, grouped by product group, year, and month

MC1 = MonthlyChange("All-items", 2, 2013,1.4)
MC2 = MonthlyChange("Shelter", 2, 2013,0.1)

@typecheck
def fn_for_monthly_change(mc:MonthlyChange)->...: #template based on Compound
    return ...(mc.product_group,
               mc.month,
               mc.year,
               mc.cpi_change)

#List[MonthlyChange]
# interp. a list of MonthlyChange

LOMC0 = []
LOMC1 = [MC1]
LOMC2 = [MC1, MC2]

@typecheck
def fn_for_lomc(lomc: List[MonthlyChange]) -> ...: # template based on Arbitrary-sized and reference rule
    # description of acc
    acc = ... # type: ...
    for mc in lomc:
        acc = ... (acc, fn_for_monthly_change(mc))

    return ...(acc)
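To illustrate how the two data definitions relate, the sketch below turns a list of `CanadianCpi` into a list of `MonthlyChange` using the formula from Step 1b. It is one possible approach, not the project's final design: the function name `changes_by_group` is hypothetical, the `NamedTuple` definitions are repeated only so the example runs on its own (without `cs103`), and it assumes each product group has one observation for every consecutive month.

```python
from typing import Dict, List, NamedTuple

CanadianCpi = NamedTuple('CanadianCpi', [('product_group', str),
                                         ('month', int),
                                         ('year', int),
                                         ('cpi', float)])
MonthlyChange = NamedTuple('MonthlyChange', [('product_group', str),
                                             ('month', int),
                                             ('year', int),
                                             ('cpi_change', float)])


def changes_by_group(locc: List[CanadianCpi]) -> List[MonthlyChange]:
    """Compute the relative change between consecutive observations of each group."""
    # group observations by product group, keeping first-seen group order
    groups = {}  # type: Dict[str, List[CanadianCpi]]
    for cc in locc:
        groups.setdefault(cc.product_group, []).append(cc)

    result = []  # type: List[MonthlyChange]
    for group_rows in groups.values():
        # sort chronologically so each row is compared with the one before it
        ordered = sorted(group_rows, key=lambda cc: (cc.year, cc.month))
        for prev, curr in zip(ordered, ordered[1:]):
            change = (curr.cpi - prev.cpi) / prev.cpi
            result.append(MonthlyChange(curr.product_group, curr.month,
                                        curr.year, change))
    return result


# Illustrative values only (not taken from the data file)
sample = [
    CanadianCpi("Food", 2, 2013, 102.0),
    CanadianCpi("All-items", 1, 2013, 100.0),
    CanadianCpi("Food", 1, 2013, 100.0),
    CanadianCpi("All-items", 2, 2013, 101.0),
]
for mc in changes_by_group(sample):
    print(mc)
```

The first month of each product group produces no `MonthlyChange`, since there is no earlier observation to compare it with.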


### Step 2b: Building
#### Design a function to read the information and store it as data in your program

Complete this step in the code cell below. Your `read` function should remove any row with invalid or missing data but otherwise keep all the rows. I.e., you should **not** design the `read` function such that it only returns the rows you need for step 2c.
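One way to meet the requirement of removing invalid or missing rows is to check each row before building a data value from it. The sketch below is only an illustration: `is_valid_row` and `VALID_MONTHS` are hypothetical names, and the three-column layout (product group, time as "MMM-YY", CPI value) is an assumption about the CSV file.

```python
import math
from typing import List

VALID_MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}


def is_valid_row(row: List[str]) -> bool:
    """Return True if the row has a product group, an 'MMM-YY' time, and a numeric CPI."""
    # assumed layout: row[0] = product group, row[1] = time, row[2] = CPI value
    if len(row) < 3 or row[0].strip() == "":
        return False
    parts = row[1].split("-")
    if len(parts) != 2 or parts[0] not in VALID_MONTHS or not parts[1].isdigit():
        return False
    try:
        value = float(row[2])
    except ValueError:
        return False
    # float("NaN") parses successfully, so reject it explicitly
    return not math.isnan(value)


print(is_valid_row(["Food", "Jan-13", "131.6"]))
print(is_valid_row(["Food", "Jan-13", ""]))
```

A `read` function could call such a check inside its loop and skip any row for which it returns False.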

You can choose to continue to build on this file when completing the final submission for the project (as opposed to copying your code over to the `project_final_submission_template.ipynb` file). However, if this is the approach you are taking, please go to the `project_final_submission_template.ipynb` file and read through the "Step 2b and 2c: Building" section. This section contains crucial information about common issues students encounter. We expect that you will be familiar with this information.

In [17]:
###########
# Functions
import csv

@typecheck
def main(filename:str)->List[MonthlyChange]:
    """
    Given a filename, then:
    1) read and store the data as a list of CanadianCpi, 
    2) compute the monthly change and return a list of MonthlyChange,
       which consists of the change of cpi each month and its corresponding month, year, and product group
    """
    # template from HtDAP, and function composition
    return analyze(read(filename))

@typecheck
def read(filename: str) -> List[CanadianCpi]:
    """    
    reads information from the specified file and returns a list of CanadianCpi
    """
    #return []  #stub
    # Template from HtDAP
    # locc contains the result so far
    locc = [] # type: List[CanadianCpi]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        # next(reader) # skip header line

        for row in reader:
            month = split_month(row[1])
            year = split_year(row[1])
            # field order must match CanadianCpi: product_group, month, year, cpi
            cc = CanadianCpi(row[0], month, year, float(row[2]))
            locc.append(cc)
    
    return locc

@typecheck
def analyze(locc: List[CanadianCpi])->List[MonthlyChange]:
    """
    Given a list of CanadianCpi, then:
    1) group the data by their product groups
    2) compute the change per month based on different product groups
    3) return a list of MonthlyChange
    """
    return [] # stub
    # template based on composition
    

@typecheck
def split_month(time:str)->int:
    """
    Given a time in the format "MMM-YY", split out month and return it as an int
    """
    # return 0 # stub
    # template based on Composition
    string_split = time.split("-")
    month = string_split[0]
    return month_to_int(month)

@typecheck
def month_to_int(month:str)->int:
    """
    Given one month in str, return its corresponding int number 
    """
    # return 0 # stub
    # template from  mapping dictionary
    
    month_dict = {
        'Jan': 1,
        'Feb': 2,
        'Mar': 3,
        'Apr': 4,
        'May': 5,
        'Jun': 6,
        'Jul': 7,
        'Aug': 8,
        'Sep': 9,
        'Oct': 10,
        'Nov': 11,
        'Dec': 12
    }
    return month_dict.get(month)

@typecheck
def split_year(time:str)->int:
    """
    Given a time in the format "MMM-YY", split out the year and return it as an int
    """
    # return 0 # stub
    # template from Atomic Non-Distinct
    string_split = time.split("-")
    return int('20' + string_split[1])

@typecheck
def group_by(locc: List[CanadianCpi], product_group: str)->List[CanadianCpi]:
    """
    Given a List of CanadianCpi and a product group, return a List of the CanadianCpi that belong to that group
    """
    # return [] # stub
    # template from List[CanadianCpi] with an additional parameter, product_group

    # the CanadianCpi that matched the given product_group
    acc = [] # type: List[CanadianCpi]

    for cc in locc:
        if is_same_group(cc, product_group):
            acc.append(cc)

    return acc

@typecheck
def is_same_group(cc: CanadianCpi, product_group: str)->bool:
    """
    Given a CanadianCpi and a product group, return True if they match, False otherwise
    """
    # return True # stub
    #  template from CanadianCpi with an additional parameter, product_group
    
    return cc.product_group == product_group 

# Begin testing
start_testing()

# Examples and tests for read
expect(..., ...)

# Examples and tests for split_month
expect(split_month("Jan-13"), 1)
expect(split_month("Feb-23"), 2)

# Examples and tests for month_to_int
expect(month_to_int('Jan'), 1)
expect(month_to_int('Feb'), 2)
expect(month_to_int('Mar'), 3)
expect(month_to_int('Apr'), 4)
expect(month_to_int('May'), 5)
expect(month_to_int('Jun'), 6)
expect(month_to_int('Jul'), 7)
expect(month_to_int('Aug'), 8)
expect(month_to_int('Sep'), 9)
expect(month_to_int('Oct'), 10)
expect(month_to_int('Nov'), 11)
expect(month_to_int('Dec'), 12)

# Examples and tests for split_year
expect(split_year("Jan-13"), 2013)
expect(split_year("Feb-23"), 2023)

# Examples and tests for group_by
expect(group_by(LOCC0, 'All-items'),[])
expect(group_by(LOCC1, 'All-items'),[CC1])
expect(group_by(LOCC2, 'All-items'),[CC1])

# Examples and tests for is_same_group
expect(is_same_group(CC1, 'All-items'), True)
expect(is_same_group(CC2, 'All-items'), False)

# show testing summary
summary()

22 of 22 tests passed


In [None]:
# Be sure to select ALL THE FILES YOU NEED (including csv's) 
# when you submit. Also, UNLIKE USUAL, YOU CAN EDIT THIS CELL!
# That's in case you want to switch the ASSIGNMENT code for the final
# submission. Run this cell to start the submission process.
from cs103 import submit

COURSE = 130118

# Uncomment the ASSIGNMENT row for your section
# ASSIGNMENT = 1740563 #if you are in section 201
# ASSIGNMENT = 1788043 #if you are in section 202

# UNCOMMENT for final submission and RE-COMMENT line above
# ASSIGNMENT = 1740562 #if you are in section 201
# ASSIGNMENT = 1788053 #if you are in section 202

submit(COURSE, ASSIGNMENT)

# If your submission fails, SUBMIT by downloading your files and uploading them to 
# Canvas. You can learn how on the page "How to submit your Jupyter notebook" on 
# our Canvas site.

# Please double check your submission on Canvas to ensure that the right files (Jupyter file + CSVs) have been submitted and that the files do not contain unexpected errors.

<font color="red">**You should always check your submission on Canvas. It is your responsibility to ensure that the correct file has been submitted for grading.**</font> Regrade or accommodation requests using reasoning such as "I didn't realize I submitted the wrong file"/"I didn't realize the submission didn't work"/"I didn't realize I didn't save before submitting so some of my work is missing" will not be considered.