# 2022 Innovation Camp Coding Challenge

## Table of Contents
- Introduction
- Instructions
    1. Load the data
    2. Clean the data
    3. Create a plot
    4. Do some analysis
    5. Create a new plot
    6. Prepare a report
- Further Directions

## Introduction

R Markdown files consist of chunks of code written in R and text written in Markdown. You can run the code chunk by chunk, or knit the entire document at once.

In the following chunk we load the packages we will need and set preferences for knitting the document. Anything after a "#" symbol on a line is a comment and is ignored when the code runs.


In [None]:
# Load libraries
library(cansim) # read in CODR/NDM tables
library(dplyr) # data manipulation verbs (select, filter) and the pipe (%>%)
library(tidyr) # allows for shaping the data
library(plotly) # for creating plots

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

## Instructions

### 1. Load the data
Today we will be working with [Stocks of specified dairy products](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=3210000101#tables). This data is stored in a CODR (Common Output Data Repository) table, which is where Statistics Canada publishes its data tables.


In [None]:
# Load our data
df <- get_cansim("32-10-0001-01")

Let's get to know the data a bit better. Let's check out the columns and the range of values.

In [None]:
# print a summary of our data
print(summary(df))

What are some things you notice about the data?

### 2. Clean the data
In short, "cleaning data" means preparing it for analysis. Removing empty values, converting values to the same format, or otherwise manipulating your data to improve its quality and uniformity are all examples of cleaning your data.
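
To make those ideas concrete, here is a minimal sketch on a small made-up table. The column names and values are invented for illustration and do not come from the dairy data. It drops an empty column, removes rows with missing values, and converts the remaining values to a consistent format.

```r
library(dplyr)
library(tidyr)

# Made-up example data (not from the dairy table)
raw <- tibble(
  region = c("Canada", " Canada", "Quebec", NA),
  value  = c("10", "12", NA, "7"),
  notes  = c(NA, NA, NA, NA)
)

clean <- raw %>%
  select(-notes) %>%              # drop a column that is entirely empty
  drop_na() %>%                   # remove rows that contain missing values
  mutate(
    region = trimws(region),      # strip stray whitespace so labels match
    value  = as.numeric(value)    # convert text to numbers
  )

print(clean)
```

Dropping the missing rows before converting keeps the later steps from having to handle `NA` values at all.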

How might we need to clean this data?

In [None]:
# drop totally empty columns
df <- df %>% select(-("STATUS" | "SYMBOL" | starts_with("Classification")))

# there are two date type columns. Let's keep the more precise one.
df <- df %>% select(-"REF_DATE")


Now let's drop some columns we aren't really interested in. Using the above as an example, drop "SCALAR_FACTOR", "SCALAR_ID", and "DECIMALS". Note that there are many different equivalent ways to drop columns.
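
As a rough illustration of those equivalent approaches, the sketch below drops the same two columns in five ways. It uses a made-up table with placeholder column names (`junk_1`, `junk_2`), so you still need to adapt it to the columns named above.

```r
library(dplyr)

# Made-up data frame; the column names are placeholders
toy <- tibble(id = 1:3, keep_me = c("a", "b", "c"), junk_1 = 0, junk_2 = 0)

way1 <- toy %>% select(-junk_1, -junk_2)                 # negate each column
way2 <- toy %>% select(-c(junk_1, junk_2))               # negate a vector of columns
way3 <- toy %>% select(-starts_with("junk"))             # negate a selection helper
way4 <- toy %>% select(!any_of(c("junk_1", "junk_2")))   # any_of() tolerates absent names
way5 <- toy[, !(names(toy) %in% c("junk_1", "junk_2"))]  # base R subsetting

# Every approach leaves the same columns behind
names(way1)
```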

In [None]:
# your code here

### 3. Create a plot
We're now a bit more familiar with our data, but it can be difficult to parse from a table! Let's create a plot so we can get a better idea of what's going on. What are some things we might want to find out and what is the best way to visualize them?

Since we have date information, it makes sense to make a time series plot! To keep things simple, let's focus on creamery butter in Canada over time.


In [None]:
# create a bar plot
fig <- plot_ly(
  data = df %>% filter(GEO == "Canada" & Commodity == "Creamery butter"), # filtering for Canada and Creamery butter only
  x = ~Date, # the x-axis will be the date
  y = ~VALUE, # the y-axis will be the values of the commodity
  type = "bar" # the type of plot will be bar
  ) %>%
  layout(title="Creamery Butter Stock in Canada", xaxis = list(title = "Date"), yaxis = list(title = "Value")) # set the title for the graph and axes.

fig # call the fig to display it
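
A time series can also be drawn as a line rather than as bars. The sketch below shows one way to do that with plotly, using made-up monthly values. Only the column names `Date` and `VALUE` mirror the table, and the name `fig_line` is just for this example.

```r
library(plotly)

# Made-up monthly values; only the column names mirror the table
toy <- data.frame(
  Date = seq(as.Date("2022-01-01"), by = "month", length.out = 6),
  VALUE = c(20, 22, 25, 24, 28, 30)
)

fig_line <- plot_ly(
  data = toy,
  x = ~Date,          # dates on the x-axis
  y = ~VALUE,         # values on the y-axis
  type = "scatter",   # scatter traces drawn as lines give a line chart
  mode = "lines"
) %>%
  layout(
    title = "Example time series (made-up values)",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Value")
  )

fig_line # call the fig to display it
```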


Using the code above, make a new plot showing the stocks of another dairy product in another region over time. Be sure to update the title as appropriate.


In [None]:
# your code here

We might also be curious about how stocks break down by type of dairy product. Let's visualize this in a pie chart. To keep it simple, let's focus on Canada and the most recent data, which is June 2022.


In [None]:
fig <- plot_ly(
  data = df %>% filter(GEO == "Canada" & Date == "2022-06-01"), # filter for Canada and June 2022 only
  labels = ~Commodity, # the labels of the pie chart will be the commodities
  values = ~VALUE, # the sizes of each slice are determined by the values of commodities
  type = "pie" # the plot will be a pie chart
  ) %>%
  layout(title="Canadian Dairy Product Stocks for 2022-06-01") # set the chart name

fig # call the fig to display

Make another pie chart for another time period.

In [None]:
# your code here

### 4. Do some analysis
Next we need to transform our data a bit further. Let's work out which commodity has the largest stocks.
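
A minimal sketch of one such computation follows. It assumes the goal is the commodity with the largest stock in the latest month for Canada, which is only one possible reading. The values and the names "Product B" and "Product C" are made up. Only the column names match the ones used earlier.

```r
library(dplyr)

# Made-up values standing in for the cleaned table
toy <- tibble(
  GEO = "Canada",
  Commodity = rep(c("Creamery butter", "Product B", "Product C"), times = 2),
  Date = rep(as.Date(c("2022-05-01", "2022-06-01")), each = 3),
  VALUE = c(50, 20, 10, 30, 45, 12)
)

largest <- toy %>%
  filter(GEO == "Canada", Date == max(Date)) %>%  # keep the latest month for Canada
  slice_max(VALUE, n = 1)                         # keep the row with the largest value

largest$Commodity
```

Swapping `slice_max()` for `arrange(desc(VALUE))` would give the full ranking instead of only the top row.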

### 5. Create a new plot

### 6. Prepare a report

Write a short summary of what you have learned. You may wish to include some plots!

## Further Directions

1. Optimization:
The data.table package provides a structure that is often faster than the built-in data.frame, especially on large data. Can you rewrite the code to make use of this structure instead?

2. Collaboration:
You could put your code on GitLab and have your colleagues provide feedback or even contribute to your code.

3. Interactivity:
Shiny is an R package for creating interactive dashboards and other web applications.
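
As a starting point for the optimization direction, here is a small sketch of how a filter and a grouped sum look in data.table syntax. The values and the name "Product B" are made up, and this is an illustration rather than a full rewrite of the code above.

```r
library(data.table)

# Made-up values; only the column names mirror the table
toy <- data.frame(
  GEO = c("Canada", "Canada", "Quebec"),
  Commodity = c("Creamery butter", "Product B", "Creamery butter"),
  VALUE = c(30, 45, 8)
)
dt <- as.data.table(toy)

# Rough equivalent of: df %>% filter(GEO == "Canada" & Commodity == "Creamery butter")
butter <- dt[GEO == "Canada" & Commodity == "Creamery butter"]

# Grouped summary: total value per region
totals <- dt[, .(total = sum(VALUE)), by = GEO]

print(butter)
print(totals)
```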