# CPSC 330 final exam

The University of British Columbia

Instructor: Mike Gelbart

April 24, 2020

## Instructions

#### What is / is not allowed.

- This exam is open book. You are welcome to consult the course materials, online resources, etc.
- You are allowed to copy/adapt/reuse code from the course materials (lectures, my homework solutions, your homework solutions) **with attribution**.
  - For each of the coding questions, there is a section underneath that asks you to list the resources you borrowed code from. 
  - Using code from the course materials without attribution may be considered academic misconduct.
- You are **not** allowed to copy/adapt/reuse code from anywhere other than course materials.
  - Using code from anywhere other than the course materials will be considered academic misconduct.
- You are **not** allowed to copy text, visualizations, or anything other than code from anywhere.
- You are **not** allowed to communicate with **anyone else, in any way** during the exam. 
  - This includes talking in-person, phone, text, chat apps, screen sharing, email, sharing your notebook, or any form of communication. 
  - This restriction applies to any other person, regardless of whether or not they are enrolled in CPSC 330.

#### Submission instructions.

- You will receive and submit your exam notebook the same way you submit your homework, through github.students.cs.ubc.ca. 
- As with the homework assignments, **you must ensure that all your code outputs (scores, tables, figures, etc.) are displayed in the notebook**. For example, if you are required to calculate some value, it is not sufficient to just store the value in a variable, nor is it sufficient to have a `print(value)` in your code: the print code must actually be run and the notebook saved, so that the output is shown on the screen when the notebook is rendered. This allows us to see your results without running your code. 
  - When you are done, take a look at your rendered notebook in a web browser at github.students.cs.ubc.ca, to make sure all the output is displayed properly.
- **It is essential that you commit and push your work to GitHub frequently.** If you have a connection problem at the end of the exam and you miss the deadline, we will grade your latest work that was successfully pushed. Thus, if you only try to push once at the end and something goes wrong, you will not have a submission and will receive zero. You have been warned.
- You will gain read and write access to your repository at 12:00pm. You will lose write access to your repository at 2:30pm.
- Answer the questions directly in this notebook, in the same way that you would for an assignment.

#### System requirements.

- You will need a computer with Python 3, Jupyter, and the main Python packages we have used in the course, such as pandas, scikit-learn, matplotlib, etc. 
- You will not need any of the "extra" packages in the course, such as graphviz, pandas_profiling, tensorflow, gensim, xgboost, lightgbm, catboost, lifelines, shap, etc.
- If you are using the same system as you used for the homework assignments, you should be fine.
- If you are using a new or different system than the one you used during the course, please make sure you can run all the homework solutions before the exam starts. 
- I have tried to create the exam such that you don't need to do any heavy-duty computations. 
  - If something is running too slowly on your machine, try something else and just add a quick note explaining that the code was too slow.

#### Questions and announcements.

- If I need to make any announcements or clarifications during the exam, I will post them as followup discussions on [this Piazza thread](https://piazza.com/class/k1gx4b3djbv3ph?cid=388).
- You are responsible for monitoring Piazza for any announcements or clarifications.
- If you have questions during the exam, send me a **private** post on Piazza. 
  - I have enabled private posts.
  - I will check Piazza regularly during the exam.
  - I will respond through the same private message thread on Piazza.
  - I will answer questions in the order they are received. 
  
#### Contingency plans.

- In the unlikely event that Piazza goes down during the exam, I will post announcements at the top of the course website README [here](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home). If you have a private question, email me at mgelbart@cs.ubc.ca.
- In the unlikely event that github.students.cs.ubc.ca goes down during the start of the exam, I will distribute the exam by posting it on Piazza.
- In the unlikely event that github.students.cs.ubc.ca goes down at the end of the exam, email your completed exam to mgelbart@cs.ubc.ca before the end time.
  - **Please do not** email me the exam if github.students.cs.ubc.ca is working. 

## Integrity Pledge

This is an online exam without invigilation. I, and your fellow classmates, are trusting you to approach this exam honourably and abide by the rules. The two main problems with cheating are (1) you might get caught and (2) you are permanently changing your path through your life in a way that you may later regret. 

IMHO it is easier to recover from a low grade than it is to recover from being a person who conducted themselves dishonestly. In case you disagree with me on that, hopefully problem (1) will deter you from cheating. 

We will be using the integrity pledge wording set out by the Faculty of Science:

> I hereby pledge that I have read and will abide by the rules, regulations, and expectations set out in the Academic Calendar, with particular attention paid to:
> 1. [The Student Declaration](http://www.calendar.ubc.ca/vancouver/index.cfm?tree=3,285,0,0)
> 2. [The Academic Honesty and Standards](http://www.calendar.ubc.ca/vancouver/index.cfm?tree=3,286,0,0)
> 3. [The Student Conduct During Examinations](http://www.calendar.ubc.ca/vancouver/index.cfm?tree=3,41,90,0)
> 4. And any special rules for conduct as set out by the examiner.

As far as "special rules" are concerned, please refer to the "What is / is not allowed." section in the Instructions above. 

The following wording is also from the Faculty of Science:

> I affirm that I will not give or receive any unauthorized help on this examination, that all work will be my own, and that I will abide by any special rules for conduct set out by the examiner.

<font color='red'>**In the markdown cell below, you are required to re-type, or copy/paste, the sentence above, and then "sign" your name (i.e. type your full name underneath it).**</font> Please do it now so you don't forget.

_copy the sentence starting with "I affirm" here_

_put your name here_

## Table of Contents

- Q1 (5 points)
- Q2 (5 points)
- Q3 (15 points) 
- Q4 (20 points)
- Q5 (10 points)
- Q6 (20 points)
- Q7 (25 points)

Total: 100 points.

## Imports

```python
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder, FunctionTransformer
from sklearn.feature_extraction.text import CountVectorizer
```

## Canadian Cheese Directory

In this exam, we will be looking at the [Canadian Cheese Directory dataset](https://open.canada.ca/data/en/dataset/3c16cd48-3ac3-453f-8260-6f745181c83b) from Agriculture and Agri-Food Canada. Because this data is distributed under the [Canadian Open Government License](https://open.canada.ca/en/open-government-licence-canada), I was able to include the data in your final exam repositories, so you should not need to download the dataset. The following code should run:

```python
df = pd.read_csv("canadianCheeseDirectory.csv", index_col=0)
```

We will be predicting `FatContentPercent`, which is not available for all the cheeses, so I will first filter out those where this is not available:

```python
df = df.dropna(subset=['FatContentPercent'])
```

Let's take a look at the column names:

```python
df.columns
```

```
Index(['CheeseNameEn', 'CheeseNameFr', 'ManufacturerNameEn',
       'ManufacturerNameFr', 'ManufacturerProvCode', 'ManufacturingTypeEn',
       'ManufacturingTypeFr', 'WebSiteEn', 'WebSiteFr', 'FatContentPercent',
       'MoisturePercent', 'ParticularitiesEn', 'ParticularitiesFr',
       'FlavourEn', 'FlavourFr', 'CharacteristicsEn', 'CharacteristicsFr',
       'RipeningEn', 'RipeningFr', 'Organic', 'CategoryTypeEn',
       'CategoryTypeFr', 'MilkTypeEn', 'MilkTypeFr', 'MilkTreatmentTypeEn',
       'MilkTreatmentTypeFr', 'RindTypeEn', 'RindTypeFr', 'LastUpdateDate'],
      dtype='object')
```

The columns are duplicated in English and French (e.g. `MilkTypeEn` vs. `MilkTypeFr`). In most cases, this is just duplicated information and we can drop the French columns. However, in two cases we need to first merge the English and French columns because the information may be stored in either column:

```python
df["ManufacturerName"] = df["ManufacturerNameEn"].fillna(df["ManufacturerNameFr"])
df = df.drop(columns=["ManufacturerNameEn", "ManufacturerNameFr"])
```

```python
df["CheeseName"] = df["CheeseNameEn"].fillna(df["CheeseNameFr"])
df = df.drop(columns=["CheeseNameEn", "CheeseNameFr"])
```
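To make the merge step concrete, here is a minimal sketch on a made-up toy frame (the column names `NameEn`/`NameFr` and the values are hypothetical, not taken from the directory). `fillna` with another Series aligns on the index, so each row takes its English value where present and falls back to its own French value otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one name exists only in French, one only in English.
toy = pd.DataFrame({
    "NameEn": ["Cheddar", np.nan, "Brie"],
    "NameFr": ["Cheddar", "Chèvre frais", np.nan],
})

# Prefer the English value; where it is NaN, use the French value from the same row.
toy["Name"] = toy["NameEn"].fillna(toy["NameFr"])
toy = toy.drop(columns=["NameEn", "NameFr"])
print(toy["Name"].tolist())  # ['Cheddar', 'Chèvre frais', 'Brie']
```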

Now we're ready to drop all the French columns:

```python
df = df.drop(columns=[col for col in df.columns if col.endswith("Fr")])
df.columns
```

```
Index(['ManufacturerProvCode', 'ManufacturingTypeEn', 'WebSiteEn',
       'FatContentPercent', 'MoisturePercent', 'ParticularitiesEn',
       'FlavourEn', 'CharacteristicsEn', 'RipeningEn', 'Organic',
       'CategoryTypeEn', 'MilkTypeEn', 'MilkTreatmentTypeEn', 'RindTypeEn',
       'LastUpdateDate', 'ManufacturerName', 'CheeseName'],
      dtype='object')
```

Next we'll do the train/test split:

```python
df_train, df_test = train_test_split(df, random_state=123)
```

I will start with a bit of exploration:

```python
df_train.head()
```

| CheeseId | ManufacturerProvCode | ManufacturingTypeEn | WebSiteEn | FatContentPercent | MoisturePercent | ParticularitiesEn | FlavourEn | CharacteristicsEn | RipeningEn | Organic | CategoryTypeEn | MilkTypeEn | MilkTreatmentTypeEn | RindTypeEn | LastUpdateDate | ManufacturerName | CheeseName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1432 | QC | Industrial | http://www.damafro.ca/en/home.html | 22.0 | 58.0 | Organic | Mild and acidulous | Creamy cheese | Unripened | 1 | Fresh Cheese | Goat | Pasteurized | No Rind | 2016-02-03 | Damafro | Chèvre des Alpes BIO |
| 2281 | QC | Artisan | NaN | 35.0 | 33.0 | NaN | Candied fruit flavor with hints of caramel. | NaN | 6 months | 1 | Firm Cheese | Cow | Raw Milk | NaN | 2016-02-03 | Fromagerie Au Gré des Champs | Frère Chasseur (Le) |
| 1908 | QC | Artisan | NaN | 22.0 | 69.0 | NaN | NaN | NaN | Unripened | 0 | Fresh Cheese | Goat | Pasteurized | No Rind | 2016-02-03 | Fromagerie Couland | Mon précieux |
| 2224 | QC | Artisan | NaN | 33.0 | 33.0 | NaN | NaN | NaN | NaN | 0 | Firm Cheese | Ewe | NaN | Washed Rind | 2016-02-03 | Maison d'affinage Maurice Dufour (La) | Tomme de Brebis de Charlevoix |
| 2007 | QC | Farmstead | NaN | 30.0 | 42.0 | NaN | Hazelnut flavour | NaN | NaN | 0 | Firm Cheese | Cow | Pasteurized | No Rind | 2016-02-03 | Fromagerie Ferme du littoral | Cheddar Littoral |

```python
df_train.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 781 entries, 1432 to 2391
Data columns (total 17 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   ManufacturerProvCode  781 non-null    object 
 1   ManufacturingTypeEn   781 non-null    object 
 2   WebSiteEn             441 non-null    object 
 3   FatContentPercent     781 non-null    float64
 4   MoisturePercent       770 non-null    float64
 5   ParticularitiesEn     337 non-null    object 
 6   FlavourEn             599 non-null    object 
 7   CharacteristicsEn     491 non-null    object 
 8   RipeningEn            503 non-null    object 
 9   Organic               781 non-null    int64  
 10  CategoryTypeEn        762 non-null    object 
 11  MilkTypeEn            780 non-null    object 
 12  MilkTreatmentTypeEn   732 non-null    object 
 13  RindTypeEn            543 non-null    object 
 14  LastUpdateDate        781 non-null    object 
 15  ManufacturerName   
```

Next, I'll split up the features into the various types:

```python
numeric_features = ['MoisturePercent']
categorical_features = ['ManufacturerProvCode', 'ManufacturingTypeEn', 'Organic', 'CategoryTypeEn', 'MilkTypeEn', 'MilkTreatmentTypeEn', 'RindTypeEn']
text_features = ['CheeseName', 'FlavourEn', 'CharacteristicsEn']
drop_features = ['WebSiteEn', 'ParticularitiesEn', 'RipeningEn', 'LastUpdateDate', 'ManufacturerName']
target_column = 'FatContentPercent'
```

```python
# Sanity check: every column is assigned to exactly one of the groups above.
assert set(numeric_features + categorical_features + text_features + drop_features + [target_column]) == set(df_train.columns)
```

### Q1: cheese ripening
rubric={points:5}

I decided to drop the feature `RipeningEn` because it was a hassle to deal with. Here are the unique values of this feature:

```python
df_train['RipeningEn'].unique()
```

```
array(['Unripened', '6 months', nan, '3 Months', 'Less than 1 Month',
       '2 Months', '9 Months', '2 months', '4 Months',
       'More than 5 Years', '2 to 5 year', '6 Months', '2 Years', 'None',
       '10 day minimum', '10 days minimum', '1 Month', 'Minimum 10 days',
       '4 Years', '3 months', '2 weeks', '30 days', '3 to 6 months',
       '1 to 5 year', '1 Year', 'unripened', '18 Months', '6 or 7 weeks',
       '90 days', '4 months', '5 Months', '15 Months', '10 days',
       '3 Years', '3 years', '30 days in brine', '12 months', '2 days',
       '1 month', '10 Months', 'Unriped', '5 days', '45 days', '3 weeks',
       '3 to 5 months', '60 days', '5 Years', '1 year', '80 days',
       '2-3 months', '5 months', '9 months', 'Not required to ripen.'],
      dtype=object)
```

Describe how you would preprocess this feature into something useful. What type of feature (numeric, categorical, etc.) would you end up with? Are there special cases you would need to handle? **Max 3 sentences**.

----------------

### Q2: target values
rubric={points:5}

Make an argument for or against log-transforming the target values in this problem. A good answer will reference this particular problem we're working on, rather than being generally applicable to any problem. **Max 3 sentences.**

Note: regardless of your argument, please do **not** transform your targets in the code below, as this would make your exam harder to grade.

--------

Next we'll preprocess the features. This should look fairly familiar to you, except for the preprocessing of the text features. In [hw5](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home/blob/master/hw-solutions/hw5/hw5.ipynb) we also mixed text features with other features, but there we did not put the `CountVectorizer` directly into the `ColumnTransformer`, as I am doing here. Furthermore, there are 3 text columns, and I am creating a separate `CountVectorizer` for each one. You do not need to understand every detail, just what the code does in general.

```python
y_train = df_train[target_column]
y_test = df_test[target_column]
```

```python
numeric_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline([
    # For string data, the default constant fill_value is 'missing_value'.
    ('imputer', SimpleImputer(strategy='constant')),
    # Note: scikit-learn >= 1.2 renames `sparse` to `sparse_output`.
    ('onehot', OneHotEncoder(sparse=False, handle_unknown='ignore'))
])

# Fit a separate CountVectorizer for each of the text columns.
# Need to convert the resulting sparse matrices to dense separately.
text_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='constant', fill_value='')),
    # The ColumnTransformer passes a 2-D (n, 1) array; CountVectorizer needs 1-D.
    ('tolist', FunctionTransformer(lambda x: x.ravel(), validate=False)),
    ('countvec', CountVectorizer(max_features=10, stop_words='english')),
    ('todense', FunctionTransformer(lambda x: x.toarray(), validate=False))
])

# ColumnTransformer clones text_transformer for each entry, so every text
# column gets its own fitted CountVectorizer, named after the column.
preprocessor = ColumnTransformer([
    ('numeric', numeric_transformer, numeric_features),
    ('categorical', categorical_transformer, categorical_features)
] + [(f, text_transformer, [f]) for f in text_features])
```
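To see why the `tolist` step in `text_transformer` is needed, here is a small sketch on a hypothetical three-row DataFrame (the values are made up). When a `ColumnTransformer` is given a column list such as `['FlavourEn']`, the pipeline receives a 2-D array of shape `(n, 1)`, while `CountVectorizer` expects a 1-D sequence of strings, so `ravel` flattens it first.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data with one missing text value.
toy = pd.DataFrame({"FlavourEn": ["mild nutty", np.nan, "nutty sharp"]})

# After imputation the selected column is still 2-D: one row, one column per cheese.
imputed = SimpleImputer(strategy="constant", fill_value="").fit_transform(toy[["FlavourEn"]])
print(imputed.shape)          # (3, 1)
print(imputed.ravel().shape)  # (3,): one string per row, as CountVectorizer expects

# Same structure as text_transformer above, minus max_features/stop_words.
text_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="constant", fill_value="")),
    ("tolist", FunctionTransformer(lambda x: x.ravel(), validate=False)),
    ("countvec", CountVectorizer()),
    ("todense", FunctionTransformer(lambda x: x.toarray(), validate=False)),
])
ct = ColumnTransformer([("FlavourEn", text_pipe, ["FlavourEn"])])
X = ct.fit_transform(toy)
print(X.shape)  # (3, 3): one column per vocabulary word (mild, nutty, sharp)
```

The missing value becomes an empty string, so that row simply gets all-zero word counts.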

```python
preprocessor.fit(df_train);
```

```python
def get_column_names(preprocessor):
    """
    Gets the feature names from a preprocessor.
    This entails looking at the OHE feature names and also
    the words used by the CountVectorizers.
    
    Arguments
    ---------
    preprocessor: ColumnTransformer
        A fitted preprocessor following the specific format above.
    
    Returns
    -------
    list
        A list of column names.
    """
    # Note: in scikit-learn >= 1.0 the equivalent method is get_feature_names_out
    # (get_feature_names was removed in 1.2).
    ohe_feature_names = list(preprocessor.named_transformers_['categorical'].named_steps['onehot'].get_feature_names(categorical_features))
    text_feature_names = [f + "_" + word for f in text_features for word in preprocessor.named_transformers_[f].named_steps['countvec'].get_feature_names()]
    return numeric_features + ohe_feature_names + text_feature_names
```

```python
new_columns = get_column_names(preprocessor)

df_train_enc = pd.DataFrame(preprocessor.transform(df_train), index=df_train.index, columns=new_columns)
df_train_enc.head()
```

| CheeseId | MoisturePercent | ManufacturerProvCode_AB | ManufacturerProvCode_BC | ManufacturerProvCode_MB | ManufacturerProvCode_NB | ManufacturerProvCode_NL | ManufacturerProvCode_NS | ManufacturerProvCode_ON | ManufacturerProvCode_PE | ManufacturerProvCode_QC | ... | CharacteristicsEn_cheese | CharacteristicsEn_colored | CharacteristicsEn_creamy | CharacteristicsEn_interior | CharacteristicsEn_pressed | CharacteristicsEn_rind | CharacteristicsEn_ripened | CharacteristicsEn_smooth | CharacteristicsEn_texture | CharacteristicsEn_white |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1432 | 1.127656 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2281 | -1.459684 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1908 | 2.266086 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2224 | -1.459684 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2007 | -0.528242 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

----------------

### Q3: initial models
rubric={points:15}

Let's compare three approaches:

1. A baseline model: choose either `DummyClassifier` or `DummyRegressor`, whichever is appropriate for this problem. 
2. A linear model: choose either `Ridge` or `LogisticRegression`, whichever is appropriate for this problem.
3. A random forest model: choose either `RandomForestClassifier` or `RandomForestRegressor`, whichever is appropriate for this problem.

For now, just use default hyperparameters.

Report the train and cross-validation scores in each case. Which model performs best with default hyperparameters?

Don't violate the Golden Rule!

-------------

If you used code from the course materials (lecture, homework) in the above question, please list what resources you used (e.g. "Lecture 5", "hw5"). You do not need to specify exactly which lines of code you used, just the resources you took code from.

- Resource 1
- Resource 2
- etc.

-----------

### Q4: hyperparameter tuning
rubric={points:20}

- Using an automated hyperparameter tuning method of your choice, make a reasonable attempt at tuning the hyperparameters for your linear model and random forest model. An excellent solution will also involve tuning the hyperparameters of the preprocessing steps. 
- Briefly justify your choices of which hyperparameters you tuned and what sorts of values you tried. **Max 3 sentences.**
- Briefly discuss your scores after tuning. **Max 3 sentences.**

Note: your time is limited, so there is no need to perform large searches that take a long time to run. My code for this question takes about 1.5 minutes to run on my laptop.

----------

If you used code from the course materials (lecture, homework) in the above question, please list what resources you used (e.g. "Lecture 5", "hw5"). You do not need to specify exactly which lines of code you used, just the resources you took code from.

- Resource 1
- Resource 2
- etc.

-----------

### Q5: confidence and test set
rubric={points:10}

- For your best model from Q4, how confident are you in the score you reported? Base your answer on the sub-scores from the different folds of cross-validation. **Max 3 sentences.**
- When you are done, compute your score on the test set. Is it what you expected? Briefly discuss. **Max 3 sentences.**

----------

If you used code from the course materials (lecture, homework) in the above question, please list what resources you used (e.g. "Lecture 5", "hw5"). You do not need to specify exactly which lines of code you used, just the resources you took code from.

- Resource 1
- Resource 2
- etc.

-----------

### Q6: feature importances
rubric={points:20}

- What are your 5 most important features according to your tuned linear model?
- What are your 5 most important features according to your tuned random forest model? 
- Do they agree with each other? Briefly discuss. **Max 3 sentences.**
- Also, briefly discuss one other aspect of the feature importances that you find interesting. **Max 3 sentences.**

Note: for the 5 most important features, it is sufficient to display these as code output rather than typing them as text, so long as they are displayed very clearly (i.e. only display those 5, don't leave them as part of a big list). 

Hint: assuming you've tuned your preprocessor, you'll want to use the `get_column_names` function provided above, because the column names may have changed during hyperparameter tuning.
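As a small illustration of why the names can change, here is a sketch using a few hypothetical flavour descriptions (not taken from the dataset): the vocabulary a `CountVectorizer` keeps, and therefore the `<column>_<word>` names that `get_column_names` builds, depends on its `max_features` setting.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical flavour descriptions; "nutty" appears 3 times, "mild" twice.
docs = ["mild nutty", "nutty sharp", "nutty mild creamy"]

for max_features in [2, 4]:
    vec = CountVectorizer(max_features=max_features).fit(docs)
    # vocabulary_ maps each kept word to its column index; sort for a stable display.
    print(max_features, sorted(vec.vocabulary_))
# 2 ['mild', 'nutty']
# 4 ['creamy', 'mild', 'nutty', 'sharp']
```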

----------------

If you used code from the course materials (lecture, homework) in the above question, please list what resources you used (e.g. "Lecture 5", "hw5"). You do not need to specify exactly which lines of code you used, just the resources you took code from.

- Resource 1
- Resource 2
- etc.

-----------

### Q7: short answer questions
rubric={points:25}

The following questions are worth 5 points each. These questions refer to specific lectures or homework assignments from the second half of the course. **Max 3 sentences each.**

7(a): [**Lecture 15**](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home/blob/master/lectures/15_nearest-neighbours.ipynb)

Instead of trying to predict the fat content of a cheese, let's say you wanted to solve a different problem: given a query cheese, find similar cheeses in the dataset. How would you approach this problem? Would any of the code above (in this notebook) be useful for this task?

7(b): [**Lecture 16**](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home/blob/master/lectures/16_time-series-data.ipynb)

In [hw7](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home/blob/master/hw-solutions/hw7/hw7.ipynb) question 1(e) we used the current week's average avocado price as a baseline prediction for next week's avocado price. Under what circumstances would this approach yield particularly good or bad predictions of next week's avocado price?

7(c): [**Lecture 17**](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home/blob/master/lectures/17_survival-analysis.ipynb)

In Lecture 17 we looked at a customer churn dataset with a binary target column (yes/no) for whether a customer churned, and a `tenure` column for the length of time. In Lecture 16 we looked at the rain in Australia dataset which has a binary target column (yes/no) for whether it would rain tomorrow and a `Date` column for the time stamp. Both of these are binary classification problems, and both involve changes over time. Why did we have to worry about censoring for the churn dataset but not the rain dataset?

7(d): [**Lecture 19**](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home/blob/master/lectures/19_outliers.ipynb)

What is the key difference between regression and classification when it comes to outliers? 

7(e): [**Lecture 21**](https://github.students.cs.ubc.ca/cpsc330-2019w-t2/home/blob/master/lectures/21_communication.ipynb)

Consider the following summary I wrote of our cheese analysis, with the target audience of a CPSC 330 student:

> In this exam I worked on the [Canadian Cheese Directory dataset](https://open.canada.ca/data/en/dataset/3c16cd48-3ac3-453f-8260-6f745181c83b). I was trying to predict the fat content of a cheese based on numeric features like moisture content, categorical features like the milk type, and text features like the cheese name. I achieved a score of 0.5, which is a lot better than my baseline. And this was all using only 5 folds for cross-validation - imagine how good my model would be with 10 or even 20 folds! I also learned that soft cheeses contain a lot more fat than hard cheeses.

Critique this summary. What do you like about it, and what could be improved? Be specific.

----------------

### Final checks

- [ ] Did you check [Piazza](https://piazza.com/class/k1gx4b3djbv3ph?cid=388) for any announcements or clarifications about the exam? 
- [ ] Did you complete the integrity pledge near the top of this notebook?
- [ ] Did you answer all the questions fully? (Some ask for both code and explanations.)
- [ ] Did you make note of any course materials you reused code from, after each coding question?
- [ ] Did you keep your answers within the posted length limits (usually 3 sentences)?
- [ ] Did you run your notebook from beginning to end ("Restart Kernel and Run All Cells") to make sure that your entire notebook runs properly?
- [ ] Did you push to github.students.cs.ubc.ca and then view your rendered exam in a web browser?
- [ ] Did you make sure all your code output is saved/displayed in the notebook?
- [x] Did you read this list of final checks?