# Assignment 2: Object-Oriented Dataset Builder and Multi-Modal Preprocessing

    
This notebook contains a series of **open and guided exercises** that will progressively lead you through the design of an **object-oriented dataset builder** for a **real-world, multi-modal dataset**.

In this assignment, you will go beyond simple data wrangling and explore how to structure a **reproducible, maintainable preprocessing workflow** that can automatically handle multiple data types — **tabular**, **text**, and **image** — in a coherent and efficient way.

You will design a class-based architecture capable of:

* **Reading and organizing raw data** from different sources into consistent structures.
* **Preprocessing heterogeneous data** (cleaning, transforming, and encoding tabular features; extracting and vectorizing text data; and computing numerical descriptors from images).
* **Merging multiple modalities** into a single feature space ready for training and analysis.
* **Implementing caching and versioning mechanisms** to avoid redundant computation and ensure reproducibility.
* **Following object-oriented design principles**, such as encapsulation, modularity, inheritance, and reusability, to make your code clean, scalable, and extensible for future use.

Throughout the notebook, you will be encouraged to combine concepts from previous assignments, including **data cleaning, feature engineering, scaling, encoding, string processing, file handling, and efficient storage formats**, while focusing on **design clarity** and **code organization**.

By the end of this assignment, you will have implemented a mini data management class that demonstrates your ability to integrate **data preprocessing**, **object-oriented programming**, and **machine learning readiness** within a single, well-structured pipeline.
<div class="alert alert-success">

Solutions must be **code-based** — hard-coded or manually computed results will not be accepted.
Write your answers in the designated cells, and do not modify or remove any provided test or instruction cells.
When finished, submit **this same notebook** back to Moodle in **`.ipynb` format**.

</div>
<div class="alert alert-danger"><b>Submission deadline:</b> Friday, November 21st, 23:55</div>



<div class="alert alert-info"><b>Exercise 0 — Load the Dataset</b>

Download the file <code>Cell_Phones_and_Accessories_5.json</code> from
<a href="https://www.kaggle.com/datasets/abdallahwagih/amazon-reviews" target="_blank">this Kaggle dataset</a>
and place it in the same folder as this notebook.

Then, load the file into a DataFrame named <code>df</code>.
Remember that the file is in JSON Lines format (one JSON object per line).

<br><i>[0 points]</i>

</div>

In [6]:
import pandas as pd

df = pd.read_json("Cell_Phones_and_Accessories_5.json", lines=True)

df.head()

,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime
0,A30TL5EWN6DFXT,120401325X,christina,"[0, 0]",They look good and stick good! I just don't li...,4,Looks Good,1400630400,"05 21, 2014"
1,ASY55RVNIL0UD,120401325X,emily l.,"[0, 0]",These stickers work like the review says they ...,5,Really great product.,1389657600,"01 14, 2014"
2,A2TMXE2AFO7ONB,120401325X,Erica,"[0, 0]",These are awesome and make my phone look so st...,5,LOVE LOVE LOVE,1403740800,"06 26, 2014"
3,AWJ0WZQYMYFQ4,120401325X,JM,"[4, 4]",Item arrived in great time and was in perfect ...,4,Cute!,1382313600,"10 21, 2013"
4,ATX7CZYFXI1KW,120401325X,patrice m rogoza,"[2, 3]","awesome! stays on, and looks great. can be use...",5,leopard home button sticker for iphone 4s,1359849600,"02 3, 2013"


<div class="alert alert-info"><b>Exercise 1 — Normalize and Clean Columns</b>

The dataset contains product reviews. The column <code>helpful</code> is a 2-length list:
<code>[helpful_yes, helpful_no]</code>.

Normalize it by creating three new integer columns:
* <code>helpful_yes</code>
* <code>helpful_no</code>
* <code>helpful_total</code> (must equal <code>helpful_yes + helpful_no</code>)

Then drop the original <code>helpful</code> column.

Leave the rest of the columns unchanged.

<br><i>[0.5 points]</i>

</div> <div class="alert alert-warning">Recall that <code>asin</code> is the <i>Amazon Standard Identification Number</i>, a unique ID for each product.</div>

In [8]:
df["helpful_yes"] = df["helpful"].apply(lambda x: x[0])
df["helpful_no"] = df["helpful"].apply(lambda x: x[1])

df["helpful_total"] = df["helpful_yes"] + df["helpful_no"]

df = df.drop(columns=["helpful"])

In [11]:
# LEAVE BLANK
df.head()

,reviewerID,asin,reviewerName,reviewText,overall,summary,unixReviewTime,reviewTime,helpful_yes,helpful_no,helpful_total
0,A30TL5EWN6DFXT,120401325X,christina,They look good and stick good! I just don't li...,4,Looks Good,1400630400,"05 21, 2014",0,0,0
1,ASY55RVNIL0UD,120401325X,emily l.,These stickers work like the review says they ...,5,Really great product.,1389657600,"01 14, 2014",0,0,0
2,A2TMXE2AFO7ONB,120401325X,Erica,These are awesome and make my phone look so st...,5,LOVE LOVE LOVE,1403740800,"06 26, 2014",0,0,0
3,AWJ0WZQYMYFQ4,120401325X,JM,Item arrived in great time and was in perfect ...,4,Cute!,1382313600,"10 21, 2013",4,4,8
4,ATX7CZYFXI1KW,120401325X,patrice m rogoza,"awesome! stays on, and looks great. can be use...",5,leopard home button sticker for iphone 4s,1359849600,"02 3, 2013",2,3,5


In [None]:
# LEAVE BLANK

In [None]:
# LEAVE BLANK

<div class="alert alert-info"><b>Exercise 2 — Wide vs Long (pivot &amp; melt)</b>

In this exercise, you will practice reshaping data between wide and long formats.

1. Create a long-form table named <code>help_long</code> by melting the two columns
<code>helpful_yes</code> and <code>helpful_no</code> into a single column of values named <code>votes</code>.
The corresponding variable name should be stored in a new column called <code>helpful_type</code>.
Use a stable row identifier named <code>row_id</code> (derived from the original row index) as the <code>id_vars</code>.

2. From <code>help_long</code>, reconstruct a wide-form table named <code>help_wide</code> so that it has
one row per <code>row_id</code> and two columns — <code>helpful_yes</code> and <code>helpful_no</code> — holding the vote counts.

Do not modify <code>df</code> itself; create the two new DataFrames <code>help_long</code> and <code>help_wide</code>.

<br><i>[0.5 points]</i>

</div> <div class="alert alert-warning"> <strong>Hint:</strong> Start by creating a temporary DataFrame with an explicit row identifier:<br> <code>tmp = df.reset_index().rename(columns={"index": "row_id"})</code> </div>

In [None]:
tmp = df.reset_index().rename(columns={"index": "row_id"})

help_long = tmp.melt(
    id_vars="row_id",                       
    value_vars=["helpful_yes", "helpful_no"],
    var_name="helpful_type",
    value_name="votes" 
)

help_wide = help_long.pivot(
    index="row_id",
    columns="helpful_type",
    values="votes"
).reset_index()

help_wide = help_wide.rename_axis(None, axis=1)

In [36]:
# LEAVE BLANK
help_long

,row_id,helpful_type,votes
0,0,helpful_yes,0
1,1,helpful_yes,0
2,2,helpful_yes,0
3,3,helpful_yes,4
4,4,helpful_yes,2
...,...,...,...
388873,194434,helpful_no,0
388874,194435,helpful_no,0
388875,194436,helpful_no,0
388876,194437,helpful_no,0


In [37]:
# LEAVE BLANK
help_wide

,row_id,helpful_no,helpful_yes
0,0,0,0
1,1,0,0
2,2,0,0
3,3,4,4
4,4,3,2
...,...,...,...
194434,194434,0,0
194435,194435,0,0
194436,194436,0,0
194437,194437,0,0


In [35]:
# LEAVE BLANK


<div class="alert alert-info"><b>Exercise 3 — MultiIndex Setup and Subsetting</b>

Create a multiindex and use it to slice over <code>asin</code> product codes:

1. Create <code>df_mi</code> by setting a <b>MultiIndex</b> on <code>["reviewerID", "asin"]</code>.  
2. Pick a product code (e.g., <code>'120401325X'</code>) and store it in <code>chosen_asin</code> (string).  
3. Create <code>sub_df</code> containing only the rows for that <code>chosen_asin</code>, while <b>preserving the MultiIndex</b> (i.e., keep both levels).

<br><i>[0.5 points]</i>


</div>

<div class="alert alert-warning">

<strong>Hints</strong>

* To subset while preserving both levels, use <code>pd.IndexSlice</code>
* Call <code>.sort_index()</code> before comparisons to avoid false negatives due to ordering.


</div>
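
The two hints can be tried out on a toy frame first. The sketch below uses made-up IDs and ratings (nothing here comes from the real dataset) and compares <code>pd.IndexSlice</code> with <code>.xs(..., drop_level=False)</code>, which is one other way to keep both index levels.

```python
import pandas as pd

# Toy data; the IDs and ratings are made up for illustration
toy = pd.DataFrame(
    {
        "reviewerID": ["r2", "r1", "r3", "r1"],
        "asin": ["A", "A", "B", "B"],
        "overall": [5, 3, 4, 1],
    }
)

# Sorting the MultiIndex makes slicing reliable and the row order predictable
toy_mi = toy.set_index(["reviewerID", "asin"]).sort_index()

# Option 1: every reviewer (:) combined with one label on the second level
by_slice = toy_mi.loc[pd.IndexSlice[:, "A"], :]

# Option 2: cross-section on the "asin" level without dropping that level
by_xs = toy_mi.xs("A", level="asin", drop_level=False)

print(by_slice)
print(by_slice.index.nlevels)
```

Both selections keep <code>reviewerID</code> and <code>asin</code> in the index; without <code>drop_level=False</code>, <code>.xs</code> would remove the <code>asin</code> level.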

In [None]:
df_mi = df.set_index(["reviewerID", "asin"]).sort_index()
chosen_asin = "B006FEBZRC"
sub_df = df_mi.loc[pd.IndexSlice[:, chosen_asin], :]

In [51]:
# LEAVE BLANK
sub_df

reviewerID,asin,reviewerName,reviewText,overall,summary,unixReviewTime,reviewTime,helpful_yes,helpful_no,helpful_total
A00126503SUWI86KZBMIN,B006FEBZRC,Margaret,the ears break way to fast and the case is jus...,1,sucked,1399766400,"05 11, 2014",0,0,0
A10WE6XS7WJY4A,B006FEBZRC,houston25,"THE COLOR, THE STYLE, THE EARS, THE WAY IT HUG...",5,AWSOME,1367712000,"05 5, 2013",0,0,0
A13MV5HQW3M97D,B006FEBZRC,rachel castillo,this was adorbs but cheap... be careful with t...,3,so cute,1359417600,"01 29, 2013",0,0,0
A15T5PHF07CA6W,B006FEBZRC,Fernanda,This case was perfect for me. It's cute and it...,5,Adorable to the death.,1378339200,"09 5, 2013",0,0,0
A15WUEDYOIP9Z4,B006FEBZRC,Ariliaa,"The ear pops out which is adorable, and its ea...",4,very cute,1376438400,"08 14, 2013",0,0,0
...,...,...,...,...,...,...,...,...,...,...
AUMUXSFO2HOGB,B006FEBZRC,Brianna shearer,"pretty big, cant fit in a pocket obviously but...",3,fir a friend,1358726400,"01 21, 2013",0,0,0
AW099GWN8N4TW,B006FEBZRC,kanisha johnson,I absolutely love it stitch is amazing and the...,5,crazy for stitch,1382486400,"10 23, 2013",0,0,0
AXWMHQ4THP4AN,B006FEBZRC,Racheal Schuttloffel,This case is super cute but on the downside th...,3,Cute,1388966400,"01 6, 2014",1,1,2
AYOU65QXB1XTH,B006FEBZRC,April,"This case is very cute!! The ears move, and it...",5,Cute!,1361404800,"02 21, 2013",0,0,0


In [None]:
# LEAVE BLANK

In [None]:
# LEAVE BLANK


In [None]:
# LEAVE BLANK


<div class="alert alert-info">

<b>Exercise 4 — Aggregation with <code>.agg()</code></b>

Work directly with the original DataFrame <code>df</code>.  
We will use <code>.agg()</code> to summarize helpfulness metrics.

1. Create <code>reviewer_metrics</code>: group by <code>"asin"</code> and aggregate  
   <code>helpful_yes</code>, <code>helpful_no</code>, and <code>helpful_total</code> using <b>sum</b>, <b>mean</b> and <b>std</b>.  
2. Sort the result by its index for consistency.

<br><i>[1 point]</i>


</div>

<div class="alert alert-warning">

<strong>Hint</strong>  
After aggregation, call <code>.sort_index()</code>.

</div>


In [None]:
reviewer_metrics = (
    df
    .groupby("asin")[["helpful_yes", "helpful_no", "helpful_total"]]
    .agg(["sum", "mean", "std"])
    .sort_index()
)

In [54]:
# LEAVE BLANK
reviewer_metrics.head()

,helpful_yes,helpful_yes,helpful_yes,helpful_no,helpful_no,helpful_no,helpful_total,helpful_total,helpful_total
asin,sum,mean,std,sum,mean,std,sum,mean,std
120401325X,7,1.0,1.527525,9,1.285714,1.704336,16,2.285714,3.199702
3998899561,24,2.4,3.50238,32,3.2,4.661902,56,5.6,8.154072
6073894996,8,0.216216,0.672274,12,0.324324,0.9444,20,0.540541,1.608909
7532385086,3,0.333333,1.0,3,0.333333,1.0,6,0.666667,2.0
7887421268,2,0.153846,0.5547,2,0.153846,0.5547,4,0.307692,1.1094


In [None]:
# LEAVE BLANK


In [None]:
# LEAVE BLANK


<div class="alert alert-info">
<b>Exercise 5 — Time Series &amp; 7-Day Rolling Mean</b>

Work with the original DataFrame <code>df</code>.  
We will convert the UNIX timestamp to datetime, compute the daily average rating, and smooth it with a 7-day rolling mean.

1. Create a datetime column from <code>unixReviewTime</code> (seconds since epoch), then set it as the index.
2. Compute <code>daily_avg</code>: the **daily mean** of <code>overall</code> using <code>.resample("D").mean()</code>.
3. Compute <code>rolling_7d</code>: a **7-day rolling mean** over <code>daily_avg</code> using <code>.rolling(7, min_periods=1).mean()</code>.
4. Ensure the index is sorted and both series share the same index.

Result:
<ul>
  <li><code>daily_avg</code> – a <code>pd.Series</code> indexed by calendar day, with the mean of <code>overall</code>.</li>
  <li><code>rolling_7d</code> – a <code>pd.Series</code> on the same index, containing the 7-day rolling mean of <code>daily_avg</code>.</li>
</ul>

<br><i>[0.5 points]</i>

</div>

<div class="alert alert-warning">

<strong>Hint</strong>  
Remember that the timestamps are given in seconds when converting them to datetime.  
After setting the datetime column as index, make sure to call <code>.sort_index()</code> so that the time series is in chronological order.

</div>
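
A small sketch of the same steps on made-up data may help to see what <code>.resample("D")</code> and <code>min_periods=1</code> do on days without reviews. The timestamps and ratings below are invented for illustration.

```python
import pandas as pd

# Made-up timestamps (seconds since epoch) and ratings, deliberately unsorted
toy = pd.DataFrame(
    {
        "unixReviewTime": [1388707200, 1388534400, 1388534400],  # Jan 3, Jan 1, Jan 1 (2014)
        "overall": [2, 5, 3],
    }
)

# unit="s" tells pandas the integers are seconds, not nanoseconds
toy["dt"] = pd.to_datetime(toy["unixReviewTime"], unit="s")
toy = toy.set_index("dt").sort_index()

# One row per calendar day; days without reviews become NaN
daily_avg = toy["overall"].resample("D").mean()

# Window of 7 rows (= 7 days on a daily index); NaN days are skipped, and
# min_periods=1 allows a result as soon as one valid value is in the window
rolling_7d = daily_avg.rolling(7, min_periods=1).mean()

print(daily_avg)
print(rolling_7d)
```

Here January 2 has no reviews, so <code>daily_avg</code> is NaN on that day while <code>rolling_7d</code> still carries the mean of the valid days seen so far. Both series share the same daily index.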



In [63]:
df["dt"] = pd.to_datetime(df["unixReviewTime"], unit="s")
df = df.set_index("dt").sort_index()

daily_avg = df["overall"].resample("D").mean()

rolling_7d = daily_avg.rolling(7, min_periods=1).mean()

In [64]:
# LEAVE BLANK
df.head()

dt,reviewerID,asin,reviewerName,reviewText,overall,summary,unixReviewTime,reviewTime,helpful_yes,helpful_no,helpful_total
2001-02-22,A3TB9HNQR54B5V,B00002X29G,"John ""John""",You may not need these types of screwdrivers o...,5,A nice set and a must-have for any workshop or...,982800000,"02 22, 2001",38,51,89
2002-10-04,A2BH04B9G9LOYA,B000056PYW,Alex P.,I like this Jabra earboom so much that I have ...,5,Wonderful,1033689600,"10 4, 2002",1,2,3
2003-12-06,A1KD8NJPZ01R37,B0000SX3BK,doppelganger,And it was pretty much worth it (if you sign t...,5,swapped an i95cl for the .06 slimmer chassis o...,1070668800,"12 6, 2003",2,6,8
2003-12-22,A10RMVX6EE90N6,B0000SX3BK,"Morris Hanley ""Moe3754""",I have to say that this is a great phone and t...,5,Great Phone,1072051200,"12 22, 2003",3,10,13
2004-01-08,A5JLAU2ARJ0BO,B0000AGRYX,"Gadgester ""No Time, No Money""",I've had mine for almost two months now and I'...,5,Great all-around,1073520000,"01 8, 2004",1,2,3


In [65]:
# LEAVE BLANK


In [66]:
# LEAVE BLANK


<div class="alert alert-info">

<b>Exercise 6 — String Operations and Text Features</b>

Use string methods on <code>reviewText</code> to build simple, vectorized features.

1. Create a DataFrame <code>text_features</code> with the index of rows where <code>reviewText</code> is present (drop missing).  
2. Columns to compute (all numeric):
   - <code>review_length</code>: number of words in each review  
   - <code>exclamation_count</code>: number of exclamation marks (<code>!</code>)  
   - <code>has_great_keyword</code>: 1 if the text contains <code>"great"</code>, else 0  
   - <code>has_bad_keyword</code>: 1 if the text contains <code>"bad"</code>, else 0  
   - <code>has_refund_keyword</code>: 1 if the text contains <code>"refund"</code>, else 0  
3. Ensure <code>text_features</code> has the same index (and order) as the filtered <code>df</code> where <code>reviewText</code> is non-missing.

<br><i>[1 point]</i>

</div>

<div class="alert alert-warning">

<strong>Hint</strong>  
Use vectorized operations like <code>.str</code>.

</div>


In [68]:
# Keep only rows that have review text; index and order are preserved
df_text = df[df["reviewText"].notna()].copy()
text_features = pd.DataFrame(index=df_text.index)

# Build every feature from the filtered text so rows line up with text_features
text = df_text["reviewText"]

# Number of whitespace-separated words
text_features["review_length"] = text.str.split().str.len()

text_features["exclamation_count"] = text.str.count("!")

# Keyword flags (case-insensitive substring match), stored as 0/1
text_features["has_great_keyword"] = text.str.contains("great", case=False).astype(int)

text_features["has_bad_keyword"] = text.str.contains("bad", case=False).astype(int)

text_features["has_refund_keyword"] = text.str.contains("refund", case=False).astype(int)

text_features.head()

dt,review_length,exclamation_count,has_great_keyword,has_bad_keyword,has_refund_keyword
2001-02-22,139,0,1,0,0
2002-10-04,133,0,1,0,0
2003-12-06,81,0,0,0,0
2003-12-22,162,0,1,0,0
2004-01-08,75,0,1,0,0


In [69]:
# LEAVE BLANK
df_text.head()

dt,reviewerID,asin,reviewerName,reviewText,overall,summary,unixReviewTime,reviewTime,helpful_yes,helpful_no,helpful_total
2001-02-22,A3TB9HNQR54B5V,B00002X29G,"John ""John""",You may not need these types of screwdrivers o...,5,A nice set and a must-have for any workshop or...,982800000,"02 22, 2001",38,51,89
2002-10-04,A2BH04B9G9LOYA,B000056PYW,Alex P.,I like this Jabra earboom so much that I have ...,5,Wonderful,1033689600,"10 4, 2002",1,2,3
2003-12-06,A1KD8NJPZ01R37,B0000SX3BK,doppelganger,And it was pretty much worth it (if you sign t...,5,swapped an i95cl for the .06 slimmer chassis o...,1070668800,"12 6, 2003",2,6,8
2003-12-22,A10RMVX6EE90N6,B0000SX3BK,"Morris Hanley ""Moe3754""",I have to say that this is a great phone and t...,5,Great Phone,1072051200,"12 22, 2003",3,10,13
2004-01-08,A5JLAU2ARJ0BO,B0000AGRYX,"Gadgester ""No Time, No Money""",I've had mine for almost two months now and I'...,5,Great all-around,1073520000,"01 8, 2004",1,2,3


In [None]:
# LEAVE BLANK


In [None]:
# LEAVE BLANK


<div class="alert alert-info">

<b> Exercise 7 — TF-IDF Features </b>

Vectorize the review text using TF-IDF.

1. Filter to rows with non-missing <code>reviewText</code>. Keep the index and order.
2. Create and fit a <code>TfidfVectorizer</code> with:
   <ul>
     <li><code>min_df=5</code></li>
     <li><code>max_features=300</code></li>
   </ul>
3. Transform the filtered texts into a sparse matrix <code>X_tfidf</code>.
4. Extract the feature names into a list <code>feature_names</code>.

Result:
<ul>
  <li><code>vectorizer</code> — the fitted <code>TfidfVectorizer</code></li>
  <li><code>X_tfidf</code> — the TF-IDF sparse matrix with shape <code>(n_texts, n_features)</code></li>
  <li><code>feature_names</code> — a list of selected vocabulary terms (length ≤ 300)</li>
</ul>

<br><i>[1 point]</i>

</div>

<div class="alert alert-warning">

<strong>Hint</strong><br>
Import with <code>from sklearn.feature_extraction.text import TfidfVectorizer</code>.  
Use a string dtype (e.g., <code>.astype("string")</code>) before passing text to the vectorizer.  
After fitting, call <code>vectorizer.get_feature_names_out().tolist()</code> to get the feature names.

</div>


In [None]:
# YOUR CODE HERE


In [None]:
# LEAVE BLANK


In [None]:
# LEAVE BLANK


In [None]:
# LEAVE BLANK


<div class="alert alert-info">

<b> Exercise 8 — Merge Text Features &amp; Export </b>

Combine the original reviews with the text-derived features and save to disk.

1. Start from the original <code>df</code> and the DataFrame <code>text_features</code> from Exercise 6 (indexed like rows with non-missing <code>reviewText</code>).
2. Create <code>combined</code> by left-joining <code>df</code> with <code>text_features</code> **by index** (rows without text get NaNs for those new columns).
3. Save <code>combined</code> to <code>parquet_path = "amazon_reviews_features.parquet"</code> with engine <b>"pyarrow"</b>, <code>compression="snappy"</code>, <code>index=False</code>.
4. Read it back into <code>combined_parq</code>.

<br><i>[1 point]</i>

</div>

<div class="alert alert-warning">

<strong>Hint</strong>  
Use <code>df.join(text_features, how="left")</code> to align on the index.  

</div>



In [None]:
# YOUR CODE HERE


In [None]:
# LEAVE BLANK


In [None]:
# LEAVE BLANK


<div class="alert alert-info">
    
<b>Exercise 9: Object-Oriented Dataset Builder</b>  

In this open-ended exercise, you will design a reusable and extensible dataset class for the Kaggle dataset <a href="https://www.kaggle.com/datasets/jeffheaton/demand-forecasting-with-tabular-textual-images" target="_blank">Demand Forecasting with Tabular, Textual &amp; Images</a>.
This dataset includes **tabular**, **text**, and **image** data, and will serve as a complete integration challenge.

Your goal is to build a class that can load, preprocess, and cache a clean feature set for future analysis or model training from the dataset.

**Your tasks are:**

1. **Class Definition:**
   Create a class named <code>DemandForecastingDataset</code> that:

   * Accepts a path to the dataset directory in its constructor.
   * Stores a cache directory for preprocessed data.
   * Organizes the code using clean OOP principles (consider <code>@property</code>, other objects, or subclasses).

2. **Preprocessing Method:**
   Implement a method <code>preprocess()</code> that:

   * Loads and cleans the **raw tabular data** (e.g., parsing dates, handling missing values, generating time-based features).
   * Builds **features** (you are free to choose which ones and how to compute them).
   * Merges all features into a unified dataset.
   * Stores the result efficiently on disk.

3. **Loading and Caching:**
   Implement a <code>load()</code> method that:

   * Loads the cached dataset if it already exists.
   * Otherwise, automatically calls <code>preprocess()</code> to create it.
   * Ensures subsequent loads are fast and reliable.

4. **Design & Modularity:**
   Use inheritance or composition if it improves the design. You are encouraged to make your class flexible and reusable.

5. **Demonstration:**
   After implementing your class:

   * Instantiate it with your dataset path.
   * Run <code>preprocess()</code> once to build the cache.
   * Call <code>load()</code> to verify it loads from cache.
   * Print dataset shape, column types, and preview a few rows.

6. **Documentation:**
   Provide short explanations (8–12 bullet points) describing:

   * The preprocessing choices you made.
   * Which features were engineered and why.
   * How caching and loading are handled.
   * How your design could be extended (e.g., versioning, feature modules).

<br><i>[4 points]</i>

</div>


In [None]:
# LEAVE BLANK

In [None]:
# YOUR CODE HERE