# The Anatomy of a Power Outage

The U.S. Department of Energy (DOE) provides critical information about the status and impacts of energy sector disruptions through the **Environment for Analysis of Geo-Located Energy Information (EAGLE-I)** system, operated by Oak Ridge National Laboratory. EAGLE-I supports monitoring of energy infrastructure assets, reporting of power outages, visualization of threats to energy infrastructure, and coordination of emergency response and recovery efforts.

Effectively responding to and restoring power during disasters depends on having timely, accurate, and actionable data.

In this exercise, your goal is to learn about K-Means and how clustering data can help you characterize it. This activity is based on one of the exercises from a larger data bootcamp that seeks to identify the characteristics and causes of power outages in the United States.

## Background 

K-Means is a machine learning algorithm used for clustering, which means grouping data points that are similar to each other. Instead of having predefined labels (as in classification), K-Means finds structure in the data on its own.
- You choose a number of clusters, K.
- The algorithm finds centers (called centroids) for those clusters.
- Each data point is assigned to the cluster with the nearest centroid.
- The centroids are updated until the clusters stabilize.
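
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up points, not the scikit-learn implementation used later in this notebook; the function name `kmeans_sketch` and the toy data are assumptions introduced only for this example.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Minimal K-Means loop (illustrative only): returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1-2: start from k distinct data points chosen at random as centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to the cluster with the nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated toy groups of points (hypothetical data).
toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
centers, labels = kmeans_sketch(toy, k=2)
print(labels)
```

The first three points should share one label and the last three the other; which group is called 0 or 1 depends on the random starting centroids.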

In this lab, we'll use K-Means to group power outages based on their location (latitude and longitude) and time of year. This way, we can discover natural patterns, like whether outages cluster in certain regions or seasons.

## Why Normalize the Data?

Normalization (or scaling) is important because K-Means relies on distances to decide which points are similar. If one variable has a much larger numeric range than another, it can dominate the distance calculation.

For example:
- Longitude values range roughly from -180 to 180.
- Time of year (say, months of the year, 1 to 12) has a much smaller scale.


If we don't normalize, the clustering will be biased toward the feature with larger numbers.

By normalizing, we put all features on a comparable scale so that location and time both contribute fairly to the clustering.
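
To make this concrete, here is a small sketch with four made-up outages described only by longitude and month. The numbers are hypothetical and chosen so that the effect is easy to see: before scaling, a 40-degree longitude gap swamps a six-month gap; after scaling, the two gaps are comparable.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical outages as [longitude, month].
toy = np.array([
    [-120.0, 1.0],  # A: western longitude, January
    [-118.0, 7.0],  # B: western longitude, July
    [-80.0, 1.0],   # C: eastern longitude, January
    [-78.0, 7.0],   # D: eastern longitude, July
])

def dist(a, b):
    """Euclidean distance between two rows."""
    return float(np.linalg.norm(a - b))

# Raw distances: the longitude gap dominates.
raw_ab = dist(toy[0], toy[1])  # same coast, six months apart
raw_ac = dist(toy[0], toy[2])  # same month, opposite coasts
print(f"raw:    A-B = {raw_ab:.2f}, A-C = {raw_ac:.2f}")

# After standardizing each column, both gaps carry similar weight.
scaled = StandardScaler().fit_transform(toy)
scaled_ab = dist(scaled[0], scaled[1])
scaled_ac = dist(scaled[0], scaled[2])
print(f"scaled: A-B = {scaled_ab:.2f}, A-C = {scaled_ac:.2f}")
```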

## What is the best number of clusters to choose for a given set of data?

### The Elbow Method
- K-Means needs you to choose K, the number of clusters. But how do you know the best K?
- For each choice of K, we can calculate the within-cluster sum of squares (WCSS), which measures how close the points are to their cluster centers.
- As K increases, WCSS always goes down (more clusters = tighter groups).
- The trick is to look for the elbow in the WCSS vs. K graph:
  - At first, adding more clusters makes the WCSS drop a lot.
  - After a certain point, the improvement slows down.
  - That "elbow" point is often a good choice for K.
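
As a rough illustration of the idea, the sketch below computes WCSS by hand on three synthetic, well-separated blobs and compares it with the `inertia_` value that scikit-learn reports. The blob data and the helper name `wcss_by_hand` are assumptions made for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic blobs of 50 points each (made-up data).
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                   for c in ([0, 0], [5, 0], [0, 5])])

def wcss_by_hand(points, labels, centroids):
    """Sum of squared distances from each point to its own cluster centroid."""
    return float(((points - centroids[labels]) ** 2).sum())

results = {}
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(blobs)
    results[k] = wcss_by_hand(blobs, model.labels_, model.cluster_centers_)
    inertias[k] = model.inertia_
    print(k, round(results[k], 1))
```

For data like this, the printed values should fall steeply up to K = 3 and then flatten out, which is the elbow.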

There are other tests too that you will find in the exercises in this notebook.

## How to Use This Notebook

You must activate the code cells in the notebook below for the code to be used. In some cases, you are only reading functions into the Python interpreter, and they will not produce output until you call the function in another cell.

To activate a cell, click on it, and then hold down "Shift" while also pressing "Enter" or "Return" on the keyboard.

## Tips for Using Jupyter Notebooks and Python

- If you get a pink error box after running a cell, scroll to the **bottom** of the error box to see what the main error is.  
- Often, if you get an error box, it's because you skipped activating a cell above your current cell.  
- You can use an AI assistant to help you understand what the error message means. Just make sure to **paste both the code that generated the error and the entire error message** into the AI for the best explanation.



## Imports

First, you will import all the Python packages you need for this project below.

Python packages are collections of modules that provide reusable code for specific tasks, such as data analysis or parallel computing. You import them using the `import` statement so that you can access the tools and functions they contain in your script.

### 🟩 **TODO**
To run the cell below, click on it, and then press **Shift + Enter** (or **Shift + Return**). This is how you will run all the code cells inside a Jupyter Notebook.


In [None]:
import os
import sys
import math
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn


## The Data

In this lab, we'll use K-Means to group power outages based on their location (latitude and longitude) and time of year. This way, we can discover natural patterns, such as whether outages cluster in certain regions or seasons.

## FIPS Codes and Census Regions

The data provided in the EAGLE-I datasets includes FIPS codes for outages reported from different areas. FIPS codes are unique identifiers that describe specific geographic locations. The FIPS codes provided in the EAGLE-I dataset are county-level, which means they are five digits in total. The first two digits represent the state, and the last three identify the county within that state.
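
As a small sketch of that structure, the hypothetical helper below (`split_fips` is not part of the notebook's data cleaning script) splits a county-level FIPS code into its state and county parts. It pads with leading zeros first, because codes read from a CSV as integers lose them.

```python
def split_fips(fips):
    """Split a county-level FIPS code into (state, county) strings."""
    # Pad to five digits: a code such as 01001 becomes 1001 when read as an integer.
    code = str(int(fips)).zfill(5)
    if len(code) != 5:
        raise ValueError(f"Expected a five-digit county FIPS code, got {fips!r}")
    return code[:2], code[2:]

print(split_fips(47093))  # ('47', '093')  Knox County, Tennessee
print(split_fips(1001))   # ('01', '001')  Autauga County, Alabama
```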

More about FIPS codes: https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt

During our data cleaning, we provided a dictionary that uses the two-digit FIPS code for states to identify the state of each outage. We also provided a numerical identifier for the census region of the U.S. where each outage occurs. We do this to have additional features available for use in our unsupervised learning later on. The dictionary provided in the script, along with a map displaying the U.S. census regions, is shown below for convenience.

![Map with Census Regions Indicated](./Images/census_regions.gif)

## Regions

- 1: Pacific
- 2: Mountain
- 3: West North Central
- 4: West South Central
- 5: East North Central
- 6: East South Central
- 7: New England
- 8: Mid-Atlantic
- 9: South Atlantic



## Read in and Sort the Data

The dataset we provide is lightly cleaned but includes records from all U.S. states, including Alaska and Hawaii.  
The continental United States lies between approximately 66.95°W and 124.67°W longitude.  
By convention, geographic coordinate systems (and therefore our data) represent western longitudes as negative numbers, so the continental U.S. spans roughly -67 to -125.  
To exclude Alaska and Hawaii, we keep only data with longitude values greater than -130 (i.e., east of 130°W).  
This ensures we focus on outages within the contiguous United States.  

Functions are a useful structure that allows us to repeat the same process with different inputs.  
We'll define one to read in and clean the outage data so it can be reused easily for other datasets.

---

### 🟩 **TODO**

Click on the cell below and press **Shift + Enter** (or **Shift + Return**) to activate the function code.  
Do the same for the next cell to call the function.

In [None]:
# Reads a CSV file of power outage data, extracts the outage start year,
# drops rows at or west of 130°W (longitude <= -130, i.e., Alaska and Hawaii),
# and returns the cleaned DataFrame.

def get_data(filename="AllOutages"):
    data = pd.read_csv(f"/anvil/projects/x-cis230270/data/kmeans_data/{filename}.csv")
    # Convert the text values in the OutageStart column into datetime objects that Pandas can understand and work with.
    data['year'] = pd.to_datetime(data['OutageStart'])
    data['year'] = data['year'].dt.year

    drop_indices = data[data['Long']<=-130].index
    data.drop(drop_indices, inplace=True)

    return data

In [None]:
# Call the function and look at the first 20 rows.
data = get_data()
print(data.head(20))

## Understanding the Data Columns

Here is what the column headings mean:

- **State:** The name of the state where the outage was located.  
- **FIPS:** The code that corresponds to the state and county where the outage occurred.  
- **StateNum:** The state number extracted from the FIPS code.  
- **Region:** The larger U.S. region where the outage was located. See the map at the top of this notebook for regional definitions.  
- **Lat:** Latitude of the outage location.  
- **Long:** Longitude of the outage location.  
- **Month:** The month during which the outage took place (across any year in the dataset).  
- **Month_Sin:** The month expressed as a cyclic component to help ensure outages occurring at the end of one year (e.g., December) and at the beginning of the next (e.g., January) are grouped together.  
- **Month_Cos:** The second cyclic component used similarly to keep outages at the end and beginning of the year together.  
- **OutageStart:** The date and time when the outage began.  
- **OutageEnd:** The date and time when the outage ended.  
- **OutageLength:** The duration of each outage, calculated for all customers in each county.  
- **Sum:** Represents the average number of customers who were without power over the duration of each outage for each county.  
- **Year:** The year in which the outage occurred.  

**Note:** The data cleaning script collected power outage data for every county in the U.S. at 15-minute intervals for all dates between 2016 and 2022. It defines an outage start as the first date and time when more than 10% of the total customers in a county are without power, and the outage length is calculated until the total number of customers without power falls below the 10% threshold.

As you can see, this dataset gives you many different opportunities to sort or aggregate the data.

---

### 🟩 **TODO**

For a visual understanding of the data, run the cell below to display a histogram showing the total number of outages per region.

In [None]:
# Count the number of outages per region across the whole dataset.
# value_counts() counts the rows for each region; sort_index() orders them by region number.
outages_in_data = data['Region'].value_counts().sort_index()


# Plot the histogram (bar chart)
plt.figure(figsize=(10,6))
outages_in_data.plot(kind='bar')

plt.title('Number of Outages from 2014-2022 by Region')
plt.xlabel('Region')
plt.ylabel('Number of Outages')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### Regions

1: Pacific, 2: Mountain, 3: West North Central, 4: West South Central, 5: East North Central, 6: East South Central, 7: New England, 8: Mid-Atlantic, 9: South Atlantic

Right away, you can see that some regions report more power outages than others!  
Perhaps this is due to differences in weather, population, or power infrastructure.  
Clustering the outages by season and region can help us begin to understand their connection to weather.

---

### 🟩 **TODO**

1. Which regions have the largest number of power outages?  
2. Why might those regions have more outages than others?  

Make a note for yourself with your thoughts about these questions.

*(Scroll up to use the map if you need to see where the regions are located.)*


## K-Means

Let's use K-Means to see whether power outages tend to cluster in regions that experience more severe weather. To do this rigorously, we would need to combine our outage data with a severe weather database, which is exactly what we do in the week-long Data Science Camp that uses this dataset. However, for this short exercise, we'll rely on our physical intuition and personal experience. For example, we know that in less arid areas, thunderstorms often occur during the summer and are known to damage power lines.

---

### Exercise: Select the Features

First, we need to select the data features we'll use for clustering.  
Obvious choices from our dataset include **latitude**, **longitude**, and **month**.  
Instead of using raw month values, we'll use the **Month_Sin** and **Month_Cos** components to represent the cyclical nature of time. This helps the model recognize smooth transitions between December and January.
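
The sketch below shows why the two components help. It assumes the months are mapped onto a circle with the same formula as the `circular_toy` helper defined later in this notebook; the function names `month_to_circle` and `circle_distance` are made up for this example.

```python
import math

def month_to_circle(month):
    """Map a month (1-12) to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

def circle_distance(month_a, month_b):
    """Straight-line distance between two months in (sin, cos) space."""
    sin_a, cos_a = month_to_circle(month_a)
    sin_b, cos_b = month_to_circle(month_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

# As raw numbers, December and January look as far apart as possible.
print(abs(12 - 1))                       # 11
# On the circle, they are as close as any other pair of neighboring months.
print(round(circle_distance(12, 1), 3))  # 0.518
print(round(circle_distance(6, 7), 3))   # 0.518
# Months half a year apart sit on opposite sides of the circle.
print(round(circle_distance(1, 7), 3))   # 2.0
```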

---

### 🟩 **TODO**

In the cell below, replace `'REPLACE ME'` with the following column names to select your features:

```
'Long', 'Lat', 'Month_Sin', 'Month_Cos'
```

Then activate the cell by pressing **Shift + Return** (or **Shift + Enter**).



In [None]:
# Select the features to cluster
columns = ['REPLACE ME', 'REPLACE ME', 'REPLACE ME', 'REPLACE ME']
# Make a new DataFrame called 'X' with just those features
X = data[columns]
# Drop any rows that contain missing (NaN, "not a number") values
X = X.dropna()

print(X)

### 🟩 **TODO**

Read and understand why we need to **normalize the features**.

Notice that the **Longitude** and **Latitude** values have magnitudes in the tens or hundreds, while the **Month_Sin** and **Month_Cos** components are much smaller, always between -1 and 1.

As we discussed earlier, K-Means clustering is a geometric algorithm: it measures how far points are from each other in multidimensional space to decide which cluster they belong to.

If one feature (like **Longitude**) has values hundreds of times larger than another (like **Month_Sin**), it will dominate the distance calculation. K-Means would then form its clusters almost entirely from the features with the largest numerical scales and effectively ignore the smaller ones.

To give all features an equal influence on the clustering, we need to rescale the data so that every feature has roughly the same magnitude, typically by transforming them to have a mean of 0 and a standard deviation of 1.  
This process is called **standardization** or **normalization**.

## Exercise: Normalize Features  

Next, we'll normalize the data so that all features contribute equally to the clustering.  

Your DataFrame is called **`X`**.  

The line in the cell below imports the **StandardScaler** class from the scikit-learn library's preprocessing module.  
**StandardScaler** standardizes numerical data by removing the mean and scaling to unit variance (i.e., converting each feature into a z-score).  

This means that after scaling, each feature (column) will have:  
- a mean ≈ 0  
- a standard deviation ≈ 1  

---

### 🟩 **TODO**

Activate the cell below to import the **StandardScaler** class.

In [None]:
from sklearn.preprocessing import StandardScaler


### 🟩 **TODO**

Create an instance of the **StandardScaler** class called `scaler`.

- Think of it as creating a tool that can learn how to scale your data.  
- It will remember the mean and standard deviation of each column during fitting.

In the cell below, type:

```
scaler = StandardScaler()
```

Then activate the cell by pressing **Shift + Return** (or **Shift + Enter**).

In [None]:

## TODO Create an instance of the StandardScaler class called "scaler". 



The line `norm_X = scaler.fit_transform(X)` in the cell below does two steps in one:

- **`fit()`**: calculates the mean and standard deviation for each column in **X**.  
- **`transform()`**: uses those values to standardize each data point using the formula:

$$z = \frac{x - \mu}{\sigma}$$

where:  
- **x** = original value  
- **μ** = column mean  
- **σ** = column standard deviation  

The result, **`norm_X`**, is a NumPy array of scaled values.
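
To check the formula against the library, the sketch below standardizes a tiny made-up array by hand and compares it with `StandardScaler`. The toy values are assumptions for illustration. Note that `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default, whereas pandas' `.std()` defaults to `ddof=1`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical [longitude, latitude] rows.
toy = np.array([[-120.0, 35.0],
                [-100.0, 40.0],
                [-80.0, 45.0]])

# z = (x - mean) / std, applied column by column (population std, ddof=0).
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

scaled = StandardScaler().fit_transform(toy)

print(np.allclose(manual, scaled))  # True
print(scaled.mean(axis=0))          # each column mean is (approximately) 0
print(scaled.std(axis=0))           # each column standard deviation is 1
```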

---

### 🟩 **TODO**

Activate the cell below to fit and transform the data.


In [None]:
# Fit and transform the data to normalize its values.
norm_X = scaler.fit_transform(X)

print(norm_X)

You can see that the scaled values of all these features are now of similar size (most fall within a few units of 0), so no single feature skews the geometric distances that K-Means computes between points.

## Methods of Determining K  

### The Elbow Method  

The **Within-Cluster Sum of Squares (WCSS)** measures how tightly the data points within each K-Means cluster are grouped around their respective centroids. In other words, it measures how compact the clusters are.  

The loop below runs K-Means for cluster counts from 1 to 29 and records the WCSS for each value of **k**.  
This allows us to see how the compactness of the clusters changes as the number of clusters increases.  
In the resulting graph, look for a turning point or "elbow": the point after which adding more clusters yields diminishing improvements in compactness.  
This elbow helps guide our choice for the most appropriate number of clusters.  

---

### 🟩 **TODO**

Activate the cell below to run the loop and generate the WCSS values. Note the X-axis value where you think the "elbow" of the plot is located; you will need it later.

In [None]:
from sklearn.cluster import KMeans

# Run K-Means for K = 1 to 29 and record the WCSS for each K
# (scikit-learn stores it in the inertia_ attribute).
wcss = []
for i in range(1, 30):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(norm_X)
    wcss.append(kmeans.inertia_)

# Plot WCSS against K and look for the elbow.
plt.plot(range(1, 30), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
# plt.savefig(f"Plots/elbow-{filename}.png")
plt.show()

## Davies-Bouldin Index  

Another method we can use that provides a more explicit numerical value than the Elbow Method is the **Davies-Bouldin Index**.  
For each cluster, the Davies-Bouldin Index finds the most similar other cluster, where similarity is the ratio of within-cluster distances to between-cluster distances, and then averages these values over all clusters.  
A lower Davies-Bouldin score indicates better clustering performance.  
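
To see the "lower is better" behavior in isolation, the sketch below scores several values of K on three synthetic, well-separated blobs. The blob data is made up for this example and is unrelated to the outage dataset; for data this clean, the lowest score should land on the true number of blobs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# Three synthetic blobs of 50 points each (made-up data).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                    for c in ([0, 0], [5, 0], [0, 5])])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = davies_bouldin_score(points, labels)
    print(k, round(scores[k], 3))

# The K with the lowest Davies-Bouldin score.
best_k = min(scores, key=scores.get)
print("Lowest score at K =", best_k)
```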

**Scikit-learn's Davies-Bouldin Index documentation:**  
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html#sklearn.metrics.davies_bouldin_score  

---

### 🟩 **TODO**

Activate the cell below to run a Davies-Bouldin Index scoring on your data.  
Which value of **K** gives the lowest score?

In [None]:
from sklearn.metrics import davies_bouldin_score
for k in range(2,20):
    clusterer = KMeans(n_clusters=k, random_state=10,n_init='auto')
    labels = clusterer.fit_predict(norm_X)

    print(f"For {k} clusters, the Davies-Bouldin score is: {davies_bouldin_score(norm_X, labels)}")

## Calinski-Harabasz Index  

The final method we'll use to determine the best **K** is the **Calinski-Harabasz Index**.  
This score evaluates the ratio of between-cluster dispersion to within-cluster dispersion.  
It is also known as the **Variance Ratio Criterion**.  
For this metric, **higher values** indicate better-defined clusters.  
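
The ratio can be written out by hand. The sketch below is an illustration on synthetic blobs, with a hypothetical helper named `variance_ratio`: it computes the between-cluster and within-cluster dispersion directly, normalizes each by its degrees of freedom, and compares the result with scikit-learn's `calinski_harabasz_score`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

# Three synthetic blobs of 50 points each (made-up data).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                    for c in ([0, 0], [5, 0], [0, 5])])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)

def variance_ratio(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    n = len(points)
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)
    overall_mean = points.mean(axis=0)
    between = 0.0
    within = 0.0
    for cid in cluster_ids:
        members = points[labels == cid]
        center = members.mean(axis=0)
        # Spread of the cluster centers around the overall mean, weighted by cluster size.
        between += len(members) * np.sum((center - overall_mean) ** 2)
        # Spread of the points around their own cluster center.
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))

manual = variance_ratio(points, labels)
library = calinski_harabasz_score(points, labels)
print(np.isclose(manual, library))
```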

**Scikit-learn's Calinski-Harabasz Index documentation:**  
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html#sklearn.metrics.calinski_harabasz_score  

---

### 🟩 **TODO**

Activate the cell below to generate the Calinski-Harabasz scores for your data across different values of **K**.  
Which value of **K** gives the highest score?

In [None]:
from sklearn.metrics import calinski_harabasz_score
for k in range(2,20):
    clusterer = KMeans(n_clusters=k, random_state=10,n_init='auto')
    labels = clusterer.fit_predict(norm_X)

    print(f"For {k} clusters, the Calinski-Harabasz score is: {calinski_harabasz_score(norm_X, labels)}")

# Choosing K and Plotting the Clusters  

After running through the different methods of determining **K** above, you should now have some good ideas about which value(s) of **K** might be best.  
Now you can test them and see what the clustering looks like when translated to a graph.  

Below is a clustering function that plots our data with color-coded clusters based on K-Means.  
Try experimenting with the value of **K**, or even with which columns you plot (be sure to use only **2 or 3 columns**), to see what patterns you can find.  

*(Note that changing the columns only changes how the points are displayed. It does not change which features are used for clustering; those were set when we created `norm_X` in the Normalize Features exercise.)*  

---

### 🟩 **TODO**

In the next cell:  
1. Enter the **K** value you chose based on your earlier analysis in place of `Replace_Me`. (Do not put the number in quotes.)
2. Activate the cell to load that value of **K** for the next set of exercises.

In [None]:
#TODO: Determine the best k to use here
k = Replace_Me

### Plotting the Clusters

Below is a clustering function that plots our data with color-coded clusters based on K-Means.

---

### 🟩 **TODO**

1. Try experimenting with different values of **K** to see how the plot changes.  
2. Try removing the **Month** column and creating a 2D plot using only **Latitude** and **Longitude** to observe any patterns.  
   *(Note: Changing the columns only affects how the points are plotted. It does not change which features are used for clustering, since those were set when we created `norm_X` in the Normalize Features exercise.)*  
3. Reset the plot to use `['Long', 'Lat', 'Month']` before continuing to the next section.  
4. Activate the cell below to load the function, then run the following cell to call and use it.


In [None]:
def cluster(data, norm_X, k=3, columns=['Region', 'Month', 'OutageLength'], scale=False):
    dims = len(columns)
    if dims > 3 or dims < 2:
        print("Should be looking at 2 or 3 features, change number of columns evaluated.")

    kmeans = KMeans(n_clusters=k, init='k-means++', max_iter=300, n_init=10, random_state=0)
    data['clusters'] = kmeans.fit_predict(norm_X)

    fig = plt.figure(figsize=(16, 14))

    # 3d clustering
    if dims == 3:
        ax = fig.add_subplot(111, projection='3d')
    # 2d clustering
    else:
        ax = fig.add_subplot()

    for i in range(k):
        cluster_data = data[data['clusters'] == i]
        if scale:
            sizes = cluster_data['Sum']
        else:
            sizes = 10
        if dims == 3:
            ax.scatter(cluster_data[columns[0]], cluster_data[columns[1]], cluster_data[columns[2]], label=f'Cluster {i + 1}', edgecolor='black', alpha=0.5, s=sizes/30)
        else:
            ax.scatter(cluster_data[columns[0]], cluster_data[columns[1]], label=f'Cluster {i + 1}', edgecolor='black', alpha=0.5, s=sizes/30)

    ax.set_title('Clusters of Outages')
    ax.set_xlabel(columns[0])
    ax.set_ylabel(columns[1])
    if dims == 3:
        ax.set_zlabel(columns[2])
    lgnd = ax.legend(markerscale=1)
    for handle in lgnd.legend_handles:
        handle.set_sizes([24.0])
    return data, kmeans.cluster_centers_

In [None]:
# range_mask = (data['Month'] >= 6.0) & (data['Month'] < 12.0)
new_data, centroids = cluster(data, norm_X, k, columns=['Long', 'Lat','Month'], scale=True)

## Helpers  

In the following section, you are provided with several helper functions that will assist you in your workflow and analysis.

Below is a brief explanation of these functions and what they can be used for:

- **circular_toy**  
  - This function takes a time-of-year (**toy**) value from an outage entry and calculates two components of that value using basic trigonometric functions.  

- **plot_range_slice**  
  - This function takes your outage data, the number of clusters you're separating it into, and a lower and upper range of time of year, then plots a 2D slice of the data from that time frame.  
  - You can optionally choose to scale the points by how many people are affected by each outage.  

- **revert_comp**  
  - This function takes two components of a time of year and converts them back to their original value.  

- **centroid_toy**  
  - This function takes the centroids of all clusters and the scaler used to normalize the original data, and converts the centroids to a time-of-year value for each cluster.  

- **concat_helper**  
  - This function takes two file names, each containing a CSV of outages, and concatenates them into a single CSV file.  
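
As a quick sanity check of the idea behind `circular_toy` and `revert_comp`, the sketch below re-implements both under the hypothetical names `to_components` and `from_components`, using the same formulas as the helper cells in this notebook, and confirms that a round trip returns the original time of year.

```python
import math

def to_components(toy):
    """Time of year (1 up to, but not including, 13) -> (sin, cos) on the unit circle."""
    angle = 2 * math.pi * (toy - 1) / 12
    return math.sin(angle), math.cos(angle)

def from_components(sin_part, cos_part):
    """(sin, cos) -> time of year in the range [1, 13)."""
    theta = math.atan2(sin_part, cos_part)  # angle in (-pi, pi]
    toy = 1 + (12 * theta) / (2 * math.pi)
    if toy < 1:
        toy += 12  # wrap negative angles back into the calendar year
    return toy

for month in (1, 4, 7.5, 12):
    recovered = from_components(*to_components(month))
    print(month, round(recovered, 6))
```

Because `atan2` depends only on the direction of the (sin, cos) pair, the conversion also works for cluster centroids, whose averaged components generally do not lie exactly on the unit circle.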

---

### 🟩 **TODO**

Press **Shift + Return** (or **Shift + Enter**) in each of the following cells to activate them.

In [None]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

In [None]:
def filter_column(df, column, filter_val):
    mask = (df[column] == filter_val)
    return df[mask]

In [None]:
def circular_toy(toy):

    x = (2*math.pi*(toy-1)) / 12
    toy_sin = math.sin(x)
    toy_cos = math.cos(x)
    
    return toy_sin, toy_cos

In [None]:
def plot_range_slice(data, k, lower, upper, scale=False):
    low_sin, low_cos = circular_toy(lower)
    up_sin, up_cos = circular_toy(upper)
    # print(f'Low Sin: {low_sin}, Low Cos: {low_cos}, High Sin: {up_sin}, High Cos: {up_cos}')
    # date_mask = (new_data['Month'] >= lower) & (new_data['Month'] < upper)
    date_mask = None
    if lower < upper:
        date_mask = (data['Month'] >= lower) & (data['Month'] < upper)
    else:
        date_mask = (data['Month'] >= lower) | (data['Month'] < upper)
    masked_data = data[date_mask]
    # print(masked_data)
    fig = plt.figure(figsize=(12, 10))
    
    ax = fig.add_subplot()
    for i in range(k):
        cluster_data = masked_data[masked_data['clusters'] == i]
        if scale:
            sizes = cluster_data['Sum']/30
        else:
            sizes = 100
        ax.set_aspect('equal', adjustable='box')
        ax.scatter(cluster_data['Long'], cluster_data['Lat'], label=f'Cluster {i+1}', edgecolor='black', alpha=0.5, s=sizes)
    
    ax.set_title(f'Clusters of Outages for Time Between Month {lower} and {upper}')
    lgnd = ax.legend(markerscale=1)
    for handle in lgnd.legend_handles:
        handle.set_sizes([24.0])

In [None]:
def centroid_toy(centroids, scaler):
    inversed_centroids = scaler.inverse_transform(centroids)
    for i in range(0, len(inversed_centroids)):
        print(f"Centroid for Cluster {i+1} occurs during time of year: {revert_comp(inversed_centroids[i][2], inversed_centroids[i][3])}")


In [None]:
def revert_comp(sin, cos):
    theta = math.atan2(sin, cos)

    x = 1 + (12 * theta) / (2 * math.pi)

    if x < 1:
        x += 12
    
    return x

# Analysis  

## When Are the Outage Clusters Centered?  

We can use one of our helper functions, `centroid_toy`, to find the time of year where each cluster is centered.  

---

### 🟩 **TODO**

Press **Shift + Return** (or **Shift + Enter**) on the cells below to call the `centroid_toy` function.

**What months are represented by the cluster centroids?**

In [None]:
centroid_toy(centroids, scaler)

Another one of our helper functions, `plot_range_slice`, can be used to view our data in a 2D slice during a specified range of time.  
We can use this to evaluate clusters in different regions during times of the year that we're interested in.  

*(Note: You can specify ranges where the lower bound is a month following the upper bound, and the range will wrap around the year.  
The lower bound is included and the upper bound is excluded, so `lower = 12` and `upper = 2` will show data from December and January; use `upper = 3` to include February.)*  
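
The wrap-around selection can be sketched on a toy Series. The helper `month_mask` below is hypothetical; it mirrors the comparison logic inside `plot_range_slice`, where the lower bound is inclusive and the upper bound is exclusive.

```python
import pandas as pd

def month_mask(months, lower, upper):
    """Boolean mask for lower <= month < upper, wrapping around the year if needed."""
    if lower < upper:
        return (months >= lower) & (months < upper)
    # Wrapped range: keep months at or after `lower`, or before `upper`.
    return (months >= lower) | (months < upper)

months = pd.Series([1, 2, 6, 11, 12])

print(months[month_mask(months, 6, 12)].tolist())  # [6, 11]
print(months[month_mask(months, 12, 2)].tolist())  # [1, 12]
print(months[month_mask(months, 12, 3)].tolist())  # [1, 2, 12]
```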

---

### 🟩 **TODO**

Press **Shift + Return** (or **Shift + Enter**) on the cells below.

In [None]:
plot_range_slice(data, k, 1.0, 12.0, scale=True)

Here we use the `plot_range_slice` function to create graphs for each month to see how our data changes and clusters throughout the year.  

---

### 🟩 **TODO**

Use the `plot_range_slice` function to generate plots for each month.  
Observe how the clusters shift or change across different times of the year.  
Consider:  
- Do certain regions experience more outages during specific months?  
- Do cluster patterns align with known seasonal weather trends?

In [None]:
for i in range(1, 13):
    plot_range_slice(new_data, k, i, i + 1, scale=True)

## Analysis


### Step 1: Refine your clusters
1. Test and adjust the number of power outage clusters based on the results and tests in the notebook.

### Step 2: Analyze and characterize the results
- This is the process of determining what story the data and results tell you.
Pick a region of
Questions to consider:
- When are the centers of those clusters for each region?
  - Use your plot slices and the results of `centroid_toy` to answer.
  - What severe weather happens in those regions during the month where the cluster is centered?

- What are the general characteristics of the power outages in those clusters?
  - Are they generally long or short outages?
  - How many customers are impacted on average (large or small circles)?
- Are there regions or states that mostly share clusters or that mostly do not share clusters?
  - Why might that be in each case?
  - Form some hypotheses about the possible relationships to:
    - Regional climate and weather
    - The time of year
    - The severity of the power outages
- How much do you trust the cluster results?


# 🟩 **Final TODO**
Please fill out this form with your analysis to get credit for this challenge: https://forms.gle/zajaAVTCQy1PCp5y6

---
