# Arctic Dataset Notebook

The dataset comes from [this GitHub repository](https://github.com/big-data-lab-umbc/sea-ice-prediction), and the screenshot of feature significance comes from [this paper](https://s3.us-east-1.amazonaws.com/climate-change-ai/papers/icml2021/50/paper.pdf).

**MLA citation:** `Ali, Sahara, et al. "Sea Ice Forecasting using Attention-based Ensemble LSTM." arXiv preprint arXiv:2108.00853 (2021).`

### 0 Setup

For help, run `hint(x)`, replacing `x` with the section number.

In [1]:
# Run this cell
import matplotlib.pyplot as plt
import datascience as ds
import numpy as np
from IPython.lib.display import YouTubeVideo
from Hints.hints import hint
from sklearn.metrics import r2_score
from sklearn import preprocessing

### 1 Background
Learn about the context of the Arctic data (provided in the `Data` folder).

In [None]:
# Run this cell
YouTubeVideo("V4lwQcho1No")

The video covers three fundamental topics, in this order:
1. Arctic environmental variables are measured and recorded, primarily by satellites and weather stations.
2. These data can be used to predict future sea ice extent.
3. Predictions can help drive change to preserve Arctic sea ice and the ecosystems that depend on it.

In the dataset provided, sea ice extent is the **target variable**. There are 10 **features**; their meanings are described [here](Data/features.png).

### 2 Load data

- Load data into `full_data`. 
- Load the names of the features into `features_list`. Print `features_list`.
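
As a rough illustration of the idea (not the notebook's solution), the sketch below reads a small in-memory CSV and extracts the feature names from its header. The column names and values are hypothetical stand-ins for the real file, and it uses NumPy rather than `datascience` so that it runs on its own.

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file; the real column names may differ.
csv_text = """year,month,sea_ice_extent,air_temperature,humidity
1979,1,15.4,-25.1,0.81
1979,2,16.2,-26.3,0.79
1980,1,14.9,-24.8,0.82
"""

# names=True reads the header row into named fields.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Assumption: every column except the time columns and the target is a feature.
non_features = {"year", "month", "sea_ice_extent"}
features = [name for name in data.dtype.names if name not in non_features]
print(features)
```

With the `datascience` package imported in the setup cell, `Table.read_table` and the table's `labels` attribute should play the same roles as `np.genfromtxt` and `dtype.names` here.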

In [None]:
full_data = ...
features_list = ...
features_list

### 3 Data pre-processing

- Create a new table, called `yearly_data`, where each row contains averaged feature data over one year. Display the first 4 rows of the table.
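
One way to think about the averaging step is as a group-by on the year column. The sketch below shows this with plain NumPy on synthetic values; the array names and numbers are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Synthetic monthly records (hypothetical): two years, three readings each.
years = np.array([1979, 1979, 1979, 1980, 1980, 1980])
extent = np.array([15.0, 16.0, 14.0, 13.0, 15.0, 14.0])

def yearly_mean(years, values):
    """Return the distinct years and the mean of `values` within each year."""
    # `inverse` maps every record to the index of its year in `unique_years`.
    unique_years, inverse = np.unique(years, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return unique_years, sums / counts

unique_years, mean_extent = yearly_mean(years, extent)
print(unique_years, mean_extent)
```

In `datascience`, calling `group` on a table with a collection function such as `np.mean` should perform the same kind of per-year aggregation across all columns at once.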

In [None]:
...

### 4 Graph trends

- Graph all features with `year` as the independent variable.
- Add a linear trendline for each graph.
- Display the equation of the trendline as the title of the plot.
- Save the graph in the `Graphs` folder. 
- Label the x-axis as `year` and the y-axis as the feature name.
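
A linear trendline can be obtained with a degree-1 least-squares fit. The sketch below fits one on synthetic data and builds an equation string suitable for a plot title; the data and the three-decimal formatting are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic yearly values lying on the line y = -0.05 * year + 110 (hypothetical data).
year = np.arange(1980, 1990)
feature = -0.05 * year + 110.0

# A degree-1 polynomial fit returns the slope first, then the intercept.
m, b = np.polyfit(year, feature, 1)

# Trendline values at each year, ready to be drawn over the raw data.
trend = m * year + b

# Equation string that could serve as the plot title.
title = f"y = {m:.3f}x + {b:.3f}"
print(title)
```

From here, `plt.plot` can draw both `feature` and `trend` against `year`, `plt.title(title)`, `plt.xlabel`, and `plt.ylabel` handle the labels, and `plt.savefig` writes the figure to a file.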

In [None]:
...

### 5 Interpretation

What trends are present in the graphs? What insights can you take away from the trendlines? What might be causing each trend?

### 6 Correlation analysis

- Initialize `correlation_table` with 5 columns:
    - `["X", "Y", "r_squared", "m", "b"]`
- For every feature combination (X and Y), find the r-squared value, slope (m), and y-intercept (b) of the trendline.
- Save the table as `correlation_table`, and save a copy ordered by r-squared as `correlation_table_r2`.
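
The pairwise analysis can be sketched as a loop over feature pairs, fitting a line to each and recording how well it fits. The example below uses three synthetic columns with hypothetical names and assumes that "every feature combination" means unordered pairs; plain tuples stand in for table rows.

```python
from itertools import combinations
import numpy as np

# Synthetic yearly features (hypothetical names and values).
columns = {
    "a": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "b": np.array([2.0, 4.0, 6.0, 8.0, 10.0]),   # exactly 2 * a
    "c": np.array([1.0, -1.0, 0.0, -1.0, 1.0]),  # no linear relation to a
}

def fit_line(x, y):
    """Fit y = m*x + b by least squares; return (r_squared, m, b)."""
    m, b = np.polyfit(x, y, 1)
    residual = np.sum((y - (m * x + b)) ** 2)  # variation left unexplained by the line
    total = np.sum((y - np.mean(y)) ** 2)      # total variation in y
    return 1.0 - residual / total, m, b

rows = []
for x_name, y_name in combinations(columns, 2):
    r_squared, m, b = fit_line(columns[x_name], columns[y_name])
    rows.append((x_name, y_name, r_squared, m, b))

# Copy of the rows ordered by r-squared, highest first.
rows_by_r2 = sorted(rows, key=lambda row: row[2], reverse=True)
for row in rows_by_r2:
    print(row)
```

For a straight-line fit, r-squared is the same whichever variable is treated as X, but the slope and intercept are not, so ordered pairs would be needed if both directions matter. The `r2_score` function imported in the setup cell computes the same quantity from the observed and predicted values. The first and last entries of the sorted list are the pairs with the highest and lowest r-squared.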

In [None]:
correlation_table = ...
...

### 7 Graph relationships
- Graph the feature pair with:
    - The highest r-squared
    - The lowest r-squared

In [None]:
...

### 8 Analysis

Explain why the correlations were observed. Does correlation imply causation, or is there a confounding variable at play? Make further observations if necessary. 