# Arctic Dataset Notebook

The dataset was found [here](https://github.com/big-data-lab-umbc/sea-ice-prediction) and the screenshot of feature significance came from an article found [here](https://s3.us-east-1.amazonaws.com/climate-change-ai/papers/icml2021/50/paper.pdf).

**MLA citation:** `Ali, Sahara, et al. "Sea Ice Forecasting using Attention-based Ensemble LSTM." arXiv preprint arXiv:2108.00853 (2021).`

### 0 Setup

For help, call `hint(x)`, replacing `x` with the section number.

In [4]:
# Run this cell
import matplotlib.pyplot as plt
import datascience as ds
import numpy as np
from IPython.lib.display import YouTubeVideo
from Hints.hints import hint

### 1 Background
Learn about the context of the Arctic data (provided in the `Data` folder).

In [None]:
# Run this cell
YouTubeVideo("V4lwQcho1No")

The video explores three fundamental topics, in chronological order:
1. Arctic environmental variables are measured and recorded, primarily by satellites and weather stations
2. These data can be used to predict future sea ice extent
3. Predictions can help drive future change to preserve Arctic sea ice and the ecosystems that depend on it

In the dataset provided, sea ice extent is the **target variable**, the quantity to be predicted. The 10 other variables used to predict it are the **features** (the meaning of each feature can be found [here](Data/features.png)).

### 2 Load data

- Load data into `full_data`. 
- Load the names of the features into `features_list`. Print `features_list`.

In [28]:
full_data = ds.Table.read_table("Data/Arctic_domain_mean_1979_2018.csv")
# Columns 2 through 11 hold the 10 feature columns
features_list = list(full_data.labels[2:12])
features_list

['wind_10m',
 'specific_humidity',
 'LW_down',
 'SW_down',
 'rainfall',
 'snowfall',
 'sosaline',
 'sst',
 't2m',
 'surface_pressure']

### 3 Data pre-processing

- Create a new table, called `yearly_data`, where each row contains averaged feature data over one year. Display the first 4 rows of the table.

In [25]:
yearly_data = ds.Table(["year"] + features_list)
years = full_data.column("Year")
for year in range(min(years), max(years) + 1):
    # Filter the rows for this year once and reuse them for every feature
    year_rows = full_data.where("Year", ds.are.equal_to(year))
    averages = [np.mean(year_rows.column(feature)) for feature in features_list]
    yearly_data = yearly_data.with_row([year] + averages)
yearly_data.show(4)

year,wind_10m,specific_humidity,LW_down,SW_down,rainfall,snowfall,sosaline,sst,t2m
1979,5.07764,2.21866,237.497,96.3341,1.34656,0.70885,33.2524,274.36,263.532
1980,5.12035,2.24572,239.288,96.7265,1.28481,0.695901,33.2624,274.489,264.163
1981,5.18837,2.30067,243.107,96.0786,1.39369,0.760159,33.246,274.363,264.963
1982,5.1452,2.19611,237.621,97.6215,1.36295,0.738999,33.2725,274.287,263.673
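
The averaging step above can also be pictured without the `datascience` table: group the rows by year, then take the mean of each group. The following is a minimal sketch in plain Python. The rows in `toy_rows` are made up for illustration (they are not taken from the dataset), and `yearly_means` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy (year, sst) records; the values are made up for illustration only.
toy_rows = [
    (1979, 274.0), (1979, 275.0),
    (1980, 274.5), (1980, 275.5), (1980, 276.0),
]

def yearly_means(rows):
    """Return {year: mean of all values recorded for that year}."""
    grouped = defaultdict(list)
    for year, value in rows:
        grouped[year].append(value)
    # Average each year's values; sorting keeps the years in ascending order
    return {year: sum(values) / len(values) for year, values in sorted(grouped.items())}

toy_yearly = yearly_means(toy_rows)
print(toy_yearly)
```

Each year may contribute a different number of rows, so the mean divides by that year's own row count, just as the notebook cell does.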


### 4 Graph trends

- Graph `sst`, where `year` is the independent variable.
- Add a linear trendline for `sst`.
- Display the equation of the trendline as the title of the plot.
- Save the graph in the `Graphs` folder. 

In [27]:
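# Plot the yearly mean sst against year
yearly_data.plot("year", "sst")
x = yearly_data.column("year")
y = yearly_data.column("sst")
# Fit a degree-1 (linear) least-squares trendline
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
plt.plot(x, p(x), "r--")
# z[0] is the slope and z[1] is the intercept
plt.title("y=" + str(round(z[0], 6)) + "x + " + str(round(z[1], 6)))
plt.savefig("Graphs/sst.png")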
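
With degree 1, `np.polyfit` fits a least-squares straight line. To see where the slope and intercept in the title come from, here is a minimal sketch of the same calculation in plain Python. The series is made up for illustration (it is not Arctic data), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (x with y), and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy series, made up for illustration: it rises by exactly 0.5 per year.
toy_years = [2000, 2001, 2002, 2003]
toy_values = [10.0, 10.5, 11.0, 11.5]

slope, intercept = fit_line(toy_years, toy_values)
# "+.6f" always writes the sign, so a negative intercept does not render as "+ -"
title = f"y = {slope:.6f}x {intercept:+.6f}"
print(title)
```

The slope is the average change in the plotted variable per year. The intercept is the fitted value at year 0, so it is often a large number with little physical meaning on its own.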

### 5 Interpretation

What trends are present in the graph? What insights can you take away from this model? What may be the cause of the trend?