# REU 2022 Day 8: Challenge

**We are going to compare the Arctic temperature dataset we used this morning with Arctic sea ice area (SIA)** <br>

* A. Load the SIA time series data and compare it with the Arctic temperature time series we used this morning
* B. Look at the 5 coldest and 5 warmest members in July during the period 1980-2000. Look at SIA in August for the same members. Do you see any relationship?
* C. (Extension) Look at Arctic temperatures in all 6 CMIP5 large ensembles (LEs), compute their correlations with SIA in each model, and make a scatter plot for each model


## A. Compare Arctic temperature time series for CanESM2 in July with sea ice area (SIA) in August

In [9]:
import xarray as xr
import numpy as np
import datetime
import matplotlib.pyplot as plt
import scipy.stats as stats

**Load both sea ice area and Arctic temperature datasets for CanESM2**

In [72]:
# Load SIA for all 6 models, then select August and members 1-50 for CanESM2
CLIVAR_SIA = xr.open_dataset('CLIVAR_SIA_1850_2100_RCP85.nc')
CanESM2_SIA_Aug = CLIVAR_SIA['CanESM2'].sel(time=CLIVAR_SIA['time.month']==8).sel(member=slice(1,50))

In [71]:
# Load Arctic temperatures for all 6 models, then select July and members 1-50 for CanESM2
CLIVAR_Arctic_temp = xr.open_dataset('CLIVAR_Arctic_surface_temp_1850_2100_RCP85.nc')
CanESM2_Arctic_July = CLIVAR_Arctic_temp['CanESM2'].sel(time=CLIVAR_Arctic_temp['time.month']==7).sel(member=slice(1,50))

<span style="color:blue"> **Make a scatter plot of July Arctic temperature in 1980-2000 versus August SIA** 

<span style="color:blue"> **Do you see a correlation? Does this make sense?**

<span style="color:blue"> **Compute the correlation (look back at Day 6 if you need to check how this is done). Add a line of best fit to the plot and put the r value in the figure title**
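
One possible way to get the correlation and the line of best fit is sketched here. It runs on synthetic stand-in arrays rather than the CLIVAR files, so the names `temp_july` and `sia_aug` and all numbers in them are made up. With the real data you would first select 1980-2000 from `CanESM2_Arctic_July` and `CanESM2_SIA_Aug` and then flatten their values in the same way. `scipy.stats.linregress` returns the slope, intercept and r value in one call.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Synthetic stand-ins (NOT the CLIVAR data): 21 years x 50 members
rng = np.random.default_rng(0)
temp_july = rng.normal(loc=275.0, scale=0.5, size=(21, 50))
# Made-up relation: warmer July goes with less August sea ice, plus noise
sia_aug = 7.0 - 0.8 * (temp_july - 275.0) + rng.normal(0, 0.3, size=(21, 50))

# Flatten so that every (year, member) pair becomes one point
x = temp_july.ravel()
y = sia_aug.ravel()

# Least-squares fit; fit.rvalue is the Pearson correlation coefficient
fit = stats.linregress(x, y)

fig, ax = plt.subplots()
ax.scatter(x, y, s=5)
# Draw the fitted line between the smallest and largest x values
x_line = np.array([x.min(), x.max()])
ax.plot(x_line, fit.intercept + fit.slope * x_line, color='k')
ax.set_xlabel('July Arctic temperature')
ax.set_ylabel('August SIA')
ax.set_title(f'Synthetic example, r = {fit.rvalue:.2f}')
```

Flattening treats every year of every member as a separate point, which is a choice; you could also average over time first and plot one point per member.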

## B. Now look at just the 5 highest- and 5 lowest-temperature members and plot them against SIA for the same members

<span style="color:blue"> **Find the member numbers with the highest and lowest Arctic temperatures in July 1980-2000. Hint: look up the `.rank` method of xarray DataArrays**

In [1]:
# Hint: rank your data along the 'member' dimension, then use .where on CanESM2_Arctic_July['member'] based on the rank
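
As a rough illustration of the ranking idea, the sketch below picks the 5 coldest and 5 warmest members from a synthetic array. It swaps in `np.argsort` for the xarray `.rank` method from the hint (`.rank` needs the optional bottleneck package), and it assumes members are ranked by their mean July temperature over the period. The names `temp_july` and `members` are hypothetical stand-ins, not the CLIVAR data.

```python
import numpy as np

# Synthetic stand-in (NOT the CLIVAR data): 21 Julys x 50 members
rng = np.random.default_rng(1)
members = np.arange(1, 51)  # member labels 1..50
temp_july = rng.normal(275.0, 0.5, size=(21, 50))

# Assumption: rank members by their 1980-2000 mean July temperature
mean_temp = temp_july.mean(axis=0)

# argsort returns the indices that order the members from coldest to warmest
order = np.argsort(mean_temp)
coldest_members = members[order[:5]]
warmest_members = members[order[-5:]]

print('coldest:', coldest_members)
print('warmest:', warmest_members)
```

With `.rank('member')` on a 50-member DataArray, the equivalent selection would be ranks 1 to 5 for the coldest group and 46 to 50 for the warmest.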

<span style="color:blue"> **Make a scatter plot to compare the July Arctic temperature and August SIA. Use 2 different colors for the high and low groups**


<span style="color:blue"> **Do these look like 2 distinct groups in their SIA?**



<span style="color:blue"> **Test whether the SIA of these two groups is statistically different at the 0.05 level using `scipy.stats.ttest_ind`**
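
A minimal sketch of the significance test is shown here on synthetic numbers. The arrays `sia_cold_group` and `sia_warm_group` are made-up stand-ins for the August SIA values of the two groups, and `equal_var=False` (Welch's t-test) is a choice made for this example rather than something the exercise prescribes.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-ins (NOT the CLIVAR data): 5 members x 21 years per group, flattened
rng = np.random.default_rng(2)
sia_cold_group = rng.normal(7.4, 0.3, size=5 * 21)
sia_warm_group = rng.normal(6.8, 0.3, size=5 * 21)

# Two-sided test for a difference in means; equal_var=False selects Welch's t-test
result = stats.ttest_ind(sia_cold_group, sia_warm_group, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# The groups differ at the 0.05 level if the p-value is below alpha
alpha = 0.05
significant = p_value < alpha
print(f't = {t_stat:.2f}, p = {p_value:.3g}, different at the 0.05 level: {significant}')
```

Pooling all years of each member treats every value as an independent sample. That is an assumption worth questioning for the real data, since successive years within a member may be correlated.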

## C. Now compute correlations for the other 5 climate models in the datasets and plot scatter plots as you did for CanESM2 in part A

<span style="color:blue"> **Make a variable for Arctic temperatures in July and SIA in August**

In [127]:
# The model names are the data variable names of the xarray Dataset, so you can list them like this
model_names = np.array(list(CLIVAR_Arctic_temp.data_vars))
print(model_names)

['CanESM2' 'CESM1' 'CSIRO_MK36' 'GFDL_CM3' 'GFDL_ESM2M' 'MPI_ESM1']


In [128]:
# Each model has a different number of ensemble members; these are listed in the same order as model_names above
mem_len = [50,40,30,20,30,100]

In [126]:
Arctic_temp_July = CLIVAR_Arctic_temp.sel(time=CLIVAR_Arctic_temp['time.month']==7)
SIA_August = CLIVAR_SIA.sel(time=CLIVAR_SIA['time.month']==8)

<span style="color:blue"> **Repeat the same analysis as in A: make an individual plot for each model showing a scatter plot with a linear regression line and the r value.** <br>
**Write your code cell so that it runs for any of the model names and any time period without rewriting the whole script**
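
One way to avoid rewriting the script is to wrap the selection and fit in a function that takes the model name, member count and years as arguments. The sketch below does this on two synthetic datasets. The function name `july_temp_vs_august_sia`, the model names `ModelA` and `ModelB`, and the data themselves are all hypothetical. It also assumes the dimensions are called `time` and `member`, as the cells above suggest.

```python
import numpy as np
import pandas as pd
import xarray as xr
import scipy.stats as stats
import matplotlib.pyplot as plt

# Synthetic stand-ins (NOT the CLIVAR files); 'ModelA' and 'ModelB' are hypothetical
rng = np.random.default_rng(3)
time = pd.date_range('1950-01-01', '2000-12-01', freq='MS')  # monthly time axis
members = np.arange(1, 11)
temp_vars, sia_vars = {}, {}
for name in ['ModelA', 'ModelB']:
    temp_data = rng.normal(size=(time.size, members.size))
    # Made-up relation: SIA follows the previous month's temperature plus noise
    sia_data = -0.5 * np.roll(temp_data, 1, axis=0) + 0.2 * rng.normal(size=temp_data.shape)
    temp_vars[name] = (('time', 'member'), temp_data)
    sia_vars[name] = (('time', 'member'), sia_data)
coords = {'time': time, 'member': members}
fake_temp = xr.Dataset(temp_vars, coords=coords)
fake_sia = xr.Dataset(sia_vars, coords=coords)


def july_temp_vs_august_sia(temp_ds, sia_ds, model, n_members, start_year, end_year):
    """Return flattened July temperature, August SIA and their linear fit."""
    years = slice(str(start_year), str(end_year))
    temp = temp_ds[model].sel(time=years, member=slice(1, n_members))
    sia = sia_ds[model].sel(time=years, member=slice(1, n_members))
    # Keep July for temperature and August for SIA, with a fixed dimension order
    temp_july = temp.sel(time=temp['time'].dt.month == 7).transpose('time', 'member')
    sia_aug = sia.sel(time=sia['time'].dt.month == 8).transpose('time', 'member')
    x = temp_july.values.ravel()
    y = sia_aug.values.ravel()
    return x, y, stats.linregress(x, y)


# One panel per model; only these four settings need to change
names = ['ModelA', 'ModelB']
n_mem = [10, 5]
start, end = 1980, 2000

fig, axes = plt.subplots(1, len(names), figsize=(5 * len(names), 4))
results = {}
for ax, name, n in zip(axes, names, n_mem):
    x, y, fit = july_temp_vs_august_sia(fake_temp, fake_sia, name, n, start, end)
    results[name] = fit
    ax.scatter(x, y, s=5)
    x_line = np.array([x.min(), x.max()])
    ax.plot(x_line, fit.intercept + fit.slope * x_line, color='k')
    ax.set_title(f'{name}, {start}-{end}, r = {fit.rvalue:.2f}')
    ax.set_xlabel('July Arctic temperature')
    ax.set_ylabel('August SIA')
```

With the real data, the same loop could run over `model_names` and `mem_len` with `Arctic_temp_July` and `SIA_August` (or the full datasets) passed in place of the synthetic ones.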

<span style="color:blue"> **Extension: look at longer time periods, e.g. 1950-2100. Do you get stronger correlations?** 
