# Calculations of SIA and SIE from observational datasets

### Author: Chris Wyburn-Powell, [github](https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb) <br>

#### Input:
All datasets contain monthly data of which all months 1979-2020 are used:
- **Merged Hadley NOAA Optimal Interpolation, version 2.0 (Merged Hadley-OI)** - SIC. [`doi:10.5065/r33v-sv91`](https://doi.org/10.5065/r33v-sv91)
- **Hadley Centre Sea Ice and Sea Surface Temperature data set, version 1 (HadISST1)** - SIC. [`doi:10.1029/2002JD002670`](https://doi.org/10.1029/2002JD002670)
- **Hadley Centre Sea Ice and Sea Surface Temperature data set, version 2 (HadISST2)** - SIC. N.B. not currently updated beyond 2020-08 (as of 2021-07-01), so not used. [`doi:10.1002/2013JD020316`](https://doi.org/10.1002/2013JD020316)
- **NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration, Version 4 (CDR)** - SIC. Contains Climate Data Record (CDR), NASA Team (NT) and NASA Bootstrap (BT) datasets. [`doi:10.7265/efmz-2t65`](https://doi.org/10.7265/efmz-2t65)
- **NSIDC Sea Ice Index, Version 3 (SII)** - SIA and SIE only. Monthly SIE and SIA based primarily on the NASA Team data. [`doi:10.7265/N5K072F8`](https://doi.org/10.7265/N5K072F8)
 
#### Output:
1979-2020 missing/spurious data corrected for SIC, SIA and SIE: 
- **NOAA/NSIDC CDR (CDR, NT, BT)**
- **HadISST1**
- **Merged Hadley OI**
- **NSIDC SII (only SIA and SIE)**

#### Corrections to datasets:
The objective in making corrections is to remove spurious data and to fill missing data with values that are close to what actually occurred and/or represent a realistic spatial pattern. For example, linear interpolation may be acceptable for SIA and SIE, but applying it to SIC could produce unrealistic spatial distributions, such as averaging 100% and 0% concentrations over a large area and so simulating an unrealistically wide ice-edge zone. To ensure that SIA/SIE remain consistent with the SIC fields, SIC data from the same month of a different year are used to fill missing data.
- **NOAA/NSIDC CDR version 4**: The pole hole is filled using the average SIC of the surrounding grid cells (built in). Missing months (1984-07, 1987-12, 1988-01) are filled by looking at the closest valid months of SIA (CDR), identifying whether the previous or the following year's SIA for those valid months is closer to that of the year with missing data, then copying the SIC from whichever year is closer. E.g. for 1984-07, SIA for 1983-06 and 1985-06 is compared with 1984-06, and SIA for 1983-08 and 1985-08 is compared with 1984-08. 1985 is found to be closer to 1984 than 1983 is, so 1985-07 is copied to fill 1984-07. Similarly, SIC values for 1988-12 and 1989-01 are used to fill 1987-12 and 1988-01.
- **HadISST1**: Discontinuities were found in 2009-03 and 2009-04, with extreme negative anomalies that do not appear in other datasets. SIC from 2007-03 is used for 2009-03 and SIC from 2008-04 is used for 2009-04.
- **Merged Hadley OI**: Data interpolated over land is masked using the land mask for HadISST1. 2009-03 and 2009-04 are filled with data from 2007-03 and 2008-04 respectively, as for HadISST1. 2009-02 and 2009-05 are filled with 2010-02 and 2010-05 respectively.
- **NSIDC Sea Ice Index**: The pole hole is filled using SIA and SIE from the NOAA/NSIDC CDR version 4 NT dataset. Missing months 1987-12 and 1988-01 are filled using data from the following year (as with the NOAA/NSIDC CDR SIC dataset); SIA for 1987-08 is filled using the month-to-month change in NT SIA calculated from SIC.
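
The year-selection rule described for the CDR missing months can be sketched as follows. This is a minimal illustration rather than the code used in this notebook: the function name `choose_fill_year`, the dictionary layout and the SIA values are hypothetical, and using the mean absolute difference as the measure of closeness is an assumption.

```python
import numpy as np

def choose_fill_year(sia, missing_year, neighbour_months):
    """Return the adjacent year whose SIA is closest to that of missing_year.

    sia: dict mapping (year, month) -> SIA in million km^2 (hypothetical layout)
    neighbour_months: valid months next to the missing month, e.g. [6, 8] for July
    """
    diffs = {}
    for candidate in (missing_year - 1, missing_year + 1):
        # mean absolute SIA difference over the valid neighbouring months (assumed metric)
        diffs[candidate] = np.mean([abs(sia[(candidate, m)] - sia[(missing_year, m)])
                                    for m in neighbour_months])
    return min(diffs, key=diffs.get)

# made-up SIA values, for illustration only
toy_sia = {(1983, 6): 10.9, (1984, 6): 10.5, (1985, 6): 10.6,
           (1983, 8): 6.9,  (1984, 8): 6.4,  (1985, 8): 6.3}

fill_year = choose_fill_year(toy_sia, 1984, [6, 8])
print(fill_year)
```

With these toy numbers the following year is selected, so the July SIC of that year would be copied into the missing July.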

In [1]:
import numpy as np
import xarray as xr
import pandas as pd
import glob 
import datetime
import warnings

In [2]:
#define the root path for the project directory
data_path = '/glade/scratch/cwpowell/Synthetic_ensemble/'

#define the mid-month dates to be used across all datasets
CLIVAR = xr.open_dataset(data_path+'/SIA/SIA_SIE_SIV/CLIVAR_SIA_1850_2100_RCP85.nc')
CLIVAR_time = CLIVAR['time'].sel(time=slice('1979','2020'))

# NSIDC CDR Version 4
## Exclude non-sea ice data and interpolate missing months

In [3]:
#load all monthly files into a single xarray dataset and correct time dimension
warnings.filterwarnings("ignore", 'variable ') #suppress warnings: the non-standard time triggers a message when each file is opened, and the time dimension is corrected below

all_CDR_data = []

for file in glob.glob(data_path+'Raw_data/observations/NSIDC_CDR_v4/seaice_conc_monthly_nh*'):
    all_CDR_data.append(xr.open_dataset(file))
    
all_CDR_xr = xr.concat((all_CDR_data), dim='tdim') #concatenate all of the monthly files into a single xarray dataset for all dates

all_CDR_xr = all_CDR_xr.rename({'tdim':'time', 'y':'ygrid', 'x':'xgrid'}) #rename dimension so they match coordinates

all_CDR_xr = all_CDR_xr.sortby(all_CDR_xr['time']) #sort by time dimension, files were loaded in a random order

all_CDR_xr['time'] = CLIVAR_time #replace the time dimension with numpy.datetime64 objects for mid-month

In [4]:
#set all non-sea ice to np.nan
CDR = all_CDR_xr['cdr_seaice_conc_monthly'].where(all_CDR_xr['cdr_seaice_conc_monthly']<1.1) #keep valid concentrations only - flagged grid points (land, coast etc., values around 2.5) become np.nan
BT  = all_CDR_xr['nsidc_bt_seaice_conc_monthly'].where(all_CDR_xr['nsidc_bt_seaice_conc_monthly']<1.1) 
NT  = all_CDR_xr['nsidc_nt_seaice_conc_monthly'].where(all_CDR_xr['nsidc_nt_seaice_conc_monthly']<1.1) 

In [5]:
#for filling values of 1984-07, 1987-12 and 1988-01 the following years (1985-07, 1988-12 and 1989-01)
#were found to be closest to the year with the missing values for other months of the year
filled = []

for data_var in [CDR, BT, NT]: #loop through all 3 datasets
    CDR_xr_1984_07 = data_var.sel(time='1985-07').copy()
    CDR_xr_1984_07['time'] = xr.DataArray(data = CLIVAR_time.sel(time='1984-07').values, coords={'time': CLIVAR_time.sel(time='1984-07').values}, dims=['time'])
    
    CDR_xr_1987_12 = data_var.sel(time='1988-12').copy()
    CDR_xr_1987_12['time'] = xr.DataArray(data = CLIVAR_time.sel(time='1987-12').values, coords={'time': CLIVAR_time.sel(time='1987-12').values}, dims=['time'])
    
    CDR_xr_1988_01 = data_var.sel(time='1989-01').copy()
    CDR_xr_1988_01['time'] = xr.DataArray(data = CLIVAR_time.sel(time='1988-01').values, coords={'time': CLIVAR_time.sel(time='1988-01').values}, dims=['time'])

    filled.append(xr.concat((data_var.sel(time=slice('1979-01','1984-06')), CDR_xr_1984_07, data_var.sel(time=slice('1984-08','1987-11')), 
                             CDR_xr_1987_12, CDR_xr_1988_01, data_var.sel(time=slice('1988-02','2020-12'))), dim='time'))
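
The copy-and-relabel pattern used above (take a donor month, give it the timestamp of the missing month, then stitch the series back together) recurs for several datasets in this notebook. A compact sketch of the same idea is given below; the helper name `copy_month` and the toy series are hypothetical, and `drop_sel` plus `sortby` are used here instead of the explicit slices.

```python
import numpy as np
import pandas as pd
import xarray as xr

def copy_month(da, source, target):
    """Return da with the `target` month replaced by a relabelled copy of `source`."""
    target_time = da['time'].sel(time=target).values       # timestamp of the month to overwrite
    donor = da.sel(time=source).assign_coords(time=target_time)
    keep = da.drop_sel(time=target_time)                    # everything except the target month
    return xr.concat([keep, donor], dim='time').sortby('time')

# toy monthly series with mid-month timestamps (values are just a running index)
times = pd.date_range('1984-01-01', periods=24, freq='MS') + pd.Timedelta(days=14)
toy = xr.DataArray(np.arange(24, dtype=float), coords={'time': times}, dims=['time'])

filled_toy = copy_month(toy, source='1985-07', target='1984-07')
print(filled_toy.sel(time='1984-07').values)
```

The same helper would apply unchanged to a (time, ygrid, xgrid) SIC array, since only the time coordinate is touched.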

In [6]:
#save interpolated SIC to NetCDF
CDR_filled = xr.Dataset({'CDR':filled[0], 'BT':filled[1], 'NT':filled[2]})

CDR_filled.attrs = {'Description': 'Arctic sea ice concentration (SIC) from the Climate Data Record (CDR), NASA Team (NT) and NASA Bootstrap (BT). All months 1979-2020, missing data (1984-07, 1987-12, 1988-01) filled with data from the following years (1985-07, 1988-12, 1989-01) as the following year SIA is closer than the preceding year SIA for the months with data adjacent to the missing months.', 
                    'Units'      : 'fraction of grid cell area (0-1)',
                    'Timestamp'  : str(datetime.datetime.utcnow().strftime("%H:%M UTC %a %Y-%m-%d")),
                    'Data source': 'NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration, Version 4, doi:10.7265/efmz-2t65.',
                    'Analysis'   : 'https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb'}

CDR_filled.to_netcdf(data_path+'Raw_data/observations/NSIDC_CDR_v4/SIC_CDR_BT_NT_79-20_filled.nc')

## Calculate SIA, SIE from interpolated SIC

In [15]:
#calculate SIA
CDR_SIA = CDR_filled.sum('xgrid').sum('ygrid')*625/1e6 #each grid cell is 25x25 km

#calculate SIE
CDR_SIE = {}
for var_ in ['CDR', 'BT', 'NT']:
    ones_zeros = np.where(CDR_filled[var_]>0.15, np.ones(np.shape(CDR_filled[var_])), np.zeros(np.shape(CDR_filled[var_])))
    CDR_SIE[var_] = np.sum(ones_zeros, axis=(1,2))*625/1e6
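
To make the difference between the two quantities concrete, here is a small worked example on a made-up 2x3 grid. It only assumes the nominal 25 km x 25 km cell size and the 0.15 threshold used above; the concentration values are invented.

```python
import numpy as np

cell_area_km2 = 25 * 25  # nominal 25 km x 25 km grid cell

# made-up concentration field (fractions); NaN marks a non-ocean cell
sic = np.array([[1.0, 0.5, 0.1],
                [0.2, np.nan, 0.0]])

# SIA: concentration-weighted area, in million km^2
sia = np.nansum(sic) * cell_area_km2 / 1e6

# SIE: full area of every cell whose concentration exceeds 15%, in million km^2
sie = np.sum(sic > 0.15) * cell_area_km2 / 1e6

print(sia, sie)
```

SIE is at least as large as SIA here because each cell above the threshold contributes its whole area, whereas SIA only counts the ice-covered fraction.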

In [16]:
#combine all the calculations into a single dataset and save to NetCDF

CDR_SIA_SIE = xr.Dataset(data_vars = {'CDR_SIA':(('time'), CDR_SIA['CDR']),
                                      'BT_SIA':(('time'), CDR_SIA['BT']),
                                      'NT_SIA':(('time'), CDR_SIA['NT']),
                                      'CDR_SIE':(('time'), CDR_SIE['CDR']),
                                      'BT_SIE':(('time'), CDR_SIE['BT']),
                                      'NT_SIE':(('time'), CDR_SIE['NT'])},
                         coords    = {'time': CDR_SIA['time']})

CDR_SIA_SIE.attrs = {'Description': 'Arctic sea ice area (SIA) and sea ice extent (SIE) from the Climate Data Record (CDR), NASA Team (NT) and NASA Bootstrap (BT). All months 1979-2020, missing data (1984-07, 1987-12, 1988-01) are filled with the following year (1985-07, 1988-12, 1989-01).', 
                     'Units'      : 'million square km',
                     'Timestamp'  : str(datetime.datetime.utcnow().strftime("%H:%M UTC %a %Y-%m-%d")),
                     'Data source': 'NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration, Version 4, doi:10.7265/efmz-2t65.',
                     'Analysis'   : 'https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb'}

CDR_SIA_SIE.to_netcdf(data_path+'Raw_data/observations/NSIDC_CDR_v4/SIA_SIE_CDR_BT_NT_79-20_filled.nc')

## Calculate the SIA and SIE of the pole hole for use with SII

In [20]:
#make mask of 1 for pole hole, 0 for not pole hole for the 3 sizes of pole holes
warnings.filterwarnings("ignore", 'variable ')

#pole hole valid for 1978-10 to 1987-07
qa_1979_01 = xr.open_dataset(data_path+'Raw_data/observations/NSIDC_CDR_v4/seaice_conc_monthly_nh_197901_n07_v04r00.nc')
qa_1979_01 = qa_1979_01['qa_of_cdr_seaice_conc_monthly'].where(qa_1979_01['qa_of_cdr_seaice_conc_monthly']==47,0)
ph_1987_07 = qa_1979_01.where(qa_1979_01==0,1)

#pole hole valid for 1987-08 to 2007-12
qa_1988_09 = xr.open_dataset(data_path+'Raw_data/observations/NSIDC_CDR_v4/seaice_conc_monthly_nh_198809_f08_v04r00.nc')
qa_1988_09 = qa_1988_09['qa_of_cdr_seaice_conc_monthly'].where(qa_1988_09['qa_of_cdr_seaice_conc_monthly']==47,0)
ph_2007_12 = qa_1988_09.where(qa_1988_09==0,1)

#pole hole valid for 2008-01 to present
qa_2008_09 = xr.open_dataset(data_path+'Raw_data/observations/NSIDC_CDR_v4/seaice_conc_monthly_nh_200809_f17_v04r00.nc')
qa_2008_09 = qa_2008_09['qa_of_cdr_seaice_conc_monthly'].where(qa_2008_09['qa_of_cdr_seaice_conc_monthly']==47,0)
ph_current = qa_2008_09.where(qa_2008_09==0,1)

In [21]:
#calculate the SIA of the pole hole - using the edge interpolated data
#each grid cell is 25x25 km so 625/1000000 million square km

#from the interpolated CDR data only select the grid cells containing the pole hole
ph_SIA_1987_07 = (CDR_filled.sel(time=slice('1979-01','1987-07')) * ph_1987_07.values).sum('xgrid').sum('ygrid')*625*1e-6
ph_SIA_2007_12 = (CDR_filled.sel(time=slice('1987-08','2007-12')) * ph_2007_12.values).sum('xgrid').sum('ygrid')*625*1e-6
ph_SIA_present = (CDR_filled.sel(time=slice('2008-01','2020-12')) * ph_current.values).sum('xgrid').sum('ygrid')*625*1e-6

#SIE of pole hole is just the area of the pole hole
                  #1978-11--1987-07, 1987-08--2007-12, 2008-01--present
#pole_hole_SIE = [1.19,              0.31,             0.029]

ph_SIE_1987_07 = (ph_SIA_1987_07['CDR'] * 0) + 1.19
ph_SIE_2007_12 = (ph_SIA_2007_12['CDR'] * 0) + 0.31
ph_SIE_present = (ph_SIA_present['CDR'] * 0) + 0.029
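
The mask-then-sum logic of the two cells above can be illustrated on a toy grid. This sketch assumes, as the code above does, that the quality flag value 47 marks pole-hole cells; the arrays are made up, and `xr.where` is used as a one-step alternative to the two chained `.where` calls.

```python
import numpy as np
import xarray as xr

POLE_HOLE_FLAG = 47  # flag value tested for in the cell above

# made-up quality flags and filled concentrations on a 2x3 grid
qa = xr.DataArray(np.array([[0, 47, 0], [47, 47, 8]]), dims=['ygrid', 'xgrid'])
sic = xr.DataArray(np.array([[0.9, 1.0, 0.8], [1.0, 0.9, 0.0]]), dims=['ygrid', 'xgrid'])

pole_hole = xr.where(qa == POLE_HOLE_FLAG, 1, 0)           # 1 inside the pole hole, 0 elsewhere

ph_sia = float((sic * pole_hole).sum()) * 625 / 1e6         # ice area inside the hole
ph_area = int(pole_hole.sum()) * 625 / 1e6                  # total area of the hole

print(ph_sia, ph_area)
```

In the notebook the pole-hole SIE is set to fixed values for each sensor period rather than being counted from the mask as `ph_area` is here.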

In [22]:
#combine the data into a single dataset and save to NetCDF
ph_SIA = xr.concat((ph_SIA_1987_07, ph_SIA_2007_12, ph_SIA_present), dim='time')
ph_SIE = xr.concat((ph_SIE_1987_07, ph_SIE_2007_12, ph_SIE_present), dim='time')

ph_SIA_SIE = xr.Dataset({'CDR_SIA':ph_SIA['CDR'], 'BT_SIA':ph_SIA['BT'], 'NT_SIA':ph_SIA['NT'] ,'SIE':ph_SIE})

ph_SIA_SIE.attrs = {'Description': 'Arctic sea ice area (SIA) and sea ice extent (SIE) of the pole hole from NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration, Version 4. All months 1979-2020.', 
                    'Units'      : 'million square km',
                    'Timestamp'  : str(datetime.datetime.utcnow().strftime("%H:%M UTC %a %Y-%m-%d")),
                    'Data source': 'NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration, Version 4, doi:10.7265/efmz-2t65.',
                    'Analysis'   : 'https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb'}

ph_SIA_SIE.to_netcdf(data_path+'Raw_data/observations/NSIDC_CDR_v4/pole_hole_SIA_edge_CDR_BT_NT_79-20.nc')

# HadISST 1 

## Reduce the dataset to 1979-2020 >30N and replace 2009-03 with 2007-03 and 2009-04 with 2008-04
N.B. March and April 2009 have large drops in SIA and SIE (notably large negative SIC anomalies in Hudson Bay, the Labrador Sea and the Sea of Okhotsk); this discontinuity is not shown in other datasets.

In [4]:
HadISST1 = xr.open_dataset(data_path+'Raw_data/observations/HadISST/HadISST_ice.nc') #open the raw dataset

HadISST1_30N = HadISST1['sic'].sel(time=slice('1979','2020')).where(HadISST1['latitude']>30, drop=True) #select 1979-2020 and the area above 30N
HadISST1_30N['time'] = CLIVAR_time #adjust the time of mid-month to exactly match the time in the CLIVAR models

HadISST1_30N.to_netcdf(data_path+'Raw_data/observations/HadISST/HadISST1_NH_79-20.nc') #save to NetCDF

In [66]:
#fill the spurious data with the most appropriate nearby year's data
#N.B. HadISST1_30N is the SIC DataArray selected above, so it is indexed directly
HadISST1_2009_03 = HadISST1_30N.sel(time='2007-03').copy()
HadISST1_2009_03['time'] = xr.DataArray(data = CLIVAR_time.sel(time='2009-03').values, coords={'time': CLIVAR_time.sel(time='2009-03').values}, dims=['time'])

HadISST1_2009_04 = HadISST1_30N.sel(time='2008-04').copy()
HadISST1_2009_04['time'] = xr.DataArray(data = CLIVAR_time.sel(time='2009-04').values, coords={'time': CLIVAR_time.sel(time='2009-04').values}, dims=['time'])

HadISST1_30N_correct = xr.concat((HadISST1_30N.sel(time=slice('1979','2009-02')), HadISST1_2009_03, HadISST1_2009_04, 
                                  HadISST1_30N.sel(time=slice('2009-05','2020'))), dim='time')

#save this corrected data to NetCDF
HadISST1_30N_correct.to_netcdf(data_path+'Raw_data/observations/HadISST/HadISST1_NH_79-20_filled.nc') #save to NetCDF

## From the corrected dataset and area file, compute the SIA and SIE
N.B. Concentrations <0.15 are set to 0; this affects SIA but not SIE.

In [72]:
#open area file created from: cdo gridarea -selgrid,2 HadISST_ice.nc HadISST_ice_area.nc
HadISST1_areas = xr.open_dataset(data_path+'Raw_data/observations/HadISST/HadISST_ice_area.nc')
HadISST1_areas_NH = HadISST1_areas['cell_area'].where(HadISST1_areas['latitude']>30,drop=True) #select >30N
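
As a rough cross-check on an area file of this kind, cell areas on a regular latitude-longitude grid can also be computed analytically. The sketch below assumes a regular 1° grid and a spherical Earth of radius 6371 km; CDO may use a slightly different radius, so small differences from the file are to be expected. The function name `cell_area_m2` is hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # assumed mean Earth radius in metres

def cell_area_m2(lat_centre_deg, dlat_deg=1.0, dlon_deg=1.0):
    """Area of a lat-lon cell on a sphere: R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    lat_s = np.deg2rad(lat_centre_deg - dlat_deg / 2)
    lat_n = np.deg2rad(lat_centre_deg + dlat_deg / 2)
    return EARTH_RADIUS_M**2 * np.deg2rad(dlon_deg) * (np.sin(lat_n) - np.sin(lat_s))

lat_centres = np.arange(-89.5, 90, 1.0)      # cell-centre latitudes of a 1 degree grid
areas = cell_area_m2(lat_centres)             # one value per latitude band, in m^2

total_area = areas.sum() * 360                # 360 identical cells around each latitude band
print(total_area / 1e12, 'million km^2')
```

Summing all cells should recover the surface area of the sphere, which is a quick sanity check before the areas are multiplied by SIC.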

In [73]:
#compute SIA and SIE
NH_SIA = (HadISST1_30N_correct * HadISST1_areas_NH / 1e12).sum('latitude').sum('longitude') 
NH_SIE = HadISST1_areas_NH.where(HadISST1_30N_correct>=0.15,0).sum('latitude').sum('longitude') / 1e12

#save calculations to NetCDF
HadISST1_SIA_SIE = xr.Dataset({'SIA' : NH_SIA, 'SIE' : NH_SIE})

HadISST1_SIA_SIE.attrs = {'Description': 'Arctic sea ice area (SIA) and sea ice extent (SIE) from HadISST1 for all months 1979-2020, calculated using a grid area file from CDO. Note large negative SIE and SIA anomalies for 2009-03 and 2009-04 are filled with 2007-03 and 2008-04 values.', 
                          'Units'      : 'million square km',
                          'Timestamp'  : str(datetime.datetime.utcnow().strftime("%H:%M UTC %a %Y-%m-%d")),
                          'Data source': 'Hadley Centre Sea Ice and Sea Surface Temperature data set (HadISST), doi:10.1029/2002JD002670',
                          'Analysis'   : 'https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb'}

HadISST1_SIA_SIE.to_netcdf(data_path+'Raw_data/observations/HadISST/HadISST1_SIA_SIE_79-20_filled.nc')

# HadISST2.2.0.0 - Not currently used, data beyond 2020-08 is missing

In [163]:
# HadISST2 = xr.open_dataset(data_path+'Raw_data/observations/HadISST2/HadISST.2.2.0.0_sea_ice_concentration.nc')
# HadISST2_79_20 = HadISST2['sic'].sel(time=slice('1979','2020')).where(HadISST2['latitude']>30, drop=True)

# Merged Hadley-OI
## Make a reduced dataset for 1979-2020, fill land with `np.nan` using land mask from HadISST1

In [5]:
#open the dataset
Merged = xr.open_dataset(data_path+'Raw_data/observations/merged_Hadley_OI/MODEL.ICE.HAD187001-198110.OI198111-202103.nc')
Merged_30N = Merged['SEAICE'].sel(time=slice('1979','2020')).where(Merged['lat']>30,drop=True)

#change lat and lon dimension to match HadISST1
Merged_new_coords = Merged_30N.copy()
Merged_new_coords['lon'] = np.where(Merged_30N['lon']<180, Merged_30N['lon'], Merged_30N['lon']-360)
Merged_new_coords_reversed = Merged_new_coords.sortby('lon')
Merged_new_coords_reversed = Merged_new_coords_reversed.transpose('lat', 'lon', 'time')
Merged_new_coords_reversed = Merged_new_coords_reversed.rename({'lat':'latitude', 'lon':'longitude'})

#mask the land using HadISST1 np.nan values
HadISST1_30N = xr.open_dataset(data_path+'Raw_data/observations/HadISST/HadISST1_NH_79-20.nc') 
Merged_nans = Merged_new_coords_reversed.where(HadISST1_30N>-1)
#correct time to standard mid-month
Merged_nans['time'] = CLIVAR_time
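
The longitude conversion above (0 to 360 into -180 to 180, followed by a sort so that the grids line up) can be checked on a tiny made-up array. This sketch uses `assign_coords` rather than item assignment, which is an equivalent alternative and not the notebook's exact call.

```python
import numpy as np
import xarray as xr

# toy field on a 0-360 longitude axis; values record the original position
lon_0_360 = np.array([0.5, 90.5, 179.5, 180.5, 270.5, 359.5])
toy = xr.DataArray(np.arange(6.0), coords={'lon': lon_0_360}, dims=['lon'])

lon = toy['lon'].values
toy = toy.assign_coords(lon=np.where(lon < 180, lon, lon - 360))  # wrap the eastern half to negative values
toy = toy.sortby('lon')                                           # restore a monotonic axis

print(toy['lon'].values)
print(toy.values)
```

After sorting, the cells that were originally east of 180° appear first, so the data stay attached to the correct longitudes.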

## Correct spurious months:
- 2009-02 replaced with 2010-02
- 2009-03 replaced with 2007-03 (as for HadISST1)
- 2009-04 replaced with 2008-04 (as for HadISST1)
- 2009-05 replaced with 2010-05

In [49]:
#fill the spurious data with nearby year's data
merged_2009_02 = Merged_nans['sic'].sel(time='2010-02').copy()
merged_2009_02['time'] = xr.DataArray(data = CLIVAR_time.sel(time='2009-02').values, coords={'time': CLIVAR_time.sel(time='2009-02').values}, dims=['time'])

merged_2009_03 = Merged_nans['sic'].sel(time='2007-03').copy()
merged_2009_03['time'] = xr.DataArray(data = CLIVAR_time.sel(time='2009-03').values, coords={'time': CLIVAR_time.sel(time='2009-03').values}, dims=['time'])

merged_2009_04 = Merged_nans['sic'].sel(time='2008-04').copy()
merged_2009_04['time'] = xr.DataArray(data = CLIVAR_time.sel(time='2009-04').values, coords={'time': CLIVAR_time.sel(time='2009-04').values}, dims=['time'])

merged_2009_05 = Merged_nans['sic'].sel(time='2010-05').copy()
merged_2009_05['time'] = xr.DataArray(data = CLIVAR_time.sel(time='2009-05').values, coords={'time': CLIVAR_time.sel(time='2009-05').values}, dims=['time'])

merged_correct = xr.concat((Merged_nans['sic'].sel(time=slice('1979','2009-01')), merged_2009_02, merged_2009_03, merged_2009_04, merged_2009_05, 
                            Merged_nans['sic'].sel(time=slice('2009-06','2020'))), dim='time')

In [50]:
#save the corrected SIC file to NetCDF
merged_correct.attrs = {'Description': 'Arctic sea ice concentration (SIC) from Merged Hadley-OI for all months 1979-2020, land masked using HadISST1 land mask. 2009-02, 2009-03, 2009-04 and 2009-05 replaced with 2010-02, 2007-03, 2008-04 and 2010-05 respectively due to spurious data.', 
                        'Units'      : '%',
                        'Timestamp'  : str(datetime.datetime.utcnow().strftime("%H:%M UTC %a %Y-%m-%d")),
                        'Data source': 'Merged Hadley-OI sea surface temperature and sea ice concentration data set, version 2.0, doi:10.5065/r33v-sv91',
                        'Analysis'   : 'https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb'}

merged_correct.to_netcdf(data_path+'Raw_data/observations/merged_Hadley_OI/merged_Hadley_OI_SIC_79-20.nc')

## Calculate SIA and SIE

In [52]:
#use the same area file as for HadISST1 
Merged_areas = xr.open_dataset(data_path+'Raw_data/observations/HadISST/HadISST_ice_area.nc')
Merged_areas_NH = Merged_areas['cell_area'].where(Merged_areas['latitude']>30,drop=True) #select >30N

#compute SIA and SIE
NH_SIA = (merged_correct * Merged_areas_NH.sortby('latitude')).sum('latitude').sum('longitude') / 1e14 #divide by 1e14 from % m2 to million km2
NH_SIE = Merged_areas_NH.sortby('latitude').where(merged_correct>=15,0).sum('latitude').sum('longitude') / 1e12

#save calculations to NetCDF
Merged_SIA_SIE = xr.Dataset({'SIA':NH_SIA, 'SIE':NH_SIE})

Merged_SIA_SIE.attrs = {'Description': 'Arctic sea ice area (SIA) and sea ice extent (SIE) from Merged Hadley-OI for all months 1979-2020, calculated using a grid area file from CDO. 2009-02, 2009-03, 2009-04 and 2009-05 replaced with 2010-02, 2007-03, 2008-04 and 2010-05 respectively due to spurious data.', 
                        'Units'      : 'million square km',
                        'Timestamp'  : str(datetime.datetime.utcnow().strftime("%H:%M UTC %a %Y-%m-%d")),
                        'Data source': 'Merged Hadley-OI sea surface temperature and sea ice concentration data set, version 2.0, doi:10.5065/r33v-sv91',
                        'Analysis'   : 'https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb'}

Merged_SIA_SIE.to_netcdf(data_path+'Raw_data/observations/merged_Hadley_OI/merged_Hadley_OI_SIA_SIE_79-20.nc')

# NSIDC Sea Ice Index V3
**N.B. Uses NASA Team (NT) algorithm data**
- Set the pole hole to the concentration found in NT (filled from the SIC of the grid cells surrounding its edge)
- Use the following year to fill missing data for 1987-12 and 1988-01; 1987-08 is filled using the % change from the NT SIC calculations

In [12]:
#load all CSV files into xarray objects and assign coordinates
SIA_list = []
SIE_list = []

for i in np.arange(1,13):
    SII_month = xr.Dataset(pd.read_csv(data_path+'Raw_data/observations/NSIDC_sea_ice_index_v3/N_{}_extent_v3.0.csv'.format(str(i).zfill(2))))
    SII_month_SIA = SII_month['   area'].where(SII_month['   area']>0) #set missing values from -9999 to np.nan
    SII_month_SIE = SII_month[' extent'].where(SII_month[' extent']>0)
    
    SII_month_SIA['dim_0'] = np.arange('{}-{}'.format(SII_month['year'].min().values, str(i).zfill(2)), '{}-{}'.format(SII_month['year'].max().values+1, str(i).zfill(2)), 
                                       np.timedelta64(1,'Y'), dtype='datetime64[M]')
    SIA_list.append(SII_month_SIA.rename({'dim_0':'time'}))
    
    SII_month_SIE['dim_0'] = np.arange('{}-{}'.format(SII_month['year'].min().values, str(i).zfill(2)), '{}-{}'.format(SII_month['year'].max().values+1, str(i).zfill(2)), 
                                       np.timedelta64(1,'Y'), dtype='datetime64[M]')
    SIE_list.append(SII_month_SIE.rename({'dim_0':'time'}))

SIA_list_xr = xr.concat(SIA_list, dim='time')
SIE_list_xr = xr.concat(SIE_list, dim='time')

SIA_list_xr = SIA_list_xr.sortby('time') #change from ordered by month then year to chronological
SIE_list_xr = SIE_list_xr.sortby('time')

SIA_SIE_no_pole = xr.Dataset({'SIA': SIA_list_xr, 'SIE':SIE_list_xr})
SIA_SIE_no_pole = SIA_SIE_no_pole.sel(time=slice('1979','2020'))

SIA_SIE_no_pole['time'] = CLIVAR_time #adjust the times from start of month by default to mid-month
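
The column names with leading spaces and the -9999 missing-value marker handled above could alternatively be dealt with when the CSV is read. The sketch below uses a made-up two-row table with a simplified header (not real data, and not the full set of columns in the NSIDC files) to show the `skipinitialspace` and `na_values` options of `pandas.read_csv`.

```python
import io
import pandas as pd

# made-up rows loosely imitating the layout of the monthly files (values are not real data)
toy_csv = io.StringIO(
    "year, mo, extent,   area\n"
    "1987,  12,  -9999,  -9999\n"
    "1988,  12,  13.5,  11.2\n"
)

# strip the spaces after each delimiter and treat -9999 as missing
toy_table = pd.read_csv(toy_csv, skipinitialspace=True, na_values=[-9999])

print(toy_table)
```

With this approach the columns can be selected as `'area'` and `'extent'` without the padding, and no separate `.where(... > 0)` step is needed to mask the missing values.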

In [27]:
#fill missing months using NT SIA and SIE data calculated from SIC
ph_SIA_SIE = xr.open_dataset(data_path+'Raw_data/observations/NSIDC_CDR_v4/pole_hole_SIA_edge_CDR_BT_NT_79-20.nc')
CDR_SIA_SIE = xr.open_dataset(data_path+'Raw_data/observations/NSIDC_CDR_v4/SIA_SIE_CDR_BT_NT_79-20_filled.nc')

NT_SIA_no_ph = CDR_SIA_SIE['NT_SIA'] - ph_SIA_SIE['NT_SIA'] #calculate NT SIA without the pole hole to compare with original sea ice index data
NT_SIE_no_ph = CDR_SIA_SIE['NT_SIE'] - ph_SIA_SIE['SIE'] 

#SIA filled
#1987-08: scale the 1987-07 sea ice index SIA by the fractional change in NT SIA between 1987-07 and 1987-08
NT_SIA_change_1987_08 = (NT_SIA_no_ph.sel(time='1987-08').values - NT_SIA_no_ph.sel(time='1987-07').values) / NT_SIA_no_ph.sel(time='1987-07').values
SIA_no_pole_1987_08 = SIA_SIE_no_pole['SIA'].sel(time='1987-07').values * (1 + NT_SIA_change_1987_08)
SIA_no_pole_1987_08 = xr.DataArray(data = SIA_no_pole_1987_08, coords={'time': CLIVAR_time.sel(time='1987-08').values}, dims=['time'])

SIA_no_pole_1987_12 = xr.DataArray(data = SIA_SIE_no_pole['SIA'].sel(time='1988-12').copy(), coords={'time': CLIVAR_time.sel(time='1987-12').values}, dims=['time'])

SIA_no_pole_1988_01 = xr.DataArray(data = SIA_SIE_no_pole['SIA'].sel(time='1989-01').copy(), coords={'time': CLIVAR_time.sel(time='1988-01').values}, dims=['time'])

SIA_interp_no_pole = xr.concat((SIA_SIE_no_pole['SIA'].sel(time=slice('1979-01','1987-07')), SIA_no_pole_1987_08, SIA_SIE_no_pole['SIA'].sel(time=slice('1987-09','1987-11')), 
                                SIA_no_pole_1987_12, SIA_no_pole_1988_01, SIA_SIE_no_pole['SIA'].sel(time=slice('1988-02','2020-12'))), dim='time')

#SIE filled
SIE_no_pole_1987_12 = xr.DataArray(data = SIA_SIE_no_pole['SIE'].sel(time='1988-12').copy(), coords={'time': CLIVAR_time.sel(time='1987-12').values}, dims=['time'])
SIE_no_pole_1988_01 = xr.DataArray(data = SIA_SIE_no_pole['SIE'].sel(time='1989-01').copy(), coords={'time': CLIVAR_time.sel(time='1988-01').values}, dims=['time'])

SIE_interp_no_pole = xr.concat((SIA_SIE_no_pole['SIE'].sel(time=slice('1979-01','1987-11')), SIE_no_pole_1987_12, SIE_no_pole_1988_01, 
                                SIA_SIE_no_pole['SIE'].sel(time=slice('1988-02','2020-12'))), dim='time')
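
The idea of filling a missing month from the % change in a reference dataset can be written as a small worked example. The function name `fill_from_relative_change` and all numbers below are hypothetical; the sketch assumes the change is expressed relative to the last valid month of the reference series.

```python
def fill_from_relative_change(target_prev, ref_prev, ref_missing):
    """Estimate the missing target value from the fractional change in a reference series.

    target_prev: last valid value of the series with the gap (e.g. SII SIA for July)
    ref_prev:    reference value for that same month (e.g. NT SIA for July)
    ref_missing: reference value for the missing month (e.g. NT SIA for August)
    """
    fractional_change = (ref_missing - ref_prev) / ref_prev
    return target_prev * (1 + fractional_change)

# made-up values in million km^2
sii_july = 7.0
nt_july = 8.0
nt_august = 6.0

sii_august_estimate = fill_from_relative_change(sii_july, nt_july, nt_august)
print(sii_august_estimate)
```

Here the reference falls by 25% between the two months, so the estimate is 75% of the last valid value. Note the parentheses around `1 + fractional_change`: without them the multiplication would bind first and the change would be added rather than applied as a scaling.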

In [28]:
#add the pole hole
SIA_interp = SIA_interp_no_pole + ph_SIA_SIE['NT_SIA']
SIE_interp = SIE_interp_no_pole + ph_SIA_SIE['SIE']

SIA_SIE_interp_index = xr.Dataset({'SIA':SIA_interp, 'SIE':SIE_interp})

In [33]:
#save the sea ice index to NetCDF
SIA_SIE_interp_index.attrs = {'Description': 'Arctic sea ice area (SIA) and sea ice extent (SIE) including pole hole for NSIDC sea ice index version 3. All months 1979-2020, missing month 1987-08 filled from NOAA/NSIDC NT SIC data, 1987-12 and 1988-01 filled with data from the following year (1988-12, 1989-01).', 
                              'Units'      : 'million square km',
                              'Timestamp'  : str(datetime.datetime.utcnow().strftime("%H:%M UTC %a %Y-%m-%d")),
                              'Data source': 'NSIDC Sea Ice Index, Version 3,  doi:10.7265/N5K072F8.',
                              'Analysis'   : 'https://github.com/chrisrwp/synthetic-ensemble/SIA/SIA_calculations_observations.ipynb'}

SIA_SIE_interp_index.to_netcdf(data_path+'Raw_data/observations/NSIDC_sea_ice_index_v3/NSIDC_sea_ice_index_SIA_SIE_79-20_filled_including_pole_hole.nc')

# Compare all SIA and SIE from the different datasets

In [53]:
#open the SIA and SIE data sets
CDR_SIA_SIE.close()
SIA_SIE_interp_index.close()
HadISST1_SIA_SIE.close()
Merged_SIA_SIE.close()

CDR_SIA_SIE = xr.open_dataset(data_path+'Raw_data/observations/NSIDC_CDR_v4/SIA_SIE_CDR_BT_NT_79-20_filled.nc')
SIA_SIE_interp_index = xr.open_dataset(data_path+'Raw_data/observations/NSIDC_sea_ice_index_v3/NSIDC_sea_ice_index_SIA_SIE_79-20_filled_including_pole_hole.nc')
HadISST1_SIA_SIE = xr.open_dataset(data_path+'Raw_data/observations/HadISST/HadISST1_SIA_SIE_79-20_filled.nc')
Merged_SIA_SIE = xr.open_dataset(data_path+'Raw_data/observations/merged_Hadley_OI/merged_Hadley_OI_SIA_SIE_79-20.nc')

In [54]:
#compute the average from all data sets
average_SIA = (CDR_SIA_SIE['CDR_SIA'] + CDR_SIA_SIE['NT_SIA'] + CDR_SIA_SIE['BT_SIA'] + SIA_SIE_interp_index['SIA'] + HadISST1_SIA_SIE['SIA'] + Merged_SIA_SIE['SIA']) / 6
average_SIE = (CDR_SIA_SIE['CDR_SIE'] + CDR_SIA_SIE['NT_SIE'] + CDR_SIA_SIE['BT_SIE'] + SIA_SIE_interp_index['SIE'] + HadISST1_SIA_SIE['SIE'] + Merged_SIA_SIE['SIE']) / 6
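
An alternative way to form a multi-dataset mean is to stack the series along a new dimension and average over it. The sketch below uses two made-up series and a hypothetical dimension name `dataset`; it also shows how missing values are treated, since a plain sum divided by the number of datasets (as above) propagates NaN while `mean` skips NaN by default.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('1979-01-01', periods=3, freq='MS') + pd.Timedelta(days=14)
series_a = xr.DataArray([14.0, 15.0, 15.5], coords={'time': times}, dims=['time'])
series_b = xr.DataArray([14.2, 15.2, np.nan], coords={'time': times}, dims=['time'])

# stack along a new labelled dimension, then average across it
stacked = xr.concat([series_a, series_b], dim=pd.Index(['A', 'B'], name='dataset'))

mean_skipna = stacked.mean('dataset')                # ignores missing values
mean_strict = stacked.mean('dataset', skipna=False)  # NaN if any dataset is missing

print(mean_skipna.values, mean_strict.values)
```

Which behaviour is preferable depends on whether a mean over fewer datasets should be treated as comparable to the full mean; after the gap filling in this notebook the two should agree.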

In [30]:
import matplotlib.pyplot as plt
month_list = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 
              'August', 'September', 'October', 'November', 'December']

In [59]:
#plot a single month time series of SIA or SIE
month_ = 3 
SIE_SIA = 'A'

plt.figure(figsize=[10,5])
CDR_SIA_SIE['CDR_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_).plot(label='CDR')
CDR_SIA_SIE['BT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_).plot(label='BT')
CDR_SIA_SIE['NT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_).plot(label='NT')
SIA_SIE_interp_index['SI{}'.format(SIE_SIA)].sel(time=SIA_SIE_interp_index['time.month']==month_).plot(label='SII')
HadISST1_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=HadISST1_SIA_SIE['time.month']==month_).plot(label='HadISST1')
Merged_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=Merged_SIA_SIE['time.month']==month_).plot(label='Hadley OI')

average_ = average_SIE if SIE_SIA == 'E' else average_SIA #select the mean matching the chosen variable
average_.sel(time=average_['time.month']==month_).plot(label='Mean', c='k', linewidth=2)

plt.legend(fontsize=12)
plt.xlim(np.datetime64('1979-01'), np.datetime64('2020-12'))
plt.xticks(fontsize=12)
plt.xlabel('Time', fontsize=16)

plt.ylabel(r'$SI{} \ [10^6 \ km^2]$'.format(SIE_SIA), fontsize=16)
plt.yticks(fontsize=12)

plt.title('{} SI{} 1979-2020'.format(month_list[month_-1], SIE_SIA), fontsize=18);

In [60]:
#plot a single month trend plot for all data sets
month_ = 3 
SIE_SIA = 'A'

plt.figure(figsize=[10,5])

coefs = np.polyfit(np.arange(1979,2021), CDR_SIA_SIE['CDR_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_), 1)
plt.plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='CDR')

coefs = np.polyfit(np.arange(1979,2021), CDR_SIA_SIE['BT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_), 1)
plt.plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='BT')

coefs = np.polyfit(np.arange(1979,2021), CDR_SIA_SIE['NT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_), 1)
plt.plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='NT')

coefs = np.polyfit(np.arange(1979,2021), SIA_SIE_interp_index['SI{}'.format(SIE_SIA)].sel(time=SIA_SIE_interp_index['time.month']==month_), 1)
plt.plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='SII')

coefs = np.polyfit(np.arange(1979,2021), HadISST1_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=HadISST1_SIA_SIE['time.month']==month_), 1)
plt.plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='HadISST1')

coefs = np.polyfit(np.arange(1979,2021), Merged_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=Merged_SIA_SIE['time.month']==month_), 1)
plt.plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='Hadley OI')

average_ = average_SIE if SIE_SIA == 'E' else average_SIA #select the mean matching the chosen variable
coefs = np.polyfit(np.arange(1979,2021), average_.sel(time=average_['time.month']==month_), 1)
plt.plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='Average', c='k', linewidth=2)


plt.legend(fontsize=12)
plt.xlim(1979,2020)
plt.xticks(fontsize=12)
plt.xlabel('Time', fontsize=16)

plt.ylabel(r'$SI{} \ [10^6 \ km^2]$'.format(SIE_SIA), fontsize=16)
plt.yticks(fontsize=12)

plt.title('{} SI{} 1979-2020'.format(month_list[month_-1], SIE_SIA), fontsize=18);
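
The trend lines above come from a first-degree `np.polyfit`, whose first coefficient is the slope per year. A minimal check on synthetic data (an exactly linear, made-up series) shows how the coefficients map to a trend per decade.

```python
import numpy as np

years = np.arange(1979, 2021)

# synthetic, exactly linear series: 15 million km^2 in 1979, losing 0.04 per year
toy_series = 15.0 - 0.04 * (years - 1979)

slope, intercept = np.polyfit(years, toy_series, 1)   # highest power first
trend_per_decade = slope * 10
fitted = years * slope + intercept                     # same form as the plotted lines

print(round(trend_per_decade, 3), 'million km^2 per decade')
```

For real data the fit is the least-squares line through one value per year, so each monthly series must contain exactly as many points as there are years in `np.arange(1979,2021)`.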

In [None]:
#plot the SIA or SIE time series for all months

s_y = [0,1,2,0,1,2,0,1,2,0,1,2] #axes counting
s_x = [0,0,0,1,1,1,2,2,2,3,3,3]

SIE_SIA = 'E' #select either E for extent or A for area

if SIE_SIA == 'E':
    str_name = 'Extent'
    ave = average_SIE.copy()
else:
    str_name = 'Area'
    ave = average_SIA.copy()

fig, axes = plt.subplots(4,3,figsize=[19,12])

for month_i, month_ in enumerate(np.arange(1,13,1)):
    #for each month plot each of the datasets
    CDR_SIA_SIE['CDR_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_).plot(label='CDR', ax=axes[s_x[month_i]][s_y[month_i]])
    CDR_SIA_SIE['BT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_).plot(label='BT', ax=axes[s_x[month_i]][s_y[month_i]])
    CDR_SIA_SIE['NT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_).plot(label='NT', ax=axes[s_x[month_i]][s_y[month_i]])
    SIA_SIE_interp_index['SI{}'.format(SIE_SIA)].sel(time=SIA_SIE_interp_index['time.month']==month_).plot(label='SII', ax=axes[s_x[month_i]][s_y[month_i]])
    HadISST1_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=HadISST1_SIA_SIE['time.month']==month_).plot(label='HadISST1', ax=axes[s_x[month_i]][s_y[month_i]])
    Merged_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=Merged_SIA_SIE['time.month']==month_).plot(label='Hadley OI', ax=axes[s_x[month_i]][s_y[month_i]])
        
    ave.sel(time=ave['time.month']==month_).plot(label='Mean', c='k', linewidth=2, ax=axes[s_x[month_i]][s_y[month_i]], linestyle='--')
    
    
    axes[s_x[month_i]][s_y[month_i]].set_xlim(np.datetime64('1979-{}'.format(str(month_).zfill(2))), 
                                              np.datetime64('2020-{}'.format(str(month_).zfill(2)))) #set the x-axis limits
    axes[s_x[month_i]][s_y[month_i]].set_title(month_list[month_-1], fontsize=16)
    axes[s_x[month_i]][s_y[month_i]].set_xlabel('')
    
    if s_x[month_i] == 3:
        axes[s_x[month_i]][s_y[month_i]].set_xlabel('Year', fontsize=14)
    if s_y[month_i] == 0:
        axes[s_x[month_i]][s_y[month_i]].set_ylabel(r'$Sea \ Ice \ {} \ [10^6 \ km^2]$'.format(str_name), fontsize=14)
        
plt.tight_layout() #adjust the spacing once, after all panels are drawn
extra_legend = plt.legend(bbox_to_anchor=(1.2, 1), loc='upper center', borderaxespad=0, ncol=1, fontsize=14)
plt.gca().add_artist(extra_legend);

In [None]:
#plot trends of SIA or SIE for all months
s_y = [0,1,2,0,1,2,0,1,2,0,1,2]
s_x = [0,0,0,1,1,1,2,2,2,3,3,3]

SIE_SIA = 'A'

if SIE_SIA == 'E':
    str_name = 'Extent'
    ave = average_SIE.copy()
else:
    str_name = 'Area'
    ave = average_SIA.copy()

fig, axes = plt.subplots(4,3,figsize=[19,12])

for month_i, month_ in enumerate(np.arange(1,13,1)):
    #for each month calculate the trend for each data set and plot it
    coefs = np.polyfit(np.arange(1979,2021), CDR_SIA_SIE['CDR_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_), 1)
    axes[s_x[month_i]][s_y[month_i]].plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='CDR')

    coefs = np.polyfit(np.arange(1979,2021), CDR_SIA_SIE['BT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_), 1)
    axes[s_x[month_i]][s_y[month_i]].plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='BT')

    coefs = np.polyfit(np.arange(1979,2021), CDR_SIA_SIE['NT_SI{}'.format(SIE_SIA)].sel(time=CDR_SIA_SIE['time.month']==month_), 1)
    axes[s_x[month_i]][s_y[month_i]].plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='NT')

    coefs = np.polyfit(np.arange(1979,2021), SIA_SIE_interp_index['SI{}'.format(SIE_SIA)].sel(time=SIA_SIE_interp_index['time.month']==month_), 1)
    axes[s_x[month_i]][s_y[month_i]].plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='SII')

    coefs = np.polyfit(np.arange(1979,2021), HadISST1_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=HadISST1_SIA_SIE['time.month']==month_), 1)
    axes[s_x[month_i]][s_y[month_i]].plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='HadISST1')

    coefs = np.polyfit(np.arange(1979,2021), Merged_SIA_SIE['SI{}'.format(SIE_SIA)].sel(time=Merged_SIA_SIE['time.month']==month_), 1)
    axes[s_x[month_i]][s_y[month_i]].plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='Hadley OI')

    coefs = np.polyfit(np.arange(1979,2021), ave.sel(time=ave['time.month']==month_), 1)
    axes[s_x[month_i]][s_y[month_i]].plot(np.arange(1979,2021), np.arange(1979,2021)*coefs[0] + coefs[1], label='Average', c='k', linewidth=2, linestyle='--')
    
    axes[s_x[month_i]][s_y[month_i]].set_xlim(1979,2020)
    axes[s_x[month_i]][s_y[month_i]].set_title(month_list[month_-1], fontsize=16)
    axes[s_x[month_i]][s_y[month_i]].set_xlabel('')
    
    if s_x[month_i] == 3:
        axes[s_x[month_i]][s_y[month_i]].set_xlabel('Year', fontsize=14)
    if s_y[month_i] == 0:
        axes[s_x[month_i]][s_y[month_i]].set_ylabel(r'$Sea \ Ice \ {} \ [10^6 \ km^2]$'.format(str_name), fontsize=14)
        
plt.tight_layout() #adjust the spacing once, after all panels are drawn
extra_legend = plt.legend(bbox_to_anchor=(1.2, 1), loc='upper center', borderaxespad=0, ncol=1, fontsize=14)
plt.gca().add_artist(extra_legend);