# Direct Access to NSIDC Bootstrap Sea Ice Concentrations 

This notebook demonstrates how to load the NSIDC Bootstrap Sea Ice Concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS dataset directly into memory.  The data are accessed over HTTPS.  Here, I use the `earthaccess` package to search for the dataset and open it directly into memory as an `xarray.Dataset`.  The data do not have to be downloaded first.

_Direct access_ is achieved using the `fsspec` package, which creates a virtual filesystem for the data files.  The dataset is then displayed using `hvplot`.

The dataset landing page, short name and DOI are given below.

Landing Page: https://nsidc.org/data/nsidc-0079/versions/4
Short Name: NSIDC-0079
DOI: 10.5067/X5LG68MH013O

In [1]:
import earthaccess

import xarray as xr
import hvplot.xarray



## Search for files using `earthaccess`

`earthaccess` is first used to authenticate (log in).  You need an Earthdata Login account.  I store my credentials in a `.netrc` file, so I do not have to enter them.  If you do not have this set up, you can enter your username and password when prompted.
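As a rough illustration of the `.netrc` approach described above, the sketch below uses Python's standard `netrc` module to check whether a file contains an entry for the Earthdata Login host, `urs.earthdata.nasa.gov`.  The helper name `has_earthdata_credentials` and the demo username and password are made up for this example, and the check runs against a temporary file rather than your real `~/.netrc`.

```python
import netrc
import os
import tempfile

# Host name used by Earthdata Login
EDL_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_credentials(netrc_path):
    """Return True if netrc_path has an entry for the Earthdata Login host.

    Hypothetical helper for illustration; it is not part of earthaccess.
    """
    try:
        entry = netrc.netrc(netrc_path).authenticators(EDL_HOST)
    except (FileNotFoundError, netrc.NetrcParseError):
        return False
    return entry is not None


# Write a throwaway .netrc with placeholder credentials and test the helper
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".netrc")
    with open(demo_path, "w") as f:
        f.write(f"machine {EDL_HOST} login demo_user password demo_pass\n")
    os.chmod(demo_path, 0o600)  # a .netrc should be readable only by its owner

    found = has_earthdata_credentials(demo_path)
    missing = has_earthdata_credentials(os.path.join(tmp, "no_such_file"))

print(found, missing)
```

If the check fails for your own `.netrc`, you will most likely be prompted for your username and password at login.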

I search for just over two years of data (March 2019 through March 2021) for the Northern Hemisphere, using a bounding box that covers 60°N to 90°N.

In [2]:
%%time 

auth = earthaccess.login()  # Authenticate - I store my credentials in a .netrc file

result = earthaccess.search_data(
    doi = "10.5067/X5LG68MH013O",
    version = 4, 
    temporal = ('2019-03-01', '2021-03-31'),
    bounding_box = (-180., 60., 180., 90.),
)

Granules found: 770
CPU times: user 177 ms, sys: 25.7 ms, total: 203 ms
Wall time: 7.15 s


## Load data into an `xarray.Dataset`

Passing the search result to `earthaccess.open` creates a virtual file system.  This virtual file system enables direct access to the data by creating a set of file-like objects in memory, so we don't have to download the files.  The dataset is currently stored in the NSIDC DAAC data center.  It will be moved to the cloud in the near future.
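To give a feel for what a file-like object on a virtual file system looks like, here is a minimal sketch using the in-memory backend of `fsspec`.  This is not the HTTPS backend used for the real data, and the path and payload are invented for illustration; the point is only that the object supports the usual `read` and `seek` calls.

```python
import fsspec

# In-memory file system: an assumption for this demo, not the backend
# that earthaccess uses for NSIDC data
fs = fsspec.filesystem("memory")

demo_path = "/demo/granule.bin"      # hypothetical path
payload = b"HEADERsea-ice-bytes"     # hypothetical file contents

# Write the bytes into the virtual file system
with fs.open(demo_path, "wb") as f:
    f.write(payload)

# Open it again as a file-like object and read parts of it
with fs.open(demo_path, "rb") as f:
    header = f.read(6)   # first six bytes
    f.seek(6)            # jump to the start of the body
    body = f.read()      # everything after the header

print(header, body, fs.size(demo_path))
```

Libraries such as `xarray` can read from objects like `f` in the same way they read from files on disk.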

These file-like objects are passed to `xarray`, where they are concatenated into a single `Dataset`.
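The concatenation step can be sketched with small synthetic datasets.  The example below builds three one-day datasets and joins them along `time` with `xr.concat`, which is roughly what happens when many daily files are combined into one `Dataset`.  The variable name `icecon`, the grid size, and the constant values are assumptions made for the demo.

```python
import numpy as np
import xarray as xr


def make_daily_dataset(day):
    """Build a tiny one-time-step dataset standing in for one daily file."""
    time = np.array([day], dtype="datetime64[ns]")
    data = np.full((1, 2, 3), 0.5)  # (time, y, x), constant placeholder values
    return xr.Dataset(
        {"icecon": (("time", "y", "x"), data)},  # hypothetical variable name
        coords={"time": time},
    )


days = ["2019-03-01", "2019-03-02", "2019-03-03"]
pieces = [make_daily_dataset(day) for day in days]

# Join the per-day datasets along the time dimension
combined = xr.concat(pieces, dim="time")

print(dict(combined.sizes))
```

Each input contributes one time step, so the combined dataset has as many time steps as there are inputs.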

In [3]:
%%time

files = earthaccess.open(result)
ds = xr.open_mfdataset(files, decode_coords='all')
ds

Opening 770 granules, approx size: 0.05 GB


QUEUEING TASKS | : 770it [00:00, 153360.28it/s]
PROCESSING TASKS | : 100%| 770/770 [02:46<00:00,  4.63it/s]
COLLECTING RESULTS | : 100%| 770/770 [00:00<00:00, 368089.14it/s]


CPU times: user 29.1 s, sys: 1.14 s, total: 30.2 s
Wall time: 17min 12s


| | Array | Chunk |
|---|---|---|
| Bytes | 775.14 MiB | 1.04 MiB |
| Shape | (746, 448, 304) | (1, 448, 304) |
| Dask graph | 746 chunks in 1493 graph layers | 746 chunks in 1493 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |



In [8]:
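# The arrays are dask-backed and read lazily, so ds.nbytes is the size
# the data will occupy once loaded, not what is in memory right now
print(f"The dataset occupies {ds.nbytes / 1e6} MB when fully loaded into memory")

The dataset occupies 812.805841 MB when fully loaded into memory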
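The sizes reported above can be checked by hand from the array shape and data type shown in the dataset summary: 746 time steps on a 448 x 304 grid, stored as 8-byte `float64` values.  The short calculation below is only a sanity check of that arithmetic.

```python
import math

shape = (746, 448, 304)   # (time, y, x) from the dataset summary
itemsize = 8              # bytes per float64 value

# Size of the full data array
array_bytes = math.prod(shape) * itemsize

# Size of one chunk, which holds a single time step
chunk_bytes = math.prod(shape[1:]) * itemsize

print(f"Array: {array_bytes} bytes = {array_bytes / 2**20:.2f} MiB = {array_bytes / 1e6:.2f} MB")
print(f"Chunk: {chunk_bytes} bytes = {chunk_bytes / 2**20:.2f} MiB")
```

This gives 812,793,856 bytes (775.14 MiB, or about 812.79 MB) for the array and 1.04 MiB per chunk.  The `ds.nbytes` value is slightly larger, presumably because it also counts the coordinate variables.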


## Fixing non-concentration values

The ice concentration data variable contains flag values as well as valid ice concentration values.  To avoid mistaking these flag values for actual concentrations, it is better to set them to Not-a-Number (NaN) before we do any analysis.

In [4]:
ds = ds.where(ds.F17_ICECON <= 1.)
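The effect of this masking can be seen on a small synthetic array.  In the sketch below, the values 1.1 and 1.2 stand in for flag values; they are placeholders chosen for the demo, not necessarily the flags used in the real files, and the name `icecon` is also hypothetical.

```python
import numpy as np
import xarray as xr

# Synthetic concentrations on a 2 x 3 grid; 1.1 and 1.2 are
# placeholder "flag" values chosen for this demo
raw = xr.DataArray(
    np.array([[0.0, 0.45, 1.0],
              [1.1, 1.2, 0.8]]),
    dims=("y", "x"),
    name="icecon",
)

# Keep values in the valid range and set everything else to NaN
masked = raw.where(raw <= 1.0)

n_valid = int(masked.notnull().sum())

print(masked.values)
print(f"{n_valid} valid cells out of {masked.size}")
```

Because `where` replaces rejected values with NaN, later reductions such as `mean` or `max` skip them by default.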

## Plot the data

`hvplot` is a great tool for displaying time series of gridded data.  By passing a `groupby` keyword, we can get a slider that allows us to scroll through the timesteps.

_To do_

- Add a CRS to the data (load `rioxarray`)
- Plot with land

In [9]:
ds.hvplot(groupby='time', width=700, height=700, vmin=0.15)

