In [3]:
# Quick hack to put us in the root of the repository/pipeline directory
import os
if os.path.exists("02.data_and_forecasts.ipynb"):
    os.chdir("..")

# IceNet Data Analysis

## Context

### Purpose
The IceNet library provides a set of command-line interfaces for downloading source data, processing it, training models and producing predictions, end to end.

Using this notebook one can understand the various data sources, intermediaries and products that arise from the [CLI demonstrator notebook](01.cli_demonstration.ipynb) activities.

### Modelling approach
This modelling approach allows users to immediately utilise the library for producing sea ice concentration forecasts.

### Highlights
The key stages of an end-to-end run are:
* Setup: _this was concerned with setting up the conda environment, which remains the same_
* [Download](#Download) 
* [Process](#Process)
* [Train](#Train)
* [Predict](#Predict)

_This follows the same structure as the CLI demonstration notebook so that it's easy to follow step-by-step..._

### Contributions
#### Notebook
James Byrne (author)

__Please raise issues [in this repository](https://github.com/antarctica/IceNet-Pipeline) to suggest updates to this notebook!__ 

Contact me at _jambyr \<at\> bas.ac.uk_ for anything else...

#### Modelling codebase
James Byrne (code author), Tom Andersson (science author)

#### Modelling publications
Andersson, T.R., Hosking, J.S., Pérez-Ortiz, M. et al. Seasonal Arctic sea ice forecasting with probabilistic deep learning. Nat Commun 12, 5124 (2021). https://doi.org/10.1038/s41467-021-25257-4

#### Involved organisations
The Alan Turing Institute and British Antarctic Survey

## Setup

For analysis in Python we import the following libraries, which are heavily used within the IceNet project and the pipeline.

In [6]:
import glob, os, sys
import numpy as np, pandas as pd, xarray as xr

## Download

Downloading data with the `icenet_data*` commands populates an input data storage directory called `data`, organised by source dataset. This source data can be reused across normalisation (`icenet_process*`) and dataset production (`icenet_dataset*`) runs.

In [8]:
os.listdir("data")

['osisaf', 'era5', 'mars.hres', 'masks']

These directories (aside from `masks`) share a consistent layout:

In [9]:
!find data/era5 -maxdepth 3 -type d
!ls -l data/era5/sh/tas

data/era5
data/era5/sh
data/era5/sh/tas
data/era5/sh/tas/1990
data/era5/sh/tas/1991
data/era5/sh/tas/1992
data/era5/sh/tas/1993
data/era5/sh/tas/1994
data/era5/sh/tas/1995
data/era5/sh/tas/1996
data/era5/sh/tas/1997
data/era5/sh/tas/1998
data/era5/sh/tas/1999
data/era5/sh/tas/2021
data/era5/sh/tas/2000
data/era5/sh/tas/2001
data/era5/sh/tas/2002
data/era5/sh/tas/2003
data/era5/sh/tas/2004
data/era5/sh/tas/2005
data/era5/sh/tas/2006
data/era5/sh/tas/2007
data/era5/sh/tas/2008
data/era5/sh/tas/2009
data/era5/sh/tas/1979
data/era5/sh/tas/1980
data/era5/sh/tas/1981
data/era5/sh/tas/1982
data/era5/sh/tas/1983
data/era5/sh/tas/1984
data/era5/sh/tas/1985
data/era5/sh/tas/1986
data/era5/sh/tas/1987
data/era5/sh/tas/1988
data/era5/sh/tas/1989
data/era5/sh/tas/2010
data/era5/sh/tas/2011
data/era5/sh/tas/2012
data/era5/sh/tas/2013
data/era5/sh/tas/2014
data/era5/sh/tas/2015
data/era5/sh/tas/2016
^C
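
The `find` listing above returns the year directories in filesystem order, which makes gaps hard to spot. As a minimal sketch, assuming the `data/<source>/<hemisphere>/<variable>/<year>` layout shown in that listing, the years can be collected and summarised in sorted order. The helper names `list_years` and `summarise_years` are hypothetical and not part of IceNet.

```python
from pathlib import Path


def list_years(var_dir):
    """Return the year subdirectories of var_dir as a sorted list of ints."""
    root = Path(var_dir)
    if not root.is_dir():
        # Missing directory: nothing has been downloaded for this variable
        return []
    return sorted(
        int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdigit()
    )


def summarise_years(years):
    """Describe a sorted list of years as 'first-last (n years)'."""
    if not years:
        return "no data"
    return "{}-{} ({} years)".format(years[0], years[-1], len(years))


# Path taken from the listing above; adjust for other sources or variables
years = list_years("data/era5/sh/tas")
print(summarise_years(years))
```

Comparing the reported count against the span between the first and last year gives a quick check for missing years.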


In [None]:
## Process

In [None]:
## Train

In [None]:
## Predict

In [1]:
## Random stuff

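```python
from dask.distributed import Client

# Collect the ERA5 southern hemisphere `tos` files for the 1900s and 2000s
dfs = glob.glob("data/era5/sh/tos/**/19*.nc") + glob.glob("data/era5/sh/tos/**/20*.nc")
client = Client()  # local Dask cluster used by the parallel open below
ds = xr.open_mfdataset(dfs, combine="nested", concat_dim="time", parallel=True)
# Maximum per year over time, then over the spatial grid
a = ds.groupby("time.year").max("time").max(("yc", "xc"))
m = a.compute()
```

```bash
# Dry run: echo (rather than execute) the mv commands that insert "max_" into each matching path
for DAME in $( find . -name '199*' -a -type f ); do NEWNAME=`echo "$DAME" | sed -r 's#(\/)(....\_)#\1max_\2#'`; echo mv $DAME $NEWNAME; done
```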


## Plotting a forecast

TODO: convert

```python
import datetime as dt

import xarray as xr
from icenet2.plotting.video import xarray_to_video as xv

ds = xr.open_dataset("south_daily_forecast.nc")
# Take the first forecast and treat its lead times as the time axis
fc = ds.sic_mean.isel(time=0).drop_vars("time").rename(dict(leadtime="time"))
# Relabel each lead time (in days) as a calendar date
fc['time'] = [dt.datetime(2022,1,3) + dt.timedelta(days=int(e)) for e in fc.time.values]
xv(fc, 15, "south_daily_forecast.mp4")
```
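
The relabelling step in the snippet above replaces each `leadtime` value with a calendar date, so that the video frames carry dates rather than lead-time indices. A minimal standalone sketch of that conversion follows, assuming lead times are whole days counted from the reference date used above. The helper name `leadtimes_to_dates` is hypothetical and not part of IceNet.

```python
import datetime as dt


def leadtimes_to_dates(start, leadtimes):
    """Map integer lead times (in days) onto dates offset from start."""
    return [start + dt.timedelta(days=int(e)) for e in leadtimes]


# Illustrative values: the reference date from the snippet above and three lead times
dates = leadtimes_to_dates(dt.datetime(2022, 1, 3), [1, 2, 3])
print([d.strftime("%Y-%m-%d") for d in dates])
```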


## Video processing of datasets

```bash
icenet_video_data -w 4 -v osisaf
icenet_video_data -w 4 -n -sy -p processed/north_10 -v era5,osisaf
```

```python
import glob, os, xarray as xr

# Print the per-year minimum and maximum of selected ERA5 southern hemisphere variables
for vn in ("zg250", "zg500", "tos"):
    dfs = glob.glob("data/era5/sh/{}/**/19*.nc".format(vn)) + glob.glob("data/era5/sh/{}/**/20*.nc".format(vn))
    for y in range(1979, 2020):
        # Files for this year are identified by their filename prefix
        ds = [d for d in dfs if os.path.basename(d).startswith(str(y))]
        print("{} {} {} files".format(vn, str(y), len(ds)))
        if len(ds) > 0:
            ds2 = xr.open_mfdataset(ds, parallel=True)
            k = list(ds2.data_vars)[0]  # first data variable in the files
            d_min, d_max = ds2[k].min(), ds2[k].max()
            print("\t\t\t{} {} {:.2f} {:.2f}".format(y, k, float(d_min.compute()), float(d_max.compute())))
```

## Version
- Notebook:
- Codebase: