# CAST
Collocated
Allong
Satellite
Track

### ToDo
- [x] convert ADM dictionary into a xarray dataset
- [x] read model data into a xarray dataset
- collocate model to ADM grid
  - [x] lon/lat
  - [x] time (not using right now)
  - [x] altitude/elevation
- [x] save to netcdt
- plotting, statistics on a different notebook

# Conda environment at PPI/lustre

```bash
# load right (ana)conda module
module load aerocom-py/aerocom-griesie_work

# activate coda environment
source `which conda | sed 's:bin/conda:etc/profile.d/conda.sh:'`
conda activate coda.2.19

# start notebook
jupyter notebook --no-browser --ip=$HOSTNAME.met.no

```

In [1]:
import numpy as np
import pandas as pd
import xarray as xr

for m in [np, pd, xr]:
    print(m.__name__, m.__version__)

numpy 1.14.3
pandas 0.23.4
xarray 0.10.8


## ADM pyaerocom tools

In [2]:
from pyaerocom.io.read_aeolus_l2b_data import ReadAeolusL2bData
ADM = ReadAeolusL2bData(verbose=True)

### Read dataset
All datasets on `$CODA_DEFINITION/download/`

In [3]:
%time ADM.read(vars_to_read=['ec550aer'])

2018-09-03 10:06:41,259:INFO:searching for data files. This might take a while...
2018-09-03 10:06:41,262:INFO:time for file find: 0.003
2018-09-03 10:06:41,380:INFO:reading file /lustre/storeA/project/aerocom/aerocom1/ADM_CALIPSO_TEST/download/AE_OPER_ALD_U_N_2A_20070101T002249149_002772000_003606_0001.DBL
2018-09-03 10:06:41,751:INFO:time for single file read [s]: 0.371
2018-09-03 10:06:41,752:INFO:reading file /lustre/storeA/project/aerocom/aerocom1/ADM_CALIPSO_TEST/download/AE_OPER_ALD_U_N_2A_20070101T020142709_002772000_003607_0001.DBL
2018-09-03 10:06:42,287:INFO:time for single file read [s]: 0.535
2018-09-03 10:06:42,290:INFO:reading file /lustre/storeA/project/aerocom/aerocom1/ADM_CALIPSO_TEST/download/AE_OPER_ALD_U_N_2A_20070101T034035509_002772000_003608_0001.DBL
2018-09-03 10:06:43,066:INFO:time for single file read [s]: 0.776
2018-09-03 10:06:43,069:INFO:reading file /lustre/storeA/project/aerocom/aerocom1/ADM_CALIPSO_TEST/download/AE_OPER_ALD_U_N_2A_20070101T051928319_002

CPU times: user 21.5 s, sys: 136 ms, total: 21.6 s
Wall time: 23.5 s


## ADM to XArray

In [4]:
adm = xr.Dataset.from_dict(dict(
    coords = dict(
        time =  dict(
            dims = 'time', 
            data = pd.to_datetime(ADM.data[:,ReadAeolusL2bData._TIMEINDEX],unit='s'),
            attrs = {'long_name':'time'},
        ),
        lat =  dict(
            dims = 'time', 
            data = ADM.data[:,ReadAeolusL2bData._LONINDEX],
            attrs = {'long_name':'latitude', 'units':'degrees_north'},
        ),
        lon =  dict(
            dims = 'time', 
            data = ADM.data[:,ReadAeolusL2bData._LATINDEX],
            attrs = {'long_name':'longitude', 'units':'degrees_east'},
        ),
        alt =  dict(
            dims = 'time', 
            data = ADM.data[:,ReadAeolusL2bData._ALTITUDEINDEX],
            attrs = {'long_name':'altitude', 'units':'m'},
        ),
    ),
    dims = 'time', 
    data_vars = dict(
        ec550 =  dict(
            dims = 'time', 
            data = ADM.data[:,ReadAeolusL2bData._EC550INDEX],
            attrs = {'long_name':'ec550', 'units':'1'},
        ),
    ),
))

adm

<xarray.Dataset>
Dimensions:  (time: 79512)
Coordinates:
  * time     (time) datetime64[ns] 2007-01-01T00:22:49.346999884 ...
    lat      (time) float64 nan 173.4 173.4 173.4 173.4 173.4 173.4 173.4 ...
    lon      (time) float64 nan 72.33 72.33 72.33 72.33 72.33 72.33 72.33 ...
    alt      (time) float64 nan 3.047e+04 2.669e+04 2.417e+04 2.164e+04 ...
Data variables:
    ec550    (time) float64 nan 0.0 0.0 55.44 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...

## CAMS50 forecat to XArray

In [5]:
metproduction = '/lustre/storeB/project/metproduction/products/%s'
cams50 = dict(
    forecast = metproduction%'cwf_ctm/CWF_12FC-%s_hourInst.nc',
    analysis = metproduction%'cwf_ctm/CWF_00AN-%s_hourInst.nc',
    reanalysis = metproduction%'cwf_ctm/CWF_00RE-%s_hourInst.nc',
)

ncfile = cams50['forecast']%'20180606'
emep = xr.open_dataset(ncfile)
emep

<xarray.Dataset>
Dimensions:                (ilev: 9, lat: 369, lev: 8, lon: 301, time: 121)
Coordinates:
  * lon                    (lon) float64 -30.0 -29.75 -29.5 -29.25 -29.0 ...
  * lat                    (lat) float64 30.0 30.12 30.25 30.38 30.5 30.62 ...
  * lev                    (lev) float64 0.9946 0.9838 0.9703 0.9509 0.8932 ...
  * ilev                   (ilev) float64 0.9892 0.9784 0.9621 0.9396 0.8756 ...
  * time                   (time) datetime64[ns] 2018-06-05 ...
Data variables:
    P0                     float64 ...
    hyam                   (lev) float64 ...
    hybm                   (lev) float64 ...
    hyai                   (ilev) float64 ...
    hybi                   (ilev) float64 ...
    SURF_ug_O3             (time, lat, lon) float32 ...
    SURF_ug_NO2            (time, lat, lon) float32 ...
    SURF_ug_PM25_rh50      (time, lat, lon) float32 ...
    SURF_ug_PM10_rh50      (time, lat, lon) float32 ...
    SURF_ug_NO             (time, lat, lon) float32 

## Test data

In [6]:
lustre = "/lustre/storeA/users/alvarov/ADM/%s"
test = dict(
    hourly     = lustre%"rv417a_EMEP01_hourInst.nc",
    topography = lustre%"EMEP01_topo.nc",
)
emep = xr.open_dataset(test["hourly"]).rename({"Z_MID":"alt"}).set_coords("alt")

### Model altitude
model `Z_MID` + `topography`

In [7]:
topo = xr.open_dataset(test["topography"]).topography.isel(time=0)
topo["lon"] = emep.lon
topo["lat"] = emep.lat
emep["alt"] += topo
emep

<xarray.Dataset>
Dimensions:      (ilev: 21, lat: 520, lev: 20, lon: 1200, time: 120)
Coordinates:
  * lon          (lon) float64 -29.95 -29.85 -29.75 -29.65 -29.55 -29.45 ...
  * lat          (lat) float64 30.05 30.15 30.25 30.35 30.45 30.55 30.65 ...
  * lev          (lev) float64 0.1099 0.142 0.1876 0.2404 0.3001 0.3662 ...
  * ilev         (ilev) float64 0.09869 0.121 0.1629 0.2122 0.2685 0.3316 ...
  * time         (time) datetime64[ns] 2016-01-01T01:00:00 ...
    alt          (time, lev, lat, lon) float32 15753.486 15752.737 15752.33 ...
Data variables:
    P0           float64 ...
    hyam         (lev) float64 ...
    hybm         (lev) float64 ...
    hyai         (ilev) float64 ...
    hybi         (ilev) float64 ...
    SURF_ppb_O3  (time, lat, lon) float32 ...
    AOD_350nm    (time, lat, lon) float32 ...
    AOD_550nm    (time, lat, lon) float32 ...
    EXT_350nm    (time, lev, lat, lon) float32 ...
    EXT_550nm    (time, lev, lat, lon) float32 ...
Attributes:
    vert_co

## CAST

### Discard ADM outside the forecast domain

In [8]:
domain = dict(
    time = emep.time[[0, -1]].values,
    lat = emep.lat[[0, -1]].values,
    lon = emep.lon[[0, -1]].values,
    alt = [0.0, emep.alt.isel(time=0, lev=0).max().values],
)
in_range = lambda x, k: np.logical_and(x >= domain[k][0], x <= domain[k][1])

domain

{'time': array(['2016-01-01T01:00:00.000000000', '2016-01-06T00:00:00.000000000'],
       dtype='datetime64[ns]'),
 'lat': array([30.05, 81.95]),
 'lon': array([-29.95,  89.95]),
 'alt': [0.0, array(15842.48046875)]}

In [9]:
in_latlon = lambda x: np.logical_and(in_range(x.lat, 'lat'), in_range(x.lon, 'lon'))
in_domain = lambda x: np.logical_and(in_latlon(x), in_range(x.alt, 'alt'))

print('in_latlon: %6d'%in_latlon(adm).sum())
print('in_domain: %6d'%in_domain(adm).sum())

in_latlon:   8997
in_domain:   5669


### Forecast time window
Skip time filtering for now. The example dataset is old and will not match any current forecast.

In [10]:
in_forecast = lambda x: np.logical_and(in_domain(x), in_range(x.time, 'time'))

print('in_forecast: %6d'%in_forecast(adm).sum())

in_forecast:      0


### Filtered ADM

In [11]:
ec550 = adm.where(in_domain(adm), drop=True).ec550
ec550

<xarray.DataArray 'ec550' (time: 5669)>
array([0., 0., 0., ..., 0., 0., 0.])
Coordinates:
  * time     (time) datetime64[ns] 2007-01-01T00:28:25.346999884 ...
    lat      (time) float64 79.48 79.48 79.48 79.48 79.48 79.48 79.48 79.48 ...
    lon      (time) float64 80.6 80.6 80.6 80.6 80.6 80.6 80.6 80.6 80.6 ...
    alt      (time) float64 1.47e+04 1.344e+04 1.218e+04 1.092e+04 9.654e+03 ...
Attributes:
    long_name:  ec550
    units:      1

In [12]:
print('ADM total: %6d'%adm.ec550.count())
print('in_latlon: %6d'%ec550.count())

ADM total:  71397
in_latlon:   5669


### Collocated forecast

In [13]:
collocate = lambda model, obs: model.load().sel(
    lon=obs.lon, lat=obs.lat, time=obs.time, method='nearest'
)

In [14]:
%time aod550 = collocate(emep.EXT_550nm, ec550)

aod550

CPU times: user 2min 11s, sys: 40.6 s, total: 2min 52s
Wall time: 3min 43s


<xarray.DataArray 'EXT_550nm' (time: 5669, lev: 20)>
array([[0.000000e+00, 7.444348e-07, 1.120077e-06, ..., 5.290326e-05,
        6.021130e-05, 3.114995e-05],
       [0.000000e+00, 7.444348e-07, 1.120077e-06, ..., 5.290326e-05,
        6.021130e-05, 3.114995e-05],
       [0.000000e+00, 7.444348e-07, 1.120077e-06, ..., 5.290326e-05,
        6.021130e-05, 3.114995e-05],
       ...,
       [0.000000e+00, 4.101983e-07, 5.860849e-07, ..., 1.395512e-04,
        1.424588e-04, 1.557732e-04],
       [0.000000e+00, 4.101983e-07, 5.860849e-07, ..., 1.395512e-04,
        1.424588e-04, 1.557732e-04],
       [0.000000e+00, 4.101983e-07, 5.860849e-07, ..., 1.395512e-04,
        1.424588e-04, 1.557732e-04]], dtype=float32)
Coordinates:
    lon      (time) float64 80.65 80.65 80.65 80.65 80.65 80.65 80.65 80.65 ...
    lat      (time) float64 79.45 79.45 79.45 79.45 79.45 79.45 79.45 79.45 ...
  * lev      (lev) float64 0.1099 0.142 0.1876 0.2404 0.3001 0.3662 0.4377 ...
  * time     (time) datetime64[

### Save to NetCDF
save filtered ADM obs and collocated model results

In [15]:
ec550.to_netcdf(lustre%"adm_domain.nc")
aod550.to_netcdf(lustre%"emep_colloc.nc")