### Getting started

This notebook gives a quick introduction to [pyaerocom](http://aerocom.met.no/pyaerocom/) and to some of its relevant features and workflows.

It ends with a colocation of CAM53-Oslo model AODs (both all-sky and clear-sky) with Aeronet Sun V3 level 2 data.

#### Pyaerocom API flowchart (minimal)

The following flowchart illustrates the minimal workflow to create standard output in pyaerocom based on a user query, which typically comprises a model ID and an observation ID as well as one or more variables of interest. Products indicated in red are not available yet (date of latest update: 4-10-2018).

In [None]:
from IPython.display import Image
flowchart = Image(filename=('../suppl/api_minimal_v0.png'))
flowchart

A user query typically comprises a model (+ experiment -> model run) and an observation network, which are supposed to be compared. 

**Note**: the flowchart depicts a situation where the data from the observation network is *ungridded*, that is, the data is not available in a gridded format such as NetCDF but, for instance, in the form of column-separated text files (as is the case for Aeronet data, which is used as an example here and included in the test dataset). 
For *gridded* observations (e.g. satellite data), the flowchart is equivalent, but uses the `ReadGridded` class and `GriddedData` for the observation branch (and no caching). 

This notebook illustrates and briefly discusses the individual aspects displayed in the flowchart.

In [None]:
import pyaerocom as pya

##### Check data directory

By default, pyaerocom assumes that the AEROCOM database can be accessed (cf. top of flowchart), that is, it initiates all data query paths relative to the database server path names.

In [None]:
pya.const.BASEDIR

**NOTE**: Execution of the following lines will only work if you are connected to the AEROCOM data server or if you have access to the pyaerocom test dataset. The latter can be retrieved upon request (please contact jonasg@met.no).

#### Reading of and working with *gridded* model data (`ReadGridded` and `GriddedData` classes)

This section illustrates the reading of gridded data as well as some features of the `GriddedData` class of *pyaerocom*. First, however, we have to find a valid model ID for the reading (cf. flow chart).

##### Find model data

The database contains data from the CAM53-Oslo model, which is used in the following. You can use the `browse_database` function of pyaerocom to find model IDs (which can be quite cryptic sometimes) using a wildcard pattern search.
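
To make the idea of a wildcard pattern search concrete, here is a minimal sketch using only the Python standard library. It does not call pyaerocom; the helper `find_ids` and the list `available_ids` are hypothetical and only illustrate how a pattern such as `CAM53*-Oslo*UNTUNED*` selects IDs.

```python
from fnmatch import fnmatchcase

# Illustrative list of dataset IDs. The first two appear in this notebook,
# the last one is made up.
available_ids = [
    "CAM53-Oslo_7310_MG15CLM45_5feb2017IHK_53OSLO_PD_UNTUNED",
    "AeronetSunV3Lev2.daily",
    "SomeOtherModel_v1",
]

def find_ids(pattern, ids):
    """Return all IDs that match a shell-style wildcard pattern."""
    return [name for name in ids if fnmatchcase(name, pattern)]

matches = find_ids("CAM53*-Oslo*UNTUNED*", available_ids)
print(matches)
```

Each `*` matches any run of characters (including none), so a short pattern is enough to pin down a long and cryptic ID.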

#### Reading of and working with ungridded data (`ReadUngridded` and `UngriddedData` classes)

Ungridded data in pyaerocom refers to data that is available in the form of *files per station* and that is not sampled in a way that would make it sensible to translate it into a regular gridded format such as the previously introduced `GriddedData` class. 

Data from the AERONET network (which is introduced in the following), for instance, is provided in the form of column-separated text files per measurement station, where columns correspond to different variables and data rows to individual time stamps. Naturally, the time stamps (and the covered periods) vary from station to station. 
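
The following sketch shows what reading such a per-station file might look like in plain Python. The file content, the column names and the helper `read_station_file` are made up for illustration; real Aeronet files have a different and much richer layout, and pyaerocom handles them with its own reading classes.

```python
import csv
import io
from datetime import datetime

# Made-up file content: one row per time stamp, one column per variable.
raw = """date,od550aer,od440aer
2010-01-01,0.12,0.18
2010-01-02,0.15,0.21
2010-01-03,0.09,0.14
"""

def read_station_file(fileobj):
    """Parse one station file into a dict mapping column name to a list of values."""
    reader = csv.DictReader(fileobj)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if name == "date":
                columns[name].append(datetime.strptime(value, "%Y-%m-%d"))
            else:
                columns[name].append(float(value))
    return columns

station = read_station_file(io.StringIO(raw))
print(station["date"][0], station["od550aer"])
```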

The basic workflow for reading ungridded data, such as Aeronet data, is very similar to the reading of gridded data: a reading class handles a query and returns a data class, here [UngriddedData](http://aerocom.met.no/pyaerocom/api.html#module-pyaerocom.ungriddeddata) (see also the flowchart above).

Before we can continue with the data import, some things need to be said related to the caching of `UngriddedData` objects. 

##### Caching of UngriddedData

Reading of ungridded data is often rather time-consuming. Therefore, pyaerocom uses a caching strategy that stores loaded instances of the `UngriddedData` class as pickle files in a cache directory (illustrated on the left-hand side of the flowchart shown above). The location of the cache directory can be accessed via:

In [None]:
pya.const.CACHEDIR

You may change this directory if required.

In [None]:
print('Caching is active? {}'.format(pya.const.CACHING))

**Deactivate caching**

In [None]:
pya.const.CACHING = False

**Activate caching**

In [None]:
pya.const.CACHING = True

**Note**: if caching is active, make sure you have enough disk quota, or change the location where the files are stored.
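
The caching idea itself can be sketched in a few lines of standard-library Python. This is not pyaerocom's implementation; the function `load_with_cache`, its parameters and the dummy loader are assumptions that only illustrate the principle of storing a loaded object as a pickle file and reusing it on the next request.

```python
import os
import pickle
import tempfile

def load_with_cache(dataset_id, loader, cache_dir, caching=True):
    """Return cached data for dataset_id if available, else load and store it."""
    cache_file = os.path.join(cache_dir, dataset_id + ".pkl")
    if caching and os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    data = loader()  # the slow step, e.g. parsing many station files
    if caching:
        with open(cache_file, "wb") as f:
            pickle.dump(data, f)
    return data

calls = []

def slow_loader():
    """Dummy loader that records how often the expensive read is executed."""
    calls.append(1)
    return {"Leipzig": [0.12, 0.15]}

with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache("AeronetSunV3Lev2.daily", slow_loader, tmp)
    second = load_with_cache("AeronetSunV3Lev2.daily", slow_loader, tmp)

print(len(calls))
```

The second call is served from the pickle file, so the loader runs only once. The price is disk space, which is why the note on disk quota matters.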

##### Read Aeronet Sun v3 level 2 data

As illustrated in the flowchart above, ungridded observation data can be imported using the `ReadUngridded` class. The reading class requires the ID of the observation network that is to be read. Let's find the right ID for these data:

In [None]:
pya.browse_database('Aeronet*V3*Lev2*')

It found one match and the dataset ID is *AeronetSunV3Lev2.daily*. It also tells us what variables can be loaded via the interface.

**Note**: You can safely ignore all the warnings in the output. These are due to the fact that the test dataset does not contain all observation networks that are available in the AEROCOM database.

In [None]:
obs_reader = pya.io.ReadUngridded('AeronetSunV3Lev2.daily')
print(obs_reader)

Let's read the data (you can read a single or multiple variables at the same time). For now, we only read the AOD at 550 nm:

In [None]:
aeronet_data = obs_reader.read(vars_to_retrieve='od550aer')
type(aeronet_data) #displays data type

As you can see, the data object is of type `UngriddedData`. Like the `GriddedData` class, the `UngriddedData` class has an informative string representation (that can be printed):

In [None]:
print(aeronet_data)

##### Access of individual stations

In [None]:
print(aeronet_data.station_name)

Let's say you are interested in the city of Leipzig, Germany.

In [None]:
station_data = aeronet_data['Leipzig']
type(station_data)

As you can see, the returned object is of type `StationData`, which is a further data format of pyaerocom (note that it is not displayed in the simplified flowchart above). `StationData` may be useful for working with individual stations and is an extended Python dictionary. 

You may print it to see what is in there:

In [None]:
print(station_data)

As you can see, this station contains a time-series of the AOD at 550 nm. If you like, you can plot this time-series:

In [None]:
station_data.plot_variable('od550aer', style=' xg', figsize=(16,6)).set_title('Leipzig AOD all times')

You can also retrieve the `StationData` with additional constraints using `to_station_data` (e.g. in monthly resolution and only for the year 2010). You can also overlay different curves by passing the axes instance returned by the plotting method:

In [None]:
ax=aeronet_data.to_station_data('Leipzig', 
                                start=2010, 
                                freq='daily').plot_variable('od550aer', 
                                                            label='daily')

ax=aeronet_data.to_station_data('Leipzig', 
                                start=2010, 
                                freq='monthly').plot_variable('od550aer', 
                                                              label='monthly',
                                                              ax=ax)
ax.legend()
ax.set_title('Leipzig AODs 2010')

#### You can also plot the time-series directly

For instance, if you want to do an air-quality check for your next bouldering trip, you may call:

In [None]:
ts = aeronet_data.to_station_data('Fontainebleau', 'od550aer', 2006, None, 'monthly')
ts

In [None]:
aeronet_data.plot_station_timeseries('Fontainebleau', 'od550aer', ts_type='monthly',
                                     start=2006).set_title('AOD in Fontainebleau, 2006')

Seems like November is a good time (maybe a bit rainy though).

#### Colocation of model and obsdata

Now that we have different data objects loaded we can continue with colocation. In the following, both the all-sky and the clear-sky data from CAM53-Oslo will be colocated with the subset of Aeronet stations that we just loaded. 

The colocation will be performed for the year of 2010 and two scatter plots will be created. 

You also have the option to apply a filter when colocating by passing a valid filter name. Here, we use global data and exclude mountain sites.
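
Conceptually, colocation pairs each observation with the model value at the same location (and time). The sketch below illustrates this with a nearest-grid-cell lookup in plain Python. It is an assumption-laden illustration, not pyaerocom's algorithm: the helpers `nearest_index` and `colocate`, the coarse grid, and all AOD values are made up, station coordinates are approximate, and time matching as well as longitude wrap-around are ignored.

```python
def nearest_index(coords, value):
    """Index of the grid coordinate closest to value."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))

def colocate(lats, lons, model_field, stations):
    """Pair each station observation with the nearest model grid cell.

    stations maps a station name to (lat, lon, observed value).
    Returns a list of (name, model value, observed value) tuples.
    """
    pairs = []
    for name, (lat, lon, obs) in stations.items():
        i = nearest_index(lats, lat)
        j = nearest_index(lons, lon)
        pairs.append((name, model_field[i][j], obs))
    return pairs

# Made-up coarse model grid (rows: latitude, columns: longitude)
lats = [40.0, 50.0, 60.0]
lons = [0.0, 10.0, 20.0]
model = [[0.10, 0.11, 0.12],
         [0.20, 0.21, 0.22],
         [0.30, 0.31, 0.32]]

# Approximate station coordinates, made-up observed AODs
stations = {
    "Leipzig": (51.35, 12.43, 0.19),
    "Fontainebleau": (48.41, 2.68, 0.15),
}

pairs = colocate(lats, lons, model, stations)
mean_bias = sum(mod - obs for _, mod, obs in pairs) / len(pairs)
print(pairs, mean_bias)
```

Pairs like these are what a scatter plot of model versus observation displays, one point per station and time stamp.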

In [None]:
print(od550csaer)

##### Access time stamps

Time stamps are represented as numerical values with respect to a reference date and frequency, according to the CF conventions. They can be accessed via the `time` attribute of the data class.

In [None]:
od550aer.time

You may also want the time-stamps in the form of actual datetime-like objects. These can be computed using the `time_stamps()` method:

In [None]:
od550aer.time_stamps()[0:3]
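
The relation between the numerical values and datetime-like objects can be sketched as follows. The reference date, the unit and the helper `cf_to_datetime` are assumptions for illustration (the actual encoding is stored in the file metadata), and the sketch only covers the standard calendar.

```python
from datetime import datetime, timedelta

def cf_to_datetime(values, reference, unit="days"):
    """Convert numbers counted from a reference date into datetime objects."""
    return [reference + timedelta(**{unit: float(v)}) for v in values]

# Assumed encoding: "days since 2008-01-01"
reference = datetime(2008, 1, 1)
numeric_times = [0, 31, 60]  # first of January, February and March 2008 (leap year)

stamps = cf_to_datetime(numeric_times, reference)
print(stamps)
```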

##### Plotting maps

Maps of individual time stamps can be plotted using the `quickplot_map` method.

In [None]:
fig1 = od550aer.quickplot_map('2009-3-15')
fig2 = od550csaer.quickplot_map('2009-3-15')

##### Filtering

Regional filtering can be performed using the [Filter](http://aerocom.met.no/pyaerocom/api.html#module-pyaerocom.filter) class (cf. flowchart above). 

An overview of available default regions can be accessed via:

In [None]:
print(pya.region.get_all_default_region_ids())

Now let's go for North Africa. Create an instance of the `Filter` class:

In [None]:
f = pya.Filter('NAFRICA')
f

... and apply to the two data objects (this can be done by calling the filter with the corresponding data class as input parameter):

In [None]:
od550aer_nafrica = f(od550aer)
od550csaer_nafrica = f(od550csaer)

Compare shapes:

In [None]:
od550aer_nafrica

In [None]:
od550aer

As you can see, the filtered object is reduced in the longitude and latitude dimension. Let's plot the two new objects:

In [None]:
ax1 = od550aer_nafrica.quickplot_map('2009-3-15')
ax2 = od550csaer_nafrica.quickplot_map('2009-3-15')
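
What such a regional crop does to the latitude and longitude dimensions can be sketched in plain Python. The helper `crop_region`, the small grid and the bounding box are made up for illustration; the box is not pyaerocom's definition of the NAFRICA region.

```python
def crop_region(lats, lons, field, lat_range, lon_range):
    """Keep only grid cells whose coordinates fall inside the given box."""
    lat_idx = [i for i, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    lon_idx = [j for j, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    cropped = [[field[i][j] for j in lon_idx] for i in lat_idx]
    return [lats[i] for i in lat_idx], [lons[j] for j in lon_idx], cropped

lats = [-30, 0, 15, 30, 60]
lons = [-60, -15, 0, 30, 90]
# Dummy 2D field whose values encode the (row, column) position
field = [[10 * i + j for j in range(len(lons))] for i in range(len(lats))]

# Made-up bounds for illustration
sub_lats, sub_lons, sub_field = crop_region(lats, lons, field, (0, 40), (-20, 40))
print(sub_lats, sub_lons)
print(sub_field)
```

The cropped field keeps its structure but has fewer rows and columns, which is the reduction in shape seen when comparing the filtered and unfiltered objects.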

##### Filtering of time

Filtering of time is not yet included in the `Filter` class but can easily be performed directly on the `GriddedData` object. If you know the indices of the time stamps you want to crop, you can simply use numpy indexing syntax (remember that we have a 3D array containing time, latitude and longitude). 

Let's say we want to filter the **year 2009**.

Since the time dimension corresponds to the first index in the 3D data (time, lat, lon), and since we know that we have monthly data from 2008-2010 (see above), we may use

In [None]:
od550aer_nafrica_2009 = od550aer_nafrica[12:24]
od550aer_nafrica_2009.time_stamps()

in order to extract the year 2009.
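
The indices `12:24` follow from simple arithmetic, which the sketch below spells out for a monthly series that starts in January. The helper `monthly_slice` is hypothetical and operates on a plain list rather than on a `GriddedData` object.

```python
def monthly_slice(first_year, year):
    """Slice selecting all 12 months of `year` in a monthly series
    that starts in January of `first_year`."""
    start = 12 * (year - first_year)
    return slice(start, start + 12)

# Stand-in for the time axis: (year, month) tuples for 2008-2010
months = [(y, m) for y in (2008, 2009, 2010) for m in range(1, 13)]

sel = monthly_slice(2008, 2009)
print(sel)
print(months[sel][0], months[sel][-1])
```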

However, this approach is not always practical (imagine you have a 10-year dataset of `3hourly` sampled data and want to extract three months in the 6th year ...). In that case, you can perform the cropping using the actual time stamps (for comparability, let's stick to 2009 here):

In [None]:
od550aer_nafrica_2009_alt = od550aer_nafrica.crop(time_range=('1-1-2009', '1-1-2010'))
od550aer_nafrica_2009_alt.time_stamps()

##### Data aggregation

Let's say we want to compute yearly means for each of the 3 years. In this case we can simply call the `downscale_time` method:

In [None]:
od550aer_nafrica.downscale_time('yearly')
od550aer_nafrica.quickplot_map('2009')

**Note**: seasonal aggregation is not yet implemented in pyaerocom but will follow soon.
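
As a rough illustration of what aggregating monthly values to yearly means involves, here is a sketch in plain Python. The helper `yearly_means` and the numbers are made up, and the mean is a simple unweighted average over the months of each year, which is not necessarily how pyaerocom computes it.

```python
from collections import defaultdict

def yearly_means(stamps, values):
    """Average values per year; stamps are (year, month) tuples."""
    groups = defaultdict(list)
    for (year, _month), value in zip(stamps, values):
        groups[year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in groups.items()}

# Three years of made-up monthly values
stamps = [(y, m) for y in (2008, 2009, 2010) for m in range(1, 13)]
values = list(range(1, 13)) + [2.0] * 12 + [0.5] * 12

means = yearly_means(stamps, values)
print(means)
```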

In the following section the reading of ungridded data is illustrated based on the example of AERONET version 3 (level 2) data. The test dataset contains a randomly picked subset of 100 Aeronet stations. Aeronet provides different products, 

##### Overview of what is in the data

Simply print the object.

In [None]:
print(od550aer)

In [None]:
col_all_sky_glob = pya.colocation.colocate_gridded_ungridded(od550aer, aeronet_data, 
                                                                ts_type='monthly',
                                                                start=2010,
                                                                filter_name='WORLD-noMOUNTAINS')
type(col_all_sky_glob)

Let's do the same for the clear-sky data.

In [None]:
pya.browse_database('CAM53*-Oslo*UNTUNED*')

##### Read Aerosol optical depth at 550 nm 

Import both clear-sky (*cs* in variable name) and all-sky data.

In [None]:
import warnings
warnings.filterwarnings('ignore')
reader = pya.io.ReadGridded('CAM53-Oslo_7310_MG15CLM45_5feb2017IHK_53OSLO_PD_UNTUNED')
od550aer = reader.read_var('od550aer')
od550csaer = reader.read_var('od550csaer')

Both data objects are instances of class [GriddedData](http://aerocom.met.no/pyaerocom/api.html#module-pyaerocom.griddeddata) which is based on the [Cube](https://scitools.org.uk/iris/docs/v1.9.0/html/iris/iris/cube.html#iris.cube.Cube) class ([iris library](https://scitools.org.uk/iris/docs/v1.9.0/html/index.html)) and features very similar functionality and more.

Some of these features are introduced below.

In [None]:
col_clear_sky_glob = pya.colocation.colocate_gridded_ungridded(od550csaer, aeronet_data, 
                                                                  ts_type='monthly',
                                                                  start=2010,
                                                                  filter_name='WORLD-noMOUNTAINS')
type(col_clear_sky_glob)

In [None]:
ax1 = col_all_sky_glob.plot_scatter()
ax1.set_title('All sky (2010, monthly)')

In [None]:
ax2 = col_clear_sky_glob.plot_scatter()
ax2.set_title('Clear sky (2010, monthly)')

... or for EUROPE: