# 01: Diving Right In

These tutorials are organized around [xarray](https://xarray.pydata.org), which provides a powerful set of tools for analyzing climate data, including spatial distributions and time series. Because the time for this training is limited, we dive right into using [xarray](https://xarray.pydata.org) without working through the [basic](https://docs.python.org/3/tutorial/datastructures.html) and [intermediate](https://pandas.pydata.org/) building blocks that xarray extends. Links to resources that provide additional details on those building blocks are included throughout.  

One building block that does require special mention is modules. Python is a modular language, meaning that we often use it by loading a number of specialized extensions. This reliance on modules has both advantages and disadvantages. The most important advantages are that
1. We can each build our own bespoke programming environment (or [environments](https://www.anaconda.com/docs/tools/working-with-conda/environments)) based around the tools we use most often
2. We only need to load the tools we will actually use for any given task
3. Code is more readable and traceable, because the module prefix shows where each function comes from

The main disadvantages are that
1. Modules may conflict with each other, and these conflicts can in some cases break the environment
2. Many modules are maintained by volunteers, who may or may not have time to keep them updated

The use of [conda-forge](https://conda-forge.org/) mitigates both of these disadvantages, and is highly recommended!
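When modules conflict, or a volunteer-maintained package falls behind, a useful first diagnostic is to check which versions are installed in the active environment. The sketch below uses the standard library's `importlib.metadata`. The package list is an assumption based on the imports used in this tutorial, and `installed_versions` is a hypothetical helper name. From a terminal, `conda list` gives similar information for a conda environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list: distribution names for the modules imported in this tutorial
packages = ["numpy", "xarray", "matplotlib", "cartopy", "scipy",
            "seaborn", "colormaps", "gdown"]

def installed_versions(names):
    """Hypothetical helper: map each name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

for name, ver in installed_versions(packages).items():
    print(f"{name:12s} {ver or 'not installed'}")
```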

Once you have installed Python and the required modules ([see readme](README.md)), import the modules needed for this example as shown below. Note the different ways of importing modules.

In [None]:
!pip install colormaps
import colormaps

This is the most basic way to import a module. Functions and other objects provided by the module are then accessed by prefixing them with the module name:

    cmap = colormaps.dense

In [None]:
import numpy as np
import xarray as xr
import seaborn as sns

We can simplify our code by assigning abbreviations to the imported modules, and then accessing their functions through the abbreviation, for example:

    a = np.arange(10)

Just make sure not to create any variables whose names conflict with the module names or their abbreviations!
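To illustrate why such name conflicts matter, the sketch below deliberately rebinds `np` to a number, after which `np.arange` no longer works. The variable names are arbitrary; the point is only that an assignment silently replaces the module reference.

```python
import numpy as np

a = np.arange(10)        # np refers to the numpy module here
total = a.sum()          # safe: the result gets a new, distinct name

np = a.sum()             # mistake: np is now a numpy integer, not the module
try:
    np.arange(5)
    shadowed = False
except AttributeError:
    shadowed = True      # the module is no longer reachable through np
print("np was shadowed:", shadowed)

import numpy as np       # re-importing restores the abbreviation
```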

In [None]:
!pip install cartopy
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeat
import cartopy.util as cputil

Similarly, we can import submodules directly and give each its own abbreviation:

    fig = plt.figure(figsize=(8,4))
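An abbreviation for a submodule is just a second name for the same object, which the sketch below checks. The `matplotlib.use("Agg")` line selects a non-interactive backend so the snippet can run as a standalone script without a display; it is an assumption for that setting and should be omitted inside the notebook.

```python
import matplotlib
matplotlib.use("Agg")                 # standalone use only; omit in a notebook
import matplotlib.pyplot as plt

# plt and matplotlib.pyplot are the same module object
same_module = plt is matplotlib.pyplot

fig = plt.figure(figsize=(8, 4))      # width and height in inches
size = tuple(fig.get_size_inches())
print(same_module, size)
plt.close(fig)                        # release the figure's memory
```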

In [None]:
from scipy.stats import linregress

This type of import is useful if we only need one or two objects from the module. We can then call the function directly, without a module prefix:

    rgr = linregress(x,y)
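A self-contained version of that call is sketched below. The x and y values are synthetic (an exact straight line, y = 2x + 1) so the fitted slope and intercept are easy to check. The result object exposes fields such as `slope`, `intercept`, `rvalue`, `pvalue`, and `stderr`.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic data lying exactly on the line y = 2x + 1
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

rgr = linregress(x, y)   # no "scipy.stats." prefix needed
print(rgr.slope, rgr.intercept, rgr.rvalue)
```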

Once you've run the cells above, we can get started with data analysis! A netCDF data file is read with xarray's [open_dataset()](https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html#xarray.open_dataset) function. The same interface can read other file types, including the [GRIB format](https://docs.xarray.dev/en/stable/user-guide/io.html#grib-format-via-cfgrib) used by many weather prediction centers, as long as the appropriate engine (cfgrib, in the case of GRIB) is installed.
- [Input / output options in xarray](https://docs.xarray.dev/en/stable/user-guide/io.html)
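To try `open_dataset()` without downloading anything, one can write a small synthetic Dataset to a temporary netCDF file and read it back. This sketch assumes the `scipy` engine (netCDF3 only) is available; the file name, grid, and values are all made up and are not CAMS data. For a GRIB file the same call would take `engine="cfgrib"`, provided cfgrib is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly-mean field on a tiny grid (random values)
times = pd.date_range("2003-01-01", periods=3, freq="MS")
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"aod550": (("time", "latitude", "longitude"), rng.random((3, 2, 4)))},
    coords={
        "time": times,
        "latitude": [10.0, -10.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.nc")
    ds.to_netcdf(path, engine="scipy")            # write a netCDF3 file
    with xr.open_dataset(path, engine="scipy") as ncdf_demo:
        loaded = ncdf_demo["aod550"].load()       # read into memory before closing

print(loaded.dims, loaded.shape)
```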

In [None]:
!pip install gdown
file_id = '140JTQtWrh3rGY6goZYPqL4sADcIdIQA5' # Replace with your file's ID
!gdown {file_id}

In [None]:
ncdf = xr.open_dataset('cams_compositionreanalysis_2003-2024.1m.nc')

The file loaded above provides monthly-mean total column ozone, carbon monoxide, and aerosol optical depth from the [CAMS](https://doi.org/10.24381/fd75fff2) composition reanalysis produced by the [ECMWF](https://www.ecmwf.int/).
- [Documentation on the CAMS reanalysis](https://confluence.ecmwf.int/display/CKB/CAMS%3A+Reanalysis+data+documentation)
- [Publication describing the CAMS reanalysis](https://doi.org/10.5194/acp-19-3515-2019)

To see a summary of the file's contents, we can simply type the variable's name by itself in a code cell. Note that only the last expression in a given cell is displayed as output.

In [None]:
ncdf

The object we have named <code>ncdf</code> is an [xarray](https://xarray.pydata.org) Dataset object. You can often find useful information about the data under "Attributes". In this example, we will process the variable named <code>aod550</code>, which contains monthly-mean aerosol optical depth at 550 nm with dimensions of latitude, longitude, and time. First, we access the variable by indexing the Dataset with the variable's name:

In [None]:
aod = ncdf['aod550']
aod

Unlike <code>ncdf</code>, <code>aod</code> is a DataArray object. You can find more information about xarray data structures [here](https://docs.xarray.dev/en/stable/user-guide/data-structures.html). Again, we find useful information under "Attributes", including the units, the various names of the variable, and a brief description of its contents.
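The distinction between the two structures can be checked directly. The sketch below builds a small hypothetical Dataset whose dimensions mirror the tutorial's, with made-up values and attributes, and uses `print()` so that one cell shows several outputs instead of only the last expression.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in: dimension names follow the tutorial; values and attrs are illustrative
ds = xr.Dataset(
    {
        "aod550": (
            ("time", "latitude", "longitude"),
            np.zeros((2, 3, 4)),
            {"units": "1", "long_name": "aerosol optical depth at 550 nm (illustrative)"},
        )
    },
    attrs={"title": "illustrative dataset"},
)

da = ds["aod550"]              # indexing a Dataset by name returns a DataArray
print(type(ds).__name__)       # Dataset
print(type(da).__name__)       # DataArray
print(da.dims)                 # ('time', 'latitude', 'longitude')
print(da.attrs["units"], "|", da.attrs["long_name"])
```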

The next step is to calculate the climatological mean distribution (the average over all months in the record) and plot it. xarray makes this step extraordinarily easy:

In [None]:
aod.mean('time').plot()
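The reduction above refers to the dimension by name rather than by axis number. The sketch below, using synthetic random values in place of the CAMS data, checks that `mean("time")` matches a NumPy mean over the corresponding axis.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
values = rng.random((12, 3, 4))      # synthetic (time, latitude, longitude) values
aod_demo = xr.DataArray(values, dims=("time", "latitude", "longitude"))

clim = aod_demo.mean("time")         # reduce over the dimension named "time"
same = np.allclose(clim.values, values.mean(axis=0))
print(clim.dims, same)
```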

This plot is mainly useful as a quick look at the data. For example, what is the range of the values? Are there any missing values? Does the spatial distribution match our expectations?
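The first two of these questions can also be answered numerically. In this sketch a small synthetic field with one deliberate NaN stands in for `aod.mean('time')`; for floating-point data xarray's `min()` and `max()` skip NaNs by default.

```python
import numpy as np
import xarray as xr

# Synthetic 2-D field with one missing value
field = xr.DataArray(
    np.array([[0.1, 0.2, np.nan], [0.3, 0.5, 0.4]]),
    dims=("latitude", "longitude"),
)

vmin = float(field.min())                 # NaNs skipped by default
vmax = float(field.max())
n_missing = int(field.isnull().sum())     # count of missing grid points
print(f"range: {vmin} to {vmax}, missing values: {n_missing}")
```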

- [Indexing and selecting data in xarray](https://docs.xarray.dev/en/stable/user-guide/indexing.html)
- [Climate data in xarray](https://docs.xarray.dev/en/stable/user-guide/weather-climate.html)
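As a brief taste of those two guides, the sketch below selects data by label and computes a mean seasonal cycle. The two-year monthly series, grid, and target point are all synthetic assumptions, not values from the CAMS file.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2003-01-01", periods=24, freq="MS")   # two years of monthly means
lat = np.array([-30.0, 0.0, 30.0])
lon = np.array([0.0, 120.0, 240.0])
rng = np.random.default_rng(1)
aod_demo = xr.DataArray(
    rng.random((24, 3, 3)),
    coords={"time": times, "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
)

# Label-based selection: one calendar year, and the grid point nearest 10N, 115E
year_2004 = aod_demo.sel(time=slice("2004-01-01", "2004-12-31"))
point = aod_demo.sel(latitude=10.0, longitude=115.0, method="nearest")

# Mean seasonal cycle: average all Januaries, all Februaries, and so on
seasonal = aod_demo.groupby("time.month").mean("time")
print(year_2004.sizes["time"], point.dims, seasonal.sizes["month"])
```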

In the next example, we'll work through some ways to make the plot prettier.