# Using packages in Python

You will have explored some basic Python in the previous notebook. Here we will explore a couple of Python packages commonly used in environmental science: `xarray`, used for manipulating array-like data (typically processed observations and model output), and `matplotlib.pyplot`, a plotting library.

To use the functionalities offered by these packages within our Python scripts/notebooks, we first need to import them.

```
import xarray
import matplotlib.pyplot
```

To save typing later on (you'll see very shortly), we can give these packages an alias. The aliases `xr` and `plt` used below are the conventional ones for `xarray` and `matplotlib.pyplot`. It's common for the first cell in a notebook, or the top of a script, to contain all the import statements for the packages we'll need for that particular notebook or script.

```
import xarray as xr
import matplotlib.pyplot as plt
```

However, since we're just playing around here, let's put these import statements in the next cell.

In [None]:
import xarray as xr
import matplotlib.pyplot as plt

Now, whenever we want to access the functions within these packages, we just call them using the package name (or alias).

Let's make a simple line plot, using the `plt.plot` function. You can look up the documentation on the `matplotlib` website, but in essence, this function takes two arguments, `x` and `y`. Let's do a really simple example here.

In [None]:
plt.plot([0, 1, 2, 3], [0, 2, 4, 6])

Voilà, a line plot. A bit later we'll use the `matplotlib`-based plotting built into `xarray` to plot some example model output. `matplotlib` contains a lot of routines to customise your plot to (almost) your heart's desire.
Some common ones are:
- `plt.title("Title of Plot")`
- `plt.legend()`: you must pass the `label="label"` option to `plt.plot` for labels to show up
- `plt.xlabel("x axis label")` 
- `plt.savefig("/path/to/file.extension")`: for saving your plot to disk
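
To see how these options fit together, here is a minimal sketch that combines them on two made-up lines. The data, the labels and the file name `line_plot_example.png` are all invented for illustration.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]

# label= is the text that plt.legend() picks up for each line
plt.plot(x, [0, 2, 4, 6], label="y = 2x")
plt.plot(x, [0, 1, 4, 9], label="y = x squared")

plt.title("Two simple lines")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()

# hypothetical file name; an existing file with the same name is overwritten
plt.savefig("line_plot_example.png")
plt.show()
```

Calling `plt.savefig` before `plt.show` is the safer order, since some backends clear the figure once it has been shown.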

## Data

The `xarray` library has many functions for cropping, analysing, applying functions to and plotting data that live on coordinates, and we'll be exploiting this in this workshop.
The most common way to read data into an `xarray` object is to use the `open_dataset` function, along with a path to the file where the data live.

`xarray` provides an example dataset, which we'll use here for simplicity. For your own data in this workshop, you would replace the loading line below with one that looks like `ds = xr.open_dataset("/path/to/dataset.nc")`. There is also a function called `open_mfdataset`, which opens many files as a single xarray dataset (for example, if you have model data with one file per time slice). It relies on the `dask` package to read the data lazily in chunks, which can help performance because the entire dataset isn't loaded into memory at once.
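
As a rough sketch of what reading from disk looks like, the cell below writes a tiny made-up dataset to a temporary netCDF file and reads it back with `open_dataset`. The variable values and the file name `example.nc` are invented, and writing netCDF assumes that one of the backends (`netCDF4`, `h5netcdf` or `scipy`) is installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# build a tiny made-up dataset so the example does not depend on any real file
small = xr.Dataset(
    {"air": (("time", "lat"), np.arange(6.0).reshape(2, 3))},
    coords={"time": [0, 1], "lat": [10.0, 20.0, 30.0]},
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "example.nc"  # hypothetical file name
    small.to_netcdf(path)  # needs a netCDF backend to be installed

    # the with-block closes the file again; .load() reads the values into memory first
    with xr.open_dataset(path) as on_disk:
        ds_from_file = on_disk.load()

print(ds_from_file)
```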

For this example, we'll use the example tutorial dataset, which we open slightly differently.

In [None]:
ds = xr.tutorial.load_dataset("air_temperature")
ds

Here we can see that this *dataset* has three dimensions: `time`, `lat` and `lon`, each with a corresponding coordinate. Sometimes coordinates are defined on multiple dimensions, for example if you have a map projection where latitude and longitude do not change uniformly in x and y. There is one variable, `air`, on the dimensions `time`, `lat` and `lon`.
We can access `air` either with a dot or with square brackets, like so:
- `ds.air`
- `ds["air"]`

In [None]:
ds.air

`xarray` calls this object a *DataArray*, and there are some routines within `xarray` that only work with *DataArrays*, as opposed to *Datasets*, which can be thought of as a collection of *DataArrays*. 
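
To make the distinction concrete, here is a small sketch that builds two made-up DataArrays and bundles them into a Dataset. The names `toy` and `pressure` and all the values are invented for illustration.

```python
import numpy as np
import xarray as xr

# made-up coordinates shared by both variables: 2 times x 3 latitudes
coords = {"time": [0, 1], "lat": [60.0, 45.0, 30.0]}

temperature = xr.DataArray(
    np.array([[280.0, 285.0, 290.0], [281.0, 286.0, 291.0]]),
    dims=("time", "lat"),
    coords=coords,
    name="air",
    attrs={"units": "K"},
)
pressure = xr.DataArray(
    np.full((2, 3), 1000.0),
    dims=("time", "lat"),
    coords=coords,
    name="pressure",
    attrs={"units": "hPa"},
)

# a Dataset bundles DataArrays that share coordinates
toy = xr.Dataset({"air": temperature, "pressure": pressure})

print(type(toy).__name__)         # Dataset
print(type(toy["air"]).__name__)  # DataArray
print(list(toy.data_vars))        # ['air', 'pressure']
```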

## Subsetting with `xarray`

Suppose we want to look at the temperature data from the final time in the data, 2014-12-31T18:00:00. The easiest way is positional indexing, just as you would index a list, but we can also use the `.sel` method to select the data by coordinate value. We saw just now that the DataArray `air` has three dimensions: `time`, `lat` and `lon`. We can subset this however we choose, for example to the last time slice:

In [None]:
ds["air"][-1]

In [None]:
ds["air"].sel(time="2014-12-31T18:00")

We can also subset over more than one dimension. The following two cells subset the data the same way.

In [None]:
ds["air"][-1, :, 0]

In [None]:
ds["air"].sel(time="2014-12-31T18:00", lon=200.0)
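
Beyond exact matches, `xarray` also offers `.isel` (the named-dimension version of positional indexing), nearest-neighbour lookup and label ranges. The sketch below tries these on a small made-up array, so the name `toy` and its values are invented rather than taken from the tutorial dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# made-up data: 4 six-hourly times, 3 latitudes, 2 longitudes
times = pd.to_datetime(
    ["2014-12-31T00:00", "2014-12-31T06:00", "2014-12-31T12:00", "2014-12-31T18:00"]
)
toy = xr.DataArray(
    np.arange(24.0).reshape(4, 3, 2),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": [15.0, 17.5, 20.0], "lon": [200.0, 202.5]},
    name="air",
)

# position-based: last time, all latitudes, first longitude
by_position = toy.isel(time=-1, lon=0)

# label-based: the same values, picked by coordinate value
by_label = toy.sel(time="2014-12-31T18:00", lon=200.0)

# nearest-neighbour lookup when the requested label is not on the grid
nearest = toy.sel(lat=18.0, method="nearest")

# a label range; both end points are included
lat_range = toy.sel(lat=slice(15.0, 17.5))

print(by_position.values, by_label.values)
print(float(nearest["lat"]))
print(lat_range.sizes["lat"])
```

One thing to watch with `slice`: the bounds should follow the order of the coordinate, so a latitude axis stored from north to south needs the larger value first.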

## Applying functions to data

There are plenty of functions we can apply to the data in a DataArray, such as `.mean`, which calculates the mean value of the array.

In [None]:
ds["air"].mean()

This has calculated the mean over the entire DataArray. We can also apply functions over particular dimensions only. So, to calculate the mean over just the `time` dimension, we pass the `dim` argument to `.mean`:

In [None]:
ds["air"].mean(dim="time")
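
To see how the `dim` argument changes the shape of the result, here is a small sketch on a made-up array (the name `toy` and its values are invented). Each reduction removes the dimensions named in `dim` and keeps the rest.

```python
import numpy as np
import xarray as xr

# made-up data: 4 times x 3 latitudes x 2 longitudes
toy = xr.DataArray(
    np.arange(24.0).reshape(4, 3, 2),
    dims=("time", "lat", "lon"),
    name="air",
)

overall_mean = toy.mean()                        # no dimensions left
time_mean = toy.mean(dim="time")                 # lat and lon remain
zonal_time_mean = toy.mean(dim=["time", "lon"])  # only lat remains
spread = toy.std(dim="time")                     # other reductions take dim too

print(overall_mean.dims, time_mean.dims, zonal_time_mean.dims, spread.dims)
```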

## Visualising data

This is all very well and good, but how can we plot some of this data?

We can use `matplotlib`, for example to plot the mean temperature by latitude:

In [None]:
plt.plot(ds["air"]["lat"], ds["air"].mean(dim=["time", "lon"]))
plt.xlabel(r"Latitude ($\degree$N)")  # raw string, so \d is not treated as an escape sequence
plt.ylabel("Mean air temperature (K)")
plt.show()

`xarray` also has its own `.plot` functions, which we can use. Handily, these also infer x and y labels from the attributes within the DataArray.

In [None]:
ds["air"].mean(dim=["time", "lon"]).plot()
# plt.xlabel(r'Latitude ($\degree$N)')
# plt.ylabel('Mean air temperature (K)')
plt.show()

We can also plot multiple lines at once, and label them.

In [None]:
ds["air"].mean(dim=["lat", "lon"]).plot(label="Mean")
ds["air"].max(dim=["lat", "lon"]).plot(label="Max")
ds["air"].min(dim=["lat", "lon"]).plot(label="Min")
plt.legend()  # add labels
# plt.xlabel('Date')
# plt.ylabel('Air temperature (K)')
plt.show()

### 2D plots

We can also make 2D plots using `.plot`, if we pass a 2D array.

In [None]:
ds["air"].mean(dim=["time"]).plot()
plt.show()

## Extension: jazzy plots

We've now been able to plot some 2D data. Now we're going to use the library `cartopy` to make our plots easier to interpret by adding coastlines and gridlines.

We can also plot certain slices at certain times. These are more worked examples than a tutorial, but hopefully show you some of the things you can do.

In [None]:
import cartopy.crs as ccrs

In [None]:
# we can use figsize to control the width and height of plots, in inches (blame the USA). This one will be 8 inches across by 4 inches tall.
fig = plt.figure(figsize=(8, 4))

# this adds a subplot to our figure, using the notation (number of rows, number of columns, plot number). Note this indexes from 1, not 0.
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())  

# plot the mean air temp on our axes, using the built-in colormap [sic] 'cool'. The default, 'viridis', is decent enough.
# Other colourmaps can be found at https://matplotlib.org/stable/users/explain/colors/colormaps.html.
ds["air"].mean(dim=["time"]).plot(ax=ax, cmap="cool")  

ax.coastlines()  # use cartopy to add coastlines to our plot
gl = ax.gridlines(draw_labels=True)  # add gridlines using cartopy.
gl.right_labels = False
gl.top_labels = False  # turn off extra labels

plt.title("Mean air temperature")  # add not very useful plot title

# save our plot as a png. Most image extensions work, pick your favourite. The bbox_inches="tight" argument trims the surplus whitespace around the figure.
# Take care when using savefig as it will overwrite at will!
plt.savefig("example_plot.png", bbox_inches="tight")  
plt.show()  # show the plot.

In [None]:
# Plot seasonal minimum temperature


fig = plt.figure(figsize=(8, 4))  # set up figure
ax = []  # make empty list to store our axes in
seasons = [[1, 2, 12], [3, 4, 5], [6, 7, 8], [9, 10, 11]]  # list of months to filter data by
labels = ["Winter", "Spring", "Summer", "Autumn"]  # corresponding labels

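# smallest and largest values of the all-time minimum field, used later as a shared colour range
# (seasonal minima above vmax are drawn in the top colour)
vmin = ds["air"].min(dim="time").min()
vmax = ds["air"].min(dim="time").max()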

for i in range(4):  # essentially loop through the following code 4 times

    # add new axis to our figure, in the ith position
    ax.append(fig.add_subplot(2, 2, i + 1, projection=ccrs.PlateCarree()))
    # use the built-in xarray time accessor to filter our data to the chosen season and calculate the minimum.
    season = ds["air"][ds["air"].time.dt.month.isin(seasons[i])].min(dim=["time"])

    # plot the data on the ith axis, remove the colourbar and scale the colours to the min and max to ensure uniformity between subplots.
    img = season.plot(ax=ax[i], add_colorbar=False, vmin=vmin, vmax=vmax)

    ax[i].set_title(labels[i])  # add subplot title
    ax[i].coastlines()  # add coastlines
fig.suptitle("Minimum air temperature by season")  # add big title

# add colourbar to figure, shared between all four axes
fig.colorbar(img, ax=ax, label="Air temperature (K)") 
plt.show()
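
The month-list filter in the loop above does the job, but `xarray` can also group by season directly using the `"time.season"` shortcut, which labels each time as DJF, MAM, JJA or SON. Here is a minimal sketch on made-up monthly data; the `monthly` array and its values are invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# made-up data: one value per month of 2014 at a single grid point
times = pd.date_range("2014-01-01", periods=12, freq="MS")
monthly = xr.DataArray(
    np.arange(1.0, 13.0),  # each value equals its month number
    dims=("time",),
    coords={"time": times},
    name="air",
)

# group by meteorological season and take the minimum within each group
seasonal_min = monthly.groupby("time.season").min(dim="time")

# the same winter minimum via the month filter used in the loop above
winter_min = monthly[monthly.time.dt.month.isin([1, 2, 12])].min(dim="time")

print(seasonal_min)
print(seasonal_min.sel(season="DJF").item(), winter_min.item())
```

Both approaches give the same winter value here. The `groupby` version is shorter, while the explicit month lists make it easier to define custom seasons.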