# Exercise 1.3 - Scatter plots
prepared by M.Hauser

In this exercise we will learn how to create scatter plots.

In [None]:
import matplotlib.pyplot as plt

import numpy as np
import netCDF4 as nc

In [None]:
%matplotlib inline

## Create sample data

In [None]:
N = 50
x = np.random.rand(N)
y = np.random.rand(N)

## Climatological Station Data for Switzerland - Temperature & Precipitation

We use data from the 14 Swiss weather stations available from [MeteoSwiss](http://www.meteoswiss.admin.ch/home/climate/past/homogenous-monthly-data.html).

The data has already been [retrieved and postprocessed](../data/prepare_data_MCH.ipynb).

We load the climatology of temperature and precipitation. We also load the outline of Switzerland in lon/lat coordinates.

In [None]:
# load outline of switzerland

import netCDF4 as nc

fN = '../data/outline_switzerland.nc'

with nc.Dataset(fN) as ncf:
    
    lon_ch = ncf.variables['lon'][:]
    lat_ch = ncf.variables['lat'][:]

# =====================================
    
# load climatological station data

fN = '../data/MCH_clim.nc'
with nc.Dataset(fN) as ncf:
    
    lon = ncf.variables['lon'][:]
    lat = ncf.variables['lat'][:]
    
    temp = ncf.variables['temp'][:]
    prec = ncf.variables['prec'][:]
    station = ncf.variables['station'][:]

## Scatterplot

Using `plt.scatter(x, y)` we can create a simple scatterplot:


In [None]:
plt.scatter(x, y)

### Exercise

 * Plot the position (`lon`, `lat`) of the meteorological stations

In [None]:
# start with the outline of switzerland

f, ax = plt.subplots()
ax.plot(lon_ch, lat_ch)

# code here



### Solution

In [None]:
# start with the outline of switzerland

f, ax = plt.subplots()
ax.plot(lon_ch, lat_ch)

ax.scatter(lon, lat)



## Markers

pyplot offers a wide [range of markers](https://matplotlib.org/api/markers_api.html). They can be set with the `marker` keyword. Note that you can only specify one marker style per `plt.scatter` call.

In [None]:
plt.scatter(x, y, marker='d')

## Scatterplot with `plt.plot`

Note that we can also create a scatterplot with `plt.plot` if we set the linestyle to `''` and the marker to `'o'`. Indeed, I often use this instead of `plt.scatter`. Using `plt.plot` is faster than using `plt.scatter`. However, some things can only be done with `scatter` - see the examples below.

In [None]:
plt.plot(x, y, marker='o', linestyle='')
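
As a minimal sketch of the difference (the sample data is regenerated so the block runs on its own, and the names `line`, `coll` and `sizes` are only illustrative), the following compares what the two functions return. `plot` draws all points as a single `Line2D` with one marker style, while `scatter` creates a `PathCollection` that can hold one size and one color per point.

```python
import matplotlib.pyplot as plt
import numpy as np

N = 50
x = np.random.rand(N)
y = np.random.rand(N)

f, axes = plt.subplots(1, 2)

# plot returns a list of Line2D objects: one marker style for all points
(line,) = axes[0].plot(x, y, marker='o', linestyle='')

# scatter returns a PathCollection: size and color may differ per point
sizes = np.linspace(10, 100, N)
coll = axes[1].scatter(x, y, s=sizes, c=x)

print(type(line).__name__)
print(type(coll).__name__)
print(len(coll.get_sizes()))
```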

There are many properties you can set to adjust your scatterplot:

In [None]:
plt.plot(x, y, 
         linestyle='',
         marker='s',
         markersize=12,
         markerfacecolor='0.75',
         markeredgecolor='green',
         markeredgewidth=2)

## Colors

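Colors can be set with the `color` or `c` keyword - however, these keywords are not equal in their functionality. Both can take a list of colors (e.g. `['r', 'g', 'b']`). In older Matplotlib versions the colors get repeated if there are more data points than colors; newer versions may instead require a single color or exactly one color per point.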

In [None]:
colors = ['r', 'g', 'b']

f, axes = plt.subplots(1, 2)

ax = axes[0]
ax.scatter(x, y, color=colors)
ax.set_title('color=colors')

ax = axes[1]
ax.scatter(x, y, c=colors)
ax.set_title('c=colors')

plt.setp(axes, aspect='equal');

## Mapping colors

However, `c` can also take numerical values. The color of each point is then determined by its value via a colormap.

In [None]:
values = np.random.rand(N) * 100

f, ax = plt.subplots()

h = ax.scatter(x, y, c=values)

f.colorbar(h)
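
What happens to the values can be sketched in two steps, assuming the default linear scaling: the values are first normalized to the range 0 to 1, and the result is then looked up in a colormap. The small `values` array below is made up for illustration; `Normalize` and `plt.get_cmap` are regular Matplotlib functions.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

values = np.array([0.0, 25.0, 50.0, 100.0])

# step 1: scale the values linearly to the range 0...1
norm = Normalize(vmin=values.min(), vmax=values.max())
scaled = norm(values)

# step 2: look up one RGBA color per scaled value
cmap = plt.get_cmap('viridis')
rgba = cmap(scaled)

print(scaled)
print(rgba.shape)
```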

In [None]:
plt.scatter?

## Colorbar

This is the first time we have added a colorbar. Because a colorbar is a new `axes` instance, it is a method of the figure (`f.colorbar`), but you can also call it as `plt.colorbar`. Colorbars take a `mappable` argument, which tells them which colors and data range to display. `scatter` returns such a `mappable`; I usually call it `h` (for handle). Other plot functions that return a `mappable` are e.g. `pcolormesh` and `contourf` (see later).

We will learn more about colorbars in Exercise 1.5 and Exercise 3.4.
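
The following sketch (sample data regenerated so it runs on its own; `cbar`, `n_axes_before` and `n_axes_after` are illustrative names) inspects the mappable and checks that adding a colorbar indeed adds a second axes to the figure.

```python
import matplotlib.pyplot as plt
import numpy as np

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
values = np.random.rand(N) * 100

f, ax = plt.subplots()
h = ax.scatter(x, y, c=values)

# the mappable stores the data and the color limits
data = h.get_array()
vmin, vmax = h.get_clim()

n_axes_before = len(f.axes)
cbar = f.colorbar(h)
n_axes_after = len(f.axes)

print(vmin, vmax)
print(n_axes_before, n_axes_after)
```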

### Exercise

 * Try passing `values` to `color`.
    
``` ipython
plt.scatter(x, y, color=values)
```

In [None]:
# code here


### Solution
This raises an error: `color` only accepts color specifications, not numbers that should be mapped to colors.
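
A small sketch of this behavior, assuming that the error is a `ValueError` (the exact exception type and message may differ between Matplotlib versions). The variable `error_message` is only used here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
values = np.random.rand(N) * 100

f, ax = plt.subplots()

try:
    ax.scatter(x, y, color=values)
    error_message = None
except ValueError as err:
    # plain numbers are not valid color specifications
    error_message = str(err)

print(error_message)

# numbers that should be mapped to colors belong in `c`
h = ax.scatter(x, y, c=values)
```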

### Exercise
 * color the scatterpoints according to the climatological temperature at the stations (`temp`)
 * add a colorbar
 * use the `label` keyword in `plt.colorbar` to indicate that the data is `'Temperature [°C]'`

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)

# adjust the scatter command
h = ax.scatter(lon, lat)

# add colorbar


### Solution

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)

h = ax.scatter(lon, lat, c=temp)

h = plt.colorbar(h, label='Temperature [°C]')

This is not yet entirely what we want - we would like the range of the colors to be symmetrical around 0 °C, and positive temperatures should be red and negative ones blue...

## Colormaps

The chosen colors follow the default colormap of matplotlib, 'viridis'. There are probably about [100 colormaps you can choose from](https://matplotlib.org/users/colormaps.html). However, not all of them are recommended. Colormaps can be set using the `cmap` keyword argument.

`vmin` and `vmax` set the range of the colormap.

In [None]:
h = plt.scatter(x, y, c=values, vmin=0, vmax=100, cmap='Reds')

plt.colorbar(h)

### Exercise

* Use the `RdBu_r` colormap.
* Ensure that the color range goes from -8 to 8.
* Draw a light gray line around each point, so that all of them can be clearly seen (hint: `edgecolor`).

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)

# adjust scatter
h = ax.scatter(lon, lat, c=temp)

h = plt.colorbar(h, label='Temperature [°C]')

### Solution

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)
h = ax.scatter(lon, lat, c=temp, cmap='RdBu_r', vmax=8, vmin=-8, edgecolor='0.5')

plt.colorbar(h, label='Temperature [°C]')
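
Instead of hard-coding the limits, a symmetric range can also be derived from the data. The sketch below uses made-up values (`temp_demo`, `lon_demo` and `lat_demo` are hypothetical stand-ins, not the station data) and takes the largest absolute temperature as the limit on both sides.

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical stand-ins for `temp`, `lon` and `lat`
temp_demo = np.array([-7.2, -1.5, 3.4, 9.1, 11.8])
lon_demo = np.linspace(6, 10, temp_demo.size)
lat_demo = np.linspace(46, 47.5, temp_demo.size)

# the largest absolute value sets both ends of the color range
vlim = np.abs(temp_demo).max()

f, ax = plt.subplots()
h = ax.scatter(lon_demo, lat_demo, c=temp_demo, cmap='RdBu_r',
               vmin=-vlim, vmax=vlim, edgecolor='0.5')
f.colorbar(h, label='Temperature [°C]')

print(h.get_clim())
```

With this choice 0 °C always sits in the middle of the colormap, at the cost of limits that are not round numbers.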


## Marker size

The size of each point can be set with the `s` keyword.

Note: normally you specify sizes as the width (or height) of an element (e.g. `linewidth` or `markersize`). However, `s` is the size in points^2. Thus, the following two points have the same size (see below):

    plt.plot(0, 1, marker="o", markersize=22)
    plt.scatter(1, 1, s=22**2)

Why is that? Because our brain uses the area and not the width/height to interpret importance. So if something is twice as large it should have twice the area and not twice the width/height, see this [stackoverflow question](https://stackoverflow.com/questions/14827650/pyplot-scatter-plot-marker-size).



In [None]:
fig, ax = plt.subplots()

ax.plot(-1, 1, marker="o", markersize=22)
ax.scatter(1, 1, s=22**2)

ax.set_xlim(-5, 5)
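
The relation between the two size conventions can also be checked numerically. This is a minimal sketch; `width` and `size_pts2` are illustrative names.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

(line,) = ax.plot(-1, 1, marker='o', markersize=22)
coll = ax.scatter(1, 1, s=22**2)

width = line.get_markersize()    # marker width in points
size_pts2 = coll.get_sizes()[0]  # marker size in points^2

print(width**2, size_pts2)
```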

In [None]:
# marker widths from 0 to 15 points (s is the width squared)
area = (15 * np.random.rand(N)) ** 2

plt.scatter(x, y, c=values, s=area);

### Exercise
 * Set the size of the points according to the mean precipitation (`prec`).
 * Make sure the points are plotted above the border, using the `zorder` keyword.

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)

# adjust the scatter command
h = ax.scatter(lon, lat, c=temp, cmap='RdBu_r', vmax=8, vmin=-8, edgecolor='0.5')

plt.colorbar(h, label='Temperature [°C]')

### Solution

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)
h = ax.scatter(lon, lat, c=temp, cmap='RdBu_r', vmax=8, vmin=-8, edgecolor='0.5', s=prec, zorder=3)

plt.colorbar(h, label='Temperature [°C]')

### Exercise

This is not yet optimal. We'll have to rescale the precipitation so that it leads to more reasonable sizes.

  * Set the size according to the mean precipitation: use a size of 50 for the smallest and a size of 250 for the largest precipitation value.
  

In [None]:
mn = prec.min()
mx = prec.max()

# scale the precipitation
# p_scaled = ...

# ======================

f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)

# replace prec by p_scaled
h = ax.scatter(lon, lat, c=temp, cmap='RdBu_r', vmax=8, vmin=-8, edgecolor='0.5', s=prec, zorder=3)

plt.colorbar(h, label='Temperature [°C]')

### Solution

In [None]:
mn = prec.min()
mx = prec.max()

# scale the precipitation
p_scaled = ((prec - mn) / (mx - mn)) * 200 + 50

# ======================

f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)

# replace prec by p_scaled
h = ax.scatter(lon, lat, c=temp, cmap='RdBu_r', vmax=8, vmin=-8, edgecolor='0.5', s=p_scaled, zorder=3)

plt.colorbar(h, label='Temperature [°C]')
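
The linear rescaling can also be written with `np.interp`, which maps the interval between the smallest and largest value onto the interval from 50 to 250. The sketch below uses made-up precipitation values (`prec_demo` is a hypothetical stand-in for `prec`) to show that both formulations agree.

```python
import numpy as np

# hypothetical stand-in for `prec` (mm / yr)
prec_demo = np.array([600.0, 1000.0, 1500.0, 2600.0])

mn_demo = prec_demo.min()
mx_demo = prec_demo.max()

# manual linear rescaling to the range 50...250
p_manual = ((prec_demo - mn_demo) / (mx_demo - mn_demo)) * 200 + 50

# the same mapping expressed with np.interp
p_interp = np.interp(prec_demo, [mn_demo, mx_demo], [50, 250])

print(p_manual)
print(p_interp)
```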

## Bonus: Legends

What is missing from the plot now is a legend indicating that the size of the points corresponds to the amount of precipitation. We already learned how to add a standard legend.

With the `loc` keyword you can manually set the position within the axes:

 * best
 * lower right
 * right
 * upper right
 * ...   
 
We also want to remove the line around the legend frame. Therefore we specify `edgecolor='none'`.

In [None]:
h = plt.scatter(x, y, c=values, label='Legend entry')

legend = plt.legend(loc='upper left', edgecolor='none')

### Dummy Plot

You can add an empty `plot` (or `scatter`) call that does not add any point/line to the axes, but provides a legend entry. Sometimes this is the easiest way to create a certain legend entry.

In [None]:
h = plt.plot([], label='empty plot function')
h = plt.scatter([], [], label='empty scatter function')

legend = plt.legend(loc='upper left', edgecolor='none')

So, assuming that `area` in our example corresponds to an actual area in km$^2$, we can add legend entries as follows. Note that `s` has to be rescaled in the same way as above.

In [None]:
plt.scatter(x, y, c=values, s=area)


s = 5
h = plt.scatter([], [], s=s**2, label=str(s) + ' km$^2$', facecolor='0.7', edgecolor='0.4')

s = 10
h = plt.scatter([], [], s=s**2, label=str(s) + ' km$^2$', facecolor='0.7', edgecolor='0.4')

s = 15
h = plt.scatter([], [], s=s**2, label=str(s) + ' km$^2$', facecolor='0.7', edgecolor='0.4')


legend = plt.legend(loc='upper left', edgecolor='none')

# the background of the legend is slightly transparent
# we want to get rid of this
frame = legend.get_frame()
frame.set_alpha(1)


### Exercise

 * add a legend indicating annual mean precipitation for 1000 mm / yr, 1500 mm / yr, and 2000 mm / yr
 * note that you will have to scale these values as you did for the original precipitation values
 * put the legend at the top in the middle
 * add a title to the legend `plt.legend(..., title='')`
 * try the keyword `ncol`, set it to 3
 * you may have to change the y limits (`ax.set_ylim`) to make room for the legend

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)
h = ax.scatter(lon, lat, c=temp, cmap='RdBu_r', vmax=8, vmin=-8, s=p_scaled, edgecolor='0.5', zorder=3)

plt.colorbar(h, label='Temperature [°C]')

# code here

for area in [1000, 1500, 2000]:
    # size = ...
    
    # convert number to string
    label = str(area)
    
    # ax.scatter(...)

# ax.legend(...)

# ax.set_ylim(None, ...)

### Solution

In [None]:
f, ax = plt.subplots()

ax.plot(lon_ch, lat_ch)

h = ax.scatter(lon, lat, c=temp, cmap='RdBu_r', vmax=8, vmin=-8, s=p_scaled, edgecolor='0.5', zorder=3)

plt.colorbar(h, label='Temperature [°C]')

# add one empty scatter per legend entry

for area in [1000, 1500, 2000]:
    size = ((area - mn) / (mx - mn)) * 200 + 50
    
    # convert number to string
    label = str(area)
    ax.scatter([], [], c='0.85', s=size, label=label, edgecolor='0.5')


ax.legend(title='Precipitation [mm / yr]', loc='upper center', ncol=3, edgecolor='none')

ax.set_ylim(None, 48.3)
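
As an alternative to the empty `scatter` calls, newer Matplotlib versions offer `legend_elements` on the object returned by `scatter`. The sketch below assumes that this method is available in the installed version and uses made-up station data (all `*_demo` names and `size_to_prec` are hypothetical). The `func` argument converts marker sizes back to precipitation, so it has to be the inverse of the scaling.

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical stand-ins for the station data
lon_demo = 6 + 4 * np.random.rand(14)
lat_demo = 46 + 1.5 * np.random.rand(14)
prec_demo = np.linspace(800, 2400, 14)

mn_demo, mx_demo = prec_demo.min(), prec_demo.max()
p_scaled_demo = (prec_demo - mn_demo) / (mx_demo - mn_demo) * 200 + 50


def size_to_prec(size):
    # inverse of the scaling above: marker size -> precipitation
    return (size - 50) / 200 * (mx_demo - mn_demo) + mn_demo


f, ax = plt.subplots()
h = ax.scatter(lon_demo, lat_demo, s=p_scaled_demo, c='0.85', edgecolor='0.5')

# legend handles for the requested precipitation values
handles, labels = h.legend_elements(prop='sizes', num=[1000, 1500, 2000],
                                    func=size_to_prec, color='0.85')

ax.legend(handles, labels, title='Precipitation [mm / yr]',
          loc='upper center', ncol=3, edgecolor='none')

print(len(handles), labels)
```

The empty-scatter approach gives full control over each entry, while `legend_elements` avoids repeating the scaling formula for every legend value.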
