# Quick intro to basic plotting and Python functions (step 0)

## Intro

<img align="right" src="meme.jpeg" width="30%">


In the coming practicals you will use some basic Python commands and a dedicated plotting function (different from the plotting function used in the practical of week 1). This workbook briefly introduces those topics.

(please, do not spend too much time on this, at most 15-20 minutes)

(source of the meme: [Pinterest](https://pin.it/3Lp3Iwb))

## Initialize Python stuff

Please run the cell below by selecting it and pressing Shift+Enter, or press the Run button in the toolbar at the top of the screen (with the right-pointing triangle).

In [None]:
# Load some necessary Python modules
import pandas as pd # Pandas is a library for data analysis
import numpy as np # Numpy is a library for processing multi-dimensional datasets
from hupsel_helper import myplot, myreadfile

## The data: dataframes

First read some data from the Excel file (for now, we're not really interested in the data, we just need some numbers to play with).

In [None]:
# File name
fname='Hupsel2011_MeteoData.xlsx'

# Get the data
df = myreadfile(fname)

The data you just read are contained in a so-called dataframe. You could think of it as a kind of spreadsheet, where each variable occupies a column, and each row is a point in time (our data are time series). To show the data (and recognize that it is like a spreadsheet), just execute the cell below (Ctrl+Enter).

In [None]:
df

To show only the names of the available variables, type `df.keys()` in the cell below (and run, or press Shift+Enter).

In [None]:
df.keys()
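
If you want to see how a dataframe behaves without the Excel file, the sketch below builds a tiny one by hand. The column names mimic those in the practical, but the numbers are invented purely for illustration.

```python
import pandas as pd

# A tiny, made-up dataframe (values are invented, not taken from the Hupsel data)
demo = pd.DataFrame({
    'Date': pd.date_range('2011-05-01', periods=3, freq='D'),
    'u_10': [2.1, 3.4, 1.8],        # wind speed
    'K_in': [150.0, 300.0, 220.0],  # global radiation
})

# keys() returns the column names, i.e. the available variables
print(list(demo.keys()))

# Square brackets with a name select one column (a pandas Series)
wind = demo['u_10']
print(wind.mean())
```

Each column keeps its name, and selecting one column gives you a series that you can use in further computations.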

The dataframe also contains information about the units of the variables: type `df.attrs['units']` in the cell below. You can also access the units of an individual variable as follows: `df.attrs['units']['u_10']` should give `[m/s]`. Finally, the dataframe also contains a more complete description of the variables: `df.attrs['description']`.

(note that these attributes are not standard attributes of any dataframe: we constructed these attributes in the `myreadfile` function).

In [None]:
df.attrs['units']

In [None]:
df.attrs['units']['K_in']

In [None]:
df.attrs['description']
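
To get a feeling for where such attributes come from, here is a minimal sketch of how they could be attached to a dataframe. It assumes that `units` and `description` are plain dictionaries keyed by variable name; the entries are made up and this is not the actual code of `myreadfile`.

```python
import pandas as pd

demo = pd.DataFrame({'u_10': [2.1, 3.4], 'T_1_5': [288.2, 290.1]})

# attrs is an ordinary dictionary attached to the dataframe.
# The entries below are made up; the real ones are set inside myreadfile.
demo.attrs['units'] = {'u_10': '[m/s]', 'T_1_5': '[K]'}
demo.attrs['description'] = {
    'u_10': 'wind speed at 10 m',
    'T_1_5': 'temperature at 1.5 m',
}

# Look up the unit of one variable
print(demo.attrs['units']['u_10'])

# Loop over all variables and show name, unit and description
for name in demo.keys():
    print(name, demo.attrs['units'][name], demo.attrs['description'][name])
```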

If you're not interested in the full dataframe, but in a single variable only, you can address it by its name: type for instance `df['K_in']` to show the values of global radiation. Try this in the cell below.

In [None]:
# Print a single variable from the dataframe


If you're tired of always having to type the brackets and quotes, you can also simply assign a time series to a new variable (e.g. `my_K_in = df['K_in']`). Try it below:

In [None]:
# Make a new variable from a single variable from the dataframe


Apart from showing the contents of variables you can also use them to compute things. For instance, the albedo for each point in time can be computed as 
`my_albedo = df['K_out_m']/df['K_in_m']`. 

Try it below:

In [None]:
# Enter some computation in this cell, using one or more variables
# For example, you could compute the evaporative fraction:
my_EF = df['LvE_m'] / (df['Q_net_m'] - df['G_0_m'])

# You could also do it like this by first defining new variables and use those in the computation:
my_LvE = df['LvE_m']
my_Qnet = df['Q_net_m']
my_G = df['G_0_m']
my_EF_new = my_LvE/(my_Qnet - my_G)

# If you want to see what you produced, print it
# print(my_EF)
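
Computations like these are done element by element: each point in time gets its own result. The sketch below shows this with a few invented radiation values. It also shows what can happen at night, when both fluxes are zero: the division gives `NaN` (not a number). The threshold of 10 used to mask those points is an arbitrary choice for this illustration.

```python
import numpy as np
import pandas as pd

# Invented values: first point is at night (no radiation)
K_in = pd.Series([0.0, 200.0, 400.0])
K_out = pd.Series([0.0, 46.0, 88.0])

# Element-wise division: one albedo value per point in time
albedo = K_out / K_in
print(albedo)

# Keep the albedo only where there is enough incoming radiation
# (threshold chosen arbitrarily for this example); other points become NaN
albedo_day = albedo.where(K_in > 10.0)
print(albedo_day)
```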

## Plotting the data in the dataframe

### Basic plotting

To simplify your work, we have made a dedicated plotting command. 

`myplot([x,y])`

Here `[x,y]` is a so-called list that contains the variables to be plotted (in Excel-speak this is called a 'data series').

Examples of usage:
* `myplot([ df['Date'] , df['p'] ])` 
    * You give the actual variables from the dataframe as a list (e.g. `x` could be `df['Date']` and `y` could be `df['p']`). 
* `myplot([ df['Date'] , my_albedo ])`
    * You use your own variables (e.g. result of a computation, like the albedo above, `my_albedo`) in the list `[x,y]`

In [None]:
# Make a plot with myplot([]) 
# (remove the hashtags to execute the commands and make the plot)
# foo = df['Date']
# bar = df['p'] * 100
# myplot([foo,bar,'-','pressure'])

### Multiple variables

If you want to plot multiple variables in a graph, you can provide multiple lists as arguments, one list for each series to be plotted. So this becomes:
* `myplot([x, y], [x,z])`: plot both `y` and `z` as a function of `x` (where the variables `x`, `y` and `z` have been defined before).

But if you plot multiple variables, how can you distinguish them? Well, in two ways:
* the plotting routine will automatically assign a new colour to the next plot
* you can select the plotting type:
  * `myplot( [df['Date'], df['K_in'],'-'])` : plot a line
  * `myplot( [df['Date'], df['K_in'],'o'])` : plot dots
  * `myplot( [df['Date'], df['K_in'],'o-'])`: plot a line with dots combined
  * `myplot( [df['Date'], df['prec'],'#'])` : bar graph (only one series per graph)

In [None]:
# Compare two independent observations of global radiation (remove the hashtag to make the plot)
# myplot( [df['Date'],df['K_in'],'-','KNMI'], [df['Date'], df['K_in_m'],'o', 'MAQ'])

### Further tweaking of your plots

Now that you know how to make basic plots, it is time to add some extra options. The main message here: it is good to know that these things are possible. Use them when you need them (we here only give examples using the standard plot method).

#### Axis labels
Set the label text on the x-axis and y-axis: 
* `myplot( [x, y], xlabel='quantity (unit)', ylabel='quantity (unit)')`
* example `myplot( [ df['u_10'], df['T_1_5'] ], xlabel='wind speed at 10m (m/s)', ylabel='temperature at 1.5m (K)')`

#### Name of series in legend
You can now manually set the name of a series, to be used in the legend: 
* `myplot( [x, y, 'o', 'my specially constructed variable'] )`. 
* example `myplot( [ df['Date'], my_EF, 'o', 'evaporative fraction' ] )`. 

Note that in this case you *should* specify the type of plotting symbol (here a dot: `'o'`, could also be `'-'` for line and `'#'` for a bar graph).
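
Why is the plotting symbol required here? A plausible reason is that the elements of the list are interpreted by their position: first `x`, then `y`, then the symbol, then the name. The sketch below mimics that idea with a hypothetical helper `describe_series`. It is not the real `myplot` and we only assume that `myplot` works in a similar way (including the assumed default symbol `'-'`).

```python
def describe_series(series):
    """Hypothetical helper (NOT the real myplot): read a list by position."""
    x, y = series[0], series[1]
    # Third element is taken as the symbol, fourth as the legend name
    symbol = series[2] if len(series) > 2 else '-'   # assumed default
    label = series[3] if len(series) > 3 else None
    return {'n_points': len(x), 'symbol': symbol, 'label': label}

# Symbol and name both given: everything ends up in the right place
info = describe_series([[1, 2, 3], [0.4, 0.5, 0.6], 'o', 'evaporative fraction'])
print(info)

# Symbol left out: the name lands in the position of the symbol
wrong = describe_series([[1, 2, 3], [0.4, 0.5, 0.6], 'evaporative fraction'])
print(wrong)
```

In the second call the name would be mistaken for a plotting symbol, which is why you need to give the symbol whenever you want to set the name.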

#### Color dots in scatter plot with a third variable (color_by and colormap)
You can now color dots in a scatter plot with the values of a third variable (say `x`, `y` and `c`):
* `myplot([x, y, 'o'], color_by = c)`
* example `myplot( [ df['Date'], df['u_10'], 'o' ], color_by = df['u_dir'])`

You can choose the palette used to color the dots. 
* `myplot([x, y, 'o'], color_by = c, colormap=cmap_name)`
* example `myplot([x, y, 'o'], color_by = df['K_in_m'] , colormap='colorblind')`

The options for the colormap are:
* `'turbo'` (red - green - blue) (default)
* `'plasma'` (blue - purple - yellow)
* `'viridis'` (purple - green - yellow)
* `'colorblind'` (colormap with 8 colors, optimized for people with colour blindness)

#### Log axis and linear axis
You can specify if an axis should be linear or logarithmic, separately for the x-axis and the y-axis. You do this with the keywords `x_axis_type` and `y_axis_type`:
* `myplot([x, y, 'o'], x_axis_type = 'linear', y_axis_type = 'log')`
* example `myplot( [ df['Date'], df['u_10'], 'o' ], y_axis_type = 'log')`

#### Axis limits
You can specify axis limits for both axes (rather than letting the plot command use auto-scaling). You can also define limits for only one of the axes. The keywords are `xlim` and `ylim` and both require a list (2 numbers between square brackets):
* `myplot([x, y, 'o'], xlim = [0,10])`
* example `myplot( [ df['Date'], df['u_10'], 'o' ], ylim = [0,10])`
  
  
#### More help
To learn more about the plotting function you can type `help(myplot)`. This will actually work for any Python function (e.g. `help(np.sin)`).

In [None]:
# Do some experiments with the more advanced plotting options (remove hashtags to execute commands and make plot)
# x = df['u_10']
# y = df['T_1_5']
# z = df['K_in']
# myplot([x,y,'o','my special correlation'], color_by=z, xlabel='wind speed (m/s)', ylabel='temperature (K)')

## Functions in Python

There are roughly two reasons to use so-called *functions* in Python:
* When you apply a **complex operation** on one or more variables, your Python code can become unclear -> replacing the complex operation by a simple function call will clarify your code
* When you apply a certain operation **repeatedly** to different datasets you easily make errors when copying code -> only defining the function once, ensures that your operation is done in the same way

You could consider a *function* as a way to hide the complexity of certain operations. It is like a black box: some variables are thrown into the function, inside the box something is happening, and the black box gives you back some results (see the conceptual figure below).

The definition of a function conceptually consists of four parts:
* the *name* of the function
* the *interface* of the function: the variables that will be used inside the function for the computations, and that come from outside the function
* the actual *operations* 
* a *return* command that gives back the result to the outside world. 

<img align="right" src="concept_function.png" width="40%">

Let's assume that we need to compute the evaporative fraction repeatedly: $L_vE/(Q^*-G)$. The four parts defined above then would be (see also the conceptual figure to the right):
* name: `evap_frac`
* interface: `(LvE, Qnet, G)`
* operations: `EF = LvE/(Qnet - G)`
* return: `return EF`



Putting all of this together gives:
```
# The function definition starts with the word 'def',
# followed by the name and the interface, and ends with a colon (':')
def evap_frac(LvE, Qnet, G):
    # The actual calculation
    EF = LvE / (Qnet - G)
    # Return the outcome to the outside world
    return EF
```        
There are a few important things to realize:
* A function is just a **recipe**: it does not do anything by itself (once it has been defined, it's like a cookbook on the shelf). Only when you 'call' a function, with variables as its arguments (you give it milk, eggs and flour), will it actually become active and give you back results (a cake).
* The **names of variables** inside the function are only known there: they have no relation to the names of variables outside of the function (so what is called `T` inside a function, might be called `T_air_1_5` or `df['T_1_5']` outside the function). 
* The function **does not know about the properties or units of the data it receives** through its interface. So if you design a function based on the assumption that the temperature it receives is in Kelvin, it is up to the user of the function to ensure that the provided data are *indeed* temperatures in Kelvin (and not e.g. degrees Celsius).
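
The last two points can be illustrated with a small sketch. The function `to_kelvin` and the numbers are made up for this example; it assumes the input is a temperature in degrees Celsius.

```python
def to_kelvin(T):
    # Inside the function the input is simply called T
    # (assumption: T is given in degrees Celsius)
    T_K = T + 273.15
    return T_K

# Outside the function the same quantity has a different name
T_air_1_5 = 15.0
T_air_K = to_kelvin(T_air_1_5)
print(T_air_K)

# The function cannot check units: if you feed it a temperature that is
# already in Kelvin, you silently get a meaningless number back
nonsense = to_kelvin(288.15)
print(nonsense)

# T_K only exists inside the function, not in the outside world
try:
    print(T_K)
    leaked = True
except NameError:
    leaked = False
    print('T_K is not known outside the function')
```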

Once you have defined the function (written the recipe), you can use it. Let's try this in the cell below.

In [None]:
# Define the function
def evap_frac(LvE, Qnet, G):
    # The actual calculation
    EF = LvE / (Qnet - G)
    # Return the outcome to the outside world
    return EF

# Now use it with the data
# Note that the names of the variables that go into the function have no connection to the
# names of the variables inside the function (we could also invoke the function with evap_frac(x,y,z))
my_EF = evap_frac(df['LvE_m'],df['Q_net_m'], df['G_0_m'])

# Print it to show that we did something useful
print(my_EF)
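
The same function also works with single numbers, which is a handy way to check it with values you can verify by hand. The sketch below uses invented fluxes. It also shows that the *order* of the arguments matters, unless you pass them by name.

```python
def evap_frac(LvE, Qnet, G):
    # The actual calculation
    EF = LvE / (Qnet - G)
    return EF

# Single (made-up) numbers: 100 / (250 - 50)
ef_single = evap_frac(100.0, 250.0, 50.0)
print(ef_single)

# Arguments passed by name can be given in any order
ef_named = evap_frac(G=50.0, LvE=100.0, Qnet=250.0)
print(ef_named)

# Arguments passed by position in the wrong order: 250 / (100 - 50).
# Python does not complain, but the result is not an evaporative fraction
ef_swapped = evap_frac(250.0, 100.0, 50.0)
print(ef_swapped)
```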

The calculation in the example function above was rather simple. However, you can make functions as complex as you like. Calculations in Python are quite straightforward: +, -, \*, / and \*\* indicate addition, subtraction, multiplication, division and exponentiation (... to the power ...). For more complex operations, we can use functions defined in the `numpy` library (which we imported as `np`). For instance, for the exponential (exp) we use `np.exp`.
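
As a short sketch of these operators in action, the cell below first applies them to two numbers and then uses `**` inside a function. The function `longwave_emission` is chosen only as an illustration (the Stefan-Boltzmann law for a black body); it is not part of the practical's helper code and the temperatures are made up.

```python
import numpy as np

a = 7.0
b = 2.0
print(a + b, a - b, a * b, a / b, a ** b)  # 9.0 5.0 14.0 3.5 49.0

def longwave_emission(T):
    # Stefan-Boltzmann law for a black body; T must be in Kelvin
    sigma = 5.670374419e-8  # Stefan-Boltzmann constant (W m-2 K-4)
    return sigma * T**4

# Works on a whole array of (made-up) temperatures at once
T = np.array([273.15, 288.15, 300.0])
L_out = longwave_emission(T)
print(L_out)

# numpy functions such as np.exp also work element by element
growth = np.exp(np.array([0.0, 1.0]))
print(growth)
```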

## Up to the next exercise
Now that you finished this brief intro, it is time to start with the real work. Continue to Step 1.