# Quick intro to basic plotting and Python functions (step 0)

## Intro
<img align="right" src="meme.jpeg" width="30%">In the coming practicals you will use some basic Python commands and a dedicated plotting function (different from the plotting function used in the practical of week 1). This workbook briefly introduces those topics.

(please, do not spend too much time on this, at most 15-20 minutes)

(source of the meme: [Pinterest](https://pin.it/3Lp3Iwb))

## Initialize Python stuff
Please run the cell below by selecting it and pressing Shift+Enter, or press the Run button in the toolbar at the top of the screen (the right-pointing triangle).

In [1]:
# Load some necessary Python modules
import pandas as pd # Pandas is a library for data analysis
import numpy as np # Numpy is a library for processing multi-dimensional datasets
from hupsel_helper import myplot, myreadfile

## The data: dataframes
First read some data from the Excel file (for now, we're not really interested in the data, we just need some numbers to play with).

In [2]:
# File name
fname='Hupsel2011_MeteoData.xlsx'

# Get the data
df = myreadfile(fname)

The data you just read are contained in a so-called dataframe. You could think of it as a kind of spreadsheet, where each variable occupies a column and each row is a point in time (our data are time series). To show the data (and recognize that it is like a spreadsheet), just execute the cell below (Ctrl-Enter).

In [3]:
df

Date (index),Date,DOY_begin,DOY_end,u_dir,u_10,T_1_5,T_0_1,RH_1_5,p,K_in,...,G_0_m,Sonic_OK_m,Irga_OK_m,rho_m,u_10_m,u_dir_m,H_m,LvE_m,ustar_m,FCO2_m
2011-04-12,2011-04-12,102,103,297.39,5.8447,7.8778,7.5701,79.049,1021.7,172.38,...,-8.458419,99.997859,85.415336,1.263158,4.756742,297.712708,8.961021,59.374954,0.367422,-7.791147e-08
2011-04-13,2011-04-13,103,104,292.47,3.7217,7.4611,6.8319,76.84,1021.2,187.13,...,-1.794606,99.9989,99.9989,1.265206,3.099981,278.475625,7.625416,47.309526,0.21811,-1.586858e-07
2011-04-14,2011-04-14,104,105,268.05,1.1269,6.5882,5.9076,78.597,1019.8,178.98,...,5.239124,90.32934,89.20787,1.266098,0.905036,203.412,22.289921,47.512459,0.086012,-3.54834e-07
2011-04-15,2011-04-15,105,106,65.812,0.94833,8.3111,7.5937,70.667,1021.4,176.76,...,6.03399,92.974942,92.974942,1.261179,0.729759,109.347379,12.744949,48.822771,0.068862,-3.722183e-07
2011-04-16,2011-04-16,106,107,206.0,1.2197,8.8701,7.7417,67.125,1022.8,187.78,...,7.009469,99.997454,99.997454,1.25914,0.910671,178.426562,14.47155,61.618865,0.079024,-3.670335e-07
2011-04-17,2011-04-17,107,108,42.599,1.3756,9.8451,8.8931,69.458,1025.0,170.24,...,8.564103,99.998032,99.998032,1.257115,0.906073,136.196729,11.138962,57.846119,0.078257,-3.051208e-07
2011-04-18,2011-04-18,108,109,96.629,2.8161,11.396,10.131,63.694,1020.9,241.89,...,9.546275,99.997685,99.997685,1.244644,2.031587,94.558854,11.710257,88.443756,0.129309,-3.987689e-07
2011-04-19,2011-04-19,109,110,102.88,2.3579,13.767,12.537,59.951,1016.4,245.72,...,14.139482,99.997801,99.997801,1.227831,1.66652,100.096812,9.061107,87.392376,0.097464,-3.512894e-07
2011-04-20,2011-04-20,110,111,61.35,1.2778,14.163,13.154,63.326,1016.0,246.74,...,14.90879,99.997454,99.997454,1.227008,0.885289,105.262565,18.219829,78.541043,0.06745,-3.672646e-07
2011-04-21,2011-04-21,111,112,89.953,1.676,15.495,14.185,61.924,1014.8,212.6,...,12.500479,99.998611,99.998611,1.217931,1.17254,108.760375,7.489815,70.729589,0.068942,-3.444573e-07


To show only the names of the available variables, type `df.keys()` in the cell below (and run, or press Shift+Enter).

In [0]:
df.keys()

The dataframe also contains information about the units of the variables: type `df.attrs['units']` in the cell below. You can also access the units of an individual variable as follows: `df.attrs['units']['u_10']` should give `[m/s]`. Finally, the dataframe also contains a more complete description of the variables: `df.attrs['description']`.

(Note that these are not standard attributes of every dataframe: we constructed them in the `myreadfile` function.)
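To make the idea of a dataframe with attached metadata concrete, here is a minimal sketch that builds a tiny dataframe by hand. The values and the name `small` are made up for illustration, and the unit string for `K_in` is an assumption; in the practical, `myreadfile` fills in the real data and attributes for you.

```python
import pandas as pd

# A tiny hand-made dataframe: one column per variable, one row per day
# (the numbers are illustrative, not the real Hupsel data)
small = pd.DataFrame(
    {
        'u_10': [5.8, 3.7, 1.1],
        'K_in': [172.4, 187.1, 179.0],
    },
    index=pd.to_datetime(['2011-04-12', '2011-04-13', '2011-04-14']),
)

# Attach the units by hand in the free-form 'attrs' dictionary
# (the unit string for K_in is an assumption of this sketch)
small.attrs['units'] = {'u_10': '[m/s]', 'K_in': '[W/m2]'}

# keys() gives the column names; attrs gives back what we stored
names = list(small.keys())
unit_of_wind = small.attrs['units']['u_10']

print(names)
print(unit_of_wind)
```

The `attrs` dictionary is just a place to store extra information alongside the data; pandas itself does not interpret what you put there.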

In [0]:
df.attrs['units']

In [0]:
df.attrs['units']['K_in']

In [0]:
df.attrs['description']

If you're not interested in the full dataframe, but in a single variable only, you can address it by its name: type for instance `df['K_in']` to show the values of global radiation. Try this in the cell below.

In [0]:
# Print a single variable from the dataframe


Apart from showing the contents of variables you can also use them to compute things. For instance, the albedo for each point in time can be computed as `df['K_out_m']/df['K_in_m']`. Try it below:

In [0]:
# Enter some computation in this cell, using one or more variables
# For example, you could compute the evaporative fraction:
my_EF = df['LvE_m'] / (df['Q_net_m'] - df['G_0_m'])

# If you want to see what you produced, print it
# print(my_EF)
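Computations on dataframe columns work row by row: dividing one column by another gives a new series with one result per point in time. A minimal sketch with made-up numbers (the name `toy` and the values are purely illustrative; only the column names follow the practical's data):

```python
import pandas as pd

# Made-up incoming and reflected shortwave radiation for three time steps
toy = pd.DataFrame({
    'K_in_m': [200.0, 400.0, 100.0],
    'K_out_m': [50.0, 80.0, 25.0],
})

# Selecting a single column gives a pandas Series
k_in = toy['K_in_m']

# The division is applied to each row separately
albedo = toy['K_out_m'] / toy['K_in_m']

print(albedo.tolist())
```

The result has the same number of rows as the original columns, so it can be used directly in further computations or in a plot.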

## Plotting the data in the dataframe

### Basic plotting
The dedicated plotting command has two basic modes of operation (which cannot be mixed):
* `myplot(df,['Date','K_in'])`: 
    * 1st argument is the dataframe that contains all the data (in this case: `df`)
    * 2nd argument is a list of the names of the variables to be used as `x` and `y` in the plot (in this case `['Date','K_in']`; the square brackets indicate that this is a list).
* `myplot([x,y])`: you give the actual variables as a list (e.g. `x` could be `df['Date']` and `y` could be `df['p']*100`, to convert pressure to Pa). In this case the plot function does not 'know' about the dataframe.

In [0]:
# Make a plot with myplot(df, .....)

# Make a plot with myplot([]) 
# (remove the hashtags to execute the commands and make the plot)
# foo = df['Date']
# bar = df['p'] * 100
# myplot([foo,bar])

### Multiple variables
If you want to plot multiple variables in a graph, you can provide multiple lists as arguments, one list for each series to be plotted. So for the two different plot methods, this becomes:
* `myplot(df,['Date','K_in'], ['Date','K_out_m'])`: plot both `K_in` and `K_out_m` as a function of date
* `myplot([x, y], [x,z])`: plot both `y` and `z` as a function of `x` (where the variables `x`, `y` and `z` have been defined before).

But if you plot multiple variables, how can you distinguish them? In two ways:
* the plotting routine will automatically assign a new colour to the next plot
* you can select the plotting type:
  * `myplot(df,['Date','K_in','-'])`: plot a line
  * `myplot(df,['Date','K_in','o'])`: plot dots
  * `myplot(df,['Date','prec','#'])`: bar graph (only one series per graph)
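The calling pattern above (one list per series, with an optional third element for the plot type) is plain Python. The sketch below is not how `myplot` is implemented; `summarize_series` is a hypothetical helper that only shows how a function can receive any number of such lists and read their elements.

```python
def summarize_series(*series):
    """Hypothetical helper: describe each [x, y, style] list it receives."""
    styles = {'-': 'line', 'o': 'dots', '#': 'bars'}
    summary = []
    for spec in series:
        # The first two elements are the x and y variable names
        x_name, y_name = spec[0], spec[1]
        # The third element is optional; falling back to a line is an
        # assumption of this sketch, not documented behaviour of myplot
        style = spec[2] if len(spec) > 2 else '-'
        summary.append(f"{y_name} vs {x_name} as {styles[style]}")
    return summary

# Two series, each given as its own list
result = summarize_series(['Date', 'K_in', '-'], ['Date', 'K_in_m', 'o'])
print(result)
```

Each list is a self-contained description of one series, which is why you can add or remove a series from a plot by adding or removing a list.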

In [0]:
# Compare two independent observations of global radiation (remove the hashtag to make the plot)
# myplot(df,['Date','K_in','-'],['Date','K_in_m','o'])

### Further tweaking of your plots

Now that you know how to make basic plots, it is time to add some extra options. The main message here: it is good to know that these things are possible. Use them when you need them.

#### Axis labels
Set the label text on the x-axis and y-axis (in particular relevant if you do not plot from a dataframe): 
* `myplot( [x, y], xlabel='wind speed (m/s)', ylabel='temperature (K)')`

#### Name of series in legend
You can now manually set the name of a series, to be used in the legend (again, relevant if no dataframe is used): 
* `myplot( [x, y, 'o', 'my specially constructed variable'] )`. 

Note that in this case you *should* specify the type of plotting symbol (here a dot: `'o'`, could also be `'-'` for line and `'#'` for a bar graph).

#### Color dots in scatter plot with a 3rd variable (color_by)
You can now color dots in a scatter plot with the values of a third variable. This works both when you plot from a dataframe (first argument `df`) and when you plot from variables that you defined yourself (say `x`, `y` and `c`):
  * `myplot(df, ['Date', 'LvE_m', 'o'], color_by = 'T_1_5')`
  * `myplot([x, y, 'o'], color_by = c)`
  
#### Log axis and linear axis
You can specify if an axis should be linear or logarithmic, separately for the x-axis and the y-axis. You do this with the keywords `x_axis_type` and `y_axis_type`:
  * `myplot([x, y, 'o'], x_axis_type = 'linear', y_axis_type = 'log')`

#### Axis limits
You can specify axis limits for both axes (rather than letting the plot command auto-scale). You can also define limits for only one of the axes. The keywords are `xlim` and `ylim`, and both require a list (2 numbers between square brackets):
  * `myplot([x, y, 'o'], xlim = [0,10])`

In [0]:
# Do some experiments with the more advanced plotting options (remove hashtags to execute commands and make plot)
# x = df['u_10']
# y = df['T_1_5']
# z = df['K_in']
# myplot([x,y,'o','my special correlation'], color_by=z, xlabel='wind speed (m/s)', ylabel='temperature (K)')

## Functions in Python
There are roughly two reasons to use so-called *functions* in Python:
* When you apply a **complex operation** to one or more variables, your Python code can become unclear -> replacing the complex operation by a simple function call clarifies your code
* When you apply a certain operation **repeatedly** to different datasets, you easily make errors when copying code -> defining the function only once ensures that your operation is always done in the same way

You could consider a *function* as a way to hide the complexity of certain operations. It is like a black box: some variables are thrown into the function, something happens inside the box, and the black box gives you back some results.

The definition of a function conceptually consists of four parts:
* the *name* of the function
* the *interface* of the function: the variables that will be used inside the function for the computations, and that come from outside the function
* the actual *operations* 
* a *return* command that gives back the result to the outside world. 

Let's assume that we need to compute the evaporative fraction repeatedly: $L_vE/(Q^*-G)$. The four parts defined above then would be:
* name: `evap_frac`
* interface: `(LvE, Qnet, G)`
* operations: `EF = LvE/(Qnet - G)`
* return: `return EF`

Putting all of this together gives:

    # The function definition starts with the word 'def',
    # followed by the name and the interface, and ends with a colon (':')
    def evap_frac(LvE, Qnet, G):
        # The actual calculation
        EF = LvE / (Qnet - G)
        # Return the outcome to the outside world
        return EF
        
Once you have defined the function, you can use it. Let's try this in the cell below.

In [0]:
# Define the function
def evap_frac(LvE, Qnet, G):
    # The actual calculation
    EF = LvE / (Qnet - G)
    # Return the outcome to the outside world
    return EF

# Now use it with the data
# Note that the names of the variables that go into the function have no connection to the
# names of the variables inside the function (we could also invoke the function with evap_frac(x,y,z))
my_EF = evap_frac(df['LvE_m'],df['Q_net_m'], df['G_0_m'])

# Print it to show that we did something useful
print(my_EF)
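Because the function only relies on subtraction and division, the same definition works for single numbers and for whole columns. A minimal sketch with made-up values (the variable names `latent`, `net_rad` and `soil` are illustrative and deliberately differ from the names inside the function):

```python
import pandas as pd

def evap_frac(LvE, Qnet, G):
    # Evaporative fraction: latent heat flux divided by available energy
    EF = LvE / (Qnet - G)
    return EF

# With single numbers: 60 / (130 - 10)
ef_single = evap_frac(60.0, 130.0, 10.0)

# With series: the calculation is done for each row
latent = pd.Series([60.0, 30.0])
net_rad = pd.Series([130.0, 70.0])
soil = pd.Series([10.0, 10.0])
ef_series = evap_frac(latent, net_rad, soil)

print(ef_single)
print(ef_series.tolist())
```

Inside the function the values are known as `LvE`, `Qnet` and `G`, whatever they were called outside: only the order of the arguments matters.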

The calculation in the example function above was rather simple. However, you can make functions as complex as you like. Calculations in Python are quite straightforward: +, -, \*, / and \*\* indicate addition, subtraction, multiplication, division and exponentiation (... to the power ...). For more complex operations, we can use functions defined in the `numpy` library (which we imported as `np`). For instance, for the exponential (exp) we use `np.exp`.
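As a small illustration of these operators combined with a `numpy` function, consider the sketch below. The function `scaled_exp` and its arguments are hypothetical and only serve as an example of a function that uses `np.exp`.

```python
import numpy as np

# The basic arithmetic operators
total = 7 + 2        # addition
difference = 7 - 2   # subtraction
product = 7 * 2      # multiplication
ratio = 7 / 2        # division
power = 7 ** 2       # exponentiation (7 to the power 2)

# A hypothetical function that uses np.exp: a * exp(b * x)
def scaled_exp(x, a, b):
    return a * np.exp(b * x)

# np.exp works on a whole array at once
values = scaled_exp(np.array([0.0, 1.0, 2.0]), 2.0, 1.0)

print(total, difference, product, ratio, power)
print(values)
```

Like the arithmetic operators, `np.exp` is applied to each element separately, so the same function also works on a column of a dataframe.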

## Up to the next exercise
Now that you finished this brief intro, it is time to start with the real work. Continue to Step 1.