# External libraries of functions
External libraries contain functions that perform specific tasks such as statistical analysis or data visualization. These libraries complement Python's built-in functions, and to be able to use them **we need to import them** into our Notebook. We are going to look only at some of the most popular Python libraries, such as NumPy and Pandas. The libraries that we will see come with Anaconda, so we will not need to install them. These are some of the external libraries that Anaconda installs by default.

<img src='util/Anaconda_librerias.png'>

## 1. NumPy
NumPy is a library of mathematical functions that is very popular for performing mathematical operations on vectors and matrices. NumPy has many functions, and in this course we only cover a few. If you want more information about NumPy and all the functions it contains, you can go to this link: https://numpy.org/doc/stable/user/index.html

<img src='util/Numpy_funciones.png'>

Since NumPy is an external library, i.e. its functions are not built into Python, the first thing we must do to be able to use them is to import NumPy and give it a name, for example *np*.

In [None]:
import numpy as np # We import the NumPy library as np (we can give it another name if we want)

From now on to use NumPy functions we must put np. followed by the name of the function.

⚡ If in a cell we put np. and press the Tab key we will see the list of available NumPy functions. As you can see, there are lots of them.

In [None]:
np.

### 1.1 Array
A very important concept in NumPy is the *array*, which is the name given to the value structures that NumPy uses. An *array* is similar to a list of values, but in a format that NumPy recognizes and that is compatible with NumPy functions.

With the NumPy function ***array*** we can create an array (vector or matrix) from a list of values.

In [None]:
myList = [2,5,10,20,6] # create a list of values
myArray = np.array(myList) # create an array from the list of values

In [None]:
myList

In [None]:
myArray

myArray now holds the same values as myList, but in the *array* format, which makes it compatible with NumPy functions.
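To see what the *array* format changes in practice, here is a minimal sketch that applies the same operation to the list and to the array. The names `doubledList` and `doubledArray` are only illustrative and are not used elsewhere in this notebook.

```python
import numpy as np

myList = [2, 5, 10, 20, 6]      # a plain Python list
myArray = np.array(myList)      # the same values as a NumPy array

doubledList = myList * 2        # for a list, * repeats the values (10 elements)
doubledArray = myArray * 2      # for an array, * multiplies each element by 2

print(doubledList)
print(doubledArray)
```

The list is simply repeated, while the array is operated on element by element, which is the behaviour we usually want for numerical work.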

### 1.2 Basic functions of NumPy
NumPy contains a number of basic functions that are very similar to some of Python's built-in functions, for example:

To create sequences of values

In [None]:
myArray = np.arange(1,10,1)

Here myArray is a sequence of values of type *array* (in this case a vector), which is NumPy's own format.

In [None]:
myArray
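Note that *arange*, like Python's built-in *range*, takes a start, a stop and a step, and that the stop value itself is not included. The sketch below illustrates this; `evenArray` is a hypothetical name used only for this example.

```python
import numpy as np

myArray = np.arange(1, 10, 1)     # start=1, stop=10 (not included), step=1
print(myArray)                    # values from 1 to 9
print(list(range(1, 10, 1)))      # the built-in range gives the same values

evenArray = np.arange(0, 11, 2)   # to include 10, the stop must be larger than 10
print(evenArray)
```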

To calculate the maximum value

In [None]:
np.max(myArray)

In [None]:
max(myArray)

❗ As you can see, the two functions have the same name *max*, but we can distinguish the one that is from NumPy because we must put *np.* before the name of the function.

To calculate the number of values or elements we have the function *size* which is also similar to the Python built-in function called *len*.

In [None]:
np.size(myArray)

In [None]:
len(myArray)
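For a vector, *size* and *len* return the same number, but they differ for a matrix. A minimal sketch, assuming a small hypothetical matrix called `exampleMatrix`:

```python
import numpy as np

# Hypothetical matrix with 3 rows and 4 columns, only for illustration
exampleMatrix = np.array([[1, 2, 3, 4],
                          [5, 6, 7, 8],
                          [9, 10, 11, 12]])

print(np.size(exampleMatrix))   # total number of elements (rows x columns)
print(len(exampleMatrix))       # number of rows only
print(exampleMatrix.shape)      # (rows, columns)
```

So if we need the total number of values in a matrix, *np.size* is the safer choice.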

There are other very useful functions that come with NumPy and that are not available as Python built-in functions.

In [None]:
np.sqrt(81) # to calculate the square root of a number

In [None]:
np.zeros(10) # to create an array of a certain size where all the elements are 0

In [None]:
np.mean(myArray) # to calculate the mean value of a list or an array

In [None]:
np.random.random(10) # to create an array of a given size with random numbers

⚡ **Exercise:** Now try to create a matrix of random values. The matrix must have 10 rows and 5 columns

⚡ **Exercise:** Now try to create a matrix of random values that follow a normal distribution. The matrix must have 10 rows and 5 columns, and the values must have a mean of 10 and a standard deviation of 3. Store it in a variable called *myMatrix*.

⚡ **Exercise:** Now calculate the mean of *myMatrix*.

## 2. Pandas
Pandas is a Python library specialized in the management and analysis of data structures, in particular data tables. With Pandas you can:

- Easily read and write files in CSV and Excel format.
- Extract data from tables using indices or names for rows and columns.
- Reorder, split, and combine data sets.
- Work with time series and easily handle dates.

Pandas has many functions, and in this course we only cover a few. If you want more information about Pandas and all the functions that it contains, you can go to this link: https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html

<img src='util/Pandas_funciones.png'>

Since Pandas is an external library, the first thing we must do is to import Pandas and give it a name, for example *pd*.

In [None]:
import pandas as pd

From now on to use the Pandas functions we must put pd. followed by the name of the function.

⚡ If in a cell we put pd. and press the Tab key we will see the list of available Pandas functions.

In [None]:
pd.

If you want to know more about how to create and modify dataframes: https://pandas.pydata.org/docs/user_guide/index.html

### 2.1 Pandas and Excel
#### Read Excel files and save their data as Pandas dataframes
One of the best features of Pandas is that you can read Excel files and extract their data as *dataframes*.

A very important concept in Pandas is the *dataframe*, which is the name given to the value structures that Pandas uses. The *dataframe* is the core of Pandas, and we can think of it as an Excel table. We are going to create a *dataframe* with Pandas to better understand what it is and what it can be used for.
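Before reading a real file, it may help to see what a *dataframe* looks like when built by hand. The sketch below is only an illustration: the name `example_df` and all the values in it are made up, and the column names simply mirror the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical values, only for illustration
example_df = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03']),
    'rain': [0.0, 12.5, 3.2],
    'outflow': [3500, 4200, 3900],
})

print(example_df)           # rows and named columns, like an Excel table
print(example_df.columns)   # the column names
print(example_df.shape)     # (number of rows, number of columns)
```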

With the function ***read_excel***, Excel tables are converted to *dataframes* so that we can use the Pandas functions on them.

In [None]:
data_df = pd.read_excel('Datos/data example 1.xlsx')

Let's see the data stored in the dataframe *data_df*

In [None]:
data_df

As in Excel, the left column contains the index value of each row and the first row contains the name of each column.

But we would like to have the *date* column as the index. For this purpose we need to run the code below.

In [None]:
data_df = pd.read_excel('Datos/data example 1.xlsx',index_col = 'date')
data_df # to print the data on screen

If we want to extract only part of the data, we can for example extract only the data from the column *rain*.

In [None]:
data_df["rain"]

In [None]:
data_df["rain"]>10

Let's extract the data from the dataframe *data_df* corresponding to days with rainfall higher than 10. 

In [None]:
data_df[data_df["rain"]>10]
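The two previous cells work together: the comparison produces a column of True/False values (a mask), and passing that mask inside the brackets keeps only the rows where it is True. A minimal sketch on made-up data, where `example_df`, `mask` and `wet_days` are hypothetical names:

```python
import pandas as pd

# Hypothetical rainfall values, only for illustration
example_df = pd.DataFrame(
    {'rain': [0.0, 12.5, 3.2, 25.0]},
    index=pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04']),
)

mask = example_df['rain'] > 10   # one True/False value per row
print(mask)

wet_days = example_df[mask]      # keeps only the rows where the mask is True
print(wet_days)
```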

Now let's have a look at the index column of the dataframe

In [None]:
data_df.index

Pay attention to the type of data stored in each column. Note that the index column has a datetime format. This allows us to extract data for certain dates.

In [None]:
data_df.index.year

In [None]:
data_df.index.year == 2010

Let's extract only the data that corresponds to the year 2018
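One possible way to do this is to use the comparison on the index year as a True/False mask, in the same way as with the *rain* column. This is a sketch on made-up data rather than on *data_df*; the names `example_df`, `year_mask` and `data_2018` are hypothetical.

```python
import pandas as pd

# Hypothetical values around a change of year, only for illustration
example_df = pd.DataFrame(
    {'rain': [5.0, 0.0, 12.5, 3.2]},
    index=pd.to_datetime(['2017-12-30', '2017-12-31', '2018-01-01', '2018-01-02']),
)

year_mask = example_df.index.year == 2018   # True for the rows dated in 2018
data_2018 = example_df[year_mask]           # keep only those rows
print(data_2018)
```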

In [None]:
data_df['2011-02-01':'2011-02-28'] # we can also extract data comprised between two dates

In [None]:
feb_df = data_df['2011-02-01':'2011-02-28'] # data comprised between two dates
feb_df[feb_df['outflow']>4000] # and add another condition on those same rows

We can also save the extracted data into an Excel file by using the Pandas function *to_excel*
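A minimal sketch of how this could look, using made-up data and a hypothetical file name (`wet_days_example.xlsx`). Writing `.xlsx` files requires an Excel writer package such as openpyxl, which Anaconda normally includes; the `try`/`except` is only there so that the example still runs if that package is missing.

```python
import pandas as pd

# Hypothetical rainfall values, only for illustration
example_df = pd.DataFrame(
    {'rain': [0.0, 12.5, 3.2, 25.0]},
    index=pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04']),
)
wet_days = example_df[example_df['rain'] > 10]   # the extracted data

try:
    wet_days.to_excel('wet_days_example.xlsx')   # hypothetical output file name
    print('Data saved')
except ImportError:
    print('An Excel writer such as openpyxl needs to be installed')
```

The index (here the dates) is written as the first column of the Excel file.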

## 3. Matplotlib
Matplotlib is a Python library specialized in creating two-dimensional plots. It lets you create and customize the most common types of charts:
- Line graphs
- Bar graphs
- Dot plots
- Box and whisker plots
...

In this course we are going to see how to create a simple graph and how to customize it with a few functions.

<img src='util/Matplotlib_funciones.png'>

If you want to know more about the different types of graphs available with Matplotlib, see https://matplotlib.org/2.0.2/gallery.html. For the parameters we can use to modify them, see https://matplotlib.org/stable/plot_types/index

The first thing is to import the library. Specifically, we must import a Matplotlib module called `pyplot`, which we are going to call `plt`.

In [None]:
import matplotlib.pyplot as plt

Let's plot the rainfall data from the *data_df* dataframe using the function plot.

In [None]:
plt.plot(data_df['rain'])

We would like to change the size of the plot area to make the plotted data easier to see

In [None]:
plt.figure(figsize=(15,3)) # to define the plot size
plt.plot(data_df['rain'])

Now we would like to change the color of the plotted line

In [None]:
plt.figure(figsize=(15,3))
plt.plot(data_df['rain'], color = 'blue') # we use the parameter color inside the plot function

Let's now indicate the units of the y axis, in this case mm/day

In [None]:
plt.figure(figsize=(15,3))
plt.plot(data_df['rain'], color = 'blue')
plt.ylabel('mm/day')

Now we would like to plot only data for the year 2018

In [None]:
plt.figure(figsize=(15,3))
plt.plot(data_df['2018-01-01':'2018-12-31']['rain'], color = 'blue')
plt.ylabel('mm/day')

What if we would like to represent the data as a bar plot instead of a line? For that purpose we can use the function *bar* instead of *plot*

In [None]:
plt.figure(figsize=(15,3))
plt.bar(data_df['2018-01-01':'2018-12-31']['rain'], color = 'blue')
plt.ylabel('mm/day')

⚡ We get an error, do you know why? Hint: have a look at the help menu of the function *bar*

In [None]:
plt.figure(figsize=(15,3))
plt.bar(data_df['2018-01-01':'2018-12-31'].index,data_df['2018-01-01':'2018-12-31']['rain'], color = 'lightblue')
plt.ylabel('mm/day')
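The difference between the two functions is in the arguments they need: *plot* accepts a single set of values and takes the x positions from its index, while *bar* needs both the x positions and the bar heights. A minimal sketch on made-up data, where `example_rain` and `bars` are hypothetical names:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rainfall values, only for illustration
example_rain = pd.Series(
    [0.0, 12.5, 3.2, 25.0],
    index=pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04']),
)

plt.figure(figsize=(15, 3))
# first argument: x positions (the dates), second argument: bar heights (the rainfall)
bars = plt.bar(example_rain.index, example_rain.values, color='lightblue')
plt.ylabel('mm/day')
```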

### Subplots
With the Matplotlib function *subplots* we can combine different data in the same figure.

In [None]:
fig, ax = plt.subplots(figsize=(15,6))
ax.plot(data_df['2018-01-01':'2018-12-31']['outflow'], color = 'darkblue')
ax.set(ylabel = 'm3/day')

Now let's combine the rainfall and outflow graphs in the same figure, one above the other

In [None]:
fig, ax = plt.subplots(2,1,figsize=(15,6)) # 2 rows and 1 column of plots

ax[0].bar(data_df['2018-01-01':'2018-12-31'].index,data_df['2018-01-01':'2018-12-31']['rain'], color = 'blue')
ax[0].set(ylabel='mm/day')

ax[1].plot(data_df['2018-01-01':'2018-12-31']['outflow'], color = 'darkblue')
ax[1].set(ylabel = 'm3/day')

Or combine the rainfall and outflow data in the same graph

In [None]:
fig, ax = plt.subplots(figsize=(15,6))

ax.bar(data_df['2018-01-01':'2018-12-31'].index,data_df['2018-01-01':'2018-12-31']['rain'], color = 'blue')
ax.set(ylabel='mm/day')

ax2 = ax.twinx()
ax2.plot(data_df['2018-01-01':'2018-12-31']['outflow'], color = 'darkblue')
ax2.set(ylabel = 'm3/day')