# Python basics
***

### General comments
The first step in every Python script is to load the packages we will use during the analysis. A package is a set of tools that extends Python's built-in functionality.

There are five packages that are commonly used and that we will usually load: 
*  __[NumPy](http://www.numpy.org/)__ is a fundamental package for scientific computing that includes N-dimensional array objects, linear algebra, Fourier transforms, random number capabilities... __NumPy__ uses a vector structure called *array*; all the data in an *array* must be of the same type (integer, floating point number, string...). To import __NumPy__, use the following command:
> ```Python
import numpy as np
```

*  __[pandas](https://pandas.pydata.org/)__ is a package that allows organizing data in a structure named *data frame*. *Data frames* resemble the usual Excel table, in the sense that columns represent variables and rows represent samples. All the elements of a column (variable) must be of the same type (integer, string...), but different columns may differ in the type of data they contain. Like Excel tables, a _data frame_ has an index and a header that identify rows and columns, respectively, and allow us to look up specific values. To import __pandas__, use the following command:
> ```Python
import pandas as pd
```

* __[matplotlib](https://matplotlib.org/)__ is a package designed to plot graphs similar to those in Matlab. To import __matplotlib__, you need the following commands:
> ```Python
import matplotlib.pyplot as plt
%matplotlib inline
# note: matplotlib >= 3.6 renames this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
```

* __[SciPy](https://www.scipy.org/)__ contains several numerical tools that are efficient and easy to apply, e.g., numerical integration and optimization. We will not load the complete set of tools in __SciPy__, only those we need:
> ```Python
from scipy.stats import genextreme
from scipy.optimize import curve_fit
```

* [__os__](https://docs.python.org/3.4/library/os.html) is a module of the Python standard library (so it needs no installation) that allows us to change the working directory, create new directories, list the files contained in a directory, etc. To import it:
> ```Python
import os
```
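
As a minimal sketch of the three tasks mentioned for __os__, the snippet below works inside a temporary directory so that it leaves nothing behind. The folder name `results` and the use of the standard `tempfile` module are illustrative choices, not part of the course material.

```python
import os
import tempfile

start_dir = os.getcwd()                      # remember where we started

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                            # change the working directory
    os.makedirs('results', exist_ok=True)    # create a new directory
    contents = os.listdir('.')               # list what the directory contains
    print(contents)
    os.chdir(start_dir)                      # go back before the folder is deleted
```

In a real analysis you would pass your own project folder to `os.chdir` instead of a temporary one.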

In [None]:
import numpy as np

import pandas as pd

from matplotlib import pyplot as plt
%matplotlib inline
# note: matplotlib >= 3.6 renames this style to 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')

from scipy.stats import genextreme
from scipy.optimize import curve_fit

import os

If any of these packages is not installed, you can install it as follows (example for SciPy):<br>
*  Launch Anaconda Prompt<br>
*  Type `conda install scipy` + `Enter`<br>

We're going to install a variable inspector to be able to check the existing objects in our analysis:<br>
*  Launch Anaconda Prompt<br>
*  Type:
> `pip install jupyter_contrib_nbextensions` + `Enter`<br>
`jupyter contrib nbextension install --user` + `Enter`<br>
`jupyter nbextension enable varInspector/main` + `Enter`<br>

### Basic data structures in Python
**Lists**<br>
Lists are a data structure that can contain data of any type (integers, floats, strings...) in a single object. Lists are mutable, meaning that we can modify the values inside a list after it has been created.
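
A short sketch of both properties, mixed types and mutability, could look like this. The variable name `values` and its contents are made up for illustration.

```python
# a list can mix integers, strings and floats
values = [3, 'apple', 2.5]

second = values[1]      # indexing starts at 0, so this is the second element
last = values[-1]       # negative indices count from the end

values[0] = 10          # lists are mutable: replace the first value
values.append('pear')   # and add a new value at the end
print(values)
```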

In [None]:
# create a list
a = [1, 'hello', 1.5]

In [None]:
# extract a value from the list


In [None]:
# modify one of the values in the list


**Tuples**<br>
Tuples are a data structure similar to lists because they can also contain data of any type. Unlike lists, tuples cannot be modified once they have been created.
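
To see what "cannot be modified" means in practice, the sketch below reads a value from a tuple and then tries to overwrite it; Python raises a `TypeError`, which is caught here so the example runs to the end. The tuple `point` is a hypothetical example.

```python
point = (2, 'red', 4.5)

first = point[0]        # reading a value works exactly as with lists

try:
    point[0] = 5        # writing a value is not allowed
except TypeError as error:
    message = str(error)
    print('tuples are immutable:', message)
```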

In [None]:
# create a tuple
b = (2, 'red', np.nan)

In [None]:
# extract a value from the tuple


In [None]:
# modify one of the values in the tuple


**Arrays**<br>
This is a structure specific to the package *NumPy* that allows us to work with vectors and matrices, and to perform calculations on them easily. All the values in an array must be of the same data type.
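
The sketch below illustrates the single-type rule and a few common operations. Variable names are illustrative; note how NumPy converts a mixed list to strings so that every element shares one type.

```python
import numpy as np

# mixed input is converted to a single type (here, strings)
mixed = np.array([1, 'hello', 1.5])
print(mixed.dtype)

values = np.array([1.5, 2.1, 4.5])
first_two = values[:2]           # slice: the first two values
backwards = values[::-1]         # the same values in reverse order
mean_value = values.mean()       # arithmetic mean of all values

modified = values.copy()         # work on a copy so 'values' stays unchanged
modified[0] = 3.0                # arrays are mutable, like lists
print(modified)
```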

In [None]:
# create an array from the list 'a'


In [None]:
# create an array
c = np.array([1.5, 2.1, 4.5])

In [None]:
# extract values from the array


In [None]:
# invert (reverse) the order of the array


In [None]:
# modify a value in the array


In [None]:
# calculate the mean of the array


**Pandas: _series_ and _data frames_**<br>
_Pandas_ is a package for working with two-dimensional (_data frames_) or one-dimensional (_series_) tables. Its structures build on the tools in *NumPy*, which makes many operations on a table easy to perform. In _Pandas_, all the data contained in a column of the table must be of the same type; different columns may have different types of data.
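
As a rough illustration, the sketch below builds a small data frame from made-up values, extracts one column as a series and computes means. The table `people` and its contents are hypothetical; `numeric_only=True` restricts the calculation to numeric columns, since a mean is not defined for the text column.

```python
import pandas as pd

people = pd.DataFrame({'name': ['Ana', 'Luis', 'Eva'],
                       'age': [31, 45, 28],
                       'height': [1.65, 1.80, 1.72]})

ages = people['age']             # a single column is a Series
print(type(ages))
print(people.dtypes)             # each column has its own data type

mean_age = ages.mean()                            # mean of one column
numeric_means = people.mean(numeric_only=True)    # mean of every numeric column
print(numeric_means)
```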

In [None]:
# create a 'data frame' with name, age and weight
d = [['Peter', 36, 71],
     ['Laura', 40, 58],
     ['John', 25, 65]]
d = pd.DataFrame(data=d, columns=['name', 'age', 'weight'])
d

In [None]:
# a column in a data frame is a series


In [None]:
# calculate the mean of the numeric columns of the data frame


**Dictionaries**<br>
A dictionary can store several objects, including any of the data structures mentioned above, in a single object. Each element is stored under a _key_, which we use to access it.
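
A minimal sketch of storing and retrieving objects by key follows. The dictionary `structures` and its contents are illustrative only.

```python
import numpy as np

structures = {'list': [1, 'hello', 1.5],
              'tuple': (2, 'red'),
              'array': np.array([1.5, 2.1, 4.5])}

array_values = structures['array']       # access an element by its key
keys = list(structures.keys())           # all the keys, in insertion order

structures['number'] = 42                # add a new key-value pair
missing = structures.get('dataframe')    # .get returns None if the key is absent
print(keys)
```

Accessing an absent key with square brackets raises a `KeyError`, whereas `.get` returns `None`.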

In [None]:
# create a dictionary that contains all the data structures previously created
# in this example, the key will be the type of structure
e = {'list': a,
     'tuple': b,
     'array': c,
     'dataframe': d}

In [None]:
# extract one of the structures from the dictionary
