# The fun part: Scientific & Numerical Data Structures
Although `Python` was not originally designed for scientific data, its community-based and free nature allowed for the development of packages and libraries that handle it nicely & efficiently. 

## Beyond the data types and collections we have already seen, scientific computation needs specific data structures, because we want to handle efficiently:
- large amounts of data
- multiple dimensions
- mathematical and statistical operations over parts of the data set or the whole of it
- metadata and data attributes

### Here we will use these basic modules of scientific & numerical computing: `math`, `scipy`, `numpy`, `pandas`, `xarray` & `matplotlib`

Note: these packages are also referred to as libraries or modules.

*** 
## `math` module
Basic math operations (as functions) and constants (as attributes)

<div class="alert alert-block alert-info">
    - Execute the code below to try different ways to import the <b>math</b> module. 
    <br>
</div>

In [None]:
import math 
import math as m
from math import pi
print(math.pi, m.pi, pi,'\n\n')

<div class="alert alert-block alert-info">
    - Try the next code to use some functions from the math module
</div>

In [None]:
rads = 2*pi
degrees = math.degrees(rads)
print('{} radians is equal to {} degrees'.format(rads,m.floor(degrees)))

***
## `SciPy` is a mathematics, science & engineering ecosystem in `Python`. 
## It is also the name of the library which contains core numerical routines.
<img src='images/scipy_logo.png' width=300>

## Within this ecosystem, we will use three basic modules: `numpy`, `pandas` & `matplotlib`. The `SciPy` library is integrated with these packages.

***
<img src='images/numpy_logo.jpeg' width=300>

## `NumPy` is the basic scientific module in `Python`. Its most important characteristics: 

- Multi-dimensional array objects (ordered, changeable, allow duplicates)
- Broadcasting functions
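
To make the second point concrete, here is a minimal sketch of broadcasting. The array names `grid` and `offsets` are illustrative and not part of the exercises below. When two shapes are compatible, `NumPy` stretches the smaller operand across the larger one, so no explicit loop is needed.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)     # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
offsets = np.array([10, 20, 30])      # shape (3,)

# The 1-D array is broadcast across each row of the 2-D array
shifted = grid + offsets
print(shifted)

# A scalar is broadcast to every element
doubled = grid * 2
print(doubled)
```

Here `shifted` has the same shape as `grid`, because the three offsets are reused for every row.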

<div class="alert alert-block alert-info">
    - Execute the code in the next cell to import numpy and define two arrays
    <br>
    - Print both objects
    <br>
    - Print the type of object of <b>a</b>
    <br>
    - Print the element of <b>b</b> equal to 7, using the indexing <b>b[row,column]</b>. Remember that indices start at 0
</div>

In [None]:
import numpy as np
a=np.ones((3,5))
b=np.arange(15).reshape(3,5)

<div class="alert alert-block alert-info">
    - Print the following attributes of the defined <b>numpy</b> objects: ndim, shape
    <br>
    - Print the output of the following methods of a <b>numpy</b> object: max(), max(axis=0), sum()
</div>

<div class="alert alert-block alert-info">
    - Try an elementwise operation between <b>a</b> & <b>b</b>, like + or *
</div>


### Indexing a `numpy` array
Numpy indexing is very logical. Just remember that indices start at zero.
<div class="alert alert-block alert-info">
    - Run the code in the following cell
    <br>
    - Then add the necessary code to print the correct element(s) in the next cell
    <br>
    <b>Hint:</b> Use <b>-1</b> to indicate the last element, and <b>n:</b> or <b>:n</b> for "n to end" & "first to n" elements
</div>

In [None]:
c=a+b
print('Entire array')
print(c)
print('Second row')
print(c[1])
print('First column')
print(c[:,0])

In [None]:
# print the second element in the first row

# print the last two elements of the second column

# print the elements of the last column

# use the syntax c[[r1,r2],[c1,c2]] to print the first and last elements of the array
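
The list-based indexing mentioned in the last comment can be surprising at first, so here is a small sketch on a separate, hypothetical array `m` (not the `c` used in the exercise). The row list and the column list are paired element by element; they do not select a rectangular block.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns, values 0..11

# Row and column lists are paired up: this picks m[0, 1] and m[2, 3]
picked = m[[0, 2], [1, 3]]
print(picked)

# Contrast with slicing, which selects a rectangular block
block = m[0:2, 1:3]
print(block)
```

`picked` is a 1-D array with one value per (row, column) pair, while `block` keeps two dimensions.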


***
<img src='images/pandas-logo.png' width=300>

## `Pandas` is a package for high-performance data structures. 
### Its best features include:
- 2-D Tables
- Indexing by labels and numerical indices

### Building a `pandas` `DataFrame` starts with defining the `Series` that will become its (column-wise) data
<div class="alert alert-block alert-info">
    - Try the code in the next cells
</div>

In [None]:
import pandas as pd
s=pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)

## Now let's create a `DataFrame`
### We need to define our elements as a dictionary first, and then create the DataFrame

In [None]:
# Dictionary - 2D
d = {'A':s, 'B':pd.Series([5,6,3,4,1],index=['a', 'b', 'c', 'd', 'e'])}
print(d)
print(type(d))
# Creating the DataFrame
print('\n*** DataFrame ***\n')
df = pd.DataFrame(d)
print(df)

### Creating a DataFrame from a np.array

In [None]:
df2 = pd.DataFrame(np.arange(15).reshape(3,5),index=['a','b','c'],columns=['c1','c2','c3','c4','c5'])
print(df2)

### Finally, accessing the data in a `DataFrame`

In [None]:
print('Second column')
print(df2.c2)
print('\nFirst column')
print(df2['c1']) 
print('\nAdding column')
df2['R']=df2.c2+df2.c3
print(df2)
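
Indexing by labels and by numerical position, listed earlier as a key `pandas` feature, can be sketched with the `.loc` and `.iloc` accessors. The table below is a hypothetical example with made-up station names and values; it is independent of `df2`.

```python
import pandas as pd

# Hypothetical table: rows labelled by station, columns by variable
table = pd.DataFrame(
    {"temp": [12.5, 15.0, 9.8], "rain": [0.0, 2.3, 5.1]},
    index=["north", "south", "east"],
)

by_label = table.loc["south", "rain"]   # label-based lookup
by_position = table.iloc[1, 1]          # position-based lookup (row 1, column 1)
print(by_label, by_position)
```

Both lookups return the same cell; `.loc` uses the row and column labels, while `.iloc` counts positions from zero.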

<img src='images/xarray-logo-square.png' width=300>

## `xarray` is another, more sophisticated package for scientific computing
### Objects are multidimensional & labelled arrays ... and the labels are not only dimension names, but can also be coordinates
### `xarray` objects are modeled on the `netCDF` file format; they carry metadata & attributes
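
As a rough illustration of these ideas, the sketch below builds a small `DataArray` with named dimensions, coordinate values and attributes. The variable name, coordinate values and attribute entries are invented for the example, and it assumes the `xarray` package is installed.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D field: 2 time steps x 3 latitudes
temperature = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("time", "lat"),
    coords={"time": [0, 1], "lat": [-10.0, 0.0, 10.0]},
    attrs={"units": "degC", "long_name": "air temperature"},
    name="temperature",
)

print(temperature.dims)    # dimension names
print(temperature.attrs)   # metadata travels with the data

# Select by coordinate value rather than by integer position
equator = temperature.sel(lat=0.0)
print(equator.values)
```

Selecting with `sel` uses the coordinate value (`lat=0.0`) instead of an integer index, which is the main practical difference from plain `numpy` indexing.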