# The fun part: Scientific & Numerical Data Structures
## Beyond the data types and collections we have already seen, scientific computation needs specific data structures.
## We want to be able to handle (efficiently):
- large amounts of data
- multiple dimensions
- mathematical and statistical operations over parts of, or the whole, data set
- metadata and data attributes

### Thanks to the free, community-based and block-building nature of `Python`, we have awesome scientific packages that can do it!

### Here we will review and use: `math`, `scipy`, `numpy`, `pandas`, `xarray`, & `matplotlib`

Note: these packages are also referred to as libraries or modules

*** 
## `math` module
Basic math operations (as functions) and numbers (as attributes)

<div class="alert alert-block alert-info">
    - Try the next code to use the <b>math</b> module. 
    <br>
    - Try some other functions/attributes listed at: https://docs.python.org/3/library/math.html
    </div>

In [None]:
import math
pi = math.pi
rads = 2*pi
degrees = math.degrees(rads)
print('{} radians is equal to {} degrees'.format(rads,math.floor(degrees)))
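
As a further illustration, here is a minimal sketch of a few other `math` functions and constants. The variable names (`hypotenuse`, `angle`) are chosen only for this example.

```python
import math

# Constants are exposed as module attributes.
print(math.e)

# Functions take and return plain Python numbers.
hypotenuse = math.sqrt(3**2 + 4**2)   # length of the long side of a 3-4-5 triangle
angle = math.radians(180)             # degrees converted to radians
print(hypotenuse, angle)

# Floating-point results are rarely exact, so compare with a tolerance.
print(math.isclose(math.sin(angle), 0.0, abs_tol=1e-12))
```

Note that `math` works on single numbers; operations on whole arrays are handled by `numpy`.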

***
## `SciPy` is a mathematics, science & engineering ecosystem in `Python`. 
## It is also the name of the library which contains core numerical routines.
<img src='images/scipy_logo.png' width=300>

## Within this ecosystem, we will use three basic modules: `numpy`, `pandas`, & `matplotlib`. The `SciPy` library is integrated with these packages.

***
<img src='images/numpy_logo.jpeg' width=300>

## `NumPy` is the basic scientific module in `Python`. Its most important characteristics: 

- Multi-dimensional array objects
- Broadcasting functions
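
Broadcasting can be sketched as follows: when two arrays have different but compatible shapes, `numpy` stretches the smaller one so the operation applies elementwise. The arrays `grid`, `row`, and `col` are hypothetical examples, not part of the exercises.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# The row is repeated across both rows of the grid.
plus_row = grid + row
# The column is repeated across all three columns of the grid.
plus_col = grid + col

print(plus_row)
print(plus_col)
```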

<div class="alert alert-block alert-info">
    - Execute the code in the next cell to import numpy and define two arrays
    <br>
    - Print both objects
    <br>
    - Print the type of object of <b>a</b>
    <br>
    - Print the element of <b>b</b> equal to 7, using the indexing <b>b[row,column]</b>. Remember that indices start at 0
</div>

In [19]:
import numpy as np
a=np.ones((3,5))
b=np.arange(15).reshape(3,5)

<div class="alert alert-block alert-info">
    - Print the following attributes of the defined <b>numpy</b> objects: ndim, shape, data
    <br>
    - Print the output of the following methods of a <b>numpy</b> object: max, max(axis=0), sum
</div>
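
For reference, here is a small sketch of how attributes and methods behave, using a separate hypothetical array `m` so the exercise above stays open. Attributes are read without parentheses; methods are called with them, and `axis` selects the dimension that is collapsed.

```python
import numpy as np

m = np.array([[3, 1, 4], [1, 5, 9]])

print(m.ndim)         # number of dimensions: 2
print(m.shape)        # size along each dimension: (2, 3)
print(m.dtype)        # element type of the array
print(m.max())        # maximum over the whole array: 9
print(m.max(axis=0))  # maximum of each column: [3 5 9]
print(m.sum(axis=1))  # sum of each row: [ 8 15]
```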

<div class="alert alert-block alert-info">
    - Try an elementwise operation between <b>a</b> & <b>b</b>, like + or *
</div>


### Indexing a `numpy` array
Numpy indexing is very logical. Just remember that indices start at zero.
<div class="alert alert-block alert-info">
    - Run the code in the following cell
    <br>
    - Then add the necessary code to print the correct element(s) in the next cell
    <br>
    <b>Hint:</b> Use <b>-1</b> to indicate the last element, and <b>n:</b> or <b>:n</b> for "n to end" & "first to n" elements
</div>

In [29]:
c=a+b
print('Entire array')
print(c)
print('Second row')
print(c[1])
print('First column')
print(c[:,0])

Entire array
[[ 1.  2.  3.  4.  5.]
 [ 6.  7.  8.  9. 10.]
 [11. 12. 13. 14. 15.]]
Second row
[ 6.  7.  8.  9. 10.]
First column
[ 1.  6. 11.]
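
The indexing forms mentioned in the hint can be sketched on a different, hypothetical array `m`. In the last form, the first list holds the row indices and the second list holds the column indices.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# m is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

last_row = m[-1]             # negative index counts from the end
tail_of_first_col = m[1:, 0] # rows 1 to end, first column
head_of_last_col = m[:2, -1] # first two rows, last column
corners = m[[0, 2], [0, 3]]  # elements (0, 0) and (2, 3)

print(last_row, tail_of_first_col, head_of_last_col, corners)
```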


In [32]:
# print the second element in the first row

# print the last two elements of the second column

# print the last column

# use the syntax c[[r1,r2],[c1,c2]] to print the first and last elements of the array


***
<img src='images/pandas-logo.png' width=300>

## `Pandas` is a package for high-performance data structures. 
### Its best characteristics include:
- 2-D Tables
- Indexing by labels and numerical indices

### Building a `pandas` `DataFrame` starts with defining the `Series` that will become the (column-wise) data in the `DataFrame`
<div class="alert alert-block alert-info">
    - Try the code in the next cells
</div>

In [44]:
import pandas as pd
s=pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)

a    0
b    1
c    2
d    3
e    4
dtype: int64
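
A short sketch of how the same `Series` can be accessed by label or by position. The variable names `by_label`, `by_position`, and `subset` are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['c']        # lookup by index label
by_position = s.iloc[-1]     # lookup by integer position (last element)
subset = s.loc[['a', 'e']]   # several labels at once returns a Series

print(by_label, by_position, s.mean())
print(subset)
```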


## Now let's create a `DataFrame`
### We need to define our elements as a dictionary first, and then create the DataFrame

In [47]:
# Dictionary - 2D
d = {'A':s, 'B':pd.Series([5,6,3,4,1],index=['a', 'b', 'c', 'd', 'e'])}
print(d)
print(type(d))
# Creating the DataFrame
print('\n*** DataFrame ***\n')
df = pd.DataFrame(d)
print(df)

{'A': a    0
b    1
c    2
d    3
e    4
dtype: int64, 'B': a    5
b    6
c    3
d    4
e    1
dtype: int64}
<class 'dict'>

*** DataFrame ***

   A  B
a  0  5
b  1  6
c  2  3
d  3  4
e  4  1


### Creating a `DataFrame` from a `numpy` array

In [50]:
df2 = pd.DataFrame(np.arange(15).reshape(3,5),index=['a','b','c'],columns=['c1','c2','c3','c4','c5'])
print(df2)

   c1  c2  c3  c4  c5
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14


### Finally, accessing the data in a `DataFrame`

In [55]:
print('Second column')
print(df2.c2)
print('\nFirst column')
print(df2['c1']) 
print('\nAdding column')
df2['R']=df2.c2+df2.c3
print(df2)

Second column
a     1
b     6
c    11
Name: c2, dtype: int64

First column
a     0
b     5
c    10
Name: c1, dtype: int64

Adding column
   c1  c2  c3  c4  c5   R
a   0   1   2   3   4   3
b   5   6   7   8   9  13
c  10  11  12  13  14  23
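
The cell above selects columns. Rows and individual values can be reached in a similar way; the sketch below rebuilds `df2` so it runs on its own and uses `.loc` for labels and `.iloc` for numerical positions.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(15).reshape(3, 5),
                   index=['a', 'b', 'c'],
                   columns=['c1', 'c2', 'c3', 'c4', 'c5'])

row_b = df2.loc['b']         # row selected by its label
last_row = df2.iloc[-1]      # row selected by its position
cell = df2.loc['c', 'c3']    # single value by row and column labels
block = df2.iloc[:2, 1:3]    # first two rows, second and third columns
col_means = df2.mean()       # mean of every column

print(row_b.tolist(), cell)
print(block)
print(col_means)
```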


<img src='images/xarray-logo-square.png' width=300>

## `xarray` is another, more sophisticated package for scientific computing
### Objects are multidimensional & labelled arrays ... and the labels are not only dimension names, but can also be coordinates
### `xarray` objects are modeled on the `netCDF` file format; they have metadata & attributes

***
##  `DataArray` is the basic structure in `xarray`
<div class="alert alert-block alert-info">
    - Execute the following code to create a <b>DataArray</b>, and then print it with its attributes  
    </div>

In [38]:
import xarray as xr
# DataArrays
x = xr.DataArray([[20,24,21,18],[21,23,26,22],[19,23,25,21]], name='SST', coords={'lat':[30,35,40],'lon':[-145, -140,-135,-130 ]}, dims=('lat','lon'))
print(x)
print('Values')
print(x.values)
print('Coordinates')
print(x.coords)
print('Dimensions')
print(x.dims)

<xarray.DataArray 'SST' (lat: 3, lon: 4)>
array([[20, 24, 21, 18],
       [21, 23, 26, 22],
       [19, 23, 25, 21]])
Coordinates:
  * lat      (lat) int64 30 35 40
  * lon      (lon) int64 -145 -140 -135 -130
Values
[[20 24 21 18]
 [21 23 26 22]
 [19 23 25 21]]
Coordinates
Coordinates:
  * lat      (lat) int64 30 35 40
  * lon      (lon) int64 -145 -140 -135 -130
Dimensions
('lat', 'lon')
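
As a sketch of what the labels and metadata make possible, the example below rebuilds the same `DataArray`, attaches two attributes, and selects data by coordinate value or by position. The attribute values (`'degC'`, `'sea surface temperature'`) are assumptions made for illustration only.

```python
import xarray as xr

x = xr.DataArray([[20, 24, 21, 18], [21, 23, 26, 22], [19, 23, 25, 21]],
                 name='SST',
                 coords={'lat': [30, 35, 40], 'lon': [-145, -140, -135, -130]},
                 dims=('lat', 'lon'))

# Metadata is stored in a dictionary; these values are illustrative assumptions.
x.attrs['units'] = 'degC'
x.attrs['long_name'] = 'sea surface temperature'

at_lat35 = x.sel(lat=35)             # select by coordinate value
point = x.sel(lat=40, lon=-135)      # a single grid point
by_position = x.isel(lat=0, lon=-1)  # select by integer position

print(x.attrs)
print(at_lat35.values, point.item(), by_position.item())
```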


### Let's try a basic operation on our DataArray
<div class="alert alert-block alert-info">
    - Execute the following code that averages the data array
    </div>

In [39]:
print(x.mean())
print(x.mean(dim=['lat', 'lon']))
print(x.mean(dim='lon'))

<xarray.DataArray 'SST' ()>
array(21.916667)
<xarray.DataArray 'SST' ()>
array(21.916667)
<xarray.DataArray 'SST' (lat: 3)>
array([20.75, 23.  , 22.  ])
Coordinates:
  * lat      (lat) int64 30 35 40


## A `Dataset` is a collection of `DataArrays` with similar dimensions/coordinates, packed together in a dictionary-like structure.
<div class="alert alert-block alert-info">
    - Try the next code to create a <b>Dataset</b>
 </div>

In [43]:
ds = xr.Dataset({'SST1':x,'SST2':x+0.5,'SST3':x-0.5})
print(ds)
print('\n *** Only SST2 ***\n')
print(ds.SST2)

<xarray.Dataset>
Dimensions:  (lat: 3, lon: 4)
Coordinates:
  * lat      (lat) int64 30 35 40
  * lon      (lon) int64 -145 -140 -135 -130
Data variables:
    SST1     (lat, lon) int64 20 24 21 18 21 23 26 22 19 23 25 21
    SST2     (lat, lon) float64 20.5 24.5 21.5 18.5 21.5 ... 19.5 23.5 25.5 21.5
    SST3     (lat, lon) float64 19.5 23.5 20.5 17.5 20.5 ... 18.5 22.5 24.5 20.5

 *** Only SST2 ***

<xarray.DataArray 'SST2' (lat: 3, lon: 4)>
array([[20.5, 24.5, 21.5, 18.5],
       [21.5, 23.5, 26.5, 22.5],
       [19.5, 23.5, 25.5, 21.5]])
Coordinates:
  * lat      (lat) int64 30 35 40
  * lon      (lon) int64 -145 -140 -135 -130
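
Because a `Dataset` behaves like a dictionary of `DataArrays`, variables can be listed and retrieved by name, and an operation applied to the `Dataset` runs on every variable. A minimal sketch, rebuilding `x` and `ds` so it runs on its own:

```python
import xarray as xr

x = xr.DataArray([[20, 24, 21, 18], [21, 23, 26, 22], [19, 23, 25, 21]],
                 name='SST',
                 coords={'lat': [30, 35, 40], 'lon': [-145, -140, -135, -130]},
                 dims=('lat', 'lon'))
ds = xr.Dataset({'SST1': x, 'SST2': x + 0.5, 'SST3': x - 0.5})

names = list(ds.data_vars)       # names of the variables in the Dataset
sst2 = ds['SST2']                # dictionary-style access, same as ds.SST2
lon_mean = ds.mean(dim='lon')    # average over longitude for every variable
diff = ds['SST2'] - ds['SST3']   # arithmetic between variables keeps the coordinates

print(names)
print(lon_mean)
print(diff.values)
```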


*** 
### Now that you know the basics, let's really learn how to use `xarray`