# 10 Minutes to mooda

This is an introduction to mooda and the WaterFrame.

A WaterFrame contains three essential elements:

* A pandas DataFrame, stored in WaterFrame.data
* A metadata dictionary, stored in WaterFrame.metadata
* A meaning dictionary, stored in WaterFrame.meaning

The **metadata dictionary** contains information about the DataFrame, e.g. the location of the measurements, the instrument models, or any extra notes.

The **meaning dictionary** describes the keys of the DataFrame. For example, if the DataFrame contains a key called "TEMP", the meaning dictionary holds the entry explaining that "TEMP" means "Seawater temperature".

There are two types of columns in the pandas DataFrame:

* **Parameter columns**: Columns that contain values of a parameter. The key of the column is the name of the parameter.
* **Quality Control columns**: Columns that contain the Quality Control Flag of the values of a parameter. The key of the column follows the pattern **{parameter name}_QC**.

The **index** of the pandas DataFrame must be **TIME**.
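The column layout described above can be sketched with plain pandas. This is only an illustration of the naming convention, not mooda code: the values, the flag value 0 and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Five daily timestamps used as the index; the index is named TIME.
time_index = pd.date_range("2018-01-01", periods=5, name="TIME")

# One parameter column (TEMP) and its Quality Control column (TEMP_QC).
df = pd.DataFrame(
    {
        "TEMP": [10.0, 9.5, 9.0, 10.5, 11.0],
        "TEMP_QC": np.zeros(5, dtype=int),  # illustrative flag value for every row
    },
    index=time_index,
)

# Split the columns into the two types by the _QC suffix.
qc_columns = [col for col in df.columns if col.endswith("_QC")]
parameter_columns = [col for col in df.columns if not col.endswith("_QC")]

print(df.index.name)       # TIME
print(parameter_columns)   # ['TEMP']
print(qc_columns)          # ['TEMP_QC']
```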


Customarily, we import as follows:

In [1]:
from mooda import WaterFrame
import numpy as np
import pandas as pd

## Object creation

Creating an empty [WaterFrame](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/__init__.md):

In [2]:
wf = WaterFrame()
wf

Memory usage: 264.00 Bytes
There is no data.

Creating a [WaterFrame](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/__init__.md) from a [pandas DataFrame](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/from_dataframe.md), a metadata dictionary and a meaning dictionary:

In [3]:
# Creating a DataFrame
dates = pd.date_range('20180101000000', periods=20)
x = np.linspace(-np.pi, 4*np.pi, 20)
df = pd.DataFrame({'TEMP': np.sin(x)+10, 'PSAL': np.cos(x)*2 + 30}, index=dates)

# Creating metadata information
metadata = dict()
metadata['instrument'] = 'CTD'
metadata['latitude'] = '42.03'
metadata['longitude'] = '2.11'

# Creating parameter meanings
meaning = dict()
meaning['TEMP'] = {'long_name': 'sea_water_temperature',
                    'units': 'degree_celsius'}
meaning['PSAL'] = {'long_name': 'sea_water_practical_salinity',
                   'units': 'PSU'}

# Creating the WaterFrame
wf = WaterFrame(df=df, metadata=metadata, meaning=meaning)
wf

Memory usage: 1.06 KBytes
Parameters:
  - TEMP: sea_water_temperature (degree_celsius)
    - Min value: 9.003
    - Date min value: 2018-01-03 00:00:00
    - Max value: 10.969
    - Date max value: 2018-01-07 00:00:00
    - Mean value: 9.886
    - Values with QC = 1: 0.000 %
  - PSAL: sea_water_practical_salinity (PSU)
    - Min value: 28.000
    - Date min value: 2018-01-01 00:00:00
    - Max value: 32.000
    - Date max value: 2018-01-20 00:00:00
    - Mean value: 30.000
    - Values with QC = 1: 0.000 %

Creating a [WaterFrame](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/__init__.md) from a [NetCDF](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/from_netcdf.md) file:

In [4]:
path_netcdf = r"C:\Users\rbard\Google Drive\ok\git\mooda\docs\examples\example_data\example.nc"
wf = WaterFrame(path=path_netcdf)
wf

Memory usage: 904.00 Bytes
Parameters:
  - TEMP: sea_water_temperature (degree_celsius)
    - Min value: 9.003
    - Date min value: 2018-01-03 00:00:00
    - Max value: 10.969
    - Date max value: 2018-01-07 00:00:00
    - Mean value: 9.886
    - Values with QC = 1: 0.000 %
  - PSAL: sea_water_practical_salinity (PSU)
    - Min value: 28.000
    - Date min value: 2018-01-01 00:00:00
    - Max value: 32.000
    - Date max value: 2018-01-20 00:00:00
    - Mean value: 30.000
    - Values with QC = 1: 0.000 %

Creating a [WaterFrame](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/__init__.md) from a [CSV](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/from_csv.md) file:

In [5]:
path_csv = r"C:\Users\rbard\Google Drive\ok\git\mooda\docs\examples\example_data\example.csv"
wf.from_csv(path_csv, comment="#", sep=";", index_col=0)
wf

Memory usage: 2.56 KBytes
Parameters:
  - TEMP: sea_water_temperature (degree_celsius)
    - Min value: 9.003
    - Date min value: 2018-01-03
    - Max value: 10.969
    - Date max value: 2018-01-07
    - Mean value: 9.886
    - Values with QC = 1: 0.000 %
  - PSAL: sea_water_practical_salinity (PSU)
    - Min value: 28.000
    - Date min value: 2018-01-01
    - Max value: 32.000
    - Date max value: 2018-01-20
    - Mean value: 30.000
    - Values with QC = 1: 0.000 %
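The `comment`, `sep` and `index_col` arguments used above have the same names as options of `pandas.read_csv`. Whether `from_csv` forwards them to pandas unchanged is an assumption here; the sketch below only shows what those options do in pandas, using a small hypothetical file held in memory.

```python
import io

import pandas as pd

# Hypothetical file content: '#' lines are comments, ';' separates fields,
# and the first column holds the timestamps.
csv_text = """# instrument: CTD
TIME;TEMP;PSAL
2018-01-01;10.0;28.0
2018-01-02;9.264276;28.645437
2018-01-03;9.003416;30.165159
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    comment="#",       # ignore everything after a '#'
    sep=";",           # field separator
    index_col=0,       # use the first column as the index
    parse_dates=True,  # convert the index to timestamps
)

print(df)
```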

## Viewing data

Here is how to access the metadata dictionary:

In [6]:
wf.metadata

{'instrument': 'CTD', 'latitude': '42.03', 'longitude': '2.11'}

Display the [metadata information](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/info_metadata.md):

In [7]:
print("METADATA:")
print(wf.info_metadata())

METADATA:
  - instrument: CTD
  - latitude: 42.03
  - longitude: 2.11


Display the [meaning information](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/info_meaning.md):

In [8]:
print("MEANING:")
print(wf.info_meaning())

MEANING:
  - dim_0
  - TEMP
    - long_name: sea_water_temperature
    - units: degree_celsius
  - PSAL
    - long_name: sea_water_practical_salinity
    - units: PSU
  - DEPTH
    - long_name: depth_of_measure
    - units: meters
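A listing with this layout can be produced from any nested dictionary in a few lines. The function below is a hypothetical helper written for this example; it is not mooda's implementation of `info_meaning()`.

```python
def format_meaning(meaning):
    """Return a nested meaning dictionary as an indented text listing."""
    lines = []
    for parameter, attributes in meaning.items():
        lines.append(f"  - {parameter}")
        for name, value in attributes.items():
            lines.append(f"    - {name}: {value}")
    return "\n".join(lines)


meaning = {
    "dim_0": {},
    "TEMP": {"long_name": "sea_water_temperature", "units": "degree_celsius"},
}

listing = format_meaning(meaning)
print(listing)
```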


[parameters()](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/parameters.md) returns a list with the keys of the DataFrame, excluding the QC columns.

In [9]:
wf.parameters()

['TEMP', 'PSAL']

Checking the [min](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/min.md), [max](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/max.md) and [mean](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/mean.md) of a parameter:

In [10]:
time_min, value_min = wf.min('TEMP')
time_max, value_max = wf.max('TEMP')
value_mean = wf.mean('TEMP')

print("TEMPERATURE INFO:")
print(f"Min value: {value_min}, at {time_min}")
print(f"Max value: {value_max}, at {time_max}")
print(f"Mean value: {value_mean}")

TEMPERATURE INFO:
Min value: 9.00341550699333, at 2018-01-03
Max value: 10.969400265939331, at 2018-01-07
Mean value: 9.886011481529044
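For comparison, the same (time, value) pairs can be obtained directly from a pandas Series. This is a sketch with made-up values, not a description of how mooda computes them.

```python
import pandas as pd

temp = pd.Series(
    [10.0, 9.5, 9.0, 10.5, 11.0],
    index=pd.date_range("2018-01-01", periods=5, name="TIME"),
    name="TEMP",
)

# idxmin()/idxmax() return the index label (here a timestamp) of the extreme value.
time_min, value_min = temp.idxmin(), temp.min()
time_max, value_max = temp.idxmax(), temp.max()
value_mean = temp.mean()

print(f"Min value: {value_min}, at {time_min}")
print(f"Max value: {value_max}, at {time_max}")
print(f"Mean value: {value_mean}")
```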


Checking how much RAM [memory](https://github.com/rbardaji/mooda/blob/master/docs/api_reference/waterframe/momory_usage.md) the WaterFrame is using:

In [11]:
print(wf.memory_usage(), "Bytes")

2564 Bytes
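pandas offers a per-column breakdown of the memory used by a DataFrame. Whether the WaterFrame figure also counts the metadata and meaning dictionaries is not stated here, so the sketch below covers only a DataFrame with made-up values.

```python
import pandas as pd

df = pd.DataFrame(
    {"TEMP": [10.0, 9.5, 9.0], "PSAL": [28.0, 29.0, 30.0]},
    index=pd.date_range("2018-01-01", periods=3, name="TIME"),
)

# One entry per column plus one for the index, in bytes.
bytes_per_column = df.memory_usage(index=True)
total_bytes = int(bytes_per_column.sum())

print(bytes_per_column)
print(total_bytes, "Bytes")
```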


## Selection 

### Getting metadata

Here is how to access the metadata dictionary:

In [12]:
wf.metadata

{'instrument': 'CTD', 'latitude': '42.03', 'longitude': '2.11'}

Adding a new element to the metadata:

In [13]:
wf.metadata['new information'] = 'example of information'
wf.metadata

{'instrument': 'CTD',
 'latitude': '42.03',
 'longitude': '2.11',
 'new information': 'example of information'}

Deleting an element of the metadata:

In [14]:
del wf.metadata['new information']
wf.metadata

{'instrument': 'CTD', 'latitude': '42.03', 'longitude': '2.11'}
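Since the metadata shown above is a plain Python dictionary, the usual dictionary methods apply. The sketch below uses a standalone copy of that dictionary; the key `depth` is hypothetical and chosen only to show a missing entry.

```python
metadata = {"instrument": "CTD", "latitude": "42.03", "longitude": "2.11"}

# get() returns a default instead of raising KeyError for a missing key.
depth = metadata.get("depth", "unknown")

# The coordinates are stored as strings here, so convert before doing arithmetic.
latitude = float(metadata["latitude"])
longitude = float(metadata["longitude"])

# pop() with a default removes a key if present and does nothing otherwise.
removed = metadata.pop("new information", None)

print(depth, latitude, longitude, removed)
```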

### Getting meaning 

Here is how to access the meaning dictionary:

In [15]:
wf.meaning

{'dim_0': {},
 'TEMP': {'long_name': 'sea_water_temperature', 'units': 'degree_celsius'},
 'PSAL': {'long_name': 'sea_water_practical_salinity', 'units': 'PSU'},
 'DEPTH': {'long_name': 'depth_of_measure', 'units': 'meters'}}

Display an element of meaning:

In [16]:
wf.meaning['TEMP']

{'long_name': 'sea_water_temperature', 'units': 'degree_celsius'}

In [17]:
wf.meaning['TEMP']['long_name']

'sea_water_temperature'

Adding a new element to meaning:

In [18]:
wf.meaning['TEMP']['other_name'] = 'other name for the parameter'
wf.meaning['TEMP']

{'long_name': 'sea_water_temperature',
 'units': 'degree_celsius',
 'other_name': 'other name for the parameter'}

Deleting an element of meaning:

In [19]:
del wf.meaning['TEMP']['other_name']
wf.meaning['TEMP']

{'long_name': 'sea_water_temperature', 'units': 'degree_celsius'}

### Getting data

WaterFrame.data is a pandas DataFrame, so you can use all the pandas DataFrame methods.
This is how to access the DataFrame:

In [20]:
wf.data

| | TEMP | PSAL | TEMP_QC | PSAL_QC | TEMP_QC_QC | PSAL_QC_QC |
|---|---|---|---|---|---|---|
| 2018-01-01 | 10.0 | 28.0 | 0 | 0 | 0 | 0 |
| 2018-01-02 | 9.264276 | 28.645437 | 0 | 0 | 0 | 0 |
| 2018-01-03 | 9.003416 | 30.165159 | 0 | 0 | 0 | 0 |
| 2018-01-04 | 9.385787 | 31.578281 | 0 | 0 | 0 | 0 |
| 2018-01-05 | 10.164595 | 31.972723 | 0 | 0 | 0 | 0 |
| 2018-01-06 | 10.837166 | 31.093896 | 0 | 0 | 0 | 0 |
| 2018-01-07 | 10.9694 | 29.509029 | 0 | 0 | 0 | 0 |
| 2018-01-08 | 10.475947 | 28.241052 | 0 | 0 | 0 | 0 |
| 2018-01-09 | 9.675301 | 28.108366 | 0 | 0 | 0 | 0 |
| 2018-01-10 | 9.084227 | 29.196609 | 0 | 0 | 0 | 0 |


You can also select a single column, which yields a Series equivalent to *wf.data.TEMP*:

In [24]:
wf['TEMP']

2018-01-01    10.000000
2018-01-02     9.264276
2018-01-03     9.003416
2018-01-04     9.385787
2018-01-05    10.164595
2018-01-06    10.837166
2018-01-07    10.969400
2018-01-08    10.475947
2018-01-09     9.675301
2018-01-10     9.084227
2018-01-11     9.084227
2018-01-12     9.675301
2018-01-13    10.475947
2018-01-14    10.969400
2018-01-15    10.837166
2018-01-16    10.164595
2018-01-17     9.385787
2018-01-18     9.003416
2018-01-19     9.264276
2018-01-20    10.000000
Name: TEMP, dtype: float64

Selecting via `[]` with a slice selects rows:

In [25]:
wf[0:3]

| | TEMP | PSAL | TEMP_QC | PSAL_QC | TEMP_QC_QC | PSAL_QC_QC |
|---|---|---|---|---|---|---|
| 2018-01-01 | 10.0 | 28.0 | 0 | 0 | 0 | 0 |
| 2018-01-02 | 9.264276 | 28.645437 | 0 | 0 | 0 | 0 |
| 2018-01-03 | 9.003416 | 30.165159 | 0 | 0 | 0 | 0 |
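In plain pandas, row slices can be taken by position or by date label, and the two differ in whether the end of the slice is included. The sketch below uses a small DataFrame with made-up values and writes the positional slice explicitly with `iloc`; it is not specific to the WaterFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"TEMP": [10.0, 9.5, 9.0, 10.5, 11.0]},
    index=pd.date_range("2018-01-01", periods=5, name="TIME"),
)

# Positional slice: rows 0, 1 and 2 (the end position is excluded).
first_three = df.iloc[0:3]

# Label slice on the time index: both end dates are included.
by_date = df.loc["2018-01-02":"2018-01-03"]

print(first_three)
print(by_date)
```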


#### Boolean Indexing

Using a single column’s values to select data.

In [28]:
wf[wf['TEMP'] > 10]

| | TEMP | PSAL | TEMP_QC | PSAL_QC | TEMP_QC_QC | PSAL_QC_QC |
|---|---|---|---|---|---|---|
| 2018-01-05 | 10.164595 | 31.972723 | 0 | 0 | 0 | 0 |
| 2018-01-06 | 10.837166 | 31.093896 | 0 | 0 | 0 | 0 |
| 2018-01-07 | 10.9694 | 29.509029 | 0 | 0 | 0 | 0 |
| 2018-01-08 | 10.475947 | 28.241052 | 0 | 0 | 0 | 0 |
| 2018-01-13 | 10.475947 | 31.758948 | 0 | 0 | 0 | 0 |
| 2018-01-14 | 10.9694 | 30.490971 | 0 | 0 | 0 | 0 |
| 2018-01-15 | 10.837166 | 28.906104 | 0 | 0 | 0 | 0 |
| 2018-01-16 | 10.164595 | 28.027277 | 0 | 0 | 0 | 0 |
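Conditions on several columns can be combined into one boolean mask. The sketch below shows this in plain pandas with made-up values; each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "TEMP": [10.0, 9.5, 9.0, 10.5, 11.0],
        "PSAL": [28.0, 29.0, 30.0, 31.0, 32.0],
    },
    index=pd.date_range("2018-01-01", periods=5, name="TIME"),
)

# Keep rows where both conditions hold.
mask = (df["TEMP"] > 10) & (df["PSAL"] < 32)
selected = df[mask]

print(selected)
```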


Using the isin() method for filtering:

In [32]:
wf2 = wf.copy()
wf2['filter'] = [1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1]
wf2.data

| | TEMP | PSAL | TEMP_QC | PSAL_QC | TEMP_QC_QC | PSAL_QC_QC | filter |
|---|---|---|---|---|---|---|---|
| 2018-01-01 | 10.0 | 28.0 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-02 | 9.264276 | 28.645437 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-03 | 9.003416 | 30.165159 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-04 | 9.385787 | 31.578281 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-05 | 10.164595 | 31.972723 | 0 | 0 | 0 | 0 | 0 |
| 2018-01-06 | 10.837166 | 31.093896 | 0 | 0 | 0 | 0 | 0 |
| 2018-01-07 | 10.9694 | 29.509029 | 0 | 0 | 0 | 0 | 0 |
| 2018-01-08 | 10.475947 | 28.241052 | 0 | 0 | 0 | 0 | 0 |
| 2018-01-09 | 9.675301 | 28.108366 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-10 | 9.084227 | 29.196609 | 0 | 0 | 0 | 0 | 1 |


In [34]:
# The filter column holds integers, so match against the integer 1
wf2[wf2['filter'].isin([1])]

| | TEMP | PSAL | TEMP_QC | PSAL_QC | TEMP_QC_QC | PSAL_QC_QC | filter |
|---|---|---|---|---|---|---|---|
| 2018-01-01 | 10.0 | 28.0 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-02 | 9.264276 | 28.645437 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-03 | 9.003416 | 30.165159 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-04 | 9.385787 | 31.578281 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-09 | 9.675301 | 28.108366 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-10 | 9.084227 | 29.196609 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-11 | 9.084227 | 30.803391 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-12 | 9.675301 | 31.891634 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-13 | 10.475947 | 31.758948 | 0 | 0 | 0 | 0 | 1 |
| 2018-01-14 | 10.9694 | 30.490971 | 0 | 0 | 0 | 0 | 1 |
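`isin()` compares the column against a list of allowed values, so it is safest to pass values of the same type as the column (integers for an integer column). The sketch below shows the selection and its complement in plain pandas, using a small DataFrame with made-up values.

```python
import pandas as pd

df = pd.DataFrame(
    {"TEMP": [10.0, 9.5, 9.0, 10.5], "filter": [1, 1, 0, 0]},
    index=pd.date_range("2018-01-01", periods=4, name="TIME"),
)

# Rows whose filter value is in the allowed list.
kept = df[df["filter"].isin([1])]

# The ~ operator inverts the mask and selects the remaining rows.
dropped = df[~df["filter"].isin([1])]

print(kept)
print(dropped)
```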
