# OMS CSV data files
----

**Documentation Author:** [Niccolò Tubini](https://www.researchgate.net/profile/Niccolo_Tubini2)

**To whom to address questions:** 
 - [Niccolò Tubini](https://www.researchgate.net/profile/Niccolo_Tubini2) 
 - [GEOframe Users Group](https://groups.google.com/forum/#!forum/geoframe-components-users)
 - [GEOframe Developers Mailing List](https://groups.google.com/forum/#!forum/geoframe-components-developers)
 
**Version:** 0.98

**Keywords:** OMS3, OMS csv data file 

**License:** [GPL v3](https://www.gnu.org/licenses/gpl-3.0.en.html)

## Table of Contents

* [Abstract](#Abstract)

* [Setup](#Setup)

* [Write an OMS csv file in Python with `geoframepy`](#Write-an-OMS-csv-file)

* [Read an OMS csv file in Python with `geoframepy`](#Read-an-OMS-csv-file)

* [HortonMachine](#HortonMachine)
    - [Time series reader](#Time-series-reader)
       - [Time series reader - Component description](#Time-series-reader-Component-description)
       - [Time series reader - .sim file description](#Time-series-reader-.sim-file-description)
       - [Common errors](#Common-errors)
    - [Time series writer](#Time-series-writer)
       - [Time series writer - Component description](#Time-series-writer-Componet-description)
       - [Time series writer - .sim file description](#Time-series-writer-.sim-file-description)


# Abstract

[OMS](https://alm.engr.colostate.edu/cb/wiki/16976) can use data in [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) format for tabular input and output. There are some assumptions about the [structure](https://alm.engr.colostate.edu/cb/wiki/16970) of an OMS CSV file in order to use it for data reading/writing.

A table is stored as an ASCII file using the CSV standard. The file has the extension .csv and its content is stored as comma-separated values. Tables may have comment lines, which start with the pound symbol # in the first column. Empty lines are allowed anywhere in a table and are ignored. Tables consist of columns and rows, plus optional table meta data. Columns may have a type and optional meta data. Meta data is organized as key-value pairs.
A table requires two key words, `@table` and `@header`. The `@table` keyword tags the start of a table definition, the `@header` tag starts a column definition. Both tags are case insensitive.


A CSV file consists of three main sections:

- The table header, indicated by `@Table`, followed by the name of the table. The next lines may have table level meta data, one meta data entry per line. Meta data is optional.
- The table header is followed by the column header, indicated by the `@Header` keyword, after which all the column names are listed. The next lines may contain column meta data, each starting with the key followed by the value for each column (the example in Fig. 1 shows `Type` and `Format` for the columns).
- Data rows start with a ',' as the first character; values are comma separated.
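
The three sections described above can be made concrete with a small parser. The following is a minimal sketch using only the Python standard library; it is not the OMS parser, and the sample table, the function name `parse_oms_csv` and the dictionary layout are illustrative assumptions.

```python
import csv
import io

# Hypothetical sample table, written only to illustrate the layout
SAMPLE = """\
# A comment line, ignored by the parser
@table,example
Created,2021-01-09

@header,timestamp,value_0
Type,Date,Double
Format,yyyy-MM-dd HH:mm,
,2021-01-01 00:00,1.5
,2021-01-01 01:00,-9999
"""


def parse_oms_csv(text):
    """Split an OMS-style CSV into table meta data, columns, column meta data and rows."""
    table = {"name": None, "meta": {}, "columns": [], "column_meta": {}, "rows": []}
    section = None  # "table" after @table, "header" after @header
    for record in csv.reader(io.StringIO(text)):
        if not record or all(cell.strip() == "" for cell in record):
            continue  # empty lines are ignored
        first = record[0].strip()
        if first.startswith("#"):
            continue  # comment line
        keyword = first.lower()  # the two tags are case insensitive
        if keyword == "@table":
            table["name"] = record[1].strip() if len(record) > 1 else ""
            section = "table"
        elif keyword == "@header":
            table["columns"] = [c.strip() for c in record[1:]]
            section = "header"
        elif first == "" and section == "header":
            table["rows"].append(record[1:])  # data row: empty first field
        elif section == "table":
            table["meta"][first] = record[1] if len(record) > 1 else ""
        elif section == "header":
            table["column_meta"][first] = record[1:]
    return table


parsed = parse_oms_csv(SAMPLE)
print(parsed["name"], parsed["columns"], len(parsed["rows"]))
```

Data rows are recognized here by their empty first field, which is what the leading comma produces once the line is split.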

<figure>
    <img src="Figures/OMS_csv_data_file.png" width="800" height="800/1.618">
    <figcaption>Fig.1 - Example of an OMS-compliant .csv file. </figcaption>
</figure>


# Setup
- install the package `gf-group` x.y.z with `pip install gf-group==x.y.z`
- create a [Conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with the [geoframe_vicenza.yaml](https://github.com/geoframecomponents/python4GEOframe) file.
    - open the Anaconda prompt 
    - move to the folder that contains geoframe_vicenza.yaml with `cd folder_path`
    - `conda env create -f geoframe_vicenza.yaml`
    - `conda activate geoframe_vicenza`
    
More details on the installation can be found here:
- [for Windows users](https://geoframe.blogspot.com/2020/12/installations-of-2021-geoframe.html)
- [for Linux users](https://geoframe.blogspot.com/2020/12/installations-of-2021-geoframe_15.html)
- [for Mac users](https://geoframe.blogspot.com/2021/01/installations-for-mac-users.html)

```python
import os
import pandas as pd
from gf.io import gf_io

oms_project_path = os.path.dirname(os.getcwd())
```

# Write an OMS csv file

The file to be formatted is `\data\Timeseries\data.csv`. The formatted file is saved as `\data\Timeseries\data_formatted.csv`.


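```python
# Build the path with os.path.join to avoid backslash escapes in the string
df = pd.read_csv(os.path.join(oms_project_path, 'data', 'Timeseries', 'data.csv'))
df.head()
```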

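|   | Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | -9999.0 | 0.0 | 0.0 | 0.0 |
| 4 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |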


The first column may contain the dates. This is not mandatory, and in this example it is empty. The headers `0`, `1`, `2`, `3`, `4`, `5`, `6`, `7` are the IDs of the meteo stations. 

```python
help(gf_io.write_OMS_timeseries)
```

```
Help on function write_OMS_timeseries in module gf.io.gf_io:

write_OMS_timeseries(df, start_date, frequency, file_name)
    Save a timeseries dataframe to .csv file with OMS format
    
    :param df: dataframe containing the timeseries. Each column correspond to a station/centroid and the 
    the header contains the ID of the station/centroid.
    :type df: pandas.dataframe
    
    :param start_date: start date of the timeseries. 'mm-dd-yyyy hh:mm'.
    :type start_date: str
    
    :param frequency: frequency of the timeseries. 'H': hourly, 'D': daily
    :type frequency: str
    
    :param file_name: output file name.
    :type file_name: str
    
    @author: Niccolò Tubini
    
    Notes:
    2021-01-09 changed pd.date_range with pd.period_range 
    https://stackoverflow.com/questions/50265288/how-to-work-around-python-pandas-dataframes-out-of-bounds-nanosecond-timestamp
```



```python
# Drop the first (empty) column and write the remaining station columns
gf_io.write_OMS_timeseries(
    df.iloc[:, 1:],
    '01-01-2021 00:00',
    'H',
    os.path.join(oms_project_path, 'data', 'Timeseries', 'data_formatted.csv'),
)
```

```
***SUCCESS writing!  C:\Users\Niccolo\OMS\OMS_Project_WHETGEO1D\data\Timeseries\data_formatted.csv
```
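
To give an idea of what a formatted time series file may look like, here is a minimal sketch that builds the text of such a table with the standard library only. It is not the `geoframepy` implementation: the function `format_oms_timeseries`, the `value_<id>` column names, the `Created` and `Author` entries, and the `yyyy-mm-dd hh:mm` input date format are all assumptions made for illustration. The layout is chosen so that the `ID` row is the fifth line of the file.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"


def format_oms_timeseries(station_ids, rows, start, step_minutes, created="2021-01-09 00:00"):
    """Return the text of an OMS-style time series table (assumed layout, for illustration)."""
    n = len(station_ids)
    lines = [
        "@table,table",
        "Created," + created,
        "Author,sketch",
        "@header,timestamp," + ",".join(f"value_{i}" for i in station_ids),
        "ID,," + ",".join(str(i) for i in station_ids),  # fifth line: station IDs
        "Type,Date," + ",".join(["Double"] * n),
        "Format,yyyy-MM-dd HH:mm," + ",".join([""] * n),
    ]
    t = datetime.strptime(start, FMT)
    for values in rows:
        # Data rows start with a comma, followed by the timestamp and the values
        lines.append("," + t.strftime(FMT) + "," + ",".join(str(v) for v in values))
        t += timedelta(minutes=step_minutes)
    return "\n".join(lines) + "\n"


text = format_oms_timeseries([0, 1], [[0.0, 0.0], [0.0, -9999.0]], "2021-01-01 00:00", 60)
print(text)
```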


# Read an OMS csv file

As an example, read the OMS file `\data\Timeseries\data_formatted.csv`.

```python
help(gf_io.read_OMS_timeseries)
```

```
Help on function read_OMS_timeseries in module gf.io.gf_io:

read_OMS_timeseries(file_name, nan_values)
    Read a timeseries .csv file formatted for OMS console
    
    :param file_name: file name of the csv file.
    :type file_name: string
    
    :param nan_value: value used for no values.
    :type nan_value: double
    
    :return pandas dataframe
    
    @author: Niccolò Tubini
```



```python
# -9999 is the value that marks missing data in the file
df = gf_io.read_OMS_timeseries(os.path.join(oms_project_path, 'data', 'Timeseries', 'data_formatted.csv'), -9999)
df.head()
```

| Datetime | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 2021-01-01 00:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2021-01-01 01:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2021-01-01 02:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2021-01-01 03:00 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 |
| 2021-01-01 04:00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
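
The value written as -9999 in the file is returned as a missing value at 03:00 for station `4`. The conversion can be sketched as follows with the standard library; this is only an illustration of the idea, not the `geoframepy` code, and the helper `to_float_or_nan` and the two sample rows are hypothetical.

```python
import math


def to_float_or_nan(cell, nan_value=-9999.0):
    """Convert one CSV cell to float, mapping the no-value marker (or a blank) to NaN."""
    cell = cell.strip()
    if cell == "":
        return math.nan
    value = float(cell)
    return math.nan if value == nan_value else value


# Two hypothetical data rows: leading comma, timestamp, then one value per station
data_rows = [
    ",2021-01-01 02:00,0.0,0.0",
    ",2021-01-01 03:00,0.0,-9999.0",
]

series = {}
for line in data_rows:
    _, timestamp, *cells = line.split(",")
    series[timestamp] = [to_float_or_nan(c) for c in cells]

print(series)
```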


# HortonMachine

[HortonMachine](https://thehortonmachine.github.io/hortonmachine/) is a library developed by [HydroloGIS](https://www.hydrologis.com/) that, among several tools, contains two OMS3 components that you can use to read and write time series formatted according to the OMS3 standard. In this section we present these two components and how to use them. We also provide a list of common errors that you may face. 


## Time series reader

### Time series reader - Component description
`org.hortonmachine.gears.io.timedependent.OmsTimeSeriesIteratorReader` is the classpath of the OMS3 component to read an OMS3 time series.

The source code of this class can be found [here](https://github.com/TheHortonMachine/hortonmachine/blob/master/gears/src/main/java/org/hortonmachine/gears/io/timedependent/OmsTimeSeriesIteratorReader.java).

This component is controlled by the following parameters:
- `file`: the path of the file containing the time series to be read;
- `tStart`: the start date of the simulation expressed as yyyy-MM-dd HH:mm. The start date must be present in the time series;
- `tEnd`: the end date of the simulation expressed as yyyy-MM-dd HH:mm;
- `tTimestep`: the time step of the time series expressed in minutes;
- `idfield`: the first string in the 5th line of the time series file. Usually it is `ID`;
- `fileNovalue`: the value used to mark missing data in the time series. This value must be -9999.

### Time series reader - .sim file description
Below is an example of a .sim file that simply reads a time series. Here we want to read the file `/data/Timeseries/Daily_airT_2000_2010.csv` from the date `2000-01-01 00:00` to the date `2010-01-01 00:00`. The time step is 1 day, i.e. 1440 minutes.


```groovy
/*
 * Read an OMS time series
 * 
 */
import static oms3.SimBuilder.instance as OMS3

def inputPath = "$oms_prj/data/Timeseries/Daily_airT_2000_2010.csv"

def startDate = "2000-01-01 00:00"
def endDate = "2010-01-01 00:00"
def tTimestep = 1440

OMS3.sim {

    resource "$oms_prj/lib"

    model(while: "reader.doProcess" ) {

        components {

            "reader" "org.jgrasstools.gears.io.timedependent.OmsTimeSeriesIteratorReader"

        }

        parameter {

            "reader.file"         "${inputPath}"

            "reader.idfield"      "ID"

            "reader.tStart"       "${startDate}"

            "reader.tEnd"         "${endDate}"

            "reader.tTimestep"    "${tTimestep}"

            "reader.fileNovalue"  "-9999"

        }

    }
    
}

```

Let us have a closer look at this .sim file. 

The `resource` field specifies where the libraries (.jar files) are located within the project folder. 

The time loop is controlled by a kind of while loop, `model(while: "reader.doProcess" )`: when the reader component reaches the end of the simulation, `tEnd`, its output variable `doProcess` is set to false and the simulation stops.
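
The loop mechanism can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not the HortonMachine code: `ReaderSketch` is a hypothetical stand-in for the reader, and it assumes that the step at `tEnd` is still processed before the flag is cleared.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"


class ReaderSketch:
    """Hypothetical stand-in for a time series reader; not the HortonMachine code."""

    def __init__(self, t_start, t_end, timestep_minutes):
        self.current = datetime.strptime(t_start, FMT)
        self.t_end = datetime.strptime(t_end, FMT)
        self.step = timedelta(minutes=timestep_minutes)
        self.do_process = True  # plays the role of the `doProcess` output

    def process(self):
        """Return the current time stamp, then advance; clear the flag once t_end is reached."""
        stamp = self.current.strftime(FMT)
        if self.current >= self.t_end:
            self.do_process = False
        self.current += self.step
        return stamp


reader = ReaderSketch("2000-01-01 00:00", "2000-01-04 00:00", 1440)
stamps = []
while reader.do_process:  # analogous to model(while: "reader.doProcess")
    stamps.append(reader.process())
print(stamps)
```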

Within the `components` scope the user declares the components required to run the simulation. In this case one component is required. `reader` is an alias for the component, and it is used throughout the .sim file to refer to this specific component. When several components of the same type are needed, each one gets a different alias, e.g. `reader1`, `mickeyMouse`, etc.
The component alias is followed by the classpath identifying the component.

The `parameter` scope defines the parameters required to control the component. These must be specified by the user. In this specific example, the parameters are specified through variables defined at the beginning of the .sim file, such as `def inputPath = "$oms_prj/data/Timeseries/Daily_airT_2000_2010.csv"`.
The syntax is simple: the keyword `def` is followed by the variable name, and `=` assigns the value. To recall the variable, use `${variable_name}`.


### Time series reader - Common errors
In this section we report some common errors you may face reading a time series. For each error we report a test case.

- Error with the simulation time step

`/simulation/00_Read_OMS_time_series_error_time_step.sim`

In this case the time step of the time series is 1440 minutes, whereas the user specified a time step (`tTimestep`) of 60 minutes.


- Error with the start date of the simulation

`/simulation/00_Reader_OMS_time_series_error_start_date.sim`

In this case the time series starts on 2000-01-01 00:00, but the user specified 1999-12-01 00:00 as the start date (`tStart`).


- Error with the file path

`/simulation/00_Read_OMS_time_series_error_file_path.sim`

In this case the user did not specify the correct path to the time series file.


- Error calling up a user defined variable

`/simulation/00_Read_OMS_time_series_error_variable_name.sim`

Here there is a typo in the name used to recall the variable `startDate`.


- Error with the end date

`/simulation/00_Read_OMS_time_series_error_end_date.sim`

In this case the user specified as end date (`tEnd`) a date that lies beyond the last date of the time series. The simulation does not return any error; it simply continues until the last date of the file.
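
Some of these problems can be spotted before launching the simulation. The following is a minimal sketch of such a check, assuming the time stamps have already been extracted from the file as a non-empty list of `yyyy-mm-dd hh:mm` strings; the function `check_time_settings` is hypothetical and not part of OMS3, HortonMachine or `geoframepy`.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"


def check_time_settings(timestamps, t_start, t_end, timestep_minutes):
    """Return a list of human-readable problems; an empty list means none was found."""
    problems = []
    times = [datetime.strptime(t, FMT) for t in timestamps]
    start = datetime.strptime(t_start, FMT)
    end = datetime.strptime(t_end, FMT)
    step = timedelta(minutes=timestep_minutes)
    if start not in times:
        problems.append(f"tStart {t_start} is not in the time series")
    if end > times[-1]:
        problems.append(f"tEnd {t_end} is after the last date {timestamps[-1]}")
    # The file time step is estimated from the first two time stamps only
    if len(times) > 1 and times[1] - times[0] != step:
        actual = int((times[1] - times[0]).total_seconds() // 60)
        problems.append(f"tTimestep is {timestep_minutes} but the file uses {actual} minutes")
    return problems


daily = ["2000-01-01 00:00", "2000-01-02 00:00", "2000-01-03 00:00"]
ok = check_time_settings(daily, "2000-01-01 00:00", "2000-01-03 00:00", 1440)
bad = check_time_settings(daily, "1999-12-01 00:00", "2000-01-05 00:00", 60)
print(ok)
print(bad)
```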

## Time series writer


### Time series writer - Component description
`org.hortonmachine.gears.io.timedependent.OmsTimeSeriesIteratorWriter` is the classpath of the OMS3 component to write an OMS3 time series.

The source code of this class can be found [here](https://github.com/TheHortonMachine/hortonmachine/blob/master/gears/src/main/java/org/hortonmachine/gears/io/timedependent/OmsTimeSeriesIteratorWriter.java).

This component is controlled by the following parameters:
- `file`: the path of the file to which the time series is written;
- `tStart`: the start date of the output time series expressed as yyyy-MM-dd HH:mm;
- `tTimestep`: the time step of the time series expressed in minutes.

### Time series writer - .sim file description
Below is an example of a .sim file that reads a time series, as seen previously, and writes it to another .csv file. Here we read the file `/data/Timeseries/Daily_airT_2000_2010.csv` from the date `2000-01-01 00:00` to the date `2010-01-01 00:00`. The time step is 1 day, i.e. 1440 minutes.
The UML diagram representing this simulation is shown below.

<figure>
    <img src="Figures/OMS_timeseries_reader_writer.png" width="800" height="800/1.618">
    <figcaption>Fig.2 - UML diagram representing the `/simulation/00_Read_Write_OMS_time_series.sim` file. This simulation consists of reading a time series and then writing it to a new .csv file. This task is accomplished by using two OMS3 components: `reader`, the box on the left, reads the time series, as the name suggests. The data is then passed to the `writer` component, the box on the right. </figcaption>
</figure>




```groovy
/*
 * Timeseries example.
 *   A component reading an OMS timeseries and writing it to a .csv file.
 */
import static oms3.SimBuilder.instance as OMS3

def inputPath = "$oms_prj/data/Timeseries/Daily_airT_2000_2010.csv"
def outputPath = "$oms_prj/output/Daily_airT_2000_2010_out.csv"
def startDate = "2000-01-01 00:00"
def endDate = "2010-01-01 00:00"
def tTimestep = 1440

OMS3.sim {

    resource "$oms_prj/dist"

    model(while: "reader.doProcess" ) {

        components {
           
            "reader" "org.jgrasstools.gears.io.timedependent.OmsTimeSeriesIteratorReader"
            "writer" "org.jgrasstools.gears.io.timedependent.OmsTimeSeriesIteratorWriter"
           
        }
        
        parameter {
        
            // reader parameters
            
            "reader.file"         "${inputPath}"

            "reader.idfield"      "ID"  
            
            "reader.tStart"       "${startDate}"

            "reader.tEnd"         "${endDate}"
   
            "reader.tTimestep"    "${tTimestep}"

            "reader.fileNovalue"  "-9999"
            
            
            
            // writer parameters

            "writer.file"         "${outputPath}" 

            "writer.tStart"       "${startDate}"

            "writer.tTimestep"    "${tTimestep}"      

        }
        
        connect {
        
            // Forward connection: from --> to
            "reader.outData"      "writer.inData"

        }
        
    }
    
}

```

Let us have a closer look at this .sim file. Here, in the `components` scope we define two components: the `reader`, as seen before, and the `writer` component. 

In the `parameter` scope we need to provide the parameters controlling the `reader` as well as those controlling the `writer`.

The main novelty of this .sim file is the new `connect` scope, at the very end of the file. The `connect` scope establishes which output of a component (on the left) is the input of another component (on the right), i.e. a forward connection. In this specific case, `outData`, the output variable of the `reader` component, is passed to the `inData` variable of the `writer` component.
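
The effect of a forward connection can be illustrated with a small Python analogy. This is only a sketch of the data flow, not how OMS3 is implemented: the `Reader` and `Writer` classes, their attributes and the sample records are hypothetical.

```python
class Reader:
    """Hypothetical reader: exposes one record per step through `out_data`."""

    def __init__(self, records):
        self._records = iter(records)
        self._remaining = len(records)
        self.out_data = None
        self.do_process = self._remaining > 0

    def process(self):
        self.out_data = next(self._records)
        self._remaining -= 1
        self.do_process = self._remaining > 0  # cleared after the last record


class Writer:
    """Hypothetical writer: stores whatever arrives in `in_data`."""

    def __init__(self):
        self.in_data = None
        self.written = []

    def process(self):
        self.written.append(self.in_data)


records = [{"0": 1.2}, {"0": 1.4}, {"0": -9999.0}]
reader, writer = Reader(records), Writer()

while reader.do_process:
    reader.process()
    writer.in_data = reader.out_data  # the "reader.outData" -> "writer.inData" connection
    writer.process()

print(writer.written)
```

At every time step the record produced by the reader is handed to the writer before the writer runs, which is the role played by the `connect` scope in the .sim file.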