# Final assignment: Data processing

The final exercise involves converting data from a hypothetical external provider.

The data will be used for MIKE modelling and must therefore be converted to Dfs0 with appropriate EUM types/units.

The data is provided as a [zip file](https://github.com/DHI/getting-started-with-mikeio/raw/main/mini_book/data/stream_data.zip).

Inside the zip file, there are many timeseries of discharge data from streams located across several regions (`*.dat`).

Static data for each region is found in a separate file (`region_info.csv`).

Pandas `read_csv` is very powerful, but here are a few things to keep in mind:

* Column separator e.g. comma (,)
* Blank lines
* Comments
* Missing values
* Date format
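
As a minimal sketch of how the points above map to `read_csv` arguments, the example below parses a small in-memory text. The file content, the `;` separator, the `-9999` missing-value marker and the day-first date format are all assumptions made for illustration; inspect the actual `*.dat` files to find the right settings.

```python
from io import StringIO

import pandas as pd

# Hypothetical file content; the real *.dat files may look different.
raw = """# station: example
date;flow

01/02/2005;1.5
02/02/2005;-9999
03/02/2005;2.5
"""

df = pd.read_csv(
    StringIO(raw),
    sep=";",                # column separator (assumed)
    comment="#",            # ignore comment lines
    skip_blank_lines=True,  # the default, shown for clarity
    na_values=["-9999"],    # marker for missing values (assumed)
    index_col=0,            # use the date column as index
    parse_dates=True,       # parse the index as datetimes
    dayfirst=True,          # read 01/02/2005 as 1 February (assumed)
)

print(df)
```

Checking `df.index` and `df.isna().sum()` after reading is a quick way to confirm that dates and missing values were interpreted as intended.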

The MIKE engine cannot handle missing values (delete values), so fill in missing values with interpolated values.
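
One possible way to do this is pandas' `interpolate`. The sketch below uses a made-up series; `method="time"` weights by the actual time distance between samples, which matters if the timestamps are not equidistant.

```python
import numpy as np
import pandas as pd

# Made-up daily series with two gaps
time = pd.date_range("2005-02-01", periods=5, freq="D")
flow = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0], index=time, name="flow")

# Linear interpolation in time between the neighbouring valid values
filled = flow.interpolate(method="time")

print(filled)
```

Note that with the default settings, missing values before the first valid observation are not filled, so it is worth checking `filled.isna().sum()` afterwards.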

To save disk space, crop the timeseries to the simulation period, Feb 1 - June 30.
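
With a `DatetimeIndex`, cropping can be done by slicing with date strings. The sketch below uses synthetic data and assumes the year is 2005 (taken from the weather forcing time axis later in this exercise); verify the year against the actual timeseries.

```python
import pandas as pd

# Synthetic daily data for a full year (the year is an assumption)
time = pd.date_range("2005-01-01", "2005-12-31", freq="D")
df = pd.DataFrame({"flow": range(len(time))}, index=time)

# Label-based slicing on a DatetimeIndex includes both end points
cropped = df.loc["2005-02-01":"2005-06-30"]

print(cropped.index[0], cropped.index[-1], len(cropped))
```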

## G.1 Convert all timeseries to Dfs0

In [22]:
import os
import numpy as np
import pandas as pd
import mikeio

In [23]:
# This is one way to find and filter filenames in a directory
# [x for x in os.listdir("datafolder") if "some_str" in x]

In [36]:
# This is useful!
# help(pd.read_csv)

In [37]:
# example of reading csv
# df = pd.read_csv("../data/oceandata.csv", comment='#', index_col=0, sep=',', parse_dates=True)


## G.2 Add region specific info to normalize timeseries with surface area

Each timeseries belongs to a region identified in the filename, e.g. `s15_east_novayork_river.dat` is located in the `novayork` region.
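
A rough sketch of this step is shown below. It assumes that every filename follows the same underscore-separated pattern as the example above, with the region as the third part, and that region names contain no underscores. The `region_info` table, its `surface_area` column and the numbers in it are hypothetical; use the actual column names from `region_info.csv`.

```python
from pathlib import Path

import pandas as pd


def region_from_filename(filename: str) -> str:
    """Return the region part of a filename like s15_east_novayork_river.dat.

    Assumes the region is the third underscore-separated part.
    """
    return Path(filename).stem.split("_")[2]


# Hypothetical static data; real values come from region_info.csv
region_info = pd.DataFrame(
    {"region": ["novayork", "otherregion"], "surface_area": [200.0, 50.0]}
).set_index("region")

region = region_from_filename("s15_east_novayork_river.dat")
area = region_info.loc[region, "surface_area"]

# Made-up discharge values, normalized by the region's surface area
flow = pd.Series([10.0, 20.0], name="flow")
flow_per_area = flow / area

print(region, flow_per_area.tolist())
```

Remember that the normalized quantity is no longer a plain discharge, so the EUM type and unit written to the Dfs0 file should be chosen accordingly.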

## G.3 Gridded weather forcing data

The dataset is provided in NumPy binary format and consists of

* Temperature 2m (kelvin)
* Relative humidity 2m (%)

The spatial grid covers 40E - 50E, 10N - 15N with a grid spacing of 1 degree in each direction.

The time axis consists of two timesteps, '2005-01-31' and '2005-07-31', which is sufficient to cover the simulation period.
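
Before building the dataset, it can help to write out the axes and the array shape they imply. The sketch below assumes that both grid edges are included as nodes and that the arrays are ordered as (time, y, x); neither is confirmed by the description, so compare the result with the shape of the loaded arrays.

```python
import numpy as np
import pandas as pd

dx = dy = 1.0

# Node coordinates, assuming both edges are part of the grid
lon = np.arange(40.0, 50.0 + dx, dx)
lat = np.arange(10.0, 15.0 + dy, dy)
time = pd.to_datetime(["2005-01-31", "2005-07-31"])

# Shape implied by the axes, assuming (time, y, x) ordering
expected_shape = (len(time), len(lat), len(lon))

print(expected_shape)
# Compare with the loaded arrays, e.g. tmp.shape == expected_shape
```

If the shapes do not match, the arrays may use a different axis order or a cell-centred grid, and the grid definition should be adjusted rather than the data.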

In [76]:
tmp = np.load("../data/temperature_2m.npy")
rh = np.load("../data/rel_hum_2m.npy")

In [79]:
dy = 1.0
dx = 1.0
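# Month-end frequency gives the timestamps 2005-01-31 and 2005-07-31
# (pandas >= 2.2 deprecates the alias 'M' in favour of 'ME')
time = pd.date_range("2005-01-01", freq='6M', periods=2)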

In [85]:
from mikeio import Dfs2, Dataset
from mikeio.eum import ItemInfo, EUMType, EUMUnit

In [103]:
data = [tmp, rh]
# ds = Dataset(data,...

The expected outcome:

![](../images/weather_dfs2.png)