# Scientific Data

**A spatio-temporal high-resolution precipitation dataset for Mexico City derived from optical disdrometers and weighting rain gauges**

**Authors:**

*Adrián Pedrozo-Acuña, José Agustín Breña-Naranjo, Pamela Iskra Mejía-Estrada, Mauricio Osorio-González, Saúl Arciniega-Esparza, Jorge Blanco-Figueroa, Jorge Alberto Magos-Hernández, Juan Alejandro Sánchez-Peralta*


## Description

Example of how to read the OH-IIUNAM high-temporal-resolution precipitation database.

The data files can be downloaded from: [Scientific Data Repository](https://www.oh-iiunam.mx/)

This database has been converted to NetCDF, but real-time data can be downloaded from: https://www.oh-iiunam.mx/



### Requirements

The OH database has been converted to NetCDF and can be read with any library that supports this format. However, we highly recommend [xarray](http://xarray.pydata.org/en/stable/) for this purpose.


```python
import numpy as np
import pandas as pd
import xarray as xr

import matplotlib.pyplot as plt
```

## How to read a Disdrometer data file

First, open the NetCDF file with xarray:

```python
filename = r"~/DISDRO_IIUNAM.nc"  # insert the full path to the file
dataset = xr.open_dataset(filename)
dataset
```

To list the variable names, use:

```python
varnames = list(dataset.variables.keys())
varnames
```
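`dataset.variables` lists coordinates such as `time` alongside the measured quantities. The sketch below separates the two and reads the units attribute before any conversion. It builds a small synthetic dataset instead of opening `DISDRO_IIUNAM.nc`, and the `units` attribute is an assumption: check `dataset[varname].attrs` to see what the real files actually store.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the disdrometer file (hypothetical values and attributes)
time = pd.date_range("2018-08-30 18:00", periods=5, freq="1min")
dataset = xr.Dataset(
    {"intensity": ("time", np.array([0.0, 1.2, 3.4, 0.6, 0.0]), {"units": "mm/h"})},
    coords={"time": time},
)

# Data variables only (coordinates such as "time" are excluded)
data_vars = list(dataset.data_vars)
# Coordinate names
coord_names = list(dataset.coords)

# Read the units attribute if present; None if the file does not store one
units = dataset["intensity"].attrs.get("units")
print(data_vars, coord_names, units)
```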

Select a variable name and read its time series as follows:

```python
varname = "intensity"
data = dataset[varname]   # extract the variable as an xarray DataArray
serie = data.to_pandas()  # convert to a pandas Series
prec = serie / 60         # convert intensity (mm/h) to rainfall depth per minute (mm)
```
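Since `intensity` is given in mm/h and the division by 60 implies one-minute records, each value of `prec` is the depth (mm) that fell during that minute, so totals over longer periods are plain sums. A minimal sketch with pandas `resample`, assuming one-minute sampling and using a synthetic series in place of the real data:

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute intensity series in mm/h (stand-in for dataset["intensity"])
time = pd.date_range("2018-08-30 18:00", periods=120, freq="1min")
intensity = pd.Series(np.r_[np.full(60, 6.0), np.full(60, 12.0)], index=time)

# Depth per minute in mm: intensity [mm/h] * (1/60) h
prec = intensity / 60

# Hourly rainfall totals in mm: sum the per-minute depths within each hour
hourly = prec.resample("1h").sum()
print(hourly)
```

The same pattern with `"1D"` instead of `"1h"` gives daily totals.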

Data can be plotted either as an xarray DataArray or as a pandas Series:

```python
fig, ax = plt.subplots(figsize=(7, 3.5))
data.plot(ax=ax, linewidth=0.8)
ax.grid(True)
ax.set_xlabel("")
fig.tight_layout()
```
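The cell above plots the xarray `DataArray` directly. The pandas series can be plotted the same way through `Series.plot`; a minimal sketch on a synthetic per-minute depth series, using the non-interactive Agg backend so it also runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic per-minute rainfall depth in mm (stand-in for `prec`)
time = pd.date_range("2018-08-30 18:00", periods=30, freq="1min")
prec = pd.Series(np.linspace(0.0, 0.3, 30), index=time)

fig, ax = plt.subplots(figsize=(7, 3.5))
prec.plot(ax=ax, linewidth=0.8)  # pandas Series.plot draws on the given axes
ax.grid(True)
ax.set_xlabel("")
ax.set_ylabel("Precipitation [mm]")
fig.tight_layout()
```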

The rainfall spectrum recorded by the disdrometers can be visualized as a 2D histogram:

```python
date     = "2018-08-30 18:20"
spectrum = dataset["spectrum"].loc[date, :, :]
fig, ax = plt.subplots()
spectrum.plot(cmap="Spectral_r", ax=ax)
ax.set_xlim(0, 6)
ax.set_ylim(0, 10)
fig.tight_layout()
```
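`.loc[date, :, :]` expects the requested timestamp to be present in the index. If that minute may be missing (for example, because of gaps in the record), one option is `sel` with `method="nearest"` and a `tolerance`. A minimal sketch on a synthetic spectrum; the dimension names `diameter` and `velocity` are hypothetical (only `time` appears in the original code), so check `dataset["spectrum"].dims` for the real ones:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic spectrum: drop counts per (time, diameter class, velocity class).
# "diameter" and "velocity" are hypothetical dimension names.
# The record has a gap: 18:20 and 18:21 are missing.
time = pd.DatetimeIndex(["2018-08-30 18:18", "2018-08-30 18:19", "2018-08-30 18:22"])
counts = np.arange(3 * 4 * 5).reshape(3, 4, 5)
spectrum_all = xr.DataArray(
    counts,
    dims=("time", "diameter", "velocity"),
    coords={"time": time},
)

# Take the record closest to 18:20, but only if it is at most 2 minutes away
spectrum = spectrum_all.sel(
    time=pd.Timestamp("2018-08-30 18:20"),
    method="nearest",
    tolerance=pd.Timedelta("2min"),
)
print(spectrum["time"].values, spectrum.shape)
```

If no record falls within the tolerance, `sel` raises a `KeyError` instead of silently returning a distant time step.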

xarray makes it easy to analyze multidimensional variables.

For example, the following code accumulates the rainfall spectrum over a date range and plots it:

```python
start_date = "2018-08-30 00:00"
end_date   = "2018-08-30 23:59"

# sum drops in a date range
spectrum = dataset["spectrum"].loc[start_date:end_date, :, :].sum(dim="time")

# plot data
fig, ax = plt.subplots()
spectrum.plot(cmap="Spectral_r", ax=ax)
ax.set_xlim(0, 6)
ax.set_ylim(0, 10)
fig.tight_layout()
```
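Label-based slices such as `.loc[start_date:end_date]` include both endpoints. An equivalent form that names the dimension, and so does not depend on the order of dimensions, is `sel(time=slice(...))`. A minimal sketch on a synthetic one-day spectrum (hypothetical `diameter`/`velocity` dimension names as above); the normalization to fractions at the end is an extra step, not part of the original example:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic one-minute spectrum over one day (hypothetical dimension names)
time = pd.date_range("2018-08-30 00:00", periods=24 * 60, freq="1min")
spectrum_all = xr.DataArray(
    np.ones((time.size, 4, 5)),
    dims=("time", "diameter", "velocity"),
    coords={"time": time},
)

start_date = "2018-08-30 00:00"
end_date = "2018-08-30 23:59"

# Sum drops over the date range; label slices include both endpoints
spectrum = spectrum_all.sel(time=slice(start_date, end_date)).sum(dim="time")

# Optional: fraction of all drops falling in each (diameter, velocity) class
fraction = spectrum / spectrum.sum()
```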

Close the dataset when you are done. **This is always recommended with NetCDF files.**

```python
dataset.close()
```

## How to read a Pluviometer data file

The data structure of the pluviometer files is similar to that of the disdrometer files; the difference lies in the variables stored in each database.
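One way to see which variables the two kinds of files share is to compare their data variables as sets. A minimal sketch with a hypothetical helper `compare_variables`; the synthetic datasets stand in for the real files, and the pluviometer variable `accumulation` is invented for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr


def compare_variables(ds_a, ds_b):
    """Hypothetical helper: return (shared, only_in_a, only_in_b) data-variable names."""
    a, b = set(ds_a.data_vars), set(ds_b.data_vars)
    return sorted(a & b), sorted(a - b), sorted(b - a)


# Synthetic stand-ins; "accumulation" is a hypothetical pluviometer variable
time = pd.date_range("2018-08-30 18:00", periods=3, freq="1min")
disdro = xr.Dataset(
    {
        "intensity": ("time", np.zeros(3)),
        "spectrum": (("time", "diameter", "velocity"), np.zeros((3, 2, 2))),
    },
    coords={"time": time},
)
pluvio = xr.Dataset(
    {"intensity": ("time", np.zeros(3)), "accumulation": ("time", np.zeros(3))},
    coords={"time": time},
)

shared, only_disdro, only_pluvio = compare_variables(disdro, pluvio)
print(shared, only_disdro, only_pluvio)
```

With the real files, pass the two opened datasets instead, e.g. `compare_variables(xr.open_dataset(r"~/DISDRO_IIUNAM.nc"), xr.open_dataset(r"~/PLUVIO_PREPA2.nc"))`, and close them afterwards.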

```python
filename = r"~/PLUVIO_PREPA2.nc"  # insert the full path to the file
dataset = xr.open_dataset(filename)
dataset
```

To list the variable names, use:

```python
varnames = list(dataset.variables.keys())
varnames
```

Select a variable name and read its time series as follows:

```python
varname = "intensity"
data = dataset[varname]   # extract the variable as an xarray DataArray
serie = data.to_pandas()  # convert to a pandas Series
```

To discard physically unrealistic values, replace them with NaN as in the following example:

```python
mask        = serie > 400  # physically unrealistic intensities above 400 mm/h
serie[mask] = np.nan       # set unrealistic values to NaN
prec        = serie / 60   # convert intensity (mm/h) to rainfall depth per minute (mm)
```
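Assigning through a boolean mask modifies `serie` in place. An equivalent that returns a new series and leaves the original untouched is `Series.where`, which keeps values where the condition holds and inserts NaN elsewhere. A minimal sketch on a synthetic intensity series, reusing the 400 mm/h threshold from the cell above:

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute intensities in mm/h, including two spurious spikes
time = pd.date_range("2018-08-30 18:00", periods=6, freq="1min")
serie = pd.Series([0.0, 12.0, 950.0, 30.0, 401.0, 5.0], index=time)

# Keep values <= 400 mm/h and replace the rest with NaN (returns a new series)
clean = serie.where(serie <= 400)
n_masked = int(clean.isna().sum())  # number of discarded (or already missing) records

prec = clean / 60  # rainfall depth per minute in mm
```

Counting the NaN values after masking is a quick way to check how much of the record a given threshold removes.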

Now plot the results:

```python
fig, ax = plt.subplots(figsize=(7, 3.5))
prec.plot(ax=ax, linewidth=0.8)
ax.grid(True)
ax.set_xlabel("")
ax.set_ylabel("Precipitation [mm]")
fig.tight_layout()
```
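A cumulative rainfall curve is often easier to read than per-minute depths. A minimal sketch with `cumsum` on a synthetic series; pandas `cumsum` skips NaN but leaves those positions as NaN in the result, so the sketch treats masked minutes as zero rainfall, which is an assumption that may not suit records with long gaps:

```python
import numpy as np
import pandas as pd

# Synthetic per-minute depths in mm, with one masked (NaN) record
time = pd.date_range("2018-08-30 18:00", periods=5, freq="1min")
prec = pd.Series([0.1, 0.2, np.nan, 0.3, 0.0], index=time)

# Assumption: masked minutes contributed no rainfall
cumulative = prec.fillna(0.0).cumsum()  # running total in mm
total = cumulative.iloc[-1]
```

To plot it, pass `cumulative` instead of `prec` to the plotting cell above and adjust the y-axis label.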

Don't forget to close the netCDF connection.

```python
dataset.close()
```