# The detailed demonstration of the tropical rainfall diagnostic for ICON data:

# Calculation of histograms

### Content


1. [The load of packages, modules, and data](#1-load-of-packages-modules-and-data)

2. [Calculation of histogram of precipitation/tropical precipitation](#2-calculation-of-histogram-of-precipitationtropicalprecipitation)

    2.1. [with continuous uniform binning](#21-with-continuous-uniform-binning)

    2.2. [with non-uniform (log-spaced) binning](#22-with-non-uniform-log-spaced-binning)

    2.3. [in the lazy (or delayed) mode](#23-in-the-lazy-or-delayed-mode)
    
    2.4. [with non-default tropical latitude band](#24-with-non-default-latitude-band)

    2.5. [with a specific time band](#25-with-a-specific-time-band)

    2.6. [with weights (weights=reader.grid_area)](#26-with-weights-weightsreadergrid_area)
 

3. [Loading the histogram to/from storage](#3-loading-the-histogram-tofrom-storage)

[Go to the end of file](#the-end)

#

## 1. The load of packages, modules, and data

In [1]:
import sys
from aqua import Reader
sys.path.insert(0, '../')
from tropical_rainfall_class import TR_PR_Diagnostic as TR_PR_Diag

FDB5 binary library not present on system, disabling FDB support.


#### ICON data

In [2]:
reader = Reader(model="ICON", exp="ngc2009", source="lra-r100-monthly")
icon = reader.retrieve(streaming=True, stream_step=1, stream_unit='days') 

## 2. Calculation of histogram of tropical precipitation


### 2.1. with continuous uniform binning 

##### We can calculate histograms with uniform binning by initializing the following attributes of the diagnostic:
 - num_of_bins
 - first_edge
 - width_of_bin
 
##### In this case, all bins in the histogram have the same width, `width_of_bin`.
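
##### The relation between the three attributes and the bin edges can be sketched with plain NumPy. This is a minimal illustration that assumes the edges start at `first_edge` and are spaced by `width_of_bin`; how the diagnostic builds its edges internally is not shown in this notebook.

```python
import numpy as np

# Same values as in the diagnostic initialization in this section.
num_of_bins = 20
first_edge = 0
width_of_bin = 1 * 10**(-6) / 20

# Uniform binning: num_of_bins + 1 equally spaced edges (assumed layout).
edges = first_edge + width_of_bin * np.arange(num_of_bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

print(edges[0], edges[-1])   # first and last edge
print(len(centers))          # number of bins
```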

In [3]:
diag = TR_PR_Diag(num_of_bins = 20, first_edge = 0, width_of_bin = 1*10**(-6)/20)

##### The histogram is then calculated with the `histogram` function:

In [4]:
hist_icon_trop  = diag.histogram(icon)
hist_icon_trop



The obtained xarray.Dataset doesn't have global attributes. Consider adding global attributes manually to the dataset.


##### The output of the `histogram` function is an xarray.Dataset with two coordinates:
- center_of_bin:   the center of each bin
- width:           the width of each bin
##### It contains three variables:
- counts:       the number of observations that fall into each bin
- frequency:    the counts in each bin, normalized by the total number of counts, so the frequencies sum to 1
- pdf:          the counts in each bin, normalized by the total number of counts and by the width of the bin

##### The dataset also has local and global attributes. The local attributes describe the time and space grid on which the diagnostic performed the calculation:
- time_band:    the times of the first and last elements in the dataset and the frequency of the time grid
- lat_band:     the maximum and minimum values of the tropical latitude band and the frequency of the latitude grid
- lon_band:     the maximum and minimum values of the longitude and the frequency of the longitude grid

##### The global attribute `history` records when the histogram was calculated and the values of `time_band`, `lat_band`, and `lon_band`.
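
##### The definitions of `counts`, `frequency`, and `pdf` given above can be checked on synthetic data. The sketch below uses `numpy.histogram` and random values, not the diagnostic or ICON data; it only illustrates the two normalizations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic precipitation-rate-like values; not ICON data.
values = rng.uniform(0.0, 1e-6, size=10_000)

edges = np.linspace(0.0, 1e-6, 21)      # 20 uniform bins
width = np.diff(edges)

counts, _ = np.histogram(values, bins=edges)
frequency = counts / counts.sum()       # normalized by the total number of counts
pdf = frequency / width                 # additionally normalized by the bin width

print(frequency.sum())                  # frequencies sum to 1
print((pdf * width).sum())              # the pdf integrates to 1 over the bins
```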

### 2.2.  with non-uniform (log-spaced) binning  

##### To calculate the histogram with non-uniform binning, set the `bins` attribute of the diagnostic instead of `num_of_bins`, `first_edge`, and `width_of_bin`.

In [5]:
bins = [1.00000000e-09, 1.63789371e-09, 2.68269580e-09, 4.39397056e-09,
       7.19685673e-09, 1.17876863e-08, 1.93069773e-08, 3.16227766e-08,
       5.17947468e-08, 8.48342898e-08, 1.38949549e-07, 2.27584593e-07,
       3.72759372e-07, 6.10540230e-07, 1.00000000e-06]
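
##### The edges listed above are evenly spaced in log10 between 1e-9 and 1e-6, so the same list can be generated instead of typed by hand. A short sketch with `numpy.logspace`; the variable name `bins_logspaced` is only illustrative.

```python
import numpy as np

# 15 edges (14 bins) spaced evenly in log10 between 1e-9 and 1e-6.
bins_logspaced = np.logspace(-9, -6, 15)

print(bins_logspaced[:3])
# Neighbouring edges have a constant ratio, so the bin width grows with the value.
print(bins_logspaced[1:] / bins_logspaced[:-1])
```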

In [6]:
diag = TR_PR_Diag(bins = bins)

In [8]:
hist_icon_trop_logspaced  = diag.histogram(icon)
hist_icon_trop_logspaced

The obtained xarray.Dataset doesn't have global attributes. Consider adding global attributes manually to the dataset.




### 2.3. in the lazy (or delayed) mode

##### To perform the calculation in the so-called lazy mode, pass `lazy=True` to the `histogram` function.

In [9]:
diag = TR_PR_Diag(num_of_bins = 20, first_edge = 0, width_of_bin = 1*10**(-6)/20)

In [10]:
hist_icon_lazy  = diag.histogram(icon,  lazy=True) 
hist_icon_lazy



The obtained xarray.Dataset doesn't have global attributes. Consider adding global attributes manually to the dataset.


| Bytes (array / chunk) | Shape (array / chunk) | Dask graph | Data type |
|---|---|---|---|
| 160 B / 160 B | (20,) / (20,) | 1 chunks in 5 graph layers | float64 numpy.ndarray |


##### `Note`: In lazy mode the function's output is different: the Dataset contains only the not-yet-computed counts. To add the frequency and pdf variables to the histogram Dataset, apply the function `add_frequency_and_pdf` (but only when you are actually ready to compute the histogram).

In [11]:
diag.add_frequency_and_pdf(tprate_dataset=hist_icon_lazy)

| Bytes (array / chunk) | Shape (array / chunk) | Dask graph | Data type |
|---|---|---|---|
| 160 B / 160 B | (20,) / (20,) | 1 chunks in 5 graph layers | float64 numpy.ndarray |
| 160 B / 160 B | (20,) / (20,) | 1 chunks in 47 graph layers | float64 numpy.ndarray |
| 160 B / 160 B | (20,) / (20,) | 1 chunks in 49 graph layers | float64 numpy.ndarray |
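
##### The idea behind the lazy mode can be illustrated with `dask.array` alone. This is a sketch on synthetic data, assuming `dask` is installed; it does not use the diagnostic. The histogram is first only described as a task graph, and the counts, frequency, and pdf become concrete numbers after `compute()`.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
# Synthetic values standing in for precipitation rates; not ICON data.
values = da.from_array(rng.uniform(0.0, 1e-6, size=10_000), chunks=2_500)

edges = np.linspace(0.0, 1e-6, 21)      # 20 uniform bins

# Building the histogram only records a task graph; nothing is computed yet.
lazy_counts, _ = da.histogram(values, bins=edges)
print(type(lazy_counts))

# The work happens here, when the result is explicitly requested.
counts = lazy_counts.compute()
frequency = counts / counts.sum()
pdf = frequency / np.diff(edges)
```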


### 2.4. with non-default latitude band 

##### The tropical latitude band, by default, is the following

In [7]:
diag.trop_lat

10

##### You can modify the tropical latitude band as 

In [9]:
diag.trop_lat = 20
diag.trop_lat

20

##### You can also set the latitude band through the `trop_lat` argument of the `histogram` function

In [12]:
hist_icon_trop  = diag.histogram(icon, trop_lat=30)
hist_icon_trop

The obtained xarray.Dataset doesn't have global attributes. Consider adding global attributes manually to the dataset.
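
##### A minimal sketch of what a latitude band selection does, on a synthetic grid. It assumes that `trop_lat` denotes a band symmetric about the equator, from `-trop_lat` to `+trop_lat`; the helper `select_tropics` is hypothetical and not part of the diagnostic.

```python
import numpy as np

# Hypothetical 1-degree latitude grid and a synthetic field on it.
lat = np.arange(-89.5, 90.0, 1.0)
field = np.ones_like(lat)

def select_tropics(lat, field, trop_lat=10):
    """Keep points with -trop_lat <= lat <= trop_lat (assumed convention)."""
    mask = np.abs(lat) <= trop_lat
    return lat[mask], field[mask]

for trop_lat in (10, 20, 30):
    lat_sel, _ = select_tropics(lat, field, trop_lat)
    print(trop_lat, lat_sel.min(), lat_sel.max(), lat_sel.size)
```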




### 2.5. with a specific time band 

##### You can specify `s_time` and `f_time` as integers. For example, 

In [13]:
diag = TR_PR_Diag( trop_lat = 30,  num_of_bins = 20, first_edge = 0, width_of_bin = 1*10**(-6)/15, s_time = 0, f_time = 47)

In [14]:
hist_icon = diag.histogram(icon)
hist_icon

The obtained xarray.Dataset doesn't have global attributes. Consider adding global attributes manually to the dataset.




##### Also, you can specify `s_time` and `f_time` as strings. For example, 

In [16]:
diag.s_time = '2020:01'
diag.f_time ='2020/03/20/12'

##### or

In [17]:
diag.s_time = '2020'
diag.f_time ='2020.03.20'
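
##### The strings above mix several separators (`:`, `/`, `.`). One way such strings could be normalized is sketched below. The helper `parse_time_band` is hypothetical, and the defaults for missing fields are assumptions; this notebook does not show how the diagnostic itself interprets the strings.

```python
import re
from datetime import datetime

def parse_time_band(value):
    """Hypothetical helper: turn 'YYYY[sep]MM[sep]DD[sep]HH' into a datetime.

    Any run of non-digits is treated as a separator. A missing month or day
    defaults to 1 and a missing hour defaults to 0 (assumed defaults).
    """
    parts = [int(p) for p in re.findall(r"\d+", str(value))]
    if not parts or len(parts) > 4:
        raise ValueError(f"cannot interpret time band: {value!r}")
    defaults = [None, 1, 1, 0]           # year, month, day, hour
    year, month, day, hour = parts + defaults[len(parts):]
    return datetime(year, month, day, hour)

for text in ("2020:01", "2020/03/20/12", "2020", "2020.03.20"):
    print(text, "->", parse_time_band(text))
```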

##### It is also possible to specify only a band of years or only a band of months. For example, we can select the months from March to June across the whole dataset as

In [18]:
diag.s_month = 3
diag.f_month = 6 
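
##### The effect of a month band can be sketched on a synthetic monthly time axis. The sketch assumes that both `s_month` and `f_month` are included in the selection, which this notebook does not state explicitly.

```python
import numpy as np

# Synthetic monthly time axis for one year; not ICON data.
time = np.arange("2020-01", "2021-01", dtype="datetime64[M]")

# Month number 1..12 extracted from the datetime64[M] values.
month = time.astype(int) % 12 + 1

s_month, f_month = 3, 6
# Assumption of this sketch: both ends of the band are included.
selected = time[(month >= s_month) & (month <= f_month)]
print(selected)
```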

### 2.6. with weights (weights=reader.grid_area)

##### The `histogram` function can also calculate the histogram with weights. Such computations are fast compared to standard methods because they are based on the `dask_histogram` package.

In [15]:
diag = TR_PR_Diag(num_of_bins = 20, first_edge = 0, width_of_bin = 1*10**(-6)/15, s_time = 0, f_time = 47)

In [16]:
hist_icon_trop_weighted  = diag.histogram(icon, weights=reader.grid_area)
hist_icon_trop_weighted



The obtained xarray.Dataset doesn't have global attributes. Consider adding global attributes manually to the dataset.
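
##### What weighting changes can be seen on a small synthetic example with `numpy.histogram`. The cosine-of-latitude weights below are only a stand-in for `reader.grid_area`, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: one value per latitude point; not ICON data.
lat = np.linspace(-9.5, 9.5, 20)
values = rng.uniform(0.0, 1e-6, size=lat.size)

# Stand-in for reader.grid_area: on a regular grid, cell area scales with cos(lat).
weights = np.cos(np.deg2rad(lat))

edges = np.linspace(0.0, 1e-6, 21)
unweighted, _ = np.histogram(values, bins=edges)
weighted, _ = np.histogram(values, bins=edges, weights=weights)

# Each point now contributes its weight instead of 1 to its bin.
print(unweighted.sum(), weighted.sum(), weights.sum())
```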


##### You can also pass only the `bins` attribute of the class instead of specifying `num_of_bins`, `first_edge`, and `width_of_bin`.

##### But such calculations are much `slower` (by a factor of about 50)!

In [17]:
diag = TR_PR_Diag(bins = bins)

In [18]:
hist_icon_trop_weighted  = diag.histogram(icon,  weights=reader.grid_area)
hist_icon_trop_weighted

The obtained xarray.Dataset doesn't have global attributes. Consider adding global attributes manually to the dataset.




#

## 3. Loading the histogram to/from storage

#### Saving the histogram in storage. 

##### `Note`: the file name is unique and depends on the time band

In [19]:
path_to_histogram='/work/bb1153/b382267/AQUA/histograms/'
diag.dataset_to_netcdf(dataset = hist_icon, path_to_netcdf = path_to_histogram, name_of_file = 'icon')

'/work/bb1153/b382267/AQUA/histograms/icon_2020-02-01T00_histogram.pkl'

#### Loading/opening the histogram from the storage 

##### For information about how to open and merge a list of histograms, see the notebook `diagnostic_vs_streaming.ipynb`

In [20]:
diag.open_dataset(path_to_netcdf = '/work/bb1153/b382267/AQUA/histograms/icon_2020-01-20T00_2020-01-20T23_histogram.nc')
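
##### The file name opened above has the form `<name>_<start>_<end>_histogram.nc`. A sketch of how a name that depends on the time band could be composed is given below; the helper `histogram_filename` and the directory are hypothetical, and the diagnostic's own naming code is not shown in this notebook.

```python
from datetime import datetime
from pathlib import PurePosixPath

def histogram_filename(path, name, start, end):
    """Hypothetical helper mimicking the pattern of the file name above."""
    stamp = "%Y-%m-%dT%H"
    file_name = f"{name}_{start.strftime(stamp)}_{end.strftime(stamp)}_histogram.nc"
    return str(PurePosixPath(path) / file_name)

# Illustrative directory; not the path used in this notebook.
result = histogram_filename("/tmp/histograms", "icon",
                            datetime(2020, 1, 20, 0), datetime(2020, 1, 20, 23))
print(result)
```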

[Back to the top of file](#content)

#

##### The end