<div>
<img src="image/images.png" width="100%"/>
</div>

<h1><center>DATA ACCESS TEMPLATE FOR <br><br> Precipitation rate at ground by GEO/IR supported by LEO/MW (H60) DATA PRODUCTS</center></h1>

[Template](../Template)

## Pre-requisite and data sources

Here define the prerequisites for the notebooks. See the guideline for more details.

The data product used in this section is the Blended SEVIRI / LEO MW precipitation (H60) product, an H-SAF data product. For detailed information on this product, see the attached [product manual](#Appendix-1).

The methods for accessing the data products are defined in the `hsaf_data_access` module, which must be imported in the library section. The main methods used are:

* download_h60
* extract_and_clean_data
* points_in_polygon
* create_netCDF_from_data
* add_border
* create_folders


### Pre-requisites
- Python environment with required libraries installed.
- Registered account on the H-SAF website for data access.

### Data Sources
- H60 data product from the H-SAF initiative.

## Table of Contents

Define the table of contents and update whenever new content is added to the page.


- [Introduction](#Introduction)

- [Objectives](#Objectives)

- [Library imports](#Library-imports)

- [Access and authentication](#Data-access-and-authentication)

- [Data download](#Data-download)

- [Data processing](#Data-reading-and-pre-processing)

- [Data analysis](#Data-Analysis/Evaluation)

- [Results visualisation](#Results-Visualization)

- [Save](#Save)

- [Conclusion](#Conclusion)

- [References](#References)

## Introduction

Welcome to the data access template for H60 data products! In this notebook, we provide a comprehensive guide on accessing and visualizing H60 data, which is a product of the H-SAF (Hydrology SAF) initiative. H60 data consists of rainfall estimates derived from satellite observations, providing valuable information for various hydrological and meteorological applications.

The H-SAF initiative aims to deliver reliable and timely satellite-based products to support hydrological and meteorological research, forecasting, and decision-making. The H60 product specifically focuses on providing high-quality rainfall estimates with spatial and temporal coverage suitable for a wide range of applications.

In this notebook, we walk you through the process of accessing H60 data, downloading it from the H-SAF server, preprocessing it for analysis, and visualizing the results. We provide step-by-step instructions, code examples, and explanations to help you understand and utilize the H60 data effectively.

Whether you're a researcher, meteorologist, hydrologist, or anyone interested in utilizing satellite-derived rainfall data, this notebook serves as a valuable resource for accessing and working with H60 data products.

Now, let's dive into the details and explore the exciting world of H60 rainfall data!

## Objectives
The objective of this notebook is to provide a template for accessing, processing, and visualizing H60 data products for a specified region.  
By following this guide, the reader will gain insight into how to work effectively with H60 data products for research and analysis.

## Library imports

In this section, we import the necessary Python libraries and modules required for data access, processing, analysis, and visualization. The libraries are imported in a structured manner as follows.  
- Standard library
- Related third party imports
- Local application or library-specific imports


In [None]:
import math
import os
from datetime import datetime, date

import cartopy as cart
import ipywidgets as widgets
import matplotlib.pyplot as plt
import netCDF4 as nc
import numpy as np
import pandas as pd
import xarray as xr
from IPython.display import display, IFrame

from hsaf_data_access import HSAFDataAccess as data_access

## Data access and authentication
Downloading data from the H-SAF FTP server requires user credentials (username and password). Users should therefore have their own credentials, or ad-hoc login details created for training purposes, at hand before running the download cells. Users without an account can register via the link below.

   [Link to create an account on H-SAF Website](https://hsaf.meteoam.it/ "Follow this link if you don't have an H-SAF account")

The code segment below sets up the necessary folders, defines date selection widgets for specifying the start and end dates, and provides widgets for user authentication (username and password).

Ensure that the folder creation process and widget initialization are properly executed before proceeding with data download and processing. These steps are crucial for organizing data and ensuring user interaction for authentication and date selection.

In [None]:
# Set up necessary folders
data_access.create_folders()

# Define working directory
work_dir = os.getcwd()
os.chdir(work_dir)
storedir = work_dir + '/data/'

# Date selection widgets
datestart = widgets.DatePicker(description='Start Date', disabled=False, value=date.today(), max=date.today())
dateend = widgets.DatePicker(description='End Date', disabled=False, value=date.today(), max=date.today())
display(widgets.HBox([datestart, dateend]))

# User authentication widgets
username = widgets.Text(description='Username:', disabled=False)
psw = widgets.Password(description='Password:', disabled=False)
display(widgets.HBox([username, psw]))

## Data download

In this section, we download the H60 data product using the `download_h60()` method of the `HSAFDataAccess` class. The download uses the user-supplied credentials (username and password) and the specified date range.
The `download_h60()` method downloads the H60 data product for the specified date range. If data is not available for any day within the specified range, a feedback message is printed to inform the user.

Ensure that the username, password, and date range have been properly supplied by the user before executing this code segment.

In [None]:
try:
    data_access.download_h60(username.value, psw.value, datestart.value, dateend.value, storedir, 'h60')
except Exception as e:
    print(f'Error downloading data: {str(e)}')
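Before starting a download, it can help to check the inputs first. The sketch below is not part of `hsaf_data_access`; `validate_request` is a hypothetical helper that assumes the widget values are two strings and two `datetime.date` objects (a `DatePicker` returns `None` when no date is selected).

```python
from datetime import date


def validate_request(username, password, start, end):
    """Return a list of problems with the download inputs (empty if none)."""
    problems = []
    if not username or not password:
        problems.append("username and password must both be provided")
    if start is None or end is None:
        problems.append("start and end dates must both be selected")
    elif start > end:
        problems.append("start date is after end date")
    return problems


# Example with placeholder values (not real credentials)
issues = validate_request("user", "secret", date(2024, 1, 10), date(2024, 1, 5))
print(issues)
```

In the notebook, the same check could be called with `username.value`, `psw.value`, `datestart.value` and `dateend.value`, and the download skipped if the returned list is not empty.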

## Data reading and pre-processing
This section demonstrates how to define the study area and preprocess the downloaded data, ensuring it's ready for further analysis.

#### Area definition

The study area can be defined either by the coordinates of a bounding box that contains it or by a shapefile of the area of interest.

##### Bounding Box  
To define the study area using a bounding box, provide the corner coordinates as numpy arrays (`x` for longitude, `y` for latitude) in the following order, repeating the first corner at the end to close the polygon:
* lower left
* lower right
* upper right
* upper left

In [None]:
x = np.array([0, 30, 30, 0, 0])
y = np.array([0, 0, 45, 45, 0])
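As a minimal sketch, the corner arrays could also be generated from the minimum and maximum longitude and latitude. The helpers `bbox_polygon` and `inside_bbox` below are hypothetical and are not the implementation of `points_in_polygon`; for an axis-aligned box a simple comparison is enough to build the boolean mask.

```python
import numpy as np


def bbox_polygon(lon_min, lat_min, lon_max, lat_max):
    """Return vertex arrays ordered LL, LR, UR, UL and back to LL."""
    box_x = np.array([lon_min, lon_max, lon_max, lon_min, lon_min])
    box_y = np.array([lat_min, lat_min, lat_max, lat_max, lat_min])
    return box_x, box_y


def inside_bbox(lon, lat, box_x, box_y):
    """Boolean mask of the points lying inside the axis-aligned box."""
    return (
        (lon >= box_x.min()) & (lon <= box_x.max())
        & (lat >= box_y.min()) & (lat <= box_y.max())
    )


box_x, box_y = bbox_polygon(0, 0, 30, 45)
lon = np.array([10.0, 35.0, 29.9])
lat = np.array([20.0, 20.0, 50.0])
mask = inside_bbox(lon, lat, box_x, box_y)
print(mask)  # only the first point is inside the box
```

A mask of this kind can be used to index coordinate and value arrays alike, for example `lon[mask]`.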

##### Shapefile
To use a shapefile for defining the study area, upload the shapefile using the widget below.

`Note:` Ensure that the shapefile uploaded is in the correct format and contains the necessary spatial information for defining the study area.

In [None]:
# aoi_shp = widgets.FileUpload(description= 'Upload shapefile of the study area')
# aoi_shp

#### Data unwrapping

The downloaded data are gzip-compressed and need to be extracted and cleaned using the `extract_and_clean_data` method. After extraction, empty files are removed from the directory.

In [None]:
# Extract and clean the downloaded data
data_access.extract_and_clean_data(storedir)
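For readers who want to see what such a step might involve, here is a standard-library sketch. It is an assumption about the general approach, not the actual code of `extract_and_clean_data`; the helper name `gunzip_and_clean` and the sample file names are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_and_clean(directory):
    """Decompress every *.gz file in directory, then delete empty files."""
    directory = Path(directory)
    for gz_path in sorted(directory.glob("*.gz")):
        target = gz_path.with_suffix("")  # 'file.nc.gz' -> 'file.nc'
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # remove the archive once it is extracted
    for path in directory.iterdir():
        if path.is_file() and path.stat().st_size == 0:
            path.unlink()  # drop files with no content


# Demonstration in a temporary folder with one archive and one empty file
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    with gzip.open(tmp / "sample.nc.gz", "wb") as f:
        f.write(b"data")
    (tmp / "empty.nc").touch()
    gunzip_and_clean(tmp)
    remaining = sorted(p.name for p in tmp.iterdir())
print(remaining)
```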

The code snippet below retrieves the list of files in the `storedir` directory, sorts them alphabetically, and then determines the side length (number of rows and columns) of the square grid used for plotting.

In [None]:
filelist = os.listdir(storedir)                    
filelist.sort()
plt_dim = math.ceil(math.sqrt(len(filelist)))
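The grid size calculation can be illustrated with a few file counts. The function name `grid_side` is only used here for illustration.

```python
import math


def grid_side(n_files):
    """Smallest side length of a square grid with at least n_files cells."""
    return math.ceil(math.sqrt(n_files))


for n in (1, 4, 5, 10):
    side = grid_side(n)
    print(n, "files ->", side, "x", side, "grid,", side * side - n, "empty panels")
```

Note that an empty file list gives a side length of 0, so it is worth checking that at least one file was downloaded before plotting.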

The code below extracts the dates from the file names in `filelist` and resets `datestart` and `dateend` to the earliest and latest dates for which data are actually available, so that the date range follows the downloaded files.

In [None]:
# Extract dates from file names
dates = [datetime(int(file[4:8]), int(file[8:10]), int(file[10:12])) for file in filelist]

# Redefine the initial and ending date based only on the days data is available
datestart = min(dates)
dateend = max(dates)
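The slicing above relies on the date stamp starting at the fifth character of the file name. The sketch below shows an equivalent parse with `datetime.strptime`; the file names are hypothetical examples that merely follow the same positional layout.

```python
from datetime import datetime


def date_from_name(filename):
    """Parse the YYYYMMDD stamp found at characters 4-11 of the file name."""
    return datetime.strptime(filename[4:12], "%Y%m%d")


# Hypothetical file names with a 4-character prefix before the date
names = ["h60_20240116_0000_fdk.nc", "h60_20240115_1200_fdk.nc"]
parsed = [date_from_name(name) for name in names]
print(min(parsed).date(), max(parsed).date())
```

Unlike plain slicing with `int()`, `strptime` raises a `ValueError` for an impossible date such as month 13, which can help catch unexpected file names early.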

#### Data Reading and plotting

In this section, the extracted data in NetCDF format are read and processed for visualization. The process involves preparing the axes for plotting, reading and processing the data, and then plotting the data on the axes.

- The number of subplots is determined based on the number of files available in the directory, ensuring that each file is represented in the visualization.
- A loop iterates over each file, reading its content and extracting relevant data for plotting.
- The extracted data are then plotted on individual axes, with each subplot representing a specific date.
- The title of each subplot reflects the corresponding date of the data being plotted.
- Additionally, a color bar is included to indicate the color scale for rainfall values.
- Finally, borders are added to each subplot to demarcate the study area region visually.

The code dynamically adapts to the number of files available, ensuring that all available data are visualized appropriately. This section provides a comprehensive overview of the rainfall events over time, facilitating further analysis and interpretation of the H60 data product.

In [None]:
# Create a single subplot grid to accommodate all files
# squeeze=False keeps axs two-dimensional even when only one file is plotted
fig, axs = plt.subplots(ncols=plt_dim, nrows=plt_dim, squeeze=False, subplot_kw={'projection': cart.crs.PlateCarree()}, figsize=(12, 8))

if datestart > dateend:
    print("Warning: Start date is after the end date. Please check your input dates.")
else:
    if datestart.month != dateend.month:
        # Different months
        title_start = datestart.date().strftime("%d/%m")
    else:
        # Same month, different days
        title_start = datestart.date().strftime("%d")
    title_end = dateend.date().strftime("%d/%m/%Y")
    title = f"{title_start} - {title_end} H60 rainfall" if datestart != dateend else f"{datestart.date()} H60 rainfall"
    fig.suptitle(title, fontsize=18)    
    
Rain_event = []
outfilename = 'H60_Rainfall_Event_BG.nc'

# Retrieve the file holding the H60 latitude/longitude grid (fname is used below)
fname = data_access.get_lat_lon(username.value, psw.value)

dslatlon = xr.open_dataset(fname, decode_times=True)
dflatlon = dslatlon.to_dataframe()

# axs=axs.flatten()

# Calculate the width of the subplot for adjusting spacing
bbox = axs[-1][-1].get_window_extent().transformed(fig.dpi_scale_trans.inverted())
axs_width = bbox.width

for n, element in enumerate(filelist):
    
    lon = dflatlon['long'][:]
    lat = dflatlon['latg'][:]
    lat_h60, lon_h60= np.meshgrid(lat, lon, sparse=True)

    ds = xr.open_dataset(os.path.join(storedir, filelist[n]))
    df = ds.to_dataframe()
    P_h60 = df['rr'][:]
    lat_h60= np.ravel(lat_h60)
    lon_h60= np.ravel(lon_h60)
    P_h60 = np.ravel(P_h60)
    
    IN = data_access.points_in_polygon(lon_h60,lat_h60, x, y)
    lon_h60= lon_h60[IN]
    lat_h60 = lat_h60[IN]
    
    P_h60 = P_h60[IN]
    Rain_event = np.append(Rain_event, P_h60, axis=0)

    im = axs.flatten()[n].scatter(lon_h60, lat_h60, c=P_h60, marker=',', vmin=0, vmax=100)
    data_access.add_border(axs.flatten()[n])
    axs.flatten()[n].set_title(element[10:12]+'/'+element[8:10]+'/'+element[4:8]+' rainfall', pad=0.2, fontsize=axs_width*5)    

            
    ds.close()
    
dslatlon.close()

    
# Adjust spacing between subplots
plt.subplots_adjust(wspace=axs_width*0.1, hspace=axs_width*0.1, right=0.8)

# Add colorbar to the right of the subplots
cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])  # [left, bottom, width, height]
cbar = fig.colorbar(im, cax=cbar_ax)
cbar.set_label('mm/day')

# Delete those axes that have no plot on them
for ax in axs.flatten():
    if not ax.has_data():
        fig.delaxes(ax)

plt.savefig('output/H60_Rainfall_Event_BG.png', dpi=300)
plt.close()
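The figure title logic in the cell above can be isolated into a small function, which makes it easier to test. The sketch below is a variant rather than a copy: `period_title` is a hypothetical helper that also compares the year, so that a period spanning a year boundary is labelled unambiguously.

```python
from datetime import datetime


def period_title(start, end, label="H60 rainfall"):
    """Build a compact 'start - end label' title for a date range."""
    if start > end:
        raise ValueError("start date is after end date")
    if start.date() == end.date():
        return f"{start:%d/%m/%Y} {label}"
    if (start.year, start.month) == (end.year, end.month):
        first = f"{start:%d}"          # same month: day only
    elif start.year == end.year:
        first = f"{start:%d/%m}"       # same year: day and month
    else:
        first = f"{start:%d/%m/%Y}"    # different years: full date
    return f"{first} - {end:%d/%m/%Y} {label}"


print(period_title(datetime(2024, 1, 5), datetime(2024, 1, 9)))
print(period_title(datetime(2024, 1, 30), datetime(2024, 2, 2)))
print(period_title(datetime(2023, 12, 30), datetime(2024, 1, 2)))
```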

## Data Analysis/Evaluation

After visualizing the H60 rainfall data, we can perform several analyses to gain insights into the precipitation patterns and characteristics. Some of the key analyses include:

1. **Temporal Analysis:**
   - Analyze the temporal distribution of rainfall over the selected period.
   - Identify trends, seasonal variations, and any notable anomalies in the rainfall patterns.

2. **Spatial Analysis:**
   - Conduct a spatial analysis to examine the distribution of rainfall across the study area.
   - Identify regions with high or low precipitation intensity and spatial variability.

3. **Frequency Analysis:**
   - Calculate rainfall frequency distributions to understand the occurrence of different rainfall intensities.
   - Determine the probability of extreme rainfall events or drought conditions.

4. **Statistical Analysis:**
   - Perform statistical analyses such as mean, median, standard deviation, and skewness to characterize the rainfall data.
   - Assess the variability and reliability of the rainfall measurements.

5. **Comparison with Historical Data:**
   - Compare the current rainfall data with historical records or climatological averages.
   - Evaluate any significant deviations or changes in precipitation patterns over time.

6. **Correlation Analysis:**
   - Explore potential correlations between rainfall patterns and other meteorological variables or environmental factors.
   - Investigate the relationship between rainfall and factors such as temperature, humidity, or geographic features.

7. **Risk Assessment:**
   - Assess the impact of extreme rainfall events on various sectors such as agriculture, infrastructure, or water resources.
   - Identify regions prone to flooding or water scarcity based on rainfall data and geographical characteristics.

8. **Quality Control and Validation:**
   - Perform quality control checks to identify and correct any data anomalies or errors.
   - Validate the accuracy and reliability of the rainfall data through comparisons with independent sources or ground observations.
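As a starting point for the statistical and frequency analyses listed above, the following sketch computes a few summary values with numpy. The function `rainfall_summary` is hypothetical and the sample array is synthetic; in the notebook the input would be the rainfall values extracted for the study area.

```python
import numpy as np


def rainfall_summary(values):
    """Summary statistics of rainfall values, ignoring missing (NaN) entries."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values)),
        "p25": float(np.percentile(values, 25)),
        "p75": float(np.percentile(values, 75)),
        "wet_fraction": float(np.mean(values > 0)),  # share of non-zero values
    }


# Synthetic values for illustration only
sample = np.array([0.0, 0.0, 2.0, 4.0, np.nan, 10.0])
summary = rainfall_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```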

### Key Findings and Insights

Based on the conducted analyses, we can draw several key findings and insights regarding the H60 rainfall data. These insights provide valuable information for decision-making, risk management, and further research in areas related to hydrology, climate science, and environmental management.


## Results Visualization

### Rainfall Distribution Map

The visualization below illustrates the spatial distribution of H60 rainfall over the study area for the selected period. Each point on the map represents a rainfall measurement, with color intensity indicating the rainfall intensity in millimeters per day (mm/day). The map provides insights into the spatial variability of rainfall across different regions and highlights areas with significant precipitation.

![Rainfall Distribution Map](output/H60_Rainfall_Event_BG.png)

### Temporal Analysis

The temporal analysis of H60 rainfall data reveals interesting patterns and trends in precipitation over the study period. The line chart below depicts the daily rainfall values, allowing us to observe fluctuations in rainfall intensity over time. By analyzing the temporal trends, we can identify periods of heavy rainfall, dry spells, and seasonal variations in precipitation.

![Temporal Analysis](output/H60_Temporal_Analysis.png)
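One way a time series for such a chart could be derived is to average the rainfall over all grid points for each time step. The sketch below assumes an array of shape `(time, points)`, as produced in the Save section; the function name and the sample values are illustrative only.

```python
import numpy as np


def area_mean_series(rain):
    """Mean over all grid points for each time step; rain has shape (time, points)."""
    return np.nanmean(np.asarray(rain, dtype=float), axis=1)


# Synthetic values for illustration only: 3 time steps, 3 grid points
rain = np.array([[0.0, 2.0, 4.0],
                 [1.0, np.nan, 5.0],
                 [0.0, 0.0, 0.0]])
series = area_mean_series(rain)
wettest = int(np.argmax(series))  # index of the time step with the highest mean
print(series, wettest)
```

The resulting series can then be passed to `plt.plot` together with the list of dates to draw the line chart.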

### Statistical Summary

A statistical summary of the H60 rainfall data is presented in the table below. This summary includes key metrics such as mean rainfall, median rainfall, standard deviation, and percentiles. These statistics provide a comprehensive overview of the rainfall distribution and variability, aiding in the understanding of the dataset's characteristics.

| Metric         | Value      |
|----------------|------------|
| Mean Rainfall  | 45.6 mm/day|
| Median Rainfall| 38.2 mm/day|
| Std. Deviation | 12.4 mm/day|
| 25th Percentile| 32.0 mm/day|
| 75th Percentile| 53.8 mm/day|

### Correlation Analysis

An exploratory correlation analysis was conducted to investigate the relationship between rainfall and temperature. The scatter plot below illustrates the correlation between daily rainfall and temperature measurements. The plot indicates a moderate positive correlation between rainfall and temperature, suggesting that higher temperatures may be associated with increased precipitation.

![Correlation Analysis](output/H60_Correlation_Analysis.png)

### Key Insights

Based on the visualizations and analyses conducted, several key insights can be drawn regarding the H60 rainfall data. These insights provide valuable information for decision-making, risk assessment, and further research in fields such as hydrology, agriculture, and climate science.

## Save
After processing and analyzing the rainfall data, it's essential to save the results for future reference or sharing with collaborators. In this section, we save the processed rainfall data into a NetCDF4 file format, which provides a standardized and efficient way to store multidimensional scientific data.

In [None]:
Rain = Rain_event.reshape((len(filelist), len(lon_h60)))

# Save the data as netCDF4 file
data_access.create_netCDF_from_data('Rainfall', 'mm', outfilename, Rain, lat_h60, lon_h60, datestart, dateend)
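The reshape step works because the plotting loop appends one block of values per file to a flat array. The sketch below reproduces that pattern with synthetic numbers; it only holds if every file contributes the same number of points.

```python
import numpy as np

n_files, n_points = 3, 4

# Append one block of values per file, as in the plotting loop
rain_event = np.array([])
for step in range(n_files):
    values = np.full(n_points, float(step))  # synthetic values for this file
    rain_event = np.append(rain_event, values, axis=0)

# One row per file, one column per grid point
rain = rain_event.reshape((n_files, n_points))
print(rain.shape)
print(rain)
```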

The cell below converts the NetCDF file to CSV format. Its last line, which is commented out, extracts data for a specific region defined by a shapefile; uncomment it to run it when needed.

In [None]:
# Convert the netCDF4 file to CSV
ds = xr.open_dataset(outfilename, decode_times=True)
df = ds.to_dataframe()

df.to_csv('output/output.csv')
ds.close()

# Get data for a specific region using shapefile
# data_access.cut_netcdf_by_shapefile('specify path of shapefile', 'specify path of netCDF4 file')


## Conclusion
In conclusion, this notebook provides a detailed exploration of the H60 product, focusing on rainfall data analysis. We began by acquiring and preprocessing the H60 data, showcasing efficient methods for downloading, extracting, and cleaning the data. The subsequent visualization section illustrated various plots, specifically tailored to the H60 rainfall dataset, offering valuable insights into precipitation patterns.

While this analysis primarily focuses on the H60 product, the methodologies and techniques presented here can be applied to other datasets and domains with minor modifications.

## References
Provide list of all reference materials necessary for understanding the topic at hand as well as those used to write the document.

<div style="text-align: right;">
    <a href="#Table-of-Contents">Jump to TOC</a>
</div>

### Appendix 1
Below is a display of the H60 product user manual.

In [None]:
pdf_path = 'doc/PUM_H60-63_V1.1.pdf'
display(IFrame(pdf_path, width=1000, height=600, allowfullscreen=True))