<img src="../images/logo_VORTEX.png" width="200" height="auto" alt="Company Logo">

| Project| Authors           | Company                                 | Year | Chapter |
|--------|-------------------|-----------------------------------------|------|---------|
| Pywind | Oriol L & Arnau T | [Vortex FdC](https://www.vortexfdc.com) | 2024 | 3       |

# Chapter 3: Merge

_Overview_
---------
This script demonstrates the process of reading and processing various types of meteorological data files. The goal is to compare measurements from different sources and formats by resampling, interpolating, and merging the data for further analysis.

The script uses helper functions to load and manipulate four input files, covering two sources (measurements and Vortex) in two formats (NetCDF and text):

1. **Measurements (NetCDF)** - Contains multiple heights and variables.
2. **Vortex NetCDF** - NetCDF file format with multiple heights and variables.
3. **Vortex Text Series** - Text file containing time series data of meteorological measurements.
4. **Measurements Text Series** - Text file containing time series data of observations.

_Data Storage_
-------------
The acquired data is stored and processed in two data structures for comparison and analysis:
- **Xarray Dataset**: For handling multi-dimensional arrays of the meteorological data, useful for complex operations and transformations.
- **Pandas DataFrame**: For flexible and powerful data manipulation and analysis, allowing easy integration and comparison of different datasets.
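To make the two structures concrete, the following minimal sketch builds a small synthetic `xarray.Dataset` with a `time` dimension and `M`/`Dir` variables (made-up values, not site data) and converts it to a pandas DataFrame and back. It mirrors the conversions used later in this chapter (`to_dataframe()`, and DataFrame to xarray for the measurement text series), but it is only an illustration, not the helper module's code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly series (illustrative values only)
time = pd.date_range("2009-11-18 13:00", periods=4, freq="1h")
ds = xr.Dataset(
    {
        "M": ("time", np.array([3.6, 2.4, 1.3, 1.2])),            # wind speed (m/s)
        "Dir": ("time", np.array([159.0, 147.6, 162.6, 77.4])),  # wind direction (deg)
    },
    coords={"time": time},
)

# Dataset -> DataFrame: each variable becomes a column, 'time' becomes the index
df = ds.to_dataframe()

# DataFrame -> Dataset: the index becomes the 'time' dimension again
ds_back = xr.Dataset.from_dataframe(df)

print(df.shape)          # (4, 2)
print(list(df.columns))  # ['M', 'Dir']
```

The Dataset keeps dimensions and coordinates explicit, which is why the height selection and interpolation below are done in xarray before switching to DataFrames for merging and statistics.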

_Objective_
----------
- **Read and Interpolate Data**: Load data from NetCDF and text files, and interpolate Vortex data to match the measurement levels.
- **Resample Data**: Convert the time series data to an hourly frequency to ensure uniformity in the analysis.
- **Data Comparison**: Merge the datasets to facilitate a detailed comparison of measurements from different sources.
- **Statistical Overview**: Utilize the `describe()` method from Pandas for a quick statistical summary of the datasets, providing insights into the distribution and characteristics of the data.
- **Concurrent Period Analysis**: Clean the data by removing non-concurrent periods (timestamps where any source has missing data) to focus on the overlapping timeframe for an accurate comparison.

By following these steps, the script aims to provide a comprehensive approach to handling and analyzing meteorological data from various sources, ensuring a clear understanding of the data's behavior and relationships.

### Import Libraries

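```python
import sys
import os

import xarray as xr  # used below; imported explicitly rather than relying on the star import

# Make the helper module in ../examples importable and load its reading functions
sys.path.append(os.path.join(os.getcwd(), '../examples'))
from example_3_merge_functions import *
```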

### Define Paths and Site

```python
SITE = 'froya'
pwd = os.getcwd()
base_path = os.path.join(pwd, '../data')

# NetCDF files: measurements and Vortex SERIES
measurements_netcdf = os.path.join(base_path, f'{SITE}/measurements/obs.nc')
vortex_netcdf = os.path.join(base_path, f'{SITE}/vortex/SERIE/vortex.serie.era5.utc0.nc')

# Text files: Vortex SERIES at 100 m and measurements
vortex_txt = os.path.join(base_path, f'{SITE}/vortex/SERIE/vortex.serie.era5.utc0.100m.txt')
measurements_txt = os.path.join(base_path, f'{SITE}/measurements/obs.txt')

# Print filenames
print('Measurements txt: ', measurements_txt)
print('Vortex txt: ', vortex_txt)
```

```
Measurements txt:  /home/oriol/vortex/git/pywind_private/notebooks/../data/froya/measurements/obs.txt
Vortex txt:  /home/oriol/vortex/git/pywind_private/notebooks/../data/froya/vortex/SERIE/vortex.serie.era5.utc0.100m.txt
```

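A wrong path only surfaces when the cell that reads the file runs. As a small sketch, the same four paths can be built with `pathlib` (swapped in here for illustration instead of `os.path`) and checked before reading. The function names `build_paths` and `missing_files` are hypothetical and not part of the helper module.

```python
from pathlib import Path


def build_paths(base_path, site):
    """Hypothetical helper: return the four input files used in this chapter."""
    site_dir = Path(base_path) / site
    return {
        "measurements_netcdf": site_dir / "measurements" / "obs.nc",
        "vortex_netcdf": site_dir / "vortex" / "SERIE" / "vortex.serie.era5.utc0.nc",
        "vortex_txt": site_dir / "vortex" / "SERIE" / "vortex.serie.era5.utc0.100m.txt",
        "measurements_txt": site_dir / "measurements" / "obs.txt",
    }


def missing_files(paths):
    """Hypothetical helper: return the names of entries whose file does not exist."""
    return [name for name, path in paths.items() if not path.is_file()]


paths = build_paths("../data", "froya")
missing = missing_files(paths)
if missing:
    print("Missing input files:", ", ".join(missing))
```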

### Read Measurements and Vortex Series in NetCDF and Text

```python
# Read NetCDF files
ds_obs_nc = xr.open_dataset(measurements_netcdf)
ds_vortex_nc = xr.open_dataset(vortex_netcdf)
# Uncomment if direction is stored as 'D' instead of 'Dir'
#ds_vortex_nc = ds_vortex_nc.rename_vars({'D': 'Dir'})

# Read text series: Vortex directly into xarray; measurements via a DataFrame, then converted to xarray
ds_vortex_txt = read_vortex_serie(vortex_txt)
df_obs_txt = read_vortex_obs_to_dataframe(measurements_txt)[['M', 'Dir']]
ds_obs_txt = convert_to_xarray(df_obs_txt)[['M', 'Dir']]
```

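The code of `read_vortex_serie` lives in `example_3_merge_functions` and is not shown here. To give a rough idea of what parsing such a file involves, the sketch below reads a whitespace-separated series with pandas and returns an xarray Dataset. The file layout (one metadata line, then a header with `YYYYMMDD HHMM M(m/s) D(deg)` columns) is an **assumption** for illustration; check it against your own file. The function name `parse_text_series` is hypothetical, and the sample values are taken from the `M_vortex_txt`/`Dir_vortex_txt` columns shown in the results below.

```python
import io

import pandas as pd
import xarray as xr

# Assumed layout, for illustration only: metadata line, header line, data rows
SAMPLE = """metadata line (ignored by this sketch)
YYYYMMDD HHMM M(m/s) D(deg)
20091118 1300 5.8 114
20091118 1400 5.9 115
20091118 1500 5.8 118
"""


def parse_text_series(source):
    """Hypothetical parser: returns an xarray Dataset with 'M' and 'Dir' on a 'time' index."""
    df = pd.read_csv(
        source,
        sep=r"\s+",
        skiprows=1,                             # skip the metadata line
        dtype={"YYYYMMDD": str, "HHMM": str},   # keep leading zeros such as '0000'
    )
    # Combine the date and time columns into one timestamp index named 'time'
    timestamps = pd.to_datetime(df["YYYYMMDD"] + df["HHMM"], format="%Y%m%d%H%M")
    df = df.set_index(pd.DatetimeIndex(timestamps, name="time"))
    # Map the assumed column names to the short names used in this chapter
    df = df.rename(columns={"M(m/s)": "M", "D(deg)": "Dir"})[["M", "Dir"]]
    return xr.Dataset.from_dataframe(df)


ds_sample = parse_text_series(io.StringIO(SAMPLE))
print(ds_sample)
```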

#### Interpolate Vortex Series to the Measurement Height and Select M and Dir

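```python
# Highest measurement level
max_height = ds_obs_nc.squeeze().coords['lev'].max().values
print("Max height in measurements: ", max_height)
# Measurements: select that level and keep only wind speed (M) and direction (Dir)
ds_obs_nc = ds_obs_nc.sel(lev=max_height).squeeze().reset_coords(drop=True)[['M', 'Dir']]

# Vortex NetCDF: interpolate (linear by default in xarray) to the measurement height
ds_vortex_nc = ds_vortex_nc.interp(lev=max_height).squeeze().reset_coords(drop=True)[['M', 'Dir']]
# Vortex text series is already at 100 m, so only select the variables
ds_vortex_txt = ds_vortex_txt[['M', 'Dir']].squeeze().reset_coords(drop=True)
```

```
Max height in measurements:  100.0
```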

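`Dataset.interp` uses linear interpolation by default. For a single timestamp, interpolating to the measurement height between the two bracketing model levels reduces to a weighted average of their values. The sketch below computes it by hand and with NumPy's `np.interp`, on made-up levels and speeds (not Vortex output).

```python
import numpy as np

# Illustrative model levels (m) and wind speeds (m/s) for one timestamp
levels = np.array([50.0, 80.0, 120.0, 150.0])
speeds = np.array([6.0, 7.0, 8.0, 8.5])
target = 100.0  # measurement height; assumes levels[0] < target <= levels[-1]

# Linear interpolation: M(z) = M1 + (M2 - M1) * (z - z1) / (z2 - z1)
i = np.searchsorted(levels, target)  # index of the first level >= target
z1, z2 = levels[i - 1], levels[i]
m1, m2 = speeds[i - 1], speeds[i]
m_manual = m1 + (m2 - m1) * (target - z1) / (z2 - z1)

# Same result with NumPy's 1-D linear interpolation
m_numpy = np.interp(target, levels, speeds)

print(m_manual, m_numpy)  # 7.5 7.5
```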

#### Measurements Time Resampling to Hourly

Vortex data needs no resampling, as the SERIES product is already hourly.

```python
# Average the measurements into hourly bins; hours without data become NaN.
# '1h' replaces '1H', whose uppercase alias is deprecated in recent pandas.
ds_obs_nc = ds_obs_nc.resample(time='1h').mean()
ds_obs_txt = ds_obs_txt.resample(time='1h').mean()
```

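`resample(...).mean()` averages every variable arithmetically. That suits `M`, but it can mislead for `Dir`: the mean of 350° and 10° is 180°, although both point roughly north. A common alternative, sketched here with pandas on synthetic 10-minute data (the actual measurement interval is not stated in this notebook), is the unweighted unit-vector mean: average the sine and cosine of the direction and convert back with `arctan2`. The helper name `vector_mean_direction` is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute data over two hours (illustrative values only)
idx = pd.date_range("2009-11-18 13:00", periods=12, freq="10min")
df10 = pd.DataFrame(
    {
        "M": np.linspace(3.0, 4.1, 12),
        "Dir": [350, 10, 350, 10, 350, 10, 90, 90, 90, 90, 90, 90],
    },
    index=idx,
)


def vector_mean_direction(dir_deg):
    """Hypothetical helper: unweighted unit-vector mean of directions in degrees."""
    rad = np.deg2rad(dir_deg)
    mean_dir = np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))
    return mean_dir % 360  # map (-180, 180] to [0, 360)


hourly = pd.DataFrame(
    {
        "M": df10["M"].resample("1h").mean(),            # arithmetic mean is fine for speed
        "Dir_arith": df10["Dir"].resample("1h").mean(),  # what .mean() does to direction
        "Dir_vector": df10["Dir"].resample("1h").agg(vector_mean_direction),
    }
)
print(hourly.round(2))
```

In the first hour the arithmetic mean gives 180°, while the vector mean stays close to north (0°/360°).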

#### Convert All to DataFrames, Rename and Merge

```python
# Convert to Pandas DataFrames (time becomes the index)
df_obs_nc = ds_obs_nc.to_dataframe()
df_vortex_nc = ds_vortex_nc.to_dataframe()
df_obs_txt = ds_obs_txt.to_dataframe()
df_vortex_txt = ds_vortex_txt.to_dataframe()

# Rename columns so they do not share the same name when merging
df_obs_nc.columns = ['M_obs_nc', 'Dir_obs_nc']
df_vortex_nc.columns = ['M_vortex_nc', 'Dir_vortex_nc']
df_obs_txt.columns = ['M_obs_txt', 'Dir_obs_txt']
df_vortex_txt.columns = ['M_vortex_txt', 'Dir_vortex_txt']

# Merge all DataFrames on their time index. merge() defaults to how='inner',
# so only timestamps present in both indexes are kept (values may still be NaN).
df_nc = df_obs_nc.merge(df_vortex_nc, left_index=True, right_index=True)
df_txt = df_obs_txt.merge(df_vortex_txt, left_index=True, right_index=True)
df = df_nc.merge(df_txt, left_index=True, right_index=True)
```

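`DataFrame.merge` with `left_index=True, right_index=True` joins on the time index and defaults to `how='inner'`: timestamps that exist in only one frame are dropped, while NaN values at shared timestamps are kept. The toy example below (made-up values) contrasts this with `how='outer'` and shows `add_suffix` as a shorter way to produce suffixed column names like the ones above.

```python
import pandas as pd

time_a = pd.date_range("2009-11-18 13:00", periods=3, freq="1h")  # 13:00 to 15:00
time_b = pd.date_range("2009-11-18 14:00", periods=3, freq="1h")  # 14:00 to 16:00

obs = pd.DataFrame({"M": [3.6, None, 1.3], "Dir": [159.0, None, 162.6]}, index=time_a)
vortex = pd.DataFrame({"M": [5.9, 5.8, 5.2], "Dir": [115.0, 118.0, 129.0]}, index=time_b)

# add_suffix renames every column, e.g. 'M' -> 'M_obs'
obs = obs.add_suffix("_obs")
vortex = vortex.add_suffix("_vortex")

inner = obs.merge(vortex, left_index=True, right_index=True)  # default how='inner'
outer = obs.merge(vortex, left_index=True, right_index=True, how="outer")

print(len(inner), len(outer))  # 2 4
print(inner)
```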
#### Results

```python
from IPython.display import display

display(df.head().round(2))
print()
display(df.describe().round(2))
```

| time | M_obs_nc | Dir_obs_nc | M_vortex_nc | Dir_vortex_nc | M_obs_txt | Dir_obs_txt | M_vortex_txt | Dir_vortex_txt |
|------|----------|------------|-------------|---------------|-----------|-------------|--------------|----------------|
| 2009-11-18 13:00:00 | 3.62 | 159.0 | 5.81 | 114.23 | 3.62 | 159.0 | 5.8 | 114 |
| 2009-11-18 14:00:00 | 2.35 | 147.58 | 5.91 | 114.55 | 2.35 | 147.58 | 5.9 | 115 |
| 2009-11-18 15:00:00 | 1.25 | 162.58 | 5.83 | 117.74 | 1.26 | 162.58 | 5.8 | 118 |
| 2009-11-18 16:00:00 | 1.2 | 77.42 | 5.24 | 128.54 | 1.2 | 77.42 | 5.2 | 129 |
| 2009-11-18 17:00:00 | 1.84 | 124.08 | 4.22 | 151.34 | 1.84 | 124.08 | 4.2 | 151 |

|       | M_obs_nc | Dir_obs_nc | M_vortex_nc | Dir_vortex_nc | M_obs_txt | Dir_obs_txt | M_vortex_txt | Dir_vortex_txt |
|-------|----------|------------|-------------|---------------|-----------|-------------|--------------|----------------|
| count | 28809.0 | 28809.0 | 44275.0 | 44275.0 | 28809.0 | 28809.0 | 44275.0 | 44275.0 |
| mean | 8.06 | 178.51 | 8.22 | 180.63 | 8.06 | 178.51 | 8.21 | 180.72 |
| std | 4.73 | 91.88 | 4.71 | 88.93 | 4.73 | 91.88 | 4.71 | 89.36 |
| min | 0.29 | 2.17 | 0.1 | 0.04 | 0.29 | 2.17 | 0.1 | 0.0 |
| 25% | 4.61 | 96.0 | 4.73 | 110.63 | 4.61 | 96.0 | 4.7 | 110.0 |
| 50% | 7.04 | 196.5 | 7.37 | 190.02 | 7.04 | 196.5 | 7.4 | 190.0 |
| 75% | 10.57 | 249.33 | 10.71 | 244.68 | 10.57 | 249.33 | 10.7 | 245.0 |
| max | 34.08 | 358.58 | 33.09 | 359.95 | 34.08 | 358.58 | 33.1 | 360.0 |

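In the summary above, `count` reports the number of non-missing values per column, which is why the measurement columns (28809) and the Vortex columns (44275) differ even though they share the same merged index. A quick check, shown here as a small sketch on made-up data, is that `describe().loc['count']` matches `notna().sum()`.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {"M_obs": [3.6, np.nan, 1.3, np.nan], "M_vortex": [5.8, 5.9, 5.8, 5.2]},
    index=pd.date_range("2009-11-18 13:00", periods=4, freq="1h"),
)

counts = df_demo.describe().loc["count"]   # non-NaN values per column
print(counts.tolist())                     # [2.0, 4.0]
print(df_demo.notna().sum().tolist())      # [2, 4]
print(f"Share of hours with measurements: {df_demo['M_obs'].notna().mean():.0%}")  # 50%
```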

#### After Removing Missing Data: Concurrent Period

```python
# Keep only the concurrent period: drop every timestamp where any column is missing
df = df.dropna(how='any', axis=0)

display(df.head().round(2))
print()
display(df.describe().round(2))
```

| time | M_obs_nc | Dir_obs_nc | M_vortex_nc | Dir_vortex_nc | M_obs_txt | Dir_obs_txt | M_vortex_txt | Dir_vortex_txt |
|------|----------|------------|-------------|---------------|-----------|-------------|--------------|----------------|
| 2009-11-18 13:00:00 | 3.62 | 159.0 | 5.81 | 114.23 | 3.62 | 159.0 | 5.8 | 114 |
| 2009-11-18 14:00:00 | 2.35 | 147.58 | 5.91 | 114.55 | 2.35 | 147.58 | 5.9 | 115 |
| 2009-11-18 15:00:00 | 1.25 | 162.58 | 5.83 | 117.74 | 1.26 | 162.58 | 5.8 | 118 |
| 2009-11-18 16:00:00 | 1.2 | 77.42 | 5.24 | 128.54 | 1.2 | 77.42 | 5.2 | 129 |
| 2009-11-18 17:00:00 | 1.84 | 124.08 | 4.22 | 151.34 | 1.84 | 124.08 | 4.2 | 151 |

|       | M_obs_nc | Dir_obs_nc | M_vortex_nc | Dir_vortex_nc | M_obs_txt | Dir_obs_txt | M_vortex_txt | Dir_vortex_txt |
|-------|----------|------------|-------------|---------------|-----------|-------------|--------------|----------------|
| count | 28809.0 | 28809.0 | 28809.0 | 28809.0 | 28809.0 | 28809.0 | 28809.0 | 28809.0 |
| mean | 8.06 | 178.51 | 8.0 | 178.24 | 8.06 | 178.51 | 7.99 | 178.32 |
| std | 4.73 | 91.88 | 4.51 | 93.59 | 4.73 | 91.88 | 4.51 | 94.1 |
| min | 0.29 | 2.17 | 0.1 | 0.06 | 0.29 | 2.17 | 0.1 | 0.0 |
| 25% | 4.61 | 96.0 | 4.67 | 98.12 | 4.61 | 96.0 | 4.7 | 98.0 |
| 50% | 7.04 | 196.5 | 7.24 | 191.0 | 7.04 | 196.5 | 7.2 | 191.0 |
| 75% | 10.57 | 249.33 | 10.47 | 245.96 | 10.57 | 249.33 | 10.5 | 246.0 |
| max | 34.08 | 358.58 | 30.91 | 359.95 | 34.08 | 358.58 | 30.9 | 360.0 |

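`dropna(how='any', axis=0)` removes every row where at least one column is NaN, which is what restricts the statistics above to the concurrent period. The sketch below (toy data) shows the effect and an equivalent boolean mask, which can be handy when you want to count or inspect the removed hours instead of discarding them.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {"M_obs": [3.6, np.nan, 1.3, 1.2], "M_vortex": [5.8, 5.9, np.nan, 5.2]},
    index=pd.date_range("2009-11-18 13:00", periods=4, freq="1h"),
)

# Drop any row with at least one NaN: only fully concurrent hours remain
concurrent = df_demo.dropna(how="any", axis=0)

# Equivalent boolean mask: True where every column has data
mask = df_demo.notna().all(axis=1)

print(len(concurrent))                               # 2
print(concurrent.equals(df_demo[mask]))              # True
print(list(df_demo.index[~mask].strftime("%H:%M")))  # ['14:00', '15:00']
```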

### Thank you for completing this Notebook! 
### *Other references available upon request.*

You can now:

- Read Vortex SERIES txt files.
- Convert txt series to **xarray** Datasets, the same structure used for NetCDF files.
- Convert to **Pandas** DataFrames.
- Get a quick overview of the data using the Pandas `head()` and `describe()` methods.
- Perform interpolation.
- Perform resampling.
- Merge datasets.

**Don't hesitate to [contact us](https://vortexfdc.com/contact/) for any questions and information.**

## Change Log


| Date (YYYY-MM-DD) | Version | Changed By | Change Description                         |
|-------------------|---------|------------|--------------------------------------------|
| 2024-07-23        | 0.0     | Arnau      | Notebook creation                          |
| 2025-02-10        | 0.1     | Oriol      | Notebook review                            |

<hr>

<h3 align="center"> © Vortex F.d.C. 2024. All rights reserved. </h3>