# Setup

conda install netcdf4 xarray

# ERA5 Summary

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset.

Coordinates:

- `number`: An integer coordinate, possibly indicating ensemble member or forecast number.
- `valid_time`: Datetime values indicating the valid times of the data.
- `pressure_level`: Pressure levels in hPa.
- `latitude`: Latitude values in degrees.
- `longitude`: Longitude values in degrees.
- `expver`: Experiment version (`0001` for final ERA5, `0005` for the preliminary ERA5T release).

Data Variables:

- `t`: Temperature (in Kelvin).
- `u`: Zonal wind component (east-west direction, in m/s).
- `v`: Meridional wind component (north-south direction, in m/s).
- `w`: Vertical velocity in pressure coordinates (in Pa/s, not m/s; negative values indicate upward motion).
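To make the layout above concrete, here is a minimal sketch that builds a tiny synthetic dataset with the same coordinate and variable names and inspects it. The values are random placeholders, not ERA5 data, and the name `demo` is only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the ERA5 file: same names, made-up values
times = pd.to_datetime(["2024-07-01 00:00", "2024-07-01 03:00", "2024-07-01 06:00"])
levels = np.array([10.0, 30.0, 100.0])   # hPa
lats = np.array([10.0, 10.5])            # degrees
lons = np.array([110.0, 110.5, 111.0])   # degrees

dims = ("valid_time", "pressure_level", "latitude", "longitude")
shape = (len(times), len(levels), len(lats), len(lons))

rng = np.random.default_rng(0)
demo = xr.Dataset(
    data_vars={name: (dims, rng.normal(size=shape)) for name in ("t", "u", "v", "w")},
    coords={
        "valid_time": times,
        "pressure_level": levels,
        "latitude": lats,
        "longitude": lons,
    },
)

# List the data variables and the size of each dimension
print(list(demo.data_vars))
print(dict(demo.sizes))

# Select a single pressure level by its coordinate value
level_30 = demo.sel(pressure_level=30.0)
print(level_30["t"].shape)
```

The same `data_vars`, `sizes`, and `sel` calls should work on the dataset returned by `xr.open_dataset`, assuming the file uses the coordinate names listed above.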

# Open nc file

If opening a NetCDF file without a group returns only the global attributes and no variables, pass the group explicitly (for example `group='PRODUCT'`) to get the data product. The ERA5 file below is opened without a `group` argument.

In [2]:
import xarray as xr
import pandas as pd
file_path = r"C:\Users\joonw\tco\GEMS_TCO-2\era5\era5_data.nc"
ds = xr.open_dataset(file_path)
ds  # ERA5 reanalysis data

# Copernicus Climate Data Store and the ECMWF Reanalysis v5 (ERA5)

In [None]:
# Load the dataset
file_path = r"C:\Users\joonw\tco\GEMS_TCO-2\era5\era5_data.nc"
# file_path = r"C:\\Users\\joonw\\Downloads\\ct5km_ssta_v3.1_20230101.nc"
ds = xr.open_dataset(file_path)

# Convert the dataset to a pandas DataFrame
df = ds.to_dataframe().reset_index()

# Display the DataFrame
print(df.head())

# Close the NetCDF file
ds.close()

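# Group by 'valid_time' and average 'u' and 'v' over all levels and grid points.
# The result goes into a new frame so the full table `df` (which still has
# 'pressure_level') stays available for later filtering.
df_mean = df.groupby('valid_time')[['u', 'v']].mean().reset_index()

# Convert wind components from m/s to degrees/h
# Assuming 1 degree is approximately 111 km. This holds for latitude; a degree
# of longitude is shorter by a factor cos(latitude), so 'u' is only approximate.
df_mean['u'] = (df_mean['u'] * 3600 / 111000).round(2)
df_mean['v'] = (df_mean['v'] * 3600 / 111000).round(2)


# Save the averaged winds to a CSV file
df_mean.to_csv(r"C:\Users\joonw\tco\GEMS_TCO-2\era5\era5.csv", index=False)

# Display the averaged winds
print(df_mean.head())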




  valid_time  pressure_level  latitude  longitude  number expver           t  \
0 2024-07-01           100.0      10.0      110.0       0   0001  192.954239   
1 2024-07-01           100.0      10.0      110.5       0   0001  192.872208   
2 2024-07-01           100.0      10.0      111.0       0   0001  192.834122   
3 2024-07-01           100.0      10.0      111.5       0   0001  192.854630   
4 2024-07-01           100.0      10.0      112.0       0   0001  192.946426   

           u         v         w  
0 -22.386612 -9.133911  0.007045  
1 -21.660049 -9.380005  0.007327  
2 -20.870987 -9.647583  0.008303  
3 -20.333878 -9.837036  0.012591  
4 -20.031143 -9.907349  0.016322  
           valid_time     u     v
0 2024-07-01 00:00:00 -0.61 -0.04
1 2024-07-01 03:00:00 -0.61  0.00
2 2024-07-01 06:00:00 -0.60  0.03
3 2024-07-02 00:00:00 -0.58 -0.04
4 2024-07-02 03:00:00 -0.59 -0.01
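
The conversion above uses 111 km per degree for both components. As a rough sketch of a latitude-aware variant, the helpers below scale the zonal component by cos(latitude). The function names and the constant 111 000 m per degree are assumptions for illustration, not part of the original notebook.

```python
import math

METERS_PER_DEGREE = 111_000.0  # approximate length of one degree of latitude
SECONDS_PER_HOUR = 3600.0


def v_to_deg_per_hour(v_ms):
    """Meridional wind (m/s) to degrees of latitude per hour."""
    return v_ms * SECONDS_PER_HOUR / METERS_PER_DEGREE


def u_to_deg_per_hour(u_ms, lat_deg):
    """Zonal wind (m/s) to degrees of longitude per hour at latitude lat_deg.

    A degree of longitude spans about 111 km * cos(latitude), so the same
    wind speed covers more degrees of longitude away from the equator.
    """
    meters_per_degree_lon = METERS_PER_DEGREE * math.cos(math.radians(lat_deg))
    return u_ms * SECONDS_PER_HOUR / meters_per_degree_lon


# First row of the table above: u = -22.386612 m/s, v = -9.133911 m/s at 10 degrees N
print(round(u_to_deg_per_hour(-22.386612, 10.0), 2))
print(round(v_to_deg_per_hour(-9.133911), 2))
```

Near 10 degrees latitude the correction is under 2 percent, so the simpler formula is a reasonable approximation for this region; the difference grows at higher latitudes.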


In [13]:
df2 = df[ (df['valid_time']== "2024-07-01 00:00:00") & (df['pressure_level']<=30)].copy(deep=True)
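
The filter above combines two boolean masks with `&`, and it needs a frame that still has a `pressure_level` column. A minimal sketch on a small made-up table (the name `demo_df` and its values are placeholders, not ERA5 output):

```python
import pandas as pd

# Small made-up table with the columns the filter relies on
demo_df = pd.DataFrame({
    "valid_time": pd.to_datetime([
        "2024-07-01 00:00", "2024-07-01 00:00",
        "2024-07-01 03:00", "2024-07-01 00:00",
    ]),
    "pressure_level": [10.0, 100.0, 30.0, 30.0],
    "u": [-5.0, -22.4, -7.1, -6.3],
})

# Each comparison yields a boolean Series; the parentheses are required
# because & binds more tightly than == and <=
mask = (demo_df["valid_time"] == pd.Timestamp("2024-07-01 00:00:00")) & (
    demo_df["pressure_level"] <= 30
)

# .copy() returns an independent frame, so later edits do not touch demo_df
subset = demo_df.loc[mask].copy()
print(subset)
```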

