# Setup

conda install netcdf4 xarray

# ERA5 Summary

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset.

Coordinates:
number: An integer coordinate, possibly indicating ensemble member or forecast number.     
valid_time: Datetime values indicating the valid times of the data.     
`pressure_level: Pressure levels in hPa.`  
latitude: Latitude values in degrees.    
longitude: Longitude values in degrees.    
expver: Experiment version, likely indicating the version of the reanalysis data.   

Data Variables:
t: Temperature values (in Kelvin).       
`u: Zonal wind component (east-west direction, in m/s).`  
`v: Meridional wind component (north-south direction, in m/s).`  
w: Vertical velocity (pressure vertical velocity, in Pa/s; negative values indicate upward motion).  
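
As a small illustration of how the `u` and `v` components listed above combine, the sketch below derives wind speed and the meteorological wind direction (the direction the wind blows from, in degrees clockwise from north). The helper name `wind_speed_and_direction` is hypothetical and not part of this notebook; it assumes `u` is positive toward the east and `v` is positive toward the north.

```python
import math

def wind_speed_and_direction(u, v):
    """Return (speed in m/s, direction in degrees the wind blows FROM).

    Hypothetical helper for illustration; assumes u is positive eastward
    and v is positive northward, both in m/s.
    """
    speed = math.hypot(u, v)
    # atan2(-u, -v) is the bearing of the upwind direction, clockwise from north
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction

# Example using one (u, v) pair of the kind stored in the ERA5 file
speed, direction = wind_speed_and_direction(-6.180130, -2.036636)
print(f"{speed:.2f} m/s from {direction:.0f} degrees")
```

A negative `u` with a small negative `v`, as in this example, corresponds to a wind blowing mostly from the east.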

# Open nc file

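If a NetCDF file organizes its data into groups, opening it without specifying a group returns only the global attributes and no variables; in that case, pass the group name (for example, `group='PRODUCT'`) to get the data product. The ERA5 file below is opened without a `group` argument.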

In [None]:
import xarray as xr  # requires xarray and netCDF4
import pandas as pd
file_path = r"/Users/joonwonlee/Documents/GEMS_TCO-1/era5/era5_data.nc"     # macOS
# file_path = r"C:\Users\joonw\tco\GEMS_TCO-2\era5\era5_data.nc"       # Windows
ds = xr.open_dataset(file_path)
ds  # display the ERA5 reanalysis dataset

# Copernicus Climate Data Store and the ECMWF Reanalysis v5 (ERA5)

In [None]:
# Load the dataset
# file_path = r"C:\Users\joonw\tco\GEMS_TCO-2\era5\era5_data.nc"

file_path = r"/Users/joonwonlee/Documents/GEMS_TCO-1/era5/era5_data.nc"     # mac

ds = xr.open_dataset(file_path)

# Convert the dataset to a pandas DataFrame
df = ds.to_dataframe().reset_index()

# Close the NetCDF file
ds.close()

# Keep only the levels with pressure <= 30 hPa
df = df[df['pressure_level'] <= 30]
print(df.head())

# Group by 'valid_time' and average 'u' and 'v' over all remaining grid points and levels
df = df.groupby('valid_time')[['u', 'v']].mean().reset_index()

# Convert wind components from m/s to degrees/h
# Assuming 1 degree of latitude is approximately 111 km (111,000 m).
# The same factor is applied to 'u', which ignores the cos(latitude)
# shrinkage of a degree of longitude away from the equator.
df['u'] = (df['u'] * 3600 / 111000).round(2)
df['v'] = (df['v'] * 3600 / 111000).round(2)


# Save the DataFrame to a CSV file (Windows path; adjust to match the file_path used above)
df.to_csv(r"C:\Users\joonw\tco\GEMS_TCO-2\era5\era5.csv", index=False)

# Display the updated DataFrame
print(df)




    valid_time  pressure_level  latitude  longitude  number expver  \
693 2024-07-01            30.0      10.0      110.0       0   0001   
694 2024-07-01            30.0      10.0      110.5       0   0001   
695 2024-07-01            30.0      10.0      111.0       0   0001   
696 2024-07-01            30.0      10.0      111.5       0   0001   
697 2024-07-01            30.0      10.0      112.0       0   0001   

              t         u         v         w  
693  217.629578 -6.180130 -2.036636  0.007358  
694  217.292664 -6.139114 -1.595230  0.006420  
695  217.103210 -5.965286 -1.225113  0.004606  
696  217.078796 -5.752396 -1.015152  0.002258  
697  217.241882 -5.588333 -0.809097 -0.000155  
            valid_time     u     v
0  2024-07-01 00:00:00 -0.61 -0.00
1  2024-07-01 03:00:00 -0.59  0.04
2  2024-07-01 06:00:00 -0.57  0.08
3  2024-07-02 00:00:00 -0.55 -0.04
4  2024-07-02 03:00:00 -0.57 -0.01
..                 ...   ...   ...
88 2024-07-30 03:00:00 -0.94 -0.00
89 2024-07-
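
The m/s to degrees/h conversion used in the cell above can be sketched as standalone functions. This is a minimal sketch, not the notebook's own method: the function names and the optional `latitude_deg` argument are assumptions added for illustration. The notebook applies the 111 km factor to both components; the latitude correction for `u` is shown only to indicate how large the difference can become away from the equator.

```python
import math

METERS_PER_DEGREE = 111_000  # approximate length of one degree of latitude

def v_to_deg_per_hour(v_ms):
    """Convert a meridional wind (m/s) to degrees of latitude per hour."""
    return v_ms * 3600 / METERS_PER_DEGREE

def u_to_deg_per_hour(u_ms, latitude_deg=0.0):
    """Convert a zonal wind (m/s) to degrees of longitude per hour.

    Hypothetical refinement: a degree of longitude shrinks by cos(latitude).
    With latitude_deg=0 this matches the notebook's single-factor conversion.
    """
    meters_per_deg_lon = METERS_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return u_ms * 3600 / meters_per_deg_lon

# Same wind speed, converted with and without the latitude correction
print(round(u_to_deg_per_hour(-18.8), 2))
print(round(u_to_deg_per_hour(-18.8, latitude_deg=10.0), 2))
```

At low latitudes the correction is small, which is one reason a single factor can be an acceptable approximation there.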

In [22]:
df.head(50)

| Unnamed: 0 | valid_time | u | v |
|---|---|---|---|
| 0 | 2024-07-01 00:00:00 | -0.61 | -0.0 |
| 1 | 2024-07-01 03:00:00 | -0.59 | 0.04 |
| 2 | 2024-07-01 06:00:00 | -0.57 | 0.08 |
| 3 | 2024-07-02 00:00:00 | -0.55 | -0.04 |
| 4 | 2024-07-02 03:00:00 | -0.57 | -0.01 |
| 5 | 2024-07-02 06:00:00 | -0.55 | -0.0 |
| 6 | 2024-07-03 00:00:00 | -0.6 | 0.01 |
| 7 | 2024-07-03 03:00:00 | -0.6 | 0.06 |
| 8 | 2024-07-03 06:00:00 | -0.55 | 0.09 |
| 9 | 2024-07-04 00:00:00 | -0.64 | -0.07 |
