# Exploratory Data Analysis (EDA) of Processed ERA5 Data

This notebook explores the processed hourly ERA5 data for Bonn, Germany (January to June 2024), taken at a single grid point (50.7°N, 7.0°E).

## 1. Load Processed Data

In [3]:
import xarray as xr
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

processed_path = '../data/processed/era5_processed_Bonn_2024_months_1-6.nc'
ds = xr.open_dataset(processed_path)
ds

In [8]:
print(ds)

<xarray.Dataset> Size: 210kB
Dimensions:     (valid_time: 4368, latitude: 1, longitude: 1)
Coordinates:
    number      int64 8B 0
  * valid_time  (valid_time) datetime64[ns] 35kB 2024-01-01 ... 2024-06-30T23...
  * latitude    (latitude) float64 8B 50.7
  * longitude   (longitude) float64 8B 7.0
    expver      (valid_time) <U4 70kB '0001' '0001' '0001' ... '0001' '0001'
Data variables:
    u10         (valid_time, latitude, longitude) float32 17kB 3.177 ... 0.8687
    v10         (valid_time, latitude, longitude) float32 17kB 5.32 ... 0.7886
    t2m         (valid_time, latitude, longitude) float32 17kB 281.0 ... 289.2
    sp          (valid_time, latitude, longitude) float32 17kB 9.82e+04 ... 9...
    cbh         (valid_time, latitude, longitude) float32 17kB 2.698e+03 ... ...
    tcc         (valid_time, latitude, longitude) float32 17kB 0.9035 ... 1.0
Attributes:
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    G

## 2. Summary Statistics and Missing Value Check

In [4]:
# Convert to DataFrame for easier stats
df = ds.to_dataframe().reset_index()
# Show summary statistics for key variables.
# The processed file stores ERA5 short names (see the Dataset printout in Section 1),
# so the columns are selected by those names rather than by long descriptive ones.
key_vars = [
    'u10',  # 10 m wind, eastward component (m/s)
    'v10',  # 10 m wind, northward component (m/s)
    't2m',  # 2 m temperature (K)
    'sp',   # surface pressure (Pa)
    'cbh',  # cloud base height (m)
    'tcc'   # total cloud cover (fraction, 0-1)
]
df[key_vars].describe()
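
Quantities such as 10 m wind speed and temperature in degrees Celsius are not stored in the dataset printed in Section 1, but they follow from `u10`, `v10`, and `t2m`. The snippet below is a minimal sketch of that derivation. The helper name `add_derived_columns` and the output column names `wind_speed_10m` and `t2m_c` are hypothetical, and the small made-up DataFrame stands in for `df` so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with wind speed (m/s) and temperature (deg C) added.

    Hypothetical helper: assumes columns 'u10', 'v10' (m/s) and 't2m' (K).
    """
    out = df.copy()
    # Wind speed is the magnitude of the horizontal wind vector.
    out["wind_speed_10m"] = np.hypot(out["u10"], out["v10"])
    # 2 m temperature is stored in kelvin; shift it to degrees Celsius.
    out["t2m_c"] = out["t2m"] - KELVIN_OFFSET
    return out


# Made-up values standing in for the notebook's df
sample = pd.DataFrame(
    {"u10": [3.0, 0.0], "v10": [4.0, -2.0], "t2m": [273.15, 283.15]}
)
derived = add_derived_columns(sample)
print(derived[["wind_speed_10m", "t2m_c"]])
```

In the notebook, `df = add_derived_columns(df)` would make the two derived columns available to the summary statistics and plots in the later sections.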

In [None]:
# Check for missing values
df[key_vars].isnull().sum()
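
Raw NaN counts are easier to judge alongside a percentage, and a time series can also be "missing" whole timestamps without any NaN appearing. The sketch below covers both checks. The helper names `missing_report` and `missing_timestamps` are hypothetical, the hourly spacing is an assumption based on the 4368 time steps shown in Section 1, and the sample frame is made up. Note that `cbh` may well be undefined when no cloud is present, so NaNs in that column would not necessarily indicate a data problem; this is worth confirming against the ERA5 parameter documentation.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Count and percentage of NaN values per column (hypothetical helper)."""
    n_missing = df[cols].isnull().sum()
    return pd.DataFrame(
        {"n_missing": n_missing, "pct_missing": 100.0 * n_missing / len(df)}
    )


def missing_timestamps(
    times: pd.Series, step: pd.Timedelta = pd.Timedelta(hours=1)
) -> pd.DatetimeIndex:
    """Timestamps expected at a regular step but absent from `times`."""
    expected = pd.date_range(times.min(), times.max(), freq=step)
    return expected.difference(pd.DatetimeIndex(times))


# Made-up sample: one NaN in cbh and one skipped hour (02:00)
sample = pd.DataFrame(
    {
        "valid_time": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 03:00"]
        ),
        "t2m": [281.0, 280.5, 280.1],
        "cbh": [2698.0, np.nan, 1500.0],
    }
)
report = missing_report(sample, ["t2m", "cbh"])
gaps = missing_timestamps(sample["valid_time"])
print(report)
print(gaps)
```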

## 3. Interactive Time Series Plots (Plotly)

In [None]:
for var in key_vars:
    fig = px.line(df, x='valid_time', y=var, title=f'Time Series of {var}')
    fig.show()
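
Six months of hourly values can make a line plot noisy. One option, sketched below, is to aggregate to daily means before plotting. The helper name `daily_mean` is hypothetical, the time column is assumed to be `valid_time` as in the dataset printed in Section 1, and the sample data is made up.

```python
import pandas as pd


def daily_mean(
    df: pd.DataFrame, cols: list[str], time_col: str = "valid_time"
) -> pd.DataFrame:
    """Average hourly values into one row per calendar day (hypothetical helper)."""
    daily = df.set_index(time_col)[cols].resample("D").mean()
    # Restore the time column so the frame can be plotted like the hourly one.
    return daily.reset_index()


# Made-up sample: two days of hourly data with a constant value per day
times = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
sample = pd.DataFrame({"valid_time": times, "t2m": [280.0] * 24 + [282.0] * 24})
daily = daily_mean(sample, ["t2m"])
print(daily)
```

The resulting frame can be passed to `px.line(daily, x='valid_time', y='t2m')` in the same way as the hourly one.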

## 4. Interactive Histograms and Boxplots (Plotly)

In [None]:
for var in key_vars:
    fig = px.histogram(df, x=var, nbins=40, title=f'Histogram of {var}')
    fig.show()
    fig = px.box(df, y=var, title=f'Boxplot of {var}')
    fig.show()
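
A boxplot flags points beyond its whiskers visually; the same idea can be expressed numerically with the common 1.5 x IQR rule. The sketch below counts such points per column. The helper name `iqr_outlier_counts` is hypothetical and the sample values are made up. Plotly may compute quartiles slightly differently from pandas, so the counts are not guaranteed to match the plotted points exactly.

```python
import pandas as pd


def iqr_outlier_counts(
    df: pd.DataFrame, cols: list[str], k: float = 1.5
) -> pd.DataFrame:
    """Per-column fences and number of values outside them (hypothetical helper)."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Compare every column against its own fences; NaN values count as inside.
    outside = df[cols].lt(lower, axis="columns") | df[cols].gt(upper, axis="columns")
    return pd.DataFrame(
        {"lower_fence": lower, "upper_fence": upper, "n_outliers": outside.sum()}
    )


# Made-up sample: one extreme value in 'sp', none in 'tcc'
sample = pd.DataFrame(
    {"sp": [1.0, 2.0, 3.0, 4.0, 100.0], "tcc": [0.1, 0.2, 0.3, 0.4, 0.5]}
)
outliers = iqr_outlier_counts(sample, ["sp", "tcc"])
print(outliers)
```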