SOURCE:
- https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate_urban/hourly/wind/recent/  
- https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate_urban/hourly/wind/recent/DESCRIPTION_obsgermany_climate_urban_hourly_wind_recent_en.pdf 
- https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate_urban/hourly/wind/recent/FF_STADT_Stundenwerte_Beschreibung_Stationen.txt  

In [2]:
# LOAD THE BERLIN WIND DATA (SEMICOLON-SEPARATED TEXT FILE) INTO ONE DATAFRAME
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
import json


# os.path.join builds the path portably, because operating systems use either \ or / as the separator
file_path_berlin = os.path.join("..", "winddaten_berlin", "produkt_wind_399_akt.txt")
column_names = ['STATIONS_ID', 'MESS_DATUM', 'QUALITAETS_NIVEAU', 'STRUKTUR_VERSION',
                'WINDGESCHWINDIGKEIT', 'WINDRICHTUNG', 'eor']
# skiprows=1 skips the header row of the file, which is replaced by column_names
weather_station = pd.read_csv(file_path_berlin, names=column_names, skiprows=1, sep=';')
# TODO rename columns
weather_station


Unnamed: 0,STATIONS_ID,MESS_DATUM,QUALITAETS_NIVEAU,STRUKTUR_VERSION,WINDGESCHWINDIGKEIT,WINDRICHTUNG,eor
0,399,2015082001,2,0,7.5,100,eor
1,399,2015082002,2,0,8.1,120,eor
2,399,2015082003,2,0,7.3,120,eor
3,399,2015082004,2,0,7.4,120,eor
4,399,2015082005,2,0,8.6,120,eor
...,...,...,...,...,...,...,...
65856,399,2024061220,2,0,7.7,260,eor
65857,399,2024061221,2,0,5.1,290,eor
65858,399,2024061222,2,0,5.9,290,eor
65859,399,2024061223,2,0,5.3,290,eor


##### Explanation of wind direction  
https://www.dwd.de/DE/service/lexikon/Functions/glossar.html?lv3=103182&lv2=102936  
Wind direction is determined as the polar angle (azimuth) and is given on the 360-degree scale of the circle.  
All directions in degrees are true bearings referenced to geographic north, i.e.  
East  =  90 degrees,  
South =  180 degrees,  
West  =  270 degrees,  
North =  360 degrees.  
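
For readability, the degree values can be mapped to compass sectors. The following is a minimal sketch: the helper name `degrees_to_compass` and the choice of eight sectors are assumptions made for illustration and are not part of the DWD data.

```python
# Hypothetical helper: map a wind direction in degrees to one of eight compass sectors.
COMPASS_SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def degrees_to_compass(degrees: float) -> str:
    """Return the compass sector for a direction measured clockwise from geographic north."""
    # Each sector spans 45 degrees and is centered on its nominal direction,
    # so shift by half a sector (22.5 degrees) before the integer division.
    index = int(((degrees % 360) + 22.5) // 45) % 8
    return COMPASS_SECTORS[index]


print(degrees_to_compass(90))   # E
print(degrees_to_compass(360))  # N
print(degrees_to_compass(290))  # W
```

Applied to the notebook's data, this could be used as `weather_station['WINDRICHTUNG'].map(degrees_to_compass)`.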

In [4]:
# Convert the 'MESS_DATUM' column (layout YYYYMMDDHH) to datetime and store it in a new column 'DATETIME'
weather_station['DATETIME'] = pd.to_datetime(weather_station['MESS_DATUM'], format='%Y%m%d%H')
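
The format string `%Y%m%d%H` reads each value as year, month, day and hour. The sketch below shows this on made-up sample values. Casting to `str` and passing `errors="coerce"` are optional choices for this illustration; the latter turns unparseable entries into `NaT` instead of raising an error.

```python
import pandas as pd

# Made-up sample values in the MESS_DATUM layout; the last one is deliberately invalid.
sample = pd.Series(["2015082001", "2024061223", "not-a-date"])

# errors="coerce" yields NaT for values that do not match the format
parsed = pd.to_datetime(sample, format="%Y%m%d%H", errors="coerce")
print(parsed)
```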

In [5]:
weather_station

Unnamed: 0,STATIONS_ID,MESS_DATUM,QUALITAETS_NIVEAU,STRUKTUR_VERSION,WINDGESCHWINDIGKEIT,WINDRICHTUNG,eor,DATETIME
0,399,2015082001,2,0,7.5,100,eor,2015-08-20 01:00:00
1,399,2015082002,2,0,8.1,120,eor,2015-08-20 02:00:00
2,399,2015082003,2,0,7.3,120,eor,2015-08-20 03:00:00
3,399,2015082004,2,0,7.4,120,eor,2015-08-20 04:00:00
4,399,2015082005,2,0,8.6,120,eor,2015-08-20 05:00:00
...,...,...,...,...,...,...,...,...
65856,399,2024061220,2,0,7.7,260,eor,2024-06-12 20:00:00
65857,399,2024061221,2,0,5.1,290,eor,2024-06-12 21:00:00
65858,399,2024061222,2,0,5.9,290,eor,2024-06-12 22:00:00
65859,399,2024061223,2,0,5.3,290,eor,2024-06-12 23:00:00


In [6]:
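# Keep only the columns needed for the analysis; .copy() avoids SettingWithCopyWarning on later assignments
wind = weather_station[['DATETIME', 'WINDGESCHWINDIGKEIT', 'WINDRICHTUNG']].copy()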
wind

Unnamed: 0,DATETIME,WINDGESCHWINDIGKEIT,WINDRICHTUNG
0,2015-08-20 01:00:00,7.5,100
1,2015-08-20 02:00:00,8.1,120
2,2015-08-20 03:00:00,7.3,120
3,2015-08-20 04:00:00,7.4,120
4,2015-08-20 05:00:00,8.6,120
...,...,...,...
65856,2024-06-12 20:00:00,7.7,260
65857,2024-06-12 21:00:00,5.1,290
65858,2024-06-12 22:00:00,5.9,290
65859,2024-06-12 23:00:00,5.3,290


In [7]:
wind.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65861 entries, 0 to 65860
Data columns (total 3 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   DATETIME             65861 non-null  datetime64[ns]
 1   WINDGESCHWINDIGKEIT  65861 non-null  float64       
 2   WINDRICHTUNG         65861 non-null  int64         
dtypes: datetime64[ns](1), float64(1), int64(1)
memory usage: 1.5 MB
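
A full non-null count does not rule out placeholder values. The following is a minimal sketch, assuming that missing measurements are encoded with the sentinel -999. This is an assumption that should be checked against the dataset description linked under SOURCE, and the small frame is made-up data standing in for `wind`.

```python
import numpy as np
import pandas as pd

MISSING_SENTINEL = -999  # assumed placeholder for missing values; verify in the dataset description

# Made-up stand-in for the `wind` DataFrame
wind_demo = pd.DataFrame({
    "WINDGESCHWINDIGKEIT": [7.5, -999.0, 7.3],
    "WINDRICHTUNG": [100, 120, -999],
})

# Replace the sentinel with NaN so it does not distort means or correlations
wind_clean = wind_demo.replace(MISSING_SENTINEL, np.nan)
print(wind_clean.isna().sum())
```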


In [10]:
# SORT DATA BY DATETIME AND LOOK FOR EARLIEST MEASUREMENT
df_sorted = wind.set_index('DATETIME').sort_index()
df_sorted.head()

DATETIME,WINDGESCHWINDIGKEIT,WINDRICHTUNG
2015-08-20 01:00:00,7.5,100
2015-08-20 02:00:00,8.1,120
2015-08-20 03:00:00,7.3,120
2015-08-20 04:00:00,7.4,120
2015-08-20 05:00:00,8.6,120
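
With a sorted datetime index it is also straightforward to check whether the hourly series is continuous. The sketch below uses made-up timestamps and compares the index against a complete hourly range; `df_demo` is a hypothetical stand-in for `df_sorted`.

```python
import pandas as pd

# Made-up hourly index with one missing hour (03:00)
index = pd.to_datetime([
    "2015-08-20 01:00", "2015-08-20 02:00", "2015-08-20 04:00", "2015-08-20 05:00",
])
df_demo = pd.DataFrame({"WINDGESCHWINDIGKEIT": [7.5, 8.1, 7.4, 8.6]}, index=index).sort_index()

# Build the complete hourly range and list the timestamps that are absent from the data
full_range = pd.date_range(df_demo.index.min(), df_demo.index.max(), freq=pd.Timedelta(hours=1))
missing_hours = full_range.difference(df_demo.index)
print(missing_hours)
```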


#### OPEN DECISION FOR FURTHER ANALYSES AND THE MERGE: START AT THE FIRST WIND MEASUREMENT (AUGUST 20th, 2015) OR AT THE FIRST PARTICULATE MATTER (PM10) MEASUREMENT
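
Either option amounts to choosing a common start timestamp. The sketch below uses made-up data; `pm10_demo` and its column `PM10` are hypothetical names for the particulate matter dataset, which is not loaded in this notebook. It starts both series at the later of the two first measurements, which is only one possible way to settle the question.

```python
import pandas as pd

# Made-up stand-ins for the wind data and the (hypothetical) PM10 data
wind_demo = pd.DataFrame(
    {"WINDGESCHWINDIGKEIT": [7.5, 8.1, 7.3, 7.4]},
    index=pd.to_datetime([
        "2015-08-20 01:00", "2015-08-20 02:00", "2015-08-20 03:00", "2015-08-20 04:00",
    ]),
)
pm10_demo = pd.DataFrame(
    {"PM10": [21.0, 19.5]},
    index=pd.to_datetime(["2015-08-20 03:00", "2015-08-20 04:00"]),
)

# The later of the two first measurements is the earliest point covered by both series
common_start = max(wind_demo.index.min(), pm10_demo.index.min())
wind_trimmed = wind_demo.loc[common_start:]
pm10_trimmed = pm10_demo.loc[common_start:]
print(common_start, len(wind_trimmed), len(pm10_trimmed))
```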

In [None]:
# TODO combine the wind and pm10 data, then use y-profiling and a heatmap to find correlations
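
One possible shape for this step is sketched below on made-up data. The frame `pm10_demo` and the column `PM10` are hypothetical, since the PM10 data is not part of this notebook. Because wind direction is circular (values near 360 and near 0 degrees are almost the same direction), the sketch correlates the sine and cosine of the direction rather than the raw degrees; this is a common workaround, not something prescribed by the source.

```python
import numpy as np
import pandas as pd

timestamps = pd.date_range("2015-08-20 01:00", periods=5, freq=pd.Timedelta(hours=1))

# Made-up stand-ins for the wind data and the (hypothetical) PM10 data
wind_demo = pd.DataFrame({
    "DATETIME": timestamps,
    "WINDGESCHWINDIGKEIT": [7.5, 8.1, 7.3, 7.4, 8.6],
    "WINDRICHTUNG": [100, 120, 150, 200, 260],
})
pm10_demo = pd.DataFrame({
    "DATETIME": timestamps[1:],
    "PM10": [21.0, 19.5, 24.0, 18.0],
})

# An inner join keeps only the hours present in both datasets
merged = wind_demo.merge(pm10_demo, on="DATETIME", how="inner")

# Decompose the circular direction into sine and cosine components
direction_rad = np.deg2rad(merged["WINDRICHTUNG"])
merged["DIR_SIN"] = np.sin(direction_rad)
merged["DIR_COS"] = np.cos(direction_rad)

correlations = merged[["WINDGESCHWINDIGKEIT", "DIR_SIN", "DIR_COS", "PM10"]].corr()
print(correlations["PM10"])
```

The resulting matrix could then be passed to `sns.heatmap(correlations, annot=True)` for the heatmap mentioned in the TODO.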