# Import temperature data from the DWD and process it

This notebook pulls historical temperature data from the DWD (Deutscher Wetterdienst) server and formats it for use in other projects. The data is delivered at an hourly frequency, as one .zip file per weather station. To use the data, we need everything in a single .csv file with all stations side by side. We also need the daily average.
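To make the target shape concrete, here is a minimal stdlib sketch that turns hourly readings from several stations into one row per day, with one column per station holding that day's average. It assumes the hourly records have already been parsed into `(station_id, timestamp, temperature)` tuples and that the daily average is the plain arithmetic mean of whatever hourly values are present. The functions `daily_wide_table` and `write_wide_csv`, the station ids and the readings are all hypothetical, not taken from the later notebooks.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_wide_table(records):
    """Aggregate hourly (station_id, timestamp, temperature) records into daily means.

    Returns (stations, rows): a sorted list of station ids and one
    (date, [mean per station]) row per day. A station without readings
    on a given day gets None in its column.
    """
    # (date, station) -> hourly temperatures recorded on that day
    buckets = defaultdict(list)
    for station, ts, temp in records:
        buckets[(ts.date(), station)].append(temp)

    stations = sorted({station for _, station in buckets})
    days = sorted({day for day, _ in buckets})

    rows = []
    for day in days:
        values = []
        for station in stations:
            temps = buckets.get((day, station))  # .get avoids creating empty buckets
            values.append(round(mean(temps), 2) if temps else None)
        rows.append((day, values))
    return stations, rows


def write_wide_csv(stations, rows, fh):
    """Write a 'date' column plus one column per station (stations side by side)."""
    writer = csv.writer(fh)
    writer.writerow(['date', *stations])
    for day, values in rows:
        writer.writerow([day.isoformat(), *('' if v is None else v for v in values)])


# Illustrative input: made-up readings for two hypothetical stations
records = [
    ('00003', datetime(2007, 1, 1, 0), 2.0),
    ('00003', datetime(2007, 1, 1, 1), 4.0),
    ('00044', datetime(2007, 1, 1, 0), 1.0),
    ('00003', datetime(2007, 1, 2, 0), -1.0),
]
stations, rows = daily_wide_table(records)
buf = io.StringIO()
write_wide_csv(stations, rows, buf)
print(buf.getvalue())
# date,00003,00044
# 2007-01-01,3.0,1.0
# 2007-01-02,-1.0,
```

Missing station/day combinations are left as empty cells rather than filled in, so gaps in a station's record stay visible in the final .csv.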

To reduce computing time, we also drop all data recorded before 2007.
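Cropping pays off most when it happens early, right after each file is parsed, so that the later steps only ever see the rows they need. A minimal sketch follows; it assumes the hourly timestamps arrive as strings in `YYYYMMDDHH` form and that records are `(station_id, timestamp, temperature)` tuples. The helper names `parse_hourly_timestamp` and `crop_before` are hypothetical.

```python
from datetime import datetime

# Keep everything from 1 January 2007 onward
CUTOFF = datetime(2007, 1, 1)


def parse_hourly_timestamp(value):
    """Parse an hourly timestamp, assumed here to be in YYYYMMDDHH form."""
    return datetime.strptime(value.strip(), '%Y%m%d%H')


def crop_before(records, cutoff=CUTOFF):
    """Drop (station_id, timestamp, temperature) records earlier than cutoff."""
    return [rec for rec in records if rec[1] >= cutoff]


records = [
    ('00003', parse_hourly_timestamp('2006123123'), 1.5),  # 31 Dec 2006, 23:00 -> dropped
    ('00003', parse_hourly_timestamp('2007010100'), 2.0),  # 1 Jan 2007, 00:00 -> kept
]
cropped = crop_before(records)
print(len(cropped))  # 1
```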

The notebooks should be executed in the following order:
* 1-dwd_konverter_download
* 2-dwd_konverter_extract
* 3-dwd_konverter_build_df
* 4-dwd_konverter_final_processing
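If you want to run the whole pipeline unattended, one option is to execute each notebook in order with `jupyter nbconvert --execute`. The sketch below assumes the notebooks are saved as `<name>.ipynb` in the current directory and that Jupyter's `nbconvert` is installed; the helper names are hypothetical. By default it only prints the commands.

```python
import subprocess

# Notebook names from the pipeline above; the .ipynb extension is an assumption.
PIPELINE = [
    '1-dwd_konverter_download',
    '2-dwd_konverter_extract',
    '3-dwd_konverter_build_df',
    '4-dwd_konverter_final_processing',
]


def build_commands(pipeline=PIPELINE):
    """Return one nbconvert command per notebook, in pipeline order."""
    return [
        ['jupyter', 'nbconvert', '--to', 'notebook', '--execute', '--inplace', f'{name}.ipynb']
        for name in pipeline
    ]


def run_pipeline(dry_run=True):
    """Print each command; with dry_run=False, also execute it."""
    for cmd in build_commands():
        print(' '.join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one notebook fails
            subprocess.run(cmd, check=True)


run_pipeline()  # prints the commands only; pass dry_run=False to execute them
```

Stopping on the first failure matters here because every step consumes the output of the previous one.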

## 1.) Download files from the DWD-API
Here we download all relevant files from the DWD server. The server is HTTP-based, so we scrape the index page for all links that match 'stundenwerte_TU_.\*_hist.zip' and download them into the folder 'download'.
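When BeautifulSoup is given a compiled regular expression as a filter, it matches it with `search()`, so the pattern may match anywhere inside the `href`. The quick check below shows which links the filter keeps. The href values are illustrative names, not a listing taken from the server. The pattern is written as a raw string with the dot before `zip` escaped, so it only accepts a literal `.`.

```python
import re

# Filter pattern for historical hourly temperature archives (dot before "zip" escaped)
HIST_ZIP = re.compile(r'stundenwerte_TU_.*_hist\.zip')

# Hypothetical hrefs, for illustration only
hrefs = [
    'stundenwerte_TU_00003_19500401_20110331_hist.zip',
    'stundenwerte_TU_00044_20070401_20221231_hist.zip',
    'TU_Stundenwerte_Beschreibung_Stationen.txt',
]

matches = [h for h in hrefs if HIST_ZIP.search(h)]
print(matches)  # only the two stundenwerte_TU_..._hist.zip entries
```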

Link to the relevant DWD-page: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/

In [1]:
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Set base values
download_folder = Path.cwd() / 'download'
download_folder.mkdir(parents=True, exist_ok=True)  # make sure the target folder exists
base_url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/'

# For testing, only download 10 files
file_max = 10

# Initiate a session and reuse it for the index page and all downloads
with requests.Session() as s:
    resp = s.get(base_url)
    resp.raise_for_status()

    # Parse the index page for all relevant <a href> links
    soup = BeautifulSoup(resp.content, 'html.parser')
    links = soup.find_all('a', href=re.compile(r'stundenwerte_TU_.*_hist\.zip'))

    # Download the .zip files to download_folder, limited to file_max while testing
    for link in links[:file_max]:
        with s.get(base_url + link['href'], stream=True) as zip_response:
            zip_response.raise_for_status()
            with open(download_folder / link['href'], 'wb') as file:
                for chunk in zip_response.iter_content(chunk_size=8192):
                    file.write(chunk)

print('Done')

Done
