# Get DWD CDC Station List for Climate Data

## 1. About the DWD Open Data Portal 

The data of the Climate Data Center (CDC) of the DWD (Deutscher Wetterdienst, German Weather Service) is provided on an **FTP server**. <br> **FTP** stands for _File Transfer Protocol_.

Open the FTP link ftp://opendata.dwd.de/climate_environment/CDC/ in your browser (copy-paste) and find out how it is structured hierarchically.

You can also open the link with **HTTPS** (Hypertext Transfer Protocol Secure): https://opendata.dwd.de/climate_environment/CDC/

**Download and read** the document https://opendata.dwd.de/climate_environment/CDC/Readme_intro_CDC_ftp.pdf

**Q1:** In which temporal resolutions are the time series provided? <br>
The time series are provided at resolutions of 1 minute (precipitation only), 10 minutes, daily, monthly, annual and multi-annual.

**Q2:** What is the difference between _historical_ and _recent_ data also with respect to quality control?<br>
_recent_ provides the not yet revised versions of the most recent data in various temporal resolutions. These data have not yet completed the full quality control. <br>
_historical_ is the archive of all known data. Errors are corrected after the fact in a continuous process, and the datasets are versioned, so the historical data can be considered more reliable. <br>
**Q3:** Are all meteorological parameters provided at the same temporal resolution? <br>
No. The meteorological parameters are not all provided at the same temporal resolution: only precipitation is available at 1-minute intervals, and the other parameters each have their own set of resolutions.


## 2. Download the Station Meta Data 

We are interested in observations with following properties:

1. The observations are taken in Germany.
1. It is climate data.
1. The temporal resolution is annual.
1. Use historical data, not recent.


Download the corresponding station meta data file (description) from the FTP server. The file you have to download is named `KL_Jahreswerte_Beschreibung_Stationen.txt`. The elements of the file name denote:

* KL: Klima, Climate, 
* Jahreswerte: Annual Values, 
* Beschreibung: Description, 
* Stationen: Stations

**Q1:** Under which path (directory, folder) on the FTP server do you find the file?

**Q2:** The Python FTP client we use is provided through the library _ftplib_: <br>
https://pythonprogramming.net/ftp-transfers-python-ftplib/ <br>
How do you use it?

**Q3:** Look at the code below. In which folder is the data stored locally? What are relative and absolute paths?

In [1]:
server = "opendata.dwd.de"
user = "anonymous"
passwd = ""
dir = "/climate_environment/CDC/observations_germany/climate/annual/kl/historical/"
filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"
localpath = "downloads"
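
To make Q3 more concrete, the following minimal sketch shows how the relative path `downloads` relates to an absolute path. It reuses the notebook's `localpath` value; the use of `pathlib` and the names `rel` and `absolute` are illustrative additions and not part of the original notebook.

```python
from pathlib import Path

localpath = "downloads"        # relative path, same value as in the cell above

rel = Path(localpath)
# A relative path is interpreted starting from the current working directory,
# so joining both gives the absolute location of the download folder.
absolute = Path.cwd() / rel

print(rel.is_absolute())       # False
print(absolute.is_absolute())  # True
print(absolute)                # depends on the folder the notebook is started from
```

The absolute path changes when the notebook is moved or started from another folder, while the relative path `downloads` stays the same.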

In [2]:
from ftplib import FTP

In [3]:
#domain name or server ip:
ftp = FTP(server)
res = ftp.login(user=user, passwd = passwd)

res = ftp.cwd(dir)


ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

In [None]:
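def grabFile(filename,localpath):
    # Download `filename` from the current FTP directory into the folder `localpath`.
    # The with-statement closes the local file even if the transfer fails.
    with open(localpath+"/"+filename, 'wb') as localfile:
        ftp.retrbinary('RETR ' + filename, localfile.write, 1024)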
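
As one possible answer to Q2, the sketch below bundles the login, the change of directory, the download and the disconnect into functions that do not depend on a global `ftp` variable. The names `grab_file` and `download_station_list` are hypothetical helpers introduced here for illustration. The only `ftplib` features used are `FTP` as a context manager, `login()`, `cwd()` and `retrbinary()`.

```python
import os
from ftplib import FTP


def grab_file(ftp, filename, localpath):
    """Download `filename` from the current directory of a connected FTP client."""
    os.makedirs(localpath, exist_ok=True)       # create the target folder if needed
    target = os.path.join(localpath, filename)
    with open(target, "wb") as localfile:       # closed automatically, even on errors
        ftp.retrbinary("RETR " + filename, localfile.write, 1024)
    return target


def download_station_list(server, directory, filename, localpath):
    """Log in anonymously, change directory, download one file, then disconnect."""
    with FTP(server) as ftp:                    # the connection is closed when the block ends
        ftp.login()                             # no arguments: anonymous login
        ftp.cwd(directory)
        return grab_file(ftp, filename, localpath)


# Example call (requires network access to the server):
# download_station_list(
#     "opendata.dwd.de",
#     "/climate_environment/CDC/observations_germany/climate/annual/kl/historical/",
#     "KL_Jahreswerte_Beschreibung_Stationen.txt",
#     "downloads",
# )
```

Passing the connected client as an argument keeps `grab_file` easy to test and avoids a `NameError` when cells are run out of order.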

In [4]:
grabFile(filename,localpath)

NameError: name 'grabFile' is not defined

In [5]:
# Finally disconnect from the FTP server
res = ftp.quit()
print(res)

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
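
The cells above end in a `ConnectionResetError`, which means the server closed the FTP connection. Section 1 mentions that the same directory tree can be opened via HTTPS, so a possible workaround is to download the file over HTTPS with the standard library. This is a sketch under that assumption; `build_https_url` and `grab_file_https` are hypothetical helper names, not part of the original notebook.

```python
import os
import urllib.request

server = "opendata.dwd.de"
directory = "/climate_environment/CDC/observations_germany/climate/annual/kl/historical/"
filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"


def build_https_url(server, directory, filename):
    """Assemble the HTTPS URL of a file from server name, directory and file name."""
    return "https://" + server + "/" + directory.strip("/") + "/" + filename


def grab_file_https(url, localpath):
    """Download `url` into the folder `localpath` and return the local file path."""
    os.makedirs(localpath, exist_ok=True)
    target = os.path.join(localpath, os.path.basename(url))
    urllib.request.urlretrieve(url, target)
    return target


url = build_https_url(server, directory, filename)
print(url)

# Uncomment to download (requires network access):
# grab_file_https(url, "downloads")
```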

## 3. Read the Station Data into a Pandas Dataframe

The Station Data is in fixed column format. Pandas provides a reader for text files with fixed column width.  

Search the Pandas doc https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html for this fixed column reader. Learn how to use it and read the station data file into a dataframe.

Hint: Count the characters per column (column width) in a text editor.
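
The following sketch illustrates the hint on a tiny in-memory table. The column widths (12, 10, 10, 20) and the reduced set of four columns are made up for this example and do not claim to match the real layout of the station file; the two stations are taken from the output further below. It uses the `widths` parameter of `pd.read_fwf()`, which takes one width per column when the columns are contiguous.

```python
import io
import pandas as pd

# Illustrative layout only: these widths are an assumption for this example.
widths = [12, 10, 10, 20]
rows = [
    ("Stations_id", "von_datum", "bis_datum", "Stationsname"),
    ("-----------", "---------", "---------", "------------"),
    ("00001", "19310101", "19860630", "Aach"),
    ("00003", "18510101", "20110331", "Aachen"),
]
# Pad every cell to its column width to build a fixed-width text.
sample = "\n".join(
    "".join(cell.ljust(width) for cell, width in zip(row, widths)) for row in rows
)

df_sample = pd.read_fwf(
    io.StringIO(sample),
    widths=widths,          # number of characters per column
    skiprows=[0, 1],        # skip the header line and the separator line
    names=["station_id", "date_from", "date_to", "name"],
)
print(df_sample)
```

If the columns are not contiguous, `colspecs` with explicit start and end positions can be used instead of `widths`.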

### Extract column names and translate them from DE to EN.

In [20]:
# extract column names. They are in German (de)
with open(localpath+"/"+filename, "r", encoding="cp1252") as file:
    r = file.readline()
colnames_de = r.split()
colnames_de

['Stations_id',
 'von_datum',
 'bis_datum',
 'Stationshoehe',
 'geoBreite',
 'geoLaenge',
 'Stationsname',
 'Bundesland']

In [21]:
# translation dictionary
translate = \
{'Stations_id':'station_id',
 'von_datum':'date_from',
 'bis_datum':'date_to',
 'Stationshoehe':'altitude',
 'geoBreite': 'Latitude',
 'geoLaenge': 'Longitude',
 'Stationsname':'name',
 'Bundesland':'state'}

In [22]:
for h in colnames_de:
    print(translate[h])

station_id
date_from
date_to
altitude
Latitude
Longitude
name
state


In [23]:
# Pythonic
colnames_en = [translate[y] for y in colnames_de]
print(colnames_en)

['station_id', 'date_from', 'date_to', 'altitude', 'Latitude', 'Longitude', 'name', 'state']
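
The same dictionary can also be applied after the data has been loaded. As a small sketch, `DataFrame.rename()` maps the German column names to the English ones; the empty data frame `df_de` is only a stand-in for real data.

```python
import pandas as pd

translate = {
    "Stations_id": "station_id",
    "von_datum": "date_from",
    "bis_datum": "date_to",
    "Stationshoehe": "altitude",
    "geoBreite": "Latitude",
    "geoLaenge": "Longitude",
    "Stationsname": "name",
    "Bundesland": "state",
}

# Empty frame with the German headers, used here only to show the renaming.
df_de = pd.DataFrame(columns=list(translate))
df_en = df_de.rename(columns=translate)
print(list(df_en.columns))
```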


### Read the formatted data with pd.read_fwf().

In [24]:
import pandas as pd

In [25]:
help(pd.read_fwf)

Help on function read_fwf in module pandas.io.parsers.readers:

read_fwf(filepath_or_buffer: 'FilePathOrBuffer', colspecs='infer', widths=None, infer_nrows=100, **kwds)
    Read a table of fixed-width formatted lines into DataFrame.
    
    Also supports optionally iterating or breaking of the file
    into chunks.
    
    Additional help can be found in the `online docs for IO Tools
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html>`_.
    
    Parameters
    ----------
    filepath_or_buffer : str, path object or file-like object
        Any valid string path is acceptable. The string could be a URL. Valid
        URL schemes include http, ftp, s3, and file. For file URLs, a host is
        expected. A local file could be:
        ``file://localhost/path/to/table.csv``.
    
        If you want to pass in a path object, pandas accepts any
        ``os.PathLike``.
    
        By file-like object, we refer to objects with a ``read()`` method,
        such as a fil

In [None]:
# Skip the first two rows and set the column names.

df = pd.read_fwf(localpath+"/"+filename, skiprows=[0,1], names=colnames_en, encoding="cp1252")
df.head()

In [28]:
# Better parse dates! Column 0 should be treated as index. It makes the later export with pd.to_csv() easier.
filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"
df = pd.read_fwf(filename,skiprows=[0,1],names=colnames_en, parse_dates=["date_from","date_to"],index_col = 0,encoding="cp1252")
df.head()

| station_id | date_from | date_to | altitude | Latitude | Longitude | name | state |
|---|---|---|---|---|---|---|---|
| 1 | 1931-01-01 | 1986-06-30 | 478 | 47.8413 | 8.8493 | Aach | Baden-Württemberg |
| 3 | 1851-01-01 | 2011-03-31 | 202 | 50.7827 | 6.0941 | Aachen | Nordrhein-Westfalen |
| 44 | 1971-03-01 | 2020-12-31 | 44 | 52.9336 | 8.237 | Großenkneten | Niedersachsen |
| 52 | 1973-01-01 | 2001-12-31 | 46 | 53.6623 | 10.199 | Ahrensburg-Wulfsdorf | Schleswig-Holstein |
| 61 | 1975-07-01 | 1978-08-31 | 339 | 48.8443 | 12.6171 | Aiterhofen | Bayern |
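
Parsed dates pay off as soon as stations are selected by their period of operation. The sketch below assumes that the raw file stores dates as `YYYYMMDD` strings and converts them explicitly with `pd.to_datetime()`; the three stations are taken from the output above, and the cut-off date is an arbitrary example.

```python
import pandas as pd

stations = pd.DataFrame(
    {
        "name": ["Aach", "Aachen", "Großenkneten"],
        "date_from": ["19310101", "18510101", "19710301"],  # assumed raw format YYYYMMDD
        "date_to": ["19860630", "20110331", "20201231"],
    },
    index=pd.Index([1, 3, 44], name="station_id"),
)

# Convert the text columns to real timestamps with an explicit format.
for col in ["date_from", "date_to"]:
    stations[col] = pd.to_datetime(stations[col], format="%Y%m%d")

# Select the stations that were still operating on or after 2020-01-01.
active = stations[stations["date_to"] >= pd.Timestamp("2020-01-01")]
print(active["name"].tolist())   # ['Großenkneten']
```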


In [29]:
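# The file has fixed-width columns, so read it with pd.read_fwf() (pd.read_csv() expects a delimiter).
filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"
df = pd.read_fwf(filename,skiprows=[0,1],names=colnames_en, parse_dates=["date_from","date_to"],index_col = 0,encoding ="cp1252" )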

In [30]:
df.shape

(1178, 7)

## 4. Export the dataframe as CSV file

Use semicolons as field delimiters.

In [16]:
# extract basename (Filename) without extension
import os
fname = os.path.splitext(filename)[0]
csvname = fname + ".csv"
print(csvname)

df.to_csv(localpath+"/"+csvname, sep =";")

KL_Jahreswerte_Beschreibung_Stationen.csv
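
To check what the semicolon-delimited export looks like, the following sketch writes a two-row data frame to an in-memory buffer and reads it back. The small frame `df_demo` is a stand-in built from two stations of the table in section 3; with the real data frame the same `sep=";"` argument applies.

```python
import io
import pandas as pd

df_demo = pd.DataFrame(
    {
        "name": ["Aach", "Aachen"],
        "Latitude": [47.8413, 50.7827],
        "Longitude": [8.8493, 6.0941],
    },
    index=pd.Index([1, 3], name="station_id"),
)

# Write to a text buffer instead of a file to inspect the result directly.
buffer = io.StringIO()
df_demo.to_csv(buffer, sep=";")
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back with the same delimiter.
df_back = pd.read_csv(io.StringIO(csv_text), sep=";", index_col="station_id")
print(df_back)
```

Because `station_id` is the index, it is written as the first column, and the header line reads `station_id;name;Latitude;Longitude`.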


## 5. Import the CSV as point vector layer into QGIS.

## 6. Download the zip-Archive with the Digital Administrative Boundaries



Download the archive `dvg1_EPSG25832_Shape.zip` from https://www.opengeodata.nrw.de/produkte/geobasis/tsk/dvg/dvg1/

DVG stands for Digitale Verwaltungsgrenzen (digital administrative boundaries). DVG1 is more detailed than DVG2.

How to use the data: https://www.opengeodata.nrw.de/produkte/geobasis/tsk/dvg/dvg1/Nutzerinformationen.pdf

Download the PDF and upload it to Google Translate (GT) to translate it.

https://www.bezreg-koeln.nrw.de/brk_internet/geobasis/topographie_sonderkarten/verwaltungsgrenzen/index.html

## Homework: Create a Map in QGIS

Follow the tutorial http://www.qgistutorials.com/en/docs/3/making_a_map.html

In class we created a point vector layer (point shapefile) with the coordinates of the DWD CDC climate stations. The layer was built from the CSV file that we generated from the station meta data file downloaded from the DWD open data FTP archive (yearly values, temperature).

Create a map of the DWD climate stations located in NRW. Use a shapefile of the NRW administrative boundaries.

Use the EPSG:25832 coordinate reference system (projection), the same one as in the name of the shapefile archive `dvg1_EPSG25832_Shape.zip`. We will learn later what it is.
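
One possible way to restrict the stations to NRW before importing them into QGIS is to filter the data frame on the `state` column. This is a sketch under the assumption that NRW stations carry the value `Nordrhein-Westfalen`, as in the table in section 3; the five rows are copied from that table. A spatial selection with the boundary shapefile in QGIS is an alternative.

```python
import pandas as pd

stations = pd.DataFrame(
    {
        "name": ["Aach", "Aachen", "Großenkneten", "Ahrensburg-Wulfsdorf", "Aiterhofen"],
        "state": [
            "Baden-Württemberg",
            "Nordrhein-Westfalen",
            "Niedersachsen",
            "Schleswig-Holstein",
            "Bayern",
        ],
    },
    index=pd.Index([1, 3, 44, 52, 61], name="station_id"),
)

# Keep only the rows whose state is NRW.
nrw = stations[stations["state"] == "Nordrhein-Westfalen"]
print(nrw["name"].tolist())   # ['Aachen']
```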
