# Get DWD CDC Station List for Climate Data

## 1. About the DWD Open Data Portal 

The data of the Climate Data Center (CDC) of the DWD (Deutscher Wetterdienst, German Weather Service) is provided on an **FTP server**. <br> **FTP** stands for _File Transfer Protocol_.

Open the FTP link ftp://opendata.dwd.de/climate_environment/CDC/ in your browser (copy-paste) and find out how it is structured hierarchically.

You can also open the link with **HTTPS** (Hypertext Transfer Protocol Secure): https://opendata.dwd.de/climate_environment/CDC/
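
Both links point at the same directory tree, so a path found via FTP can be rewritten for HTTPS by changing only the scheme. A minimal sketch using the standard library; the helper name `to_https` is hypothetical and not part of the notebook:

```python
from urllib.parse import urlsplit

def to_https(url):
    """Return the same URL with the scheme switched to https (hypothetical helper)."""
    return urlsplit(url)._replace(scheme="https").geturl()

ftp_url = "ftp://opendata.dwd.de/climate_environment/CDC/"
https_url = to_https(ftp_url)
print(https_url)
```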

**Download and read** the document https://opendata.dwd.de/climate_environment/CDC/Readme_intro_CDC_ftp.pdf

**Q1:** In which temporal resolutions are the time series provided?

**Q2:** What is the difference between _historical_ and _recent_ data, also with respect to quality control?

**Q3:** Are all meteorological parameters provided at the same temporal resolution?


## 2. Download the Station Meta Data 

We are interested in observations with the following properties:

1. The observations are taken in Germany.
1. It is climate data.
1. The temporal resolution is annual.
1. Use historical data, not recent.


Download the corresponding station metadata file (description) from the FTP server. The file you have to download is named `KL_Jahreswerte_Beschreibung_Stationen.txt`. The elements of the file name denote:

* KL: Klima, Climate, 
* Jahreswerte: Annual Values, 
* Beschreibung: Description, 
* Stationen: Stations
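
The naming scheme above can also be taken apart programmatically. The following is only a sketch: the dictionary `meaning` and the variable names are illustrative and not part of the notebook.

```python
import os

filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"

# German file name elements and their English meaning (taken from the list above).
meaning = {
    "KL": "climate",
    "Jahreswerte": "annual values",
    "Beschreibung": "description",
    "Stationen": "stations",
}

stem = os.path.splitext(filename)[0]   # drop the ".txt" extension
elements = stem.split("_")             # elements are separated by underscores
translated = [meaning[e] for e in elements]
print(translated)
```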

**Q1:** Under which path (directory, folder) on the FTP server do you find the file?

**Q2:** The Python FTP client we use is provided through the library _ftplib_: <br>
https://pythonprogramming.net/ftp-transfers-python-ftplib/ <br>
How do you use it?

**Q3:** Look at the code below. In which folder is the data stored locally? What are relative and absolute paths?
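
As a hint for Q3, the standard library can show how a relative path is resolved. This is a small sketch; `abs_localpath` is an illustrative name, and the absolute result depends on the directory the notebook is started from.

```python
import os

localpath = "data"                          # relative: resolved against the current working directory
abs_localpath = os.path.abspath(localpath)  # absolute: starts at the file system root

print(os.path.isabs(localpath))             # is the original path absolute?
print(os.path.isabs(abs_localpath))         # is the resolved path absolute?
print(abs_localpath)
```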

In [1]:
server = "opendata.dwd.de"
user = "anonymous"
passwd = ""
# COMPLETE THE PATH: dir = "/climate_environment/CDC/observations_germany/..."
filename = "KL_Jahreswerte_Beschreibung_Stationen.txt"
localpath = "data"

In [3]:
from ftplib import FTP

In [4]:
#domain name or server ip:
ftp = FTP(server)
res = ftp.login(user=user, passwd = passwd)
print(res)
res = ftp.cwd(dir)
print(res)
ftp.dir()

230 Login successful.
250 Directory successfully changed.
-rw-r--r--    1 9261     15101      232713 Sep 30 08:32 KL_Jahreswerte_Beschreibung_Stationen.txt
-rw-r--r--    1 9261     15101       11277 Jun 06 10:13 jahreswerte_KL_00001_19310101_19851231_hist.zip
-rw-r--r--    1 9261     15101       17870 Jun 06 10:18 jahreswerte_KL_00003_18510101_20101231_hist.zip
-rw-r--r--    1 9261     15101       13749 Jun 06 10:18 jahreswerte_KL_00044_19720101_20181231_hist.zip
-rw-r--r--    1 9261     15101       11892 Jun 06 10:22 jahreswerte_KL_00052_19730101_20011231_hist.zip
-rw-r--r--    1 9261     15101        9108 Jun 06 10:13 jahreswerte_KL_00061_19760101_19771231_hist.zip
-rw-r--r--    1 9261     15101        9685 Jun 06 10:13 jahreswerte_KL_00070_19740101_19851231_hist.zip
-rw-r--r--    1 9261     15101       12641 Jun 06 10:13 jahreswerte_KL_00071_19870101_20181231_hist.zip
-rw-r--r--    1 9261     15101       11237 Jun 06 10:13 jahreswerte_KL_00072_19790101_19941231_hist.zip
[...]

-rw-r--r--    1 9261     15101       17176 Jun 06 10:20 jahreswerte_KL_00722_18960101_20181231_hist.zip
-rw-r--r--    1 9261     15101       10679 Jun 06 10:17 jahreswerte_KL_00727_19530101_19861231_hist.zip
-rw-r--r--    1 9261     15101        6447 Jun 06 10:22 jahreswerte_KL_00729_19040101_19801231_hist.zip
-rw-r--r--    1 9261     15101       13078 Jun 06 10:19 jahreswerte_KL_00736_18620101_19991231_hist.zip
-rw-r--r--    1 9261     15101       16236 Jun 06 10:13 jahreswerte_KL_00755_18810101_20181231_hist.zip
-rw-r--r--    1 9261     15101       12349 Jun 06 10:13 jahreswerte_KL_00757_19910101_20181231_hist.zip
-rw-r--r--    1 9261     15101       12578 Jun 06 10:18 jahreswerte_KL_00760_19770101_20031231_hist.zip
-rw-r--r--    1 9261     15101        4562 Jun 06 10:19 jahreswerte_KL_00766_20020101_20081231_hist.zip
-rw-r--r--    1 9261     15101       12049 Jun 06 10:18 jahreswerte_KL_00769_19790101_20051231_hist.zip
[...]
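
The archive names in the listing appear to follow the scheme `jahreswerte_KL_<station id>_<first date>_<last date>_hist.zip`. Assuming that this reading of the listing is correct (it is not taken from a DWD specification), a name can be split into its parts as sketched below. The function `parse_archive_name` is hypothetical.

```python
import re
from datetime import datetime

# Pattern read off the directory listing; an assumption, not an official format definition.
PATTERN = re.compile(r"jahreswerte_KL_(\d{5})_(\d{8})_(\d{8})_hist\.zip")

def parse_archive_name(name):
    """Split an archive name into (station_id, date_from, date_to)."""
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError("unexpected file name: " + name)
    station_id = int(m.group(1))
    date_from = datetime.strptime(m.group(2), "%Y%m%d").date()
    date_to = datetime.strptime(m.group(3), "%Y%m%d").date()
    return station_id, date_from, date_to

info = parse_archive_name("jahreswerte_KL_00044_19720101_20181231_hist.zip")
print(info)
```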

In [5]:
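def grabFile(filename, localpath):
    # Download `filename` from the current FTP directory into the folder `localpath`.
    # The folder must already exist. The with-block closes the file even if the transfer fails.
    with open(localpath + "/" + filename, 'wb') as localfile:
        ftp.retrbinary('RETR ' + filename, localfile.write, 1024)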

In [6]:
grabFile(filename,localpath)
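
A possible variant of the download function is sketched here. It assumes you prefer to pass the open connection explicitly and to create the local folder on demand. The name `grab_file` and its return value are illustrative choices, not the notebook's own function.

```python
import os

def grab_file(ftp, filename, localpath):
    """Download `filename` from the current FTP directory into `localpath`."""
    os.makedirs(localpath, exist_ok=True)      # create the local folder if it is missing
    target = os.path.join(localpath, filename)
    with open(target, "wb") as localfile:      # file is closed even if the transfer fails
        ftp.retrbinary("RETR " + filename, localfile.write)
    return target
```

With an open connection it would be called as `grab_file(ftp, filename, localpath)`.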

In [7]:
# Finally disconnect from the FTP server
res = ftp.quit()
print(res)

221 Goodbye.


## 3. Read the Station Data into a Pandas Dataframe

The station data is stored in a fixed-width column format. Pandas provides a reader for text files with fixed column widths.

Search the Pandas doc https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html for this fixed column reader. Learn how to use it and read the station data file into a dataframe.

Hint: Count the characters per column (column width) in a text editor.
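
To see what counting characters means in practice, the following sketch slices fixed-width lines by hand. The two sample lines and the column boundaries are made up for illustration and are not the real layout of the DWD file. The intervals are half-open, `[start, stop)`, which is the same convention `pd.read_fwf` uses for `colspecs`.

```python
# Hypothetical fixed-width sample (not the real DWD layout): id, altitude, name.
lines = [
    "00001  478 Aach",
    "00003  202 Aachen",
]

# Half-open intervals [start, stop): characters 0-4, 5-9 and 11-29.
colspecs = [(0, 5), (5, 10), (11, 30)]

# Slice each line at the column boundaries and strip the padding blanks.
rows = [[line[a:b].strip() for a, b in colspecs] for line in lines]
print(rows)
```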

### Extract column names and translate them from DE to EN.

In [8]:
# extract column names. They are in German (de)
file = open(localpath+"/"+filename,"r")
r = file.readline()
file.close()
colnames_de = r.split()
colnames_de

['Stations_id',
 'von_datum',
 'bis_datum',
 'Stationshoehe',
 'geoBreite',
 'geoLaenge',
 'Stationsname',
 'Bundesland']

In [9]:
# translation dictionary
translate = \
{'Stations_id':'station_id',
 'von_datum':'date_from',
 'bis_datum':'date_to',
 'Stationshoehe':'altitude',
 'geoBreite': <fill in!>,
 'geoLaenge': <fill in!>,
 'Stationsname':'name',
 'Bundesland':'state'}

In [10]:
for h in colnames_de:
    print(translate[h])

station_id
date_from
date_to
altitude
latitude
longitude
name
state


In [11]:
# Pythonic
colnames_en = [translate[h] for h in colnames_de]
print(colnames_en)

['station_id', 'date_from', 'date_to', 'altitude', 'latitude', 'longitude', 'name', 'state']


### Read the formatted data with pd.read_fwf().

In [15]:
import pandas as pd

In [16]:
help(pd.read_fwf)

Help on function read_fwf in module pandas.io.parsers:

read_fwf(filepath_or_buffer, colspecs='infer', widths=None, **kwds)
    Read a table of fixed-width formatted lines into DataFrame
    
    Also supports optionally iterating or breaking of the file
    into chunks.
    
    Additional help can be found in the `online docs for IO Tools
    <http://pandas.pydata.org/pandas-docs/stable/io.html>`_.
    
    Parameters
    ----------
    filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any \
    object with a read() method (such as a file handle or StringIO)
        The string could be a URL. Valid URL schemes include http, ftp, s3, and
        file. For file URLs, a host is expected. For instance, a local file could
        be file://localhost/path/to/table.csv
    colspecs : list of pairs (int, int) or 'infer'. optional
        A list of pairs (tuples) giving the extents of the fixed-width
        fields of each line as half-open intervals (i.e.,  [from, to[ ).
  

In [14]:
# Skip the first two rows and set the column names.
df = pd.read_fwf(localpath+"/"+filename,skip...<fill in!>,names=colnames_en)
df.head()

|   | station_id | date_from | date_to | altitude | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 19310101 | 19851231 | 478 | 47.8413 | 8.8493 | Aach | Baden-Württemberg |
| 1 | 3 | 18510101 | 20101231 | 202 | 50.7827 | 6.0941 | Aachen | Nordrhein-Westfalen |
| 2 | 44 | 19720101 | 20181231 | 44 | 52.9336 | 8.237 | Großenkneten | Niedersachsen |
| 3 | 52 | 19730101 | 20011231 | 46 | 53.6623 | 10.199 | Ahrensburg-Wulfsdorf | Schleswig-Holstein |
| 4 | 61 | 19760101 | 19771231 | 339 | 48.8443 | 12.6171 | Aiterhofen | Bayern |


In [17]:
# Better: parse the dates. Column 0 (station_id) should be treated as the index, which makes the later export with df.to_csv() easier.
df = pd.read_fwf(localpath+"/"+filename,skip...<fill in!>,names=colnames_en, parse_dates=["date_from","date_to"],index_col = 0)
df.head()

| station_id | date_from | date_to | altitude | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|
| 1 | 1931-01-01 | 1985-12-31 | 478 | 47.8413 | 8.8493 | Aach | Baden-Württemberg |
| 3 | 1851-01-01 | 2010-12-31 | 202 | 50.7827 | 6.0941 | Aachen | Nordrhein-Westfalen |
| 44 | 1972-01-01 | 2018-12-31 | 44 | 52.9336 | 8.237 | Großenkneten | Niedersachsen |
| 52 | 1973-01-01 | 2001-12-31 | 46 | 53.6623 | 10.199 | Ahrensburg-Wulfsdorf | Schleswig-Holstein |
| 61 | 1976-01-01 | 1977-12-31 | 339 | 48.8443 | 12.6171 | Aiterhofen | Bayern |


In [18]:
df.shape

(1151, 7)

## 4. Export the dataframe as a CSV file

Use semicolons as field delimiters.

In [19]:
# extract basename (Filename) without extension
import os
fname = os.path.splitext(filename)[0]
csvname = fname + ".csv"
print(csvname)

df.to_csv(localpath+"/"+csvname, sep =";")

KL_Jahreswerte_Beschreibung_Stationen.csv
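
The effect of the semicolon delimiter can be checked without pandas. The sketch below uses the standard `csv` module and an in-memory buffer in place of a file on disk. The three-column layout is a reduced, illustrative version of the station table.

```python
import csv
import io

rows = [
    ["station_id", "name", "state"],
    [1, "Aach", "Baden-Württemberg"],
    [3, "Aachen", "Nordrhein-Westfalen"],
]

buffer = io.StringIO()   # stands in for a file on disk
csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Reading back needs the same delimiter, otherwise each line ends up as one single field.
parsed = list(csv.reader(io.StringIO(text), delimiter=";"))
```

Note that all values come back as strings when read with `csv.reader`.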


## 5. Import the CSV as a point vector layer into QGIS.

## 6. Download the zip-Archive with the Digital Administrative Boundaries



Download the file `dvg1_EPSG25832_Shape.zip` from https://www.opengeodata.nrw.de/produkte/geobasis/tsk/dvg/dvg1/

DVG stands for _Digitale Verwaltungsgrenzen_ (digital administrative boundaries). DVG1 has more details than DVG2.

How to use the data: https://www.opengeodata.nrw.de/produkte/geobasis/tsk/dvg/dvg1/Nutzerinformationen.pdf

Download the PDF and upload it to Google Translate (GT) to translate it.

https://www.bezreg-koeln.nrw.de/brk_internet/geobasis/topographie_sonderkarten/verwaltungsgrenzen/index.html

## Homework: Create a Map in QGIS

Follow the tutorial http://www.qgistutorials.com/en/docs/3/making_a_map.html

In class we created a vector data layer (point shapefile) with the coordinates of the DWD CDC climate stations. The layer is based on a CSV file that we generated from the metadata file downloaded from the DWD open data FTP archive (yearly values, temperature).

Create a map of the DWD climate stations located in NRW. Use a shapefile of the NRW administrative boundaries.
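
One way to restrict the stations to NRW before loading them into QGIS is an attribute filter on the `state` column; a spatial selection against the boundary layer in QGIS is an alternative. The sketch below uses the first five rows of the station table from section 3 as plain tuples. With the dataframe from section 3, the equivalent selection would be `df[df["state"] == "Nordrhein-Westfalen"]`.

```python
# First rows of the station table from section 3: (station_id, name, state).
stations = [
    (1, "Aach", "Baden-Württemberg"),
    (3, "Aachen", "Nordrhein-Westfalen"),
    (44, "Großenkneten", "Niedersachsen"),
    (52, "Ahrensburg-Wulfsdorf", "Schleswig-Holstein"),
    (61, "Aiterhofen", "Bayern"),
]

# Keep only the stations whose state is Nordrhein-Westfalen (NRW).
nrw = [s for s in stations if s[2] == "Nordrhein-Westfalen"]
print(nrw)
```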

Use the EPSG:25832 coordinate reference system (projection), which is the one named in the shapefile archive `dvg1_EPSG25832_Shape.zip`. We will learn later what it is.
