# Read DWD CDC Time Series, Merge with Station Description and Append 

The main idea behind this activity is to reformat and merge time series (here we use hourly precipitation) from the DWD Climate Data Center in such a way that they can be used with the **QGIS time manager extension**. 

This extension filters the attribute table of a vector layer (e.g. points representing precipitation stations plus precipitation data) by a time stamp column. It limits the attribute table to the records matching the time stamp currently set in the time manager (e.g. by the user moving the time slider). This selected subset of the attribute table is then used to change the symbology of the vector layer according to the variable of interest (e.g. precipitation rate).

The QGIS time manager extension approach is a bit brute force, because each individual measurement at a station at a given time is one feature (row in the table). For example, a time series at station X with hourly resolution for one day (24 values) entails 24 different features with the same station id and the same coordinates but different times. As of now this 1:n relationship can only be realized by importing a CSV file with the corresponding structure. 

(At least I was not able to generate the required view on a 1:n relationship by merging a point vector layer with precipitation station locations and an imported CSV time series table.)

The final data format is a concatenation of time series together with the geographic location in 2D (e.g. lat, lon). The required data format looks in principle like this:


| station_id |        name        |   lat   |   lon  |        meas_time       | prec_rate |
|:----------:|:------------------:|:-------:|:------:|:----------------------:|:---------:|
|        ... | ...                |     ... |    ... |                    ... |       ... |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T08:00:00UTC |       1.5 |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T09:00:00UTC |       1.7 |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T10:00:00UTC |       0.1 |
|        ... | ...                |     ... |    ... |                    ... |       ... |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T08:00:00UTC |       0.8 |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T09:00:00UTC |       0.4 |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T10:00:00UTC |       0.0 |
|        ... | ...                |     ... |    ... |                    ... |       ... |


(Table generated with https://www.tablesgenerator.com/markdown_tables)

To achieve this, the precipitation time series (station_id, meas_time, prec_rate) have to be merged with the station metadata (station_id, lat, lon) coming from a CSV file generated in an earlier activity. We use Pandas to read, join and append the data to generate the final CSV file to be imported as a point layer into QGIS. 
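The 1:n merge described here can be sketched on a tiny example. The frames `ts` and `stations` are hypothetical stand-ins built from the values in the table above. They are not the notebook's real data, and the column names follow the table, not the DWD files.

```python
import pandas as pd

# Hypothetical sample: three hourly values at one station (taken from the table above).
ts = pd.DataFrame({
    "station_id": [1595, 1595, 1595],
    "meas_time": pd.to_datetime(
        ["2018-12-07 08:00", "2018-12-07 09:00", "2018-12-07 10:00"], utc=True),
    "prec_rate": [1.5, 1.7, 0.1],
})

# Station metadata: one row per station, indexed by station_id.
stations = pd.DataFrame(
    {"name": ["Gelsenkirchen-Buer", "Duisburg-Baerl"],
     "lat": [51.5762, 51.5088],
     "lon": [7.0652, 6.7018]},
    index=pd.Index([1595, 13670], name="station_id"),
)

# 1:n join: every measurement row receives the metadata of its station.
long_df = ts.merge(stations, how="inner", left_on="station_id", right_index=True)

# Reorder the columns to match the target layout.
long_df = long_df[["station_id", "name", "lat", "lon", "meas_time", "prec_rate"]]
print(long_df)
```

Each of the three measurement rows carries the same name and coordinates, which is exactly the redundancy the long format requires.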

This final data format is far from optimal because of its large size and highly redundant information. This is a challenge for QGIS, which loses responsiveness with large data. To just show the principle, it is advisable to limit the size of the problem. 

The following filters (selection criteria) are applied:

  * Precipitation stations in NRW only (approx. 127 stations) 
  * Hourly precipitation data
  * Time interval from 2018-12-01 to last date in precipitation data set 
  
Still: 40 days * 24 hrs / day * 127 stations = 121920 records leading to 121920 features in a point layer in QGIS. 

In fact, the resulting number of records is around 91000. The reason might be that not all stations in the station list have time series. This has to be checked carefully.
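The estimate can be reproduced with a few lines of arithmetic. The value 91000 is the approximate count quoted above. The "full station equivalent" is only an illustrative way to express the gap and says nothing about which stations are actually missing.

```python
days, hours_per_day, n_stations = 40, 24, 127

# Upper bound: every station delivers a value for every hour.
expected = days * hours_per_day * n_stations

# Approximate number of records reported above.
observed = 91000

# How many stations with a complete 40-day record would yield the observed count?
full_station_equivalent = observed / (days * hours_per_day)

print(expected)
print(round(full_station_equivalent, 1))
```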

## FTP Connection

### Connection Parameters

In [1]:
server = "opendata.dwd.de"
user   = "anonymous"
passwd = ""

### FTP Directory Definition and Station Description Filename Pattern

In [2]:
# The topic of interest.
#topic_dir = "/daily/more_precip/historical/"
topic_dir = "/daily/soil_temperature/historical/"

# This is the search pattern common to ALL station description file names 
station_desc_pattern = "_Beschreibung_Stationen.txt"

# Below this directory tree node all climate data are stored.
ftp_climate_data_dir = "/climate_environment/CDC/observations_germany/climate/"
ftp_dir =  ftp_climate_data_dir + topic_dir

### Local Directories

In [3]:
local_ftp_dir         = "data/original/DWD/"      # Local directory to store local ftp data copies, the local data source or input data. 
local_ftp_station_dir = local_ftp_dir + topic_dir # Local directory where local station info is located
local_ftp_ts_dir      = local_ftp_dir + topic_dir # Local directory where time series downloaded from ftp are located

local_generated_dir   = "data/generated/DWD/" # The generated of derived data in contrast to local_ftp_dir
local_station_dir     = local_generated_dir + topic_dir # Derived station data, i.e. the CSV file
local_ts_merged_dir   = local_generated_dir + topic_dir # Parallel merged time series, wide data frame with one TS per column
local_ts_appended_dir = local_generated_dir + topic_dir # Serially appended time series, long data frame for QGIS TimeManager Plugin

In [4]:
print(local_ftp_dir)
print(local_ftp_station_dir)
print(local_ftp_ts_dir)
print()
print(local_generated_dir)
print(local_station_dir)
print(local_ts_merged_dir)
print(local_ts_appended_dir)

data/original/DWD/
data/original/DWD//daily/soil_temperature/historical/
data/original/DWD//daily/soil_temperature/historical/

data/generated/DWD/
data/generated/DWD//daily/soil_temperature/historical/
data/generated/DWD//daily/soil_temperature/historical/
data/generated/DWD//daily/soil_temperature/historical/
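The printed paths contain a double slash because `topic_dir` starts with `/` and the base directories end with `/`. Most systems tolerate this. A possible way to avoid it is sketched below. The helper name `join_dirs` is an assumption for illustration and not part of the notebook.

```python
import posixpath

def join_dirs(base, sub):
    # posixpath.join would discard `base` if `sub` started with "/",
    # so the leading slash is stripped first. A trailing slash in `sub`
    # is preserved, so file names can still be appended directly.
    return posixpath.join(base, sub.lstrip("/"))

local_dir = join_dirs("data/original/DWD/", "/daily/soil_temperature/historical/")
print(local_dir)
```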


In [5]:
import os
os.makedirs(local_ftp_dir,exist_ok = True) # it does not complain if the dir already exists.
os.makedirs(local_ftp_station_dir,exist_ok = True)
os.makedirs(local_ftp_ts_dir,exist_ok = True)

os.makedirs(local_generated_dir,exist_ok = True)
os.makedirs(local_station_dir,exist_ok = True)
os.makedirs(local_ts_merged_dir,exist_ok = True)
os.makedirs(local_ts_appended_dir,exist_ok = True)

### FTP Connect

In [6]:
import ftplib
ftp = ftplib.FTP(server)
res = ftp.login(user=user, passwd = passwd)
print(res)

230 Login successful.


In [7]:
ret = ftp.cwd(".")

In [None]:
#ftp.quit()

### FTP Grab File Function

In [8]:
def grabFile(ftpfullname,localfullname):
    try:
        ret = ftp.cwd(".") # A dummy action to check the connection and to provoke an exception if necessary.
        # The with statement closes the local file even if the transfer fails.
        with open(localfullname, 'wb') as localfile:
            ftp.retrbinary('RETR ' + ftpfullname, localfile.write, 1024)
    
    except ftplib.error_perm:
        print("FTP ERROR. Operation not permitted. File not found?")

    except ftplib.error_temp:
        print("FTP ERROR. Timeout.")

    except ConnectionAbortedError:
        print("FTP ERROR. Connection aborted.")



### Generate Pandas Dataframe from FTP Directory Listing

In [9]:
import pandas as pd
import os

def gen_df_from_ftp_dir_listing(ftp, ftpdir):
    lines = []
    flist = []
    try:    
        res = ftp.retrlines("LIST "+ftpdir, lines.append)
    except ftplib.all_errors:
        print("Error: ftp.retrlines() failed. ftp timeout? Reconnect!")
        return
        
    if len(lines) == 0:
        print("Error: ftp dir is empty")
        return
    
    for line in lines:
#        print(line)
        [ftype, fsize, fname] = [line[0:1], int(line[31:42]), line[56:]]
#        itemlist = [line[0:1], int(line[31:42]), line[56:]]
#        flist.append(itemlist)
        
        fext = os.path.splitext(fname)[-1]
        
        if fext == ".zip":
            station_id = int(fname.split("_")[2])
        else:
            station_id = -1 
        
        flist.append([station_id, fname, fext, fsize, ftype])
        
        

    df_ftpdir = pd.DataFrame(flist,columns=["station_id", "name", "ext", "size", "type"])
    return(df_ftpdir)
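The function above cuts each listing line at fixed character positions, which only works as long as the server keeps its column widths. A more tolerant variant splits on whitespace. The sketch below assumes a Unix-style `LIST` line with nine fields. The listing format is not standardized, and the sample line and the helper name `parse_list_line` are made up for illustration.

```python
def parse_list_line(line):
    # Split into at most 9 fields so that file names containing blanks stay intact.
    parts = line.split(None, 8)
    ftype = parts[0][0]      # "-" for a regular file, "d" for a directory
    fsize = int(parts[4])    # size in bytes
    fname = parts[8]
    return ftype, fsize, fname

# Hypothetical listing line in the usual Unix layout.
sample = ("-rw-r--r--    1 1005     1005        219229 Apr 01  2020 "
          "tageswerte_EB_00003_19510101_20110331_hist.zip")

ftype, fsize, fname = parse_list_line(sample)

# The station id is the third underscore-separated token of the zip file name.
station_id = int(fname.split("_")[2])
print(ftype, fsize, fname, station_id)
```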

In [10]:
df_ftpdir = gen_df_from_ftp_dir_listing(ftp, ftp_dir)

In [11]:
df_ftpdir.head()

|   | station_id | name | ext | size | type |
|---|---|---|---|---|---|
| 0 | -1 | BESCHREIBUNG_obsgermany_climate_daily_soil_tem... | .pdf | 69925 | - |
| 1 | -1 | DESCRIPTION_obsgermany_climate_daily_soil_temp... | .pdf | 69373 | - |
| 2 | -1 | EB_Tageswerte_Beschreibung_Stationen.txt | .txt | 99393 | - |
| 3 | 3 | tageswerte_EB_00003_19510101_20110331_hist.zip | .zip | 219229 | - |
| 4 | 44 | tageswerte_EB_00044_19810101_20191231_hist.zip | .zip | 140794 | - |


### Dataframe with TS Zip Files

In [12]:
#df_ftpdir["ext"]==".zip"
df_zips = df_ftpdir[df_ftpdir["ext"]==".zip"]
df_zips.set_index("station_id", inplace = True)
df_zips.head()

| station_id | name | ext | size | type |
|---|---|---|---|---|
| 3 | tageswerte_EB_00003_19510101_20110331_hist.zip | .zip | 219229 | - |
| 44 | tageswerte_EB_00044_19810101_20191231_hist.zip | .zip | 140794 | - |
| 52 | tageswerte_EB_00052_19760101_20011231_hist.zip | .zip | 99143 | - |
| 71 | tageswerte_EB_00071_19880701_20031231_hist.zip | .zip | 59893 | - |
| 72 | tageswerte_EB_00072_19870101_19950531_hist.zip | .zip | 35575 | - |


### Download the Station Description File

In [13]:
station_fname = df_ftpdir[df_ftpdir['name'].str.contains(station_desc_pattern)]["name"].values[0]
print(station_fname)

# Alternative
#station_fname2 = df_ftpdir[df_ftpdir["name"].str.match("^.*Beschreibung_Stationen.*txt$")]["name"].values[0]
#print(station_fname2)

EB_Tageswerte_Beschreibung_Stationen.txt
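One detail of the lookup above: `str.contains` interprets its pattern as a regular expression by default, so the dot in `.txt` matches any character. The following sketch uses `regex=False` for a literal match and guards against an empty result. The sample file names are taken from the listing output shown earlier.

```python
import pandas as pd

names = pd.Series([
    "EB_Tageswerte_Beschreibung_Stationen.txt",
    "tageswerte_EB_00003_19510101_20110331_hist.zip",
    "tageswerte_EB_00044_19810101_20191231_hist.zip",
])

pattern = "_Beschreibung_Stationen.txt"

# Literal substring match instead of a regular expression.
matches = names[names.str.contains(pattern, regex=False)]

# Avoid an IndexError if the directory holds no description file.
station_file = matches.iloc[0] if not matches.empty else None
print(station_file)
```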


In [14]:
print("grabFile: ")
print("From: " + ftp_dir + station_fname)
print("To:   " + local_ftp_station_dir + station_fname)
grabFile(ftp_dir + station_fname, local_ftp_station_dir + station_fname)

grabFile: 
From: /climate_environment/CDC/observations_germany/climate//daily/soil_temperature/historical/EB_Tageswerte_Beschreibung_Stationen.txt
To:   data/original/DWD//daily/soil_temperature/historical/EB_Tageswerte_Beschreibung_Stationen.txt


In [15]:
# Extract column names. They are in German (de).
# We have to use codecs because of difficulties with character encoding (German umlauts).
import codecs

def station_desc_txt_to_csv(txtfile, csvfile):
    # Read only the header line; the with statement closes the file afterwards.
    with codecs.open(txtfile,"r","utf-8") as file:
        r = file.readline()
    colnames_de = r.split()
    
    translate = \
    {'Stations_id':'station_id',
     'von_datum':'date_from',
     'bis_datum':'date_to',
     'Stationshoehe':'altitude',
     'geoBreite': 'latitude',
     'geoLaenge': 'longitude',
     'Stationsname':'name',
     'Bundesland':'state'}
    
    colnames_en = [translate[h] for h in colnames_de]
    
    # Skip the first two rows and set the column names.
    df = pd.read_fwf(txtfile,skiprows=2,names=colnames_en, parse_dates=["date_from","date_to"],index_col = 0)
    
    # write csv
    df.to_csv(csvfile, sep = ";")
    return(df)

In [16]:
basename = os.path.splitext(station_fname)[0]
df_stations = station_desc_txt_to_csv(local_ftp_station_dir + station_fname, local_station_dir + basename + ".csv")
df_stations.head()

| station_id | date_from | date_to | altitude | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|
| 3 | 1951-01-01 | 2011-03-31 | 202 | 50.7827 | 6.0941 | Aachen | Nordrhein-Westfalen |
| 44 | 1981-01-01 | 2020-06-29 | 44 | 52.9336 | 8.237 | Großenkneten | Niedersachsen |
| 52 | 1976-01-01 | 2001-12-31 | 46 | 53.6623 | 10.199 | Ahrensburg-Wulfsdorf | Schleswig-Holstein |
| 71 | 1988-07-01 | 2003-12-31 | 759 | 48.2156 | 8.9784 | Albstadt-Badkap | Baden-Württemberg |
| 72 | 1987-01-01 | 1995-05-31 | 794 | 48.2766 | 9.0001 | Albstadt-Onstmettingen | Baden-Württemberg |


### Select Stations by State (here: Bayern) from Station Description Dataframe

In [17]:
isOperational = df_stations.date_to.max() 

station_ids_selected = df_stations[df_stations['state'].str.contains("Bayern")].index 
station_ids_selected
#isOperational


Int64Index([  125,   154,   221,   232,   282,   320,   361,   502,   685,
              856,   867,  1107,  1161,  1262,  1292,  1473,  1550,  1587,
             2023,  2261,  2290,  2360,  2410,  2488,  2521,  2542,  2559,
             2597,  2691,  2700,  2750,  2773,  2783,  2829,  2831,  2905,
             3056,  3139,  3271,  3366,  3379,  3390,  3484,  3485,  3565,
             3571,  3621,  3668,  3722,  3730,  3875,  3879,  3975,  4104,
             4280,  4287,  4354,  4438,  4559,  4592,  4706,  4911,  5032,
             5185,  5397,  5404,  5434,  5440,  5538,  5654,  5703,  5705,
             5800,  5802,  5856,  5904,  6158,  6219,  6312,  6336,  7075,
             7319,  7369,  7370,  7394,  7395,  7412,  7424, 15555],
           dtype='int64', name='station_id')

In [18]:
# Create variable with TRUE if state is Bayern
# (the variable names still say NRW and are kept because later cells use them)
isNRW = df_stations['state'] == "Bayern"

# Create variable with TRUE if date_to is latest date (indicates operation up to now)
isOperational = df_stations['date_to'] == df_stations.date_to.max() 

# Create variable with TRUE if the record starts before 1980
isBefore1980 = df_stations['date_from'] < '1980'

# select on all three conditions
dfNRW = df_stations[isNRW & isOperational & isBefore1980]
#print("Number of stations in NRW: \n", dfNRW.count())
dfNRW

| station_id | date_from | date_to | altitude | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|
| 232 | 1951-01-01 | 2020-06-29 | 461 | 48.4254 | 10.942 | Augsburg | Bayern |
| 282 | 1951-01-01 | 2020-06-29 | 240 | 49.8743 | 10.9206 | Bamberg | Bayern |
| 867 | 1953-01-01 | 2020-06-29 | 344 | 50.3066 | 10.9679 | Lautertal-Oberlauter | Bayern |
| 2261 | 1951-01-01 | 2020-06-29 | 565 | 50.3123 | 11.876 | Hof | Bayern |
| 2597 | 1949-09-01 | 2020-06-29 | 282 | 50.224 | 10.0792 | Kissingen, Bad | Bayern |
| 3366 | 1953-01-01 | 2020-06-29 | 406 | 48.279 | 12.5024 | Mühldorf | Bayern |
| 3730 | 1961-01-01 | 2020-06-29 | 806 | 47.3984 | 10.2759 | Oberstdorf | Bayern |
| 5397 | 1972-01-01 | 2020-06-29 | 440 | 49.6663 | 12.1845 | Weiden | Bayern |
| 5404 | 1972-01-01 | 2020-06-29 | 477 | 48.4024 | 11.6946 | Weihenstephan-Dürnast | Bayern |
| 5440 | 1961-01-01 | 2020-06-29 | 439 | 49.0115 | 10.9308 | Weißenburg-Emetzheim | Bayern |
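The selection combines three boolean masks with `&`. A minimal sketch on a four-row sample, with values copied from the station outputs above, shows the mechanics. The variable names are chosen for this example only.

```python
import pandas as pd

stations = pd.DataFrame(
    {"date_from": pd.to_datetime(["1951-01-01", "1981-01-01", "1951-01-01", "1972-01-01"]),
     "date_to": pd.to_datetime(["2011-03-31", "2020-06-29", "2020-06-29", "2020-06-29"]),
     "state": ["Nordrhein-Westfalen", "Niedersachsen", "Bayern", "Bayern"]},
    index=pd.Index([3, 44, 232, 5397], name="station_id"),
)

# One boolean Series per criterion; all share the index of `stations`.
in_state = stations["state"] == "Bayern"
is_operational = stations["date_to"] == stations["date_to"].max()
starts_before = stations["date_from"] < pd.Timestamp("1980-01-01")

# Element-wise AND keeps only rows that satisfy all three criteria.
selected = stations[in_state & is_operational & starts_before]
print(selected.index.tolist())
```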


In [19]:
print(df_zips)

                                                      name   ext    size type
station_id                                                                   
3           tageswerte_EB_00003_19510101_20110331_hist.zip  .zip  219229    -
44          tageswerte_EB_00044_19810101_20191231_hist.zip  .zip  140794    -
52          tageswerte_EB_00052_19760101_20011231_hist.zip  .zip   99143    -
71          tageswerte_EB_00071_19880701_20031231_hist.zip  .zip   59893    -
72          tageswerte_EB_00072_19870101_19950531_hist.zip  .zip   35575    -
...                                                    ...   ...     ...  ...
14311       tageswerte_EB_14311_19870101_20051231_hist.zip  .zip   55798    -
15000       tageswerte_EB_15000_20110401_20191231_hist.zip  .zip   33708    -
15207       tageswerte_EB_15207_20131101_20191231_hist.zip  .zip   25379    -
15444       tageswerte_EB_15444_20140901_20191231_hist.zip  .zip   22584    -
15555       tageswerte_EB_15555_20160501_20191231_hist.zip  .zip

### Download TS Data from FTP Server

Problem: Not all stations listed in the station description file are associated with a time series (zip file)! The stations in the description file and the set of stations for which TS data (zip files) are provided do not match perfectly.  
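The mismatch can be made explicit before any download by comparing the two sets of station ids. This is a sketch with made-up inputs: `selected` stands for the ids chosen from the station description, `available` for the index of the zip file dataframe, and id 99999 is invented to represent a station without a zip file.

```python
import pandas as pd

selected = pd.Index([232, 282, 5705, 99999])           # 99999: invented id without data
available = pd.Index([3, 44, 232, 282, 5705], name="station_id")

# Stations that can be downloaded, and stations that have no time series file.
with_ts = selected.intersection(available)
without_ts = selected.difference(available)

print("with time series:   ", sorted(with_ts.tolist()))
print("without time series:", without_ts.tolist())
```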

In [20]:
list(dfNRW.index)

[232, 282, 867, 2261, 2597, 3366, 3730, 5397, 5404, 5440, 5705]

In [21]:
# Add the names of the zip files only to a list. 
local_zip_list = []

station_ids_selected = list(dfNRW.index)

for station_id in station_ids_selected:
    try:
        fname = df_zips["name"][station_id]
        print(fname)
        grabFile(ftp_dir + fname, local_ftp_ts_dir + fname)
        local_zip_list.append(fname)
    except KeyError:
        print("WARNING: TS file for key %d not found in FTP directory." % station_id)

tageswerte_EB_00232_19510101_20191231_hist.zip
tageswerte_EB_00282_19510101_20191231_hist.zip
tageswerte_EB_00867_19530101_20191231_hist.zip
tageswerte_EB_02261_19510101_20191231_hist.zip
tageswerte_EB_02597_19490901_20191231_hist.zip
tageswerte_EB_03366_19530101_20191231_hist.zip
tageswerte_EB_03730_19610101_20191231_hist.zip
tageswerte_EB_05397_19720101_20191231_hist.zip
tageswerte_EB_05404_19720101_20191231_hist.zip
tageswerte_EB_05440_19610101_20191231_hist.zip
tageswerte_EB_05705_19770101_20191231_hist.zip


### Join (Merge) the Time Series Columns

https://medium.com/@chaimgluck1/working-with-pandas-fixing-messy-column-names-42a54a6659cd


In [22]:
def temp_ts_to_df(fname):
    
    # Read the semicolon-separated product file; -999 marks missing values.
    df = pd.read_csv(fname, delimiter=";", encoding="utf8", na_values = [-999.0, -999])

    #df = pd.read_csv(fname, delimiter=";", encoding="iso8859_2", na_values = [-999.0, -999])
    
    # https://medium.com/@chaimgluck1/working-with-pandas-fixing-messy-column-names-42a54a6659cd

    # Column headers: remove leading blanks (strip), replace " " with "_", and convert to lower case.
    # regex=False makes "(" and ")" literal characters.
    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)

    # Parse the date column (format YYYYMMDD) explicitly and use it as index.
    # pd.datetime was removed in pandas 2.0 and the date_parser argument is deprecated.
    df["mess_datum"] = pd.to_datetime(df["mess_datum"].astype(str), format="%Y%m%d")
    df = df.set_index("mess_datum")
    return(df)
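The parsing steps can be tried without any download by reading a small in-memory sample. The sample text below imitates the layout of a DWD product file with a reduced set of columns. It is an assumption for illustration, not a copy of a real file.

```python
import io
import pandas as pd

# Made-up excerpt in the style of a product file (semicolon-separated, -999 = missing).
sample = io.StringIO(
    "STATIONS_ID;MESS_DATUM;QN_2;V_TE005M;eor\n"
    "232;20160601;3;17.9;eor\n"
    "232;20160602;3;-999;eor\n"
)

df = pd.read_csv(sample, delimiter=";", na_values=[-999.0, -999])

# Normalize the header names.
df.columns = df.columns.str.strip().str.lower()

# Convert YYYYMMDD integers to timestamps and use them as index.
df["mess_datum"] = pd.to_datetime(df["mess_datum"].astype(str), format="%Y%m%d")
df = df.set_index("mess_datum")
print(df)
```

The missing value marker is turned into `NaN` during reading, so it cannot distort later statistics or the map symbology.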

In [None]:
#def temp_ts_to_df(fname):
    
#    dateparse = lambda dates: [pd.datetime.strptime(str(d), '%Y%m%d') for d in dates]

#    df = pd.read_csv(fname, delimiter=";", encoding="utf8", index_col="MESS_DATUM_BEGINN", parse_dates = ["MESS_DATUM_BEGINN"], date_parser = dateparse, na_values = [-999.0, -999])

    #df = pd.read_csv(fname, delimiter=";", encoding="iso8859_2",\
    #             index_col="MESS_DATUM", parse_dates = ["MESS_DATUM"], date_parser = dateparse)
    
    # https://medium.com/@chaimgluck1/working-with-pandas-fixing-messy-column-names-42a54a6659cd

    # Column headers: remove leading blanks (strip), replace " " with "_", and convert to lower case.
#    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
#    df.index.name = df.index.name.strip().lower().replace(' ', '_').replace('(', '').replace(')', '')
#    return(df)

In [23]:
from zipfile import ZipFile

In [24]:
def ts_merge():
    # Very compact code.
    df = pd.DataFrame()
    for elt in local_zip_list:
        ffname = local_ftp_ts_dir + elt
        print("Zip archive: " + ffname)
        with ZipFile(ffname) as myzip:
            # read the time series data from the file starting with "produkt"
            prodfilename = [name for name in myzip.namelist() if name.split("_")[0]=="produkt"][0] 
            print("Extract product file: %s" % prodfilename)
            print()
            with myzip.open(prodfilename) as myfile:
                dftmp = temp_ts_to_df(myfile)
#                s = dftmp["r1"].rename(dftmp["stations_id"][0]).to_frame()
                # Note: this merges the station id column itself, so every merged column
                # repeats a station id (see the output below). To merge a measured variable,
                # select its column and rename it to the station id as in the commented line above.
                s = dftmp["stations_id"].to_frame()
                # outer merge.
                df = pd.merge(df, s, left_index=True, right_index=True, how='outer')

    #df.index.names = ["year"]
    df.index.rename(name = "time", inplace = True)
    return(df)
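A wide table with one measured variable per station can also be built by renaming each series to its station id and concatenating along the columns. The sketch below assumes the variable of interest is `v_te005m`. The values for station 282 are invented, and the two frames stand in for the per-station results of `temp_ts_to_df`.

```python
import pandas as pd

idx_a = pd.to_datetime(["2016-06-01", "2016-06-02"])
idx_b = pd.to_datetime(["2016-06-02", "2016-06-03"])

# Stand-ins for two per-station data frames (values for 282 are invented).
a = pd.DataFrame({"stations_id": [232, 232], "v_te005m": [17.9, 16.7]}, index=idx_a)
b = pd.DataFrame({"stations_id": [282, 282], "v_te005m": [15.0, 15.5]}, index=idx_b)

columns = []
for dftmp in (a, b):
    station_id = int(dftmp["stations_id"].iloc[0])
    # One series per station, named after the station id.
    columns.append(dftmp["v_te005m"].rename(station_id))

# Outer join on the time index: dates missing at a station become NaN.
wide = pd.concat(columns, axis=1, join="outer").sort_index()
wide.index.name = "time"
print(wide)
```

Naming the columns by station id avoids the `_x`/`_y` suffixes that appear when identically named columns are merged repeatedly.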

In [None]:
#def prec_ts_merge():
    # Very compact code.
#    df = pd.DataFrame()
#    for elt in local_zip_list:
#        ffname = local_ftp_ts_dir + elt
#        print("Zip archive: " + ffname)
#        with ZipFile(ffname) as myzip:
            # read the time series data from the file starting with "produkt"
#            prodfilename = [elt for elt in myzip.namelist() if elt.split("_")[0]=="produkt"][0] 
#            print("Extract product file: %s" % prodfilename)
#            print()
#            with myzip.open(prodfilename) as myfile:
#                dftmp = temp_ts_to_df(myfile)
#                s = dftmp["ja_tt"].rename(dftmp["stations_id"][0]).to_frame()
                # outer merge.
#                df = pd.merge(df, s, left_index=True, right_index=True, how='outer')

    #df.index.names = ["year"]
#    df.index.rename(name = "time", inplace = True)
#    return(df)

In [25]:
df_merged_ts = ts_merge()

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00232_19510101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20191231_00232.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00282_19510101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20191231_00282.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00867_19530101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19530101_20191231_00867.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_02261_19510101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20191231_02261.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_02597_19490901_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19490901_20191231_02597.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_03366_1953

In [26]:
df_merged_ts.head()

| time | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1949-09-01 | NaN | NaN | NaN | NaN | 2597.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1949-09-02 | NaN | NaN | NaN | NaN | 2597.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1949-09-03 | NaN | NaN | NaN | NaN | 2597.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1949-09-04 | NaN | NaN | NaN | NaN | 2597.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1949-09-05 | NaN | NaN | NaN | NaN | 2597.0 | NaN | NaN | NaN | NaN | NaN | NaN |


In [None]:
#%matplotlib inline

In [None]:
#import matplotlib.pyplot as plt

In [None]:
#fig = plt.figure(dpi= 136, figsize=(8,4))
#ax1 = fig.add_subplot(221)
#ax2 = fig.add_subplot(222)
#ax3 = fig.add_subplot(223)
#ax4 = fig.add_subplot(224)
#df_merged_ts[1024].plot(ax = ax1)
#df_merged_ts[1232].plot(ax = ax1)
#df_merged_ts[1246].plot(ax = ax2)
#df_merged_ts[1277].plot(ax = ax3)
#df_merged_ts[1300].plot(ax = ax4)
#plt.show()



In [None]:
#import seaborn as sns
#import matplotlib.pyplot as plt
#%matplotlib inline

# plot
#sns.set_style('ticks')
#fig1, ax1 = plt.subplots(dpi = 400, figsize = (12,24))

#sns.heatmap(df_merged_ts, cmap='RdYlGn_r', annot=False, ax = ax1)
#sns.heatmap(df_merged_ts, cmap='coolwarm', annot=True, vmin = 8, vmax = 12, ax = ax1)

# _r reverses the normal order of the color map 'RdYlGn'

#sns.heatmap(df, cmap='coolwarm', annot=True, vmin = 8, vmax = 12, ax = ax)
#ax1.set_yticklabels(df_merged_ts.index.strftime('%Y'))
#plt.show()
#fig1.savefig('example1.png')

In [27]:
df_merged_ts.to_csv(local_ts_merged_dir + "ts_merged_temp.csv",sep=";")

In [28]:
df_merged_ts_transposed = df_merged_ts.transpose()

In [29]:
df_merged_ts_transposed.index.names = ['station_id']

In [30]:
df_merged_ts_transposed.head()

| station_id | 1949-09-01 | 1949-09-02 | 1949-09-03 | 1949-09-04 | 1949-09-05 | 1949-09-06 | 1949-09-07 | 1949-09-08 | 1949-09-09 | 1949-09-10 | ... | 2019-12-22 | 2019-12-23 | 2019-12-24 | 2019-12-25 | 2019-12-26 | 2019-12-27 | 2019-12-28 | 2019-12-29 | 2019-12-30 | 2019-12-31 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stations_id_x | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 232.0 | 232.0 | 232.0 | 232.0 | 232.0 | 232.0 | 232.0 | 232.0 | 232.0 | 232.0 |
| stations_id_y | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 282.0 | 282.0 | 282.0 | 282.0 | 282.0 | 282.0 | 282.0 | 282.0 | 282.0 | 282.0 |
| stations_id_x | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 867.0 | 867.0 | 867.0 | 867.0 | 867.0 | 867.0 | 867.0 | 867.0 | 867.0 | 867.0 |
| stations_id_y | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2261.0 | 2261.0 | 2261.0 | 2261.0 | 2261.0 | 2261.0 | 2261.0 | 2261.0 | 2261.0 | 2261.0 |
| stations_id_x | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | ... | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 | 2597.0 |


In [31]:
df_merged_ts_transposed.to_csv(local_ts_merged_dir + "ts_merged_transposed.csv",sep=";")

In [32]:
def ts_append():
    # Collect one long-format frame per station and concatenate them once at the end.
    frames = []
    for elt in local_zip_list:
        ffname = local_ftp_ts_dir + elt
        print("Zip archive: " + ffname)
        with ZipFile(ffname) as myzip:
            # read the time series data from the file starting with "produkt"
            prodfilename = [name for name in myzip.namelist() if name.split("_")[0]=="produkt"][0] 
            print("Extract product file: %s" % prodfilename)
            print()
            with myzip.open(prodfilename) as myfile:
                dftmp = temp_ts_to_df(myfile)
                # Limit the time interval to keep the resulting table small.
                dftmp = dftmp.loc[(dftmp.index>='2016-06-01 00:00:00')&(dftmp.index<='2020-06-01 00:00:00')]
                # Attach the station metadata; df_stations is indexed by station_id.
                dftmp = dftmp.merge(df_stations,how="inner",left_on="stations_id",right_index=True)
#                print(dftmp.head(5))
                frames.append(dftmp)

    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    df = pd.concat(frames) if frames else pd.DataFrame()
    #df.index.names = ["year"]
    #df.index.rename(name = "time", inplace = True)
    return(df)
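The append step stacks the per-station frames vertically into one long table. A minimal sketch with two invented stations shows the stacking and a CSV export with ISO-like time stamps, similar to the target table at the top. The chosen `date_format` is an assumption and not a documented requirement of the time manager.

```python
import pandas as pd

frames = []
# Invented per-station values; each frame mimics one filtered station time series.
for sid, temps in {232: [17.9, 16.7], 282: [15.0, 15.5]}.items():
    frames.append(pd.DataFrame(
        {"stations_id": sid, "v_te005m": temps},
        index=pd.DatetimeIndex(["2016-06-01", "2016-06-02"], name="mess_datum"),
    ))

# Vertical concatenation: one row per station and time step.
long_df = pd.concat(frames)

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = long_df.to_csv(sep=";", date_format="%Y-%m-%dT%H:%M:%S")
print(csv_text)
```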

In [33]:
df_appended_ts = ts_append()

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00232_19510101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20191231_00232.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00282_19510101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20191231_00282.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00867_19530101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19530101_20191231_00867.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_02261_19510101_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20191231_02261.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_02597_19490901_20191231_hist.zip
Extract product file: produkt_erdbo_tag_19490901_20191231_02597.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_03366_1953

In [34]:
df_appended_ts.head()

| mess_datum | stations_id | qn_2 | v_te002m | v_te005m | v_te010m | v_te020m | v_te050m | eor | date_from | date_to | altitude | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-06-01 | 232 | 3 | NaN | 17.9 | 17.6 | 17.1 | 14.9 | eor | 1951-01-01 | 2020-06-29 | 461 | 48.4254 | 10.942 | Augsburg | Bayern |
| 2016-06-02 | 232 | 3 | NaN | 16.7 | 16.8 | 16.8 | 14.9 | eor | 1951-01-01 | 2020-06-29 | 461 | 48.4254 | 10.942 | Augsburg | Bayern |
| 2016-06-03 | 232 | 3 | NaN | 17.9 | 17.5 | 16.8 | 14.8 | eor | 1951-01-01 | 2020-06-29 | 461 | 48.4254 | 10.942 | Augsburg | Bayern |
| 2016-06-04 | 232 | 3 | NaN | 18.9 | 18.4 | 17.6 | 14.8 | eor | 1951-01-01 | 2020-06-29 | 461 | 48.4254 | 10.942 | Augsburg | Bayern |
| 2016-06-05 | 232 | 3 | NaN | 18.6 | 18.3 | 17.8 | 15.0 | eor | 1951-01-01 | 2020-06-29 | 461 | 48.4254 | 10.942 | Augsburg | Bayern |


In [35]:
df_appended_ts.to_csv(local_ts_appended_dir + "ts_appended_temp.csv",sep=";")