# Read DWD CDC Time Series, Merge with Station Description and Append 

The main idea behind this activity is to reformat and merge time series (here we use hourly precipitation) from the DWD Climate Data Center in such a way that they can be used with the **QGIS time manager extension**. 

This extension filters the attribute table of a vector layer (e.g. points representing precipitation stations plus precipitation data) by a time stamp column. It limits the attribute table to the records matching the time stamp currently set in the time manager (e.g. by the user moving the time slider). The selected subset of the attribute table is then used to change the symbology of the vector layer according to the variable of interest (e.g. precipitation rate).

The QGIS time manager approach is somewhat brute force, because each individual measurement at a station at a given time is one feature (one row in the table). A time series at station X with hourly resolution for one day (24 values) therefore entails 24 features with the same station id and coordinates but different times. As of now, this 1:n relationship can only be realized by importing a CSV file with the corresponding structure. 

(At least I was not able to generate the required view on a 1:n relationship by merging a point vector layer with precipitation station locations and an imported CSV time series table.)

The final data format is a concatenation of time series together with the geographic location in 2D (e.g. lat, lon). In principle, the required data format looks like this:


| station_id |        name        |   lat   |   lon  |        meas_time       | prec_rate |
|:----------:|:------------------:|:-------:|:------:|:----------------------:|:---------:|
|        ... | ...                |     ... |    ... |                    ... |       ... |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T08:00:00UTC |       1.5 |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T09:00:00UTC |       1.7 |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T10:00:00UTC |       0.1 |
|        ... | ...                |     ... |    ... |                    ... |       ... |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T08:00:00UTC |       0.8 |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T09:00:00UTC |       0.4 |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T10:00:00UTC |       0.0 |
|        ... | ...                |     ... |    ... |                    ... |       ... |


(Table generated with https://www.tablesgenerator.com/markdown_tables)

To achieve this, the precipitation time series (station_id, meas_time, prec_rate) have to be merged with the station metadata (station_id, lat, lon) from a CSV file generated in an earlier activity. We use Pandas to read, join and append the data and to generate the final CSV file, which is then imported into QGIS as a point layer. 
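
As a minimal sketch of this merge step, the following toy example joins a small long-format time series with station metadata. The values are taken from the example table above; the data frame names `stations`, `ts` and `long_table` are only illustrative and are not used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical station metadata (values copied from the example table above).
stations = pd.DataFrame(
    {
        "station_id": [1595, 13670],
        "name": ["Gelsenkirchen-Buer", "Duisburg-Baerl"],
        "lat": [51.5762, 51.5088],
        "lon": [7.0652, 6.7018],
    }
)

# Hypothetical time series in long format: one row per station and time stamp.
ts = pd.DataFrame(
    {
        "station_id": [1595, 1595, 13670, 13670],
        "meas_time": pd.to_datetime(
            ["2018-12-07 08:00", "2018-12-07 09:00",
             "2018-12-07 08:00", "2018-12-07 09:00"]
        ),
        "prec_rate": [1.5, 1.7, 0.8, 0.4],
    }
)

# Attach the metadata to every measurement (1:n relationship).
long_table = ts.merge(stations, on="station_id", how="inner")
long_table = long_table[["station_id", "name", "lat", "lon", "meas_time", "prec_rate"]]
print(long_table)
```

Every measurement row carries its own copy of the station name and coordinates, which is exactly the redundancy the time manager extension requires.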

This final data format is far from optimal because of its large size and highly redundant information. This is a challenge for QGIS, which loses responsiveness with large data. To just show the principle, it is advisable to limit the size of the problem. 

The following filters (selection criteria) are applied:

  * Precipitation stations in NRW only (approx. 127 stations) 
  * Hourly precipitation data
  * Time interval from 2018-12-01 to last date in precipitation data set 
  
Still: 40 days * 24 hrs / day * 127 stations = 121920 records leading to 121920 features in a point layer in QGIS. 

In fact, the resulting number of records is around 91000. The reason might be that not all stations in the station list have time series. This has to be checked carefully.
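
The arithmetic can be reproduced in a few lines. This is only a back-of-the-envelope check that assumes every station delivers a complete series; the figure of 91000 is the approximate count quoted above.

```python
days = 40
hours_per_day = 24
stations = 127

# Upper bound: every station delivers one value per hour.
expected = days * hours_per_day * stations

# Approximate number of records actually obtained (quoted above).
observed = 91000

# How many complete station series would explain the observed count?
records_per_station = days * hours_per_day
equivalent_stations = observed / records_per_station

print(expected)
print(round(equivalent_stations, 1))
```

Under this assumption the observed count corresponds to roughly 95 complete series, which is consistent with a sizeable share of the listed stations having no or incomplete data.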

## FTP Connection

### Connection Parameters

In [54]:
server = "opendata.dwd.de"
user   = "anonymous"
passwd = ""

### FTP Directory Definition and Station Description Filename Pattern

In [8]:
# The topic of interest.
#topic_dir = "/hourly/precipitation/recent/"
topic_dir = "/daily/more_precip/historical/"

# This is the search pattern common to ALL station description file names 
station_desc_pattern = "_Beschreibung_Stationen.txt"

# Below this directory tree node all climate data are stored.
ftp_climate_data_dir = "/climate_environment/CDC/observations_germany/climate/"
ftp_dir =  ftp_climate_data_dir + topic_dir

### Local Directories

In [10]:
local_ftp_dir         = "data/original/DWD/"      # Local directory to store local ftp data copies, the local data source or input data. 
local_ftp_station_dir = local_ftp_dir + topic_dir # Local directory where local station info is located
local_ftp_ts_dir      = local_ftp_dir + topic_dir # Local directory where time series downloaded from ftp are located

local_generated_dir   = "data/generated/DWD/" # The generated of derived data in contrast to local_ftp_dir
local_station_dir     = local_generated_dir + topic_dir # Derived station data, i.e. the CSV file
local_ts_merged_dir   = local_generated_dir + topic_dir # Parallel merged time series, wide data frame with one TS per column
local_ts_appended_dir = local_generated_dir + topic_dir # Serially appended time series, long data frame for QGIS TimeManager Plugin

In [11]:
print(local_ftp_dir)
print(local_ftp_station_dir)
print(local_ftp_ts_dir)
print()
print(local_generated_dir)
print(local_station_dir)
print(local_ts_merged_dir)
print(local_ts_appended_dir)

data/original/DWD/
data/original/DWD//daily/more_precip/historical/
data/original/DWD//daily/more_precip/historical/

data/generated/DWD/
data/generated/DWD//daily/more_precip/historical/
data/generated/DWD//daily/more_precip/historical/
data/generated/DWD//daily/more_precip/historical/
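
The double slashes in the printed paths come from concatenating a directory that ends in `/` with a `topic_dir` that starts with `/`. POSIX systems treat repeated slashes like a single one, so the paths still work there. If clean paths are preferred, a small helper along the following lines could be used. `join_dir` is a hypothetical name and not part of this notebook.

```python
import posixpath

def join_dir(base, sub):
    """Join two directory strings with exactly one slash between them.

    Hypothetical helper: strips the leading slash of `sub` so that
    posixpath.join() does not treat it as an absolute path.
    """
    return posixpath.join(base, sub.lstrip("/"))

topic_dir = "/daily/more_precip/historical/"
local_dir = join_dir("data/original/DWD/", topic_dir)
print(local_dir)
```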


In [12]:
import os
os.makedirs(local_ftp_dir,exist_ok = True) # it does not complain if the dir already exists.
os.makedirs(local_ftp_station_dir,exist_ok = True)
os.makedirs(local_ftp_ts_dir,exist_ok = True)

os.makedirs(local_generated_dir,exist_ok = True)
os.makedirs(local_station_dir,exist_ok = True)
os.makedirs(local_ts_merged_dir,exist_ok = True)
os.makedirs(local_ts_appended_dir,exist_ok = True)

### FTP Connect

In [13]:
import ftplib
ftp = ftplib.FTP(server)
res = ftp.login(user=user, passwd = passwd)
print(res)

230 Login successful.


In [14]:
ret = ftp.cwd(".")

In [8]:
#ftp.quit()

### FTP Grab File Function

In [15]:
def grabFile(ftpfullname,localfullname):
    try:
        ftp.cwd(".") # A dummy action to check the connection and to provoke an exception if necessary.
        # The context manager closes the local file even if the transfer fails.
        with open(localfullname, 'wb') as localfile:
            ftp.retrbinary('RETR ' + ftpfullname, localfile.write, 1024)
    
    except ftplib.error_perm:
        print("FTP ERROR. Operation not permitted. File not found?")

    except ftplib.error_temp:
        print("FTP ERROR. Timeout.")

    except ConnectionAbortedError:
        print("FTP ERROR. Connection aborted.")



### Generate Pandas Dataframe from FTP Directory Listing

In [17]:
import pandas as pd
import os

def gen_df_from_ftp_dir_listing(ftp, ftpdir):
    lines = []
    flist = []
    try:
        ftp.retrlines("LIST "+ftpdir, lines.append)
    except ftplib.all_errors:
        # Catch FTP errors only, so that programming errors are not hidden.
        print("Error: ftp.retrlines() failed. ftp timeout? Reconnect!")
        return

    if len(lines) == 0:
        print("Error: ftp dir is empty")
        return

    for line in lines:
        # Fixed column positions of the directory listing: file type, size, name.
        [ftype, fsize, fname] = [line[0:1], int(line[31:42]), line[56:]]

        fext = os.path.splitext(fname)[-1]

        # Only zip archives carry a station id in their file name.
        if fext == ".zip":
            station_id = int(fname.split("_")[2])
        else:
            station_id = -1

        flist.append([station_id, fname, fext, fsize, ftype])

    df_ftpdir = pd.DataFrame(flist,columns=["station_id", "name", "ext", "size", "type"])
    return df_ftpdir
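
The function above relies on fixed character positions in each `LIST` line, which only works as long as the server keeps its column layout. As a hedged alternative, the line can be split on whitespace instead. The sketch below assumes the common Unix-style listing with nine fields; `parse_list_line` is a hypothetical helper and the sample line is made up, modelled on the file names in this directory.

```python
import os

def parse_list_line(line):
    """Parse one Unix-style FTP LIST line into (station_id, name, ext, size, type).

    Assumes nine whitespace-separated fields: permissions, links, owner,
    group, size, month, day, time-or-year, file name.
    """
    parts = line.split(None, 8)  # at most 9 fields; the file name stays intact
    ftype = parts[0][0]
    fsize = int(parts[4])
    fname = parts[8]
    fext = os.path.splitext(fname)[-1]
    # Only zip archives carry a station id in their file name.
    station_id = int(fname.split("_")[2]) if fext == ".zip" else -1
    return station_id, fname, fext, fsize, ftype

# Hypothetical listing line (not copied from the server).
sample = ("-rw-r--r--    1 1001     1001       109677 Mar 10  2020 "
          "tageswerte_RR_00001_19120101_19860630_hist.zip")
parsed = parse_list_line(sample)
print(parsed)
```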

In [18]:
df_ftpdir = gen_df_from_ftp_dir_listing(ftp, ftp_dir)

In [19]:
df_ftpdir.head(10)

,station_id,name,ext,size,type
0,-1,BESCHREIBUNG_obsgermany_climate_daily_more_pre...,.pdf,72261,-
1,-1,DESCRIPTION_obsgermany_climate_daily_more_prec...,.pdf,71026,-
2,-1,RR_Tageswerte_Beschreibung_Stationen.txt,.txt,1202111,-
3,1,tageswerte_RR_00001_19120101_19860630_hist.zip,.zip,109677,-
4,2,tageswerte_RR_00002_19510101_20061231_hist.zip,.zip,82951,-
5,3,tageswerte_RR_00003_18910101_20110331_hist.zip,.zip,162410,-
6,4,tageswerte_RR_00004_19510101_19791031_hist.zip,.zip,45468,-
7,6,tageswerte_RR_00006_19821101_20181231_hist.zip,.zip,38084,-
8,7,tageswerte_RR_00007_19510101_19960131_hist.zip,.zip,69540,-
9,8,tageswerte_RR_00008_19310101_19911231_hist.zip,.zip,88001,-


### Dataframe with TS Zip Files

In [20]:
# Keep only the zip archives and index them by station id.
# Chaining set_index() avoids modifying a slice of df_ftpdir in place.
df_zips = df_ftpdir[df_ftpdir["ext"]==".zip"].set_index("station_id")
df_zips.head(10)

station_id,name,ext,size,type
1,tageswerte_RR_00001_19120101_19860630_hist.zip,.zip,109677,-
2,tageswerte_RR_00002_19510101_20061231_hist.zip,.zip,82951,-
3,tageswerte_RR_00003_18910101_20110331_hist.zip,.zip,162410,-
4,tageswerte_RR_00004_19510101_19791031_hist.zip,.zip,45468,-
6,tageswerte_RR_00006_19821101_20181231_hist.zip,.zip,38084,-
7,tageswerte_RR_00007_19510101_19960131_hist.zip,.zip,69540,-
8,tageswerte_RR_00008_19310101_19911231_hist.zip,.zip,88001,-
9,tageswerte_RR_00009_19920601_20101231_hist.zip,.zip,31560,-
10,tageswerte_RR_00010_19610101_20050831_hist.zip,.zip,66983,-
12,tageswerte_RR_00012_19410101_20061231_hist.zip,.zip,93793,-


### Download the Station Description File

In [21]:
station_fname = df_ftpdir[df_ftpdir['name'].str.contains(station_desc_pattern)]["name"].values[0]
print(station_fname)

# Alternative
#station_fname2 = df_ftpdir[df_ftpdir["name"].str.match("^.*Beschreibung_Stationen.*txt$")]["name"].values[0]
#print(station_fname2)

RR_Tageswerte_Beschreibung_Stationen.txt


In [22]:
print("grabFile: ")
print("From: " + ftp_dir + station_fname)
print("To:   " + local_ftp_station_dir + station_fname)
grabFile(ftp_dir + station_fname, local_ftp_station_dir + station_fname)

grabFile: 
From: /climate_environment/CDC/observations_germany/climate//daily/more_precip/historical/RR_Tageswerte_Beschreibung_Stationen.txt
To:   data/original/DWD//daily/more_precip/historical/RR_Tageswerte_Beschreibung_Stationen.txt


In [23]:
# Extract the column names from the header line. They are in German (de).
# The file is opened with an explicit encoding because of the German umlauts.
def station_desc_txt_to_csv(txtfile, csvfile):
    with open(txtfile, "r", encoding="utf-8") as file:
        header = file.readline()
    colnames_de = header.split()
    
    translate = \
    {'Stations_id':'station_id',
     'von_datum':'date_from',
     'bis_datum':'date_to',
     'Stationshoehe':'altitude',
     'geoBreite': 'latitude',
     'geoLaenge': 'longitude',
     'Stationsname':'name',
     'Bundesland':'state'}
    
    colnames_en = [translate[h] for h in colnames_de]
    
    # Skip the first two rows and set the column names.
    df = pd.read_fwf(txtfile,skiprows=2,names=colnames_en, parse_dates=["date_from","date_to"],index_col = 0)
    
    # Write the CSV file.
    df.to_csv(csvfile, sep = ";")
    return df

In [24]:
basename = os.path.splitext(station_fname)[0]
df_stations = station_desc_txt_to_csv(local_ftp_station_dir + station_fname, local_station_dir + basename + ".csv")
df_stations.head()

station_id,date_from,date_to,altitude,latitude,longitude,name,state
1,1912-01-01,1986-06-30,478,47.8413,8.8493,Aach,Baden-Württemberg
2,1951-01-01,2006-12-31,138,50.8066,6.0996,Aachen (Kläranlage),Nordrhein-Westfalen
3,1891-01-01,2011-03-31,202,50.7827,6.0941,Aachen,Nordrhein-Westfalen
4,1951-01-01,1979-10-31,243,50.7683,6.1207,Aachen-Brand,Nordrhein-Westfalen
6,1982-11-01,2020-02-28,455,48.8361,10.0598,Aalen-Unterrombach,Baden-Württemberg


### Select Stations Located in NRW from Station Description Dataframe

In [25]:
import datetime
d1 = datetime.datetime(2018, 12, 31) 
#df_stations['date_to']==d1
station_ids_selected = df_stations[df_stations['state'].str.contains("Nordrhein") & (df_stations['date_to']>=d1)].index
station_ids_selected


Int64Index([   79,   110,   187,   216,   389,   390,   488,   533,   554,
              603,
            ...
            14186, 14187, 15000, 15001, 15455, 15456, 15927, 15980, 16087,
            19125],
           dtype='int64', name='station_id', length=205)
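
The selection combines two boolean masks with `&`. The following toy example shows the same pattern on four hypothetical rows (station ids, states and `date_to` values follow the tables printed in this notebook); the names `is_nrw` and `is_recent` are only illustrative.

```python
import datetime
import pandas as pd

# Hypothetical excerpt of the station description data frame.
df = pd.DataFrame(
    {
        "date_to": pd.to_datetime(
            ["1986-06-30", "2006-12-31", "2020-02-28", "2020-02-25"]
        ),
        "state": [
            "Baden-Württemberg",
            "Nordrhein-Westfalen",
            "Baden-Württemberg",
            "Nordrhein-Westfalen",
        ],
    },
    index=pd.Index([1, 2, 6, 79], name="station_id"),
)

# One mask per criterion, combined element-wise with "&".
is_nrw = df["state"] == "Nordrhein-Westfalen"
is_recent = df["date_to"] >= datetime.datetime(2018, 12, 31)

selected = df[is_nrw & is_recent].index
print(list(selected))
```

Naming the masks makes it easier to inspect how many stations each criterion removes on its own, e.g. with `is_nrw.sum()`.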

In [39]:
# Create variable with TRUE if state is Nordrhein-Westfalen
#isNRW = dfStations['state'] == "Nordrhein-Westfalen"

# Create variable with TRUE if date_to is latest date (indicates operation up to now)
#isOperational = dfStations['date_to'] == dfStations.date_to.max() 

# select on both conditions
#dfNRW = dfStations[isNRW & isOperational]
#print("Number of stations in NRW: \n", dfNRW.count())
#dfNRW.head()

In [26]:
df_zips.head()

station_id,name,ext,size,type
1,tageswerte_RR_00001_19120101_19860630_hist.zip,.zip,109677,-
2,tageswerte_RR_00002_19510101_20061231_hist.zip,.zip,82951,-
3,tageswerte_RR_00003_18910101_20110331_hist.zip,.zip,162410,-
4,tageswerte_RR_00004_19510101_19791031_hist.zip,.zip,45468,-
6,tageswerte_RR_00006_19821101_20181231_hist.zip,.zip,38084,-


### Download TS Data from FTP Server

Problem: Not all stations listed in the station description file are associated with a time series (zip file)! The set of stations in the description file and the set of stations for which TS data (zip files) are provided do not match perfectly.  
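
One way to make this mismatch visible before downloading is to compare the two sets of station ids directly. The sketch below uses made-up ids; in the notebook the two inputs would correspond to `station_ids_selected` and `df_zips.index`.

```python
import pandas as pd

# Hypothetical ids: stations selected from the description file ...
selected_ids = pd.Index([79, 110, 187, 19125], name="station_id")
# ... and ids for which a zip archive appears in the FTP listing.
zip_ids = pd.Index([79, 110, 187, 216], name="station_id")

# Stations that can be downloaded, and stations without a zip archive.
available = selected_ids.intersection(zip_ids)
missing = selected_ids.difference(zip_ids)

print(sorted(available))
print(sorted(missing))
```

Looping over `available` only would make the warning branch in the download loop unnecessary, and `missing` documents which stations were dropped.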

In [27]:
 
local_zip_list = []

for station_id in station_ids_selected:
    try:
        # Raises a KeyError if there is no zip archive for this station.
        fname = df_zips["name"][station_id]
        print(fname)
        grabFile(ftp_dir + fname, local_ftp_ts_dir + fname)
        local_zip_list.append(fname)
    except KeyError:
        print("WARNING: TS file for key %d not found in FTP directory." % station_id)

tageswerte_RR_00079_19310101_20181231_hist.zip
tageswerte_RR_00110_19310101_20181231_hist.zip
tageswerte_RR_00187_19410101_20181231_hist.zip
tageswerte_RR_00216_19710101_20181231_hist.zip
tageswerte_RR_00389_19310101_20181231_hist.zip
tageswerte_RR_00390_19861201_20181231_hist.zip
tageswerte_RR_00488_19410101_20181231_hist.zip
tageswerte_RR_00533_19610101_20181231_hist.zip
tageswerte_RR_00554_19460101_20181231_hist.zip
tageswerte_RR_00603_19900101_20181231_hist.zip
tageswerte_RR_00613_19410101_20181231_hist.zip
tageswerte_RR_00617_19410101_20181231_hist.zip
tageswerte_RR_00644_19410101_20181231_hist.zip
tageswerte_RR_00796_19410101_20181231_hist.zip
tageswerte_RR_00871_19410101_20181231_hist.zip
tageswerte_RR_00902_19310101_20181231_hist.zip
tageswerte_RR_00934_20041001_20181231_hist.zip
tageswerte_RR_00989_19310101_20181231_hist.zip
tageswerte_RR_01024_19310101_20181231_hist.zip
tageswerte_RR_01046_19710101_20181231_hist.zip
tageswerte_RR_01078_19690701_20181231_hist.zip
tageswerte_RR

### Join (Merge) the Time Series Columns

https://medium.com/@chaimgluck1/working-with-pandas-fixing-messy-column-names-42a54a6659cd


In [28]:
import datetime

def prec_ts_to_df(fname):
    
    # Daily files use the time stamp format YYYYMMDD (hourly files would need '%Y%m%d%H').
    # pd.datetime is not available in recent pandas releases, so the standard library is used.
    dateparse = lambda dates: [datetime.datetime.strptime(str(d), '%Y%m%d') for d in dates]

    df = pd.read_csv(fname, delimiter=";", encoding="utf8", index_col="MESS_DATUM", parse_dates = ["MESS_DATUM"], date_parser = dateparse, na_values = [-999.0, -999])

    # Column headers: remove leading and trailing blanks (strip), convert to lower case,
    # replace " " with "_" and remove parentheses. regex=False treats "(" and ")" literally.
    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)
    df.index.name = df.index.name.strip().lower().replace(' ', '_').replace('(', '').replace(')', '')
    return df
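
The `date_parser` argument may not be available in recent pandas releases. A hedged alternative is to read the time stamp column as plain values and convert it afterwards with `pd.to_datetime`. The sample text below is a hypothetical excerpt that only imitates the layout of a daily "produkt" file; it is not real DWD data.

```python
import io
import pandas as pd

# Hypothetical excerpt in the layout of a daily "produkt" file.
sample = io.StringIO(
    "STATIONS_ID;MESS_DATUM;QN_6;  RS;RSF; SH_TAG;NSH_TAG;eor\n"
    "         79;20150610;    9;   0.0;   0;   0;-999;eor\n"
    "         79;20150612;    9;  11.0;   6;   0;-999;eor\n"
)

df = pd.read_csv(sample, delimiter=";", na_values=[-999.0, -999],
                 skipinitialspace=True)

# Normalize the column names before using them.
df.columns = df.columns.str.strip().str.lower()

# Convert YYYYMMDD integers to time stamps and use them as index.
df["mess_datum"] = pd.to_datetime(df["mess_datum"].astype(str), format="%Y%m%d")
df = df.set_index("mess_datum")

print(df[["stations_id", "rs", "nsh_tag"]])
```

Converting after reading keeps the parsing logic independent of the `read_csv` keyword arguments, which have changed between pandas versions.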

In [29]:
from zipfile import ZipFile

In [33]:
def ts_merge():

    df = pd.DataFrame()
    for elt in local_zip_list:
        ffname = local_ftp_ts_dir + elt
        print("Zip archive: " + ffname)
        with ZipFile(ffname) as myzip:
            # read the time series data from the file starting with "produkt"
            prodfilename = [elt for elt in myzip.namelist() if elt.split("_")[0]=="produkt"][0] 
            print("Extract product file: %s" % prodfilename)
            print()
            with myzip.open(prodfilename) as myfile:
                dftmp = prec_ts_to_df(myfile)
               # s = dftmp["r1"].rename(dftmp["stations_id"][0]).to_frame()
                s = dftmp["stations_id"].to_frame()
                # outer merge.
                df = pd.merge(df, s, left_index=True, right_index=True, how='outer')

    #df.index.names = ["year"]
    df.index.rename(name = "time", inplace = True)
    return(df)
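
As written, `ts_merge()` merges the `stations_id` column of every file, which is why the later output shows repeated `stations_id_x` / `stations_id_y` columns instead of one precipitation column per station. A minimal sketch of the intended wide table, assuming the daily precipitation values are in the column `rs`, could look like this. `to_wide` is a hypothetical helper and the two input frames are toy data.

```python
import pandas as pd

def to_wide(frames, value_col="rs"):
    """Combine per-station frames into one wide frame (hypothetical helper).

    Each input frame needs a 'stations_id' column and a datetime index.
    The result has one column per station id and the union of all dates.
    """
    columns = []
    for dftmp in frames:
        station_id = dftmp["stations_id"].iloc[0]
        # Name the value series after its station id.
        columns.append(dftmp[value_col].rename(station_id))
    wide = pd.concat(columns, axis=1, join="outer").sort_index()
    wide.index.name = "time"
    return wide

# Toy data for two stations with partly different dates.
a = pd.DataFrame({"stations_id": 79, "rs": [0.0, 11.0]},
                 index=pd.to_datetime(["2015-06-10", "2015-06-12"]))
b = pd.DataFrame({"stations_id": 110, "rs": [1.2, 0.4]},
                 index=pd.to_datetime(["2015-06-11", "2015-06-12"]))

wide = to_wide([a, b])
print(wide)
```

Dates that are missing for a station appear as `NaN` in that station's column.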

In [34]:
df_merged_ts = ts_merge()

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00079_19310101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19310101_20181231_00079.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00110_19310101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19310101_20181231_00110.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00187_19410101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19410101_20181231_00187.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00216_19710101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19710101_20181231_00216.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00389_19310101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19310101_20181231_00389.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00390_19861201_20181231_hist.zip
Ex

In [26]:
df_merged_ts.head()

time,216,389,390,554,555,603,613,617,644,796,...,7344,7374,7378,13669,13670,13671,13696,13700,13713,15000
2018-05-17 00:00:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.3,0.0,0.0
2018-05-17 01:00:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2018-05-17 02:00:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2018-05-17 03:00:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2018-05-17 04:00:00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [39]:
df_merged_ts.to_csv(local_ts_merged_dir + "ts_merged.csv",sep=";")

In [40]:
def ts_append():
    # Collect one data frame per station and concatenate them once at the end
    # (DataFrame.append() was removed in pandas 2.0).
    frames = []
    for elt in local_zip_list:
        ffname = local_ftp_ts_dir + elt
        print("Zip archive: " + ffname)
        with ZipFile(ffname) as myzip:
            # read the time series data from the file starting with "produkt"
            prodfilename = [name for name in myzip.namelist() if name.split("_")[0]=="produkt"][0] 
            print("Extract product file: %s" % prodfilename)
            print()
            with myzip.open(prodfilename) as myfile:
                dftmp = prec_ts_to_df(myfile)
                # df_stations is indexed by station_id, so join the column on that index.
                dftmp = dftmp.merge(df_stations,how="inner",left_on="stations_id",right_index=True)
                frames.append(dftmp)

    return pd.concat(frames) if frames else pd.DataFrame()

In [41]:
df_appended_ts = ts_append()

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00002_19510101_20061231_hist.zip
Extract product file: produkt_nieder_tag_19510101_20061231_00002.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00003_18910101_20110331_hist.zip
Extract product file: produkt_nieder_tag_18910101_20110331_00003.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00004_19510101_19791031_hist.zip
Extract product file: produkt_nieder_tag_19510101_19791031_00004.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00038_18910818_20061231_hist.zip
Extract product file: produkt_nieder_tag_18910818_20061231_00038.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00042_19710101_20091231_hist.zip
Extract product file: produkt_nieder_tag_19710101_20091231_00042.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00079_19310101_20181231_hist.zip
Ex

In [39]:
df_merged_ts.head()

time,stations_id_x,stations_id_y,stations_id_x,stations_id_y,stations_id_x,stations_id_y,stations_id_x,stations_id_y,stations_id_x,stations_id_y,...,stations_id_x,stations_id_y,stations_id_x,stations_id_y,stations_id_x,stations_id_y,stations_id_x,stations_id_y,stations_id_x,stations_id_y
1891-09-01,,,,,,,,,,,...,,,,,,,,,,
1891-09-02,,,,,,,,,,,...,,,,,,,,,,
1891-09-03,,,,,,,,,,,...,,,,,,,,,,
1891-09-04,,,,,,,,,,,...,,,,,,,,,,
1891-09-05,,,,,,,,,,,...,,,,,,,,,,


In [41]:
df_merged_ts.to_csv(local_ts_merged_dir + "ts_merged.csv",sep=";")

In [44]:
def ts_append():
    # Same as above, but restricted to a fixed time interval.
    frames = []
    for elt in local_zip_list:
        ffname = local_ftp_ts_dir + elt
        print("Zip archive: " + ffname)
        with ZipFile(ffname) as myzip:
            # read the time series data from the file starting with "produkt"
            prodfilename = [name for name in myzip.namelist() if name.split("_")[0]=="produkt"][0] 
            print("Extract product file: %s" % prodfilename)
            print()
            with myzip.open(prodfilename) as myfile:
                dftmp = prec_ts_to_df(myfile)
                # Keep only the records from 2015-06-10 to 2018-05-11 (both inclusive).
                dftmp = dftmp.loc[(dftmp.index>='2015-06-10 00:00:00')&(dftmp.index<='2018-05-11 00:00:00')]
                # df_stations is indexed by station_id, so join the column on that index.
                dftmp = dftmp.merge(df_stations,how="inner",left_on="stations_id",right_index=True)
                frames.append(dftmp)

    return pd.concat(frames) if frames else pd.DataFrame()

In [45]:
df_appended_ts = ts_append()

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00079_19310101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19310101_20181231_00079.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00110_19310101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19310101_20181231_00110.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00187_19410101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19410101_20181231_00187.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00216_19710101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19710101_20181231_00216.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00389_19310101_20181231_hist.zip
Extract product file: produkt_nieder_tag_19310101_20181231_00389.txt

Zip archive: data/original/DWD//daily/more_precip/historical/tageswerte_RR_00390_19861201_20181231_hist.zip
Ex

In [46]:
df_appended_ts.head()

,stations_id,qn_6,rs,rsf,sh_tag,nsh_tag,eor,date_from,date_to,altitude,latitude,longitude,name,state
2015-06-10,79,9.0,0.0,0.0,0.0,0.0,eor,1931-01-01,2020-02-25,160,50.6718,7.0155,Alfter-Volmershoven,Nordrhein-Westfalen
2015-06-11,79,9.0,0.0,0.0,0.0,0.0,eor,1931-01-01,2020-02-25,160,50.6718,7.0155,Alfter-Volmershoven,Nordrhein-Westfalen
2015-06-12,79,9.0,11.0,6.0,0.0,0.0,eor,1931-01-01,2020-02-25,160,50.6718,7.0155,Alfter-Volmershoven,Nordrhein-Westfalen
2015-06-13,79,9.0,4.8,6.0,0.0,0.0,eor,1931-01-01,2020-02-25,160,50.6718,7.0155,Alfter-Volmershoven,Nordrhein-Westfalen
2015-06-14,79,9.0,0.0,0.0,0.0,0.0,eor,1931-01-01,2020-02-25,160,50.6718,7.0155,Alfter-Volmershoven,Nordrhein-Westfalen


In [47]:
df_appended_ts.to_csv(local_ts_appended_dir + "ts_appended.csv",sep=";")
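
The appended CSV still carries all original columns. To get closer to the target layout shown in the table at the top, the columns can be renamed and reduced before writing. The following is only a sketch on a two-row toy frame; mapping `rs` to `prec_rate` and appending the literal suffix `UTC` to the time stamps are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical two-row excerpt with the column names produced by ts_append().
df = pd.DataFrame(
    {
        "stations_id": [79, 79],
        "rs": [0.0, 11.0],
        "latitude": [50.6718, 50.6718],
        "longitude": [7.0155, 7.0155],
        "name": ["Alfter-Volmershoven"] * 2,
    },
    index=pd.to_datetime(["2015-06-10", "2015-06-12"]),
)

# Rename to the target column names and name the time index.
out = df.rename(columns={"stations_id": "station_id", "latitude": "lat",
                         "longitude": "lon", "rs": "prec_rate"})
out = out.rename_axis("meas_time")
out = out[["station_id", "name", "lat", "lon", "prec_rate"]]

# Without a file name, to_csv() returns the CSV text as a string.
csv_text = out.to_csv(sep=";", date_format="%Y-%m-%dT%H:%M:%SUTC")
print(csv_text)
```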

In [55]:
import datetime

# Precipitation sums per station for three time windows labelled 2016, 2017 and 2018.
# Each entry: (label, first day or None for "no lower bound", last day), bounds inclusive.
windows = [
    (2016, None, datetime.datetime(2016, 6, 10)),
    (2017, datetime.datetime(2016, 5, 26), datetime.datetime(2017, 5, 26)),
    (2018, datetime.datetime(2017, 5, 11), datetime.datetime(2018, 5, 11)),
]

stations=[]
latitudes=[]
longitudes=[]
precipitation_values=[]
years=[]

stations_ids=df_appended_ts['stations_id'].unique()
for st in stations_ids:
    print(st)

    rows_station = df_appended_ts[df_appended_ts.stations_id==st]
    latitude=""
    longitude=""
    for year, start, end in windows:
        rows = rows_station[rows_station.index <= end]
        if start is not None:
            rows = rows[rows.index >= start]
        # Take the coordinates from the first window that contains data.
        if (len(rows)>0 and (latitude=="")):
            latitude=rows.iloc[0].latitude
            longitude=rows.iloc[0].longitude
        stations.append(st)
        latitudes.append(latitude)
        longitudes.append(longitude)
        years.append(year)
        # 'cp' is the sum of the 'rs' column within the window.
        precipitation_values.append(rows['rs'].sum())

precipitation = {'stationids':stations,
    'year':  years,
    'cp': precipitation_values,
    'lat':latitudes,
    'long':longitudes          
  }
precipitation_data_frame = pd.DataFrame(precipitation, columns = ['stationids', 'year', 'cp','lat','long'])
precipitation_data_frame.to_csv(local_ts_appended_dir + "precipitation.csv",sep=";")


79
110
187
216
389
390
488
533
554
603
613
617
644
796
871
902
934
989
1024
1046
1078
1232
1241
1246
1277
1283
1298
1300
1303
1327
1590
1595
1673
1766
1891
1999
2027
2099
2104
2110
2117
2135
2254
2258
2332
2358
2419
2473
2483
2497
2505
2629
2667
2744
2802
2810
2936
2947
2968
2970
2976
2999
3020
3028
3031
3081
3098
3201
3202
3215
3264
3316
3321
3339
3350
3407
3465
3499
3540
3591
3610
3656
3767
3795
3798
3913
3952
3980
4020
4063
4127
4150
4170
4313
4368
4371
4400
4488
4667
4708
4741
4810
4849
4852
4962
5064
5213
5271
5347
5360
5468
5480
5483
5486
5502
5513
5579
5594
5619
5699
5717
5733
6197
6264
6276
6313
6337
7106
7330
7344
7374
7378
13669
13670
13671
13696
13700
13713
15000
15001
15455
15456
15927
15980
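
The per-station loop above can also be expressed with `groupby`. The sketch below is a hedged alternative on toy data, not a drop-in replacement: it handles only two of the windows, the lower bound of the first window is assumed to be the start of the filtered data (2015-06-10), and stations without any rows in a window are omitted instead of getting a sum of 0.

```python
import datetime
import pandas as pd

# Hypothetical excerpt in the shape of df_appended_ts.
df = pd.DataFrame(
    {"stations_id": [79, 79, 110, 110], "rs": [1.0, 2.5, 0.5, 4.0]},
    index=pd.to_datetime(["2016-01-10", "2017-01-10", "2016-01-10", "2017-01-10"]),
)

# Window label -> (first day, last day), both inclusive.
windows = {
    2016: (datetime.datetime(2015, 6, 10), datetime.datetime(2016, 6, 10)),
    2017: (datetime.datetime(2016, 5, 26), datetime.datetime(2017, 5, 26)),
}

parts = []
for label, (start, end) in windows.items():
    in_window = df[(df.index >= start) & (df.index <= end)]
    # Sum the precipitation column per station within the window.
    sums = in_window.groupby("stations_id")["rs"].sum().rename("cp").reset_index()
    sums["year"] = label
    parts.append(sums)

result = pd.concat(parts, ignore_index=True)
print(result)
```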


In [56]:
! jupyter nbconvert --to html gi0601_DWD_Stations_and_TS_for_TM_V001-Copy1.ipynb

[NbConvertApp] Converting notebook gi0601_DWD_Stations_and_TS_for_TM_V001-Copy1.ipynb to html
[NbConvertApp] Writing 543650 bytes to gi0601_DWD_Stations_and_TS_for_TM_V001-Copy1.html
