# Read DWD CDC Time Series, Merge with Station Description and Append 

The main idea behind this activity is to reformat and merge time series (here we use hourly precipitation) from the DWD Climate Data Center so that they can be used with the **QGIS time manager extension**. 

This extension filters the attribute table of a vector layer (e.g. points representing precipitation stations plus precipitation data) by a time stamp column. It limits the attribute table to the records matching the time stamp currently set in the time manager (e.g. by the user moving the time slider). The selected subset of the attribute table is then used to change the symbology of the vector layer according to the variable of interest (e.g. precipitation rate).

The QGIS time manager extension approach is a bit brute force, because each individual measurement at a station at a given time is one feature (row in the table). For example, a time series at station X with hourly resolution for one day (24 values) entails 24 different features with the same station id and the same coordinates but different times. As of now, this 1:n relationship can only be realized by importing a CSV file with the corresponding structure. 

(At least I was not able to generate the required view on a 1:n relationship by merging a point vector layer with precipitation station locations and an imported CSV time series table.)

The final data format is a concatenation of time series together with the geographic location in 2D (e.g. lat, lon). In principle, the required data format looks like this:


| station_id |        name        |   lat   |   lon  |        meas_time       | prec_rate |
|:----------:|:------------------:|:-------:|:------:|:----------------------:|:---------:|
|        ... | ...                |     ... |    ... |                    ... |       ... |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T08:00:00UTC |       1.5 |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T09:00:00UTC |       1.7 |
|       1595 | Gelsenkirchen-Buer | 51.5762 | 7.0652 | 2018-12-07T10:00:00UTC |       0.1 |
|        ... | ...                |     ... |    ... |                    ... |       ... |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T08:00:00UTC |       0.8 |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T09:00:00UTC |       0.4 |
|      13670 | Duisburg-Baerl     | 51.5088 | 6.7018 | 2018-12-07T10:00:00UTC |       0.0 |
|        ... | ...                |     ... |    ... |                    ... |       ... |


(Table generated with https://www.tablesgenerator.com/markdown_tables)

To achieve this, the precipitation time series (station_id, meas_time, prec_rate) have to be merged with the station metadata (station_id, lat, lon) from a CSV file generated in an earlier activity. We use Pandas to read, join and append the data to generate the final CSV file, which is then imported into QGIS as a point layer. 
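
As a minimal sketch of this merge step, the following uses two small hand-made data frames. The values are copied from the example table above; the frame names `ts`, `stations` and `long_df` are illustrative only and are not used elsewhere in this notebook.

```python
import pandas as pd

# Time series in long form: one row per station and time step.
ts = pd.DataFrame({
    "station_id": [1595, 1595, 13670, 13670],
    "meas_time": pd.to_datetime(["2018-12-07 08:00", "2018-12-07 09:00",
                                 "2018-12-07 08:00", "2018-12-07 09:00"]),
    "prec_rate": [1.5, 1.7, 0.8, 0.4],
})

# Station metadata: one row per station.
stations = pd.DataFrame({
    "station_id": [1595, 13670],
    "name": ["Gelsenkirchen-Buer", "Duisburg-Baerl"],
    "lat": [51.5762, 51.5088],
    "lon": [7.0652, 6.7018],
})

# 1:n join: every time step inherits the name and coordinates of its station.
long_df = ts.merge(stations, on="station_id", how="inner")
long_df = long_df[["station_id", "name", "lat", "lon", "meas_time", "prec_rate"]]
print(long_df)
```

Each station row is repeated once per time step, which is exactly the redundancy discussed in the next paragraph.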

This final data format is far from optimal because of its large size and highly redundant information. This is a challenge for QGIS, which loses responsiveness with large data sets. To just demonstrate the principle, it is advisable to limit the size of the problem. 

The following filters (selection criteria) are applied:

  * Precipitation stations in NRW only (approx. 127 stations) 
  * Hourly precipitation data
  * Time interval from 2018-12-01 to last date in precipitation data set 
  
Still: 40 days * 24 hrs / day * 127 stations = 121920 records leading to 121920 features in a point layer in QGIS. 

In fact, the resulting number of records is around 91000. One possible reason is that not all stations in the station list have time series. This has to be checked carefully.
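
The estimate can be reproduced with a few lines of arithmetic. The sketch below also expresses the observed record count as "station equivalents", assuming that every reporting station delivers a complete 40-day hourly series; that assumption is not verified here.

```python
days = 40
hours_per_day = 24
n_stations = 127

# Upper bound: every station reports every hour.
expected_records = days * hours_per_day * n_stations
print(expected_records)  # 121920

observed_records = 91000  # approximate count reported in the text
records_per_station = days * hours_per_day  # 960 records per complete station

# How many complete station series would produce the observed count?
print(observed_records / records_per_station)  # roughly 94.8
print(observed_records / expected_records)     # roughly 0.75
```

Under this assumption, about a quarter of the listed stations would be missing, which is consistent with the suspicion that some stations have no time series.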

## FTP Connection

### Connection Parameters

In [1]:
server = "opendata.dwd.de"
user   = "anonymous"
passwd = ""

### FTP Directory Definition and Station Description Filename Pattern

In [2]:
# The topic of interest.
#topic_dir = "/hourly/precipitation/recent/"
#topic_dir = "/daily/more_precip/historical/"
topic_dir = "/daily/soil_temperature/historical/"

# This is the search pattern common to ALL station description file names 
station_desc_pattern = "_Beschreibung_Stationen.txt"

# Below this directory tree node all climate data are stored.
#ftp_climate_data_dir = "/climate_environment/CDC/observations_germany/climate/"
# temperature
ftp_climate_data_dir = "/climate_environment/CDC/observations_germany/climate"

ftp_dir =  ftp_climate_data_dir + topic_dir
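
Plain string concatenation only works here because exactly one of the two parts carries the separating slash. A small helper can make the join independent of that detail. This is a sketch; the function name `join_ftp_path` is hypothetical and not part of `ftplib`.

```python
import posixpath

def join_ftp_path(base, *parts):
    """Join FTP path components with exactly one '/' between them."""
    # Strip slashes from the parts, otherwise posixpath.join() would
    # discard everything before a component that starts with '/'.
    cleaned = [p.strip("/") for p in parts]
    path = posixpath.join(base.rstrip("/"), *cleaned)
    # Keep a trailing slash so that file names can be appended directly.
    return path + "/"

base = "/climate_environment/CDC/observations_germany/climate"
topic = "/daily/soil_temperature/historical/"
print(join_ftp_path(base, topic))
```

The same result is obtained whether or not `base` ends with a slash.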

### Local Directories

In [3]:
local_ftp_dir         = "data/original/DWD/"      # Local directory to store local ftp data copies, the local data source or input data. 
local_ftp_station_dir = local_ftp_dir + topic_dir # Local directory where local station info is located
local_ftp_ts_dir      = local_ftp_dir + topic_dir # Local directory where time series downloaded from ftp are located

local_generated_dir   = "data/generated/DWD/" # The generated or derived data, in contrast to local_ftp_dir
local_station_dir     = local_generated_dir + topic_dir # Derived station data, i.e. the CSV file
local_ts_merged_dir   = local_generated_dir + topic_dir # Parallel merged time series, wide data frame with one TS per column
local_ts_appended_dir = local_generated_dir + topic_dir # Serially appended time series, long data frame for QGIS TimeManager Plugin

In [4]:
print(local_ftp_dir)
print(local_ftp_station_dir)
print(local_ftp_ts_dir)
print()
print(local_generated_dir)
print(local_station_dir)
print(local_ts_merged_dir)
print(local_ts_appended_dir)

data/original/DWD/
data/original/DWD//daily/soil_temperature/historical/
data/original/DWD//daily/soil_temperature/historical/

data/generated/DWD/
data/generated/DWD//daily/soil_temperature/historical/
data/generated/DWD//daily/soil_temperature/historical/
data/generated/DWD//daily/soil_temperature/historical/


In [6]:
import os
os.makedirs(local_ftp_dir,exist_ok = True) # it does not complain if the dir already exists.
os.makedirs(local_ftp_station_dir,exist_ok = True)
os.makedirs(local_ftp_ts_dir,exist_ok = True)

os.makedirs(local_generated_dir,exist_ok = True)
os.makedirs(local_station_dir,exist_ok = True)
os.makedirs(local_ts_merged_dir,exist_ok = True)
os.makedirs(local_ts_appended_dir,exist_ok = True)

### FTP Connect

In [7]:
import ftplib
ftp = ftplib.FTP(server)
res = ftp.login(user=user, passwd = passwd)
print(res)

230 Login successful.


In [8]:
ret = ftp.cwd(".")

In [8]:
#ftp.quit()

### FTP Grab File Function

In [9]:
def grabFile(ftpfullname,localfullname):
    try:
        ret = ftp.cwd(".") # A dummy action to check the connection and to provoke an exception if necessary.
        # The with statement closes the local file even if the transfer fails.
        with open(localfullname, 'wb') as localfile:
            ftp.retrbinary('RETR ' + ftpfullname, localfile.write, 1024)
    
    except ftplib.error_perm:
        print("FTP ERROR. Operation not permitted. File not found?")

    except ftplib.error_temp:
        print("FTP ERROR. Timeout.")

    except ConnectionAbortedError:
        print("FTP ERROR. Connection aborted.")
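
A failed transfer can leave an empty or truncated local file behind. The following is a hedged sketch of a variant that downloads to a temporary name first and reports success as a boolean. The function name `grab_file_safe` and the `.part` suffix are assumptions made for illustration; the function takes the connection as an argument instead of using the global `ftp`.

```python
import ftplib
import os

def grab_file_safe(ftp, ftpfullname, localfullname):
    """Download one file; return True on success, False otherwise.

    `ftp` is any object offering retrbinary(), e.g. an ftplib.FTP instance.
    """
    tmpname = localfullname + ".part"  # hypothetical temporary-file suffix
    try:
        with open(tmpname, "wb") as localfile:
            ftp.retrbinary("RETR " + ftpfullname, localfile.write, 1024)
    except ftplib.all_errors as err:
        print("FTP ERROR: %s" % err)
        # Remove the incomplete download so it is not mistaken for valid data.
        if os.path.exists(tmpname):
            os.remove(tmpname)
        return False
    # Only a complete download gets the final file name.
    os.replace(tmpname, localfullname)
    return True
```

With the return value, a caller can decide whether a file name should be added to the list of downloaded archives.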



### Generate Pandas Dataframe from FTP Directory Listing

In [11]:
import pandas as pd
import os

def gen_df_from_ftp_dir_listing(ftp, ftpdir):
    lines = []
    flist = []
    try:    
        res = ftp.retrlines("LIST "+ftpdir, lines.append)
    except ftplib.all_errors:
        # Catch FTP and network errors only, not e.g. a KeyboardInterrupt.
        print("Error: ftp.retrlines() failed. ftp timeout? Reconnect!")
        return
        
    if len(lines) == 0:
        print("Error: ftp dir is empty")
        return
    
    for line in lines:
#        print(line)
        [ftype, fsize, fname] = [line[0:1], int(line[31:42]), line[56:]]

        
        fext = os.path.splitext(fname)[-1]
        
        if fext == ".zip":
            station_id = int(fname.split("_")[2])
        else:
            station_id = -1 
        
        flist.append([station_id, fname, fext, fsize, ftype])
        
        

    df_ftpdir = pd.DataFrame(flist,columns=["station_id", "name", "ext", "size", "type"])
    return(df_ftpdir)
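
The function above cuts each listing line at fixed character positions, which depends on the exact column layout the server returns. As an alternative sketch, a line can be split on whitespace. This assumes a UNIX-style `LIST` format with nine fields; the helper name `parse_list_line` and the sample line are hypothetical, with the file name modelled on the ones shown in this notebook.

```python
import os

def parse_list_line(line):
    """Split one UNIX-style LIST line into (type, size, name, ext, station_id)."""
    # Fields: permissions, links, owner, group, size, month, day, time/year, name.
    # maxsplit=8 keeps a file name containing blanks in one piece.
    fields = line.split(None, 8)
    if len(fields) < 9:
        raise ValueError("Unexpected LIST line: %r" % line)
    ftype = fields[0][0]
    fsize = int(fields[4])
    fname = fields[8]
    fext = os.path.splitext(fname)[-1]
    # Only the zip archives carry a station id in their name.
    station_id = int(fname.split("_")[2]) if fext == ".zip" else -1
    return ftype, fsize, fname, fext, station_id

# Hypothetical listing line.
sample = ("-rw-r--r--    1 ftp      ftp        218207 Apr 01  2011 "
          "tageswerte_EB_00003_19510101_20110331_hist.zip")
print(parse_list_line(sample))
```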

In [12]:
df_ftpdir = gen_df_from_ftp_dir_listing(ftp, ftp_dir)

In [13]:
df_ftpdir.head(10)

|   | station_id | name | ext | size | type |
|--:|-----------:|:-----|:----|-----:|:----:|
| 0 | -1 | BESCHREIBUNG_obsgermany_climate_daily_soil_tem... | .pdf | 69925 | - |
| 1 | -1 | DESCRIPTION_obsgermany_climate_daily_soil_temp... | .pdf | 69373 | - |
| 2 | -1 | EB_Tageswerte_Beschreibung_Stationen.txt | .txt | 99393 | - |
| 3 | 3 | tageswerte_EB_00003_19510101_20110331_hist.zip | .zip | 218207 | - |
| 4 | 44 | tageswerte_EB_00044_19810101_20181231_hist.zip | .zip | 136119 | - |
| 5 | 52 | tageswerte_EB_00052_19760101_20011231_hist.zip | .zip | 98089 | - |
| 6 | 71 | tageswerte_EB_00071_19880701_20031231_hist.zip | .zip | 58069 | - |
| 7 | 72 | tageswerte_EB_00072_19870101_19950531_hist.zip | .zip | 35119 | - |
| 8 | 78 | tageswerte_EB_00078_19810101_20181231_hist.zip | .zip | 133354 | - |
| 9 | 91 | tageswerte_EB_00091_19920501_20181231_hist.zip | .zip | 84507 | - |


### Dataframe with TS Zip Files

In [14]:
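#df_ftpdir["ext"]==".zip"
# Keep only the zip archives and index them by station id.
# set_index() without inplace avoids modifying a slice of df_ftpdir.
df_zips = df_ftpdir[df_ftpdir["ext"]==".zip"].set_index("station_id")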
df_zips.head(10)

| station_id | name | ext | size | type |
|-----------:|:-----|:----|-----:|:----:|
| 3 | tageswerte_EB_00003_19510101_20110331_hist.zip | .zip | 218207 | - |
| 44 | tageswerte_EB_00044_19810101_20181231_hist.zip | .zip | 136119 | - |
| 52 | tageswerte_EB_00052_19760101_20011231_hist.zip | .zip | 98089 | - |
| 71 | tageswerte_EB_00071_19880701_20031231_hist.zip | .zip | 58069 | - |
| 72 | tageswerte_EB_00072_19870101_19950531_hist.zip | .zip | 35119 | - |
| 78 | tageswerte_EB_00078_19810101_20181231_hist.zip | .zip | 133354 | - |
| 91 | tageswerte_EB_00091_19920501_20181231_hist.zip | .zip | 84507 | - |
| 125 | tageswerte_EB_00125_20010403_20181231_hist.zip | .zip | 23630 | - |
| 129 | tageswerte_EB_00129_19960701_20061231_hist.zip | .zip | 38445 | - |
| 131 | tageswerte_EB_00131_20040914_20181231_hist.zip | .zip | 50199 | - |


### Download the Station Description File

In [15]:
station_fname = df_ftpdir[df_ftpdir['name'].str.contains(station_desc_pattern)]["name"].values[0]
print(station_fname)

# Alternative
#station_fname2 = df_ftpdir[df_ftpdir["name"].str.match("^.*Beschreibung_Stationen.*txt$")]["name"].values[0]
#print(station_fname2)

EB_Tageswerte_Beschreibung_Stationen.txt


In [16]:
print("grabFile: ")
print("From: " + ftp_dir + station_fname)
print("To:   " + local_ftp_station_dir + station_fname)
grabFile(ftp_dir + station_fname, local_ftp_station_dir + station_fname)

grabFile: 
From: /climate_environment/CDC/observations_germany/climate/daily/soil_temperature/historical/EB_Tageswerte_Beschreibung_Stationen.txt
To:   data/original/DWD//daily/soil_temperature/historical/EB_Tageswerte_Beschreibung_Stationen.txt


In [17]:
# Extract the column names. They are in German (de).
# We use codecs because of difficulties with character encoding (German Umlaute)
import codecs

def station_desc_txt_to_csv(txtfile, csvfile):
    # Read the header line only; it holds the German column names.
    with codecs.open(txtfile,"r","utf-8") as file:
        r = file.readline()
    colnames_de = r.split()
    
    translate = \
    {'Stations_id':'station_id',
     'von_datum':'date_from',
     'bis_datum':'date_to',
     'Stationshoehe':'altitude',
     'geoBreite': 'latitude',
     'geoLaenge': 'longitude',
     'Stationsname':'name',
     'Bundesland':'state'}
    
    colnames_en = [translate[h] for h in colnames_de]
    
    # Skip the first two rows and set the column names.
    df = pd.read_fwf(txtfile,skiprows=2,names=colnames_en, parse_dates=["date_from","date_to"],index_col = 0)
    
    # write csv
    df.to_csv(csvfile, sep = ";")
    return(df)
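
The list comprehension `[translate[h] for h in colnames_de]` raises a `KeyError` as soon as the header contains a column that is missing from the dictionary. A more tolerant lookup could look like the sketch below. The helper name `translate_columns` and the extra header entry `Extra` are hypothetical and only serve to show the fallback.

```python
translate = {
    'Stations_id': 'station_id',
    'von_datum': 'date_from',
    'bis_datum': 'date_to',
    'Stationshoehe': 'altitude',
    'geoBreite': 'latitude',
    'geoLaenge': 'longitude',
    'Stationsname': 'name',
    'Bundesland': 'state',
}

def translate_columns(colnames_de, mapping):
    """Translate known column names; keep unknown ones in lower case."""
    return [mapping.get(h, h.lower()) for h in colnames_de]

# 'Extra' is a made-up column that is not part of the dictionary.
header = ("Stations_id von_datum bis_datum Stationshoehe geoBreite "
          "geoLaenge Stationsname Bundesland Extra")
print(translate_columns(header.split(), translate))
```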

In [18]:
basename = os.path.splitext(station_fname)[0]
df_stations = station_desc_txt_to_csv(local_ftp_station_dir + station_fname, local_station_dir + basename + ".csv")
df_stations.head()

| station_id | date_from | date_to | altitude | latitude | longitude | name | state |
|-----------:|:---------:|:-------:|---------:|---------:|----------:|:-----|:------|
| 3 | 1951-01-01 | 2011-03-31 | 202 | 50.7827 | 6.0941 | Aachen | Nordrhein-Westfalen |
| 44 | 1981-01-01 | 2020-02-28 | 44 | 52.9336 | 8.237 | Großenkneten | Niedersachsen |
| 52 | 1976-01-01 | 2001-12-31 | 46 | 53.6623 | 10.199 | Ahrensburg-Wulfsdorf | Schleswig-Holstein |
| 71 | 1988-07-01 | 2003-12-31 | 759 | 48.2156 | 8.9784 | Albstadt-Badkap | Baden-Württemberg |
| 72 | 1987-01-01 | 1995-05-31 | 794 | 48.2766 | 9.0001 | Albstadt-Onstmettingen | Baden-Württemberg |


### Select Stations Located in NRW from Station Description Dataframe

In [19]:
import datetime
d1 = datetime.datetime(2018, 12, 31) 
#df_stations['date_to']==d1
station_ids_selected = df_stations[df_stations['state'].str.contains("Nordrhein") & (df_stations['date_to']>=d1)].index
station_ids_selected

Int64Index([  603,   617,  1078,  1246,  1303,  1590,  1766,  2483,  2497,
             2667,  2947,  3028,  3031,  3098,  3540,  3623,  4063,  4127,
             4371,  5347,  5480,  6197,  7106,  7330,  7374, 15000],
           dtype='int64', name='station_id')
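
The selection combines two boolean masks with `&`. The following self-contained sketch repeats the idea on three stations whose values are copied from the outputs in this notebook; the variable names `is_nrw` and `is_recent` are illustrative.

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    {
        "date_to": pd.to_datetime(["2011-03-31", "2020-02-23", "2020-02-28"]),
        "name": ["Aachen", "Königswinter-Heiderhof", "Großenkneten"],
        "state": ["Nordrhein-Westfalen", "Nordrhein-Westfalen", "Niedersachsen"],
    },
    index=pd.Index([3, 603, 44], name="station_id"),
)

# One mask per criterion; both must hold for a station to be selected.
is_nrw = df["state"] == "Nordrhein-Westfalen"
is_recent = df["date_to"] >= datetime.datetime(2018, 12, 31)

selected = df[is_nrw & is_recent].index
print(list(selected))
```

Aachen is dropped because its record ends in 2011, Großenkneten because it is located in Niedersachsen.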

In [101]:
# Create variable with TRUE if state is Nordrhein-Westfalen
#isNRW = dfStations['state'] == "Nordrhein-Westfalen"

# Create variable with TRUE if date_to is latest date (indicates operation up to now)
#isOperational = dfStations['date_to'] == dfStations.date_to.max() 

# select on both conditions
#dfNRW = dfStations[isNRW & isOperational]
#print("Number of stations in NRW: \n", dfNRW.count())
#dfNRW.head()

In [20]:
df_zips.head()

| station_id | name | ext | size | type |
|-----------:|:-----|:----|-----:|:----:|
| 3 | tageswerte_EB_00003_19510101_20110331_hist.zip | .zip | 218207 | - |
| 44 | tageswerte_EB_00044_19810101_20181231_hist.zip | .zip | 136119 | - |
| 52 | tageswerte_EB_00052_19760101_20011231_hist.zip | .zip | 98089 | - |
| 71 | tageswerte_EB_00071_19880701_20031231_hist.zip | .zip | 58069 | - |
| 72 | tageswerte_EB_00072_19870101_19950531_hist.zip | .zip | 35119 | - |


### Download TS Data from FTP Server

Problem: Not all stations listed in the station description file are associated with a time series (zip file)! The set of stations in the description file and the set of stations for which TS data (zip files) are provided do not match perfectly.  
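
The mismatch can be made explicit with set operations on the two station id indexes before any download is attempted. The sketch below uses short illustrative id lists; `99999` is a made-up id standing for a station without a zip file, and the names `selected_ids` and `zip_ids` are hypothetical.

```python
import pandas as pd

# Stations selected from the description file (illustrative subset).
selected_ids = pd.Index([603, 617, 1078, 99999], name="station_id")
# Stations for which a zip file exists in the FTP listing (illustrative subset).
zip_ids = pd.Index([3, 44, 603, 617, 1078], name="station_id")

with_ts = selected_ids.intersection(zip_ids)   # download candidates
without_ts = selected_ids.difference(zip_ids)  # described, but no data file

print("with time series:   ", sorted(with_ts))
print("without time series:", sorted(without_ts))
```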

In [21]:
# Download the zip files of the selected stations and collect their names in a list.
local_zip_list = []

for station_id in station_ids_selected:
    try:
        fname = df_zips.loc[station_id, "name"]
    except KeyError:
        # The station is listed in the description file but has no zip file.
        print("WARNING: TS file for key %d not found in FTP directory." % station_id)
        continue
    print(fname)
    grabFile(ftp_dir + fname, local_ftp_ts_dir + fname)
    local_zip_list.append(fname)

tageswerte_EB_00603_20000701_20181231_hist.zip
tageswerte_EB_00617_20030122_20181231_hist.zip
tageswerte_EB_01078_19860101_20181231_hist.zip
tageswerte_EB_01246_20150801_20181231_hist.zip
tageswerte_EB_01303_19510101_20181231_hist.zip
tageswerte_EB_01590_19810101_20181231_hist.zip
tageswerte_EB_01766_19891001_20181231_hist.zip
tageswerte_EB_02483_19810101_20181231_hist.zip
tageswerte_EB_02497_20030604_20181231_hist.zip
tageswerte_EB_02667_19610101_20181231_hist.zip
tageswerte_EB_02947_20061001_20181231_hist.zip
tageswerte_EB_03028_19810101_20181231_hist.zip
tageswerte_EB_03031_19801201_20181231_hist.zip
tageswerte_EB_03098_19931222_20181231_hist.zip
tageswerte_EB_03540_20040818_20181231_hist.zip
tageswerte_EB_03623_20010706_20181231_hist.zip
tageswerte_EB_04063_20021022_20181231_hist.zip
tageswerte_EB_04127_20041213_20181231_hist.zip
tageswerte_EB_04371_19810101_20181231_hist.zip
tageswerte_EB_05347_19880201_20181231_hist.zip
tageswerte_EB_05480_20030910_20181231_hist.zip
tageswerte_EB

### Join (Merge) the Time Series Columns

https://medium.com/@chaimgluck1/working-with-pandas-fixing-messy-column-names-42a54a6659cd


In [22]:
def prec_ts_to_df(fname):
    
    # Read the time stamps as strings first and convert them explicitly below.
    # (pd.datetime and the date_parser argument are no longer available in recent pandas versions.)
    df = pd.read_csv(fname, delimiter=";", encoding="utf8", dtype={"MESS_DATUM": str}, na_values = [-999.0, -999])

    # Daily values use the format YYYYMMDD; hourly values would need '%Y%m%d%H'.
    df["MESS_DATUM"] = pd.to_datetime(df["MESS_DATUM"].str.strip(), format="%Y%m%d")
    df = df.set_index("MESS_DATUM")
    
    # https://medium.com/@chaimgluck1/working-with-pandas-fixing-messy-column-names-42a54a6659cd

    # Column headers: remove leading blanks (strip), convert to lower case, replace " " with "_" and drop parentheses.
    # regex=False treats "(" and ")" as literal characters.
    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False)
    df.index.name = df.index.name.strip().lower().replace(' ', '_').replace('(', '').replace(')', '')
    return(df)

In [23]:
from zipfile import ZipFile

In [24]:
def ts_merge():
   
    df = pd.DataFrame()
    for elt in local_zip_list:
        ffname = local_ftp_ts_dir + elt
        print("Zip archive: " + ffname)
        with ZipFile(ffname) as myzip:
            
            # The product file is the archive member whose name starts with "produkt".
            prodfilename = [n for n in myzip.namelist() if n.split("_")[0]=="produkt"][0] 
            print("Extract product file: %s" % prodfilename)
            print()
            with myzip.open(prodfilename) as myfile:
                dftmp = prec_ts_to_df(myfile)
              
                # Note: this selects the station id column, not a measurement column.
                s = dftmp["stations_id"].to_frame()
                # Outer merge on the time index.
                df = pd.merge(df, s, left_index=True, right_index=True, how='outer')

    #df.index.names = ["year"]
    df.index.rename(name = "time", inplace = True)
    return(df)

In [25]:
df_merged_ts = ts_merge()

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00603_20000701_20181231_hist.zip
Extract product file: produkt_erdbo_tag_20000701_20181231_00603.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00617_20030122_20181231_hist.zip
Extract product file: produkt_erdbo_tag_20030122_20181231_00617.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01078_19860101_20181231_hist.zip
Extract product file: produkt_erdbo_tag_19860101_20181231_01078.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01246_20150801_20181231_hist.zip
Extract product file: produkt_erdbo_tag_20150801_20181231_01246.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01303_19510101_20181231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20181231_01303.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01590_1981

In [26]:
df_merged_ts.head()

| time | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | ... | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y | stations_id_x | stations_id_y |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|:-:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 1951-01-01 | NaN | NaN | NaN | NaN | 1303.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1951-01-02 | NaN | NaN | NaN | NaN | 1303.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1951-01-03 | NaN | NaN | NaN | NaN | 1303.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1951-01-04 | NaN | NaN | NaN | NaN | 1303.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1951-01-05 | NaN | NaN | NaN | NaN | 1303.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
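
The column names in the output above (`stations_id_x`, `stations_id_y`) show that the merged frame holds the station id itself rather than a measured variable, and that the columns cannot be told apart. A wide frame with one time series per station could be built as sketched below. This assumes `v_te020m` is the variable of interest; the helper `to_station_column` is hypothetical, and the values for station 617 are made up for the example.

```python
import pandas as pd

def to_station_column(dftmp, value_col):
    """Return one measurement column as a Series named after its station id."""
    station_id = int(dftmp["stations_id"].iloc[0])
    return dftmp[value_col].rename(station_id)

# Two tiny stand-ins for the per-station data frames returned by prec_ts_to_df().
a = pd.DataFrame({"stations_id": [603, 603], "v_te020m": [18.8, 20.4]},
                 index=pd.to_datetime(["2015-06-10", "2015-06-11"]))
b = pd.DataFrame({"stations_id": [617, 617], "v_te020m": [17.9, 19.1]},  # made-up values
                 index=pd.to_datetime(["2015-06-11", "2015-06-12"]))

# Outer join on the time index: one column per station, NaN where a station has no value.
wide = pd.concat([to_station_column(d, "v_te020m") for d in (a, b)],
                 axis=1, join="outer")
wide.index.name = "time"
print(wide)
```

Naming each column after its station id keeps the columns unique, so no `_x`/`_y` suffixes are needed.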


In [27]:
df_merged_ts.to_csv(local_ts_merged_dir + "ts_merged.csv",sep=";")

In [29]:
def ts_append():
    
    frames = []
    for elt in local_zip_list:
        ffname = local_ftp_ts_dir + elt
        print("Zip archive: " + ffname)
        with ZipFile(ffname) as myzip:
            prodfilename = [n for n in myzip.namelist() if n.split("_")[0]=="produkt"][0] 
            print("Extract product file: %s" % prodfilename)
            print()
            with myzip.open(prodfilename) as myfile:
                dftmp = prec_ts_to_df(myfile)
                # Restrict the time series to the period of interest.
                dftmp=dftmp.loc[(dftmp.index>='2015-06-10 00:00:00')&(dftmp.index<='2018-05-11 00:00:00')]
                # Attach the station metadata; df_stations is indexed by station_id,
                # so right_index=True is sufficient (right_on must not be given as well).
                dftmp = dftmp.merge(df_stations,how="inner",left_on="stations_id",right_index=True)
                frames.append(dftmp)
        
    # Concatenate once at the end; DataFrame.append() was removed in pandas 2.0.
    df = pd.concat(frames) if frames else pd.DataFrame()
    return(df)

In [30]:
df_appended_ts = ts_append()

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00603_20000701_20181231_hist.zip
Extract product file: produkt_erdbo_tag_20000701_20181231_00603.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_00617_20030122_20181231_hist.zip
Extract product file: produkt_erdbo_tag_20030122_20181231_00617.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01078_19860101_20181231_hist.zip
Extract product file: produkt_erdbo_tag_19860101_20181231_01078.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01246_20150801_20181231_hist.zip
Extract product file: produkt_erdbo_tag_20150801_20181231_01246.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01303_19510101_20181231_hist.zip
Extract product file: produkt_erdbo_tag_19510101_20181231_01303.txt

Zip archive: data/original/DWD//daily/soil_temperature/historical/tageswerte_EB_01590_1981

In [31]:
df_appended_ts.head()

| mess_datum | stations_id | qn_2 | v_te002m | v_te005m | v_te010m | v_te020m | v_te050m | eor | date_from | date_to | altitude | latitude | longitude | name | state |
|:--|--:|--:|--:|--:|--:|--:|--:|:-:|:-:|:-:|--:|--:|--:|:--|:--|
| 2015-06-10 | 603 | 3 | -999.0 | 20.7 | 20.0 | 18.8 | 17.1 | eor | 2000-07-01 | 2020-02-23 | 144 | 50.729 | 7.2047 | Königswinter-Heiderhof | Nordrhein-Westfalen |
| 2015-06-11 | 603 | 3 | -999.0 | 22.7 | 21.9 | 20.4 | 17.6 | eor | 2000-07-01 | 2020-02-23 | 144 | 50.729 | 7.2047 | Königswinter-Heiderhof | Nordrhein-Westfalen |
| 2015-06-12 | 603 | 3 | -999.0 | 23.3 | 22.6 | 21.3 | 18.4 | eor | 2000-07-01 | 2020-02-23 | 144 | 50.729 | 7.2047 | Königswinter-Heiderhof | Nordrhein-Westfalen |
| 2015-06-13 | 603 | 3 | -999.0 | 20.6 | 20.7 | 20.5 | 18.8 | eor | 2000-07-01 | 2020-02-23 | 144 | 50.729 | 7.2047 | Königswinter-Heiderhof | Nordrhein-Westfalen |
| 2015-06-14 | 603 | 3 | -999.0 | 20.7 | 20.4 | 19.8 | 18.4 | eor | 2000-07-01 | 2020-02-23 | 144 | 50.729 | 7.2047 | Königswinter-Heiderhof | Nordrhein-Westfalen |
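
The output above still shows `-999.0` in column `v_te002m`, so the missing-value marker was not converted to NaN while reading the file. Why this happens is not verified here; one possible cause is that the values are padded with blanks in the file. Any mean computed on such a column would be distorted. A hedged sketch of masking the marker after loading follows; the third `v_te002m` value is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"v_te002m": [-999.0, -999.0, 20.1],   # 20.1 is a made-up valid value
                   "v_te020m": [18.8, 20.4, 21.3]})

# Replace the missing-value marker by NaN in all columns.
cleaned = df.replace(-999.0, np.nan)

print(df["v_te002m"].mean())       # dominated by the -999.0 marker
print(cleaned["v_te002m"].mean())  # mean of the valid values only
```

pandas skips NaN values when computing a mean, so only the valid measurements contribute after the replacement.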


In [33]:
df_appended_ts.to_csv(local_ts_appended_dir + "ts_appended.csv",sep=";")
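
The target format shown at the top of this notebook uses time stamps such as `2018-12-07T08:00:00UTC`. A sketch of writing the index in that form is given below. The literal `UTC` suffix is copied from the example table; whether the DWD time stamps really are in UTC is an assumption that should be checked against the data set description. The column name `meas_time` follows the example table.

```python
import pandas as pd

df = pd.DataFrame(
    {"stations_id": [603, 603], "v_te020m": [18.8, 20.4]},
    index=pd.DatetimeIndex(["2015-06-10", "2015-06-11"], name="mess_datum"),
)

# Turn the time index into a regular column and format it as text.
out = df.reset_index()
out["meas_time"] = out["mess_datum"].dt.strftime("%Y-%m-%dT%H:%M:%SUTC")
print(out[["stations_id", "meas_time", "v_te020m"]])
```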

In [39]:
# Averaging windows as (label, start, end). Both bounds are inclusive;
# None means "no lower bound". The dates are kept as in the original analysis.
windows = [
    (2016, None, datetime.datetime(2016, 6, 10)),
    (2017, datetime.datetime(2016, 5, 26), datetime.datetime(2017, 5, 26)),
    (2018, datetime.datetime(2017, 5, 11), datetime.datetime(2018, 5, 11)),
]

records = []

stations_ids=df_appended_ts['stations_id'].unique()
for st in stations_ids:
    print(st)

    # All rows of the current station.
    df_st = df_appended_ts[df_appended_ts.stations_id==st]
    latitude=""
    longitude=""

    for year, start, end in windows:
        rows = df_st[df_st.index <= end]
        if start is not None:
            rows = rows[rows.index >= start]
        # Take the coordinates from the first window that contains data.
        if (len(rows)>0 and (latitude=="")):
            latitude=rows.iloc[0].latitude
            longitude=rows.iloc[0].longitude
        # Mean soil temperature (column v_te020m) within the window.
        records.append([st, year, rows['v_te020m'].mean(), latitude, longitude])


temperature_data_frame = pd.DataFrame(records, columns = ['stationids', 'year', 'at','lat','long'])
temperature_data_frame.to_csv(local_ts_appended_dir + "temperature.csv",sep=";")

603
617
1078
1246
1303
1590
1766
2483
2497
2667
2947
3028
3031
3098
3540
3623
4063
4127
4371
5347
5480
6197
7106
7330
7374
15000


In [None]:
! jupyter nbconvert --to html your_notebook_name.ipynb