 # Downloading the data

First, a data directory is created on your local computer. Then, the download parameters for each data source, including the URL, are defined as YAML strings and parsed into a Python dictionary. Finally, the files are downloaded one by one. Downloading all of the data usually takes several hours.

# Table of Contents
* [1. Settings](#1.-Settings)
	* [1.1 Libraries](#1.1-Libraries)
	* [1.2 Creating a data directory](#1.2-Creating-a-data-directory)
* [2. Parameters for individual data sources](#2.-Parameters-for-individual-data-sources)
	* [2.1 ENTSO-E](#2.1-ENTSO-E)
	* [2.2 '50Hertz](#2.2-'50Hertz)
	* [2.3 Amprion](#2.3-Amprion)
	* [2.4 TransnetBW](#2.4-TransnetBW)
	* [2.5 TenneT](#2.5-TenneT)
	* [2.6 Creating YAML strings](#2.6-Creating-YAML-strings)
* [3. Downloading files one by one](#3.-Downloading-files-one-by-one)


# 1. Settings

## 1.1 Libraries

Load the required Python libraries.

In [1]:
from datetime import datetime, date
import yaml
import requests
import os
import pandas as pd
import logging

Set up a log.

In [2]:
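# Attach a default handler so that INFO messages are actually printed.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('log')
logger.setLevel('INFO')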

## 1.2 Creating a data directory

This section creates a folder for the downloaded data inside the notebook's directory on the user's local computer. Its name is given by the variable `downloadpath`. In addition, the path of an archive subfolder labelled with the current date is defined in `archivepath`.

In [3]:
downloadpath = 'downloads2'
os.makedirs(downloadpath, exist_ok=True)
archivepath = os.path.join(
    downloadpath, 'archive-' + datetime.now().strftime('%Y-%m-%d')
    )

Do you want to save a copy of each downloaded file under its original filename? If so, set ARCHIVE = True; if not, set ARCHIVE = False. The latter is considerably faster if some files have already been downloaded during a previous run, whereas the former downloads every file again, because otherwise the filenames cannot be compared.

# 2. Parameters for individual data sources

This section contains a YAML string for each download source with the input parameters needed to generate the URLs for the data.
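To illustrate how the placeholders in `url_params_template` are resolved, the following sketch fills three of the templates used in this section with Python's `str.format`. The dates are arbitrary example values; `u_start` and `u_end` are the keyword names that the download function in section 3 passes to `format`.

```python
from datetime import date

# Example period (arbitrary values chosen for illustration).
u_start = date(2012, 1, 4)
u_end = date(2012, 12, 31)

# 50Hertz: a strftime-style format spec inside the placeholder.
hertz_file = '{u_start:%Y}.csv'.format(u_start=u_start, u_end=u_end)

# TenneT: year and zero-padded month.
tennet_query = 'monat={u_start:%Y-%m}'.format(u_start=u_start, u_end=u_end)

# Amprion: attribute access yields integers, so dates are not zero-padded.
amprion_start = '{u_start.day}.{u_start.month}.{u_start.year}'.format(
    u_start=u_start, u_end=u_end)

print(hertz_file)     # 2012.csv
print(tennet_query)   # monat=2012-01
print(amprion_start)  # 4.1.2012
```

Keyword arguments that a template does not use are simply ignored, which is why the same `format` call can serve every source.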

## 2.1 ENTSO-E

In [4]:
entso = """
ENTSO-E: 
    Data_Portal: 
        url_template: https://www.entsoe.eu/fileadmin/template/other/statistical_database/excel.php
        url_params_template:
            pid: '136'
            opt_period: '0'
            send: send
            opt_Response: '99'
            dataindx: '0'
            opt_Month: '{u_start.month}'
            opt_Year: '{u_start.year}'
        frequency: M
        start: 2006-01-01
        end: recent
        filetype: xls
"""

## 2.2 '50Hertz

In [5]:
hertz = """
50Hertz: 
    wind: 
        url_template: http://ws.50hertz.com/web01/api/WindPowerForecast/DownloadFile
        url_params_template:
            callback: '?'
            fileName: '{u_start:%Y}.csv'
        frequency: A
        start: 2005-01-01
        end: recent
        filetype: csv
    pv: 
        url_template: http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile
        url_params_template:
            callback: '?'
            fileName: '{u_start:%Y}.csv'
        frequency: A
        start: 2012-01-01
        end: recent
        filetype: csv
"""

## 2.3 Amprion

In [6]:
amprion = """
Amprion:
    wind: 
        url_template: http://amprion.de/applications/applicationfiles/winddaten2.php
        url_params_template:
            mode: download
            format: csv
            start: '{u_start.day}.{u_start.month}.{u_start.year}'
            end: '{u_end.day}.{u_end.month}.{u_end.year}' # dates must not be zero-padded
        frequency: complete
        start: 2008-01-04
        end: recent
        filetype: csv
    pv: 
        url_template: http://amprion.de/applications/applicationfiles/PV_einspeisung.php
        url_params_template:
            mode: download
            format: csv
            start: '{u_start.day}.{u_start.month}.{u_start.year}'
            end: '{u_end.day}.{u_end.month}.{u_end.year}' # dates must not be zero-padded        
        frequency: complete
        start: 2010-01-07
        end: recent
        filetype: csv
"""

## 2.4 TransnetBW
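The TransnetBW templates in this section use a `{u_transnetbw}` placeholder instead of a date. The download function in section 3 fills it with the number of months between the requested month and the current month. A minimal sketch of that calculation follows; the helper name `months_back` is hypothetical, and the reference date is passed in explicitly so that the result is reproducible.

```python
from datetime import date

def months_back(start, today):
    """Number of calendar months from the month of `start` to that of `today`."""
    return (today.year - start.year) * 12 + (today.month - start.month)

print(months_back(date(2015, 11, 1), date(2016, 2, 15)))  # 3
```

Because the value depends on the current date, the same configuration produces different request parameters when the notebook is run in a different month.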

In [7]:
transnetbw = """
TransnetBW: 
    wind: 
        url_template: https://www.transnetbw.de/de/kennzahlen/erneuerbare-energien/windenergie
        url_params_template:
            app: wind
            activeTab: csv
            view: '1'
            download: 'true'
            selectMonatDownload: '{u_transnetbw}'
        frequency: M
        start: 2010-01-01
        end: recent
        filetype: csv
    pv: 
        url_template: https://www.transnetbw.de/de/kennzahlen/erneuerbare-energien/fotovoltaik
        url_params_template:
            app: wind
            activeTab: csv
            view: '1'
            download: 'true'
            selectMonatDownload: '{u_transnetbw}'
        frequency: M
        start: 2011-01-01
        end: recent
        filetype: csv
"""

## 2.5 TenneT

In [8]:
tennet = """
TenneT: 
    wind: 
        url_template: http://www.tennettso.de/site/de/phpbridge
        url_params_template:
            commandpath: Tatsaechliche_und_prognostizierte_Windenergieeinspeisung/monthDataSheetCsv.php
            contenttype: text/x-csv
            querystring: monat={u_start:%Y-%m}
        frequency: M
        start: 2006-01-01
        end: recent
        filetype: csv        
    pv: 
        url_template: http://www.tennettso.de/site/de/phpbridge
        url_params_template:
            commandpath: Tatsaechliche_und_prognostizierte_Solarenergieeinspeisung/monthDataSheetCsv.php
            sub: total
            contenttype: text/x-csv
            querystring: monat={u_start:%Y-%m}
        frequency: M
        start: 2010-01-01
        end: recent
        filetype: csv  
"""

## 2.6 Creating YAML strings

Parse the [YAML](https://en.wikipedia.org/wiki/YAML)-strings of the data sources we wish to include into a single Python dictionary.

In [9]:
# safe_load needs no explicit Loader argument and only builds plain Python objects.
conf = yaml.safe_load(hertz + amprion + tennet + transnetbw + entso)
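The parsed configuration is a nested dictionary keyed by source, then by technology. As a small illustration (using a copy of the 50Hertz `pv` entry only, not the full configuration), PyYAML's safe loader turns unquoted ISO dates into `datetime.date` objects, while `recent` remains a plain string:

```python
import yaml

snippet = """
50Hertz:
    pv:
        url_template: http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile
        url_params_template:
            callback: '?'
            fileName: '{u_start:%Y}.csv'
        frequency: A
        start: 2012-01-01
        end: recent
        filetype: csv
"""

example = yaml.safe_load(snippet)
pv = example['50Hertz']['pv']

print(type(pv['start']).__name__)  # date
print(pv['end'])                   # recent
print(pv['url_params_template'])
```

This is why the download loop can pass `p['start']` straight to date-handling functions, but has to replace the string `recent` with an actual date first.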

# 3. Downloading files one by one

In the following, we iterate over the source and technology (wind/solar) entries specified above and download the data for the period given in the parameters. Each file is saved under its original filename. Note that the original filenames are often not self-explanatory (for example "data" or "January"). A file's content is revealed by its place in the directory structure.
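For sources that publish one file per month, the code below pairs each month's first day with its last day using `pd.date_range`. The following standard-library sketch reproduces that pairing for illustration only. It is a stand-in, not the code the notebook runs, and the helper name `monthly_periods` is hypothetical.

```python
import calendar
from datetime import date

def monthly_periods(start, end):
    """Return (first_day, last_day) pairs for every month from start to end.

    Assumes start is the first day of a month and end the last day of one,
    as in the configurations above.
    """
    periods = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        last_day = calendar.monthrange(year, month)[1]
        periods.append((date(year, month, 1), date(year, month, last_day)))
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return periods

for first, last in monthly_periods(date(2006, 1, 1), date(2006, 3, 31)):
    print('{:%Y-%m-%d}_{:%Y-%m-%d}'.format(first, last))
# 2006-01-01_2006-01-31
# 2006-02-01_2006-02-28
# 2006-03-01_2006-03-31
```

Each printed label corresponds to the name of the folder in which the file for that period is stored.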

In [10]:
def download(session, source, tech, s, e, **p):
    """construct URLs from template and parameters, download, and save."""
    logger.info(
        'Proceed to download: {} {} {:%Y-%m-%d}_{:%Y-%m-%d}'.format(
            source, tech, s, e
            )
        )
    # Get number of months between now and s (required for TransnetBW).
    count = datetime.now().month - s.month + (datetime.now().year - s.year)*12
    
    # Create the parameters dict containing timespan info to be pasted with url
    url_params = {}
    for key, value in p['url_params_template'].items():
        url_params[key] = value.format(
            u_start = s,
            u_end = e,
            u_transnetbw = count
            )

    # Each file will be saved in a folder of its own, this allows us to preserve
    # the original filename when saving to disk.  
    unique_path = os.path.join(
        downloadpath,
        source,
        tech,
        s.strftime('%Y-%m-%d') + '_' + e.strftime('%Y-%m-%d')
        )
    os.makedirs(unique_path, exist_ok=True)
    
    # Attempt the download if there is no file yet.  
    count_files =  len(os.listdir(unique_path))   
    if count_files == 0: 
        resp = session.get(p['url_template'], params=url_params)                
        original_filename = resp.headers['content-disposition'].split(
            'filename=')[-1].replace('"','').replace(';','')              
        logger.info(
            'Downloading from URL: %s Original filename: %s',
            resp.url, original_filename
            )
        filepath = os.path.join(unique_path, original_filename)
        with open(filepath, 'wb') as output_file:
            for chunk in resp.iter_content(1024):
                output_file.write(chunk)                
    elif count_files == 1:
        logger.info('There is already a file: %s', os.listdir(unique_path)[0])
    else:
        logger.warning(
            'There must not be more than one file in: %s. Please check.',
            unique_path
            )
        
    return


for source, t in conf.items():
    for tech, p in t.items():
        session = requests.session()
#        p['start'] = date(2015,1,1) # uncomment this to set a different start
        if p['end'] == 'recent':
            p['end'] = date(2015,12,31)

        if p['frequency'] == 'complete':
            download(session, source, tech, p['start'], p['end'], **p)            
        else:
            # The files on the servers usually contain the data for subperiods
            # of some regular length (i.e. months or years).
            # Create lists of start and end dates of the periods represented
            # in the individual files to be downloaded.
            starts = pd.date_range(
                start=p['start'], end=p['end'], freq=p['frequency']+'S')
            ends = pd.date_range(
                start=p['start'], end=p['end'], freq=p['frequency'])
            for start, end in zip(starts, ends):
                download(session, source, tech, start, end, **p)                

INFO:log:Proceed to download: Amprion wind 2008-01-04_2015-12-31
INFO:log:There is already a file: winddaten2_04.01.2008_31.12.2015.csv
INFO:log:Proceed to download: Amprion pv 2010-01-07_2015-12-31
INFO:log:There is already a file: Photovoltaik_07.01.2010_31.12.2015.csv
INFO:log:Proceed to download: ENTSO-E Data_Portal 2006-01-01_2006-01-31
INFO:log:There is already a file: Statistics.xls
INFO:log:Proceed to download: ENTSO-E Data_Portal 2006-02-01_2006-02-28
INFO:log:There is already a file: Statistics.xls
INFO:log:Proceed to download: ENTSO-E Data_Portal 2006-03-01_2006-03-31
INFO:log:There is already a file: Statistics.xls
INFO:log:Proceed to download: ENTSO-E Data_Portal 2006-04-01_2006-04-30
INFO:log:There is already a file: Statistics.xls
INFO:log:Proceed to download: ENTSO-E Data_Portal 2006-05-01_2006-05-31
INFO:log:There is already a file: Statistics.xls
INFO:log:Proceed to download: ENTSO-E Data_Portal 2006-06-01_2006-06-30
INFO:log:There is already a file: Statistics.xls
IN