Part of the project [Open Power System Data](http://open-power-system-data.org/).

# Table of Contents
* [1. Open Power System Data: time series](#1.-Open-Power-System-Data:-time-series)
* [2. Download](#2.-Download)
* [3. Processing](#3.-Processing)
* [4. What is in the output files?](#4.-What-is-in-the-output-files?)
	* [4.1 Data sources](#4.1-Data-sources)
	* [4.2 Data documentation and interpretation](#4.2-Data-documentation-and-interpretation)


# 1. Open Power System Data: time series

This is the first of four Jupyter notebooks (Python scripts) that download and process time-series data from European power systems. The notebooks were used to create the [timeseries-datapackage](http://data.open-power-system-data.org/datapackage_timeseries/) that is available on the [Open Power System Data platform](http://data.open-power-system-data.org/). A Jupyter notebook is a file that combines executable code with visualizations and comments in Markdown format, so the code can be documented right next to its output.

The notebooks are part of a [GitHub repository](https://github.com/Open-Power-System-Data/datapackage_timeseries) and can be [downloaded](https://github.com/Open-Power-System-Data/datapackage_timeseries/archive/master.zip) for execution on your local computer (you need a working Python installation for this, for example [Anaconda](https://www.continuum.io/downloads)). Executed one after another, they reproduce the dataset that we provide for download.

# 2. Download

The download sources are defined in `config/sources.yml`, which lists, for each source, the variables (such as wind and solar generation) together with all the parameters needed to execute the downloads.

First, a data directory is created on your local computer. Then, the download parameters for each data source are defined, including the URL. These parameters are then turned into a YAML string. Finally, the files are downloaded one by one. Downloading all the data usually takes several hours.


Each file is saved under its original filename. Note that the original filenames are often not self-explanatory (for example "data" or "January"). A file's content is instead indicated by its place in the directory structure.
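To illustrate how the directory structure can carry the meaning that the filenames lack, here is a minimal sketch that only plans downloads and does not fetch anything. The layout of the `SOURCES` dictionary, the `example.org` URLs, and the helper `plan_downloads` are assumptions made for this illustration; they are not the actual contents of `config/sources.yml` or the API of `timeseries_scripts.download`.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical stand-in for the parsed contents of config/sources.yml.
# The real file's keys, nesting, and URLs may differ.
SOURCES = {
    "Amprion": {
        "wind": {"url": "https://example.org/downloads/data.csv"},
        "solar": {"url": "https://example.org/downloads/January.csv"},
    },
}


def plan_downloads(sources, out_path, subset=None):
    """Return (url, target_path) pairs, one per source/variable entry."""
    tasks = []
    for source_name, variables in sources.items():
        # Skip sources that were not requested.
        if subset is not None and source_name not in subset:
            continue
        for variable_name, params in variables.items():
            url = params["url"]
            # Keep the original filename; the folders carry the meaning.
            filename = Path(urlparse(url).path).name
            target = Path(out_path) / source_name / variable_name / filename
            tasks.append((url, target))
    return tasks


tasks = plan_downloads(SOURCES, "original_data2", subset=["Amprion"])
for url, target in tasks:
    print(url, "->", target.as_posix())
```

With a layout like this, a file called `data.csv` is still identifiable because it sits in a folder named after its source and variable.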

In [1]:
sources_yaml_path = 'config/sources.yml'
out_path = 'original_data2'

In [6]:
from timeseries_scripts import download

# Optionally, specify a subset to download, e.g. subset=['TenneT', '50Hertz']

download.download(sources_yaml_path, out_path, subset=['Amprion'])

# 3. Processing

The other scripts/notebooks each implement a distinct function (the local copy links only work if you are running this notebook on your own computer):

- **The read script** ([GitHub](https://github.com/Open-Power-System-Data/datapackage_timeseries/blob/master/read.ipynb) / [local copy](read.ipynb)) reads each downloaded file into a pandas DataFrame and merges data from different sources that share the same time resolution.
- **The processing script** ([GitHub](https://github.com/Open-Power-System-Data/datapackage_timeseries/blob/master/processing.ipynb) / [local copy](processing.ipynb)) performs some aggregations and transforms the data to the [tabular data package format](http://data.okfn.org/doc/tabular-data-package), where actual data is saved in a CSV file, while metadata (information on format, units, sources, and descriptions) is stored in a JSON file.
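The split between data in a CSV file and metadata in a JSON file can be sketched with the standard library alone. This is an illustration under assumptions, not the processing notebook's implementation: the function `write_data_package`, the package name `timeseries_15min`, the column name `wind_DEamprion_generation`, and the exact metadata keys are all chosen for this example, and only two sample rows are written.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_data_package(directory, name, fields, rows):
    """Write rows to <name>.csv and describe them in datapackage.json."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    # The actual data goes into a plain CSV file.
    csv_path = directory / f"{name}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([field["name"] for field in fields])
        writer.writerows(rows)

    # The metadata (format, field descriptions) goes into a JSON file.
    metadata = {
        "name": name,
        "resources": [
            {
                "path": csv_path.name,
                "format": "csv",
                "schema": {"fields": fields},
            }
        ],
    }
    json_path = directory / "datapackage.json"
    json_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return csv_path, json_path


# Illustrative field descriptions and sample rows (assumed names).
FIELDS = [
    {"name": "timestamp", "type": "datetime",
     "description": "Start of the 15-minute period"},
    {"name": "wind_DEamprion_generation", "type": "number",
     "description": "Wind generation in the Amprion control area"},
]
ROWS = [
    ("2008-03-31 22:00:00", 39),
    ("2008-03-31 22:15:00", 38),
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path, json_path = write_data_package(tmp, "timeseries_15min", FIELDS, ROWS)
    csv_text = csv_path.read_text(encoding="utf-8")
    package = json.loads(json_path.read_text(encoding="utf-8"))

print(csv_text)
print(json.dumps(package, indent=2))
```

Keeping the metadata separate means the CSV stays readable by any tool, while units, sources, and descriptions can be added to the field entries in the JSON file.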

In [7]:
HEADERS = ['variable', 'country', 'attribute', 'source', 'web']
from timeseries_scripts import read

In [8]:
data_sets = read.read(sources_yaml_path, out_path, HEADERS, subset=['Amprion'])

In [9]:
data_sets['15min']

variable,solar,solar,wind,wind
country,DEamprion,DEamprion,DEamprion,DEamprion
attribute,forecast,generation,forecast,generation
source,Amprion,Amprion,Amprion,Amprion
web,http://www.amprion.net/en/photovoltaic-infeed,http://www.amprion.net/en/photovoltaic-infeed,http://www.amprion.net/en/wind-feed-in,http://www.amprion.net/en/wind-feed-in
timestamp,,,,
2008-03-31 22:00:00,,,106,39
2008-03-31 22:15:00,,,112,38
2008-03-31 22:30:00,,,114,35
2008-03-31 22:45:00,,,116,34
2008-03-31 23:00:00,,,123,30
2008-03-31 23:15:00,,,132,37
2008-03-31 23:30:00,,,141,52
2008-03-31 23:45:00,,,153,72
2008-04-01 00:00:00,,,165,76
2008-04-01 00:15:00,,,175,67
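The five header rows in this output correspond to the levels named in `HEADERS`, so each column is identified by a tuple rather than a single name. The following sketch rebuilds a small part of the table by hand (the two wind columns and the first three rows) and shows one way to select and rename columns by header level. It assumes pandas is installed and does not call `read.read`; the variable names `frame`, `generation`, and `flat_names` are chosen for this example.

```python
import pandas as pd

HEADERS = ['variable', 'country', 'attribute', 'source', 'web']

# Hand-built excerpt of the table above: two wind columns, three rows.
columns = pd.MultiIndex.from_tuples(
    [
        ('wind', 'DEamprion', 'forecast', 'Amprion',
         'http://www.amprion.net/en/wind-feed-in'),
        ('wind', 'DEamprion', 'generation', 'Amprion',
         'http://www.amprion.net/en/wind-feed-in'),
    ],
    names=HEADERS,
)
index = pd.DatetimeIndex(
    ['2008-03-31 22:00:00', '2008-03-31 22:15:00', '2008-03-31 22:30:00'],
    name='timestamp',
)
frame = pd.DataFrame([[106, 39], [112, 38], [114, 35]],
                     index=index, columns=columns)

# Select every column whose 'attribute' level equals 'generation'.
is_generation = frame.columns.get_level_values('attribute') == 'generation'
generation = frame.loc[:, is_generation]

# Build flat, single-level names from the first three header levels.
flat_names = ['_'.join(column[:3]) for column in generation.columns]

print(generation)
print(flat_names)
```

Selecting with a boolean mask keeps all header levels on the result, so the source and URL of each column remain attached to the data.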


# 4. What is in the output files?

## 4.1 Data sources

An overview of the sources for the data can be found [here](http://open-power-system-data.org/opsd-sources#time-series).

## 4.2 Data documentation and interpretation

Often, the data that we use is poorly documented. In some cases, primary data owners provide some documentation.


**Load data**
* [ENTSO-E Specific national considerations](https://www.entsoe.eu/Documents/Publications/Statistics/Specific_national_considerations.pdf)
* [Schumacher & Hirth 2015](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2715986), a paper on load data