# Weather data preparation

Weather data can be downloaded from the [ECMWF](https://www.ecmwf.int/) archive using the weather API provided by JSI. Installation instructions and other information about the API can be found [here](https://github.com/JozefStefanInstitute/weather-data).

This example uses weather data for Slovenia from 1 November 2017 to 30 November 2017.

## 1. Weather data download

An example of downloading weather data can be found [here](https://github.com/JozefStefanInstitute/weather-data/blob/master/example.py).

The following steps assume that the weather forecast has already been downloaded as described in the example. The corresponding grib file is located in ***data/slovenia-nov2017.grib***.

## 2. Weather data transformation

We currently have a file in **grib** format, which cannot be imported into QMiner directly. We first have to transform it to **tsv** format. This can be done using weather-api.

In [5]:
from weather.weather import WeatherExtractor

we = WeatherExtractor()

# load weather data
we.load(['data/slovenia-nov2017.grib'])

Extending parameters...


In [10]:
print("Number of weather points %d" % len(we.grib_msgs['values'].iloc[0]))

 Number of weather points 105


We now have weather data for **105** points evenly spread across Slovenia. However, we are interested in only a small subset of them: in our case, the points that correspond to the locations of the Slovenian weather agency's measuring stations. We will call these points of interest **interpolation points**.

In [11]:
interp_points = [
  { "lat": 45.8958,     "alt": 55.0,        "lon": 13.6289,     "shortTitle": "BILJE",    "title": "NOVA GORICA" },  
  { "lat": 46.2447,     "alt": 244.0,       "lon": 15.2525,     "shortTitle": "CELJE",    "title": "CELJE" },  
  { "lat": 45.5603,     "alt": 157.0,       "lon": 15.1508,     "shortTitle": "\u010cRNOMELJ - DOBLI\u010cE",    "title": "CRNOMELJ" },
  { "lat": 46.3794,     "alt": 2514.0,      "lon": 13.8539,     "shortTitle": "KREDARICA",    "title": "KREDARICA" },  
  { "lat": 45.8936,     "alt": 154.0,       "lon": 15.525,      "shortTitle": "CERKLJE - LETALI\u0160\u010cE",    "title": "CERKLJE - LETALISCE" },
  { "lat": 46.48,       "alt": 264.0,       "lon": 15.6869,     "shortTitle": "MARIBOR - LETALI\u0160\u010cE",    "title": "MARIBOR/SLIVNICA" },  
  { "lat": 46.2178,     "alt": 364.0,       "lon": 14.4775,     "shortTitle": "BRNIK - LETALI\u0160\u010cE",    "title": "LJUBLJANA/BRNIK" },  
  { "lat": 46.37,       "alt": 515.0,       "lon": 14.18,       "shortTitle": "LESCE",    "title": "LESCE" },
  { "lat": 45.4756,     "alt": 2.0,         "lon": 13.6206,     "shortTitle": "PORTORO\u017d - LETALI\u0160\u010cE",    "title": "PORTOROZ/SECOVLJE" },  
  { "lat": 46.0681,     "alt": 943.0,       "lon": 15.2897,     "shortTitle": "LISCA",    "title": "LISCA" },  
  { "lat": 46.0658,     "alt": 299.0,       "lon": 14.5172,     "shortTitle": "LJUBLJANA - BE\u017dIGRAD",    "title": "LJUBLJANA/BEZIGRAD" },
  { "lat": 46.6525,     "alt": 188.0,       "lon": 16.1961,     "shortTitle": "MURSKA SOBOTA - RAKI\u010cAN",    "title": "MURSKA SOBOTA" },
  { "lat": 45.8019,     "alt": 220.0,       "lon": 15.1822,     "shortTitle": "NOVO MESTO",    "title": "NOVO MESTO" },
  { "lat": 45.7664,     "alt": 533.0,       "lon": 14.1975,     "shortTitle": "POSTOJNA",    "title": "POSTOJNA" },
  { "lat": 46.4975,     "alt": 864.0,       "lon": 13.7175,     "shortTitle": "RATE\u010cE - PLANICA",    "title": "RATECE" },
  { "lat": 46.49,       "alt": 455.0,       "lon": 15.1161,     "shortTitle": "\u0160MARTNO PRI SLOVENJ GRADCU",    "title": "SLOVENJ GRADEC" }
]

Export the weather data in **tsv** format, keeping only the measurements for the **interpolation points**:

In [14]:
we.export('data/slovenia-nov2017.tsv', interp_points)
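
To build some intuition for how a station relates to the forecast grid, here is a minimal sketch of a nearest-neighbour lookup based on the haversine distance. It is only an illustration: the way weather-api maps grid values to interpolation points is not shown in this document and may well differ (for example, it may interpolate between several grid points). The helper names `haversine_km` and `nearest_grid_index` and the 0.5-degree grid are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_grid_index(station, grid):
    """Index of the grid point (lat, lon) closest to the station."""
    return min(
        range(len(grid)),
        key=lambda i: haversine_km(station["lat"], station["lon"], grid[i][0], grid[i][1]),
    )


# Hypothetical 0.5-degree grid roughly covering Slovenia (NOT the real ECMWF grid):
# latitudes 45.0..47.0, longitudes 13.0..16.5
grid = [(lat / 2.0, lon / 2.0) for lat in range(90, 95) for lon in range(26, 34)]

# Coordinates taken from the LJUBLJANA - BEZIGRAD entry above
station = {"lat": 46.0658, "lon": 14.5172}

idx = nearest_grid_index(station, grid)
nearest = grid[idx]
```

On this made-up grid, the station is matched to the grid point at latitude 46.0 and longitude 14.5.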

## 3. Preparing data for QMiner

The default weather-api tsv format does not match the format defined in the **QMiner schema**, so one more transformation is needed (**this step is to be removed in the future**).

In this example we will use only the **Surface Temperature (2t)** and **Total Precipitation (tp)** measurements.

In [15]:
import pandas as pd

# load weather measurements
df = pd.read_csv('data/slovenia-nov2017.tsv', sep='\t')

In [16]:
# keep only temperature and precipitation
df = df[(df.shortName == '2t') | (df.shortName == 'tp')]

In [20]:
# transform to new schema: one row per group, one column per aggregate
new_rows = []

groups = df.groupby(['shortName', 'validDate', 'dayOffset', 'region', 'fromHour', 'toHour'])
for name, group in groups:
    # -1. marks an aggregate that is not available for this group
    row = {
        'weatherParam': name[0],
        'timestamp': name[1],
        'dayOffset': name[2],
        'region': name[3],
        'fromHour': name[4],
        'toHour': name[5],
        'max': -1., 'min': -1., 'mean': -1., 'cum': -1.
    }
    # fill in the aggregates that are present in the group
    for _, curr in group.iterrows():
        row[curr.aggFunc] = curr.value
    new_rows.append(row)

new_rows = pd.DataFrame(new_rows)
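
The loop above is essentially a long-to-wide pivot on `aggFunc`. As a sketch only, the same reshaping can be expressed with `pivot_table`; the `sample` frame below is made-up data in the weather-api layout, and the column order of the result is chosen here for illustration. `aggfunc="last"` mirrors the loop, where a later row overwrites an earlier one with the same `aggFunc`.

```python
import pandas as pd

# Made-up rows in the weather-api layout (values are illustrative only)
sample = pd.DataFrame({
    "shortName": ["2t", "2t", "2t", "tp"],
    "validDate": ["2017-11-01"] * 4,
    "dayOffset": [0, 0, 0, 0],
    "region": [0, 0, 0, 0],
    "fromHour": [0, 0, 0, 0],
    "toHour": [6, 6, 6, 6],
    "aggFunc": ["min", "max", "mean", "cum"],
    "value": [276.6, 279.0, 277.8, 0.0],
})

keys = ["shortName", "validDate", "dayOffset", "region", "fromHour", "toHour"]

wide = (
    sample.pivot_table(index=keys, columns="aggFunc", values="value", aggfunc="last")
    .reindex(columns=["max", "min", "mean", "cum"])  # make sure all four aggregates exist
    .fillna(-1.0)                                    # same placeholder as in the loop
    .reset_index()
    .rename(columns={"shortName": "weatherParam", "validDate": "timestamp"})
)
wide.columns.name = None  # drop the leftover "aggFunc" axis label
```

For large files this avoids the row-by-row `iterrows` loop, at the cost of being a little less explicit about how each output row is built.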

Note the difference between the default weather-api schema:

In [19]:
print(df.head())

    aggFunc  dayOffset                 featureName  fromHour  region  \
448     cum          0  WEATHERFC+000tp000CUM00-06         0       0   
449     cum          0  WEATHERFC+000tp001CUM00-06         0       1   
450     cum          0  WEATHERFC+000tp002CUM00-06         0       2   
451     cum          0  WEATHERFC+000tp003CUM00-06         0       3   
452     cum          0  WEATHERFC+000tp004CUM00-06         0       4   

    shortName  toHour   validDate  value  
448        tp       6  2017-11-01    0.0  
449        tp       6  2017-11-01    0.0  
450        tp       6  2017-11-01    0.0  
451        tp       6  2017-11-01    0.0  
452        tp       6  2017-11-01    0.0  


and the schema for QMiner import:

In [21]:
print(new_rows.head())

   cum  dayOffset  fromHour         max        mean         min  region  \
0 -1.0          0         0  278.960205  277.800659  276.641113       0   
1 -1.0          0         6  285.279785  282.785482  278.960205       0   
2 -1.0          0         6  285.279785  282.587305  278.960205       0   
3 -1.0          0        12  285.279785  283.286621  281.368164       0   
4 -1.0          0         0  276.062500  274.747192  273.431885       1   

    timestamp  toHour weatherParam  
0  2017-11-01       6           2t  
1  2017-11-01      12           2t  
2  2017-11-01      18           2t  
3  2017-11-01      18           2t  
4  2017-11-01       6           2t  


Finally, store the converted data:

In [22]:
new_rows.to_csv('data/slovenia-nov2017_qminer.tsv', sep='\t', index=False)
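
Before importing the file into QMiner, it can be worth checking that the header contains every column produced by the transformation above. The following is a minimal sketch; the helper `missing_columns` is hypothetical, and the in-memory `demo` buffer only stands in for the exported file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Columns produced by the transformation step in this document
EXPECTED_COLUMNS = {
    "weatherParam", "timestamp", "dayOffset", "region",
    "fromHour", "toHour", "max", "min", "mean", "cum",
}


def missing_columns(source):
    """Return the expected columns that are absent from a TSV file or buffer."""
    header = pd.read_csv(source, sep="\t", nrows=0)  # read the header row only
    return sorted(EXPECTED_COLUMNS - set(header.columns))


# In-memory stand-in for the exported file (one illustrative row)
demo = io.StringIO(
    "cum\tdayOffset\tfromHour\tmax\tmean\tmin\tregion\ttimestamp\ttoHour\tweatherParam\n"
    "-1.0\t0\t0\t278.96\t277.80\t276.64\t0\t2017-11-01\t6\t2t\n"
)
result = missing_columns(demo)
```

Calling `missing_columns('data/slovenia-nov2017_qminer.tsv')` on the real export should return an empty list if all columns were written.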