# Data Upload and Download in Python

The [Solar Forecast Arbiter](https://solarforecastarbiter.org/) allows users to upload and download metadata and data using an HTTP API. The HTTP API documentation is available [here](https://api.solarforecastarbiter.org/) and contains examples for each type of request. This [Jupyter notebook](https://jupyter.org) introduces the Python wrapper of the API provided by the [solarforecastarbiter-core](https://github.com/SolarArbiter/solarforecastarbiter-core) package.

Click the ">| Run" button in the toolbar above or type shift-enter to run the code in each cell. The help menu contains a brief User Interface Tour.

In [1]:
import datetime
from functools import partial
from pathlib import Path

import numpy as np
import pandas as pd

from bokeh.core.properties import value
from bokeh.io import output_notebook
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.palettes import Category10_10 as PALETTE
TOOLS = "pan,box_zoom,xwheel_zoom,reset,save"
output_notebook()

In [2]:
from solarforecastarbiter import datamodel

There are two important objects in the `solarforecastarbiter` API wrapper: 

1. `request_cli_access_token`
2. `APISession`

See the documentation [here](https://solarforecastarbiter-core.readthedocs.io/en/latest/api.html#sfa-api).

In [3]:
from solarforecastarbiter.io.api import APISession, request_cli_access_token

To access data in the Solar Forecast Arbiter, a user must use a valid username and password to obtain a *token*. Read more about authentication [here](https://api.solarforecastarbiter.org/#section/Authentication). [`request_cli_access_token`](https://solarforecastarbiter-core.readthedocs.io/en/latest/generated/solarforecastarbiter.io.api.request_cli_access_token.html#solarforecastarbiter.io.api.request_cli_access_token) is a convenience function for obtaining a token within Python. The token you receive when you run the cell will differ from the one printed below.

In [4]:
# don't store your real passwords or tokens in plain text like this! only for demonstration purposes!
token = request_cli_access_token('testing@solarforecastarbiter.org', 'Thepassword123!')
token

'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik5UZENSRGRFTlVNMk9FTTJNVGhCTWtRelFUSXpNRFF6TUVRd1JUZ3dNekV3T1VWR1FrRXpSUSJ9.eyJpc3MiOiJodHRwczovL3NvbGFyZm9yZWNhc3RhcmJpdGVyLmF1dGgwLmNvbS8iLCJzdWIiOiJhdXRoMHw1YmUzNDNkZjcwMjU0MDYyMzc4MjBiODUiLCJhdWQiOlsiaHR0cHM6Ly9hcGkuc29sYXJmb3JlY2FzdGFyYml0ZXIub3JnIiwiaHR0cHM6Ly9zb2xhcmZvcmVjYXN0YXJiaXRlci5hdXRoMC5jb20vdXNlcmluZm8iXSwiaWF0IjoxNTU5NTc4ODc0LCJleHAiOjE1NTk1ODk2NzQsImF6cCI6ImMxNkVKbzQ4bGJUQ1FFaHFTenRHR2xteHh4bVo0elg3Iiwic2NvcGUiOiJvcGVuaWQgcHJvZmlsZSIsImd0eSI6InBhc3N3b3JkIn0.k2FNvgCZWy3Gugj-eGl3kiYwt8EHIeV1TjhpdEnPynGQFmP8TGruA59IseYSFq9Pko6T6ntrWCMiyhqdBLcjocm1JdBtkw8Yr5ms5ORD_Ptk4bIzM5rUhfCSeEskfRz5WmiH_sQubfADdnJKCjVkgpiV6wn1CHdZOHjhJJm5BAzC4kwwVLwoQl-KamBPF-BT7Dmff8_31GR3UIEK5pFl90W5MMaMhXD7e-YpoXeKRFHxaWvkmF7rZ9Fol5sPw7-qosvhpVbgfjvgTMKaqvYOdmfXaEegyhhN9dEsOtgcJblANV8G3IfP4kfRHtnj7Pt0OFrvUj5-f79SOC2LTKDyvw'
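As the comment in the cell above warns, credentials should not live in the notebook itself. One possible approach, sketched here, reads them from environment variables and falls back to an interactive prompt. The variable names `SFA_USERNAME` and `SFA_PASSWORD` and the helper `load_credentials` are hypothetical choices for this example; they are not defined by `solarforecastarbiter`.

```python
import os
from getpass import getpass


def load_credentials(env=os.environ):
    """Return (username, password) without hard-coding either value.

    SFA_USERNAME and SFA_PASSWORD are hypothetical names chosen for this
    sketch; they are not part of the solarforecastarbiter package.
    """
    username = env.get('SFA_USERNAME')
    password = env.get('SFA_PASSWORD')
    if username is None:
        username = input('Username: ')
    if password is None:
        # getpass hides the characters as they are typed
        password = getpass('Password: ')
    return username, password
```

With this helper, the token request could be written as `request_cli_access_token(*load_credentials())`.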

The [`APISession`](https://solarforecastarbiter-core.readthedocs.io/en/latest/generated/solarforecastarbiter.io.api.APISession.html#solarforecastarbiter.io.api.APISession) uses the valid token to communicate with the API. 

In [5]:
session = APISession(token)
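Tokens eventually expire, so a long-running notebook may need to request a new one. The sketch below assumes the token is a standard JSON Web Token carrying an `exp` claim (the `eyJ` prefix of the token printed earlier suggests a JWT, but the notebook does not state this). It decodes the payload without verifying the signature, so it is only suitable for inspecting your own token. The helper names and the demonstration token are made up for this example.

```python
import base64
import json
import time


def jwt_payload(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split('.')[1]
    # JWT segments are base64url without padding; restore it before decoding
    payload_b64 += '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_expiry(token, now=None):
    """Seconds left before the `exp` claim (negative if already expired)."""
    now = time.time() if now is None else now
    return jwt_payload(token)['exp'] - now


def _segment(obj):
    """Encode a dict as an unpadded base64url JWT segment."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode()


# unsigned stand-in token built only for this demonstration
demo_token = '.'.join([
    _segment({'alg': 'none'}),
    _segment({'iat': 1000, 'exp': 4600}),
    '',
])
print(seconds_until_expiry(demo_token, now=1000))  # 3600
```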

The `APISession.list_sites` method returns a list of all sites that the user has access to. Most of these are reference data sites.

In [6]:
sites = session.list_sites()

Let's see how many sites we have access to.

In [7]:
len(sites)

197

In [8]:
# print every 30th
for site in sites[::30]:
    print(site, '\n')

Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='602fdcae-8596-11e9-9d09-0a580a8003e9', provider='Organization 1', extra_parameters='') 

Site(name='UO SRML Madras OR', latitude=44.69, longitude=-121.16, elevation=997.0, timezone='Etc/GMT+8', site_id='9cfa105a-7e49-11e9-8953-0a580a8003e9', provider='Reference', extra_parameters='{"network": "UO SRML", "network_api_id": "94252.0", "network_api_abbreviation": "MA", "observation_interval_length": 15}') 

Site(name='NOAA SOLRAD Seattle Washington', latitude=47.68685, longitude=-122.25667, elevation=20.0, timezone='America/Los_Angeles', site_id='c2cd5928-7e49-11e9-a977-0a580a8003e9', provider='Reference', extra_parameters='{"network": "NOAA SOLRAD", "network_api_id": "SEA", "network_api_abbreviation": "sea", "observation_interval_length": 1}') 

Site(name='NOAA USCRN Coos Bay OR', latitude=43.27, longitude=-124.31, elevation=12.0, timezone='America/Los_Angeles', site_id='c5965d1e-

We can use Python's `filter` function to remove sites that are provided by the reference database.

In [9]:
for site in filter(lambda x: x.provider != 'Reference', sites):
    print(site, '\n')

Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='602fdcae-8596-11e9-9d09-0a580a8003e9', provider='Organization 1', extra_parameters='') 

Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='859a05f6-859b-11e9-872c-0a580a82006e', provider='Organization 1', extra_parameters='') 

Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='73128e22-861a-11e9-b6a7-0a580a8003e9', provider='Organization 1', extra_parameters='') 

Site(name='Ashland OR', latitude=42.19, longitude=-122.7, elevation=595.0, timezone='Etc/GMT+8', site_id='123e4567-e89b-12d3-a456-426655440001', provider='Organization 1', extra_parameters='{"network_api_abbreviation": "AS","network": "University of Oregon SRML","network_api_id": "94040"}') 

SolarPowerPlant(name='Power Plant 1', latitude=43.73403, longitude=-96.62328, elevation=786.0, timezone='Etc/
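The same selection can be written as a list comprehension, and `collections.Counter` gives a quick summary of how many sites each provider contributes. The sketch below uses a stand-in `FakeSite` tuple with made-up entries so that it runs without API access; with a live session you would pass `sites` instead.

```python
from collections import Counter, namedtuple

# stand-in for datamodel.Site with only the fields used here
FakeSite = namedtuple('FakeSite', ['name', 'provider'])

demo_sites = [
    FakeSite('Tucson AZ', 'Organization 1'),
    FakeSite('UO SRML Madras OR', 'Reference'),
    FakeSite('NOAA SOLRAD Seattle Washington', 'Reference'),
]

# count sites per provider
provider_counts = Counter(site.provider for site in demo_sites)
print(provider_counts.most_common())  # [('Reference', 2), ('Organization 1', 1)]

# list-comprehension equivalent of the filter call
non_reference = [site for site in demo_sites if site.provider != 'Reference']
print([site.name for site in non_reference])  # ['Tucson AZ']
```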

We'd like to find the site that represents the NREL MIDC observing station located on the University of Arizona campus.

In [10]:
sites_filtered = list(filter(lambda x: 'NREL MIDC' in x.name and 'Arizona' in x.name, sites))
sites_filtered

[Site(name='NREL MIDC University of Arizona OASIS', latitude=32.22969, longitude=-110.95534, elevation=786.0, timezone='Etc/GMT+7', site_id='9f61b880-7e49-11e9-9624-0a580a8003e9', provider='Reference', extra_parameters='{"network": "NREL MIDC", "network_api_id": "UAT", "network_api_abbreviation": "UA OASIS", "observation_interval_length": 1}')]

The filtered list has just one site, so pick it out for future queries.

In [11]:
oasis = sites_filtered[0]
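Indexing with `[0]` silently picks the first match even if the filter later returns several sites, and raises a bare `IndexError` if it returns none. A small guard, sketched below with the hypothetical helper name `exactly_one`, makes that assumption explicit.

```python
def exactly_one(items, description='item'):
    """Return the only element of `items`; raise if there are zero or several."""
    items = list(items)
    if len(items) != 1:
        raise ValueError(
            f'expected exactly one {description}, found {len(items)}')
    return items[0]


print(exactly_one(['only match'], 'site'))  # only match
```

In this notebook it could be used as `oasis = exactly_one(sites_filtered, 'OASIS site')`.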

Now we repeat the process for observations.

In [12]:
observations = session.list_observations()

There are 3-6 observations per site, so the list is quite long.

In [13]:
len(observations)

779

In [14]:
# print every 200th
for observation in observations[::200]:
    print(observation, '\n')

Observation(name='sample observation', variable='ghi', interval_value_type='interval_mean', interval_length=Timedelta('0 days 01:00:00'), interval_label='ending', site=Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='859a05f6-859b-11e9-872c-0a580a82006e', provider='Organization 1', extra_parameters=''), uncertainty=0.0, observation_id='2d42399c-8618-11e9-be8f-0a580a8003e9', extra_parameters='', units='W/m^2') 

Observation(name='Albuquerque NM air_temperature', variable='air_temperature', interval_value_type='interval_mean', interval_length=Timedelta('0 days 00:01:00'), interval_label='ending', site=Site(name='SANDIA Albuquerque NM', latitude=35.05, longitude=-106.53, elevation=1657.0, timezone='America/Denver', site_id='9ffbb3cc-7e49-11e9-aa67-0a580a8003e9', provider='Reference', extra_parameters='{"network": "SANDIA", "network_api_id": "Albuquerque", "network_api_abbreviation": NaN, "observation_interval_length": 1}'), unce

Notice that each observation object contains metadata about the observation type (e.g. variable, interval length) and the site that it is associated with. We can extract the observations from the site of interest using another filter statement.

In [15]:
observations_oasis = list(filter(lambda x: x.site == oasis, observations))
for observation in observations_oasis:
    print(observation, '\n')

Observation(name='University of Arizona OASIS ghi', variable='ghi', interval_value_type='interval_mean', interval_length=Timedelta('0 days 00:01:00'), interval_label='ending', site=Site(name='NREL MIDC University of Arizona OASIS', latitude=32.22969, longitude=-110.95534, elevation=786.0, timezone='Etc/GMT+7', site_id='9f61b880-7e49-11e9-9624-0a580a8003e9', provider='Reference', extra_parameters='{"network": "NREL MIDC", "network_api_id": "UAT", "network_api_abbreviation": "UA OASIS", "observation_interval_length": 1}'), uncertainty=0.0, observation_id='9f657636-7e49-11e9-b77f-0a580a8003e9', extra_parameters='{"network": "NREL MIDC", "network_api_id": "UAT", "network_api_abbreviation": "UA OASIS", "observation_interval_length": 1, "network_data_label": "Global Horiz (platform) [W/m^2]"}', units='W/m^2') 

Observation(name='University of Arizona OASIS dni', variable='dni', interval_value_type='interval_mean', interval_length=Timedelta('0 days 00:01:00'), interval_label='ending', site=Si

Now we're ready to get data from the API using [`session.get_observation_values`](https://solarforecastarbiter-core.readthedocs.io/en/latest/generated/solarforecastarbiter.io.api.APISession.get_observation_values.html#solarforecastarbiter.io.api.APISession.get_observation_values).

In [16]:
start = pd.Timestamp('20190520 0000')
end = pd.Timestamp('20190525 0000')

The method currently requires an `observation_id` string, so extract that from an observation.

In [17]:
oasis_ghi = observations_oasis[0]
oasis_ghi_id = oasis_ghi.observation_id
oasis_ghi_id

'9f657636-7e49-11e9-b77f-0a580a8003e9'
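Selecting the GHI observation by position works here because it happens to be first in the list, but the ordering is not something the notebook guarantees. A more explicit alternative is to index the observations by their `variable` attribute. The sketch below uses a stand-in `FakeObservation` tuple and placeholder ids so that it runs offline; `index_by_variable` is a hypothetical helper, not part of the package.

```python
from collections import namedtuple

# stand-in for datamodel.Observation with only the fields used here
FakeObservation = namedtuple(
    'FakeObservation', ['name', 'variable', 'observation_id'])


def index_by_variable(observations):
    """Map each variable name to its observation; fail on duplicates."""
    by_variable = {}
    for obs in observations:
        if obs.variable in by_variable:
            raise ValueError(
                f'more than one observation for {obs.variable!r}')
        by_variable[obs.variable] = obs
    return by_variable


demo = [
    FakeObservation('OASIS ghi', 'ghi', 'id-ghi'),
    FakeObservation('OASIS dni', 'dni', 'id-dni'),
]
by_variable = index_by_variable(demo)
print(by_variable['ghi'].observation_id)  # id-ghi
```

With the real objects, `index_by_variable(observations_oasis)['ghi']` would return the same observation as `observations_oasis[0]` does in the cell above.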

In [18]:
oasis_ghi_values = session.get_observation_values(oasis_ghi_id, start, end)

In [19]:
oasis_ghi_values.head()

timestamp,value,quality_flag
2019-05-20 00:00:00+00:00,200.669,2
2019-05-20 00:01:00+00:00,199.503,2
2019-05-20 00:02:00+00:00,195.973,2
2019-05-20 00:03:00+00:00,192.769,2
2019-05-20 00:04:00+00:00,190.638,2


In [20]:
oasis_ghi_values.tail()

timestamp,value,quality_flag
2019-05-24 06:55:00+00:00,-3.10731,18
2019-05-24 06:56:00+00:00,-3.09589,18
2019-05-24 06:57:00+00:00,-3.08448,18
2019-05-24 06:58:00+00:00,-3.0845,18
2019-05-24 06:59:00+00:00,-3.08451,18


In [21]:
from solarforecastarbiter.plotting import timeseries

In [22]:
fig = timeseries.generate_observation_figure(oasis_ghi, oasis_ghi_values)
show(fig)

New sites, observations, and forecasts may be created using the Python API wrappers. To do so, we:

1. Create a `Site`.
2. Post the site to the API.
3. Create new observations and forecasts using the new site *returned by the API*. 

The observation and forecast need to be associated with the unique `site_id` assigned by the API.

In [23]:
site = datamodel.Site(
    name='Tucson AZ',
    latitude=32.2,
    longitude=-110.9,
    elevation=700,
    timezone='America/Phoenix'
)

The API returns a new site object that has the same attributes but includes a new, unique `site_id` as well as a `provider` that is automatically determined by the user's affiliation.

In [24]:
site_returned = session.create_site(site)

In [25]:
site_returned

Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='9c674cc8-861b-11e9-bd5c-0a580a8003e9', provider='Organization 1', extra_parameters='')

Now we can use the site returned by the API to create the Observation and Forecast objects.

In [26]:
observation = datamodel.Observation(
    name='sample observation', 
    interval_length=pd.Timedelta('1hr'),
    interval_label='ending',
    interval_value_type='interval_mean',
    variable='ghi',
    uncertainty=0,
    site=site_returned
)

forecast = datamodel.Forecast(
    name='sample forecast', 
    issue_time_of_day=datetime.time(0), 
    lead_time_to_start=pd.Timedelta('1h'),
    interval_length=pd.Timedelta('1h'),
    run_length=pd.Timedelta('1h'),
    interval_label='ending',
    interval_value_type='interval_mean',
    variable='ghi',
    site=site_returned
)

In [27]:
observation_returned = session.create_observation(observation)

In [28]:
observation_returned

Observation(name='sample observation', variable='ghi', interval_value_type='interval_mean', interval_length=Timedelta('0 days 01:00:00'), interval_label='ending', site=Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='9c674cc8-861b-11e9-bd5c-0a580a8003e9', provider='Organization 1', extra_parameters=''), uncertainty=0.0, observation_id='9c7b7e1e-861b-11e9-ae6d-0a580a8003e9', extra_parameters='', units='W/m^2')

In [29]:
forecast_returned = session.create_forecast(forecast)

In [30]:
forecast_returned

Forecast(name='sample forecast', issue_time_of_day=datetime.time(0, 0), lead_time_to_start=Timedelta('0 days 01:00:00'), interval_length=Timedelta('0 days 01:00:00'), run_length=Timedelta('0 days 01:00:00'), interval_label='ending', interval_value_type='interval_mean', variable='ghi', site=Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='9c674cc8-861b-11e9-bd5c-0a580a8003e9', provider='Organization 1', extra_parameters=''), forecast_id='9c847f52-861b-11e9-89fc-0a580a8003e9', extra_parameters='', units='W/m^2')

Let's use the NREL MIDC OASIS data we previously downloaded as the upload for the new observation.

In [31]:
new_obs_id = observation_returned.observation_id

In [32]:
try:
    session.post_observation_values(new_obs_id, oasis_ghi_values)
except Exception as e:
    print(e)

400 Client Error: BAD REQUEST for url: https://api.solarforecastarbiter.org/observations/9c7b7e1e-861b-11e9-ae6d-0a580a8003e9/values


The API rejected the upload because the interval length of the data does not match the interval length we specified in the metadata. Recall that the metadata described hourly mean data with an interval-ending label. The resampled `value` column is a simple interval average of the minute data. Determining the resampled `quality_flag` values is beyond the scope of this tutorial, so we set them to 0.
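A mismatch like this can be caught locally before posting. The sketch below checks that consecutive timestamps are spaced by the expected interval. It is a client-side convenience only: the notebook does not describe the exact rule the API applies, so treating "uniform spacing equal to `interval_length`" as the criterion is an assumption, and `has_uniform_interval` is a hypothetical helper.

```python
import datetime as dt


def has_uniform_interval(timestamps, expected):
    """True if every consecutive pair of timestamps is `expected` apart.

    Sequences with fewer than two timestamps trivially return True.
    """
    timestamps = list(timestamps)
    return all(
        later - earlier == expected
        for earlier, later in zip(timestamps, timestamps[1:])
    )


first = dt.datetime(2019, 5, 20, tzinfo=dt.timezone.utc)
minute_stamps = [first + dt.timedelta(minutes=i) for i in range(5)]

print(has_uniform_interval(minute_stamps, dt.timedelta(hours=1)))    # False
print(has_uniform_interval(minute_stamps, dt.timedelta(minutes=1)))  # True
```

Because `pd.Timestamp` and `pd.Timedelta` subclass the standard library types, the same function should accept `oasis_ghi_values.index` and `observation.interval_length`.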

In [33]:
resampled_data = oasis_ghi_values.resample('1h', label='right').mean()
resampled_data['quality_flag'] = 0
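The `resample` call above sets `label='right'` but leaves `closed` at its default, which for hourly bins is `'left'`. If the minute values carry interval-ending labels (as the observation metadata indicates), the value stamped 01:00 describes the minute ending at 01:00 and arguably belongs to the hour ending at 01:00. The sketch below uses synthetic data to show the difference that `closed='right'` makes; whether it matters for your application is a judgment call.

```python
import pandas as pd

# two hours of synthetic minute data with interval-ending labels:
# the first hour is all 100, the second hour is all 200
index = pd.date_range('2019-05-20 00:01', periods=120, freq='1min', tz='UTC')
minute_values = pd.Series([100.0] * 60 + [200.0] * 60, index=index)

# default left-closed bins: the value labeled 01:00 lands in the second
# hour and the value labeled 02:00 starts a third bin
default_closed = minute_values.resample('1h', label='right').mean()

# right-closed bins keep 00:01 through 01:00 together in the first hour
right_closed = minute_values.resample(
    '1h', label='right', closed='right').mean()

print(default_closed)
print(right_closed)
```

With the default, the second bin mixes one value from the first hour into the second hour's mean and a third bin appears; with `closed='right'` there are exactly two bins with means of 100 and 200.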

In [34]:
fig = timeseries.generate_observation_figure(observation, resampled_data)
show(fig)

Now we attempt to post the data again.

In [35]:
session.post_observation_values(new_obs_id, resampled_data)

It (should have) worked. Let's confirm that the data exists.

In [36]:
oasis_ghi_values_1h = session.get_observation_values(new_obs_id, start, end)

In [37]:
fig = timeseries.generate_observation_figure(observation, oasis_ghi_values_1h)
show(fig)

The quality flag returned by the API may be "NOT VALIDATED" or it may show at least "NIGHTTIME". When data is posted, the API only checks that the data format is valid before it responds. The quality validation step runs only after the API sends an "OK" response to the data post function, and it typically takes at least a few seconds to complete. So, if you're reading along as you execute the code, you may have given the server enough time to validate the data. If you're quickly executing the code cells or used the "Run all" command, you may need to wait a few seconds and request the data again.

The cells below wait for 5 seconds to ensure that the API has time to process the data and then request the data once again.

In [38]:
import time
time.sleep(5)

In [39]:
oasis_ghi_values_1h = session.get_observation_values(new_obs_id, start, end)
fig = timeseries.generate_observation_figure(observation, oasis_ghi_values_1h)
show(fig)
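A fixed five-second pause is a guess about how long validation takes. An alternative is to poll until a condition is met or a timeout expires. The sketch below is a generic helper under that idea; `wait_for` and its parameters are hypothetical and not part of `solarforecastarbiter`, and the demonstration uses canned responses rather than the API.

```python
import time


def wait_for(fetch, is_ready, timeout=30.0, interval=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call `fetch` until `is_ready(result)` is true, then return the result.

    Raises TimeoutError if the condition is still false once `timeout`
    seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(
                f'condition not met within {timeout} seconds')
        sleep(interval)


# canned responses stand in for repeated API requests
responses = iter(['pending', 'pending', 'validated'])
result = wait_for(
    lambda: next(responses),
    lambda r: r == 'validated',
    sleep=lambda seconds: None,  # skip real waiting in the demonstration
)
print(result)  # validated
```

With the notebook's objects, `fetch` could be `lambda: session.get_observation_values(new_obs_id, start, end)`. The notebook does not say which `quality_flag` values indicate completed validation, so the `is_ready` predicate is left for you to define.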