# Data Upload and Download in Python

The [Solar Forecast Arbiter](https://solarforecastarbiter.org/) allows users to upload and download metadata and data using an HTTP API. The HTTP API documentation is available [here](https://api.solarforecastarbiter.org/) and contains examples for each type of request. This [Jupyter notebook](https://jupyter.org) introduces the [solarforecastarbiter-core](https://github.com/SolarArbiter/solarforecastarbiter-core) package's Python wrapper of the API.

Click the ">| Run" button in the toolbar above or press Shift-Enter to run the code in each cell. The Help menu contains a brief User Interface Tour.

In [1]:
import datetime
import os

import pandas as pd
import requests

from bokeh.io import output_notebook
from bokeh.plotting import show
TOOLS = "pan,box_zoom,xwheel_zoom,reset,save"
output_notebook()

In [2]:
from solarforecastarbiter import datamodel

There are two important objects in the `solarforecastarbiter` API wrapper: 

1. `request_cli_access_token`
2. `APISession`

See the documentation [here](https://solarforecastarbiter-core.readthedocs.io/en/latest/api.html#sfa-api).

In [3]:
from solarforecastarbiter.io.api import APISession, request_cli_access_token

To access data in the Solar Forecast Arbiter, a user must use a valid username and password to obtain a *token*. Read more about authentication [here](https://api.solarforecastarbiter.org/#section/Authentication). The [`request_cli_access_token`](https://solarforecastarbiter-core.readthedocs.io/en/latest/generated/solarforecastarbiter.io.api.request_cli_access_token.html#solarforecastarbiter.io.api.request_cli_access_token) function is a convenient way to obtain a token within Python. The token you get when you run the cell will differ from the one printed below. To get started, we will use a testing account that has read-only permissions.

In [4]:
# don't store your real passwords or tokens in plain text like this! only for demonstration purposes!
token = request_cli_access_token('testing@solarforecastarbiter.org', 'Thepassword123!')
token

'eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6Ik5UZENSRGRFTlVNMk9FTTJNVGhCTWtRelFUSXpNRFF6TUVRd1JUZ3dNekV3T1VWR1FrRXpSUSJ9.eyJpc3MiOiJodHRwczovL3NvbGFyZm9yZWNhc3RhcmJpdGVyLmF1dGgwLmNvbS8iLCJzdWIiOiJhdXRoMHw1YmUzNDNkZjcwMjU0MDYyMzc4MjBiODUiLCJhdWQiOlsiaHR0cHM6Ly9hcGkuc29sYXJmb3JlY2FzdGFyYml0ZXIub3JnIiwiaHR0cHM6Ly9zb2xhcmZvcmVjYXN0YXJiaXRlci5hdXRoMC5jb20vdXNlcmluZm8iXSwiaWF0IjoxNjA0OTc2MDE0LCJleHAiOjE2MDQ5ODY4MTQsImF6cCI6ImMxNkVKbzQ4bGJUQ1FFaHFTenRHR2xteHh4bVo0elg3Iiwic2NvcGUiOiJvcGVuaWQgcHJvZmlsZSBlbWFpbCIsImd0eSI6InBhc3N3b3JkIn0.TMLxZcXRmzf0QFCYb8XfoC-6GkYEQKraat31sYYiu6BxjaGTzd9oEA9HNh2tlaI8fdwtr8OwfH9jefCuR0RaIXdi2ZgMfLPfubZ9OYDSklBs70d_6ou9wA-CxOv5f6ZGEpd3IyXqJPimb_6KlGaGwTD_ucupUCcBEpUz-LWzMwlm1VzwPQQpN7aGCalbi2XUk-3vBOuTUEtjplLvD_ngn8Hf4ZMTPFzXwZYC-atj83PixwwH-VkzwW1-ZcpDP5ThTf98qEPTPZiOH422yeFNkkUcsKX7DGn7sPYgzGJlU8SARVwdo07oTAhtdGaEJgJ_WDxXfAzJzpS3tQll0QEWsw'
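The printed token has the three dot-separated segments of a JSON Web Token, and tokens of this kind expire. As a minimal sketch, assuming the token is a standard JWT whose payload carries an `exp` claim, the helper below decodes the payload without verifying the signature and reports the expiry time. The names `jwt_payload`, `jwt_expiry`, and `demo_token` are illustrative and are not part of `solarforecastarbiter`.

```python
import base64
import datetime
import json


def jwt_payload(token):
    """Decode the (unverified) payload segment of a JWT into a dict."""
    payload_b64 = token.split('.')[1]
    # JWT segments drop base64 padding; restore it before decoding
    payload_b64 += '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def jwt_expiry(token):
    """Return the token's `exp` claim as a timezone-aware UTC datetime."""
    exp = jwt_payload(token)['exp']
    return datetime.datetime.fromtimestamp(exp, tz=datetime.timezone.utc)


def _segment(obj):
    """Encode a dict the way a JWT segment is encoded (no padding)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode()


# hypothetical, unsigned token built only for this demonstration
demo_token = '.'.join(
    [_segment({'alg': 'none'}), _segment({'exp': 1600000000}), ''])
print(jwt_expiry(demo_token))
```

With a real token, `jwt_expiry(token)` would indicate when a new token needs to be requested. Because the signature is not checked, this is only suitable for inspection, not for trusting the token's contents.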

The [`APISession`](https://solarforecastarbiter-core.readthedocs.io/en/latest/generated/solarforecastarbiter.io.api.APISession.html#solarforecastarbiter.io.api.APISession) uses the valid token to communicate with the API. 

In [5]:
session = APISession(token)

The `APISession.list_sites` method returns a list of all sites that the user has access to. Most of these are reference data sites.

In [6]:
sites = session.list_sites()

Let's see how many sites we have access to.

In [7]:
len(sites)

226

In [8]:
# print every 30th
for site in sites[::30]:
    print(site, '\n')

SolarPowerPlant(name='PSEL Reference System', latitude=35.05, longitude=-106.54, elevation=1657.0, timezone='America/Denver', site_id='af1e53ca-7e3d-11e9-ab45-52540015d5ce', provider='Reference', extra_parameters='{"network": "Sandia RTC"}', climate_zones=('Reference Region 3',), modeling_parameters=FixedTiltModelingParameters(ac_capacity=0.003, dc_capacity=0.003, temperature_coefficient=-0.4, dc_loss_factor=0.0, ac_loss_factor=0.0, surface_tilt=35.0, surface_azimuth=180.0, tracking_type='fixed')) 

Site(name='NREL MIDC National Wind Technology Center', latitude=39.9106, longitude=-105.2347, elevation=1855.0, timezone='Etc/GMT+7', site_id='9ecbbad2-7e49-11e9-b20a-0a580a8003e9', provider='Reference', extra_parameters='{"network": "NREL MIDC", "network_api_id": "NWTC", "network_api_abbreviation": "NWTC M2", "observation_interval_length": 1}', climate_zones=('Reference Region 4',)) 

Site(name='NOAA USCRN Elkins WV', latitude=39.01, longitude=-79.47, elevation=1033.0, timezone='America/Ne

We'd like to find the site that represents the NREL MIDC observing station located on the University of Arizona campus.

In [9]:
sites_filtered = list(filter(lambda x: 'NREL MIDC' in x.name and 'Arizona' in x.name, sites))
sites_filtered

[Site(name='NREL MIDC University of Arizona OASIS', latitude=32.22969, longitude=-110.95534, elevation=786.0, timezone='Etc/GMT+7', site_id='9f61b880-7e49-11e9-9624-0a580a8003e9', provider='Reference', extra_parameters='{"network": "NREL MIDC", "network_api_id": "UAT", "network_api_abbreviation": "UA OASIS", "observation_interval_length": 1}', climate_zones=('Reference Region 3',))]

The filtered list has just one site, so pick it out for future queries.

In [10]:
oasis = sites_filtered[0]
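Indexing with `[0]` silently picks the first match even if the filter returns several sites, and raises an unhelpful `IndexError` if it returns none. A small guard makes the "exactly one" assumption explicit. This is only a sketch: `exactly_one` and `FakeSite` are hypothetical names, and the stand-in objects mimic just the `name` attribute of the real `Site` objects.

```python
from collections import namedtuple


def exactly_one(items, predicate):
    """Return the single item matching predicate, or raise ValueError."""
    matches = [item for item in items if predicate(item)]
    if len(matches) != 1:
        raise ValueError(f'expected exactly 1 match, found {len(matches)}')
    return matches[0]


# stand-in for the Site objects returned by session.list_sites()
FakeSite = namedtuple('FakeSite', ['name'])
demo_sites = [
    FakeSite('NREL MIDC University of Arizona OASIS'),
    FakeSite('NREL MIDC National Wind Technology Center'),
    FakeSite('NOAA USCRN Elkins WV'),
]

demo_oasis = exactly_one(
    demo_sites, lambda s: 'NREL MIDC' in s.name and 'Arizona' in s.name)
print(demo_oasis.name)
```

The same call should work on the real `sites` list, since the predicate only reads `name`.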

Now we repeat the process for observations.

In [11]:
observations = session.list_observations()

There are 3-6 observations per site, so the list is quite long.

In [12]:
len(observations)

913

In [13]:
# print every 200th
for observation in observations[::200]:
    print(observation, '\n')

Observation(name='PSEL Reference POA Irradiance', variable='poa_global', interval_value_type='instantaneous', interval_length=Timedelta('0 days 00:01:00'), interval_label='beginning', site=SolarPowerPlant(name='PSEL Reference System', latitude=35.05, longitude=-106.54, elevation=1657.0, timezone='America/Denver', site_id='af1e53ca-7e3d-11e9-ab45-52540015d5ce', provider='Reference', extra_parameters='{"network": "Sandia RTC"}', climate_zones=('Reference Region 3',), modeling_parameters=FixedTiltModelingParameters(ac_capacity=0.003, dc_capacity=0.003, temperature_coefficient=-0.4, dc_loss_factor=0.0, ac_loss_factor=0.0, surface_tilt=35.0, surface_azimuth=180.0, tracking_type='fixed')), uncertainty=0.1, observation_id='af1ec4ff-7e3d-11e9-ab45-52540015d5ce', provider='Reference', extra_parameters='{"network": "Sandia RTC"}', units='W/m^2') 

Observation(name='Madison Wisconsin dni', variable='dni', interval_value_type='interval_mean', interval_length=Timedelta('0 days 00:01:00'), interval_

Notice that each observation object contains metadata about the observation type (e.g. variable, interval length) and the site that it is associated with. We can extract the observations from the site of interest using another filter statement.

In [14]:
observations_oasis = list(filter(lambda x: x.site == oasis, observations))
for observation in observations_oasis:
    print(observation, '\n')

Observation(name='University of Arizona OASIS ghi', variable='ghi', interval_value_type='interval_mean', interval_length=Timedelta('0 days 00:01:00'), interval_label='ending', site=Site(name='NREL MIDC University of Arizona OASIS', latitude=32.22969, longitude=-110.95534, elevation=786.0, timezone='Etc/GMT+7', site_id='9f61b880-7e49-11e9-9624-0a580a8003e9', provider='Reference', extra_parameters='{"network": "NREL MIDC", "network_api_id": "UAT", "network_api_abbreviation": "UA OASIS", "observation_interval_length": 1}', climate_zones=('Reference Region 3',)), uncertainty=0.0, observation_id='9f657636-7e49-11e9-b77f-0a580a8003e9', provider='Reference', extra_parameters='{"network": "NREL MIDC", "network_api_id": "UAT", "network_api_abbreviation": "UA OASIS", "observation_interval_length": 1, "network_data_label": "Global Horiz (platform) [W/m^2]"}', units='W/m^2') 

Observation(name='University of Arizona OASIS dni', variable='dni', interval_value_type='interval_mean', interval_length=T

Now we're ready to get data from the API using [`session.get_observation_values`](https://solarforecastarbiter-core.readthedocs.io/en/latest/generated/solarforecastarbiter.io.api.APISession.get_observation_values.html#solarforecastarbiter.io.api.APISession.get_observation_values).

In [15]:
start = pd.Timestamp('20190520 0000Z')
end = pd.Timestamp('20190525 0000Z')

The method currently requires an `observation_id` string, so extract that from an observation.

In [16]:
oasis_ghi = observations_oasis[0]
oasis_ghi_id = oasis_ghi.observation_id
oasis_ghi_id

'9f657636-7e49-11e9-b77f-0a580a8003e9'
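Selecting `observations_oasis[0]` relies on the GHI observation being first in the list. A lookup keyed by `variable` avoids depending on list order. The sketch below assumes each site has at most one observation per variable (otherwise later entries overwrite earlier ones); `FakeObservation` and the `id-...` strings are hypothetical stand-ins for the real objects and IDs.

```python
from collections import namedtuple

# stand-in with the three attributes used here; real Observation
# objects carry many more fields
FakeObservation = namedtuple(
    'FakeObservation', ['name', 'variable', 'observation_id'])
demo_observations = [
    FakeObservation('University of Arizona OASIS dni', 'dni', 'id-dni'),
    FakeObservation('University of Arizona OASIS ghi', 'ghi', 'id-ghi'),
]

# map each variable name to its observation
by_variable = {obs.variable: obs for obs in demo_observations}
demo_ghi_id = by_variable['ghi'].observation_id
print(demo_ghi_id)
```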

In [17]:
oasis_ghi_values = session.get_observation_values(oasis_ghi_id, start, end)

In [18]:
oasis_ghi_values.head()

| timestamp | value | quality_flag |
| --- | --- | --- |
| 2019-05-20 00:00:00+00:00 | 200.669 | 2 |
| 2019-05-20 00:01:00+00:00 | 199.503 | 2 |
| 2019-05-20 00:02:00+00:00 | 195.973 | 2 |
| 2019-05-20 00:03:00+00:00 | 192.769 | 2 |
| 2019-05-20 00:04:00+00:00 | 190.638 | 2 |


In [19]:
oasis_ghi_values.tail()

| timestamp | value | quality_flag |
| --- | --- | --- |
| 2019-05-24 23:56:00+00:00 | 470.606 | 2 |
| 2019-05-24 23:57:00+00:00 | 467.037 | 2 |
| 2019-05-24 23:58:00+00:00 | 464.077 | 2 |
| 2019-05-24 23:59:00+00:00 | 460.442 | 2 |
| 2019-05-25 00:00:00+00:00 | 456.825 | 2 |


In [20]:
from solarforecastarbiter.plotting import timeseries

In [21]:
fig = timeseries.generate_observation_figure(oasis_ghi, oasis_ghi_values)
show(fig)

New sites, observations, and forecasts may be created using the Python API wrappers. Our testing account is not allowed to post, so we have to use real credentials to demonstrate this. To maintain security, we've set our username and password in environment variables, and the code below extracts them and uses them to obtain a token. The code will not work unless you export your username and password to the `SFA_API_USERNAME` and `SFA_API_PASSWORD` environment variables. You will also need permissions to create new sites, observations, and data. Check with your organization administrator if you get an error.

In [22]:
token = request_cli_access_token(os.environ['SFA_API_USERNAME'], os.environ['SFA_API_PASSWORD'])
session = APISession(token)
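If either environment variable is unset, `os.environ[...]` raises a bare `KeyError`. As a sketch, a small helper can report which variables are missing before any request is made. `read_credentials` is a hypothetical name, and the demo passes a plain dict in place of the real environment.

```python
import os


def read_credentials(env=os.environ):
    """Return (username, password) or raise naming the missing variables."""
    names = ('SFA_API_USERNAME', 'SFA_API_PASSWORD')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            'set these environment variables first: ' + ', '.join(missing))
    return env[names[0]], env[names[1]]


# demonstration with a plain dict instead of the real environment
demo_env = {'SFA_API_USERNAME': 'user@example.com',
            'SFA_API_PASSWORD': 'not-a-real-password'}
print(read_credentials(demo_env)[0])
```

In the notebook this could be used as `token = request_cli_access_token(*read_credentials())`.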

To create the new metadata, we:

1. Create a `Site`.
2. Post the site to the API.
3. Create new observations and forecasts using the new site *returned by the API*. 

The observation and forecast need to be associated with the unique `site_id` assigned by the API.

In [23]:
site = datamodel.Site(
    name='Tucson AZ',
    latitude=32.2,
    longitude=-110.9,
    elevation=700,
    timezone='America/Phoenix'
)

The API returns a new site object that has the same attributes but includes:

1. a new, unique `site_id`
2. a `provider` field that is automatically determined by the user's affiliation
3. an automatically generated list of climate zones that the site belongs to (as of SFA 1.0.0, we only support a single database of zones so the list is likely a single element).

In [24]:
site_returned = session.create_site(site)
site_returned

Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='1185ad1c-22fe-11eb-a505-0a580a8201a5', provider='University of Arizona', extra_parameters='', climate_zones=('Reference Region 3',))

Now we can use the site returned by the API to create the Observation and Forecast objects.

In [25]:
observation = datamodel.Observation(
    name='sample observation', 
    interval_length=pd.Timedelta('1hr'),
    interval_label='ending',
    interval_value_type='interval_mean',
    variable='ghi',
    uncertainty=0,
    site=site_returned
)

forecast = datamodel.Forecast(
    name='sample forecast', 
    issue_time_of_day=datetime.time(0), 
    lead_time_to_start=pd.Timedelta('1h'),
    interval_length=pd.Timedelta('1h'),
    run_length=pd.Timedelta('1h'),
    interval_label='ending',
    interval_value_type='interval_mean',
    variable='ghi',
    site=site_returned
)

In [26]:
observation_returned = session.create_observation(observation)
observation_returned

Observation(name='sample observation', variable='ghi', interval_value_type='interval_mean', interval_length=Timedelta('0 days 01:00:00'), interval_label='ending', site=Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='1185ad1c-22fe-11eb-a505-0a580a8201a5', provider='University of Arizona', extra_parameters='', climate_zones=('Reference Region 3',)), uncertainty=0.0, observation_id='1193ad22-22fe-11eb-83a9-0a580a8201a5', provider='University of Arizona', extra_parameters='', units='W/m^2')

In [27]:
forecast_returned = session.create_forecast(forecast)
forecast_returned

Forecast(name='sample forecast', issue_time_of_day=datetime.time(0, 0), lead_time_to_start=Timedelta('0 days 01:00:00'), interval_length=Timedelta('0 days 01:00:00'), run_length=Timedelta('0 days 01:00:00'), interval_label='ending', interval_value_type='interval_mean', variable='ghi', site=Site(name='Tucson AZ', latitude=32.2, longitude=-110.9, elevation=700.0, timezone='America/Phoenix', site_id='1185ad1c-22fe-11eb-a505-0a580a8201a5', provider='University of Arizona', extra_parameters='', climate_zones=('Reference Region 3',)), aggregate=None, forecast_id='11a3e2f0-22fe-11eb-9632-0a580a8201a5', provider='University of Arizona', extra_parameters='', units='W/m^2')

You can visit the dashboard to confirm that the site, observation, and forecast all exist. Browse the full [site list](https://dashboard.solarforecastarbiter.org/sites/) for the name of your new site, or go directly to ``https://dashboard.solarforecastarbiter.org/sites/<<copy/paste-site-id-here>>``. The ``site_id`` is printed below for reference. Once on the site page, click on "Observations" or "Forecasts" to see those metadata.

In [28]:
print(site_returned.site_id)

1185ad1c-22fe-11eb-a505-0a580a8201a5


As a simple example, let's use the NREL MIDC OASIS data we previously downloaded as the upload for the new observation.

In [29]:
# quality_flag=0 indicates no problems with the data. quality_flag=1 indicates user flagged problem.
data_to_upload = pd.DataFrame({'value': oasis_ghi_values['value'], 'quality_flag': 0})

In [30]:
new_obs_id = observation_returned.observation_id

In [31]:
session.post_observation_values(new_obs_id, data_to_upload)

HTTPError: 400 API Request Error: BAD REQUEST for url: https://api.solarforecastarbiter.org/observations/1193ad22-22fe-11eb-83a9-0a580a8201a5/values and text: {"errors":{"timestamp":["7080 extra times present in index. First extra time is 2019-05-20 00:01:00+00:00. Uploads must have equally spaced timestamps from 2019-05-20 00:00:00+00:00 to 2019-05-25 00:00:00+00:00 with 60 minutes between each timestamp."]}}


The API rejected the upload because the interval length of the data does not match the interval length we specified in the metadata. The downloaded data has 1-minute resolution, whereas the metadata describes hourly interval means labeled at the end of each interval.
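The count in the error message can be reproduced locally before posting. The sketch below builds a 1-minute index over the same five days and compares it against an hourly grid. It assumes the expected grid starts at the first timestamp, which may not match every rule the API applies, and `extra_timestamps` is a hypothetical helper name.

```python
import pandas as pd


def extra_timestamps(index, interval_length):
    """Return the timestamps in index that are not on the expected grid."""
    expected = pd.date_range(index[0], index[-1], freq=interval_length)
    return index.difference(expected)


# same time span and resolution as the downloaded OASIS data
demo_index = pd.date_range(
    '2019-05-20 00:00', '2019-05-25 00:00', freq='1min', tz='UTC')
extra = extra_timestamps(demo_index, pd.Timedelta('1h'))
print(len(extra), extra[0])
```

Five days of 1-minute data hold 7201 timestamps and the hourly grid holds 121, which leaves the 7080 extra times reported by the API.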

In [32]:
resampled_data = data_to_upload.resample('1h', label='right').mean()
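In `resample`, `label` controls which bin edge names the result, while `closed` controls which edge is included in the bin. For hourly bins, pandas closes the left edge by default. For data whose timestamps mark the end of each interval, `closed='right'` may be the better match. That is an assumption about how the 1-minute labels should be treated, not something stated in this notebook. The synthetic data below shows the difference.

```python
import pandas as pd

# two hours of synthetic 1-minute data with interval-ending labels:
# 00:01..01:00 belong to the first hour, 01:01..02:00 to the second
demo_minutes = pd.date_range(
    '2019-05-20 00:01', periods=120, freq='1min', tz='UTC')
minute_data = pd.Series([1.0] * 60 + [2.0] * 60, index=demo_minutes)

# default for hourly bins: [start, end), labeled with the right edge
default_bins = minute_data.resample('1h', label='right').mean()

# (start, end], labeled with the right edge
ending_bins = minute_data.resample(
    '1h', closed='right', label='right').mean()

print(default_bins)
print(ending_bins)
```

With the default, the 01:00 sample falls into the second hour and a third bin appears for the lone 02:00 sample. With `closed='right'`, each hour contains exactly its own 60 samples.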

In [33]:
fig = timeseries.generate_observation_figure(observation, resampled_data)
show(fig)

Now we post the data to the API.

In [34]:
session.post_observation_values(new_obs_id, resampled_data)

It (should have) worked. Let's confirm that the data exists.

In [35]:
oasis_ghi_values_1h = session.get_observation_values(new_obs_id, start, end)

In [36]:
fig = timeseries.generate_observation_figure(observation, oasis_ghi_values_1h)
show(fig)

The quality flag returned by the API may be "NOT VALIDATED" or it may show at least "NIGHTTIME". When data is posted, the API only checks that the data format is valid. Validation runs only after the API has sent an "OK" response to the post request, and it typically takes at least a few seconds to complete. If you are reading along as you execute the code, you may have given the server enough time to validate the data. If you are executing the cells quickly or used the "Run all" command, you may need to wait a few seconds and request the data again.

The cells below wait for 5 seconds to ensure that the API has time to process the data and then request the data once again.

In [37]:
import time
time.sleep(5)

In [38]:
oasis_ghi_values_1h = session.get_observation_values(new_obs_id, start, end)
fig = timeseries.generate_observation_figure(observation, oasis_ghi_values_1h)
show(fig)
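A fixed sleep may be too short or longer than needed. One alternative is to poll until a condition is met or a timeout passes. The helper below is a generic sketch: `wait_until` is a hypothetical name, and the stand-in `fetch` returns strings rather than real API data, so the readiness test is left to the caller.

```python
import time


def wait_until(fetch, is_ready, timeout=30.0, interval=1.0):
    """Call fetch() until is_ready(result) is true or timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met before timeout')
        time.sleep(interval)


# stand-in for the API: reports a validated flag on the third request
demo_responses = iter(['NOT VALIDATED', 'NOT VALIDATED', 'NIGHTTIME'])
demo_result = wait_until(
    lambda: next(demo_responses),
    lambda flag: flag != 'NOT VALIDATED',
    timeout=5.0, interval=0.01)
print(demo_result)
```

With a real session, `fetch` could be `lambda: session.get_observation_values(new_obs_id, start, end)`, and `is_ready` would need to inspect the returned `quality_flag` column.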

Let's finish by deleting the data and metadata that we created. The `solarforecastarbiter-core` Python library does not wrap the API's delete methods, so we'll have to use the [SFA HTTP API](https://api.solarforecastarbiter.org/) directly.

Here we use the [`requests`](https://requests.readthedocs.io) library to make the HTTP API calls. First we set up the authorization header.

In [39]:
base_url = 'https://api.solarforecastarbiter.org'
headers = {'Authorization': f'Bearer {token}'}

Sites can only be deleted once all associated observations and forecasts are deleted, so we'll first delete the observations/forecasts and then the site.

This is the API address for the observation we created.

In [40]:
url = f'{base_url}/observations/{observation_returned.observation_id}'
url

'https://api.solarforecastarbiter.org/observations/1193ad22-22fe-11eb-83a9-0a580a8201a5'

We can add [``/metadata``](https://api.solarforecastarbiter.org/#tag/Observations/paths/~1observations~1{observation_id}~1metadata/get) to that url and confirm that this is the observation we're looking for.

In [41]:
r = requests.request('GET', f'{url}/metadata', headers=headers)
r.json()

{'_links': {'site': 'https://api.solarforecastarbiter.org/sites/1185ad1c-22fe-11eb-a505-0a580a8201a5'},
 'created_at': '2020-11-10T02:40:17+00:00',
 'extra_parameters': '',
 'interval_label': 'ending',
 'interval_length': 60,
 'interval_value_type': 'interval_mean',
 'modified_at': '2020-11-10T02:40:17+00:00',
 'name': 'sample observation',
 'observation_id': '1193ad22-22fe-11eb-83a9-0a580a8201a5',
 'provider': 'University of Arizona',
 'site_id': '1185ad1c-22fe-11eb-a505-0a580a8201a5',
 'uncertainty': 0.0,
 'variable': 'ghi'}

In [42]:
r = requests.request('DELETE', url, headers=headers)
r

<Response [204]>

A response of 204 indicates a successful deletion. Now repeat the process for the forecast and finally the site.
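Because the site can only be deleted last, the order of the requests matters. As a sketch, the delete-and-check pattern can be wrapped in one helper that stops at the first response that is not 204. `delete_in_order` is a hypothetical name, the IDs are placeholders, and the demo injects a fake `delete` function so that nothing is sent over the network.

```python
from collections import namedtuple


def delete_in_order(urls, headers, delete):
    """Send DELETE to each URL in turn; stop at the first non-204 response."""
    for url in urls:
        response = delete(url, headers=headers)
        if response.status_code != 204:
            raise RuntimeError(
                f'DELETE {url} returned {response.status_code}')
    return len(urls)


# fake transport for demonstration: records calls and always returns 204
FakeResponse = namedtuple('FakeResponse', ['status_code'])
demo_calls = []


def fake_delete(url, headers=None):
    demo_calls.append(url)
    return FakeResponse(204)


demo_base = 'https://api.solarforecastarbiter.org'
demo_urls = [
    f'{demo_base}/observations/OBSERVATION-ID',    # placeholder ID
    f'{demo_base}/forecasts/single/FORECAST-ID',   # placeholder ID
    f'{demo_base}/sites/SITE-ID',                  # placeholder ID, last
]
print(delete_in_order(demo_urls, {'Authorization': 'Bearer TOKEN'},
                      fake_delete))
```

Against the real API, `requests.delete` could be passed as the `delete` argument, together with the real URLs and the `headers` defined earlier.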

In [43]:
url = f'{base_url}/forecasts/single/{forecast_returned.forecast_id}'
r = requests.request('DELETE', url, headers=headers)
r

<Response [204]>

In [44]:
url = f'{base_url}/sites/{site_returned.site_id}'
r = requests.request('DELETE', url, headers=headers)
r

<Response [204]>