### Dendra Query Examples
Author: Collin Bode   
Email: collin@berkeley.edu   
Created: 2019-10-23    
Modified: 2019-10-28  updated to work with significant changes in dendra_api_client.py.   
Modified: 2020-03-27  Collin: updated code to work with next iteration of dendra_api_client.py.    
Modified: 2020-09-22  Collin: changed code to download wells specifically. 
Modified: 2020-11-24  Collin: reverted to generic one station download.

<u>Purpose</u>: Example code to pull all datastreams from one station.         
<u>Requires</u>: dendra_api_client.py must be importable from the Python path. Download it from:       
https://github.com/DendraScience/dendra-api-client-python    

Please note the following functions:   

>df = <b>dendra.get_datapoints</b>(<em>datastream_id,begins_at,ends_before=time_format(),time_type='local',name='default'</em>):   returns one datastream as a dataframe.

>df = <b>dendra.get_datapoints_from_id_list</b>(<em>datastream_id_list,begins_at,ends_before=time_format(),time_type='local'</em>):  returns one dataframe of all datastreams.  Input is a list of datastream_ids, e.g.  Permittivity_Avg = ["5d488fe302e4cd88409c2bde", "5d488fda02e4cd135e9c2bc0", "5d488fda02e4cd5ecf9c2bc2"]   

>df = <b>dendra.list_datastreams_by_measurement</b>(<em>measurement,optional:aggregate,station_id,orgslug</em>): returns a list of datastream names and ids for all datastreams that fit your query. This list can then be used in get_datapoints_from_id_list.   

<u>Arguments</u> common to all three functions: 

<i>datastream_id</i> and <i>station_id</i>:  these are Mongo database IDs.  They must be passed as quoted strings. The bare value 5d488fe302e4cd88409c2bde throws an error; "5d488fe302e4cd88409c2bde" works.   
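
Because the IDs are passed as strings, a quick format check can catch a truncated or mistyped ID before a request is sent. The sketch below assumes the IDs are standard MongoDB ObjectIds (24 hexadecimal characters); `looks_like_object_id` is a hypothetical helper written for this example and is not part of dendra_api_client.py.

```python
import re

# Assumption: Dendra IDs are standard MongoDB ObjectIds, i.e. 24 hex characters.
OBJECT_ID_PATTERN = re.compile(r"[0-9a-fA-F]{24}")

def looks_like_object_id(value):
    """Return True if value is a string of exactly 24 hexadecimal characters."""
    return isinstance(value, str) and OBJECT_ID_PATTERN.fullmatch(value) is not None

print(looks_like_object_id("5d488fe302e4cd88409c2bde"))  # quoted ID from the text
print(looks_like_object_id("5d488fe302e4cd88409c2bd"))   # one character short
```

A check like this only validates the shape of the string. It cannot tell whether the ID actually exists in Dendra.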

<i>begins_at</i> is an ISO 8601-compliant timestamp. 'T' is placed between date and time.  Time is hours:minutes:seconds in two digits, e.g. '2020-02-20T00:00:00'.   The first timestamp is included in the query (>=).   

<i>ends_before</i> is optional.  If left empty, it defaults to the current date and time, as returned by time_format(). ends_before is NOT included in the query (<).    
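
Together the two bounds describe a half-open interval: begins_at is included and ends_before is excluded. The following sketch illustrates that rule with the standard library only; `in_query_window` is a hypothetical helper written for this example and does not call the Dendra API.

```python
from datetime import datetime

def in_query_window(timestamp, begins_at, ends_before):
    """Return True if begins_at <= timestamp < ends_before (ISO 8601 strings)."""
    t = datetime.fromisoformat(timestamp)
    return datetime.fromisoformat(begins_at) <= t < datetime.fromisoformat(ends_before)

# isoformat() produces the 'T'-separated format described above.
begins_at = datetime(2020, 2, 20).isoformat()    # '2020-02-20T00:00:00'
ends_before = datetime(2020, 2, 21).isoformat()  # '2020-02-21T00:00:00'

print(in_query_window('2020-02-20T00:00:00', begins_at, ends_before))  # begins_at itself: included
print(in_query_window('2020-02-20T23:50:00', begins_at, ends_before))  # inside the window
print(in_query_window('2020-02-21T00:00:00', begins_at, ends_before))  # ends_before itself: excluded
```

One consequence of this convention is that consecutive windows can share a boundary (the ends_before of one query is the begins_at of the next) without any timestamp being downloaded twice.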

<i>time_type</i> is optional.  It will default to 'local' if left empty.  This means Pacific Standard Time (UTC-8 hours).  The only other option is 'utc', which requires that your input time parameters be in UTC.   
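
If your timestamps are in local time but you want to query with time_type='utc', the inputs need to be shifted first. Here is a minimal sketch, assuming the fixed UTC-8 offset described above with no daylight saving adjustment; `local_to_utc` is a hypothetical helper written for this example and is not part of dendra_api_client.py.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 'local' means a fixed UTC-8 offset (Pacific Standard Time).
PST = timezone(timedelta(hours=-8))

def local_to_utc(local_timestamp):
    """Convert a naive ISO 8601 timestamp in PST to a naive ISO 8601 timestamp in UTC."""
    local = datetime.fromisoformat(local_timestamp).replace(tzinfo=PST)
    return local.astimezone(timezone.utc).replace(tzinfo=None).isoformat()

print(local_to_utc('2020-02-20T00:00:00'))  # 2020-02-20T08:00:00
```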

In [None]:
%matplotlib inline
import pandas as pd
import json
import os
import sys
path_to_git = '../../dendra-api-client-python/'  # <-- Please change this to match the location you have pulled github
sys.path.append(path_to_git)
import dendra_api_client as dendra

### Parameters: start and end dates

In [None]:
# parameters: start and end time
begins_at = '2019-10-01T00:00:00' 
ends_before = dendra.time_format()  # time_format() without an argument gives the current datetime; a fixed value such as '2020-03-01T00:00:00' also works

### List stations for the UC Natural Reserve System (not required)

In [None]:
# Output is a JSON list with id, name, and web slug
# easiest to query stations by id
print('UCNRS Weather Station List')
stations = dendra.list_stations('ucnrs')
for station in stations:
    print(station)

## Download all datastreams for one weather station

In [None]:
station_id = '58e68cacdf5ce600012602d9'  # 'Stunt Ranch'
# def get_datapoints_from_station_id(station_id,begins_at,ends_before=time_format(),time_type='local'):
# Returns a dataframe with ALL datastreams associated with a particular station for the time period 
df = dendra.get_datapoints_from_station_id(station_id,begins_at,ends_before)

In [None]:
# Take a look at the dataframe
df

In [None]:
df.to_csv('stuntranch_export_wy2019.csv')  # export to disk

## List Datastreams by Measurement
Optional.  If you wish to pull only one kind of measurement, say 'RainfallCumulative' from many locations, you can list all datastreams which perform that measurement. To see what measurements exist, check our vocabulary under 'DQ' or Dendra Queries:  https://dendra.science/vocabulary    

In [None]:
measurement = 'RainfallCumulative'  
query_refinement = { 'is_hidden': False } 
measurement_list = []   # list of only datastreams that you wish to download data from
ds_list = dendra.list_datastreams_by_measurement(measurement,'',[],'ucnrs',query_refinement)
for ds in ds_list:
    dsm = dendra.get_meta_datastream_by_id(ds['_id'])  # This will pull full datastream metadata in JSON format
    station_name = dsm['station_lookup']['name']
    print(station_name,ds['name'],ds['_id'])
    measurement_list.append(ds['_id'])
    

### Pull data for RainfallCumulative
The list of datastream IDs will be fed to 'get_datapoints_from_id_list', which will pull all data for the date range given earlier.   

In [None]:
# See parameters above for date ranges
df = dendra.get_datapoints_from_id_list(measurement_list,begins_at,ends_before)

In [None]:
# check columns
for col in df.columns:
    print(col)

In [None]:
# Take a look at the full DataFrame
df

In [None]:
df.to_csv('rainfallcumulative_measurement_export_wy2019.csv')  # export to disk