### Dendra Query Examples
Author: Collin Bode   
Email: collin@berkeley.edu

<u>Purpose</u>: Example code to pull all datastreams from one station.         
<u>Requires</u>: the dendra_api_client.py file on your Python path. Please download it from:       
https://github.com/DendraScience/dendra-api-client-python    
and place in your working directory.

Please note the following functions:   

>df = <b>dendra.get_datapoints</b>(<em>datastream_id,begins_at,ends_before=time_format(),time_type='local',name='default'</em>):   returns one datastream as a dataframe.

>df = <b>dendra.get_datapoints_from_id_list</b>(<em>datastream_id_list,begins_at,ends_before=time_format(),time_type='local'</em>):  returns one dataframe of all datastreams.  Input is a list of datastream_ids, e.g.  Permittivity_Avg = ["5d488fe302e4cd88409c2bde", "5d488fda02e4cd135e9c2bc0", "5d488fda02e4cd5ecf9c2bc2"]   

>df = <b>dendra.list_datastreams_by_measurement</b>(<em>measurement,optional:aggregate,station_id,orgslug</em>): returns a list of datastream names and ids for all datastreams that fit your query. This list can then be used in get_datapoints_from_id_list.  Measurements are a controlled vocabulary in Dendra found here https://dendra.science/vocabulary.  Select "Dendra Query Vocabularies (dq)" and use the "label" version of the measurement. It should have no spaces in the label.  
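Because IDs are passed as strings, a quick sanity check before querying can catch copy-and-paste problems such as curly quotes or missing characters. The helper below is only a sketch and is not part of dendra_api_client: `is_valid_id` is a hypothetical name, and it assumes IDs are 24-character hexadecimal strings, as in the examples above.

```python
import re

# Hypothetical helper, not part of dendra_api_client.
# Assumes IDs are 24-character hexadecimal strings, like the examples above.
_OBJECT_ID = re.compile(r'[0-9a-fA-F]{24}')

def is_valid_id(value):
    """Return True if value is a string of exactly 24 hexadecimal characters."""
    return isinstance(value, str) and _OBJECT_ID.fullmatch(value) is not None

Permittivity_Avg = [
    "5d488fe302e4cd88409c2bde",
    "5d488fda02e4cd135e9c2bc0",
    "5d488fda02e4cd5ecf9c2bc2",
]

# Collect any IDs that do not match the expected form before sending a query.
bad_ids = [i for i in Permittivity_Avg if not is_valid_id(i)]
print(bad_ids)
```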

<u>Arguments</u> common to all three functions: 

<i>datastream_id</i> and <i>station_id</i>:  these are Mongo database IDs.  They must be in quotes to be processed: 5d488fe302e4cd88409c2bde throws an error, while "5d488fe302e4cd88409c2bde" works. Use the dendra.list_datastreams functions listed below to look up datastream IDs.   

<i>begins_at</i> is an ISO 8601 compliant timestamp. 'T' is placed between date and time.  Time is hours:minutes:seconds in two digits each, e.g. '2020-02-20T00:00:00'.   The first timestamp is included in the query (>=).   

<i>ends_before</i> is optional.  It will default to the current date and time if left empty. ends_before is NOT included in the query (<).    

<i>time_type</i> is optional.  It will default to 'local' if left empty.  This means Pacific Standard Time (UTC-8 hours).  The only other option is 'utc', which requires your input time parameters to be in UTC.   
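The timestamp format and the fixed UTC-8 offset described above can be reproduced with the standard library. This is a minimal sketch: `to_timestamp` and `local_to_utc` are hypothetical helper names, not functions from dendra_api_client, and the offset is assumed to be a constant 8 hours with no daylight saving adjustment.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = '%Y-%m-%dT%H:%M:%S'

# Fixed offset for Pacific Standard Time (UTC-8), assumed constant all year.
PST = timezone(timedelta(hours=-8))

def to_timestamp(dt):
    """Format a datetime as 'YYYY-MM-DDTHH:MM:SS'."""
    return dt.strftime(ISO_FORMAT)

def local_to_utc(timestamp_local):
    """Convert a local (PST) timestamp string to the equivalent UTC string."""
    dt_local = datetime.strptime(timestamp_local, ISO_FORMAT).replace(tzinfo=PST)
    return to_timestamp(dt_local.astimezone(timezone.utc))

begins_at = to_timestamp(datetime(2020, 2, 20))
print(begins_at)                # 2020-02-20T00:00:00
print(local_to_utc(begins_at))  # 2020-02-20T08:00:00
```

Converting by hand like this is only needed when you want to pass time_type='utc'; with the default 'local' setting the local timestamp is used directly.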

<u>Functions to list</u> Organizations, Stations, and Datastreams:   
- list_organizations(orgslug='all')   
- get_organization_id(orgslug)   
- list_stations(orgslug='all',query_add='none')   
- list_datastreams_by_station_id(station_id,query_add = '')   
- list_datastreams_by_query(query_add = '',station_id = '')   
- list_datastreams_by_medium_variable(medium = '',variable = '',aggregate = '', station_id = '', orgslug = '', query_add = '')   
- list_datastreams_by_measurement(measurement = '',aggregate = '', station_id = [], orgslug = '', query_add = '')   


In [1]:
%matplotlib inline
import pandas as pd
import json
import os
import sys
path_to_git = '../dendra-api-client-python/'  # <-- Please change this to the location where you cloned the GitHub repository
sys.path.append(path_to_git)
import dendra_api_client as dendra

In [3]:
# Authentication
# If you have a login and the data is not public, you must authenticate using your Dendra login
dendra.authenticate('collin@berkeley.edu')

 ········


### Parameters: start and end dates

In [5]:
# parameters: start and end time
# Note: queries default to local time. Add a capital 'Z' to the end of the timestamp to indicate UTC; many functions also have a parameter for local vs. utc
begins_at = '2024-05-01T00:00:00'  
ends_before = dendra.time_format() # time_format without argument gives current datetime. #'2020-03-01T00:00:00'
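If you would rather pull a rolling window than a fixed start date, begins_at can be computed relative to the current time. The sketch below uses a hypothetical helper, `days_ago`, which is not part of dendra_api_client. It relies on your computer's clock, which may not be set to Pacific Standard Time.

```python
from datetime import datetime, timedelta

def days_ago(n_days, now=None):
    """Return midnight n_days before `now` as 'YYYY-MM-DDTHH:MM:SS'."""
    now = now or datetime.now()  # machine-local clock, an assumption
    start = (now - timedelta(days=n_days)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return start.strftime('%Y-%m-%dT%H:%M:%S')

begins_at = days_ago(30)  # start of the day 30 days before today
print(begins_at)
```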

### List stations for one organization, e.g. UC Natural Reserve System

In [7]:
# Output is a JSON list with id, name, and web slug
# easiest to query stations by id
print('UCNRS Weather Station List')
stations = dendra.list_stations('ucnrs')
for station in stations:
    print(station)

UCNRS Weather Station List
{'_id': '58e68cabdf5ce600012602b3', 'name': 'Angelo South Meadow', 'slug': 'angelo-south-meadow'}
{'_id': '58e68cabdf5ce600012602b4', 'name': 'Anza Borrego HQ', 'slug': 'anza-borrego'}
{'_id': '5ed6f71d5ceeff3f92a505b0', 'name': 'Anza Borrego Sentenac', 'slug': 'anza-borrego-sentenac'}
{'_id': '630ffc7fbcd7f8d98fe03a37', 'name': 'Año Nuevo', 'slug': 'ano-nuevo'}
{'_id': '58e68cabdf5ce600012602b5', 'name': 'BigCreek Gatehouse', 'slug': 'bigcreek-gatehouse'}
{'_id': '58e68cabdf5ce600012602b6', 'name': 'BigCreek Highlands', 'slug': 'bigcreek-highlands'}
{'_id': '58e68cabdf5ce600012602b7', 'name': 'BigCreek Whale', 'slug': 'bigcreek-whale'}
{'_id': '58e68cabdf5ce600012602b8', 'name': 'Blue Oak Ranch', 'slug': 'blue-oak-ranch'}
{'_id': '58e68cabdf5ce600012602b9', 'name': 'Bodega', 'slug': 'bodega'}
{'_id': '58e68cabdf5ce600012602ba', 'name': 'Burns', 'slug': 'burns'}
{'_id': '5e5d4f9257a6e41c0adeb739', 'name': 'Carpinteria', 'slug': 'carpinteria'}
{'_id': '58e68ca
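Each entry in the station list is a plain dictionary, so the ID for a station can be looked up by its slug instead of copied by hand. This is a sketch with a hypothetical helper, `find_station_id`; the sample entries are copied from the output above so that it runs without a connection.

```python
# Sample entries copied from the output above. In practice use:
# stations = dendra.list_stations('ucnrs')
stations = [
    {'_id': '58e68cabdf5ce600012602b3', 'name': 'Angelo South Meadow', 'slug': 'angelo-south-meadow'},
    {'_id': '58e68cabdf5ce600012602b8', 'name': 'Blue Oak Ranch', 'slug': 'blue-oak-ranch'},
    {'_id': '58e68cabdf5ce600012602b9', 'name': 'Bodega', 'slug': 'bodega'},
]

def find_station_id(stations, slug):
    """Return the '_id' of the station with the given slug, or None if absent."""
    for station in stations:
        if station.get('slug') == slug:
            return station['_id']
    return None

station_id = find_station_id(stations, 'blue-oak-ranch')
print(station_id)  # 58e68cabdf5ce600012602b8
```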

### Download all datastream data for one weather station

In [9]:
station_id = '58e68cacdf5ce600012602d9'  # 'Stunt Ranch'
# def get_datapoints_from_station_id(station_id,begins_at,ends_before=time_format(),time_type='local'):
# Returns a dataframe with ALL datastreams associated with a particular station for the time period 
df = dendra.get_datapoints_from_station_id(station_id,begins_at,ends_before)

0 StuntRanch_Air_Temp_Max NEW dataframe created!
1 StuntRanch_Air_Temp_Delta_10m2m_Min added.
2 StuntRanch_Air_Temp_2_m_Max added.
3 StuntRanch_Air_Temp_10_m_Avg added.
4 StuntRanch_Air_Temp_Min added.
5 StuntRanch_Air_Temp_Delta_10m2m_Avg added.
6 StuntRanch_Battery_Voltage_Avg added.
7 StuntRanch_Air_Temp_Avg added.
8 StuntRanch_Barometric_Pressure_Avg added.
9 StuntRanch_Air_Temp_2_m_Avg added.
10 StuntRanch_Air_Temp_Delta_10m2m_Max added.
11 StuntRanch_Air_Temp_10_m_Max added.
12 StuntRanch_Air_Temp_10_m_Min added.
13 StuntRanch_Air_Temp_2_m_Min added.
14 StuntRanch_Battery_Voltage_Min added.
15 StuntRanch_Logger_Panel_Temp_Avg added.
16 StuntRanch_Rainfall_Cumulative added.
17 StuntRanch_Battery_Voltage_Max added.
18 StuntRanch_Photosynthetically_Active_Radiation_Max added.
19 StuntRanch_Rainfall added.
20 StuntRanch_Photosynthetically_Active_Radiation_Min added.
21 StuntRanch_Rain_Gauge_Temp_Avg added.
22 StuntRanch_Photosynthetically_Active_Radiation_Avg added.
23 StuntRanch_Rel

In [10]:
# Take a look at the dataframe
df

timestamp_local,timestamp_utc,StuntRanch_Air_Temp_Max,StuntRanch_Air_Temp_Delta_10m2m_Min,StuntRanch_Air_Temp_2_m_Max,StuntRanch_Air_Temp_10_m_Avg,StuntRanch_Air_Temp_Min,StuntRanch_Air_Temp_Delta_10m2m_Avg,StuntRanch_Battery_Voltage_Avg,StuntRanch_Air_Temp_Avg,StuntRanch_Barometric_Pressure_Avg,...,StuntRanch_Volumetric_Water_Content_Period_Horizontal_Avg,StuntRanch_Volumetric_Water_Content_Horizontal_Avg,StuntRanch_Wind_Direction_Avg,StuntRanch_Wind_Speed_Avg,StuntRanch_Volumetric_Water_Content_Vertical_Avg,StuntRanch_Wind_Direction_Std,StuntRanch_Wind_Speed_Max,StuntRanch_Wind_Speed_Std,StuntRanch_Wind_Speed_Min,StuntRanch_Wind_Speed_Vector_Magnitude
2024-05-01 00:00:00,2024-05-01 08:00:00+00:00,11.82,0.390,11.82,12.23,11.62,0.656,12.70,11.73,965.779661,...,27.89,0.302,91.20,0.432,0.477,17.92,1.143,0.322,0.000,0.405
2024-05-01 00:10:00,2024-05-01 08:10:00+00:00,11.65,0.136,11.47,11.75,11.22,0.584,12.69,11.36,965.762712,...,27.88,0.302,267.00,0.435,0.477,27.31,1.143,0.397,0.000,0.360
2024-05-01 00:20:00,2024-05-01 08:20:00+00:00,11.29,0.373,11.26,11.61,11.02,0.618,12.65,11.17,965.728814,...,27.87,0.302,66.71,0.071,0.477,24.55,0.588,0.150,0.000,0.042
2024-05-01 00:30:00,2024-05-01 08:30:00+00:00,11.06,0.322,10.95,11.42,10.75,0.680,12.64,10.90,965.745763,...,27.86,0.301,284.60,0.501,0.476,13.26,1.143,0.331,0.000,0.484
2024-05-01 00:40:00,2024-05-01 08:40:00+00:00,10.82,0.356,10.84,11.26,10.69,0.619,12.67,10.75,965.813559,...,27.85,0.301,241.40,0.509,0.476,29.85,1.111,0.316,0.000,0.423
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-07-11 16:20:00,2024-07-12 00:20:00+00:00,36.18,-1.869,36.32,34.35,35.68,-1.365,12.99,35.95,965.169492,...,21.72,0.127,259.20,2.401,0.289,22.81,4.312,1.007,0.000,2.210
2024-07-11 16:30:00,2024-07-12 00:30:00+00:00,36.36,-2.128,36.46,34.36,35.79,-1.560,13.00,36.14,965.169492,...,21.73,0.127,265.00,2.150,0.289,24.19,4.377,1.042,0.163,1.958
2024-07-11 16:40:00,2024-07-12 00:40:00+00:00,36.57,-1.950,36.49,34.65,36.13,-1.329,13.00,36.37,965.237288,...,21.74,0.127,283.10,1.219,0.289,28.31,3.920,0.882,0.000,1.059
2024-07-11 16:50:00,2024-07-12 00:50:00+00:00,36.70,-2.320,36.64,34.40,35.97,-1.349,13.01,36.36,965.288136,...,21.74,0.128,285.20,1.246,0.289,30.05,2.940,0.647,0.000,1.070
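The station data arrives at 10-minute intervals, which is often finer than an analysis needs. One possible way to reduce it to daily values is pandas `resample`, sketched here on a small synthetic dataframe. The column name and values are made up, and the sketch assumes the index can be converted to datetimes.

```python
import pandas as pd

# Synthetic stand-in for the station dataframe: 10-minute values over two days.
index = pd.date_range('2024-05-01 00:00:00', periods=288, freq='10min',
                      name='timestamp_local')
df_demo = pd.DataFrame({'Demo_Air_Temp_Avg': range(288)}, index=index)

# resample() needs a DatetimeIndex; convert in case the index holds strings.
df_demo.index = pd.to_datetime(df_demo.index)

# One row per calendar day, holding the mean of that day's readings.
daily = df_demo.resample('D').mean()
print(daily)
```

A mean suits quantities such as temperature; for a _Max or _Min column, `.max()` or `.min()` would be the more natural aggregate.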


In [None]:
df.to_csv('stuntranch_export_wy2019.csv')  # export to disk
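To reload an exported file later, `pd.read_csv` can restore the timestamp index. The sketch below round-trips a small synthetic dataframe through a temporary file; the file name, column name, and values are placeholders, not real Dendra output.

```python
import os
import tempfile
import pandas as pd

# Small synthetic dataframe standing in for the exported station data.
index = pd.date_range('2024-05-01', periods=3, freq='10min', name='timestamp_local')
df_demo = pd.DataFrame({'Demo_Air_Temp_Avg': [11.73, 11.36, 11.17]}, index=index)

path = os.path.join(tempfile.mkdtemp(), 'demo_export.csv')
df_demo.to_csv(path)

# index_col=0 restores the first column as the index;
# parse_dates=True converts that index back to datetimes.
df_back = pd.read_csv(path, index_col=0, parse_dates=True)
print(df_back.index.dtype)
```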

### List Datastreams by Measurement
Optional.  If you wish to pull only one kind of measurement, say 'RainfallCumulative', from many locations, you can list all datastreams that record that measurement. To see which measurements exist, check the vocabulary under "Dendra Query Vocabularies (dq)":  https://dendra.science/vocabulary    

In [None]:
measurement = 'RainfallCumulative'  
query_refinement = { 'is_hidden': False } 
measurement_list = []   # list of only datastreams that you wish to download data from
ds_list = dendra.list_datastreams_by_measurement(measurement,'',[],'ucnrs',query_refinement)
for ds in ds_list:
    dsm = dendra.get_meta_datastream_by_id(ds['_id'])  # This will pull full datastream metadata in JSON format
    station_name = dsm['station_lookup']['name']
    print(station_name,ds['name'],ds['_id'])
    measurement_list.append(ds['_id'])
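When a measurement matches several datastreams at the same station, it can help to group the IDs by station before downloading. The sketch below does this in plain Python. The records are hypothetical placeholders in the same (station name, datastream name, datastream id) shape that the loop above prints.

```python
from collections import defaultdict

# Hypothetical records: (station name, datastream name, datastream id).
# The names and ids are placeholders, not real Dendra entries.
records = [
    ('Station A', 'StationA_Rainfall_Cumulative', '000000000000000000000001'),
    ('Station B', 'StationB_Rainfall_Cumulative', '000000000000000000000002'),
    ('Station A', 'StationA_Rainfall_Cumulative_Backup', '000000000000000000000003'),
]

# Map each station name to the list of datastream ids found there.
ids_by_station = defaultdict(list)
for station_name, ds_name, ds_id in records:
    ids_by_station[station_name].append(ds_id)

for station_name, ids in sorted(ids_by_station.items()):
    print(station_name, len(ids), ids)
```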
    

### Download data for RainfallCumulative
The list of datastream IDs will be fed to 'get_datapoints_from_id_list', which will pull all data for the date range given earlier.   

In [None]:
# See parameters above for date ranges
df = dendra.get_datapoints_from_id_list(measurement_list,begins_at,ends_before)

In [None]:
# check columns
for col in df.columns:
    print(col)
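Once the column names are known, a subset can be selected by substring with pandas `DataFrame.filter`. This is a sketch on a synthetic dataframe; the column names and values are made up for illustration.

```python
import pandas as pd

# Synthetic dataframe with made-up column names and values.
df_demo = pd.DataFrame({
    'StationA_Rainfall_Cumulative': [0.0, 0.2, 0.5],
    'StationA_Air_Temp_Avg': [11.7, 11.4, 11.2],
    'StationB_Rainfall_Cumulative': [1.0, 1.0, 1.3],
})

# Keep only the columns whose name contains the substring.
rain = df_demo.filter(like='Rainfall_Cumulative')
print(list(rain.columns))
```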

In [None]:
# Take a look at the full DataFrame
df

In [None]:
# export to disk
df.to_csv('rainfallcumulative_measurement_export_wy2019.csv') 