## Data Exploration

#### Objectives
- Extract snow pits from Grand Mesa.
- Filter by site conditions: no tree canopy and, possibly, low snow variance.
- Plot the snow pits.

In [8]:
# standard imports
import numpy as np
import geopandas as gpd
import matplotlib
import matplotlib.pyplot as plt 
import datetime

# some mapping widgets
import ipyleaflet
from ipyleaflet import Map, GeoData, Rectangle, basemaps, LayersControl, basemap_to_tiles, TileLayer, SplitMapControl, Polygon, MagnifyingGlass
import ipywidgets

# database imports
from snowexsql.db import get_db
from snowexsql.data import PointData, LayerData, ImageData, SiteData
from snowexsql.conversions import query_to_geopandas, query_to_pandas

In [9]:
# load the database
db_name = 'snow:hackweek@db.snowexdata.org/snowex'
engine, session = get_db(db_name)

print('SnowEx Database successfully loaded!')

SnowEx Database successfully loaded!


#### Querying the column names in the database tables

In [10]:
# Import the classes reflecting the image, point, site and layer tables in the db
from snowexsql.data import ImageData, PointData, SiteData, LayerData

# Import the function to investigate a table
from snowexsql.db import get_table_attributes

# Use the function to see what columns are available to use. 
db_columns = get_table_attributes(ImageData)
db_cols_points = get_table_attributes(PointData)
db_cols_sites = get_table_attributes(SiteData)
db_cols_layers = get_table_attributes(LayerData)

# Print out the results nicely
print("\n Image Data: \n")
print("These are the available columns in the table:\n \n* {}\n".format('\n* '.join(db_columns)))
print("\n Points Data: \n")
print("These are the available columns in the table:\n \n* {}\n".format('\n* '.join(db_cols_points)))
print("\n Site Data: \n")
print("These are the available columns in the table:\n \n* {}\n".format('\n* '.join(db_cols_sites)))
print("\n Layer Data: \n")
print("These are the available columns in the table:\n \n* {}\n".format('\n* '.join(db_cols_layers)))



 Image Data: 

These are the available columns in the table:
 
* date
* date_accessed
* description
* doi
* instrument
* metadata
* observers
* raster
* registry
* site_name
* time_created
* time_updated
* type
* units


 Points Data: 

These are the available columns in the table:
 
* date
* date_accessed
* doi
* easting
* elevation
* equipment
* geom
* instrument
* latitude
* longitude
* metadata
* northing
* observers
* registry
* site_id
* site_name
* time
* time_created
* time_updated
* type
* units
* utm_zone
* value
* version_number


 Site Data: 

These are the available columns in the table:
 
* air_temp
* aspect
* date
* date_accessed
* doi
* easting
* elevation
* geom
* ground_condition
* ground_roughness
* ground_vegetation
* latitude
* longitude
* metadata
* northing
* pit_id
* precip
* registry
* site_id
* site_name
* site_notes
* sky_cover
* slope_angle
* time
* time_created
* time_updated
* total_depth
* tree_canopy
* utm_zone
* vegetation_height
* weather_description


#### Querying the SiteData for snow pit locations from Grand Mesa

In [11]:
# query the distinct site ids
qry = session.query(SiteData.site_id).distinct()

# keep only the sites whose site_name is 'Grand Mesa'
qry = qry.filter(SiteData.site_name == 'Grand Mesa')

# second filter: keep only the open-canopy sites
qry = qry.filter(SiteData.tree_canopy == 'No Trees')

# execute the query
ts_sites = qry.all()

# clean up to print a list of sites
ts_sites = [s[0] for s in ts_sites]
ts_sites_str = ", ".join(ts_sites)
print('list of Snow Pit sites:\n', ts_sites_str)

list of Snow Pit sites:
 3S47, 6C10, 3S52, 4N2, 2S9, 2S27, 6C34, 1C1, 6C24, 2S4, 5C27, 2S7, 6N36, 2S25, Skyway Open, 6S22, 2S6, FL1B, 9N42, 1N1, 1N3, 1N6, 9N29, 2S11, 2N8, 1C5, 1N7, 6N31, 3S33, 5N24, 3S5, 2S10, 2S45, 5C21, 5S24, 2N49, 2N14, 3N53, 5N19, 1C7, 2C9, 7N40, 2N4, 1C14, 2C2, County Line Open, 6N18, 5C20, 5N10, 2N21, 1N5, 6S32, 2S20, 1S12, 6S44, 2S48, 9N44, 1S1, 2C3, 6S53, 3S38, 2C13, 5N11, 6S34, 6N16, 5S21, 2C12, 2N12, 5S42, Mesa West Open, 1S2, 3N22, 2N48, 2S3, 5S29, 8N34, 2C4, 2C6, 2S37, 2S16, 1S13, 1S17, 2S35, 1C8, 2S36, FL2A, 1N23, 5S31, 6S26, 2S46, 5N15, 1N20, 2C33, 6N46, 3S14, 2N13, 3N26, 5N32, 1S8


#### Querying LayerData for pit ids from Grand Mesa

In [12]:
# query the pit ids from the layer data
qry = session.query(LayerData.pit_id)

# keep only the Grand Mesa layers that belong to the open-canopy sites found above
qry = qry.filter(LayerData.site_name == "Grand Mesa")
qry = qry.filter(LayerData.site_id.in_(ts_sites))

# execute the query
pit_sites = qry.distinct().all()

# clean up to print a list of pit ids
pit_sites = [s[0] for s in pit_sites]
print('list of Snow Pit sites:\n', pit_sites)

list of Snow Pit sites:
 ['COGM1C14_20200131', 'COGM1C1_20200131', 'COGM1C1_20200208', 'COGM1C5_20200212', 'COGM1C7_20200131', 'COGM1C8_20200131', 'COGM1N1_20200208', 'COGM1N20_20200205', 'COGM1N23_20200211', 'COGM1N3_20200211', 'COGM1N5_20200211', 'COGM1N6_20200128', 'COGM1N7_20200211', 'COGM1S12_20200211', 'COGM1S13_20200205', 'COGM1S17_20200208', 'COGM1S1_20200129', 'COGM1S2_20200208', 'COGM1S8_20200201', 'COGM2C12_20200212', 'COGM2C13_20200212', 'COGM2C2_20200131', 'COGM2C33_20200130', 'COGM2C3_20200131', 'COGM2C4_20200131', 'COGM2C6_20200131', 'COGM2C9_20200131', 'COGM2N12_20200131', 'COGM2N13_20200206', 'COGM2N14_20200211', 'COGM2N21_20200211', 'COGM2N48_20200201', 'COGM2N49_20200210', 'COGM2N4_20200128', 'COGM2N8_20200208', 'COGM2S10_20200205', 'COGM2S11_20200201', 'COGM2S16_20200208', 'COGM2S20_20200206', 'COGM2S25_20200129', 'COGM2S27_20200204', 'COGM2S35_20200130', 'COGM2S36_20200129', 'COGM2S37_20200201', 'COGM2S3_20200129', 'COGM2S45_20200210', 'COGM2S46_20200204', 'COGM2S4

#### Using Engine Object to run the query

In [35]:
# Form a typical SQL query and use python to insert the pit id
pit_id = pit_sites[0]

qry = "SELECT DISTINCT site_id, site_name, pit_id, depth, bottom_depth, value, date FROM layers where pit_id='"+pit_id+"' and type = 'density' ORDER BY depth"
print(qry)
#SELECT DISTINCT depth, bottom_depth, value , date 
#FROM public.layers WHERE pit_id = 'COGM8C22_20200131' and type = 'density' ORDER BY depth;

# Then we execute the sql command and collect the results
results = engine.execute(qry)
df = query_to_pandas(results,engine)
# Create a nice readable string to print the site names using python 
out = ', '.join((row['site_id'] for row in results))

# Print it with a line return for readability
print(out + '\n')

SELECT DISTINCT site_id, site_name, pit_id, depth, bottom_depth, value, date FROM layers where pit_id='COGM1C14_20200131' and type = 'density' ORDER BY depth


AttributeError: 'LegacyCursorResult' object has no attribute 'statement'
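
The traceback above suggests that `query_to_pandas` expects a query object with a `.statement` attribute rather than the cursor returned by `engine.execute`. One possible way to get a DataFrame from a raw SQL string is `pandas.read_sql_query`, which also accepts bound parameters so the pit id does not have to be concatenated into the string. The sketch below is only an illustration: it uses an in-memory SQLite table as a stand-in for the `layers` table, and the rows are made up. With the SnowEx database, the SQLAlchemy `engine` would presumably be passed in place of `conn`, using the placeholder style of its driver.

```python
import sqlite3

import pandas as pd

# Stand-in for the SnowEx `layers` table (assumption: made-up rows, reduced column set)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE layers (site_id TEXT, pit_id TEXT, type TEXT, "
    "depth REAL, bottom_depth REAL, value TEXT)"
)
conn.executemany(
    "INSERT INTO layers VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("1C14", "COGM1C14_20200131", "density", 20.0, 10.0, "250.0"),
        ("1C14", "COGM1C14_20200131", "density", 10.0, 0.0, "300.0"),
        ("1C14", "COGM1C14_20200131", "temperature", 10.0, None, "-3.5"),
        ("1C1", "COGM1C1_20200131", "density", 10.0, 0.0, "280.0"),
    ],
)

# The `?` placeholders are filled from `params`, so no string concatenation is needed
sql = """
    SELECT DISTINCT site_id, pit_id, depth, bottom_depth, value
    FROM layers
    WHERE pit_id = ? AND type = ?
    ORDER BY depth
"""
df_raw = pd.read_sql_query(sql, conn, params=("COGM1C14_20200131", "density"))

# The stand-in table stores values as text, so cast them before doing any math
df_raw["value"] = df_raw["value"].astype("float64")
print(df_raw)
```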

#### Using the Session Object

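Passing the result of `engine.execute` to `query_to_pandas` fails: judging by the error, the function expects a query object with a `.statement` attribute, not a result cursor. A query built with the Session Object should meet that expectation, so let's try that instead!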

In [13]:
# Use the first pit as an example
pit_id = pit_sites[0]

# Density layers for this pit
result = session.query(LayerData.id, LayerData.site_name, LayerData.site_id, LayerData.pit_id, LayerData.date, LayerData.longitude, LayerData.latitude,
                       LayerData.type, LayerData.depth, LayerData.bottom_depth, LayerData.value)
result = result.filter(LayerData.pit_id == pit_id)
result = result.filter(LayerData.type == 'density')
result = result.distinct()
df_density = query_to_pandas(result, engine)
# copy the values into a named column and cast them to float
df_density['density'] = df_density['value'].astype("float64")
df_density = df_density.sort_values(by='depth')
print('Density DataFrame:\n', df_density)

# Temperature measurements for the same pit
result2 = session.query(LayerData.site_name, LayerData.site_id, LayerData.pit_id, LayerData.date, LayerData.longitude, LayerData.latitude,
                        LayerData.type, LayerData.depth, LayerData.bottom_depth, LayerData.value)
result2 = result2.filter(LayerData.pit_id == pit_id)
result2 = result2.filter(LayerData.type == 'temperature')
df_temp = query_to_pandas(result2, engine)
df_temp['temperature'] = df_temp['value'].astype("float64")
df_temp = df_temp.sort_values(by='depth')
print('Temperature DataFrame:\n', df_temp)

# Volumetric liquid water content (LWC) for the same pit
result3 = session.query(LayerData.site_name, LayerData.site_id, LayerData.pit_id, LayerData.date, LayerData.longitude, LayerData.latitude,
                        LayerData.type, LayerData.depth, LayerData.bottom_depth, LayerData.value)
result3 = result3.filter(LayerData.pit_id == pit_id)
result3 = result3.filter(LayerData.type == 'lwc_vol')
df_lwc = query_to_pandas(result3, engine)
df_lwc['lwc'] = df_lwc['value'].astype("float64")
df_lwc = df_lwc.sort_values(by='depth')
print('LWC DataFrame:\n', df_lwc)

Density DataFrame:
       id   site_name site_id             pit_id        date   longitude  \
6   4780  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
5   4779  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
12  8671  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
4   4778  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
11  8670  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
3   4777  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
10  8669  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
2   4776  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
9   8668  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
1   4775  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
8   8667  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
0   4774  Grand Mesa    1C14  COGM1C14_20200131  2020-01-31 -108.198415   
7   8

I now have three DataFrame objects, which hold the density, temperature and LWC values, respectively, for the layers of a single pit.

What I want to do next:
- Find this information for all snow pits with no trees at the Grand Mesa site.
- Combine the three DataFrames into one that has three extra columns holding the density, temperature and LWC values.
- Also, create a new column that contains the total depth of each snow pit.
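
The last two items in this list could be sketched in pandas roughly as follows. This is only an illustration under stated assumptions: the rows are made up, the three per-type results are assumed to be stacked into one long table with the `pit_id`, `type`, `depth` and `value` columns seen above, and the total pit depth is assumed to be the largest `depth` value within each pit. Where measurement depths do not coincide across types, the pivot leaves `NaN` in the unmatched cells.

```python
import pandas as pd

# Illustrative long-format rows (made-up values) mimicking a subset of the LayerData columns
layers = pd.DataFrame(
    {
        "pit_id": ["COGM1C14_20200131"] * 6,
        "type": ["density", "density", "temperature", "temperature", "lwc_vol", "lwc_vol"],
        "depth": [20.0, 10.0, 20.0, 10.0, 20.0, 10.0],
        "value": ["250.0", "300.0", "-5.0", "-3.0", "0.1", "0.2"],
    }
)
layers["value"] = layers["value"].astype("float64")

# One row per (pit, depth) and one column per measurement type
wide = (
    layers.pivot_table(index=["pit_id", "depth"], columns="type", values="value", aggfunc="mean")
    .rename(columns={"lwc_vol": "lwc"})
    .reset_index()
)
wide.columns.name = None  # drop the leftover "type" label on the column axis

# Assumed definition: total pit depth is the largest depth value in each pit
wide["total_depth"] = wide.groupby("pit_id")["depth"].transform("max")
print(wide)
```

The same pivot should extend to many pits, since `pit_id` is part of the index and the `groupby` computes the total depth per pit.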

In [None]:
-- Draft of the SQL query I believe would return the data
SELECT l1.site_id, l1.site_name, l1.pit_id, l1.longitude, l1.latitude, l1.date, l1.depth, l1.bottom_depth,
       l1.value AS density, l2.depth AS temp_depth, l2.value AS temperature
FROM layers l1 JOIN layers l2 ON l1.pit_id = l2.pit_id
WHERE l1.type = 'density'
  AND l2.type = 'temperature'
  AND l2.depth > l1.bottom_depth
  AND l2.depth <= l1.depth

-- (Still needs the Site table filters: site name 'Grand Mesa' and tree canopy 'No Trees')

-- Open questions:
-- How to join tables using the Session Object?
-- How to convert the Engine Object result to a DataFrame?
-- Or drop the SQL approach and merge the data in Python by pivoting the tables?
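
On the question of joining with the Session Object: SQLAlchemy's `aliased` lets one table appear twice in a query, which is what the density/temperature self-join needs. The sketch below is a minimal illustration, not the snowexsql schema: `Layer` is a hypothetical stand-in for `LayerData`, the database is in-memory SQLite, and the rows are made up. It matches each temperature reading to the density layer whose depth range contains it, within the same pit. The Site table filters are not covered here.

```python
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class Layer(Base):
    """Hypothetical, minimal stand-in for snowexsql's LayerData."""

    __tablename__ = "layers"
    id = Column(Integer, primary_key=True)
    pit_id = Column(String)
    type = Column(String)
    depth = Column(Float)
    bottom_depth = Column(Float)
    value = Column(String)


engine = create_engine("sqlite://")  # in-memory database for the example
Base.metadata.create_all(engine)
session = Session(engine)

pit = "COGM1C14_20200131"
session.add_all(
    [
        # made-up density layers: 20-10 cm and 10-0 cm
        Layer(pit_id=pit, type="density", depth=20.0, bottom_depth=10.0, value="250.0"),
        Layer(pit_id=pit, type="density", depth=10.0, bottom_depth=0.0, value="300.0"),
        # made-up temperature readings at single depths
        Layer(pit_id=pit, type="temperature", depth=20.0, value="-5.0"),
        Layer(pit_id=pit, type="temperature", depth=15.0, value="-4.0"),
        Layer(pit_id=pit, type="temperature", depth=10.0, value="-3.0"),
        Layer(pit_id=pit, type="temperature", depth=5.0, value="-2.0"),
        # a reading from another pit, which must not be matched
        Layer(pit_id="COGM1C1_20200131", type="temperature", depth=15.0, value="-6.0"),
    ]
)
session.commit()

# Two aliases of the same table: one plays the density role, one the temperature role
dens = aliased(Layer)
temp = aliased(Layer)

qry = (
    session.query(
        dens.pit_id,
        dens.depth,
        dens.bottom_depth,
        dens.value.label("density"),
        temp.depth.label("temp_depth"),
        temp.value.label("temperature"),
    )
    .select_from(dens)
    .join(temp, dens.pit_id == temp.pit_id)
    .filter(dens.type == "density", temp.type == "temperature")
    # keep temperature readings that fall inside the density layer
    .filter(temp.depth > dens.bottom_depth, temp.depth <= dens.depth)
    .order_by(dens.depth, temp.depth)
)

rows = [tuple(r) for r in qry.all()]
for r in rows:
    print(r)
```

Because `qry` is an ordinary session query, it could presumably be handed to `query_to_pandas(qry, engine)` in the same way as the session queries in the earlier cells.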

In [17]:
import pandas as pd

# read the dates from the CSV file and format them as YYYYMMDD strings
df = pd.read_csv('GrandMesaDates.csv')
df['date'] = pd.to_datetime(df['date'])
df['date'].dt.strftime('%Y%m%d')


0     20200312
1     20200204
2     20200307
3     20200406
4     20200219
5     20200226
6     20200130
7     20200201
8     20200422
9     20200209
10    20200419
11    20200331
12    20200206
13    20200305
14    20200131
15    20200122
16    20200212
17    20200318
18    20200208
19    20191220
20    20200205
21    20200128
22    20200421
23    20191219
24    20200211
25    20200328
26    20200210
27    20200304
28    20200409
29    20200321
30    20200316
31    20200225
32    20200408
33    20200325
34    20200129
Name: date, dtype: object
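
These `YYYYMMDD` strings have the same shape as the suffix of the pit ids listed earlier (for example `COGM1C14_20200131`). Assuming a pit id is composed as `'COGM' + site_id + '_' + date`, which is inferred from that list and not from any documentation, candidate pit ids could be built as sketched below. The `visits` table is a hypothetical example whose site/date pairs are taken from pit ids in that list; sites with names such as `Skyway Open` may not follow this pattern.

```python
import pandas as pd

# Hypothetical table of site visits (pairs taken from pit ids printed earlier)
visits = pd.DataFrame(
    {
        "site_id": ["1C14", "1C1", "1C1"],
        "date": ["2020-01-31", "2020-01-31", "2020-02-08"],
    }
)
visits["date"] = pd.to_datetime(visits["date"])

# Assumed pattern: 'COGM' + site_id + '_' + YYYYMMDD
visits["pit_id"] = "COGM" + visits["site_id"] + "_" + visits["date"].dt.strftime("%Y%m%d")
print(visits["pit_id"].tolist())
```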