## Adding tabular data to Azure Blob Storage and accessing Sentinel-2 L2A data with the Planetary Computer STAC API

## Before beginning this notebook, and for later work on this project, please obtain the following credentials:

- ACCOUNT_NAME 
    - See gif below {in this example account is `fluviusdata`}
- BLOB_KEY 
    - See gif below {in this example key is found under `Key`}
- CONNECTION_STRING 
    - See gif below {in this example the connection string is found under `Connection string`}
- PLANETARY_COMPUTER_SAS_TOKEN
    - Note that the connection string is not necessary. The Planetary Computer SAS token can be acquired by applying for Planetary Computer access [here](https://planetarycomputer.microsoft.com/account/request)

![credentials](https://fluviusdata.blob.core.windows.net/example/credential_demo.gif) 

## At this point we will want to move the data that we have locally to a container in Azure Blob Storage. Here we demonstrate uploading data to Azure Blob Storage via the portal GUI.

![example](https://fluviusdata.blob.core.windows.net/example/data_upload_demo.gif)

## Now that the data is available to us in the cloud, we can access it here in this notebook using the credentials stored in the `credentials` file.

In [1]:
#import all the libraries we will need 

import os
import fsspec 
import folium
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from pystac_client import Client
import planetary_computer as pc

#import the fluvius library
import sys #need to add the current directory to import 
sys.path.append('/content')
from src.fluvius import WaterData, WaterStation
from src.utils import generate_map


In [2]:
# read the credentials file; each line is expected to look like
#   ACCOUNT_NAME = fluviusdata
with open('/content/credentials') as credentials:
    lines = credentials.readlines()  # one "KEY = value" entry per line

# assign those values to os.environ so they are accessible as variables
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # the ' = ' separator must have a space before and after it in the
    # credentials file; split only once in case the value contains ' = '
    key, value = line.split(' = ', 1)
    os.environ[key] = value.rstrip('\n')

# store our keys in a dictionary called storage_options
storage_options = {'account_name': os.environ['ACCOUNT_NAME'],
                   'account_key': os.environ['BLOB_KEY'],
                   'connection_string': os.environ['CONNECTION_STRING']}

# create a filesystem object for the Blob Storage account
fs = fsspec.filesystem('az',
                       account_name=storage_options['account_name'],
                       account_key=storage_options['account_key'])
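
The credentials list notes that the connection string is not necessary, yet the cell above fails with a `KeyError` if `CONNECTION_STRING` is missing from the file. A minimal sketch of one way around that is shown here. The function name `build_storage_options` is hypothetical and not part of the `fluvius` library, and the demo values are placeholders.

```python
def build_storage_options(env):
    """Collect Azure Blob Storage settings from a mapping such as os.environ.

    CONNECTION_STRING is treated as optional (an assumption based on the
    note in the credentials list); the other two entries are required.
    """
    options = {
        'account_name': env['ACCOUNT_NAME'],
        'account_key': env['BLOB_KEY'],
    }
    connection_string = env.get('CONNECTION_STRING')
    if connection_string:
        options['connection_string'] = connection_string
    return options


# placeholder values, for illustration only
demo_env = {'ACCOUNT_NAME': 'fluviusdata', 'BLOB_KEY': 'not-a-real-key'}
demo_options = build_storage_options(demo_env)
print(sorted(demo_options))
```

In the notebook this would be called as `build_storage_options(os.environ)` after the credentials file has been read.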


## At this point we can use our credentials to open the file we just uploaded

In [3]:
df = pd.read_csv('az://example/usgs_station_metadata_example.csv',\
                storage_options=storage_options)

## Note that this is the same as reading a file with plain pandas, except for the additional `storage_options` parameter.

In [4]:
df

Unnamed: 0,site_no,site_name,Latitude,Longitude,geometry
0,7182250,"Cottonwood River at Plymouth, KS",38.3975,-96.3561,POINT (-96.3561 38.3975)
1,7182390,"Neosho River near Neosho Rapids, KS",38.368,-96.0,POINT (-96 38.368)
2,9326500,"Ferron Creek (Upper Station) near Ferron, UT",39.1041,-111.217,POINT (-111.217 39.1041)
3,9327000,Ferron Creek Below Millsite Res & Divs Near Fe...,39.0953,-111.179,POINT (-111.179 39.0953)
4,1673000,"Pamunkey River Near Hanover, VA",37.7676,-77.3322,POINT (-77.3322 37.7676)
5,6805500,"Platte River at Louisville, NE",41.0152,-96.1577,POINT (-96.15770000000001 41.0152)
6,11455146,Liberty Cut at Little Holland Tract near Court...,38.3288,-121.668,POINT (-121.668 38.3288)
7,6795500,"Shell Creek near Columbus, NE",41.5261,-97.2817,POINT (-97.2817 41.5261)
8,1649190,"Paint Branch Near College Park, MD",39.0331,-76.9643,POINT (-76.96429999999999 39.0331)
9,1478245,"White Clay Creek near Strickersville, PA",39.7475,-75.7708,POINT (-75.77079999999999 39.7475)
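
The `site_no` values above were parsed as integers, so any leading zero is lost. The catalog log later in this notebook prints the same stations as `01673000` and `01649190`. The sketch below shows one way to keep the zero-padded form, assuming 8-character site numbers as in that log. Whether the rest of the `fluvius` code expects integers or strings is not shown here, so treat this as an illustration only.

```python
import io

import pandas as pd

# two rows copied from the table above, in CSV form
csv_text = (
    'site_no,site_name\n'
    '7182250,"Cottonwood River at Plymouth, KS"\n'
    '1673000,"Pamunkey River Near Hanover, VA"\n'
)

# read site_no as text so pandas does not convert it to an integer
stations = pd.read_csv(io.StringIO(csv_text), dtype={'site_no': str})

# restore the leading zeros (assumes 8-character site numbers)
stations['site_no'] = stations['site_no'].str.zfill(8)
print(stations['site_no'].tolist())
```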


## With this current data, we have a `Latitude` and a `Longitude` column. Here we can use `geopandas` to create a `POINT` data object and then visualize it with `folium`. We built a helper function, `generate_map`, to do this for us.

In [5]:
m = generate_map(df, lat_colname='Latitude', lon_colname='Longitude')
m
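
The source of `generate_map` is not shown in this notebook. As a rough sketch of the idea, the code below turns latitude/longitude rows into GeoJSON point features and draws them with `folium`. It uses plain dictionaries as a stand-in for the `geopandas` step, and the names `to_geojson_points` and `render_map` are hypothetical.

```python
def to_geojson_points(rows, lat_key='Latitude', lon_key='Longitude'):
    """Convert row dictionaries into a GeoJSON FeatureCollection of points."""
    features = []
    for row in rows:
        features.append({
            'type': 'Feature',
            # GeoJSON orders coordinates as [longitude, latitude]
            'geometry': {'type': 'Point',
                         'coordinates': [row[lon_key], row[lat_key]]},
            'properties': {k: v for k, v in row.items()
                           if k not in (lat_key, lon_key)},
        })
    return {'type': 'FeatureCollection', 'features': features}


def render_map(collection):
    """Draw the points on a folium map centered on their mean position."""
    import folium  # imported here so the conversion above has no dependencies
    coords = [f['geometry']['coordinates'] for f in collection['features']]
    center = [sum(c[1] for c in coords) / len(coords),
              sum(c[0] for c in coords) / len(coords)]
    fmap = folium.Map(location=center, zoom_start=4)
    folium.GeoJson(collection).add_to(fmap)
    return fmap


# two stations taken from the table shown earlier
rows = [
    {'site_no': 7182250, 'Latitude': 38.3975, 'Longitude': -96.3561},
    {'site_no': 1673000, 'Latitude': 37.7676, 'Longitude': -77.3322},
]
points = to_geojson_points(rows)
```

In a notebook cell, `render_map(points)` displays the map. For a DataFrame, `df.to_dict('records')` produces rows in the shape used here.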

## Alternatively, we can use the `WaterData` class in the `fluvius` library to get data that has already been processed.

In [6]:
#declare the data source we are using
#choices are ['itv', 'ana', 'usgs', 'usgsi']
data_source = 'usgs'
container = f'{data_source}-data'
ds = WaterData(data_source, container, storage_options)
ds.get_source_df()

## Now we want to build a 'chip' around our points. This chip will also serve as the area of interest (AOI) that we will submit to the STAC API to query for Sentinel-2 data. Again we use a helper, this time the `apply_buffer_to_points` method of the `WaterData` class.

In [7]:
buffer_distance = 500
ds.apply_buffer_to_points(buffer_distance)
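
To make the idea of a chip concrete, here is a minimal sketch that builds a square GeoJSON polygon around a single point. It assumes `buffer_distance` is in meters and uses a simple degrees-per-meter approximation. The actual `apply_buffer_to_points` method may use a different unit, shape, or projection, and the name `square_chip` is hypothetical.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def square_chip(lat, lon, buffer_distance):
    """Return a GeoJSON Polygon for a square centered on (lat, lon).

    buffer_distance is half the side length, assumed to be in meters.
    """
    dlat = buffer_distance / METERS_PER_DEGREE_LAT
    # a degree of longitude shrinks with the cosine of the latitude
    dlon = buffer_distance / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    west, east = lon - dlon, lon + dlon
    south, north = lat - dlat, lat + dlat
    # GeoJSON rings are closed: the first and last positions are identical
    ring = [[west, south], [east, south], [east, north],
            [west, north], [west, south]]
    return {'type': 'Polygon', 'coordinates': [ring]}


# chip around the first station in the table shown earlier
chip = square_chip(38.3975, -96.3561, 500)
```

A polygon in this form can be passed as the `intersects` argument of a STAC search.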

## Let's plot our map here using the `generate_map` and `plot_map` functions 

In [8]:
ds.generate_map()
ds.plot_map

## Next we will demonstrate getting the specific samples at each station and then creating a time buffer variable called the time of interest (TOI).

In [None]:
# TODO: get the AOI and TOI for each sample at each station
# this is option 1
cloud_thr = 80      # cloud threshold passed to get_cloud_filtered_image_df
day_tolerance = 0   # day tolerance passed to merge_image_df_with_samples
add_cols = []
# iterate across all sites
for station in ds.df['site_no']:
    # get the station data for a given station
    ds.get_station_data(station)
    ds.station[station].build_catalog()
    if ds.station[station].catalog is None:
        print('No matching images! Skipping...')
        continue
    ds.station[station].get_cloud_filtered_image_df(cloud_thr)
    ds.station[station].merge_image_df_with_samples(day_tolerance)
    ds.station[station].perform_chip_cloud_analysis()
    ds.station[station].get_reflectances()
    # zero-padded 8-character site ID (not used below)
    sstation = str(station).zfill(8)
    # record the time and area of interest for this station
    add_cols.append({'site_no': station,
                     'TOI': ds.station[station].time_of_interest,
                     'AOI': ds.station[station].area_of_interest})

building catalog for station 01632900 with sentinel-2-l2a!
604 Items found
building catalog for station 01645704 with sentinel-2-l2a!
603 Items found
building catalog for station 01645762 with sentinel-2-l2a!
603 Items found
building catalog for station 01646000 with sentinel-2-l2a!
1205 Items found
building catalog for station 01646305 with sentinel-2-l2a!
180 Items found
building catalog for station 01649190 with sentinel-2-l2a!
487 Items found
building catalog for station 01649500 with sentinel-2-l2a!
233 Items found
building catalog for station 01654000 with sentinel-2-l2a!
1681 Items found
building catalog for station 01656903 with sentinel-2-l2a!
1205 Items found
building catalog for station 01673000 with sentinel-2-l2a!
558 Items found
building catalog for station 02035000 with sentinel-2-l2a!
892 Items found
building catalog for station 03353200 with sentinel-2-l2a!
269 Items found
building catalog for station 04015330 with sentinel-2-l2a!
734 Items found
building catalog for s
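
For readers unfamiliar with a time of interest, the sketch below builds a `start/end` date range around a sample date, which is the form a STAC `datetime` filter accepts. The function is a hypothetical illustration with a made-up sample date. How `fluvius` actually derives `time_of_interest` is not shown in this notebook.

```python
from datetime import date, timedelta


def time_of_interest(sample_date, day_tolerance):
    """Return a 'start/end' date range around sample_date."""
    delta = timedelta(days=day_tolerance)
    start = sample_date - delta
    end = sample_date + delta
    return f'{start.isoformat()}/{end.isoformat()}'


# made-up sample date, two days of tolerance on either side
toi = time_of_interest(date(2020, 5, 1), 2)
print(toi)  # 2020-04-29/2020-05-03
```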

In [None]:
z = pd.DataFrame(add_cols)
out = pd.merge(ds.df,z, on='site_no')
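
`pd.merge` performs an inner join by default, so any station that was skipped in the loop (no matching images) does not appear in `out`. If those stations should stay visible, a left join with `indicator=True` is one option. The sketch below uses two site numbers from the table shown earlier and a made-up TOI value.

```python
import pandas as pd

demo_stations = pd.DataFrame({'site_no': [1673000, 1649190]})
# pretend only the first station produced a catalog
demo_z = pd.DataFrame([{'site_no': 1673000, 'TOI': '2020-04-29/2020-05-03'}])

# default inner join: the station without a catalog is dropped
inner = pd.merge(demo_stations, demo_z, on='site_no')

# left join: every station is kept, and _merge records where each row matched
left = pd.merge(demo_stations, demo_z, on='site_no', how='left', indicator=True)
print(left)
```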

In [None]:
out

In [None]:
filename = f'az://modeling-data/{data_source}_data.csv'
out.to_csv(filename,\
            index=False,\
            storage_options=storage_options)


## We can see two columns in our dataframe, one denoting the AOI and the other the TOI. These, together with the collection we are looking for, are all that is required for exploring the data via PySTAC.
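
As a rough sketch of how the AOI, TOI, and collection come together, the code below assembles the arguments for a `pystac_client` search against the Planetary Computer STAC API. The helper names are hypothetical, and filtering on `eo:cloud_cover` with the cloud threshold is an assumption about how the threshold is applied. The network call is kept in its own function so that the argument-building step can be inspected offline.

```python
def build_search_kwargs(aoi, toi, cloud_thr, collection='sentinel-2-l2a'):
    """Assemble keyword arguments for pystac_client's Client.search."""
    return {
        'collections': [collection],
        'intersects': aoi,   # GeoJSON geometry for the area of interest
        'datetime': toi,     # 'start/end' range for the time of interest
        # assumption: the threshold is applied to the eo:cloud_cover property
        'query': {'eo:cloud_cover': {'lt': cloud_thr}},
    }


def search_items(search_kwargs):
    """Run the search (needs network access) and sign the returned items."""
    from pystac_client import Client
    import planetary_computer as pc

    catalog = Client.open('https://planetarycomputer.microsoft.com/api/stac/v1')
    search = catalog.search(**search_kwargs)
    # older pystac_client releases expose get_items() instead of items()
    return [pc.sign(item) for item in search.items()]


# made-up AOI and TOI, for illustration only
demo_aoi = {'type': 'Point', 'coordinates': [-96.3561, 38.3975]}
demo_kwargs = build_search_kwargs(demo_aoi, '2020-04-29/2020-05-03', 80)
```

Calling `search_items(demo_kwargs)` would then return the signed items that match.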

In [1]:
#END