# Topic: Human Influences on the Water Cycle

_Notebook Author: Cassandra Nickles, NASA Jet Propulsion Laboratory - California Institute of Technology_

#### Import Packages

In [None]:
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)  # display all columns by default

## Question 1
**Often in heavily managed rivers, locks and dams are prevalent, influencing river flow and water surface elevation (WSE). Plot a river profile of WSE over a specified portion of the Mississippi River. Can you pinpoint the location of the dams and locks based on the water surface elevation profiles? How would we expect the longitudinal profile to look without these dams or locks?**

Note: This code has been adapted from [The PO.DAAC Cookbook tutorial](https://podaac.github.io/tutorials/notebooks/DataStories/SWOTHR_Science_Application.html) by Arnaud Cerbelaud & Jeffrey Wade. If you want to search for and access data in a different region, the tutorial in the Cookbook explains how to do so using the `earthaccess` Python package.

### Open Data
We will be plotting a River Profile for a portion of the Mississippi River between Minnesota and Wisconsin. The two key SWOT passes for this region are pass numbers 216 and 565, so we access these files from February through July 2024.

> These SWOT passes were identified for this region using the [.kmz file](https://podaac.github.io/tutorials/quarto_text/SWOT.html#swot-spatial-coverage) of SWOT passes/swaths imported into Google Earth Pro for visualization


In [None]:
#filenames
filename_shps = ['SWOT_L2_HR_RiverSP_Reach_010_216_NA_20240201T164840_20240201T164851_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_011_216_NA_20240222T133345_20240222T133356_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_013_216_NA_20240404T070741_20240404T070743_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_014_216_NA_20240425T034858_20240425T034909_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_015_216_NA_20240516T003402_20240516T003413_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_016_216_NA_20240605T211908_20240605T211919_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_017_216_NA_20240626T180411_20240626T180422_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_018_216_NA_20240717T144916_20240717T144927_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_010_565_NA_20240214T042609_20240214T042616_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_011_565_NA_20240306T011114_20240306T011120_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_012_565_NA_20240326T215613_20240326T215624_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_014_565_NA_20240507T152627_20240507T152633_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_015_565_NA_20240528T121132_20240528T121138_PIC0_01.shp',
                 'SWOT_L2_HR_RiverSP_Reach_017_565_NA_20240709T054140_20240709T054146_PIC0_01.shp']

#initialize opened list of files
SWOT_HR_shps = []

# Loop through the listed files to open and stack all acquisition dates
for filename in filename_shps:
    SWOT_HR_shps.append(gpd.read_file(f'data_downloads/{filename}'))

#### Aggregate files into dataframe

In [None]:
# Combine granules from all acquisition dates into one dataframe
SWOT_HR_df = gpd.GeoDataFrame(pd.concat(SWOT_HR_shps, ignore_index=True))

# Sort dataframe by reach_id and time
SWOT_HR_df = SWOT_HR_df.sort_values(['reach_id', 'time'])

SWOT_HR_df



### Trace reaches downstream of given starting reach using `rch_id_dn` field for Visualization


**First, let's set up a dictionary relating all reaches in the dataset to their downstream neighbor**

SWOT River data comes in vector shapefiles with reaches defined as ~10 km stretches of river. If there is a known dam or lock, that portion of the river is often split into its own reach. Explore more at [www.swordexplorer.com](https://www.swordexplorer.com/).

_Note: rch_dn_dict[rch_id] gives a list of all the reaches directly downstream from rch_id_

In [None]:
# Format rch_id_dn for dictionary. Rch_id_dn allows for multiple downstream reaches to be stored
# Also removes spaces in attribute field
rch_id_dn = [[x.strip() for x in SWOT_HR_df.rch_id_dn[j].split(',')] for j in range(0,len(SWOT_HR_df.rch_id_dn))]

# Filter downstream reach ids to remove 'no_data'
rch_id_dn_filter = [[x for x in dn_id if x.isnumeric()] for dn_id in rch_id_dn]

# Create lookup dictionary for river network topology: Downstream
rch_dn_dict = {SWOT_HR_df.reach_id[i]: rch_id_dn_filter[i] for i in range(len(SWOT_HR_df))}

**Then, starting from a given reach, let's trace all connected downstream reaches**

Find the Reach IDs on [www.swordexplorer.com](https://www.swordexplorer.com/)

In [None]:
# Enter reach_id from which we will trace downstream (e.g. the most upstream reach of the river in your study region)
river = "Mississippi River"
rch_dn_st = {"Mississippi River" : "74287700141"
            }

# Initialize list to store downstream reaches, including starting reach
rch_dn_list = [rch_dn_st[river]]
# Retrieve first downstream id of starting reach and add to list
rch_dn_next = rch_dn_dict[rch_dn_st[river]][0]

# Trace the next downstream reach until we hit the outlet (or here the last reach on file)
while len(rch_dn_next) != 0:
    # Add the current reach to the list
    rch_dn_list.append(rch_dn_next)
    # Iteratively retrieve the first downstream id of the next reach
    # Stop if the reach isn't in the downloaded data or has no downstream neighbor
    try:
        rch_dn_next = rch_dn_dict[rch_dn_next][0]
    except (KeyError, IndexError):
        break

**Filter downloaded data by the traced reaches to create a plot**

In [None]:
# Filter downloaded data by downstream traced reaches
SWOT_dn_trace = SWOT_HR_df[SWOT_HR_df.reach_id.isin(rch_dn_list)]

# Remove reaches from rch_dn_list that are not present in SWOT data
rch_dn_list = [rch for rch in rch_dn_list if rch in SWOT_HR_df.reach_id.values]

SWOT_dn_trace[['reach_id','river_name','geometry']].explore('river_name', style_kwds=dict(weight=6))

### Plot longitudinal profiles

Now that we've visualized the study region, let's create a time series dataframe from the downstream-filtered database so we can plot a profile of this river.

In [None]:
# Retrieve all possible acquisition dates (keeping only YYYY-MM-DD)
dates = np.unique([i[:10] for i in [x for x in SWOT_HR_df['time_str'] if x!='no_data']])

# Create a new database for time series analysis with unique reach_ids
SWOT_dn_trace_time = SWOT_dn_trace.set_index('reach_id').groupby(level=0) \
                                  .apply(lambda df: df.reset_index(drop=True)) \
                                  .unstack().sort_index(axis=1, level=1)

SWOT_dn_trace_time.columns = ['{}_{}'.format(x[0],dates[x[1]]) for x in SWOT_dn_trace_time.columns]

#### Plot a longitudinal profile for selected SWOT variable

In [None]:
# Explore variables you could choose to plot
for var in ["wse","slope","width","len", "reach_q"]:
    print(SWOT_dn_trace.columns[SWOT_dn_trace.columns.str.contains(var)])

In [None]:
# Enter variable of interest for plotting
varstr = "wse"

In [None]:
# Find cumulative length on the longitudinal profile
length_list    = np.nan_to_num([SWOT_dn_trace.p_length[SWOT_dn_trace.reach_id == rch].mean()/1000 for rch in rch_dn_list])
cumlength_list = np.cumsum(length_list)

## Plot a longitudinal profile from the downstream tracing database
plt.figure(figsize=(12,8))
for t in dates:
    
    # Store the quantity of interest (wse, width etc.) at time t
    value = SWOT_dn_trace_time.loc[rch_dn_list,varstr+'_'+t]
    
    # Set negative values (bad observations) to NaN and forward fill NaNs
    value = value.mask(value < 0)
    value = value.ffill()
    
    # Plot the data
    plt.plot(cumlength_list, value, label = varstr+'_'+t)
    
plt.xlabel('Downstream Distance (km)')
plt.ylabel(varstr)
plt.legend()

Note that these profiles do not take the data quality flags into account. It may be worth plotting the "reach_q" variable as well to see the data quality per reach at specific locations. Quality flag values are 3=bad, 2=degraded, 1=suspect, 0=good. For this version of the data (2.0, Version C), the quality flags are overly sensitive. See the [SWOT User Handbook](https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/web-misc/swot_mission_docs/D-109532_SWOT_UserHandbook_20240502.pdf?_ga=2.166071870.1206626367.1721664595-1354658737.1715875596) for more information about this dataset.
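One way to act on the note above is to blank out observations whose quality flag exceeds a chosen threshold before plotting. The following is a minimal sketch on made-up values, not part of the original tutorial. The helper name `mask_by_quality` and the `max_flag` threshold are hypothetical, and the sketch assumes `reach_q` follows the 0 to 3 scale described above.

```python
import pandas as pd

def mask_by_quality(values, flags, max_flag=1):
    """Return values with entries set to NaN where the quality flag exceeds max_flag."""
    values = pd.Series(values, dtype=float)
    flags = pd.Series(flags, index=values.index)
    # Keep a value only where its flag is at or below the threshold
    return values.where(flags <= max_flag)

# Made-up WSE values (m) and reach_q flags for five reaches (illustration only)
wse = [200.1, 199.8, 199.9, 197.2, 197.0]
reach_q = [0, 1, 3, 2, 0]

masked = mask_by_quality(wse, reach_q, max_flag=1)
print(masked.tolist())
```

Because the flags are described as overly sensitive in this data version, a strict threshold may discard usable observations, so it can help to compare profiles with and without the mask.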

**Can you pinpoint the location of the dams and locks based on the water surface elevation profiles? How would we expect the longitudinal profile to look without these dams or locks?**
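If locks and dams appear as abrupt drops in WSE between neighboring reaches (an assumption you should check against your own plot), a simple difference along the profile can help flag candidate locations. The sketch below uses a made-up profile. The function name `find_wse_steps` and the 1 m threshold are hypothetical choices, not values from the tutorial.

```python
import numpy as np

def find_wse_steps(distance_km, wse_m, min_drop_m=1.0):
    """Return (distance, drop) pairs where WSE falls by at least min_drop_m
    between one reach and the next reach downstream."""
    distance_km = np.asarray(distance_km, dtype=float)
    wse_m = np.asarray(wse_m, dtype=float)
    # Positive values mean the water surface falls in the downstream direction
    drops = wse_m[:-1] - wse_m[1:]
    idx = np.flatnonzero(drops >= min_drop_m)
    # Report the distance of the downstream reach of each flagged pair
    return [(float(distance_km[i + 1]), float(drops[i])) for i in idx]

# Made-up profile: nearly flat segments separated by two abrupt steps
distance_km = [10, 20, 30, 40, 50, 60, 70]
wse_m = [205.0, 204.9, 204.8, 202.5, 202.4, 202.3, 200.0]

steps = find_wse_steps(distance_km, wse_m, min_drop_m=1.0)
for dist, drop in steps:
    print(f"Drop of {drop:.1f} m near {dist:.0f} km")
```

With the notebook's variables, `cumlength_list` and one date's `value` series would play the roles of `distance_km` and `wse_m`. A suitable threshold depends on the natural slope of the river and on measurement noise.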

#### Additional map that could be helpful: the selected SWOT variable shown spatially along the profile

In [None]:
# Choose a date
date = dates[0]

In [None]:
#Set one column as the active geometry in the new database
SWOT_dn_trace_time = SWOT_dn_trace_time.set_geometry("geometry_"+date)

#Set cleaner colorbar bounds for better visualization
vmin = np.percentile([i for i in SWOT_dn_trace_time[varstr+'_'+date] if i>0],5)
vmax = np.percentile([i for i in SWOT_dn_trace_time[varstr+'_'+date] if i>0],95)

# Interactive map
SWOT_dn_trace_time.explore(varstr+'_'+date,
                           vmin = vmin,
                           vmax = vmax,
                           cmap = "Blues",
                           control_scale = True,
                           tooltip = varstr+'_'+date,  # show "varstr+'_'+dates[0]" value in tooltip (on hover)
                           popup = True,  # show all values in popup (on click)
                           #tiles = "CartoDB positron",  # use "CartoDB positron" tiles
                           style_kwds=dict(weight=10)
                          )

The -999999999999 values are fill values marking locations that the particular SWOT pass (216 or 565) did not observe. Each SWOT pass covers two 50 km swaths with a 20 km gap between them.
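Before computing statistics or choosing colorbar bounds, it can be convenient to convert these unobserved values to NaN so that pandas skips them. This is a minimal sketch on made-up numbers. The helper name `fill_to_nan` is hypothetical, and the fill value is simply the one shown in the map above.

```python
import numpy as np
import pandas as pd

FILL_VALUE = -999999999999  # value shown on the map for unobserved reaches

def fill_to_nan(series, fill_value=FILL_VALUE):
    """Replace the fill value with NaN so statistics and plots ignore it."""
    return pd.Series(series, dtype=float).replace(fill_value, np.nan)

# Made-up WSE values (m) with one unobserved reach
wse = pd.Series([201.3, -999999999999, 200.7])
clean = fill_to_nan(wse)

print(clean.isna().tolist())
print(clean.mean())  # the mean now skips the unobserved reach
```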

## Question 2
**The Mississippi River Basin has many human-made reservoirs and natural lakes. Here, we'll compare water surface elevation variability from a reservoir (Willow Reservoir, ID: 7421108633) and a lake (Fence Lake, ID: 7421110322) in Wisconsin. SWOT currently observes reservoirs and lakes larger than 250 m x 250 m, and the released record spans July 2023 to the present day. How do the water levels change over time and compare to each other?**

> To find lake IDs, see the SWOT Prior Lake Database Layer in [Hydroweb.next](https://hydroweb.next.theia-land.fr/). The data for this exercise were obtained and consolidated into csv files per water body using [this notebook](https://podaac.github.io/tutorials/notebooks/GIS/SWOTshp_CSVconversion.html) in the PO.DAAC Cookbook. In the future (Sept. 2024), when [Hydrocron](https://podaac.github.io/hydrocron/intro.html) contains lake data, that tool will be the most efficient way to obtain SWOT csv files.

In [None]:
#Open csv files of SWOT time series data per water body
reservoir_df = pd.read_csv('data_downloads/SWOTLake_7421108633.csv')
lake_df = pd.read_csv('data_downloads/SWOTLake_7421110322.csv')

#convert dates to datetime for better visualization
reservoir_df['time_str'] = pd.to_datetime(reservoir_df['time_str'])
lake_df['time_str'] = pd.to_datetime(lake_df['time_str'])

In [None]:
reservoir_df

In [None]:
lake_df

In [None]:
## Plot the two wse timeseries together
fig = plt.figure(figsize=(12,8))

plt.subplot(211)
plt.plot(reservoir_df['time_str'], reservoir_df['wse'], label = 'Reservoir', color='darkorange')
plt.legend()

plt.subplot(212)
plt.plot(lake_df['time_str'], lake_df['wse'], label = 'Lake', color='blue')
plt.xlabel('Time')
plt.legend()
fig.autofmt_xdate()
fig.text(0.04, 0.5, 'Water Surface Elevation (m)', va='center', rotation='vertical')

**How do the water levels change over time and compare to each other? Do your findings concur with this study that used ICESat-2 data? https://www.nature.com/articles/s41586-021-03262-3**

"Cooley and colleagues found that water levels in Earth’s lakes and ponds change about 8.6 inches between the wet and dry seasons. Meanwhile, human-managed reservoirs fluctuate by nearly four times that amount, rising and falling by an average of 2.8 feet from season to season." - from [article advertising paper](https://sustainability.stanford.edu/news/how-much-do-humans-influence-earths-water-levels)
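Since the SWOT `wse` values are plotted in meters, it may help to convert the quoted figures before comparing. The short sketch below only does the unit arithmetic. The variable names are illustrative.

```python
INCH_TO_M = 0.0254  # exact by definition
FOOT_TO_M = 0.3048  # exact by definition

lake_range_m = 8.6 * INCH_TO_M       # natural lakes and ponds
reservoir_range_m = 2.8 * FOOT_TO_M  # human-managed reservoirs

print(f"Lakes and ponds: {lake_range_m:.2f} m")     # 0.22 m
print(f"Reservoirs:      {reservoir_range_m:.2f} m")  # 0.85 m
print(f"Ratio:           {reservoir_range_m / lake_range_m:.1f}")  # 3.9
```

The ratio of about 3.9 matches the "nearly four times" wording in the quote.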

In [None]:
# Students compute statistics for this one year of SWOT data over this lake and reservoir,
# choosing a method of comparison and making sure to note the caveats. General conclusions often should not be based on one case study, so it's okay if results do not match.


## Question 3

**Even if there are no reservoirs, dams or locks along a stretch of river, in what other ways might humans influence the flow of surface water? How about the water cycle in general? What other NASA datasets could be useful for analyzing human influence on the water cycle?**

*Some Hints:*
- *Think about land cover types and how water flows differently over each type*
- *NASA has datasets for Soil Moisture, Groundwater, Precipitation, and other portions of the water cycle as well.*