# Getting Drainage Area from specific points

Last updated by Simon M Mudd 11/04/2023

In this notebook we work through an example where you have collected some channel characteristics in the field and want to know the drainage area at each of those points. We start with the simplest possible case, where all we have is the location of the points. 

## Stuff we need to do if you are in Colab (not required in the lsdtopotools pytools container)

**If you are in the `docker_lsdtt_pytools` docker container, you do not need to do any of this. 
The following is for executing this code in the Google Colab environment only.**

If you are in the docker container you can skip to the **First get data** section. 

First we install `lsdviztools`. 
This will take around a minute.

In [None]:
!pip install lsdviztools &> /dev/null

Now we need to install `lsdtopotools`. We do this using a package manager called `mamba`. Note that this version of `mamba` works for Python 3.9 (which is what Google Colab currently uses).

This step will take around 20 seconds. 

In [None]:
%%bash
MINICONDA_INSTALLER_SCRIPT=Mambaforge-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://github.com/conda-forge/miniforge/releases/latest/download/$MINICONDA_INSTALLER_SCRIPT &> /dev/null
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX &> /dev/null

Alternatively, you can do this with `condacolab`, but it broke in March 2023 and may take some time to be fixed.

In [None]:
#!pip install -q condacolab
#import condacolab
#condacolab.install()

Now use mamba to install `lsdtopotools`. 
This step takes a bit over a minute. 

In [None]:
!mamba install -y lsdtopotools &> /dev/null

The next line tests whether this worked. If you get some output asking for a parameter file, then `lsdtopotools` is installed. 

In [None]:
!lsdtt-basic-metrics

## First get data

Before we do anything, we need to import a few packages:

In [None]:
import lsdviztools.lsdbasemaptools as bmt
from lsdviztools.lsdplottingtools import lsdmap_gdalio as gio
import lsdviztools.lsdmapwrappers as lsdmw

Now we need some data. We are going to download it using the OpenTopography scraper that is included with `lsdviztools`. You will need an opentopography.org account and its API key.

You can sign up to an opentopography.org account here: https://portal.opentopography.org/myopentopo 
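The download cell further down reads your API key from a plain text file called `my_OT_api_key.txt` in the working directory. A minimal way to create that file from Python is sketched here. `PASTE_YOUR_KEY_HERE` is a placeholder that you replace with your own key, and the sketch only writes the file if it does not already exist, so it will not overwrite a real key.

```python
from pathlib import Path

# Name of the file the download cell reads the OpenTopography API key from
key_file = Path("my_OT_api_key.txt")

# Placeholder: replace this string with the key from your opentopography.org account
my_key = "PASTE_YOUR_KEY_HERE"

if key_file.exists():
    print("Key file already exists, leaving it unchanged: " + str(key_file))
else:
    # Write the key on a single line; the download cell strips trailing whitespace
    key_file.write_text(my_key + "\n")
    print("Wrote API key file: " + str(key_file))
```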

Before I actually do anything I am going to set up some filenames:

In [None]:
Dataset_prefix = "RioAguas"
source_name = "COP30"

r_prefix = Dataset_prefix+"_"+source_name +"_UTM"
w_prefix = Dataset_prefix+"_"+source_name +"_UTM"

DataDirectory = "./"
Base_file = r_prefix

Now let's grab the data. If you want to do this yourself for a new area, just choose your own lower left and upper right coordinates for your site.
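In the download cell below, each corner is given as `[latitude, longitude]` in decimal degrees. Before requesting a new area it can help to check that the corners are the right way round and that the box is a sensible size. The helper below, `bbox_extent_km`, is a hypothetical sketch (not part of `lsdviztools`) that uses the approximation of about 111.32 km per degree of latitude, with degrees of longitude shrinking by the cosine of the latitude.

```python
import math

def bbox_extent_km(lower_left, upper_right):
    """Approximate width and height (km) of a lat/lon bounding box.

    Hypothetical helper: corners are [latitude, longitude] in decimal degrees.
    """
    lat_min, lon_min = lower_left
    lat_max, lon_max = upper_right
    if lat_min >= lat_max or lon_min >= lon_max:
        raise ValueError("lower_left must be south-west of upper_right")
    km_per_deg_lat = 111.32
    mid_lat = math.radians(0.5 * (lat_min + lat_max))
    height_km = (lat_max - lat_min) * km_per_deg_lat
    # A degree of longitude gets shorter towards the poles
    width_km = (lon_max - lon_min) * km_per_deg_lat * math.cos(mid_lat)
    return width_km, height_km

# The Rio Aguas corners used in the download cell
width, height = bbox_extent_km([36.97524478026287, -2.3631792251411805],
                               [37.3200098350942, -1.7962073552766233])
print(f"Box is roughly {width:.0f} km wide and {height:.0f} km tall")
```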

In [None]:
# YOU NEED TO PUT YOUR API KEY IN A FILE
your_OT_api_key_file = "my_OT_api_key.txt"

with open(your_OT_api_key_file, 'r') as file:
    print("I am reading your OT API key from the file "+your_OT_api_key_file)
    api_key = file.read().rstrip()
    print("Your api key starts with: "+api_key[0:4])

SB_DEM = bmt.ot_scraper(source = source_name,
                        lower_left_coordinates = [36.97524478026287, -2.3631792251411805], 
                        upper_right_coordinates = [37.3200098350942, -1.7962073552766233],
                        prefix = Dataset_prefix, 
                        api_key_file = your_OT_api_key_file)
SB_DEM.print_parameters()
SB_DEM.download_pythonic()
DataDirectory = "./"
Fname = Dataset_prefix+"_"+source_name+".tif"
gio.convert4lsdtt(DataDirectory,Fname)

## Look at the hillshade

Right, let's see what this place looks like:

In [None]:
lsdtt_parameters = {"write_hillshade" : "true"}
lsdtt_drive = lsdmw.lsdtt_driver(read_prefix = r_prefix,
                                 write_prefix= w_prefix,
                                 read_path = "./",
                                 write_path = "./",
                                 parameter_dictionary=lsdtt_parameters)
lsdtt_drive.print_parameters()
lsdtt_drive.run_lsdtt_command_line_tool()

In [None]:
%matplotlib inline
Base_file = r_prefix
DataDirectory = "./"
this_img = lsdmw.SimpleHillshade(DataDirectory,Base_file,cmap="gist_earth", save_fig=False, size_format="geomorphology",dpi=500)

## Now get a single basin

I add a basin outlet to a pandas dataframe and then write it to a csv file. 
The point below was obtained just by clicking in Google Maps and copying the resulting latitude and longitude into the code below. 

In [None]:
# Import pandas library
import pandas as pd

data = [ [37.15674383710805, -1.9049454817508027]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['latitude', 'longitude'])

df.to_csv("basin_outlets.csv",index=False)
df.head()

We can use the Linux `cat` command to make sure the file is what we expect.

In [None]:
!cat basin_outlets.csv
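In practice you will often have several field sites. A sketch of writing them all to an outlet file in the same `latitude,longitude` layout as `basin_outlets.csv` is below. The coordinates are made-up placeholders and `my_field_site_outlets.csv` is a hypothetical file name. I am assuming that `get_basins_from_outlets` accepts more than one row, as the plural name `basin_outlets.csv` suggests, so check the resulting basins before relying on them. To use the file, point `"basin_outlet_csv"` at it.

```python
import pandas as pd

# Placeholder field sites as [latitude, longitude] pairs (replace with your own)
sites = [[37.1567, -1.9049],
         [37.1200, -2.0500],
         [37.0800, -2.1500]]

# Same column names as basin_outlets.csv
outlets = pd.DataFrame(sites, columns=["latitude", "longitude"])
outlets.to_csv("my_field_site_outlets.csv", index=False)

# Read it back to confirm the layout
print(pd.read_csv("my_field_site_outlets.csv"))
```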

Now let's use *lsdtopotools* to get the basins. We first import the `lsdmapwrappers` module, and then run the code.

In [None]:
import lsdviztools.lsdmapwrappers as lsdmw

## Get the basins
lsdtt_parameters = {"print_basin_raster" : "true",
                    "write_hillshade" : "true",
                    "get_basins_from_outlets" : "true",
                    "basin_outlet_csv" : "basin_outlets.csv"}
lsdtt_drive = lsdmw.lsdtt_driver(command_line_tool = "lsdtt-chi-mapping", 
                                 read_prefix = r_prefix,
                                 write_prefix= w_prefix,
                                 read_path = "./",
                                 write_path = "./",
                                 parameter_dictionary=lsdtt_parameters)
lsdtt_drive.print_parameters()
lsdtt_drive.run_lsdtt_command_line_tool()

Now we can plot the map with an `lsdviztools` call.

In [None]:
# uncomment this for debugging
#import lsdviztools.lsdmapwrappers as lsdmw
DataDirectory = "./"
Base_file = r_prefix

#%%capture
basins_img = lsdmw.PrintBasins_Complex(DataDirectory,Base_file,cmap="gist_earth", 
                             size_format="geomorphology",dpi=600, save_fig = True)

In [None]:
print(basins_img)
from IPython.display import display, Image
display(Image(filename=basins_img, width=800))

We can get all the channels in this basin with a call to `lsdtt-chi-mapping` (although you can also do this with `lsdtt-basic-metrics`). 
To get channels with various data elements such as the area, flow distance, elevation, and chi coordinate (if you don't know what that is, don't worry about it now), you use the keyword: `"print_chi_data_maps" : "true"`. 

We can also control the extent of the drainage network. Drainage extraction works by computing the flow direction of every pixel, which points to whichever of its eight neighbours is lowest. The algorithm then counts all the pixels contributing flow to each pixel. Channels begin where this count exceeds a threshold number of contributing pixels. 
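The toy sketch below illustrates this idea on a tiny made-up elevation grid. Each pixel drains to the lowest of its eight neighbours (if that neighbour is lower), pixels are visited from highest to lowest so that every upstream count is complete before it is passed on, and pixels whose count exceeds a threshold are flagged as channel. It is a simplified illustration of the concept, not the `lsdtopotools` implementation; real tools also have to deal with pits, flat areas, and edge effects.

```python
import numpy as np

def flow_accumulation(dem):
    """Number of pixels draining through each cell, counting the cell itself.

    Each cell sends its flow to the lowest of its eight neighbours, provided
    that neighbour is lower than the cell; otherwise the cell is a sink/outlet.
    """
    nrows, ncols = dem.shape
    acc = np.ones(dem.shape, dtype=int)  # every pixel contributes itself
    # Visit cells from highest to lowest, so a cell's upstream total is
    # complete before it is passed downstream
    order = np.argsort(dem, axis=None)[::-1]
    for flat_idx in order:
        r, c = divmod(int(flat_idx), ncols)
        target, target_z = None, dem[r, c]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr, dc) != (0, 0) and 0 <= rr < nrows and 0 <= cc < ncols:
                    if dem[rr, cc] < target_z:
                        target, target_z = (rr, cc), dem[rr, cc]
        if target is not None:
            acc[target] += acc[r, c]
    return acc

# Made-up 5 x 5 DEM: a valley draining towards the bottom middle cell
dem = np.array([
    [9, 8, 7, 8, 9],
    [8, 7, 6, 7, 8],
    [7, 6, 5, 6, 7],
    [6, 5, 4, 5, 6],
    [5, 4, 3, 4, 5],
], dtype=float)

acc = flow_accumulation(dem)
threshold = 5                # plays the role of threshold_contributing_pixels
channel = acc > threshold    # channel pixels exceed the threshold
print(acc)
print(channel.astype(int))
```

Every cell in this grid ends up draining to the bottom middle cell, which therefore has a count of 25, and with a threshold of 5 only the lower three cells of the centre column qualify as channel.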

The default threshold is 1000 contributing pixels, but you can use fewer with the keyword `"threshold_contributing_pixels" : "500"` (you can of course change the number to whatever you want). 
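Because the threshold is a pixel count, the drainage area at which channels begin depends on the pixel size. A quick conversion, assuming square pixels of about 30 m (the nominal resolution of the COP30 data used here; check the actual resolution of your DEM):

```python
def threshold_area_km2(n_pixels, pixel_size_m):
    """Drainage area (km^2) represented by n_pixels square pixels."""
    area_m2 = n_pixels * pixel_size_m ** 2
    return area_m2 / 1e6

pixel_size_m = 30.0  # assumed pixel size in metres
for n in (1000, 500):
    print(f"{n} pixels -> {threshold_area_km2(n, pixel_size_m):.2f} km^2")
```

With 30 m pixels, the default of 1000 pixels starts channels at about 0.9 km² of drainage area, and a threshold of 500 pixels starts them at about 0.45 km².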

We also want some additional information: the channel gradients and how they vary with drainage area. To get this we add `"print_slope_area_data" : "true"`. We also need to choose the interval over which slopes are calculated. We calculate them over a fixed vertical interval, `"SA_vertical_interval" : "10"`, so a new slope measurement is recorded whenever the channel has fallen by 10 m. 
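The idea of a fixed vertical interval can be sketched as follows: walk down a channel profile from its head, and each time the elevation has dropped by at least the interval since the start of the current segment, record a slope equal to the drop divided by the flow distance travelled. This is a simplified illustration on a made-up profile; `slopes_fixed_vertical_interval` is a hypothetical helper, not the exact algorithm used by `lsdtopotools`.

```python
import numpy as np

def slopes_fixed_vertical_interval(distance, elevation, interval):
    """Channel slopes measured each time the profile falls by `interval`.

    distance: flow distance along the channel, increasing downstream (m)
    elevation: elevation at each point (m), same length as distance
    Returns a list of (segment_midpoint_distance, slope) tuples.
    """
    results = []
    start = 0  # index where the current segment begins
    for i in range(1, len(distance)):
        drop = elevation[start] - elevation[i]
        if drop >= interval:
            run = distance[i] - distance[start]
            mid = 0.5 * (distance[i] + distance[start])
            results.append((mid, drop / run))
            start = i  # begin a new segment here
    return results

# Made-up concave profile: steep near the head, gentler downstream
distance = np.arange(0, 3001, 100.0)
elevation = 500.0 - 60.0 * np.sqrt(distance / 100.0)

for mid, slope in slopes_fixed_vertical_interval(distance, elevation, 10.0):
    print(f"segment centred at {mid:6.0f} m: slope = {slope:.3f}")
```

Segments are short where the channel is steep and long where it is gentle, and every slope is computed from a drop of at least 10 m, which limits how much a small elevation error in the DEM can change any single estimate.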

In [None]:
## Get the basins and the channel profile
lsdtt_parameters = {"print_chi_data_maps" : "true",
                    "get_basins_from_outlets" : "true",
                    "basin_outlet_csv" : "basin_outlets.csv",
                    "threshold_contributing_pixels" : "500",
                    "SA_vertical_interval" : "10",
                    "print_slope_area_data" : "true"}
r_prefix = Dataset_prefix+"_"+source_name +"_UTM"
w_prefix = Dataset_prefix+"_"+source_name +"_UTM"
lsdtt_drive = lsdmw.lsdtt_driver(command_line_tool = "lsdtt-chi-mapping", 
                                 read_prefix = r_prefix,
                                 write_prefix= w_prefix,
                                 read_path = "./",
                                 write_path = "./",
                                 parameter_dictionary=lsdtt_parameters)
lsdtt_drive.print_parameters()
lsdtt_drive.run_lsdtt_command_line_tool()

Let's see where the channels are:

In [None]:
%matplotlib inline
this_chan_img = lsdmw.PrintChannelsAndBasins(DataDirectory,Base_file,
                                       add_basin_labels = True, cmap = "jet", 
                                       size_format = "ESURF", fig_format = "png", 
                                       dpi = 300, save_fig = True)

In [None]:
%matplotlib inline
print(this_chan_img)
from IPython.display import display, Image
display(Image(filename=this_chan_img, width=800))

## Looking at gradients

The channel gradients are in `RioAguas_COP30_UTM_SAvertical.csv`. We can see this file if we list the csv files:

In [None]:
!ls *.csv

This is what the data looks like:

In [None]:
# Import pandas library
import pandas as pd

df = pd.read_csv("RioAguas_COP30_UTM_SAvertical.csv")
df.head()

## Import some points (that you record with a GPS) and combine with other data

Now we will import a dataset of points and combine it with the channel data.

In [None]:
import geopandas as gpd
import numpy as np
import pandas as pd

from scipy.spatial import cKDTree
from shapely.geometry import Point

We have two datasets. One is the channel data and the other is the site locations. This second dataset could be any set of points.

In the next step we merge these datasets by nearest neighbour: for each site, we find the closest point in the channel data and attach that point's channel data to the site.
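The nearest-neighbour matching relies on a k-d tree from `scipy.spatial`. A toy sketch with made-up coordinates shows what `cKDTree.query` returns: for each site, the distance to, and the index of, the closest channel point.

```python
import numpy as np
from scipy.spatial import cKDTree

# Made-up channel points and field sites as (x, y), in the same units
channel_xy = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 5.0]])
site_xy = np.array([[9.0, 1.0], [19.0, 6.0]])

tree = cKDTree(channel_xy)             # build the tree on the channel points
dist, idx = tree.query(site_xy, k=1)   # nearest channel point for every site

for site, d, i in zip(site_xy, dist, idx):
    print(f"site {site} -> channel point {i} at distance {d:.2f}")
```

The distances are in whatever units the coordinates are in, which is one more reason both datasets need to share a coordinate reference system.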

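For this to work, the two datasets must be in the same coordinate reference system. In this example the channel data have latitude and longitude in the global reference frame with the code EPSG:4326, while the site data are eastings and northings in UTM zone 30N (EPSG:32630), so we convert the sites to EPSG:4326 before merging. In the example below, we use `.crs` to define the coordinate reference system of each dataset and `.to_crs()` to convert between them. 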

However, sometimes you might have a dataset in another coordinate system (for example British National Grid, which is EPSG:27700, or UTM, which is EPSG:326XX in the northern hemisphere and EPSG:327XX in the southern hemisphere, where XX is the UTM zone), so you would need to change the corresponding EPSG code. You can look up the EPSG code for a coordinate system with a Google search. 
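For UTM, the zone (and hence the EPSG code) can be worked out from a point's longitude and latitude. A minimal sketch for WGS 84 UTM zones, using the regular 6-degree zone formula and ignoring the special zone boundaries around Norway and Svalbard; `utm_epsg` is a hypothetical helper name:

```python
import math

def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 UTM zone containing (lon, lat).

    Uses the regular 6-degree zones; the exceptions around Norway and
    Svalbard are not handled.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # lon = 180 would otherwise give zone 61
    base = 32600 if lat >= 0 else 32700  # 326XX north, 327XX south
    return base + zone

print(utm_epsg(-1.9049, 37.1567))  # the Rio Aguas outlet used above -> 32630
```

For the Rio Aguas outlet this gives 32630, the same code used for the site data below.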

In [None]:
!ls *.csv

We have a few csv datasets here. The ones you can use for this purpose are `RioAguas_COP30_UTM_SAvertical.csv` (which has slopes) and `RioAguas_COP30_UTM_chi_data_map.csv` (which does not have slopes). In the example below I use the one with slopes. 

Note that these data come from a 30 m DEM, so if you have measured channel gradients in the field with a laser or a stadia rod and hand level, your field gradients will likely be more accurate. 

In [None]:
# Load the channel data
dfA = pd.read_csv("RioAguas_COP30_UTM_SAvertical.csv")
# Convert to a geopandas dataframe
gdfA = gpd.GeoDataFrame(
    dfA, geometry=gpd.points_from_xy(dfA.longitude, dfA.latitude))
# We have to tell geopandas what geographic system the data are in by using something called an EPSG code.
# All major geographic projection and transformation systems have such a code.
# The channel data are in latitude and longitude (WGS84).
gdfA.crs = "EPSG:4326" 


# Load the width data (site locations as UTM zone 30N eastings and northings)
dfB = pd.read_csv("Spain_2023_Grid_References.csv")
gdfB = gpd.GeoDataFrame(
    dfB, geometry=gpd.points_from_xy(dfB.Easting, dfB.Northing))
# The site data are in UTM zone 30N, which has the EPSG code 32630.
gdfB.crs = "EPSG:32630" 

# IMPORTANT: we convert one of the datasets to the coordinate reference system of the other
gdfC = gdfB.to_crs(4326)

Next we need a function for combining datasets. **You don't need to change anything in this function.** Each row of the first dataframe keeps its own data and gains the columns of the nearest point in the second dataframe, plus a `dist` column giving the distance to that point.

In [None]:
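def ckdnearest(gdA, gdB):
    """For each point in gdA, attach the columns of the nearest point in gdB.

    Returns a dataframe with gdA's columns, the nearest gdB row's columns
    (without its geometry) and a 'dist' column in the units of the CRS.
    """
    # Coordinates of both point sets as (n, 2) arrays
    nA = np.array(list(gdA.geometry.apply(lambda x: (x.x, x.y))))
    nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
    # Build a k-d tree on gdB and find the single nearest neighbour of each gdA point
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)
    # Pull out the matching rows of gdB, dropping its geometry so gdA's is kept
    gdB_nearest = gdB.iloc[idx].drop(columns="geometry").reset_index(drop=True)
    # Join side by side: gdA's columns, gdB's columns, then the distance
    gdf = pd.concat(
        [
            gdA.reset_index(drop=True),
            gdB_nearest,
            pd.Series(dist, name='dist')
        ], 
        axis=1)

    return gdf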

Now we merge the two files. 

In [None]:
new_gdp = ckdnearest(gdfC, gdfA)
new_gdp.head(10)
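Because both datasets were in EPSG:4326 when they were merged, the `dist` column is measured in degrees, and at this latitude a degree of longitude is shorter on the ground than a degree of latitude. If you want distances in metres, one option is to project both datasets into the local UTM zone before looking up neighbours, for example `ckdnearest(gdfB, gdfA.to_crs(32630))`, since `gdfB` is already in EPSG:32630. The self-contained sketch below shows the effect with two made-up points; `nearest_dist` is a hypothetical helper that returns only the distances.

```python
import geopandas as gpd
import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry import Point

# Made-up example: one field site and one channel point 0.001 degrees further north
site = gpd.GeoDataFrame(geometry=[Point(-2.0, 37.0)], crs="EPSG:4326")
channel = gpd.GeoDataFrame(geometry=[Point(-2.0, 37.001)], crs="EPSG:4326")

def nearest_dist(gdA, gdB):
    """Distance from each point in gdA to its nearest point in gdB (CRS units)."""
    a = np.column_stack([gdA.geometry.x, gdA.geometry.y])
    b = np.column_stack([gdB.geometry.x, gdB.geometry.y])
    dist, _ = cKDTree(b).query(a, k=1)
    return dist

# In EPSG:4326 the distance comes out in degrees
print(nearest_dist(site, channel))
# Projected to UTM zone 30N (EPSG:32630) it comes out in metres (about 111 m here)
print(nearest_dist(site.to_crs(32630), channel.to_crs(32630)))
```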

Super! Now we can write this new dataset to a file using the `.to_csv` method:

In [None]:
new_gdp.to_csv("updated_spain_site_information.csv")