## <span style="color:green"><h1><center>DEM Accessing using a Shapefile</center></h1></span>
<center>Prepared by <br>
    <b>Jibin Joseph and Venkatesh Merwade</b><br> 
Lyles School of Civil Engineering, Purdue University<br>
joseph57@purdue.edu, vmerwade@purdue.edu<br>
<b><br>
    FAIR Science in Water Resources</b><br></center>


## <span style="color:green">Objective</span>
<p style='text-align: justify;'>We will download digital elevation model (DEM) raster files from the USGS National Elevation Dataset, using the extents of a watershed shapefile retrieved by USGS site number. The DEM raster files will then be plotted along with the watershed boundary.</p> 

## <span style="color:green"> Data Source </span>

<p style='text-align: justify;'>USGS DEM with varying resolutions (1 arc-second or 1/3 arc-second or 1/9 arc-second)</p>

## <span style="color:green">Overview of steps </span>
<ol type="1">
    <span style="color:red"><li>Using the USGS station number, get the shapefile for a basin (watershed) and its extents</li></span>
    <span style="color:red"><li>Download the DEM from USGS-Amazon Web Service</li></span>
    <span style="color:red"><li>Plot the unmerged raster tiles</li></span>
</ol>



## <span style="color:green">Import the packages/modules required for this exercise</span>

We need the packages shown below. They can be installed with either pip or conda.


In [None]:
## Import the modules/packages/libraries required
import math
import numpy as np
import os
import matplotlib.pyplot as plt

from pynhd import NLDI
import urllib.request
import progressbar
import rasterio
import rasterio.plot

import geopandas as gpd
from shapely.geometry import Polygon

from datetime import datetime

from os.path import expanduser

In [None]:
## Print the version number
import pynhd
print("PyNHD version: ",pynhd.__version__)
del pynhd

print("Rasterio version: ",rasterio.__version__)
print("Geopandas version: ",gpd.__version__)

import shapely
print("Shapely version: ",shapely.__version__)
del shapely

## <span style="color:green">Step 1a: Input USGS Site, DEM resolution, and create a directory</span> 
<ul>
<li>Input: <span style="color:red">USGS Site</span></li>
<li>Input: <span style="color:red">Desired resolution</span></li>
<li>Create: <span style="color:red">Folder for storing input raster files from USGS AWS</span></li>
</ul>
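
As a rough sketch of what this step sets up, the block below defines the two inputs and creates the storage folders. The site number and resolution code come from the comments in the code cell of this step. The sub-folder layout is hypothetical, and a temporary directory stands in for the main folder so the sketch does not write to the home directory. The names `folder_input` and `dem_files_store` match the variables used in later cells, but how they are defined here is an assumption.

```python
import os
import tempfile

# Example inputs taken from the comments in the notebook cell
site_id = "04180000"   # USGS site number, kept as a string to preserve leading zeros
resolution = 13        # 13 = 1/3 arc-second DEM on USGS-AWS

# A temporary directory stands in for the notebook's main folder (assumption)
folder_main = tempfile.mkdtemp(prefix="DEM_Access_")

# Hypothetical layout: one folder per site, with a sub-folder for DEM tiles
folder_input = os.path.join(folder_main, f"input_{site_id}")
dem_files_store = os.path.join(folder_input, f"dem_{resolution}")

for path in (folder_input, dem_files_store):
    # exist_ok=True makes the call safe to repeat on an existing folder
    os.makedirs(path, exist_ok=True)
    print(f"Directory ready: {path}")
```

Keeping the site number as a string matters because converting it to an integer would drop the leading zero.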

In [None]:
## Input the USGS site number to get the shapefile
## E.g. "04180000" has a drainage area of 270 sq mi and can be downloaded within 2-3 minutes
## But "03335500" has a drainage area of 7267 sq mi and needs more time and space
## WRITE CODE BELOW


## Resolution of required DEM
## USGS-AWS has different options like 1/3 arc second (code = 13), 1/9 arc second (code = 19; currently unavailable)
## WRITE CODE BELOW


## Define a function that creates a directory only if it does not already exist.
## We define it as a function so that it can be reused for creating three folders in the later modules
def check_create_path_func(path):
    isExist = os.path.exists(path)
    if not isExist:
        # Create a new directory because it does not exist
        os.makedirs(path)
        print(f"The new directory \033[1m'{path}'\033[0m is created!")
    else:
        print(f"The new directory \033[1m'{path}'\033[0m is not created as it already exists!")
        
## Create a folder for storing DEMs using the function defined above
folder_main=f"{expanduser('~')}/scratch/DEM_Access"
check_create_path_func(folder_main)

## WRITE CODE BELOW





## <span style="color:green">Step 1b: Input USGS Site and get the basin</span> 

<ul>
<li>Input: <span style="color:red">USGS station number</span></li>
<li>Output: <span style="color:red">Basin (watershed) for the station</span></li>
<li>Output: <span style="color:red">Basin saved as a shapefile</span></li>
</ul>
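
The drainage area shown in the plot title of the cell below relies on a unit conversion. The watershed is reprojected to Albers Equal Area (EPSG:5070), whose units are meters, so polygon areas come out in square meters. A minimal sketch of the arithmetic follows; the function name and the example area are hypothetical.

```python
SQ_KM_TO_SQ_MI = 0.386102  # 1 square kilometer expressed in square miles

def sq_m_to_sq_mi(area_sq_m):
    """Convert square meters to square miles via square kilometers."""
    area_sq_km = area_sq_m / 1e6
    return area_sq_km * SQ_KM_TO_SQ_MI

# Hypothetical polygon area of 699.3 square kilometers, given in square meters
example_area_sq_m = 699.3e6
print(round(sq_m_to_sq_mi(example_area_sq_m), 1))
```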
    

In [None]:
## Get the watershed for the USGS station number using the pynhd module
## WRITE THE CODE BELOW


## Other feature sources
## 'nwissite' for USGS NWIS Surface Water Sites (default)
## 'comid' for NHDPlus comid.
## 'ca_gages' for Streamgage catalog for CA SB19
## 'gfv11_pois' for USGS Geospatial Fabric V1.1 Points of Interest
## 'huc12pp' for HUC12 Pour Points
## 'nmwdi-st' for New Mexico Water Data Initiative Sites
## 'nwisgw' for NWIS Groundwater Sites
## 'ref_gage' for geoconnex.us reference gages
## 'vigil' for Vigil Network Data
## 'wade' for Water Data Exchange 2.0 Sites
## 'WQP' for Water Quality Portal

## Transform to Albers Equal Area projection (EPSG:5070)
watershed_albers = watershed.to_crs(epsg=5070)
## Calculate the area in square miles
## Dividing by 1e6 converts square meters to square kilometers; 1 square kilometer = 0.386102 square miles
watershed_albers['area_sq_mi'] = watershed_albers.area / 1e6 * 0.386102  
#print(watershed_albers['area_sq_mi'][0])

## Plot the watershed
## DD indicates latitude/longitude in decimal degrees
ax = watershed.plot(facecolor="b", 
                    edgecolor="k", 
                    figsize=(8, 8))
plt.title(f"Watershed Shapefile in {watershed.crs} CRS\n(USGS:{site_id}, "+
          f"Area = {round(watershed_albers['area_sq_mi'].iloc[0],2)} sq. mi.)")
plt.xlabel("Longitude (DD)")
plt.ylabel("Latitude (DD)")

## Saving the watershed file as a shapefile at desired location
shapefile_fileloc_filename=f'{folder_input}/shape_{site_id}.shp'
watershed.to_file(filename=shapefile_fileloc_filename,
                  driver= 'ESRI Shapefile',
                  mode='w')

## <span style="color:green">Step 1c: Creating an Inset Map</span> 

<ul>
<li>Input: <span style="color:red">Watershed shapefile saved in Step 1b, plus US and HUC2 boundary shapefiles</span></li>
<li>Output: <span style="color:red">US map showing the watershed and its HUC2 region, with an inset of the HUC2 region</span></li>
</ul>
    

In [None]:
# Load shapefiles
watershed_map=gpd.read_file(f'{folder_input}/shape_{site_id}.shp')
us_map = gpd.read_file("/srv/shared/data_dem_access/cb_2018_us_conus_5m.shp")
huc_map = gpd.read_file("/srv/shared/data_dem_access/HUC2_modified6.shp")
huc_map_proj=huc_map.to_crs(watershed_map.crs)
## Create a GeoDataFrame of the watershed centroids
watershed_centroid_gdf = gpd.GeoDataFrame(geometry=watershed_map['geometry'].centroid)
## Add other fields/ columns from watershed shapefile by merge
watershed_centroid_gdf = watershed_centroid_gdf.merge(watershed_map.drop(columns='geometry'), 
                                                    left_index=True, 
                                                    right_index=True)
## Join the corresponding HUC2 region with the centroid shapefile
watershed_with_huc = gpd.sjoin(watershed_centroid_gdf, huc_map_proj, 
                               how="left"#, 
                               #predicate='within'
                              )
specific_huc_map = watershed_with_huc[watershed_with_huc['identifier'] == f'USGS-{site_id}']['NAME'].iloc[0]
selected_huc2=huc_map_proj[huc_map_proj['NAME']==specific_huc_map]
## Bounding box
watershed_bbox = watershed_map.total_bounds
print("Watershed Bounding Box: ",watershed_bbox)
## Scaling factors applied to the bounding-box coordinates to draw the rectangular box
xmin_factor = 0.98
xmax_factor = 1.02
ymin_factor = 0.98
ymax_factor = 1.02

## Construct a Polygon from the bounding box
watershed_bbox_polygon = Polygon([(watershed_bbox[0]*xmin_factor, watershed_bbox[1]*ymin_factor),
                        (watershed_bbox[2]*xmax_factor, watershed_bbox[1]*ymin_factor),
                        (watershed_bbox[2]*xmax_factor, watershed_bbox[3]*ymax_factor),
                        (watershed_bbox[0]*xmin_factor, watershed_bbox[3]*ymax_factor)])
watershed_bbox_gdf = gpd.GeoDataFrame(geometry=[watershed_bbox_polygon])
watershed_bbox_gdf.crs = watershed_map.crs
## Plot the main map (US map)
fig, ax = plt.subplots(figsize=(10, 10))
us_map.to_crs(watershed_map.crs).plot(ax=ax, color='lightgrey', edgecolor='black')
selected_huc2.plot(ax=ax, color='lightblue', edgecolor='black')
watershed_map.plot(ax=ax, color='blue', edgecolor=None)
watershed_bbox_gdf.plot(ax=ax, color=None, edgecolor='red',alpha=0.5)
ax.set_xlim(xmin=-135)
ax.set_ylim(ymin=15)
ax.set_title('Inset Map for the watershed')

## Plot the inset HUC2 (HUC2 map)
inset_ax = fig.add_axes([0.16, 0.23, 0.2, 0.2])  # [left, bottom, width, height]
selected_huc2.plot(ax=inset_ax, color='lightblue', edgecolor='black')
watershed_map.plot(ax=inset_ax, color='blue', edgecolor=None)
watershed_bbox_gdf.plot(ax=inset_ax, color=None, edgecolor='red',alpha=0.5)
inset_ax.set_title('HUC2 Map and Watershed')
# Remove axes numbers for the inset map
#inset_ax.axis('off')
plt.show()

## <span style="color:green">Step 2: Get the extents for downloading DEM</span>

<ul>
<li> The extents of the basin (watershed) are obtained using .total_bounds </li>
<li> Then we find the bounding integer extents using the math floor and ceil functions </li>
</ul>
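
The rounding rule can be sketched as a small function. This is only an illustration under the same assumption the notebook makes (negative western longitudes, positive northern latitudes); the function name and the bounding box are hypothetical.

```python
import math

def tile_index_extents(bounds):
    """Return (left, right, bottom, top) integer tile indices.

    bounds is (minx, miny, maxx, maxy) in decimal degrees, the same order
    that .total_bounds uses. Assumes western longitudes and northern latitudes.
    """
    minx, miny, maxx, maxy = bounds
    # Longitudes: floor both sides, because tile w084 spans -84 to -83
    left = abs(math.floor(minx))
    right = abs(math.floor(maxx))
    # Latitudes: ceil both sides, because tile n40 spans +39 to +40
    bottom = abs(math.ceil(miny))
    top = abs(math.ceil(maxy))
    return left, right, bottom, top

# Hypothetical bounding box
example_bounds = (-85.32, 40.61, -84.27, 41.45)
print(tile_index_extents(example_bounds))
```

For this box the left edge at -85.32 falls in tile w086 and the bottom edge at 40.61 falls in tile n41, which is why both longitudes are floored and both latitudes are ceiled.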
    

In [None]:
## Get the min and max of latitude and longitude (or easting and northing)
extents_basin=watershed.total_bounds

## The W and N suffixes below assume a site in the western and northern hemispheres
print(f'Left Bounding Longitude is {extents_basin[0]:.3f}\u00b0 or {abs(extents_basin[0]):.3f}\u00b0 W')
print(f'Right Bounding Longitude is {extents_basin[2]:.3f}\u00b0 or {abs(extents_basin[2]):.3f}\u00b0 W')
print(f'Bottom Bounding Latitude is {extents_basin[1]:.3f}\u00b0 or {abs(extents_basin[1]):.3f}\u00b0 N')
print(f'Top Bounding Latitude is {extents_basin[3]:.3f}\u00b0 or {abs(extents_basin[3]):.3f}\u00b0 N')

In [None]:
## DEM tiles are numbered using integer degrees
## Calculate the largest integer not greater than the left and right bounds (longitudes)
extent_left=abs(math.floor(extents_basin[0]))
extent_right=abs(math.floor(extents_basin[2]))
## You may be tempted to calculate the ceil of the right extent
## But the numbering scheme is such that 84W indicates data from -84 to -83 degrees longitude

## Calculate the smallest integer not less than the bottom and top bounds (latitudes)
extent_bottom=abs(math.ceil(extents_basin[1]))
extent_top=abs(math.ceil(extents_basin[3]))
## Similarly, you may be tempted to calculate the floor of the bottom extent
## But the numbering scheme is again such that 40N includes data from +39 to +40 degrees latitude

## <span style="color:green">Step 3: Find DEM tiles that overlap with the watershed boundary</span>

<ul>
<li> Create a rectangular boundary file using the extents </li>
<li> Make sure the rectangular boundary file has the same projection as the watershed </li>
<li> If the rectangular boundary file overlaps with the watershed, add the lon and lat pair to a list </li>
</ul>
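
Before the overlap test, it can help to see which candidate tiles the integer extents produce and where their corners sit. The sketch below only enumerates the tile grid. It does not perform the polygon intersection, which the cell below handles with geopandas. The function name and the extents are hypothetical, and western longitudes and northern latitudes are assumed.

```python
def candidate_tiles(extent_left, extent_right, extent_bottom, extent_top):
    """List ((lon, lat), corners) for every 1 x 1 degree tile in the extents.

    Corners run anticlockwise from the bottom-left, in decimal degrees.
    """
    tiles = []
    for lon in range(extent_right, extent_left + 1):
        for lat in range(extent_bottom, extent_top + 1):
            corners = [(-lon, lat - 1),      # left bottom
                       (-lon + 1, lat - 1),  # right bottom
                       (-lon + 1, lat),      # right top
                       (-lon, lat)]          # left top
            tiles.append(((lon, lat), corners))
    return tiles

# Hypothetical extents: longitudes 85W to 86W, latitudes 41N to 42N
tiles = candidate_tiles(86, 85, 41, 42)
for (lon, lat), corners in tiles:
    print(f"n{lat:02d}w{lon:03d}: {corners}")
```

The number of candidates equals (left + 1 - right) x (top + 1 - bottom), and the intersection test then keeps only the tiles that actually touch the watershed.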

In [None]:
## Define an empty list to hold lon and lat pairs
overlap_lonlat=[]

## Loop over the candidate tiles, build each tile's rectangular boundary, and check whether it overlaps the watershed
for lon in (range(extent_right,extent_left+1,1)):
    for lat in (range(extent_bottom,extent_top+1,1)):
        ## Defining in anticlockwise direction
        corner_left_bottom=(-lon,lat-1)
        corner_right_bottom=(-lon+1,lat-1)
        corner_right_top=(-lon+1,lat)
        corner_left_top=(-lon,lat)
        ## Create a polygon from the corner points
        rectangular_boundary = Polygon([corner_left_bottom,corner_right_bottom,
                                        corner_right_top,corner_left_top])
        ## Create a GeoDataFrame from the polygon
        rectangular_gdf = gpd.GeoDataFrame(geometry=[rectangular_boundary])
        ## Assign the CRS to watershed's CRS
        rectangular_gdf.crs = watershed.crs
        ## WRITE THE CODE BELOW
        ## Use the overlay operation to find the intersection
        intersection = gpd.overlay(watershed, rectangular_gdf, how='intersection')
        
        ## Check if any intersection and append the lat and lon
        if not intersection.empty:
            #print("The rectangular polygon overlaps with the shapefile.")
            overlap_lonlat.append((lon,lat))     
print("The required lon and lat pairs are: \n",overlap_lonlat)
## Calculate the number of tiles needed to cover the full rectangular extent
num_tiles_download=(((extent_left+1)-extent_right)*((extent_top+1)-extent_bottom))
print(f"\nNumber of tiles required to cover the entire region: {num_tiles_download}")
print(f"Left: {extent_left}, Right: {extent_right}, Bottom: {extent_bottom}, Top: {extent_top}")
print(f"\nNumber of tiles within watershed boundary: {len(overlap_lonlat)}")

In [None]:
## Create a progress bar for monitoring the download process
class MyProgressBar():
    def __init__(self):
        self.pbar = None

    def __call__(self, block_num, block_size, total_size):
        if not self.pbar:
            self.pbar=progressbar.ProgressBar(maxval=total_size)
            self.pbar.start()

        downloaded = block_num * block_size
        if downloaded < total_size:
            self.pbar.update(downloaded)
        else:
            self.pbar.finish()

## <span style="color:green">Step 4a : Sequential download - Downloading the DEM from USGS-Amazon Web Service</span>

<ul>
<li> Create a for loop and download the DEM tiles covering the shapefile </li>
<li> Save them in a folder </li>
</ul>
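
The tile name and download URL follow a fixed pattern, sketched below with the URL template used in the cell that follows. The function name and the example tile are hypothetical, and no download is performed here.

```python
def dem_tile_url(lon, lat, resolution=13):
    """Build the USGS-AWS URL for the tile at lon degrees W and lat degrees N."""
    # Latitude is zero-padded to two digits, longitude to three
    usgs_filename = f"n{lat:02d}w{lon:03d}"
    return ("https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/"
            f"{resolution}/TIFF/current/{usgs_filename}/"
            f"USGS_{resolution}_{usgs_filename}.tif")

url = dem_tile_url(85, 41)
print(url)
# urllib.request.urlretrieve(url, local_path) would fetch the tile;
# it is left out here so the sketch runs without network access.
```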

In [None]:
start_time_seq=datetime.now()

current_filenum=1

# Iterate over the list of tile locations and download each tile
for location in overlap_lonlat:
    print(f"Latitude: {location[1]} N, Longitude: {location[0]} W")

    usgs_filename=f'n{location[1]:02d}w{location[0]:03d}'
    
    print(f'Beginning file download with urllib ({current_filenum}/{len(overlap_lonlat)})...')
    url = (f'https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/{resolution}/TIFF'
           f'/current/{usgs_filename}/USGS_{resolution}_{usgs_filename}.tif'
          )
            
    ## The r in 'fr' disables backslash escape sequence processing
    local_fileloc_filename=fr'{dem_files_store}/USGS_{resolution}_{usgs_filename}.tif'
    
    ## Retrieve the file using the weblink and local path with file name
    print('Data downloaded from : ')
    print(url)
    ## WRITE THE CODE BELOW
    #urllib.request.urlretrieve(url,local_fileloc_filename) #without progressbar for multiple USGS sites
    
     
    print(f"Completed file download ({current_filenum}/{len(overlap_lonlat)}) and saved to '{local_fileloc_filename}'...")
    print(f'*************************************************************************************\n')
    
    current_filenum+=1

end_time_seq=datetime.now()

## <span style="color:green">Step 4b: Threading for faster download - Downloading the DEM from USGS-Amazon Web Service</span>
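The general pattern for threaded downloads is to start one thread per file and then join them all. The sketch below illustrates that pattern with a stand-in worker that sleeps instead of downloading, so it runs without network access. The worker function and the tile names are hypothetical.

```python
import threading
import time

def fake_download(name, results):
    """Stand-in for a download: wait briefly, then record the tile name."""
    time.sleep(0.1)
    results.append(name)  # list.append is safe to call from several threads in CPython

tile_names = ["n41w085", "n41w086", "n42w085"]
results = []
threads = []

for name in tile_names:
    # Create the thread, start it right away, and keep a handle for joining
    thread = threading.Thread(target=fake_download, args=(name, results))
    thread.start()
    threads.append(thread)

# Wait for every thread before using the results
for thread in threads:
    thread.join()

print(sorted(results))
```

Threads suit this task because each download spends most of its time waiting on the network, so the waits can overlap.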

In [None]:
import threading

def download_dem_file_func(usgs_filename, local_fileloc_filename):
    ## Build the URL inside the function so the error message reports the right tile
    url = (f'https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/{resolution}/TIFF'
           f'/current/{usgs_filename}/USGS_{resolution}_{usgs_filename}.tif')
    try:
        #urllib.request.urlretrieve(url, local_fileloc_filename,MyProgressBar())
        print(f'Beginning file download for {usgs_filename}...')
        urllib.request.urlretrieve(url, local_fileloc_filename)
        print(f"Completed file download and saved to '{local_fileloc_filename}'")
    except Exception as e_value:
        print(f"Error downloading {url}: {e_value}")

start_time_thread=datetime.now()

## Create empty list and append the names
usgs_file_list=[]
local_fileloc_filename_list=[]
for location in overlap_lonlat:
    usgs_filename=f'n{location[1]:02d}w{location[0]:03d}'
    usgs_file_list.append(usgs_filename)
    local_fileloc_filename_list.append(fr'{dem_files_store}/USGS_{resolution}_{usgs_filename}.tif')

## Threading for parallel download to reduce time
threads = []
for usgs_file, filename in zip(usgs_file_list, local_fileloc_filename_list):
    #print(url,filename)
    ## WRITE THE CODE BELOW
    
    
    
    

## Wait for all threads to finish to avoid unexpected behavior or incorrect output
for thread in threads:
    thread.join()
end_time_thread=datetime.now()

## <span style="color:green">Step 4c: Time Comparison</span>
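The comparison in the cell below divides one elapsed time by another. Dividing two `timedelta` objects yields a plain float, which here is the speedup of the threaded run over the sequential run. A small sketch with hypothetical timings:

```python
from datetime import timedelta

# Hypothetical elapsed times
time_sequential = timedelta(seconds=90)
time_threaded = timedelta(seconds=30)

# timedelta / timedelta returns a float ratio
speedup = time_sequential / time_threaded
print(f"Speedup: {round(speedup, 1)}")
```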

In [None]:
## Time Comparison

print(f'Time taken for sequential downloading: {end_time_seq-start_time_seq}')
print(f'Time taken for parallel downloading: {end_time_thread-start_time_thread}')
print(f'\nSpeedup (sequential/parallel): {round((end_time_seq-start_time_seq)/(end_time_thread-start_time_thread),1)}')

## <span style="color:green">Step 4d: Plotting the downloaded (single/unmerged) DEMs along with watershed shapefile</span>

<ul>
<li> Plot the single or different DEMs using rasterio package </li>
<li> Also, plot the shapefile of the watershed </li>
</ul>

In [None]:
if (len(overlap_lonlat)>1):
    title=f"Unmerged Raster DEMs\n (USGS: {site_id})"
else:
    title=f"Single Raster DEM\n (USGS: {site_id})"
    
fig, ax = plt.subplots(figsize=(8, 8))

for location in overlap_lonlat:
    usgs_filename=f'n{location[1]:02d}w{location[0]:03d}'
    local_raster_filename=fr'{dem_files_store}/USGS_{resolution}_{usgs_filename}.tif'
    ## Open each tile in a context manager so the file is closed after plotting
    with rasterio.open(local_raster_filename) as raster:
        rasterio.plot.show(raster,
                           ax=ax,
                           cmap='viridis')
## WRITE THE CODE BELOW        



plt.title(title)
plt.xlabel("Longitude (DD)")
plt.ylabel("Latitude (DD)")