<table style="font-size: 1em; padding: 0; margin: 0;">

<tr style="vertical-align: top; padding: 0; margin: 0;background-color: #ffffff">
        <td style="vertical-align: top; padding: 0; margin: 0; padding-right: 15px;">
    <p style="background: #182AEB; color:#ffffff; text-align:justify; padding: 10px 25px;">
        <strong style="font-size: 1.0em;"><span style="font-size: 1.2em;"><span style="color: #ffffff;">The Coastal Grain Size Portal (C-GRASP) dataset <br/><em>Will Speiser, Daniel Buscombe, Evan Goldstein</em></span></span></strong><br/><br/>
        <strong>&gt; Assign Depth to Sample </strong><br/>
    </p>                       
        
<p style="border: 1px solid #ff5733; border-left: 15px solid #ff5733; padding: 10px; text-align:justify;">
    <strong style="color: #ff5733">The purpose of this notebook</strong>  
    <br/><font color=grey> This notebook outputs a dataframe containing all of the data from a chosen C-GRASP dataset, with a new field containing a depth estimate from the NOAA CUDEM topobathy dataset. As both C-GRASP and CUDEM file sizes vary, the time needed to complete this task will vary with the chosen datasets and your internet connectivity.</font><br/>
    <br/><font color=grey> This notebook provides simple code that estimates a sample's depth based on CUDEM files.</font><br/>    
    <br/><font color=grey> To do so, a user chooses a C-GRASP dataset and a CUDEM resolution.</font><br/>
    <br/><font color=grey> The notebook then downloads all of the CUDEM files of the chosen resolution to the user's computer. Please choose the resolution carefully, as this process can take a long time at the higher resolutions.</font><br/>
    <br/><font color=grey> Then the notebook converts each CUDEM file to a CSV containing the depth value and location of each cell. These CSVs are then combined into one dataframe.</font><br/>
    <br/><font color=grey> After the CUDEM data conversion, the chosen C-GRASP dataset is downloaded and converted to a dataframe.</font><br/>
        <br/><font color=grey> Finally, the two datasets are converted to GeoPandas GeoDataFrames and joined by proximity, assigning each downloaded C-GRASP sample a depth from the nearest CUDEM cell.</font><br/>
    </p>

In [None]:
import requests
from bs4 import BeautifulSoup
import netCDF4
import pandas as pd
import os
import glob
import wget  #used later to download the CUDEM tiles
import ipywidgets
import geopandas as gpd
%matplotlib inline
import matplotlib.pyplot as plt


## Select a dataset. Choose your CUDEM dataset mindfully as the higher resolution files can take significantly longer to download.

In [None]:
#Dataset collection widget
zen=ipywidgets.Select(
    options=['Entire Dataset', 'Estimated Onshore Data', 'Verified Onshore Data', 'Verified Onshore Post 2012 Data'],
    value='Entire Dataset',
    # rows=10,
    description='Dataset:',
    disabled=False
)

display(zen)

#CUDEM resolution widget
resc=ipywidgets.Select(
    options=['3 arc-second', '1 arc-second', '1/3 arc-second', '1/9 arc-second'],
    value='3 arc-second',
    # rows=10,
    description='CUDEM:',
    disabled=False
)

display(resc)



## Here, we download the chosen CUDEM dataset

First, we grab the appropriate abbreviation to put in the download URL

In [None]:
#map each widget label to the abbreviation used in the CUDEM tile URLs
res_codes = {
    '3 arc-second': '3as',
    '1 arc-second': '1as',
    '1/3 arc-second': '13as',
    '1/9 arc-second': '19as',
}
res = res_codes[resc.value]

### Then we download your data to file. Depending on the resolution you pick and your download speeds this can take from minutes to nearly a day, so proceed with caution!

Here we find all the file download links

In [None]:
def get_url_paths(url, ext='', params=None):  #function for extracting all the file download link names
    response = requests.get(url, params=params)
    response.raise_for_status()  #stop here with an error if the request failed
    soup = BeautifulSoup(response.text, 'html.parser')
    #keep every link whose target ends with the requested extension (skip <a> tags with no href)
    parent = [url + node.get('href') for node in soup.find_all('a', href=True) if node.get('href').endswith(ext)]
    return parent

url = 'https://www.ngdc.noaa.gov/thredds/catalog/tiles/tiled_'+res+'/catalog.html' #catalogue we will call them from with the appropriate resolution
ext = 'nc' #Only find the .nc CUDEM files
result = get_url_paths(url, ext) #call the scraping function
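
The link filter can be tried offline before contacting the NOAA server. The sketch below applies the same rule (keep every `href` that ends with the extension) to a small hand-written HTML fragment, using the standard-library `html.parser` in place of BeautifulSoup. The fragment and the tile file names are made up for illustration, and `LinkCollector` and `extract_links` are hypothetical helpers that are not part of this notebook.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that ends with a given extension."""

    def __init__(self, ext=""):
        super().__init__()
        self.ext = ext
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Anchors without an href are skipped instead of raising an error.
        if href and href.endswith(self.ext):
            self.links.append(href)


def extract_links(html, ext=""):
    """Return all matching hrefs found in an HTML string."""
    parser = LinkCollector(ext)
    parser.feed(html)
    return parser.links


# Made-up catalog fragment; real tile names will differ.
sample_html = """
<a href="catalog.html?dataset=tiles/tiled_3as/example_tile_a.nc">a</a>
<a href="catalog.html?dataset=tiles/tiled_3as/example_tile_b.nc">b</a>
<a href="readme.txt">readme</a>
<a name="top">no href</a>
"""

links = extract_links(sample_html, ext="nc")
# The file name is whatever follows the last "/" in each link.
filenames = [link.rsplit("/", 1)[-1] for link in links]
print(filenames)
```

Note that an extension of `'nc'` matches any link ending in those two letters; passing `'.nc'` would be a stricter filter.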

Here we will use all of the links found above to download all of your data.

In [None]:
nc_folder='../' #Path name for CUDEM files
#Note: nc_folder is not used below. wget saves each file to the current working directory,
#which is also where the conversion cell looks for the .nc files

#Make links into df
link_df = pd.DataFrame({'link': result})

#the file name is the last '/'-separated piece of each link
link_df['filename']=link_df['link'].str.split('/').str[-1]

#download each file in turn. This will all download to your current directory
base_url='https://www.ngdc.noaa.gov/thredds/fileServer/tiles/tiled_'+res+'/' #base url for download
for file_name in link_df['filename']:
    dwnld_link=base_url+file_name
    wget.download(dwnld_link)
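
Because a full download can take hours, it may help to make the loop restartable. The sketch below shows one possible approach: before downloading, filter the list of file names down to those that are missing or empty in the target folder. `files_still_needed` is a hypothetical helper, and the file names are made up; it uses only the standard library.

```python
import os
import tempfile


def files_still_needed(filenames, folder):
    """Return the file names that are missing or empty in `folder`."""
    needed = []
    for name in filenames:
        path = os.path.join(folder, name)
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            needed.append(name)
    return needed


# Demonstration in a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as folder:
    # Pretend the first tile was downloaded in an earlier run.
    with open(os.path.join(folder, "example_tile_a.nc"), "wb") as f:
        f.write(b"already downloaded")
    todo = files_still_needed(["example_tile_a.nc", "example_tile_b.nc"], folder)
    print(todo)
```

This check cannot tell a complete file from a partially downloaded one, so a file left behind by an interrupted download should be deleted by hand before restarting.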
    



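### This cell converts all of the downloaded CUDEM .nc files into .csv files with fields for latitude, longitude, depth, and crs. This may take a while, especially with larger files.

In [None]:
#Convert downloaded nc files to csv
import numpy as np  #needed to pair every grid cell with its coordinates

csv_folder='../' #set the path for csv conversion downloads

n_converted=0
for filename in os.listdir():
    if not filename.endswith(".nc"):  #only look at the .nc files in your directory
        continue
    try:
        with netCDF4.Dataset(os.path.join(os.getcwd(), filename), mode='r') as nc:
            #establish naming components
            file_name_no_ext=os.path.splitext(filename)[0]
            out_name=csv_folder+file_name_no_ext+'.csv'
            #lat and lon are 1D axes and Band1 is a 2D grid, assumed here to be ordered (lat, lon),
            #so build one row per grid cell
            lon_grid, lat_grid = np.meshgrid(np.asarray(nc.variables['lon'][:]), np.asarray(nc.variables['lat'][:]))
            depth = np.ma.filled(nc.variables['Band1'][:].astype(float), np.nan)  #missing cells become NaN
            df = pd.DataFrame({'latitude': lat_grid.ravel(),
                               'longitude': lon_grid.ravel(),
                               'depth': depth.ravel()})
            #record the grid mapping name stored on the crs variable (empty if absent)
            df['crs']=getattr(nc.variables['crs'], 'grid_mapping_name', '')
            df.to_csv(out_name, index=False)
            n_converted+=1
    except Exception as err:
        #report the problem instead of silently skipping the file
        print('Could not convert {}: {}'.format(filename, err))
print('Converted {} file(s) to csv'.format(n_converted))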
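
Gridded elevation files typically store latitude and longitude as two 1-D axes and the elevation as a 2-D array, so producing a table with one row per cell means pairing every grid value with its coordinates. The sketch below illustrates that flattening in plain Python on toy values (not real CUDEM data), assuming the grid is indexed as `grid[i][j]` for `lats[i]` and `lons[j]`. `grid_to_rows` is a hypothetical helper.

```python
def grid_to_rows(lats, lons, grid):
    """Flatten a 2-D grid, indexed grid[i][j] -> (lats[i], lons[j]), into rows."""
    if len(grid) != len(lats) or any(len(row) != len(lons) for row in grid):
        raise ValueError("grid shape must be (len(lats), len(lons))")
    return [
        (lat, lon, grid[i][j])
        for i, lat in enumerate(lats)
        for j, lon in enumerate(lons)
    ]


# Toy 2 x 3 grid: two latitudes, three longitudes.
lats = [40.0, 40.5]
lons = [-74.0, -73.5, -73.0]
depths = [
    [-12.0, -8.0, -3.0],
    [-10.0, -5.0, 1.5],
]

rows = grid_to_rows(lats, lons, depths)
for row in rows:
    print(row)
```

With NumPy arrays, `numpy.meshgrid` followed by `ravel` produces the same pairing far more efficiently for large tiles.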

This cell combines all of the CSVs into one dataframe

In [None]:
#merge csv's into one df
os.chdir(csv_folder)

#create list of files in folder
extension = 'csv'
all_filenames = glob.glob('ncei*.{}'.format(extension))
#combine all files in the list, renumbering the rows so the index is unique
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames], ignore_index=True)

cudem_df=combined_csv 



Let's take a look at that combined file

In [None]:
cudem_df.head()

This cell deletes the uncombined per-tile CSVs to clean up your folder. The raw CUDEM .nc files stay in the folder they were downloaded to.

In [None]:
#the merge cell changed the working directory to csv_folder, so the csv's are in the current folder
for filename in os.listdir():
    if filename.startswith("ncei") and filename.endswith(".csv"):  #find the per-tile cudem csv's
        try:
            os.remove(filename)
        except OSError as err:
            print('Could not delete {}: {}'.format(filename, err))
print("Source CSV's deleted")

#### Download the sample dataset

In [None]:
url = 'https://zenodo.org/record/6099266/files/' 
if zen.value=='Entire Dataset':
    filename='dataset_10kmcoast.csv'
if zen.value=='Estimated Onshore Data':
    filename='Data_EstimatedOnshore.csv'
if zen.value=='Verified Onshore Data':
    filename='Data_VerifiedOnshore.csv'
if zen.value=='Verified Onshore Post 2012 Data':
    filename='Data_Post2012_VerifiedOnshore.csv'
print("Downloading {}".format(url+filename))   


The next cell will download the C-GRASP dataset and read it in as a pandas dataframe with variable name `sample_df`

In [None]:
data_url=url+filename #keep `url` unchanged so this cell can be re-run safely
print('Retrieving Data, Please Wait')
#retrieve data
sample_df=pd.read_csv(data_url)
print('Sediment Data Retrieved!') 


Let's take a quick look at the top of the file

In [None]:
sample_df.head()

## Now let's make use of both datasets to assign depths from CUDEM to C-GRASP samples

Turn the sample and CUDEM datasets into GeoDataFrames (spatial data), set their CRS to EPSG:4326 (latitude/longitude), and then reproject them to a projected coordinate system (EPSG:3857)

In [None]:
#convert the CUDEM dataframe

cudem_gdf = gpd.GeoDataFrame(
    cudem_df, geometry=gpd.points_from_xy(cudem_df.longitude, cudem_df.latitude))

cudem_gdf=cudem_gdf.set_crs('EPSG:4326')
cudem_gdf=cudem_gdf.to_crs('EPSG:3857')


#convert the C-GRASP dataframe
sample_gdf = gpd.GeoDataFrame(
    sample_df, geometry=gpd.points_from_xy(sample_df.longitude, sample_df.latitude))

sample_gdf=sample_gdf.set_crs('EPSG:4326')
sample_gdf=sample_gdf.to_crs('EPSG:3857')
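
To give a feel for what the reprojection does, the sketch below implements the standard spherical Web Mercator formulas that EPSG:3857 is based on, using only the `math` module. It is an illustration, not GeoPandas' own code path, and `lonlat_to_web_mercator` and `mercator_scale_factor` are hypothetical helper names. The scale factor shows that distances measured in EPSG:3857 are stretched by roughly 1/cos(latitude), so they are useful for ranking nearby points but are not true ground distances away from the equator.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to spherical Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_scale_factor(lat_deg):
    """Factor by which Mercator stretches local distances at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


# Arbitrary example point, chosen only for illustration.
x, y = lonlat_to_web_mercator(-75.0, 35.0)
print(x, y)
print(mercator_scale_factor(35.0))
```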



This cell uses GeoPandas' `sjoin_nearest` function to join each sediment sample with its nearest CUDEM depth measurement.

In [None]:
joined = sample_gdf.sjoin_nearest(cudem_gdf, how="left")

joined=joined.to_crs('EPSG:4326') #convert back to epsg 4326

df=pd.DataFrame(joined)
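
Conceptually, a nearest join pairs every sample with the grid point at the smallest distance and copies that point's depth across. The brute-force sketch below shows the idea on toy projected coordinates; it is not how GeoPandas implements `sjoin_nearest`, which relies on a spatial index, and `nearest_join` is a hypothetical helper.

```python
import math


def nearest_join(samples, grid_points):
    """For each (x, y) sample, return (index of nearest grid point, distance)."""
    matches = []
    for sx, sy in samples:
        best_index, best_dist = None, math.inf
        for k, (gx, gy, _depth) in enumerate(grid_points):
            d = math.hypot(sx - gx, sy - gy)
            if d < best_dist:
                best_index, best_dist = k, d
        matches.append((best_index, best_dist))
    return matches


# Toy data in projected metres: grid points are (x, y, depth).
grid_points = [(0.0, 0.0, -5.0), (100.0, 0.0, -2.0), (100.0, 100.0, 1.0)]
samples = [(10.0, 5.0), (95.0, 80.0)]

matches = nearest_join(samples, grid_points)
sample_depths = [grid_points[k][2] for k, _ in matches]
print(sample_depths)
```

Keeping the match distance is useful for quality control: a sample whose nearest grid point is far away probably lies outside the downloaded tiles. `sjoin_nearest` accepts `distance_col` and `max_distance` arguments for recording and limiting that distance.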


Rename the fields in the dataframe to be more recognizable (i.e., turn the "left" and "right" fields into "sample" and "cudem")

In [None]:

#rename some of the new field names
df=df.rename(columns={
    'latitude_left': 'latitude_sample',
    'latitude_right': 'latitude_cudem',
    'longitude_left': 'longitude_sample',
    'longitude_right': 'longitude_cudem',
})

#drop the geometry and crs columns
df=df.drop(columns=['geometry','crs'])

Let's take a look to see how that all worked out:

In [None]:
df

Let's look at a relationship such as depth vs. d50 grain size

In [None]:
plt.plot(df['depth'], df['d50'], 'o', color='black')
plt.xlabel('depth')
plt.ylabel('d50');

### Write to file

Finally, define a csv file name for the output dataframe

In [None]:
output_csvfile='../data_CudemDepths.csv'

Write the data to that csv file

In [None]:
df.to_csv(output_csvfile)