<table style="font-size: 1em; padding: 0; margin: 0;">

<tr style="vertical-align: top; padding: 0; margin: 0;background-color: #ffffff">
        <td style="vertical-align: top; padding: 0; margin: 0; padding-right: 15px;">
    <p style="background: #182AEB; color:#ffffff; text-align:justify; padding: 10px 25px;">
        <strong style="font-size: 1.0em;"><span style="font-size: 1.2em;"><span style="color: #ffffff;">The Coastal Grain Size Portal (C-GRASP) dataset <br/><em>Will Speiser, Daniel Buscombe, Evan Goldstein</em></span></span></strong><br/><br/>
        <strong>> Interpolate Percentiles from Other Dataset Percentiles </strong><br/>
    </p>                       
        
<p style="border: 1px solid #ff5733; border-left: 15px solid #ff5733; padding: 10px; text-align:justify;">
    <strong style="color: #ff5733">The purpose of this notebook</strong>  
    <br/><font color=grey> This notebook outputs a dataframe containing all of the data from a chosen C-GRASP dataset, plus a new field holding a cumulative distribution percentile that you wish to calculate or recalculate from the percentile values already in the dataset. Because C-GRASP file sizes vary, the time needed to complete this task depends on internet connectivity.</font><br/>
    <br/><font color=grey> This notebook provides simple code that interpolates the requested percentile from the already calculated values.</font><br/>    
    <br/><font color=grey> To do so, the user chooses a dataset and then types the percentile they wish to calculate. </font><br/>
    <br/><font color=grey> The notebook then uses the SciPy interpolation function to calculate the input percentile in mm.</font><br/>    
    </p>

In [16]:
import pandas as pd
import scipy
from scipy.interpolate import interp1d
import requests
import ipywidgets

#### Select a dataset

In [17]:
#Dataset collection widget
zen=ipywidgets.Select(
    options=['Entire Dataset', 'Estimated Onshore Data', 'Verified Onshore Data', 'Verified Onshore Post 2012 Data'],
    value='Entire Dataset',
    # rows=10,
    description='Dataset:',
    disabled=False
)

display(zen)

Select(description='Dataset:', options=('Entire Dataset', 'Estimated Onshore Data', 'Verified Onshore Data', '…

#### Enter the percentile you want to calculate into the textbox, e.g. 'd86'

In [18]:
dist=ipywidgets.Text(
    value='d86',
    placeholder='Type something',
    description='Distribution:',
    disabled=False
)

display(dist)

Text(value='d86', description='Distribution:', placeholder='Type something')

#### Download the dataset

In [19]:
url = 'https://zenodo.org/record/5874231/files/' 
if zen.value=='Entire Dataset':
    filename='dataset_10kmcoast.csv'
if zen.value=='Estimated Onshore Data':
    filename='Data_EstimatedOnshore.csv'
if zen.value=='Verified Onshore Data':
    filename='Data_VerifiedOnshore.csv'
if zen.value=='Verified Onshore Post 2012 Data':
    filename='Data_Post2012_VerifiedOnshore.csv'
print("Downloading {}".format(url+filename))   

Downloading https://zenodo.org/record/5874231/files/Data_Post2012_VerifiedOnshore.csv
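The chain of `if` statements above can also be written as a dictionary lookup, which fails loudly if the label is not recognised instead of leaving `filename` undefined. This is a minimal sketch, not the notebook's own code: `DATASET_FILES`, `BASE_URL` and `dataset_url` are hypothetical names, while the labels, file names and URL are copied from the cells above.

```python
# Map each widget label to its file name (values copied from the cell above).
DATASET_FILES = {
    'Entire Dataset': 'dataset_10kmcoast.csv',
    'Estimated Onshore Data': 'Data_EstimatedOnshore.csv',
    'Verified Onshore Data': 'Data_VerifiedOnshore.csv',
    'Verified Onshore Post 2012 Data': 'Data_Post2012_VerifiedOnshore.csv',
}
BASE_URL = 'https://zenodo.org/record/5874231/files/'


def dataset_url(label):
    """Return the download URL for a widget label (hypothetical helper)."""
    try:
        return BASE_URL + DATASET_FILES[label]
    except KeyError:
        raise ValueError(f"Unknown dataset: {label!r}") from None


print("Downloading {}".format(dataset_url('Verified Onshore Post 2012 Data')))
```

In the notebook, `dataset_url(zen.value)` would take the place of `url+filename`.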


The next cell will download the C-GRASP dataset and read it in as a pandas dataframe with the variable name `df`.

In [20]:
url=(url+filename)
print('Retrieving Data, Please Wait')
#retrieve data
df=pd.read_csv(url)
print('Sediment Data Retrieved!') 

Retrieving Data, Please Wait
Sediment Data Retrieved!


Let's take a quick look at the top of the file

In [23]:
df.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Unnamed: 0.1.1,Sample_ID,Sample_Type_Code,Project,dataset,Date,Location_Type,latitude,...,d25,d30,d50,d65,d75,d84,d90,d95,Notes,unique_id
0,0,610,610,SPIbeach5,1.0,"SandSnap, image taken by:",sandsnap,2021-11-08,Beach?Y,26.12871,...,0.624976,0.657068,0.785439,0.889342,1.016927,1.131754,1.276942,1.397932,,
1,1,611,611,SPI6,1.0,"SandSnap, image taken by:",sandsnap,2021-11-08,Beach?Y,26.12899,...,0.624976,0.657068,0.785439,0.889342,1.016927,1.131754,1.276942,1.397932,,
2,2,612,612,SPI6,1.0,"SandSnap, image taken by:",sandsnap,2021-11-08,Beach?Y,26.12899,...,0.624976,0.657068,0.785439,0.889342,1.016927,1.131754,1.276942,1.397932,,
3,3,853,853,SPIbeach4,1.0,"SandSnap, image taken by:",sandsnap,2021-11-08,Beach?Y,26.16883,...,0.624976,0.657068,0.785439,0.889342,1.016927,1.131754,1.276942,1.397932,,
4,4,854,854,SPIbeach3,1.0,"SandSnap, image taken by:",sandsnap,2021-11-08,Beach?Y,26.16885,...,0.624976,0.657068,0.785439,0.889342,1.016927,1.131754,1.276942,1.397932,,


The next cell separates the numeric value from the percentile label you entered (e.g. '86' from 'd86') and converts it to a fraction for the calculations in the following cell.

In [24]:
percentile_value=dist.value.split('d')[1]
prcntl=float(percentile_value)/100
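The split above assumes the text box holds a label of exactly the form 'd86'. If you want a clearer error for other input, a small validating parser along the following lines could be used. This is a sketch only: `parse_percentile` is a hypothetical helper, and the rule that the value must lie strictly between 0 and 100 is an assumption, not something the dataset prescribes.

```python
import re


def parse_percentile(label):
    """Convert a label such as 'd86' to a fraction such as 0.86 (hypothetical helper)."""
    match = re.fullmatch(r'[dD](\d+(?:\.\d+)?)', label.strip())
    if match is None:
        raise ValueError(f"Expected a label like 'd86', got {label!r}")
    value = float(match.group(1))
    # Assumption: only percentiles strictly between 0 and 100 are meaningful here.
    if not 0 < value < 100:
        raise ValueError(f"Percentile must be between 0 and 100, got {value}")
    return value / 100


print(parse_percentile('d86'))
```

In the notebook this would be called as `prcntl = parse_percentile(dist.value)`.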

## In this cell you will estimate the input distribution percentile for each sample that has other percentile information available, using the SciPy interpolation function, and add it to a new dataframe column

In [27]:
df[dist.value]='' #create a new blank column for the values calculated in the loop below

#This loop will iterate over each sample in the dataset
for i in range(0,len(df)):
    try:
        #Set variables for the columns of provided distribution percentiles. E.g.:
        d10=df['d10'].iloc[i]
        d16=df['d16'].iloc[i]
        d25=df['d25'].iloc[i]
        d50=df['d50'].iloc[i]
        d65=df['d65'].iloc[i]
        d84=df['d84'].iloc[i]
        d90=df['d90'].iloc[i]
        
        #Here, you are creating a list of the variables you just created. Make sure to include every variable you set above
        grain_size_bins=[d10,d16,d25,d50,d65,d84,d90]
        
        #Here, you are creating a list of the percentile fractions for the respective variables above, in the same order
        grain_size_frequencies=[.1,.16,.25,.5,.65,.84,.9]
        
        #Here we use scipy's interpolation toolbox to create a function that calculates unknown percentiles of interest.
        #fill_value='extrapolate' means requests outside the 0.1-0.9 range are linearly extrapolated
        distribution = scipy.interpolate.interp1d(grain_size_frequencies, grain_size_bins, bounds_error=False, fill_value='extrapolate')
        
        #Here we fill the new column with the grain size at the requested percentile
        #The numerical value extracted from the input text is passed to the scipy interpolation function
        df.loc[i,[dist.value]] = distribution(prcntl)
    except Exception:
        #Skip samples for which the interpolation cannot be computed; their entry stays blank
        pass
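For larger files, the same linear interpolation can be applied to all rows at once rather than one sample at a time. The sketch below is one possible way to do that, not the notebook's own method: `interpolate_percentile` and the small `demo` frame are hypothetical, and the code assumes that every listed percentile column exists and is numeric. Missing values are not handled specially.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


def interpolate_percentile(frame, target, known=(10, 16, 25, 50, 65, 84, 90)):
    """Interpolate the grain size at fraction `target` for every row (hypothetical helper)."""
    columns = [f'd{p}' for p in known]
    fractions = np.array(known, dtype=float) / 100
    # One row per sample, one column per known percentile
    sizes = frame[columns].to_numpy(dtype=float)
    # axis=1 interpolates along the percentile axis independently for each row
    curve = interp1d(fractions, sizes, axis=1,
                     bounds_error=False, fill_value='extrapolate')
    return pd.Series(curve(target), index=frame.index)


# Made-up demonstration data: grain size grows linearly with the percentile
demo = pd.DataFrame({f'd{p}': [p / 50, p / 25]
                     for p in (10, 16, 25, 50, 65, 84, 90)})
demo['d86'] = interpolate_percentile(demo, 0.86)
print(demo['d86'])
```

Unlike the loop, this version raises an error if a percentile column is missing instead of silently skipping the sample, and the new column is stored as floats rather than objects.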

Let's check out that new distribution percentile column

In [28]:
df[dist.value]

0        1.1801496377013507
1        1.1801496377013507
2        1.1801496377013507
3        1.1801496377013507
4        1.1801496377013507
               ...         
2112     0.5860622917060153
2113    0.33590998043052833
2114     0.6092682926829268
2115    0.28173399266666665
2116     1.5095520797141415
Name: d86, Length: 2117, dtype: object
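The value for the first rows can be checked by hand. Since 0.86 lies between the known fractions 0.84 and 0.90, the interpolated value depends only on `d84` and `d90`. The sketch below uses the values for row 0 as displayed by `df.head()` above; those are rounded to six decimals, so the result agrees with the notebook output only to about that precision.

```python
# Values for row 0 as displayed (rounded) by df.head() above
d84, d90 = 1.131754, 1.276942

# Position of 0.86 within the interval [0.84, 0.90]
fraction = (0.86 - 0.84) / (0.90 - 0.84)

# Linear interpolation between the two neighbouring percentiles
d86 = d84 + fraction * (d90 - d84)
print(f"{d86:.3f} mm")
```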

### Write to file

Finally, define a csv file name for the output dataframe

In [None]:
output_csvfile='../data_interpolated.csv'

Write the data to that csv file

In [None]:
#index=False stops pandas writing the row index as an extra 'Unnamed: 0' column
df.to_csv(output_csvfile, index=False)