## **Compute Sample Cohesive, Sand, and Coarse Percent Fraction**

## This notebook is a guide to calculating the cohesive, sand, and coarse percentages for each sample in a sample dataset. It works on a dataset with measured distribution percentiles (e.g., d10, d50, d90). To calculate distributions, see the sample_compute_percentile repo.

* First, the user supplies a dataset that provides the percentile distribution values (e.g., d10, d50, d90).

* The user then enters the names of the distributions that were provided (e.g., d10, d50, d90).

* The notebook then loops over the samples, interpolates each sample's cumulative distribution, and finds the fraction of the sample that is finer than the minimum and coarser than the maximum Wentworth sand grain size, both in millimeters.
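
The interpolation step can be illustrated for a single sample. This is a minimal sketch in plain Python, not the notebook's implementation: the helper `interp_linear` and the sample values are hypothetical, and the 0.063 mm and 2.0 mm limits are the ones used in the calculation cell further down. It mimics linear interpolation that extrapolates from the end segments.

```python
def interp_linear(x, xs, ys):
    """Linearly interpolate y at x, extrapolating from the end segments.

    xs must be strictly increasing; xs and ys must have the same length.
    """
    if x <= xs[0]:
        i = 0                      # extrapolate from the first segment
    elif x >= xs[-1]:
        i = len(xs) - 2            # extrapolate from the last segment
    else:
        # index of the segment whose left edge is the largest xs[k] <= x
        i = max(k for k in range(len(xs) - 1) if xs[k] <= x)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + slope * (x - xs[i])


# Hypothetical sample: grain sizes (mm) measured at d10, d50, d90
sizes_mm = [0.1, 0.5, 2.5]
cum_fraction = [0.10, 0.50, 0.90]

# Cumulative fraction finer than each sand limit
finer_than_sand_max = interp_linear(2.0, sizes_mm, cum_fraction)
finer_than_sand_min = interp_linear(0.063, sizes_mm, cum_fraction)
print(finer_than_sand_min, finer_than_sand_max)
```

For this made-up sample, roughly 6% of the material is finer than 0.063 mm and roughly 80% is finer than 2.0 mm.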


## <font color=grey> *This notebook's output adds three new fields to the input sample dataframe that specify the percent cohesive, sand, and coarse of each sample's composition. To calculate interpolation error or to translate sample data from phi to mm units, see the other notebooks within this repository.*</font>


## Run these two cells to get everything set up. The second cell is only needed in this example, to pull the example data from Google Drive:

In [None]:
import pandas as pd
import numpy as np
import scipy
from scipy.interpolate import interp1d
import requests

In [None]:
# from https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    """
    response = streamed requests.Response object to read from
    destination = file path to write the downloaded content to
    """    
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)


DATASET_ID = '1p_x1boL12i-Yt1Eou845vf4CbaOEEOLr'


destination = '../data.csv'
download_file_from_google_drive(DATASET_ID, destination)
df= pd.read_csv(destination)

## Run this cell with your own data instead of the example data by entering the filepath of a desired file. If you want to use the example data, skip it.

In [None]:
# Enter the path to your file here
filepath='user/your_folder/your_file.csv'


df= pd.read_csv(filepath)

## In this cell, specify the names of the distributions given in the sample data (e.g., d10, d50, d90) in the "distributions" variable. Our example data from SandSnap provides d10, d16, d25, d50, d65, d75, d84, and d90.


In [None]:

distributions = 'd10,d16,d25,d50,d65,d75,d84,d90'

# Extract the distribution names and their cumulative fractions
# as provided with the source dataset (e.g., 'd50' and 0.5)
given_dist_names = []
given_dist_vals = []

for name in distributions.split(','):
    name = name.strip()  # tolerate spaces after the commas
    given_dist_names.append(name)
    # 'd50' -> 50 -> 0.5
    given_dist_vals.append(int(name.split('d')[1]) / 100)

## Run this cell to calculate the cohesive, sand, and coarse fractions. It iterates through each sample and interpolates that sample's cumulative distribution. It then takes the cumulative fraction at the minimum (0.063 mm) and maximum (2.0 mm) Wentworth sand grain sizes, clamps each to the range 0 to 1, and assigns the cohesive, sand, and coarse fractions accordingly. The results are stored as fractions between 0 and 1.
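
The clamping and splitting logic can be checked on its own. This is a minimal sketch in plain Python under stated assumptions: the function name `split_fractions` and the input values are hypothetical, and the inputs stand for the interpolated cumulative fractions finer than 0.063 mm and 2.0 mm.

```python
def clamp01(value):
    """Limit a value to the range [0, 1]."""
    return min(max(value, 0.0), 1.0)


def split_fractions(finer_than_063, finer_than_2):
    """Return (cohesive, sand, coarse) fractions from two cumulative fractions."""
    cohesive = clamp01(finer_than_063)
    # Sand is what lies between the two limits
    sand = clamp01(finer_than_2) - cohesive
    # Whatever remains is coarser than 2.0 mm
    coarse = 1.0 - sand - cohesive
    return cohesive, sand, coarse


# Extrapolation can return a slightly negative value below the finest
# measured percentile; clamping turns it into 0% cohesive.
cohesive, sand, coarse = split_fractions(-0.02, 0.8)
print(cohesive, sand, coarse)
```

With these inputs the sample comes out as about 0% cohesive, 80% sand, and 20% coarse, and the three fractions always sum to 1.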

In [None]:

for i in range(len(df)):  # repeat for each row (i.e., each sample)
    # Collect this sample's grain size (mm) at each provided percentile
    grain_size_bins = [df[name].iloc[i] for name in given_dist_names]
    grain_size_frequencies = given_dist_vals

    # Linearly interpolate (and extrapolate) the cumulative distribution:
    # grain size (mm) -> cumulative fraction finer
    p = interp1d(grain_size_bins, grain_size_frequencies, bounds_error=False, fill_value='extrapolate')

    # Cumulative fraction finer than the maximum (2.0 mm) and minimum (0.063 mm)
    # Wentworth sand grain sizes, stored in new columns
    df.loc[df.index[i], "%Sand"] = float(p(2.0))
    df.loc[df.index[i], "%Cohesive"] = float(p(0.063))








# Clamp both cumulative fractions to the range [0, 1]. Extrapolation can push
# them below 0 or above 1 when 0.063 mm or 2.0 mm falls outside the measured percentiles.
df['%Cohesive'] = df['%Cohesive'].clip(0, 1)
df['%Sand'] = df['%Sand'].clip(0, 1)

# Final sand fraction: fraction finer than 2.0 mm minus the cohesive fraction.
# If the sample is 100% cohesive, sand is 0.
df['%Sand'] = df['%Sand'] - df['%Cohesive']

# The remaining fraction is coarse (gravel)
df['%Coarse'] = 1 - df['%Sand'] - df['%Cohesive']

## Run this cell if you want to export the data back to CSV. In the "out_path" variable, specify where the file should be saved and what it should be named.


In [None]:
# Enter the desired output path
out_path = 'user/your_folder/your_converted_file.csv'

# index=False avoids writing the dataframe's row index as an extra column
df.to_csv(out_path, index=False)