## **Compute Phi Percentiles From Measurements (from Sieves, etc.)**

## This notebook is a guide to calculating commonly used sediment distribution percentiles (d10, d50, d90, etc.) from datasets that provide only raw measurement data.


* To do so, you (the user) enter a dataset that provides the sieved/measured proportion of each sample in each grain size bin.
* You then enter the percentiles of the distribution that you want to interpolate from the dataset.
* For the most accurate interpolation, enter every grain size bin the dataset provides, as explained later in the notebook.

## <font color=grey> *This notebook's output adds a new column to the input sample DataFrame for each percentile you choose to interpolate.*</font>
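As a minimal sketch of the interpolation idea, consider a small hypothetical sample with four size bins. It assumes, as the notebook's code does, that bins are listed from finest to coarsest, that each bin value is the fraction of the sample between the previous bin size and this one, and that the cumulative curve starts at (0 mm, 0). The bin sizes and fractions are made up for illustration.

```python
import numpy as np

# Hypothetical sample: grain size of each bin (mm, finest to coarsest) and the
# fraction of the sample's mass in each bin (fractions sum to 1)
bins_mm = np.array([0.25, 0.5, 1.0, 2.0])
fractions = np.array([0.1, 0.4, 0.3, 0.2])

# Cumulative fraction finer than each bin size, anchored at (0 mm, 0)
cum_finer = np.hstack((0, np.cumsum(fractions)))
sizes = np.hstack((0, bins_mm))

# Linearly interpolate the grain size at each requested cumulative fraction
prcs = [0.1, 0.5, 0.9]
d10, d50, d90 = np.interp(prcs, cum_finer, sizes)
print(d10, d50, d90)
```

Here d90 falls between the 1.0 mm bin (80% finer) and the 2.0 mm bin (100% finer), so linear interpolation places it halfway between them.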




## Run these two cells to get everything set up:

In [None]:
import pandas as pd
import numpy as np
import requests


In [None]:
# from https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url

def download_file_from_google_drive(id, destination):
    """Download the Google Drive file with the given ID and save it to destination."""
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    """Return the value of a 'download_warning' cookie if Google Drive set one, else None."""
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    """
    response = requests.Response object returned by session.get(..., stream=True)
    destination = path of the output file to write
    """
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)



## **Then Import the Overall Sample Dataset:**
### This example uses the publicly available sediment data from Woodruff et al., 2020

In [None]:
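# Google Drive file ID of the example dataset
DATASET_ID = '1Im9IqxQfEQGdklaSERcurCYXMpU5BJtv'

# Download the file, then read it into a pandas DataFrame
destination = '../data.csv'
download_file_from_google_drive(DATASET_ID, destination)
df = pd.read_csv(destination)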
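The interpolation step compares the requested percentiles (fractions between 0 and 1) with the running total of each sample's bin values, so it implicitly assumes those values are fractions that sum to about 1. If a dataset reports percentages instead, the cumulative curve runs to 100 and every interpolated size would be far too small. The sketch below is a hypothetical helper, `to_fractions`, that is not part of the original notebook; its 2% tolerance and its treatment of blank bins as zero are assumptions.

```python
import numpy as np

def to_fractions(frequencies, tol=0.02):
    """Rescale one sample's bin values so they sum to 1.

    Assumption: the values are either fractions (total near 1) or
    percentages (total near 100); any other total raises a ValueError.
    """
    # Assumption: blank (NaN) bins mean no material in that bin
    freqs = np.nan_to_num(np.asarray(frequencies, dtype=float))
    total = freqs.sum()
    near_one = abs(total - 1) <= tol
    near_hundred = abs(total - 100) <= 100 * tol
    if not (near_one or near_hundred):
        raise ValueError(f"bin values sum to {total:.3f}, expected about 1 or 100")
    # Dividing by the total converts percentages and closes small rounding gaps
    return freqs / total

print(to_fractions([10, 40, 30, 20]))  # [0.1 0.4 0.3 0.2]
```

In the interpolation loop, it could be applied as `grain_size_frequencies = to_fractions(grain_size_frequencies)` before the cumulative sum. Rescaling a total of, say, 0.98 up to 1 shifts the percentiles slightly; raising an error instead is an equally reasonable choice if you prefer to inspect such samples by hand.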



## In the next cell, we use NumPy's interpolation function (`np.interp`) to estimate the sediment size at chosen percentiles of the cumulative distribution

### First, find the column index numbers of the columns in your dataset that contain the measured values for each grain size bin.
### <font color=grey> *These columns are typically named with just a number denoting the grain size. The unit of this grain size is usually given in the metadata of the downloaded sample file. If a column name also includes a unit, open the CSV and rename each column so that it is a unitless float value. In this notebook's example dataset, the column names are in millimeters.*</font>
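Rather than counting columns by hand, one option is to let Python list the columns whose names parse as numbers. This is a sketch with a hypothetical helper, `find_grain_size_columns`; it assumes, as described above, that bin columns are named with unitless float values and that no metadata column has a purely numeric name.

```python
import pandas as pd

def find_grain_size_columns(df):
    """Return (position, size) pairs for columns whose names parse as floats."""
    found = []
    for position, name in enumerate(df.columns):
        try:
            found.append((position, float(name)))
        except (TypeError, ValueError):
            continue  # not a grain size column (e.g. "Sample ID")
    return found

# Hypothetical table with two metadata columns and three grain size bins
example = pd.DataFrame(columns=["Sample ID", "Depth", "0.06", "0.12", "0.25"])
print(find_grain_size_columns(example))  # [(2, 0.06), (3, 0.12), (4, 0.25)]
```

The first and last positions in the returned list are the column numbers to use for the first and last grain size bins.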
    
### For the most accurate estimate, include a bin for every measurement provided in the dataset.
    
### The output of the cell below is the estimated (interpolated) grain size for each percentile you choose in the `prcs` variable.

In [None]:

# Enter the percentiles you wish to calculate, as fractions between 0 and 1.
# The ones provided will enable you to calculate any graphical moment.
prcs = [.05, .1, .16, .25, .3, .5, .75, .84, .9, .95]

# Name of the new column for each percentile (e.g. .16 -> "d16")
prc_names = ["d" + str(round(p * 100)) for p in prcs]

# Position numbers of the first and last columns containing grain size bins.
# Remember that in Python you start counting from zero!
# Some datasets will have more size bins than others. This one has many!
first_bin_col = 9   # first column containing a grain size bin (.06 mm)
last_bin_col = 59   # last column containing a grain size bin (362.04 mm)
bin_cols = df.columns[first_bin_col:last_bin_col + 1]

# Extract the bin size from each column name
grain_size_bins = np.array([float(c) for c in bin_cols])

# Create the output columns up front so every row can be filled in
for name in prc_names:
    df[name] = np.nan

# Loop through each row (sample) in the dataset
for i in df.index:
    # Proportion of this sample in each grain size bin
    grain_size_frequencies = df.loc[i, bin_cols].to_numpy(dtype=float)

    # Interpolate the grain size at each percentile of the cumulative
    # distribution, anchoring the curve at (cumulative fraction 0, size 0)
    prc_values = np.interp(prcs,
                           np.hstack((0, np.cumsum(grain_size_frequencies))),
                           np.hstack((0, grain_size_bins)))

    # Store one value per percentile in the matching dN column
    df.loc[i, prc_names] = prc_values



# Save the DataFrame, now including the new dN columns, back to the CSV file
df.to_csv(destination, index=False)
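The title refers to phi percentiles, and the percentiles in `prcs` are the ones used by graphical grain size statistics. The following is a minimal sketch, assuming the `d` columns are in millimeters, that converts sizes to the Krumbein phi scale (phi = -log2 of the diameter in mm) and computes the Folk and Ward (1957) inclusive graphic statistics. The function names `mm_to_phi` and `folk_ward` are hypothetical. Because phi increases as grains get finer, the phi value at x percent (coarser) corresponds to the notebook's size at (100 - x) percent finer.

```python
import numpy as np

def mm_to_phi(d_mm):
    """Krumbein phi scale: phi = -log2(diameter in mm)."""
    return -np.log2(d_mm)

def folk_ward(d):
    """Folk & Ward graphic statistics from percent-finer sizes in mm.

    `d` maps percent finer to size in mm, e.g. d[16] is the notebook's d16.
    phi[x] below is the phi size with x percent of the sample coarser.
    """
    phi = {x: mm_to_phi(d[100 - x]) for x in (5, 16, 25, 50, 75, 84, 95)}
    mean = (phi[16] + phi[50] + phi[84]) / 3
    sorting = (phi[84] - phi[16]) / 4 + (phi[95] - phi[5]) / 6.6
    skewness = ((phi[16] + phi[84] - 2 * phi[50]) / (2 * (phi[84] - phi[16]))
                + (phi[5] + phi[95] - 2 * phi[50]) / (2 * (phi[95] - phi[5])))
    kurtosis = (phi[95] - phi[5]) / (2.44 * (phi[75] - phi[25]))
    return {"mean": mean, "sorting": sorting,
            "skewness": skewness, "kurtosis": kurtosis}

# Hypothetical sample that is perfectly symmetric in phi (median 1 phi = 0.5 mm)
example_d = {5: 2 ** -2.5, 16: 0.25, 25: 2 ** -1.5, 50: 0.5,
             75: 2 ** -0.5, 84: 1.0, 95: 2 ** 0.5}
stats = folk_ward(example_d)
print(stats)  # mean ~1.0 phi, skewness ~0 for this symmetric sample
```

To apply it to the notebook's output, build the dictionary from each row's `d5`, `d16`, `d25`, `d50`, `d75`, `d84`, and `d95` columns. If your bin sizes are in microns rather than millimeters, divide by 1000 before converting to phi.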