## **Compute Percentiles From Percentiles**

## This notebook shows how to estimate sediment grain-size percentiles that a dataset does not provide from the percentiles it *does* provide (e.g., calculate d5 from the provided d10, d50, and d90 values).


* To do so, you supply a dataset that already contains grain-size values for some percentiles (e.g., d10, d50, d90).


* You then specify the percentiles you want estimated from those values (e.g., d5, d30).

* This example uses data from SandSnap.
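
As a quick illustration of the idea, here is a minimal sketch for a single sample. The three grain sizes are invented for the example (they are not SandSnap values), and straight-line interpolation between the known points is assumed, which is what the estimation cell further down uses.

```python
from scipy.interpolate import interp1d

# Made-up grain sizes for one sample (illustrative values only)
known_fractions = [0.10, 0.50, 0.90]   # d10, d50, d90 as cumulative fractions
known_sizes = [0.2, 0.5, 1.0]          # grain size at each fraction

# Linear interpolator: cumulative fraction in, grain size out.
# fill_value="extrapolate" lets it answer outside the 0.10 to 0.90 range.
distribution = interp1d(known_fractions, known_sizes,
                        bounds_error=False, fill_value="extrapolate")

d30 = float(distribution(0.30))   # interpolated between d10 and d50
d5 = float(distribution(0.05))    # extrapolated below d10

print("d30 =", d30)
print("d5  =", d5)
```

With these made-up numbers, d30 comes out at about 0.35 and d5 at about 0.16. Note that d5 lies outside the known range, so it is an extrapolation rather than an interpolation and deserves more caution.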


    
    

## <font color=grey> *This notebook's output adds a new column to the input sample dataframe for each percentile that the user wants to estimate. The estimates will be more accurate for datasets that provide more percentiles to begin with. See also sample_compute_interpolation_error.*</font>


## Run these two cells to get everything set up:

In [None]:
import pandas as pd
import numpy as np
import scipy
from scipy.interpolate import interp1d
import requests

In [None]:
# from https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    """
    response = streamed requests.Response object to read from
    destination = path of the file to write
    """
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)



## **Then import the sample dataset:**

### For this example we will be using sediment data from SandSnap


In [None]:

DATASET_ID = '1p_x1boL12i-Yt1Eou845vf4CbaOEEOLr'


destination = '../data.csv'
download_file_from_google_drive(DATASET_ID, destination)
df = pd.read_csv(destination)
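
Before estimating anything, it can help to confirm that the dataframe has the columns the next cell expects. The sketch below is one possible check, not part of the original workflow: the helper name `check_percentile_columns` is hypothetical, and the column list is simply the one used in the estimation cell. It is shown on a small made-up dataframe rather than the downloaded data.

```python
import numpy as np
import pandas as pd

# Columns the estimation cell reads (taken from that cell)
REQUIRED = ["d10", "d16", "d25", "d50", "d65", "d84", "d90"]


def check_percentile_columns(df, required=REQUIRED):
    """Raise if a column is missing; return the index labels of suspect rows.

    A row is suspect when its grain sizes do not increase (or stay equal)
    from one percentile to the next, or when any value is NaN.
    """
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"missing percentile columns: {missing}")

    values = df[required].to_numpy(dtype=float)
    # NaN differences compare as False, so rows with NaN are flagged too
    ok = (np.diff(values, axis=1) >= 0).all(axis=1)
    return df.index[~ok]


# Made-up data: row 0 is fine, row 1 is out of order, row 2 has a gap
toy = pd.DataFrame(
    [[0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9],
     [0.1, 0.2, 0.3, 0.2, 0.6, 0.8, 0.9],
     [0.1, 0.2, np.nan, 0.5, 0.6, 0.8, 0.9]],
    columns=REQUIRED,
)
bad_rows = check_percentile_columns(toy)
print(list(bad_rows))
```

On the real data you would call `check_percentile_columns(df)` and inspect or drop the flagged rows before interpolating.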


## In this cell you will estimate the percentiles that are missing from the dataset using the ones it provides.

In [None]:
# Loop over every sample (row) in the dataset
for i in range(len(df)):
    # Read the provided percentile values for this sample. E.g.:
    d10 = df['d10'].iloc[i]
    d16 = df['d16'].iloc[i]
    d25 = df['d25'].iloc[i]
    d50 = df['d50'].iloc[i]
    d65 = df['d65'].iloc[i]
    d84 = df['d84'].iloc[i]
    d90 = df['d90'].iloc[i]

    # Collect the grain sizes in a list. Include every variable you set above.
    grain_size_bins = [d10, d16, d25, d50, d65, d84, d90]

    # List the cumulative fraction for each grain size above, in the same order.
    grain_size_frequencies = [.1, .16, .25, .5, .65, .84, .9]

    # Use scipy's interpolation toolbox to build a function that returns the grain size
    # at any cumulative fraction. Fractions outside .1 to .9 are linearly extrapolated.
    distribution = interp1d(grain_size_frequencies, grain_size_bins, bounds_error=False, fill_value='extrapolate')

    # Create a new column for each percentile you want to estimate.
    # Put the column name on the left side of the equation and pass the matching
    # fraction to "distribution" inside the parentheses on the right.
    # E.g., for "d5" enter "distribution(.05)".
    # df.index[i] turns the row position into its label, so this also works
    # when the dataframe does not have the default 0, 1, 2, ... index.
    df.loc[df.index[i], "d5"] = float(distribution(.05))
    df.loc[df.index[i], "d30"] = float(distribution(.3))
    df.loc[df.index[i], "d75"] = float(distribution(.75))
    df.loc[df.index[i], "d95"] = float(distribution(.95))
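
If you want to change which percentiles are known or estimated without editing the loop, the same linear interpolation can be wrapped in a function. This is only a sketch under stated assumptions: `estimate_percentiles` is a hypothetical helper name, the two dictionaries are an illustrative way of passing column names and fractions, and the data below are made up. It relies on the `axis` argument of `interp1d` to interpolate all rows in one call.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


def estimate_percentiles(df, known, targets):
    """Return a copy of df with one new column per target percentile.

    known   -- dict of existing column name -> cumulative fraction,
               e.g. {"d10": 0.10, "d50": 0.50, "d90": 0.90}
    targets -- dict of new column name -> cumulative fraction
    """
    cols = sorted(known, key=known.get)            # order columns by fraction
    fractions = np.array([known[c] for c in cols])
    sizes = df[cols].to_numpy(dtype=float)         # shape (n_samples, n_known)

    # axis=1 treats each row as its own curve, so one call covers all samples
    f = interp1d(fractions, sizes, axis=1,
                 bounds_error=False, fill_value="extrapolate")

    out = df.copy()
    for name, frac in targets.items():
        out[name] = f(frac)                        # one value per sample
    return out


# Made-up data for two samples (illustrative values only)
toy = pd.DataFrame({"d10": [0.2, 0.1], "d50": [0.5, 0.3], "d90": [1.0, 0.7]})
result = estimate_percentiles(
    toy,
    known={"d10": 0.10, "d50": 0.50, "d90": 0.90},
    targets={"d5": 0.05, "d30": 0.30, "d95": 0.95},
)
print(result)
```

In this notebook the provided percentiles span d10 to d90, so d5 and d95 fall outside that range and are linear extrapolations. It is worth checking those columns for implausible values (for example, negative grain sizes) before using them.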
