# Alignment of SMP and Snow Pits / Evaluation of Proksch (P15) coefficients
*Josh King, Environment and Climate Change Canada, 2019*

This workbook introduces a calibration procedure for snow on sea ice, applied to the SMP-derived density estimates first introduced in [Proksch et al., 2015](https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2014JF003266). Where indicated, the work modifies portions of the [SMP python package from SLF](https://github.com/slf-dot-ch/snowmicropyn) and uses a number of open source community packages to facilitate processing.

I'm still not great at GIS in python so the maps in the publication were done in ESRI ArcMap.

### ***Alignment takes a long time due to the large number of scaling candidates. If you want to skip that part and just load the result set, set `skip_alignment` below to `True`.***

### Notes on settings and constants
**CUTTER_SIZE** defines the half height in mm of the density cutter used as reference. Can be changed to accommodate different sampler sizes. No need to change this for ECCC data. 

**WINDOW_SIZE** defines the size of the rolling window used in SLF shot noise calculations. A 5 mm window was used in [Proksch et al., 2015](https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2014JF003266) when there was separation between the SMP and density cutter. Increasing the window reduces sensitivity to sharp transitions and reduces the resolution of the analysis. However, moving to something like 2.5 mm makes comparison difficult because some of the very fine structure it resolves can differ substantially over the ~10 cm separation between the SMP and density profiles. 

**NUM_TESTS** defines how many random scaling configurations to test against when attempting to align the SMP and snow pit data. We brute-force the alignment in our paper so `NUM_TESTS` must be large to ensure the test space searched is sufficient. A lower number of tests risks poor alignment and therefore poor calibration. In the paper we use 10k permutations.

**MAX_STRETCH_LAYER** and **MAX_STRETCH_OVERALL** define how much an individual layer can be eroded or dilated, and the maximum change in total length of the SMP profile, respectively. With the constants set below, we allow a rather large 75% change to individual layers to accommodate pinching out but restrict the total change to 15% to avoid overfitting.
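
To make the two stretch limits concrete, here is a minimal sketch of how a single scaling candidate could be drawn. It is an assumption for illustration only: `sketch_random_stretch` is a hypothetical stand-in and not the implementation of `smpfunc.random_stretch`, which is not shown in this workbook. Each layer gets its own multiplicative factor within the per-layer limit, and a candidate is rejected when the implied change in total length exceeds the overall limit.

```python
import numpy as np


def sketch_random_stretch(num_layers, max_overall, max_layer, rng, max_tries=10_000):
    """Hypothetical stand-in for smpfunc.random_stretch (not the real code)."""
    for _ in range(max_tries):
        # One multiplicative factor per layer, each within +/- max_layer of 1
        factors = 1.0 + rng.uniform(-max_layer, max_layer, size=num_layers)
        # Layers have equal height, so the mean factor is the relative total length
        if abs(factors.mean() - 1.0) <= max_overall:
            return factors
    raise RuntimeError("no candidate satisfied the overall stretch limit")


rng = np.random.default_rng(2019)
candidate = sketch_random_stretch(num_layers=6, max_overall=0.15, max_layer=0.75, rng=rng)
print(candidate.round(2), candidate.mean().round(3))
```

Rejection sampling is only one way to honour both limits; it keeps the per-layer draws independent at the cost of discarding some candidates.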

**H_RESAMPLE** and **L_RESAMPLE** define the resampled resolution of the SMP and the layer size used for matching profiles. These terms interact with the layer stretching and should be evaluated carefully if changed.
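
As a small illustration of how the two resampling constants interact, the sketch below resamples a synthetic profile onto a regular `H_RESAMPLE` grid and counts the `L_RESAMPLE` layers that result, mirroring the `np.arange`, `np.interp` and `np.ceil` calls used in the alignment loop. The distance and density values are made up for the example.

```python
import numpy as np

H_RESAMPLE = 1   # mm, grid spacing of the standardized profile
L_RESAMPLE = 50  # mm, layer unit height used for matching

# Synthetic, irregularly sampled profile (illustrative values only)
distance = np.array([0.0, 0.4, 1.3, 2.1, 120.0, 309.2])
density = np.array([300.0, 290.0, 270.0, 210.0, 350.0, 280.0])

# Regular depth grid and linearly interpolated density
depth_array = np.arange(0, distance.max(), H_RESAMPLE)
density_array = np.interp(depth_array, distance, density)

# Number of layers available for stretching; the last one may be partial
num_sections = int(np.ceil(len(depth_array) / L_RESAMPLE))
print(len(depth_array), num_sections)
```

Halving `L_RESAMPLE` doubles the number of independently stretched layers, which is why the stretch limits should be revisited whenever these values change.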

In [1]:
# Community packages
import os 
import numpy as np
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from scipy import stats
from statsmodels.formula.api import ols

from matplotlib import pyplot as plt
plt.rcParams["font.family"] = "Times New Roman"

# Local packages
import smpfunc # SMP helper functions

# Import SLF SMP Package
from snowmicropyn import Profile, proksch2015, loewe2012

# Set constants
CUTTER_SIZE = 15  # Half the height of the density cutter in mm
# Have calculated cutter size from density profile measurements
WINDOW_SIZE = 5  # SMP analysis window in mm
H_RESAMPLE = 1  # Delta height in mm for standardized SMP profiles
L_RESAMPLE = 50  # Layer unit height in mm for SMP matching
MAX_STRETCH_LAYER = 0.75  # Max layer change as a fraction of layer height
MAX_STRETCH_OVERALL = 0.15  # Max profile change as a fraction of total height
NUM_TESTS = 10000  # Number of scaling candidates to generate for alignment testing 

# Set conditions
skip_alignment = False # Set to True to just load the results from a pickle instead of reprocessing
paper_conditions = True # Set to True to reproduce the paper results with seeding

# Small differences in comparison to the paper will occur if a seed is not set.
# This is mainly because we use a brute-force approach to matching the smp and 
# snow pit profiles with modest search size (specified by NUM_TESTS).
if paper_conditions:
    np.random.seed(2019) 

os.makedirs('./output/figures', exist_ok=True)    
    
def rmse(data):
    return np.sqrt(np.mean(data**2))

In [2]:
import mat73

# Replace Josh's pit_density with Vicki's
TVC = mat73.loadmat('./data/TVC_Jan2019/TVC_Jan2019_.mat')

# Redefine upper level of structure (avoids text evaluation problem caused by ['TVC'])
tvc = TVC['TVC_Jan2019']

# Get list of pit names
pit_list = list(tvc.keys())

In [3]:
# Read in file showing nearest SMP profile to pits
nearest_smp = pd.read_excel('./data/TVC_Jan2019/NearestNeighbourSMP_3.xlsx', index_col='Pit')

In [4]:
def get_nearest_smp(pit):
    # Look up the SMP profile nearest to a pit in the Excel table; return NaN if the pit is not listed
    try:
        return nearest_smp.loc[pit]['Nearest Neighbour SMP']
    except KeyError:
        return np.nan
    
    
def get_nearest_smp_data(pit):
    # Return the SMP data closest to the pit as a DataFrame, or NaN if not available
    smp_name = get_nearest_smp(pit)
    if isinstance(smp_name, str):
        # Index the nested structure directly instead of building a string for eval()
        profile = tvc[pit]['SMP']['CroppedProfiles'][smp_name]
        return pd.DataFrame(profile, columns=['depth_smp', 'force']).rename(columns={'depth_smp': 'distance'})
    return np.nan

In [5]:
# Pull the density data for a single pit out of the .mat structure into a pandas DataFrame
def get_pit_density_data(pit):
    density = tvc[pit].density
    obs = {'id': [pit] * len(density.densityA), 'density': density.densityA,
           'top': density.boundary_top, 'bottom': density.boundary_btm}

    # Use a list rather than a set so the column order is deterministic
    return pd.DataFrame(obs, columns=['bottom', 'id', 'density', 'top'])

In [6]:
# Extract density data from all pits
all_pits = [get_pit_density_data(p) for p in pit_list] # list of pandas dataframes

# Join list of dataframes
new_pit_density = pd.concat(all_pits)

In [7]:
# Function to get grain type information for all layers
def get_grain_types(pit):
    strat = tvc[pit].stratigraphy_layers
    obs = {'id': [pit] * len(strat.grain_type), 'grain_type': strat.grain_type, 'strat_comment': strat.strat_comment,
           'top': strat.strat_top, 'bottom': strat.strat_btm}

    # strat_comment arrives in cell format because Python does not recognise MATLAB string arrays,
    # which makes searching the comment text awkward
    return pd.DataFrame(obs, columns=['id', 'grain_type', 'strat_comment', 'top', 'bottom'])

# List of dataframes, named all_pits_ so the density list in all_pits is left intact
all_pits_ = [get_grain_types(p) for p in pit_list]

# Join list of dataframes
grain_types_pits = pd.concat(all_pits_)

In [8]:
grains = []
# The grain type for each layer is stored as its own list
# Extract these into a single list, recode the letters and put them back into grain_types_pits
for val in grain_types_pits.grain_type:
    grains.append(val[0])

# Switch to upper case
# https://stackoverflow.com/questions/1801668/convert-a-python-list-with-strings-all-to-lowercase-or-uppercase
grains = [x.upper() for x in grains]
# Replace M -> N
# https://stackoverflow.com/questions/2582138/finding-and-replacing-elements-in-a-list
grains = ['N' if x == 'M' else x for x in grains]
# Replace C -> I 
# C denotes layers containing crusts; these layers are excluded from the calibration later on
grains = ['I' if x == 'C' else x for x in grains]

# Overwrite grain_type with the recoded values

grain_types_pits['grain_type'] = grains
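
The recoding above (upper case, M to N, C to I) can also be written with a single lookup table. The sketch below is an alternative under the assumption that every entry is a one-element list holding a single letter, as the loop above implies; the `raw` values and the `RECODE` and `normalise` names are made up for the example.

```python
# Hypothetical raw codes, each stored as a one-element list
raw = [['f'], ['m'], ['R'], ['c'], ['F']]

# Codes that are renamed; anything not listed is kept as-is
RECODE = {'M': 'N', 'C': 'I'}


def normalise(code_list):
    """Upper-case the first entry and apply the recode table."""
    code = code_list[0].upper()
    return RECODE.get(code, code)


grains_demo = [normalise(value) for value in raw]
print(grains_demo)
```

Keeping the substitutions in one dictionary makes it easier to see at a glance which source codes are merged.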

In [9]:
# Create list of grain types at resolution of density info
new_pit_density = new_pit_density.assign(TYPE = 'R') # initially all rounds, will overwrite to include other grain types

# Set others depending on whatever condition
# https://stackoverflow.com/questions/15315452/selecting-with-complex-criteria-from-pandas-dataframe

# Created new copy of notebook for each campaign so easy to re-write this cell for different campaigns

#Set Faceted Grains
new_pit_density.loc[(new_pit_density.top<=43) & (new_pit_density.id=='RP_01'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=35) & (new_pit_density.id=='RP_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='RP_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=18) & (new_pit_density.id=='RP_04'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=35) & (new_pit_density.id=='RP_05'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_06'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=29) & (new_pit_density.id=='RP_07'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=29) & (new_pit_density.id=='RP_08'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=44) & (new_pit_density.id=='RP_09'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=28) & (new_pit_density.id=='RP_10'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_11'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=35) & (new_pit_density.id=='RP_12'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=35) & (new_pit_density.id=='RP_13'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=21) & (new_pit_density.id=='RP_14'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=35) & (new_pit_density.id=='RP_15'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=35) & (new_pit_density.id=='RP_16'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=34) & (new_pit_density.id=='RP_17'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=15) & (new_pit_density.id=='RP_18'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='SC_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=42) & (new_pit_density.id=='SD_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=32) & (new_pit_density.id=='SM_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=22) & (new_pit_density.id=='SO_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=45) & (new_pit_density.id=='SR_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=21) & (new_pit_density.id=='ST_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=22) & (new_pit_density.id=='SV_02'), 'TYPE']='F'

#Set New Snow
new_pit_density.loc[(new_pit_density.top>=43) & (new_pit_density.id=='RP_01'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_02'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=27) & (new_pit_density.id=='RP_03'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=18) & (new_pit_density.id=='RP_04'), 'TYPE']='N' #Type set as N not R, but description of "faceting wind rounds"
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_05'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=20) & (new_pit_density.id=='RP_06'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=29) & (new_pit_density.id=='RP_07'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=29) & (new_pit_density.id=='RP_08'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=44) & (new_pit_density.id=='RP_09'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=28) & (new_pit_density.id=='RP_10'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=20) & (new_pit_density.id=='RP_11'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_12'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_13'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=21) & (new_pit_density.id=='RP_14'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_15'), 'TYPE']='N' #Type set as N not R but description is of "faceting wind rounds"
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_16'), 'TYPE']='N' #Type set as N not R but description is of "faceting wind rounds"
new_pit_density.loc[(new_pit_density.top>=34) & (new_pit_density.id=='RP_17'), 'TYPE']='N'
# No new snow for RP_18
new_pit_density.loc[(new_pit_density.top>=20) & (new_pit_density.id=='SC_02'), 'TYPE']='N'
# Pit SD_02 describes new snow below a layer of rounds > assume erroneous and leave for now
new_pit_density.loc[(new_pit_density.top>=33) & (new_pit_density.id=='SM_02'), 'TYPE']='N' #Double check, strat_comment looks odd
new_pit_density.loc[(new_pit_density.top>=22) & (new_pit_density.id=='SO_02'), 'TYPE']='N' #Type set as N not R but description is of "faceting wind rounds"
new_pit_density.loc[(new_pit_density.top>=45) & (new_pit_density.id=='SR_02'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=21) & (new_pit_density.id=='ST_02'), 'TYPE']='N' #Type set as N not R but description is of "faceting wind rounds"
new_pit_density.loc[(new_pit_density.top>=22) & (new_pit_density.id=='SV_02'), 'TYPE']='N'

#Print to check worked

print(new_pit_density) 

    bottom     id  density   top TYPE
0     47.0  RP_01    242.0  50.0    N
1     44.0  RP_01    221.0  47.0    N
2     41.0  RP_01    153.0  44.0    N
3     38.0  RP_01    371.0  41.0    F
4     35.0  RP_01    385.0  38.0    F
..     ...    ...      ...   ...  ...
2     19.0  SV_02    408.0  22.0    N
3     16.0  SV_02    365.0  19.0    F
4     13.0  SV_02    296.0  16.0    F
5     10.0  SV_02    223.0  13.0    F
6      7.0  SV_02    223.0  10.0    F

[273 rows x 5 columns]
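
Because the assignments above repeat the same pattern for every pit, the thresholds could be held in a small table and applied in a loop. The sketch below is one possible arrangement rather than the approach used here. It copies the `RP_01` and `RP_18` thresholds from the cell above, uses a made-up miniature of `new_pit_density`, and assumes a pit uses the same height for both the faceted and new snow rules (pits such as `SM_02` would need separate entries).

```python
import pandas as pd

# Miniature stand-in for new_pit_density (tops in cm)
pits = pd.DataFrame({
    'id': ['RP_01'] * 4 + ['RP_18'] * 2,
    'top': [50.0, 47.0, 44.0, 41.0, 15.0, 12.0],
})

# Pit id -> height (cm) of the faceted / new snow transition
transitions = {'RP_01': 43, 'RP_18': 15}
# Pits with no new snow layer
no_new_snow = {'RP_18'}

pits['TYPE'] = 'R'  # start as rounds, then overwrite
for pit_id, height in transitions.items():
    in_pit = pits.id == pit_id
    pits.loc[in_pit & (pits.top <= height), 'TYPE'] = 'F'
    if pit_id not in no_new_snow:
        pits.loc[in_pit & (pits.top >= height), 'TYPE'] = 'N'

print(pits)
```

The order of the two assignments matters: a layer whose top equals the threshold is first set to F and then overwritten with N, which is the same behaviour as the explicit statements above.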


In [10]:
# Josh splits F (faceted) into F and D (depth hoar), which the source data does not do.
# Starting point: if strat_comment mentions depth hoar, convert manually as above.
# (Considered using a loop, but since the faceted grains had to be set manually and the comments would need regular expressions, I'm not convinced it's worth it.)

# Rule: F becomes D if "strat_comment" mentions "Depth Hoar" or "Hoar" (or a typo clearly meant to be one of these), BUT NOT "Indurated Hoar" or "Hoar Parting"

#Set Depth Hoar
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='RP_01'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=18) & (new_pit_density.id=='RP_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=18) & (new_pit_density.id=='RP_03'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=13) & (new_pit_density.id=='RP_04'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=29) & (new_pit_density.id=='RP_05'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=18) & (new_pit_density.id=='RP_06'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=23) & (new_pit_density.id=='RP_07'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=29) & (new_pit_density.id=='RP_08'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=23) & (new_pit_density.id=='RP_09'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=18) & (new_pit_density.id=='RP_10'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_11'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=28) & (new_pit_density.id=='RP_12'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='RP_13'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=19) & (new_pit_density.id=='RP_14'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=16) & (new_pit_density.id=='RP_15'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='RP_16'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=22) & (new_pit_density.id=='RP_17'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=15) & (new_pit_density.id=='RP_18'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=17) & (new_pit_density.id=='SC_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='SD_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=32) & (new_pit_density.id=='SM_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=17) & (new_pit_density.id=='SO_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=21) & (new_pit_density.id=='SR_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=21) & (new_pit_density.id=='ST_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=12) & (new_pit_density.id=='SV_02'), 'TYPE']='D'
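
For reference, the text rule stated at the top of this cell can be expressed with regular expressions. This is a rough sketch only: the example comments are invented, `mentions_depth_hoar` is a hypothetical helper, and it does not handle the typos that the manual conversion allowed for.

```python
import re

# Phrases that contain "hoar" but should NOT trigger the F -> D conversion
EXCLUDE = re.compile(r'indurated\s+hoar|hoar\s+parting', re.IGNORECASE)
HOAR = re.compile(r'hoar', re.IGNORECASE)


def mentions_depth_hoar(comment):
    """True if the comment mentions hoar outside of the excluded phrases."""
    # Strip the excluded phrases first, then look for any remaining "hoar"
    remaining = EXCLUDE.sub('', comment)
    return bool(HOAR.search(remaining))


for text in ['Depth hoar, chains', 'Indurated hoar', 'Hoar parting at 12 cm',
             'Indurated hoar above depth hoar', 'Faceting wind rounds']:
    print(text, '->', mentions_depth_hoar(text))
```

A rule like this only flags candidate layers; mapping each stratigraphy layer onto the density cutter heights would still need the manual check done above.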

In [11]:
# Find location of ice lenses

# Set layers containing ice as above.
# No surface crusts, unlike March: probably because it is earlier in the season, snow is still accumulating and not much sunlight is available to form them.

# Set remaining ice crusts, having manually identified which density layers they belong to:
new_pit_density.loc[(new_pit_density.top==22)& (new_pit_density.id=='RP_02'), 'TYPE']='I'# Faceted crust at 20cm
new_pit_density.loc[(new_pit_density.top==20)& (new_pit_density.id=='RP_03'), 'TYPE']='I'# Faceted crust at 20cm
new_pit_density.loc[(new_pit_density.top==17)& (new_pit_density.id=='RP_04'), 'TYPE']='I'# Faceted crust at 15.5cm
new_pit_density.loc[(new_pit_density.top==29)& (new_pit_density.id=='RP_07'), 'TYPE']='I'# Faceted crust at 27cm
new_pit_density.loc[(new_pit_density.top==22)& (new_pit_density.id=='RP_14'), 'TYPE']='I'# Faceted crust at 20cm
new_pit_density.loc[(new_pit_density.top==27)& (new_pit_density.id=='RP_16'), 'TYPE']='I'# Faceted crust at 27cm

In [12]:
#print to check works
new_pit_density  

#Leave removal of ice type layers until after comparison_df, in order to check fit of SMPs

Unnamed: 0,bottom,id,density,top,TYPE
0,47.0,RP_01,242.0,50.0,N
1,44.0,RP_01,221.0,47.0,N
2,41.0,RP_01,153.0,44.0,N
3,38.0,RP_01,371.0,41.0,F
4,35.0,RP_01,385.0,38.0,F
5,32.0,RP_01,367.0,35.0,F
6,29.0,RP_01,262.0,32.0,F
7,26.0,RP_01,232.0,29.0,F
8,23.0,RP_01,230.0,26.0,D
9,20.0,RP_01,241.0,23.0,D


In [13]:
# It's all about generating the comparison_df
comparison_df = pd.DataFrame()
min_scaling_coeff = []
all_smp_df = pd.DataFrame()

# Loop over pits
for p in pit_list:
    # Extract smp data for profile nearest to snowpit
    nearest_smp_profile = get_nearest_smp_data(p)
    # Only pick out pits with nearest smp
    # Will ignore pits that return NaN (not a pandas dataframe) for get_nearest_smp_data
    if isinstance(nearest_smp_profile, pd.DataFrame):
        # Get density data
        density_df = new_pit_density[new_pit_density.id == p].rename(columns={'density': 'RHO'})
        # Cutter height in cm, taken as the mode of (surface height - layer bottom)
        cutter_size = float(stats.mode(density_df.top.iloc[0] - density_df.bottom).mode) 
        # Add relative height: distance of each layer midpoint below the surface, in mm
        density_df = density_df.assign(relative_height_mm = (density_df.top[0] - density_df.bottom - cutter_size / 2) * 10)
        
        # Linear interpolation of SMP data (so have no NaN values)
        nearest_smp_profile = nearest_smp_profile.interpolate(method='linear', limit_direction='forward', axis=0)
        
        print(p) # Shows which profiles trigger numerical instabilities - can remove this statement later
        # Make first guess at microstructure based on original profile
        l2012 = loewe2012.calc(nearest_smp_profile, window=WINDOW_SIZE)
        p2015 = proksch2015.calc(nearest_smp_profile, window=WINDOW_SIZE)
        
        smp_profile_height = p2015.distance.max() # This in mm
        # I have used the snow depth from this snow pit. Josh has used mean of magnaprobe depths
        # To get mean of magnaprobe depths, you can use tvc[p].magnaprobe.MgP_Summary.Mean_MgPDepth
        # If SMP height is less than snowpack height, no need to shorten profile
        smp_height_diff = min(0, density_df.top.iloc[0] * 10 - smp_profile_height) 
        
        # Create new SMP resampled arrays and determine the number of layers
        depth_array = np.arange(0, p2015.distance.max() + smp_height_diff, H_RESAMPLE)
        density_array = np.interp(depth_array,p2015.distance,p2015.P2015_density)
        force_array = np.interp(depth_array,p2015.distance,l2012.force_median)
        l_array = np.interp(depth_array,p2015.distance,l2012.L2012_L)
        id_array = p

        smp_df = pd.DataFrame({'distance': depth_array, 
                               'density': density_array,
                               'force_median': force_array,
                               'l': l_array, 
                               'id':id_array,})
        
        # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
        all_smp_df = pd.concat([all_smp_df, smp_df])

        # Generate a selection of random transformations to brute-force the alignment
        # We use this brute-force approach because there was no gradient that could be used to optimize the relationship
        num_sections = np.ceil(len(smp_df.index)/L_RESAMPLE).astype(int)
        random_tests = [smpfunc.random_stretch(x, MAX_STRETCH_OVERALL, MAX_STRETCH_LAYER) for x in np.repeat(num_sections, NUM_TESTS)] 

        scaled_profiles = [smpfunc.scale_profile(test, smp_df.distance.values, smp_df.density.values, L_RESAMPLE, H_RESAMPLE) for test in random_tests]
        compare_profiles = [smpfunc.extract_samples(dist, rho, density_df.relative_height_mm.values, cutter_size) for dist, rho in scaled_profiles]
        compare_profiles = [pd.concat([profile, density_df.reset_index()], axis=1, sort=False) for profile in compare_profiles]
        retrieved_skill = [smpfunc.calc_skill(profile, cutter_size) for profile in compare_profiles]
        retrieved_skill = pd.DataFrame(retrieved_skill,columns = ['r','rmse','rmse_corr','mae'])

        # Best candidate: highest r, ties broken by lowest rmse_corr
        min_scaling_idx = retrieved_skill.sort_values(['r', 'rmse_corr'], ascending=[False, True]).index[0]
        min_scaling_coeff.append(random_tests[int(min_scaling_idx)])
        
        dist, scaled_l =  smpfunc.scale_profile(min_scaling_coeff[-1], smp_df.distance.values, smp_df.l.values, L_RESAMPLE, H_RESAMPLE)
        dist, scaled_force_median = smpfunc.scale_profile(min_scaling_coeff[-1], smp_df.distance.values, smp_df.force_median.values, L_RESAMPLE, H_RESAMPLE)

        result = compare_profiles[int(min_scaling_idx)].assign(l=smpfunc.extract_samples(dist, scaled_l, density_df.relative_height_mm.values, cutter_size).mean_samp,
                                                  force_median=smpfunc.extract_samples(dist, scaled_force_median, density_df.relative_height_mm.values, cutter_size).mean_samp)
        comparison_df = pd.concat([comparison_df, result], ignore_index=True)

RP_01
RP_02
RP_06
RP_08
RP_09
RP_10
RP_12


  delta = -(3. / 2) * c_f[n - 1] / (c_f[n] - c_f[n - 1]) * spatial_res
  lambda_ = (4. / 3) * (k1 ** 2) / k2 / delta  # Intensity
  f0 = (3. / 2) * k2 / k1
  density = a1 + a2 * np.log(fm) + a3 * np.log(fm) * l + a4 * l
  density = a1 + a2 * np.log(fm) + a3 * np.log(fm) * l + a4 * l
  lc = c1 + c2 * l + c3 * np.log(fm)


RP_13


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)
  keepdims=keepdims)
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
  ret = ret.dtype.type(ret / rcount)


RP_15
RP_16
RP_17
RP_18
SD_02
SO_02


  lambda_ = (4. / 3) * (k1 ** 2) / k2 / delta  # Intensity
  r = func(a, **kwargs)


SR_02
ST_02
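
The selection step in the loop above ranks all candidates by correlation first and error second. The toy example below illustrates that ranking on three made-up candidate profiles. The `skill` function is a simplified, hypothetical stand-in for `smpfunc.calc_skill` and uses plain RMSE as the tie-breaker.

```python
import numpy as np
import pandas as pd

# Made-up cutter densities and three candidate SMP-derived profiles
reference = np.array([240.0, 220.0, 150.0, 370.0, 385.0])
candidates = [
    np.array([250.0, 230.0, 170.0, 360.0, 380.0]),  # close match, small scatter
    np.array([370.0, 150.0, 385.0, 220.0, 240.0]),  # shuffled, poor match
    np.array([300.0, 280.0, 210.0, 430.0, 445.0]),  # right shape, +60 offset
]


def skill(estimate, truth):
    """Pearson r and RMSE between an estimate and the reference."""
    r = np.corrcoef(estimate, truth)[0, 1]
    rmse = np.sqrt(np.mean((estimate - truth) ** 2))
    return r, rmse


scores = pd.DataFrame([skill(c, reference) for c in candidates], columns=['r', 'rmse'])
# Highest correlation first, lowest error as the tie-breaker
best_idx = scores.sort_values(['r', 'rmse'], ascending=[False, True]).index[0]
print(scores.round(3))
print('best candidate:', best_idx)
```

Because correlation is ranked first, the candidate with the right shape but a constant offset wins over the one with smaller error. That seems reasonable for an alignment step, since a constant bias is what the later calibration is meant to address.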


In [14]:
# Save the results to a local file since the brute-force method takes a while to compute
os.makedirs('./output/TVC', exist_ok=True)  # to_pickle does not create missing directories
comparison_df.to_pickle("./output/TVC/TVC_Jan2019_smp_pit_comparison_3.pkl")
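
The `skip_alignment` flag described at the top of the workbook implies a load-or-compute pattern around this pickle. A minimal sketch of that pattern is given below. `load_or_compute` and `fake_alignment` are hypothetical names, and the demo writes to a temporary directory rather than `./output/TVC`.

```python
import os
import tempfile

import pandas as pd


def load_or_compute(path, compute, skip):
    """Load the cached DataFrame if requested and present, else compute and cache it."""
    if skip and os.path.exists(path):
        return pd.read_pickle(path)
    result = compute()
    os.makedirs(os.path.dirname(path), exist_ok=True)
    result.to_pickle(path)
    return result


calls = []


def fake_alignment():
    # Stand-in for the slow brute-force alignment loop
    calls.append(1)
    return pd.DataFrame({'mean_samp': [250.0], 'RHO': [242.0]})


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'output', 'demo.pkl')
    first = load_or_compute(cache, fake_alignment, skip=True)   # no file yet: computes
    second = load_or_compute(cache, fake_alignment, skip=True)  # file exists: loads

print(len(calls), second.equals(first))
```

In the workbook this would mean wrapping the alignment loop in a function and passing `skip_alignment` as the `skip` argument.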

In [15]:
# Save to .csv file so I can check it out in matlab

print(comparison_df)

comparison_df.to_csv('./output/TVC/TVC_Jan2019_comparison_3.csv',na_rep='NaN')


     count_samp   mean_samp  median_samp  stdev_samp  index  bottom     id  \
0             7   22.598556    21.468414    4.066480      0    47.0  RP_01   
1             7  398.596814   409.300656   19.569935      1    44.0  RP_01   
2             7  225.770318   218.796716   39.438038      2    41.0  RP_01   
3             7  568.917286   571.620733   10.750147      3    38.0  RP_01   
4             7  542.346539   543.534441    3.679610      4    35.0  RP_01   
..          ...         ...          ...         ...    ...     ...    ...   
181           7  274.044944   274.206680    3.422400      5    13.0  ST_02   
182           7  294.038296   290.799897   12.610278      6    10.0  ST_02   
183           7  289.689438   291.275857    4.047493      7     7.0  ST_02   
184           7  277.813520   277.221258    5.582478      8     4.0  ST_02   
185           7  320.314389   279.025807   67.761575      9     1.0  ST_02   

       RHO   top TYPE  relative_height_mm         l  force_medi

In [16]:
print(all_smp_df) # smp_df rewrites for each pit, so need to collate

all_smp_df.to_csv('./output/TVC/TVC_Jan2019_allsmpdf_3.csv',na_rep='NaN') 

     distance     density  force_median         l     id
0         0.0  306.719758      0.011839  0.927453  RP_01
1         1.0  288.331933      0.011839  0.877419  RP_01
2         2.0  269.944107      0.011839  0.827384  RP_01
3         3.0  211.403412      0.011576  0.673434  RP_01
4         4.0  112.709846      0.011050  0.415566  RP_01
..        ...         ...           ...       ...    ...
305     305.0  264.526087      0.350569  1.129490  ST_02
306     306.0  263.529362      0.361092  1.129527  ST_02
307     307.0  262.532638      0.371616  1.129565  ST_02
308     308.0  269.555052      0.431337  1.033851  ST_02
309     309.0  284.596606      0.540257  0.842387  ST_02

[6276 rows x 5 columns]


In [17]:
#Filter results
result = comparison_df.dropna() #All NaNs should already have been removed, but just in case
result = result[result['count_samp']>=cutter_size*2] # Remove comparisons outside the profile
#result = result[~result['TYPE'].isin(['N', 'I'])] # Remove new snow and ice because we don't have enough samples
result = result[~result['TYPE'].isin(['I'])].copy() # Alternate option: only remove ice layers, because January is mid-season and there are many new snow samples; .copy() avoids a SettingWithCopyWarning below
result['error'] = result['mean_samp']-result['RHO']
result.head()


Unnamed: 0,count_samp,mean_samp,median_samp,stdev_samp,index,bottom,id,RHO,top,TYPE,relative_height_mm,l,force_median,error
0,7,22.598556,21.468414,4.06648,0,47.0,RP_01,242.0,50.0,N,15.0,0.175196,0.010784,-219.401444
1,7,398.596814,409.300656,19.569935,1,44.0,RP_01,221.0,47.0,N,45.0,0.097293,0.955946,177.596814
2,7,225.770318,218.796716,39.438038,2,41.0,RP_01,153.0,44.0,N,75.0,0.189657,0.457522,72.770318
3,7,568.917286,571.620733,10.750147,3,38.0,RP_01,371.0,41.0,F,105.0,0.150041,7.867569,197.917286
4,7,542.346539,543.534441,3.67961,4,35.0,RP_01,385.0,38.0,F,135.0,0.174863,6.479529,157.346539


In [18]:
#Compare manual density cutter measurements and SMP-derived densities
# P2015 evaluation stats
p2015_rmse = np.sqrt(np.mean(result['error']**2))
p2015_bias = (result['error']).mean()
p2015_r2 = np.ma.corrcoef(result['mean_samp'],result['RHO'])[0, 1]**2
p2015_n = len(result['mean_samp'])
p2015_p = stats.pearsonr(result['mean_samp'],result['RHO'])[1]

print('Proksch et al. 2015 Eval.')
print('N: %i' % p2015_n)
print('RMSE: %0.1f' % np.round(p2015_rmse))
print('bias: %0.1f' % np.round(p2015_bias))
print('r^2: %0.2f' % p2015_r2)

# Error as a % of mean density
np.round(rmse(result.error)/ result['RHO'].mean(),2)

# RMSE by layer type
result.groupby('TYPE')['error'].apply(rmse)

Proksch et al. 2015 Eval.
N: 185
RMSE: 128.0
bias: 99.0
r^2: 0.74


TYPE
D     77.256530
F    139.926481
N    179.147614
R    148.296373
Name: error, dtype: float64
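
To show how the statistics above are built, the following sketch recomputes bias, overall RMSE and RMSE by layer type on a made-up four-row table. It uses the same sign convention as the workbook (`error = mean_samp - RHO`), so a positive bias means the SMP-derived density is higher than the cutter density.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'TYPE': ['N', 'N', 'F', 'F'],
    'mean_samp': [250.0, 230.0, 380.0, 400.0],  # SMP-derived density
    'RHO': [240.0, 220.0, 350.0, 370.0],        # density cutter measurement
})
toy['error'] = toy['mean_samp'] - toy['RHO']


def rmse(data):
    return np.sqrt(np.mean(data ** 2))


bias = toy['error'].mean()
overall_rmse = rmse(toy['error'])
by_type = toy.groupby('TYPE')['error'].apply(rmse)

print('bias: %0.1f' % bias)
print('RMSE: %0.1f' % overall_rmse)
print(by_type)
```

Here every error has the same sign, so the bias accounts for most of the RMSE. The evaluation above shows a similar pattern, with a bias of 99 against an RMSE of 128.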

In [19]:
# Export the dataset
result.to_pickle("./output/TVC/TVC_Jan2019_smp_pit_filtered_incN_3.pkl")