# Alignment of SMP and Snow Pits / Evaluation of Proksch (P15) coefficients
*Josh King, Environment and Climate Change Canada, 2019*

This workbook introduces a snow on sea ice calibration procedure for SMP-derived estimates of density first introduced in [Proksch, et al., 2015](https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2014JF003266). Where indicated, the work modifies portions of the [SMP python package from SLF](https://github.com/slf-dot-ch/snowmicropyn) and uses a number of open source community packages to facilitate processing.

I'm still not great at GIS in python so the maps in the publication were done in ESRI ArcMap.

### ***Alignment takes a long time due to the large number of scaling candidates. If you want to skip that part and just load the result set, set `skip_alignment` below to `True`.***

### Notes on settings and constants
**CUTTER_SIZE** defines the half height in mm of the density cutter used as reference. Can be changed to accommodate different sampler sizes. No need to change this for ECCC data. 

**WINDOW_SIZE** defines the size of the rolling window used in SLF shot noise calculations. A 5 mm window was used in [Proksch, et al., 2015](https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2014JF003266) when there was separation between the SMP and density cutter. Increasing the window reduces sensitivity to sharp transitions and reduces resolution of the analysis. However, moving to something like 2.5 mm makes comparison difficult as some of the very fine structure resolved as very different over the ~10 cm separation between the SMP and density profiles. 

**NUM_TESTS** defines how many random scaling configurations to test against when attempting to align the SMP and snow pit data. We brute-force the alignment in our paper so `NUM_TESTS` must be large to ensure the test space searched is sufficient. A lower number of tests risks poor alignment and therefore poor calibration. In the paper we use 10k permutations.

**MAX_STRETCH_LAYER** and **MAX_STRETCH_OVERALL** define how much an individual layer can be eroded or dilated, and the maximum change in total length of the SMP profile, respectively. We allow a rather large 70% change to individual layers to accommodate pinching out but restrict the total change to 10% to avoid overfitting.

**H_RESAMPLE** and **L_RESAMPLE** define the resampled resolution of the SMP and the layer size used for matching profiles. These terms are interactive with the layer stretching and should be evaluated carefully if changed.

In [1]:
# Community packages
import os 
import numpy as np
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from scipy import stats
from statsmodels.formula.api import ols

from matplotlib import pyplot as plt
plt.rcParams["font.family"] = "Times New Roman"

# Local packages
import smpfunc # SMP helper functions

# Import SLF SMP Package
from snowmicropyn import Profile, proksch2015, loewe2012

# Set constants
CUTTER_SIZE = 15  # Half the height of the density cutter in mm
# Have calculated cutter size from density profile measurements
WINDOW_SIZE = 5  # SMP analysis window in mm
H_RESAMPLE = 1  # Delta height in mm for standardized SMP profiles
L_RESAMPLE = 50  # Layer unit height in mm for SMP matching
MAX_STRETCH_LAYER = 0.75  # Max layer change in % of height
MAX_STRETCH_OVERALL = 0.15  # Max profile change in % of total height
NUM_TESTS = 10000  # Number of scaling candidates to generate for alignment testing 

# Set conditions
skip_alignment = False # Set as true to just load the results from a pickle instead of reprocessing
paper_conditions = True # Set as true to reproduce the paper results with seeding

# Small differences in comparison to the paper will occur if a seed is not set.
# This is mainly because we use a brute-force approach to matching the smp and 
# snow pit profiles with modest search size (specified by NUM_TESTS).
if paper_conditions:
    np.random.seed(2019)   
    
def rmse(data):
    return np.sqrt(np.mean(data**2))

In [2]:
import mat73

# Replace Josh's pit_density with Vicki's
TVC = mat73.loadmat('./data/TVC_March2019/TVC_March2019_.mat')
# Redefine upper level of structure (avoids text evaluation problem caused by ['TVC'])

tvc = TVC['TVC_March2019']

# Get list of pit names
pit_list = list(tvc.keys())

In [3]:
# Read in file showing nearest SMP profile to pits
nearest_smp = pd.read_excel('./data/TVC_March2019/NearestNeighbourSMP_4.xlsx', index_col='Pit')

In [4]:
def get_nearest_smp(pit):
    # Looks up nearest smp profile to pits from excel file, returns nan if not available
    try:
        return nearest_smp.loc[pit]['Nearest Neighbour SMP']
    except:
        return np.nan
    
    
def get_nearest_smp_data(pit):
    # Returns SMP data closest to pit, or NaN if not available
    if type(get_nearest_smp(pit)) == str:
        return pd.DataFrame(eval('tvc.' + pit + '.SMP.CroppedProfiles.' + get_nearest_smp(pit)), columns={'depth_smp', 'force'}).rename(columns={'depth_smp':'distance'})
    else:
        return np.nan

In [5]:
# Function to pull out single pit data from .mat file and put in pandas dataframe
def get_pit_density_data(pit):
    # Make dictionary
    obs = {'id': [pit] * len(tvc[pit].density.densityA), 'density': tvc[pit].density.densityA, 
           'top': tvc[pit].density.boundary_top, 'bottom': tvc[pit].density.boundary_btm}

    return pd.DataFrame(obs, columns={'id', 'density', 'top', 'bottom'})

In [6]:
# Extract density data from all pits
all_pits = [get_pit_density_data(p) for p in pit_list] # list of pandas dataframes

# Join list of dataframes
new_pit_density = pd.concat(all_pits)

In [7]:
#Function to get grain type information for all layers
def get_grain_types(pit):
    #print(pit)
    #Make dictionary
    obs = {'id':[pit] * len(tvc[pit].stratigraphy_layers.grain_type), 'grain_type': tvc[pit].stratigraphy_layers.grain_type, 'strat_comment':tvc[pit].stratigraphy_layers.strat_comment,
           'top': tvc[pit].stratigraphy_layers.strat_top, 'bottom':tvc[pit].stratigraphy_layers.strat_btm}
    
    return pd.DataFrame(obs, columns={'id', 'grain_type', 'strat_comment', 'top', 'bottom'}) #dataframe currently contains strat_comment in cell format as python does not recognise matlab string arrays. this is creating problems with trying to search what the string says. 

all_pits_ = [get_grain_types(p) for p in pit_list] #list of dataframes > assuming needs to not be called "all_pits" so python doesn't get confused, but as obs is re-written anyway, name can stay the same?

# Join list of dataframes
grain_types_pits = pd.concat(all_pits_)

In [8]:
grains = []
# Grain type for each layer is its own list
# Make extract these into a single list, change letters and put back into grain_types_pits 
for val in grain_types_pits.grain_type:
    grains.append(val[0])

# Switch to upper case
# https://stackoverflow.com/questions/1801668/convert-a-python-list-with-strings-all-to-lowercase-or-uppercase
grains = [x.upper() for x in grains]
# Replace M -> N
# https://stackoverflow.com/questions/2582138/finding-and-replacing-elements-in-a-list
grains = ['N' if x == 'M' else x for x in grains]
# Replace C -> I 
# C denotes layers containing crusts, so do not want to include these layers in the calibration, but this'll come later
grains = ['I' if x == 'C' else x for x in grains]


# ^^ I MADE A NEW COLUMN SO YOU CAN SEE DIFFERENCE BETWEEN FORMATS OF GRAIN_TYPE AND NEW_GRAIN_TYPE
# IF YOU WANT TO OVERWRITE grain_type JUST DELETE THE ONE ABOVE AND UNCOMMENT ONE BELOW > Thanks, Done :)

grain_types_pits['grain_type'] = grains

In [9]:
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(grain_types_pits)
#grain_types_pits

                                       strat_comment grain_type   top  bottom  \
0                       [rounds, hydrometeors, Fist]          N  42.0    41.0   
1                             [ice crust ~1mm thick]          I  41.0    41.0   
2                     [round congolomerates, pencil]          R  41.0    39.0   
3                          [round congolmerates, 4F]          R  39.0    37.0   
4                             [wind slab rounds, 1F]          R  37.0    27.0   
5                                   [depth hoar, 4F]          F  27.0    18.0   
6                                 [depth hoar, Fist]          F  18.0     0.0   
0                        [hydrometeor pellets, fist]          R  41.0    40.0   
1                                        [ice crust]          I  40.0    40.0   
2           [mixed forms, rounded conglomorates, 4F]          N  40.0    38.0   
3                                 [slab, rounds, 1F]          R  38.0    31.0   
4  [hoar, 4F, including an i

In [10]:
# Create list of grain types at resolution of density info
new_pit_density = new_pit_density.assign(TYPE = 'R') # initally all rounds, will overwrite to include other grain types

# Set others depending on whatever condition
# https://stackoverflow.com/questions/15315452/selecting-with-complex-criteria-from-pandas-dataframe

#specfic to March 2019 Data, would need to create new version of these cells for other campaigns

#Set Faceted Grains
new_pit_density.loc[(new_pit_density.top<=35) & (new_pit_density.id=='RP_01'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=31) & (new_pit_density.id=='RP_02'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=31) & (new_pit_density.id=='RP_04'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=21) & (new_pit_density.id=='RP_05'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='RP_06'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='RP_07'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=16) & (new_pit_density.id=='RP_08'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_09'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='RP_10'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=30) & (new_pit_density.id=='RP_11'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=22) & (new_pit_density.id=='RP_12'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=34) & (new_pit_density.id=='RP_13'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='RP_14'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=23) & (new_pit_density.id=='RP_15'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_16'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='RP_17'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=24) & (new_pit_density.id=='RP_18'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=33) & (new_pit_density.id=='RP_19'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=26) & (new_pit_density.id=='RP_20'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=33) & (new_pit_density.top>=31) & (new_pit_density.id=='RP_21'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=28) & (new_pit_density.id=='RP_21'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=33) & (new_pit_density.id=='RP_22'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=32) & (new_pit_density.id=='RP_23'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=37) & (new_pit_density.id=='RP_24'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=48) & (new_pit_density.id=='RP_25'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=18) & (new_pit_density.id=='SC_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=38) & (new_pit_density.id=='SD_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='SM_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=19) & (new_pit_density.id=='SO_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=34) & (new_pit_density.top>=27) & (new_pit_density.id=='SR_03'), 'TYPE']='F' #Layer of slab sandwiched between facets
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='SR_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='ST_03'), 'TYPE']='F'
new_pit_density.loc[(new_pit_density.top<=15) & (new_pit_density.id=='SV_03'), 'TYPE']='F'

#Set New Snow
new_pit_density.loc[(new_pit_density.top>=41) & (new_pit_density.id=='RP_01'), 'TYPE']='N'
new_pit_density.loc[(new_pit_density.top>=38) & (new_pit_density.id=='RP_02'), 'TYPE']='N'
# No new snow for pits RP_03 to RP_07
new_pit_density.loc[(new_pit_density.top>=18) & (new_pit_density.id=='RP_08'), 'TYPE']='N'
# No new snow for pits RP_09 and RP_10
new_pit_density.loc[(new_pit_density.top>=30) & (new_pit_density.id=='RP_11'), 'TYPE']='N'
# No new snow for RP_12
new_pit_density.loc[(new_pit_density.top>=62) & (new_pit_density.id=='RP_13'), 'TYPE']='N' #Type set as "N", but description mentions crust
# No new snow for pits RP_14 to RP_18
new_pit_density.loc[(new_pit_density.top>=52) & (new_pit_density.id=='RP_19'), 'TYPE']='N'
# No new snow for pits RP_20 to RP_23
new_pit_density.loc[(new_pit_density.top>=52) & (new_pit_density.id=='RP_24'), 'TYPE']='N'
# No new snow for pits RP_25 to SM_03
new_pit_density.loc[(new_pit_density.top>=48) & (new_pit_density.id=='SR_03'), 'TYPE']='N'
# No new snow for pits ST_03 or SV_03

#Print to check worked

# print(new_pit_density) 

In [11]:
# Josh splits F (faceted) into F and D (Depth Hoar), which source data does not do. 
# Starting point is if strat_comment mentions depth hoar, convert manually as above
#(Considered using some sort of loop, but as had to do Facted grains manually and would have to convert to regular expressions, I'm not convinved it's worth it)

# Rule: F becomes D if "strat_comment" mentions "Depth Hoar" or "Hoar" (or a typo clearly meant to be one of these), BUT NOT "Indurated Hoar"

#Set Depth Hoar
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='RP_01'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_02'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=9) & (new_pit_density.id=='RP_03'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=31) & (new_pit_density.id=='RP_04'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=21) & (new_pit_density.id=='RP_05'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='RP_06'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='RP_07'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=16) & (new_pit_density.id=='RP_08'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='RP_09'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='RP_10'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=30) & (new_pit_density.id=='RP_11'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=22) & (new_pit_density.id=='RP_12'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=34) & (new_pit_density.top>=30) & (new_pit_density.id=='RP_13'), 'TYPE']='D' 
new_pit_density.loc[(new_pit_density.top<=24) & (new_pit_density.id=='RP_13'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=31) & (new_pit_density.id=='RP_14'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=23) & (new_pit_density.id=='RP_15'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=17) & (new_pit_density.id=='RP_16'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=25) & (new_pit_density.id=='RP_17'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=24) & (new_pit_density.id=='RP_18'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=33) & (new_pit_density.id=='RP_19'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=26) & (new_pit_density.id=='RP_20'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=28) & (new_pit_density.id=='RP_21'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=24) & (new_pit_density.id=='RP_22'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=32) & (new_pit_density.id=='RP_23'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=37) & (new_pit_density.id=='RP_24'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=48) & (new_pit_density.id=='RP_25'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=18) & (new_pit_density.id=='SC_03'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=24) & (new_pit_density.id=='SD_03'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=27) & (new_pit_density.id=='SM_03'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=8) & (new_pit_density.id=='SO_03'), 'TYPE']='D'
new_pit_density.loc[(new_pit_density.top<=20) & (new_pit_density.id=='SR_03'), 'TYPE']='D'
# ST_03 all "indurated hoar", think I'm keeping this as "Facets"
new_pit_density.loc[(new_pit_density.top<=15) & (new_pit_density.id=='SV_03'), 'TYPE']='D'


In [12]:
# Function 2: Find location of ice lenses

# Set layers containing ice as above:
# Surface crusts:
new_pit_density.loc[(new_pit_density.top>=41) & (new_pit_density.id=='RP_01'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=40) & (new_pit_density.id=='RP_02'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=29) & (new_pit_density.id=='RP_03'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=39) & (new_pit_density.id=='RP_04'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_05'), 'TYPE']='I'
# No surface crust at pit RP_06
new_pit_density.loc[(new_pit_density.top>=33) & (new_pit_density.id=='RP_07'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=18) & (new_pit_density.id=='RP_08'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=45) & (new_pit_density.id=='RP_09'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=42) & (new_pit_density.id=='RP_10'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=52) & (new_pit_density.id=='RP_11'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=41) & (new_pit_density.id=='RP_12'), 'TYPE']='I'
# No surface crust at pits RP_13 or RP_14
new_pit_density.loc[(new_pit_density.top>=44) & (new_pit_density.id=='RP_15'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=20) & (new_pit_density.id=='RP_16'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=35) & (new_pit_density.id=='RP_17'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=32) & (new_pit_density.id=='RP_18'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=52) & (new_pit_density.id=='RP_19'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=26) & (new_pit_density.id=='RP_20'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=32) & (new_pit_density.id=='RP_21'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=40) & (new_pit_density.id=='RP_22'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=51) & (new_pit_density.id=='RP_23'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=59) & (new_pit_density.id=='RP_24'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=73) & (new_pit_density.id=='RP_25'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=20) & (new_pit_density.id=='SC_03'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=50) & (new_pit_density.id=='SD_03'), 'TYPE']='I'
new_pit_density.loc[(new_pit_density.top>=39) & (new_pit_density.id=='SM_03'), 'TYPE']='I'
# No surface crust at pit SO_03
new_pit_density.loc[(new_pit_density.top>=48) & (new_pit_density.id=='SR_03'), 'TYPE']='I'
# No surface crust for ST_03 or SV_03


In [13]:
# Set remaining ice crusts, having manually identified which density layers they belong to:
new_pit_density.loc[(new_pit_density.top==24)& (new_pit_density.id=='RP_04'), 'TYPE']='I'#crust at 23cm
new_pit_density.loc[(new_pit_density.top==25)& (new_pit_density.id=='RP_06'), 'TYPE']='I'#crusts at 25 and 23 cm
new_pit_density.loc[(new_pit_density.top==16)& (new_pit_density.id=='RP_08'), 'TYPE']='I'#crust at 15cm
new_pit_density.loc[(new_pit_density.top==21)& (new_pit_density.id=='RP_10'), 'TYPE']='I'#crust at 21cm
new_pit_density.loc[(new_pit_density.top==28)& (new_pit_density.id=='RP_11'), 'TYPE']='I'#crust at 28cm
#Comment for RP_13 mentions crust, but grain type does not
new_pit_density.loc[(new_pit_density.top==25)& (new_pit_density.id=='RP_14'), 'TYPE']='I'#crust at 25cm
new_pit_density.loc[(new_pit_density.top==23)& (new_pit_density.id=='RP_15'), 'TYPE']='I'#crust at 23cm, strat_comment also mentions crust at 20cm, but grain type does not
new_pit_density.loc[(new_pit_density.top==20)& (new_pit_density.id=='RP_17'), 'TYPE']='I'#crust at 20cm
new_pit_density.loc[(new_pit_density.top==26)& (new_pit_density.id=='RP_21'), 'TYPE']='I'#crust at 24cm
new_pit_density.loc[(new_pit_density.top==19)& (new_pit_density.id=='RP_22'), 'TYPE']='I'#crust at 17cm
new_pit_density.loc[(new_pit_density.top==45)& (new_pit_density.id=='RP_23'), 'TYPE']='I'#crust at 43cm
new_pit_density.loc[(new_pit_density.top==53)& (new_pit_density.id=='RP_24'), 'TYPE']='I'#crust at 52cm
new_pit_density.loc[(new_pit_density.top==24)& (new_pit_density.id=='SM_03'), 'TYPE']='I'#crust at 24cm

In [14]:
#print to check works
new_pit_density  

# Don't delete Ice until after comparison_df

Unnamed: 0,top,id,bottom,density,TYPE
0,42.0,RP_01,39.0,337.0,I
1,39.0,RP_01,36.0,346.0,R
2,36.0,RP_01,33.0,336.0,R
3,33.0,RP_01,30.0,341.0,F
4,30.0,RP_01,27.0,336.0,F
5,27.0,RP_01,24.0,273.0,D
6,24.0,RP_01,21.0,250.0,D
7,21.0,RP_01,18.0,285.0,D
8,18.0,RP_01,15.0,260.0,D
9,15.0,RP_01,12.0,237.0,D


In [15]:
# It's all about generating the comparison_df
comparison_df = pd.DataFrame()
min_scaling_coeff = []
all_smp_df = pd.DataFrame()

# Loop over pits
for p in pit_list:
    # Extract smp data for profile nearest to snowpit
    nearest_smp_profile = get_nearest_smp_data(p)
    # Only pick out pits with nearest smp
    # Will ignore pits that return NaN (not a pandas dataframe) for get_nearest_smp_data
    if isinstance(nearest_smp_profile, pd.DataFrame):
        # Get density data
        density_df = new_pit_density[new_pit_density.id == p].rename(columns={'density': 'RHO'})
        # Add in relative height
        # Think this is defined as distance of layer midpoint from surface, in mm
        cutter_size = float(stats.mode(density_df.top.iloc[0] - density_df.bottom).mode) 
        density_df = density_df.assign(relative_height_mm = (density_df.top[0] - density_df.bottom - cutter_size / 2) * 10)
        
        # Linear interpolation of SMP data (so have no NaN values)
        nearest_smp_profile = nearest_smp_profile.interpolate(method='linear', limit_direction='forward', axis=0)
        
        print (p) # Tells you which profiles have numerical instabilities - can remove this statement later
        # Make first guess at microstructure based on original profile
        l2012 = loewe2012.calc(nearest_smp_profile, window=WINDOW_SIZE)
        p2015 = proksch2015.calc(nearest_smp_profile, window=WINDOW_SIZE)
        
        smp_profile_height = p2015.distance.max() # This in mm
        # I have used the snow depth from this snow pit. Josh has used mean of magnaprobe depths
        # To get mean of magnaprobe depths, you can use tvc[p].magnaprobe.MgP_Summary.Mean_MgPDepth
        # If SMP height is less than snowpack height, no need to shorten profile
        smp_height_diff = min(0, density_df.top.iloc[0] * 10 - smp_profile_height) 
        
        # Create new SMP resampled arrays and determine the number of layers
        depth_array = np.arange(0, p2015.distance.max() + smp_height_diff, H_RESAMPLE)
        density_array = np.interp(depth_array,p2015.distance,p2015.P2015_density)
        force_array = np.interp(depth_array,p2015.distance,l2012.force_median)
        l_array = np.interp(depth_array,p2015.distance,l2012.L2012_L)
        id_array = p

        smp_df = pd.DataFrame({'distance': depth_array, 
                               'density': density_array,
                               'force_median': force_array,
                               'l': l_array, 
                               'id':id_array,})
        
        all_smp_df = all_smp_df.append(smp_df)

        # Generate a selection of random transformation to brute-force alignment
        # We use this brute force approach because there was no gradient that could be used to optimize the relationship
        num_sections = np.ceil(len(smp_df.index)/L_RESAMPLE).astype(int)
        random_tests = [smpfunc.random_stretch(x, MAX_STRETCH_OVERALL, MAX_STRETCH_LAYER) for x in np.repeat(num_sections, NUM_TESTS)] 

        scaled_profiles = [smpfunc.scale_profile(test, smp_df.distance.values, smp_df.density.values, L_RESAMPLE, H_RESAMPLE) for test in random_tests]
        compare_profiles = [smpfunc.extract_samples(dist, rho, density_df.relative_height_mm.values, cutter_size) for dist, rho in scaled_profiles]
        compare_profiles = [pd.concat([profile, density_df.reset_index()], axis=1, sort=False) for profile in compare_profiles]
        retrieved_skill = [smpfunc.calc_skill(profile, cutter_size) for profile in compare_profiles]
        retrieved_skill = pd.DataFrame(retrieved_skill,columns = ['r','rmse','rmse_corr','mae'])

        min_scaling_idx = retrieved_skill.sort_values(['r', 'rmse_corr'], ascending=[False, True]).head(1).index.values
        min_scaling_coeff.append(random_tests[int(min_scaling_idx)])
        
        dist, scaled_l =  smpfunc.scale_profile(min_scaling_coeff[-1], smp_df.distance.values, smp_df.l.values, L_RESAMPLE, H_RESAMPLE)
        dist, scaled_force_median = smpfunc.scale_profile(min_scaling_coeff[-1], smp_df.distance.values, smp_df.force_median.values, L_RESAMPLE, H_RESAMPLE)

        result = compare_profiles[int(min_scaling_idx)].assign(l=smpfunc.extract_samples(dist, scaled_l, density_df.relative_height_mm.values, cutter_size).mean_samp,
                                                  force_median=smpfunc.extract_samples(dist, scaled_force_median, density_df.relative_height_mm.values, cutter_size).mean_samp)
        comparison_df = comparison_df.append(result, ignore_index=True)

RP_02
RP_05
RP_06
RP_07
SM_03


  density = a1 + a2 * np.log(fm) + a3 * np.log(fm) * l + a4 * l
  density = a1 + a2 * np.log(fm) + a3 * np.log(fm) * l + a4 * l
  lc = c1 + c2 * l + c3 * np.log(fm)
  r = func(a, **kwargs)


In [16]:
# save the results to a local file since tbe brute-force method takes a while to compute
comparison_df.to_pickle("./output/TVC/TVC_March2019_smp_pit_comparison_4.pkl")

In [17]:
# Save to .csv file so I can check it out in matlab

print(comparison_df)

comparison_df.to_csv('./output/TVC/TVC_March2019_comparison_4.csv',na_rep='NaN')


    count_samp   mean_samp  median_samp  stdev_samp  index   top     id  \
0            7  600.464153   610.021568   16.155445      0  41.0  RP_02   
1            7  499.062646   508.479679   13.728547      1  38.0  RP_02   
2            7  454.483428   448.202937   14.221401      2  35.0  RP_02   
3            7  465.045123   465.863167    2.710764      3  32.0  RP_02   
4            7  292.943391   297.934994   12.433111      4  29.0  RP_02   
5            7  417.585754   423.139644   20.297706      5  26.0  RP_02   
6            7  347.070314   353.723405   29.615073      6  23.0  RP_02   
7            7  291.154597   290.853564    5.024897      7  20.0  RP_02   
8            7  328.406313   329.517079    3.656556      8  17.0  RP_02   
9            7  277.036660   278.351234    3.371156      9  14.0  RP_02   
10           7  278.041750   278.072029    3.135357     10  11.0  RP_02   
11           7  276.475979   276.462568    0.732961     11   8.0  RP_02   
12           7  568.98379

In [18]:
print(all_smp_df) # smp_df rewrites for each pit, so need to collate

all_smp_df.to_csv('./output/TVC/TVC_March2019_smp_4.csv',na_rep='NaN') 

     distance      density  force_median         l     id
0         0.0   389.921479      8.674098  0.583570  RP_02
1         1.0   363.165951     11.047175  0.651025  RP_02
2         2.0   336.410423     13.420252  0.718481  RP_02
3         3.0   369.652724     16.153764  0.668636  RP_02
4         4.0   462.892854     19.247709  0.501491  RP_02
5         5.0   556.132984     22.341655  0.334346  RP_02
6         6.0   585.513053     22.978861  0.284453  RP_02
7         7.0   614.893122     23.616066  0.234560  RP_02
8         8.0   626.228987     24.002809  0.216035  RP_02
9         9.0   619.520648     24.139090  0.228875  RP_02
10       10.0   612.812310     24.275371  0.241716  RP_02
11       11.0   608.672120     24.671585  0.251179  RP_02
12       12.0   604.531931     25.067800  0.260643  RP_02
13       13.0   611.737620     25.331811  0.249216  RP_02
14       14.0   630.289189     25.463620  0.216900  RP_02
15       15.0   648.840757     25.595429  0.184583  RP_02
16       16.0 

In [18]:
# Summary of scaling > Whole cell having problems, but not essential now so come back to it later
smp_thickness = [p.distance.max() for p in all_smp_df.depth_array]
scaling_total = [s.sum() for s in min_scaling_coeff]  # left blank in above comparison.df, so leave this line for the time being
scaling_mean_abs =  np.round(np.abs(np.array(scaling_total)).mean(), 3)
print('Average scaling: %0.3f' % scaling_mean_abs) # in %

#Rerun save cell after ?

AttributeError: 'DataFrame' object has no attribute 'depth_array'

In [19]:
#Filter results
result = comparison_df.dropna() #All NaNs should already have been removed, but just in case
result = result[result['count_samp']>=cutter_size*2] # Remove comparisons outside the profile
result = result[~result['TYPE'].isin(['N', 'I'])] # Remove new snow and ice because we don't have enough samples
result['error'] = result['mean_samp']-result['RHO']
result.head()

Unnamed: 0,count_samp,mean_samp,median_samp,stdev_samp,index,top,id,bottom,RHO,TYPE,relative_height_mm,l,force_median,error
2,7,454.483428,448.202937,14.221401,2,35.0,RP_02,32.0,298.0,R,75.0,0.297753,3.591699,156.483428
3,7,465.045123,465.863167,2.710764,3,32.0,RP_02,29.0,318.0,R,105.0,0.248376,3.317155,147.045123
4,7,292.943391,297.934994,12.433111,4,29.0,RP_02,26.0,258.0,F,135.0,0.695096,0.706025,34.943391
5,7,417.585754,423.139644,20.297706,5,26.0,RP_02,23.0,289.0,F,165.0,0.333723,2.355748,128.585754
6,7,347.070314,353.723405,29.615073,6,23.0,RP_02,20.0,259.0,F,195.0,0.513604,1.354725,88.070314


In [20]:
#Compare manual density cutter measurements and SMP-derived densities
# P2015 evaluation stats
p2015_rmse = np.sqrt(np.mean(result['error']**2))
p2015_bias = (result['error']).mean()
p2015_r2 = np.ma.corrcoef(result['mean_samp'],result['RHO'])[0, 1]**2
p2015_n = len(result['mean_samp'])
p2015_p = stats.pearsonr(result['mean_samp'],result['RHO'])[1]

print('Proksch et al. 2015 Eval.')
print('N: %i' % p2015_n)
print('RMSE: %0.1f' % np.round(p2015_rmse))
print('bias: %0.1f' % np.round(p2015_bias))
print('r^2: %0.2f' % p2015_r2)

# Error as a % of mean density
np.round(rmse(result.error)/ result['RHO'].mean(),2)

# RMSE by layer type
result.groupby('TYPE')['error'].apply(rmse)

Proksch et al. 2015 Eval.
N: 43
RMSE: 107.0
bias: 77.0
r^2: 0.71


TYPE
D     94.964447
F     92.216623
R    134.847261
Name: error, dtype: float64

In [21]:
# Export the dataset
result.to_pickle("./output/TVC/TVC_March2019_smp_pit_filtered_4.pkl")