### Part 3 Snow Grain Type Classification
[Benoit Montpetit](https://github.com/ecccben), *CPS/CRD/ECCC*, 2024  
[Josh King](https://github.com/kingjml), *CPS/CRD/ECCC*, 2021  
[Mike Brady](https://github.com/m9brady), *CPS/CRD/ECCC*, 2024


To characterize the grain types of the SMP profile layers acquired away from the central snowpit (see figure below), a Support Vector Machine classifier similar to that of [King et al. (2020)](https://tc.copernicus.org/articles/14/4323/2020/tc-14-4323-2020.pdf) was used, with some differences. Only the sites of the January 2019 TVC campaign were used, and no fresh snow layer was observed during that campaign.

**Changes from [King et al. (2020)](https://tc.copernicus.org/articles/14/4323/2020/tc-14-4323-2020.pdf)**
- Only two snow types were used, compared to three in [King et al. (2020)](https://tc.copernicus.org/articles/14/4323/2020/tc-14-4323-2020.pdf)
- Snow types used:
  - Rounded grains (wind slab layer in Arctic snowpacks)
  - Depth hoar layer
- Mixed/faceted grain layers were reported by surveyors but were labelled as rounded grains for this classifier (see [Montpetit et al. (2024)](link TBD))

<center><img src="Figures/Figure3.png" height="500px" class="bg-primary mb-1"></center>

<center>Figure 3 of [Montpetit et al. (2024)](Link TBD): Ground-based snow measurements sampling scheme.</center>

In [None]:
import pandas as pd
from sklearn.model_selection import cross_val_score, StratifiedShuffleSplit
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.svm import SVC  
from sklearn import preprocessing
import numpy as np
import pickle
from matplotlib import pyplot as plt
import matplotlib

#Parameters used for plotting
font = {'family' : 'sans-serif',
        'weight' : 'bold',
        'size'   : 22}

matplotlib.rc('font', **font)
plt.rcParams["axes.labelsize"] = 22
plt.rcParams["axes.labelweight"] = 'bold'
plt.rcParams['xtick.labelsize']=16
plt.rcParams['ytick.labelsize']=16

# Loading SMP density and SSA data

In [None]:
from constants import TVC02
sites = pd.DataFrame({'site':TVC02})
# Replace 'RS' with 'RP' in the site names (assignment instead of chained inplace=True)
sites['site'] = sites['site'].replace({'RS':'RP'}, regex=True)
sites = list(sites.site.values)

In [None]:
density = pd.read_pickle(r"Data/Scaled_SMP_DENS.pkl")
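# Replace 'RS' with 'RP' in the site names, then keep only the selected sites
density['site'] = density['site'].replace({'RS':'RP'}, regex=True)
density = density.query('site in @sites').copy()
# Relabel grain type 'M' as rounded grains ('R')
density.loc[density.grain_type=='M','grain_type']='R'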
density.head(10)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>rel_height</th>
      <th>force_median</th>
      <th>lambda</th>
      <th>f0</th>
      <th>delta</th>
      <th>l</th>
      <th>smp_val</th>
      <th>height</th>
      <th>rel_height</th>
      <th>ref_val</th>
      <th>grain_type</th>
      <th>site</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>11</th>
      <td>28.75</td>
      <td>1.999456</td>
      <td>13394.126829</td>
      <td>0.036740</td>
      <td>0.027112</td>
      <td>0.148287</td>
      <td>329.372727</td>
      <td>405.0</td>
      <td>15.0</td>
      <td>323.0</td>
      <td>R</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>12</th>
      <td>70.00</td>
      <td>17.116735</td>
      <td>11473.607940</td>
      <td>0.587331</td>
      <td>0.022848</td>
      <td>0.202061</td>
      <td>405.581818</td>
      <td>375.0</td>
      <td>45.0</td>
      <td>407.0</td>
      <td>R</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>13</th>
      <td>108.75</td>
      <td>5.763592</td>
      <td>1790.130465</td>
      <td>0.711349</td>
      <td>0.075452</td>
      <td>0.521895</td>
      <td>290.105000</td>
      <td>345.0</td>
      <td>75.0</td>
      <td>327.0</td>
      <td>F</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>14</th>
      <td>148.75</td>
      <td>0.129682</td>
      <td>44.125177</td>
      <td>0.082887</td>
      <td>0.089510</td>
      <td>0.780494</td>
      <td>222.858333</td>
      <td>315.0</td>
      <td>105.0</td>
      <td>226.0</td>
      <td>F</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>15</th>
      <td>173.75</td>
      <td>0.107949</td>
      <td>24.275730</td>
      <td>0.105600</td>
      <td>0.111294</td>
      <td>0.954482</td>
      <td>226.450000</td>
      <td>285.0</td>
      <td>135.0</td>
      <td>236.0</td>
      <td>H</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>16</th>
      <td>193.75</td>
      <td>0.114691</td>
      <td>28.745240</td>
      <td>0.101713</td>
      <td>0.099316</td>
      <td>0.893205</td>
      <td>224.500000</td>
      <td>255.0</td>
      <td>165.0</td>
      <td>216.0</td>
      <td>H</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>17</th>
      <td>213.75</td>
      <td>0.194934</td>
      <td>47.616875</td>
      <td>0.110115</td>
      <td>0.090603</td>
      <td>0.771190</td>
      <td>228.162500</td>
      <td>225.0</td>
      <td>195.0</td>
      <td>226.0</td>
      <td>H</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>18</th>
      <td>233.75</td>
      <td>0.233657</td>
      <td>49.531328</td>
      <td>0.126143</td>
      <td>0.091845</td>
      <td>0.777757</td>
      <td>231.200000</td>
      <td>195.0</td>
      <td>225.0</td>
      <td>230.0</td>
      <td>H</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>19</th>
      <td>268.75</td>
      <td>0.182684</td>
      <td>26.604495</td>
      <td>0.176244</td>
      <td>0.105839</td>
      <td>0.933630</td>
      <td>227.315000</td>
      <td>165.0</td>
      <td>255.0</td>
      <td>215.0</td>
      <td>H</td>
      <td>RP20</td>
    </tr>
    <tr>
      <th>20</th>
      <td>325.00</td>
      <td>0.205706</td>
      <td>17.904735</td>
      <td>0.236160</td>
      <td>0.123278</td>
      <td>1.120259</td>
      <td>239.936000</td>
      <td>135.0</td>
      <td>285.0</td>
      <td>236.0</td>
      <td>H</td>
      <td>RP20</td>
    </tr>
  </tbody>
</table>
</div>

In [None]:
ssa = pd.read_pickle(r"Data/Scaled_SMP_SSA.pkl")
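# Replace 'RS' with 'RP' in the site names, then keep only the selected sites
ssa['site'] = ssa['site'].replace({'RS':'RP'}, regex=True)
ssa = ssa.query('site in @sites').copy()
# Relabel grain type 'M' as rounded grains ('R')
ssa.loc[ssa.grain_type=='M','grain_type']='R'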
ssa.head(10)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>rel_height</th>
      <th>force_median</th>
      <th>lambda</th>
      <th>f0</th>
      <th>delta</th>
      <th>l</th>
      <th>smp_val</th>
      <th>height</th>
      <th>rel_height</th>
      <th>ref_val</th>
      <th>grain_type</th>
      <th>site</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>130</th>
      <td>35.00</td>
      <td>10.682532</td>
      <td>5166.249156</td>
      <td>0.384389</td>
      <td>0.044706</td>
      <td>0.220131</td>
      <td>23.766667</td>
      <td>365.0</td>
      <td>15.0</td>
      <td>30.424397</td>
      <td>R</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>131</th>
      <td>90.00</td>
      <td>10.334586</td>
      <td>6487.907595</td>
      <td>0.155855</td>
      <td>0.023654</td>
      <td>0.150377</td>
      <td>27.429412</td>
      <td>335.0</td>
      <td>45.0</td>
      <td>27.326296</td>
      <td>R</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>132</th>
      <td>127.50</td>
      <td>14.406387</td>
      <td>8268.741854</td>
      <td>0.173074</td>
      <td>0.024442</td>
      <td>0.140396</td>
      <td>27.492308</td>
      <td>305.0</td>
      <td>75.0</td>
      <td>29.027758</td>
      <td>R</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>133</th>
      <td>175.00</td>
      <td>3.632045</td>
      <td>2482.043810</td>
      <td>0.293405</td>
      <td>0.036547</td>
      <td>0.312307</td>
      <td>21.788000</td>
      <td>275.0</td>
      <td>105.0</td>
      <td>25.058522</td>
      <td>R</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>134</th>
      <td>215.00</td>
      <td>1.226848</td>
      <td>473.538267</td>
      <td>0.183021</td>
      <td>0.050627</td>
      <td>0.432032</td>
      <td>16.942857</td>
      <td>245.0</td>
      <td>135.0</td>
      <td>19.776577</td>
      <td>F</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>135</th>
      <td>232.50</td>
      <td>0.177210</td>
      <td>44.186784</td>
      <td>0.112063</td>
      <td>0.091678</td>
      <td>0.803874</td>
      <td>11.642857</td>
      <td>215.0</td>
      <td>165.0</td>
      <td>19.492045</td>
      <td>F</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>136</th>
      <td>252.50</td>
      <td>0.204407</td>
      <td>39.873839</td>
      <td>0.140241</td>
      <td>0.092011</td>
      <td>0.822104</td>
      <td>10.433333</td>
      <td>185.0</td>
      <td>195.0</td>
      <td>14.948928</td>
      <td>F</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>137</th>
      <td>273.75</td>
      <td>0.214830</td>
      <td>37.775974</td>
      <td>0.150803</td>
      <td>0.087268</td>
      <td>0.824758</td>
      <td>10.425000</td>
      <td>155.0</td>
      <td>225.0</td>
      <td>15.396658</td>
      <td>H</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>138</th>
      <td>295.00</td>
      <td>0.134176</td>
      <td>26.514755</td>
      <td>0.119010</td>
      <td>0.105548</td>
      <td>0.926103</td>
      <td>9.633333</td>
      <td>125.0</td>
      <td>255.0</td>
      <td>12.998055</td>
      <td>H</td>
      <td>RP18</td>
    </tr>
    <tr>
      <th>139</th>
      <td>315.00</td>
      <td>0.237909</td>
      <td>29.237959</td>
      <td>0.176389</td>
      <td>0.103899</td>
      <td>0.884335</td>
      <td>8.400000</td>
      <td>95.0</td>
      <td>285.0</td>
      <td>13.651403</td>
      <td>H</td>
      <td>RP18</td>
    </tr>
  </tbody>
</table>
</div>

# Creating the Training/Validation dataframe

In [None]:
class_data = pd.DataFrame({'rel_height': density.rel_height.iloc[:,0], 'site':density.site,
                            'F_med':density.force_median, 'L':density.l, 
                            'rho':density.ref_val, 'type':density.grain_type})
class_data.head(10)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>rel_height</th>
      <th>site</th>
      <th>F_med</th>
      <th>L</th>
      <th>rho</th>
      <th>type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>11</th>
      <td>28.75</td>
      <td>RP20</td>
      <td>1.999456</td>
      <td>0.148287</td>
      <td>323.0</td>
      <td>R</td>
    </tr>
    <tr>
      <th>12</th>
      <td>70.00</td>
      <td>RP20</td>
      <td>17.116735</td>
      <td>0.202061</td>
      <td>407.0</td>
      <td>R</td>
    </tr>
    <tr>
      <th>13</th>
      <td>108.75</td>
      <td>RP20</td>
      <td>5.763592</td>
      <td>0.521895</td>
      <td>327.0</td>
      <td>F</td>
    </tr>
    <tr>
      <th>14</th>
      <td>148.75</td>
      <td>RP20</td>
      <td>0.129682</td>
      <td>0.780494</td>
      <td>226.0</td>
      <td>F</td>
    </tr>
    <tr>
      <th>15</th>
      <td>173.75</td>
      <td>RP20</td>
      <td>0.107949</td>
      <td>0.954482</td>
      <td>236.0</td>
      <td>H</td>
    </tr>
    <tr>
      <th>16</th>
      <td>193.75</td>
      <td>RP20</td>
      <td>0.114691</td>
      <td>0.893205</td>
      <td>216.0</td>
      <td>H</td>
    </tr>
    <tr>
      <th>17</th>
      <td>213.75</td>
      <td>RP20</td>
      <td>0.194934</td>
      <td>0.771190</td>
      <td>226.0</td>
      <td>H</td>
    </tr>
    <tr>
      <th>18</th>
      <td>233.75</td>
      <td>RP20</td>
      <td>0.233657</td>
      <td>0.777757</td>
      <td>230.0</td>
      <td>H</td>
    </tr>
    <tr>
      <th>19</th>
      <td>268.75</td>
      <td>RP20</td>
      <td>0.182684</td>
      <td>0.933630</td>
      <td>215.0</td>
      <td>H</td>
    </tr>
    <tr>
      <th>20</th>
      <td>325.00</td>
      <td>RP20</td>
      <td>0.205706</td>
      <td>1.120259</td>
      <td>236.0</td>
      <td>H</td>
    </tr>
  </tbody>
</table>
</div>

In [None]:
ssas = []
for i in range(len(class_data)):
    
    temp_data = class_data.iloc[i].copy()
    
    ssa_temp = ssa[ssa.site==temp_data.site]
    
    ssas.append(ssa_temp[np.abs(ssa_temp.rel_height.iloc[:,0]-temp_data.rel_height)==
                         np.abs(ssa_temp.rel_height.iloc[:,0]-temp_data.rel_height).min()].ref_val.values[0])
class_data['ssa']=ssas
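
The loop above pairs each density layer with the SSA measurement from the same site whose relative height is closest. A minimal sketch of that nearest-height lookup on toy data follows. The helper name `nearest_ssa` and the toy values are hypothetical, and the frames here carry a single `rel_height` column, unlike the duplicated column in the real data.

```python
import pandas as pd

def nearest_ssa(layers, ssa_obs):
    """Return, for each row of `layers`, the `ref_val` of the same-site
    row of `ssa_obs` with the smallest absolute rel_height difference."""
    matched = []
    for _, layer in layers.iterrows():
        candidates = ssa_obs[ssa_obs['site'] == layer['site']]
        # Index label of the closest height; ties resolve to the first row
        closest = (candidates['rel_height'] - layer['rel_height']).abs().idxmin()
        matched.append(candidates.loc[closest, 'ref_val'])
    return matched

# Hypothetical toy data: two sites, three layers
layers = pd.DataFrame({'site': ['A', 'A', 'B'],
                       'rel_height': [10.0, 48.0, 20.0]})
ssa_obs = pd.DataFrame({'site': ['A', 'A', 'B'],
                        'rel_height': [15.0, 45.0, 30.0],
                        'ref_val': [30.0, 20.0, 12.0]})
layers['ssa'] = nearest_ssa(layers, ssa_obs)
print(layers)
```

As in the original loop, a layer whose site has no SSA measurement raises an error rather than being silently skipped.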

In [None]:
class_data=class_data.query('site in @sites')
class_data.head(10)

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>rel_height</th>
      <th>site</th>
      <th>F_med</th>
      <th>L</th>
      <th>rho</th>
      <th>type</th>
      <th>ssa</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>11</th>
      <td>28.75</td>
      <td>RP20</td>
      <td>1.999456</td>
      <td>0.148287</td>
      <td>323.0</td>
      <td>R</td>
      <td>59.673331</td>
    </tr>
    <tr>
      <th>12</th>
      <td>70.00</td>
      <td>RP20</td>
      <td>17.116735</td>
      <td>0.202061</td>
      <td>407.0</td>
      <td>R</td>
      <td>52.017675</td>
    </tr>
    <tr>
      <th>13</th>
      <td>108.75</td>
      <td>RP20</td>
      <td>5.763592</td>
      <td>0.521895</td>
      <td>327.0</td>
      <td>F</td>
      <td>33.944069</td>
    </tr>
    <tr>
      <th>14</th>
      <td>148.75</td>
      <td>RP20</td>
      <td>0.129682</td>
      <td>0.780494</td>
      <td>226.0</td>
      <td>F</td>
      <td>21.074667</td>
    </tr>
    <tr>
      <th>15</th>
      <td>173.75</td>
      <td>RP20</td>
      <td>0.107949</td>
      <td>0.954482</td>
      <td>236.0</td>
      <td>H</td>
      <td>13.834073</td>
    </tr>
    <tr>
      <th>16</th>
      <td>193.75</td>
      <td>RP20</td>
      <td>0.114691</td>
      <td>0.893205</td>
      <td>216.0</td>
      <td>H</td>
      <td>13.086377</td>
    </tr>
    <tr>
      <th>17</th>
      <td>213.75</td>
      <td>RP20</td>
      <td>0.194934</td>
      <td>0.771190</td>
      <td>226.0</td>
      <td>H</td>
      <td>13.944551</td>
    </tr>
    <tr>
      <th>18</th>
      <td>233.75</td>
      <td>RP20</td>
      <td>0.233657</td>
      <td>0.777757</td>
      <td>230.0</td>
      <td>H</td>
      <td>13.944551</td>
    </tr>
    <tr>
      <th>19</th>
      <td>268.75</td>
      <td>RP20</td>
      <td>0.182684</td>
      <td>0.933630</td>
      <td>215.0</td>
      <td>H</td>
      <td>13.067053</td>
    </tr>
    <tr>
      <th>20</th>
      <td>325.00</td>
      <td>RP20</td>
      <td>0.205706</td>
      <td>1.120259</td>
      <td>236.0</td>
      <td>H</td>
      <td>11.193565</td>
    </tr>
  </tbody>
</table>
</div>

Showing the three snow grain types surveyed:

- R: Rounded grains
- F: Mixed/Faceted grains
- H: Depth hoar grains

In [None]:
class_data.type.unique()

array(['R', 'F', 'H'], dtype=object)

In [None]:
fig, ax = plt.subplots(1,2, figsize=(20,10))

hist_rho_ws=ax[0].hist(class_data.rho[class_data.type=='R'], bins=np.arange(100,460,25), density=True, alpha=1, label='Rounded grains', color='k', edgecolor='k')
hist_rho_f=ax[0].hist(class_data.rho[class_data.type=='F'], bins=np.arange(100,460,25), density=True, alpha=0.5, label='Mixed/Facets', color='grey', edgecolor='k')
hist_rho_dh=ax[0].hist(class_data.rho[class_data.type=='H'], bins=np.arange(100,460,25), density=True, alpha=0.5, label='Depth Hoar', color='cyan', edgecolor='k')

hist_ssa_ws=ax[1].hist(class_data.ssa[class_data.type=='R'], bins=np.arange(10,72,4), density=True, alpha=1, color='k', edgecolor='k')
hist_ssa_f=ax[1].hist(class_data.ssa[class_data.type=='F'], bins=np.arange(10,72,4), density=True, alpha=0.5, color='grey', edgecolor='k')
hist_ssa_dh=ax[1].hist(class_data.ssa[class_data.type=='H'], bins=np.arange(10,72,4), density=True, alpha=0.5, color='cyan', edgecolor='k')

ax[0].set_xlabel('$\\rho_{snow}$ $(kg \\cdot m^{-3})$')
ax[1].set_xlabel('SSA ($m^2 \\cdot kg^{-1}$)')
ax[0].legend()

<center><img src="Figures/Figure9.png" height="500px"></center>

<center>Figure 9 of [Montpetit et al. (2024)](Link TBD): Distribution of snow density and SSA for the three dominant grain type layers for the January campaign.</center>

# Labelling the Mixed/Faceted layers as Rounded grains

In [None]:
class_data.loc[:,'class_type'] = class_data.type.values
class_data = class_data.loc[class_data.type.isin(['R', 'H', 'F'])]
class_data.loc[class_data.type != 'H','type'] = 'R'
class_data.type.unique()

# Creating the input variable dataframe (X) and the output labels (y)

In [None]:
model_data = pd.concat([class_data.loc[class_data.type=='H'].sample(len(class_data.loc[class_data.type=='R'])), class_data.loc[class_data.type=='R']])

X = model_data.drop(['type','class_type','site', 'ssa','rho'], axis=1)
y = model_data['type']
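
The cell above balances the two classes by drawing as many depth hoar rows as there are rounded grain rows. Because `sample` is called there without a seed, the selected rows can differ between runs. The sketch below shows the same idea with a fixed seed on hypothetical toy data. The `balance_classes` helper and its `seed` argument are assumptions for illustration, and the helper downsamples every class to the smallest one, which is slightly more general than the notebook's cell.

```python
import pandas as pd

def balance_classes(df, label_col, seed=0):
    """Downsample every class in `label_col` to the size of the smallest one."""
    n_min = df[label_col].value_counts().min()
    parts = [group.sample(n=n_min, random_state=seed)
             for _, group in df.groupby(label_col)]
    return pd.concat(parts)

# Hypothetical toy data: three depth hoar rows, two rounded grain rows
toy = pd.DataFrame({'F_med': [0.1, 0.2, 0.15, 5.0, 9.0],
                    'type': ['H', 'H', 'H', 'R', 'R']})
balanced = balance_classes(toy, 'type')
print(balanced['type'].value_counts())
```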

In [None]:
RANDOM_SEED = 2023
stratified_shuffle = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=RANDOM_SEED)
stratified_shuffle.get_n_splits(X, y)

In [None]:
svclassifier = SVC(kernel='linear', gamma='scale', probability = True)

In [None]:
scores = cross_val_score(svclassifier, preprocessing.scale(X), y.values, cv = stratified_shuffle, scoring = 'accuracy')
print("Accuracy: %0.1f (+/- %0.1f)" % (scores.mean()*100, scores.std()*100))

Accuracy: 88.4 (+/- 2.5)
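
The accuracy above is computed on features standardised once over the whole dataset with `preprocessing.scale`. One common alternative, shown here only as a sketch on synthetic data, is to wrap the scaler and the classifier in a pipeline so that the scaling is re-fitted on the training portion of every split. The synthetic features and the seed are assumptions, and the score this prints is unrelated to the value reported above.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two synthetic, well-separated classes with three features each
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(40, 3)),
                   rng.normal(4.0, 1.0, size=(40, 3))])
y_toy = np.array(['H'] * 40 + ['R'] * 40)

splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
# The scaler is fitted inside each training fold, never on the test fold
model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
toy_scores = cross_val_score(model, X_toy, y_toy, cv=splitter, scoring='accuracy')
print("Accuracy: %0.1f (+/- %0.1f)" % (toy_scores.mean() * 100, toy_scores.std() * 100))
```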


# Generating the confusion matrices for each randomly shuffled iteration

In [None]:
conf_mat_full = None 

fig, ax = plt.subplots(4,3,figsize=(15,15))
fig.subplots_adjust(hspace=.4, wspace=.25)

i=0
j=0
k=0

for train, test in stratified_shuffle.split(X, y):
    svclassifier.fit(X.iloc[train], y.iloc[train].values)
    ypred = svclassifier.predict(X.iloc[test])
    conf_mat = confusion_matrix(y.iloc[test].values, ypred, labels=["R", "H"])
    
    if conf_mat_full is None:
        conf_mat_full = conf_mat
    else:
        conf_mat_full = conf_mat_full + conf_mat
    
    ConfusionMatrixDisplay.from_predictions(y.iloc[test].values,ypred, normalize='true', ax=ax[j,i])
    # ax[j,i].set_title(k)
    k+=1
    i+=1
    if i==3:
        i=0
        j+=1

ax[-1,-1].axis('off')
ax[-1,-2].axis('off')

<center><img src="Figures/Part_3_GrainClass_Fig1.png" height="500px"></center>

<center>Figure: Confusion matrices for each randomly shuffled iteration.</center>

In [None]:
cmat = pd.DataFrame([conf_mat_full[0,:]/conf_mat_full[0,:].sum(),
 conf_mat_full[:,1]/conf_mat_full[:,1].sum()], columns=['H','R'], index=['H','R'])
cmat

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>H</th>
      <th>R</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>H</th>
      <td>0.819444</td>
      <td>0.180556</td>
    </tr>
    <tr>
      <th>R</th>
      <td>0.168831</td>
      <td>0.831169</td>
    </tr>
  </tbody>
</table>
</div>
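
`confusion_matrix` returns one row per true label and one column per predicted label, both in the order passed through `labels`. A minimal sketch of per-class (row) normalisation that keeps that order explicit is given below, using hypothetical toy labels rather than the notebook's data.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = ['R', 'H']
# Hypothetical toy labels
y_true = ['R', 'R', 'R', 'R', 'H', 'H', 'H', 'H']
y_pred = ['R', 'R', 'R', 'H', 'H', 'H', 'R', 'R']

counts = confusion_matrix(y_true, y_pred, labels=labels)
# Divide each row by its total so that every true class sums to 1
fractions = counts / counts.sum(axis=1, keepdims=True)
cmat_toy = pd.DataFrame(fractions, index=labels, columns=labels)
cmat_toy.index.name = 'true'
cmat_toy.columns.name = 'predicted'
print(cmat_toy)
```

Reusing the same `labels` list for the matrix, the index and the columns keeps the row and column names tied to the order in which the counts were computed.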

In [None]:
# Save the fitted classifier; the context manager closes the file handle
with open(r'data/SVClassifierTVC02_2.pkl', 'wb') as f:
    pickle.dump(svclassifier, f)
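
A pickled scikit-learn estimator can be restored with `pickle.load` and used for prediction without refitting. The round-trip sketch below uses synthetic data and serialises to memory rather than to the notebook's file path; the toy arrays are assumptions.

```python
import pickle
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy training set with two features
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.1], [3.2, 2.9]])
y_toy = np.array(['H', 'H', 'R', 'R'])

clf = SVC(kernel='linear').fit(X_toy, y_toy)
payload = pickle.dumps(clf)        # bytes, as would be written to disk
restored = pickle.loads(payload)   # only unpickle data from a trusted source

# The restored model should reproduce the original predictions
same = bool((restored.predict(X_toy) == clf.predict(X_toy)).all())
print(same)
```

A pickle written with one scikit-learn version is not guaranteed to load correctly with another, so it helps to record the library version alongside the file.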

In [None]:
class_data = pd.DataFrame({'rel_height': density.rel_height.iloc[:,0], 'site':density.site,
                            'F_med':density.force_median, 'L':density.l, 
                            'rho':density.ref_val, 'type':density.grain_type})

ssas = []
for i in range(len(class_data)):
    
    temp_data = class_data.iloc[i].copy()
    
    ssa_temp = ssa[ssa.site==temp_data.site]
    
    ssas.append(ssa_temp[np.abs(ssa_temp.rel_height.iloc[:,0]-temp_data.rel_height)==
                         np.abs(ssa_temp.rel_height.iloc[:,0]-temp_data.rel_height).min()].ref_val.values[0])
class_data['ssa']=ssas

class_data.loc[:,'class_type']=class_data.type.values

class_data.loc[:,'class_type']=svclassifier.predict(class_data.drop(['type','class_type','site', 'ssa', 'rho'], axis=1))

Confusion matrix for the full Training/Validation dataset. The three original grain types are shown to see how the classifier performed on each one individually, and in particular on the mixed/faceted grain type, in order to validate the original assumption of labelling it as rounded grains.

In [None]:
cmat_all = pd.DataFrame({'F':[0,0,0],'R':[0,0,0],'H':[0,0,0]}, index = ['F','R','H'])

for original in ['F','R','H']:
    for final in ['F','R','H']:
        cmat_all.loc[original, final]=len(class_data.loc[(class_data.type==original) & (class_data.class_type==final)])/len(class_data.loc[(class_data.type==original)])

cmat_all

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>F</th>
      <th>R</th>
      <th>H</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>F</th>
      <td>0</td>
      <td>0.673077</td>
      <td>0.326923</td>
    </tr>
    <tr>
      <th>R</th>
      <td>0</td>
      <td>0.981481</td>
      <td>0.018519</td>
    </tr>
    <tr>
      <th>H</th>
      <td>0</td>
      <td>0.082645</td>
      <td>0.917355</td>
    </tr>
  </tbody>
</table>
</div>
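
The same kind of per-class table can also be built with `pd.crosstab` and `normalize='index'`, which divides each row by its total. The sketch below uses hypothetical toy labels; the column names mirror the notebook, the values do not.

```python
import pandas as pd

# Hypothetical surveyed labels ('type') and classifier output ('class_type')
toy = pd.DataFrame({
    'type':       ['F', 'F', 'F', 'R', 'R', 'H', 'H', 'H', 'H'],
    'class_type': ['R', 'R', 'H', 'R', 'R', 'H', 'H', 'H', 'R'],
})
order = ['F', 'R', 'H']
# Rows: surveyed type, columns: predicted type, each row sums to 1.
# reindex adds the 'F' column, which the two-class classifier never predicts.
fractions = (pd.crosstab(toy['type'], toy['class_type'], normalize='index')
               .reindex(index=order, columns=order, fill_value=0.0))
print(fractions)
```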