In [3]:
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score
import pandas as pd

The assumption is that if both the hyperspectral and the fractional cover measurements are clustered, the two clusterings will be similar. That is, a sample that is close to other samples in terms of its hyperspectral measurements will also be close to them in terms of fractional cover. If this is not true, it is likely that there is no correlation between those measurements. A second, stronger assumption is that each cluster will correspond to measurements from just one plot. If this is not true, the measurements are likely to be largely independent of the plot they were taken in, so there is probably little correlation with other measurements taken from the same plot.
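The second assumption can be quantified rather than judged by eye. scikit-learn's `homogeneity_score` is 1 when every cluster contains samples from a single class (here, a single plot). `completeness_score` is its counterpart and is 1 when all samples of a plot fall into the same cluster. The sketch below uses hypothetical toy labels, not the notebook's data.

```python
import numpy as np
from sklearn.metrics import homogeneity_score, completeness_score

# Hypothetical toy example: 6 samples taken from 2 plots.
plot_ids = np.array([0, 0, 0, 1, 1, 1])
# Clusters 0 and 1 contain only plot-0 samples; cluster 2 contains only plot-1 samples.
cluster_labels = np.array([0, 0, 1, 2, 2, 2])

# Homogeneity: 1 when every cluster holds samples from a single plot,
# i.e. the "one cluster per plot" assumption.
h = homogeneity_score(plot_ids, cluster_labels)
# Completeness: 1 when each plot lands in a single cluster; plot 0 is split
# across two clusters here, so this is below 1.
c = completeness_score(plot_ids, cluster_labels)
print(f"homogeneity={h:.3f}, completeness={c:.3f}")
```

With the notebook's data, the `plot_ID` column of `datasets/complete_fc.csv` would be read before it is dropped and passed as the first argument, with `features_pca_labels` or `fractional_cover_labels` as the second. This assumes the rows of the two CSV files are aligned, which the NMI comparison below also assumes.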

In [4]:
features_pca = pd.read_csv("datasets/10features_pca.csv").to_numpy()
fractional_cover = pd.read_csv("datasets/complete_fc.csv")
fractional_cover = fractional_cover.drop(columns=["plot_ID", "location"]).to_numpy()

In [5]:
features_pca

array([[ -2.78563161, -15.29500909,   9.61307589, ...,   1.88190458,
          0.63761465,  -0.06718317],
       [-29.89724085, -17.97538308,  23.39548645, ...,   3.68282091,
          3.03555686,  -0.95001886],
       [  5.94883324,  -5.42694336,   4.00566377, ...,   4.0219997 ,
          1.92231025,   0.85569335],
       ...,
       [ -5.85832894, -12.32179918, -11.16971251, ...,   1.90179165,
          0.89950262,   0.07780176],
       [ 24.99322276,  10.97564101, -10.04499787, ...,   4.05120409,
          0.69606782,  -0.36735629],
       [-11.09939396, -18.23812107, -14.25655914, ...,   2.22645763,
          0.72874516,   0.20453164]], shape=(869, 10))

In [6]:
fractional_cover

array([[0.41 , 0.04 , 0.   , 0.55 , 0.   ],
       [0.475, 0.03 , 0.   , 0.495, 0.   ],
       [0.54 , 0.02 , 0.   , 0.44 , 0.   ],
       ...,
       [0.64 , 0.   , 0.   , 0.36 , 0.   ],
       [0.64 , 0.   , 0.   , 0.36 , 0.   ],
       [0.64 , 0.   , 0.   , 0.36 , 0.   ]], shape=(869, 5))

In [7]:
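# Cluster both representations into the same number of clusters (k = 58).
# No random_state is set, so the cluster IDs (and possibly the partitions) change between runs.
kmeans_features_pca = KMeans(n_clusters=58).fit(features_pca)
features_pca_labels = kmeans_features_pca.labels_
kmeans_fractional_cover = KMeans(n_clusters=58).fit(fractional_cover)
fractional_cover_labels = kmeans_fractional_cover.labels_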
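The `KMeans` calls above set no `random_state`. As a result, the initial centroids, and therefore the cluster IDs and possibly the partitions themselves, differ from run to run. The following minimal sketch runs on hypothetical random data, not the notebook's arrays, and illustrates two points. First, fixing `random_state` together with an explicit `n_init` makes the labels repeatable. Second, NMI does not depend on how the cluster IDs are numbered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical random data standing in for features_pca (869 x 10 in the notebook).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))

# A fixed random_state makes the centroid initialisation, and hence the labels, repeatable;
# n_init is the number of initialisations tried (the run with the lowest inertia is kept).
km_a = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
km_b = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
assert np.array_equal(km_a.labels_, km_b.labels_)

# Renaming the cluster IDs with a permutation does not change the partition,
# so the NMI between the original and relabelled clustering stays at 1.
perm = rng.permutation(8)
relabelled = perm[km_a.labels_]
score = normalized_mutual_info_score(km_a.labels_, relabelled)
print(score)
```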

In [8]:
features_pca_labels

array([31, 23, 31,  3,  2, 18, 50, 39, 54, 18,  2, 23,  2, 23, 11, 29, 35,
       22, 43, 43, 43, 10, 57, 57, 10, 43, 24, 43, 43, 43,  7, 29,  7,  6,
       51, 29, 29,  6,  6, 32,  7,  6,  6, 39,  6, 17,  6, 39, 39, 17, 30,
        2, 13, 19, 35,  2, 24, 17, 29, 39, 48, 10, 22,  5, 22, 48, 36,  5,
       31, 48, 48, 37, 47, 47, 10,  2, 31, 52,  0, 24, 35, 35, 17, 24, 35,
       35, 31, 24, 33, 40, 11, 36, 36, 11,  3, 25,  3, 49, 47, 47, 34, 25,
       47, 25,  1, 23, 49, 49, 23,  2, 52,  4, 12, 31, 52, 49, 19, 50, 31,
       52, 48, 11, 10,  5,  5, 10, 48, 48, 11, 36, 47, 36, 48,  5,  5, 50,
        0, 23,  0, 31, 36,  5, 31, 31, 35,  0, 24, 24, 24, 24, 23, 31, 50,
       18,  2, 50, 52, 52, 35,  2, 31, 11,  5, 31, 50, 17, 17, 48, 48, 48,
       17, 17, 17, 48, 12,  5, 48, 53, 24,  3, 49,  3,  3, 47, 36, 49, 11,
       23, 36, 25, 52, 23, 23, 49, 11, 23, 52, 45, 23,  3, 49,  3, 49, 52,
       25, 49, 25, 25,  3, 47, 47, 10, 47, 25, 25, 25, 25, 25, 25, 47, 36,
       52, 25, 25, 25, 15

In [9]:
fractional_cover_labels

array([40,  0, 42, 13, 51, 42,  0,  9, 46, 46, 46, 46, 46, 46, 46,  0, 42,
       13, 51, 35, 51, 13, 42, 40, 42, 13, 51, 35, 35, 35, 40,  0,  0,  0,
        0, 42, 13, 35, 15, 35, 35, 51, 13, 13, 13, 42, 42, 42, 42, 42, 42,
        0,  0, 40, 40, 40, 40, 40, 40, 40,  1,  1,  1, 34, 34,  1,  1, 32,
       14, 32,  1, 34, 26, 26, 26, 57, 57, 57, 57, 28, 28, 28,  8,  8,  8,
       18, 18, 18, 18, 18, 27, 50, 41, 30, 37, 37, 37, 37, 37, 15, 15, 49,
       49, 49, 49, 44, 22, 22, 22, 11, 11, 52, 52, 33, 52, 52, 22, 22, 22,
       22, 35, 35, 15, 15, 15, 35, 13, 42,  0, 42, 29,  7,  7,  7,  7, 22,
       44,  2,  2, 57,  2,  2, 22, 22,  9, 48, 29, 43, 43, 43, 53, 21, 20,
       20, 28, 20, 20, 23, 23, 39, 12, 50, 41, 41, 12, 50, 30, 37, 26, 37,
       15, 35, 51, 15, 37,  4, 26, 26, 26, 35,  7,  7, 50, 50, 43, 43, 29,
       42, 48, 48, 48,  9,  9,  9, 51, 29, 48, 48, 44, 44, 44, 23, 23, 39,
       43, 49,  4,  4,  4,  4,  4,  4, 26, 26, 26,  4,  4,  4, 15, 42, 40,
       22, 22, 22, 37, 49

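Check the similarity of the two clusterings using normalized mutual information (NMI). NMI is 1 for identical partitions, regardless of how the clusters are numbered, and 0 when the partitions share no information.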
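With its default `average_method='arithmetic'`, scikit-learn's `normalized_mutual_info_score` divides the mutual information of the two labellings by the arithmetic mean of their entropies: NMI(U, V) = I(U; V) / ((H(U) + H(V)) / 2). The sketch below recomputes this from the contingency table on hypothetical toy labels and compares the result with the library function. The helper names are illustrative only.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def entropy(labels):
    """Shannon entropy (nats) of the cluster-size distribution of a labelling."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(u, v):
    """Mutual information (nats) computed from the contingency table of u and v."""
    _, ui = np.unique(u, return_inverse=True)
    _, vi = np.unique(v, return_inverse=True)
    counts = np.zeros((ui.max() + 1, vi.max() + 1))
    np.add.at(counts, (ui, vi), 1)  # counts[i, j] = samples in cluster i of u and cluster j of v
    p = counts / counts.sum()       # joint distribution of the two labellings
    outer = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)  # product of marginals
    nz = p > 0                      # empty cells contribute nothing to the sum
    return np.sum(p[nz] * np.log(p[nz] / outer[nz]))

def nmi_arithmetic(u, v):
    """NMI normalised by the arithmetic mean of the two entropies."""
    return mutual_information(u, v) / ((entropy(u) + entropy(v)) / 2)

# Hypothetical toy labellings standing in for the two k-means results.
u = np.array([0, 0, 1, 1, 2, 2, 2, 0])
v = np.array([1, 1, 0, 0, 0, 2, 2, 1])
manual = nmi_arithmetic(u, v)
reference = normalized_mutual_info_score(u, v)
print(manual, reference)
```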

In [10]:
normalized_mutual_info_score(features_pca_labels, fractional_cover_labels)

np.float64(0.4280415249854479)

The NMI score of about 0.43 (on a scale from 0 to 1) indicates a moderate to low level of similarity between the two clusterings. It is therefore likely that the hyperspectral and the fractional cover measurements follow quite different distributions, which would make it difficult to predict one from the other. The assumption that one cluster corresponds to one plot also does not hold, although samples from neighboring locations do show some tendency to be clustered together.
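NMI is not corrected for chance. With 58 clusters over 869 samples (about 15 samples per cluster), even labellings that are unrelated by construction score well above zero. A reference point therefore helps to judge how far 0.43 sits above chance. Two options are available: a permutation baseline, which shuffles one labelling while keeping its cluster sizes and recomputes NMI, and scikit-learn's `adjusted_mutual_info_score`, which subtracts the expected mutual information of random labellings. The sketch below demonstrates both on hypothetical, independently drawn labels of the notebook's size. `permutation_baseline` is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, normalized_mutual_info_score

rng = np.random.default_rng(0)
n_samples, n_clusters = 869, 58  # sizes taken from the notebook

# Two hypothetical labellings that are independent by construction.
a = rng.integers(0, n_clusters, size=n_samples)
b = rng.integers(0, n_clusters, size=n_samples)

nmi = normalized_mutual_info_score(a, b)
ami = adjusted_mutual_info_score(a, b)  # chance-corrected: expected to be near 0 here
print(f"NMI of independent labellings: {nmi:.3f}, AMI: {ami:.3f}")

def permutation_baseline(u, v, n_perm=200, seed=0):
    """NMI scores after shuffling v, i.e. under 'no relationship' with the same cluster sizes."""
    perm_rng = np.random.default_rng(seed)
    return np.array([normalized_mutual_info_score(u, perm_rng.permutation(v))
                     for _ in range(n_perm)])

baseline = permutation_baseline(a, b)
print(f"permutation baseline: mean={baseline.mean():.3f}, "
      f"95th percentile={np.percentile(baseline, 95):.3f}")
```

Applied to the notebook, `permutation_baseline(features_pca_labels, fractional_cover_labels)` gives the distribution against which the observed score can be compared. `adjusted_mutual_info_score(features_pca_labels, fractional_cover_labels)` gives a single chance-corrected number.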