**Proposed Continental Shelf Classification:**

Prior to PCA and Factor Analysis and k-Means Clustering:
- Shallow Shelf: 0 meters to -71.3 meters 
- Intermediate Shelf: -71.3 meters to -202.6 meters 
- Deep Intermediate Shelf: -202.6 meters to -421.9 meters 
- Deep Shelf: -421.9 meters to -741.4 meters 

Following the Analyses:
- Shallow Shelf: 0 meters to -166.8 meters 
- Intermediate Shelf: -168.3 meters to -420.2 meters 
- Deep Shelf: -422.9 meters to -742.9 meters 
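
As a minimal sketch of how the post-analysis classes could be applied to a single depth value, the snippet below uses cut points of -167 m, -421 m and -743 m. These cut points are an assumption: they are rounded values falling between the adjacent class ranges listed above. The function name `classify_depth` and the "Below Deep Shelf" label are hypothetical.

```python
from bisect import bisect_left

# Assumed cut points (meters, negative downward), rounded from the
# post-analysis class ranges; sorted ascending so bisect can be used.
CUTS = [-743.0, -421.0, -167.0]

# Class names ordered from deepest to shallowest.
# "Below Deep Shelf" is a hypothetical label for anything deeper than -743 m.
CLASSES = ["Below Deep Shelf", "Deep Shelf", "Intermediate Shelf", "Shallow Shelf"]


def classify_depth(depth_m):
    """Return the shelf class for a depth in meters (expected to be <= 0)."""
    return CLASSES[bisect_left(CUTS, depth_m)]


for d in (-50.0, -300.0, -600.0, -900.0):
    print(d, classify_depth(d))
```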

**Extract a 1000 point random sample from the composite (amalgamated) continental shelf model:**

In [None]:
# ## In GRASS:
# ## 1.) pull a 1000 observation random sample (arbitrary) from the global dataset w/slopes <= 0.69°
g.region -p region='ETOPO1_World_1km'
r.random input=ETOPO1_bathy_1211m_069deg_1km  npoints=1000 vector=ETOPO1_1211m_069deg_1000rpts

# ## 2.) export to csv file
v.out.ascii -c --overwrite input=ETOPO1_1211m_069deg_1000rpts@user type=point output=/Users/paulparis/Documents/Projects/csi/data/table/ETOPO1_1211m_069deg_1000rpts.csv columns=value separator=comma

# ## 3.) load sample data into R
fn1000<-'/Users/paulparis/Documents/Projects/csi/data/table/ETOPO1_1211m_069deg_1000rpts.csv'
dat1000<-read.csv(file=fn1000, header=TRUE) 
Z<-as.matrix(dat1000)
colnames(Z)<-c('easting','northing','cat','depth')

**In R, find the optimal number of clusters (k).** The choice is based on bootstrapping the clustering operation 100 times and computing the average Jaccard similarity for each cluster. The higher the similarity value, the more stable the cluster, and the less likely it is to be a spurious grouping with no real mathematical or physical meaning.

Ultimately the number of clusters will also hinge on geomorphic considerations. We cannot divorce the geology from the equation--relying on numbers alone can certainly lead us astray. 
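
To make the Jaccard-based stability measure concrete, here is a small Python sketch. It is not the fpc implementation; it only illustrates the idea of comparing each original cluster with its best-matching cluster from one bootstrap run, restricted to the observations that were actually resampled. The function names and the toy labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A n B| / |A u B| of two collections of indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_jaccard(orig_labels, boot_idx, boot_labels):
    """For each original cluster, return the highest Jaccard similarity with
    any cluster found in one bootstrap run.

    orig_labels : cluster label of every observation in the original data
    boot_idx    : indices of the observations drawn (with replacement)
    boot_labels : label given to each drawn observation by the bootstrap run
    """
    # Label received by each distinct resampled observation
    # (repeated draws of the same observation are counted once).
    boot_label_of = {int(i): lab for i, lab in zip(boot_idx, boot_labels)}

    # Group the resampled observations by their bootstrap cluster.
    boot_clusters = {}
    for point, lab in boot_label_of.items():
        boot_clusters.setdefault(lab, set()).add(point)

    scores = {}
    for lab in sorted(set(orig_labels)):
        # Members of the original cluster that appear in the bootstrap sample.
        members = {i for i, l in enumerate(orig_labels)
                   if l == lab and i in boot_label_of}
        scores[lab] = max(jaccard(members, c) for c in boot_clusters.values())
    return scores


# Toy example: the bootstrap run finds the same two groups under swapped labels.
orig_labels = [0, 0, 0, 1, 1, 1]
boot_idx = [0, 0, 1, 3, 4, 5]
boot_labels = [1, 1, 1, 0, 0, 0]
scores = best_match_jaccard(orig_labels, boot_idx, boot_labels)
print(scores)
```

Averaging such scores over many bootstrap runs gives a per-cluster stability value of the kind reported as the clusterwise Jaccard mean.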

In [None]:
# ## in R, using the fpc library:
# ## run bootstrapped [k-means] clustering on the 1000-observation random sample
library(fpc)

z = Z[,'depth']

# ## krange sets k; rerun with krange=2 through 6 to fill in the validation results below
kmc<-clusterboot(z, B=100, bootmethod="boot",clustermethod=kmeansCBI,krange=4, seed=15555)
print(kmc)
plot(kmc)

In [None]:
# ## CLUSTER VALIDATION USING fpc clusterboot():

# ## For k=2: 
# ## Clusterwise Jaccard bootstrap (omitting multiple points) mean: 0.9637098 0.9920365 
# ## Overall Jaccard mean: 0.97787315
# ## dissolved: 0 0
# ## recovered: 100 100

# ## For k=3:   <- BEST MATHEMATICAL FIT!
# ## Clusterwise Jaccard bootstrap (omitting multiple points) mean: 0.9824788 0.9972110 0.9878141 
# ## Overall Jaccard mean: 0.9891679666666667
# ## dissolved: 0 0 0
# ## recovered: 100 100 100

# ## For k=4:  <- BEST FIT WHEN CONSIDERING GEOMORPHOLOGY:
# ## Clusterwise Jaccard bootstrap (omitting multiple points) mean:  0.8614455 0.9757095 0.9280141 0.8661343
# ## Overall Jaccard mean: 0.90782585
# ## dissolved: 6 0 0 1
# ## recovered: 75 100 95 74

# ## For k=5: 
# ## Clusterwise Jaccard bootstrap (omitting multiple points) mean:  0.8377269 0.8526793 0.8772120 0.8051233 0.9587961
# ## Overall Jaccard mean: 0.86630752
# ## dissolved: 2 4 0 3 0 
# ## recovered: 81 81 77 60 100

# ## For k=6: 
# ## Clusterwise Jaccard bootstrap (omitting multiple points) mean:  0.8136434 0.9033047 0.8305438 0.7645273 0.9686151 0.7440459
# ## Overall Jaccard mean: 0.8374467
# ## dissolved: 1  1  3  5  0 16
# ## recovered: 71  95  66  51 100  53

# jaccard=[0.97787315,0.9891679666666667,0.90782585,0.86630752,0.8374467]
print(( 0.8136434 + 0.9033047 + 0.8305438 + 0.7645273 + 0.9686151 + 0.7440459)/6)
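
The overall Jaccard mean for each k is simply the average of that run's clusterwise means. A short Python sketch, using the clusterwise values recorded above (the variable names are illustrative), reproduces all five averages at once and picks the k with the highest value:

```python
from statistics import mean

# Clusterwise Jaccard bootstrap means copied from the clusterboot() output above.
clusterwise = {
    2: [0.9637098, 0.9920365],
    3: [0.9824788, 0.9972110, 0.9878141],
    4: [0.8614455, 0.9757095, 0.9280141, 0.8661343],
    5: [0.8377269, 0.8526793, 0.8772120, 0.8051233, 0.9587961],
    6: [0.8136434, 0.9033047, 0.8305438, 0.7645273, 0.9686151, 0.7440459],
}

# Overall Jaccard mean per k.
overall = {k: mean(values) for k, values in clusterwise.items()}
for k, m in overall.items():
    print(k, round(m, 4))

# k with the highest overall stability (purely numerical criterion).
best_k = max(overall, key=overall.get)
print("most stable k:", best_k)
```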

**K-Means Clustering in R:**

In [None]:
# ## Generating k clusters using k-means via fpc:

kmclust<-kmeansCBI(Z[,'depth'], k=3)
    
kmclust$result[1]   # cluster assignments vector
kmclust$result[2]   # cluster centers (means)
kmclust$result[3]   # total sum of squares
kmclust$result[4]   # within [by cluster] sum of squares
kmclust$result[5]   # total within sum of squares
kmclust$result[6]   # between sum of squares
kmclust$result[7]   # number of members in each cluster
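kmclust$result[8]   # iter: number of iterations
kmclust$result[9]   # ifault: integer flag for a possible algorithm problem
kmclust$result[10]  # crit: criterion values for the k values tried
kmclust$result[11]  # bestk: best k according to the criterion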

# ## Cluster Assignment Map (kam) (in a Python list) for k=3:
kam3=[ 1,3,2,2,3,1,1,3,3,3,1,1,3,1,3,3,1,3,1,3,3,3,3,3,3,3,1,3,3,3,3,3,3,1,1,1,3,3,3,3,3,3,3,1,3,3,3,3,1,3,1,3,3,3,3,3,3,1,3,1,1,3,3,3,3,2,1,3,3,3,3,2,3,3,3,3,1,3,3,1,3,1,3,1,3,3,3,1,3,2,3,3,3,3,3,3,3,3,1,3,2,3,3,3,1,3,3,3,3,3,3,1,3,3,3,3,3,3,3,1,1,3,3,3,3,3,1,3,3,3,3,2,3,3,3,3,1,3,1,3,3,1,1,1,3,3,3,3,3,3,3,3,3,1,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,2,3,3,3,3,1,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,1,3,3,2,3,3,1,3,2,3,3,3,1,2,3,3,2,3,3,3,3,3,2,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,1,3,3,3,3,1,3,3,1,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,2,3,1,2,3,1,3,3,2,2,2,3,3,1,3,3,3,1,1,3,3,3,3,3,1,3,3,3,2,3,3,3,3,3,3,3,1,3,1,3,3,1,3,3,3,1,2,3,3,3,3,1,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,2,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,1,3,3,3,3,3,3,3,3,1,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,1,3,3,1,3,3,1,3,3,3,3,3,3,3,2,3,3,1,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,1,3,3,1,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,2,3,3,3,3,1,3,3,3,3,3,2,3,3,3,3,3,1,2,3,3,2,3,2,3,3,3,1,3,1,3,3,3,1,2,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,2,3,3,3,3,3,3,1,3,1,3,1,1,2,2,3,3,3,3,3,3,2,2,3,2,2,2,2,2,2,1,3,3,3,3,3,3,3,3,1,2,3,1,1,3,1,3,1,1,1,3,3,3,1,1,3,1,3,3,1,1,3,1,2,1,3,1,2,3,1,1,1,3,3,1,1,1,3,1,1,1,2,1,2,1,2,3,2,1,1,1,1,1,1,3,2,1,3,1,3,1,3,1,1,1,3,2,2,2,1,3,1,1,1,2,2,1,1,2,1,1,1,2,2,1,2,1,3,2,1,3,1,1,1,2,1,1,2,1,1,1,2,1,2,1,1,2,3,1,3,2,2,1,3,1,1,1,1,1,1,3,1,3,2,2,3,2,1,1,1,3,2,1,2,2,1,3,3,1,1,3,1,1,1,2,3,3,1,2,2,1,1
,3,2,2 ]

# ## Cluster Assignment Map (kam) (in a Python list) for k=4:
kam4=[ 2,3,1,4,3,2,2,2,2,2,4,4,3,2,2,3,4,3,4,2,3,3,2,2,2,2,4,2,3,3,2,3,3,2,4,2,3,3,3,2,2,3,3,2,3,2,2,3,4,3,2,2,2,3,3,3,3,2,2,2,2,3,3,3,3,4,2,3,2,2,2,1,2,2,3,3,2,2,2,2,3,4,3,4,3,3,3,4,2,4,3,3,2,3,3,3,3,3,4,3,1,3,2,2,4,3,3,3,3,3,3,2,3,3,3,3,3,3,3,2,2,3,2,3,3,3,2,3,3,3,3,1,3,3,3,3,2,3,2,2,3,2,2,2,3,3,3,2,3,3,3,3,2,2,2,3,3,3,2,3,3,3,2,3,3,2,2,2,3,3,3,3,3,3,3,2,3,3,3,3,2,3,3,3,3,4,3,3,3,1,3,3,2,3,4,3,3,3,2,2,2,3,3,3,3,3,3,3,3,3,3,4,3,3,3,2,3,3,1,2,3,2,3,1,3,3,3,4,1,3,3,1,3,3,2,3,2,1,3,3,3,4,3,2,3,3,3,2,3,3,2,3,2,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,2,3,2,3,3,3,3,3,2,3,3,3,2,3,3,3,3,3,3,2,2,2,2,3,2,3,3,2,3,2,3,3,3,3,4,3,3,2,3,3,3,1,3,3,3,3,3,3,2,3,3,3,2,3,4,3,3,3,2,1,3,4,1,3,4,3,3,1,1,1,3,3,2,3,3,3,2,4,3,3,3,3,3,2,3,3,3,1,3,3,3,3,3,3,3,2,3,2,3,3,2,3,3,3,2,1,2,3,3,2,4,3,3,3,3,4,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,4,3,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,4,3,3,3,3,3,3,3,3,2,3,3,3,1,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,1,3,3,3,3,2,3,3,3,2,3,3,3,3,2,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,4,3,2,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,4,3,3,2,3,3,4,3,3,3,3,3,3,3,1,3,2,2,3,2,3,3,3,4,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,4,3,3,3,3,3,2,4,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,2,3,3,3,3,3,3,1,3,3,3,3,4,3,3,3,3,2,4,3,3,3,3,3,2,1,3,3,1,3,1,3,3,3,4,3,4,3,3,3,4,1,3,2,3,3,3,3,3,2,2,1,3,3,2,3,3,3,1,1,3,2,3,3,2,3,3,3,2,3,3,3,3,3,3,3,3,1,3,3,3,3,1,3,3,3,3,3,3,4,2,4,3,4,4,4,1,3,3,3,2,3,3,1,1,3,1,1,1,1,1,4,4,3,2,2,2,2,3,2,3,4,1,3,4,4,2,4,3,2,4,2,3,3,3,4,2,3,4,2,3,4,4,3,4,1,4,3,4,1,3,2,2,4,3,3,4,4,2,2,4,4,4,1,4,1,4,4,2,1,4,4,4,4,4,4,3,1,4,3,2,3,4,3,4,2,4,2,1,1,1,2,2,2,4,2,1,1,4,4,4,4,2,4,4,1,4,1,2,3,1,4,2,4,4,4,1,4,2,1,4,2,4,1,4,4,4,4,1,2,2,3,1,1,2,3,4,4,2,2,4,4,3,4,3,1,1,2,1,2,2,4,3,1,4,1,1,4,3,3,4,4,3,4,4,4,1,3,3,4,4,4,4,4
,3,4,4 ]

In [None]:
print(df1000[df1000["cluster_assign_k4"] == 1]["depth"].min()) 
print(df1000[df1000["cluster_assign_k4"] == 1]["depth"].max()) 
print(df1000[df1000["cluster_assign_k4"] == 1]["depth"].mean())
print()
print(df1000[df1000["cluster_assign_k4"] == 2]["depth"].min()) 
print(df1000[df1000["cluster_assign_k4"] == 2]["depth"].max()) 
print(df1000[df1000["cluster_assign_k4"] == 2]["depth"].mean())
print()
print(df1000[df1000["cluster_assign_k4"] == 3]["depth"].min()) 
print(df1000[df1000["cluster_assign_k4"] == 3]["depth"].max())
print(df1000[df1000["cluster_assign_k4"] == 3]["depth"].mean())
print()
print(df1000) #print(df1000[df1000["cluster_assign_k4"] == 2])

# ## in R:
# plot(Z[,'depth'], col = kmclust$result[1])
# points(kmclust$result[2], col = 1:2, pch = 8, cex = 2)

**K-Means Clustering using Python scikit-learn (sklearn.cluster):**

In [2]:
# ## import requisite libs:
import numpy as np 
import pandas
import matplotlib.pyplot as plt
#from mpl_toolkits.mplot3d import Axes3D
#from sklearn.neighbors import KernelDensity
from sklearn.cluster import KMeans
#from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_samples, silhouette_score 

In [3]:
# ## load the sample data (1000 random observations) into a pandas data frame:
fn1000='/Users/paulparis/Documents/Projects/csi/data/table/ETOPO1_1211m_069deg_1000rpts.csv'

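# ## NOTE: header=1 treats the second line of the file as the header row, so the first
# ## data record is skipped and 999 of the 1000 observations are loaded; header=0 would keep all 1000
df=pandas.read_csv(fn1000, sep=',', header=1, names=['northing','area','cat','depth'] )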

In [4]:
# ## FOR CLUSTERING ON DEPTH ONLY...

# ## extract the depth column vector from df (where we only cluster on depth; 1-D)
depth=df['depth']
ds=depth.values.reshape(-1,1)    # a pandas Series has no reshape(); take the NumPy array and make it a column for KMeans

k=4                       # number of clusters
kmc=KMeans(n_clusters=k, random_state=15000)  # k-means++
kmc.fit(ds)
kmc_pred=kmc.predict(ds)

# retrieve clustering results...
labels=kmc.labels_                # cluster assignment map (identifies which cluster each obs belongs)
centroids=kmc.cluster_centers_    # cluster centers or centroids
inertia=kmc.inertia_              # cluster inertia-total 
print(len(ds))

999


In [5]:
# ## PLOTTING: DEPTH VS LATITUDE:
# ##
# ## map the depth data observations to their assigned clusters and plot 'em against latitude...

stats={}
dx=np.array([])
dy=np.array([])
xticklabels=['0','22.5','45','67.5','90']   # 'converts' northings to latitude

fig1=plt.figure(figsize=(7,7), dpi=300)
ax1 = fig1.add_subplot(1,1,1)

for i in range(k):
    stats_area=[]
    dx=df.iloc[np.where(labels==i)]['northing']
    dy=ds[np.where(labels==i)]
    
    # find the min and max values, and compute the mean, for the depths in dy[i]...
    stats_area.append(len(dy))
    stats_area.append(np.amin(dy))
    stats_area.append(np.amax(dy))
    stats_area.append(np.mean(dy))
    
    stats[i]=stats_area
    
    
    # plot dx against dy[i]
    ax1.plot(abs(dx),dy, 'o')
    
    #ax1.axhline(y=-167.0, xmin=0.025, xmax=0.975, linewidth=2.0, color='k')
    #ax1.axhline(y=-421.0, xmin=0.025, xmax=0.975, linewidth=2.0, color='k')
    #ax1.axhline(y=-743.0, xmin=0.025, xmax=0.975, linewidth=2.0, color='k')
    
    plt.xticks([0,5000000,10000000,15000000,20000000], xticklabels)  # remap the northings to latitude
    ax1.set_xlabel('Latitude (degrees North and South)')
    ax1.set_ylabel('Depth (meters)')
    ax1.set_title('K-Means Clustered Water Depths with Latitude')

#plt.show()
plt.savefig("/Users/paulparis/Dropbox/projects/csi/graphics/kmeansdepthlat.pdf")

In [12]:
# ## PLOTTING: DEPTH VS SHELF AREA:
# ##
# ## map the depth data observations to their assigned clusters and plot 'em against shelf area...

stats={}
dx=np.array([])
dy=np.array([])
xticklabels=['0','1E+06','2E+06','3E+06','4E+06','5E+06','6E+06','7E+06','8E+06','9E+06']   # tick labels for the shelf-area axis

fig2=plt.figure(figsize=(7,7), dpi=300)
ax2 = fig2.add_subplot(1,1,1)

for i in range(k):
    stats_area=[]
    dx=df.iloc[np.where(labels==i)]['area']
    dy=ds[np.where(labels==i)]
    
    # find the min and max values, and compute the mean, for the depths in dy[i]...
    stats_area.append(len(dy))
    stats_area.append(np.amin(dy))
    stats_area.append(np.amax(dy))
    stats_area.append(np.mean(dy))
    
    stats[i]=stats_area
    
    # plot dx against dy[i]
    ax2.plot(abs(dx),dy, 'o')
    
    ax2.axhline(y=-167.0, xmin=0.025, xmax=0.975, linewidth=2.0, color='k')
    ax2.axhline(y=-421.0, xmin=0.025, xmax=0.975, linewidth=2.0, color='k')
    ax2.axhline(y=-743.0, xmin=0.025, xmax=0.975, linewidth=2.0, color='k')
    
    plt.xticks([0,1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000], xticklabels)  # label the shelf-area ticks
    ax2.set_xlabel('Shelf Surface Area (Sq. Meters)')
    ax2.set_ylabel('Depth (meters)')
    ax2.set_title('K-Means Clustered Water Depths with Shelf Area')

#plt.show()
plt.savefig("/Users/paulparis/Documents/Projects/csi/docs/graphics/factor_cluster_graphics/kmeansdeptharea.pdf")

In [13]:
#dx=df.iloc[np.where(labels==0)]['northing']
print(len(dx))
print(inertia)
print(stats )
print(centroids)

60
4174356.61104
{0: [157, -420.21063199999998, -168.27789300000001, -279.34461382802544], 1: [682, -166.75762900000001, -0.0091090000000000008, -57.587527803519059], 2: [100, -742.946594, -422.91021699999999, -564.00234098999999], 3: [60, -1197.429077, -782.00598100000002, -952.36715690000005]}
[[-278.63203797]
 [ -57.42721928]
 [-564.00234099]
 [-952.3671569 ]]
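
Because the clustering here is one-dimensional, each observation is assigned to its nearest centroid, so the boundary between two neighbouring depth classes lies at the midpoint between their centroids. The sketch below computes those midpoints from the centroids printed above; the variable names are illustrative.

```python
# Centroids (meters) copied from the kmc.cluster_centers_ output above.
centroids = [-278.63203797, -57.42721928, -564.00234099, -952.3671569]

# Order from shallowest to deepest, then take midpoints of adjacent pairs.
ordered = sorted(centroids, reverse=True)
boundaries = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

print([round(b, 1) for b in boundaries])
```

Each midpoint falls in the gap between the observed cluster ranges in `stats`, for example between -166.8 m and -168.3 m for the first boundary.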


In [None]:
# ## silhouette analysis:

import matplotlib.cm as cm

# ## compute the silhouette score: the silhouette score yields an average value for all samples.
silhouette_avg=silhouette_score(ds,kmc_pred)
print('The average silhouette score for',k,'clusters is:',silhouette_avg)

# map (assign) each data point to its designated cluster...
sample_silhouette_values=silhouette_samples(ds,kmc_pred)

fig3=plt.figure(figsize=(14,7), dpi=300)
ax3 = fig3.add_subplot(1,1,1)

#fig, (ax1) = plt.subplots(1, 1)
#fig.set_size_inches(18, 7)

# The silhouette coefficient can range from -1 to 1, but in this example all
# values lie within [-0.1, 1]
ax3.set_xlim([-0.1, 1])
# The (k + 1) * 10 term inserts blank space between the silhouette
# plots of individual clusters, to demarcate them clearly.
ax3.set_ylim([0, len(ds) + (k + 1) * 10])

y_lower=10
for i in [3,2,0,1]:     # when k=3 [2,1,0] ; when k=4 [3,2,0,1]
    # select from the cluster assigned data points all observations in cluster i
    ith_silhouette_value = sample_silhouette_values[kmc_pred == i]
    ith_silhouette_value.sort()
    
    # Draw the cluster silhouettes in the canvas frame
    size_cluster_i = ith_silhouette_value.shape[0]
    y_upper = y_lower+size_cluster_i
    color = cm.gray(float(i) / k)
    ax3.fill_betweenx(np.arange(y_lower, y_upper),0, ith_silhouette_value,facecolor=color, edgecolor=color, alpha=0.7)
    
    # Label the silhouette plots with their cluster numbers at the middle
    cluster_text=str(round(stats[i][2],1))+'m to '+str(round(stats[i][1],1))+'m'
    print(cluster_text)
    ax3.text(0.05, y_lower + 0.5 * size_cluster_i,  cluster_text, color='w')   # str(i)

    # Compute the new y_lower for next plot
    y_lower = y_upper + 10  # leave a 10-row gap before the next cluster

ax3.set_title("Silhouette Plot for the K-Means Clustered Water Depths - k=" + str(k))
ax3.set_xlabel("Silhouette Coefficient")
ax3.set_ylabel("Cluster (Depth Class)")

# The vertical line marks the average silhouette score over all values
ax3.axvline(x=silhouette_avg, color="red", linestyle="--")
    
ax3.set_yticks([])  # Clear the yaxis labels / ticks
ax3.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])
    
plt.show()
#plt.savefig("/Users/paulparis/Documents/Projects/csi/docs/graphics/factor_cluster_graphics/kmeanssilhouettek3.pdf")

In [None]:
The average silhouette score for 2 clusters is: 0.762828548598
The average silhouette score for 3 clusters is: 0.726671045297
The average silhouette score for 4 clusters is: 0.672984662494
The average silhouette score for 5 clusters is: 0.639961261179
The average silhouette score for 6 clusters is: 0.599966286044
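
For reference, the silhouette coefficient of an observation is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The sketch below computes this by hand for one-dimensional data in plain Python. It is meant only to illustrate the definition, not to replace `silhouette_samples`; the function name and toy data are hypothetical.

```python
def silhouette_values_1d(x, labels):
    """Silhouette coefficient of each observation for 1-D data.

    Requires at least two clusters. Observations in a singleton cluster
    are given a value of 0, following the usual convention.
    """
    # Group the values by cluster label.
    clusters = {}
    for xi, lab in zip(x, labels):
        clusters.setdefault(lab, []).append(xi)

    values = []
    for xi, lab in zip(x, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of xi to itself is 0, so divide by len(own) - 1).
        a = sum(abs(xi - xj) for xj in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(abs(xi - xj) for xj in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        values.append((b - a) / max(a, b))
    return values


# Toy example: two well-separated groups.
x = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
s = silhouette_values_1d(x, labels)
avg = sum(s) / len(s)
print(s, avg)
```

Values near 1 indicate observations well inside their cluster; values near 0 indicate observations close to a class boundary.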

In [20]:
# ## Plot the Jaccard Similarity Metric against k

K_=[2,3,4,5,6]
jaccard=[0.97787315,0.989167967,0.90782585,0.86630752,0.8374467]

fig4=plt.figure(figsize=(7,7), dpi=300)
ax4 =fig4.add_subplot(1,1,1)

ax4.plot(K_,jaccard, '-o')

ax4.set_xticks([2, 3, 4, 5, 6])
ax4.set_xlabel('k (number of clusters)')
ax4.set_ylabel('Jaccard Similarity')
ax4.set_title('Jaccard Similarities')
#plt.show()
plt.savefig("/Users/paulparis/Documents/Projects/csi/docs/graphics/factor_cluster_graphics/kmeansJaccard.pdf")

In [47]:
i=0
cluster_text=str(round(stats[i][2],1))+'m to '+str(stats[i][1])+'m'
print(stats[i])
print(stats[i][2], stats[i][1])
print(cluster_text)

[157, -420.21063199999998, -168.27789300000001, -279.34461382802544]
-168.277893 -420.210632
-168.3m to -420.210632m


In [33]:
# maybe:
# cluster depth and latitude
#         depth and area
#         depth, area, and latitude
#
# just to see what they might reveal...

[[ -56.62612675]
 [-562.5786607 ]
 [-952.3671569 ]
 [-274.27361741]]


In [None]:
# ## SPARE PARTS:

#lines=plt.plot(centroids[i,0],centroids[i,1], 'kx')
    
#plt.setp(lines,ms=15.0)
#plt.setp(lines,mew=2.0)
    
#df.iloc[np.where(labels==i)]

