Script for performing SOM training and saving results.

<p style="color:#006400; font-family:Computer Modern; font-size:35px; text-align:center; font-weight:bold">CMASS run SOM clustering method</p>
<p style="color:#006400; font-family:Computer Modern; font-size:15px; text-align:left; font-weight:bold">Dr. Ina Storch 06-11-2023 </p>
<p style="color:#006400; font-family:Computer Modern; font-size:15px; text-align:left; font-weight:bold">Note: This notebook is designed to run SOM on preprocessed data derived from the datacube of Lawley et al., 2021. </p>
<p style="color:#006400; font-family:Computer Modern; font-size:15px; text-align:left; font-weight:bold">Reference: Lawley, C.J.M., McCafferty, A.E., Graham, G.E., Gadd, M.G., Huston, D.L., Kelley, K.D., Paradis, S., Peter, J.M., and Czarnota, K., 2021. Datasets to support prospectivity modelling for sediment-hosted Zn-Pb mineral systems; Geological Survey of Canada, Open File 8836, 1 .zip file. https://doi.org/10.4095/329203</p>

1) Import libraries

In [1]:
from nextsomcore.nextsomcore import NxtSomCore
import pickle

In [2]:
import argsSOM

arg = argsSOM.Args()

2) Specify parameters for the SOM. Input data can be either in .lrn file format or in .tiff file format. Choose one.

NoData handling is not yet implemented.
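
Because NoData handling is not built in, NoData samples may need to be removed before training. The following is only a minimal sketch of one way to do that with NumPy. The sentinel value `-9999.0` and the helper name `drop_nodata_rows` are assumptions made for illustration and are not part of this notebook or of NxtSomCore.

```python
import numpy as np

# Hypothetical sentinel; the real NoData value depends on how the layers were exported.
NODATA = -9999.0


def drop_nodata_rows(data, nodata=NODATA):
    """Return the rows that contain no NoData/NaN entries, plus the keep-mask."""
    data = np.asarray(data, dtype=float)
    # A cell is bad if it is NaN or equal to the NoData sentinel.
    bad = np.isnan(data) | np.isclose(data, nodata)
    # Keep only rows in which every column is valid.
    keep = ~bad.any(axis=1)
    return data[keep], keep


# Small made-up table: one row per sample, one column per data layer.
samples = np.array([
    [0.12, 0.80, 0.33],
    [NODATA, 0.41, 0.27],
    [0.55, np.nan, 0.90],
    [0.31, 0.64, 0.18],
])

clean, keep = drop_nodata_rows(samples)
print(clean.shape)
print(keep)
```

The returned mask can be kept so that cluster labels computed on the cleaned rows can be mapped back to the original sample positions.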

In [3]:
# #------------- 
# #- Data Input .tiff files:
# #------------- 
# #- If input data is geotiff: list geotiff files, separated by "," ["name1.tiff","name2.tiff"]
# #input_list_text=["data/input/70Gravity_Bouguer._norm.tif","data/input/83Magnetic_LongWavelength_HGM._norm.tif","data/input/50Geology_Fault_Proximity._norm.tif"]
# input_list_text=["data/input/testdata/Magnetics.tif",
#                 "data/input/testdata/RockContact(bmgg_bvc).tif",
#                 "data/input/testdata/RockContact(gsh_bs).tif",
#                 "data/input/testdata/Unit(bmgg).tif",
#                 "data/input/testdata/Unit(bvc).tif",
#                 "data/input/testdata/Unit(gsb).tif",
#                 "data/input/testdata/Unit(gsh).tif"
#                 ]
# arg.input_file= ",".join(input_list_text)
# 
# arg.geotiff_input=None      # geotiff_input ("None", arg.input_file)

#------------- 
#- Or: Data Input .lrn file:
#------------- 
arg.input_file="data/input/SOM_grav_mag.lrn"

#-------------
#- Data Output
#-------------

arg.output_folder="data/output"         # Folder to save som dictionary and cluster dictionary

arg.output_file_somspace= arg.output_folder+"/result_som.txt"   # DO NOT CHANGE! Text file that will contain calculated values: som_x som_y b_data1 b_data2 b_dataN umatrix cluster in SOM space.
        

#-------------
#- Parameter
#-------------

arg.som_x=10                # X dimension of generated SOM
arg.som_y=10                # Y dimension of generated SOM
arg.epochs=10               # Number of epochs to run

# Base parameters required for som calculation. 
# Additional optional parameters below:
arg.outgeofile= arg.output_folder+"/result_geo.txt"             # DO NOT CHANGE!
arg.output_file_geospace=arg.outgeofile   # Text file that will contain calculated values: {X Y Z} data1 data2 dataN som_x som_y cluster b_data1 b_data2 b_dataN in geospace.

arg.kmeans="false"          # Run k-means clustering (true, false)
arg.kmeans_init=5           # Number of initializations
arg.kmeans_min=2            # Minimum number of k-mean clusters
arg.kmeans_max=25           # Maximum number of k-mean clusters

arg.neighborhood='gaussian'     # Shape of the neighborhood function. gaussian or bubble
arg.std_coeff=0.5               # Coefficient in the Gaussian neighborhood function
arg.maptype='toroid'            # Type of SOM (sheet, toroid)
arg.initialcodebook=None        # File path of initial codebook, 2D numpy.array of float32.
arg.radius0=0                   # Initial size of the neighborhood
arg.radiusN=1                   # Final size of the neighborhood
arg.radiuscooling='linear'      # Function that defines the decrease in the neighborhood size as the training proceeds (linear, exponential)
arg.scalecooling='linear'       # Function that defines the decrease in the learning scale as the training proceeds (linear, exponential)
arg.scale0=0.1                  # Initial learning rate
arg.scaleN=0.01                 # Final learning rate
arg.initialization='random'     # Type of SOM initialization (random, pca)
arg.gridtype='rectangular'      # Type of SOM grid (hexagonal, rectangular)
#arg.xmlfile="none"              # SOM inputs as an xml file

arg.normalized="false"      # Whether the data has been normalized or not (true, false)
arg.minN=0                  # Minimum value for normalization
arg.maxN=1                  # Maximum value for normalization
arg.label=None              # Whether data contains label column, true or false

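The `scale0`, `scaleN` and `scalecooling` parameters above describe how the learning rate shrinks from its initial to its final value during training. As a rough illustration only, the sketch below computes a linear and an exponential schedule between the two values used in this notebook (0.1 and 0.01 over 10 epochs). The function `cooling_schedule` and its formulas are assumptions for illustration; the exact schedule used inside the training library may differ.

```python
import math


def cooling_schedule(start, end, epochs, kind="linear"):
    """Return one value per epoch, moving from `start` to `end` (illustrative only)."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    if epochs == 1:
        return [start]
    if kind == "exponential" and (start <= 0 or end <= 0):
        raise ValueError("exponential cooling needs positive start and end values")

    values = []
    for epoch in range(epochs):
        t = epoch / (epochs - 1)  # training progress, from 0.0 to 1.0
        if kind == "linear":
            values.append(start + (end - start) * t)
        elif kind == "exponential":
            values.append(start * (end / start) ** t)
        else:
            raise ValueError(f"unknown cooling type: {kind}")
    return values


linear_rates = cooling_schedule(0.1, 0.01, 10, "linear")
exp_rates = cooling_schedule(0.1, 0.01, 10, "exponential")

for epoch, (lin, exp) in enumerate(zip(linear_rates, exp_rates), start=1):
    print(f"epoch {epoch:2d}: linear={lin:.4f} exponential={exp:.4f}")
```

Both schedules start and end at the same values; in this sketch the exponential variant drops faster in the early epochs and more slowly towards the end.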

In [4]:
print(arg.input_file)

data/input/SOM_12layer.lrn


3) Run SOM with parameters specified above and save the results. Uses NxtSomCore package to do the actual work.

In [5]:
import functions.do_nextsomcore_save_results as dnsr

dnsr.run_SOM(arg)

data/output


Time for epoch 1: 0.3402
Time for epoch 2: 0.3117
Time for epoch 3: 0.3155
Time for epoch 4: 0.2956
Time for epoch 5: 0.3158
Time for epoch 6: 0.3051
Time for epoch 7: 0.3321
Time for epoch 8: 0.3031
Time for epoch 9: 0.2714
Time for epoch 10: 0.2966

ValueError: could not convert string '-156,55' to float32 at row 0, column 2.
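
The ValueError above reports a value written with a decimal comma ('-156,55'), which cannot be parsed as float32. One possible workaround is to rewrite decimal commas as decimal points in the input file before running the SOM. The sketch below shows the idea on a single line of text. It assumes that columns are separated by tabs or spaces, not by commas, and the helper name `decimal_commas_to_points` is hypothetical, not part of this notebook.

```python
import re

# Matches a comma that sits directly between two digits, e.g. the one in "-156,55".
_DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def decimal_commas_to_points(text):
    """Replace decimal commas with decimal points (assumes commas are not column separators)."""
    return _DECIMAL_COMMA.sub(".", text)


# Made-up example row in the style of a tab-separated data file.
line = "1\t512300,5\t-156,55\t0,731"
fixed = decimal_commas_to_points(line)
print(fixed)

# After the conversion, every field parses as a float.
values = [float(field) for field in fixed.split("\t")]
print(values)
```

To apply this to a whole file, the same function could be run on the file's text and the result written to a new file, leaving the original untouched.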

4) Plot results.

In [None]:
#import functions.argsPlot
#
#argsP = functions.argsPlot.Args()
#
#argsP.outsomfile= "data/output/somspace.txt"   # som calculation somspace output text file
#argsP.som_x= 100                 # som x dimension
#argsP.som_y= 100                 # som y dimension
#argsP.input_file= "data/input/SOM_grav_mag.lrn"    # Input file(*.lrn)
#argsP.dir= "data/output"        # Input file(*.lrn) or directory where som.dictionary was saved to (/output/som.dictionary)
#argsP.grid_type= 'rectangular'  # grid type (square or hexa), (rectangular or hexagonal)
#argsP.redraw='true'             # whether to draw all plots, or only those required for clustering (true: draw all. false:draw only for clustering).
#argsP.outgeofile='data/output/geospace.txt'     # som geospace results txt file
#argsP.dataType=None             # Data type (scatter or grid)
#argsP.noDataValue='NA'          # noData value

In [None]:
run functions/plot_som_results.py

GeoSpace plots finished
SomSpace plots finshed




Boxplots finished
