<p style="font-size:35px; text-align:center; font-weight:bold">CMAAS run SOM clustering method</p>
<p style="font-size:17px; text-align:left">Ina Storch 06-11-2023 </p>
<p style="font-size:17px; text-align:left">Note: This notebook runs a SOM on preprocessed data derived from the datacube of Lawley et al. (2021).</p>
<p style="font-size:17px; text-align:left">Reference: Lawley, C.J.M., McCafferty, A.E., Graham, G.E., Gadd, M.G., Huston, D.L., Kelley, K.D., Paradis, S., Peter, J.M., and Czarnota, K., 2021. Datasets to support prospectivity modelling for sediment-hosted Zn-Pb mineral systems; Geological Survey of Canada, Open File 8836, 1 .zip file. https://doi.org/10.4095/329203</p>

<p style="font-size:19px; text-align:left; font-weight:bold">1) Import libraries</p>

In [1]:
from src.nextsomcore.nextsomcore import NxtSomCore
import pickle

In [2]:
import configs.argsSOM

args = configs.argsSOM.Args()

<p style="font-size:19px; text-align:left; font-weight:bold">2) Specify parameters for SOM</p>

Input data can be either a .lrn file or a set of GeoTIFF (.tif) files. Choose one of the two options below.
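A .lrn file is plain text. The sketch below parses one, assuming the Databionics ESOM layout: header lines starting with `%` give the number of rows, the number of columns, the column types (9 = key, 1 = data, 0 = ignore) and the column names, followed by whitespace-separated data rows. Whether the files used in this notebook follow exactly this layout is an assumption, and `read_lrn` is a hypothetical helper, not part of NxtSomCore.

```python
def read_lrn(text):
    """Parse ESOM-style .lrn text into (data_column_names, keys, data_rows).

    Assumed layout: '%' header lines (row count, column count, column types,
    column names), optional '#' comment lines, then whitespace-separated rows.
    Column names must not contain spaces.
    """
    header, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        if line.startswith("%"):
            header.append(line[1:].split())
        else:
            rows.append(line.split())

    n_rows, n_cols = int(header[0][0]), int(header[1][0])
    types = [int(t) for t in header[2]]
    names = header[3]
    if len(types) != n_cols or len(rows) != n_rows:
        raise ValueError("header does not match the data block")

    data_idx = [i for i, t in enumerate(types) if t == 1]  # type 1 = data column
    key_idx = types.index(9) if 9 in types else None        # type 9 = key column
    keys = [int(r[key_idx]) for r in rows] if key_idx is not None else None
    data = [[float(r[i]) for i in data_idx] for r in rows]
    return [names[i] for i in data_idx], keys, data

# Usage in this notebook (path as configured in step 2):
# with open(args.input_file) as f:
#     names, keys, data = read_lrn(f.read())
```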

Create a "data" folder inside the "methods/som/" folder, containing two subfolders: "input" and "output". To run this Jupyter notebook, copy your input data into "methods/som/data/input".

No-data handling is not yet implemented.
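The folder layout described above can also be created from code. A minimal sketch with `pathlib`, assuming the notebook's working directory is the repository root so that "methods/som" resolves (pass a different `som_root` otherwise); `create_som_data_folders` is a hypothetical helper.

```python
from pathlib import Path

def create_som_data_folders(som_root="methods/som"):
    """Create <som_root>/data/input and <som_root>/data/output if they are missing."""
    data_dir = Path(som_root) / "data"
    input_dir, output_dir = data_dir / "input", data_dir / "output"
    for folder in (input_dir, output_dir):
        folder.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    return input_dir, output_dir

# Usage (assumes the repository root is the working directory):
# input_dir, output_dir = create_som_data_folders("methods/som")
```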

In [3]:
# #------------- 
# #- Data Input .tiff files:
# #------------- 
# #- If input data is geotiff: list geotiff files, separated by "," ["name1.tiff","name2.tiff"]
# #input_list_text=["data/input/70Gravity_Bouguer._norm.tif","data/input/83Magnetic_LongWavelength_HGM._norm.tif","data/input/50Geology_Fault_Proximity._norm.tif"]
# input_list_text=["data/input/testdata/Magnetics.tif",
#                 "data/input/testdata/RockContact(bmgg_bvc).tif",
#                 "data/input/testdata/RockContact(gsh_bs).tif",
#                 "data/input/testdata/Unit(bmgg).tif",
#                 "data/input/testdata/Unit(bvc).tif",
#                 "data/input/testdata/Unit(gsb).tif",
#                 "data/input/testdata/Unit(gsh).tif"
#                 ]
# args.input_file= ",".join(input_list_text)
# args.geotiff_input=args.input_file      # geotiff_input ("None" or args.input_file)

#------------- 
#- Or: Data Input .lrn file:
#------------- 
#args.input_file="data/input/SOM_grav_mag.lrn"
args.input_file="/methods/methods/som/data/input/SOM_grav_mag.lrn"

#-------------
#- Data Output
#-------------

#args.output_folder="data/output"         # Folder to save som dictionary and cluster dictionary
args.output_folder="/methods/methods/som/data/output"

args.output_file_somspace= args.output_folder+"/result_som.txt"   # DO NOT CHANGE! Text file that will contain calculated values: som_x som_y b_data1 b_data2 b_dataN umatrix cluster in SOM space.
        

#-------------
#- Parameter
#-------------

args.som_x=10                # X dimension of generated SOM
args.som_y=10                # Y dimension of generated SOM
args.epochs=10               # Number of epochs to run

# Base parameters required for som calculation. 
# Additional optional parameters below:
args.outgeofile= args.output_folder+"/result_geo.txt"             # DO NOT CHANGE!
args.output_file_geospace=args.outgeofile   # Text file that will contain calculated values: {X Y Z} data1 data2 dataN som_x som_y cluster b_data1 b_data2 b_dataN in geospace.

args.kmeans="true"          # Run k-means clustering (true, false)
args.kmeans_init=5           # Number of k-means initializations
args.kmeans_min=2            # Minimum number of k-means clusters
args.kmeans_max=25           # Maximum number of k-means clusters

args.neighborhood='gaussian'     # Shape of the neighborhood function. gaussian or bubble
args.std_coeff=0.5               # Coefficient in the Gaussian neighborhood function
args.maptype='toroid'            # Type of SOM (sheet, toroid)
args.initialcodebook=None        # File path of initial codebook, 2D numpy.array of float32.
args.radius0=0                   # Initial size of the neighborhood
args.radiusN=1                   # Final size of the neighborhood
args.radiuscooling='linear'      # Function that defines the decrease in the neighborhood size as the training proceeds (linear, exponential)
args.scalecooling='linear'       # Function that defines the decrease in the learning scale as the training proceeds (linear, exponential)
args.scale0=0.1                  # Initial learning rate
args.scaleN=0.01                 # Final learning rate
args.initialization='random'     # Type of SOM initialization (random, pca)
args.gridtype='rectangular'      # Type of SOM grid (hexagonal, rectangular)
#args.xmlfile="none"              # SOM inputs as an xml file

args.normalized="false"      # Whether the data has been normalized or not (true, false)
args.minN=0                  # Minimum value for normalization
args.maxN=1                  # Maximum value for normalization
args.label=None              # Whether data contains label column, true or false
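
To see how `std_coeff`, `radius0`/`radiusN` and `scale0`/`scaleN` interact, here is a sketch of a Gaussian neighborhood and of linear and exponential cooling. The parameter names match the Somoclu library, whose documentation gives the Gaussian neighborhood as exp(-d²/(2·(std_coeff·r)²)) and treats `radius0=0` as "half the smaller map dimension". That NxtSomCore passes these settings to Somoclu unchanged, and the exact cooling formulas below, are assumptions; the loop assumes the initial radius resolves to 5 on the 10 × 10 map.

```python
import math

def gaussian_neighborhood(distance, radius, std_coeff=0.5):
    """Weight of a node at grid `distance` from the best-matching unit (radius > 0)."""
    sigma = std_coeff * radius
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))

def linear_cooling(start, end, n_epochs, epoch):
    """Value at 0-based `epoch`, decreasing linearly from start to end."""
    if n_epochs <= 1:
        return start
    return start - (start - end) * epoch / (n_epochs - 1)

def exponential_cooling(start, end, n_epochs, epoch):
    """Value at 0-based `epoch`, decreasing geometrically from start to end (both > 0)."""
    if n_epochs <= 1:
        return start
    return start * (end / start) ** (epoch / (n_epochs - 1))

# Schedule for the settings above: scale 0.1 -> 0.01, radius 5 (assumed) -> 1, 10 epochs
for epoch in range(10):
    scale = linear_cooling(0.1, 0.01, 10, epoch)
    radius = linear_cooling(5, 1, 10, epoch)
    weight = gaussian_neighborhood(1, radius, std_coeff=0.5)
    print(f"epoch {epoch}: scale={scale:.3f} radius={radius:.2f} weight at d=1: {weight:.3f}")
```

With a linear radius schedule, the neighborhood shrinks by the same amount each epoch, so late epochs mostly fine-tune individual nodes.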


In [4]:
print(args.input_file)

/methods/methods/som/data/input/SOM_grav_mag.lrn
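
For choosing `som_x` and `som_y` relative to the number of samples N in this input file, a commonly cited rule of thumb (often attributed to Vesanto's SOM Toolbox) is roughly 5·√N map units. A sketch assuming a square map; `suggest_square_map` is hypothetical, and its result is only a starting point to tune against the clustering results.

```python
import math

def suggest_square_map(n_samples):
    """Side length (som_x = som_y) for roughly 5 * sqrt(n_samples) map units."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    n_units = 5 * math.sqrt(n_samples)
    side = math.ceil(math.sqrt(n_units))
    return side, side

print(suggest_square_map(10_000))  # → (23, 23)
```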


<p style="font-size:19px; text-align:left; font-weight:bold">3) Run SOM</p>

Before running the SOM, move result files from previous runs into the subfolder "old_results".
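It can also help to check the configured paths before a long run; the run below ends with a GDAL error about a file named `None`, so catching missing files early saves time. A minimal sketch, assuming `args.input_file` holds one path or a comma-separated list (as built in the GeoTIFF option of step 2); `check_som_paths` is a hypothetical helper and does not know which other files `run_SOM` may open.

```python
import os

def check_som_paths(input_file, output_folder):
    """Return a list of problems with the configured paths (empty list if none)."""
    problems = []
    # input_file may be a single .lrn path or a comma-separated list of GeoTIFFs
    for path in (p.strip() for p in input_file.split(",")):
        if not os.path.isfile(path):
            problems.append(f"input file not found: {path}")
    if not os.path.isdir(output_folder):
        problems.append(f"output folder not found: {output_folder}")
    return problems

# Usage in this notebook:
# problems = check_som_paths(args.input_file, args.output_folder)
# print(problems or "All configured paths exist.")
```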

In [5]:
import shutil
import os
import glob

file_path = args.output_folder
file_patterns = ["*som.*","*geo.*", "RunStats.txt"]
destination_path = file_path+"/old_results/"

# Create the destination folder if it doesn't exist
if not os.path.exists(destination_path):
    os.makedirs(destination_path)

for file_pattern in file_patterns:
    # Use glob to get all files with the specified pattern
    matching_files = glob.glob(os.path.join(file_path, file_pattern))

    # Move each matching file to the destination folder and overwrite existing files if necessary
    for source_file in matching_files:
        file_name = os.path.basename(source_file)
        destination_file_path = os.path.join(destination_path, file_name)

        # If the file already exists in the destination folder, delete it first
        if os.path.exists(destination_file_path):
            os.remove(destination_file_path)

        # Move the file to the destination folder
        shutil.move(source_file, destination_file_path)

Run the SOM with the parameters specified above and save the results. The NxtSomCore package does the actual computation.
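Besides training the SOM, the run performs the k-means step configured in step 2 (`kmeans`, `kmeans_init`, `kmeans_min`, `kmeans_max`), which is what the "Clustering progress" output reports. The sketch below shows the general idea of clustering SOM node vectors for a range of k and scoring each result; using the Davies-Bouldin index as the score and treating `kmeans_max` as inclusive are assumptions, not NxtSomCore's confirmed behavior. `kmeans_over_k` is hypothetical; with the settings above, the real codebook would have 10 × 10 = 100 rows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def kmeans_over_k(codebook, k_min, k_max, n_init, seed=0):
    """Run k-means on SOM node vectors for each k in [k_min, k_max] (inclusive).

    Returns (scores, labels): the Davies-Bouldin score per k (lower is better)
    and the node cluster labels per k.
    """
    scores, labels = {}, {}
    for k in range(k_min, k_max + 1):
        # Passing n_init explicitly also avoids scikit-learn's FutureWarning about its default
        model = KMeans(n_clusters=k, n_init=n_init, random_state=seed)
        node_labels = model.fit_predict(codebook)
        scores[k] = davies_bouldin_score(codebook, node_labels)
        labels[k] = node_labels
    return scores, labels

# Usage with a synthetic "codebook": three well-separated groups of node vectors
rng = np.random.default_rng(0)
codebook = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
                      for c in [(0, 0), (10, 10), (20, 0)]])
scores, labels = kmeans_over_k(codebook, k_min=2, k_max=5, n_init=5)
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```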

In [6]:
import src.do_nextsomcore_save_results as dnsr

dnsr.run_SOM(args)

/methods/methods/som/data/output
Clustering progress:
0.00%


  super()._check_params_vs_input(X, default_n_init=10)

10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
80.00%
90.00%
100% Clustering completed.
ERROR 4: None: No such file or directory


AttributeError: 'NoneType' object has no attribute 'GetGeoTransform'
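
The repeated `super()._check_params_vs_input(X, default_n_init=10)` lines in the output above appear to be the tail of scikit-learn's KMeans FutureWarning (versions 1.2 and 1.3) announcing that the default `n_init` will change from 10 to 'auto'. They do not affect the result; the cleaner fix is to pass `n_init` explicitly where KMeans is created. As a notebook-side workaround, here is a sketch that hides only this warning for a single call. `call_without_n_init_warning` is hypothetical, and warnings raised inside separate worker processes would not be filtered.

```python
import warnings

def call_without_n_init_warning(func, *args, **kwargs):
    """Call func(*args, **kwargs) with scikit-learn's n_init FutureWarning suppressed.

    The filter is active only inside this call; all other warnings still appear.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=".*n_init", category=FutureWarning)
        return func(*args, **kwargs)

# Usage in this notebook, instead of calling dnsr.run_SOM(args) directly:
# call_without_n_init_warning(dnsr.run_SOM, args)
```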

<p style="font-size:19px; text-align:left; font-weight:bold">4) Plot results</p>

Specify the plotting parameters and create the figures. The Python script "plot_som_results.py" writes .png files of the results in SOM space and geospace, as well as boxplots.

In [None]:
import configs.argsPlot
import src.plot_som_results as plot

argsP = configs.argsPlot.Args()

argsP.outsomfile= args.output_file_somspace   # som calculation somspace output text file
argsP.som_x= args.som_x         # som x dimension
argsP.som_y= args.som_y         # som y dimension
argsP.input_file= args.input_file   # Input file(*.lrn)
argsP.dir= args.output_folder            # Input file (*.lrn) or directory where som.dictionary was saved (/output/som.dictionary)
argsP.grid_type= args.gridtype  # grid type (rectangular or hexagonal); taken from the SOM settings so both match
argsP.redraw='true'       # whether to draw all plots, or only those required for clustering (true: draw all. false:draw only for clustering).
argsP.outgeofile=args.outgeofile     # som geospace results txt file
argsP.dataType='grid'       # Data type (scatter or grid)
argsP.noDataValue='-9999'    # noData value

plot.run_plotting_script(argsP)

Move the figures into a subfolder, because "plot_som_results.py" does not overwrite existing files. The destination folder is created if it does not exist. The file names are collected in lists that the next step uses to display all output figures.

In [None]:
import shutil
import os
import glob

file_path = args.output_folder
file_patterns = ["geoplot_*.png", "somplot_*.png", "boxplot_*.png"]
destination_path = file_path+"/plots/"


# Lists to store matching files with their corresponding destination paths
all_figs = []
all_figs_lable = []

# Create the destination folder if it doesn't exist
if not os.path.exists(destination_path):
    os.makedirs(destination_path)

for file_pattern in file_patterns:
    # Use glob to get all files with the specified pattern
    matching_files = glob.glob(os.path.join(file_path, file_pattern))

    # Add matching files and their corresponding destination paths to the lists
    all_figs.extend([os.path.join(destination_path, os.path.basename(file)) for file in matching_files])
    all_figs_lable.extend([os.path.basename(file) for file in matching_files])

    # Move each matching file to the destination folder and overwrite existing files if necessary
    for source_file in matching_files:
        file_name = os.path.basename(source_file)
        destination_file_path = os.path.join(destination_path, file_name)

        # If the file already exists in the destination folder, delete it first
        if os.path.exists(destination_file_path):
            os.remove(destination_file_path)

        # Move the file to the destination folder
        shutil.move(source_file, destination_file_path)

print(all_figs)

Show all figures that were produced by "plot_som_results.py" and moved into a subfolder.

In [None]:
import ipyplot

images = all_figs
labels = all_figs_lable
tabs = [image.split('_')[-2] for image in labels]

print("List of figures:")
print(labels)
#print(tabs)
#ipyplot.plot_images(images, max_images=50, img_width=250)
ipyplot.plot_class_representations(images,  labels, img_width=200, show_url=False)
ipyplot.plot_class_tabs(images, tabs, max_imgs_per_tab=5, img_width=250)