# Simplicial Complex Analysis

<span style='color:Red'>TODO: Generate a system to analyze the largest shape per stratum per year.</span> <br>
<span style='color:Red'> Currently using the largest number of nodes in each shape. </span> <br> 
Written by Frederick Miller, Casey McKean, and Wako Bungula. <br> 
The Kepler Mapper output object is not easy to navigate. To resolve this, we extract the individual shapes (connected components) of the simplicial complex so that they are easier to navigate and understand, and so that the data inside them can be inspected. <br>
We generate all the shapes in the simplicial complex, condense 1-simplices where possible, and obtain summary statistics on the shapes and on the nodes within them.
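
To make the terminology concrete, here is a minimal sketch of the dictionary layout the functions in this notebook rely on. The node names and row indices are made up, and the layout is an assumption based on the `nodes` and `simplices` keys read later on: each node maps to the data rows it contains, and each simplex is a list of node names.

```python
# Hypothetical toy complex; node names and row indices are made up for illustration.
toy_complex = {
    "nodes": {
        "cube0_cluster0": [0, 1, 2],
        "cube1_cluster0": [2, 3],
        "cube2_cluster0": [7, 8],
    },
    "simplices": [
        ["cube0_cluster0"],                    # 0-simplex: a single node
        ["cube1_cluster0"],
        ["cube2_cluster0"],
        ["cube0_cluster0", "cube1_cluster0"],  # 1-simplex: an edge between two nodes
    ],
}

# Edges are the simplices with exactly two nodes.
toy_edges = [s for s in toy_complex["simplices"] if len(s) == 2]

# Two nodes joined by an edge share at least one data row.
shared_rows = set(toy_complex["nodes"]["cube0_cluster0"]) & set(toy_complex["nodes"]["cube1_cluster0"])

print(toy_edges)
print(shared_rows)
```

In this toy complex there are two shapes: the pair of nodes joined by the edge, and the isolated node `cube2_cluster0`.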

In [1]:
import numpy as np
import pandas as pd 
import queue
import animation
import json
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', None)
import pickle
print("Imports Done")

Imports Done


# File paths, `.p`, and `.json` import

In the `kmapper_demo` file, I added one extra code block that writes the resulting simplicial complexes to a `.json` file, a format for keeping dictionaries in long-term storage. It also stores the dictionary of dataframes in a `.p` (pickle) file, which serves the same purpose. Only the file paths in the code below need to be changed; it then reads the simplicial complexes generated by Kepler Mapper. <br>
Here, we also import the actual data set, with data interpolated for the specific pool. <br>
Lastly, a list of the continuous variables (the interpolated versions) is created. It holds 10 of the 11 variables, since conductivity is commented out.
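
The code block added to `kmapper_demo` is not shown here, so the following is only a sketch of what such a save and reload step might look like. The key name, file names, and contents are hypothetical, and a dictionary of column lists stands in for a dataframe. JSON can hold the plain dictionaries of lists, while pickle is used for objects such as dataframes that JSON cannot serialize.

```python
import json
import os
import pickle
import tempfile

# Hypothetical stand-ins for the real outputs; keys and contents are made up.
toy_complexes = {
    "Stratum 1 SUMMER 93-00": {
        "nodes": {"cube0_cluster0": [0, 1]},
        "simplices": [["cube0_cluster0"]],
    }
}
toy_dfs = {"Stratum 1 SUMMER 93-00": {"PREDICTED_TEMP": [25.8, 27.9]}}  # stand-in for a dataframe

out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, "toy_complexes.json")
pickle_path = os.path.join(out_dir, "toy_dfs.p")

# Write both files; the context managers close them automatically.
with open(json_path, "w") as f:
    json.dump(toy_complexes, f)
with open(pickle_path, "wb") as f:
    pickle.dump(toy_dfs, f)

# Read them back the same way the notebook does.
with open(json_path, "r") as f:
    loaded_complexes = json.load(f)
with open(pickle_path, "rb") as f:
    loaded_dfs = pickle.load(f)
```

One thing to keep in mind is that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.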

In [2]:
jsonFilePath = r"C:\Users\forre\Desktop\REU\TDA\Data\TDAOutputs\TDA1PCA10Cubes30perc_complices.json"
# Read the simplicial complexes; the context manager closes the file automatically
with open(jsonFilePath, "r") as jsonFile:
    jsonData = json.load(jsonFile)

dataFilePath = r"C:\Users\forre\Desktop\REU\TDA\Data\TDAOutputs\TDA1PCA10Cubes30perc_dfs.p"
# Read the dictionary of dataframes
with open(dataFilePath, "rb") as dataFile:
    df_dict = pickle.load(dataFile)

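# NOTE: this first list is overwritten by the reassignment below; only the PREDICTED_* names take effect.
variables = ["PredictedWDP", "PredictedSECCHI", "PredictedTEMP", "PredictedDO", 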
           "PredictedTURB", "PredictedVEL", "PredictedTP", #"PREDICTED_COND",
           "PredictedTN", "PredictedSS", "PredictedCHLcal"]

variables = ["PREDICTED_WDP", "PREDICTED_SECCHI", "PREDICTED_TEMP", "PREDICTED_DO", 
           "PREDICTED_TURB", "PREDICTED_VEL", "PREDICTED_TP", #"PREDICTED_COND",
           "PREDICTED_TN", "PREDICTED_SS", "PREDICTED_CHLcal"]



print("Json file imported")

print(len(jsonData.keys()))

Json file imported
25


# Functions
See each function's docstring for what it does and how it works.

In [22]:
def getSubdf(scomplex, shape, df):
    """
    Returns the part of the data frame from the particular shape in the simplicial complex.
    params:
    scomplex: the entire simplicial complex
    shape: the particular shape being inspected (within the simplicial complex)
    df: the entire data frame
    
    Description:
    1. Get all the nodes from the particular simplicial complex. 
    2. Generate the indices we care about from the particular shape. To do this, we read each node and append its 
    indices to a list. Then, we convert the list to a set and then back to a list to eliminate duplicates.
    3. Return the dataframe with only those indices.
    """
    nodes = scomplex.get('nodes')
    indices = []
    npShape = np.array(shape).flatten()
    for node in npShape:
        indices.append(nodes.get(node))
    indices = list(set([item for sublist in indices for item in sublist]))
    subdf = df.loc[indices]
    return subdf

def shapeDataSummary(scomplex, shape, df, variables, verbose = False):
    """
    Generates summary statistics of the given variables for a given shape in the simplicial complex.
    params:
    scomplex: the entire simplicial complex
    shape: the particular shape being inspected (within the simplicial complex) at this function call.
    df: the entire dataframe
    variables: the variables of interest
    verbose: Determines if the function will print out extra information. False by default
    
    Description:
    1. Create an empty result dataframe to store the summary statistics.
    2. Get the sub dataframe (see getSubdf) for the particular shape
    3. For each variable we are analyzing, generate summary statistics from the sub dataframe and place them
    inside the result dataframe.
    4. Return the result dataframe
    
    NOTE: this only creates summaries for one particular shape. To summarize every shape, call this function 
    once per shape from outside of the function.
    
    """
    result = pd.DataFrame()
    if verbose:
        print("Obtaining sub dataframe for: ", shape)
        print("The number of nodes in this shape is: ", len(shape))
    subdf = getSubdf(scomplex, shape, df)
    if verbose:
        print("The number of datapoints in this shape is: ", subdf.shape[0])
    for var in variables:
        result[var] = subdf[var].describe()
    return result
    
    

def adjacent(v, scomplex):
    """
    Determines the nodes adjacent to a given vertex.
    
    params:
    v: vertex
    scomplex: the entire simplicial complex
    
    Description:
    Collects every 1-simplex (edge) that contains v and returns the other endpoint of each.
    """
    
    simplices = scomplex.get('simplices')
    edges = [item for item in simplices if len(item) == 2]
    result = []
    for edge in edges:
        if v in edge:
            for item in edge:
                if item != v:
                    result.append(item)
    return result

def bfs(node, scomplex):
    """
    Conducts a breadth first search to obtain the entire shape from a given node
    params:
    node: the start node
    scomplex: the entire simplicial complex
    
    Description:
    Performs a breadth-first search to obtain the entire shape for a given start node.
    """
    Q = queue.Queue()
    result = []
    result.append(node)
    Q.put(node)
    while not Q.empty():
        v = Q.get()
        adjacentEdges = adjacent(v, scomplex)
        for edge in adjacentEdges:
            if edge not in result:
                result.append(edge)
                Q.put(edge)
    return result


        
    
def getShapes(scomplex):
    """
    Gets all of the shapes from a given simplicial complex.
    
    params:
    scomplex: the entire simplicial complex
    
    Description:
    1. Obtain all the nodes for the entire complex
    2. For each node, perform a breadth-first search to obtain everything in that particular shape. 
    If this entire shape has not already been discovered, add it to the set of results. 
    The result item is a set as the order of the shapes does not matter. The resulting shape is a frozenset
    which means items cannot be added or removed once created, and is needed to allow the set object to have other sets within it.
    3. Convert each shape to a list and the result to a list for easier navigation outside of the function.
    4. Return the result
    
    """
    
    nodes = list(scomplex.get('nodes').keys())
    result = set()
    for node in nodes: # currently does more computations than necessary due to going through every node without considering it is already in a shape
        bfsResult = frozenset(bfs(node, scomplex))
        result.add(bfsResult)
    result = [list(x) for x in result]
    # Sort the list depending on what is decided: nodes or indices. Currently doing it by number of nodes
    result.sort(key = len, reverse = True)
    
    
    
    return result

def nodeDataSummary(node, scomplex, variables,df):
    """
    Returns a data summary of a particular node
    params:
    node: node in question
    scomplex: The entire simplicial complex
    variables: The variables to obtain summaries
    df: the entire dataframe 
    
    description:
    1. Creates a result dataframe
    2. Get all the indices from the node from the simplicial complex
    3. Generate summaries for each variable
    4. Return the result
    """
    result = pd.DataFrame()
    indices = scomplex.get('nodes').get(node)
    subdf = df.loc[indices]
    for var in variables:
        result[var] = subdf[var].describe()
    return result
    
    
def condenseShape(shape, scomplex):
    """
    Condenses a two-node shape to a single node when one node's indices are contained in the other's.
    
    params:
    shape: a shape of exactly two nodes
    scomplex: the entire simplicial complex
    
    description:
    gets the two nodes a and b
    gets the indices for a and b (what is inside the nodes)
    if a is a subset of b, return b
    elif b is a subset of a, return a 
    else return shape 
    
    """
    nodes = scomplex.get('nodes')
    a = shape[0]
    b = shape[1]
    aIndices = set(nodes.get(a))
    bIndices = set(nodes.get(b))
    
    if aIndices.issubset(bIndices):
        return b
    elif bIndices.issubset(aIndices):
        return a
    else:
        return shape

def clean_getShapes(scomplex):
    """
    Condenses 1-simplices down to 0-simplices when one node 
    is a subset of the other 
    
    params:
    scomplex: the entire simplicial complex
    
    Description:
    1. Get all the shapes from the original getShapes function
    2. For shapes of length 2, if one node is a subset of the other, keep only the larger of the two.
        Otherwise, do nothing
    3. Return the clean shapes list 
    
    """
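    shapes = getShapes(scomplex)
    cleanShapes = []
    for shape in shapes:
        if len(shape) == 2:
            condensed = condenseShape(shape, scomplex)
            # condenseShape returns a single node name when one node contains the other,
            # and the original two-node list otherwise; keep every shape a flat list of nodes
            cleanShapes.append(condensed if isinstance(condensed, list) else [condensed])
        else:
            cleanShapes.append(shape)
    return cleanShapes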


def getBoxplots(subdf, shape, key,filePath):
    """
    Generates box plots for 10 of the 11 continuous variables
    NOTE: CONDUCTIVITY IS NOT INCLUDED
    
    params:
    subdf: the sub dataframe of the particular shape
    shape: the shape in question
    key: what strata year season combo we are looking at 
    filePath: the output file path for all the box plots 
    
    description:
    clears the current plot 
    generates the sub dataframes for the respective variables.
    the reason they are grouped is based upon the numerical outputs for making the boxplots readable
    create a box plot, and then save it based upon the file path
    clear the plot
    repeat for the second set of variables
    """
    plt.clf()
    varDf1 = subdf[["PREDICTED_SS","PREDICTED_TURB","PREDICTED_CHLcal"]]
    varDf2 = subdf[["PREDICTED_TP","PREDICTED_TN","PREDICTED_TEMP","PREDICTED_DO","PREDICTED_VEL","PREDICTED_WDP",
                  "PREDICTED_SECCHI"]]
    plot1 = varDf1.boxplot()
    plt.savefig(filePath + "\\" + key +"_" + str(shape)  + "_SS_TURB_CHLcal"  + ".png")
    plt.clf()
    plot2 = varDf2.boxplot(rot = 45)
    plt.savefig(filePath + "\\" + key +"_" + str(shape)  + "_" + "TP_TN_TEMP_etc" + ".png")
    return plot1, plot2

def determineOverlap(scomplex, shapes, verbose = True):
    """
    Determines the overlap between adjacent nodes within a shape. 
    
    For each node in the shape, find its neighbors and compute the intersection of their indices. Results are
    stored in a set, so exact duplicates are dropped.
    
    params:
    scomplex: the entire simplicial complex in question
    shapes: all the shapes; only the first one (shapes[0]) is inspected
    verbose: print progress as the code works

    
    """
    
    shape = shapes[0] # chosen arbitrarily 
    
    result = set()
    if verbose:
        print("Shape: ", shape)
    nodes = scomplex.get('nodes')
    for node in shape:
        # currently, this displays a lot of repeats. 
        A = set(nodes.get(node))
        if verbose:
            print("Node: " , node  , " | Indices: ",A)
        B = adjacent(node, scomplex)
        if verbose:
            print("Adjacent nodes: ",B)
        for b in B:
            bSet = set(nodes.get(b))
            name = str(node) + " -> " + str(b) +": "
            intersection = set(A.intersection(bSet))
            intersection.add(name)
            intersection = frozenset(intersection)
            result.add(intersection)
            if verbose:
                print("Node: ", b, " | Indices: ", bSet)
                print("Overlap is: ", A.intersection(bSet))
    return result
print("Functions loaded")

Functions loaded
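
The comment in `getShapes` notes that a search is started from every node, even one that already belongs to a discovered shape. A possible alternative, sketched below under the assumption that every edge endpoint also appears in `nodes`, builds the adjacency map once and visits each node a single time. The names `connected_shapes`, `n_datapoints`, and the toy complex are hypothetical and not part of the notebook. The sketch also shows the two candidate sort keys mentioned in the TODO: number of nodes versus number of distinct data points.

```python
from collections import deque

def connected_shapes(scomplex):
    """Return the connected components (shapes) of the complex, visiting each node once."""
    # Build the adjacency map once instead of rescanning all simplices per node.
    neighbours = {node: set() for node in scomplex["nodes"]}
    for simplex in scomplex["simplices"]:
        if len(simplex) == 2:
            a, b = simplex
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    shapes = []
    for start in neighbours:
        if start in seen:
            continue  # already part of an earlier shape
        seen.add(start)
        shape, pending = [], deque([start])
        while pending:
            v = pending.popleft()
            shape.append(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    pending.append(w)
        shapes.append(shape)
    return shapes

def n_datapoints(scomplex, shape):
    """Number of distinct data rows covered by the nodes of a shape."""
    return len({i for node in shape for i in scomplex["nodes"][node]})

# Hypothetical toy complex: a chain a-b-c with few rows, and a pair d-e with many rows.
toy = {
    "nodes": {
        "a": [0, 1], "b": [1, 2], "c": [2, 3],
        "d": list(range(10, 20)), "e": list(range(15, 25)),
    },
    "simplices": [["a"], ["b"], ["c"], ["d"], ["e"], ["a", "b"], ["b", "c"], ["d", "e"]],
}

shapes = connected_shapes(toy)
by_nodes = sorted(shapes, key=len, reverse=True)
by_rows = sorted(shapes, key=lambda s: n_datapoints(toy, s), reverse=True)
print(by_nodes[0], by_rows[0])
```

In the toy complex the two orderings disagree: the chain has the most nodes, while the pair covers the most data points. Which ordering is appropriate depends on what "largest shape" should mean for the analysis.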


# Generating Summary Statistics on the entire simplicial complex
For each `mapper` output from `kepler-mapper`, we can generate summary statistics for each of the continuous variables. We first obtain a list of the keys from the `.json` file, then iterate through each complex, generating its shapes and summarizing the data in each shape. Only shapes with more than 5 data points and more than 2 nodes are displayed.
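
As a small illustration of the summary step, the sketch below runs `describe()` on a toy dataframe. The dataframe, its values, and the names `toy_df`, `toy_indices`, and `toy_variables` are made up. For numeric columns, selecting all variables at once and calling `describe()` should give the same table as filling the result one column at a time.

```python
import pandas as pd

# Hypothetical toy data; the real dataframes come from df_dict.
toy_df = pd.DataFrame(
    {
        "PREDICTED_TEMP": [25.8, 27.9, 29.2, 32.9],
        "PREDICTED_DO": [5.4, 7.5, 8.0, 13.2],
    },
    index=[0, 1, 2, 3],
)
toy_variables = ["PREDICTED_TEMP", "PREDICTED_DO"]
toy_indices = [0, 1, 3]  # rows belonging to one hypothetical shape

# Restrict to the shape's rows and the variables of interest, then summarize.
summary = toy_df.loc[toy_indices, toy_variables].describe()

# The 'count' row gives the number of data points; .iloc[0] reads the first variable's count.
n_points = summary.loc["count"].iloc[0]
print(summary)
print(n_points)
```

The `count` row is what the display filter reads to decide whether a shape holds enough data points to be worth showing.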

In [4]:
allComplices = list(jsonData.keys())
for key in allComplices: # iterates over every stratum and time period
    print("Current Simplical Complex: ", key)
    scomplex = jsonData.get(key)
    shapes = clean_getShapes(scomplex)
    for shape in shapes:
        summaries = shapeDataSummary(scomplex, shape, df_dict.get(key), variables, verbose = False)
        # Show only shapes with at least 6 datapoints and 3 nodes
        if summaries.loc['count'].iloc[0] > 5 and len(shape) > 2:
            print("The shape is: ",shape)
            print("The number of nodes in the shape is: ", len(shape))
            display(summaries)

Current Simplical Complex:  ['Stratum 1 SUMMER 93-00: ']
The shape is:  ['cube6_cluster1', 'cube0_cluster0', 'cube2_cluster0', 'cube5_cluster0', 'cube8_cluster1', 'cube1_cluster0', 'cube4_cluster0', 'cube7_cluster1', 'cube5_cluster2', 'cube3_cluster0']
The number of nodes in the shape is:  10


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,108.0,108.0,108.0,108.0,108.0,108.0,108.0,108.0,108.0,108.0
mean,5.807438,39.445314,28.055503,7.612425,31.490741,0.311708,0.173857,3.620438,46.316052,25.681036
std,2.689415,10.622191,1.600216,1.281878,13.943474,0.248585,0.064355,0.637487,21.792598,10.4875
min,0.34,14.0,25.8,5.4,11.0,0.0,0.03348,2.432,12.7,9.75195
25%,4.0,32.5,26.675,6.875,22.0,0.019134,0.14775,3.28275,31.7,18.31366
50%,5.35,40.0,27.9,7.5,27.5,0.332943,0.172609,3.664633,41.85,23.86642
75%,7.6,46.0,29.2,8.025,36.5,0.493413,0.20625,4.039438,59.9,29.58458
max,12.2,68.0,32.9,13.2,88.0,0.971381,0.380512,5.372994,108.3,60.9933


Current Simplical Complex:  ['Stratum 2 SUMMER 93-00: ']
The shape is:  ['cube6_cluster1', 'cube0_cluster0', 'cube2_cluster0', 'cube8_cluster0', 'cube5_cluster0', 'cube8_cluster1', 'cube7_cluster0', 'cube1_cluster0', 'cube4_cluster0', 'cube9_cluster1', 'cube4_cluster2', 'cube3_cluster0', 'cube2_cluster3']
The number of nodes in the shape is:  13


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0
mean,3.770625,35.1,27.993179,7.773181,37.765,0.544608,0.185213,3.636574,57.774063,26.393068
std,1.969209,10.464445,1.389663,1.491258,18.169783,0.264792,0.067865,0.656414,31.614296,12.546677
min,0.55,12.0,25.581729,4.4,11.0,0.0,0.041,2.255,14.3,5.91168
25%,2.1,26.75,26.8,6.8,25.0,0.3775,0.153345,3.259775,33.9,15.399952
50%,3.5,35.0,27.8,7.6,34.0,0.52,0.17655,3.6125,48.923178,24.1354
75%,5.0,43.0,29.2,8.1,46.0,0.7,0.211986,4.12825,69.95,34.470165
max,11.3,63.0,31.6,14.3,99.0,1.7,0.723927,5.849319,151.4,65.76935


Current Simplical Complex:  ['Stratum 3 SUMMER 93-00: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster15', 'cube1_cluster0', 'cube2_cluster10', 'cube2_cluster18', 'cube2_cluster1']
The number of nodes in the shape is:  6


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,105.0,105.0,105.0,105.0,105.0,105.0,105.0,105.0,105.0,105.0
mean,0.950344,24.215653,29.115963,9.32988,51.472049,0.03576,0.279727,2.567889,54.283605,72.462298
std,0.74418,8.567753,2.752198,4.664146,22.087829,0.131865,0.183523,1.126765,21.921755,36.496744
min,0.32,11.0,24.1,3.3,10.0,0.0,0.036,0.56,9.9,9.29984
25%,0.6,19.0,27.2,6.5,34.0,0.0,0.171,1.793,38.7,42.32622
50%,0.76,22.0,28.7,8.2,52.0,0.0,0.271561,2.213,54.8,68.04318
75%,0.97,26.0,30.5,10.8,68.0,0.0,0.364,3.238,68.6,105.35466
max,4.9,51.0,39.5,25.0,101.0,0.948848,1.613,8.370005,112.7,148.50546


Current Simplical Complex:  ['Stratum 4 SUMMER 93-00: ']
The shape is:  ['cube3_cluster5', 'cube3_cluster4', 'cube1_cluster0', 'cube2_cluster0']
The number of nodes in the shape is:  4


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0
mean,0.883333,18.393939,30.2,9.487879,73.242424,0.000303,0.693154,2.056373,73.342424,87.074066
std,0.519854,3.90464,2.516073,5.039268,16.355485,0.001741,0.383099,0.663917,17.35778,14.711485
min,0.3,11.0,26.4,2.2,31.0,0.0,0.277,1.29,36.6,59.44047
25%,0.59,15.0,28.3,6.8,65.0,0.0,0.417589,1.531,61.2,77.08169
50%,0.75,18.0,29.4,7.9,78.0,0.0,0.67,1.969333,72.4,84.95826
75%,0.85,20.0,32.1,11.6,85.0,0.0,0.715929,2.302,88.9,98.08012
max,2.3,27.0,34.5,25.0,101.0,0.01,1.874142,3.965,100.7,118.09814


The shape is:  ['cube6_cluster1', 'cube5_cluster0', 'cube4_cluster6']
The number of nodes in the shape is:  3


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0
mean,0.47,11.857143,30.042857,11.771429,143.142857,0.008571,0.644323,1.924376,143.759134,161.737229
std,0.118603,1.9518,1.263404,2.976415,10.945754,0.006901,0.230879,0.794909,7.464677,8.991427
min,0.35,10.0,28.1,5.4,128.0,0.0,0.329618,0.872723,132.6,148.98595
25%,0.38,10.0,29.6,11.8,136.0,0.005,0.459816,1.415669,139.356969,155.664325
50%,0.4,12.0,29.6,12.3,144.0,0.01,0.767838,2.026682,142.8,164.06756
75%,0.575,13.0,30.65,13.45,149.0,0.01,0.793333,2.359944,149.5,167.28065
max,0.63,15.0,32.1,14.2,160.0,0.02,0.906506,3.02,153.2,173.21714


The shape is:  ['cube4_cluster3', 'cube3_cluster6', 'cube5_cluster6']
The number of nodes in the shape is:  3


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0,6.0
mean,0.561667,12.666667,29.666667,16.183333,103.333333,0.015,0.731152,2.175667,112.95,155.186351
std,0.274257,1.861899,1.102119,7.301073,6.683313,0.022583,0.300707,0.96343,3.850065,9.346303
min,0.33,9.0,28.3,6.8,97.0,0.0,0.309983,0.817835,107.6,141.05285
25%,0.34,13.0,28.85,13.125,98.25,0.0025,0.514997,1.450323,110.375,150.099012
50%,0.47,13.0,29.65,13.65,101.5,0.01,0.797273,2.477946,113.4,156.91905
75%,0.735,13.75,30.525,22.2,107.0,0.01,0.934675,2.919357,115.3,159.853487
max,0.98,14.0,31.0,25.0,114.0,0.06,1.081,3.106,118.0,167.569858


Current Simplical Complex:  ['Stratum 5 SUMMER 93-00: ']
The shape is:  ['cube7_cluster3', 'cube2_cluster0', 'cube5_cluster0', 'cube6_cluster0', 'cube1_cluster0', 'cube4_cluster0', 'cube7_cluster1', 'cube3_cluster0']
The number of nodes in the shape is:  8


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,76.0,76.0,76.0,76.0,76.0,76.0,76.0,76.0,76.0,76.0
mean,2.193158,33.052084,28.497368,8.714474,33.135869,0.011711,0.157153,3.344466,36.740737,34.98507
std,1.166738,8.166568,2.114078,3.034456,10.945652,0.045325,0.076908,0.556751,13.311718,15.22521
min,0.6,19.0,25.4,4.0,14.0,0.0,0.028,2.097,12.4,9.26625
25%,1.2475,27.75,26.7,7.175,26.5,0.0,0.1375,3.060156,26.875,25.71833
50%,2.0,32.0,27.65,8.1,32.0,0.0,0.15425,3.271625,35.25,31.28754
75%,3.0,37.0,30.5,9.95,39.0,0.0,0.17625,3.615125,44.125,39.980025
max,7.0,53.0,33.5,25.0,68.0,0.34,0.64,4.691,71.049995,76.63824


Current Simplical Complex:  ['Stratum 1 SUMMER 98-04: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube5_cluster0', 'cube1_cluster0', 'cube4_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  6


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
mean,6.248455,44.239706,28.06,8.08,24.78,0.275793,0.200605,3.719023,37.423,32.525356
std,2.951668,9.850165,1.582927,1.232145,9.116064,0.225755,0.068732,0.82789,15.503879,12.390508
min,1.0,21.0,25.3,5.4,7.0,0.0,0.050028,1.656369,6.5,12.26124
25%,4.175,37.75,26.8,7.475,18.0,0.07859,0.1595,3.319,25.425,23.149375
50%,6.0,44.0,27.75,7.9,24.0,0.242579,0.17745,3.698082,34.5,31.0293
75%,8.25,50.0,29.3,8.725,31.0,0.451573,0.22225,4.352265,46.85,38.720465
max,13.0,75.0,31.5,13.2,45.0,0.971381,0.48027,5.758,80.2,61.63623


The shape is:  ['cube6_cluster0', 'cube8_cluster0', 'cube7_cluster1']
The number of nodes in the shape is:  3


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,5.415783,21.312333,28.48,6.6,63.7,0.40131,0.228682,3.979537,94.02,12.933602
std,3.490749,3.193622,1.615756,0.661648,11.585911,0.229256,0.025222,0.55299,9.933199,2.059359
min,0.34,14.0,26.3,5.4,51.0,0.0,0.195702,3.322286,77.8,9.75195
25%,3.529457,20.342501,27.275,6.325,56.25,0.314449,0.204922,3.474136,86.6,12.12926
50%,4.0,21.0,28.25,6.55,61.0,0.462964,0.232558,4.001772,96.0,12.34361
75%,7.975,23.0,29.4,6.7,70.75,0.574217,0.24175,4.3275,100.7,14.046267
max,11.0,26.0,31.0,7.9,88.0,0.641584,0.273,4.83,108.3,17.05024


Current Simplical Complex:  ['Stratum 2 SUMMER 98-04: ']
The shape is:  ['cube2_cluster0', 'cube0_cluster1', 'cube5_cluster0', 'cube1_cluster0', 'cube4_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  6


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,244.0,244.0,244.0,244.0,244.0,244.0,244.0,244.0,244.0,244.0
mean,3.636066,38.286885,28.059836,8.22418,33.490164,0.491611,0.189916,3.916744,52.061475,33.714166
std,2.265481,13.177159,1.449828,1.577422,19.713413,0.272844,0.062361,0.733997,32.591371,15.946011
min,0.3,12.0,25.4,4.4,6.0,0.0,0.002,1.874175,11.5,5.95598
25%,2.0,26.0,26.8,7.5,19.0,0.32,0.15294,3.392692,29.85,22.10668
50%,3.2,38.5,28.2,8.2,28.0,0.49,0.174165,3.884106,40.75,33.50528
75%,4.825,45.0,29.2,8.6,41.0,0.62,0.215,4.479119,62.225,41.507005
max,18.4,80.0,31.6,18.2,99.0,1.7,0.723927,5.426,151.4,68.8488


Current Simplical Complex:  ['Stratum 3 SUMMER 98-04: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube0_cluster2', 'cube0_cluster1', 'cube1_cluster6', 'cube1_cluster0', 'cube1_cluster7', 'cube2_cluster1']
The number of nodes in the shape is:  8


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0
mean,0.909731,22.830505,28.786318,8.310614,59.274326,0.037945,0.349227,2.647885,59.982014,74.590876
std,0.749517,8.982122,2.155192,3.436111,28.765338,0.138655,0.25884,1.024042,28.100218,42.152079
min,0.3,11.0,23.4,2.8,10.0,0.0,0.002,1.055,9.9,9.29984
25%,0.5225,17.0,27.225,6.05,34.141278,0.0,0.239116,1.992872,38.0,42.33616
50%,0.745,20.0,28.7,7.8,58.5,0.0,0.319777,2.334811,58.65,63.23046
75%,0.95,26.0,29.9,9.4,84.75,0.0,0.386613,3.225,79.1,110.273562
max,4.9,51.0,35.5,25.0,116.0,0.948848,2.527667,6.398813,130.2,167.21153


Current Simplical Complex:  ['Stratum 4 SUMMER 98-04: ']
The shape is:  ['cube3_cluster0', 'cube1_cluster0', 'cube0_cluster0', 'cube2_cluster1']
The number of nodes in the shape is:  4


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0
mean,0.752,19.65,30.08,9.565,65.725,0.00025,0.716168,1.695329,63.8125,89.584307
std,0.464698,3.613011,2.768523,4.477554,18.989859,0.001581,0.337686,0.688453,18.851201,14.029394
min,0.3,11.0,26.3,4.6,31.0,0.0,0.026,0.024,34.5,65.05684
25%,0.5075,18.0,27.075,6.875,47.25,0.0,0.408442,1.402048,46.375,78.631705
50%,0.615,20.0,30.2,8.2,71.5,0.0,0.7105,1.803091,63.7,87.34632
75%,0.77,21.25,32.25,9.725,82.0,0.0,1.003897,2.108997,77.475,99.253425
max,2.3,27.0,34.4,25.0,94.0,0.01,1.307046,2.885041,99.3,119.12773


Current Simplical Complex:  ['Stratum 5 SUMMER 98-04: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube0_cluster1', 'cube5_cluster0', 'cube1_cluster0', 'cube4_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  7


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0
mean,2.115584,32.23991,28.185714,8.576623,31.467532,0.021688,0.162828,3.376298,35.283117,36.638372
std,1.203185,5.813152,2.3485,2.222072,9.970462,0.052551,0.027163,0.620355,13.013054,17.355426
min,0.36,19.0,24.7,4.0,15.0,0.0,0.036,2.221,16.7,8.13295
25%,1.0,28.0,26.1,7.3,25.0,0.0,0.150461,2.972483,26.1,25.57466
50%,2.1,33.0,27.5,7.8,29.0,0.0,0.1665,3.3145,31.1,33.11044
75%,3.0,35.0,30.6,9.5,36.0,0.01,0.179,3.708,41.5,51.68742
max,7.0,48.0,31.6,14.9,68.0,0.34,0.22,4.965,78.0,74.02625


Current Simplical Complex:  ['Stratum 1 SUMMER 01-13: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube3_cluster1', 'cube1_cluster0', 'cube4_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  6


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0,225.0
mean,6.236978,41.737778,28.984889,7.906667,31.248889,0.275929,0.235926,3.276117,45.081778,34.952097
std,3.53673,14.433581,1.802885,1.903709,25.535695,0.279318,0.129205,1.031044,35.353,21.700931
min,0.45,12.0,25.3,3.7,7.0,0.0,0.050028,1.098997,6.5,9.16428
25%,3.5,32.0,27.7,6.5,15.0,0.0,0.17,2.491,20.6,23.08282
50%,6.0,43.0,29.2,7.9,22.0,0.214855,0.200444,3.2876,31.7,29.54552
75%,8.2,50.0,30.1,9.1,35.0,0.473235,0.257074,4.084857,57.6,38.82138
max,19.3,75.0,33.4,12.3,128.0,1.05,1.228158,5.940472,166.8,138.52582


The shape is:  ['cube4_cluster1', 'cube5_cluster0', 'cube6_cluster0']
The number of nodes in the shape is:  3


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,7.09,16.7,27.22,5.37,131.6,0.413261,0.293474,4.011093,183.0,11.584294
std,1.97228,2.58414,1.301964,0.676675,8.934328,0.254628,0.078758,0.40463,14.707066,2.558159
min,3.4,13.0,25.6,4.4,120.0,0.0,0.192748,3.48609,159.2,9.30356
25%,5.875,14.25,26.1,5.0,128.0,0.268029,0.229881,3.744103,175.975,9.9601
50%,7.15,17.0,27.15,5.25,130.0,0.424766,0.288151,3.955636,184.55,10.96771
75%,8.325,19.0,28.175,6.075,138.0,0.596561,0.357,4.16095,192.775,11.49722
max,9.9,20.0,28.9,6.1,148.0,0.745295,0.421,4.807087,205.4,17.3818


Current Simplical Complex:  ['Stratum 2 SUMMER 01-13: ']
Current Simplical Complex:  ['Stratum 3 SUMMER 01-13: ']
The shape is:  ['cube1_cluster16', 'cube0_cluster0', 'cube2_cluster0', 'cube1_cluster14', 'cube2_cluster6', 'cube1_cluster0', 'cube2_cluster1']
The number of nodes in the shape is:  7


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0,291.0
mean,1.018832,23.848797,30.469072,9.054334,51.725086,0.039278,0.389884,2.524294,52.737113,66.890048
std,0.773418,9.27201,3.130584,3.80894,26.072446,0.126496,0.285081,0.935908,26.616907,44.212644
min,0.18,10.0,23.2,0.4,9.0,0.0,0.002,0.919,8.8,7.21436
25%,0.5,17.0,28.4,6.2,31.0,0.0,0.22533,1.836,33.0,30.807055
50%,0.78,21.0,30.0,8.369485,48.0,0.0,0.323,2.325333,48.3,59.37612
75%,1.1,27.0,32.35,11.55,70.5,0.0,0.4645,2.931138,70.6,91.30299
max,3.9,67.0,38.6,21.8,116.0,1.0,2.904651,6.531588,130.2,175.99205


Current Simplical Complex:  ['Stratum 4 SUMMER 01-13: ']
The shape is:  ['cube3_cluster2', 'cube0_cluster0', 'cube2_cluster0', 'cube0_cluster1', 'cube1_cluster0', 'cube1_cluster1', 'cube4_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  8


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,99.0,99.0,99.0,99.0,99.0,99.0,99.0,99.0,99.0,99.0
mean,1.110101,28.89899,30.0,8.013131,41.080808,0.001515,0.803153,1.670896,40.044444,61.984517
std,0.909488,12.781919,2.915301,2.406057,23.413618,0.012402,0.537995,0.624384,21.998619,34.527586
min,0.25,11.0,24.0,3.7,6.0,0.0,0.017,0.024,6.0,10.02786
25%,0.515,20.0,27.75,6.2,22.0,0.0,0.358496,1.13061,21.75,29.37335
50%,0.65,27.0,30.0,7.5,36.0,0.0,0.535,1.691,38.1,68.87386
75%,1.45,36.0,32.65,9.3,63.5,0.0,1.121904,2.258098,58.4,86.09897
max,3.3,70.0,35.5,14.5,94.0,0.12,2.168,2.879706,94.0,140.55638


Current Simplical Complex:  ['Stratum 5 SUMMER 01-13: ']
The shape is:  ['cube2_cluster2', 'cube1_cluster0', 'cube0_cluster0', 'cube2_cluster1']
The number of nodes in the shape is:  4


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0,169.0
mean,2.501243,34.372781,29.785799,11.010059,23.674556,0.012426,0.174256,2.827293,26.992899,53.186539
std,1.575602,7.88742,2.544001,3.608895,8.114912,0.04582,0.046326,0.828452,9.295734,23.185295
min,0.4,18.0,24.7,5.1,10.0,0.0,0.036,1.289,11.1,16.45222
25%,1.2,30.0,27.8,8.4,19.0,0.0,0.143,2.28,20.4,34.93108
50%,2.2,34.0,30.1,10.3,22.0,0.0,0.167,2.811,24.7,49.95732
75%,3.6,38.0,31.6,13.3,27.0,0.0,0.192905,3.3295,32.0,69.15741
max,9.4,58.0,34.9,20.5,53.0,0.38,0.335,4.965,56.1,126.42644


Current Simplical Complex:  ['Stratum 1 SUMMER 10-16: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube4_cluster1', 'cube1_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  5


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,120.0,120.0,120.0,120.0,120.0,120.0,120.0,120.0,120.0,120.0
mean,6.28,30.316667,28.701667,6.611667,51.108333,0.375711,0.241029,3.340142,74.576917,23.04689
std,3.411069,11.978961,2.05107,1.351201,31.503713,0.331638,0.097719,1.014256,39.039223,7.859109
min,0.7,12.0,24.8,3.7,8.0,0.0,0.140535,1.056507,11.4,9.54901
25%,3.6,21.75,27.8,5.675,33.0,0.0,0.19587,2.693934,43.175,16.729913
50%,6.1,28.0,28.6,6.6,46.0,0.404451,0.226832,3.432168,74.99,22.85347
75%,7.925,35.25,30.2,7.3,67.0,0.655683,0.253341,4.1333,106.0625,29.03796
max,19.3,62.0,32.8,10.2,128.0,1.02237,0.928245,5.940472,156.35,38.82138


Current Simplical Complex:  ['Stratum 2 SUMMER 10-16: ']
The shape is:  ['cube2_cluster0', 'cube4_cluster1', 'cube3_cluster1', 'cube0_cluster1', 'cube1_cluster0', 'cube4_cluster0', 'cube5_cluster2', 'cube3_cluster0']
The number of nodes in the shape is:  8


statistic,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,262.0,262.0,262.0,262.0,262.0,262.0,262.0,262.0,262.0,262.0
mean,4.014885,28.973282,28.829008,6.662595,56.51145,0.698494,0.238551,3.435145,81.519606,23.33208
std,2.233446,10.333299,1.979918,1.435421,34.018221,0.341312,0.100529,1.016961,40.431537,9.185101
min,0.3,9.0,25.2,4.1,10.0,0.0,0.145235,0.99,14.5,2.32957
25%,2.4,23.0,27.9,5.7,35.0,0.48,0.196109,2.858469,55.025,16.711955
50%,3.8,29.0,28.8,6.4,47.0,0.765,0.223578,3.515385,81.99,21.53379
75%,5.4,33.0,30.0,7.1,78.0,0.96,0.262526,4.227563,111.35,30.42327
max,12.0,60.0,35.2,12.0,172.0,1.36,1.67331,6.523909,193.58,43.96561


The shape is:  ['cube6_cluster0', 'cube7_cluster2', 'cube7_cluster0', 'cube5_cluster0']
The number of nodes in the shape is:  4


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0
mean,5.821053,11.736842,28.247368,5.868421,175.157895,0.863684,0.281485,3.839048,241.412105,17.930186
std,2.387835,1.939132,0.676895,0.331751,20.958327,0.184066,0.03428,0.339757,23.200092,2.555011
min,0.7,8.0,27.5,4.8,152.0,0.35,0.21428,3.091,199.51,11.25145
25%,4.35,10.0,27.6,5.7,158.0,0.74,0.25339,3.597762,223.285,17.397865
50%,6.3,12.0,27.8,5.9,170.0,0.92,0.28,3.932013,250.39,19.00362
75%,7.7,13.0,29.0,6.05,185.0,0.975,0.2935,4.107243,258.99,19.468515
max,8.6,15.0,29.3,6.3,220.0,1.17,0.346,4.299,276.87,20.4657


Current Simplical Complex:  ['Stratum 3 SUMMER 10-16: ']
Current Simplical Complex:  ['Stratum 4 SUMMER 10-16: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube3_cluster1', 'cube0_cluster1', 'cube1_cluster0', 'cube2_cluster1']
The number of nodes in the shape is:  6


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,49.0,49.0,49.0,49.0,49.0,49.0,49.0,49.0,49.0,49.0
mean,1.591633,35.795918,29.553061,7.365306,23.22449,0.003061,0.369782,1.372517,23.421079,35.650223
std,1.030834,12.622697,3.437362,2.206576,10.188362,0.017584,0.10807,0.329019,10.47948,21.083073
min,0.46,16.0,24.0,3.7,6.0,0.0,0.21,0.963,6.0,10.02786
25%,0.78,27.0,26.8,5.8,16.0,0.0,0.281449,1.107,15.9,19.43585
50%,1.0,35.0,29.5,6.9,22.0,0.0,0.334286,1.297983,21.9,29.05215
75%,2.9,40.0,33.0,8.4,29.0,0.0,0.432,1.568,31.62,50.26681
max,3.3,70.0,35.5,13.4,50.0,0.12,0.547,2.218,42.5,91.54987


Current Simplical Complex:  ['Stratum 5 SUMMER 10-16: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube4_cluster1', 'cube1_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  5


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,95.0,95.0,95.0,95.0,95.0,95.0,95.0,95.0,95.0,95.0
mean,2.423158,29.968421,30.395789,11.118947,24.526316,0.0,0.192469,2.761972,26.835579,58.019429
std,1.617232,4.585947,2.274742,3.828263,7.298852,0.0,0.058694,0.742897,7.645653,22.779512
min,0.26,18.0,26.0,5.1,13.0,0.0,0.117,1.289,15.04,17.85106
25%,1.225,27.0,29.0,8.8,20.0,0.0,0.14945,2.3215,21.5,42.018255
50%,2.0,30.0,30.5,10.6,22.0,0.0,0.18,2.759659,24.75,51.37092
75%,3.05,34.0,31.85,13.8,28.0,0.0,0.209124,3.271713,29.96,72.089495
max,9.4,42.0,34.9,19.6,48.0,0.0,0.444069,4.398842,53.1,119.4624


Current Simplical Complex:  ['Stratum 1 SUMMER 14-20: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube4_cluster4', 'cube3_cluster1', 'cube2_cluster2', 'cube1_cluster0']
The number of nodes in the shape is:  6


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,106.0,106.0,106.0,106.0,106.0,106.0,106.0,106.0,106.0,106.0
mean,6.149811,28.399878,27.166038,7.578302,54.90566,0.499686,0.229527,3.181405,81.943679,32.555738
std,2.528473,6.854749,1.482462,1.163738,24.4632,0.3173,0.050133,0.678528,26.904529,15.833449
min,0.68,13.0,24.8,5.1,15.0,0.0,0.109,1.056507,22.01,9.6534
25%,4.325,25.0,25.6,6.7,36.25,0.227204,0.19516,2.749118,64.72,19.115893
50%,6.2,28.0,27.1,7.3,51.5,0.56,0.2305,3.165176,82.175,25.9203
75%,7.625,32.0,28.475,8.3,67.0,0.754384,0.253,3.597841,103.7225,47.30613
max,12.3,48.0,29.7,11.0,128.0,1.02237,0.494508,4.628146,137.35,72.41696


Current Simplical Complex:  ['Stratum 2 SUMMER 14-20: ']
The shape is:  ['cube5_cluster1', 'cube0_cluster0', 'cube2_cluster0', 'cube7_cluster2', 'cube6_cluster0', 'cube7_cluster0', 'cube1_cluster0', 'cube4_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  9


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,280.0,280.0,280.0,280.0,280.0,280.0,280.0,280.0,280.0,280.0
mean,4.078179,25.946429,27.270714,7.4975,71.285714,0.736567,0.247037,3.251966,102.589179,35.74801
std,1.974336,7.766807,1.426905,1.538384,41.597885,0.24371,0.053167,0.644665,50.476622,25.322034
min,0.3,8.0,25.2,5.3,15.0,0.0,0.152,1.450107,19.72,3.15592
25%,2.6,21.75,26.3,6.4,40.0,0.62,0.207724,2.963901,67.545,19.42316
50%,4.0,26.5,26.8,7.0,59.0,0.75,0.246327,3.303877,90.525,23.152325
75%,5.3,31.0,28.5,8.4,88.0,0.9125,0.280056,3.5695,119.5725,42.589875
max,11.4,42.0,32.2,12.0,220.0,1.36,0.536388,4.791793,276.87,107.31176


Current Simplical Complex:  ['Stratum 3 SUMMER 14-20: ']
The shape is:  ['cube2_cluster4', 'cube0_cluster0', 'cube2_cluster0', 'cube1_cluster5', 'cube2_cluster2', 'cube1_cluster1']
The number of nodes in the shape is:  6


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0,133.0
mean,0.919173,22.756958,28.723308,8.717293,52.315789,0.02985,0.317627,2.247321,52.58015,53.671621
std,0.505967,8.682136,2.132394,2.98109,30.3241,0.117827,0.116237,0.932839,29.353381,26.629376
min,0.3,9.0,23.1,1.1,5.0,0.0,0.138,0.93,7.64,12.91261
25%,0.55,15.0,27.8,6.6,28.0,0.0,0.247752,1.585523,30.68,30.735
50%,0.8,22.0,28.9,8.4,41.0,0.0,0.289822,1.987938,44.14,51.08128
75%,1.2,30.0,30.1,10.6,75.0,0.0,0.366,2.941437,79.15,70.53538
max,3.0,47.0,34.3,15.5,116.0,0.62,0.959,5.341,125.63,121.77792


Current Simplical Complex:  ['Stratum 4 SUMMER 14-20: ']
The shape is:  ['cube1_cluster0', 'cube0_cluster0', 'cube2_cluster0']
The number of nodes in the shape is:  3


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0,12.0
mean,1.258333,24.416667,26.991667,8.675,32.833333,0.0,0.296585,1.3874,34.786073,53.86856
std,0.339942,5.775471,0.78214,1.649862,9.675023,0.0,0.042173,0.108285,6.148955,7.851244
min,0.85,16.0,25.9,6.8,21.0,0.0,0.24,1.262,23.99,43.35145
25%,0.995,20.0,26.375,7.45,24.75,0.0,0.243549,1.295844,30.55,47.457445
50%,1.15,25.0,26.8,8.15,33.0,0.0,0.312067,1.354604,36.41144,51.779545
75%,1.535,27.0,27.55,10.025,38.75,0.0,0.330938,1.485797,39.765,60.964008
max,1.93,35.0,28.3,12.0,50.0,0.0,0.345,1.568,42.22,66.25858


Current Simplical Complex:  ['Stratum 5 SUMMER 14-20: ']
The shape is:  ['cube0_cluster0', 'cube2_cluster0', 'cube0_cluster2', 'cube5_cluster0', 'cube6_cluster0', 'cube1_cluster0', 'cube4_cluster0', 'cube3_cluster0']
The number of nodes in the shape is:  8


Unnamed: 0,PREDICTED_WDP,PREDICTED_SECCHI,PREDICTED_TEMP,PREDICTED_DO,PREDICTED_TURB,PREDICTED_VEL,PREDICTED_TP,PREDICTED_TN,PREDICTED_SS,PREDICTED_CHLcal
count,95.0,95.0,95.0,95.0,95.0,95.0,95.0,95.0,95.0,95.0
mean,1.958842,28.452632,29.24,12.674737,30.978947,0.017474,0.240889,2.421064,33.654211,71.726039
std,0.867914,4.081011,1.478239,4.508927,8.772514,0.074562,0.095062,0.563783,10.997084,31.960131
min,0.26,20.0,26.5,3.4,13.0,0.0,0.139,1.438,15.04,17.85106
25%,1.3,26.0,28.15,9.2,25.0,0.0,0.185765,2.107062,27.715,44.1778
50%,2.0,28.0,29.2,13.3,30.0,0.0,0.201172,2.43042,31.22,73.99915
75%,2.6,30.0,30.6,15.7,35.5,0.0,0.258601,2.760124,36.3,94.769645
max,4.2,42.0,31.9,22.9,56.0,0.47,0.511,3.458,74.53,152.64079


# Analyzing the largest structure
Here, "largest" means the shape with the most nodes. The largest structure is likely to be the dominant feature of the stratum during a given time period, so it is important to analyze the nodes within it. To do this, we generate all the shapes and, since they are returned in descending order of node count, take the first one. We can then perform an analysis on each of its nodes.

In [None]:
allComplices = list(jsonData.keys())
for key in allComplices[0:3]: # remove the indices here to get all the strata for all the time periods
    print("Current Simplical Complex: ", key)
    scomplex = jsonData.get(key)
    largestShape = clean_getShapes(scomplex)[0]
    nodes = scomplex.get('nodes')
    print("Largest shape is: ", largestShape)
    print("Number of nodes is: ", len(largestShape))
    for node in largestShape:
        summary = nodeDataSummary(node, scomplex,variables,df_dict.get(key))
        if summary.loc['count'].iloc[0] > 5: # skip nodes with 5 or fewer points; the threshold is arbitrary
            print("Information for: ", node)
            display(summary)
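
Node count is only one way to define "largest". As a rough sketch, assuming each entry of `scomplex['nodes']` maps a node name to a list of row indices and that a shape is a flat list of node names, shapes could instead be ranked by the number of distinct data points they cover. The helpers `shape_point_count` and `largest_shape_by_points` are hypothetical and not part of the notebook's own functions; the toy data is made up.

```python
def shape_point_count(shape, nodes):
    """Count the distinct row indices covered by the nodes of one shape."""
    points = set()
    for node in shape:
        points.update(nodes.get(node, []))
    return len(points)


def largest_shape_by_points(shapes, nodes):
    """Return the shape that covers the most distinct data points."""
    return max(shapes, key=lambda shape: shape_point_count(shape, nodes))


# Toy data (made up for illustration only)
toy_nodes = {
    "cube0_cluster0": [0, 1],
    "cube1_cluster0": [1, 2],
    "cube2_cluster0": [2, 3],
    "cube3_cluster0": [10, 11, 12, 13, 14, 15],
    "cube4_cluster0": [15, 16],
}
toy_shapes = [
    ["cube0_cluster0", "cube1_cluster0", "cube2_cluster0"],  # 3 nodes, 4 points
    ["cube3_cluster0", "cube4_cluster0"],                    # 2 nodes, 7 points
]
largest = largest_shape_by_points(toy_shapes, toy_nodes)
print(largest)
```

In this toy example the two definitions disagree: the first shape has more nodes, while the second covers more data points.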

# Condensing 1-simplices
Currently, many of our 1-simplices join two nodes where the indices of one node are a subset of the other's. To resolve this, we replace such a pair with a single cluster that holds all the indices in one node.

This is implemented in the `clean_getShapes(scomplex)` function. Below is a comparison of the output of `getShapes` and `clean_getShapes`.

In [None]:
allComplices = list(jsonData.keys())
print("Standard shape version")
for key in allComplices[0:1]:
    print("Current Simplical Complex: ", key)
    scomplex = jsonData.get(key)
    nodes = scomplex.get('nodes')
    shapes = getShapes(scomplex)
    for shape in shapes:
        indices = []
        for node in shape:
            indices.append(nodes.get(node))
        indices = list(set([item for sublist in indices for item in sublist]))
        print(str(shape) + " : " + str(indices))

print("Clean shape version")
for key in allComplices[0:1]:
    print("Current Simplical Complex: ", key)
    scomplex = jsonData.get(key)
    nodes = scomplex.get('nodes')
    cleanShapes = clean_getShapes(scomplex)
    for shape in cleanShapes:
        indices = []
        npShape = np.array(shape).flatten()
        for node in npShape:
            indices.append(nodes.get(node))
        indices = list(set([item for sublist in indices for item in sublist]))
        print(str(shape) + " : " + str(indices))
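
The condensing idea described above can be sketched independently of `clean_getShapes`, whose actual implementation is not shown here. The function `condense_subset_edges` below is hypothetical. It assumes the nodes are given as a dict of node name to row indices and the 1-simplices as a list of node-name pairs, and it keeps the name of the larger node when merging, which may differ from how `clean_getShapes` labels the result.

```python
def condense_subset_edges(nodes, edges):
    """Merge linked nodes whenever one node's indices are a subset of the other's.

    nodes: dict mapping node name -> list of row indices
    edges: list of (node_a, node_b) pairs, i.e. the 1-simplices
    Returns (merged_nodes, remaining_edges).
    """
    members = {name: set(indices) for name, indices in nodes.items()}
    absorbed = {}  # name of a removed node -> name of the node that absorbed it

    def find(name):
        # Follow the chain of merges to the node that is still present
        while name in absorbed:
            name = absorbed[name]
        return name

    for a, b in edges:
        a, b = find(a), find(b)
        if a == b:
            continue
        if members[a] <= members[b]:
            small, big = a, b
        elif members[b] <= members[a]:
            small, big = b, a
        else:
            continue  # neither node contains the other, keep the edge
        members[big] |= members.pop(small)
        absorbed[small] = big

    # Rebuild the edge list using the surviving node names
    remaining = set()
    for a, b in edges:
        a, b = find(a), find(b)
        if a != b:
            remaining.add(tuple(sorted((a, b))))
    merged = {name: sorted(indices) for name, indices in members.items()}
    return merged, sorted(remaining)


# Toy data (made up): B is contained in A, C only overlaps A
toy_nodes = {"A": [1, 2, 3, 4], "B": [2, 3], "C": [4, 5]}
toy_edges = [("A", "B"), ("A", "C")]
merged, remaining = condense_subset_edges(toy_nodes, toy_edges)
print(merged)
print(remaining)
```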

# Box plot per shape
Here, we generate box plots for the variables of interest for each shape in the simplicial complex. <br>
TODO: Plot output ideas: <br>
SS, Turb, CHLCal <br>
Vel, TN, TP <br>
Temp, DO, SECCHI <br>

In [None]:
allComplices = list(jsonData.keys())
for key in allComplices[0:1]: # remove the indices here to get all the strata for all the time periods
    print("Current Simplical Complex: ", key)
    scomplex = jsonData.get(key)
    shapes = clean_getShapes(scomplex)
    print("number of shapes: ", len(shapes))
    for shape in shapes:
        """
        Note for anyone using getBoxplots: the argument that matters most
        here is strataYear. The key used to access each simplicial complex
        contains spaces, colons, brackets, and quotes, so depending on your
        file system, using str(key) directly may cause errors.
        The conversion below is one way to resolve this; feel free to adapt it.
        """
        strataYear = str(key).replace(" ","-").replace(":","").replace("[","").replace("]","").replace("'","")
        print(strataYear)
        """
        before: 
        ['Stratum 1 SUMMER 93-00: ']
        after:
        Stratum-1-SUMMER-93-00-
        """
        subdf = getSubdf(scomplex, shape, df_dict.get(key))
        plots = getBoxplots(subdf, shape, strataYear,  
                            filePath = r"C:\Users\forre\Desktop\REU\TDA\Data\TDAOutputs\Boxplots")
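
The chained `replace` calls above handle the specific characters that appear in the current keys. A more general alternative, sketched here with a hypothetical helper `key_to_filename`, is to use a regular expression that keeps only letters, digits, and hyphens. Unlike the chained version, this sketch also strips the trailing hyphen, so the labels it produces are not identical to `strataYear`.

```python
import re


def key_to_filename(key):
    """Turn a complex key such as "['Stratum 1 SUMMER 93-00: ']" into a file-safe label."""
    text = str(key)
    # Replace every run of characters other than letters, digits, or hyphens with one hyphen
    text = re.sub(r"[^A-Za-z0-9-]+", "-", text)
    # Drop the hyphens left over from the surrounding brackets, quotes, and colon
    return text.strip("-")


label = key_to_filename("['Stratum 1 SUMMER 93-00: ']")
print(label)
```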


# Discovering which indices within the nodes overlap
This code determines which points are shared between two nodes, that is, which indices appear in both.

In [23]:
allComplices = list(jsonData.keys())
for key in allComplices[0:1]: # remove the indices here to get all the strata for all the time periods
    print("Current Simplical Complex: ", key)
    scomplex = jsonData.get(key)
    shapes = clean_getShapes(scomplex)
    print("number of shapes: ", len(shapes))
    result = determineOverlap(scomplex, shapes, verbose = False)
    print(result)

Current Simplical Complex:  ['Stratum 1 SUMMER 93-00: ']
number of shapes:  8
{frozenset({97, 66, 'cube6_cluster1 -> cube5_cluster0: '}), frozenset({'cube2_cluster0 -> cube3_cluster0: ', 51, 22, 87, 72, 56, 76}), frozenset({59, 'cube4_cluster0 -> cube5_cluster0: ', 83, 4}), frozenset({98, 'cube7_cluster1 -> cube8_cluster1: '}), frozenset({67, 44, 'cube1_cluster0 -> cube0_cluster0: '}), frozenset({32, 1, 34, 65, 36, 69, 71, 'cube3_cluster0 -> cube4_cluster0: ', 62, 63}), frozenset({'cube7_cluster1 -> cube6_cluster1: ', 102}), frozenset({20, 'cube5_cluster2 -> cube4_cluster0: '}), frozenset({'cube0_cluster0 -> cube1_cluster0: ', 67, 44}), frozenset({0, 2, 5, 10, 11, 15, 18, 24, 26, 27, 28, 29, 43, 47, 111, 112, 114, 53, 54, 'cube1_cluster0 -> cube2_cluster0: '}), frozenset({0, 2, 5, 10, 11, 15, 18, 24, 26, 27, 28, 29, 43, 47, 111, 112, 114, 'cube2_cluster0 -> cube1_cluster0: ', 53, 54}), frozenset({98, 'cube8_cluster1 -> cube7_cluster1: '}), frozenset({'cube4_cluster0 -> cube5_cluster2: 
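
The overlap between two nodes is the intersection of their index sets. As a minimal sketch, assuming a shape is a flat list of node names and `nodes` maps each name to its row indices, the hypothetical helper `node_overlaps` below reports each pair once in a dict. This differs from `determineOverlap`, whose output above lists both directions and mixes the pair label with the indices. The toy data is made up.

```python
from itertools import combinations


def node_overlaps(shape, nodes):
    """Map each pair of nodes in a shape to the sorted row indices they share."""
    overlaps = {}
    for a, b in combinations(shape, 2):
        shared = set(nodes.get(a, [])) & set(nodes.get(b, []))
        if shared:  # only keep pairs that actually share points
            overlaps[(a, b)] = sorted(shared)
    return overlaps


# Toy data (made up for illustration only)
toy_nodes = {
    "cube0_cluster0": [3, 44, 67],
    "cube1_cluster0": [0, 2, 44, 67],
    "cube2_cluster0": [0, 2, 51],
}
toy_shape = ["cube0_cluster0", "cube1_cluster0", "cube2_cluster0"]
overlaps = node_overlaps(toy_shape, toy_nodes)
for pair, shared in overlaps.items():
    print(pair, shared)
```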

# Compare shapes over the years
NOTE: Comparing the largest shape in each.