# Generating Larger Simplicial Complexes
Written by Frederick Miller, Casey McKean, and Wako Bungula
The Kepler Mapper object produces output that is not easily navigable. To resolve this, we wish to create shapes that are easier to navigate and understand, and that reveal the data inside them.

In [3]:
import numpy as np
import pandas as pd 
import queue
import json
pd.set_option('display.max_rows', None)
print("Imports Done")

Imports Done


# File paths and `.json` import

In the `kmapper_demo` file, I added one extra code block that writes the results to a `.json` file, a format for keeping dictionaries in long-term storage. In the code below, only the file paths need to be changed; it then reads the simplicial complexes generated by Kepler Mapper. <br>
Here, we also import the actual data set, with data interpolated for the specific pool. <br>
Lastly, we create a list of the 11 continuous variables (the interpolated versions).
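The export step itself lives in `kmapper_demo` and is not shown here, so the following is only a minimal sketch of the round trip, assuming the results are held in a plain dictionary. The names `toy_complexes` and `out_path` and all of the values are made up for illustration; the `nodes` / `simplices` layout simply mirrors what the functions later in this notebook read.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the mapper results: one entry per complex,
# keyed by a descriptive label. All values here are invented.
toy_complexes = {
    "toy complex A": {
        "nodes": {"cube0_cluster0": [0, 1], "cube1_cluster0": [1, 2]},
        "simplices": [["cube0_cluster0"], ["cube1_cluster0"],
                      ["cube0_cluster0", "cube1_cluster0"]],
    },
}

# Write the dictionary to a temporary file, then read it back in the
# same way the import cell reads the real file.
out_path = os.path.join(tempfile.mkdtemp(), "toy_output.json")
with open(out_path, "w") as f:
    json.dump(toy_complexes, f)

with open(out_path, "r") as f:
    reloaded = json.load(f)

print(reloaded == toy_complexes)
print(len(reloaded.keys()))
```

JSON object keys are always strings, so a label that is a list or tuple has to be converted (for example with `str`) before dumping.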

In [33]:
jsonFilePath = r"C:\Users\forre\Desktop\REU\TDA\Data\TDAOutputs\TDA1PCA10Cubes45perc.json"
with open(jsonFilePath, "r") as jsonFile:  # closes the file automatically
    jsonData = json.load(jsonFile)

dataFilePath = r"C:\Users\forre\Desktop\REU\TDA\Data\interpolatedPool26.csv"
df = pd.read_csv(dataFilePath)

variables = ['PREDICTED_TN',
             'PREDICTED_TP',
             'PREDICTED_TEMP',
             'PREDICTED_DO',
             'PREDICTED_TURB',
             'PREDICTED_COND',
             'PREDICTED_VEL',
             'PREDICTED_SS',
             'PREDICTED_WDP',
             'PREDICTED_CHLcal',
             'PREDICTED_SECCHI']
print("Json file imported")

print(len(jsonData.keys()))

Json file imported
25


# Functions
See each function's docstring for what it does and how it works.

In [108]:
def getSubdf(scomplex, shape, df):
    """
    Returns the part of the data frame from the particular shape in the simplicial complex.
    params:
    scomplex: the entire simplicial complex
    shape: the particular shape being inspected (within the simplicial complex)
    df: the entire data frame
    
    Description:
    1. Get all the nodes from the particular simplicial complex. 
    2. Generate the indices we care about from the particular shape. To do this, we read each node and append its
    indices to a list. Then, we convert the list to a set and then back to a list to eliminate duplicates.
    3. Return the dataframe with only those indices.
    """
    nodes = scomplex.get('nodes')
    indices = []
    for node in shape:
        indices.append(nodes.get(node))
    indices = list(set([item for sublist in indices for item in sublist]))
    subdf = df.loc[indices]
    return subdf

def shapeDataSummary(scomplex, shape, df, variables, verbose = False):
    """
    Generates summary statistics of the given variables for one shape within the simplicial complex.
    params:
    scomplex: the entire simplicial complex
    shape: the particular shape being inspected (within the simplicial complex) at this function call.
    df: the entire dataframe
    variables: the variables of interest
    verbose: Determines if the function will print out extra information. False by default
    
    Description:
    1. Create an empty result dataframe to store the summary statistics.
    2. Get the sub dataframe (see getSubdf) for the particular shape
    3. For each variable we are analyzing, generate summary statistics from the sub dataframe and place them
    inside the result dataframe.
    4. Return the result dataframe
    
    NOTE: this only creates summaries for one particular shape. In executing this method, it is done for each shape 
    outside of the function.
    
    """
    result = pd.DataFrame()
    if verbose:
        print("Obtaining sub dataframe for: ", shape)
        print("The number of nodes in this shape is: ", len(shape))
    subdf = getSubdf(scomplex, shape, df)
    if verbose:
        print("The number of datapoints in this shape is: ", subdf.shape[0])
    for var in variables:
        result[var] = subdf[var].describe()
    return result
    
    

def adjacent(v, scomplex):
    """
    Determines the nodes adjacent to a given vertex
    
    params:
    v: vertex
    scomplex: the entire simplicial complex
    
    Description:
    Determines the nodes that are adjacent to a given vertex.
    """
    
    simplices = scomplex.get('simplices')
    edges = [item for item in simplices if len(item) == 2]
    result = []
    for edge in edges:
        if v in edge:
            for item in edge:
                if item != v:
                    result.append(item)
    return result

def bfs(node, scomplex):
    """
    Conducts a breadth first search to obtain the entire shape from a given node
    params:
    node: the start node
    scomplex: the entire simplicial complex
    
    Description:
    Performs a breadth-first search to obtain the entire shape for a given start node.
    """
    Q = queue.Queue()
    result = []
    result.append(node)
    Q.put(node)
    while not Q.empty():
        v = Q.get()
        neighbors = adjacent(v, scomplex)  # nodes sharing an edge with v
        for neighbor in neighbors:
            if neighbor not in result:
                result.append(neighbor)
                Q.put(neighbor)
    return result
                
    
def getShapes(scomplex):
    """
    Gets all of the shapes from a given simplicial complex.
    
    params:
    scomplex: the entire simplicial complex
    
    Description:
    1. Obtain all the nodes for the entire complex
    2. For each node, perform a breadth-first search to obtain everything in that particular shape. 
    If this entire shape has not already been discovered, add it to the set of results. 
    The result item is a set as the order of the shapes does not matter. The resulting shape is a frozenset
    which means items cannot be added or removed once created, and is needed to allow the set object to have other sets within it.
    3. Convert each shape to a list and the result to a list for easier navigation outside of the function.
    4. Return the result
    
    """
    
    nodes = list(scomplex.get('nodes').keys())
    result = set()
    # NOTE: this runs a search from every node, including nodes already found in an earlier shape,
    # so it currently does more computation than necessary.
    for node in nodes:
        bfsResult = frozenset(bfs(node, scomplex))
        result.add(bfsResult)
    result = [list(x) for x in result]
    return result




print("Functions loaded")

Functions loaded
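The comment in `getShapes` notes that it repeats work by searching from every node. One possible way around that, sketched below, is to build the adjacency lists once and skip any node that has already been placed in a shape. This is not the code used in this notebook: `get_shapes_fast` and the `toy` complex are hypothetical, and the sketch assumes every endpoint of an edge also appears under `nodes`.

```python
from collections import deque

def get_shapes_fast(scomplex):
    """Return the connected components of a complex as lists of node names."""
    nodes = list(scomplex["nodes"].keys())

    # Build the adjacency lists once from the 1-simplices (edges).
    neighbors = {node: [] for node in nodes}
    for simplex in scomplex["simplices"]:
        if len(simplex) == 2:
            a, b = simplex
            neighbors[a].append(b)
            neighbors[b].append(a)

    visited = set()
    shapes = []
    for start in nodes:
        if start in visited:
            continue  # already part of a shape found earlier
        visited.add(start)
        shape = []
        todo = deque([start])
        while todo:  # breadth-first search from the start node
            v = todo.popleft()
            shape.append(v)
            for w in neighbors[v]:
                if w not in visited:
                    visited.add(w)
                    todo.append(w)
        shapes.append(shape)
    return shapes

# Made-up complex: nodes "a" and "b" share an edge, "c" and "d" are isolated.
toy = {
    "nodes": {"a": [0], "b": [0, 1], "c": [2], "d": [3]},
    "simplices": [["a"], ["b"], ["c"], ["d"], ["a", "b"]],
}
shapes = get_shapes_fast(toy)
print(shapes)
```

Each node and edge is handled a bounded number of times here, whereas the version above rescans the full edge list for every visited vertex.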


# Generating Summary Statistics
For each `mapper` output from `kepler-mapper`, we can generate summary statistics for each of the continuous variables. We first obtain the list of keys from the `.json` file, then iterate through each complex, generating its shapes and summarizing the data in each shape.
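To make the per-shape summary concrete, here is a small sketch on made-up data. The names `toy_df` and `toy_nodes` and all of the numbers are invented for illustration. It takes the union of the row indices across a shape's nodes, so a row shared by two overlapping nodes is counted once, and then calls `describe()` on each variable.

```python
import pandas as pd

# Made-up data: five rows and two of the notebook's variable names.
toy_df = pd.DataFrame({
    "PREDICTED_TN": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PREDICTED_TP": [0.1, 0.2, 0.3, 0.4, 0.5],
})

# Hypothetical shape made of two nodes that overlap on row 1.
toy_nodes = {"cube0_cluster0": [0, 1], "cube1_cluster0": [1, 2]}
shape = ["cube0_cluster0", "cube1_cluster0"]

# Union of the row indices across the nodes in the shape.
indices = sorted({i for node in shape for i in toy_nodes[node]})
subdf = toy_df.loc[indices]

# One column of summary statistics per variable.
summary = pd.DataFrame({var: subdf[var].describe() for var in toy_df.columns})
print(summary)
```

Because nodes can share rows, a shape can contain fewer data points than the sum of its node sizes, and possibly fewer data points than nodes.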

In [118]:
allComplices = list(jsonData.keys())
for complexName in allComplices:  # avoid shadowing the built-in `complex`
    print("Current Simplical Complex: ", complexName)
    scomplex = jsonData.get(complexName)
    shapes = getShapes(scomplex)
    for shape in shapes:
        summaries = shapeDataSummary(scomplex, shape, df, variables, verbose = True)
#         display(summaries) # Uncomment to see the summaries

Current Simplical Complex:  ['Stratum 1 SUMMER 93-00: ']
Obtaining sub dataframe for:  ['cube0_cluster0']
The number of nodes in this shape is:  1
The number of datapoints in this shape is:  1
Obtaining sub dataframe for:  ['cube2_cluster9', 'cube1_cluster7']
The number of nodes in this shape is:  2
The number of datapoints in this shape is:  1
Obtaining sub dataframe for:  ['cube5_cluster2', 'cube4_cluster1']
The number of nodes in this shape is:  2
The number of datapoints in this shape is:  2
Obtaining sub dataframe for:  ['cube0_cluster3']
The number of nodes in this shape is:  1
The number of datapoints in this shape is:  1
Obtaining sub dataframe for:  ['cube5_cluster3']
The number of nodes in this shape is:  1
The number of datapoints in this shape is:  1
Obtaining sub dataframe for:  ['cube1_cluster1', 'cube2_cluster1']
The number of nodes in this shape is:  2
The number of datapoints in this shape is:  1
Obtaining sub dataframe for:  ['cube0_cluster4']
The number of nodes in t