Python implementation of the C-optimized FlowSOM library

angelolab/pyFlowSOM

pyFlowSOM

Python runner for the FlowSOM library.

Basic usage:

import numpy as np
import pandas as pd
from pyFlowSOM import map_data_to_nodes, som

# generate example input data, rows are observations (e.g. cells), columns are features (e.g. proteins)
df = pd.DataFrame(np.random.rand(500, 16))

# alternatively, load your own input data instead (uncomment and point to a real file)
# df = pd.read_csv('path/to/som/input.csv')

example_som_input_arr = df.to_numpy()

# train the SOM
node_output = som(example_som_input_arr, xdim=10, ydim=10, rlen=10)

# use trained SOM to assign clusters to each observation in your data
clusters, dists = map_data_to_nodes(node_output, example_som_input_arr)
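
For intuition about what the mapping step computes, the sketch below assigns each observation to its nearest node using plain NumPy. It is an illustration under stated assumptions, not pyFlowSOM's implementation: the helper name `nearest_node` is hypothetical, Euclidean distance is assumed, and the indices it returns are 0-based, which may differ from the cluster ids the library returns.

```python
import numpy as np

def nearest_node(nodes, data):
    """Hypothetical helper: closest node index and distance for each row of data."""
    # Pairwise differences, shape (n_observations, n_nodes, n_features)
    diffs = data[:, np.newaxis, :] - nodes[np.newaxis, :, :]
    # Euclidean distances (an assumption), shape (n_observations, n_nodes)
    all_dists = np.sqrt((diffs ** 2).sum(axis=2))
    idx = all_dists.argmin(axis=1)
    return idx, all_dists[np.arange(len(data)), idx]

rng = np.random.default_rng(0)
toy_nodes = rng.random((4, 3))   # 4 nodes, 3 features
toy_data = rng.random((20, 3))   # 20 observations, 3 features
toy_clusters, toy_dists = nearest_node(toy_nodes, toy_data)
```

Each entry of `toy_clusters` is the row index of the closest node, and `toy_dists` holds the matching distance, mirroring the two values returned by the call above.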

To put the data back into dataframes:

eno = pd.DataFrame(data=node_output, columns=df.columns)
eco = pd.DataFrame(data=clusters, columns=["cluster"])

To export to csv:

eno.to_csv('examples/example_node_output.csv', index=False)
eco.to_csv('examples/example_clusters_output.csv', index=False)

To plot the output as a heatmap:

import seaborn as sns

# Append results to the input data
df['cluster'] = clusters

# Find mean of each cluster
df_mean = df.groupby(['cluster']).mean()

# Make heatmap
sns_plot = sns.clustermap(df_mean, z_score=1, cmap="vlag", center=0, yticklabels=True)
sns_plot.figure.savefig("example_cluster_heatmap.png")

Develop

The C code (pyFlowSOM/flowsom.c) is wrapped using Cython (pyFlowSOM/cyFlowSOM.pyx).

The tests compare cluster ids exactly against ground truth, but compare node values only approximately because of floating-point differences. Randomness is controlled by the seed flag of the som function.
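
The difference between the two kinds of comparison can be sketched with NumPy's testing helpers. The arrays below are made-up stand-ins, not the project's test fixtures, and the tolerance is an arbitrary choice for illustration.

```python
import numpy as np

# Made-up stand-ins for ground truth and computed results (not real fixtures)
expected_clusters = np.array([3, 1, 2, 2, 1])
actual_clusters = np.array([3, 1, 2, 2, 1])

expected_nodes = np.array([[0.1, 0.2], [0.3, 0.4]])
actual_nodes = expected_nodes + 1e-12  # tiny drift, as floating-point noise might cause

# Cluster ids are integers, so they must match exactly
np.testing.assert_array_equal(actual_clusters, expected_clusters)

# Node values are floats, so they are compared within a tolerance
np.testing.assert_allclose(actual_nodes, expected_nodes, rtol=1e-6)
```

An exact comparison of the node arrays would fail here even though the values agree to many decimal places, which is why a tolerance is used for them.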

To run the tests, use the following command:

pytest