Analysis of Bacterial Community Compositions

Population dynamics of bacterial community samples from Lake Washington, based on taxon abundances calculated from meta-omics sequencing.

Project Background:

88 samples: (4 replicates of high oxygen + 4 replicates of low oxygen) * 11 samples per replicate.
Sequenced for 11 weeks: Weeks 4 - 14.
Oxygen conditions were switched for the last 4 samples.
An organism's taxonomy is described by: Kingdom, Phylum, Class, Order, Family, Genus.
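As a quick sanity check on the sample count, the design described above can be enumerated in a few lines of Python. This is only an illustrative sketch: the record field names are hypothetical, and it assumes one sample per week, so that the "last 4 samples" of each series fall in weeks 11 to 14.

```python
from itertools import product

OXYGEN_LEVELS = ("high", "low")   # starting oxygen condition of each series
REPLICATES = range(1, 5)          # 4 replicates per oxygen condition
WEEKS = range(4, 15)              # weeks 4 through 14 (assumed: one sample per week)
N_SWITCHED = 4                    # last 4 samples of each series are switched


def build_design():
    """Return one record per sample for the design described above."""
    first_switched_week = WEEKS[-1] - N_SWITCHED + 1
    flipped = {"high": "low", "low": "high"}
    design = []
    for start_oxygen, replicate, week in product(OXYGEN_LEVELS, REPLICATES, WEEKS):
        switched = week >= first_switched_week
        design.append({
            "start_oxygen": start_oxygen,   # hypothetical field names
            "replicate": replicate,
            "week": week,
            "switched": switched,
            "oxygen": flipped[start_oxygen] if switched else start_oxygen,
        })
    return design


design = build_design()
print(len(design))  # 88
```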

Visit our:

Technology Review
Project Poster (available on Google Slides or as a PDF)

Tools used in this project:

| Name | Source package | Description | Output |
| ----------------------- |:--------------------------------:| -----------------------------:| -------------------------: |
| Dynamic Mode Decomposition (DMD) | Python modred | Dimensionality reduction algorithm for a time series of data that computes a set of modes, each of which is associated with a fixed oscillation frequency and decay/growth rate | Matrix of interaction values A for every sample, either computed for every time step or bulked over time |
| NetworkX | | Software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks | Classic graphs, random graphs, and synthetic networks with any kind of node (e.g. text, images, XML records) and edges holding arbitrary data (e.g. weights, time series) |
| Density-based spatial clustering of applications with noise (DBSCAN) | Python scikit-learn | Density-based data clustering algorithm that groups together points that are closely packed, marking as outliers points that lie alone in low-density regions | Clusters of data points with performance metrics |
| Gaussian Mixture Models (GMM) | Python scikit-learn | Probabilistic model that assumes all data points are generated from a weighted sum of Gaussian component densities with unknown parameters | Clusters of data points with performance metrics |
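To make the DMD row more concrete, here is a minimal NumPy sketch of the underlying idea: fit a matrix A so that the abundances at one time point map onto the next, then read decay/growth rates and oscillation frequencies off its eigenvalues. This is not the modred API and not necessarily how `dynamic_mode_decomposition.py` does it; the function name `fit_transition_matrix`, the `rank` and `tol` parameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_transition_matrix(snapshots, rank=None, tol=1e-10):
    """Fit x[t+1] ≈ A @ x[t] from a (n_taxa, n_times) abundance matrix.

    Returns the least-squares matrix A and the DMD eigenvalues.
    """
    snapshots = np.asarray(snapshots, dtype=float)
    X, Y = snapshots[:, :-1], snapshots[:, 1:]   # states and their successors

    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if rank is None:
        rank = int(np.sum(s > tol * s[0]))       # drop numerically zero directions
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]

    # Reduced operator: A projected onto the leading left singular vectors.
    A_tilde = U.T @ Y @ Vh.T @ np.diag(1.0 / s)
    eigenvalues = np.linalg.eigvals(A_tilde)

    # Full-size matrix from the rank-truncated pseudoinverse of X.
    A = Y @ Vh.T @ np.diag(1.0 / s) @ U.T
    return A, eigenvalues


# Synthetic check (not project data): a decaying rotation with a known A.
theta = 0.5
A_true = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(10):                               # 11 time points in total
    states.append(A_true @ states[-1])
snapshots = np.column_stack(states)

A_fit, eigenvalues = fit_transition_matrix(snapshots)
growth_rates = np.log(np.abs(eigenvalues))        # negative means decay per step
frequencies = np.angle(eigenvalues)               # radians per time step
```

Fitting one matrix over the whole series corresponds to the "bulked over time" output in the table; calling the function on two consecutive columns at a time would be the per-time-step variant, although a single pair of snapshots only constrains a rank-one approximation of A.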

Packages and libraries used in this project:

Directory Structure:

Next Steps:

Make one A matrix per sample per replicate (2*4*10 A matrices) and compare with the current results, which use one A per replicate.
Test normalizing the data before finding the A matrices so that total abundance doesn't dominate the signal. Remove taxa with small abundances first.
Plot networks as node graphs now that the data reduction tools are ready.
Train on a subset of the data and see how well it predicts the rest.
Compare including vs. omitting the last 4 samples of each series, which have the oxygen tension reversed.
Apply multiple-hypothesis corrections, and use them to guide the cutoff for plotting and further analysis.
Connect these mathematical results to our real biological questions.
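The normalization step in the list above could look roughly like the following sketch. It drops taxa that never reach a minimum share of any sample and then rescales each sample to relative abundances. The function name `filter_and_normalize`, the `min_fraction` cutoff, and the taxa-by-samples array layout are assumptions for illustration, not part of the existing code.

```python
import numpy as np


def filter_and_normalize(counts, min_fraction=0.01):
    """Drop rare taxa, then rescale each sample to relative abundances.

    counts: (n_taxa, n_samples) array of non-negative abundances.
    min_fraction: hypothetical cutoff; a taxon is kept if it reaches this
        share of the total abundance in at least one sample.
    Returns (relative_abundances, keep_mask).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every sample needs a positive total abundance")

    # Remove taxa with small abundances first.
    keep = (counts / totals).max(axis=1) >= min_fraction
    kept = counts[keep]

    # Renormalize so each sample (column) sums to 1 over the kept taxa.
    kept_totals = kept.sum(axis=0, keepdims=True)
    if np.any(kept_totals <= 0):
        raise ValueError("cutoff removed all abundance from at least one sample")
    return kept / kept_totals, keep


# Toy example (made-up numbers): 3 taxa in 2 samples.
counts = np.array([[90.0, 80.0],
                   [9.0, 19.0],
                   [1.0, 1.0]])
relative, keep = filter_and_normalize(counts, min_fraction=0.05)
```

Because every column sums to 1 after this step, differences in total abundance between samples no longer affect the fitted A matrices. The trade-off is that the values become compositional, which is worth keeping in mind when interpreting interaction strengths.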

Why we chose the Apache License 2.0:

The Apache License allows us to manage the software package as we please while providing clear language regarding the terms. It makes clear that individual contributors grant a copyright license to anyone who receives the code, that their contributions are free from patent encumbrances (and, if they are not, that the contributors license those patents to anyone who receives the code), and that use of trademarks extends only as far as is necessary to use the product. It also includes a patent termination clause, should a lawsuit arise. The Apache License encourages open-source development, and our software is made better by every person who runs it, files tickets about it, or patches it. These contributions are invaluable, and each user is given freedom and respect by the other members of the developer community.


JanetMatsen/bacteriopop
