population dynamics for abundances calculated from meta-omics sequencing

JanetMatsen/bacteriopop

Analysis of Bacterial Community Compositions

Population dynamics for abundances calculated from meta-omics sequencing in bacterial community samples from Lake Washington.

Project Background:

  • 88 samples: (4 replicates of high oxygen + 4 replicates of low oxygen) * 11 samples per replicate.
  • Sequenced for 11 weeks: Weeks 4 - 14.
  • Oxygen conditions were switched for the last 4 samples of each series.
  • Each organism's taxonomy is described by: Kingdom, Phylum, Class, Order, Family, Genus.
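As a quick check of the sample counts above, the design can be enumerated in a few lines. This is only an illustrative sketch: the field names (`oxygen`, `replicate`, `week`, `oxygen_switched`) are hypothetical and not taken from the project's data files, and it assumes that "the last 4 samples" are the last 4 weeks of each series.

```python
from itertools import product

# Hypothetical labels; the project's real data files may name these differently.
OXYGEN_LEVELS = ("high", "low")
REPLICATES = (1, 2, 3, 4)
WEEKS = tuple(range(4, 15))       # weeks 4 through 14 inclusive: 11 time points
SWITCHED_WEEKS = set(WEEKS[-4:])  # assumption: the last 4 weeks of each series

# One record per sequenced sample.
samples = [
    {
        "oxygen": oxygen,
        "replicate": replicate,
        "week": week,
        "oxygen_switched": week in SWITCHED_WEEKS,
    }
    for oxygen, replicate, week in product(OXYGEN_LEVELS, REPLICATES, WEEKS)
]

n_switched = sum(s["oxygen_switched"] for s in samples)
print(len(samples), n_switched)  # 88 32
```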

Visit our:

Tools used in this project:

| Name | Source package | Description | Output |
| ----------------------- |:--------------------------------:| -----------------------------:| -------------------------:|
| Dynamic Mode Decomposition (DMD) | Python modred | Dimensionality reduction algorithm for a time series of data that computes a set of modes each of which is associated with a fixed oscillation frequency and decay/growth rate | Matrix of interaction values A for every sample, either computed for every time step or bulked over time |
| NetworkX | NetworkX | Software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks | Classic graphs, random graphs, and synthetic networks with any kind of node (e.g. text, images, XML records) and edges holding arbitrary data (e.g. weights, time-series) |
| Density-based spatial clustering of applications with noise (DBSCAN) | Python scikit-learn | Density-based data clustering algorithm that groups together points that are closely packed together, marking as outliers points that lie alone in low-density regions | Clusters of data points with performance metrics |
| Gaussian Mixture Models (GMM) | Python scikit-learn | Parametric probability density function that generates all data points from weighted sum of Gaussian component densities with unknown parameters | Clusters of data points with performance metrics |
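To make the "matrix of interaction values A" in the DMD row more concrete, here is a minimal sketch of the underlying idea: fit a linear map A so that the abundance vector at one time point predicts the next. It uses NumPy's pseudoinverse rather than modred, so it is not the project's implementation; the function name `fit_linear_dynamics` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_linear_dynamics(abundances):
    """Least-squares estimate of A such that x[t+1] ~ A @ x[t].

    abundances: array of shape (n_taxa, n_timepoints), columns in time order.
    """
    X = abundances[:, :-1]  # states at all but the last time point
    Y = abundances[:, 1:]   # the same states shifted one step forward
    return Y @ np.linalg.pinv(X)

# Synthetic demo (made-up numbers, not project data): 3 taxa, 11 time points.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.7]])
states = [rng.random(3)]
for _ in range(10):
    states.append(A_true @ states[-1])
data = np.column_stack(states)  # shape (3, 11)

A_est = fit_linear_dynamics(data)

# Eigenvalues of A describe the modes: magnitude below 1 means decay,
# above 1 means growth, and a nonzero imaginary part means oscillation.
eigenvalues = np.linalg.eigvals(A_est)
print(np.round(A_est, 3))
print(np.abs(eigenvalues))
```

On noise-free synthetic data the estimate matches the matrix used to generate it; real abundance data would be noisy, and a truncated-rank variant of this fit is usually preferable there.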

Packages and libraries used in this project:

Directory Structure:

[Image: Bacteriopop directory structure]

Next Steps:

  1. Make one A matrix per sample per replicate (2*4*10 A matrices) and compare to the current results, which use one A per replicate.
  2. Test normalization of data before finding the A matrices so total abundance doesn't dominate signal. Remove taxa with small abundances first.
  3. Plot networks as node graphs now that data reduction tools are ready.
  4. Train on a subset of the data and see how predictive it is for the rest.
  5. Compare including vs omitting the last 4 samples of each series, which have the oxygen tension reversed.
  6. Do multiple hypothesis corrections, and use this to guide the cutoff for plotting and further analysis.
  7. Connect these mathematical results to our real biological questions.
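Steps 2 and 4 above could be prototyped roughly as follows. This is a hedged sketch, not the project's code: the helper names, the `min_mean` threshold, and the random counts are all hypothetical, and it assumes every sample has a nonzero total count. The A matrix is again fit with NumPy's pseudoinverse rather than modred.

```python
import numpy as np

def to_relative_abundance(counts):
    """Scale each sample (column) so its taxa sum to 1.

    Assumes every column has a nonzero total.
    """
    return counts / counts.sum(axis=0, keepdims=True)

def drop_rare_taxa(rel_abund, min_mean=0.01):
    """Keep taxa (rows) whose mean relative abundance is at least min_mean."""
    keep = rel_abund.mean(axis=1) >= min_mean
    return rel_abund[keep], keep

def holdout_error(series, n_train):
    """Fit x[t+1] ~ A @ x[t] on the first n_train time points, then return
    the relative error of one-step predictions on the remaining points."""
    train = series[:, :n_train]
    test = series[:, n_train - 1:]  # starts at the last training point
    A = train[:, 1:] @ np.linalg.pinv(train[:, :-1])
    predicted = A @ test[:, :-1]
    actual = test[:, 1:]
    return np.linalg.norm(predicted - actual) / np.linalg.norm(actual)

# Made-up counts for 6 taxa over 11 time points (not project data).
rng = np.random.default_rng(1)
counts = rng.integers(1, 1000, size=(6, 11)).astype(float)

rel = to_relative_abundance(counts)
filtered, kept = drop_rare_taxa(rel, min_mean=0.01)
err = holdout_error(filtered, n_train=7)
print(f"kept {kept.sum()} of {kept.size} taxa; held-out relative error {err:.3f}")
```

Random counts have no real dynamics, so the held-out error here is only a placeholder; the point is the shape of the workflow. Choosing `n_train` so that the held-out points are exactly the last 4 samples would also touch on step 5.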

Why we chose the Apache License 2.0:

The Apache License allows us to manage the software package as we please while providing clear language regarding the terms. It makes clear that individual contributors grant a copyright license to anyone who receives the code, that their contributions are free from patent encumbrances (and, if they are not, that they license the relevant patent to anyone who receives the code), and that use of trademarks extends only as far as is necessary to use the product. It also includes a patent termination clause, should a lawsuit arise. The Apache licenses encourage open-source development, and our software is made better by every person who runs it, files tickets about it, or patches it. This is an invaluable contribution: each user is given freedom and respect by the other members of the developer community.

