Code to compare total genetic diversity in Humans vs Wheat
.idea
.ipynb_checkpoints
__pycache__
Data_munging.ipynb
Human_entropy.npy
README.md
entropy_calculators.py
human_test_data.csv
inverse_entropy.py
same_score.py
swarm_plot.png
swarm_plot2.png
swarm_visulaization.py
test_data
test_data.csv
test_entropy.npy
total_data_driver.py
wheat_entropy.npy
wheat_test_data.csv

README.md

SNP_analysis

Code to compare total genetic diversity in Humans vs Wheat

A word of warning

Nearly every sequencing dataset is collected and formatted differently. I used an IPython notebook (Data_munging.ipynb) to lightly munge the data before feeding it into a more consistent pipeline.

Analyzing the data

entropy_calculators.py is a set of functions, with unit tests, that calculate the average entropy for a species given a number of individuals and a number of SNPs. total_data_driver.py uses these functions to run the analysis for wheat and humans. The wheat and human data come in large CSV files, which must be downloaded separately here and here
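To give a feel for what "average entropy" means here, below is a minimal sketch, not the code in entropy_calculators.py. It assumes each individual is a list of allele calls (one per SNP) and uses Shannon entropy in bits, averaged over SNPs. The function names `snp_entropy` and `average_entropy` and the toy data are made up for illustration.

```python
import math
from collections import Counter


def snp_entropy(alleles):
    """Shannon entropy (in bits) of the allele calls observed at one SNP."""
    counts = Counter(alleles)
    total = len(alleles)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_entropy(genotypes):
    """Mean per-SNP entropy; genotypes[i][j] is individual i's call at SNP j.

    Assumed layout for this sketch only: rows are individuals, columns are SNPs.
    """
    n_snps = len(genotypes[0])
    per_snp = [snp_entropy([ind[j] for ind in genotypes]) for j in range(n_snps)]
    return sum(per_snp) / n_snps


# Toy data: 4 individuals, 2 SNPs, each SNP split evenly between two alleles.
toy = [
    ["A", "C"],
    ["A", "T"],
    ["G", "C"],
    ["G", "T"],
]
print(average_entropy(toy))
```

An evenly split two-allele SNP carries one bit of entropy, and a SNP where every individual has the same call carries none, so higher averages mean more diversity across the sampled SNPs.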

Plotting

swarm_visulaization.py makes the pretty swarm plots. Note that it uses an inverse entropy calculation to approximate what a population of individuals with a given entropy score would look like. Granted, the mapping is degenerate: many different populations can produce the same entropy score, so I just picked one solution that works.
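One way to picture the inverse entropy step is the sketch below. It is an illustration under stated assumptions, not the method in inverse_entropy.py: it assumes a biallelic SNP, entropy measured in bits (so scores lie between 0 and 1), and it uses plain bisection to find a minor-allele frequency that gives the target score. All function names and the "minor"/"major" labels are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-allele SNP with minor-allele frequency p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def inverse_binary_entropy(target, tol=1e-9):
    """Find p in [0, 0.5] with binary_entropy(p) close to target, by bisection.

    Entropy increases monotonically on [0, 0.5], so bisection converges.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target entropy must be between 0 and 1 bit")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def example_population(target, n_individuals):
    """Build one population (of many possible) whose entropy approximates target."""
    p = inverse_binary_entropy(target)
    n_minor = round(p * n_individuals)
    return ["minor"] * n_minor + ["major"] * (n_individuals - n_minor)


population = example_population(1.0, 10)
print(population.count("minor"), population.count("major"))
```

Restricting the search to frequencies at or below 0.5 is what picks a single answer: the mirrored frequency 1 - p gives the same score, which is the simplest case of the degeneracy mentioned above. Rounding to a whole number of individuals also means the population only approximates the target score.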