vischulisem/Maize_Scanner

Overview of Maize Scanner 🌽

This repository is a collection of scripts and data for analyzing an ear of maize. After cloning the repository, each script can be run from the terminal; arguments are parsed with argparse.
The scripts, and the order in which they should be run, are outlined below.

Maize ear scanner video projections created with this method: https://github.com/fowler-lab-osu/flatten_all_videos_in_pwd

Prerequisites

numpy https://docs.scipy.org/doc/numpy/user/install.html
argparse https://pypi.org/project/argparse/
xml.etree.ElementTree https://docs.python.org/3/library/xml.etree.elementtree.html
pandas https://pandas.pydata.org/pandas-docs/stable/install.html
matplotlib https://matplotlib.org/3.1.1/users/installing.html
seaborn https://seaborn.pydata.org/installing.html
pylab https://www.techwalla.com/articles/how-to-install-pylab-on-python
scipy https://www.scipy.org/install.html
pysal https://pysal.org/install

XML Analysis

1. XML_to_ChiSquareTrasmPlot.py

What it does:
Takes an xml file, extracts the X,Y coordinates of each kernel, and labels each kernel as fluorescent or nonfluorescent. Scans the ear with a sliding window, counting the total, fluorescent, and nonfluorescent kernels in each window. Calculates the chi-square statistic for the entire ear and for each individual window. Plots a positional percent transmission line, colored according to whether each point falls above or below p = 0.05 in the chi-square test. Also calculates and plots a regression.
This script accepts an xml file or a directory of xml files, plus the window width and step size in pixels, as arguments. Defaults are set for width and step size. Optional arguments:
-tk --total_kernels sets the minimum number of kernels per ear. Default is 50, so any xml file with fewer than 50 kernels is skipped.
-n will normalize the X coordinates "window_mean" for all graphs
-p --path designates the path to save files. Default is current path.
-a will adjust the chi-square p-values with the Benjamini–Hochberg procedure from statsmodels to account for multiple comparisons. However, this may not be the best choice of correction here and should be revisited in the future. Statsmodels offers a variety of other correction methods that could easily be swapped in. You can read more here:
https://www.statsmodels.org/stable/generated/statsmodels.stats.multitest.multipletests.html
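As a rough illustration of what the -a adjustment does, the Benjamini–Hochberg procedure can be sketched in plain Python. This is not the script's code: the script relies on statsmodels (the page linked above documents multipletests, whose 'fdr_bh' method is this procedure), and the function name benjamini_hochberg below is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration only; the repository itself
    uses statsmodels for this step.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the rank decreases (monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    window_p_values = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(window_p_values))
```

Each raw p-value is scaled by (number of tests / rank), so windows with the smallest p-values are penalized the most, and no adjusted value exceeds 1.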
Outputs: A new directory 'Transmission_plots' containing one graph per xml file, showing the positional percent transmission for each window, the regression, and ear statistics. Also outputs 'meta_df.txt', which contains overall ear statistics and p-values, and 'everything_df.txt', which contains all window calculations and per-window p-values for each xml file. The .txt files are tab-delimited so that later scripts can read them back into pandas dataframes.
Example plot: X400x417-3m1
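The sliding-window scan and per-window chi-square test described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the script's actual code: it assumes the expected fluorescent:nonfluorescent ratio is 1:1, that percent transmission means the percentage of fluorescent kernels in a window, and that kernels arrive as (x, is_fluor) pairs. The function names and dictionary keys are hypothetical, apart from window_mean, which the README mentions.

```python
import math


def chi_square_1to1(fluor, nonfluor):
    """Chi-square test of two counts against an ASSUMED 1:1 ratio (1 d.f.)."""
    total = fluor + nonfluor
    expected = total / 2
    stat = ((fluor - expected) ** 2 + (nonfluor - expected) ** 2) / expected
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)), so no extra library is needed.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def sliding_windows(kernels, width, step):
    """Scan kernels, given as (x, is_fluor) pairs, with a window along x.

    Returns one dict per non-empty window. Window edges are inclusive,
    so a kernel on a boundary is counted in both neighbouring windows.
    """
    xs = [x for x, _ in kernels]
    start, x_max = min(xs), max(xs)
    rows = []
    while start + width <= x_max:
        end = start + width
        labels = [is_fluor for x, is_fluor in kernels if start <= x <= end]
        if labels:  # skip windows that contain no kernels
            fluor = sum(labels)
            nonfluor = len(labels) - fluor
            stat, p_value = chi_square_1to1(fluor, nonfluor)
            rows.append({
                "window_mean": start + width / 2,
                "fluor": fluor,
                "nonfluor": nonfluor,
                "percent_transmission": 100 * fluor / len(labels),
                "chi2": stat,
                "p_value": p_value,
            })
        start += step
    return rows


if __name__ == "__main__":
    toy_ear = [(0, True), (10, True), (20, False),
               (30, True), (40, False), (50, False)]
    for row in sliding_windows(toy_ear, width=20, step=10):
        print(row)
```

Windows whose p_value falls below 0.05 would be the ones drawn in the contrasting color on the transmission plot.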

2. Family_Graphs.py

What it does:
Takes 'everything_df.txt' and creates a variety of plots based on male/female crosses.
Plots all xml files on single scatterplot with regression line.
Plots all files with male and female in 400s on scatterplot with regression line.
Plots all females in 400s on scatterplot with regression line.
Plots each male family in 400s with one or more lines on each graph corresponding to each xml file in family. Also plots regression lines.
Plots regression line from each male family colored based on expression.
Plots regression line from each male family colored based on known transmission defect or none.
This script accepts everything_df.txt, a starting value for the male family plots (must be in the 400s), and an ending value for the male family plots (must be in the 400s), for example 410 499. Defaults are set for the starting value (sv) and ending value (ev). It also requires Table8.tsv from the Data_for_Analysis folder.
-n will normalize x coordinates 'window_mean' for all graphs
-p will allow you to select path for saving files. Default is current path.
Outputs are plots saved as .png files. Male family plots are saved in a new directory 'Male Plots/' with the family as the plot name.
Example plots: everything_norm_plot, Everything_norm_400, Female 400s Norm Cross Plot, 417, Exp_Reg_Plot, Trans_Reg_Plot

3. Kern_Coord.py

What it does:
This script is optional and can be run at any point in the workflow.
Takes an xml file and plots the coordinates of each kernel, labeling each kernel as fluor or nonfluor. Saves the plots to a new directory with the xml file name as the plot name.
This script accepts a single xml file or a directory of xml files as arguments. Outputs scatterplots.
-p will allow you to select path to save files. Default is current path.
Example plot: X401x492-2m1

4. Male_plot_folder.py

This script allows you to 'drag and drop' desired xml files into a folder, process all of them, and output a male family plot based on everything in the folder. Input arguments are an xml file or a directory of xml files. Output is one male family plot. It is important to designate the path where you want the plot saved, because all graphs have the same title. Graphs can still be told apart by their keys, but that is inconvenient.

Modelling Analysis

This group of scripts must be run in this order. The scripts are similar to those above, but they generate 5 different models for each xml file. In each model, every kernel coordinate is randomly assigned as fluorescent or nonfluorescent.
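The random assignment behind the models can be sketched as below. This is an illustration under assumptions rather than the repository's code: it assumes each kernel is labeled independently with a 50% chance of being fluorescent, and the function name random_models and its parameters are hypothetical.

```python
import random


def random_models(coordinates, n_models=5, p_fluor=0.5, seed=None):
    """Build n_models copies of an ear with randomly assigned labels.

    coordinates: list of (x, y) kernel positions, which are kept as-is.
    p_fluor: ASSUMED probability that a kernel is labeled fluorescent.
    Returns a list of models, each a list of (x, y, is_fluor) tuples.
    """
    rng = random.Random(seed)  # a seed makes the models reproducible
    models = []
    for _ in range(n_models):
        model = [(x, y, rng.random() < p_fluor) for x, y in coordinates]
        models.append(model)
    return models


if __name__ == "__main__":
    toy_coordinates = [(i * 10, 5) for i in range(8)]
    for number, model in enumerate(random_models(toy_coordinates, seed=1), 1):
        labels = "".join("F" if is_fluor else "n" for _, _, is_fluor in model)
        print(f"model {number}: {labels}")
```

Because only the labels change, each model can be passed through the same windowing and chi-square steps as the real ear and compared against it.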

1. Model.py

This script is nearly identical to XML_to_ChiSquareTrasmPlot.py
What it does differently:
Creates 5 models by randomly assigning fluor or nonfluor to each kernel coordinate. Calculates the chi-square statistic for each model. Plots the positional percent transmission line (colored by p-value) for each model, with a regression line.
This script accepts an xml file or a directory of xml files, plus the window width and step size, as arguments. Defaults are set for width and step size. Optional arguments:
-tk --total_kernels sets the minimum number of kernels per ear. Default is 50, so any xml file with fewer than 50 kernels is skipped.
-n will normalize the X coordinates "window_mean" for all graphs
-p --path designates the path to save files. Default is current path.
-a will adjust the chi-square p-values for multiple comparisons using the Benjamini–Hochberg procedure.
Outputs a new directory 'Model_Transmission_plots/' with each model graph labeled with the file name. Also outputs 2 text files that later scripts read back into dataframes: 'meta_model_df.txt' contains overall ear statistics for each model, and 'everything_model_df.txt' contains window calculations and chi-square statistics for each ear. Again, the txt files are tab-delimited.
Example plot: X400x417-3m1_model

2. Fam_MODEL_Graphs.py

This script is nearly identical to Family_Graphs.py
What it does differently:
Creates scatterplots with regression line for each model and overlays them.
Plots all xml files on single scatterplot with regression line. --5 MODELS PER XML FILE
Plots all files with male and female in 400s on scatterplot with regression line. --5 MODELS PER XML FILE
Plots all females in 400s on scatterplot with regression line. --5 MODELS PER XML FILE
This script accepts 'everything_model_df.txt' as an argument. Outputs are saved graphs. Defaults are set for sv and ev.
-n will normalize x coordinates 'window_mean' for all graphs
-p will allow you to select path for saving files. Default is current path.
Example plots: everything_norm_MODEL_plot, Everything_400_norm_MODEL, Female MODEL norm 400s Cross Plot

Histograms

1. Histogram.py

From model_meta_df.txt or model_normalized_meta_df.txt, and meta_df.txt or meta_normalized_df.txt, creates 6 histograms of regression statistics (R-squared, slope, and p-value): 3 for all models and 3 for all normal xml files.
This script accepts model_meta_df.txt and meta_df.txt as arguments. The optional argument -p sets the path where plots are saved. Default is current path. With -f, histograms are made only for files with X4..x4.. crosses in their names. With -s, histograms are made only for the 492 male family.
Example histograms: Model_400sSlope_Hist, XML_400sSlope_Hist

2. Histogram_defect.py

From model_meta_df.txt and meta_df.txt, makes histograms for lines with and without a transmission defect, for both the normal xml files and the models. Also requires Table8.tsv as an argument. A new path for saving plots can be designated with -p; otherwise the current path is the default.
Example histograms: Defect_Slope_Hist, Model_defect_Slope_Hist, Model_nodefect_Slope_Hist, No_defect_Slope_Hist

Spatial Statistics

Early-stage scripts for analyzing the spatial distribution of kernels across the ear.

1. Spatial_stats.py

This is a first step toward statistically evaluating the spatial distribution of kernel coordinates. Accepts an xml file or a directory of xml files as arguments, then computes quadrat-based statistics for homogeneous planar point patterns. Output is a large dataframe of p-values and pseudo p-values for evaluation. I believe our data should instead be evaluated as an inhomogeneous Poisson process, so different statistics need to be applied, perhaps using the R spatstat package. More information on what this script involves can be found here:
https://nbviewer.jupyter.org/github/pysal/pointpats/blob/master/notebooks/Quadrat_statistics.ipynb
https://nbviewer.jupyter.org/github/pysal/pointpats/blob/master/notebooks/distance_statistics.ipynb
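To give a feel for what a quadrat-based statistic measures, here is a minimal pure-Python sketch. It is not the script's implementation, which uses pysal as covered in the notebooks linked above. It assumes a rectangular grid laid over the bounding box of the points, and the function name quadrat_chi_square is hypothetical.

```python
def quadrat_chi_square(points, nx=3, ny=3):
    """Chi-square statistic of quadrat counts against a uniform expectation.

    points: list of (x, y) coordinates.
    Returns (statistic, degrees_of_freedom, counts), where counts is an
    ny-by-nx grid of the number of points in each cell.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 so identical coordinates do not divide by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # min(...) keeps points on the far edge inside the last cell.
        col = min(int((x - x_min) / x_span * nx), nx - 1)
        row = min(int((y - y_min) / y_span * ny), ny - 1)
        counts[row][col] += 1

    # Under a homogeneous pattern every cell expects the same count.
    expected = len(points) / (nx * ny)
    stat = sum((c - expected) ** 2 / expected for row in counts for c in row)
    return stat, nx * ny - 1, counts


if __name__ == "__main__":
    even = [(i + 0.5, j + 0.5) for i in range(6) for j in range(6)]
    clustered = [(0, 0)] * 8 + [(9, 9)]
    print(quadrat_chi_square(even)[0])       # evenly spread points
    print(quadrat_chi_square(clustered)[0])  # strongly clustered points
```

A statistic near zero means the cell counts match a uniform spread, while a large value signals clustering. Note that the uniform expectation is exactly the homogeneity assumption questioned above.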

2. Export_xml_coord.py

This script takes a single xml file, or a directory of xml files, as an argument and outputs the kernel coordinates to a tab-delimited text file. This could potentially be used as input for R spatstat.
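Writing coordinates to a tab-delimited file can be sketched with the standard library alone. This is only an illustration: the column names (file, x, y, type) and the function name write_coordinates are hypothetical and may not match what Export_xml_coord.py actually writes, and the xml parsing step is omitted because the xml layout is not described here.

```python
import csv


def write_coordinates(rows, path):
    """Write kernel rows to a tab-delimited text file with a header line.

    rows: iterable of (file_name, x, y, kernel_type) tuples.
    The column names are ASSUMED for this sketch.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["file", "x", "y", "type"])
        writer.writerows(rows)


if __name__ == "__main__":
    toy_rows = [
        ("X401x492-2m1", 120, 45, "fluor"),
        ("X401x492-2m1", 133, 47, "nonfluor"),
    ]
    write_coordinates(toy_rows, "kernel_coordinates.txt")
```

A file with one header line and tab-separated columns like this can be loaded in R with read.delim before building a spatstat point pattern.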

Authors

Elyse Vischulis

Contributors

Cedar Warman, Oregon State University

About

Analyze corn kernels across an ear!
