Installing McDiff from GitHub

Step 1:

# Clone the repo and move into it
$ git clone https://github.com/NCBI-Hackathons/McDiff.git
$ cd McDiff

That's it! You're in the right folder and should be ready to run.

Configuration and dependencies

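System requirements: Python 3.6.5
Libraries: shapely, matplotlib, descartes, numpy, scipy (random is also used, but it is part of the Python standard library and needs no installation)

You can install the required Python libraries with pip. The repository also includes a requirements.txt file.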
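
Before running McDiff, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of McDiff; the function name `missing_packages` is hypothetical, and the package list is copied from this section.

```python
import importlib.util

# Third-party packages named in this README; `random` ships with Python itself.
REQUIRED = ["shapely", "matplotlib", "descartes", "numpy", "scipy"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Any name reported as missing can then be installed with pip.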

Testing

We tested McDiff with the sample files found in the main directory. These files are:

McDiff

A Monte Carlo Approach for Estimating Diffusion Coefficients
A cell biologist would use this to obtain the coefficient of free diffusion and the fraction of protein that accumulates at the region of interest within a cell.
$ python3 run_sims.py mask_file ROI_file data_file

Here, mask_file is the file containing the outline of the nucleus, ROI_file is the file for the region of interest, and data_file is the file containing the output of your FADD experiment.

Abstract

Cell biologists can study the recruitment of DNA repair proteins to sites of DNA damage in live cells by using laser micro-irradiation to induce damage, a technique known as Fluorescence Accumulation after DNA Damage (FADD). By monitoring the time-dependent accumulation of proteins at the site of damage, or region of interest (ROI), biologists aim to calculate the coefficient of free diffusion (D) of each molecule of interest within the nucleus. This code uses a Monte Carlo model to simulate particles that diffuse freely within a cell nucleus and become trapped at the ROI. It then searches for the diffusion coefficient (D) and the fraction of protein that accumulates (F) that best fit the data, scoring each simulated curve against the experimental data by its r-squared value, and creates a heatmap for the best fit.

Intro

Our focus is on Poly(ADP-ribose) polymerase 1 (PARP1), which is considered a first-responder protein at sites of DNA damage because it accumulates rapidly at those sites. This accumulation can be quantified through laser microirradiation, a protocol that induces a discrete site of laser damage. The protein of interest is fluorescently labeled and then tracked over time. The movement of these proteins varies depending on whether there are biochemical attractions or repulsions to the damaged DNA. However, random diffusion also describes the movement of a number of these proteins. For PARP1, one of our primary goals was to show that there is a discrete diffusion constant and mobility fraction after microirradiation, which provides evidence that similar proteins can also be modeled as a random walk after DNA damage.

What's the problem?

Before this program, biologists had to run either Mathematica or MATLAB scripts in which each simulation was run with manually entered coefficients. The biologist evaluated the quality of the fit to the experimental data by visual inspection. If the simulation did not align with the experimental data, the biologist changed the coefficient values, re-ran the script, and checked whether the new values fit better, a tedious and labor-intensive process. Our implementation runs simulations with multiple coefficients and calculates an r-squared value for each one to identify the best fit. The biologist can then use that simulation and obtain the corresponding coefficients together with the error of the fit.

Keywords

Free Diffusion Constant, Nucleus, Fluorescence Accumulation after DNA Damage (FADD)

How to use McDiff

$ python3 run_sims.py mask_file ROI_file data_file

Here, mask_file is the file containing the outline of the nucleus, ROI_file is the file for the region of interest, and data_file is the file containing the output of your FADD experiment.

Output

You will get a .png of the best-fit simulation plotted over your experimental data, along with the D and F coefficients and the r-squared error value of the best simulation.

The results of running the simulation on the provided test files are shown below.

Software Workflow Diagram

Methods

Warm-up:

We started by getting comfortable with the experiments that were being performed in the nucleus. We then reviewed the existing MATLAB and Mathematica scripts that originally ran this simulation, which lacked any automated search for optimal parameters.

Parsing/Setup functions:

We loaded the outline of the simulated nucleus and then populated the nucleus with a Gaussian-distributed collection of simulated particles. To do this, we create a square region around the nucleus and place 36,000 particles in the square. Each particle is checked for whether it lies inside or outside the nucleus, and the first 12,000 points that lie inside the nucleus are used for the simulation. This implementation is ~10x faster than placing points one at a time and checking each for inclusion in the nucleus.
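
The batch placement described above can be sketched as follows. This is a dependency-free illustration rather than McDiff's own code: it samples uniformly in the bounding square (an assumption; swap in another sampler if a different distribution is needed), uses a plain ray-casting test where McDiff uses Shapely, and draws another batch if the first one yields too few interior points. The names `point_in_polygon` and `initialize_particles` are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def initialize_particles(nucleus, n_keep=12000, n_batch=36000, rng=None):
    """Place points in a square around the nucleus and keep the first n_keep inside."""
    rng = rng or random.Random()
    xs = [p[0] for p in nucleus]
    ys = [p[1] for p in nucleus]
    # Square region that encloses the nucleus outline.
    x0, y0 = min(xs), min(ys)
    side = max(max(xs) - x0, max(ys) - y0)
    kept = []
    while len(kept) < n_keep:
        # Assumption: uniform sampling inside the square, one whole batch at a time.
        batch = [(x0 + side * rng.random(), y0 + side * rng.random())
                 for _ in range(n_batch)]
        kept.extend(p for p in batch if point_in_polygon(p[0], p[1], nucleus))
    return kept[:n_keep]
```

For example, `initialize_particles(outline, rng=random.Random(0))` returns 12,000 reproducible points inside `outline`, where `outline` is a list of (x, y) vertices.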

Simulation:

The simulation starts with all of the points in the nucleus; the region of interest (ROI) is then placed within the nucleus. We then calculate the percentage of particles that are considered immobile, because in some cells some proteins are not free to move. Within the ROI, we then calculate the percentage of particles that will not be read by the camera in the experiment because they are bleached. Once this setup is complete, the mobile particles can start to move.
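
One way to picture this setup step is to tag each particle as mobile or immobile and as visible or bleached. The sketch below is an illustration under assumptions: the parameter names `immobile_fraction` and `bleach_fraction`, the per-particle random draw, and the dictionary layout are all hypothetical and are not taken from McDiff's source.

```python
import random


def setup_particle_states(particles, in_roi, immobile_fraction, bleach_fraction, rng=None):
    """Tag each particle as mobile/immobile and visible/bleached.

    particles: list of (x, y) positions inside the nucleus.
    in_roi:    callable (x, y) -> bool, True inside the region of interest.
    """
    rng = rng or random.Random()
    states = []
    for x, y in particles:
        # Assumption: each particle is immobile with probability immobile_fraction.
        mobile = rng.random() >= immobile_fraction
        # Assumption: only particles that start inside the ROI can be bleached.
        bleached = in_roi(x, y) and rng.random() < bleach_fraction
        states.append({"pos": (x, y), "mobile": mobile, "visible": not bleached})
    return states
```

Only particles flagged as mobile would then be moved, and only visible ones would count toward the simulated signal.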

The movement rule was given to us by P-chem Guy, who created the original program and knows how the simulated particles should move. Each particle is moved in the positive or negative x and y directions, with the length of each move drawn from a Gaussian distribution. The program moves the particles at each timestep and records the number of particles within the ROI at each timestep.
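
The movement rule can be illustrated with a short random-walk sketch. It is not McDiff's implementation: the step size uses the standard free-diffusion relation (per-axis standard deviation sqrt(2·D·dt)), moves that would leave the nucleus are simply rejected (an assumed wall rule), and particles that enter the ROI are treated as trapped, as described in the abstract. The names `simulate`, `in_nucleus`, and `in_roi` are hypothetical.

```python
import math
import random


def simulate(positions, D, dt, n_steps, in_nucleus, in_roi, seed=None):
    """Random-walk the particles and return the ROI count after each timestep.

    in_nucleus, in_roi: callables (x, y) -> bool.
    """
    rng = random.Random(seed)
    positions = list(positions)
    # Particles that start inside the ROI are trapped from the beginning.
    trapped = [in_roi(x, y) for x, y in positions]
    sigma = math.sqrt(2.0 * D * dt)  # per-axis Gaussian step width
    counts = []
    for _ in range(n_steps):
        for i, (x, y) in enumerate(positions):
            if trapped[i]:
                continue
            nx = x + rng.gauss(0.0, sigma)
            ny = y + rng.gauss(0.0, sigma)
            if not in_nucleus(nx, ny):
                continue  # assumed wall rule: reject the move, particle stays put
            positions[i] = (nx, ny)
            if in_roi(nx, ny):
                trapped[i] = True
        # Trapped particles are exactly those inside the ROI.
        counts.append(sum(trapped))
    return counts
```

Because trapped particles never leave, the returned counts are non-decreasing, which gives an accumulation curve that can be compared with the experimental data.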

We ran into a roadblock when the simulation of movement within the nucleus took a long time (~10 minutes). We found that we could speed it up by almost 100x by using a vector library within the Shapely library. Shapely is used to outline the nucleus and simulate the walls.

Once we had a simulation of particles moving in the nucleus, we started work on plotting the nucleus simulation results.

Then we worked on importing data from the experiments and fitting simulated curves to the experimental curves. From there we could get an r-squared value and see which simulation, with its D and F values, gave the best-fitting curve. This is done by running a simulation for each combination of D and F values and selecting the one whose curve fits best.
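
The fitting step amounts to a grid search scored by r-squared. The sketch below assumes the usual coefficient of determination, 1 - SS_res / SS_tot, and takes a hypothetical `simulate_curve(D, F)` callable that returns a simulated curve sampled at the same time points as the data. It is an illustration, not the code in optimization.py.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed data and a simulated curve.

    Assumes the observed values are not all identical (otherwise SS_tot is zero).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def grid_search(observed, simulate_curve, d_values, f_values):
    """Try every (D, F) pair and return (best_r_squared, best_D, best_F)."""
    best = None
    for D in d_values:
        for F in f_values:
            score = r_squared(observed, simulate_curve(D, F))
            if best is None or score > best[0]:
                best = (score, D, F)
    return best
```

Keeping the score for every (D, F) pair, instead of only the best one, would also provide the grid of values needed for a heatmap.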

Output:

The user is then given a .png graph of the best fit, along with the D, F, and error values for reporting.

Installation options:

McDiff should be installed directly from GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details
