
Parallelization of the execution/code #1

Open · elolaine opened this issue Apr 11, 2018 · 0 comments

@elolaine (Collaborator):

At the moment, PhyloSofS is given a number of iterations for the reconstruction of the transcripts' phylogenies, and each visit of a forest structure (i.e., each computation of its lower bound) counts as one iteration. As the space of forest structures is explored, better phylogenies with lower costs are found, and the lower bound cut becomes more efficient.
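For concreteness, here is a minimal sketch of that loop. `next_forest_structure`, `lower_bound`, and `phylogeny_cost` are hypothetical stand-ins for PhyloSofS internals, stubbed with dummy values so the example runs:

```python
import random

def next_forest_structure():
    """Hypothetical stand-in: draw the next forest structure to visit."""
    return object()

def lower_bound(structure):
    """Hypothetical stand-in: cheap lower bound on the best cost
    reachable from this structure."""
    return random.uniform(0.0, 100.0)

def phylogeny_cost(structure, lb):
    """Hypothetical stand-in: exact cost of the best phylogeny for this
    structure (never below its lower bound)."""
    return lb + random.uniform(0.0, 10.0)

def explore(n_iterations):
    """Each visited structure (one lower-bound computation) counts as
    one iteration; the cut skips structures whose bound already meets
    or exceeds the best cost found so far."""
    best_cost = float("inf")
    for _ in range(n_iterations):
        structure = next_forest_structure()
        lb = lower_bound(structure)
        if lb >= best_cost:
            continue  # the lower bound cut
        best_cost = min(best_cost, phylogeny_cost(structure, lb))
    return best_cost

if __name__ == "__main__":
    print("best cost after 1000 iterations:", explore(1000))
```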

To save computing time, we should probably try to parallelize the search of the forest structure space. A naive way to do this is the following (a runnable sketch follows the list):

  • (0) Launch one short run (a few tens of iterations) starting from the widest forest structure. Save the best forest structure, the best associated phylogeny, and the best associated cost.
  • (1) Then launch n medium-length runs (a few hundred thousand iterations) in parallel, each starting from a random phylogeny and given the best cost found so far for the cut. One of the runs can be started from the best phylogeny instead of a random one. Each run will end up with its own best phylogeny and associated best cost.
  • (2) We can then launch n medium-length runs again in the same way, and loop over this procedure.
    The different runs will probably visit many identical forest structures that will systematically be eliminated by the lower bound cut. It remains to be seen whether it would be worthwhile to record the visited structures and avoid visiting them several times... since the computation of the lower bound is clearly fast, it may not be worth it.
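Here is a minimal sketch of this restart scheme, assuming a hypothetical `run_search(start, cost_bound, n_iter)` wrapper around the existing sequential search (stubbed with dummy values here so the example runs end to end):

```python
from multiprocessing import Pool
import random

N_WORKERS = 8           # n parallel medium-length runs
SHORT_ITERS = 50        # step (0): a few tens of iterations
MEDIUM_ITERS = 200_000  # steps (1)/(2): a few hundred thousand iterations
N_ROUNDS = 3            # how many times to loop over step (2)

def run_search(start, cost_bound, n_iter):
    """Placeholder for the sequential search: explore forest structures
    for n_iter iterations, pruning with cost_bound, and return the best
    (phylogeny, cost) found. Here it only returns a dummy pair."""
    return (start or "random-phylogeny", random.uniform(0.0, 100.0))

if __name__ == "__main__":
    # (0) one short run starting from the widest forest structure
    best_phylo, best_cost = run_search("widest", float("inf"), SHORT_ITERS)

    with Pool(N_WORKERS) as pool:
        for _ in range(N_ROUNDS):
            # (1)/(2) n medium runs: one seeded with the current best
            # phylogeny, the others with random phylogenies (start=None),
            # all pruning against the best cost found so far
            jobs = [(best_phylo, best_cost, MEDIUM_ITERS)]
            jobs += [(None, best_cost, MEDIUM_ITERS)] * (N_WORKERS - 1)
            for phylo, cost in pool.starmap(run_search, jobs):
                if cost < best_cost:
                    best_phylo, best_cost = phylo, cost
    print("best cost:", best_cost)
```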

A more refined procedure would be to make the n processes (a) write down the best phylogeny/cost each time they find a new one, and (b) read the best cost each time they are about to make a random jump, so that the lower bound cut is as efficient as it can be. This would require sharing I/O between the processes, maybe by using semaphores...?
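A minimal sketch of this variant, using Python's `multiprocessing.Value` whose built-in lock plays the role of the semaphore; `visit_one_structure` is again a hypothetical stand-in for one search step:

```python
from multiprocessing import Process, Value
import random

def visit_one_structure(bound):
    """Hypothetical stand-in for one search step: compute a structure's
    lower bound, apply the cut against `bound`, and return the cost of
    a complete phylogeny if one was reached, else None."""
    lb = random.uniform(0.0, 100.0)
    return lb + random.uniform(0.0, 10.0) if lb < bound else None

def search_worker(shared_best, n_iter):
    for _ in range(n_iter):
        # (b) read the shared bound before each jump, under the lock
        with shared_best.get_lock():
            bound = shared_best.value
        cost = visit_one_structure(bound)
        # (a) publish any improvement (the phylogeny itself could be
        # written to a shared file at the same point)
        if cost is not None and cost < bound:
            with shared_best.get_lock():
                if cost < shared_best.value:
                    shared_best.value = cost

if __name__ == "__main__":
    shared_best = Value("d", float("inf"))  # lock-protected best cost
    workers = [Process(target=search_worker, args=(shared_best, 10_000))
               for _ in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("best shared cost:", shared_best.value)
```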
