Prototype threaded writing interface and code for diginorm and filter-abund. #92

Closed
wants to merge 1,290 commits into from

ctb
Member

@ctb ctb commented Jul 29, 2013

No description provided.

Eric McDonald and others added 30 commits February 5, 2013 16:48
… compatibility wrapper method for hashtables.
Modified sweep-reads3.py to compute the estimated mem usage as #hts * ht size, instead of just estimated size of 1 ht
(Note: Still need to test in multithreaded operation.)
Create C++ exception for signalling when no more reads are available.
… with 'sed' may have corrupted some quality scores.)
…ed that <error.h> was a GNU extension. Usually I write my own version of 'error' but got lazy this time.)
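The sweep-reads3.py commit listed above changes the memory estimate from the size of a single hashtable to the number of hashtables multiplied by the per-table size. A minimal sketch of that arithmetic follows. It assumes a hypothetical helper named `estimate_mem_usage`, which is not taken from the khmer codebase, and assumes that every table is the same size.

```python
def estimate_mem_usage(n_tables: int, table_size: int) -> int:
    """Hypothetical helper: estimated total memory for a multi-table hash.

    Assumes each of the n_tables tables occupies table_size units
    (whatever unit the per-table size is measured in), so the total
    is n_tables * table_size rather than table_size alone.
    """
    if n_tables < 1 or table_size < 1:
        raise ValueError("n_tables and table_size must be positive")
    return n_tables * table_size


# Four tables of 1e9 units each: the estimate is 4e9 units, not 1e9.
print(estimate_mem_usage(4, 1_000_000_000))
```

The old estimate would have reported only one table's worth of memory, which understates usage by a factor equal to the number of tables.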
@mr-c
Contributor

mr-c commented Aug 20, 2013

Did someone volunteer to rewrite the _process_fn and SingleWriter/PairWriter logic into c++?

@ctb
Member Author

ctb commented Aug 20, 2013

On back burner for now.

@ctb
Member Author

ctb commented Aug 20, 2013

@cswelcher, could you take a look at the latest normalize-by-median on this branch? I merged in my prototype changes to yours, and I need a fresh set of eyes to make sure I didn't do something stupid. (Tests do pass, note.)

@mr-c
Contributor

mr-c commented Oct 11, 2013

@ctb I'll review this in more detail when it is merge-able :-)

@ged-jenkins

Can one of the admins verify this patch?

@mr-c
Copy link
Contributor

mr-c commented Oct 11, 2013

Not yet, jenkins.

The pull request builder needs the jenkins-ci.sh script to work. Let's wait for #167 to be merged into master first.

@mr-c
Contributor

mr-c commented Nov 12, 2013

@ctb can you resolve the merge conflicts so that Jenkins can test this?

@mr-c
Contributor

mr-c commented Jan 9, 2014

ping @ctb

@mr-c
Contributor

mr-c commented Jan 20, 2014

@ctb,

When this branch is merged it would close many issues:

#75 abundance-dist-inmem fails with small files and many threads
#23 Casava 1.8 pair checking in scripts/normalize-by-median, scripts/abund-filter and khmer/thread-utils
#39 (maybe?) Refactor 'Read' Class
#69 Encapsulate common read threading code in an API
#76 make filter-abund use the C++ threading code

Would you like me to get it mergeable with the master branch and review it?

@mr-c mr-c mentioned this pull request Jan 20, 2014
@ctb
Member Author

ctb commented Jan 21, 2014

On Mon, Jan 20, 2014 at 11:22:51AM -0800, Michael R. Crusoe wrote:

Would you like me to get it mergeable with the master branch and review it?

Right now the branch is not viable for performance reasons, so it needs some reasonably extensive work. It is not ready for merge at all.

cheers,

--titus

C. Titus Brown, ctb@msu.edu

@camillescott
Member

@ctb and @mr-c,

As you (maybe) know, this semester I'm taking the parallel programming course. A big part of the course is a final project, and I've opted for mine to be fixing/implementing khmer's multithreading capabilities. It seems like a good project for me, as (a) it's useful, (b) it's doable, and (c) I know the codebase. As of @ctb's last comment, this particular pull request "needs some reasonably extensive work"; consider this my taking on that reasonably extensive work.

My brief writeup for that project can be found here: https://github.com/camillescott/fs2014-cse891/tree/master/final

My goals here are:

  • Enable full multithreading support for hashing and querying
  • In line with the former goal, implement the necessary machinery in C++ land for fully-threaded diginorm
  • Depending on time and doability constraints, implement at least a prototype of a distributed memory hashtable/countinghash/etc
  • Contingent on the previous goal, an MPI implementation that is largely compatible with the current traversal framework for partitioning

The first two goals fit nicely into existing work; the MPI stuff will need further discussion and its own pull request when the time comes. For now, I'm going to start with this code (though I first need to fix the total horking it took from the new project structure). I'd like to sit down with you two and @luizirber when everyone is back in town to go over all the existing parallelization work and which approaches are good and bad (i.e., what has failed in the past).

Thoughts?

@ged-jenkins

Test FAILed.

@camillescott camillescott mentioned this pull request Nov 14, 2014
@ctb
Member Author

ctb commented Jan 18, 2015

I think we can close this now ;)

@ctb ctb closed this Jan 18, 2015