
-m option not working properly with large values #70

Closed
ekopylova opened this Issue Feb 18, 2015 · 4 comments

ekopylova commented Feb 18, 2015

SortMeRNA uses memory mapping (mmap) to load reads into RAM. Memory mapping works with contiguous blocks of memory. Some users are reporting issues when setting the -m parameter to larger values, where the program appears to stall on the following mmap call:

char* raw = (char*)mmap ( 0, partial_file_size, PROT_READ, MAP_SHARED, fd, offset_map );

shaman-narayanasamy commented Jul 7, 2015

Sorry, but I am not sure if I should write in this thread or open a new one.

Anyhow, I am facing an issue with this parameter as well. While playing around with the values, we noticed that if one supplies a value of more than half the available memory (as displayed in "-help"), the program throws an error. I don't know if this information helps in any way, but I will keep you posted if I find any other strange behaviour...

ekopylova commented Jul 7, 2015

Hi Shaman,
Thanks for reporting this observation. I should have time to look at this bug next month, just finishing up with some urgent projects right now.

shaman-narayanasamy commented Jul 9, 2015

@ekopylova, no worries. Please do take your time. Meanwhile, I do have some additional observations to report.

I attempted running sortmerna on a metatranscriptomic Illumina paired-end data set of size 7.9 GB x 2 (R1 and R2). I have access to a machine with rather large memory (48 GB) and 12 cores. As mentioned in the previous post, I attempted providing the full memory (-m 48000) and it failed with errors. Therefore, I tried two different scenarios.

  1. First scenario: Using -m 24000, sortmerna runs without any errors/warnings, but it runs for a long time. It didn't complete even after more than TWO days! There must be something that went wrong here...
  2. Second scenario: Using a smaller -m 4000, sortmerna completed. Output seems to be good with ~54% of the data containing rRNA reads.

On a side note: I have a bunch of data sets like these (of similar size) and most of them ran through sortmerna with no issues whatsoever. I don't really know why this particular data set had an issue. I noticed that most of the other data sets that passed through with the default -m 1000 parameter seemed to contain fewer rRNA reads (<50%) compared to the one that failed. I don't know if this makes sense, but I am just reporting what I found. Let us know what you think. FYI, I would need to recheck the data sets in detail to report more on this. Let me know if you need more information/logs, etc.

ekopylova commented Aug 31, 2015

Hi Shaman,
I believe the error is fixed; my tests pass for -m values of n*1024 for n up to ~23 (e.g. -m 24000 results in ~24 GB peak virtual memory use).
Can you pull the latest master code and retry with your data? Thanks!
