Memory leak: Bio::DB::HTS::Tabix - very reproducible #17
Shoving a quick |
OK, I'll have a look.
Cheers,
Rishi
On 18/05/2016 15:55, Andrew Yates wrote:
|
Okay think I've got it. |
Think this is solved by PR #18 |
Problems in the cleanup code
|
Looks good now 👍 |
Closing then |
:( There's still something very tiny leaking. It doesn't produce a visible memory increase between 10 and 100 iterations, so I didn't spot it; Perl itself appears to use ~44MB. It isn't as critical because the leak is so much smaller (50MB at 1000 iterations vs 600MB previously).
|
It looks like there is the same string memory issue in the header function as well, I'll update. |
This should be fixed in #21 |
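For readers unfamiliar with the reusable string buffer involved here, the following is a minimal sketch in plain C of the ownership rule, not the actual change made in #21. It assumes a hypothetical `mini_kstring` type modelled on htslib's `kstring_t` (length, capacity, data pointer); `mini_set`, `mini_free`, `read_lines_reusing`, `read_lines_leaky`, and the `buffers_allocated` counter are all invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for htslib's kstring_t: length, capacity, data. */
typedef struct { size_t l, m; char *s; } mini_kstring;

/* Illustration-only counter of heap buffers that are currently live. */
int buffers_allocated = 0;

/* Copy text into ks, growing the buffer only when it is too small. */
int mini_set(mini_kstring *ks, const char *text) {
    size_t need = strlen(text) + 1;
    if (need > ks->m) {
        int was_empty = (ks->s == NULL);
        char *grown = realloc(ks->s, need);
        if (grown == NULL) return -1;
        if (was_empty) buffers_allocated++;
        ks->s = grown;
        ks->m = need;
    }
    memcpy(ks->s, text, need);
    ks->l = need - 1;
    return 0;
}

/* Release the buffer and reset the struct so it can be reused safely. */
void mini_free(mini_kstring *ks) {
    if (ks->s != NULL) {
        free(ks->s);
        buffers_allocated--;
    }
    ks->l = 0;
    ks->m = 0;
    ks->s = NULL;
}

/* Reuse one buffer for every line and free it exactly once at the end. */
int read_lines_reusing(const char **lines, int n) {
    mini_kstring ks = {0, 0, NULL};
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (mini_set(&ks, lines[i]) != 0) break;
        total += (int)ks.l;
    }
    mini_free(&ks);
    return total;
}

/* Leaky variant: a fresh struct per line whose buffer is never freed. */
int read_lines_leaky(const char **lines, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        mini_kstring ks = {0, 0, NULL};
        if (mini_set(&ks, lines[i]) != 0) break;
        total += (int)ks.l;
        /* ks goes out of scope here while ks.s is still allocated */
    }
    return total;
}
```

Both functions return the same total, but only the first leaves `buffers_allocated` at zero. Whoever declares the buffer owns it and has to free it once, after the last read.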
There's still a leak somewhere but I think you've plugged another one @rishidev. My local run using your code version shows a much smaller increase than @keiranmraine had. But there is still an increase.
There's about a 14MB increase between 500 and 10,000 iterations and another 14MB going from 10k to 20k records. If I reduce the search space there's still an increase, but the amounts involved are much smaller (I tried a smaller area over 500 iterations and then one that returns nothing over 20K iterations).
Altering the amount returned does have an effect. I've tried a few things to plug other possible leaks in the code including:
|
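Figures like the ones quoted above can be collected from inside the process as well as with external tools. The sketch below is one possible way to do that in C with POSIX `getrusage`; it is not how the numbers in this thread were measured. `peak_rss`, `rss_growth`, `leaky_work`, and `clean_work` are hypothetical names, and the two workloads are stand-ins for a real query loop.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Peak resident set size of this process so far, or -1 on failure.
   Units are platform-dependent: kilobytes on Linux, bytes on macOS. */
long peak_rss(void) {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_maxrss;
}

/* Run work() `iterations` times and report how much the peak RSS grew. */
long rss_growth(void (*work)(void), int iterations) {
    long before = peak_rss();
    for (int i = 0; i < iterations; i++) work();
    return peak_rss() - before;
}

/* Storing the pointer here stops the compiler from optimising the
   allocations away. */
void *volatile leak_sink;

/* Hypothetical workload that leaks 1 MiB per call. */
void leaky_work(void) {
    char *block = malloc(1 << 20);
    if (block != NULL) memset(block, 1, 1 << 20);
    leak_sink = block;
}

/* Hypothetical workload that frees what it allocates. */
void clean_work(void) {
    char *block = malloc(1 << 20);
    if (block != NULL) memset(block, 1, 1 << 20);
    leak_sink = block;
    free(block);
}
```

Calling `rss_growth` with the same workload at two iteration counts (say 500 and 10,000) gives a rough growth-per-iteration figure. Because the peak never decreases, a leak shows up as growth that keeps scaling with the iteration count, whereas a clean workload levels off.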
Example XS code that I mentioned before that cuts out any internal Perl stuff (apart from XS). I assume it's easy enough to convert into a standalone C program if need be.
```
void
hts_open(packname, filename, region, repeats)
    char * packname
    char * filename
    char * region
    int repeats
  PREINIT:
    htsFile * htsfile;
    tbx_t * tabix;
    hts_itr_t * iter;
    kstring_t str = {0,0,0};
    /* mode is fixed here rather than passed in: XS only allows default
       values on the right-most parameters, and declaring it twice
       would not compile */
    char * mode = "r";
    SV * line;
  CODE:
    printf("%s | %s | %d\n", filename, region, repeats);
    htsfile = hts_open(filename, mode);
    tabix = tbx_index_load(filename);
    int i = 0;
    for( ; i <= repeats; i++) {
        iter = tbx_itr_querys(tabix, region);
        //printf("Iteration %d | ", i);
        int linecount = 0;
        /* str is reused for every record; tbx_itr_next grows it as needed */
        while(tbx_itr_next(htsfile, tabix, iter, &str) >= 0) {
            //printf("\t%s\n", str.s);
            //line = newSVpvn(str.s, str.l);
            linecount++;
            //free(str.s);
        }
        //printf("Lines: %d\n", linecount);
        tbx_itr_destroy(iter);
        //free(str.s);
    }
    /* free the shared line buffer once, after the last query */
    free(str.s);
    hts_close(htsfile);
    tbx_destroy(tabix);
```
|
And the C binary version of that code (which I assume is actually doing the right thing).
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "config.h"
#include "htslib/kstring.h"
#include "htslib/hts.h"
#include "htslib/tbx.h"

int main(int argc, char *argv[])
{
    //parsing cmd line
    if (argc < 4) {
        fprintf(stderr, "usage: %s <file> <region> <repeats>\n", argv[0]);
        return 1;
    }
    char *endptr, *file, *region;
    file = argv[1];
    region = argv[2];
    int repeats;
    repeats = (int)strtol(argv[3], &endptr, 10);

    // Start of local decs
    htsFile * htsfile;
    tbx_t * tabix;
    hts_itr_t * iter;
    kstring_t str = {0,0,0};
    char * mode = "r";

    printf("%s | %s | %d\n", file, region, repeats);
    htsfile = hts_open(file, mode);
    tabix = tbx_index_load(file);
    if (htsfile == NULL || tabix == NULL) {
        fprintf(stderr, "could not open %s or its tabix index\n", file);
        return 1;
    }

    int i = 0;
    for( ; i <= repeats; i++) {
        iter = tbx_itr_querys(tabix, region);
        //printf("Iteration %d | ", i);
        int linecount = 0;
        // str is reused for every record; tbx_itr_next grows it as needed
        while(tbx_itr_next(htsfile, tabix, iter, &str) >= 0) {
            //printf("\t%s\n", str.s);
            linecount++;
            //free(str.s);
        }
        //printf("Lines: %d\n", linecount);
        tbx_itr_destroy(iter);
        //free(str.s);
    }
    // free the shared line buffer once, after the last query
    free(str.s);
    hts_close(htsfile);
    tbx_destroy(tabix);
    return 0;
}
```
Compiled in the libhts directory with
There's no leak from htslib as far as I can tell. There's something in the XS layer that we're not understanding. |
I believe PR #22 solves the issue. Seems we were returning from the subroutine too early. I've checked it on my test env and memory usage is pretty static. Fingers crossed! |
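The "returning too early" failure mode can be illustrated outside XS. The following is a sketch in plain C of the general pattern only, not the code changed in PR #22; `tracked_alloc`, `tracked_free`, `first_char_leaky`, `first_char_fixed`, and the `live_allocs` counter are hypothetical names introduced so the leak is observable without extra tooling.

```c
#include <stdlib.h>
#include <string.h>

/* Illustration-only counter of allocations that have not been freed. */
int live_allocs = 0;

void *tracked_alloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_allocs++;
    return p;
}

void tracked_free(void *p) {
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* Leaky: the early return on an empty line skips the cleanup below. */
int first_char_leaky(const char *line) {
    char *copy = tracked_alloc(strlen(line) + 1);
    if (copy == NULL) return -1;
    strcpy(copy, line);
    if (copy[0] == '\0')
        return -1;              /* returns before tracked_free: copy leaks */
    int c = copy[0];
    tracked_free(copy);
    return c;
}

/* Fixed: every path after the allocation goes through one cleanup label. */
int first_char_fixed(const char *line) {
    int result = -1;
    char *copy = tracked_alloc(strlen(line) + 1);
    if (copy == NULL) return -1;
    strcpy(copy, line);
    if (copy[0] == '\0') goto cleanup;
    result = copy[0];
cleanup:
    tracked_free(copy);
    return result;
}
```

A leak of this shape only appears on the path that takes the early exit, so it can go unnoticed until that path is hit many times. That is consistent with the leaks in this thread only becoming visible at high iteration counts.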
Thanks Andy |
Going to mark as closed. @keiranmraine if you find another memory leak please re-open (please please please don't, otherwise I'm going to spend the entire weekend listening to The Smiths songs and being depressed) |
Close, but hopefully easy to fix this time... Is it possible to add an automatic destroy for the query iterator? There was still a leak evident when you pushed into high numbers: Before adding a
After adding:
This isn't in the docs so all code using the module would need an upgrade to correct if it can't be handled in the background. It would also be quite difficult to communicate. |
It's already there ... https://github.com/Ensembl/Bio-HTS/blob/master/lib/Bio/DB/HTS/Tabix/Iterator.pm#L69. I'm sure it's called. Okay just taken the |
Okay there's a fix. Using |
Thanks again! |
Use the following small bit of code to reproduce with a tabix indexed BED file:
Example runs (coordinate must hit records):
Seems a very similar problem to
Bio::DB::HTS::Faidx
This is pretty critical as we've discovered this in the middle of a pre-release test cycle