Feature request: iterator #11

Closed
dzerbino opened this issue Nov 14, 2016 · 6 comments

Comments

@dzerbino
Contributor

Hello,

I have developed a library, WiggleTools, that computes whole-genome statistics from multiple BigWig files. To do this it needs to be memory-efficient, and it therefore uses iterators extensively.

WiggleTools uses the Kent source tree, but this entails quite a few dependencies that I would like to get rid of. I would be very keen to switch to libBigWig.

Would it be possible to create iterator functions within libBigWig?

Typically, an iterator could be created either as a whole genome iterator, or over a region of interest.

Instead of returning all results in a single bwOverlappingIntervals_t struct, it could return a sequence of bwOverlappingIntervals_t structs, each covering a number of consecutive compressed blocks on disk. FWIW, my code currently looks like this (using Kent functions):

	struct fileOffsetSize *blockList, *block, *beforeGap, *afterGap;

	// Search for linked list of blocks overlapping region of interest
	blockList = bbiOverlappingBlocks(file_handle, search_tree, chrom, start, finish, NULL);

	for (block = blockList; block; block = afterGap) {
		/* Read contiguous blocks into mergedBuf. */
		fileOffsetSizeFindGap(block, &beforeGap, &afterGap);

		// Little hack to limit the number of blocks read at any time
		struct fileOffsetSize *blockPtr, *prevBlock;
		int blockCounter = 0;
		prevBlock = block;

		// Count max blocks or until you hit a gap in the disk
		for (blockPtr = block; blockPtr != afterGap && blockCounter < MAX_BLOCKS; blockPtr = blockPtr->next) {
			blockCounter++;
			prevBlock = blockPtr;
		}

		// If you stopped before the gap, pretend you hit a gap
		if (blockCounter == MAX_BLOCKS) {
			beforeGap = prevBlock;
			afterGap = blockPtr;
		}

		// Total bytes spanned by the run, from the first block to the end of the last
		bits64 mergedSize = beforeGap->offset + beforeGap->size - block->offset;

		if (downloadBlockRun(data, chrom, block, afterGap, mergedSize)) {
			slFreeList(blockList);
			return true;
		}
	}
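
The batching logic above may be easier to follow without the Kent types. Below is a minimal, self-contained sketch of the same idea. Everything in it is hypothetical: `struct block` stands in for `struct fileOffsetSize`, `next_run` and `run_size` are illustrative names, and it assumes a "gap" means the next block does not start exactly where the current one ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Kent's struct fileOffsetSize. */
struct block {
	struct block *next;
	uint64_t offset;	/* byte offset of the block on disk */
	uint64_t size;		/* size of the block in bytes */
};

/*
 * Find the run that starts at `first`: blocks that are adjacent on disk,
 * capped at `max_blocks` (must be >= 1; `first` must not be NULL).
 * Stores the last block of the run in *last and returns the first block
 * after the run, or NULL when the list is exhausted.
 */
static struct block *next_run(struct block *first, int max_blocks, struct block **last)
{
	struct block *cur = first;
	int count = 1;

	while (cur->next != NULL
	       && count < max_blocks
	       && cur->next->offset == cur->offset + cur->size) {
		cur = cur->next;
		count++;
	}
	*last = cur;
	return cur->next;
}

/* Number of bytes needed to fetch the whole run in a single read. */
static uint64_t run_size(const struct block *first, const struct block *last)
{
	return last->offset + last->size - first->offset;
}
```

A caller would loop with `for (b = list; b; b = next_run(b, MAX_BLOCKS, &last))`, reading `run_size(b, last)` bytes per pass, so that at most `MAX_BLOCKS` blocks are held in memory at once.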

Thanks in advance for considering my request,

Daniel

@dpryan79
Owner

dpryan79 commented Nov 17, 2016

There's now an iterators branch with a possible implementation of this. I currently have iterators for intervals (bigWig files) and entries (bigBed files), so let me know if you'd like base-level values (bigWig files) as well. There's a full example here, but the gist is:

iter = bbOverlappingEntriesIterator(fp, chrom, start, end, withString, blocksPerIteration);
while(iter->data) {
    //do something with entries in iter->entries
    iter = bwIteratorNext(iter);
}
bwIteratorDestroy(iter);

The code is the same for bigWig intervals, except bwOverlappingIntervals() and iter->intervals are used.
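
To make the loop contract concrete (create, test `data`, advance, destroy), here is a toy, self-contained iterator over an in-memory array. It is only a sketch of the pattern: none of the `toy_*` names belong to libBigWig, and it assumes the iterator preloads its first batch on creation and signals exhaustion by setting `data` to NULL.

```c
#include <stdlib.h>

/* Hypothetical toy iterator; not part of libBigWig. */
typedef struct {
	const float *values;	/* full data set (stands in for the file) */
	size_t total;		/* number of values in the data set */
	size_t pos;		/* index of the next unread value */
	size_t batch;		/* maximum values handed out per iteration */
	const float *data;	/* current batch, NULL once exhausted */
	size_t n;		/* number of values in the current batch */
} toy_iter;

/* Advance to the next batch; sets data to NULL when nothing is left. */
static toy_iter *toy_iter_next(toy_iter *it)
{
	size_t left = it->total - it->pos;

	if (left == 0) {
		it->data = NULL;
		it->n = 0;
		return it;
	}
	it->n = left < it->batch ? left : it->batch;
	it->data = it->values + it->pos;
	it->pos += it->n;
	return it;
}

/* Create an iterator and preload its first batch. Returns NULL on error. */
static toy_iter *toy_iter_create(const float *values, size_t total, size_t batch)
{
	toy_iter *it = calloc(1, sizeof *it);

	if (it == NULL || batch == 0) {
		free(it);
		return NULL;
	}
	it->values = values;
	it->total = total;
	it->batch = batch;
	return toy_iter_next(it);
}

static void toy_iter_destroy(toy_iter *it)
{
	free(it);
}

/* Example consumer: sum all values while holding one batch at a time. */
static double toy_sum(const float *values, size_t total, size_t batch)
{
	double sum = 0.0;
	toy_iter *it = toy_iter_create(values, total, batch);

	if (it == NULL)
		return 0.0;
	while (it->data) {
		for (size_t i = 0; i < it->n; i++)
			sum += it->data[i];
		it = toy_iter_next(it);
	}
	toy_iter_destroy(it);
	return sum;
}
```

The caller's memory footprint is bounded by the batch size rather than by the size of the queried region, which is the property the original request asks for.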

If that sort of interface works for you then I'll start writing the documentation.

@dzerbino
Contributor Author

This looks great! I'm afraid I won't be able to test it directly in my own code until next week, but the API seems sensible enough.

When you say base-level values, do you mean splitting the intervals into 1bp intervals? I personally wouldn't use it.

@dpryan79
Owner

Yes, that's exactly what I mean by base-level values. That's popular in the python wrapper (pyBigWig), but if no one currently has a use for it in C then I'll hold off on the implementation.
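
For anyone who does need base-level values in C in the meantime, the expansion can be done on the caller's side. The sketch below is self-contained and makes several assumptions: `expand_to_bases` is a hypothetical helper (not a libBigWig function), intervals are 0-based and half-open as in bigWig coordinates, and bases with no coverage are reported as NaN.

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: expand n intervals [starts[i], ends[i]) with values[i]
 * into one float per base over the query range [start, end).
 * Uncovered bases are set to NaN. Returns a malloc'd array of (end - start)
 * floats that the caller must free, or NULL on an empty range or allocation
 * failure.
 */
static float *expand_to_bases(const uint32_t *starts, const uint32_t *ends,
                              const float *values, size_t n,
                              uint32_t start, uint32_t end)
{
	if (end <= start)
		return NULL;

	size_t len = (size_t)(end - start);
	float *out = malloc(len * sizeof *out);
	if (out == NULL)
		return NULL;

	/* Start with every base marked as uncovered. */
	for (size_t i = 0; i < len; i++)
		out[i] = NAN;

	for (size_t i = 0; i < n; i++) {
		/* Clip each interval to the query range before filling. */
		uint32_t s = starts[i] > start ? starts[i] : start;
		uint32_t e = ends[i] < end ? ends[i] : end;

		for (uint32_t p = s; p < e; p++)
			out[p - start] = values[i];
	}
	return out;
}
```

Note that this uses memory proportional to the length of the region, which is why it sits awkwardly with a memory-bounded iterator; it is best applied per batch over small windows.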

Anyway, I'll try to get some documentation finished. This seems to work in my tests, so let me know when you have a chance to play around with it. If it works for your needs I'll merge it in and make a new release.

@dzerbino
Contributor Author

Thanks a lot. At the earliest, I can play with it a week from Friday. I'll let you know how it goes!

@dzerbino
Contributor Author

OK, curiosity killed the cat: I just tested it on a separate branch, and all unit tests succeeded:
https://github.com/Ensembl/WiggleTools/tree/libBigWig

Thanks a lot!

@dpryan79
Owner

Great! I'll make a new release then (probably tomorrow).
