[ADAM-384] Adds import from FASTQ. #385
Resolves #384. Adds:
Loading from single-ended FASTQ and interleaved FASTQ is handled seamlessly by the ADAMContext.adamLoad method. Since paired-end (but non-interleaved) data requires two file paths, I haven't added it to adamLoad; thus, it sits on its own.
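To make the three input layouts concrete, here is a minimal standalone sketch in Python. It is not the code in this PR and does not use ADAM or Hadoop; `FastqRecord`, `parse_fastq`, `pair_interleaved`, and `pair_two_files` are hypothetical names chosen for illustration. It assumes the common four-line FASTQ layout (header, sequence, `+` separator, qualities) and shows why the two-file paired case needs two inputs while the interleaved case needs only one.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four lines: @name, sequence, + separator, qualities."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) != 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {chunk!r}")
        yield FastqRecord(chunk[0][1:], chunk[1], chunk[3])


def pair_interleaved(lines: Iterable[str]) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Interleaved layout: one input, mates stored as consecutive records."""
    records = parse_fastq(lines)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("interleaved FASTQ has an odd number of records")
        yield first, second


def pair_two_files(
    lines1: Iterable[str], lines2: Iterable[str]
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Non-interleaved paired layout: two inputs, mates matched by position."""
    for first, second in zip_longest(parse_fastq(lines1), parse_fastq(lines2)):
        if first is None or second is None:
            raise ValueError("paired FASTQ inputs have different record counts")
        yield first, second
```

Single-ended data only needs `parse_fastq`; the two pairing helpers differ only in where the second mate comes from.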
I wrote our own FASTQ input format instead of using the one in Hadoop-BAM; theirs is only compatible with Hadoop 1, performs unnecessary parsing, and doesn't seem to pick splits correctly anyway. The SingleFastqInputFormat is almost a direct copy of the InterleavedFastqInputFormat, which we've tested pretty well on clusters.
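For context on why split picking is easy to get wrong: a split can begin in the middle of a record, and `@` is also a valid quality character, so a line starting with `@` is not necessarily a header. The sketch below shows one common heuristic for four-line FASTQ, which is to accept a line as a header only if it starts with `@` and the line two positions later starts with `+`. This is an illustration in Python over a list of lines, not necessarily what the input formats in this PR do; `first_record_start` is a hypothetical helper, and a real input format works on byte offsets rather than pre-split lines.

```python
from typing import List, Optional


def first_record_start(lines: List[str]) -> Optional[int]:
    """Return the index of the first line that begins a FASTQ record.

    Assumes four-line records. A quality line may start with '@', but the
    line two after it is a sequence line, which never starts with '+'.
    """
    for i in range(len(lines) - 2):
        if lines[i].startswith("@") and lines[i + 2].startswith("+"):
            return i
    return None


# A split that begins on a quality line which happens to start with '@'.
chunk = ["@@II", "@r2", "ACGT", "+", "IIII"]
start = first_record_start(chunk)
```

An interleaved format additionally has to decide whether the record it lands on is the first or second mate of a pair, which this sketch does not attempt.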