ADAM Changelog

Version 0.23.0

Closed issues:

  • Readthedocs build error /#1854
  • Add pip release to release scripts /#1847
  • Publish scaladoc script still attempts to build markdown docs /#1845
  • Allow variant annotations to be loaded into genotypes /#1838
  • Specify correct extensions for SAM/BAM output /#1834
  • Fix link anchors and other issues in readthedocs /#1822
  • Sphinx fulltoc is not included /#1821
  • Readme link to bigdatagenomics/lime 404s /#1819
  • Bump to Hadoop-BAM 7.9.1 /#1817
  • LoadVariants Header Format /#1815
  • Right and Left Outer Shuffle Region Join don't match /#1813
  • Pipe command can fail with empty partitions /#1807
  • adam files with outdated formats throw FileNotFoundException /#1804
  • Move GenomicRDD.writeTextRDD outside of GenomicRDD /#1803
  • find-adam-assembly fails to recognize more than 1 jar /#1801
  • tests/testthat.R failed on git head /#1799
  • Run python and R tests conditionally in build /#1795
  • scala-lang should be a provided dependency /#1789
  • loadIndexedBam does an unnecessary union /#1784
  • Release bdgenomics.adam R package on CRAN /#1783
  • Issue with transformVariant // Adam to vcf /#1782
  • Add code of conduct /#1779
  • Reinstantiation of SQLContext in pyadam ADAMContext /#1774
  • Genotypes should only contain the core variant fields /#1770
  • Add SingleFASTQInFormatter /#1768
  • INDEL realigner can emit negative partition IDs /#1763
  • Request for a new release /#1762
  • INDEL realigner generates targets for reads with more than 1 INDEL /#1753
  • Fragment Issue /#1752
  • Variant Caller!!! /#1751
  • Spark Version!! /#1750
  • ReferenceRegion.subtract eliminating valid regions /#1747
  • New Shuffle Join Implementation - Left Outer + Group By Left /#1745
  • command failure after build success /#1744
  • Recalibrate_base_Qualities /#1743
  • Standardize regionFn for ShuffleJoin returned objects /#1740
  • Shuffle, Broadcast Joins with threshold /#1739
  • Adam on Spark 2.1 /#1738
  • Opening up permission on GenericGenomicRDD constructor /#1735
  • Consistency on ShuffleRegionJoin returns /#1734
  • vcf2adam support /#1731
  • Cloud-scale BWA MEM /#1730
  • Aligned Human Genome couldn't convert to Adam /#1729
  • Mark Duplicates /#1726
  • Genomics Pipeline /#1724
  • .fastq Alignment /#1723
  • Is it correct Adam file /#1720
  • .fastQ to .adam /#1718
  • Unable to create .adam from .sam /#1717
  • Add adam- prefix to distribution module name /#1716
  • Python load methods don't have ability to specify validation stringency /#1715
  • NPE when trying to map loadVariants over RDD /#1713
  • Add left normalization of INDELs as an RDD level primitive /#1709
  • Allow validation stringency to be set in AnySAMOutFormatter /#1703
  • InterleavedFastqInFormatter should sort by readInFragment /#1702
  • Allow silencing the # of reads in fragment warning in InterleavedFastqInFormatter /#1701
  • GenomicRDD.toXxx method names should be consistent /#1699
  • Exception thrown in VariantContextConverter.formatAllelicDepth despite SILENT validation stringency /#1695
  • Make GenomicRDD.toString more adam-shell friendly /#1694
  • Add adam-shell friendly VariantContextRDD.saveAsVcf method /#1693
  • change bdgenomics.adam package name for adam-python to bdg-adam /#1691
  • Conflict in bdg-formats dependency version due to org.hammerlab:genomic-loci /#1688
  • Convert and store variant quality field. /#1682
  • Region join shows non-determinism /#1680
  • Shuffle region join throws multimapped exception for unmapped reads /#1679
  • Push validation checks down to INFO/FORMAT fields /#1676
  • IndexOutOfBounds thrown when saving gVCF with no likelihoods /#1673
  • Generate docs from R API for distribution /#1672
  • Support loading a subset of VCF fields /#1670
  • Error with metadata: Multivalued flags are not supported for INFO lines /#1669
  • Include bdg.adam-0.23.0.tar.gz in distribution tarballs /#1668
  • Include bdgenomics.adam-0.23.0_SNAPSHOT-py2.7.egg in distribution tarball /#1667
  • Add SUPPORT.md file to complement CONTRIBUTING.md /#1664
  • Can't merge BAM files containing the same sample /#1663
  • Incorrect README.md kmer.scala loadAlignments method parameter name /#1662
  • Add performance benchmarks similar to Samtools CRAM benchmarking page /#1661
  • Transient bad GZIP header bug when loading BGZF FASTQ /#1658
  • bdgenomics.adam vs bdg.adam for R/Python APIs /#1655
  • Need adamR script /#1649
  • incorrect grep for assembly jars in bin/pyadam /#1647
  • VariantRDD union creates multiple records for the same SNP ID /#1644
  • S3 access documentation /#1643
  • Algorithms docs formatting /#1639
  • Building downstream apps docs reformatting /#1638
  • FastqInputFormat.FILE_SPLITTABLE in conf not getting passed properly /#1635
  • Add benchmarks to documentation /#1634
  • Intro docs contain outdated/incompatible code /#1633
  • Intro docs missing a number of active projects /#1632
  • Installation instructions for Homebrew missing from documentation /#1631
  • Architecture section is missing from docs /#1630
  • Seq vs. Seq with javac /#1625
  • ProcessingStep missing from adam-codegen /#1623
  • Add ADAM recipe to bioconda /#1618
  • adam-submit cannot find assembly jar if installed as symlink /#1616
  • Expose transform/transmute in Java/Python/R /#1615
  • Expose VariantContextRDD in R/Python /#1614
  • Expose pipe API from Python/R /#1611
  • Serialization issue with TwoBitFile /#1610
  • Snapshot Distribution Does not include jar files /#1607
  • ManualRegionPartitioner is broken for ParallelFileMerger codepath /#1602
  • VariantRDD doesn't save partition map /#1601
  • Scala copy method not supported in abstract classes such as AlignmentRecordRDD /#1599
  • Interleaved FASTQ recognizes only /1 suffix pattern /#1589
  • Use empty sequence dictionary when loading features /#1588
  • New Illumina FASTQ spec adds metadata to read name line /#1585
  • first run of ADAM /#1582
  • Add unit test coverage for BED12 parser and writer /#1579
  • Spark 1.x Scala 2.10 snapshot artifacts missing since 31 March 2017 /#1578
  • Unable to save GenomicRDDs after a join. /#1576
  • Add filterBySequenceDictionary to GenomicRDD /#1575
  • Unaligned Trait does nothing /#1573
  • Bump to bdg-formats 0.11.1 /#1570
  • PhredUtils conversion to log probabilities has insufficient resolution for PLs /#1569
  • Reference model import code is borked /#1568
  • SequenceDictionary vs Feature[RDD] of reference length features /#1567
  • giab-NA12878 truth_small_variants.vcf.gz header issues /#1566
  • VCF header read from stream ignored in VCFOutFormatter /#1564
  • VCF genotype Number=A attribute throws ArrayIndexOutOfBoundsException /#1562
  • Save compressed single file VCF via HadoopBAM /#1554
  • bucketing strategy /#1553
  • Is parquet using delta encoding for positions? /#1552
  • Export to VCF does not include symbolic non-ref if site has a called alt /#1551
  • Refactor filterByOverlappingRegions not to require a List /#1549
  • Move docs to Sphinx/pure Markdown /#1548
  • java.lang.IncompatibleClassChangeError: Implementing class /#1544
  • Support locus predicate in TransformAlignments /#1539
  • Visibility from Java, jrdd has private access in AvroGenomicRDD /#1538
  • Rename o.b.adam.apis.java package to o.b.adam.api.java /#1537
  • VCF header genotype reserved key FT cardinality clobbered by htsjdk /#1535
  • Compute a SequenceDictionary from a *.genome file /#1534
  • Queryname sorted check should check for queryname grouped as well /#1530
  • Bump to bdg-formats 0.11.0 /#1520
  • Move to Spark 2.2, Parquet 1.8.2 /#1517
  • Minor refactor for TreeRegionJoin for consistency /#1514
  • Allow +Inf and -Inf Float values when reading VCF /#1512
  • SparkFiles temp directory path should be accessible as a variable /#1510
  • SparkFiles.get expects just the filename /#1509
  • Split apart #1324 /#1507
  • Where can I find "Phred-scaled quality score" (QUAL)? /#1506
  • Alignment Record sort is not consistent with samtools /#1504
  • Sequence dictionary records in TwoBitFile are not stable /#1502
  • Move coverage counter over to Dataset API /#1501
  • Allow users to set the minimum partition count across all load methods /#1500
  • Enable reuse of broadcast object across broadcast region joins /#1499
  • Take union across genomic RDDs /#1497
  • Adam files created by vcf2adam is not recognizable /#1496
  • Scalatest log output disappears with Maven 3.5.0 /#1495
  • ArrayOutOfBoundsException in vcf2adam (spark2_2.11-0.22.0) on UK10K VCFs (VCFv4.1) /#1494
  • ReferenceRegion overlaps and covers returns false if overlap is 1 /#1492
  • Provide asSingleFile parameter for saveAsFastq and related /#1490
  • Min Phred score gets bumped by 33 twice in BQSR /#1488
  • Should throw error when BAM header load fails /#1486
  • Default value for reads.toCoverage(collapse) should be false /#1483
  • Refactor ADAMContext loadXxx methods for consistency /#1481
  • loadGenotypes three time /#1480
  • Fall back to sequential concat when HDFS concat fails /#1478
  • VCF line with . ALT gets dropped /#1476
  • ADAM works on Cloudera but does NOT work on MAPR /#1475
  • Clean up ReferenceRegion.scala /#1474
  • Allow joins on regions that are within a threshold (instead of requiring overlap) /#1473
  • FeatureRDD.toCoverage throws NullPointerException when there is no coverage information /#1471
  • Add quality score binner /#1462
  • Splittable compression and FASTQ /#1457
  • Don't convert .{different-type}.adam in loadAlignments and loadFragments /#1456
  • New primitives for adam-core /#1454
  • Port over code for populating SequenceDictionaries from .dict files /#1449
  • Ignore failed push to Coveralls during CI builds /#1444
  • No asSingleFile parameter for saveAsFasta in NucleotideContigFragmentRDD /#1438
  • shufflejoin and ArrayIndexOutOfBoundsException /#1436
  • Document using ADAM snapshot /#1432
  • Improve metrics coverage across ADAMContext load methods /#1428
  • loadReferenceFile missing from Java API /#1421
  • loadCoverage missing from Java API /#1420
  • Question: How to get paired-end alignmentRecord like RDD[AlignmentRecord, AlignmentRecordRDD]? /#1419
  • Clean up possibly unused methods in Projection /#1417
  • Problem loading SNPeff annotated VCF /#1390
  • RecordGroupDictionary should support isEmpty /#1380
  • Get rid of mutable collection transformations in ShuffleRegionJoin /#1379
  • Add tab5/6 as native output format for AlignmentRecordRDD /#1377
  • ValidationStringency in MDTagging should apply to reads on unknown references /#1365
  • Assembly final name doesn't include spark2 for Spark 2.x builds /#1361
  • Merge reads2fragments and fragments2reads into a single CLI /#1359
  • Investigate failures to load ExAC.0.3.GRCh38.vcf variants /#1351
  • adam-shell does not allow additional jars via Spark jars argument /#1349
  • Loading GZipped VCF returns an empty RDD /#1333
  • Bump Spark 2 build to Spark 2.1.0 /#1330
  • Rename Transform command TransformAlignments or similar /#1328
  • Replace ADAM2Vcf and Vcf2ADAM commands with TransformGenotypes and TransformVariants /#1327
  • FeatureRDD instantiation tries to cache the RDD /#1321
  • Repository for Pipe API wrappers for bioinformatics tools /#1314
  • Trying to get Spark pipeline working with slightly out of date code. /#1313
  • Support for gVCF merging and genotyping (e.g. CombineGVCFs and GenotypeGVCFs) /#1312
  • Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) /#1311
  • Don't include log4j.properties in published JAR /#1300
  • Removing ProgramRecords info when saving data to sam/bam? /#1257
  • ADAM on Slurm/LSF /#1229
  • Maintaining sorted/partitioned knowledge /#1216
  • Evaluate bdg-convert external conversion library proposal /#1197
  • Port AMPCamp Tutorial over /#1174
  • Top level WrappedRDD or similar abstraction /#1173
  • GFF3 formatted features written as single file must include gff-version pragma /#1169
  • Can probably eliminate sort in RealignIndels /#1137
  • Load SV type info field - need for allele uniqueness /#1134
  • BroadcastRegionJoin is not a broadcast join /#1110
  • AlignmentRecordRDD does not extend GenomicRDD per javac /#1092
  • Add generic ReferenceRegion pushdown for parquet files /#1047
  • Use of dataset api in ADAM /#1018
  • Difference running markdups with and without projection /#1014
  • ADAM to BAM conversion fails using relative path /#1012
  • Refactor SequenceDictionary to use Contig instead of SequenceRecord /#997
  • NoSuchMethodError due to kryo minor-version mismatch /#955
  • Autogen field names in projection package /#941
  • Future of schemas in bdg-formats /#925
  • genotypeType for genotypes with multiple OtherAlt alleles? /#897
  • How to filter genotype RDD with FeatureRDD /#890
  • How to convert genotype DataFrame to VariantContext DataFrame / RDD /#886
  • R language package for Adam /#882
  • How to count genotypes with a 10 node Spark/Adam cluster faster than with BCFTools on a single machine? /#879
  • Ensure Java API is up-to-date with Scala API /#855
  • BroadcastRegionJoin fails with unmapped reads /#821
  • Resolve Fragment vs. SingleReadBucket /#789
  • Updating/Publishing the docs/ directory /#774
  • Next on empty iterator in BroadcastRegionJoin /#661
  • Cleanup code smell in sort work balancing code /#635
  • Provide low-impact alternative to transform -repartition for reducing partition size /#594
  • Create an ADAM Python API /#538
  • Migrate serialization libraries out of ADAM core /#448
  • Create standardized, interpretable exceptions for error reporting /#420
  • Build info/version info inside ADAM-generated files /#188

Merged and closed pull requests:

  • [ADAM-1854] Add requirements.txt file for RTD. /#1856 (fnothaft)
  • [ADAM-1783] Resolve check issues that block pushing to CRAN. /#1849 (fnothaft)
  • [ADAM-1847] Update ADAM scripts to support self-contained pip install. /#1848 (fnothaft)
  • [ADAM-1845] Only build and publish scaladocs in publish-scaladoc.sh. /#1846 (heuermh)
  • [ADAM-1843] Install sources before calling scala:doc in publish scaladoc /#1844 (fnothaft)
  • Remove python and R profiles from release script /#1842 (heuermh)
  • [ADAM-1817] Bump to Hadoop-BAM 7.9.1. /#1841 (fnothaft)
  • [ADAM-1838] Make populating variant.annotation field in Genotype configurable /#1839 (fnothaft)
  • [ADAM-1834] Add proper extensions for SAM/BAM/CRAM output formats. /#1835 (fnothaft)
  • [ADAM-1822] Misc docs cleanup /#1827 (fnothaft)
  • Added missing init.py for fulltoc. /#1824 (fnothaft)
  • [ADAM-1821] Add missing fulltoc for Sphinx documentation. /#1823 (fnothaft)
  • Fix link to documentation /#1820 (nzachow)
  • [ADAM-1634] Add algorithm benchmarks to documentation. /#1818 (fnothaft)
  • [ADAM-1813] Delegate right outer shuffle region join to left OSRJ implementation. /#1814 (fnothaft)
  • [ADAM-1807] Check for empty partition when running a piped command. /#1812 (fnothaft)
  • [ADAM-1803] Refactor GenomicRDD.writeTextRdd to util.TextRddWriter. /#1809 (heuermh)
  • Added Filter error when file loaded does not match schema /#1805 (akmorrow13)
  • changed num_jars count /#1802 (akmorrow13)
  • [ADAM-1795] Map -DskipTests=true to exec.skip for Python and R tests. /#1800 (heuermh)
  • [ADAM-1672] Use working directory for R devtools::document(). /#1798 (heuermh)
  • [ADAM-1789] Move scala-lang to provided scope. /#1790 (fnothaft)
  • [ADAM-1784] loadIndexedBam should pass the raw globbed path to Hadoop-BAM /#1785 (fnothaft)
  • [ADAM-1664] Add SUPPORT.md file to complement CONTRIBUTING.md. /#1781 (heuermh)
  • [ADAM-1779] Adding code of conduct adapted from the Contributor Covenant, version 1.4. /#1780 (heuermh)
  • [ADAM-1661] Add file storage benchmarks. /#1772 (fnothaft)
  • [ADAM-1770] Genotype should only store core variant fields. /#1771 (fnothaft)
  • [ADAM-1768] Add InFormatter for unpaired FASTQ. /#1769 (fnothaft)
  • [ADAM-1643] Add S3 access documentation. /#1767 (fnothaft)
  • [ADAM-1763] Apply absolute value to destination partition in ModPartitioner /#1766 (fnothaft)
  • Add R and Python into distribution artifacts /#1765 (fnothaft)
  • [ADAM-1655] Move R package to bdgenomics.adam. /#1764 (fnothaft)
  • [ADAM-1753] Only emit realignment targets for reads containing a single INDEL /#1756 (fnothaft)
  • [ADAM-1715] Support validation stringency in Python/R. /#1755 (fnothaft)
  • [ADAM-1680] Eliminate non-determinism in the ShuffleRegionJoin. /#1754 (fnothaft)
  • update to _replaceRdd with tests /#1749 (akmorrow13)
  • [ADAM-1747] Fixed subtract bug and tests /#1748 (devin-petersohn)
  • [ADAM-1745] Adding LeftOuterShuffleRegionJoinAndGroupByLeft and tests /#1746 (devin-petersohn)
  • Enabled thresholding for joins and standardized regionFn /#1741 (devin-petersohn)
  • Making join return types consistent /#1737 (devin-petersohn)
  • Opening up permissions on GenericGenomicRDD /#1736 (devin-petersohn)
  • [ADAM-1716] Add adam- prefix to distribution module name. /#1733 (heuermh)
  • [ADAM-1695] Check for illegal genotype index after splitting multi-allelic variants. /#1725 (heuermh)
  • [ADAM-1517] Bump Parquet version in a manner compatible with Spark 2.2.x /#1722 (fnothaft)
  • [ADAM-1512] Support VCFs with +Inf/-Inf float values. /#1721 (fnothaft)
  • [ADAM-1709] Add ability to left normalize reads containing INDELs. /#1711 (fnothaft)
  • [ADAM-1691] Move bdgenomics.adam to use a namespace package. /#1706 (fnothaft)
  • moved bdgenomics.adam package to bdgenomics-adam /#1705 (akmorrow13)
  • Misc cleanup needed for bigdatagenomics/cannoli#65 /#1704 (fnothaft)
  • [ADAM-1699] Make GenomicRDD.toXxx method names consistent. /#1700 (heuermh)
  • [ADAM-1694] Add short readable descriptions for toString in subclasses of GenomicRDD. /#1698 (heuermh)
  • [ADAM-1693] Add adam-shell friendly VariantContextRDD.saveAsVcf method. /#1696 (heuermh)
  • [ADAM-1688] Add bdg-formats exclusion to org.hammerlab:genomic-loci dependency. /#1690 (heuermh)
  • [ADAM-1679] Unmapped items should not get caught in requirement when sorting /#1687 (fnothaft)
  • [ADAM-1566] Merge VCF header lines with VCFHeaderLineCount.INTEGER correctly. /#1685 (heuermh)
  • [ADAM-1682] Add variant quality field. /#1684 (fnothaft)
  • Remove adam- prefix from module directory names. /#1681 (heuermh)
  • Update to hadoop-bam 7.9.0 and htsjdk 2.11.0. /#1678 (heuermh)
  • [ADAM-1676] Add more finely grained validation for INFO/FORMAT fields. /#1677 (fnothaft)
  • Python API fixes for AlignmentRecordRDD /#1675 (akmorrow13)
  • [ADAM-1673] Don't set PL to empty when no PL is attached to a gVCF record /#1674 (fnothaft)
  • [ADAM-1670] Add ability to selectively project VCF fields. /#1671 (fnothaft)
  • [ADAM-1663] Enable read groups with repeated names when unioning. /#1665 (fnothaft)
  • Maint 2.11 0.18.0 /#1659 (Douglas-H)
  • [ADAM-1630] Overhauled docs introduction and added architecture section. /#1653 (fnothaft)
  • Add adamR script /#1651 (fnothaft)
  • [ADAM-1647] Fix bad JAR discovery grep in bin/pyadam. /#1648 (fnothaft)
  • [ADAM-1548] Generate reStructuredText from pandoc markdown. /#1646 (fnothaft)
  • Algorithms docs formatting /#1645 (gunjanbaid)
  • Cleaned up docs. /#1642 (gunjanbaid)
  • Making example code compatible with current ADAM build /#1641 (devin-petersohn)
  • Cleaning up formatting and spacing of docs. /#1640 (devin-petersohn)
  • added ExtractRegions /#1637 (antonkulaga)
  • [ADAM-1635] Eliminate passing FASTQ splittable status via config. /#1636 (fnothaft)
  • [ADAM-1614] Add VariantContextRDD to R and Python APIs. /#1628 (fnothaft)
  • [ADAM-1615] Add transform and transmute APIs to Java, R, and Python /#1627 (fnothaft)
  • [ADAM-1625] Use explicit types for header lines /#1626 (heuermh)
  • [ADAM-1623] Add ProcessingStep to adam-codegen. /#1624 (heuermh)
  • [ADAM-1607] Update distribution assembly task to attach assembly überjar /#1622 (fnothaft)
  • [ADAM-1490] Add asSingleFile to saveAsFastq and related. /#1621 (heuermh)
  • Update load method docs in Python and R. /#1619 (heuermh)
  • [ADAM-1616] Resolve installation directory if scripts are symlinks. /#1617 (heuermh)
  • [ADAM-1611] Extend pipe APIs to Java, Python, and R. /#1613 (fnothaft)
  • [ADAM-1610] Mark non-serializable field in TwoBitFile as transient. /#1612 (fnothaft)
  • [ADAM-1554] Support saving BGZF VCF output. /#1608 (fnothaft)
  • Adding examples of how to use joins in the real world /#1605 (devin-petersohn)
  • [ADAM-1599] Add explicit functions for updating GenomicRDD metadata. /#1600 (fnothaft)
  • [ADAM-1576] Allow translation between two different GenomicRDD types. /#1598 (fnothaft)
  • [ADAM-1444] Ignore failed push to Coveralls. /#1595 (fnothaft)
  • Testing, testing, 1... 2... 3... /#1592 (fnothaft)
  • [ADAM-1417] Removed unused Projection.apply method, add test for Filter. /#1591 (fnothaft)
  • [ADAM-1579] Add unit test coverage for BED12 format. /#1587 (fnothaft)
  • [ADAM-1585] Support additional Illumina FASTQ metadata. /#1586 (fnothaft)
  • [ADAM-1438] Add ability to save FASTA back as a single file. /#1581 (fnothaft)
  • Bump bdg-formats correctly to 0.11.1, not SNAPSHOT. /#1577 (fnothaft)
  • [ADAM-1573] Remove unused Unaligned trait. /#1574 (fnothaft)
  • Slurm deployment readme /#1571 (jpdna)
  • [ADAM-1564] Read VCF header from stream in VCFOutFormatter. /#1565 (heuermh)
  • [ADAM-1562] Index off by one for VCF genotype Number=A attributes. /#1563 (heuermh)
  • [ADAM-1533] Set Theory /#1561 (devin-petersohn)
  • Freebayes FORMAT=<ID=AO,Number=A attribute throws ArrayIndexOutOfBoundsException /#1560 (heuermh)
  • [ADAM-1551] Emit non-reference model genotype at called sites. /#1559 (fnothaft)
  • [ADAM-1449] Add loadSequenceDictionary to ADAM context. /#1557 (heuermh)
  • [ADAM-1537] Rename o.b.adam.apis.java package to o.b.adam.api.java /#1556 (heuermh)
  • [ADAM-1549] Make regions provided to filterByOverlappingRegions an Iterable. /#1550 (fnothaft)
  • [ADAM-941] Automatically generate projection enums. /#1547 (fnothaft)
  • [ADAM-1361] Fix misnamed ADAM überjar. /#1546 (fnothaft)
  • [ADAM-1257] Add program record support for alignment/fragment files. /#1545 (fnothaft)
  • [ADAM-1359] Merge reads2fragments and fragments2reads into transformFragments /#1543 (fnothaft)
  • Fix minor format mistakes (and typo) in docs /#1542 (kkaneda)
  • Add a simple unit test to SingleFastqInputFormat /#1541 (kkaneda)
  • Support locus predicate in Transform /#1540 (fnothaft)
  • [ADAM-1421] Add java API for loadReferenceFile. /#1536 (fnothaft)
  • Refactor Vcf2ADAM and ADAM2Vcf into TransformGenotypes and TransformVariants /#1532 (heuermh)
  • [ADAM-1530] Support loading GO:query (S/CR/B)AMs as fragments. /#1531 (fnothaft)
  • [ADAM-1169] Write GFF header line pragma in single file mode. /#1529 (fnothaft)
  • [ADAM-1501] Compute coverage using Dataset API. /#1528 (fnothaft)
  • [ADAM-1497] Add union to GenomicRDD. /#1526 (fnothaft)
  • [ADAM-1486] Respect validation stringency if BAM header load fails. /#1525 (fnothaft)
  • [ADAM-1499] Enable reuse of broadcasted objects in region join. /#1524 (fnothaft)
  • [ADAM-1520] Bump to bdg-formats 0.11.0. /#1523 (fnothaft)
  • Adding fragment InFormatter for Bowtie tab5 format /#1522 (heuermh)
  • [ADAM-1328] Rename Transform to TransformAlignments. /#1521 (fnothaft)
  • [ADAM-1517] Move to Parquet 1.8.2 in preparation for moving to Spark 2.2.0 /#1518 (fnothaft)
  • Fixed minor typos in README. /#1516 (gunjanbaid)
  • Making TreeRegionJoin consistent with ShuffleRegionJoin /#1515 (devin-petersohn)
  • Resolve #1508, #1509 for Pipe API /#1511 (fnothaft)
  • [ADAM-1502] Preserve contig ordering in TwoBitFile sequence dictionary. /#1508 (fnothaft)
  • [ADAM-1483] Remove collapse parameter from AlignmentRecordRDD.toCoverage /#1493 (fnothaft)
  • [ADAM-1377] Adding fragment InFormatter for Bowtie tab6 format /#1491 (heuermh)
  • [ADAM-1488] Only increment BQSR min quality by 33 once. /#1489 (fnothaft)
  • [ADAM-1481] Refactor ADAMContext loadXxx methods for consistency /#1487 (heuermh)
  • Add quality score binner /#1485 (fnothaft)
  • Clean up ReferenceRegion.scala and add thresholded overlap and covers /#1484 (devin-petersohn)
  • [ADAM-1456] Remove .{type}.adam file extension conversions in type-guessing methods. /#1482 (heuermh)
  • [ADAM-1480] Add switch to disable the fast concat method. /#1479 (fnothaft)
  • [ADAM-1476] Treat . ALT allele as symbolic non-ref. /#1477 (fnothaft)
  • Adding require for Coverage Conversion and related tests /#1472 (devin-petersohn)
  • Add cache argument to loadFeatures, additional Feature timers /#1427 (heuermh)
  • [ADAM-882] R API /#1397 (fnothaft)
  • [ADAM-1018] Add support for Spark SQL Datasets. /#1391 (fnothaft)
  • WIP Python API /#1387 (fnothaft)
  • [ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging /#1366 (fnothaft)
  • Update dependency and plugin versions /#1360 (heuermh)
  • [ADAM-1330] Move to Spark 2.1.0. /#1332 (fnothaft)
  • Efficient Joins and (re)Partitioning /#1324 (devin-petersohn)

Version 0.22.0

Closed issues:

  • Realign all reads at target site, not just reads with no mismatches #1469
  • Parallel file merger fails if the output file is smaller than the HDFS block size #1467
  • Add new realigner arguments to docs #1465
  • Recalibrate method misspelled as recalibateBaseQualities #1463
  • FASTQ may try to split GZIPed files #1459
  • Update to Hadoop-BAM 7.8.0 #1455
  • Publish Markdown and Scaladoc to the interwebs #1453
  • Make VariantContextConverter public #1451
  • Apply method in FragmentRDD is package private #1445
  • Thread pool will block inside of pipe command for streams too large to buffer #1442
  • FeatureRDD.apply() does not allow addition of other parameters with defaults in the case class #1439
  • Question : Why the number of paired sequence in adam-0.21.0 less than adam-0.19.0? #1424
  • loadCoverage missing from Java API #1420
  • Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1410
  • loadIntervalList FeatureRDD has empty SequenceDictionary #1409
  • problem using transform command #1406
  • Add coveralls #1403
  • INDEL realigner binary search conditional is flipped #1402
  • Delete adam-scripts/R #1398
  • Data missing when transforming FASTQ to Adam #1393
  • java.io.FileNotFoundException when file exists #1385
  • Off-by-1 error in FASTQ InputFormat start positioning code #1383
  • Set the wrong value for end for symbolic alts #1381
  • RecordGroupDictionary should support isEmpty #1380
  • Add pipe API in and out formatters for Features #1374
  • Increase visibility for SupportedHeaderLines.allHeaderLines #1372
  • Bits of VariantContextConverter don't get ValidationStringencied #1371
  • Add Markdown docs for Pipe API #1368
  • Array[Consensus] not registered #1367
  • ValidationStringency in MDTagging should apply to reads on unknown references #1365
  • When doing a release, the SNAPSHOT should bump by 0.1.0, not 0.0.1 #1364
  • FromKnowns consensus generator fails if no reads overlap a consensus #1362
  • Performance tune-up in BQSR #1358
  • Increase visibility for ADAMContext.sc and/or getFs... methods #1356
  • Pipe API formatters need to be public #1354
  • Version 0.21.0: VariantContextConverter fails for 1000G VCF data #1353
  • ConsensusModel's can't really be instantiated #1352
  • Runtime conflicts in transitive versions of Guava dependency #1350
  • Transcript Effects ignored if more than 1 #1347
  • Remove "fork" tag from releases #1344
  • Refactor isSorted boolean parameters to sorted #1341
  • Loading GZipped VCF returns an empty RDD #1333
  • Follow up on error messages in build scripts #1331
  • Bump Spark 2 build to Spark 2.1.0 #1330
  • FeatureRDD instantiation tries to cache the RDD #1321
  • Load queryname sorted BAMs as Fragments #1303
  • Run Duplicate Marking on Fragments #1302
  • GenomicRDD.pipe may hang on failure error codes #1282
  • IllegalArgumentException Wrong FS for vcf_head files on HDFS #1272
  • java.io.NotSerializableException: org.bdgenomics.formats.avro.AlignmentRecord #1240
  • Investigate sorted join in dataset api #1223
  • Support looser validation stringency for loading some VCF Integer fields #1213
  • Add new feature-overlap command to demonstrate new region joins #1194
  • What should our API at the command line look like? #1178
  • Split apart partition and join in ShuffleRegionJoin #1175
  • Merging files should be multithreaded #1164
  • File _rgdict.avro does not exist #1150
  • how to collect the .adam files from Spark cluster multiple nodes and some questions about avocado #1140
  • JFYI: tiny forked adam-core "0.20.0" release #1139
  • Samtools (htslib) integration testing #1120
  • AlignmentRecordRDD does not extend GenomicRDD per javac #1092
  • Release ADAM version 0.21.0 #1088
  • Difference running markdups with and without projection #1014
  • ADAM to BAM conversion fails using relative path #1012
  • Refactor SequenceDictionary to use Contig instead of SequenceRecord #997
  • Customize adam-main cli from configuration file #918
  • genotypeType for genotypes with multiple OtherAlt alleles? #897
  • How to convert genotype DataFrame to VariantContext DataFrame / RDD #886
  • Ensure Java API is up-to-date with Scala API #855
  • Improve parallelism during FASTA output #842
  • Explicitly validate user args passed to transform enhancement #841
  • BroadcastRegionJoin fails with unmapped reads #821
  • Resolve Fragment vs. SingleReadBucket #789
  • Add profile for skipping test compilation/resolution #713
  • Next on empty iterator in BroadcastRegionJoin #661
  • Cleanup code smell in sort work balancing code #635
  • Remove reliance on MD tags #622
  • Provide low-impact alternative to transform -repartition for reducing partition size #594
  • Clean up Rich records #577
  • Create standardized, interpretable exceptions for error reporting #420
  • Create ADAM Benchmarking suite #120

Merged and closed pull requests:

  • [ADAM-1469] Don't filter on whether reads have mismatches during realignment #1470 (fnothaft)
  • [ADAM-1467] Skip concat call if there is only one shard. #1468 (fnothaft)
  • [ADAM-1465] Updating realigner CLI docs. #1466 (fnothaft)
  • [ADAM-1463] Rename recalibateBaseQualities method as recalibrateBaseQualities #1464 (heuermh)
  • [ADAM-1453] Add hooks to publish ADAM docs from CI flow. #1461 (fnothaft)
  • [ADAM-1459] Don't split FASTQ when compressed. #1459 (fnothaft)
  • [ADAM-1451] Make VariantContextConverter class and convert methods public #1452 (fnothaft)
  • Moving API overview from building apps doc to new source file. #1450 (heuermh)
  • [ADAM-1424] Adding test for reads dropped in 0.21.0. #1448 (heuermh)
  • [ADAM-1439] Add inferSequenceDictionary ctr to FeatureRDD. #1447 (heuermh)
  • [ADAM-1445] Make apply method for FragmentRDD public. #1446 (fnothaft)
  • [ADAM-1442] Fix thread pool deadlock in GenomicRDD.pipe #1443 (fnothaft)
  • [ADAM-1164] Add parallel file merger. #1441 (fnothaft)
  • Dependency version bump + BroadcastRegionJoin fix #1440 (fnothaft)
  • added JavaApi for loadCoverage #1437 (akmorrow13)
  • Update versions, etc. in build docs #1435 (heuermh)
  • Add test sample (verify number of reads in loadAlignments function) and ADAM SNAPSHOT document #1433 (xubo245)
  • Add cache argument to loadFeatures, additional Feature timers #1427 (heuermh)
  • feat: speed up 2bit file extract #1426 (Blaok)
  • BQSR refactor for perf improvements #1423 (fnothaft)
  • Add ADAMContext/GenomicRDD/pipe docs #1422 (fnothaft)
  • INDEL realigner cleanup #1412 (fnothaft)
  • Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1411 (heuermh)
  • Add coveralls badge to README.md. #1408 (fnothaft)
  • [ADAM-1403] Push coverage reports to Coveralls. #1404 (fnothaft)
  • Added instrumentation timers around joins. #1401 (fnothaft)
  • Add Apache Spark version to --version text #1400 (heuermh)
  • [ADAM-1398] Delete adam-scripts/R. #1399 (fnothaft)
  • [ADAM-1383] Use gt instead of gteq in FASTQ input format line size checks #1396 (fnothaft)
  • Maint spark2 2.11 0.21.0 #1395 (A-Tsai)
  • [ADAM-1393] fix missing reads when transforming fastq to adam #1394 (A-Tsai)
  • [ADAM-1380] Adds isEmpty method to RecordGroupDictionary. #1392 (fnothaft)
  • [ADAM-1381] Fix Variant end position. #1389 (fnothaft)
  • Make javac see that AlignmentRecordRDD extends GenomicRDD #1386 (fnothaft)
  • Added ShuffleRegionJoin usage docs #1384 (devin-petersohn)
  • Misc. INDEL realigner bugfixes #1382 (fnothaft)
  • Add pipe API in and out formatters for Features #1378 (heuermh)
  • [ADAM-1356] Make ADAMContext.getFsAndFiles and related protected visibility #1376 (heuermh)
  • [ADAM-1372] Increase visibility for DefaultHeaderLines.allHeaderLines #1375 (heuermh)
  • [ADAM-1371] Wrap ADAM->htsjdk VariantContext conversion with validation stringency. #1373 (fnothaft)
  • [ADAM-1367] Register Consensus array for serialization. #1369 (fnothaft)
  • [ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging #1366 (fnothaft)
  • [ADAM-1362] Fixing issue where FromKnowns consensus model fails if no reads hit a target. #1363 (fnothaft)
  • [ADAM-1352] Clean up consensus model usage. #1357 (fnothaft)
  • Increase visibility for InFormatter case classes from package private to public #1355 (heuermh)
  • Use htsjdk getAttributeAsList for VCF INFO ANN key #1348 (heuermh)
  • Fixes parsing variant annotations for multi-allelic rows #1346 (majkiw)
  • Sort pull requests by id #1345 (heuermh)
  • HBase genotypes backend -revised #1335 (jpdna)
  • [ADAM-1330] Move to Spark 2.1.0. #1332 (fnothaft)
  • Support deduping fragments #1309 (fnothaft)
  • [ADAM-1280] Silence CRAM logging in tests. #1294 (fnothaft)
  • Added test to try and repro #1282. #1292 (fnothaft)

Version 0.21.0

Closed issues:

  • Update Markdown docs with ValidationStringency in VCF<->ADAM CLI #1342
  • Variant VCFHeaderLine metadata does not handle wildcards properly #1339
  • Close called multiple times on VCF header stream #1337
  • BroadcastRegionJoin has serialization failures #1334
  • adam-cli uses git-commit-id-plugin which breaks release? #1322
  • move_to_xyz scripts should have interlocks... #1317
  • Lineage for partitionAndJoin in ShuffleRegionJoin causes StackOverflow Errors #1308
  • Add move_to_spark_1.sh script and update README to mention #1307
  • adam-submit transform fails with Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class #1306
  • private ADAMContext constructor? #1296
  • AlignmentRecord.mateAlignmentEnd never set #1290
  • how to submit my own driver class via adam-submit? #1289
  • ReferenceRegion on Genotype seems busted? #1286
  • Clarify strandedness in ReferenceRegion apply methods #1285
  • Parquet and CRAM debug logging during unit tests #1280
  • Add more ANN field parsing unit tests #1273
  • loadVariantAnnotations returns empty RDD #1271
  • Implement joinVariantAnnotations with region join #1259
  • Count how many chromosome in the range of the kmer #1249
  • ADAM minor release to support htsjdk 2.7.0? #1248
  • how to config kryo.registrator programmatically #1245
  • Does the nested record Flattener drop Maps/Arrays? #1244
  • Dead-ish code cleanup in org.bdgenomics.adam.utils #1242
  • java.io.FileNotFoundException for old adam file after upgrade to adam0.20 #1240
  • please add maven-source-plugin into the pom file #1239
  • Assembly jar doesn't get rebuilt on CLI changes #1238
  • how to compare with the last the column for the same chromosome name? #1237
  • Need a way for users to add VCF header lines #1233
  • Enhancements to VCF save #1232
  • Must we split multi-allelic sites in our Genotype model? #1231
  • Can't override default -collapse in reads2coverage #1228
  • Reads2coverage NPEs on unmapped reads #1227
  • Strand bias doesn't get exported #1226
  • Move ADAMFunSuite helper functions upstream to SparkFunSuite #1225
  • broadcast join using interval tree #1224
  • Instrumentation is lost in ShuffleRegionJoin #1222
  • Bump Spark, Scala, Hadoop dependency versions #1221
  • GenomicRDD shuffle region join passes partition count to partition size #1220
  • Scala compile errors downstream of Spark 2 Scala 2.11 artifacts #1218
  • Javac error: incompatible types: SparkContext cannot be converted to ADAMContext #1217
  • Release 0.20.0 artifacts failed Sonatype Nexus validation #1212
  • Release script failed for 0.20.0 release #1211
  • gVCF - can't load multi-allelic sites #1202
  • Allow open-ended intervals in loadIndexedBam #1196
  • Interval tree join in ADAM #1171
  • spark-submit throw exception in spark-standalone using .adam which transformed from .vcf #1121
  • BroadcastRegionJoin is not a broadcast join #1110
  • Improve test coverage of VariantContextConverter #1107
  • Variant dbsnp rs id tracking in vcf2adam and ADAM2Vcf #1103
  • Document core ADAM transform methods #1085
  • Document deploying ADAM on Toil #1084
  • Clean up packages #1083
  • VariantCallingAnnotations is getting populated with INFO fields #1063
  • How to load DatabaseVariantAnnotation information ? #1049
  • Release ADAM version 0.20.0 #1048
  • Support VCF annotation ANN field in vcf2adam and adam2vcf #1044
  • How to create a rich(er) VariantContext RDD? Reconstruct VCF INFO fields. #878
  • Add biologist targeted section to the README #497
  • Update usage docs running for EC2 and CDH #493
  • Add docs about building downstream apps on top of ADAM #291
  • Variant filter representation #194

Merged and closed pull requests:

Version 0.20.0

Closed issues:

  • Sorting by reference index doesn't seem to work, or sorts in DESC order? #1204
  • master won't compile #1200
  • VCF format tag SB field parse error in loading #1199
  • Publish sources JAR with snapshots #1195
  • Type SparkFunSuite in package org.bdgenomics.utils.misc is not available #1193
  • MDTagging fails on GRCh38 #1192
  • Fix stack overflow in IndelRealigner serialization #1190
  • Delete ./scripts/commit-pr.sh #1188
  • Hadoop globStatus returns null if no glob matches #1186
  • Swapping out IntervalRDD under GenomicRDDs #1184
  • How to get "SO coordinate" instead of "SO unsorted"? #1182
  • How to read glob of multiple parquet Genotype #1179
  • Update command line doc and examples in README.md #1176
  • FastqRecordConverter needs cleanup and tests #1172
  • TransformFormats write to .gff3 file path incorrectly writes as parquet #1168
  • Should be able to merge shards across two different file systems #1165
  • RG ID gets written as the index, not the record group name #1162
  • Users should be able to save files as -single without merging them #1161
  • Users should be able to set size of buffer used for merging files #1160
  • Bump Hadoop-BAM to 7.7.0 #1158
  • adam-shell prints command trace to stdout #1154
  • Map IntervalList format column four to feature name or attributes? #1152
  • Parquet storage of VariantContext #1151
  • vcf2adam unparsable vcf record #1149
  • Reorder kryo.register statements in ADAMKryoRegistrator #1146
  • Make region joins public again #1143
  • Support CRAM input/output #1141
  • Transform should run with spark.kryo.requireRegistration=true #1136
  • adam-shell not handling bash args correctly #1132
  • Remove Gene and related models and parsing code #1129
  • Generate Scoverage reports when running CI #1124
  • Remove PairingRDD #1122
  • SAMRecordConverter.convert takes unused arguments #1113
  • Add Pipe API #1112
  • Improve coverage in Feature unit tests #1106
  • K-mer.scala code #1105
  • add -single file output option to ADAM2Vcf #1102
  • adam2vcf Fails with Sample not serializable #1100
  • ReferenceRegion.apply(AlignmentRecord) should not NPE on unmapped reads #1099
  • Add outer region join implementations #1098
  • VariantContextConverter never returns DatabaseVariantAnnotation #1097
  • loadvcf: conflicting require statement #1094
  • ADAM version 0.19.0 will not run on Spark version 2.0.0 #1093
  • Be more rigorous with FileSystem.get #1087
  • Remove network-connected and default test-related Maven profiles #1073
  • Releases should get pushed to Spark Packages #1067
  • Invalid POM for cli on 0.19.0 #1066
  • scala.MatchError RegExp does not catch colons in value part properly #1061
  • Support writing IntervalList header for features #1059
  • Add -single support when writing features in native formats #1058
  • Remove workaround for gzip/BGZF compressed VCF headers #1057
  • Clean up if clauses in Transform #1053
  • Adam-0.18.2 can not load Adam-0.14.0 adamSave function data (sam) #1050
  • filterByOverlappingRegion Incorrect for Genotypes #1042
  • Move Interval trait to utils, added in #75 #1041
  • Remove implicit GenomicRDD to RDD conversion #1040
  • VCF sample metadata - proposal for a GenotypedSampleMetadata object #1039
  • [build system] ADAM test builds pollute /tmp, leaving lots of cruft... #1038
  • adamMarkDuplicates function in AlignmentRecordRDDFunctions class can not mark the same read? #1037
  • test MarkDuplicatesSuite with two similar read in ref and start position and different avgPhredScore, error! #1035
  • Explore protocol buffers vs Avro #1031
  • Increase Avro dependency version to 1.8.0 #1029
  • ADAM specific logging #1024
  • Reenable Travis CI for pull request builds #1023
  • Bump Apache Spark version to 1.6.1 in Jenkins #1022
  • ADAM compatibility with Spark 2.0 #1021
  • ADAM to BAM conversion failing on 1000G file #1013
  • Factor out *RDDFunctions classes #1011
  • Port single file BAM and header code to VCF #1009
  • Roll Jenkins JDK 8 changes into ./scripts/jenkins-test #1008
  • Support GFF3 format #1007
  • Separate fat jar build from adam-cli to new maven module #1006
  • adam-cli POM invalid: maven.build.timestamp #1004
  • Sub-partitioning of Parquet file for ADAM #1003
  • Flattening the Genotype schema #1002
  • install adam 0.19 error! #1001
  • How to solve it please? #1000
  • Has the project realized alignment reads to reference genome algorithm? #996
  • All file-based input methods should support running on directories, compressed files, and wildcards #993
  • Contig to ContigName Change not reflected in AlignmentRecordField #991
  • Add homebrew guidelines to release checklist or automate PR generation #987
  • fix deprecation warnings #985
  • rename fragments package #984
  • Explore if SeqDict data can be factored out more aggressively #983
  • Make "Adam" all caps in filename Adam2Fastq.scala #981
  • Adam2Fastq should output reverse complement when 0x10 flag is set for read #980
  • Allow lowercase letters in jar/version names #974
  • Add stringency parameter to flagstat #973
  • Arg-array parsing problem in adam-submit #971
  • Pass recordGroup parameter to loadPairedFastq #969
  • Send a number of partitions to sc.textFile calls #968
  • adamGetReferenceString doesn't reduce pairs correctly #967
  • Update ADAM formula in homebrew-science to version 0.19.0 #963
  • BAM output in ADAM appears to be corrupt #962
  • Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #959
  • Issue with version 18.0.2 #957
  • Expose sorting by reference index #952
  • .rgdict and .seqdict files are not placed in the adam directory #945
  • Why does count_kmers not return k-mers that are split between two records? #930
  • Load legacy file formats to Spark SQL Dataframes #912
  • Clean up RDD method names #910
  • Load/store sequence dictionaries alongside Genotype RDDs #909
  • vcf2adam -print_metrics throws IllegalStateException on Spark 1.5.2 or later #902
  • error: no reads in first split: bad BAM file or tiny split size? #896
  • FastaConverter.FastaDescriptionLine not kryo-registered #893
  • Work With ADAM fasta2adam in a distributed mode #881
  • vcf2adam -> Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less; #871
  • Code coverage profile is broken #849
  • Building Adam on OS X 10.10.5 with Java 1.8 #835
  • Normalize AlignmentRecord.recordGroup* fields onto a separate record type #828
  • Gracefully handle missing Spark- and Hadoop-versions in jenkins-test; document how to set them. #827
  • Use Adam File with Hive #820
  • How do we handle reads that don't have original quality scores when converting to FASTQ with original qualities? #818
  • SAMFileHeader "sort order" attribute being un-set during file-save job #800
  • Use same sort order as Samtools #796
  • RNAME and RNEXT fields jumbled on transform BAM->ADAM->BAM #795
  • Support loading multiple indexed read files #787
  • Duplicate OUTPUT command line argument metaVar in adam2fastq #776
  • Allow Variant to ReferenceRegion conversion #768
  • Spark Errors References Deprecated SPARK_CLASSPATH #767
  • Spark Errors References Deprecated SPARK_CLASSPATH #766
  • adam2vcf fails with -coalesce #735
  • Writing to a BAM file with adamSAMSave consistently fails #721
  • BQSR on C835.HCC1143_BL.4 uses excessive amount of driver memory #714
  • Support writing RDD[Feature] to various file formats #710
  • adamParquetSave has a menacing false error message about *.adam extension #681
  • BAMHeader not set when running on a cluster #676
  • spark 1.3.1 upgrade to hortonworks HDP 2.2.4.2-2? #675
  • Symbol case class is nucleotide-centric #672
  • xAssembler cannot be built using mvn #658
  • adam-submit VerifyError #642
  • vcf2adam : Unsupported type ENUM #638
  • Update CDH documentation #615
  • Remove and generalize plugin code #602
  • Fix record oriented shuffle #599
  • Migrate preprocessing stages out of ADAM #598
  • Publish/socialize a roadmap #591
  • Eliminate format detection and extension checks for loading data #587
  • Improve error message when we can't find a ReferenceRegion for a contig #582
  • Do reference partitioners restrict a partition to contain keys from a single contig? #573
  • Connection refused errors when transforming BAM file with BQSR #516
  • ReferenceRegion shouldn't extend Ordered #511
  • Documentation for common usecases #491
  • Improve handling of "*" sequences during BQSR #484
  • Original qualities are parsed out, but left in attribute fields #483
  • Need a FileLocator that mirrors the use of Path in HDFS #477
  • FileLocator should support finding "child" locators. #476
  • Add S3 based Parquet directory loader #463
  • Should FASTQ output use reads' "original qualities"? #436
  • VcfStringUtils unused? #428
  • We should be able to filter genotypes that overlap a region #422
  • Create a simplified vocabulary for naming projections. #419
  • Update documentation #406
  • Bake off different region join implementations #395
  • Handle no-ops more intelligently when creating MD tags #392
  • Remove all the commands in the "CONVERSION OPERATIONS" CommandGroup #373
  • Fail to Write RDD into HDFS with Parquet Format #344
  • Refactor ReferencePositionWithOrientation #317
  • Add docs about SPARK_LOCAL_IP #305
  • PartitionAndJoin should throw an exception if it sees an unmapped read #297
  • Add insert size calculation #296
  • Newbie questions - learning resources? Reading a range of records from Adam? #281
  • Add variant effect ontology #261
  • Don't flatten optional SAM tags into a string #240
  • Characterize impact of partition size on pileup creation #163
  • Need to support BCF output format #153
  • Allow list of commands to be injected into adam-cli AdamMain #132
  • Parse out common annotations stored in VCF format #118
  • Update normalization code to enable normalization of sequences with more than two indels #64
  • Add clipping heuristic to indel realigner #63
  • BQSR should support recalibration across multiple ADAM files #58

Merged and closed pull requests:

  • fix SB tag parsing #1209 (fnothaft)
  • Fastq record converter #1208 (fnothaft)
  • Doc suggested partitionSize in ShuffleRegionJoin #1207 (jpdna)
  • Test demonstrating region join failure #1206 (jpdna)
  • fix SB tag parsing #1203 (jpdna)
  • fix build #1201 (ryan-williams)
  • [ADAM-1192] Correctly handle other whitespace in FASTA description. #1198 (fnothaft)
  • [ADAM-1190] Manually (un)pack IndelRealignmentTarget set. #1191 (fnothaft)
  • [ADAM-1188] Delete scripts/commit-pr.sh #1189 (fnothaft)
  • [ADAM-1186] Mask null from fs.globStatus. #1187 (fnothaft)
  • Fastq record converter #1185 (zyxue)
  • [ADAM-1182] isSorted=true should write SO:coordinate in SAM/BAM/CRAM header. #1183 (fnothaft)
  • Add scoverage aggregator and fail on low coverage. #1181 (fnothaft)
  • [ADAM-1179] Improve error message when globbing a parquet file fails. #1180 (fnothaft)
  • [ADAM-1176] Update command line doc and examples in README.md #1177 (heuermh)
  • Refactor CLIs for merging sharded files #1167 (fnothaft)
  • Update Hadoop-BAM to version 7.7.0 #1166 (heuermh)
  • [ADAM-1162] Write record group string name. #1163 (fnothaft)
  • Map IntervalList format column four to feature name #1159 (heuermh)
  • Make AlignmentRecordConverter public so that it can be used from other projects #1157 (tomwhite)
  • added predicate option to loadCoverage #1156 (akmorrow13)
  • [ADAM-1154] Change set -x to set -e in ./bin/adam-shell. #1155 (fnothaft)
  • Remove Gene and related models and parsing code #1153 (heuermh)
  • Reorder kryo.register statements in ADAMKryoRegistrator #1148 (heuermh)
  • Updated GenomicPartitioners to accept additional key. #1147 (akmorrow13)
  • [ADAM-1141] Add support for saving/loading AlignmentRecords to/from CRAM. #1145 (fnothaft)
  • misc pom/test/resource improvements #1142 (ryan-williams)
  • [ADAM-1136] Transform runs successfully with kryo registration required #1138 (fnothaft)
  • [ADAM-1132] Fix improper quoting of bash args in adam-shell. #1133 (fnothaft)
  • Remove StructuralVariant and StructuralVariantType, add names field to Variant #1131 (heuermh)
  • Remove StructuralVariant and StructuralVariantType, add names field to Variant #1130 (heuermh)
  • PR #1108 with issue #1122 #1128 (fnothaft)
  • [ADAM-1038] Eliminate writing to /tmp during CI builds. #1127 (fnothaft)
  • Update for bdg-formats code style changes #1126 (heuermh)
  • [ADAM-1124] Add Scoverage and generate coverage reports in Jenkins. #1125 (fnothaft)
  • [ADAM-1093] Move to support Spark 2.0.0. #1123 (fnothaft)
  • remove duplicated dependency #1119 (ryan-williams)
  • Clean up ADAMContext #1118 (fnothaft)
  • [ADAM-993] Support loading files using globs and from directory paths. #1117 (fnothaft)
  • [ADAM-1087] Migrate away from FileSystem.get #1116 (fnothaft)
  • [ADAM-1099] Make reference region not throw NPE. #1115 (fnothaft)
  • Add pipes API #1114 (fnothaft)
  • [ADAM-1105] Use assembly jar in adam-shell. #1111 (fnothaft)
  • Add outer joins #1109 (fnothaft)
  • Modified CalculateDepth to calculate coverage from alignment files #1108 (akmorrow13)
  • Resolves various single file save/header issues #1104 (fnothaft)
  • [ADAM-1100] Resolve Sample Not Serializable exception #1101 (fnothaft)
  • added loadIndexedVcf and loadIndexedBam for multiple ReferenceRegions #1096 (akmorrow13)
  • Added support for Indexed VCF files #1095 (akmorrow13)
  • [ADAM-582] Eliminate .get on option in FragmentConverter. #1091 (fnothaft)
  • [ADAM-776] Rename duplicate OUTPUT metaVar in ADAM2Fastq. #1090 (fnothaft)
  • refactored ReferenceFile to require SequenceDictionary #1086 (akmorrow13)
  • [ADAM-1073] Remove network-connected and default test-related Maven profiles #1082 (heuermh)
  • [ADAM-1053] Clean up Transform #1081 (fnothaft)
  • [ADAM-1061] Clean up attributes regex and denormalized fields #1080 (fnothaft)
  • Extended TwoBitFile and NucleotideContigFragmentRDDFunctions to behave more similarly #1079 (akmorrow13)
  • Refactor variant and genotype annotations #1078 (heuermh)
  • [ADAM-1039] Add basic support for Sample record. #1077 (fnothaft)
  • Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #1076 (heuermh)
  • [ADAM-194] Use separate filtersFailed and filtersPassed arrays for variant quality filters #1075 (heuermh)
  • Whitespace code style fixes #1074 (heuermh)
  • [ADAM-1006] Split überjar out to adam-assembly submodule. #1072 (fnothaft)
  • Remove code coverage profile #1071 (heuermh)
  • [ADAM-768] ReferenceRegion from variant/genotypes #1070 (fnothaft)
  • [ADAM-1044] Support VCF annotation ANN field #1069 (heuermh)
  • [ADAM-1067] Add release documentation and scripting for Spark Packages. #1068 (fnothaft)
  • [ADAM-602] Remove plugin code. #1065 (fnothaft)
  • Refactoring org.bdgenomics.adam.io package. #1064 (fnothaft)
  • Cleanup in org.bdgenomics.adam.converters package. #1062 (fnothaft)
  • [ADAM-1057] Remove workaround for gzip/BGZF compressed VCF headers #1057 (heuermh)
  • Cleanup on org.bdgenomics.adam.algorithms.smithwaterman package. #1056 (fnothaft)
  • Documentation cleanup and minor refactor on the consensus package. #1055 (fnothaft)
  • Add KEYS with public code signing keys #1054 (heuermh)
  • Adding GA4GH 0.5.1 converter for reads. #1052 (fnothaft)
  • [ADAM-1011] Refactor to add GenomicRDDs for all Avro types #1051 (fnothaft)
  • removed interval trait and redirected to interval in utils-intervalrdd #1046 (akmorrow13)
  • [ADAM-952] Expose sorting by reference index. #1045 (fnothaft)
  • overlap query reflects new formats #1043 (erictu)
  • Changed loadIndexedBam to use hadoop-bam InputFormat #1036 (fnothaft)
  • Increase Avro dependency version to 1.8.0 #1034 (heuermh)
  • Improved README fix using feedback from other approach review. #1034 (InvisibleTech)
  • Error in the README.md for kmer.scala example, need to get rdd first. #1032 (InvisibleTech)
  • Add fragmentEndPosition to NucleotideContigFragment #1030 (heuermh)
  • Logging to be done by ADAM utils code rather than Spark #1028 (jpdna)
  • add maxScore #1027 (xubo245)
  • [ADAM-1008] Modify jenkins-test script to support Java 8 build. #1026 (fnothaft)
  • whitespace change, do not merge #1025 (shaneknapp)
  • require kryo registration in tests #1020 (ryan-williams)
  • print full stack traces on test failures #1019 (ryan-williams)
  • bump commons-io version #1017 (ryan-williams)
  • exclude javadoc jar in adam-shell #1016 (ryan-williams)
  • [ADAM-909] Refactoring variation RDDs. #1015 (fnothaft)
  • Modified CalculateDepth to get coverage on whole alignment adam files #1010 (akmorrow13)
  • [ADAM-1004] Remove recursive maven.build.timestamp declaration #1005 (heuermh)
  • Maint 2.11 0.19.0 #999 (tushu1232)
  • [ADAM-710] Add saveAs methods for feature formats GTF, BED, IntervalList, and NarrowPeak #998 (heuermh)
  • Moving Adam2Fastq to ADAM2Fastq #995 (heuermh)
  • Update release doc for CHANGES.md and homebrew #994 (heuermh)
  • Update to AlignmentRecordField and its usages as contig changed to co… #992 (jpdna)
  • [ADAM-974] Short term fix for multiple ADAM cli assembly jars check #990 (heuermh)
  • Update hadoop-bam dependency version to 7.5.0 #989 (heuermh)
  • Replaced Contig with ContigName in AlignmentRecord and related changes #988 (jpdna)
  • fix some deprecation/style things and rename a pkg #986 (ryan-williams)
  • Fix Adam2fastq in case of read with both reverse and unmapped flags #982 (jpdna)
  • [ADAM-510] Refactoring RDD function names #979 (heuermh)
  • Use .adam/_{seq,rg}dict.avro paths for Avro-formatted dictionaries #978 (heuermh)
  • Remove unused file VcfHeaderUtils.scala #977 (heuermh)
  • add validation stringency to bam parsing, flagstat #976 (ryan-williams)
  • more permissible jar regex in adam-submit #975 (ryan-williams)
  • fix bash arg array processing in adam-submit #972 (ryan-williams)
  • adamGetReferenceString reduces pairs correctly, fixes #967 #970 (erictu)
  • A few improvements #966 (ryan-williams)
  • improve SW performance by replacing functional reductions with imperative ones #965 (noamBarkai)
  • [ADAM-962] Fix corrupt single-file BAM output. #964 (fnothaft)
  • [ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
  • [ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
  • Use hadoop-bam BAMInputFormat to do loadIndexedBam #953 (andrewmchen)
  • Add -print_metrics option to Jenkins build #947 (heuermh)
  • adam2vcf doesn't have info fields #939 (andrewmchen)
  • [ADAM-893] Register missing serializers. #933 (fnothaft)

Version 0.19.0

Closed issues:

  • Update bdg-utils dependency version to 0.2.4 #960
  • Drop support for Spark version 1.2.1, Hadoop version 1.0.x #958
  • Exception occurs when running tests on master #956
  • Flagstat results still don't match samtools flagstat #946
  • readInFragment value is not properly read from parquet file into RDD[AlignmentRecord] #942
  • adam2vcf -sort_on_save flag broken #940
  • Transform -limit_projection requires .sam.seqdict file #937
  • MarkDuplicates fails if library name is not set #934
  • fastqtobam or sam #928
  • Vcf2Adam uses SB field instead of FS field for fisher exact test for strand bias #923
  • Add back limit_projection on Transform #920
  • BAM header is not getting set on partition 0 with headerless BAM output format #916
  • Add numParts apply method to GenomicRegionPartitioner #914
  • Add Spark version 1.6.x to Jenkins build matrix #913
  • Target Spark 1.5.2 as default Spark version #911
  • Move to bdg-formats 0.7.0 #905
  • secondOfPair and firstOfPair flag is missing in the newest 0.18 adam transformed results from BAM #903
  • Future pull request #900
  • error in vcf2adam #899
  • Importing directory of VCFs seems to fail #898
  • How to filter genotypeRDD on sample names? org.apache.spark.SparkException: Task not serializable? #891
  • Add Spark version 1.5.x to Jenkins build matrix #889
  • Transform DAG causes stages to recompute #883
  • adam-submit buildinfo is confused #880
  • move_to_scala_2.11 and maven-javadoc-plugin #863
  • NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable #837
  • Fix record oriented shuffle #599
  • Avro.GenericData error with ADAM 0.12.0 on reading from ADAM file #290

Merged and closed pull requests:

  • [ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
  • [ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
  • Fix for travis build, replace reads2ref with reads2fragments #950 (heuermh)
  • [ADAM-940] Fix adam2vcf -sort_on_save flag #949 (massie)
  • Remove BuildInformation and extraneous git-commit-id-plugin configuration #948 (heuermh)
  • Update readme for spark 1.5.2 and hadoop 2.6.0 #944 (heuermh)
  • [ADAM-942] Replace first/secondInRead with readInFragment #943 (heuermh)
  • [ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path #938 (heuermh)
  • [ADAM-934] Properly handle unset library name during duplicate marking #935 (fnothaft)
  • [ADAM-911] Move to Spark 1.5.2 and Hadoop 2.6.0 as default versions. #932 (fnothaft)
  • added start and end values to Interval Trait. Used for IntervalRDD #931 (akmorrow13)
  • Removing buildinfo command #929 (heuermh)
  • Removing symbolic test resource links, read from test classpath instead #927 (heuermh)
  • Changed fisher strand bias field for VCF2Adam from SB to FS #924 (andrewmchen)
  • [ADAM-920] Limit tag/orig qual flags in Transform. #921 (fnothaft)
  • Change the README to use adam-shell -i instead of pasting #919 (andrewmchen)
  • [ADAM-916] New strategy for writing header. #917 (fnothaft)
  • [ADAM-914] Create a GenomicRegionPartitioner given a partition count. #915 (fnothaft)
  • Squashed #907 and ran format-sources #908 (fnothaft)
  • Various small fixes #907 (huitseeker)
  • ADAM-599, 905: Move to bdg-formats:0.7.0 and migrate metadata #906 (fnothaft)
  • Rewrote the getType method to handle all ploidy levels #904 (NeillGibson)
  • Single file save from #733, rebased #901 (fnothaft)
  • Added is* genotype methods from HTS-JDK Genotype to RichGenotype #895 (NeillGibson)
  • [ADAM-891] Mark SparkContext as @transient. #894 (fnothaft)
  • Update README URLs based on HTTP redirects #892 (ReadmeCritic)
  • adding --version command line option #888 (heuermh)
  • Add exception in move_to_scala_2.11.sh for maven-javadoc-plugin #887 (heuermh)
  • Fix tightlist bug in Pandoc #885 (massie)
  • [ADAM-883] Add caching to Transform pipeline. #884 (fnothaft)

Version 0.18.2

  • ISSUE 877: Minor fix to commit script to support https.
  • ISSUE 876: Separate command line argument words by underscores
  • ISSUE 875: P Operator parsing for MDTag
  • ISSUE 873: [ADAM-872] Modify regex to capture release and SNAPSHOT jars but not javadoc or sources jars
  • ISSUE 866: [ADAM-864] Don't force shuffle if reducing partition count.
  • ISSUE 856: export valid fastq
  • ISSUE 847: Updating build dependency versions to latest minor versions

Version 0.18.1

  • ISSUE 870: [ADAM-867] add pull requests missing from 0.18.0 release to CHANGES.md
  • ISSUE 869: [ADAM-868] make release branch and tag names consistent
  • ISSUE 862: [ADAM-861] use -d to check for repo assembly dir

Version 0.18.0

  • ISSUE 860: New release and pr-commit scripts
  • ISSUE 859: [ADAM-857] Corrected handling of env vars in bin scripts
  • ISSUE 854: [ADAM-853] allow main class in adam-submit to be specified
  • ISSUE 852: [ADAM-851] Silenced Parquet logging.
  • ISSUE 850: [ADAM-848] TwoBitFile now supports nBlocks and maskBlocks
  • ISSUE 846: Updating maven build plugin dependency versions
  • ISSUE 845: [ADAM-780] Make DecadentRead package private.
  • ISSUE 844: [ADAM-843] Aggressively project out metadata fields.
  • ISSUE 840: fix flagstat output file encoding
  • ISSUE 839: let flagstat write to file
  • ISSUE 831: Support loading paired fastqs
  • ISSUE 830: better validation when saving paired fastqs
  • ISSUE 829: fix Long != null warnings
  • ISSUE 819: Implement custom ReferenceRegion hashcode
  • ISSUE 816: [ADAM-793] adding command to convert ADAM nucleotide contig fragments to FASTA files
  • ISSUE 815: Upgrade to bdg-formats:0.6.0, add Fragment datatype converters
  • ISSUE 814: [ADAM-812] fix for javadoc errors on JDK8
  • ISSUE 813: [ADAM-808] build an assembly cli jar with maven shade plugin
  • ISSUE 810: [ADAM-807] workaround for git-commit-id/git-commit-id-maven-plugin#61
  • ISSUE 809: [ADAM-785] Add support for all numeric array (TYPE=B) tags
  • ISSUE 806: [ADAM-755] updating utils dependency version to 0.2.3
  • ISSUE 805: Better transform error when file doesn't exist
  • ISSUE 803: fix unmapped-read sorting
  • ISSUE 802: stop writing contig names as md5 sums
  • ISSUE 798: fix SAM-attr conversion bug; int[]'s not byte[]'s
  • ISSUE 790: optionally add MDTags to reads with transform
  • ISSUE 782: Fix SAM Attribute parser for numeric array tags
  • ISSUE 773: [ADAM-772] fix some bash var quoting
  • ISSUE 765: [ADAM-752] Build for many combos of Spark/Hadoop versions.
  • ISSUE 764: More involved README restructuring
  • ISSUE 762: [ADAM-132] allowing list of commands to be injected into adam-cli ADAMMain

Version 0.17.1

  • ISSUE 784: [ADAM-783] Write @SQ header lines in sorted order.
  • ISSUE 792: [ADAM-791] Add repartition parameter to Fasta2ADAM.
  • ISSUE 781: [ADAM-777] Add validation stringency flag for BQSR.
  • ISSUE 757: We should print a warning message if the user has ADAM_OPTS set.
  • ISSUE 770: [ADAM-769] Fix serialization issue in known indel consensus model.
  • ISSUE 763: Clean up README links, other nits
  • ISSUE 749: Remove adam-cli jar from classpath during adam-submit
  • ISSUE 754: Bump ADAM to Spark 1.4
  • ISSUE 753: Bump Spark to 1.4
  • ISSUE 748: Fix for mdtag issues with insertions
  • ISSUE 746: Upgrade to Parquet 1.8.1.
  • ISSUE 744: [ADAM-743] exclude conflicting jackson dependencies
  • ISSUE 737: Reverse complement negative strand reads in fastq output
  • ISSUE 731: Fixed bug preventing use of TLEN attribute
  • ISSUE 730: [ADAM-729] Stuff TLEN into attributes.
  • ISSUE 728: [ADAM-709] Remove FeatureHierarchy and FeatureHierarchySuite
  • ISSUE 719: [ADAM-718] Use filesystem path to get underlying file system.
  • ISSUE 712: unify header-setting between BAM/SAM and VCF
  • ISSUE 696: include SequenceRecords from second-in-pair reads
  • ISSUE 698: class-ify ShuffleRegionJoin, force setting seqdict
  • ISSUE 706: restore clause guarding pruneCache check
  • ISSUE 705: GeneFeatureRDDFunctions → FeatureRDDFunctions

Version 0.17.0

  • ISSUE 691: fix BAM/SAM header setting when writing on cluster
  • ISSUE 688: make adamLoad public
  • ISSUE 694: Fix parent reference in distribution module
  • ISSUE 684: a few region-join nits
  • ISSUE 682: [ADAM-681] Remove menacing error message about reqd .adam extension
  • ISSUE 680: [ADAM-674] Delete Bam2ADAM.
  • ISSUE 678: upgrade to bdg utils 0.2.1
  • ISSUE 668: [ADAM-597] Move correction out of ADAM and into a downstream project.
  • ISSUE 671: Bug fix in ReferenceUtils.unionReferenceSet
  • ISSUE 667: [ADAM-666] Clean up key not found error in partitioner code.
  • ISSUE 656: Update Vcf2ADAM.scala
  • ISSUE 652: added filterByOverlappingRegion in GeneFeatureRDDFunctions
  • ISSUE 650: [ADAM-649] Support transform of all BAM/SAM files in a directory.
  • ISSUE 647: [ADAM-646] Special case reads with '*' quality during BQSR.
  • ISSUE 645: [ADAM-634] Create a local ParquetLister for testing purposes.
  • ISSUE 633: [Adam] Tests for SAMRecordConverter.scala
  • ISSUE 641: [ADAM-640] Fix incorrect exclusion for org.seqdoop.htsjdk.
  • ISSUE 632: [ADAM-631] Allow VCF conversion to sort on output after coalescing.
  • ISSUE 628: [ADAM-627] Makes ReferenceFile trait extend Serializable.
  • ISSUE 637: check for mac brew alternate spark install structure
  • ISSUE 624: Conceptual fix for duplicate marking and sorting stragglers
  • ISSUE 629: [ADAM-604] Remove normalization code.
  • ISSUE 630: Add flatten command.
  • ISSUE 619: [ADAM-540] Move to new HTSJDK release; should support Java 8.
  • ISSUE 626: [ADAM-625] Enable globbing for BAM.
  • ISSUE 621: Removes the predicates package.
  • ISSUE 620: [ADAM-600] Adding RegionJoin trait.
  • ISSUE 616: [ADAM-565] Upgrade to Parquet filter2 API.
  • ISSUE 613: [ADAM-612] Point to proper k-mer counters.
  • ISSUE 588: [ADAM-587] Clean up loading checks.
  • ISSUE 592: [ADAM-513] Remove ReferenceMappable trait.
  • ISSUE 606: [ADAM-605] Remove visualization code.
  • ISSUE 596: [ADAM-595] Delete the 'comparisons' code.
  • ISSUE 590: [ADAM-589] Removed pileup code.
  • ISSUE 586: [ADAM-452] Fixes SM attribute on ADAM to BAM conversion.
  • ISSUE 584: [ADAM-583] Add k-mer counting functionality for nucleotide contig fragments

Version 0.16.0

  • ISSUE 570: A few small conversion fixes
  • ISSUE 579: [ADAM-578] Update end of read when trimming.
  • ISSUE 564: [ADAM-563] Add warning message when saving Parquet files with incorrect extension
  • ISSUE 576: Changed hashCode implementations to improve performance of BQSR
  • ISSUE 569: Typo in the narrowPeak parser
  • ISSUE 568: Moved the Timers object from bdg-utils back to ADAM
  • ISSUE 478: Move non-genomics code
  • ISSUE 550: [ADAM-549] Added documentation for testing and CI for ADAM.
  • ISSUE 555: Makes maybeLoadVCF private.
  • ISSUE 558: Makes Features2ADAMSuite use SparkFunSuite
  • ISSUE 557: Randomize ports and turn off Spark UI to reduce bind exceptions in tests
  • ISSUE 552: Create test suite for FlagStat
  • ISSUE 554: privatize ADAMContext.maybeLoad{Bam,Fastq}
  • ISSUE 551: [ADAM-386] Multiline FASTQ input
  • ISSUE 542: Variants Visualization
  • ISSUE 545: [ADAM-543][ADAM-544] Fix issues with ADAM scripts and classpath
  • ISSUE 535: [ADAM-441] put a check in for Nothing. Throws an IAE if no return type is provided
  • ISSUE 546: [ADAM-532] Fix wigFix intermittent test failure
  • ISSUE 534: [ADAM-528][ADAM-533] Adds new RegionJoin impl that is shuffle-based
  • ISSUE 531: [ADAM-529] Attaching scaladoc to released distribution.
  • ISSUE 413: [ADAM-409][ADAM-520] Added local wigfix2bed tool
  • ISSUE 527: [ADAM-526] VcfAnnotation2ADAM only counts once
  • ISSUE 523: don't open non-.adam-extension files as ADAM files
  • ISSUE 521: quieting wget output
  • ISSUE 482: [ADAM-462] Coverage region calculation
  • ISSUE 515: [ADAM-510] fix for bash syntax error; add ADDL_JARS check to adam-submit

Version 0.15.0

  • ISSUE 509: Add a 'distribution' module to create assemblies
  • ISSUE 508: Upgrade from Parquet 1.4.3 to 1.6.0rc4
  • ISSUE 498: [ADAM-496] Changes VCF to flat ADAM command name and usage
  • ISSUE 500: [ADAM-495] Require SPARK_HOME for adam-submit
  • ISSUE 501: [ADAM-499] Add -onlyvariants option to vcf2adam
  • ISSUE 507: [ADAM-505] Removed adam-local from docs
  • ISSUE 504: [ADAM-502] Add missing Long implicit to ColumnReaderInput
  • ISSUE 503: [ADAM-473] Make RecordCondition and FieldCondition public
  • ISSUE 494: Fix foreach block for vcf ingest
  • ISSUE 492: Documentation cleanup and style improvements
  • ISSUE 481: [ADAM-480] Switch assembly to single goal.
  • ISSUE 487: [ADAM-486] Add port option to viz command.
  • ISSUE 469: [ADAM-461] Fix ReferenceRegion and ReferencePosition impl
  • ISSUE 440: [ADAM-439] Fix ADAM to account for BDG-FORMATS-35: Avro uses Strings
  • ISSUE 470: added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions
  • ISSUE 468: refactor RDD loading; explicitly load alignments
  • ISSUE 474: Consolidate documentation into a single location in source.
  • ISSUE 471: Fixed typo on MAVEN_OPTS quotation mark
  • ISSUE 467: [ADAM-436] Optionally output original qualities to fastq
  • ISSUE 451: add adam view command, analogous to samtools view
  • ISSUE 466: working examples on .sam included in repo
  • ISSUE 458: Remove unused val from Reads2Ref
  • ISSUE 438: Add ability to save paired-FASTQ files
  • ISSUE 457: A few random Predicate-related cleanups
  • ISSUE 459: a few tweaks to scripts/jenkins-test
  • ISSUE 460: Project only the sequence when kmer/qmer counting
  • ISSUE 450: Refactor some file writing and reading logic
  • ISSUE 455: [ADAM-454] Add serializers for Avro objects which don't have serializers
  • ISSUE 447: Update the contribution guidelines
  • ISSUE 453: Better null handling for isSameContig utility
  • ISSUE 417: Stores original position and original cigar during realignment.
  • ISSUE 449: read “OQ” attr from structured SAMRecord field
  • ISSUE 446: Revert "[ADAM-237] Migrate to Chill serialization libraries."
  • ISSUE 437: random nits
  • ISSUE 434: Few transform tweaks
  • ISSUE 435: [ADAM-403] Remove seqDict from RegionJoin
  • ISSUE 431: A few tweaks, typo corrections, and random cleanups
  • ISSUE 430: [ADAM-429] adam-submit now handles args correctly.
  • ISSUE 427: Fixes for indel realigner issues
  • ISSUE 418: [ADAM-416] Removing 'ADAM' prefix
  • ISSUE 404: [ADAM-327] Adding gene, transcript, and exon models.
  • ISSUE 414: Fix error in adam-local alias
  • ISSUE 415: Update README.md to reflect Spark 1.1
  • ISSUE 412: [ADAM-411] Updated usage aliases in README. Fixes #411.
  • ISSUE 408: [ADAM-405] Add FASTQ output.
  • ISSUE 385: [ADAM-384] Adds import from FASTQ.
  • ISSUE 400: [ADAM-399] Fix link to schemas.
  • ISSUE 396: [ADAM-388] Sets Kryo serialization with --conf args
  • ISSUE 394: [ADAM-393] Adds knobs to SparkContext creation in SparkFunSuite
  • ISSUE 391: [ADAM-237] Migrate to Chill serialization libraries.
  • ISSUE 380: Rewrite of MarkDuplicates which seems to improve performance
  • ISSUE 387: fix some deprecation warnings

Version 0.14.0

  • ISSUE 376: [ADAM-375] Upgrade to Hadoop-BAM 7.0.0.
  • ISSUE 378: [ADAM-360] Upgrade to Spark 1.1.0.
  • ISSUE 379: Fix the position of the jar path in the submit.
  • ISSUE 383: Make Mdtags handle '=' and 'X' cigar operators
  • ISSUE 369: [ADAM-369] Improve debug output for indel realigner
  • ISSUE 377: [ADAM-377] Update to Jenkins scripts and README.
  • ISSUE 374: [ADAM-372][ADAM-371][ADAM-365] Refactoring CLI to simplify and integrate with Spark model better
  • ISSUE 370: [ADAM-367] Updated alias in README.md
  • ISSUE 368: erasure, nonexhaustive-match, deprecation warnings
  • ISSUE 354: [ADAM-353] Fixing issue with SAM/BAM/VCF header attachment when running distributed
  • ISSUE 357: [ADAM-357] Added Java Plugin hook for ADAM.
  • ISSUE 352: Fix failing MD tag
  • ISSUE 363: Adding maven assembly plugin configuration to create tarballs
  • ISSUE 364: [ADAM-364] Fixing remaining cs.berkeley.edu URLs.
  • ISSUE 362: Remove mention of uberjar from README

Version 0.13.0

  • ISSUE 343: Allow retrying on failure for HTTPRangedByteAccess
  • ISSUE 349: Fix for a NullPointerException when hostname is null in Task Metrics
  • ISSUE 347: Bug fix for genome browser
  • ISSUE 346: Genome visualization
  • ISSUE 342: [ADAM-309] Update to bdg-formats 0.2.0
  • ISSUE 333: [ADAM-332] Upgrades ADAM to Spark 1.0.1.
  • ISSUE 341: [ADAM-340] Adding the TrackedLayout trait and implementation.
  • ISSUE 337: [ADAM-335] Updated README.md to reflect migration to appassembler.
  • ISSUE 311: Adding several simple normalizations.
  • ISSUE 330: Make mismatch and deletes positions accessible
  • ISSUE 334: Moving code coverage into a profile
  • ISSUE 329: Add count of mismatches to mdtag
  • ISSUE 328: [ADAM-326] Adding a 5-second retry on the HttpRangedByteAccess test.
  • ISSUE 325: Adding documentation for commit/issue nomenclature and rebasing

Version 0.12.1

  • ISSUE 308: Fixing the 'index 0' bug in features2adam
  • ISSUE 306: Adding code for lifting over between sequences and the reference genome.
  • ISSUE 320: Remove extraneous implicit methods in ReferenceMappingContext
  • ISSUE 314: Updates to indel realigner to improve performance and accuracy.
  • ISSUE 319: Adding scripts for publishing scaladoc.
  • ISSUE 315: Added table of (wall-clock) stage durations when print_metrics is used
  • ISSUE 312: Fixing sources jar
  • ISSUE 313: Making the CredentialsProperties file optional
  • ISSUE 267: Parquet and indexed Parquet RDD implementations, and indices.
  • ISSUE 301: Add Beacon's AlleleCount
  • ISSUE 293: Add aggregation and display of metrics obtained from Spark
  • ISSUE 295: Fix broken link to ADAM specification for storing reads.
  • ISSUE 292: Cleaning up scaladoc generation warnings.
  • ISSUE 289: Modifying interleaved fastq format to be hadoop version independent.
  • ISSUE 288: Add ADAMFeature to Kryo registrator
  • ISSUE 286: Removing some debug printout that was left in.
  • ISSUE 287: Cleaning hadoop dependencies
  • ISSUE 285: Refactoring read groups to increase the amount of data stored.
  • ISSUE 284: Cleaning up build warnings.
  • ISSUE 280: Move to bdg-formats
  • ISSUE 283: Fix reference name comment
  • ISSUE 282: Minor cleanup on interleaved FASTQ input format.
  • ISSUE 277: Implemented HTTPRangedByteAccess.
  • ISSUE 274: Added clarifying note to ADAMVariantContext
  • ISSUE 279: Simplify format-source
  • ISSUE 278: Use maven license plugin to ensure source has correct license
  • ISSUE 268: Adding fixed depth prefix trie implementation
  • ISSUE 273: Fixes issue in reference models where strings are not sanitized on collection from avro.
  • ISSUE 272: Created command categories
  • ISSUE 269: Adding k-mer and q-mer counting.
  • ISSUE 271: Consolidate Parquet logging configuration

Version 0.12.0

  • ISSUE 264: Parquet-related Utility Classes
  • ISSUE 259: ADAMFlatGenotype is a smaller, flat version of a genotype schema
  • ISSUE 266: Removed extra command 'BuildInformation'
  • ISSUE 263: Added AdamContext.referenceLengthFromCigar
  • ISSUE 260: Modifying conversion code to resolve #112.
  • ISSUE 258: Adding an 'args' parameter to the plugin framework.
  • ISSUE 262: Adding reference assembly name to ADAMContig.
  • ISSUE 256: Upgrading to Spark 1.0
  • ISSUE 257: Adds toString method for sequence dictionary.
  • ISSUE 255: Add equals, canEqual, and hashCode methods to MdTag class

Version 0.11.0

  • ISSUE 254: Cleanup import statements
  • ISSUE 250: Adding ADAM to SAM conversion.
  • ISSUE 248: Adding utilities for read trimming.
  • ISSUE 252: Added a note about rebasing-off-master to CONTRIBUTING.md
  • ISSUE 249: Cosmetic changes to FastaConverter and FastaConverterSuite.
  • ISSUE 251: CHANGES.md is updated at release instead of per pull request
  • ISSUE 247: For #244, Fragments were incorrect order and incomplete
  • ISSUE 246: Making sample ID field in genotype nullable.
  • ISSUE 245: Adding ADAMContig back to ADAMVariant.
  • ISSUE 243: Rebase PR#238 onto master

Version 0.10.0

  • ISSUE 242: Upgrade to Parquet 1.4.3
  • ISSUE 241: Fixes to FASTA code to properly handle indices.
  • ISSUE 239: Make ADAMVCFOutputFormat public
  • ISSUE 233: Build up reference information during cigar processing
  • ISSUE 234: Predicate to filter conversion
  • ISSUE 235: Remove unused contiglength field
  • ISSUE 232: Add -pretty and -o to the print command
  • ISSUE 230: Remove duplicate mdtag field
  • ISSUE 231: Helper scripts to run an ADAM Console.
  • ISSUE 226: Fix ReferenceRegion from ADAMRecord
  • ISSUE 225: Change Some to Option to check for unmapped reads
  • ISSUE 223: Use SparkConf object to configure SparkContext
  • ISSUE 217: Stop using reference IDs and use reference names instead
  • ISSUE 220: Update SAM to ADAM conversion
  • ISSUE 213: BQSR updates

Version 0.9.0

  • ISSUE 214: Upgrade to Spark 0.9.1
  • ISSUE 211: FastaConverter Refactor
  • ISSUE 212: Cleanup build warnings
  • ISSUE 210: Remove Scalariform from process-sources phase
  • ISSUE 209: Fix Scalariform issues and Maven warnings
  • ISSUE 207: Change from deprecated manifest erasure to runtimeClass
  • ISSUE 206: Add Scalariform settings to pom
  • ISSUE 204: Update Avro code gen to not mark fields as deprecated.

Version 0.8.0

  • ISSUE 203: Move package from edu.berkeley.cs.amplab to org.bdgenomics
  • ISSUE 199: Updating pileup conversion code to convert sequences that use the X and = (EQ) CIGAR operators
  • ISSUE 191: Add repartition parameter
  • ISSUE 183: Fixing Job.getInstance call that breaks hadoop 1 compatibility.
  • ISSUE 192: Add docs and scripts for creating a release
  • ISSUE 193: Issue #137, clarify role of CHANGES.{md,txt}

Version 0.7.2

  • ISSUE 187: Add summarize_genotypes command
  • ISSUE 178: Upgraded to Hadoop-BAM 0.6.2/Picard 1.107.
  • ISSUE 173: Parse annotations out of vcf files
  • ISSUE 162: Refactored SequenceDictionary
  • ISSUE 180: BQSR using vcf loader
  • ISSUE 179: Update maven-surefire-plugin dependency version to 2.17, also create an ...
  • ISSUE 175: VariantContext converter refactor
  • ISSUE 169: Cleaning up mpileup command
  • ISSUE 170: Adding variant field enumerations

Version 0.7.1

Version 0.7.3

Version 0.7.2

  • ISSUE 166: Pair-wise genotype concordance of genotype RDDs, with CLI tool

Version 0.7.0

  • ISSUE 171: Add back in allele dosage for genotypes.

Version 0.7.0

  • ISSUE 167: Fix for Hadoop 1.0.x support
  • ISSUE 165: call PluginExecutor in apply method, fixes issue 164
  • ISSUE 160: Refactoring FASTA work to break contig sizes.
  • ISSUE 78: Upgrade to Spark 0.9 and Scala 2.10
  • ISSUE 138: Display Git commit info on command line
  • ISSUE 161: Added switches to spark context creation code
  • ISSUE 117: Add a "range join" method.
  • ISSUE 151: Vcf work concordance and genotype
  • ISSUE 150: Remaining variant changes for adam2vcf, unit tests, and CLI modifications
  • ISSUE 147: Resurrect VCF conversion code
  • ISSUE 148: Moving createSparkContext into core
  • ISSUE 142: Enforce Maven and Java versions
  • ISSUE 144: Merge of last few days of work on master into this branch
  • ISSUE 124: Vcf work rdd master merge
  • ISSUE 143: Changing package declaration to match test file location and removing un...
  • ISSUE 140: Update README.md
  • ISSUE 139: Update README.md
  • ISSUE 129: Modified pileup transforms to improve performance + to add options
  • ISSUE 116: add fastq interleaver script
  • ISSUE 125: Add design doc to CONTRIBUTING document
  • ISSUE 114: Changes to RDD utility files for new variant schema
  • ISSUE 122: Add IRC Channel to readme
  • ISSUE 100: CLI component changes for new variant schema
  • ISSUE 108: Adding new PluginExecutor command
  • ISSUE 98: Vcf work remove old variant
  • ISSUE 104: Added the port erasure to SparkFunSuite's cleanup.
  • ISSUE 107: Cleaning up change documentation.
  • ISSUE 99: Encoding tag types in the ADAMRecord attributes, adding the 'tags' command
  • ISSUE 105: Add initial documentation on contributing
  • ISSUE 97: New schema, variant context converter changes, and removal of old genoty...
  • ISSUE 79: Adding ability to convert reference FASTA files for nucleotide sequences
  • ISSUE 91: Minor change, increase adam-cli usage width to 150 characters
  • ISSUE 86: Fixes to pileup code
  • ISSUE 88: Added function for building variant context from genotypes.
  • ISSUE 81: Update README and cleanup top-level cli help text
  • ISSUE 76: Changing hadoop fs call to be compatible with Hadoop 1.
  • ISSUE 74: Updated CHANGES.txt to include note about the recursive-load branch.
  • ISSUE 73: Support for loading/combining multiple ADAM files into a single RDD.
  • ISSUE 72: Added ability to create regions from reads, and to merge adjacent regions
  • ISSUE 71: Change RecalTable to use optimized phred calculations
  • ISSUE 68: sonatype-nexus-snapshots repository is already in parent oss-parent-7 pom
  • ISSUE 67: fix for wildcard exclusion maven warnings
  • ISSUE 65: Create a cache for phred -> double values instead of recalculating
  • ISSUE 60: Bugfix for BQSR: Offset into qualityScore list was wrong
  • ISSUE 66: add pluginDependency section and remove versions in plugin sections
  • ISSUE 61: Filter utility for inverse of Projection
  • ISSUE 48: Fix read groups mapping and add Y as base type
  • ISSUE 36: Adding reads to rods transformation.
  • ISSUE 56: Adding Yy as base in MdTag

Version 0.6.0

  • ISSUE 53: Fix Hadoop 2.2.0 support, upgrade to Spark 0.8.1
  • ISSUE 52: Attributes: Use '\t' instead of ',', as ',' is a valid character
  • ISSUE 47: Adding containsRefName to SequenceDictionary
  • ISSUE 46: Reduce logging for the actual adamSave job
  • ISSUE 45: Make MdTag immutable
  • ISSUE 38: Small bugfixes and cleanups to BQSR
  • ISSUE 40: Fixing reference position from offset implementation
  • ISSUE 31: Fixing a few issues in the ADAM2VCF2ADAM pipeline.
  • ISSUE 30: Suppress parquet logging in FieldEnumerationSuite
  • ISSUE 28: Fix build warnings
  • ISSUE 24: Add unit tests for marking duplicates
  • ISSUE 26: Fix unmapped reads in sequence dictionary
  • ISSUE 23: Generalizing the Projection class
  • ISSUE 25: Adding support for before, after clauses to SparkFunSuite.
  • ISSUE 22: Add a unit test for sorting reads
  • ISSUE 21: Adding rod functionality: a specialized grouping of pileup data.
  • ISSUE 13: Cleaning up VCF<->ADAM pipeline
  • ISSUE 20: Added Apache License 2.0 boilerplate to tops of all the GB-(c) files
  • ISSUE 19: Allow the Hadoop version to be specified
  • ISSUE 17: Fix transform -sort_reads partitioning. Add -coalesce option to transform.
  • ISSUE 16: Fixing an issue in pileup generation and in the MdTag util.
  • ISSUE 15: Tweaks 1
  • ISSUE 12: Subclass testing bug in AdamContext.adamLoad
  • ISSUE 11: Missing brackets in VcfConverter.getType
  • ISSUE 10: Moved record field name enum over to the projections package.
  • ISSUE 8: Fixes to sorting in ReferencePosition
  • ISSUE 4: New SparkFunSuite test support class, logging util and new BQSR test.
  • ISSUE 1: Fix scalatest configuration and fix unit tests
  • ISSUE 14: Converting some of the Option() calls to Some()
  • ISSUE 9: Adding support for a Sequence Dictionary from BAM files
  • ISSUE 7: ADAM variant and genotype formats; and a VCF->ADAM converter
  • ISSUE 3: Adding in implicit conversion functions for going between Java and Scala...
  • ISSUE 2: Update from Spark 0.7.3 to 0.8.0-incubating