ADAM Changelog
Trunk (not yet released)
NEW FEATURES
OPTIMIZATIONS
IMPROVEMENTS
BUG FIXES
ADAM 0.7.0
NEW FEATURES
* Added ability to load and merge multiple ADAM files into a single RDD.
* Pairwise, quantitative ADAM file comparisons: the CompareAdam command has been extended to calculate
metrics on pairs of ADAM files which contain the same reads processed in two different ways (e.g.
two different implementations of a pre-processing pipeline). This can be used to compare different
pipelines based on their read-by-read concordance across a number of fields: position, alignment,
mapping and base quality scores, and can be extended to support new metrics or aggregations.
* Added FASTA import, and RDD convenience functions for remapping contig IDs. This allows for reference
sequences to be imported into an efficient record where bases are stored as a list of enums. Additionally,
convenience values are calculated. This feature was introduced in PR #79 and is a breaking change.
* Added helper functions for properly generating VCF headers for VCF export. This streamlines the process
of converting ADAM Variant Calls to the legacy VCF format. This was added in PR#85.
* Added functions to ADAMVariantContext that allow a variant context to be built directly from genotypes.
Previously, this operation could only be done at the RDD level. This was introduced in PR#88.
* Added API functions and CLI tools for merging multiple ADAM files. This code performs a smart merge and
ensures that there are no collisions between reference IDs or read group IDs. These features were added
in PR#73.
* Added ADAMRod model and Reads2Rods transformation; this is a pileup generation function that better takes
advantage of locality for data that is already sorted. This was introduced in PR#36.
* ISSUE 101: Added the ability to call plugins from the command line that are not defined in the main
ADAM jar but are included on the classpath.
* ISSUE 83: Added the ability to perform a "region join" on RDDs of ADAMRecords.
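The "region join" above can be illustrated with a minimal sketch. This is plain Python rather than ADAM's
Scala/Spark API, and the (start, end, payload) record layout is a hypothetical stand-in for ADAMRecords:

```python
def region_join(left, right):
    """Pair records from two collections whose regions overlap.

    Each record is a (start, end, payload) tuple; regions are half-open
    intervals on the same reference sequence. A real implementation would
    partition by reference ID and distribute the work across an RDD, but
    the core overlap test is the same.
    """
    joined = []
    for left_start, left_end, left_payload in left:
        for right_start, right_end, right_payload in right:
            # Half-open intervals overlap when each starts before the other ends.
            if left_start < right_end and right_start < left_end:
                joined.append((left_payload, right_payload))
    return joined

# Hypothetical example data: two reads joined against two target regions.
reads = [(100, 150, "read1"), (400, 450, "read2")]
targets = [(120, 200, "exon1"), (300, 350, "exon2")]
```

Here only read1 overlaps a target, so the join yields the single pair ("read1", "exon1").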
OPTIMIZATIONS
* Transformed the phred --> double calculation into a LUT, which improves performance. This change was
introduced into the API in PR#65 and was a breaking change. It was then propagated into BQSR by PR#71.
* Removed unnecessary count during pileup conversion; this count substantially increased the time it took to
do pileup conversion. This was introduced in PR#125 which came out of issue #121.
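The phred --> double LUT optimization above follows from the standard phred definition, where quality Q
encodes an error probability of 10^(-Q/10). A minimal Python sketch of the idea, with hypothetical names
(ADAM's actual implementation lives in its Scala PhredUtils):

```python
# Precompute every value for the practical phred range once, so hot-path
# conversion is a constant-time table lookup instead of a pow() call per base.
PHRED_TO_ERROR = [10.0 ** (-q / 10.0) for q in range(256)]

def phred_to_error_probability(quality):
    """Convert an integer phred score to an error probability via the LUT."""
    return PHRED_TO_ERROR[quality]
```

For example, a phred score of 10 maps to an error probability of 0.1, and a score of 20 maps to 0.01.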
IMPROVEMENTS
* ISSUE 148: Moved SparkContext creation code for ADAM to the adam-core module instead of adam-cli. This allows downstream
users to depend only on adam-core instead of adam-cli.
* ISSUE 92: Improved the representation of the types of 'optional' fields from the BAM, and their encoding
in the 'attributes' field of ADAMRecord. This encoding now includes the type, and should no longer be
lossy, therefore making it possible to write code to re-export a BAM from the ADAM file in the future.
* Added code to reference region model that allowed for the creation of regions from individual reads, and
that allowed for adjacent non-overlapping regions to merge together. This was added in PR#73.
* Added code to the projection package which allows for the creation of an inverse projection from a given
projection. This was added in PR#61.
* CLI option printout width was increased to 150 characters to improve display on large monitors. This was
added by PR#91.
* Various build improvements were added by PR#68 and PR#66.
* Added an option to pileup transformations to set whether reads that are not at their primary alignment
positions should be converted into pileups. By default, we only convert reads at a primary mapping location.
This does not introduce incompatible changes to the API, but it changes the API's behavior. This change was
introduced in PR#125, which came out of issue #121.
* Added switches to AdamContext creation code to allow for configuration of Kryo buffer size, and to allow
registration of job statistics listener for profiling. This change was introduced in PR#161 which came out
of issue #149.
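The adjacent-region merge described for the reference region model (PR#73) can be sketched as follows.
This is a plain Python illustration with hypothetical names, assuming half-open (start, end) coordinates:

```python
def merge_regions(regions):
    """Merge overlapping or exactly adjacent half-open (start, end) regions.

    After sorting by start, each region either extends the previous merged
    region (when it overlaps or abuts it, i.e. start <= previous end) or
    begins a new one.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps or is directly adjacent: extend the previous region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For instance, (0, 10) and (10, 20) abut and merge into (0, 20), while (30, 40) stays separate.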
BUG FIXES
* Fixed issues where VCF header was not being written correctly. This prevented variant calls from being
written after conversion. This was fixed in PR#85.
* Fixed a possible issue where pileup generation may have been ignoring reference ID when grouping pileups
into rods. This was fixed in PR#86. This necessitated changing the ADAMRod model introduced in PR#36;
however, this is not an API-breaking change as ADAMRod did not appear in a previous release.
* Fixed code which performed Hadoop 1 incompatible HDFS access. This was fixed in PR#76.
* Added Y as a valid base inside of the MdTag utility. The omission of this base caused BAM file import to
fail for some datasets. This was addressed in PR#56. This change was propagated more widely across the
API by PR#48.
* ISSUE 103: Added a call to clearProperty('spark.driver.port') in the cleanup from a sparkTest, so that
we correctly clean up the test Spark workers and avoid errors about attempting to bind to a port that's
already in use.
* ISSUE 109: Added code that splits very large assemblies into fragments. This improves a performance
bottleneck when running on machines with finite memory. This fix was added in PR#160.
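The assembly-splitting fix in ISSUE 109 amounts to chunking a long contig sequence into fixed-length
fragments that remember their offsets, so no single record holds an entire large assembly in memory.
A minimal Python sketch under those assumptions (names are hypothetical, not ADAM's API):

```python
def fragment_sequence(sequence, fragment_length):
    """Split a long contig sequence into fixed-length fragments.

    Returns (offset, subsequence) pairs so each fragment records its
    position within the original contig; the last fragment may be shorter.
    """
    return [(offset, sequence[offset:offset + fragment_length])
            for offset in range(0, len(sequence), fragment_length)]
```

A 10-base sequence split at length 4 yields fragments at offsets 0, 4, and 8, the last only 2 bases long.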
BREAKING CHANGES
* ADAMFasta was changed to ADAMNucleotideContig, and internal field types and names were changed in PR #79.
* When optimizing the phred --> double calculation, several public methods in PhredUtils were renamed to
clarify their operations.