Somatic CNV tagging germline events, bringing AnnotatedInterval in line with tribble reading (and gearing up for writing as well), and region merging. #4276

LeeTL1220 · 2018-01-26T20:43:32Z

Most of these changes are to support automated evaluation of GATK CNV.

Updates AnnotatedIntervals (formerly SimpleAnnotatedGenomicRegion) to use the tribble framework for reading. Writing is done in a way that should be concordant with a future tribble writing framework, as per discussion with @droazen.
Changes to XsvLocatableTableCodec to support usage of arbitrary config files. This cannot be done when using tribble features in the CLI. Already reviewed with @jonn-smith. Support for SAM File headers and comments is included.
Note: The reading of AnnotatedIntervals cannot be done automatically on the command line, unless the config file is a sibling. The tools below do not even attempt this, since the use cases involved will never have a sibling config file.
Created a default config file in the jar file resources to read tsvs with locatable fields from the CNV collection files. This is much less strict than the framework used by the CNV tools. The reader will accept any columns (or subset of the columns).

CLIs (both experimental quality):

TagGermlineEvents is a simple tool that attempts to identify events in a tumor seg file that correspond to germline events.
- This is done purely with concordance on the breakpoints of the events (within some padding).
- Input germline segments must have calls.
- If a germline call is broken into multiple segments, this tool will handle that appropriately (ditto if there are multiple tumor segments overlapping the germline call).
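The breakpoint concordance described in these bullets can be illustrated with a small sketch. This is not the code in this PR: the class `GermlineTagSketch`, its `Segment` type, and the `tag` method are hypothetical names, the call values "+", "-", and "0" are assumed from the test data in this thread, and the sketch only covers the one-to-one case (it does not handle a germline call split across several segments).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of breakpoint concordance with padding; not the GATK implementation. */
final class GermlineTagSketch {

    /** Minimal segment: contig, inclusive start/end, and a call ("+", "-", or "0" for neutral). */
    static final class Segment {
        final String contig;
        final int start;
        final int end;
        final String call;

        Segment(final String contig, final int start, final int end, final String call) {
            this.contig = contig;
            this.start = start;
            this.end = end;
            this.call = call;
        }
    }

    /** True if the two positions are within {@code padding} bases of each other. */
    static boolean near(final int a, final int b, final int padding) {
        return Math.abs(a - b) <= padding;
    }

    /**
     * Returns one tag per tumor segment: the call of the first non-neutral germline segment
     * on the same contig whose start and end both fall within padding of the tumor segment's
     * start and end, or "0" if there is no such germline segment.
     */
    static List<String> tag(final List<Segment> tumor, final List<Segment> germline, final int padding) {
        final List<String> tags = new ArrayList<>();
        for (final Segment t : tumor) {
            String tag = "0";
            for (final Segment g : germline) {
                if (!g.call.equals("0")
                        && g.contig.equals(t.contig)
                        && near(g.start, t.start, padding)
                        && near(g.end, t.end, padding)) {
                    tag = g.call;
                    break;
                }
            }
            tags.add(tag);
        }
        return tags;
    }

    public static void main(final String[] args) {
        final List<Segment> tumor = Arrays.asList(
                new Segment("1", 105, 198, "+"), new Segment("1", 201, 300, "0"));
        final List<Segment> germline = Arrays.asList(
                new Segment("1", 100, 200, "+"), new Segment("1", 201, 500, "0"));
        System.out.println(tag(tumor, germline, 10));
    }
}
```

Requiring both breakpoints to match keeps the sketch strict; a real tool would also need to decide how to treat tumor segments that only partially cover a germline event.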
MergeAnnotatedRegions will merge all overlapping regions and resolve annotation value conflicts.
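As a rough picture of what merging overlapping regions with conflicting annotation values could look like, here is a minimal sketch. The names `MergeRegionsSketch`, `Region`, and `merge` are hypothetical, and joining conflicting values with a separator is an assumption made for illustration; this description does not say how the tool actually resolves conflicts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of merging overlapping regions; not the GATK implementation. */
final class MergeRegionsSketch {

    /** Minimal region: contig, inclusive start/end, and sorted annotation name-value pairs. */
    static final class Region {
        final String contig;
        final int start;
        final int end;
        final TreeMap<String, String> annotations;

        Region(final String contig, final int start, final int end, final Map<String, String> annotations) {
            this.contig = contig;
            this.start = start;
            this.end = end;
            this.annotations = new TreeMap<>(annotations);
        }
    }

    /**
     * Merges regions that overlap on the same contig. When two merged regions disagree on an
     * annotation value, the values are joined with {@code separator} (an assumed policy).
     * Contigs are ordered lexicographically here, not by a sequence dictionary.
     */
    static List<Region> merge(final List<Region> input, final String separator) {
        final List<Region> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((Region r) -> r.contig).thenComparingInt(r -> r.start));

        final List<Region> merged = new ArrayList<>();
        for (final Region r : sorted) {
            final Region last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.contig.equals(r.contig) && r.start <= last.end) {
                // Overlap: extend the previous region and combine the annotations.
                final TreeMap<String, String> combined = new TreeMap<>(last.annotations);
                for (final Map.Entry<String, String> e : r.annotations.entrySet()) {
                    combined.merge(e.getKey(), e.getValue(),
                            (a, b) -> a.equals(b) ? a : a + separator + b);
                }
                merged.set(merged.size() - 1,
                        new Region(last.contig, last.start, Math.max(last.end, r.end), combined));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(final String[] args) {
        final List<Region> regions = new ArrayList<>();
        final Map<String, String> bar = new TreeMap<>();
        bar.put("Foo", "bar");
        final Map<String, String> baz = new TreeMap<>();
        baz.put("Foo", "baz");
        regions.add(new Region("1", 100, 200, bar));
        regions.add(new Region("1", 150, 300, baz));
        for (final Region r : merge(regions, "__")) {
            System.out.println(r.contig + ":" + r.start + "-" + r.end + " " + r.annotations);
        }
    }
}
```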

Closes #3995

codecov-io · 2018-01-29T19:20:18Z

Codecov Report

Merging #4276 into master will increase coverage by 0.039%.
The diff coverage is 82.671%.

@@               Coverage Diff               @@
##              master     #4276       +/-   ##
===============================================
+ Coverage     79.807%   79.846%   +0.039%     
- Complexity     17163     17321      +158     
===============================================
  Files           1067      1075        +8     
  Lines          62439     62891      +452     
  Branches       10138     10176       +38     
===============================================
+ Hits           49831     50216      +385     
- Misses          8661      8699       +38     
- Partials        3947      3976       +29

Impacted Files	Coverage Δ	Complexity Δ
...broadinstitute/hellbender/utils/IntervalUtils.java	`91.532% <ø> (ø)`	`181 <0> (ø)`	⬇️
...tils/codecs/xsvLocatableTable/XsvTableFeature.java	`55.385% <ø> (ø)`	`16 <0> (ø)`	⬇️
.../broadinstitute/hellbender/utils/tsv/DataLine.java	`86.325% <100%> (ø)`	`59 <0> (ø)`	⬇️
...tools/funcotator/dataSources/TableFuncotation.java	`60% <100%> (ø)`	`20 <0> (ø)`	⬇️
.../tools/copynumber/utils/MergeAnnotatedRegions.java	`100% <100%> (ø)`	`3 <3> (?)`
...ils/annotatedinterval/AnnotatedIntervalHeader.java	`100% <100%> (ø)`	`6 <6> (?)`
...tmutpileup/ValidateBasicSomaticShortMutations.java	`85.965% <100%> (ø)`	`7 <0> (ø)`	⬇️
...nder/tools/copynumber/utils/TagGermlineEvents.java	`100% <100%> (ø)`	`3 <3> (?)`
...ataSources/xsv/LocatableXsvFuncotationFactory.java	`85.185% <61.538%> (+0.491%)`	`24 <0> (-4)`	⬇️
...g/broadinstitute/hellbender/utils/io/Resource.java	`55.556% <66.667%> (+3.175%)`	`6 <1> (+1)`	⬆️
... and 21 more

samuelklee

@LeeTL1220 Can you address some of the design issues with SimpleAnnotatedGenomicRegion and the associated classes, as well as change the test files to more accurately reflect the expected inputs (the legacy CNV test files are currently a mishmash of conventions from the new CNV collections, altered locatable column headers, etc.)? I think it might be easier to proceed with the review once some of this refactoring is done.

As we discussed, I was also unable to run the tools from the command line, even though tests are passing. Not sure if you figured out what was going on there, but I also caught another issue that would not have caused tests to fail (i.e., you only allow writing to pre-existing files). There may be other issues lurking, which is why I'd like to be able to run from the command line before reviewing further.

Thanks for being accommodating!

samuelklee · 2018-01-29T20:14:57Z

...number/utils/combine-segment-breakpoints-different-annotation-headers-with-legacy-header.tsv

@@ -1,4 +1,6 @@
-#SAMPLE_NAME=SAMPLE2
+@HD	VN:1.5


I don't understand the format of many of these test files. I assume that the only way we are allowed to modify incoming files is to change the relevant column headers to CONTIG, START, and END (since this is specified by the tool documentation), correct? (Although see comment above about just using the column numbers to specify the locatable columns.)

Then why does this one have a CNV-collections header along with legacy (GATK CNV 1.0) column headers? Under what circumstances would we ever see an input like this?

I don't think any of these test files should contain a CNV-collections header unless they are files that would have been produced by the pipeline without further modification. (Although if you want to be really thorough, you can use a test file that might have been created by CombineSegmentBreakpoints to check that you can further combine it with other segment files. Such a file could conceivably be a mishmash like this one. However, you should name it clearly and document the test accordingly.)

I have removed the -with-legacy-headers from the test file names.

Currently, since this is being driven by a config file (that contains CONTIG, START, and END), the input files are meant to represent files generated by the CNV pipeline.

These files could already conceivably have been generated by CombineSegmentBreakpoints.

Are you asking me to test input files with different config file formats?

My point was that your test files should primarily reflect actual inputs expected by the tool.

Now that you've changed over to config files that identify the column numbers of the locatable fields, I would not expect to see this file as a typical input---it's a mishmash of CNV 1.0/ReCapSeg-style columns, modified CONTIG\tSTART\tEND column headers, and a CNV-collections-style SAM header.

I was just using Segment_Mean and Segment_Call (CNV 1.0 style columns) as two example annotations. While not likely an exact input (unless I was comparing CNV 1.0 to CNV ModeledSegments), the header can really be CONTIG, START, END + anything. The only requirement for the columns for this tool is to have CONTIG, START, and END.

I think that it is somewhat confusing to use this as test input if we'd never really see it in the real world. As a developer who might have to maintain the tests that use this input, it's nice to be able to understand what the origin of the input might be (in case I have to modify it or come up with new inputs). The name of the test file (*-different-annotation-headers-with-legacy-header.tsv) doesn't really provide enough clues in that regard, either.

Why not just use an unmodified ReCapSeg or CNV 1.0 seg file as test input?

Using fake data, I have included test files from JaBbA, UCSC, ReCapSeg (which is like CNV 1.0), PCAWG consensus files, and an Oncotator annotated CNV output.

Done.

samuelklee · 2018-01-29T20:17:43Z

...stitute/hellbender/tools/copynumber/utils/combine-segment-breakpoints-with-legacy-header.tsv

-#SAMPLE_NAME=SAMPLE1
+@HD	VN:1.5
+@SQ	SN:1	LN:2000000
+@RG	ID:GATKCopyNumber	SM:sample1


Why does this filename include "-with-legacy-header"? Isn't this a file produced by CallCopyRatioSegments?

I got rid of all -with-legacy-header ...
Done

samuelklee · 2018-01-29T20:18:25Z

...umber/utils/combine-segment-breakpoints-with-legacy-header-learning-combined-copy-number.tsv

@@ -1,3 +1,6 @@
+@HD	VN:1.5
+@SQ	SN:1	LN:2000000
+@RG	ID:GATKCopyNumber	SM:test-sample


This test file is only used in SimpleAnnotatedGenomicRegionUnitTest, so it should be renamed.

samuelklee · 2018-01-29T20:24:54Z

...ain/java/org/broadinstitute/hellbender/tools/copynumber/utils/CombineSegmentBreakpoints.java

 import java.util.function.Function;
 import java.util.stream.Collectors;

 @CommandLineProgramProperties(
        oneLineSummary = "Combine the breakpoints of two segment files and annotate the resulting intervals with chosen columns from each file.",
        summary = "Combine the breakpoints of two segment files while preserving annotations.\n" +
                "This tool will load all segments into RAM.\n"+
-        "Expected interval columns are: " + SimpleAnnotatedGenomicRegion.CONTIG_HEADER + ", " +
-        SimpleAnnotatedGenomicRegion.START_HEADER + ", " + SimpleAnnotatedGenomicRegion.END_HEADER,
+                "Column headers for locatable information are taken from the first segment file.\n" +


This is probably more useful as part of the javadoc for the --segments argument.

Actually, why do we even need the column headers to be CONTIG, START, and END if you can use the XSV config files to specify these by column number? I would instead require that the first three columns give the locatable columns with arbitrary column headers.

You can specify the column headers or column index now, via config file. By default these are CONTIG, START, and END.

Deleted comment.

Done.
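To make the "column header or column index" option concrete, here is a small sketch of how a config value might be resolved against a header row. `LocatableColumnSketch` and `resolve` are hypothetical names, and treating any value that parses as an integer as a 0-based index is an assumption for illustration; the actual codec may distinguish the two cases differently.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of resolving a locatable column by name or index. */
final class LocatableColumnSketch {

    /**
     * Resolves a config value to a 0-based column index. A value that parses as an integer
     * is treated as an index; anything else is looked up as a column header name.
     */
    static int resolve(final String configValue, final List<String> header) {
        final String value = configValue.trim();
        final int index;
        try {
            index = Integer.parseInt(value);
        } catch (final NumberFormatException e) {
            // Not a number, so treat the value as a header name.
            final int byName = header.indexOf(value);
            if (byName < 0) {
                throw new IllegalArgumentException("No column named " + value);
            }
            return byName;
        }
        if (index < 0 || index >= header.size()) {
            throw new IllegalArgumentException("Column index out of range: " + index);
        }
        return index;
    }

    public static void main(final String[] args) {
        final List<String> header = Arrays.asList("Sample", "Chromosome", "Start", "End", "Segment_Mean");
        System.out.println(resolve("1", header));
        System.out.println(resolve("Start", header));
    }
}
```

One consequence of this assumed rule is that a column whose header is itself a number could not be selected by name.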

samuelklee · 2018-01-29T20:29:37Z

...ain/java/org/broadinstitute/hellbender/tools/copynumber/utils/CombineSegmentBreakpoints.java

+                Arrays.asList(input1ToOutputHeaderMap, input2ToOutputHeaderMap), getBestAvailableSequenceDictionary(),
+                l -> progressMeter.update(l));
+
+        final SamFileHeaderMerger samFileHeaderMerger = new SamFileHeaderMerger(SAMFileHeader.SortOrder.coordinate,


Is it OK to merge sequence dictionaries if they are not identical? Perhaps you should at least emit a warning?

There is not much documentation about how sequence dictionaries or comment lines/headers are handled, in general. If this is functionality that you think is worth having, perhaps it's also worth documenting?

Added docs and this tool will throw an exception if the dictionaries cannot be merged properly. For example, identical contig names with different lengths will cause an exception.
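For readers who want a picture of the failure mode mentioned here, the sketch below merges two simplified dictionaries (contig name to length) and throws when the same contig appears with different lengths. It deliberately uses plain maps rather than htsjdk classes; `DictionaryMergeSketch` and `merge` are hypothetical names and this is not how `SamFileHeaderMerger` is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of merging contig dictionaries with a length-conflict check. */
final class DictionaryMergeSketch {

    /**
     * Returns the union of two contig-name-to-length maps, keeping the order of the first
     * and appending contigs that appear only in the second. Throws if a contig is present
     * in both with different lengths.
     */
    static Map<String, Integer> merge(final Map<String, Integer> first, final Map<String, Integer> second) {
        final Map<String, Integer> merged = new LinkedHashMap<>(first);
        for (final Map.Entry<String, Integer> e : second.entrySet()) {
            final Integer existing = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalArgumentException("Contig " + e.getKey() + " has conflicting lengths: "
                        + existing + " vs " + e.getValue());
            }
        }
        return merged;
    }

    public static void main(final String[] args) {
        final Map<String, Integer> a = new LinkedHashMap<>();
        a.put("1", 2000000);
        final Map<String, Integer> b = new LinkedHashMap<>();
        b.put("1", 2000000);
        b.put("2", 1000000);
        System.out.println(merge(a, b));
    }
}
```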

samuelklee · 2018-01-29T22:37:55Z

...ellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedGenomicRegionCollection.java

+    }
+
+    /**
+     * Same as {@link #create(Path, Path, Set)} , but uses the default annotation


If there is no scenario in which we would not want to use the default config file, then let's just eliminate the corresponding constructor.

See above... made private
Done

samuelklee · 2018-01-29T22:38:56Z

...ellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedGenomicRegionCollection.java

+    public static SimpleAnnotatedGenomicRegionCollection create(final List<SimpleAnnotatedGenomicRegion> regions,
+                                                                                    final SAMFileHeader samFileHeader,
+                                                                                    final List<String> annotations,
+                                                                                    final String contigColumnName,


The telescoping constructors make it difficult to see how these column names are used. As noted elsewhere, they are passed to a write method at one point but are never actually used there.

Reduced to two public create methods.

Done

samuelklee · 2018-01-29T22:40:30Z

...ellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedGenomicRegionCollection.java

+                        codec.getComments(), regions, codec.getFinalContigColumn(), codec.getFinalStartColumn(), codec.getFinalEndColumn());
+
+            }
+            catch ( final FileNotFoundException ex ) {


I don't think we need to throw an exception for broken tests---in that case, we should just fix the tests.

It was a typo. It meant that the input file was not found.

Done.

samuelklee · 2018-01-29T22:49:35Z

...ellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedGenomicRegionCollection.java

+    /** Does not include the locatable fields. */
+    private List<String> annotations;
+    private List<String> comments;
+    private List<SimpleAnnotatedGenomicRegion> records;


As noted in the last round of comments, because you allow SimpleAnnotatedGenomicRegion to contain an arbitrary (and mutable!) map, this collection class is quite vulnerable to shenanigans. For example, I can create a collection that contains records with different annotations.

I think some sort of parametrization of records and collections that restricts them to a certain set of specified annotations would be best. However, barring this, you should at least check that all records have consistent annotations upon construction.

No longer mutable.

Effectively, there is enforcement at construction now. And there is an attribute on the collection that will store the annotations that are in each AnnotatedInterval.
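A minimal sketch of what enforcement at construction could look like is below. It models records as bare annotation maps (no intervals) to stay short; `ConsistentAnnotationsSketch` and its methods are hypothetical names, not the classes in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration of checking annotation consistency when a collection is built. */
final class ConsistentAnnotationsSketch {

    private final SortedSet<String> annotations;
    private final List<SortedMap<String, String>> records;

    /** Copies each record defensively and rejects records whose annotation names differ from the first. */
    ConsistentAnnotationsSketch(final List<Map<String, String>> input) {
        final List<SortedMap<String, String>> copies = new ArrayList<>();
        SortedSet<String> expected = new TreeSet<>();
        for (int i = 0; i < input.size(); i++) {
            final TreeMap<String, String> sorted = new TreeMap<>(input.get(i));
            if (i == 0) {
                expected = new TreeSet<>(sorted.keySet());
            } else if (!expected.equals(sorted.keySet())) {
                throw new IllegalArgumentException("Record " + i + " has annotations " + sorted.keySet()
                        + " but expected " + expected);
            }
            copies.add(Collections.unmodifiableSortedMap(sorted));
        }
        this.annotations = Collections.unmodifiableSortedSet(expected);
        this.records = Collections.unmodifiableList(copies);
    }

    /** Annotation names shared by every record. */
    SortedSet<String> annotations() {
        return annotations;
    }

    int size() {
        return records.size();
    }

    public static void main(final String[] args) {
        final List<Map<String, String>> input = new ArrayList<>();
        input.add(Collections.singletonMap("call", "+"));
        input.add(Collections.singletonMap("call", "0"));
        System.out.println(new ConsistentAnnotationsSketch(input).annotations());
    }
}
```

Because the constructor copies its input and only exposes unmodifiable views, later changes to the caller's maps cannot break the invariant.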

samuelklee · 2018-01-29T23:00:05Z

...ellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedGenomicRegionCollection.java

+     *
+     * @param input readable path to use for the xsv file.  Must be readable.  Never {@code null}.
+     * @param inputConfigFile config file for specifying the format of the xsv file.  Must be readable.  Never {@code null}.
+     * @param headersOfInterest Only preserve these headers.  These must be present in the input file.  This parameter should not include the locatable columns


Why headersOfInterest here and columnsOfInterest elsewhere?

pshapiro4broad · 2018-03-01T15:47:59Z

.../org/broadinstitute/hellbender/tools/copynumber/utils/annotatedregion/AnnotatedInterval.java

+ */
+public final class AnnotatedInterval implements Locatable  {
+
+    private SimpleInterval interval;


It looks like this can be final

@pshapiro4broad is here! Now it's a party!

Done.

pshapiro4broad · 2018-03-01T15:55:57Z

...stitute/hellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedIntervalWriter.java

+            // By initializing writer to be based on fileWriter, writer.close will close the fileWriter as well.
+            writer = new SimpleTableWriter(fileWriter, new TableColumnCollection(finalColumnList));
+        } catch (final IOException ioe) {
+            throw new UserException.CouldNotCreateOutputFile(outputFile, "Could not create: " + outputFile.getAbsolutePath());


Isn't this new UserException.CouldNotCreateOutputFile(outputFile, ioe);? It seems like the user would really want the exception in the output, as it could be a few different things.

Made it a GATKException. Now user will get runtime error details.

Done.
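The underlying point of this exchange, keeping the original `IOException` attached so the user sees the real cause, can be sketched in plain Java as follows. `WriterOpenSketch` and `OutputCreationException` are hypothetical stand-ins and are not GATK's exception classes.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of preserving the cause when output creation fails. */
final class WriterOpenSketch {

    /** Stand-in for a tool-specific runtime exception that carries the underlying cause. */
    static final class OutputCreationException extends RuntimeException {
        OutputCreationException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    /** Opens a writer, wrapping any IOException so that its details are not lost. */
    static Writer open(final Path output) {
        try {
            return Files.newBufferedWriter(output);
        } catch (final IOException ioe) {
            throw new OutputCreationException("Could not create: " + output.toAbsolutePath(), ioe);
        }
    }

    public static void main(final String[] args) {
        try {
            open(Paths.get("no-such-dir-for-writer-sketch", "out.tsv"));
        } catch (final OutputCreationException e) {
            // The cause distinguishes a missing directory from, say, a permissions problem.
            System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
        }
    }
}
```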

pshapiro4broad · 2018-03-01T15:58:09Z

...stitute/hellbender/tools/copynumber/utils/annotatedregion/SimpleAnnotatedIntervalWriter.java

+    // TODO: Test for optional SAM File Header
+    // TODO: Test for other column names
+    /**
+     * {@inheritDoc}


For an overridden method with no doc changes, it is sufficient to omit the javadoc completely, as javadoc automatically inherits docs for overridden methods.

That answers that. Deleted.

Done.

pshapiro4broad · 2018-03-01T16:02:25Z

...titute/hellbender/tools/copynumber/utils/annotatedregion/AnnotatedIntervalUtilsUnitTest.java

+
+        return new Object[][] {
+            {
+                Arrays.asList(


You could avoid some code duplication here by moving the Arrays.asList() call into your test method.

Apologies if I am missing something, but it seems like I would have to do the following, which does not reduce code duplication:

    @DataProvider(name = "mergeTests")
    public Object [][] createMergeTests() {
        return new Object[][] {
            {
                new AnnotatedInterval[]{
                    new AnnotatedInterval(new SimpleInterval("1", 100, 200), ImmutableSortedMap.of("Foo", "bar", "Foo1", "bar1")),
                    new AnnotatedInterval(new SimpleInterval("1", 100, 200), ImmutableSortedMap.of("Foo", "bar", "Foo1", "bar1"))
                },
                // ..........

No action

@pshapiro4broad Comment above...

You can avoid that by calling new Object[][][] { ... } E.g.

    @Test(dataProvider = "simpleTests")
    public void testSimpleTagging(AnnotatedInterval[] tumorSegments, AnnotatedInterval[] normalSegments, AnnotatedInterval[] gt) {
        final List<AnnotatedInterval> testResult = SimpleGermlineTagger.tagTumorSegmentsWithGermlineActivity(
                Arrays.asList(tumorSegments), Arrays.asList(normalSegments), "call",
                ReferenceUtils.loadFastaDictionary(new File(ReferenceUtils.getFastaDictionaryFileName(REF))),
                TEST_GERMLINE_TAGGING_ANNOTATION, 10);
        Assert.assertEquals(testResult, Arrays.asList(gt));
    }

    @DataProvider(name = "simpleTests")
    public Object[][] createSimpleTests() {
        return new Object[][][]{{
            {
                // Tumor segments are assumed to be mutable.
                new AnnotatedInterval(new SimpleInterval("1", 100, 200), Maps.newTreeMap(ImmutableSortedMap.of("call", "+"))),
                new AnnotatedInterval(new SimpleInterval("1", 201, 300), Maps.newTreeMap(ImmutableSortedMap.of("call", "0")))
            }, {
                new AnnotatedInterval(new SimpleInterval("1", 100, 200), ImmutableSortedMap.of("call", "+")),
                new AnnotatedInterval(new SimpleInterval("1", 201, 500), ImmutableSortedMap.of("call", "0"))
            }, {
                new AnnotatedInterval(new SimpleInterval("1", 100, 200), ImmutableSortedMap.of("call", "+", TEST_GERMLINE_TAGGING_ANNOTATION, "+")),
                new AnnotatedInterval(new SimpleInterval("1", 201, 300), ImmutableSortedMap.of("call", "0", TEST_GERMLINE_TAGGING_ANNOTATION, "0"))
            }
        }, ... };
    }

Although now that I've written it out it doesn't really format that much shorter. I'll leave it up to you.

(no action)

pshapiro4broad · 2018-03-01T16:07:33Z

...nstitute/hellbender/tools/copynumber/utils/germlinetagging/SimpleGermlineTaggerUnitTest.java

+        return new Object[][] {
+                {
+                        // Trivial case
+                        Lists.newArrayList(


Why not Arrays.asList()? Do these need to be modifiable? Also, you could shorten the data provider by putting the array -> list conversion into the test method.

Converted to Arrays.asList(). Done.

No action on the array -> list conversion in the test.

droazen · 2018-03-01T19:49:44Z

@samuelklee: @LeeTL1220 and I just had a discussion about the writer aspect of this branch, and we agreed on the following:

1. Lee will introduce a new header type to encapsulate the information that's currently passed in individually to the writeHeader() method in AnnotatedIntervalWriter. This makes the interface cleaner and more future-proof, since the signature will just become writeHeader(AnnotatedIntervalHeader).
2. Lee will start writing out 3 additional structured header lines (as comment lines) to every header, declaring the names of the chrom, start, and stop columns. These will not be respected on input yet (he will still be relying on a config file to get the names of these 3 columns), but it's the first step in the direction of storing all necessary schema information in the header of each file, rather than separately from each file.
3. Lee will file a github issue to eventually use these 3 header lines on input, when they are present, to get the names of the chrom/start/stop columns (possibly still with a fallback to a separate config file if they aren't, but that is a point we can debate in a future PR).
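To illustrate the idea of structured header lines carried as comments, here is a sketch that writes three such lines and reads them back. Everything about the format is an assumption for illustration: the `#key=value` layout and the key names `contig_column`, `start_column`, and `end_column` are hypothetical and are not taken from this PR, and the `parse` half corresponds to the future work described in the third item rather than to current behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of structured comment lines that name the locatable columns. */
final class StructuredHeaderSketch {

    // Hypothetical key names; the real header lines may use different ones.
    static final String CONTIG_KEY = "contig_column";
    static final String START_KEY = "start_column";
    static final String END_KEY = "end_column";
    static final List<String> KEYS = Arrays.asList(CONTIG_KEY, START_KEY, END_KEY);

    /** Produces the three structured comment lines for the given column names. */
    static List<String> write(final String contig, final String start, final String end) {
        return Arrays.asList(
                "#" + CONTIG_KEY + "=" + contig,
                "#" + START_KEY + "=" + start,
                "#" + END_KEY + "=" + end);
    }

    /** Extracts known keys from comment lines; ordinary comments and other lines are ignored. */
    static Map<String, String> parse(final List<String> headerLines) {
        final Map<String, String> result = new LinkedHashMap<>();
        for (final String line : headerLines) {
            final int eq = line.indexOf('=');
            if (!line.startsWith("#") || eq < 0) {
                continue;
            }
            final String key = line.substring(1, eq).trim();
            if (KEYS.contains(key)) {
                result.put(key, line.substring(eq + 1).trim());
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        final List<String> lines = write("CONTIG", "START", "END");
        System.out.println(lines);
        System.out.println(parse(lines));
    }
}
```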

samuelklee · 2018-03-01T21:19:27Z

Sorry @droazen @LeeTL1220, can you give me a bit more context? @LeeTL1220 is no longer using any of the CNV-specific collections classes that I had hoped might be Tribble-ized in the future, so I'm OK with any decisions you guys make that are specific to his classes (does @jonn-smith have an opinion?) I think that moving towards storing the config in the header is a good thing, in general.

If we need to make corresponding changes to the CNV-specific collections classes, then we should talk more. Not all of those collections describe locatables, so I'm not sure how we could fit them in the Tribble framework.

LeeTL1220 · 2018-03-01T21:21:54Z

@samuelklee I'm not making any changes to the CNV collection classes. I think none of this PR affects those classes.

LeeTL1220 · 2018-03-02T01:39:26Z

@samuelklee Let's discuss tribble-ifying the CNV classes, but for this PR, leaving those alone.

LeeTL1220 · 2018-03-05T15:42:12Z

@droazen number 3 in the list above: #4489

LeeTL1220 · 2018-03-27T01:18:25Z

@samuelklee There have been a lot of changes since I first opened this PR. I thought you might want to check that I addressed the comments you had that are still applicable. However, since this is now totally decoupled from the CNV Collection code, there should be less to look at.

@jonn-smith Can you look at the XsvLocatableCodec changes and the changes to the FuncotationFactory code, and confirm that I have honored what I said I would do in our conversations?

samuelklee · 2018-04-02T14:02:12Z

...adinstitute/hellbender/tools/copynumber/utils/combine-segment-breakpoints-comparison.tsv.seg

+@HD	VN:1.5
+@SQ	SN:1	LN:2000000
+@RG	ID:GATKCopyNumber	SM:sample_gt
+CONTIG	START	END	Num_Probes	Segment_Mean	Segment_Call


This test file still seems unnatural to me (see original comment directly below). How would this be generated in the real world? It has a CNV-collections-style header and a mix of "CONTIG/START/END" and CNV 1.0 column headers, so it seems like it would have to be formatted manually.

Is the tool able to consume truth files in their original format? If so, then the test files should reflect that.

Also, note the filename extension of many of the test files is .tsv.seg---probably this can be just .seg.

Same goes for most of the test files below.

What is the spec for these tools? I think they should be able to consume strictly unmodified CNV-collections files (namely, the output of CallCopyRatioSegments) and truth files (CNV 1.0-style seg files, etc.) along with the appropriate config files. Requiring that the user go in and add additional headers or modify column names is something we should avoid.

Perhaps this is what the tools already do, but if so, it's not clear to me from looking at these test files.

samuelklee · 2018-04-02T14:04:08Z

...institute/hellbender/tools/copynumber/utils/combine-segment-breakpoints-no-samheader.tsv.seg

@@ -1,4 +1,5 @@
-#SAMPLE_NAME=SAMPLE1
+# This is another comment
+# This is yet another comment


This test file is also not up to date. The CNV-collections column headers are correct, but the SAM-style header is missing. (I think you probably generated this file when the header consisted only of #SAMPLE_NAME=... and didn't update it when the CNV-collections format was updated?)

As per discussions with @jonn-smith and @droazen, the AnnotatedInterval codec will support files that either contain comments (#) or a SAM File Header ('@'). Not both. Then the locatable column headers are whatever is specified by the user config file or the default config file. The default config file will match what the CNV collection classes use.

This test is of a file that is comments only and uses the default configuration file. Therefore, the test should not have a SAM File Header and the locatable column headers should be CONTIG, START, and END.

No action.
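The "comments or SAM File Header, not both" rule can be pictured with the sketch below, which classifies the leading lines of a file. `HeaderKindSketch`, `Kind`, and `classify` are hypothetical names, and throwing on a mixed header is an assumption about how "not both" would be enforced; the actual codec may behave differently.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of classifying a file preamble as comments or a SAM-style header. */
final class HeaderKindSketch {

    enum Kind { NONE, COMMENTS, SAM_HEADER }

    /**
     * Scans leading lines until the first line that starts with neither '@' nor '#', which is
     * taken to be the column-header row. Throws if both kinds of preamble line are present.
     */
    static Kind classify(final List<String> lines) {
        boolean sawComment = false;
        boolean sawSam = false;
        for (final String line : lines) {
            if (line.startsWith("@")) {
                sawSam = true;
            } else if (line.startsWith("#")) {
                sawComment = true;
            } else {
                break;
            }
        }
        if (sawComment && sawSam) {
            throw new IllegalArgumentException("File mixes '#' comments with a '@' SAM-style header");
        }
        if (sawSam) {
            return Kind.SAM_HEADER;
        }
        return sawComment ? Kind.COMMENTS : Kind.NONE;
    }

    public static void main(final String[] args) {
        System.out.println(classify(Arrays.asList(
                "# This is another comment", "# This is yet another comment", "CONTIG\tSTART\tEND")));
        System.out.println(classify(Arrays.asList(
                "@HD\tVN:1.5", "@SQ\tSN:1\tLN:2000000", "CONTIG\tSTART\tEND")));
    }
}
```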

When would this file appear naturally in the wild? My point is that it wouldn't (it would require that a user take a CNV-collections file, strip the SAM header, and add comments), so it's extremely confusing to base a test on it. Test files should reflect the most typical use cases; if a file is atypical or unusual (e.g., an improperly or unexpectedly formatted file meant to stress test the code), then it should be clearly named as such.

Can you list here exhaustively the typical formats expected for each of the tools? Ideally, I would be able to easily discern this just from glancing at the test files, but my main objection is that I cannot---precisely because most of them are a mishmash of conventions.

Also, I think it might be more convenient to make the default config file specify the locatable column headers by indices 0-2. For seg files that have Sample as the first column, I think it's a bit cleaner to just use cut externally than it is to change column headers with e.g. sed.

samuelklee · 2018-04-02T14:18:27Z

.../org/broadinstitute/hellbender/tools/copynumber/utils/annotatedregion/AnnotatedInterval.java

+/**
+ * Simple class that just has an interval and sorted name-value pairs.
+ */
+public final class AnnotatedInterval implements Locatable, Feature {


The package name should be changed to reflect the name change to AnnotatedInterval. You should probably also edit the commit message for this branch.

Also note #3884. I don't mind the redundancy for now.

Renamed the package.

Done.

samuelklee · 2018-04-02T14:18:47Z

@LeeTL1220 Apologies, I don't think I have the bandwidth for a detailed re-review, but I'm OK with the XSV code if @jonn-smith approves.

However, I still don't understand the test files and left some comments there.

jonn-smith

Some minor comments.

jonn-smith · 2018-04-06T19:37:38Z

...oadinstitute/hellbender/tools/funcotator/dataSources/xsv/LocatableXsvFuncotationFactory.java

+                            throw new UserException.MalformedFile("Could not decode from data file: " + dataPath.toUri().toString());
+                        }
+
+                        supportedFieldNames.addAll(codec.getHeaderWithoutLocationColumns());


Might as well just use header here, instead of grabbing the info from the codec again.

jonn-smith · 2018-04-06T19:41:24Z

...oadinstitute/hellbender/tools/funcotator/dataSources/xsv/LocatableXsvFuncotationFactory.java

                    }

                    // Initialize our field name lists:
                    initializeFieldNameLists();

                    // Adjust the manual annotations to make sure we don't try to annotate any fields we aren't
                    // responsible for:
+                    //TODO: Isn't this the default map not the override map as the name would imply?


No - this sets the values in the overrides map initially. These values then get set in the createFuncotations method via a call to setOverrideValuesInFuncotations.

This has been updated in another branch to be much more elegant, but the underlying data structure name is the same.

Deleting the TODO then...
Done

jonn-smith · 2018-04-06T20:05:58Z