Emit column kind default explicitly for Windows SDK SARIF emit. #1160

Merged
merged 5 commits into from
Dec 7, 2018

Conversation

michaelcfanning
Member

@rtaket @ScottLouvau @lgolding

This change is unusual. run.columnKind is a property that specifies the unit used to measure columns in a region. On Windows, that unit is UTF-16 code units. On *nix, it tends to be Unicode code points. After discussion, we concluded that Unicode code points are the most reasonable default value for this property, and that change was approved in the spec.

This SDK change does two things: 1) it ensures that SARIF persisted by this SDK writes the appropriate Windows-specific column kind explicitly into new log files; 2) when processing existing SARIF, the SDK assumes those files were also produced on the Windows platform and populates column kind explicitly with UTF-16 code units if that property was previously absent.

In the (unexpected) circumstance that this SDK processes pre-release SARIF produced on a platform where unicode code points were used to compute regions, the producer will need to provide that enum value explicitly, rather than depending on the property's default value.
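
To make the difference between the two column kinds concrete, here is a minimal C# sketch. It is not SDK code, and the class and method names are hypothetical. It computes the 1-based column of the same character under both conventions. The two values diverge as soon as the line contains a character outside the Basic Multilingual Plane, because such a character takes two UTF-16 code units but is a single Unicode code point.

```csharp
using System;

public static class ColumnKindSketch
{
    // 1-based column of the character at UTF-16 index `utf16Index`,
    // measured in UTF-16 code units: simply the index plus one.
    public static int ColumnInUtf16CodeUnits(int utf16Index)
    {
        return utf16Index + 1;
    }

    // 1-based column of the same character measured in Unicode code points:
    // a surrogate pair that precedes the index counts once, not twice.
    public static int ColumnInUnicodeCodePoints(string line, int utf16Index)
    {
        int column = 1;
        for (int i = 0; i < utf16Index; i++)
        {
            if (char.IsHighSurrogate(line[i])
                && i + 1 < utf16Index
                && char.IsLowSurrogate(line[i + 1]))
            {
                i++; // skip the low surrogate of the pair
            }
            column++;
        }
        return column;
    }

    public static void Main()
    {
        // U+1F600 lies outside the BMP, so it occupies two UTF-16 code units.
        string line = "s = \"\U0001F600\"; bug();";
        int index = line.IndexOf("bug", StringComparison.Ordinal);

        Console.WriteLine(ColumnInUtf16CodeUnits(index));          // 11
        Console.WriteLine(ColumnInUnicodeCodePoints(line, index)); // 10
    }
}
```

A consumer that reads the region with the wrong column kind would be off by one here, which is why a log file needs to state the kind explicitly whenever it differs from the default.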

@@ -656,6 +656,14 @@
"unicodeCodePoints"
]
}
},
{
"kind": "AttributeHint",
Member Author

@michaelcfanning michaelcfanning Dec 7, 2018

AttributeHint [](start = 15, length = 13)

Oops. Test gap closed here by adding columnKind explicitly to our comprehensive test files. I manually reviewed the hints file to make sure we don't have this same mistake elsewhere.

@@ -138,6 +139,8 @@ public SarifNodeKind SarifNodeKind
/// Specifies the unit in which the tool measures columns.
/// </summary>
[DataMember(Name = "columnKind", IsRequired = false, EmitDefaultValue = false)]
[JsonConverter(typeof(EnumConverter))]
[System.ComponentModel.DefaultValue(ColumnKind.UnicodeCodePoints)]
Member Author

Jschema doesn't emit default values currently. We'll have to manually maintain them for now. microsoft/jschema#57

@@ -151,6 +154,7 @@ public SarifNodeKind SarifNodeKind
/// </summary>
public Run()
{
this.ColumnKind = ColumnKind.UnicodeCodePoints;
Member Author

this.ColumnKind = ColumnKind.UnicodeCodePoints [](start = 12, length = 46)

OK, this is an important apparent JSON.NET gap. In JSON.NET, you can set an option on the serializer settings object that specifies that default values should be populated if they are missing on deserialization. The problem is that you need to explicitly create and pass the appropriate settings object (and we've attempted to make everything work with a single API call with no parameterization of this kind).

To work around the problem, we provide a programmatic equivalent of populate by setting the default value in the ctor. When JSON.NET deserializes, it overwrites this property in cases where the JSON has an explicit value. A fix to jschema #57 should provide this as well (and not just the default value attribute that elides property values from persisted JSON if they map to the default).
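
The workaround described above can be sketched roughly as follows. This is an illustration, not the SDK's generated code: `RunWithCtorDefault` and `RunWithPopulate` are hypothetical stand-ins for `Run`, and the example assumes the Newtonsoft.Json package is referenced. The first class sets the default in its constructor. The second shows the per-property alternative using `DefaultValueHandling.Populate`, which Json.NET documents as the value that sets absent members to their `[DefaultValue]` on deserialization.

```csharp
using System;
using System.ComponentModel;
using Newtonsoft.Json;

public enum ColumnKind { None, Utf16CodeUnits, UnicodeCodePoints }

// Hypothetical stand-in for Run: the default is set in the constructor.
public class RunWithCtorDefault
{
    public RunWithCtorDefault()
    {
        // Json.NET invokes this constructor first, then overwrites the
        // property only if the JSON contains an explicit "columnKind".
        ColumnKind = ColumnKind.UnicodeCodePoints;
    }

    [JsonProperty("columnKind")]
    public ColumnKind ColumnKind { get; set; }
}

// Hypothetical alternative: let Json.NET populate the declared default.
public class RunWithPopulate
{
    [JsonProperty("columnKind", DefaultValueHandling = DefaultValueHandling.Populate)]
    [DefaultValue(ColumnKind.UnicodeCodePoints)]
    public ColumnKind ColumnKind { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // No settings object is passed in any of these calls.
        var absent = JsonConvert.DeserializeObject<RunWithCtorDefault>("{}");
        var present = JsonConvert.DeserializeObject<RunWithCtorDefault>(
            "{\"columnKind\":\"Utf16CodeUnits\"}");
        var populated = JsonConvert.DeserializeObject<RunWithPopulate>("{}");

        Console.WriteLine(absent.ColumnKind);    // UnicodeCodePoints
        Console.WriteLine(present.ColumnKind);   // Utf16CodeUnits
        Console.WriteLine(populated.ColumnKind); // UnicodeCodePoints
    }
}
```

Both variants keep the single-call deserialization API intact; the constructor approach needs no attributes, while the attribute approach keeps the default declared in one place alongside `[DefaultValue]`.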

I assume you are talking about JsonSerializerSettings.DefaultValueHandling, whose Include value means that "The Json.NET deserializer will continue setting a field/property if the JSON value is the same as the default value."

But it goes on to say "DefaultValueHandling can also be customized on individual properties with JsonPropertyAttribute." So I think we can fix this by emitting

[JsonProperty(DefaultValueHandling = DefaultValueHandling.Include)]

on every property.


In reply to: 239879240 [](ancestors = 239879240)

I added that to https://github.com/Microsoft/jschema/issues/57.


In reply to: 239910615 [](ancestors = 239910615,239879240)

@@ -21,7 +21,6 @@ public override object ReadJson(JsonReader reader, Type objectType, object exist
{
throw new ArgumentNullException(nameof(reader));
}

string value = (string)reader.Value;

s [](start = 12, length = 1)

Odd change. Doesn't our style require a blank line after a closing brace? See lines 34, 39, 53...

Member Author

No, we do not have a strict guideline around vertical whitespace in this scenario. Generally, code should be crafted to be readable and we prefer that developers have some latitude interpreting that guideline. Most of our coding requirements are designed to ensure completeness of information and minimize introduction of correctness problems.

Still, I honestly don't know how this line deletion crept in and can revert.


In reply to: 239900759 [](ancestors = 239900759)

public void Run_ColumnKindSerializesProperly()
{
// In our Windows-specific SDK, if no one has explicitly set ColumnKind, we
// will set it to the windows-specific value of Utf16CodeUnits. Otherwise,

w [](start = 34, length = 1)

'w' => 'W'

if (this.ColumnKind == ColumnKind.None)
{
this.ColumnKind = ColumnKind.Utf16CodeUnits;
}

} [](start = 12, length = 1)

I agree with this logic, but not necessarily with its home in what should be essentially a "getter" method. It doesn't feel right for this method to have a side effect. Please spend a few moments pondering whether there's a better place for this, but don't block if you can't think of one.

@@ -13,6 +13,24 @@ public partial class Run
private static Invocation EmptyInvocation = new Invocation();
private static LogicalLocation EmptyLogicalLocation = new LogicalLocation();

public bool ShouldSerializeColumnKind()

ShouldSerializeColumnKind [](start = 20, length = 25)

Wow, I hadn't looked closely at this until now. This is the first time I wondered who was calling these, and I see that JSON.NET has a built-in mechanism for conditional serialization.
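
For readers unfamiliar with the mechanism mentioned here: Json.NET looks for a public, parameterless, `bool`-returning method named `ShouldSerialize` followed by the member name, and calls it to decide whether to write that member. The sketch below is a hypothetical, side-effect-free illustration of the convention, not the SDK's actual `ShouldSerializeColumnKind` implementation. It assumes the Newtonsoft.Json package is referenced.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Converters;

public enum ColumnKind { None, Utf16CodeUnits, UnicodeCodePoints }

// Hypothetical stand-in for the SDK's Run class.
public class Run
{
    [JsonProperty("columnKind")]
    [JsonConverter(typeof(StringEnumConverter))]
    public ColumnKind ColumnKind { get; set; }

    // Json.NET discovers this method by naming convention and calls it
    // during serialization; returning false omits "columnKind" entirely.
    public bool ShouldSerializeColumnKind()
    {
        return ColumnKind != ColumnKind.None;
    }
}

public static class Program
{
    public static void Main()
    {
        var unset = new Run();
        var windows = new Run { ColumnKind = ColumnKind.Utf16CodeUnits };

        Console.WriteLine(JsonConvert.SerializeObject(unset));   // {}
        Console.WriteLine(JsonConvert.SerializeObject(windows)); // {"columnKind":"Utf16CodeUnits"}
    }
}
```

Because the method is found by name, nothing in the codebase calls it directly, which is why its callers are not obvious from a code search.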

@ghost
ghost commented Dec 7, 2018

Ship it. I am still studying your comment about the "JSON.NET gap", but I don't see any reason to block this merge.

@ghost
ghost commented Dec 7, 2018

If today

@ghost
ghost commented Dec 7, 2018

You're feeling fine

@ghost
ghost commented Dec 7, 2018

Please review

@ghost
ghost commented Dec 7, 2018

My #1159.

@ghost
ghost commented Dec 7, 2018

Burma Shave.

@ghost ghost left a comment

:shipit:

@michaelcfanning michaelcfanning merged commit 8043741 into develop Dec 7, 2018
@michaelcfanning michaelcfanning deleted the column-kind-default branch January 28, 2019 18:08
michaelcfanning pushed a commit that referenced this pull request Feb 6, 2019
…#1264)

* Fix tests that are broken in appveyor (#1134)

* Properly persist run level property bags (#1136)

* Fix #1138: Add validation rule: contextRegion requires region (#1142)

Also:

- Enhance the "new-style" verification so that we no longer require the file "Invalid_Expected.sarif". Each file can now contain a property that specifies the expected locations of all the validation results.

* Prep for 2018-11-28 schema update. Remove run.architecture. (#1145)

* Add run.newlineSequences to schema (#1146)

* Mark result.message as required in the schema (#1147)

* Mark result.message as required in the schema

* Update release history with result.message breaking change.

* Fix typo in testoutput.

* Rename tool.fileVersion to tool.dottedQuadFileVersion (#1148)

* Upgrade more validator functional tests (#1149)

We apply the new functional test pattern to four more rules:
- `EndColumnMustNotBeLessThanStartColumn`
- `EndLineMustNotBeLessThanStartLine`
- `EndTimeMustBeAfterStartTime` (which is misnamed, and in a future PR we will rename it to `EndTimeMustNotBeBeforeStartTime`)
- `MessageShouldEndWithPeriod`

In addition, we remove the test data for a rule that no longer exists, `HashAlgorithmsMustBeUnique` (which no longer applies because `file.hashes` is now an object keyed by the hash algorithm).

Because there are so many properties of type `message`, exhaustively testing the rule `MessageShouldEndWithPeriod` revealed many holes in the visitor class `SarifValidationSkimmerBase`, which I plugged. As we have discussed, we should generate this class from the schema.

After this, there are only two more rules to convert:
- `UriBaseIdRequiresRelativeUri`
- `UseAbsolutePathsForNestedFileUriFragments`

... but this PR is already large enough.

* Remove 'open' from list of valid rule configuration default values. (#1158)

* Emit column kind default explicitly for Windows SDK SARIF emit. (#1160)

* Emit column kind default explicitly for Windows SDK SARIF emit.

* Update release notes

* More column kind test fixes

* Change behavior to always serialize column kind.

* Always serialize column kind

* Finish validator functional test upgrade (#1159)

* Rename rule EndTimeMustBeAfterStartTime => ...MustNotBeBefore...

* Upgrade UriBaseIdRequiresRelativeUri tests.

* Remove obsolete rule UseAbsolutePathsForNestedFileUriFragments.

* Remove support for old test design.

* Remove 'package' as a documented logical location kind in the schema. Documentation only change. (#1162)

* Fortify FPR converter improvements + unit tests (#1161)

* Improvements and corrections

Populate originalUriBaseIds from <SourceBasePath>
Populate tFL.kind from <Action type="...">
Add default node to result.locations

* Add location annotation for Action elements with no type attribute

* Support node annotations + uribasepath + test updates

* Update FortifyTest.fpr.sarif

* Add converter tests & assets + opportunistic code cleanup

* PR feedback

* Logical locations dictionaries to arrays (#1170)

The core change here is the transformation of `run.logicalLocations` from a dictionary (which is keyed, generally, by the fully qualified name of a logical location) to an array of logical locations. Result locations now reference logical locations by a logical location index. This change removes the necessity of resolving key name collisions for logical locations that differ only by type (a namespace that collides with the fully qualified name of a type being the classic example).

In addition to making the core change, we have also authored a transformation that converts existing pre-release SARIF v2 files to the new design. We accomplish this by creating dictionaries, with value type comparison for keys, that are keyed by logical locations. This processing requires that any parent keys already exist in the array (so that a logical location can populate its parent logical location index, if any).

In addition to the core functionality and any transformation of individual log files, result matching presents special complications. In a result matching scenario, the logical location index of a result is not relevant to its identity: only the contents of the logical location this index points to are relevant. Furthermore, when merging a baseline file (which can contain results that are exclusive to a single log file within the matching domain), logical location indices are subject to change and must be updated.
For this scenario and at least one other, we use a visitor pattern to update indices. The reason is that locations are pervasive in the format and the visitor provides a convenient place to put common location processing logic. Using this visitor puts additional pressure on the transformation logic, as it entails additional deserialization of the JSON. With more time/effort, we could have exhaustively updated all locations using the JParser/JObject/etc. API domain. Oh well.

Finally, we must update the logic that transforms v1 to v2 and vice versa.

Whew. If that was not already sufficiently intrusive, this work revealed some minor flaws in various converters (the ones that handle logical locations): AndroidStudio, FxCop and PREfast.
This change is complex but valuable. Logical locations are now expressed as coherent objects in their table. In the main, I have preferred to leave `result.fullyQualifiedName` populated (in addition to `result.logicalLocationIndex`) to support easily looking up matching logical locations.

* Add result.rank and ruleConfiguration.defaultRank (#1167)

As we discussed offline with @fishoak, the design is good as it stands. The only change is that the default should be -1. I filed oasis-tcs/sarif-spec#303 for that, and put it on the agenda for TC #30.

* Logical locations notes (#1184)

* Respond to a small # of PR comments related to recent logical locations change.

* Fix visibility on helper

* Logical locations notes (#1185)

* Respond to a small # of PR comments related to recent logical locations change.

* Fix visibility on helper

* Fix up v1 transformation with keys that collide

* Preserve decorated name data

* Rebaseline test for decorated name propagation

* Respond to PR feedback. Update tests to close test holes.

* Rebaseline updated test

* Test key collision in annotated code locations.

* Update baseline

* Incorporate "provenance" schema changes and fix package licenses (#1193)

* Add autogenerated RuleConfiguration.cs missed from earlier commit.

* Upgrade to NuGet.exe 4.9.2 to handle new license tag.

* Remove unused 'Owners' element from build.props.

* Add package Title.

* Use packageLicenseExpression to specify package license.

* Suppress NU5105 (about SemVer 2.0.0 strings) for "dotnet pack" packages.

NuGet.exe still warns for ".nuspec" packages.

* Incorporate latest "provenance" schema changes.

* Address PR feedback.

* External property files (#1194)

* Update spec for externalPropertiesFile object.

* Add external property files to schema.

* Finish merge of provenance changes.

* Update release notes.

* Remove vertical whitespace.

* PR feedback. Fix 'properties' to refer to an external file not an actual properties bag.

* Remove code gen hint that makes external property files a property bag holder.

* Introduce missing brace. Fix up code emit for 'properties' property that isn't a property bag.

* Incorporate schema changes for versionControlDetails.mappedTo and rule.deprecatedIds (#1198)

* Incorporate "versionControlDetails.mappedTo" schema change.

* Incorporate "rule.deprecatedIds" schema change.

* Revert updates to comprehensive.sarif (to allow transformer to continue to use this as test content).

* Array scrub part 1: everything but anyOf-or-null properties. (#1201)

NOTE: For explicitness, I added schema attributes even when they matched the JSON schema defaults, namely: `"minItems": 0` and `"uniqueItems": false`.

* Fix v1->v2 hash transformation (#1203)

CreateHash must be called to handle algorithm names that aren't in our translation table. Also updated a unit test to cover this case.

* Integrate jschema 0.61.0 into SDK (#1204)

* Merging arrays transformations back into 'develop' branch (#1236)

* Fix up tests

* Conversion to files array. WIP. Core SARIF component build complete except for SarifLogger tail.

* Add fileIndex property to file object (#1186)

* Fix up tests

* PR feedback to improve schema comment

* Logical locations notes (#1185) (#1187)

* Respond to a small # of PR comments related to recent logical locations change.

* Fix visibility on helper

* Fix up v1 transformation with keys that collide

* Preserve decorated name data

* Rebaseline test for decorated name propagation

* Respond to PR feedback. Update tests to close test holes.

* Rebaseline updated test

* Test key collision in annotated code locations.

* Update baseline

* Reduced files array build (#1191)

* Sarif and Sarif.Converters now building

* Files array (#1188)

* Add fileIndex property to file object (#1186)

* Fix up tests

* PR feedback to improve schema comment

* Logical locations notes (#1185) (#1187)

* Respond to a small # of PR comments related to recent logical locations change.

* Fix visibility on helper

* Fix up v1 transformation with keys that collide

* Preserve decorated name data

* Rebaseline test for decorated name propagation

* Respond to PR feedback. Update tests to close test holes.

* Rebaseline updated test

* Test key collision in annotated code locations.

* Update baseline

* DRY out converters to isolate shared code.

* Restore essential change in schema that converts files dictionary to an array.

* Simplify ShouldSerialize logic

* Remove unused namespaces

* Respond to PR feedback.

* Respond to PR feedback

* End-to-end build works. Now we can author core transformation and fix tests. (#1192)

* Fix up merge from 'develop' branch.

* Update supporting test code for processing checked in files. (#1202)

* Update supporting test code for processing checked in files.

* Update nested files test to contain single file.

* Files array basic transform (#1205)

* Update supporting test code for processing checked in files.

* Update nested files test to contain single file.

* WIP. Further progress

* Fix up samples build

* Fix up merge from basic transform branch

* Mime type validation (#1206)

* Fix up merge from basic transform branch

* Fix up mime test

* Start work on v1 <-> v2 transformation (#1208)

* Restore TransformCommand and first (unaffected) unit test

* Restore "minimal prerelease v2 to current v2" test.

* Restore "minimal v1 to current v2" test.

* Restore remaining TransformCommand unit tests.

* Uncomment v2 => v1 tests to observe failures.

* Uncomment 'transform' command.

* Restore MakeUrisAbsoluteVisitor tests (#1210)

This change updates the visitor that expands URIs in the presence of `originalUriBaseIds`. Turns out there was technical debt here, because our tests provided `originalUriBaseIds` equivalents in the property bag (because we had no official SARIF slot for them). I did not notice this gap when we made the schema change to add `originalUriBaseIds`.

* Get v2 -> v1 transform working with files array (#1211)

Test failure count is down to 32; will be 28 when you merge your fix.

There is not, and never was, a test case for fileLocations that use uriBaseId. I know for a fact that there is no code to support that case. You’ll see a comment to that effect in the code. I will take care of that next. Then I will move on to v1 -> v2 transform.

As part of this change, the `SarifCurrentToVersionOneVisitorTests` are now based on the `RunTest` helper method from the base class `FileDiffingTests`.

* Convert debug assert to exception to make test execution more deterministic (#1214)

* Update insert optional data tests and update indices visitor. (#1212)

* Update insert optional data tests and update indices visitor.

* Delete speculatively needed file

* Remove erroneous override of base visit method.

* Rework summary comment on DeletesOutputsDirectoryOnClassInitializationFixture.

* Update clang analyzer name. Flow converter log verification through JToken comparison. (#1213)

* The multitool, core sarif, and validation test binaries now all pass (#1215)

* The multitool, core sarif, and validation test binaries now all pass completely.

* Remove unwanted assert that may fire during unit testing.

* Merge from files-array

* PR feedback.

* PR feedback tweaks

* Accept PR feedback from previous change. (#1216)

Use LINQ IEnumerable.Except in the unit test, which improves readability without compromising efficiency (because Except uses a Set to do its work in O(N) time).

* Fix Sarif.Driver and Sarif.Functional tests. (#1217)

* Fix Sarif.Driver and Sarif.Functional tests.

* Restore test file

* Fix Semmle tests and Fortify converter: all tests now pass. (#1218)

* Sarif converters fixups (#1219)

* Fix Semmle tests and Fortify.

* Final test fixups

* Invoke appveyor for files-array branch.: (#1220)

* Update SarifVersionOneToCurrentVisitor for run.files array (#1221)

* Uncomment v1 -> v2 tests; 3/14 fail.

* Move test data to locations expected by FileDiffingTests.

* Fix up some IDE C#7 code cleanups.

* Use FileDiffingTests helper.

* Fix bug in FileDiffingTests that caused a test failure.

* Remove default-valued argument from a call to RunTest.

* Create basic files array

Does not yet have fileIndex, parentIndex, or response file handling.

* Revert incorrect change in FileDiffingTests.

* Fix one unit test with spot fix to "expected" file.

* Fix up some C#7 IDE warnings

* Force update in FileDiffing tests to avoid deserialization errors from out of date "expected" files.

* Fix missing "modified" flag sets in PreRelCompatTransformer.

* Populate fileIndex in run.files array.

* Fix unit test by fixing fileLocation creation.

* Restore response file handling.

* Populate fileIndex on fileLocations as appropriate.

* Fix last test failure by reworking response file handling.

* Feedback: Introduce transformer helper PopulatePropertyIfAbsent.

* Feedback: Tighten platform-independent string compare.

Also:
- Reformat unit test lines.

* Feedback: Revert FileDiffingTest change; downgrade affected test files to provoke transform

* Basic rules transformation (except for v1 <-> v2 conversion) (#1223)

* Basic rules transformation (except for v1 <-> v2 conversion)

* Respond to very excellent PR feedback.

* PR feedback

* Add files array tests for nested files and uriBaseIds (#1226)

* Add failing v1 -> v2 nested files test

* Fix existing mis-handling of analysisTarget and resultFile.

* Get nested files test case working.

* Add failing v1 => v2 uriBaseId test.

* Populate v2 uriBaseId.

* Fix up expected v2 fileLocation.properties: test passes.

* Enhance uriBaseIds test case.

* Implement v2->v1 conversion for rules dictionary (#1228)

* Notification rule index (#1229)

* Add notification.ruleIndex and increase flatten messages testing

* Notification message tests + add notification.ruleIndex to schema

* Notification rule index (#1230)

* Add notification.ruleIndex and increase flatten messages testing

* Notification message tests + add notification.ruleIndex to schema

* Missed feedback from previous PR (#1231)

* Implement v1->v2 conversion for rules dictionary (#1232)

* Partial implementation

* Get last test working.

* Somebody missed checking in a generated file.

* Schema changes from TC #30 (#1233)

* Add source language, fix rank default.

* Adjust rank minimum to accommodate default.

* Fix broken test.

* Remove unnecessary None items from project file.

* PR feedback

* Files array results matching (#1234)

* WIP preliminary results matching

* Restore results matching for files array

* Add back autogenerated (unused) namespace directive

* Updated release notes for TC30 changes. (#1240)

* Mention rules array change in release history. (#1243)

* Baseline states (#1245)

* Add 'updated' state to baselineState and rename 'existing' to 'unchanged'

* Update prerelease transformer

* Enable appveyor build + test. Correct version constant.

* Update test. Respond to PR feedback.

* Fix #1251 #1252 #1253 (#1254)

* Fixes + test coverage + cleanup

* Update SDK version

* Update version in test assets

* Fix multitool nuspec (#1256)

* Revert unintentional change to BaselineState (#1262)

The `develop` branch should match TC #30, but we inadvertently introduced a change from TC #31: replacing `BaselineState.Existing` with `.Unchanged` and `Updated`.

I did not revert the entire change. Some things (like having AppVeyor build the `tc-31` branch instead of the defunct `files-array` branch, and some C# 7 updates to the `PrereleaseCompatibilityTransformer`) were good, and I kept them.

Also:
- Update the version to `2019-01-09` in preparation for merge to `master`.

* Transfer Bogdan's point fix (analysisTarget handling) from master to develop (#1263)

In preparation for merging `develop` to `master` for the publication of version 2019-01-09 (TC #30), we apply the recent changes in `master` to the `develop` branch. These changes fixed two bugs in the handling of `analysisTarget` in the v1-to-v2 converter (`SarifVersionOneToCurrentVisitor`).

Now `develop` is completely up to date, and when we merge `develop` to `master`, we _should_ be able to simply take the "incoming" changes on all conflicting files.

* Cherry-pick: v1 transformer analysis target region persistence fix. (#1238)
* Mention NuGet publishing changes in release history.
* Cherry pick: Don't emit v2 analysisTarget if there is no v1 resultFile. (#1247)