Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WX-1629] Add metadata value type tests #7433

Merged
merged 39 commits into from
May 29, 2024
Merged

[WX-1629] Add metadata value type tests #7433

merged 39 commits into from
May 29, 2024

Conversation

rsaperst
Copy link
Contributor

@rsaperst rsaperst commented May 14, 2024

Follow-up to WX-1410 to add additional testing of metadata value types to ensure that values are inserted into the database with correct types.

@rsaperst rsaperst marked this pull request as ready for review May 22, 2024 20:18
@rsaperst rsaperst requested a review from a team as a code owner May 22, 2024 20:18
@@ -13,5 +13,5 @@ metadata {
status: Succeeded
"outputs.draft3_infer_version.j": 660
"actualWorkflowLanguage": WDL
"actualWorkflowLanguageVersion": 1.0
"actualWorkflowLanguageVersion": "1.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one 😂

case JsNull => (expected != JsNull).option(s"expected: $cacheSubstitutions but got: $actual")
case _ => Option(s"expected: $cacheSubstitutions but got: $actual")
}

matchError.map(s"Metadata mismatch for $key - " + _)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've done an admirable job creating your own string-to-array parser here, but I wonder if there's anything available in spray that can do this for us. Nice to outsource JSON serialization and deserialization wherever we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, spray has a parseJson method which was able to do this for us!

case Result.Failure(_) => invalidNel(s"Metadata block can not be converted to a Map: $config")
def fromConfig(config: Config): ErrorOr[WorkflowFlatMetadata] = {
val values: scala.collection.mutable.Map[String, JsValue] = scala.collection.mutable.Map[String, JsValue]()
config.entrySet().stream().forEach { value =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above - any chance spray can help us with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think spray can help us with this one, by this point we aren't dealing with JSON anymore. I looked around for a function that could do this for us and it doesn't seem like any such function exists.

@@ -29,16 +29,7 @@ class WriteMetadataActor(override val batchSize: Int,
private val statsRecorder = MetadataStatisticsRecorder(metadataStatisticsRecorderSettings)

override def process(e: NonEmptyVector[MetadataWriteAction]) = instrumentedProcess {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just rearrangement or are there logic changes here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just rearrangement to make the process method more easily unit testable.

@@ -14,9 +14,9 @@ metadata {
"outputs.biscayne_read_functions_windows_line_endings.map.y": 600
"outputs.biscayne_read_functions_windows_line_endings.map.z": 700
"outputs.biscayne_read_functions_windows_line_endings.map.x": 500
"outputs.biscayne_read_functions_windows_line_endings.tsv.0": "[\"line1\",\"line one\"]"
"outputs.biscayne_read_functions_windows_line_endings.tsv.1": "[\"line2\",\"line two\"]"
"outputs.biscayne_read_functions_windows_line_endings.tsv.2": "[\"line3\",\"line three\"]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super nice, I've definitely been annoyed in the past about having to craft just the right escape sequence for a test.

} else {
IO.unit
}
}

private def checkIfActuallySame(firstSet: Set[(String, JsValue)], secondSet: Set[(String, JsValue)]): Boolean = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming: this function alludes to a "kinda same"/"actually same" duality... not sure that's what's happening

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially the actual outputs are in a "cleaned" state where extraneous quotation marks have been removed, but the expected outputs might have extra quotation marks if the output is a String (e.g., the actual output will be "hello" but the expected output will read ""hello""). This function just cleans the outputs to make sure that we're not reporting a false difference caused by quotes that should have been removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment to that affect?

HOCON). However values coming from the metadata endpoint are JsValue. At the moment we're only using
JsString, JsNumber and JsBoolean for comparison so this is a hacky way of handling that situation. It's
entirely likely that it won't survive long term.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geoffjentry we determined that 8 years is the going definition of "long term".

Comment on lines +30 to +31
# Due to Centaur changes in WX-1629, lots_of_inputs_papiv2 times out due to the large number of inputs. Instead,
# we run lots_of_inputs, which tests 400 inputs instead of the 10,000 in lots_of_inputs_papiv2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from this test – did the slower Centaur evaluation inflate the overall suite time? I can't immediately intuit whether it would matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing the length of time of the tests on this PR to the nightly integration tests, the timings are about the same (some suites are a little faster, some are a little slower, all are pretty close though). I think that one test was problematic because it creates an array of 10,000 Files, which causes a recursion stack of 10,000 calls when we process the array. No other tests create such large arrays, and I think that call stack might have caused memory issues and slowness (I didn't test this hypothesis, but it would make sense since that seems to be the main difference between this test and others).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 sounds fine to me, since I only care about performance to the extent that I have to wait for tests.

Copy link
Contributor

@jgainerdewar jgainerdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job wrangling this!

} else {
IO.unit
}
}

private def checkIfActuallySame(firstSet: Set[(String, JsValue)], secondSet: Set[(String, JsValue)]): Boolean = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment to that affect?


services {
MetadataService {
metadata-keys-to-sanitize-utf8mb4 = ["submittedFiles:workflow", "commandLine", "outputs:metadata_type_validation.validate_boolean.boolean_output"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was special about outputs:metadata_type_validation.validate_boolean.boolean_output that caused it to be added here? Might want a comment explaining.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing special about that key, I added it to prove through integration testing that sanitizing metadata keys won't convert non-Strings to Strings (essentially I just picked that one randomly).

# timeout-minutes: 30
# with:
# limit-access-to-actor: true
# detached: true
#This script bascially just looks up another script to run, assuming that the other script's filename is:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is left in here commented-out intentionally and I support it 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct!

@rsaperst rsaperst merged commit 408951f into develop May 29, 2024
37 checks passed
@rsaperst rsaperst deleted the wx1629 branch May 29, 2024 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants