[DR-3038] Fix Azure Synapse Querying #1478
Otherwise, if the flightId begins with an underscore, Synapse will return 0 results when performing a wildcard query.
Looks great!
I noticed that integration tests failed, but the Gradle scan made me think that the failures were due to transient environmental issues: https://scans.gradle.com/s/7iwlgyo77bt5a
So I took the liberty of rerunning, hope that's okay.
@@ -407,11 +407,11 @@ public void performIngest(
 // 3 - Retrieve info about database schema so that we can populate the parquet create query
 String tableName = destinationTable.getName();
 String destinationParquetFile =
-    FolderType.METADATA.getPath("parquet/" + tableName + "/" + ingestFlightId + ".parquet");
+    FolderType.METADATA.getPath(IngestUtils.getParquetFilePath(tableName, ingestFlightId));
Nice, thanks for refactoring out to call the existing utility method!
Looks OK as far as I can tell
@@ -386,7 +387,7 @@ public static List<Column> getDatasetFileRefColumns(

 // Note: this is the unqualified path (e.g. it gets used in metadata and scratch directories)
 public static String getParquetFilePath(String targetTableName, String flightId) {
-  return "parquet/" + targetTableName + "/" + flightId + ".parquet";
+  return "parquet/" + targetTableName + "/" + FLIGHT_ID_PREFIX + flightId + ".parquet";
Would it make sense to create a constant for "parquet/" too? It's used four times in this file.
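One way the suggestion above might look, as a minimal sketch rather than the change that was actually merged. The class name `ParquetPaths` and the constant `PARQUET_DIR` are hypothetical, and the value of `FLIGHT_ID_PREFIX` is an assumption, since the thread does not show its definition.

```java
// Hypothetical stand-in for the utility class discussed in this review.
final class ParquetPaths {
    // Hypothetical constant replacing the repeated "parquet/" literal.
    static final String PARQUET_DIR = "parquet/";

    // Assumed value; the real FLIGHT_ID_PREFIX is not shown in this thread.
    static final String FLIGHT_ID_PREFIX = "flight_";

    private ParquetPaths() {}

    // Builds the unqualified path, mirroring the method in the diff above.
    static String getParquetFilePath(String targetTableName, String flightId) {
        return PARQUET_DIR + targetTableName + "/" + FLIGHT_ID_PREFIX + flightId + ".parquet";
    }

    public static void main(String[] args) {
        // A flight ID with a leading underscore no longer starts the file name.
        System.out.println(getParquetFilePath("sample", "_abc123"));
    }
}
```

With the assumed prefix, a flight ID such as `_abc123` yields a file name that starts with `flight_` instead of an underscore, and the directory literal lives in one place.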
While looking at this file I noticed synapseTestCleanup() calls logger like this:
logger.warn("[Cleanup exception] Unable to drop tables.", ex.getMessage());
Which is incorrect. logger.warn() does accept an exception as a final parameter, but this code is passing in a String. For this to work as expected, it needs to format the exception in the string:
logger.warn("[Cleanup exception] Unable to drop tables. {}", ex.getMessage());
or it could pass the exception in:
logger.warn("[Cleanup exception] Unable to drop tables.", ex);
(Although I doubt the value of these log statements as I don't think anything is looking at the log outputs to monitor for these messages, so they won't be acted upon.)
-    "parquet/scratch_" + destinationTable.getName() + "/" + randomFlightId + ".parquet");
+    "parquet/scratch_"
+        + destinationTable.getName()
+        + "/flight"
Should this be /_flight? And if so, could this use IngestUtils.getParquetFilePath() instead?
src/test/java/bio/terra/service/filedata/azure/AzureSynapsePdaoSnapshotConnectedTest.java
…oSnapshotConnectedTest.java Co-authored-by: Phil Shapiro <pshapiro@broadinstitute.org>
01bc313 to 9fd8727
9fd8727 to cf8473b
https://broadworkbench.atlassian.net/browse/DR-3038
Some parquet file directories use the flight ID in their path. If the flight ID begins with an underscore, Synapse will ignore the path when performing a wildcard query (which occurs when using the preview data endpoint). This PR adds a prefix to the flight ID in those parquet paths.
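The problem described above can be illustrated with a small check. This is only a sketch: the class and method names are hypothetical, the rule that a path segment with a leading underscore is skipped is taken from the description above rather than from Synapse itself, and the `flight_` prefix in the example path is an assumed value.

```java
// Hypothetical helper modelling the behavior described in this PR.
final class WildcardVisibility {
    private WildcardVisibility() {}

    // Returns true if any path segment starts with an underscore. Per the
    // description above, such a path is assumed to be skipped by a wildcard query.
    static boolean hasUnderscoreSegment(String path) {
        for (String segment : path.split("/")) {
            if (segment.startsWith("_")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Unprefixed flight ID with a leading underscore: would be skipped.
        System.out.println(hasUnderscoreSegment("parquet/sample/_abc123.parquet"));
        // Same flight ID behind an assumed "flight_" prefix: would be matched.
        System.out.println(hasUnderscoreSegment("parquet/sample/flight__abc123.parquet"));
    }
}
```

A check like this could back a unit test asserting that no generated parquet path contains a segment with a leading underscore.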