Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DR-3038] Fix Azure Synapse Querying #1478

Merged
merged 6 commits into from
Jul 11, 2023

Conversation

samanehsan
Copy link
Contributor

https://broadworkbench.atlassian.net/browse/DR-3038

Some parquet file directories use the flight id, and if it begins with an underscore synapse will ignore the path when performing a wildcard query (which occurs when using the preview data endpoint). This PR adds a prefix to the flight id parquet directories.

Otherwise, if the flightId begins with an
underscore, synapse will return 0 results when
performing a wildcard query
Copy link
Contributor

@okotsopoulos okotsopoulos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I noticed that integration tests failed, but the Gradle scan made me think that the failures were due to transient environmental issues: https://scans.gradle.com/s/7iwlgyo77bt5a

So I took the liberty of rerunning, hope that's okay.

@@ -407,11 +407,11 @@ public void performIngest(
// 3 - Retrieve info about database schema so that we can populate the parquet create query
String tableName = destinationTable.getName();
String destinationParquetFile =
FolderType.METADATA.getPath("parquet/" + tableName + "/" + ingestFlightId + ".parquet");
FolderType.METADATA.getPath(IngestUtils.getParquetFilePath(tableName, ingestFlightId));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for refactoring out to call the existing utility method!

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK as far as I can tell

@@ -386,7 +387,7 @@ public static List<Column> getDatasetFileRefColumns(

// Note: this is the unqualified path (e.g. it gets used in metadata and scratch directories)
public static String getParquetFilePath(String targetTableName, String flightId) {
return "parquet/" + targetTableName + "/" + flightId + ".parquet";
return "parquet/" + targetTableName + "/" + FLIGHT_ID_PREFIX + flightId + ".parquet";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to create a constant for parquet/ too? It's used four times in this file.

Copy link
Member

@pshapiro4broad pshapiro4broad Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking at this file I noticed synapseTestCleanup() calls logger like this:

      logger.warn("[Cleanup exception] Unable to drop tables.", ex.getMessage());

Which is incorrect; only logger.error() accepts an exception as a final parameter. Actually it does accept an exception, but this code is passing in a String. For this to work as expected, it needs to format the exception in the string:

      logger.warn("[Cleanup exception] Unable to drop tables. {}", ex.getMessage());

or it could pass the exception in:

      logger.warn("[Cleanup exception] Unable to drop tables.", ex);

(Although I doubt the value of these log statements as I don't think anything is looking at the log outputs to monitor for these messages, so they won't be acted upon.)

"parquet/scratch_" + destinationTable.getName() + "/" + randomFlightId + ".parquet");
"parquet/scratch_"
+ destinationTable.getName()
+ "/flight"
Copy link
Member

@pshapiro4broad pshapiro4broad Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be /_flight? And if so, could this use IngestUtils.getParquetFilePath() instead?

samanehsan and others added 2 commits July 10, 2023 23:10
…oSnapshotConnectedTest.java

Co-authored-by: Phil Shapiro <pshapiro@broadinstitute.org>
@samanehsan samanehsan force-pushed the se/DR-3038-fix-azure-synapse-querying branch from 01bc313 to 9fd8727 Compare July 11, 2023 03:36
@samanehsan samanehsan force-pushed the se/DR-3038-fix-azure-synapse-querying branch from 9fd8727 to cf8473b Compare July 11, 2023 03:41
@samanehsan samanehsan merged commit e2cb1fa into develop Jul 11, 2023
7 checks passed
@samanehsan samanehsan deleted the se/DR-3038-fix-azure-synapse-querying branch July 11, 2023 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants