-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Destination BigQuery: Added more logs and updated the JobId made filled with locations and projectID #5049
🐛 Destination BigQuery: Added more logs and updated the JobId made filled with locations and projectID #5049
Conversation
951fcc5
to
1e05a67
Compare
/test connector=connectors/destination-bigquery
|
d72dfc0
to
1ccfaba
Compare
writer.getWriter().write(ByteBuffer.wrap((Jsons.serialize(formatRecord(writer.getSchema(), recordMessage)) + "\n").getBytes(Charsets.UTF_8))); | ||
} catch (IOException e) { | ||
} catch (IOException | RetryHelperException e) { | ||
LOGGER.error("Got an error while writing message:" + e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's log as much as we possibly can here if the status is 404:
- region
- number of input streams
- all job information
- any other info that would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sherifnada.
Many thanks for your comment.
I've updated the PR with adding more logs. But the point here is that at the moment when we get an exception - we get it all wrapped as a String. So the only way to check if 404 is "string.contains()" method that looks a little bit ugly for this case as for me. But here is the thing, if we fail at some moment for other reason we anyway would like to know all the details and the whole "sync attempt" would be marked as failed. So I added full logs without additional verification for reason. Please let me know if you don't agree.
By the way, this PR is in "draft" status for now and waiting for GL internal code review.
P.S. Here is an example what we would get from Job object in logs:
Job{job=JobId{project=dataline-integration-testing, job=72c84b79-4549-442c-aa74-3ad323761f21, location=EU}, status=JobStatus{state=RUNNING, error=null, executionErrors=null}, statistics=LoadStatistics{creationTime=1627492909725, endTime=null, startTime=1627492910001, numChildJobs=null, parentJobId=null, scriptStatistics=null, inputBytes=null, inputFiles=null, outputBytes=null, outputRows=null, badRecords=null}, userEmail=test-bigquery-user@dataline-integration-testing.iam.gserviceaccount.com, etag=IhA80VDWAmmB2tZ5LSocCQ==, generatedId=dataline-integration-testing:EU.72c84b79-4549-442c-aa74-3ad323761f21, selfLink=https://www.googleapis.com/bigquery/v2/projects/dataline-integration-testing/jobs/72c84b79-4549-442c-aa74-3ad323761f21?location=EU, configuration=LoadJobConfiguration{type=LOAD, destinationTable=GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=airbyte_tests_kxudixzk, projectId=dataline-integration-testing, tableId=_airbyte_tmp_mqh_users}}, destinationEncryptionConfiguration=null, createDisposition=CREATE_IF_NEEDED, writeDisposition=null, formatOptions=FormatOptions{format=NEWLINE_DELIMITED_JSON}, nullMarker=null, maxBadRecords=null, schema=Schema{fields=[Field{name=_airbyte_ab_id, type=STRING, mode=null, description=null, policyTags=null}, Field{name=_airbyte_data, type=STRING, mode=null, description=null, policyTags=null}, Field{name=_airbyte_emitted_at, type=TIMESTAMP, mode=null, description=null, policyTags=null}]}, ignoreUnknownValue=null, sourceUris=[gs://bigquery-prod-upload-eu/prod-scotty-bbe62546-332e-4956-94c1-8df38458b14e], schemaUpdateOptions=null, autodetect=null, timePartitioning=null, clustering=null, useAvroLogicalTypes=null, labels=null, jobTimeoutMs=null, rangePartitioning=null, hivePartitioningOptions=null}}
…uery job is created
1ccfaba
to
666fb83
Compare
/test connector=connectors/destination-bigquery
|
@@ -114,6 +114,7 @@ Therefore, Airbyte BigQuery destination will convert any invalid characters into | |||
| :--- | :--- | :--- | :--- | | |||
| 0.3.6 | 2021-06-18 | [#3947](https://github.com/airbytehq/airbyte/issues/3947) | Service account credentials are now optional. | | |||
| 0.3.4 | 2021-06-07 | [#3277](https://github.com/airbytehq/airbyte/issues/3277) | Add dataset location option | | |||
| 0.3.9 | 2021-07-28 | [#3549](https://github.com/airbytehq/airbyte/issues/3549) | Add extended logs and made JobId filled with region and projectId | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in reverse chrono order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... how didn't I notice it. It was a hard day. Updated, sorry for that
…ore-logs-and-proper-job-creation # Conflicts: # docs/integrations/destinations/bigquery.md
…ore-logs-and-proper-job-creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shipit
…ore-logs-and-proper-job-creation
/test connector=connectors/destination-bigquery
|
/publish connector=connectors/destination-bigquery
|
What
It's not clear the exact fail reason for #3549 and jobs IDs were filled without location and project's ID. So presumably at some moment, we may try to update it by the wrong location and get 404.
How
Added more logs and made the JobId filled with location and projectID.
Performed a smoke test: migrated some data from Postres to BigQuery sandbox
Pre-merge Checklist
Expand the checklist which is relevant for this PR.
Connector checklist
airbyte_secret
in the connector's spec./gradlew :airbyte-integrations:connectors:<name>:integrationTest
./test connector=connectors/<name>
command as documented here is passing.README.md
docs/SUMMARY.md
if it's a new connectordocs/integrations/<source or destination>/<name>
.docs/integrations/...
. See changelog exampledocs/integrations/README.md
contains a reference to the new connector/publish
command described hereConnector Generator checklist
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changes