Frequent BigQueryException #279

Open
jjaimon opened this issue Feb 27, 2023 · 2 comments

Comments

@jjaimon

jjaimon commented Feb 27, 2023

We are running four self-hosted instances of this connector to publish data to a single BigQuery table. The four instances are completely isolated, each with its own Kafka stream and Kafka cluster, and there is no data overlap between them.

Connector version: 2.4.4
Kafka version: 3.3.1

The connector stops frequently with the following error:

 {
   "code" : 400,
   "errors" : [ {
     "domain" : "global",
     "location" : "q",
     "locationType" : "parameter",
     "message" : "Could not serialize access to table xxxx.yyyyy due to concurrent update",
     "reason" : "invalidQuery"
   } ],
   "message" : "Could not serialize access to table xxxxx.yyyyy due to concurrent update",
   "status" : "INVALID_ARGUMENT"
 }

We also configured a separate intermediateTableSuffix for each instance, suspecting that the instances were overwriting each other's temporary tables.

The BigQuery documentation on this error was not much help to us: for maintenance reasons, we only want to change configuration settings in the connector, not make any code-level changes.

I would appreciate any suggestions on how to resolve this. It may be a problem with how we configured the connector rather than an issue with the connector itself.
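For anyone who wants to track how often this happens without touching the connector, the error payload quoted above can be recognized from outside, for example in a log-monitoring script. The following is only a minimal sketch: the function name `is_concurrent_update_error` and the marker constant are hypothetical, and the sample payload is the redacted one from this issue.

```python
import json

# Substring copied from the error message quoted in this issue.
CONCURRENT_UPDATE_MARKER = "due to concurrent update"

# Redacted payload as posted in this issue.
SAMPLE_PAYLOAD = """
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "location" : "q",
    "locationType" : "parameter",
    "message" : "Could not serialize access to table xxxx.yyyyy due to concurrent update",
    "reason" : "invalidQuery"
  } ],
  "message" : "Could not serialize access to table xxxxx.yyyyy due to concurrent update",
  "status" : "INVALID_ARGUMENT"
}
"""


def is_concurrent_update_error(payload: str) -> bool:
    """Return True if the JSON payload reports a concurrent-update conflict."""
    try:
        body = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(body, dict):
        return False

    # Collect the top-level message and every nested error message.
    messages = [body.get("message", "")]
    errors = body.get("errors", [])
    if isinstance(errors, list):
        messages += [e.get("message", "") for e in errors if isinstance(e, dict)]

    return any(CONCURRENT_UPDATE_MARKER in str(m) for m in messages)


print(is_concurrent_update_error(SAMPLE_PAYLOAD))  # prints True
```

Matching on the message text rather than on `code` or `reason` is deliberate here, since `400` and `invalidQuery` are shared with many unrelated errors.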

@Matthieu68857

We see the same errors on our side:

  • We run only one task per connector and table
  • Nothing outside the connector runs DML against the table

{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "location" : "q",
    "locationType" : "parameter",
    "message" : "Could not serialize access to table xxx due to concurrent update",
    "reason" : "invalidQuery"
  } ],
  "message" : "Could not serialize access to table xxx due to concurrent update",
  "status" : "INVALID_ARGUMENT"
}

@jjaimon

jjaimon commented Apr 12, 2023

I made the following changes to reduce the errors:

tasks.max: "1"
intermediateTableSuffix: mysuffix
bufferSize: 100000
maxWriteSize: 10000
tableWriteSize: 1000

The above changes are in the connector configuration. I think the first three (tasks.max, intermediateTableSuffix, and bufferSize) helped us reduce the errors to nearly zero.
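With several isolated instances writing to the same table, one way to keep these settings consistent is to generate each instance's configuration from a shared template and give every instance its own suffix. This is only a sketch under assumptions: the property names and values are copied as written in this thread, while the instance IDs, the suffix naming scheme, and the helper `build_instance_config` are hypothetical.

```python
import json

# Settings reported in this thread, copied as written there.
# Values are strings, as is usual for connector configs submitted as JSON.
SHARED_SETTINGS = {
    "tasks.max": "1",
    "bufferSize": "100000",
    "maxWriteSize": "10000",
    "tableWriteSize": "1000",
}

# Hypothetical identifiers for the four isolated instances.
INSTANCE_IDS = ["a", "b", "c", "d"]


def build_instance_config(instance_id: str) -> dict:
    """Return the shared settings plus a suffix unique to this instance."""
    config = dict(SHARED_SETTINGS)
    # Hypothetical naming scheme: any distinct value per instance would do.
    config["intermediateTableSuffix"] = f"_{instance_id}"
    return config


configs = {i: build_instance_config(i) for i in INSTANCE_IDS}

# Guard against two instances accidentally sharing intermediate tables.
suffixes = [c["intermediateTableSuffix"] for c in configs.values()]
assert len(set(suffixes)) == len(suffixes), "suffixes must be unique per instance"

print(json.dumps(configs["a"], indent=2))
```

The generated dictionaries would still need to be merged with the rest of each connector's settings and deployed however the instances are normally configured.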
