[Bug]: BigQuery cross-language ReadFromQuery is outputting Beam rows with a different UUID than expected output. #21784

Open
youngoli opened this issue Jun 9, 2022 · 2 comments

Comments

@youngoli
Contributor

youngoli commented Jun 9, 2022

What happened?

This was found in the Go SDK, but the root cause seems to be in the expansion service. When performing a cross-language (xlang) BigQueryIO read from a query, the Beam rows it outputs are structurally identical to the output type registered in Go, but they are not treated as equivalent to it (the schema UUIDs differ), so they can't be converted to the named struct output.

Workaround in Go SDK

To work around this issue in the short term, turn the named struct type used as the output into a type alias of the unnamed struct type. This is done by inserting an = sign into the type declaration.

Before: type OutputRow struct {...}
After: type OutputRow = struct {...}

Note that this doesn't play well with Beam Go's type registration; you'll need to avoid registering the type alias.
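For anyone unsure why the = sign matters, here is a minimal standalone sketch in plain Go, with no Beam involved. The names NamedRow, AliasRow, and decoded are made up for illustration, and decoded only mimics a row arriving as the unnamed struct type. A type assertion to the named struct fails, while the same assertion to the alias succeeds, because an alias is the unnamed type itself.

```go
package main

import "fmt"

// NamedRow is a defined (named) struct type, standing in for the
// registered output type. Hypothetical name, not from the Beam codebase.
type NamedRow struct {
	Counter *int64 `beam:"counter"`
}

// AliasRow is a type alias: it is the very same type as the unnamed
// struct on the right-hand side, not a new type.
type AliasRow = struct {
	Counter *int64 `beam:"counter"`
}

// decoded mimics a value that was decoded into the unnamed struct type.
func decoded() interface{} {
	n := int64(7)
	return struct {
		Counter *int64 `beam:"counter"`
	}{Counter: &n}
}

// assertNamed reports whether v can be asserted to the named struct type.
func assertNamed(v interface{}) bool {
	_, ok := v.(NamedRow)
	return ok
}

// assertAlias reports whether v can be asserted to the alias type.
func assertAlias(v interface{}) bool {
	_, ok := v.(AliasRow)
	return ok
}

func main() {
	v := decoded()
	fmt.Println("named struct assertion ok:", assertNamed(v))
	fmt.Println("type alias assertion ok:", assertAlias(v))
}
```

The sketch uses the comma-ok form of the assertion so it reports a result instead of panicking; the single-value form, v.(NamedRow), is what produces an "interface conversion" panic like the one in the logs.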

Log Snippets

Here's a snippet of the error on the Go side, showing how it manifests:

panic: interface conversion: interface {} is struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }, not bigquery.TestRowPtrs
Full error:
while executing Process for Plan[s02-67]:
2: DataSink[S[ptransform-65@localhost:12371]] Coder:W;coder-80<LP;coder-81<R[bigquery.TestRow]>>!GWC
3: PCollection[pcollection-72] Out:[2]
4: ParDo[bigquery.castFn] Out:[2]
1: DataSource[S[ptransform-64@localhost:12371], 0] Coder:W;coder-76<LP;coder-77<R[struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }]>>!GWC Out:4
	caused by:
panic: interface conversion: interface {} is struct { Counter *int64 "beam:\"counter\""; Rand_data *struct { Flip *bool "beam:\"flip\""; Num *int64 "beam:\"num\""; Word *string "beam:\"word\"" } "beam:\"rand_data\"" }, not bigquery.TestRowPtrs goroutine 58 [running]:
runtime/debug.Stack()
	/usr/lib/google-golang/src/runtime/debug/stack.go:24 +0x65
github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/exec.callNoPanic.func1()
	{...}/repos/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:58 +0xa5
panic({0xe0d380, 0xc0003a04e0})
	/usr/lib/google-golang/src/runtime/panic.go:866 +0x212
github.com/apache/beam/sdks/v2/go/pkg/beam/register.(*caller1x1[...]).Call1x1(...)
	{...}/repos/beam/sdks/go/pkg/beam/register/register.go:3205

And here is a snippet of the raw schema protos received by the Go SDK, compared with the schema it's expecting:

Received schema:

Schema proto: fields: {
  name: "counter"
  type: {
    nullable: true
    atomic_type: INT64
  }
}
fields: {
  name: "rand_data"
  type: {
    nullable: true
    row_type: {
      schema: {
        fields: {
          name: "flip"
          type: {
            nullable: true
            atomic_type: BOOLEAN
          }
        }
        fields: {
          name: "num"
          type: {
            nullable: true
            atomic_type: INT64
          }
          id: 1
          encoding_position: 1
        }
        fields: {
          name: "word"
          type: {
            nullable: true
            atomic_type: STRING
          }
          id: 2
          encoding_position: 2
        }
        id: "141b0073-d725-456c-bcdc-46c9c84e7a6d"
      }
    }
  }
  id: 1
  encoding_position: 1
}
id: "d520c5bd-86f8-4a7b-8cbd-af6816f09f61"

Expected schema:

Schema proto: fields: {
  name: "counter"
  type: {
    nullable: true
    atomic_type: INT64
  }
}
fields: {
  name: "rand_data"
  type: {
    nullable: true
    row_type: {
      schema: {
        fields: {
          name: "flip"
          type: {
            nullable: true
            atomic_type: BOOLEAN
          }
        }
        fields: {
          name: "num"
          type: {
            nullable: true
            atomic_type: INT64
          }
        }
        fields: {
          name: "word"
          type: {
            nullable: true
            atomic_type: STRING
          }
        }
        id: "c39b4c69-1e23-4267-9fb2-776e1a61a34f"
      }
    }
  }
}
id: "952f2fc2-afb0-4646-aaec-88b9a0f307be"
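To make the difference between the two dumps concrete, here is a rough sketch that models them with simplified stand-in types and compares them while ignoring the schema UUIDs and the per-field id / encoding_position values. Schema, Field, FieldType, sameShape, and build are hypothetical names for this illustration only; this is not Beam's schema proto API, and it is not how the SDK itself compares schemas.

```go
package main

import "fmt"

// FieldType is a simplified stand-in for the proto field type.
type FieldType struct {
	Nullable bool
	Atomic   string  // e.g. "INT64"; empty when Row is set
	Row      *Schema // nested row schema, if any
}

// Field is a simplified stand-in for one schema field.
type Field struct {
	Name             string
	Type             FieldType
	ID               int32
	EncodingPosition int32
}

// Schema is a simplified stand-in for the schema proto.
type Schema struct {
	Fields []Field
	ID     string // the schema UUID
}

// sameShape compares field names, nullability, and types recursively,
// deliberately ignoring schema IDs, field IDs, and encoding positions.
func sameShape(a, b *Schema) bool {
	if a == nil || b == nil {
		return a == b
	}
	if len(a.Fields) != len(b.Fields) {
		return false
	}
	for i := range a.Fields {
		fa, fb := a.Fields[i], b.Fields[i]
		if fa.Name != fb.Name || fa.Type.Nullable != fb.Type.Nullable || fa.Type.Atomic != fb.Type.Atomic {
			return false
		}
		if !sameShape(fa.Type.Row, fb.Type.Row) {
			return false
		}
	}
	return true
}

// build constructs the schema from the dumps above; withPositions controls
// whether field id / encoding_position are populated as in the received one.
func build(topID, nestedID string, withPositions bool) *Schema {
	pos := func(i int32) int32 {
		if withPositions {
			return i
		}
		return 0
	}
	atomic := func(name, kind string, i int32) Field {
		return Field{Name: name, Type: FieldType{Nullable: true, Atomic: kind}, ID: pos(i), EncodingPosition: pos(i)}
	}
	nested := &Schema{ID: nestedID, Fields: []Field{
		atomic("flip", "BOOLEAN", 0),
		atomic("num", "INT64", 1),
		atomic("word", "STRING", 2),
	}}
	return &Schema{ID: topID, Fields: []Field{
		atomic("counter", "INT64", 0),
		{Name: "rand_data", Type: FieldType{Nullable: true, Row: nested}, ID: pos(1), EncodingPosition: pos(1)},
	}}
}

func received() *Schema {
	return build("d520c5bd-86f8-4a7b-8cbd-af6816f09f61", "141b0073-d725-456c-bcdc-46c9c84e7a6d", true)
}

func expected() *Schema {
	return build("952f2fc2-afb0-4646-aaec-88b9a0f307be", "c39b4c69-1e23-4267-9fb2-776e1a61a34f", false)
}

func main() {
	r, e := received(), expected()
	fmt.Println("same shape:", sameShape(r, e))
	fmt.Println("same top-level ID:", r.ID == e.ID)
}
```

Under this model the two schemas have the same shape and differ only in their IDs and in the field id / encoding_position values, which matches what the dumps show.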

To see the data above, simply add the following lines after graphx/coder.go:371:

sp := prototext.Format(&s)
log.Warnf(context.Background(), "Schema proto: %v", sp)
log.Warnf(context.Background(), "Schema type: %v", t)
return coder.NewR(typex.New(t)), nil

Issue Priority

Priority: 2

Issue Component

Component: cross-language

@capthiron
Contributor

Hey @youngoli :)

Do you happen to have a working example with the workaround?
I am not quite sure how to make it work with the aliasing.

Best regards!

@lostluck
Contributor

lostluck commented Nov 30, 2022

@capthiron Well, the example here exists, but it doesn't have aliasing.

The aliasing workaround is demonstrated here, in the integration tests:

@damccorm removed the stale label Dec 2, 2022