
[PLUGIN-1948] Handle unsupported BigQuery table types gracefully in PartitionedBigQueryInputFormat#1610

Merged

adrikagupta merged 1 commit into develop from table-definition-fix on Apr 27, 2026


Conversation

@adrikagupta
Contributor

@adrikagupta adrikagupta commented Apr 24, 2026

Problem:
PR #1607 fixed the initial ClassCastException, which occurred because the code assumed that every BigQuery table is a standard table. However, the code still blindly casts the TableDefinition to a StandardTableDefinition inside processQuery before passing it to generateQuery, so reading a non-standard table such as a snapshot still fails.

Solution:
This follow-up PR refactors PartitionedBigQueryInputFormat.java to restore backward compatibility and improve type safety:

  1. Removes the unsafe cast in processQuery and passes the generic TableDefinition to generateQuery.
  2. Moves the StandardTableDefinition cast inside generateQuery, after the check for whether any query modifiers (filters, limits, order by, partitions) are present.
  3. If no modifiers are present, generateQuery returns null, so other table types such as SNAPSHOT and MODEL can be read directly without crashing.
  4. If a user does apply filters, limits, or ordering to an unsupported table type, generateQuery checks the type with instanceof and throws a descriptive error instead of a ClassCastException.
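The four steps above can be sketched as a small, self-contained model of the refactored control flow. This is not the plugin's code: `GenerateQuerySketch` and its nested `TableDefinition`, `StandardTableDefinition`, and `SnapshotTableDefinition` types are hypothetical stand-ins for the com.google.cloud.bigquery classes, the parameter list is simplified, and the single `partitionColumn` used for the date range is an assumption made for brevity. The error message is copied from the testing notes below.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the refactored generateQuery flow. The nested types
 * stand in for the com.google.cloud.bigquery classes; they are not the real API.
 */
final class GenerateQuerySketch {

  interface TableDefinition {
    String type();
  }

  static final class StandardTableDefinition implements TableDefinition {
    // Assumed: a single partition column, used only to show that standard-table
    // metadata is needed once modifiers are present.
    private final String partitionColumn;

    StandardTableDefinition(String partitionColumn) {
      this.partitionColumn = partitionColumn;
    }

    String partitionColumn() {
      return partitionColumn;
    }

    @Override
    public String type() {
      return "TABLE";
    }
  }

  static final class SnapshotTableDefinition implements TableDefinition {
    @Override
    public String type() {
      return "SNAPSHOT";
    }
  }

  /** Returns null when the table can be read directly, otherwise a query string. */
  static String generateQuery(TableDefinition definition, String table, String filter,
                              Long limit, String orderBy,
                              String partitionFromDate, String partitionToDate) {
    boolean hasModifiers = filter != null || limit != null || orderBy != null
        || partitionFromDate != null || partitionToDate != null;
    // No modifiers: skip all type-specific handling so SNAPSHOT, MODEL, etc.
    // can still be read directly.
    if (!hasModifiers) {
      return null;
    }
    // Modifiers present: check the runtime type before narrowing, so an
    // unsupported type produces a descriptive error, not a ClassCastException.
    if (!(definition instanceof StandardTableDefinition)) {
      throw new IllegalArgumentException(String.format(
          "Unsupported BigQuery table type for filtering/partitioning: %s. "
              + "Cannot apply filters, limits, or ordering.", definition.type()));
    }
    StandardTableDefinition standard = (StandardTableDefinition) definition;

    List<String> conditions = new ArrayList<>();
    if (filter != null) {
      conditions.add("(" + filter + ")");
    }
    if (partitionFromDate != null) {
      conditions.add(standard.partitionColumn() + " >= '" + partitionFromDate + "'");
    }
    if (partitionToDate != null) {
      conditions.add(standard.partitionColumn() + " <= '" + partitionToDate + "'");
    }
    StringBuilder query = new StringBuilder("SELECT * FROM `").append(table).append('`');
    if (!conditions.isEmpty()) {
      query.append(" WHERE ").append(String.join(" AND ", conditions));
    }
    if (orderBy != null) {
      query.append(" ORDER BY ").append(orderBy);
    }
    if (limit != null) {
      query.append(" LIMIT ").append(limit);
    }
    return query.toString();
  }
}
```

With this shape, a snapshot with no modifiers yields null and is read as-is, while the same snapshot with a "1=1" filter raises the IllegalArgumentException instead of failing on a cast. Only standard tables ever reach the narrowing cast.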

Testing

  • Created a pipeline with a BQ source and set a snapshot as the table. Did not set any filtering fields. Preview ran successfully.
  • Created a pipeline with a BQ source, set a snapshot as the table, and added "1=1" in the filter field. Preview threw the expected error:
Error occurred in the phase: 'Splitting'. java.lang.IllegalArgumentException: Unsupported BigQuery table type for filtering/partitioning: SNAPSHOT. Cannot apply filters, limits, or ordering.
  • Created a pipeline with a BQ source using an ingestion-time partitioned table with "require partition filter" set to true. Preview ran successfully with and without filters.
  • Created a pipeline with a BQ source using a materialized view as the table. Preview ran successfully with and without filters.
  • Created a pipeline with a BQ source using a range-partitioned table, with and without "require partition filter" set to true. Preview ran successfully with and without filters.

@adrikagupta adrikagupta added the build (Trigger unit test build) label on Apr 24, 2026
Contributor

@itsankit-google itsankit-google left a comment


Before this change (#1583), did the BQ source work for other table types like SNAPSHOT?

@adrikagupta
Contributor Author

adrikagupta commented Apr 24, 2026


No, the BQ source did not work for types like SNAPSHOT, because the generateQuery method explicitly cast the table definition to StandardTableDefinition, as shown below (whereas a snapshot's definition is a SnapshotTableDefinition):

```java
String generateQuery(String partitionFromDate, String partitionToDate, String filter, String project,
                     String datasetProject, String dataset, String table, @Nullable String serviceAccount,
                     @Nullable Boolean isServiceAccountFilePath) {
  if (partitionFromDate == null && partitionToDate == null && filter == null) {
    return null;
  }
  String queryTemplate = "select * from `%s` where %s";
  com.google.cloud.bigquery.Table sourceTable =
    BigQueryUtil.getBigQueryTable(datasetProject, dataset, table, serviceAccount, isServiceAccountFilePath, null,
      null);
  // Reached only when a filter or partition range is set; this implicit cast
  // throws a ClassCastException for non-standard tables such as snapshots.
  StandardTableDefinition tableDefinition = Objects.requireNonNull(sourceTable).getDefinition();
  ....
}
```

Code ref - https://github.com/data-integrations/google-cloud/pull/1583/changes#diff-dd07d59189d71f7a7e30a643be1e267f2b6165f32beb9a0afd105ecf229898ccL179
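Why the line above compiles but fails at runtime can be reproduced without BigQuery. To our understanding, `getDefinition()` in the google-cloud-bigquery client is declared as a generic method (`<T extends TableDefinition> T getDefinition()`), so assigning its result to `StandardTableDefinition` needs no explicit cast, and the compiler-inserted check fails only when the method is called on a non-standard table. The sketch models this with hypothetical stand-in types; `GenericCastDemo`, its `Table`, and the nested definitions are not the real classes.

```java
import java.util.Objects;

/**
 * Hypothetical model of a generic getDefinition() and the implicit cast at
 * its call site. None of these types are the real com.google.cloud.bigquery API.
 */
final class GenericCastDemo {
  interface TableDefinition {}

  static final class StandardTableDefinition implements TableDefinition {}

  static final class SnapshotTableDefinition implements TableDefinition {}

  static final class Table {
    private final TableDefinition definition;

    Table(TableDefinition definition) {
      this.definition = definition;
    }

    // The caller chooses T. The cast to T is erased to TableDefinition, so it
    // never fails here; the real check happens at the caller's assignment.
    @SuppressWarnings("unchecked")
    <T extends TableDefinition> T getDefinition() {
      return (T) definition;
    }
  }

  /** Mirrors the old code: compiles, but throws ClassCastException for snapshots. */
  static StandardTableDefinition oldStyle(Table sourceTable) {
    StandardTableDefinition tableDefinition = Objects.requireNonNull(sourceTable).getDefinition();
    return tableDefinition;
  }

  /** Safer variant: keep the generic type and inspect it before narrowing. */
  static boolean isStandard(Table sourceTable) {
    TableDefinition definition = Objects.requireNonNull(sourceTable).getDefinition();
    return definition instanceof StandardTableDefinition;
  }
}
```

Because the failure surfaces at the assignment rather than inside `getDefinition()`, moving that assignment earlier or later in the method directly changes which inputs crash, which is why an `instanceof` check before narrowing is the safer pattern.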

@itsankit-google
Contributor


Can you please run a pipeline with an older version of the plugin (from before that change) and attach evidence of the failure?

@adrikagupta adrikagupta force-pushed the table-definition-fix branch 2 times, most recently from e7690e2 to e5cce38 on April 27, 2026 09:35
@adrikagupta adrikagupta changed the title [PLUGIN-1948] Add explicit type checking for StandardTableDefinition in BigQuery source [PLUGIN-1948] Handle unsupported BigQuery table types gracefully in PartitionedBigQueryInputFormat Apr 27, 2026
@adrikagupta adrikagupta force-pushed the table-definition-fix branch from e5cce38 to 6073d65 on April 27, 2026 09:45
@adrikagupta
Contributor Author

adrikagupta commented Apr 27, 2026


Thanks for pointing this out!

Before #1583, the BQ source did work for types like SNAPSHOT as long as no filters or partitions were applied. This was because the early return (if (partitionFromDate == null && partitionToDate == null && filter == null) { return null; }) happened before the StandardTableDefinition cast. When the logic was refactored, the cast was moved ahead of that early return, which broke reading raw snapshots and caused the ClassCastException.
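The ordering point can be illustrated with a hypothetical two-method sketch. `CastOrderingDemo` and its stand-in types are not plugin code, and a single `filter` argument stands in for all of the query modifiers: the 0.24.1-style order returns before narrowing, while the regressed order narrows first and therefore fails even when nothing needs to be filtered.

```java
/**
 * Hypothetical illustration of why the position of the cast relative to the
 * early return matters. Types are stand-ins, not com.google.cloud.bigquery.
 */
final class CastOrderingDemo {
  interface TableDefinition {}

  static final class StandardTableDefinition implements TableDefinition {}

  static final class SnapshotTableDefinition implements TableDefinition {}

  /** Stand-in for query building that needs standard-table metadata (ignored here). */
  static String buildQuery(StandardTableDefinition standard, String filter) {
    return "SELECT * FROM t WHERE " + filter;
  }

  /** Order described for 0.24.1: early return first, cast only when filtering. */
  static String earlyReturnFirst(TableDefinition definition, String filter) {
    if (filter == null) {
      return null; // read the table directly; no cast is attempted
    }
    return buildQuery((StandardTableDefinition) definition, filter);
  }

  /** Regressed order: the cast runs unconditionally, before the early return. */
  static String castFirst(TableDefinition definition, String filter) {
    StandardTableDefinition standard = (StandardTableDefinition) definition; // fails for snapshots
    if (filter == null) {
      return null;
    }
    return buildQuery(standard, filter);
  }
}
```

For a snapshot, `earlyReturnFirst(snapshot, null)` returns null while `castFirst(snapshot, null)` throws a ClassCastException; with a filter, both orders fail on the cast, consistent with the 0.24.1 result reported in this comment.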

I have moved the instanceof StandardTableDefinition check into generateQuery, after the checks for filter, limit, orderBy, etc.

With this updated approach:

  1. Backward compatibility is restored: if no filters or limits are provided, generateQuery returns null early, so SNAPSHOT tables can be read successfully, just as before.
  2. Type safety is added: if a user does try to apply a filter or limit to a SNAPSHOT, the table type is now checked and a clear, descriptive error is thrown instead of a crash with a ClassCastException.

Process:

  1. Created a pipeline with the 0.24.1 BigQuery source plugin (the changes from #1583, [PLUGIN-1931] Add defaults for Time and Range Partitioning in BQ source plugin, are present only in 0.24.2 and later). Set a snapshot as the table and did not add any filters. Preview ran successfully.
[Screenshots: 2026-04-27 at 3:30:16 PM and 3:30:35 PM]
  2. Created a pipeline with the 0.24.1 BigQuery source plugin, set a snapshot as the table, and added the filter "1 = 1". Preview failed with this error:
Stage 'BigQuery' encountered : java.lang.ClassCastException: class com.google.cloud.bigquery.AutoValue_SnapshotTableDefinition cannot be cast to class com.google.cloud.bigquery.StandardTableDefinition (com.google.cloud.bigquery.AutoValue_SnapshotTableDefinition and com.google.cloud.bigquery.StandardTableDefinition are in unnamed module of loader io.cdap.cdap.internal.app.runtime.plugin.PluginClassLoader @38dfedff)


@itsankit-google
Contributor

Please remember to cherry-pick this change and release a 0.24.x version on the Hub so that backward compatibility is restored.

@adrikagupta adrikagupta merged commit 06d4dd3 into develop Apr 27, 2026
17 checks passed
@adrikagupta adrikagupta deleted the table-definition-fix branch April 27, 2026 11:21

Labels

build Trigger unit test build


2 participants