404 Session not found, when querying Google Cloud Spanner with Python Dataflow. #21009

@damccorm

Description

My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The initial run is successful, but any subsequent run fails with the error "google.api_core.exceptions.NotFound: 404 Session not found" and also with "504 Deadline Exceeded".

Here is part of the code:



SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'

spanner_domains = (
    p
    | 'ReadFromSpanner' >> ReadFromSpanner(
        project_id, database, database, sql=SPANNER_QUERY)
    | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))

def _KeyDomainSpanner(entity):
  row = {}
  for i, column in enumerate(['row_id', 'update_key']):
    row[column] = entity[i]
  return row['row_id'], row
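
To help rule out the keying step as a source of the failure, the mapping function can be exercised on its own, without Beam or Spanner. The sketch below is a standalone equivalent of _KeyDomainSpanner. It assumes each row arrives as a list-like sequence of values in the column order of the SELECT; the names COLUMNS and key_domain_spanner and the sample row are hypothetical, chosen for illustration only.

```python
# Column order is assumed to match the SELECT list in SPANNER_QUERY.
COLUMNS = ('row_id', 'update_key')


def key_domain_spanner(entity):
    """Turn a positional row into a (row_id, row_dict) pair."""
    # Pair each column name with the value at the same position.
    row = dict(zip(COLUMNS, entity))
    return row['row_id'], row


# Hypothetical sample row, shaped like one result of the query.
sample = [42, 'abc']
key, row = key_domain_spanner(sample)
print(key, row)
```

If this behaves as expected on sample rows, the errors above are more likely to come from the read step than from the keying step.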


The Dataflow job is able to read around 10M rows with Beam 2.29.0, but only a few thousand with 2.33.0.

Imported from Jira BEAM-12773. Original Jira may contain additional context.
Reported by: regeter.
