[Failing Test]: Many BigQuery direct read Python PostCommit failing possibly due to fastavro regression #28811

Closed
2 of 16 tasks
Abacn opened this issue Oct 4, 2023 · 11 comments · Fixed by #28896

Comments

@Abacn
Contributor

Abacn commented Oct 4, 2023

What happened?

Possibly due to fastavro 1.8.4, released on Oct 3.

apache_beam/runners/common.py:1609: in handle_process_outputs
    for result in results:
apache_beam/io/gcp/bigquery.py:1312: in __next__
    return fastavro.schemaless_reader(self.bytes_reader, self.avro_schema)
fastavro/_read.pyx:1126: in fastavro._read.schemaless_reader
    ???
fastavro/_read.pyx:1153: in fastavro._read.schemaless_reader
    ???
fastavro/_read.pyx:743: in fastavro._read._read_data
    ???
fastavro/_read.pyx:616: in fastavro._read.read_record
    ???
fastavro/_read.pyx:735: in fastavro._read._read_data
    ???
fastavro/_read.pyx:526: in fastavro._read.read_union
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   EOFError

fastavro/_read.pyx:176: EOFError
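For context, bigquery.py decodes each Avro row from a byte stream with fastavro.schemaless_reader, and that is the call raising EOFError above. Below is a minimal self-contained sketch of the same read pattern, with a hypothetical schema; the real schema comes from the BigQuery Storage read session, and this toy example may not reproduce the 1.8.4 failure on its own.

import io

import fastavro

# Hypothetical row schema with a union field, matching the read_union frame in the traceback.
avro_schema = fastavro.parse_schema({
    "type": "record",
    "name": "Row",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": ["null", "string"]},
    ],
})

# Serialize one record and read it back the way the BigQuery reader does.
buf = io.BytesIO()
fastavro.schemaless_writer(buf, avro_schema, {"id": 1, "name": "beam"})
buf.seek(0)
print(fastavro.schemaless_reader(buf, avro_schema))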

Issue Failure

Failure: Test is continually failing

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@yupbank

yupbank commented Oct 4, 2023

This one is hitting us in production. Are there any workarounds, e.g. a known healthy version? We are currently using apache_beam[gcp]==2.50.0.

@Abacn
Contributor Author

Abacn commented Oct 4, 2023

If you are using our released Docker image, you should not be hit, because a compatible fastavro dependency is already included. If you are building a custom environment, explicitly install fastavro<1.8.4 (pip install "fastavro<1.8.4").
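For example, a custom environment's requirements file could spell this out (versions illustrative, using the apache-beam version mentioned above):

apache-beam[gcp]==2.50.0
fastavro<1.8.4  # avoid the regression introduced in 1.8.4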

@tvalentyn
Contributor

tvalentyn commented Oct 6, 2023

@yupbank see the recommendations about building reproducible environments on https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ to prevent an issue like this from affecting your production pipelines.
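For example, one approach described on that page is to ship a fully pinned requirements file with the pipeline via the --requirements_file option; a minimal sketch (the file name and pipeline here are illustrative):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# requirements.txt should be a fully resolved list (e.g. produced by pip freeze),
# so remote workers install exactly the versions you tested against.
options = PipelineOptions(flags=[], requirements_file="requirements.txt")

with beam.Pipeline(options=options) as p:
    _ = p | beam.Create([1, 2, 3]) | beam.Map(print)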

@riteshghorse
Contributor

Should we pin fastavro to <1.8.4?

'fastavro>=0.23.6,<2',

@tvalentyn
Contributor

We need a bug filed on the fastavro side; to unblock tests, we could temporarily add !=1.8.4 to the version constraint.
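In sdks/python/setup.py, that temporary exclusion would look roughly like:

'fastavro>=0.23.6,!=1.8.4,<2',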

@tvalentyn
Contributor

They might be working on a patch release to remediate this.

@riteshghorse
Contributor

Created an issue on the fastavro side: fastavro/fastavro#720

@kennknowles
Member

But the root danger is still here: if fastavro releases a bad version in our allowed range, all users break. Is our dependency pinning strategy wrong?

@AnandInguva
Contributor

AnandInguva commented Oct 9, 2023

We tend to follow the SemVer approach (https://semver.org/#why-use-semantic-versioning) for pinning most dependencies. Usually, we set the upper bound of a dependency to be less than the next major version; in this case, that is 2.0. It is up to the fastavro library not to make backward-incompatible changes until there is a major upgrade.
https://devhints.io/semver

Also, we don't want to pin a tighter upper bound on dependencies, since users would miss out on the latest minor releases until the next Beam release comes out.
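To make the consequence of that bound concrete, here is a quick check with the third-party packaging library (not part of Beam) showing which fastavro versions the current specifier admits:

from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=0.23.6,<2")
print("1.8.4" in spec)  # True: a broken 1.x patch release is still inside the allowed range
print("2.0.0" in spec)  # False: only the next major version is excluded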

@kennknowles
Member

We have to choose. Either:

  1. We pin dependencies and we know that transitive dependencies cannot break users.
  2. We use loose SemVer ranges and know that transitive dependencies can break users.

In scenario 2 we can't really treat transitive dependency changes as a panic situation or release blocker, because the behavior is actually by design.

To be very clear: I like scenario 2 better and I think dependency pinning should wait until the final deployment of an application.

For test coverage: Even though we claim we are compatible with a semver range, we have not tested that whole range. In reality it just means the same thing as being pinned to the top of the range. We have no idea if Beam 2.51.0 actually works with lower versions of fastavro. It was never tested.
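A sketch of what pinning at final deployment could look like in practice (commands illustrative, not an official Beam recommendation):

# Resolve the loose ranges once, in a controlled environment:
pip install "apache-beam[gcp]==2.50.0"
# Capture the exact versions that resolved, including fastavro:
pip freeze > requirements-frozen.txt
# From then on, deploy only from the frozen file for reproducible installs:
pip install -r requirements-frozen.txt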

@Abacn
Contributor Author

Abacn commented Oct 10, 2023

Currently we publish a Docker image per release that contains fixed versions of dependencies, which partly mitigates the potential issue of choice 2. In this case, the same test does not break on Dataflow, which builds its Docker image via the :sdks:python:container:py3x:docker task. Other portable runners in DOCKER mode would not be impacted either (most production use cases). The test fails because the direct runner and the Flink runner (LOOPBACK mode) resolve Beam's dependencies from setup.py at test time.
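For completeness, a custom container can inherit those fixed versions by building on the released image; a minimal Dockerfile sketch (the image tag and extra package are illustrative):

FROM apache/beam_python3.10_sdk:2.50.0
# The base image already ships vetted versions of Beam's dependencies, including fastavro.
# --no-deps avoids upgrading the image's pinned dependencies; the package name is hypothetical.
RUN pip install --no-deps some-extra-package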
