[Bug]: Failure when reading non UTF-8 requirements.txt file #25498

@bvolpato

What happened?

This error caused some customer friction, and it took quite a while to figure out what was going wrong:

INFO 2023-02-06T20:16:45.090490Z File "/usr/local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 597, in __exit__
INFO 2023-02-06T20:16:45.090811Z self.result = self.run()
INFO 2023-02-06T20:16:45.090882Z File "/usr/local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 547, in run
INFO 2023-02-06T20:16:45.091172Z return Pipeline.from_runner_api(
INFO 2023-02-06T20:16:45.091238Z File "/usr/local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 574, in run
INFO 2023-02-06T20:16:45.091570Z return self.runner.run_pipeline(self, self._options)
INFO 2023-02-06T20:16:45.091641Z File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 482, in run_pipeline
INFO 2023-02-06T20:16:45.091903Z artifacts = environments.python_sdk_dependencies(options)
INFO 2023-02-06T20:16:45.091971Z File "/usr/local/lib/python3.9/site-packages/apache_beam/transforms/environments.py", line 799, in python_sdk_dependencies
INFO 2023-02-06T20:16:45.092300Z return stager.Stager.create_job_resources(
INFO 2023-02-06T20:16:45.092381Z File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/portability/stager.py", line 240, in create_job_resources
INFO 2023-02-06T20:16:45.092601Z (
INFO 2023-02-06T20:16:45.092662Z File "/usr/local/lib/python3.9/site-packages/apache_beam/utils/retry.py", line 275, in wrapper
INFO 2023-02-06T20:16:45.092879Z return fun(*args, **kwargs)
INFO 2023-02-06T20:16:45.092948Z File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/portability/stager.py", line 721, in _populate_requirements_cache
INFO 2023-02-06T20:16:45.093278Z tmp_requirements_filepath = Stager._remove_dependency_from_requirements(
INFO 2023-02-06T20:16:45.093336Z File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/portability/stager.py", line 674, in _remove_dependency_from_requirements
INFO 2023-02-06T20:16:45.093680Z lines = f.readlines()
INFO 2023-02-06T20:16:45.093742Z File "/usr/local/lib/python3.9/codecs.py", line 322, in decode
INFO 2023-02-06T20:16:45.095038Z (result, consumed) = self._buffer_decode(data, self.errors, final)
INFO 2023-02-06T20:16:45.095143Z UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
INFO 2023-02-06T20:16:45.643411Z python failed with exit status 1

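The sample file is encoded as UTF-16LE. The byte 0xff at position 0 in the error above is consistent with a UTF-16LE byte order mark (FF FE) at the start of the file.

[Image: screenshot of the sample file's encoding]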
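For reference, the failure can probably be reproduced outside Beam. The sketch below is an assumption about the mechanism, not the stager's actual code: it writes a small UTF-16LE file with a byte order mark and reads it back in text mode as UTF-8. The function name reproduce_decode_error and the file contents are made up for illustration.

```python
import codecs
import os
import tempfile


def reproduce_decode_error():
    """Return the decode error message raised when a UTF-16LE file is read as UTF-8."""
    # A UTF-16LE file as many Windows editors write it: BOM (FF FE) + payload.
    payload = codecs.BOM_UTF16_LE + "apache-beam\n".encode("utf-16-le")
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        # Assumption: the file is read in text mode with UTF-8 decoding,
        # which is the default on most Linux workers.
        with open(path, encoding="utf-8") as f:
            f.readlines()
    except UnicodeDecodeError as exc:
        return str(exc)
    finally:
        os.remove(path)
    return None


if __name__ == "__main__":
    print(reproduce_decode_error())
```

If the assumption holds, the message should name byte 0xff at position 0, like the last line of the traceback.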

Please consider making the reading and parsing of requirements.txt more lenient, maybe by using the ignore error handler (errors="ignore") as documented at https://docs.python.org/3/howto/unicode.html?
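One possible shape for a more lenient read is sketched below. It is only an illustration, not Beam's implementation: the helper name read_requirements_lines is hypothetical, and it assumes that a file with no byte order mark is UTF-8. It reads the raw bytes, picks a codec from the BOM if one is present, and decodes with that codec.

```python
import codecs
import os
import tempfile

# Checked in order. The UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the UTF-32 entries must come first.
_BOM_CODECS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def read_requirements_lines(path):
    """Hypothetical helper: decode a requirements file using its BOM, if any."""
    with open(path, "rb") as f:
        raw = f.read()
    encoding = "utf-8"  # assumption: files without a BOM are UTF-8
    for bom, codec_name in _BOM_CODECS:
        if raw.startswith(bom):
            encoding = codec_name  # these codecs consume the BOM themselves
            break
    return raw.decode(encoding).splitlines()


if __name__ == "__main__":
    # Demo with a UTF-16LE file like the one described in this issue.
    fd, demo_path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + "numpy\napache-beam\n".encode("utf-16-le"))
    try:
        print(read_requirements_lines(demo_path))
    finally:
        os.remove(demo_path)
```

One caveat on errors="ignore" by itself: for a UTF-16LE file it would skip the two BOM bytes, but the remaining zero bytes are valid UTF-8, so every character would be followed by a NUL and the requirement names would still not parse. Detecting the encoding, or failing with a message that names the file and the expected encoding, may serve users better.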

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
