When running TFX's Evaluator component (which uses tensorflow_model_analysis to create Beam jobs), the following error occurs in one of the workers, while all the other workers complete successfully:
Traceback:
Error message from worker: Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 287, in _execute
response = task()
File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 360, in
lambda: self.create_worker().do_instruction(request), request)
File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 596, in do_instruction
return getattr(self, request_type)(
File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
monitoring_infos = bundle_processor.monitoring_infos()
File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1139, in monitoring_infos
op.monitoring_infos(transform_id, dict(tag_to_pcollection_id)))
File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/operations.py", line 543, in monitoring_infos
all_monitoring_infos.update(self.user_monitoring_infos(transform_id))
File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/operations.py", line 584, in user_monitoring_infos
return self.metrics_container.to_runner_api_monitoring_infos(transform_id)
File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/execution.py", line 309, in to_runner_api_monitoring_infos
all_metrics = [
File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/execution.py", line 310, in
cell.to_runner_api_monitoring_info(key.metric_name, transform_id)
File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/cells.py", line 76, in to_runner_api_monitoring_info
mi = self.to_runner_api_monitoring_info_impl(name, transform_id)
File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/cells.py", line 150, in to_runner_api_monitoring_info_impl
return monitoring_infos.int64_user_counter(
File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/monitoring_infos.py", line 185, in int64_user_counter
return create_monitoring_info(
File "/usr/local/lib/python3.9/site-packages/apache_beam/metrics/monitoring_infos.py", line 302, in create_monitoring_info
return metrics_pb2.MonitoringInfo(
TypeError: 7006 has type numpy.int64, but expected one of: bytes
This error happens only occasionally, but frequently enough to break a production pipeline running on a recurring schedule. I would like to understand the root cause of this error to prevent issues in production.
The root of the problem here is that numpy.int64 is not an instance of int, so the numpy.int64 value isn't getting encoded before being dropped into the MonitoringInfo proto. That logic check is here:
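A quick way to see why such a check fails: on Python 3, NumPy's fixed-width integer types do not subclass the built-in int, so an isinstance(value, int) gate (a sketch of the kind of check involved, not Beam's exact code) rejects them even though the value is numerically an integer:

```python
import numpy as np

value = np.int64(7006)

# numpy.int64 is not a subclass of the built-in int on Python 3,
# so an isinstance-based gate skips the int encoding path for it.
print(isinstance(value, int))         # False
print(isinstance(value, np.integer))  # True

# The raw numpy scalar then reaches the proto layer unencoded, producing
# "TypeError: 7006 has type numpy.int64, but expected one of: bytes".
```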
It looks like, end to end, we expect only int values to be aggregated within a Beam metric, but we don't hit a hard check for that until we try to encode the metric into a proto. The inc() and dec() methods get a type hint through their default parameters, but update() doesn't enforce a type hint yet. Would you know if that is the method being used to update the metric here?
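A minimal sketch (hypothetical class, not Beam's actual counter implementation) of how a counter whose inc() implies int via its default argument, but whose update() performs no type check, can silently accept a numpy scalar:

```python
import numpy as np

class Counter:
    """Toy counter illustrating the gap: inc() hints at int through its
    default argument, but update() accepts whatever it is given."""

    def __init__(self):
        self.value = 0

    def inc(self, n=1):      # the default of 1 hints that n is an int
        self.update(n)

    def update(self, n):     # no type check: numpy.int64 flows through
        self.value += n

c = Counter()
c.inc()                      # value is still a Python int
c.update(np.int64(7005))     # value silently becomes numpy.int64
print(type(c.value))         # <class 'numpy.int64'>
```

Since int + numpy.int64 promotes to numpy.int64, a single bad update() call is enough to change the accumulated type, which then surfaces much later when the metric is serialized.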
It is possible that the update() calls are happening within Beam internals, but it sounds like somewhere in the application an incorrect type is passed as a counter value. It may be a bug in the TFMA codebase. Perhaps adding logs to print the offending counter name could help.
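Until the source of the numpy value is pinned down, one defensive workaround in application code is to coerce counter increments to a plain int before handing them to the metrics API, logging the offending counter name whenever the type is unexpected. This is a hypothetical helper, not part of Beam or TFMA:

```python
import logging
import numbers

def safe_counter_value(name, value):
    """Coerce a metric increment to a plain Python int, logging the
    offending counter name when the value has an unexpected type."""
    if type(value) is not int:
        logging.warning(
            "Counter %r received non-int value %r of type %s",
            name, value, type(value).__name__)
    # numbers.Integral covers numpy integer scalars, which register
    # with that ABC, so int(value) is a lossless conversion here.
    if isinstance(value, numbers.Integral):
        return int(value)
    raise TypeError(f"Counter {name!r} got non-integral value: {value!r}")

# Usage sketch: counter.update(safe_counter_value("num_instances", value))
```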
Related issue in TFMA: tensorflow/model-analysis#171
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components