Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-5501] Interactive Beam -- display issue: number of PTransform executed wrongly displayed #6418

Merged
merged 3 commits into from
Sep 25, 2018

Conversation

qinyeli
Copy link
Contributor

@qinyeli qinyeli commented Sep 17, 2018

The main focus of this PR is to fix the display issue -- the number of PTransform executed is wrongly displayed.
screen shot 2018-09-17 at 2 13 24 pm

"Executing 8 of 3 transforms" now becomes "Executing 3 of 3 transforms".


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
--- --- --- ---

Added read_cache_ids() and write_cache_ids() in pipeline ananlyzer.
This change unblocks the display to show PTransform actually executed.
This commit shortens some of the variable/function names in
PipelineAnalyzer.

* pcollection_id => pcoll_id
* transform_id => trans_id
* top_level => tl

The reason for this is because those long variable and function names
have been causing quite a pain in fixing linter errors. e.g.

    super_ultra_long_variable_name = some_class_instance.a_very_long_function_name()

becuase Python does not allow wrapping the line before or after '='
@qinyeli
Copy link
Contributor Author

qinyeli commented Sep 18, 2018

R: @pabloem

Hi, Pablo. Can you review this for me?

@pabloem
Copy link
Member

pabloem commented Sep 19, 2018

Added two small comments. LGTM.

pipeline_proto: (Pipeline proto)
caches_used: (set of str) A set of PCollection IDs of those whose cached
results are used in the execution.
pipeline_analyzer: (PipelineAnalyzer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document what this is in a little bit more detail.


# _pcollection_stats maps pcoll_id to
# { 'cache_label': cache_label, version': version, 'sample': pcoll_in_list }
self._pcollection_stats = {}
for pcoll_id in pipeline_info.all_pcollections():
for pcoll_id in self._analyzer.tl_referenced_pcoll_ids():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we go from checking all PCs to checking only top-level-referenced ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PCollections are filtered around line 140 by referenced_pcollection, which contains only top_level ones.
The non-top level ones were added to _pcollection_stats but not used anywhere.

This commit is targetting the display issue:
the number of (PTransform actually executed) is wrongly displayed.
See: display_manager.py: Executin %s of %s transforms.

Other changes:
* Interface of display_manager. Instead of taking a bunch of parameters,
it now queries the variables from pipeline_analyzer.
* A couple of comment changes in pipeline_graph and
interactive_pipeline_graph
@pabloem
Copy link
Member

pabloem commented Sep 20, 2018

Thank you Sindy. It seems that the Beam community expects merged PRs to have a JIRA associated to them. Sorry about the trouble, but could you add one?

@qinyeli qinyeli changed the title Interactive Beam -- display issue: number of PTransform executed wrongly displayed [BEAM-5501] Interactive Beam -- display issue: number of PTransform executed wrongly displayed Sep 25, 2018
@qinyeli
Copy link
Contributor Author

qinyeli commented Sep 25, 2018

Created the corresponding JIRA issue: https://issues.apache.org/jira/browse/BEAM-5501

@pabloem pabloem merged commit 0816198 into apache:master Sep 25, 2018
@udim
Copy link
Member

udim commented Sep 25, 2018

This PR broke Python precommits (task :beam-sdks-python:lintPy27_3).
#6489 was merged just before this PR.

@tvalentyn
Copy link
Contributor

16:01:15 ************* Module apache_beam.runners.interactive.pipeline_analyzer
16:01:15 W:171,11: dict.keys referenced when not iterating (dict-keys-not-iterating)

"""
return self._top_level_required_transforms
return self._top_level_required_transforms.keys()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should return list(self._top_level_required_transforms) to avoid the lint complaint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also same change maybe needed in line 124 for Python 3 compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to do the change now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@qinyeli qinyeli deleted the display_bug branch September 27, 2018 05:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants