Skip to content

Conversation

@h4rr21
Copy link

@h4rr21 h4rr21 commented Aug 16, 2018

Check the result should be a list Type


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
--- --- --- ---

@pabloem pabloem self-requested a review August 16, 2018 23:43
@h4rr21 h4rr21 changed the title Raise a Type Value error when result is not a list Raise a Type Value error when result is not iterable Aug 17, 2018
@boyuanzz
Copy link
Contributor

Hey @h4rr21 , could you please take care of this python precommit failures?

@robertwb
Copy link
Contributor

Thanks for looking at this.

One thing we have to be very careful of here is that this code runs for every element in at every step for every pipeline, so it's quite performance critical. Running the benchmarks at https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/map_fn_microbenchmark.py (you might need #6293) I see

Before

Fixed cost   0.6794422043757005
Per-element  7.113301970741965e-07
R^2          0.9817527457194359

After

Fixed cost   0.6880206489042804
Per-element  7.551892020485618e-07
R^2          0.9807901397710658

which is about a 5% regression. This may be worse if iter(results) is expensive.

Perhaps change the string test to "type(results) is str" and omit the iter(...) test as it already gives a more informative TypeError: 'T' object is not iterable in that case.

@h4rr21
Copy link
Author

h4rr21 commented Aug 29, 2018

@boyuanzz I was trying to open the details, but this url :

https://builds.apache.org/job/beam_PreCommit_Python_Commit/888/

doesn't work for me

@h4rr21
Copy link
Author

h4rr21 commented Aug 29, 2018

Hello @robertwb thanks for pointing this microbenchmark.

I'm trying to validate "results" as any itererable python object, I also now there is still under discussion if "string" should be a valid output...

so we just be looking at __iter__ attribute instead of "iter(results)

# validate results is not iterable or string
    if not hasattr(results,"__iter__")
        raise TypeError("This is not an iterable")
   

this shouldn't impact the performance

@robertwb
Copy link
Contributor

At this level, even __hasattr__ can have a performance impact. But given a type error is raised for non-iterables on the next line, there's no reason to add an extra test here.

@stale
Copy link

stale bot commented Nov 24, 2018

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Nov 24, 2018
@stale
Copy link

stale bot commented Dec 1, 2018

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Dec 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants