Make temp_location an attribute of the StandardOptions class #8801
3d407ac to
63cf8ae
Compare
|
@aaltay @udim I updated the PR to include both I would think that most people would continue to use |
|
Another issue with this change is that moving an option out of its current class is backward incompatible. This will break user pipelines that set these options in their code. Unless there is a workaround, we may want to delay this until Beam 3.0. @ostrokach are you blocked on this change? |
|
Is it possible to have this option settable/gettable from both places? In
the end, they're just wrappers around a common dictionary.
…On Fri, Aug 2, 2019 at 2:36 AM Ahmet Altay ***@***.***> wrote:
Another issues with this change is that, it is backward incompatible to
move an option from its current class. This will break user pipelines that
are setting these options in their code.
Unless there is a workaround we may want to delay this until beam 3.0.
@ostrokach <https://github.com/ostrokach> are you blocked on this change?
|
|
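@aaltay I am not blocked on this change; I just figured that it is strange to use `GoogleCloudOptions` for things that might not be GCloud-related, like cache location. But there are ways around it, like having a separate `--cache_location` argument, for example. @robertwb The `PipelineOptions` class collects the arguments defined by every subclass into a single `_BeamArgumentParser`, which is basically just an `argparse.ArgumentParser`. We could overwrite the `add_argument` method of `_BeamArgumentParser` in order to skip duplicate arguments, but there does not seem to be a clean API for doing this without relying on "private" attributes, and there are edge cases where the type or the destination of the new argument may differ from that of the old one, and those cases may end up being hard to debug. |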
|
We could also just add a property to GoogleCloudOptions that gets/sets the
value in the underlying dict (possibly with a deprecation warning) for
backwards compatibility. It would only be parsed by StandardOptions.
…On Fri, Aug 2, 2019 at 6:23 PM Alexey Strokach ***@***.***> wrote:
@aaltay <https://github.com/aaltay> I am not blocked on this change; I
just figured that it is strange to use one GoogleCloudOptions for things
that might not be GCloud-related, like cache location. But there are things
around it, like having a separate --cache_location argument for example.
@robertwb <https://github.com/robertwb> The PipelineOptions class collects
the arguments defined by every subclass
<https://github.com/apache/beam/blob/7fe54a0e178acf6957a797ac17edc4ec74e4bd42/sdks/python/apache_beam/options/pipeline_options.py#L179>
into a single _BeamArgumentParser, which is basically just an
argparse.ArgumentParser. We could over-write the add_argument
<https://github.com/python/cpython/blob/3.7/Lib/argparse.py#L1322> method
of _BeamArgumentParser in order to skip duplicate arguments, but there
does not seem to be a clean API for doing this without relying on "private"
attributes, and then there are edge cases where maybe the type or the
destination of the new argument is different from that of the old, and
those cases may end up being hard to debug.
|
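For reference, the duplicate-argument problem described in the quoted comment can be reproduced with plain `argparse`. This is only a minimal sketch: it uses a bare `argparse.ArgumentParser` rather than Beam's `_BeamArgumentParser`, and the two `add_argument` calls merely stand in for two option classes registering the same flag on one shared parser.

```python
import argparse

parser = argparse.ArgumentParser()

# First definition, standing in for a StandardOptions-like class.
parser.add_argument("--temp_location", default=None)

# Second definition of the same flag, standing in for a
# GoogleCloudOptions-like class registering it on the shared parser.
try:
    parser.add_argument("--temp_location", default=None)
    conflict = None
except argparse.ArgumentError as exc:
    # With the default conflict handler, argparse rejects the duplicate.
    conflict = str(exc)

print(conflict)
```

`argparse` does accept `conflict_handler="resolve"`, but that makes the later definition replace the earlier one rather than skipping the duplicate, so it does not by itself address the type and destination edge cases mentioned above.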
|
@robertwb's suggestion sounds good. We can change it so that it will be StandardOptions.temp_location. Also, how do these options, their defaults, and which ones are required work in the Java SDK? |
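A rough sketch of the backwards-compatible property idea discussed above, assuming plain Python classes over a shared dict. The `Toy*` classes are hypothetical stand-ins and do not reproduce Beam's `PipelineOptions` machinery; the warning text is also illustrative.

```python
import warnings

_DEPRECATION_MESSAGE = (
    "ToyGoogleCloudOptions.temp_location is deprecated; "
    "use ToyStandardOptions.temp_location instead."
)


class ToyOptions:
    """Hypothetical stand-in for an options view over a shared dict."""

    def __init__(self, shared):
        # Every view receives the same dict object.
        self._shared = shared


class ToyStandardOptions(ToyOptions):
    @property
    def temp_location(self):
        return self._shared.get("temp_location")

    @temp_location.setter
    def temp_location(self, value):
        self._shared["temp_location"] = value


class ToyGoogleCloudOptions(ToyOptions):
    @property
    def temp_location(self):
        # Old access path: still works, but warns.
        warnings.warn(_DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=2)
        return self._shared.get("temp_location")

    @temp_location.setter
    def temp_location(self, value):
        warnings.warn(_DEPRECATION_MESSAGE, DeprecationWarning, stacklevel=2)
        self._shared["temp_location"] = value


shared = {}
standard = ToyStandardOptions(shared)
cloud = ToyGoogleCloudOptions(shared)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cloud.temp_location = "/tmp/beam"  # old spelling, emits a warning
    value_via_standard = standard.temp_location  # new spelling, no warning
```

Both views read and write the same key, so existing code that sets the option through the old class keeps working while being nudged toward the new one.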
f5a4a9d to
a318de2
Compare
a318de2 to
7eea599
Compare
|
@aaltay @robertwb I updated the PR so that:
|
robertwb
left a comment
Sorry for not getting back to this. The change looks good, just two suggestions.
since='2.16.0',
custom_message=(
    'GoogleCloudOptions.temp_location is deprecated since %since%. '
    'Use GoogleCloudOptions.gcp_temp_location instead.'))
Possibly also suggest StandardOptions.temp_location?
I updated the message to read:
'GoogleCloudOptions.temp_location is deprecated since %since%. '
'Please use StandardOptions.temp_location or '
'GoogleCloudOptions.gcp_temp_location, as appropriate.'))
Does that sound better?
choices=['COST_OPTIMIZED', 'SPEED_OPTIMIZED'],
help='Set the Flexible Resource Scheduling mode')

def __getattr__(self, name):
Suggestion: consider using @property here.
That was my first thought as well, but I couldn't figure out how to get it to work.
The problem is that attribute access of the GoogleCloudOptions class ends up being performed using PipelineOptions.__getattr__ and PipelineOptions.__setattr__, and those only check if the argument is set in the (child) class's _add_argparse_args(cls, parser) method, not if the (child) class actually has that attribute.
Since we can't have both StandardOptions._add_argparse_args and GoogleCloudOptions._add_argparse_args define the temp_location argument, I ended up patching the GoogleCloudOptions.__getattr__ and GoogleCloudOptions.__setattr__ methods to add an additional check for the temp_location attribute (and, if not, default to the parent's __getattr__ / __setattr__).
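To illustrate the workaround described above, here is a toy model, not Beam's actual code. It assumes a base class whose `__getattr__` and `__setattr__` only honour names from a declared option list (`_visible` is a hypothetical stand-in for the arguments registered in `_add_argparse_args`). In plain Python, a `__setattr__` override that never delegates to `object.__setattr__` bypasses property setters, which is consistent with the difficulty mentioned here.

```python
import warnings


class ToyOptions:
    """Hypothetical stand-in for a PipelineOptions-like base class."""

    _visible = ()  # option names this view is allowed to access

    def __init__(self, store):
        # Bypass our own __setattr__ while wiring up the shared dict.
        object.__setattr__(self, "_store", store)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in self._visible:
            return self._store[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Called for every assignment; only declared options are accepted.
        if name in self._visible:
            self._store[name] = value
        else:
            raise AttributeError(name)


class ToyStandardOptions(ToyOptions):
    _visible = ("temp_location",)


class ToyCloudOptions(ToyOptions):
    _visible = ("gcp_temp_location",)

    def __getattr__(self, name):
        if name == "temp_location":  # extra check for the moved option
            warnings.warn("temp_location moved", DeprecationWarning, stacklevel=2)
            return self._store["temp_location"]
        return super().__getattr__(name)  # otherwise defer to the parent

    def __setattr__(self, name, value):
        if name == "temp_location":
            warnings.warn("temp_location moved", DeprecationWarning, stacklevel=2)
            self._store["temp_location"] = value
        else:
            super().__setattr__(name, value)


store = {"temp_location": "/tmp/a", "gcp_temp_location": "gs://bucket/tmp"}
cloud = ToyCloudOptions(store)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cloud.temp_location = "/tmp/x"  # handled by the patched __setattr__
    read_back = cloud.temp_location  # handled by the patched __getattr__
```

The option is declared only once (on the standard view), yet the old class still reads and writes it through the shared store.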
92e71ee to
548d546
Compare
|
@ostrokach Can you rebase this on the latest master? |
`temp_location` can potentially be a relevant argument for all runners. For example, if the DirectRunner were to perform out-of-core shuffles or combines, it would need a place to store temporary files. Spark also distinguishes between `SPARK_LOCAL_DIR` and `SPARK_WORKER_DIR` environment variables.
548d546 to
e9ee05b
Compare
|
retest this please |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Currently, the `--temp_location` argument is parsed by the `GoogleCloudOptions` class. However, `--temp_location` or a similar argument may be useful for other runners as well.

For example, in the case of `DirectRunner`, `temp_location` may be the place where the runner stores its temporary files, if it ever supports out of process shuffles, combines, etc. In the case of `InteractiveRunner`, `temp_location` may be a good default location for the runner to store its cache files. In the case of the `WriteToFiles` transform, setting `temp_location` to `p.options.view_as(GoogleCloudOptions).temp_location` makes it appear as if `temp_location` has to be on GCS, which from my understanding, is not the case unless we are using the `DataflowRunner`?

CC: @ivant @ananvay