[MongoDBio] custom writeFn for mongodb io python sdk #22400
|
Assigning reviewers. If you would like to opt out of this review, comment R: @AnandInguva for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
retest this please |
|
Please correct the lint, formatter and other checks as well |
Codecov Report
@@ Coverage Diff @@
## master #22400 +/- ##
==========================================
+ Coverage 65.20% 65.54% +0.33%
==========================================
Files 735 717 -18
Lines 98146 95041 -3105
==========================================
- Hits 64000 62297 -1703
+ Misses 32782 31476 -1306
+ Partials 1364 1268 -96
... and 101 files with indirect coverage changes |
|
Thanks for the help with this @AnandInguva |
Hey @AnandInguva, I can't find the requirements for the docs, format, and lint checks (the formatter/linter configurations). The build console log doesn't have enough output for debugging. Could you point me to those requirements, or suggest the changes I should make to pass the docs/formatter/lint builds? |
|
Hey @NagisaVon, you can just run these two commands in your sdks/python directory: (you would need to install tox if you don't have it: |
Hi @ahmedabu98, I followed the guide to set up a local development environment and tried to run lint, but encountered the following error:
python git:(mongodb-custom-write) ✗ sudo ../../gradlew lint
Password:
Configuration on demand is an incubating feature.
> Task :sdks:python:test-suites:tox:py37:lintPy37
GLOB sdist-make: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/setup.py
py37-lint recreate: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/py37-lint
py37-lint installdeps: -rbuild-requirements.txt, astroid<2.9,>=2.8.0, pycodestyle==2.8.0, pylint==2.11.1, isort==4.2.15, flake8==4.0.1
py37-lint inst: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/.tmp/package/1/apache-beam-2.41.0.dev0.zip
ERROR: invocation failed (exit code 1), logfile: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/py37-lint/log/py37-lint-2.log
================================== log start ===================================
WARNING: The directory '/Users/von/Library/Caches/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Processing ./target/.tox-py37-lint/.tmp/package/1/apache-beam-2.41.0.dev0.zip
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
/private/tmp/pip-req-build-7oirj20h/setup.py:95: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if StrictVersion(_PIP_VERSION) < StrictVersion(REQUIRED_PIP_VERSION):
Traceback (most recent call last):
File "<string>", line 36, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/private/tmp/pip-req-build-7oirj20h/setup.py", line 164, in <module>
generate_protos_first()
File "/private/tmp/pip-req-build-7oirj20h/setup.py", line 134, in generate_protos_first
gen_protos.generate_proto_files()
File "/private/tmp/pip-req-build-7oirj20h/gen_protos.py", line 476, in generate_proto_files
raise RuntimeError(error_msg)
RuntimeError: Not in apache git tree, unable to find proto definitions.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
=================================== log end ====================================
___________________________________ summary ____________________________________
ERROR: py37-lint: InvocationError for command /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/py37-lint/bin/python target/.tox-py37-lint/py37-lint/bin/pip install --retries 10 --exists-action w '/Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/.tmp/package/1/apache-beam-2.41.0.dev0.zip[test,dataframe]' (exited with code 1)
> Task :sdks:python:test-suites:tox:py37:lintPy37 FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py37:lintPy37'.
> Process 'command 'sh'' finished with non-zero exit value 1
Can you help me with this? Thanks! |
|
@AnandInguva have you seen that error before? |
|
Reminder, please take a look at this pr: @AnandInguva @ahmedabu98 |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @tvalentyn for label python. Available commands:
|
|
retest this please |
triggered retesting to make sure this is not transient |
|
All tests passed and the docstring is updated; ready for review @AnandInguva @tvalentyn |
Abacn
left a comment
Thanks for the contribution. The change is fine to me. I am wondering if we could also provide a WriteFunc for update that is ready-to-use as this use case is implied in the documentation change.
Generally we're trying to make the work consistent among SDKs. In Java SDK MongoDBIO insert/replace and update are distinguished by different configurations, as documented here: https://beam.apache.org/releases/javadoc/2.41.0/org/apache/beam/sdk/io/mongodb/MongoDbIO.html
The Python SDK currently supports only insert/replace and does not support update. Providing an update mode would also help close this gap.
self.writeFn = self._defaultWriteFn

@staticmethod
def _defaultWriteFn(client, db, coll, documents, logger):
We can name this as _replaceOneWriteFunc, and also provide a _updateOneWriteFunc
coll=None,
batch_size=100,
extra_client_params=None,
writeFn=None,
Generally in Beam, xxxFn implies a DoFn. This is just a callable used in the write. Consider renaming it to 'write_func' or 'write_callback' to avoid ambiguity and to be in line with Python nomenclature.
coll=None,
batch_size=100,
extra_params=None,
writeFn=None):
We can define the type hint of the write_func argument as Optional[Union[str, Callable]]. If it is None or "ReplaceOne", set write_func = _ReplaceOneWriteFunc; if it is "UpdateOne", set write_func = _UpdateOneWriteFunc, and document this behavior.
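A minimal sketch of the dispatch described in this comment, assuming the write callable keeps the `(client, db, coll, documents, logger)` parameter order of `_defaultWriteFn` from the diff and that every document carries an `_id`. The names `replace_one_write_func`, `update_one_write_func`, and `resolve_write_func` are hypothetical and not part of the Beam API; the only pymongo calls relied on are `Collection.replace_one` and `Collection.update_one`.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Union

# Hypothetical alias: (client, db, coll, documents, logger) -> None,
# mirroring the parameter order of _defaultWriteFn in the diff.
WriteFunc = Callable[[Any, str, str, Iterable[Dict[str, Any]], Any], None]


def replace_one_write_func(client, db, coll, documents, logger):
    """Overwrite the whole document matched by _id, inserting it if absent."""
    for doc in documents:
        client[db][coll].replace_one({"_id": doc["_id"]}, doc, upsert=True)


def update_one_write_func(client, db, coll, documents, logger):
    """Set only the supplied fields on the document matched by _id."""
    for doc in documents:
        fields = {k: v for k, v in doc.items() if k != "_id"}
        if not fields:
            # Skip documents with no fields besides _id: older MongoDB
            # servers reject an empty $set.
            continue
        client[db][coll].update_one(
            {"_id": doc["_id"]}, {"$set": fields}, upsert=True)


# String shortcuts accepted by the resolver (names taken from the comment).
_WRITE_FUNCS: Dict[str, WriteFunc] = {
    "ReplaceOne": replace_one_write_func,
    "UpdateOne": update_one_write_func,
}


def resolve_write_func(
        write_func: Optional[Union[str, Callable]] = None) -> Callable:
    """Map None, a known string, or a callable to the function to use."""
    if write_func is None:
        return replace_one_write_func  # keeps the previous behavior
    if isinstance(write_func, str):
        if write_func not in _WRITE_FUNCS:
            raise ValueError(
                "Unknown write_func %r; expected one of %s" %
                (write_func, sorted(_WRITE_FUNCS)))
        return _WRITE_FUNCS[write_func]
    if callable(write_func):
        return write_func
    raise TypeError("write_func must be None, a str, or a callable")
```

Resolving once in the constructor would keep the per-batch write path free of type checks, and an unknown string fails early rather than at write time.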
Thanks, those are great suggestions. I will take a look at the Java SDK and work on providing a default replace write function. |
|
waiting on author |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
|
@NagisaVon Any update on this? |
|
can someone point me to the test for this file so that I could add some new test cases? |
|
I'm getting this error when building the docs, which points to this line of the docstring here: what would be the correct type hint in the docstring? |
|
@NagisaVon there's mongodbio_test.py for unit tests and mongodbio_it_test.py for integration tests; both are in the same module as this file |
|
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Adds an optional argument, writeFn, to the __init__ of the WriteToMongoDB class.
Users can supply a custom function to change how WriteToMongoDB writes, for example by using UpdateOne instead of ReplaceOne. A default function is provided, so WriteToMongoDB behaves exactly as before when the optional function is not supplied.
Here is the type signature for the custom writeFn:
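The signature itself did not survive in this description. As a hedged illustration only, the sketch below assumes the custom function takes the same arguments as `_defaultWriteFn(client, db, coll, documents, logger)` shown in the review diff, and that every document carries an `_id`. The name `update_one_write_fn` and the integer return value are hypothetical; `Collection.update_one` is the pymongo call it relies on.

```python
import logging
from typing import Any, Dict, Iterable


def update_one_write_fn(
        client: Any,
        db: str,
        coll: str,
        documents: Iterable[Dict[str, Any]],
        logger: logging.Logger) -> int:
    """Update matched documents field by field instead of replacing them.

    Returns the number of update_one calls issued (an illustrative choice,
    not something the pull request specifies).
    """
    collection = client[db][coll]
    issued = 0
    for doc in documents:
        # Everything except _id becomes a field to set on the stored document.
        fields = {k: v for k, v in doc.items() if k != "_id"}
        if not fields:
            # Nothing to update; older MongoDB servers reject an empty $set.
            continue
        collection.update_one(
            {"_id": doc["_id"]}, {"$set": fields}, upsert=True)
        issued += 1
    logger.debug(
        "Issued %d update_one calls against %s.%s", issued, db, coll)
    return issued
```

With the argument proposed in this pull request, such a function would be passed as `writeFn=update_one_write_fn` alongside the existing `uri`, `db`, and `coll` arguments. Fields absent from an incoming document are left untouched in the stored one, which is the difference from the ReplaceOne default.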