Skip to content

Conversation

@NagisaVon
Copy link

@NagisaVon NagisaVon commented Jul 21, 2022

Added a optional argument to init for the class WriteToMongoDB,
user could write a custom function to modify the behavior of WriteToMongoDB, such as using UpdateOne instead of ReplaceOne, a default function is provided so that the WriteToMongoDB class works exactly the same as before when the optional function is not provided.

here's the type signature for the custom writeFn

writeFn(optional_func): writeFn(client:MongoClient obj, db:str, coll:str, documents:[dict], logger:logging.Logger obj)
        A custom function that user could implement to 
        gain more control over the write process. For example, 
        using UpdateOne or InsertOne instead of ReplaceOne bulk_write

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@NagisaVon NagisaVon changed the title [MONGODBIO] custom writeFn for mongodb io [MongoDBio] custom writeFn for mongodb io Jul 21, 2022
@NagisaVon NagisaVon changed the title [MongoDBio] custom writeFn for mongodb io [MongoDBio] custom writeFn for mongodb io python sdk Jul 21, 2022
@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @AnandInguva for label python.
R: @ahmedabu98 for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@AnandInguva
Copy link
Contributor

retest this please

@AnandInguva
Copy link
Contributor

Please correct the lint, formatter and other checks as well

@codecov
Copy link

codecov bot commented Jul 26, 2022

Codecov Report

Merging #22400 (e12f3a9) into master (6de4d5f) will increase coverage by 0.33%.
The diff coverage is 23.07%.

@@            Coverage Diff             @@
##           master   #22400      +/-   ##
==========================================
+ Coverage   65.20%   65.54%   +0.33%     
==========================================
  Files         735      717      -18     
  Lines       98146    95041    -3105     
==========================================
- Hits        64000    62297    -1703     
+ Misses      32782    31476    -1306     
+ Partials     1364     1268      -96     
Flag Coverage Δ
python 71.19% <23.07%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/io/mongodbio.py 25.87% <23.07%> (-0.20%) ⬇️

... and 101 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Ouwen
Copy link

Ouwen commented Jul 27, 2022

Thanks for the help with this @AnandInguva

@NagisaVon
Copy link
Author

Please correct the lint, formatter and other checks as well

Hey @AnandInguva I can't find the requirement for the docs, format, and lint(configurations for formatter/linter). The build console log doesn't have enough output for debugging. Could you point me to the requirement for that? Or any suggestion on the changes I should make to pass the docs/formatter/lint build?

@ahmedabu98
Copy link
Contributor

Hey @NagisaVon, you can just run these two commands in your sdks/python directory:

../../gradlew lint       # Runs several linter checks
tox -e py3-yapf    # Runs code formatting

(you would need to install tox if you don't have it: pip install tox)
More information here: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks

@NagisaVon
Copy link
Author

Hey @NagisaVon, you can just run these two commands in your sdks/python directory:

../../gradlew lint       # Runs several linter checks
tox -e py3-yapf    # Runs code formatting

(you would need to install tox if you don't have it: pip install tox) More information here: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks

Hi @ahmedabu98, I've followed the guide to setup local develop env and was trying to run lint, but encountered the following error:

python git:(mongodb-custom-write) ✗ sudo ../../gradlew lint                      
Password:
Configuration on demand is an incubating feature.

> Task :sdks:python:test-suites:tox:py37:lintPy37
GLOB sdist-make: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/setup.py
py37-lint recreate: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/py37-lint
py37-lint installdeps: -rbuild-requirements.txt, astroid<2.9,>=2.8.0, pycodestyle==2.8.0, pylint==2.11.1, isort==4.2.15, flake8==4.0.1
py37-lint inst: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/.tmp/package/1/apache-beam-2.41.0.dev0.zip
ERROR: invocation failed (exit code 1), logfile: /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/py37-lint/log/py37-lint-2.log
================================== log start ===================================
WARNING: The directory '/Users/von/Library/Caches/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Processing ./target/.tox-py37-lint/.tmp/package/1/apache-beam-2.41.0.dev0.zip
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      /private/tmp/pip-req-build-7oirj20h/setup.py:95: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
        if StrictVersion(_PIP_VERSION) < StrictVersion(REQUIRED_PIP_VERSION):
      Traceback (most recent call last):
        File "<string>", line 36, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/tmp/pip-req-build-7oirj20h/setup.py", line 164, in <module>
          generate_protos_first()
        File "/private/tmp/pip-req-build-7oirj20h/setup.py", line 134, in generate_protos_first
          gen_protos.generate_proto_files()
        File "/private/tmp/pip-req-build-7oirj20h/gen_protos.py", line 476, in generate_proto_files
          raise RuntimeError(error_msg)
      RuntimeError: Not in apache git tree, unable to find proto definitions.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

=================================== log end ====================================
___________________________________ summary ____________________________________
ERROR:   py37-lint: InvocationError for command /Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/py37-lint/bin/python target/.tox-py37-lint/py37-lint/bin/pip install --retries 10 --exists-action w '/Users/von/Documents/GradientHealth/Code/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-lint/.tmp/package/1/apache-beam-2.41.0.dev0.zip[test,dataframe]' (exited with code 1)

> Task :sdks:python:test-suites:tox:py37:lintPy37 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py37:lintPy37'.
> Process 'command 'sh'' finished with non-zero exit value 1

Can you help me with this? Thanks!

@ahmedabu98
Copy link
Contributor

@AnandInguva have you seen that error before?

@github-actions
Copy link
Contributor

Reminder, please take a look at this pr: @AnandInguva @ahmedabu98

@github-actions
Copy link
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @tvalentyn for label python.
R: @pabloem for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

@tvalentyn
Copy link
Contributor

retest this please

@tvalentyn
Copy link
Contributor

@AnandInguva have you seen that error before?

triggered retesting to make sure this is not transient

@NagisaVon NagisaVon requested review from AnandInguva and removed request for johnjcasey September 29, 2022 19:47
@NagisaVon
Copy link
Author

All test passed and docstring updated, ready for review @AnandInguva @tvalentyn

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. The change is fine to me. I am wondering if we could also provide a WriteFunc for update that is ready-to-use as this use case is implied in the documentation change.

Generally we're trying to make the work consistent among SDKs. In Java SDK MongoDBIO insert/replace and update are distinguished by different configurations, as documented here: https://beam.apache.org/releases/javadoc/2.41.0/org/apache/beam/sdk/io/mongodb/MongoDbIO.html
While Python SDK now only supports insert/replace (and does not support update). Providing update mode could also help to converge this gap.

self.writeFn = self._defaultWriteFn

@staticmethod
def _defaultWriteFn(client, db, coll, documents, logger):
Copy link
Contributor

@Abacn Abacn Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can name this as _replaceOneWriteFunc, and also provide a _updateOneWriteFunc

coll=None,
batch_size=100,
extra_client_params=None,
writeFn=None,
Copy link
Contributor

@Abacn Abacn Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally in Beam xxxFn implies a DoFn. This is just a callable used in write. Consider renaming it to 'write_func' or 'write_callback' to avoid ambiguity and in_line_with_python_nomenclature

coll=None,
batch_size=100,
extra_params=None,
writeFn=None):
Copy link
Contributor

@Abacn Abacn Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can define the typehint of write_func argument as Optional[Union[str, Callable]]. If it is None or "ReplaceOne" then set write_func= _ReplaceOneWriteFunc; if it is "UpdateOne" then set write_func= _UpdateOneWriteFunc, and document this behavior.

@NagisaVon
Copy link
Author

Thanks for the contribution. The change is fine to me. I am wondering if we could also provide a WriteFunc for update that is ready-to-use as this use case is implied in the documentation change.

Generally we're trying to make the work consistent among SDKs. In Java SDK MongoDBIO insert/replace and update are distinguished by different configurations, as documented here: https://beam.apache.org/releases/javadoc/2.41.0/org/apache/beam/sdk/io/mongodb/MongoDbIO.html While Python SDK now only supports insert/replace (and does not support update). Providing update mode could also help to converge this gap.

Thanks, those are great suggestions. I will take a look at the Java SDK and work on providing a default replace write function.

@TheNeuralBit
Copy link
Member

waiting on author

@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2023

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 3, 2023
@AnandInguva
Copy link
Contributor

@NagisaVon Any update on this?

@github-actions github-actions bot removed the stale label Jan 4, 2023
@NagisaVon
Copy link
Author

can someone point me to the test for this file so that I could add some new test cases?
@AnandInguva @tvalentyn @TheNeuralBit

@NagisaVon
Copy link
Author

NagisaVon commented Jan 11, 2023

getting this error when building docs

...  apache_beam.io.mongodbio.WriteToMongoDB:: WARNING: py:class reference target not found: Optional
...  apache_beam.io.mongodbio.WriteToMongoDB:: WARNING: py:class reference target not found: Union
...  apache_beam.io.mongodbio.WriteToMongoDB:: WARNING: py:class reference target not found: Callable

which point to this line of docstring here:
write_func(Optional[Union[str, Callable]]):

what would be a correct type hint in the docstring ?

@ahmedabu98
Copy link
Contributor

@NagisaVon theres mongodbio_test.py for unit tests and mongodbio_it_test.py for integration tests, both are in the same module as this file

@github-actions
Copy link
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added stale and removed stale labels Mar 14, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jun 8, 2023

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 8, 2023
@github-actions
Copy link
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants