
added intra and inter op parallelism parameters to the huggingface … #1081

Merged: 43 commits merged into SeldonIO:master on Apr 20, 2023

Conversation

saeid93 (Contributor) commented Apr 8, 2023

Fixes #961
As there is no universal solution for setting these two variables, I think exposing them to the user is the best option for now. For containerized environments, as I showed in the experiments, the number of cores is the best heuristic, but that is not the general case, since MLServer can also be deployed on non-containerized platforms.
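
For readers unfamiliar with the two variables in question, here is a minimal illustrative sketch (not the code added by this PR) of setting intra-op and inter-op parallelism in PyTorch and TensorFlow, using the number of visible cores as the heuristic described above:

    # Illustrative only; not the implementation added by this PR.
    import os

    import tensorflow as tf
    import torch

    # Number of cores visible to this process (Linux-only call); a natural
    # default for the containerized case discussed above.
    num_cores = len(os.sched_getaffinity(0))

    # PyTorch
    torch.set_num_threads(num_cores)          # intra-op parallelism
    torch.set_num_interop_threads(num_cores)  # inter-op parallelism

    # TensorFlow (must be configured before any TF ops run)
    tf.config.threading.set_intra_op_parallelism_threads(num_cores)
    tf.config.threading.set_inter_op_parallelism_threads(num_cores)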

saeid93 (Contributor Author) commented Apr 8, 2023

@adriangonz could you please have a look at the failed test and let me know what you think the problem might be? I suspect it is related to having TensorFlow and Torch together in the requirements, but I couldn't figure it out.

adriangonz (Contributor) commented:

Hey @saeid93,

It seems there's something going on with the packages. It's unclear whether it's been triggered by the changes in requirements/dev.txt though. You can try replicating it locally with tox -e huggingface.

https://github.com/SeldonIO/MLServer/actions/runs/4647262874/jobs/8224160766?pr=1081#step:6:338

@@ -36,6 +36,7 @@ def _load_description() -> str:
     install_requires=[
         "mlserver",
         "optimum[onnxruntime]>=1.4.0, <1.8.0",
+        "tensorflow==2.12.0",
adriangonz (Contributor):

I wouldn't pin it here, as that will cause incompatibilities with other runtimes which are not yet using TF 2.12.

Do you know why we need to specify it explicitly? I thought optimum already depended on tensorflow.

saeid93 (Contributor Author):

No, it is not part of optimum[onnxruntime] (here is the link). It is part of some other installation options of Optimum (link), but not the optimum[onnxruntime] used in MLServer. I spent some time yesterday figuring out the problem and tried testing it locally. The issue is that, for some as-yet-unknown reason, having TensorFlow here or in requirements/dev.txt causes pip and tox to get stuck backtracking during version resolution. I've tried a number of changes to the dependency versions and still no luck. I think we can do either of the following:

  1. Drop TensorFlow support for this feature and add a warning message for users who request it with TensorFlow (most HF models are PyTorch these days anyway); see the sketch after this list.
  2. Keep working on the TF scenario if you think both frameworks should be supported and partial support for only one of them is not acceptable.
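
As a rough illustration of option 1 (all names below are hypothetical, not the code in this PR), the runtime could detect a TensorFlow request, warn, and fall back to PyTorch:

    # Hypothetical sketch of option 1; function and framework identifiers are
    # illustrative, not MLServer's actual API.
    import importlib.util
    import logging

    logger = logging.getLogger(__name__)

    def resolve_framework(requested: str) -> str:
        # Warn and fall back when TensorFlow is requested but not installed.
        if requested == "tf" and importlib.util.find_spec("tensorflow") is None:
            logger.warning(
                "TensorFlow support is not available in this environment; "
                "falling back to the PyTorch backend."
            )
            return "pt"
        return requested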

saeid93 (Contributor Author):

@adriangonz I implemented the first option; please let me know if there is any problem. I'll look into TensorFlow support in the future.

adriangonz (Contributor):

Hey @saeid93,

Just as an update, we merged a PR today which should make your life easier on this one:

#1092

Essentially, what was going on was:

  • At the top-level MLServer package, we had to pin protobuf to >=3.20.2 to avoid a CVE in previous versions.
  • However, TF <2.12 is very stubborn about not using that version of protobuf. In fact, pip keeps backtracking until it gets to TF 2.9.0 (which mysteriously didn't have that protobuf constraint, so pip is happy).
  • This led to MLServer indirectly forcing an older version of TF which was incompatible with Alibi-Detect and a few others.
  • The fix has been to move that protobuf constraint to the official Docker images, letting the user control what goes into their local environment (which could include the faulty version of protobuf).

Based on the above, what I would try is to rebase from master and then remove the extra tensorflow dep in the runtimes/huggingface/setup.py. That should hopefully let huggingface > transformers pick up the correct version of TF.

saeid93 (Contributor Author):

Hey @adriangonz,
Thank you for the details. Yes, that's exactly what I was facing!
The only remaining part is that I need to include a TensorFlow installation for the tests, since it is not included in the dependencies (it isn't part of the optimum[onnxruntime] extra either, as I mentioned earlier in #1081 (comment)).

I first added it to requirements/dev.txt, and also tried runtimes/huggingface/setup.py, but based on your feedback I think neither is intended for this. Could you let me know where I should add it so I won't hit the following error in the tests?

====================================================================================== ERRORS ======================================================================================
__________________________________________________________________________ ERROR collecting test session ___________________________________________________________________________
runtimes/huggingface/tests/test_tasks/conftest.py:2: in <module>
    from ..utils import file_path, file_bytescontent
runtimes/huggingface/tests/utils.py:6: in <module>
    from mlserver_huggingface.codecs.image import _pil_base64encode, _pil_base64decode
runtimes/huggingface/mlserver_huggingface/__init__.py:1: in <module>
    from .runtime import HuggingFaceRuntime
runtimes/huggingface/mlserver_huggingface/runtime.py:12: in <module>
    from .common import load_pipeline_from_settings
runtimes/huggingface/mlserver_huggingface/common.py:10: in <module>
    import tensorflow
E   ModuleNotFoundError: No module named 'tensorflow'

saeid93 (Contributor Author):

Hey @adriangonz,

Thank you for the description. I tested this several times, and I highly doubt that TensorFlow gets installed automatically, or that there is a problem with my environment. Are you sure TensorFlow wasn't already installed in your environment? I checked with several fresh environments and also with the MLServer HuggingFace tox environment, and TensorFlow was not installed automatically in either of them.

In the transformers setup.py, the _deps list is only used as a reference for finding the correct version of each package (see here). The resulting deps dictionary, produced by applying a regular expression, maps each package name to its version constraint. Not everything in that list is necessarily installed. What transformers actually installs is the install_requires list, which is composed from the deps built above.
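
To make that concrete, here is a simplified illustration of the pattern (not the actual transformers setup.py, which uses a longer _deps list and a slightly different regex): _deps is only a version lookup table, and only the packages explicitly pulled into install_requires get installed.

    # Simplified illustration of the _deps / deps / install_requires pattern
    # described above; contents are made up for the example.
    import re

    _deps = [
        "numpy>=1.17",
        "tensorflow>=2.4",
        "torch>=1.7",
    ]

    # Map package name -> full requirement string, e.g. {"torch": "torch>=1.7"}.
    deps = {re.match(r"^[^!=<>~ ]+", d).group(0): d for d in _deps}

    # Only requirements referenced here are actually installed; tensorflow and
    # torch stay out of install_requires even though they appear in _deps.
    install_requires = [deps["numpy"]]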

Another sign of this is that in the same Optimum code you sent me, they have also pinned PyTorch, even though it is present in the same _deps list you mentioned in setup.py. If PyTorch were installed automatically, there would have been no need to add it there in the Optimum library.

If you think pinning tensorflow could cause problems, maybe we can revert to not supporting TensorFlow, or take any other option you think might be useful. I'm happy to investigate further if you have any suggestions.

adriangonz (Contributor):

Oh I see! You're totally right. Thanks for that thorough explanation @saeid93. I saw tensorflow there and had been wrongly assuming all this time that the transformers package would also install it.

In that case, I would suggest adding tensorflow there as a dep, but without pinning it to any specific version. Or did you run into issues when trying that out?
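
For reference, the unpinned dependency being suggested would look roughly like this in runtimes/huggingface/setup.py (the other entries are taken from the diff above; this is a sketch, not the final merged file):

    # Sketch of the suggestion above: declare tensorflow without a version pin
    # and let pip resolve it against the remaining constraints.
    install_requires = [
        "mlserver",
        "optimum[onnxruntime]>=1.4.0, <1.8.0",
        "tensorflow",  # no pin, to avoid clashing with other runtimes
    ]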

saeid93 (Contributor Author):

@adriangonz Sure, glad to help. I loosened the TF version; however, pip still backtracks until it gets to the version I had previously pinned. Please have a look now.

adriangonz (Contributor):

Yup, it seems that protobuf is still interacting with the version of TF even though it's not pinned by the main mlserver package anymore. I think it's OK if it still defaults to TF 2.9 though; there's not much we can do there (pinning it could cause plenty of incompatibilities with the other runtime deps).

So as it is right now (i.e. without any pin), it's probably as good as it gets! Thanks a lot for your help troubleshooting this one (and for pointing me in the right direction re: the transformers > tensorflow dep).

saeid93 (Contributor Author) commented Apr 20, 2023:

Sure, no problem. I applied your comments; I think it should be good to go now.

saeid93 and others added 12 commits April 17, 2023 13:47
adriangonz (Contributor) left a comment:

Added a couple minor comments around the inter_op and intera_op flag names and docstring, but everything else looks great!
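
For context, here is a hypothetical sketch of how such flags might be exposed as runtime settings and applied to PyTorch; the class and field names are illustrative and not necessarily those in the merged code:

    # Hypothetical sketch only; field names and class are illustrative, not the
    # merged MLServer implementation.
    import torch
    from pydantic import BaseModel

    class ParallelismSettings(BaseModel):
        inter_op_threads: int = 0  # 0 means: keep the framework default
        intra_op_threads: int = 0

    def apply_parallelism(settings: ParallelismSettings) -> None:
        # Both calls must happen before the model runs any parallel work.
        if settings.intra_op_threads > 0:
            torch.set_num_threads(settings.intra_op_threads)
        if settings.inter_op_threads > 0:
            torch.set_num_interop_threads(settings.inter_op_threads)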

runtimes/huggingface/mlserver_huggingface/common.py
runtimes/huggingface/mlserver_huggingface/settings.py
adriangonz (Contributor) left a comment:

Changes look great! This should be good to go now. Thanks a lot for addressing those comments and for your insights on the point about the tensorflow dependency.

@adriangonz adriangonz merged commit dc291b9 into SeldonIO:master Apr 20, 2023
26 checks passed
adriangonz added a commit that referenced this pull request Apr 20, 2023
Successfully merging this pull request may close these issues:

Effect of setting intra and inter-op parallelism parameters to deep learning models (#961)