Fix sdist installation #3080
Conversation
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@ddelange, this is going backwards in time to what we did before we published to pypi.org, and the opposite of what we want. It would be good to know why it fails to install in your CI/CD environment, but this change now breaks everyone who has …
If the issue is that this pip-inception flow is breaking due to the internal pip invocation, then maybe we can do something like trying to detect whether you have already set the correct environment variables to the right values, and skip this hackiness in that case?
Ah, if that's the issue: I think you can simply request that they increase the file size cap for your specific package: pypa/packaging-problems#109. torch also has 600+ MB wheels on PyPI: https://pypi.org/project/torch/2.0.1/#files
@zeroepoch please see the latest commit: I switched to PEP 440 direct references based on the wheels I can see at https://pypi.nvidia.com/tensorrt-bindings/ and https://pypi.nvidia.com/tensorrt-libs/. This way there is no more need for the index URL, as it basically mimics pip's mechanics of finding the right wheel. How is that?
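As a sketch, a direct reference in `install_requires` would look something like the snippet below. The wheel filename, version, and markers are illustrative placeholders, not the actual published artifacts:

```python
# Hypothetical PEP 440 direct reference with environment markers.
# The URL and filename below are placeholders for illustration only.
install_requires = [
    "tensorrt_libs @ https://pypi.nvidia.com/tensorrt-libs/"
    "tensorrt_libs-8.6.1-py2.py3-none-manylinux_2_17_x86_64.whl"
    ' ; platform_system == "Linux" and platform_machine == "x86_64"',
]

# pip evaluates the marker first, then fetches the pinned wheel directly,
# so no --extra-index-url configuration is needed.
print(install_requires[0].split(" @ ")[0])  # → tensorrt_libs
```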
@ddelange this was one of the approaches we first considered, but then another team member mentioned PyPI would reject it. See this paragraph.
It's annoying, because this would be perfect for this problem. We did request an increase for our file size and it was approved. Then we ran into the project limit, which was 10 GB. We requested an increase for that and it was approved too. Then we ran into the project size limit again. That wasn't really a sustainable solution, and the libraries keep getting bigger each release, unfortunately. So we took the next step, which was to break out the libraries to avoid duplication and reduce storage size, but then we have these hacky sdist files to piece it all together depending on Python version. I appreciate you looking into alternatives, since we're not happy with this solution either.
Wow, that's a journey for the books... Have you considered hosting a zip attached to each GitHub Release, and then for instance employing a mechanism similar to …
If we had some way to prevent the internal pip call so you could manage the index URL or installation of dependencies yourself, would that resolve the issue for you? Normal users who just want to install TensorRT using the system Python would see a simple experience. Advanced users who have issues with the internal pip call breaking their installation or CI/CD can apply some workaround to skip this internal step, and then you can just pip install the 3 packages yourself.
I had a deeper look at the logs linked in the issue. The environment variables look good; pip should (and does) pick up the pip 'symlink' from the venv. Maybe the env vars get dropped in your subprocess, and it would help if the invocation explicitly propagated the current env to …
Normally a subprocess should inherit its parent's environment but maybe it's filtering certain things out or it's the way pip is invoked. Maybe we could explicitly copy the environment and set it for the sub-pip call. |
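A minimal sketch of that idea (generic, not the TensorRT setup.py itself): copy the current environment explicitly when spawning the child process, so settings like `PIP_EXTRA_INDEX_URL` are guaranteed to survive even if something upstream filters env vars:

```python
import os
import subprocess
import sys


def run_with_current_env(args):
    """Run a child Python process with an explicit copy of the parent env."""
    env = os.environ.copy()  # explicit, instead of relying on implicit inheritance
    return subprocess.check_output([sys.executable] + args, env=env).decode()


# Demonstrate that a variable set in the parent reaches the child.
os.environ["DEMO_EXTRA_INDEX"] = "https://example.invalid/simple"
out = run_with_current_env(
    ["-c", "import os; print(os.environ['DEMO_EXTRA_INDEX'])"]
)
print(out.strip())  # → https://example.invalid/simple
```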
Yes, it should and it does. I even rely on it in pipgrip and invoke pip the same way as TensorRT: https://github.com/ddelange/pipgrip/blob/0.10.4/src/pipgrip/pipper.py#L106. The logs from the autogluon CI simply don't make sense to me. Food for thought...
And what a catch-22, by the way: pypa/pip#6301
Not surprised to see others wanting this feature as well, and I also don't understand the security argument. I guess if you only trust pypi.org then maybe you could say that, but what TensorRT is doing as an alternative shows it's not really that secure to begin with. I feel like this could properly be supported with a pedantic or strict flag if you didn't want any additional sources. If the URLs are directly in the requirements, it's easier to audit and restrict. FYI pypi/support#2609 (comment). We could have taken an alternative approach and downloaded the libraries only, but doing it with pip allows more flexibility. You can actually install just the bindings from the NVIDIA index if you have the libraries installed already another way. Before, that required the tar package to get those wheels.
https://peps.python.org/pep-0440/#direct-references Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
This reverts commit 5334f0d. ref NVIDIA#3080 (comment) Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
I've reverted the direct references commit per your comment. Maybe you want to keep this PR open as a reference PR / proper implementation for the issue, or for the next major version release (where some action from the user may be expected). I've notified the aws/autogluon team about the current impasse; still no idea why 'python says no'...
@pranavm-nvidia any ideas on better ways to solve this problem for autogluon? I was thinking maybe an environment variable to disable the internal pip call, and then users can use …
An alternative is to install with …
does the current hack respect …
Ah, you're right, my mistake. An environment variable would work then, though it would be a bit nicer if we could somehow intercept the …
If the user has exported `PIP_EXTRA_INDEX_URL`, something like:

```python
if "pypi.nvidia.com" in os.environ.get("PIP_EXTRA_INDEX_URL", ""):
    install_requires = [...]
    cmdclass = {}
else:
    install_requires = []
    cmdclass = {"install": ...}
```
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@zeroepoch anything from your side? |
```python
pid = os.getppid()
# try retrieval using psutil
try:
    import psutil
```
I don't know what the install base looks like for this Python module, but it does look like the best solution if available.
It has zero dependencies, but not all platform-specific wheels are available, and building it from source will fail e.g. on Graviton processors; see giampaolo/psutil#2103.
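A self-contained sketch of the lookup being discussed, assuming a helper named `parent_command_line`: prefer psutil when it imports cleanly, and fall back to shelling out to `ps` otherwise:

```python
import os
import subprocess


def parent_command_line():
    """Best-effort command line of the parent process."""
    pid = os.getppid()
    # try retrieval using psutil (optional; wheels are missing on some platforms)
    try:
        import psutil
        return " ".join(psutil.Process(pid).cmdline())
    except Exception:
        pass
    # fall back to shelling out to ps; the last line skips the COMMAND header
    try:
        out = subprocess.check_output(["ps", "-p", str(pid), "-o", "command"])
        return out.decode().splitlines()[-1].strip()
    except Exception:
        return ""


print(parent_command_line())
```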
```python
# fall back to shell
try:
    return (
        subprocess.check_output(["ps", "-p", str(pid), "-o", "command"])
```
Adding `--no-headers` avoids the need for the `split()`. Checking both CentOS 7 and Ubuntu 18.04 containers, this option has been supported since at least then, and TRT supports those OSes and newer.
```python
# use pip-inside-pip hack only if the nvidia index is not set in the environment
if "pypi.nvidia.com" in pip_config_list() or "pypi.nvidia.com" in parent_command_line():
```
A recent internal (and unrelated) discussion revealed we may want one additional condition here. In order to test different versions or from staging servers (other than pypi.nvidia.com), a full override may be desired. Basically a "manage the source and installation of dependencies yourself" option. I think one way to handle this would be to check an environment variable which, when set, assumes you have set `--extra-index-url` to something you know works, which could be something other than pypi.nvidia.com. This may be useful externally as well if someone is running their own internal PyPI server and mirroring our wheel/sdist files as-is. @pranavm-nvidia what do you think about this approach and calling this environment variable something like `NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP`?
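A sketch of that gate, assuming the variable name proposed here (`NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP` is a suggestion in this thread, not an existing API) and treating any non-empty value as "disabled":

```python
import os


def internal_pip_disabled():
    """True when the user opts out of the pip-inside-pip step entirely."""
    # NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP is the name proposed in this
    # thread; treat any non-empty value as "disabled".
    return bool(os.environ.get("NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP"))


os.environ["NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP"] = "1"
print(internal_pip_disabled())  # → True
```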
How about we make `NVIDIA_PIP_INDEX_URL` configurable? Defaulting to https://pypi.nvidia.com, but it could be overwritten with custom index URLs, or set to an empty string to always pass the test?
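Sketching that variant (the name `NVIDIA_PIP_INDEX_URL` follows the suggestion above and is not an existing API): default to https://pypi.nvidia.com, let the user point at a staging or mirror index, and treat an empty string as "skip the internal pip step":

```python
import os

DEFAULT_INDEX = "https://pypi.nvidia.com"


def nvidia_index_url():
    """Configurable index URL; an empty string means 'no internal pip call'."""
    return os.environ.get("NVIDIA_PIP_INDEX_URL", DEFAULT_INDEX)


print(nvidia_index_url())  # prints the default when the variable is unset
```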
I think `NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP` is still useful even if we make the NVIDIA pip index URL configurable. There could be a case where there are no extra indexes used (not sure how it would work, but I imagine it would be possible somehow).
Sorry, I was out of the office last week, so I was replying from my phone. I finally got a chance to fully review the changes and leave some suggestions.
Co-authored-by: Eric Work <work.eric@gmail.com> Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
No worries! Diff since your review: https://github.com/ddelange/TensorRT/compare/fc53814..fix-sdist
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
The usage of the environment variables LGTM. @rajeevsrao can you help merge this PR?
@ttyio can you please cherry-pick this change internally and include it in the next release?
oops
/blossom-ci
```python
super().run()
```

```python
def pip_config_list():
```
Could we replace the function with the code below, to solve some issues we see in internal CI/CD? Thanks!

```python
def run_pip_command(args, call_func):
    try:
        return call_func([sys.executable, "-m", "pip"] + args)
    except subprocess.CalledProcessError:
        return call_func([os.path.join(sys.exec_prefix, "bin", "pip")] + args)


def pip_config_list():
    """Get the current pip config (env vars, config file, etc)."""
    return run_pip_command(["config", "list"], subprocess.check_output).decode()
```
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
…e-sdist * 'main' of https://github.com/NVIDIA/TensorRT: Fix sdist installation (NVIDIA#3080)
@ddelange, just to follow up on this PR: these changes, plus some more that were required, are now live with version 8.6.1.post1. If you're curious what the final changes were, you can pull out the setup.py file.
Fixes #3078

This PR ~~removes~~ amends the 'pip inside pip' anti-pattern and updates the PyPI description.

I tried to find the source code of ERROR.txt inside the `tensorrt_libs` sdist on official PyPI (so I could update the message with the `PIP_EXTRA_INDEX_URL` env var), but couldn't find it. Can you update that error message accordingly? Similarly, this one will also need an update: https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html