Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sdist installation #3080

Merged
merged 15 commits into from Jul 14, 2023
Merged

Fix sdist installation #3080

merged 15 commits into from Jul 14, 2023

Conversation

ddelange
Copy link
Contributor

@ddelange ddelange commented Jun 22, 2023

Fixes #3078

This PR removes amends the 'pip inside pip' anti-pattern and updates the PyPI description:

To install, please execute the following:

pip install tensorrt --extra-index-url https://pypi.nvidia.com

Or add the index URL to the (space-separated) PIP_EXTRA_INDEX_URL environment variable:

export PIP_EXTRA_INDEX_URL='https://pypi.nvidia.com'
pip install tensorrt

When the extra index url does not contain https://pypi.nvidia.com, a nested pip install will run with the proper extra index url hard-coded.

I tried to find the source code of ERROR.txt inside tensorrt_libs sdist on official PyPI (so I could update the message with the PIP_EXTRA_INDEX_URL env var), but couldn't find it. Can you update that error message accordingly? Similarly this one will also need an update: https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html

Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@zeroepoch
Copy link
Contributor

@ddelange, this is going backwards in time to what we did before we published to pypi.org and the opposite of what we want. It would be good to know why it fails to install in you CI/CD environment, but this change now breaks everyone who has tensorrt in their requirements.txt file without an extra index URL our anyone who is simply using pip install tensorrt. There were other "magic" ways to accomplish forcing certain dependencies to install from another server but the version specifier with URL included is not allowed on pypi.org. This is really an issue with the python package index and not allowing external hosting to support larger packages or a way to reference another server without adding new indexes for all packages. They really need something like docker has with their pull syntax. This is why PyTorch has to use their complex install procedure to work around the package size limits of 1gb. I 100% agree this is a total hack to work around the limits of pypi.org, but for the common case where an user just uses pip install tensorrt for their system environment or python3 venv it magically selects the right servers for them.

@zeroepoch
Copy link
Contributor

If the issue is that this pip-inception flow is breaking due to the internal pip invocation then maybe we can do something like try to detect if you already set the correct environment variables to the right values and don't do this hackiness?

@ddelange
Copy link
Contributor Author

Ah, if that's the issue: I think you can simply request they increase the file size cap for your specific package: pypa/packaging-problems#109

torch also has 600+ MB wheels on pypi: https://pypi.org/project/torch/2.0.1/#files

@ddelange
Copy link
Contributor Author

@zeroepoch please see the latest commit: I switched to PEP440 direct references based on the wheels I can see at https://pypi.nvidia.com/tensorrt-bindings/ and https://pypi.nvidia.com/tensorrt-libs/.

Like this, there is no more need for the index url, as it basically mimics pip's mechanics of finding the right wheel.

How is that?

@zeroepoch
Copy link
Contributor

@ddelange this was one of the approaches we first considered, but then another team member mentioned PyPI would reject it. See this paragraph.

Public index servers SHOULD NOT allow the use of direct references in uploaded distributions. Direct references are intended as a tool for software integrators rather than publishers.

It's annoying because this would be perfect for this problem.

We did request an increase for our file size and it was approved. Then we ran into the project limit which was 10gb. We requested and increase for that and it was approved. Then we ran into the project size limit again. That wasn't really a sustainable solution and the libraries keep getting bigger each release unfortunately. So we took the next step which was to break out the libraries to avoid duplication and reduce storage size, but then we have these hacky sdist files to piece it all together depending on Python version.

I appreciate you looking into alternatives since we're not happy with this solution either.

@ddelange
Copy link
Contributor Author

wow, that's a journey for the books... have you considered hosting a zip attached to each GitHub Release?

and then for instance employ a mechanic similar to nltk, it auto-downloads on import (by default kwarg), or you can run their entry_points to nltk download after install

@zeroepoch
Copy link
Contributor

zeroepoch commented Jun 24, 2023

If we had some way to prevent the internal pip call so you could manage the index URL or installation of dependencies yourself, would that resolve the issue for you? Normal users who just want to install TensorRT using the system Python would see a simple experience. Advanced users who have issues with the internal pip call breaking their installation or CI/CD can apply some workaround to skip this internal step and then you can just pip install the 3 packages yourself.

@ddelange
Copy link
Contributor Author

ddelange commented Jun 24, 2023

I had a deeper look at the logs linked in the issue. The environment variables look good, pip should (and does) pick up the pip 'symlink' from the venv. Maybe the env vars get dropped in your subprocess, and it would help if the invocation explicitly propagates the current env to sys.executable?

@ddelange ddelange closed this Jun 24, 2023
@ddelange ddelange reopened this Jun 24, 2023
@zeroepoch
Copy link
Contributor

Normally a subprocess should inherit its parent's environment but maybe it's filtering certain things out or it's the way pip is invoked. Maybe we could explicitly copy the environment and set it for the sub-pip call.

@ddelange
Copy link
Contributor Author

Normally a subprocess should inherit its parent's environment

yes, it should and it does. I even rely on it in pipgrip and invoke the same way as TensorRT: https://github.com/ddelange/pipgrip/blob/0.10.4/src/pipgrip/pipper.py#L106

the logs from autogluon CI simply don't make sense to me. food for thought...

@ddelange
Copy link
Contributor Author

and what a catch 22 by the way pypa/pip#6301

@zeroepoch
Copy link
Contributor

and what a catch 22 by the way pypa/pip#6301

Not surprised to see others wanting this feature as well and I also don't understand the security argument. I guess if you only trust pypi.org then maybe you could say that, but what TensorRT is doing as an alternative shows it's not really that secure to begin with. I feel like this could properly be supported with a pedantic or strict flag if you didn't want any additional sources. If the URLs are directly in the requirements it's easier to audit and restrict.

FYI pypi/support#2609 (comment). We could have taken an alternative approach and downloaded the libraries only but doing it with pip allows more flexibility. You can actually install just the bindings from the NVIDIA index if you have the libraries installed already another way. Before that required the tar package to get those wheels.

https://peps.python.org/pep-0440/#direct-references

Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
This reverts commit 5334f0d.

ref NVIDIA#3080 (comment)

Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@ddelange
Copy link
Contributor Author

I've reverted the direct references commit per your comment.

Maybe you want to keep this PR open as reference PR / proper implementation for the issue, or for the next major version release (where some action of the user may be expected).

I've notified the aws/autogluon team about the current impasse, still no idea why 'python says no'...

https://http.cat/417

@zeroepoch
Copy link
Contributor

@pranavm-nvidia any ideas on better ways to solve this problem for autogluon? I was thinking maybe an environment variable to disable the internal pip call and then users can use --extra-index-url to install the 3 modules themselves if they run into problems.

@pranavm-nvidia
Copy link
Collaborator

An alternative is to install with --no-deps and then install tensorrt-libs and tensorrt-bindings separately with the --extra-index-url option. That wouldn't require any changes in the TRT wheels.

@ddelange
Copy link
Contributor Author

does the current hack respect --no-deps? I thought that flag only leads to ignoring install_requires, but I haven't tested that on tensorrt 8.6.1 tbh

@pranavm-nvidia
Copy link
Collaborator

Ah you're right, my mistake. An environment variable would work then, though it would be a bit nicer if we could somehow intercept the --no-deps option and respect that.

@ddelange
Copy link
Contributor Author

if the user has exported PIP_EXTRA_INDEX_URL=pypi.abc.de,pypi.nvidia.com, and setup.py detects that, it could have install_requires like in this PR. if not, the pip hack could kick in?

@ddelange
Copy link
Contributor Author

if "pypi.nvidia.com" in os.environ.get("PIP_EXTRA_INDEX_URL", ""):
    install_requires = [...]
    cmdclass = {}
else:
    install_requires = []
    cmdclass = {"install": ...}

Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@ddelange
Copy link
Contributor Author

ddelange commented Jul 4, 2023

@zeroepoch anything from your side?

pid = os.getppid()
# try retrieval using psutil
try:
import psutil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the install base looks like for this Python module, but it does look like the best solution if available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has zero dependencies, but not all platform specific wheels are available and building it from source will fail e.g. on graviton processors ref giampaolo/psutil#2103

# fall back to shell
try:
return (
subprocess.check_output(["ps", "-p", str(pid), "-o", "command"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding --no-headers avoids the need for the split(). Checking both CentOS 7 and Ubuntu 18.04 containers this option has been supported since at least then and TRT supports those OSes and newer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python/packaging/frontend_sdist/setup.py Outdated Show resolved Hide resolved


# use pip-inside-pip hack only if the nvidia index is not set in the environment
if "pypi.nvidia.com" in pip_config_list() or "pypi.nvidia.com" in parent_command_line():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A recent internal (and unrelated) discussion revealed we may want one additional condition here. In order to test different versions or from staging servers (other than pypi.nvidia.com) a full override may be desired. Basically a "manage the source and installation of dependencies yourself" option. I think one way to handle this would be to check an environment variable when set assumes you have set --extra-index-url to something you know works, which could be something other than pypi.nvidia.com. This may be useful externally as well if someone is running their own internal PyPI server and mirroring our wheel/sdist files as-is. @pranavm-nvidia what do you think about this approach and calling this environment variable something like NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP?

Copy link
Contributor Author

@ddelange ddelange Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we make NVIDIA_PIP_INDEX_URL configurable? defaulting to https://pypi.nvidia.com, but it could be overwritten with custom index urls or set it to an empty string to always pass the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think NVIDIA_TENSORRT_DISABLE_INTERNAL_PIP is still useful even if we make the NVIDIA pip index URL configurable. There could be a case where there are no extra indexes used (not sure how it would work, but I imagine it would be possible somehow).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeroepoch
Copy link
Contributor

@zeroepoch anything from your side?

Sorry, I was out of the office last week so I was replying from my phone. I finally got a chance to fully review the changes and leave some suggested changes.

Co-authored-by: Eric Work <work.eric@gmail.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@ddelange
Copy link
Contributor Author

ddelange commented Jul 6, 2023

no worries! diff since your review: https://github.com/ddelange/TensorRT/compare/fc53814..fix-sdist

Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
Copy link
Contributor

@zeroepoch zeroepoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of the environment variables LGTM. @rajeevsrao can you help to merge this PR?

@rajeevsrao
Copy link
Collaborator

@ttyio can you please cherry-pick this change internally and include for the next release?

@ddelange
Copy link
Contributor Author

ddelange commented Jul 7, 2023

oops write /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cuda_cpp.so: no space left on device

@ttyio
Copy link
Collaborator

ttyio commented Jul 10, 2023

/blossom-ci

super().run()


def pip_config_list():
Copy link
Collaborator

@ttyio ttyio Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace the function with below code, to solve some issue we see in internal CI/CD, thanks!

def run_pip_command(args, call_func):
    try:
        return call_func([sys.executable, "-m", "pip"] + args)
    except subprocess.CalledProcessError:
        return call_func([os.path.join(sys.exec_prefix, "bin", "pip")] + args)

def pip_config_list():
    """Get the current pip config (env vars, config file, etc)."""
    return run_pip_command(["config", "list"], subprocess.check_output).decode()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: ddelange <14880945+ddelange@users.noreply.github.com>
@ttyio ttyio merged commit ba459b4 into NVIDIA:main Jul 14, 2023
1 of 2 checks passed
ddelange added a commit to ddelange/TensorRT that referenced this pull request Jul 14, 2023
@zeroepoch
Copy link
Contributor

@ddelange, just to follow up on this PR, these changes + some more that were required are now live with version 8.6.1.post1. If you're curious what the final changes were you can pull out the setup.py file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants