unable to install package datasets #247

Closed
CarloLucibello opened this issue Nov 23, 2022 · 6 comments · May be fixed by conda-forge/conda-forge-repodata-patches-feedstock#359

Comments

@CarloLucibello
Contributor

CarloLucibello commented Nov 23, 2022

This is possibly related to GLIBC, but I don't know for sure.
I cannot reliably reproduce the error, but here are the problematic steps:

pkg> conda channel_add conda-forge
pkg> conda channel_add huggingface
pkg> conda add datasets

Sometimes this works. Sometimes the installation succeeds, but when I load my package, which imports datasets, I get

ERROR: InitError: Python: ImportError: /home/lucibello/.julia/juliaup/julia-1.8.2+0.x64/bin/../lib/julia/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/lucibello/.julia/dev/HuggingFaceDatasets/.CondaPkg/env/lib/python3.10/site-packages/pyarrow/../../../libarrow.so.600)

and sometimes the installation itself fails with

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
Encountered problems while solving:
  - package datasets-2.7.0-py_0 requires pyarrow >=6.0.0, but none of the providers can be installed

The environment can't be solved, aborting the operation
error    libmamba Could not solve for environment specs
critical libmamba UnsatisfiableError
ERROR: LoadError: InitError: failed process: Process(`/home/runner/.julia/artifacts/8d5103d84a46e89c60d007a7d30b926037514616/bin/micromamba -r /home/runner/.julia/scratchspaces/0b3b1443-0f03-428d-bdfb-f27f9c1191ea/root create -y -p /tmp/jl_dCVni6/.CondaPkg/env --override-channels --no-channel-priority "datasets[version='>=2.7, <3']" "libstdcxx-ng[version='>=3.4,<11.4',channel='conda-forge']" "numpy[version='>=1.23, <2']" "pillow[version='>=9.2, <10']" "python[version='>=3.7,<4',channel='conda-forge',build='*cpython*']" -c conda-forge -c huggingface`, ProcessExited(1)) [1]

in this CI run.

Sorry but the behavior is very erratic and I cannot offer a stable way to trigger the issue.

@CarloLucibello CarloLucibello changed the title from "unable to install oackage datasets" to "unable to install package datasets" Nov 23, 2022
@cjdoris
Collaborator

cjdoris commented Nov 23, 2022

First, note that the datasets package is also in conda-forge so you don't need the huggingface channel. If you specifically want the huggingface version of datasets, you can just do conda add huggingface::datasets without any channel_add commands.

That said, I can reproduce the GLIBCXX error. In this case, though, it's not a bug in PythonCall but a bug in how pyarrow (which datasets depends on) is packaged by conda-forge, because you can reproduce it directly in Python with these commands:

$ micromamba create -n test pyarrow 'libstdcxx-ng<11.4' -c conda-forge
$ micromamba activate test
$ python
>>> import pyarrow

So this needs to get reported somewhere.
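
The reproduction above can also be run as a small script, which is handy when attaching output to a bug report. This is only a sketch: `try_import` is a hypothetical helper name, not part of PythonCall or pyarrow, and it assumes the `test` environment created above is active.

```python
import importlib


def try_import(module_name):
    """Try to import a module and return (ok, message) instead of raising."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # A libstdc++ mismatch shows up here as an ImportError whose
        # message mentions the missing GLIBCXX version.
        return False, str(exc)
    return True, "imported " + module_name


if __name__ == "__main__":
    ok, message = try_import("pyarrow")
    print("OK" if ok else "FAILED", "-", message)
```

If the packaging problem is present, the printed message should contain the same `GLIBCXX_... not found` text as in the original report.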

The libstdcxx-ng dependency is automatically added by PythonCall to ensure that the version of libstdc++ that Python packages are expecting is compatible with the one actually loaded by Julia. It is intended to avoid this GLIBCXX problem, except that sometimes conda-forge packages are not packaged correctly - in this case pyarrow is still expecting a too-new version of libstdc++ to be available, despite the fact that the libstdcxx-ng package is being held back.
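
To see which GLIBCXX versions a particular libstdc++.so.6 carries, one rough approach is to scan the file for `GLIBCXX_x.y.z` tags, much like running `strings` on it and filtering for GLIBCXX. The sketch below is a heuristic, not what PythonCall does internally; `glibcxx_versions` and `provides` are hypothetical helper names.

```python
import re

# Version tags look like GLIBCXX_3.4 or GLIBCXX_3.4.30 inside the library.
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = set()
    for match in GLIBCXX_TAG.finditer(data):
        parts = match.group(1).decode("ascii").split(".")
        found.add(tuple(int(part) for part in parts))
    return sorted(found)


def provides(data, required):
    """Return True if the tag for `required` (e.g. "3.4.30") is present."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(data)
```

To use it, read the library named in the error message in binary mode (for example with `open(path, "rb").read()`, where `path` is the libstdc++.so.6 that Julia loaded) and pass the bytes to `provides(data, "3.4.30")`.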

As for the other error, where mamba cannot install packages, this is due to the numpy constraint. It can be reproduced like this:

$ micromamba create -n test "numpy>=1.23,<2" "libstdcxx-ng>=3.4,<11.4"

This happens because newer versions of numpy require a newer version of libstdc++. You'll need to lower the bound on numpy - 1.22 is OK.
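
What lowering the bound changes can be illustrated with a simplified range check. This is not how mamba resolves environments: it only handles plain dotted integer versions and the operators `>=`, `<=`, `>`, and `<`, and the names `parse_version` and `satisfies` are made up for this sketch.

```python
def parse_version(text):
    """Turn "1.22.4" into (1, 22, 4); only plain dotted integers are supported."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(a, b):
    """Pad both tuples with zeros to equal length so 1.22 compares equal to 1.22.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version, spec):
    """Check a version against comma-separated bounds such as ">=1.22, <2"."""
    checks = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so ">=" is not read as ">".
        for symbol in (">=", "<=", ">", "<"):
            if clause.startswith(symbol):
                v, bound = pad(parse_version(version),
                               parse_version(clause[len(symbol):]))
                if not checks[symbol](v, bound):
                    return False
                break
        else:
            raise ValueError("unsupported clause: " + clause)
    return True
```

For example, `satisfies("1.22.4", ">=1.23, <2")` is false while `satisfies("1.22.4", ">=1.22, <2")` is true, so only the relaxed bound lets the solver consider the 1.22 releases.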

@CarloLucibello
Contributor Author

CarloLucibello commented Dec 7, 2022

Thanks for looking into this. I now have the bounds

numpy = ">=1.22, <2"
pyarrow = "=6.0.0"

But the CI run still fails: https://github.com/CarloLucibello/HuggingFaceDatasets.jl/actions/runs/3637555353/jobs/6138697246#step:5:223

By the way, I can no longer reproduce any failure locally on my Mac.

@CarloLucibello
Contributor Author

I also guess that PythonCall is adding the bound libstdcxx-ng<11.4 and so cannot allow newer versions.

@cjdoris
Collaborator

cjdoris commented Dec 7, 2022

That CI run doesn't actually specify pyarrow as a dependency even though it is in your CondaPkg.toml. You should specify ==6.0.0 not =6.0.0, but I don't know why CondaPkg ignores it entirely, maybe a bug...

@cjdoris
Collaborator

cjdoris commented Dec 7, 2022

The libstdcxx-ng requirement will be lifted if you're using a version of Julia with a newer libstdc++.

@CarloLucibello
Contributor Author

I've not been running into this lately so marking as closed
