Github actions: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. #392
Hi Maarten! Any news on that issue?
@therealgherkhin Not yet. The thing is, I am only experiencing this issue with the Github actions pipeline. I do not run into any issues either locally or on both Kaggle and Google Colab, so I also have issues replicating this issue... Just to be sure, how did you try to install BERTopic, and in what kind of environment?
Oh okay. I run into this issue when importing BERTopic in a Docker environment; I installed it via pip. Edit: I also tried it locally on my machine and there everything seems fine. The Docker installation still fails. Environment:
Hi Maarten, You can replicate the issue by building the following Docker image:
Or just by running the install directly. The weird thing is, a Docker image that I created last week (with BERTopic version 0.9.3) is still running fine, but when I try to build it again with the exact same requirements and then run it, it now fails. It seems to use something in the cached layers that it is unable to rebuild, but I'm not sure exactly what it is. The same goes for my virtual environment: importing BERTopic still works fine in the old virtual environment, and installing it in a new virtual environment with the same requirements also works, but importing it there fails with the same error. The error happens when importing the HDBSCAN library (see scikit-learn-contrib/hdbscan#457 (comment)), which could be solved by bumping NumPy to a newer version.
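For intuition about why the error fires at import time: Cython-compiled extensions such as HDBSCAN's check, when imported, that the NumPy struct size they were compiled against matches the size the installed NumPy reports at runtime. The sketch below only mimics that guard in plain Python (the function name and logic are illustrative, not NumPy's actual C code):

```python
# Illustrative sketch of the ABI guard a compiled extension performs at
# import time. A mismatch between the size baked in at compile time and
# the size of the NumPy actually installed raises the error in this thread.
def check_ndarray_abi(expected_from_header: int, got_from_runtime: int) -> None:
    if expected_from_header != got_from_runtime:
        raise ValueError(
            "numpy.ndarray size changed, may indicate binary incompatibility. "
            f"Expected {expected_from_header} from C header, "
            f"got {got_from_runtime} from PyObject"
        )

# Matching sizes (extension built against the same NumPy): passes silently.
check_ndarray_abi(96, 96)
```

With the sizes from this thread (compiled against a newer NumPy reporting 96 bytes, running against an older one reporting 88), the same check raises the `ValueError` shown in the issue title.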
I got this problem too. I tried to fix it by upgrading numpy, but I ran into another problem.
Okay so I downgraded Python to 3.7 and now it works. I'm still not sure why it doesn't work with 3.8.
The last few days I have been bug-fixing this as much as I could. However, it seems that the issue stems from ABI incompatibilities between HDBSCAN and NumPy. Whenever a major version of NumPy is released, there is a chance that it will break HDBSCAN if used together with UMAP.

**Python 3.7**

BERTopic works in Python 3.7 seemingly without any problems; simply install it as usual.

**Python 3.8+**

For now, if you are on Python 3.8 or higher, it seems that the following will work:

```shell
pip install --upgrade pip setuptools wheel
pip install bertopic --no-cache-dir
pip uninstall hdbscan -y
pip install hdbscan --no-cache-dir --no-binary :all: --no-build-isolation
```

**Future Fix**

At this point, I am not entirely sure how I want to proceed. Having said that, any and all help is greatly appreciated!
Hi MaartenGr! Thanks for your awesome work on BERTopic. I tried your advice:
@SkyeCC I edited the message directly above yours to provide more up-to-date instructions on how to overcome your issue.
Small update: it seems that the HDBSCAN issue can be fixed with a new PyPI release of HDBSCAN, or by updating the requirements to include the master branch of the package. Before I can introduce a fix, I will have to wait until HDBSCAN updates their PyPI package to include it.
Hello, Thanks for getting to the bottom of this issue!! I tried the above and it resolved my numpy error, but I am now getting an error saying "ModuleNotFoundError: No module named 'torch._C'" when trying to import BERTopic. I have Python version 3.9.6. Any thoughts?
@salfaro1 Could you share the entire error? Also, do you by chance have a `torch` folder or file in your working directory?
Hello Maarten, Information:
@TAsUjxnMIL Could you share the entire code for running BERTopic including the full error message? It may be that UMAP is improperly installed. Python 3.7 is the most stable for BERTopic, so it might be worthwhile to use a completely fresh environment with that version. Note that conda env might have packages pre-installed, so make sure to create a fully fresh environment when doing so.
Update:
Hi, so I checked and there is no torch folder or file in the folder I am working in. To be extra sure this wasn't it, I changed directory to a different location, to no avail. Could you explain what you mean by a fresh environment? I uninstalled Python and started fresh by reinstalling it, but the same exact error appeared. Here is the entire error message with the traceback:

```
ModuleNotFoundError                       Traceback (most recent call last)
~\AppData\Local\Programs\Python\Python39\lib\site-packages\bertopic\__init__.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\bertopic\_bertopic.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\bertopic\backend\__init__.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\bertopic\backend\_word_doc.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\bertopic\backend\_utils.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\bertopic\backend\_sentencetransformers.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\sentence_transformers\__init__.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\sentence_transformers\datasets\__init__.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\sentence_transformers\datasets\DenoisingAutoEncoderDataset.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\__init__.py
~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\utils\throughput_benchmark.py
ModuleNotFoundError: No module named 'torch._C'
```
@salfaro1 Apologies for the late response. With a fresh environment, I mean a clean conda, pyenv, or poetry environment. Whenever things are not working, you typically start with a fresh install of python through these environments. These versions of python do not have anything else installed aside from the base packages. What is likely happening in your environment is that there are dependencies that might clash, starting from a clean slate might help. If that does not work, using python 3.7 might fix your issue.
If you receive a metadata error where you are unable to find NumPy's METADATA folder for 1.19.x after following the bug fix for Python 3.8+, make sure to run the following:
Hi Maarten, |
@Ariannaperla Most likely, you updated to an unsupported numpy or numba version. I would advise starting from a fresh environment and trying the above again. If that does not work, using python 3.7 might solve your issue. If all fails, you can also install BERTopic from conda, as instructed here.
**Conda**

To those interested, some of the issues users are having with the installation of BERTopic might be resolved by using conda to install BERTopic.
Using conda to install bertopic worked for me. |
I'm running in a python3.9 container. Upgrading the following did the trick for me:
Good news! HDBSCAN was updated to 0.8.28, which means the binary incompatibility issue should now be resolved. There will be a fix in the future to make sure only 0.8.28 is selected, but for now this should be working.
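Pinning to the fixed release mentioned above can be enforced with a small version check. A pure-stdlib sketch (assumes simple dotted numeric version strings, which holds for HDBSCAN's releases):

```python
def version_at_least(installed: str, minimum: str) -> bool:
    """Compare dotted numeric version strings, e.g. '0.8.28' >= '0.8.28'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

print(version_at_least("0.8.28", "0.8.28"))  # True: has the ABI fix
print(version_at_least("0.8.27", "0.8.28"))  # False: predates the fix
```

In real requirements files the same constraint is simply `hdbscan>=0.8.28`; the helper above is only useful for checking at runtime.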
Since this issue seems to be resolved, I will close this. To those still experiencing this issue, let me know and we'll see if we can figure something out.
Hi guys. I would like to add another solution. I know the above solutions work for most people, but they didn't work for me. What really seemed to be the problem in my case was a PyTorch 1.8 and NumPy combination. I upgraded PyTorch to 1.10 and NumPy to 1.23.3 and the problem disappeared. I hope this helps someone out there.
We are getting this error for numpy 2.0 as well. |
The GitHub Actions workflow is suddenly giving me the following error:
```
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```
It seems that it most likely has to do with NumPy binary compatibility issues (some more info here). However, I cannot seem to fix it thus far with the suggested method (setting `oldest-supported-numpy` in `pyproject.toml`). If you have any idea, please follow along with the full discussion here. Any help is greatly appreciated!
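For reference, the suggested mitigation (which did not work in this case) would look something like this in `pyproject.toml`. This is only a sketch; the project's actual build configuration may differ:

```toml
[build-system]
# oldest-supported-numpy pins the build to the oldest NumPy ABI available
# for the current Python version, so compiled wheels stay compatible with
# newer NumPy releases at runtime.
requires = ["setuptools", "wheel", "oldest-supported-numpy"]
build-backend = "setuptools.build_meta"
```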