Ipk: issue related to boost-cpp version which differs in bioconda-build CI and local install. #49000

Merged on Jul 10, 2024 (54 commits)

@blinard-BIOINFO blinard-BIOINFO commented Jul 8, 2024

The package does not work because dependency resolution differs between the conda-build CI and an install on some local computers.

@martin-g @Juke34
I'm seeing odd boost-related behavior while developing this package.

In the conda-build CI, boost-cpp=1.85 was selected for the conda env.
My recipe initially required boost-cpp>=1.67.

As the CI logs of PR #47946 show:

# see pipeline https://github.com/bioconda/bioconda-recipes/runs/27086383834

ONDA INFO (OUT) The following NEW packages will be INSTALLED:
15:55:23 BIOCONDA INFO (OUT) 
15:55:23 BIOCONDA INFO (OUT)     _libgcc_mutex:    0.1-conda_forge   conda-forge
15:55:23 BIOCONDA INFO (OUT)     _openmp_mutex:    4.5-2_gnu         conda-forge
15:55:23 BIOCONDA INFO (OUT)     boost-cpp:        1.85.0-h44aadfe_2 conda-forge        <<===
15:55:23 BIOCONDA INFO (OUT)     bzip2:            1.0.8-hd590300_5  conda-forge
15:55:23 BIOCONDA INFO (OUT)     icu:              73.2-h59595ed_0   conda-forge
15:55:23 BIOCONDA INFO (OUT)     libboost:         1.85.0-hba137d9_2 conda-forge
15:55:23 BIOCONDA INFO (OUT)     libboost-devel:   1.85.0-h00ab1b0_2 conda-forge
15:55:23 BIOCONDA INFO (OUT)     libboost-headers: 1.85.0-ha770c72_2 conda-forge

[...]

CONDA INFO (OUT) -- Found Boost: $PREFIX/lib/cmake/Boost-1.85.0/BoostConfig.cmake (found version "1.85.0") found components: serialization iostreams system filesystem

So the binary is built against boost v1.85 on the bioconda side.
However, when installing into a fresh env on my local computer, conda or mamba refuses to use boost-cpp 1.85 and sticks to boost-cpp 1.84. This results in a conflict between shared boost libraries, and the program fails to run.

# running the command 

ipk.py build --refalign reference.fasta --reftree tree.rooted.newick --states nucl --workdir . --model GTR

/home/belinard/SOFTWARE_LOCAL/miniconda3/envs/epik/bin/ipk-dna: error while loading shared libraries: libboost_program_options.so.1.85.0: cannot open shared object file: No such file or directory

# the binary from the package looks for v1.85, as in conda-build

ldd miniconda3/envs/epik/bin/ipk-dna
	linux-vdso.so.1 (0x00007ffeebbd7000)
	libboost_program_options.so.1.85.0 => not found
	libboost_filesystem.so.1.85.0 => not found
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000788eb5097000)
	libboost_serialization.so.1.85.0 => not found
	libboost_iostreams.so.1.85.0 => not found
	libstdc++.so.6 => /home/belinard/SOFTWARE_LOCAL/miniconda3/envs/epik/bin/../lib/libstdc++.so.6 (0x0000788eb4eb2000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000788eb4dcb000)
	libgcc_s.so.1 => /home/belinard/SOFTWARE_LOCAL/miniconda3/envs/epik/bin/../lib/libgcc_s.so.1 (0x0000788eb4dac000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000788eb4a00000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000788eb4da5000)
	/lib64/ld-linux-x86-64.so.2 (0x0000788eb515b000)

# but the command `conda install ipk epik` installs v1.84

# conda list gives:

boost-cpp                 1.84.0               h44aadfe_3    conda-forge
libboost                  1.84.0               hba137d9_3    conda-forge
libboost-devel            1.84.0               h00ab1b0_3    conda-forge
libboost-headers          1.84.0               ha770c72_3    conda-forge

# I can confirm by looking at the files
miniconda3/envs/epik/lib/libboost_serialization.so
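
The check above could be automated along these lines. This is only a minimal sketch, assuming the output of `ldd` has already been captured as text; `missing_libraries` and `boost_versions_required` are hypothetical helper names introduced for illustration, and the sample lines are copied from the `ldd` output above.

```python
import re

# Matches ldd lines such as "libboost_filesystem.so.1.85.0 => not found".
_NOT_FOUND = re.compile(r"^\s*(\S+)\s+=>\s+not found\s*$")

# Matches Boost sonames such as "libboost_program_options.so.1.85.0".
_BOOST_SONAME = re.compile(r"^libboost_\w+\.so\.(\d+\.\d+\.\d+)$")


def missing_libraries(ldd_output):
    """Return the sonames that ldd reported as 'not found', in order."""
    missing = []
    for line in ldd_output.splitlines():
        match = _NOT_FOUND.match(line)
        if match:
            missing.append(match.group(1))
    return missing


def boost_versions_required(sonames):
    """Return the set of Boost versions encoded in the given sonames."""
    versions = set()
    for name in sonames:
        match = _BOOST_SONAME.match(name)
        if match:
            versions.add(match.group(1))
    return versions


# Sample lines taken from the ldd output in this report.
sample = """\
\tlinux-vdso.so.1 (0x00007ffeebbd7000)
\tlibboost_program_options.so.1.85.0 => not found
\tlibboost_filesystem.so.1.85.0 => not found
\tlibpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000788eb5097000)
\tlibboost_serialization.so.1.85.0 => not found
\tlibboost_iostreams.so.1.85.0 => not found
"""

missing = missing_libraries(sample)
print(missing)
print(boost_versions_required(missing))
```

Comparing the versions it returns with the boost-cpp version listed by `conda list` makes the mismatch explicit: the binary asks for 1.85.0 while the environment provides 1.84.0.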

I looked at the libboost-feedstock repo, which maintains boost-cpp, and it seems this can happen if channel_priority is not set to strict for conda-forge. I checked the documentation, but so far have not found a way to force "strict" priority from a bioconda recipe. Is this even possible?

In this PR, I pin boost-cpp to an exact version in the recipe, hoping this will resolve the issue.

But I wanted to let you know that this weird behavior happens (and may break other libboost-based packages, which are probably very common among C++ packages).

The worst part is that everything looks clean on the CI side, then you merge to make the package available online, and things break when someone actually installs it on their computer.

Juke34 and others added 30 commits April 11, 2023 12:08
@blinard-BIOINFO blinard-BIOINFO changed the title Ipk Ipk: issue related to boost-cpp version which differs in bioconda-build CI and local install. Jul 8, 2024
@Juke34

Juke34 commented Jul 8, 2024

It works on my side on macOS, boost-cpp 1.85 is used.
How are your channels set up?
conda config --show channels

channels:
  - conda-forge
  - bioconda
  - defaults

and your channel priority?
conda config --show channel_priority

channel_priority: strict

@martin-g

martin-g commented Jul 9, 2024

It uses 1.85 for me too!

@blinard-BIOINFO

blinard-BIOINFO commented Jul 9, 2024

The issue comes from the fact that my default conda installation uses flexible priority.

$  conda config --show channels
channels:
  - conda-forge
  - bioconda
  - defaults
  - r

$  conda config --show channel_priority
channel_priority: flexible

As mentioned here: https://github.com/conda-forge/boost-feedstock
The boost-cpp package can be an issue if priority is not set to strict.
There are quite a few reports around the web showing the same issue for conda packages using boost-cpp.

I was wondering whether there is any way to set a directive in the meta.yaml so that channel priority is strict for a particular package (e.g. boost-cpp in our case)?
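
Independent of whether a recipe-level directive exists, the local configuration can be checked before installing. The following is a minimal sketch, assuming the text printed by `conda config --show channel_priority` has been captured as a string; `parse_channel_priority` and `priority_warning` are hypothetical helper names, not part of conda.

```python
def parse_channel_priority(config_output):
    """Return the value of a 'channel_priority: <value>' line, or None."""
    for line in config_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "channel_priority":
            return value.strip()
    return None


def priority_warning(config_output):
    """Return a warning message unless channel priority is 'strict'."""
    priority = parse_channel_priority(config_output)
    if priority == "strict":
        return None
    return (
        "channel_priority is %r, not 'strict'; the solver may pick a "
        "boost-cpp build other than the one used in CI" % (priority,)
    )


# Output as shown in this comment (flexible) and in the earlier reply (strict).
print(priority_warning("channel_priority: flexible"))
print(priority_warning("channel_priority: strict"))
```

If the warning fires, `conda config --set channel_priority strict` switches the local installation to strict priority.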

@blinard-BIOINFO

blinard-BIOINFO commented Jul 9, 2024

Also, in a fresh environment where I install only the ipk package, trying to remove it either 1) raises an error with mamba remove or 2) spins forever with conda remove (it has been running since yesterday).
I have to delete the whole environment and run conda clean --all to get out of this state.

If you do not observe this behavior, I suppose it could be related to the boost-cpp version issue.

Removing specs: ['ipk']
Transaction

  Prefix: /home/belinard/SOFTWARE_LOCAL/miniconda3/envs/epik

  Removing specs:

   - ipk


  Package                  Version  Build               Channel         Size
──────────────────────────────────────────────────────────────────────────────
  Remove:
──────────────────────────────────────────────────────────────────────────────

  - __unix                       0  0                   installed           
  - _libgcc_mutex              0.1  conda_forge         conda-forge         
  - _openmp_mutex              4.5  2_gnu               conda-forge         
  - boost-cpp               1.84.0  h44aadfe_3          conda-forge         
  - bzip2                    1.0.8  hd590300_5          conda-forge         
  - ca-certificates       2024.7.4  hbcca054_0          conda-forge         
  - click                    8.1.7  unix_pyh707e725_0   conda-forge         
  - gmp                      6.3.0  hac33072_2          conda-forge         
  - icu                       73.2  h59595ed_0          conda-forge         
  - ipk                      0.5.1  hdcf5f25_0          bioconda            
  - ld_impl_linux-64          2.40  hf3520f5_7          conda-forge         
  - libboost                1.84.0  hba137d9_3          conda-forge         
  - libboost-devel          1.84.0  h00ab1b0_3          conda-forge         
  - libboost-headers        1.84.0  ha770c72_3          conda-forge         
  - libexpat                 2.6.2  h59595ed_0          conda-forge         
  - libffi                   3.4.2  h7f98852_5          conda-forge         
  - libgcc-ng               14.1.0  h77fa898_0          conda-forge         
  - libgfortran-ng          14.1.0  h69a702a_0          conda-forge         
  - libgfortran5            14.1.0  hc5f4f2c_0          conda-forge         
  - libgomp                 14.1.0  h77fa898_0          conda-forge         
  - libnsl                   2.0.1  hd590300_0          conda-forge         
  - libsqlite               3.46.0  hde9e2c9_0          conda-forge         
  - libstdcxx-ng            14.1.0  hc0a3c3a_0          conda-forge         
  - libuuid                 2.38.1  h0b41bf4_0          conda-forge         
  - libxcrypt               4.4.36  hd590300_1          conda-forge         
  - libzlib                 1.2.13  h4ab18f5_6          conda-forge         
  - mpi                        1.0  openmpi             conda-forge         
  - ncurses                    6.5  h59595ed_0          conda-forge         
  - openmpi                  4.1.6  hc5af2df_101        conda-forge         
  - openssl                  3.3.1  h4ab18f5_1          conda-forge         
  - phyml             3.3.20220408  h37cc20f_2          bioconda            
  - pip                       24.0  pyhd8ed1ab_0        conda-forge         
  - python                  3.12.3  hab00c5b_0_cpython  conda-forge         
  - raxml-ng                 1.2.2  h6d1f11b_0          bioconda            
  - readline                   8.2  h8228510_1          conda-forge         
  - setuptools              70.1.1  pyhd8ed1ab_0        conda-forge         
  - tk                      8.6.13  noxft_h4845f30_101  conda-forge         
  - tzdata                   2024a  h0c530f3_0          conda-forge         
  - wheel                   0.43.0  pyhd8ed1ab_1        conda-forge         
  - xz                       5.2.6  h166bdaf_0          conda-forge         
  - zlib                    1.2.13  h4ab18f5_6          conda-forge         
  - zstd                     1.5.6  ha6fb4c9_0          conda-forge         

  Summary:

  Remove: 42 packages

  Total download: 0 B

──────────────────────────────────────────────────────────────────────────────


Confirm changes: [Y/n] Y
PackageRecord(_hash=-1232370165032959845, name='__unix', version='0', build='0', build_number=0, channel=Channel("@"), subdir='linux-64', fn='__unix', md5='12345678901234567890123456789012', package_type='virtual_system')

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/home/belinard/SOFTWARE_LOCAL/miniconda3/lib/python3.10/site-packages/conda/_vendor/boltons/setutils.py", line 247, in remove
        didx = self.item_index_map.pop(item)
    KeyError: PackageRecord(_hash=-1232370165032959845, name='__unix', version='0', build='0', build_number=0, channel=Channel("@"), subdir='linux-64', fn='__unix', md5='12345678901234567890123456789012', package_type='virtual_system')

@blinard-BIOINFO

One last point.

  • Both the ipk and epik recipes now request the same boost version constraint in meta.yaml.
  • These two packages use the Boost C++ library, which needs to be the same version in both. ipk produces a binary database, and epik loads it to perform taxonomic classification.

The temporary solution is to manually pin both to boost-cpp=1.85, a stricter version constraint.

@blinard-BIOINFO blinard-BIOINFO marked this pull request as ready for review July 9, 2024 11:34
@Juke34

Juke34 commented Jul 9, 2024

Usually we prefer to keep compatibility as broad as possible, so we would prefer boost-cpp >=1.67 over boost-cpp =1.85 unless there is a particular reason against it.
We also promote the use of strict channel priority, so I guess most users would not run into any issue.
The need to keep versions in sync between EPIK and IPK is, to me, a separate problem.
First, I'm not sure I understand the problem. Is the binary database that is produced affected by the boost-cpp version?
That seems strange to me, but in that case it should be clearly stated for EPIK that bugs can occur if the boost-cpp versions used by EPIK and IPK differ. Another solution is to store the boost-cpp version in the database built by IPK, and to check it when running EPIK, emitting a warning that explains the problem when the versions differ. Or both tools should be shipped together...
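
The version-stamp idea could look roughly like this. It is a language-agnostic sketch written in Python, not the actual IPK/EPIK format: the `MAGIC` signature, the header layout, and the `write_database` / `read_database` names are all hypothetical. In the real C++ tools the version would presumably come from the `BOOST_LIB_VERSION` macro in `<boost/version.hpp>`.

```python
import io
import struct

MAGIC = b"IPKDB"  # hypothetical file signature, not the real IPK format


def write_database(stream, payload, boost_version):
    """Write a header carrying the writer's Boost version, then the payload."""
    encoded = boost_version.encode("ascii")
    stream.write(MAGIC)
    stream.write(struct.pack("<B", len(encoded)))  # 1-byte length prefix
    stream.write(encoded)
    stream.write(payload)


def read_database(stream, runtime_boost_version):
    """Return (payload, warning); warning is None when the versions match."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a version-stamped database")
    (length,) = struct.unpack("<B", stream.read(1))
    stamped = stream.read(length).decode("ascii")
    warning = None
    if stamped != runtime_boost_version:
        warning = (
            "database was written with Boost %s but this reader uses Boost %s; "
            "deserialization may be unreliable" % (stamped, runtime_boost_version)
        )
    return stream.read(), warning


# Simulate a database written with 1.85.0 and read by a 1.84.0 build.
buffer = io.BytesIO()
write_database(buffer, b"serialized-bytes", "1.85.0")
buffer.seek(0)
payload, warning = read_database(buffer, "1.84.0")
print(warning)
```

Because the stamp sits in front of the serialized payload, the reader can warn before it hands any bytes to the deserializer.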

I do not see any short-term problem with pinning boost-cpp to 1.85 (for both the EPIK and IPK recipes), but if boost-cpp must be the same for EPIK and IPK, then problems will occur as soon as the version is changed in one of the recipes. Synchronizing dependency versions between recipes seems impossible to me.

What are your thoughts, @martin-g?

@blinard-BIOINFO

First, I'm not sure I understand the problem. Is the binary database that is produced affected by the boost-cpp version?

Indeed, because it is serialized via the Boost library. Boost guarantees correct de-serialization only with the same version of the library. The problem is that there is no good mechanism at de-serialization time to verify which library version was used at serialization time... We have been scratching our heads over this for some time, and the best solution we found was to ensure the same Boost headers version in the compile environment (I basically do the same in conda).

Initially, the packages were separated because IPK creates databases that can be used by different tools. For instance, the sherpas package that I created years ago can also load IPK-built databases, but for a different analysis purpose (virus recombination, while epik is for taxonomic classification).

I agree with you; ultimately the solution could be to merge all tools into one conda recipe. But then, to make the release mechanism work, we would have to create a "merged" GitHub repo, with its own releases, for a single conda package.

In the meantime, I would be happy to stick with boost-cpp=1.85. On our side, we need to discuss all this for future releases.

@martin-g

According to https://www.boost.org/doc/libs/1_85_0/libs/serialization/doc/todo.html#backversioning

Back Versioning
It has been suggested that a useful feature of the library would be the ability to create "older versions" of archives. Currently, the library permits one to make programs that are guaranteed the ability to load archives with classes of a previous version. But there is no way to save classes in accordance with a previous version. 

it should be possible to read data serialized with an older version.

I also don't like pinning to an exact version, but in this case it sounds like the best idea.

@Juke34 Juke34 merged commit 53d10f7 into bioconda:master Jul 10, 2024
6 checks passed
martin-g added a commit to martin-g/bioconda-recipes that referenced this pull request Jul 10, 2024
Pin boost to 1.85 as discussed at bioconda#49000

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
martin-g added a commit to martin-g/bioconda-recipes that referenced this pull request Jul 11, 2024
Pin boost to 1.85 as discussed at bioconda#49000

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
mencian pushed a commit that referenced this pull request Jul 11, 2024
* epik: add linux-aarch64 build

Pin boost to 1.85 as discussed at #49000

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* Use `long double` as a replacement for float128 on aarch64/arm64

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* Add osx-arm64

---------

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>