Conda does not use hard links to root packages #4881

Closed
legg33 opened this issue Mar 15, 2017 · 10 comments
Labels
locked [bot] locked due to inactivity source::community catch-all for issues filed by community members type::support neither a bug nor feature, is really just a user having questions or difficulty somewhere

Comments

@legg33

legg33 commented Mar 15, 2017

After installing the Anaconda distribution for Python 3.6, I run

conda create --name flowers numpy

and it starts to download packages from the Internet. I abort because I am on a metered connection. Shouldn't conda create hard links instead, since numpy is already available in the root environment?

When I run the following command

conda create --name flowers python=3.5

it downloads all the required packages, which is expected. Then I run

conda create --name flowers2 python=3.5

and it does not download everything again. Why doesn't this work with the packages present in the root environment?

This happens on my Ubuntu 16.10 and my Windows 7 machine.

@nehaljwani
Contributor

In Ubuntu, is the root environment created by the same user who is trying to create the flowers environment?

@legg33
Author

legg33 commented Mar 18, 2017

Yes, both are the same user (me), and Anaconda is installed in my home directory. Here is the full output:

conda create -n flowers numpy
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/MYUSER/anaconda3/envs/flowers:

The following NEW packages will be INSTALLED:

	mkl:        2017.0.1-0   
	numpy:      1.12.0-py36_0
	openssl:    1.0.2k-1     
	pip:        9.0.1-py36_1 
	python:     3.6.0-0      
	readline:   6.2-2        
	setuptools: 27.2.0-py36_0
	sqlite:     3.13.0-0     
	tk:         8.5.18-0     
	wheel:      0.29.0-py36_0
	xz:         5.2.2-1      
	zlib:       1.2.8-3      

Proceed ([y]/n)? y

mkl-2017.0.1-0 100% |################################| Time: 0:00:42   3.13 MB/s
openssl-1.0.2k 100% |################################| Time: 0:00:00   4.33 MB/s
readline-6.2-2 100% |################################| Time: 0:00:00   3.78 MB/s
sqlite-3.13.0- 100% |################################| Time: 0:00:01   3.68 MB/s
tk-8.5.18-0.ta 100% |################################| Time: 0:00:00   4.05 MB/s
xz-5.2.2-1.tar 100% |################################| Time: 0:00:00   3.66 MB/s
zlib-1.2.8-3.t 100% |################################| Time: 0:00:00   4.49 MB/s
python-3.6.0-0 100% |################################| Time: 0:00:05   3.15 MB/s
numpy-1.12.0-p 100% |################################| Time: 0:00:02   3.19 MB/s
setuptools-27. 100% |################################| Time: 0:00:00   3.50 MB/s
wheel-0.29.0-p 100% |################################| Time: 0:00:00   4.71 MB/s
pip-9.0.1-py36 100% |################################| Time: 0:00:00   3.80 MB/s
#
# To activate this environment, use:
# > source activate flowers
#
# To deactivate this environment, use:
# > source deactivate flowers
#

I decided to let it run once. Now I can run

conda remove --all -n flowers

and then again

conda create -n flowers numpy

This time the output is:

conda create -n flowers numpy
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/MYUSER/anaconda3/envs/flowers:

The following NEW packages will be INSTALLED:

	mkl:        2017.0.1-0   
	numpy:      1.12.0-py36_0
	openssl:    1.0.2k-1     
	pip:        9.0.1-py36_1 
	python:     3.6.0-0      
	readline:   6.2-2        
	setuptools: 27.2.0-py36_0
	sqlite:     3.13.0-0     
	tk:         8.5.18-0     
	wheel:      0.29.0-py36_0
	xz:         5.2.2-1      
	zlib:       1.2.8-3      

Proceed ([y]/n)? y
#
# To activate this environment, use:
# > source activate flowers
#
# To deactivate this environment, use:
# > source deactivate flowers
#

So the previously downloaded packages must be cached somewhere. The question is: why do the root packages not seem to be present in this cache?

The same thing happened the first time I ran

conda create -n flowers anaconda

I thought maybe it had just fetched a newer numpy package, but I had updated anaconda in my root environment, and when I created the first new environment with anaconda in it, it also downloaded every package.

@nehaljwani
Contributor

Could you please share the output of conda info -a?

@legg33
Author

legg33 commented Mar 18, 2017

Sure.

Current conda install:

               platform : linux-64
          conda version : 4.3.14
       conda is private : False
      conda-env version : 4.3.14
    conda-build version : not installed
         python version : 3.6.0.final.0
       requests version : 2.12.4
       root environment : /home/marvingee/anaconda3  (writable)
    default environment : /home/marvingee/anaconda3
       envs directories : /home/marvingee/anaconda3/envs
                          /home/marvingee/.conda/envs
          package cache : /home/marvingee/anaconda3/pkgs
                          /home/marvingee/.conda/pkgs
           channel URLs : https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/linux-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : None
           offline mode : False
             user-agent : conda/4.3.14 requests/2.12.4 CPython/3.6.0 Linux/4.8.0-39-generic debian/stretch/sid glibc/2.24
                UID:GID : 1000:1000

# conda environments:
#
openCvEnv                /home/marvingee/anaconda3/envs/openCvEnv
root                  *  /home/marvingee/anaconda3

sys.version: 3.6.0 |Anaconda 4.3.1 (64-bit)| (default...
sys.prefix: /home/marvingee/anaconda3
sys.executable: /home/marvingee/anaconda3/bin/python
conda location: /home/marvingee/anaconda3/lib/python3.6/site-packages/conda
conda-build: None
conda-env: /home/marvingee/anaconda3/bin/conda-env
conda-server: /home/marvingee/anaconda3/bin/conda-server
user site dirs: 

CIO_TEST: <not set>
CONDA_DEFAULT_ENV: <not set>
CONDA_ENVS_PATH: <not set>
LD_LIBRARY_PATH: <not set>
PATH: /home/marvingee/anaconda3/bin:/opt/texbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:~/.local/bin:
PYTHONHOME: <not set>
PYTHONPATH: <not set>

License directories:
	/home/marvingee/.continuum
	/home/marvingee/anaconda3/licenses
License files (license*.txt):
Package/feature end dates:

@legg33
Author

legg33 commented Apr 17, 2017

Here is some output showing that the inode numbers differ for the same library in the root environment and in another environment, so there are no hard links :(

ls -li ~/anaconda3/lib/libcurl.so*
 807828 lrwxrwxrwx 1 marvingee marvingee     16 Mär 15 22:25 anaconda3/lib/libcurl.so -> libcurl.so.4.4.0
 807829 lrwxrwxrwx 1 marvingee marvingee     16 Mär 15 22:25 anaconda3/lib/libcurl.so.4 -> libcurl.so.4.4.0
 807830 -rwxrwxr-x 1 marvingee marvingee 468319 Mär 15 22:25 anaconda3/lib/libcurl.so.4.4.0

ls -li ~/anaconda3/envs/openCvEnv/lib/libcurl.so*
 812698 lrwxrwxrwx 1 marvingee marvingee     16 Mär 18 09:08 anaconda3/envs/openCvEnv/lib/libcurl.so -> libcurl.so.4.4.0
 812699 lrwxrwxrwx 1 marvingee marvingee     16 Mär 18 09:08 anaconda3/envs/openCvEnv/lib/libcurl.so.4 -> libcurl.so.4.4.0
 812700 -rwxrwxr-x 1 marvingee marvingee 468319 Mär 18 09:08 anaconda3/envs/openCvEnv/lib/libcurl.so.4.4.0

@nehaljwani
Contributor

nehaljwani commented Apr 17, 2017

@MarvinGee That is because libcurl.so.4.4.0 is patched on installation, so it can no longer be the same file as the one you get after extracting the package. See the following:

(root) [thanos@titan ~]# cat $CONDA_PREFIX/pkgs/curl-7.52.1-0/info/has_prefix 
/home/ilan/minonda/conda-bld/curl_1485366318540/_b_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_ binary lib/libcurl.so.4.4.0
/opt/anaconda1anaconda2anaconda3 text bin/curl-config
/opt/anaconda1anaconda2anaconda3 text lib/libcurl.la
/opt/anaconda1anaconda2anaconda3 text lib/pkgconfig/libcurl.pc

@legg33
Author

legg33 commented Apr 18, 2017

Okay. Could you please explain what is supposed to get hard linked between the environments (for the same version of a library, of course), or point me to where I can read up on it?

Do the environments only share a common package cache, with installation copying or extracting the package from the cache into the new environment as a completely new file (and therefore a new inode)? Or is conda supposed to check whether the package is already installed in another existing environment and create a hard link to the appropriate inode?

@nehaljwani
Contributor

nehaljwani commented Apr 18, 2017

IIUC, conda first downloads the package tarball from whichever channel you ask it to use and keeps it in the directory /path/to/root/conda/environment/pkgs or in $HOME/.conda/pkgs. This location can be overridden by specifying a different directory with the config parameter pkgs_dirs. So you can do:

conda config --prepend pkgs_dirs /path/to/custom/dir/pkgs

This directory will then hold all the tarballs and their extracted contents. Each package tarball has the file info/files, which lists all files that are supposed to be installed into the conda environment. Conda attempts to hard link all of these files from the package cache into the current environment. However, if the package tarball contains a file info/has_prefix, then the files listed there are patched based on the placeholder string recorded in that file. These files will not have the same inode as the ones in the package cache directory, as they need to have environment-path-specific strings in them.
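The link-or-patch decision described above can be illustrated with a toy model. This is a minimal sketch and not conda's actual implementation: the package `demo-1.0-0`, the placeholder `/opt/placeholder_prefix`, and every path are made up, and only text-mode patching is modelled (binary files, which conda pads with null bytes, are left out).

```shell
#!/usr/bin/env bash
# Toy model of the install step. NOT conda's real code; all names are hypothetical.
work=$(mktemp -d)
pkg="$work/pkgs/demo-1.0-0"        # stands in for an extracted package in the cache
env="$work/envs/flowers"           # stands in for the target environment
placeholder="/opt/placeholder_prefix"

# Build a fake extracted package with one prefix-dependent file and one plain file.
mkdir -p "$pkg/info" "$pkg/bin" "$pkg/include"
printf 'prefix=%s\n' "$placeholder" > "$pkg/bin/demo-config"
printf '/* header */\n' > "$pkg/include/demo.h"
printf 'bin/demo-config\ninclude/demo.h\n' > "$pkg/info/files"
printf '%s text bin/demo-config\n' "$placeholder" > "$pkg/info/has_prefix"

# Install: files named in has_prefix are copied and patched, the rest are hard linked.
while IFS= read -r rel; do
    mkdir -p "$env/$(dirname "$rel")"
    if awk '{print $3}' "$pkg/info/has_prefix" | grep -qxF "$rel"; then
        sed "s|$placeholder|$env|g" "$pkg/$rel" > "$env/$rel"   # new file, new inode
    else
        ln "$pkg/$rel" "$env/$rel"                              # shares the cache inode
    fi
done < "$pkg/info/files"

# Report which installed files still share an inode with the cache.
while IFS= read -r rel; do
    if [ "$pkg/$rel" -ef "$env/$rel" ]; then
        echo "$rel: hard link to the package cache"
    else
        echo "$rel: patched copy"
    fi
done < "$pkg/info/files"
```

In this model only include/demo.h ends up sharing an inode with the cache; bin/demo-config becomes a separate file because its contents depend on the environment path.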

Now, looking at an example (on installation of the conda package: curl):

The files to be installed into the environment are:

(root) [thanos@titan ~]# cat $CONDA_PREFIX/pkgs/curl-7.52.1-0/info/files
bin/curl
bin/curl-config
include/curl/curl.h
include/curl/curlbuild.h
include/curl/curlrules.h
include/curl/curlver.h
include/curl/easy.h
include/curl/mprintf.h
include/curl/multi.h
include/curl/stdcheaders.h
include/curl/typecheck-gcc.h
lib/libcurl.a
lib/libcurl.la
lib/libcurl.so
lib/libcurl.so.4
lib/libcurl.so.4.4.0
lib/pkgconfig/libcurl.pc

The files which need to be patched are:

(root) [thanos@titan ~]# awk '{print $3}' $CONDA_PREFIX/pkgs/curl-7.52.1-0/info/has_prefix
lib/libcurl.so.4.4.0
bin/curl-config
lib/libcurl.la
lib/pkgconfig/libcurl.pc

If conda succeeds in creating hard links, then set(all files ∈ info/files) - set(all files ∈ info/has_prefix) should have the same inode as the extracted file in the package cache directory.

(root) [thanos@titan ~]# stat -c %i $CONDA_PREFIX/bin/curl
1193108
(root) [thanos@titan ~]# stat -c %i $CONDA_PREFIX/pkgs/curl-7.52.1-0/bin/curl
1193108
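The set difference mentioned above can be computed from the two metadata files. A rough sketch, assuming the file path is the last whitespace-separated field of each info/has_prefix line and that no path contains spaces; the function name `linkable_files` and the demo directory are invented for illustration:

```shell
#!/usr/bin/env bash
# linkable_files PKG_DIR: print the entries of info/files that are not
# listed in info/has_prefix, i.e. the files that can be hard linked as-is.
linkable_files() {
    local pkg_dir=$1
    if [ -f "$pkg_dir/info/has_prefix" ]; then
        # Take the path column of has_prefix and drop exact matches from info/files.
        awk '{print $NF}' "$pkg_dir/info/has_prefix" \
            | grep -vxFf - "$pkg_dir/info/files"
    else
        # No has_prefix file: nothing needs patching.
        cat "$pkg_dir/info/files"
    fi
}

# Demo on a hand-made metadata directory (a subset of the curl listing above;
# the first placeholder path is shortened and hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/info"
printf '%s\n' bin/curl bin/curl-config include/curl/curl.h lib/libcurl.so.4.4.0 \
    > "$demo/info/files"
printf '%s\n' \
    '/build/placeholder binary lib/libcurl.so.4.4.0' \
    '/opt/anaconda1anaconda2anaconda3 text bin/curl-config' \
    > "$demo/info/has_prefix"

linkable=$(linkable_files "$demo")
echo "$linkable"
```

Pointed at a real extracted package such as $CONDA_PREFIX/pkgs/curl-7.52.1-0, the printed paths are the candidates whose inode can be compared between the cache and the environment, as done for bin/curl above.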

The files which are present in info/has_prefix will have different inode numbers.

(root) [thanos@titan ~]# stat -c %i $CONDA_PREFIX/lib/libcurl.so.4.4.0
288620
(root) [thanos@titan ~]# stat -c %i $CONDA_PREFIX/pkgs/curl-7.52.1-0/lib/libcurl.so.4.4.0
1201180
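Instead of comparing inode numbers by eye, the shell can make the comparison directly. A small sketch, assuming bash, whose `-ef` test is true when two paths refer to the same device and inode; the function name `same_file` and the temporary files are made up for the demonstration:

```shell
#!/usr/bin/env bash
# same_file A B: succeed if A and B are the same file on disk
# (same device and inode), which is the case for hard links.
same_file() {
    [ "$1" -ef "$2" ]
}

# Demonstration on throwaway files (hypothetical names).
demo_dir=$(mktemp -d)
echo "payload" > "$demo_dir/original"
ln "$demo_dir/original" "$demo_dir/hardlink"   # second name for the same inode
cp "$demo_dir/original" "$demo_dir/copy"       # same bytes, new inode

if same_file "$demo_dir/original" "$demo_dir/hardlink"; then
    link_result="hard link"
else
    link_result="separate file"
fi
if same_file "$demo_dir/original" "$demo_dir/copy"; then
    copy_result="hard link"
else
    copy_result="separate file"
fi
echo "hardlink: $link_result"
echo "copy:     $copy_result"
```

Note that `-ef` follows symbolic links, so a symlink pointing at the same file also passes the test; for entries like lib/libcurl.so, compare the symlink target instead.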

Proof of patch:

(root) [thanos@titan ~]# diff <(xxd $CONDA_PREFIX/pkgs/curl-7.52.1-0/lib/libcurl.so.4.4.0) <(xxd $CONDA_PREFIX/lib/libcurl.so.4.4.0)
21177,21193c21177,21193
< 0052b80: 2f68 6f6d 652f 696c 616e 2f6d 696e 6f6e  /home/ilan/minon
< 0052b90: 6461 2f63 6f6e 6461 2d62 6c64 2f63 7572  da/conda-bld/cur
< 0052ba0: 6c5f 3134 3835 3336 3633 3138 3534 302f  l_1485366318540/
< 0052bb0: 5f62 5f65 6e76 5f70 6c61 6365 686f 6c64  _b_env_placehold
< 0052bc0: 5f70 6c61 6365 686f 6c64 5f70 6c61 6365  _placehold_place
< 0052bd0: 686f 6c64 5f70 6c61 6365 686f 6c64 5f70  hold_placehold_p
< 0052be0: 6c61 6365 686f 6c64 5f70 6c61 6365 686f  lacehold_placeho
< 0052bf0: 6c64 5f70 6c61 6365 686f 6c64 5f70 6c61  ld_placehold_pla
< 0052c00: 6365 686f 6c64 5f70 6c61 6365 686f 6c64  cehold_placehold
< 0052c10: 5f70 6c61 6365 686f 6c64 5f70 6c61 6365  _placehold_place
< 0052c20: 686f 6c64 5f70 6c61 6365 686f 6c64 5f70  hold_placehold_p
< 0052c30: 6c61 6365 686f 6c64 5f70 6c61 6365 686f  lacehold_placeho
< 0052c40: 6c64 5f70 6c61 6365 686f 6c64 5f70 6c61  ld_placehold_pla
< 0052c50: 6365 686f 6c64 5f70 6c61 6365 686f 6c64  cehold_placehold
< 0052c60: 5f70 6c61 6365 686f 6c64 5f70 6c61 6365  _placehold_place
< 0052c70: 686f 6c64 5f70 6c61 6365 686f 6c64 5f2f  hold_placehold_/
< 0052c80: 7373 6c2f 6361 6365 7274 2e70 656d 0000  ssl/cacert.pem..
---
> 0052b80: 2f63 6f6e 6461 2f73 736c 2f63 6163 6572  /conda/ssl/cacer
> 0052b90: 742e 7065 6d00 0000 0000 0000 0000 0000  t.pem...........
> 0052ba0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052bb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052bc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052bd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052be0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052bf0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c00: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c10: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c20: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c30: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c40: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c50: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c60: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c70: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 0052c80: 0000 0000 0000 0000 0000 0000 0000 0000  ................

@kalefranz
Contributor

#4253
