conda 4.7.x new compression algorithm corrupts some files in some packages #8973

Closed
StrikerRUS opened this issue Jul 21, 2019 · 6 comments
Labels
locked [bot] locked due to inactivity

Comments

@StrikerRUS

Current Behavior

In conda 4.7.x, the new compression algorithm corrupts some files in some packages, which results in a _csv.Error: line contains NULL byte error. This behavior occurs only on Linux with Python 3.6. Other OSes and Python versions seem to be unaffected, and conda 4.6 is also unaffected.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/travis/miniconda/envs/test-env/lib/python3.6/site-packages/sklearn/datasets/base.py", line 739, in load_boston
    temp = next(data_file)
_csv.Error: line contains NULL byte

Refer to nilearn/nilearn#2079. Logs from a reproducible example are available here: https://travis-ci.org/microsoft/LightGBM/jobs/561754537. Here is a slightly more detailed pytest log from a more complex test:

________________________________ TestBasic.test ________________________________
self = <test_basic.TestBasic testMethod=test>
    def test(self):
>       X_train, X_test, y_train, y_test = train_test_split(*load_breast_cancer(True),
                                                            test_size=0.1, random_state=2)

../tests/python_package_test/test_basic.py:16: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../../miniconda/envs/test-env/lib/python3.6/site-packages/sklearn/datasets/base.py:456: in load_breast_cancer
    data, target, target_names = load_data(module_path, 'breast_cancer.csv')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
module_path = '/home/travis/miniconda/envs/test-env/lib/python3.6/site-packages/sklearn/datasets'
data_file_name = 'breast_cancer.csv'

    def load_data(module_path, data_file_name):
        """Loads data from module_path/data/data_file_name.
    
        Parameters
        ----------
        module_path : string
            The module path.
    
        data_file_name : string
            Name of csv file to be loaded from
            module_path/data/data_file_name. For example 'wine_data.csv'.
    
        Returns
        -------
        data : Numpy array
            A 2D array with each row representing one sample and each column
            representing the features of a given sample.
    
        target : Numpy array
            A 1D array holding target variables for all the samples in `data.
            For example target[0] is the target varible for data[0].
   
        target_names : Numpy array
            A 1D array containing the names of the classifications. For example
            target_names[0] is the name of the target[0] class.
        """
        with open(join(module_path, 'data', data_file_name)) as csv_file:
            data_file = csv.reader(csv_file)
>           temp = next(data_file)
E           _csv.Error: line contains NULL byte
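
The error means the CSV file on disk contains at least one zero byte. Below is a minimal sketch for confirming this independently of scikit-learn by scanning a directory for such files. The helper name find_files_with_nul_bytes and the default *.csv pattern are illustrative assumptions, not part of conda or scikit-learn.

```python
from pathlib import Path


def find_files_with_nul_bytes(root, pattern="*.csv"):
    """Return sorted paths under `root` matching `pattern` that contain a NUL byte.

    Hypothetical helper for diagnosing the error above; files are read in
    binary mode so no text decoding is involved.
    """
    corrupted = []
    for path in sorted(Path(root).rglob(pattern)):
        # A text CSV file should never contain b"\x00".
        if path.is_file() and b"\x00" in path.read_bytes():
            corrupted.append(path)
    return corrupted
```

For an intact package, calling this on the module_path/data directory from the traceback would be expected to return an empty list.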

Steps to Reproduce

conda config --set always_yes yes --set changeps1 no
conda update -q -y conda

conda create -q -y -n $CONDA_ENV python=$PYTHON_VERSION
source activate $CONDA_ENV
cd $BUILD_DIRECTORY

conda install -q -y -n $CONDA_ENV scikit-learn
python -c "from sklearn.datasets import load_boston; X, y = load_boston(True)"

Expected Behavior

No _csv.Error: line contains NULL byte is raised when reading files.

Environment Information

`conda info`

     active environment : test-env
    active env location : /home/travis/miniconda/envs/test-env
            shell level : 1
       user config file : /home/travis/.condarc
 populated config files : /home/travis/.condarc
          conda version : 4.7.9
    conda-build version : not installed
         python version : 3.7.3.final.0
       virtual packages : 
       base environment : /home/travis/miniconda  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/travis/miniconda/pkgs
                          /home/travis/.conda/pkgs
       envs directories : /home/travis/miniconda/envs
                          /home/travis/.conda/envs
               platform : linux-64
             user-agent : conda/4.7.9 requests/2.21.0 CPython/3.7.3 Linux/4.4.0-101-generic ubuntu/14.04.5 glibc/2.19
                UID:GID : 2000:2000
             netrc file : None
           offline mode : False

`conda config --show-sources`

==> /home/travis/.condarc <==
changeps1: False
always_yes: True

`conda list --show-channel-urls`

# packages in environment at /home/travis/miniconda/envs/test-env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
blas                      1.0                         mkl    defaults
ca-certificates           2019.5.15                     0    defaults
certifi                   2019.6.16                py36_0    defaults
intel-openmp              2019.4                      243    defaults
joblib                    0.13.2                   py36_0    defaults
libedit                   3.1.20181209         hc058e9b_0    defaults
libffi                    3.2.1                hd88cf55_4    defaults
libgcc-ng                 9.1.0                hdf63c60_0    defaults
libgfortran-ng            7.3.0                hdf63c60_0    defaults
libstdcxx-ng              9.1.0                hdf63c60_0    defaults
mkl                       2019.4                      243    defaults
mkl-service               2.0.2            py36h7b6447c_0    defaults
mkl_fft                   1.0.12           py36ha843d7b_0    defaults
mkl_random                1.0.2            py36hd81dba3_0    defaults
ncurses                   6.1                  he6710b0_1    defaults
numpy                     1.16.4           py36h7e9f1db_0    defaults
numpy-base                1.16.4           py36hde5b4d6_0    defaults
openssl                   1.1.1c               h7b6447c_1    defaults
pip                       19.1.1                   py36_0    defaults
python                    3.6.8                h0371630_0    defaults
readline                  7.0                  h7b6447c_5    defaults
scikit-learn              0.21.2           py36hd81dba3_0    defaults
scipy                     1.3.0            py36h7c811a0_0    defaults
setuptools                41.0.1                   py36_0    defaults
six                       1.12.0                   py36_0    defaults
sqlite                    3.29.0               h7b6447c_0    defaults
tk                        8.6.8                hbc83047_0    defaults
wheel                     0.33.4                   py36_0    defaults
xz                        5.2.4                h14c3975_4    defaults
zlib                      1.2.11               h7b6447c_3    defaults

@StrikerRUS
Author

StrikerRUS commented Jul 21, 2019

Take a look at the following screenshot (conda 4.6 vs conda 4.7):

image
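
A comparison like this can also be done programmatically. The following is a minimal sketch, assuming two directory trees holding the same package as installed by conda 4.6 and by conda 4.7. The function names and any paths passed to them are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(root_a, root_b):
    """Return sorted relative paths present in both trees whose contents differ."""
    a, b = file_digests(root_a), file_digests(root_b)
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])
```

Running differing_files on the two sklearn/datasets/data directories would list the data files whose bytes differ between the two installs. Files present in only one tree are ignored by this sketch.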

@msarahan
Contributor

We will re-convert this package and test it again. We have seen some corruption issues that may be associated with the behavior of the conversion process when we abort it. Thanks for reporting it.

@StrikerRUS
Author

@msarahan Thanks a lot for the info! I'll be happy to test the re-converted package ASAP.

@msarahan
Contributor

The fixed package is up. I've tested it and your reproducer now works. Thanks again for letting us know.

@StrikerRUS
Author

@msarahan I can confirm that it's working now. Thank you very much for the fast fix!

@github-actions

Hi there, thank you for your contribution to Conda!

This issue has been automatically locked since it has not had recent activity after it was closed.

Please open a new issue if needed.

@github-actions github-actions bot added the locked [bot] locked due to inactivity label Aug 29, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 29, 2021