
Bug in dataproc-initialization-actions/jupyter/ causes different version on master and worker #300

Closed
willbowditch opened this issue Jul 24, 2018 · 3 comments

@willbowditch

I'm getting the following error on a basic cluster initialised with the dataproc-initialization-actions/jupyter/ scripts.

Exception: Python in worker has different version 3.6 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

I've confirmed this is the case: on the master the Python version is 3.7.0 and on the workers it's 3.6.5.
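One way to confirm which interpreter each node will hand to PySpark is to query it directly on the master and on a worker (`PYSPARK_PYTHON` is the variable named in the error message; falling back to `python3` when it is unset is an assumption here, not something the init script guarantees):

```shell
# Run this on the master and on a worker; the reported versions should
# match in major.minor, but in this issue they are 3.7.x vs 3.6.x.
"${PYSPARK_PYTHON:-python3}" --version
```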

I've tracked down the issue to this segment of code:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/blob/dced28ebd3780307418789ac51bfe8303030cd9a/jupyter/jupyter.sh#L44-L52

conda defaults to upgrading Python to 3.7.0 when installing Jupyter, which causes the version mismatch.

I think either the init script should prevent conda from installing the new Python version, or perhaps it should upgrade the workers to match (I did this manually for now). I'm not familiar with conda (I use pip instead), so I'm unsure whether there is a straightforward fix for this.
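A hypothetical sketch of the first option: capture the minor version already installed on the node and pass it to conda as a pin, so installing Jupyter cannot bump the interpreter. The exact command line below is an assumption, not the init script's code (the sketch only prints the command it would run):

```shell
# Capture the major.minor version of the Python already on this node...
PY_VER=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
# ...and pin it, so `conda install jupyter` keeps that interpreter.
# On a real node you would execute this command instead of printing it.
echo "conda install -y jupyter python=${PY_VER}"
```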

@karth295 (Contributor)

Confirmed your theory:

  1. Create conda cluster
  2. Run conda install jupyter
The following packages will be UPDATED:

    asn1crypto:         0.24.0-py36_0          --> 0.24.0-py37_0         
    certifi:            2018.4.16-py36_0       --> 2018.4.16-py37_0      
    cffi:               1.11.5-py36h9745a5d_0  --> 1.11.5-py37h9745a5d_0 
    chardet:            3.0.4-py36h0f667ec_1   --> 3.0.4-py37_1          
    conda:              4.5.4-py36_0           --> 4.5.8-py37_0          
    cryptography:       2.2.2-py36h14c3975_0   --> 2.2.2-py37h14c3975_0  
    idna:               2.6-py36h82fb2a8_1     --> 2.7-py37_0            
    pip:                10.0.1-py36_0          --> 10.0.1-py37_0         
    pycosat:            0.6.3-py36h0a5515d_0   --> 0.6.3-py37h14c3975_0  
    pycparser:          2.18-py36hf9f622e_1    --> 2.18-py37_1           
    pyopenssl:          18.0.0-py36_0          --> 18.0.0-py37_0         
    pysocks:            1.6.8-py36_0           --> 1.6.8-py37_0          
    python:             3.6.5-hc3d631a_2       --> 3.7.0-hc3d631a_0      
    requests:           2.18.4-py36he2e5f8d_1  --> 2.19.1-py37_0         
    ruamel_yaml:        0.15.37-py36h14c3975_2 --> 0.15.42-py37h14c3975_0
    setuptools:         39.2.0-py36_0          --> 39.2.0-py37_0         
    six:                1.11.0-py36h372c433_1  --> 1.11.0-py37_1         
    sqlite:             3.23.1-he433501_0      --> 3.24.0-h84994c4_0     
    urllib3:            1.22-py36hbe7ace6_0    --> 1.23-py37_0           
    wheel:              0.31.1-py36_0          --> 0.31.1-py37_0         

Indeed Python is one of the packages that gets updated. I like the idea of preventing conda from upgrading other packages (--no-update-dependencies)

@karth295 (Contributor)

A different solution is to pin to a particular version of conda: https://stackoverflow.com/questions/51427175/error-while-running-pyspark-dataproc-job-due-to-python-version

@karth295 (Contributor)

For posterity, --no-update-dependencies does not work in this case.

A third possible solution is to install jupyter on all nodes so that the python environment starts out consistent between master and workers.
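A sketch of that third option, assuming the usual Dataproc init-action pattern of branching on the `dataproc-role` metadata key (the helper path and role name below are the standard Dataproc ones, but treat them as assumptions): drop the master-only guard so the install runs on every node. The sketch falls back to `Worker` and only prints its intent so it can run off-cluster:

```shell
# Dataproc exposes each node's role via instance metadata; init actions
# typically branch on it to do master-only work. Here we install on every
# role so master and workers keep identical conda environments.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role 2>/dev/null || echo Worker)
# No 'if [[ ${ROLE} == Master ]]' guard around the install step:
echo "would install jupyter on role: ${ROLE}"
```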

karth295 added a commit to karth295/dataproc-initialization-actions that referenced this issue Jul 30, 2018
karth295 added a commit to karth295/dataproc-initialization-actions that referenced this issue Aug 6, 2018
karth295 added a commit that referenced this issue Aug 6, 2018
Also, pin python version to version already installed

Fixes #300