gsutil breaks after updating to SDK 298 on OS X #1052

Closed
codefrau opened this issue Jun 23, 2020 · 33 comments

@codefrau

SDK 297 was fine

+ gcloud version
Google Cloud SDK 298.0.0
beta 2020.06.19
bq 2.0.58
core 2020.06.19
gsutil 4.51
kubectl 2020.05.01

+ gsutil -m rsync -r -c -x '^\.|.*\.js\.map$' . gs://croquet.io/

WARNING: You have requested checksumming but your crcmod installation isn't
using the module's C extension, so checksumming will run very slowly. For help
installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
module 'sys' has no attribute 'maxint'
CommandException: 1 files/objects could not be copied/removed.
+ echo 'Fixing metadata...'
Fixing metadata...
+ gsutil -m -q setmeta -h Content-Type:text/html -h 'Cache-Control:public, max-age=60' 'gs://croquet.io/**.html'
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 749, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2348, in run
    cls = copy.copy(class_map[caller_id])
  File "<string>", line 2, in __getitem__
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 753, in _callmethod
    self._connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 740, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused

This is on macOS Catalina 10.15.5:

$ gsutil version -l
gsutil version: 4.51
checksum: a4c57d9b2479f11efe1b0ffb6470c0c5 (OK)
boto version: 2.49.0
python version: 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
OS: Darwin 19.5.0
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): /Users/vanessa/.boto
gsutil path: /usr/local/google-cloud-sdk/bin/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False

The same command works fine again after reverting to 297 that I had installed previously.

@dilipped
Collaborator

I tried the rsync command you have mentioned above and it's working for me. Are you able to reproduce the issue?

@adambar

adambar commented Jun 24, 2020

I also have an issue with gcloud 298's gsutil on OS X. My error occurs when I run a cp operation and works fine again after downgrading to 297.

I've anonymized my output but it looks like:

/Users/secretuser/google-cloud-sdk/bin/gsutil -q cp -n testfile gs://bucket/hidden/testfile

File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gsutil.py", line 123, in RunMain
    sys.exit(gslib.__main__.main())
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 429, in main
    return _RunNamedCommandAndHandleExceptions(
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 767, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 625, in _RunNamedCommandAndHandleExceptions
    return command_runner.RunNamedCommand(command_name,
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1205, in RunCommand
    self.Apply(_CopyFuncWrapper,
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1485, in Apply
    caller_id = self._SetUpPerCallerState()
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1360, in _SetUpPerCallerState
    class_map[caller_id] = cls
  File "<string>", line 2, in __setitem__
  File "/Users/secretuser/.pyenv/versions/3.8.3/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/managers.py", line 850, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/secretuser/.pyenv/versions/3.8.3/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/managers.py", line 243, in serve_client
    request = recv()
  File "/Users/secretuser/.pyenv/versions/3.8.3/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 30, in <module>
    from gslib.command import Command
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/command.py", line 50, in <module>
    from gslib.cloud_api_delegator import CloudApiDelegator
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 26, in <module>
    from gslib.cs_api_map import ApiMapConstants
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/cs_api_map.py", line 23, in <module>
    from gslib.gcs_json_api import GcsJsonApi
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 72, in <module>
    from gslib.third_party.storage_apitools import storage_v1_client as apitools_client
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 26, in <module>
    class StorageV1(base_api.BaseApiClient):
  File "/Users/secretuser/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 38, in StorageV1
    _USER_AGENT += gslib.USER_AGENT
AttributeError: module 'gslib' has no attribute 'USER_AGENT'

@codefrau
Author

@dilipped yes I can reproduce. As soon as I update, it breaks:


$ gsutil -m rsync -r -c -x '^\.|.*\.js\.map$' . gs://croquet.io/
Building synchronization state...
Starting synchronization...

$ gcloud version
Google Cloud SDK 297.0.0
beta 2019.05.17
bq 2.0.58
core 2020.06.12
gsutil 4.51
kubectl 2020.05.01
Updates are available for some Cloud SDK components.  To install them,
please run:
  $ gcloud components update

$ sudo gcloud components update 


Your current Cloud SDK version is: 297.0.0
You will be upgraded to version: 298.0.0

┌─────────────────────────────────────────────────────────────────────────────┐
│                      These components will be updated.                      │
├─────────────────────────────────────────────────────┬────────────┬──────────┤
│                         Name                        │  Version   │   Size   │
├─────────────────────────────────────────────────────┼────────────┼──────────┤
│ BigQuery Command Line Tool (Platform Specific)      │     2.0.58 │  < 1 MiB │
│ Cloud SDK Core Libraries                            │ 2020.06.19 │ 15.0 MiB │
│ Cloud SDK Core Libraries (Platform Specific)        │ 2020.06.19 │  < 1 MiB │
│ Cloud Storage Command Line Tool (Platform Specific) │       4.51 │  < 1 MiB │
│ gcloud cli dependencies                             │ 2020.06.19 │  3.4 MiB │
└─────────────────────────────────────────────────────┴────────────┴──────────┘

...

Update done!

To revert your SDK to the previously installed version, you may run:
  $ gcloud components update --version 297.0.0

$ gsutil -m rsync -r -c -x '^\.|.*\.js\.map$' . gs://croquet.io/

WARNING: You have requested checksumming but your crcmod installation isn't
using the module's C extension, so checksumming will run very slowly. For help
installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
module 'sys' has no attribute 'maxint'
CommandException: 1 files/objects could not be copied/removed.
$ 

@chollinger93

This AttributeError: module 'gslib' has no attribute 'USER_AGENT' only happens when using the -m flag. It's caused by a missing attribute (USER_AGENT in gslib/__init__.py).

I was able to fix it by manually merging this commit: f8f00d0

Which simply adds the USER_AGENT variable back. Not sure why that is not in the official binary/archive.

Another issue is macOS and Python 3.8 specific, which of course, is a wonderful combination (I'm not bitter, you are!): https://bugs.python.org/issue33725 and #961 give some hints. This can be resolved by upgrading Python or by just glueing it together and hoping for the best: python/cpython@bc36696

After that I still got TypeError: cannot pickle '_io.TextIOWrapper', and this one is really funny.

It hits the dump() function in the multiprocessing library's reduction.py, where it is passed both a gsutil.cp process and a dict that starts with {'log_to_stderr': False, 'authkey'.... Apparently gsutil tries to start a dict as a process somehow.

I hence "fixed" this by adding:

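def dump(obj, file, protocol=None):
    '''Replacement for pickle.dump() using ForkingPickler.'''
    # Hack: silently skip dicts instead of pickling them. This drops the
    # data that was supposed to be sent, so it is a diagnostic, not a fix.
    if isinstance(obj, dict):
        return
    ForkingPickler(file, protocol).dump(obj)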

in multiprocessing.reduction.dump(), which is more of a joke than a fix. But it does tell me that somehow, this funky dict is generated somewhere. I'll just downgrade, but maybe one of the Google folks can look at that. Looks like a dict of what I assume are environment variables somehow makes its way into the process pool.
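
The same kind of TypeError can be reproduced outside gsutil. The following is a minimal sketch, assuming the underlying cause is that something handed to a worker process holds an open text stream: when multiprocessing has to pickle the data it sends to a child process, an open file object cannot be serialized. The dict keys `command` and `log_stream` are hypothetical and are not taken from gsutil.

```python
import os
import pickle


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


with open(os.devnull, "w") as handle:
    # Hypothetical state dict that happens to carry an open text stream.
    state = {"command": "cp", "log_stream": handle}
    error = try_pickle(state)

print(error)
```

If this diagnosis applies, the durable fix is to keep file handles out of whatever is sent to the worker rather than to skip pickling.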

@dilipped
Collaborator

@otter-in-a-suit Thanks for the information!

Regarding AttributeError: module 'gslib' has no attribute 'USER_AGENT': this is a known bug that we fixed in f8f00d0, after gsutil v4.51 was released. The fix has been merged and will be available in the gcloud SDK binary with the next gsutil release.

@codefrau The module 'sys' has no attribute 'maxint' error is raised from the crcmod-osx module. This looks like a bug in https://github.com/gsutil-mirrors/crcmod-osx, which calls sys.maxint, and maxint doesn't exist in Python 3. As a quick fix, you can try installing crcmod by following the steps here: https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod#macos
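
For context, Python 3 removed sys.maxint and kept only sys.maxsize. The snippet below is a small sketch of a shim that runs on both major versions. It illustrates the error message only and is not the actual crcmod-osx code; `MAX_INT` is a name chosen for this example.

```python
import sys

# Python 2 had sys.maxint; Python 3 removed it and keeps only sys.maxsize.
# getattr with a default works on both, so code written this way does not
# raise "module 'sys' has no attribute 'maxint'".
MAX_INT = getattr(sys, "maxint", sys.maxsize)

has_maxint = hasattr(sys, "maxint")
print("sys.maxint present:", has_maxint)
print("largest native int-sized value:", MAX_INT)
```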

@dweekly

dweekly commented Jul 13, 2020

Still broken in Cloud SDK 300.0.0 on macOS 10.15.5 (Python 3.8.3).

@sheurich

And 301.0.0 macOS 10.16 beta (Python 3.8.3).

@dilipped
Collaborator

Unfortunately, the fix for AttributeError: module 'gslib' has no attribute 'USER_AGENT' was not rolled out in the 301.0.0 release. It will be part of 302.0.0. Sorry for the delay.

@hartbeatnt

any estimate on when 302.0.0 will be released?

@dilipped
Collaborator

21st July, if nothing blocks the release

@hartbeatnt

hartbeatnt commented Jul 14, 2020

ok, thanks. For anyone else running into the AttributeError: module 'gslib' has no attribute 'USER_AGENT' issue, rolling back to a previous version will fix the problem until 302.0.0 is released:
gcloud components update --version 297.0.1

You might not need to go back that far, but I can verify that 291.0.1 works (at least for me).

@ttwd80

ttwd80 commented Jul 29, 2020

303.0.0 works for me. We can close this.

@codefrau
Author

codefrau commented Jul 29, 2020

Nope. In 303.0.0 I still get module 'sys' has no attribute 'maxint'. And the AttributeError: 'ForkAwareLocal' object has no attribute 'connection' is still there, too. Both were in my very first report.

@ttwd80

ttwd80 commented Jul 29, 2020

@codefrau what's the python version? Mine is Python 3.8.3.

@dilipped
Collaborator

I just wanted to point out that 303.0.0 only fixes the AttributeError: module 'gslib' has no attribute 'USER_AGENT' issue. The other two issues have not been fixed yet. For the maxint issue, the workaround is to install the crcmod library directly instead of relying on the one shipped with gsutil for macOS. Instructions can be found here: https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod

@LokeshNanda

In 303.0.0 I'm now getting TypeError: cannot pickle '_io.TextIOWrapper' object

@giovannibonetti

In 303.0.0 I'm now getting TypeError: cannot pickle '_io.TextIOWrapper' object

Please see #961 (comment). It solved the problem for me.

@Amzd

Amzd commented Aug 18, 2020

I have the same maxint error but cannot install crcmod with the instructions at https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod.

Could not find a version that satisfies the requirement crcmod (from versions: )

Update:

Fixed the above error by using pip2 instead of pip3.

But even with crcmod installed the way the instructions say, it still gives the same maxint error.

The MacOS description also states:

If for some reason the pre-compiled version is not being detected, please let the Google Cloud Storage team know

So hereby.

@dilipped
Collaborator

@Amzd which python version are you using? You can check that by doing gsutil ver -l. Make sure you are installing crcmod for the correct python version. If you have multiple Python binaries available on your system, it is possible that gcloud is running on one python version but the crcmod is getting installed for a different python version.

You can check your python path by running gcloud info. Then you can run <your python path> -m pip install crcmod to install crcmod for that particular python version.
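
To confirm which interpreter actually sees crcmod, a short script can be run with the Python path that gcloud info reports. This is a minimal sketch: `crcmod_status` is a hypothetical helper, and it assumes that the private flag `crcmod.crcmod._usingExtension` marks the compiled C extension as loaded, which is worth verifying against your installed crcmod.

```python
import importlib
import sys


def crcmod_status():
    """Report whether this interpreter can import crcmod and its C extension."""
    try:
        crcmod = importlib.import_module("crcmod")
    except ImportError:
        return {"python": sys.executable, "installed": False, "compiled": False}
    # Assumption: crcmod.crcmod._usingExtension is the private flag that
    # marks the C extension as loaded; a missing flag counts as not compiled.
    inner = getattr(crcmod, "crcmod", None)
    compiled = bool(getattr(inner, "_usingExtension", False))
    return {"python": sys.executable, "installed": True, "compiled": compiled}


status = crcmod_status()
print(status)
```

If `installed` is False for the interpreter gsutil uses but True for your default python3, crcmod was installed for the wrong Python.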

@Amzd

Amzd commented Aug 21, 2020

Ah okay, gsutil is using python3 but when I try to install crcmod with python3 I get the error:

Could not find a version that satisfies the requirement crcmod (from versions: )
No matching distribution found for crcmod

@codefrau
Author

Tried 314.0 today, still broken, I guess #1107 is not deployed yet?

@tmc

tmc commented Oct 22, 2020

This is still an issue

@dilipped
Collaborator

#1107 is not deployed yet. We are working on the release and it should be out by next week or the week after.
The PR does not address the crcmod issue. For the crcmod-related error, installing the library directly should resolve the issue: https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod

@maccman

maccman commented Oct 28, 2020

Can you update us when this is fixed? It's still an issue for me (after updating gcloud components).

@alvis

alvis commented Nov 8, 2020

I can confirm that manually installing crcmod is a valid workaround.

The only tricky thing is that you have to identify which Python gsutil is using, and hence the corresponding pip.
If you have configured the CLOUDSDK_PYTHON environment variable, the path is easy to identify. If not, check the Python version via gsutil version -l. 😉

@clintron

clintron commented Nov 9, 2020

Building on @alvis 's comment, you'll need to use Python 3.7 for now. It sounds like the issue has been fixed, but I don't know if the patch has made it into the current release yet:
https://bugs.python.org/issue33725
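
The linked Python issue is the one that led to the default multiprocessing start method on macOS changing from fork to spawn in Python 3.8. As a quick sketch, the following reports what a given interpreter uses, which can help when comparing a working and a broken setup. It only inspects the standard library and makes no claim about how gsutil configures its own processes.

```python
import multiprocessing
import sys

# Which start methods this platform supports, and which one is the default.
# Calling get_start_method() without arguments fixes the default for this
# process, which is harmless in a throwaway diagnostic script.
available = multiprocessing.get_all_start_methods()
default = multiprocessing.get_start_method()

print("platform:", sys.platform)
print("python:", sys.version.split()[0])
print("available start methods:", available)
print("default start method:", default)
```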

@max-sixty

I think this is solved now, could the maintainers confirm?

@codefrau
Author

The version in SDK 323.0.0 appears to work fine. I don't remember if I had to build crcmod or not, but it works for me.

The only annoyance is this warning which is printed on each invocation:

If you experience problems with multiprocessing on MacOS, they might be related to https://bugs.python.org/issue33725. You can disable multiprocessing by editing your .boto config or by adding the following flag to your command: -o "GSUtil:parallel_process_count=1". Note that multithreading is still available even if you disable multiprocessing.

I edited my .boto to silence it. I assume it will also go away with a newer Python version (I'm using the system default on Catalina, 3.6.5).
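
For reference, the -o "GSUtil:parallel_process_count=1" flag quoted in the warning names a section and an option, so the matching .boto entry would live under a [GSUtil] section. The sketch below only builds that fragment in memory and prints it, assuming the .boto file uses the usual INI layout. It does not touch the real config file.

```python
import configparser
import io

# Build the fragment in memory; nothing is written to the real ~/.boto.
config = configparser.ConfigParser()
config.optionxform = str  # keep option names exactly as written
config["GSUtil"] = {"parallel_process_count": "1"}

buffer = io.StringIO()
config.write(buffer)
fragment = buffer.getvalue()
print(fragment)
```

If your .boto already has a [GSUtil] section, add the option there instead of creating a second section.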

So I as the original reporter of this issue consider it fixed (yay!) but I'll leave it to the maintainers to decide if it's okay to close.

@martindufort

Getting this error with this Cloud SDK version:

Google Cloud SDK 325.0.0
beta 2021.01.22
bq 2.0.64
cloud-datastore-emulator 2.1.0
core 2021.01.22
gcloud 
gsutil 4.58

when trying to synchronize.

Building synchronization state...
Starting synchronization...
module 'sys' has no attribute 'maxint'

: python --version
Python 3.7.1

@dilipped
Collaborator

I will close this based on #1123.

@martindufort The maxint error seems to be caused by crcmod, which is a separate issue and is not related to the multiprocessing issue discussed here. Please install crcmod directly to fix it. You can refer to #1123 to learn more about the crcmod issue. The compiled crcmod library shipped with gsutil is broken for Python 3, and hence we recommend installing crcmod directly. Feel free to file a separate issue if that does not work for you.

Thanks!

@confiq

confiq commented Apr 17, 2021

@martindufort, I had the same problem: you'll need to update crcmod as stated before. On my machine™️, upgrading globally with pip3 install -U crcmod helped. Might help the next soul that arrives here from Google...

@Ali-Parandeh

I still have issues with this. Does anyone know how to fix it? Gsutil keeps hanging for me when I use the -m flag.

@berk94

berk94 commented Jan 10, 2023

I fixed the module 'sys' has no attribute 'maxint' error with the following steps:

  1. Run gcloud info
  2. Note down the Python Location as <python_location>
  3. Run <python_location> -m pip install crcmod

I'm using Homebrew Python, which is currently at v3.10.7, but when I ran gcloud info, I saw that the Python version was 3.9.14 (different from brew's current Python version). Directly running pip3 install -U crcmod did not work because it installed crcmod for Python 3.10, which isn't the Python used by gsutil. Hope this helps others who experience the same problem!
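
The three steps above can be sketched as a small helper that builds the install command for one specific interpreter, so that pip always belongs to the Python gsutil runs on. `crcmod_install_command` is a hypothetical name, and the current interpreter is used here only as a placeholder for the Python Location that gcloud info prints on your machine.

```python
import shlex
import sys


def crcmod_install_command(python_location):
    """Return the argv list that installs crcmod for one specific interpreter."""
    return [python_location, "-m", "pip", "install", "crcmod"]


# Placeholder: the current interpreter stands in for the "Python Location"
# that `gcloud info` reports on your machine.
command = crcmod_install_command(sys.executable)
print(" ".join(shlex.quote(part) for part in command))
```

Passing the list to subprocess.run(command, check=True) would perform the install. Printing it first makes it easy to confirm that the interpreter path is the one you expect.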
