barman-cloud-wal-restore Sometimes Unable to Restore Compressed Files #325

Closed
bonesmoses opened this issue Mar 3, 2021 · 4 comments

@bonesmoses

We have encountered an issue where restoring from an Amazon S3 location can result in this Boto3 error:

2021-03-02 14:53:35,174 [36298] ERROR: Barman cloud WAL restore exception: 'StreamingBody' object has no attribute 'tell'

And this does appear to be the case. In the Boto3 code, the StreamingBody object does not provide a tell method. Even though it is a "file-like" object, passing it to code that expects the full file interface will result in an error. We apparently do this in barman/cloud.py:

        # Write the dest file in binary mode
        with open(dest_path, 'wb') as dest_file:
            # If the file is not compressed, just copy its content
            if not decompress:
                shutil.copyfileobj(remote_file, dest_file)
                return

            if decompress == 'gzip':
                source_file = gzip.GzipFile(fileobj=remote_file, mode='rb')
            elif decompress == 'bzip2':
                source_file = bz2.BZ2File(remote_file, 'rb')
            else:
                raise ValueError("Unknown compression type: %s" % decompress)

            with source_file:
                shutil.copyfileobj(source_file, dest_file)

As the StreamingBody object only offers a read method, the "correct" use would be to avoid shutil.copyfileobj and write the file directly. I.e.:

dest_file.write(source_file.read())

# Or this

for chunk in iter(lambda: source_file.read(1048576), b""): # or some other configurable buffer size
    dest_file.write(chunk)

This appears to happen only when the object streamed from S3, or a class wrapping it (such as gzip.GzipFile), relies on the missing tell method, but the same failure can occur with other methods, such as seek. It may also depend on the boto3 version, as there is currently a bug discussion in issue 879 about adding IOBase to supply these missing methods. That issue is still open, though some pull requests have been accepted over its lifetime.

We should either require a certain Boto3 version, wrap the compressed IO classes ourselves so that they include IOBase or the methods shutil.copyfileobj expects, or write the files directly.

I'll also note that in the same barman/cloud.py we create a StreamingBodyIO class that extends RawIOBase, but this is not utilized for compressed streams.

@bonesmoses
Author

Upon further investigation, it is not sufficient to read() and write() directly, as this still causes the error. For older versions of boto3, this seems to be coming directly from the s3transfer library which was later folded into botocore. Further investigation is necessary from someone more familiar with Barman internals and S3 interactivity via Boto.

@Kamal-Villupuram

Any updates?

@pguser4ever

We are currently evaluating barman-cloud-backup and barman-cloud-restore/barman-cloud-wal-restore as a backup solution, and we are interested in your progress.

@mikewallace1979
Contributor

I spent some time researching this issue but unfortunately didn't manage to resolve it in time for 2.13. What I did learn is summarized below.

Firstly, I am only able to reproduce this issue with barman-cloud on Python 2.7 when restoring WALs which have been archived with gzip compression.

To reproduce against a local minio instance:

$ barman-cloud-wal-archive s3://mt-backups mt-primary /path/to/wals/0000000200000000/000000020000000000000074 --endpoint-url=http://localhost:9000 --gzip
$ barman-cloud-wal-restore s3://mt-backups mt-primary 000000020000000000000074 /tmp/000000020000000000000074 --endpoint-url=http://localhost:9000 -vv
...
2021-07-27 14:11:23,597 [3221] ERROR: Barman cloud WAL restore exception: 'StreamingBody' object has no attribute 'tell'
2021-07-27 14:11:23,597 [3221] DEBUG: Exception details:
Traceback (most recent call last):
  File "/Users/michael.wallace/src/EnterpriseDB/barman/barman/clients/cloud_walrestore.py", line 68, in main
    downloader.download_wal(config.wal_name, config.wal_dest)
  File "/Users/michael.wallace/src/EnterpriseDB/barman/barman/clients/cloud_walrestore.py", line 242, in download_wal
    self.cloud_interface.download_file(remote_name, wal_dest, compression)
  File "/Users/michael.wallace/src/EnterpriseDB/barman/barman/cloud_providers/aws_s3.py", line 253, in download_file
    shutil.copyfileobj(source_file, dest_file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 63, in copyfileobj
    buf = fsrc.read(length)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 267, in read
    self._read(readsize)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 294, in _read
    pos = self.fileobj.tell()   # Save current position
AttributeError: 'StreamingBody' object has no attribute 'tell'

The problem is actually in the python2 implementation of gzip which is why using an alternative to shutil.copyfileobj does not help.

If we use bzip2 we fail for a different reason (though fundamentally the issue is still that we provide something that a python2 compression implementation doesn't expect):

$ barman-cloud-wal-archive s3://mt-backups mt-primary /path/to/wals/0000000200000000/000000020000000000000074 --endpoint-url=http://localhost:9000 --bzip2
$ barman-cloud-wal-restore s3://mt-backups mt-primary 000000020000000000000074 /tmp/000000020000000000000074 --endpoint-url=http://localhost:9000 -vv
...
2021-07-27 14:17:09,874 [3553] ERROR: Barman cloud WAL restore exception: coercing to Unicode: need string or buffer, StreamingBody found
2021-07-27 14:17:09,874 [3553] DEBUG: Exception details:
Traceback (most recent call last):
  File "/Users/michael.wallace/src/EnterpriseDB/barman/barman/clients/cloud_walrestore.py", line 68, in main
    downloader.download_wal(config.wal_name, config.wal_dest)
  File "/Users/michael.wallace/src/EnterpriseDB/barman/barman/clients/cloud_walrestore.py", line 242, in download_wal
    self.cloud_interface.download_file(remote_name, wal_dest, compression)
  File "/Users/michael.wallace/src/EnterpriseDB/barman/barman/cloud_providers/aws_s3.py", line 248, in download_file
    source_file = bz2.BZ2File(remote_file, "rb")
TypeError: coercing to Unicode: need string or buffer, StreamingBody found

If the WAL is archived with no compression then the wal-restore is successful. If the wal-restore happens with barman-cloud on python3 then it is also successful.

Restoring the backup content itself is unaffected as it uses tarfile to decompress the stream rather than the gzip or bz2 modules.

Regarding gzip, the problematic code in python gzip is attempting to determine if we are at the end of a file by seeking to the end and comparing the position - I'm not sure this would be particularly desirable behaviour on any file-like object which represents a stream over the network. This issue was fixed in Python 3.2 which adds support for non-seekable file-like objects.

Regarding bzip2, the issue is simply that the implementation in Python 2.7 does not support reading from file-like objects - this was added in Python 3.3.
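
For anyone who wants to confirm the Python 3 behaviour without an S3 bucket, here is a minimal sketch. ReadOnlyStream and decompress_stream are hypothetical names made up for this example, not part of Barman or Boto3; ReadOnlyStream only imitates the relevant property of StreamingBody (it has read() but no tell() or seek()). The decompression branch mirrors the download code quoted earlier in this issue.

```python
import bz2
import gzip
import io
import shutil


class ReadOnlyStream(object):
    """Hypothetical stand-in for a network stream: exposes read() only."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


def decompress_stream(stream, compression):
    """Decompress a read-only stream into bytes with the stdlib modules."""
    if compression == "gzip":
        source_file = gzip.GzipFile(fileobj=stream, mode="rb")
    elif compression == "bzip2":
        source_file = bz2.BZ2File(stream, "rb")
    else:
        raise ValueError("Unknown compression type: %s" % compression)
    dest_file = io.BytesIO()
    with source_file:
        shutil.copyfileobj(source_file, dest_file)
    return dest_file.getvalue()


# Round-trip some dummy data through both compression formats.
payload = b"pretend this is a WAL segment" * 1000
gzip_result = decompress_stream(ReadOnlyStream(gzip.compress(payload)), "gzip")
bzip2_result = decompress_stream(ReadOnlyStream(bz2.compress(payload)), "bzip2")
```

On a current Python 3 both calls should return the original payload, which matches the observation that wal-restore succeeds there. The script itself is Python 3 only, since gzip.compress does not exist on Python 2.7.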

Assuming the only causes are indeed the Python 2 gzip and bzip2 implementations then the workaround would be either to upgrade to a supported Python 3 or to avoid using compression with barman-cloud-wal-archive on Python 2.
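
A guard along these lines could make the failure mode explicit rather than surfacing as an AttributeError or TypeError. This is only a sketch under the assumption that any Python 2 interpreter is affected; check_decompression_supported is a hypothetical helper name and is not taken from the Barman code base.

```python
import sys


def check_decompression_supported(compression, version_info=None):
    """Raise RuntimeError if a compressed WAL cannot be restored here.

    compression is None, "gzip" or "bzip2"; version_info defaults to the
    running interpreter and is only a parameter to make testing easier.
    """
    if version_info is None:
        version_info = sys.version_info
    if compression is None:
        # Uncompressed WALs restore fine on any Python version.
        return
    if compression not in ("gzip", "bzip2"):
        raise ValueError("Unknown compression type: %s" % compression)
    if version_info[0] < 3:
        raise RuntimeError(
            "Restoring %s-compressed WALs requires Python 3; "
            "upgrade Python or archive WALs without compression" % compression
        )
```

Calling check_decompression_supported("gzip") before opening the stream would fail fast on Python 2 and do nothing on Python 3.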

That being the case, I'm going to close this issue. However, if anybody experiences this issue with Python >= 3.6, please reopen it and share the details of your environment.

Note: The issue also affects barman-cloud-wal-restore when using the azure-blob-storage cloud provider so you will need to use the same workaround of either using Python 3 or archiving WALs without compression.

mikewallace1979 added a commit that referenced this issue Jul 27, 2021
Updates the barman-cloud-wal-archive man page and the argument
help strings to warn people that compression options should not be
used with older python versions (<3.2 for gzip, 3.3 for bzip2).

See issue #325 for the full context.
mikewallace1979 added a commit that referenced this issue Aug 4, 2021
Detects when barman-cloud-wal-restore is attempting to restore a
compressed WAL on python 2.x and returns an error message telling
the user to upgrade to a supported python 3.x.

See issue #325