Check files consistency between cloud providers storages #750

Open
nenkie76 opened this issue Dec 27, 2022 · 0 comments

nenkie76 commented Dec 27, 2022

Hi,

I've been experimenting with smart_open and can't figure out how to ensure that files are consistent when copying data between GCS and S3 (and vice versa).

from smart_open import open

# Read from GCS and stream the bytes to S3.
with open(uri=f"...", mode='rb', transport_params=dict(client=gcs_client)) as fin:
    with open(uri=f"...", mode='wb', transport_params=s3_tp) as fout:
        for line in fin:
            fout.write(line)

The ETags don't match (which is expected, I guess), but the files also differ in size when copied from GCS to S3.
gsutil shows a size of 1340495 bytes, and after copying to S3 it's 1291979 bytes (though the file itself seems OK).
I've tried turning off S3 multipart_upload, but that doesn't change the behaviour.

If I use the ordinary approach below to read and write files, the size of the file read from GCS matches the size written to S3, and I can build a validation process on that.

import io

for blob in blobs:
    # Download the whole GCS blob into memory, then upload it to S3 unchanged.
    buffer = io.BytesIO()
    blob.download_to_file(buffer)
    buffer.seek(0)
    s3_client.put_object(Body=buffer, Bucket='...', Key=blob.name)

Which mechanism can be used to validate file consistency after a copy?
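
To make the question concrete, here is a minimal sketch of the kind of check I have in mind: hash both objects as byte streams and compare digest and length, without relying on provider ETags. The helper names `stream_digest` and `streams_match` are hypothetical (they are not part of smart_open), and the `io.BytesIO` objects only stand in for the two cloud objects opened in 'rb' mode.

```python
import hashlib
import io


def stream_digest(stream, chunk_size=1024 * 1024):
    """Return (sha256 hex digest, byte count) for a binary file-like object."""
    digest = hashlib.sha256()
    total = 0
    # Read fixed-size chunks so large objects never need to fit in memory.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


def streams_match(source, copy, chunk_size=1024 * 1024):
    """True when both streams yield the same bytes (same digest and length)."""
    return stream_digest(source, chunk_size) == stream_digest(copy, chunk_size)


# Stand-ins for the two cloud objects (hypothetical data, not the real files).
original = io.BytesIO(b"example payload\n" * 1000)
copied = io.BytesIO(b"example payload\n" * 1000)
truncated = io.BytesIO(b"example payload\n" * 999)

print(streams_match(original, copied))     # identical content
original.seek(0)                           # rewind before reusing the stream
print(streams_match(original, truncated))  # one line shorter
```

This compares the bytes as they are read. As far as I understand, smart_open infers compression from the file extension by default, so passing `compression='disable'` when opening both sides may be needed to compare the stored bytes rather than the decompressed content; I haven't confirmed whether that explains the size difference above.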

PyDev console: 
macOS-13.1-arm64-arm-64bit
Python 3.10.5 (v3.10.5:f377153967, Jun  6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]
smart_open 6.3.0