Data delivery to public cloud storage bucket #150

Open
charlesbrandt opened this issue Feb 7, 2024 · 1 comment

Comments

@charlesbrandt

Some users need to work in a public cloud context with data managed in a Bioloop instance. For this feature, users specify the cloud resource (e.g. a storage bucket) and the keys needed to access it. Once the public cloud resource has been configured to grant the Bioloop instance's system user access, users can initiate a data transfer from Bioloop to the bucket with a "Push to Cloud" button.

[Image: Screenshot from 2024-02-07 17-50-52-annotated]

Each cloud service will need a separate worker that can handle the transfer. The keys provided by the cloud service can be tracked on a per-project basis in the Bioloop database. The worker will use those credentials to configure the cloud client and make the transfer. As an example, this is roughly what the process looks like for Amazon S3:

Install the AWS CLI

python -m pip install --user awscli

Configure it

cd /path/to/workers

aws configure
AWS Access Key ID [None]: [from file]
AWS Secret Access Key [None]: [from file]
Default region name [None]: 
Default output format [None]: 
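
A worker would probably not be able to answer the interactive `aws configure` prompts. As a rough sketch, the per-project keys could instead be handed to the CLI through the standard `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` environment variables. `ProjectCloudCredentials` and `build_aws_env` are hypothetical names for illustration, not part of Bioloop, and the real per-project schema may look different.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class ProjectCloudCredentials:
    """Hypothetical per-project record; Bioloop's actual schema may differ."""
    access_key_id: str
    secret_access_key: str
    region: Optional[str] = None


def build_aws_env(
    creds: ProjectCloudCredentials,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the AWS credential variables set.

    The result can be passed as `env=` to subprocess.run so that the keys
    apply to a single transfer and are not written to ~/.aws/credentials.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_ACCESS_KEY_ID"] = creds.access_key_id
    env["AWS_SECRET_ACCESS_KEY"] = creds.secret_access_key
    if creds.region:
        env["AWS_DEFAULT_REGION"] = creds.region
    return env
```

Keeping the keys in a per-call environment means two projects with different credentials can be served by the same worker without overwriting a shared profile.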

Verify access with a test command

aws s3 ls s3://bucket-name

Do the transfer (aws s3 sync expects a directory as its source; use aws s3 cp for a single file)

aws s3 sync [source-data-directory] s3://bucket-name/destination
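
The verify and transfer steps above could be wrapped by a worker roughly as follows. This is only a sketch that shells out to the same `aws s3 ls` and `aws s3 sync` commands; the function names (`s3_uri`, `build_ls_command`, `build_sync_command`, `push_to_bucket`) are made up for illustration, and it assumes the `aws` executable is on the worker's PATH.

```python
import subprocess
from typing import List, Mapping, Optional


def s3_uri(bucket: str, prefix: str = "") -> str:
    """Build an s3:// URI, dropping stray slashes around the prefix."""
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def build_ls_command(bucket: str) -> List[str]:
    """Command used to check that the credentials can list the bucket."""
    return ["aws", "s3", "ls", s3_uri(bucket)]


def build_sync_command(source_dir: str, bucket: str, prefix: str = "") -> List[str]:
    """Command that copies new or changed files from source_dir to the bucket."""
    return ["aws", "s3", "sync", source_dir, s3_uri(bucket, prefix)]


def push_to_bucket(
    source_dir: str,
    bucket: str,
    prefix: str = "",
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Verify access, then sync. Raises CalledProcessError if either step fails."""
    subprocess.run(build_ls_command(bucket), check=True, env=env)
    subprocess.run(build_sync_command(source_dir, bucket, prefix), check=True, env=env)
```

Passing the commands as argument lists (rather than one shell string) avoids quoting problems with paths that contain spaces.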

Note: Some buckets may be configured with write permissions but not allow subsequent changes to file names or locations. This can prevent moving or renaming files after the initial transfer.

@charlesbrandt

Ideally, when data is delivered to a public cloud bucket, the corresponding MD5 checksum values should be sent along with it so users can verify that the data was transferred successfully.
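
One way this might work is to write a checksum manifest next to the data before the transfer, so it is uploaded along with everything else. The sketch below computes MD5 digests with Python's `hashlib` and emits lines in the `md5sum` layout (digest, two spaces, relative path) so recipients could check a downloaded copy with `md5sum -c`. The helper names and the manifest file name are assumptions; Bioloop may already store checksums that could be reused instead of recomputed.

```python
import hashlib
from pathlib import Path
from typing import List


def file_md5(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest_lines(root: Path, manifest_name: str = "checksums.md5") -> List[str]:
    """Return one md5sum-style line per file under root, sorted by relative path."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue  # do not checksum the manifest itself
        relative = path.relative_to(root).as_posix()
        lines.append(f"{file_md5(path)}  {relative}")
    return lines


def write_manifest(root: Path, manifest_name: str = "checksums.md5") -> Path:
    """Write the manifest into root so it is transferred with the data."""
    manifest = root / manifest_name
    manifest.write_text("\n".join(build_manifest_lines(root, manifest_name)) + "\n")
    return manifest
```

After downloading, a user could run `md5sum -c checksums.md5` from the top of the delivered directory to confirm the files match.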
