

SnowballDataOffload


Disclaimer

Sample code, software libraries, command line tools, proofs of concept, templates, or other related technology are provided as AWS Content or Third-Party Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content or Third-Party Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content or Third-Party Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content or Third-Party Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

Security is a top priority at AWS. Carefully consider security when deploying this solution. See also the AWS Snowball Edge Developer Guide section Security for AWS Snowball Edge and [Best Practices for Security](https://docs.aws.amazon.com/snowball/latest/developer-guide/best-practice-security.html).

Solution Overview

This Snowball Data Offload agent allows users to durably and efficiently offload data from on-premises network attached storage (NAS). AWS Snowball Edge data import jobs let users transfer large amounts of data to the AWS Cloud; however, AWS recommends that customers keep a copy of their data on premises in case a Snowball Edge device is lost or damaged in transit. This agent was built for the use case where a customer needs a "landing zone" that durably stores petabytes of data while the data import jobs are in transit to AWS. It automates pulling data from the source NAS and ensures that all data is replicated both to an Amazon S3 compatible storage cluster and to the data import Snowball. The agent automatically identifies files that still need to be copied and uses s5cmd to facilitate parallel transfers.
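The "identify files, then hand them to s5cmd" flow can be sketched as below. This is an illustrative model, not the agent's actual implementation: `files_to_copy` and `s5cmd_copy_command` are hypothetical helpers, and the bookkeeping set of already-copied paths stands in for whatever state the real agent keeps.

```python
import os
import shlex

def files_to_copy(source_dir, already_copied):
    """Walk the source tree and return relative paths not yet copied.

    `already_copied` is a set of relative paths known to exist on the
    destination (hypothetical bookkeeping; the real agent's logic may differ).
    """
    pending = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), source_dir)
            if rel not in already_copied:
                pending.append(rel)
    return sorted(pending)

def s5cmd_copy_command(source_dir, rel_path, bucket, endpoint):
    """Build one s5cmd invocation (s5cmd's --endpoint-url flag and cp verb)."""
    src = os.path.join(source_dir, rel_path)
    dst = "s3://{}/{}".format(bucket, rel_path)
    return "s5cmd --endpoint-url {} cp {} {}".format(
        shlex.quote(endpoint), shlex.quote(src), shlex.quote(dst))
```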

Architecture Diagram

(Architecture diagram: logical connectivity)

Installation

Prerequisites

Setup Amazon Linux 2 EC2 Instance

This EC2 instance will host the snowball data offload agent.

Setup Snowball Data Offload Agent

```shell
git clone https://github.com/aws-samples/data-offload
cd data-offload
pip3 install -r requirements.txt
```

Edit config.json for your environment

This configuration file is passed to main.py and describes how and where to copy the source data. Valid parameters are:

  • log_level: controls log verbosity. Valid values are debug, info, notice, warning, err, crit, alert, emerg.
  • num_workers: sets the size of the worker pool used for each destination.
  • reporting_frequency: how often, in seconds, the script produces a progress report.
  • source: directory containing the files to be offloaded.
  • destinations: JSON object that describes the target copy groups.
    • {groupx}: JSON object with an arbitrary name that describes one or more Snowball Edge devices. Each group receives a copy of the data in parallel.
      • type: describes the S3 interface of the target Snowball Edge devices. Valid values are s3adapter or s3compatible. See Amazon S3 Compatible Storage on AWS Snowball Edge Compute Optimized Devices Now Generally Available for more information on the differences.
      • snowballs: array of Snowball Edge devices that comprise the group.
        • bucket: the target S3 bucket name. This value must be the same for all devices in an s3compatible group.
        • endpoint: URL for the S3 service. Note that Amazon S3 compatible storage exposes a dedicated S3 API on port 443, while the S3 adapter runs on port 8443.
        • profile: name of the AWS profile containing credentials for the S3 service. See Configuration and credential file settings for more information.
        • name: human-readable name for the device.
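
As an illustration of these parameters, a minimal configuration might look like the following. The group names, bucket names, endpoints, and profile names are placeholders; consult the included config.json for the authoritative shape.

```json
{
  "log_level": "info",
  "num_workers": 16,
  "reporting_frequency": 60,
  "source": "/mnt/nas/export",
  "destinations": {
    "import_group": {
      "type": "s3adapter",
      "snowballs": [
        {
          "bucket": "import-bucket",
          "endpoint": "https://192.168.1.10:8443",
          "profile": "snowball-import",
          "name": "import-snowball-1"
        }
      ]
    },
    "landing_zone": {
      "type": "s3compatible",
      "snowballs": [
        {
          "bucket": "landing-bucket",
          "endpoint": "https://192.168.1.20:443",
          "profile": "landing-cluster",
          "name": "landing-node-1"
        }
      ]
    }
  }
}
```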

See the included config.json file for an example.

Run

```shell
export PATH="$HOME/go/bin:$PATH"   # make s5cmd (installed via the Go toolchain) visible
export AWS_REGION='snow'
python3 main.py --config_file config.json
```
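
Conceptually, the per-destination parallelism controlled by num_workers resembles a bounded worker pool driving one copy command per file. The sketch below is illustrative, not the agent's actual implementation; `copy_all` is a hypothetical helper, and in real use each argv would invoke s5cmd rather than a stand-in command.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def copy_all(commands, num_workers):
    """Run copy commands with a bounded pool, mirroring num_workers.

    Each command is a pre-built argv list (in the real agent this would
    invoke s5cmd). Returns the return codes in input order, so callers
    can retry or report any file whose copy failed.
    """
    def run(argv):
        return subprocess.run(argv, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run, commands))
```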
