<p style="text-align:center;">
    <img src="https://raw.githubusercontent.com/skyplane-project/skyplane/main/docs/_static/logo-light-mode.png" width=500>
</p>

# Welcome to Skyplane!

## Skyplane enables fast data transfers between any cloud

Skyplane is a tool for blazingly fast bulk data transfers between object stores in the cloud. It provisions a fleet of VMs in the cloud to transfer data in parallel while using compression and bandwidth tiering to reduce cost.

Skyplane is:
1. 🔥 Blazing fast ([110x faster than AWS DataSync](https://skyplane.org/en/latest/benchmark.html))
2. 🤑 Cheap (4x cheaper than rsync)
3. 🌐 Universal (AWS, Azure and GCP)

You can use Skyplane to transfer data: 
* between object stores within a cloud provider (e.g. AWS us-east-1 to AWS us-west-2)
* between object stores across multiple cloud providers (e.g. AWS us-east-1 to GCP us-central1)
* between local storage and cloud object stores (experimental)

# Exercises

This notebook consists of 4 exercises:

1. Exercise 1: Copying data between AWS regions
2. Exercise 2: Let's try the same transfer with Skyplane
3. Exercise 3: Let's scale up transfer bandwidth with Skyplane
4. Exercise 4: We will now try another Skyplane feature - syncing data between AWS regions

# Learning outcomes 🎯

After completing this notebook, you will have:

1. An understanding of the Unix-inspired Skyplane interface
2. Transferred data for an ML workload between AWS S3 object stores, from US-EAST-1 (N. Virginia) to EU-WEST-1 (Ireland)
3. Compared and contrasted `aws s3 cp` with `skyplane cp`
4. Terminated the transfer and cleaned up state



# How to use this Tutorial

These notebooks serve as a guide to Skyplane. If you get stuck at any point, feel free to ping us on the `#skyplane` channel on the [Skycamp Slack](https://join.slack.com/t/skycamp2022/shared_invite/zt-1gsrgky1z-iSFVEEOMSUD7Dd7B5syCsA).

We will describe what we are doing in this notebook. The commands and example responses are included. We highly recommend that you open a terminal and run the commands yourself.

### 💻 - Run commands in an interactive terminal window

You can use this icon as a hint to know when to switch away from the current notebook and edit a file or open a terminal. We also have example outputs that you can use to ensure consistency. 


# How to open a Terminal

If you're using JupyterLab, you can open a terminal in your browser by going to `File -> New -> Terminal`.

# Preflight  - Initializing cloud credentials

Before we start this tutorial, we have a few pre-flight checks:

### Let's ensure we have the latest notebook

In [None]:
# Please run this cell
!git pull --quiet

### Update to the latest Skyplane pip package

In [None]:
# Please run this cell
!pip install -U "git+https://github.com/skyplane-project/skyplane.git@skycamp-tutorial#egg=skyplane[aws]"

### Configure Skyplane with AWS credentials. 

#### <span style="color:red">Choose `Y` only for AWS, and `n` for GCP and Azure.</span>

💻 `skyplane init`

```
 _____ _   ____   _______ _       ___   _   _  _____ 
/  ___| | / /\ \ / / ___ \ |     / _ \ | \ | ||  ___|
\ `--.| |/ /  \ V /| |_/ / |    / /_\ \|  \| || |__  
 `--. \    \   \ / |  __/| |    |  _  || . ` ||  __| 
/\__/ / |\  \  | | | |   | |____| | | || |\  || |___ 
\____/\_| \_/  \_/ \_|   \_____/\_| |_/\_| \_/\____/

03:37:54 [DEBUG] Found existing configuration file at /root/.skyplane/config, 
loading

(1) Configuring AWS:
    Do you want to configure AWS support in Skyplane? [Y/n]:
    Loaded AWS credentials from the AWS CLI [IAM access key ID: ...ZEXYJW]
    AWS region config file saved to /root/.skyplane/aws_config

(2) Configuring Azure:
    Do you want to configure Azure support in Skyplane? [Y/n]: n
    Disabling Azure support

(3) Configuring GCP:
    Do you want to configure GCP support in Skyplane? [Y/n]: n
    Disabling Google Cloud support

Config file saved to /root/.skyplane/config
To disable performance logging info: 
https://skyplane.org/en/latest/performance_stats_collection.html
```


> **💡 Hint** - If you run into any issues, please contact one of the Skyplane team members immediately. The rest of the tutorial depends on this step completing successfully.

# Transferring Data with Skyplane

<p style="text-align:center;">
    <img src="./assets/skycamp-art.png" width=700>
</p>

The core of Skyplane is the `cp` command. Suppose you want to train the Stable Diffusion neural network. You’ve just gotten a fresh batch of the [LAION](https://laion.ai/blog/laion-400-open-dataset/) dataset, but it is in us-east-1, while your network is in eu-west-1. Skyplane can help you efficiently transfer this data to wherever your neural network is so that you can train your model. Let’s prepare for a transfer by first creating a bucket in the destination AWS region.

# Creating Buckets

### Setting up AWS in the Destination region

First, let’s create a bucket in your destination AWS region (eu-west-1) to store the data for the neural network.

> **💡 Hint** - Reminder to replace [name] with a unique string. e.g., "edcvr"

💻 `aws s3 mb s3://skycamp-diffusion-data-[name] --region eu-west-1`
```
make_bucket: skycamp-diffusion-data-[name]
```
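If you would rather generate the `[name]` suffix than invent one, the following is a minimal sketch. The helper `make_bucket_name` is hypothetical (it is not part of Skyplane or the AWS CLI), and the check covers only a conservative subset of the S3 naming rules: 3 to 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit.

```python
import re
import secrets

# Prefix taken from the tutorial command; the suffix replaces [name].
PREFIX = "skycamp-diffusion-data-"


def make_bucket_name(suffix=None):
    """Return PREFIX + suffix, generating a random 6-character hex suffix if none is given.

    Hypothetical helper for this tutorial, not a Skyplane or AWS API.
    """
    if suffix is None:
        suffix = secrets.token_hex(3)  # 3 random bytes -> 6 lowercase hex characters
    name = PREFIX + suffix
    # Conservative subset of S3 naming rules: 3-63 characters, lowercase letters,
    # digits, and hyphens, starting and ending with a letter or digit.
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", name):
        raise ValueError(f"invalid bucket name: {name!r}")
    return name


print(make_bucket_name("edcvr"))  # skycamp-diffusion-data-edcvr
print(make_bucket_name())         # random suffix, different on each run
```

Pass the printed name to `aws s3 mb` in place of `skycamp-diffusion-data-[name]`, and reuse the same name in the later exercises.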

# Exercise 1: Copying data between AWS regions

Transferring data between AWS regions with `aws s3 cp`

Each cloud provider has dedicated tools to move data between cloud regions. Let’s try transferring the data using AWS’s built-in `cp` command:

💻 `aws s3 cp --recursive s3://laion-400m s3://skycamp-diffusion-data-[name]`

```
copy: s3://stable-diffusion-data/part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet to s3://skycamp-anton-diffusion-data/part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
copy: s3://stable-diffusion-data/part-00002-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet to s3://skycamp-anton-diffusion-data/part-00002-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
copy: s3://stable-diffusion-data/part-00005-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet to s3://skycamp-anton-diffusion-data/part-00005-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
copy: s3://stable-diffusion-data/part-00008-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet to s3://skycamp-anton-diffusion-data/part-00008-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
copy: s3://stable-diffusion-data/part-00001-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet to s3://skycamp-anton-diffusion-data/part-00001-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
Completed 4.2 GiB/~15.7 GiB (36.9 MiB/s) with ~12 file(s) remaining (calculating...)

```

### This will take a long time to complete. Feel free to interrupt the command. Notice that it copies data at only tens of MiB/s (36.9 MiB/s in the sample output above).
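To get a feel for "a long time", here is a rough back-of-the-envelope estimate. It assumes the full dataset is about 35.9 GiB (the size Skyplane reports for the same bucket in the next exercise) and that the 36.9 MiB/s rate from the sample output stays constant; both are assumptions, and your own run will differ.

```python
def transfer_seconds(size_gib, rate_mib_per_s):
    """Seconds needed to move size_gib gibibytes at a constant rate in MiB/s."""
    return size_gib * 1024 / rate_mib_per_s  # 1 GiB = 1024 MiB


# 35.9 GiB at the 36.9 MiB/s shown in the sample `aws s3 cp` output
seconds = transfer_seconds(35.9, 36.9)
print(f"{seconds / 60:.1f} minutes")  # 16.6 minutes
```

Swap in the rate you observe in your own terminal to see how long your copy would take if you let it finish.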


# Exercise 2: Let's try the same transfer with Skyplane

💻 `skyplane cp --recursive s3://laion-400m s3://skycamp-diffusion-data-[name]`

```
 _____ _   ____   _______ _       ___   _   _  _____
/  ___| | / /\ \ / / ___ \ |     / _ \ | \ | ||  ___|
\ `--.| |/ /  \ V /| |_/ / |    / /_\ \|  \| || |__
 `--. \    \   \ / |  __/| |    |  _  || . ` ||  __|
/\__/ / |\  \  | | | |   | |____| | | || |\  || |___
\____/\_| \_/  \_/ \_|   \_____/\_| |_/\_| \_/\____/


Will transfer 32 objects totaling 35.90GB from aws:us-east-1 to aws:us-east-1
    VMs to provision: 1x aws:us-east-1
    part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet => part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
    part-00001-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet => part-00001-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
    ...
    part-00030-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet => part-00030-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet

Continue? [Y/n]: Y
Transfer starting (Tip: Enable auto-confirmation with `skyplane config set autoconfirm true`)

Storing debug information for transfer in /tmp/skyplane/transfer_logs/20221018_205648/client.log
✓ Initializing cloud keys (3/3) in 2.84s
✓ Provisioning gateway instances (1/1) in 34.48s
✓ Installing gateway package (1/1) in 12.41s
🚀 35.90GB transfer job launched
  Transfer progress (completing multi-part uploads) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35.9/35.9 GiB 602.0 MB/s 0:00:00
✓ Deprovisioning instances (1/1) in 2.42s
⠙ Verifying all files were copied

✅ Transfer completed successfully
Transfer runtime: 61.94s, Throughput: 4.64Gbps

```


##  💡  Observe that Skyplane significantly reduces the time to move data
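The two tools report speed in different units: `aws s3 cp` prints MiB/s, while Skyplane prints Gbps. The sketch below converts both to Gbps using the numbers from the sample outputs above. It assumes Skyplane's "GB" means decimal gigabytes (10^9 bytes), which is consistent with the reported 4.64 Gbps. The sample outputs come from separate runs, so treat the comparison as indicative only.

```python
def gbps_from_transfer(size_gb, seconds):
    """Average throughput in gigabits per second for size_gb decimal gigabytes."""
    return size_gb * 8 / seconds  # 8 bits per byte


def gbps_from_mib_per_s(rate_mib_per_s):
    """Convert a rate in MiB/s (as printed by `aws s3 cp`) to Gbps."""
    return rate_mib_per_s * 1024 * 1024 * 8 / 1e9


# Skyplane sample output: 35.90GB in 61.94s
print(f"{gbps_from_transfer(35.90, 61.94):.2f} Gbps")  # 4.64 Gbps

# aws s3 cp sample output: 36.9 MiB/s
print(f"{gbps_from_mib_per_s(36.9):.2f} Gbps")  # 0.31 Gbps
```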

# Exercise 3: Let's scale up transfer bandwidth with Skyplane

💻 `skyplane cp -n 4 --recursive s3://laion-400m s3://skycamp-diffusion-data-[name]`

```
 _____ _   ____   _______ _       ___   _   _  _____ 
/  ___| | / /\ \ / / ___ \ |     / _ \ | \ | ||  ___|
\ `--.| |/ /  \ V /| |_/ / |    / /_\ \|  \| || |__  
 `--. \    \   \ / |  __/| |    |  _  || . ` ||  __| 
/\__/ / |\  \  | | | |   | |____| | | || |\  || |___ 
\____/\_| \_/  \_/ \_|   \_____/\_| |_/\_| \_/\____/


Will transfer 32 objects totaling 53.75GB from aws:us-east-1 to aws:eu-west-1
    VMs to provision: 4x aws:eu-west-1, 4x aws:us-east-1
    Estimated egress cost: $1.08 at $0.02/GB
    part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet => part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
    part-00001-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet => part-00001-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
    ...
    part-00031-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet => part-00031-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet
Continue? [Y/n]: Y
Transfer starting (Tip: Enable auto-confirmation with `skyplane config set autoconfirm true`)

Storing debug information for transfer in /tmp/skyplane/transfer_logs/20221019_020117/client.log
✓ Initializing cloud keys (3/3) in 3.79s
✓ Provisioning gateway instances (8/8) in 43.82s
✓ Installing gateway package (8/8) in 28.26s
🚀 53.75GB transfer job launched
  Transfer progress (completing multi-part uploads) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.8/53.8 GiB 2.0 GB/s 0:00:00
Compression saved 0.20% of egress fees
✓ Deprovisioning instances (8/8) in 6.62s
⠸ Verifying all files were copied

✅ Transfer completed successfully
Transfer runtime: 33.09s, Throughput: 12.99Gbps
```

##  💡  Observe that Skyplane enables you to scale up transfer throughput by parallelizing the transfer across multiple VMs
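The plan that Skyplane prints before the transfer can be checked by hand. The sketch below reproduces the egress estimate and the gateway count from the sample output above. The function names are hypothetical, and the per-region interpretation of `-n 4` is inferred from the "4x aws:eu-west-1, 4x aws:us-east-1" line rather than from Skyplane's documentation.

```python
from decimal import Decimal, ROUND_HALF_UP


def egress_cost(size_gb, price_per_gb):
    """Egress cost in dollars, rounded half-up to the nearest cent.

    Arguments are strings so that Decimal keeps them exact.
    """
    cost = Decimal(size_gb) * Decimal(price_per_gb)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def total_gateways(vms_per_region, regions=2):
    """Gateway VMs provisioned when each region gets vms_per_region instances."""
    return vms_per_region * regions


# 53.75GB at $0.02/GB, as shown in the sample output
print(egress_cost("53.75", "0.02"))  # 1.08

# -n 4 across the source and destination regions
print(total_gateways(4))  # 8
```

More gateways raise throughput but do not change the egress estimate, which depends only on the volume of data leaving the source region and the per-GB price.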

---

# Exercise 4: We will now try another Skyplane feature - syncing data between AWS regions
Now, let’s suppose that you have a bucket storing a backup of your dataset, and that it already contains some of the data.


Let’s use Skyplane’s `sync` command to update the backup, and again note the time that it takes.


💻 `skyplane sync s3://laion-400m s3://skycamp-diffusion-data-[name]`

```
 _____ _   ____   _______ _       ___   _   _  _____ 
/  ___| | / /\ \ / / ___ \ |     / _ \ | \ | ||  ___|
\ `--.| |/ /  \ V /| |_/ / |    / /_\ \|  \| || |__  
 `--. \    \   \ / |  __/| |    |  _  || . ` ||  __| 
/\__/ / |\  \  | | | |   | |____| | | || |\  || |___ 
\____/\_| \_/  \_/ \_|   \_____/\_| |_/\_| \_/\____/

⠹ Querying objects in skycamp-diffusion-data-anton
No objects need updating. Exiting...
```
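The idea behind a sync is to list both buckets and copy only what is missing or different at the destination. The following is a minimal sketch of that idea and not Skyplane's actual implementation: the function `objects_to_copy` is hypothetical, buckets are modeled as plain dictionaries from object key to size, and real tools typically also compare modification times or checksums.

```python
def objects_to_copy(source, destination):
    """Return the keys that are missing at the destination or differ in size.

    Both arguments map object key -> size in bytes.
    """
    return sorted(
        key for key, size in source.items()
        if destination.get(key) != size
    )


source = {"part-00000.parquet": 100, "part-00001.parquet": 200, "part-00002.parquet": 300}
backup = {"part-00000.parquet": 100, "part-00001.parquet": 150}

# One object has a different size and one is missing from the backup.
print(objects_to_copy(source, backup))  # ['part-00001.parquet', 'part-00002.parquet']

# An up-to-date backup needs nothing, matching "No objects need updating".
print(objects_to_copy(source, dict(source)))  # []
```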

---

## 💻 Terminate your cluster!

Finally, just to make sure that we don't have any instances running that might be burning up money, let's quickly deprovision everything.

💻 `skyplane deprovision`

```
No instances to deprovision
✓ Removing IPs from VPCs (4/4) in 2.05s

```

## 🎉 Congratulations! Your plane has now landed. Skyplane is an open-source project. Feel free to use Skyplane for all your data mobility needs!


#### Eager to learn more? 

#### Feel free to play around with the [Skyplane optimizer](https://optimizer.skyplane.org/), read our NSDI 2023 [paper](https://arxiv.org/abs/2210.07259), or browse through our GitHub [repository](https://github.com/skyplane-project/skyplane).

Acknowledgement: Thanks to [Skypilot](https://github.com/romilbhardwaj/skypilot-tutorial/) for the notebook template.