<p style="text-align:center;">
    <img src="https://raw.githubusercontent.com/skyplane-project/skyplane/main/docs/_static/logo-light-mode.png" width=500>
</p>

# Welcome to Skyplane!

## Skyplane enables fast data transfers between any cloud

Skyplane is a tool for blazingly fast bulk data transfers between object stores in the cloud. It provisions a fleet of VMs in the cloud to transfer data in parallel while using compression and bandwidth tiering to reduce cost.

Skyplane is:
1. 🔥 Blazing fast ([110x faster than AWS DataSync](https://skyplane.org/en/latest/benchmark.html))
2. 🤑 Cheap (4x cheaper than rsync)
3. 🌐 Universal (AWS, Azure, GCP, IBMCloud, and Cloudflare R2)

You can use Skyplane to transfer data: 
* between object stores within a cloud provider (e.g. AWS us-east-1 to AWS us-west-2)
* between object stores across multiple cloud providers (e.g. AWS us-east-1 to GCP us-central1)
* between local storage and cloud object stores (experimental)

# Exercises

This notebook consists of 4 exercises:

1. Exercise 1: Setting up authentication and cloud buckets
2. Exercise 2: Copying data with Skyplane 
3. Exercise 3: Copying data to multiple destinations (multicast) 
4. Exercise 4: Cleanup

# Learning outcomes 🎯

After completing this notebook, you will have:

1. An understanding of the Skyplane API
2. Transferred data for an ML model from AWS S3 object stores in US-EAST-1 (N. Virginia) to EU-WEST-1 (Ireland)
3. Compared and contrasted `aws s3 cp` with Skyplane
4. Terminated the transfer and cleaned up state



# How to use this Tutorial

These notebooks serve as a guide to Skyplane. If you get stuck at any point, feel free to ping us on the `#skyplane` channel in the [Skycamp Slack](https://join.slack.com/t/skycamp2023/shared_invite/zt-25axzytwn-y5AR~Bx2nqm4Iec6jlq3JA).


### Update to the latest Skyplane pip package

In [None]:
!pip uninstall -y skyplane

In [None]:
# Please run this cell
!pip install -U "git+https://github.com/skyplane-project/skyplane.git@skycamp-tutorial-2023#egg=skyplane[aws]"

In [None]:
!pip install ipywidgets

# Transferring Data with Skyplane

<p style="text-align:center;">
    <img src="./assets/unicast.jpg" width=700>
</p>

The core of Skyplane is based around the `cp` command. Suppose you want to transfer a fine-tuned [Gorilla](https://github.com/ShishirPatil/gorilla) model from one region to another so that it is accessible to a cross-regional serving cluster. Skyplane can help you transfer this data efficiently so your model weights are accessible across multiple regions. Let’s prepare for a transfer by first initializing buckets in a few different cloud regions in AWS.

# Exercise 1: Setting up Authentication & Buckets 

## Authentication

In [None]:
AWS_ACCESS_KEY = "" #TODO 
AWS_SECRET_KEY = "" #TODO 

In [None]:
import skyplane 

client = skyplane.SkyplaneClient(
    aws_config=skyplane.AWSConfig(aws_access_key=AWS_ACCESS_KEY, aws_secret_key=AWS_SECRET_KEY)
)
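
If you would rather not paste keys directly into the notebook, one option is to read them from environment variables. The sketch below assumes the keys are exported as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (the names the AWS CLI and SDKs use); `load_aws_credentials` is a hypothetical helper, not part of Skyplane.

```python
import os


def load_aws_credentials(env=os.environ):
    """Return (access_key, secret_key) read from the standard AWS variables.

    Hypothetical helper for this tutorial; `env` can be any mapping, which
    makes the function easy to try out without touching the real environment.
    """
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    values = [env.get(name, "") for name in names]
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values[0], values[1]


# Possible usage in place of the hard-coded strings above:
# AWS_ACCESS_KEY, AWS_SECRET_KEY = load_aws_credentials()
```

The returned values can then be passed to `skyplane.AWSConfig` exactly as in the cell above.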

## Creating Buckets

### Setting up AWS in the Destination region

First, let’s create a bucket in the destination region `aws:ap-south-1` to store the model weights. 

> **💡 Hint** - Remember to replace `[name]` with a unique string, e.g. "edcvr".

In [None]:
bucket_name = "gorilla-weights-[name]"
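
S3 bucket names are globally unique and restricted to 3-63 characters of lowercase letters, digits, dots, and hyphens, so the literal placeholder `[name]` will be rejected. As a minimal sketch, the hypothetical helpers below generate a random suffix and run a simplified check of the name; the regular expression covers only the basic rules, not every S3 restriction (for example, it does not reject adjacent dots or IP-address-like names).

```python
import re
import secrets

# Simplified S3 naming check: 3-63 characters, lowercase letters, digits,
# dots and hyphens, beginning and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if `name` passes the simplified check above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


def unique_bucket_name(prefix):
    """Append 8 random hex characters to `prefix` to avoid name collisions."""
    return f"{prefix}-{secrets.token_hex(4)}"


# Possible usage:
# bucket_name = unique_bucket_name("gorilla-weights")
# assert is_valid_bucket_name(bucket_name)
```

A random suffix saves you from picking a string by hand, at the cost of a less memorable bucket name.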

We can create the bucket through Skyplane's API.

In [None]:
bucket_path = client.object_store().create_bucket(region="aws:ap-south-1", bucket_name=bucket_name)
bucket_path

# Exercise 2: Transferring data with Skyplane

We've set up the following source bucket that contains the model weights:

In [None]:
src_bucket_path = "s3://skycamp-demo-bucket/gorilla/"

We can copy this to the bucket we previously created with the Skyplane client:

> **⚠️ Warning** - Be careful not to interrupt the running cell, since it may lead to leaked instances. 

In [None]:
client.copy(src_bucket_path, bucket_path, recursive=True, max_instances=1)
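
Interrupting a Jupyter cell raises `KeyboardInterrupt` in the running code, so one way to reduce the risk of leaked instances is to wrap the transfer in `try`/`finally`. The sketch below uses a hypothetical `run_with_cleanup` helper that takes two callables. Whether running `skyplane deprovision` as the cleanup step recovers every instance after an interrupt is an assumption, so it is still worth checking your cloud console.

```python
def run_with_cleanup(transfer, cleanup):
    """Run transfer(), then always call cleanup(), even on Ctrl-C or errors.

    Both arguments are zero-argument callables. The original exception (if
    any) propagates after cleanup() has run.
    """
    try:
        return transfer()
    finally:
        cleanup()


# Possible usage with the objects defined in this notebook:
# import subprocess
# run_with_cleanup(
#     lambda: client.copy(src_bucket_path, bucket_path, recursive=True, max_instances=1),
#     lambda: subprocess.run(["skyplane", "deprovision"], check=False),
# )
```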

# Exercise 3: Transferring to multiple destinations
<p style="text-align:center;">
    <img src="./assets/multicast.jpg" width=700>
</p>

In some cases, data needs to be replicated to multiple destinations. For example, say you have some freshly trained model weights: you'll want to have them accessible across multiple regions as quickly as possible. In this example, we'll show how you can run a multicast (i.e. multi-destination) transfer using Skyplane. 

## Create a secondary region bucket

Let's create a second bucket in an additional destination region, `aws:eu-central-1`.

> **💡 Hint** - Remember to replace `[name]` with a unique string, e.g. "edcvr".

In [None]:
another_bucket_name = "gorilla-[name]"

In [None]:
another_bucket_path = client.object_store().create_bucket(region="aws:eu-central-1", bucket_name=another_bucket_name)
another_bucket_path

## Running a multicast transfer 
To run a multicast transfer, we can simply pass a list of destinations instead of a single destination.

In [None]:
client.copy(src_bucket_path, [bucket_path, another_bucket_path], recursive=True, max_instances=1)
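
After a multicast transfer you may want to confirm that every destination received every object. The sketch below is a hypothetical, pure-Python check: it assumes you have already listed the object keys in the source and in each destination (for example with boto3's `list_objects_v2`) and that keys are comparable across buckets. The key names in the example are made up.

```python
def missing_keys(source_keys, dest_keys_by_bucket):
    """Map each destination to the sorted list of source keys it lacks."""
    source = set(source_keys)
    return {
        dest: sorted(source - set(keys))
        for dest, keys in dest_keys_by_bucket.items()
    }


# Example with made-up keys: the second destination is missing one object.
report = missing_keys(
    ["weights/part-0.bin", "weights/part-1.bin"],
    {
        "dest-a": ["weights/part-0.bin", "weights/part-1.bin"],
        "dest-b": ["weights/part-0.bin"],
    },
)
print(report)
```

An empty list for a destination means no source key is missing there.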

# Exercise 4: Cleanup 
Finally, let's use the Skyplane API to delete the buckets we created.

In [None]:
client.object_store().delete_bucket(bucket_name, provider="aws")

In [None]:
client.object_store().delete_bucket(another_bucket_name, provider="aws")

Run the following to double-check that all instances are deprovisioned:

In [None]:
!skyplane deprovision

## 🎉 Congratulations! Your plane has now landed. Skyplane is an open-source project. Feel free to use Skyplane for all your data mobility needs!


#### Eager to learn more? 

#### Feel free to play around with the [Skyplane optimizer](https://optimizer.skyplane.org/), read our NSDI 2023 [paper](https://arxiv.org/abs/2210.07259), or browse through our GitHub [repository](https://github.com/skyplane-project/skyplane).

Acknowledgement: Thanks to [SkyPilot](https://github.com/romilbhardwaj/skypilot-tutorial/) for the notebook template.