<p style="text-align:center;">
    <img src="https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/skypilot-wide-light-1k.png" width=500>
</p>

# Welcome to SkyPilot!

SkyPilot is a framework for running machine learning workloads on any cloud.

SkyPilot makes it easy to use multiple clouds and reduce your cloud costs.

_Ease of use & productivity_
* **Run existing projects on the cloud** with zero code changes
* **Easily manage jobs** across multiple clusters
* **Automatic fail-over** to find scarce resources (GPUs) across regions and clouds
* **Store datasets on the cloud** and access them like you would on a local file system 
* **No cloud lock-in** – seamlessly run your code across different cloud providers (AWS, Azure or GCP)

_Cost saving_
* Run jobs on **spot instances** with **automatic recovery** from preemptions
* Hands-free cluster management: **automatically stopping idle clusters**
* One-click use of **TPUs**, for high-performance, cost-effective training
* Automatically benchmark and find the cheapest hardware for your job

# Learning outcomes 🎯

After completing this notebook, you will be able to:

1. Understand the basic SkyPilot YAML interface (`setup`, `run`).
2. Run a hello world task on a cloud of your choice.
3. SSH into your cluster for debugging and development.
4. Terminate the cluster and understand the cluster lifecycle.

# How to use this Tutorial

These notebooks serve as an **interactive** introduction to SkyPilot.

There are points in these notebooks where you may need to edit files outside the notebook and open a terminal to run some commands. These points will be highlighted with **two icons**:

### 📝 - Edit an external file
### 💻 - Run commands in an interactive terminal window

You can use these icons as a hint to know when to switch away from the current notebook and edit a file or open a terminal.

> **💡 Hint** - If you're using JupyterLab, you can open a terminal in your browser by going to `File -> New -> Terminal`.

# Preflight checks - verifying cloud credential setup

Before we start this tutorial, let's run `sky check` to make sure your credentials are set up correctly.

After running the cell below, you should have one or more clouds marked as `enabled`.

> **💡 Hint** - If you don't see any clouds enabled, please refer to the `00_installation` notebook or the [SkyPilot docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#cloud-account-setup) on how to set up your cloud accounts.

In [1]:
# Run this cell to check if your cloud accounts are set up to work with SkyPilot
! sky check

Checking credentials to enable clouds for SkyPilot.
  AWS: enabled
  Azure: enabled
  GCP: enabled

SkyPilot will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run sky check.

# Writing your first SkyPilot Task

A **task** in SkyPilot specifies the commands to run on the cloud, along with the resources required (e.g., GPUs, TPUs, number of nodes) and any dependencies (e.g., files, packages, and libraries).

Tasks in SkyPilot are defined as YAML files. Here is an example:

-----------------------------------
```yaml
# example.yaml
name: example

setup: |
  echo "Run any setup commands here"
  sudo apt install cowsay

run: |
  echo "Hello Stranger!"
  cowsay "Moo!"
```
----------------------------------- 

This defines a task with the following components:

* **setup**: commands that must be run before the task is executed. Here we install any dependencies for the task.

* **run**: commands that run the actual task.
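
The example above uses only `setup` and `run`. As a minimal sketch of how the resource requirements mentioned earlier might be declared, the snippet below writes a second task file from the shell. The file name `gpu_example.yaml`, the task name, and the choice of a single V100 GPU are illustrative assumptions and not part of this tutorial's files; check the `resources`, `accelerators`, and `num_nodes` fields against the SkyPilot YAML reference for your installed version.

```shell
# Write a hypothetical task file that also requests resources.
cat > gpu_example.yaml <<'EOF'
# gpu_example.yaml (illustrative; not part of this tutorial's files)
name: gpu-example

resources:
  accelerators: V100:1   # assumed GPU type and count

num_nodes: 1             # single-node task

setup: |
  echo "Run any setup commands here"

run: |
  nvidia-smi
EOF

# Show the file that was written.
cat gpu_example.yaml
```

Nothing is launched here; the snippet only creates the file. GPU instances cost considerably more than the CPU instance used in this notebook, so review the optimizer's estimate before launching a task like this.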

## 📝 Edit `example.yaml` to echo "Hello SkyPilot!"
Go ahead and open `example.yaml` and edit the `run` field so that it echoes "Hello SkyPilot!" instead of "Hello Stranger!".
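
If you prefer to make this edit from a terminal instead of an editor, the following is one possible sketch. It assumes `example.yaml` is in the current directory and still contains the original `Hello Stranger!` greeting; if the file is missing, it first recreates the version shown above.

```shell
# Create example.yaml only if it is missing, so an existing copy is not overwritten.
if [ ! -f example.yaml ]; then
cat > example.yaml <<'EOF'
# example.yaml
name: example

setup: |
  echo "Run any setup commands here"
  sudo apt install cowsay

run: |
  echo "Hello Stranger!"
  cowsay "Moo!"
EOF
fi

# Rewrite the greeting via a temporary file (works with both GNU and BSD sed).
sed 's/Hello Stranger!/Hello SkyPilot!/' example.yaml > example.yaml.tmp
mv example.yaml.tmp example.yaml

# Confirm the change.
grep -n 'Hello' example.yaml
```

Writing to a temporary file and moving it into place avoids `sed -i`, whose syntax differs between Linux and macOS.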

# Launching your first SkyPilot Task with `sky launch`

Once your task YAML is ready, you can run it on the cloud with `sky launch`.

## 💻 Launch your SkyPilot task!

In a terminal window, run:

-------------------------
```console
sky launch example.yaml
```
-------------------------

You'll notice that SkyPilot performs several actions for you:
#### **1. Find the lowest-priced VM instance type across different clouds**

SkyPilot runs its optimizer and presents you with the cheapest VM type that meets your resource requirements.

```console
(base) romilb@romilbx1yoga:skypilot-tutorial/01_hello_sky$ sky launch example.yaml 
Task from YAML spec: example.yaml
I 09-07 16:24:59 optimizer.py:605] == Optimizer ==
I 09-07 16:24:59 optimizer.py:617] Target: minimizing cost
I 09-07 16:24:59 optimizer.py:628] Estimated cost: $0.4 / hour
I 09-07 16:24:59 optimizer.py:628] 
I 09-07 16:24:59 optimizer.py:685] Considered resources (1 node):
I 09-07 16:24:59 optimizer.py:713] ---------------------------------------------------------------------
I 09-07 16:24:59 optimizer.py:713]  CLOUD   INSTANCE         vCPUs   ACCELERATORS   COST ($)   CHOSEN   
I 09-07 16:24:59 optimizer.py:713] ---------------------------------------------------------------------
I 09-07 16:24:59 optimizer.py:713]  AWS     m6i.2xlarge      8       -              0.38          ✔     
I 09-07 16:24:59 optimizer.py:713]  Azure   Standard_D8_v4   8       -              0.38                
I 09-07 16:24:59 optimizer.py:713]  GCP     n1-highmem-8     8       -              0.47                
I 09-07 16:24:59 optimizer.py:713] ---------------------------------------------------------------------
I 09-07 16:24:59 optimizer.py:713] 
Launching a new cluster 'sky-82ce-romilb'. Proceed? [Y/n]: Y
```

#### **2. Provision the cluster**

SkyPilot will set up a cluster with the requested resources and configure an SSH profile for it.


#### **3. Run the task's `setup` commands to prepare the cluster for running the task**

SkyPilot will run any commands specified in the `setup` field of the YAML on the VMs in the cluster. In this case, it will install the `cowsay` package.


#### **4. Run the task's `run` commands**

Finally, SkyPilot will run the commands specified in the `run` field. These commands can use any dependencies installed in the `setup` phase.

```console
(example pid=23346) Hello SkyPilot!
(example pid=23346)  ______
(example pid=23346) < Moo! >
(example pid=23346)  ------
(example pid=23346)         \   ^__^
(example pid=23346)          \  (oo)\_______
(example pid=23346)             (__)\       )\/\
(example pid=23346)                 ||----w |
(example pid=23346)                 ||     ||
INFO: Job finished (status: SUCCEEDED).
```

<details>
  <summary> <b>💡 Hint</b> - the full output will look like this (click here)</summary>
  

```console
(base) romilb@romilbx1yoga:/mnt/d/Romil/Berkeley/Research/skypilot-tutorial/01_hello_sky$ sky launch example.yaml 
Task from YAML spec: example.yaml
I 09-07 16:24:59 optimizer.py:605] == Optimizer ==
I 09-07 16:24:59 optimizer.py:617] Target: minimizing cost
I 09-07 16:24:59 optimizer.py:628] Estimated cost: $0.4 / hour
I 09-07 16:24:59 optimizer.py:628] 
I 09-07 16:24:59 optimizer.py:685] Considered resources (1 node):
I 09-07 16:24:59 optimizer.py:713] ---------------------------------------------------------------------
I 09-07 16:24:59 optimizer.py:713]  CLOUD   INSTANCE         vCPUs   ACCELERATORS   COST ($)   CHOSEN   
I 09-07 16:24:59 optimizer.py:713] ---------------------------------------------------------------------
I 09-07 16:24:59 optimizer.py:713]  AWS     m6i.2xlarge      8       -              0.38          ✔     
I 09-07 16:24:59 optimizer.py:713]  Azure   Standard_D8_v4   8       -              0.38                
I 09-07 16:24:59 optimizer.py:713]  GCP     n1-highmem-8     8       -              0.47                
I 09-07 16:24:59 optimizer.py:713] ---------------------------------------------------------------------
I 09-07 16:24:59 optimizer.py:713] 
Launching a new cluster 'sky-82ce-romilb'. Proceed? [Y/n]: Y
I 09-07 16:25:01 cloud_vm_ray_backend.py:2666] Creating a new cluster: "sky-82ce-romilb" [1x AWS(m6i.2xlarge)].
I 09-07 16:25:01 cloud_vm_ray_backend.py:2666] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 09-07 16:25:01 cloud_vm_ray_backend.py:902] To view detailed progress: tail -n100 -f /home/romilb/sky_logs/sky-2022-09-07-16-24-58-776744/provision.log
I 09-07 16:25:02 cloud_vm_ray_backend.py:1152] Launching on AWS us-east-1 (us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f)
I 09-07 16:26:26 log_utils.py:45] Head node is up.
I 09-07 16:27:27 cloud_vm_ray_backend.py:995] Successfully provisioned or found existing VM.
I 09-07 16:27:29 cloud_vm_ray_backend.py:1972] Running setup on 1 node.
Run any setup commands here

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  python-pip-whl python3-wheel
Use 'sudo apt autoremove' to remove them.
Suggested packages:
  filters cowsay-off
The following NEW packages will be installed:
  cowsay
0 upgraded, 1 newly installed, 0 to remove and 3 not upgraded.
Need to get 18.5 kB of archives.
After this operation, 93.2 kB of additional disk space will be used.
Get:1 http://us-east-1.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 cowsay all 3.03+dfsg2-7 [18.5 kB]
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Fetched 18.5 kB in 0s (294 kB/s)
Selecting previously unselected package cowsay.
(Reading database ... 146551 files and directories currently installed.)
Preparing to unpack .../cowsay_3.03+dfsg2-7_all.deb ...
Unpacking cowsay (3.03+dfsg2-7) ...
Setting up cowsay (3.03+dfsg2-7) ...
Processing triggers for man-db (2.9.1-1) ...
I 09-07 16:27:47 cloud_vm_ray_backend.py:1975] Setup completed.
I 09-07 16:27:50 cloud_vm_ray_backend.py:2040] Job submitted with Job ID: 1
I 09-07 23:27:51 log_lib.py:373] Start streaming logs for job 1.
INFO: Tip: use Ctrl-C to exit log streaming (task will not be killed).
INFO: Waiting for task resources on 1 node. This will block if the cluster is full.
INFO: All task resources reserved.
INFO: Reserved IPs: ['172.31.33.58']
(example pid=23346) Hello SkyPilot!
(example pid=23346)  ______
(example pid=23346) < Moo! >
(example pid=23346)  ------
(example pid=23346)         \   ^__^
(example pid=23346)          \  (oo)\_______
(example pid=23346)             (__)\       )\/\
(example pid=23346)                 ||----w |
(example pid=23346)                 ||     ||
INFO: Job finished (status: SUCCEEDED).
Shared connection to 18.234.228.139 closed.
I 09-07 16:27:53 cloud_vm_ray_backend.py:2053] Job ID: 1
I 09-07 16:27:53 cloud_vm_ray_backend.py:2053] To cancel the job:       sky cancel sky-82ce-romilb 1
I 09-07 16:27:53 cloud_vm_ray_backend.py:2053] To stream the logs:      sky logs sky-82ce-romilb 1
I 09-07 16:27:53 cloud_vm_ray_backend.py:2053] To view the job queue:   sky queue sky-82ce-romilb
I 09-07 16:27:53 cloud_vm_ray_backend.py:2162] 
I 09-07 16:27:53 cloud_vm_ray_backend.py:2162] Cluster name: sky-82ce-romilb
I 09-07 16:27:53 cloud_vm_ray_backend.py:2162] To log into the head VM: ssh sky-82ce-romilb
I 09-07 16:27:53 cloud_vm_ray_backend.py:2162] To submit a job:         sky exec sky-82ce-romilb yaml_file
I 09-07 16:27:53 cloud_vm_ray_backend.py:2162] To stop the cluster:     sky stop sky-82ce-romilb
I 09-07 16:27:53 cloud_vm_ray_backend.py:2162] To teardown the cluster: sky down sky-82ce-romilb
NAME             LAUNCHED     RESOURCES            STATUS  AUTOSTOP  COMMAND                  
sky-82ce-romilb  28 secs ago  1x AWS(m6i.2xlarge)  UP      -         sky launch example.yaml  
```
</details>
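
The tail of the launch output lists follow-up commands for the job: `sky queue`, `sky logs`, and `sky cancel`. The sketch below collects them in one place. The `sky_cmd` wrapper and the `DRY_RUN` variable are hypothetical helpers introduced here so that the commands are only printed by default; the cluster name and job ID are the ones from the sample output and will differ on your machine.

```shell
# Hypothetical helper: prints the command by default; set DRY_RUN=0 to run it.
sky_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "sky $*"
  else
    sky "$@"
  fi
}

CLUSTER="sky-82ce-romilb"   # replace with your cluster name
JOB_ID=1                    # job ID reported by sky launch

# View the job queue on the cluster.
sky_cmd queue "$CLUSTER"

# Stream the logs of a job.
sky_cmd logs "$CLUSTER" "$JOB_ID"

# Cancel a job.
sky_cmd cancel "$CLUSTER" "$JOB_ID"
```

With `DRY_RUN` unset, this prints `sky queue sky-82ce-romilb`, `sky logs sky-82ce-romilb 1`, and `sky cancel sky-82ce-romilb 1` without contacting any cloud.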

# Tasks and Clusters in SkyPilot

**Tasks** in SkyPilot are executed on **clusters**. A **cluster** is a collection of nodes on a cloud.

When you run a task with `sky launch`, SkyPilot creates a new cluster with a random name if an existing cluster is not specified.

> **💡 Hint** - When running `sky launch`, you can give the cluster a name with the `-c` flag. E.g. `sky launch -c mycluster example.yaml` would launch a cluster with the name `mycluster`. If the cluster name already exists, then SkyPilot will try to reuse the cluster by re-running the `setup` commands on the cluster.
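
As a rough sketch of the naming workflow described in the hint, the commands below launch a named cluster, launch onto it again, and then submit the task with `sky exec` (the form printed at the end of the launch output). The `sky_cmd` wrapper and `DRY_RUN` variable are hypothetical helpers added here so that nothing is launched unless you opt in, and `mycluster` is only an example name.

```shell
# Hypothetical helper: prints the command by default; set DRY_RUN=0 to run it.
sky_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "sky $*"
  else
    sky "$@"
  fi
}

CLUSTER="mycluster"   # example cluster name

# First launch: creates the cluster, then runs setup and run.
sky_cmd launch -c "$CLUSTER" example.yaml

# Same name again: SkyPilot tries to reuse the existing cluster.
sky_cmd launch -c "$CLUSTER" example.yaml

# Submit the task as a new job on the existing cluster.
sky_cmd exec "$CLUSTER" example.yaml
```

Naming the cluster yourself makes later commands such as `sky status`, `ssh`, and `sky down` easier to type than an auto-generated name.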

You can see a table of your clusters with the command `sky status`.

## 💻 Checking your cluster status with `sky status`
In a terminal window, run:


-------------------------
```console
sky status
```
-------------------------

### Expected output
-------------------------
```console
(base) romilb@romilbx1yoga:skypilot-tutorial/01_hello_sky$ sky status

NAME             LAUNCHED     RESOURCES            STATUS  AUTOSTOP  COMMAND                  
sky-82ce-romilb  19 mins ago  1x AWS(m6i.2xlarge)  UP      -         sky launch example.yaml  
```
-------------------------

We can see that the `sky launch` in the previous cells created a cluster with the name `sky-82ce-romilb`.

## 💻 SSH into the cluster!

For debugging and development, you can easily SSH into a SkyPilot cluster with the `ssh` utility. In a terminal window, run:

-------------------------
```console
ssh <cluster-name>
```
-------------------------

### Expected output

This will drop you into an interactive terminal inside your cluster:

-------------------------
```console
(base) romilb@romilbx1yoga:skypilot-tutorial/01_hello_sky$ ssh sky-82ce-romilb 
Warning: Permanently added '18.234.228.139' (ECDSA) to the list of known hosts.
=============================================================================
       __|  __|_  )
       _|  (     /   Deep Learning AMI GPU PyTorch 1.10.0 (Ubuntu 20.04)
      ___|\___|___|
=============================================================================

Welcome to Ubuntu 20.04.4 LTS (GNU/Linux 5.13.0-1014-aws x86_64v)

Last login: Wed Sep  7 23:27:50 2022 from 24.23.130.196
ubuntu@ip-172-31-33-58:~$ echo $HOSTNAME
ip-172-31-33-58
```
-------------------------

> **💡 Hint** - To enable the SSH functionality, SkyPilot adds the remote cluster to your `~/.ssh/config`. This means you can use the cluster name as a host alias with other SSH-based tools, such as `scp`, `rsync`, VS Code and more!
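
Because the cluster name works as an SSH host alias, standard tools can address the cluster directly. The following is a sketch under a few assumptions: the `run` wrapper and `DRY_RUN` variable are hypothetical helpers that print each command instead of executing it, the cluster name is the one from the sample output, and `./data` is a made-up local directory.

```shell
# Hypothetical helper: prints the command by default; set DRY_RUN=0 to run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

CLUSTER="sky-82ce-romilb"   # replace with a name from `sky status`

# Copy a local file into the home directory on the head node.
run scp example.yaml "$CLUSTER:example.yaml"

# Sync a local directory (hypothetical ./data) to the cluster.
run rsync -avz ./data/ "$CLUSTER:data/"

# Run a single remote command without opening an interactive shell.
run ssh "$CLUSTER" hostname
```

Set `DRY_RUN=0` only while the cluster is up; the alias is of no use once the cluster has been terminated.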

# Cluster lifecycle management

SkyPilot clusters can exist in three states, each of which has different billing and storage implications:

* **`RUNNING`** - The cluster is up and running (`sky status` reports this as `UP`). You are billed for the instances and their attached storage.
* **`STOPPED`** - Cluster nodes are shut down, but their disks are kept. Your data and node state are preserved, and the cluster can be restarted when required. You are billed only for the storage.
* **`TERMINATED`** - All nodes and their attached disks are deleted. A terminated cluster cannot be restarted and no longer appears in `sky status`.

To manage these states, SkyPilot offers three useful commands:

1. **`sky stop`** - stops a `RUNNING` cluster.
2. **`sky start`** - starts a `STOPPED` cluster.
3. **`sky down`** - terminates a `RUNNING` or `STOPPED` cluster.

> **💡 Hint** - `sky stop` and `sky start` are useful when you want to suspend your experiments for a while but want to quickly resume later. `sky down` is useful to delete a cluster and restart a job from scratch.
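
The relationship between states and commands can be summarized as a small lookup. The function below is purely illustrative: `valid_commands` is a hypothetical helper that restates the list above and does not query SkyPilot or any cloud.

```shell
# Hypothetical helper: maps a cluster state to the lifecycle commands
# described in this section. It does not call SkyPilot.
valid_commands() {
  case "$1" in
    RUNNING)    echo "sky stop, sky down" ;;
    STOPPED)    echo "sky start, sky down" ;;
    TERMINATED) echo "none (launch a new cluster with sky launch)" ;;
    *)          echo "unknown state: $1" >&2; return 1 ;;
  esac
}

# Example: which commands apply to a stopped cluster?
valid_commands STOPPED   # prints "sky start, sky down"
```

The asymmetry to remember is that `STOPPED` is reversible and `TERMINATED` is not.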

## 💻 Terminate your cluster!
We're at the end of this notebook and we don't want to let your cluster keep running and rack up a big bill! Let's terminate the cluster with `sky down`.

First, let's get the cluster name with `sky status`.

-------------------------
```console
sky status
```
-------------------------

Then run `sky down` to terminate the cluster:

-------------------------
```console
sky down <cluster-name>
```
-------------------------

### Expected output

-------------------------
```console
(base) romilb@romilbx1yoga:skypilot-tutorial/01_hello_sky$ sky down sky-82ce-romilb
Terminating 1 cluster: sky-82ce-romilb. Proceed? [Y/n]: Y
Terminating cluster sky-82ce-romilb...done.
Terminating 1 cluster ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
```
-------------------------

#### 🎉 Congratulations! You have completed this notebook. Please proceed to the next notebook.
