# Getting started with Coiled

Welcome to the getting started guide for Coiled! This notebook covers installing and setting up Coiled as well as running your first computation using Coiled.

## Launch a cluster

The first step is to spin up a Dask cluster. In Coiled, this is done by creating a `coiled.Cluster` instance. There are [several keyword arguments](https://docs.coiled.io/user_guide/api.html#coiled.Cluster) you can use to specify the details of your cluster. Please read the [cluster creation documentation](https://docs.coiled.io/user_guide/cluster_creation.html) to learn more.

Note that we give this cluster a name. If you don't specify the `name` keyword argument, the cluster is given a unique, randomly generated name.

In [None]:
import coiled

cluster = coiled.Cluster(name="quickstart-example", n_workers=10)

Once a cluster has been created (you can see the status on your [Coiled dashboard](https://cloud.coiled.io/)), you can connect Dask to the cluster by creating a `distributed.Client` instance.

In [None]:
from dask.distributed import Client

client = Client(cluster)
client

## Analyze data in the cloud

Now that we have our cluster running and Dask connected to it, let's run a computation. This example reads the 2019 NYC yellow taxi trip records (about 84 million rows) from S3 and computes the mean tip amount for each passenger count.

In [None]:
import dask.dataframe as dd

df = dd.read_csv(
    "s3://nyc-tlc/trip data/yellow_tripdata_2019-*.csv",
    dtype={
        "payment_type": "UInt8",
        "VendorID": "UInt8",
        "passenger_count": "UInt8",
        "RatecodeID": "UInt8",
    },
    storage_options={"anon": True},
    blocksize="16 MiB",
).persist()

df.groupby("passenger_count").tip_amount.mean().compute()
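
To make the shape of the result concrete, here is a minimal plain-Python sketch of the same group-then-average step on a few made-up rows. The rows and the helper `mean_tip_by_passenger_count` are illustrative assumptions, not part of Dask or Coiled; the real computation runs in parallel across the cluster's workers.

```python
from collections import defaultdict

# Hypothetical sample rows: (passenger_count, tip_amount). Not real taxi data.
rows = [(1, 2.0), (1, 4.0), (2, 3.0), (2, 5.0), (2, 1.0), (4, 0.0)]


def mean_tip_by_passenger_count(rows):
    """Return {passenger_count: mean tip_amount} for the given rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for passenger_count, tip_amount in rows:
        totals[passenger_count] += tip_amount
        counts[passenger_count] += 1
    # One entry per distinct passenger_count, ordered by key.
    return {key: totals[key] / counts[key] for key in sorted(totals)}


result = mean_tip_by_passenger_count(rows)
print(result)
```

The Dask version returns the same kind of mapping, with one mean `tip_amount` per distinct `passenger_count`, as a pandas Series.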

## Stop a cluster

By default, clusters shut down after 20 minutes of inactivity. You can also stop a cluster by pressing the stop button on the [Coiled dashboard](https://cloud.coiled.io/). Alternatively, you can list all running clusters and use the cluster name to stop one.

In [None]:
coiled.list_clusters()

`coiled.list_clusters()` returns a dictionary keyed by cluster name. We can pass that name to `coiled.delete_cluster()` to stop the running cluster, and then call `client.close()` to close the client.

In [None]:
coiled.delete_cluster(name="quickstart-example")
client.close()

You can now go back to the [Coiled dashboard](https://cloud.coiled.io/), where you will see that the cluster is stopping or has stopped.
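
Because `list_clusters` returns a dictionary keyed by cluster name, picking out the clusters to stop is ordinary dictionary handling. The sketch below uses a hypothetical stand-in dictionary so that it runs without a Coiled account. The value fields and the helper `cluster_names_to_stop` are assumptions for illustration, not part of the Coiled API.

```python
# Hypothetical stand-in for the dictionary returned by coiled.list_clusters().
# The keys are cluster names; the contents of the values are assumed here.
clusters = {
    "quickstart-example": {"status": "running"},
    "other-cluster": {"status": "running"},
}


def cluster_names_to_stop(clusters, prefix):
    """Return the sorted cluster names that start with the given prefix."""
    return sorted(name for name in clusters if name.startswith(prefix))


names = cluster_names_to_stop(clusters, "quickstart")
print(names)
```

In a live session, each selected name could then be passed to `coiled.delete_cluster(name=...)`.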

# Software Environments

Software environments are Docker images that contain the dependencies and files you need to run your computations. If you don't pass a software environment to the `coiled.Cluster` constructor, Coiled's default software environment is used. You can learn more about software environments in our [documentation](https://docs.coiled.io/user_guide/software_environment.html).

## Create a software environment

When creating software environments, there are [several keyword arguments](https://docs.coiled.io/user_guide/api.html#coiled.create_software_environment) that you can use to create a custom environment for your work.

In [None]:
coiled.create_software_environment(
    name="quickstart", 
    conda={
        "channels": ["conda-forge"], 
        "dependencies": ["coiled=0.0.51", "dask=2021.9.0"]
    }
)
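
The `conda` argument above is a plain dictionary, so it can be assembled programmatically when you want to keep version pins in one place. The helper `conda_spec` below is a hypothetical convenience function, not part of Coiled. It only builds a dictionary in the same shape as the one used above.

```python
def conda_spec(pins, channels=("conda-forge",)):
    """Build a conda specification dict with exact version pins.

    `pins` maps a package name to a version string. This helper is
    hypothetical and only mirrors the dictionary layout used above.
    """
    return {
        "channels": list(channels),
        "dependencies": [f"{package}={version}" for package, version in pins.items()],
    }


spec = conda_spec({"coiled": "0.0.51", "dask": "2021.9.0"})
print(spec)
```

The resulting dictionary could then be passed as `conda=spec` to `coiled.create_software_environment`.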

We can now follow our previous workflow: create a cluster, this time using our newly created software environment, connect Dask to it, and then run the same example.

In [None]:
cluster = coiled.Cluster(n_workers=10, software="quickstart")
client = Client(cluster)
client

If you go to the [Coiled dashboard](https://cloud.coiled.io/), the **Software Environment** column shows that the cluster is using the `quickstart` software environment we just created. Note also that this time the cluster has a randomly generated name, because we did not pass `name`.

Let's now run the same computation as before, but using the cluster that is running with the software environment that we have recently created.

In [None]:
df = dd.read_csv(
    "s3://nyc-tlc/trip data/yellow_tripdata_2019-*.csv",
    dtype={
        "payment_type": "UInt8",
        "VendorID": "UInt8",
        "passenger_count": "UInt8",
        "RatecodeID": "UInt8",
    },
    storage_options={"anon": True},
    blocksize="16 MiB",
).persist()

df.groupby("passenger_count").tip_amount.mean().compute()