# A basic introduction to Anyscale and Airflow

This introduction will cover the following topics:
1. Setting up a local Airflow environment with the Anyscale provider installed.
2. Running an example DAG to submit an Anyscale job.
3. Running an example DAG to deploy an Anyscale service.

## 1. Getting Airflow up and running

Let's start by setting up a local Airflow environment with the Anyscale provider installed.

### Step 1: Fetch the default Airflow `docker-compose.yaml` file

Run the following command to fetch the default `docker-compose.yaml` file from the Apache Airflow documentation site:

```bash
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.9.2/docker-compose.yaml'
```



### Step 2: Create a custom Airflow Dockerfile

Create a `Dockerfile` in the root directory of this repository with the following contents:

```Dockerfile
FROM apache/airflow:2.9.2

RUN pip install astro-provider-anyscale==1.0.0
```

This creates a custom image based on the official Apache Airflow image and installs the `astro-provider-anyscale` package, so the Anyscale provider is available when Airflow executes the DAGs.

### Step 3: Modify the `docker-compose.yaml` file to build the custom image

Find the `x-airflow-common` section of the `docker-compose.yaml` file and edit it to look like this:

```yaml
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider packages you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
  # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.9.2}
  build: .
```

With the `image` line commented out and the `build` line uncommented, docker-compose builds the image from the `Dockerfile` in the current directory instead of pulling the stock image.

### Step 4: Modify the `docker-compose.yaml` file to avoid loading example DAGs
    
To avoid loading the example DAGs, set the `AIRFLOW__CORE__LOAD_EXAMPLES` environment variable to `false`:

```
x-airflow-common:
  ...
  environment:
    ...
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
```
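
The variable name follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for overriding configuration options, so `AIRFLOW__CORE__LOAD_EXAMPLES` sets `load_examples` in the `[core]` section. The helper below is a hypothetical illustration of that naming rule (it is not part of Airflow) and can be handy when you want to work out which option an environment variable targets:

```python
from typing import Tuple


def parse_airflow_env_var(name: str) -> Tuple[str, str]:
    """Split an AIRFLOW__{SECTION}__{KEY} name into (section, key), lowercased.

    Hypothetical helper that only illustrates the naming convention.
    """
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # The section and key are separated by a double underscore.
    section, separator, key = name[len(prefix):].partition("__")
    if not separator or not section or not key:
        raise ValueError(f"{name!r} is not of the form AIRFLOW__SECTION__KEY")
    return section.lower(), key.lower()


print(parse_airflow_env_var("AIRFLOW__CORE__LOAD_EXAMPLES"))
# prints ('core', 'load_examples')
```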


### Step 5: Run `docker-compose up`

Run `docker-compose up` in the root directory of this repository. This will start the Airflow webserver and scheduler.

You should see output that looks like this:
```
❯ docker-compose up
[+] Running 7/0
 ✔ Container basic-demo-postgres-1           Running                                                                                                           0.0s 
 ✔ Container basic-demo-redis-1              Running                                                                                                           0.0s 
 ✔ Container basic-demo-airflow-init-1       Created                                                                                                           0.0s 
 ✔ Container basic-demo-airflow-worker-1     Running                                                                                                           0.0s 
 ✔ Container basic-demo-airflow-scheduler-1  Running                                                                                                           0.0s 
 ✔ Container basic-demo-airflow-webserver-1  Running                                                                                                           0.0s 
 ✔ Container basic-demo-airflow-triggerer-1  Running                                                                                                           0.0s 
Attaching to airflow-init-1, airflow-scheduler-1, airflow-triggerer-1, airflow-webserver-1, airflow-worker-1, postgres-1, redis-1
...
```

### Step 6: Access the webserver
You can access the webserver at `localhost:8080`.

You should land on the login page if this is your first time accessing the webserver.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_sign_in.png" width="700px">
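
If the page does not load, it may help to check the webserver's `/health` endpoint, which reports the status of the metadata database and the scheduler as JSON. The following is a minimal sketch using only the standard library. It assumes the default `localhost:8080` port mapping, and the function names are made up for this example:

```python
import json
import urllib.request

# Assumes the default port mapping from the docker-compose.yaml file.
HEALTH_URL = "http://localhost:8080/health"

# Components that must be up for this tutorial; other entries in the
# payload may legitimately report no status and are ignored here.
REQUIRED_COMPONENTS = ("metadatabase", "scheduler")


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON payload returned by the health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def core_components_healthy(payload: dict) -> bool:
    """Return True when every required component reports status 'healthy'."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in REQUIRED_COMPONENTS
    )
```

Once the containers are up, `core_components_healthy(fetch_health())` should return `True`. A connection error usually means the webserver container is still starting.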


### Step 7: Log in
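The default username and password are both `airflow`. They are the fallback values that the `airflow-init` service in the `docker-compose.yaml` file uses when it creates the web UI account:

```yaml
  airflow-init:
    ...
    environment:
      ...
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
```

The `postgres` service happens to use the same `airflow`/`airflow` credentials, but those protect the metadata database, not the web UI.

Log in with the username `airflow` and password `airflow`.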

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_sign_in_full.png" width="700px">


### Step 8: View the available active DAGs

You should see two DAGs available when you log in. These are the `sample_anyscale_job_workflow` and `sample_anyscale_service_workflow` DAGs.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_home.png" width="700px">

## 2. Running an example DAG to submit an Anyscale job

Now that we have the Airflow environment set up, let's run the `sample_anyscale_job_workflow` DAG.

### Step 0: Review the DAG definition

The DAG lives in the `sample_anyscale_job_workflow.py` file in the `dags` directory. Its key parts are:

```python
from airflow import DAG
from anyscale_provider.operators.anyscale import SubmitAnyscaleJob

...

# consult the SDK documentation
# https://docs.anyscale.com/reference/job-api#job-models
anyscale_job_config = dict(
    working_dir=str(FOLDER_PATH),
    entrypoint="python ray_job.py",
    max_retries=1,
)

...

submit_anyscale_job = SubmitAnyscaleJob(
    # base airflow operator parameters
    task_id="submit_anyscale_job",
    dag=dag,
    conn_id=ANYSCALE_CONN_ID,
    name="Simple Anyscale Job",
    # custom operator parameters
    wait_for_completion=True,
    job_timeout_seconds=3000,
    poll_interval=10,
    # Anyscale Job Config
    **anyscale_job_config,
)
```
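
Taken at face value, `wait_for_completion`, `job_timeout_seconds`, and `poll_interval` describe a poll-until-done loop: check the job state every `poll_interval` seconds and give up after `job_timeout_seconds`. The sketch below illustrates that pattern only. It is not the provider's implementation, and the function name, the state names, and the `get_state` callable are all hypothetical:

```python
import time
from typing import Callable

# Hypothetical state names, used only for this illustration.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_seconds: float,
    poll_interval: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_seconds} seconds")
        # Wait before asking again so the remote API is not hammered.
        sleep(poll_interval)


# Demo with a fake job that finishes on the third check; sleeping is stubbed out.
fake_states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_terminal_state(lambda: next(fake_states), 3000, 10, sleep=lambda _s: None))
# prints SUCCEEDED
```

With the values from the DAG above, such a loop would check roughly every 10 seconds for up to 3000 seconds before failing the task.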

### Step 1: Trigger the DAG

Click on the trigger DAG button on the `sample_anyscale_job_workflow` DAG.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_trigger_dag.png" width="700px">


### Step 2: Click on the DAG run 

Click on the DAG run to view a list of DAG runs.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_home_run.png" width="700px">

### Step 3: View the DAG runs

Click on the specific DAG run in the list.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_dag_runs.png" width="700px">


### Step 4: View the DAG run graph

You should see a DAG run graph that looks like this:

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_run_graph.png" width="700px">


### Step 5: View the task logs

You can view the task logs by clicking on the task and then selecting the logs tab.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_dag_logs.png" width="700px">

### Step 6: View the Anyscale job
You can view the Anyscale job by clicking on the `Anyscale` link in the logs.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_jobs.png" width="700px">



## 3. Running an example DAG to deploy an Anyscale service

Next, let's run the `sample_anyscale_service_workflow` DAG, which deploys an Anyscale service.

### Step 0: Review the DAG definition

The DAG lives in the `sample_anyscale_service_workflow.py` file in the `dags` directory. Its key parts are:

```python
from airflow import DAG
from anyscale_provider.operators.anyscale import RolloutAnyscaleService

...

# consult the SDK documentation
# https://docs.anyscale.com/reference/service-api#service-models
anyscale_service_config = dict(
    working_dir="https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip",
    applications=[{"import_path": "sentiment_analysis.app:model"}],
    requirements=["transformers", "requests", "pandas", "numpy", "torch"],
    in_place=False,
    canary_percent=None,
)

...

deploy_anyscale_service = RolloutAnyscaleService(
    # base airflow operator parameters
    task_id="rollout_anyscale_service",
    conn_id=ANYSCALE_CONN_ID,
    name=SERVICE_NAME,
    dag=dag,
    # custom operator parameters
    service_rollout_timeout_seconds=600,
    poll_interval=30,
    # Anyscale Service Config
    **anyscale_service_config,
)
```


### Step 1: Trigger the DAG

Click on the trigger DAG button on the `sample_anyscale_service_workflow` DAG.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_service_trigger_dag.png" width="700px">


### Step 2: Click on the DAG run

As in the previous section, click on the DAG run to view the list of DAG runs.


### Step 3: View the DAG runs

Click on the specific DAG run in the list.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_service_run_list.png" width="700px">


### Step 4: View the DAG run graph

You should see a DAG run graph that looks like this:

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_service_run_graph.png" width="700px">


### Step 5: View the task logs

You can view the task logs by clicking on the task and then selecting the logs tab.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/airflow_service_logs.png" width="700px">


### Step 6: View the Anyscale service
You can view the Anyscale service by clicking on the `Anyscale` link in the logs.

<img src="https://anyscale-public-materials.s3.us-west-2.amazonaws.com/airflow-demo/anyscale_service.png" width="700px">

