## Homework

The goal of this homework is to familiarize users with workflow orchestration. 

Start with the orchestrate.py file in the 03-orchestration/3.4 folder
of the course repo: https://github.com/DataTalksClub/mlops-zoomcamp/blob/main/03-orchestration/3.4/orchestrate.py

## Q1. Human-readable name

You’d like to give the first task, `read_data`, a nicely formatted name.
How can you specify a task name?

> Hint: look in the docs at https://docs.prefect.io or 
> check out the doc string in a code editor.

- `@task(retries=3, retry_delay_seconds=2, name="Read taxi data")`
- `@task(retries=3, retry_delay_seconds=2, task_name="Read taxi data")`
- `@task(retries=3, retry_delay_seconds=2, task-name="Read taxi data")`
- `@task(retries=3, retry_delay_seconds=2, task_name_function=lambda x: f"Read taxi data")`

**Answer**: @task(retries=3, retry_delay_seconds=2, name="Read taxi data")

## Q2. Cron

Cron is a common scheduling specification for workflows. 

Using the flow in `orchestrate.py`, create a deployment.
Schedule your deployment to run on the third day of every month at 9am UTC.
What’s the cron schedule for that?

- `0 9 3 * *`
- `0 0 9 3 *`
- `9 * 3 0 *`
- `* * 9 3 0`

Run the following command from the repository's root directory:

`prefect project init`

Set up the `prefect.yaml` file to pull from the main branch of https://github.com/d4nielfr4nco/mlops-zoomcamp-d4nielfr4nco.git

```
pull:
- prefect.projects.steps.git_clone_project:
    repository: https://github.com/d4nielfr4nco/mlops-zoomcamp-d4nielfr4nco.git
    branch: main
    access_token: null
```

Start a server

`prefect server start`

In another terminal, start a worker that polls from the `mlops-work-pool` work pool

`prefect worker start -p mlops-work-pool -t process --work-queue default`

In another terminal, go to the directory that contains the flow code and deploy the flow

`prefect deploy homeworks/03-orchestration/3.4/orchestrate.py:main_flow -n taxi1 -p mlops-work-pool --cron "0 9 3 * *" --timezone UTC`

**Answer**: 0 9 3 * *
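
To see why this is the right option, it helps to label the five cron fields in order: minute, hour, day of month, month, day of week. The following is a minimal sketch using only the standard library; the helper name `describe_cron` is hypothetical and it only labels fields, it does not validate ranges or compute run times.

```python
# Field names in standard five-field cron order.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict[str, str]:
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_cron("0 9 3 * *")
for field, value in schedule.items():
    print(f"{field}: {value}")
```

Read this way, `0 9 3 * *` means minute 0, hour 9, day 3 of any month, on any day of the week.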

## Q3. RMSE 

Download the January 2023 Green Taxi data and use it for your training data.
Download the February 2023 Green Taxi data and use it for your validation data. 

Make sure you upload the data to GitHub so it is available for your deployment.

Create a custom flow run of your deployment from the UI. Choose Custom
Run for the flow and enter the file path as a string on the JSON tab under Parameters.

Make sure you have a worker running and polling the correct work pool.

View the results in the UI.

What’s the final RMSE to five decimal places?

- 6.67433
- 5.19931
- 8.89443
- 9.12250

In the UI, in `Deployments`, execute a `Custom Run`

Go into Flows > main-flow > taxi1 > 3 dots > Custom Run > Run

Set the `train_path` and `val_path` parameters to:

```
{
  "train_path": "./homeworks/data/green_tripdata_2023-01.parquet",
  "val_path": "./homeworks/data/green_tripdata_2023-02.parquet"
}
``` 

**Answer**: 5.19931
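
For reference, RMSE is the square root of the mean of the squared prediction errors. The sketch below computes it on plain Python lists; the function name `rmse` and the toy values are illustrative only and are not taken from the taxi data or from the course code.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between two equally sized sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy durations in minutes (made-up values, not homework data).
example = rmse([10.0, 12.0, 8.0], [11.0, 10.0, 8.0])
print(f"{example:.5f}")  # formatted to five decimal places, as the question asks
```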

## Q4. RMSE (Markdown Artifact)

Download the February 2023 Green Taxi data and use it for your training data.
Download the March 2023 Green Taxi data and use it for your validation data. 

Create a Prefect Markdown artifact that displays the RMSE for the validation data.
Create a deployment and run it.

What’s the RMSE in the artifact to two decimal places?

- 9.71
- 12.02
- 15.33
- 5.37

Add the code snippet below to the `train_best_model()` function in `orchestrate.py`:

```
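# Needed at the top of orchestrate.py if not already imported:
from datetime import date
from prefect.artifacts import create_markdown_artifact

# Inside train_best_model(), after rmse has been computed:
markdown__rmse_report = f"""# RMSE Report

## Summary

Duration Prediction 

## RMSE XGBoost Model

| Date      | RMSE |
|:----------|-------:|
| {date.today()} | {rmse:.2f} |
"""

# Publish the report as a Markdown artifact visible in the Prefect UI.
create_markdown_artifact(
    key="duration-model-report", markdown=markdown__rmse_report
)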
```
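
To preview the report text without running Prefect, the string building can be pulled into a small helper. This is only a sketch: `build_rmse_report` is a hypothetical name, the RMSE passed in is a made-up number, and the first column is labelled `Date` here because the cell holds a date.

```python
from datetime import date


def build_rmse_report(rmse: float, report_date: date) -> str:
    """Return the Markdown report with the RMSE rounded to two decimals."""
    return f"""# RMSE Report

## Summary

Duration Prediction

## RMSE XGBoost Model

| Date | RMSE |
|:----------|-------:|
| {report_date} | {rmse:.2f} |
"""


# Made-up RMSE and a fixed date, used only to show the formatting.
report = build_rmse_report(1.23456, date(2023, 6, 1))
print(report)
```

The returned string could then be passed as the `markdown` argument of `create_markdown_artifact`.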

Create a new deployment:

`prefect deploy homeworks/03-orchestration/3.4/orchestrate.py:main_flow -n taxi2 -p mlops-work-pool`

In the UI, in `Deployments`, execute a `Custom Run`

Set the `train_path` and `val_path` parameters to:

```
{
  "train_path": "./homeworks/data/green_tripdata_2023-02.parquet",
  "val_path": "./homeworks/data/green_tripdata_2023-03.parquet"
}
``` 
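
The JSON keys must match the flow's parameter names exactly, so it can be worth checking the payload before pasting it into the UI. The sketch below is one way to do that with the standard library; `main_flow` here is a hypothetical stand-in that only mirrors the two parameter names used above, and `unknown_keys` is an illustrative helper, not part of Prefect.

```python
import inspect
import json


def main_flow(train_path: str, val_path: str) -> None:
    """Hypothetical stand-in that only mirrors the parameter names."""


def unknown_keys(flow_fn, payload: str) -> set[str]:
    """Return the JSON keys that the flow function does not accept."""
    accepted = set(inspect.signature(flow_fn).parameters)
    return set(json.loads(payload)) - accepted


payload = """
{
  "train_path": "./homeworks/data/green_tripdata_2023-02.parquet",
  "val_path": "./homeworks/data/green_tripdata_2023-03.parquet"
}
"""
# An empty result means every key maps to a parameter of the flow.
print(unknown_keys(main_flow, payload))
```

A misspelled key such as `validation_path` would show up in the returned set.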

**Answer**: 5.37

## Q5. Emails


It’s often helpful to be notified when something with your dataflow doesn’t work
as planned. Create an email notification to use with your own Prefect server instance.
In your virtual environment, install the prefect-email integration with:

```bash
pip install prefect-email
```

Make sure you are connected to a running Prefect server instance through your
Prefect profile.
See the docs if needed: https://docs.prefect.io/latest/concepts/settings/#configuration-profiles

Register the new block with your server with 

```bash
prefect block register -m prefect_email
```

Remember that a block is a Prefect class with a nice UI form interface.
Block objects live on the server and can be created and accessed in your Python code. 

See the docs for how to authenticate by saving your email credentials to
a block and note that you will need an App Password to send emails with
Gmail and other services. Follow the instructions in the docs.

Create and save an `EmailServerCredentials` notification block.
Use the credentials block to send an email.

Test the notification functionality by running a deployment.

What is the name of the pre-built prefect-email task function?

- `send_email_message`
- `email_send_message`
- `send_email`
- `send_message`

After running:

`prefect block register -m prefect_email`

Go to the Blocks page in the Prefect UI to configure the newly registered blocks (EmailServerCredentials).

Type `gmail-test` in `Block Name`

Follow this [tutorial](https://support.google.com/accounts/answer/185833) to create an App Password:
* In App Password, select Other and name it `Prefect`

Then fill in the block's credentials:
* For the username, use your Gmail username.
* For the password, use the code generated by App Password.

Create the file `create_email_block.py` and add the following code:

``` 
from prefect import flow
from prefect_email import EmailServerCredentials, email_send_message

@flow
def example_email_send_message_flow(email_addresses: list[str]):
    email_server_credentials = EmailServerCredentials.load("gmail-test")
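    for email_address in email_addresses:
        # Submit one named email task per recipient.
        email_send_message.with_options(name=f"email {email_address}").submit(
            email_server_credentials=email_server_credentials,
            subject="Example Flow Notification using Gmail",
            msg="This proves email_send_message works!",
            email_to=email_address,
        )


# Guard the call so the flow only runs when this file is executed directly,
# not when it is imported (for example, while loading it for a deployment).
if __name__ == "__main__":
    example_email_send_message_flow(["EMAIL-ADDRESS-PLACEHOLDER"])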
```

Create a new deployment:

`prefect deploy homeworks/03-orchestration/3.4/create_email_block.py:example_email_send_message_flow -n mail_test -p mlops-work-pool`

In the UI, in `Deployments`, execute a `Quick Run`

**Answer**: email_send_message

## Q6. Prefect Cloud

The hosted Prefect Cloud lets you avoid running your own Prefect server and
has automations that allow you to get notifications when certain events occur
or don’t occur. 

Create a free forever Prefect Cloud account at [app.prefect.cloud](https://app.prefect.cloud/) and connect
your workspace to it following the steps in the UI when you sign up. 

Set up an Automation from the UI that will send yourself an email when
a flow run completes. Run one of your existing deployments and check
your email to see the notification.

Make sure your active profile is pointing toward Prefect Cloud and
make sure you have a worker active.

What is the name of the second step in the Automation creation process?

- Details
- Trigger
- Actions
- The end

Create a profile to connect to Prefect Cloud 

`prefect profile create prefect_cloud` 

Then run

`prefect profile use prefect_cloud`

Finally, run `prefect cloud login` and authenticate with a newly created API key.

After connecting to Prefect Cloud:
* Automations > Add Automation 
* Trigger
    * Flow run state > All flows
* Actions
    * Action Type: Send a notification
    * Add Block
        * Block Name: gmail-test
        * Emails: ["LIST OF EMAILS"]
* Details
    * Automation Name: Email Test
* Save


To run a previously deployed flow, first start a worker:

`prefect worker start -p mlops-work-pool -t process --work-queue default`

In another terminal, go to the directory that contains the flow code and deploy the flow

`prefect deploy homeworks/03-orchestration/3.4/orchestrate.py:main_flow -n taxi1 -p mlops-work-pool --cron "0 9 3 * *" --timezone UTC`

In the UI, in `Deployments`, execute a `Custom Run`

Set the `train_path` and `val_path` parameters to:

```
{
  "train_path": "./homeworks/data/green_tripdata_2023-02.parquet",
  "val_path": "./homeworks/data/green_tripdata_2023-03.parquet"
}
``` 

**Answer**: Actions

## Submit the results

* Submit your results here: https://forms.gle/nVSYH5fGGamdY1LaA
* You can submit your solution multiple times. In this case, only the last submission will be used
* If your answer doesn't match options exactly, select the closest one

## Answers
* **Q1**: @task(retries=3, retry_delay_seconds=2, name="Read taxi data")
* **Q2**: 0 9 3 * *
* **Q3**: 5.19931
* **Q4**: 5.37
* **Q5**: email_send_message
* **Q6**: Actions

## Deadline

The deadline for submitting is 12 June (Monday), 23:00 CEST (Berlin time). 

After that, the form will be closed.