# Task: Extend Kedro Pipeline

In this exercise, you will become more familiar with Kedro by extending the workflow pipeline shown in the introduction.

**Note that the introduction notebook should be run prior to this exercise.**

Let's first change the working directory to the existing project.

In [None]:
import os
os.chdir("/workshop/kedro_intro/workflow-tutorial")

## Subtask I: Add additional node to pipeline

After the model is trained, it should be evaluated. Create a new Kedro `node` that takes the trained model, the features `x_test`, and the target `y_test` as inputs.

The output should be `evaluation_metric`: a JSON object containing several metrics.

The following function can be used.

In [None]:
import logging

import mlflow
import numpy as np

from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate_model(pipe: Pipeline, x_test: np.ndarray, y_test: np.ndarray) -> dict:
    """Compute RMSE, MAE and R^2 on the test data and log them to MLflow.

        Args:
            pipe: Trained model.
            x_test: Testing data of independent features.
            y_test: Target.
        Returns:
            Dictionary of scores that can be stored as json.

    """
    y_pred = pipe.predict(x_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("mae", mae)
    mlflow.log_metric("r2", r2)

    logger = logging.getLogger(__name__)
    logger.info("Model has a coefficient R^2 of %.3f.", r2)

    return {"train": {"rmse": float(rmse),
                      "mae": float(mae),
                      "r2": float(r2)}}

### Extend existing pipeline

In [None]:
# modify src/workflow_tutorial/pipelines/pipeline.py


### Test and visualize pipeline

## Subtask II: Add second pipeline

## Set up the data
In the introduction, we built a pipeline that predicts the quality of **red** wine.
Let's now build a second pipeline that predicts the quality of **white** wine.

Download the [Wine Quality Data Set](http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv) for white wines and add the data to the corresponding directory!
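
The download can also be done from within the notebook. The sketch below uses only the standard library; the target directory `data/01_raw` is an assumption (it is where the Kedro project template keeps raw data), and the helper names `target_path` and `download` are made up for this example.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

WHITE_WINE_URL = (
    "http://archive.ics.uci.edu/ml/machine-learning-databases/"
    "wine-quality/winequality-white.csv"
)


def target_path(url: str, data_dir: str = "data/01_raw") -> Path:
    """Return the local path for a URL, keeping the remote file name."""
    return Path(data_dir) / Path(urlparse(url).path).name


def download(url: str, data_dir: str = "data/01_raw") -> Path:
    """Download url into data_dir (created if missing) and return the path."""
    path = target_path(url, data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(path))
    return path
```

Calling `download(WHITE_WINE_URL)` from the project root would then store the file as `data/01_raw/winequality-white.csv`.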

### Register the datasets
Register the dataset in the catalog!

Let's have a look at the data.
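
A quick way to inspect the raw file without going through the catalog is sketched below. Note that the wine quality CSV files use `;` as the separator, which also matters when the dataset is registered. The helper `peek` and the path in the usage note are assumptions made for this example.

```python
import csv
from itertools import islice


def peek(path, n_rows=5, delimiter=";"):
    """Return the header and the first n_rows data rows of a delimited file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)  # first line holds the column names
        rows = list(islice(reader, n_rows))
    return header, rows
```

For instance, `header, rows = peek("data/01_raw/winequality-white.csv")` would return the column names and the first five rows as lists of strings, assuming the file was stored at that location.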

## Create the pipeline
Create the new pipeline in `src/workflow_tutorial/pipelines/pipeline.py`.

### Register the pipeline
You need to register the pipeline in `src/workflow_tutorial/hooks.py`.

Note that `register_pipelines` returns `Dict[str, Pipeline]`, so you can return several named pipelines, one for each type of wine.

The default pipeline usually comprises all pipelines of the project: you can simply combine them with `red_wine_pipeline + white_wine_pipeline`.

### Set Parameters

Set the parameters used by the pipelines.

In [None]:
%%writefile conf/base/parameters.yml
test_size: 0.25
random_state: 42

alpha: 0.5
l1_ratio: 0.5

## Run the pipeline
You can either run the full (default) project pipeline or a pipeline specified with the `--pipeline` option.
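
Both variants can be started from the notebook, for example via `subprocess`. The sketch below assumes the working directory is the project root; the helper names and the pipeline name `white_wine` in the usage note are hypothetical and depend on the keys you registered.

```python
import subprocess
from typing import List, Optional


def kedro_run_command(pipeline: Optional[str] = None) -> List[str]:
    """Build the argument list for `kedro run`, optionally for one pipeline."""
    command = ["kedro", "run"]
    if pipeline is not None:
        command += ["--pipeline", pipeline]
    return command


def run_pipeline(pipeline: Optional[str] = None) -> None:
    """Run Kedro in the current working directory and raise on failure."""
    subprocess.run(kedro_run_command(pipeline), check=True)
```

`run_pipeline()` would execute the default pipeline, while `run_pipeline("white_wine")` would execute only the pipeline registered under that name.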

## Kedro Visualization 
Visualize the pipeline using the kedro-viz plugin.

## Optional Subtask: Add data version control
Add git and data version control (DVC - already installed) to the project!

Add dvc remote storage (local).

Commit changes.

Add the model pickle and the metrics file to the catalog so that they are persisted locally instead of being held only in memory as a Kedro `MemoryDataSet`.

Create dvc pipelines for red and white wine.

Commit your changes and update dvc remote storage.

Everything should now be up to date.

## Optional Subtask: Create Airflow DAG from Kedro pipeline

### Install more project dependencies

We want to use the *kedro-airflow* plugin. Please install this new project dependency using `kedro install`.
Note that to further update the project requirements, you should modify `src/requirements.in` (not `src/requirements.txt`).

Create and deploy airflow DAG.

In [None]:
#!kedro airflow create

In [None]:
#!kedro airflow deploy