## Write a parameter file

In this exercise, you will be given a DVC parameter file with some missing values. Your task is to fill in the blanks to complete the parameter file. This exercise will help you understand how DVC manages parameters and how they influence different stages of your project.

The parameter file provided has two main sections: preprocess and train_and_evaluate. Each section contains various parameters related to data preprocessing and model training, respectively.

### IDE Exercise Instructions
    - Add Date as a column to drop in the preprocess section.
    - Add RainTomorrow as the target column in the preprocess section.
    - Add 2 as the number of estimators for the Random Forest Classifier model in the train_and_evaluate section.

# params.yaml

# preprocess:
#   # Add Date as a column to drop
#   drop_colnames:
#     - Date
#   # Add RainTomorrow as target column
#   target_column: RainTomorrow
#   categorical_features:
#     - Location
#     - WindGustDir
#     - WindDir9am
#     - WindDir3pm
#     - RainToday
# train_and_evaluate:
#   target_column: RainTomorrow
#   train_test_split:
#     test_size: 0.2
#     random_state: 1993
#   shuffle: true
#   shuffle_random_state: 1993
#   # Add 2 as number of estimators for rfc classifier
#   rfc_params:
#     n_estimators: 2
#     max_depth: 2
#     random_state: 42
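
A stage can track a whole section of the parameter file (for example `-p preprocess`) or a single nested value addressed by a dotted path such as `train_and_evaluate.rfc_params.n_estimators`. The sketch below illustrates how such a dotted path resolves against the nested structure. It is only an illustration: the `PARAMS` dict is a hand-written mirror of the file above (so no YAML library is needed), and `get_param` is a hypothetical helper, not part of DVC.

```python
from functools import reduce

# Hand-written mirror of the params.yaml shown above (assumption: a real
# script would load the file with a YAML parser instead).
PARAMS = {
    "preprocess": {
        "drop_colnames": ["Date"],
        "target_column": "RainTomorrow",
        "categorical_features": [
            "Location", "WindGustDir", "WindDir9am", "WindDir3pm", "RainToday",
        ],
    },
    "train_and_evaluate": {
        "target_column": "RainTomorrow",
        "train_test_split": {"test_size": 0.2, "random_state": 1993},
        "shuffle": True,
        "shuffle_random_state": 1993,
        "rfc_params": {"n_estimators": 2, "max_depth": 2, "random_state": 42},
    },
}


def get_param(params, dotted_key):
    """Hypothetical helper: walk the nested mapping one level per path segment."""
    return reduce(lambda node, key: node[key], dotted_key.split("."), params)


print(get_param(PARAMS, "train_and_evaluate.rfc_params.n_estimators"))  # 2
print(get_param(PARAMS, "preprocess.drop_colnames"))  # ['Date']
```

Changing any tracked value, such as `n_estimators`, is what signals to DVC that the stages depending on it are out of date.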

## Designing a DVC pipeline

Designing a DVC pipeline, which DVC represents as a directed acyclic graph (DAG), is fundamental to using DVC in your machine learning workflows. A DAG lets you codify the inputs, outputs, and command of each step. The outputs of one step can serve as inputs to one or more later steps, which establishes the dependencies between steps.

In this exercise, you'll design an ML workflow that contains four stages:

    - Data preprocessing (preprocess_stage)
    - Data splitting (split_stage)
    - Model training (train_stage)
    - Model evaluation (evaluate_stage)

We will exclusively work with the dvc stage add commands. Scroll down to the end of the shell script file (dvc_dag_stages_add.sh) if needed.

### IDE Exercise Instructions
    - Add processed_data.csv as output from preprocess_stage.
    - Add parameters from the split section of the default parameter file to the split_stage.
    - Add model.pkl as one of the dependencies in the evaluate_stage.
    - Run the script with bash dvc_dag_stages_add.sh in the terminal and notice how dvc.yaml gets populated.

# dvc_dag_stages_add.sh

# #!/bin/bash

# # Preprocess stage
# # Output is processed_data.csv
# dvc stage add --force \
# -n preprocess_stage \
# -o processed_data.csv \
# -d raw_data.csv \
# -d preprocess.py \
# python3 preprocess.py


# # Split stage
# # This stage uses parameters from `split` section of params.yaml
# dvc stage add --force \
# -n split_stage \
# -p split \
# -d processed_data.csv \
# -d split.py \
# -o train_data.csv \
# -o eval_data.csv \
# python3 split.py


# # Train stage
# # This stage generates model.pkl as output
# dvc stage add --force \
# -n train_stage \
# -p train \
# -d train_data.csv \
# -d train.py \
# -o model.pkl \
# python3 train.py


# # Evaluate stage
# # This stage uses model.pkl as one of its inputs
# dvc stage add --force \
# -n evaluate_stage \
# -p evaluate \
# -d eval_data.csv \
# -d model.pkl \
# -d evaluate.py \
# -o metrics.json \
# python3 evaluate.py

#$ bash dvc_dag_stages_add.sh
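
To see how declared dependencies and outputs alone determine the execution order, the following sketch rebuilds the graph for the four stages in plain Python. It is not how DVC is implemented. The `STAGES` dict is a hand-written summary of the `-d` and `-o` flags used in the script (with `processed_data.csv` treated only as an output of the preprocess stage), `build_graph` is a hypothetical helper, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hand-written summary of the -d (deps) and -o (outs) flags from the script.
STAGES = {
    "preprocess_stage": {
        "deps": ["raw_data.csv", "preprocess.py"],
        "outs": ["processed_data.csv"],
    },
    "split_stage": {
        "deps": ["processed_data.csv", "split.py"],
        "outs": ["train_data.csv", "eval_data.csv"],
    },
    "train_stage": {
        "deps": ["train_data.csv", "train.py"],
        "outs": ["model.pkl"],
    },
    "evaluate_stage": {
        "deps": ["eval_data.csv", "model.pkl", "evaluate.py"],
        "outs": ["metrics.json"],
    },
}


def build_graph(stages):
    """Map each stage to the set of stages that produce one of its dependencies."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    return {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }


graph = build_graph(STAGES)
order = list(TopologicalSorter(graph).static_order())
print(order)
# ['preprocess_stage', 'split_stage', 'train_stage', 'evaluate_stage']
```

No stage names its predecessor explicitly. The edge from `train_stage` to `evaluate_stage`, for instance, exists only because `model.pkl` is an output of one and a dependency of the other.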

## Visualizing a DVC pipeline

In this exercise, you will learn to use the dvc dag command with different flags to gain various insights about your project's pipeline. Understanding these flags and their effects on the dvc dag command's output will help you better manage and understand your project's pipeline.

Remember, the goal of this exercise is not just to execute the commands but to understand the nuances of the dvc dag command and how different flags alter its output.

NOTE: Use the arrow keys to scroll the terminal output and press q to exit.

### IDE Exercise Instructions
    - Run the dvc dag command without any flags and observe the output.
    - Run the dvc dag command with the --outs flag and compare the output with the previous step.
    - Run the dvc dag command with the train stage as the target and observe how the output changes.

#$ dvc dag
#$ dvc dag --outs
#$ dvc dag train
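
When a target is given, the output should shrink to the target stage and the stages it depends on, assuming the behaviour described in the DVC documentation for `dvc dag <target>`. The sketch below mimics that filtering on a small graph. The stage names in `UPSTREAM` are hypothetical stand-ins for this exercise's pipeline, and `stages_leading_to` is an illustrative helper, not a DVC function.

```python
# Hypothetical stage graph: each stage maps to the stages it depends on.
UPSTREAM = {
    "preprocess": set(),
    "split": {"preprocess"},
    "train": {"split"},
    "evaluate": {"split", "train"},
}


def stages_leading_to(graph, target):
    """Return the target plus every stage reachable through its dependencies."""
    seen = set()
    stack = [target]
    while stack:
        stage = stack.pop()
        if stage not in seen:
            seen.add(stage)
            stack.extend(graph[stage])
    return seen


print(sorted(stages_leading_to(UPSTREAM, "train")))
# ['preprocess', 'split', 'train']
```

In this sketch `evaluate` is left out for the target `train` because it lies downstream of it. Compare this with what the terminal shows for the last command.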