# DVC Pipeline Integration

This notebook shows how to integrate the Self-Critique pipeline with DVC (Data Version Control) to version prompts, datasets, and experiment metrics, so that every result can be traced back to the exact prompt, data, and parameters that produced it.

## Learning Objectives

- **Prompt Versioning**: Use DVC to track changes to prompt templates.
- **Dataset Versioning**: Version the benchmark dataset to ensure consistent evaluation.
- **Metrics Tracking**: Log experiment metrics with DVC for comparison.
- **Reproducibility**: Create a fully reproducible pipeline from data to results.

---


## Section 1: DVC Setup for Prompts and Data

First, we'll place our prompts and benchmark dataset under DVC control.


In [None]:
%%bash
# Assumes this notebook sits two directories below the repository root, and that
# the root is already a Git repository with DVC initialized (`git init`, `dvc init`)

# Create directories for prompts and data
mkdir -p ../../prompts ../../data/benchmark

# Create a prompt template
echo "Summarize the following paper: {{text}}" > ../../prompts/summary_v1.prompt

# Create a dummy benchmark dataset
echo '{"title": "Paper 1", "text": "..."}' > ../../data/benchmark/paper1.json
echo '{"title": "Paper 2", "text": "..."}' > ../../data/benchmark/paper2.json

# Add these directories to DVC.
# This creates prompts.dvc and data/benchmark.dvc, which point to the cached data,
# and adds the tracked paths to the matching .gitignore files.
dvc add ../../prompts ../../data/benchmark

# Then commit the .dvc files and .gitignore changes to Git (from the repository root):
# git add prompts.dvc data/benchmark.dvc .gitignore data/.gitignore
# git commit -m "Track prompts and benchmark data with DVC"

echo "✓ Prompts and data directories are now tracked by DVC."
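Conceptually, each `.dvc` file records a content hash (an MD5 digest) of what it tracks, so any edit to a prompt or data file shows up as a changed hash. The sketch below illustrates that idea with Python's `hashlib`. It is a simplification, not DVC's implementation (DVC also records file sizes and hashes a whole directory through a separate manifest), and the `snapshot` helper and the edited prompt text are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory: Path) -> dict:
    """Map each file under `directory` (relative path) to its MD5 digest."""
    return {
        str(p.relative_to(directory)): file_md5(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


with tempfile.TemporaryDirectory() as tmp:
    prompts = Path(tmp) / "prompts"
    prompts.mkdir()
    template = prompts / "summary_v1.prompt"

    template.write_text("Summarize the following paper: {{text}}\n")
    before = snapshot(prompts)

    # Editing the template changes its digest, which is how the change is detected
    template.write_text("Summarize the following paper in 3 sentences: {{text}}\n")
    after = snapshot(prompts)

    changed = [name for name in before if before[name] != after.get(name)]
    print(changed)  # ['summary_v1.prompt']
```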

## Section 2: DVC Pipeline for Evaluation

DVC can define and run a multi-stage pipeline. We'll create a simple evaluation pipeline that takes our prompts and data as inputs and produces metrics as an output.
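DVC links stages through their files: a stage whose dependency is another stage's output runs after that stage, and `dvc dag` prints the resulting graph. The sketch below mimics that ordering with the standard-library `graphlib` module (Python 3.9+). Only the `evaluate` stage exists in this notebook; `prepare`, `report`, and their paths are hypothetical, and the code models the idea rather than how DVC builds its graph internally.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: name -> (deps, outs), mirroring dvc.yaml's `deps` and `outs`
stages = {
    "prepare": ({"data/raw"}, {"data/processed"}),
    "evaluate": ({"evaluate.py", "data/processed"}, {"metrics.json"}),
    "report": ({"metrics.json"}, {"report.md"}),
}

# Which stage produces each output path
producer = {out: name for name, (_, outs) in stages.items() for out in outs}

# A stage depends on every stage that produces one of its deps
graph = {
    name: {producer[d] for d in deps if d in producer}
    for name, (deps, _) in stages.items()
}

# TopologicalSorter takes {node: predecessors} and yields a valid run order
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['prepare', 'evaluate', 'report']
```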


In [None]:
%%writefile ../../evaluate.py
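import json
import os
import random
import sys

import yaml  # PyYAML, used to read params.yaml

SEED = 42  # fixed seed so that `dvc repro` yields the same simulated score


def evaluate(prompt_path, data_dir, metrics_file):
    """Simulates running an evaluation (the prompt and data are not actually read)."""
    print(f"Using prompt: {prompt_path}")
    print(f"Using data from: {data_dir}")

    # Read parameters from params.yaml; DVC runs `cmd` from the directory
    # containing dvc.yaml, so this relative path resolves there
    with open("params.yaml") as f:
        params = yaml.safe_load(f)

    # Simulate evaluation: a toy formula plus seeded noise
    rng = random.Random(SEED)
    avg_quality = 8.5 + params["temperature"] * 0.5 + rng.uniform(-0.2, 0.2)

    metrics = {
        "avg_quality": avg_quality,
        "model": params["model"],
        "temperature": params["temperature"],
    }

    # os.makedirs("") raises an error, so only create a directory if the path has one
    metrics_dir = os.path.dirname(metrics_file)
    if metrics_dir:
        os.makedirs(metrics_dir, exist_ok=True)
    with open(metrics_file, "w") as f:
        json.dump(metrics, f, indent=2)

    print(f"Metrics saved to {metrics_file}")


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit("usage: python evaluate.py PROMPT_PATH DATA_DIR METRICS_FILE")
    evaluate(sys.argv[1], sys.argv[2], sys.argv[3])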


### 2.1 Defining Parameters and Pipeline Stages

We'll create a `params.yaml` for parameters and a `dvc.yaml` to define the pipeline stages.


In [None]:
%%bash
# Create params.yaml to hold the pipeline parameters
echo 'model: claude-sonnet-4-20250514' > ../../params.yaml
echo 'temperature: 0.3' >> ../../params.yaml

# Create dvc.yaml to define the pipeline. Paths inside it are relative to the
# directory containing dvc.yaml (the repository root), where DVC runs `cmd`.
cat <<'EOL' > ../../dvc.yaml
stages:
  evaluate:
    cmd: python evaluate.py prompts/summary_v1.prompt data/benchmark metrics.json
    deps:
    - evaluate.py
    - prompts/summary_v1.prompt
    - data/benchmark
    params:
    - model
    - temperature
    metrics:
    - metrics.json:
        cache: false
EOL

echo "✓ dvc.yaml and params.yaml created."
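When the pipeline runs again, DVC compares each stage's current dependency hashes and parameter values with those recorded in `dvc.lock`, and skips the stage if nothing has changed. The following is a simplified, in-memory model of that check. `fingerprint` and `is_stale` are hypothetical helpers, the byte strings stand in for file contents, and the real `dvc.lock` stores more fields (such as file sizes and output hashes).

```python
import hashlib
from typing import Optional


def fingerprint(deps: dict, params: dict) -> dict:
    """Record an MD5 digest per dependency plus the exact parameter values."""
    return {
        "deps": {path: hashlib.md5(data).hexdigest() for path, data in deps.items()},
        "params": dict(params),
    }


def is_stale(lock: Optional[dict], deps: dict, params: dict) -> bool:
    """A stage reruns if it has never run or any dependency/parameter changed."""
    return lock is None or lock != fingerprint(deps, params)


# In-memory stand-ins for the stage's files and its params.yaml values
deps = {
    "evaluate.py": b"# evaluation script contents",
    "prompts/summary_v1.prompt": b"Summarize the following paper: {{text}}\n",
}
params = {"model": "claude-sonnet-4-20250514", "temperature": 0.3}

lock = None
print(is_stale(lock, deps, params))  # True: the stage has never run
lock = fingerprint(deps, params)     # what a successful run would record
print(is_stale(lock, deps, params))  # False: nothing changed, so it is skipped
print(is_stale(lock, deps, {**params, "temperature": 0.7}))  # True: a param changed
```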

## Section 3: Running Experiments and Comparing Metrics

DVC makes it easy to run experiments with different parameters and compare the results.


In [None]:
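%%bash
# Experiments are built on top of the current Git commit, so commit dvc.yaml,
# params.yaml, and evaluate.py first (from the repository root), e.g.:
# git add dvc.yaml params.yaml evaluate.py && git commit -m "Add evaluation pipeline"

# Run from the repository root, where dvc.yaml and params.yaml live
cd ../..

# Run the pipeline as a named experiment with the current parameters
dvc exp run --name baseline

# Override a parameter and run a second experiment
dvc exp run --set-param temperature=0.7 --name high-temp

# Compare the experiments
dvc exp show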


### Interpretation of `dvc exp show`

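The output is a table with one row per experiment (`baseline` and `high-temp`), alongside rows for the workspace and the Git commit the experiments are based on. Its columns show the `avg_quality` metric from `metrics.json` and the `model` and `temperature` parameters from `params.yaml`, so the effect of raising `temperature` from 0.3 to 0.7 can be read off directly.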
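Comparing two runs amounts to subtracting their numeric metrics; `dvc metrics diff` and `dvc exp diff` report such deltas directly. The hypothetical `metrics_diff` helper below sketches that computation. Its inputs come from the noise-free version of the toy formula in `evaluate.py` (8.5 + 0.5 * temperature), not from real experiment output.

```python
def metrics_diff(old: dict, new: dict) -> dict:
    """Return old/new/change for numeric metrics present in both runs."""
    diff = {}
    for key in sorted(old.keys() & new.keys()):
        a, b = old[key], new[key]
        # Skip non-numeric entries such as the `model` string (bools excluded too)
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in (a, b)
        )
        if numeric:
            diff[key] = {"old": a, "new": b, "change": round(b - a, 4)}
    return diff


def toy_metrics(temperature: float) -> dict:
    """Noise-free version of the formula in evaluate.py (illustrative only)."""
    return {
        "avg_quality": 8.5 + temperature * 0.5,
        "model": "claude-sonnet-4-20250514",
        "temperature": temperature,
    }


baseline = toy_metrics(0.3)
high_temp = toy_metrics(0.7)
for key, row in metrics_diff(baseline, high_temp).items():
    print(f"{key}: {row['old']:.2f} -> {row['new']:.2f} ({row['change']:+})")
```

Rounding the change keeps floating-point residue (for example `0.39999999999999997`) out of the report.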

## Section 4: Rollback and Reproducibility

If an experiment shows a drop in quality, you can easily revert to a previous version of your code, data, and prompts.

```bash
# To go back to the baseline experiment
dvc exp apply baseline

# This will revert your code, params.yaml, and any data files
# to the state they were in for that experiment.
```

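To share the restored state, commit it to Git and push the DVC-tracked data to a remote with `dvc push`. Anyone can then reproduce the results by checking out that commit, fetching the data with `dvc pull`, and running `dvc repro`.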
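`dvc repro` only reproduces an identical metric if the evaluation is deterministic, and the simulated score in `evaluate.py` adds random noise, so unless that noise is seeded each run gives a slightly different number. A minimal sketch, assuming a fixed seed is acceptable for your evaluation: drawing the noise from a private `random.Random(seed)` makes repeated runs return the same value. `simulated_quality` is a hypothetical helper that reuses the notebook's toy formula.

```python
import random
from typing import Optional


def simulated_quality(temperature: float, seed: Optional[int] = None) -> float:
    """Toy formula from evaluate.py, with noise drawn from a private RNG."""
    # seed=None seeds from system entropy/time, so results vary between runs
    rng = random.Random(seed)
    return 8.5 + temperature * 0.5 + rng.uniform(-0.2, 0.2)


run_a = simulated_quality(0.3, seed=42)
run_b = simulated_quality(0.3, seed=42)
print(run_a == run_b)  # True: same seed, same noise, same metric
```

Using a private `random.Random` instance rather than `random.seed()` keeps the seed local to the evaluation, so other code that uses the global generator does not shift the result.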
