# Calkit example: Notebook as a pipeline

This notebook uses the Calkit `stage` cell magic to run cells as DVC
pipeline stages. Each stage's outputs are cached, so a cell that has not
changed is skipped on re-execution, which speeds up repeated runs and
improves reproducibility.
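
The idea of skipping an unchanged stage can be illustrated with a small, self-contained sketch. This is not how Calkit or DVC are implemented; `stage_signature` and `StageCache` are hypothetical names invented for illustration, and the cache lives in memory rather than on disk. The sketch assumes a stage is identified by a hash of its source code plus the content of its dependencies.

```python
import hashlib


def stage_signature(code, deps):
    """Hash a stage's source text together with its dependencies' content.

    `deps` maps a dependency name to its raw bytes (hypothetical format).
    """
    h = hashlib.sha256()
    h.update(code.encode("utf-8"))
    for name in sorted(deps):  # sort so the hash does not depend on dict order
        h.update(name.encode("utf-8"))
        h.update(hashlib.sha256(deps[name]).digest())
    return h.hexdigest()


class StageCache:
    """Toy in-memory cache: rerun a stage only when its signature changes."""

    def __init__(self):
        self._store = {}  # stage name -> (signature, cached output)
        self.executions = 0  # how many times a stage body actually ran

    def run(self, name, code, deps, func):
        sig = stage_signature(code, deps)
        cached = self._store.get(name)
        if cached is not None and cached[0] == sig:
            return cached[1]  # unchanged: reuse the stored output
        self.executions += 1
        output = func()
        self._store[name] = (sig, output)
        return output


cache = StageCache()
source = "df = list(range(1000))"
cache.run("get-data", source, {}, lambda: list(range(1000)))  # executes
cache.run("get-data", source, {}, lambda: list(range(1000)))  # reuses cache
print(cache.executions)
```

Editing the stage's source text or changing any dependency's bytes produces a different signature, so the next call executes the stage body again.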

In [1]:
%load_ext calkit.magics

## Creating a dataset from a pandas DataFrame and saving it as Parquet

In [2]:
%%stage --name get-data \
    --environment main \
    --out df:parquet:pandas \
    --out-type dataset \
    --out-title "The data" \
    --out-desc "This is just some test data."

import pandas as pd
import time

# Let's pretend this is an expensive call
time.sleep(10)

df = pd.DataFrame({"col1": range(1000)})
df.describe()

Stage 'get-data' didn't change, skipping
Data and pipelines are up to date.


|       | col1       |
|-------|------------|
| count | 1000.0     |
| mean  | 499.5      |
| std   | 288.819436 |
| min   | 0.0        |
| 25%   | 249.75     |
| 50%   | 499.5      |
| 75%   | 749.25     |
| max   | 999.0      |
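
As a sanity check, the summary statistics above can be reproduced without pandas. The sketch below uses only Python's standard `statistics` module and assumes the column holds the integers 0 through 999, as in the cell above. The sample standard deviation (n - 1 denominator) and the `"inclusive"` quantile method are chosen because they correspond to pandas' defaults for `describe()`.

```python
import statistics

values = list(range(1000))  # same values as col1

mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# The "inclusive" method interpolates linearly between data points,
# treating the minimum and maximum as the 0th and 100th percentiles.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(mean, round(std, 6), q1, median, q3)
```

The printed values agree with the `mean`, `std`, `25%`, `50%`, and `75%` rows of the table.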


## Creating a figure from the dataset

In [3]:
%%stage --name plot \
    --environment main \
    --dep get-data:df:parquet:pandas \
    --out fig --out-path figures/plot.png \
    --out-type figure \
    --out-title "A plot of the data" \
    --out-desc "This is a plot of the data."

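import os

# `df` is made available by the --dep declaration on the `get-data` stage
os.makedirs("figures", exist_ok=True)

# Plot with the Plotly backend and write a static image to the declared output path
fig = df.plot(backend="plotly")
fig.write_image("figures/plot.png")
fig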

Stage 'get-data' didn't change, skipping
Stage 'plot' didn't change, skipping
Data and pipelines are up to date.
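
The log lists `get-data` before `plot` because `plot` declares it as a dependency, so the upstream stage has to be checked first. A minimal sketch of that ordering, using the standard-library `graphlib` module (Python 3.9+), is shown below. The `stages` mapping is a hand-written stand-in for the two stages in this notebook, not something read from Calkit or DVC.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
stages = {
    "get-data": set(),
    "plot": {"get-data"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(stages).static_order())
print(order)
```

With this mapping, `get-data` always comes out ahead of `plot`, which mirrors the order of the messages above.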
