# Polars Proc Compare Example

This notebook demonstrates how to use the Polars Proc Compare library to compare two Polars DataFrames on a key column and render the differences as an HTML report.

In [1]:
import polars as pl
from IPython.display import HTML
from pathlib import Path
from polars_proc_compare import DataCompare
from polars_proc_compare.data_generator import create_delta_dataset

# Create results directory if it doesn't exist
results_dir = Path("results")
results_dir.mkdir(exist_ok=True)

## Create Sample Datasets

First, let's create a base dataset and a modified copy of it with a known number of differences.

In [2]:
# Create base dataset
base_df = pl.DataFrame({
    "id": range(1, 1001),
    "name": [f"name_{i}" for i in range(1, 1001)],
    "value": [float(i) for i in range(1, 1001)],
    "category": ["A" if i % 2 == 0 else "B" for i in range(1, 1001)]
})

# Create comparison dataset with 5% of cells modified (the "id" key column is left untouched)
compare_df, modifications = create_delta_dataset(
    base_df,
    delta_percentage=5.0,
    seed=42,
    exclude_columns=["id"]
)

print("Modification Statistics:")
for key, value in modifications.items():
    print(f"{key}: {value}")

Modification Statistics:
total_rows: 1000
total_columns: 3
total_cells: 3000
modified_cells: 150
modified_columns: {'value': 43, 'category': 58, 'name': 49}
modified_rows: 140
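
The statistics above can be cross-checked with a little arithmetic. The sketch below copies the printed figures into a plain dictionary (it does not call the library) and checks that they are internally consistent. It assumes that `total_columns` counts only the columns eligible for modification, which would explain why it is 3 rather than 4 when `id` is excluded.

```python
# Figures copied from the "Modification Statistics" output above.
stats = {
    "total_rows": 1000,
    "total_columns": 3,
    "total_cells": 3000,
    "modified_cells": 150,
    "modified_columns": {"value": 43, "category": 58, "name": 49},
    "modified_rows": 140,
}

# Eligible cells: rows times columns (assumed to exclude the "id" key).
expected_cells = stats["total_rows"] * stats["total_columns"]

# Per-column counts should add up to the overall modified-cell count.
per_column_total = sum(stats["modified_columns"].values())

# Share of eligible cells that were changed, as a percentage.
modified_pct = 100 * stats["modified_cells"] / stats["total_cells"]

# Modified cells beyond one per modified row, i.e. cells that landed
# in a row that already contains another modification.
cells_sharing_a_row = stats["modified_cells"] - stats["modified_rows"]

print(expected_cells == stats["total_cells"])
print(per_column_total)
print(modified_pct)
print(cells_sharing_a_row)
```

The 150 modified cells are 5% of the 3000 eligible cells, matching `delta_percentage=5.0`. They fall in only 140 distinct rows, so some rows carry more than one change.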


## Compare Datasets

Now let's compare the two datasets with `DataCompare`, matching rows on the `id` key column, then write the results to an HTML report and display it directly in the notebook.

In [3]:
# Create comparison object
dc = DataCompare(base_df, compare_df, key_columns=["id"])

# Run comparison
results = dc.compare()

# Generate HTML report
report_path = results_dir / "comparison_report.html"
results.to_html(report_path)

# Display the HTML report in the notebook
HTML(report_path.read_text())
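
To make the idea of a key-based comparison concrete, here is a minimal, library-independent sketch in plain Python. It is not how `polars_proc_compare` is implemented; `count_mismatches` is a hypothetical helper written only for illustration. It matches rows by key regardless of their order and counts the differing cells per column.

```python
from collections import Counter


def count_mismatches(base_rows, compare_rows, key):
    """Count differing cells per column for rows that share a key.

    Hypothetical helper for illustration; not part of polars_proc_compare.
    """
    # Index the comparison side by key so row order does not matter.
    compare_by_key = {row[key]: row for row in compare_rows}
    mismatches = Counter()
    for base_row in base_rows:
        other = compare_by_key.get(base_row[key])
        if other is None:
            continue  # key absent from the comparison side; ignored in this sketch
        for column, base_value in base_row.items():
            if column != key and other.get(column) != base_value:
                mismatches[column] += 1
    return dict(mismatches)


# Tiny example in the same shape as the notebook's data.
base_rows = [
    {"id": 1, "name": "name_1", "value": 1.0, "category": "B"},
    {"id": 2, "name": "name_2", "value": 2.0, "category": "A"},
    {"id": 3, "name": "name_3", "value": 3.0, "category": "B"},
]
compare_rows = [
    {"id": 3, "name": "name_3", "value": 3.5, "category": "B"},
    {"id": 1, "name": "name_1", "value": 1.0, "category": "A"},
    {"id": 2, "name": "changed", "value": 2.0, "category": "B"},
]

result = count_mismatches(base_rows, compare_rows, key="id")
print(result)
```

Here `category` differs for two keys, while `name` and `value` each differ for one. A real comparison tool would typically also report keys present on only one side, which this sketch skips.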