# PySuricata — Polars Example

This notebook demonstrates how to use PySuricata with Polars DataFrames and LazyFrames.

## 1. Installation

```bash
# Quote the extras so shells such as zsh do not treat the brackets as a glob pattern
pip install "pysuricata[polars]"
```
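
After installing, it can help to confirm that both packages are importable from the active environment. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of PySuricata.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


# Report the status of the two packages this notebook relies on
for module_name in ("polars", "pysuricata"):
    status = "found" if is_installed(module_name) else "missing"
    print(f"{module_name}: {status}")
```

If either package is reported as missing, re-run the install command in the same environment as the notebook kernel.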

## 2. Basic Report with Polars DataFrame

In [None]:
import polars as pl
from pysuricata import profile

# Load the Titanic dataset
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pl.read_csv(url)
print(f"Loaded {df.shape[0]} rows × {df.shape[1]} columns")
df.head()

In [None]:
# Generate and save the report
report = profile(df)
report.save_html("titanic_polars_report.html")
print("Report saved to titanic_polars_report.html")

## 3. LazyFrame Support

A LazyFrame can be passed directly to `profile`; PySuricata collects it automatically before profiling:

In [None]:
# Create a LazyFrame with some transformations
lazy = (
    pl.scan_csv(url)
    .filter(pl.col("Age").is_not_null())
    .with_columns(pl.col("Fare").round(2))
)

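# collect_schema() resolves the schema without running the query (Polars >= 1.0);
# on older Polars versions, use lazy.schema instead
print(f"LazyFrame schema: {lazy.collect_schema()}")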

# PySuricata handles the collection automatically
report = profile(lazy)
report.save_html("titanic_lazy_report.html")
print("LazyFrame report saved")

## 4. Statistics Only

In [None]:
from pysuricata import summarize

stats = summarize(df)

print(f"Rows: {stats['dataset']['row_count']}")
print(f"Columns: {stats['dataset']['column_count']}")
print(f"Missing cells: {stats['dataset']['missing_cells_pct']:.1f}%")

In [None]:
# Column-level overview
for col_name, col_stats in stats["columns"].items():
    col_type = col_stats.get("type", "unknown")
    missing = col_stats.get("missing_pct", 0)
    print(f"  {col_name:20s}  type={col_type:12s}  missing={missing:.1f}%")

## 5. Custom Configuration

In [None]:
from pysuricata import ReportConfig

config = ReportConfig()
config.compute.chunk_size = 500
config.compute.random_seed = 42
config.render.title = "Polars Titanic Analysis"

report = profile(df, config=config)
report.save_html("titanic_polars_custom.html")
print("Custom polars report saved")