# PySuricata — Polars Example

This notebook demonstrates how to use PySuricata with Polars DataFrames and LazyFrames.

## 1. Installation

```bash
# Quote the extra so shells such as zsh do not treat the brackets as a glob
pip install "pysuricata[polars]"
```

## 2. Basic Report with Polars DataFrame

In [1]:
import polars as pl
from pysuricata import profile

# Load the Titanic dataset
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pl.read_csv(url)
print(f"Loaded {df.shape[0]} rows × {df.shape[1]} columns")
df.head()

Loaded 891 rows × 12 columns


| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i64 | i64 | i64 | str | str | f64 | i64 | i64 | str | f64 | str | str |
| 1 | 0 | 3 | "Braund, Mr. Owen Harris" | "male" | 22.0 | 1 | 0 | "A/5 21171" | 7.25 | null | "S" |
| 2 | 1 | 1 | "Cumings, Mrs. John Bradley (Fl… | "female" | 38.0 | 1 | 0 | "PC 17599" | 71.2833 | "C85" | "C" |
| 3 | 1 | 3 | "Heikkinen, Miss. Laina" | "female" | 26.0 | 0 | 0 | "STON/O2. 3101282" | 7.925 | null | "S" |
| 4 | 1 | 1 | "Futrelle, Mrs. Jacques Heath (… | "female" | 35.0 | 1 | 0 | "113803" | 53.1 | "C123" | "S" |
| 5 | 0 | 3 | "Allen, Mr. William Henry" | "male" | 35.0 | 0 | 0 | "373450" | 8.05 | null | "S" |


In [2]:
# Generate and save the report
report = profile(df)
report.save_html("titanic_polars_report.html")
print("Report saved to titanic_polars_report.html")

Report saved to titanic_polars_report.html


## 3. LazyFrame Support

PySuricata automatically collects LazyFrames before profiling:

In [3]:
# Create a LazyFrame with some transformations
lazy = (
    pl.scan_csv(url)
    .filter(pl.col("Age").is_not_null())
    .with_columns(pl.col("Fare").round(2))
)

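# collect_schema() resolves the schema explicitly (Polars >= 1.0);
# accessing lazy.schema directly emits a PerformanceWarning
print(f"LazyFrame schema: {lazy.collect_schema()}")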

# PySuricata handles the collection automatically
report = profile(lazy)
report.save_html("titanic_lazy_report.html")
print("LazyFrame report saved")

LazyFrame schema: Schema({'PassengerId': Int64, 'Survived': Int64, 'Pclass': Int64, 'Name': String, 'Sex': String, 'Age': Float64, 'SibSp': Int64, 'Parch': Int64, 'Ticket': String, 'Fare': Float64, 'Cabin': String, 'Embarked': String})


LazyFrame report saved


## 4. Statistics Only

`summarize` returns the computed statistics as a dictionary, which is useful when no HTML report is needed:

In [4]:
from pysuricata import summarize

stats = summarize(df)

print(f"Rows: {stats['dataset']['rows_est']}")
print(f"Columns: {stats['dataset']['cols']}")
print(f"Missing cells: {stats['dataset']['missing_cells_pct']:.1f}%")

Rows: 891
Columns: 12
Missing cells: 8.1%


In [5]:
# Column-level overview
for col_name, col_stats in stats["columns"].items():
    col_type = col_stats.get("type", "unknown")
    missing = col_stats.get("missing_pct", 0)
    print(f"  {col_name:20s}  type={col_type:12s}  missing={missing:.1f}%")

  PassengerId           type=numeric       missing=0.0%
  Survived              type=categorical   missing=0.0%
  Pclass                type=categorical   missing=0.0%
  Name                  type=categorical   missing=0.0%
  Sex                   type=categorical   missing=0.0%
  Age                   type=numeric       missing=0.0%
  SibSp                 type=categorical   missing=0.0%
  Parch                 type=categorical   missing=0.0%
  Ticket                type=categorical   missing=0.0%
  Fare                  type=numeric       missing=0.0%
  Cabin                 type=categorical   missing=0.0%
  Embarked              type=categorical   missing=0.0%


## 5. Custom Configuration

In [6]:
from pysuricata import ReportConfig

config = ReportConfig()
config.compute.chunk_size = 500
config.compute.random_seed = 42
config.render.title = "Polars Titanic Analysis"

report = profile(df, config=config)
report.save_html("titanic_polars_custom.html")
print("Custom polars report saved")

Custom polars report saved
