# Streaming Examples: CSV, TSV, Parquet, pandas, polars

This notebook demonstrates how to:
- Generate small CSV/TSV/Parquet files for testing.
- Stream each format in chunks using `pysuricata.io.iter_chunks(...)`.
- Stream in-memory `pandas` and `polars` DataFrames.
- Build a report with `pysuricata.api.profile(...)` from any of these sources.

Notes:
- Parquet streaming requires `pyarrow`.
- Polars examples require `polars`.
- All computations operate on pandas DataFrame chunks under the hood.
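
The chunked-streaming idea behind the notes above can be sketched with the standard library alone. This is not how `iter_chunks` is implemented (its internals are not shown in this notebook); `iter_row_chunks` is a hypothetical helper that only illustrates reading a fixed number of rows at a time instead of loading a whole file.

```python
import csv
import io
from itertools import islice


def iter_row_chunks(lines, chunk_size):
    """Yield lists of at most `chunk_size` row dicts from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    while True:
        # islice pulls at most chunk_size rows without reading the rest of the input.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# A tiny in-memory CSV stands in for a file on disk.
text = "id,amount\n" + "".join(f"{i},{i * 1.5}\n" for i in range(10))
chunk_sizes = [len(chunk) for chunk in iter_row_chunks(io.StringIO(text), chunk_size=4)]
print(chunk_sizes)  # [4, 4, 2]
```

Only one chunk is held in memory at a time, which is what keeps peak memory bounded for large inputs.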

In [1]:
import os
import numpy as np
import pandas as pd

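# Optional dependency: polars (the polars cells are skipped when it is missing).
try:
    import polars as pl
except ImportError:
    pl = None

# Optional dependency: pyarrow (needed to write and stream Parquet).
try:
    import pyarrow as pa  # noqa: F401
except ImportError:
    pa = None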

from pysuricata.api import profile, ReportConfig
from pysuricata.io import iter_chunks

BASE = 'examples'
os.makedirs(BASE, exist_ok=True)
CSV_PATH = os.path.join(BASE, 'demo_users.csv')
TSV_PATH = os.path.join(BASE, 'demo_users.tsv')
PARQUET_PATH = os.path.join(BASE, 'demo_users.parquet')

def tiny_print(title, msg):
    print(f'[{title}] {msg}')
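
The optional-import guards in the cell above can also be written as an availability check that does not import the package. The sketch below uses `importlib.util.find_spec` from the standard library; `has_module` is a hypothetical helper name, not part of pysuricata. Unlike a `try`/`except` around the import, it does not detect packages that are installed but fail at import time.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


# "json" ships with Python; polars and pyarrow depend on the environment.
available = {name: has_module(name) for name in ("json", "polars", "pyarrow")}
print(available)
```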


## Generate a synthetic dataset and write CSV/TSV/Parquet

In [2]:
rng = np.random.default_rng(42)  # fixed seed so the dataset is reproducible
N = 10_000
countries = np.array(['US','ES','DE','FR','GB','BR','IN','CN','JP'])
df = pd.DataFrame({
    'id': np.arange(N),
    'amount': rng.normal(100.0, 15.0, size=N).round(2),
    'country': countries[rng.integers(0, len(countries), size=N)],
    'flag': rng.integers(0, 2, size=N).astype(bool),
    # Epoch 1.6e9 s is 2020-09-13 12:26:40 UTC; add a random offset of up to 30 days.
    'ts': pd.to_datetime(1.6e9 + rng.integers(0, 86_400*30, size=N), unit='s', utc=True),
})

# Save CSV/TSV
df.to_csv(CSV_PATH, index=False)
df.to_csv(TSV_PATH, index=False, sep='\t')

# Save Parquet when pyarrow is available
if pa is not None:
    df.to_parquet(PARQUET_PATH, index=False)
    tiny_print('PARQUET', f'Wrote {PARQUET_PATH}')
else:
    tiny_print('PARQUET', 'pyarrow not available, skipping write')

tiny_print('CSV', f'Wrote {CSV_PATH}')
tiny_print('TSV', f'Wrote {TSV_PATH}')
df.head()


[PARQUET] pyarrow not available, skipping write
[CSV] Wrote examples/demo_users.csv
[TSV] Wrote examples/demo_users.tsv


Unnamed: 0,id,amount,country,flag,ts
0,0,104.57,CN,True,2020-09-20 07:28:31+00:00
1,1,84.4,CN,False,2020-09-20 21:37:14+00:00
2,2,111.26,IN,False,2020-10-09 08:42:53+00:00
3,3,114.11,DE,True,2020-10-01 23:16:13+00:00
4,4,70.73,BR,True,2020-10-13 07:43:08+00:00


## Chunked CSV with `iter_chunks`

In [3]:
# Stream the CSV in 2,000-row chunks and count the rows seen.
rows = 0
for i, ch in enumerate(iter_chunks(CSV_PATH, chunk_size=2_000, read_csv_kwargs={'sep': ','})):
    rows += len(ch)
    if i < 2:
        tiny_print('CSV-CHUNK', f'chunk {i+1} shape={ch.shape}')
tiny_print('CSV-CHUNK', f'total rows seen={rows}')

rep_csv = profile(CSV_PATH, config=ReportConfig())
rep_csv.save_html(os.path.join(BASE, 'report_csv.html'))
tiny_print('CSV-REPORT', 'saved examples/report_csv.html')
rep_csv  # displays in notebook


2025-09-05 13:47:14,044 INFO pysuricata.report_v2: Starting report generation: source=examples/demo_users.csv
2025-09-05 13:47:14,044 INFO pysuricata.report_v2: chunk_size=200000, uniques_k=2048, numeric_sample_k=20000, topk_k=50
2025-09-05 13:47:14,044 INFO pysuricata.report_v2: ▶ Build chunk iterator...
2025-09-05 13:47:14,045 INFO pysuricata.report_v2: ✓ Build chunk iterator done (0.00s)
2025-09-05 13:47:14,045 INFO pysuricata.report_v2: ▶ Read first chunk...
2025-09-05 13:47:14,052 INFO pysuricata.report_v2: ✓ Read first chunk done (0.01s)
2025-09-05 13:47:14,077 INFO pysuricata.report_v2: ▶ Infer kinds & build accumulators...
2025-09-05 13:47:14,092 INFO pysuricata.report_v2: ✓ Infer kinds & build accumulators done (0.02s)
2025-09-05 13:47:14,094 INFO pysuricata.report_v2: ▶ Consume first chunk...
2025-09-05 13:47:14,197 INFO pysuricata.report_v2: ✓ Consume first chunk done (0.10s)
2025-09-05 13:47:14,197 INFO pysuricata.report_v2: kinds: 2 numeric, 1 categorical, 1 datetime, 1 bo

[CSV-CHUNK] chunk 1 shape=(2000, 5)
[CSV-CHUNK] chunk 2 shape=(2000, 5)
[CSV-CHUNK] total rows seen=10000


2025-09-05 13:47:14,246 INFO pysuricata.report_v2: ✓ Render final HTML done (0.02s)
2025-09-05 13:47:14,246 INFO pysuricata.report_v2: ✓ Render Variables section done (0.03s)
2025-09-05 13:47:14,259 INFO pysuricata.report_v2: Report generation complete in 0.22s


[CSV-REPORT] saved examples/report_csv.html


Unnamed: 0,id,amount,country,flag,ts
1505,1505,107.48,ES,True,2020-10-06 00:01:32+00:00
2614,2614,72.15,BR,True,2020-09-14 06:33:45+00:00
1757,1757,134.01,US,True,2020-09-23 20:40:03+00:00
6669,6669,86.34,FR,True,2020-09-15 14:16:58+00:00
4731,4731,95.62,FR,False,2020-10-02 22:15:03+00:00
2504,2504,118.27,US,False,2020-09-30 00:51:35+00:00
1271,1271,121.74,JP,True,2020-10-13 09:46:35+00:00
8096,8096,96.79,IN,True,2020-09-28 11:15:03+00:00
5352,5352,114.87,DE,False,2020-09-15 07:51:44+00:00
1822,1822,118.33,FR,True,2020-09-21 18:30:01+00:00

0,1
Count,10000
Unique,"9,864 (≈)"
Missing,0 (0.0%)
Outliers,0 (0.0%)
Zeros,1 (0.0%)
Infinites,0 (0.0%)
Negatives,0 (0.0%)

0,1
Min,0
Median,5000
Mean,5000
Max,9999
Q1,2500
Q3,7499
Processed bytes,78.1 KB (≈)

0,1
Min,0.0
P1,99.99
P5,500.0
P10,999.9
Q1 (P25),2500.0
Median (P50),5000.0
Q3 (P75),7499.0
P90,8999.0
P95,9499.0
P99,9899.0

0,1
Mean,5000
Std,2887
Variance,8.334e+06
SE (mean),28.87
95% CI (mean),"4,943 – 5,056"
Coeff. Variation,0.5774
Geo-Mean,
MAD,2500
Skewness,0
Kurtosis (excess),-1.2

0,1
0,0
1,1
2,2
3,3
4,4

0,1
9999,9999
9998,9998
9997,9997
9996,9996
9995,9995

0,1
Count,10000
Unique,"9,686 (≈)"
Missing,0 (0.0%)
Outliers,95 (0.9%)
Zeros,0 (0.0%)
Infinites,0 (0.0%)
Negatives,0 (0.0%)

0,1
Min,34.16
Median,99.8
Mean,99.85
Max,160.4
Q1,89.84
Q3,109.8
Processed bytes,78.1 KB (≈)

0,1
Min,34.16
P1,64.39
P5,74.89
P10,80.3
Q1 (P25),89.84
Median (P50),99.8
Q3 (P75),109.8
P90,119.2
P95,124.6
P99,135.3

0,1
Mean,99.85
Std,15.1
Variance,227.9
SE (mean),0.151
95% CI (mean),99.55 – 100.1
Coeff. Variation,0.1512
Geo-Mean,98.67
MAD,9.955
Skewness,-0.003365
Kurtosis (excess),0.05268

0,1
5627,34.16
949,45.27
8617,47.06
5759,48.87
9208,51.05

0,1
8091,160.4
4141,151.8
9047,151.5
7138,149.4
4222,149.1

0,1
5627,34.16
8091,160.4
949,45.27
8617,47.06
4141,151.8

0,1
Count,10000
Unique,"6,522 (≈)"
Missing,0 (0.0%)
Mode,CN
Mode %,11.6%
Processed bytes,19.5 KB (≈)

0,1
Entropy,3.169
Rare levels,0 (0.0%)
Top 5 coverage,56.7%
Label length (avg),2
Length p90,2
Empty strings,0

0,1
Count,10000
Missing,0 (0.0%)
Unique,2
Processed bytes,9.9 KB (≈)

0,1
True,"5,022 (50.2%)"
False,"4,978 (49.8%)"

0,1
Count,10000
Missing,0 (0.0%)
Min,2020-09-13T12:26:47Z
Max,2020-10-13T12:25:54Z
Processed bytes,722.8 KB (≈)

0,1
Hour,▇▇▇▇▇▇▇▇▆▇█▆▇▇▇▇▇▇▇▇▇▇▇▇
Day of week,█▇▆▆▆▆▇
Month,▁▁▁▁▁▁▁▁█▆▁▁

Hour,Count
0,401
1,403
2,399
3,412
4,410
5,432
6,400
7,449
8,381
9,417

Day,Count
Mon,1708
Tue,1553
Wed,1286
Thu,1370
Fri,1331
Sat,1268
Sun,1484

Month,Count
Jan,0
Feb,0
Mar,0
Apr,0
May,0
Jun,0
Jul,0
Aug,0
Sep,5822
Oct,4178
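
The log above mentions building accumulators and then consuming chunks, which suggests that statistics such as the mean and standard deviation are updated incrementally. A minimal sketch of that idea is Welford's online update, shown here with the standard library. Whether pysuricata uses this exact algorithm is an assumption, and `RunningMoments` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningMoments:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, values):
        """Fold one chunk of numbers into the running state (Welford's update)."""
        for x in values:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN with fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


acc = RunningMoments()
data = list(range(10_000))  # same values as the `id` column
for start in range(0, len(data), 2_000):
    acc.update(data[start:start + 2_000])  # one chunk at a time

print(acc.n, acc.mean, math.sqrt(acc.variance))
```

For the values 0 to 9,999 this gives a mean of 4,999.5 and a standard deviation close to 2,887, in line with the rounded `Mean` and `Std` rows reported for `id`.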


## Chunked TSV with `iter_chunks`

In [4]:
rows = 0
for i, ch in enumerate(iter_chunks(TSV_PATH, chunk_size=2_000, read_csv_kwargs={'sep': '\t'})):
    rows += len(ch)
    if i < 2:
        tiny_print('TSV-CHUNK', f'chunk {i+1} shape={ch.shape}')
tiny_print('TSV-CHUNK', f'total rows seen={rows}')

rep_tsv = profile(TSV_PATH, config=ReportConfig())
rep_tsv.save_html(os.path.join(BASE, 'report_tsv.html'))
tiny_print('TSV-REPORT', 'saved examples/report_tsv.html')
rep_tsv


2025-09-05 13:47:14,321 INFO pysuricata.report_v2: Starting report generation: source=examples/demo_users.tsv
2025-09-05 13:47:14,321 INFO pysuricata.report_v2: chunk_size=200000, uniques_k=2048, numeric_sample_k=20000, topk_k=50
2025-09-05 13:47:14,322 INFO pysuricata.report_v2: ▶ Build chunk iterator...
2025-09-05 13:47:14,322 INFO pysuricata.report_v2: ✓ Build chunk iterator done (0.00s)
2025-09-05 13:47:14,322 INFO pysuricata.report_v2: ▶ Read first chunk...
2025-09-05 13:47:14,327 INFO pysuricata.report_v2: ✓ Read first chunk done (0.00s)
2025-09-05 13:47:14,345 INFO pysuricata.report_v2: ▶ Infer kinds & build accumulators...
2025-09-05 13:47:14,357 INFO pysuricata.report_v2: ✓ Infer kinds & build accumulators done (0.01s)
2025-09-05 13:47:14,357 INFO pysuricata.report_v2: ▶ Consume first chunk...
2025-09-05 13:47:14,451 INFO pysuricata.report_v2: ✓ Consume first chunk done (0.09s)
2025-09-05 13:47:14,451 INFO pysuricata.report_v2: kinds: 2 numeric, 1 categorical, 1 datetime, 1 bo

[TSV-CHUNK] chunk 1 shape=(2000, 5)
[TSV-CHUNK] chunk 2 shape=(2000, 5)
[TSV-CHUNK] total rows seen=10000


2025-09-05 13:47:14,471 INFO pysuricata.report_v2: ▶ Render Variables section...
2025-09-05 13:47:14,471 INFO pysuricata.report_v2: rendered 5 variable cards
2025-09-05 13:47:14,471 INFO pysuricata.report_v2: ▶ Render final HTML...
2025-09-05 13:47:14,491 INFO pysuricata.report_v2: ✓ Render final HTML done (0.02s)
2025-09-05 13:47:14,491 INFO pysuricata.report_v2: ✓ Render Variables section done (0.02s)
2025-09-05 13:47:14,506 INFO pysuricata.report_v2: Report generation complete in 0.19s


[TSV-REPORT] saved examples/report_tsv.html


Unnamed: 0,id,amount,country,flag,ts
9128,9128,108.93,ES,False,2020-09-16 20:55:13+00:00
5237,5237,105.04,ES,False,2020-09-26 17:24:49+00:00
2885,2885,104.55,DE,True,2020-10-07 16:12:25+00:00
6652,6652,87.44,US,True,2020-10-12 07:58:55+00:00
3332,3332,71.72,GB,False,2020-09-24 10:04:08+00:00
7443,7443,83.3,BR,True,2020-09-22 21:14:45+00:00
266,266,102.33,FR,True,2020-09-15 10:45:15+00:00
7460,7460,105.26,BR,True,2020-09-30 05:33:30+00:00
375,375,134.91,ES,False,2020-09-28 18:29:50+00:00
3763,3763,108.85,DE,True,2020-09-20 10:28:50+00:00

0,1
Count,10000
Unique,"9,864 (≈)"
Missing,0 (0.0%)
Outliers,0 (0.0%)
Zeros,1 (0.0%)
Infinites,0 (0.0%)
Negatives,0 (0.0%)

0,1
Min,0
Median,5000
Mean,5000
Max,9999
Q1,2500
Q3,7499
Processed bytes,78.1 KB (≈)

0,1
Min,0.0
P1,99.99
P5,500.0
P10,999.9
Q1 (P25),2500.0
Median (P50),5000.0
Q3 (P75),7499.0
P90,8999.0
P95,9499.0
P99,9899.0

0,1
Mean,5000
Std,2887
Variance,8.334e+06
SE (mean),28.87
95% CI (mean),"4,943 – 5,056"
Coeff. Variation,0.5774
Geo-Mean,
MAD,2500
Skewness,0
Kurtosis (excess),-1.2

0,1
0,0
1,1
2,2
3,3
4,4

0,1
9999,9999
9998,9998
9997,9997
9996,9996
9995,9995

0,1
Count,10000
Unique,"9,686 (≈)"
Missing,0 (0.0%)
Outliers,95 (0.9%)
Zeros,0 (0.0%)
Infinites,0 (0.0%)
Negatives,0 (0.0%)

0,1
Min,34.16
Median,99.8
Mean,99.85
Max,160.4
Q1,89.84
Q3,109.8
Processed bytes,78.1 KB (≈)

0,1
Min,34.16
P1,64.39
P5,74.89
P10,80.3
Q1 (P25),89.84
Median (P50),99.8
Q3 (P75),109.8
P90,119.2
P95,124.6
P99,135.3

0,1
Mean,99.85
Std,15.1
Variance,227.9
SE (mean),0.151
95% CI (mean),99.55 – 100.1
Coeff. Variation,0.1512
Geo-Mean,98.67
MAD,9.955
Skewness,-0.003365
Kurtosis (excess),0.05268

0,1
5627,34.16
949,45.27
8617,47.06
5759,48.87
9208,51.05

0,1
8091,160.4
4141,151.8
9047,151.5
7138,149.4
4222,149.1

0,1
5627,34.16
8091,160.4
949,45.27
8617,47.06
4141,151.8

0,1
Count,10000
Unique,"6,522 (≈)"
Missing,0 (0.0%)
Mode,CN
Mode %,11.6%
Processed bytes,19.5 KB (≈)

0,1
Entropy,3.169
Rare levels,0 (0.0%)
Top 5 coverage,56.7%
Label length (avg),2
Length p90,2
Empty strings,0

0,1
Count,10000
Missing,0 (0.0%)
Unique,2
Processed bytes,9.9 KB (≈)

0,1
True,"5,022 (50.2%)"
False,"4,978 (49.8%)"

0,1
Count,10000
Missing,0 (0.0%)
Min,2020-09-13T12:26:47Z
Max,2020-10-13T12:25:54Z
Processed bytes,722.8 KB (≈)

0,1
Hour,▇▇▇▇▇▇▇▇▆▇█▆▇▇▇▇▇▇▇▇▇▇▇▇
Day of week,█▇▆▆▆▆▇
Month,▁▁▁▁▁▁▁▁█▆▁▁

Hour,Count
0,401
1,403
2,399
3,412
4,410
5,432
6,400
7,449
8,381
9,417

Day,Count
Mon,1708
Tue,1553
Wed,1286
Thu,1370
Fri,1331
Sat,1268
Sun,1484

Month,Count
Jan,0
Feb,0
Mar,0
Apr,0
May,0
Jun,0
Jul,0
Aug,0
Sep,5822
Oct,4178
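
The `(≈)` marker next to the `Unique` counts, together with the `uniques_k=2048` setting in the log, suggests that distinct values are estimated in bounded memory rather than counted exactly. One common technique for this is a k-minimum-values (KMV) sketch. The code below is a sketch of that technique only; that pysuricata uses KMV is an assumption, and `KMVSketch` and `hash01` are hypothetical names.

```python
import hashlib
import heapq


def hash01(value):
    """Map a value to a deterministic, roughly uniform float in [0, 1]."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


class KMVSketch:
    """Keep only the k smallest distinct hash values seen so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []        # max-heap of kept hashes, stored negated
        self._members = set()  # hashes currently in the heap

    def add(self, value):
        h = hash01(value)
        if h in self._members:
            return  # already tracked
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # exact while fewer than k distinct hashes
        # The k-th smallest hash sits at about k / n for n distinct values.
        return round((self.k - 1) / -self._heap[0])


sketch = KMVSketch(k=2048)
for value in range(10_000):
    sketch.add(value)

small = KMVSketch(k=2048)
for code in ["US", "ES", "DE"] * 100:
    small.add(code)

print(sketch.estimate(), small.estimate())
```

With `k=2048` the relative error is typically a few percent, and memory stays bounded by `k` regardless of how many rows are streamed.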


## Chunked Parquet with `iter_chunks` (pyarrow)

In [5]:
if pa is not None and os.path.exists(PARQUET_PATH):
    rows = 0
    for i, ch in enumerate(iter_chunks(PARQUET_PATH, chunk_size=2_000)):
        rows += len(ch)
        if i < 2:
            tiny_print('PQ-CHUNK', f'chunk {i+1} shape={ch.shape}')
    tiny_print('PQ-CHUNK', f'total rows seen={rows}')
    
    rep_pq = profile(PARQUET_PATH, config=ReportConfig())
    rep_pq.save_html(os.path.join(BASE, 'report_parquet.html'))
    tiny_print('PQ-REPORT', 'saved examples/report_parquet.html')
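    # A bare expression inside an if-block is not auto-displayed by the notebook.
    from IPython.display import display
    display(rep_pq)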
else:
    tiny_print('PQ', 'pyarrow not available or parquet not written; skipping')


[PQ] pyarrow not available or parquet not written; skipping


## In-memory pandas DataFrame (chunked)

In [6]:
rows = 0
for i, ch in enumerate(iter_chunks(df, chunk_size=1_500)):
    rows += len(ch)
    if i < 2:
        tiny_print('PD-CHUNK', f'chunk {i+1} shape={ch.shape}')
tiny_print('PD-CHUNK', f'total rows seen={rows}')

rep_pd = profile(iter_chunks(df, chunk_size=2_000))
rep_pd.save_html(os.path.join(BASE, 'report_pandas_inmemory.html'))
tiny_print('PD-REPORT', 'saved examples/report_pandas_inmemory.html')
rep_pd


2025-09-05 13:47:14,558 INFO pysuricata.report_v2: Starting report generation: source=DataFrame
2025-09-05 13:47:14,558 INFO pysuricata.report_v2: chunk_size=200000, uniques_k=2048, numeric_sample_k=20000, topk_k=50
2025-09-05 13:47:14,559 INFO pysuricata.report_v2: ▶ Build chunk iterator...
2025-09-05 13:47:14,559 INFO pysuricata.report_v2: ✓ Build chunk iterator done (0.00s)
2025-09-05 13:47:14,559 INFO pysuricata.report_v2: ▶ Read first chunk...
2025-09-05 13:47:14,559 INFO pysuricata.report_v2: ✓ Read first chunk done (0.00s)
2025-09-05 13:47:14,567 INFO pysuricata.report_v2: ▶ Infer kinds & build accumulators...
2025-09-05 13:47:14,569 INFO pysuricata.report_v2: ✓ Infer kinds & build accumulators done (0.00s)
2025-09-05 13:47:14,569 INFO pysuricata.report_v2: ▶ Consume first chunk...
2025-09-05 13:47:14,589 INFO pysuricata.report_v2: ✓ Consume first chunk done (0.02s)
2025-09-05 13:47:14,589 INFO pysuricata.report_v2: kinds: 2 numeric, 1 categorical, 1 datetime, 1 boolean
2025-09-

[PD-CHUNK] chunk 1 shape=(1500, 5)
[PD-CHUNK] chunk 2 shape=(1500, 5)
[PD-CHUNK] total rows seen=10000


2025-09-05 13:47:14,725 INFO pysuricata.report_v2: ✓ Render final HTML done (0.02s)
2025-09-05 13:47:14,725 INFO pysuricata.report_v2: ✓ Render Variables section done (0.02s)
2025-09-05 13:47:14,737 INFO pysuricata.report_v2: Report generation complete in 0.18s


[PD-REPORT] saved examples/report_pandas_inmemory.html


Unnamed: 0,id,amount,country,flag,ts
1495,1495,83.44,DE,False,2020-10-05 20:20:43+00:00
341,341,104.57,ES,True,2020-09-22 12:51:32+00:00
412,412,89.83,US,True,2020-09-17 15:16:45+00:00
1090,1090,91.06,ES,False,2020-10-04 10:15:55+00:00
837,837,121.41,CN,True,2020-10-11 18:07:28+00:00
95,95,78.29,CN,False,2020-09-29 14:24:01+00:00
491,491,105.5,JP,False,2020-10-03 18:56:05+00:00
1978,1978,123.63,IN,False,2020-09-21 14:34:11+00:00
1515,1515,103.15,IN,True,2020-10-09 23:22:58+00:00
94,94,74.69,DE,True,2020-09-15 14:26:52+00:00

0,1
Count,10000
Unique,"9,864 (≈)"
Missing,0 (0.0%)
Outliers,0 (0.0%)
Zeros,1 (0.0%)
Infinites,0 (0.0%)
Negatives,0 (0.0%)

0,1
Min,0
Median,5000
Mean,5000
Max,9999
Q1,2500
Q3,7499
Processed bytes,78.1 KB (≈)

0,1
Min,0.0
P1,99.99
P5,500.0
P10,999.9
Q1 (P25),2500.0
Median (P50),5000.0
Q3 (P75),7499.0
P90,8999.0
P95,9499.0
P99,9899.0

0,1
Mean,5000
Std,2887
Variance,8.334e+06
SE (mean),28.87
95% CI (mean),"4,943 – 5,056"
Coeff. Variation,0.5774
Geo-Mean,
MAD,2500
Skewness,0
Kurtosis (excess),-1.2

0,1
0,0
1,1
2,2
3,3
4,4

0,1
9999,9999
9998,9998
9997,9997
9996,9996
9995,9995

0,1
Count,10000
Unique,"9,686 (≈)"
Missing,0 (0.0%)
Outliers,95 (0.9%)
Zeros,0 (0.0%)
Infinites,0 (0.0%)
Negatives,0 (0.0%)

0,1
Min,34.16
Median,99.8
Mean,99.85
Max,160.4
Q1,89.84
Q3,109.8
Processed bytes,78.1 KB (≈)

0,1
Min,34.16
P1,64.39
P5,74.89
P10,80.3
Q1 (P25),89.84
Median (P50),99.8
Q3 (P75),109.8
P90,119.2
P95,124.6
P99,135.3

0,1
Mean,99.85
Std,15.1
Variance,227.9
SE (mean),0.151
95% CI (mean),99.55 – 100.1
Coeff. Variation,0.1512
Geo-Mean,98.67
MAD,9.955
Skewness,-0.003365
Kurtosis (excess),0.05268

0,1
5627,34.16
949,45.27
8617,47.06
5759,48.87
9208,51.05

0,1
8091,160.4
4141,151.8
9047,151.5
7138,149.4
4222,149.1

0,1
5627,34.16
8091,160.4
949,45.27
8617,47.06
4141,151.8

0,1
Count,10000
Unique,"6,522 (≈)"
Missing,0 (0.0%)
Mode,CN
Mode %,11.6%
Processed bytes,19.5 KB (≈)

0,1
Entropy,3.169
Rare levels,0 (0.0%)
Top 5 coverage,56.7%
Label length (avg),2
Length p90,2
Empty strings,0

0,1
Count,10000
Missing,0 (0.0%)
Unique,2
Processed bytes,10.4 KB (≈)

0,1
True,"5,022 (50.2%)"
False,"4,978 (49.8%)"

0,1
Count,10000
Missing,0 (0.0%)
Min,2020-09-13T12:26:47Z
Max,2020-10-13T12:25:54Z
Processed bytes,78.8 KB (≈)

0,1
Hour,▇▇▇▇▇▇▇▇▆▇█▆▇▇▇▇▇▇▇▇▇▇▇▇
Day of week,█▇▆▆▆▆▇
Month,▁▁▁▁▁▁▁▁█▆▁▁

Hour,Count
0,401
1,403
2,399
3,412
4,410
5,432
6,400
7,449
8,381
9,417

Day,Count
Mon,1708
Tue,1553
Wed,1286
Thu,1370
Fri,1331
Sat,1268
Sun,1484

Month,Count
Jan,0
Feb,0
Mar,0
Apr,0
May,0
Jun,0
Jul,0
Aug,0
Sep,5822
Oct,4178
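
The chunk shapes printed above (1,500 rows each, 10,000 rows in total) follow from slicing an in-memory table in fixed-size steps. A minimal sketch of that slicing with a plain Python list is shown below; `iter_slices` is a hypothetical helper, and how `iter_chunks` splits a DataFrame internally is not shown in this notebook.

```python
def iter_slices(rows, chunk_size):
    """Yield consecutive slices of at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


rows = list(range(10_000))
sizes = [len(chunk) for chunk in iter_slices(rows, chunk_size=1_500)]
print(sizes)       # [1500, 1500, 1500, 1500, 1500, 1500, 1000]
print(sum(sizes))  # 10000
```

The last chunk is shorter whenever the row count is not a multiple of the chunk size, so downstream code should not assume equal chunk lengths.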


## In-memory polars DataFrame (chunked)

In [7]:
if pl is not None:
    pl_df = pl.from_pandas(df)
    rows = 0
    for i, ch in enumerate(iter_chunks(pl_df, chunk_size=2_000)):
        rows += len(ch)
        if i < 2:
            tiny_print('PL-CHUNK', f'chunk {i+1} shape={ch.shape}')
    tiny_print('PL-CHUNK', f'total rows seen={rows}')
    
    rep_pl = profile(iter_chunks(pl_df, chunk_size=2_000))
    rep_pl.save_html(os.path.join(BASE, 'report_polars_inmemory.html'))
    tiny_print('PL-REPORT', 'saved examples/report_polars_inmemory.html')
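    # A bare expression inside an if-block is not auto-displayed by the notebook.
    from IPython.display import display
    display(rep_pl)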
else:
    tiny_print('POLARS', 'polars not available; skipping')


[POLARS] polars not available; skipping


## Optional: polars LazyFrame (chunked)

In [8]:
if pl is not None:
    lf = pl.from_pandas(df).lazy()
    rep_pl_lazy = profile(iter_chunks(lf, chunk_size=2_000))
    rep_pl_lazy.save_html(os.path.join(BASE, 'report_polars_lazyframe.html'))
    tiny_print('PL-LAZY', 'saved examples/report_polars_lazyframe.html')
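    # A bare expression inside an if-block is not auto-displayed by the notebook.
    from IPython.display import display
    display(rep_pl_lazy)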
else:
    tiny_print('POLARS', 'polars not available; skipping lazy example')


[POLARS] polars not available; skipping lazy example
