# PanelDataset + CausalTensor Quickstart

This walkthrough shows how to move panel-shaped data into `PanelDataset`, clean it with `align`, `balance`, `impute`, and `scale`, and then hand it off to CausalTensor models such as `SDID` and `robust_synthetic_control`. The snippets run in pure Python and can be copied into a notebook inside this repo.

## 1. Loading A Built-In Dataset

`from_builtin` reads the raw files under `src/causaltensor/datasets/raw`, keeps the entity/time labels, and returns a ready-to-chain `PanelDataset`. This path requires the optional dependency `pyreadr` for `.rda` files.
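Because `.rda` files need `pyreadr`, it can help to fail early with a clear message when it is missing. A minimal sketch using only the standard library; `has_optional_dependency` is a hypothetical helper, not part of CausalTensor:

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """Return True if `module_name` is importable, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


if not has_optional_dependency("pyreadr"):
    # Hypothetical guard: surface an install hint before loading .rda-backed datasets.
    print("pyreadr is not installed; run `pip install pyreadr` to load .rda datasets.")
```

`find_spec` only locates the module, so the check is cheap and has no import side effects.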

In [None]:
from causaltensor.datasets.panel_dataset import (
    PanelDataset,
    PanelAlignOptions,
    ImputeOptions,
    ScaleOptions,
)

panel = PanelDataset.from_builtin("smoking")
print(panel.O.shape, panel.Z.shape, panel.unit_index[:5], panel.time_index[:5])

## 2. Constructing From A Long DataFrame

`from_long` pivots a tidy DataFrame into matrices, builds the treatment panel (if provided), aggregates covariates per unit, and preserves the categorical labels so you can align with other datasets later.

In [None]:
import pandas as pd

long_df = pd.DataFrame(
    {
        "store": ["A"] * 4 + ["B"] * 4,
        "year": [2018, 2019, 2020, 2021] * 2,
        "sales": [13.2, 13.4, 12.1, 11.2, 16.3, 16.5, 15.9, 15.1],
        "promo": [0, 0, 1, 1, 0, 0, 0, 0],
        "foot_traffic": [40, 41, 42, 44, 39, 38, 36, 35],
    }
)

panel_long = PanelDataset.from_long(
    data=long_df,
    unit_col="store",
    time_col="year",
    outcome_col="sales",
    treat_col="promo",
    covar_cols=["foot_traffic"],
)
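To see roughly what `from_long` produces, the same reshaping can be sketched with plain pandas. This is an illustration under assumptions, not the library's code: rows are units, columns are periods, and covariates are aggregated with the per-unit mean (the aggregation `from_long` actually uses is not stated here). `O_wide`, `Z_wide`, and `X_unit` are hypothetical names.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "store": ["A"] * 4 + ["B"] * 4,
        "year": [2018, 2019, 2020, 2021] * 2,
        "sales": [13.2, 13.4, 12.1, 11.2, 16.3, 16.5, 15.9, 15.1],
        "promo": [0, 0, 1, 1, 0, 0, 0, 0],
        "foot_traffic": [40, 41, 42, 44, 39, 38, 36, 35],
    }
)

# Outcome matrix: one row per store (unit), one column per year (period).
O_wide = long_df.pivot(index="store", columns="year", values="sales")
# Treatment matrix with exactly the same row/column labels as the outcomes.
Z_wide = long_df.pivot(index="store", columns="year", values="promo")
# Assumed aggregation: one covariate row per unit via the per-unit mean.
X_unit = long_df.groupby("store")[["foot_traffic"]].mean()

print(O_wide)
print(X_unit)
```

Keeping the store and year labels on both matrices is what makes later label-based alignment possible.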

## 3. Constructing Directly From Arrays

Use `from_arrays` when the data already lives in NumPy (e.g., simulated experiments or downstream of another pipeline).

In [None]:
import numpy as np

O = np.random.randn(5, 10)  # outcomes: 5 units x 10 periods
Z = np.zeros_like(O)
Z[0, 5:] = 1  # treat the first unit from period index 5 (year 2005) onward
X = np.random.randn(5, 3)  # 3 covariates per unit

panel_arrays = PanelDataset.from_arrays(
    O=O,
    Z=Z,
    X=X,
    unit_index=[f"unit_{i}" for i in range(O.shape[0])],
    time_index=list(range(2000, 2000 + O.shape[1])),
)
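The arrays passed to `from_arrays` have to agree with each other. The check below spells out the contract implied by the example above (`O` and `Z` are units × periods, `X` has one row per unit, the label lists match the matrix axes). `check_panel_arrays` is a hypothetical helper, not part of `PanelDataset`, and the 0/1 requirement on `Z` is an assumption.

```python
import numpy as np


def check_panel_arrays(O, Z, X, unit_index, time_index):
    """Hypothetical validation of the shapes a from_arrays-style constructor relies on."""
    O, Z = np.asarray(O), np.asarray(Z)
    if O.ndim != 2:
        raise ValueError(f"O must be 2-D (units x periods), got ndim={O.ndim}")
    n_units, n_times = O.shape
    if Z.shape != O.shape:
        raise ValueError(f"Z has shape {Z.shape}, expected {O.shape}")
    if not np.isin(Z, (0, 1)).all():
        raise ValueError("Z must be a 0/1 treatment indicator")
    if X is not None and np.asarray(X).shape[0] != n_units:
        raise ValueError(f"X must have one row per unit ({n_units})")
    if len(unit_index) != n_units or len(time_index) != n_times:
        raise ValueError("unit_index/time_index lengths must match O's rows/columns")


O = np.random.randn(5, 10)
Z = np.zeros_like(O)
Z[0, 5:] = 1
X = np.random.randn(5, 3)
# Passes silently; a mismatched shape would raise ValueError instead.
check_panel_arrays(O, Z, X, [f"unit_{i}" for i in range(5)], list(range(2000, 2010)))
```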

## 4. Align, Balance, Impute, Scale

All `PanelDataset` methods return new instances, so they can be chained fluently.

In [None]:
target_years = list(range(1970, 2001))

panel_proc = (
    panel.align(
        units=panel.unit_index,
        times=target_years,
        options=PanelAlignOptions(drop_unknown_times=True),
    )
    .balance()
    .impute(ImputeOptions(strategy="ffill", axis="time"))
    .scale(ScaleOptions(strategy="standard", with_mean=True, clip_minmax=(-6, 6)))
)
O_proc, Z_proc, X_proc = panel_proc.to_matrices(copy=True)

- `align` reorders and optionally filters the unit/time axes (use `strict=True` to raise when requested labels are missing).
- `balance` removes fully missing slices to avoid numerical issues in models that expect dense blocks.
- `impute` currently supports `ffill`, `bfill`, and `median` strategies; choose the axis that matches your missingness pattern.
- `scale` offers `standard`, `minmax`, `robust`, and `log1p` options plus clipping to rein in outliers.
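The effect of these four steps can be approximated with plain pandas to make it concrete. This is a sketch under assumptions, not `PanelDataset`'s implementation: the panel is a wide DataFrame (units as rows, periods as columns), `standard` scaling uses one global mean and standard deviation rather than per-unit statistics, and every helper name is hypothetical. Each helper returns a new frame, so `DataFrame.pipe` chains them while leaving the input untouched, mirroring the fluent style above.

```python
import numpy as np
import pandas as pd


def align(df, units, times, strict=False):
    """Reorder/filter units (rows) and periods (columns); raise on unknown labels if strict."""
    if strict:
        missing = [u for u in units if u not in df.index] + [t for t in times if t not in df.columns]
        if missing:
            raise KeyError(f"requested labels not in panel: {missing}")
    return df.reindex(index=units, columns=times)


def balance(df):
    """Drop units and periods that are entirely missing."""
    return df.dropna(axis=0, how="all").dropna(axis=1, how="all")


def impute_ffill_time(df):
    """Carry each unit's last observed value forward in time; leading gaps stay NaN."""
    return df.ffill(axis=1)


def scale_standard(df, clip=(-6.0, 6.0)):
    """Standardize all entries with one global mean/std, then clip outliers."""
    values = df.to_numpy(dtype=float)
    std = np.nanstd(values)
    scaled = (values - np.nanmean(values)) / (std if std > 0 else 1.0)
    return pd.DataFrame(np.clip(scaled, *clip), index=df.index, columns=df.columns)


raw = pd.DataFrame(
    [[1.0, np.nan, 3.0, np.nan],
     [np.nan, np.nan, np.nan, np.nan],
     [2.0, 2.5, np.nan, np.nan]],
    index=["u1", "u2", "u3"],
    columns=[2000, 2001, 2002, 2003],
)

# u2 and 2003 are entirely missing, so balance() drops them.
balanced = raw.pipe(align, units=["u3", "u1", "u2"], times=[2000, 2001, 2002, 2003]).pipe(balance)
processed = balanced.pipe(impute_ffill_time).pipe(scale_standard)
print(processed.round(3))
```

Balancing before imputing matters here: forward filling an all-missing unit would leave it all-missing anyway, and it would distort the global mean used for scaling.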

## 5. Feeding CausalTensor Models

### SDID

In [None]:
from causaltensor.cauest.SDID import SDID

tau_sdid = SDID(O=O_proc, Z=Z_proc)
print(f"SDID ATE: {tau_sdid:.3f}")

The helper infers the treated units and the treatment start directly from `Z`. For manual control, pass `treat_units=[...]` and `starting_time=...` (indices relative to `panel_proc.unit_index` and `panel_proc.time_index`).
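How the helper reads `Z` is not spelled out here. The sketch below shows one plausible reading for a single adoption block, where some units switch on at a common period and stay treated. `infer_treatment_block` is hypothetical; its output merely has the shape of the positional `treat_units` / `starting_time` values mentioned above.

```python
import numpy as np


def infer_treatment_block(Z):
    """Return (treat_units, starting_time) as positional indices for a single treatment block."""
    Z = np.asarray(Z) != 0
    treat_units = np.flatnonzero(Z.any(axis=1)).tolist()
    if not treat_units:
        raise ValueError("Z has no treated entries")
    starting_time = int(np.flatnonzero(Z.any(axis=0))[0])
    # Every treated unit must be treated from starting_time to the last period, and nothing else.
    expected = np.zeros_like(Z)
    expected[np.ix_(treat_units, np.arange(starting_time, Z.shape[1]))] = True
    if not np.array_equal(Z, expected):
        raise ValueError("Z is not a single block (staggered or reversing treatment)")
    return treat_units, starting_time


time_index = list(range(2000, 2010))
Z = np.zeros((5, 10))
Z[0, 5:] = 1
treat_units, starting_time = infer_treatment_block(Z)
# Positional indices map back to period labels through the time index.
print(treat_units, starting_time, time_index[starting_time])  # [0] 5 2005
```

Going the other way, a label such as `2005` becomes the positional `starting_time` via `time_index.index(2005)`.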

### Robust Synthetic Control

In [None]:
from causaltensor.cauest.RobustSyntheticControl import robust_synthetic_control

Mhat, tau_rsc = robust_synthetic_control(O_proc, Z_proc)
print(f"Robust SC ATE: {tau_rsc:.3f}")

The preprocessing above (balancing, forward filling, scaling) leaves `O_proc` dense and on a common scale, which typically stabilizes the SVD that `robust_synthetic_control` relies on.
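To show where the SVD enters, here is a simplified single-treated-unit version of the robust synthetic control idea: denoise the donor pool with a truncated SVD, fit weights on the pre-treatment periods, and extrapolate the counterfactual. It is an illustrative sketch, not CausalTensor's `robust_synthetic_control`; `rsc_sketch`, its arguments, and the hand-picked `rank` are assumptions, and the library may choose the rank or regularization differently.

```python
import numpy as np


def rsc_sketch(O, treated_row, start, rank):
    """Simplified robust synthetic control for one treated unit (illustrative only)."""
    donors = np.delete(O, treated_row, axis=0)  # (n_donors, T)
    # Denoise the donor pool with a rank-`rank` truncated SVD.
    U, s, Vt = np.linalg.svd(donors, full_matrices=False)
    donors_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Fit donor weights on pre-treatment periods by least squares.
    y_pre = O[treated_row, :start]
    w, *_ = np.linalg.lstsq(donors_hat[:, :start].T, y_pre, rcond=None)
    counterfactual = w @ donors_hat  # (T,)
    tau = float(np.mean(O[treated_row, start:] - counterfactual[start:]))
    return counterfactual, tau


# Noise-free rank-2 panel: 8 units x 20 periods, unit 0 gets +3.0 from period 12 on.
rng = np.random.default_rng(0)
O = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 20))
O[0, 12:] += 3.0
counterfactual, tau = rsc_sketch(O, treated_row=0, start=12, rank=2)
print(f"{tau:.3f}")  # 3.000
```

With real, noisy data the truncation is what does the work: dropping small singular values removes noise before the weights are fit, which is why missing entries and wildly different scales across units can make that step fragile.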

## 6. Inspecting The Result

In [None]:
O_df, Z_df, X_df = panel_proc.to_dataframes()
print(O_df.head())

Because the matrices stay aligned through the pipeline, you can safely join these frames with any downstream visualization or validation routines.
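One way to do such a join is to melt the aligned frames back into long form. The sketch assumes `to_dataframes` returns frames with units as the index and periods as the columns, which the surrounding text implies but does not state; the stand-in frames and the `to_long` helper are hypothetical.

```python
import numpy as np
import pandas as pd


def to_long(O_df, Z_df):
    """Melt aligned wide outcome/treatment frames into one long frame (unit, time, outcome, treated)."""
    if not (O_df.index.equals(Z_df.index) and O_df.columns.equals(Z_df.columns)):
        raise ValueError("O_df and Z_df must share the same unit and time labels")
    o = O_df.rename_axis("unit").reset_index().melt(id_vars="unit", var_name="time", value_name="outcome")
    z = Z_df.rename_axis("unit").reset_index().melt(id_vars="unit", var_name="time", value_name="treated")
    return o.merge(z, on=["unit", "time"], validate="one_to_one")


# Hypothetical stand-ins for the frames returned by panel_proc.to_dataframes().
units, years = ["u0", "u1", "u2"], [2000, 2001, 2002]
O_df = pd.DataFrame(np.arange(9, dtype=float).reshape(3, 3), index=units, columns=years)
Z_df = pd.DataFrame([[0, 1, 1], [0, 0, 0], [0, 0, 0]], index=units, columns=years)

long_panel = to_long(O_df, Z_df)
# Mean outcome per period for treated vs. untreated cells, e.g. for a quick trend plot.
print(long_panel.groupby(["time", "treated"])["outcome"].mean())
```

The label check up front is cheap insurance: if either frame was reordered or filtered separately, the merge would silently pair the wrong cells.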