# Getting Started – Running the CLI (`main.py`)

This section shows how to run the `main.py` entry point from the shell and from Python, and how to vary its control parameters (bootstrap replicates, energy test, parallelism, seed).


## CLI usage examples

Run from the repo root (PowerShell):

```powershell
# Minimal
python .\main.py --input X_train.parquet --output-pred X_train_predictors.parquet --output-meta X_train_metadata.parquet

# More bootstrap reps, verbose
python .\main.py -i X_train.parquet -op X_train_predictors.parquet -om X_train_metadata.parquet -b 120 -v

# Enable energy tests with custom permutations and max-n
python .\main.py -i X_train.parquet -op X_train_predictors.parquet -om X_train_metadata.parquet --energy-enable -ep 80 -emn 500

# Limit parallelism and set seed
python .\main.py -i X_train.parquet -op X_train_predictors.parquet -om X_train_metadata.parquet -j 2 -s 42
```


In [None]:
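# Run main.py programmatically (no shell)
import subprocess
import sys

def run_main(args):
    """Run main.py with the current interpreter; raises CalledProcessError on a non-zero exit."""
    cmd = [sys.executable, "main.py", *args]
    print("Running:", " ".join(cmd))
    # Output is not captured, so main.py's logs stream directly into the cell output.
    result = subprocess.run(cmd, capture_output=False, check=True)
    return result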

# Example: minimal
run_main(["--input", "X_train.parquet",
          "--output-pred", "X_train_predictors.parquet",
          "--output-meta", "X_train_metadata.parquet"])
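
When a run fails, it can help to capture the output instead of streaming it. The following is a minimal sketch of such a variant, not part of the project: the name `run_main_captured` and its `script` and `timeout` parameters are assumptions introduced here for illustration. It returns the exit code together with the captured streams instead of raising.

```python
import subprocess
import sys


def run_main_captured(args, script="main.py", timeout=None):
    """Run `script` with the current interpreter and return (returncode, stdout, stderr).

    Hypothetical helper: unlike run_main, it never raises on a non-zero exit code.
    """
    cmd = [sys.executable, script, *args]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Show only the tail of stderr so a long traceback stays readable.
        print(f"Command failed with exit code {result.returncode}", file=sys.stderr)
        print(result.stderr[-2000:], file=sys.stderr)
    return result.returncode, result.stdout, result.stderr
```

A call such as `run_main_captured(["--input", "X_train.parquet", "--output-pred", "X_train_predictors.parquet", "--output-meta", "X_train_metadata.parquet"])` would then let the caller inspect `stdout` and `stderr` after the run finishes.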


In [None]:
# Parameter variations: loop through different configurations
from itertools import product

inputs = ["X_train.parquet"]
boots = [40, 80]
energy_flags = [False, True]
energy_perms = [40]
max_n = [400]
seeds = [123, 42]

for input_path, b, eflag, eperm, emn, seed in product(inputs, boots, energy_flags, energy_perms, max_n, seeds):
    args = [
        "--input", input_path,
        "--output-pred", f"predictors_b{b}_e{int(eflag)}_s{seed}.parquet",
        "--output-meta", f"metadata_b{b}_e{int(eflag)}_s{seed}.parquet",
        "--bootstrap", str(b),
        "--seed", str(seed)
    ]
    if eflag:
        args += ["--energy-enable", "--energy-perm", str(eperm), "--energy-max-n", str(emn)]
    run_main(args)
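
The output file names in the loop above encode only the bootstrap count, the energy flag, and the seed. If `energy_perms` or `max_n` held more than one value, runs with energy enabled would overwrite each other, and runs with energy disabled would be repeated unchanged. The sketch below shows one way to avoid this by building the argument lists first and encoding the energy settings in the file names. The helper `build_runs` and the `energy_configs` representation are assumptions made for illustration, not part of `main.py`.

```python
from itertools import product


def build_runs(inputs, boots, seeds, energy_configs):
    """Return one argument list per configuration.

    energy_configs holds None (energy test disabled) or a
    (permutations, max_n) tuple (energy test enabled).
    """
    runs = []
    for input_path, b, seed, energy in product(inputs, boots, seeds, energy_configs):
        tag = f"b{b}_s{seed}"
        tag += "_e0" if energy is None else f"_e1_p{energy[0]}_n{energy[1]}"
        args = [
            "--input", input_path,
            "--output-pred", f"predictors_{tag}.parquet",
            "--output-meta", f"metadata_{tag}.parquet",
            "--bootstrap", str(b),
            "--seed", str(seed),
        ]
        if energy is not None:
            args += ["--energy-enable",
                     "--energy-perm", str(energy[0]),
                     "--energy-max-n", str(energy[1])]
        runs.append(args)
    return runs


runs = build_runs(["X_train.parquet"], [40, 80], [123, 42], [None, (40, 400)])
print(len(runs))  # 1 input * 2 bootstrap values * 2 seeds * 2 energy configs = 8
for args in runs[:2]:
    print(" ".join(args))  # preview the commands without running anything
```

Each element of `runs` can be passed to `run_main` once the preview looks right.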


## Parameter reference (selected)

- `--input` / `-i`: Input Parquet path (schema: MultiIndex `[id, time]`, columns `value`, `period`).
- `--output-pred` / `-op`: Output Parquet path for predictors.
- `--output-meta` / `-om`: Output Parquet path for metadata.
- `--bootstrap` / `-b`: Bootstrap replicates for conditional tests.
- `--energy-enable`: Enable energy-distance test (slower).
- `--energy-perm` / `-ep`: Permutations for energy test when enabled.
- `--energy-max-n` / `-emn`: Max samples per period for energy test.
- `--n-jobs` / `-j`: Number of parallel workers (defaults to the CPU count minus one).
- `--seed` / `-s`: Global seed.
- `--verbose` / `-v`: Verbose logging.
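
To make the flag list concrete, here is a hypothetical `argparse` declaration that accepts the same long and short names. It is a sketch only: the actual parser in `main.py` is not shown in this section, so the required/optional split, the `None` defaults, and the clamping of `--n-jobs` to at least one worker are all assumptions.

```python
import argparse
import os


def build_parser():
    """Hypothetical parser mirroring the flags listed above; main.py may differ."""
    p = argparse.ArgumentParser(prog="main.py")
    p.add_argument("--input", "-i", required=True, help="Input Parquet path")
    p.add_argument("--output-pred", "-op", required=True, help="Predictors output path")
    p.add_argument("--output-meta", "-om", required=True, help="Metadata output path")
    # None defaults are placeholders: the real defaults are not stated in this section.
    p.add_argument("--bootstrap", "-b", type=int, default=None)
    p.add_argument("--energy-enable", action="store_true")
    p.add_argument("--energy-perm", "-ep", type=int, default=None)
    p.add_argument("--energy-max-n", "-emn", type=int, default=None)
    # "CPU-1" from the list above, clamped to at least one worker (assumption).
    p.add_argument("--n-jobs", "-j", type=int,
                   default=max(1, (os.cpu_count() or 1) - 1))
    p.add_argument("--seed", "-s", type=int, default=None)
    p.add_argument("--verbose", "-v", action="store_true")
    return p


# Parse the "energy tests" example from the usage section.
ns = build_parser().parse_args([
    "-i", "X_train.parquet",
    "-op", "X_train_predictors.parquet",
    "-om", "X_train_metadata.parquet",
    "--energy-enable", "-ep", "80", "-emn", "500",
])
print(ns.energy_enable, ns.energy_perm, ns.energy_max_n)  # True 80 500
```

Multi-character short options such as `-op` and `-emn` are matched as whole option strings by `argparse`, which is why they can coexist with single-letter flags like `-i` and `-b`.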
