# Synthetic Data Generator: Usage Demo

This notebook demonstrates how to use the Synthetic Dataset Generator in **pyezml**.

## 📦 Step 1: Install (if needed)
Uncomment and run the cell below if pyezml is not installed. The package is installed as `pyezml` but imported as `ezml`, as the later cells show.

In [None]:
# !pip install pyezml

## 🚀 Step 2: Quick One-Line Dataset
Generate a rich synthetic dataset in one line.

In [None]:
from ezml.datasets.synthetic import make_mathematical_synthetic_data

df = make_mathematical_synthetic_data(n_samples=1000)
df.head()

## 🧪 Step 3: Custom Distribution Schema
Create your own dataset by mapping each column name to a statistical distribution and its parameters.

In [None]:
from ezml.datasets.synthetic import SyntheticDatasetGenerator

schema = {
    "age": {"distribution": "normal", "loc": 30, "scale": 5},
    "salary": {"distribution": "lognormal", "mean": 10, "sigma": 0.4},
    "purchases": {"distribution": "poisson", "lam": 4},
}

gen = SyntheticDatasetGenerator(n_samples=1000, random_state=42)
df_custom = gen.from_distributions(schema)
df_custom.head()
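
To make the schema concrete, here is a rough sketch of what the three columns correspond to when sampled directly with NumPy. It assumes the schema keys (`loc`, `scale`, `mean`, `sigma`, `lam`) carry the same meaning as NumPy's parameters of the same names. How pyezml actually draws its samples is not shown in this notebook, so the values will not match `df_custom` row for row.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded NumPy Generator stands in for random_state=42.
rng = np.random.default_rng(42)
n_samples = 1000

# One column per schema entry, using NumPy's own parameter names.
df_numpy = pd.DataFrame({
    "age": rng.normal(loc=30, scale=5, size=n_samples),
    "salary": rng.lognormal(mean=10, sigma=0.4, size=n_samples),
    "purchases": rng.poisson(lam=4, size=n_samples),
})

# Summary statistics are a quick way to check each distribution looks plausible.
print(df_numpy.describe().loc[["mean", "std", "min", "max"]])
```

Comparing these summary statistics with `df_custom.describe()` is a quick way to confirm that the schema was interpreted the way you intended.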

## 🧮 Step 4: Add Mathematical Features
Automatically create polynomial, trigonometric, and interaction features.

In [None]:
df_math = gen.add_mathematical_features(
    df_custom,
    degree=3,
    include_interactions=True,
    include_trig=True,
)

df_math.shape
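
For intuition about why the column count grows, the sketch below builds the same three kinds of features by hand with pandas. The helper `add_features_sketch` and its column naming are hypothetical and are not pyezml's implementation. The exact set of features the library produces may differ.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def add_features_sketch(df, degree=3):
    """Hypothetical helper: polynomial, trigonometric, and pairwise interaction features."""
    out = df.copy()
    cols = list(df.columns)
    for col in cols:
        # Polynomial terms: col^2 ... col^degree
        for power in range(2, degree + 1):
            out[f"{col}_pow{power}"] = df[col] ** power
        # Trigonometric terms
        out[f"{col}_sin"] = np.sin(df[col])
        out[f"{col}_cos"] = np.cos(df[col])
    # Pairwise interaction terms: product of every pair of original columns
    for a, b in combinations(cols, 2):
        out[f"{a}_x_{b}"] = df[a] * df[b]
    return out


toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.5, 1.5, 2.5], "c": [2.0, 4.0, 6.0]})
expanded = add_features_sketch(toy, degree=3)
print(expanded.shape)  # (3, 18): 3 original + 12 per-column + 3 interaction columns
```

With three input columns and `degree=3`, this sketch yields 18 columns. Feature counts grow quickly with more columns or a higher degree, so it is worth checking `df_math.shape` before training on the result.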

## 🎯 Step 5: Add Synthetic Target
Generate a regression or classification target.

In [None]:
df_target = gen.add_target(
    df_math,
    task="classification",
    target_name="target",
    noise=0.3,
)

df_target["target"].value_counts()
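
One common way to build a synthetic classification target is to score each row with a random linear combination of the features, add Gaussian noise, and threshold the result. The sketch below illustrates that idea only. The helper `add_target_sketch` is hypothetical, and this notebook does not say how `gen.add_target` derives its labels or how it interprets `noise`.

```python
import numpy as np
import pandas as pd


def add_target_sketch(df, noise=0.3, seed=42):
    """Hypothetical helper: binary target from a noisy random linear score."""
    rng = np.random.default_rng(seed)
    # Standardise columns so no single feature dominates the score.
    z = (df - df.mean()) / df.std()
    weights = rng.normal(size=df.shape[1])
    # Linear score plus Gaussian noise; larger noise makes the labels harder to learn.
    score = z.to_numpy() @ weights + rng.normal(scale=noise, size=len(df))
    out = df.copy()
    # Thresholding at the median gives roughly balanced classes.
    out["target"] = (score > np.median(score)).astype(int)
    return out


feature_rng = np.random.default_rng(0)
features = pd.DataFrame(feature_rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
labelled = add_target_sketch(features, noise=0.3)
print(labelled["target"].value_counts())
```

Thresholding at the median is a design choice that keeps the two classes balanced. If `value_counts()` on the real `df_target` shows a strong imbalance, the library is likely using a different rule.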

## 📌 Step 6: See Supported Distributions
List all available statistical distributions.

In [None]:
from ezml import list_supported_distributions

list_supported_distributions()

---
✅ You are now ready to generate rich synthetic datasets with **pyezml**!

Use these datasets to test models, benchmark pipelines, or practice machine learning.