# Stable DI: SimulationSpec + Component Environment

This notebook demonstrates two stable ways to build and run plume navigation environments:

- Create a component-based environment via the DI factory (`create_component_environment`)
- Build an `(env, policy)` pair from a `SimulationSpec` using `prepare()`, including spec-declared observation wrappers

> Headless-safe: no interactive rendering required; prints basic info during short runs.


In [None]:
# Basic package info (optional)
import plume_nav_sim as pns

info = pns.get_package_info(include_registration_status=False)
info["package_name"], info["package_version"]

## 1) Component environment via DI factory

Create a fully configured environment using `create_component_environment(...)`.


In [None]:
from plume_nav_sim.envs.factory import create_component_environment

env = create_component_environment(
    grid_size=(32, 32),
    goal_location=(16, 16),
    action_type="discrete",  # or 'oriented' / 'run_tumble'
    observation_type="concentration",  # scalar odor at agent position
    reward_type="sparse",  # success when within goal_radius
    plume_sigma=10.0,
    render_mode=None,
)
obs, info = env.reset(seed=42)
print("reset info keys:", sorted(info.keys()))

steps = 0
terminated = truncated = False
while not (terminated or truncated) and steps < 10:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    steps += 1
print("component env steps:", steps, "terminated=", terminated, "truncated=", truncated)
env.close()
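
The inline comments in the cell above describe the observation as the scalar odor at the agent position and the sparse reward as success within `goal_radius`. The sketch below illustrates those two ideas in plain Python. It assumes an isotropic Gaussian concentration field and a 1.0/0.0 reward, neither of which is confirmed by this notebook; `gaussian_concentration` and `sparse_reward` are hypothetical helper names, not part of `plume_nav_sim`.

```python
import math


def gaussian_concentration(agent_xy, source_xy, sigma):
    """Assumed plume model: isotropic Gaussian that peaks at 1.0 on the source."""
    dx = agent_xy[0] - source_xy[0]
    dy = agent_xy[1] - source_xy[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def sparse_reward(agent_xy, goal_xy, goal_radius):
    """Assumed reward shape: 1.0 once within goal_radius of the goal, else 0.0."""
    distance = math.dist(agent_xy, goal_xy)
    return 1.0 if distance <= goal_radius else 0.0


goal = (16, 16)
for position in [(16, 16), (16, 26), (0, 0)]:
    c = gaussian_concentration(position, goal, sigma=10.0)
    r = sparse_reward(position, goal, goal_radius=5.0)
    print(position, round(c, 3), r)
```

Under these assumptions a larger `plume_sigma` spreads the odor signal over more of the grid, while the sparse reward stays at zero until the agent reaches the goal region.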

## 2) Spec-first composition with SimulationSpec

Define runtime behavior in one place with `SimulationSpec`, including optional observation wrappers.
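
To make the idea of an n-back observation wrapper concrete, here is a minimal stand-alone sketch of the history bookkeeping such a wrapper might perform. `NBackHistory` is a hypothetical class written for illustration, and padding the history with the first reading after a reset is an assumption; the real `ConcentrationNBackWrapper` may behave differently.

```python
from collections import deque


class NBackHistory:
    """Hypothetical stand-in: keeps the last n scalar readings, oldest first."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self._buffer = deque(maxlen=n)

    def reset(self, first_reading):
        # Assumption: pad the history with the first reading after a reset.
        self._buffer.clear()
        self._buffer.extend([float(first_reading)] * self.n)
        return list(self._buffer)

    def step(self, reading):
        # Appending to a full deque drops the oldest reading automatically.
        self._buffer.append(float(reading))
        return list(self._buffer)


history = NBackHistory(n=2)
print(history.reset(0.10))  # [0.1, 0.1]
print(history.step(0.25))   # [0.1, 0.25]
print(history.step(0.40))   # [0.25, 0.4]
```

With `n=2` each observation pairs the previous reading with the current one, which gives a memoryless policy enough information to tell whether the odor is rising or falling.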


In [None]:
from plume_nav_sim.compose import SimulationSpec, PolicySpec, WrapperSpec, prepare

# Declare a core odor-history wrapper: with n=2 the observation becomes [c_prev, c_now] (1-back history)
nback = WrapperSpec(
    spec="plume_nav_sim.observations.history_wrappers:ConcentrationNBackWrapper",
    kwargs={"n": 2},
)

sim = SimulationSpec(
    grid_size=(32, 32),
    source_location=(16, 16),
    max_steps=30,
    goal_radius=5.0,
    plume_sigma=12.0,
    action_type="discrete",
    observation_type="concentration",
    reward_type="step_penalty",
    render=False,
    seed=123,
    policy=PolicySpec(builtin="deterministic_td"),
    observation_wrappers=[nback],
)

env, policy = prepare(sim)
obs, info = env.reset(seed=sim.seed)
print("wrapped observation shape:", getattr(env.observation_space, "shape", None))

total = 0.0
for t in range(10):
    # Policies expose either select_action(obs, explore=False) or are plain callables;
    # use select_action when it exists, otherwise call the policy directly.
    act = getattr(policy, "select_action", policy)(obs)
    obs, r, terminated, truncated, info = env.step(act)
    total += float(r)
    if terminated or truncated:
        break
print("spec episode steps:", t + 1, "total reward=", total)
env.close()

### Notes

- `create_component_environment(...)` assembles a DI-backed environment from curated defaults and options.
- `SimulationSpec` forwards environment parameters to `plume_nav_sim.make_env(...)` and applies the declared wrappers, each resolved from its dotted-path spec.
- `prepare(sim)` checks that the policy's action space is a subset of the environment's and calls `policy.reset(seed=...)` when the policy provides it.
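
Two of the mechanisms in the notes above can be sketched with the standard library alone: resolving a `module:Attribute` string to an object, and calling `reset(seed=...)` only when a policy defines it. This is an illustrative sketch, not the library's implementation; `load_object`, `reset_policy_if_supported`, and `SeededPolicy` are hypothetical names.

```python
import importlib


def load_object(spec):
    """Resolve a 'package.module:Attribute' string to the object it names."""
    module_name, sep, attr_name = spec.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def reset_policy_if_supported(policy, seed):
    """Call policy.reset(seed=seed) only when the policy defines reset."""
    reset = getattr(policy, "reset", None)
    if callable(reset):
        reset(seed=seed)
        return True
    return False


class SeededPolicy:
    """Hypothetical policy that records the seed it was reset with."""

    def __init__(self):
        self.seed = None

    def reset(self, seed=None):
        self.seed = seed


# A standard-library target keeps the example runnable without plume_nav_sim.
ordered_dict_cls = load_object("collections:OrderedDict")
print(ordered_dict_cls.__name__)

policy = SeededPolicy()
print(reset_policy_if_supported(policy, seed=123), policy.seed)
print(reset_policy_if_supported(lambda obs: 0, seed=123))
```

Resolving wrappers from strings keeps a spec serializable, since it holds only names and keyword arguments rather than live Python objects.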
