# CVD Reactor Environment Analysis

This notebook analyzes `CVDReactorEnv`, a basic Arrhenius-based CVD environment, together with the results of training a DQN agent on it. We’ll evaluate performance, visualize trajectories, and experiment with hyperparameters to improve the agent and build intuition for CVD dynamics.

## Objectives

- Compare random actions vs. DQN performance (thickness error).
- Visualize temperature (T) and flow rate (F) effects on deposition.
- Test DQN hyperparameters for better control.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from cvd_env.reactor_env import CVDReactorEnv
from agents.dqn_trainer import train_dqn, evaluate_dqn
from utils.plots import plot_trajectories
from utils.logger import setup_logging, get_logger


setup_logging()

logger = get_logger(__name__)

## 1. Run Random Actions (Baseline)

Let’s run the environment with random actions to establish a baseline, as done in `reactor_env.py`’s `__main__` block. This simulates a CVD process with no control strategy and gives a reference point for the trained agent.


In [None]:
env = CVDReactorEnv(target_thickness=100.0, max_steps=50)

obs, _ = env.reset()
done = False
random_errors = []
while not done:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(int(action))
    random_errors.append(info["thickness_error"])
    env.render()
    done = terminated or truncated

logger.info(
    "Random Policy - Final thickness: %.2f nm, Error: %.2f nm",
    obs[0],
    info["thickness_error"],
)

env.plot()

## 2. Train and Evaluate DQN

Train a DQN agent for 5000 timesteps (default) and evaluate its performance, as in `dqn_trainer.py`. This shows how RL optimizes the CVD process.


In [None]:
env = CVDReactorEnv(target_thickness=100.0, max_steps=50)
model = train_dqn(env, total_timesteps=5000, learning_rate=1e-3)

evaluate_dqn(model, env, n_episodes=3)

## 3. Compare Random vs. DQN

Let’s quantify DQN’s improvement by comparing thickness errors across multiple runs.


In [None]:
n_episodes = 10

# Baseline: actions sampled at random from the action space
random_errors = []
for _ in range(n_episodes):
    obs, _ = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        obs, _, terminated, truncated, info = env.step(int(action))
        done = terminated or truncated
    # Record the thickness error at the end of the episode
    random_errors.append(info["thickness_error"])


# Trained agent: actions chosen by the DQN model
dqn_errors = []
for _ in range(n_episodes):
    obs, _ = env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs)
        obs, _, terminated, truncated, info = env.step(int(action))
        done = terminated or truncated
    dqn_errors.append(info["thickness_error"])


logger.info(
    "Random Policy - Mean Error: %.2f nm, Std: %.2f nm",
    np.mean(random_errors),
    np.std(random_errors),
)
logger.info(
    "DQN Policy - Mean Error: %.2f nm, Std: %.2f nm",
    np.mean(dqn_errors),
    np.std(dqn_errors),
)


plt.figure(figsize=(8, 5))
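plt.boxplot([random_errors, dqn_errors])
# Set tick labels explicitly; boxplot's `labels` keyword is deprecated in Matplotlib 3.9+
plt.xticks([1, 2], ["Random", "DQN"])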
plt.ylabel("Thickness Error (nm)")
plt.title("Random vs. DQN Policy Error Comparison")
plt.grid(True)
plt.show()
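
The two evaluation loops above differ only in how the action is chosen, so they can be folded into a single helper. The following is a minimal sketch, assuming the environment follows the same `reset`/`step` interface used above and reports `thickness_error` in `info`. The names `final_errors` and `ToyEnv` are hypothetical and not part of the project; `ToyEnv` is a stand-in that exists only so the sketch runs on its own.

```python
import random


def final_errors(env, policy, n_episodes=10):
    """Run n_episodes and return the final thickness error of each one.

    `policy` is any callable that maps an observation to an action.
    """
    errors = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs, _, terminated, truncated, info = env.step(policy(obs))
            done = terminated or truncated
        errors.append(info["thickness_error"])
    return errors


class ToyEnv:
    """Hypothetical stand-in for CVDReactorEnv (illustration only)."""

    def __init__(self, target=10.0, max_steps=5):
        self.target = target
        self.max_steps = max_steps

    def reset(self):
        self.thickness = 0.0
        self.steps = 0
        return [self.thickness], {}

    def step(self, action):
        # Each action adds its own value to the film thickness
        self.thickness += float(action)
        self.steps += 1
        error = abs(self.target - self.thickness)
        truncated = self.steps >= self.max_steps
        return [self.thickness], -error, False, truncated, {"thickness_error": error}


toy = ToyEnv()
rng = random.Random(0)
random_toy_errors = final_errors(toy, lambda obs: rng.choice([0, 1, 2, 3]), n_episodes=3)
fixed_toy_errors = final_errors(toy, lambda obs: 2, n_episodes=3)
```

With the notebook's own objects, the same helper could be called as `final_errors(env, lambda obs: int(env.action_space.sample()))` for the baseline and `final_errors(env, lambda obs: int(model.predict(obs)[0]))` for the trained agent.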

## 4. Explore Arrhenius Dynamics

Analyze how temperature (T) and flow rate (F) affect the deposition rate:

$$ r = k_0 \cdot F^{0.5} \cdot \exp(-E_a / (R \cdot T)) $$

Here $k_0$ is the pre-exponential factor, $E_a$ the activation energy, and $R$ the gas constant.


In [None]:
# Deposition rate function: r = k0 * F^alpha * exp(-Ea / (R * T))
# Ea is in J/mol and R in J/(mol*K), so T must be in kelvin
def get_deposition_rate(T, F, k0=1e5, Ea=50000, R=8.314, alpha=0.5):
    return k0 * (F**alpha) * np.exp(-Ea / (R * T))


# Analyze rate vs. T and F
T_values = np.linspace(500, 1000, 100)
F_values = np.linspace(10, 100, 100)
T_grid, F_grid = np.meshgrid(T_values, F_values)
rates = get_deposition_rate(T_grid, F_grid)

plt.figure(figsize=(10, 8))
plt.contourf(T_grid, F_grid, rates, cmap="viridis")
plt.colorbar(label="Deposition Rate (nm/s)")
plt.xlabel("Temperature (K)")
plt.ylabel("Flow Rate (sccm)")
plt.title("Deposition Rate as a Function of T and F")
plt.show()
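
To put numbers on the contour plot, the sketch below compares the rate at the edges of the plotted range. It repeats the rate expression with the default constants from the cell above so that it runs on its own; the helper name `arrhenius_rate` is illustrative only.

```python
import math


def arrhenius_rate(T, F, k0=1e5, Ea=50000, R=8.314, alpha=0.5):
    # Same expression and default constants as get_deposition_rate above
    return k0 * (F**alpha) * math.exp(-Ea / (R * T))


# Doubling the temperature at fixed flow (500 K -> 1000 K, F = 50)
temp_ratio = arrhenius_rate(1000, 50) / arrhenius_rate(500, 50)

# Tenfold flow increase at fixed temperature (F = 10 -> 100, T = 750 K)
flow_ratio = arrhenius_rate(750, 100) / arrhenius_rate(750, 10)

print(f"Rate gain from 500 K to 1000 K: x{temp_ratio:.0f}")
print(f"Rate gain from F=10 to F=100:   x{flow_ratio:.2f}")
```

With these constants, the temperature change multiplies the rate by $\exp\big(\tfrac{E_a}{R}(\tfrac{1}{500} - \tfrac{1}{1000})\big)$, which is roughly 400, while the tenfold flow increase contributes only a factor of $\sqrt{10} \approx 3.16$. Over this range, temperature is by far the stronger lever.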

## 5. Experiment with Hyperparameters

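Test different `total_timesteps` and `learning_rate` values to optimize DQN performance. The cell below varies `total_timesteps` while holding `learning_rate` at 1e-3; the learning rate can be swept in the same way.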


In [None]:
timesteps_list = [5000, 10000, 20000]
errors_by_timesteps = []

for timesteps in timesteps_list:
    # Train a fresh model for each training budget
    model = train_dqn(env, total_timesteps=timesteps, learning_rate=1e-3)
    errors = []
    for _ in range(5):
        obs, _ = env.reset()
        done = False
        while not done:
            action, _ = model.predict(obs)
            obs, _, terminated, truncated, info = env.step(int(action))
            done = terminated or truncated
        # Record the thickness error at the end of the episode
        errors.append(info["thickness_error"])
    errors_by_timesteps.append(errors)

plt.figure(figsize=(8, 5))
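plt.boxplot(errors_by_timesteps)
# Set tick labels explicitly; boxplot's `labels` keyword is deprecated in Matplotlib 3.9+
plt.xticks(range(1, len(timesteps_list) + 1), [str(t) for t in timesteps_list])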
plt.xlabel("Total Timesteps")
plt.ylabel("Thickness Error (nm)")
plt.title("DQN Performance vs. Training Timesteps")
plt.grid(True)
plt.show()
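
A learning-rate sweep can follow the same pattern as the timestep sweep. The sketch below is one possible way to organize it, with the training and evaluation steps passed in as callables. `sweep_learning_rates`, `train_fn`, and `eval_fn` are hypothetical names, and the two stub functions exist only so the example runs without the project code; the numbers they return are placeholders, not results.

```python
import statistics


def sweep_learning_rates(train_fn, eval_fn, learning_rates, total_timesteps=5000):
    """Train one model per learning rate and collect its evaluation errors."""
    results = {}
    for lr in learning_rates:
        model = train_fn(total_timesteps=total_timesteps, learning_rate=lr)
        results[lr] = eval_fn(model)
    return results


# Stubs standing in for train_dqn and the evaluation loop (illustration only)
def stub_train(total_timesteps, learning_rate):
    return {"lr": learning_rate, "steps": total_timesteps}


def stub_eval(model):
    # Placeholder values with no physical meaning
    return [model["lr"] * 1e3, model["lr"] * 2e3]


sweep = sweep_learning_rates(stub_train, stub_eval, [1e-4, 1e-3, 1e-2])
for lr, errs in sweep.items():
    print(f"lr={lr:g}: mean error {statistics.mean(errs):.2f}")
```

In the notebook itself, `train_fn` could be `lambda **kw: train_dqn(env, **kw)` and `eval_fn` a function that runs a few episodes and returns their final `thickness_error` values, after which the per-rate error lists can be passed to `plt.boxplot` as above.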

## 6. Conclusions

- **Random vs. DQN**: DQN should show lower mean error and variance compared to random actions, demonstrating RL’s ability to optimize CVD control.
- **Arrhenius Insights**: The contour plot shows T’s exponential impact on deposition rate, compared with the weaker square-root dependence on F, which can guide control strategies.
- **Hyperparameter Tuning**: More timesteps may reduce errors but increase training time.
