# Robotic Arm Control via Deep Reinforcement Learning (SAC + PyBullet + panda-gym)
This notebook demonstrates how to train a Soft Actor-Critic (SAC) agent to control a Panda robotic arm in a PyBullet simulation via the **panda-gym** environments.

**Sections**
1. Install & import libraries
2. Create the vectorised simulation environment
3. Train an SAC agent (with TensorBoard logging)
4. Evaluate & (optionally) record a demo video
5. Inspect results in TensorBoard
6. Shut down & clean up


## 1  Install & import libraries

In [None]:
%pip install -q --upgrade pip
# tqdm and rich are needed for the progress bar used during training.
%pip install -q gymnasium==0.29.1 panda-gym==3.0.1 pybullet==3.2.6 stable-baselines3==2.4.0 sb3-contrib==2.4.0 tensorboard tqdm rich
# After the first run you can comment out the %pip lines above to avoid re-installing.
import os

import gymnasium as gym
import panda_gym  # noqa: F401  (importing it registers the Panda* environments with Gymnasium)
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env  # lives in env_util, not vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import SubprocVecEnv
print('All libraries imported ✔️')
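
Because the install line pins exact versions, it can help to confirm what actually ended up in the kernel's environment before going further. The snippet below is a small sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of any library used in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names copied from the %pip install line.
PINNED = ["gymnasium", "panda-gym", "pybullet", "stable-baselines3", "tensorboard"]

for name, ver in installed_versions(PINNED).items():
    print(f"{name:20s} {ver or 'NOT INSTALLED'}")
```

If a package shows up as `NOT INSTALLED` or with an unexpected version, restarting the kernel after the `%pip` step is usually the first thing to try.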

## 2  Create the simulation environment

In [None]:
# Choose a task (the dense-reward variants usually learn faster than the sparse ones)
ENV_ID = 'PandaReachDense-v3'  # e.g. PandaSlideDense-v3, PandaPickAndPlaceDense-v3, …
N_ENVS = 4                     # Parallel environments (adjust to your CPU cores)
SEED   = 42

log_dir = './logs/'
os.makedirs(log_dir, exist_ok=True)

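def make_env():
    # Import inside the factory so the Panda tasks are also registered in each worker process.
    import panda_gym  # noqa: F401
    return gym.make(ENV_ID, render_mode=None)

vec_env = make_vec_env(
    make_env,                    # make_vec_env already wraps each env in a Monitor
    n_envs=N_ENVS,
    seed=SEED,
    vec_env_cls=SubprocVecEnv,   # runs each env in its own process
)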
vec_env.reset()
print(f'Vectorised environment `{ENV_ID}` with {N_ENVS} workers ready ✔️')
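
The comment on `ENV_ID` distinguishes dense-reward task variants from the others. As a rough illustration for a reach task: a dense reward is the negative distance between the gripper and the target, while a sparse reward only says whether the target is within a tolerance. The sketch below assumes a 5 cm tolerance and uses hypothetical function names; it mirrors the idea rather than reproducing panda-gym's source.

```python
import math

DISTANCE_THRESHOLD = 0.05  # metres; assumed tolerance for "target reached"


def dense_reward(achieved, desired):
    """Negative Euclidean distance: improves smoothly as the gripper approaches."""
    return -math.dist(achieved, desired)


def sparse_reward(achieved, desired, threshold=DISTANCE_THRESHOLD):
    """0.0 once within the tolerance, -1.0 everywhere else."""
    return 0.0 if math.dist(achieved, desired) <= threshold else -1.0


target = (0.10, 0.00, 0.20)
for gripper in [(0.40, 0.00, 0.20), (0.20, 0.00, 0.20), (0.12, 0.00, 0.20)]:
    print(gripper, round(dense_reward(gripper, target), 3), sparse_reward(gripper, target))
```

With the sparse signal, the two poses outside the tolerance look identical to the agent, which is one reason dense variants tend to make faster early progress.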

## 3  Train the SAC agent

In [None]:
MODEL_PATH = './sac_panda_reach'
TIMESTEPS  = 100_000          # total env steps across all workers; harder tasks may need ≥1 M

model = SAC(
    policy='MultiInputPolicy',  # panda-gym observations are dicts, which MlpPolicy rejects
    env=vec_env,
    verbose=1,
    tensorboard_log=log_dir,
    seed=SEED,
)

model.learn(total_timesteps=TIMESTEPS, progress_bar=True)
model.save(MODEL_PATH)
print(f'Model saved to {MODEL_PATH}.zip ✔️')
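
`TIMESTEPS` is a budget of environment steps summed over all workers, so with `N_ENVS` parallel environments the number of calls to the vectorised `step()` is smaller by that factor. The arithmetic below is a sketch; `vectorised_steps` is a hypothetical helper, and the premise that one vectorised step advances the counter by `N_ENVS` is an assumption about how Stable-Baselines3 counts steps, not something shown in this notebook.

```python
import math


def vectorised_steps(total_timesteps, n_envs):
    """Vectorised step() calls needed to collect at least total_timesteps transitions."""
    return math.ceil(total_timesteps / n_envs)


TIMESTEPS = 100_000
for n_envs in (1, 4, 8):
    print(f"{n_envs} worker(s): {vectorised_steps(TIMESTEPS, n_envs):,} vectorised steps")
```

This is worth keeping in mind when comparing runs with different `N_ENVS`: the x-axis in TensorBoard stays comparable, but the wall-clock time and the number of policy updates per transition may not.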

## 4  Evaluate the trained agent

In [None]:
# Wrapping in Monitor lets evaluate_policy read true episode returns and lengths.
eval_env = Monitor(gym.make(ENV_ID, render_mode='human'))  # change to 'rgb_array' for headless servers
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f'Mean reward over 10 episodes: {mean_reward:.2f} ± {std_reward:.2f}')

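# Optional: record a short video (VecVideoRecorder needs a VecEnv rendering 'rgb_array', plus moviepy)
# from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder
# video_folder = './videos/'
# os.makedirs(video_folder, exist_ok=True)
# video_env = VecVideoRecorder(
#     DummyVecEnv([lambda: gym.make(ENV_ID, render_mode='rgb_array')]),
#     video_folder,
#     record_video_trigger=lambda step: step == 0,   # start recording at the first step
#     video_length=500, name_prefix='sac-demo')
# obs = video_env.reset()                            # reset once, then keep the latest observation
# for _ in range(500):
#     action, _ = model.predict(obs, deterministic=True)
#     obs, _, _, _ = video_env.step(action)          # VecEnvs reset finished episodes automatically
# video_env.close()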
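
`evaluate_policy` reduces the evaluation run to two numbers: the mean and the standard deviation of the per-episode returns. The sketch below reproduces that summary with the standard library. The returns are made-up illustrative values, not results from this notebook, and the use of the population standard deviation is an assumption (it matches NumPy's default `np.std`).

```python
from statistics import fmean, pstdev


def summarise_returns(episode_returns):
    """Return (mean, population standard deviation) of a list of episode returns."""
    return fmean(episode_returns), pstdev(episode_returns)


# Hypothetical per-episode returns (sum of rewards collected in each episode).
example_returns = [-1.8, -2.4, -0.9, -3.1, -1.3]
mean_r, std_r = summarise_returns(example_returns)
print(f"Mean reward over {len(example_returns)} episodes: {mean_r:.2f} ± {std_r:.2f}")
```

To inspect the individual episodes rather than the summary, `evaluate_policy` also accepts `return_episode_rewards=True`, in which case it returns the per-episode rewards and lengths as lists.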

## 5  Inspect results in TensorBoard

In [None]:
# In Jupyter, uncomment the two magic commands below and run them in their own cell
# %load_ext tensorboard
# %tensorboard --logdir ./logs
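
Each training run writes its event files into its own numbered subfolder of the log directory. Assuming the default naming of `SAC_1`, `SAC_2`, and so on (an assumption about Stable-Baselines3's defaults, not something configured in this notebook), the sketch below finds the most recent run so that TensorBoard can be pointed at it alone. `latest_run` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_run(log_dir, prefix="SAC"):
    """Return the highest-numbered '<prefix>_<n>' subfolder of log_dir, or None."""
    root = Path(log_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)$")
    runs = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            runs.append((int(match.group(1)), child))
    # Compare on the run number only, so SAC_10 sorts after SAC_2.
    return max(runs, key=lambda run: run[0])[1] if runs else None


print(latest_run("./logs"))
```

The returned path can be passed to `--logdir` in place of `./logs` when only the latest curves are of interest.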


## 6  Shut down & clean up

In [None]:
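vec_env.close()    # stops the worker processes
eval_env.close()   # closes the evaluation environment and its render window
print('Training session finished; environments closed.')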