
<br>
<font>
<div dir=ltr align=center>
<img src="https://cdn.freebiesupply.com/logos/large/2x/sharif-logo-png-transparent.png" width=150 height=150>
<div dir=ltr align=center>
<font color=0F5298 size=7>
    Artificial Intelligence <br>
<font color=2565AE size=5>
    Computer Engineering Department <br>
    Spring 2025<br>
<font color=3C99D size=5>
    Project-Phase2<br>
    Soft Actor Critic<br>
<font color=696880 size=4>
    Ali Najar-Mohmmad Shafizade-Armin Khosravi




In this notebook, we get familiar with the Soft Actor-Critic (SAC) algorithm. SAC is an off-policy algorithm that maximizes a combination of expected return **and** policy entropy. Higher entropy encourages more exploration, which is an important concept in reinforcement learning.
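
To make the "return plus entropy" idea concrete, here is a minimal sketch for a one-dimensional Gaussian policy. The helper names `gaussian_entropy` and `soft_reward` and the temperature value `alpha=0.2` are assumptions chosen for illustration; they do not appear elsewhere in this notebook.

```python
import math

def gaussian_entropy(sigma: float) -> float:
    """Differential entropy (in nats) of a 1-D Gaussian with standard deviation `sigma`."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def soft_reward(reward: float, sigma: float, alpha: float) -> float:
    """Per-step quantity an entropy-regularized agent accumulates: reward + alpha * entropy."""
    return reward + alpha * gaussian_entropy(sigma)

# A wider policy (larger sigma) has higher entropy, so it earns a larger bonus.
for sigma in (0.1, 0.5, 1.0):
    print(f"sigma={sigma:.1f}  entropy={gaussian_entropy(sigma):+.3f}  "
          f"soft reward={soft_reward(1.0, sigma, alpha=0.2):+.3f}")
```

The temperature `alpha` controls how much the agent is willing to trade reward for randomness: with `alpha = 0` the bonus disappears and the objective reduces to the ordinary expected return.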

## 📦 Setup and Dependencies

Install PyBullet and Box2D for physics-based environments, then check whether a GPU is available.

In [1]:
!pip install -q pybullet Box2D
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found


Import necessary packages.

In [2]:
import os
import numpy as np
import torch as T
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
from torch.distributions import Normal

import warnings
warnings.filterwarnings('ignore')
from gym.wrappers import RecordVideo
import gym
import pybullet_envs
np.bool8 = np.bool_
from tqdm.notebook import trange
from IPython.display import Video

Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.
Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.
See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.


## 📈 Utility code

We will use this utility function to visualize the training progress.

In [14]:
def plot_learning_curve(x, filename, save_plot=True):
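    # Trailing mean over the last 100 episodes, including the current one
    # (the slice is never empty, so the first point is not NaN).
    avg_x = [np.mean(x[max(0, i - 99):i + 1]) for i in range(len(x))]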
    plt.figure(dpi=200)
    plt.title('Learning Curve')
    plt.plot(range(len(x)), x, label='score', alpha=0.3)
    plt.plot(range(len(avg_x)), avg_x, label='average score')
    plt.xlabel('Episode')
    plt.ylabel('Score')
    plt.legend()
    plt.grid()
    if save_plot:
        plt.savefig(filename + '.png')
    plt.show()
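
The smoothing used for the "average score" line can be sketched in isolation. This is an illustrative helper, not part of the notebook: the name `trailing_mean` and the tiny window of 2 are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

def trailing_mean(scores, window=100):
    """Mean of the last `window` scores up to and including each episode."""
    scores = np.asarray(scores, dtype=np.float64)
    return np.array([scores[max(0, i - window + 1):i + 1].mean()
                     for i in range(len(scores))])

# Early episodes average over fewer points because the window is not yet full.
smoothed = trailing_mean([1.0, 3.0, 5.0, 7.0], window=2)
print(smoothed)
```

A trailing window lags behind the raw scores, so a sudden improvement only shows up in the smoothed curve after several episodes.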

This class implements a **Replay Buffer** that stores transitions of the form $(s_t, a_t, r_t, s_{t+1}, d_t)$ and samples them uniformly at random. Random sampling breaks the correlation between consecutive transitions, which stabilizes mini-batch stochastic gradient descent.


In [9]:
class ReplayBuffer:
    def __init__(self, buffer_size, state_dims, action_dims):
        self.buffer_size = buffer_size
        self.state_dims = state_dims
        self.action_dims = action_dims
        self.ptr = 0
        self.is_full = False

        # Initialize buffer arrays
        self.states = np.zeros((self.buffer_size, self.state_dims), dtype=np.float32)
        self.states_ = np.zeros((self.buffer_size, self.state_dims), dtype=np.float32)
        self.actions = np.zeros((self.buffer_size, self.action_dims), dtype=np.float32)
        self.rewards = np.zeros(self.buffer_size, dtype=np.float32)
        self.dones = np.zeros(self.buffer_size, dtype=np.bool_)

    def store_transition(self, state, action, reward, state_, done):
        # Store the transition
        self.states[self.ptr] = state
        self.actions[self.ptr] = action
        self.rewards[self.ptr] = reward
        self.states_[self.ptr] = state_
        self.dones[self.ptr] = done

        self.ptr += 1
        if self.ptr >= self.buffer_size:
            self.is_full = True
            self.ptr = 0

    def load_batch(self, batch_size):
        # Sample a random batch
        max_mem = self.buffer_size if self.is_full else self.ptr
        batch = np.random.choice(max_mem, batch_size, replace=False)

        states = self.states[batch]
        actions = self.actions[batch]
        rewards = self.rewards[batch]
        states_ = self.states_[batch]
        dones = self.dones[batch]

        return states, actions, rewards, states_, dones
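
The wrap-around behavior of the buffer is easier to see on a tiny example. The following sketch re-creates only the pointer logic with a single reward array; the capacity of 4 and the seeded generator are assumptions made for illustration.

```python
import numpy as np

capacity = 4                      # tiny capacity so the wrap-around is visible
rewards = np.zeros(capacity, dtype=np.float32)
ptr, is_full = 0, False

for step in range(6):             # write 6 rewards into 4 slots
    rewards[ptr] = float(step)
    ptr += 1
    if ptr >= capacity:           # the oldest entries get overwritten from slot 0
        ptr, is_full = 0, True

# Only filled slots may be sampled: all of them once the buffer has wrapped.
n_valid = capacity if is_full else ptr
rng = np.random.default_rng(0)
batch = rng.choice(n_valid, size=2, replace=False)
print(rewards, n_valid, rewards[batch])
```

After the loop the two oldest rewards (0 and 1) have been replaced by the newest ones (4 and 5), while the write pointer sits at slot 2. Note that the pointer alone no longer tells how many transitions are stored once the buffer has wrapped, which is why the `is_full` flag is needed.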

## 🧱 Neural Networks

This cell defines the three core neural networks used in SAC:

- **Critic Network:**
Estimates the **Q-value function** $Q(s, a)$. Two critics are used, and the minimum of their outputs is taken to mitigate overestimation bias.

- **Value Network:**
Estimates the **state value function** $V(s)$. A slowly updated target copy of this network provides the bootstrap target for the critics.

- **Actor Network:**
Outputs the **mean** and **standard deviation** of a Gaussian policy $\pi(a|s) = \mathcal{N}(\mu(s), \sigma(s)^2)$. Sampled actions are squashed with $\tanh$ and scaled to the action range.

In [10]:
class Critic(nn.Module):
    def __init__(self, beta, state_dims, action_dims, fc1_dims, fc2_dims, name='Critic', ckpt_dir='tmp'):
        super(Critic, self).__init__()

        self.beta = beta
        self.state_dims = state_dims
        self.action_dims = action_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.name = name
        self.ckpt_dir = ckpt_dir
        self.ckpt_path = os.path.join(self.ckpt_dir, f'{self.name}_sac.pth')

        self.fc1 = nn.Linear(self.state_dims + self.action_dims, self.fc1_dims)
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.q = nn.Linear(self.fc2_dims, 1)

        self.dropout = nn.Dropout(0.1)

        self.bn1 = nn.BatchNorm1d(self.fc1_dims)
        self.bn2 = nn.BatchNorm1d(self.fc2_dims)

        self.apply(self._init_weights)

        self.optimizer = optim.Adam(self.parameters(), lr=self.beta, weight_decay=1e-4)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            T.nn.init.xavier_uniform_(module.weight)
            T.nn.init.constant_(module.bias, 0.0)

    def forward(self, state, action):
        state_action = T.cat([state, action], dim=1)
        x = F.relu(self.bn1(self.fc1(state_action)))
        x = self.dropout(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout(x)
        q = self.q(x)
        return q

    def save_checkpoint(self):
        T.save(self.state_dict(), self.ckpt_path)

    def load_checkpoint(self, gpu_to_cpu=False):
        if gpu_to_cpu:
            self.load_state_dict(T.load(self.ckpt_path, map_location=lambda storage, loc: storage))
        else:
            self.load_state_dict(T.load(self.ckpt_path))

class Actor(nn.Module):
    def __init__(self, alpha, state_dims, action_dims, fc1_dims, fc2_dims, max_action, reparam_noise,
                 name='Actor', ckpt_dir='tmp'):
        super(Actor, self).__init__()

        self.alpha = alpha
        self.state_dims = state_dims
        self.action_dims = action_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.max_action = max_action
        self.reparam_noise = reparam_noise
        self.name = name
        self.ckpt_dir = ckpt_dir
        self.ckpt_path = os.path.join(self.ckpt_dir, f'{self.name}_sac.pth')

        self.fc1 = nn.Linear(self.state_dims, self.fc1_dims)
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.mu = nn.Linear(self.fc2_dims, self.action_dims)
        self.sigma = nn.Linear(self.fc2_dims, self.action_dims)

        self.dropout = nn.Dropout(0.1)
        self.bn1 = nn.BatchNorm1d(self.fc1_dims)
        self.bn2 = nn.BatchNorm1d(self.fc2_dims)

        self.apply(self._init_weights)

        self.optimizer = optim.Adam(self.parameters(), lr=self.alpha, weight_decay=1e-4)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            T.nn.init.xavier_uniform_(module.weight)
            T.nn.init.constant_(module.bias, 0.0)

    def forward(self, state):
        x = F.relu(self.bn1(self.fc1(state)))
        x = self.dropout(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout(x)

        mu = self.mu(x)
        sigma = T.clamp(self.sigma(x), min=self.reparam_noise, max=1)

        return mu, sigma

    def sample_normal(self, state, reparameterize=True):
        mu, sigma = self.forward(state)
        probabilities = Normal(mu, sigma)

        if reparameterize:
            actions = probabilities.rsample()
        else:
            actions = probabilities.sample()

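        squashed = T.tanh(actions)
        action = squashed * self.max_action
        log_probs = probabilities.log_prob(actions)
        # Change of variables for a = max_action * tanh(u):
        # da/du = max_action * (1 - tanh(u)^2), computed from the unscaled tanh output.
        log_probs -= T.log(self.max_action * (1 - squashed.pow(2)) + self.reparam_noise)
        log_probs = log_probs.sum(1, keepdim=True)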

        return action, log_probs

    def save_checkpoint(self):
        T.save(self.state_dict(), self.ckpt_path)

    def load_checkpoint(self, gpu_to_cpu=False):
        if gpu_to_cpu:
            self.load_state_dict(T.load(self.ckpt_path, map_location=lambda storage, loc: storage))
        else:
            self.load_state_dict(T.load(self.ckpt_path))


class Value(nn.Module):
    def __init__(self, beta, state_dims, fc1_dims, fc2_dims, name='Value', ckpt_dir='tmp'):
        super(Value, self).__init__()

        self.beta = beta
        self.state_dims = state_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.name = name
        self.ckpt_dir = ckpt_dir
        self.ckpt_path = os.path.join(self.ckpt_dir, f'{self.name}_sac.pth')

        self.fc1 = nn.Linear(self.state_dims, self.fc1_dims)
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.v = nn.Linear(self.fc2_dims, 1)

        self.dropout = nn.Dropout(0.1)

        self.apply(self._init_weights)

        self.optimizer = optim.Adam(self.parameters(), lr=self.beta, weight_decay=1e-4)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            T.nn.init.xavier_uniform_(module.weight)
            T.nn.init.constant_(module.bias, 0.0)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        v = self.v(x)
        return v

    def save_checkpoint(self):
        T.save(self.state_dict(), self.ckpt_path)

    def load_checkpoint(self, gpu_to_cpu=False):
        if gpu_to_cpu:
            self.load_state_dict(T.load(self.ckpt_path, map_location=lambda storage, loc: storage))
        else:
            self.load_state_dict(T.load(self.ckpt_path))
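
The log-probability correction applied in `sample_normal` comes from the change of variables $a = \tanh(u)$. As a sanity check, the sketch below verifies numerically that the corrected density integrates to one over the action interval. The values `mu = 0.3` and `sigma = 0.5` are arbitrary assumptions, `max_action` is taken to be 1, and the small constant mirrors `reparam_noise_lim`.

```python
import numpy as np

mu, sigma = 0.3, 0.5                      # illustrative policy outputs for one state
u = np.linspace(-6.0, 6.0, 200001)        # pre-squash values
a = np.tanh(u)                            # squashed actions in (-1, 1)

# Gaussian log-density of u, then the tanh correction:
# log p(a) = log p(u) - log(1 - tanh(u)^2)
log_p_u = -0.5 * ((u - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
log_p_a = log_p_u - np.log(1.0 - a ** 2 + 1e-6)

# Trapezoidal integration over the action axis; a valid density integrates to ~1.
p_a = np.exp(log_p_a)
total = np.sum(0.5 * (p_a[1:] + p_a[:-1]) * np.diff(a))
print(f"integral of corrected density: {total:.4f}")
```

Without the correction term the Gaussian density would be evaluated at the wrong variable, and the entropy estimate that SAC relies on would be biased.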

## 🤖 Agent Implementation

This class encapsulates the full logic of the SAC agent. In general, the learning process uses entropy-regularized policy gradients $J_\pi = \mathbb{E}_{s_t \sim D, a_t \sim \pi} \left[ \alpha \log(\pi(a_t|s_t)) - Q(s_t, a_t) \right]$ with soft target updates $\theta_{\text{target}} \leftarrow \tau \theta + (1 - \tau)\theta_{\text{target}}$.
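
For reference, the remaining objectives optimized in the `learn` method can be written in the same notation. This is a summary read off the code in this notebook, under the assumptions that $c$ denotes `reward_scale`, $\bar{\mathcal{H}}$ denotes the target entropy (set to $-|\mathcal{A}|$ in the code), $\tilde a_t \sim \pi(\cdot|s_t)$ is a freshly sampled action, and $Q$ in $J_\pi$ stands for $\min_{i=1,2} Q_i$:

$$
\begin{aligned}
J_V &= \tfrac{1}{2}\,\mathbb{E}_{s_t \sim D}\Big[\big(V(s_t) - \big(\min_{i=1,2} Q_i(s_t, \tilde a_t) - \alpha \log \pi(\tilde a_t|s_t)\big)\big)^2\Big] \\
\hat Q(s_t, a_t) &= c\, r_t + \gamma\,(1 - d_t)\, V_{\text{target}}(s_{t+1}) \\
J_{Q_i} &= \tfrac{1}{2}\,\mathbb{E}_{(s_t, a_t) \sim D}\Big[\big(Q_i(s_t, a_t) - \hat Q(s_t, a_t)\big)^2\Big], \quad i = 1, 2 \\
J_\alpha &= \mathbb{E}_{s_t \sim D}\Big[-\log \alpha\,\big(\log \pi(\tilde a_t|s_t) + \bar{\mathcal{H}}\big)\Big]
\end{aligned}
$$

The targets inside $J_V$ and $J_{Q_i}$ are treated as constants (detached) when gradients are taken.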

In [11]:
class Agent:
    def __init__(self, gamma, alpha, beta, state_dims, action_dims, max_action, fc1_dim, fc2_dim,
                 memory_size, batch_size, tau, update_period, reward_scale, warmup, reparam_noise_lim,
                 name, ckpt_dir='tmp', use_fixed_alpha=False, fixed_alpha_value=0.1):

        self.gamma = gamma
        self.alpha = alpha
        self.beta = beta
        self.state_dims = state_dims
        self.action_dims = action_dims
        self.max_action = max_action
        self.fc1_dim = fc1_dim
        self.fc2_dim = fc2_dim
        self.memory_size = memory_size
        self.batch_size = batch_size
        self.tau = tau
        self.update_period = update_period
        self.reward_scale = reward_scale
        self.warmup = warmup
        self.reparam_noise_lim = reparam_noise_lim
        self.name = name
        self.ckpt_dir = ckpt_dir
        self.use_fixed_alpha = use_fixed_alpha
        self.fixed_alpha_value = fixed_alpha_value

        model_name = f'{name}__' \
                     f'gamma_{gamma}__' \
                     f'alpha_{alpha}__' \
                     f'beta_{beta}__' \
                     f'fc1_{fc1_dim}__' \
                     f'fc2_{fc2_dim}__' \
                     f'bs_{batch_size}__' \
                     f'buffer_{memory_size}__' \
                     f'update_period_{update_period}__' \
                     f'tau_{tau}__'

        self.model_name = model_name
        self.learn_iter = 0
        self.step_count = 0
        self.full_path = os.path.join(self.ckpt_dir, self.model_name)

        # Initialize replay buffer
        self.memory = ReplayBuffer(memory_size, state_dims, action_dims)

        # Initialize networks
        self.actor = Actor(alpha, state_dims, action_dims, fc1_dim, fc2_dim,
                          max_action, reparam_noise_lim, name='Actor', ckpt_dir=ckpt_dir)
        self.critic_1 = Critic(beta, state_dims, action_dims, fc1_dim, fc2_dim,
                              name='Critic1', ckpt_dir=ckpt_dir)
        self.critic_2 = Critic(beta, state_dims, action_dims, fc1_dim, fc2_dim,
                              name='Critic2', ckpt_dir=ckpt_dir)
        self.value = Value(beta, state_dims, fc1_dim, fc2_dim,
                          name='Value', ckpt_dir=ckpt_dir)
        self.target_value = Value(beta, state_dims, fc1_dim, fc2_dim,
                                 name='TargetValue', ckpt_dir=ckpt_dir)

        # Learnable alpha (temperature parameter) or fixed alpha
        if not self.use_fixed_alpha:
            self.target_entropy = -action_dims  # Heuristic target entropy
            self.log_alpha = T.zeros(1, requires_grad=True, device=self.actor.device)
            self.alpha_optimizer = optim.Adam([self.log_alpha], lr=alpha)
        else:
            self.fixed_alpha = T.tensor(fixed_alpha_value, device=self.actor.device)

        # Learning rate schedulers
        self.actor_scheduler = optim.lr_scheduler.StepLR(self.actor.optimizer, step_size=1000, gamma=0.9)
        self.critic1_scheduler = optim.lr_scheduler.StepLR(self.critic_1.optimizer, step_size=1000, gamma=0.9)
        self.critic2_scheduler = optim.lr_scheduler.StepLR(self.critic_2.optimizer, step_size=1000, gamma=0.9)
        self.value_scheduler = optim.lr_scheduler.StepLR(self.value.optimizer, step_size=1000, gamma=0.9)

        # Initialize target value network parameters
        self.update_parameters(tau=1.0)

        self.best_score = float('-inf')

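    def choose_action(self, state, deterministic=False, reparameterize=True):
        state = T.as_tensor(np.asarray(state)[None], dtype=T.float, device=self.actor.device)

        # BatchNorm cannot run in training mode on a single sample, and dropout
        # should be off when acting, so switch to eval mode for this forward pass.
        self.actor.eval()
        with T.no_grad():
            if deterministic:
                mu, _ = self.actor.forward(state)
                action = T.tanh(mu) * self.max_action
            else:
                action, _ = self.actor.sample_normal(state, reparameterize=reparameterize)
        self.actor.train()

        return action.cpu().numpy().flatten()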

    def store_transition(self, state, action, reward, state_, done):
        self.memory.store_transition(state, action, reward, state_, done)

    def load_batch(self):
        states, actions, rewards, states_, done = self.memory.load_batch(self.batch_size)

        states = T.tensor(states, dtype=T.float).to(self.actor.device)
        actions = T.tensor(actions, dtype=T.float).to(self.actor.device)
        rewards = T.tensor(rewards, dtype=T.float).to(self.actor.device)
        states_ = T.tensor(states_, dtype=T.float).to(self.actor.device)
        done = T.tensor(done).to(self.actor.device)

        return states, actions, rewards, states_, done

    def update_parameters(self, tau=None):
        if tau is None:
            tau = self.tau

        target_value_params = self.target_value.named_parameters()
        value_params = self.value.named_parameters()

        target_value_state_dict = dict(target_value_params)
        value_state_dict = dict(value_params)

        for name in value_state_dict:
            value_state_dict[name] = tau * value_state_dict[name].clone() + \
                                   (1 - tau) * target_value_state_dict[name].clone()

        self.target_value.load_state_dict(value_state_dict)

    def save_model(self):
        print('... saving checkpoint ...')
        self.actor.save_checkpoint()
        self.critic_1.save_checkpoint()
        self.critic_2.save_checkpoint()
        self.value.save_checkpoint()
        self.target_value.save_checkpoint()

    def load_model(self, gpu_to_cpu=False):
        print('... loading checkpoint ...')
        self.actor.load_checkpoint(gpu_to_cpu=gpu_to_cpu)
        self.critic_1.load_checkpoint(gpu_to_cpu=gpu_to_cpu)
        self.critic_2.load_checkpoint(gpu_to_cpu=gpu_to_cpu)
        self.value.load_checkpoint(gpu_to_cpu=gpu_to_cpu)
        self.target_value.load_checkpoint(gpu_to_cpu=gpu_to_cpu)

    def learn(self):
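        # Skip learning during warm-up or while there are too few stored samples.
        # Count stored transitions instead of using ptr, which resets to 0 when the buffer wraps.
        n_stored = self.memory.buffer_size if self.memory.is_full else self.memory.ptr
        if n_stored < self.warmup or n_stored < self.batch_size:
            return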

        # Load batch
        states, actions, rewards, states_, done = self.load_batch()

        # Current alpha value
        if self.use_fixed_alpha:
            alpha = self.fixed_alpha
        else:
            alpha = T.exp(self.log_alpha).clamp(min=0.01, max=1.0)

        # === VALUE LOSS ===
        value = self.value(states).view(-1)
        value_ = self.target_value(states_).view(-1)
        value_[done] = 0.0

        actions_new, log_probs = self.actor.sample_normal(states, reparameterize=False)
        log_probs = log_probs.view(-1)
        q1_new_policy = self.critic_1.forward(states, actions_new)
        q2_new_policy = self.critic_2.forward(states, actions_new)
        critic_value = T.min(q1_new_policy, q2_new_policy)
        critic_value = critic_value.view(-1)

        # Value target with entropy regularization
        target_value = critic_value - alpha * log_probs

        value_loss = 0.5 * F.mse_loss(value, target_value.detach())

        self.value.optimizer.zero_grad()
        value_loss.backward()
        T.nn.utils.clip_grad_norm_(self.value.parameters(), max_norm=1.0)
        self.value.optimizer.step()

        # === ACTOR LOSS ===
        actions_new, log_probs = self.actor.sample_normal(states, reparameterize=True)
        log_probs = log_probs.view(-1)

        q1_new_policy = self.critic_1.forward(states, actions_new)
        q2_new_policy = self.critic_2.forward(states, actions_new)
        critic_value = T.min(q1_new_policy, q2_new_policy)
        critic_value = critic_value.view(-1)

        # Actor loss with entropy regularization
        actor_loss = alpha * log_probs - critic_value
        actor_loss = T.mean(actor_loss)

        self.actor.optimizer.zero_grad()
        actor_loss.backward()
        T.nn.utils.clip_grad_norm_(self.actor.parameters(), max_norm=1.0)
        self.actor.optimizer.step()

        # === ALPHA LOSS (if alpha is learnable) ===
        if not self.use_fixed_alpha and self.learn_iter % 5 == 0:
            alpha_loss = -(self.log_alpha * (log_probs + self.target_entropy).detach())
            alpha_loss = T.mean(alpha_loss)

            self.alpha_optimizer.zero_grad()
            alpha_loss.backward()
            T.nn.utils.clip_grad_norm_([self.log_alpha], max_norm=0.5)
            self.alpha_optimizer.step()

        # === CRITIC LOSS ===
        q_hat = self.reward_scale * rewards + self.gamma * value_
        q1_old_policy = self.critic_1.forward(states, actions).view(-1)
        q2_old_policy = self.critic_2.forward(states, actions).view(-1)

        critic_1_loss = 0.5 * F.mse_loss(q1_old_policy, q_hat.detach())
        critic_2_loss = 0.5 * F.mse_loss(q2_old_policy, q_hat.detach())

        self.critic_1.optimizer.zero_grad()
        critic_1_loss.backward()
        T.nn.utils.clip_grad_norm_(self.critic_1.parameters(), max_norm=1.0)
        self.critic_1.optimizer.step()

        self.critic_2.optimizer.zero_grad()
        critic_2_loss.backward()
        T.nn.utils.clip_grad_norm_(self.critic_2.parameters(), max_norm=1.0)
        self.critic_2.optimizer.step()

        # === TARGET NETWORK UPDATE ===
        if self.learn_iter % self.update_period == 0:
            self.update_parameters()

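        # Learning rate scheduling. The schedulers are stepped once every 1000 learn
        # iterations, so with step_size=1000 each decay by 0.9 takes about 1e6 iterations.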
        if self.learn_iter % 1000 == 0:
            self.actor_scheduler.step()
            self.critic1_scheduler.step()
            self.critic2_scheduler.step()
            self.value_scheduler.step()

        self.learn_iter += 1
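
The soft target update $\theta_{\text{target}} \leftarrow \tau \theta + (1 - \tau)\theta_{\text{target}}$ can be illustrated without any neural network. The sketch below uses plain NumPy arrays in place of parameters; the function name `soft_update` and the toy weights are assumptions for illustration only.

```python
import numpy as np

def soft_update(target, online, tau):
    """Polyak averaging: move each target array a fraction `tau` toward the online one."""
    return {name: tau * online[name] + (1.0 - tau) * target[name] for name in target}

online = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
target = {"w": np.zeros(2), "b": np.zeros(1)}

target = soft_update(target, online, tau=1.0)   # hard copy, as done once at initialization
assert np.allclose(target["w"], online["w"])

online["w"] = np.array([2.0, 4.0])              # pretend a gradient step changed the weights
target = soft_update(target, online, tau=0.01)  # target drifts only 1% toward the new weights
print(target["w"])
```

Because the target moves slowly, the bootstrap target used for the critics changes smoothly even when the online value network changes abruptly.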

## ⚙️ Training Configuration

Set up your training parameters. `HalfCheetahBulletEnv-v0` is a continuous control task where the agent must learn to run using articulated legs.


In [16]:
env_name = 'HalfCheetahBulletEnv-v0'
dir = 'tmp'
n_games = 100

gamma = 0.99
alpha = 1e-4
beta = 3e-4
fc1_dim = 256
fc2_dim = 256
memory_size = 200000
batch_size = 256
tau = 0.01
update_period = 2
reward_scale = 2.0
warmup = 2000
reparam_noise_lim = 1e-6
record_video = True
learning_frequency = 2
use_fixed_alpha = False
fixed_alpha_value = 0.1
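
Two of these hyperparameters have a simple quantitative interpretation. The following back-of-the-envelope sketch uses the values above; the "half-life" framing is an illustrative assumption rather than something the training code computes.

```python
import math

gamma, tau, update_period = 0.99, 0.01, 2   # values from the configuration cell

# Rewards more than roughly 1 / (1 - gamma) steps ahead contribute little to the return.
effective_horizon = 1.0 / (1.0 - gamma)

# After n soft updates, a fraction (1 - tau)**n of the old target weights remains.
updates_to_half = math.log(0.5) / math.log(1.0 - tau)
learn_steps_to_half = updates_to_half * update_period

print(f"effective horizon ~ {effective_horizon:.0f} steps")
print(f"target half-life ~ {updates_to_half:.1f} soft updates "
      f"({learn_steps_to_half:.0f} learn steps)")
```

A smaller `tau` or a larger `update_period` makes the target network lag further behind, which tends to stabilize learning at the cost of slower value propagation.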

## 🚀 Training Loop

For each episode, interact with the environment to collect transitions, then update the SAC networks and save the best model.

After training, a learning curve is plotted to visualize convergence and performance stability.

In [None]:
# Environment setup
env = gym.make(env_name)
dir_path = os.path.join(dir, env_name)
os.makedirs(dir_path, exist_ok=True)

if record_video:
    env = RecordVideo(env, video_folder=os.path.join(dir_path, 'videos'),
                      episode_trigger=lambda ep: ep == n_games - 1)

# Get environment dimensions
state_dims = env.observation_space.shape[0]
action_dims = env.action_space.shape[0]
max_action = float(env.action_space.high[0])

# Initialize Agent
agent = Agent(gamma=gamma, alpha=alpha, beta=beta,
              state_dims=state_dims, action_dims=action_dims,
              max_action=max_action, fc1_dim=fc1_dim, fc2_dim=fc2_dim,
              memory_size=memory_size, batch_size=batch_size, tau=tau,
              update_period=update_period, reward_scale=reward_scale,
              warmup=warmup, reparam_noise_lim=reparam_noise_lim,
              name='SAC_HalfCheetah', ckpt_dir=dir_path,
              use_fixed_alpha=use_fixed_alpha, fixed_alpha_value=fixed_alpha_value)

# Initialize performance tracking variables
scores = []
best_score = -np.inf
avg_score = 0

for game in trange(n_games):
    # Reset environment and initialize variables
    state = env.reset()
    if isinstance(state, tuple):
        state = state[0]
    done = False
    score = 0

    # Interact with environment until done
    while not done:
        action = agent.choose_action(state)
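        step_result = env.step(action)
        if len(step_result) == 5:
            # gym >= 0.26 returns (obs, reward, terminated, truncated, info)
            next_state, reward, terminated, truncated, info = step_result
            done = terminated or truncated
        else:
            # older gym returns (obs, reward, done, info)
            next_state, reward, done, info = step_result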

        agent.store_transition(state, action, reward, next_state, done)
        agent.learn()

        score += reward
        state = next_state

    # Track score and average score
    scores.append(score)
    avg_score = np.mean(scores[-100:])

    print(f'| Game: {game:6.0f} | Score: {score:10.2f} | Best score: {best_score:10.2f} | '
          f'Avg score {avg_score:10.2f} | Learning iter: {agent.learn_iter:10.0f} |')

    # Save model if better
    if avg_score > best_score:
        best_score = avg_score
        agent.save_model()
        print(f'New best average score: {best_score:.2f}! Model saved.')

env.close()
plot_learning_curve(scores, agent.full_path)
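
Training scores come from stochastic actions, so they understate what the learned policy can do when it acts on its mean. A separate evaluation pass could look like the sketch below. The function `evaluate` and its `n_episodes` parameter are hypothetical and not part of the notebook; the function only assumes an agent with the `choose_action(state, deterministic=True)` method defined above and a Gym-style environment.

```python
import numpy as np

def evaluate(agent, env, n_episodes=5):
    """Average undiscounted return when the agent acts deterministically."""
    returns = []
    for _ in range(n_episodes):
        state = env.reset()
        if isinstance(state, tuple):          # newer gym returns (obs, info)
            state = state[0]
        done, total = False, 0.0
        while not done:
            action = agent.choose_action(state, deterministic=True)
            result = env.step(action)
            if len(result) == 5:              # (obs, reward, terminated, truncated, info)
                state, reward, terminated, truncated, _ = result
                done = terminated or truncated
            else:                             # (obs, reward, done, info)
                state, reward, done, _ = result
            total += reward
        returns.append(total)
    return float(np.mean(returns))
```

A call such as `evaluate(agent, gym.make(env_name))` after `agent.load_model()` would then report the performance of the best saved checkpoint, with no transitions stored and no learning updates performed.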

## 🎥 Visualize Agent Behavior

The video below shows the last training episode, recorded while the agent interacted with the environment.

In [None]:
Video(f"/content/tmp/HalfCheetahBulletEnv-v0/videos/rl-video-episode-{n_games-1}.mp4", embed=True, width=600)