# Testing DQN Replay Buffer

This notebook demonstrates how to use the DQN Replay Buffer interactively.

We'll:
1. Create an environment
2. Create a replay buffer
3. Manually store experiences
4. Sample batches for training
5. Inspect buffer statistics

## Setup

In [13]:
# Add src to path
import sys
sys.path.insert(0, '../src')

import numpy as np
from environment import Config, ConnectFourEnvironment
from utils import DQNReplayBuffer

print("✅ Imports successful!")

✅ Imports successful!


## 1. Create Environment and Buffer

In [15]:
# Create environment
config = Config()
env = ConnectFourEnvironment(config)

# Create replay buffer with capacity of 100 experiences
buffer = DQNReplayBuffer(capacity=100)

print(f"Environment created: {env.rows}x{env.cols} board")
print(f"Replay buffer created: {buffer}")
print(f"Buffer stats: {buffer.get_stats()}")
print()

Environment created: 6x7 board
Replay buffer created: DQNReplayBuffer(size=0, capacity=100)
Buffer stats: {'size': 0, 'capacity': 100, 'utilization': 0.0, 'is_full': False}
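
The source of `DQNReplayBuffer` is not shown in this notebook, so the following is only a minimal sketch of how a buffer with the same interface could be built on `collections.deque`. The class name `SketchReplayBuffer` and its internals are assumptions for illustration. Unlike the real buffer, which returns arrays from `sample()`, this version returns plain tuples.

```python
import random
from collections import deque


class SketchReplayBuffer:
    """Hypothetical stand-in for DQNReplayBuffer (not the project's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # deque(maxlen=...) drops the oldest item once the buffer is full
        self._items = deque(maxlen=capacity)

    def __len__(self):
        return len(self._items)

    def add(self, state, action, reward, next_state, done):
        self._items.append((state, action, reward, next_state, done))

    def is_ready(self, batch_size):
        return len(self._items) >= batch_size

    def sample(self, batch_size):
        # Uniform sampling without replacement, then transpose the
        # list of experience tuples into one tuple per field
        batch = random.sample(list(self._items), batch_size)
        return tuple(zip(*batch))

    def get_stats(self):
        size = len(self._items)
        return {
            "size": size,
            "capacity": self.capacity,
            "utilization": size / self.capacity,
            "is_full": size == self.capacity,
        }

    def clear(self):
        self._items.clear()


sketch = SketchReplayBuffer(capacity=100)
print(sketch.get_stats())
```

The dictionary keys mirror the `get_stats()` output printed above; everything else is a design guess.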



## 2. Manually Add First Experience

Let's manually create and store an experience:
1. Get initial state
2. Make a random move
3. Get reward and done signal
4. Add to buffer

In [16]:
# Reset environment to get initial state
state = env.reset()

print("Initial state:")
env.render()
print(f"State shape: {state.shape}")
print(f"Legal moves: {env.get_legal_moves()}")

Initial state:

. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
0 1 2 3 4 5 6

State shape: (3, 6, 7)
Legal moves: [0, 1, 2, 3, 4, 5, 6]


In [17]:
# Make a move (column 3, chosen by hand so the output is reproducible)
action = 3
next_state, reward, done = env.play_move(action)

print(f"Action taken: {action}")
print(f"Reward: {reward}")
print(f"Done: {done}")
print("\nBoard after move:")
env.render()

Action taken: 3
Reward: None
Done: False

Board after move:

. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . X . . .
0 1 2 3 4 5 6



In [20]:
# Add experience to buffer
buffer.add(state, action, reward, next_state, done)

print("✅ Experience added to buffer!")
print(f"Buffer size: {len(buffer)}")
print(f"Buffer stats: {buffer.get_stats()}")

✅ Experience added to buffer!
Buffer size: 3
Buffer stats: {'size': 3, 'capacity': 100, 'utilization': 0.03, 'is_full': False}

Note: the size is 3 rather than 1, most likely because this cell was run three times (the execution count jumps from 17 to 20). Each run appends the same experience again.


## 3. Add More Experiences

Let's add several more experiences by playing random moves.

In [21]:
import random

# Play 10 more random moves
for i in range(10):
    state = env.get_state()
    legal_moves = env.get_legal_moves()
    
    if not legal_moves:
        print("Game over!")
        break
    
    action = random.choice(legal_moves)
    next_state, reward, done = env.play_move(action)
    
    buffer.add(state, action, reward, next_state, done)
    
    print(f"Move {i+2}: Column {action}, Reward: {reward}, Done: {done}")
    
    if done:
        print("\nFinal board:")
        env.render()
        break

print(f"\n✅ Buffer now has {len(buffer)} experiences")
print(f"Buffer stats: {buffer.get_stats()}")

Move 2: Column 1, Reward: None, Done: False
Move 3: Column 1, Reward: None, Done: False
Move 4: Column 6, Reward: None, Done: False
Move 5: Column 4, Reward: None, Done: False
Move 6: Column 2, Reward: None, Done: False
Move 7: Column 4, Reward: None, Done: False
Move 8: Column 4, Reward: None, Done: False
Move 9: Column 3, Reward: None, Done: False
Move 10: Column 4, Reward: None, Done: False
Move 11: Column 6, Reward: None, Done: False

✅ Buffer now has 13 experiences
Buffer stats: {'size': 13, 'capacity': 100, 'utilization': 0.13, 'is_full': False}


## 4. Sample a Batch

Now let's sample a batch of experiences for training.

In [22]:
# Check if buffer has enough experiences
batch_size = 4

if buffer.is_ready(batch_size):
    print(f"✅ Buffer ready! Sampling batch of {batch_size}...\n")
    
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    
    print("Sampled batch:")
    print(f"  States shape: {states.shape}")
    print(f"  Actions shape: {actions.shape}")
    print(f"  Rewards shape: {rewards.shape}")
    print(f"  Next states shape: {next_states.shape}")
    print(f"  Dones shape: {dones.shape}")
    print(f"\nActions in batch: {actions}")
    print(f"Rewards in batch: {rewards}")
    print(f"Dones in batch: {dones}")
else:
    print(f"❌ Buffer not ready. Need {batch_size} experiences, have {len(buffer)}")

✅ Buffer ready! Sampling batch of 4...

Sampled batch:
  States shape: (4, 3, 6, 7)
  Actions shape: (4,)
  Rewards shape: (4,)
  Next states shape: (4, 3, 6, 7)
  Dones shape: (4,)

Actions in batch: [4 4 3 1]
Rewards in batch: [nan nan nan nan]
Dones in batch: [0. 0. 0. 0.]
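
The `nan` rewards above most likely come from the environment returning `None` for non-terminal moves: `None` has no numeric value, so it turns into `nan` when the batch is converted to a float array. One way to avoid this is to map `None` to a number before calling `buffer.add()`. The helper name `normalize_reward` and the default of `0.0` are assumptions; the right value depends on the reward scheme the environment intends.

```python
def normalize_reward(reward, default=0.0):
    """Map a missing reward (None) to a numeric default (assumed to be 0.0)."""
    return default if reward is None else float(reward)


# Three None rewards as seen in the cells above, plus one
# hypothetical terminal reward of 1.0 for illustration
raw_rewards = [None, None, None, 1.0]
clean_rewards = [normalize_reward(r) for r in raw_rewards]
print(clean_rewards)  # [0.0, 0.0, 0.0, 1.0]
```

In the notebook this would be used as `buffer.add(state, action, normalize_reward(reward), next_state, done)`.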


## 5. Inspect Individual Experiences

Let's look at what's actually stored in the buffer.

In [23]:
# Sample one experience to inspect
if len(buffer) > 0:
    states, actions, rewards, next_states, dones = buffer.sample(1)
    
    print("Single experience:")
    print(f"\nAction taken: {actions[0]}")
    print(f"Reward received: {rewards[0]}")
    print(f"Episode done: {bool(dones[0])}")
    
    print("\nState before action:")
    print(f"  Shape: {states[0].shape}")
    print(f"  Player 1 pieces: {states[0][0].sum()}")
    print(f"  Player 2 pieces: {states[0][1].sum()}")
    print(f"  Current player: {'Player 1' if states[0][2,0,0] == 1 else 'Player 2'}")
    
    print("\nState after action:")
    print(f"  Shape: {next_states[0].shape}")
    print(f"  Player 1 pieces: {next_states[0][0].sum()}")
    print(f"  Player 2 pieces: {next_states[0][1].sum()}")
    print(f"  Current player: {'Player 1' if next_states[0][2,0,0] == 1 else 'Player 2'}")

Single experience:

Action taken: 1
Reward received: nan
Episode done: False

State before action:
  Shape: (3, 6, 7)
  Player 1 pieces: 1.0
  Player 2 pieces: 1.0
  Current player: Player 1

State after action:
  Shape: (3, 6, 7)
  Player 1 pieces: 2.0
  Player 2 pieces: 1.0
  Current player: Player 2
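
The inspection code reads the state as three 6x7 planes: plane 0 marks Player 1's pieces, plane 1 marks Player 2's pieces, and plane 2 encodes whose turn it is. The sketch below rebuilds that layout with nested lists to make the bookkeeping concrete. The helper names, the piece positions, and the choice to fill the whole turn plane with `1.0` or `0.0` are assumptions; the notebook only ever reads cell `[2, 0, 0]`.

```python
ROWS, COLS = 6, 7


def make_plane(fill=0.0):
    """Return a ROWS x COLS nested list filled with one value."""
    return [[fill] * COLS for _ in range(ROWS)]


def encode(p1_cells, p2_cells, player_one_to_move):
    """Build a 3 x 6 x 7 nested list from (row, col) piece positions."""
    p1, p2 = make_plane(), make_plane()
    for r, c in p1_cells:
        p1[r][c] = 1.0
    for r, c in p2_cells:
        p2[r][c] = 1.0
    # Assumed convention: turn plane is all 1.0 when Player 1 moves next
    turn = make_plane(1.0 if player_one_to_move else 0.0)
    return [p1, p2, turn]


def summarize(state):
    """Count pieces per player and report whose turn it is."""
    def count(plane):
        return sum(sum(row) for row in plane)
    to_move = "Player 1" if state[2][0][0] == 1 else "Player 2"
    return {"p1": count(state[0]), "p2": count(state[1]), "to_move": to_move}


# Hypothetical positions matching the piece counts printed above
before = encode([(5, 3)], [(5, 1)], player_one_to_move=True)
after = encode([(5, 3), (4, 1)], [(5, 1)], player_one_to_move=False)
print(summarize(before))
print(summarize(after))
```

A valid experience should always show exactly one more piece for the player who moved, and the turn indicator should flip between `state` and `next_state`.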


## 6. Fill Buffer and Test Capacity

Let's fill the buffer to capacity and see what happens.

In [24]:
# Play multiple games to fill buffer
games_played = 0
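# Start from the current size; the total can exceed capacity because
# the loop only checks the buffer size between games
experiences_added = len(buffer)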

while len(buffer) < buffer.capacity:
    # Reset for new game
    env.reset()
    games_played += 1
    
    # Play until game ends
    for _ in range(50):  # Max 50 moves per game
        state = env.get_state()
        legal_moves = env.get_legal_moves()
        
        if not legal_moves:
            break
        
        action = random.choice(legal_moves)
        next_state, reward, done = env.play_move(action)
        
        buffer.add(state, action, reward, next_state, done)
        experiences_added += 1
        
        if done:
            break
    
    if games_played % 5 == 0:
        stats = buffer.get_stats()
        print(f"Games: {games_played}, Buffer: {stats['size']}/{stats['capacity']} ({stats['utilization']:.1%})")

print("\n✅ Buffer filled!")
print(f"Games played: {games_played}")
print(f"Total experiences added: {experiences_added}")
print(f"Final buffer stats: {buffer.get_stats()}")

Games: 5, Buffer: 100/100 (100.0%)

✅ Buffer filled!
Games played: 5
Total experiences added: 106
Final buffer stats: {'size': 100, 'capacity': 100, 'utilization': 1.0, 'is_full': True}


## 7. Test Buffer Overflow

What happens when we add more experiences than capacity?

In [25]:
print(f"Buffer before overflow: {len(buffer)} experiences")

# Add 10 more experiences, starting a new game whenever the current one ends
env.reset()
for i in range(10):
    state = env.get_state()
    legal_moves = env.get_legal_moves()
    if not legal_moves:
        env.reset()
        state = env.get_state()
        legal_moves = env.get_legal_moves()
    
    action = random.choice(legal_moves)
    next_state, reward, done = env.play_move(action)
    buffer.add(state, action, reward, next_state, done)
    
    # A win can end the game while columns are still open
    if done:
        env.reset()

print(f"Buffer after adding 10 more: {len(buffer)} experiences")
print("\n✅ Buffer automatically removed oldest experiences!")
print(f"Buffer stays at capacity: {buffer.capacity}")

Buffer before overflow: 100 experiences
Buffer after adding 10 more: 100 experiences

✅ Buffer automatically removed oldest experiences!
Buffer stays at capacity: 100
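
The overflow behaviour above (size stays at capacity while the oldest entries disappear) is commonly implemented as a ring buffer: a fixed list plus a write index that wraps around. Whether `DQNReplayBuffer` works this way is not shown here, so the `RingBuffer` class below is a hypothetical illustration of the technique, not the project's implementation.

```python
class RingBuffer:
    """Hypothetical fixed-size store that overwrites the oldest slot."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._slots = [None] * capacity
        self._next = 0   # slot the next add() writes to
        self._size = 0   # number of filled slots

    def __len__(self):
        return self._size

    def add(self, item):
        self._slots[self._next] = item
        # Wrap around to slot 0 after reaching the end
        self._next = (self._next + 1) % self.capacity
        self._size = min(self._size + 1, self.capacity)

    def items_oldest_first(self):
        if self._size < self.capacity:
            return self._slots[:self._size]
        # Once full, the oldest item sits at the write index
        return self._slots[self._next:] + self._slots[:self._next]


ring = RingBuffer(capacity=5)
for i in range(8):
    ring.add(i)
print(len(ring), ring.items_oldest_first())  # 5 [3, 4, 5, 6, 7]
```

Items 0, 1, and 2 were overwritten, which matches the first-in, first-out eviction the notebook output suggests.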


## 8. Sample Multiple Batches

In training, we'll sample many batches. Let's test that.

In [28]:
# Sample 5 different batches
num_batches = 5
batch_size = 8

print(f"Sampling {num_batches} batches of size {batch_size}:\n")

for i in range(num_batches):
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    
    print(f"Batch {i+1}:")
    print(f"  Actions: {actions}")
    print(f"  Rewards: {rewards}")
    print(f"  Avg reward: {rewards.mean():.3f}")
    print()

Sampling 5 batches of size 8:

Batch 1:
  Actions: [0 6 2 3 6 5 0 1]
  Rewards: [nan nan nan nan nan nan nan nan]
  Avg reward: nan

Batch 2:
  Actions: [4 4 3 6 5 3 1 5]
  Rewards: [nan nan nan nan nan nan nan nan]
  Avg reward: nan

Batch 3:
  Actions: [5 6 5 1 4 4 6 0]
  Rewards: [nan nan nan nan nan nan nan nan]
  Avg reward: nan

Batch 4:
  Actions: [2 1 0 6 5 0 2 4]
  Rewards: [nan nan nan nan nan nan nan nan]
  Avg reward: nan

Batch 5:
  Actions: [1 6 4 6 6 3 2 3]
  Rewards: [nan nan nan nan nan nan nan nan]
  Avg reward: nan
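
To see why the `rewards` and `dones` columns matter, here is a rough sketch of the standard one-step DQN target, `r + gamma * (1 - done) * max_a Q(s', a)`, written with plain lists. The function name `td_targets`, the discount `gamma=0.99`, and the Q-values are all assumptions, since the network is not part of this notebook yet.

```python
import math


def td_targets(rewards, dones, next_q_max, gamma=0.99):
    """One-step targets; the bootstrap term is zeroed on terminal steps."""
    return [
        r + gamma * (1.0 - d) * q
        for r, d, q in zip(rewards, dones, next_q_max)
    ]


rewards = [0.0, 0.0, 1.0]
dones = [0.0, 0.0, 1.0]
next_q_max = [0.5, 0.2, 0.9]  # hypothetical max_a Q(s', a) values
print(td_targets(rewards, dones, next_q_max))

# A nan reward makes the whole target nan
nan_targets = td_targets([float("nan")], [0.0], [0.5])
print(math.isnan(nan_targets[0]))  # True
```

With the all-`nan` rewards shown in the batches above, every target would be `nan`, so the rewards need to be numeric before this buffer is used for training.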



## 9. Clear Buffer

Test the `clear()` method.

In [29]:
print(f"Buffer before clear: {len(buffer)} experiences")

buffer.clear()

print(f"Buffer after clear: {len(buffer)} experiences")
print(f"Buffer stats: {buffer.get_stats()}")
print("\n✅ Buffer successfully cleared!")

Buffer before clear: 100 experiences
Buffer after clear: 0 experiences
Buffer stats: {'size': 0, 'capacity': 100, 'utilization': 0.0, 'is_full': False}

✅ Buffer successfully cleared!


## Summary

You've successfully tested the DQN Replay Buffer! Key takeaways:

1. ✅ Buffer stores experiences as tuples: `(state, action, reward, next_state, done)`
2. ✅ `add()` stores new experiences
3. ✅ `sample()` returns random batches for training
4. ✅ Buffer automatically removes the oldest experiences when full
5. ✅ `is_ready()` checks whether the buffer holds enough experiences for a batch
6. ✅ `get_stats()` provides buffer statistics
7. ✅ `clear()` empties the buffer

**Next Steps:**
- Implement DQN Value Network (neural network)
- Implement DQN Agent (uses buffer + network)
- Train DQN agent to play Connect 4!