# Reinforcement Learning Agent for Hangman
## Part 2: Q-Learning with HMM Integration

This notebook trains a Q-Learning agent that uses the HMM model for intelligent letter guessing in Hangman.

## Setup and Imports

In [1]:
import re
import numpy as np
import pandas as pd
from collections import Counter, defaultdict
import random
import pickle
from tqdm import tqdm
import matplotlib.pyplot as plt
import matplotlib
matplotlib.use('Agg')  # For saving plots without display

print("="*70)
print("Q-Learning Agent with HMM Integration")
print("="*70 + "\n")

Q-Learning Agent with HMM Integration



## Upload Files (for Colab)

**For Google Colab:** Upload the following files:
1. `hmm_model.pkl` (generated from hmm.ipynb)
2. `corpus.txt`
3. `test.txt`

In [2]:
from google.colab import files
print("Please upload hmm_model.pkl, corpus.txt, and test.txt")
uploaded = files.upload()

Please upload hmm_model.pkl, corpus.txt, and test.txt


Saving corpus.txt to corpus.txt
Saving test.txt to test.txt


## Load Data

In [3]:
print("Loading data...")

# Try to load from current directory first (Colab), then from Data/ folder (local)
try:
    with open('corpus.txt', 'r', encoding='utf-8', errors='ignore') as f:
        corpus = [line.strip().lower() for line in f if line.strip().lower() and re.match(r'^[a-z]+$', line.strip().lower())]
    with open('test.txt', 'r', encoding='utf-8', errors='ignore') as f:
        test_words = [line.strip().lower() for line in f if line.strip().lower() and re.match(r'^[a-z]+$', line.strip().lower())]
except FileNotFoundError:
    # Try Data/ folder
    with open('Data/corpus.txt', 'r', encoding='utf-8', errors='ignore') as f:
        corpus = [line.strip().lower() for line in f if line.strip().lower() and re.match(r'^[a-z]+$', line.strip().lower())]
    with open('Data/test.txt', 'r', encoding='utf-8', errors='ignore') as f:
        test_words = [line.strip().lower() for line in f if line.strip().lower() and re.match(r'^[a-z]+$', line.strip().lower())]

print(f"‚úì Corpus: {len(corpus)} words")
print(f"‚úì Test set: {len(test_words)} words")

Loading data...
‚úì Corpus: 49979 words
‚úì Test set: 2000 words


## Define HMM Class

We need to define the `TrueHMM` class before loading the pickled model.
This class definition must match the one used in `hmm.ipynb`.

In [4]:
class TrueHMM:
    """
    Hidden Markov Model for Hangman using Forward-Backward Algorithm

    This class definition is needed to unpickle the trained model.
    The actual model will be loaded from hmm_model.pkl.
    """

    def __init__(self, corpus):
        # This won't be called when unpickling
        pass

    def forward(self, masked):
        """Forward algorithm: Œ±(t, letter)"""
        length = len(masked)
        if length not in self.emissions:
            return None

        fwd = []

        # t=0
        alpha = {}
        for l in 'abcdefghijklmnopqrstuvwxyz':
            if masked[0] == '_':
                alpha[l] = self.initial[l] * self.emissions[length][0][l]
            elif masked[0] == l:
                alpha[l] = self.initial[l]
            else:
                alpha[l] = 0.0
        fwd.append(alpha)

        # t=1..length-1
        for t in range(1, length):
            alpha = {}
            for curr in 'abcdefghijklmnopqrstuvwxyz':
                s = sum(fwd[t-1][prev] * self.transitions.get(prev, self.default_trans).get(curr, 1e-10)
                       for prev in 'abcdefghijklmnopqrstuvwxyz')

                if masked[t] == '_':
                    alpha[curr] = s * self.emissions[length][t][curr]
                elif masked[t] == curr:
                    alpha[curr] = s
                else:
                    alpha[curr] = 0.0
            fwd.append(alpha)

        return fwd

    def backward(self, masked):
        """Backward algorithm: Œ≤(t, letter)"""
        length = len(masked)
        if length not in self.emissions:
            return None

        bwd = [None] * length
        bwd[-1] = {l: 1.0 for l in 'abcdefghijklmnopqrstuvwxyz'}

        for t in range(length-2, -1, -1):
            beta = {}
            for curr in 'abcdefghijklmnopqrstuvwxyz':
                s = 0.0
                for nxt in 'abcdefghijklmnopqrstuvwxyz':
                    trans = self.transitions.get(curr, self.default_trans).get(nxt, 1e-10)

                    if masked[t+1] == '_':
                        emit = self.emissions[length][t+1][nxt]
                        s += trans * emit * bwd[t+1][nxt]
                    elif masked[t+1] == nxt:
                        s += trans * bwd[t+1][nxt]

                beta[curr] = s
            bwd[t] = beta

        return bwd

    def get_letter_probs(self, masked, guessed):
        """Forward-Backward: Œ≥(t, l) = Œ±(t, l) √ó Œ≤(t, l)"""
        fwd = self.forward(masked)
        bwd = self.backward(masked)

        if fwd is None or bwd is None:
            # Fallback
            probs = np.array([self.global_freq.get(chr(ord('a')+i), 1) for i in range(26)], dtype=float)
            probs = probs / (probs.sum() + 1e-10)
        else:
            probs = np.zeros(26)
            for t in range(len(masked)):
                if masked[t] == '_':
                    for i, l in enumerate('abcdefghijklmnopqrstuvwxyz'):
                        probs[i] += fwd[t][l] * bwd[t][l]

        # Mask guessed
        for l in guessed:
            probs[ord(l) - ord('a')] = 0

        total = probs.sum()
        return probs / (total + 1e-10) if total > 0 else np.ones(26) / 26

    def get_candidates(self, masked, guessed):
        """Pattern matching for candidate words"""
        length = len(masked)
        if length not in self.by_len:
            return []

        candidates = []
        for word in self.by_len[length]:
            if all((m == '_' or word[i] == m) for i, m in enumerate(masked)):
                if not any(g in word for g in guessed if g not in masked):
                    candidates.append(word)
        return candidates

print("‚úì HMM class defined")

‚úì HMM class defined


## Load Pre-trained HMM Model

In [5]:
print("="*70)
print("LOADING PRE-TRAINED HMM MODEL")
print("="*70 + "\n")

with open('/content/hmm_model.pkl', 'rb') as f:
    hmm = pickle.load(f)

print("‚úì HMM model loaded successfully!")
print(f"  - Word lengths covered: {len(hmm.by_len)}")
print(f"  - Emission matrices: {len(hmm.emissions)}")
print(f"  - Transition model size: {len(hmm.transitions)}")
print()

LOADING PRE-TRAINED HMM MODEL

‚úì HMM model loaded successfully!
  - Word lengths covered: 24
  - Emission matrices: 24
  - Transition model size: 26



## Game Environment

Hangman game implementation:
- **State**: Masked word, guessed letters, lives remaining
- **Actions**: Guess a letter (a-z)
- **Dynamics**: Correct guess reveals letter, wrong guess loses a life
- **Terminal**: Lives = 0 (lose) or all letters revealed (win)

In [6]:
class Game:
    """
    Hangman Game Environment

    State space:
    - Current masked word (e.g., "_a__")
    - Set of guessed letters
    - Remaining lives (starts at 6)

    Actions:
    - Guess a letter from a-z

    Dynamics:
    - If letter is in word: reveal it
    - If letter not in word: lose 1 life
    - If letter already guessed: count as repeated (no life lost)

    Terminal states:
    - Lives = 0 (lose)
    - All letters revealed (win)
    """

    def __init__(self, word):
        self.word = word
        self.guessed = set()
        self.lives = 6
        self.wrong = 0
        self.repeated = 0

    def guess(self, letter):
        if letter in self.guessed:
            self.repeated += 1
            return None
        self.guessed.add(letter)
        if letter not in self.word:
            self.lives -= 1
            self.wrong += 1
        return letter in self.word

    def done(self):
        return self.lives <= 0 or all(c in self.guessed for c in self.word)

    def won(self):
        return all(c in self.guessed for c in self.word)

    def get_masked(self):
        return ''.join('_' if c not in self.guessed else c for c in self.word)

print("‚úì Game environment defined")

‚úì Game environment defined


## Q-Learning Agent

The agent uses a hybrid strategy:

1. **Candidate Word Frequency** (highest priority)
2. **HMM Forward-Backward Probabilities**
3. **Q-Table Values** (with HMM bonus)
4. **Epsilon-Greedy Exploration** (training only)

**Reward Structure:**
- Correct guess: +8 + 3√ó(letters revealed)
- Win: +100 + 15√ó(lives remaining)
- Wrong guess: -12
- Repeated guess: -3

In [7]:
class OptimizedAgent:
    """
    Q-Learning Agent for Hangman

    State representation:
    - Word length
    - Number of blanks remaining
    - Lives remaining
    - Number of letters guessed

    Action space:
    - 26 actions (letters a-z)

    Q-Learning update:
    Q(s,a) ‚Üê Q(s,a) + Œ±[r + Œ≥¬∑max_a' Q(s',a') - Q(s,a)]

    Strategy hierarchy:
    1. Candidate word frequency (highest priority during testing)
    2. HMM forward-backward probabilities
    3. Q-table values (with HMM bonus)
    4. Epsilon-greedy exploration (training only)

    Reward structure:
    - Correct guess: +8 + 3√ó(letters revealed)
    - Win: +100 + 15√ó(lives remaining)
    - Wrong guess: -12
    - Repeated guess: -3
    """

    def __init__(self, hmm):
        self.hmm = hmm
        self.q_table = defaultdict(lambda: defaultdict(float))
        self.alpha = 0.2  # Learning rate
        self.gamma = 0.95  # Discount factor
        self.eps = 0.5  # Exploration rate

        print("‚úì Q-Learning Agent initialized")
        print(f"  - Learning rate (Œ±): {self.alpha}")
        print(f"  - Discount factor (Œ≥): {self.gamma}")
        print(f"  - Exploration (Œµ): {self.eps} ‚Üí 0.05")
        print(f"  - State space: (length, blanks, lives, guessed)")
        print(f"  - Action space: 26 letters\n")

    def state_key(self, game):
        """Create a tuple representation of game state for Q-table indexing"""
        masked = game.get_masked()
        return (len(game.word), masked.count('_'), game.lives, len(game.guessed))

    def choose_letter(self, game, training=True):
        """
        Select next letter to guess using hybrid strategy

        Priority order:
        1. Candidate frequency (if candidates exist)
        2. HMM forward-backward probabilities
        3. Q-values with exploration
        """
        available = set('abcdefghijklmnopqrstuvwxyz') - game.guessed
        if not available:
            return None

        masked = game.get_masked()

        # STRATEGY 1: Candidate frequency (HIGHEST PRIORITY)
        candidates = self.hmm.get_candidates(masked, game.guessed)
        if candidates:
            letter_freq = Counter()
            for word in candidates[:100]:  # Limit for speed
                for l in set(word) - game.guessed:
                    letter_freq[l] += 1

            if letter_freq:
                # During testing, use best candidate letter
                if not training:
                    return letter_freq.most_common(1)[0][0]

                # During training, sometimes explore
                if random.random() > self.eps:
                    return letter_freq.most_common(1)[0][0]

        # STRATEGY 2: HMM Forward-Backward probabilities
        hmm_probs = self.hmm.get_letter_probs(masked, game.guessed)
        hmm_best = chr(ord('a') + np.argmax(hmm_probs))

        if not training:
            return hmm_best if hmm_best in available else random.choice(list(available))

        # STRATEGY 3: Q-Learning (during training only)
        if random.random() < self.eps:
            return random.choice(list(available))

        state = self.state_key(game)
        best_letter = hmm_best
        best_q = self.q_table[state].get(hmm_best, 0) + 10  # HMM bonus

        for l in available:
            q_val = self.q_table[state][l]
            if q_val > best_q:
                best_q = q_val
                best_letter = l

        return best_letter

    def update_q(self, state, letter, reward, next_state):
        """
        Q-Learning update rule:
        Q(s,a) ‚Üê Q(s,a) + Œ±[r + Œ≥¬∑max_a' Q(s',a') - Q(s,a)]
        """
        current = self.q_table[state][letter]
        max_next = max(self.q_table[next_state].values()) if self.q_table[next_state] else 0
        self.q_table[state][letter] = current + self.alpha * (reward + self.gamma * max_next - current)

print("‚úì Agent class defined")

‚úì Agent class defined


## Initialize Agent

In [8]:
print("="*70)
print("INITIALIZING Q-LEARNING AGENT")
print("="*70 + "\n")

agent = OptimizedAgent(hmm)

INITIALIZING Q-LEARNING AGENT

‚úì Q-Learning Agent initialized
  - Learning rate (Œ±): 0.2
  - Discount factor (Œ≥): 0.95
  - Exploration (Œµ): 0.5 ‚Üí 0.05
  - State space: (length, blanks, lives, guessed)
  - Action space: 26 letters



## Training Configuration

In [9]:
# Training hyperparameters
EPISODES = 15000  # Number of training episodes
STATS_EVERY = 100  # Report stats every N episodes

print(f"Training configuration:")
print(f"  - Episodes: {EPISODES}")
print(f"  - Stats reporting interval: {STATS_EVERY}")
print(f"  - Training corpus size: {len(corpus)}")
print()

Training configuration:
  - Episodes: 15000
  - Stats reporting interval: 100
  - Training corpus size: 49979



## Training Loop

This cell trains the Q-Learning agent over 15,000 episodes, tracking:
- Total reward per episode
- Win/loss outcomes
- Wrong and repeated guesses
- Epsilon decay (exploration rate)

In [10]:
print("="*70)
print(f"TRAINING ({EPISODES:,} episodes)")
print("="*70 + "\n")

# Tracking metrics
episode_rewards = []  # Total reward per episode
episode_wins = []  # 1 if won, 0 if lost
episode_wrong_guesses = []  # Number of wrong guesses per episode
episode_repeated_guesses = []  # Number of repeated guesses per episode
epsilon_history = []  # Epsilon decay over time

# Aggregated statistics for plotting
aggr_stats = {
    'episode': [],
    'avg_reward': [],
    'win_rate': [],
    'avg_wrong': [],
    'avg_repeated': [],
    'epsilon': []
}

for episode in tqdm(range(EPISODES), desc="Training"):
    word = random.choice(corpus)
    game = Game(word)
    state = agent.state_key(game)

    episode_reward = 0  # Track total reward for this episode

    while not game.done():
        letter = agent.choose_letter(game, training=True)
        if not letter:
            break

        was_correct = game.guess(letter)

        # OPTIMIZED REWARDS
        if was_correct is None:
            reward = -3
        elif was_correct:
            if game.won():
                reward = 100 + (game.lives * 15)  # Big win bonus
            else:
                revealed = sum(1 for c in game.word if c == letter)
                reward = 8 + (revealed * 3)  # Reward multiple reveals
        else:
            reward = -12  # Penalty for wrong guess

        episode_reward += reward

        next_state = agent.state_key(game)
        agent.update_q(state, letter, reward, next_state)
        state = next_state

    # Record episode metrics
    episode_rewards.append(episode_reward)
    episode_wins.append(1 if game.won() else 0)
    episode_wrong_guesses.append(game.wrong)
    episode_repeated_guesses.append(game.repeated)
    epsilon_history.append(agent.eps)

    # Aggregate statistics every STATS_EVERY episodes
    if (episode + 1) % STATS_EVERY == 0:
        recent_slice = slice(-STATS_EVERY, None)

        avg_reward = np.mean(episode_rewards[recent_slice])
        win_rate = np.mean(episode_wins[recent_slice])
        avg_wrong = np.mean(episode_wrong_guesses[recent_slice])
        avg_repeated = np.mean(episode_repeated_guesses[recent_slice])

        aggr_stats['episode'].append(episode + 1)
        aggr_stats['avg_reward'].append(avg_reward)
        aggr_stats['win_rate'].append(win_rate)
        aggr_stats['avg_wrong'].append(avg_wrong)
        aggr_stats['avg_repeated'].append(avg_repeated)
        aggr_stats['epsilon'].append(agent.eps)

    # Decay exploration
    if agent.eps > 0.05:
        agent.eps *= 0.9997

print("\n‚úì Training complete!\n")

TRAINING (15,000 episodes)



Training: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 15000/15000 [14:09<00:00, 17.66it/s]


‚úì Training complete!






## Training Metrics Visualization

In [11]:
print("="*70)
print("GENERATING TRAINING PLOTS")
print("="*70 + "\n")

def moving_average(data, window=50):
    """Calculate moving average for smoother plots"""
    if len(data) < window:
        return data
    return np.convolve(data, np.ones(window)/window, mode='valid')

# Create comprehensive training visualization
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle(f'Q-Learning Agent Training Metrics ({EPISODES:,} Episodes)', fontsize=16, fontweight='bold')

# Plot 1: Reward per Episode
ax1 = axes[0, 0]
ax1.plot(episode_rewards, alpha=0.3, color='blue', label='Raw Reward')
ax1.plot(moving_average(episode_rewards, 100), color='red', linewidth=2, label='Moving Avg (100)')
ax1.set_xlabel('Episode')
ax1.set_ylabel('Total Reward')
ax1.set_title('Reward per Episode')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Win Rate
ax2 = axes[0, 1]
ax2.plot(aggr_stats['episode'], aggr_stats['win_rate'], color='green', linewidth=2, marker='o', markersize=3)
ax2.set_xlabel('Episode')
ax2.set_ylabel('Win Rate')
ax2.set_title(f'Win Rate (computed every {STATS_EVERY} episodes)')
ax2.set_ylim([-0.05, 1.05])
ax2.grid(True, alpha=0.3)

# Plot 3: Average Wrong Guesses
ax3 = axes[0, 2]
ax3.plot(episode_wrong_guesses, alpha=0.3, color='orange', label='Raw')
ax3.plot(moving_average(episode_wrong_guesses, 100), color='darkred', linewidth=2, label='Moving Avg (100)')
ax3.set_xlabel('Episode')
ax3.set_ylabel('Wrong Guesses')
ax3.set_title('Wrong Guesses per Episode')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: Average Repeated Guesses
ax4 = axes[1, 0]
ax4.plot(episode_repeated_guesses, alpha=0.3, color='purple', label='Raw')
ax4.plot(moving_average(episode_repeated_guesses, 100), color='darkviolet', linewidth=2, label='Moving Avg (100)')
ax4.set_xlabel('Episode')
ax4.set_ylabel('Repeated Guesses')
ax4.set_title('Repeated Guesses per Episode')
ax4.legend()
ax4.grid(True, alpha=0.3)

# Plot 5: Epsilon Decay
ax5 = axes[1, 1]
ax5.plot(epsilon_history, color='teal', linewidth=1.5)
ax5.set_xlabel('Episode')
ax5.set_ylabel('Epsilon')
ax5.set_title('Exploration Rate (Œµ) Decay')
ax5.grid(True, alpha=0.3)

# Plot 6: Aggregated Performance Summary
ax6 = axes[1, 2]
ax6_twin = ax6.twinx()
ax6.plot(aggr_stats['episode'], aggr_stats['avg_reward'], color='blue', linewidth=2, marker='s', markersize=4, label='Avg Reward')
ax6_twin.plot(aggr_stats['episode'], aggr_stats['win_rate'], color='green', linewidth=2, marker='o', markersize=4, label='Win Rate')
ax6.set_xlabel('Episode')
ax6.set_ylabel('Average Reward', color='blue')
ax6_twin.set_ylabel('Win Rate', color='green')
ax6.set_title('Reward and Win Rate Progress')
ax6.tick_params(axis='y', labelcolor='blue')
ax6_twin.tick_params(axis='y', labelcolor='green')
ax6.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('training_metrics.png', dpi=300, bbox_inches='tight')
plt.show()

print("‚úì Training plots saved to 'training_metrics.png'\n")

GENERATING TRAINING PLOTS

‚úì Training plots saved to 'training_metrics.png'



## Save Training Statistics

In [12]:
# Save training statistics to CSV
training_stats_df = pd.DataFrame({
    'Episode': range(1, EPISODES + 1),
    'Reward': episode_rewards,
    'Win': episode_wins,
    'Wrong_Guesses': episode_wrong_guesses,
    'Repeated_Guesses': episode_repeated_guesses,
    'Epsilon': epsilon_history
})
training_stats_df.to_csv('training_statistics.csv', index=False)
print("‚úì Training statistics saved to 'training_statistics.csv'\n")

# Display summary statistics
print("Training Summary:")
print(f"  Final Win Rate: {aggr_stats['win_rate'][-1]*100:.2f}%")
print(f"  Final Avg Reward: {aggr_stats['avg_reward'][-1]:.2f}")
print(f"  Final Avg Wrong Guesses: {aggr_stats['avg_wrong'][-1]:.2f}")
print(f"  Final Epsilon: {agent.eps:.4f}")
print()

‚úì Training statistics saved to 'training_statistics.csv'

Training Summary:
  Final Win Rate: 92.00%
  Final Avg Reward: 195.19
  Final Avg Wrong Guesses: 2.15
  Final Epsilon: 0.0500



## Evaluation on Test Set

Test the trained agent on the test set to measure final performance.

In [13]:
print("="*70)
print(f"FINAL EVALUATION ({len(test_words)} test words)")
print("="*70 + "\n")

wins = wrong = repeated = 0

for word in tqdm(test_words, desc="Evaluating"):
    game = Game(word)

    while not game.done():
        letter = agent.choose_letter(game, training=False)
        if not letter:
            break
        game.guess(letter)

    if game.won():
        wins += 1
    wrong += game.wrong
    repeated += game.repeated

rate = wins / len(test_words)
score = (rate * 2000) - (wrong * 5) - (repeated * 2)

print(f"\n{'='*70}")
print(f"FINAL RESULTS")
print(f"{'='*70}")
print(f"Wins: {wins}/{len(test_words)} ({rate*100:.2f}%)")
print(f"Total Wrong: {wrong} (Avg: {wrong/len(test_words):.2f})")
print(f"Total Repeated: {repeated} (Avg: {repeated/len(test_words):.4f})")
print(f"\n‚úì‚úì‚úì FINAL SCORE: {score:.0f} ‚úì‚úì‚úì")
print(f"{'='*70}\n")

FINAL EVALUATION (2000 test words)



Evaluating: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2000/2000 [02:57<00:00, 11.25it/s]


FINAL RESULTS
Wins: 704/2000 (35.20%)
Total Wrong: 10217 (Avg: 5.11)
Total Repeated: 0 (Avg: 0.0000)

‚úì‚úì‚úì FINAL SCORE: -50381 ‚úì‚úì‚úì






## Performance Breakdown

In [14]:
print("Performance Breakdown:")
print(f"  Success Rate: {rate*100:.2f}%")
print(f"  Penalty from wrong guesses: {wrong * 5}")
print(f"  Penalty from repeats: {repeated * 2}")
print(f"  Score formula: ({rate:.4f} √ó 2000) - ({wrong} √ó 5) - ({repeated} √ó 2) = {score:.0f}\n")

Performance Breakdown:
  Success Rate: 35.20%
  Penalty from wrong guesses: 51085
  Penalty from repeats: 0
  Score formula: (0.3520 √ó 2000) - (10217 √ó 5) - (0 √ó 2) = -50381



## Save Evaluation Results

In [15]:
# Save evaluation results
eval_results_df = pd.DataFrame({
    'Metric': ['Wins', 'Win Rate %', 'Total Wrong', 'Total Repeated', 'Avg Wrong/Game', 'Final Score'],
    'Value': [wins, rate*100, wrong, repeated, wrong/len(test_words), score]
})
eval_results_df.to_csv('evaluation_results.csv', index=False)
print("‚úì Evaluation results saved to 'evaluation_results.csv'\n")

# Display evaluation results
print("Evaluation Results:")
print(eval_results_df.to_string(index=False))

‚úì Evaluation results saved to 'evaluation_results.csv'

Evaluation Results:
        Metric       Value
          Wins    704.0000
    Win Rate %     35.2000
   Total Wrong  10217.0000
Total Repeated      0.0000
Avg Wrong/Game      5.1085
   Final Score -50381.0000


## Complete Summary Report

In [None]:
print("="*70)
print("COMPLETE PROJECT SUMMARY")
print("="*70 + "\n")

print("MODEL ARCHITECTURE:")
print(f"  HMM:")
print(f"    - States: Letters (a-z)")
print(f"    - Observations: Masked word patterns")
print(f"    - Emission: P(letter | position, length)")
print(f"    - Transition: P(letter_t | letter_t-1)")
print(f"    - Inference: Forward-Backward algorithm")
print(f"  ")
print(f"  RL Agent:")
print(f"    - Algorithm: Q-Learning")
print(f"    - State: (word_length, blanks, lives, num_guessed)")
print(f"    - Action: 26 letters")
print(f"    - Strategy: Hybrid (Candidates > HMM > Q-values)")
print(f"")

print("TRAINING SUMMARY:")
print(f"  Episodes: {EPISODES:,}")
print(f"  Final Win Rate (training): {aggr_stats['win_rate'][-1]*100:.2f}%")
print(f"  Final Avg Reward (training): {aggr_stats['avg_reward'][-1]:.2f}")
print(f"  Final Epsilon: {agent.eps:.4f}")
print(f"")

print("EVALUATION SUMMARY:")
print(f"  Test Set Size: {len(test_words)}")
print(f"  Success Rate: {rate*100:.2f}%")
print(f"  Avg Wrong Guesses: {wrong/len(test_words):.2f}")
print(f"  Avg Repeated Guesses: {repeated/len(test_words):.4f}")
print(f"  **FINAL SCORE: {score:.0f}**")
print(f"")

print(" OUTPUT FILES:")
print(f"  1. training_metrics.png - Comprehensive training visualization")
print(f"  2. training_statistics.csv - Episode-by-episode training data")
print(f"  3. evaluation_results.csv - Final test results")
print(f"  4. rl_agent.pkl - Trained Q-Learning agent")
print(f"")

print("="*70)
print("‚úì‚úì‚úì PROJECT COMPLETE ‚úì‚úì‚úì")
print("="*70)

COMPLETE PROJECT SUMMARY

MODEL ARCHITECTURE:
  HMM:
    - States: Letters (a-z)
    - Observations: Masked word patterns
    - Emission: P(letter | position, length)
    - Transition: P(letter_t | letter_t-1)
    - Inference: Forward-Backward algorithm
  
  RL Agent:
    - Algorithm: Q-Learning
    - State: (word_length, blanks, lives, num_guessed)
    - Action: 26 letters
    - Strategy: Hybrid (Candidates > HMM > Q-values)

TRAINING SUMMARY:
  Episodes: 15,000
  Final Win Rate (training): 92.00%
  Final Avg Reward (training): 195.19
  Final Epsilon: 0.0500

EVALUATION SUMMARY:
  Test Set Size: 2000
  Success Rate: 35.20%
  Avg Wrong Guesses: 5.11
  Avg Repeated Guesses: 0.0000
  **FINAL SCORE: -50381**

üíæ OUTPUT FILES:
  1. training_metrics.png - Comprehensive training visualization
  2. training_statistics.csv - Episode-by-episode training data
  3. evaluation_results.csv - Final test results
  4. rl_agent.pkl - Trained Q-Learning agent

‚úì‚úì‚úì PROJECT COMPLETE ‚úì‚úì‚úì


## Download Results

In [19]:
from google.colab import files
files.download('/content/training_metrics.png')
files.download('/content/training_statistics.csv')
files.download('/content/evaluation_results.csv')
files.download('/content/rl_agent.pkl')

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>