# Hangman AI Strategy: Deep Learning Approach

This notebook demonstrates an AI strategy for playing Hangman using deep learning techniques.
The approach combines statistical analysis with LSTM neural networks to achieve high success rates.

## 🎮 What is Hangman?

Hangman is a classic word-guessing game where players try to discover a hidden word by guessing individual letters. Here's how it works:

### Game Rules:
1. **Setup**: A secret word is chosen and displayed as blank spaces (underscores), one for each letter
2. **Guessing**: Players guess one letter at a time
3. **Correct Guess**: If the letter appears in the word, all instances are revealed in their correct positions
4. **Wrong Guess**: If the letter doesn't appear, it counts as a mistake
5. **Win Condition**: Player wins by revealing the entire word before making too many wrong guesses
6. **Lose Condition**: Player loses after 6 wrong guesses

### Example Game:
```
Secret word: "MACHINE"
Initial:     _ _ _ _ _ _ _

Guess 'E': Correct!    → _ _ _ _ _ _ E
Guess 'A': Correct!    → _ A _ _ _ _ E  
Guess 'T': Wrong! (1/6) → _ A _ _ _ _ E
Guess 'I': Correct!    → _ A _ _ I _ E
Guess 'N': Correct!    → _ A _ _ I N E
Guess 'M': Correct!    → M A _ _ I N E
Guess 'C': Correct!    → M A C _ I N E
Guess 'H': Correct!    → M A C H I N E  ✅ YOU WIN!
```
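The rules above fit in a few lines of code. The sketch below replays the example game. It assumes uppercase letters and a fixed list of guesses, and `play` and `mask` are hypothetical helpers for illustration only. It is not the simulator used later in this notebook.

```python
MAX_MISSES = 6  # lose condition from the rules above


def mask(secret: str, revealed: set) -> str:
    """Show each revealed letter and an underscore for each hidden one."""
    return " ".join(c if c in revealed else "_" for c in secret)


def play(secret: str, guesses: str):
    """Replay a fixed guess sequence; return (final_mask, misses, won)."""
    revealed, misses = set(), 0
    for g in guesses:
        if g in secret:
            revealed.add(g)  # a hit reveals every occurrence of the letter at once
            status = "Correct!"
        else:
            misses += 1
            status = f"Wrong! ({misses}/{MAX_MISSES})"
        print(f"Guess {g!r}: {status:<15} -> {mask(secret, revealed)}")
        if misses >= MAX_MISSES or set(secret) <= revealed:
            break  # game over: too many misses, or every letter revealed
    won = set(secret) <= revealed and misses < MAX_MISSES
    return mask(secret, revealed), misses, won


final, misses, won = play("MACHINE", "EATINMCH")
print(final, "WIN" if won else "LOSE")
```

The revealed state is stored as a set of letters rather than as positions. Rebuilding the mask from that set automatically uncovers every copy of a correctly guessed letter.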

### The AI Challenge:
The strategic challenge is **letter selection** - which letter should you guess next to maximize your chances of:
1. **Gaining information** about the word structure
2. **Avoiding wrong guesses** that bring you closer to losing
3. **Completing the word efficiently** with minimal guesses

This is where artificial intelligence can help: a model trained on many thousands of words can learn which letters tend to fill the blanks in a partially revealed pattern.
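One way to make goals 1 and 2 concrete is to score each letter against a list of candidate words. Goal 1 asks how evenly the answer would split the candidates, measured as entropy in bits. Goal 2 asks how likely the guess is to miss. This is a minimal sketch, assuming every candidate is equally likely. The names `reveal_pattern`, `letter_stats` and `best_letter` are hypothetical, and this is not the strategy this notebook implements.

```python
import math
import string
from collections import Counter


def reveal_pattern(word: str, letter: str) -> tuple:
    """Positions where `letter` would be revealed; () means a wrong guess."""
    return tuple(i for i, c in enumerate(word) if c == letter)


def letter_stats(candidates, letter):
    """Return (expected information in bits, probability of a miss).

    Assumes every candidate word is equally likely to be the secret.
    """
    buckets = Counter(reveal_pattern(w, letter) for w in candidates)
    n = len(candidates)
    entropy = -sum((k / n) * math.log2(k / n) for k in buckets.values())
    p_miss = buckets.get((), 0) / n
    return entropy, p_miss


def best_letter(candidates, guessed):
    """Pick the unguessed letter whose outcome splits the candidates most evenly."""
    options = [ch for ch in string.ascii_lowercase if ch not in guessed]
    return max(options, key=lambda ch: letter_stats(candidates, ch)[0])


candidates = ["cat", "bat", "hat", "cot"]
for ch in "act":
    print(ch, letter_stats(candidates, ch))
print("most informative:", best_letter(candidates, guessed=set()))
```

Here "c" splits the candidates 2/2 (1 bit) but misses half the time, while "a" is safer (25% miss) and less informative. That tension between gaining information and avoiding wrong guesses is what any letter-selection strategy has to balance.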

## 📊 Project Overview
- **Dataset**: 370K+ English words from public sources
- **Strategy**: Multi-stage approach (early/mid/late game)
- **Model**: Configurable LSTM architecture
- **Evaluation**: Local simulation of 1000+ games

In [None]:
import json
import random
import string
import time
import re
import collections
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from itertools import combinations
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Concatenate, Bidirectional, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

## 🧠 Hangman AI Strategies

### Traditional Approaches:
1. **Frequency Analysis**: Guess letters based on how common they are in English (E, T, A, O, I, N...)
2. **Pattern Matching**: Look for common word patterns and endings (-ING, -ED, -ION)
3. **Dictionary Filtering**: Maintain a list of possible words and eliminate those that don't match
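Dictionary filtering can also use the positions of revealed letters, not just which letters are present. The sketch below builds a regular expression from the masked state. A hidden slot may not hold a letter that is already revealed, because the game reveals every occurrence, and it may not hold a letter that was guessed wrong. `filter_candidates` is a hypothetical helper. For comparison, the notebook's own `prob_screening` filters only by length, wrong letters, and the presence of correct letters.

```python
import re


def filter_candidates(words, masked, wrong_letters):
    """Keep dictionary words consistent with a masked state such as "_able".

    A hidden slot cannot hold a revealed letter (all copies would already be
    shown) or a letter that was guessed wrong.
    """
    revealed = {c for c in masked if c != "_"}
    excluded = "".join(sorted(revealed | set(wrong_letters)))
    hidden = f"[^{excluded}]" if excluded else "."
    pattern = re.compile("".join(hidden if c == "_" else re.escape(c) for c in masked))
    return [w for w in words if len(w) == len(masked) and pattern.fullmatch(w)]


words = ["table", "cable", "fable", "sable", "maple", "cabin"]
print(filter_candidates(words, "_able", wrong_letters={"t"}))
```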

### Our Deep Learning Approach:
This project implements a **multi-stage strategy** that adapts based on game state:

- **Early Game**: When few letters are known, use statistical frequency
- **Mid Game**: Combine frequency analysis with neural network predictions  
- **Late Game**: Rely heavily on pattern recognition when word structure is clearer

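The LSTM neural network is trained on partially masked English words: a sample of 20,000 training words, each expanded into several masked variants. From these it learns which letters tend to fill the blanks, given the revealed pattern and the word length. This lets it make better-informed guesses than pure frequency analysis once a few letters are known.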
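The stage switch can be written as a small pure function. This sketch mirrors the thresholds used by `HangmanGame.guess` later in the notebook: fewer than 2 distinct revealed letters, more than 200 candidate words, more than 2 tries left. The hybrid score is the elementwise product of dictionary frequency and model probability. `choose_letter` and the toy probability vectors are hypothetical stand-ins for the real `prob_screening` output and LSTM prediction.

```python
import string

import numpy as np

ALPHABET = string.ascii_lowercase


def choose_letter(known_letters, dict_probs, model_probs, num_candidates,
                  tries_remaining, guessed):
    """Pick the next letter using the early / mid / late stage rules."""
    if len(known_letters) < 2:  # early: dictionary frequency only
        scores = np.asarray(dict_probs, dtype=float).copy()
    elif num_candidates > 200 and tries_remaining > 2:  # mid: hybrid product
        scores = np.asarray(dict_probs, dtype=float) * np.asarray(model_probs, dtype=float)
    else:  # late: trust the model's pattern completion
        scores = np.asarray(model_probs, dtype=float).copy()
    for ch in guessed:  # never repeat a guess
        scores[ALPHABET.index(ch)] = -1.0
    return ALPHABET[int(np.argmax(scores))]


# Toy vectors: the dictionary favours "e", the model favours "s"
dict_probs = np.full(26, 0.1)
dict_probs[ALPHABET.index("e")] = 0.9
model_probs = np.full(26, 0.1)
model_probs[ALPHABET.index("s")] = 0.8

print(choose_letter(set(), dict_probs, model_probs, 5000, 6, guessed=[]))             # early
print(choose_letter({"a", "t"}, dict_probs, model_probs, 50, 6, guessed=["a", "t"]))  # late
```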

## 1. Data Loading and Preparation

In [None]:
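# Load public word dataset (lowercase ASCII words of 3+ letters, deduplicated).
# str.isalpha() also accepts accented letters, which are not in the model's a-z vocabulary.
with open("words_public.txt", "r", encoding="utf-8") as f:
    word_list = sorted({
        w for w in (line.strip().lower() for line in f)
        if len(w) >= 3 and all(c in string.ascii_lowercase for c in w)
    })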

print(f"Loaded {len(word_list)} words")
print(f"Sample words: {word_list[:10]}")
print(f"Word length range: {min(len(w) for w in word_list)} - {max(len(w) for w in word_list)}")

In [None]:
# Split words into training and testing sets
train_words, test_words = train_test_split(word_list, test_size=0.2, random_state=42)
print(f"Training words: {len(train_words)}")
print(f"Testing words: {len(test_words)}")

## 2. Data Analysis and Visualization

In [None]:
# Analyze word length distribution
max_word_length = max(len(word) for word in train_words)
word_lengths = [len(word) for word in train_words]
word_length_counts = collections.Counter(word_lengths)

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
lengths = list(word_length_counts.keys())
counts = list(word_length_counts.values())
plt.bar(lengths, counts, color='skyblue', alpha=0.7)
plt.xlabel('Word Length')
plt.ylabel('Number of Words')
plt.title('Distribution of Word Lengths')
plt.grid(axis='y', alpha=0.3)

# Letter frequency analysis
plt.subplot(1, 2, 2)
all_letters = ''.join(train_words)
letter_freq = collections.Counter(all_letters)
letters = string.ascii_lowercase
frequencies = [letter_freq.get(letter, 0) for letter in letters]
plt.bar(letters, frequencies, color='lightcoral', alpha=0.7)
plt.xlabel('Letters')
plt.ylabel('Frequency')
plt.title('Letter Frequency in Training Data')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print(f"Most common letters: {letter_freq.most_common(10)}")
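The chart above counts every occurrence of a letter. In Hangman, one guess reveals all copies of a letter, so the more relevant number is often how many words contain the letter at least once. That is what `prob_screening` accumulates, via `set(word)`. A small sketch of the difference, using hypothetical helper names and toy words:

```python
from collections import Counter


def occurrence_freq(words):
    """Count every letter occurrence (what the bar chart above plots)."""
    return Counter("".join(words))


def presence_freq(words):
    """Fraction of words that contain each letter at least once."""
    counts = Counter(ch for w in words for ch in set(w))
    return {ch: n / len(words) for ch, n in counts.items()}


words = ["letter", "better", "setter", "random"]
print(occurrence_freq(words).most_common(3))
presence = presence_freq(words)
print(max(presence, key=presence.get), presence["r"])
```

Here "e" and "t" win on raw counts because they repeat inside words, but "r" is the only letter present in every word, so it is the safest first guess for this toy list.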

## 3. Configurable Model Architecture

In [None]:
def create_hangman_model(embedding_dim=128, lstm_layers=(512, 256, 128),
                        dense_dim=64, dropout_rate=0.0, max_len=20):
    """
    Create a configurable LSTM model for Hangman letter prediction.
    
    Args:
        embedding_dim: Dimension of character embeddings
        lstm_layers: List of LSTM layer sizes
        dense_dim: Dense layer dimension
        dropout_rate: Dropout rate (0.0 means no dropout)
        max_len: Maximum sequence length
    
    Returns:
        Compiled Keras model
    """
    vocab_size = 28  # 0=Pad, 1-26=a-z, 27=_
    
    # Input layers
    seq_input = Input(shape=(max_len,), name="seq_input")
    length_input = Input(shape=(1,), name="length_input")
    
    # Embedding layer
    embedding = Embedding(input_dim=vocab_size, output_dim=embedding_dim, 
                         mask_zero=True)(seq_input)
    
    # LSTM layers
    x = embedding
    for i, units in enumerate(lstm_layers):
        return_sequences = (i < len(lstm_layers) - 1)  # All but last layer return sequences
        x = Bidirectional(LSTM(units, return_sequences=return_sequences))(x)
        if dropout_rate > 0:
            x = Dropout(dropout_rate)(x)
    
    # Combine with length information
    combined = Concatenate()([x, length_input])
    
    # Dense layers
    if dense_dim > 0:
        combined = Dense(dense_dim, activation='relu')(combined)
        if dropout_rate > 0:
            combined = Dropout(dropout_rate)(combined)
    
    # Output layer
    output = Dense(26, activation="sigmoid", name="output", kernel_regularizer=tf.keras.regularizers.l2(0.001))(combined)
    
    # Build and compile model
    model = Model(inputs=[seq_input, length_input], outputs=output)
    optimizer = Adam(learning_rate=1e-3)
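    # The output is 26 independent sigmoids (multi-label), so the bare 'accuracy'
    # string is ambiguous here; use an explicit per-letter metric. Most targets are 0,
    # so read it alongside the loss rather than on its own.
    model.compile(optimizer=optimizer, loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.BinaryAccuracy(name="binary_accuracy")])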
    
    return model

# Test different model configurations
model_configs = {
    'small': {'embedding_dim': 64, 'lstm_layers': [128, 64], 'dense_dim': 32},
    'medium': {'embedding_dim': 128, 'lstm_layers': [256, 128], 'dense_dim': 64},
    'large': {'embedding_dim': 128, 'lstm_layers': [512, 256, 128], 'dense_dim': 64},
    'xlarge': {'embedding_dim': 256, 'lstm_layers': [512, 256, 128, 64], 'dense_dim': 128}
}

# Create a sample model to show architecture
sample_model = create_hangman_model(**model_configs['medium'])
sample_model.summary()
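The output layer treats each of the 26 letters as an independent yes/no question ("is this letter still hidden in the word?"). The loss is therefore binary cross-entropy averaged over the letters. The worked example below computes that per-example quantity with NumPy for the masked word `_a____e` ("machine"). `multi_label_bce` is a hypothetical helper. During training, Keras also adds the L2 penalty from the output layer's `kernel_regularizer` on top of this term.

```python
import numpy as np


def multi_label_bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the 26 letter outputs of one example."""
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


# Target for "_a____e" (word "machine"): the still-hidden letters c, h, i, m, n
target = np.zeros(26)
for ch in "chimn":
    target[ord(ch) - ord("a")] = 1.0

confident = np.where(target == 1, 0.9, 0.1)  # high on hidden letters, low elsewhere
uniform = np.full(26, 0.5)                   # a model that knows nothing
print(multi_label_bce(target, confident), multi_label_bce(target, uniform))
```

A prediction of 0.5 everywhere costs ln 2 per letter, so validation losses well below about 0.69 indicate that the network has learned something beyond chance.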

## 4. Training Data Generation

In [None]:
# Vocabulary setup
alphabet = string.ascii_lowercase
vocab = ['Pad'] + list(alphabet) + ['_']
char2idx = {ch: i for i, ch in enumerate(vocab)}
letter_to_idx = {letter: idx for idx, letter in enumerate(alphabet)}

def make_masked_word(word, guessed):
    """Create masked word with underscores for unguessed letters"""
    return ''.join([c if c in guessed else '_' for c in word])

def naive_method(word, max_word_length=20):
    """Simulate opening guesses (common letters until two hit); return (hidden letters, hits, misses)"""
    wrong_guess = []
    correct_guess = []
    word_set = set(word)
    
    if len(word_set) <= 2:
        return [], list(word_set), []
    
    # Use simple frequency for first guesses
    common_letters = ['e', 'a', 'i', 'o', 'u', 'r', 's', 't', 'l', 'n']
    
    for letter in common_letters:
        if len(correct_guess) >= 2:
            break
        if letter in word_set:
            correct_guess.append(letter)
            word_set.remove(letter)
        else:
            wrong_guess.append(letter)
    
    return sorted(word_set), correct_guess, wrong_guess  # sorted: set order varies between runs

def create_training_examples(word, max_len=20):
    """Generate training samples for a given word"""
    if len(set(word)) <= 2:
        return []
        
    examples = []
    unique_letters, correct_letters, _ = naive_method(word)
    n = len(unique_letters)
    
    if n == 0:
        return []
    
    # Generate combinations of revealed letters
    for k in range(n):
        combinations_list = list(combinations(unique_letters, k))
        # Sample combinations to avoid too many examples
        sample_size = min(len(combinations_list), max(1, k))
        selected_combinations = random.sample(combinations_list, sample_size)
        
        for comb in selected_combinations:
            masked = make_masked_word(word, list(comb) + correct_letters)
            remaining_letters = [c for c in unique_letters if c not in comb]
            guessed = correct_letters + list(comb)
            
            examples.append({
                "masked_word": masked,
                "guessed_letters": ''.join(guessed),
                "target": ''.join(remaining_letters),
                "word_length": len(word)
            })
    
    return examples
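To see what one training example looks like, take "machine". `naive_method` tries e, a, i, ... and stops after two hits, so "e" and "a" are revealed and five letters stay hidden. The sketch below rebuilds that masked input and target. It also counts how many examples `create_training_examples` draws for n hidden letters: for each k in 0..n-1 it samples min(C(n, k), max(1, k)) combinations. `examples_per_word` is a hypothetical helper, and this `make_masked_word` repeats the notebook's rule so the block runs on its own.

```python
from math import comb


def make_masked_word(word, guessed):
    """Same rule as the notebook helper: hide every letter not yet guessed."""
    return "".join(c if c in guessed else "_" for c in word)


def examples_per_word(n):
    """Examples drawn by create_training_examples when n letters remain hidden."""
    return sum(min(comb(n, k), max(1, k)) for k in range(n))


word, revealed = "machine", {"a", "e"}
masked = make_masked_word(word, revealed)
target = "".join(sorted(set(word) - revealed))
print(masked, target, examples_per_word(len(target)))
```

Because k stops at n - 1, every example keeps at least one letter hidden, so the target vector is never all zeros.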

In [None]:
# Generate training data
def generate_training_data(word_list, num_samples=50000, max_len=20):
    """Generate training examples from word list"""
    all_examples = []
    
    # Sample words to control dataset size
    if len(word_list) > num_samples:
        selected_words = random.sample(word_list, num_samples)
    else:
        selected_words = word_list
    
    for word in selected_words:
        if len(word) <= max_len and word.isalpha():
            examples = create_training_examples(word.lower().strip(), max_len)
            all_examples.extend(examples)
    
    print(f"Generated {len(all_examples)} training examples from {len(selected_words)} words")
    return all_examples

# Generate training data
training_examples = generate_training_data(train_words, num_samples=20000)
random.shuffle(training_examples)

In [None]:
# Encoding functions
def encode_input(masked_word, max_len=20):
    """Encode masked word to indices"""
    input_ids = [char2idx[c] for c in masked_word]
    if len(input_ids) < max_len:
        input_ids += [char2idx['Pad']] * (max_len - len(input_ids))
    else:
        input_ids = input_ids[:max_len]
    return input_ids

def encode_target(target):
    """Encode target letters to binary vector"""
    target_vec = [0] * 26
    for c in target:
        if c in string.ascii_lowercase:
            target_vec[ord(c) - ord('a')] = 1
    return target_vec

def encode_guessed(guessed_letters):
    """Encode guessed letters to binary vector"""
    guessed_vec = [0] * 26
    for c in guessed_letters:
        if c in string.ascii_lowercase:
            guessed_vec[ord(c) - ord('a')] = 1
    return guessed_vec

# Prepare training data
X_words = []
X_lengths = []
X_guessed = []
y = []

for example in training_examples:
    word_encoded = encode_input(example['masked_word'])
    length = example['word_length']
    guessed_encoded = encode_guessed(example['guessed_letters'])
    target_encoded = encode_target(example['target'])
    
    X_words.append(word_encoded)
    X_lengths.append(length)
    X_guessed.append(guessed_encoded)
    y.append(target_encoded)

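X_words = np.array(X_words)
X_lengths = np.array(X_lengths).reshape(-1, 1)
# X_guessed is kept for reference only: create_hangman_model takes just the masked word
# and its length, and HangmanGame masks already-guessed letters at inference time.
X_guessed = np.array(X_guessed)
y = np.array(y)

print("Training data shapes:")
print(f"Words: {X_words.shape}")
print(f"Lengths: {X_lengths.shape}")
print(f"Guessed: {X_guessed.shape}")
print(f"Targets: {y.shape}")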
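The encoding reserves index 0 for padding because the `Embedding` layer is built with `mask_zero=True`: positions with id 0 are masked out of the LSTM. The hidden-letter marker `_` must therefore use a non-zero id (27), or the model would ignore the blanks it is supposed to fill. A compact sketch of the same encoding, with the mask Keras derives from it:

```python
import string

vocab = ["Pad"] + list(string.ascii_lowercase) + ["_"]
char2idx = {ch: i for i, ch in enumerate(vocab)}  # Pad=0, a..z=1..26, _=27


def encode_input(masked_word, max_len=20):
    """Map characters to ids, truncate to max_len, then right-pad with Pad (0)."""
    ids = [char2idx[c] for c in masked_word][:max_len]
    return ids + [char2idx["Pad"]] * (max_len - len(ids))


ids = encode_input("c_t", max_len=6)
keras_style_mask = [i != char2idx["Pad"] for i in ids]  # what mask_zero=True derives
print(ids)
print(keras_style_mask)
```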

## 5. Model Training and Comparison

In [None]:
# Split training data
X_train_words, X_val_words, X_train_lengths, X_val_lengths, y_train, y_val = train_test_split(
    X_words, X_lengths, y, test_size=0.2, random_state=42
)

def train_model(config_name, config, epochs=20, batch_size=512):
    """Train a model with given configuration"""
    print(f"\nTraining {config_name} model...")
    
    model = create_hangman_model(**config)
    
    early_stop = EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True,
        verbose=1
    )
    
    history = model.fit(
        [X_train_words, X_train_lengths], y_train,
        validation_data=([X_val_words, X_val_lengths], y_val),
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[early_stop],
        verbose=1
    )
    
    # Save model
    model.save(f"hangman_model_{config_name}.keras")
    
    return model, history

# Train different model sizes
trained_models = {}
training_histories = {}

for config_name, config in model_configs.items():
    model, history = train_model(config_name, config)
    trained_models[config_name] = model
    training_histories[config_name] = history
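The validation split above is made over individual examples. Because each word contributes several masked variants, variants of the same word can land on both sides, which may make `val_loss` look optimistic. A minimal sketch of a word-level split follows. It assumes each example also records its source word; the notebook's example dicts do not store this yet, so `source_words` and `split_by_word` are hypothetical.

```python
import numpy as np


def split_by_word(source_words, val_fraction=0.2, seed=42):
    """Return boolean masks (train, val) so all examples of a word share one side."""
    rng = np.random.default_rng(seed)
    unique = np.array(sorted(set(source_words)))
    n_val = int(round(len(unique) * val_fraction))
    val_words = set(rng.choice(unique, size=n_val, replace=False).tolist())
    val_mask = np.array([w in val_words for w in source_words])
    return ~val_mask, val_mask


source = ["cat"] * 3 + ["dog"] * 2 + ["emu"] * 4 + ["fox"] * 1 + ["gnu"] * 2
train_mask, val_mask = split_by_word(source, val_fraction=0.4)
print(sorted(set(np.array(source)[val_mask])))
```

The masks would then index the arrays directly, e.g. `X_words[train_mask]` and `X_words[val_mask]`.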

In [None]:
# Visualize training results
plt.figure(figsize=(15, 10))

for i, (config_name, history) in enumerate(training_histories.items()):
    plt.subplot(2, 2, i+1)
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title(f'{config_name.capitalize()} Model Training')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. Local Hangman Game Simulator

In [None]:
class HangmanGame:
    """Local Hangman game simulator"""
    
    def __init__(self, model, word_list, max_len=20):
        self.model = model
        self.word_list = word_list
        self.max_len = max_len
        self.vocab = ['Pad'] + list(string.ascii_lowercase) + ['_']
        self.char2idx = {ch: i for i, ch in enumerate(self.vocab)}
        self.letter_to_idx = {letter: idx for idx, letter in enumerate(string.ascii_lowercase)}
        
    def encode_word(self, masked_word):
        """Encode masked word for model input"""
        input_ids = [self.char2idx[c] for c in masked_word]
        if len(input_ids) < self.max_len:
            input_ids += [self.char2idx['Pad']] * (self.max_len - len(input_ids))
        else:
            input_ids = input_ids[:self.max_len]
        return input_ids
    
    def encode_guessed_letters(self, guessed_letters):
        """Encode guessed letters as binary vector"""
        target_vec = [0] * 26
        for c in guessed_letters:
            if c in string.ascii_lowercase:
                target_vec[ord(c) - ord('a')] = 1
        return target_vec
    
    def prob_screening(self, guessed_wrong_letters, guessed_correct_letters, word_length):
        """Statistical frequency analysis for early game"""
        guessed_wrong_set = set(guessed_wrong_letters)
        guessed_correct_letters = set(guessed_correct_letters)
        
        # Filter words
        possible_words = [
            word for word in self.word_list
            if len(word) == word_length and 
            guessed_wrong_set.isdisjoint(word) and 
            guessed_correct_letters.issubset(word)
        ]
        
        # Calculate letter probabilities
        prob = [0] * 26
        for word in possible_words:
            for letter in set(word):
                prob[self.letter_to_idx[letter]] += 1
        
        n = max(len(possible_words), 1)
        epsilon = 0.0001
        prob = [p/n + epsilon for p in prob]
        
        return len(possible_words), np.array(prob)
    
    def guess(self, word, tries_remaining, guessed_letters):
        """Make a guess using the three-stage strategy"""
        word = word.replace(" ", "").lower()  # accept "_ a _" as well as "_a_"
        correct_letters = [c for c in set(word) if c != '_']
        wrong_letters = [c for c in guessed_letters if c not in correct_letters]
        full_len = len(word)                     # dictionary filtering needs the true length
        word_len = min(full_len, self.max_len)   # the model only sees up to max_len characters
        
        known_number = len(correct_letters)      # distinct letters revealed so far
        
        if known_number < 2:
            # Early stage: use frequency analysis
            _, prob_vector = self.prob_screening(wrong_letters, correct_letters, full_len)
            sorted_indices = np.argsort(prob_vector)[::-1]
            for idx in sorted_indices:
                letter = chr(ord('a') + idx)
                if letter not in guessed_letters:
                    return letter
        else:
            # Mid/Late stage: use model prediction
            len_dict, uncon_prob = self.prob_screening(wrong_letters, correct_letters, full_len)
            
            # Encode for model
            word_encoded = np.array(self.encode_word(word)).reshape(1, -1)
            word_len_encoded = np.array(word_len).reshape(1, -1)
            
            # Get model prediction
            probs = self.model.predict([word_encoded, word_len_encoded], verbose=0)[0]
            
            # Combine with frequency if many candidate words remain
            if len_dict > 200 and tries_remaining > 2:
                probs = uncon_prob * probs
            
            # Mask already guessed letters
            guessed_mask = self.encode_guessed_letters(guessed_letters)
            probs[np.array(guessed_mask, dtype=bool)] = -1
            
            # Select best letter
            next_letter_index = np.argmax(probs)
            return string.ascii_lowercase[next_letter_index]
        
        return 'e'  # fallback
    
    def play_game(self, word, verbose=False):
        """Play a complete game of Hangman"""
        word = word.lower()
        tries_remaining = 6
        guessed_letters = []
        current_state = '_' * len(word)
        
        if verbose:
            print(f"Playing word: {word} (length: {len(word)})")
        
        while tries_remaining > 0 and '_' in current_state:
            guess = self.guess(current_state, tries_remaining, guessed_letters)
            guessed_letters.append(guess)
            
            if guess in word:
                # Update current state
                new_state = ''.join([c if c in guessed_letters else '_' for c in word])
                current_state = new_state
                if verbose:
                    print(f"Guess '{guess}': HIT! Current: {current_state}")
            else:
                tries_remaining -= 1
                if verbose:
                    print(f"Guess '{guess}': MISS! Tries left: {tries_remaining}")
        
        success = '_' not in current_state
        if verbose:
            print(f"Game {'WON' if success else 'LOST'}! Final word: {word}")
            print()
        
        # Third value is the number of wrong guesses (misses), not total guesses
        return success, len(word), 6 - tries_remaining
    
    def simulate_games(self, num_games=1000, verbose_interval=100, eval_words=None):
        """Simulate multiple games and return statistics.

        eval_words: pool of secret words to play (defaults to self.word_list).
        Pass held-out words here so that the secret word is not part of the
        dictionary that prob_screening filters.
        """
        results = []
        pool = self.word_list if eval_words is None else eval_words
        test_words_sample = random.sample(pool, min(num_games, len(pool)))
        
        for i, word in enumerate(test_words_sample):
            success, word_length, guesses_used = self.play_game(word, verbose=(i % verbose_interval == 0))
            results.append({
                'word': word,
                'success': success,
                'length': word_length,
                'guesses_used': guesses_used  # wrong guesses only
            })
            
            if (i + 1) % verbose_interval == 0:
                current_success_rate = sum(r['success'] for r in results) / len(results)
                print(f"Completed {i+1}/{num_games} games. Success rate: {current_success_rate:.3f}")
        
        return results

## 7. Model Performance Comparison

In [None]:
# Test all models
model_results = {}
num_test_games = 500  # Adjust based on computational resources

for config_name, model in trained_models.items():
    print(f"\nTesting {config_name} model...")
    # Dictionary = training words only; secret words come from the held-out test set,
    # so the frequency screening never sees the answer.
    game = HangmanGame(model, train_words)
    results = game.simulate_games(num_test_games, verbose_interval=100, eval_words=test_words)
    
    success_rate = sum(r['success'] for r in results) / len(results)
    avg_guesses = np.mean([r['guesses_used'] for r in results])
    
    model_results[config_name] = {
        'success_rate': success_rate,
        'avg_guesses': avg_guesses,
        'results': results
    }
    
    print(f"{config_name} model: {success_rate:.3f} success rate, {avg_guesses:.1f} avg wrong guesses")
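With `num_test_games = 500`, each success rate is itself an estimate, so small gaps between model sizes may be noise. A sketch of a normal-approximation 95% interval for a win rate is below. `success_rate_ci` is a hypothetical helper, and the 300/500 figures are illustrative numbers, not results from this notebook. For rates very close to 0 or 1, a Wilson interval is the usual safer choice.

```python
import math


def success_rate_ci(wins, games, z=1.96):
    """Normal-approximation interval (z=1.96 for ~95%) for a win rate."""
    p = wins / games
    half = z * math.sqrt(p * (1 - p) / games)
    return p, max(0.0, p - half), min(1.0, p + half)


# Illustrative only: 300 wins out of 500 games
rate, low, high = success_rate_ci(300, 500)
print(f"{rate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

At around 500 games, the half-width is roughly 4 percentage points. Differences between configurations smaller than that are hard to distinguish from chance. In the loop above, it could be applied as `success_rate_ci(sum(r['success'] for r in results), len(results))`.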

In [None]:
# Visualize model comparison
plt.figure(figsize=(15, 10))

# Overall performance comparison
plt.subplot(2, 3, 1)
models = list(model_results.keys())
success_rates = [model_results[m]['success_rate'] for m in models]
plt.bar(models, success_rates, color='skyblue', alpha=0.7)
plt.title('Success Rate by Model Size')
plt.ylabel('Success Rate')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

# Average wrong guesses comparison (guesses_used counts misses only)
plt.subplot(2, 3, 2)
avg_guesses = [model_results[m]['avg_guesses'] for m in models]
plt.bar(models, avg_guesses, color='lightcoral', alpha=0.7)
plt.title('Average Wrong Guesses by Model Size')
plt.ylabel('Average Wrong Guesses')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)

# Performance by word length (using best model)
best_model = max(models, key=lambda m: model_results[m]['success_rate'])
best_results = model_results[best_model]['results']

plt.subplot(2, 3, 3)
length_success = {}
for result in best_results:
    length = result['length']
    if length not in length_success:
        length_success[length] = []
    length_success[length].append(result['success'])

lengths = sorted(length_success.keys())
success_by_length = [np.mean(length_success[l]) for l in lengths]
plt.plot(lengths, success_by_length, 'o-', color='green', linewidth=2, markersize=6)
plt.title(f'Success Rate by Word Length\n({best_model} model)')
plt.xlabel('Word Length')
plt.ylabel('Success Rate')
plt.grid(True, alpha=0.3)

# Model parameters vs performance
plt.subplot(2, 3, 4)
model_params = []
for config_name in models:
    model = trained_models[config_name]
    params = model.count_params()
    model_params.append(params)

plt.scatter(model_params, success_rates, s=100, alpha=0.7, color='purple')
for i, config_name in enumerate(models):
    plt.annotate(config_name, (model_params[i], success_rates[i]), 
                xytext=(5, 5), textcoords='offset points')
plt.title('Model Size vs Performance')
plt.xlabel('Number of Parameters')
plt.ylabel('Success Rate')
plt.grid(True, alpha=0.3)

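# Success rate distribution (0/1 outcomes; the plotted mean equals the success rate)
plt.subplot(2, 3, 5)
all_success_rates = []
labels = []
for config_name in models:
    # Cast bools to 0/1 so the boxplot's percentile computation works on numeric data
    successes = [int(r['success']) for r in model_results[config_name]['results']]
    all_success_rates.append(successes)
    labels.append(config_name)

# Set tick labels via xticks: boxplot's `labels` keyword was renamed in recent Matplotlib
plt.boxplot(all_success_rates, showmeans=True)
plt.xticks(range(1, len(labels) + 1), labels, rotation=45)
plt.title('Success Rate Distribution')
plt.ylabel('Success (0=Loss, 1=Win)')
plt.grid(axis='y', alpha=0.3)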

# Summary statistics
plt.subplot(2, 3, 6)
plt.axis('off')
summary_text = "Model Performance Summary\n\n"
for config_name in models:
    sr = model_results[config_name]['success_rate']
    ag = model_results[config_name]['avg_guesses']
    params = trained_models[config_name].count_params()
    summary_text += f"{config_name}: {sr:.3f} ({ag:.1f} misses, {params:,} params)\n"

plt.text(0.1, 0.9, summary_text, transform=plt.gca().transAxes, 
         fontsize=10, verticalalignment='top', fontfamily='monospace')

plt.tight_layout()
plt.show()

## 8. Analysis and Insights

In [None]:
# Detailed analysis of best performing model
best_model_name = max(model_results.keys(), key=lambda m: model_results[m]['success_rate'])
best_results = model_results[best_model_name]['results']

print(f"\nDetailed Analysis of {best_model_name} Model:")
print(f"Overall Success Rate: {model_results[best_model_name]['success_rate']:.3f}")
print(f"Average Wrong Guesses: {model_results[best_model_name]['avg_guesses']:.1f}")

# Success rate by word length
length_analysis = {}
for result in best_results:
    length = result['length']
    if length not in length_analysis:
        length_analysis[length] = {'wins': 0, 'total': 0, 'guesses': []}
    
    length_analysis[length]['total'] += 1
    length_analysis[length]['guesses'].append(result['guesses_used'])
    if result['success']:
        length_analysis[length]['wins'] += 1

print("\nPerformance by Word Length:")
for length in sorted(length_analysis.keys()):
    data = length_analysis[length]
    if data['total'] >= 5:  # Only show lengths with sufficient data
        success_rate = data['wins'] / data['total']
        avg_guesses = np.mean(data['guesses'])
        print(f"Length {length:2d}: {success_rate:.3f} success rate ({data['wins']:2d}/{data['total']:2d} games), {avg_guesses:.1f} avg wrong guesses")

# Model comparison summary
print("\nModel Comparison Summary:")
for config_name in ['small', 'medium', 'large', 'xlarge']:
    if config_name in model_results:
        sr = model_results[config_name]['success_rate']
        params = trained_models[config_name].count_params()
        print(f"{config_name:6s}: {sr:.3f} success rate, {params:7,d} parameters")

## 9. Strategy Summary

This notebook demonstrates an AI approach to playing Hangman with the following key components:

### Multi-Stage Strategy
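1. **Early Stage** (0-1 distinct letters revealed): Statistical frequency analysis over dictionary words of the same length
2. **Mid Stage** (2+ letters revealed, more than 200 candidate words, more than 2 tries left): Hybrid score, the product of dictionary frequency and neural network probability
3. **Late Stage** (200 or fewer candidates, or 2 or fewer tries left): Pure neural network pattern recognition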

### Key Findings
- Larger models generally perform better, but with diminishing returns
- Performance strongly correlates with word length (better on longer words)
- The three-stage strategy effectively adapts to different game phases
- LSTM networks excel at pattern completion when sufficient context is available

### Future Improvements
- Specialized models for short words (≤7 letters)
- Adaptive stage transitions based on word length
- Integration of word frequency information
- Ensemble methods combining multiple model sizes