# Trexquant Interview Project (The Hangman Game)

* Copyright Trexquant Investment LP. All Rights Reserved. 
* Redistribution of this question without written consent from Trexquant is prohibited

## Instruction:
For this coding test, your mission is to write an algorithm that plays the game of Hangman through our API server. 

When a user plays Hangman, the server first selects a secret word at random from a list. The server then returns a row of space-separated underscores, one for each letter in the secret word, and asks the user to guess a letter. If the user guesses a letter that is in the word, the word is redisplayed with all instances of that letter shown in the correct positions, along with any letters correctly guessed on previous turns. If the letter does not appear in the word, the user is charged with an incorrect guess. The user keeps guessing letters until either (1) the user has correctly guessed all the letters in the word or (2) the user has made six incorrect guesses.

You are required to write a "guess" function that takes the current word (with underscores) as input and returns a letter to guess. You will use the API code below to play 1,000 Hangman games. You can play practice games before you start recording your results.
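
As a minimal sketch of the interface described above, a guess function can strip the separator spaces from the server's string and return the first candidate letter that has not been tried. The name `guess_from_ranking` and the `ranked_letters` argument are illustrative assumptions; they are not part of the provided API code.

```python
def guess_from_ranking(masked_word, ranked_letters, guessed_letters):
    """Return the first letter in ranked_letters that has not been tried yet.

    masked_word is the server string, e.g. "_ p p _ e " (space separated).
    """
    # Every second character is a separator space, so keep positions 0, 2, 4, ...
    pattern = masked_word[::2]
    for letter in ranked_letters:
        if letter not in guessed_letters and letter not in pattern:
            return letter
    return None  # nothing left to guess


# "e" and "p" are already known, so the next ranked letter is returned
print(guess_from_ranking("_ p p _ e ", "etaoinshrdlu", ["e", "p"]))  # t
```

Any ranking source (letter frequencies, a trained model) can be plugged in through `ranked_letters` without changing this wrapper.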

Your algorithm is permitted to use a training set of approximately 250,000 dictionary words. Your algorithm will be tested on an entirely disjoint set of 250,000 dictionary words. Please note that this means the words that you will ultimately be tested on do NOT appear in the dictionary that you are given. You are not permitted to use any dictionary other than the training dictionary we provided. This requirement will be strictly enforced by code review.

You are provided with a basic, working algorithm. This algorithm will match the provided masked string (e.g. a _ _ l e) to all possible words in the dictionary, tabulate the frequency of letters appearing in these possible words, and then guess the letter with the highest frequency of appearance that has not already been guessed. If there are no remaining words that match then it will default back to the character frequency distribution of the entire dictionary.
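
The benchmark described above can be sketched compactly as follows. This is an illustrative restatement, not the provided code: the function name `baseline_guess` is hypothetical, and it filters the full word list on every call instead of narrowing a stored candidate list between turns.

```python
import collections
import re


def baseline_guess(masked_word, dictionary, guessed_letters):
    # "a _ _ l e" -> "a..le": drop the spaces and let "." match any letter
    pattern = masked_word[::2].replace("_", ".")
    candidates = [w for w in dictionary if re.fullmatch(pattern, w)]

    # Count letters over the matching words first, then fall back to the
    # letter counts of the whole dictionary if nothing usable was found
    for pool in (candidates, dictionary):
        counts = collections.Counter("".join(pool))
        for letter, _ in counts.most_common():
            if letter not in guessed_letters:
                return letter
    return None


words = ["apple", "ample", "angle", "maple"]
print(baseline_guess("a _ _ l e", words, ["a", "l", "e"]))  # p
```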

This benchmark strategy is successful approximately 18% of the time. Your task is to design an algorithm that significantly outperforms this benchmark.

# README

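To tackle the Hangman word prediction task, a **Transformer encoder** model was used. It reads the masked word and outputs a letter distribution for every position (the model in `transformer_model.py` uses an encoder stack only, with no decoder).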


##  Dataset

Using a custom Python script, all possible hangman states for each word were generated. This yielded approximately **1 billion (xi, yi) pairs**, where:

* **xi**: Masked hangman state of the word
* **yi**: True word

**Final dataset size: \~3 GB**
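
To make the notion of "all possible hangman states" concrete, here is a small sketch that enumerates them for a single word. It is an equivalent formulation using `itertools` and is not the script used to build the dataset; the function name `hangman_states` is hypothetical. A word with k distinct letters produces 2^k - 1 states, because every non-empty subset of its letters can be hidden.

```python
from itertools import combinations


def hangman_states(word):
    """Return every masked version of word with at least one letter hidden."""
    letters = sorted(set(word))
    states = []
    for r in range(1, len(letters) + 1):
        for hidden in combinations(letters, r):
            # Hiding a letter hides all of its occurrences, as in the game
            states.append("".join("_" if ch in hidden else ch for ch in word))
    return states


states = hangman_states("moon")
print(len(states))  # 7, i.e. 2**3 - 1 for the distinct letters m, n, o
print(states)
```

Each state would then be paired with the full word, one `masked word` line per state.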

---

##  Data Engineering

* Each `(xi, yi)` pair is converted into a **PyTorch tensor** of shape `(seq_len, 27)`
* Characters are **one-hot encoded**:

  * `'a'` → index 0
  * `'_'` (masked character) → index 26
* Data is loaded using a **PyTorch DataLoader** with:

  * `batch_size = 512`
  * Padding per batch to match sequence length using zero vectors
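
The encoding and padding steps listed above can be illustrated without PyTorch. The sketch below uses plain Python lists in place of tensors; `one_hot` and `pad_batch` are hypothetical helper names, and the real pipeline uses `pad_sequence` inside a collate function.

```python
ALPHABET_SIZE = 27  # 26 letters plus the mask symbol "_"
MASK_INDEX = 26


def one_hot(word):
    rows = []
    for ch in word:
        row = [0.0] * ALPHABET_SIZE
        index = MASK_INDEX if ch == "_" else ord(ch) - ord("a")
        row[index] = 1.0
        rows.append(row)
    return rows  # shape (seq_len, 27)


def pad_batch(encoded_words):
    # Append all-zero rows so every word in the batch has the same length
    max_len = max(len(rows) for rows in encoded_words)
    return [
        rows + [[0.0] * ALPHABET_SIZE for _ in range(max_len - len(rows))]
        for rows in encoded_words
    ]


batch = pad_batch([one_hot("c_t"), one_hot("h_use")])
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 5 27
```

Because a real position always has exactly one non-zero entry, an all-zero row identifies padding unambiguously.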

---

##  Model Architecture

A standard **Transformer encoder** (PyTorch `nn.TransformerEncoder`; `transformer_model.py` does not use a decoder stack) was used with the following parameters:

* `input_size = 27`
* `d_model = 256`
* `nhead = 8`
* `num_layers = 4`
* `dim_feedforward = 512`
* `dropout = 0.1`
* **Sinusoidal Positional Encoding**
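
For reference, the sinusoidal positional encoding named above assigns `sin(pos / 10000^(i/d_model))` to even dimension `i` and the matching cosine to dimension `i + 1`. The sketch below computes the table with the standard library only; the function name is hypothetical and the values are meant to mirror what the `PositionalEncoding` module precomputes.

```python
import math


def sinusoidal_encoding(max_len, d_model):
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000.0 ** (i / d_model))
            table[pos][i] = math.sin(angle)          # even dimensions
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)  # odd dimensions
    return table


pe = sinusoidal_encoding(max_len=4, d_model=8)
print(pe[0])  # position 0: sin(0) = 0.0 and cos(0) = 1.0 alternate
```

The table is added to the embedded input, so each position receives a distinct, fixed offset that needs no training.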

###  Training Details

* **Loss**: soft cross-entropy, averaged over non-padding positions
* **Optimizer**: Adam (`lr = 1e-3`)
* **Total parameters**: \~2M (\~10 MB)
* **Epochs**: 1 (due to computational constraints)
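
The loss listed above can be sketched in plain Python to show what is being averaged. This is an illustration under simplifying assumptions (a single sequence, lists instead of tensors, hypothetical function names): each non-padding position contributes the negative sum of target probability times predicted log-probability, and padding positions are skipped.

```python
import math


def log_softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_soft_cross_entropy(log_probs, targets, is_padding):
    """log_probs, targets: one list of class values per position.
    is_padding: one bool per position, True where the position is padding."""
    total, count = 0.0, 0
    for lp, tgt, pad in zip(log_probs, targets, is_padding):
        if pad:
            continue
        total += -sum(t * l for t, l in zip(tgt, lp))
        count += 1
    return total / max(count, 1)


# One real position with uniform predictions over 3 classes, one padded position
log_probs = [log_softmax([0.0, 0.0, 0.0]), log_softmax([5.0, 1.0, 0.0])]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(round(masked_soft_cross_entropy(log_probs, targets, [False, True]), 4))  # 1.0986 (= ln 3)
```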

**Training for only 1 epoch limited model accuracy.**
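
The parameter count quoted above can be checked by hand. The sketch below assumes the layout of PyTorch's `nn.TransformerEncoderLayer` (four attention projections, two feed-forward linears, two LayerNorms), `dim_feedforward = 512` as in `transformer_model.py`, and 4 bytes per parameter (float32). The helper names are hypothetical.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix plus bias


def encoder_layer_params(d_model, dim_feedforward):
    attention = 4 * linear_params(d_model, d_model)  # Q, K, V and output projections
    feed_forward = (linear_params(d_model, dim_feedforward)
                    + linear_params(dim_feedforward, d_model))
    layer_norms = 2 * 2 * d_model  # two LayerNorms, weight and bias each
    return attention + feed_forward + layer_norms


total = (
    linear_params(27, 256)                # input projection
    + 4 * encoder_layer_params(256, 512)  # four encoder layers
    + linear_params(256, 27)              # output projection
)
print(total)                       # 2122523
print(round(total * 4 / 1e6, 1))   # 8.5 (MB of float32 weights)
```

Under these assumptions the raw weights come to about 8.5 MB, which is in line with the rounded figures quoted above.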

---

##  Results & Conclusion

* Final Accuracy on practice runs (100 runs): **0.62**
* Final Accuracy on recorded runs (338 runs): **0.553** [exhausted 662 on old models]
* Baseline model accuracy: **0.18**
* Challenge cutoff: **0.50** 
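
The recorded-run figure is a difference of cumulative counters, since the server reports totals across all models. A minimal sketch of that calculation follows; `session_accuracy` is a hypothetical helper. The 661 earlier runs and 345 earlier wins are the values printed by `my_status` in the notebook, and the 187 wins for the final model are inferred from the reported ratio over 338 runs rather than printed directly.

```python
def session_accuracy(runs_before, wins_before, runs_after, wins_after):
    played = runs_after - runs_before
    if played <= 0:
        raise ValueError("no games were played in this session")
    return (wins_after - wins_before) / played


# 661 runs / 345 wins before the final model; 338 runs / 187 wins with it
print(round(session_accuracy(661, 345, 661 + 338, 345 + 187), 3))  # 0.553
```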

The model surpassed the baseline and cutoff even after **just 1 epoch** of training. Since transformers typically benefit from longer training or pretrained weights, performance is expected to improve with extended training.

**Many successful transformer-based models are fine-tuned from large pretrained models.**

---

##  Python Files

The source files are attached at the end of the notebook.

| File                   | Description                                                                                  |
| ---------------------- | -------------------------------------------------------------------------------------------- |
| `create_data.py`       | Generates all possible hangman states from `words_250000_train.txt`                          |
| `load_data.py`         | Builds a PyTorch DataLoader from the generated text file                                     |
| `transformer_model.py` | Trains the transformer model and saves it to `.pth` format                                   |
| `predict.py`           | Takes a masked hangman word as input and returns character guesses in descending probability |

---

##  Declaration

All logic and code were written by me. I used **LLMs for debugging** and referencing library syntax when needed.






In [1]:
import json
import requests
import random
import string
import secrets
import time
import re
import collections

try:
    from urllib.parse import parse_qs, urlencode, urlparse
except ImportError:
    from urlparse import parse_qs, urlparse
    from urllib import urlencode

from requests.packages.urllib3.exceptions import InsecureRequestWarning
from predict import make_predictions
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

In [2]:
class HangmanAPI(object):
    def __init__(self, access_token=None, session=None, timeout=None):
        self.hangman_url = self.determine_hangman_url()
        self.access_token = access_token
        self.session = session or requests.Session()
        self.timeout = timeout
        self.guessed_letters = []
        
        full_dictionary_location = "words_250000_train.txt"
        self.full_dictionary = self.build_dictionary(full_dictionary_location)        
        self.full_dictionary_common_letter_sorted = collections.Counter("".join(self.full_dictionary)).most_common()
        
        self.current_dictionary = []
        
    @staticmethod
    def determine_hangman_url():
        links = ['https://trexsim.com']

        data = {link: 0 for link in links}

        for link in links:

            requests.get(link)

            for i in range(10):
                s = time.time()
                requests.get(link)
                data[link] = time.time() - s

        link = sorted(data.items(), key=lambda x: x[1])[0][0]
        link += '/trexsim/hangman'
        return link

    def guess(self, word): # word input example: "_ p p _ e "
        # Drop the separator spaces: "_ p p _ e " -> "_pp_e"
        word_fixed = word[::2]
        # make_predictions returns letters ranked from most to least likely
        predictions = make_predictions(word_fixed)
        for letter in predictions:
            if (letter not in self.guessed_letters) and (letter not in word_fixed):
                return letter
        # ###############################################
        # # Replace with your own "guess" function here #
        # ###############################################

        # # clean the word so that we strip away the space characters
        # # replace "_" with "." as "." indicates any character in regular expressions
        # clean_word = word[::2].replace("_",".")
        
        # # find length of passed word
        # len_word = len(clean_word)
        
        # # grab current dictionary of possible words from self object, initialize new possible words dictionary to empty
        # current_dictionary = self.current_dictionary
        # new_dictionary = []
        
        # # iterate through all of the words in the old plausible dictionary
        # for dict_word in current_dictionary:
        #     # continue if the word is not of the appropriate length
        #     if len(dict_word) != len_word:
        #         continue
                
        #     # if dictionary word is a possible match then add it to the current dictionary
        #     if re.match(clean_word,dict_word):
        #         new_dictionary.append(dict_word)
        
        # # overwrite old possible words dictionary with updated version
        # self.current_dictionary = new_dictionary
        
        
        # # count occurrence of all characters in possible word matches
        # full_dict_string = "".join(new_dictionary)
        
        # c = collections.Counter(full_dict_string)
        # sorted_letter_count = c.most_common()                   
        
        # guess_letter = '!'
        
        # # return most frequently occurring letter in all possible words that hasn't been guessed yet
        # for letter,instance_count in sorted_letter_count:
        #     if letter not in self.guessed_letters:
        #         guess_letter = letter
        #         break
            
        # # if no word matches in training dictionary, default back to ordering of full dictionary
        # if guess_letter == '!':
        #     sorted_letter_count = self.full_dictionary_common_letter_sorted
        #     for letter,instance_count in sorted_letter_count:
        #         if letter not in self.guessed_letters:
        #             guess_letter = letter
        #             break            
        
        # return guess_letter

    ##########################################################
    # You'll likely not need to modify any of the code below #
    ##########################################################
    
    def build_dictionary(self, dictionary_file_location):
        text_file = open(dictionary_file_location,"r")
        full_dictionary = text_file.read().splitlines()
        text_file.close()
        return full_dictionary
                
    def start_game(self, practice=True, verbose=True):
        # reset guessed letters to empty set and current plausible dictionary to the full dictionary
        self.guessed_letters = []
        self.current_dictionary = self.full_dictionary
                         
        response = self.request("/new_game", {"practice":practice})
        if response.get('status')=="approved":
            game_id = response.get('game_id')
            word = response.get('word')
            tries_remains = response.get('tries_remains')
            if verbose:
                print("Successfully start a new game! Game ID: {0}. # of tries remaining: {1}. Word: {2}.".format(game_id, tries_remains, word))
            while tries_remains>0:
                # get guessed letter from user code
                guess_letter = self.guess(word)
                    
                # append guessed letter to guessed letters field in hangman object
                self.guessed_letters.append(guess_letter)
                if verbose:
                    print("Guessing letter: {0}".format(guess_letter))
                    
                try:    
                    res = self.request("/guess_letter", {"request":"guess_letter", "game_id":game_id, "letter":guess_letter})
                except HangmanAPIError:
                    print('HangmanAPIError exception caught on request.')
                    continue
                except Exception as e:
                    print('Other exception caught on request.')
                    raise e
               
                if verbose:
                    print("Sever response: {0}".format(res))
                status = res.get('status')
                tries_remains = res.get('tries_remains')
                if status=="success":
                    if verbose:
                        print("Successfully finished game: {0}".format(game_id))
                    return True
                elif status=="failed":
                    reason = res.get('reason', '# of tries exceeded!')
                    if verbose:
                        print("Failed game: {0}. Because of: {1}".format(game_id, reason))
                    return False
                elif status=="ongoing":
                    word = res.get('word')
        else:
            if verbose:
                print("Failed to start a new game")
        return status=="success"
        
    def my_status(self):
        return self.request("/my_status", {})
    
    def request(
            self, path, args=None, post_args=None, method=None):
        if args is None:
            args = dict()
        if post_args is not None:
            method = "POST"

        # Add `access_token` to post_args or args if it has not already been
        # included.
        if self.access_token:
            # If post_args exists, we assume that args either does not exists
            # or it does not need `access_token`.
            if post_args and "access_token" not in post_args:
                post_args["access_token"] = self.access_token
            elif "access_token" not in args:
                args["access_token"] = self.access_token

        time.sleep(0.2)

        num_retry, time_sleep = 50, 2
        for it in range(num_retry):
            try:
                response = self.session.request(
                    method or "GET",
                    self.hangman_url + path,
                    timeout=self.timeout,
                    params=args,
                    data=post_args,
                    verify=False
                )
                break
            except requests.HTTPError as e:
                response = json.loads(e.read())
                raise HangmanAPIError(response)
            except requests.exceptions.SSLError as e:
                if it + 1 == num_retry:
                    raise
                time.sleep(time_sleep)

        headers = response.headers
        if 'json' in headers['content-type']:
            result = response.json()
        elif "access_token" in parse_qs(response.text):
            query_str = parse_qs(response.text)
            if "access_token" in query_str:
                result = {"access_token": query_str["access_token"][0]}
                if "expires" in query_str:
                    result["expires"] = query_str["expires"][0]
            else:
                raise HangmanAPIError(response.json())
        else:
            raise HangmanAPIError('Maintype was not text, or querystring')

        if result and isinstance(result, dict) and result.get("error"):
            raise HangmanAPIError(result)
        return result
    
class HangmanAPIError(Exception):
    def __init__(self, result):
        self.result = result
        self.code = None
        try:
            self.type = result["error_code"]
        except (KeyError, TypeError):
            self.type = ""

        try:
            self.message = result["error_description"]
        except (KeyError, TypeError):
            try:
                self.message = result["error"]["message"]
                self.code = result["error"].get("code")
                if not self.type:
                    self.type = result["error"].get("type", "")
            except (KeyError, TypeError):
                try:
                    self.message = result["error_msg"]
                except (KeyError, TypeError):
                    self.message = result

        Exception.__init__(self, self.message)

# API Usage Examples

## To start a new game:
1. Make sure you have implemented your own "guess" method.
2. Use the access_token that we sent you to create your HangmanAPI object. 
3. Start a game by calling "start_game" method.
4. If you wish to test your function without being recorded, set "practice" parameter to 1.
5. Note: You have a rate limit of 20 new games per minute. DO NOT start more than 20 new games within one minute.
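
One way to stay under the limit in step 5 is to space game starts at least 60 / 20 = 3 seconds apart. The sketch below is an assumption about how that could be done, not part of the provided API: `play_games_throttled` is a hypothetical helper, and `play_one_game` stands for any zero-argument callable such as a wrapper around `start_game`. The `sleep` and `clock` parameters exist only so the logic can be exercised without real waiting.

```python
import time


def play_games_throttled(play_one_game, n_games, max_per_minute=20,
                         sleep=time.sleep, clock=time.monotonic):
    min_interval = 60.0 / max_per_minute  # seconds between two game starts
    results = []
    last_start = None
    for _ in range(n_games):
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # only wait for the part of the interval still left
        last_start = clock()
        results.append(play_one_game())
    return results
```

With the notebook's objects this might be called as `play_games_throttled(lambda: api.start_game(practice=1, verbose=False), 100)`. A game that already takes longer than three seconds adds no extra delay.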

In [3]:
api = HangmanAPI(access_token="8912423d2cd0184ccd5e951957165f", timeout=2000)


## Playing practice games:
You can use the command below to play up to 100,000 practice games.

In [4]:
[total_practice_runs,total_recorded_runs,total_recorded_successes,total_practice_successes_before] = api.my_status()
print(total_practice_runs, total_practice_successes_before)

2291 728


In [5]:
[total_practice_runs,total_recorded_runs,total_recorded_successes,total_practice_successes_before] = api.my_status()
no_of_runs = 100
for i in range(no_of_runs):
    api.start_game(practice=1,verbose=False)
    print(f'{(i + 1)/no_of_runs*100} % done...')
    time.sleep(0.5)
    # [total_practice_runs,total_recorded_runs,total_recorded_successes,total_practice_successes] = api.my_status() # Get my game stats: (# of tries, # of wins)
    # practice_success_rate = total_practice_successes / total_practice_runs
[total_practice_runs,total_recorded_runs,total_recorded_successes,total_practice_successes_after] = api.my_status()

print('run %d practice games out of an allotted 100,000.' % (total_practice_runs))
print("This session's accuracy:", (total_practice_successes_after - total_practice_successes_before)/no_of_runs)


1.0 % done...
2.0 % done...
3.0 % done...
4.0 % done...
5.0 % done...
6.0 % done...
7.000000000000001 % done...
8.0 % done...
9.0 % done...
10.0 % done...
11.0 % done...
12.0 % done...
13.0 % done...
14.000000000000002 % done...
15.0 % done...
16.0 % done...
17.0 % done...
18.0 % done...
19.0 % done...
20.0 % done...
21.0 % done...
22.0 % done...
23.0 % done...
24.0 % done...
25.0 % done...
26.0 % done...
27.0 % done...
28.000000000000004 % done...
28.999999999999996 % done...
30.0 % done...
31.0 % done...
32.0 % done...
33.0 % done...
34.0 % done...
35.0 % done...
36.0 % done...
37.0 % done...
38.0 % done...
39.0 % done...
40.0 % done...
41.0 % done...
42.0 % done...
43.0 % done...
44.0 % done...
45.0 % done...
46.0 % done...
47.0 % done...
48.0 % done...
49.0 % done...
50.0 % done...
51.0 % done...
52.0 % done...
53.0 % done...
54.0 % done...
55.00000000000001 % done...
56.00000000000001 % done...
56.99999999999999 % done...
57.99999999999999 % done...
59.0 % done...
60.0 % done...

## ACHIEVED AN ACCURACY OF 0.60+

In [5]:
[total_practice_runs,total_recorded_runs,total_recorded_successes,total_practice_successes] = api.my_status()
print(total_practice_runs, total_recorded_runs, total_recorded_successes, total_practice_successes)

368 0 0 54


## Playing recorded games:
Please finalize your code prior to running the cell below. Once this code executes once successfully your submission will be finalized. Our system will not allow you to rerun any additional games.

Please note that it is expected that after you successfully run this block of code that subsequent runs will result in the error message "Your account has been deactivated".

Once you've run this section of the code your submission is complete. Please send us your source code via email.

In [6]:
[total_practice_runs,total_recorded_runs,total_recorded_successes,total_practice_successes] = api.my_status()
print(total_practice_runs, total_recorded_runs, total_recorded_successes, total_practice_successes)

2391 661 345 790


In [11]:
for i in range(1000):
    print('Playing ', i, ' th game')
    # Uncomment the following line to execute your final runs. Do not do this until you are satisfied with your submission
    api.start_game(practice=0,verbose=False)
    
    # DO NOT REMOVE as otherwise the server may lock you out for too high frequency of requests
    time.sleep(0.5)

Playing  0  th game
Playing  1  th game
Playing  2  th game
Playing  3  th game
Playing  4  th game
Playing  5  th game
Playing  6  th game
Playing  7  th game
Playing  8  th game
Playing  9  th game
Playing  10  th game
Playing  11  th game
Playing  12  th game
Playing  13  th game
Playing  14  th game
Playing  15  th game
Playing  16  th game
Playing  17  th game
Playing  18  th game
Playing  19  th game
Playing  20  th game
Playing  21  th game
Playing  22  th game
Playing  23  th game
Playing  24  th game
Playing  25  th game
Playing  26  th game
Playing  27  th game
Playing  28  th game
Playing  29  th game
Playing  30  th game
Playing  31  th game
Playing  32  th game
Playing  33  th game
Playing  34  th game
Playing  35  th game
Playing  36  th game
Playing  37  th game
Playing  38  th game
Playing  39  th game
Playing  40  th game
Playing  41  th game
Playing  42  th game
Playing  43  th game
Playing  44  th game
Playing  45  th game
Playing  46  th game
Playing  47  th game

HangmanAPIError: {'error': 'You have reached 1000 of games', 'status': 'denied'}

### Got cut after 661 runs because of network issue

## To check your game statistics
1. Simply use "my_status" method.
2. Returns your total number of games, and number of wins.

In [12]:
[total_practice_runs,total_recorded_runs,total_recorded_successes,total_practice_successes] = api.my_status() # Get my game stats: (# of tries, # of wins)
success_rate = total_recorded_successes/total_recorded_runs
print('overall success rate = %.3f' % success_rate)

overall success rate = 0.532


In [16]:
print("Accuracy of new model : ",  (total_recorded_successes - 345 )/ 338)

Accuracy of new model :  0.5532544378698225


## COULDN'T COMPLETE 1000 RUNS BECAUSE I EXHAUSTED SOME OF THEM WITH AN OLD MODEL. STILL, THE NEW MODEL SCORES 0.55+ OVER ITS 338 RUNS

# create_data.py

```python
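def permute(word, results, removed):
    """Recursively collect every masked state of `word` (a list of characters).

    `removed` lists the letters that must not be masked again on this branch,
    so each state is generated exactly once.
    """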
    for i in word:
        if i != '_' and i not in removed:
            removed.append(i)
            stripped_word = word.copy()
            for j in range(len(stripped_word)):
                if stripped_word[j] == i:
                    stripped_word[j] = '_'
            results.append(stripped_word)
            permute(stripped_word, results, removed.copy())
    return results


filename = "words_250000_train.txt"

with open(filename, 'r') as f:
    words = f.read().split()

print(f"{len(words)} words")

output_file = "small_strip_250000.txt"
batch_size = 1000

with open(output_file, 'w') as f:
    # Round up so the final, partially filled batch of words is not skipped
    num_batches = (len(words) + batch_size - 1) // batch_size
    for batch in range(num_batches):
        for word in words[batch_size * batch : (batch + 1) * batch_size]:
            word_as_list = list(word)
            strip_list = permute(word_as_list, [], [])
            for strip_word_as_list in strip_list:
                strip_word = ''.join(strip_word_as_list)
                f.write(f"{strip_word} {word}\n")
        if (batch % 100 == 0) :
            print(f"{batch + 1} batches complete out of {num_batches}")

```
# load_data.py


```python
import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np
from torch.nn.utils.rnn import pad_sequence

class WordCompletionDataset(Dataset):
    def __init__(self, filepath):
        self.filepath = filepath
        self.line_offsets = []

        # Precompute line start offsets for fast random access
        with open(filepath, 'rb') as f:
            offset = 0
            for line in f:
                self.line_offsets.append(offset)
                offset += len(line)

    def input_encode(self, word):
        matrix = np.zeros((27, len(word)))
        for i, ch in enumerate(word):
            if ch != '_':
                matrix[ord(ch) - ord('a')][i] = 1
            else:
                matrix[26][i] = 1
        return matrix

    def output_encode(self, word):
        matrix = np.zeros((27, len(word)))
        for i, ch in enumerate(word):
            matrix[ord(ch) - ord('a')][i] = 1
        return matrix

    def input_decode(self, matrix):
        word = []
        matrix = np.array(matrix)
        for i in range(matrix.shape[1]):
            idx = np.argmax(matrix[:, i])
            word.append('_' if idx == 26 else chr(idx + ord('a')))
        return ''.join(word)

    def output_decode(self, matrix):
        word = []
        matrix = np.array(matrix)
        for i in range(matrix.shape[1]):
            idx = np.argmax(matrix[:, i])
            word.append(chr(idx + ord('a')))
        return ''.join(word)

    def __len__(self):
        return len(self.line_offsets)

    def __getitem__(self, idx):
        offset = self.line_offsets[idx]
        with open(self.filepath, 'r') as f:
            f.seek(offset)
            line = f.readline().strip()
            input_word, output_word = line.split()

        input_ids = self.input_encode(input_word)
        output_ids = self.output_encode(output_word)
        return torch.tensor(input_ids, dtype=torch.float32), torch.tensor(output_ids, dtype=torch.float32)

def collate_fn(batch):
    inputs, outputs = zip(*batch)
    inputs = [i.permute(1, 0) for i in inputs]
    outputs = [o.permute(1, 0) for o in outputs]

    padded_inputs = pad_sequence(inputs, batch_first=True)  # (batch, seq_len, 27)
    padded_outputs = pad_sequence(outputs, batch_first=True)  # (batch, seq_len, 27)

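    # True where a position is padding (all-zero row), matching the PyTorch
    # convention for src_key_padding_mask
    key_padding_mask = (padded_inputs.sum(-1) == 0)  # (batch, seq_len)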
    return padded_inputs, padded_outputs, key_padding_mask

def return_dataloader():
    dataset = WordCompletionDataset("small_strip_250000.txt")
    dataloader = DataLoader(dataset, batch_size=512, shuffle=True, collate_fn=collate_fn)
    print("Dataset Loaded Successfully")
    return dataset, dataloader


if __name__ == "__main__": 
    dataset = WordCompletionDataset("small_strip_250000.txt")
    dataloader = DataLoader(dataset, batch_size=256, shuffle=True, collate_fn=collate_fn)

    for inputs, outputs, _ in dataloader:
        print(f"Number of batches: {len(dataloader)}")
        print(f"Inputs batch shape: {inputs.shape}")   # (batch_size, seq_len, 27)
        print(f"Outputs batch shape: {outputs.shape}") # (batch_size, seq_len, 27)

        # Visualize one word pair
        idx = 0  # first sample in batch
        input_matrix = inputs[idx].permute(1, 0).numpy()   # (27, seq_len)
        output_matrix = outputs[idx].permute(1, 0).numpy() # (27, seq_len)

        input_word = dataset.input_decode(input_matrix)
        output_word = dataset.output_decode(output_matrix)

        print(f"\nSample {idx + 1}")
        print(f"Masked Input Word:  {input_word}")
        print(f"Ground Truth Word:  {output_word}")
        
        break
```
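
The dataset class above relies on a byte-offset index so that a multi-gigabyte text file never has to be loaded into memory. The technique can be shown on its own with a tiny temporary file. This is a sketch with hypothetical helper names; it reads in binary mode so that the offsets are exact byte positions.

```python
import os
import tempfile


def build_line_offsets(path):
    """Record the byte position at which each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def read_pair(path, offsets, index):
    """Jump straight to line `index` and return (masked_word, true_word)."""
    with open(path, "rb") as f:
        f.seek(offsets[index])
        masked, word = f.readline().decode("ascii").split()
    return masked, word


# Tiny stand-in for the generated training file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("_at cat\nc_t cat\n__g dog\n")
    demo_path = tmp.name

demo_offsets = build_line_offsets(demo_path)
demo_pair = read_pair(demo_path, demo_offsets, 2)
print(demo_pair)  # ('__g', 'dog')
os.remove(demo_path)
```

The index costs one integer per line, which is far smaller than the file itself, and it allows a shuffled `DataLoader` to fetch arbitrary lines.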

# transformer_model.py


```python
import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from load_data import return_dataloader
from load_data import WordCompletionDataset
import math

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=100):
        super(PositionalEncoding, self).__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        pe = pe.unsqueeze(0)  # (1, max_len, d_model)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        return x


class TransformerModel(nn.Module):
    def __init__(self, input_size=27, d_model=256, nhead=8, num_layers=4, dim_feedforward=512, max_len=100):
        super(TransformerModel, self).__init__()

        self.embedding = nn.Linear(input_size, d_model)
        self.pos_encoder = PositionalEncoding(d_model, max_len=max_len)
        
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dim_feedforward=dim_feedforward,
            dropout=0.1,
            batch_first=True
        )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.output_layer = nn.Linear(d_model, input_size)

    def forward(self, x, src_key_padding_mask=None):
        x = self.embedding(x)
        x = self.pos_encoder(x)
        
        
        x = self.transformer_encoder(x, src_key_padding_mask=src_key_padding_mask)
        x = self.output_layer(x)
        x = F.log_softmax(x, dim = -1)
        
        return x  # (batch, seq_len, 27)


def train(model, dataloader, optimizer, num_epochs, device):
    model.to(device)

    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        total_batches = len(dataloader)
        batch = 0

        for inputs, outputs, key_padding_mask in dataloader:
            inputs, outputs = inputs.to(device), outputs.to(device)

            # Compute mask for padding positions
            src_key_padding_mask = (inputs == 0).all(dim=-1).bool()  # (batch, seq_len)

            optimizer.zero_grad()
            predictions = model(inputs, src_key_padding_mask=src_key_padding_mask)


            # # Focus loss only on masked positions
            # mask_token_index = 26
            # masked_positions = (inputs[:, :, mask_token_index] == 1).unsqueeze(-1)  # (batch, seq_len, 1)

            # loss = -torch.sum(masked_positions * outputs * predictions)
            # loss = loss / masked_positions.sum().clamp(min=1)
            
            loss = soft_cross_entropy_with_mask(predictions, outputs, pad_mask=src_key_padding_mask)
            
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            batch += 1
            if (batch % 100 == 0) :
                print(f'Epoch: {epoch+1} - {batch} batch done of total {total_batches} batches...({batch/total_batches * 100 :.2f}%)')
            if (batch % 10000 == 0):
                torch.save(model.state_dict(), "trained_model.pth")
                print("Model saved")

        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(dataloader):.4f}")

    torch.save(model.state_dict(), "trained_model.pth")
    
def soft_cross_entropy_with_mask(predictions, targets, pad_mask):
    """
    predictions: (B, T, C) - log probabilities (log_softmax output)
    targets:     (B, T, C) - target probability distributions
    pad_mask:    (B, T)    - bool, True where padding
    """
    non_pad_mask = ~pad_mask  # True where not padding
    loss_per_pos = -torch.sum(targets * predictions, dim=-1)  # (B, T)
    masked_loss = loss_per_pos * non_pad_mask.float()         # zero out pad positions
    return masked_loss.sum() / non_pad_mask.sum().clamp(min=1)

if __name__ == "__main__":
    dataset, dataloader = return_dataloader()
    

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = TransformerModel().to(device)
    print("Number of trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
    optimizer = optim.Adam(model.parameters(), lr=1e-3)

    train(model, dataloader, optimizer, 1, device)


```

# predict.py


```python
import numpy as np
import torch

from load_data import WordCompletionDataset
from transformer_model import TransformerModel

def make_predictions(word, max_len=64):
    dataset = WordCompletionDataset("small_strip_25000.txt")
    encoded = dataset.input_encode(word)  # (27, seq_len)
    seq_len = encoded.shape[1]

    # Pad to max_len
    if seq_len < max_len:
        pad_width = ((0, 0), (0, max_len - seq_len))
        encoded = np.pad(encoded, pad_width, mode='constant', constant_values=0)

    input_tensor = torch.tensor(encoded, dtype=torch.float32).T.unsqueeze(0)  # (1, max_len, 27)

    # Create src_key_padding_mask: True where position is padding
    pad_mask = (input_tensor.sum(-1) == 0)  # (1, max_len)

    model = TransformerModel()
    model.load_state_dict(torch.load("trained_model.pth", map_location='cpu'))
    model.eval()

    with torch.no_grad():
        output = model(input_tensor, src_key_padding_mask=pad_mask)  # (1, max_len, 27)
        output = output.squeeze(0).T  # (27, max_len)

    # Extract masked positions (channel 26 indicates masked token)
    mask_indices = np.where(encoded[26] == 1)[0]
    if len(mask_indices) == 0:
        print("No masked characters found in input.")
        return []

    # Average log-probabilities (the model ends in log_softmax) across masked positions
    masked_preds = output[:, mask_indices].mean(dim=1)  # (27,)
    masked_preds[26] = -float('inf')  # prevent predicting [MASK]

    sorted_indices = torch.argsort(masked_preds, descending=True).tolist()
    guesses = [chr(i + ord('a')) for i in sorted_indices]

    return guesses



if __name__ == '__main__':
    print(make_predictions('hi_an'))
    
```
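
The ranking step at the end of `make_predictions` can be isolated from the model. The sketch below is a plain-Python illustration with a hypothetical function name: it takes per-position log-probabilities (here hand-made, in practice the model output), averages them over the masked positions only, and sorts the 26 letters by that average.

```python
import math


def rank_letters(log_probs, masked_word):
    """log_probs: one list of 27 log-probabilities per position of masked_word."""
    masked_positions = [i for i, ch in enumerate(masked_word) if ch == "_"]
    if not masked_positions:
        return []
    averaged = [
        sum(log_probs[pos][c] for pos in masked_positions) / len(masked_positions)
        for c in range(26)  # index 26 is the mask symbol and is never a valid guess
    ]
    order = sorted(range(26), key=lambda c: averaged[c], reverse=True)
    return [chr(c + ord("a")) for c in order]


# Hand-made scores for "c_t": the middle position favours 'a', then 'u'
uniform = [math.log(1 / 27)] * 27
middle = list(uniform)
middle[0] = math.log(0.5)    # 'a'
middle[20] = math.log(0.3)   # 'u'
print(rank_letters([uniform, middle, uniform], "c_t")[:2])  # ['a', 'u']
```

Averaging over masked positions yields one score per letter for the whole word, which matches the game: a guess reveals every occurrence of a letter, wherever it sits.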