
# Trexquant Interview Project (The Hangman Game)

*   Copyright Trexquant Investment LP. All Rights Reserved.
*   Redistribution of this question without written consent from Trexquant is prohibited

# Instruction:
For this coding test, your mission is to write an algorithm that plays the game of Hangman through our API server.

When a user plays Hangman, the server first selects a secret word at random from a list. The server then returns a row of underscores (space separated)—one for each letter in the secret word—and asks the user to guess a letter. If the user guesses a letter that is in the word, the word is redisplayed with all instances of that letter shown in the correct positions, along with any letters correctly guessed on previous turns. If the letter does not appear in the word, the user is charged with an incorrect guess. The user keeps guessing letters until either (1) the user has correctly guessed all the letters in the word or (2) the user has made six incorrect guesses.
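The rules above can be sketched as a small local simulator for offline practice. This is only an illustration: `play_hangman` and `make_fixed_order_guesser` are hypothetical names, the real games run on the API server, and treating a repeated guess as an incorrect one is an assumption rather than documented server behavior.

```python
def play_hangman(secret, guess_fn, max_wrong=6):
    """Play one local game; return True if every letter is revealed in time."""
    guessed = []
    wrong = 0
    while wrong < max_wrong:
        # Server-style display: revealed letters and underscores, space separated
        masked = " ".join(c if c in guessed else "_" for c in secret)
        if "_" not in masked:
            return True
        letter = guess_fn(masked, guessed)
        if letter in guessed:
            # Assumption: a repeated guess is charged as an incorrect guess
            wrong += 1
            continue
        guessed.append(letter)
        if letter not in secret:
            wrong += 1
    # The loop only ends here after the sixth incorrect guess
    return False


def make_fixed_order_guesser(order):
    """Return a guess function that tries the letters of `order` one by one."""
    def guess_fn(masked, guessed):
        return next(c for c in order if c not in guessed)
    return guess_fn


print(play_hangman("apple", make_fixed_order_guesser("aple")))
print(play_hangman("apple", make_fixed_order_guesser("zyxwvu")))
```

A guess function with the signature `guess_fn(masked, guessed)` can be swapped in to compare strategies on held-out words before spending practice games on the server.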

You are required to write a "guess" function that takes the current word (with underscores) as input and returns a guessed letter. You will use the API code below to play 1,000 Hangman games. You may practice before you start recording your game results.

Your algorithm is permitted to use a training set of approximately 250,000 dictionary words. Your algorithm will be tested on an entirely disjoint set of 250,000 dictionary words. Please note that this means the words that you will ultimately be tested on do NOT appear in the dictionary that you are given. You are not permitted to use any dictionary other than the training dictionary we provided. This requirement will be strictly enforced by code review.

You are provided with a basic, working algorithm. This algorithm matches the provided masked string (e.g. a _ _ l e) against all possible words in the dictionary, tabulates the frequency of letters appearing in these possible words, and then guesses the letter with the highest frequency of appearance that has not already been guessed. If no remaining words match, it defaults back to the character frequency distribution of the entire dictionary.
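A minimal sketch of the benchmark strategy described above, written as a standalone function. The name `baseline_guess` and the toy word list are assumptions for illustration; the provided implementation lives in the `guess` method of the API class and keeps its state on the object.

```python
import collections
import re


def baseline_guess(masked, guessed, dictionary):
    """Guess the most frequent unguessed letter among words matching the mask."""
    # "a _ _ l e" -> "a..le": drop the separating spaces, '.' matches any character
    pattern = masked[::2].replace("_", ".")
    candidates = [w for w in dictionary if re.fullmatch(pattern, w)]

    # Try the matching words first, then fall back to the whole dictionary
    for pool in (candidates, dictionary):
        counts = collections.Counter("".join(pool))
        for letter, _ in counts.most_common():
            if letter not in guessed:
                return letter
    return None


toy_dictionary = ["apple", "ample", "angle", "amble", "zebra"]
print(baseline_guess("a _ _ l e", ["a", "l", "e"], toy_dictionary))
```

Here `re.fullmatch` replaces the explicit length check used in the provided code, since a full match already requires the candidate to have the same length as the mask.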

This benchmark strategy is successful approximately 18% of the time. Your task is to design an algorithm that significantly outperforms this benchmark.

In [1]:
import pandas as pd
import numpy as np
import os
import random
import string
import pickle
import torch
from torch.utils.data import Dataset, DataLoader
from torch import nn
from itertools import combinations
import json
import requests
import secrets
import time
import re
import collections
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

try:
    from urllib.parse import parse_qs, urlencode, urlparse
except ImportError:
    from urlparse import parse_qs, urlparse
    from urllib import urlencode

from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

In [12]:
def read_data():
    # Function to read data from a file
    with open("/content/words_250000_train.txt", "r") as f:
        df = f.read()
    return df

def generate_ngrams(word, n):
    """
    Function to generate all possible n-grams from a given word.

    Parameters:
    - word (str): The input word.
    - n (int): The size of the n-grams.

    Returns:
    - List of n-grams.
    """
    ngrams = [word[i:i+n] for i in range(len(word) - n + 1)]
    return ngrams

def create_ngram_dictionary(df, n):
    """
    Function to create a dictionary of n-gram combinations for each word in the dataset.

    Parameters:
    - df (DataFrame): The input DataFrame containing words.
    - n (int): The size of the n-grams.

    Returns:
    - Dictionary mapping each word to the list of its n-grams.
    """
    ngram_dictionary = {}

    # Column 0 of the DataFrame holds one word per row
    for word in df[0]:
        # Map each word to its n-grams (an empty list if the word is shorter than n)
        ngram_dictionary[word] = generate_ngrams(word, n)

    return ngram_dictionary

def generate_filled_combinations_for_list(word_list):
    """
    Generates filled combinations for words in a given list by replacing letters with underscores.

    Parameters:
    - word_list (list): List of words for which filled combinations are generated.

    Returns:
    - all_filled_combinations (dict): Dictionary where keys are words, and values are lists of filled combinations.
    """
    # Initialize an empty dictionary to store filled combinations for each word
    all_filled_combinations = {}

    # Iterate through each word in the given list
    for word in word_list:
        # Iterate over the number of letters kept visible, from 1 to len(word) - 1
        for num_visible in range(1, len(word)):
            # Generate every combination of positions whose letters stay visible
            positions = range(len(word))
            visible_sets = list(combinations(positions, num_visible))

            # Build one masked string per combination: kept positions show their letter, the rest become '_'
            filled_combinations = [
                "".join(word[idx] if idx in visible else '_' for idx in range(len(word))) for visible in visible_sets
            ]

            # Add the masked strings to the list for the current word
            all_filled_combinations.setdefault(word, []).extend(filled_combinations)

    return all_filled_combinations



def create_char_mapping():
    """
    Creates a character-to-index mapping and an index-to-character mapping for a predefined set of characters.

    Returns:
    - char_to_index (dict): Dictionary mapping characters to their corresponding indices.
    - index_to_char (dict): Dictionary mapping indices to their corresponding characters.
    """
    # Define a set of characters including lowercase letters, an underscore, and an asterisk
    chars = "abcdefghijklmnopqrstuvwxyz_*"

    # Create a character-to-index mapping
    char_to_index = {char: i for i, char in enumerate(chars)}

    # Create an index-to-character mapping
    index_to_char = {i: char for i, char in enumerate(chars)}

    return char_to_index, index_to_char

def encode_input(word):
    """
    Encodes the input word into a numerical vector of fixed length (6).

    Parameters:
    - word (str): The input word to be encoded.

    Returns:
    - word_vector (list): Numerical vector representing the encoded input word.
    """
    # Only the character-to-index mapping is needed here
    char_to_index, _ = create_char_mapping()

    # Fixed length of the input vector
    embedding_len = 6

    # Initialize an input vector with zeros
    word_vector = [0] * embedding_len

    # Encode the first `embedding_len` characters; longer inputs are truncated
    for letter_no in range(embedding_len):
        if letter_no < len(word):
            word_vector[letter_no] = char_to_index[word[letter_no]]
        else:
            # If the word is shorter than the fixed length, pad with the '*' placeholder
            word_vector[letter_no] = char_to_index['*']

    return word_vector


def encode_output(word):
    """
    Encodes the output word into a numerical vector using a character mapping.

    Parameters:
    - word (str): The output word to be encoded.

    Returns:
    - output_vector (list): Numerical vector representing the encoded output word.
    """
    # Only the character-to-index mapping is needed here
    char_mapping, _ = create_char_mapping()

    # Initialize an output vector with zeros for each letter of the alphabet
    output_vector = [0] * 26

    # Iterate through each letter in the word and set the corresponding position in the vector to 1
    for letter in word:
        output_vector[char_mapping[letter]] = 1

    return output_vector

def encode_dictionary(masked_dictionary):
    """
    Encodes words into numerical vectors for machine learning.

    Parameters:
    - masked_dictionary (dict): A dictionary where keys are output words and values are lists of input words.

    Returns:
    - input_data (list): List containing encoded numerical vectors representing input words.
    - target_data (list): List containing encoded numerical vectors representing corresponding output words.
    """
    # Initialize empty lists to store encoded input and output vectors
    target_data = []
    input_data = []

    # Iterate through the masked dictionary
    for output_word, input_words in masked_dictionary.items():
        # Encode the output word
        output_vector = encode_output(output_word)

        # Iterate through the input words and encode them
        for input_word in input_words:
            target_data.append(output_vector)
            input_data.append(encode_input(input_word))

    return input_data, target_data

def convert_to_tensor(input_data, target_data):
    """
    Converts input and target data to PyTorch tensors.

    Parameters:
    - input_data (list): List containing input data in the form of encoded sequences.
    - target_data (list): List containing target data in the form of encoded sequences.

    Returns:
    - input_tensor (torch.Tensor): PyTorch tensor representing the input data.
    - target_tensor (torch.Tensor): PyTorch tensor representing the target data.
    """
    # Inputs are index sequences (torch.long); targets are multi-hot vectors (torch.float32)
    input_tensor = torch.tensor(input_data, dtype=torch.long)
    target_tensor = torch.tensor(target_data, dtype=torch.float32)

    return input_tensor, target_tensor

def save_input_output_data(input_data, target_data):
    """
    Saves input and target data to text files.

    Parameters:
    - input_data (list): List containing input data.
    - target_data (list): List containing target data.
    """
    # Save input data to 'input_features.txt'
    with open(r'input_features.txt', 'w') as fp:
        for item in input_data:
            # Write each item on a new line
            fp.write("%s\n" % item)
        print('Input data saved successfully.')

    # Save target data to 'target_features.txt'
    with open(r'target_features.txt', 'w') as fp:
        for item in target_data:
            # Write each item on a new line
            fp.write("%s\n" % item)
        print('Target data saved successfully.')


def get_datasets():
    """
    Processes and prepares datasets for machine learning training.

    Reads data, generates n-grams from 2 to 6, creates n-gram dictionaries, flattens and concatenates n-grams,
    encodes n-grams into numerical vectors, and converts data to PyTorch tensors.

    Returns:
    - input_tensor (torch.Tensor): PyTorch tensor representing the input data.
    - target_tensor (torch.Tensor): PyTorch tensor representing the target data.
    """
    # Reading the data
    df = read_data()
    x = pd.DataFrame(df.split('\n'))

    # Initialize empty lists to store input and target data
    input_data = []
    target_data = []

    # Loop through n-grams from 2 to 6
    for ngram in range(2, 7):
        # Create n-gram dictionary
        result_ngrams = create_ngram_dictionary(x, ngram)

        # Flatten and concatenate n-grams
        result_ngrams_list = list(set(perm for perms_list in result_ngrams.values() for perm in perms_list))
        all_permutations = generate_filled_combinations_for_list(result_ngrams_list)

        # Encode n-grams
        current_input_data, current_target_data = encode_dictionary(all_permutations)

        # Append to the overall lists
        input_data.extend(current_input_data)
        target_data.extend(current_target_data)
        print(f'{ngram}-gram is Done!')

    save_input_output_data(input_data, target_data)
    # Convert to tensors
    input_tensor, target_tensor = convert_to_tensor(input_data, target_data)
    print(input_tensor.size(), target_tensor.size())

    return input_tensor, target_tensor



In [None]:
# prepare the data for LSTM
input_tensor, target_tensor = get_datasets()

2-gram is Done!
3-gram is Done!
4-gram is Done!
5-gram is Done!
6-gram is Done!
Input data saved successfully.
Target data saved successfully.
torch.Size([29977280, 6]) torch.Size([29977280, 26])
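To make the two tensor shapes above concrete, here is a pure-Python sketch of how a single n-gram appears to be turned into training rows by the functions above. The helper names (`masked_variants`, `encode_row`, `multi_hot`) are illustrative and not part of the notebook. Each n-gram of length n yields 2^n - 2 masked inputs (every way of keeping between 1 and n - 1 letters visible), each padded to 6 indices, and every one of them shares the same 26-dimensional target that marks all letters of the n-gram, including the letters still visible in the input.

```python
from itertools import combinations

CHARS = "abcdefghijklmnopqrstuvwxyz_*"
CHAR_TO_INDEX = {c: i for i, c in enumerate(CHARS)}


def masked_variants(ngram):
    """All masked versions of `ngram` that keep 1 .. len(ngram) - 1 letters."""
    variants = []
    for keep in range(1, len(ngram)):
        for kept in combinations(range(len(ngram)), keep):
            variants.append(
                "".join(ch if i in kept else "_" for i, ch in enumerate(ngram))
            )
    return variants


def encode_row(masked, length=6):
    """Pad with '*' to a fixed length and map each character to its index."""
    padded = masked.ljust(length, "*")[:length]
    return [CHAR_TO_INDEX[c] for c in padded]


def multi_hot(ngram):
    """26-dimensional vector with a 1 for every letter present in the n-gram."""
    target = [0] * 26
    for ch in ngram:
        target[CHAR_TO_INDEX[ch]] = 1
    return target


variants = masked_variants("cat")
rows = [encode_row(v) for v in variants]
target = multi_hot("cat")

for variant, row in zip(variants, rows):
    print(variant, row)
print(target)
```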


In [13]:
# train the Model
def train_loop(data_loader, model, loss_fn, optimizer, loss_estimate, batch_no, epoch, epoch_no):
    """
    Training loop for the machine learning model.

    Parameters:
    - data_loader (torch.utils.data.DataLoader): DataLoader containing training data.
    - model (torch.nn.Module): The machine learning model to be trained.
    - loss_fn: The loss function used for training.
    - optimizer: The optimization algorithm for updating model parameters.
    - loss_estimate (list): List to store loss values for visualization.
    - batch_no (list): List to store batch numbers for visualization.
    - epoch (int): The current epoch number.
    - epoch_no (list): List to store epoch numbers for visualization.
    """
    size = len(data_loader.dataset)
    model.train()

    # Iterate through batches in the data loader
    for batch, (X, y) in enumerate(data_loader):
        # Forward pass
        pred = model(X)

        # Compute the loss
        loss = loss_fn(pred, y)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Logging and visualization
        if batch % 1000 == 0:
            loss_value, current_batch = loss.item(), (batch + 1) * len(X)

            # Append values for visualization
            loss_estimate.append(loss_value)
            batch_no.append(current_batch)
            epoch_no.append(epoch)

            # Print progress
            print(f"loss: {loss_value:>7f}  [{current_batch:>5d}/{size:>5d}]")


def test_loop(data_loader, model, loss_fn):
    """
    Testing loop for evaluating the performance of a machine learning model on a test dataset.

    Parameters:
    - data_loader (torch.utils.data.DataLoader): DataLoader containing test data.
    - model (torch.nn.Module): The trained machine learning model.
    - loss_fn: The loss function used for evaluation.
    """
    size = len(data_loader.dataset)
    model.eval()
    num_batches = len(data_loader)
    test_loss, correct = 0, 0

    # Disable gradient computation during evaluation
    with torch.no_grad():
        # Iterate through batches in the data loader
        for (X, y) in data_loader:
            # Forward pass
            pred = model(X)

            # Compute test loss
            test_loss += loss_fn(pred, y).item()

            # Calculate the number of correct predictions
            correct += (pred.argmax(dim = 1) == y.argmax(dim=1)).type(torch.float).sum().item()

    # Calculate average test loss and accuracy
    test_loss /= num_batches
    accuracy = correct / size

    # Print test results
    print(f"Test Error: \n Accuracy: {(100 * accuracy):>0.1f}%, Avg loss: {test_loss:>8f} \n")


class CustomDatasetTrain(Dataset):
    """
    Custom PyTorch dataset for training data.

    Parameters:
    - X_train: Features of the training dataset.
    - y_train: Labels of the training dataset.
    """
    def __init__(self, X_train, y_train):
        self.features = X_train
        self.label = y_train

    def __len__(self):
        """
        Returns the total number of samples in the dataset.
        """
        return len(self.label)

    def __getitem__(self, idx):
        """
        Returns a sample from the dataset given an index.

        Parameters:
        - idx (int): Index of the sample.

        Returns:
        - features (tensor): Features of the sample.
        - label (tensor): Label of the sample.
        """
        features = self.features[idx]
        label = self.label[idx]
        return features, label


class extract_tensor(nn.Module):
    def forward(self,x):
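        # nn.LSTM returns (output, (h_n, c_n)); with batch_first=True,
        # output has shape (batch, seq_len, num_directions * hidden_size)
        tensor, _ = x
        # Keep only the last time step: shape (batch, num_directions * hidden_size)
        return tensor[:, -1, :]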


class NeuralNetwork(nn.Module):
    """
    Neural network with an embedding layer followed by a bidirectional LSTM.

    Architecture:
    - Embedding layer with 36 entries (create_char_mapping() only produces indices 0-27),
      embedding dimension 6, and max_norm=1 using the L2 norm.
    - Bidirectional LSTM with input size 6, hidden size 36, 1 layer, batch-first.
      dropout=0.2 is passed, but nn.LSTM applies dropout only between stacked layers,
      so it has no effect with num_layers=1 (PyTorch emits a warning).
    - extract_tensor(), defined above, which keeps the LSTM output at the last time step.
    - Linear layer with input size 72 (2 directions x 36) and output size 26.

    Parameters:
    - None

    Input:
    - x (torch.Tensor): Integer tensor of character indices with shape (batch, 6).

    Output:
    - logits (torch.Tensor): Tensor of shape (batch, 26), one logit per letter.
    """
    def __init__(self):
        super().__init__()
        self.LSTM_stack = nn.Sequential(
            nn.Embedding(36, 6, max_norm=1, norm_type=2),
            nn.LSTM(input_size=6, hidden_size=36, num_layers=1, batch_first=True, dropout=0.2, bidirectional=True),
            extract_tensor(),  # keep only the last time step of the LSTM output
            nn.Linear(72, 26)
        )

    def forward(self, x):
        logits = self.LSTM_stack(x)
        return logits


def create_dataloader(input_tensor, target_tensor):
    all_features_data = CustomDatasetTrain(input_tensor, target_tensor)
    all_features_dataloader = DataLoader(all_features_data, batch_size=128, shuffle=True)
    return all_features_dataloader

def save_model(model):
    torch.save(model.state_dict(), "lstm_ngram_2.pt")

def train_model(input_tensor, target_tensor):
    """
    Trains a neural network model using the specified input and target tensors.

    Parameters:
    - input_tensor (torch.Tensor): Input data tensor.
    - target_tensor (torch.Tensor): Target data tensor.
    """
    # Create a DataLoader for the training data
    all_features_dataloader = create_dataloader(input_tensor, target_tensor)

    # Initialize the neural network model
    model = NeuralNetwork()

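    # Define the loss function and optimizer.
    # Note: the targets are multi-hot float vectors; given float targets of the same shape as the logits,
    # nn.CrossEntropyLoss treats them as per-class weights and does not normalize them.
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)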

    # Lists for storing loss, batch, and epoch values for visualization
    loss_estimate = []
    batch_no = []
    epoch_no = []

    # Number of training epochs
    epochs = 5

    # Training loop
    for epoch in range(epochs):
        print(f"Epoch {epoch + 1}\n-------------------------------")

        # Train the model
        train_loop(all_features_dataloader, model, loss_fn, optimizer, loss_estimate, batch_no, epoch, epoch_no)

        # Evaluate on the same data used for training (no held-out split is created here)
        test_loop(all_features_dataloader, model, loss_fn)

    print("Training complete!")

    # Save the trained model
    save_model(model)


In [None]:
# train the model
train_model(input_tensor, target_tensor)



Epoch 1
-------------------------------
loss: 16.674883  [  128/29977280]
loss: 16.267006  [128128/29977280]
loss: 15.784765  [256128/29977280]
loss: 15.513313  [384128/29977280]
loss: 15.323220  [512128/29977280]
loss: 15.203616  [640128/29977280]
loss: 15.313610  [768128/29977280]
loss: 15.335670  [896128/29977280]
loss: 15.396443  [1024128/29977280]
loss: 15.493000  [1152128/29977280]
loss: 15.203526  [1280128/29977280]
loss: 15.059254  [1408128/29977280]
loss: 14.934552  [1536128/29977280]
loss: 14.840385  [1664128/29977280]
loss: 15.151722  [1792128/29977280]
loss: 14.850445  [1920128/29977280]
loss: 15.199343  [2048128/29977280]
loss: 15.356495  [2176128/29977280]
loss: 15.289555  [2304128/29977280]
loss: 15.281339  [2432128/29977280]
loss: 15.127481  [2560128/29977280]
loss: 14.914280  [2688128/29977280]
loss: 15.089948  [2816128/29977280]
loss: 15.076271  [2944128/29977280]
loss: 14.877400  [3072128/29977280]
loss: 15.212141  [3200128/29977280]
loss: 14.767307  [3328128/2997728
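The logged losses start near 16, well above ln 26 ≈ 3.26, which is what a single-label cross-entropy gives for uniform predictions over 26 letters. One possible reading, offered as an assumption rather than a confirmed diagnosis: when `nn.CrossEntropyLoss` receives float targets with the same shape as the logits, it computes the negative sum of `target * log_softmax(logits)` per row, so a multi-hot target with k ones and near-uniform predictions costs about k × ln 26. The pure-Python sketch below reproduces that arithmetic; `soft_target_cross_entropy` is a hypothetical helper, not a PyTorch function.

```python
import math


def soft_target_cross_entropy(logits, target):
    """-sum_c target[c] * log_softmax(logits)[c] for a single example."""
    m = max(logits)
    # log of the softmax denominator, computed stably
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target))


# Untrained model approximated by uniform logits over the 26 letters
uniform_logits = [0.0] * 26
for k in (1, 3, 5, 6):
    # Multi-hot target with k active letters
    target = [1.0] * k + [0.0] * (26 - k)
    print(k, round(soft_target_cross_entropy(uniform_logits, target), 3))
```

Under this reading, five active letters give 5 × ln 26 ≈ 16.3, the same order as the first logged value. A loss designed for multi-label targets, such as `nn.BCEWithLogitsLoss`, would be on a different scale and is not what the notebook uses.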

In [86]:
def encode_input_for_prediction(masked_word):
    """
    Encode the masked word.

    Parameters:
    - masked_word (str): The word with masked characters.

    Returns:
    - torch.Tensor: Encoded input tensor for prediction.
    """
    # encode_input() handles the character mapping and the padding to length 6
    input_data = [encode_input(masked_word)]
    input_tensor = torch.tensor(input_data, dtype=torch.long)
    return input_tensor


def extract_ngrams(word):
    """
    Collect all n-grams (n = 2 to 6) of the masked word that contain at least one letter and at least one underscore.

    Parameters:
    - word (str): The input word.

    Returns:
    - list: List of unique n-grams.
    """
    ngrams = set()

    # Iterate over different n-gram lengths (2, 3, 4, 5, 6)
    for n in range(2, 7):
        # Extract n-grams from the word
        for i in range(len(word) - n + 1):
            ngram = word[i:i + n]

            # Check that the n-gram contains at least one letter and at least one underscore
            if any(char.isalpha() for char in ngram) and '_' in ngram:
                # If the n-gram is shorter than 6, pad it with asterisks
                ngram = ngram.ljust(6, '*')

                # Ensure the n-gram is of length 6
                ngram = ngram[:6]

                # Add the n-gram to the set
                ngrams.add(ngram)

    return list(ngrams)

def encode_ngram(ngram, char_to_index):
    """
    Encode a given n-gram using a character-to-index mapping.

    Parameters:
    - ngram (str): The input n-gram.
    - char_to_index (dict): Character-to-index mapping.

    Returns:
    - list: Encoded n-gram.
    """
    # Ensure the n-gram is of length 6
    ngram = ngram[:6]

    # Encode each character in the n-gram using char_to_index mapping
    encoded_ngram = [char_to_index[char] for char in ngram]

    return encoded_ngram

def get_sorted_letters(new_dictionary, guessed_letters):
    """
    Get sorted letters based on their frequency in the new dictionary.

    Parameters:
    - new_dictionary (list): List of possible words.
    - guessed_letters (list): List of guessed letters.

    Returns:
    - list: Sorted letters based on frequency, excluding guessed letters.
    """
    full_dict_string = "".join(new_dictionary)

    # Count the occurrences of each letter
    c = collections.Counter(full_dict_string)

    sorted_letter_count = c.most_common()

    # Filter out guessed letters
    remaining_sorted_letters = [item for item in sorted_letter_count if item[0] not in guessed_letters]

    return remaining_sorted_letters

def func(new_dictionary):
    """
    Count, for each letter, the number of words in the new dictionary that contain it.

    Parameters:
    - new_dictionary (list): List of possible words.

    Returns:
    - collections.Counter: Number of words containing each letter.
    """
    dictx = collections.Counter()
    for word in new_dictionary:
        # Count each letter at most once per word
        dictx.update(set(word))
    return dictx
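The two counting helpers above rank letters differently: `get_sorted_letters` counts every occurrence of a letter, while `func` appears intended to count each letter at most once per word (an interpretation of its `temp[i] = 1` step). The sketch below contrasts the two on a toy list; the function names `total_occurrences` and `words_containing` are illustrative only.

```python
import collections


def total_occurrences(words):
    """Count every occurrence of each letter across all words."""
    return collections.Counter("".join(words))


def words_containing(words):
    """Count, for each letter, how many words contain it at least once."""
    counts = collections.Counter()
    for word in words:
        counts.update(set(word))
    return counts


words = ["added", "daddy", "eel"]
print(total_occurrences(words).most_common())
print(words_containing(words).most_common())
```

For Hangman, the per-word count is arguably the more relevant quantity, since a guess succeeds if the letter appears anywhere in the secret word, however many times.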

In [91]:
class HangmanAPI(object):
    def __init__(self, access_token=None, session=None, timeout=None):
        self.hangman_url = self.determine_hangman_url()
        self.access_token = access_token
        self.session = session or requests.Session()
        self.timeout = timeout
        self.guessed_letters = []
        full_dictionary_location = "/content/words_250000_train.txt"
        self.full_dictionary = self.build_dictionary(full_dictionary_location)
        self.full_dictionary_common_letter_sorted = collections.Counter("".join(self.full_dictionary)).most_common()
        self.current_dictionary = []
        self.tries_remains  = 6
        self.model = NeuralNetwork()
        self.model.load_state_dict(torch.load("/content/lstm_ngram_2.pt"))
        self.ngram_used = set()
        self.LSTM_guess = 0

    @staticmethod
    def determine_hangman_url():
        links = ['https://trexsim.com', 'https://sg.trexsim.com']

        data = {link: 0 for link in links}

        for link in links:

            requests.get(link)

            for i in range(10):
                s = time.time()
                requests.get(link)
                data[link] = time.time() - s

        link = sorted(data.items(), key=lambda x: x[1])[0][0]
        link += '/trexsim/hangman'
        return link


    def predicted_letter_lstm(self, masked_word):
        """
        Predict the next letter using the LSTM model based on the masked word.

        Parameters:
        - masked_word (str): The word with masked characters.

        Returns:
        - list: Predicted letters sorted by probability.
        """
        # Create character mappings
        char_to_index, int_to_char = create_char_mapping()

        # Extract unique n-grams with at least one alphabet
        ngrams = extract_ngrams(masked_word)

        # Initialize an empty dictionary to store accumulated probabilities
        accumulated_probabilities = {}

        # Traverse over n-grams
        for ngram in ngrams:
            # Encode the n-gram
            input_tensor_for_prediction = encode_ngram(ngram, char_to_index)

            # Convert to tensor
            input_tensor = torch.tensor(input_tensor_for_prediction, dtype=torch.long)
            input_tensor = input_tensor.view(1, -1)

            # Ensure the model is in evaluation mode
            self.model.eval()

            # Make predictions
            with torch.no_grad():
                output = self.model(input_tensor)

            # Apply softmax to get probabilities
            probabilities = torch.softmax(output, dim=1).numpy()

            # Process the probabilities using int_to_char
            probabilities_list = [(int_to_char[i], prob) for i, prob in enumerate(probabilities[0])]

            # Give more weight to the new built n-gram (new information)
            if ngram not in self.ngram_used:
                self.ngram_used.add(ngram)
                weight = 2
            else:
                weight = 1

            # Accumulate probabilities for each alphabet
            for char, prob in probabilities_list:
                accumulated_probabilities[char] = accumulated_probabilities.get(char, 0) + prob * weight

        # Convert the accumulated probabilities to a list of tuples
        final_accumulated_list = list(accumulated_probabilities.items())

        sorted_probabilities_list = sorted(final_accumulated_list, key=lambda x: x[1], reverse=True)

        sorted_letters_list = [pred for pred, _ in sorted_probabilities_list]

        return sorted_letters_list


    def guess(self, word):
        # word input example: "_ p p _ e "

        # clean the word so that we strip away the space characters
        # replace "_" with "." as "." indicates any character in regular expressions
        clean_word = word[::2].replace("_",".")

        # find length of passed word
        len_word = len(clean_word)

        # number of letters still hidden
        remaining_spaces = clean_word.count('.')

        # grab current dictionary of possible words from self object, initialize new possible words dictionary to empty
        current_dictionary = self.current_dictionary
        new_dictionary = []

        # iterate through all of the words in the old plausible dictionary
        for dict_word in current_dictionary:
            # continue if the word is not of the appropriate length
            if len(dict_word) != len_word:
                continue

            # if dictionary word is a possible match then add it to the current dictionary
            if re.match(clean_word,dict_word):
                new_dictionary.append(dict_word)

        # overwrite old possible words dictionary with updated version
        self.current_dictionary = new_dictionary

        # start the guess letter
        guess_letter = '!'

        # if fewer than 2 letters have been revealed so far, start with the most common
        # letters among the candidate words that still match the pattern
        if (len_word - remaining_spaces) < 2:
            full_dict_string = "".join(new_dictionary)
            # return most frequently occurring letter in all possible words that hasn't been guessed yet
            c = collections.Counter(full_dict_string)
            sorted_letter_count = c.most_common()
            for letter,_ in sorted_letter_count:
                if letter not in self.guessed_letters:
                    guess_letter = letter
                    break

        # at least two letters are revealed, so use the LSTM:
        if guess_letter == '!':
            # if the last three guesses made while using the LSTM were all wrong, skip it for this turn
            if self.LSTM_guess > 2 and all(letter not in clean_word for letter in self.guessed_letters[-3:]):
                self.LSTM_guess = 0
            # otherwise use the LSTM
            else:
                predict_letters = self.predicted_letter_lstm(word[::2])
                for letter in predict_letters:
                    # check that the prediction is a letter and has not already been guessed
                    if letter.isalpha() and letter not in self.guessed_letters:
                        guess_letter = letter
                        self.LSTM_guess += 1
                        break

        # if there was no match: based on words with the same pattern:
        if guess_letter == '!' or guess_letter == '_':
            # return most frequently occurring letter in all possible words that hasn't been guessed yet
            c = func(new_dictionary)
            sorted_letter_count = c.most_common()
            for letter,_ in sorted_letter_count:
                if letter not in self.guessed_letters:
                    guess_letter = letter
                    break

        # if no word matches in training dictionary, default back to ordering of full dictionary
        if guess_letter == '!' or guess_letter == '_':
            sorted_letter_count = self.full_dictionary_common_letter_sorted
            for letter,_ in sorted_letter_count:
                if letter not in self.guessed_letters:
                    guess_letter = letter
                    break

        return guess_letter

    ##########################################################
    # You'll likely not need to modify any of the code below #
    ##########################################################

    def build_dictionary(self, dictionary_file_location):
        text_file = open(dictionary_file_location,"r")
        full_dictionary = text_file.read().splitlines()
        text_file.close()
        return full_dictionary

    def start_game(self, practice=True, verbose=True):
        # reset guessed letters to empty set and current plausible dictionary to the full dictionary
        self.guessed_letters = []
        self.current_dictionary = self.full_dictionary

        response = self.request("/new_game", {"practice":practice})
        if response.get('status')=="approved":
            game_id = response.get('game_id')
            word = response.get('word')
            self.tries_remains = response.get('tries_remains')
            if verbose:
                print("Successfully start a new game! Game ID: {0}. # of tries remaining: {1}. Word: {2}.".format(game_id, self.tries_remains, word))
            while self.tries_remains > 0:
                # get guessed letter from user code
                guess_letter = self.guess(word)

                # append guessed letter to guessed letters field in hangman object
                self.guessed_letters.append(guess_letter)
                if verbose:
                    print("Guessing letter: {0}".format(guess_letter))

                try:
                    res = self.request("/guess_letter", {"request":"guess_letter", "game_id":game_id, "letter":guess_letter})
                except HangmanAPIError:
                    print('HangmanAPIError exception caught on request.')
                    continue
                except Exception as e:
                    print('Other exception caught on request.')
                    raise e

                if verbose:
                    print("Server response: {0}".format(res))
                status = res.get('status')
                self.tries_remains = res.get('tries_remains')
                if status=="success":
                    if verbose:
                        print("Successfully finished game: {0}".format(game_id))
                    return True
                elif status=="failed":
                    reason = res.get('reason', '# of tries exceeded!')
                    if verbose:
                        print("Failed game: {0}. Because of: {1}".format(game_id, reason))
                    return False
                elif status=="ongoing":
                    word = res.get('word')
        else:
            if verbose:
                print("Failed to start a new game")
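        # reaching this point means the game never started or ended without a win;
        # `status` is undefined when the game was not approved, so return False explicitly
        return False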

    def my_status(self):
        return self.request("/my_status", {})

    def request(
            self, path, args=None, post_args=None, method=None):
        if args is None:
            args = dict()
        if post_args is not None:
            method = "POST"

        # Add `access_token` to post_args or args if it has not already been
        # included.
        if self.access_token:
            # If post_args exists, we assume that args either does not exist
            # or does not need `access_token`.
            if post_args and "access_token" not in post_args:
                post_args["access_token"] = self.access_token
            elif "access_token" not in args:
                args["access_token"] = self.access_token

        time.sleep(0.2)

        num_retry, time_sleep = 50, 2
        for it in range(num_retry):
            try:
                response = self.session.request(
                    method or "GET",
                    self.hangman_url + path,
                    timeout=self.timeout,
                    params=args,
                    data=post_args,
                    verify=False
                )
                break
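            except requests.HTTPError as e:
                # requests exceptions expose the server reply via e.response, not a read() method
                body = e.response.text if e.response is not None else str(e)
                raise HangmanAPIError(body)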
            except requests.exceptions.SSLError as e:
                if it + 1 == num_retry:
                    raise
                time.sleep(time_sleep)

        headers = response.headers
        if 'json' in headers['content-type']:
            result = response.json()
        elif "access_token" in parse_qs(response.text):
            query_str = parse_qs(response.text)
            if "access_token" in query_str:
                result = {"access_token": query_str["access_token"][0]}
                if "expires" in query_str:
                    result["expires"] = query_str["expires"][0]
            else:
                raise HangmanAPIError(response.json())
        else:
            raise HangmanAPIError('Maintype was not text, or querystring')

        if result and isinstance(result, dict) and result.get("error"):
            raise HangmanAPIError(result)
        return result

class HangmanAPIError(Exception):
    def __init__(self, result):
        self.result = result
        self.code = None
        try:
            self.type = result["error_code"]
        except (KeyError, TypeError):
            self.type = ""

        try:
            self.message = result["error_description"]
        except (KeyError, TypeError):
            try:
                self.message = result["error"]["message"]
                self.code = result["error"].get("code")
                if not self.type:
                    self.type = result["error"].get("type", "")
            except (KeyError, TypeError):
                try:
                    self.message = result["error_msg"]
                except (KeyError, TypeError):
                    self.message = result

        Exception.__init__(self, self.message)
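
The two fallback steps at the end of `guess` follow the same pattern: rank letters by frequency, then take the first one that has not been guessed yet. A minimal standalone sketch of that pattern is shown below. The helper name `most_common_unguessed` is hypothetical, and counting each letter at most once per word is an assumption made here for illustration; the notebook's own `func` may count differently.

```python
from collections import Counter


def most_common_unguessed(words, guessed_letters):
    """Return the most frequent letter across `words` not yet guessed.

    Assumption: each word contributes a letter at most once, so words with
    repeated letters do not dominate the count. Returns None when every
    letter that occurs in `words` has already been guessed.
    """
    counts = Counter()
    for word in words:
        counts.update(set(word))  # set() drops repeated letters within a word
    for letter, _ in counts.most_common():
        if letter not in guessed_letters:
            return letter
    return None


candidates = ["apple", "angle", "ample"]
print(most_common_unguessed(candidates, ["a", "e"]))
```

Returning `None` instead of a sentinel such as `'!'` lets the caller chain fallbacks with a plain `if letter is None:` check.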

In [92]:
api = HangmanAPI(access_token="8639b13cb6b189aee1f97ac663b526", timeout=2000)



In [96]:
for _ in range(100):
    api.start_game(practice=1,verbose=False)

In [97]:
total_practice_runs, total_recorded_runs, total_recorded_successes, total_practice_successes = api.my_status()  # game stats: practice runs, recorded runs, recorded wins, practice wins
practice_success_rate = total_practice_successes / total_practice_runs
print('run %d practice games out of an allotted 100,000. practice success rate so far = %.3f' % (total_practice_runs, practice_success_rate))

run 1233 practice games out of an allotted 100,000. practice success rate so far = 0.174
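
Practice games are rate-limited and capped, so it can help to estimate a strategy's success rate offline first. The sketch below simulates the rules described in the instructions (space-separated mask, six incorrect guesses allowed) on a list of held-out words. Everything here is an assumption for illustration: the function names, the `guess_fn(masked, guessed)` signature (the notebook's `guess` takes only the masked word), the fixed letter order, and the choice to charge a repeated guess as incorrect, which the server may handle differently.

```python
def play_local_game(secret, guess_fn, max_wrong=6):
    """Play one offline game; return True if the word is fully revealed."""
    guessed = []
    wrong = 0
    while wrong < max_wrong:
        # same display format as the server: one symbol per letter, space separated
        masked = " ".join(c if c in guessed else "_" for c in secret)
        if "_" not in masked:
            return True
        letter = guess_fn(masked, guessed)
        # assumption: a repeated guess is charged like an incorrect one
        if letter in guessed or letter not in secret:
            wrong += 1
        if letter not in guessed:
            guessed.append(letter)
    return False


def fixed_order_guess(masked, guessed, order="etaoinshrdlucmfwypvbgkjqxz"):
    """Baseline guesser: first letter of a fixed order not guessed yet."""
    for letter in order:
        if letter not in guessed:
            return letter
    return "a"  # every letter used; any return value is charged as a repeat


def local_success_rate(words, guess_fn):
    """Fraction of `words` solved by `guess_fn`; 0.0 for an empty list."""
    if not words:
        return 0.0
    return sum(play_local_game(w, guess_fn) for w in words) / len(words)


print(local_success_rate(["tea", "jazz"], fixed_order_guess))
```

To mirror the test conditions, the evaluation words should be split off from the training dictionary before any statistics are computed from it.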


# **Playing recorded games:**
Please finalize your code before running the cell below. Once this code executes successfully, your submission is final, and our system will not allow you to run any additional games.

After this block of code runs successfully, subsequent runs are expected to fail with the error message "Your account has been deactivated".

Once you've run this section of the code, your submission is complete. Please send us your source code via email.

In [98]:
for i in range(1000):
    print('Playing ', i, ' th game')
    # The following line executes your final recorded runs. Do not run this cell until you are satisfied with your submission
    api.start_game(practice=0, verbose=False)

    # DO NOT REMOVE as otherwise the server may lock you out for too high frequency of requests
    time.sleep(0.5)

Playing  0  th game
Playing  1  th game
Playing  2  th game
Playing  3  th game
Playing  4  th game
Playing  5  th game
Playing  6  th game
Playing  7  th game
Playing  8  th game
Playing  9  th game
Playing  10  th game
Playing  11  th game
Playing  12  th game
Playing  13  th game
Playing  14  th game
Playing  15  th game
Playing  16  th game
Playing  17  th game
Playing  18  th game
Playing  19  th game
Playing  20  th game
Playing  21  th game
Playing  22  th game
Playing  23  th game
Playing  24  th game
Playing  25  th game
Playing  26  th game
Playing  27  th game
Playing  28  th game
Playing  29  th game
Playing  30  th game
Playing  31  th game
Playing  32  th game
Playing  33  th game
Playing  34  th game
Playing  35  th game
Playing  36  th game
Playing  37  th game
Playing  38  th game
Playing  39  th game
Playing  40  th game
Playing  41  th game
Playing  42  th game
Playing  43  th game
Playing  44  th game
Playing  45  th game
Playing  46  th game
Playing  47  th game
Pl

HangmanAPIError: {'error': 'You have reached 1000 of games', 'status': 'denied'}
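
The run above ends with an unhandled `HangmanAPIError` once the server denies further games. One way to stop cleanly and keep a tally is to wrap the loop, as in the sketch below. The helper `run_games` and its return shape are hypothetical; it takes the game-starting call as a plain callable so it does not depend on the API class.

```python
import time


def run_games(start_game, n_games, pause=0.5, stop_on=(Exception,)):
    """Call start_game() up to n_games times, stopping at the first error.

    Returns (games_played, wins, error), where error is None if every game ran.
    `stop_on` is the tuple of exception types that end the run gracefully.
    """
    played = 0
    wins = 0
    for _ in range(n_games):
        try:
            won = start_game()
        except stop_on as err:
            return played, wins, err
        played += 1
        wins += bool(won)
        time.sleep(pause)  # keep the pause between games to avoid a lockout
    return played, wins, None


# Hypothetical usage with the notebook's objects:
# played, wins, err = run_games(
#     lambda: api.start_game(practice=0, verbose=False),
#     1000,
#     stop_on=(HangmanAPIError,),
# )
```

Because recorded games cannot be replayed, this should only be pointed at the recorded endpoint once the strategy is final.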