# Trexquant Interview Project (The Hangman Game)

* Copyright Trexquant Investment LP. All Rights Reserved. 
* Redistribution of this question without written consent from Trexquant is prohibited

## Instruction:
For this coding test, your mission is to write an algorithm that plays the game of Hangman through our API server. 

When a user plays Hangman, the server first selects a secret word at random from a list. The server then returns a row of underscores (space separated)—one for each letter in the secret word—and asks the user to guess a letter. If the user guesses a letter that is in the word, the word is redisplayed with all instances of that letter shown in the correct positions, along with any letters correctly guessed on previous turns. If the letter does not appear in the word, the user is charged with an incorrect guess. The user keeps guessing letters until either (1) the user has correctly guessed all the letters in the word
or (2) the user has made six incorrect guesses.

You are required to write a "guess" function that takes current word (with underscores) as input and returns a guess letter. You will use the API codes below to play 1,000 Hangman games. You have the opportunity to practice before you want to start recording your game results.

Your algorithm is permitted to use a training set of approximately 250,000 dictionary words. Your algorithm will be tested on an entirely disjoint set of 250,000 dictionary words. Please note that this means the words that you will ultimately be tested on do NOT appear in the dictionary that you are given. You are not permitted to use any dictionary other than the training dictionary we provided. This requirement will be strictly enforced by code review.

You are provided with a basic, working algorithm. This algorithm will match the provided masked string (e.g. a _ _ l e) to all possible words in the dictionary, tabulate the frequency of letters appearing in these possible words, and then guess the letter with the highest frequency of appearence that has not already been guessed. If there are no remaining words that match then it will default back to the character frequency distribution of the entire dictionary.

This benchmark strategy is successful approximately 18% of the time. Your task is to design an algorithm that significantly outperforms this benchmark.

In [11]:
# import tensorflow as tf
import numpy as np

In [12]:
import json
import requests
import random
import string
import secrets
import time
import re
import collections
try:
    from urllib.parse import parse_qs, urlencode, urlparse
except ImportError:
    from urlparse import parse_qs, urlparse
    from urllib import urlencode

## Training models

Defining a sequential neural network consisting of 3 dense layers with 16, 32 and 8 units ("relu" activation) and 1 dense layer with 26 units ("linear" activation)

In [13]:
def NNModel(input_dim=9):
    
    model = Sequential()
    model.add(Dense(16, input_dim=input_dim, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(26, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
    return model

Defining function that encodes the word in the training set. It takes input the word and the already guessed letters list and returns an array with length equal to (word length + 26). For the letters which have been guessed, corresponding indices (e.g. for alphabet 'd', it'll be 4th element) get value 1 and others 0. If any of the letters from the word has been guessed then the list 'encoding' gets value accordingly (mapping 'a' to 'z' with 1 to 26 and '_' with 27).   

In [14]:

def encodeWord(word, guesses):
    encoding = []
    for c in word:
        ci = ord(c)-ord('a')
        if(guesses[ci]==0):
            encoding += [27]
        else:
            encoding += [ord(c)-ord('a')+1]
    return np.concatenate([np.array(encoding), guesses])

Accessing the given training set file and making a list of words out of that.

In [6]:
words = []
with open('words_250000_train.txt','r') as f:
    for l in f.readlines():
        words += [l[:-1]]

Compiling a dictionary according to the length of the words.

In [7]:
words_by_length = {}
for w in words:
    if(len(w) in words_by_length.keys()):
        words_by_length[len(w)] += [w]
    else:
        words_by_length[len(w)] = [w]

This getValidGuesses function returns the probabilities of all the letters being a valid guess. If some particular letter which is in our word is yet to be guessed, then it's a valid guess and will have non zero probability.

In [16]:
def getValidGuesses(word, guesses):
    valids = np.zeros(26)
    for c in word:
        ci = ord(c) - ord('a')
        if(guesses[ci]==0):
            valids[ci]+=1
    return valids/(sum(valids)+1)
            

Making models for different word lengths.

In [1]:
models_all = {}
for k in words_by_length.keys():
    print(k)
    try:
        models_all[k] = tf.keras.models.load_model('results/model_'+str(k)+'_0')
    except:
        models_all[k] = NNModel(k+26)
    

NameError: name 'words_by_length' is not defined

The "getAccuracy" function returns the success rate, ratio of wins over total games provided a testing set.

In [None]:
########testing###########
def getAccuracy(word_length, test_set):
    wins = 0
    losses = 0
    for word in test_set:
        tries_left = 6
        guesses = np.zeros(26)
        valid_guesses = getValidGuesses(word,guesses)
        while(tries_left!=0 and sum(valid_guesses)!=0):
            encoded = encodeWord(word, guesses).reshape(1,-1)
            predicted = models_all[word_length].predict([encoded])

            predicted[0][(guesses!=0)] = 0
            new_guess = np.argmax(predicted)

            guesses[new_guess] = 1

            if(valid_guesses[new_guess]>0):
                valid_guesses = getValidGuesses(word, guesses)
            else:
                tries_left-=1

        if(tries_left==0):
            losses += 1
        else:
            wins += 1

    print('wins: ',wins,"and losses:",losses)
    return wins/(wins+losses)

Training the models over the word dictionary provided, returning an output vector giving probabilities for all the letters to be the next possible letter. And then the letter with the maximum probability is picked and looked for its probability in the valid_guesses. If the probability for this letter being valid is non zero then this is our next guess otherwise we loose a try.
This training of all the models for all the word lengths is going over and over again 50 times, so models are getting better in every iteration.

In [None]:
epochs = 50
for e in range(epochs):
    print('epoch:',e)
    for word_length in list(sorted(words_by_length.keys())):
            
            # 80% words of that particular word length have been taken as training set to train the model for that length
            training_set = np.random.choice(words_by_length[word_length],int(0.8*len(words_by_length[word_length])))
            test_set = [x for x in words_by_length[word_length] if x not in training_set]
    
            print('word_length: ',word_length)
            wins = 0
            losses = 0
            gc.collect()
            for wi,word in enumerate(training_set):
                tries_left = 6
                guesses = np.zeros(26)
                valid_guesses = getValidGuesses(word,guesses)

                encoded_all = []
                valid_guesses_all = []
                
                # "encoded", "encoded1", "encoded2" have been taken so that words with 's', 'es', 'er', 'ly' at the end
                # can be dealt with easily, e.g. if word is 'stocks' (6 letter word) then it is possible that the model
                # for word length 5 (for word stock) can guess the word easily as compared to model for word length 6
                while(tries_left!=0 and sum(valid_guesses)!=0):
                    encoded = encodeWord(word, guesses)
                    encoded1 = encodeWord(word[:-1], guesses)
                    encoded2 = encodeWord(word[:-2], guesses)

                    encoded_all += [encoded]
                    valid_guesses_all += [valid_guesses]

                    predicted = models_all[word_length].predict([encoded.reshape(1,-1)])
                    predicted1 = models_all[word_length-1].predict([encoded1.reshape(1,-1)])
                    predicted2 = models_all[word_length-2].predict([encoded2.reshape(1,-1)])
                    
                    
                    
                    predicted[0][(guesses!=0)] = 0
                    predicted1[0][(guesses!=0)] = 0
                    predicted2[0][(guesses!=0)] = 0
                    
                    # taking the one with the maximum probability
                    new_guess = np.argmax(predicted)
                    new_guess1 = np.argmax(predicted1)
                    new_guess2 = np.argmax(predicted2)
                    
                    ngv = predicted[0][new_guess]
                    ngv1 = predicted1[0][new_guess1]
                    ngv2 = predicted2[0][new_guess2]
                    
                    final_guess = new_guess
                    
                    if(ngv1>ngv and ngv1>ngv2):
                        final_guess = new_guess1
                    elif(ngv2>ngv and ngv2>ngv1):
                        final_guess = new_guess2
                        

                    guesses[final_guess] = 1

                    if(valid_guesses[final_guess]>0):
                        valid_guesses = getValidGuesses(word, guesses)
                    else:
                        tries_left-=1

                encoded_all += [encodeWord(word, guesses)]
                valid_guesses_all += [valid_guesses]



                encoded_all = np.array(encoded_all)
                valid_guesses_all = np.array(valid_guesses_all)

                models_all[word_length].fit(encoded_all,valid_guesses_all, verbose=False)        

                if(tries_left==0):
                    losses += 1

                else:
                    wins += 1
                if(wi%200==0):
                    print('wins:',wins,'losses:',losses)

            print('Training: wins: ',wins,"and losses:",losses)
            print('Testing',getAccuracy(word_length, test_set))
            models_all[word_length].save('model_'+str(word_length)+'_'+str(e))
        

## Training complete

## Hangman API Code

Below block includes the new guess function which uses the functions and models trained above to return the next letter to be that should be guessed. The previous guess function has been commented.

In [24]:
HANGMAN_URL = "https://www.trexsim.com/trexsim/hangman"
from keras import backend as K

class HangmanAPI(object):
    def __init__(self, access_token=None, session=None, timeout=None):
        self.access_token = access_token
        self.session = session or requests.Session()
        self.timeout = timeout
        self.guessed_letters = []
        
        full_dictionary_location = "words_250000_train.txt"
        self.full_dictionary = self.build_dictionary(full_dictionary_location)        
        self.full_dictionary_common_letter_sorted = collections.Counter("".join(self.full_dictionary)).most_common()
        
        self.current_dictionary = []
    
        
        self.models = {}
        for k in range(29):
            print('loading: ',k)
            try:
                self.models[k] = tf.keras.models.load_model('results/model_'+str(k)+'_1')
            except:
                try:
                    self.models[k] = tf.keras.models.load_model('results/model_'+str(k)+'_0')
                except:
                    continue
                    
    #representing already guessed letters list as a vector of integers        
    def encodeGuesses(self):
        guesses = np.zeros(26)
        for g in self.guessed_letters:
            gi = ord(g)-ord('a')
            guesses[gi] = 1
        return guesses
    
    
    #encodig the partially or fully obscured word
    def encodeHiddenWord(self, word, guesses):
        encoding = []
        for c in word:
            ci = ord(c)-ord('a')+1
            if(ci>=0 and ci<=26):
                encoding += [ci]
            else:
                encoding += [27]
        return np.concatenate([np.array(encoding), guesses])
        
        
    def guess(self, word): # word input example: "_ p p _ e "
        ###############################################
        # Replace with your own "guess" function here #
        ###############################################

        # clean the word so that we strip away the space characters
        
        clean_word = word.replace(' ','')

        
        #representing the word with underscores
        encode = self.encodeHiddenWord(clean_word, self.encodeGuesses()).reshape(1,-1)
        if(len(clean_word)>1):
            encode1 = self.encodeHiddenWord(clean_word[:-1], self.encodeGuesses()).reshape(1,-1)
        if(len(clean_word)>2):
            encode2 = self.encodeHiddenWord(clean_word[:-2], self.encodeGuesses()).reshape(1,-1)
        
        pred, pred1, pred2 = np.zeros(26), np.zeros(26),np.zeros(26)
        
        if(len(clean_word) in self.models.keys()):
            pred = self.models[len(clean_word)].predict(encode)[0]
            
            for g in self.guessed_letters:
                pred[ord(g)-ord('a')] = 0
        if((len(clean_word)-1) in self.models.keys()):
            pred1 = self.models[len(clean_word)-1].predict(encode1)[0]
            
            for g in self.guessed_letters:
                pred1[ord(g)-ord('a')] = 0
                
        if((len(clean_word)-2) in self.models.keys()):
            pred2 = self.models[len(clean_word)-2].predict(encode2)[0]
            
            for g in self.guessed_letters:
                pred2[ord(g)-ord('a')] = 0
        
        if(sum(pred)>0 or sum(pred1)>0 or sum(pred2)>0):
            p = [np.argmax(pred),np.argmax(pred1),np.argmax(pred2)]
            p_ = [pred[i] for i in p]
            new_guess = p[np.argmax(p_)]

            return chr(ord('a')+new_guess)
        else:
            while True:
                x = np.random.choice(list(range(26)),1)[0]
                if(chr(ord('a')+x) not in self.guessed_letters):
                    return chr(ord('a')+x)
        

    ##########################################################
    # You'll likely not need to modify any of the code below #
    ##########################################################
    
    def build_dictionary(self, dictionary_file_location):
        text_file = open(dictionary_file_location,"r")
        full_dictionary = text_file.read().splitlines()
        text_file.close()
        return full_dictionary
                
    def start_game(self, practice=True, verbose=True):
        # reset guessed letters to empty set and current plausible dictionary to the full dictionary
        self.guessed_letters = []
        self.current_dictionary = self.full_dictionary
                         
        response = self.request("/new_game", {"practice":practice})
        if response.get('status')=="approved":
            game_id = response.get('game_id')
            word = response.get('word')
            tries_remains = response.get('tries_remains')
            if verbose:
                print("Successfully start a new game! Game ID: {0}. # of tries remaining: {1}. Word: {2}.".format(game_id, tries_remains, word))
            while tries_remains>0:
                # get guessed letter from user code
                guess_letter = self.guess(word)
                    
                # append guessed letter to guessed letters field in hangman object
                self.guessed_letters.append(guess_letter)
                if verbose:
                    print("Guessing letter: {0}".format(guess_letter))
                    
                try:    
                    res = self.request("/guess_letter", {"request":"guess_letter", "game_id":game_id, "letter":guess_letter})
                except HangmanAPIError:
                    print('HangmanAPIError exception caught on request.')
                    continue
                except Exception as e:
                    print('Other exception caught on request.')
                    raise e
               
                if verbose:
                    print("Sever response: {0}".format(res))
                status = res.get('status')
                tries_remains = res.get('tries_remains')
                if status=="success":
                    if verbose:
                        print("Successfully finished game: {0}".format(game_id))
                    return True
                elif status=="failed":
                    reason = res.get('reason', '# of tries exceeded!')
                    if verbose:
                        print("Failed game: {0}. Because of: {1}".format(game_id, reason))
                    return False
                elif status=="ongoing":
                    word = res.get('word')
        else:
            if verbose:
                print("Failed to start a new game")
        return status=="success"
        
    def my_status(self):
        return self.request("/my_status", {})
    
    def request(
            self, path, args=None, post_args=None, method=None):
        if args is None:
            args = dict()
        if post_args is not None:
            method = "POST"

        # Add `access_token` to post_args or args if it has not already been
        # included.
        if self.access_token:
            # If post_args exists, we assume that args either does not exists
            # or it does not need `access_token`.
            if post_args and "access_token" not in post_args:
                post_args["access_token"] = self.access_token
            elif "access_token" not in args:
                args["access_token"] = self.access_token

        num_retry, time_sleep = 5, 2                                                                                        
        for it in range(num_retry):                                                                                         
            try:                                                                                                            
                response = self.session.request(                                                                            
                    method or "GET",                                                                                        
                    HANGMAN_URL + path,                                                                                     
                    timeout=self.timeout,                                                                                   
                    params=args,                                                                                            
                    data=post_args                                                                                          
                )                                                                                                           
                break                                                                                                       
            except requests.HTTPError as e:                                                                                 
                response = json.loads(e.read())                                                                             
                raise HangmanAPIError(response)                                                                             
            except requests.exceptions.SSLError as e:                                                                       
                if it + 1 == num_retry:                                                                                     
                    raise                                                                                                   
                time.sleep(time_sleep)  

        headers = response.headers
        if 'json' in headers['content-type']:
            result = response.json()
        elif "access_token" in parse_qs(response.text):
            query_str = parse_qs(response.text)
            if "access_token" in query_str:
                result = {"access_token": query_str["access_token"][0]}
                if "expires" in query_str:
                    result["expires"] = query_str["expires"][0]
            else:
                raise HangmanAPIError(response.json())
        else:
            raise HangmanAPIError('Maintype was not text, or querystring')

        if result and isinstance(result, dict) and result.get("error"):
            raise HangmanAPIError(result)
        return result
    
class HangmanAPIError(Exception):
    def __init__(self, result):
        self.result = result
        self.code = None
        try:
            self.type = result["error_code"]
        except (KeyError, TypeError):
            self.type = ""

        try:
            self.message = result["error_description"]
        except (KeyError, TypeError):
            try:
                self.message = result["error"]["message"]
                self.code = result["error"].get("code")
                if not self.type:
                    self.type = result["error"].get("type", "")
            except (KeyError, TypeError):
                try:
                    self.message = result["error_msg"]
                except (KeyError, TypeError):
                    self.message = result

        Exception.__init__(self, self.message)

# API Usage Examples

## To start a new game:
1. Make sure you have implemented your own "guess" method.
2. Use the access_token that we sent you to create your HangmanAPI object. 
3. Start a game by calling "start_game" method.
4. If you wish to test your function without being recorded, set "practice" parameter to 1.
5. Note: You have a rate limit of 20 new games per minute. DO NOT start more than 20 new games within one minute.

In [25]:
api = HangmanAPI(access_token="5ffea8d477a5d209ba7dfbeb78ae1e", timeout=2000)


loading:  0
loading:  1
loading:  2
loading:  3
loading:  4
loading:  5
loading:  6
loading:  7
loading:  8
loading:  9
loading:  10
loading:  11
loading:  12
loading:  13
loading:  14
loading:  15
loading:  16
loading:  17
loading:  18
loading:  19
loading:  20
loading:  21
loading:  22
loading:  23
loading:  24
loading:  25
loading:  26
loading:  27
loading:  28


## Playing practice games:
You can use the command below to play up to 100,000 practice games.

In [31]:
api.start_game(practice=1,verbose=True)
[total_practice_runs,total_recorded_runs,total_recorded_successes] = api.my_status() # Get my game stats: (# of tries, # of wins)
print('run %d practice games out of an allotted 100,000' %total_practice_runs)


HangmanAPIError: {'error': 'Your account has been deactivated!'}

## Playing recorded games:
Please finalize your code prior to running the cell below. Once this code executes once successfully your submission will be finalized. Our system will not allow you to rerun any additional games.

Please note that it is expected that after you successfully run this block of code that subsequent runs will result in the error message "Your account has been deactivated".

Once you've run this section of the code your submission is complete. Please send us your source code via email.

In [29]:
for i in range(1000):
    print('Playing ', i, ' th game')
    # Uncomment the following line to execute your final runs. Do not do this until you are satisfied with your submission
    api.start_game(practice=0,verbose=False)
    
    # DO NOT REMOVE as otherwise the server may lock you out for too high frequency of requests
    time.sleep(0.5)

Playing  0  th game
Playing  1  th game
Playing  2  th game
Playing  3  th game
Playing  4  th game
Playing  5  th game
Playing  6  th game
Playing  7  th game
Playing  8  th game
Playing  9  th game
Playing  10  th game
Playing  11  th game
Playing  12  th game
Playing  13  th game
Playing  14  th game
Playing  15  th game
Playing  16  th game
Playing  17  th game
Playing  18  th game
Playing  19  th game
Playing  20  th game
Playing  21  th game
Playing  22  th game
Playing  23  th game
Playing  24  th game
Playing  25  th game
Playing  26  th game
Playing  27  th game
Playing  28  th game
Playing  29  th game
Playing  30  th game
Playing  31  th game
Playing  32  th game
Playing  33  th game
Playing  34  th game
Playing  35  th game
Playing  36  th game
Playing  37  th game
Playing  38  th game
Playing  39  th game
Playing  40  th game
Playing  41  th game
Playing  42  th game
Playing  43  th game
Playing  44  th game
Playing  45  th game
Playing  46  th game
Playing  47  th game
Pl

Playing  378  th game
Playing  379  th game
Playing  380  th game
Playing  381  th game
Playing  382  th game
Playing  383  th game
Playing  384  th game
Playing  385  th game
Playing  386  th game
Playing  387  th game
Playing  388  th game
Playing  389  th game
Playing  390  th game
Playing  391  th game
Playing  392  th game
Playing  393  th game
Playing  394  th game
Playing  395  th game
Playing  396  th game
Playing  397  th game
Playing  398  th game
Playing  399  th game
Playing  400  th game
Playing  401  th game
Playing  402  th game
Playing  403  th game
Playing  404  th game
Playing  405  th game
Playing  406  th game
Playing  407  th game
Playing  408  th game
Playing  409  th game
Playing  410  th game
Playing  411  th game
Playing  412  th game
Playing  413  th game
Playing  414  th game
Playing  415  th game
Playing  416  th game
Playing  417  th game
Playing  418  th game
Playing  419  th game
Playing  420  th game
Playing  421  th game
Playing  422  th game
Playing  4

Playing  751  th game
Playing  752  th game
Playing  753  th game
Playing  754  th game
Playing  755  th game
Playing  756  th game
Playing  757  th game
Playing  758  th game
Playing  759  th game
Playing  760  th game
Playing  761  th game
Playing  762  th game
Playing  763  th game
Playing  764  th game
Playing  765  th game
Playing  766  th game
Playing  767  th game
Playing  768  th game
Playing  769  th game
Playing  770  th game
Playing  771  th game
Playing  772  th game
Playing  773  th game
Playing  774  th game
Playing  775  th game
Playing  776  th game
Playing  777  th game
Playing  778  th game
Playing  779  th game
Playing  780  th game
Playing  781  th game
Playing  782  th game
Playing  783  th game
Playing  784  th game
Playing  785  th game
Playing  786  th game
Playing  787  th game
Playing  788  th game
Playing  789  th game
Playing  790  th game
Playing  791  th game
Playing  792  th game
Playing  793  th game
Playing  794  th game
Playing  795  th game
Playing  7

HangmanAPIError: {'error': 'You have reached 1000 of games', 'status': 'denied'}

## To check your game statistics
1. Simply use "my_status" method.
2. Returns your total number of games, and number of wins.

In [32]:
[total_practice_runs,total_recorded_runs,total_recorded_successes] = api.my_status() # Get my game stats: (# of tries, # of wins)
success_rate = total_recorded_successes/total_recorded_runs
print('overall success rate = %.3f' % success_rate)

overall success rate = 0.113
