# <a style="color:red;">Project - Wordle</a>

## Table of Contents

- [Rules](#rules)
- [TODO](#todo)
- [Tips](#tips)
- [Test](#test)
- [Solution](#solution)

---

Wordle is a word game developed by Josh Wardle. It recently became very popular and was added to the NYT Games website. You can find the original game [here](https://www.nytimes.com/games/wordle/index.html). However, you can only play it once a day.

Luckily, in the version of Wordle you are going to program, you will be able to play as many times as you want in a day. Moreover, you will be able to see which words could still be the right answer. What is more, you will be using a bigger data set than the actual Wordle: essentially all the 5-letter words in a Scrabble dictionary.

### Rules

- The player enters a 5-letter word as a guess.
- If the guess is the word to be guessed, the game is over and the player receives a congratulations message.
- Otherwise, the player is told which letters are in the right place and which letters are in the word but in the wrong place.
- Using this feedback, the player has 6 tries in total to guess the word.
- If the player fails to guess the word after 6 attempts, the word is revealed.
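The feedback rule can be sketched as a small Python function. This is a minimal sketch, not the project's reference solution: `score_guess` is a hypothetical name, feedback is encoded as `G` (right place), `Y` (wrong place) and `.` (not in the word), and repeated letters follow the two-pass convention commonly used by Wordle (greens first, then yellows only while unmatched copies of the letter remain).

```python
from collections import Counter

GREEN, YELLOW, GREY = "G", "Y", "."


def score_guess(secret: str, guess: str) -> str:
    """Return one feedback symbol per letter of `guess`.

    G = right letter, right place; Y = letter is in the word but elsewhere;
    . = letter is not in the word (or all its copies are already accounted for).
    """
    secret, guess = secret.upper(), guess.upper()
    if len(secret) != len(guess):
        raise ValueError("secret and guess must have the same length")

    result = [GREY] * len(guess)
    # Letters of the secret that were not matched exactly; only these can earn a yellow.
    unmatched = Counter()

    # Pass 1: exact matches (green).
    for i, (s, g) in enumerate(zip(secret, guess)):
        if s == g:
            result[i] = GREEN
        else:
            unmatched[s] += 1

    # Pass 2: misplaced letters (yellow), limited by how many copies remain.
    for i, g in enumerate(guess):
        if result[i] != GREEN and unmatched[g] > 0:
            result[i] = YELLOW
            unmatched[g] -= 1

    return "".join(result)


print(score_guess("LANCE", "HELLO"))
```

For example, `score_guess("LANCE", "HELLO")` returns `.YY..`: only one `L` is marked yellow because `LANCE` contains a single `L`.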

### Todo

1. Read the possible words from the txt file and save them in a list.
2. Make sure that the user gets exactly 6 attempts.
3. Make sure that the input is processed correctly regardless of letter case.
4. Make sure that you use appropriate data structures for valid characters, invalid positions, and invalid characters.
5. Use the `random` module to make sure the word to be guessed is chosen at random.
6. Narrow down the cluster of potential words accordingly and reveal it to the player each round.
7. If the player first guesses a correct letter in the wrong place and later places it correctly, remove that letter from your record of valid characters and their invalid positions.
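One possible choice of data structures for step 4, sketched in Python. The names (`Knowledge`, `correct`, `present`, `absent`, `update`) are hypothetical, and the sketch assumes each guess comes with a feedback string of `G` (green), `Y` (yellow) and `.` (grey). `correct` maps a position to its known letter, `present` maps each valid letter to the set of positions where it is known *not* to be, and `absent` holds the invalid letters. Step 7 is handled by dropping a letter from `present` once it is found at its right place. The bookkeeping is deliberately simple and does not try to cover every repeated-letter case.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Knowledge:
    """What the player has learned so far (hypothetical structure)."""
    correct: Dict[int, str] = field(default_factory=dict)       # position -> letter (green)
    present: Dict[str, Set[int]] = field(default_factory=dict)  # valid letter -> invalid positions (yellow)
    absent: Set[str] = field(default_factory=set)               # invalid letters (grey)

    def update(self, guess: str, feedback: str) -> None:
        """Record one guess; `feedback` holds 'G', 'Y' or '.' for each letter."""
        guess = guess.upper()
        # Letters this guess confirmed to be somewhere in the word.
        found = {letter for letter, mark in zip(guess, feedback) if mark in "GY"}
        for pos, (letter, mark) in enumerate(zip(guess, feedback)):
            if mark == "G":
                self.correct[pos] = letter
                # Step 7: the letter's place is known now, so stop tracking its invalid positions.
                self.present.pop(letter, None)
            elif mark == "Y":
                self.present.setdefault(letter, set()).add(pos)
            else:
                # Only mark a grey letter invalid if it is not known to be in the word.
                known = (letter in found or letter in self.present
                         or letter in self.correct.values())
                if not known:
                    self.absent.add(letter)


knowledge = Knowledge()
knowledge.update("MILKY", ".Y...")
print(knowledge)
```

Positions are 0-based, so after guessing `MILKY` against `BUIST` the sketch records that `I` is in the word but not at position 1.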

### Tips

- At the very beginning, every word has a chance of being the word to be guessed.
- A word is invalid when it contains an invalid letter or when it has a valid letter at a position already known to be wrong.
- A word is possible when it isn't invalid, contains every correctly guessed letter at the right place, and contains every valid letter somewhere.
- You can seed the random number generator with:

```python
from random import randint, seed
seed()
```
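The tips above translate fairly directly into a filter. A minimal sketch, assuming the knowledge gathered so far is stored as `correct` (position to letter), `present` (valid letter to its set of invalid positions) and `absent` (set of invalid letters), with 0-based positions; these names and the helpers `is_possible` and `cluster` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def is_possible(word: str,
                correct: Dict[int, str],
                present: Dict[str, Set[int]],
                absent: Set[str]) -> bool:
    """Apply the tips: reject invalid words, keep words that match the known greens."""
    word = word.upper()
    # Invalid: the word contains a letter known not to be in the answer.
    if any(letter in absent for letter in word):
        return False
    for letter, bad_positions in present.items():
        # Invalid: a valid letter sits at a position already ruled out...
        if any(word[pos] == letter for pos in bad_positions):
            return False
        # ...or the valid letter is missing altogether.
        if letter not in word:
            return False
    # Possible: every correctly guessed letter is at its right place.
    return all(word[pos] == letter for pos, letter in correct.items())


def cluster(words: Iterable[str], correct, present, absent) -> List[str]:
    """Return the words that are still possible answers, upper-cased."""
    return [w.upper() for w in words if is_possible(w, correct, present, absent)]


# Knowledge after guessing MILKY, POUND and RATES against BUIST.
present = {"I": {1}, "U": {2}, "T": {2}, "S": {4}}
absent = set("MLKYPONDRAE")
print(cluster(["buist", "busti", "quist", "milky", "bitsu", "busts"], {}, present, absent))
```

Here `MILKY` is rejected for containing invalid letters, `BITSU` for putting `I` at a ruled-out position, and `BUSTS` for missing the valid letter `I`.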

### Test

If the word to be guessed is _BUIST_ and I first guess _MILKY_, my cluster of potential words should consist of 1127 words. If I then guess _POUND_, the cluster should shrink to 52 words. If I then guess _RATES_, the cluster should consist of only 3 words: `['BUIST', 'BUSTI', 'QUIST']`.
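One way to check this test is to keep every candidate that would have produced exactly the same feedback as the secret word for each guess so far. This feedback-consistency check is a swapped-in technique, not necessarily the one the exercise intends. Reproducing the counts above requires the full Scrabble word list, and because the check follows Wordle's repeated-letter rules, its counts could differ slightly from a filter built directly from the tips. The function names `feedback` and `potential_words` are hypothetical, and the sketch runs on a small hypothetical list.

```python
from collections import Counter
from typing import List, Sequence


def feedback(secret: str, guess: str) -> str:
    """Wordle-style feedback: G = right place, Y = wrong place, . = not in the word."""
    result = ["."] * len(guess)
    # Secret letters not matched exactly can still produce yellows.
    unmatched = Counter(s for s, g in zip(secret, guess) if s != g)
    for i, (s, g) in enumerate(zip(secret, guess)):
        if s == g:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def potential_words(words: Sequence[str], secret: str, guesses: Sequence[str]) -> List[str]:
    """Keep the words that are consistent with the feedback of every guess so far."""
    secret = secret.upper()
    candidates = [w.upper() for w in words]
    for guess in guesses:
        guess = guess.upper()
        target = feedback(secret, guess)
        candidates = [w for w in candidates if feedback(w, guess) == target]
        print(f"after {guess}: {len(candidates)} potential words")
    return candidates


demo_words = ["buist", "busti", "quist", "milky", "pound", "rates", "bitts", "busts"]
print(potential_words(demo_words, "BUIST", ["MILKY", "POUND", "RATES"]))
# last line printed: ['BUIST', 'BUSTI', 'QUIST']
```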

### Solution

#### Build Data

First, we need to download a corpus of English words (or words in any language you are interested in). A simple Google search turns up many datasets; however, we want one that also includes frequency information. Word frequency tells us how popular a word is, which lets us control the difficulty of the game: picking from the most popular words makes it easy, while picking rare words makes the word difficult to guess.

For the purpose of this project, we use data in [Kaggle English Word Frequency](https://www.kaggle.com/datasets/rtatman/english-word-frequency). This dataset contains the counts of the 333,333 most commonly-used single words on the English language web, as derived from the Google Web Trillion Word Corpus.

The dataset is provided as a `.csv` file. Since you may not know how to work with that format yet, we convert it to a plain `.txt` file with one comma-separated `word, frequency` pair per line.
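If you would rather do the conversion in Python, the built-in `csv` module can read the Kaggle file. A minimal sketch, assuming the CSV has a header row with `word` and `count` columns (check the header of your download); the helper `csv_to_txt` is hypothetical, and it writes the same `word, frequency` line format that the code below reads.

```python
import csv
import os
import tempfile


def csv_to_txt(csv_path: str, txt_path: str) -> int:
    """Write each CSV row as a 'word, count' line; return the number of rows written."""
    rows = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(txt_path, "w", encoding="utf-8") as dst:
        for record in csv.DictReader(src):  # assumes a 'word,count' header row
            dst.write(f"{record['word']}, {int(record['count'])}\n")
            rows += 1
    return rows


# Quick check on a two-row sample in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample_csv = os.path.join(tmp, "sample.csv")
    sample_txt = os.path.join(tmp, "sample.txt")
    with open(sample_csv, "w", newline="", encoding="utf-8") as f:
        f.write("word,count\nthe,23135851162\nof,13151942776\n")
    n_rows = csv_to_txt(sample_csv, sample_txt)
    with open(sample_txt, encoding="utf-8") as f:
        sample_lines = f.read().splitlines()

print(n_rows, sample_lines)
```

On the real data this would be called as, for example, `csv_to_txt("unigram_freq.csv", "./src/data/words_frequency.txt")`, where the CSV file name is an assumption.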


```python
# Step 1: Read file - len word - limit
def generate_word_frequency(file_path, word_len: int = 5, limit: int = 1000):
    pass


file_path = './src/data/words_frequency.txt'

word_len = 5
limit = 1000
```

```python
with open(file_path) as f:
    for line in f:
        print(line.strip())
        break
```

```
the, 23135851162
```


```python
with open(file_path) as f:
    for line in f:
        word, frequency = line.split(',')
        print(type(frequency))  # split() returns strings, so the count needs int()
        print(word, int(frequency))
        break
```

```
<class 'str'>
the 23135851162
```


```python
words = []

with open(file_path) as f:
    for line in f:
        word, frequency = line.split(',')
        # int() also ignores the leading space and trailing newline
        words.append((word, int(frequency)))
```

```python
words
```

```
[('the', 23135851162),
 ('of', 13151942776),
 ('and', 12997637966),
 ('to', 12136980858),
 ('a', 9081174698),
 ('in', 8469404971),
 ('for', 5933321709),
 ('is', 4705743816),
 ...
```

```python
# Build data: (word, frequency) pairs
words_freq = []
with open(file_path) as f:
    for line in f:
        word, frequency = line.split(',')
        words_freq.append((word, int(frequency)))

# Sort by frequency, most frequent first
words_freq = sorted(words_freq, key=lambda w_freq: w_freq[1], reverse=True)

# Keep only the `limit` most frequent words
words_freq = words_freq[:limit]

# Drop frequency data
words = [w_freq[0] for w_freq in words_freq]

# Keep only word_len-letter words
words = list(filter(lambda w: len(w) == word_len, words))
```
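Sorting all 333,333 pairs just to keep the top `limit` does more work than needed. A possible alternative, sketched here as an assumption rather than the notebook's approach, is `heapq.nlargest(n, iterable, key=...)`, which the Python documentation describes as equivalent to `sorted(iterable, key=key, reverse=True)[:n]` while only keeping `n` items in memory. The helper name `top_words` is hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def top_words(lines: Iterable[str], word_len: int = 5, limit: int = 1000) -> List[str]:
    """Same steps as above (sort, limit, length filter) without a full sort."""
    def parse(line: str) -> Tuple[str, int]:
        word, frequency = line.split(",")
        return word.strip(), int(frequency)

    pairs = (parse(line) for line in lines if line.strip())  # skip blank lines
    # The `limit` most frequent (word, frequency) pairs, most frequent first.
    best = heapq.nlargest(limit, pairs, key=lambda pair: pair[1])
    return [word for word, _ in best if len(word) == word_len]


sample = ["the, 23135851162\n", "about, 1226734006\n", "other, 97\n", "which, 500\n", "\n"]
print(top_words(sample, word_len=5, limit=3))
```

With the real file, `with open(file_path) as f: words = top_words(f, word_len, limit)` would stream the lines without building the full list first.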

```python
words
```

```
['about',
 'other',
 'which',
 'their',
 'there',
 'first',
 'would',
 'these',
 ...
```


```python
# Step 1: Read file - len word - limit
def generate_word_frequency(file_path, word_len: int = 5, limit: int = 1000):
    # Build data: (word, frequency) pairs
    words_freq = []
    with open(file_path) as f:
        for line in f:
            word, frequency = line.split(',')
            words_freq.append((word, int(frequency)))

    # Sort by frequency, most frequent first
    words_freq = sorted(words_freq, key=lambda w_freq: w_freq[1], reverse=True)

    # Keep only the `limit` most frequent words
    words_freq = words_freq[:limit]

    # Drop frequency data
    words = [w_freq[0] for w_freq in words_freq]

    # Keep only word_len-letter words
    words = list(filter(lambda w: len(w) == word_len, words))

    return words
```

```python
word_len = 5
limit = 10000

words = generate_word_frequency(file_path, word_len=word_len, limit=limit)
```

```python
words
```

```
['about',
 'other',
 'which',
 'their',
 'there',
 'first',
 'would',
 'these',
 ...
```
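Note that `generate_word_frequency` applies `limit` before the length filter, so `limit=10000` returns the 5-letter words among the 10,000 most frequent words, not 10,000 five-letter words. Because the result stays sorted by frequency, it can also drive the difficulty idea mentioned at the start of this section. A hypothetical sketch: draw the secret word from a different rank window per level. The level names, window sizes and the helper `pick_secret` are made up for illustration.

```python
import random
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical rank windows into a frequency-sorted word list.
LEVELS: Dict[str, Tuple[int, Optional[int]]] = {
    "easy": (0, 500),        # the 500 most frequent words
    "medium": (500, 2000),
    "hard": (2000, None),    # everything rarer than rank 2000
}


def pick_secret(words_by_freq: Sequence[str], level: str = "easy",
                rng: Optional[random.Random] = None) -> str:
    """Pick an upper-case secret word from the rank window of `level`."""
    start, stop = LEVELS[level]
    pool = list(words_by_freq[start:stop])
    if not pool:
        raise ValueError(f"no words in the {level!r} window; is the list long enough?")
    # Use the given generator (for reproducibility) or the module-level one.
    return (rng or random).choice(pool).upper()


ranked = [f"w{i:04d}" for i in range(3000)]  # stand-in for a frequency-sorted list
print(pick_secret(ranked, "hard", random.Random(42)))
```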


#### Select a random word

```python
import random

random.seed(42)
word = random.choice(words)
word = word.upper()
word
```

```
'LANCE'
```

#### Wordle

```python
from termcolor import colored

print(colored('Hi', 'yellow', attrs=['reverse']))
```

```
Hi    <- printed in yellow, reverse video
```


```python
from termcolor import colored


def print_success(text, end='\n'):
    print(colored(text, 'green', attrs=['reverse']), end=end)


def print_warning(text, end='\n'):
    print(colored(text, 'yellow', attrs=['reverse']), end=end)


def print_error(text, end='\n'):
    print(colored(text, 'red', attrs=['reverse']), end=end)


def print_grey(text, end='\n'):
    print(colored(text, 'grey', attrs=['reverse']), end=end)
```
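termcolor's `colored` works by wrapping text in ANSI escape sequences: `ESC[7m` turns on reverse video, `ESC[31m`, `ESC[32m` and `ESC[33m` select red, green and yellow, and `ESC[0m` resets the style. A dependency-free sketch of the same tile colouring, assuming a feedback string of `G`/`Y`/`.` per letter; `color_feedback` is a hypothetical helper, and terminals without ANSI support will show the raw codes.

```python
# Hypothetical dependency-free alternative to the termcolor helpers.
RESET = "\033[0m"
REVERSE = "\033[7m"
COLORS = {"G": "\033[32m", "Y": "\033[33m", ".": "\033[31m"}  # green, yellow, red


def color_feedback(guess: str, feedback: str) -> str:
    """Return the guess as reverse-video tiles colored by a G/Y/. feedback string."""
    tiles = [f"{REVERSE}{COLORS[mark]} {letter} {RESET}"
             for letter, mark in zip(guess.upper(), feedback)]
    return " ".join(tiles)


print(color_feedback("hello", ".YY.."))
```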

```python
print_success('Hello')
```

```
Hello    <- printed in green, reverse video
```


```python
num_try = 6
success = False

while num_try:
    guess_word = input(f'Enter a {word_len} letter word (or q to exit): ').strip()
    if guess_word.lower() == 'q':
        break

    # Compare case-insensitively: the secret word is upper case
    guess_word = guess_word.upper()

    # Word length (an invalid guess does not use up a try)
    if len(guess_word) != word_len:
        print(f'Word must have {word_len} letters. You entered {len(guess_word)}!')
        continue

    # The guess must be in the word list
    if guess_word.lower() not in words:
        print_warning('Word is not valid')
        continue

    # Green: right letter, right place; yellow: letter elsewhere in the word; red: not in the word.
    # Every misplaced copy of a guessed letter is marked yellow, even if the word contains it only once.
    for w_letter, g_letter in zip(word, guess_word):
        if w_letter == g_letter:
            print_success(f' {g_letter} ', end=' ')
        elif g_letter in word:
            print_warning(f' {g_letter} ', end=' ')
        else:
            print_error(f' {g_letter} ', end=' ')
    print()

    # Check success
    if word == guess_word:
        print_success('Congratulations!')
        success = True
        break

    num_try -= 1

if not success:
    print_error(f'Game over: The word was "{word}".')
```

```
 H   E   L   L   O      <- red, yellow, yellow, yellow, red
 P   R   E   S   S      <- red, red, yellow, red, red
Word must have 5 letters. You entered 0!
Word is not valid       <- yellow
 L   A   N   C   E      <- all green
Congratulations!        <- green
```

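The interactive loop does not yet reveal the potential words each round (TODO step 6), and `input()` makes it hard to test. A hedged sketch of a testable variant: `play` is a hypothetical function that takes the guesses as a list, gives at most `max_tries` valid attempts, skips invalid input without using up a try, and after each guess reports how many words are still possible. The narrowing uses feedback consistency (a candidate survives if it would have produced the same feedback as the secret word), which is a swapped-in technique rather than the exercise's prescribed filter, and repeated letters are scored the Wordle way.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def feedback(secret: str, guess: str) -> str:
    """G = right place, Y = in the word elsewhere, . = not in the word."""
    result = ["G" if s == g else "." for s, g in zip(secret, guess)]
    unmatched = Counter(s for s, g in zip(secret, guess) if s != g)
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def play(secret: str, guesses: Sequence[str], words: Sequence[str],
         max_tries: int = 6) -> Tuple[bool, List[Tuple[str, str, int]]]:
    """Replay a game; return (won, [(guess, feedback, n_potential_words), ...])."""
    secret = secret.upper()
    vocabulary = {w.upper() for w in words}
    potential = sorted(vocabulary)
    history: List[Tuple[str, str, int]] = []
    for guess in guesses:
        if len(history) == max_tries:
            break
        guess = guess.strip().upper()
        if len(guess) != len(secret) or guess not in vocabulary:
            continue  # invalid input does not use up a try
        fb = feedback(secret, guess)
        # Keep only the words that would have produced the same feedback.
        potential = [w for w in potential if feedback(w, guess) == fb]
        history.append((guess, fb, len(potential)))
        print(guess, fb, f"{len(potential)} potential word(s)")
        if guess == secret:
            return True, history
    return False, history


demo_words = ["lance", "hello", "press", "dance", "lanes", "ounce"]
won, history = play("LANCE", ["hello", "press", "", "abcde", "lance"], demo_words)
```

Replaying the transcript above on this small hypothetical list prints `HELLO .YY.. 2 potential word(s)`, then `PRESS ..Y.. 1 potential word(s)`, then `LANCE GGGGG 1 potential word(s)`.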