## Training Goal 
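Prefix expansion (word autocomplete): given the start of a word, suggest the most frequent words in the training text that begin with it.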

## Steps Shown
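### 1. Data Cleaning (read_vocabulary cell, lines 6-8)
Loading the text, lowercasing it, and removing every character that is not a letter a-z or whitespace (punctuation, digits, apostrophes)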
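The cleaning rule can be tried on a single string to see exactly what it keeps. This is a minimal sketch: `clean_text` is a hypothetical helper name, and the regex is the one used in `read_vocabulary`.

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical helper mirroring the cleaning lines of read_vocabulary."""
    text = text.lower()
    # Characters outside a-z and whitespace are deleted outright, not
    # replaced by a space, so "beauty's" becomes "beautys".
    return re.sub(r"[^a-z\s]", "", text)

print(repr(clean_text("Beauty's rose, 1609!")))  # 'beautys rose '
```

Because characters are deleted rather than replaced with a space, contractions fuse into single tokens such as "thourt", and accented letters are dropped as well.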

### 1a. Tokenization (read_vocabulary cell, line 9)
Splitting the cleaned text on whitespace into a list of words
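Calling `str.split()` with no argument is what makes this step robust to the indented, blank-line-separated layout of the sonnets. A short sketch contrasting it with `split(" ")`; `tokenize` is a hypothetical name for illustration.

```python
def tokenize(text):
    """Split cleaned text into words, as read_vocabulary does with text.split()."""
    # No argument: split on runs of any whitespace (spaces, tabs, newlines)
    # and ignore leading/trailing whitespace, so no empty strings appear.
    return text.split()

sample = "  from fairest\n\n  creatures   we\tdesire "
print(tokenize(sample))  # ['from', 'fairest', 'creatures', 'we', 'desire']

# By contrast, split(" ") splits on each single space: it yields empty
# strings and leaves newlines and tabs inside the pieces.
print(sample.split(" "))
```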

### 2. Model Training (read_vocabulary cell, lines 10-12)
Counting word frequencies with Counter and sorting words from most to least frequent. This sorted list IS your model: a frequency-based (unigram) language model.
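read_vocabulary keeps only the ordering and discards the counts. Dividing each count by the total number of tokens N turns the same Counter into explicit unigram probabilities, P(w) = count(w) / N. A minimal sketch, where `train_unigram` is a hypothetical name; the ordering is identical to read_vocabulary's, so autocomplete results would not change.

```python
from collections import Counter

def train_unigram(words):
    """Return (vocab, probs): words by descending count, and count / total."""
    counter = Counter(words)
    total = sum(counter.values())
    # most_common() sorts by count; words with equal counts keep the
    # order in which they were first encountered.
    vocab = [word for word, _ in counter.most_common()]
    probs = {word: count / total for word, count in counter.items()}
    return vocab, probs

words = "the rose the thee rose the".split()
vocab, probs = train_unigram(words)
print(vocab)         # ['the', 'rose', 'thee']
print(probs["the"])  # 0.5
```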

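### 3. Inference (autocomplete_word)
- Inference doesn't require neural networks; it just means using a trained model to make predictions
- The model (the frequency-sorted vocab) predicts completions by keeping only words that start with the prefix; because the list is already sorted by frequency, the first matches are the most frequent, and therefore most likely, completions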
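Since the vocabulary is already in frequency order, the top-k completions are simply the first k matches, so the scan can stop as soon as k are found. A sketch assuming that ordering; `autocomplete_lazy` and the toy vocabulary are hypothetical.

```python
from itertools import islice

def autocomplete_lazy(prefix, vocab, k=10):
    """Return up to k words from vocab that start with prefix.

    Assumes vocab is sorted by descending frequency, so the first k
    matches are the k most frequent completions."""
    matches = (word for word in vocab if word.startswith(prefix))
    # islice pulls from the generator only until k matches are found.
    return list(islice(matches, k))

toy_vocab = ["the", "thee", "love", "they", "then", "loves"]
print(autocomplete_lazy("the", toy_vocab, k=3))  # ['the', 'thee', 'they']
```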

In [7]:
with open("shakespeare-edit.txt", "r") as f:
    for i in range(5):
        print(f.readline())


                     1

  From fairest creatures we desire increase,

  That thereby beauty's rose might never die,

  But as the riper should by time decease,

  His tender heir might bear his memory:


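`f.readline()` keeps each line's trailing newline and `print` adds another, which is why the preview above is double-spaced. A sketch of a hypothetical `head` helper that strips the newline; it is demonstrated on a throwaway file so it runs anywhere, and in the notebook it would be called as `head("shakespeare-edit.txt")`.

```python
import os
import tempfile
from itertools import islice

def head(filename, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(filename, "r") as f:
        # islice stops after n lines instead of reading the whole file.
        return [line.rstrip("\n") for line in islice(f, n)]

# Demo on a temporary file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line one\nline two\nline three\n")

preview = head(tmp.name, n=2)
os.remove(tmp.name)
for line in preview:
    print(line)
```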

In [None]:
import re
from collections import Counter

def read_vocabulary(filename):
    with open(filename, "r") as f:
        text = f.read().lower()
    # Keep only a-z and whitespace: drops punctuation, digits, apostrophes
    text = re.sub(r"[^a-z\s]", "", text)
    words = text.split()
    counter = Counter(words)
    # Sort by frequency (most common first)
    vocab = [word for word, _ in counter.most_common()]
    return vocab

print(read_vocabulary("tiny.txt"))
print(read_vocabulary("shakespeare-edit.txt"))


['hello', 'world']
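Because the regex deletes apostrophes, contractions end up fused into tokens such as "thourt" and "thoult" (visible in the autocomplete results below). If those should stay readable, one option is to extract words with `re.findall` and allow apostrophes between letters. This is a hypothetical variant, not the notebook's method: it takes a string instead of a filename and assumes straight apostrophes (') rather than typographic ones.

```python
import re
from collections import Counter

# Runs of letters, optionally joined by internal apostrophes: "thou'rt".
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def vocab_from_text(text):
    """Frequency-sorted vocabulary that keeps word-internal apostrophes."""
    words = WORD_RE.findall(text.lower())
    return [word for word, _ in Counter(words).most_common()]

print(vocab_from_text("Thou'rt fair; thou art, thou'rt!"))
# ["thou'rt", 'fair', 'thou', 'art']
```

A prefix such as "thou" still matches "thou'rt" with `startswith`, so autocomplete keeps working unchanged.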


In [None]:
def process_data(vocab):
    # Placeholder: no extra processing; the frequency-sorted vocab is used as-is
    return vocab

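def autocomplete_word(prefix, vocab):
    # vocab is sorted most-frequent first, and filtering preserves that order
    results = [word for word in vocab if word.startswith(prefix)]
    return results[:10]  # the 10 most frequent matches (or fewer)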
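autocomplete_word scans every word in vocab on every call, and the widget below calls it each time the textbox value changes. One common alternative, swapped in here as an illustration rather than the notebook's approach, is binary search over an alphabetically sorted copy: all words sharing a prefix form one contiguous run, which `bisect_left` locates in O(log n). The matches are then re-ranked by frequency. `build_prefix_index` and `autocomplete_indexed` are hypothetical names, and vocab is assumed to be frequency-sorted as read_vocabulary returns it. A trie is another common choice.

```python
from bisect import bisect_left

def build_prefix_index(vocab):
    """Alphabetical copy of vocab plus each word's frequency rank."""
    rank = {word: i for i, word in enumerate(vocab)}
    return sorted(vocab), rank

def autocomplete_indexed(prefix, index, k=10):
    sorted_words, rank = index
    # Start of the contiguous run of words that begin with prefix.
    i = bisect_left(sorted_words, prefix)
    matches = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1
    # Re-rank by frequency so results agree with autocomplete_word.
    matches.sort(key=rank.__getitem__)
    return matches[:k]

toy_vocab = ["the", "thee", "love", "they", "then", "loves", "lover"]
index = build_prefix_index(toy_vocab)
print(autocomplete_indexed("lov", index))  # ['love', 'loves', 'lover']
```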


In [11]:
vocab = read_vocabulary("shakespeare-edit.txt")
print(autocomplete_word("love", vocab))
print(autocomplete_word("the", vocab))
print(autocomplete_word("thou", vocab))
print(autocomplete_word("rome", vocab))


['love', 'loves', 'lovers', 'lovely', 'lover', 'loved', 'lovell', 'lovel', 'lovest', 'lovesong']
['the', 'thee', 'they', 'then', 'their', 'them', 'there', 'these', 'therefore', 'theres']
['thou', 'though', 'thought', 'thousand', 'thoughts', 'thourt', 'thousands', 'thoult', 'thoudst', 'thout']
['rome', 'romeo', 'romes', 'romeos', 'romei']
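Every word in vocab consists of lowercase letters only, so a prefix typed as "Love" or "thou'" matches nothing. A hypothetical wrapper can apply read_vocabulary's cleaning rule to the prefix before searching. This is a sketch with illustrative names and a toy vocabulary.

```python
import re

def normalize_prefix(prefix):
    """Apply read_vocabulary's cleaning rule to a user-typed prefix."""
    return re.sub(r"[^a-z\s]", "", prefix.lower()).strip()

def autocomplete_normalized(prefix, vocab, k=10):
    prefix = normalize_prefix(prefix)
    return [word for word in vocab if word.startswith(prefix)][:k]

toy_vocab = ["thou", "though", "thourt", "love", "loves"]
print(autocomplete_normalized("Thou'r", toy_vocab))  # ['thourt']
print(autocomplete_normalized("LOVE", toy_vocab))    # ['love', 'loves']
```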


In [12]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Assumes vocab and autocomplete_word(prefix, vocab) are defined in the cells above

# Create a text box
textbox = widgets.Text(
    placeholder="Type a word prefix...",
    description="Prefix:",
    disabled=False
)

# Create an output display area
output = widgets.Output()

# Define what happens when textbox changes
def on_value_change(change):
    # vocab is all lowercase, so lowercase the typed prefix too
    prefix = change['new'].lower()
    suggestions = autocomplete_word(prefix, vocab)  # top-10 frequency-ranked matches
    with output:
        clear_output()
        if suggestions:
            print("Suggestions:")
            for word in suggestions:
                print(word)
        else:
            print("No suggestions found.")

# Hook the function up to the textbox
textbox.observe(on_value_change, names='value')

# Display UI
display(textbox, output)


Text(value='', description='Prefix:', placeholder='Type a word prefix...')

Output()
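The callback above mixes searching, formatting, and display. Moving the formatting into a plain function makes it testable outside Jupyter; the callback body would then reduce to `print(format_suggestions(change['new'], vocab))` after `clear_output()` inside `with output:`. A sketch with a hypothetical `format_suggestions`; it also lowercases the prefix, since vocab is all lowercase.

```python
def format_suggestions(prefix, vocab, k=10):
    """Build the text the widget callback prints, without any widget code."""
    prefix = prefix.lower()
    suggestions = [word for word in vocab if word.startswith(prefix)][:k]
    if not suggestions:
        return "No suggestions found."
    return "\n".join(["Suggestions:", *suggestions])

toy_vocab = ["rome", "romeo", "romes"]
print(format_suggestions("Rom", toy_vocab))
```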

<!-- Convert to .py
jupyter nbconvert --to script my_notebook.ipynb
 -->