# Interactive Text Generation

Fully automatically generated text is often imperfect, and it would be nice to *write with the machine* instead of being a passive consumer.

In this notebook we will use an input field to write our own text. The text generator (based on [this](https://github.com/experimental-informatics/hands-on-text-generators/blob/master/markov_basic.ipynb) notebook) will recommend possible next tokens.

In [34]:
''' Libraries. '''

import numpy as np
import random
from nltk.tokenize import word_tokenize as tok
import string
from ipywidgets import interact
import ipywidgets as widgets
from ipywidgets import Layout
from collections import Counter

## Tokenizer

In [2]:
''' Read text and tokenize it with NLTK. '''

with open('data/wiki_selection.txt', 'r') as f:
    text = f.read()
token = tok(text)
print('Number of tokens:',len(token))
print(token[:50])

Number of tokens: 50415
['Aesthetics', ',', 'or', 'esthetics', '(', ')', ',', 'is', 'a', 'branch', 'of', 'philosophy', 'that', 'deals', 'with', 'the', 'nature', 'of', 'beauty', 'and', 'taste', ',', 'as', 'well', 'as', 'the', 'philosophy', 'of', 'art', '(', 'its', 'own', 'area', 'of', 'philosophy', 'that', 'comes', 'out', 'of', 'aesthetics', ')', '.', 'It', 'examines', 'subjective', 'and', 'sensori-emotional', 'values', ',', 'or']


## Vocabulary

Create pairs of tokens: one token as input and the token that follows it as a possible output.

In [3]:
''' Create a generator with pairs of tokens. '''

def make_pairs(token):
    for i in range(len(token)- 1):
        yield (token[i], token[i+1:i+2])

pairs = make_pairs(token) # pairs is a generator object

''' Create a vocabulary of all tokens and map them to the tokens that follow them. '''

# Create an empty dictionary.
vocabulary = {}

# Iterate through the pairs created above.
for current_token, next_token in pairs:
    # Check if the current token is already in the dictionary.
    if current_token in vocabulary:
        # If yes, append the next token to this entry.
        vocabulary[current_token].append(' '.join(next_token))
    else:
        # Otherwise create a new entry with the current token.
        vocabulary[current_token] = [' '.join(next_token)]
        
print('Size of the vocabulary:', len(vocabulary))      

Size of the vocabulary: 6946


In [4]:
''' Inspect all options for a given token. '''

key = token[0]
print('All options for\n', key, ':', vocabulary[key])

All options for
 Aesthetics : [',', 'in', ',', 'and', 'is', 'encompasses', 'examines', 'is', ',', '.', 'and']
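
As a compact illustration of the pairing step, the sketch below builds the same kind of vocabulary from a made-up sentence. It is a minimal sketch, not the notebook's code: `build_vocabulary` and `toy_tokens` are hypothetical names, and `str.split()` stands in for the NLTK tokenizer so the example runs without extra data.

```python
from collections import defaultdict

def build_vocabulary(tokens):
    """Map every token to the list of tokens that directly follow it."""
    vocab = defaultdict(list)
    # zip pairs each token with its successor; the last token has none
    for current_token, next_token in zip(tokens, tokens[1:]):
        vocab[current_token].append(next_token)
    return dict(vocab)

# made-up example text, split on whitespace (assumption: no NLTK here)
toy_tokens = "the cat sat on the mat and the cat slept".split()
toy_vocabulary = build_vocabulary(toy_tokens)

print(toy_vocabulary["the"])
print(toy_vocabulary["cat"])
```

Duplicates are kept on purpose: a follower that occurs more often appears more often in the list. The final token of the text only becomes a key if it also occurs earlier in the text.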


## Multiple recommendations

In the basic Markov example we used `np.random.choice()` to pick one of the tokens stored in the dictionary for a specific token.<br>
In this example we don't want just one token, but several recommendations to choose from.

In the next step we will reduce our vocabulary so that every token appears only once in the options.<br>
We will sort the options so that the most likely token (the one that appears most often) comes first in our new vocabulary.<br>
(Nevertheless, this step discards information: the exact counts are lost and only their ranking remains.)
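
To make that loss of information concrete, here is a small sketch that uses the options printed for `Aesthetics` above. It compares a ranking that keeps the counts with one that keeps only the order. The variable names are illustrative, and `Counter.most_common()` is used here as one possible way to keep the counts.

```python
from collections import Counter

# options for 'Aesthetics', copied from the output above
options = [',', 'in', ',', 'and', 'is', 'encompasses', 'examines', 'is', ',', '.', 'and']

counts = Counter(options)

# keeps the frequency of every option
ranked_with_counts = counts.most_common()
# keeps only the order, which is all a reduced list can express
ranked = [option for option, _ in ranked_with_counts]

print(ranked_with_counts[:3])
print(ranked)
```

With the counts we can still tell that `,` is three times as frequent as `in`. With the plain ranked list we only know that it comes earlier.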

### Tests

In [5]:
''' Reduce and sort a list with Counter.
See: https://stackoverflow.com/questions/53923847/sorting-a-list-by-number-of-appearances-and-removing-duplicates '''

data = ["apple", "apple", "banana", "orange", "orange", "banana", "banana", "apple", "banana"]

counts = Counter(data)
result = sorted(counts, key=counts.get, reverse=True)
print(result)

['banana', 'apple', 'orange']


We will try to apply the code above to a small dictionary.

In [9]:
''' Create a small dictionary for testing. '''

ex = {}
ex['a'] = ['two', 'one', 'two']
ex['b'] = ['one', 'two', 'two', 'zwei', 'zwei']
ex['c'] = ['one', '1', 'eins']

for key in ex:
    print(key, ex[key])

a ['two', 'one', 'two']
b ['one', 'two', 'two', 'zwei', 'zwei']
c ['one', '1', 'eins']


In [10]:
''' Reduce and sort it with the method from above. '''

for key in ex:
    # store all options in a list
    options = ex[key]
    # create a counter object with this list
    counts = Counter(options)
    # reduce and sort the list
    options = sorted(counts, key=counts.get, reverse=True)
    # override the options of our key
    ex[key] = options

In [11]:
''' Reduced and sorted vocabulary. '''

for key in ex:
    print(key, ex[key])

a ['two', 'one']
b ['two', 'zwei', 'one']
c ['one', '1', 'eins']
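
The options of `c` keep their original order because every option occurs exactly once: `sorted()` is stable and `Counter` remembers insertion order, so ties are resolved by first appearance. The sketch below contrasts this with an alphabetical tie-break. Both function names are hypothetical and only serve the illustration.

```python
from collections import Counter

def rank(options):
    """Most frequent first; ties stay in order of first appearance."""
    counts = Counter(options)
    return sorted(counts, key=counts.get, reverse=True)

def rank_alphabetical_ties(options):
    """Most frequent first; ties are ordered alphabetically."""
    counts = Counter(options)
    return sorted(counts, key=lambda option: (-counts[option], option))

print(rank(['one', '1', 'eins']))
print(rank_alphabetical_ties(['one', '1', 'eins']))
```

Which tie-break is preferable depends on the use case. Order of first appearance reflects the source text, while an alphabetical order does not depend on where an option first occurred.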


### Modifying our vocabulary

Now that we know it works, we can apply it to our vocabulary.

In [12]:
for key in vocabulary:
    # store all options in a list
    options = vocabulary[key]
    # create a counter object with this list
    counts = Counter(options)
    # reduce and sort the list
    options = sorted(counts, key=counts.get, reverse=True)
    # override the options of our key
    vocabulary[key] = options

In [13]:
''' Inspect all options for a given token. '''

key = token[0]
print('All options for\n', key, ':', vocabulary[key])

All options for
 Aesthetics : [',', 'and', 'is', 'in', 'encompasses', 'examines', '.']


## Interactive notebooks

With the help of the library [ipywidgets](https://ipywidgets.readthedocs.io) we can interact with our program easily.

### Test

First we will have a look at it with a simple example from its [docs](https://ipywidgets.readthedocs.io/en/stable/examples/Using%20Interact.html#Basic-interact).

We have to define a function first. When we later give input (like a word) to the widget, it calls this function with that input.

In [21]:
def f(input_):
    # perform some action on the input
    return input_*2

interact(f, input_='')

interactive(children=(Text(value='', description='input_'), Output()), _dom_classes=('widget-interact',))

<function __main__.f(input_)>

Our vocabulary only contains whole words, so it makes no sense to look for recommendations while we are still typing a word.

**We will update our function only when we type a space.** 

The library may provide a built-in solution for this, but we can do it like this:

In [22]:
def f(input_):
    # if the last character is a space
    if len(input_) > 0 and input_[-1] == ' ':
        return input_*2
    # else do nothing

interact(f, input_='')

interactive(children=(Text(value='', description='input_'), Output()), _dom_classes=('widget-interact',))

<function __main__.f(input_)>
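
The check `input_[-1] == ' '` reacts to the space character only. In a multi-line field a word can also be ended with a newline or a tab. A possible variant, sketched here with the hypothetical helper name `ends_with_whitespace`, relies on `str.isspace()` instead:

```python
def ends_with_whitespace(text):
    """Return True if text is non-empty and its last character is whitespace."""
    return bool(text) and text[-1].isspace()

print(ends_with_whitespace('hello '))
print(ends_with_whitespace('hello'))
```

The helper could replace the condition in `f` if newlines and tabs should also trigger an update. Whether that is desirable is a design choice, not something the library requires.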

## Interactive text generation

In [23]:
def recommendations(input_):
    if len(input_) > 0 and input_[-1] == ' ':
        # tokenize the input
        tokens = tok(input_)
        # whitespace-only input yields no tokens, so there is nothing to recommend
        if not tokens:
            return
        # get the last token of our input
        last_token = tokens[-1]
        # check if the token is in the dictionary
        if last_token not in vocabulary:
            # pick a random known token if it is not
            last_token = random.choice(list(vocabulary))
        # get all options for the token
        options = vocabulary[last_token]
        # return these options
        return options

interact(recommendations, input_='')

interactive(children=(Text(value='', description='input_'), Output()), _dom_classes=('widget-interact',))

<function __main__.recommendations(input_)>
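
For frequent tokens the list of options can become very long. Because the options are sorted by frequency, slicing the list gives the most likely candidates. The following is a minimal sketch under several assumptions: `top_recommendations` and `toy_vocabulary` are hypothetical names, `str.split()` replaces the NLTK tokenizer, and unknown words return an empty list instead of falling back to a random token.

```python
def top_recommendations(text, vocabulary, n=3):
    """Return at most n options for the last completed word of text."""
    # only react once a word has been completed with a space
    if not text.endswith(' '):
        return []
    words = text.split()
    # whitespace-only input contains no word to look up
    if not words:
        return []
    # unknown words get no recommendations in this sketch
    return vocabulary.get(words[-1], [])[:n]

# toy vocabulary with the reduced options for 'Aesthetics' from above
toy_vocabulary = {'Aesthetics': [',', 'and', 'is', 'in', 'encompasses', 'examines', '.']}

print(top_recommendations('Aesthetics ', toy_vocabulary))
```

Passing the vocabulary as an argument keeps the function independent of global state, which makes it easier to try out with small test dictionaries.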

## Advanced interactive text generation

Below we will use a larger input field (a textarea) to write and generate our text.<br>
This also lets us easily access the generated text afterwards.

In [37]:
''' Create a widget object. 
https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20Basics.html '''

w = widgets.Textarea(
    value='',
    placeholder='Type something. Press space to get recommendations.',
    description='Input:',
    layout=Layout(width='65%', height='200px'),
    disabled=False
)

''' Widgets require an observer which listens to event changes.
https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20Events.html '''

def on_change(change):
    # clear output
    display_recommendations.clear_output()
    # define destination of output below
    with display_recommendations:
        # call our function
        print(recommendations(change['new']))  
    

''' This listener calls the function on_change() if the value changes. '''
w.observe(on_change, names='value')

display_recommendations = widgets.Output()

display(w, display_recommendations)

Textarea(value='', description='Input:', layout=Layout(height='200px', width='65%'), placeholder='Type somethi…

Output()

In [39]:
''' We can retrieve our text by reading the widget's value attribute. '''

print(w.value)

The idea of language must include artificial intelligence.
