## 9.6 SMS

Text messaging apps suggest completions for the word you're typing.
Mobile phones usually don't have space for more than three suggestions.
As there can be many words starting with the characters you typed so far,
one way to reduce the number of suggestions is to associate a numeric score with each word
and suggest only those with the highest scores. For example,
a word's score can be based on how frequently it occurs in English texts.

In this problem you are going to implement two operations:

- completions: given a prefix string, return a sequence of the three
  highest-scoring words that start with that prefix, ordered by descending score
- initialisation: given each word and its score, in no particular order,
  create a collection for the first operation to use.

The completions operation returns the first three words
(or fewer, if fewer than three words match) that start with the given prefix,
when ordered by descending score. When several words have the same score,
it doesn't matter which of them get into the top three.

Here are some examples, if the words and their scores are 'hello'/10, 'hi'/9,
'here'/5, 'there'/5 and 'hickup'/1:

Prefix | Completions
:-|:-
''  |  ('hello', 'hi', 'here') or ('hello', 'hi', 'there')
'ha'  |  ()
'he'  | ('hello', 'here')
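To make the specification concrete, here is a deliberately naive sketch (not either of the approaches the exercises ask for) that filters the words by prefix and sorts the matches by descending score. The dictionary `scores` holds the example words above; the function name `naive_completions` is hypothetical.

```python
def naive_completions(scores: dict[str, int], prefix: str) -> list[str]:
    """Return up to three words starting with prefix, by descending score."""
    # Keep only the words that start with the prefix.
    matches = [word for word in scores if word.startswith(prefix)]
    # Sort by descending score; the sort is stable, so equal scores
    # keep their dictionary order.
    matches.sort(key=lambda word: scores[word], reverse=True)
    return matches[:3]


scores = {'hello': 10, 'hi': 9, 'here': 5, 'there': 5, 'hickup': 1}
print(naive_completions(scores, ''))    # ['hello', 'hi', 'here']
print(naive_completions(scores, 'ha'))  # []
print(naive_completions(scores, 'he'))  # ['hello', 'here']
```

Because it sorts all matching words on every call, this sketch takes O(n log n) time in the worst case (an empty prefix) for n words.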

Some mobile phones have less memory; others have a slower CPU.
You are asked for two different approaches that make different space–time tradeoffs.

### 9.6.1 First approach

#### Exercise 9.6.1

Think of an approach that uses little memory
besides the memory needed for storing all words.
Describe the algorithm and ADT(s) for each operation, and
their complexities in terms of the total number of words.
Distinguish between best- and worst-case complexities if necessary.

The initialisation is done once, during the phone's software installation, so
it doesn't have to be efficient.
The completions operation should preferably take less than quadratic time.

[Hint](../31_Hints/Hints_09_6_01.ipynb)
[Answer](../32_Answers/Answers_09_6_01.ipynb)
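One possible low-memory design (a sketch, not necessarily the model answer) keeps the words in a list sorted alphabetically, with their scores in a parallel list. All words with a given prefix then form a contiguous block, which binary search can locate with the standard `bisect` module, and `heapq.nlargest` picks the top three from that block. The helper names `build_sorted` and `sorted_completions` are hypothetical.

```python
import bisect
import heapq


def build_sorted(pairs: list[tuple[str, int]]) -> tuple[list[str], list[int]]:
    """Return parallel lists of words (alphabetical order) and their scores."""
    pairs = sorted(pairs)                # sorts by word; words are unique
    words = [word for word, _ in pairs]
    scores = [score for _, score in pairs]
    return words, scores


def sorted_completions(words: list[str], scores: list[int],
                       prefix: str) -> list[str]:
    """Return up to three words starting with prefix, by descending score."""
    # Binary search: index of the first word that is >= prefix.
    start = bisect.bisect_left(words, prefix)
    # The words with the prefix form a contiguous block from that index.
    end = start
    while end < len(words) and words[end].startswith(prefix):
        end += 1
    # Select the three highest scores in the block without sorting it.
    best = heapq.nlargest(3, range(start, end), key=lambda i: scores[i])
    return [words[i] for i in best]


words, scores = build_sorted([('hello', 10), ('hi', 9), ('here', 5),
                              ('there', 5), ('hickup', 1)])
print(sorted_completions(words, scores, 'he'))   # ['hello', 'here']
```

Under these assumptions, initialisation is dominated by sorting, O(n log n). Completions takes O(log n + m) time for m matching words: O(log n) in the best case (no match) and O(n) in the worst case (the empty prefix matches every word).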

#### Exercise 9.6.2

Implement your approach by completing and running the following code cell.
The `__init__` method includes code to read the words and scores from a file.
You don't have to understand or change that code.

The file used in the tests is in the same folder as this notebook and
contains 100 common English words. You don't have to add tests.

```python
%run -i ../m269_util

class SMS:
    """A collection of words for completing prefixes."""

    def __init__(self, filename: str) -> None:
        """Load the words and their scores from the given file.

        Preconditions: filename is the name of a text file where
        - each line is of the form 'word score'
        - scores are positive integers
        - words aren't empty nor repeated
        """
        pass            # create the data structure
        with open(filename, 'r') as infile:
            for line in infile:
                pair = line.split()
                word = pair[0]
                score = int(pair[1])
                pass    # process the word and score
        pass            # do any further processing

    def completions(self, prefix: str) -> list:
        """Return the highest-scoring words starting with prefix.

        Postconditions: the output is a list of at most 3 words
        from the file, ordered by descending scores
        """
        pass

words_tests_100 = [
    # case,             prefix, completions
    ('no prefix',       '',     ['the', 'of', 'and']),
    ('matches > 3',     'a',    ['and', 'as', 'at']),
    ('matches = 3',     'an',   ['and', 'an', 'any']),
    ('matches < 3',     'wi',   ['with', 'will']),
    ('matches = 0',     'z',    []),
    ('prefix = word',   'said', ['said']),
    ('last words',      'y',    ['you', 'your']),
]

sms100 = SMS('100words.txt')
test(sms100.completions, words_tests_100)
```

Once your code passes the tests above,
run the next cell, with a larger file of 10,000 English words.

```python
words_tests_10000 = [
    # case,           prefix, completions
    ('no prefix',     '',     ['the', 'of', 'and']),
    ('matches > 3',   'a',    ['and', 'as', 'at']),
    ('matches = 3',   'anx',  ['anxious', 'anxiety', 'anxiously']),
    ('matches < 3',   'tric', ['trick', 'tricks']),
    ('matches = 0',   'glu',  []),
    ('prefix = word', 'said', ['said']),
    ('last words',    'zo',   ['zone']),
]

sms10000 = SMS('10000words.txt')
test(sms10000.completions, words_tests_10000)
```

[Hint](../31_Hints/Hints_09_6_02.ipynb)
[Answer](../32_Answers/Answers_09_6_02.ipynb)

#### Exercise 9.6.3

Usually the suggestions are updated after each keystroke.
Your code should be able to produce three suggestions within 0.05&nbsp;seconds
for a typical vocabulary of 100,000 words.

<div class="alert alert-info">
<strong>Info:</strong> Jakob Nielsen states in
[Powers of 10: Time Scales in User Experience](https://www.nngroup.com/articles/powers-of-10-time-scales-in-ux/)
that 0.1&nbsp;seconds is the time limit for 'users to feel
like their actions are directly causing something to happen on the screen'.
To simplify, I allocate the same time for computing the suggestions
and for displaying them on screen. Hence the 0.05&nbsp;seconds limit.
</div>

Run the next cell. What's the worst time you expect for 100,000 words?
Is it under the 0.05&nbsp;s limit?

_Write your answer here._

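```python
# The loop variable is `case`, not `test`: naming it `test` would overwrite
# the testing function loaded by m269_util and break the earlier cells.
print('100 words:')
for case in words_tests_100:
    prefix = case[1]
    print("'" + prefix + "'")
    %timeit -r 5 -n 10000 sms100.completions(prefix)
print('\n10,000 words:')
for case in words_tests_10000:
    prefix = case[1]
    print("'" + prefix + "'")
    %timeit -r 5 -n 1000 sms10000.completions(prefix)
```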
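`%timeit` only works in IPython. In a plain Python script, the standard `timeit` module gives a similar measurement. The sketch below (with hypothetical helper names) times a function on each prefix, reports the slowest average time per call, and extrapolates to a larger vocabulary. The extrapolation assumes that completions takes time linear in the number of words; it is not valid for other growth rates.

```python
import timeit
from typing import Callable


def worst_average_time(func: Callable[[str], list], prefixes: list[str],
                       number: int = 1000) -> float:
    """Return the highest average time per call, in seconds, over prefixes."""
    worst = 0.0
    for prefix in prefixes:
        # Total time for `number` calls, divided to get the time per call.
        total = timeit.timeit(lambda: func(prefix), number=number)
        worst = max(worst, total / number)
    return worst


def extrapolate_linear(seconds: float, size: int, new_size: int) -> float:
    """Estimate the time for new_size words, assuming linear growth."""
    return seconds * new_size / size
```

For example, if the slowest prefix took 2&nbsp;ms with 10,000 words (an illustrative figure, not a measurement), `extrapolate_linear(0.002, 10_000, 100_000)` estimates about 0.02&nbsp;s for 100,000 words.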

[Hint](../31_Hints/Hints_09_6_03.ipynb)
[Answer](../32_Answers/Answers_09_6_03.ipynb)

### 9.6.2 Second approach

#### Exercise 9.6.4

Think of a different approach to solve the same problem.
Aim to make the completions operation as fast as possible.
You can use as much extra memory as needed.
You can assume that any operation that is linear in the length of a word takes
effectively constant time, because the length of commonly used words is bounded.

Again, describe the ADT(s), algorithms and their complexities
for both operations.

[Hint](../31_Hints/Hints_09_6_04.ipynb)
[Answer](../32_Answers/Answers_09_6_04.ipynb)
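One possible fast design (a sketch, not necessarily the model answer) trades memory for speed: during initialisation, compute the answer for every prefix of every word and store it in a dictionary, so that completions is a single lookup. Visiting the words in descending score order means the first three words recorded for a prefix are its three best. The names `build_prefix_table` and `table_completions` are hypothetical.

```python
def build_prefix_table(pairs: list[tuple[str, int]]) -> dict[str, list[str]]:
    """Map every prefix of every word to its (up to) three best words."""
    table: dict[str, list[str]] = {}
    # Visit words from highest to lowest score, so the first three words
    # recorded for each prefix are its three highest-scoring words.
    for word, _ in sorted(pairs, key=lambda pair: pair[1], reverse=True):
        for length in range(len(word) + 1):   # includes '' and word itself
            best = table.setdefault(word[:length], [])
            if len(best) < 3:
                best.append(word)
    return table


def table_completions(table: dict[str, list[str]], prefix: str) -> list[str]:
    """Return a copy of the stored completions, or [] if none match."""
    return list(table.get(prefix, []))


table = build_prefix_table([('hello', 10), ('hi', 9), ('here', 5),
                            ('there', 5), ('hickup', 1)])
print(table_completions(table, 'he'))   # ['hello', 'here']
```

With word length bounded, each word adds a bounded number of prefixes, so initialisation is dominated by the O(n log n) sort and the table has O(n) entries of at most three words each. Completions is a dictionary lookup plus copying at most three words: constant time on average. A trie that stores the top three words at each node is a common alternative with the same lookup cost.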

#### Exercise 9.6.5 (optional)

Copy the code cell containing the class to below this paragraph and
implement the second approach.
Then run the tests and the timing code again.
