# [4. Writing Structured Programs](https://www.nltk.org/book/ch04.html) - Exercise Solutions

* [NLTK-Book-Resource Repository](https://github.com/BetoBob/NLTK-Book-Resource)
* [NLTK-Book-Resource Table of Contents](https://github.com/BetoBob/NLTK-Book-Resource#table-of-contents)

Run the cell below before running any other code.

In [2]:
import nltk

## 2.

☼ Identify three operations that can be performed on both tuples and lists. Identify three list operations that cannot be performed on tuples. Name a context where using a list instead of a tuple generates a Python error.

In [3]:
help(tuple)

Help on class tuple in module builtins:

class tuple(object)
 |  tuple(iterable=(), /)
 |  
 |  Built-in immutable sequence.
 |  
 |  If no argument is given, the constructor returns an empty tuple.
 |  If iterable is specified the tuple is initialized from iterable's items.
 |  
 |  If the argument is a tuple, the return value is the same object.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __getnewargs__(self, /)
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |

In [4]:
help(list)

Help on class list in module builtins:

class list(object)
 |  list(iterable=(), /)
 |  
 |  Built-in mutable sequence.
 |  
 |  If no argument is given, the constructor creates a new empty list.
 |  The argument must be an iterable if specified.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self))
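
The two help listings hint at the answer: both types define `__add__`, `__contains__` and `__getitem__`, while only `list` defines mutating methods such as `__delitem__`. The cell below is a minimal sketch of one possible answer, not the book's official solution; the variable names are illustrative.

```python
t = (1, 2, 3)
l = [1, 2, 3]

# Three operations that work on both tuples and lists
shared = [
    (t[0], l[0]),           # indexing
    (t + (4,), l + [4]),    # concatenation (builds a new sequence)
    (2 in t, 2 in l),       # membership test
]

# Three list operations that tuples do not support (each mutates in place)
l.append(4)     # add an item        -> [1, 2, 3, 4]
l[0] = 10       # item assignment    -> [10, 2, 3, 4]
del l[1]        # item deletion      -> [10, 3, 4]

# A context where a list raises an error but a tuple works: dictionary keys
counts = {}
counts[('the', 'cat')] = 1          # tuples are hashable
try:
    counts[['the', 'cat']] = 1      # lists are not, so this raises TypeError
except TypeError as err:
    message = str(err)
```

The same restriction applies to set members, since sets also require hashable items.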

## 3.

☼ Find out how to create a tuple consisting of a single item. There are at least two ways to do this.

* [Tuple Syntax](https://wiki.python.org/moin/TupleSyntax)

In [12]:
ex1 = (1, )

In [13]:
type(ex1)

tuple

In [15]:
ex2 = 1,

In [17]:
type(ex2)

tuple

## 4.

☼ Create a list `words = ['is', 'NLP', 'fun', '?']`. Use a series of assignment statements (e.g. `words[1] = words[2]`) and a temporary variable `tmp` to transform this list into the list `['NLP', 'is', 'fun', '!']`. Now do the same transformation using tuple assignment.

### Solution

* [Article on Tuple Assignment](https://openbookproject.net/thinkcs/python/english3e/tuples.html#tuple-assignment)

#### list assignment

In [22]:
words = ['is', 'NLP', 'fun', '?']

tmp = words[0]
words[0] = words[1]
words[1] = tmp
words[-1] = '!'

words

['NLP', 'is', 'fun', '!']

#### tuple assignment

In [26]:
words = ['is', 'NLP', 'fun', '?']

(words[0], words[1]) = (words[1], words[0])
words[-1] = '!'

words

['NLP', 'is', 'fun', '!']

## 6.

☼ Does the method for creating a sliding window of n-grams behave correctly for the two limiting cases: `n = 1`, and `n = len(sent)`?
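
A possible check, assuming the sliding-window method in question is the list comprehension `[sent[i:i+n] for i in range(len(sent)-n+1)]` from the chapter. The helper name `sliding_window` is hypothetical and only wraps that expression so both limiting cases can be tried.

```python
sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']

def sliding_window(seq, n):
    """Return every run of n consecutive items in seq."""
    return [seq[i:i+n] for i in range(len(seq) - n + 1)]

# n = 1: one single-word window per token
unigrams = sliding_window(sent, 1)

# n = len(sent): exactly one window containing the whole sentence
whole = sliding_window(sent, len(sent))
```

Under that assumption both limiting cases behave sensibly: `n = 1` yields `len(sent)` one-item lists, and `n = len(sent)` yields a list holding the sentence itself.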

## 7.

☼ We pointed out that when empty strings and empty lists occur in the condition part of an `if` clause, they evaluate to `False`. In this case, they are said to be occurring in a Boolean context. Experiment with different kind of non-Boolean expressions in Boolean contexts, and see whether they evaluate as `True` or `False`.

### Solution

See the documentation on [Python Truth Value Testing](https://docs.python.org/2.4/lib/truth.html) for the full rules on how non-Boolean expressions are evaluated in a Boolean context. Applying `not` forces such a context, so each `True` below means that the operand itself evaluates as `False`:

In [2]:
not None

True

In [3]:
not False

True

In [5]:
not 0

True

In [6]:
not ''

True

In [7]:
not ()

True

In [8]:
not []

True

In [9]:
not {}

True

In [30]:
class emptyClass:
    
    def __len__(self):
        return 0

In [31]:
x = emptyClass()

In [32]:
not x

True
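
The instance above is falsy because Python falls back on `__len__` when a class defines no `__bool__`. The sketch below explores that rule a little further; the class names are made up for illustration.

```python
class NoHooks:
    """Defines neither __bool__ nor __len__, so instances are always truthy."""

class ZeroLength:
    """Only __len__ is defined; a length of 0 makes the instance falsy."""
    def __len__(self):
        return 0

class BoolWins:
    """__bool__ takes priority over __len__ when both are defined."""
    def __len__(self):
        return 0
    def __bool__(self):
        return True

truthiness = {
    'NoHooks': bool(NoHooks()),        # True
    'ZeroLength': bool(ZeroLength()),  # False
    'BoolWins': bool(BoolWins()),      # True
    "'0'": bool('0'),                  # True: a non-empty string
    '0.0': bool(0.0),                  # False: numeric zero
    '[[]]': bool([[]]),                # True: a list containing one item
}
```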

## 8.

☼ Use the inequality operators to compare strings, e.g. `'Monty' < 'Python'`. What happens when you do `'Z' < 'a'`? Try pairs of strings which have a common prefix, e.g. `'Monty' < 'Montague'`. Read up on "lexicographical sort" in order to understand what is going on here. Try comparing structured objects, e.g. `('Monty', 1) < ('Monty', 2)`. Does this behave as expected?

### Solution

At first glance, strings appear to be compared in alphabetical order:

In [33]:
'Monty' < 'Python'

True

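More specifically, strings are compared character by character using each character's code point, which for plain English text is its ASCII value. Upper case letters `A - Z` are represented by numbers `65 - 90` and lower case letters `a - z` are represented by numbers `97 - 122` in the ASCII table. Every upper case letter therefore sorts before every lower case letter, which makes the comparison case sensitive.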

* [ASCII Table](https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html)

In [35]:
'Z' < 'a'

True

In [39]:
'Monty' < 'Montague'

False

Experiment with non-alphabetical characters like numbers and symbols to see where they are placed in the ASCII table.

In [42]:
'01234' < 'a'

True

In [43]:
'~' < 'a'

False
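
The results above can be traced with `ord()`, which returns the numeric code of a character. The helper `first_difference` is a hypothetical name used only for this sketch; it reports the first position where two strings differ, which is the position that decides the comparison.

```python
def first_difference(a, b):
    """Return the first pair of differing characters with their codes, or None."""
    for x, y in zip(a, b):
        if x != y:
            return (x, ord(x)), (y, ord(y))
    return None  # one string is a prefix of the other (or they are equal)

monty = first_difference('Monty', 'Montague')   # (('y', 121), ('a', 97))
case = first_difference('Z', 'a')               # (('Z', 90), ('a', 97))

# When one string is a prefix of the other, the shorter string is smaller
prefix_smaller = 'Mont' < 'Monty'               # True

# Digits sort before letters, while '~' sorts after them
codes = [ord('0'), ord('a'), ord('~')]          # [48, 97, 126]
```

`'Monty' < 'Montague'` is `False` because the strings first differ at `'y'` versus `'a'`, and `121 > 97`.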

Tuples are compared item by item, from left to right. The first items of the two tuples are compared; if they differ, that comparison alone decides the result. If they are equal, the second items are compared, and so on until a difference is found or one of the tuples runs out of items.

In the example below, the first items of both tuples are equal, so the second items decide the result. Since `1 < 2` is true, `('Monty', 1) < ('Monty', 2)` is `True`.

* [StackOverflow post on Tuple Comparisons](https://stackoverflow.com/questions/5292303/how-does-tuple-comparison-work-in-python)

In [46]:
('Monty', 1) < ('Monty', 2)

True
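
A few further cases are worth trying. This is a small sketch assuming Python 3, where comparing items of unrelated types raises an error instead of returning a result.

```python
comparisons = {
    # The first items differ, so the second items are never looked at
    'first items differ': ('Monty', 2) < ('Python', 1),   # True
    # A tuple that is a prefix of another is the smaller one
    'prefix': ('Monty',) < ('Monty', 1),                  # True
    # Lists follow the same item-by-item rule
    'lists': ['Monty', 1] < ['Monty', 2],                 # True
}

# Equal first items force a comparison of 1 with 'a', which Python 3 rejects
try:
    ('Monty', 1) < ('Monty', 'a')
except TypeError as err:
    mixed_error = type(err).__name__
```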

## 9.

☼ Write code that removes whitespace at the beginning and end of a string, and normalizes whitespace between words to be a single space character.

1. do this task using `split()` and `join()`
2. do this task using regular expression substitutions
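
One possible solution to both parts, offered as a sketch: the sample string `raw` is made up for illustration.

```python
import re

raw = '  \t Monty   Python\n and  the Holy   Grail  \n'

# 1. split() with no argument splits on any run of whitespace and drops
#    leading and trailing whitespace, so join() only has to add single spaces
via_split = ' '.join(raw.split())

# 2. Two substitutions: strip whitespace at both ends, then collapse
#    every remaining run of whitespace into one space
stripped = re.sub(r'^\s+|\s+$', '', raw)
via_regex = re.sub(r'\s+', ' ', stripped)
```

Both approaches give `'Monty Python and the Holy Grail'` for this input. Note that `split()` must be called without an argument; `split(' ')` would keep empty strings for consecutive spaces.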