# Writing Structured Programs

## Back to the Basics

### Assignment

Let's consider some subtleties of assignment, one of the most basic programming concepts.

In [1]:
foo = 'Monty'
bar = foo
foo = 'Python'
bar

'Monty'

When we write `bar = foo`, the value of `foo` (the string `'Monty'`) is assigned to `bar`. In effect, `bar` behaves like a copy of `foo`: when we overwrite `foo` with the new string `'Python'`, the value of `bar` is not affected.

However, assignment statements do not always work in the above way. For example, the value of a structured object such as a list is actually just a reference to the object. 

In the example below, we assign the reference held by `foo` to `bar`. When we then change an item of `foo`, the change is also visible through `bar`.

In [2]:
foo = ['Monty', 'Python']
bar = foo
bar

['Monty', 'Python']

In [3]:
foo[1] = 'Bodkin'
bar

['Monty', 'Bodkin']

Thus, `bar = foo` only copies the object reference of the variable, not its contents.

Here's another experiment using an empty list:

In [9]:
empty = []
nested = [empty, empty, empty]
nested

[[], [], []]

In [10]:
nested[1].append('Python')
nested

[['Python'], ['Python'], ['Python']]

We can see that changing one of the items in the nested list changed them all. This is because all three items in `nested` are references to the same list, `empty`.

In the example below, we'll see that when we assign a new value to one of the elements of the list, the change does not propagate to the others.

In [14]:
nested = [[]] * 3
nested

[[], [], []]

In [15]:
nested[1].append('Python')
nested

[['Python'], ['Python'], ['Python']]

In [16]:
nested[1] = ['Monty']
nested

[['Python'], ['Monty'], ['Python']]

Above, we have **overwritten** the reference at index 1 with a reference to another object. This is the difference between modifying an object via an object reference and overwriting an object reference.
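When an independent copy is what we actually want, the list can be copied explicitly. The following is a minimal sketch, reusing `foo` and `bar` from the examples above; the names `shallow` and `deep` are introduced here for illustration only.

```python
import copy

foo = ['Monty', ['Python']]
bar = foo                   # same object: no copy is made
shallow = foo[:]            # new outer list, but the inner list is still shared
deep = copy.deepcopy(foo)   # new outer list and new inner list

foo[0] = 'Bodkin'           # overwrite a reference in the outer list
foo[1].append('Harris')     # modify the inner list in place

print(bar)      # ['Bodkin', ['Python', 'Harris']]
print(shallow)  # ['Monty', ['Python', 'Harris']]
print(deep)     # ['Monty', ['Python']]
```

A slice copies only the object references in the outer list, so nested structures remain shared unless `copy.deepcopy()` is used.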

### Equality

Python provides two ways to check that a pair of items are the same. The `==` operator tests whether two objects have equal values, while the `is` operator tests for object identity.

Let's use the above to verify our observations from the earlier section.

In [17]:
size = 5
python = ['Python']
snake_nest = [python] * size
snake_nest

[['Python'], ['Python'], ['Python'], ['Python'], ['Python']]

In [18]:
snake_nest[0] == snake_nest[1] == snake_nest[2] == snake_nest[3] == snake_nest[4]

True

In [19]:
snake_nest[0] is snake_nest[1] is snake_nest[2] is snake_nest[3] is snake_nest[4]

True

Now let's put a new python object in this nest and check if the objects are identical or not:

In [20]:
import random
position = random.choice(range(size))
snake_nest[position] = ['Python']
snake_nest

[['Python'], ['Python'], ['Python'], ['Python'], ['Python']]

In [21]:
snake_nest[0] == snake_nest[1] == snake_nest[2] == snake_nest[3] == snake_nest[4]

True

In [22]:
snake_nest[0] is snake_nest[1] is snake_nest[2] is snake_nest[3] is snake_nest[4]

False

We can actually check which position contains the object that's the odd one out:

In [23]:
[id(snake) for snake in snake_nest]

[4439307440, 4438805872, 4439307440, 4439307440, 4439307440]

### Conditionals

In the condition of an `if` statement, a nonempty string or list is evaluated as true, while an empty string or list is evaluated as false.

In [24]:
mixed = ['cat', '', ['dog'], []]
for element in mixed:
    if element:
        print(element)

cat
['dog']


Thus, we don't need to say `if len(element) > 0` in the condition.

Let's see what the difference is between using `if...elif` and using a couple of `if` statements in a row:

In [25]:
animals = ['cat', 'dog']
if 'cat' in animals:
    print(1)
elif 'dog' in animals:
    print(2)

1


Since the `if` clause of the statement is satisfied, Python never tries to evaluate the `elif` clause, so we never get to print out 2. By contrast, if we replaced the `elif` by an `if`, then we would print out both 1 and 2.
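The claim about two consecutive `if` statements can be checked directly. In this small sketch, the list `printed` is a name introduced here so that the result can be inspected afterwards.

```python
animals = ['cat', 'dog']
printed = []

if 'cat' in animals:
    printed.append(1)
if 'dog' in animals:    # a separate statement, so it is always tested
    printed.append(2)

print(printed)  # [1, 2]
```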

We can use the functions `all()` or `any()` on a list (or other sequence) to check whether all or any items meet some condition:

In [26]:
sent = ['No', 'good', 'fish', 'goes', 'anywhere', 'without', 'a', 'porpoise', '.']

In [27]:
all(len(w) > 4 for w in sent)

False

In [28]:
any(len(w) > 4 for w in sent)

True

## Sequences

Apart from strings and lists, there's another kind of sequence called a **tuple**. Tuples are formed with the comma operator, and typically enclosed using parentheses. 

In [29]:
t = 'walk', 'fem', 3

In [30]:
t

('walk', 'fem', 3)

In [31]:
# Accessing tuple members by index
t[0]

'walk'

In [32]:
# Slicing a tuple
t[1:]

('fem', 3)

In [33]:
# Length of a tuple
len(t)

3

Here is a comparison of strings, lists and tuples:

In [34]:
raw = "I turned off the spectroroute"
text =  ['I', 'turned', 'off', 'the', 'spectroroute']
pair = (6, 'turned')

In [35]:
raw[2], text[3], pair[1]

('t', 'the', 'turned')

In [37]:
raw[-3:], text[-3:], pair[-3:]

('ute', ['off', 'the', 'spectroroute'], (6, 'turned'))

In [38]:
len(raw), len(text), len(pair)

(29, 5, 2)

As you can see above, we computed multiple values on a single line. These comma-separated expressions are also tuples.

### Operating on Sequence Types

Below is an example of converting a `FreqDist` to a sequence:

In [41]:
import nltk

In [42]:
raw = 'Red lorry, yellow lorry, red lorry, yellow lorry.'
text = nltk.word_tokenize(raw)
fdist = nltk.FreqDist(text)
sorted(fdist)

[',', '.', 'Red', 'lorry', 'red', 'yellow']

In [43]:
for key in fdist:
    print(key + ':', fdist[key], end='; ')

Red: 1; lorry: 4; ,: 3; yellow: 2; red: 1; .: 1; 

In the next example, we use tuples to re-arrange the contents of a list:

In [44]:
words = ['I', 'turned', 'off', 'the', 'spectroroute']

In [45]:
words[2], words[3], words[4] = words[3], words[4], words[2]

In [46]:
words

['I', 'turned', 'the', 'spectroroute', 'off']

Using `zip()`, we can zip together items in two or more sequences:

In [47]:
words

['I', 'turned', 'the', 'spectroroute', 'off']

In [48]:
tags = ['noun', 'verb', 'prep', 'det', 'noun']

In [49]:
zip(words, tags)

<zip at 0x1a1b8b2640>

In [50]:
list(zip(words, tags))

[('I', 'noun'),
 ('turned', 'verb'),
 ('the', 'prep'),
 ('spectroroute', 'det'),
 ('off', 'noun')]

Another sequence function, `enumerate()`, returns pairs consisting of an index and the item at that index:

In [51]:
list(enumerate(words))

[(0, 'I'), (1, 'turned'), (2, 'the'), (3, 'spectroroute'), (4, 'off')]

For some NLP tasks, we might want to cut up a sequence into two or more parts. For example, we might want to train a model on 90% of the data and test it on the remaining 10%. 

Here's how we do that by using a location to cut our data at:

In [52]:
text = nltk.corpus.nps_chat.words()
cut = int(0.9 * len(text))
training_data, test_data = text[:cut], text[cut:]

In [55]:
# Verifying that no data is lost
# or duplicated in the above process
text == training_data + test_data

True

In [56]:
# Verifying the ratio of sizes
len(training_data) / len(test_data)

9.0

### Combining Different Sequence Types

Here's an example of sorting the words in a string by their length using list comprehensions:

In [57]:
words = "I turned off the spectroroute".split()
wordlens = [(len(word), word) for word in words]
wordlens

[(1, 'I'), (6, 'turned'), (3, 'off'), (3, 'the'), (12, 'spectroroute')]

In [58]:
wordlens.sort()
" ".join(w for (_, w) in wordlens)

'I off the turned spectroroute'

### Generator Expressions

Take the example below, where we have a text that we want to normalise using a list comprehension.

In [59]:
text = '''"When I use a word," Humpty Dumpty said in rather a scornful tone,
 "it means just what I choose it to mean - neither more nor less."'''

In [62]:
print([w.lower() for w in nltk.word_tokenize(text)])

['``', 'when', 'i', 'use', 'a', 'word', ',', "''", 'humpty', 'dumpty', 'said', 'in', 'rather', 'a', 'scornful', 'tone', ',', '``', 'it', 'means', 'just', 'what', 'i', 'choose', 'it', 'to', 'mean', '-', 'neither', 'more', 'nor', 'less', '.', "''"]


If we want to process the words further, we can wrap the required function around the above list comprehension:

In [64]:
max([w.lower() for w in nltk.word_tokenize(text)])

'word'

Or we can use a generator expression, by omitting the `[]` as follows:

In [66]:
max(w.lower() for w in nltk.word_tokenize(text))

'word'

Using a generator expression has a practical advantage: the data is streamed to the calling function one item at a time. Instead of allocating storage for a whole list object, Python keeps only what the calling function needs, which in the above case is the word that comes latest in lexicographic sort order. Generator expressions are therefore more memory-efficient when we deal with potentially large amounts of data.
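The difference in storage can be illustrated roughly with `sys.getsizeof()`. This is only a sketch: `words` is a hypothetical stand-in for a large token list, and `sys.getsizeof()` measures the container itself, not the strings it refers to.

```python
import sys

words = ['word'] * 100000   # hypothetical stand-in for a large token list

as_list = [w.lower() for w in words]   # builds all 100,000 items up front
as_gen = (w.lower() for w in words)    # builds nothing until it is consumed

# The list object is far larger than the generator object.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True

largest = max(as_gen)   # items are produced one at a time for max()
print(largest)          # word
```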

## Questions of Style

### Procedural vs Declarative Style

Consider the following program to compute the average length of words in the Brown Corpus:


In [70]:
tokens = nltk.corpus.brown.words(categories='news')

In [71]:
count = 0
total = 0
for token in tokens:
    count += 1
    total += len(token)

In [72]:
total / count

4.401545438271973

The above is an example of a *procedural style*, dictating operations step by step. 

The example below computes the same thing:

In [73]:
total = sum(len(t) for t in tokens)
print(total/len(tokens))

4.401545438271973


The first line uses a generator expression to sum the token lengths, while the second line computes the average as before. While both of the programs above do the same thing, in the second instance, the implementation details of step by step computation are left to the Python interpreter. This is an example of the *declarative style* of programming.

Let's take a look at an extreme example:

In [None]:
word_list = []
i = 0
while i < len(tokens):
    j = 0
    while j < len(word_list) and word_list[j] <= tokens[i]:
        j += 1
    if j == 0 or tokens[i] != word_list[j-1]:
        word_list.insert(j, tokens[i])
    i += 1

The purpose of the above program might be obscure without a closer look, but its equivalent declarative version uses familiar built-in functions, and its purpose is instantly clear:

In [79]:
word_list = sorted(set(tokens))
print(word_list[:20])

['!', '$1', '$1,000', '$1,000,000,000', '$1,500', '$1,500,000', '$1,600', '$1,800', '$1.1', '$1.4', '$1.5', '$1.80', '$10', '$10,000', '$10,000-per-year', '$100', '$100,000', '$102,285,000', '$109', '$11.50']


The function `enumerate(s)` processes a sequence s and produces a tuple of the form (i, s[i]) for each item in s, starting with (0, s[0]).

Let's take a look at an example of enumerating over a frequency distribution to print the ranks of words according to their frequencies.

In [80]:
fd = nltk.FreqDist(nltk.corpus.brown.words())
cumulative = 0.0
most_common_words = [word for (word, count) in fd.most_common()]
for rank, word in enumerate(most_common_words):
    cumulative += fd.freq(word)
    print("%3d %6.2f%% %s" % (rank + 1, cumulative * 100, word))
    if cumulative > 0.25:
        break

  1   5.40% the
  2  10.42% ,
  3  14.67% .
  4  17.78% of
  5  20.19% and
  6  22.40% to
  7  24.29% a
  8  25.97% in


It might be tempting to use loop variables to store a maximum or minimum value seen so far. 

In [81]:
text = nltk.corpus.gutenberg.words('milton-paradise.txt')
longest = ''
for word in text:
    if len(word) > len(longest):
        longest = word

In [82]:
longest

'unextinguishable'

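A more transparent solution is to use two comprehensions: one to find the maximum length, and one to collect every word of that length. Note that the loop above finds only the first of the longest words.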

In [84]:
maxlen = max(len(word) for word in text)
maxlen

16

In [85]:
[word for word in text if len(word) == maxlen]

['unextinguishable',
 'transubstantiate',
 'inextinguishable',
 'incomprehensible']
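When a single longest word is enough, `max()` also accepts a `key` function. The sketch below runs on a short hypothetical word list rather than the Milton text. Like the loop version, `max()` returns only the first word of maximal length.

```python
# Hypothetical word list, used instead of the Gutenberg corpus
words = ['of', 'unextinguishable', 'man', 'transubstantiate', 'first']

longest = max(words, key=len)   # first item with the greatest length
print(longest)  # unextinguishable

maxlen = len(longest)
print([w for w in words if len(w) == maxlen])
# ['unextinguishable', 'transubstantiate']
```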

### Some Legitimate Uses for Counters

Let's take a look at a case where we want to use loop variables in a list comprehension. For example, using a loop variable to extract successive overlapping n-grams from a list:

In [86]:
sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
n = 3
[sent[i:i+n] for i in range(len(sent)-n+1)]

[['The', 'dog', 'gave'],
 ['dog', 'gave', 'John'],
 ['gave', 'John', 'the'],
 ['John', 'the', 'newspaper']]

For the above example, it makes more sense to use NLTK's `bigrams(text)` and `trigrams(text)`, or the general-purpose `ngrams(text, n)` function.

Below is an example where we use loop variables in building multidimensional structures. Let's build an array with m rows and n columns, where each cell is a set using a nested list comprehension:

In [87]:
m, n = 3, 7
array = [[set() for i in range(n)] for j in range(m)]
array

[[set(), set(), set(), set(), set(), set(), set()],
 [set(), set(), set(), set(), set(), set(), set()],
 [set(), set(), set(), set(), set(), set(), set()]]

In [89]:
array[2][5].add("Alice")
array

[[set(), set(), set(), set(), set(), set(), set()],
 [set(), set(), set(), set(), set(), set(), set()],
 [set(), set(), set(), set(), set(), {'Alice'}, set()]]

It is incorrect to do the above using multiplication, for reasons related to object references, as discussed in the Assignment section.

In [90]:
array = [[set()] * n] * m
array

[[set(), set(), set(), set(), set(), set(), set()],
 [set(), set(), set(), set(), set(), set(), set()],
 [set(), set(), set(), set(), set(), set(), set()]]

In [91]:
array[2][5].add(7)
array

[[{7}, {7}, {7}, {7}, {7}, {7}, {7}],
 [{7}, {7}, {7}, {7}, {7}, {7}, {7}],
 [{7}, {7}, {7}, {7}, {7}, {7}, {7}]]
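The cause can be confirmed with the `is` operator from the Equality section. In this sketch, `shared` and `independent` are names introduced here for the two ways of building the array.

```python
m, n = 3, 7

shared = [[set()] * n] * m
independent = [[set() for i in range(n)] for j in range(m)]

# With multiplication, every row is the same list and every cell the same set.
print(shared[0] is shared[1])        # True
print(shared[0][0] is shared[2][5])  # True

# With the nested comprehension, rows and cells are distinct objects.
print(independent[0] is independent[1])        # False
print(independent[0][0] is independent[0][1])  # False

# Count the distinct set objects in each structure.
print(len({id(cell) for row in shared for cell in row}))       # 1
print(len({id(cell) for row in independent for cell in row}))  # 21
```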

## Functions: The Foundation of Structured Programming

A function is a segment of code that can be given a meaningful name and which performs a well-defined task. Functions allow us to abstract away from the details, to see a bigger picture, and to program more effectively.

### Function Inputs and Outputs

We pass information to functions using a function's parameters. Here's an example function:


In [92]:
def repeat(msg, num):
    return " ".join([msg] * num)

In [93]:
monty = "Monty Python"

In [94]:
repeat(monty, 3)

'Monty Python Monty Python Monty Python'

It is not necessary to have any parameters:

In [95]:
def monty():
    return "Monty Python"

In [96]:
monty()

'Monty Python'

A function call can also appear as an argument to another function. To the calling program, it looks as if the function call had been replaced with the function's result:

In [97]:
repeat(monty(), 3)

'Monty Python Monty Python Monty Python'

In [98]:
repeat("Monty Python", 3)

'Monty Python Monty Python Monty Python'

A Python function is not required to have a return statement. Some functions do their work as a side effect, printing a result, modifying a file, or updating the contents of a parameter to the function (such functions are called "procedures" in some other programming languages).
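A function without a `return` statement still returns a value to the caller, namely `None`. A small sketch follows; `greet` is a hypothetical function used only for illustration.

```python
def greet(name):
    # No return statement: the function works by side effect (printing).
    print("Hello, " + name)

result = greet("Monty")   # prints: Hello, Monty
print(result)             # None
```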

### Parameter Passing

Earlier in the chapter, we saw that Python's assignment works on values, but the value of a structured object is a reference to that object. The same is true for functions.

In the following code, `set_up()` has two parameters, both of which are modified inside the function.

In [99]:
def set_up(word, properties):
    word = 'lolcat'
    properties.append('noun')
    properties = 5

Now we assign an empty string to w and an empty list to p.

In [101]:
w = ''
p = []

In [102]:
set_up(w, p)

In [103]:
w

''

In [104]:
p

['noun']

We notice that after calling the function, `w` is unchanged while `p` is changed. This is because Python's call-by-value parameter passing works like assignment.

The parameter passing for w in the above function `set_up()` is similar to the following series of assignments:

In [106]:
w = ''
word = w
word = 'lolcat'
w

''

In [109]:
p = []
properties = p
properties.append('noun')
properties = 5
p

['noun']

### Checking Parameter Types

Python does not require us to declare the type of a variable when we write a program (type annotations are optional and are not enforced at run time). An advantage of this is that functions are flexible about the type of their arguments.

However, we often want to write programs for later use by others, or to program in a defensive style. Look at the example below, where the author of the `tag()` function assumed that its argument would always be a string:

In [110]:
def tag(word):
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'

In [111]:
tag('the')

'det'

In [112]:
tag('knight')

'noun'

In [113]:
tag(["'Tis", 'but', 'a', 'scratch'])

'noun'

The above function works as expected when the argument is a string, but fails to complain and returns an incorrect result when we pass it a list. 

A solution to the above problem would be to use an `assert` statement:

In [116]:
def tag(word):
    assert isinstance(word, str), "argument to tag() must be a string"
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'

In [117]:
tag(["'Tis", 'but', 'a', 'scratch'])

AssertionError: argument to tag() must be a string
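Assertions are skipped when Python is run with the `-O` option, so a check that must always run can raise an exception explicitly instead. The following is a sketch of that alternative; `tag_strict` is a hypothetical name and not part of the original example.

```python
def tag_strict(word):
    # Raise explicitly so that the check also runs under `python -O`.
    if not isinstance(word, str):
        raise TypeError("argument to tag_strict() must be a string")
    if word in ['a', 'the', 'all']:
        return 'det'
    return 'noun'

print(tag_strict('the'))   # det

try:
    tag_strict(["'Tis", 'but', 'a', 'scratch'])
except TypeError as err:
    print(err)   # argument to tag_strict() must be a string
```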

#### Improving Functions

Consider the following function `freq_words()`. It updates the contents of a frequency distribution that is passed in as a parameter, and it also prints a list of the n most frequent words. 

In [121]:
from urllib import request
from bs4 import BeautifulSoup

def freq_words(url, freqdist, n):
    html = request.urlopen(url).read().decode('utf8')
    raw = BeautifulSoup(html, 'html.parser').get_text()
    for word in nltk.word_tokenize(raw):
        freqdist[word.lower()] += 1
    result = []
    for word, count in freqdist.most_common(n):
        result = result + [word]
    print(result)

In [122]:
constitution = "http://www.archives.gov/exhibits/charters/constitution_transcript.html"
fd = nltk.FreqDist()
freq_words(constitution, fd, 30)

["''", ',', ':', ':1', 'the', ';', '(', ')', '``', '{', '}', 'of', '?', 'url', 'https', '@', 'import', 'q93x43', "'", 'archives', '#', 'and', '.', '[', ']', 'national', 'a', 'documents', 'founding', 'to']


The above function has two side effects: it modifies the contents of its second parameter, and it prints results. The function would be easier to understand and re-use, if we initialize `FreqDist()` inside the function, and move the selection/display of results to the calling program. 

With that in mind, let's refactor the above function.

In [125]:
def freq_words(url, n):
    html = request.urlopen(url).read().decode('utf8')
    text = BeautifulSoup(html, 'html.parser').get_text()
    freqdist = nltk.FreqDist(word.lower() for word in nltk.word_tokenize(text))
    return [word for (word, _) in freqdist.most_common(n)]

In [127]:
print(freq_words(constitution, 30))

["''", ',', ':', ':1', 'the', ';', '(', ')', '``', '{', '}', 'of', '?', 'url', 'https', '@', 'import', 'q93x43', "'", 'archives', '#', 'and', '.', '[', ']', 'national', 'a', 'documents', 'founding', 'to']


### Documenting Functions

Below is an example of a complete docstring, consisting of a one-line summary, a detailed explanation, a doctest example, and Sphinx markup specifying the parameters, types, return type, and exceptions.

In [128]:
def accuracy(reference, test):
    """
    Calculate the fraction of test items that equal the corresponding reference items.

    Given a list of reference values and a corresponding list of test values,
    return the fraction of corresponding values that are equal.
    In particular, return the fraction of indexes
    0 <= i < len(test) such that test[i] == reference[i].

        >>> accuracy(['ADJ', 'N', 'V', 'N'], ['N', 'N', 'V', 'ADJ'])
        0.5

    :param reference: An ordered list of reference values
    :type reference: list
    :param test: A list of values to compare against the corresponding
        reference values
    :type test: list
    :return: the accuracy score
    :rtype: float
    :raises ValueError: If reference and test do not have the same length
    """

    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    num_correct = 0
    for x, y in zip(reference, test):
        if x == y:
            num_correct += 1
    return float(num_correct) / len(reference)

## Doing More with Functions

### Functions as Arguments

Functions can be passed as arguments to another function. This way, we can abstract out the operation and apply different operations to the same data. Take a look at the example below:

In [129]:
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
         'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

In [130]:
def extract_property(prop):
    return [prop(word) for word in sent]

In [131]:
extract_property(len)

[4, 4, 2, 3, 5, 1, 3, 3, 6, 4, 4, 4, 2, 10, 1]

In [132]:
def last_letter(word):
    return word[-1]

In [133]:
extract_property(last_letter)

['e', 'e', 'f', 'e', 'e', ',', 'd', 'e', 's', 'l', 'e', 'e', 'f', 's', '.']

Python provides one more way to define functions that are passed as arguments to other functions: **lambda expressions**.

In [134]:
extract_property(lambda w: w[-1])

['e', 'e', 'f', 'e', 'e', ',', 'd', 'e', 's', 'l', 'e', 'e', 'f', 's', '.']

In our next example, we pass a function to the `sorted()` function to supply our own sorting mechanism.

In [137]:
# Sorting without any additional parameters
print(sorted(sent))

[',', '.', 'Take', 'and', 'care', 'care', 'of', 'of', 'sense', 'sounds', 'take', 'the', 'the', 'themselves', 'will']


In [147]:
# Sorting in decreasing length
print(sorted(sent, key = lambda x: len(x), reverse=True))

['themselves', 'sounds', 'sense', 'Take', 'care', 'will', 'take', 'care', 'the', 'and', 'the', 'of', 'of', ',', '.']


### Accumulative Functions

These functions initialize some storage, iterate over input to build it up, and lastly, return the final object.

A standard way to do this is using an empty list as shown in `search1()` below:

In [148]:
def search1(substring, words):
    result = []
    for word in words:
        if substring in word:
            result.append(word)
    return result

In [151]:
for item in search1('zzl', nltk.corpus.brown.words()):
    print(item, end=" ")

Grizzlies' fizzled dazzler embezzling embezzlement nozzle drizzly puzzle puzzle dazzling Sizzling guzzle puzzles dazzling puzzler nozzles nozzle puzzle puzzle sizzling puzzled puzzle puzzle muzzle muzzle muzzle puzzles puzzles embezzle puzzled puzzled muzzle dazzling puzzling dazzling dazzling puzzles puzzling puzzling dazzle puzzle dazzling puzzled frazzled puzzling dazzled bedazzlement bedazzled nozzles nozzles dazzles puzzling puzzling puzzling puzzle muzzle puzzled nozzle puzzled dazzling nozzle grizzled muzzle puzzled puzzle muzzle drizzle drizzle drizzle sizzled puzzled puzzled puzzled frizzled drizzle drizzle drizzling drizzling puzzle puzzling puzzled puzzled dazzling muzzle muzzle muzzle sizzled puzzlement frizzling puzzled puzzled puzzled dazzling muzzles sizzle grizzly guzzled nuzzled Puzzled dazzled puzzled puzzling puzzled puzzled 

Another way to do this would be to use a generator function:

In [152]:
def search2(substring, words):
    for word in words:
        if substring in word:
            yield word

In [153]:
for item in search2('zzl', nltk.corpus.brown.words()):
    print(item, end=" ")

Grizzlies' fizzled dazzler embezzling embezzlement nozzle drizzly puzzle puzzle dazzling Sizzling guzzle puzzles dazzling puzzler nozzles nozzle puzzle puzzle sizzling puzzled puzzle puzzle muzzle muzzle muzzle puzzles puzzles embezzle puzzled puzzled muzzle dazzling puzzling dazzling dazzling puzzles puzzling puzzling dazzle puzzle dazzling puzzled frazzled puzzling dazzled bedazzlement bedazzled nozzles nozzles dazzles puzzling puzzling puzzling puzzle muzzle puzzled nozzle puzzled dazzling nozzle grizzled muzzle puzzled puzzle muzzle drizzle drizzle drizzle sizzled puzzled puzzled puzzled frizzled drizzle drizzle drizzling drizzling puzzle puzzling puzzled puzzled dazzling muzzle muzzle muzzle sizzled puzzlement frizzling puzzled puzzled puzzled dazzling muzzles sizzle grizzly guzzled nuzzled Puzzled dazzled puzzled puzzling puzzled puzzled 

Using a generator is more efficient, as the function only generates the data as it is required by the calling program, and does not need to allocate additional memory to store the output.
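The on-demand behaviour can be observed by taking only the first few results. This sketch repeats `search2()` so that it is self-contained, and uses a short hypothetical word list in place of the Brown Corpus.

```python
from itertools import islice

def search2(substring, words):
    for word in words:
        if substring in word:
            yield word

# Hypothetical word list standing in for the Brown Corpus
words = ['puzzle', 'cat', 'dazzle', 'dog', 'nozzle', 'muzzle']

matches = search2('zzl', words)        # nothing has been searched yet
first_two = list(islice(matches, 2))   # search only until two matches are found
print(first_two)   # ['puzzle', 'dazzle']

third = next(matches)   # the search resumes where it stopped
print(third)            # nozzle
```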

Here's another example of using a generator function to produce all permutations of a list of words:

In [157]:
def permutations(seq):
    if len(seq) <=1:
        yield seq
    else:
        for perm in permutations(seq[1:]):
            for i in range(len(perm)+1):
                yield perm[:i] + seq[0:1] + perm[i:]

In [158]:
list(permutations(['police', 'fish', 'buffalo']))

[['police', 'fish', 'buffalo'],
 ['fish', 'police', 'buffalo'],
 ['fish', 'buffalo', 'police'],
 ['police', 'buffalo', 'fish'],
 ['buffalo', 'police', 'fish'],
 ['buffalo', 'fish', 'police']]

### Higher-Order Functions

Let's define a function `is_content_word()` which checks whether a word is from the open class of content words. We will use this function as a parameter of the higher order function `filter()`, which applies the function to each item in the sequence contained in its second parameter. It only retains the items for which the function returns `True`.

In [159]:
def is_content_word(word):
    return word.lower() not in ['a', 'of', 'the', 'and', 'will', ',', '.']

In [160]:
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
         'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

In [161]:
list(filter(is_content_word, sent))

['Take', 'care', 'sense', 'sounds', 'take', 'care', 'themselves']

In [162]:
# Validating equivalence
[w for w in sent if is_content_word(w)]

['Take', 'care', 'sense', 'sounds', 'take', 'care', 'themselves']

`map()` is another higher-order function, which applies a function to every item in a sequence:


In [163]:
lengths = list(map(len, nltk.corpus.brown.sents(categories='news')))
sum(lengths) / len(lengths)

21.75081116158339

In [164]:
# Validating equivalence
lengths = [len(sent) for sent in nltk.corpus.brown.sents(categories='news')]
sum(lengths) / len(lengths)

21.75081116158339

We can also pass lambda expressions to higher-order functions, for example to count the number of vowels in each word.
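One possible version of such a vowel count is sketched below, using the same `sent` list as before. The name `vowel_counts` is introduced here, and the equivalent list comprehension is shown for comparison.

```python
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

# Count the vowels in each word with map() and a lambda expression.
vowel_counts = list(map(lambda w: sum(1 for c in w if c.lower() in "aeiou"), sent))
print(vowel_counts)
# [2, 2, 1, 1, 2, 0, 1, 1, 2, 1, 2, 2, 1, 3, 0]

# Equivalent list comprehension
print([sum(1 for c in w if c.lower() in "aeiou") for w in sent])
```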

### Named Arguments

We can refer to function parameters by name, or even assign them a default value in case the calling program does not provide one. These are called keyword arguments. 

If we mix keyword and unnamed parameters, we must ensure that unnamed parameters precede the named ones. 

In [167]:
def repeat(msg="<empty>", num=1):
    return msg * num

In [168]:
repeat(num=3)

'<empty><empty><empty>'

In [169]:
repeat(msg="Alice")

'Alice'

In [170]:
repeat(msg="Alice", num=3)

'AliceAliceAlice'
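The ordering rule for mixed calls can be checked as follows. This is a sketch: `compile()` is used only so that the invalid call can be shown without stopping the program, and the names `mixed` and `rejected` are introduced here.

```python
def repeat(msg="<empty>", num=1):
    return msg * num

# Unnamed argument first, then a named one: allowed.
mixed = repeat("Alice", num=3)
print(mixed)   # AliceAliceAlice

# Named argument first, then an unnamed one: rejected before the call runs.
try:
    compile('repeat(msg="Alice", 3)', "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)   # True
```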

We can define a function that takes an arbitrary number of unnamed and named parameters, and access them via a tuple of arguments `*args` and a dict of keyword arguments `**kwargs`.

In [171]:
def generic(*args, **kwargs):
    print(args)
    print(kwargs)

In [172]:
generic(1, "African swallow", monty="python")

(1, 'African swallow')
{'monty': 'python'}


When `*args` appears as a function parameter, it corresponds to all the unnamed parameters of the function.

Let's take an example of using the `zip()` function below:

In [173]:
song = [['four', 'calling', 'birds'],
         ['three', 'French', 'hens'],
         ['two', 'turtle', 'doves']]

In [174]:
list(zip(song[0], song[1], song[2]))

[('four', 'three', 'two'),
 ('calling', 'French', 'turtle'),
 ('birds', 'hens', 'doves')]

In [175]:
list(zip(*song))

[('four', 'three', 'two'),
 ('calling', 'French', 'turtle'),
 ('birds', 'hens', 'doves')]

It should be clear from the above example that typing `*song` is just a convenient shorthand, equivalent to typing out `song[0], song[1], song[2]`.
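The same shorthand is often used to split a list of pairs back into separate sequences. A small sketch follows; `pairs` is a hypothetical list of word/tag tuples in the style of the earlier `zip()` example.

```python
pairs = [('I', 'noun'), ('turned', 'verb'), ('off', 'prep')]

# Equivalent to zip(pairs[0], pairs[1], pairs[2])
words, tags = zip(*pairs)
print(words)  # ('I', 'turned', 'off')
print(tags)   # ('noun', 'verb', 'prep')
```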