# 4   Writing Structured Programs

# 4.1 Back to the basics 

In [2]:
empty = []
nested = [empty, empty, empty]
nested                        # [[], [], []]
nested[1].append('Python')
nested                        # [['Python'], ['Python'], ['Python']]
# All three slots refer to the same list object, so one append shows up in each

[['Python'], ['Python'], ['Python']]

In [3]:
nested = [[]] * 3
nested[1].append('Python')
nested[1] = ['Monty']
nested
[['Python'], ['Monty'], ['Python']]

[['Python'], ['Monty'], ['Python']]
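The sharing seen above happens because multiplying a list copies references to the inner list, not the inner list itself. A minimal sketch of one common way to get independent inner lists, using a list comprehension (the names `shared` and `independent` are only illustrative):

```python
# [[]] * 3 repeats a reference to ONE inner list three times.
shared = [[]] * 3
shared[1].append('Python')
print(shared)

# A list comprehension evaluates [] once per iteration,
# so each slot gets its own, separate list object.
independent = [[] for _ in range(3)]
independent[1].append('Python')
print(independent)
```

With the comprehension, appending to one slot leaves the other two untouched.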

In [4]:
# Assignment copies the object reference, not the list itself
foo = ['Monty', 'Python']
bar = foo
foo[1] = 'Bodkin'
bar

['Monty', 'Bodkin']

In [17]:
# id() returns an object's identity; names that refer to the same object give the same id()
id(bar)

1241205637448
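As a small sketch of how id() and the `is` operator reveal shared references (the names `baz`, `same_object` and `copied_object` are illustrative, and the numbers id() returns differ from run to run):

```python
foo = ['Monty', 'Python']
bar = foo            # bar refers to the same list object as foo
baz = foo[:]         # slicing builds a new (shallow) copy

same_object = id(foo) == id(bar)    # equivalent to: foo is bar
copied_object = baz is foo          # the copy is a different object

foo[1] = 'Bodkin'    # visible through bar, but not through baz
print(same_object, copied_object)
print(bar, baz)
```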

## Conditionals 
The functions all() and any() can be applied to a list (or other sequence) to check whether all or any items meet some condition:

In [18]:
sent = ['No', 'good', 'fish', 'goes', 'anywhere', 'without', 'a', 'porpoise', '.']
all(len(w) > 4 for w in sent)

False

In [19]:
any(len(w) > 4 for w in sent)

True

Python Sequences 
$$
\begin{array}{|l|l|}
\hline
\text{Python Expression} & \text{Comment} \\
\hline
\text{for item in s} & \text{iterate over the items of s} \\
\text{for item in sorted(s)} & \text{iterate over the items of s in order} \\
\text{for item in set(s)} & \text{iterate over unique elements of s} \\
\text{for item in reversed(s)} & \text{iterate over elements of s in reverse} \\
\text{for item in set(s).difference(t)} & \text{iterate over elements of s not in t} \\
\hline
\end{array}
$$
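To make the table concrete, here is a short sketch that runs each form on a tiny made-up sequence (`s` and `t` below are illustrative data, not from the corpus). Sets have no guaranteed order, so the set-based results are sorted before comparing.

```python
s = ['fish', 'goes', 'anywhere', 'fish']
t = ['goes']

in_order = [item for item in sorted(s)]                 # sorted copy of s
unique = sorted(item for item in set(s))                # duplicates removed
backwards = [item for item in reversed(s)]              # last item first
not_in_t = sorted(item for item in set(s).difference(t))

print(in_order)
print(unique)
print(backwards)
print(not_in_t)
```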

## Generator Expressions

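A generator expression gets rid of the intermediate list that a list comprehension builds, which is more efficient when the items are only consumed once.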

In [21]:
from nltk import word_tokenize
text = '''"When I use a word," Humpty Dumpty said in rather a scornful tone,
"it means just what I choose it to mean - neither more nor less."'''
[w.lower() for w in word_tokenize(text)]

['``',
 'when',
 'i',
 'use',
 'a',
 'word',
 ',',
 "''",
 'humpty',
 'dumpty',
 'said',
 'in',
 'rather',
 'a',
 'scornful',
 'tone',
 ',',
 "''",
 'it',
 'means',
 'just',
 'what',
 'i',
 'choose',
 'it',
 'to',
 'mean',
 '-',
 'neither',
 'more',
 'nor',
 'less',
 '.',
 "''"]

In [25]:
# If we want to process this further, we have a few options
# List comprehension
max([w.lower() for w in word_tokenize(text)])

'word'

In [24]:
# Generator expression
max(w.lower() for w in word_tokenize(text))
# Items are passed to max() one at a time, so no storage is needed for a list

'word'
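A rough way to see the storage difference, sketched with a made-up word list instead of the tokenized text. Note that sys.getsizeof() only measures the container object itself, not the strings it refers to, and the exact byte counts vary between Python versions.

```python
import sys

words = ['Word'] * 100000                  # illustrative data

as_list = [w.lower() for w in words]       # builds all 100,000 items up front
as_gen = (w.lower() for w in words)        # produces items only when asked

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes, gen_bytes)

# Both forms feed max() the same items, so the answer is identical.
result_list = max(as_list)
result_gen = max(as_gen)
print(result_list, result_gen)
```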

## Coding consistency 

The Python developers publish a [style guide](http://www.python.org/dev/peps/pep-0008/), known as PEP 8. Among other things, it recommends keeping lines shorter than 80 characters.

In [None]:
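# To split an expression across lines, wrap it in parentheses or end the line with a backslash.
# The trailing \ tells the interpreter that the statement continues on the next line;
# no comment may follow it on the same line.
if len(syllables) > 4 and len(syllables[2]) == 3 and \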
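Both continuation styles can be tried with a complete condition. The sketch below uses a made-up `syllables` list and a made-up third test, purely to have something runnable; it is not the condition from the original example.

```python
# Hypothetical data: a word split into syllables, for illustration only.
syllables = ['un', 'der', 'stand', 'a', 'bly']

# Option 1: parentheses let the condition span several lines.
if (len(syllables) > 4 and len(syllables[2]) == 5 and
        syllables[2][0] == 's'):
    paren_result = True
else:
    paren_result = False

# Option 2: a trailing backslash joins this line with the next one.
if len(syllables) > 4 and len(syllables[2]) == 5 and \
        syllables[2][0] == 's':
    backslash_result = True
else:
    backslash_result = False

print(paren_result, backslash_result)
```

The parenthesized form is usually the safer choice, since a stray space after the backslash is a syntax error.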


## Procedural vs Declarative Style

Procedural style spells out every step, close to the level of the machine's operations. For example:


In [26]:
import nltk.corpus
tokens = nltk.corpus.brown.words(categories='news')
count = 0
total = 0
for token in tokens:
    count += 1
    total += len(token)

Declarative style states the result we want and leaves the looping to built-in functions:


In [27]:
total = sum(len(t) for t in tokens)
print(total / len(tokens))

4.401545438271973


## Restricting functions 


In [None]:
def tag(word):
    # In Python 3, str replaces Python 2's basestring
    assert isinstance(word, str), "argument to tag() must be a string"
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'
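A quick sketch of what the assertion buys us, assuming Python 3 (so the type check uses str). Without the assert, passing a list would quietly return 'noun'; with it, the mistake is reported immediately. Keep in mind that assertions are skipped when Python runs with the -O flag.

```python
def tag(word):
    assert isinstance(word, str), "argument to tag() must be a string"
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'

print(tag('the'))
print(tag('cat'))

try:
    tag(['the'])                  # a list, not a string
except AssertionError as err:
    message = str(err)
    print('rejected:', message)
```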

## Documenting in python

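- Use triple quotes to delimit the docstring.
- Start with a one-line summary of the function's purpose.
- After a blank line, describe the behaviour in complete sentences.
- Use fields that begin with ':' (such as :param:, :type:, :return:) to document parameters, types, and return values.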

In [None]:
def accuracy(reference, test):
    """
    Calculate the fraction of test items that equal the corresponding reference items.

    Given a list of reference values and a corresponding list of test values,
    return the fraction of corresponding values that are equal.
    In particular, return the fraction of indexes
    {0<=i<len(test)} such that C{test[i] == reference[i]}.

        >>> accuracy(['ADJ', 'N', 'V', 'N'], ['N', 'N', 'V', 'ADJ'])
        0.5

    :param reference: An ordered list of reference values
    :type reference: list
    :param test: A list of values to compare against the corresponding
        reference values
    :type test: list
    :return: the accuracy score
    :rtype: float
    :raises ValueError: If reference and test do not have the same length
    """

    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    num_correct = 0
    for x, y in zip(reference, test):
        if x == y:
            num_correct += 1
    return float(num_correct) / len(reference)
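The `>>>` example inside a docstring is more than documentation: the standard doctest module can execute it and compare the result with the documented output. A minimal sketch, using a shortened version of the docstring above:

```python
import doctest

def accuracy(reference, test):
    """
    Calculate the fraction of test items that equal the reference items.

        >>> accuracy(['ADJ', 'N', 'V', 'N'], ['N', 'N', 'V', 'ADJ'])
        0.5
    """
    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    num_correct = sum(1 for x, y in zip(reference, test) if x == y)
    return num_correct / len(reference)

# Runs every ">>>" example in the docstring and prints a report only
# when the actual output differs from the documented one.
doctest.run_docstring_examples(accuracy, {'accuracy': accuracy})
```

Silence means the documented example still matches what the function returns.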

# 4.5 Doing more with functions 


In [30]:
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']
def extract_property(prop):
    return [prop(word) for word in sent]
#Gets the length of each word 
extract_property(len)

[4, 4, 2, 3, 5, 1, 3, 3, 6, 4, 4, 4, 2, 10, 1]

In [31]:
def last_letter(word):
    return word[-1]
extract_property(last_letter)

['e', 'e', 'f', 'e', 'e', ',', 'd', 'e', 's', 'l', 'e', 'e', 'f', 's', '.']

- Notice that parentheses are only used after a function name if we are invoking the function; when we are simply treating the function as an object these are omitted.

Python provides us with one more way to define functions as arguments to other functions, so-called **lambda expressions**. Supposing there was no need to use the above last_letter() function in multiple places, and thus no need to give it a name. We can equivalently write the following:

In [32]:
extract_property(lambda w: w[-1])

['e', 'e', 'f', 'e', 'e', ',', 'd', 'e', 's', 'l', 'e', 'e', 'f', 's', '.']
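Lambda expressions are also handy as the key argument of sorted(). A short sketch on a shortened sentence (the variable names are illustrative):

```python
sent = ['Take', 'care', 'of', 'the', 'sense']

# Sort by word length; the lambda is the sort key and never gets a name.
by_length = sorted(sent, key=lambda w: len(w))

# Sort by last letter, reusing the w[-1] idea from above.
by_last_letter = sorted(sent, key=lambda w: w[-1])

print(by_length)
print(by_last_letter)
```

sorted() is stable, so words with equal keys keep their original relative order.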

## Accumulation Functions 

- search2(): The first time this function is called, it gets as far as the yield statement and pauses. The calling program gets the first word and does any necessary processing. Once the calling program is ready for another word, execution of the function is continued from where it stopped, until the next time it encounters a yield statement. This approach is typically more efficient, as the function only generates the data as it is required by the calling program, and does not need to allocate additional memory to store the output.

- yield: hands each matching word straight to the caller (here, the print statement in the loop) instead of collecting all the words in a list in memory.

In [38]:
def search1(substring, words):
    result = []
    for word in words:
        if substring in word:
            result.append(word)
    return result

def search2(substring, words):
    for word in words:
        if substring in word:
            yield word

In [39]:
for item in search1('zz', nltk.corpus.brown.words()):
    print(item, end=" ")

Grizzlies' fizzled Rizzuto huzzahs dazzler jazz Pezza Pezza Pezza embezzling embezzlement pizza jazz Ozzie nozzle drizzly puzzle puzzle dazzling Sizzling guzzle puzzles dazzling jazz jazz Jazz jazz Jazz jazz jazz Jazz jazz jazz jazz Jazz jazz dizzy jazz Jazz puzzler jazz jazzmen jazz jazz Jazz Jazz Jazz jazz Jazz jazz jazz jazz Jazz jazz jazz jazz jazz jazz jazz jazz jazz jazz Jazz Jazz jazz jazz nozzles nozzle puzzle buzz puzzle blizzard blizzard sizzling puzzled puzzle puzzle muzzle muzzle muezzin blizzard Neo-Jazz jazz muzzle piazzas puzzles puzzles embezzle buzzed snazzy buzzes puzzled puzzled muzzle whizzing jazz Belshazzar Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie's Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie blizzard blizzards blizzard blizzard fuzzy Lazzeri Piazza piazza palazzi Piazza Piazza Palazzo Palazzo Palazzo Piazza Piazza Palazzo palazzo palazzo Palazzo Palazzo Piazza piazza piazza piazza Piazza Piazza Palazzo palazzo Piazza piazz

In [40]:
for item in search2('zz', nltk.corpus.brown.words()):
    print(item, end=" ")

Grizzlies' fizzled Rizzuto huzzahs dazzler jazz Pezza Pezza Pezza embezzling embezzlement pizza jazz Ozzie nozzle drizzly puzzle puzzle dazzling Sizzling guzzle puzzles dazzling jazz jazz Jazz jazz Jazz jazz jazz Jazz jazz jazz jazz Jazz jazz dizzy jazz Jazz puzzler jazz jazzmen jazz jazz Jazz Jazz Jazz jazz Jazz jazz jazz jazz Jazz jazz jazz jazz jazz jazz jazz jazz jazz jazz Jazz Jazz jazz jazz nozzles nozzle puzzle buzz puzzle blizzard blizzard sizzling puzzled puzzle puzzle muzzle muzzle muezzin blizzard Neo-Jazz jazz muzzle piazzas puzzles puzzles embezzle buzzed snazzy buzzes puzzled puzzled muzzle whizzing jazz Belshazzar Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie's Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie Lizzie blizzard blizzards blizzard blizzard fuzzy Lazzeri Piazza piazza palazzi Piazza Piazza Palazzo Palazzo Palazzo Piazza Piazza Palazzo palazzo palazzo Palazzo Palazzo Piazza piazza piazza piazza Piazza Piazza Palazzo palazzo Piazza piazz
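The pause-and-resume behaviour of search2() can be observed by stepping through the generator by hand with next(). A small sketch on a made-up word list instead of the Brown corpus:

```python
def search2(substring, words):
    for word in words:
        if substring in word:
            yield word

words = ['jazz', 'blues', 'puzzle', 'rock', 'fizz']   # illustrative data

matches = search2('zz', words)   # nothing has been searched yet
first = next(matches)            # runs until the first yield
second = next(matches)           # resumes after 'jazz', stops at the next match
rest = list(matches)             # drains whatever is left

print(first, second, rest)
```

Once drained, the generator is exhausted; calling search2() again creates a fresh one.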

## Higher-Order Functions

First, we define the function is_content_word() and use it as the first parameter of filter(), which applies the function to each item in the sequence contained in its second parameter, and only retains the items for which the function returns True.

In [41]:
def is_content_word(word):
    return word.lower() not in ['a', 'of', 'the', 'and', 'will', ',', '.']
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']
list(filter(is_content_word, sent))

['Take', 'care', 'sense', 'sounds', 'take', 'care', 'themselves']

In [43]:
#Which is the same as 
[w for w in sent if is_content_word(w)]

['Take', 'care', 'sense', 'sounds', 'take', 'care', 'themselves']

In [44]:
# Or we can use map() to apply a function to each sentence, here to get the average sentence length
lengths = list(map(len, nltk.corpus.brown.sents(categories='news')))
sum(lengths) / len(lengths)

21.75081116158339

In [63]:
sent = nltk.corpus.brown.sents(categories='news')

In [64]:
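# We can also pass a lambda expression instead of a named function
list(map(lambda w: len( filter(lambda c: c.lower() in "aeiou", w) ), sent) )
# This fails in Python 3: filter() returns a lazy iterator, which has no len();
# wrapping the filter() call in list() before calling len() avoids the error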

TypeError: object of type 'filter' has no len()
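The TypeError above goes away once the filter() result is turned into a list. A sketch of a working version, assuming the goal is to count the vowels in each word of a single sentence (so `sent` here is the word list from earlier, not the list of Brown sentences):

```python
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

# filter() returns an iterator in Python 3, so materialize it before len().
vowel_counts = list(map(
    lambda w: len(list(filter(lambda c: c.lower() in "aeiou", w))),
    sent))

# The same computation written as a list comprehension.
vowel_counts_lc = [len([c for c in w if c.lower() in "aeiou"]) for w in sent]

print(vowel_counts)
```

The list comprehension version is generally considered easier to read than nested map() and filter() calls.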

## Named Arguments

When there are a lot of parameters it is easy to get confused about the correct order. Instead we can refer to parameters by name, and even assign them a default value just in case one was not provided by the calling program. Now the parameters can be specified in any order, and can be omitted.



In [65]:

def repeat(msg='<empty>', num=1):
    return msg * num
repeat(num=3)

'<empty><empty><empty>'

- These are called **keyword arguments**. If we mix these two kinds of parameters, then we must ensure that the unnamed parameters precede the named ones.

- We can define a function that takes an arbitrary number of unnamed and named parameters, and access them via an "in-place list" of arguments **\*args** and an "in-place dictionary" of keyword arguments **\*\*kwargs**.

In [66]:
def generic(*args, **kwargs):
    print(args)
    print(kwargs)

generic(1, "African swallow", monty="python")

(1, 'African swallow')
{'monty': 'python'}
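One common use of \*args and \*\*kwargs is to pass along whatever arguments were received to another function. A minimal sketch, where `shout()` is a hypothetical wrapper around the repeat() function from above:

```python
def repeat(msg='<empty>', num=1):
    return msg * num

def shout(*args, **kwargs):
    # Forward the collected arguments to repeat(), then upper-case the result.
    return repeat(*args, **kwargs).upper()

a = shout('spam', num=2)     # unnamed argument first, then a named one
b = shout(num=3)             # msg falls back to its default
c = shout('ni', 2)           # both passed positionally

print(a, b, c)
```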


In [69]:
# We'll use the variable name *song to demonstrate that there's nothing special about the name *args.

song = [['four', 'calling', 'birds'],
        ['three', 'French', 'hens'],
        ['two', 'turtle', 'doves']]
list(zip(song[0], song[1], song[2]))

[('four', 'three', 'two'),
 ('calling', 'French', 'turtle'),
 ('birds', 'hens', 'doves')]

In [None]:
list(zip(*song))

- It should be clear from the above example that typing *song is just a convenient shorthand, and equivalent to typing out song[0], song[1], song[2].
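Unpacking with \* also works when the number of rows is not known in advance, and applying zip() twice gets back to the original rows. A short sketch (the names `columns` and `rows_again` are illustrative):

```python
song = [['four', 'calling', 'birds'],
        ['three', 'French', 'hens'],
        ['two', 'turtle', 'doves']]

columns = list(zip(*song))          # same as zip(song[0], song[1], song[2])
rows_again = list(zip(*columns))    # transposing twice restores the rows

print(columns)
print(rows_again)
```

Note that zip() produces tuples, so `rows_again` holds tuples rather than lists.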