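How can you write well-structured, readable programs that you and others will be able to reuse easily?
How do the fundamental building blocks work, such as loops, functions, and assignment?
What are some of the pitfalls of Python programming, and how can you avoid them?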

In [2]:
python = ['Python']
snake_nest = [python] * 5
[id(snake) for snake in snake_nest]

[2221267693064, 2221267693064, 2221267693064, 2221267693064, 2221267693064]

In [5]:
import random
size = 5 
position = random.choice(range(size))
snake_nest[position] = ['Python']
snake_nest
[id(snake) for snake in snake_nest]

[2221268030408, 2221267693064, 2221267945032, 2221267693064, 2221267693064]

In [6]:
words = ['I', 'turned', 'off', 'the', 'spectroroute']
tags = ['noun', 'verb', 'prep', 'det', 'noun']
list(zip(words, tags))

[('I', 'noun'),
 ('turned', 'verb'),
 ('off', 'prep'),
 ('the', 'det'),
 ('spectroroute', 'noun')]

In [7]:
list(enumerate(words))

[(0, 'I'), (1, 'turned'), (2, 'off'), (3, 'the'), (4, 'spectroroute')]

In [12]:
import nltk
text = nltk.corpus.nps_chat.words()
cut = int(0.9 * len(text))
training_data, test_data = text[:cut], text[cut:]
cut

40509

In [11]:
text == training_data + test_data
len(training_data) / len(text)

0.9

In [14]:
# Generator expressions
from nltk import word_tokenize
text = '''"When I use a word," Humpty Dumpty said in rather a scornful tone,
... "it means just what I choose it to mean - neither more nor less."'''
[w.lower() for w in word_tokenize(text)]

['``',
 'when',
 'i',
 'use',
 'a',
 'word',
 ',',
 "''",
 'humpty',
 'dumpty',
 'said',
 'in',
 'rather',
 'a',
 'scornful',
 'tone',
 ',',
 '...',
 '``',
 'it',
 'means',
 'just',
 'what',
 'i',
 'choose',
 'it',
 'to',
 'mean',
 '-',
 'neither',
 'more',
 'nor',
 'less',
 '.',
 "''"]

In [17]:
# A generator expression feeds max() without building an intermediate list
max(w.lower() for w in word_tokenize(text))

'word'

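Arguments passed by name are called keyword arguments. If we mix the two kinds of arguments, we must make sure that the unnamed ones come before the named ones. It has to be this way, since unnamed arguments are matched by position. We can define a function that takes an arbitrary number of unnamed and named arguments, and access them via a tuple of positional arguments, *args, and a dictionary of keyword arguments, **kwargs.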

In [20]:
def generic(*args, **kwargs):
    print(args)
    print(kwargs)
    
generic(1, "African swallow", monty="python")

(1, 'African swallow')
{'monty': 'python'}


In [21]:
generic("African swallow", monty="python")

('African swallow',)
{'monty': 'python'}
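
The ordering rule mentioned above (unnamed arguments before named ones) can be illustrated with a minimal sketch. The function `repeat` and its parameters are hypothetical names chosen for this example only.

```python
def repeat(msg, num=1):
    """Return msg repeated num times, separated by spaces (hypothetical helper)."""
    return " ".join([msg] * num)

# A positional argument followed by a keyword argument is fine.
print(repeat("Monty", num=3))

# When every argument is named, the order no longer matters.
print(repeat(num=2, msg="Python"))

# Writing repeat(num=2, "Python") would be rejected with a SyntaxError,
# because a positional argument may not follow a keyword argument.
```

Naming the arguments at the call site makes the call self-documenting, at the cost of tying callers to the parameter names.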


## Debugging Techniques

In [27]:
def find_words(text, wordlength, result=[]):
    for word in text:
        if len(word) == wordlength:
            result.append(word)
    return result

In [28]:
import pdb
find_words(['cat'], 3)  # first run

['cat']

In [29]:
pdb.run("find_words(['dog'], 3)")  # second run

> <string>(1)<module>()
(Pdb) step
--Call--
> <ipython-input-27-4465ec416688>(1)find_words()
-> def find_words(text, wordlength, result=[]):
(Pdb) call
*** NameError: name 'call' is not defined
(Pdb) args
text = ['dog']
wordlength = 3
result = ['cat']
(Pdb) wordlength
3
(Pdb) return
--Return--
> <ipython-input-27-4465ec416688>(5)find_words()->['cat', 'dog']
-> return result
(Pdb) stop
*** NameError: name 'stop' is not defined
(Pdb) q
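
The session above shows `result` already holding `'cat'` when the second call starts. This happens because a default value is evaluated once, when the function is defined, so every call that omits `result` shares the same list. One common way to avoid this, sketched here under the hypothetical name `find_words_fixed`, is to use `None` as the default and create the list inside the function.

```python
def find_words_fixed(text, wordlength, result=None):
    # Hypothetical variant of find_words: a fresh list is created on each
    # call unless the caller explicitly passes one in.
    if result is None:
        result = []
    for word in text:
        if len(word) == wordlength:
            result.append(word)
    return result

print(find_words_fixed(['cat'], 3))
print(find_words_fixed(['dog'], 3))  # no leftover 'cat' from the first call
```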


## Defensive Programming

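To avoid some of the pain of debugging, it helps to adopt defensive programming habits. Instead of writing a 20-line program and then testing it, build it bottom-up out of small pieces that are known to work. Each time you combine these pieces into a larger unit, check carefully that it works as expected. Consider adding assert statements to your code to specify properties of a variable, e.g., assert(isinstance(text, list)). If the value of text ever becomes a string when your code is used in some larger context, this will raise an AssertionError, and you will be notified of the problem immediately.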
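
As a minimal sketch of the assert idea, the hypothetical function `count_tokens` below checks its input before using it. The function name and the error message are assumptions made for illustration.

```python
def count_tokens(text):
    # Hypothetical helper: fail fast if the caller passes a string
    # instead of a list of tokens.
    assert isinstance(text, list), "text must be a list of tokens"
    return len(text)

print(count_tokens(['a', 'list', 'of', 'tokens']))

try:
    count_tokens('a plain string')
except AssertionError as err:
    print("AssertionError:", err)
```

Note that Python skips assert statements when run with the `-O` option, so they suit development-time checks rather than validation of user input.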

## Algorithm Design

In [30]:
def factorial1(n):
    # Iterative version: multiply 1 * 2 * ... * n
    result = 1
    for i in range(n):
        result *= (i+1)
    return result

def factorial2(n):
    # Recursive version; n <= 1 as the base case also terminates for n == 0
    if n <= 1:
        return 1
    else:
        return n * factorial2(n-1)

In [37]:
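def size1(s):
    # Recursive: count this synset plus the size of each hyponym subtree
    return 1 + sum(size1(child) for child in s.hyponyms())

def size2(s):
    # Iterative: walk the hierarchy one layer at a time
    layer = [s]                  # start with a layer containing only s
    total = 0
    while layer:
        total += len(layer)      # count the synsets in the current layer
        layer = [h for c in layer for h in c.hyponyms()]  # move to the next layer
    return total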

In [38]:
from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
size1(dog)

190

In [41]:
# Time and space

import re


def raw(file):
    # Read the file, replace markup tags with spaces, and collapse whitespace
    with open(file) as f:
        contents = f.read()
    contents = re.sub(r'<.*?>', ' ', contents)
    contents = re.sub(r'\s+', ' ', contents)
    return contents

def snippet(doc, term):
    text = ' '*30 + raw(doc) + ' '*30
    pos = text.index(term)
    return text[pos-30:pos+30]

print("Building Index...")
files = nltk.corpus.movie_reviews.abspaths()
idx = nltk.Index((w, f) for f in files for w in raw(f).split())

query = ''
while query != "quit":
    query = input("query> ")     # use raw_input() in Python 2
    if query in idx:
        for doc in idx[query]:
            print(snippet(doc, query))
    else:
        print("Not found")

Building Index...
query> zxkjchaskjfsa
Not found
query> quit
s funded by her mother . lucy quit working professionally 10
erick . i disliked that movie quite a bit , but since " prac
t disaster . babe ruth didn't quit baseball after one season
o-be fiance . i think she can quit that job and get a more r
 and rose mcgowan should just quit acting . she has no chari
and get a day job . and don't quit it .                     
 kubrick , alas , should have quit while he was ahead . this
everyone involved should have quit while they were still ahe
l die . so what does joe do ? quit his job , of course ! ! w
red " implant . he's ready to quit the biz and get a portion
hat he always recorded , they quit and become disillusioned 
 admit that i ? ? ? ve become quite the " scream " fan . no 
 again , the fact that he has quit his job to feel what it's
school reunion . he has since quit his job as a travel journ
ells one of his friends , " i quit school because i didn't l
ms , cursing off the bos

In [42]:
# Dynamic programming
from numpy import arange
from matplotlib import pyplot


In [46]:
colors = 'rgbcmyk' # red, green, blue, cyan, magenta, yellow, black

def bar_chart(categories, words, counts):
    "Plot a bar chart showing counts for each word by category"
    
    ind = arange(len(words))
    width = 1 / (len(categories) + 1)
    
    bar_groups = []
    for c in range(len(categories)):
        bars = pyplot.bar(ind+c*width, counts[categories[c]], width,
                         color=colors[c % len(colors)])
        bar_groups.append(bars)
        
    pyplot.xticks(ind+width, words)
    pyplot.legend([b[0] for b in bar_groups], categories, loc='upper left')
    pyplot.ylabel('Frequency')
    pyplot.title('Frequency of Six Modal Verbs by Genre')
    pyplot.show()
    
# bar_chart()

TypeError: bar_chart() missing 3 required positional arguments: 'categories', 'words', and 'counts'

In [44]:
from matplotlib import use, pyplot

In [45]:
use('Agg')
pyplot.savefig('modals.png')