In [57]:
import nltk
from nltk import word_tokenize
from nltk import FreqDist # need this to access "most_common" method

import urllib2

from IPython.display import Image

import re

import random

## 4.1   Back to the Basics

### Value Assignment

#### Assignment appears to behave differently for strings and lists:

In [58]:
foo = 'Monty'
bar = foo
foo = 'Python'
bar

'Monty'

With strings, assigning foo to bar makes bar refer to the string 'Monty'. Because strings are immutable, bar behaves like an independent copy of foo: when we rebind foo to the new value 'Python', bar is not affected.

#### Now let's look at lists

In [59]:
foo = ['Monty', 'Python']
bar = foo
foo[1] = 'zzzzz'
bar

['Monty', 'zzzzz']

Why did this happen? When we assign foo to bar, we do not copy the contents of the list. We copy an "object reference", so bar refers to the same list object in memory as foo. A change made to that list through foo is therefore visible through bar.

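#### To summarize: with a string, bar behaves like a copy of the value; with a list, bar holds a reference to the same object, so in-place changes made through foo also show up in bar.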
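
The two cases can be put side by side. The sketch below is illustrative only: the names `alias` and `clone` are not from the notebook, `is` is used to check whether two names refer to the same object, and the slice `foo[:]` is one common way to make a shallow copy of a list.

```python
# Rebinding a name: bar keeps referring to the original string.
foo = 'Monty'
bar = foo
foo = 'Python'            # foo now refers to a different string object
print(bar)                # Monty

# Mutating a shared list: every name bound to it sees the change.
foo = ['Monty', 'Python']
alias = foo               # a second name for the same list object
clone = foo[:]            # shallow copy: a new list holding the same items
foo[1] = 'zzzzz'
print(alias)              # ['Monty', 'zzzzz']
print(clone)              # ['Monty', 'Python']
print(alias is foo)       # True
print(clone is foo)       # False
```

If an independent list is what you want, copy it explicitly as `clone` does here; plain assignment never copies.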

#### To further illustrate how lists work, here is an example:

In [60]:
empty = []

# build a new list that holds three references to the same empty list
nested = [empty, empty, empty] 
nested

[[], [], []]

Now let's try to change only the second empty list in "nested":

In [61]:
nested[1].append('Python')
nested

[['Python'], ['Python'], ['Python']]

#### We only changed the second empty list, but all three were updated. This is because every entry in "nested" refers to the same object (the same memory location).

#### Exercise

In [62]:
nested = [[]]*3
nested

[[], [], []]

In [63]:
nested[0].append('zzz')
nested

[['zzz'], ['zzz'], ['zzz']]

In [64]:
for list_obj in nested:
    print(id(list_obj))

4518160360
4518160360
4518160360


Here we can see that all three inner lists have the same id, so they are one and the same object.
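
For contrast, here is a minimal sketch of one way to get three separate inner lists. It assumes a list comprehension is acceptable for the purpose; the name `independent` is made up for this example.

```python
# The expression [] is evaluated once per iteration,
# so each inner list is a distinct object.
independent = [[] for _ in range(3)]
independent[0].append('zzz')
print(independent)          # [['zzz'], [], []]

# Three live inner lists, three different ids.
distinct_ids = set(id(inner) for inner in independent)
print(len(distinct_ids))    # 3
```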

#### Modifying an object via an object reference vs. overwriting an object reference

The previous example modifies an object via an object reference.

Now let's **overwrite** the object reference with a new object:

In [65]:
nested = [[]]*3
# nested[0].append('zzz')
nested

[[], [], []]

In [66]:
nested[1] = ['7777']
nested

[[], ['7777'], []]

#### The difference between append and assignment: append does not change the object reference, it only adds a value to the existing object. Assignment overwrites the object reference, so that position now points to a new object.
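
The same point can be checked with `is`. This is only a sketch: `demo` and `shared` are illustrative names, and `shared` simply keeps a handle on the original inner list so we can compare against it.

```python
demo = [[]] * 3
shared = demo[0]            # handle on the one inner list all slots share

demo[1].append('zzz')       # mutates the shared object in place
print(demo)                 # [['zzz'], ['zzz'], ['zzz']]
print(demo[1] is shared)    # True

demo[1] = ['7777']          # rebinds slot 1 to a brand-new list
print(demo)                 # [['zzz'], ['7777'], ['zzz']]
print(demo[1] is shared)    # False
```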

### Equality

In [67]:
size = 5
python = ['Python']
nested_list = [python] * size


In [68]:
nested_list[0] == nested_list[1] == nested_list[2] == nested_list[3] == nested_list[4]

True

In [69]:
nested_list[0] is nested_list[1] is nested_list[2] is nested_list[3] is nested_list[4]

True
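
Both checks return True above because every slot holds the same object. The sketch below, with the made-up names `item` and `items`, shows a case where the two operators disagree: `==` compares contents, while `is` compares object identity.

```python
item = ['Python']
items = [item] * 5

# Replace one slot with a new list that has the same contents.
items[2] = ['Python']

print(items[0] == items[2])   # True: same contents
print(items[0] is items[2])   # False: different objects
print(items[0] is items[1])   # True: still the shared object
```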

### Conditionals

#### An if statement treats a nonempty string or list as true, and an empty string or list as false.

In [73]:
list_ = ['nonempty', '', 'oh nonempty again', '', '']
for ele in list_:
    if ele:
        print(ele)

nonempty
oh nonempty again
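
The truth value that `if` sees can be inspected directly with the built-in `bool()`. A small sketch, using the illustrative names `samples` and `truthy`:

```python
samples = ['text', '', ['item'], []]

# bool() shows how each value would be treated by an if statement.
truth_values = [bool(value) for value in samples]
print(truth_values)   # [True, False, True, False]

# Filtering with a bare "if value" keeps only the nonempty entries.
truthy = [value for value in samples if value]
print(truthy)         # ['text', ['item']]
```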


#### Difference between if...elif and if...if
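
In an if...elif chain, Python stops at the first condition that is true and skips the remaining branches. Two separate if statements are tested independently, so both bodies can run. The sketch below illustrates this with two hypothetical helper functions, `with_elif` and `with_two_ifs`, which are not part of the original notebook.

```python
def with_elif(word):
    """if...elif: at most one branch runs."""
    labels = []
    if len(word) < 5:
        labels.append('short')
    elif word.islower():          # skipped when the first test succeeds
        labels.append('lowercase')
    return labels


def with_two_ifs(word):
    """if...if: each condition is tested on its own."""
    labels = []
    if len(word) < 5:
        labels.append('short')
    if word.islower():
        labels.append('lowercase')
    return labels


print(with_elif('cat'))       # ['short']
print(with_two_ifs('cat'))    # ['short', 'lowercase']
```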

### Sequence

#### Zip function

zip() takes the items of two or more sequences and "zips" them together into a single sequence of tuples. Wrapping the call in list() gives a list of tuples in both Python 2 and Python 3.

In [70]:
words = ['I', 'turned', 'off', 'the', 'spectroroute']
tags = ['noun', 'verb', 'prep', 'det', 'noun']

In [71]:
list(zip(words, tags))

[('I', 'noun'),
 ('turned', 'verb'),
 ('off', 'prep'),
 ('the', 'det'),
 ('spectroroute', 'noun')]
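
In practice the pairs are often unpacked directly in a loop. The sketch below reuses the same `words` and `tags`; the `word/tag` string format and the name `tagged` are just choices made for this illustration.

```python
words = ['I', 'turned', 'off', 'the', 'spectroroute']
tags = ['noun', 'verb', 'prep', 'det', 'noun']

# Unpack each (word, tag) tuple in the loop header.
tagged = ['%s/%s' % (word, tag) for word, tag in zip(words, tags)]
print(tagged)
# ['I/noun', 'turned/verb', 'off/prep', 'the/det', 'spectroroute/noun']

# zip() stops as soon as the shortest input runs out.
shorter = list(zip(words, tags[:2]))
print(shorter)   # [('I', 'noun'), ('turned', 'verb')]
```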

#### enumerate

"enumerate()" returns pairs consisting of an index and the item at that index.

In [72]:
list(enumerate(words))

[(0, 'I'), (1, 'turned'), (2, 'off'), (3, 'the'), (4, 'spectroroute')]
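
A typical use is a loop that needs both the position and the item. A short sketch with the same `words` list; `long_positions` and the length threshold of three characters are arbitrary choices for the example.

```python
words = ['I', 'turned', 'off', 'the', 'spectroroute']

# Unpack the (index, item) pair in the loop header.
for index, word in enumerate(words):
    print('%d %s' % (index, word))

# Positions of the words longer than three characters.
long_positions = [i for i, w in enumerate(words) if len(w) > 3]
print(long_positions)   # [1, 4]
```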