# 3.1 Data Structures and Sequences

## 3.1.1 Tuple

A tuple is a fixed-length, immutable sequence of Python objects

In [2]:
tup = 4, 5, 6
tup

(4, 5, 6)

In [3]:
tup = (4, 5, 6)
tup

(4, 5, 6)

In [5]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

In [6]:
tuple([4, 0, 2])

(4, 0, 2)

In [8]:
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

In [9]:
tup[0]

's'

While the objects stored in a tuple may be mutable themselves, once the tuple is created, it is not possible to modify which object is stored in each slot

In [11]:
tup = tuple(['foo', [1,2], True])
tup

('foo', [1, 2], True)

In [13]:
try: 
    tup[2] = True
except TypeError:
    print('tuples are immutable')

tuples are immutable


In [12]:
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in place 

In [14]:
tup[1]

[1, 2]

In [15]:
tup[1].append(3)
tup

('foo', [1, 2, 3], True)

In [16]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

In [17]:
('foo','bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

#### Unpacking tuples

In [18]:
tup = (4, 5, 6)
tup

(4, 5, 6)

In [19]:
a, b, c = tup
a

4

In [20]:
b

5

In [21]:
c

6

In [22]:
tup = 4, 5, (6, 7)
tup

(4, 5, (6, 7))

In [23]:
a, b, (c, d) = tup

In [24]:
c

6

In [25]:
d

7

In [26]:
# swap the values of a and b using a temporary variable
tmp = a
a = b
b = tmp 

In [27]:
# assign two variables at once with tuple unpacking
a, b = 1, 2

In [28]:
a

1

In [29]:
b

2

In [30]:
b, a = a, b  # swap the values via tuple packing and unpacking

In [31]:
a

2

In [32]:
b

1

A common use of variable unpacking is iterating over sequences of tuples or lists 

In [33]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print('a = {0}, b = {1}, c = {2}'.format(a, b, c))

a = 1, b = 2, c = 3
a = 4, b = 5, c = 6
a = 7, b = 8, c = 9


In [34]:
seq[0]

(1, 2, 3)

In [35]:
values = 1, 2, 3, 4, 5
a, b, *rest = values 

In [36]:
a, b

(1, 2)

In [37]:
rest

[3, 4, 5]

In [38]:
a, b, *_ = values

In [39]:
_

[3, 4, 5]
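The starred name does not have to come last, and it always receives a list, possibly an empty one. Here is a short sketch; the names first, middle, and last are only illustrative:

```python
values = 1, 2, 3, 4, 5

# the starred name collects whatever the other targets leave over
first, *middle, last = values
print(first, middle, last)  # 1 [2, 3, 4] 5

# when nothing is left over, the starred name is an empty list
a, b, *rest = (1, 2)
print(rest)  # []
```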

#### Tuple methods

In [40]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2) # count the number of occurrences of a value 

4

## 3.1.2 List

Lists are variable-length and their contents can be modified in place

In [44]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [45]:
b_list[1] = 'peekaboo'
b_list

['foo', 'peekaboo', 'baz']

Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions

The list function is frequently used in data processing as a way to materialize an iterator or generator expression

In [46]:
gen = range(10)
gen

range(0, 10)

In [47]:
type(gen)

range

In [48]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#### Adding and removing elements

In [49]:
b_list

['foo', 'peekaboo', 'baz']

In [50]:
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [51]:
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

insert is computationally expensive compared with append, because references to subsequent elements have to be shifted internally to make room for the new element. If you need to insert elements at both the beginning and end of a sequence, consider collections.deque, a double-ended queue designed for this purpose
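A minimal sketch of collections.deque: appendleft and popleft work on the front of the sequence the same way append and pop work on the back. The starting contents are illustrative:

```python
from collections import deque

d = deque(['peekaboo', 'baz'])
d.appendleft('foo')  # add at the front
d.append('dwarf')    # add at the back
print(d)             # deque(['foo', 'peekaboo', 'baz', 'dwarf'])

front = d.popleft()  # 'foo'
back = d.pop()       # 'dwarf'
print(list(d))       # ['peekaboo', 'baz']
```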

In [52]:
b_list.pop(2)  # pop, the inverse of insert, removes and returns the element at a particular index

'peekaboo'

In [53]:
b_list

['foo', 'red', 'baz', 'dwarf']

In [54]:
b_list.append('foo')
b_list

['foo', 'red', 'baz', 'dwarf', 'foo']

In [55]:
b_list.remove('foo')  # remove locates the first occurrence of a value and removes it from the list
b_list

['red', 'baz', 'dwarf', 'foo']

In [56]:
'dwarf' in b_list

True

In [57]:
'dwarf' not in b_list

False

Checking whether a list contains a value is a lot slower than doing so with dicts and sets, because Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time on average
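A sketch of the usual remedy when many membership tests are needed: build a set once and query the set instead of the list. The data here is made up for illustration:

```python
names = ['foo', 'red', 'baz', 'dwarf'] * 1000  # made-up data: 4000 items
queries = ['dwarf', 'missing', 'foo']

# each `q in names` would scan the list; building a set once turns
# every later lookup into an average constant-time hash lookup
name_set = set(names)
found = [q for q in queries if q in name_set]
print(found)  # ['dwarf', 'foo']
```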

#### Concatenating and combining lists 

In [58]:
[4,None,'foo'] + [7,8,(2,3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [59]:
[4,None,'foo'] * 4

[4, None, 'foo', 4, None, 'foo', 4, None, 'foo', 4, None, 'foo']

In [61]:
x = [4,None,'foo']
x.extend([7,8,(2,3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

In [65]:
x = [4, None, 'foo']
x.append([7, 8, (2, 3)])
x

[4, None, 'foo', [7, 8, (2, 3)]]

List concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if building up a large list, is usually preferable 

In [None]:
list_of_lists = [[1, 2], [3, 4, 5], [6]]  # hypothetical example data

# fast
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

In [None]:
# slow
everything = []
for chunk in list_of_lists:
    everything = everything + chunk

#### Sorting 

In [67]:
a = [7, 2, 5, 1, 3]
a

[7, 2, 5, 1, 3]

In [68]:
# sort the list in-place (without creating a new object)
a.sort()
a

[1, 2, 3, 5, 7]

In [69]:
b = ['saw','small','He','foxes','six']
b

['saw', 'small', 'He', 'foxes', 'six']

In [72]:
b.sort(key = len)
b

['He', 'saw', 'six', 'small', 'foxes']

#### Binary search and maintaining a sorted list 

bisect.bisect finds the location where an element should be inserted to keep it sorted, while bisect.insort actually inserts the element into that location

In [73]:
import bisect

c = [1,2,2,2,3,4,7]
c

[1, 2, 2, 2, 3, 4, 7]

In [74]:
bisect.bisect(c, 2)

4

In [75]:
bisect.bisect(c, 5)

6

In [76]:
bisect.insort(c, 6)

In [77]:
c

[1, 2, 2, 2, 3, 4, 6, 7]

The bisect module functions do not check whether the list is sorted, as doing so would be computationally expensive. Thus, using them with an unsorted list will succeed without error but may lead to incorrect results
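Two details are easier to see in code. First, bisect.bisect is the same as bisect.bisect_right, which places the new value after any equal entries, while bisect.bisect_left places it before them. Second, an unsorted list is accepted without complaint. A sketch:

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]

# bisect_right (alias bisect): insertion point after the equal items
right = bisect.bisect_right(c, 2)  # 4
# bisect_left: insertion point before the equal items
left = bisect.bisect_left(c, 2)    # 1

# an unsorted list raises no error, but the result is meaningless
unsorted = [7, 1, 4, 2]
pos = bisect.bisect(unsorted, 3)   # 2
bisect.insort(unsorted, 3)
print(unsorted)                    # [7, 1, 3, 4, 2] -- still unsorted
```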

#### Slicing 

In [78]:
seq = [7,2,3,7,5,6,0,1]
seq[1:5]

[2, 3, 7, 5]

In [79]:
seq[3:4] = [6,3]
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

In [80]:
seq[:5]

[7, 2, 3, 6, 3]

In [81]:
seq[3:]

[6, 3, 5, 6, 0, 1]

In [82]:
seq[-4:]

[5, 6, 0, 1]

In [83]:
seq[-6:-2]

[6, 3, 5, 6]

Take every other element with a step of 2

In [84]:
seq[::2]

[7, 3, 3, 6, 1]

Reverse a list or tuple with a step of -1

In [85]:
seq[::-1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]
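The start-inclusive, stop-exclusive convention gives slices a few handy invariants. This sketch checks them on seq as it stands after the slice assignment above:

```python
seq = [7, 2, 3, 6, 3, 5, 6, 0, 1]

# start is included and stop is excluded, so seq[i:j] has j - i elements
assert len(seq[2:6]) == 6 - 2

# splitting at any index k and concatenating gives back the original list
k = 4
assert seq[:k] + seq[k:] == seq

# with a negative step, the slice runs backward from start down to (not including) stop
tail_reversed = seq[-1:5:-1]
print(tail_reversed)  # [1, 0, 6]
```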

## 3.1.3 Built-in sequence functions

#### Enumerate 

When iterating over a sequence, it is common to want to keep track of the index of the current item

enumerate returns a sequence of (i, value) tuples

In [86]:
a = [1,4,7,None,True]
list(enumerate(a))

[(0, 1), (1, 4), (2, 7), (3, None), (4, True)]

In [88]:
a = [1, 4, 7, None, True]
b = [0, 1, 2, 3, 4]
list(zip(b, a))

[(0, 1), (1, 4), (2, 7), (3, None), (4, True)]

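In [None]:
collection = ['foo', 'bar', 'baz']  # hypothetical example data

# manual index tracking
i = 0
for value in collection:
    print(i, value)  # do something with value
    i += 1

In [None]:
# the same loop with enumerate
for i, value in enumerate(collection):
    print(i, value)  # do something with value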
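enumerate also takes a start argument for when numbering should begin somewhere other than 0. Here is a sketch with illustrative data:

```python
collection = ['foo', 'bar', 'baz']  # illustrative data

# start=1 numbers the items from 1 instead of 0
for i, value in enumerate(collection, start=1):
    print(i, value)

numbered = list(enumerate(collection, start=1))
print(numbered)  # [(1, 'foo'), (2, 'bar'), (3, 'baz')]
```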

When indexing data, a helpful pattern that uses enumerate is computing a dict mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence

In [89]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
mapping 

{'foo': 0, 'bar': 1, 'baz': 2}

In [90]:
list(enumerate(some_list))

[(0, 'foo'), (1, 'bar'), (2, 'baz')]

#### Sorted

The sorted function returns a new sorted list from the elements of any sequence

In [91]:
sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

In [92]:
s = [7,1,2,6,0,3,2]
s.sort()
s

[0, 1, 2, 2, 3, 6, 7]

In [93]:
sorted('horse race')

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']
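sorted takes the same key and reverse keyword arguments as the list sort method, and unlike sort it leaves its input unchanged. This sketch reuses the word list from the sort example:

```python
words = ['saw', 'small', 'He', 'foxes', 'six']

# longest first; ties keep their original relative order (the sort is stable)
by_length_desc = sorted(words, key=len, reverse=True)
print(by_length_desc)  # ['small', 'foxes', 'saw', 'six', 'He']

# the original list is untouched
print(words)  # ['saw', 'small', 'He', 'foxes', 'six']
```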

#### Zip

zip pairs up the elements of a number of lists, tuples, or other sequences to create an iterator of tuples, which list can materialize

In [94]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']

zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence

In [95]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]
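If truncating to the shortest input is not what you want, itertools.zip_longest pads the shorter sequences instead. A sketch:

```python
from itertools import zip_longest

seq1 = ['foo', 'bar', 'baz']
seq3 = [False, True]

shortest = list(zip(seq1, seq3))
# [('foo', False), ('bar', True)]

padded = list(zip_longest(seq1, seq3))
# [('foo', False), ('bar', True), ('baz', None)]

padded_custom = list(zip_longest(seq1, seq3, fillvalue='?'))
# [('foo', False), ('bar', True), ('baz', '?')]
```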

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate

In [96]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


In [97]:
list(enumerate(zip(seq1, seq2)))

[(0, ('foo', 'one')), (1, ('bar', 'two')), (2, ('baz', 'three'))]

Given a zipped sequence, zip can be applied in a clever way to unzip the sequence. Another way to think about this is converting a list of rows into a list of columns

In [98]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]
pitchers

[('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]

In [105]:
list(zip(*pitchers))

[('Nolan', 'Roger', 'Schilling'), ('Ryan', 'Clemens', 'Curt')]

In [99]:
first_names, last_names = zip(*pitchers)

In [100]:
first_names

('Nolan', 'Roger', 'Schilling')

In [101]:
last_names

('Ryan', 'Clemens', 'Curt')

#### Reversed

In [108]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that reversed returns a lazy iterator (much like a generator) rather than a new list, so it does not create the reversed sequence until materialized (e.g. with list or a for loop)
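One practical consequence of this laziness is that the object reversed returns, like other iterators, can be consumed only once. A short sketch:

```python
r = reversed([1, 2, 3])  # an iterator; nothing has been reversed yet

first_pass = list(r)   # [3, 2, 1]
second_pass = list(r)  # [] -- the iterator was used up by the first pass
print(first_pass, second_pass)
```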

## 3.1.4 Dict

**key-value pairs**: a dict (also called a hash map or associative array) is a flexibly sized collection of key-value pairs, where both keys and values are Python objects

In [109]:
empty_dict = {}
d1 = {'a':'some value', 'b':[1,2,3,4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [111]:
d1.keys()

dict_keys(['a', 'b'])

In [112]:
d1.values()

dict_values(['some value', [1, 2, 3, 4]])

Access, insert, or set elements using the same syntax as for accessing elements of a list or tuple

In [113]:
d1[7] = 'an integer'

In [114]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [116]:
d1['b']

[1, 2, 3, 4]

In [117]:
'b' in d1  # check whether a dict contains a key, with the same syntax used to check whether a list or tuple contains a value

True

In [118]:
d1[5] = 'some value'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [119]:
d1['dummy'] = 'another value'
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

Delete values with the del keyword, or with the pop method, which returns the value and deletes the key at the same time

In [120]:
del d1[5]
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [121]:
ret = d1.pop('dummy')
ret

'another value'

In [122]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

The keys and values methods return view objects over the dict's keys and values, respectively; wrap them in list to get a list. Both follow the dict's insertion order

In [123]:
list(d1.keys())

['a', 'b', 7]

In [124]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

In [125]:
d1.update({'b': 'foo', 'c': 12})  # merge one dict into another using the update method
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
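The objects returned by keys and values are views, not copies, so they reflect later changes to the dict. This sketch uses a fresh illustrative dict d and also shows that update overwrites the values of keys that already exist:

```python
d = {'a': 'some value', 'b': [1, 2, 3, 4]}  # illustrative dict
keys = d.keys()  # a view, not a snapshot

d['c'] = 12
print(list(keys))  # ['a', 'b', 'c'] -- the view sees the new key

# update overwrites values for keys that already exist and adds the rest
d.update({'b': 'foo', 'e': 5})
print(d)  # {'a': 'some value', 'b': 'foo', 'c': 12, 'e': 5}
```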

#### Creating dicts from sequences

A dict is essentially a collection of 2-tuples, so dict can build one directly from any sequence of 2-tuples, such as the output of zip

In [126]:
key_list = ['a','b','c']
value_list = [1,2,3]

In [127]:
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
mapping

{'a': 1, 'b': 2, 'c': 3}

In [128]:
mapping = dict(zip(range(5),reversed(range(5))))
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

In [129]:
list(zip(range(5),reversed(range(5))))

[(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]

#### Default values

In [None]:
some_dict = {'a': 1, 'b': 2}  # hypothetical example data
key, default_value = 'c', 0

if key in some_dict:
    value = some_dict[key]
else:
    value = default_value

The dict methods get and pop can take a default value to be returned, so that the above if-else block can be written simply as: 

In [None]:
value = some_dict.get(key, default_value)

get returns None by default if the key is not present, while pop raises a KeyError unless a default is passed
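Here are both behaviours side by side, on a small illustrative dict:

```python
some_dict = {'a': 1, 'b': 2}  # illustrative dict

missing = some_dict.get('z')           # None: no default given
with_default = some_dict.get('z', 0)   # 0
popped = some_dict.pop('z', 0)         # 0: pop with a default does not raise

try:
    some_dict.pop('z')                 # no default: raises KeyError
except KeyError as exc:
    print('KeyError:', exc)            # KeyError: 'z'
```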

In [134]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

In [135]:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The built-in collections module has a useful class, defaultdict, which makes this even easier. To create one, pass a type or function for generating the default value for each slot in the dict

In [143]:
from collections import defaultdict

by_letter = defaultdict(list)

for word in words:
    by_letter[word[0]].append(word)

by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})

In [140]:
dict(by_letter)

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
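The default factory can be any callable. Calling int with no arguments returns 0, so a defaultdict(int) works as a simple counter. This sketch counts the same words by first letter:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# int() returns 0, so every new key starts at 0 before being incremented
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1

print(dict(counts))  # {'a': 2, 'b': 3}
```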

#### Valid dict key types

**Hashability**: While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable too). You can check whether an object is hashable, and so usable as a dict key, with the hash function

In [151]:
hash('string')

-6143669387306037813

In [152]:
hash((1,2,(2,3)))

-9209053662355515447

In [153]:
hash((1,2,[2,3])) # fails because lists are mutable 

TypeError: unhashable type: 'list'

In [154]:
try: 
    hash((1, 2, [2, 3]))
except TypeError:
    print('Lists are unhashable')

Lists are unhashable


To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can 

In [155]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}

In [159]:
hash(tuple([1,2,3]))

529344067295497451

## 3.1.5 Set

A set is an unordered collection of unique elements. Sets are like dicts with keys only and no values

In [160]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [161]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical set operations like union, intersection, difference, and symmetric difference

In [162]:
a = {1,2,3,4,5}
b = {3,4,5,6,7,8}

In [164]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [165]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [166]:
a.intersection(b)

{3, 4, 5}

In [167]:
a & b

{3, 4, 5}

In [168]:
c = a.copy()
c |= b
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [169]:
d = a.copy()
d &= b
d

{3, 4, 5}
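Difference and symmetric difference, mentioned above, also come in method and operator forms. This sketch uses the same a and b:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# difference: elements of a that are not in b
only_a = a - b                         # equals {1, 2}
assert only_a == a.difference(b)

# symmetric difference: elements in exactly one of the two sets
one_side = a ^ b                       # equals {1, 2, 6, 7, 8}
assert one_side == a.symmetric_difference(b)
```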

Like dict keys, set elements generally must be immutable (hashable). To store list-like elements, you must convert them to tuples

In [170]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

In [171]:
a_set = {1,2,3,4,5}
{1,2,3}.issubset(a_set)

True

In [172]:
a_set.issuperset({1,2,3})

True

In [173]:
{1,2,3} == {3,2,1}

True

In [174]:
{1,2,2,3} == {3,2,1}

True

## 3.1.6 List, Set, and Dict comprehensions 

List comprehensions are one of the most-loved Python language features. They allow you to concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression. They take the basic form:

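In [None]:
# general form (a template, not runnable as-is):
# [expr for val in collection if condition]

# equivalent for loop, shown with concrete values so that it runs
collection = [1, 2, 3, 4]  # example data
result = []
for val in collection:
    if val % 2 == 0:             # condition
        result.append(val * 10)  # expr
# the filter condition can be omitted, leaving only the expression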

In [178]:
strings = ['a','as','bat','car','dove','python']

[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dict comprehensions are a natural extension, producing sets and dicts in an idiomatically similar way instead of lists

In [None]:
# template: dict_comp = {key_expr: value_expr for value in collection if condition}

In [None]:
# template: set_comp = {expr for value in collection if condition}

In [179]:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

**Map vs list comprehension**

In [182]:
map(len, strings)

<map at 0x7fa5a2ebc640>

In [180]:
list(map(len, strings))

[1, 2, 3, 3, 4, 6]

In [181]:
set(map(len, strings))

{1, 2, 3, 4, 6}

In [184]:
loc_mapping = {val: index for index, val in enumerate(strings)}
loc_mapping 

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

In [185]:
list(enumerate(strings))

[(0, 'a'), (1, 'as'), (2, 'bat'), (3, 'car'), (4, 'dove'), (5, 'python')]

### Nested list comprehensions

In [186]:
all_data = [['John','Emily','Michael','Mary','Steven'],
            ['Maria','Juan','Javier','Natalia','Pilar']]

In [188]:
# get a single list containing all names with two or more e's in them
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)
names_of_interest

['Steven']

In [191]:
result = [name 
          for names in all_data
          for name in names 
          if name.count('e') >= 2
          ]
result

['Steven']

In [192]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x
             for tup in some_tuples 
             for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [193]:
some_tuples[0]

(1, 2, 3)

In [194]:
some_tuples[0][0]

1

In [195]:
flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [196]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# 3.2 Functions

Functions are the primary and most important method of code organization and reuse in Python. As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function. Functions can also help make your code more readable by giving a name to a group of Python statements

Functions are declared with the def keyword and returned from with the return keyword

There is no issue with having multiple return statements. If Python reaches the end of a function without encountering a return statement, None is returned automatically

Each function can have positional arguments and keyword arguments. The main restriction on function arguments is that the keyword arguments must follow the positional arguments, if any

In [197]:
def my_function(x, y, z = 1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

In [198]:
my_function(5, 6, z = 0.7)

0.06363636363636363

In [199]:
my_function(3.14, 7, 3.5)

35.49

In [200]:
my_function(10, 20)

45.0

In [201]:
my_function(x = 5, y = 6, z = 7)

77
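Here is a small sketch of the ordering rule, reusing my_function. Keyword arguments may appear in any order, and positional arguments must come first:

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# keyword arguments may be given in any order
by_keyword = my_function(y=6, x=5)
by_position = my_function(5, 6)
print(by_keyword, by_position)  # 16.5 16.5

# mixing is fine as long as positional arguments come first;
# my_function(x=5, 6) would be rejected with a SyntaxError
mixed = my_function(5, y=6, z=7)
print(mixed)  # 77
```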

## 3.2.1 Namespaces, Scope, and Local Functions 

Functions can access variables in two different scopes: global and local. A more descriptive name for a variable scope in Python is a namespace. Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and immediately populated by the function's arguments. After the function is finished, the local namespace is destroyed

When func() is called, the empty list a is created, five elements are appended, and then a is destroyed when the function exits 

In [205]:
def func():
    a = []  # a lives in func's local namespace
    for i in range(5):
        a.append(i)

func()  # the local list is built, then discarded when func returns

In [206]:
a = []
def func():
    for i in range(5):
        a.append(i)  # no assignment to a here, so a refers to the global list

func()
a

[0, 1, 2, 3, 4]

Assigning variables outside of the function’s scope is possible, but those variables must be declared as global via the global keyword

In [207]:
a = None 

def bind_a_variable():
    global a
    a = []

bind_a_variable()
a

[]

Use of the global keyword is generally discouraged. Global variables are typically used to store some kind of state in a system; if you find yourself using a lot of them, it may indicate a need for object-oriented programming (using classes)
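This sketch shows why the global declaration matters. Mutating a global object works without global, but assigning to the name inside a function creates a local variable instead. The function names are illustrative:

```python
a = []

def append_items():
    # mutating the global list in place needs no global declaration
    a.append(1)

def rebind_locally():
    # assignment creates a new local name; the global a is untouched
    a = ['local']
    return a

def read_then_assign():
    # a is assigned below, so Python treats it as local for the whole body,
    # and reading it before that assignment fails
    a.append(2)
    a = []

append_items()
rebind_locally()
print(a)  # [1]

try:
    read_then_assign()
except UnboundLocalError as exc:
    print('UnboundLocalError:', exc)
```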

## 3.2.2 Returning Multiple Values

Return multiple values as a tuple. 

The function is actually just returning one object, namely a tuple, which is then being unpacked into the result variables 

In [208]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

In [209]:
a, b, c = f()

In [211]:
a, b, c

(5, 6, 7)

In [212]:
return_value = f()
return_value

(5, 6, 7)

Return multiple values as a dict

In [213]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a': a, 'b': b, 'c': c}

In [214]:
return_value = f()
return_value

{'a': 5, 'b': 6, 'c': 7}

## 3.2.3 Functions Are Objects

In [215]:
states = ['  Alabama','Georgia!','Georgia','georgia','FlOrIda','south carolina##','West virginia?']

In [217]:
import re # regular expression

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

In [219]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result 

In [220]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

Functions can also be passed as arguments to other functions, like the built-in map function, which applies a function to each element of a sequence of some kind

In [221]:
for x in map(remove_punctuation, states):
    print(x)

  Alabama
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia


In [222]:
list(map(remove_punctuation, states))

['  Alabama',
 'Georgia',
 'Georgia',
 'georgia',
 'FlOrIda',
 'south carolina',
 'West virginia']

## 3.2.4 Anonymous (Lambda) Functions

Python has support for so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword, which has no meaning other than "we are declaring an anonymous function"

In [223]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

In [224]:
short_function(3)

6

In [225]:
equiv_anon(3)

6

Lambda functions are especially convenient in data analysis because there are many cases where data transformation functions take functions as arguments. It is often less typing and clearer to pass a lambda function than to write a full-out function declaration or even to assign the lambda function to a local variable

In [226]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [227]:
[x * 2 for x in ints]

[8, 0, 2, 10, 12]

Sort a collection of strings by the number of distinct letters in each string

In [228]:
strings = ['foo','card','bar','aaaa','abab']

In [229]:
strings.sort(key = lambda x: len(set(x)))

In [230]:
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

One reason lambda functions are called anonymous functions is that, unlike functions declared with the def keyword, the function object itself is never given an explicit name: its __name__ attribute is just the generic '<lambda>'
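A quick check using the standard __name__ attribute of function objects. A def-declared function carries its own name, while a lambda keeps the generic placeholder even after it is bound to a variable:

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

print(short_function.__name__)  # short_function
print(equiv_anon.__name__)      # <lambda> -- binding it to a variable does not rename it
```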

## 3.2.5 Currying: Partial Argument Application

Derive new functions from existing ones by partial argument application

In [240]:
def add_numbers(x, y):
    return x + y

In [242]:
add_numbers(1, 2)

3

In [243]:
add_five = lambda y: add_numbers(5, y)

In [244]:
add_five(10)

15

The second argument to add_numbers is said to be curried. The built-in functools module can simplify this process using the partial function

In [245]:
from functools import partial

add_five = partial(add_numbers, 5)

In [246]:
add_five(10)

15
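Besides fixing leading positional arguments, functools.partial can fix keyword arguments. The resulting object records what it wraps in its func and args attributes. A short sketch; add_hundred is an illustrative name:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

add_five = partial(add_numbers, 5)         # fixes the first positional argument, x
add_hundred = partial(add_numbers, y=100)  # fixes y by keyword; x is still required

print(add_five(10))     # 15
print(add_hundred(1))   # 101

# the partial object remembers what it wraps
print(add_five.func is add_numbers, add_five.args)  # True (5,)
```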

## 3.2.6 Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a file, is an important Python feature. This is accomplished by means of the iterator protocol, a generic way to make objects iterable. For instance, iterating over a dict yields the dict keys

In [247]:
some_dict = {'a': 1, 'b': 2, 'c': 3}

for key in some_dict:
    print(key)

a
b
c


When you write for key in some_dict, the Python interpreter first attempts to create an iterator out of some_dict

In [254]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x7fa5a35fc090>

An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop. Most methods expecting a list or list-like object will also accept any iterable object. This includes built-in functions such as min, max, and sum, and type constructors like list and tuple

In [255]:
list(dict_iterator)

['a', 'b', 'c']

A generator is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested. To create a generator, use the yield keyword instead of return in a function

In [289]:
def squares(n = 10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [290]:
gen = squares()  # nothing is printed yet: the body only runs once iteration starts
gen

<generator object squares at 0x7fa5a3961510>

In [291]:
for x in gen:
    print(x, end = ' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

In [292]:
list(squares())  # list consumes the whole generator

Generating squares from 1 to 100

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [293]:
def squares(n = 10):
    for i in range(1, n + 1):
        print(i ** 2)

In [294]:
squares()

1
4
9
16
25
36
49
64
81
100
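You can see the laziness step by step by driving a generator by hand with the built-in next. A for loop does the same thing behind the scenes. This sketch uses a smaller squares(n=3) so the end of the sequence comes quickly:

```python
def squares(n=3):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()
print(next(gen))  # 1: runs the body up to the first yield, then pauses
print(next(gen))  # 4: resumes right after the previous yield
print(next(gen))  # 9

try:
    next(gen)     # the body has finished, so there is nothing left to yield
except StopIteration:
    print('generator exhausted')  # a for loop treats this as the end of iteration
```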


### Generator expressions

A generator expression is the generator analogue of list, dict, and set comprehensions. To create one, enclose what would otherwise be a list comprehension in parentheses instead of brackets

In [298]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x7fa5a3961f90>

In [299]:
print(list(gen))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801]


In [301]:
gen = [x ** 2 for x in range(100)]
print(gen)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801]


In [302]:
def _make_gen():
    # a more verbose equivalent of the generator expression (x ** 2 for x in range(100))
    for x in range(100):
        yield x ** 2
gen = _make_gen()

Generator expressions can be used instead of list comprehensions as function arguments in many cases

In [303]:
sum(x ** 2 for x in range(100))

328350

In [304]:
sum([x ** 2 for x in range(100)])

328350

In [307]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

In [308]:
dict([(i, i ** 2) for i in range(5)])

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

### Itertools module

The standard library itertools module has a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the sequence by the function's return value

In [311]:
import itertools

first_letter = lambda x: x[0]

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is an iterator over one run of consecutive names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
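groupby only groups consecutive elements, which is why 'Albert' lands in a second 'A' group above. To get one group per key, a common approach is to sort by the same key function first, as in this sketch:

```python
import itertools

first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

# sorting by the grouping key first makes equal keys consecutive
ordered = sorted(names, key=first_letter)
grouped = {letter: list(group)
           for letter, group in itertools.groupby(ordered, first_letter)}
print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```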


## 3.2.7 Errors and Exception Handling 

In [312]:
float('1.2345')

1.2345

In [313]:
float('something')

ValueError: could not convert string to float: 'something'

In [314]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [315]:
attempt_float('1.2345')

1.2345

In [316]:
attempt_float('something')

'something'

In [317]:
float((1,2))

TypeError: float() argument must be a string or a number, not 'tuple'

In [318]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [319]:
attempt_float('something')

'something'

In [320]:
attempt_float((1,2))

TypeError: float() argument must be a string or a number, not 'tuple'

In [321]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In [322]:
attempt_float((1,2))

(1, 2)

If you don't want to suppress an exception but do want some code to be executed regardless of whether the code in the try block succeeds, use finally

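In [None]:
# hypothetical output path and helper, for illustration only
path = 'tmp_output.txt'

def write_to_file(f):
    f.write('some text\n')

f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()  # runs whether or not write_to_file raised an exception

Code in an else block runs only if the try block succeeds without raising

In [None]:
f = open(path, 'w')

try:
    write_to_file(f)
except Exception:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close()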

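### Exceptions in IPython

If an exception is raised while running code in IPython, the traceback shows some context around the line that raised it. The %xmode magic command controls how much context is shown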

In [342]:
%xmode

Exception reporting mode: Context


In [343]:
1 + 's'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [346]:
%xmode?

[0;31mDocstring:[0m
Switch modes for the exception handlers.

Valid modes: Plain, Context, Verbose, and Minimal.

If called without arguments, acts as a toggle.
[0;31mFile:[0m      ~/anaconda3/envs/sds/lib/python3.8/site-packages/IPython/core/magics/basic.py


# 3.3 Files and the Operating System

In [None]:
path = 'examples/segismundo.txt'
f = open(path)

In [None]:
for line in f:
    pass

In [None]:
lines = [x.rstrip() for x in open(path)]
lines

In [None]:
f.close()

In [None]:
with open(path) as f:
    lines = [x.rstrip() for x in f] # this will automatically close the file f when exiting the with block 

In [None]:
f = open(path)

f.read(10)

f2 = open(path, 'rb') # Binary mode

f2.read(10)

In [None]:
# read advances the file handle's position by the number of bytes read; tell gives the current position
f.tell()

f2.tell()

In [347]:
import sys

sys.getdefaultencoding()

'utf-8'

In [None]:
# seek changes the file position to the indicated byte in the file 
f.seek(3)
f.read(1)

In [None]:
f.close()
f2.close()

In [None]:
with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)

with open('tmp.txt') as f:
    lines = f.readlines()

lines

## 3.3.1 Bytes and Unicode with Files 

In [None]:
with open(path) as f:
    chars = f.read(10)

chars

In [None]:
with open(path, 'rb') as f:
    data = f.read(10)

In [None]:
data.decode('utf8')

data[:4].decode('utf8')  # can raise UnicodeDecodeError if the slice ends partway through a multibyte character
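You can reproduce a failing decode like data[:4] without the example file. This sketch encodes an illustrative Spanish string in memory; it is not read from disk:

```python
# 'ñ' is encoded as two bytes in UTF-8, so slicing bytes can split a character
data = 'Sueña el rico'.encode('utf8')  # illustrative text, not read from a file

print(data[:5])                 # b'Sue\xc3\xb1'
print(data[:5].decode('utf8'))  # Sueñ

try:
    data[:4].decode('utf8')     # stops after the first byte of 'ñ'
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)  # unexpected end of data
```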

In [None]:
sink_path = 'sink.txt'

with open(path) as source:
    with open(sink_path, 'xt', encoding = 'iso-8859-1') as sink:  # 'x' fails if the file already exists; 't' is text mode
        sink.write(source.read())

with open(sink_path, encoding = 'iso-8859-1') as f:
    print(f.read(10))

In [None]:
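f = open(path)

f.read(5)

f.seek(4)  # move to byte 4 of the file

# if byte 4 falls in the middle of a multibyte character (such as the
# two-byte 'ñ' in 'Sueña'), this read raises UnicodeDecodeError
f.read(1)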

# 3.4 Conclusion