<h1>Chapter 3: Built-In Data Structures, Functions, and Files<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#3.1-Data-Structures-and-Sequences" data-toc-modified-id="3.1-Data-Structures-and-Sequences-1">3.1 Data Structures and Sequences</a></span><ul class="toc-item"><li><span><a href="#Tuples" data-toc-modified-id="Tuples-1.1">Tuples</a></span><ul class="toc-item"><li><span><a href="#Unpacking-tuples" data-toc-modified-id="Unpacking-tuples-1.1.1">Unpacking tuples</a></span></li><li><span><a href="#Tuple-methods" data-toc-modified-id="Tuple-methods-1.1.2">Tuple methods</a></span></li></ul></li><li><span><a href="#List" data-toc-modified-id="List-1.2">List</a></span><ul class="toc-item"><li><span><a href="#Adding-and-removing-elements" data-toc-modified-id="Adding-and-removing-elements-1.2.1">Adding and removing elements</a></span></li><li><span><a href="#Concatenating-and-combining-lists" data-toc-modified-id="Concatenating-and-combining-lists-1.2.2">Concatenating and combining lists</a></span></li><li><span><a href="#Sorting" data-toc-modified-id="Sorting-1.2.3">Sorting</a></span></li><li><span><a href="#Slicing" data-toc-modified-id="Slicing-1.2.4">Slicing</a></span></li></ul></li><li><span><a href="#Dictionary" data-toc-modified-id="Dictionary-1.3">Dictionary</a></span><ul class="toc-item"><li><span><a href="#Creating-dictionaries-from-sequences" data-toc-modified-id="Creating-dictionaries-from-sequences-1.3.1">Creating dictionaries from sequences</a></span></li><li><span><a href="#Default-values" data-toc-modified-id="Default-values-1.3.2">Default values</a></span></li><li><span><a href="#Valid-dictionary-key-types" data-toc-modified-id="Valid-dictionary-key-types-1.3.3">Valid dictionary key types</a></span></li></ul></li><li><span><a href="#Set" data-toc-modified-id="Set-1.4">Set</a></span></li><li><span><a href="#Built-In-Sequence-Functions" data-toc-modified-id="Built-In-Sequence-Functions-1.5">Built-In Sequence Functions</a></span><ul class="toc-item"><li><span><a href="#sort" 
data-toc-modified-id="sort-1.5.1">sort</a></span></li><li><span><a href="#zip" data-toc-modified-id="zip-1.5.2">zip</a></span></li><li><span><a href="#enumerate" data-toc-modified-id="enumerate-1.5.3">enumerate</a></span></li><li><span><a href="#reversed" data-toc-modified-id="reversed-1.5.4">reversed</a></span></li></ul></li><li><span><a href="#List,-Set,-and-Dictionary-Comprehensions" data-toc-modified-id="List,-Set,-and-Dictionary-Comprehensions-1.6">List, Set, and Dictionary Comprehensions</a></span><ul class="toc-item"><li><span><a href="#Nested-list-comprehensions" data-toc-modified-id="Nested-list-comprehensions-1.6.1">Nested list comprehensions</a></span></li></ul></li></ul></li><li><span><a href="#3.2-Functions" data-toc-modified-id="3.2-Functions-2">3.2 Functions</a></span><ul class="toc-item"><li><span><a href="#Namespaces,-Scope,-and-Local-Functions" data-toc-modified-id="Namespaces,-Scope,-and-Local-Functions-2.1">Namespaces, Scope, and Local Functions</a></span></li><li><span><a href="#Returning-Multiple-Values" data-toc-modified-id="Returning-Multiple-Values-2.2">Returning Multiple Values</a></span></li><li><span><a href="#Functions-Are-Objects" data-toc-modified-id="Functions-Are-Objects-2.3">Functions Are Objects</a></span></li><li><span><a href="#Anonymous-(Lambda)-Functions" data-toc-modified-id="Anonymous-(Lambda)-Functions-2.4">Anonymous (Lambda) Functions</a></span></li><li><span><a href="#Generators" data-toc-modified-id="Generators-2.5">Generators</a></span><ul class="toc-item"><li><span><a href="#Generator-expressions" data-toc-modified-id="Generator-expressions-2.5.1">Generator expressions</a></span></li><li><span><a href="#itertools-module" data-toc-modified-id="itertools-module-2.5.2">itertools module</a></span></li></ul></li><li><span><a href="#Errors-and-Exception-Handling" data-toc-modified-id="Errors-and-Exception-Handling-2.6">Errors and Exception Handling</a></span></li></ul></li><li><span><a 
href="#3.3-Files-and-the-Operating-System" data-toc-modified-id="3.3-Files-and-the-Operating-System-3">3.3 Files and the Operating System</a></span><ul class="toc-item"><li><span><a href="#Bytes-and-Unicode-with-Files" data-toc-modified-id="Bytes-and-Unicode-with-Files-3.1">Bytes and Unicode with Files</a></span></li></ul></li></ul></div>

In [1]:
# If you are using Google Colab, uncomment the following lines to mount your Google Drive.
# After that, the notebook can read and write files stored in your Drive.

#from google.colab import drive
#drive.mount('/content/drive')


In [2]:
# If you are using Google Colab, change the current directory to the folder that holds
# your notebook and data. For example, I keep my Colab files and data at the following location:

#%cd /content/drive/MyDrive/Colab\ Notebooks

In [3]:
#set up standards for the remainder of the notebook


# display all results in each cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"


## 3.1 Data Structures and Sequences

### Tuples

A tuple is a fixed-length, immutable sequence of Python objects which, once assigned,
cannot be changed. 

In [4]:
# The easiest way to create one is with a comma-separated sequence of values wrapped in parentheses:

tup = (4, 5, 6)
tup

(4, 5, 6)

In [5]:
# the parentheses can be omitted

tup = 4, 5, 6
tup

(4, 5, 6)

In [6]:
# convert any sequence or iterator (e.g., a list) to a tuple by invoking tuple


tuple([4, 0, 2])

(4, 0, 2)

In [7]:
# convert any sequence or iterator (e.g., a string) to a tuple by invoking tuple

tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

In [8]:
# Elements can be accessed with square brackets []
tup[0]

's'

In [9]:
# defining tuples within more complicated expressions

nested_tup = (4, 5, 6), (7, 8)
nested_tup


((4, 5, 6), (7, 8))

In [10]:
nested_tup[0]

(4, 5, 6)

In [11]:
nested_tup[1]

(7, 8)

In [12]:
# While the objects stored in a tuple may be mutable themselves, once the tuple is 
# created it’s not possible to modify which object is stored in each slot

tup = tuple(['foo', [1, 2], True])

#tup[2] = False
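
As a minimal sketch of what the commented-out assignment above would do if it were run: Python raises a `TypeError`. The name `demo_tup` is introduced here only for illustration, so the notebook's own `tup` is left untouched.

```python
# Illustrative only: demo_tup is a hypothetical name, not part of the notebook state
demo_tup = tuple(['foo', [1, 2], True])

try:
    demo_tup[2] = False  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The tuple itself is unchanged after the failed assignment.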

In [13]:
# If an object inside a tuple is mutable, such as a list, you can modify it in place:

tup[1].append(3)
tup

('foo', [1, 2, 3], True)

In [14]:
# concatenate tuples using the + operator to produce longer tuples

(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

In [15]:
# Multiplying a tuple by an integer, as with lists, has the effect of concatenating 
# that many copies of the tuple

('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

#### Unpacking tuples

In [16]:
# If you try to assign to a tuple-like expression of variables, Python will attempt to
# unpack the value on the righthand side of the equals sign

tup = (4, 5, 6)
a, b, c = tup
b

5

In [17]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d


7

In [18]:
# Unpacking a tuple
a, b = 1, 2
print('a is', a)
print('b is', b)

# swap can be done like this
b, a = a, b
print('a is', a)
print('b is', b)

a is 1
b is 2
a is 2
b is 1


In [19]:
# A common use of variable unpacking is iterating over sequences of tuples or lists

# seq is a list of three tuples that each is a sequence of three elements.
# print unpacked tuples one by one
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [20]:
# To “pluck” a few elements from the beginning of a tuple,
# use the special *rest syntax:

values = 1, 2, 3, 4, 5
a, b, *rest = values
a
b
rest

1

2

[3, 4, 5]

In [21]:
# This rest bit is sometimes something you want to discard; there is nothing special
# about the rest name. As a matter of convention, many Python programmers will use
# the underscore (_) for unwanted variables

a, b, *_ = values

#### Tuple methods

Since the size and contents of a tuple cannot be modified, it is very light on instance
methods.

In [22]:
# A particularly useful one (also available on lists) is count

a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

### List

In contrast with tuples, lists are variable length and their contents can be modified in
place. Lists are mutable. 

In [23]:
# You can define them using square brackets [] or using the list type function

a_list = [2, 3, 7, None]

tup = ("foo", "bar", "baz")
b_list = list(tup)
print(f'b_list ={b_list}')

# change the second element to be "peekaboo"
b_list[1] = "peekaboo"
print(f'\nb_list ={b_list}')

b_list =['foo', 'bar', 'baz']

b_list =['foo', 'peekaboo', 'baz']


In [24]:
# The list built-in function is frequently used in data processing as a way to materialize
# an iterator or generator expression

gen = range(10)
print(f'gen is: {gen}')
list(gen)

gen is: range(0, 10)


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#### Adding and removing elements

In [25]:
# Elements can be appended to the end of the list with the append method

b_list.append("dwarf")
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [26]:
# Using insert you can insert an element at a specific location in the list
# Be careful: insert is computationally expensive compared with append

b_list.insert(1, "red")
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']
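
The cost note above comes from how lists are stored: inserting near the front means every later element has to be shifted over by one position. As a hedged aside (`collections.deque` is not used elsewhere in this notebook, and `queue` is a hypothetical name), a double-ended queue from the standard library is one option when you need many insertions at the beginning of a sequence:

```python
from collections import deque

# queue is an illustrative name; the values mirror the list example above
queue = deque(["foo", "peekaboo", "baz"])

queue.appendleft("red")  # add at the left end without shifting the other elements
queue.append("dwarf")    # add at the right end, like list.append

print(list(queue))
```

A deque is a sketch of an alternative here, not a drop-in replacement: it does not support slicing the way a list does.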

In [27]:
# The inverse operation to insert is pop, which removes and returns an element at a particular index:

b_list.pop(1)
b_list

'red'

['foo', 'peekaboo', 'baz', 'dwarf']

In [28]:
# Elements can be removed by value with remove, which locates the first such value and
# removes it from the list:

b_list.append("foo")
print(f'b_list is {b_list}')

b_list.remove("foo")
print(f'b_list is {b_list}')

b_list is ['foo', 'peekaboo', 'baz', 'dwarf', 'foo']
b_list is ['peekaboo', 'baz', 'dwarf', 'foo']


In [29]:
# Check if a list contains a value using the in keyword
# Checking whether a list contains a value is a lot slower than doing so with dictionaries and sets

"dwarf" in b_list

True

In [30]:
# Check if a list does not contain a value using the not in keyword

"dwarf" not in b_list

False
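
The remark above about lists being slower for membership checks can be explored with a rough timing sketch. A list is scanned element by element, while a set uses a hash lookup. The sizes and names below (`values_list`, `values_set`, `target`) are arbitrary choices for illustration, and the timings will differ from machine to machine:

```python
import timeit

values_list = list(range(100_000))
values_set = set(values_list)
target = 99_999  # worst case for the list: the value sits at the very end

# Time 100 membership checks against each container
list_time = timeit.timeit(lambda: target in values_list, number=100)
set_time = timeit.timeit(lambda: target in values_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```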

#### Concatenating and combining lists

In [31]:
# adding two lists together with + concatenates them

[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [32]:
# If you have a list already defined, you can append multiple elements to it using the extend method:
x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

#### Sorting

In [33]:
# sort a list in place by calling the sort method
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [34]:
# sort has a few options that will occasionally come in handy. One is the ability to
# pass a sort key, that is, a function that produces a value to use to sort the
# objects. For example, we could sort a collection of strings by their lengths
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']

#### Slicing

In [35]:
# select a section of list using slice notation

seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [36]:
# Slices can also be assigned with a sequence

seq[3:5] = [6, 3]
seq

[7, 2, 3, 6, 3, 6, 0, 1]

In [37]:
# The start can be omitted, in which case it defaults to the start of the sequence
seq[:5]


[7, 2, 3, 6, 3]

In [38]:
# The end can be omitted, in which case it defaults to the end of the sequence
seq[3:]

[6, 3, 6, 0, 1]

In [39]:
# Negative indices slice the sequence relative to the end:

seq[-4:]


[3, 6, 0, 1]

In [40]:
seq[-6:-2]

[3, 6, 3, 6]

In [41]:
# A step can also be used after a second colon to, say, take every other element:

seq[::2]

[7, 3, 3, 0]

In [42]:
# step -1 has the useful effect of reversing a list or tuple

seq[::-1]

[1, 0, 6, 3, 6, 3, 2, 7]

### Dictionary

A dictionary stores a collection of key-value pairs, where key and value are Python objects.
Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key. 

In [43]:
# One approach for creating a dictionary is to use curly braces {} and colons to separate keys and values:

empty_dict = {}
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [44]:
# You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:
d1[7] = "an integer"
d1


{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [45]:
d1["b"]


[1, 2, 3, 4]

In [46]:
# check if a dictionary contains a key using the in keyword
"b" in d1

True

In [47]:
d1[5] = "some value"
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [48]:
d1["dummy"] = "another value"
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [49]:
# You can delete values using the del keyword

del d1[5]
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [50]:
# You can also delete values using the pop method, which simultaneously returns the value and deletes the key
ret = d1.pop("dummy")
ret

'another value'

In [51]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [52]:
# The keys and values methods give you iterators of the dictionary’s keys and values,
# respectively. The order of the keys depends on the order of their insertion, and these
# methods output the keys and values in the same respective order

list(d1.keys())


['a', 'b', 7]

In [53]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

In [54]:
# you can use the items method to iterate over the keys and values as 2-tuples

list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]

In [55]:
# merge one dictionary into another using the update method

d1.update({"b": "foo", "c": 12})
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

#### Creating dictionaries from sequences

You will often end up with two sequences that you want to pair up
element-wise in a dictionary:

In [56]:

tuples = zip(range(5), reversed(range(5)))
tuples


<zip at 0x10c85dcc0>

In [57]:
mapping = dict(tuples)
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

#### Default values

In [58]:
# categorizing a list of words by their first letters as a dictionary of lists

words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

In [59]:
# setdefault dictionary method can be used to simplify the workflow above
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

In [60]:
# The built-in collections module has a useful class, defaultdict
# pass a type or function for generating the default value for each slot in the dictionary

from collections import defaultdict

by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
    
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
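
Alongside `setdefault` and `defaultdict`, the simplest default-value tool is the dictionary's `get` method, which returns a fallback instead of raising `KeyError` when a key is missing. A small sketch, where `settings` and the `"fallback"` string are made-up names used only for illustration:

```python
# settings is a hypothetical dictionary used only for this sketch
settings = {"a": "some value", "b": [1, 2, 3, 4]}

found = settings.get("a", "fallback")    # key exists: its value is returned
missing = settings.get("z", "fallback")  # key is absent: the default is returned
missing_none = settings.get("z")         # with no default given, get returns None

print(found, missing, missing_none)
```

Unlike `setdefault`, `get` does not add the missing key to the dictionary.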

#### Valid dictionary key types

While the values of a dictionary can be any Python object, the keys generally have to
be immutable objects like scalar types (int, float, string) or tuples (all the objects in
the tuple need to be immutable, too). 

In [61]:
# The technical term here is hashability. You can check whether an object is 
# hashable (can be used as a key in a dictionary) with the hash function

hash("string")


2676021754886327220

In [62]:
hash((1, 2, (2, 3)))


-9209053662355515447

In [63]:
#hash((1, 2, [2, 3])) # fails because lists are mutable
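
A sketch of how the failure in the commented-out line can be observed without stopping the notebook: wrap the call to `hash` in a `try`/`except` block. The helper name `is_hashable` is hypothetical and introduced here only for illustration.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:  # raised for unhashable objects such as lists
        return False
    return True

tuple_of_tuples = is_hashable((1, 2, (2, 3)))  # every element is immutable
tuple_with_list = is_hashable((1, 2, [2, 3]))  # contains a mutable list

print(tuple_of_tuples, tuple_with_list)
```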

In [64]:
# To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can be
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}

### Set

A set is an unordered collection of unique elements. 
Please refer to Table 3-1 in the textbook for the Python set operations, listed by function and alternative syntax.

In [65]:
# A set can be created via the set function  
set([2, 2, 2, 1, 3, 3])


{1, 2, 3}

In [66]:
# A set can also be created via a set literal with curly braces:

{2, 2, 2, 1, 3, 3}

{1, 2, 3}

In [67]:
#Sets support mathematical set operations like union, intersection, difference, and 
#symmetric difference.

a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [68]:
# union of a and b using the union method
a.union(b)


{1, 2, 3, 4, 5, 6, 7, 8}

In [69]:
# union of a and b using | binary operator
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [70]:
# intersection of a and b using the intersection method
a.intersection(b)


{3, 4, 5}

In [71]:
# intersection of a and b using & binary operator

a & b

{3, 4, 5}

In [72]:
# set the content of c to be the union of a and b
c = a.copy()
c |= b
c


{1, 2, 3, 4, 5, 6, 7, 8}

In [73]:
# set the content of d to be the intersection of a and b
d = a.copy()
d &= b
d

{3, 4, 5}
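
Difference and symmetric difference are mentioned above but not demonstrated. The following sketch covers both, using the same values as `a` and `b` under the illustrative names `left` and `right` so the notebook's variables are not overwritten:

```python
# left and right are hypothetical names holding the same values as a and b above
left = {1, 2, 3, 4, 5}
right = {3, 4, 5, 6, 7, 8}

only_in_left = left - right               # same as left.difference(right)
only_in_right = right - left
in_exactly_one = left ^ right             # same as left.symmetric_difference(right)

print(only_in_left, only_in_right, in_exactly_one)
```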

In [74]:
# set elements generally must be immutable, and they must be hashable 
# (which means that calling hash on a value does not raise an exception). 
#In order to store list-like elements (or other mutable sequences) in a set, 
# you can convert them to tuples:

my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

In [75]:
# We can check if a set is a subset of (is contained in) another set

a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)


True

In [76]:
# We can check if a set is a superset of (contains all elements of) another set

a_set.issuperset({1, 2, 3})

True

In [77]:
# Sets are equal if and only if their contents are equal

{1, 2, 3} == {3, 2, 1}

True

### Built-In Sequence Functions

Python has a handful of useful sequence functions that you should familiarize yourself
with and use at any opportunity

#### sort
The sorted function returns a new sorted list from the elements of any sequence

In [78]:
sorted([7, 1, 2, 6, 0, 3, 2])


[0, 1, 2, 2, 3, 6, 7]

In [79]:
sorted("horse race")

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

#### zip
zip “pairs” up the elements of a number of lists, tuples, or other sequences, producing an
iterator of tuples that you can materialize with list

In [80]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [81]:
# zip can take an arbitrary number of sequences, and the number of elements it 
# produces is determined by the shortest sequence

seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]
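
Given a sequence that is already paired up, `zip` can also be applied with the `*` operator to "unzip" it, turning a list of rows into a tuple per column. The sample names below are illustrative data, not taken from this notebook:

```python
# Illustrative data: each tuple is (first name, last name)
pitchers = [("Nolan", "Ryan"), ("Roger", "Clemens"), ("Curt", "Schilling")]

# zip(*pitchers) is equivalent to zip(pitchers[0], pitchers[1], pitchers[2])
first_names, last_names = zip(*pitchers)

print(first_names)
print(last_names)
```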

#### enumerate
It’s common when iterating over a sequence to want to keep track of the index of the
current item

In [82]:
# a common use of zip is simultaneously iterating over multiple sequences, possibly
# also combined with enumerate

for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")


0: foo, one
1: bar, two
2: baz, three
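
For comparison, here is a sketch of `enumerate` on its own, without `zip`. It replaces a manually incremented counter, and its optional `start` argument changes the first index. The names `labels`, `index_by_value`, and `numbered` are hypothetical:

```python
labels = ["foo", "bar", "baz"]  # illustrative values

# Build a lookup from each value to its position in the list
index_by_value = {}
for index, value in enumerate(labels):
    index_by_value[value] = index

# start=1 numbers the items from 1 instead of 0
numbered = list(enumerate(labels, start=1))

print(index_by_value)
print(numbered)
```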


#### reversed
reversed iterates over the elements of a sequence in reverse order

In [83]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

### List, Set, and Dictionary Comprehensions

In [84]:
# List comprehensions are a convenient and widely used Python language feature. They
# allow you to concisely form a new list by filtering the elements of a collection,
# transforming the elements passing the filter into one concise expression. They take
# the basic form:
# [expr for value in collection if condition]

strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [85]:
# A set comprehension looks like the equivalent list comprehension except with curly
# braces instead of square brackets

unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [86]:
# using the map function is another approach:

set(map(len, strings))

{1, 2, 3, 4, 6}

In [87]:
# A dictionary comprehension
#dict_comp = {key-expr: value-expr for value in collection
#if condition}

loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

#### Nested list comprehensions

In [88]:
# a list of lists containing some English and Spanish names

all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

In [89]:
# we wanted to get a single list containing all names with two or more a’s in them

names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)
names_of_interest

['Maria', 'Natalia']

In [90]:
# we can actually wrap this whole operation up in a single nested list comprehension

result = [name for names in all_data for name in names
          if name.count("a") >= 2]
result

['Maria', 'Natalia']

In [91]:
# we “flatten” a list of tuples of integers into a simple list of integers

some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [92]:
# the order of the for expressions would be the same if you wrote a
# nested for loop instead of a list comprehension

flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)
        
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [93]:
# This produces a list of lists, rather than a flattened list of all of the inner elements

[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

## 3.2 Functions

As a rule of thumb, if you anticipate needing to repeat the same
or very similar code more than once, it may be worth writing a reusable function

In [94]:
# define a function by specifying function name, arguments, and returned output if any
def my_function(x, y):
    return x + y

In [95]:
# call a function
my_function(1, 2)
result = my_function(1, 2)
result

3

3

In [96]:
# this function performs an operation but has no return statement, so it returns None
def function_without_return(x):
    print(x)
    
result = function_without_return("hello!")
print(result) 

hello!
None


In [97]:
# Each function can have positional arguments and keyword arguments. Keyword arguments
#are most commonly used to specify default values or optional arguments

def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

In [98]:
my_function2(5, 6, z=0.7)

0.06363636363636363

In [99]:
my_function2(3.14, 7, 3.5)

35.49

In [100]:
my_function2(10, 20)

45.0

### Namespaces, Scope, and Local Functions

In [101]:
# Functions can access variables created inside the function as well as those outside
#the function in higher (or even global) scopes. An alternative and more descriptive
#name describing a variable scope in Python is a namespace. Any variables that are
#assigned within a function by default are assigned to the local namespace. The local
#namespace is created when the function is called and is immediately populated by the
#function’s arguments. After the function is finished, the local namespace is destroyed


def func():
    a=[]
    for i in range(5):
        a.append(i)

In [102]:
func()

In [103]:
# Suppose instead we had declared a as follows. Then calling func() will modify a outside the function
a = []
def func():
    for i in range(5):
        a.append(i)

In [104]:
func()
a
func()
a

[0, 1, 2, 3, 4]

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

In [105]:
# Assigning variables outside of the function’s scope is possible, but those variables
# must be declared explicitly using either the global or nonlocal keywords

a = None
def bind_a_variable():
    global a  # if you comment out this line, what happens to a?
    a = []  # without the global declaration, a would be a local variable, and assigning [] to it would not change the a outside the function
bind_a_variable()
print(a)

[]
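
The `nonlocal` keyword mentioned above plays the same role as `global`, but for a variable in an enclosing function rather than at module level. A minimal sketch, where `make_counter` and `increment` are hypothetical names:

```python
def make_counter():
    count = 0  # lives in make_counter's local namespace

    def increment():
        nonlocal count  # rebind the enclosing function's variable, not a new local
        count += 1
        return count

    return increment

counter = make_counter()
first_call = counter()
second_call = counter()

print(first_call, second_call)
```

Without the `nonlocal` line, `count += 1` would raise `UnboundLocalError`, because the assignment would make `count` local to `increment`.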


### Returning Multiple Values

In [106]:
# return multiple values from a function

def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()
print(f'a is {a}')
print(f'b is {b}')
print(f'c is {c}')


a is 5
b is 6
c is 7


In [107]:
def f():
    a = 5
    b = 6
    c = 7
    return {"a" : a, "b" : b, "c" : c}

f()

{'a': 5, 'b': 6, 'c': 7}

### Functions Are Objects

In [108]:
# we were doing some data cleaning and needed to apply a bunch of transformations to the following list of strings

states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

In [109]:
# use built-in string methods along with the re standard library module for regular expressions

import re

def clean_strings(strings):
    result = []
    for value in strings:
        # str.strip() removes whitespace at the beginning and at the end of the string
        value = value.strip()
        # re.sub(pattern, repl, string) returns the string obtained by replacing the
        # non-overlapping occurrences of pattern in string with repl.
        # If the pattern isn’t found, string is returned unchanged
        value = re.sub("[!#?]", "", value)
        # str.title() makes the first letter of each word upper case
        value = value.title()
        result.append(value)
    return result

In [110]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [111]:
# the same as the cell above
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

In [112]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [113]:
# We can use functions as arguments to other functions like the built-in map function
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


### Anonymous (Lambda) Functions

Python has support for so-called anonymous or lambda functions, which are a way
of writing functions consisting of a single statement, the result of which is the return
value. They are defined with the lambda keyword, which has no meaning other than
“we are declaring an anonymous function”

In [114]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

In [115]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [116]:
strings = ["foo", "card", "bar", "aaaa", "abab"]

In [117]:
# suppose we want to sort a collection of strings by the number of distinct letters in each string:
strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

### Generators

Many objects in Python support iteration, such as over objects in a list or lines in a
file. This is accomplished by means of the iterator protocol, a generic way to make
objects iterable

In [124]:
#iterating over a dictionary yields the dictionary keys

some_dict = {"a": 1, "b": 2, "c": 3}
for key in some_dict:
    print(key)

a
b
c


In [125]:
# iter() returns an iterator object
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x10d1e37e0>

In [126]:

list(dict_iterator)

['a', 'b', 'c']
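
The iterator protocol can be stepped through by hand with the built-in `next` function, which is roughly what a `for` loop does behind the scenes. The sketch below also includes a small helper, `isiterable`, that tests whether `iter` accepts an object. All names in it are illustrative.

```python
letters = {"a": 1, "b": 2, "c": 3}  # same content as some_dict above

letter_iterator = iter(letters)
first_key = next(letter_iterator)             # advances the iterator by one item
second_key = next(letter_iterator)
remaining = list(letter_iterator)             # consumes whatever is left
exhausted_default = next(letter_iterator, None)  # default avoids StopIteration

def isiterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
        return True
    except TypeError:  # raised when obj is not iterable
        return False

print(first_key, second_key, remaining, exhausted_default)
print(isiterable("a string"), isiterable([1, 2, 3]), isiterable(5))
```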

In [129]:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

In [132]:
# When we call the generator function, none of its code is executed immediately;
# the call simply returns a new generator object
gen = squares()
gen

<generator object squares at 0x10ce25d20>

In [130]:
# It is not until we request elements from the generator that it begins executing its code:
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

#### Generator expressions

In [133]:
# Another way to make a generator is by using a generator expression
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x109a00380>

In [134]:
# Generator expressions can be used instead of list comprehensions as function arguments in some cases
sum(x ** 2 for x in range(100))


328350

In [135]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

#### itertools module

The standard library itertools module has a collection of generators for many common data algorithms.
Please refer to Table 3-2 in the textbook.


In [136]:
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is an iterator over one run of consecutive names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
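
Notice that `A` appears twice in the output above: `groupby` only groups consecutive elements that share a key. If you want one group per key, one common approach is to sort by the same key first. A sketch, using the hypothetical name `people` for the same list of names:

```python
import itertools

def first_letter(x):
    return x[0]

people = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Sorting by the grouping key makes equal keys adjacent, so each letter forms one group
people_sorted = sorted(people, key=first_letter)
grouped = {letter: list(group)
           for letter, group in itertools.groupby(people_sorted, key=first_letter)}

print(grouped)
```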


### Errors and Exception Handling

Handling Python errors or exceptions gracefully is an important part of building
robust programs

In [137]:
float("1.2345")


1.2345

In [138]:
#could not convert string to float: 'something'
#float("something")


In [139]:
# Suppose we wanted a version of float that fails gracefully, returning the input
# argument. We can do this by writing a function that encloses the call to float in a
# try/except block:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [140]:
attempt_float("1.2345")


1.2345

In [141]:
# The code in the except part of the block is executed only if float(x) raises an exception
attempt_float("something")

'something'

In [142]:
#TypeError: float() argument must be a string or a real number, not 'tuple'
#float((1, 2))

In [143]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [144]:
#TypeError: float() argument must be a string or a real number, not 'tuple'
#attempt_float((1, 2))

In [145]:
# We can catch multiple exception types by writing a tuple of exception types 
# instead (the parentheses are required):

def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In [146]:
attempt_float((1,2))

(1, 2)
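
A `try` block can also carry `else` and `finally` clauses: `else` runs only when the `try` body succeeds, and `finally` runs in every case. The sketch below illustrates the control flow. `convert_all` is a hypothetical helper and the input values are made up.

```python
def convert_all(values):
    """Split values into floats and rejects, counting every attempt (illustrative)."""
    converted, rejected = [], []
    attempts = 0
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            rejected.append(value)    # conversion failed
        else:
            converted.append(number)  # runs only if the try body succeeded
        finally:
            attempts += 1             # runs whether or not an exception occurred
    return converted, rejected, attempts

converted, rejected, attempts = convert_all(["1.5", "abc", (1, 2), 3])
print(converted, rejected, attempts)
```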

## 3.3 Files and the Operating System

Most of this book uses high-level tools like pandas.read_csv to read data files from
disk into Python data structures. However, it’s important to understand the basics of
how to work with files in Python

Table 3-3. Python file modes

CAN BE SKIPPED 

In [147]:
# I pass encoding="utf-8" as a best practice because the default Unicode encoding
#for reading files varies from platform to platform

path = "examples/segismundo.txt"
f = open(path, encoding="utf-8")

FileNotFoundError: [Errno 2] No such file or directory: 'examples/segismundo.txt'

In [None]:
# The lines come out of the file with the end-of-line (EOL) markers intact
for line in f:
    print(line)



In [None]:
# get an EOL-free list of lines in a file like
lines = [x.rstrip() for x in open(path, encoding="utf-8")] # string.rstrip() removes any white spaces at the end of the string:
lines

In [None]:
# When you use open to create file objects, it is recommended to close the file when
# you are finished with it

f.close()

In [None]:
# One of the ways to make it easier to clean up open files is to use the with statement

with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]
    
lines

In [None]:
# read returns a certain number of characters from the file
f1 = open(path)
f1.read(10)


In [None]:
# The read method advances the file object position by the number of bytes read; tell gives you the current position.
# Even though we read 10 characters from the file f1 opened in text mode, the position is 11 because it took
# that many bytes to decode 10 characters using the default encoding.
f1.tell()


In [None]:
f2 = open(path, mode="rb")  # Binary mode
f2.read(10)

In [None]:
f2.tell()

In [None]:
# We can check the default encoding in the sys module

import sys
sys.getdefaultencoding()

In [None]:
# seek changes the file position to the indicated byte in the file
f1.seek(3)
f1.read(1)
f1.tell()

In [None]:
f1.close()
f2.close()
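
Because the cells above depend on `examples/segismundo.txt` being present, here is a self-contained sketch of the same text-versus-binary distinction. It writes a short sample line to a temporary file, so the file name `demo.txt` and the sample text are assumptions made for illustration only.

```python
import os
import tempfile

text = "Sueña el rico en su riqueza"  # sample line; 'ñ' takes two bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
    with open(demo_path, mode="w", encoding="utf-8") as handle:
        handle.write(text)

    # Text mode: read(10) returns 10 decoded characters
    with open(demo_path, encoding="utf-8") as text_file:
        chars = text_file.read(10)

    # Binary mode: read(10) returns exactly 10 bytes, and tell reports the byte offset
    with open(demo_path, mode="rb") as binary_file:
        raw = binary_file.read(10)
        position = binary_file.tell()

print(repr(chars), len(chars.encode("utf-8")))
print(raw, position)
```

Ten characters of this text occupy eleven bytes, which is why character counts and byte positions drift apart as soon as a non-ASCII character appears.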

In [None]:
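# To write text to a file, you can use the file’s write or writelines methods.
# Here we copy the non-blank lines of the source file into tmp.txt

with open(path, encoding="utf-8") as source:
    with open("tmp.txt", mode="w", encoding="utf-8") as handle:
        handle.writelines(x for x in source if len(x) > 1)

with open("tmp.txt", encoding="utf-8") as f:
    lines = f.readlines()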

lines

In [None]:
# os: Miscellaneous operating system interfaces
import os
os.remove("tmp.txt")

### Bytes and Unicode with Files

In [None]:
# UTF-8 is a variable-length Unicode encoding, so when we request some number of
#characters from the file, Python reads enough bytes (which could be as few as 10 or
#as many as 40 bytes) from the file to decode that many characters

with open(path) as f:
    chars = f.read(10)

chars
#len(chars)

In [None]:
# If we open the file in "rb" mode instead, read requests that exact number of bytes:

with open(path, mode="rb") as f:
    data = f.read(10)

data

In [None]:
data.decode("utf-8")


In [None]:
data[:4]

In [None]:
# Depending on the text encoding, you may be able to decode the bytes to a str 
# object yourself, but only if each of the encoded Unicode characters is fully formed:

# Try changing the 4 to a different number and compare the results.
#data[:4].decode("utf-8")

#UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data
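
The same error can be reproduced without the example file by encoding a short string by hand. In this sketch the sample text and variable names are assumptions chosen for illustration. Slicing the bytes in the middle of the two-byte `ñ` leaves an incomplete character that cannot be decoded.

```python
text = "Sueña el"                 # illustrative sample; 'ñ' encodes to b'\xc3\xb1'
encoded = text.encode("utf-8")

print(len(text), len(encoded))    # character count versus byte count

complete = encoded[:5].decode("utf-8")  # the slice ends after both bytes of 'ñ'

decode_failed = False
try:
    encoded[:4].decode("utf-8")   # the slice ends between the two bytes of 'ñ'
except UnicodeDecodeError:
    decode_failed = True

print(complete, decode_failed)
```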

In [None]:
sink_path = "sink.txt"
with open(path) as source:
    with open(sink_path, "x", encoding="iso-8859-1") as sink:
        sink.write(source.read())

with open(sink_path, encoding="iso-8859-1") as f:
    print(f.read(10))

In [None]:
os.remove(sink_path)

In [None]:
f = open(path, encoding='utf-8')
f.read(5)


In [None]:
# Beware of using seek when a file is opened in any mode other than binary. If the file
# position falls in the middle of the bytes defining a Unicode character, then subsequent
# reads will result in an error:
f.seek(4)


In [None]:
#f.read(1)
#UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 0: invalid start byte


In [None]:
f.close()