#### 3.1 Data Structures and Sequences

##### Tuple

A fixed-length, **immutable** sequence of Python objects.

In [0]:
tup = 4, 5, 6
tup

(4, 5, 6)

In [0]:
# importance of parentheses: they delimit the nested tuples
nested_tuple = (4, 5, 6), (7, 8)
nested_tuple

((4, 5, 6), (7, 8))

In [0]:
# conversion of any sequence / iterator to tuple:
tuple([4, 0, 2])

(4, 0, 2)

In [0]:
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

In [0]:
# indexing
tup[0]

's'

immutability

In [0]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False

TypeError: 'tuple' object does not support item assignment

In [0]:
tup[1].append(3) # since list is a mutable Python object, it can be modified in-place
tup

('foo', [1, 2, 3], True)

Tuple Concatenation

In [0]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

In [0]:
# concatenating multiple copies of a tuple (similar to string concatenation)
('foo', 'baar') * 4 # objects themselves are not copied, only the references to them

('foo', 'baar', 'foo', 'baar', 'foo', 'baar', 'foo', 'baar')
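The remark that only references are copied matters once the tuple holds a mutable object. A minimal sketch, using the made-up names `inner` and `repeated`:

```python
# A tuple holding one mutable list (names are illustrative only)
inner = [1, 2]
repeated = (inner,) * 3

# All three slots refer to the same list object
same_object = all(item is inner for item in repeated)

# Mutating the list through one slot is visible through every slot
repeated[0].append(3)
print(same_object, repeated)
```

With immutable elements such as strings this sharing is harmless, which is why the example above behaves as expected.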

Unpacking tuples

In [0]:
tup = (4, 5, 6)
a, b, c = tup
b

5

In [0]:
# nested tuple sequence unpacking
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

7

In [0]:
# swapping variables with the power of tuples
a, b = 1, 2
a, b = b, a
a, b

(2, 1)

In [0]:
# a common use of variable unpacking is iterating over sequences of tuples or lists:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
  print('a={0}, b={1}, c={2}'.format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [0]:
# when we want to "pluck" a few elements from the beginning of the tuple
values = 1, 2, 3, 4, 5, 6
a, b, *rest = values # any variable name works in place of rest; _ is the usual choice for values you want to discard
a, b

(1, 2)

In [0]:
rest

[3, 4, 5, 6]
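As a small illustration of the underscore convention, the sketch below discards the tail with `_` and also shows that the starred name may sit in the middle of the target list. The names `first`, `middle`, and `last` are only for illustration.

```python
values = 1, 2, 3, 4, 5, 6

# Discard the tail with the conventional throwaway name "_"
a, b, *_ = values

# The starred name can also sit in the middle and collects whatever is left
first, *middle, last = values
print(a, b, first, middle, last)
```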

Tuple methods

In [0]:
a = (1, 2, 3, 2, 3, 2, 2, 54, 65, 4, 6)
a.count(2) # count the number of occurrences of 2

4

##### List

A variable-length sequence of Python objects. Its contents can be modified in place.

In [0]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [0]:
b_list[1] = 'peekaboooooooo'
b_list

['foo', 'peekaboooooooo', 'baz']

In [0]:
# list() is used frequently in data preprocessing as a way to materialize an iterator or generator expression:
gen = range(10)
gen

range(0, 10)

In [0]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Adding and removing elements

In [0]:
# elements can be appended to the end of the list
b_list.append('darf')
b_list

['foo', 'peekaboooooooo', 'baz', 'darf']

In [0]:
# elements can be inserted at specific location
# insert is computationally more expensive than append, since it has to shift the subsequent elements.
# explore collections.deque, a double-ended queue for such purposes
b_list.insert(1, 'red') # args = (index, value)
b_list

['foo', 'red', 'peekaboooooooo', 'baz', 'darf']
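For the `collections.deque` suggestion above, here is a minimal sketch of inserting and removing at the front of a sequence without shifting elements. The variable names `queue` and `front` are illustrative.

```python
from collections import deque

# deque supports cheap appends and pops at both ends
queue = deque(['foo', 'baz'])
queue.appendleft('red')   # insert at the front without shifting elements
queue.append('darf')      # append at the back, like list.append
front = queue.popleft()   # remove and return the front element
print(front, list(queue))
```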

In [0]:
# removing an element by index
b_list.pop(2) # returns and removes element at specified index

'peekaboooooooo'

In [0]:
b_list

['foo', 'red', 'baz', 'darf']

In [0]:
# removing an element by value
b_list.append('foo')
b_list

['foo', 'red', 'baz', 'darf', 'foo']

In [0]:
b_list.remove('foo') # removes the first matching value; does not return the removed element

In [0]:
b_list

['red', 'baz', 'darf', 'foo']

In [0]:
# remove is expensive for large lists because it scans for the value and shifts elements (append stays cheap)
# check if a list contains a value using the "in" keyword:
'dwarf' in b_list

False

In [0]:
# to negate:
'dwarf' not in b_list # membership checks scan the list linearly; prefer a set or dict when lookups are frequent

True
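A rough sketch of the alternative for repeated lookups: convert the list to a set once and test membership against the set. The names `items` and `item_set` are hypothetical and chosen so they do not clash with the cells above.

```python
# Hypothetical data; convert once to a set when many lookups follow
items = ['red', 'baz', 'darf', 'foo']
item_set = set(items)

# Same answers as the list, but each lookup is a hash probe, not a scan
print('dwarf' in item_set, 'darf' in item_set)
```

The conversion itself walks the whole list, so it only pays off when the set is reused for several lookups.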

Concatenating and combining lists

In [0]:
[4, None, 'foo'] + [7, 8, (2, 3)] # using '+' creates a new list and copies the objects over (computationally expensive)

[4, None, 'foo', 7, 8, (2, 3)]

In [0]:
# appending multiple elements to an existing list
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)]) # appends elements to the existing list instead of creating a new one; prefer this over '+' when working with large lists
x

[4, None, 'foo', 7, 8, (2, 3)]

Sorting

In [0]:
# a list can be sorted in place by calling its sort method
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [0]:
# key argument of sort()
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len) # sort by the length of the strings
b

['He', 'saw', 'six', 'small', 'foxes']

Binary search and maintaining a sorted list

In [0]:
# bisect module functions do not check whether the list is sorted; on an unsorted list they silently return wrong results
# bisect.bisect() -> find the location where an element should be inserted to keep a list sorted
import bisect
c = [1, 2, 2, 2, 3, 4, 7]
bisect.bisect(c, 2)

4

In [0]:
bisect.bisect(c, 6)

6

In [0]:
# insert the element in the appropriate location while keeping the list sorted
bisect.insort(c, 6)
c

[1, 2, 2, 2, 3, 4, 6, 7]
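`bisect.bisect` is an alias of `bisect.bisect_right`; the module also provides `bisect.bisect_left`. The sketch below shows how the two differ when the list contains duplicates. The name `sorted_values` is illustrative.

```python
import bisect

sorted_values = [1, 2, 2, 2, 3, 4, 7]

# bisect_right (same as bisect.bisect): the slot after any equal items
right = bisect.bisect_right(sorted_values, 2)
# bisect_left: the slot before any equal items
left = bisect.bisect_left(sorted_values, 2)

# The difference counts how many times 2 occurs in the sorted list
print(left, right, right - left)
```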

Slicing

In [0]:
# selecting sections of sequence types by slicing notation
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1: 5] # start(include): stop(exclude) 

[2, 3, 7, 5]

In [0]:
# assigning values with slicing
seq[3:4] = [6, 3]
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

In [0]:
# if start or stop is omitted, it defaults to the beginning or the end of the sequence
seq[:5]

[7, 2, 3, 6, 3]

In [0]:
seq[3:]

[6, 3, 5, 6, 0, 1]

In [0]:
# slicing with negative indices
seq[-4:] # the last four elements

[5, 6, 0, 1]

In [0]:
seq[-6:-2] # from the sixth-to-last element up to (but not including) the second-to-last

[6, 3, 5, 6]

In [0]:
# using step to take every second element
seq[::2]

[7, 3, 3, 6, 1]

In [0]:
# reversing a list or a tuple
seq[::-1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]

##### Built-in Sequence Functions

enumerate

In [0]:
# Python has a built-in function enumerate, which yields (index, value) tuples:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
  mapping[v] = i

mapping

{'bar': 1, 'baz': 2, 'foo': 0}

sorted

In [0]:
# sorted function returns a new sorted list from the elements of any sequence
sorted([7, 1, 2, 3, 6, 0, 4])

[0, 1, 2, 3, 4, 6, 7]

In [0]:
sorted('horse race') # accepts the same arguments as sort()

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

zip

In [0]:
# "pairs" up elements of a number of sequences into tuples (wrap in list() to materialize them):
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2) # can take an arbitrary number of sequences
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [0]:
# the length of the zipped result is determined by the shortest sequence
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

In [0]:
# a common use of zip with enumerate:
for i, (a, b) in enumerate(zip(seq1, seq2)):
  print('{0}: {1} {2}'.format(i, a, b))

0: foo one
1: bar two
2: baz three


In [0]:
# using zip to "unzip" a sequence:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]
first_names, last_names = zip(*pitchers)
first_names

('Nolan', 'Roger', 'Schilling')

In [0]:
last_names

('Ryan', 'Clemens', 'Curt')

reversed

In [0]:
# reversed returns a lazy iterator over the elements of a sequence in reverse order
reversed(range(10))

<range_iterator at 0x7f4d097f8c00>

In [0]:
# the iterator does nothing until it is materialized, for example with list()
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

##### dict

In [0]:
# dict is a hash map or associative array:
# a flexibly sized collection of key-value pairs (keys and values are Python objects)
empty_dict = {}
d1 = {'a': 'some value', 'b': [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [0]:
# elements are accessed and inserted with the same indexing syntax
d1[7] = 'an integer'
d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [0]:
d1['b']

[1, 2, 3, 4]

In [0]:
# checking whether a dict contains a key
'b' in d1

True

In [0]:
# to delete entries from a dict, use the "del" keyword or the pop() method
d1[5] = 'some value'
d1

{5: 'some value', 7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [0]:
d1['dummy'] = 'another value'
d1

{5: 'some value',
 7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [0]:
del d1[5]
d1

{7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [0]:
ret = d1.pop('dummy') # this will return the value
ret

'another value'

In [0]:
d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [0]:
# a dict's keys() and values() methods return the keys and the values in the same order
print(list(d1.keys()))
print(list(d1.values()))

['a', 'b', 7]
['some value', [1, 2, 3, 4], 'an integer']


In [0]:
# merge one dict into another: update() method
d1.update({'b': 'foo', 'c': 132}) # values of keys that are already present are overwritten in place
d1

{7: 'an integer', 'a': 'some value', 'b': 'foo', 'c': 132}

Creating dicts from sequences

In [0]:
mapping = dict(zip(range(5), reversed(range(5)))) # dict() accepts a sequence of (key, value) pairs
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Default values

In [0]:
# get() method:
# -------------
# a common pattern for reading a value from a dict with a fallback looks like this:
key = 9
default_value = 'blank'
if key in mapping:
  value = mapping[key]
else:
  value = default_value
print(value)

blank


In [0]:
# the same logic is built into the get() method of dict
value = mapping.get(key, default_value) # if the key is not present, the default value passed as the second argument is returned
value

'blank'

In [0]:
value = mapping.get(3, default_value)
value

1

In [0]:
value = mapping.get(key) # if no default value is passed, None is returned in case of a miss
print(value)

None


In [0]:
# pop() method:
# -------------
mapping.pop(key, default_value) # removes the key and returns its value, or returns the default if the key is missing

'blank'

In [0]:
mapping.pop(key) # if default value is not passed as argument, KeyError will be raised in case of a miss

KeyError: 9

In [0]:
ret = mapping.pop(3) # example of removing an element successfully
ret, mapping

(1, {0: 4, 1: 3, 2: 2, 4: 0})

In [0]:
# setdefault() method:
# --------------------
# categorizing a list of words by their first letters as a dict without setdefault() method:
words = ['apple', 'bat', 'bat', 'atom', 'book']
by_letter = {}

for word in words:
  letter = word[0]
  if letter not in by_letter:
    by_letter[letter] = [word]
  else:
    by_letter[letter].append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bat', 'book']}

In [0]:
# with setdefault() method
by_letter = {}
for word in words:
  letter = word[0]
  by_letter.setdefault(letter, []).append(word) # if letter is not yet a key, set it to an empty list []; then append word to the returned list
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bat', 'book']}

In [0]:
# defaultdict class from the collections module:
# ---------------------------------------------
# the same problem solved with defaultdict:
from collections import defaultdict
by_letter = defaultdict(list) # pass a type or a function that generates the default value for each slot in the dict
for word in words:
  by_letter[word[0]].append(word)
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bat', 'book']})

In [0]:
dict(by_letter) # required dict

{'a': ['apple', 'atom'], 'b': ['bat', 'bat', 'book']}
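The default factory does not have to be `list`. As a sketch under the same word list, `defaultdict(int)` gives a simple counter, because calling `int()` returns 0. The name `counts` is illustrative.

```python
from collections import defaultdict

words = ['apple', 'bat', 'bat', 'atom', 'book']

# int() returns 0, so every missing key starts its count at zero
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1
print(dict(counts))
```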

Valid dict key types

In [0]:
# keys must be hashable; immutable objects such as strings, numbers, and tuples of hashable objects qualify
hash('string')

-1717063527501387336

In [0]:
hash((1, 2, (3, 4)))

-2725224101759650258

In [0]:
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'

In [0]:
# to use a list as a key, convert it to tuple
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}

##### set

In [0]:
# an unordered collection of unique elements
# one way of creating a set:
set([2, 3, 2, 2, 1, 3, 3])

{1, 2, 3}

In [0]:
# another way of creating a set:
{2, 2, 2, 1, 3, 3} # curly braces in this context are known as a "set literal"

{1, 2, 3}

In [0]:
# sets support mathematical operations: union, intersection, difference, and symmetric difference
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [0]:
# union: set of distinct elements occurring in either set
print(a.union(b))
print(a | b) # the | binary operator does the same

{1, 2, 3, 4, 5, 6, 7, 8}
{1, 2, 3, 4, 5, 6, 7, 8}


In [0]:
# intersection: elements occurring in both sets (common elements)
print(a.intersection(b))
print(a & b) # using the & operator

{3, 4, 5}
{3, 4, 5}


In [0]:
c = a.copy()
c |= b # c = union of the contents of c and b
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [0]:
d = a.copy()
d &= b # d = intersection of the contents of d and b
d

{3, 4, 5}
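Difference and symmetric difference are listed above but not demonstrated. A short sketch with the same two sets:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# difference: elements of a that are not in b
print(a.difference(b), a - b)
# symmetric difference: elements in exactly one of the two sets
print(a.symmetric_difference(b), a ^ b)
```

As with `|=` and `&=`, the in-place forms `-=` and `^=` update the left-hand set instead of building a new one.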

In [0]:
# elements of a set must be hashable (generally immutable)
my_data = {[1, 2, 3, 4]} # this will raise an error because a list is mutable

TypeError: unhashable type: 'list'

In [0]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)} # a tuple of hashable elements is hashable, so this raises no error
my_set

{(1, 2, 3, 4)}

In [0]:
# whether a set is a subset / superset of another set:
a_set = {1, 2, 3, 4, 5}
print({1, 2, 3}.issubset(a_set)) # True if all elements of {1, 2, 3} are contained in a_set
print(a_set.issuperset({1, 2, 3})) # True if a_set contains all elements of {1, 2, 3}
print({1, 2, 3} == {3, 2, 1}) # sets are equal if and only if their contents are equal

True
True
True


##### List, Set and Dict Comprehensions

<pre># concise one-line expressions that combine a for loop with an optional filter
# syntax for list comprehension:
# ---------
[expr for val in collection if condition]
# is equivalent to:
result = []
for val in collection:
  if condition:
    result.append(expr)</pre>

In [0]:
# example:
# given a list of strings, filter out strings with length 2 or less and convert the remaining ones to uppercase
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

<pre>
# dict comprehension
# syntax:
#--------
{key-expr: value-expr for value in collection if condition}
# set comprehension
# syntax:
#--------
{expr for value in collection if condition}
</pre>

In [0]:
# set comprehension example:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [0]:
# a more functional way of expressing the set comprehension:
set(map(len, strings)) # map applies the len() function to each element of strings and returns an iterator

{1, 2, 3, 4, 6}

In [0]:
# dict comprehension example:
loc_mapping = {val: index for index, val in enumerate(strings) }
loc_mapping # mapping element with corresponding index

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

Nested list comprehension

In [0]:
# task: get a list of all names with two or more e's
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'], # english names
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']] # spanish names

# a common approach will be:
names_of_interest = []
for names in all_data:
  enough_es = [name for name in names if name.count('e') >= 2]
  names_of_interest.extend(enough_es)
names_of_interest

['Steven']

In [0]:
# wrapping up the above operation in a single expression using a nested list comprehension:
result = [name for names in all_data for name in names if name.count('e') >= 2]
result

['Steven']

In [0]:
# another example: "flatten" a list of tuples into a list
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [element for tup in some_tuples for element in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [0]:
# produce a list of lists from the above list of tuples
[[element for element in tup] for tup in some_tuples] # comprehensions can be nested to any depth; beyond two or three levels, check that the code is still readable

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

#### 3.2 Functions

In [0]:
# functions are the primary method of code organization and reusability
# syntax with example:
#---------------------
def my_function(x, y, z=1.5): # declared with the def keyword
  if z > 1:
    return z * (x + y) # a function can have multiple return statements
  else:
    return z / (x + y) # if execution reaches the end without a return statement, None is returned

In [0]:
# positional arguments: x and y
# keyword argument (with a default value): z
# i.e., the above function can be called in any of these ways:
print(my_function(5, 6, z=0.7))
print(my_function(3.14, 7, 3.5))
print(my_function(10, 20))

0.06363636363636363
35.49
45.0


In [0]:
# keyword arguments must always follow positional arguments, if any
# passing every argument by keyword is convenient if you do not want to remember the order of the positional arguments:
print(my_function(x=5, y=6, z=7))
print(my_function(y=6, x=5, z=7))

77
77


##### Namespaces, Scope and Local Functions

In [0]:
# local namespace is created when a function is called:
def func():
  a = []
  for i in range(5):
    a.append(i)
func() # a is created when the function is called and destroyed once the call finishes
a # error, because a is only accessible in the function's local namespace

NameError: name 'a' is not defined

In [0]:
a = []
def func():
  for i in range(5):
    a.append(i)
func()
a # a is accessible because the function looks it up in the enclosing (global) scope and mutates it in place

[0, 1, 2, 3, 4]

In [0]:
a = None # declared outside
def bind_a_variable():
  a = [] # declared inside, hence this is a different variable
bind_a_variable()
print(a)

None


In [0]:
# to assign to a variable outside of the function, use the "global" keyword
# one should avoid global declarations where possible
a = None
def bind_a_variable():
  global a # if you find yourself using global often, consider object-oriented programming (a class) to hold the state
  a = []
bind_a_variable()
print(a)

[]
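The comment above recommends object-oriented programming when `global` shows up repeatedly. One possible shape for that, as a sketch only: the class name `Collector` and its attribute `items` are hypothetical, not part of the original notebook.

```python
class Collector:
    """Hypothetical holder for state that would otherwise be global."""

    def __init__(self):
        self.items = None

    def bind(self):
        # Rebinds an attribute on this instance, not a module-level name
        self.items = []

collector = Collector()
collector.bind()
print(collector.items)
```

The state now travels with the object, so two collectors can coexist without touching each other or the module namespace.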


##### Returning Multiple Values

In [0]:
# example:
def f():
  a = 5
  b = 6
  c = 7
  return a, b, c # returns a tuple
a, b, c = f()
print("a: {}, b: {}, c: {}".format(a, b, c))

a: 5, b: 6, c: 7


In [0]:
# alternative approach is to return a dictionary
def f():
  a = 5
  b = 6
  c = 7
  return {'a': a, 'b': b, 'c': c}
f()

{'a': 5, 'b': 6, 'c': 7}

##### Functions are objects

In [0]:
# a common data preprocessing task is cleaning a messy list of strings like the one below:
states = ['    Alabama', 'Georgia!', 'Georgia', 'georgia', 'FlOriDa', 'south   carolina##', 'West virginia?' ]

import re # built-in module for regular expressions

def clean_strings(strings):
  result = []
  for value in strings:
    value = value.strip()
    value = re.sub("[!#?]", "", value) # removing special symbols like !#?
    value = value.title() # convert to title case
    result.append(value)
  return result

clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [0]:
# Alternate approach:
def remove_punctuation(value):
  return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title] # since functions are also objects in python, we can store them in a list

def clean_strings(strings, ops):
  result = []
  for value in strings:
    for function in ops:
      value = function(value) # calling functions in a generalized way
    result.append(value)
  return result

clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [0]:
# functions can also be passed as arguments to other functions, such as the built-in map():
for x in map(remove_punctuation, states): # map() applies the given function to every value of the input iterable
  print(x)

    Alabama
Georgia
Georgia
georgia
FlOriDa
south   carolina
West virginia


##### Anonymous (Lambda) functions

In [0]:
# a way of writing functions that consist of a single expression, whose value is the return value
def short_function(x):
  return x * 2

# equivalent lambda function:
equiv_anon = lambda x: x*2

print(short_function(3))
print(equiv_anon(3))

6
6


In [0]:
# it can be helpful for simple tasks like:
def apply_to_list(some_list, f):
  return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x*2) # this way we can pass a custom operation to the apply_to_list() function

[8, 0, 2, 10, 12]

In [0]:
# another example, sort a collection of strings by the number of distinct letters in each string:
strings = ['foo', 'card', 'bar', 'aaa', 'abab']
strings.sort(key=lambda x: len(set(x)))
strings

['aaa', 'foo', 'abab', 'bar', 'card']

In [0]:
# lambda functions are never given an explicit name (their __name__ is always '<lambda>'), hence the term anonymous functions
equiv_anon.__name__

'<lambda>'

In [0]:
apply_to_list.__name__ # since this function is declared with "def" keyword

'apply_to_list'

##### Currying: Partial Argument Application

In [0]:
# deriving new functions from existing ones by partial argument application
def add_numbers(x, y):
  return x + y

add_five = lambda y: add_numbers(5, y) # 5 is bound as the first argument; the second argument is said to be curried
add_five(7)

12

In [0]:
# the built-in functools module can simplify this process using the partial function
from functools import partial
add_five = partial(add_numbers, 5)
add_five(6)

11
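`partial` can also fix keyword arguments. A small sketch using the built-in `int` and its `base` parameter; the name `parse_binary` is made up for this example.

```python
from functools import partial

# Fix the keyword argument "base" of the built-in int()
parse_binary = partial(int, base=2)
print(parse_binary('1010'))

# partial objects expose what they wrap
print(parse_binary.func is int, parse_binary.keywords)
```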

##### Generators

Before moving forward, I would suggest reading this: https://stackoverflow.com/a/42838757/8438777

In [0]:
# the iterator protocol is a generic way to make objects iterable
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict: # here the Python interpreter first attempts to create an iterator out of some_dict
  print(key)

a
b
c


In [0]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x7f1a65953278>

In [0]:
# an iterator is any object that yields objects to the Python interpreter when used in a context like a for loop
list(dict_iterator)

['a', 'b', 'c']

In [0]:
# a generator returns a sequence of multiple results lazily, pausing after each one until the next is requested
def squares(n=10):
  print("Generating squares from 1 to {0}".format(n ** 2))
  for i in range(1, n+1):
    yield i ** 2

In [0]:
gen = squares() # calling the generator function returns a generator object; none of its body runs yet
gen

<generator object squares at 0x7f1a658b9fc0>

In [0]:
for x in gen: # the body runs only as elements are requested
  print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
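To make the pausing visible, the sketch below steps through a generator by hand with the built-in `next()`. `small_squares` is a hypothetical, shorter variant of `squares` so that the cells above are not overwritten.

```python
def small_squares(n=3):
    # Same shape as the squares generator, with a smaller default
    for i in range(1, n + 1):
        yield i ** 2

stepper = small_squares()
first = next(stepper)    # runs the body up to the first yield
second = next(stepper)   # resumes where it paused
rest = list(stepper)     # drains whatever is left

# An exhausted generator yields nothing more; next() can take a default
after = next(stepper, None)
print(first, second, rest, after)
```

Without the default argument, calling `next()` on an exhausted generator raises `StopIteration`, which is the signal a for loop uses to stop.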

###### Generator expressions

In [0]:
# an alternate way to create a generator (similar to list, dict, and set comprehensions, but enclosed in parentheses)
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x7f1a6586daf0>

In [0]:
# above statement is equivalent to:
def _make_gen():
  for x in range(100):
    yield x ** 2
gen = _make_gen()
gen

<generator object _make_gen at 0x7f1a658b9f10>

In [0]:
# generator expressions can be used instead of list comprehensions as function arguments:
sum(x ** 2 for x in range(100))

328350

In [0]:
dict((i, i**2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

###### itertools module

In [0]:
# the itertools module has a collection of generators for common data algorithms
# groupby groups consecutive elements that share the same key
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
for letter, group in itertools.groupby(names, first_letter):
  print(letter, list(group))

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
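In the output above 'A' appears twice because `groupby` only merges adjacent elements. A common remedy, sketched here, is to sort by the same key first; the name `grouped` is illustrative.

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sort by the same key first so equal keys become adjacent
ordered = sorted(names, key=first_letter)
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(ordered, first_letter)
}
print(grouped)
```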


In [0]:
# some other useful functions of itertools module
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
list(itertools.combinations(names, 3))

[('Alan', 'Adam', 'Wes'),
 ('Alan', 'Adam', 'Will'),
 ('Alan', 'Adam', 'Albert'),
 ('Alan', 'Adam', 'Steven'),
 ('Alan', 'Wes', 'Will'),
 ('Alan', 'Wes', 'Albert'),
 ('Alan', 'Wes', 'Steven'),
 ('Alan', 'Will', 'Albert'),
 ('Alan', 'Will', 'Steven'),
 ('Alan', 'Albert', 'Steven'),
 ('Adam', 'Wes', 'Will'),
 ('Adam', 'Wes', 'Albert'),
 ('Adam', 'Wes', 'Steven'),
 ('Adam', 'Will', 'Albert'),
 ('Adam', 'Will', 'Steven'),
 ('Adam', 'Albert', 'Steven'),
 ('Wes', 'Will', 'Albert'),
 ('Wes', 'Will', 'Steven'),
 ('Wes', 'Albert', 'Steven'),
 ('Will', 'Albert', 'Steven')]

In [0]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
list(itertools.permutations(names, 2))

[('Alan', 'Adam'),
 ('Alan', 'Wes'),
 ('Alan', 'Will'),
 ('Alan', 'Albert'),
 ('Alan', 'Steven'),
 ('Adam', 'Alan'),
 ('Adam', 'Wes'),
 ('Adam', 'Will'),
 ('Adam', 'Albert'),
 ('Adam', 'Steven'),
 ('Wes', 'Alan'),
 ('Wes', 'Adam'),
 ('Wes', 'Will'),
 ('Wes', 'Albert'),
 ('Wes', 'Steven'),
 ('Will', 'Alan'),
 ('Will', 'Adam'),
 ('Will', 'Wes'),
 ('Will', 'Albert'),
 ('Will', 'Steven'),
 ('Albert', 'Alan'),
 ('Albert', 'Adam'),
 ('Albert', 'Wes'),
 ('Albert', 'Will'),
 ('Albert', 'Steven'),
 ('Steven', 'Alan'),
 ('Steven', 'Adam'),
 ('Steven', 'Wes'),
 ('Steven', 'Will'),
 ('Steven', 'Albert')]

In [0]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
list(itertools.combinations_with_replacement(names, 2))

[('Alan', 'Alan'),
 ('Alan', 'Adam'),
 ('Alan', 'Wes'),
 ('Alan', 'Will'),
 ('Alan', 'Albert'),
 ('Alan', 'Steven'),
 ('Adam', 'Adam'),
 ('Adam', 'Wes'),
 ('Adam', 'Will'),
 ('Adam', 'Albert'),
 ('Adam', 'Steven'),
 ('Wes', 'Wes'),
 ('Wes', 'Will'),
 ('Wes', 'Albert'),
 ('Wes', 'Steven'),
 ('Will', 'Will'),
 ('Will', 'Albert'),
 ('Will', 'Steven'),
 ('Albert', 'Albert'),
 ('Albert', 'Steven'),
 ('Steven', 'Steven')]

In [0]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
cities = ['Mumbai', 'Delhi', 'Hyderabad', 'Tokyo', 'Moscow']
list(itertools.product(names, cities, repeat=1)) # repeat=n takes the product of the given iterables repeated n times

[('Alan', 'Mumbai'),
 ('Alan', 'Delhi'),
 ('Alan', 'Hyderabad'),
 ('Alan', 'Tokyo'),
 ('Alan', 'Moscow'),
 ('Adam', 'Mumbai'),
 ('Adam', 'Delhi'),
 ('Adam', 'Hyderabad'),
 ('Adam', 'Tokyo'),
 ('Adam', 'Moscow'),
 ('Wes', 'Mumbai'),
 ('Wes', 'Delhi'),
 ('Wes', 'Hyderabad'),
 ('Wes', 'Tokyo'),
 ('Wes', 'Moscow'),
 ('Will', 'Mumbai'),
 ('Will', 'Delhi'),
 ('Will', 'Hyderabad'),
 ('Will', 'Tokyo'),
 ('Will', 'Moscow'),
 ('Albert', 'Mumbai'),
 ('Albert', 'Delhi'),
 ('Albert', 'Hyderabad'),
 ('Albert', 'Tokyo'),
 ('Albert', 'Moscow'),
 ('Steven', 'Mumbai'),
 ('Steven', 'Delhi'),
 ('Steven', 'Hyderabad'),
 ('Steven', 'Tokyo'),
 ('Steven', 'Moscow')]

##### Errors and Exception Handling

In [0]:
float('1.2345') # can convert a string with numeric characters to float

1.2345

In [0]:
float('something') # value error on improper inputs

ValueError: could not convert string to float: 'something'

In [0]:
# example of exception handling
# return the input unchanged if it cannot be converted to float
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [0]:
attempt_float('1.2345')

1.2345

In [0]:
attempt_float('something')

'something'

In [0]:
float((1, 2)) # float can raise exceptions other than ValueError

TypeError: float() argument must be a string or a number, not 'tuple'

In [0]:
# if you want to handle only ValueError:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

attempt_float('something')

'something'

In [0]:
attempt_float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

In [0]:
# we can catch multiple exception types by writing a tuple of exception types
def attempt_float(x):
    try:
        return float(x)
    except (ValueError, TypeError):
        return x

attempt_float((1, 2))

(1, 2)

In [0]:
attempt_float('something')

'something'
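When several exception types share one handler, binding the exception with `as` lets the handler tell them apart. A minimal sketch; `attempt_float_verbose` is a hypothetical variant and not part of the original notebook.

```python
def attempt_float_verbose(x):
    """Hypothetical variant that reports why the conversion failed."""
    try:
        return float(x)
    except (ValueError, TypeError) as exc:
        # exc is the exception instance; its type tells the two cases apart
        return type(exc).__name__

print(attempt_float_verbose('1.5'))
print(attempt_float_verbose('something'))
print(attempt_float_verbose((1, 2)))
```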

In [0]:
# finally block:
# use it when some code must be executed regardless of whether the code in the try block succeeds or not.
path = "test_file1.txt"
f = open(path, 'w')
try:
    f.write('Hi')
    print('file write success')
finally:
    print('closing file')
    f.close() # close regardless of try success

file write success
closing file


In [0]:
# code that should execute only if the try: block succeeds goes in an else: block
path = "test_file1.txt"
f = open(path, 'w')
try:
    f.write("Hi")
except:
    print('Failed')
else:
    print('Success')
finally:
    print('closing file')
    f.close()

Success
closing file


###### Exceptions in IPython

This chapter does not explain the %xmode and %debug magic commands in much detail, so I went through some blogs and found the page below very useful.

(from Jake VanderPlas' blog)

<pre>Before moving further, please consider going through:
https://jakevdp.github.io/PythonDataScienceHandbook/01.06-errors-and-debugging.html#Partial-list-of-debugging-commands
It is by jakevdp (author of the Python Data Science Handbook).
</pre>

With the %xmode magic function, IPython lets us control the amount of information printed when an exception is raised.

In [0]:
def func1(a, b):
    return a/b

def func2(x):
    a = x
    b = x - 1
    return func1(a, b)

func2(1)

ZeroDivisionError: division by zero

<pre>%xmode arguments:
1. Plain (more compact and gives less info)
2. Context (gives output like that just shown before)
3. Verbose (gives extra info, including arguments to any function)</pre>

In [0]:
%xmode Plain

Exception reporting mode: Plain


In [0]:
func2(1)

ZeroDivisionError: division by zero

In [0]:
%xmode Verbose

Exception reporting mode: Verbose


In [0]:
func2(1)

ZeroDivisionError: division by zero

##### Debugging (from Jake VanderPlas' blog)

<pre>pdb is the standard Python tool for interactive debugging.
ipdb is its IPython-enhanced version (the IPython debugger).
</pre>

The %debug magic command is more convenient: if you call it after hitting an exception, it automatically opens an interactive debugging prompt at the point of the exception.

In [0]:
%debug

> <ipython-input-21-1b22cb5b9fa4>(2)func1()
      1 def func1(a, b):
----> 2     return a/b
      3 
      4 def func2(x):
      5     a = x

ipdb> print(a)
1
ipdb> print(b)
0
ipdb> quit

stepping up and down through the stack

In [0]:
%debug

> <ipython-input-21-1b22cb5b9fa4>(2)func1()
      1 def func1(a, b):
----> 2     return a/b
      3 
      4 def func2(x):
      5     a = x

ipdb> up
> <ipython-input-21-1b22cb5b9fa4>(7)func2()
      5     a = x
      6     b = x - 1
----> 7     return func1(a, b)
      8 
      9 func2(1)

ipdb> print(x)
1
ipdb> up
> <ipython-input-30-7cb498ea7ed1>(1)<module>()
----> 1 func2(1)

ipdb> down
> <ipython-input-21-1b22cb5b9fa4>(7)func2()
      5     a = x
      6     b = x - 1
----> 7     return func1(a, b)
      8 
      9 func2(1)

ipdb> quit

#### 3.3 Files and the Operating System

In [0]:
%%bash
touch segismundo.txt
cat > segismundo.txt
Sueña el rico en su riqueza,
que más cuidados le ofrece;

sueña el pobre que padece
su miseria y su pobreza;

sueña el que a medrar empieza,
sueña el que afana y pretende,
sueña el que agravia y ofende,

y en el mundo, en conclusión,
todos sueñan lo que son,
aunque ninguno lo entiende.


In [0]:
# open a file for reading
path = 'segismundo.txt'
f = open(path) # by default, the file is opened in read-only mode 'r'
# we can iterate over the file's lines as we would over a list
for line in f:
  pass

In [0]:
# lines come out of the file with the end-of-line (EOL) markers intact
# we can get an EOL-free list of lines:
lines = [x.rstrip() for x in open(path)]
lines

['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.']

In [0]:
# closing the file releases its resource back to the operating system
f.close()

One way to make it easier to clean up open files is to use the **with** statement

In [0]:
with open(path) as f:
  lines = [x.rstrip() for x in f] # this will automatically close the file f when exiting the with: block
lines

['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.']

<pre>
f = open(path, 'w') --> creates a new file at path, overwriting any existing file in its place
f = open(path, 'x') --> creates a new writable file, raising FileExistsError if path already exists
</pre>
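The difference between the two write modes can be checked safely in a temporary directory. This is a sketch; the file name `demo.txt` and the variable names are made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    with open(demo_path, 'x') as f:   # file does not exist yet: succeeds
        f.write('first')

    try:
        open(demo_path, 'x')          # file exists now: refuses to overwrite
    except FileExistsError:
        second_attempt = 'refused'

    with open(demo_path, 'w') as f:   # 'w' silently truncates and rewrites
        f.write('second')

    with open(demo_path) as f:
        final_content = f.read()

print(second_attempt, final_content)
```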

Some of the most commonly used methods for readable files:

In [0]:
f = open(path)
f.read(10) # returns a certain number of characters from the file, in this case the first 10 characters

'Sueña el r'

In [0]:
f2 = open(path, 'rb') # Binary mode (raw bytes instead of characters like before)
f2.read(10)

b'Sue\xc3\xb1a el '

In [0]:
# the read method advances the file handle's position by the number of bytes read.
# tell gives you the current position
f.tell() # 11, because the default encoding needed 11 bytes to decode 10 characters

11

In [0]:
f2.tell()

10

In [0]:
# check default encoding
import sys
sys.getdefaultencoding()

'utf-8'

In [0]:
# seek changes the file position to the indicated byte in the file
f.seek(3)

3

In [0]:
f2.seek(1)

1

In [0]:
f.close()
f2.close()

write() and writelines() methods

In [0]:
with open('tmp.txt', 'w') as handle:
  handle.writelines(x for x in open(path) if len(x) > 1)
with open('tmp.txt') as f:
  lines = f.readlines()
lines

['Sueña el rico en su riqueza,\n',
 'que más cuidados le ofrece;\n',
 'sueña el pobre que padece\n',
 'su miseria y su pobreza;\n',
 'sueña el que a medrar empieza,\n',
 'sueña el que afana y pretende,\n',
 'sueña el que agravia y ofende,\n',
 'y en el mundo, en conclusión,\n',
 'todos sueñan lo que son,\n',
 'aunque ninguno lo entiende.\n']

In [0]:
# to check whether a file is closed
f.closed

True

##### Bytes and Unicode with Files

Default mode for Python files: **text mode**

In [0]:
with open(path) as f:
  chars = f.read(10)
chars

'Sueña el r'

In text mode, f.read(10) reads enough bytes (as few as 10 or as many as 40 with UTF-8) to decode 10 Unicode characters. In binary mode, f.read(10) returns exactly 10 bytes:

In [0]:
with open(path, 'rb') as f:
  data = f.read(10)
data

b'Sue\xc3\xb1a el '

Bytes can be decoded to a str object only if every encoded Unicode character is fully formed:

In [0]:
data.decode('utf8')

'Sueña el '

In [0]:
data[:4].decode('utf8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data
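When bytes arrive in chunks that may split a character, one option from the standard library is an incremental decoder from the `codecs` module, which holds back an incomplete trailing sequence until the rest arrives. A sketch with made-up chunk boundaries:

```python
import codecs

data = 'Sueña'.encode('utf8')          # b'Sue\xc3\xb1a'
chunks = [data[:4], data[4:]]          # the split lands inside 'ñ'

# An incremental decoder buffers incomplete trailing bytes between calls
decoder = codecs.getincrementaldecoder('utf8')()
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b'', final=True))  # flush; raises if bytes are left over
print(pieces)
```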

A convenient way to convert from one Unicode encoding to another:

In [0]:
sink_path = 'sink.txt'
with open(path) as source:
  with open(sink_path, 'xt', encoding='iso-8859-1') as sink: # 'x' fails if sink.txt already exists
    sink.write(source.read())

with open(sink_path, encoding='iso-8859-1') as f:
  print(f.read(10))

Sueña el r


As with decoding, be careful when seeking in a file opened in any mode other than binary: if the position falls in the middle of the bytes that define a Unicode character, subsequent reads will fail.

In [0]:
f = open(path)
f.read(5)

'Sueña'

In [0]:
f.seek(4)

4

In [0]:
f.read(1)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 0: invalid start byte

In [0]:
f.close()