# Chapter 2. A crash course in Python

## 2.1 The Zen of Python

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## 2.2 Getting Python

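The simplest route is the Anaconda distribution, which bundles Python with many of the libraries commonly used for data science.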

## 2.3 Virtual environments

Every data science project you do will require some combination of external libraries, sometimes with specific versions that differ from the specific versions you used for other projects. If you were to have a single Python installation, these libraries would conflict and cause you all sorts of problems.

The standard solution is to use virtual environments, which are sandboxed Python environments that maintain their own versions of Python libraries (and, depending on how you set up the environment, of Python itself).

conda create -n dsfs python=3.6

As a matter of good discipline, you should always work in a virtual environment, and never use the “base” Python installation.

## 2.4 Whitespace formatting

Many languages use curly braces to delimit blocks of code. Python uses indentation.
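
As a minimal sketch of indentation-delimited blocks (the variable names here are illustrative, not from the original notes), nested loops might look like this:

```python
# Indentation alone marks where each block begins and ends.
results = []
for i in [1, 2]:
    for j in [1, 2]:
        # this line belongs to the inner loop
        results.append((i, j))
    # back at the outer loop's indentation level
    results.append(i)

print(results)  # [(1, 1), (1, 2), 1, (2, 1), (2, 2), 2]
```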

Whitespace is ignored inside parentheses and brackets, which can be helpful for long-winded computations

Use a backslash "\" to indicate that a statement continues onto the next line
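
The two continuation styles described above can be sketched as follows; the variable names are made up for illustration:

```python
# Inside parentheses or brackets, line breaks and extra spaces are ignored.
long_winded_computation = (1 + 2 + 3 + 4 + 5 +
                           6 + 7 + 8 + 9 + 10)

# A list of lists laid out one row per line for readability.
easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]

# Outside brackets, a trailing backslash continues the statement.
two_plus_three = 2 + \
                 3

print(long_winded_computation, two_plus_three)  # 55 5
```

The parenthesized form is usually the more robust of the two, since a backslash stops working if any character (even a space) follows it on the line.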

One consequence of whitespace formatting is that it can be hard to copy and paste code into the Python shell

## 2.5 Modules

Certain features of Python are not loaded by default. These include both features that are part of the language and third-party features that you download yourself. In order to use these features, you’ll need to import the modules that contain them.

In [14]:
import re
my_regex = re.compile("[0-9]+", re.I)

In [15]:
import re as regex
my_regex = regex.compile("[0-9]+", regex.I)

In [16]:
import matplotlib.pyplot as plt

In [17]:
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()

In [None]:
# Avoid importing a module's entire contents, which can silently overwrite existing names:
# from re import *

## 2.6 Functions

A function is a rule for taking zero or more inputs and returning a corresponding output

In [18]:
def double(x):
    '''Multiplies its input by 2'''
    return x * 2

Python functions are first-class, which means that we can assign them to variables and pass them into functions just like any other arguments

In [19]:
def apply_to_one(f):
    '''Calls the function f with 1 as its argument'''
    return f(1)
my_double = double
x = apply_to_one(my_double)
x

2

It is also easy to create short anonymous functions, or lambdas

In [26]:
y = apply_to_one(lambda x: x + 4)
y

5

Function parameters can also be given default arguments, which only need to be specified when you want a value other than the default:

In [27]:
def my_print(message = 'my default message'):
    print(message)
my_print()
my_print('hello')

my default message
hello


In [31]:
def full_name(first = "What's-his-name", 
              last = "Something"):
    return first + " " + last

full_name('Joel', 'Grus'), full_name('Joel'), full_name(last = 'Grus'), full_name()

('Joel Grus',
 'Joel Something',
 "What's-his-name Grus",
 "What's-his-name Something")

## 2.7 Strings

In [32]:
single_quoted_string = 'data science'
double_quoted_string = "data science"

Python uses backslashes to encode special characters

In [33]:
tab_string = "\t"
len(tab_string)

1

In [34]:
# Raw strings 
not_tab_string = r'\t'
len(not_tab_string)

2

Create multiline strings using triple quotes (three single quotes or three double quotes)

In [36]:
multi_line_string = '''This is the first line.
and this is the second line
and this is the third line
'''
multi_line_string

'This is the first line.\nand this is the second line\nand this is the third line\n'

An f-string provides a simple way to substitute values into strings

In [37]:
first_name = 'Joel'
last_name = 'Grus'

In [38]:
full_name1 = first_name + " " + last_name
full_name1

'Joel Grus'

In [39]:
full_name2 = "{0} {1}".format(first_name, last_name)
full_name2

'Joel Grus'

In [40]:
full_name3 = f'{first_name} {last_name}'
full_name3

'Joel Grus'

## 2.8 Exceptions

When something goes wrong, Python raises an exception. Unhandled, exceptions will cause your program to crash. You can handle them using try and except

In [41]:
try: 
    print(0 / 0)
except ZeroDivisionError:
    print('cannot divide by zero')

cannot divide by zero


## 2.9 Lists

The list is probably the most fundamental data structure in Python. A list is simply an ordered collection (similar to what other languages might call an array, but with some added functionality).

In [11]:
integer_list = [1,2,3]
heterogeneous_list = ['string', 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []]

In [13]:
list_length = len(integer_list)
list_length

3

In [14]:
list_sum = sum(integer_list)
list_sum

6

In [15]:
x = [0,1,2,3,4,5,6,7,8,9]
zero = x[0]
one = x[1]
nine = x[-1]
eight = x[-2]
x[0] = -1

In [16]:
x

[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [18]:
first_three = x[:3]
three_to_end = x[3:]
one_to_four = x[1:5]
last_three = x[-3:]
without_first_and_last = x[1:-1]
copy_of_x = x[:]

In [19]:
every_third = x[::3]
five_to_three = x[5:2:-1]

In [20]:
0 in [1,2,3]

False

In [21]:
1 in [1,2,3]

True

In [31]:
x = [1,2,3]
x.extend([4,5,6])
x

[1, 2, 3, 4, 5, 6]

In [32]:
x = [1,2,3]
y = x + [4,5,6]
y

[1, 2, 3, 4, 5, 6]

In [33]:
x = [1,2,3]
x.append(0)
x

[1, 2, 3, 0]

In [35]:
y = x[-1]
z = len(x)

In [36]:
x, y = [1,2]
x

1

In [37]:
_, y = [1,2]
y

2

## 2.10 Tuples

Tuples are lists’ immutable cousins. Pretty much anything you can do to a list that doesn’t involve modifying it, you can do to a tuple. You specify a tuple by using parentheses (or nothing) instead of square brackets:

In [39]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4
my_list[1] = 3

In [40]:
my_tuple[1] 

2

In [41]:
my_tuple[1] = 3

TypeError: 'tuple' object does not support item assignment

In [42]:
try:
    my_tuple[1] = 3
except TypeError:
    print("cannot modify a tuple")

cannot modify a tuple


Tuples are a convenient way to return multiple values from functions

In [43]:
def sum_and_product(x, y):
    return (x+y),(x*y)

sp = sum_and_product(2,3)
s, p = sum_and_product(5,10)

In [44]:
sp

(5, 6)

In [45]:
s, p

(15, 50)

In [46]:
x, y = 1, 2

In [47]:
x

1

In [48]:
y

2

In [49]:
x, y = y, x

In [50]:
x

2

In [51]:
y

1

## 2.11 Dictionaries

Dictionaries (dicts) associate values with keys and allow you to quickly retrieve the value corresponding to a given key

In [52]:
empty_dict = {}
empty_dict2 = dict()
grades = {"Joel":80, "Tim": 95}

In [54]:
joels_grade = grades['Joel']
joels_grade

80

In [55]:
try:
    kates_grade = grades['Kate']
except KeyError:
    print("no grade for Kate!")

no grade for Kate!


In [62]:
joel_has_grade = 'Joel' in grades
joel_has_grade

True

In [63]:
kate_has_grade = 'Kate' in grades
kate_has_grade

False

In [65]:
joels_grade = grades.get('Joel', 0)
joels_grade

80

In [67]:
kates_grade = grades.get('Kate', 0)
kates_grade

0

In [70]:
no_ones_grade = grades.get('No One')
print(no_ones_grade)

None


In [71]:
grades['Tim'] = 99
grades['Kate'] = 100
num_students = len(grades)
num_students

3

In [72]:
grades

{'Joel': 80, 'Tim': 99, 'Kate': 100}

In [75]:
tweet = {
    "user" : "joelgrus",
    "text" : "Data Science is Awesome",
    "retweet_count" : 100,
    "hashtags" : ["#data", "#science", "#datascience", "#awesome", "#yolo"]
}

In [76]:
tweet_keys = tweet.keys()
tweet_values = tweet.values()
tweet_items = tweet.items()

In [77]:
'user' in tweet_keys

True

In [78]:
'user' in tweet

True

In [79]:
'joelgrus' in tweet_values

True

Dictionary keys must be “hashable”; in particular, you cannot use lists as keys. If you need a multipart key, you should probably use a tuple or figure out a way to turn the key into a string
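
A small sketch of the hashability rule, using a hypothetical grid of cells keyed by coordinates (the names and values are invented for illustration):

```python
# Hypothetical example: grid coordinates as multipart keys.
cell_values = {}
cell_values[(0, 1)] = "treasure"   # tuples are hashable, so this works

try:
    cell_values[[0, 1]] = "trap"   # lists are mutable and unhashable
except TypeError:
    list_key_rejected = True
else:
    list_key_rejected = False

# Alternative: flatten the multipart key into a string.
cell_values["0,2"] = "empty"

print(list_key_rejected)  # True
```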

### defaultdict

Imagine that you’re trying to count the words in a document. An obvious approach is to create a dictionary in which the keys are words and the values are counts. As you check each word, you can increment its count if it’s already in the dictionary and add it to the dictionary if it’s not

In [None]:
word_counts = {}

for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

In [None]:
word_counts = {}

for word in document:
    try:
        word_counts[word] += 1
    except KeyError:
        word_counts[word] = 1

In [None]:
word_counts = {}

for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1

A defaultdict is like a regular dictionary, except that when you try to look up a key it doesn’t contain, it first adds a value for it using a zero-argument function you provided when you created it. In order to use defaultdicts, you have to import them from collections:

In [None]:
from collections import defaultdict

word_counts = defaultdict(int) # int() produces 0
for word in document:
    word_counts[word] += 1

In [83]:
from collections import defaultdict

dd_list = defaultdict(list)
dd_list[2].append(1)
dd_list

defaultdict(list, {2: [1]})

In [87]:
dd_dict = defaultdict(dict)
dd_dict['Joel']['City'] = 'Seattle'
dd_dict

defaultdict(dict, {'Joel': {'City': 'Seattle'}})

In [89]:
dd_pair = defaultdict(lambda: [0,0])
dd_pair

defaultdict(<function __main__.<lambda>()>, {})

In [90]:
dd_pair[2][1] = 1
dd_pair

defaultdict(<function __main__.<lambda>()>, {2: [0, 1]})

## 2.12 Counters

A Counter turns a sequence of values into a defaultdict(int)-like object mapping keys to counts

In [91]:
from collections import Counter
c = Counter([0,1,2,0])
c

Counter({0: 2, 1: 1, 2: 1})

In [None]:
# document is a list of words
word_counts = Counter(document)

In [None]:
# print the 10 most common words and their counts
for word, count in word_counts.most_common(10):
    print(word, count)
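
The cells above assume a `document` list that is not defined in these notes. A self-contained sketch, with a made-up list of words standing in for it, might look like this:

```python
from collections import Counter

# Hypothetical stand-in for the `document` list of words used above.
document = ["data", "science", "is", "data", "about", "data", "science"]
word_counts = Counter(document)

# most_common(n) returns (word, count) pairs, highest count first.
top_two = word_counts.most_common(2)
print(top_two)  # [('data', 3), ('science', 2)]

# Like defaultdict(int), a Counter reports 0 for keys it has never seen.
print(word_counts["statistics"])  # 0
```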

## 2.13 Sets

A set represents a collection of distinct elements. You can define a set by listing its elements between curly braces

In [93]:
primes_below_10 = {2, 3, 5, 7}
primes_below_10

{2, 3, 5, 7}

In [94]:
s = set()
s.add(1)
s.add(2)
s.add(2)
s

{1, 2}

In [95]:
x = len(s)
x

2

In [96]:
y = 2 in s
y

True

In [97]:
z = 3 in s
z

False

We’ll use sets for two main reasons. The first is that in is a very fast operation on sets. If we have a large collection of items that we want to use for a membership test, a set is more appropriate than a list

In [None]:
stopwords_list = ["a", "an", "at"] + hundreds_of_other_words + ["yet", "you"]
"zip" in stopwords_list # False, but have to check every element

stopwords_set = set(stopwords_list)
"zip" in stopwords_set # very fast to check

The second reason is to find the distinct items in a collection:

In [98]:
item_list = [1, 2, 3, 1, 2, 3]
num_items = len(item_list)
num_items

6

In [99]:
item_set = set(item_list)
item_set

{1, 2, 3}

In [100]:
num_distinct_items = len(item_set)
num_distinct_items

3

In [101]:
distinct_item_list = list(item_set)
distinct_item_list

[1, 2, 3]

## 2.14 Control flow

In [106]:
if 1 > 2:
    message = 'if only 1 were greater than two ...'
elif 1 > 3:
    message = 'elif stands for else if'
else: 
    message = 'when all else fails use else'

In [107]:
# Ternary if-then-else on one line 
parity = 'even' if x % 2 == 0 else 'odd'

In [105]:
x = 0
while x < 10:
    print(f'{x} is less than 10')
    x += 1

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


In [104]:
for x in range(10):
    print(f'{x} is less than 10')

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


In [103]:
for x in range(10):
    if x == 3:
        continue # go immediately to the next iteration
    if x == 5:
        break # quit the loop entirely 
    print(x)

0
1
2
4


## 2.15 Truthiness

In [108]:
one_is_less_than_two = 1 < 2
one_is_less_than_two

True

In [109]:
true_equals_false = True == False
true_equals_false

False

In [111]:
x = None
assert x == None  # works, but is not the Pythonic way to check for None
assert x is None  # the Pythonic way to check for None

In [113]:
x is None

True

In [None]:
s = some_function_that_returns_a_string()
if s:
    first_char = s[0]
else:
    first_char = ""

In [None]:
first_char = s and s[0]

In [None]:
safe_x = x or 0

In [None]:
safe_x = x if x is not None else 0

In [114]:
all([True, 1, {3}])

True

In [115]:
all([True, 1, {}])

False

In [116]:
any([True, 1, {}])

True

In [117]:
all([])

True

In [119]:
any([])

False

## 2.16 Sorting

In [120]:
x = [4,1,2,3]
x.sort()
x

[1, 2, 3, 4]

In [121]:
x = [4,1,2,3]
y = sorted(x)
y, x

([1, 2, 3, 4], [4, 1, 2, 3])

In [122]:
# sort the list by absolute value from largest to smallest 
x = sorted([-4, 1, -2, 3], key = abs, reverse = True)
x

[-4, 3, -2, 1]

In [123]:
# sort the words and counts from highest count to lowest 
wc = sorted(word_counts.items(),
            key = lambda word_and_count: word_and_count[1],
            reverse = True)
wc

[]
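
The empty result above suggests that `word_counts` held no entries when that cell ran. As a sketch with hypothetical counts standing in for a populated `word_counts`, the same sort behaves like this:

```python
# Hypothetical counts, standing in for a populated word_counts.
word_counts = {"data": 3, "science": 2, "is": 1, "about": 5}

# Sort the (word, count) pairs by the count, largest first.
wc = sorted(word_counts.items(),
            key=lambda word_and_count: word_and_count[1],
            reverse=True)
print(wc)  # [('about', 5), ('data', 3), ('science', 2), ('is', 1)]
```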

## 2.17 List comprehensions

Transform a list into another list by choosing only certain elements

In [124]:
even_numbers = [x for x in range(5) if x % 2 == 0]
even_numbers

[0, 2, 4]

In [125]:
squares = [x * x for x in range(5)]
squares

[0, 1, 4, 9, 16]

In [126]:
even_squares = [x * x for x in even_numbers]
even_squares

[0, 4, 16]

In [127]:
square_dict = {x: x * x for x in range(5)}
square_dict

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

In [128]:
square_set = {x * x for x in [1, -1]}
square_set

{1}

In [132]:
zeros = [0 for _ in even_numbers]
zeros

[0, 0, 0]

In [133]:
zeros = [0 for x in even_numbers]
zeros

[0, 0, 0]

In [135]:
pairs = [(x, y)
         for x in range(10)
         for y in range(10)]
print(pairs)

[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (5, 7), (5, 8), (5, 9), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6), (6, 7), (6, 8), (6, 9), (7, 0), (7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6), (7, 7), (7, 8), (7, 9), (8, 0), (8, 1), (8, 2), (8, 3), (8, 4), (8, 5), (8, 6), (8, 7), (8, 8), (8, 9), (9, 0), (9, 1), (9, 2), (9, 3), (9, 4), (9, 5), (9, 6), (9, 7), (9, 8), (9, 9)]


In [136]:
increasing_pairs = [(x, y)
                    for x in range(10)
                    for y in range(x+1, 10)]
print(increasing_pairs)

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]


## 2.18 Automated testing and assert

An assert statement causes the code to raise an AssertionError if the specified condition is not truthy

In [137]:
assert 1 + 1 == 2

In [138]:
assert 1 + 1 == 2, "1 + 1 should equal 2 but didn't"

In [142]:
# Use assert to check that the functions we wrote do what we expect them to do
def smallest_item(xs):
    return min(xs)

assert smallest_item([10, 20, 5, 40]) == 5
assert smallest_item([1, 0, -1, 2]) == -1

In [None]:
def smallest_item(xs):
    assert xs, 'empty list has no smallest item'
    return min(xs)
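
To see the input check in action, a short sketch (the `message` variable is only there to capture the result) can call the function with an empty list and catch the resulting AssertionError:

```python
def smallest_item(xs):
    assert xs, 'empty list has no smallest item'
    return min(xs)

try:
    smallest_item([])
except AssertionError as error:
    # The assert's message becomes the exception's text.
    message = str(error)
else:
    message = None

print(message)  # empty list has no smallest item
```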

## 2.19 Object-oriented programming

Python allows you to define classes that encapsulate data and the functions that operate on them

Example: Here we’ll construct a class representing a “counting clicker,” the sort that is used at the door to track how many people have shown up for the “advanced topics in data science” meetup. It maintains a count, can be clicked to increment the count, allows you to read_count, and can be reset back to zero. (In real life one of these rolls over from 9999 to 0000, but we won’t bother with that.)

To define a class, you use the class keyword and a PascalCase name

A class contains zero or more member functions. By convention, each takes a first parameter, self, that refers to the particular class instance

Normally, a class has a constructor, named __init__. It takes whatever parameters you need to construct an instance of your class and does whatever setup you need

In [None]:
class CountingClicker:
    '''A class can/should have a docstring, just like a function'''
    def __init__(self, count = 0):
        self.count = count
    def __repr__(self):
        return f"CountingClicker(count={self.count})"
    def click(self, num_times = 1):
        '''Click the clicker some number of times'''
        self.count += num_times
    def read(self):
        return self.count
    def reset(self):
        self.count = 0

Notice that the __init__ method name starts and ends with double underscores. These “magic” methods are sometimes called “dunder” methods (double-UNDERscore) and represent “special” behaviors

In [None]:
clicker1 = CountingClicker()
clicker2 = CountingClicker(100)
clicker3 = CountingClicker(count = 100)

Class methods whose names start with an underscore are, by convention, considered “private,” and users of the class are not supposed to call them directly. However, Python will not stop users from calling them
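
As an illustration of the underscore convention, here is a sketch built around a hypothetical `Thermometer` class (not part of the original notes):

```python
class Thermometer:
    '''Hypothetical class, used only to illustrate the underscore convention'''
    def __init__(self, celsius = 20.0):
        self.celsius = celsius
    def _to_fahrenheit(self):
        '''"Private" helper: the leading underscore is only a signal'''
        return self.celsius * 9 / 5 + 32
    def describe(self):
        '''Public method that relies on the private helper'''
        return f"{self._to_fahrenheit():.1f} F"

thermometer = Thermometer(100.0)
public_result = thermometer.describe()
# Nothing prevents an outside caller from using the "private" method.
private_result = thermometer._to_fahrenheit()

print(public_result, private_result)  # 212.0 F 212.0
```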

Another such method is __repr__, which produces the string representation of a class instance

And finally we need to implement the public API of our class

Having defined it, let’s use assert to write some test cases for our clicker:

In [None]:
clicker = CountingClicker()
assert clicker.read() == 0, "clicker should start with count 0"
clicker.click()
clicker.click()
assert clicker.read() == 2, "after two clicks, clicker should have count 2"
clicker.reset()
assert clicker.read() == 0, "after reset, clicker should be back to 0"

We’ll also occasionally create subclasses that inherit some of their functionality from a parent class. For example, we could create a non-resettable clicker by using CountingClicker as the base class and overriding the reset method to do nothing

In [None]:
# A subclass inherits all the behavior of its parent class
class NoResetClicker(CountingClicker):
    # This class has all the same methods as CountingClicker
    # Except that it has a reset method that does nothing 
    def reset(self):
        pass
    
clicker2 = NoResetClicker()
assert clicker2.read() == 0
clicker2.click()
assert clicker2.read() == 1
clicker2.reset()
assert clicker2.read() == 1, "reset shouldn't do anything"

## 2.20 Iterables and generators

In [147]:
def generate_range(n):
    i = 0
    while i < n:
        yield i # every call to yield produces a value of the generator 
        i += 1
# The following loop will consume the yielded values one at a time until none are left 
for i in generate_range(10):
    print(f"i:{i}")
# In fact, range is itself lazy, so there is no point in doing this 

i:0
i:1
i:2
i:3
i:4
i:5
i:6
i:7
i:8
i:9


In [148]:
# Create an infinite sequence
def natural_numbers():
    '''returns 1, 2, 3, ...'''
    n = 1
    while True:
        yield n
        n += 1

The flip side of laziness is that you can only iterate through a generator once. If you need to iterate through something multiple times, you’ll need to either recreate the generator each time or use a list. If generating the values is expensive, that might be a good reason to use a list instead.
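
A brief sketch of this one-pass behavior, reusing the `generate_range` generator defined earlier (repeated here so the example is self-contained):

```python
def generate_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

numbers = generate_range(3)
first_pass = list(numbers)    # consumes every value
second_pass = list(numbers)   # the generator is now exhausted

fresh_pass = list(generate_range(3))   # recreating it starts over

print(first_pass, second_pass, fresh_pass)  # [0, 1, 2] [] [0, 1, 2]
```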

A second way to create generators is by using for comprehensions wrapped in parentheses

In [150]:
evens_below_20 = (i for i in generate_range(20) if i % 2 == 0)
list(evens_below_20)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Such a “generator comprehension” doesn’t do any work until you iterate over it (using for or next). We can use this to build up elaborate data-processing pipelines

In [153]:
# None of these computations does anything until we iterate 
data = natural_numbers()
evens = (x for x in data if x % 2 == 0)
even_squares = (x ** 2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)

Not infrequently, when we’re iterating over a list or a generator we’ll want not just the values but also their indices. For this common case Python provides an enumerate function, which turns values into pairs (index, value)

In [155]:
names = ['Alice', 'Bob', 'Charlie', 'Debbie']

for i in range(len(names)):
    print(f'name {i} is {names[i]}')

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie


In [156]:
i = 0
for name in names:
    print(f'name {i} is {names[i]}')
    i += 1

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie


In [157]:
for i, name in enumerate(names):
    print(f'name {i} is {name}')

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie


## 2.21 Randomness

In [158]:
import random 

random.seed(10)

four_uniform_randoms = [random.random() for _ in range(4)]
four_uniform_randoms

[0.5714025946899135,
 0.4288890546751146,
 0.5780913011344704,
 0.20609823213950174]

In [159]:
random.seed(10)
print(random.random())

0.5714025946899135


In [162]:
random.randrange(10)

9

In [168]:
random.randrange(3, 6)

5

In [169]:
up_to_ten = [1,2,3,4,5,6,7,8,9,10]
random.shuffle(up_to_ten)
print(up_to_ten)

[4, 5, 6, 7, 2, 9, 10, 8, 1, 3]


In [170]:
my_best_friend = random.choice(['Alice', 'Bob', 'Charlie'])
my_best_friend

'Alice'

In [172]:
lottery_numbers = range(60)
winning_numbers = random.sample(lottery_numbers, 6)
winning_numbers

[43, 16, 29, 11, 59, 19]

In [180]:
four_with_replacement = [random.choice(range(10)) for _ in range(4)]
print(four_with_replacement)

[6, 3, 0, 0]


## 2.22 Regular expressions

In [182]:
import re

re_example = [
    # All of these are True because:
    not re.match('a', 'cat'),               # 'cat' doesn't start with 'a'
    re.search('a', 'cat'),                  # 'cat' has an 'a' in it
    not re.search('c', 'dog'),              # 'dog' doesn't have a 'c' in it
    3 == len(re.split('[ab]', 'carbs')),    # split on a or b to ['c', 'r', 's']
    'R-D-' == re.sub('[0-9]', '-', 'R2D2')  # replace digits with dashes
]

assert all(re_example), "all the regex examples should be True"

One important thing to note is that re.match checks whether the beginning of a string matches a regular expression, while re.search checks whether any part of a string matches a regular expression.
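
The difference can be seen by inspecting what each call returns; a minimal sketch, with variable names chosen for illustration:

```python
import re

# re.match only succeeds if the pattern matches at position 0.
match_at_start = re.match('a', 'cat')      # None: 'cat' starts with 'c'
# re.search scans the whole string for the first match.
found_anywhere = re.search('a', 'cat')     # match object for the 'a' at index 1

print(match_at_start, found_anywhere.start())  # None 1
```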

## 2.23 Functional programming

Avoid partial, map, reduce, and filter. List comprehensions, for loops, and other more Pythonic constructs usually express the same logic more readably
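
As a rough side-by-side sketch (the list `xs` and the result names are invented for illustration), each functional call has a comprehension or loop equivalent:

```python
from functools import reduce

xs = [1, 2, 3, 4]

# Functional style, which the text suggests avoiding
doubled_map = list(map(lambda x: 2 * x, xs))
evens_filter = list(filter(lambda x: x % 2 == 0, xs))
product_reduce = reduce(lambda a, b: a * b, xs)

# Comprehension and loop equivalents
doubled = [2 * x for x in xs]
evens = [x for x in xs if x % 2 == 0]
product = 1
for x in xs:
    product *= x

print(doubled, evens, product)  # [2, 4, 6, 8] [2, 4] 24
```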

## 2.24 Zip and argument unpacking

The zip function transforms multiple iterables into a single iterable of tuples of corresponding elements

In [183]:
list1 = ['a','b','c']
list2 = [1,2,3]
[pair for pair in zip(list1, list2)]

[('a', 1), ('b', 2), ('c', 3)]

In [188]:
list(zip(list1, list2))

[('a', 1), ('b', 2), ('c', 3)]

If the lists are of different lengths, zip stops as soon as the shortest list ends
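
A short sketch of this truncation, using two made-up lists of unequal length:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2]

# zip truncates to its shortest input, so 'c' is silently dropped.
pairs = list(zip(letters, numbers))
print(pairs)  # [('a', 1), ('b', 2)]
```

Because the extra element is dropped without any error, it is worth checking lengths first when a mismatch would indicate a bug.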

To “unzip” a list of tuples, use *. The asterisk performs argument unpacking, which uses the elements of pairs as individual arguments to zip

In [184]:
pairs = [('a',1), ('b',2), ('c',3)]
letters, numbers = zip(*pairs)

In [185]:
letters

('a', 'b', 'c')

In [186]:
numbers

(1, 2, 3)

In [189]:
def add(a, b): return a + b
add(1, 2)

3

Use argument unpacking with any function

In [190]:
try:
    add([1,2])
except TypeError:
    print('add expects two inputs')

add expects two inputs


In [191]:
add(*[1, 2])

3

## 2.25 args and kwargs

Example: Let’s say we want to create a higher-order function that takes as input some function f and returns a new function that for any input returns twice the value of f:

In [196]:
def doubler(f):
    # Here we define a new function that keeps a reference to f
    def g(x):
        return 2 * f(x)
    # And return that new function 
    return g

In [197]:
def f1(x):
    return x + 1
g = doubler(f1)
assert g(3) == 8, "(3+1)*2 should equal 8"
assert g(-1) == 0, "(-1+1)*2 should equal 0"

In [198]:
def f2(x, y):
    return x + y
g = doubler(f2)
try: 
    g(1, 2)
except TypeError:
    print("as defined, g only takes one argument")

as defined, g only takes one argument


We need a way to specify a function that takes arbitrary arguments. We can do this with argument unpacking and a little bit of magic

In [199]:
def magic(*args, **kwargs):
    print('unnamed args:', args)
    print('keyword args:', kwargs)

In [200]:
magic(1, 2, key = 'word', key2 = 'word2')

unnamed args: (1, 2)
keyword args: {'key': 'word', 'key2': 'word2'}


That is, when we define a function like this, args is a tuple of its unnamed arguments and kwargs is a dict of its named arguments. It works the other way too, if you want to use a list (or tuple) and dict to supply arguments to a function:

In [201]:
def other_way_magic(x, y, z):
    return x + y + z

x_y_list = [1, 2]
z_dict = {'z': 3}
assert other_way_magic(*x_y_list, **z_dict) == 6, "1 + 2 + 3 should be 6"

In [202]:
other_way_magic(*x_y_list, **z_dict)

6

We can use this to produce higher-order functions whose inputs can accept arbitrary arguments

In [206]:
def doubler_correct(f):
    '''works no matter what kind of inputs f expects'''
    def g(*args, **kwargs):
        '''whatever arguments g is supplied, pass them through to f'''
        return 2 * f(*args, **kwargs)
    return g
g = doubler_correct(f2)
assert g(1, 2) == 6, "doubler should work now"

In [207]:
g(1, 2)

6

## 2.26 Type annotations

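Python is a dynamically typed language, which means it generally does not care about the types of the objects we use, as long as we use them in valid ways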

In [210]:
def add(a, b):
    return a + b

assert add(10, 5) == 15, "+ is valid for numbers"
assert add([1,2],[3]) == [1,2,3], "+ is valid for lists"
assert add('hi ', 'there') == 'hi there', "+ is valid for strings"

In [211]:
try:
    add(10, 'five')
except TypeError:
    print('cannot add an int to a string')

cannot add an int to a string


In [212]:
def add(a: int, b: int) -> int:
    return a + b
add(10, 5)

15

In [214]:
add('hi ', 'there')

'hi there'

1. Types are an important form of documentation
2. There are external tools (for example, mypy) that will read the code, inspect the type annotations, and let you know about type errors before you ever run the code
3. Having to think about the types in your code forces you to design cleaner functions and interfaces
4. Using types allows your editor to help you with things like autocomplete and type errors

In [None]:
from typing import Union

def secretly_ugly_function(value, operation): ...

def ugly_function(value: int,
                  operation: Union[str, int, float, bool]) -> int: ... 

### How to write type annotations

In [215]:
def total(xs: list) -> float:
    return sum(xs)

In [219]:
from typing import List

def total(xs: List[float]) -> float:
    return sum(xs)

In [228]:
values = []
best_so_far = None

In [229]:
from typing import Optional 

values: List[int] = []
best_so_far: Optional[float] = None # allowed to be either a float or None

In [231]:
type(values)

list

In [232]:
type(best_so_far)

NoneType

In [None]:
# The type annotations in this snippet are all unnecessary 
from typing import Dict, Iterable, Tuple 

# keys are strings, values are ints 
counts: Dict[str, int] = {'data':1, 'science':2}

# lists and generators are both iterable 
lazy = True  # hypothetical flag, defined here only so the snippet runs
if lazy:
    evens: Iterable[int] = (x for x in range(10) if x % 2 == 0)
else:
    evens = [0,2,4,6,8]

# tuples specify a type for each element 
triple: Tuple[int, float, int] = (10, 2.3, 5)

In [234]:
from typing import Callable 

# The type hint says that repeater is a function that takes two arguments 
# a string and an int, and returns a string 

def twice(repeater: Callable[[str, int], str], s: str) -> str:
    return repeater(s, 2)

def comma_repeater(s: str, n: int) -> str:
    n_copies = [s for _ in range(n)]
    return ', '.join(n_copies)

assert twice(comma_repeater, 'type hints') == 'type hints, type hints'

In [235]:
Number = int
Numbers = List[Number]

def total(xs: Numbers) -> Number:
    return sum(xs)

## 2.27 Welcome to DataSciencester!

## 2.28 For further exploration