## Zip up

### How zip works

In a simple `for` loop we have an iterable `it`:

In [45]:
it = range(5)
for element in it:
    print(element)

0
1
2
3
4


An "iterable" is something that can be traversed linearly, like a list or a string.

Sometimes you will have two iterables with related information and you need to loop over both of them to do something. Check this example:

In [46]:
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith", "Johnson"]

for i in range(len(firsts)):
    print(f"{firsts[i]} {lasts[i]}")

John Doe
Jane Smith
Jack Johnson


This is what `zip` is for: use it to pair up iterables that you want to traverse at the same time.

In [47]:
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith", "Johnson"]

for first, last in zip(firsts, lasts):
    print(f"{first} {last}")

John Doe
Jane Smith
Jack Johnson


We are doing an unpacking assignment because `zip` actually produces tuples.

In [48]:
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith", "Johnson"]

for z in zip(firsts, lasts):
    print(z)

('John', 'Doe')
('Jane', 'Smith')
('Jack', 'Johnson')


### Zip is lazy

`zip` does not create its tuples immediately. `zip` is lazy: it generates each tuple on the fly as you iterate over it, for example in a `for` loop or when you convert it to a list.

In [49]:
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith", "Johnson", "Davis"]
z = zip(firsts, lasts)
print(z)
print(list(z))

<zip object at 0x10a5e47c0>
[('John', 'Doe'), ('Jane', 'Smith'), ('Jack', 'Johnson')]


Because `zip` is lazy, a zip object is not that similar to a list. For example, you cannot ask for the length of a zip object.

In [50]:
len(z)

TypeError: object of type 'zip' has no len()
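Another consequence of laziness, sketched below with the same example lists: a zip object can only be traversed once. After `list(z)` has consumed it, a second pass finds nothing left. The names `first_pass` and `second_pass` are only illustrative.

```python
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith", "Johnson"]

z = zip(firsts, lasts)
first_pass = list(z)   # consumes the zip object
second_pass = list(z)  # nothing left to produce

print(first_pass)   # [('John', 'Doe'), ('Jane', 'Smith'), ('Jack', 'Johnson')]
print(second_pass)  # []
```

If you need the pairs more than once, store the result of `list(zip(...))` or call `zip` again.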

### Three is a crowd

`zip` can take three or more iterables. Each tuple it produces then has one element per iterable, and `zip` still stops at the end of the shortest iterable.

In [11]:
firsts = ["John", "Jane", "Jack"]
middles = ["Z.", "A.", "C."]
lasts = ["Doe", "Smith", "Johnson"]

for z in zip(firsts, middles, lasts):
    print(z)

('John', 'Z.', 'Doe')
('Jane', 'A.', 'Smith')
('Jack', 'C.', 'Johnson')


### Mismatched lengths

If `zip`'s arguments have different lengths, it will stop as soon as it hits the end of the shortest iterable.

In [None]:
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith"]

for z in zip(firsts, lasts):
    print(z)

Starting with Python 3.10, `zip` accepts a keyword argument `strict` that makes it raise an error if the iterables have different lengths.

In [None]:
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith", "Johnson", "Davis"]

for z in zip(firsts, lasts, strict=True):
    print(z)

With `strict=True`, `zip` only raises the error when it actually hits the length mismatch, not before it starts iterating; this is because the arguments to `zip` may be lazy iterables whose lengths are not known in advance.

In general, `zip` is used with iterables that are expected to have the same length. If that is the case, it is a good idea to always set `strict=True` to catch bugs in your code.

### Create a dictionary with zip

You can create dictionaries by feeding key-value pairs to the `dict` function, which means `zip` can be used to create a dictionary from two lists.

In [15]:
firsts = ["John", "Jane", "Jack"]
lasts = ["Doe", "Smith", "Johnson"]

dictionary = dict(zip(firsts, lasts))
print(dictionary)

{'John': 'Doe', 'Jane': 'Smith', 'Jack': 'Johnson'}


## Enumerate me

### How enumerate works

Python newcomers are usually exposed to this type of `for` loop very early.

In [16]:
for i in range(3):
    print(i)

0
1
2


This leads them to "learning" this anti-pattern of `for` loops to go over a list:

In [18]:
words = ["apple", "banana", "cherry"]
for i in range(len(words)):
    print(f"'{words[i]}' has {len(words[i])} characters.")

'apple' has 5 characters.
'banana' has 6 characters.
'cherry' has 6 characters.


The pythonic way of writing such a loop is iterating directly over the list:

In [19]:
words = ["apple", "banana", "cherry"]
for word in words:
    print(f"'{word}' has {len(word)} characters.")

'apple' has 5 characters.
'banana' has 6 characters.
'cherry' has 6 characters.


The final step in this indices-versus-elements story comes when you need to know the index of each element as well. For this, `enumerate` is your friend.

In [21]:
words = ["apple", "banana", "cherry"]
for i, word in enumerate(words):
    print(f"Word #{i}: '{word}' has {len(word)} characters.")

Word #0: 'apple' has 5 characters.
Word #1: 'banana' has 6 characters.
Word #2: 'cherry' has 6 characters.


### Optional `start` argument

The `enumerate` function can also accept an optional `start` argument. This argument specifies the starting index for the enumeration.

In [22]:
words = ["apple", "banana", "cherry"]
for i, word in enumerate(words, start=1):
    print(f"Word #{i}: '{word}' has {len(word)} characters.")

Word #1: 'apple' has 5 characters.
Word #2: 'banana' has 6 characters.
Word #3: 'cherry' has 6 characters.


This optional `start` argument is useful when you want to start the enumeration from a specific index.

By the way, the argument has to be an integer but can be negative.

In [23]:
for i, v in enumerate("abc", start=-4000):
    print(f"Index: {i}, Value: {v}")

Index: -4000, Value: a
Index: -3999, Value: b
Index: -3998, Value: c


### Unpacking when iterating

The `enumerate` function returns a lazy iterator, which means the items you iterate only become available as you need them. This can be useful when you want to process large amounts of data without consuming too much memory.

The items that `enumerate` returns are tuples, where the first element is the index and the second element is the value.

In [24]:
for pair in enumerate("abc"):  # avoid shadowing the built-in `tuple`
    print(pair)

(0, 'a')
(1, 'b')
(2, 'c')


### Deep unpacking

Things can get more interesting when you use `enumerate`, for example, on a `zip`:

In [25]:
pages = [5, 17, 32, 50]
for i, (start, end) in enumerate(zip(pages, pages[1:]), start=1):
    print(f"Chapter {i}: {end - start} pages long.")

Chapter 1: 12 pages long.
Chapter 2: 15 pages long.
Chapter 3: 18 pages long.


This snippet takes a list of pages where chapters of a book start and prints the length of each chapter. Notice how `enumerate` returns tuples with indices and values, but those values are extracted from a `zip`, which itself returns tuples.

We use deep unpacking to access all the values directly.

## Chaining comparison operators

### Chaining of comparison operators

One excellent feature of Python is its ability to chain comparison operators. This can make your code more readable and easier to understand. Check this snippet, which reads naturally:

In [26]:
a = 1
b = 2
c = 3
if a < b < c:
    print("The numbers are in ascending order.")

The numbers are in ascending order.


When Python sees two comparisons in a row, like `a < b < c`, it behaves as if you wrote `a < b and b < c`, except that `b` is only evaluated once (which is relevant if `b` is an expression like a function call).

Another example usage is for when you want to make sure that three values are the same:

In [29]:
a = b = 1
c = 2

if a == b == c:
    print("The numbers are the same.")
else:
    print("The numbers are different.")

The numbers are different.


You can chain any arbitrary number of comparison operators. For example `a < b < c < d` would check if `a < b`, `b < c`, and `c < d`.
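A small sketch confirming that a longer chain agrees with its unfolded form; the values are arbitrary and chosen so that the last comparison fails.

```python
a, b, c, d = 1, 2, 3, 3

chained = a < b < c < d
unfolded = a < b and b < c and c < d

print(chained)   # False, because c < d is 3 < 3
print(unfolded)  # False
```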

### Pitfalls

#### Non-transitive operators

We can use `a == b == c` to check if three variables are equal, but the same trick won't work for non-transitive operators like `!=`.

In [30]:
a = c = 1
b = 2

if a != b != c:
    print("a, b, and c all different:", a, b, c)

a, b, and c all different: 1 2 1


The problem is that the check is `a != b and b != c`, which checks that `b` is different from both `a` and `c`, but says nothing about whether `a` is different from `c`.

This is because `!=` is a non-transitive operator, i.e., knowing how `a` relates to `b` and knowing how `b` relates to `c` doesn't tell you anything about how `a` relates to `c`.
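One possible way to really check that all values are different, sketched under the assumption that the values are hashable, is to compare the size of a set built from them with the number of values.

```python
a = c = 1
b = 2

# Chained != only compares neighbours, so it misses that a == c.
chained = a != b != c
# A set keeps one copy of each value, so three distinct values give size 3.
all_different = len({a, b, c}) == 3

print(chained)        # True
print(all_different)  # False
```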

#### Non-constant expressions or side-effects

Chaining comparisons like `a < b < c` evaluates `b` only once.

If `b` is an expression with side effects, or one whose value can change between evaluations, then `a < b < c` and `a < b and b < c` are not equivalent. In this example, the element in the middle gets evaluated only once:

In [31]:
def f():
    print("hey")
    return 3

if 1 < f() < 5:
    print("done")

hey
done


To corroborate, the unfolded version evaluates `f()` twice:

In [None]:
if 1 < f() and f() < 5:
    print("done")

This snippet shows that the unfolded version of an expression like `1 < f() < 0` can actually evaluate to `True`, even though the chained version never can:

In [35]:
l = [-2, 2]

def f():
    global l
    l = l[::-1]  # Reverse the list
    return l[0]

# evaluated once f() = 2
# if 1 < f() < 0:
#     print("ehh")  # Never gets printed

# evaluated twice: first time f() = 2, second time f() = -2
if 1 < f() and f() < 0:
    print("ehh 2")  # gets printed

ehh 2


#### Ugly chains

This feature looks neat, but chains whose operators are not aligned look very ugly. These chains look good:

In [None]:
a == b == c
a < b <= c
a <= b < c

but these chains look really ugly:

In [None]:
a < b > c   # it's better to use b > max(a, c), it's more readable and easier to understand
a <= b > c
a < b >= c

Now there are some other chains that are just confusing:

In [None]:
lst = []
a < b is True
a == b in lst
a in lst is True

In Python, `is`, `is not`, `in` and `not in` are comparison operators, so you can also chain them. But this creates weird situations like this one:

In [1]:
a = 3
lst = [3, 5]
if a in lst == True:
    print("is True")
else:
    print("is False")

is False


Here is a breakdown of what this does:
- `a in lst == True` is equivalent to `a in lst and lst == True`
- `a in lst` is `True`, but
- `lst == True` is `False`, so
- `a in lst == True` unfolds to `True and False`, which is `False`
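The breakdown above can be checked directly. This sketch compares the chained expression with a parenthesised version and with the plain membership test, which is what you usually want; the variable names are only illustrative.

```python
a = 3
lst = [3, 5]

chained = a in lst == True          # unfolds to (a in lst) and (lst == True)
parenthesised = (a in lst) == True  # parentheses break the chain
plain = a in lst                    # comparing with True was never needed

print(chained)        # False
print(parenthesised)  # True
print(plain)          # True
```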

## Truthy, Falsy, and bool

### "Truthy" and "Falsy"

Any object can be tested for truth value, for use in an `if` or `while` condition or as an operand of the Boolean operations `and`, `or`, and `not`.

In [5]:
5 > 3

True

The next step is using an object that is not a Boolean value, for example:

In [6]:
lst = [1, 2, 3]
if lst:
    print(lst)

[1, 2, 3]


How can we know if an object is truthy or falsy? The answer is by using the built-in `bool` function.

In [7]:
bool(lst)

True

A value of a given type is Falsy when it is "empty" or "without any useful value". Examples of Falsy values are: the empty list, the empty string, the empty tuple, the empty set, the empty dictionary, the number 0, the Boolean value `False`, and `None`.
- by default any object is Truthy
- an object is Falsy if calling `len` on it returns `0`

### The `__bool__` dunder method

An object also has a Falsy value if it defines a `__bool__` method that returns `False`.

`__bool__` is a dunder method that you can use to tell your objects if they are considered "truthy" or "falsy", by implementing it in your class.

In [1]:
class A:
    def __bool__(self):
        return False
    
a = A()
if a:
    print("Go Away!")

When an arbitrary Python object needs to be tested for a truth value, Python first tries to use its `__bool__` method, which is what calling `bool` on it does. If the object does not implement `__bool__`, Python tries to call `len` on it and treats a length of `0` as Falsy. Finally, if that also fails, Python defaults to giving a Truthy value to the object.
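The lookup order described above can be sketched with three small classes. All three class names are hypothetical and exist only for this illustration.

```python
class EmptyBox:
    """Only defines __len__, so it is falsy when its length is 0."""
    def __len__(self):
        return 0

class Plain:
    """Defines neither __bool__ nor __len__, so it is always truthy."""

class Both:
    """Defines both methods: __bool__ takes priority over __len__."""
    def __bool__(self):
        return True
    def __len__(self):
        return 0

print(bool(EmptyBox()))  # False
print(bool(Plain()))     # True
print(bool(Both()))      # True
```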

### Remarks

#### A note about containers with falsy objects

Things like a list that only contains zeroes or a dictionary composed of zeroes and empty lists are not Falsy, because the containers themselves are no longer empty:

In [13]:
# These are false
print(bool([]))
print(bool({}))
print(bool(0))

# These are true
print(bool([0]))
print(bool({0: []}))

False
False
False
True
True


#### A note about checking for `None`

Imagine someone implemented the following function to return the integer square root of a number, returning `None` for negative inputs (because negative numbers don't have square roots). 

When you use this function, you know it returns `None` if the computation fails, so you might be tempted to use it like this:

In [18]:
import math

def int_square_root(n):
    if n < 0:
        return None
    return math.floor(math.sqrt(n))

n = int(input("Enter a number: "))
int_sqrt = int_square_root(n)

print("debug int_sqrt value: ", int_sqrt)

if not int_sqrt:
    print("Negative numbers do not have an integer square root.")
else:
    print(int_sqrt)

ValueError: invalid literal for int() with base 10: '0.5'

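The problem shows up when you enter `0`: `int_square_root` returns the meaningful value `0`, but that value is still Falsy, so the code wrongly reports that the number has no integer square root. (The `ValueError` shown above is a separate issue: it comes from typing `0.5`, which `int()` cannot parse.)

So when you want to check if a function returned `None`, do not rely on the Truthy/Falsy value. Instead check explicitly if the returned value is `None`.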

In [None]:
returned = ""

# Use
if returned is None:
    pass

if returned is not None:
    pass

# Avoid
if not returned:
    pass

if returned:
    pass
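Applied to the square root example, the explicit check could look like the sketch below. The helper `describe` is hypothetical; it replaces the `input()` call so the snippet runs unattended.

```python
import math

def int_square_root(n):
    if n < 0:
        return None
    return math.floor(math.sqrt(n))

def describe(n):
    # Hypothetical helper: formats the result for a given number.
    result = int_square_root(n)
    if result is None:  # explicit check, so a result of 0 is handled correctly
        return "Negative numbers do not have an integer square root."
    return str(result)

print(describe(0))   # 0
print(describe(-4))  # Negative numbers do not have an integer square root.
print(describe(17))  # 4
```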

## Boolean short-circuiting

### Return values of the `and` and `or` operators

`x or y` returns `x` if `x` is truthy, otherwise it returns `y`. In other words, `x or y` produces the same value as the conditional expression `y if not x else x`, which we can write as `(x or y) == (y if not x else x)`.

In [19]:
if 3 or 5:
    print("Yeah")
else:
    print("Nope")

Yeah


Now look at the program below and see what it prints:

In [20]:
print(3 or 5)

3


A similar thing happens with `and`. `x and y` returns `x` if `x` is falsy, otherwise it returns `y`. In other words, `x and y` produces the same value as the conditional expression `x if not x else y`, which we can write as `(x and y) == (x if not x else y)`.

In [24]:
print(False and True)
print(True and 0)

False
0


### Short-circuiting

This is what short-circuiting is: not evaluating the whole expression (stopping short of evaluating it) if we already have enough information to determine the result.

#### or

##### `False or y`

`or` evaluates to a truthy value if any of its operands is truthy. If the left operand to `or` is `False`, then the `or` operator has to look at its right operand in order to determine the result.

In [1]:
y = 5  # truthy value
if False or y:
    print("Got in!, y = ", y)
else:
    print("Didn't get in...")

Got in!, y =  5


In [2]:
y = []  # falsy value
if False or y:
    print("Got in!, y = ", y)
else:
    print("Didn't get in second...")

Didn't get in second...


##### `True or y`

On the other hand, if the left operand to `or` is `True`, we do not need to take a look at `y` because the result will be `True`.

Let's create a simple function that returns its argument unchanged but has the side effect of printing something on the screen. We can then use it to see which operands Python evaluates when determining the value of `x or y`:

In [4]:
def print_and_return(x):
    print(f"Inside `print_and_return` with x = {x}")
    return x

print(print_and_return(False) or print_and_return(3))
print(print_and_return(True) or print_and_return(3))

Inside `print_and_return` with x = False
Inside `print_and_return` with x = 3
3
Inside `print_and_return` with x = True
True


Notice that, in the second example, `print_and_return` only printed once because Python never reached `print_and_return(3)`.

##### Short-circuiting of `or` expressions 

Now we tie everything together. If the left operand to `or` is `False` or falsy, we know that `or` has to look at its right operand and will, therefore, return the value of its right operand after evaluating it. On the other hand, if the left operand is `True` or truthy, `or` will return the value of the left operand without even evaluating the right operand.

#### and

##### `False and y`

`and` gives `True` if both operands are `True`. Therefore, if we have an expression like

In [10]:
val = False and y
print(val)

False


do we need to know what `y` is in order to figure out what `val` is? No, we do not, because regardless of whether `y` is `True` or `False`, `val` is always `False`:

In [11]:
print(False and True)
print(False and False)

False
False


If we take the `False and y` expression from this example and compare it with the `if` expression we wrote earlier, which was

    `(x and y) == (x if not x else y)`

we see that, in this case, `x` was substituted by `False`, and, therefore, we have
    
    `(False and y) == (False if not False else y)`

Now, the condition inside that `if` expression reads

    `not False`

which we know evaluates to `True`, meaning that the `if` expression never returns `y`. The same holds for any falsy left operand:

In [20]:
print(print_and_return([]) and print_and_return(True))  # [] is falsy
print(print_and_return(0) and print_and_return(True))  # 0 is falsy
print(print_and_return({}) and print_and_return(True))  # {} is falsy
print(print_and_return(0) and print_and_return(0))  # both are falsy, but only the left matters

Inside `print_and_return` with x = []
[]
Inside `print_and_return` with x = 0
0
Inside `print_and_return` with x = {}
{}
Inside `print_and_return` with x = 0
0


##### `True and y`

If we evaluate `True and y`, we figure out that the result of such an expression is always the value of `y`, because the left operand being `True`, or any other truthy value, doesn't give `and` enough information.

##### Short-circuiting of `and` expressions 

To tie everything together: if the left operand to `and` is `False` or falsy, we know the expression returns the value of the left operand regardless of the right operand, and therefore we do not even evaluate the right operand. On the other hand, if the left operand to `and` is `True` or truthy, then `and` will evaluate the right operand and return its value.

### Short-circuiting in plain English

Instead of memorising rules about what sides get evaluated when, just remember that both `and` and `or` will evaluate as many operands as needed to determine the overall Boolean result, and will then return the value of the last side that they evaluated.

An immediate conclusion is that the left operand is always evaluated, as you might imagine.
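The plain-English rule can be checked on chains of three operands. In this sketch the result is always the last operand that had to be evaluated; the variable names are only illustrative.

```python
all_falsy_or = 0 or "" or []       # every operand is falsy: the last one is returned
first_truthy_or = 0 or "a" or []   # stops at the first truthy operand
all_truthy_and = 1 and "a" and 2.5 # every operand is truthy: the last one is returned
first_falsy_and = 1 and 0 and 2.5  # stops at the first falsy operand

print(all_falsy_or)     # []
print(first_truthy_or)  # a
print(all_truthy_and)   # 2.5
print(first_falsy_and)  # 0
```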


### `all` and `any`

The built-in functions `all` and `any` also short-circuit, as they are simple extensions of the behaviours provided by the `and` and `or` operators.

`all` wants to make sure that all the values of its argument are truthy, so as soon as it finds a falsy value, it knows it's game over. The docs say `all` is equivalent to this:

In [22]:
def custom_all(iterable):
    for element in iterable:
        if not element:
            return False
    return True

my_list = [True, True, True, True]
my_list_2 = [True, True, False, True]

print(all(my_list))  # True
print(custom_all(my_list))  # True

print(all(my_list_2))  # False
print(custom_all(my_list_2))  # False

True
True
False
False


Similarly, `any` is going to do its best to look for some value that is truthy. Therefore, as soon as it finds one, `any` knows it has achieved its goal. It behaves like this:

In [23]:
def custom_any(iterable):
    for element in iterable:
        if element:
            return True
    return False

my_list = [False, False, False, False]
my_list_2 = [False, False, True, False]

print(any(my_list))  # False
print(custom_any(my_list))  # False

print(any(my_list_2))  # True
print(custom_any(my_list_2))  # True

False
False
True
True
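To actually see `any` and `all` stop early, the values have to be produced lazily, for example by a generator expression; with a list, every element already exists before the function is called. This is a minimal sketch in which `record` is a hypothetical helper that remembers which values were evaluated.

```python
calls = []

def record(x):
    # Hypothetical helper: notes that x was evaluated, then returns it unchanged.
    calls.append(x)
    return x

found = any(record(x) for x in [0, "", 3, 5])
any_calls = list(calls)

calls.clear()
everything = all(record(x) for x in [1, 0, 2])
all_calls = list(calls)

print(found, any_calls)       # True [0, '', 3]
print(everything, all_calls)  # False [1, 0]
```

In both cases the values after the deciding one (`5` and `2`) are never evaluated.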


### Short-circuiting in chained comparisons

Comparison operators can be chained arbitrarily, and such chains are almost equivalent to a series of comparisons joined with `and`, except that each subexpression is evaluated at most once. Because there is an `and` working in the background, chained comparisons also short-circuit.

In [24]:
# 1 > 2 is False, so there is no need to evaluate print_and_return(3).
print_and_return(1) > print_and_return(2) > print_and_return(3)

Inside `print_and_return` with x = 1
Inside `print_and_return` with x = 2


False

### Examples in code

#### Short-circuit to save time

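Check the cheaper operand first. Here `validate` is `False`, so `and` short-circuits and the expensive regular expression match on the right is never evaluated.

In [None]:
import re
import timeit

s = b"a" * 1000 + b"*"
validate = False

# Pass globals() so the timed statement can see `re`, `s`, and `validate`.
print(timeit.timeit("validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s)", globals=globals()))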

#### Short-circuit to flatten `if` statements

In [None]:
if validate:
    print("Validating...")
    if re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
        print("Valid!")
    else:
        print("Not valid!")

When you only care about the innermost branch, short-circuiting lets you flatten nested `if` statements into a single `if` statement whose conditions are joined with `and`:

In [None]:
if validate and re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
    print("Valid!")

##### Checking preconditions before expression

In [None]:
def set_terminator(self, term):
    if isinstance(term, str) and self.use_encoding:
        term = bytes(term, self.encoding)
    elif isinstance(term, int) and term < 0:
        raise ValueError("Terminator must be a non-negative integer")
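The same pattern in a smaller, self-contained form: the left operand guards the right one, so the right operand only runs when it is safe to evaluate. The function `first_is_positive` is a hypothetical example, not part of the code above.

```python
def first_is_positive(values):
    # bool(values) is checked first; values[0] only runs when the list is non-empty,
    # so an empty list never triggers an IndexError.
    return bool(values) and values[0] > 0

print(first_is_positive([]))       # False
print(first_is_positive([3, -1]))  # True
print(first_is_positive([-1, 3]))  # False
```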


#### Define default values

In [31]:
greet = input("Type your name >> ") or "Guest"
print(f"Hello, {greet}!")

Hello, Chris!


#### Find witnesses in a sequence of items

In [32]:
items = [14, 16, 18, 20, 35, 41, 100]
any_found = False

for item in items:
    any_found = item % 2
    if any_found:
        print(f"Found an odd number: {item}")
        break

Found an odd number: 35


Look at this neat simplified version:

In [33]:
items = [14, 16, 18, 20, 35, 41, 100]
is_odd = lambda x: x % 2

if any(is_odd(witness := item) for item in items):
    print(f"Found an odd number: {witness}")

Found an odd number: 35
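One caveat worth sketching: the walrus operator assigns `witness` on every iteration, so when no item passes the test, `witness` simply holds the last item that was examined. Only trust it inside the `if` branch. The list below is an assumed variation of the example with no odd numbers.

```python
items = [14, 16, 18, 20]  # no odd numbers this time
is_odd = lambda x: x % 2

if any(is_odd(witness := item) for item in items):
    message = f"Found an odd number: {witness}"
else:
    message = "No odd number found."

print(message)  # No odd number found.
print(witness)  # 20, the last item examined, not an odd number
```

With an empty `items` list, `witness` would never be assigned at all.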


## set and frozenset

### (Mathematical) sets

A set is simply a collection of unique items where order does not matter. Think of a set as a shopping cart.

### No ordering

If you go shopping, the order of the items you put in your shopping cart does not matter. The only thing that matters is the items that are in the cart.

You could say that the groceries that you bought form a set.

Both in maths and in Python, we use `{}` to denote a set. 

In [9]:
groceries = {"milk", "bread", "cheese", "milk"}
print(groceries)
print(type(groceries))
print(type(groceries).__name__)

{'bread', 'milk', 'cheese'}
<class 'set'>
set


To make sure that the order really does not matter in sets, we can compare this set with other sets containing the same elements but in a different order.

In [11]:
print(groceries == {"cheese", "bread", "milk"})
print(groceries == {"bread", "milk", "cheese"})

True
True


### Uniqueness

Another key property of (mathematical) sets is that there are no duplicate elements.

Imagine someone told you to buy cheese, and when you get back home, they ask you: "Did you buy cheese?" This is a yes/no question: either you bought cheese or you didn't.

For sets, the same thing happens: the element is either in the set or it's not. We don't care about element count. We don't even consider it.

In [12]:
groceries = {"apple", "banana", "apple", "milk", "milk", "milk"}
print(groceries)

{'milk', 'apple', 'banana'}


### (Common) Operations on sets

#### Creation

There are three main ways to create a set.

##### Explicit {} notation

Using the `{}` notation, you write out the elements inside the set in a comma-separated list.

In [19]:
numbers = {1, 2, 3}
letters = {"a", "b", "c"}
print(numbers)
print(letters)

{1, 2, 3}
{'b', 'c', 'a'}


By the way, you cannot use `{}` to create an empty set! `{}` by itself will create an empty dictionary. To create an empty set, you need the next method.

##### Calling set on an iterable

You can call the built-in function `set()` on any iterable, such as a range, a string, or a list, to create a set out of the elements of that iterable.

In [22]:
print(set(range(3)))
print(set([73, "water", 42]))

{0, 1, 2}
{73, 'water', 42}


Calling `set` on a string produces a set with the characters of the string, not a set containing the whole string.

In [23]:
place = "mississippi"
print(place)
print(set(place))

mississippi
{'i', 'm', 'p', 's'}


Calling `set` by itself produces an empty set.

In [24]:
my_set = set()
print(my_set)

set()
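A quick check of the difference between empty braces and `set()`; the variable names are only illustrative.

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
print(len(empty_set))               # 0
```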


##### Set comprehensions

Using `{}`, one can also write what's called a set comprehension. Very similar to list comprehensions, but for sets.

In [25]:
veggies = ["broccoli", "carrot", "spinach", "lettuce", "pepper", "tomato"]
veggies_set = {veggie for veggie in veggies if "c" in veggie}
print(veggies_set)

{'carrot', 'spinach', 'broccoli', 'lettuce'}


Secondly, a set comprehension with two nested `for` loops:

In [26]:
veggies = ["broccoli", "carrot", "spinach", "lettuce", "pepper", "tomato"]
print({char for veggie in veggies for char in veggie})

{'p', 't', 'o', 'r', 'n', 'c', 'a', 'i', 'm', 'b', 'e', 's', 'l', 'h', 'u'}


#### Operations on a single set

Many common operations are done with a single set, namely:

- membership testing

In [1]:
groceries = {"broccoli", "carrot", "spinach", "lettuce", "pepper", "tomato"}
print("broccoli" in groceries)  # True
print("cucumber" in groceries)  # False

True
False


- computing the size of a set

In [2]:
len(groceries)  # 6

6

- popping an arbitrary element from a set

In [4]:
print(groceries.pop())
print(groceries)

tomato
{'broccoli', 'carrot', 'spinach', 'lettuce', 'pepper'}


- adding an element to a set

In [7]:
groceries = {"broccoli", "carrot", "spinach", "lettuce", "pepper", "tomato"}
groceries.add("zucchini")
print(groceries)

{'pepper', 'lettuce', 'carrot', 'broccoli', 'zucchini', 'spinach', 'tomato'}


#### Iteration

Sets are similar to lists with unique elements, but lists are ordered: a list can be traversed from the beginning to the end, and a list can be indexed.

Sets can also be iterated over (in an order you can't rely on):

In [11]:
for item in groceries:
    print(item)

pepper
lettuce
carrot
broccoli
zucchini
spinach
tomato


Sets cannot be indexed directly:

In [12]:
print(groceries[0])  # This will raise an error

TypeError: 'set' object is not subscriptable

#### Computation with multiple sets

When having multiple sets you may need to do other sorts of operations. Here are some common ones:

- check for overlap between two sets

In [15]:
groceries = {"milk", "bread", "cheese"}
treats = {"cake", "ice cream", "cookies", "cheese"}

print(groceries & treats)

{'cheese'}


- join the two sets. Here the pipe is similar to the usage of | to merge dictionaries.

In [16]:
print(groceries | treats)

{'cake', 'ice cream', 'cookies', 'milk', 'cheese', 'bread'}


- find differences between two sets (what's on the left set but not on the right set)

In [17]:
print(groceries - treats)

{'milk', 'bread'}


- check for containment using <, <=, >= and >

In [27]:
print({"cheese", "milk"} < groceries)
print(groceries < groceries)
print({"cheese", "milk"} <= groceries)
print(groceries <= groceries)
print(treats > {"cake"})
print(treats >= {"cake", "cheese"})

True
False
True
True
True
True


### Differences between `set` and `frozenset`

#### Creation

While you can create a set with the built-in `set()` function or through the `{}` notation, `frozenset` can only be created with the built-in `frozenset()` function.

`frozenset` can be created out of other sets or out of any iterables.

When printed, frozensets display an indication that they are frozen.

In [29]:
groceries = {"milk", "bread", "cheese"}
print(frozenset(groceries))
print(frozenset([73, "water", 42]))

frozenset({'bread', 'cheese', 'milk'})
frozenset({'water', 73, 42})


#### Mutability

Sets are mutable. They can be changed after they are created: you can add or remove elements.

If you need to create an object that behaves like a set but is immutable, you can use `frozenset`.

A `frozenset` behaves like a `set`, except that it cannot be changed after it is created.

In [32]:
groceries_ = frozenset({"milk", "bread", "cheese"})
# groceries_.add("zucchini")  # This will raise an error
groceries_.pop()

AttributeError: 'frozenset' object has no attribute 'pop'
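Only the mutating methods are missing. As a sketch, the operations that do not modify the set in place still work on a `frozenset`, and operators such as `|` build a new object instead of changing the original; `treats` and `bigger` are illustrative names.

```python
groceries_ = frozenset({"milk", "bread", "cheese"})
treats = {"cake", "cheese"}

print("milk" in groceries_)  # True
print(len(groceries_))       # 3

overlap = groceries_ & treats
print(overlap)               # frozenset({'cheese'})

bigger = groceries_ | {"zucchini"}  # a new frozenset; groceries_ is unchanged
print(len(bigger), len(groceries_)) # 4 3
```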

There's a very similar pair of built-in types that have this same dichotomy: lists and tuples.

Lists are mutable, and tuples are immutable.

In [38]:
my_list = ["apple", "banana", "apple", "milk", "milk", "milk"]
print(my_list[0])
print(my_list.pop())
my_list.append("zucchini")
print(my_list)

my_tuple = ("apple", "banana", "apple", "milk", "milk", "milk")
my_tuple.pop()  # This will raise an error

apple
milk
['apple', 'banana', 'apple', 'milk', 'milk', 'zucchini']


AttributeError: 'tuple' object has no attribute 'pop'

### To be (hashable) or not to be

A hashable object is an object for which a hash can be computed.

A hash is an integer that the built-in function `hash()` computes to help with fast operations on dictionaries, e.g. key lookups.

The built-in function `hash` dictates what can and cannot be a dictionary key: if an object is hashable it can be a key; if not, it can't.

Lists are mutable and unhashable, while tuples are immutable and hashable (as long as all of their elements are hashable).

In [42]:
dictionary = {}
# dictionary[[1, 2, 3]] = 73  # This will raise an error
dictionary[(1, 2, 3)] = 73
print(dictionary)

{(1, 2, 3): 73}


Something similar occurs with sets and frozensets.

In [44]:
dictionary = {}
# dictionary[{1, 2, 3}] = 73  # This will raise an error
dictionary[frozenset({1, 2, 3})] = 73
print(dictionary)

{frozenset({1, 2, 3}): 73}
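You can probe hashability directly with `hash`, which raises a `TypeError` for unhashable objects. In this sketch, `is_hashable` is a hypothetical helper written only for the demonstration.

```python
def is_hashable(obj):
    # Hypothetical helper: True if hash(obj) succeeds, False if it raises TypeError.
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, 3)))             # True
print(is_hashable([1, 2, 3]))             # False
print(is_hashable((1, [2, 3])))           # False: a tuple holding a list is unhashable
print(is_hashable(frozenset({1, 2, 3})))  # True
print(is_hashable({1, 2, 3}))             # False
```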


### What are sets used for?

Sets are useful when the problem at hand benefits from the properties of mathematical sets:
- membership testing
- uniqueness

Sets handle both much faster than lists do, especially as the collection grows.
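A rough way to observe the difference on your own machine is to time a membership test with `timeit`. This is only a sketch: the collection size and repetition count are arbitrary, and the exact timings depend on the machine, so no output is shown.

```python
import timeit

numbers_list = list(range(10_000))
numbers_set = set(numbers_list)
target = 9_999  # worst case for the list: the last element

# The list is scanned element by element; the set uses a hash lookup.
list_time = timeit.timeit(lambda: target in numbers_list, number=1_000)
set_time = timeit.timeit(lambda: target in numbers_set, number=1_000)

print(f"list: {list_time:.5f}s, set: {set_time:.5f}s")
```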

### Examples

Never do this; it is an anti-pattern:

In [51]:
seen_actions = set()
action = "test"

if action not in seen_actions:
    seen_actions.add(action)    

Checking whether an element is in a set and adding it unconditionally are almost the same amount of work, and `add` does nothing when the element is already present, so the check doubles your work!
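The simpler alternative is to call `add` unconditionally. A minimal sketch, with an assumed list of actions that contains a repeat:

```python
seen_actions = set()

for action in ["test", "build", "test"]:
    # Adding an element that is already present leaves the set unchanged.
    seen_actions.add(action)

print(len(seen_actions))  # 2
```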

### Conclusion

- Use a (frozen)set when you are dealing with collections where what matters is fast membership checking.

## List comprehensions 101

### What is a list comprehension?

A list comprehension is a Python expression that builds a list.



### A loop that builds a list

Consider the next loop. It builds a list called `squares`, which contains the first square numbers:

In [53]:
squares = []
for num in range(10):
    squares.append(num**2)
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


The key idea behind list comprehensions is that many lists can be built out of other, simpler iterables (lists, tuples, strings) by transforming the data that we get from those iterables. In such cases, we want to focus on the data transformation that we are doing.

In [54]:
squares_comprehension = [num**2 for num in range(10)]
print(squares_comprehension)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In this case we are dropping:
- the initialisation of the list (`squares = []`)
- the call to append (`squares.append(...)`)

### Exercises: practice rewriting `for` loops as list comprehensions

1. Compute the first square numbers

In [57]:
squares = []
for n in range(10):
    squares.append(n ** 2)
print(squares)

squares_comp = [n ** 2 for n in range(10)]
print(squares_comp)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


2. Uppercasing a series of words:

In [59]:
fruits = "banana pear peach strawberry tomato".split()
upper_words = []
for fruit in fruits:
    upper_words.append(fruit.upper())
print(upper_words)

upper_comp = [fruit.upper() for fruit in fruits]
print(upper_comp)

['BANANA', 'PEAR', 'PEACH', 'STRAWBERRY', 'TOMATO']
['BANANA', 'PEAR', 'PEACH', 'STRAWBERRY', 'TOMATO']


3. Find the length of each word in a sentence:

In [3]:
words = "the quick brown fox jumps over the lazy dog".split()
lengths = [len(word) for word in words]
print(lengths)

[3, 5, 5, 3, 5, 4, 3, 4, 3]


### Filtering data in a list comprehension

List comprehensions allow you to filter data so that the new list only transforms some of the data that comes from the source iterable.

In [8]:
square_list = []
for number in range(1, 10):
    if (number % 3 == 0) or (number % 5 == 0):
        square_list.append(number**2)
print(square_list)

square_list_comprehension = [
    number ** 2 
    for number in range(1, 10) 
    if (number % 3 == 0) or (number % 5 == 0)
]
print(square_list_comprehension)

[9, 25, 36, 81]
[9, 25, 36, 81]


### More exercises

1. Squaring

In [11]:
fizz_buzz_squares = []
for n in range(10):
    if (n % 3 == 0) or (n % 5 == 0):
        fizz_buzz_squares.append(n ** 2)
print(fizz_buzz_squares)

# List comprehension
fb_list_comp = [
    n ** 2 
    for n in range(10) 
    if (n % 3 == 0) or (n % 5 == 0)
]
print(fb_list_comp)

[0, 9, 25, 36, 81]
[0, 9, 25, 36, 81]


2. Uppercasing words

In [15]:
fruits = "Banana pear PEACH strawberry tomato".split()
upper_cased = []
for fruit in fruits:
    if (fruit.islower()):
        upper_cased.append(fruit.upper())
print(upper_cased)

# List comprehension
fruit_list_comp = [
    fruit.upper()
    for fruit in fruits
    if fruit.islower()
]
print(fruit_list_comp)

['PEAR', 'STRAWBERRY', 'TOMATO']
['PEAR', 'STRAWBERRY', 'TOMATO']


3. Finding length of words

In [17]:
words = "the quick brown fox jumps over the lazy dog".split()
lengths = []
for word in words:
    if "o" in word:
        lengths.append(len(word))
print(lengths)

lengths_list_comp = [
    len(word)
    for word in words
    if "o" in word
]
print(lengths_list_comp)

[5, 3, 4, 3]
[5, 3, 4, 3]


### Full anatomy of a list comprehension

The anatomy of a list comprehension is dictated by 3 components enclosed in square brackets:
1. a data transformation
2. a data source
3. a data filter (optional)

In [None]:
ns = []
my_list = [
    n ** 2       # data transformation
    for n in ns  # data source
    if n == 0    # data filter
]

There's no restriction on the number of data sources or data filters in a list comprehension. Take this deeply nested loop, for example:

In [None]:
my_list = []
for it2 in it1:
    for _ in it3:
        if p1(it2):
            for it4 in it2:
                for v1 in it4:
                    if p2(v1):
                        if p3(it4):
                            for it6 in it5:
                                for v2, it7 in it6:
                                    for v3, it8, it9 in it7:
                                        if p2(v1):
                                            for v4, v5 in zip(it8, it9):
                                                my_list.append(func(v1, v2, v3, v4, v5))

This is the equivalent list comprehension. The `for` and `if` clauses appear in the same order as in the nested loops:

In [None]:
my_list = [
    func(v1, v2, v3, v4, v5)
    for it2 in it1
    for _ in it3
    if p1(it2)
    for it4 in it2
    for v1 in it4
    if p2(v1)
    if p3(it4)
    for it6 in it5
    for v2, it7 in it6
    for v3, it8, it9 in it7
    if p2(v1)
    for v4, v5 in zip(it8, it9)
]

List comprehensions should be kept simple. For most people, that means just one data source and one data filter (one `for` and one `if`), or two data sources and no data filters (two `for` and zero `if`).
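
As a minimal sketch of the "two `for`, zero `if`" case, the comprehension below pairs every suit with every rank. The names `suits`, `ranks` and `cards` are made up for this illustration.

```python
# Hypothetical example data, chosen only for illustration.
suits = ["hearts", "spades"]
ranks = ["A", "K", "Q"]

# Two data sources, no filter: the first `for` is the outer loop.
cards = [(rank, suit) for suit in suits for rank in ranks]
print(cards)

# The equivalent nested loops, for comparison.
cards_loop = []
for suit in suits:
    for rank in ranks:
        cards_loop.append((rank, suit))
print(cards == cards_loop)
```

Anything much more involved than this is usually easier to read as explicit loops.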

### Advantages of list comprehensions

The main advantages over nested structures are:
- speed
- conciseness
- purity
- readability
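
The speed claim can be checked roughly with `timeit`. This is only a sketch: the helper names `squares_loop` and `squares_comp` are made up here, and the actual timings depend on your machine and Python version, so no numbers are shown.

```python
import timeit


def squares_loop(n):
    """Build the list with an explicit loop and append."""
    result = []
    for num in range(n):
        result.append(num ** 2)
    return result


def squares_comp(n):
    """Build the same list with a list comprehension."""
    return [num ** 2 for num in range(n)]


# Both versions must produce the same list before timing means anything.
assert squares_loop(1000) == squares_comp(1000)

loop_time = timeit.timeit(lambda: squares_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_comp(1000), number=200)
print(f"loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s")
```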

Keep in mind that **the main advantage is readability**.

### Bad use cases

#### Initialising another list

In [None]:
squares = []
[squares.append(num ** 2) for num in range(10)]

Here we are creating an empty list and appending to it from inside another list comprehension, which defeats the purpose.

The comprehension also builds a list that is not assigned to any variable, so we are creating a list and wasting it.

In [18]:
squares = []
some_list = [squares.append(num ** 2) for num in range(10)]
print(some_list)

[None, None, None, None, None, None, None, None, None, None]


We get a bunch of `None` values because that's the return value of the `append` method.

#### Side effects

In [20]:
numbers = range(10)
[print(value) for value in numbers]

0
1
2
3
4
5
6
7
8
9


[None, None, None, None, None, None, None, None, None, None]

Again, the list is built only for its side effects and then thrown away, whether or not you assign it to a variable. A normal `for` loop is the better choice here:

In [21]:
numbers = range(10)
for value in numbers:
    print(value)

0
1
2
3
4
5
6
7
8
9


#### Replacing built-ins

In [23]:
numbers = range(10)
print(numbers)
my_list = [value for value in numbers]
print(my_list)

range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


This looks like a perfectly good list comprehension, but the code is equivalent to calling `list`:

In [24]:
my_other_list = list(numbers)
print(my_other_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Another built-in you may end up reinventing is `reversed`.
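
A short sketch of what reinventing `reversed` might look like, next to the built-in itself. The variable name `backwards` is made up for this example.

```python
numbers = range(10)

# Reinventing the wheel: walking the indices backwards by hand.
backwards = [numbers[i] for i in range(len(numbers) - 1, -1, -1)]
print(backwards)

# The built-in expresses the same intent directly.
print(list(reversed(numbers)))
```

Both lines print the numbers from 9 down to 0, but the second one says what it does.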

## Sequence indexing

### Introduction

Sequence indexing means using integers to access the elements of linear sequences.

A very simple example of such a sequence is a string.

To index a specific character we use square brackets. Python is 0-indexed, so the first character is at index 0.

In [3]:
s = "Indexing is easy!"
print(s[0])
print(s[1])

I
n


### Maximum legal index and index errors

Because indices start at 0, the maximum legal index is the length of the sequence minus 1.

In [6]:
s = "Indexing is easy!"
print(len(s))
print(s[16])
print(s[17])  # This will raise an IndexError

17
!


IndexError: string index out of range

### Negative indices

If the last legal index is the length minus 1, then there is an obvious way to access the last element.

In [9]:
s = "Indexing is easy!"
print(s[len(s)-1])

!


However, Python also lets you use negative indices to count from the end of the sequence. Think about writing the sequence to the left of itself:
 
|e |a |s |y |! |I|n|d|e|x|i|n|g| |i|
|- |- |- |- |- |-|-|-|-|-|-|-|-|-|-|
|-5|-4|-3|-2|-1|0|1|2|3|4|5|6|7|8|9|

From this figure we can see that the index -1 refers to the last element, -2 to the second last, and so on.

In [10]:
s = "Indexing is easy!"
print(s[-1])
print(s[-2])

!
y


Another way to look at negative indices is to pretend there's a `len(s)` to their left.

In [13]:
s = "Indexing is easy!"

print(s[len(s)-1])
print(s[-1])

print(s[len(s)-5])
print(s[-5])

!
!
e
e
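
The "pretend there's a `len(s)` to the left" rule can be checked for every legal negative index. The sketch below also shows where negative indexing stops: `-len(s)` is the first element, and one step further raises an `IndexError`.

```python
s = "Indexing is easy!"

# For every k from 1 to len(s), s[-k] and s[len(s) - k] are the same element.
for k in range(1, len(s) + 1):
    assert s[-k] == s[len(s) - k]

# The most negative legal index refers to the first element.
print(s[-len(s)])

# One step further to the left is out of range.
try:
    s[-len(s) - 1]
except IndexError as error:
    print(f"IndexError: {error}")
```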


### Indexing idioms

Having seen the basic syntax for indexing, there are a couple of indices that are worth recognising immediately for what they are, without having to think about them:

In [None]:
s = "Indexing is easy!"
s[0]  # First element of s
s[1]  # Second element of s
s[-1]  # Last element of s
s[-2]  # Second-to-last element of s

### To index or not to index?

Strings, lists and tuples are indexable with integers. Sets and dictionaries are not.

Be careful with things that you think are like a list, but aren't. These include `enumerate`, `zip`, `map` and other objects. None of these are indexable, and none of them support `len`.

In [17]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

e = enumerate(numbers)
# print(e[3])  # TypeError: 'enumerate' object is not subscriptable

z = zip(numbers)
# print(z[3])  # TypeError: 'zip' object is not subscriptable

m = map(str, numbers)
print(m[3])

TypeError: 'map' object is not subscriptable
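
If you really need an index or a length, one option is to convert the lazy object to a list first. The sketch below uses `map`. The name `as_list` is made up for this example, and note that the conversion consumes the original object.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

m = map(str, numbers)
as_list = list(m)  # this consumes the map object

print(as_list[3])    # indexing works on the list
print(len(as_list))  # and so does len
print(list(m))       # the map is already exhausted, so this is empty
```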

### Best practices in code

#### A looping pattern with `range`

Because of the way both `range` and `len` work, one can understand that `range(len(s))` will generate all the legal indices for `s`.

In [19]:
s = "Indexing is easy!"
print(list(range(len(s))))
print(s[0])
print(s[16])

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
I
!


This can lead beginners into an anti-pattern. To illustrate it, suppose we want to write a program to find the unique characters in a string. This is the anti-pattern:

In [23]:
s = "Indexing is easy!"
uniques = []
for index in range(len(s)):
    if s[index] not in uniques:
        uniques.append(s[index])

print(uniques)

['I', 'n', 'd', 'e', 'x', 'i', 'g', ' ', 's', 'a', 'y', '!']


A better solution is to use a set, which is more efficient, although a set does not keep the characters in their original order:

In [22]:
s = "Indexing is easy!"
print(set(s))

{'!', 'd', 'x', 'y', 'e', 'a', 'n', 'I', 'i', ' ', 'g', 's'}


Another way to do it, without `range`, is to loop over the string directly, because strings are iterable:

In [24]:
s = "Indexing is easy!"
uniques = []
for letter in s:
    if letter not in uniques:
        uniques.append(letter)

print(uniques)

['I', 'n', 'd', 'e', 'x', 'i', 'g', ' ', 's', 'a', 'y', '!']
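
If you want both the deduplication of a set and the order of first appearance, one possible sketch is `dict.fromkeys`. It relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards.

```python
s = "Indexing is easy!"

# Dictionary keys are unique and keep their insertion order,
# so this keeps the first occurrence of each character.
uniques = list(dict.fromkeys(s))
print(uniques)
```

This produces the same list as the explicit loop, without the repeated `not in` checks against a list.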


If you only care about the indices, then use `range(len(s))`:

In [27]:
s = "Indexing is easy!"
print(list(range(len(s))))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]


When you need to work with both indices and values, use `enumerate`:

In [28]:
s = "Indexing is easy!"
print(list(enumerate(s)))

[(0, 'I'), (1, 'n'), (2, 'd'), (3, 'e'), (4, 'x'), (5, 'i'), (6, 'n'), (7, 'g'), (8, ' '), (9, 'i'), (10, 's'), (11, ' '), (12, 'e'), (13, 'a'), (14, 's'), (15, 'y'), (16, '!')]
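
In a loop, `enumerate` is usually combined with unpacking. As a small sketch, this collects the positions of the lowercase letter `i`. The name `positions` is made up for this example.

```python
s = "Indexing is easy!"

positions = []
for index, letter in enumerate(s):
    if letter == "i":
        positions.append(index)

print(positions)  # [5, 9]
```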


#### Large expressions as indices

When you are dealing with sequences and indices for those sequences, you may end up needing to perform some calculations to compute new indices. For example, you want the middle element of a string and you don't know about `//` yet:

In [35]:
s = "Indexing is so easy!"
print(s[len(s)/2])

TypeError: string indices must be integers, not 'float'

In [34]:
import math
s = "Indexing is so easy!"
print(s[math.floor(len(s)/2)])
print(s[len(s)//2])  # Pro-tip: the operation // is ideal here

s
s


Another alternative is to create a well-named variable to hold the result of the computation:

In [36]:
import math
s = "Indexing is so easy!"
mid_char_idx = math.floor(len(s)/2)
print(s[mid_char_idx])

s


If you have large expressions to compute indices, consider using an intermediate variable with a descriptive name.

#### Unpacking with indexing

You will often find yourself working with small groups of data, such as pairs of things that you keep together in a small list. For example:

In [39]:
names = ["Mary", "Doe"]

def greet(names, formal):
    if formal:
        print(f"Hello Miss {names[1]}")
    else:
        print(f"Hey there {names[0]}")

greet(names, True)
greet(names, False)

Hello Miss Doe
Hey there Mary


You might consider unpacking `names` before reaching the `if` statement:

In [40]:
names = ["Mary", "Doe"]

def greet(names, formal):
    first, last = names
    if formal:
        print(f"Hello Miss {last}")
    else:
        print(f"Hey there {first}")

greet(names, True)
greet(names, False)

Hello Miss Doe
Hey there Mary


This makes the intent of the code much more obvious. Just from looking at the function's first line, we know `names` is supposed to be a pair of names. It also forces your `greet` function to receive exactly two names: anything else raises an error.

In [43]:
names = ["Mary", "Doe", "Jane"]

def greet(names, formal):
    first, last = names
    if formal:
        print(f"Hello Miss {last}")
    else:
        print(f"Hey there {first}")

# greet(names, True)
greet("Mary", False)

ValueError: too many values to unpack (expected 2)
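
The same idea works in a `for` loop over a list of pairs. The sketch below uses made-up data (`people`) and a hypothetical helper (`split_name`) to show both the readable loop and the fact that unpacking fails loudly on input of the wrong length. The exact wording of the error message can vary between Python versions.

```python
# Hypothetical example data, chosen only for illustration.
people = [("Mary", "Doe"), ("John", "Smith")]

# Unpacking in the loop header instead of writing person[0] and person[1].
for first, last in people:
    print(f"{last}, {first}")


def split_name(names):
    """Return the (first, last) pair, failing if `names` is not a pair."""
    first, last = names
    return first, last


try:
    split_name(["Mary", "Doe", "Jane"])
except ValueError as error:
    print(f"ValueError: {error}")
```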

## Idiomatic sequence slicing

### Introduction

Slicing is a "more advanced" way of accessing portions of sequences.