In [None]:
from pprint import pprint
from itertools import groupby

# Python data structures

This notebook covers the following topics:
1. Heterogeneity in Python data structures
2. Immutability
3. Unpacking

## Flexibility

Python data structures prioritize flexibility.  If something might work, it probably will, even if that makes the data structure more complex or harder to use going forward.

Other languages like C++ prioritize speed and minimizing the potential for runtime errors.

### Heterogeneity

#### Sequences (lists and tuples)

In statically typed languages like C++, a vector of integers is a different type from a vector of strings.  Vectors and arrays can contain only a single data type, and the compiler enforces this.  This is part of prioritizing the prevention of runtime errors, as I described above.  Python prioritizes flexibility, so lists and tuples can contain a mixture of types.

In [1119]:
mixed_tuple = ("Alice", 1234, "Sesame ST")
pprint(mixed_tuple)

('Alice', 1234, 'Sesame ST')


In [1120]:
another_mixed_tuple = ("Bob", "515-555-9955", False)
pprint(another_mixed_tuple)

('Bob', '515-555-9955', False)


This mixing can nest arbitrarily deep, as well.

In [1121]:
confounding_list = [mixed_tuple, another_mixed_tuple, "Charlie"]
pprint(confounding_list)

[('Alice', 1234, 'Sesame ST'), ('Bob', '515-555-9955', False), 'Charlie']


#### Mapping (dicts)

Mappings like dictionaries can even have keys of mixed types, as long as every key is hashable (ints, strings, and tuples of hashable values all qualify).  The values can be mixed as well.

In [1122]:
mixed_dict = {0: "Dan",
              1: "Eric",
              "Alice": mixed_tuple,
              "Bob": another_mixed_tuple,
              "everyone": confounding_list}
pprint(mixed_dict)

{0: 'Dan',
 1: 'Eric',
 'Alice': ('Alice', 1234, 'Sesame ST'),
 'Bob': ('Bob', '515-555-9955', False),
 'everyone': [('Alice', 1234, 'Sesame ST'),
              ('Bob', '515-555-9955', False),
              'Charlie']}


Although you sometimes see this done intentionally, it's also easy to do by accident.  The string `"1"` and the integer `1` are of different types, so in a dictionary they are two different keys.
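A minimal sketch of the accidental case, using a hypothetical `lookup` dictionary: the integer key and the string key coexist as separate entries, and neither lookup finds the other.

```python
# A hypothetical lookup table where one key was (perhaps accidentally) stored as a string
lookup = {1: "integer key", "1": "string key"}

print(len(lookup))   # 2: the int and the str are separate keys
print(lookup[1])     # integer key
print(lookup["1"])   # string key
print(1 == "1")      # False: an int never equals a str
```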

#### Where would I see this in practice?

Positional data structures are very common.  You might have an address book where the first element of every entry is the name, the second is the address, the third is the phone number, and so on.  Some languages would represent these with dedicated types like structs, but Python just uses lists and tuples unless there's a particular reason not to.  Tuples, and especially named tuples, tend to assume that position is significant.  Built-in functions like `zip()` are designed for manipulating positional data.
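A minimal sketch of `zip()` working on positional data, assuming a hypothetical address book stored as `(name, address, phone)` tuples.  Writing `zip(*entries)` passes each entry to `zip()` as a separate argument, which regroups the data column by column.

```python
# Hypothetical address-book entries: (name, address, phone), identified by position
entries = [
    ("Alice", "123 Sesame St", "515-555-0100"),
    ("Bob", "456 Elm St", "515-555-9955"),
]

# Regroup column by column: all names, then all addresses, then all phones
columns = list(zip(*entries))
print(columns[0])  # ('Alice', 'Bob')
print(columns[2])  # ('515-555-0100', '515-555-9955')

# zip() can also pair parallel sequences back up, position by position
name_phone_pairs = list(zip(columns[0], columns[2]))
print(name_phone_pairs)  # [('Alice', '515-555-0100'), ('Bob', '515-555-9955')]
```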

## Immutability

Certain types are immutable in Python: numeric types such as integers and floats, strings, and tuples.  Once an object of one of these types has been created, it can't be changed.  However, each of these categories handles immutability slightly differently.  The most likely place for this to trip someone up is the overlap between tuples, which are immutable, and lists, which are mutable.

### Mutability for comparison

First, let's look at mutable behavior.  A mutable object can change after it is created, and anyone holding a reference to that object will see those changes.  This can be confusing when they think they have a copy!

In [1123]:
my_list = ["one", "two", "three"]
your_list = my_list

In [1124]:
print("Initial lists")
print("My list:")
pprint(my_list)
print()
print("Your list")
pprint(your_list)

Initial lists
My list:
['one', 'two', 'three']

Your list
['one', 'two', 'three']


In [1125]:
my_list.append(4)

In [1126]:
print("After adding to my list")
print("My list:")
pprint(my_list)
print()
print("Your list")
pprint(your_list)

After adding to my list
My list:
['one', 'two', 'three', 4]

Your list
['one', 'two', 'three', 4]


In [1127]:
your_list[1] = 2

In [1128]:
print("After changing your list")
print("My list:")
pprint(my_list)
print()
print("Your list")
pprint(your_list)

After changing your list
My list:
['one', 2, 'three', 4]

Your list
['one', 2, 'three', 4]


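In the above example, `my_list` and `your_list` both refer to the same list.  The assignment `your_list = my_list` doesn't copy anything; it just gives the existing list a second name.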
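If you actually want an independent copy, you have to ask for one.  A minimal sketch using fresh, hypothetical names (so the cells below still see the shared list): `.copy()`, `list()`, and slicing each make a shallow copy, while `copy.deepcopy()` from the standard library also copies nested objects.

```python
import copy

original = ["one", "two", "three"]

# Each of these creates a new list object (a shallow copy)
copy_a = original.copy()
copy_b = list(original)
copy_c = original[:]

original.append(4)
print(original)                # ['one', 'two', 'three', 4]
print(copy_a, copy_b, copy_c)  # each is still ['one', 'two', 'three']
print(copy_a is original)      # False

# A shallow copy is a new outer list that still shares the inner objects;
# copy.deepcopy() recursively copies the nested lists as well
nested = [["a"], ["b"]]
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested[0].append("z")
print(shallow[0])  # ['a', 'z']: the inner list is shared
print(deep[0])     # ['a']: the deep copy has its own inner list
```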

#### Checking for identical references

Is there a way to check whether two variables refer to the same object and will change together, like the lists in the example above?  Yes: that's what the `is` keyword does.  `a is b` returns `True` if `a` and `b` refer to the same object.  Note that this has some pitfalls when used with immutable objects, which I will cover later.

In [1129]:
print(f"{my_list is your_list=}")

my_list is your_list=True


##### A safer alternative

I tend to use a different pattern, for three reasons: the `is` keyword can give misleading results when dealing with immutable objects, people are often tempted into treating `is` as a synonym for `==`, and seeing the actual identifiers is slightly more useful when examining memory management.  The `id()` function returns an integer identifying an object, which is guaranteed to be unique among objects that exist at the same time (in CPython, it's the object's memory address).

In [1130]:
print(f"{id(my_list)=}, {id(your_list)=}")
print(f"{id(my_list) == id(your_list)=}")

id(my_list)=140449289498176, id(your_list)=140449289498176
id(my_list) == id(your_list)=True


### Numerical immutability

Let's try the same sort of operations on an integer.

In [1131]:
my_int = 3
your_int = my_int

In [1132]:
print("initial numbers")
print(f"{my_int=}, {your_int=}")

initial numbers
my_int=3, your_int=3


In [1133]:
my_int += 1

In [1134]:
print("After changing my int")
print(f"{my_int=}, {your_int=}")

After changing my int
my_int=4, your_int=3


In [1135]:
your_int -= 1

In [1136]:
print("After changing your int")
print(f"{my_int=}, {your_int=}")

After changing your int
my_int=4, your_int=2


Despite being defined in ways that look identical to the lists above, the integers don't change together.  In fact, addition and subtraction don't change the integers at all.  Instead, they create a new int, and operators like `+=` rebind the variable to point to that new integer.  Other numeric types in Python, like floats, work the same way.
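A minimal sketch of that rebinding, with hypothetical names: before `+=` both names refer to the same int object, and afterwards only the name on the left has moved to a new object.

```python
counter = 3
alias = counter            # both names refer to the same int object
print(alias is counter)    # True

counter += 1               # computes a new int (4) and rebinds only `counter`
print(alias, counter)      # 3 4
print(alias is counter)    # False
```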

#### What the trouble with `is` is

I said earlier that some people are tempted into treating the `is` keyword like a synonym for the `==` operator, and numeric types are often what tempts them.

If you're particularly memory-minded, it might occur to you that you don't really need multiple copies of an immutable object, and that's true.  Python does try to reuse immutable objects when it's quick and easy to do so.  For example, CPython preallocates the small integers from -5 to 256, so every occurrence of, say, `3` refers to the same object and `is` returns True.  But this is an implementation detail, and it won't work reliably, or at all, for arbitrary numbers.

In [1137]:
ordinary_number = 3
another_ordinary_number = 3
print(f"{ordinary_number == another_ordinary_number=}")
print(f"{ordinary_number is another_ordinary_number=}")

ordinary_number == another_ordinary_number=True
ordinary_number is another_ordinary_number=True


In [1138]:
ordinary_number = 3
division_result = 21/7
print(f"{ordinary_number == division_result=}")
print(f"{ordinary_number is division_result=}")

ordinary_number == division_result=True
ordinary_number is division_result=False


In [1139]:
meme_number = 31_556_926
seconds_per_year = 31_556_926
print(f"{meme_number == seconds_per_year=}")
print(f"{meme_number is seconds_per_year=}")

meme_number == seconds_per_year=True
meme_number is seconds_per_year=False
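
The `21/7` comparison fails for an additional reason: `/` always returns a float, so `division_result` is `3.0`, a different type of object from the int `3` even though the values are equal.  A minimal sketch of the distinction, along with the common convention (recommended by PEP 8) of comparing values with `==` and reserving `is` for singletons such as `None`:

```python
quotient = 21 / 7
print(quotient)            # 3.0
print(type(quotient))      # <class 'float'>: `/` always returns a float
print(quotient == 3)       # True: == compares values, and 3.0 equals 3
print(type(21 // 7))       # <class 'int'>: floor division of two ints gives an int

# Common convention: use == for values, and reserve `is` for singletons like None
result = None
print(result is None)      # True
```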


### String immutability

Strings are another immutable basic type in Python.  Much like immutable numeric types, anything that looks like it's changing a string is actually creating a new string and updating the variable to point to that.

In [1140]:
my_string = "Foo"
print(f"original string: {my_string}")
print(f"{(original_address := id(my_string))=}")
my_string += " Bar"
print(f"string after appending with +=: {my_string}")
print(f"{(updated_address := id(my_string))=}")
print(f"{original_address == updated_address=}")

original string: Foo
(original_address := id(my_string))=140449748641200
string after appending with +=: Foo Bar
(updated_address := id(my_string))=140449295414832
original_address == updated_address=False


Other languages like C++ let you treat a string much like an array or vector of characters, so you can change a string with ordinary indexing operations.  In Python, trying to update a string using list semantics simply isn't allowed.

In [1141]:
favorite_thing_string = "foop"
print("You can access characters in strings with list semantics just fine")
print(f"Found a typo at index -1: {favorite_thing_string[-1]}")
print("But you aren't allowed to change strings the way you'd change a list")
print("Trying to fix typo:")
favorite_thing_string[-1] = "d"

You can access characters in strings with list semantics just fine
Found a typo at index -1: p
But you aren't allowed to change strings the way you'd change a list
Trying to fix typo:


TypeError: 'str' object does not support item assignment
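Since the string itself can't change, fixing the typo means building a new string.  A minimal sketch of two common approaches (the names `fixed` and `chars` are hypothetical):

```python
favorite_thing_string = "foop"

# Slicing plus concatenation builds a brand-new, corrected string
fixed = favorite_thing_string[:-1] + "d"
print(fixed)                  # food

# For many single-character edits: convert to a list, change it, join it back
chars = list(favorite_thing_string)
chars[-1] = "d"
print("".join(chars))         # food

print(favorite_thing_string)  # foop: the original string is untouched
```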

### Tuple immutability

Tuples are immutable sequences: the immutable counterpart of lists.

One of the main pitfalls of tuples is that while they are themselves immutable, the things they contain might not be.  For example, if you have a tuple full of lists then those lists can still be changed.

#### A tuple of lists

In [1142]:
tuple_of_lists = (["One"], ["Two"], ["Three"])
pprint(tuple_of_lists)

(['One'], ['Two'], ['Three'])


#### Adding to a tuple returns a new tuple

Tuples support the `+` operator.  As with strings and lists, it returns a new tuple rather than modifying the original.

In [1143]:
original_tuple_id = id(tuple_of_lists)
extended_tuple = tuple_of_lists + (["Four"],)
extended_tuple_id = id(extended_tuple)

In [1144]:
print(f"{tuple_of_lists=}")
print(f"{original_tuple_id=}")
print(f"{extended_tuple=}")
print(f"{extended_tuple_id=}")
print(f"{original_tuple_id == extended_tuple_id=}")

tuple_of_lists=(['One'], ['Two'], ['Three'])
original_tuple_id=140449288204160
extended_tuple=(['One'], ['Two'], ['Three'], ['Four'])
extended_tuple_id=140449287724288
original_tuple_id == extended_tuple_id=False


##### Tuple syntax

You might find the syntax above a bit strange: why did I write `(["Four"],)`?  Without the comma, `(["Four"])` is just a list in parentheses, which is only a list.  It's the comma that makes a tuple; the parentheses merely group it.  The empty parentheses `()`, which denote the empty tuple, are a special case.

In [1145]:
print(f"{type(())=}")
print(f"{type((3))=}")
print(f"{type((3,))=}")

type(())=<class 'tuple'>
type((3))=<class 'int'>
type((3,))=<class 'tuple'>
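
A minimal sketch showing that the comma, not the parentheses, is what creates a tuple (the variable names are hypothetical):

```python
bare = 1, 2, 3      # no parentheses at all: the commas make the tuple
single = "Four",    # a trailing comma makes a one-element tuple
grouped = ("Four")  # parentheses without a comma only group: this is a str

print(type(bare))     # <class 'tuple'>
print(type(single))   # <class 'tuple'>
print(type(grouped))  # <class 'str'>
print(single)         # ('Four',)
```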


#### Mutable objects inside tuples are still mutable

Although tuples themselves cannot be changed, the things inside them still can be.

In [1146]:
tuple_of_lists = (["One"], ["Two"], ["Three"])
pprint(tuple_of_lists)
tuple_of_lists[1].extend(["Four", "Six", "Eight"])
pprint(tuple_of_lists)

(['One'], ['Two'], ['Three'])
(['One'], ['Two', 'Four', 'Six', 'Eight'], ['Three'])
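One practical consequence: a tuple is hashable only if everything inside it is hashable, so a tuple containing lists can't be used as a dictionary key.  A minimal sketch with hypothetical names:

```python
strings_only = ("One", "Two")
with_lists = (["One"], ["Two"])

# Works: every element of the tuple is immutable and hashable
registry = {strings_only: "fine"}
print(registry[("One", "Two")])  # fine

# Fails: hashing the tuple requires hashing the lists inside it
try:
    registry[with_lists] = "not allowed"
except TypeError as err:
    message = str(err)
print(message)  # e.g. unhashable type: 'list'
```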


## Sequence unpacking

Python supports multiple assignment through **sequence unpacking**.  If there are multiple variables on the left-hand side of an assignment, Python iterates over whatever is on the right-hand side and matches the values up with the variables in order.  Importantly, Python uses the **iterator protocol** to do this, so the right-hand side can be any iterable, not just a list or tuple.

### A basic example

In [1147]:
a, b, c = ["Alice", "Bob", "Charlie"]
print(f"{a=}, {b=}, {c=}")

a='Alice', b='Bob', c='Charlie'
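
Two consequences of how unpacking works, sketched with hypothetical names: a bare tuple on the right-hand side gives the familiar swap idiom, and because unpacking uses the iterator protocol, strings, generators, and dictionaries can all be unpacked too.

```python
# Swapping: the right-hand side `right, left` is conceptually evaluated first,
# building a tuple, which is then unpacked into the names on the left
left, right = "Alice", "Bob"
left, right = right, left
print(left, right)                           # Bob Alice

# Any iterable works on the right-hand side, not only lists and tuples
first, second, third = "abc"                 # a string yields its characters
print(first, second, third)                  # a b c

zero, one, four = (n * n for n in range(3))  # a generator is consumed as it unpacks
print(zero, one, four)                       # 0 1 4

key_a, key_b = {"Alice": 1, "Bob": 2}        # iterating a dict yields its keys
print(key_a, key_b)                          # Alice Bob
```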


### The right-hand and left-hand sides have to match

In [1148]:
a, b = ["Alice", "Bob", "Charlie"]

ValueError: too many values to unpack (expected 2)

In [1149]:
a, b, c, d = ["Alice", "Bob", "Charlie"]

ValueError: not enough values to unpack (expected 4, got 3)

### You can use an asterisk * to absorb extra values

An asterisk absorbs any extra values.  The starred variable doesn't have to come last, and it always becomes a list, which may be empty if there are no spare values.
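A minimal sketch of both points: the starred name can come first, and it is a list even when the right-hand side is a tuple or a string.

```python
*rest, last = ("Alice", "Bob", "Charlie")  # starred name first; the RHS is a tuple
print(rest, last)                          # ['Alice', 'Bob'] Charlie

head, *tail = "Eve"                        # the RHS is a string
print(head, tail)                          # E ['v', 'e']
```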

#### One item in the wildcard still makes it a list

In [1150]:
a, *b, c = ["Alice", "Bob", "Charlie"]
print(f"{a=}, {b=}, {c=}")

a='Alice', b=['Bob'], c='Charlie'


#### Multiple items

In [1151]:
a, *b, c = ["Alice", "Bob", "Rob", "Charlie"]
print(f"{a=}, {b=}, {c=}")

a='Alice', b=['Bob', 'Rob'], c='Charlie'


#### Zero items

In [1152]:
a, *b, c = ["Alice", "Charlie"]
print(f"{a=}, {b=}, {c=}")

a='Alice', b=[], c='Charlie'


#### Trying to have more than one wildcard

In [1153]:
a, *b, *c = ["Alice", "Bob", "Charlie"]
print(f"{a=}, {b=}, {c=}")

SyntaxError: multiple starred expressions in assignment

### The types of unpacked items can be different

In the example below, the variable `b` will point to the tuple `("Bob", "Bobbert")` while the other variables hold strings.

In [1154]:
a, b, c = ["Alice", ("Bob", "Bobbert"), "Charlie"]
print(f"{a=}, {b=}, {c=}")

a='Alice', b=('Bob', 'Bobbert'), c='Charlie'


### Unpacking can be nested

In [1155]:
a, (b, bb), c = ["Alice", ("Bob", "Bobbert"), "Charlie"]
print(f"{a=}, {b=}, {bb=}, {c=}")

a='Alice', b='Bob', bb='Bobbert', c='Charlie'


In the example above, the inner target `(b, bb)` expects a two-element sequence in the second position and extracts the values inside it.  All the rules for the outermost unpacking expression apply to the inner ones as well: they have to match exactly, each can have a single starred variable, and so on.
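A minimal sketch of those rules applied to an inner target: a starred name inside the nested pattern collects the extras, and the inner pattern still fails if the values don't fit it.

```python
a, (b, *bb), c = ["Alice", ("Bob", "Bobbert", "Bobby"), "Charlie"]
print(f"{a=}, {b=}, {bb=}, {c=}")
# a='Alice', b='Bob', bb=['Bobbert', 'Bobby'], c='Charlie'

# The inner pattern still has to match: (y, *yy) needs at least one value
try:
    x, (y, *yy), z = ["Alice", (), "Charlie"]
except ValueError as err:
    message = str(err)
print(message)  # e.g. not enough values to unpack (expected at least 1, got 0)
```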

### Overusing this feature will make code hard to read

The unpacking syntax overlaps heavily with **structural pattern matching** (the `match` statement), where complex nested examples are much more common.  Outside of that, it's unusual to see more than one or two levels of nested unpacking.  Remember that unpacking depends on the structures matching exactly, so overusing it can make your code more brittle.

### Sequence unpacking in looping constructs, a step-by-step example

It's important to remember that you're never required to use sequence unpacking if you don't want to.  Since `for` loops are one of the most common places to see it, let's examine what happens if you don't use it.
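A minimal sketch of the comparison, assuming a hypothetical list of `(name, number)` records: indexing each tuple by position and unpacking it in the loop target produce the same results.

```python
# Hypothetical (name, number) records
people = [("Alice", 1234), ("Bob", 5678)]

# Without unpacking: each loop item is a whole tuple, indexed by position
without_unpacking = []
for person in people:
    name = person[0]
    number = person[1]
    without_unpacking.append(f"{name}: {number}")

# With unpacking: the loop target itself is an unpacking expression
with_unpacking = []
for name, number in people:
    with_unpacking.append(f"{name}: {number}")

print(without_unpacking == with_unpacking)  # True
print(with_unpacking)                       # ['Alice: 1234', 'Bob: 5678']

# enumerate() yields (index, item) pairs, so the target can be nested
for index, (name, number) in enumerate(people):
    print(index, name, number)
```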

TODO