# Agenda

1. Data structures (this morning)
    - How are core data structures implemented
    - Advanced core data structures (`Decimal` and `namedtuple`)
    - Dictionaries and their variants
2. Functions (this afternoon + Tuesday morning)
    - Functions as nouns, not just verbs -- function objects, and how they work
    - Attributes on function objects
    - Bytecodes
    - How are arguments mapped to parameters when we call a function? (positional and keyword arguments)
    - Special parameter types (`*args` and `**kwargs`), keyword only, defaults
    - Variable scoping (LEGB)
    - The enclosing scope -- closures, inner functions, and why we want them
    - Type hints/annotations, and what they are (and aren't)
    - Dispatch tables
3. Functional programming in Python (Tuesday afternoon)
    - Comprehensions (list, set, dict comprehensions -- including nested comprehensions)
    - Functions as arguments to other functions
    - `lambda` and its friends
4. Modules and packages (Tuesday afternoon)
5. Objects (Wednesday)
    - Classes
    - Methods
    - Instances
    - Attributes -- one of the most important things in all of Python (ICPO rule for attribute lookup)
    - Magic methods
    - Properties
    - Descriptors
    - Methods vs. functions -- how are they different, and how does `self` work?
6. Iterators and generators (Thursday morning)
    - Making your class iterable
    - Generator functions
    - Generator comprehensions (aka generator expressions)
7. Decorators (Thursday)
8. Concurrency
    - Threads and processes (multiprocessing)
    - `asyncio`, and how it works (and doesn't)
    - Where is this all going in the Python world?

In [1]:
# gitautopush -- on PyPI


# Assignment in Python

In a language like C, a variable is an alias for a location in memory. When I assign to a variable, the value is put in that particular location. This is why (a) we need to declare our variables as having a certain type, and (b) only certain types can be assigned to certain variables.

In Python, variables refer to values.  Variables are *not* memory locations! All values are objects, and all objects are on the heap. A variable is just a *pointer* to one of those values. This is why any Python variable can refer to any Python value, and why we don't (and cannot) declare our variables to be a particular type.

This is what makes Python a *dynamically typed language*.  It doesn't mean that values don't have types. It just means that variables don't have types.

In [2]:
x = 5    # when we assign, we're saying that the variable on the left should refer to the value on the right

In [3]:
type(x)

int

In [4]:
# there is no way in Python for one variable to refer to another variable

x = 5
y = x    # this doesn't mean that y "follows x around." Rather, it says that y refers to whatever x currently refers to

y

5

In [5]:
x = 10  # we've reassigned x

y

5

In [6]:
# if the value is mutable, then things get stickier

x = [10, 20, 30]
y = x

x[0] = '!'   # I have modified the list to which x refers .. which is also the list to which y refers!
y

['!', 20, 30]

In [7]:
mylist = [10, 20, 30]
mylist.append(mylist)

mylist

[10, 20, 30, [...]]

In [8]:
del(mylist)   # delete the variable

In [9]:
x = None

In [10]:
type(x)    

NoneType

`None` exists so that we can say that we have no value, and make that distinct from 0, `False`, empty string, etc. It's its own value. In a boolean context (i.e., in an `if` statement), it is treated as false. But if you check whether `None` is equal to any of those other values, it isn't.

Where do we use `None`?

- A function that doesn't explicitly return a value returns `None`
- Many times, default argument values in functions have a value of `None`
- You'll see it for default attributes in objects, too
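
The first two uses in the list above can be sketched as follows. The names `greet` and `add_item` are hypothetical, made up only for this illustration.

```python
def greet(name):
    print(f'Hello, {name}')   # no return statement, so the function returns None

result = greet('world')
print(result)                 # None


def add_item(item, items=None):
    # None as a default means "the caller passed nothing";
    # in that case we create a fresh list on each call
    if items is None:
        items = []
    items.append(item)
    return items

print(add_item(10))           # [10]
print(add_item(20))           # [20]
```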

In [11]:
None == None

True

In [12]:
None == False

False

In [13]:
None == 0

False

In [15]:
# how can I check to see if something is None?

x = None

if x == None:   # unfortunately, this works -- it isn't Pythonic
    print('Yes! It is None!')

Yes! It is None!


`None` is a singleton object; every `None` in Python is not only equal to every other one, but it is the exact same object.

In [16]:
id(None)   # this returns the unique object number

4466115072

In [17]:
id(None)

4466115072

In [18]:
new_none = type(None)()

In [19]:
id(new_none)

4466115072

In [20]:
# According to PEP 8, the Python style guide, we shouldn't use "==" on singletons, especially with None.
# Rather, we should check the identity of the object with the "is" operator.

# "is" doesn't check whether two things are equal. It checks whether their ids are equal

In [21]:
id(None) == id(new_none)

True

In [22]:
# we can say the same thing, much better:

None is new_none

True

In [25]:
ni = type(NotImplemented)()

In [26]:
NotImplemented is ni

True

In [27]:
id(5)  # yes, you get back a unique ID of the object... which, in CPython, happens to be its address in memory!

4466981456

In [28]:
f = open('/etc/passwd')

f

<_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>

# What does `type` do?

It actually does *two* things:

1. If you give it just one argument, it returns the type of the object, basically the same thing that you would get from the object's `__class__` attribute.
2. If you give it three arguments, then you get back a new class which you have created. This is pretty rare.
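
Here is a small sketch of both forms. The class name `Point` and its `dimensions` attribute are invented for the example.

```python
# one argument: ask for the type of an existing object
print(type('abcd') is 'abcd'.__class__)    # True

# three arguments: type(name, bases, namespace) builds a class at runtime
Point = type('Point', (object,), {'dimensions': 2})

p = Point()
print(type(p) is Point)    # True
print(Point.__name__)      # Point
print(p.dimensions)        # 2
```

This is roughly what a `class Point:` statement does for us behind the scenes, which is why we rarely need to call the three-argument form ourselves.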

In [29]:
type('abcd')   # this will return 'abcd'.__class__

str

In [30]:
x = 100
y = 100

x == y    # are these the same value?

True

In [31]:
x is y    # are these the same object in memory?

True

In [32]:
x = 1000
y = 1000

x == y

True

In [33]:
x is y

False

Python knows that we'll be using a lot of small integers, and thus creates, when it starts up, all of the integers from -5 to 256. So any time you use one of those integers, Python just grabs the object it already has available. Thus, two variables holding the same small integer always refer to the same object, and `is` returns `True`. (This is a CPython implementation detail, not a language guarantee.)

But once you get into larger integers, that's no longer the case.
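
A minimal sketch of the boundary, assuming CPython. It builds the integers at runtime with `int()`, so that the compiler has no chance to merge identical literals into one constant.

```python
import sys

print(sys.implementation.name)   # the results below are CPython behavior

a = int('256')
b = int('256')
c = int('257')
d = int('257')

print(a == b, c == d)   # True True -- equal values in both cases
print(a is b)           # True on CPython: 256 is inside the cached range
print(c is d)           # False on CPython: 257 is outside it
```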

In [34]:
x = 'abcd'
y = 'abcd'

x == y

True

In [35]:
x is y

True

In [36]:
x = 'abcd' * 10_000
y = 'abcd' * 10_000

x == y

True

In [37]:
x is y

False

In [38]:
x = 'ab.cd'
y = 'ab.cd'

x == y

True

In [39]:
x is y

False

What's going on?

When we assign to a variable in Python, that variable name is turned into a string, and is then used as the key in an internal dict to store our value. If every store or retrieval of a variable's value meant creating a new string, that would be wasteful.

Python's solution to this is to cache (or *intern*) any short string that only contains characters that are legal in an identifier. (I think the limit is < 500 characters; the exact cutoff is a CPython implementation detail.) This means that the first time we see such a string, it's really created. The second and subsequent times, we just reuse the same string.

- If the string is long, then this caching doesn't happen.
- If the string contains `.` (or some other character that is illegal in an identifier), then it doesn't happen either.

This is transparent to us, but if you use `is` to compare strings, you'll discover it.
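
If you ever want this caching for a string that wouldn't get it automatically, the standard library offers `sys.intern`. A short sketch, with the strings built at runtime so that neither one is a compile-time constant:

```python
import sys

x = ''.join(['ab', '.cd'])
y = ''.join(['ab', '.cd'])

print(x == y)      # True: same characters

# sys.intern returns the one canonical copy of a string,
# so two equal strings that have been interned are the same object
x = sys.intern(x)
y = sys.intern(y)
print(x is y)      # True
```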

In [40]:
x = 100

globals()['x']  # retrieves the value of x

100

In [42]:
globals()['x'] = 9876

In [43]:
x

9876

Only use `is` to compare with `None`. Otherwise, use `==`.



In [45]:
# False is a singleton, too
bool(0) is bool(0)

True

In [46]:
# True is a singleton, too
bool(1) is bool(1)

True

# Integers

Many people new to Python ask: What is the biggest integer we're allowed? Or how many bits are our integers?

This is the *wrong* question to ask! Integers are objects; they manage their own memory, and can be as big as you want, so long as you don't run out of memory.
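
A quick illustration: a value far too large for a 64-bit machine integer is still a plain `int`.

```python
big = 2 ** 100

print(big)                 # 1267650600228229401496703205376
print(big.bit_length())    # 101 -- well beyond 64 bits
print(type(big))           # <class 'int'>
```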

In [47]:
import sys   

sys.getsizeof(0)   # how many bytes does something in Python take up?

28

In [51]:
sys.getsizeof(10_000_000_000_000_000)

32

In [52]:
x = 10_000_000_000_000_000
x = x ** 1000

In [53]:
sys.getsizeof(x)

7112

In [55]:
x = x ** 100

In [56]:
sys.getsizeof(x)

708704

In [57]:
# floats

0.1 + 0.2 

0.30000000000000004

In [58]:
# what if we could keep our number in decimal, and never go to binary?
# we can trade longer execution time and more memory for greater precision

from decimal import Decimal 

x = Decimal('0.1')
y = Decimal('0.2')

x + y

Decimal('0.3')

In [59]:
float(x+y)

0.3

In [60]:
# another solution to the float problem: Use the builtin round() function

round(0.1 + 0.2, 2)  # round things off after 2 digits past the decimal point

0.3

In [61]:
# another solution: use ints!
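
One way to read "use ints", assuming the numbers are amounts of money: store them as integer cents and only format them as dollars for display. The variable names here are invented for the sketch.

```python
# store money as integer cents rather than float dollars
prices_in_cents = [10, 20]          # i.e., 0.10 and 0.20

total = sum(prices_in_cents)
print(total == 30)                  # True: integer arithmetic is exact

# convert to a display string only at the very end
dollars, cents = divmod(total, 100)
print(f'{dollars}.{cents:02d}')     # 0.30
```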

In [62]:
sys.getsizeof(0.1)

24

In [63]:
sys.getsizeof(1234567890.1234567890)

24

In [64]:
sys.getsizeof(x)    # x is still Decimal('0.1') from above

104

In [None]:
# teraflops  -- trillions of floating point operations per second

In [65]:
x = 12345.6789
y = 98765.4321

%timeit x * y

26.1 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [66]:
x = Decimal('12345.6789')
y = Decimal('98765.4321')

%timeit x * y

74.1 ns ± 2.01 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [68]:
# Python does have a third numeric type: Complex!

x = 20+3j
y = 15-8j

x + y

(35-5j)

In [69]:
sys.getsizeof(x)

32

In [70]:
sys.getsizeof(y)

32

# Lists

Many people, when they come to Python, call lists "arrays." This isn't accurate! They aren't arrays (in the traditional sense), because:

1. Arrays have a fixed size, set when we create them
2. All of the elements of an array must be of the same type

Neither of these is true regarding lists. We can modify them (their contents and their lengths), and we can also put any type we want, and any combination of types we want in a list.

That said: The tradition in Python is to have lists contain only one type. 

But... behind the scenes, a list is implemented as an array.  How?

1. A list's array is allocated with extra space. So when we add new elements, those spaces are used.
2. When we run out of those spaces, then a new array is allocated, with extra space there, as well.
3. Since all values in Python are referred to via pointers, we can argue that the array contains only one type, namely `PyObject *` in C.

In [71]:
mylist = [10, 20, 30]
sys.getsizeof(mylist)

88

In [72]:
mylist = []

for i in range(40):
    print(f'{i=}, {len(mylist)=}, {sys.getsizeof(mylist)=}')
    mylist.append(i)

i=0, len(mylist)=0, sys.getsizeof(mylist)=56
i=1, len(mylist)=1, sys.getsizeof(mylist)=88
i=2, len(mylist)=2, sys.getsizeof(mylist)=88
i=3, len(mylist)=3, sys.getsizeof(mylist)=88
i=4, len(mylist)=4, sys.getsizeof(mylist)=88
i=5, len(mylist)=5, sys.getsizeof(mylist)=120
i=6, len(mylist)=6, sys.getsizeof(mylist)=120
i=7, len(mylist)=7, sys.getsizeof(mylist)=120
i=8, len(mylist)=8, sys.getsizeof(mylist)=120
i=9, len(mylist)=9, sys.getsizeof(mylist)=184
i=10, len(mylist)=10, sys.getsizeof(mylist)=184
i=11, len(mylist)=11, sys.getsizeof(mylist)=184
i=12, len(mylist)=12, sys.getsizeof(mylist)=184
i=13, len(mylist)=13, sys.getsizeof(mylist)=184
i=14, len(mylist)=14, sys.getsizeof(mylist)=184
i=15, len(mylist)=15, sys.getsizeof(mylist)=184
i=16, len(mylist)=16, sys.getsizeof(mylist)=184
i=17, len(mylist)=17, sys.getsizeof(mylist)=248
i=18, len(mylist)=18, sys.getsizeof(mylist)=248
i=19, len(mylist)=19, sys.getsizeof(mylist)=248
i=20, len(mylist)=20, sys.getsizeof(mylist)=248
i=21, len(mylist)

In [73]:
mylist[10]

10

In [74]:
id(mylist)

4520534144

In [75]:
id(mylist) + 10

4520534154

In [76]:
sys.getsizeof(mylist)

376

In [77]:
mylist[0] = 'abcdefghij' * 10_000_000

In [78]:
sys.getsizeof(mylist)   # it's only giving us the size of the pointers, not the values to which the pointers are referring

376

In [79]:
mylist = [10, 20, 30]

# the general rule of thumb in Python is:
# if you invoke a method 
# and the method modifies the object
# then the method returns None, not the object

mylist = mylist.append(40)   
type(mylist)

NoneType

In [80]:
# if you want to append to a list, then just invoke append
# *not* on the right side of assignment

mylist = [10, 20, 30]
mylist.append(40)

mylist

[10, 20, 30, 40]

In [81]:
mylist.pop()

40

In [82]:
mylist

[10, 20, 30]

# Tuples

Tuples are for use as Python's structs/records. The idea is that if you have fields of different types, you'll use a tuple. When you retrieve from a database, you will often get a list of tuples -- a list, because you have many tuples of the same type, and tuples, because the fields/columns are different types.

I don't use tuples very much -- but Python does, behind the scenes, and there are many people who do.

Because tuples are immutable, they don't need the extra space that lists allocate for growth, so they're more compact than lists. They're among the most compact, most optimized data structures in Python.

When you invoke a function, the positional arguments are (conceptually) passed as a tuple, which is why `*args` is a tuple.
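
Both points can be checked directly. This is a small sketch; `show_args` is a hypothetical function, and the size comparison reflects CPython.

```python
import sys

def show_args(*args):
    # extra positional arguments arrive packed into a tuple
    return args

packed = show_args(10, 20, 30)
print(type(packed))       # <class 'tuple'>
print(packed)             # (10, 20, 30)

# compare the container overhead of a tuple and a list with the same items
print(sys.getsizeof((10, 20, 30)) < sys.getsizeof([10, 20, 30]))   # True on CPython
```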

In [83]:
person = ('Reuven', 'Lerner', 46)

In [84]:
person[0]

'Reuven'

In [85]:
person[1]

'Lerner'

In [86]:
person[2]

46

In [87]:
# I don't like this, because I don't want to have to think about/remember which numeric
# index goes with which field.

# enter named tuples!

In [88]:
from collections import namedtuple

In [93]:
# I'm going to create a new Person class, using namedtuple. The new class
# will be a subclass of tuple.

# every class needs to have a __name__ (its name, as a string). Here, we 
# provide that name as the first argument to namedtuple.

# the second argument is a string (separated by whitespace) or a list of strings,
# either way, the field names you want for your named tuple

Person = namedtuple('Person', 'first last shoesize')

In [94]:
type(Person)

type

In [95]:
Person.__bases__   # who does Person inherit from?

(tuple,)

In [96]:
# let's create a person!

p = Person('Reuven', 'Lerner', 46)

In [97]:
p[0]

'Reuven'

In [98]:
p[1]

'Lerner'

In [99]:
p[2]

46

In [100]:
p.first

'Reuven'

In [101]:
p.last

'Lerner'

In [102]:
p.shoesize

46

In [103]:
p.first = 'asdfafaf'

AttributeError: can't set attribute

In [104]:
# there is a way to change a value (sort of) on a namedtuple
# we can invoke the _replace method, passing keyword arguments with the new values.
# it returns a new namedtuple; the original (p) is unchanged

p._replace(first='asdfsafa')

Person(first='asdfsafa', last='Lerner', shoesize=46)
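
A short sketch of what "sort of" means here, along with two other helpers that every named tuple has, `_fields` and `_asdict`:

```python
from collections import namedtuple

Person = namedtuple('Person', 'first last shoesize')
p = Person('Reuven', 'Lerner', 46)

p2 = p._replace(first='Someone')   # builds and returns a new Person
print(p)      # Person(first='Reuven', last='Lerner', shoesize=46) -- unchanged
print(p2)     # Person(first='Someone', last='Lerner', shoesize=46)

p = p._replace(shoesize=47)        # to "update" p, rebind the name
print(p.shoesize)                  # 47

print(Person._fields)              # ('first', 'last', 'shoesize')
print(p._asdict())                 # a dict mapping field names to values
```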

In [105]:
# regular tuples -- a reminder

t = (10, 20, 30)
type(t)

tuple

In [106]:
t = (10, 20)
type(t)

tuple

In [107]:
t = (10)   # HUH?  
type(t)

int

In [108]:
t = ()
type(t)

tuple

In [109]:
# because we use () for so many things in Python, if we want a one-element tuple, we need
# to help Python resolve the ambiguity.

# remember that we can use () for priority in math expressions (for one)

4 + 5 * 6

34

In [110]:
(4 + 5) * 6

54

In [111]:
# for a one-element tuple, you *must* have a comma

t = (10,)
type(t)

tuple

In [112]:
# what about this?

(4 + 5,) * 6

(9, 9, 9, 9, 9, 9)

In [113]:
# do I need parentheses to create a tuple? No!

t = 10, 20, 30

type(t)

tuple

In [114]:
# tuple unpacking

mylist = [10, 20, 30]

x,y,z = mylist   # tuple unpacking, because we have a tuple of variables on the *left* -- parallel assignment

In [115]:
x

10

In [116]:
y

20

In [117]:
z

30

In [118]:
# what if we have the wrong number of variables/values?

x,y = mylist

ValueError: too many values to unpack (expected 2)

In [119]:
w,x,y,z = mylist

ValueError: not enough values to unpack (expected 4, got 3)

In [120]:
# if you want, one (just one!) of the variables in the unpacking can have a * before its name
# in that case, it's a list, containing all of the values that didn't "fit" into the other variables

mylist = [10, 20, 30, 40, 50, 60, 70]

x, *y, z = mylist

In [121]:
x

10

In [122]:
y

[20, 30, 40, 50, 60]

In [123]:
z

70

In [124]:
x,y,*z = mylist

In [125]:
z

[30, 40, 50, 60, 70]

In [126]:
x,*y,*z = mylist

SyntaxError: multiple starred expressions in assignment (605520719.py, line 1)

In [128]:
# can I create a tuple of lists?  YES!

t = ([10, 20, 30],
     [100, 200, 300])
t

([10, 20, 30], [100, 200, 300])

In [129]:
# Can I modify the lists?

t[0].append(40)
t

([10, 20, 30, 40], [100, 200, 300])

In [130]:
# what happens when I do this?

t[0] += [50, 60, 70]      # this is translated into the .__iadd__ method ("inplace add")

TypeError: 'tuple' object does not support item assignment

In [131]:
t

([10, 20, 30, 40, 50, 60, 70], [100, 200, 300])

In [132]:
# instead, you can always use the "extend" method on a list, which doesn't have this issue

t[0].extend([80, 90, 100])
t

([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], [100, 200, 300])
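
Why did `+=` raise an exception *and* change the list? Roughly speaking, `t[0] += [...]` is two steps: an in-place add on the list, followed by an assignment back into `t[0]`. The sketch below spells out those two steps by hand; it approximates what Python does rather than showing the exact bytecode.

```python
t = ([10, 20, 30], [100, 200, 300])

result = t[0].__iadd__([40])   # step 1: the list is extended in place (succeeds)

try:
    t[0] = result              # step 2: assigning back into the tuple (fails)
except TypeError as e:
    print(f'TypeError: {e}')

print(t)                       # ([10, 20, 30, 40], [100, 200, 300])
```

`extend` performs only the first step, so it never touches the tuple itself.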

In [133]:
# _ is used for throwaway variables

for _ in range(3):
    print('Hello')

Hello
Hello
Hello


In [134]:
first, *_, last = [10, 20, 30, 40, 50, 60]

In [136]:
mylist[2:5]

[30, 40, 50]

# Next up

1. Practice with named tuples (and friends)
2. Dictionaries and their variants

Resume at :55

# Exercise: Bookstore

1. Use `namedtuple` to create a `Book` class. Each instance of `Book` will have a title, author, and price.
2. Create 3-4 different instances of `Book`, and put them in a list, the inventory for a store.
3. Allow a customer to ask whether a book is in stock, by entering its title.
    - If the user enters an empty string, then stop asking and print the total cost of all books they've bought
    - If the user enters the name of a book in the inventory, print its full info and the current total
    - If the user enters the name of a book *not* in the inventory, scold the user

Example:

    What book: title1
    title1, by Author1, is 50, total is now 50
    What book: title2
    title2, by Author2, is 60, total is now 110
    

In [142]:
from collections import namedtuple

Book = namedtuple('Book', 'title author price')

b1 = Book('title1', 'author1', 50)
b2 = Book('title2', 'author1', 60)
b3 = Book('title3', 'author2', 70)
b4 = Book('title4', 'author3', 80)

inventory = [b1, b2, b3, b4]
total = 0

while True:
    s = input('Enter title: ').strip()

    if not s:  # all strings are True in a boolean context, except the empty string... this is the Pythonic way to check
        break

    found_it = False
    for one_book in inventory:
        if one_book.title == s:
            total += one_book.price
            print(f'Found {one_book.title} by {one_book.author}; cost is {one_book.price} and {total=}')
            found_it = True
            break

    if not found_it:
        print(f'Did not find {s} in our inventory')

print(f'In the end, {total=}')

Enter title:  title1


Found title1 by author1; cost is 50 and total=50


Enter title:  title9


Did not find title9 in our inventory


Enter title:  


In the end, total=50


In [138]:
b1

Book(title='title1', author='author1', price=50)

In [140]:
x = 10
y = [10, 20, 30]
z = 'hello'

print(f'{x=}, {y=}, {z=}, {len(z)=}')

x=10, y=[10, 20, 30], z='hello', len(z)=5


In [None]:
# let's remove the need for found_it as a variable -- using "for-else"

from collections import namedtuple

Book = namedtuple('Book', 'title author price')

b1 = Book('title1', 'author1', 50)
b2 = Book('title2', 'author1', 60)
b3 = Book('title3', 'author2', 70)
b4 = Book('title4', 'author3', 80)

inventory = [b1, b2, b3, b4]
total = 0

while True:
    s = input('Enter title: ').strip()

    if not s:  # all strings are True in a boolean context, except the empty string... this is the Pythonic way to check
        break

    for one_book in inventory:
        if one_book.title == s:
            total += one_book.price
            print(f'Found {one_book.title} by {one_book.author}; cost is {one_book.price} and {total=}')
            break

    else:   # else on a for loop means: run this code if you didn't encounter a break
        print(f'Did not find {s} in our inventory')

print(f'In the end, {total=}')
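
To try out `for`-`else` without typing at a prompt, the same search can be wrapped in a function that takes the titles as a list. This is only a sketch; `check_titles` is a hypothetical helper, not part of the exercise.

```python
from collections import namedtuple

Book = namedtuple('Book', 'title author price')

inventory = [Book('title1', 'author1', 50),
             Book('title2', 'author1', 60)]

def check_titles(inventory, titles):
    """Return (total, messages) for a sequence of requested titles."""
    total = 0
    messages = []
    for title in titles:
        for one_book in inventory:
            if one_book.title == title:
                total += one_book.price
                messages.append(f'Found {title}; {total=}')
                break
        else:   # runs only if the inner loop finished without a break
            messages.append(f'Did not find {title}')
    return total, messages

total, messages = check_titles(inventory, ['title1', 'title9', 'title2'])
print(total)       # 110
print(messages)
```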

In [146]:
# let's cut down the size of our "while" loop

from collections import namedtuple

Book = namedtuple('Book', 'title author price')

b1 = Book('title1', 'author1', 50)
b2 = Book('title2', 'author1', 60)
b3 = Book('title3', 'author2', 70)
b4 = Book('title4', 'author3', 80)

inventory = [b1, b2, b3, b4]
total = 0

# the := operator is the assignment expression operator
# it means: assign, and return the value as an expression
# everyone calls it the "walrus operator"
while s := input('Enter title: ').strip():

    for one_book in inventory:
        if one_book.title == s:
            total += one_book.price
            print(f'Found {one_book.title} by {one_book.author}; cost is {one_book.price} and {total=}')
            break

    else:   # else on a for loop means: run this code if you didn't encounter a break
        print(f'Did not find {s} in our inventory')

print(f'In the end, {total=}')


In [145]:
x := 5

SyntaxError: invalid syntax (4101523498.py, line 1)
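
The error above comes from using `:=` as a bare statement, which Python doesn't allow; wrapping it in parentheses works. A small sketch follows, in which `fake_input` is a made-up stand-in for repeated `input()` calls.

```python
# as a standalone statement, := must be wrapped in parentheses
(x := 5)
print(x)            # 5

# more typically, it assigns inside a larger expression, such as a loop condition
fake_input = iter(['title1', 'title2', ''])

seen = []
while s := next(fake_input).strip():   # assign to s, then test its truthiness
    seen.append(s)

print(seen)         # ['title1', 'title2']
```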

# Dictionaries

Dicts are the most important data structure in Python! Python itself uses dicts everywhere:

- Most objects store their attributes in a dict
- Every namespace (module or set of attributes) is a dict
- Every global variable is actually a key-value pair in a dict

Some ground rules for dicts:
- Every key has a value, every value has a key
- Keys must be hashable, which in practice means immutable
- Keys must be unique
- Values can be absolutely anything -- any type, any repetition, etc.
- We get values via keys, not vice versa
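
A brief sketch of the rules about keys, using a throwaway dict:

```python
d = {}

d[(10, 20)] = 'a tuple works as a key'    # tuples of immutable values are hashable

try:
    d[[10, 20]] = 'a list does not'       # lists are mutable, and thus unhashable
except TypeError as e:
    print(f'TypeError: {e}')

d['a'] = 1
d['a'] = 2                                # same key again: the value is replaced
print(len(d))                             # 2
print(d['a'])                             # 2
```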

In [148]:
d = {'a':100, 'b':200, 'c':300}

len(d)    # how many key-value pairs do we have?

3

In [149]:
d['a'] = 2345    # updating is done via assignment
d

{'a': 2345, 'b': 200, 'c': 300}

In [150]:
d['x'] = 9999   # adding a new key-value pair is done via assignment
d

{'a': 2345, 'b': 200, 'c': 300, 'x': 9999}

In [151]:
d['a']

2345

In [152]:
# what if I want to let the user specify a key?
# I could do this:

while True:
    k = input('Enter a key: ').strip()

    if not k:
        break
        
    # this pattern, where if the key exists we want the value, but if it doesn't
    # we want some default value/message, comes up all of the time
    if k in d:
        print(f'd[{k}] is {d[k]}')
    else:
        print(f'd has no key {k}')

Enter a key:  a


d[a] is 2345


Enter a key:  b


d[b] is 200


Enter a key:  c


d[c] is 300


Enter a key:  d


d has no key d


Enter a key:  e


d has no key e


Enter a key:  a


d[a] is 2345


Enter a key:  


In [153]:
# we can instead use dict.get, which is just like []
# except that if the key doesn't exist, we get None back

d.get('a')

2345

In [154]:
d.get('xyz')

In [155]:
d['xyz']

KeyError: 'xyz'
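
`dict.get` also accepts an optional second argument, which is returned instead of `None` when the key is missing. A short sketch, reusing the keys from above:

```python
d = {'a': 2345, 'b': 200, 'c': 300}

print(d.get('a'))                    # 2345
print(d.get('xyz'))                  # None
print(d.get('xyz', 'no such key'))   # no such key

# the if/else from the earlier loop collapses into a single call
for k in ['a', 'xyz']:
    print(f'd[{k}] is {d.get(k, "missing")}')
```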

In [156]:
# there's another method, the opposite of dict.get: dict.setdefault
# it only sets a value if the key is new; if the key already exists, it leaves the dict unchanged
# either way, it returns the value now stored under that key

d.setdefault('y', 123)

123

In [157]:
d

{'a': 2345, 'b': 200, 'c': 300, 'x': 9999, 'y': 123}

In [159]:
d.setdefault('y', 246)   # we get 123 back because 'y' already exists as a key, so we won't set 246 or create a new pair

123
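
Because `setdefault` always returns the stored value, one common use is grouping items under a key. The word list below is invented for the sketch.

```python
words = ['apple', 'avocado', 'banana', 'cherry', 'blueberry']

by_letter = {}
for word in words:
    # the first time we see a letter, store an empty list under it;
    # either way, setdefault returns the list stored under that key
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```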

In [160]:
d.pop('x')  # this means: remove key-value pair with 'x' as the key, returning the value

9999

In [161]:
d

{'a': 2345, 'b': 200, 'c': 300, 'y': 123}

In [163]:
# if you want to create a new dict whose keys you know and whose values are
# all the same, use dict.fromkeys

d = dict.fromkeys('abcd', 0)
d

{'a': 0, 'b': 0, 'c': 0, 'd': 0}

In [164]:
# don't pass a value, and it'll be None
d = dict.fromkeys('abcd')
d

{'a': None, 'b': None, 'c': None, 'd': None}

In [165]:
d = dict.fromkeys('abcd', [])
d

{'a': [], 'b': [], 'c': [], 'd': []}

In [166]:
d['a'].append(10)
d['b'].append(20)
d['c'].append(30)

d

{'a': [10, 20, 30], 'b': [10, 20, 30], 'c': [10, 20, 30], 'd': [10, 20, 30]}
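
What happened? `dict.fromkeys` evaluates its second argument once, so all four keys refer to the *same* list. One way around it, sketched below, is a dict comprehension, which creates a new list for each key.

```python
shared = dict.fromkeys('abcd', [])
print(shared['a'] is shared['b'])     # True: one list, four references

# a dict comprehension evaluates [] once per key, giving four separate lists
separate = {key: [] for key in 'abcd'}
separate['a'].append(10)
print(separate)                       # {'a': [10], 'b': [], 'c': [], 'd': []}
```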

In [168]:
# the = sign (assignment) does two different things in Python
# 1. assignment to a variable, where the variable then refers to a new object

x = 100
y = x

x = 200   # here, I assign a new value to x; this has no effect on y
y

100

In [169]:
# 2. mutation of an existing value

x = [10, 20, 30]
y = x   # now, both x and y refer to the same value

x[0] = '!'   # here, I'm changing the object to which x refers, which y also refers to 
y

['!', 20, 30]

In [None]:
d = dict.fromkeys('abcd')
d['a'] = 100
d['b'] = 200
d