# Agenda

1. Data structures (this morning)
    - How are core data structures implemented
    - Advanced core data structures (`Decimal` and `namedtuple`)
    - Dictionaries and their variants
2. Functions (this afternoon + Tuesday morning)
    - Functions as nouns, not just verbs -- function objects, and how they work
    - Attributes on function objects
    - Bytecodes
    - How are arguments mapped to parameters when we call a function? (positional and keyword arguments)
    - Special parameter types (`*args` and `**kwargs`), keyword only, defaults
    - Variable scoping (LEGB)
    - The enclosing scope -- closures, inner functions, and why we want them
    - Type hints/annotations, and what they are (and aren't)
    - Dispatch tables
3. Functional programming in Python (Tuesday afternoon)
    - Comprehensions (list, set, dict comprehensions -- including nested comprehensions)
    - Functions as arguments to other functions
    - `lambda` and its friends
4. Modules and packages (Tuesday afternoon)
5. Objects (Wednesday)
    - Classes
    - Methods
    - Instances
    - Attributes -- one of the most important things in all of Python (ICPO rule for attribute lookup)
    - Magic methods
    - Properties
    - Descriptors
    - Methods vs. functions -- how are they different, and how does `self` work?
6. Iterators and generators (Thursday morning)
    - Making your class iterable
    - Generator functions
    - Generator comprehensions (aka generator expressions)
7. Decorators (Thursday)
8. Concurrency
    - Threads and processes (multiprocessing)
    - `asyncio`, and how it works (and doesn't)
    - Where is this all going in the Python world?

In [1]:
# gitautopush -- on PyPI


# Assignment in Python

In a language like C, a variable is an alias to a location in memory. So when I assign to a variable, the value is being put in a particular location in memory. This is why we need to (a) declare our variables as having a certain type and (b) only certain types can be assigned to certain variables.

In Python, variables refer to values.  Variables are *not* memory locations! All values are objects, and all objects are on the heap. A variable is just a *pointer* to one of those values. This is why any Python variable can refer to any Python value, and why we don't (and cannot) declare our variables to be a particular type.

This is the definition of a *dynamic language*.  It doesn't mean that values don't have types. It just means that variables don't have types.

In [2]:
x = 5    # when we assign, we're saying that the variable on the left should refer to the value on the right

In [3]:
type(x)

int

In [4]:
# there is no way in Python for one variable to refer to another variable

x = 5
y = x    # this doesn't mean that y "follows x around." Rather, it says that y refers to whatever x currently refers to

y

5

In [5]:
x = 10  # we're reassigned x

y

5

In [6]:
# if the value is mutable, then things get stickier

x = [10, 20, 30]
y = x

x[0] = '!'   # I have modified the list to which x refers .. which is also the list to which y refers!
y

['!', 20, 30]

In [7]:
mylist = [10, 20, 30]
mylist.append(mylist)

mylist

[10, 20, 30, [...]]

In [8]:
del(mylist)   # delete the variable

In [9]:
x = None

In [10]:
type(x)    

NoneType

`None` exists so that we can say that we have no value, and make that distinct from 0, `False`, empty string, etc. It's its own value. In a boolean context (i.e., in an `if` statement), it is considered `False`. But if you check whether `None` is the same as something else, it isn't.

Where do we use `None`?

- A function that doesn't explicitly return a value returns `None`
- Many times, default argument values in functions have a value of `None`
- You'll see it for default attributes in objects, too

In [11]:
None == None

True

In [12]:
None == False

False

In [13]:
None == 0

False

In [15]:
# how can I check to see if something is None?

x = None

if x == None:   # unfortunately, this works -- it isn't Pythonic
    print('Yes! It is None!')

Yes! It is None!


`None` is a singleton object; every `None` in Python is not only equal to every other one, but it is the exact same object.

In [16]:
id(None)   # this returns the unique object number

4466115072

In [17]:
id(None)

4466115072

In [18]:
new_none = type(None)()

In [19]:
id(new_none)

4466115072

In [20]:
# According to PEP 8, the Python style guide, we shouldn't use "==" on singletons, especially with None.
# Rather, we should check the identity of the object with the "is" operator.

# "is" doesn't check whether two things are equal. It checks whether their ids are equal

In [21]:
id(None) == id(new_none)

True

In [22]:
# we can say the same thing, much better:

None is new_none

True

In [25]:
ni = type(NotImplemented)()

In [26]:
NotImplemented is ni

True

In [27]:
id(5)  # yes, you get back a unique ID of the object... which happens to be its address in memory!

4466981456

In [28]:
f = open('/etc/passwd')

f

<_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>

# What does `type` do?

It actually does *two* things:

1. If you give it just one argument, it returns the type of the object, basically the same thing that you would get from the object's `__class__` attribute.
2. If you give it three arguments, then you get back a new class which you have created. This is pretty rare.

In [29]:
type('abcd')   # this will return 'abcd'.__class__

str

In [30]:
x = 100
y = 100

x == y    # are these the same value?

True

In [31]:
x is y    # are these the same object in memory?

True

In [32]:
x = 1000
y = 1000

x == y

True

In [33]:
x is y

False

Python knows that we'll be using a lot of small integers, and thus creates -- when it starts up -- all of the integers from -5 to 256. So any time you use one of those integers, Python just grabs the object it already has available. Thus, these small integers will always be `is` to each other.

But once you get into larger integers, that's no longer the case.

In [34]:
x = 'abcd'
y = 'abcd'

x == y

True

In [35]:
x is y

True

In [36]:
x = 'abcd' * 10_000
y = 'abcd' * 10_000

x == y

True

In [37]:
x is y

False

In [38]:
x = 'ab.cd'
y = 'ab.cd'

x == y

True

In [39]:
x is y

False

What's going on?

When we assign to a variable in Python, that variable name is turned into a string, and is then used as the key in an internal dict to store our value. This means that every time we store or retrieve a variable's value, we're creating a new string.

Python's solution to this is that any short string (I think < 500 characters) that only contains characters that are legal in an identifier are cached. This means that the first time we see such a string, it's really created. The second and next times, we just reuse the same string.

- If the string is long, then this caching doesn't happen
- If the string contains `.` (or some other illegal character in an identifier), then it doesn't either.

This is transparent to us, but if you use `is` to compare strings, you'll discover it.

In [40]:
x = 100

globals()['x']  # retrieves the value of x

100

In [42]:
globals()['x'] = 9876

In [43]:
x

9876

Only use `is` to compare with `None`. Otherwise, use `==`.



In [45]:
# False is a singleton, too
bool(0) is bool(0)

True

In [46]:
# True is a singleton, too
bool(1) is bool(1)

True

# Integers

Many people new to Python ask: What is the biggest integer we're allowed? Or how many bits are our integers?

This is the *wrong* question to ask! Because integers are objects; they run themselves, and manage their own memory. Integers can be as big as you want, so long as you don't run out of memory.

In [47]:
import sys   

sys.getsizeof(0)   # how many bytes does something in Python take up?

28

In [51]:
sys.getsizeof(10_000_000_000_000_000)

32

In [52]:
x = 10_000_000_000_000_000
x = x ** 1000

In [53]:
sys.getsizeof(x)

7112

In [55]:
x = x ** 100

In [56]:
sys.getsizeof(x)

708704

In [57]:
# floats

0.1 + 0.2 

0.30000000000000004

In [58]:
# what if we could keep our number in decimal, and never go to binary?
# we can trade off longer execution and more memory with having greater precision

from decimal import Decimal 

x = Decimal('0.1')
y = Decimal('0.2')

x + y

Decimal('0.3')

In [59]:
float(x+y)

0.3

In [60]:
# another solution to the float problem: Use the builtin round() function

round(0.1 + 0.2, 2)  # round things off after 2 digits past the decimal point

0.3

In [61]:
# another solution: use ints!

In [62]:
sys.getsizeof(0.1)

24

In [63]:
sys.getsizeof(1234567890.1234567890)

24

In [64]:
sys.getsizeof(x)

104

In [None]:
# teraflops  -- trillions of floating point operations per second

In [65]:
x = 12345.6789
y = 98765.4321

%timeit x * y

26.1 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [66]:
x = Decimal('12345.6789')
y = Decimal('98765.4321')

%timeit x * y

74.1 ns ± 2.01 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [68]:
# Python does have a third numeric type: Complex!

x = 20+3j
y = 15-8j

x + y

(35-5j)

In [69]:
sys.getsizeof(x)

32

In [70]:
sys.getsizeof(y)

32

# Lists

Many people, when they come to Python, call lists "arrays." This is not true! They aren't arrays, because:

1. Arrays have a fixed size, set when we create them
2. All of the elements of an array must be of the same type

Neither of these is true regarding lists. We can modify them (their contents and their lengths), and we can also put any type we want, and any combination of types we want in a list.

That said: The tradition in Python is to have lists contain only one type. 

But... behind the scenes, a list is implemented as an array.  How?

1. A list's array is allocated with extra space. So when we add new elements, those spaces are used.
2. When we run out of those spaces, then a new array is allocated, with extra space there, as well.
3. Since all values in Python are referred to via pointers, we can argue that the array contains only one type, namely `*PyObject` in C.

In [71]:
mylist = [10, 20, 30]
sys.getsizeof(mylist)

88

In [72]:
mylist = []

for i in range(40):
    print(f'{i=}, {len(mylist)=}, {sys.getsizeof(mylist)=}')
    mylist.append(i)

i=0, len(mylist)=0, sys.getsizeof(mylist)=56
i=1, len(mylist)=1, sys.getsizeof(mylist)=88
i=2, len(mylist)=2, sys.getsizeof(mylist)=88
i=3, len(mylist)=3, sys.getsizeof(mylist)=88
i=4, len(mylist)=4, sys.getsizeof(mylist)=88
i=5, len(mylist)=5, sys.getsizeof(mylist)=120
i=6, len(mylist)=6, sys.getsizeof(mylist)=120
i=7, len(mylist)=7, sys.getsizeof(mylist)=120
i=8, len(mylist)=8, sys.getsizeof(mylist)=120
i=9, len(mylist)=9, sys.getsizeof(mylist)=184
i=10, len(mylist)=10, sys.getsizeof(mylist)=184
i=11, len(mylist)=11, sys.getsizeof(mylist)=184
i=12, len(mylist)=12, sys.getsizeof(mylist)=184
i=13, len(mylist)=13, sys.getsizeof(mylist)=184
i=14, len(mylist)=14, sys.getsizeof(mylist)=184
i=15, len(mylist)=15, sys.getsizeof(mylist)=184
i=16, len(mylist)=16, sys.getsizeof(mylist)=184
i=17, len(mylist)=17, sys.getsizeof(mylist)=248
i=18, len(mylist)=18, sys.getsizeof(mylist)=248
i=19, len(mylist)=19, sys.getsizeof(mylist)=248
i=20, len(mylist)=20, sys.getsizeof(mylist)=248
i=21, len(mylist)

In [73]:
mylist[10]

10

In [74]:
id(mylist)

4520534144

In [75]:
id(mylist) + 10

4520534154

In [76]:
sys.getsizeof(mylist)

376

In [77]:
mylist[0] = 'abcdefghij' * 10_000_000

In [78]:
sys.getsizeof(mylist)   # it's only giving us the size of the pointers, not the values to which the pointers are referring

376

In [79]:
mylist = [10, 20, 30]

# the general rule of thumb in Python is:
# if you invoke a method 
# and the method modifies the object
# then the method returns None, not the object

mylist = mylist.append(40)   
type(mylist)

NoneType

In [80]:
# if you want to append to a list, then just invoke append
# *not* on the right side of assignment

mylist = [10, 20, 30]
mylist.append(40)

mylist

[10, 20, 30, 40]

In [81]:
mylist.pop()

40

In [82]:
mylist

[10, 20, 30]

# Tuples

Tuples are for use as Python's structs/records. The idea is that if you have fields of different types, you'll use a tuple. When you retrieve from a database, you will often get a list of tuples -- a list, because you have many tuples of the same type, and tuples, because the fields/columns are different types.

I don't use tuples very much -- but Python does, behind the scenes, and there are many people who do.

Because tuples are immutable, they don't need that extra space in a list. So they're more compact than lists. They're by far the most optimized/smallest data structure in Python.

When you invoke a function, the arguments are passed as a tuple.

In [83]:
person = ('Reuven', 'Lerner', 46)

In [84]:
person[0]

'Reuven'

In [85]:
person[1]

'Lerner'

In [86]:
person[2]

46

In [87]:
# I don't like this, because I don't want to have think about/remember which numeric
# index goes with which field.

# enter named tuples!

In [88]:
from collections import namedtuple

In [93]:
# I'm going to create a new Person class, using namedtuple. The new class
# will be a subclass of tuple.

# every class needs to have a __name__ (its name, as a string). Here, we 
# provide that name as the first argument to namedtuple.

# the second argument is a string (separated by whitespace) or a list of strings,
# either way, the field names you want for your named tuple

Person = namedtuple('Person', 'first last shoesize')

In [94]:
type(Person)

type

In [95]:
Person.__bases__   # who does Person inherit from?

(tuple,)

In [96]:
# let's create a person!

p = Person('Reuven', 'Lerner', 46)

In [97]:
p[0]

'Reuven'

In [98]:
p[1]

'Lerner'

In [99]:
p[2]

46

In [100]:
p.first

'Reuven'

In [101]:
p.last

'Lerner'

In [102]:
p.shoesize

46

In [103]:
p.first = 'asdfafaf'

AttributeError: can't set attribute

In [104]:
# there is a way to change a value (sort of) on a namedtuple
# we can invoke the _replace method, passing keyword arguments with the new values.

p._replace(first='asdfsafa')

Person(first='asdfsafa', last='Lerner', shoesize=46)

In [105]:
# regular tuples -- a reminder

t = (10, 20, 30)
type(t)

tuple

In [106]:
t = (10, 20)
type(t)

tuple

In [107]:
t = (10)   # HUH?  
type(t)

int

In [108]:
t = ()
type(t)

tuple

In [109]:
# because we use () for so many things in Python, if we want a one-element tuple, we need
# to help Python resolve the ambiguity.

# remember that we can use () for priority in math expressions (for one)

4 + 5 * 6

34

In [110]:
(4 + 5) * 6

54

In [111]:
# for a one-element tuple, you *must* have a comma

t = (10,)
type(t)

tuple

In [112]:
# what about this?

(4 + 5,) * 6

(9, 9, 9, 9, 9, 9)

In [113]:
# do I need parentheses to create a tuple? No!

t = 10, 20, 30

type(t)

tuple

In [114]:
# tuple unpacking

mylist = [10, 20, 30]

x,y,z = mylist   # tuple unpacking, because we have a tuple of variables on the *left* -- parallel assignment

In [115]:
x

10

In [116]:
y

20

In [117]:
z

30

In [118]:
# what if we have the wrong number of variables/values?

x,y = mylist

ValueError: too many values to unpack (expected 2)

In [119]:
w,x,y,z = mylist

ValueError: not enough values to unpack (expected 4, got 3)

In [120]:
# if you want, one (just one!) of the variables in the unpacking can have a * before its name
# in that case, it's a list, containing all of the values that didn't "fit" into the other variables

mylist = [10, 20, 30, 40, 50, 60, 70]

x, *y, z = mylist

In [121]:
x

10

In [122]:
y

[20, 30, 40, 50, 60]

In [123]:
z

70

In [124]:
x,y,*z = mylist

In [125]:
z

[30, 40, 50, 60, 70]

In [126]:
x,*y,*z = mylist

SyntaxError: multiple starred expressions in assignment (605520719.py, line 1)

In [128]:
# can I create a tuple of lists?  YES!

t = ([10, 20, 30],
     [100, 200, 300])
t

([10, 20, 30], [100, 200, 300])

In [129]:
# Can I modify the lists?

t[0].append(40)
t

([10, 20, 30, 40], [100, 200, 300])

In [130]:
# what happens when I do this?

t[0] += [50, 60, 70]      # this is translated into the .__iadd__ method ("inplace add")

TypeError: 'tuple' object does not support item assignment

In [131]:
t

([10, 20, 30, 40, 50, 60, 70], [100, 200, 300])

In [132]:
# instead, you can always use the "extend" method on a list, which doesn't have this issue

t[0].extend([80, 90, 100])
t

([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], [100, 200, 300])

In [133]:
# _ is used for throwaway variables

for _ in range(3):
    print('Hello')

Hello
Hello
Hello


In [134]:
first, *_, last = [10, 20, 30, 40, 50, 60]

In [136]:
mylist[2:5]

[30, 40, 50]

# Next up

1. Practice with named tuples (and friends)
2. Dictionaries and their variants

Resume at :55

# Exercise: Bookstore

1. Use `namedtuple` to create a class of `Book`. Each instance of book will have a title, author, and price.
2. Create 3-4 different instances of `Book`, and put them on a list, the inventory for a store.
3. Allow a customer to ask whether a book is in stock, by entering its title.
    - If the user enters an empty string, then stop asking and print the title cost of all books they've bought
    - If the user enters the name of a book in the inventory, print its full info and the current total
    - If the user enters the name of a book *not* in the inventory, scold the user

Example:

    What book: title1
    title 1, by Author1, is 50, total is now 50
    What book: title2
    title 2, by Author 2, is 60, total is now 110
    

In [142]:
from collections import namedtuple

Book = namedtuple('Book', 'title author price')

b1 = Book('title1', 'author1', 50)
b2 = Book('title2', 'author1', 60)
b3 = Book('title3', 'author2', 70)
b4 = Book('title4', 'author3', 80)

inventory = [b1, b2, b3, b3]
total = 0

while True:
    s = input('Enter title: ').strip()

    if not s:  # all strings are True in a boolean context, except the empty string... this is the Pythonic way to check
        break

    found_it = False
    for one_book in inventory:
        if one_book.title == s:
            total += one_book.price
            print(f'Found {one_book.title} by {one_book.author}; cost is {one_book.price} and {total=}')
            found_it = True
            break

    if not found_it:
        print(f'Did not find {s} in our inventory')

print(f'In the end, {total=}')

Enter title:  title1


Found title1 by author1; cost is 50 and total=50


Enter title:  title9


Did not find title9 in our inventory


Enter title:  


In the end, total=50


In [138]:
b1

Book(title='title1', author='author1', price=50)

In [140]:
x = 10
y = [10, 20, 30]
z = 'hello'

print(f'{x=}, {y=}, {z=}, {len(z)=}')

x=10, y=[10, 20, 30], z='hello', len(z)=5


In [None]:
# let's remove the need for found_it as a variable -- using "for-else"

from collections import namedtuple

Book = namedtuple('Book', 'title author price')

b1 = Book('title1', 'author1', 50)
b2 = Book('title2', 'author1', 60)
b3 = Book('title3', 'author2', 70)
b4 = Book('title4', 'author3', 80)

inventory = [b1, b2, b3, b3]
total = 0

while True:
    s = input('Enter title: ').strip()

    if not s:  # all strings are True in a boolean context, except the empty string... this is the Pythonic way to check
        break

    for one_book in inventory:
        if one_book.title == s:
            total += one_book.price
            print(f'Found {one_book.title} by {one_book.author}; cost is {one_book.price} and {total=}')
            break

    else:   # else on a for loop means: run this code if you didn't encounter a break
        print(f'Did not find {s} in our inventory')

print(f'In the end, {total=}')

In [146]:
# let's cut down the size of our "while" loop

from collections import namedtuple

Book = namedtuple('Book', 'title author price')

b1 = Book('title1', 'author1', 50)
b2 = Book('title2', 'author1', 60)
b3 = Book('title3', 'author2', 70)
b4 = Book('title4', 'author3', 80)

inventory = [b1, b2, b3, b3]
total = 0

# the := operator is the assignment expression operator
# it means: assign, and return the value as expression
# everyone calls it the "walrus operator"
while s := input('Enter title: ').strip():

    for one_book in inventory:
        if one_book.title == s:
            total += one_book.price
            print(f'Found {one_book.title} by {one_book.author}; cost is {one_book.price} and {total=}')
            break

    else:   # else on a for loop means: run this code if you didn't encounter a break
        print(f'Did not find {s} in our inventory')

print(f'In the end, {total=}')

SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='? (3905101370.py, line 18)

In [145]:
x := 5

SyntaxError: invalid syntax (4101523498.py, line 1)

# Dictionaries

Dicts are the most important data structure in Python! Python itself uses dicts everywhere:

- Every object is a dict
- Every namespace (module or set of attributes) is a dict
- Every variable is actually a key-value pair in a dict

Some ground rules for dicts:
- Every key has a value, every value has a key
- Keys must be immutable
- Keys must be unique
- Values can be absolutely anything -- any type, any repetition, etc.
- We get values via keys, not vice versa

In [148]:
d = {'a':100, 'b':200, 'c':300}

len(d)    # how many name-value pairs do we have?

3

In [149]:
d['a'] = 2345    # updating is done via assignment
d

{'a': 2345, 'b': 200, 'c': 300}

In [150]:
d['x'] = 9999   # adding a new key-value pair is done via assignment
d

{'a': 2345, 'b': 200, 'c': 300, 'x': 9999}

In [151]:
d['a']

2345

In [152]:
# what if I want to let the user specify a key?
# I could do this:

while True:
    k = input('Enter a key: ').strip()

    if not k:
        break
        
    # these lines, where if the key exists, we want the value but if it doesn't,
    # then we want some default value/message, happens all of the time
    if k in d:
        print(f'd[{k}] is {d[k]}')
    else:
        print(f'd has no key {k}')

Enter a key:  a


d[a] is 2345


Enter a key:  b


d[b] is 200


Enter a key:  c


d[c] is 300


Enter a key:  d


d has no key d


Enter a key:  e


d has no key e


Enter a key:  a


d[a] is 2345


Enter a key:  


In [153]:
# we can instead use dict.get, which is just like []
# except that if the key doesn't exist, we get None back

d.get('a')

2345

In [154]:
d.get('xyz')

In [155]:
d['xyz']

KeyError: 'xyz'

In [156]:
# there's another method, the opposite of dict.get, dict.setdefault
# it only sets a value if the key is new; if the key already exists, then it does nothing

d.setdefault('y', 123)

123

In [157]:
d

{'a': 2345, 'b': 200, 'c': 300, 'x': 9999, 'y': 123}

In [159]:
d.setdefault('y', 246)   # we get 123 back because 'y' already exists as a key, so we won't set 246 or create a new pair

123

In [160]:
d.pop('x')  # this means: remove key-value pair with 'x' as the key, returning the value

9999

In [161]:
d

{'a': 2345, 'b': 200, 'c': 300, 'y': 123}

In [163]:
# if you want to create a new dict whose keys you know and whose values are
# all the same, use dict.fromkeys

d = dict.fromkeys('abcd', 0)
d

{'a': 0, 'b': 0, 'c': 0, 'd': 0}

In [164]:
# don't pass a value, and it'll be None
d = dict.fromkeys('abcd')
d

{'a': None, 'b': None, 'c': None, 'd': None}

In [165]:
d = dict.fromkeys('abcd', [])
d

{'a': [], 'b': [], 'c': [], 'd': []}

In [166]:
d['a'].append(10)
d['b'].append(20)
d['c'].append(30)

d

{'a': [10, 20, 30], 'b': [10, 20, 30], 'c': [10, 20, 30], 'd': [10, 20, 30]}

In [168]:
# the = sign (assignment) does two different things in Python
# 1. assignment to a variable, where the variable then refers to a new object

x = 100
y = x

x = 200   # here, I assign a new value to x; this has no effect on y
y

100

In [169]:
# 2. mutation of an existing value

x = [10, 20, 30]
y = x   # now, both x and y refer to the same value

x[0] = '!'   # here, I'm changing the object to which x refers, which y also refers to 
y

['!', 20, 30]

In [170]:
d = dict.fromkeys('abcd')
d['a'] = 100
d['b'] = 200
d

{'a': 100, 'b': 200, 'c': None, 'd': None}

In [171]:
d = dict.fromkeys('abcd', [])
d['a'].append(100)  # I'm not assigning a new value to a; I'm mutating the list to which it refers
d['b'].append(200)  # again, I'm mutating the value to which d['b'] refers

d

{'a': [100, 200], 'b': [100, 200], 'c': [100, 200], 'd': [100, 200]}

In [172]:
d = dict.fromkeys('these are my keys for the example'.split(), 9999)

In [173]:
d


{'these': 9999,
 'are': 9999,
 'my': 9999,
 'keys': 9999,
 'for': 9999,
 'the': 9999,
 'example': 9999}

In [174]:
val = 1000
d = dict.fromkeys('abcd', val)
d

{'a': 1000, 'b': 1000, 'c': 1000, 'd': 1000}

In [176]:
val = 20
d

{'a': 1000, 'b': 1000, 'c': 1000, 'd': 1000}

In [177]:
d['b'] = 99
d

{'a': 1000, 'b': 99, 'c': 1000, 'd': 1000}

In [179]:
# how can I search in a dict?
# If I want to know if a key is in a dict, I can use "in"
# "in" only checks the keys!

'a' in d

True

In [181]:
# could I do this:

'a' in d.keys()   # this gives exactly the same answer, but much more slowly

True

In [182]:
%timeit 'a' in d

39.7 ns ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [183]:
%timeit 'a' in d.keys()

92.8 ns ± 4.2 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [184]:
len(globals())

364

In [186]:
%timeit 'x' in globals()

56 ns ± 2.29 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [187]:
%timeit 'x' in globals().keys()

109 ns ± 1.95 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [188]:
d.keys()

dict_keys(['a', 'b', 'c', 'd'])

# Exercise: Rainfall

1. Create an empty dict; it'll eventually contain keys (strings, names of cities) and values (lists of integers).
2. Ask the user, repeatedly, to enter the name of a city.
    - If they enter the empty string, then stop asking.
3. If you got the name of a city, ask for the rainfall in mm.
    - Check that you got something reasonable here
4. If this is the first time you're seeing a city name, then add the key-value pair (city name and mm rain in a list) to the dict.
5. If this is not the first time, then just add the mm rain to the end of the existing list.
6. When the user enters an empty string to exit from entering data, go through each city and print the city name, the total rainfall, and the mean rainfall.

Example:

    City: a
    Rainfall: 5
    City: b
    Rainfall: 4
    City: a
    Rainfall: 3
    City: [ENTER]
    a: total 8, mean 4
    b: total 4, mean 4

In [189]:
rainfall = {}

while True:
    city_name = input('City: ').strip()

    if not city_name:
        break

    mm_rain_s = input('Rain: ').strip()
    mm_rain = int(mm_rain_s)

    # if there is no city_name key, add it with an empty list
    rainfall.setdefault(city_name, [])
    rainfall[city_name].append(mm_rain)

for key, value in rainfall.items():
    print(f'{key}: total {sum(value)}, mean {sum(value)/len(value)}')

City:  a
Rain:  5
City:  b
Rain:  4
City:  a
Rain:  3
City:  b
Rain:  8
City:  c
Rain:  2
City:  


a: total 8, mean 4.0
b: total 12, mean 6.0
c: total 2, mean 2.0


In [190]:
rainfall

{'a': [5, 3], 'b': [4, 8], 'c': [2]}

In [192]:
# how does it work to iterate over dict.items? 

for one_thing in rainfall.items():
    key, value = one_thing    # tuple unpacking
    print(f'{key}: {value}')

a: [5, 3]
b: [4, 8]
c: [2]


In [193]:
for key, value in rainfall.items():
    print(f'{key}: {value}')

a: [5, 3]
b: [4, 8]
c: [2]


In [194]:
# what if we want to number the key-value pairs, too?
# we can use enumerate, which returns a tuple of (index, value)

for one_thing in enumerate(rainfall.items()):
    print(one_thing)

(0, ('a', [5, 3]))
(1, ('b', [4, 8]))
(2, ('c', [2]))


In [196]:
for index, (key, value) in enumerate(rainfall.items(), 1):
    print(f'{index}: {key}, total {sum(value)}, mean {sum(value)/len(value)}')

1: a, total 8, mean 4.0
2: b, total 12, mean 6.0
3: c, total 2, mean 2.0


In [197]:
# how can we be sure that we got integers?

rainfall = {}

while True:
    city_name = input('City: ').strip()

    if not city_name:
        break

    mm_rain_s = input('Rain: ').strip()
    if not mm_rain.isdigit():
        print(f'Not numeric; try again')
        continue
    mm_rain = int(mm_rain_s)

    # if there is no city_name key, add it with an empty list
    rainfall.setdefault(city_name, [])
    rainfall[city_name].append(mm_rain)

for key, value in rainfall.items():
    print(f'{key}: total {sum(value)}, mean {sum(value)/len(value)}')

City:  a
Rain:  none today


ValueError: invalid literal for int() with base 10: 'none today'

In [198]:
# how can we be sure that we got integers?

rainfall = {}

while city_name := input('City: ').strip()

    mm_rain_s = input('Rain: ').strip()
    try:
        mm_rain = int(mm_rain_s)
    except ValueError:
        print(f'{mm_rain_s} is not numeric; try again')
        continue

    # if there is no city_name key, add it with an empty list
    rainfall.setdefault(city_name, [])
    rainfall[city_name].append(mm_rain)

for key, value in rainfall.items():
    print(f'{key}: total {sum(value)}, mean {sum(value)/len(value)}')

City:  a
Rain:  5
City:  a
Rain:  -3
City:  


a: total 2, mean 1.0


In [None]:
# Pydantic 

# Next up

1. How are dicts implemented? (Old and new)
2. Some dict variants
3. Functions
    - Arguments and parameters
    - Scoping
  
Resume at 13:30 Paris Time

# Dicts

For a long time, we knew two things about dicts:

1. They used a ton of memory
2. You didn't know in what order the keys and values were stored



In [199]:
d = {}
d['a'] = 10

In [201]:
# to know where we put this key-value pair ('a' - 10), we run hash on the key
hash('a') % 8

3

In [202]:
'a' in d

True

In [203]:
d['a']

10

In [204]:
d['b'] = 20
hash('b') % 8

3

In [205]:
'b' in d

True

In [206]:
d['c'] = 30
hash('c') % 8

4

In [207]:
hash('d') % 8

2

In [208]:
d = {}
d['a'] = 10

In [209]:
hash('a') % 8

3

In [210]:
d['b'] = 20
hash('b') % 8

3

In [211]:
d['c'] = 30
hash('c') % 8

4

In [212]:
# variants on dictionaries
# first: defaultdict

# if the key isn't in the dict, then you can get a default value back, rather than a KeyError exception

from collections import defaultdict

d = defaultdict(0)

TypeError: first argument must be callable or None

In [213]:
# the argument to defaultdict is a function/class (not calling it, passing it)
# when we request a value based on a key, if the key doesn't yet exist, then the 
# function is invoked, the value is set, and then is returned

d = defaultdict(int)    # if I call int(), I get 0

d['a']    # I asked for d['a'], but 'a' wasn't a key. So defaultdict said d['a'] = int()  

0

In [214]:
d

defaultdict(int, {'a': 0})

In [215]:
d['a']

0

In [216]:
# if I ask for d['b'], and there is no key 'b', then 
# defaultdict will set d['b'] = int(), and then return d['b']

d['b']

0

In [217]:
d

defaultdict(int, {'a': 0, 'b': 0})

In [219]:
# any function/class without arguments can be passed

import time
time.time()   # number of seconds since 1.1.1970 at midnight

1721649493.017952

In [220]:
d = defaultdict(time.time)  # notice, we don't invoke the function! We let defaultdict do that for us

d['a']

1721649509.7191029

In [221]:
d['b']

1721649511.635984

In [222]:
d['c']

1721649513.359512

In [223]:
d

defaultdict(<function time.time>,
            {'a': 1721649509.7191029,
             'b': 1721649511.635984,
             'c': 1721649513.359512})

In [224]:
d['username']

1721649532.289632

In [225]:
d

defaultdict(<function time.time>,
            {'a': 1721649509.7191029,
             'b': 1721649511.635984,
             'c': 1721649513.359512,
             'username': 1721649532.289632})

In [226]:
# now I have an auto-growing two-level tree structure

d = defaultdict(dict)


In [227]:
d['a']['b'] = 100

In [228]:
d['x']['y'] = 200
d['a']['c'] = 300
d

defaultdict(dict, {'a': {'b': 100, 'c': 300}, 'x': {'y': 200}})

In [229]:
# rainfall with defaultdict

from collections import defaultdict

rainfall = defaultdict(list)

while True:
    city_name = input('City: ').strip()

    if not city_name:
        break

    mm_rain_s = input('Rain: ').strip()
    if not mm_rain.isdigit():
        print(f'Not numeric; try again')
        continue
    mm_rain = int(mm_rain_s)

    rainfall[city_name].append(mm_rain)

for key, value in rainfall.items():
    print(f'{key}: total {sum(value)}, mean {sum(value)/len(value)}')

City:  


In [230]:
d = {'a':10, 'b':20, 'c':30}

d['q']

KeyError: 'q'

# Exercise: Travel

The goal is to create a dict in which the keys are countries you've visited, and the values are lists of cities you've visited in each country.

1. Ask the user to enter a place they've visited in the form of `city, country`
2. Break that apart, and make sure that the country is the dict key, and the city is added to the list for that country.
    - If the user enters bad data, scold them and let them try again
    - If the user enters an empty string, stop asking
3. Iterate over the dict, printing each country and then each of the cities for that country.

Example:

    Enter a place: Chicago, USA
    Enter a place: Boston, USA
    Enter a place: Shanghai, China
    Enter a place: Beijing, China
    Enter a place: [ENTER]

    China
        Shanghai
        Beijing
    USA
        Chicago
        Boston

In [231]:
from collections import defaultdict

all_places = defaultdict(lixst)

while s := input('Enter a place: ').strip():

    if s.count(',') != 1:
        print(f'Use format of "CITY, COUNTRY"')
        continue

    city, country = s.split(',')
    all_places[country.strip()].append(city.strip())

for country, all_cities in all_places.items():
    print(country)
    for one_city in all_cities:
        print(f'\t{one_city}')

Enter a place:  Chicago, USA
Enter a place:  Boston, USA
Enter a place:  Beijing, China
Enter a place:  Shanghai, China
Enter a place:  


USA
	Chicago
	Boston
China
	Beijing
	Shanghai


In [232]:
# Counter

from collections import Counter

c = Counter()
c['a'] += 5
c['b'] += 8
c['c'] += 3
c['b'] += 2

c

Counter({'b': 10, 'a': 5, 'c': 3})

In [233]:
# the idea of Counter is to pass it an iterable (string, list, tuple, etc.)
# Counter will count how often each element appears in the iterable
# the elements will become the keys, and the counts will be the values

c = Counter([10, 20, 30, 20, 30, 40, 20, 30, 40, 50, 20, 30, 40, 50, 60])
c

Counter({20: 4, 30: 4, 40: 3, 50: 2, 10: 1, 60: 1})

In [234]:
c = Counter('this is a bunch of text and I wish I knew which letters appeared most often')

In [235]:
c

Counter({' ': 15,
         't': 7,
         'e': 7,
         'h': 5,
         's': 5,
         'i': 4,
         'a': 4,
         'n': 4,
         'o': 3,
         'w': 3,
         'c': 2,
         'f': 2,
         'd': 2,
         'I': 2,
         'r': 2,
         'p': 2,
         'b': 1,
         'u': 1,
         'x': 1,
         'k': 1,
         'l': 1,
         'm': 1})

In [236]:
c.most_common()

[(' ', 15),
 ('t', 7),
 ('e', 7),
 ('h', 5),
 ('s', 5),
 ('i', 4),
 ('a', 4),
 ('n', 4),
 ('o', 3),
 ('w', 3),
 ('c', 2),
 ('f', 2),
 ('d', 2),
 ('I', 2),
 ('r', 2),
 ('p', 2),
 ('b', 1),
 ('u', 1),
 ('x', 1),
 ('k', 1),
 ('l', 1),
 ('m', 1)]

In [237]:
c.most_common(5)

[(' ', 15), ('t', 7), ('e', 7), ('h', 5), ('s', 5)]

In [238]:
c += c

In [239]:
c

Counter({' ': 30,
         't': 14,
         'e': 14,
         'h': 10,
         's': 10,
         'i': 8,
         'a': 8,
         'n': 8,
         'o': 6,
         'w': 6,
         'c': 4,
         'f': 4,
         'd': 4,
         'I': 4,
         'r': 4,
         'p': 4,
         'b': 2,
         'u': 2,
         'x': 2,
         'k': 2,
         'l': 2,
         'm': 2})

In [240]:
c = Counter()

for one_line in open('/Users/reuven/Courses/Current/Data/alice-in-wonderland.txt'):
    c += Counter(one_line)

c.most_common(20)

[(' ', 12387),
 ('e', 6671),
 ('t', 5152),
 ('o', 4242),
 ('a', 4058),
 ('i', 3547),
 ('n', 3530),
 ('h', 3122),
 ('r', 3111),
 ('s', 3057),
 ('d', 2245),
 ('l', 2221),
 ('\n', 1702),
 ('u', 1634),
 ('c', 1435),
 ('g', 1230),
 ('w', 1165),
 ('m', 1064),
 ('f', 1061),
 ('y', 985)]

In [243]:
for key, value in c.items():
    print(f'{key}: {value}')

﻿: 1
T: 241
h: 3122
e: 6671
 : 12387
P: 140
r: 3111
o: 4242
j: 129
c: 1435
t: 5152
G: 122
u: 1634
n: 3530
b: 742
g: 1230
E: 187
B: 62
k: 509
f: 1061
A: 352
l: 2221
i: 3547
W: 103
d: 2245
a: 4058
,: 927
y: 985
L: 90
w: 1165
s: 3057
C: 100

: 1702
m: 1064
v: 389
.: 621
Y: 58
p: 849
-: 287
:: 40
I: 375
R: 133
D: 120
1: 64
2: 10
0: 24
6: 9
[: 27
#: 1
9: 15
3: 21
]: 27
*: 33
S: 159
O: 116
F: 92
H: 92
J: 14
U: 53
N: 113
K: 42
/: 31
_: 150
": 710
': 260
V: 20
M: 79
&: 2
?: 70
(: 33
): 33
!: 176
;: 78
z: 34
x: 70
q: 62
ù: 1
Q: 34
X: 4
8: 11
7: 6
4: 9
5: 12
%: 1
@: 2
$: 2


In [249]:
# I want to know which 5 IP addresses accessed my system most often in our logfile

Counter( 
    [one_line.split()[0]    # list comprehension
    for one_line in open('/Users/reuven/Courses/Current/Data/mini-access-log.txt')]
).most_common(5)

[('66.249.65.38', 100),
 ('66.249.65.12', 32),
 ('89.248.172.58', 22),
 ('67.195.112.35', 16),
 ('66.249.71.65', 3)]

In [250]:
def poor_man_counter(iterable):
    output = {}

    for one_item in iterable:
        output.setdefault(one_item, 0)
        output[one_item] += 1

    return output

In [251]:
# OrderedDict

from collections import OrderedDict

In [252]:
od = OrderedDict(a=10, b=20, c=30)
od

OrderedDict([('a', 10), ('b', 20), ('c', 30)])

In [253]:
# regular dicts are considered equal if all key-value pairs are equal

d1 = {'a':10, 'b':20, 'c':30}
d2 = {'b':20, 'a':10, 'c':30}

In [254]:
d1 == d2

True

In [255]:
# OrderedDicts are only equal if their order, as well as pairs, are equal
od1 = OrderedDict(a=10, b=20, c=30)
od2 = OrderedDict(b=20, a=10, c=30)

od1 == od2

False

# Functions

The first thing to realize about defining functions in Python is that when we use `def`, we're really doing two things:

1. We're creating a function object
2. We're assigning that function object to a variable



In [256]:
def hello():
    return 'Hello'

In [257]:
# which variable has been assigned the function?
# answer: hello

type(hello)

function

In [258]:
hello = 5

In [259]:
hello()

TypeError: 'int' object is not callable

In [260]:
def hello():
    return 'Hello'

In [261]:
# there is a __code__ object under hello
# that's where the real brains of our function are

dir(hello.__code__)

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_co_code_adaptive',
 '_varname_from_oparg',
 'co_argcount',
 'co_cellvars',
 'co_code',
 'co_consts',
 'co_exceptiontable',
 'co_filename',
 'co_firstlineno',
 'co_flags',
 'co_freevars',
 'co_kwonlyargcount',
 'co_lines',
 'co_linetable',
 'co_lnotab',
 'co_name',
 'co_names',
 'co_nlocals',
 'co_positions',
 'co_posonlyargcount',
 'co_qualname',
 'co_stacksize',
 'co_varnames',
 'replace']

In [262]:
# inside of the function
# inside of its __code__ attribute
# we have a bunch of co_* attributes, which are hints/info for running the function

In [263]:
# we can look at the bytecodes that Python has compiled our function into

import dis   # Python disassembler

dis.dis(hello)

  1           0 RESUME                   0

  2           2 RETURN_CONST             1 ('Hello')


In [264]:
# part of the function object is the co_consts attribute
# it's a tuple containing all of the function's constants
# meaning: values that were stored when the function was defined

hello.__code__.co_consts

(None, 'Hello')

In [265]:
hello()

'Hello'

In [266]:
# what if I call the function with an argument?

hello('world')

TypeError: hello() takes 0 positional arguments but 1 was given

In [270]:
# how does Python know that the 'hello' function takes 0 arguments?
# the answer is: the co_argcount attribute

hello.__code__.co_argcount

0

In [271]:
def hello(name):
    return f'Hello, {name}!'

In [272]:
dis.dis(hello)

  1           0 RESUME                   0

  2           2 LOAD_CONST               1 ('Hello, ')
              4 LOAD_FAST                0 (name)
              6 FORMAT_VALUE             0
              8 LOAD_CONST               2 ('!')
             10 BUILD_STRING             3
             12 RETURN_VALUE


In [273]:
hello.__code__.co_consts

(None, 'Hello, ', '!')

In [274]:
hello.__code__.co_varnames

('name',)

In [275]:
hello()   # can I do this?

TypeError: hello() missing 1 required positional argument: 'name'

In [276]:
hello('world')

'Hello, world!'

In [277]:
hello([10, 20, 30])

'Hello, [10, 20, 30]!'

In [278]:
hello({'a':10, 'b':20})

"Hello, {'a': 10, 'b': 20}!"

In [279]:
hello(hello)

'Hello, <function hello at 0x10d962de0>!'

In [280]:
def return_5():
    return 5

In [281]:
dis.dis(return_5)

  1           0 RESUME                   0

  2           2 RETURN_CONST             1 (5)


In [282]:
def return_twice(n):
    return n * 2

In [283]:
dis.dis(return_twice)

  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (n)
              4 LOAD_CONST               1 (2)
              6 BINARY_OP                5 (*)
             10 RETURN_VALUE


In [284]:
def return_two(n):
    return n, n*2

In [285]:
dis.dis(return_two)

  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (n)
              4 LOAD_FAST                0 (n)
              6 LOAD_CONST               1 (2)
              8 BINARY_OP                5 (*)
             12 BUILD_TUPLE              2
             14 RETURN_VALUE


In [286]:
def return_a_list():
    mylist = [10, 20, 30]
    return mylist

In [287]:
dis.dis(return_a_list)

  1           0 RESUME                   0

  2           2 BUILD_LIST               0
              4 LOAD_CONST               1 ((10, 20, 30))
              6 LIST_EXTEND              1
              8 STORE_FAST               0 (mylist)

  3          10 LOAD_FAST                0 (mylist)
             12 RETURN_VALUE


In [293]:
def return_twice(n):
    output = n*2
    return n * 2

In [294]:
dis.dis(return_twice)

  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (n)
              4 LOAD_CONST               1 (2)
              6 BINARY_OP                5 (*)
             10 STORE_FAST               1 (output)

  3          12 LOAD_FAST                0 (n)
             14 LOAD_CONST               1 (2)
             16 BINARY_OP                5 (*)
             20 RETURN_VALUE


In [295]:
return_twice.__code__.co_varnames

('n', 'output')

In [301]:
for index, one_byte in enumerate(return_twice.__code__.co_code):
    if index % 2:
        print(f'\t{one_byte}')
    else:
        print(f'{index}\t{one_byte}')

0	151
	0
2	124
	0
4	100
	1
6	122
	5
8	0
	0
10	125
	1
12	124
	0
14	100
	1
16	122
	5
18	0
	0
20	83
	0


# Next up

1. Practice with functions
2. Arguments and parameters
3. A bit of scoping

Resume at :15

# Arguments and parameters

- Parameters are local variables that we declare in the first line of a program, and which get their values set via arguments, when the function is called.
- Arguments are values that are passed to the function when we invoke it, and which are then assigned to the parameters.

A big part of writing Python functions is understanding how arguments are assigned (mapped) to parameters.

In [302]:
# parameters:  name
# arguments:   'world'

def hello(name):
    return f'Hello, {name}!'

hello('world')   # this is a positional argument, called that because it's assigned to a parameter based on their positions

'Hello, world!'

In [303]:
# parameters:  name
# arguments:  'world'

def hello(name):
    return f'Hello, {name}!'

hello(name='world')   # this is a keyword argument, recognizable because it's name=value

'Hello, world!'

In [304]:
def add(first, second):
    return first + second

add(3, 4)  # both are positional

7

In [305]:
add(first=3, second=4)  # both are keyword

7

In [306]:
add(second=3, first=4)  # both are keyword, the order doesn't matter

7

In [307]:
add(3, second=4)  # can I do this? YES, so long as all positional come before all keyword

7

In [308]:
add(first=3, 4)

SyntaxError: positional argument follows keyword argument (2062583197.py, line 1)

In [314]:
# in order to provide us with flexible function calling, we can define a function with parameters
# that have default argument values. This effectively makes the parameters optional.

# in this function, first can take either positional or keyword arguments, and is mandatory
# but second, which can also take either positional or keyword arguments, is optional. If we don't pass a value, it'll be 5

# parameters: first   second
# arguments:   10      5

def add(first, second=5):
    return first + second

In [310]:
add(10, 3)

13

In [311]:
add(10)

15

In [312]:
add.__code__.co_argcount  # how many arguments does the function need?

2

In [313]:
# if we passed too few arguments for the argcount,
# the function grabs values from __defaults__

add.__defaults__

(5,)

# Our story so far

Our function can be defined with two different types of parameters:

1. Mandatory (positional or keyword arguments)
2. Optional, thanks to default argument values (positional or keyword arguments)

In [315]:
def add_one(x):
    x.append(1)
    return x

mylist = [10, 20, 30]
add_one(mylist)

mylist

[10, 20, 30, 1]

In [319]:
# now, let's give a default value

# parameters: x
# arguments: []

def add_one(x=[]):  # we intended to say: if we don't pass a list, use THIS empty one
    x.append(1)
    return x

add_one()

[1]

In [320]:
add_one.__defaults__

([1],)

In [321]:
# parameters: x
# arguments: [1]

add_one()

[1, 1]

In [322]:
add_one.__defaults__

([1, 1],)

In [323]:
add_one()

[1, 1, 1]

# NEVER USE MUTABLE DEFAULTS!

In [None]:
def add_one(x=None): 
    if x is None:
        x = []   # this is a new, empty list each time we run the function -- it's a runtime creation, not compile-time creation
    x.append(1)
    return x

add_one()

In [324]:
# let's say I want to write a function that adds a list of numbers

def mysum(numbers):
    total = 0

    for one_number in numbers:
        total += one_number

    return total

mysum([10, 20, 30])

60

In [325]:
# what if I don't want to pass a list?
# what if I want to pass just a bunch of numbers, as separate arguments?

def mysum(a, b, c):
    return a + b + c

mysum(10, 20, 30)

60

In [326]:
mysum(10, 20, 30, 40)

TypeError: mysum() takes 3 positional arguments but 4 were given

In [327]:
def mysum(a=0, b=0, c=0, d=0, e=0, f=0, g=0, h=0):
    return a + b + c + d + e + f + g + h

mysum(10, 20, 30)

60

In [328]:
mysum(10, 20, 30, 40, 50)

150

In [329]:
# what we want is a way to say, "I don't care how many arguments someone passes.
# I want to accept them all."

# the way we do that in Python is with *args, a parameter to our function
# that is a tuple whose values are positional arguments that no one parameter took.

def mysum(name, *numbers):   # numbers will be a tuple of numbers
    total = 0

    for one_number in numbers:
        total += one_number

    return f'{name}, the total is {total}'

    

In [330]:
mysum('Reuven', 10, 20, 30, 40)

'Reuven, the total is 100'

# Parameter types

1. Mandatory (positional or keyword)
2. Optional, thanks to argument defaults (positional or keyword)
3. `*args`, a tuple containing all remaining positional arguments that no parameter took

# Exercise: `all_lines`

1. Write a function that takes a filename (string) for output as its first argument, and then any number of additional filenames (strings) as additional arguments (input files).
2. Go through each input file, one line at a time, and write its contents to the output file (i.e., the first argument that was passed).
3. The result of calling the function will be one large output file whose contents are from each of the individual input files -- all of their lines.

Example:

    all_lines('output.txt', 'input1.txt', 'input2.txt', 'input3.txt')

The above will put all of the lines from `input1.txt`, `input2.txt`, and `input3.txt` (which we assume already exist) into the new output file `output.txt`.    

In [333]:
def all_lines(outfile, *args):
    with open(outfile, 'w') as f:
        for one_filename in args:
            print(f'Processing {one_filename}')
            for one_line in open(one_filename):
                f.write(one_line)    # write the current line from the input file to the output file

    # thanks to with, the output file will be flushed + closed 

In [334]:
all_lines('output.txt', '/etc/passwd', '/Users/reuven/.zshrc')

Processing /etc/passwd
Processing /Users/reuven/.zshrc


In [335]:
!cat output.txt

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33

In [336]:
# the glob module lets me get a list of filenames matching a particular pattern 

import glob

In [339]:
# get all files starting with m, n, s, l, or w
glob.glob('[mnslw]*.txt')

['mini-access-log.txt',
 'nums.txt',
 'shoe-data.txt',
 'linux-etc-passwd.txt',
 'wcfile.txt']

In [340]:
all_lines('output.txt', 
          glob.glob('[mnslw]*.txt'))   # this won't work -- we need to pass strings, not a list of strings

Processing ['mini-access-log.txt', 'nums.txt', 'shoe-data.txt', 'linux-etc-passwd.txt', 'wcfile.txt']


TypeError: unhashable type: 'list'

In [341]:
# if I have a function that expects a few separate arguments, but I have
# them in a list, how can I remove the list brackets, and turn them into arguments?

# answer: * 
# if I put a * before the list value, then the elements of the list are turned into individual arguments

all_lines('output.txt', 
          *glob.glob('[mnslw]*.txt'))    # I unrolled the list we got back from glob.glob into a number of separate string args.

Processing mini-access-log.txt
Processing nums.txt
Processing shoe-data.txt
Processing linux-etc-passwd.txt
Processing wcfile.txt


# Tomorrow

1. `**kwargs` and other parameter types
2. Scoping (LEGB)
3. Enclosing functions
4. Dispatch tables
5. Comprehensions
6. Sorting and `lambda`

In [342]:
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)

In [344]:
factorial(15)

1307674368000

In [347]:
factorial(15000)

RecursionError: maximum recursion depth exceeded