# Lecture 10 

Today:
* Finishing up sequences:
  * Generators and the yield keyword
  * Generator expressions
* Modules:
  * Some useful modules
  * Hierarchical namespaces
  * Making your own modules
* The main() function
* PEP 8 (v. briefly)
* A revisit to debugging, now that we're writing longer programs:
  * Different error types (syntax, runtime, logical)


# Warm-Up Challenge

In [1]:
# 1. Write a function that takes n and returns a list of the first n squares

# REGULAR FUNCTION
# CALL IT, IT RETURNS A (BIG LIST) VALUE
def squares(n):
    lst = []
    for i in range(1, n+1):  # 1, 2, ..., n
        lst.append( i*i )  # append wants an element
    return lst

# 2. Write a function find that takes a list and an element,
# and checks if the element occurs in the list
def find(lst, e):
    for i in lst:
        if i == e:
            return True
    return False
    
# Test case:
print(squares(15))
find( squares(1000), 25)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225]


True

# Range is Strange

In [12]:
# Recall that range can be used to iterate through a sequence of numbers:

for i in range(10):
  print(i, end = " ")

0 1 2 3 4 5 6 7 8 9 

In [2]:
# We can convert range to a list

list(range(10)) 

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

**But isn't range a list to start with?**

In [3]:
# No!

x = range(10) 

print(x)

range(0, 10)


In [4]:
# So what is the type of range:

x = range(10) # So what is a range? 

print(type(x))

<class 'range'>


Why not just represent a range as a list? In a word: memory.

In [5]:
x = list(range(100)) # This requires allocating memory to store 100 integers

print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


In [3]:
# This does not make the list, so the memory for the list is never allocated. 
x = range(100) 

print(x)

range(0, 100)


In [7]:
# This requires only the memory for j, i and the Python system

# Compute the sum of integers from 0 (inclusive) to 100 (exclusive)
j = 0
for i in range(100):
  j += i
  
print(j)

4950


In [8]:
# Alternatively, this requires memory for j, i and the list of 100 integers:

j = 0
for i in list(range(100)):
  j += i
  
print(j)

4950


* A range is lazy: it is a promise to produce a sequence of integers on demand, and this does not require them all to exist in memory at the same time.

* With a list, however, by definition, all the elements are present in memory.

* As a general guide, if we can be "lazy", and avoid ever building a complete sequence in memory, then we should be lazy about evaluation of sequences.

* So how do you code a function like range? This is where the "yield" keyword comes in, which allows you to create generator functions.
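
The memory difference between a range and a list can be checked directly. The sketch below uses `sys.getsizeof`, which reports only the size of the container object itself (not the integers a list refers to), so it understates the true cost of the list. Exact byte counts vary with Python version and platform, so none are quoted here.

```python
import sys

n = 1_000_000

lazy = range(n)          # stores only start, stop and step
eager = list(range(n))   # stores a reference to every one of the n integers

range_size = sys.getsizeof(lazy)
list_size = sys.getsizeof(eager)

print(f"range object: {range_size} bytes")
print(f"list object:  {list_size} bytes")

# A range still supports len(), indexing and membership tests
# without ever building the full list
print(len(lazy), lazy[10], 999_999 in lazy)
```

The range stays the same size however large `n` gets, while the list grows with `n`.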

# Generators

## Regular function 
* Does all of its computation first, and then returns one result at the end

## Generator function
* A function with a **yield** statement
* Produces a whole sequence of results, one at a time, using **yield**
* When there are no more results, the function finally returns (with an explicit **return**, or by reaching the end of its body)
* The sequence of results from a generator function can be processed with a for loop. That is, generator functions are iterable.

# Generator Example

In [5]:
## squares written as a generator (i.e. a function with yield)
# Yields each square, one at a time
# You can iterate over squaresGen(n) in a for loop

def squaresGen(n):
    for i in range(1,n+1):
        yield i*i              # squaresGen is a generator function
    return

def find(collection,e):
    for i in collection:
        if i == e:
            return True 
    return False

# Test case:
print(squaresGen(10))  # prints the generator object itself, not the values

print(list(squaresGen(10)))  # convert the sequence from squaresGen into a list

for i in squaresGen(10): print(i, end=" ")  # print the sequence from squaresGen
    
find( squaresGen(10), 25)

<generator object squaresGen at 0x14af31690>
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
1 4 9 16 25 36 49 64 81 100 

True

# Yield keyword

With *return* you exit a function completely, returning a value. The internal state of the function is lost.

Yield is like return in that it hands a value back to the caller and the function temporarily exits. However, the state of the function is not lost, and the function can be resumed to yield more values.

This allows a function to act like an iterator over a sequence, where the function incrementally yields values, one for each successive resumption of the function. 

It's easiest to understand this by example:

In [6]:
def functionThatYields():  # aka a Generator
    yield 1
    yield "two"
    yield 3.0
    return

for i in functionThatYields():
  print(i)

1
two
3.0


In [10]:
# What is the type?

x = functionThatYields()  # the function takes no arguments

print(type(x)) 

<class 'generator'>
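
To watch the pause-and-resume behaviour of yield step by step, a generator can be driven by hand with the built-in `next()`. This is a minimal sketch: `function_that_yields` is a renamed copy of the example above with extra print calls added purely for illustration.

```python
def function_that_yields():
    print("started")
    yield 1
    print("resumed after first yield")
    yield "two"
    print("resumed after second yield")
    yield 3.0

gen = function_that_yields()   # nothing runs yet: "started" is not printed here

first = next(gen)    # runs the body up to the first yield, then pauses
second = next(gen)   # resumes where it paused and runs to the second yield
third = next(gen)

try:
    next(gen)        # the body finishes, so the generator reports it is exhausted
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

A for loop does the same thing behind the scenes: it calls `next()` repeatedly and stops quietly when `StopIteration` is raised.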


In [9]:
def make_numbers_list(m): # returns [0, 1, 2, ..., m-1]
  i = 0
  l = []
  while i < m:
    l.append(i)
    i += 1
  return l

for i in make_numbers_list(10):
  print(i, end=" ")

0 1 2 3 4 5 6 7 8 9 

In [10]:
def make_numbers_generator(m): # a generator for 0, 1, 2, ..., m-1
  i = 0
  while i < m:
    yield i   # yields 0, 1, 2, ..., m-1
    i += 1

for i in make_numbers_generator(10):
  print(i, end=" ")

0 1 2 3 4 5 6 7 8 9 

Why use yield to write generator functions?:

* Shorter, cleaner code - here we saved all the messing around with lists
* More efficient in memory - we never have to construct the complete list in memory
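
Because values are produced only on demand, a generator can even describe a sequence with no end, which a list never could. The sketch below is illustrative only: `all_squares` is a hypothetical generator, and `itertools.islice` from the standard library is used to take just the first few values.

```python
from itertools import islice

def all_squares():
    """Hypothetical generator: yields 1, 4, 9, ... without ever stopping."""
    i = 1
    while True:
        yield i * i
        i += 1

# Take only the first five values; the rest are never computed
first_five = list(islice(all_squares(), 5))
print(first_five)

# Or stop as soon as some condition is met
for sq in all_squares():
    if sq > 1000:
        first_over_1000 = sq
        break
print(first_over_1000)
```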

# Generator Expressions

Like list comprehensions, but lazy

In [1]:
# Last lecture we covered list comprehensions, for example:

x = [ i**2 for i in range(1,11) ] # list of square numbers

print(x) 

# Same as ...
x = []
for i in range(1,11):  x.append(i**2)
    

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


In [3]:
# If we swap the square brackets for parentheses then we get a "generator expression"

y = (i**2 for i in range(1,11)) # square numbers as a generator expression

print(y)
print(list(y))

# Same as
def gen_numbers():
    for i in range(1,11): 
        yield i**2
y2 = gen_numbers()
print(list(y2))
        
# A generator expression creates a generator object, just as calling a generator function does, but with less code.

<generator object <genexpr> at 0x1433a68f0>
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


In [5]:
# We don't need to include the parentheses in some cases
x = sum(i**2 for i in range(1,101)) # One liner to sum the first 100 square numbers
print(f"Sum of the first hundred square numbers {x}")

Sum of the first hundred square numbers 338350


The take-home: Generator expressions are just syntactic sugar for generator functions. They are generally less code, particularly when they can be used inline (like the sum example above), and avoid allocating all the memory that a list comprehension involves.
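
One practical difference is worth knowing: a generator can be consumed only once, whereas a list can be traversed as often as you like. A small sketch (the variable names are illustrative):

```python
squares_list = [i**2 for i in range(1, 6)]   # list comprehension
squares_gen = (i**2 for i in range(1, 6))    # generator expression

first_list_total = sum(squares_list)
second_list_total = sum(squares_list)   # a list can be traversed again

first_gen_total = sum(squares_gen)
second_gen_total = sum(squares_gen)     # the generator is already exhausted

print(first_list_total, second_list_total)
print(first_gen_total, second_gen_total)
```

If the values are needed more than once, either build a list or create a fresh generator for each pass.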

# Challenge 1

In [8]:
# Write a function cubes1 that produces a *tuple* of the first n cubes
# 1,8,27,..., n**3

def cubes1(n):
    t = () # empty tuple
    for i in range(1,n+1):
        t = t + (i*i*i,)
    return t
#print(cubes1(10))

# Write a function cubes2 that produces a *list* of the first n cubes
def cubes2(n):
    lst = []
    for i in range(1,n+1):
        lst = lst + [i*i*i]
    return lst 
#print(cubes2(10))
    
# Write a generator function cubes3 that produces the first n cubes
def cubes3(n):
    for i in range(1,n+1):  
        #print(f"About to yield {i*i*i}")
        yield i*i*i
    
#for x in cubes3(10): print(x)

# Write a list comprehension for the first 10 cubes
print( [  i*i*i for i in range(1,10+1) ] )

# Write a generator expression (aka comprehension) for the first 10 cubes
set( (  i*i*i for i in range(1,10+1) ) )

[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]


{1, 8, 27, 64, 125, 216, 343, 512, 729, 1000}

# Challenge 2

In [12]:
# Write a function count that takes in a list and an element e,
# and returns the number of times that e occurs in the list.

def count(list,e):
    ...
    

print(count( [1,2,3,2,1], 2))

# Next, try your count function on a tuple, generator, string, and set.
# Which ones would you expect to work?

print(count( (1,2,3,2,1), 2))  #?
print(count( range(10), 2))  #?
print(count( "a man a plan a canal panama", "a"))
print(count( { 1,2,3,2,1 }, 2))  #?



2
2
1
10
1
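
The outputs above are what you would get from a count that does nothing more than loop over its argument. The challenge leaves the body open, so the version below is only one possible sketch; the parameter is called `collection` here (an assumption, not the name used above) to avoid shadowing the built-in `list`.

```python
def count(collection, e):
    """Count how many items of collection are equal to e."""
    total = 0
    for item in collection:   # works for any iterable, not just lists
        if item == e:
            total += 1
    return total

print(count([1, 2, 3, 2, 1], 2))
print(count((1, 2, 3, 2, 1), 2))
print(count(range(10), 2))                        # 2 appears once in 0..9
print(count("a man a plan a canal panama", "a"))  # iterates over characters
print(count({1, 2, 3, 2, 1}, 2))                  # a set keeps no duplicates
```

Since the function only relies on a for loop, every iterable works. The set gives 1 because the literal `{1, 2, 3, 2, 1}` has already collapsed to `{1, 2, 3}` before count sees it.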


# Modules

* A language like Python has vast libraries of useful functions, classes, etc. 
  * See https://pypi.org/
  * As of Dec 2020 there were over 270K different Python "packages" in PyPI.

* To make it possible to use these and ensure the namespace of our code does not explode in size, Python has a hierarchical system for managing these libraries using "modules" and "packages".

In [15]:
# From a user perspective, modules are variables, functions, objects etc. 
# defined separately to the code we're working on.

# The math module contains lots of math functions and constants
# This line "imports" the math module, so that we can refer to it
import math 

math.log10(100) # Now we're calling a function from the math module 

2.0

In [16]:
dir(math) # use dir to list the contents of an object or module

['__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'comb',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'dist',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'isqrt',
 'lcm',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'nextafter',
 'perm',
 'pi',
 'pow',
 'prod',
 'radians',
 'remainder',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'tau',
 'trunc',
 'ulp']

In [19]:
# Use help() to give you info 
# (Note: this is great to use in the interactive interpreter)

# e.g. get info on the math.log10 function
# this is pulling the doc string of the function
help(math.log10) 

Help on built-in function log10 in module math:

log10(...)



* In general, the Python standard library provides loads of useful modules for all sorts of things: https://docs.python.org/3/py-modindex.html 

* Standard library packages are installed as part of a default Python installation - they are part of every Python of that version (e.g. 3.XX)



* There is a much larger universe of open source Python packages you can install from: https://pypi.org/ 

# Challenge 2

In [2]:
# Use the median function from the statistics module to 
# calculate the median of the following list:
l = [ 1, 8, 3, 4, 2, 8, 7, 2, 6 ]

import statistics
statistics.median( l )

4

# Namespaces and dot notation

* A **namespace** is the set of all identifiers available to a line of code

* In Python (like most programming languages), namespaces are organized hierarchically into subpieces using modules and functions and classes. 

* If all identifiers were in one namespace without any hierarchy then we would get lots of collisions between names, and this would result in ambiguity. 

* The upshot is if you want to use a function from another module you need to import it into the "namespace" of your code and use '.' notation:

In [1]:
import math # Imports the math module into the current namespace

math.log10(100) # log10 is a function that is in the math module


2.0

# Import statements

* As you've seen, to import a module just write "import x", where x is the module name.

**Import from**

* You can also import a specific function, class or object from a module into your program's namespace using the import from syntax:

In [3]:
from math import log10

log10(100) # Now log10 is just a function in the current program's namespace,
# no dot notation required

2.0

If you want to import all the functions from a module you can use:

In [2]:
from math import * # Import all functions from math


# But, this is generally a BAD IDEA, because you need to be sure
# this doesn't bring in things that will collide with other things
# used by the program

log10(100)
#etc.

2.0

More useful is the "as" modifier

In [8]:
from math import log10 as log_base_10
# This imports the log10 function from math
# but names it log_base_10. Useful if you want to abbreviate a long function
# name, or if you want to import two separate things with the same name

log_base_10(100)

2.0
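
Here is a small sketch of the kind of name collision that importing names directly can cause, and how "as" avoids it. It relies on the standard library modules `math` and `cmath`, which both define a function called `sqrt`; the names `real_sqrt` and `complex_sqrt` are illustrative choices.

```python
from math import sqrt
from cmath import sqrt   # same name: this silently replaces math's sqrt

shadowed = sqrt(4)        # now calls cmath.sqrt, so the result is complex

# Giving each import its own name keeps both functions available
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(shadowed)
print(real_sqrt(4), complex_sqrt(-1))
```

Plain `import math` and `import cmath` would also avoid the clash, since `math.sqrt` and `cmath.sqrt` live in separate namespaces.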

# Challenge 3

In [2]:
# Write a statement to import the 'sin' function from the 'math' module
# and compute the sin of 0
from math import sin
sin(0)

0.0

# Writing your own modules

You can write your own modules. 

* Create a file whose name is x.py, where x is the name of the module you want to create.

* Edit x.py to contain the stuff you want

* Create a new Python file, call it y.py, in the same directory as x.py and include "import x" at the top of y.py.


In [1]:
import myadder

print(myadder.add([3,2,42]))

47


# Packages

Packages are collections of modules, organized hierarchically (and accessed using the dot notation).

Beyond the scope here, but you can look more at environment setup to create your own "packages". If you're curious see: https://docs.python.org/3/tutorial/modules.html#packages

# The main() function

* You may write a program and then want to reuse some of the functions by importing them into another program. In this case you are treating the original program as a module. 

* The problem is that when you import a module it is executed. 

* Question: How do you stop the original program from running when you import it as a module?

* Answer: By putting the logic for the program in a "main()" function, which is only called if the program is being run by the user, not imported as a module.




In [1]:
# myadder.py

def add(c):
    r = 0
    for i in c:  r += i
    return r

def main():
    print("myadder loaded")
    assert add([1,2,3]) == 6
    print("All tests pass")
    print(f"__name__ is {__name__}")

if __name__ == '__main__': # This is only true when the file is run
    # directly as a program, not when it is imported as a module
    main()

myadder loaded
All tests pass
__name__ is __main__


In [3]:
import myadder
myadder.add([1,2,3])

6
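
To see the other side of the `__name__` check in a single runnable piece, the sketch below writes a small module to a temporary folder and imports it. The module name `demo_adder` and the use of `tempfile` are assumptions made so the example is self-contained; in practice you would simply save the file next to your program, as described earlier.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a hypothetical module called demo_adder (a stand-in for myadder.py)
MODULE_SOURCE = '''
def add(c):
    r = 0
    for i in c:
        r += i
    return r

def main():
    print("running as a program")

print(f"__name__ is {__name__}")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "demo_adder.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, folder)          # let import find the new file
    try:
        demo_adder = importlib.import_module("demo_adder")
    finally:
        sys.path.remove(folder)

print(demo_adder.add([1, 2, 3]))
```

On import, the top-level print shows that `__name__` is the module's name rather than `__main__`, so main() is skipped while add remains available.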

# PEP8: Use Style


It is easy to rush and write poorly structured, hard-to-read code. 

Generally, this proves a false-economy, resulting in longer debug cycles, a larger maintenance burden (like, what was I thinking?) and less code reuse. 

Although many sins have nothing to do with the cosmetics of the code, some can be fixed by adopting a consistent, sane set of coding conventions. Python did this with Python Enhancement Proposal (PEP) 8:

https://www.python.org/dev/peps/pep-0008/

Some things PEP-8 covers:

* use 4 spaces (instead of tabs) for indentation - you can make your text editor do this (insert spaces for tabs)
* limit line length to 79 characters
* when naming identifiers, use CamelCase for classes (we’ll get to those) and lowercase_with_underscores for functions and variables
* place imports at the top of the file
* keep function definitions together
* use docstrings to document functions
* use two blank lines to separate function definitions from each other
* keep top level statements, including function calls, together at the bottom of the program
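
As an illustration, here is a short hypothetical program laid out along the lines of the list above. The function name and contents are invented for the example; only the layout is the point.

```python
"""Hypothetical example module laid out using the conventions above."""

import math


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


def main():
    """Top-level logic, kept together at the bottom of the file."""
    print(f"{circle_area(2.0):.2f}")


if __name__ == "__main__":
    main()
```

Note the import at the top, lowercase_with_underscores names, 4-space indentation, a docstring per function, and two blank lines between definitions.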

# Debugging Revisited

We mentioned earlier that a lot of programming is debugging. Now we're going to debug programs and understand the different errors you can get.

There are three principal types of error:
 - syntax errors
 - runtime errors
 - semantic/logical errors

# Syntax Errors

* when what you've written is not valid Python

In [27]:
# Syntax errors - when what you've written is not valid Python

for i in range(10)
  print(i) # What's wrong with this?

SyntaxError: invalid syntax (<ipython-input-27-d7c418e7e523>, line 3)

In [28]:
# Syntax errors - when what you've written is not valid Python

for i in range(10):
print(i) # What's wrong with this?

IndentationError: expected an indented block (<ipython-input-28-6f8b904aea77>, line 4)

In [29]:
# Syntax errors - when what you've written is not valid Python
for i in range(10):
  """ This loop will print stuff ""
  print(i)


SyntaxError: EOF while scanning triple-quoted string literal (<ipython-input-29-d159ec2d490f>, line 4)

In [30]:
# Syntax errors - when what you've written is not valid Python 
# (note, this kind of print statement was legal in Python 2.XX and earlier)

print "Forgetting parentheses"

SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Forgetting parentheses")? (<ipython-input-30-b32860499df0>, line 4)
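
Syntax errors are detected when Python parses the code, before any of it runs. The sketch below demonstrates this by handing a snippet to the built-in `compile()` as a string; the snippet and the `"<example>"` filename are illustrative, and the exact wording of the error message varies between Python versions.

```python
# The first line is valid, the second is missing its colon
source = 'print("this line is fine")\nfor i in range(10)\n    print(i)\n'

try:
    compile(source, "<example>", "exec")   # parse only; nothing is executed
    error_line = None
except SyntaxError as err:
    error_line = err.lineno
    print(f"{type(err).__name__} on line {err.lineno}: {err.msg}")

# IndentationError is a special kind of SyntaxError
print(issubclass(IndentationError, SyntaxError))
```

Even the valid first line never runs, because the whole snippet is rejected up front.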

# Runtime Errors

* when the program crashes during runtime because it tries to do something invalid

In [31]:
# Runtime errors - when the program errors out during runtime because it 
# tries to do something invalid

print("This is an integer: " + 10)

TypeError: can only concatenate str (not "int") to str

In [32]:
# Runtime errors - when the program errors out during runtime because it 
# tries to do something invalid

assert 1 + 1 == 3

AssertionError: 
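
Unlike a syntax error, a runtime error only appears when the offending line is actually executed, and it can be caught with try/except. A minimal sketch using the string concatenation example from above (the variable names are illustrative):

```python
try:
    message = "This is an integer: " + 10     # str + int is not allowed
except TypeError as err:
    message = None
    print(f"Caught a runtime error: {err}")

# Two common fixes: convert explicitly, or let an f-string do the formatting
fixed_with_str = "This is an integer: " + str(10)
fixed_with_fstring = f"This is an integer: {10}"
print(fixed_with_str)
```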

# Semantic Errors (aka Logical Errors)

* when the program runs and exits without error, but produces an unexpected result

In [1]:
# Semantic errors - when the program runs and exits without error, 
# but produces an unexpected result

j = int(input("Input a number: "))

x = 1
for i in range(1, j):  # 1,2,3,..., j-1
    #print(i)
    x = x * i
  
print(f"{j} factorial is {x}")

Input a number: 4
4 factorial is 6
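
The bug here is in the loop bounds: range(1, j) stops at j-1, so j itself is never multiplied in. The sketch below wraps both versions in functions (the names `factorial_buggy` and `factorial_fixed` are made up for this example) and checks the fix against `math.factorial` from the standard library.

```python
import math

def factorial_buggy(j):
    x = 1
    for i in range(1, j):        # 1, 2, ..., j-1: j is missing
        x = x * i
    return x

def factorial_fixed(j):
    x = 1
    for i in range(1, j + 1):    # 1, 2, ..., j
        x = x * i
    return x

print(factorial_buggy(4), factorial_fixed(4))

# Comparing against a trusted reference is a quick way to expose this kind of bug
for n in range(0, 10):
    assert factorial_fixed(n) == math.factorial(n)
```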


In my experience 
* syntax errors are easy to fix, 
* runtime errors are generally solvable fast, but 
* semantic errors can take the longest time to fix

**Debug strategies**

To debug a failing program, you can:
  * Use print statements dotted around the code to figure out what code is doing at specific points of time (remember to remove / comment these out when you're done!)
  * Use a debugger - this allows you to step through execution, line-by-line, seeing what the program is up to at each step. (PyCharm has a nice interface to the Python debugger)
  * Write tests for individual parts of the code
  * Use assert to check that expected properties are true during runtime
  * Stare hard at it! Semantic errors will generally require you to question your program's logic.

# Challenge 4

See if you can get this to work:

In [4]:
# Try debugging the following - a number guessing program

print("Think of a number from 1 to 100")
min = 1
max = 100

while min < max:
    i = (min + max) // 2
    answer = input(f"Is your number greater than {i}? Type YES or NO: ")
    assert answer == "YES" or answer == "NO" # Check the value is what we expect
  
    if answer == "YES"
        max = i
    else
        max = i

print(f"Your number is {min}")
  
  

Think of a number from 1 to 100
Is your number greater than 50? Type YES or NO: YES
Is your number greater than 75? Type YES or NO: NO
Is your number greater than 63? Type YES or NO: YES
Is your number greater than 69? Type YES or NO: NO
Is your number greater than 66? Type YES or NO: YES
Is your number greater than 68? Type YES or NO: NO
Is your number greater than 67? Type YES or NO: NO
Your number is 67
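
The transcript above matches a binary search that raises the lower bound on YES and lowers the upper bound on NO. One possible working version is sketched below. To make it testable without typing answers, input() is replaced by a hypothetical `is_greater` function, and the bounds are named `lo` and `hi` so that the built-in min and max are not shadowed.

```python
def guess_number(is_greater, lo=1, hi=100):
    """Binary search for a secret number between lo and hi (inclusive).

    is_greater(i) is a hypothetical stand-in for asking the user
    "Is your number greater than i?" and must return True or False.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if is_greater(mid):
            lo = mid + 1      # the number is above mid
        else:
            hi = mid          # the number is mid or below
    return lo

secret = 67
print(guess_number(lambda i: secret > i))
```

Each question halves the remaining range, so at most 7 questions are needed for numbers from 1 to 100.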


# Challenge 5

See if you can get this to work:

In [6]:
# Write a function mostFreq that takes in a list
# and returns the element that occurs most often in the list
# You could use your count function from the start of class

def count(list,e): return sum(1 for x in list if x==e)

def mostFreq(list):
    mostCommonElem = None
    mostCommonCount = 0
    for x in list:             
        countX = count(list,x) 
        if ...
        
    return mostCommonElem

print(mostFreq("a man a plan a canal panama"))
print(mostFreq([ 1,2,3,2]))

# Alternatively, code up mostFreq using a dictionary to
# avoid traversing the list multiple times

def mostFreqFast(list):
    d = {}
    for x in list:
        if x in d:
            d[x] = d[x] +1
        else:
            d[x] = 1
    # have a dictionary            
    # find key that maps to maximum value
    ...
    
pass



a
2
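
For the dictionary approach, one possible way to finish is to count every element in a single pass and then ask max for the key with the largest count. This is a sketch rather than the intended solution: the name `most_freq_fast`, the parameter name `items`, and the choice to return None for an empty input are all assumptions.

```python
def most_freq_fast(items):
    """Return the element that occurs most often (first one seen wins ties)."""
    counts = {}
    for x in items:
        counts[x] = counts.get(x, 0) + 1   # 0 if x has not been seen yet
    if not counts:
        return None                        # assumption: empty input gives None
    return max(counts, key=counts.get)     # key with the largest count

print(most_freq_fast("a man a plan a canal panama"))
print(most_freq_fast([1, 2, 3, 2]))
```

The standard library also offers `collections.Counter`, whose `most_common(1)` method does the same job.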


# Homework

* ZyBook Reading 10
* Open book chapter 12: http://openbookproject.net/thinkcs/python/english3e/modules.html

