# Lecture 10 - Generators, Modules and the Main Function (https://bit.ly/intro_python_10)

Today:
* Finishing up sequences:
  * Iterators vs. lists
  * Generators and the yield keyword
  * Generator expressions
* Modules:
  * Some useful modules
  * Hierarchical namespaces
  * Making your own modules
* The main() function
* PEP 8 (v. briefly)
* A revisit to debugging, now that we're writing longer programs:
  * Looking at different error types (syntax, runtime, logical)


# Generators vs. lists

In [3]:
# Recall that range can be used to iterate through a sequence of numbers:

for i in range(10):
  print("i is", i)

i is 0
i is 1
i is 2
i is 3
i is 4
i is 5
i is 6
i is 7
i is 8
i is 9


In [4]:
# We can convert range to a list

x = list(range(10)) # Makes a list [ 0, 1, ... 9 ]

print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


**But isn't range a list to start with?**

In [5]:
# No!

x = range(10) 

print(x)

range(0, 10)


In [6]:
# So what is the type of range:

x = range(10) # So what is a range? 

print(type(x))

<class 'range'>


A range, or (as we'll see in a minute) a generator, is a promise to produce a sequence when asked.

Essentially, you can think of it like a function you can call repeatedly to get successive values from an underlying sequence, e.g. 0, 1, 2, etc.

Why not just make a list? In a word: memory.

In [7]:
x = list(range(100)) # This requires allocating memory to store 100 integers

print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


In [8]:
x = range(100) # This does not make the list, so the memory for the list is never allocated. 

print(x)

range(0, 100)


In [9]:
# This requires only the memory for j, i and the Python system

# Compute the sum of integers from 0 (inclusive) to 100 (exclusive)
j = 0
for i in range(100):
  j += i
  
print(j)

4950


In [10]:
# Alternatively, this requires memory for j, i and the list of 100 integers:

j = 0
for i in list(range(100)):
  j += i
  
print(j)

4950
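
To make the memory difference concrete, here is a minimal sketch using `sys.getsizeof` from the standard library. The exact byte counts depend on the Python build, and `getsizeof` on a list counts only the list's own storage (its references), not the integer objects it points to, so treat the numbers as a rough illustration. The names `lazy` and `eager` are just for this example.

```python
import sys

lazy = range(1_000_000)          # a range object: stores only start, stop and step
eager = list(range(1_000_000))   # a list: stores a reference to every element

print("range object size in bytes:", sys.getsizeof(lazy))
print("list object size in bytes: ", sys.getsizeof(eager))
```

On a typical installation the range stays tiny however long the sequence is, while the list grows with the number of elements.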


* Range, as a lazy sequence, is the promise to produce a sequence of integers, but this does not require that they all exist in memory at the same time.

* With a list, however, by definition, all the elements are present in memory.

* As a general guide, if we can be "lazy", and avoid ever building a complete sequence in memory, then we should be lazy about evaluation of sequences.

* So how do you code a function like range? This is where the "yield" keyword comes in, which allows you to create generator functions.

# Yield keyword

With *return* you exit a function completely, returning a value. The internal state of the function is lost.

Yield is like return in that it hands a value back to the caller and the function temporarily stops running. However, the state of the function is not lost, and the function can be resumed to yield more values.

This allows a function to act like an iterator over a sequence, where the function incrementally yields values, one for each successive resumption of the function. 

It's easiest to understand this by example:

In [11]:
def make_numbers(m):
  i = 0
  while i < m:
    yield i
    i += 1

for i in make_numbers(10):
  print("i is now", i)

i is now 0
i is now 1
i is now 2
i is now 3
i is now 4
i is now 5
i is now 6
i is now 7
i is now 8
i is now 9


In [12]:
# What is the type?

x = make_numbers(5) 

print(type(x)) 

<class 'generator'>
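
The pause-and-resume behaviour described above can be observed directly with the built-in `next()` function, which asks a generator for one more value. This is a small sketch reusing `make_numbers` from the cell above; the variable names `gen`, `first`, `second` and `exhausted` are just for illustration.

```python
def make_numbers(m):
    i = 0
    while i < m:
        yield i   # pause here and hand i back to the caller
        i += 1    # execution continues from here on the next request

gen = make_numbers(2)

first = next(gen)    # runs the function body up to the first yield
second = next(gen)   # resumes after the yield; i was remembered

try:
    next(gen)        # the while loop ends, so there is nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)
```

A for loop does the same thing behind the scenes: it calls `next()` repeatedly and stops when it sees `StopIteration`.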


Why use yield to write generator functions?:

* Shorter, cleaner code - here we saved all the messing around with lists
* More efficient in memory - we never have to construct the complete list in memory, rather we keep track of a limited amount of state in memory that represents where we are in the sequence.

# Generator Expressions

Like list comprehensions, but lazy

In [12]:
# Last lecture we covered list comprehensions, for example:

x = [i**2 for i in range(10)] # square numbers

print(x) # x is a list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [16]:
# If we swap the square brackets for parentheses then we get a "generator expression"

y = (i**2 for i in range(10)) # square numbers as a generator expression

print(y)

# A generator expression creates a generator, like calling a generator function, but in less code:
for i in y:
    print(i)

<generator object <genexpr> at 0x1059fcb30>
0
1
4
9
16
25
36
49
64
81
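
One consequence of laziness worth knowing about: a generator can only be consumed once. The sketch below (variable names are illustrative) shows that a second pass over the same generator expression produces nothing, whereas a list can be iterated as often as you like.

```python
squares = (i**2 for i in range(5))

first_pass = list(squares)    # consumes every value the generator has
second_pass = list(squares)   # the generator is now exhausted

print(first_pass)
print(second_pass)
```

If you need the values more than once, either build a list or create a fresh generator each time.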


In [2]:
# We don't need to include the parentheses in some cases
x = sum(i**2 for i in range(100)) # One liner to sum the first 100 square numbers
print(f"Sum of the first hundred square numbers {x}")

## This is equivalent to:
def square_numbers(m):
    i = 0
    while i < m:
        yield i**2
        i += 1
        
x = sum(square_numbers(100))
print(f"Sum of the first hundred square numbers {x}")

Sum of the first hundred square numbers 328350
Sum of the first hundred square numbers 328350


The take-home: Generator expressions are just syntactic sugar for generator functions. They are generally less code, particularly when they can be used inline (like the sum example above), and avoid allocating all the memory that a list comprehension involves.

# Challenge 1

In [1]:
# Write a generator function to enumerate numbers in the Collatz 3n + 1 sequence

# Recall:
# The sequence is produced iteratively such that the next term is determined
# by the current value, n. If n is even then the next term in the sequence is n/2 otherwise
# it is n*3 + 1. The sequence terminates when n equals 1 (i.e. 1 is always the last integer returned).

def collatz(n):
    #pass #replace with your code

for i in collatz(11):
    print(i)

11
34
17
52
26
13
40
20
10
5
16
8
4
2
1
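
Once you have tried the challenge yourself, here is one possible solution to compare against. It is only a sketch: it assumes n is a positive integer and uses integer division (`//`) so that the values stay ints rather than becoming floats.

```python
def collatz(n):
    """Yield the Collatz sequence starting at n and ending at 1 (assumes n >= 1)."""
    while n != 1:
        yield n
        if n % 2 == 0:
            n = n // 2       # even: halve it
        else:
            n = n * 3 + 1    # odd: triple it and add one
    yield 1                  # 1 is always the last value produced

for i in collatz(11):
    print(i)
```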


# Modules

* A language like Python has vast libraries of useful functions, classes, etc. See https://pypi.org/:
  * As of Dec 2023 there are over 500K different Python "packages" in PyPI.
* To make it possible to use these, and importantly, ensure the namespace of our code does not explode in size, Python has a hierarchical system for managing these libraries using "modules" and "packages".

In [14]:
# From a user perspective, modules are variables, functions, objects etc. defined separately
# to the code we're working on.

import math # This line "imports" the math module into the namespace, so that we can refer to it

math.log10(100) # Now we're calling a function from the math module to compute log_10(100)

# The math module contains lots of math functions and constants

2.0

In [15]:
dir(math) # As I've shown you before, use dir to list the contents of an object or module

['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'comb',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'dist',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'isqrt',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'perm',
 'pi',
 'pow',
 'prod',
 'radians',
 'remainder',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'tau',
 'trunc']

In [7]:
# Use help() to give you info (Note: this is great to use in the interactive interpreter)

import math

help(math.sqrt) # e.g. get info on the math.sqrt function - this is pulling the doc string of the function

Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.

Note: The '/' in the signature marks every parameter before it as positional-only (https://docs.python.org/3/faq/programming.html#what-does-the-slash-in-the-parameter-list-of-a-function-mean). Positional-only parameters have no externally usable name, so they cannot be passed by keyword. For example, math.sqrt(x=9) raises a TypeError, whereas math.sqrt(9) works.



* In general, the Python standard library provides loads of useful modules for all sorts of things: https://docs.python.org/3/py-modindex.html 

* Standard library packages are installed as part of a default Python installation - they are part of every Python of that version (e.g. 3.XX)



In [17]:
# For example, the random module from the standard library provides loads of functions 
# for making random numbers/choices - this is useful for games, algorithms
# and machine learning where stochastic behaviour is needed

import random

def make_random_ints(num, lower_bound, upper_bound):
  """
  Generate a list containing num random ints between lower_bound
  and upper_bound. upper_bound is an open bound.
  """
  rng = random.Random()  # Create a random number generator
  
  # Makes a sequence of random numbers using rng.randrange()
  return [ rng.randrange(lower_bound, upper_bound) for i in range(num) ]

make_random_ints(10, 0, 6) # Make a sequence of 10 random numbers, each in the
# interval [0, 6)

[3, 5, 0, 3, 3, 5, 0, 0, 4, 0]

* There is a much larger universe of open source Python packages you can install from: https://pypi.org/ 

# Challenge 2

In [2]:
# Use the median function from the statistics module to calculate the median of the following list:
l = [ 1, 8, 3, 4, 2, 8, 7, 2, 6 ]


4

# Namespaces and dot notation

* To recap, the namespace is all the identifiers (variables, functions, classes (to be covered soon), and modules) available to a line of code (see previous notes on scope and namespace rules).

* In Python (like most programming languages), namespaces are organized hierarchically into subpieces using modules and functions and classes. 

* If all identifiers were in one namespace without any hierarchy then we would get lots of collisions between names, and this would result in ambiguity. (see Module1.py and Module2.py example in textbook: http://openbookproject.net/thinkcs/python/english3e/modules.html)

* The upshot is if you want to use a function from another module you need to import it into the "namespace" of your code and use '.' notation:

In [19]:
import math # Imports the math module into the current namespace

# The '.' syntax is a way of indicating membership
math.sqrt(2) # sqrt is a function that "belongs" to the math module

# (Later we'll see this notation reused with objects)

1.4142135623730951

# Import statements

* As you've seen, to import a module just write "import x", where x is the module name.

**Import from**

* You can also import a specific function, class or object from a module into your program's namespace using the import from syntax:

In [20]:
from math import sqrt

sqrt(2.0) # Now sqrt is just a function in the current program's namespace,
# no dot notation required

1.4142135623730951

If you want to import all the functions from a module you can use:

In [21]:
from math import * # Import all functions from math


# But, this is generally a BAD IDEA, because you need to be  sure
# this doesn't bring in things that will collide with other things
# used by the program

log(10)
#etc.

2.302585092994046

More useful is the "as" modifier

In [22]:
from math import sqrt as square_root # This imports the sqrt function from math
# but names it square_root. This is useful if you want to abbreviate a long function
# name, or if you want to import two separate things with the same name

square_root(2.0)

1.4142135623730951
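
The comment above mentions importing two separate things with the same name. As a small sketch of why that matters: both `math` and `cmath` (the standard library module for complex numbers) define a function called `sqrt`. The names `clobbered`, `real_sqrt` and `complex_sqrt` are made up for this example.

```python
from math import sqrt
from cmath import sqrt   # silently replaces math's sqrt in our namespace

clobbered = sqrt(4)      # this now calls cmath.sqrt, which returns a complex number
print(clobbered)

# Using "as" lets both functions live side by side under different names
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(real_sqrt(4))
print(complex_sqrt(-4))
```

Python gives no warning when a later import overwrites an earlier name, which is the same risk that makes `from math import *` a bad idea.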

# Challenge 3

In [3]:
# Write a statement to import the 'beep' function from the 'curses' module


Help on built-in function beep in module _curses:

beep()
    Emit a short attention sound.



# Writing your own modules

You can write your own modules. 

* Create a file whose name is x.py, where x is the name of the module you want to create.

* Edit x.py to contain the stuff you want

* Create a new Python file, call it y.py, in the same directory as x.py and include "import x" at the top of y.py.
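
The steps above can be sketched in a single runnable cell. To avoid leaving files lying around, this example writes the module into a temporary folder and adds that folder to `sys.path` by hand; in normal use you would simply save the two files next to each other. The module name `my_shapes_demo` and the function `square_area` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The contents of our module file (this plays the role of x.py)
MODULE_SOURCE = '''
def square_area(side):
    """Return the area of a square with the given side length."""
    return side * side
'''

with tempfile.TemporaryDirectory() as folder:
    # Create the module file and put some code in it
    Path(folder, "my_shapes_demo.py").write_text(MODULE_SOURCE)

    # Python searches the directories in sys.path for modules. A script's own
    # directory is on that list automatically, which is why x.py and y.py
    # should sit together; here we add the temporary folder ourselves.
    sys.path.insert(0, folder)
    try:
        importlib.invalidate_caches()
        # Same effect as writing "import my_shapes_demo" at the top of y.py
        my_shapes_demo = importlib.import_module("my_shapes_demo")
        area = my_shapes_demo.square_area(3)
    finally:
        sys.path.remove(folder)

print(area)
```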

(NOTE: do demo)



**Packages**

Packages are collections of modules, organized hierarchically (and accessed using the dot notation).

Beyond the scope here, but you can look more at environment setup to create your own "packages". If you're curious see: https://docs.python.org/3/tutorial/modules.html#packages

# The main() function

* You may write a program and then want to reuse some of the functions by importing them into another program. In this case you are treating the original program as a module. 

* The problem is that when you import a module it is executed. 

* Question: How do you stop the original program from running when you import it as a module?

* Answer: By putting the logic for the program in a "main()" function, which is only called if the program is being run directly by a user, not imported as a module.




In [26]:
def some_useful_function():
  """Defines a function that would be useful to 
  other programs outside of main"""
  pass

def main():
  # Put the program logic in this function
  x = input()
  print("python main function, x is:", x)

if __name__ == '__main__': # This is only true when the file is run
    # directly as a program, not when it is imported as a module
    main()

5
python main function, x is: 5


In [29]:
print(__name__) # The name of the current module
type(__name__)

__main__


str
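
To see the other side of this, the sketch below imports a module that contains the `__name__` check and confirms that its `main()` is not run. The module is written to a temporary folder so the cell is self-contained; the module name `guard_demo` and the variables `calls` and `name_seen` are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = '''
calls = []             # records whether main() was ever called
name_seen = __name__   # what __name__ was while this file was executing

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "guard_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, folder)
    try:
        importlib.invalidate_caches()
        guard_demo = importlib.import_module("guard_demo")  # like "import guard_demo"
    finally:
        sys.path.remove(folder)

print(guard_demo.name_seen)   # the module's own name, not "__main__"
print(guard_demo.calls)       # empty, because main() was skipped on import
```

When a file is imported, `__name__` is set to the module's name, so the check fails and only the definitions are executed.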

**Live demo!**

# PEP8: Use Style


It is easy to rush and write poorly structured, hard-to-read code. However, this generally proves a false economy, resulting in longer debug cycles, a larger maintenance burden (like, what was I thinking?) and less code reuse.

Although many sins have nothing to do with the cosmetics of the code, some can be fixed by adopting a consistent, sane set of coding conventions. Python did this with Python Enhancement Proposal (PEP) 8:

https://www.python.org/dev/peps/pep-0008/

Some things PEP-8 covers:

* use 4 spaces (instead of tabs) for indentation - you can make your text editor do this (insert spaces for tabs)
* limit line length to 79 characters
* when naming identifiers, use CamelCase for classes (we’ll get to those) and lowercase_with_underscores for functions and variables
* place imports at the top of the file
* keep function definitions together
* use docstrings to document functions
* use two blank lines to separate function definitions from each other
* keep top level statements, including function calls, together at the bottom of the program
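
As a rough illustration of how those points look together, here is a tiny program laid out along PEP 8 lines. The function names `circle_area` and `describe_circle` are invented for this sketch; the layout is the point, not the program itself.

```python
"""Tiny example laid out following the PEP 8 points listed above."""

import math  # imports go at the top of the file


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


def describe_circle(radius):
    """Return a one-line description of a circle."""
    return f"radius {radius}: area {circle_area(radius):.2f}"


def main():
    """Run the program logic."""
    print(describe_circle(2.0))


# Top-level statements are kept together at the bottom of the program
if __name__ == "__main__":
    main()
```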

# Debugging Revisited

We mentioned earlier that a lot of programming is debugging. Now we're going to debug programs and understand the different errors you can get.

There are three principal types of error:
 - syntax errors
 - runtime errors
 - semantic/logical errors

# Syntax Errors

* when what you've written is not valid Python

In [30]:
# Syntax errors - when what you've written is not valid Python

for i in range(10)
  print(i) # What's wrong with this?

SyntaxError: invalid syntax (3569576470.py, line 3)

In [31]:
# Syntax errors - when what you've written is not valid Python

for i in range(10):
print(i) # What's wrong with this?

IndentationError: expected an indented block (2681185069.py, line 4)

In [32]:
# Syntax errors - when what you've written is not valid Python
for i in range(10):
  """ This loop will print stuff ""
  print(i)


SyntaxError: EOF while scanning triple-quoted string literal (2118504203.py, line 4)

In [33]:
# Syntax errors - when what you've written is not valid Python 
# (note, this kind of print statement was legal in Python 2.XX and earlier)

print "Forgetting parentheses"

SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Forgetting parentheses")? (839321298.py, line 4)

# Runtime Errors

* when the program crashes during runtime because it tries to do something invalid

In [34]:
# Runtime errors - when the program errors out during runtime because it 
# tries to do something invalid

print("This is an integer: " + 10)

TypeError: can only concatenate str (not "int") to str
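
The error message says that a str can only be joined to another str. Here are three common ways to fix the line above; which one you pick is mostly a matter of taste. The variable names are just for this sketch.

```python
number = 10

# Option 1: convert the integer to a string before concatenating
message_a = "This is an integer: " + str(number)

# Option 2: let an f-string do the conversion for you
message_b = f"This is an integer: {number}"

print(message_a)
print(message_b)

# Option 3: pass the pieces to print as separate arguments
print("This is an integer:", number)
```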

In [35]:
# Runtime errors - when the program errors out during runtime because it 
# tries to do something invalid

assert 1 + 1 == 3

AssertionError: 

# Semantic Errors (aka Logical Errors)

* when the program runs and exits without error, but produces an unexpected result

In [36]:
# Semantic errors - when the program runs and exits without error, 
# but produces an unexpected result

j = int(input("Input a number: "))

x = 1
for i in range(1, j): # should be range(1, j+1):
  x = x * i
  
print(str(j) + " factorial is " + str(x))

Input a number: 3
3 factorial is 2
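
For comparison, here is a corrected version of the loop, wrapped in a function so that it can be checked against `math.factorial` from the standard library. The wrapper function `factorial` is an addition for this sketch; the fix itself is only the `j + 1` in the range.

```python
import math

def factorial(j):
    """Return j! for a non-negative integer j."""
    x = 1
    for i in range(1, j + 1):   # j + 1 so that j itself is included
        x = x * i
    return x

# Compare against a trusted implementation for a few small inputs
for j in range(6):
    assert factorial(j) == math.factorial(j), f"mismatch at j={j}"

print(str(3) + " factorial is " + str(factorial(3)))
```

Checking against a known-good answer like this is often the quickest way to expose a semantic error.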


In my experience, syntax errors are easy to fix and runtime errors are generally quick to solve, but semantic errors can take the longest to fix.

**Debug strategies**

To debug a failing program, you can:
  * Use print statements dotted around the code to figure out what code is doing at specific points of time (remember to remove / comment these out when you're done!)
  * Use a debugger - this allows you to step through execution, line-by-line, seeing what the program is up to at each step. (PyCharm has a nice interface to the Python debugger)
  * Write unit-tests for individual parts of the code
  * Use assert to check that expected properties are true during runtime
  * Stare hard at it! Semantic errors will generally require you to question your program's logic.
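
A few of these strategies can be sketched together in one small example. The function `average` and the test function `test_average` are hypothetical; the commented-out print shows where a temporary debugging print might go.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # An assert with a message documents and checks an expected property
    assert len(values) > 0, "average() needs at least one value"
    total = sum(values)
    # print("DEBUG total:", total, "count:", len(values))  # temporary debug print
    return total / len(values)

def test_average():
    """A minimal unit test: check the function on inputs with known answers."""
    assert average([2, 4, 6]) == 4
    assert average([5]) == 5

test_average()

# See what happens when the expected property does not hold
try:
    average([])
except AssertionError as error:
    caught = str(error)

print(caught)
```

Without the assert, the empty list would fail later with a less helpful ZeroDivisionError; the assert reports the real problem at the point where it occurs.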

# Challenge 4

See if you can get this to work:

In [4]:
import time

# Try debugging the following - a number guessing program
# It has all three types of errors

print("Think of a number from 1 to 100")

time.sleep(3)

min = 1
max = 100

while max == min
  i = (min + max) // 2
  
  answer = input("Is your number greater than " + str(i) + " Type YES or NO: ")
  
  assert answer == "YES" or answer == "YES" # Check the value is what we expect
  
  if answer == "YES":
    min = i+1
  else:
   max = i

print("Your number is: " + str(min))
  
  

Think of a number from 1 to 100
Is your number greater than 50 Type YES or NO: YES
Is your number greater than 75 Type YES or NO: NO
Is your number greater than 63 Type YES or NO: NO
Is your number greater than 57 Type YES or NO: YES
Is your number greater than 60 Type YES or NO: NO
Is your number greater than 59 Type YES or NO: YES
Your number is: 60


# Reading

Open book chapter 12: http://openbookproject.net/thinkcs/python/english3e/modules.html


# Homework

ZyBook Reading 10


# Practice Problems

In [None]:
import math

# Problem 1: Generator Function
def fibonacci_generator(n):
    """
    Create a generator function that yields the first n numbers in the Fibonacci sequence.
    Each number in the sequence is the sum of the two preceding numbers.
    The sequence starts with 0, 1, 1, 2, 3, 5, 8, 13, ...
    
    Args:
        n (int): Number of Fibonacci numbers to generate
        
    Yields:
        int: Next number in Fibonacci sequence
    """
    # Your code here
    pass

# Test Problem 1
assert list(fibonacci_generator(1)) == [0]
assert list(fibonacci_generator(3)) == [0, 1, 1]
assert list(fibonacci_generator(7)) == [0, 1, 1, 2, 3, 5, 8]

In [None]:
# Problem 2: Generator Expression
def sum_squares_up_to(n):
    """
    Use a generator expression to calculate the sum of squares of numbers from 1 to n.
    For example, if n = 3, calculate 1^2 + 2^2 + 3^2 = 14
    
    Args:
        n (int): Upper bound (inclusive)
        
    Returns:
        int: Sum of squares
    """
    # Your code here
    pass

# Test Problem 2
assert sum_squares_up_to(1) == 1
assert sum_squares_up_to(3) == 14
assert sum_squares_up_to(5) == 55

In [None]:
# Problem 3: Module Import and Usage 
def calculate_circle_area(radius):
    """
    Use the math module to calculate the area of a circle with given radius.
    Remember that area = π * r^2
    
    Args:
        radius (float): Radius of the circle
        
    Returns:
        float: Area of the circle rounded to 2 decimal places
    """
    # Your code here
    pass

# Test Problem 3
assert calculate_circle_area(1.0) == 3.14
assert calculate_circle_area(2.0) == 12.57
assert calculate_circle_area(3.0) == 28.27

In [None]:
# Problem 4: Error Handling
def safe_divide(a, b):
    """
    Implement division that handles potential errors.
    If b is 0, return "Cannot divide by zero"
    If either a or b are not integers, return "Invalid input"
    Otherwise return a/b rounded to 2 decimal places
    Hints: 
    (1) the isinstance built in function could be useful, 
    i.e. isinstance(a, int)  will test if a is an integer
    (2) to round a number to a given precision see the builtin round function
    
    Args:
        a: First number
        b: Second number
        
    Returns:
        float or str: Result of division or error message
    """
    # Your code here
    pass

# Test Problem 4
assert safe_divide(10, 2) == 5.0
assert safe_divide(10, 0) == "Cannot divide by zero"
assert safe_divide("10", 2) == "Invalid input"

In [None]:
# Problem 5: Debug the Function
def count_vowels(text):
    """
    Count the number of vowels (a, e, i, o, u) in a string.
    Should be case-insensitive.
    
    Args:
        text (str): Input string
        
    Returns:
        int: Number of vowels in the string
    """
    # This code has bugs! Fix them!
    vowels = ['a','e','i','o']
    count = 0
    for char in text
        if char in vowels:
            count += 1
    return count

# Test Problem 5
assert count_vowels("hello") == 2
assert count_vowels("PYTHON") == 1
assert count_vowels("aEiOu") == 5

print("All tests passed!")