# PD 2: The Cool(er) Parts Of Python

Written by Tim Nadolsky
Welcome to the second PD in the AIM summer intro assignment series. This assignment teaches some of the non-standard basics of Python needed in this lab.

This PD assumes that you have a basic working knowledge of Python (or can quickly pick up Python coming from some other language). In particular, we assume that you know how to write variables (identifying their scope upon request), if statements, for/while loops, functions, return statements, among other things.

By the end of this assignment, you should know how to work with:
- Basic polymorphism and inheritance in Python
- "Concurrency" using Python's *threading* library, plus the limitations and advantages of Python's GIL
- Basic UnitTests in Python
- Basic ABSL flags for running Python files from the command line
- *numpy* and *matplotlib.pyplot* (Review)

## Part 1: Inheritance and Object-Oriented Programming

To organize our code better, our lab uses many object-oriented features of Python. Although these features have been "tacked-on" compared to other OOP languages such as Java, they are still useful for organizing and thinking about code. 

### 1.1. Classes

Here is an example of a class in Python. We can make classes to store collections of methods and variables for easy use.

In [1]:
class ClassA:
    variable = 3
    def methodA():
        print("methodA")
    def methodB():
        print("methodB")

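We can call a method directly on the class using the ``.`` symbol (this works here because the methods take no arguments and we call them on the class itself rather than on an instance):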

In [2]:
ClassA.methodA()

methodA


Calling a class like a function (e.g. ``ObjectA()``) creates a new "instance" of the object our class represents. If we give the class an ``__init__`` method, Python runs it on every new instance, which makes it the natural place to set the instance up.

In [3]:
class ObjectA:
    def __init__(self): # The constructor - this method is called with ObjectA() and makes a new copy of ObjectA
        print("Init'ed")
    def methodA(self): # An instance method
        print("My methodA")

We can make a new copy of ``ObjectA`` with the following code:

In [4]:
obj = ObjectA()

Init'ed


And run its ``methodA`` with the following code:

In [5]:
obj.methodA()

My methodA


You may have noticed that each of the method definitions contains an extra argument ``self``, yet we never pass it explicitly when calling the constructor or the method. ``self`` is a reference to the object itself - when you call an instance method of an object (e.g. ``obj.methodA()``), Python calls ``ObjectA.methodA`` and passes ``obj`` in as the ``self`` parameter.

Thus, all instance methods must have the ``self`` parameter as the first argument; without it, calling the method on an instance raises a ``TypeError``, because Python still passes the instance in as the first argument.
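To make the role of ``self`` concrete, here is a small sketch showing that calling a method through an instance and calling it through the class (passing the instance by hand) do the same thing. The ``Greeter`` class is a made-up example, not part of the lab code.

```python
class Greeter:  # hypothetical class, for illustration only
    def greet(self):
        # `self` is whichever instance the method was called on
        return "hello from " + type(self).__name__

g = Greeter()

# Calling through the instance: Python fills in `self` for us.
via_instance = g.greet()

# Calling through the class: we pass the instance explicitly as `self`.
via_class = Greeter.greet(g)

print(via_instance == via_class)  # True
```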

We can also set instance variables via the following:

In [6]:
class ObjectB:
    def __init__(self, inVal):
        self.val = inVal
    def out(self):
        print(self.val)

In [7]:
objB = ObjectB(3)
objB.out()

3


Essentially, you can think of ``self`` as a bag to which you can attach any number of variables.
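One consequence worth seeing in code: variables attached to ``self`` belong to that one instance only. The sketch below uses a hypothetical ``Point`` class to show two instances holding independent values, and an attribute being attached after construction.

```python
class Point:  # hypothetical class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
q = Point(10, 20)

p.x = 5                  # changes only p; q is untouched
p.label = "my point"     # attributes can also be attached after construction

print(p.x, q.x)              # 5 10
print(hasattr(q, "label"))   # False
```

Attaching attributes outside ``__init__`` works, but it is usually easier to read code where every instance variable is first set in ``__init__``.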

### 1.2 Inheritance

Now that we have classes and objects in Python, we can do some interesting things. Here is our first example of inheritance at work:

In [8]:
class Animal:
    def __init__(self, name):
        self.name = name
    def eat(self):
        print("Crunch crunch crunch")
    def make_sound(self):
        print("Thump thump thump")
    def respond_to_name(self, inName):
        if self.name == inName:
            print("!")

class Dog(Animal):
    def __init__(self, name):
        super().__init__(name)
    def make_sound(self):
        print("Woof?")

In [9]:
ani = Animal("Pop")
ani.make_sound()

Thump thump thump


In [10]:
dog = Dog("Peep")
dog.eat()
dog.make_sound()
dog.respond_to_name("Peep")

Crunch crunch crunch
Woof?
!


There's a lot to unpack here. First, we set up a base class called ``Animal`` with some instance methods and variables. 

Then, we create a subclass called ``Dog`` which inherits from ``Animal`` using the syntax ``class Dog(Animal)``. 

What this means is that ``Dog`` gets all of ``Animal``'s instance methods; in fact, it actually is an ``Animal`` as well. 

Note that because ``Dog`` defines its own ``__init__``, we also have to call the ``__init__`` method of the ``Animal`` superclass to initialize the ``Dog`` as an ``Animal`` (the line ``super().__init__(name)``).

This creates some interesting behavior.
1. We can call the superclass method ``eat()`` from ``Dog`` without having to cast ``dog`` to be an ``Animal`` - it just works.
2. When we call ``make_sound`` on ``dog``, ``Dog``'s ``make_sound`` method overrides the default ``Animal``'s method.
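The claim that a ``Dog`` "actually is an ``Animal``" can be checked directly with the built-ins ``isinstance`` and ``issubclass``. This is a trimmed-down sketch of the classes above; the methods return strings instead of printing, and ``Dog`` deliberately has no ``__init__`` of its own, to show that the constructor is inherited too.

```python
class Animal:
    def __init__(self, name):
        self.name = name
    def make_sound(self):
        return "Thump thump thump"

class Dog(Animal):
    # No __init__ here, so Animal.__init__ is inherited as-is
    def make_sound(self):
        return "Woof?"

dog = Dog("Peep")

print(isinstance(dog, Dog))      # True
print(isinstance(dog, Animal))   # True
print(issubclass(Dog, Animal))   # True
print(dog.name)                  # Peep
```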

This behavior allows us to make modular code - that is, we can simply make one overarching "superclass" for many different types of subclasses, all with shared-ish methods that play together nicely. This will become clearer with the next example:

### 1.3 Polymorphism

In [11]:
class Cat(Animal):
    def __init__(self, name):
        super().__init__(name)
    def make_sound(self):
        print("mrawwww")

In [12]:
cat = Cat("Boss")
cat.make_sound()

mrawwww


Notice that we can call ``make_sound()`` on both ``cat`` and ``dog`` (with no crashes or undefined behavior) without knowing whether either is a ``Cat``, ``Dog`` or ``Animal``, as long as we can guarantee that both are at least ``Animal`` instances. Yet each performed the same action in a way appropriate to its specific class. In object-oriented programming, we call this **polymorphism**.
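Where this pays off is in code that handles a mixed collection. The sketch below restates the three classes so that it runs on its own (returning strings instead of printing) and adds a hypothetical helper, ``chorus``, that never checks which kind of animal it was given.

```python
class Animal:
    def make_sound(self):
        return "Thump thump thump"

class Dog(Animal):
    def make_sound(self):
        return "Woof?"

class Cat(Animal):
    def make_sound(self):
        return "mrawwww"

def chorus(animals):
    """Collect the sound of every animal without checking its concrete type."""
    return [animal.make_sound() for animal in animals]

sounds = chorus([Animal(), Dog(), Cat()])
print(sounds)  # ['Thump thump thump', 'Woof?', 'mrawwww']
```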

Some may wonder: can you access the parent's method once the child has its own method? The answer is yes, and you can do it using the ``super(<Subclass>, <subclass object reference>)`` call, which looks up the method starting from the parent of ``<Subclass>``:

In [13]:
super(Cat, cat).make_sound()

Thump thump thump
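More commonly, ``super()`` is used inside the overriding method itself, so the child can extend the parent's behavior rather than replace it. A minimal sketch, assuming a made-up ``LoudCat`` subclass and methods that return strings instead of printing:

```python
class Animal:
    def make_sound(self):
        return "Thump thump thump"

class LoudCat(Animal):  # hypothetical subclass, for illustration only
    def make_sound(self):
        # Reuse the parent's result, then add to it
        base = super().make_sound()
        return base + " ... MRAWWWW"

loud = LoudCat()
result = loud.make_sound()
print(result)  # Thump thump thump ... MRAWWWW
```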


## Part 2: "Concurrency" using Python's *threading* library

In this portion of the tutorial, we will be exploring how to get Python to do multiple tasks "at once" using Python's *threading* library.

In [14]:
import threading
import time

### 2.1 Pizza factories

Roughly speaking, each thread is either running its own piece of code (sequentially according to its instructions) or waiting for some other thread/process to complete so it can start running its code.

As an illustration: consider a home cook making a pizza from scratch versus a factory. The home cook has to proceed mostly in-order to the recipe, making the dough and sauce, grating the cheese, assembling the pizza, and baking it. Since the factory has more workers (threads), they can delegate one person to cook the sauce, one person to make pizza bases, one person to put cheese on pizzas, etc. 

This has several benefits: for one, each of the workers' tasks are very simple. For example, the person putting cheese on pizzas only has to put cheese on pizzas, which only requires a measurement of the amount of cheese before putting it on. Compare this to the home cook, who has to plan and execute the entire recipe from start to finish, which is much harder.

Second (in an ideal world), the latency cooking the first pizza is about the same for both parties (since the pizza's ingredients take the same amount of time to cook in both cases). However, due to the fact that the factory has far more workers, they can start prepping many more pizzas while the first few are baking, increasing their average throughput drastically. The home cook can cook maybe a few pizzas concurrently at most with *very* careful planning, but they are ultimately limited by their physical ability to cook the pizzas and their ingredients (e.g. they can't pay attention to the sauce while they are busy grating the cheese, for example).

### 2.2 Threading (for real)

In computing terms: threading has multiple benefits - first, one large, bulky, hard-to-write for(ever) loop can be broken down into multiple simple-to-write threads, which dynamically share computing resources as they are needed by each thread. This makes writing and reading code much easier.

Also, in an ideal world, threading drastically improves throughput in parallelizable workloads - that is, if a task can be broken down into smaller parts which can be run in parallel, threading improves the performance of these tasks. However, you will quickly find that there are significant limitations to how well this works in practice, even without the limitations of Python.

To illustrate how this works in practice, let's use an example of 100 threads trying to increment a global variable called ``counter``.

In [15]:
# Make a bunch of global variables to store variables (not good practice normally)
# (At module level these statements are no-ops; they just document the shared names.)
global counter
global finished
global threshold

In [16]:
# An object class for a thread which increments counter until it reaches the correct value
class incrementerThread(threading.Thread):            
    def __init__(self):                              
        super(incrementerThread, self).__init__()
        self.stopped = False
        self.counter = 0
    
    # This method is called when incrementerThread.start() is called in the main method
    def run(self):           
        global counter
        global finished
        global threshold
        while not self.stopped:
            if counter < threshold and not finished:
                counter = counter + 1
                self.counter += 1
            else:
                self.stopped = True
                finished = True
            # time.sleep(0)

In [17]:
# Set variables
counter = 0
finished = False
threshold = 1e7
num_threads = 100

# Make a list of all the threads
threadList = []

# Fill the list with 100 threads
for i in range(num_threads):
    threadList.append(incrementerThread())

# Start all the threads
for thread in threadList:
    thread.start()
     
# while not finished:
    # time.sleep(1)
    
# Print the answer
print("Counter =", counter)

# Print information about the threads
non_running_threads = []
for i in range(num_threads):
    if threadList[i].counter != 0:
        print("Thread %d incremented the counter %d times" % (i, threadList[i].counter))
    else:
        non_running_threads.append(i)

print("Thread(s) ", end="")
for i in non_running_threads:
   print("%d, " % (i), end="")
print("didn't increment the counter at all")

Counter = 10000000
Thread 0 incremented the counter 2220414 times
Thread 1 incremented the counter 2033238 times
Thread 2 incremented the counter 1771470 times
Thread 3 incremented the counter 1374552 times
Thread 4 incremented the counter 1590786 times
Thread 5 incremented the counter 667450 times
Thread 6 incremented the counter 204347 times
Thread 7 incremented the counter 137743 times
Thread(s) 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, didn't increment the counter at all


Normally, if this were a PD on C++, C, Java, or nearly any other language besides Python, you would see a lengthy 6+ paragraph section on how the number above is almost never 10 million exactly. (For those interested, this is due to memory access issues caused by having multiple threads write to the same object concurrently without a mutex or lock.)

However, in Python, it is very challenging to actually get the number to differ from the correct answer of 10 million (which would indicate a non-thread-safe operation) without writing unsafe code on purpose.

### 2.3 The GIL and atomic operations

The above behavior is due to Python's so-called **global interpreter lock** (GIL). In simple terms, the GIL only allows one thread at a time to execute Python code. So, for CPU-bound work like the counter above, your so-called "multi-threaded" program only achieves single-threaded performance!

However, the GIL is not as bad as it may seem on the surface. Because there is only one thread running at a time, it is near impossible for Python to ever concurrently access the same memory and run into concurrency/mutex issues like in other threaded languages such as Java. Effectively, this makes it very hard for programmers to write incorrectly threaded Python code (in terms of output correctness), which makes development simple.

In practice, many single operations on built-in Python datatypes (for example, ``list.append`` or assigning to a dictionary key) are effectively **atomic** - that is, once the operation starts, no other thread can touch the object until it finishes. Compound read-modify-write statements such as ``counter = counter + 1`` are not guaranteed to be atomic, so the example above is very unlikely to miscount but is not strictly guaranteed to be exact. More information is available [here](https://docs.python.org/3/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe) from the Python documentation.
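The FAQ linked above distinguishes single operations from read-modify-write sequences such as ``i = i + 1``, which it lists as not atomic. When a shared count has to be exactly right, the usual safeguard is a ``threading.Lock``. Here is a minimal sketch; ``SafeCounter`` and ``add_many`` are made-up names, not lab code.

```python
import threading

class SafeCounter:  # hypothetical helper, for illustration only
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can be inside this block
        with self._lock:
            self.value += 1

def add_many(counter, n):
    for _ in range(n):
        counter.increment()

shared = SafeCounter()
workers = [threading.Thread(target=add_many, args=(shared, 10_000)) for _ in range(8)]

for w in workers:
    w.start()
for w in workers:
    w.join()  # wait for every worker to finish before reading the result

print(shared.value)  # 80000
```

The lock costs a little speed on every increment, which is the usual trade-off for a guaranteed-correct result.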

Also, it should be noted that while Python's GIL means it's effectively single threaded, there are several ways to achieve more than single-threaded performance out of Python.
1. Many Python libraries are APIs to other languages in disguise (e.g. PyTorch maps Python to CUDA/C++). With these libraries, all Python has to do is make a few super-fast API calls to the other language, which has no issues using multiple threads efficiently. (Note: your other language may not be thread-safe.)
2. If you REALLY want to process multiple things in Python at once, the *multiprocessing* library allows you to manage multiple Python instances from one main Python process. (This is usually a last resort due to the large cost of transferring data between processes.)

### 2.4 Resource hogging

Note that while the above prevents programmers from writing code that produces incorrect results, it does not stop programmers from writing inefficient or even terminally slow code!

If you read the thread counts above to see which threads did the most work, you'll see that the threads initialized first get most (or all) of the work done, while the other threads barely even get a chance to do anything.

As an exercise, uncomment the line ``time.sleep(0)``, as well as the lines ``while not finished: time.sleep(1)``. The run will take far longer than the original, but the threads will have much more equal utilization. (And the counter will still be correct!)

This is a case of what I have termed "resource hogging" in Python: if you don't put a ``time.sleep`` in your thread's ``run`` method (or exclusively use calls that either context switch away from the thread OR are guaranteed to complete extremely quickly), threads with long-running operations will suck all the resources from other threads before they can even start theirs.

### 2.5 Stop conditions in Python

One more thing about threading: it's SUPER important to ALWAYS put a way to terminate your thread in its definition. In the above case, the threads stopped automatically once the counter reached the correct value. In "production" Mus2Vid/Companion code, the threads contain a ``stop_request`` attribute which kills the thread when set to ``True`` by an outside source.

Python has no built-in way to forcibly kill a running thread, so it is SUPER important that your threads stop themselves, both when they're done working and when they're asked to by an outside program.
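A stop condition of this kind can be sketched as follows. This is not the actual Mus2Vid/Companion code - only the ``stop_request`` attribute name comes from the description above, and the ``PollingThread`` class and the sleep durations are assumptions made for illustration.

```python
import threading
import time

class PollingThread(threading.Thread):  # hypothetical class, for illustration only
    def __init__(self):
        super().__init__()
        self.stop_request = False
        self.iterations = 0

    def run(self):
        # Check the flag on every pass so an outside request is noticed quickly
        while not self.stop_request:
            self.iterations += 1
            time.sleep(0.01)  # give other threads a chance to run

worker = PollingThread()
worker.start()

time.sleep(0.1)              # let the thread do some work
worker.stop_request = True   # ask the thread to stop itself
worker.join(timeout=2.0)     # wait (at most 2 s) for it to actually finish

print(worker.is_alive())     # False
```

The standard library's ``threading.Event`` is a common alternative to a plain boolean flag if you also want the thread to be able to wait on the signal.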

## Part 3: UnitTests

We're nearing the home stretch of this PD now! Next up are UnitTests - simple ways to test your code in Python automatically.

UnitTests are super simple: you run them, and if one of the checks you define fails (or the code under test raises an exception), the failure is reported to the console while the other tests still run (and potentially catch errors of their own). Here is a mock unit test - note that you will need to use the included file called ``unittest_example.py``, or copy the code below into a separate Python file to run it properly.

In [None]:
# Modified from https://docs.python.org/3/library/unittest.html

import unittest

class TestStringMethods(unittest.TestCase):       # Your testing class MUST extend unittest.TestCase to be test-able

    def test_upper(self):                         # All test methods must start with test_XXXXXX
        self.assertEqual('foo'.upper(), 'FOO')    # self.assertEqual(X, Y) fails the test if X doesn't equal Y

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())          # self.assertTrue(X) fails the test if X isn't True
        self.assertFalse('Foo'.isupper())         # self.assertFalse(X) fails the test if X isn't False
        
    def test_bad(self):                           # This test should fail.
        self.assertTrue(1 == 2)

if __name__ == '__main__':                        # You need this last bit of code to run the test from the VSCode "play" button or the command line
    unittest.main()

You should use UnitTests any time you are writing at least a medium-sized project - simply import your libraries, write a few test cases and run them every time you make a decent-sized change to your project's code.
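As a slightly more realistic sketch, here is a test class for a small function of your own, including a check that bad input raises the expected exception. ``safe_divide`` is a made-up function for illustration. The last two lines run the tests programmatically, which is one way to run them from inside a notebook cell instead of a separate file.

```python
import unittest

def safe_divide(a, b):  # hypothetical function under test
    if b == 0:
        raise ValueError("b must be nonzero")
    return a / b

class TestSafeDivide(unittest.TestCase):

    def test_exact(self):
        self.assertEqual(safe_divide(6, 3), 2)

    def test_float(self):
        # assertAlmostEqual avoids false failures from floating-point rounding
        self.assertAlmostEqual(safe_divide(1, 3), 0.3333333, places=6)

    def test_zero_divisor(self):
        # The test passes only if the block raises ValueError
        with self.assertRaises(ValueError):
            safe_divide(1, 0)

# Build and run the suite without exiting the interpreter
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSafeDivide)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```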

One final thing about UnitTests: Prof. Raymond Yeh (in Purdue's CS department) explained the utility of UnitTests really well with this paraphrased passage:
Most people say they don't have time to write UnitTests. However, if you don't have time to write UnitTests and debug properly, you definitely don't have time to NOT write UnitTests (and debug by searching for a needle-in-a-haystack-type bug).

## Part 4: ABSL flags

One last little thing which will help make your life easy while developing Python code!

Abseil (or ABSL) is a [collection of Python library code for building Python applications. The code is collected from Google's own Python code base, and has been extensively tested and used in production.](https://pypi.org/project/absl-py/)

One of the cool things that ABSL can do is allow you to use command-line flags easily when running your code. Here is some example code - note that, as with the UnitTests, you will have to use the included file called ``absl_example.py`` or copy the code below into a separate Python file to run it properly.

In [None]:
from absl import app, flags

# Flags are the command line arguments your program is expecting
# The syntax follows the format (name, default, (enum options if an enum), help_text)
# Several common options are:
# Enum: choose from a few preset programmer-defined choices
# String: self explanatory
# Integer: self explanatory
flags.DEFINE_enum('cool_letter', 'A', ['A', 'B','C'], 'A cool letter (one of the first three in the English alphabet)')
flags.DEFINE_string('name', 'Marty McFly', None)
flags.DEFINE_integer('number', 0, None)

# For programming ease
FLAGS = flags.FLAGS

def main(argv):
    print("Hi my name is %s, my favorite letter is %s and my number is %d!" % (FLAGS.name, FLAGS.cool_letter, FLAGS.number))

# Need this bit to run the code properly
if __name__ == "__main__":
    app.run(main)

The syntax to run the code with the flags properly is:
``python <filename>.py -cool_letter <cool_letter> -name <name> -number <number>``

You can include/omit as many flags as you want because each one has a default specified in the program.

## Reminder

So far in this PD activity, you should have learned the following:
- Basic polymorphism and inheritance in Python
- "Concurrency" using Python's *threading* library, plus the limitations and advantages of Python's GIL
- Basic UnitTests in Python
- How to set basic ABSL flags for running Python files from the command line

If not, please go back and re-read some of the sections and play with the examples to learn more.

## Exercise

The below assignment assumes you've used *numpy* and *matplotlib* before, or can at least find your way around the documentation well enough to do the assignment. You may also need some (very basic) linear algebra knowledge, as well as some basic analytic geometry (e.g. the kind you learn in Calculus here).

There are three files included in the GitHub repo for this assignment which you have to write code for: ``image_generator.py``, ``generation_tests.py``, and ``generator_thread.py``, the last of which is also your main file. These files essentially serve as a *very* janky version of real-time image generation - however, instead of using diffusion models, we're simply doing cool-looking things in *numpy* and *matplotlib* by hand instead.

Finish writing the code for these files as best you can, and submit these three files as your final submission.