# 1. Introduction

So far, we've seen how the CPU runs code in sequence, and how control flow statements (like if and else) can change the order in which it executes statements. However, `we can also write programs that execute more than one instruction at a time`. **A multi-core CPU has the ability to run multiple instructions simultaneously.** The desire to take advantage of modern, multi-core CPUs has given rise to a technique called **parallel processing**, which is very useful in data science.

Parallel processing can be powerful, but it also presents many unique challenges. `When multiple processes share data, it's important to manage which process has access to the data, and when, so that the data doesn't become corrupted`. It's also important to think carefully through how parallel processes will execute, `because executing multiple instructions at once can introduce tricky bugs.` Learning to manage these factors will help you write powerful code that performs meaningful data analysis quickly.

# 2. Using Mutable Values for Changing Information

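In Python, `some values are immutable`, such as integers. This means that we can't change them in place: an operation like adding 1 to an integer produces a new integer object rather than modifying the existing one.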

Most of the data structures we've worked with (like dictionaries and lists) are mutable, so they're useful for representing information that changes. **`Mutable values are especially useful in parallel processing because we often want to share and edit the same data across different threads or processes.`**
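
The difference can be seen in a small sketch. The function names `add_one_to_int` and `append_one` are made up for this illustration and are not part of the exercise.

```python
def add_one_to_int(n):
    # Integers are immutable: += rebinds the local name n to a new object
    # and leaves the caller's integer untouched.
    n += 1
    return n

def append_one(values):
    # Lists are mutable: append() edits the caller's list in place.
    values.append(1)

number = 5
add_one_to_int(number)

shared = []
append_one(shared)
append_one(shared)

print(number)  # 5
print(shared)  # [1, 1]
```

Because the list is edited in place, every piece of code that holds a reference to `shared` sees the same changes, which is the property the Counter class below relies on.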

## TODO:
* Create an instance of the Counter class called counter.

* Call counter.get_count() to get the initial value of the counter, and store it in initial_count.

* Call count_up_100000 with counter as its argument.

* Call counter.get_count() to get the final value of the counter, and store it in final_count.

In [1]:
class Counter():
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
    def get_count(self):
        return self.count
    
def count_up_100000(counter):
    for i in range(100000):
        counter.increment()

counter = Counter()
initial_count = counter.get_count()
count_up_100000(counter)
final_count = counter.get_count()

# 3. Multithreading Multiple Processes

On the last screen, we counted from 0 to 100000 using a Counter instance. Creating this instance, calling the function, and incrementing the counter all happened in a single thread of one process. `Every instruction executed one after the other. We can also run several paths of execution at once within the same process, however. We refer to this technique as multithreading.`

`A thread is one path of execution in a program. We typically have one "main thread" that runs our ordinary, single-threaded program. We can also create new threads, though, and run them concurrently with the main thread.` **`To do this in Python, we use the threading module. Specifically, we can use threading.Thread() to create an instance of the Thread class, which runs a given function in a separate thread once we call its start() method.`**
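
As a minimal sketch of the idea that a target function runs outside the main thread, the snippet below records the name of the thread that executes it. The function `report_thread` and the thread name `"counting-thread"` are hypothetical and chosen only for this illustration.

```python
import threading

seen_names = []

def report_thread():
    # current_thread() returns the Thread object that is running this call.
    seen_names.append(threading.current_thread().name)

# "counting-thread" is a made-up name used only for this illustration.
worker = threading.Thread(target=report_thread, name="counting-thread")
worker.start()
worker.join()

print(threading.main_thread().name)  # MainThread
print(seen_names)                    # ['counting-thread']
```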

In [2]:
# To create a Thread instance that runs the count_up_100000 function with counter as an argument, we write:
import threading

thread = threading.Thread(target=count_up_100000, args=[counter])

# Then we start the thread:

thread.start()

# Next, we "join" the thread so that when it's finished executing, it "joins" with the main thread by terminating:
thread.join()

`The main thread will wait until the other thread has finished executing before moving past the thread.join() call`. **`Waiting for a condition like the termination of a thread is called blocking.`** 
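
The blocking behavior of join() can be observed with a short sketch. Here `slow_task` is a hypothetical stand-in for real work, and the 0.2 second sleep is an arbitrary choice.

```python
import threading
import time

events = []

def slow_task():
    time.sleep(0.2)                 # stand-in for real work
    events.append("worker finished")

worker = threading.Thread(target=slow_task)
worker.start()
worker.join()                       # the main thread blocks here until slow_task returns
events.append("main resumed")

print(events)             # ['worker finished', 'main resumed']
print(worker.is_alive())  # False
```

Without the join() call, the main thread would most likely append its entry first, because it would not wait for the sleep to finish.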

In [3]:
counter = Counter()
count_thread = threading.Thread(target=count_up_100000, args=[counter])
count_thread.start()
count_thread.join()
after_join = counter.get_count()
print(after_join)

100000


# 4. Determinism of Program Results

`In programming, we say that a program is deterministic if` **`we can precisely predict its output for a particular input`**. `Most single-threaded operations are deterministic because we can walk through the code for any input step by step, and predict the output.`

Now imagine that a friend has started counting upward from zero, and that you call him a few hours later to ask what number he's on. He may be at 1000, or 10000, or 25392. It's impossible to know for sure, and this is analogous to reading the value of our counter before we've joined the counting thread. We can't predict this value because we don't know how many iterations of the counting loop will have executed at the time of our reading. `When we can't reliably predict the outcome of running a piece of code, we call that code nondeterministic.`

In [4]:
def conduct_trial():
    counter = Counter()
    count_thread = threading.Thread(target=count_up_100000, args=[counter])
    count_thread.start()
    intermediate_value = counter.get_count()
    count_thread.join()
    return intermediate_value

trial1 = conduct_trial()
print(trial1)
trial2 = conduct_trial()
print(trial2)
trial3 = conduct_trial()
print(trial3)

61195
22319
16798


# 5. Using Locks to Enforce Determinism in Multithreading

**`Multithreading is nondeterministic by nature, but there are ways to manage that nondeterminism. A simple and common way to make multithreaded code more predictable is to use threading.Lock. A lock is a way to conditionally block the execution of some threads. At any given time, we can think of a lock as being either available or acquired. A thread can acquire an available lock, but if a thread tries to acquire a lock that another thread already holds, it will be blocked until the holder releases the lock and it becomes available again.`**
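
The two lock states can be sketched as follows. The helper `try_to_acquire` is hypothetical; it passes `blocking=False` to acquire() so that a failed attempt returns False immediately instead of waiting, which makes the state of the lock visible.

```python
import threading

lock = threading.Lock()
attempts = []

def try_to_acquire():
    # blocking=False makes acquire() return immediately with True or False
    # instead of waiting for the lock to become available.
    acquired = lock.acquire(blocking=False)
    attempts.append(acquired)
    if acquired:
        lock.release()

lock.acquire()                      # the main thread now holds the lock
first = threading.Thread(target=try_to_acquire)
first.start()
first.join()                        # this attempt fails: the lock is taken

lock.release()                      # the lock is available again
second = threading.Thread(target=try_to_acquire)
second.start()
second.join()                       # this attempt succeeds

print(attempts)       # [False, True]
print(lock.locked())  # False
```

With the default `blocking=True`, the first thread would instead wait inside acquire() until the main thread released the lock.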

## TODO:
* Wrap the inner for loop in count_up_100000 inside lock.acquire() and lock.release() so that no other thread can acquire the lock unless the counter value is a multiple of 10.

* In conduct_trial(), wrap the call to counter.get_count() inside lock.acquire() and lock.release() so that the main thread can only read the counter value at multiples of 10.

In [5]:
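def count_up_100000(counter, lock):
    # 10000 batches of 10 increments each, for 100000 increments in total.
    for batch in range(10000):
        lock.acquire()
        # While the lock is held, the counter moves from one multiple of 10 to the next.
        for step in range(10):
            counter.increment()
        lock.release()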

def conduct_trial():
    counter = Counter()
    lock = threading.Lock()
    count_thread = threading.Thread(target=count_up_100000, args=[counter, lock])
    count_thread.start()
    lock.acquire()
    intermediate_value = counter.get_count()
    lock.release()
    count_thread.join()
    return intermediate_value

trial1 = conduct_trial()
print(trial1)
trial2 = conduct_trial()
print(trial2)
trial3 = conduct_trial()
print(trial3)

26180
16560
15200
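
Every value printed above is a multiple of 10, because the main thread can only read the counter between batches. The same pattern can also be written with a `with` statement, which acquires the lock on entry and releases it on exit. The sketch below is one possible variant, not the exercise's solution; `count_in_batches` is a hypothetical name, and `conduct_trial` is changed here to also return the final count.

```python
import threading

class Counter:
    def __init__(self):
        self.count = 0
    def increment(self):
        self.count += 1
    def get_count(self):
        return self.count

def count_in_batches(counter, lock):
    # Hypothetical variant of count_up_100000 that uses `with` instead of
    # explicit acquire()/release() calls.
    for batch in range(10000):
        with lock:                  # acquired here, released when the block exits
            for step in range(10):
                counter.increment()

def conduct_trial():
    counter = Counter()
    lock = threading.Lock()
    count_thread = threading.Thread(target=count_in_batches, args=[counter, lock])
    count_thread.start()
    with lock:                      # read only between batches
        intermediate_value = counter.get_count()
    count_thread.join()
    return intermediate_value, counter.get_count()

intermediate_value, final_value = conduct_trial()
print(intermediate_value % 10)  # 0
print(final_value)              # 100000
```

The intermediate value itself still varies from run to run; the lock only guarantees that it falls on a batch boundary.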


# 6. Counting in Two Steps

Now suppose we want to count to 200000. We can do this in two stages:

* Increment counter 100000 times
* Increment counter 100000 times again

This approach will produce interesting results because the operation will behave differently if we split it up among multiple threads. First, let's implement this behavior using only the main thread. Try to predict the outcome before running your code. Remember that we're implementing a single-threaded solution on this screen, so the outcome should be deterministic.

## TODO:
* Call count_up_100000() twice, using counter as an argument each time.

* Use counter.get_count() to assign the value of our counter after the two function calls to final_count.

* Print final_count.

In [6]:
def count_up_100000(counter):
    for i in range(100000):
        counter.increment()

counter = Counter()
count_up_100000(counter)
count_up_100000(counter)
final_count = counter.get_count()
print(final_count)

200000


# 7. Counting Once on Two Different Threads

Now let's write a multithreaded implementation that counts to 200000. We've defined a conduct_trial() function that counts to 200000 with two threads, each of which increments the counter 100000 times. It's important that both threads are started before either one is joined. For this experiment, we want the threads to run concurrently so that we can observe how they interact.

## TODO:
* Call .join() on each of the counting threads in the conduct_trial() function. It's critical that both join calls occur after both threads have already started.

* Conduct three trials by calling conduct_trial() three separate times. Assign the results to trial1, trial2, and trial3, and print those values to observe the results of the experiment.

In [7]:
def count_up_100000(counter):
    for i in range(100000):
        counter.increment()

def conduct_trial():
    counter = Counter()
    count_thread1 = threading.Thread(target=count_up_100000, args=[counter])
    count_thread2 = threading.Thread(target=count_up_100000, args=[counter])

    count_thread1.start()
    count_thread2.start()

    # Join the threads here
    count_thread1.join()
    count_thread2.join()
    
    final_count = counter.get_count()
    return final_count

trial1 = conduct_trial()
print(trial1)
trial2 = conduct_trial()
print(trial2)
trial3 = conduct_trial()
print(trial3)

180418
175700
136765


# 8. Imitating Atomicity With Locks

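**`An atomic operation is an operation that finishes executing before any other operation can occur, regardless of multithreading.`** Incrementing the counter is not atomic: it reads the current count, adds one, and writes the result back, and another thread can run between those steps. When two threads read the same value before either one writes, one of the increments is lost, which is why the trials on the last screen finished below 200000. Holding a lock for the whole increment makes it behave as if it were atomic.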
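
To see how a non-atomic increment can lose counts, the sketch below replays one possible interleaving by hand in a single thread. The names `a_read` and `b_read` are illustrative stand-ins for the values two threads would hold; no real threads are involved, so the result is the same on every run.

```python
count = 0

# Thread A reads the current value...
a_read = count

# ...but before A writes back, thread B performs its whole increment.
b_read = count
count = b_read + 1

# A now writes a result based on its stale read, overwriting B's update.
count = a_read + 1

print(count)  # 1, although two increments ran
```

With real threads, whether and how often this interleaving occurs depends on scheduling, which is why the shortfall differs between trials.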

## TODO:
* In the __init__ method of the Counter class, add a lock property.

* Before the first line of the counter.increment() method, acquire the lock.

* After the last line of the counter.increment() method, release the lock.

* Conduct three trials by calling conduct_trial() three separate times. Assign the results to trial1, trial2, and trial3, and print those values to observe the results of the experiment.

In [8]:
class Counter():
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()
    def increment(self):
        self.lock.acquire()
        old_count = self.count
        self.count = old_count + 1
        self.lock.release()
    def get_count(self):
        return self.count

def count_up_100000(counter):
    for i in range(100000):
        counter.increment()

def conduct_trial():
    counter = Counter()
    count_thread1 = threading.Thread(target=count_up_100000, args=[counter])
    count_thread2 = threading.Thread(target=count_up_100000, args=[counter])

    count_thread1.start()
    count_thread2.start()

    count_thread1.join()
    count_thread2.join()

    final_count = counter.get_count()
    return final_count

trial1 = conduct_trial()
print(trial1)
trial2 = conduct_trial()
print(trial2)
trial3 = conduct_trial()
print(trial3)

200000
200000
200000
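
As an alternative sketch, the lock can be managed with a `with` statement, which releases it even if the body raises an exception. `LockedCounter` is a hypothetical name for this variant; it also takes the lock in get_count(), which the exercise does not ask for.

```python
import threading

class LockedCounter:
    # Hypothetical variant of Counter that uses `with` for locking.
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()
    def increment(self):
        with self.lock:             # acquire on entry, release on exit
            self.count += 1
    def get_count(self):
        with self.lock:
            return self.count

def count_up_100000(counter):
    for i in range(100000):
        counter.increment()

counter = LockedCounter()
threads = [threading.Thread(target=count_up_100000, args=[counter]) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.get_count())  # 200000
```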


`We've seen some of the problems that parallel processing can introduce, such as nonatomicity and nondeterminism. In data science, it's important to maintain the integrity of our data, and a multithreaded environment is no exception. By using tools like locks to enforce atomicity and determinism, we can protect resources shared between threads, and ensure that dividing tasks among threads doesn't introduce unexpected bugs into our code.`