<a href="https://colab.research.google.com/github/COMS-BC3159-SP23/colabs/blob/main/Parallel_Cpp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Parallel C++ 101

## Objective:
In this notebook we explore a program that increments a global counter 100,000 times, twice, for an expected total of 200,000! We'll compare the serial output, the output of a parallel version with no safety measures, and the output of parallel versions made safe with atomics and mutex locks.

## Acknowledgements:
These notes were built with the support of [ChatGPT](https://chat.openai.com/chat).

In [None]:
# Install some magic to make C++ programs look nice!
!wget -O cpp_plugin.py https://gist.github.com/akshaykhadse/7acc91dd41f52944c6150754e5530c4b/raw/cpp_plugin.py
%load_ext cpp_plugin

## The Serial Code

In [None]:
#@title The output of the cell is formatted text. Double-click the title to show or hide the raw code, which you can edit!
%%cpp -n serial.cpp -s xcode

#include <iostream>

int shared_counter = 0;

void increment_counter() {
    for (int i = 0; i < 100000; i++) {
        shared_counter++;
    }
}

int main() {
    increment_counter();
    increment_counter();
    std::cout << "Final count: " << shared_counter << std::endl;
    return 0;
}

In [None]:
# Now let's compile our code into an executable file!
!g++ serial.cpp -o serial.exe
# And now let's run that code!
!./serial.exe

## The Hogwild Parallel Code

In [None]:
#@title The output of the cell is formatted text. Double-click the title to show or hide the raw code, which you can edit!
%%cpp -n parallel.cpp -s xcode

#include <iostream>
#include <thread>

int shared_counter = 0;

void increment_counter() {
    for (int i = 0; i < 100000; i++) {
        shared_counter++;
    }
}

int main() {
    // Construct two threads t1 and t2 and pass them the increment_counter
    // function to run for us in parallel!
    std::thread t1(increment_counter);
    std::thread t2(increment_counter);

    std::cout << "Final count: " << shared_counter << std::endl;
    return 0;
}


In [None]:
# Now let's compile our code into an executable file! (Note we have to add -pthread)
!g++ parallel.cpp -o parallel.exe -pthread
# And now let's run that code!
!./parallel.exe

You likely got both the wrong answer and the error `terminate called without an active exception`. What happened? `main` returned before the threads were done running: we never waited for them to finish, and destroying a `std::thread` that has not been joined terminates the program. We can make one small change to our code to fix that!

In [None]:
#@title The output of the cell is formatted text. Double-click the title to show or hide the raw code, which you can edit!
%%cpp -n parallel_2.cpp -s xcode

#include <iostream>
#include <thread>

int shared_counter = 0;

void increment_counter() {
    for (int i = 0; i < 100000; i++) {
        shared_counter++;
    }
}

int main() {
    // Construct two threads t1 and t2 and pass them the increment_counter
    // function to run for us in parallel!
    std::thread t1(increment_counter);
    std::thread t2(increment_counter);

    // Wait for t1 and then t2 to finish!
    t1.join();
    t2.join();

    std::cout << "Final count: " << shared_counter << std::endl;
    return 0;
}


In [None]:
# Now let's compile our code into an executable file! (Note we have to add -pthread)
!g++ parallel_2.cpp -o parallel_2.exe -pthread
# And now let's run that code!
!./parallel_2.exe

This likely removed the error, but the answer is most likely still wrong. What is going on here? It turns out that the operation `shared_counter++;` has three steps:
1. First the thread loads the value stored in the counter from RAM/cache into a register.
2. Then it performs the math on that value, incrementing it.
3. Then it stores the result back into RAM/cache.

What goes wrong is that nothing stops both threads from reading the same value before either of them writes its updated value back. When that happens, both threads store the same new value and one of the two increments is lost. (Formally, this unsynchronized access is a data race, which is undefined behavior in C++.)

It is also important to note that the `join` calls are blocking. That is, the main thread waits for `t1` to finish and only then waits for `t2` to finish. If your threads have unbalanced runtimes, you'll need to be careful to schedule them appropriately or use more advanced measures (as described later).

## Atomics

In this example, since we are only incrementing a variable, we can use atomic operations to make sure that the three-step process behind `++` completes in full before the other thread is able to read or write the value. This will hurt performance a bit but will ensure correctness.

In [None]:
#@title The output of the cell is formatted text. Double-click the title to show or hide the raw code, which you can edit!
%%cpp -n parallel_atomic.cpp -s xcode

#include <iostream>
#include <thread>
#include <atomic>

// All you need to do is declare that variable an atomic variable and the
// compiler will take care of the rest for you!
std::atomic<int> shared_counter(0);

void increment_counter() {
    for (int i = 0; i < 100000; i++) {
        shared_counter++;
    }
}

int main() {
    std::thread t1(increment_counter);
    std::thread t2(increment_counter);

    t1.join();
    t2.join();

    std::cout << "Final count: " << shared_counter << std::endl;
    return 0;
}


In [None]:
# Now let's compile our code into an executable file! (Note we have to add -pthread)
!g++ parallel_atomic.cpp -o parallel_atomic.exe -pthread
# And now let's run that code!
!./parallel_atomic.exe

## Mutexes and Advanced Synchronization

What if we need to do far more than one operation before we allow another thread to access a variable? For example, what if you needed to send an email every time the counter reached a multiple of 1,000 and then, after the email was sent, either add 7 or just add 1 depending upon a dice roll? I'm not going to implement that chaotic function, but you can see how you may need to hold the value for a longer period of time.

One way to do this is through mutexes which we'll use below. Here's a breakdown of the key elements of this example:

`std::mutex mtx;`: This is a mutex object, which is used to synchronize access to shared data. A mutex is essentially a lock that only one thread can hold at a time; other threads that try to acquire it wait until it is released.

`std::lock_guard<std::mutex> lock(mtx);`: This line creates a `lock_guard` object, which acquires the mutex when it is constructed and releases the mutex when it is destroyed. The `lock_guard` object is scoped to the current block of code, so it releases the mutex automatically when the block is exited, even if an exception is thrown.

`mtx.lock();`: This function locks the mutex by hand, so that no other thread can lock it (and so reach the shared data it protects) until the mutex is unlocked. The `lock_guard` calls this for us when it is constructed.

`mtx.unlock();`: This function unlocks the mutex, allowing another thread to acquire it and access the shared data. The `lock_guard` calls this for us when it goes out of scope.

It's important to note that locking a mutex through `std::lock_guard` is exception-safe: even if an exception is thrown within the scope of the `lock_guard` object, the lock is released. This prevents a deadlock in which other threads wait forever on a mutex that is never unlocked. The same is not true of calling `mtx.lock()` and `mtx.unlock()` by hand, where an exception thrown between the two calls skips the unlock. The convenience of `lock_guard` comes with, at most, a small amount of additional overhead.

In [None]:
#@title The output of the cell is formatted text. Double-click the title to show or hide the raw code, which you can edit!
%%cpp -n parallel_mutex.cpp -s xcode

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx;  //mutex object to synchronize access to shared_counter
int shared_counter = 0;

void increment_counter() {
    for (int i = 0; i < 100000; i++) {
        std::lock_guard<std::mutex> lock(mtx);  //lock the mutex
        shared_counter++; //increment shared_counter
    }
}

int main() {
    std::thread t1(increment_counter);
    std::thread t2(increment_counter);

    t1.join(); // wait for t1 to finish
    t2.join(); // wait for t2 to finish

    std::cout << "Final count: " << shared_counter << std::endl;
    return 0;
}


In [None]:
# Now let's compile our code into an executable file! (Note we have to add -pthread)
!g++ parallel_mutex.cpp -o parallel_mutex.exe -pthread
# And now let's run that code!
!./parallel_mutex.exe