## I. Race Condition - What happens if we don't protect our data structure?

In this example, many threads write to the same file. If they access the file at exactly the same time, a race condition can occur, and the output of one or more threads may be lost.

The file must be protected by a **mutex lock**. With this lock, only one thread can write to the file at a time, while the others block until the lock is released. Writing to the file then becomes thread-safe, at the cost of acquiring and releasing the lock and of serializing the writes.
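
As a minimal sketch of the locking pattern described above, the snippet below protects a shared counter instead of a file (the counter and the `increment` function are illustrative names, not part of this notebook). It uses `with lock:`, which acquires the lock on entry and releases it on exit, even if the body raises an exception.

```python
from threading import Lock, Thread

lock = Lock()
counter = 0  # shared state; stands in for the shared file

def increment(n_times):
    """Add 1 to the shared counter n_times, one thread at a time."""
    global counter
    for _ in range(n_times):
        with lock:  # only one thread at a time executes this block
            counter += 1

workers = [Thread(target=increment, args=(10_000,)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(counter)  # 40000
```

Because every update happens while the lock is held, no increment can be lost, whatever the thread scheduling.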


In [3]:
import sys

module_path = '..'
if module_path not in sys.path:
    sys.path.append(module_path)

from utils import perf_decorator
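
The `utils` module is not shown in this notebook. Judging only from the printed output further down (`<function name> execution time <seconds>s`), `perf_decorator` presumably times the decorated function. The version below is a hypothetical stand-in under that assumption, not the actual helper:

```python
from functools import wraps
from time import perf_counter

def perf_decorator(func):
    """Hypothetical stand-in for utils.perf_decorator: print how long func took."""
    @wraps(func)  # keep the name and docstring of the wrapped function
    def wrapper(*args, **kwargs):
        start = perf_counter()
        result = func(*args, **kwargs)
        elapsed = perf_counter() - start
        print(f"{func.__name__} execution time {elapsed:.2f}s")
        return result
    return wrapper
```

With this sketch in place of the import, the cells below should run unchanged, assuming the real helper does nothing more than timing.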

In [4]:
from time import sleep
from threading import Thread, Lock
from numpy import random

In [5]:
def write_in_file(filename, lock=None):
    """ Append 100 random numbers to filename, holding lock (if given) during each write """
    sleep(0.001)
    for _ in range(100):
        if lock:
            # `with lock` releases the lock even if the write raises,
            # so a failing thread cannot block the others forever
            with lock:
                with open(filename, 'a') as f:
                    sleep(0.01)
                    f.write(f"{random.random()}\n")
        else:
            with open(filename, 'a') as f:
                sleep(0.01)
                f.write(f"{random.random()}\n")

In [6]:
def count_lines_in_file(filename):
    """ Count the number of lines in filename """
    counter = 0
    with open(filename, 'r') as f:
        for line in f:
            counter += 1
    return counter

In [7]:
@perf_decorator
def main_not_protected():
    """ Write with 20 threads in the same file, without a lock """

    fname = "not_protected.txt"
    
    # erase content of previous file
    with open(fname, "w") as f:
        pass
    
    workers = [Thread(target=write_in_file, args=(fname,)) for _ in range(20)]
    
    for worker in workers:
        worker.start() # start all threads
    
    for worker in workers:
        worker.join() # wait for all threads to finish

    n_lines = count_lines_in_file(fname)
    print(f"There are {n_lines} lines in {fname}, there should be {20*100}.")
    print(f"Missing Line: {20*100-n_lines} lines")

In [8]:
main_not_protected()

There are 1968 lines in not_protected.txt, there should be 2000.
Missing Line: 32 lines
main_not_protected execution time 1.34s


In [9]:
@perf_decorator
def main_protected():
    """ Write with 20 threads in the same file, protected by a lock """

    fname = "protected.txt"
    
    # erase content of previous file
    with open(fname, "w") as f:
        pass

    lock = Lock()
    workers = [Thread(target=write_in_file, args=(fname, lock)) for _ in range(20)]
    
    for worker in workers:
        worker.start() # start all threads
    
    for worker in workers:
        worker.join() # wait for all threads to finish

    n_lines = count_lines_in_file(fname)
    print(f"There are {n_lines} lines in {fname}, there should be {20*100}.")
    print(f"Missing Line: {20*100-n_lines} lines")

In [10]:
main_protected()

There are 2000 lines in protected.txt, there should be 2000.
Missing Line: 0 lines
main_protected execution time 23.77s
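
Most of the 23.77s is probably not lock overhead. `write_in_file` holds the lock during `sleep(0.01)` as well as during the write, so the 20 × 100 sleeps run one after another (20 × 100 × 0.01 s = 20 s). The sketch below is a hypothetical variant, not part of the original notebook, that keeps the slow work outside the lock and holds it only for the append. The names `write_in_file_short_lock` and `main_short_lock` are illustrative, and the standard-library `random` replaces `numpy.random` to keep the block self-contained.

```python
import random
from threading import Lock, Thread
from time import sleep

def write_in_file_short_lock(filename, lock, n_lines=100):
    """Hypothetical variant: slow work outside the lock, lock held only to append."""
    for _ in range(n_lines):
        sleep(0.01)                    # simulated slow work, no lock held
        line = f"{random.random()}\n"  # prepare the line before locking
        with lock:                     # critical section: open, append, close
            with open(filename, "a") as f:
                f.write(line)

def main_short_lock(fname="short_lock.txt", n_threads=20, n_lines=100):
    """Run the short-lock writers and return the number of lines written."""
    with open(fname, "w"):
        pass  # erase content of previous file

    lock = Lock()
    workers = [Thread(target=write_in_file_short_lock, args=(fname, lock, n_lines))
               for _ in range(n_threads)]
    for worker in workers:
        worker.start()  # start all threads
    for worker in workers:
        worker.join()   # wait for all threads to finish

    with open(fname) as f:
        return sum(1 for _ in f)
```

Calling `main_short_lock()` returns 2000, since every append still happens under the lock. The sleeps now overlap across threads, so the run time should be much closer to that of the unprotected version, although this has not been measured here.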


In [11]:
@perf_decorator
def main_sequential():
    """ Write 20 batches in the same file sequentially, without threads """

    fname = "sequential.txt"
    
    # erase content of previous file
    with open(fname, "w") as f:
        pass

    for _ in range(20):
        write_in_file(fname)

    n_lines = count_lines_in_file(fname)
    print(f"There are {n_lines} lines in {fname}, there should be {20*100}.")
    print(f"Missing Line: {20*100-n_lines} lines")

In [12]:
main_sequential()

There are 2000 lines in sequential.txt, there should be 2000.
Missing Line: 0 lines
main_sequential execution time 24.25s
