*Your Name*

*Collaborators' Names*

# Locks!

Programs that are "pleasantly parallel" don't have to have their processes share any information about their "state". Unfortunately, not every problem we want to solve can be pleasantly parallel.

It's common for our programs to need to write something to a file. As we'll see in this workbook, having multiple processes write to the same file at the same time is a recipe for disaster. We'll learn a couple of ways to handle this problem.

## File Writing Conflicts

First, we'll explore the worst-case scenario: multiple processes writing to the same file at the same time.

Take the following function as an example:

```python
import os

def work(max_count, filename):
    """Increment the counter stored in `filename`, `max_count` times."""
    for _ in range(max_count):

        # Check for data already present
        if os.path.isfile(filename):
            with open(filename, "r") as f:
                try:
                    nbr = int(f.read())
                except ValueError as err:
                    # flush=True makes each message appear promptly, which helps
                    # when multiple processes are trying to print
                    print("File read error, starting to count from 0; error: " + str(err),
                          flush=True)
                    nbr = 0
        else:
            print("File doesn't exist, starting to count from 0.", flush=True)
            nbr = 0

        # Update data
        next = nbr + 1

        # Write update (overwrite existing data)
        with open(filename, "w") as f:
            f.write(str(next) + '\n')
```


This example is very artificial, but it mimics a case where processes share information back and forth by writing it to a file. On each pass through the loop, a process checks that file for the current result (`nbr = int(f.read())`), updates it (`next = nbr + 1`), and then writes out its update (`f.write(str(next) + '\n')`). In theory, another process can then read in these results and perform its own update.

If everything goes correctly, the number written to the file should be the sum of the `max_count` values given to all of the processes.

---
### Exercises

1. Run the `work` function on **one** process, with `max_count = 1000` and `filename = "serial.txt"`. What number is in this file when the function finishes? Does it meet your expectations?

Warning: If you run `work` multiple times when `serial.txt` already exists, the value inside will keep going up!

*Answer here*

```python
!rm serial.txt
```

```python
work(1000, "serial.txt")

with open("serial.txt", "r") as f:
    print(f.read())
```

```
File doesn't exist, starting to count from 0.
1000
```

It should be 1000, and is 1000.

2. Let's run this function on **two** processes using `Pool.map`. Each process should run with `filename = "parallel.txt"`. They should get `max_count` values of 1,000 and 2,000.

    This is our first time using `Pool.map` with a function that takes multiple arguments. `Pool.map` passes each item of the list as the function's single argument, so we can use `functools.partial` to fix the values of the other arguments (i.e., we can set parameters other than `max_count` to values of our choosing):

    `partial_func = functools.partial(work, filename="parallel.txt")`

    We can now use `partial_func` like any other function. Its only remaining argument is `max_count`.
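
A quick way to see what `functools.partial` does is to apply it to a small stand-in function. The `describe` function below is hypothetical and only reports its arguments; the built-in `map` is used here as a single-process stand-in for `Pool.map`, which likewise calls the function once per list item.

```python
import functools

def describe(max_count, filename):
    """Hypothetical stand-in for `work` that just reports its arguments."""
    return f"count to {max_count} in {filename}"

# Fix the keyword argument; the first positional argument stays free
partial_func = functools.partial(describe, filename="parallel.txt")

# Pool.map would call partial_func once per item, like the built-in map does here
results = list(map(partial_func, [1000, 2000]))
print(results)
# ['count to 1000 in parallel.txt', 'count to 2000 in parallel.txt']
```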

Warning: If you run `work` multiple times when `parallel.txt` already exists, the value inside will keep going up!

```python
import multiprocessing
import functools

with multiprocessing.Pool(processes=2) as p:
    p.map(functools.partial(work, filename="parallel.txt"), [1000, 2000])
```

```
File doesn't exist, starting to count from 0.File doesn't exist, starting to count from 0.

File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for int() with base 10: ''
File read error, starting to count from 0; error: invalid literal for i
```

```python
!cat parallel.txt
!rm parallel.txt
```

```
828
```


3. You probably got an awful lot of errors! Check the number stored in `parallel.txt`. What value is written there? Given what was explained at the start of this section, what number do you expect to be there?

*Answer here*

It should be 3000 (1000 + 2000), but it's only 392. (The exact number changes from run to run; the run shown above produced 828.)

---
### Explanation

When both processes first start running the `work` function, they see that no file named `parallel.txt` currently exists. Therefore, each process starts trying to count from zero. You should have seen the message "File doesn't exist, starting to count from 0" printed twice, reflecting this behavior.

Which process reads or writes the file first depends on arbitrary timing, and a bug whose outcome depends on such timing is known as a "race condition." Opening a file with mode `"w"` truncates it to zero length before the new number is written, so if one process reads the file while the other is in the middle of writing, it sees an empty string. `int('')` raises a `ValueError`, which is the error printed above. We've structured our code so that when a process can't read the file's data, it assumes it should start the count at 0. Therefore, when a process reads the file while the other is writing, it resets the counter! Updates are also lost when both processes read the same value and each writes back that value plus one. This is why the final count is much lower than we expect.
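
The empty read can be reproduced deterministically in a single process. This sketch opens a throwaway temporary file for writing (which truncates it immediately), reads it through a second handle before anything is written, and then tries to parse what it saw, just as `work` does. The file name `counter.txt` is arbitrary.

```python
import os
import tempfile

# Throwaway file holding a previous count
path = os.path.join(tempfile.mkdtemp(), "counter.txt")
with open(path, "w") as f:
    f.write("41\n")

# Opening for writing truncates the file right away...
writer = open(path, "w")

# ...so a "reader" that looks now sees an empty file
with open(path, "r") as reader:
    seen = reader.read()

writer.write("42\n")
writer.close()

try:
    int(seen)
    message = None
except ValueError as err:
    message = str(err)

print(message)  # invalid literal for int() with base 10: ''
```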

Race conditions such as these can introduce bugs into our code that are hard to track down, because they are not reliably reproducible. We can use "locks" to prevent multiple processes from accessing the same resource (e.g., a file or variable) at the same time, preventing the race condition from creating bugs. We'll look at some lock implementations in the following section.
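
Why a lock fixes the lost updates can be seen with a deterministic simulation rather than real processes. The hypothetical `run_schedule` helper below replays the read, update, and write steps of two workers, "A" and "B", in a chosen order against one shared counter. The interleaved order is something that can happen without a lock; the serialized order is what a lock guarantees, since each worker finishes its whole read-modify-write before the other may begin.

```python
def run_schedule(schedule):
    """Replay (worker, step) pairs against a shared counter.

    Each worker increments the counter once via three steps:
    "read" the stored value, "update" its private copy, "write" it back.
    """
    stored = 0   # stands in for the number in the file
    local = {}   # each worker's private copy of the number
    for worker, step in schedule:
        if step == "read":
            local[worker] = stored
        elif step == "update":
            local[worker] += 1
        elif step == "write":
            stored = local[worker]
    return stored

steps = ["read", "update", "write"]

# Without a lock: both workers read before either one writes
interleaved = [("A", "read"), ("B", "read"),
               ("A", "update"), ("B", "update"),
               ("A", "write"), ("B", "write")]

# With a lock: A's whole read-modify-write completes before B starts
serialized = [("A", s) for s in steps] + [("B", s) for s in steps]

print(run_schedule(interleaved))  # 1 (one increment lost)
print(run_schedule(serialized))   # 2
```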


## Applying Different Locks

There are a couple of different ways of locking access to a resource depending on what that resource is. We'll look at locking files and locking shared variables. We'll also be using shared variables more in the coming weeks.

### Locking Files

We can use the third-party `fasteners` package (it is not part of the standard library; install it with `pip install fasteners`) to ensure that only one process may write to a file at a time. This library makes use of *decorators*, a feature of the Python language that allows us to easily extend the behavior of functions.

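Decorators make adding the `fasteners` locking behavior easy:
```python
from fasteners import interprocess_locked

# Every process that calls `work` must use the same lock-file path
@interprocess_locked('/tmp/tmp_lock')
def work(max_count, filename):
    ...  # same body as the original `work` function
```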

We have to provide the path of a file that `fasteners` uses to store the lock; every process that should share the lock must pass the same path.
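
To see how a decorator can add locking to a function, here is a minimal, hypothetical `locked_with` decorator. It is not how `fasteners` is implemented, only a sketch of the general idea: wrap every call in `with lock:` so that only one caller at a time can run the function body. It accepts any lock usable in a `with` statement; `threading.Lock` is used here so the demo runs in a single process, whereas `interprocess_locked` relies on a lock file that works across processes.

```python
import functools
import threading

def locked_with(lock):
    """Hypothetical decorator factory: run the decorated function while holding `lock`."""
    def decorator(func):
        @functools.wraps(func)      # keep func's name and docstring
        def wrapper(*args, **kwargs):
            with lock:              # acquire before the call, release after (even on error)
                return func(*args, **kwargs)
        return wrapper
    return decorator

counter_lock = threading.Lock()

@locked_with(counter_lock)
def increment(value):
    """Return value + 1; only one caller can be inside at a time."""
    return value + 1

print(increment(41))            # 42
print(counter_lock.locked())    # False: the lock was released after the call
```

One design consequence: because a decorator wraps the *whole* call, decorating `work` holds the lock for its entire loop, so the two processes effectively run one after the other. Locking only around each read-modify-write step would let them take turns between iterations instead.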