# Practical 5: Concurrency Control (Part 2)

**Course**: BMCS3003 Distributed Systems and Parallel Computing

**Difficulty**: ⭐⭐ (Intermediate-Advanced)

**Estimated Time**: 90 minutes

**Prerequisites**:
- Practical 4: Critical sections, atomic operations
- Understanding of race conditions
- OpenMP parallel programming basics

## Learning Objectives

By the end of this practical, you will be able to:

1. Understand and implement barriers for thread synchronization
2. Use the `nowait` clause to improve performance
3. Work with C++ `std::mutex` and `std::lock_guard`
4. Identify and prevent deadlocks
5. Use `std::lock` for multiple mutex acquisition
6. Apply advanced synchronization patterns

## Table of Contents

1. [OpenMP Revision](#section1)
2. [Questions 1-3: Salary Calculation](#section2)
3. [Barriers in OpenMP](#section3)
4. [C++ Mutex and Thread Basics](#section4)
5. [Question 4: Mutex Lock Example](#section5)
6. [Deadlocks](#section6)
7. [Question 5: Deadlock Prevention](#section7)
8. [Summary](#section8)

<a id='section1'></a>
## 1. OpenMP Revision

### Key Concepts Recap

#### Parallel Region
```cpp
#pragma omp parallel
{
    // Code executed by all threads
}
```

#### Loop Parallelization
```cpp
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    // Loop iterations distributed among threads
}
```

#### Reduction
```cpp
double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++) {
    sum += array[i];
}
```

#### Single Directive
```cpp
#pragma omp parallel
{
    // All threads execute this
    
    #pragma omp single
    {
        // Only ONE thread executes this
        printf("Hello from one thread\n");
    }
    // Implicit barrier here - all threads wait
    
    // All threads continue
}
```

<a id='section2'></a>
## 2. Questions 1-3: Company Salary Calculation

### Scenario

Two companies, each with 10,000,000 employees, share one owner, who wants to calculate the total salary bill of each company.

### Question 1: Basic Parallel Implementation

**Objective**: Calculate salaries using parallel reduction

#### Serial Code (P5Q1.cpp)

```cpp
#include <cstdio>   // printf
#include <omp.h>

#define NUM_EMPLOYEES 10000000

int main() {
    long long salaries1 = 0, salaries2 = 0;
    
    double start = omp_get_wtime();
    
    // Calculate Company 1 salaries
    for (int i = 0; i < NUM_EMPLOYEES; i++) {
        salaries1 += 1362;  // Each employee earns 1362
    }
    
    // Calculate Company 2 salaries
    for (int i = 0; i < NUM_EMPLOYEES; i++) {
        salaries2 += 1465;  // Each employee earns 1465
    }
    
    double end = omp_get_wtime();
    
    printf("Salaries1: %lld\n", salaries1);
    printf("Salaries2: %lld\n", salaries2);
    printf("Time: %.6f seconds\n", end - start);
    
    return 0;
}
```

#### Parallel Implementation with Reduction

```cpp
#include <cstdio>   // printf
#include <omp.h>

#define NUM_EMPLOYEES 10000000

int main() {
    long long salaries1 = 0, salaries2 = 0;
    
    double start = omp_get_wtime();
    
    #pragma omp parallel shared(salaries1, salaries2)
    {
        // Calculate Company 1 salaries
        #pragma omp for reduction(+:salaries1)
        for (int i = 0; i < NUM_EMPLOYEES; i++) {
            salaries1 += 1362;
        }
        
        // Calculate Company 2 salaries
        #pragma omp for reduction(+:salaries2)
        for (int i = 0; i < NUM_EMPLOYEES; i++) {
            salaries2 += 1465;
        }
    }
    
    double end = omp_get_wtime();
    
    printf("Salaries1: %lld\n", salaries1);
    printf("Salaries2: %lld\n", salaries2);
    printf("Time: %.6f seconds\n", end - start);
    
    return 0;
}
```

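#### Output (Unorganized)

The program above prints each total exactly once, because both `printf` calls run after the parallel region has ended. If the two `printf` calls are placed inside the parallel region instead (after the second loop), every thread executes them and the lines from different threads interleave:

```
Salaries1: 13620000000
Salaries1: 13620000000
Salaries2: 14650000000
Salaries1: 13620000000
Salaries1: 13620000000
Salaries2: 14650000000
Salaries1: 13620000000
Salaries1: 13620000000
Salaries2: 14650000000
Salaries2: 14650000000
...
```

**Problem**: Each thread prints the result! Output is messy.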

### Challenge: Print Output Line-by-Line (Correctly)

**Solution**: Use `#pragma omp critical` or `#pragma omp single`

#### Using Critical Section

```cpp
#pragma omp parallel shared(salaries1, salaries2)
{
    #pragma omp for reduction(+:salaries1)
    for (int i = 0; i < NUM_EMPLOYEES; i++) {
        salaries1 += 1362;
    }
    
    #pragma omp critical
    {
        printf("Salaries1: %lld\n", salaries1);
    }
    
    #pragma omp for reduction(+:salaries2)
    for (int i = 0; i < NUM_EMPLOYEES; i++) {
        salaries2 += 1465;
    }
    
    #pragma omp critical
    {
        printf("Salaries2: %lld\n", salaries2);
    }
}
```

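**Better Output**:
```
Salaries1: 13620000000
Salaries1: 13620000000
...
Salaries2: 14650000000
Salaries2: 14650000000
```

The totals are still printed once per thread, but the output is now organized. The implicit barrier at the end of each `for` loop keeps every `Salaries2` line after all `Salaries1` lines, and `critical` lets only one thread print at a time.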

### Question 2: Single Reduction Clause

**Objective**: Use only ONE reduction clause to calculate both salaries

#### Implementation

```cpp
#include <cstdio>   // printf
#include <omp.h>

#define NUM_EMPLOYEES 10000000

int main() {
    long long total_salaries = 0;
    
    double start = omp_get_wtime();
    
    // Combined loop with single reduction
    #pragma omp parallel for reduction(+:total_salaries)
    for (int i = 0; i < NUM_EMPLOYEES * 2; i++) {
        if (i < NUM_EMPLOYEES) {
            total_salaries += 1362;  // Company 1
        } else {
            total_salaries += 1465;  // Company 2
        }
    }
    
    double end = omp_get_wtime();
    
    // Calculate individual salaries
    long long salaries1 = (long long)NUM_EMPLOYEES * 1362;
    long long salaries2 = (long long)NUM_EMPLOYEES * 1465;
    
    printf("Salaries1: %lld\n", salaries1);
    printf("Salaries2: %lld\n", salaries2);
    printf("Total: %lld\n", total_salaries);
    printf("Time: %.6f seconds\n", end - start);
    
    return 0;
}
```

#### Output

```
Salaries1: 13620000000
Salaries2: 14650000000
Total: 28270000000
Time: 0.087391 seconds
```

**Benefit**: Single reduction, clean output!

### Question 3: Using `#pragma omp single`

**Objective**: Print results only once using `single` directive

#### Implementation

```cpp
#include <cstdio>   // printf
#include <omp.h>

#define NUM_EMPLOYEES 10000000

int main() {
    long long salaries1 = 0, salaries2 = 0;
    
    double start = omp_get_wtime();
    
    #pragma omp parallel shared(salaries1, salaries2)
    {
        // Calculate Company 1 salaries
        #pragma omp for reduction(+:salaries1)
        for (int i = 0; i < NUM_EMPLOYEES; i++) {
            salaries1 += 1362;
        }
        
        // Only ONE thread prints (after barrier)
        #pragma omp single
        {
            printf("Salaries1: %lld\n", salaries1);
        }
        
        // Calculate Company 2 salaries
        #pragma omp for reduction(+:salaries2)
        for (int i = 0; i < NUM_EMPLOYEES; i++) {
            salaries2 += 1465;
        }
        
        // Only ONE thread prints (after barrier)
        #pragma omp single
        {
            printf("Salaries2: %lld\n", salaries2);
        }
    }
    
    double end = omp_get_wtime();
    printf("Time: %.6f seconds\n", end - start);
    
    return 0;
}
```

#### Output (Perfect!)

```
Salaries1: 13620000000
Salaries2: 14650000000
Time: 0.087391 seconds
```

**Key**: `#pragma omp single` ensures only ONE thread executes the block

<a id='section3'></a>
## 3. Barriers in OpenMP

### What is a Barrier?

A **barrier** is a synchronization point where all threads must wait until every thread reaches that point.

```
Thread 0:  ────────────────●─────────→
Thread 1:  ──────●         ↓         
Thread 2:  ───────────●    ↓         
Thread 3:  ──────────────●─┘ Barrier 
                           ↓
           All threads wait here until
           all have arrived
                           ↓
Thread 0:                  ●──────→
Thread 1:                  ●──────→
Thread 2:                  ●──────→
Thread 3:                  ●──────→
```

### Why Barriers?

**Example**: Computing array dependencies

```cpp
#pragma omp parallel
{
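    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();
    
    // Step 1: Each thread computes x[tid]
    x[tid] = some_calculation();
    
    // PROBLEM: y[tid] depends on a neighbour's value, x[next]
    // Need to ensure ALL x[] values are computed first!
    
    #pragma omp barrier  // Wait for all threads
    
    // Step 2: Now safe to read the neighbour's value
    // (wrap around so the last thread does not read past the end of x)
    int next = (tid + 1) % nthreads;
    y[tid] = x[tid] + x[next];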
}
```

### Explicit vs Implicit Barriers

#### Implicit Barriers (Automatic)

OpenMP automatically inserts barriers at:

1. **End of parallel region**
```cpp
#pragma omp parallel
{
    some_calculation();
} // Implicit barrier here
```

2. **End of work-sharing constructs**
```cpp
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; i++) {
        x[i] = compute();
    }
    // Implicit barrier here
    
    #pragma omp for
    for (int i = 0; i < N; i++) {
        y[i] = x[i] * 2;  // Safe: all x[i] computed
    }
}
```

3. **After `#pragma omp single`**
```cpp
#pragma omp single
{
    printf("One thread\n");
}
// Implicit barrier - all threads wait
```

#### Explicit Barriers

```cpp
#pragma omp parallel
{
    phase1();
    
    #pragma omp barrier  // Explicit synchronization
    
    phase2();  // All threads finished phase1
}
```

### The `nowait` Clause

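The `nowait` clause removes the implicit barrier at the end of a work-sharing construct, so threads that finish early can move on instead of idling: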

```cpp
#pragma omp parallel
{
    // Loop 1
    #pragma omp for nowait  // No barrier after this loop
    for (int i = 0; i < N; i++) {
        a[i] = compute_a(i);
    }
    // Threads can proceed immediately to next loop
    
    // Loop 2
    #pragma omp for
    for (int i = 0; i < N; i++) {
        b[i] = compute_b(i);  // Independent of a[]
    }
}
```

#### When to Use `nowait`

```cpp
// SAFE: Loops are independent
#pragma omp parallel
{
    #pragma omp for nowait
    for (int i = 0; i < N; i++) {
        a[i] = f(i);  // No dependency
    }
    
    #pragma omp for
    for (int i = 0; i < N; i++) {
        b[i] = g(i);  // Independent of a[]
    }
}

// UNSAFE: Second loop depends on first
#pragma omp parallel
{
    #pragma omp for nowait  // DANGEROUS!
    for (int i = 0; i < N; i++) {
        a[i] = f(i);
    }
    
    #pragma omp for
    for (int i = 0; i < N; i++) {
        b[i] = a[i] * 2;  // May read uncomputed a[i]!
    }
}
```

### Pros and Cons of Barriers

#### Pros
- **Correctness**: Avoid data races
- **Data dependencies**: Ensure computations complete
- **Predictable behavior**: Clear synchronization points

#### Cons
- **Performance penalty**: Threads wait (idle time)
- **Load imbalance**: Fast threads wait for slow threads
- **Deadlock risk**: If barrier is in conditional code

```cpp
// DEADLOCK RISK!
#pragma omp parallel
{
    if (omp_get_thread_num() == 0) {
        #pragma omp barrier  // Only thread 0 reaches this barrier
    }
    // Non-conforming: the other threads skip this barrier, so the program may hang
}
```

**Rule**: ALL threads in a team MUST reach the barrier!

<a id='section4'></a>
## 4. C++ Mutex and Thread Basics

### Revisiting C++ Threads (from Practical 2)

```cpp
#include <thread>
#include <iostream>

void worker(int id) {
    std::cout << "Thread " << id << " working\n";
}

int main() {
    std::thread t1(worker, 1);
    std::thread t2(worker, 2);
    
    t1.join();  // Wait for t1
    t2.join();  // Wait for t2
    
    return 0;
}
```

### C++ Mutex

A **mutex** (mutual exclusion) is a lock that ensures only one thread accesses a resource at a time.

#### Basic Usage

```cpp
#include <mutex>
#include <thread>
#include <iostream>

std::mutex mtx;  // Mutex object
int counter = 0;

void increment(int n) {
    for (int i = 0; i < n; i++) {
        mtx.lock();    // Acquire lock
        counter++;
        mtx.unlock();  // Release lock
    }
}

int main() {
    std::thread t1(increment, 1000);
    std::thread t2(increment, 1000);
    
    t1.join();
    t2.join();
    
    std::cout << "Counter: " << counter << std::endl;  // 2000
    return 0;
}
```

### Problem with Manual Lock/Unlock

```cpp
void dangerous_function() {
    mtx.lock();
    
    if (error_condition) {
        return;  // BUG: Forgot to unlock!
    }
    
    risky_operation();  // May throw exception
    
    mtx.unlock();  // May never reach here!
}
```

**Problems**:
1. Easy to forget `unlock()`
2. Early returns skip `unlock()`
3. Exceptions skip `unlock()`

### Solution: `std::lock_guard`

**RAII** (Resource Acquisition Is Initialization) pattern:

```cpp
#include <mutex>
#include <thread>
#include <iostream>

std::mutex mtx;
int counter = 0;

void increment(int n) {
    for (int i = 0; i < n; i++) {
        // Lock acquired in constructor
        std::lock_guard<std::mutex> lock(mtx);
        
        counter++;
        
        // Lock automatically released in destructor
        // when lock goes out of scope
    }
}

int main() {
    std::thread t1(increment, 1000);
    std::thread t2(increment, 1000);
    
    t1.join();
    t2.join();
    
    std::cout << "Counter: " << counter << std::endl;
    return 0;
}
```

**Benefits**:
- Automatic unlock (even with exceptions)
- No manual unlock needed
- Exception-safe

<a id='section5'></a>
## 5. Question 4: Counting Even Numbers with Mutex

### Problem Statement

Count how many even numbers are in an array using multiple threads. Use mutex to protect the shared counter.

### Without Mutex (Incorrect)

```cpp
#include <thread>
#include <iostream>
#include <vector>

#define N 1000000

int numEven = 0;  // Shared counter
int data[N];

void countEven(int start, int end) {
    for (int i = start; i < end; i++) {
        if (data[i] % 2 == 0) {
            numEven++;  // RACE CONDITION!
        }
    }
}

int main() {
    // Initialize data
    for (int i = 0; i < N; i++) {
        data[i] = i;
    }
    
    // Create threads
    const int NUM_THREADS = 4;
    std::vector<std::thread> threads;
    int chunk = N / NUM_THREADS;
    
    for (int i = 0; i < NUM_THREADS; i++) {
        int start = i * chunk;
        int end = (i == NUM_THREADS - 1) ? N : (i + 1) * chunk;
        threads.push_back(std::thread(countEven, start, end));
    }
    
    // Wait for all threads
    for (auto& t : threads) {
        t.join();
    }
    
    std::cout << "Even numbers: " << numEven << std::endl;
    std::cout << "Expected: " << N/2 << std::endl;
    
    return 0;
}
```

#### Output (Incorrect)
```
Even numbers: 498742
Expected: 500000
```

Updates are lost because of the race condition on `numEven`. The exact count varies from run to run.

### With Mutex (Correct)

```cpp
#include <thread>
#include <mutex>
#include <iostream>
#include <vector>

#define N 1000000

int numEven = 0;
int data[N];
std::mutex increment_mutex;  // Mutex to protect numEven

void countEven(int start, int end) {
    for (int i = start; i < end; i++) {
        if (data[i] % 2 == 0) {
            // Lock mutex before updating
            increment_mutex.lock();
            numEven++;
            std::cout << "thread: " << numEven << std::endl;  // Debug output
            increment_mutex.unlock();
        }
    }
}

int main() {
    // Initialize data
    for (int i = 0; i < N; i++) {
        data[i] = i;
    }
    
    // Create threads
    const int NUM_THREADS = 4;
    std::vector<std::thread> threads;
    int chunk = N / NUM_THREADS;
    
    for (int i = 0; i < NUM_THREADS; i++) {
        int start = i * chunk;
        int end = (i == NUM_THREADS - 1) ? N : (i + 1) * chunk;
        threads.push_back(std::thread(countEven, start, end));
    }
    
    // Wait for all threads
    for (auto& t : threads) {
        t.join();
    }
    
    std::cout << "\nEven numbers: " << numEven << std::endl;
    std::cout << "Expected: " << N/2 << std::endl;
    
    return 0;
}
```

#### Output (Correct)
```
thread: 1
thread: 2
thread: 3
...
thread: 499999
thread: 500000

Even numbers: 500000
Expected: 500000
```

### Improved Version with `lock_guard`

```cpp
#include <thread>
#include <mutex>
#include <iostream>
#include <vector>

#define N 1000000

int numEven = 0;
int data[N];
std::mutex increment_mutex;

void countEven(int start, int end) {
    for (int i = start; i < end; i++) {
        if (data[i] % 2 == 0) {
            // RAII: Automatic lock/unlock
            std::lock_guard<std::mutex> lock(increment_mutex);
            numEven++;
            std::cout << "thread: " << numEven << std::endl;
            // Lock automatically released here
        }
    }
}

int main() {
    for (int i = 0; i < N; i++) {
        data[i] = i;
    }
    
    const int NUM_THREADS = 4;
    std::vector<std::thread> threads;
    int chunk = N / NUM_THREADS;
    
    for (int i = 0; i < NUM_THREADS; i++) {
        int start = i * chunk;
        int end = (i == NUM_THREADS - 1) ? N : (i + 1) * chunk;
        threads.push_back(std::thread(countEven, start, end));
    }
    
    for (auto& t : threads) {
        t.join();
    }
    
    std::cout << "\nEven numbers: " << numEven << std::endl;
    
    return 0;
}
```

### Performance Consideration

**Problem**: Locking for every increment is expensive!

**Better approach**: Use local counter

```cpp
void countEven(int start, int end) {
    int local_count = 0;  // Thread-local
    
    // Count locally (no locking)
    for (int i = start; i < end; i++) {
        if (data[i] % 2 == 0) {
            local_count++;
        }
    }
    
    // Update shared counter ONCE
    std::lock_guard<std::mutex> lock(increment_mutex);
    numEven += local_count;
}
```

Much faster! Lock acquired only once per thread.

<a id='section6'></a>
## 6. Deadlocks

### What is a Deadlock?

**Deadlock** occurs when threads wait for each other indefinitely, and no thread can proceed.

### Classic Example: Dining Philosophers

```
    Fork 0
      |
Phil 0   Phil 1
  \        /
    Fork 1
      |
    Phil 2
```

Each philosopher needs TWO forks to eat.

### Deadlock Scenario

```cpp
std::mutex fork1, fork2;

void philosopher_A() {
    fork1.lock();       // A gets fork1
    // Context switch...
    fork2.lock();       // A wants fork2 (held by B)
    
    // Eat...
    
    fork2.unlock();
    fork1.unlock();
}

void philosopher_B() {
    fork2.lock();       // B gets fork2
    // Context switch...
    fork1.lock();       // B wants fork1 (held by A)
    
    // Eat...
    
    fork1.unlock();
    fork2.unlock();
}
```

```
Time →

Thread A: [Get fork1] Wait for fork2...(BLOCKED FOREVER)
Thread B:             [Get fork2] Wait for fork1...(BLOCKED FOREVER)
                          ↑
                       DEADLOCK!
```

### Deadlock Conditions (Coffman Conditions)

Deadlock can occur only if ALL four conditions hold at the same time:

1. **Mutual Exclusion**: Resources cannot be shared
2. **Hold and Wait**: Thread holds resource while waiting for another
3. **No Preemption**: Cannot forcibly take resources from thread
4. **Circular Wait**: Cycle of threads waiting for each other

### Prevention Strategies

Break at least one of the four conditions:

#### Strategy 1: Lock Ordering

Always acquire locks in the same order:

```cpp
std::mutex fork1, fork2;

void philosopher_A() {
    fork1.lock();   // Order: 1, 2
    fork2.lock();
    // Eat...
    fork2.unlock();
    fork1.unlock();
}

void philosopher_B() {
    fork1.lock();   // Order: 1, 2 (same!)
    fork2.lock();
    // Eat...
    fork2.unlock();
    fork1.unlock();
}
// No deadlock: Both acquire in same order
```

#### Strategy 2: Try-Lock

```cpp
void philosopher() {
    while (true) {
        fork1.lock();
        
        if (fork2.try_lock()) {
            // Got both forks!
            // Eat...
            fork2.unlock();
            fork1.unlock();
            break;
        } else {
            // Couldn't get fork2, release fork1
            fork1.unlock();
            // Back off briefly (e.g. std::this_thread::yield()), then retry
        }
    }
}
```

#### Strategy 3: `std::lock` (Best!)

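Lock multiple mutexes in a single call. `std::lock` uses a deadlock-avoidance algorithm, so the order of its arguments does not matter: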

```cpp
void philosopher() {
    // Acquire both locks in one deadlock-free call
    std::lock(fork1, fork2);
    
    // Use RAII to manage unlocking
    std::lock_guard<std::mutex> lock1(fork1, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(fork2, std::adopt_lock);
    
    // Eat...
    
    // Automatic unlock
}
```

<a id='section7'></a>
## 7. Question 5: Preventing Deadlock

### Scenario: Making Apple Juice

Two resources needed:
- **Bag of pears** (m_bag)
- **Print access** (m_print)

### Deadlock Code (P5Q5.cpp)

```cpp
#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>

std::mutex m_print;  // Mutex for console output
std::mutex m_bag;    // Mutex for pear bag

void make_juice(int person_id) {
    // Person grabs print access first
    m_print.lock();
    std::cout << "Person " << person_id << ": Got print access\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    
    // Then tries to grab bag
    std::cout << "Person " << person_id << ": Waiting for bag...\n";
    m_bag.lock();  // DEADLOCK RISK!
    std::cout << "Person " << person_id << ": Got bag\n";
    
    // Make juice...
    std::cout << "Person " << person_id << ": Making juice...\n";
    
    m_bag.unlock();
    m_print.unlock();
}

void throw_pears_from_bag() {
    // Grabs bag first
    m_bag.lock();
    std::cout << "Throwing pears: Got bag\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    
    // Then tries to print
    std::cout << "Throwing pears: Waiting for print...\n";
    m_print.lock();  // DEADLOCK!
    std::cout << "Throwing pears: Got print\n";
    
    std::cout << "Throwing pears: I threw out all pears!\n";
    
    m_print.unlock();
    m_bag.unlock();
}

int main() {
    std::cout << "Making an apple juice...\n";
    
    std::thread person1(make_juice, 1);
    std::thread thrower(throw_pears_from_bag);
    
    person1.join();
    thrower.join();
    
    std::cout << "I made an excellent apple juice for you.\n";
    return 0;
}
```

#### Output (Deadlock)
```
Making an apple juice...
Person 1: Got print access
Throwing pears: Got bag
Person 1: Waiting for bag...
Throwing pears: Waiting for print...
[HANGS FOREVER]
```

**Analysis**:
- Person 1: Has m_print, wants m_bag
- Thrower: Has m_bag, wants m_print
- Circular wait → Deadlock!

### Solution: Use `std::lock`

```cpp
#include <iostream>
#include <thread>
#include <mutex>
#include <chrono>

std::mutex m_print;
std::mutex m_bag;

void make_juice(int person_id) {
    // Atomically lock BOTH mutexes (no deadlock!)
    std::lock(m_print, m_bag);
    
    // Use lock_guard with adopt_lock (already locked)
    std::lock_guard<std::mutex> print_lock(m_print, std::adopt_lock);
    std::lock_guard<std::mutex> bag_lock(m_bag, std::adopt_lock);
    
    std::cout << "Person " << person_id << ": Got both resources\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    
    std::cout << "Person " << person_id << ": Making juice...\n";
    
    // Automatic unlock when lock_guards go out of scope
}

void throw_pears_from_bag() {
    // Same: atomically lock both
    std::lock(m_print, m_bag);
    
    std::lock_guard<std::mutex> print_lock(m_print, std::adopt_lock);
    std::lock_guard<std::mutex> bag_lock(m_bag, std::adopt_lock);
    
    std::cout << "Throwing pears: Got both resources\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    
    std::cout << "Throwing pears: I threw out all pears!\n";
    
    // Automatic unlock
}

int main() {
    std::cout << "Making an apple juice...\n";
    
    std::thread person1(make_juice, 1);
    std::thread thrower(throw_pears_from_bag);
    
    person1.join();
    thrower.join();
    
    std::cout << "I made an excellent apple juice for you.\n";
    std::cout << "Bag: x x x x x x x x\n";
    
    return 0;
}
```

#### Output (Success!)
```
Making an apple juice...
Person 1: Got both resources
Person 1: Making juice...
Throwing pears: Got both resources
Throwing pears: I threw out all pears!
I made an excellent apple juice for you.
Bag: x x x x x x x x
```

### How `std::lock` Works

```cpp
std::lock(mutex1, mutex2, ...);  // Locks all mutexes
```

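**Algorithm** (a typical back-off strategy; the C++ standard only requires that `std::lock` avoid deadlock, not a particular algorithm):
1. Lock one mutex, then try to lock the others without blocking
2. If any attempt fails, unlock everything acquired so far and retry
3. On return, ALL mutexes are locked; if an exception is thrown, NONE remain locked
4. Avoids deadlock by never blocking while holding only part of the set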

### `std::adopt_lock` Explained

```cpp
std::lock(m1, m2);  // m1 and m2 are now LOCKED

// Tell lock_guard: "I already locked this, just manage unlocking"
std::lock_guard<std::mutex> g1(m1, std::adopt_lock);
std::lock_guard<std::mutex> g2(m2, std::adopt_lock);

// When g1 and g2 go out of scope, they unlock m1 and m2
```

**Without `std::adopt_lock`**:
```cpp
std::lock(m1, m2);  // Already locked
std::lock_guard<std::mutex> g1(m1);  // Locks m1 a second time → undefined behavior, typically a self-deadlock!
```

<a id='section8'></a>
## 8. Summary and Key Concepts

### OpenMP Synchronization

| Mechanism | Purpose | Example |
|-----------|---------|--------|
| **Barrier** | Wait for all threads | `#pragma omp barrier` |
| **Implicit barrier** | Automatic sync points | End of `for`, `parallel`, `single` |
| **nowait** | Skip implicit barrier | `#pragma omp for nowait` |
| **single** | Execute once | `#pragma omp single` |

### C++ Thread Synchronization

| Mechanism | Purpose | Example |
|-----------|---------|--------|
| **std::mutex** | Basic lock | `mtx.lock(); mtx.unlock();` |
| **std::lock_guard** | RAII lock | `std::lock_guard<std::mutex> lock(mtx);` |
| **std::lock** | Multiple locks | `std::lock(m1, m2);` |
| **std::adopt_lock** | Pre-locked flag | `lock_guard(m, std::adopt_lock)` |

### Deadlock Prevention Strategies

1. **Lock ordering**: Always acquire locks in the same order
2. **std::lock**: Acquire multiple locks in one deadlock-free call
3. **try_lock**: Non-blocking lock attempt
4. **Timeout**: Give up after waiting too long
5. **Avoid nesting**: Use a single lock when possible

### Best Practices

#### DO:
- ✅ Use `lock_guard` for automatic unlock
- ✅ Use `std::lock` for multiple mutexes
- ✅ Keep critical sections small
- ✅ Use `reduction` for accumulation
- ✅ Profile before optimizing

#### DON'T:
- ❌ Hold locks longer than necessary
- ❌ Acquire locks in different orders
- ❌ Forget to unlock (use RAII!)
- ❌ Put barriers in conditional code
- ❌ Use `nowait` with dependencies

### Performance Comparison Summary

**From all practicals**:

| Technique | Overhead | Use Case |
|-----------|----------|----------|
| **No synchronization** | None | Independent work |
| **Atomic** | Low | Simple updates |
| **Reduction** | Very Low | Accumulation |
| **Critical section** | Medium | Complex updates |
| **Mutex** | Medium | C++ threads |
| **Barrier** | Medium-High | Phase synchronization |

### Common Pitfalls Recap

1. **Race conditions**: Unprotected shared data
2. **Deadlocks**: Circular waiting for locks
3. **False sharing**: Variables on same cache line
4. **Lock contention**: Too many threads waiting
5. **Missing unlock**: Forgetting to release lock

### Real-World Applications

- **Web servers**: Handle multiple client requests
- **Databases**: Concurrent transaction processing
- **Game engines**: Physics, rendering, AI in parallel
- **Scientific computing**: Parallel simulations
- **Machine learning**: Parallel training

### Further Learning

1. **C++20 features**: `std::jthread`, `std::counting_semaphore`
2. **Lock-free programming**: Atomic operations, memory ordering
3. **Task parallelism**: OpenMP tasks, TBB
4. **GPU programming**: CUDA, OpenCL, SYCL
5. **Distributed systems**: MPI, MapReduce

---

**End of Practical 5**

**Congratulations!** You have completed the Parallel Processing Practicals series. You now understand:
- Threading and processes
- OpenMP directives
- Memory management
- Synchronization mechanisms
- Deadlock prevention
- Performance optimization

**Keep practicing** and applying these concepts to real-world problems!