# Practical 2: Inter-process Communication

**Course**: BMCS3003 Distributed Systems and Parallel Computing

**Difficulty**: ⭐⭐ (Intermediate)

**Estimated Time**: 90 minutes

**Prerequisites**: 
- Basic understanding of C++ programming
- Familiarity with processes and threads concepts
- Understanding of network communication basics

## Learning Objectives

By the end of this practical, you will be able to:

1. Implement multi-threaded programs using C++ Standard Library (`std::thread`)
2. Create and manage POSIX threads (pthread) for parallel execution
3. Build UDP client-server communication systems
4. Combine threading with network communication for concurrent data exchange
5. Measure and compare performance between sequential and parallel implementations

## Table of Contents

1. [Introduction to Multi-threading](#section1)
2. [Standard C++ Threading (`std::thread`)](#section2)
3. [POSIX Threads (pthread)](#section3)
4. [UDP Socket Communication](#section4)
5. [Multi-threaded UDP Communication](#section5)
6. [Performance Analysis](#section6)
7. [Summary and Key Takeaways](#section7)

<a id='section1'></a>
## 1. Introduction to Multi-threading

### What is Multi-threading?

Multi-threading allows a program to run multiple tasks concurrently within a single process, and truly in parallel when several CPU cores are available. This is crucial for:
- **Responsiveness**: Keep UI responsive while performing background computations
- **Performance**: Utilize multiple CPU cores for parallel processing
- **Resource Sharing**: Threads share the same memory space, making communication efficient

### Key Concepts

```
Process
├── Memory Space (shared)
├── Thread 1
│   ├── Stack (private)
│   └── Registers (private)
├── Thread 2
│   ├── Stack (private)
│   └── Registers (private)
└── Thread N
    ├── Stack (private)
    └── Registers (private)
```

**Important**: Each thread has its own stack and execution context (registers, program counter), but all threads share the process's code, global variables, and heap memory.
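
The split between shared and private memory can be seen in a minimal sketch. The helper name `fill_squares` is hypothetical and is used only for illustration: the vector is shared by every thread, while each thread's `local` variable lives on its own stack.

```cpp
#include <thread>
#include <vector>

// Hypothetical helper: each thread writes one slot of a shared vector.
std::vector<int> fill_squares(int n) {
    std::vector<int> results(n, 0);     // storage shared by all threads
    std::vector<std::thread> threads;
    for (int i = 0; i < n; i++) {
        // `results` is captured by reference (shared); `i` is copied (private).
        threads.emplace_back([&results, i] {
            int local = i * i;          // lives on this thread's own stack
            results[i] = local;         // distinct slot per thread, so no data race
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return results;
}
```

Calling `fill_squares(4)` should yield the values 0, 1, 4, 9. The example is safe only because every thread writes a different element; writing the same element from several threads would need synchronization.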

<a id='section2'></a>
## 2. Standard C++ Threading (`std::thread`)

### Overview

C++11 introduced the `<thread>` library, providing a platform-independent way to create and manage threads.

### Basic Thread Operations

#### Creating a Thread

```cpp
#include <thread>
#include <iostream>

// Function to be executed by the thread
void worker_function() {
    std::cout << "Hello from worker thread!" << std::endl;
}

int main() {
    // Create a thread that executes worker_function
    std::thread worker(worker_function);
    
    // Wait for the thread to complete
    worker.join();
    
    return 0;
}
```

### Thread Lifecycle

```
CREATE → RUNNING → COMPLETE
           ↓
        .join()
           ↓
      Main waits here
```
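
The lifecycle can be observed from code through `joinable()`. The sketch below uses a hypothetical helper, `joinable_before_and_after`, to record the state on each side of `.join()`.

```cpp
#include <thread>
#include <utility>

// Hypothetical helper: reports joinable() before and after join().
std::pair<bool, bool> joinable_before_and_after() {
    std::thread worker([] { /* thread body runs here */ });
    bool before = worker.joinable();   // true: the object still owns a thread
    worker.join();                     // caller blocks until the body returns
    bool after = worker.joinable();    // false: nothing left to join
    return {before, after};
}
```

A thread stays joinable even after its function has finished; only `.join()` or `.detach()` releases it.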

### Question 1: Multi-threaded Performance Comparison

**Objective**: Modify the provided code to run 8 threads concurrently with a larger iteration count, then compare the result with sequential processing.

#### Original Code Structure (P2Q1.cpp)

```cpp
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>

// Function to perform computation
void compute(int id, int count) {
    auto start = std::chrono::high_resolution_clock::now();
    
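    // Unsigned arithmetic wraps around; a signed sum of squares would overflow (undefined behavior)
    unsigned long long sum = 0;
    for (int i = 0; i < count; i++) {
        sum += static_cast<unsigned long long>(i) * i;  // Some computation
    }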
    
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    
    std::cout << "Thread " << id << " execution time: " 
              << duration.count() << " ms" << std::endl;
}

int main() {
    const int NUM_THREADS = 2;  // Modify to 8
    const int COUNT = 10000000; // Increase for better comparison
    
    // Parallel execution
    auto parallel_start = std::chrono::high_resolution_clock::now();
    
    std::vector<std::thread> threads;
    for (int i = 0; i < NUM_THREADS; i++) {
        threads.push_back(std::thread(compute, i, COUNT));
    }
    
    // Wait for all threads to complete
    for (auto& t : threads) {
        t.join();
    }
    
    auto parallel_end = std::chrono::high_resolution_clock::now();
    auto parallel_duration = std::chrono::duration_cast<std::chrono::milliseconds>(
        parallel_end - parallel_start
    );
    
    std::cout << "\nTotal parallel execution time: " 
              << parallel_duration.count() << " ms" << std::endl;
    
    return 0;
}
```

#### Step-by-Step Solution

**Step 1**: Increase the number of threads to 8
```cpp
const int NUM_THREADS = 8;  // Changed from 2 to 8
```

**Step 2**: Increase the workload for better measurement
```cpp
const int COUNT = 100000000;  // Increased from 10000000
```

**Step 3**: Add sequential execution for comparison
```cpp
// Sequential execution (for comparison)
auto sequential_start = std::chrono::high_resolution_clock::now();

for (int i = 0; i < NUM_THREADS; i++) {
    compute(i, COUNT);  // Run sequentially (no threads)
}

auto sequential_end = std::chrono::high_resolution_clock::now();
auto sequential_duration = std::chrono::duration_cast<std::chrono::milliseconds>(
    sequential_end - sequential_start
);

std::cout << "\nTotal sequential execution time: " 
          << sequential_duration.count() << " ms" << std::endl;
```

**Step 4**: Calculate speedup
```cpp
double speedup = static_cast<double>(sequential_duration.count()) / parallel_duration.count();
std::cout << "\nSpeedup: " << speedup << "x" << std::endl;
```

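### Expected Output Analysis

The figures below are sample values for an 8-core machine; your timings will differ. The sequential run also prints its own eight per-call lines, which are omitted here.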

```
Thread 0 execution time: 245 ms
Thread 1 execution time: 248 ms
Thread 2 execution time: 247 ms
Thread 3 execution time: 246 ms
Thread 4 execution time: 249 ms
Thread 5 execution time: 245 ms
Thread 6 execution time: 248 ms
Thread 7 execution time: 246 ms

Total parallel execution time: 250 ms
Total sequential execution time: 1960 ms

Speedup: 7.84x
```

#### Key Observations:

1. **Near-linear speedup**: With 8 threads on an 8-core CPU, the speedup approaches 8x because the tasks are independent and CPU-bound
2. **Parallel execution time**: Limited by the slowest thread
3. **Why not perfect 8x?**: Thread creation overhead, context switching, and CPU scheduling all add time that the sequential run does not pay

### Common Pitfalls and Troubleshooting

#### Problem 1: Forgetting to join threads
```cpp
// BAD: worker is still joinable when it goes out of scope
std::thread worker(some_function);
// Missing worker.join();
return 0;  // ~thread() calls std::terminate(): the program aborts
```

**Solution**: Always call `.join()` or `.detach()` before a `std::thread` object is destroyed
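
One way to make a forgotten join impossible is to let a destructor do it. The following is a minimal sketch, not part of the practical's provided code; `JoinGuard` and `run_with_guard` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical RAII wrapper: joins the thread when the guard goes out of scope.
class JoinGuard {
public:
    explicit JoinGuard(std::thread t) : thread_(std::move(t)) {}
    ~JoinGuard() {
        if (thread_.joinable()) {
            thread_.join();
        }
    }
    JoinGuard(const JoinGuard&) = delete;
    JoinGuard& operator=(const JoinGuard&) = delete;

private:
    std::thread thread_;
};

// Returns true if the worker finished before the inner scope ended.
bool run_with_guard() {
    std::atomic<bool> done{false};
    {
        JoinGuard guard(std::thread([&done] { done = true; }));
        // No explicit join(): the destructor handles it, even on early return.
    }
    return done.load();
}
```

C++20 provides `std::jthread`, which joins automatically in its destructor, so a hand-written guard like this is mainly useful with older standards.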

#### Problem 2: Data races
```cpp
int counter = 0;  // Shared variable

void increment() {
    for (int i = 0; i < 1000000; i++) {
        counter++;  // NOT thread-safe!
    }
}
```

**Solution**: Use synchronization primitives (covered in later practicals)
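
As a brief preview of one such primitive, the sketch below replaces the plain `int` with `std::atomic`, which makes each increment an indivisible operation. The helper name `count_with_atomic` is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each perform per_thread increments.
long long count_with_atomic(int num_threads, int per_thread) {
    std::atomic<long long> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; i++) {
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic read-modify-write
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter.load();
}
```

With the atomic counter the total is always `num_threads * per_thread`; with the plain `int` version above, increments can be lost and the total is unpredictable.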

<a id='section3'></a>
## 3. POSIX Threads (pthread)

### What is POSIX Threads?

POSIX Threads (Pthreads) is a standardized C API for thread programming. It's available on Unix-like systems (Linux, macOS) and Windows (via pthreads-w32 library).

### Key Differences: std::thread vs pthread

| Feature | std::thread | pthread |
|---------|------------|----------|
| Language | C++ (C++11+) | C |
| Portability | Cross-platform | Unix/POSIX systems |
| Ease of use | High (RAII) | Low (manual management) |
| Control | Limited | Fine-grained |

### Basic Pthread Operations

#### Creating a Thread

```cpp
#include <pthread.h>
#include <iostream>

// Thread function signature: void* function(void* arg)
void* worker_function(void* arg) {
    int* id = (int*)arg;
    std::cout << "Hello from thread " << *id << std::endl;
    return NULL;
}

int main() {
    pthread_t thread_id;  // Thread identifier
    int thread_arg = 1;
    
    // Create thread: pthread_create(thread_id, attributes, function, argument)
    pthread_create(&thread_id, NULL, worker_function, &thread_arg);
    
    // Wait for thread to complete
    pthread_join(thread_id, NULL);
    
    return 0;
}
```

### pthread_create Parameters

```cpp
int pthread_create(
    pthread_t *thread,              // Output: thread identifier
    const pthread_attr_t *attr,     // Thread attributes (NULL = default)
    void *(*start_routine)(void *), // Function to execute
    void *arg                       // Argument passed to function
);
```

#### Parameter Details:

1. **thread**: Pointer where thread ID will be stored
2. **attr**: Thread attributes (priority, stack size, etc.) - NULL for defaults
3. **start_routine**: Function pointer with signature `void* func(void*)`
4. **arg**: Single argument passed to the thread function
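
Because `arg` is a single `void*`, several values are usually bundled into a struct. The sketch below assumes a hypothetical `AddTask` struct and shows both ways of getting a result back: through the struct itself and through the pointer returned to `pthread_join`.

```cpp
#include <pthread.h>

// Hypothetical argument bundle: two inputs and an output slot.
struct AddTask {
    int a;
    int b;
    int sum;   // filled in by the thread
};

void* add_worker(void* arg) {
    AddTask* task = static_cast<AddTask*>(arg);
    task->sum = task->a + task->b;
    return task;   // the same pointer is handed back through pthread_join
}

// Computes a + b on a separate thread. Returns false if a pthread call fails.
bool add_on_thread(int a, int b, int& result) {
    AddTask task{a, b, 0};
    pthread_t tid;
    if (pthread_create(&tid, nullptr, add_worker, &task) != 0) {
        return false;
    }
    void* ret = nullptr;
    if (pthread_join(tid, &ret) != 0) {
        return false;
    }
    result = static_cast<AddTask*>(ret)->sum;
    return true;
}
```

The struct must stay alive until the thread has finished with it. Here that holds because `add_on_thread` joins before `task` goes out of scope.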

### Setting Up pthread on Windows

#### Step 1: Install vcpkg (Package Manager)

1. Download vcpkg from: https://github.com/Microsoft/vcpkg
2. Extract and navigate to the vcpkg directory
3. Run the bootstrap script:
   ```bash
   bootstrap-vcpkg.bat
   ```

#### Step 2: Install pthread Library

```bash
vcpkg install pthreads:x64-windows
vcpkg integrate install
```

#### Step 3: Configure Visual Studio

The `vcpkg integrate install` command automatically configures Visual Studio to find the pthread headers and libraries.

#### Verification

If you see this error:
```
Error E1696: cannot open source file "pthread.h"
```

You need to install pthread using the steps above.

### Question 2: Running pthread Example

**Objective**: Compile and run the pthread example code (P2Q2.cpp)

#### Example Code (P2Q2.cpp)

```cpp
#include <pthread.h>
#include <iostream>
#include <chrono>
#include <thread>    // For std::this_thread::sleep_for (portable; <unistd.h> is not available with MSVC)

// Thread function to print numbers
void* print_numbers(void* arg) {
    int* thread_id = static_cast<int*>(arg);
    
    for (int i = 1; i <= 5; i++) {
        std::cout << "Thread " << *thread_id << ": " << i << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));  // Sleep for 1 second
    }
    
    return NULL;
}

int main() {
    const int NUM_THREADS = 3;
    pthread_t threads[NUM_THREADS];
    int thread_ids[NUM_THREADS];
    
    // Create threads
    for (int i = 0; i < NUM_THREADS; i++) {
        thread_ids[i] = i + 1;
        
        int result = pthread_create(
            &threads[i],         // Thread ID
            NULL,                // Default attributes
            print_numbers,       // Function to execute
            &thread_ids[i]       // Argument to function
        );
        
        if (result != 0) {
            std::cerr << "Error creating thread " << i << std::endl;
            return 1;
        }
    }
    
    // Wait for all threads to complete
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    
    std::cout << "All threads completed!" << std::endl;
    
    return 0;
}
```

#### Expected Output

```
Thread 1: 1
Thread 2: 1
Thread 3: 1
Thread 1: 2
Thread 2: 2
Thread 3: 2
Thread 1: 3
Thread 2: 3
Thread 3: 3
...
All threads completed!
```

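**Note**: The output order may vary between runs because the threads execute concurrently. Since `std::cout` is not protected by a lock here, text from different threads can also interleave within a single line.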

### Compilation Instructions

#### Linux/macOS:
```bash
g++ -std=c++11 -pthread -o P2Q2 P2Q2.cpp
./P2Q2
```

#### Windows (Visual Studio):
1. Ensure vcpkg is configured
2. Build the project normally (Ctrl+B)
3. Run from Debug menu (F5)

#### Windows (Command Line with MSVC):
```bash
cl /EHsc P2Q2.cpp /link pthreadVC2.lib
P2Q2.exe
```

<a id='section4'></a>
## 4. UDP Socket Communication

### What is UDP?

**UDP (User Datagram Protocol)** is a connectionless protocol that sends data packets without establishing a connection.

#### UDP vs TCP Comparison

| Feature | UDP | TCP |
|---------|-----|-----|
| Connection | Connectionless | Connection-oriented |
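| Reliability | No delivery or ordering guarantee | Reliable, ordered delivery |
| Speed | Lower latency (no handshake or retransmission) | Higher latency |
| Use case | Streaming, gaming | File transfer, web |
| Overhead | Low (8-byte header) | Higher (20-byte minimum header) |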
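
Because UDP does not guarantee delivery, a blocking `recvfrom` can wait forever for a datagram that was lost. The sketch below is POSIX-only and uses hypothetical helper names; it sets a receive timeout with the `SO_RCVTIMEO` socket option so the call gives up after a fixed time. On Windows the same option takes a `DWORD` in milliseconds instead of a `timeval`.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

// Hypothetical helper (POSIX only): waits up to timeout_ms for one datagram.
// Returns the number of bytes received, or -1 on timeout or error.
ssize_t recv_with_timeout(int sockfd, char* buf, size_t len, int timeout_ms) {
    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        return -1;
    }
    return recvfrom(sockfd, buf, len, 0, nullptr, nullptr);
}

// Demo: a bound socket that nobody sends to times out instead of hanging.
bool silent_socket_times_out() {
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0) {
        return false;
    }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;   // let the OS pick a free port
    bool bound = bind(sockfd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
    char buf[64];
    ssize_t n = bound ? recv_with_timeout(sockfd, buf, sizeof(buf), 100) : 0;
    close(sockfd);
    return bound && n < 0;
}
```

A client built this way can retry or report an error when no reply arrives, which is the application's responsibility under UDP.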

### UDP Communication Flow

```
CLIENT                    SERVER
  |                          |
  |   1. Create socket       |   1. Create socket
  |                          |   2. Bind to port
  |                          |   3. Wait for data
  |                          |      (recvfrom)
  |   2. Send message        |         |
  |      (sendto)            |         |
  |------------------------>|         |
  |                          |   4. Receive data
  |                          |   5. Process
  |   3. Receive reply       |   6. Send reply
  |<------------------------|      (sendto)
  |   4. Process reply       |
  |                          |
```
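
The flow above can be tried in a single process before writing separate client and server programs. This is a minimal POSIX-only sketch with a hypothetical function name, `loopback_round_trip`; it binds the "server" socket to an OS-chosen port on 127.0.0.1 rather than a fixed port, which is an assumption made so the demo never collides with another program.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <string>

// Hypothetical single-process demo (POSIX only): returns the reply the
// "client" socket receives, or an empty string if any step fails.
std::string loopback_round_trip(const std::string& msg) {
    int server = socket(AF_INET, SOCK_DGRAM, 0);
    int client = socket(AF_INET, SOCK_DGRAM, 0);
    std::string result;

    sockaddr_in srv{};
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   // 127.0.0.1
    srv.sin_port = 0;                               // let the OS pick a free port
    socklen_t srv_len = sizeof(srv);
    char buf[256];

    // Server: bind, then ask which port was assigned.
    bool ok = server >= 0 && client >= 0 &&
              bind(server, reinterpret_cast<sockaddr*>(&srv), sizeof(srv)) == 0 &&
              getsockname(server, reinterpret_cast<sockaddr*>(&srv), &srv_len) == 0;

    // Client: send the request to the server's address.
    ok = ok && sendto(client, msg.data(), msg.size(), 0,
                      reinterpret_cast<sockaddr*>(&srv), sizeof(srv)) >= 0;
    if (ok) {
        // Server: receive the request and reply to whoever sent it.
        sockaddr_in peer{};
        socklen_t peer_len = sizeof(peer);
        ssize_t n = recvfrom(server, buf, sizeof(buf), 0,
                             reinterpret_cast<sockaddr*>(&peer), &peer_len);
        std::string reply = "echo: " + std::string(buf, n > 0 ? n : 0);
        sendto(server, reply.data(), reply.size(), 0,
               reinterpret_cast<sockaddr*>(&peer), peer_len);

        // Client: receive the reply.
        n = recvfrom(client, buf, sizeof(buf), 0, nullptr, nullptr);
        if (n > 0) {
            result.assign(buf, n);
        }
    }
    if (client >= 0) close(client);
    if (server >= 0) close(server);
    return result;
}
```

Note that only the server calls `bind`. The client's socket is given a temporary port by the OS on its first `sendto`, and the server learns that port from the address filled in by `recvfrom`.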

### Key Socket Functions

```cpp
// Create a socket
int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
//                   IPv4     UDP       protocol

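// Send data (the address is passed as a generic sockaddr*)
sendto(sockfd, buffer, length, 0, (struct sockaddr*)&dest_addr, sizeof(dest_addr));

// Receive data (addr_len must hold sizeof(src_addr) before the call)
recvfrom(sockfd, buffer, length, 0, (struct sockaddr*)&src_addr, &addr_len);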

// Close socket
close(sockfd);  // Linux/macOS
closesocket(sockfd);  // Windows
```

### Question 3: Basic UDP Client-Server

**Objective**: Implement UDP communication between client and server on localhost (127.0.0.1)

#### UDP Server Code

```cpp
#include <iostream>
#include <string>
#include <cstring>

// Platform-specific includes
#ifdef _WIN32
    #include <winsock2.h>
    #pragma comment(lib, "ws2_32.lib")
    typedef int socklen_t;
#else
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>
    #define closesocket close
#endif

int main() {
    const int PORT = 8080;
    const int BUFFER_SIZE = 1024;
    
#ifdef _WIN32
    // Initialize Winsock on Windows
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) {
        std::cerr << "WSAStartup failed" << std::endl;
        return 1;
    }
#endif
    
    // Step 1: Create socket
    int server_socket = socket(AF_INET, SOCK_DGRAM, 0);
    if (server_socket < 0) {
        std::cerr << "Failed to create socket" << std::endl;
        return 1;
    }
    std::cout << "Socket created successfully" << std::endl;
    
    // Step 2: Configure server address
    struct sockaddr_in server_addr, client_addr;
    memset(&server_addr, 0, sizeof(server_addr));
    
    server_addr.sin_family = AF_INET;           // IPv4
    server_addr.sin_addr.s_addr = INADDR_ANY;   // Listen on all interfaces
    server_addr.sin_port = htons(PORT);         // Port number
    
    // Step 3: Bind socket to port
    if (bind(server_socket, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) {
        std::cerr << "Bind failed" << std::endl;
        closesocket(server_socket);
        return 1;
    }
    std::cout << "Server listening on port " << PORT << std::endl;
    
    // Step 4: Receive and respond to messages
    char buffer[BUFFER_SIZE];
    
    while (true) {
        // Clear buffer
        memset(buffer, 0, BUFFER_SIZE);
        
        // recvfrom overwrites client_len, so reset it before every call
        socklen_t client_len = sizeof(client_addr);
        
        // Receive message; leave one byte free so the buffer stays null-terminated
        int recv_len = recvfrom(server_socket, buffer, BUFFER_SIZE - 1, 0,
                                (struct sockaddr*)&client_addr, &client_len);
        
        if (recv_len > 0) {
            std::cout << "Received: " << buffer << std::endl;
            
            // Send reply back to client
            std::string reply = "Server received: " + std::string(buffer);
            sendto(server_socket, reply.c_str(), reply.length(), 0,
                   (struct sockaddr*)&client_addr, client_len);
            
            // Exit if "exit" received
            if (std::string(buffer) == "exit") {
                break;
            }
        }
    }
    
    // Cleanup
    closesocket(server_socket);
#ifdef _WIN32
    WSACleanup();
#endif
    
    return 0;
}
```

#### UDP Client Code

```cpp
#include <iostream>
#include <string>
#include <cstring>

#ifdef _WIN32
    #include <winsock2.h>
    #include <ws2tcpip.h>  // Declares inet_pton on Windows
    #pragma comment(lib, "ws2_32.lib")
    typedef int socklen_t;
#else
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>
    #define closesocket close
#endif

int main() {
    const char* SERVER_IP = "127.0.0.1";  // Localhost
    const int PORT = 8080;
    const int BUFFER_SIZE = 1024;
    
#ifdef _WIN32
    // Initialize Winsock
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) {
        std::cerr << "WSAStartup failed" << std::endl;
        return 1;
    }
#endif
    
    // Step 1: Create socket
    int client_socket = socket(AF_INET, SOCK_DGRAM, 0);
    if (client_socket < 0) {
        std::cerr << "Failed to create socket" << std::endl;
        return 1;
    }
    
    // Step 2: Configure server address
    struct sockaddr_in server_addr;
    memset(&server_addr, 0, sizeof(server_addr));
    
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(PORT);
    inet_pton(AF_INET, SERVER_IP, &server_addr.sin_addr);
    
    // Step 3: Send and receive messages
    char buffer[BUFFER_SIZE];
    std::string message;
    socklen_t server_len = sizeof(server_addr);
    
    std::cout << "UDP Client started. Type 'exit' to quit." << std::endl;
    
    while (true) {
        std::cout << "Enter message: ";
        if (!std::getline(std::cin, message)) {
            break;  // Stop on end of input instead of looping forever
        }
        
        // Send message to server
        sendto(client_socket, message.c_str(), message.length(), 0,
               (struct sockaddr*)&server_addr, server_len);
        
        if (message == "exit") {
            break;
        }
        
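        // Receive response (blocks until a datagram arrives; UDP does not resend lost packets)
        memset(buffer, 0, BUFFER_SIZE);
        // Leave one byte free so the buffer stays null-terminated
        int recv_len = recvfrom(client_socket, buffer, BUFFER_SIZE - 1, 0,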
                                (struct sockaddr*)&server_addr, &server_len);
        
        if (recv_len > 0) {
            std::cout << "Server reply: " << buffer << std::endl;
        }
    }
    
    // Cleanup
    closesocket(client_socket);
#ifdef _WIN32
    WSACleanup();
#endif
    
    return 0;
}
```

#### Running the UDP Example

**Step 1**: Compile both programs
```bash
# Linux/macOS
g++ -o server udp_server.cpp
g++ -o client udp_client.cpp

# Windows (Developer Command Prompt for Visual Studio)
cl /EHsc udp_server.cpp ws2_32.lib
cl /EHsc udp_client.cpp ws2_32.lib
```

**Step 2**: Run server in one terminal
```bash
./server
```

**Step 3**: Run client in another terminal
```bash
./client
```

#### Expected Output

**Server Terminal:**
```
Socket created successfully
Server listening on port 8080
Received: Hello Server!
Received: How are you?
Received: exit
```

**Client Terminal:**
```
UDP Client started. Type 'exit' to quit.
Enter message: Hello Server!
Server reply: Server received: Hello Server!
Enter message: How are you?
Server reply: Server received: How are you?
Enter message: exit
```

### Understanding Socket Addresses

```cpp
// Simplified layout; the exact field types differ slightly between platforms
struct sockaddr_in {
    short sin_family;       // Address family (AF_INET for IPv4)
    unsigned short sin_port; // Port number (in network byte order)
    struct in_addr sin_addr; // IP address
    char sin_zero[8];       // Padding
};
```

#### Important Functions:

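- **htons()**: Host to Network Short - converts a 16-bit value such as a port number from host byte order to network (big-endian) byte order
- **inet_pton()**: Converts an IP address string such as "127.0.0.1" to its binary form; returns 1 on success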
- **INADDR_ANY**: Special address (0.0.0.0) meaning "listen on all network interfaces"
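
The effect of these functions can be checked directly. The sketch below is written for POSIX headers, and the helper names are hypothetical. It shows that `htons` always puts the high-order byte first in memory, and that `inet_pton` rejects text that is not a valid IPv4 address.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cstdint>
#include <cstring>
#include <string>

// First byte in memory of a port after htons(): always the high-order byte.
unsigned char first_network_byte(uint16_t port) {
    uint16_t net = htons(port);
    unsigned char bytes[2];
    std::memcpy(bytes, &net, sizeof(net));
    return bytes[0];
}

// Parses dotted-decimal text with inet_pton, then formats it back with inet_ntop.
// Returns an empty string if the text is not a valid IPv4 address.
std::string ipv4_round_trip(const std::string& text) {
    in_addr addr{};
    if (inet_pton(AF_INET, text.c_str(), &addr) != 1) {
        return "";
    }
    char out[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &addr, out, sizeof(out)) == nullptr) {
        return "";
    }
    return out;
}
```

Port 8080 is `0x1F90` in hexadecimal, so `first_network_byte(8080)` is `0x1F` on any machine, whatever its native byte order.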

<a id='section5'></a>
## 5. Multi-threaded UDP Communication

### Question 4: Multi-threaded UDP Server

**Objective**: Create a UDP server that can handle multiple clients simultaneously using threads.

### Why Multi-threading for UDP?

Although UDP is connectionless, a server may need to:
1. Process requests from multiple clients concurrently
2. Perform time-consuming operations without blocking other clients
3. Maintain separate state for different clients

### Architecture

```
CLIENT 1 ---→ |
CLIENT 2 ---→ | UDP SERVER → One thread per request
CLIENT 3 ---→ |                ├─ Worker Thread 1
                               ├─ Worker Thread 2
                               └─ Worker Thread 3
```

#### Multi-threaded UDP Server Code

```cpp
#include <iostream>
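#include <thread>
#include <chrono>    // std::chrono::milliseconds
#include <string>
#include <cstring>
#include <vector>
#include <mutex>

#ifdef _WIN32
    #include <winsock2.h>
    #include <ws2tcpip.h>  // Declares socklen_t on Windows
    #pragma comment(lib, "ws2_32.lib")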
#else
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>
    #define closesocket close
#endif

std::mutex cout_mutex;  // Mutex to protect console output

// Structure to hold client request information
struct ClientRequest {
    int socket;
    std::string message;
    sockaddr_in client_addr;
    int client_id;
};

// Worker thread function to handle client request
void handle_client(ClientRequest request) {
    // Simulate some processing time
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    
    // Thread-safe console output
    {
        std::lock_guard<std::mutex> lock(cout_mutex);
        std::cout << "[Thread " << std::this_thread::get_id() << "] "
                  << "Processing client " << request.client_id << ": "
                  << request.message << std::endl;
    }
    
    // Prepare response
    std::string response = "Processed by thread: " + request.message;
    
    // Send response back to client
    sendto(request.socket, response.c_str(), response.length(), 0,
           (struct sockaddr*)&request.client_addr, sizeof(request.client_addr));
}

int main() {
    const int PORT = 8080;
    const int BUFFER_SIZE = 1024;
    
#ifdef _WIN32
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);
#endif
    
    // Create socket
    int server_socket = socket(AF_INET, SOCK_DGRAM, 0);
    if (server_socket < 0) {
        std::cerr << "Socket creation failed" << std::endl;
        return 1;
    }
    
    // Bind socket
    struct sockaddr_in server_addr;
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = INADDR_ANY;
    server_addr.sin_port = htons(PORT);
    
    if (bind(server_socket, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) {
        std::cerr << "Bind failed" << std::endl;
        return 1;
    }
    
    std::cout << "Multi-threaded UDP Server listening on port " << PORT << std::endl;
    
    std::vector<std::thread> threads;
    int client_counter = 0;
    
    // Main server loop
    while (true) {
        char buffer[BUFFER_SIZE];
        memset(buffer, 0, BUFFER_SIZE);
        
        struct sockaddr_in client_addr;
        socklen_t client_len = sizeof(client_addr);
        
        // Receive message; leave one byte free so the buffer stays null-terminated
        int recv_len = recvfrom(server_socket, buffer, BUFFER_SIZE - 1, 0,
                                (struct sockaddr*)&client_addr, &client_len);
        
        if (recv_len > 0) {
            client_counter++;
            
            // Create request structure
            ClientRequest request;
            request.socket = server_socket;
            request.message = std::string(buffer);
            request.client_addr = client_addr;
            request.client_id = client_counter;
            
            // Create new thread to handle request
            threads.push_back(std::thread(handle_client, request));
            
            // Check for exit command
            if (std::string(buffer) == "exit") {
                break;
            }
        }
    }
    
    // Wait for all threads to complete
    for (auto& t : threads) {
        if (t.joinable()) {
            t.join();
        }
    }
    
    closesocket(server_socket);
#ifdef _WIN32
    WSACleanup();
#endif
    
    return 0;
}
```

### Testing Multi-threaded Server

#### Test with Multiple Clients

**Terminal 1 (Server):**
```bash
./multi_server
```

**Terminal 2-4 (Clients):**
```bash
./client  # Run in parallel
```

#### Expected Output (Server)

```
Multi-threaded UDP Server listening on port 8080
[Thread 0x123] Processing client 1: Hello from client 1
[Thread 0x124] Processing client 2: Hello from client 2
[Thread 0x125] Processing client 3: Hello from client 3
```

#### Key Observations:

1. **Concurrent Processing**: Multiple clients handled simultaneously
2. **Thread IDs**: Each request is processed by its own thread (the IDs shown above are illustrative)
3. **Thread-safe Output**: Mutex prevents garbled console output

<a id='section6'></a>
## 6. Performance Analysis

### Measuring Thread Performance

```cpp
#include <chrono>

auto start = std::chrono::high_resolution_clock::now();

// ... code to measure ...

auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

std::cout << "Execution time: " << duration.count() << " ms" << std::endl;
```
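
The pattern above can be wrapped in a small reusable helper. This sketch uses hypothetical names (`time_ms`, `timed_sleep_demo`) and swaps in `std::chrono::steady_clock`, which is guaranteed never to jump backwards; `high_resolution_clock` carries no such guarantee on every platform.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: runs `work` once and returns elapsed wall-clock milliseconds.
template <typename Callable>
long long time_ms(Callable&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

// Example use: time a 20 ms sleep.
long long timed_sleep_demo() {
    return time_ms([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    });
}
```

The same helper can time the parallel and sequential sections of Question 1 by passing each section as a lambda.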

### Performance Metrics

| Metric | Formula | Interpretation |
|--------|---------|----------------|
| Speedup | Sequential Time / Parallel Time | How much faster? |
| Efficiency | Speedup / Number of Threads | How well threads used? |
| Scalability | Performance vs Thread Count | Does adding threads help? |
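
The first two formulas translate directly into code. The function names below are hypothetical. Using the sample figures from Question 1 (1960 ms sequential, 250 ms parallel, 8 threads) gives a speedup of 7.84 and an efficiency of 0.98.

```cpp
// Speedup = sequential time / parallel time.
double speedup(double sequential_ms, double parallel_ms) {
    return sequential_ms / parallel_ms;
}

// Efficiency = speedup / number of threads (1.0 means perfect scaling).
double efficiency(double sequential_ms, double parallel_ms, int num_threads) {
    return speedup(sequential_ms, parallel_ms) / num_threads;
}
```

An efficiency well below 1.0 suggests that the extra threads are spending time on overhead or waiting rather than on useful work.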

### Example Performance Comparison

```
Workload: Computing sum of squares for 100,000,000 numbers

Threads | Time (ms) | Speedup | Efficiency
--------|-----------|---------|------------
   1    |   2000    |   1.0x  |   100%
   2    |   1050    |   1.9x  |    95%
   4    |    550    |   3.6x  |    90%
   8    |    280    |   7.1x  |    89%
  16    |    200    |  10.0x  |    63%
```

#### Analysis:
- Good scaling up to 8 threads (near-linear)
- Diminishing returns beyond 8 threads, which is typical once the thread count exceeds the number of CPU cores
- Overhead becomes significant with too many threads

### Factors Affecting Performance

#### 1. Thread Creation Overhead
```cpp
// Creating threads has cost
std::thread t(function);  // ~microseconds overhead
```

**Solution**: Use thread pools for repeated tasks

#### 2. Context Switching
- OS switches between threads
- CPU cache invalidation
- **Rule of thumb**: For CPU-bound work, number of threads ≈ number of CPU cores
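
The core count does not have to be hard-coded. A minimal sketch, with a hypothetical helper name, uses `std::thread::hardware_concurrency()` and falls back to one thread when the value is unknown:

```cpp
#include <thread>

// Hypothetical helper: picks a thread count for CPU-bound work.
unsigned pick_thread_count() {
    unsigned cores = std::thread::hardware_concurrency();  // may return 0 if unknown
    return cores == 0 ? 1 : cores;
}
```

The returned value is only a hint about the number of hardware threads, so it is a starting point for experiments rather than a guaranteed optimum.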

#### 3. Lock Contention
```cpp
std::mutex m;
m.lock();   // Only one thread can enter
// Critical section
m.unlock();
```

**Impact**: If threads spend too much time waiting for locks, parallelism decreases

#### 4. Memory Bandwidth
- Multiple threads accessing memory simultaneously
- Can saturate memory bus
- **Memory-bound** vs **CPU-bound** tasks

### Best Practices for Thread Performance

#### 1. Choose Appropriate Grain Size
```cpp
// BAD: Too fine-grained
for (int i = 0; i < 1000000; i++) {
    std::thread t(process, i);  // Creating 1 million threads!
    t.join();                   // Joining immediately also means nothing runs in parallel
}

// GOOD: Coarse-grained
const int NUM_THREADS = 8;
const int ITEMS_PER_THREAD = 1000000 / NUM_THREADS;

for (int i = 0; i < NUM_THREADS; i++) {
    int start = i * ITEMS_PER_THREAD;
    int end = (i + 1) * ITEMS_PER_THREAD;
    threads.push_back(std::thread(process_range, start, end));
}
```
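
A complete version of the coarse-grained pattern might look like the sketch below. The function `parallel_sum` is hypothetical; it assumes `num_threads` is at least 1 and gives the last thread any leftover items when the size does not divide evenly, a case the snippet above does not cover.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `num_threads` contiguous chunks.
long long parallel_sum(const std::vector<int>& data, int num_threads) {
    std::vector<long long> partial(num_threads, 0);   // one result slot per thread
    std::vector<std::thread> threads;
    const std::size_t chunk = data.size() / num_threads;

    for (int t = 0; t < num_threads; t++) {
        std::size_t begin = t * chunk;
        // The last thread also takes the remainder.
        std::size_t end = (t == num_threads - 1) ? data.size() : begin + chunk;
        threads.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    // Combine the per-thread results after all threads have finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Each thread writes only its own slot of `partial`, so no lock is needed; the results are combined on the calling thread after the joins.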

#### 2. Minimize Lock Contention
```cpp
// BAD: Lock held for entire computation
std::lock_guard<std::mutex> lock(m);
result = expensive_computation();
shared_data = result;

// GOOD: Lock held only for shared data access
result = expensive_computation();
{
    std::lock_guard<std::mutex> lock(m);
    shared_data = result;
}
```

#### 3. Use Thread-Local Storage
```cpp
// Each thread has its own copy
thread_local int counter = 0;
```
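
The effect of `thread_local` can be demonstrated with a short sketch. The names `tls_counter` and `caller_copy_after_worker` are hypothetical. The worker thread modifies its own copy of the variable, and the calling thread's copy is left unchanged.

```cpp
#include <thread>

thread_local int tls_counter = 0;   // one independent copy per thread

// Hypothetical demo: a worker changes its own copy; the caller's copy is untouched.
int caller_copy_after_worker() {
    tls_counter = 5;                 // set the calling thread's copy
    std::thread worker([] {
        tls_counter += 100;          // worker's copy starts at 0 and becomes 100
    });
    worker.join();
    return tls_counter;              // still 5
}
```

Because no data is shared, no lock is needed, which makes thread-local storage a good fit for per-thread scratch values and counters.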

<a id='section7'></a>
## 7. Summary and Key Takeaways

### Concepts Covered

#### 1. C++ Standard Threading
- Creating threads with `std::thread`
- Waiting for completion with `.join()`
- Thread lifecycle management

#### 2. POSIX Threads
- Creating threads with `pthread_create`
- Understanding thread function signatures
- Platform-specific considerations

#### 3. UDP Communication
- Socket creation and binding
- Sending and receiving datagrams
- Client-server architecture

#### 4. Multi-threaded Networking
- Handling multiple clients concurrently
- Thread-safe console output
- Request processing patterns

#### 5. Performance Analysis
- Measuring execution time
- Calculating speedup and efficiency
- Understanding overhead and scalability

### Key Comparison Table

| Aspect | std::thread | pthread | UDP Sockets |
|--------|-------------|---------|-------------|
| Language | C++ | C | C/C++ |
| Portability | High | Medium | High |
| Learning Curve | Easy | Moderate | Moderate |
| Use Case | Modern C++ apps | Legacy/Unix | Network comm |

### Common Pitfalls to Avoid

1. **Forgetting to join threads**: Destroying a joinable `std::thread` calls `std::terminate()`, which aborts the program
2. **Data races**: Multiple threads accessing shared data without synchronization
3. **Too many threads**: Diminishing returns and increased overhead
4. **Ignoring return values**: Socket functions return error codes that should be checked
5. **Not closing sockets**: Resource leaks in long-running applications

### Practice Exercises

#### Exercise 1: Thread Pool
Modify Question 1 to reuse threads instead of creating new ones for each task.

#### Exercise 2: Multi-client Chat
Extend Question 4 to broadcast messages from one client to all connected clients.

#### Exercise 3: Performance Benchmarking
Create a program that:
1. Tests different thread counts (1, 2, 4, 8, 16)
2. Measures execution time for each
3. Generates a performance report
4. Plots speedup vs thread count

### Additional Resources

1. **C++ Threading**:
   - https://en.cppreference.com/w/cpp/thread
   - https://iamsorush.com/posts/cpp-std-thread/

2. **POSIX Threads**:
   - https://www.cs.cmu.edu/afs/cs/academic/class/15492-f07/www/pthreads.html
   - POSIX Threads Programming (LLNL Tutorial)

3. **Socket Programming**:
   - Beej's Guide to Network Programming
   - Unix Network Programming (Stevens)

4. **Performance Analysis**:
   - Intel VTune Profiler
   - Linux perf tools
   - Visual Studio Profiler

### Next Steps

In the next practical, you will learn:
- OpenMP for easier parallel programming
- Memory management in parallel systems
- CUDA for GPU computing
- Advanced synchronization techniques

### Self-Assessment Questions

1. What is the difference between a process and a thread?
2. When should you use `.join()` vs `.detach()` for threads?
3. Why is UDP faster than TCP?
4. What happens if you don't bind a UDP server socket?
5. How many threads should you create for optimal performance?
6. What is a data race and how can it be prevented?
7. Why might speedup be less than the number of threads?
8. What are the advantages of pthread over std::thread?
9. How does UDP differ from TCP in terms of reliability?
10. What is the purpose of mutex in multi-threaded programs?

### Answers

1. A process has its own memory space; threads share memory within a process
2. Use `.join()` when you need to wait for the thread to finish; use `.detach()` for fire-and-forget threads
3. UDP has no connection setup, acknowledgments, or retransmission overhead
4. The socket has no well-known port, so clients have no fixed address to send to and the server's `recvfrom` never sees their datagrams
5. Approximately equal to the number of CPU cores for CPU-bound tasks
6. A data race is unsynchronized concurrent access to the same data where at least one access is a write; prevent it with locks or atomics
7. Overhead from thread creation, context switching, and synchronization
8. More control (for example, stack size and scheduling attributes), available on Unix systems, works with C code
9. UDP does not guarantee delivery, ordering, or duplicate-free arrival; TCP provides all three
10. A mutex ensures only one thread executes a critical section at a time

---

**End of Practical 2**

Remember: The key to mastering parallel programming is practice and experimentation. Try modifying the code examples, test with different workloads, and observe the behavior!