
1. Identifying Serial Bottlenecks (Amdahl’s Law)

In [None]:
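import time
from multiprocessing import Pool

def serial_part():
    # Serial region: 10M iterations that always run on a single process
    s = 0
    for i in range(10_000_000):
        s += i
    return s

def parallel_part(n):
    # Work performed by one worker process in the parallel region
    s = 0
    for i in range(n):
        s += i
    return s

def run(threads):
    # `threads` is the number of worker *processes* in the pool
    start = time.time()

    # Serial region
    serial_part()

    # Parallel region: every worker gets 2.5M iterations, so the total
    # parallel work grows with the worker count (it is not held fixed)
    with Pool(threads) as p:
        p.map(parallel_part, [2_500_000]*threads)

    return time.time() - start

if __name__ == "__main__":
    for t in [1, 2, 4, 8]:
        print(f"Threads {t}: Time =", run(t))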


Threads 1: Time = 1.0142476558685303
Threads 2: Time = 1.3787193298339844
Threads 4: Time = 2.3396224975585938
Threads 8: Time = 2.4009130001068115


**Observations**

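Execution time does not decrease as worker processes are added; it rises from about 1.0 s with one worker to about 2.4 s with eight.

The serial region (10 million iterations) always runs on a single process, so it sets a floor on runtime regardless of the worker count.

Each worker receives 2.5 million iterations, so the total parallel work grows with the worker count; adding workers therefore adds work (and process start-up cost) instead of dividing a fixed amount.

Because the serial part cannot be shared, the achievable speedup is bounded by the serial fraction of the work.

This illustrates the serial bottleneck behind Amdahl’s Law; a strict Amdahl’s Law measurement would keep the total parallel work fixed and divide it among the workers (strong scaling).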
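Amdahl’s Law gives the best-case speedup on N workers as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the work that can run in parallel. The sketch below evaluates that formula. The value p = 0.2 is an assumption derived from the iteration counts of the single-worker run above (2.5 million parallel iterations out of 12.5 million in total), treating every loop iteration as equal cost.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup on `workers` processors according to Amdahl's Law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Assumption: parallel fraction taken from the single-worker run,
# 2.5M parallel iterations / (10M serial + 2.5M parallel) iterations.
P = 2_500_000 / (10_000_000 + 2_500_000)

for n in [1, 2, 4, 8]:
    print(f"{n} workers: predicted speedup = {amdahl_speedup(P, n):.3f}")

# As the worker count grows without bound, speedup approaches 1 / serial fraction.
print(f"Upper bound: {1.0 / (1.0 - P):.3f}")
```

With p = 0.2 the predicted speedup can never exceed 1 / 0.8 = 1.25, however many workers are added, which is why shrinking the serial region matters more than adding workers here.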

2. Hotspot Detection Using Timing Analysis

In [None]:
import time

def initialization():
    time.sleep(0.5)

def computation():
    s = 0
    for i in range(20_000_000):
        s += i

def io_task():
    time.sleep(0.2)

start = time.time()
initialization()
print("Initialization time:", time.time() - start)

start = time.time()
computation()
print("Computation time:", time.time() - start)

start = time.time()
io_task()
print("I/O time:", time.time() - start)

Initialization time: 0.5005333423614502
Computation time: 2.1269285678863525
I/O time: 0.20044851303100586


**Observations**

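The computation loop takes about 2.1 s, roughly four times the initialization step (0.5 s) and ten times the I/O step (0.2 s).

Initialization and I/O here are `time.sleep` calls, so they spend their time waiting rather than computing.

The computation section is therefore the hotspot.

Optimization or parallelization effort should target the hotspot first.

This avoids spending effort parallelizing sections that contribute little to total runtime.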
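Timing each section by hand works for three functions but becomes repetitive. A small context manager can record every section and rank them by share of total time. This is a minimal sketch that reuses the three functions above; the helper name `timed` and the `timings` dictionary are illustrative, not part of any library. It uses `time.perf_counter()`, which is intended for measuring short intervals. For larger programs, Python’s built-in `cProfile` module produces a per-function breakdown without manual instrumentation.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds (illustrative global store)

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

def initialization():
    time.sleep(0.5)

def computation():
    s = 0
    for i in range(20_000_000):
        s += i
    return s

def io_task():
    time.sleep(0.2)

with timed("initialization"):
    initialization()
with timed("computation"):
    computation()
with timed("I/O"):
    io_task()

# Rank sections from slowest to fastest and report their share of the total
total = sum(timings.values())
for label, t in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:15s} {t:6.3f} s  ({100 * t / total:5.1f}% of total)")

hotspot = max(timings, key=timings.get)
print("Hotspot:", hotspot)
```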

3. Load Imbalance Detection

In [None]:
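import time
from multiprocessing import Pool

def work(n):
    # Time how long this task's loop of n iterations takes
    start = time.time()
    s = 0
    for i in range(n):
        s += i
    return time.time() - start

# Uneven workload: two light tasks and two tasks that are 4x heavier
tasks = [5_000_000, 20_000_000, 5_000_000, 20_000_000]

if __name__ == "__main__":
    with Pool(4) as p:
        times = p.map(work, tasks)

    # Index i is the task number; the pool assigns tasks to worker processes
    for i, t in enumerate(times):
        print(f"Thread {i} time:", t)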

Thread 0 time: 1.0866992473602295
Thread 1 time: 2.9837353229522705
Thread 2 time: 1.2912120819091797
Thread 3 time: 2.9934637546539307


**Observations**

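The 20-million-iteration tasks take about 3.0 s, while the 5-million-iteration tasks finish in about 1.1 to 1.3 s.

The workload is unevenly distributed: two tasks carry four times as many iterations as the other two.

Workers that finish the light tasks early sit idle while the heavy tasks are still running.

The parallel region cannot finish until the slowest task does, so load imbalance wastes available processing capacity.

Dynamic scheduling, where work is split into smaller chunks that idle workers pick up on demand (as with OpenMP’s `schedule(dynamic)`), can reduce the imbalance.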
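A minimal sketch of dynamic scheduling with `Pool.imap_unordered`, assuming the same four tasks are split into pieces of at most 1 million iterations (the piece size and the helper names `split`, `partial_sum`, and `run_dynamic` are illustrative choices, not from the original). With `chunksize=1`, workers take pieces one at a time from the pool’s shared task queue, so a worker that finishes early pulls the next piece instead of idling. Per-worker busy time is accumulated by process ID.

```python
import os
import time
from collections import defaultdict
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum the integers in [lo, hi); report the worker PID and time taken."""
    lo, hi = bounds
    start = time.perf_counter()
    s = 0
    for i in range(lo, hi):
        s += i
    return os.getpid(), time.perf_counter() - start, s

def split(n, chunk):
    """Break range(n) into (lo, hi) pieces of at most `chunk` iterations."""
    return [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

def run_dynamic(tasks, workers=4, chunk=1_000_000):
    pieces = [piece for n in tasks for piece in split(n, chunk)]
    busy = defaultdict(float)  # worker PID -> total seconds spent computing
    total = 0
    with Pool(workers) as pool:
        # chunksize=1: hand out one piece at a time (dynamic scheduling)
        for pid, elapsed, s in pool.imap_unordered(partial_sum, pieces, chunksize=1):
            busy[pid] += elapsed
            total += s
    return dict(busy), total

if __name__ == "__main__":
    tasks = [5_000_000, 20_000_000, 5_000_000, 20_000_000]
    busy, total = run_dynamic(tasks)
    for pid, t in busy.items():
        print(f"Worker {pid} busy time: {t:.2f} s")
```

The per-worker busy times should come out much closer to one another than the per-task times above; exact values depend on the machine and its core count. Smaller pieces balance better but add more scheduling overhead per piece.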

4. False Sharing & Memory Bottlenecks (Conceptual)

In [None]:
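import time
from multiprocessing import Process, Array

def update(arr, idx):
    # Each process increments only its own element arr[idx]
    for _ in range(5_000_000):
        arr[idx] += 1

if __name__ == "__main__":
    # Four adjacent C ints in shared memory. Array defaults to lock=True,
    # so every read and every write of arr[idx] takes one shared lock.
    arr = Array('i', 4)

    start = time.time()
    processes = []
    for i in range(4):
        p = Process(target=update, args=(arr, i))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print("Execution time:", time.time() - start)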

Execution time: 51.00672197341919


**Observations**

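Four processes each update their own element of a shared array; the four integers are adjacent in memory and typically fall within a single cache line.

The run takes about 51 s for 20 million total increments, far slower than the single-process 20-million-iteration loop in the hotspot example (about 2.1 s).

Much of this cost comes from the synchronized wrapper: `Array('i', 4)` is created with `lock=True` by default, so every read and write of `arr[idx]` acquires one lock shared by all four processes, which serializes them. Cache-line contention adds to this, but in CPython it is largely masked by interpreter and locking overhead.

The pattern is the same one that causes false sharing in OpenMP programs, where threads write to distinct variables that sit on the same cache line.

Padding each counter onto its own cache line, or accumulating into private (local) variables and writing the result once, reduces these memory bottlenecks.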
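The two remedies above can be sketched as follows, assuming a 64-byte cache line (typical, but hardware-dependent). The array is created with `lock=False` so that only memory layout and access pattern differ between runs, not locking. The helper names `update_shared`, `update_private`, and `run` are illustrative. Because CPython’s interpreter overhead is large, the gap between the adjacent and padded layouts may be small or noisy; the private-accumulation version typically shows the clearest gain.

```python
import ctypes
import time
from multiprocessing import Array, Process

CACHE_LINE_BYTES = 64  # assumption: common cache-line size on x86-64 and ARM64
STRIDE = CACHE_LINE_BYTES // ctypes.sizeof(ctypes.c_int)  # ints per cache line
WORKERS = 4
ITERS = 5_000_000

def update_shared(arr, idx, iters):
    # Writes to shared memory on every iteration
    for _ in range(iters):
        arr[idx] += 1

def update_private(arr, idx, iters):
    # Accumulates in a private local variable; writes shared memory once
    local = 0
    for _ in range(iters):
        local += 1
    arr[idx] = local

def run(target, stride, iters=ITERS, workers=WORKERS):
    """Start `workers` processes; process i owns element i * stride."""
    # lock=False: raw shared memory without the default per-access lock
    arr = Array('i', workers * stride, lock=False)
    procs = [Process(target=target, args=(arr, i * stride, iters))
             for i in range(workers)]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    elapsed = time.perf_counter() - start
    return elapsed, [arr[i * stride] for i in range(workers)]

if __name__ == "__main__":
    layouts = [
        ("adjacent, shared writes", update_shared, 1),       # same cache line
        ("padded, shared writes", update_shared, STRIDE),    # one line each
        ("private accumulation", update_private, 1),         # one write each
    ]
    for name, target, stride in layouts:
        elapsed, counts = run(target, stride)
        print(f"{name:25s} {elapsed:6.2f} s  counts={counts}")
```

Elements spaced `STRIDE` ints apart are at least 64 bytes apart, so no two of them can share a 64-byte line regardless of how the array itself is aligned.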

5. Synchronization Overhead Analysis

In [None]:
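import time
from multiprocessing import Process, Lock, Value

def critical_section(counter, lock):
    for _ in range(1_000_000):
        # Every single increment acquires and releases the shared lock
        with lock:
            counter.value += 1

if __name__ == "__main__":
    lock = Lock()
    # A plain global int would be copied into each process, so increments
    # would never reach the parent. Value places the counter in shared
    # memory; lock=False because access is already guarded by `lock`.
    counter = Value('i', 0, lock=False)

    start = time.time()
    processes = []

    for _ in range(4):
        p = Process(target=critical_section, args=(counter, lock))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print("Execution time:", time.time() - start)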

Execution time: 3.261547327041626


**Observations**

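Acquiring the lock for every one of the 4 million increments (4 processes × 1 million each) makes the run take about 3.3 s.

Processes spend much of their time waiting for a lock held by another process.

Only one process can be inside the critical section at a time, so the increments are effectively serialized and parallel efficiency drops.

The protected work (a single increment) is tiny, so the cost of acquiring and releasing the lock dominates the computation.

Reduction-based approaches, where each process accumulates a private partial result and the partials are combined once at the end, avoid this overhead.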
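A minimal reduction sketch, similar in spirit to OpenMP’s `reduction(+:counter)` clause: each worker counts privately with no lock inside the loop, and the parent sums the partial results once. It performs the same 4 million increments as the lock-based version; the function names `local_count` and `reduced_count` are illustrative.

```python
import time
from multiprocessing import Pool

def local_count(n):
    # Private accumulation: no shared state and no lock inside the loop
    c = 0
    for _ in range(n):
        c += 1
    return c

def reduced_count(workers=4, per_worker=1_000_000):
    with Pool(workers) as pool:
        partials = pool.map(local_count, [per_worker] * workers)
    # Reduction step: combine the partial results once, in the parent
    return sum(partials)

if __name__ == "__main__":
    start = time.time()
    total = reduced_count()
    print("Total:", total)
    print("Execution time:", time.time() - start)
```

Synchronization happens only when the partial results are returned to the parent, instead of once per increment.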