# **Serial Performance Baseline and Profiling**

### **Assignment 1: Parallel Vector Addition**

**🔹 Aim**

To add two large vectors using serial and parallel execution and compare execution time.

**🔹 Objective**

To understand basic OpenMP-style parallel loop execution.

**🔹 Algorithm**

1. Initialize two large vectors with random values.
2. Perform vector addition using a serial loop.
3. Measure the serial execution time.
4. Perform vector addition using a parallel loop.
5. Measure the parallel execution time.
6. Compare both execution times.
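A single timing can vary from run to run, so a comparison is easier to trust when each version is timed more than once. A minimal sketch of such a timing helper follows; the name `best_of` and the default of three repeats are assumptions made for illustration and are not part of the assignment.

```python
import time

def best_of(func, *args, repeats=3):
    """Call func(*args) `repeats` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()      # monotonic, high-resolution clock
        result = func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)        # keep the fastest observed run
    return best, result

# Usage: time a simple serial reduction.
seconds, total = best_of(sum, range(100_000))
print(f"best of 3: {seconds:.6f} s, result = {total}")
```

Taking the minimum rather than the mean is one common choice because it is less sensitive to interference from other processes; either could be used here.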

In [3]:
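import time
import numpy as np
from multiprocessing import Pool, cpu_count

n = 1000000
A = np.random.rand(n)
B = np.random.rand(n)

# Serial baseline: one Python-level addition per element.
start = time.perf_counter()
C = [A[i] + B[i] for i in range(n)]
end = time.perf_counter()
print("Serial Time:", end - start)

def add(i):
    # Workers read the module-level A and B; this relies on the "fork"
    # start method so that child processes inherit them.
    return A[i] + B[i]

# Parallel version: the timing includes creating and shutting down the pool.
start = time.perf_counter()
with Pool(cpu_count()) as p:
    C = p.map(add, range(n))
end = time.perf_counter()
print("Parallel Time:", end - start)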



Serial Time: 0.6593949794769287
Parallel Time: 3.386927366256714


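**🔹 Result**

In this run the parallel version was slower than the serial version (about 3.39 s versus 0.66 s). For work as small as a single addition per element, the cost of starting worker processes and moving a million results between processes likely outweighs the benefit of splitting the loop.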
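The timings printed above show the process pool taking several times longer than the serial loop. One way to reduce dispatch overhead, assuming that overhead is the main cost (it was not profiled here), is to submit one task per contiguous slice instead of one per element. The sketch below uses only the standard library; the function name `chunked_add` and the ceiling-division chunk rule are illustrative assumptions. It uses threads, so in CPython it demonstrates the dispatch pattern rather than a speedup for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked_add(a, b, workers=4):
    """Add two equal-length sequences using one task per contiguous slice."""
    n = len(a)
    # Assumed chunk rule: ceil(n / workers) elements per task.
    size = max(1, -(-n // workers))
    bounds = [(lo, min(lo + size, n)) for lo in range(0, n, size)]

    def add_slice(bound):
        lo, hi = bound
        return [x + y for x, y in zip(a[lo:hi], b[lo:hi])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(add_slice, bounds)   # results arrive in input order
        return [value for part in parts for value in part]

# Usage: only `workers` tasks are submitted, not len(a) tasks.
print(chunked_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], workers=2))
# [11, 22, 33, 44, 55]
```

With NumPy arrays, each slice could instead be added with a single vectorized operation, which is where most of the gain would be expected to come from.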

### **Assignment 2: Static Scheduling in Parallel Loop**

**🔹 Aim**

To perform vector computation using static scheduling.



**🔹 Objective**

To study static scheduling behavior in OpenMP-style parallel loops.

**🔹 Algorithm**

1. Initialize the input arrays.
2. Divide the array elements equally among the workers.
3. Perform the vector computation using a parallel loop.
4. Measure the execution time.
5. Observe the work distribution across workers.

In [4]:
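import numpy as np
import time
from multiprocessing import Pool, cpu_count

n = 1000000
A = np.random.rand(n)
B = np.random.rand(n)

def compute(chunk):
    # Multiply one contiguous index range [start, end).
    start, end = chunk
    return [A[i] * B[i] for i in range(start, end)]

# Note: multiprocessing.Pool uses worker processes, not threads.
threads = cpu_count()
chunk = n // threads
# Static schedule: one equal-sized chunk per worker, fixed before execution.
chunks = [(i * chunk, (i + 1) * chunk) for i in range(threads)]
# The last chunk absorbs the remainder when n is not divisible by threads.
chunks[-1] = (chunks[-1][0], n)

start = time.perf_counter()
with Pool(threads) as p:
    result = p.map(compute, chunks)
end = time.perf_counter()

print("Execution Time:", end - start)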


Execution Time: 3.4027395248413086


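**🔹 Result**

Static scheduling divides the index range into equal contiguous chunks, one per worker, before execution starts. In this Python version the workers are processes from `multiprocessing.Pool` rather than OpenMP threads.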
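The chunk boundaries of a static schedule can be computed without running any parallel code. The following is a minimal sketch, assuming leftover elements are spread over the first few workers so that chunk sizes differ by at most one; the function name `static_chunks` is hypothetical.

```python
def static_chunks(n, workers):
    """Split range(n) into `workers` contiguous (start, end) pairs."""
    base, extra = divmod(n, workers)
    chunks = []
    start = 0
    for w in range(workers):
        # The first `extra` workers each take one additional element.
        end = start + base + (1 if w < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks

print(static_chunks(10, 4))
# [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Printing the list is a simple way to observe the work distribution: every index from 0 to n - 1 appears in exactly one chunk.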

### **Assignment 3: Load Imbalance in Parallel Execution**

**🔹 Aim**

To analyze load imbalance in parallel loops.

**🔹 Objective**

To understand performance impact of uneven workload.

**🔹 Algorithm**

1. Generate an array of random numbers.
2. Perform a heavy computation for some elements.
3. Perform a light computation for the remaining elements.
4. Execute the loop in parallel.
5. Measure the execution time.

In [5]:
import random
import time
from multiprocessing import Pool, cpu_count

n = 10000
arr = [random.randint(1, 100) for _ in range(n)]

def work(x):
    if x % 2 == 0:
        # Heavy path: even values run a 1000-iteration loop.
        s = 0
        for _ in range(1000):
            s += x
        return s
    else:
        # Light path: odd values return immediately.
        return x

start = time.perf_counter()
with Pool(cpu_count()) as p:
    result = p.map(work, arr)
end = time.perf_counter()

print("Execution Time:", end - start)


Execution Time: 0.4446098804473877


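**🔹 Result**

Even values trigger a 1000-iteration loop while odd values return immediately, so the cost per element is uneven. When heavy elements cluster in one worker's share, that worker finishes last and determines the total run time. This run records only the parallel time, so it does not by itself quantify the slowdown.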
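Load imbalance can be estimated without timing anything by summing an approximate cost for each worker's share. The sketch below assumes a rough cost model that mirrors `work()` (1000 units for an even value, 1 unit for an odd value) and assumes contiguous, equally sized chunks; the names `estimated_cost` and `chunk_costs` are hypothetical.

```python
def estimated_cost(x, heavy_iterations=1000):
    """Rough cost model: even values loop, odd values return at once."""
    return heavy_iterations if x % 2 == 0 else 1

def chunk_costs(values, workers):
    """Estimated total cost of each contiguous, equally sized chunk."""
    size = max(1, -(-len(values) // workers))   # ceiling division
    return [
        sum(estimated_cost(x) for x in values[lo:lo + size])
        for lo in range(0, len(values), size)
    ]

# Worst case: all heavy (even) elements land in the first chunk.
costs = chunk_costs([2, 4, 6, 8, 1, 3, 5, 7], workers=2)
print(costs)
# [4000, 4]
```

In this deliberately clustered input one worker carries nearly all of the estimated work. With uniformly random input, as in the cell above, heavy and light elements are mixed and the per-chunk totals would be expected to be much closer.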

### **Assignment 4: Parallel Reduction (Synchronization)**

**🔹 Aim**

To compute sum of array elements using parallel reduction.

**🔹 Objective**

To eliminate race conditions using reduction technique.

**🔹 Algorithm**

1. Initialize a large array.
2. Divide the array into chunks.
3. Calculate a partial sum for each chunk in parallel.
4. Combine the partial sums.
5. Display the final sum.

In [6]:
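import numpy as np
from multiprocessing import Pool, cpu_count

n = 1000000
arr = np.random.rand(n)

def partial_sum(chunk):
    # Each worker reduces its own chunk; no shared accumulator is involved.
    return sum(chunk)

# Note: multiprocessing.Pool uses worker processes, not threads.
threads = cpu_count()
# array_split tolerates n not being divisible by the number of chunks.
chunks = np.array_split(arr, threads)

with Pool(threads) as p:
    result = p.map(partial_sum, chunks)

# Combine step: add the partial sums in the parent process.
total = sum(result)
print("Final Sum:", total)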


Final Sum: 499985.2827547176


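**🔹 Result**

Each worker computes a private partial sum and the partial sums are combined afterwards, so no shared accumulator is updated concurrently. This avoids race conditions and produces the correct total.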
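The same reduction pattern can be sketched with threads from the standard library. This is an illustrative variant, not the assignment's method: the function name `parallel_sum` and the ceiling-division chunk rule are assumptions. Each task returns its own partial sum, so nothing is shared while the tasks run.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, workers=4):
    """Sum a sequence by reducing chunks independently, then combining."""
    size = max(1, -(-len(values) // workers))   # ceiling division
    chunks = [values[lo:lo + size] for lo in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task produces a private partial sum; no lock is needed.
        partials = list(pool.map(sum, chunks))
    # Combine step runs in the calling thread only.
    return sum(partials)

print(parallel_sum(list(range(1001))))
# 500500
```

For floating-point data the result can differ from a serial sum in the last few digits, because the additions are grouped differently.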

### **Assignment 5: Barrier Synchronization (Two-Phase Computation)**

**🔹 Aim**

To demonstrate barrier synchronization.

**🔹 Objective**

To ensure correct synchronization between two phases of computation.

**🔹 Algorithm**

1. Initialize an array with values.
2. Perform the Phase 1 computation.
3. Apply barrier synchronization.
4. Perform the Phase 2 computation.
5. Display the final result.

In [7]:
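import threading
import random

n = 10
arr = [random.randint(1, 10) for _ in range(n)]
# One barrier party per thread; wait() blocks until all n threads arrive.
barrier = threading.Barrier(n)

def task(i):
    # Phase 1: double this thread's element.
    arr[i] = arr[i] * 2
    # No thread continues until every thread has finished Phase 1.
    barrier.wait()
    # Phase 2: add 5 to the doubled value.
    arr[i] = arr[i] + 5

threads = []
for i in range(n):
    t = threading.Thread(target=task, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("Completed Array:", arr)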


Completed Array: [11, 9, 7, 15, 21, 17, 13, 13, 19, 25]


**🔹 Result**

Barrier synchronization ensures that no thread starts Phase 2 until every thread has completed Phase 1.
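A barrier matters most when Phase 2 reads values that other threads write in Phase 1. The sketch below is an illustrative variant, assuming a non-empty input: each thread doubles its own element, then adds its right-hand neighbor's doubled value. The function name `two_phase` and the neighbor rule are assumptions made for this example.

```python
import threading

def two_phase(values):
    """Phase 1 doubles each element; Phase 2 adds the neighbor's doubled value."""
    n = len(values)
    doubled = [0] * n
    result = [0] * n
    barrier = threading.Barrier(n)      # one party per thread

    def task(i):
        doubled[i] = values[i] * 2      # Phase 1: write own slot
        barrier.wait()                  # all Phase 1 writes are done past here
        # Phase 2: read a slot written by another thread.
        result[i] = doubled[i] + doubled[(i + 1) % n]

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(two_phase([1, 2, 3, 4]))
# [6, 10, 14, 10]
```

Without the `barrier.wait()` call, a thread could read its neighbor's slot before that neighbor had written it and pick up the initial 0 instead.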