
1. Vector Addition (Scalar vs SIMD-like)

In [None]:
import numpy as np
import time

N = 10_000_000

A = np.arange(N, dtype=np.float64)
B = np.arange(N, dtype=np.float64)

# Scalar loop
C = np.zeros(N)
start = time.time()
for i in range(N):
    C[i] = A[i] + B[i]
end = time.time()
print("Normal loop time:", end - start)

# Vectorized (SIMD-like)
start = time.time()
C = A + B
end = time.time()
print("Vectorized time:", end - start)


Normal loop time: 7.748841762542725
Vectorized time: 0.04339742660522461


**Observations**

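The scalar loop takes about 7.7 s for 10 million elements, while the vectorized A + B takes about 0.04 s, roughly 180 times faster.

Most of the loop's cost is Python interpreter overhead: every iteration indexes both arrays, wraps each element in a Python object, and dispatches the addition dynamically.

The vectorized version runs a single compiled loop over contiguous memory, which lets NumPy use SIMD instructions to process several elements per instruction.

The speedup therefore comes mainly from removing interpreter overhead, with SIMD contributing on top of that.

Vectorization makes far better use of the CPU for large arrays.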
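A single `time.time()` measurement can be noisy, so the comparison above can be made more robust by repeating each measurement and keeping the fastest run. The sketch below is one way to do that; the helper `best_time` is a hypothetical name introduced here, and N is reduced (an assumption, to keep the scalar loop short) compared with the notebook's 10 million.

```python
import time
import numpy as np

def best_time(fn, repeats=5):
    """Return the fastest of `repeats` wall-clock timings of fn(), in seconds.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        best = min(best, time.perf_counter() - start)
    return best

N = 200_000  # assumption: much smaller than the notebook's N to keep this quick
A = np.arange(N, dtype=np.float64)
B = np.arange(N, dtype=np.float64)

def scalar_add():
    # Element-by-element addition driven by the Python interpreter
    C = np.zeros(N)
    for i in range(N):
        C[i] = A[i] + B[i]
    return C

def vector_add():
    # One call into NumPy's compiled element-wise loop
    return A + B

t_scalar = best_time(scalar_add, repeats=3)
t_vector = best_time(vector_add)
print("Scalar loop (best of 3):", t_scalar)
print("Vectorized (best of 5):", t_vector)
```

Taking the minimum rather than the mean filters out runs disturbed by other processes or by one-time costs such as the first memory allocation.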

2. Reduction (Sum)

In [None]:
import numpy as np
import time

N = 10_000_000
A = np.ones(N, dtype=np.float64)

# Normal loop
start = time.time()
s = 0.0
for i in range(N):
    s += A[i]
end = time.time()
print("Normal sum:", s, "Time:", end - start)

# Vectorized reduction
start = time.time()
s = np.sum(A)
end = time.time()
print("Vectorized sum:", s, "Time:", end - start)


Normal sum: 10000000.0 Time: 2.739647388458252
Vectorized sum: 10000000.0 Time: 0.008611440658569336


**Observations**

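The Python loop takes about 2.7 s because every iteration passes through the interpreter.

np.sum() finishes in about 0.009 s, roughly 320 times faster.

np.sum() performs the reduction in compiled code, which can use SIMD instructions.

Both methods return 10000000.0 here because every element is 1.0; for general floating-point data the two results can differ slightly, since the additions are performed in a different order.

Vectorized reductions are preferred for large arrays in high-performance computing.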
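The two sums agree in this cell because the input is all ones. As a rough illustration of how summation order can matter for other data, the sketch below sums random float32 values (an assumption chosen to make rounding visible) with a sequential loop and with `np.sum`, and compares both against `math.fsum` as a high-precision reference.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(100_000).astype(np.float32)  # float32 makes rounding error easier to see

# Sequential accumulation in float32, one element at a time
loop_sum = np.float32(0.0)
for x in A:
    loop_sum += x

# NumPy's compiled reduction (it may add the elements in a different order)
vec_sum = np.sum(A)

# Reference value: math.fsum tracks partial sums to avoid loss of precision
exact = math.fsum(float(x) for x in A)

print("Loop sum error:  ", abs(float(loop_sum) - exact))
print("np.sum error:    ", abs(float(vec_sum) - exact))
```

The exact error values depend on the data and the NumPy build, so no expected output is shown; the point is only that the two results need not be bit-identical.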

3. Memory Alignment Effect

In [None]:
import numpy as np
import time

N = 10_000_000

unaligned = np.arange(N + 1, dtype=np.float64)[1:]
aligned = np.arange(N, dtype=np.float64)

start = time.time()
np.sum(unaligned)
print("Unaligned time:", time.time() - start)

start = time.time()
np.sum(aligned)
print("Aligned time:", time.time() - start)


Unaligned time: 0.007345676422119141
Aligned time: 0.007325649261474609


**Observations**

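The two timings are nearly identical (about 7.3 ms each); the difference is within measurement noise.

Slicing off the first element shifts the start of the data by 8 bytes, so the array is still aligned to its element size but may no longer sit on a SIMD-vector or cache-line boundary.

Modern CPUs handle such unaligned loads with little or no penalty.

A reduction over 80 MB of data is also limited mainly by memory bandwidth, which hides any small alignment cost.

At the level of whole-array NumPy operations, the effect of alignment is minimal in this test.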
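To see what the slice in this cell actually changes, the data address of each array can be inspected directly. The sketch below uses `ndarray.ctypes.data` and `ndarray.flags.aligned`; the helper `address_mod` and the small N are assumptions made for illustration.

```python
import numpy as np

N = 1_000  # assumption: small array, only the addresses matter here
base = np.arange(N + 1, dtype=np.float64)
shifted = base[1:]                       # view starting one element (8 bytes) later
fresh = np.arange(N, dtype=np.float64)   # freshly allocated array

def address_mod(arr, boundary):
    """Remainder of the array's data address modulo `boundary` bytes.

    Hypothetical helper for illustration; 0 means the data starts on that boundary.
    """
    return arr.ctypes.data % boundary

for name, arr in [("shifted", shifted), ("fresh", fresh)]:
    print(
        name,
        "| flags.aligned:", arr.flags.aligned,   # alignment to the element size
        "| address % 8:", address_mod(arr, 8),
        "| address % 64:", address_mod(arr, 64),  # 64 bytes: a common cache-line size
    )

print("Offset of slice from its base:", shifted.ctypes.data - base.ctypes.data, "bytes")
```

The slice remains element-aligned, so `flags.aligned` stays true for it; only its position relative to wider boundaries such as 64 bytes changes.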

4. Parallel + SIMD (Implicit)

In [None]:
import numpy as np
import time

N = 10_000_000
A = np.arange(N, dtype=np.float64)

start = time.time()
B = A * 2.0
print("Vectorized (SIMD + multithreaded) time:", time.time() - start)


Vectorized (SIMD + multithreaded) time: 0.026205778121948242


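**Observations**

The operation runs in about 0.026 s without any explicit parallel code.

NumPy's compiled element-wise loops can use SIMD instructions where the CPU supports them.

Despite the label in the printed output, element-wise operations such as A * 2.0 run on a single thread; NumPy relies on multithreading mainly in routines backed by a multithreaded BLAS/LAPACK library, such as matrix multiplication.

Simple array expressions therefore achieve high performance through vectorization rather than through threads.

This demonstrates how easily data-parallel (SIMD) code can be written with NumPy, even though it is not thread-parallel.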
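One rough way to check whether an operation actually uses more than one thread is to compare the CPU time consumed by the process with the elapsed wall-clock time. The sketch below does this for an element-wise multiply and for a matrix product; the helper `cpu_to_wall_ratio`, the array sizes, and the repeat count are assumptions, and the matrix-product result depends on which BLAS library NumPy was built against.

```python
import time
import numpy as np

def cpu_to_wall_ratio(fn, repeats=10):
    """Run fn() `repeats` times and return (process CPU time) / (wall time).

    Hypothetical helper for illustration. A ratio near 1 suggests one busy
    thread; a ratio well above 1 suggests several threads were working.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time summed over all threads
    for _ in range(repeats):
        fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall

rng = np.random.default_rng(0)
A = np.arange(2_000_000, dtype=np.float64)
M = rng.random((500, 500))

ratio_elementwise = cpu_to_wall_ratio(lambda: A * 2.0)
ratio_matmul = cpu_to_wall_ratio(lambda: M @ M)

print("Element-wise multiply, CPU/wall:", ratio_elementwise)
print("Matrix multiply, CPU/wall:      ", ratio_matmul)
```

The timers have limited resolution on some platforms, so treat the ratios as indicative only.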

5. Branch Divergence

In [None]:
import numpy as np
import time

N = 10_000_000
A = np.random.rand(N) * 100
B = np.zeros(N)

start = time.time()
for i in range(N):
    if A[i] > 50:
        B[i] = A[i] * 2
    else:
        B[i] = A[i] / 2
print("Branch loop time:", time.time() - start)


Branch loop time: 6.7121851444244385


**Observations**

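The loop takes about 6.7 s, similar to the branch-free scalar loop in section 1 (about 7.7 s), so the cost is dominated by Python interpreter overhead rather than by the branch itself.

The term branch divergence comes from GPU programming; on a CPU the related cost is branch misprediction, which random data like this makes frequent in compiled code.

Each iteration evaluates the if-else in the interpreter, which adds to the per-element overhead.

The loop processes one element at a time, so it gets no benefit from SIMD.

This pattern is not suitable for performance-critical applications.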
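To get a feel for how much of this time comes from the if-else itself, the same Python loop can be timed with and without the branch. This is a minimal sketch under assumptions: N is reduced to keep it quick, and the function names are introduced here for illustration.

```python
import time
import numpy as np

N = 200_000  # assumption: smaller than the notebook's N
rng = np.random.default_rng(0)
A = rng.random(N) * 100

def loop_with_branch():
    # Same logic as the cell above: double large values, halve small ones
    B = np.zeros(N)
    for i in range(N):
        if A[i] > 50:
            B[i] = A[i] * 2
        else:
            B[i] = A[i] / 2
    return B

def loop_without_branch():
    # Same loop structure, but every element takes the same path
    B = np.zeros(N)
    for i in range(N):
        B[i] = A[i] * 2
    return B

for fn in (loop_with_branch, loop_without_branch):
    start = time.perf_counter()
    fn()
    print(fn.__name__, "time:", time.perf_counter() - start)
```

If the two timings are of the same order, most of the cost lies in the interpreted loop rather than in the conditional.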

In [None]:
start = time.time()
B = np.where(A > 50, A * 2, A / 2)
print("Vectorized conditional time:", time.time() - start)


Vectorized conditional time: 0.1503133773803711


**Observations**

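The vectorized conditional takes about 0.15 s, roughly 45 times faster than the loop.

np.where replaces the per-element if-else with whole-array operations in compiled code.

Both A * 2 and A / 2 are computed for every element before np.where selects between them, which is why this takes longer than a single element-wise operation.

The comparison and selection involve no branching at the Python level, and the element-wise arithmetic can use SIMD.

This is the preferred approach for conditional operations on large arrays.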
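The `np.where(A > 50, A * 2, A / 2)` call evaluates both arithmetic expressions over the full array. Two alternative formulations that avoid that are sketched below; whether either is faster depends on array size and hardware, so they should be timed rather than assumed. The array size and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(1_000_000) * 100

# Reference: the formulation used in the cell above
reference = np.where(A > 50, A * 2, A / 2)

# Option 1: select a per-element factor first, then do a single multiplication.
# Multiplying by 0.5 gives the same result as dividing by 2.
factor = np.where(A > 50, 2.0, 0.5)
B1 = A * factor

# Option 2: boolean-mask assignment, computing each expression only where needed
mask = A > 50
B2 = np.empty_like(A)
B2[mask] = A[mask] * 2
B2[~mask] = A[~mask] / 2

print("Option 1 matches:", np.allclose(B1, reference))
print("Option 2 matches:", np.array_equal(B2, reference))
```

Option 1 keeps everything as dense element-wise operations, while Option 2 trades extra indexing work for doing each computation on only part of the data.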