# Problem Selection and Analysis

**Objective**: Choose a computationally expensive problem suitable for parallelization

---

## Criteria for Good Problem Selection

A good parallel computing problem should have:

1. **Computational Intensity**: Takes significant time to compute serially
2. **Data Parallelism**: Can be broken into independent tasks
3. **Scalability**: Performance improves with more processors/cores
4. **Practical Application**: Solves a real-world problem
5. **Measurable Performance**: Clear metrics for speedup
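
The "clear metrics" in item 5 are usually speedup and parallel efficiency. The sketch below shows one common way to compute them. The function names and the timings are hypothetical and chosen only for illustration; they are not measurements.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, num_workers):
    """Parallel efficiency E = S / p, where p is the number of workers."""
    return speedup(t_serial, t_parallel) / num_workers


# Hypothetical timings in seconds, for illustration only (not measured).
t_serial_example = 12.0
t_parallel_example = 2.0
workers_example = 8

s = speedup(t_serial_example, t_parallel_example)
e = efficiency(t_serial_example, t_parallel_example, workers_example)
print(f"Speedup: {s:.2f}x, efficiency: {e:.0%}")  # Speedup: 6.00x, efficiency: 75%
```

An efficiency well below 100% suggests overhead (communication, synchronization, or a serial portion) worth investigating.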

---

## Problem Domain Analysis

### 1. Image Processing

**Examples**:
- Image filtering (Gaussian blur, edge detection)
- Image transformations (rotation, scaling)
- Feature extraction
- Hough Transform for line/circle detection

**Parallelization Potential**: ⭐⭐⭐⭐⭐
- Each output pixel can be computed independently from the read-only input
- GPU-friendly (massive data parallelism)
- Easy to visualize results

**Difficulty**: Beginner to Intermediate

---

### 2. Matrix Operations

**Examples**:
- Matrix multiplication
- Matrix inversion
- Solving linear equations (Gaussian elimination)
- Eigenvalue computation

**Parallelization Potential**: ⭐⭐⭐⭐⭐
- Rows, columns, or blocks of the output can often be computed independently (e.g., in matrix multiplication)
- Block-based parallelism
- Well-studied algorithms

**Difficulty**: Intermediate
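
Block-based parallelism can be sketched as follows. This is a minimal serial illustration, assuming square NumPy matrices; `blocked_matmul` and the `block` parameter are hypothetical names. Each output tile is updated only by the iterations that share its `(i0, j0)` indices, so in a parallel version different tiles could go to different workers.

```python
import numpy as np


def blocked_matmul(A, B, block=64):
    """Multiply square matrices tile by tile.

    Each (i0, j0) output tile is touched only by its own inner k0 loop,
    so distinct tiles could be assigned to distinct workers.
    """
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Slices clip at the matrix edge, so n need not divide evenly.
                C[i0:i0 + block, j0:j0 + block] += (
                    A[i0:i0 + block, k0:k0 + block]
                    @ B[k0:k0 + block, j0:j0 + block]
                )
    return C


rng = np.random.default_rng(0)
A_demo = rng.random((100, 100))
B_demo = rng.random((100, 100))
print(np.allclose(blocked_matmul(A_demo, B_demo, block=32), A_demo @ B_demo))  # True
```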

---

### 3. Monte Carlo Simulations

**Examples**:
- Pi estimation
- Option pricing in finance
- Random walk simulations
- Statistical sampling

**Parallelization Potential**: ⭐⭐⭐⭐⭐
- Embarrassingly parallel (tasks need no communication until results are combined)
- Each simulation is independent
- Easy to scale

**Difficulty**: Beginner
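
A minimal sketch of why Pi estimation is embarrassingly parallel: the samples are split into chunks that share nothing except a final sum. The function names, chunk sizes, and seed are assumptions for illustration. The chunks run serially here, but each call to `count_hits` could run on its own core.

```python
import numpy as np


def count_hits(seed, num_samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = np.random.default_rng(seed)
    x = rng.random(num_samples)
    y = rng.random(num_samples)
    return int(np.count_nonzero(x * x + y * y <= 1.0))


def estimate_pi(num_chunks=8, samples_per_chunk=100_000, base_seed=42):
    """Estimate pi from independent chunks; each chunk could run on its own worker."""
    # SeedSequence.spawn gives every chunk its own independent random stream.
    child_seeds = np.random.SeedSequence(base_seed).spawn(num_chunks)
    hits = sum(count_hits(seed, samples_per_chunk) for seed in child_seeds)
    return 4.0 * hits / (num_chunks * samples_per_chunk)


print(f"pi is approximately {estimate_pi():.4f}")
```

Giving each chunk its own random stream matters: workers that reuse the same seed would repeat the same samples and add no statistical information.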

---

### 4. Machine Learning

**Examples**:
- K-means clustering
- K-nearest neighbors classification
- Neural network training
- Decision tree ensemble (Random Forest)

**Parallelization Potential**: ⭐⭐⭐⭐
- Work within each training iteration (e.g., gradient or distance computations) can be parallelized
- Data can be split across workers
- Model parallelism possible

**Difficulty**: Intermediate to Advanced
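
Splitting data across workers can be sketched with one K-means iteration. This is an illustrative serial version, assuming float NumPy arrays; `partial_update` and `kmeans_step` are hypothetical names. Each chunk produces partial sums and counts independently, and a reduction combines them into new centroids.

```python
import numpy as np


def partial_update(chunk, centroids):
    """Assign one chunk of points to the nearest centroid; return per-cluster sums and counts."""
    # Squared distances, shape (points_in_chunk, k).
    dists = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k, dtype=int)
    np.add.at(sums, labels, chunk)   # accumulate points per cluster
    np.add.at(counts, labels, 1)     # count points per cluster
    return sums, counts


def kmeans_step(points, centroids, num_workers=4):
    """One K-means iteration: independent per-chunk work, then a reduction."""
    partials = [partial_update(c, centroids) for c in np.array_split(points, num_workers)]
    sums = sum(p[0] for p in partials)
    counts = sum(p[1] for p in partials)
    new_centroids = centroids.copy()
    nonempty = counts > 0  # keep the old centroid if a cluster received no points
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centroids


rng = np.random.default_rng(0)
points_demo = rng.random((1000, 2))
centroids_demo = points_demo[:3].copy()
# The chunked result matches the single-chunk result.
print(np.allclose(kmeans_step(points_demo, centroids_demo, 4),
                  kmeans_step(points_demo, centroids_demo, 1)))  # True
```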

---

### 5. Numerical Simulations

**Examples**:
- N-body simulations (planetary motion)
- Molecular dynamics
- Heat diffusion (PDE solvers)
- Fluid dynamics

**Parallelization Potential**: ⭐⭐⭐⭐
- Spatial decomposition
- Work within each time step runs in parallel; the time steps themselves are sequential
- Scientifically interesting

**Difficulty**: Advanced
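
A minimal sketch of a heat-diffusion step, assuming a 1D rod with fixed end values and an explicit finite-difference scheme. `heat_step` and `alpha` are illustrative names. Within one step every interior point depends only on the previous step, so the spatial update is parallel, while the steps themselves run in order.

```python
import numpy as np


def heat_step(u, alpha=0.25):
    """One explicit finite-difference step of 1D heat diffusion with fixed end values.

    alpha stands for D * dt / dx**2; this explicit scheme is stable for alpha <= 0.5.
    """
    new_u = u.copy()
    # Every interior point reads only the previous step, so all updates are independent.
    new_u[1:-1] = u[1:-1] + alpha * (u[:-2] - 2.0 * u[1:-1] + u[2:])
    return new_u


u_demo = np.zeros(11)
u_demo[5] = 1.0  # initial heat spike in the middle of the rod
for _ in range(50):  # time steps must run one after another
    u_demo = heat_step(u_demo)
print(u_demo.round(3))
```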

---

### 6. Graph Processing

**Examples**:
- Shortest path algorithms
- PageRank
- Community detection
- Graph coloring

**Parallelization Potential**: ⭐⭐⭐
- Some algorithms are harder to parallelize
- Load balancing can be challenging
- Good for learning distributed computing

**Difficulty**: Intermediate to Advanced

---

## Recommended Problems for This Assignment

Based on grading criteria and effort required:

### Top Choices:

1. **Image Processing with Multiple Filters** ✅
   - Apply Gaussian blur, edge detection, sharpening
   - Compare serial vs OpenMP vs CUDA
   - Visual results are impressive
   - Easy to measure speedup

2. **Matrix Operations Suite** ✅
   - Matrix multiplication + solving linear systems
   - Multiple parallel strategies (row, column, block)
   - Clear performance metrics
   - Educational value

3. **Monte Carlo Simulations** ✅
   - Pi estimation + option pricing
   - Perfect for learning parallelism
   - Easy to explain and demonstrate
   - Good for testing multiple platforms

4. **K-Means Clustering** ✅
   - Practical ML application
   - Iterative algorithm (interesting to parallelize)
   - Can visualize clusters
   - Real datasets available

---

## Problem Selection Worksheet

In [None]:
# Define your problem selection
# Fill this out with your team

problem_selection = {
    'title': "[Your Creative Project Title]",
    'domain': "[e.g., Image Processing, Machine Learning, etc.]",
    'specific_problem': "[Specific problem you'll solve]",
    'why_parallel': "[Why this problem benefits from parallelization]",
    'expected_speedup': "[Estimate: 2x, 5x, 10x, etc.]",
    'platforms': "[e.g., OpenMP, CUDA, both]",
}

# Example:
example = {
    'title': "High-Performance Image Processing Pipeline with Multi-Platform Parallelization",
    'domain': "Image Processing",
    'specific_problem': "Apply multiple filters (blur, edge detection, sharpening) to high-resolution images",
    'why_parallel': "Each pixel operation is independent, perfect for data parallelism",
    'expected_speedup': "10-50x with GPU, 4-8x with OpenMP on 8-core CPU",
    'platforms': "OpenMP (CPU) and CUDA/Numba (GPU)",
}

print("Example Problem Selection:")
for key, value in example.items():
    print(f"{key}: {value}")

---

## Computational Complexity Analysis

Understanding the computational complexity helps justify parallelization:

In [None]:
import numpy as np
import time
import matplotlib.pyplot as plt

# Example: Demonstrate computational growth
# Let's use matrix multiplication as example

def matrix_multiply_serial(A, B):
    """Serial matrix multiplication"""
    n = len(A)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

# Test with increasing matrix sizes
sizes = [50, 100, 200, 400]
times = []

print("Matrix Size\tTime (seconds)\tComplexity (n³)")
print("-" * 50)

for size in sizes:
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)
    
    # perf_counter is a monotonic, high-resolution clock intended for timing
    start = time.perf_counter()
    C = matrix_multiply_serial(A, B)
    elapsed = time.perf_counter() - start
    
    times.append(elapsed)
    print(f"{size}x{size}\t\t{elapsed:.4f}s\t\t{size**3:,}")

# Plot computational growth
plt.figure(figsize=(10, 6))
plt.plot(sizes, times, 'o-', linewidth=2, markersize=8)
plt.xlabel('Matrix Size (n)', fontsize=12)
plt.ylabel('Time (seconds)', fontsize=12)
plt.title('Computational Growth: Matrix Multiplication O(n³)', fontsize=14)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n⚠️ This demonstrates why parallelization is needed!")
print(f"Time increases by ~{times[-1]/times[0]:.1f}x when the size grows {sizes[-1] // sizes[0]}x.")
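
One way to check measured growth against the expected O(n³) is to fit a line on a log-log scale; the slope approximates the exponent. The sketch below uses synthetic timings that follow c · n³ exactly, purely for illustration. `estimate_exponent` is a hypothetical helper, and in practice you would pass in your own measured sizes and times.

```python
import numpy as np


def estimate_exponent(sizes, times):
    """Fit log(time) = k * log(n) + c and return k, the empirical growth exponent."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return slope


# Synthetic timings that follow c * n**3 exactly (illustration only, not measured).
sizes_demo = [50, 100, 200, 400]
times_demo = [2e-7 * n**3 for n in sizes_demo]
print(f"Estimated exponent: {estimate_exponent(sizes_demo, times_demo):.2f}")  # 3.00
```

Real measurements are noisy, so an exponent somewhat above or below 3 is to be expected, especially for small sizes where interpreter overhead dominates.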

---

## Parallelization Strategy

For your chosen problem, identify:

### Data Decomposition
How will you split the data?

In [None]:
# Example: Image processing data decomposition
image_width = 1920
image_height = 1080
num_cores = 8

# Strategy 1: Row-based decomposition
rows_per_core = image_height // num_cores
print(f"Row-based: Each core processes {rows_per_core} rows")

# Strategy 2: Block-based decomposition
blocks_x = 4
blocks_y = 2
block_width = image_width // blocks_x
block_height = image_height // blocks_y
print(f"Block-based: {blocks_x}x{blocks_y} = {blocks_x*blocks_y} blocks")
print(f"Each block: {block_width}x{block_height} pixels")

# Visualize decomposition
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Row-based
ax1.set_xlim(0, image_width)
ax1.set_ylim(0, image_height)
for i in range(num_cores):
    y_start = i * rows_per_core
    ax1.axhline(y=y_start, color='red', linewidth=2)
    ax1.text(image_width/2, y_start + rows_per_core/2, f'Core {i}', 
             ha='center', va='center', fontsize=10)
ax1.set_title('Row-based Decomposition', fontsize=12)
ax1.set_xlabel('Width (pixels)')
ax1.set_ylabel('Height (pixels)')
ax1.grid(True, alpha=0.3)

# Block-based
ax2.set_xlim(0, image_width)
ax2.set_ylim(0, image_height)
for i in range(blocks_x + 1):
    ax2.axvline(x=i * block_width, color='blue', linewidth=2)
for j in range(blocks_y + 1):
    ax2.axhline(y=j * block_height, color='blue', linewidth=2)
ax2.set_title('Block-based Decomposition', fontsize=12)
ax2.set_xlabel('Width (pixels)')
ax2.set_ylabel('Height (pixels)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
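
Integer division works cleanly only when the number of rows is a multiple of the core count; otherwise the leftover rows are never assigned. A minimal sketch of a remainder-aware split follows. `split_rows` is a hypothetical helper, not part of any library.

```python
def split_rows(num_rows, num_workers):
    """Return (start, stop) row ranges covering all rows, with sizes differing by at most 1."""
    base, extra = divmod(num_rows, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers each take one additional row.
        stop = start + base + (1 if worker < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


print(split_rows(1080, 8)[:2])  # [(0, 135), (135, 270)]
print(split_rows(1080, 7))      # 1080 is not a multiple of 7; two workers get 155 rows
```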

---

## Dependencies and Communication

Identify if your problem has:

1. **No dependencies** (Embarrassingly parallel)
   - Each task completely independent
   - Example: Monte Carlo simulations

2. **Local dependencies** (Stencil operations)
   - Tasks need data from neighbors
   - Example: Image convolution

3. **Global dependencies** (Reductions)
   - Tasks need to combine results
   - Example: Computing average, sum, max

4. **Iterative dependencies**
   - Results from one iteration affect next
   - Example: K-means clustering, gradient descent
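
The local-dependency case (item 2) can be illustrated with a 3-point moving average. Each chunk needs one extra "halo" value from each neighbor but is otherwise independent. This is a sketch under simple assumptions (1D data, edge values replicated); `smooth` and `smooth_in_chunks` are hypothetical names.

```python
import numpy as np


def smooth(signal):
    """3-point moving average; edge values are replicated at the boundaries."""
    padded = np.pad(signal, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def smooth_in_chunks(signal, num_chunks):
    """Same result computed chunk by chunk, each chunk reading one halo value per side."""
    padded = np.pad(signal, 1, mode="edge")
    bounds = np.linspace(0, len(signal), num_chunks + 1).astype(int)
    pieces = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        local = padded[start:stop + 2]  # the chunk plus one neighbor on each side
        pieces.append((local[:-2] + local[1:-1] + local[2:]) / 3.0)
    return np.concatenate(pieces)


x_demo = np.arange(10.0) ** 2
print(np.allclose(smooth(x_demo), smooth_in_chunks(x_demo, 3)))  # True
```

On distributed memory, the halo values are what neighboring workers would exchange before each update; on shared memory they are simply read from the common input array.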

In [None]:
# Analyze your problem's dependencies
problem_analysis = {
    'independence_level': "[High/Medium/Low]",
    'communication_pattern': "[None/Neighbor/Global/Iterative]",
    'synchronization_needs': "[None/Barriers/Locks/Reductions]",
    'load_balance': "[Static/Dynamic]",
}

# Example for image filtering:
image_filter_analysis = {
    'independence_level': "High (each output pixel computed independently)",
    'communication_pattern': "Local (only neighbors for convolution)",
    'synchronization_needs': "None (read-only input, write-only output)",
    'load_balance': "Static (equal work per pixel)",
}

print("Example: Image Filtering Analysis")
for key, value in image_filter_analysis.items():
    print(f"{key}: {value}")

print("\n✅ This analysis shows image filtering is highly parallelizable!")

---

## Expected Performance Gains

Calculate theoretical speedup using **Amdahl's Law**:

$$\text{Speedup} = \frac{1}{(1-P) + \frac{P}{N}}$$

Where:
- P = Proportion of program that can be parallelized (0 to 1)
- N = Number of processors
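
As a worked example, assuming illustrative values of P = 0.95 and N = 8:

$$\text{Speedup} = \frac{1}{(1 - 0.95) + \frac{0.95}{8}} = \frac{1}{0.05 + 0.11875} = \frac{1}{0.16875} \approx 5.93$$

The serial portion also caps the speedup no matter how many processors are added:

$$\lim_{N \to \infty} \text{Speedup} = \frac{1}{1 - P} = \frac{1}{0.05} = 20$$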

In [None]:
def amdahl_speedup(P, N):
    """Calculate theoretical speedup using Amdahl's Law"""
    return 1 / ((1 - P) + (P / N))

# Test different parallelization percentages
processors = np.arange(1, 17)
parallel_portions = [0.5, 0.75, 0.90, 0.95, 0.99]

plt.figure(figsize=(12, 7))
for P in parallel_portions:
    speedups = [amdahl_speedup(P, N) for N in processors]
    plt.plot(processors, speedups, marker='o', label=f'P = {P:.0%}')

plt.plot(processors, processors, 'k--', label='Ideal (linear)', linewidth=2)
plt.xlabel('Number of Processors', fontsize=12)
plt.ylabel('Speedup', fontsize=12)
plt.title("Amdahl's Law: Theoretical Speedup vs Parallelizable Portion", fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate for your expected scenario
your_P = 0.95  # Assume 95% of code can be parallelized
your_N = 8     # Your CPU cores

expected_speedup = amdahl_speedup(your_P, your_N)
print(f"\nExpected speedup with {your_N} cores and {your_P:.0%} parallelizable code:")
print(f"Theoretical maximum: {expected_speedup:.2f}x")
print(f"\nRealistic expectation (70% of theoretical): {expected_speedup * 0.7:.2f}x")

---

## Uniqueness Check

⚠️ **Remember**: Each group must have a UNIQUE title and solution.

To ensure uniqueness:
1. Choose a specific application domain
2. Combine multiple techniques
3. Add creative twists

### Examples of Unique Titles:

Instead of: "Matrix Multiplication using OpenMP"  
Make it: "Adaptive Block-Based Matrix Multiplication with Dynamic Load Balancing"

Instead of: "Image Filtering"  
Make it: "Real-Time Video Processing Pipeline with Multi-Stage Parallel Filters"

Instead of: "Monte Carlo Pi Estimation"  
Make it: "Hybrid CPU-GPU Monte Carlo Framework for Financial Option Pricing"

---

## Decision Matrix

In [None]:
import pandas as pd

# Create a decision matrix for your problem selection
problems = [
    {'name': 'Image Processing', 'complexity': 3, 'parallel_potential': 5, 'uniqueness': 4, 'interest': 5},
    {'name': 'Matrix Operations', 'complexity': 4, 'parallel_potential': 5, 'uniqueness': 3, 'interest': 3},
    {'name': 'Monte Carlo', 'complexity': 2, 'parallel_potential': 5, 'uniqueness': 3, 'interest': 4},
    {'name': 'K-Means Clustering', 'complexity': 3, 'parallel_potential': 4, 'uniqueness': 4, 'interest': 4},
    {'name': 'N-Body Simulation', 'complexity': 5, 'parallel_potential': 4, 'uniqueness': 5, 'interest': 5},
]

df = pd.DataFrame(problems)
df['total_score'] = df['complexity'] + df['parallel_potential'] + df['uniqueness'] + df['interest']
df = df.sort_values('total_score', ascending=False)

print("Problem Selection Decision Matrix")
print("=" * 80)
print(df.to_string(index=False))
print("\nScoring (1-5): Higher is better")
print("Complexity: Technical challenge (good for learning)")
print("Parallel Potential: How well it parallelizes")
print("Uniqueness: Stands out from other groups")
print("Interest: Your team's enthusiasm")

---

## Final Problem Statement Template

In [None]:
# Fill this template for your proposal
final_problem = """
TITLE: [Your Creative, Unique Title]

PROBLEM DOMAIN: [e.g., Image Processing, Machine Learning, etc.]

SPECIFIC PROBLEM:
[Describe the computational problem you will solve. Be specific about:
- Input: What data will you process?
- Processing: What operations will you perform?
- Output: What results will you produce?]

WHY PARALLELIZATION IS NEEDED:
[Explain the computational bottleneck. Include:
- Time complexity (O(n²), O(n³), etc.)
- Typical data sizes
- Serial execution time estimates]

PARALLELIZATION APPROACH:
[Describe how you will parallelize:
- Data decomposition strategy
- Parallel programming model (SPMD, loop parallelism, etc.)
- Technologies to use (OpenMP, CUDA, etc.)]

EXPECTED OUTCOMES:
[What you expect to achieve:
- Performance improvements (speedup targets)
- Platforms to test on
- Insights to gain]

UNIQUENESS:
[What makes your project different:
- Novel combination of techniques
- Specific application domain
- Creative optimizations]
"""

print(final_problem)
print("\n" + "="*80)
print("💡 Save this! You'll use it in the proposal writing notebook.")

---

## Next Steps

Once you've selected your problem:

1. ✅ Complete the problem analysis above
2. ✅ Verify uniqueness with your tutor
3. ✅ Move to `02_literature_review.ipynb`
4. ✅ Start researching existing solutions

---

## Team Discussion Questions

Before moving forward, discuss with your team:

1. Do we all understand the problem?
2. Do we have the skills to implement this?
3. Can we complete it in the given timeframe?
4. Is it different enough from other groups?
5. Are we excited about this problem?

If you answered YES to all questions, proceed to the literature review! 🚀