# Module 01: Introduction to Distributed Systems

**Difficulty**: ⭐⭐⭐  
**Estimated Time**: 90 minutes  
**Prerequisites**: Module 00 - Course Setup and Introduction

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Define** what constitutes a distributed system and identify its key characteristics
2. **Compare** traditional (centralized) vs distributed system architectures
3. **Analyze** the benefits of distributed systems (scalability, reliability, performance, geographical distribution)
4. **Identify** the challenges when building distributed systems
5. **Distinguish** between tightly-coupled and loosely-coupled hardware architectures
6. **Implement** basic distributed system concepts using Python

---

## 1. What is a Distributed System?

### Definition

> **A distributed system is a computing environment in which various components are spread across multiple computers (or nodes) connected by a network, working together to appear as a single coherent system to end users.**

### Key Characteristics

1. **Multiple autonomous computers** working together
2. **Connected via a network** (LAN, WAN, or Internet)
3. **Appears as a single system** to users (transparency)
4. **No shared physical memory** between nodes
5. **Coordinated through message passing**
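Characteristics 4 and 5 can be illustrated with a minimal single-process sketch. It is not real networking. Each hypothetical `Node` owns private state and can only affect another node by sending it a message through a simulated network. The `Node` and `Network` classes are illustrative assumptions. `copy.deepcopy` stands in for serialization, so the receiver gets its own copy of the data instead of shared memory.

```python
import copy
from collections import deque

class Node:
    """Hypothetical node with private memory and an inbox for incoming messages."""
    def __init__(self, name):
        self.name = name
        self.memory = {}        # private state: no other node can touch it directly
        self.inbox = deque()

    def handle_messages(self):
        # Apply every queued message to this node's own memory
        while self.inbox:
            _sender, msg = self.inbox.popleft()
            self.memory[msg["key"]] = msg["value"]

class Network:
    """Simulated network: delivers a *copy* of each message, mimicking serialization."""
    def __init__(self):
        self.nodes = {}

    def register(self, node):
        self.nodes[node.name] = node

    def send(self, sender, receiver, msg):
        self.nodes[receiver].inbox.append((sender, copy.deepcopy(msg)))

net = Network()
a, b = Node("A"), Node("B")
net.register(a)
net.register(b)

update = {"key": "config", "value": [1, 2, 3]}
net.send("A", "B", update)
update["value"].append(99)   # A modifies its local object after sending
b.handle_messages()

print(b.memory)  # {'config': [1, 2, 3]}  (B keeps the value as it was sent)
```

Because B only ever sees a copy, A's later change does not leak into B's memory. Real nodes behave the same way, since the message travels as bytes over the network.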

### What Can Be Distributed?

- 💾 **Database / Data**: Distributed databases, file systems
- ⚙️ **Operating System**: Processes running on different machines
- 📁 **File System**: Files stored across multiple servers
- 🔐 **Authentication**: Centralized vs distributed auth systems
- 💼 **Business Logic**: Microservices architecture
- 🖥️ **Workload**: Load balancing across servers
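For the last item, workload distribution, a common starting point is round-robin load balancing: each new request goes to the next server in a fixed rotation. The sketch below assumes three hypothetical server names and ignores server health and current load. Production load balancers usually take both into account.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out servers in a fixed rotation (server names are hypothetical)."""
    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)

balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [balancer.pick() for _ in range(7)]
print(assignments)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b', 'server-c', 'server-a']
```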

## 2. Traditional vs Distributed Systems

### 2.1 Visualization Setup

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

%matplotlib inline

# Set random seed for reproducibility
np.random.seed(42)

### 2.2 Traditional (Centralized) vs Distributed Database Architecture

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Traditional Database Architecture
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 10)
ax1.axis('off')
ax1.set_title('Traditional Database (1990s-2000s)', fontsize=14, fontweight='bold')

# Main database
db_rect = patches.FancyBboxPatch((3.5, 5), 3, 2, boxstyle="round,pad=0.1",
                                  edgecolor='#2E86AB', facecolor='#A23B72', linewidth=2)
ax1.add_patch(db_rect)
ax1.text(5, 6, 'Main\nDatabase', ha='center', va='center',
         fontsize=11, fontweight='bold', color='white')

# Backup servers
backup1 = patches.FancyBboxPatch((7.5, 5), 1.5, 1.5, boxstyle="round,pad=0.05",
                                  edgecolor='#F18F01', facecolor='#FFF3B0', linewidth=2)
ax1.add_patch(backup1)
ax1.text(8.25, 5.75, 'Backup\nServer', ha='center', va='center', fontsize=9)

backup2 = patches.FancyBboxPatch((8, 2.5), 1.5, 1.5, boxstyle="round,pad=0.05",
                                  edgecolor='#F18F01', facecolor='#FFF3B0', linewidth=2)
ax1.add_patch(backup2)
ax1.text(8.75, 3.25, 'Tape/Disk\nBackup', ha='center', va='center', fontsize=9)

# Arrows
ax1.arrow(6.5, 6, 0.8, 0, head_width=0.2, head_length=0.2, fc='#2E86AB', ec='#2E86AB')
ax1.arrow(6.5, 5.5, 1.2, -1.8, head_width=0.2, head_length=0.2, fc='#2E86AB', ec='#2E86AB')

# Characteristics
ax1.text(5, 1.5, '✗ Single Point of Failure\n✗ Limited Scalability\n✓ Simple Architecture\n✓ Easier Maintenance',
         ha='center', fontsize=10, bbox=dict(boxstyle='round', facecolor='#E0E0E0', alpha=0.8))

# Distributed Database Architecture
ax2.set_xlim(0, 10)
ax2.set_ylim(0, 10)
ax2.axis('off')
ax2.set_title('Distributed Database (2010s and Beyond)', fontsize=14, fontweight='bold')

# Multiple distributed nodes
node_positions = [(2, 7), (5, 8), (8, 7), (2, 4), (5, 5), (8, 4), (5, 2)]
for i, (x, y) in enumerate(node_positions):
    node = patches.Circle((x, y), 0.6, edgecolor='#06A77D', facecolor='#A8DADC', linewidth=2)
    ax2.add_patch(node)
    ax2.text(x, y, f'Node\n{i+1}', ha='center', va='center', fontsize=8, fontweight='bold')

# Network connections
connections = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5), (3, 4), (4, 5), (4, 6)]
for i, j in connections:
    x1, y1 = node_positions[i]
    x2, y2 = node_positions[j]
    ax2.plot([x1, x2], [y1, y2], 'b--', alpha=0.3, linewidth=1.5)

# Benefits
benefits = [
    '1. Scale-Out vs Scale-Up',
    '2. Local vs Shared Storage',
    '3. Elastic vs Static Infrastructure'
]
ax2.text(5, 0.5, '\n'.join(benefits), ha='center', fontsize=9,
         bbox=dict(boxstyle='round', facecolor='#90EE90', alpha=0.7))

plt.tight_layout()
plt.show()

## 3. Benefits of Distributed Systems

### 3.1 Scalability

**Definition**: The ability to continuously evolve and support growing amounts of work.

#### Types of Scalability:

1. **Horizontal Scaling (Scale-Out)**: Add more machines
   - More cost-effective over time
   - Better for distributed systems
   
2. **Vertical Scaling (Scale-Up)**: Add more resources to existing machines
   - Limited by hardware constraints
   - Costs rise sharply after a certain point

In [None]:
# Simulate scalability scenarios
def simulate_scalability(workload_size, scaling_type='horizontal', num_nodes=1):
    """
    Simulate processing time for different scaling approaches.
    
    Parameters:
    -----------
    workload_size : int
        Size of the workload to process
    scaling_type : str
        'horizontal' or 'vertical'
    num_nodes : int
        Number of nodes (horizontal) or hardware upgrade level (vertical)
    
    Returns:
    --------
    float : Processing time in seconds
    """
    base_processing_time = workload_size * 0.01  # Base time per unit
    
    if scaling_type == 'horizontal':
        # Horizontal scaling: distribute workload across nodes
        # Communication overhead increases with nodes
        communication_overhead = (num_nodes - 1) * 0.1
        processing_time = (base_processing_time / num_nodes) + communication_overhead
    else:
        # Vertical scaling: diminishing returns
        processing_time = base_processing_time / (1 + np.log(num_nodes))
    
    return max(processing_time, 0.1)  # Minimum time

# Compare scaling approaches
workloads = [100, 500, 1000, 2000, 5000]
node_counts = [1, 2, 4, 8, 16]

horizontal_times = []
vertical_times = []

for workload in workloads:
    h_times = [simulate_scalability(workload, 'horizontal', n) for n in node_counts]
    v_times = [simulate_scalability(workload, 'vertical', n) for n in node_counts]
    horizontal_times.append(h_times)
    vertical_times.append(v_times)

# Visualize scaling comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Plot for workload = 1000
workload_idx = 2
ax1.plot(node_counts, horizontal_times[workload_idx], 'o-', label='Horizontal Scaling',
         linewidth=2, markersize=8, color='#06A77D')
ax1.plot(node_counts, vertical_times[workload_idx], 's-', label='Vertical Scaling',
         linewidth=2, markersize=8, color='#E63946')
ax1.set_xlabel('Scaling Factor (nodes or upgrade level)', fontsize=12)
ax1.set_ylabel('Processing Time (seconds)', fontsize=12)
ax1.set_title(f'Scalability Comparison (Workload = {workloads[workload_idx]})',
              fontsize=13, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)
ax1.set_xscale('log', base=2)

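# Cost analysis (illustrative assumptions, not real prices):
# - Horizontal: each commodity node costs $100, so n nodes cost $100 * n
# - Vertical: one machine with n times the capacity costs $100 * n**2
#   (high-end hardware carries a superlinear price premium)
horizontal_costs = [100 * n for n in node_counts]
vertical_costs = [100 * n ** 2 for n in node_counts]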

ax2.plot(node_counts, horizontal_costs, 'o-', label='Horizontal Scaling Cost',
         linewidth=2, markersize=8, color='#06A77D')
ax2.plot(node_counts, vertical_costs, 's-', label='Vertical Scaling Cost',
         linewidth=2, markersize=8, color='#E63946')
ax2.set_xlabel('Scaling Factor (nodes or upgrade level)', fontsize=12)
ax2.set_ylabel('Cost ($)', fontsize=12)
ax2.set_title('Cost Comparison: Horizontal vs Vertical Scaling',
              fontsize=13, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)
ax2.set_yscale('log')

plt.tight_layout()
plt.show()

print("üìä Key Insight:")
print("   Horizontal scaling becomes more cost-effective as demand grows.")
print(f"   At 16x scaling: Horizontal = ${horizontal_costs[-1]}, Vertical = ${vertical_costs[-1]}")

### 3.2 Reliability and Fault Tolerance

Distributed systems can maintain service even when individual components fail.

**Example**: Facebook's Maelstrom system keeps services running through data-center-level failures by safely draining traffic away from an affected data center to healthy ones.

In [None]:
import random

# Seed Python's `random` module too (np.random.seed does not affect it)
random.seed(42)

class DistributedSystem:
    """
    Simulate a distributed system with fault tolerance.
    """
    def __init__(self, num_nodes, replication_factor=3):
        """
        Initialize distributed system.
        
        Parameters:
        -----------
        num_nodes : int
            Total number of nodes in the system
        replication_factor : int
            How many copies of data to maintain
        """
        self.num_nodes = num_nodes
        self.replication_factor = replication_factor
        self.nodes = {f"Node_{i}": True for i in range(num_nodes)}  # True = healthy
        # Place one copy of the data on each of `replication_factor` distinct nodes
        self.replica_nodes = random.sample(list(self.nodes), replication_factor)
        
    def simulate_failure(self, failure_rate=0.1):
        """
        Fail each node independently with probability `failure_rate`.
        """
        for node in self.nodes:
            if random.random() < failure_rate:
                self.nodes[node] = False
    
    def check_data_availability(self):
        """
        Check if data is still available despite failures.
        Data is available if at least one node holding a replica is healthy.
        """
        return any(self.nodes[node] for node in self.replica_nodes)
    
    def get_system_status(self):
        """
        Return current system status.
        """
        healthy = sum(1 for status in self.nodes.values() if status)
        failed = self.num_nodes - healthy
        availability = (healthy / self.num_nodes) * 100
        
        return {
            'healthy_nodes': healthy,
            'failed_nodes': failed,
            'availability': availability,
            'data_available': self.check_data_availability()
        }

# Simulate fault tolerance
print("=" * 60)
print("FAULT TOLERANCE SIMULATION")
print("=" * 60)

# Create systems with different replication factors
systems = {
    'No Replication': DistributedSystem(10, replication_factor=1),
    '3x Replication': DistributedSystem(10, replication_factor=3),
    '5x Replication': DistributedSystem(10, replication_factor=5)
}

# Run simulation 1000 times
num_simulations = 1000
failure_rate = 0.2  # 20% node failure rate

results = {name: [] for name in systems.keys()}

for _ in range(num_simulations):
    for name, system in systems.items():
        # Reset system
        system.nodes = {f"Node_{i}": True for i in range(system.num_nodes)}
        
        # Simulate failures
        system.simulate_failure(failure_rate)
        
        # Check if data is still available
        status = system.get_system_status()
        results[name].append(status['data_available'])

# Calculate reliability
print(f"\nSimulation Results ({num_simulations} runs, {failure_rate*100}% failure rate):\n")
for name, availability_list in results.items():
    reliability = (sum(availability_list) / num_simulations) * 100
    print(f"{name:20s}: {reliability:6.2f}% data availability")

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
reliabilities = [(sum(results[name]) / num_simulations) * 100 for name in systems.keys()]
colors = ['#E63946', '#F1FAEE', '#06A77D']
bars = ax.bar(systems.keys(), reliabilities, color=colors, edgecolor='black', linewidth=1.5)

ax.set_ylabel('Data Availability (%)', fontsize=12)
ax.set_title('Impact of Replication on System Reliability', fontsize=14, fontweight='bold')
ax.set_ylim(0, 105)
ax.axhline(y=99.9, color='r', linestyle='--', label='99.9% SLA Target')
ax.legend(fontsize=10)
ax.grid(axis='y', alpha=0.3)

# Add value labels
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.1f}%',
            ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()
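For comparison with the simulation, here is a closed-form estimate. Assume nodes fail independently with probability `p` and the data lives on `r` distinct nodes. The data is then lost only if all `r` replicas fail, so availability is `1 - p**r`. Independence is a simplifying assumption: real failures (power, network, software bugs) are often correlated, which makes actual availability lower.

```python
def replicated_availability(failure_prob, replication_factor):
    """Probability that at least one of `replication_factor` independent replicas survives."""
    return 1 - failure_prob ** replication_factor

# Same 20% per-node failure rate as the simulation above
for r in (1, 3, 5):
    print(f"r = {r}: {replicated_availability(0.2, r) * 100:.2f}% expected availability")
# r = 1: 80.00% expected availability
# r = 3: 99.20% expected availability
# r = 5: 99.97% expected availability
```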

### 3.3 Performance Through Parallelism

In [None]:
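import time
from multiprocessing import Pool
import math

# NOTE: On Windows and macOS, multiprocessing starts workers with "spawn", and the
# workers cannot see functions defined inside a notebook. If Pool raises an
# AttributeError, move compute_intensive_task into a .py file and import it.

def compute_intensive_task(bounds):
    """
    Simulate a compute-intensive task.
    Calculate the sum of square roots of the integers in [start, end).
    """
    start, end = bounds
    return sum(math.sqrt(i) for i in range(start, end))

def split_range(n, num_chunks):
    """Split the integers 1..n into num_chunks contiguous [start, end) ranges."""
    chunk_size = n // num_chunks
    bounds = [(1 + k * chunk_size, 1 + (k + 1) * chunk_size) for k in range(num_chunks)]
    bounds[-1] = (bounds[-1][0], n + 1)  # the last chunk absorbs any remainder
    return bounds

# Test with different workload sizes
workload_sizes = [1000000, 2000000, 3000000, 4000000]
process_counts = [1, 2, 4, 8]

print("Performance Comparison: Sequential vs Parallel Processing (one machine)\n")
print(f"{'Workload':>12s} | {'Processes':>10s} | {'Time (s)':>10s} | {'Speedup':>8s}")
print("-" * 50)

for workload in workload_sizes:
    base_time = None
    
    for num_processes in process_counts:
        # Divide the workload into one contiguous range per process
        chunks = split_range(workload, num_processes)
        
        start_time = time.perf_counter()
        
        if num_processes == 1:
            # Sequential
            partial_sums = [compute_intensive_task(c) for c in chunks]
        else:
            # Parallel (timing includes starting the worker processes)
            with Pool(processes=num_processes) as pool:
                partial_sums = pool.map(compute_intensive_task, chunks)
        
        # Combine partial results: same total (up to rounding) for every process count
        total = sum(partial_sums)
        elapsed_time = time.perf_counter() - start_time
        
        if base_time is None:
            base_time = elapsed_time
            speedup_str = "-"
        else:
            speedup = base_time / elapsed_time
            speedup_str = f"{speedup:.2f}x"
        
        print(f"{workload:12,d} | {num_processes:10d} | {elapsed_time:10.4f} | {speedup_str:>8s}")
    
    print("-" * 50)

print("\n💡 Insight: Speedup grows with more processes only up to the number of CPU cores,")
print("   and it is not linear because of process start-up and communication overhead")
print("   and the serial fraction of the work (Amdahl's Law).")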
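Amdahl's Law makes the insight above quantitative. If a fraction `f` of a program's run time can be parallelized and the rest is serial, the best possible speedup on `n` processors is `1 / ((1 - f) + f / n)`. The sketch assumes `f = 0.95` purely for illustration. It ignores communication costs, so measured speedups will be lower still.

```python
def amdahl_speedup(parallel_fraction, num_processors):
    """Upper bound on speedup under Amdahl's Law (ignores communication overhead)."""
    serial_fraction = 1 - parallel_fraction
    return 1 / (serial_fraction + parallel_fraction / num_processors)

for n in (1, 2, 4, 8, 1000):
    print(f"{n:5d} processors -> at most {amdahl_speedup(0.95, n):.2f}x")
```

However many processors are added, the speedup stays below `1 / (1 - f)`, which is 20x when `f = 0.95`.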

### 3.4 Geographical Distribution

Distributed systems can place resources closer to users, reducing latency.

In [None]:
# Simulate latency for different geographical distributions
def simulate_latency(user_location, server_locations, distributed=False):
    """
    Simulate network latency based on geographical distance.
    
    Parameters:
    -----------
    user_location : tuple
        (latitude, longitude) of user
    server_locations : dict
        Dictionary of server locations with their coordinates
    distributed : bool
        If True, use the nearest server; if False, always use the 'US East' server
    
    Returns:
    --------
    float : Latency in milliseconds
    """
    def calculate_distance(loc1, loc2):
        # Simplified distance calculation (Euclidean)
        return np.sqrt((loc1[0] - loc2[0])**2 + (loc1[1] - loc2[1])**2)
    
    if distributed:
        # Find nearest server
        distances = {name: calculate_distance(user_location, loc) 
                    for name, loc in server_locations.items()}
        nearest_distance = min(distances.values())
    else:
        # Use specific server (e.g., US East)
        nearest_distance = calculate_distance(user_location, server_locations['US East'])
    
    # Toy model: latency proportional to coordinate distance (10 ms per degree).
    # Real round-trip latency over fiber is closer to ~1 ms per 100 km, plus routing delays.
    latency = nearest_distance * 10
    return latency

# Define server locations (simplified coordinates)
servers = {
    'US East': (40, -75),      # New York area
    'US West': (37, -122),     # California
    'Europe': (51, 0),         # London
    'Asia Pacific': (1, 103),  # Singapore
    'Australia': (-33, 151)    # Sydney
}

# Simulate users from different locations
user_locations = {
    'New York User': (40, -74),
    'London User': (51, -0.1),
    'Singapore User': (1.3, 103.8),
    'Sydney User': (-33.9, 151.2),
    'Mumbai User': (19, 72.8)
}

# Calculate latencies
print("\n" + "=" * 70)
print("GEOGRAPHICAL DISTRIBUTION IMPACT ON LATENCY")
print("=" * 70)
print(f"\n{'User Location':20s} | {'Centralized (ms)':>18s} | {'Distributed (ms)':>18s} | {'Improvement':>12s}")
print("-" * 70)

centralized_latencies = []
distributed_latencies = []
user_names = []

for user_name, user_loc in user_locations.items():
    centralized = simulate_latency(user_loc, servers, distributed=False)
    distributed = simulate_latency(user_loc, servers, distributed=True)
    improvement = ((centralized - distributed) / centralized) * 100
    
    centralized_latencies.append(centralized)
    distributed_latencies.append(distributed)
    user_names.append(user_name)
    
    print(f"{user_name:20s} | {centralized:18.1f} | {distributed:18.1f} | {improvement:11.1f}%")

print("-" * 70)

# Visualize latency comparison
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(user_names))
width = 0.35

bars1 = ax.bar(x - width/2, centralized_latencies, width, label='Centralized (US East only)',
               color='#E63946', edgecolor='black', linewidth=1.5)
bars2 = ax.bar(x + width/2, distributed_latencies, width, label='Distributed (Nearest Server)',
               color='#06A77D', edgecolor='black', linewidth=1.5)

ax.set_ylabel('Latency (milliseconds)', fontsize=12)
ax.set_title('Impact of Geographical Distribution on Network Latency', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels([name.replace(' User', '') for name in user_names], rotation=45, ha='right')
ax.legend(fontsize=11)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

avg_improvement = np.mean([(c - d) / c * 100 for c, d in zip(centralized_latencies, distributed_latencies)])
print(f"\nüìà Average latency improvement with distributed servers: {avg_improvement:.1f}%")

## 4. Challenges in Distributed Systems

### Key Challenges:

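1. **Avoiding Single Points of Failure (SPOF)**: No single component whose failure brings down the whole system
2. **Replication**: Maintaining consistent copies of data
3. **Availability and Performance**: Trade-offs, e.g. between consistency and availability during network partitions (CAP theorem)
4. **Resource Naming and Addressing**: Finding and accessing resources
5. **Binding**: Connecting a client to the specific server or resource instance that will handle its requests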
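Challenges 4 and 5 can be illustrated with a toy name service. Servers register a logical name together with a network address, and a client binds by resolving the name when it needs the service instead of hard-coding an address. The `NameRegistry` class, the service name, and the addresses are hypothetical. Real systems use mechanisms such as DNS or a dedicated service-discovery component.

```python
import random

class NameRegistry:
    """Hypothetical name service mapping logical service names to (host, port) addresses."""
    def __init__(self):
        self._entries = {}

    def register(self, name, address):
        self._entries.setdefault(name, []).append(address)

    def deregister(self, name, address):
        addresses = self._entries.get(name, [])
        if address in addresses:
            addresses.remove(address)

    def resolve(self, name):
        """Bind to one registered address for `name`, or raise LookupError."""
        addresses = self._entries.get(name)
        if not addresses:
            raise LookupError(f"no server registered for {name!r}")
        return random.choice(addresses)

registry = NameRegistry()
registry.register("inventory-service", ("10.0.0.5", 8080))
registry.register("inventory-service", ("10.0.0.6", 8080))

print(registry.resolve("inventory-service"))   # either registered address
registry.deregister("inventory-service", ("10.0.0.5", 8080))
print(registry.resolve("inventory-service"))   # ('10.0.0.6', 8080)
```

Because clients resolve names at call time, a server can move or be replaced without changing client code. This indirection is also what lets replicas take over when one server fails.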

### 4.1 Single Point of Failure (SPOF) Demo

In [None]:
class ServiceWithSPOF:
    """
    Simulates a service with a single point of failure.
    """
    def __init__(self):
        self.critical_component_healthy = True
        self.requests_served = 0
        self.requests_failed = 0
    
    def handle_request(self):
        if self.critical_component_healthy:
            self.requests_served += 1
            return True
        else:
            self.requests_failed += 1
            return False
    
    def simulate_failure(self):
        self.critical_component_healthy = False

class ResilientService:
    """
    Simulates a resilient service with redundancy.
    """
    def __init__(self, num_replicas=3):
        self.replicas = [True] * num_replicas
        self.requests_served = 0
        self.requests_failed = 0
    
    def handle_request(self):
        # Service is available if at least one replica is healthy
        if any(self.replicas):
            self.requests_served += 1
            return True
        else:
            self.requests_failed += 1
            return False
    
    def simulate_random_failures(self, failure_probability=0.1):
        for i in range(len(self.replicas)):
            if random.random() < failure_probability:
                self.replicas[i] = False

# Run simulation
num_requests = 10000
failure_happens_at = num_requests // 2

service_spof = ServiceWithSPOF()
service_resilient = ResilientService(num_replicas=3)

print("\nSimulating 10,000 requests with failure at request 5,000...\n")

for i in range(num_requests):
    # Simulate failure halfway through
    if i == failure_happens_at:
        service_spof.simulate_failure()
        service_resilient.simulate_random_failures(failure_probability=0.3)
    
    service_spof.handle_request()
    service_resilient.handle_request()

# Results
print(f"{'Service Type':30s} | {'Served':>10s} | {'Failed':>10s} | {'Success Rate':>15s}")
print("-" * 70)

spof_success = (service_spof.requests_served / num_requests) * 100
resilient_success = (service_resilient.requests_served / num_requests) * 100

print(f"{'Service with SPOF':30s} | {service_spof.requests_served:10,d} | {service_spof.requests_failed:10,d} | {spof_success:14.2f}%")
print(f"{'Resilient Service (3 replicas)':30s} | {service_resilient.requests_served:10,d} | {service_resilient.requests_failed:10,d} | {resilient_success:14.2f}%")

print(f"\n✅ Resilient service maintained {resilient_success:.1f}% availability despite failures!")

## 5. Tightly-Coupled vs Loosely-Coupled Systems

### 5.1 Tightly-Coupled Systems (Parallel Processing)

**Characteristics**:
- Processors physically part of same computer
- Connected by high-speed backplane bus or on same motherboard/chip
- Shared clock (synchronization possible)
- Shared memory (fast inter-processor communication)
- Examples: Multi-core CPUs (2-64 cores)

**Advantages**:
- Fast communication
- Easier synchronization
- Simple programming model

**Disadvantages**:
- Fixed architecture
- Expensive
- Limited scalability
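In a tightly-coupled system, processors communicate simply by reading and writing the same memory, but concurrent updates must be synchronized. Python threads within one process share memory in the same way, so they make a convenient (if imperfect) stand-in. This sketch assumes four worker threads incrementing one shared counter under a `threading.Lock`. CPython's global interpreter lock means these threads do not run Python code truly in parallel, so the example shows shared-memory communication, not speedup.

```python
import threading

shared_counter = 0                  # shared memory: every thread sees the same variable
counter_lock = threading.Lock()

def worker(increments):
    global shared_counter
    for _ in range(increments):
        with counter_lock:          # serialize the read-modify-write so no update is lost
            shared_counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_counter)  # 40000
```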

### 5.2 Loosely-Coupled Systems (Distributed Computing)

**Characteristics**:
- Processors in separate computers
- Connected by network technology
- Each computer has its own clock (clocks drift apart, so they must be synchronized over the network)
- Separate memory (message passing for communication)
- Heterogeneous (different OS, hardware)
- Autonomous nodes

**Advantages**:
- Scalable
- Cost-effective (commodity hardware)
- Flexible growth
- Geographical distribution

**Disadvantages**:
- Complex synchronization
- Network overhead
- Harder to program
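Because each loosely-coupled node has its own clock, nodes cannot rely on physical timestamps to order events. One classic technique is the Lamport logical clock. Each node keeps a counter and increments it for every local event or message send. On receiving a message, the node sets its counter to `max(local, received) + 1`. The two-node exchange below is a minimal sketch of that rule, not a complete protocol.

```python
class LamportClock:
    """Logical clock: a send is always timestamped before its matching receive."""
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, message_time) + 1
        return self.time

node_a, node_b = LamportClock(), LamportClock()
node_a.tick()                            # A: local event -> 1
sent_at = node_a.tick()                  # A: send message -> 2
node_b.tick()                            # B: local event -> 1
received_at = node_b.receive(sent_at)    # B: receive -> max(1, 2) + 1 = 3
print(sent_at, received_at)              # 2 3
```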

In [None]:
# Comparison visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Tightly-coupled architecture
ax1 = axes[0, 0]
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 10)
ax1.axis('off')
ax1.set_title('Tightly-Coupled System\n(Shared Memory Multiprocessor)', 
              fontsize=12, fontweight='bold')

# Shared memory
memory = patches.Rectangle((2, 7), 6, 1.5, edgecolor='#2E86AB', 
                           facecolor='#A8DADC', linewidth=2)
ax1.add_patch(memory)
ax1.text(5, 7.75, 'Shared Memory', ha='center', va='center', 
         fontsize=10, fontweight='bold')

# CPUs
cpu_positions = [2.5, 4, 5.5, 7]
for i, x in enumerate(cpu_positions):
    cpu = patches.Rectangle((x, 4), 1, 1.5, edgecolor='#E63946',
                           facecolor='#FFB4A2', linewidth=2)
    ax1.add_patch(cpu)
    ax1.text(x + 0.5, 4.75, f'CPU\n{i}', ha='center', va='center', fontsize=9)
    # Connection to memory
    ax1.arrow(x + 0.5, 5.6, 0, 1.2, head_width=0.15, head_length=0.1, 
             fc='black', ec='black', linewidth=1.5)

# Bus
ax1.plot([1.5, 8.5], [3.5, 3.5], 'k-', linewidth=3)
ax1.text(5, 3, 'High-Speed Bus', ha='center', fontsize=9, style='italic')

# Clock
clock = patches.Circle((5, 1.5), 0.5, edgecolor='black', facecolor='#F1FAEE', linewidth=2)
ax1.add_patch(clock)
ax1.text(5, 1.5, 'CLK', ha='center', va='center', fontsize=8, fontweight='bold')
ax1.text(5, 0.5, 'Shared Clock', ha='center', fontsize=9)

# Loosely-coupled architecture
ax2 = axes[0, 1]
ax2.set_xlim(0, 10)
ax2.set_ylim(0, 10)
ax2.axis('off')
ax2.set_title('Loosely-Coupled System\n(Distributed Network)', 
              fontsize=12, fontweight='bold')

# Network cloud
network = patches.Ellipse((5, 5), 4, 2.5, edgecolor='#06A77D', 
                         facecolor='#D4F1F4', linewidth=2)
ax2.add_patch(network)
ax2.text(5, 5, 'Network', ha='center', va='center', fontsize=11, fontweight='bold')

# Nodes with separate memory and clock
node_positions = [(2, 8.5), (8, 8.5), (2, 1.5), (8, 1.5)]
for i, (x, y) in enumerate(node_positions):
    # Node box
    node = patches.FancyBboxPatch((x-0.7, y-0.6), 1.4, 1.2, 
                                  boxstyle="round,pad=0.05",
                                  edgecolor='#E63946', facecolor='#FFE5E5', linewidth=2)
    ax2.add_patch(node)
    ax2.text(x, y + 0.2, f'Node {i}', ha='center', fontsize=9, fontweight='bold')
    ax2.text(x, y - 0.2, 'Mem+CLK', ha='center', fontsize=7)
    
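    # Connect the node to the edge of the network ellipse (semi-axes 2 and 1.25),
    # drawn behind the patches so the line appears to start at the node box
    dx, dy = x - 5, y - 5
    t = 1 / np.sqrt((dx / 2) ** 2 + (dy / 1.25) ** 2)
    ax2.plot([x, 5 + t * dx], [y, 5 + t * dy], 'k--', linewidth=1.5, alpha=0.5, zorder=0)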

# Performance comparison
ax3 = axes[1, 0]
categories = ['Communication\nSpeed', 'Scalability', 'Cost\nEfficiency', 'Fault\nTolerance']
tight_scores = [9, 3, 2, 4]
loose_scores = [4, 9, 9, 8]

x_pos = np.arange(len(categories))
width = 0.35

bars1 = ax3.bar(x_pos - width/2, tight_scores, width, label='Tightly-Coupled',
               color='#FFB4A2', edgecolor='black')
bars2 = ax3.bar(x_pos + width/2, loose_scores, width, label='Loosely-Coupled',
               color='#06A77D', edgecolor='black')

ax3.set_ylabel('Score (0-10)', fontsize=10)
ax3.set_title('Performance Comparison', fontsize=12, fontweight='bold')
ax3.set_xticks(x_pos)
ax3.set_xticklabels(categories, fontsize=9)
ax3.set_ylim(0, 10)
ax3.legend(fontsize=9)
ax3.grid(axis='y', alpha=0.3)

# Use cases
ax4 = axes[1, 1]
ax4.axis('off')
ax4.set_title('Typical Use Cases', fontsize=12, fontweight='bold')

tight_uses = [
    "• Scientific simulations",
    "• Real-time data processing",
    "• High-performance computing",
    "• Shared-memory algorithms"
]

loose_uses = [
    "• Web services",
    "• Cloud computing",
    "• Big data processing",
    "• Microservices architecture"
]

ax4.text(0.1, 0.8, 'Tightly-Coupled:', fontsize=11, fontweight='bold', color='#E63946')
ax4.text(0.1, 0.6, '\n'.join(tight_uses), fontsize=9, verticalalignment='top')

ax4.text(0.1, 0.4, 'Loosely-Coupled:', fontsize=11, fontweight='bold', color='#06A77D')
ax4.text(0.1, 0.2, '\n'.join(loose_uses), fontsize=9, verticalalignment='top')

plt.tight_layout()
plt.show()

## 6. Exercises

### Exercise 1: Analyze a Distributed System

**Task**: Research a real-world distributed system (e.g., Netflix, Amazon AWS, Google Search) and analyze:

1. What components are distributed?
2. What benefits does distribution provide?
3. What challenges does the system face?
4. Is it tightly-coupled or loosely-coupled?

Write your analysis below:

**Your Analysis Here**:

System chosen: ________________

1. Distributed components:
   -
   -

2. Benefits:
   -
   -

3. Challenges:
   -
   -

4. Architecture type: ________________


### Exercise 2: Implement CAP Theorem Simulation

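**Task**: The CAP theorem states that a distributed data store cannot guarantee all three of the following properties at once; in particular, when a network partition occurs, it must sacrifice either consistency or availability: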
- **C**onsistency
- **A**vailability  
- **P**artition tolerance

Implement a simulation that demonstrates this trade-off.

**Hints**:
- Create a distributed key-value store with 3 replicas
- Simulate network partitions
- Show the trade-off between consistency and availability during partitions

In [None]:
class DistributedKeyValueStore:
    """
    Simulates a distributed key-value store.
    """
    def __init__(self, num_replicas=3):
        # TODO: Initialize replicas
        pass
    
    def write(self, key, value, consistency_level='strong'):
        """
        Write operation with configurable consistency.
        
        consistency_level:
        - 'strong': Wait for all replicas (CP)
        - 'eventual': Write to one replica, propagate later (AP)
        """
        # TODO: Implement write with consistency guarantees
        pass
    
    def read(self, key, consistency_level='strong'):
        """
        Read operation with configurable consistency.
        """
        # TODO: Implement read with consistency guarantees
        pass
    
    def simulate_partition(self, partition_nodes):
        """
        Simulate network partition.
        """
        # TODO: Implement partition simulation
        pass

# TODO: Test CAP theorem scenarios


### Exercise 3: Latency vs Throughput Trade-off

**Task**: Implement a simulation that demonstrates the trade-off between latency and throughput in distributed systems.

**Requirements**:
1. Create a message queue system
2. Implement batch processing vs real-time processing
3. Measure latency and throughput for different batch sizes
4. Visualize the trade-off

In [None]:
class MessageQueueSystem:
    def __init__(self, batch_size=1):
        """
        Initialize message queue.
        
        batch_size: Number of messages to process together
                   1 = real-time, >1 = batched
        """
        self.batch_size = batch_size
        self.queue = []
        self.latencies = []
        self.throughput = 0
    
    def send_message(self, message):
        # TODO: Implement message sending
        pass
    
    def process_batch(self):
        # TODO: Implement batch processing
        pass

# TODO: Test with different batch sizes and visualize results


## 7. Summary

In this module, you have learned:

✅ **Definition** of distributed systems and their key characteristics  
✅ **Differences** between traditional and distributed architectures  
✅ **Benefits** of distributed systems:
   - Scalability (horizontal vs vertical)
   - Reliability and fault tolerance
   - Performance through parallelism
   - Geographical distribution

✅ **Challenges** in building distributed systems:
   - Single points of failure
   - Replication and consistency
   - Resource naming and addressing

✅ **Architecture types**:
   - Tightly-coupled (shared memory, multiprocessor)
   - Loosely-coupled (message passing, distributed)

### Key Takeaways

1. **Distribution is about trade-offs**: No single solution is best for all scenarios
2. **Scalability types matter**: Horizontal scaling is more cost-effective long-term
3. **Replication improves reliability**: But adds complexity
4. **Location matters**: Geographical distribution reduces latency
5. **CAP theorem**: You can't guarantee all three (Consistency, Availability, Partition tolerance); during a network partition you must choose between consistency and availability

### What's Next?

In **Module 02**, we will explore:
- Distributed system architectures (Client-Server, Peer-to-Peer, Workstation-Server, Processor Pool)
- Transparency in distributed systems (8 types)
- Design principles for distributed systems

---

### Further Reading

- Tanenbaum & Van Steen: "Distributed Systems: Principles and Paradigms"
- [CAP Theorem Explained](https://www.ibm.com/cloud/learn/cap-theorem)
- [Facebook's Maelstrom System](https://engineering.fb.com/2018/06/20/core-data/maelstrom/)
- [Scale-Out vs Scale-Up](https://www.mongodb.com/basics/scaling)

**Course**: BMCS3003 - Distributed Systems and Parallel Computing  