# Module 02: Transparency and System Architectures

**Difficulty**: ⭐⭐⭐  
**Estimated Time**: 90 minutes  
**Prerequisites**: Module 01 - Introduction to Distributed Systems

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Define** and **explain** the 8 types of transparency in distributed systems
2. **Compare** different distributed system architectures (Workstation-Server, Processor Pool)
3. **Implement** transparency mechanisms in Python
4. **Analyze** transparency trade-offs in real-world systems
5. **Design** distributed applications with appropriate transparency levels

---

## 1. What is Transparency?

### Definition

> **Transparency means hiding the details of distribution from users and applications, making the distributed system appear as a single coherent system.**

### Goal

**Reduce the burden on developers** so they can focus on business logic rather than dealing with the complexity of distribution.

### The 8 Types of Transparency

| Type | Description | Example |
|------|-------------|----------|
| **Access** | Local and remote resources accessed with same operations | `file.read()` works for local and network files |
| **Location** | Resources accessed without knowledge of their location | URL instead of physical server IP |
| **Concurrency** | Multiple processes can use shared objects without interference | Database transactions |
| **Replication** | Multiple copies of resources without visible effects | CDN content copies |
| **Failure** | Faults concealed, system continues despite failures | Auto-failover to backup server |
| **Migration** | Resources can move without affecting operations | Live VM migration |
| **Performance** | System reconfigures to improve performance as load varies | Auto-scaling |
| **Scaling** | System expands without changing structure/algorithms | Add nodes without code changes |

In [None]:
# Setup
import numpy as np
import matplotlib.pyplot as plt
import time
import random
from datetime import datetime
import threading

%matplotlib inline
np.random.seed(42)

## 2. Access Transparency

**Definition**: Local and remote objects accessed with the same operations.

**Benefit**: Developers don't need different code for local vs remote resources.
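
Python's standard library offers a small real instance of this idea: `urllib.request.urlopen` accepts several URL schemes through one call. The sketch below is illustrative only. `read_resource` is a hypothetical helper, and a `data:` URL stands in for a remote resource so that the example runs without a network connection.

```python
import tempfile
import urllib.request
from pathlib import Path

def read_resource(url):
    """Read a resource by URL; the caller uses one operation for every scheme."""
    with urllib.request.urlopen(url) as response:
        return response.read()

# Create a local file and address it with a file:// URL
with tempfile.TemporaryDirectory() as tmp:
    local_path = Path(tmp) / "document.txt"
    local_path.write_text("hello from disk")
    local_content = read_resource(local_path.as_uri())

# A data: URL stands in for a remote resource (assumption: no network available)
inline_content = read_resource("data:text/plain,hello%20inline")

print(local_content, inline_content)
```

The calling code is identical for both resources; only the URL differs. An `http://` or `https://` URL would go through the same function, at the cost of network latency and possible network errors.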

In [None]:
class TransparentFileSystem:
    """
    Demonstrates access transparency - same interface for local and remote files.
    """
    
    def read_file(self, path):
        """
        Read file from local or remote location transparently.
        User doesn't need to know if file is local or remote.
        """
        if path.startswith(('http://', 'https://')):
            # Remote file - simulate network fetch
            print(f"📡 Fetching remote file: {path}")
            time.sleep(0.5)  # Simulate network latency
            return f"[Remote content from {path}]"
        else:
            # Local file
            print(f"📁 Reading local file: {path}")
            return f"[Local content from {path}]"
    
    def write_file(self, path, content):
        """
        Write file to local or remote location transparently.
        """
        if path.startswith(('http://', 'https://')):
            print(f"📤 Writing to remote file: {path}")
            time.sleep(0.5)  # Simulate network latency
            return f"Written to remote: {path}"
        else:
            print(f"💾 Writing to local file: {path}")
            return f"Written to local: {path}"

# Demonstrate access transparency
fs = TransparentFileSystem()

print("=" * 60)
print("ACCESS TRANSPARENCY DEMONSTRATION")
print("=" * 60)
print("\n👤 User code (same for local and remote files):\n")

# User doesn't need to know if files are local or remote
files = [
    '/home/user/document.txt',
    'https://example.com/data/remote_file.txt'
]

for file_path in files:
    content = fs.read_file(file_path)
    print(f"   Content: {content}\n")

print("✅ Same interface used for both local and remote files!")
print("   This is ACCESS TRANSPARENCY in action.")

## 3. Location Transparency

**Definition**: Resources can be accessed without knowledge of their physical location.

**Example**: Domain names instead of IP addresses, database connection strings.
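
Name resolution is the everyday form of location transparency: the client holds a name, and the resolver supplies the current address. The sketch below uses the standard `socket.getaddrinfo` call. `resolve` is a hypothetical helper, and the example assumes the name `localhost` is resolvable on the machine running it.

```python
import socket

def resolve(hostname, port):
    """Return the distinct IP addresses the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})

addresses = resolve("localhost", 8080)
print(f"'localhost' currently resolves to: {addresses}")
```

If the service behind a name moves to another machine, only the resolver's records change; code that calls `resolve` with the same name stays as it is.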

In [None]:
class LocationTransparentService:
    """
    Demonstrates location transparency using service discovery.
    """
    
    def __init__(self):
        # Service registry maps logical names to physical locations
        self.service_registry = {
            'user-service': ['192.168.1.10:8080', '192.168.1.11:8080', '192.168.1.12:8080'],
            'payment-service': ['10.0.0.5:9000', '10.0.0.6:9000'],
            'email-service': ['172.16.0.20:7000']
        }
    
    def call_service(self, service_name, method, data):
        """
        Call a service by logical name, not physical location.
        System automatically finds the actual server.
        """
        if service_name not in self.service_registry:
            return f"❌ Service '{service_name}' not found"
        
        # Load balancing: pick a random server from available instances
        available_servers = self.service_registry[service_name]
        selected_server = random.choice(available_servers)
        
        print(f"🔍 Service: {service_name}")
        print(f"📍 Routed to: {selected_server}")
        print(f"⚙️  Method: {method}({data})")
        
        # Simulate service call
        time.sleep(0.1)
        return f"✅ Response from {service_name}"

# Demonstrate location transparency
print("=" * 60)
print("LOCATION TRANSPARENCY DEMONSTRATION")
print("=" * 60)
print("\n👤 Application developer doesn't know/care about server IPs:\n")

service = LocationTransparentService()

# Application code uses logical service names
requests = [
    ('user-service', 'getUser', {'id': 123}),
    ('payment-service', 'processPayment', {'amount': 99.99}),
    ('user-service', 'updateProfile', {'id': 123, 'name': 'Alice'})
]

for service_name, method, data in requests:
    result = service.call_service(service_name, method, data)
    print(f"   {result}\n")

print("✅ Services accessed by logical name, not IP address!")
print("   Servers can move to different IPs without affecting clients.")

## 4. Failure Transparency

**Definition**: Faults are concealed so applications can continue without knowledge that a fault occurred.

**Mechanism**: Automatic failover, retry logic, circuit breakers.
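
Of the mechanisms listed, retry logic is the simplest to sketch. The example below is a minimal illustration, not a production pattern: `call_with_retry` and `FlakyService` are hypothetical names, the delays are arbitrary, and only `ConnectionError` is treated as a transient fault.

```python
import time

def call_with_retry(operation, max_attempts=3, base_delay=0.01):
    """Call operation(); on ConnectionError wait and retry, doubling the delay."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: the fault can no longer be hidden
            time.sleep(base_delay * 2 ** attempt)

class FlakyService:
    """Hypothetical service that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated transient fault")
        return "payload"

flaky = FlakyService(failures_before_success=2)
result = call_with_retry(flaky.fetch)
print(result, "after", flaky.calls, "calls")
```

The caller receives `"payload"` and never sees the two failed attempts. When the attempts run out, the exception is re-raised, which marks the limit of what this kind of transparency can conceal.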

In [None]:
class FailureTransparentDatabase:
    """
    Database with automatic failover - failures hidden from application.
    """
    
    def __init__(self, num_replicas=3):
        self.replicas = {
            f'replica_{i}': {'healthy': True, 'data': {}}
            for i in range(num_replicas)
        }
        self.primary = 'replica_0'
    
    def _simulate_random_failure(self):
        """Randomly fail some healthy replicas to simulate real-world failures."""
        for replica_name, replica in self.replicas.items():
            # Only healthy replicas can fail; each has a 15% chance per call
            if replica['healthy'] and random.random() < 0.15:
                replica['healthy'] = False
                print(f"   ⚠️  {replica_name} has failed!")
    
    def _get_healthy_replica(self):
        """Find a healthy replica to serve requests."""
        # Try primary first
        if self.replicas[self.primary]['healthy']:
            return self.primary
        
        # Failover to backup
        for replica_name, replica in self.replicas.items():
            if replica['healthy']:
                print(f"   🔄 Failover: Using {replica_name} instead of {self.primary}")
                return replica_name
        
        return None
    
    def query(self, key):
        """
        Execute query with automatic failover.
        Application doesn't see failures - they're handled transparently.
        """
        max_retries = 3
        
        # Simulate faults that may have occurred since the previous query
        self._simulate_random_failure()
        
        for attempt in range(max_retries):
            # Find healthy replica
            replica_name = self._get_healthy_replica()
            
            if replica_name:
                # Query successful
                print(f"   ✅ Query successful on {replica_name}")
                return f"Data for '{key}' from {replica_name}"
            
            # All replicas down, retry
            print(f"   🔁 Retry {attempt + 1}/{max_retries}...")
            time.sleep(0.1)
        
        return "❌ All replicas unavailable"

# Demonstrate failure transparency
print("=" * 60)
print("FAILURE TRANSPARENCY DEMONSTRATION")
print("=" * 60)
print("\nSimulating multiple queries with random failures:\n")

db = FailureTransparentDatabase(num_replicas=3)

for i in range(5):
    print(f"\nQuery {i+1}:")
    result = db.query(f'key_{i}')
    print(f"   📦 Result: {result}")

print("\n✅ Application continues working despite individual replica failures!")
print("   This is FAILURE TRANSPARENCY - faults are hidden from the application.")

## 5. Replication Transparency

**Definition**: Multiple copies of resources can be created without applications seeing the effects of replication.

**Example**: A CDN distributes copies of content worldwide, but users see one consistent version.

In [None]:
class ReplicationTransparentCache:
    """
    Demonstrates replication transparency with automatic consistency.
    """
    
    def __init__(self, num_replicas=4):
        # One replica of the cache per region (this demo knows at most four regions)
        self.replica_locations = ['US-East', 'US-West', 'Europe', 'Asia'][:num_replicas]
        self.replicas = [{} for _ in self.replica_locations]
    
    def write(self, key, value):
        """
        Write to all replicas to maintain consistency.
        Application sees single write operation.
        """
        print(f"\n✍️  Writing '{key}' = '{value}'")
        
        for i, replica in enumerate(self.replicas):
            replica[key] = value
            print(f"   ✓ Replicated to {self.replica_locations[i]}")
        
        return f"Written: {key} = {value}"
    
    def read(self, key, user_location='US-East'):
        """
        Read from nearest replica for better performance.
        Application doesn't know multiple copies exist.
        """
        # Find nearest replica based on user location
        try:
            replica_index = self.replica_locations.index(user_location)
        except ValueError:
            replica_index = 0  # Default to first replica
        
        value = self.replicas[replica_index].get(key, 'Not found')
        print(f"\n📖 Reading '{key}' from {self.replica_locations[replica_index]}")
        print(f"   Value: {value}")
        
        return value
    
    def show_all_replicas(self):
        """Show internal state of all replicas (for educational purposes)."""
        print("\n📊 Internal State (all replicas):")
        for i, replica in enumerate(self.replicas):
            print(f"   {self.replica_locations[i]}: {replica}")

# Demonstrate replication transparency
print("=" * 60)
print("REPLICATION TRANSPARENCY DEMONSTRATION")
print("=" * 60)

cache = ReplicationTransparentCache()

# Application writes data (unaware of replication)
cache.write('user:123', 'Alice')
cache.write('user:456', 'Bob')

# Different users read from different locations
# They all get consistent data despite reading from different replicas
cache.read('user:123', user_location='US-East')
cache.read('user:123', user_location='Europe')
cache.read('user:456', user_location='Asia')

# Show that data is actually replicated
cache.show_all_replicas()

print("\n✅ Application sees single logical cache, not multiple replicas!")
print("   Data is consistent across all locations.")

## 6. Performance and Scaling Transparency

**Performance Transparency**: System reconfigures to improve performance as load varies.

**Scaling Transparency**: System expands without changing structure or algorithms.
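
One simple way to reason about such reconfiguration is a target-tracking rule: choose the smallest server count that keeps the average load per server at or below a target. The sketch below is a hypothetical helper, not the algorithm of any particular platform. The target of 80 and the bounds of 2 and 10 servers are assumptions chosen to match the demonstration in this section.

```python
import math

def desired_servers(total_load, target_load_per_server=80,
                    min_servers=2, max_servers=10):
    """Smallest server count that keeps average load at or below the target."""
    needed = math.ceil(total_load / target_load_per_server)
    # Clamp to the allowed range so the pool never vanishes or grows unbounded
    return max(min_servers, min(max_servers, needed))

for load in (50, 200, 300, 5000):
    print(f"total load {load:>4} -> {desired_servers(load)} servers")
```

Clients never call this function; the platform applies it behind the same service endpoint, which is what makes the scaling transparent.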

In [None]:
class AutoScalingLoadBalancer:
    """
    Demonstrates performance and scaling transparency with auto-scaling.
    """
    
    def __init__(self, initial_servers=2, max_servers=10):
        self.servers = [{'id': i, 'load': 0} for i in range(initial_servers)]
        self.max_servers = max_servers
        self.total_requests = 0
    
    def handle_request(self):
        """
        Handle incoming request with automatic load balancing and scaling.
        """
        self.total_requests += 1
        
        # Auto-scale first, using the load before this request is routed,
        # so the request is never assigned to a server that is then removed
        avg_load = sum(s['load'] for s in self.servers) / len(self.servers)
        
        if avg_load > 80 and len(self.servers) < self.max_servers:
            # Scale up
            new_server_id = len(self.servers)
            self.servers.append({'id': new_server_id, 'load': 0})
            print(f"   📈 Scaled UP: Added server {new_server_id} (avg load: {avg_load:.1f})")
        
        elif avg_load < 30 and len(self.servers) > 2:
            # Scale down (simplification: load left on the removed server is dropped)
            removed = self.servers.pop()
            print(f"   📉 Scaled DOWN: Removed server {removed['id']} (avg load: {avg_load:.1f})")
        
        # Route the request to the server with the lowest load
        server = min(self.servers, key=lambda s: s['load'])
        server['load'] += 1
        
        return server['id']
    
    def process_load(self, num_requests):
        """
        Simulate processing multiple requests.
        """
        print(f"\n🚀 Processing {num_requests} requests...")
        for _ in range(num_requests):
            self.handle_request()
        
        # Decrease load after processing
        for server in self.servers:
            server['load'] = max(0, server['load'] - num_requests // len(self.servers))
    
    def show_status(self):
        print(f"\n📊 Current Status:")
        print(f"   Total servers: {len(self.servers)}")
        print(f"   Total requests handled: {self.total_requests}")
        avg_load = sum(s['load'] for s in self.servers) / len(self.servers)
        print(f"   Average load: {avg_load:.1f}")

# Demonstrate auto-scaling
print("=" * 60)
print("PERFORMANCE & SCALING TRANSPARENCY DEMONSTRATION")
print("=" * 60)

lb = AutoScalingLoadBalancer(initial_servers=2)

# Simulate varying load
load_patterns = [
    (50, "Low traffic"),
    (200, "Peak traffic"),
    (300, "High peak"),
    (100, "Normal traffic"),
    (50, "Low traffic again")
]

for num_requests, description in load_patterns:
    print(f"\n{'='*50}")
    print(f"📈 Load Pattern: {description}")
    lb.process_load(num_requests)
    lb.show_status()

print("\n" + "="*60)
print("✅ System automatically scaled up/down based on load!")
print("   Application code didn't change - SCALING TRANSPARENCY!")

## 7. System Architectures

### 7.1 Workstation-Server Model

**Characteristics**:
- Powerful servers host services (file, database, web)
- Workstations are clients
- **"A server is a process, not a computer!"**

**Example**: Traditional web applications, file servers
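
The point that a server is a process rather than a computer can be made concrete by running a server and its client on the same machine. The sketch below uses the standard `socketserver` module. `EchoUpperHandler` is a hypothetical handler, a thread stands in for the server process, and the example assumes the loopback interface is available.

```python
import socket
import socketserver
import threading

class EchoUpperHandler(socketserver.StreamRequestHandler):
    """Hypothetical service: reply with the request line in upper case."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())

# Port 0 asks the operating system for any free port on the loopback interface
with socketserver.TCPServer(("127.0.0.1", 0), EchoUpperHandler) as server:
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # The client runs on the same computer as the server
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(b"hello server\n")
        with client.makefile("rb") as stream:
            reply = stream.readline()

    server.shutdown()

print(reply)
```

Client and server share one computer here, yet the roles are still distinct: one side offers a service and the other requests it.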

### 7.2 Processor Pool Model

**Characteristics**:
- Pool of processors available for allocation
- Processors assigned dynamically based on demand
- **Grid Computing** is based on this model
- Can span multiple organizations

**Example**: Cloud computing, grid computing, serverless
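
A worker pool in the standard library behaves like a small processor pool: jobs are submitted to the pool, and the pool decides which worker runs each one. The sketch below is an analogy only. Threads stand in for processors, and `run_job` is a hypothetical job.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_job(n):
    """Hypothetical job: return its result and the name of the worker that ran it."""
    return n * n, threading.current_thread().name

# Submit six jobs to a pool of three workers; the caller never picks a worker
with ThreadPoolExecutor(max_workers=3, thread_name_prefix="worker") as pool:
    outcomes = list(pool.map(run_job, range(6)))

results = [value for value, _ in outcomes]
workers_used = {worker for _, worker in outcomes}

print("results:", results)
print("workers used:", sorted(workers_used))
```

Which worker handles which job can differ from run to run, but the results come back in submission order regardless of the assignment.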

In [None]:
# Visualize different architectures
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Workstation-Server Model
ax1.set_xlim(0, 10)
ax1.set_ylim(0, 10)
ax1.axis('off')
ax1.set_title('Workstation-Server Model', fontsize=14, fontweight='bold')

# Servers
server_positions = [(5, 8), (3, 6.5), (7, 6.5)]
server_labels = ['Web Server', 'DB Server', 'File Server']
for (x, y), label in zip(server_positions, server_labels):
    circle = plt.Circle((x, y), 0.8, color='#E63946', alpha=0.7)
    ax1.add_patch(circle)
    ax1.text(x, y, label, ha='center', va='center', fontsize=9, fontweight='bold', color='white')

# Clients/Workstations
client_positions = [(2, 3), (4, 3), (6, 3), (8, 3), (3, 1), (7, 1)]
for i, (x, y) in enumerate(client_positions):
    rect = plt.Rectangle((x-0.4, y-0.3), 0.8, 0.6, color='#06A77D', alpha=0.7)
    ax1.add_patch(rect)
    ax1.text(x, y, f'WS{i+1}', ha='center', va='center', fontsize=8, color='white')
    # Connection lines
    if i < 3:
        ax1.plot([x, server_positions[0][0]], [y+0.3, server_positions[0][1]-0.8], 
                'k--', alpha=0.3, linewidth=1)

ax1.text(5, 0.3, 'Clients connect to dedicated servers', ha='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Processor Pool Model
ax2.set_xlim(0, 10)
ax2.set_ylim(0, 10)
ax2.axis('off')
ax2.set_title('Processor Pool Model (Grid Computing)', fontsize=14, fontweight='bold')

# Pool of processors
pool_positions = [(2, 7), (3.5, 7), (5, 7), (6.5, 7), (8, 7),
                  (2, 5.5), (3.5, 5.5), (5, 5.5), (6.5, 5.5), (8, 5.5)]
for i, (x, y) in enumerate(pool_positions):
    square = plt.Rectangle((x-0.4, y-0.4), 0.8, 0.8, color='#457B9D', alpha=0.6)
    ax2.add_patch(square)
    ax2.text(x, y, f'P{i+1}', ha='center', va='center', fontsize=7, color='white')

# Pool label
ax2.text(5, 8.5, 'Processor Pool', ha='center', fontsize=12, fontweight='bold',
         bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))

# Terminals
terminal_positions = [(2, 2), (5, 2), (8, 2)]
for i, (x, y) in enumerate(terminal_positions):
    circle = plt.Circle((x, y), 0.5, color='#06A77D', alpha=0.7)
    ax2.add_patch(circle)
    ax2.text(x, y, f'T{i+1}', ha='center', va='center', fontsize=9, color='white', fontweight='bold')
    # Dynamic allocation lines
    for px, py in pool_positions[i*3:(i+1)*3]:
        ax2.plot([x, px], [y+0.5, py-0.4], 'r--', alpha=0.2, linewidth=1.5)

ax2.text(5, 0.5, 'Processors dynamically allocated to terminals', ha='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.show()

print("📊 Architecture Comparison:\n")
print("Workstation-Server:")
print("  ✓ Fixed server roles")
print("  ✓ Clients know which server to contact")
print("  ✓ Example: Traditional web apps\n")

print("Processor Pool:")
print("  ✓ Dynamic resource allocation")
print("  ✓ Better resource utilization")
print("  ✓ Example: Cloud computing, serverless")

## 8. Transparency Trade-offs

### Not All Transparency is Good!

**Too much transparency can**:
1. **Hide important information** (e.g., network latency)
2. **Reduce performance** (e.g., consistency overhead)
3. **Increase complexity** (e.g., failure handling)
4. **Limit control** (e.g., can't optimize for specific cases)
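
The consistency overhead in point 2 can be illustrated with simple arithmetic. The sketch below assumes a client that contacts all replicas in parallel. The latency figures are hypothetical, and the region names are borrowed from the replication demonstration earlier in this notebook.

```python
# Hypothetical write latencies in milliseconds, seen from a client in US-East
replica_latency_ms = {"US-East": 2, "US-West": 70, "Europe": 90, "Asia": 180}

def sync_write_latency(latencies):
    """Write-all: the write completes only when the slowest replica confirms."""
    return max(latencies.values())

def async_write_latency(latencies):
    """Write-nearest: acknowledge after the fastest replica, propagate later."""
    return min(latencies.values())

sync_ms = sync_write_latency(replica_latency_ms)
async_ms = async_write_latency(replica_latency_ms)
print(f"write-all: {sync_ms} ms, write-nearest: {async_ms} ms")
```

Under these assumptions, fully hiding replication behind a write-all policy costs every write the latency of the farthest replica, while acknowledging early is faster but lets readers elsewhere briefly see stale data.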

### Finding the Right Balance

In [None]:
# Compare different transparency levels
transparency_levels = ['None', 'Partial', 'Full']
metrics = {
    'Development\nComplexity': [8, 5, 3],
    'Performance\nVisibility': [10, 6, 2],
    'System\nControl': [10, 7, 3],
    'Ease of\nUse': [3, 7, 10],
    'Maintenance\nCost': [7, 5, 8]
}

fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.ravel()

colors = ['#E63946', '#F1FAEE', '#06A77D']

for idx, (metric_name, values) in enumerate(metrics.items()):
    ax = axes[idx]
    bars = ax.bar(transparency_levels, values, color=colors)
    ax.set_ylabel('Score (0-10)', fontsize=10)
    ax.set_title(metric_name, fontsize=11, fontweight='bold')
    ax.set_ylim(0, 12)
    ax.grid(axis='y', alpha=0.3)
    
    # Add value labels
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{int(height)}',
                ha='center', va='bottom', fontsize=10, fontweight='bold')

# Summary in last subplot
axes[-1].axis('off')
summary_text = """
🎯 KEY INSIGHTS:

No Transparency:
• Full control and visibility
• Complex development
• Better performance tuning

Partial Transparency:
• Balanced approach ⭐
• Hide complexity, expose essentials
• Most practical for real systems

Full Transparency:
• Easiest for developers
• Hidden performance costs
• Less control over optimization
"""
axes[-1].text(0.1, 0.9, summary_text, transform=axes[-1].transAxes,
              fontsize=10, verticalalignment='top',
              bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))

plt.tight_layout()
plt.show()

print("\n💡 Best Practice: Use PARTIAL transparency")
print("   Hide unnecessary complexity, but expose important details like:")
print("   • Network latency")
print("   • Consistency guarantees")
print("   • Resource costs")

## 9. Exercises

### Exercise 1: Implement Migration Transparency

**Task**: Create a system that allows live migration of services between servers without clients noticing.

**Requirements**:
1. Implement a service that can migrate between servers
2. Maintain active connections during migration
3. Update service registry automatically
4. Clients continue working without interruption

In [None]:
class MigrationTransparentService:
    def __init__(self):
        # TODO: Implement service migration
        pass
    
    def migrate_service(self, service_name, from_server, to_server):
        # TODO: Implement live migration
        pass

# TODO: Test your implementation


### Exercise 2: Concurrency Transparency

**Task**: Implement a distributed counter that multiple processes can increment concurrently without interference.

**Requirements**:
1. Multiple threads increment counter simultaneously
2. No race conditions (use locking)
3. Final count should be correct
4. Demonstrate transparency - threads don't coordinate manually

In [None]:
import threading

class ConcurrencyTransparentCounter:
    def __init__(self):
        # TODO: Implement thread-safe counter
        self.value = 0
        # Add locking mechanism
    
    def increment(self):
        # TODO: Thread-safe increment
        pass

# TODO: Test with multiple threads


### Exercise 3: Design Transparency Levels

**Task**: For each scenario below, decide which types of transparency are needed and which should be avoided. Explain your reasoning.

**Scenarios**:

1. **Online Banking System**:
   - Which transparencies? Why?
   - Which to avoid? Why?

2. **Video Streaming Service (like Netflix)**:
   - Which transparencies? Why?
   - Which to avoid? Why?

3. **Real-time Stock Trading Platform**:
   - Which transparencies? Why?
   - Which to avoid? Why?

**Your Analysis**:

### 1. Online Banking System

**Needed Transparencies**:
- 
- 

**Avoid/Limit**:
- 
- 

**Reasoning**:


### 2. Video Streaming Service

**Needed Transparencies**:
- 
- 

**Avoid/Limit**:
- 
- 

**Reasoning**:


### 3. Stock Trading Platform

**Needed Transparencies**:
- 
- 

**Avoid/Limit**:
- 
- 

**Reasoning**:



## 10. Summary

In this module, you learned:

✅ **The 8 Types of Transparency**:
   1. Access - Same interface for local/remote
   2. Location - Access without knowing location
   3. Concurrency - Shared access without interference
   4. Replication - Multiple copies hidden
   5. Failure - Faults concealed
   6. Migration - Resources can move
   7. Performance - Auto-optimization
   8. Scaling - Expand without code changes

✅ **System Architectures**:
   - Workstation-Server Model
   - Processor Pool Model (Grid Computing)

✅ **Transparency Trade-offs**:
   - Too much transparency hides important info
   - Partial transparency is often best
   - Balance ease-of-use with control

### Key Takeaways

1. **Transparency simplifies development** but has costs
2. **Different systems need different transparencies**
3. **Not all transparency is good** - sometimes you need visibility
4. **Architecture choice affects** which transparencies are easy to achieve

### What's Next?

In **Module 03**, we'll explore:
- Real-time Systems characteristics
- Controlling vs Controlled systems
- Time-critical requirements
- Distributed real-time systems

---

**Course**: BMCS3003 - Distributed Systems and Parallel Computing