# Setup - TinyTorch System Configuration

Welcome to TinyTorch! This module configures your development environment and establishes professional ML engineering practices.

## Learning Goals
- Configure personal developer identification for your TinyTorch installation
- Query system information for hardware-aware ML development
- Master the NBGrader workflow: implement → test → export
- Build functions that integrate into your tinytorch package

## Why Configuration Matters in ML Systems
Every production ML system needs proper configuration:
- **Developer attribution**: Professional identification and contact info
- **System awareness**: Understanding hardware limitations and capabilities
- **Reproducibility**: Documenting exact environment for experiment tracking
- **Debugging support**: System specs help troubleshoot performance issues

You'll learn to build ML systems that understand their environment and identify their creators.
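
As a small illustration of the reproducibility point above, the following sketch records a snapshot of the current environment as JSON using only the standard library. The function names `environment_snapshot` and `save_snapshot` are hypothetical helpers for this example and are not part of TinyTorch.

```python
import json
import platform
import sys


def environment_snapshot() -> dict:
    """Collect a small, JSON-serializable description of the current environment."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.system(),
        "architecture": platform.machine(),
        "executable": sys.executable,
    }


def save_snapshot(path: str) -> dict:
    """Write the snapshot to `path` so it can be stored next to experiment results."""
    snapshot = environment_snapshot()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(snapshot, f, indent=2, sort_keys=True)
    return snapshot


# Print the snapshot instead of writing a file, so running this has no side effects
print(json.dumps(environment_snapshot(), indent=2, sort_keys=True))
```

Saving a file like this alongside each experiment is one simple way to document the environment a result came from.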

In [None]:
#| default_exp core.setup

#| export
import sys
import platform
import psutil
import os
from typing import Dict, Any

In [None]:
print("🔥 TinyTorch Setup Module")
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
print(f"Platform: {platform.system()}")
print("Ready to configure your TinyTorch installation!\n")

# Display configuration workflow
print("Configuration Workflow:")
print("Personal Information → System Information → Complete")
print("")

## Personal Information Configuration

### The 5 C's Framework
Before we implement, let's understand what we're building through our 5 C's approach:

#### Concept

What is Personal Information Configuration?
Personal information identifies you as the creator of ML systems. Every professional system needs proper attribution - just like Git commits have author info, your TinyTorch installation needs your identity.

#### Code Structure

What We're Building:
```python
def personal_info() -> Dict[str, str]:     # Returns developer identity
    return {                               # Dictionary with required fields
        'developer': 'Your Name',         # Your actual name
        'email': 'your@domain.com',       # Contact information
        'institution': 'Your Place',      # Affiliation
        'system_name': 'YourName-Dev',    # Unique system identifier
        'version': '1.0.0'                # Configuration version
    }
```

#### Connections

Real-World Equivalents:
- **Git commits**: Author name and email in every commit
- **Docker images**: Maintainer information in container metadata
- **Python packages**: Author info in setup.py and pyproject.toml
- **ML model cards**: Creator information for model attribution

#### Constraints

Key Implementation Requirements:
- Use your actual information (not placeholder text)
- Email must contain @ and domain
- System name should be unique and descriptive
- All values must be strings; keep version as '1.0.0'

#### Context

**You're establishing your professional identity in the ML systems world.**
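
The constraints above can be checked programmatically. The sketch below is one possible way to do so; `find_personal_info_problems` is a hypothetical helper written for this example, and its email check is deliberately loose (it only looks for a non-empty local part, an `@`, and a dot in the domain).

```python
from typing import Dict, List

REQUIRED_KEYS = ("developer", "email", "institution", "system_name", "version")


def find_personal_info_problems(info: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the dict looks valid."""
    problems = []

    # Every required key must be present and hold a non-empty string
    for key in REQUIRED_KEYS:
        value = info.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"'{key}' must be a non-empty string")

    # Loose email check: something before '@' and a dot somewhere in the domain
    email = info.get("email")
    if isinstance(email, str) and email:
        local, sep, domain = email.partition("@")
        if not (local and sep and "." in domain):
            problems.append("'email' must look like name@domain.tld")

    # The module fixes the configuration version for now
    if info.get("version") != "1.0.0":
        problems.append("'version' must be '1.0.0'")

    return problems
```

Returning a list of problems rather than raising on the first failure lets you report everything that needs fixing in one pass.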

In [None]:
#| export
def personal_info() -> Dict[str, str]:
    """
    Return personal information for this TinyTorch installation.
    
    This function configures your personal TinyTorch installation with your identity.
    It's the foundation of proper ML engineering practices - every system needs
    to know who built it and how to contact them.
    
    TODO: Implement personal information configuration.
    
    STEP-BY-STEP IMPLEMENTATION:
    1. Create a dictionary with your personal details
    2. Include all required keys: developer, email, institution, system_name, version
    3. Use your actual information (not placeholder text)
    4. Make system_name unique and descriptive
    5. Keep version as '1.0.0' for now
    
    Returns:
        Dict[str, str]: Personal configuration with developer identity
    """
    ### BEGIN SOLUTION
    return {
        'developer': 'Student Name',
        'email': 'student@university.edu',
        'institution': 'University Name',
        'system_name': 'StudentName-TinyTorch-Dev',
        'version': '1.0.0'
    }
    ### END SOLUTION

# Test and validate the personal_info function
def test_personal_info_comprehensive():
    """Comprehensive test for personal_info function."""
    print("🔬 Testing Personal Information Configuration...")
    
    # Test personal_info function
    personal = personal_info()
    
    # Test return type
    assert isinstance(personal, dict), "personal_info should return a dictionary"
    
    # Test required keys
    required_keys = ['developer', 'email', 'institution', 'system_name', 'version']
    for key in required_keys:
        assert key in personal, f"Dictionary should have '{key}' key"
    
    # Test non-empty values
    for key, value in personal.items():
        assert isinstance(value, str), f"Value for '{key}' should be a string"
        assert len(value) > 0, f"Value for '{key}' cannot be empty"
    
    # Test email format
    assert '@' in personal['email'], "Email should contain @ symbol"
    assert '.' in personal['email'], "Email should contain domain"
    
    # Test version format
    assert personal['version'] == '1.0.0', "Version should be '1.0.0'"
    
    # Test system name (should be unique/personalized)
    assert len(personal['system_name']) > 5, "System name should be descriptive"
    
    print("✅ All personal info tests passed!")
    print(f"✅ TinyTorch configured for: {personal['developer']}")
    print(f"✅ Contact: {personal['email']}")
    print(f"✅ System: {personal['system_name']}")
    return personal

# Run comprehensive test and display results
personal_config = test_personal_info_comprehensive()
print("\n" + "="*50)
print("✅ Personal Information Configuration COMPLETE")
print("="*50)

## System Information Collection

### The 5 C's Framework
Before we implement, let's understand what we're building through our 5 C's approach:

#### Concept

What is System Information Collection?
System information detection provides the hardware and software specs that ML systems need for performance optimization. Think of checking a computer's specifications before installing a game: ML code likewise needs to know what resources are available.

#### Code Structure

What We're Building:
```python
def system_info() -> Dict[str, Any]:       # Queries system specs
    return {                               # Hardware/software details
        'python_version': '3.9.7',        # Python compatibility
        'platform': 'Darwin',             # Operating system
        'architecture': 'arm64',          # CPU architecture
        'cpu_count': 8,                   # Parallel processing cores
        'memory_gb': 16.0                 # Total RAM in GB
    }
```

#### Connections

Real-World Equivalents:
- **PyTorch**: `torch.get_num_threads()` reports the intra-op thread count, which is typically derived from the CPU core count
- **TensorFlow**: `tf.config.list_physical_devices()` queries hardware
- **Scikit-learn**: `n_jobs=-1` uses all available CPU cores
- **MLflow**: Documents system environment for experiment reproducibility
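
To make the `n_jobs=-1` idea concrete, here is a minimal sketch of how a negative worker setting can be resolved against the detected CPU count. `resolve_worker_count` is a hypothetical helper for this example; it follows the joblib convention for negative values (`-1` means all CPUs, `-2` all but one), while capping positive values at the CPU count is an assumption made here for illustration.

```python
import os


def resolve_worker_count(n_jobs: int = -1) -> int:
    """Translate an n_jobs-style setting into a concrete number of workers."""
    n_cpus = os.cpu_count() or 1  # os.cpu_count() may return None

    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0")

    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on; never fewer than 1
        return max(1, n_cpus + 1 + n_jobs)

    # Assumption for this sketch: do not request more workers than CPUs
    return min(n_jobs, n_cpus)


print(f"Workers for n_jobs=-1: {resolve_worker_count(-1)}")
```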

#### Constraints

Key Implementation Requirements:
- Use actual system queries (not hardcoded values)
- Convert memory from bytes to GB for readability
- Round memory to 1 decimal place for clean output
- Return proper data types (strings, int, float)

#### Context

**You're building ML systems that adapt intelligently to their hardware environment.**
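
The byte-to-GB conversion is easy to get wrong in Python, because `^` is bitwise XOR rather than exponentiation. The sketch below works through the arithmetic; `bytes_to_gb` is a hypothetical helper for this example. Note that dividing by `1024 ** 3` strictly yields gibibytes (GiB), which this module labels as GB.

```python
BYTES_PER_GB = 1024 ** 3  # ** is exponentiation: 1,073,741,824 bytes


def bytes_to_gb(num_bytes: int) -> float:
    """Convert a byte count to GB (1024**3 bytes), rounded to 1 decimal place."""
    return round(num_bytes / BYTES_PER_GB, 1)


print(bytes_to_gb(17_179_869_184))  # 16.0

# Common pitfall: ^ is XOR, so this is NOT 1024 cubed
print(1024 ^ 3)  # 1027
```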

In [None]:
#| export
def system_info() -> Dict[str, Any]:
    """
    Query and return system information for this TinyTorch installation.
    
    This function gathers crucial hardware and software information that affects
    ML performance, compatibility, and debugging. It's the foundation of 
    hardware-aware ML systems.
    
    TODO: Implement system information queries.
    
    STEP-BY-STEP IMPLEMENTATION:
    1. Get Python version using sys.version_info
    2. Get platform using platform.system()
    3. Get architecture using platform.machine()
    4. Get CPU count using psutil.cpu_count()
    5. Get memory using psutil.virtual_memory().total
    6. Convert memory from bytes to GB (divide by 1024**3)
    7. Return all information in a dictionary
    
    EXAMPLE OUTPUT:
    {
        'python_version': '3.9.7',
        'platform': 'Darwin', 
        'architecture': 'arm64',
        'cpu_count': 8,
        'memory_gb': 16.0
    }
    
    IMPLEMENTATION HINTS:
    - Use f-string formatting for Python version: f"{major}.{minor}.{micro}"
    - Memory conversion: bytes / (1024**3) = GB
    - Round memory to 1 decimal place for readability
    - Make sure data types are correct (strings for text, int for cpu_count, float for memory_gb)
    
    LEARNING CONNECTIONS:
    - This is like `torch.cuda.is_available()` in PyTorch
    - Similar to system info in MLflow experiment tracking
    - Parallels hardware detection in TensorFlow
    - Foundation for performance optimization in ML systems
    
    PERFORMANCE IMPLICATIONS:
    - cpu_count affects parallel processing capabilities
    - memory_gb determines maximum model and batch sizes
    - platform affects file system and process management
    - architecture determines which vector instructions and optimized kernels are available
    """
    ### BEGIN SOLUTION
    # Get Python version
    version_info = sys.version_info
    python_version = f"{version_info.major}.{version_info.minor}.{version_info.micro}"
    
    # Get platform information
    platform_name = platform.system()
    architecture = platform.machine()
    
    # Get CPU information (logical cores; psutil returns None if undetermined)
    cpu_count = psutil.cpu_count() or 1
    
    # Get memory information (convert bytes to GB)
    memory_bytes = psutil.virtual_memory().total
    memory_gb = round(memory_bytes / (1024**3), 1)
    
    return {
        'python_version': python_version,
        'platform': platform_name,
        'architecture': architecture,
        'cpu_count': cpu_count,
        'memory_gb': memory_gb
    }
    ### END SOLUTION

### 🧪 Unit Test: System Information Query

This test validates your `system_info()` function implementation, ensuring it accurately detects and reports hardware and software specifications for performance optimization and debugging.

In [None]:
def test_unit_system_info_basic():
    """Test system_info function implementation."""
    print("🔬 Unit Test: System Information...")
    
    # Test system_info function
    sys_info = system_info()
    
    # Test return type
    assert isinstance(sys_info, dict), "system_info should return a dictionary"
    
    # Test required keys
    required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']
    for key in required_keys:
        assert key in sys_info, f"Dictionary should have '{key}' key"
    
    # Test data types
    assert isinstance(sys_info['python_version'], str), "python_version should be string"
    assert isinstance(sys_info['platform'], str), "platform should be string"
    assert isinstance(sys_info['architecture'], str), "architecture should be string"
    assert isinstance(sys_info['cpu_count'], int), "cpu_count should be integer"
    assert isinstance(sys_info['memory_gb'], (int, float)), "memory_gb should be number"
    
    # Test reasonable values
    assert sys_info['cpu_count'] > 0, "CPU count should be positive"
    assert sys_info['memory_gb'] > 0, "Memory should be positive"
    assert len(sys_info['python_version']) > 0, "Python version should not be empty"
    
    # Test that values are actually queried (not hardcoded)
    actual_version = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"
    assert sys_info['python_version'] == actual_version, "Python version should match actual system"
    
    print("✅ System info function tests passed!")
    print(f"✅ Python: {sys_info['python_version']} on {sys_info['platform']}")

# Run the test
test_unit_system_info_basic()

## 🖥️ ML Systems Foundation: Hardware Awareness & Resource Planning

Now that you've implemented basic system detection, let's build **ML systems engineering intuition**. This section introduces you to thinking like an ML systems engineer - understanding how hardware affects ML capabilities and learning to plan resources for real-world deployments.

### **Learning Outcome**: *"I understand how my hardware affects ML capabilities"*

---

## Systems Analysis Tools (Review & Understand)

As an ML systems engineer, you need tools to analyze hardware capabilities and estimate what models you can realistically run. Below is a small analysis toolkit built on rough rules of thumb - **your job is to run it, understand the output, and develop intuition** about hardware-ML relationships.

In [None]:
#| export
import time
import numpy as np

class MLSystemAnalyzer:
    """
    Professional ML systems analysis toolkit.
    
    This class provides tools to analyze hardware capabilities and estimate
    what ML workloads your system can handle. Used by ML engineers to plan
    deployments and understand system limitations.
    """
    
    def __init__(self):
        self.sys_info = system_info()
        self.analysis_cache = {}
        
    def analyze_ml_capabilities(self):
        """
        Analyze this system's ML capabilities and provide professional estimates.
        
        Returns comprehensive analysis of what this hardware can handle for ML workloads.
        Based on industry rules of thumb and production experience.
        """
        memory_gb = self.sys_info['memory_gb']
        cpu_cores = self.sys_info['cpu_count']
        
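        # Rule of thumb: 1M float32 parameters ≈ 4 MB (4 bytes per parameter)
        # Reserve 25% of RAM for weights; the rest is left for gradients,
        # optimizer state, and system overhead (see memory_allocation below)
        bytes_for_weights = memory_gb * 0.25 * 1024**3
        max_model_params = int(bytes_for_weights / 4)  # 4 bytes per float32 parameter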
        
        # Batch size recommendations based on memory
        small_batch = max(1, int(memory_gb * 2))      # 2 samples per GB
        medium_batch = max(1, int(memory_gb * 8))     # 8 samples per GB  
        large_batch = max(1, int(memory_gb * 32))     # 32 samples per GB
        
        # Training time estimates (very rough)
        epochs_per_hour_estimate = max(1, cpu_cores * 2)
        
        analysis = {
            'system_class': self._classify_system(memory_gb, cpu_cores),
            'max_model_parameters': max_model_params,
            'recommended_batch_sizes': {
                'conservative': small_batch,
                'balanced': medium_batch, 
                'aggressive': large_batch
            },
            'estimated_training_speed': f"~{epochs_per_hour_estimate} epochs/hour",
            'memory_allocation': {
                'model_weights': f"{memory_gb * 0.25:.1f} GB",
                'gradients': f"{memory_gb * 0.25:.1f} GB", 
                'optimizer_state': f"{memory_gb * 0.25:.1f} GB",
                'system_overhead': f"{memory_gb * 0.25:.1f} GB"
            },
            'production_readiness': self._assess_production_readiness(memory_gb, cpu_cores)
        }
        
        return analysis
    
    def compare_with_famous_models(self):
        """
        Compare your system with requirements for famous ML models.
        
        Helps students understand what they can realistically run and what
        requires cloud resources or specialized hardware.
        """
        max_params = self.analyze_ml_capabilities()['max_model_parameters']
        
        # memory_needed_gb is the weights-only float32 footprint:
        # parameters * 4 bytes, expressed in decimal GB
        famous_models = {
            'Tiny Model (Educational)': {
                'parameters': 100_000,
                'memory_needed_gb': 0.0004,  # 0.4 MB
                'example': 'Simple MNIST classifier'
            },
            'Small Model (Prototype)': {
                'parameters': 1_000_000,
                'memory_needed_gb': 0.004,  # 4 MB
                'example': 'Small ResNet, basic NLP model'
            },
            'Medium Model (Research)': {
                'parameters': 10_000_000,
                'memory_needed_gb': 0.04,  # 40 MB
                'example': 'ResNet-18-scale CNN, small transformer'
            },
            'Large Model (Production)': {
                'parameters': 100_000_000,
                'memory_needed_gb': 0.4,  # 400 MB
                'example': 'Large transformer, computer vision production'
            },
            'GPT-3 (175B)': {
                'parameters': 175_000_000_000,
                'memory_needed_gb': 700,  # 700 GB
                'example': 'OpenAI GPT-3, requires massive clusters'
            },
            'GPT-4 (Estimated 1.8T)': {
                'parameters': 1_800_000_000_000,  # unconfirmed public estimate
                'memory_needed_gb': 7_200,  # 7.2 TB
                'example': 'OpenAI GPT-4, cutting-edge research'
            }
        }
        
        analysis = {}
        for model_name, specs in famous_models.items():
            can_run = specs['parameters'] <= max_params
            analysis[model_name] = {
                'can_run': can_run,
                'parameters': f"{specs['parameters']:,}",
                'memory_needed': f"{specs['memory_needed_gb']:,g} GB",
                'example': specs['example'],
                'verdict': '✅ Can run' if can_run else '❌ Need cloud/cluster'
            }
            
        return analysis
    
    def estimate_cloud_costs(self, model_parameters, training_hours=24):
        """
        Estimate cloud costs for training models that don't fit on local hardware.
        
        Helps students understand the economics of ML systems and why optimization matters.
        """
        memory_needed_gb = model_parameters * 4 / (1024**3)  # float32 weights only: 4 bytes per parameter, bytes to GB
        
        # Rough AWS pricing (changes frequently, this is educational)
        if memory_needed_gb <= 32:
            instance_type = "m5.2xlarge"
            cost_per_hour = 0.384
        elif memory_needed_gb <= 64:
            instance_type = "m5.4xlarge" 
            cost_per_hour = 0.768
        elif memory_needed_gb <= 128:
            instance_type = "m5.8xlarge"
            cost_per_hour = 1.536
        else:
            instance_type = "p3.8xlarge (multi-GPU instance)"
            cost_per_hour = 12.24
            
        total_cost = cost_per_hour * training_hours
        
        return {
            'recommended_instance': instance_type,
            'cost_per_hour': f"${cost_per_hour:.2f}",
            'total_cost_24h': f"${total_cost:.2f}",
            'memory_needed': f"{memory_needed_gb:.1f} GB",
            # Illustrative ratio only: divides by an assumed $100 baseline, not a measured local cost
            'cost_comparison': f"{total_cost/100:.1f}x more than local development"
        }
    
    def _classify_system(self, memory_gb, cpu_cores):
        """Classify the system type for ML workloads."""
        if memory_gb >= 64 and cpu_cores >= 16:
            return "High-end workstation"
        elif memory_gb >= 16 and cpu_cores >= 8:
            return "Development machine"
        elif memory_gb >= 8 and cpu_cores >= 4:
            return "Basic laptop"
        else:
            return "Limited system"
            
    def _assess_production_readiness(self, memory_gb, cpu_cores):
        """Assess if system is suitable for different types of ML work."""
        if memory_gb >= 32:
            return "Production prototyping capable"
        elif memory_gb >= 16:
            return "Research and development ready"
        elif memory_gb >= 8:
            return "Educational and small experiments"
        else:
            return "Very limited ML capabilities"

### 🎯 Learning Activity 1: Hardware Discovery (Review & Understand)

**Goal**: Understand your system's ML capabilities and develop hardware intuition.

Run the systems analysis tools below and **interpret the results**. Your job is to understand what the numbers mean for ML development.
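
To help interpret the parameter and memory figures, the sketch below works through the underlying arithmetic for float32 models. The helper names are hypothetical and written for this example. The training estimate assumes one copy each of weights and gradients plus two optimizer slots per parameter (as Adam keeps), and it ignores activations, which can dominate at large batch sizes.

```python
BYTES_PER_FLOAT32 = 4
BYTES_PER_GB = 1024 ** 3


def weights_memory_gb(num_parameters: int) -> float:
    """Memory for the weights alone, stored as float32."""
    return num_parameters * BYTES_PER_FLOAT32 / BYTES_PER_GB


def training_memory_gb(num_parameters: int, optimizer_slots: int = 2) -> float:
    """Rough training footprint: weights + gradients + optimizer slots (activations excluded)."""
    copies = 1 + 1 + optimizer_slots  # weights, gradients, optimizer state
    return copies * weights_memory_gb(num_parameters)


for label, params in [("1M", 1_000_000), ("100M", 100_000_000), ("175B", 175_000_000_000)]:
    print(
        f"{label:>5} params: weights {weights_memory_gb(params):,.3f} GB, "
        f"training {training_memory_gb(params):,.3f} GB"
    )
```

For example, a 100M-parameter model needs roughly 0.37 GB for its weights, and about four times that once gradients and Adam-style optimizer state are included.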

In [None]:
# Initialize the ML systems analyzer
analyzer = MLSystemAnalyzer()

# Analyze your system's ML capabilities  
print("🖥️  ML SYSTEMS ANALYSIS: Your Hardware Capabilities")
print("=" * 60)

capabilities = analyzer.analyze_ml_capabilities()

print(f"🏷️  System Classification: {capabilities['system_class']}")
print(f"🧠  Max Model Parameters: {capabilities['max_model_parameters']:,}")
print(f"⚡  Production Readiness: {capabilities['production_readiness']}")
print(f"🏃  Estimated Training Speed: {capabilities['estimated_training_speed']}")

print(f"\n📊 Recommended Batch Sizes:")
for level, size in capabilities['recommended_batch_sizes'].items():
    print(f"   {level.capitalize()}: {size}")

print(f"\n💾 Memory Allocation Breakdown:")
for component, allocation in capabilities['memory_allocation'].items():
    print(f"   {component.replace('_', ' ').title()}: {allocation}")

print("\n" + "=" * 60)
print("💡 SYSTEMS INSIGHT: These numbers determine what you can realistically build!")
print("   - Model parameters directly affect memory usage")  
print("   - Batch size affects training speed and memory")
print("   - Your hardware constrains your ML possibilities")

### 🎯 Learning Activity 2: Compare with Famous Models (Review & Understand)

**Goal**: Understand how your system compares to real-world ML models and why cloud computing matters.

In [None]:
# Compare your system with famous ML models
print("🌟 FAMOUS MODEL COMPARISON: What Can You Run?")
print("=" * 60)

model_comparison = analyzer.compare_with_famous_models()

for model_name, specs in model_comparison.items():
    print(f"\n📱 {model_name}")
    print(f"   Parameters: {specs['parameters']}")
    print(f"   Memory Needed: {specs['memory_needed']}")
    print(f"   Example: {specs['example']}")
    print(f"   {specs['verdict']}")

print("\n" + "=" * 60)
print("💡 SYSTEMS INSIGHT: Notice the massive jump from research to production models!")
print("   - Your laptop: Good for learning and small experiments")
print("   - Production models: Require massive cloud infrastructure")  
print("   - GPT-3/GPT-4: Need large multi-GPU clusters!")

# Show cloud cost estimates for training a 100M-parameter model
print(f"\n💰 CLOUD COST EXAMPLE: Training a 100M parameter model")
cost_analysis = analyzer.estimate_cloud_costs(100_000_000, 24)
print(f"   Recommended Instance: {cost_analysis['recommended_instance']}")
print(f"   Cost per Hour: {cost_analysis['cost_per_hour']}")
print(f"   24-Hour Training Cost: {cost_analysis['total_cost_24h']}")
print(f"   Memory Required: {cost_analysis['memory_needed']}")

print(f"\n🎯 KEY TAKEAWAY: This is why ML engineers optimize for efficiency!")
print(f"   Every parameter costs money in production 💸")

## Module Summary: TinyTorch Setup Complete

Congratulations! You've successfully configured your TinyTorch development environment and established professional ML engineering practices.

### What You've Accomplished
✅ **Personal Configuration**: Established developer identity and system attribution  
✅ **System Information**: Built hardware-aware ML system foundation  
✅ **Testing Integration**: Implemented comprehensive validation for both functions  
✅ **Professional Workflow**: Mastered NBGrader solution blocks and testing  

Your TinyTorch installation is now properly configured with:
- **Developer attribution** for professional collaboration
- **Hardware detection** for performance optimization
- **Tested functions** ready for package integration

### Key ML Systems Concepts Learned
- **Configuration management**: Professional setup and attribution standards
- **Hardware awareness**: System specs affect ML performance and capabilities
- **Testing practices**: Comprehensive validation ensures reliability
- **Package development**: Functions become part of production codebase

### Next Steps
1. **Export your work**: Use `tito module export 01_setup` to integrate with TinyTorch
2. **Verify integration**: Test that your functions work in the tinytorch package
3. **Ready for tensors**: Move on to building the fundamental ML data structure
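
One way to verify the integration step is to import the exported module and confirm the functions are present. This is a sketch under assumptions: the module path `tinytorch.core.setup` is inferred from the `#| default_exp core.setup` directive, and `check_exported_functions` is a hypothetical helper, not part of the `tito` tooling.

```python
import importlib
from typing import Dict, Optional, Sequence


def check_exported_functions(
    module_name: str = "tinytorch.core.setup",  # assumed path, inferred from default_exp
    names: Sequence[str] = ("personal_info", "system_info"),
) -> Optional[Dict[str, bool]]:
    """Return {name: is_callable} for each expected function, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return {name: callable(getattr(module, name, None)) for name in names}


result = check_exported_functions()
if result is None:
    print("Module not importable yet; run the export step first.")
else:
    for name, ok in result.items():
        print(f"{name}: {'found' if ok else 'missing'}")
```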

**You've built the foundation - now let's construct the ML system on top of it!**

## 🤔 ML Systems Thinking: Reflection Questions

Now that you've built system configuration tools, reflect on how this foundation connects to production ML systems:

### System Design - How does this fit into larger systems?
1. **Identity and Attribution**: Your `personal_info()` function establishes developer identity. How does proper attribution become crucial when ML teams collaborate on models that affect millions of users? What happens when models misbehave and you need to trace accountability?

2. **Environment Reproducibility**: Your `system_info()` captures hardware specs automatically. When researchers publish papers claiming breakthrough results, why is documenting the exact environment (CPU, memory, Python version) essential for reproducibility? How does this connect to the "replication crisis" in AI research?

3. **Hardware-Aware Development**: Your function detects CPU count and memory. How do modern ML frameworks like PyTorch use this information to automatically parallelize computations? Why might the same model code behave differently on a laptop vs. a cloud instance?

### Production ML - How is this used in real ML workflows?
4. **Configuration Management**: Your personal configuration mirrors how production systems identify model creators. How do companies like Netflix or Spotify track which data scientist trained which recommendation model when debugging performance issues?

5. **Resource Planning**: Your memory detection helps understand system limits. When deploying large language models in production, how do resource constraints influence architectural decisions? Why might a 16GB system require different serving strategies than a 128GB system?

6. **Version Control Integration**: Your system fingerprinting resembles Git's commit metadata. How does proper environment documentation help when a model trained 6 months ago suddenly needs retraining with updated data?

### Framework Design - Why do frameworks make certain choices?
7. **Abstraction Layers**: Your simple functions hide OS complexity. How do frameworks like PyTorch abstract hardware differences so the same neural network code runs on CPUs, GPUs, and TPUs without modification?

8. **Metadata Standards**: Your configuration dictionary structure mirrors industry practices. Why do frameworks invest heavily in standardized metadata formats for model cards, experiment tracking, and deployment manifests?

9. **Development Ergonomics**: Your functions provide clean APIs for system queries. How does good developer experience in configuration tools ripple through to faster experimentation and more reliable model development?

### Performance & Scale - What happens when systems get large?
10. **Distributed Configuration**: Your single-machine setup works locally. How does configuration management change when training large models across hundreds of machines in data centers? What new challenges emerge?

11. **Resource Optimization**: Your memory detection helps with local planning. How do cloud providers like AWS optimize resource allocation when thousands of researchers are training models simultaneously? What role does configuration metadata play?

12. **System Monitoring**: Your hardware queries provide snapshots. How do production ML systems use continuous monitoring of CPU, memory, and GPU utilization to automatically scale training jobs and serving infrastructure?

**💡 Systems Insight**: The simple configuration functions you built are the DNA of ML systems—every production deployment, research experiment, and model serving instance needs to know "who built this, where is it running, and what resources are available." This metadata becomes critical when things go wrong or when scaling to millions of users.