# Module 02: Python-Windows Integration

**Difficulty**: ⭐⭐ (Intermediate)

**Estimated Time**: 60 minutes

**Prerequisites**: 
- Completed Modules 00-01
- Understanding of Python basics
- Familiarity with subprocess module

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Master** the subprocess module for Windows automation
2. **Manage** Python virtual environments (venv and conda)
3. **Automate** environment activation in scripts
4. **Use** psutil for cross-platform system monitoring
5. **Create** Python scripts that orchestrate Windows tasks
6. **Handle** errors and edge cases in automation

## 1. Advanced Subprocess Techniques

The `subprocess` module is your bridge between Python and Windows. It launches executables, batch files, and PowerShell, and hands back their exit codes and output. Let's master it!

In [None]:
# Setup: Import required libraries
import subprocess
import sys
import os
from pathlib import Path
import json
import time

print("Setup complete!")

### 1.1 Subprocess Best Practices

In [None]:
# Best practice: Always use list for arguments (prevents injection)
# Always check return codes
# Always use timeouts for external commands

def run_command_safe(command_list, timeout=30):
    """
    Safely run a command with proper error handling.
    
    Args:
        command_list: List of command and arguments
        timeout: Maximum seconds to wait (default: 30)
    
    Returns:
        tuple: (success: bool, output: str, error: str)
    """
    try:
        result = subprocess.run(
            command_list,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False  # Don't raise exception on non-zero exit
        )
        
        success = result.returncode == 0
        return success, result.stdout, result.stderr
        
    except subprocess.TimeoutExpired:
        return False, "", f"Command timed out after {timeout} seconds"
    except OSError as e:  # e.g. FileNotFoundError when the program is not on PATH
        return False, "", str(e)

# Test it (sys.executable avoids relying on 'python' being on PATH)
success, output, error = run_command_safe([sys.executable, '--version'])
if success:
    print(f"✓ Success: {output.strip()}")
else:
    print(f"✗ Error: {error}")
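The "use a list" rule is easiest to see with an argument that contains shell metacharacters. The sketch below hands such a string to a child Python process (`echo_argument` is a hypothetical helper name). Because no shell parses the list, the child should receive the text verbatim instead of running a second command.

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Pass `value` to a child Python process and return what it received."""
    result = subprocess.run(
        # List form: each element becomes exactly one argument, no shell parsing
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Shell metacharacters arrive as plain text, not as a second command
print(echo_argument("report.csv & echo injected"))
```

Passing the same text as one string with `shell=True` would let `cmd.exe` treat `&` as a command separator, which is the injection risk the list form avoids.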

### 1.2 Running PowerShell Scripts

In [None]:
# Running PowerShell scripts with proper execution policy

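def run_powershell_script(script_path, args=None, timeout=300):
    """
    Execute a PowerShell script file.
    
    Args:
        script_path: Path to .ps1 file
        args: Optional list of arguments passed to the script
        timeout: Maximum seconds to wait (default: 300)
    
    Returns:
        subprocess.CompletedProcess
    """
    cmd = [
        'powershell',
        '-NoProfile',                  # Skip user profile scripts for reproducible runs
        '-NonInteractive',             # Fail instead of prompting for input
        '-ExecutionPolicy', 'Bypass',  # Applies to this process only; system policy is unchanged
        '-File', str(script_path)      # -File must come last: everything after it goes to the script
    ]
    
    if args:
        cmd.extend(str(a) for a in args)
    
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)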

# Example: If we have a PowerShell script
print("PowerShell script runner ready!")
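For one-off commands, `-Command` avoids writing a `.ps1` file, and piping the result through PowerShell's `ConvertTo-Json` gives Python structured data instead of formatted text. A minimal sketch, assuming either `pwsh` (PowerShell 7) or `powershell` (Windows PowerShell) is on PATH; `build_powershell_command` and `run_powershell_json` are hypothetical helper names.

```python
import json
import shutil
import subprocess


def build_powershell_command(command, exe="powershell"):
    """Build the argument list for running an inline PowerShell command."""
    return [
        exe,
        "-NoProfile",       # skip profile scripts for reproducible runs
        "-NonInteractive",  # fail instead of waiting for input
        "-Command", command,
    ]


def run_powershell_json(command, timeout=60):
    """Run `command`, serialise its output with ConvertTo-Json, parse it in Python.

    Returns None if no PowerShell is found on PATH or the command prints nothing.
    """
    exe = shutil.which("pwsh") or shutil.which("powershell")
    if exe is None:
        return None
    result = subprocess.run(
        build_powershell_command(f"{command} | ConvertTo-Json -Compress", exe),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if PowerShell exits non-zero
    )
    output = result.stdout.strip()
    return json.loads(output) if output else None
```

For example, `run_powershell_json('Get-Process | Select-Object -First 3 Name, Id')` should return a list of dicts with `Name` and `Id` keys. `ConvertTo-Json` emits a single piped object as a JSON object rather than a one-element array, so callers may need to handle both shapes.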

## 2. Virtual Environment Management

Managing environments is crucial for data science projects. Different projects often need different package versions.

### 2.1 Working with venv (Python Built-in)

In [None]:
# Check if running in a virtual environment

def is_venv():
    """
    Check if currently running in a virtual environment.
    
    Returns:
        bool: True if in venv, False otherwise
    """
    # Check for virtual environment indicators
    return (
        hasattr(sys, 'real_prefix') or  # Old virtualenv
        (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix)  # venv
    )

if is_venv():
    print("✓ Running in virtual environment")
    print(f"  Environment: {sys.prefix}")
else:
    print("⚠ Not running in a venv/virtualenv (conda environments are not detected by this check)")
    print(f"  Interpreter prefix: {sys.prefix}")

# Display Python executable
print(f"\nPython executable: {sys.executable}")
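`is_venv()` returns False inside a conda environment, because a conda environment is a full installation where `sys.prefix` equals `sys.base_prefix`. A common heuristic (an assumption here, not an official API) is to look for the `conda-meta` directory that conda creates in every environment it manages. `detect_environment` is a hypothetical helper.

```python
import os
import sys
from pathlib import Path


def detect_environment():
    """Classify the interpreter as 'venv', 'conda', or 'system'.

    Heuristics, not an official API:
    - venv/virtualenv: sys.prefix differs from sys.base_prefix (or sys.real_prefix exists)
    - conda: the prefix contains the 'conda-meta' directory conda creates
    """
    if hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix:
        return "venv"
    if (Path(sys.prefix) / "conda-meta").is_dir():
        return "conda"
    return "system"


print(f"Environment type: {detect_environment()}")
# Set by `conda activate`; absent when no conda env has been activated
print(f"CONDA_DEFAULT_ENV: {os.environ.get('CONDA_DEFAULT_ENV', '(not set)')}")
```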

### 2.2 Automating venv Creation

In [None]:
# Script to create and setup a virtual environment

def create_venv(env_name, packages=None):
    """
    Create a virtual environment and install packages.
    
    Args:
        env_name: Name of the environment
        packages: List of packages to install (optional)
    
    Returns:
        bool: True if successful
    """
    env_path = Path.cwd().parent / env_name
    
    # Create venv
    print(f"Creating virtual environment: {env_path}")
    result = subprocess.run(
        [sys.executable, '-m', 'venv', str(env_path)],
        capture_output=True,
        text=True
    )
    
    if result.returncode != 0:
        print(f"✗ Failed to create venv: {result.stderr}")
        return False
    
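    print("✓ Virtual environment created")
    
    # Install packages if specified
    all_ok = True
    if packages:
        # `python -m pip` (the form pip recommends) is more robust than calling pip.exe directly
        python_path = env_path / 'Scripts' / 'python.exe'  # Windows layout (bin/python on Linux/macOS)
        print(f"\nInstalling packages: {', '.join(packages)}")
        
        for package in packages:
            result = subprocess.run(
                [str(python_path), '-m', 'pip', 'install', package],
                capture_output=True,
                text=True
            )
            
            if result.returncode == 0:
                print(f"  ✓ Installed {package}")
            else:
                all_ok = False
                print(f"  ✗ Failed to install {package}: {result.stderr.strip()}")
    
    return all_ok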

# Example usage (commented out to prevent execution)
# create_venv('test_env', ['numpy', 'pandas'])
print("Virtual environment helper ready!")
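The standard library also exposes venv creation as an API (`venv.EnvBuilder`), and a script does not need to "activate" an environment: calling the environment's own interpreter is enough. The sketch below creates a throwaway environment in a temporary directory and checks that its interpreter reports being inside a venv. `create_env` and `env_python` are hypothetical names; `with_pip=False` is used only to keep the demo fast and offline.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def env_python(env_path: Path) -> Path:
    """Path to an environment's interpreter (Scripts\\python.exe on Windows)."""
    if sys.platform == "win32":
        return env_path / "Scripts" / "python.exe"
    return env_path / "bin" / "python"


def create_env(env_path: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment with the stdlib API and return its python."""
    builder = venv.EnvBuilder(
        with_pip=with_pip,
        clear=True,                          # wipe an existing env at this path
        symlinks=(sys.platform != "win32"),  # mirror `python -m venv` defaults
    )
    builder.create(env_path)
    return env_python(env_path)


with tempfile.TemporaryDirectory() as tmp:
    # with_pip=False keeps the demo fast and offline
    python = create_env(Path(tmp) / "demo_env", with_pip=False)
    # No activation step: running the env's interpreter directly uses the env
    out = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, timeout=60, check=True,
    )
    print(out.stdout.strip())
```

The same pattern lets a scheduled task run `<env>\Scripts\python.exe your_script.py` without any `activate` step.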

### 2.3 Conda Environment Management

In [None]:
# Check if conda is available
import shutil

def check_conda():
    """
    Check if conda is installed and available.
    
    Returns:
        bool: True if conda is available
    """
    # shutil.which honours PATHEXT, so it also finds conda.bat on Windows
    conda = shutil.which('conda')
    if conda is None:
        return False
    try:
        result = subprocess.run(
            [conda, '--version'],
            capture_output=True,
            text=True,
            timeout=30
        )
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):  # never use a bare except
        return False

conda_available = check_conda()
if conda_available:
    print("✓ Conda is available")
    
    # Get conda info
    result = subprocess.run(
        [shutil.which('conda'), 'info', '--json'],
        capture_output=True,
        text=True,
        timeout=60
    )
    
    if result.returncode == 0:
        info = json.loads(result.stdout)
        print(f"  Conda version: {info.get('conda_version', 'Unknown')}")
        print(f"  Python version: {info.get('python_version', 'Unknown')}")
        # active_prefix is null when no environment is activated
        print(f"  Active environment: {info.get('active_prefix') or 'none'}")
else:
    print("⚠ Conda not available (this is OK if you don't use it)")
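`conda env list --json` returns the environment paths as JSON, which is easier to parse than the table printed by plain `conda env list`. A minimal sketch, assuming conda is on PATH; `list_conda_envs` is a hypothetical helper that returns an empty list otherwise.

```python
import json
import shutil
import subprocess
from pathlib import Path


def list_conda_envs(timeout=60):
    """Return conda environment paths from `conda env list --json`.

    Returns an empty list when conda is not on PATH.
    """
    conda = shutil.which("conda")  # resolves conda.exe or conda.bat via PATHEXT
    if conda is None:
        return []
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return [Path(p) for p in json.loads(result.stdout).get("envs", [])]


for env in list_conda_envs():
    print(f"{env.name:20} {env}")
```

To run a command in one of these environments without activating it, `conda run -n <env_name> python script.py` is one option.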

## 3. System Monitoring with psutil

`psutil` is a cross-platform library for system and process monitoring. Essential for monitoring ML training jobs!

In [None]:
# Install psutil if not available
try:
    import psutil
    print("✓ psutil is installed")
except ImportError:
    print("Installing psutil...")
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'psutil', '-q'], check=True)
    import psutil
    print("✓ psutil installed successfully")

### 3.1 System Resource Monitoring

In [None]:
# Get system resource usage
# Useful for monitoring during data processing or model training

def get_system_stats():
    """
    Get current system resource usage.
    
    Returns:
        dict: System statistics
    """
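    # CPU usage (interval=1 blocks for one second to take the measurement)
    cpu_percent = psutil.cpu_percent(interval=1)
    cpu_count = psutil.cpu_count()  # logical CPUs (includes hyper-threads)
    
    # Memory usage
    mem = psutil.virtual_memory()
    
    # Disk usage for the Windows system drive
    disk = psutil.disk_usage('C:\\')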
    
    return {
        'cpu_percent': cpu_percent,
        'cpu_count': cpu_count,
        'memory_total_gb': mem.total / (1024**3),
        'memory_available_gb': mem.available / (1024**3),
        'memory_percent': mem.percent,
        'disk_total_gb': disk.total / (1024**3),
        'disk_free_gb': disk.free / (1024**3),
        'disk_percent': disk.percent
    }

# Display system statistics
stats = get_system_stats()

print("System Resources:")
print("=" * 50)
print("CPU:")
print(f"  Logical CPUs: {stats['cpu_count']}")
print(f"  Usage: {stats['cpu_percent']}%")
print(f"\nMemory:")
print(f"  Total: {stats['memory_total_gb']:.1f} GB")
print(f"  Available: {stats['memory_available_gb']:.1f} GB")
print(f"  Usage: {stats['memory_percent']}%")
print(f"\nDisk (C:):")
print(f"  Total: {stats['disk_total_gb']:.1f} GB")
print(f"  Free: {stats['disk_free_gb']:.1f} GB")
print(f"  Usage: {stats['disk_percent']}%")
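`get_system_stats()` hard-codes the `C:\` drive. The sketch below reports every mounted partition instead, using `psutil.disk_partitions()`; `disk_report` is a hypothetical name, and partitions that cannot be read (such as an empty optical drive) are skipped.

```python
import psutil


def disk_report():
    """Return usage for each mounted partition instead of one hard-coded drive."""
    report = []
    for part in psutil.disk_partitions(all=False):
        try:
            usage = psutil.disk_usage(part.mountpoint)
        except OSError:
            # e.g. an empty DVD drive or a mount point we may not read
            continue
        report.append({
            "mountpoint": part.mountpoint,
            "fstype": part.fstype,
            "total_gb": usage.total / 1024**3,
            "free_gb": usage.free / 1024**3,
            "percent": usage.percent,
        })
    return report


for disk in disk_report():
    print(f"{disk['mountpoint']:<10} {disk['free_gb']:8.1f} GB free "
          f"({disk['percent']}% used)")
```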

### 3.2 Process Management

In [None]:
# Find and monitor Python processes
# Useful for tracking your data science jobs

def find_python_processes():
    """
    Find all running Python processes.
    
    Returns:
        list: List of dicts with process info
    """
    python_procs = []
    
    # cpu_percent is not requested: its first reading per process is always 0.0
    for proc in psutil.process_iter(['pid', 'name', 'memory_info']):
        try:
            # Attributes that raise AccessDenied come back as None
            name = proc.info['name'] or ''
            mem_info = proc.info['memory_info']
            if 'python' in name.lower():
                python_procs.append({
                    'pid': proc.info['pid'],
                    'name': name,
                    'memory_mb': mem_info.rss / (1024**2) if mem_info else float('nan')
                })
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
    
    return python_procs

# Display Python processes
python_procs = find_python_processes()

print(f"Found {len(python_procs)} Python process(es):")
print("=" * 50)
for proc in python_procs[:5]:  # Show first 5
    print(f"PID {proc['pid']}: {proc['name']}")
    print(f"  Memory: {proc['memory_mb']:.1f} MB")
    print()
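Per-process `cpu_percent()` measures usage since the previous call on the same `Process` object, so the first reading for every process is a meaningless 0.0. The sketch below primes each counter, waits for a sampling window, then reads it again. `top_processes_by_cpu` is a hypothetical helper; values can exceed 100% for processes using several cores.

```python
import time

import psutil


def top_processes_by_cpu(sample_seconds=1.0, limit=5):
    """Return up to `limit` processes ranked by CPU use over a short window."""
    procs = []
    for proc in psutil.process_iter(["pid", "name"]):
        if proc.pid == 0:
            continue  # pid 0 is the System Idle Process on Windows
        try:
            proc.cpu_percent(None)  # first call only primes the counter (returns 0.0)
            procs.append(proc)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass

    time.sleep(sample_seconds)  # the measurement window

    samples = []
    for proc in procs:
        try:
            samples.append({
                "pid": proc.pid,
                "name": proc.info["name"],
                "cpu_percent": proc.cpu_percent(None),  # usage since the priming call
            })
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass  # the process exited or is protected
    samples.sort(key=lambda s: s["cpu_percent"], reverse=True)
    return samples[:limit]


for p in top_processes_by_cpu(sample_seconds=1.0, limit=5):
    print(f"PID {p['pid']:>6}  {p['cpu_percent']:6.1f}%  {p['name']}")
```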

## 4. Practical Automation Examples

### 4.1 Environment Setup Script

In [None]:
# Create a complete environment setup script
# This is what you'd use at the start of a new project
# r''' (raw string) keeps escapes such as \n and \\ intact in the written file

setup_script = r'''
#!/usr/bin/env python
"""Setup script for data science project."""

import subprocess
import sys
from pathlib import Path

def setup_project():
    """Setup complete data science project environment."""
    print("Setting up data science environment...")
    
    # 1. Create directory structure
    dirs = ['data/raw', 'data/processed', 'notebooks', 'src', 'tests']
    for dir_path in dirs:
        Path(dir_path).mkdir(parents=True, exist_ok=True)
    print("✓ Directory structure created")
    
    # 2. Create virtual environment
    venv_path = Path('venv')
    if not venv_path.exists():
        subprocess.run([sys.executable, '-m', 'venv', 'venv'], check=True)
        print("✓ Virtual environment created")
    
    # 3. Install core packages
    pip_exe = 'venv\\Scripts\\pip.exe'
    packages = ['numpy', 'pandas', 'matplotlib', 'jupyter']
    
    for pkg in packages:
        subprocess.run([pip_exe, 'install', pkg, '-q'])
    print("✓ Core packages installed")
    
    # 4. Create requirements.txt (the with-block closes the file afterwards)
    with open('requirements.txt', 'w') as req_file:
        subprocess.run([pip_exe, 'freeze'], stdout=req_file, check=True)
    print("✓ requirements.txt created")
    
    print("\nSetup complete! Activate with: venv\\Scripts\\activate")

if __name__ == '__main__':
    setup_project()
'''

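# Save the script (creating the folder if needed). UTF-8 is explicit because
# characters such as ✓ may not exist in the default Windows encoding (e.g. cp1252)
script_path = Path.cwd().parent / 'data' / 'sample' / 'setup_project.py'
script_path.parent.mkdir(parents=True, exist_ok=True)
script_path.write_text(setup_script, encoding='utf-8')
print(f"Created setup script: {script_path}")
print("\nTo use it: python setup_project.py")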
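Generating scripts from strings makes it easy to save a file that does not even parse, for example when a non-raw template turns `\n` inside a string literal into a real line break. A small guard is to run the text through the built-in `compile()` before saving it. `write_script` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def write_script(path, source):
    """Syntax-check generated Python source, then save it as UTF-8."""
    path = Path(path)
    # compile() only parses and byte-compiles; nothing is executed.
    # A broken template raises SyntaxError here instead of at run time.
    compile(source, str(path), "exec")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Explicit UTF-8 so characters such as ✓ are written correctly on Windows
    path.write_text(source, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = write_script(Path(tmp) / "demo" / "hello.py", 'print("✓ hello")\n')
    print(f"Saved {saved.name}")
```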

### 4.2 Resource Monitor for ML Training

In [None]:
# Create a resource monitoring script for ML training
# This runs alongside your training and logs resource usage
# r''' (raw string) keeps \r and \n as escapes in the written file

monitor_script = r'''
#!/usr/bin/env python
"""Monitor system resources during ML training."""

import psutil
import time
import csv
from datetime import datetime

def monitor_resources(duration_minutes=60, interval_seconds=5, output_file='resource_log.csv'):
    """Monitor and log system resources."""
    print(f"Monitoring resources for {duration_minutes} minutes...")
    print(f"Logging to: {output_file}")
    
    # Open log file
    with open(output_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['timestamp', 'cpu_percent', 'memory_percent', 'memory_gb'])
        
        end_time = time.time() + (duration_minutes * 60)
        
        while time.time() < end_time:
            # Get current stats (cpu_percent blocks for 1 s, so rows are ~interval_seconds + 1 apart)
            cpu = psutil.cpu_percent(interval=1)
            mem = psutil.virtual_memory()
            
            # Log to file
            writer.writerow([
                datetime.now().isoformat(),
                cpu,
                mem.percent,
                mem.used / (1024**3)  # GB
            ])
            f.flush()  # Ensure data is written
            
            # Display progress
            print(f"\rCPU: {cpu:5.1f}% | Memory: {mem.percent:5.1f}%", end='')
            
            time.sleep(interval_seconds)
    
    print(f"\n\nMonitoring complete! Log saved to {output_file}")

if __name__ == '__main__':
    monitor_resources(duration_minutes=5, interval_seconds=2)
'''

# Save the monitor script (creating the folder if needed)
monitor_path = Path.cwd().parent / 'data' / 'sample' / 'monitor_resources.py'
monitor_path.parent.mkdir(parents=True, exist_ok=True)
monitor_path.write_text(monitor_script, encoding='utf-8')
print(f"Created resource monitor: {monitor_path}")
print("\nRun alongside training: python monitor_resources.py")
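The monitor above watches the whole machine. To track a single job, a script can start the job itself with `subprocess.Popen` and point `psutil.Process` at the child's PID. A minimal sketch, with a short Python one-liner standing in for a real training command; `run_and_track_memory` is a hypothetical helper, and memory used by the job's own child processes is not counted.

```python
import subprocess
import sys
import time

import psutil


def run_and_track_memory(command, poll_seconds=0.2):
    """Run `command` and return (exit code, peak resident memory in MB)."""
    proc = subprocess.Popen(command)
    watcher = psutil.Process(proc.pid)
    peak_rss = 0
    while proc.poll() is None:  # still running
        try:
            peak_rss = max(peak_rss, watcher.memory_info().rss)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            break  # the process ended between poll() and memory_info()
        time.sleep(poll_seconds)
    return proc.wait(), peak_rss / 1024**2


# Stand-in for a training script: allocate ~50 MB and hold it briefly
code = "import time; data = bytearray(50 * 1024**2); time.sleep(1)"
returncode, peak_mb = run_and_track_memory([sys.executable, "-c", code])
print(f"exit code {returncode}, peak RSS about {peak_mb:.0f} MB")
```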

## 5. Practice Exercises

### Exercise 1: Safe Command Runner

Improve the `run_command_safe()` function to:
1. Log all commands to a file
2. Retry failed commands up to 3 times
3. Return execution time

**Hint**: Use `time.perf_counter()` for timing and a loop for retries

In [None]:
# Exercise 1: Your solution here

# TODO: Enhance the run_command_safe function



### Exercise 2: Environment Inspector

Create a function that:
1. Checks if running in venv or conda
2. Lists all installed packages
3. Identifies packages not in requirements.txt
4. Suggests packages to add/remove

**Hint**: Use `pip list` and compare with requirements.txt

In [None]:
# Exercise 2: Your solution here

# TODO: Create environment inspector



### Exercise 3: Resource Alert System

Create a monitoring script that:
1. Continuously monitors CPU and memory
2. Sends alert if CPU stays above 90% for 30 consecutive seconds
3. Sends alert if memory > 95%
4. Logs all alerts to a file

**Hint**: Use psutil in a loop with sleep intervals

In [None]:
# Exercise 3: Your solution here

# TODO: Create resource alert system



## 6. Summary

### Key Concepts

1. **Subprocess Mastery**
   - Use lists for command arguments (security)
   - Always set timeouts
   - Check return codes
   - Handle exceptions properly

2. **Environment Management**
   - Check if in venv: `sys.prefix != sys.base_prefix`
   - Create venv: `python -m venv env_name`
   - Conda can also install non-Python dependencies (e.g. MKL, the CUDA toolkit), which many data science stacks need

3. **System Monitoring**
   - psutil is cross-platform
   - Monitor CPU, memory, disk, processes
   - Essential for ML training monitoring

4. **Automation Patterns**
   - Setup scripts for new projects
   - Resource monitors for long-running jobs
   - Error handling and logging

### What's Next?

In **Module 03: File System Operations**, you'll learn:
- Advanced pathlib usage
- Batch file operations
- File watching and monitoring
- Archive and compression

### Self-Assessment

Before moving on, make sure you can:
- [ ] Run commands safely with subprocess
- [ ] Create and manage virtual environments
- [ ] Monitor system resources with psutil
- [ ] Handle errors in automation scripts
- [ ] Create reusable automation utilities

---

**Continue to Module 03** when ready!