# Module 00: Windows for Data Science - Setup & Introduction

**Difficulty**: ⭐ (Beginner)

**Estimated Time**: 45 minutes

**Prerequisites**: 
- Windows 10 or Windows 11 operating system
- Basic computer literacy
- Administrator access to your computer

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** why Windows command line skills are essential for data science
2. **Verify** your Python installation and environment
3. **Execute** basic system commands from Python
4. **Navigate** your file system using both CMD and Python
5. **Configure** your development environment for the course

## 1. Why Windows Skills Matter for Data Scientists

As a data science student, you might wonder: "Why do I need to learn Windows command line tools?"

### Real-World Scenarios

**Scenario 1: Automated Data Processing**
- You receive 100 CSV files daily that need processing
- Manual processing: 5 minutes per file × 100 files = more than 8 hours
- With automation: 30 seconds total

**Scenario 2: Environment Management**
- Different projects need different Python versions and libraries
- Command line skills let you switch environments in seconds
- Prevents "works on my machine" problems

**Scenario 3: Model Training**
- Training deep learning models takes hours/days
- Use task scheduling to train overnight
- Monitor GPU/CPU usage from command line

### Skills You'll Gain

1. **Efficiency**: Automate repetitive tasks
2. **Reproducibility**: Script your workflows
3. **Debugging**: Understand what's happening "under the hood"
4. **Collaboration**: Share reproducible environments
5. **Professional Skills**: Windows is common in corporate environments, and command-line skills transfer across operating systems

## 2. Setup: Verify Your Environment

Let's start by checking that everything is properly installed.

In [None]:
# Setup cell: Import required libraries
import sys
import os
import platform
import subprocess
from pathlib import Path

print("Setup complete! All imports successful.")

### 2.1 Check Python Installation

In [None]:
# Display Python version and installation details
# This helps verify your Python environment is set up correctly

python_version = sys.version
python_executable = sys.executable

print("Python Information:")
print("=" * 50)
print(f"Version: {python_version}")
print(f"Executable location: {python_executable}")
print(f"Platform: {platform.platform()}")
print(f"Architecture: {platform.architecture()[0]}")

# Verify we're on Windows
assert platform.system() == "Windows", "This notebook is designed for Windows!"
print("\n✓ Running on Windows - Good to go!")
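
Beyond the version string, two further checks are often useful: whether the interpreter meets a minimum version, and whether the notebook is running inside a virtual environment. The sketch below shows one way to do both. `MIN_VERSION` is a hypothetical threshold chosen for illustration, not a stated course requirement, and the helper names are made up for this example.

```python
import sys

# Hypothetical minimum version, chosen only for illustration
MIN_VERSION = (3, 8)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


def in_virtual_env():
    """Return True when running inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(f"Meets minimum {MIN_VERSION}: {meets_minimum()}")
print(f"Inside a virtual environment: {in_virtual_env()}")
```

This check covers environments created with `venv`. Conda environments are usually identified differently, for example by looking at the `CONDA_PREFIX` environment variable.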

### 2.2 Check System Information

Let's gather information about your Windows system. This is useful for understanding your environment and troubleshooting issues.

In [None]:
# Get detailed system information
# Understanding your system helps optimize your data science workflows

system_info = {
    "Operating System": platform.system(),
    "OS Version": platform.version(),
    "OS Release": platform.release(),
    "Machine Type": platform.machine(),
    "Processor": platform.processor(),
    "Computer Name": platform.node()
}

print("System Information:")
print("=" * 50)
for key, value in system_info.items():
    print(f"{key}: {value}")
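
For data science work, the number of CPU cores and the free disk space are often the most relevant system facts. The following is a minimal sketch using only the standard library; it assumes your data lives on the same drive as your home directory.

```python
import os
import shutil
from pathlib import Path

# Number of logical CPU cores (may be None if it cannot be determined)
cpu_count = os.cpu_count()

# Disk usage for the drive that holds your home directory.
# Path.home().anchor is the drive root, e.g. "C:\\" on Windows.
drive_root = Path.home().anchor
usage = shutil.disk_usage(drive_root)

gib = 1024 ** 3
print(f"Logical CPU cores: {cpu_count}")
print(f"Drive {drive_root}: {usage.free / gib:.1f} GiB free of {usage.total / gib:.1f} GiB")
```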

## 3. Running Windows Commands from Python

One of Python's most useful capabilities for automation is executing Windows commands directly with the `subprocess` module.

### Why This Matters

- Automate system tasks within your Python scripts
- Capture command output for processing
- Integrate Windows tools into your data pipelines

### 3.1 Basic Command Execution

Let's start by running a simple Windows command: `ver` (displays Windows version).

In [None]:
# Execute Windows 'ver' command to show Windows version
# subprocess.run() is the modern way to execute system commands

result = subprocess.run(
    ['cmd', '/c', 'ver'],  # /c means "execute command and exit"
    capture_output=True,    # Capture stdout and stderr
    text=True               # Return output as string (not bytes)
)

print("Command: ver")
print("Output:")
print(result.stdout)

# Check if command succeeded (return code 0 means success)
if result.returncode == 0:
    print("✓ Command executed successfully!")
else:
    print(f"✗ Command failed with return code: {result.returncode}")
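
Checking `returncode` by hand works, but `subprocess.run()` can also raise an exception for you. The sketch below wraps that pattern in a small helper; `run_command` is a hypothetical name, and the example launches the current Python interpreter (`sys.executable`) so it runs on any machine. On Windows you could pass `['cmd', '/c', 'ver']` instead.

```python
import subprocess
import sys


def run_command(args, timeout=10):
    """Run a command and return its stdout, raising if it fails."""
    result = subprocess.run(
        args,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
        check=True,       # raises subprocess.CalledProcessError on non-zero exit
    )
    return result.stdout.strip()


# A command that succeeds
print(run_command([sys.executable, "-c", "print('hello from a child process')"]))

# A command that exits with a non-zero return code
try:
    run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as error:
    print(f"Command failed with return code {error.returncode}")
```

Raising on failure means a broken step stops the script immediately instead of letting later steps run on missing output.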

### 3.2 Capturing and Processing Command Output

Let's get the current directory using the `cd` command (with no arguments, it shows current directory).

In [None]:
# Get current directory using Windows 'cd' command
# This demonstrates capturing output for further processing

result = subprocess.run(
    ['cmd', '/c', 'cd'],
    capture_output=True,
    text=True
)

current_dir_from_cmd = result.stdout.strip()
print(f"Current directory (from CMD): {current_dir_from_cmd}")

# Compare with Python's built-in method
current_dir_from_python = os.getcwd()
print(f"Current directory (from Python): {current_dir_from_python}")

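# Verify they match (they should!)
# os.path.normcase ignores differences in letter case, which Windows paths allow
assert os.path.normcase(current_dir_from_cmd) == os.path.normcase(current_dir_from_python), "Directories don't match!"
print("\n✓ Both methods return the same directory!")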

### 3.3 Listing Directory Contents

Let's use the `dir` command to list files in the current directory.

In [None]:
# List files in current directory using 'dir' command
# /B flag gives bare format (just filenames, no details)

result = subprocess.run(
    ['cmd', '/c', 'dir', '/B'],
    capture_output=True,
    text=True
)

files_from_cmd = result.stdout.splitlines()  # returns [] when the directory is empty
print("Files in current directory (from CMD):")
for i, file in enumerate(files_from_cmd, 1):
    print(f"  {i}. {file}")

# Compare with Python's method
# (os.listdir also returns hidden files, which plain 'dir' omits, and the order may differ)
print("\nFiles in current directory (from Python):")
files_from_python = os.listdir('.')
for i, file in enumerate(files_from_python, 1):
    print(f"  {i}. {file}")
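
The two listings will not always be identical, so it can help to compare them programmatically. The following is a sketch under stated assumptions: `parse_bare_listing` and `compare_listings` are hypothetical helpers, and the sample text stands in for real `dir /B` output so the example runs without calling `cmd`.

```python
def parse_bare_listing(output):
    """Turn the text printed by `dir /B` into a list of names."""
    # splitlines() handles both \n and \r\n and returns [] for empty output
    return [line.strip() for line in output.splitlines() if line.strip()]


def compare_listings(from_cmd, from_python):
    """Return the names that appear in only one of the two listings."""
    # Windows file names ignore case, so compare in lower case
    cmd_names = {name.lower() for name in from_cmd}
    python_names = {name.lower() for name in from_python}
    return sorted(cmd_names - python_names), sorted(python_names - cmd_names)


# Hypothetical sample output and directory contents
sample_output = "data.csv\r\nnotes.txt\r\n"
names = parse_bare_listing(sample_output)
only_cmd, only_python = compare_listings(names, ["Data.csv", "notes.txt", ".hidden"])

print(names)        # ['data.csv', 'notes.txt']
print(only_cmd)     # []
print(only_python)  # ['.hidden']
```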

## 4. Working with File Paths

Understanding file paths is crucial for data science work. Python's `pathlib` module provides a modern, cross-platform way to work with paths.

### 4.1 Understanding Windows Paths

Windows paths have some unique characteristics:
- Use backslashes `\` (vs. forward slashes `/` on Linux/Mac)
- Drive letters like `C:\`
- Case-insensitive (usually)

In [None]:
# Working with paths using pathlib (recommended modern approach)
# pathlib handles OS differences automatically

# Get current working directory as a Path object
current_path = Path.cwd()
print(f"Current directory: {current_path}")
print(f"Type: {type(current_path)}")

# Path components
print(f"\nPath components:")
print(f"  Drive: {current_path.drive}")
print(f"  Parent: {current_path.parent}")
print(f"  Name: {current_path.name}")
print(f"  Parts: {current_path.parts}")
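
The characteristics listed above can be observed directly with `PureWindowsPath`, which applies Windows path rules without touching the file system (so it behaves the same on any operating system). The path below is a hypothetical example.

```python
from pathlib import PureWindowsPath

# Hypothetical path; forward slashes are accepted and normalized to backslashes
p = PureWindowsPath("C:/Users/Name/data/file.csv")

print(p)         # C:\Users\Name\data\file.csv
print(p.drive)   # C:
print(p.anchor)  # C:\
print(p.parts)   # ('C:\\', 'Users', 'Name', 'data', 'file.csv')
print(p.suffix)  # .csv

# Comparisons follow Windows rules and ignore case
same = PureWindowsPath("C:/DATA/File.CSV") == PureWindowsPath("c:/data/file.csv")
print(same)      # True
```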

### 4.2 Creating and Joining Paths

Always use pathlib to join paths - it handles OS differences automatically.

In [None]:
# Creating paths using pathlib (cross-platform compatible)

# Method 1: Using the / operator (recommended)
data_path = Path.cwd() / "data" / "raw" / "sample.csv"
print(f"Data path (using /): {data_path}")

# Method 2: Using joinpath
results_path = Path.cwd().joinpath("results", "output.txt")
print(f"Results path (using joinpath): {results_path}")

# Method 3: From string (old way - avoid this)
# BAD: manual_path = "C:\\Users\\Name\\data\\file.csv"  # Windows only!
# GOOD: Use Path objects instead

# Check if path exists
print(f"\nDoes data path exist? {data_path.exists()}")
print(f"Does results path exist? {results_path.exists()}")
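
One concrete reason to avoid hand-written path strings is that the backslash is also Python's escape character. The sketch below uses a hypothetical path to show how a normal string literal can silently change what you typed, and two ways to avoid that.

```python
from pathlib import PureWindowsPath

# In a normal string literal, \n is a newline and \t is a tab,
# so this does NOT contain the text that was typed.
broken = "C:\new\table.csv"

# A raw string (r"...") keeps every backslash literally
raw = r"C:\new\table.csv"

# Doubling each backslash is equivalent to the raw string
escaped = "C:\\new\\table.csv"

print("\n" in broken)  # True: the string contains a newline character
print(raw == escaped)  # True

# Building the path from parts avoids the question entirely
built = PureWindowsPath("C:/") / "new" / "table.csv"
print(str(built) == raw)  # True
```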

### 4.3 Working with the Data Directory Structure

Let's check the directory structure for this project.

In [None]:
# Navigate to project root (parent of notebooks directory)
# This ensures we can access data/ directory from notebooks/

notebooks_dir = Path.cwd()
project_root = notebooks_dir.parent

print(f"Current notebook directory: {notebooks_dir}")
print(f"Project root: {project_root}")

# Check for expected directories
expected_dirs = ['data', 'notebooks', 'docs', 'tests']

print("\nProject structure:")
for dir_name in expected_dirs:
    dir_path = project_root / dir_name
    exists = "✓" if dir_path.exists() else "✗"
    print(f"  {exists} {dir_name}/")

# Check data subdirectories
data_dir = project_root / "data"
if data_dir.exists():
    print("\nData subdirectories:")
    for subdir in ['raw', 'processed', 'sample']:
        subdir_path = data_dir / subdir
        exists = "✓" if subdir_path.exists() else "✗"
        print(f"  {exists} data/{subdir}/")
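
The cell above assumes the notebook is running from the `notebooks/` folder. If it is started from somewhere else, `Path.cwd().parent` points at the wrong place. One possible alternative, sketched below, is to walk upward until a folder containing known marker directories is found. `find_project_root` is a hypothetical helper, and the default markers are an assumption based on the expected layout.

```python
from pathlib import Path


def find_project_root(start=None, markers=("data", "notebooks")):
    """Walk upward from `start` until a folder containing every marker is found.

    `markers` are directory names assumed to exist in the project root;
    adjust them to match your own layout. Returns None if nothing matches.
    """
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    for candidate in (start, *start.parents):
        if all((candidate / marker).is_dir() for marker in markers):
            return candidate
    return None


root = find_project_root()
print(f"Project root: {root if root is not None else 'not found'}")
```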

## 5. Environment Variables

Environment variables are key-value pairs that affect how programs run. They're crucial for:
- Finding installed programs (PATH)
- Storing configuration (API keys, database URLs)
- Controlling behavior (PYTHONPATH)

### 5.1 Reading Environment Variables

In [None]:
# Access environment variables using os.environ
# These variables control system and application behavior

# Common Windows environment variables
important_vars = [
    'USERNAME',      # Current user
    'COMPUTERNAME',  # Computer name
    'USERPROFILE',   # User home directory
    'TEMP',          # Temporary files directory
    'OS',            # Operating system name
]

print("Important Environment Variables:")
print("=" * 50)
for var in important_vars:
    value = os.environ.get(var, "Not set")
    print(f"{var}: {value}")
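
Environment variables are also a common way to pass configuration into a script or a child process. The sketch below reads a setting with a default and hands a modified environment to a child process. `DEMO_DATA_DIR` is a hypothetical variable name used only for this example, and the child is the current Python interpreter so the code runs on any machine.

```python
import os
import subprocess
import sys

# Read a setting, falling back to a default when the variable is not set
data_dir = os.environ.get("DEMO_DATA_DIR", "data/raw")
print(f"Data directory: {data_dir}")

# Modify a copy of the environment and pass it to a child process.
# Neither this process nor your Windows user settings are changed.
child_env = os.environ.copy()
child_env["DEMO_DATA_DIR"] = "data/sample"

result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_DATA_DIR'])"],
    capture_output=True,
    text=True,
    env=child_env,
)
child_value = result.stdout.strip()
print(f"Child process saw: {child_value}")
```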

### 5.2 The PATH Variable

The PATH variable tells Windows where to look for executable programs. Understanding PATH is essential for troubleshooting errors such as `'python' is not recognized as an internal or external command`.

In [None]:
# Display the PATH variable
# PATH determines where Windows looks for executable programs

path_variable = os.environ.get('PATH', '')

# Split by os.pathsep (';' on Windows) and drop empty entries
path_directories = [entry for entry in path_variable.split(os.pathsep) if entry]

print(f"PATH contains {len(path_directories)} directories:")
print("\nFirst 10 directories in PATH:")
for i, directory in enumerate(path_directories[:10], 1):
    print(f"  {i}. {directory}")

# Check if any PATH entry mentions Python (a rough heuristic, not proof that it runs)
python_in_path = any('python' in directory.lower() for directory in path_directories)
if python_in_path:
    print("\n✓ Python appears to be in PATH")
else:
    print("\n⚠ Python may not be in PATH (this is sometimes OK)")
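
Searching PATH entries for the word "python" is only a heuristic. The sketch below shows two complementary techniques: cleaning up a PATH string, and asking `shutil.which` where a command would actually be found. `split_path` is a hypothetical helper, and the sample PATH value is made up for illustration.

```python
import os
import shutil


def split_path(path_value, sep=os.pathsep):
    """Split a PATH string into directories, dropping blanks and duplicates."""
    seen = set()
    directories = []
    for entry in path_value.split(sep):
        entry = entry.strip()
        # Windows paths ignore case, and a trailing slash does not matter
        key = entry.lower().rstrip("\\/")
        if entry and key not in seen:
            seen.add(key)
            directories.append(entry)
    return directories


# Hypothetical PATH value with an empty entry and a duplicate
sample = r"C:\Python311;;C:\Windows\System32;c:\python311\;"
cleaned = split_path(sample, sep=";")
print(cleaned)  # ['C:\\Python311', 'C:\\Windows\\System32']

# shutil.which returns the full path of the first match on PATH,
# or None if the command cannot be found
print(shutil.which("python"))
```

On Windows, `shutil.which` also takes the `PATHEXT` variable into account, so it can locate `.exe`, `.bat`, and `.cmd` files by their bare name.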

## 6. Checking Installed Tools

Let's verify that essential tools for the course are installed.

In [None]:
# Check for common data science tools
# This helps identify what you need to install

def check_command(command, display_name=None):
    """
    Check if a command is available in the system.
    
    Args:
        command: Command to check (e.g., 'python', 'git')
        display_name: Friendly name to display (defaults to command)
    
    Returns:
        bool: True if command is available, False otherwise
    """
    if display_name is None:
        display_name = command
    
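    try:
        # Try to run the command with --version
        result = subprocess.run(
            [command, '--version'],
            capture_output=True,
            text=True,
            timeout=5  # Prevent hanging
        )
    except FileNotFoundError:
        # The executable could not be located at all
        print(f"✗ {display_name}: Not found on PATH")
        return False
    except subprocess.TimeoutExpired:
        # The executable exists but did not answer in time
        print(f"✗ {display_name}: Found, but did not respond within 5 seconds")
        return False

    if result.returncode == 0:
        # Get first line of output (usually version info);
        # fall back to stderr because some tools print the version there
        output = result.stdout or result.stderr
        version = output.split('\n')[0].strip()
        print(f"✓ {display_name}: {version}")
        return True

    # The executable ran but rejected '--version' or reported an error
    print(f"✗ {display_name}: Found, but '--version' returned an error")
    return False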

# Check essential tools
print("Checking installed tools:")
print("=" * 50)

tools = [
    ('python', 'Python'),
    ('pip', 'pip (Python package manager)'),
    ('git', 'Git'),
    ('conda', 'Conda (optional)'),
    ('powershell', 'PowerShell'),
]

for command, name in tools:
    check_command(command, name)

## 7. Practice Exercises

Now it's your turn! Complete these exercises to reinforce your learning.

### Exercise 1: System Information

Write code to display the following information:
1. Your username
2. Your computer name  
3. Your user profile directory
4. The current date and time

**Hint**: Use `os.environ.get()` for environment variables and `datetime` module for date/time.

In [None]:
# Exercise 1: Your solution here
from datetime import datetime

# TODO: Get and display the required information




### Exercise 2: Directory Navigation

Using `pathlib.Path`:
1. Get the current working directory
2. Navigate to the parent directory (project root)
3. List all items in the project root
4. Count how many are files vs directories

**Hint**: Use `.is_file()` and `.is_dir()` methods.

In [None]:
# Exercise 2: Your solution here

# TODO: Complete the exercise




### Exercise 3: Running Commands

Use `subprocess.run()` to:
1. Check your Python version (command: `python --version`)
2. List all Python packages installed (command: `pip list`)
3. Count how many packages are installed

**Hint**: Parse the output string and count lines (excluding header).

In [None]:
# Exercise 3: Your solution here

# TODO: Complete the exercise




### Exercise 4: Path Creation

Create paths for the following (don't actually create the files/folders):
1. A CSV file named `sales_2024.csv` in the `data/raw/` directory
2. A text file named `results.txt` in the `data/processed/` directory
3. Check if these paths' parent directories exist

**Hint**: Use `Path.cwd().parent` to get project root, then build paths from there.

In [None]:
# Exercise 4: Your solution here

# TODO: Complete the exercise




## 8. Summary

Congratulations! You've completed Module 00. Let's recap what you learned:

### Key Concepts

1. **Why Windows Skills Matter**
   - Automation saves time
   - Essential for professional data science work
   - Windows is widely used in enterprise environments

2. **Running System Commands**
   - Use `subprocess.run()` to execute Windows commands
   - Capture and process output
   - Check return codes for success/failure

3. **Working with Paths**
   - Always use `pathlib.Path` (cross-platform)
   - Use `/` operator to join paths
   - Check existence with `.exists()`

4. **Environment Variables**
   - Access with `os.environ.get()`
   - PATH determines where programs are found
   - Useful for configuration

5. **System Information**
   - Use `platform` module for system details
   - Check installed tools programmatically
   - Verify your development environment

### What's Next?

In **Module 01: PowerShell Fundamentals**, you'll learn:
- PowerShell cmdlets and syntax
- Object-oriented command pipeline
- Working with CSV files
- Creating simple automation scripts

### Self-Assessment

Before moving on, make sure you can:
- [ ] Execute Windows commands from Python
- [ ] Work with file paths using pathlib
- [ ] Access environment variables
- [ ] Check for installed programs
- [ ] Navigate your project directory structure

### Additional Resources

- [Python subprocess documentation](https://docs.python.org/3/library/subprocess.html)
- [Python pathlib documentation](https://docs.python.org/3/library/pathlib.html)
- [Windows Command Reference](https://docs.microsoft.com/windows-server/administration/windows-commands/windows-commands)

---

**Ready for the next module?** Open `01_powershell_fundamentals.ipynb` to continue your learning journey!