# Module 01: PowerShell Fundamentals for Data Science

**Difficulty**: ⭐⭐ (Intermediate)

**Estimated Time**: 60 minutes

**Prerequisites**: 
- Completed Module 00: Setup & Introduction
- Basic understanding of command line concepts
- Familiarity with Python's `subprocess` module

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** the difference between CMD and PowerShell
2. **Execute** PowerShell commands from Python
3. **Use** PowerShell cmdlets for file operations
4. **Work with** PowerShell objects and pipelines
5. **Process** CSV files using PowerShell
6. **Create** simple PowerShell scripts for automation

## 1. PowerShell vs CMD: Why PowerShell Matters

### CMD (Command Prompt)
- Legacy command line interface
- Text-based input/output
- Limited scripting capabilities
- Commands like: `dir`, `copy`, `del`

### PowerShell
- Modern, object-oriented shell
- Commands pass **objects**, not just text
- Full scripting language (.NET integration)
- Cross-platform since PowerShell Core 6 (today PowerShell 7+, launched as `pwsh`)
- Cmdlets like: `Get-ChildItem`, `Copy-Item`, `Remove-Item`

### Real-World Example

**Task**: Get files larger than 10MB

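**CMD approach**: Fragile text parsing with `findstr`. The pattern below matches any comma-grouped number of eight or more digits in the `dir` output (roughly 10,000,000 bytes and up), so it depends on the locale's thousands separator and also matches the byte totals in the summary lines.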
```batch
dir /s | findstr /R "[0-9][0-9],[0-9][0-9][0-9],[0-9][0-9][0-9]"
```

**PowerShell approach**: Simple object filtering
```powershell
Get-ChildItem -Recurse | Where-Object {$_.Length -gt 10MB}
```

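Because the filter works on the numeric `Length` property rather than on formatted text, the PowerShell version is **shorter, more readable, and independent of how sizes are displayed**.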

## 2. Setup: Running PowerShell from Python

In [None]:
# Setup cell: Import required libraries
import subprocess
import json
from pathlib import Path
import pandas as pd

print("Setup complete!")

In [None]:
# Helper function to run PowerShell commands
# We'll use this throughout the notebook

def run_powershell(command, capture_output=True):
    """
    Execute a PowerShell command and return the result.
    
    Args:
        command: PowerShell command to execute (string)
        capture_output: Whether to capture output (default: True)
    
    Returns:
        subprocess.CompletedProcess object with stdout, stderr, returncode
    """
    result = subprocess.run(
        ['powershell', '-Command', command],
        capture_output=capture_output,
        text=True,
        encoding='utf-8'
    )
    return result

# Test the function
result = run_powershell('Write-Output "Hello from PowerShell!"')
print(result.stdout.strip())
print("\n✓ PowerShell helper function ready!")
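
The helper above returns whatever PowerShell produced, even when the command failed. As a minimal sketch, assuming you would rather get an exception than inspect `returncode` by hand, a stricter variant might look like the following. The names `PowerShellError`, `run_powershell_checked`, and the `runner` parameter are illustrative and not part of this course's helper; `-NoProfile` is added so that a user profile script cannot alter the output.

```python
import shutil
import subprocess


class PowerShellError(RuntimeError):
    """Raised when a PowerShell command exits with a non-zero return code."""


def run_powershell_checked(command, runner=subprocess.run):
    """Run a command like run_powershell, but raise if PowerShell reports failure.

    `runner` is a hypothetical injection point (defaults to subprocess.run)
    so the function can be exercised without PowerShell installed.
    """
    result = runner(
        ['powershell', '-NoProfile', '-Command', command],
        capture_output=True,
        text=True,
        encoding='utf-8',
    )
    if result.returncode != 0:
        # Prefer PowerShell's own error text; fall back to the exit code
        raise PowerShellError(result.stderr.strip() or f"exit code {result.returncode}")
    return result.stdout


# Only call PowerShell when it is actually on PATH
if shutil.which('powershell'):
    print(run_powershell_checked('Write-Output "checked call works"').strip())
```

Some non-terminating PowerShell errors may still leave the return code at 0, so printing `stderr` while debugging remains worthwhile.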

## 3. PowerShell Basics: Cmdlets

### What are Cmdlets?

Cmdlets (command-lets) are PowerShell commands that follow a **Verb-Noun** naming convention:
- `Get-Process` - Get information about processes
- `Set-Location` - Change directory
- `Copy-Item` - Copy files/folders
- `Remove-Item` - Delete files/folders

This makes PowerShell **self-documenting** - you can guess what a cmdlet does from its name!

### 3.1 Getting PowerShell Version

In [None]:
# Check PowerShell version
# This helps ensure compatibility with modern features

result = run_powershell('$PSVersionTable.PSVersion')
print("PowerShell Version:")
print(result.stdout)

# Parse version number
result_json = run_powershell('$PSVersionTable.PSVersion | ConvertTo-Json')
version_info = json.loads(result_json.stdout)

major_version = version_info.get('Major', 0)
print(f"\nMajor version: {major_version}")

if major_version >= 6:
    print("✓ You have PowerShell 6+ (modern, cross-platform)")
elif major_version >= 5:
    print("✓ You have Windows PowerShell 5.x (good for this course)")
else:
    print("⚠ Consider upgrading to Windows PowerShell 5.1 or PowerShell 7+")
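
Note that the `powershell` executable used by the helper launches Windows PowerShell, while PowerShell 6 and later install under the name `pwsh`. A minimal sketch for picking whichever is available follows; `find_powershell` and its `which` parameter are hypothetical names introduced here for illustration.

```python
import shutil


def find_powershell(which=shutil.which):
    """Return the first PowerShell executable found on PATH, or None.

    Prefers 'pwsh' (PowerShell 6+) and falls back to 'powershell'
    (Windows PowerShell). `which` is injectable so the lookup can be tested.
    """
    for name in ('pwsh', 'powershell'):
        path = which(name)
        if path:
            return path
    return None


print(f"PowerShell executable: {find_powershell()}")
```

The returned path could replace the hard-coded `'powershell'` in the argument list passed to `subprocess.run`.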

### 3.2 Basic Navigation Cmdlets

In [None]:
# Get current location (like 'pwd' in bash or 'cd' in CMD)
result = run_powershell('Get-Location')
print("Current directory (PowerShell):")
print(result.stdout.strip())

# Compare with Python
print(f"\nCurrent directory (Python):")
print(Path.cwd())

# PowerShell has aliases that work like Unix/CMD commands
result = run_powershell('pwd')  # Alias for Get-Location
print(f"\nUsing 'pwd' alias:")
print(result.stdout.strip())

### 3.3 Listing Files and Directories

In [None]:
# Get-ChildItem is the PowerShell equivalent of 'ls' or 'dir'
# It returns FILE OBJECTS, not just text!

# List files in current directory
result = run_powershell('Get-ChildItem | Select-Object Name, Length, LastWriteTime')
print("Files in current directory:")
print(result.stdout)

# You can also use the 'ls' or 'dir' aliases
result = run_powershell('ls')
print("\nUsing 'ls' alias (first 500 chars):")
print(result.stdout[:500])

## 4. PowerShell Objects and Pipelines

### The Power of Object Pipelines

Unlike CMD where everything is text, PowerShell passes **objects** through the pipeline. This means you can:
- Filter by properties
- Sort by any field
- Select specific properties
- Perform calculations

**Pipeline operator**: `|` (pipe)

Example chain:
```powershell
Get-ChildItem | Where-Object {$_.Length -gt 1MB} | Sort-Object Length -Descending
```

This reads naturally: "Get files | Filter large ones | Sort by size"
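
For readers who think in Python, the same three stages can be sketched with `pathlib`. This is only an analogy, not how PowerShell works internally; `large_files` and `ONE_MB` are names made up for this example.

```python
from pathlib import Path

ONE_MB = 1024 * 1024  # PowerShell's 1MB literal is 1,048,576 bytes


def large_files(directory, min_bytes=ONE_MB):
    """Return (name, size) pairs for files larger than min_bytes, largest first."""
    # Get-ChildItem: collect each file together with its size
    sizes = [(p.name, p.stat().st_size) for p in Path(directory).iterdir() if p.is_file()]
    # Where-Object: keep only the entries above the threshold
    filtered = [item for item in sizes if item[1] > min_bytes]
    # Sort-Object Length -Descending
    return sorted(filtered, key=lambda item: item[1], reverse=True)


for name, size in large_files('.'):
    print(f"{name}: {size:,} bytes")
```

Each stage receives structured values (names and integer sizes) rather than display text, which is the property the PowerShell pipeline gives you without extra code.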

### 4.1 Filtering with Where-Object

In [None]:
# Find .ipynb files (Jupyter notebooks) in current directory
# Where-Object filters based on object properties

command = '''
Get-ChildItem | 
Where-Object {$_.Extension -eq '.ipynb'} | 
Select-Object Name, Length, LastWriteTime
'''

result = run_powershell(command)
print("Jupyter notebook files:")
print(result.stdout)

# The $_ variable represents the current object in the pipeline
# $_.Extension gets the Extension property of each file object
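
The cell above prints a formatted table, which is awkward to process further in Python. One option is to end the pipeline with `ConvertTo-Json` (for example after `Select-Object Name, Length`) and parse the result. The sketch below assumes that approach; `parse_ps_json` is a hypothetical helper, and the sample string stands in for captured output.

```python
import json


def parse_ps_json(stdout):
    """Parse ConvertTo-Json output into a list of dicts.

    ConvertTo-Json emits a bare object for a single pipeline item, an array
    for several, and nothing at all for an empty pipeline; this normalizes
    all three cases to a list.
    """
    text = stdout.strip()
    if not text:
        return []
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical captured output for one matching file
sample = '{"Name": "01_powershell.ipynb", "Length": 20480}'
print(parse_ps_json(sample))
```

Normalizing to a list means the calling code, such as `pd.DataFrame(parse_ps_json(result.stdout))`, does not need a special case for zero or one match.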

### 4.2 Sorting with Sort-Object

In [None]:
# Get files sorted by size (largest first)
# Sort-Object sorts by any property

command = '''
Get-ChildItem -File | 
Sort-Object Length -Descending | 
Select-Object Name, @{Name="SizeKB";Expression={[math]::Round($_.Length/1KB, 2)}} -First 5
'''

result = run_powershell(command)
print("Top 5 largest files:")
print(result.stdout)

# Note: We used a calculated property to show size in KB
# @{Name="SizeKB"; Expression={...}} creates a new property

### 4.3 Measuring with Measure-Object

In [None]:
# Calculate statistics about files
# Measure-Object performs calculations on object properties

command = '''
Get-ChildItem -File | 
Measure-Object -Property Length -Sum -Average -Maximum -Minimum | 
ConvertTo-Json
'''

result = run_powershell(command)
stats = json.loads(result.stdout)

print("File statistics in current directory:")
print(f"  Count: {stats['Count']}")
print(f"  Total size: {stats['Sum']:,.0f} bytes ({stats['Sum']/1024/1024:.2f} MB)")
print(f"  Average size: {stats['Average']:,.0f} bytes")
print(f"  Largest file: {stats['Maximum']:,.0f} bytes")
print(f"  Smallest file: {stats['Minimum']:,.0f} bytes")

## 5. Working with CSV Files in PowerShell

PowerShell has excellent built-in CSV support, which is perfect for data science workflows!

### 5.1 Creating Sample CSV Data

In [None]:
# First, let's create a sample CSV file using pandas
# This simulates real data you might work with

sample_data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Department': ['Data Science', 'Engineering', 'Data Science', 'Marketing', 'Engineering'],
    'Salary': [95000, 85000, 92000, 78000, 88000],
    'YearsExperience': [5, 3, 4, 2, 3]
})

# Save to data directory
csv_path = Path.cwd().parent / 'data' / 'sample' / 'employees.csv'
csv_path.parent.mkdir(parents=True, exist_ok=True)
sample_data.to_csv(csv_path, index=False)

print(f"Created sample CSV at: {csv_path}")
print(f"\nPreview of data:")
print(sample_data)

### 5.2 Reading CSV with PowerShell

In [None]:
# Import-Csv reads CSV files and creates objects
# Each row becomes an object with properties!

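# PowerShell's escape character is the backtick, not the backslash,
# so the Windows path can be passed as-is without doubling backslashes
csv_path_str = str(csv_path)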
command = f'''
Import-Csv "{csv_path_str}" | Format-Table
'''

result = run_powershell(command)
print("CSV data read by PowerShell:")
print(result.stdout)
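
Embedding a path in a command string is a quoting problem: inside PowerShell double quotes, `$` and the backtick are still interpreted, while inside single quotes nothing is expanded and only the single quote itself must be doubled. A minimal sketch of that rule, where `ps_quote` is a hypothetical helper and the path is made up to show the awkward characters:

```python
def ps_quote(value):
    """Wrap value in a PowerShell single-quoted string literal.

    Inside single quotes PowerShell expands nothing; the only character
    that needs escaping is the single quote itself, which is doubled.
    """
    return "'" + str(value).replace("'", "''") + "'"


# Hypothetical path containing characters that double quotes would mishandle
path = r"C:\data\$archive\o'brien.csv"
print(f"Import-Csv -LiteralPath {ps_quote(path)}")
```

`-LiteralPath` additionally tells `Import-Csv` not to treat characters such as `[` and `]` in the path as wildcards.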

### 5.3 Filtering CSV Data

In [None]:
# Filter for Data Science department employees
# This is like pandas: df[df['Department'] == 'Data Science']

command = f'''
Import-Csv "{csv_path_str}" | 
Where-Object {{$_.Department -eq "Data Science"}} | 
Format-Table
'''

result = run_powershell(command)
print("Data Science employees:")
print(result.stdout)

### 5.4 Calculating Statistics on CSV Data

In [None]:
# Calculate average salary by department
# This demonstrates PowerShell's data analysis capabilities

command = f'''
$data = Import-Csv "{csv_path_str}"
$data | Group-Object Department | ForEach-Object {{
    [PSCustomObject]@{{
        Department = $_.Name
        AvgSalary = ($_.Group | Measure-Object Salary -Average).Average
        Count = $_.Count
    }}
}} | Format-Table
'''

result = run_powershell(command)
print("Average salary by department:")
print(result.stdout)
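
To cross-check the PowerShell result, the same grouping can be sketched with Python's standard library. The rows below repeat the sample data created in section 5.1; `average_salary_by_department` is a name chosen for this example.

```python
import csv
import io
from collections import defaultdict


def average_salary_by_department(csv_text):
    """Return {department: (average_salary, count)} from CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Like Import-Csv, DictReader yields strings, so convert before doing math
        groups[row['Department']].append(float(row['Salary']))
    return {dept: (sum(vals) / len(vals), len(vals)) for dept, vals in groups.items()}


# Same rows as the sample employees.csv
SAMPLE = """Name,Department,Salary,YearsExperience
Alice,Data Science,95000,5
Bob,Engineering,85000,3
Charlie,Data Science,92000,4
Diana,Marketing,78000,2
Eve,Engineering,88000,3
"""

for dept, (avg, count) in average_salary_by_department(SAMPLE).items():
    print(f"{dept}: average {avg:,.0f} across {count} employee(s)")
```

The dictionary of lists plays the role of `Group-Object`, and the final comprehension corresponds to the `ForEach-Object` block that builds one summary object per group.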

### 5.5 Exporting Filtered Data

In [None]:
# Export filtered data to new CSV
# Export-Csv writes objects to CSV format

output_path = Path.cwd().parent / 'data' / 'sample' / 'high_earners.csv'
# No backslash doubling needed: backslash is not an escape character in PowerShell
output_path_str = str(output_path)

command = f'''
Import-Csv "{csv_path_str}" | 
Where-Object {{[int]$_.Salary -gt 90000}} | 
Export-Csv "{output_path_str}" -NoTypeInformation
'''

result = run_powershell(command)
print(f"Exported high earners to: {output_path}")

# Verify with pandas
if output_path.exists():
    high_earners = pd.read_csv(output_path)
    print(f"\nVerification (read with pandas):")
    print(high_earners)
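
The `[int]` cast in the filter matters because `Import-Csv` delivers every field as a string, and comparing text is not the same as comparing numbers. The Python sketch below illustrates the pitfall by analogy; `salary_exceeds` is a hypothetical helper, and the exact PowerShell comparison rules are described in its own documentation.

```python
def salary_exceeds(salary_text, threshold):
    """Compare a CSV salary field numerically, mirroring [int]$_.Salary -gt threshold."""
    return int(salary_text) > threshold


# Text comparison goes character by character, so '1' versus '9' decides the result
print("100000" > "90000")
# Converting first compares the numbers themselves
print(salary_exceeds("100000", 90000))
```

With the sample data all salaries have five digits, so the problem would stay hidden until a six-digit value appears.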

## 6. PowerShell Variables and Expressions

### 6.1 Variables in PowerShell

In [None]:
# PowerShell variables start with $
# They can store any .NET object

command = '''
$name = "Data Science"
$count = 42
$pi = 3.14159

Write-Output "Name: $name"
Write-Output "Count: $count"
Write-Output "Pi: $pi"
Write-Output "Doubled: $($count * 2)"
'''

result = run_powershell(command)
print("PowerShell variables:")
print(result.stdout)

### 6.2 Arrays and Hash Tables

In [None]:
# PowerShell has arrays and hash tables (dictionaries)

command = '''
# Array
$tools = @("Python", "PowerShell", "Git", "Jupyter")
Write-Output "Tools: $($tools -join ", ")"
Write-Output "First tool: $($tools[0])"

# Hash table (like Python dict)
$person = @{
    Name = "Alice"
    Role = "Data Scientist"
    Salary = 95000
}

Write-Output "`nPerson info:"
Write-Output "Name: $($person.Name)"
Write-Output "Role: $($person.Role)"
Write-Output "Salary: $($person.Salary)"
'''

result = run_powershell(command)
print(result.stdout)

## 7. Creating PowerShell Scripts

PowerShell scripts (.ps1 files) let you save and reuse your automation code.

### 7.1 Simple Automation Script

In [None]:
# Create a PowerShell script that analyzes CSV files
# This script could be run independently or scheduled

script_content = '''
# analyze_data.ps1
# Analyzes employee data and generates a report

param(
    [string]$InputFile = "data/sample/employees.csv",
    [string]$OutputFile = "data/sample/report.txt"
)

Write-Output "Analyzing data from: $InputFile"
Write-Output "" | Out-File $OutputFile

# Load data
$data = Import-Csv $InputFile

# Generate report
"Employee Data Analysis Report" | Out-File $OutputFile
"Generated: $(Get-Date)" | Out-File $OutputFile -Append
"=" * 50 | Out-File $OutputFile -Append
"" | Out-File $OutputFile -Append

# Total employees
"Total Employees: $($data.Count)" | Out-File $OutputFile -Append

# Departments
$depts = $data | Group-Object Department
"" | Out-File $OutputFile -Append
"Employees by Department:" | Out-File $OutputFile -Append
foreach ($dept in $depts) {
    "  $($dept.Name): $($dept.Count)" | Out-File $OutputFile -Append
}

# Salary stats
$salaryStats = $data | Measure-Object Salary -Average -Maximum -Minimum
"" | Out-File $OutputFile -Append
"Salary Statistics:" | Out-File $OutputFile -Append
# Single-quoted format strings: inside double quotes, ${0:N2} would be parsed as a variable reference
'  Average: ${0:N2}' -f $salaryStats.Average | Out-File $OutputFile -Append
'  Highest: ${0:N2}' -f $salaryStats.Maximum | Out-File $OutputFile -Append
'  Lowest: ${0:N2}' -f $salaryStats.Minimum | Out-File $OutputFile -Append

Write-Output "Report saved to: $OutputFile"
'''

# Save script
script_path = Path.cwd().parent / 'data' / 'sample' / 'analyze_data.ps1'
script_path.write_text(script_content)

print(f"Created PowerShell script: {script_path}")

### 7.2 Running the Script

In [None]:
# Execute the PowerShell script
# Note: if the execution policy blocks scripts, this call fails and the
# reason is reported in result.stderr rather than result.stdout

# Paths are passed as-is: backslash is not an escape character in PowerShell
script_path_str = str(script_path)
csv_path_str = str(csv_path)
report_path = Path.cwd().parent / 'data' / 'sample' / 'report.txt'
report_path_str = str(report_path)

command = f'& "{script_path_str}" -InputFile "{csv_path_str}" -OutputFile "{report_path_str}"'

result = run_powershell(command)
print("Script output:")
print(result.stdout)

# Read and display the report
if report_path.exists():
    print("\nGenerated Report:")
    print("=" * 50)
    print(report_path.read_text())
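
If the execution policy blocks the script, one possible workaround is to launch it with `-File` and a per-process `-ExecutionPolicy Bypass`. The sketch below only builds the argument list; `build_script_argv` is a hypothetical helper, the file names are placeholders, and whether bypassing the policy is allowed in your environment is an assumption you should verify (a policy enforced by Group Policy may still take precedence).

```python
def build_script_argv(script_path, **params):
    """Build an argument list that runs a .ps1 file with named parameters.

    Each keyword becomes '-Name value'. '-ExecutionPolicy Bypass' applies
    only to the launched process; it is an assumption about what your
    environment permits, not a requirement of the script.
    """
    argv = ['powershell', '-NoProfile', '-ExecutionPolicy', 'Bypass',
            '-File', str(script_path)]
    for name, value in params.items():
        argv += [f'-{name}', str(value)]
    return argv


# Placeholder file names for illustration
argv = build_script_argv('analyze_data.ps1',
                         InputFile='employees.csv',
                         OutputFile='report.txt')
print(argv)
```

The list can be handed to `subprocess.run(argv, capture_output=True, text=True)`. Because each value is a separate list element, paths containing spaces need no extra quoting.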

## 8. Practice Exercises

### Exercise 1: File Analysis

Use PowerShell to:
1. Find all `.ipynb` files in the notebooks directory
2. Calculate total size of all notebooks
3. List them sorted by modification date (newest first)

**Hint**: Use `Get-ChildItem`, `Where-Object`, `Measure-Object`, and `Sort-Object`

In [None]:
# Exercise 1: Your solution here

command = '''
# TODO: Write PowerShell command to analyze notebook files

'''

# result = run_powershell(command)
# print(result.stdout)


### Exercise 2: CSV Filtering

Using the employees.csv file:
1. Filter for employees with 3+ years of experience
2. Export to a new CSV file
3. Calculate the average salary for this group

**Hint**: Use `Import-Csv`, `Where-Object`, `Export-Csv`, and `Measure-Object`

In [None]:
# Exercise 2: Your solution here

# TODO: Write PowerShell command to filter and analyze CSV



### Exercise 3: Department Summary

Create a PowerShell command that:
1. Groups employees by department
2. For each department, shows:
   - Department name
   - Number of employees
   - Average salary
   - Total salary budget

**Hint**: Use `Group-Object` and calculated properties

In [None]:
# Exercise 3: Your solution here

# TODO: Write PowerShell command for department summary



### Exercise 4: Custom Script

Create a PowerShell script (.ps1) that:
1. Takes a directory path as parameter
2. Finds all files modified in the last 7 days
3. Groups them by file extension
4. Outputs a summary report

**Hint**: Use `param()`, `Get-Date`, and `Group-Object`

In [None]:
# Exercise 4: Your solution here

script_content = '''
# TODO: Write your PowerShell script

'''

# Save and test your script


## 9. Summary

Congratulations! You've completed Module 01. Let's recap:

### Key Concepts

1. **PowerShell vs CMD**
   - PowerShell is object-oriented (vs text-based)
   - Cmdlets follow Verb-Noun naming
   - Filtering and sorting act on object properties instead of parsed text

2. **Cmdlets You Learned**
   - `Get-ChildItem` - List files/folders
   - `Where-Object` - Filter objects
   - `Sort-Object` - Sort by properties
   - `Measure-Object` - Calculate statistics
   - `Select-Object` - Choose properties to display
   - `Group-Object` - Group by property

3. **CSV Operations**
   - `Import-Csv` - Read CSV files
   - `Export-Csv` - Write CSV files
   - Filter, sort, and analyze like pandas

4. **PowerShell Scripting**
   - Variables with `$`
   - Arrays and hash tables
   - Script files (.ps1)
   - Parameters with `param()`

5. **Integration with Python**
   - Execute PowerShell from Python
   - Capture and process output
   - Combine strengths of both

### Real-World Applications

- Automate data preprocessing
- Batch process CSV files
- Generate reports
- Manage file operations
- Schedule data collection

### What's Next?

In **Module 02: Python-Windows Integration**, you'll learn:
- Advanced subprocess techniques
- Virtual environment management
- pywin32 library for Windows APIs
- Combining Python and PowerShell workflows

### Self-Assessment

Before moving on, make sure you can:
- [ ] Execute PowerShell commands from Python
- [ ] Use pipelines to filter and sort data
- [ ] Import and analyze CSV files
- [ ] Create PowerShell variables and arrays
- [ ] Write basic PowerShell scripts

### Additional Resources

- [PowerShell Documentation](https://docs.microsoft.com/powershell/)
- [PowerShell Gallery](https://www.powershellgallery.com/)
- [Learn PowerShell in Y Minutes](https://learnxinyminutes.com/docs/powershell/)

---

**Ready for more?** Continue to `02_python_windows_integration.ipynb`!