# Homework 07 Storage Management

This notebook helps you manage storage used by previous lessons and homework assignments.

## What This Notebook Does:

1. **Storage Report** - Shows detailed storage usage for:
   - `~/home_workspace/data/`, `~/home_workspace/downloads/`, `~/home_workspace/models/`
   - `~/cs_workspace/downloads/` (compute server only - doesn't exist on base CoCalc)
   - All `Lesson_XX_Models` folders (Lessons 1-7)
   - All `Homework_XX_Models` folders (Homework 1-6)

2. **Clear Storage** - Removes contents of those specific folders to free up space
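
The model folders listed in item 1 are located by name pattern rather than by a fixed path. As a rough sketch of how such a pattern behaves with `Path.glob`, the hypothetical helper `find_models_dirs` below is run against a throwaway directory; the lesson folder name `Lesson_03_Transfer_Learning` is made up for illustration.

```python
import tempfile
from pathlib import Path

def find_models_dirs(parent, prefix, number):
    """Return folders directly under `parent` named like '<prefix>_NN*_Models'."""
    pattern = f"{prefix}_{number:02d}*_Models"
    return sorted(p for p in Path(parent).glob(pattern) if p.is_dir())

# Demonstration in a temporary directory (not your real course folders)
with tempfile.TemporaryDirectory() as tmp:
    lesson = Path(tmp) / "Lesson_03_Transfer_Learning"   # hypothetical name
    (lesson / "Lesson_03_Models").mkdir(parents=True)
    (lesson / "Lesson_03_notes.txt").write_text("not a models folder")

    matches = find_models_dirs(lesson, "Lesson", 3)
    print([p.name for p in matches])
```

The `*` in the pattern also matches an empty string, so both `Lesson_03_Models` and a longer name ending in `_Models` would be picked up.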

## Important Notes:

✅ **Safe to Delete**: Everything analyzed here can be restored:
- **Data & Downloads**: Re-downloaded automatically when needed
- **Model Checkpoints**: Retrained from scratch or found in CoCalc backups

💾 **Before Deleting**: If you want to keep your trained models:
- Use CoCalc's file browser to zip model folders
- Download the zip files to your local machine
- CoCalc's backup system also maintains copies
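
If you would rather create the zip from a notebook cell than from the file browser, a minimal sketch using the standard library's `shutil.make_archive` could look like the following. The helper name `archive_folder` and the example paths are hypothetical; adjust them to your own folder layout.

```python
import shutil
from pathlib import Path

def archive_folder(folder, dest_dir):
    """Zip `folder` into dest_dir/<folder name>.zip and return the zip path."""
    folder = Path(folder).resolve()
    dest_dir = Path(dest_dir).resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to base_name on its own
    base_name = dest_dir / folder.name
    zip_path = shutil.make_archive(
        str(base_name), "zip",
        root_dir=folder.parent,   # archive paths are relative to this directory
        base_dir=folder.name,     # only this folder is included
    )
    return Path(zip_path)

# Example call (hypothetical paths; keep dest_dir outside the folder being zipped):
# archive_folder(Path.home() / "Homework" / "Homework_06" / "Homework_06_Models",
#                Path.home() / "model_archives")
```

The original folder is left in place, so you can confirm the zip opens correctly before clearing anything.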

⚠️ **Protected Files**: This notebook will NOT touch:
- `home_workspace/api_keys.env` (your API keys are safe!)
- Any other files in the root of home_workspace or cs_workspace
- Current homework models (Homework_07_Models)
- Future homework/lesson models (Lessons 8-12, Homework 8-12)
- Your homework assignment notebooks

**Only these specific subfolders are cleared:**
- `home_workspace/data/` ← Only this subfolder
- `home_workspace/downloads/` ← Only this subfolder  
- `home_workspace/models/` ← Only this subfolder
- `cs_workspace/downloads/` ← Only this subfolder
- Lesson/Homework model folders ← All contents removed; the empty folder itself is kept
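
To make "only this subfolder" concrete, here is a small sketch of clearing one folder's contents while leaving the folder and its siblings alone. The helper `clear_folder_contents` is an illustrative stand-in, not necessarily identical to the cleanup cell further down, and the demo runs in a temporary directory with a placeholder `api_keys.env`.

```python
import shutil
import tempfile
from pathlib import Path

def clear_folder_contents(folder):
    """Delete everything inside `folder` but keep the folder itself."""
    removed = 0
    for item in Path(folder).iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)   # real subdirectory: remove recursively
        else:
            item.unlink()         # regular file or symlink
        removed += 1
    return removed

# Demonstration in a throwaway directory (not your real workspace)
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "home_workspace"
    (workspace / "data" / "cifar").mkdir(parents=True)
    (workspace / "data" / "cifar" / "train.bin").write_bytes(b"\0" * 1024)
    (workspace / "api_keys.env").write_text("EXAMPLE_KEY=placeholder\n")

    clear_folder_contents(workspace / "data")

    print((workspace / "data").exists())          # the folder is kept
    print(list((workspace / "data").iterdir()))   # its contents are gone
    print((workspace / "api_keys.env").exists())  # the sibling file is untouched
```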

---

In [None]:
# Setup and imports
from pathlib import Path
import os
import shutil

def format_size(bytes_size):
    """Format bytes to human-readable string."""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if bytes_size < 1024.0:
            return f"{bytes_size:.2f} {unit}"
        bytes_size /= 1024.0
    return f"{bytes_size:.2f} TB"

def get_folder_size(path):
    """Calculate total size of a folder in bytes."""
    total = 0
    if path.exists():
        for dirpath, dirnames, filenames in os.walk(path):
            for filename in filenames:
                filepath = Path(dirpath) / filename
                if filepath.exists():
                    try:
                        total += filepath.stat().st_size
                    except OSError:
                        # Skip files that vanish or are unreadable mid-scan
                        pass
    return total

# Get course root and workspace paths
course_root = Path(os.environ.get('DS776_ROOT_DIR', Path.home()))
home_workspace = course_root / "home_workspace"
cs_workspace = Path.home() / "cs_workspace"

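# Determine current homework number from a folder name like "Homework_07"; default to 7
_name_parts = Path.cwd().name.split('_')
if _name_parts[0] == 'Homework' and len(_name_parts) > 1 and _name_parts[1].isdigit():
    current_hw = int(_name_parts[1])
else:
    current_hw = 7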

print(f"📚 Storage Management for Homework {current_hw:02d}")
print(f"📁 Course root: {course_root}")
print("✅ Ready to analyze storage!")

## 1. Comprehensive Storage Report

This cell analyzes storage usage across all previous lessons and homework, plus workspace folders.

In [None]:
# Storage analysis
print("=" * 70)
print("📊 DS776 STORAGE REPORT - Previous Lessons & Homework")
print("=" * 70)

storage_items = []
total_storage = 0

# 1. Analyze home_workspace
print("\n📦 HOME WORKSPACE (~/home_workspace)")
print("-" * 70)

if home_workspace.exists():
    hw_data = home_workspace / "data"
    hw_downloads = home_workspace / "downloads"
    hw_models = home_workspace / "models"
    
    for folder in [hw_data, hw_downloads, hw_models]:
        if folder.exists():
            size = get_folder_size(folder)
            storage_items.append((folder, size))
            total_storage += size
            print(f"  • {folder.name:20s} {format_size(size):>12s}    [{folder}]")
        else:
            print(f"  • {folder.name:20s} {'(not found)':>12s}")
else:
    print("  ⚠️ home_workspace not found")

# 2. Analyze cs_workspace (compute server only)
print("\n💾 COMPUTE SERVER WORKSPACE (~/cs_workspace)")
print("-" * 70)

if cs_workspace.exists():
    cs_downloads = cs_workspace / "downloads"
    if cs_downloads.exists():
        size = get_folder_size(cs_downloads)
        storage_items.append((cs_downloads, size))
        total_storage += size
        print(f"  • downloads          {format_size(size):>12s}    [{cs_downloads}]")
    else:
        print("  • downloads          (not found)")
else:
    print("  ℹ️ cs_workspace not found (only exists on compute servers)")

# 3. Analyze Lesson Models (up to current lesson)
print(f"\n📚 LESSON MODELS (Lessons 1-{current_hw})")
print("-" * 70)

lessons_dir = course_root / "Lessons"
lesson_count = 0

if lessons_dir.exists():
    # Look for Lesson_01 through current lesson
    for i in range(1, current_hw + 1):
        # Find lesson directory (may have description after number)
        lesson_pattern = f"Lesson_{i:02d}*"
        lesson_dirs = sorted(lessons_dir.glob(lesson_pattern))
        
        if lesson_dirs:
            lesson_dir = lesson_dirs[0]
            # Look for models folder
            models_pattern = f"Lesson_{i:02d}*_Models"
            models_dirs = list(lesson_dir.glob(models_pattern))
            
            if models_dirs:
                models_dir = models_dirs[0]
                size = get_folder_size(models_dir)
                if size > 0:
                    storage_items.append((models_dir, size))
                    total_storage += size
                    lesson_count += 1
                    print(f"  • Lesson {i:02d} Models    {format_size(size):>12s}    [{models_dir.name}]")
    
    if lesson_count == 0:
        print("  ℹ️ No lesson models found")
else:
    print("  ⚠️ Lessons directory not found")

# 4. Analyze Homework Models (up to previous homework)
print(f"\n📝 HOMEWORK MODELS (Homework 1-{current_hw - 1})")
print("-" * 70)

homework_dir = course_root / "Homework"
homework_count = 0

if homework_dir.exists():
    # Look for Homework_01 through previous homework (current_hw - 1)
    for i in range(1, current_hw):
        hw_folder = homework_dir / f"Homework_{i:02d}"
        
        if hw_folder.exists():
            # Look for models folder
            models_pattern = f"Homework_{i:02d}*_Models"
            models_dirs = list(hw_folder.glob(models_pattern))
            
            if models_dirs:
                models_dir = models_dirs[0]
                size = get_folder_size(models_dir)
                if size > 0:
                    storage_items.append((models_dir, size))
                    total_storage += size
                    homework_count += 1
                    print(f"  • Homework {i:02d} Models {format_size(size):>12s}    [{models_dir.name}]")
    
    if homework_count == 0:
        print("  ℹ️ No homework models found")
else:
    print("  ⚠️ Homework directory not found")

# Summary
print("\n" + "=" * 70)
print(f"📊 TOTAL STORAGE ANALYZED: {format_size(total_storage)}")
print(f"📁 Total folders: {len(storage_items)}")
print("=" * 70)

if total_storage > 0:
    print("\n💡 This storage can be safely cleared:")
    print("   • Data/downloads will re-download automatically when needed")
    print("   • Model checkpoints can be retrained or recovered from CoCalc backups")
    print("   • You can zip and download folders first if you want to keep them")
else:
    print("\n✨ No storage to clear - everything is already clean!")

## 2. Clear Storage

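This cell removes the contents of all analyzed folders to free up space. It relies on the `storage_items` list built by the storage report, so run that cell first. Deletion starts as soon as the cell runs; there is no confirmation prompt.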

### What Gets Deleted:
- Contents of `home_workspace/data/` ← Only files inside this folder
- Contents of `home_workspace/downloads/` ← Only files inside this folder
- Contents of `home_workspace/models/` ← Only files inside this folder
- Contents of `cs_workspace/downloads/` ← Only files inside this folder (if on compute server)
- All Lesson_XX_Models folders (Lessons 1-7) ← All contents removed (the empty folder is kept)
- All Homework_XX_Models folders (Homework 1-6) ← All contents removed (the empty folder is kept)
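
Since deletion is not reversible from within the notebook, it can help to preview the top-level items that would be removed before running the cleanup. The sketch below uses a hypothetical helper, `preview_deletion`, which only lists entries and deletes nothing.

```python
from pathlib import Path

def preview_deletion(folders):
    """Return the top-level entries that clearing each folder would remove."""
    planned = []
    for folder in folders:
        folder = Path(folder)
        if folder.is_dir():                  # silently skip missing folders
            planned.extend(sorted(folder.iterdir()))
    return planned

# Example: in this notebook, `storage_items` holds (path, size) pairs, so
# for item in preview_deletion(p for p, _ in storage_items):
#     print("would remove:", item)
```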

### What's Protected:
- ✅ `home_workspace/api_keys.env` - NOT touched
- ✅ Other files in root of home_workspace - NOT touched
- ✅ Current homework (Homework_07) models - NOT touched
- ✅ Your homework notebooks and code - NOT affected
- ✅ All deleted files can be recreated or re-downloaded
- ✅ CoCalc backups maintain copies of your trained models

**Run this cell to clear storage:**

In [None]:
# Clear storage
print("=" * 70)
print("🗑️ CLEARING STORAGE")
print("=" * 70)

if len(storage_items) == 0:
    print("\n✨ Nothing to clear - all folders are already empty or don't exist!")
else:
    print(f"\n📋 Preparing to clear {len(storage_items)} folders...\n")
    
    cleared_size = 0
    cleared_count = 0
    failed_count = 0
    
    for folder_path, size in storage_items:
        if size == 0:
            continue
            
        try:
            print(f"🗑️ Clearing {folder_path.name}... ", end="")
            
            # Remove all contents but keep the folder
            if folder_path.exists():
                for item in folder_path.iterdir():
                    if item.is_dir() and not item.is_symlink():
                        shutil.rmtree(item)
                    else:
                        # Files and symlinks (including broken ones)
                        item.unlink()
                
                cleared_size += size
                cleared_count += 1
                print(f"✅ Cleared {format_size(size)}")
            else:
                print("⚠️ Folder no longer exists")
                
        except Exception as e:
            failed_count += 1
            print(f"❌ Error: {e}")
    
    # Summary
    print("\n" + "=" * 70)
    print("✅ CLEANUP COMPLETE")
    print("=" * 70)
    print(f"✅ Cleared folders:  {cleared_count}/{len(storage_items)}")
    print(f"💾 Storage freed:    {format_size(cleared_size)}")
    
    if failed_count > 0:
        print(f"⚠️ Failed:           {failed_count}")
    
    print("\n💡 Remember:")
    print("   • Data/downloads will re-download automatically when needed")
    print("   • Model checkpoints can be retrained from your notebook code")
    print("   • CoCalc backups maintain copies of deleted files")

## 3. Verify Storage Cleared

Run this cell to verify that storage has been successfully cleared:

In [None]:
# Re-run storage analysis to verify
print("=" * 70)
print("📊 VERIFICATION - Storage After Cleanup")
print("=" * 70)

remaining_storage = 0
remaining_items = []

for folder_path, _ in storage_items:
    if folder_path.exists():
        size = get_folder_size(folder_path)
        if size > 0:
            remaining_storage += size
            remaining_items.append((folder_path, size))

if remaining_storage == 0:
    print("\n✨ SUCCESS! All analyzed folders are now empty.")
    print(f"💾 Total storage freed: {format_size(total_storage)}")
else:
    print(f"\n⚠️ Some storage remains: {format_size(remaining_storage)}")
    print("\nRemaining items:")
    for folder_path, size in remaining_items:
        print(f"  • {folder_path.name}: {format_size(size)}")

print("\n" + "=" * 70)

---

## Summary

This notebook helps you manage storage by:

1. **Analyzing** storage usage across:
   - Workspace subfolders: `data/`, `downloads/`, `models/`
   - Previous lesson models (Lessons 1-7)
   - Previous homework models (Homework 1-6)

2. **Clearing** contents of those specific folders in one step

3. **Verifying** that storage was successfully cleared
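
As a complement to step 3, you can also look at the free space reported by the filesystem itself. This is a minimal sketch using `shutil.disk_usage`; the helper name `free_space_gib` is hypothetical, and the figure reflects the underlying disk, which may not match any per-project quota CoCalc applies.

```python
import shutil
from pathlib import Path

def free_space_gib(path):
    """Return free space, in GiB, on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    return usage.free / 1024**3

print(f"Free space here: {free_space_gib(Path.cwd()):.2f} GiB")
```

Running it once before and once after the cleanup gives a rough cross-check of the "storage freed" figure.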

### Safety Notes:

✅ **Everything can be recovered:**
- Data and downloads re-download automatically
- Model checkpoints can be retrained
- CoCalc backups maintain copies

✅ **Protected files:**
- `home_workspace/api_keys.env` is NOT touched
- Other files in root of workspaces are NOT touched
- Only specific subfolders are cleared

💾 **To keep your models:**
- Zip model folders in CoCalc file browser
- Download zip files to local machine
- Access CoCalc backups if needed

⚠️ **Not affected:**
- Current homework (Homework_07) models
- Future homework/lesson models (8-12)
- Your homework notebooks and code