# Storage Cleanup for Homework

This notebook helps you manage storage used by previous lessons and homework assignments.

## What This Notebook Does:

1. **Storage Report** - Shows detailed storage usage for:
   - `~/home_workspace/data/`, `~/home_workspace/downloads/`, `~/home_workspace/models/`
   - `~/cs_workspace/data/`, `~/cs_workspace/downloads/`, `~/cs_workspace/models/` (compute server only)
   - `~/.cache/huggingface/` (HuggingFace Hub cached models and datasets)
   - All `Lesson_XX_Models` folders (Lessons 1 through current)
   - All `Homework_XX_Models` folders (Homework 1 through previous)

2. **Clear Storage** - Removes contents of those specific folders to free up space

## Important Notes:

✅ **Safe to Delete**: Everything this notebook analyzes can be recovered:
- **Data & Downloads**: Re-downloaded automatically when needed
- **Model Checkpoints**: Retrained from scratch or restored from CoCalc backups
- **HuggingFace Cache**: Re-downloaded automatically when models are used

💾 **Before Deleting**: If you want to keep your trained models:
- Use CoCalc's file browser to zip model folders
- Download the zip files to your local machine
- CoCalc's backup system also maintains copies
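
If you would rather create the archive from a notebook cell than from the file browser, Python's standard `shutil.make_archive` can do it. The following is a minimal sketch and is not part of `introdl`; the helper name `zip_model_folder` and the folder name in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def zip_model_folder(folder, dest_dir="."):
    """Zip `folder` into dest_dir/<folder name>.zip and return the archive path.

    Hypothetical helper for illustration; it is not part of introdl.
    """
    folder = Path(folder).expanduser().resolve()
    if not folder.is_dir():
        raise FileNotFoundError(f"No such folder: {folder}")
    dest_dir = Path(dest_dir).expanduser().resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)
    # base_name is the archive path without the ".zip" suffix.
    # root_dir/base_dir keep the folder name as the top-level entry in the zip.
    archive = shutil.make_archive(
        base_name=str(dest_dir / folder.name),
        format="zip",
        root_dir=str(folder.parent),
        base_dir=folder.name,
    )
    return Path(archive)
```

For example, `zip_model_folder("Homework_03_Models")` would write `Homework_03_Models.zip` to the current directory, which you can then download from the file browser. Choose a `dest_dir` outside the folder being zipped.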

⚠️ **Protected Files**: This notebook will NOT touch:
- `home_workspace/api_keys.env` (your API keys are safe!)
- Any other files in the root of home_workspace or cs_workspace
- Current homework models (detected automatically)
- Future homework/lesson models (detected automatically)
- Your homework assignment notebooks

**Only these specific locations are cleared:**
- `home_workspace/data/`, `home_workspace/downloads/`, `home_workspace/models/` ← contents of these subfolders only
- `cs_workspace/data/`, `cs_workspace/downloads/`, `cs_workspace/models/` ← contents of these subfolders only (compute server)
- `~/.cache/huggingface/` ← contents only (HuggingFace cache)
- Lesson/Homework model folders ← entire folders deleted
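
The range rule for model folders (lessons through the current number, homework only through the previous one) can be sketched as a small predicate. This is an illustration of the rule as described in this notebook, not the `introdl` implementation; the function name `is_clearable` and the exact naming pattern are assumptions.

```python
import re

# Assumed naming pattern, based on the folder names mentioned in this notebook.
_MODEL_FOLDER = re.compile(r"^(Lesson|Homework)_(\d+)_Models$")


def is_clearable(folder_name, current_hw):
    """Return True if a model folder falls inside the cleanup range.

    Rule as described in this notebook:
      - Lesson_XX_Models:   lessons 1 through the current number
      - Homework_XX_Models: homework 1 through the previous number
    """
    match = _MODEL_FOLDER.match(folder_name)
    if match is None:
        return False  # not a lesson/homework model folder
    kind, number = match.group(1), int(match.group(2))
    if number < 1:
        return False
    if kind == "Lesson":
        return number <= current_hw
    return number < current_hw  # current and future homework stay protected


print(is_clearable("Lesson_05_Models", 5))    # True
print(is_clearable("Homework_04_Models", 5))  # True
print(is_clearable("Homework_05_Models", 5))  # False
print(is_clearable("Lesson_06_Models", 5))    # False
```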

---

In [None]:
# Import storage utilities from introdl
from introdl.storage import (
    get_homework_storage_report,
    clear_homework_storage,
    format_size
)

## 1. Comprehensive Storage Report

This cell analyzes storage usage across all previous lessons and homework, plus workspace folders.

⏱️ **Note**: Running this cell may take 1-2 minutes as it searches through all folders in the course directory to calculate sizes.
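
The time goes into walking every file and adding up sizes. As a rough sketch of that kind of calculation (the `introdl` internals are not shown in this notebook, so `folder_size_bytes` and `human_size` below are hypothetical stand-ins, not the library's functions):

```python
import os
from pathlib import Path


def folder_size_bytes(folder):
    """Sum the sizes of all regular files under `folder` (0 if it is missing)."""
    folder = Path(folder).expanduser()
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # skip symlinks so their targets are not counted twice
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def human_size(num_bytes):
    """Format a byte count using binary units, e.g. 1536 -> '1.5 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"
```

Skipping symlinks matters for the HuggingFace cache in particular, since its snapshot folders typically link to files stored once in a separate blobs folder.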

In [None]:
# Get storage report (auto-detects current homework number)
storage_items, total_storage, current_hw = get_homework_storage_report()

print("=" * 70)
print(f"📊 DS776 STORAGE REPORT - Homework {current_hw:02d}")
print("=" * 70)

if len(storage_items) == 0:
    print("\n✨ No storage to analyze - everything is already clean!")
else:
    print(f"\n📁 Found {len(storage_items)} folders using storage:\n")

    # Group by type for better organization
    workspace_items = []
    lesson_items = []
    homework_items = []
    hf_cache_items = []

    for folder_path, size in storage_items:
        if 'huggingface' in str(folder_path) and '.cache' in str(folder_path):
            hf_cache_items.append((folder_path, size))
        elif 'workspace' in str(folder_path):
            workspace_items.append((folder_path, size))
        elif 'Lesson' in folder_path.name:
            lesson_items.append((folder_path, size))
        elif 'Homework' in folder_path.name:
            homework_items.append((folder_path, size))

    # Display workspace folders
    if workspace_items:
        print("📦 WORKSPACE FOLDERS")
        print("-" * 70)
        for folder_path, size in workspace_items:
            # Show relative path from workspace
            if 'home_workspace' in str(folder_path):
                display_path = f"home_workspace/{folder_path.name}"
            else:
                display_path = f"cs_workspace/{folder_path.name}"
            print(f"  • {display_path:30s} {format_size(size):>12s}")

    # Display HuggingFace cache
    if hf_cache_items:
        print("\n🤗 HUGGINGFACE CACHE")
        print("-" * 70)
        for folder_path, size in hf_cache_items:
            print(f"  • ~/.cache/huggingface        {format_size(size):>12s}")
            print("    (Cached models and datasets from HuggingFace Hub)")

    # Display lesson models
    if lesson_items:
        print(f"\n📚 LESSON MODELS (Lessons 1-{current_hw})")
        print("-" * 70)
        for folder_path, size in lesson_items:
            print(f"  • {folder_path.name:30s} {format_size(size):>12s}")

    # Display homework models
    if homework_items:
        print(f"\n📝 HOMEWORK MODELS (Homework 1-{current_hw-1})")
        print("-" * 70)
        for folder_path, size in homework_items:
            print(f"  • {folder_path.name:30s} {format_size(size):>12s}")

    # Summary
    print("\n" + "=" * 70)
    print(f"📊 TOTAL STORAGE ANALYZED: {format_size(total_storage)}")
    print(f"📁 Total folders: {len(storage_items)}")
    print("=" * 70)

    print("\n💡 This storage can be safely cleared:")
    print("   • Data/downloads will re-download automatically when needed")
    print("   • Model checkpoints can be retrained or recovered from CoCalc backups")
    print("   • HuggingFace models will re-download automatically when needed")
    print("   • You can zip and download folders first if you want to keep them")

## 2. Clear Storage

This cell removes the contents of all analyzed folders to free up space.

### What Gets Deleted:
- Contents of `home_workspace/data/`, `home_workspace/downloads/`, `home_workspace/models/` ← Only files inside
- Contents of `cs_workspace/data/`, `cs_workspace/downloads/`, `cs_workspace/models/` ← Only files inside (compute server)
- Contents of `~/.cache/huggingface/` ← HuggingFace Hub cache (re-downloads automatically)
- All Lesson_XX_Models folders ← Entire folders deleted
- All Homework_XX_Models folders (previous homework only) ← Entire folders deleted
- YOLO `*.pt` files in Homework_06 (if running from HW 7+)

### What's Protected:
- ✅ `home_workspace/api_keys.env` - NOT touched
- ✅ Other files in root of home_workspace - NOT touched
- ✅ Current homework models - NOT touched
- ✅ Your homework notebooks and code - NOT affected
- ✅ All deleted files can be recreated or re-downloaded
- ✅ CoCalc backups maintain copies of your trained models
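
To make the difference between "contents cleared" and "entire folder deleted" concrete, here is a minimal sketch of clearing a folder's contents while keeping the folder, with a dry-run switch. It is an illustration only: `clear_folder_contents` is a hypothetical helper, and `clear_homework_storage` may work differently internally.

```python
import shutil
from pathlib import Path


def clear_folder_contents(folder, dry_run=True):
    """Delete everything inside `folder` but keep the folder itself.

    Hypothetical helper, not the introdl implementation. With dry_run=True
    nothing is deleted; the function only returns what would be removed.
    """
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        return []
    targets = sorted(folder.iterdir())
    if not dry_run:
        for entry in targets:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove a real subfolder recursively
            else:
                entry.unlink()        # remove a file or a symlink
    return targets


# Preview first; nothing is deleted while dry_run=True:
# for path in clear_folder_contents("~/home_workspace/downloads", dry_run=True):
#     print("would remove:", path)
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch, so that an accidental call only lists files.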

**Run this cell to clear storage:**

In [None]:
# Clear storage. dry_run=False deletes immediately; set dry_run=True to preview first.
results = clear_homework_storage(storage_items, dry_run=False)

print("\n✅ Cleanup complete!")
print(f"💾 Freed {format_size(results['cleared_size'])}")
print(f"📁 Cleared {results['cleared_count']} folders")

if results['failed_count'] > 0:
    print(f"⚠️ Failed to clear {results['failed_count']} folders")

## 3. Verify Storage Cleared

Run this cell to verify that storage has been successfully cleared:

In [None]:
# Re-run storage analysis to verify
storage_items_after, total_storage_after, _ = get_homework_storage_report()

print("=" * 70)
print("📊 VERIFICATION - Storage After Cleanup")
print("=" * 70)

if total_storage_after == 0:
    print("\n✨ SUCCESS! All analyzed folders are now empty.")
    print(f"💾 Total storage freed: {format_size(total_storage)}")
else:
    print(f"\n⚠️ Some storage remains: {format_size(total_storage_after)}")
    print("\nRemaining items:")
    for folder_path, size in storage_items_after:
        if size > 0:
            # Print the full path: names like "data" occur in more than one workspace
            print(f"  • {folder_path}: {format_size(size)}")

print("\n" + "=" * 70)
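
As an independent cross-check, you can also ask the operating system how much free space the filesystem reports. This is a small sketch using the standard library's `shutil.disk_usage`; the helper name `free_space_gib` is hypothetical, and the number reflects the underlying filesystem, which may not match a per-project quota.

```python
import shutil
from pathlib import Path


def free_space_gib(path="~"):
    """Return free space, in GiB, on the filesystem that contains `path`."""
    usage = shutil.disk_usage(Path(path).expanduser())
    return usage.free / 1024**3
```

Calling `free_space_gib()` before and after the cleanup cell gives a quick before/after comparison.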

---

## Summary

This notebook helps you manage storage by:

1. **Analyzing** storage usage across:
   - Workspace subfolders: `data/`, `downloads/`, `models/`
   - HuggingFace cache: `~/.cache/huggingface/`
   - Previous lesson models (auto-detected)
   - Previous homework models (auto-detected)

2. **Clearing** contents of those specific folders in one step

3. **Verifying** that storage was successfully cleared

### Safety Notes:

✅ **Everything can be recovered:**
- Data and downloads re-download automatically
- HuggingFace models re-download automatically when needed
- Model checkpoints can be retrained
- CoCalc backups maintain copies

✅ **Protected files:**
- `home_workspace/api_keys.env` is NOT touched
- Other files in root of workspaces are NOT touched
- Only specific subfolders are cleared

💾 **To keep your models:**
- Zip model folders in the CoCalc file browser
- Download the zip files to your local machine
- Restore from CoCalc backups if needed

⚠️ **Not affected:**
- Current homework models (auto-detected)
- Future homework/lesson models (auto-detected)
- Your homework notebooks and code