## Course Tools Notebook

This notebook consolidates some tools you can use for freeing up disk space and installing or updating the class `introdl` package.

In [None]:
# Environment Detection and Configuration
import os
from pathlib import Path

print("🔍 Environment Configuration:")
print("-" * 40)

# Check for DS776_ROOT_DIR
if 'DS776_ROOT_DIR' in os.environ:
    root_dir = Path(os.environ['DS776_ROOT_DIR'])
    print(f"✅ DS776_ROOT_DIR is set: {root_dir}")
    print("   Running in local development mode")
else:
    root_dir = Path.home()
    print(f"📚 Using default paths from: {root_dir}")
    print("   Running in student mode (CoCalc/Colab)")

# Display expected paths
print("\n📁 Expected Course Structure:")
print(f"   Lessons:    {root_dir / 'Lessons'}")
print(f"   Homework:   {root_dir / 'Homework'}")
print(f"   Solutions:  {root_dir / 'Solutions'}")

# Check which paths exist
print("\n✓ Path Status:")
for dir_name in ['Lessons', 'Homework', 'Solutions']:
    path = root_dir / dir_name
    status = "✅ Exists" if path.exists() else "❌ Not found"
    print(f"   {dir_name}: {status}")

### Update or Install Course Package

Run the following cell. It is safe to run even if the package is already up to date. Afterward, restart the kernel of this notebook, and of any other notebook that is already running, so that it loads the newest version of the package.

In [None]:
# Install or update the introdl package in editable mode
import sys
import os
from pathlib import Path

# Flexible path resolution
if 'DS776_ROOT_DIR' in os.environ:
    course_tools_path = Path(os.environ['DS776_ROOT_DIR']) / 'Lessons' / 'Course_Tools'
else:
    course_tools_path = Path('~/Lessons/Course_Tools').expanduser()

# Check if the introdl package exists locally
introdl_path = course_tools_path / 'introdl'

if introdl_path.exists():
    print(f"📂 Found Course_Tools at: {course_tools_path}")
    print(f"📦 Installing introdl in editable mode from: {introdl_path}")
    print("   (Editable mode means package updates will be reflected immediately)\n")
    %pip install -e "{introdl_path}"
else:
    print(f"⚠️ Could not find introdl package at: {introdl_path}")
    print("   This might indicate an issue with your course setup.")
    print("   Expected structure:")
    print("   - ~/Lessons/Course_Tools/introdl/ (for CoCalc)")
    print("   - or $DS776_ROOT_DIR/Lessons/Course_Tools/introdl/ (for local dev)")
    print("\n📦 Attempting to install from PyPI as fallback...")
    %pip install introdl

print("\n✅ Installation complete! You may need to restart your kernel to use the updated package.")
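After restarting the kernel, you can confirm which version of the package is installed. Below is a minimal sketch using the standard library's `importlib.metadata`. It assumes the distribution is named `introdl`, the same as the import name. `installed_version` is a hypothetical helper and is not part of the course package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "introdl"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


v = installed_version("introdl")
print(f"introdl version: {v}" if v else "introdl is not installed in this kernel")
```

With an editable install, this reports the version recorded in the package metadata at install time. If the version number changes later, rerun the install cell so that the metadata is refreshed.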

### Freeing Up Disk Space

Three locations hold files that you generally don't need to keep beyond the assignment you're currently working on:

* **Student Models** (`~/models` or `$DS776_ROOT_DIR/models`) - Your trained model checkpoints. Be selective about what you delete here as these are your actual work.
* **Datasets** (`~/data` or `$DS776_ROOT_DIR/data`) - Downloaded datasets like MNIST, CIFAR10, etc. These can be re-downloaded when needed.
* **Model Cache** (`~/downloads` or `$DS776_ROOT_DIR/downloads`) - Pre-trained models from PyTorch Hub and HuggingFace. These consume the most space and can safely be cleared after assignments.
  - PyTorch models: `downloads/hub/`
  - HuggingFace models: `downloads/huggingface/`
  - Datasets cache: `downloads/datasets/`
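You can get a quick size report for these three folders without importing the course package. The standard-library sketch below assumes the folders sit directly under `$DS776_ROOT_DIR`, or under your home directory when that variable is unset, as listed above. `dir_size_bytes` and `report_usage` are hypothetical helpers. The course's `check_cache_usage()` is still the tool this notebook uses.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size in bytes of regular files under `path` (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def report_usage(root: Path) -> dict:
    """Map each course storage folder under `root` to its size in bytes (0 if missing)."""
    sizes = {}
    for name in ["models", "data", "downloads"]:
        folder = root / name
        sizes[name] = dir_size_bytes(folder) if folder.is_dir() else 0
    return sizes


root = Path(os.environ.get("DS776_ROOT_DIR", Path.home()))
for name, size in report_usage(root).items():
    print(f"{name:>10}: {size / 1e9:.2f} GB")
```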

**NOTE:** With the updated course package (v1.3+), all caches are centralized in the workspace directories, making cleanup easier and safer.

**NOTE 2:** Removing files from these directories does not affect your Homework or Lessons folders. You can always rerun notebooks to reproduce results.
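We have not checked exactly how the course package centralizes its caches. One common mechanism is the libraries' own environment variables. PyTorch Hub stores models under `$TORCH_HOME/hub`. Hugging Face uses `$HF_HOME`, with `HF_HUB_CACHE` and `HF_DATASETS_CACHE` as overrides. All of these default to folders under `~/.cache`.

The sketch below uses a hypothetical helper, `resolve_cache_dirs`, to report where those variables point in the current kernel. If a path still falls under `~/.cache`, downloads may be landing outside your workspace. Keep in mind that the libraries typically read these variables when they are first imported.

```python
import os
from pathlib import Path


def resolve_cache_dirs(env=None) -> dict:
    """Resolve PyTorch Hub and Hugging Face cache folders from environment variables."""
    env = os.environ if env is None else env
    # Both libraries fall back to $XDG_CACHE_HOME, which itself defaults to ~/.cache
    xdg = Path(env.get("XDG_CACHE_HOME", Path.home() / ".cache"))
    torch_home = Path(env.get("TORCH_HOME", xdg / "torch"))
    hf_home = Path(env.get("HF_HOME", xdg / "huggingface"))
    return {
        "pytorch_hub": torch_home / "hub",
        "hf_hub": Path(env.get("HF_HUB_CACHE", hf_home / "hub")),
        "hf_datasets": Path(env.get("HF_DATASETS_CACHE", hf_home / "datasets")),
    }


for name, path in resolve_cache_dirs().items():
    print(f"{name:>12}: {path}")
```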

In [None]:
# Check cache usage across all storage locations
from introdl.utils import check_cache_usage

print("📊 Checking storage usage...")
print("=" * 50)
check_cache_usage()
print("\n💡 Tip: Use the cleanup cells below to free up space")

### Check Cache Usage

Run this cell to see how much disk space is being used by models, datasets, and caches:

#### Selective Cache Cleanup

Use this cell to selectively clean different types of cached data. With `dry_run=True`, the cell previews what would be deleted without actually removing any files:

In [None]:
# Selective cache cleanup with preview mode
from introdl.utils import clear_model_cache

# Preview what would be deleted (dry_run=True)
print("🔍 PREVIEW MODE - Nothing will be deleted:\n")
print("=" * 50)

# Check what PyTorch models would be deleted
print("\n📦 PyTorch Hub Models:")
clear_model_cache("pytorch", dry_run=True)

# Check what HuggingFace models would be deleted
print("\n🤗 HuggingFace Models:")
clear_model_cache("huggingface", dry_run=True)

# Check what datasets would be deleted
print("\n📊 Cached Datasets:")
clear_model_cache("datasets", dry_run=True)

print("\n" + "=" * 50)
print("\n⚠️  To actually clean, run ONE of these lines in a new cell:\n")
print("clear_model_cache('pytorch', dry_run=False)      # Clean PyTorch models only")
print("clear_model_cache('huggingface', dry_run=False)  # Clean HF models only")
print("clear_model_cache('datasets', dry_run=False)     # Clean datasets only")
print("clear_model_cache('all', dry_run=False)          # Clean everything")
print("\n💡 After cleaning, you can re-run notebooks to re-download needed models")

#### Clear All Cache and Data (Full Reset)

**⚠️ WARNING:** This permanently removes ALL cached models and datasets in `~/home_workspace`, including any trained model checkpoints saved there. Use it only if you need to free up as much space as possible.

If you'd prefer to be more selective, use the "Selective Cache Cleanup" cell above, or use the file explorer to remove files manually.

Run the cell below to remove all checkpoint files, datasets, and pre-trained model weights from `~/home_workspace` and to recreate its `data`, `downloads`, and `models` subdirectories:

In [2]:
import shutil
from pathlib import Path

# Resolve the full path to ~/home_workspace
workspace_path = Path("~/home_workspace").expanduser().resolve()

# Ensure the directory exists before proceeding
if workspace_path.is_dir():
    # Remove all contents inside ~/home_workspace (this is permanent)
    for item in workspace_path.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)  # Remove directory and its contents
        else:
            item.unlink()  # Remove file or symlink

    print(f"Cleared all contents inside: {workspace_path}")

    # Recreate the standard subdirectories
    for subdir in ["data", "downloads", "models"]:
        new_dir = workspace_path / subdir
        new_dir.mkdir(parents=True, exist_ok=True)
        print(f"Created: {new_dir}")

else:
    print(f"Directory does not exist: {workspace_path}")


Cleared all contents inside: /home/user/home_workspace
Created: /home/user/home_workspace/data
Created: /home/user/home_workspace/downloads
Created: /home/user/home_workspace/models
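Clearing a workspace follows the same pattern wherever the folder lives. Below is a minimal sketch of a single function with a preview mode, in the spirit of `clear_model_cache(..., dry_run=True)`. `reset_workspace` is a hypothetical helper. It assumes each workspace should contain only `data`, `downloads`, and `models` afterward.

```python
import shutil
from pathlib import Path


def reset_workspace(workspace, subdirs=("data", "downloads", "models"), dry_run=True):
    """Delete everything inside `workspace`, then recreate `subdirs`.

    With dry_run=True (the default) nothing is removed; the paths that
    would be deleted are returned so you can review them first.
    """
    workspace = Path(workspace).expanduser()
    if not workspace.is_dir():
        raise FileNotFoundError(f"Not a directory: {workspace}")
    items = sorted(workspace.iterdir())
    if not dry_run:
        for item in items:
            if item.is_dir() and not item.is_symlink():
                shutil.rmtree(item)  # real directory: remove it and its contents
            else:
                item.unlink()  # file or symlink: remove only the entry itself
        for name in subdirs:
            (workspace / name).mkdir(parents=True, exist_ok=True)
    return items
```

For example, `reset_workspace("~/cs_workspace")` only lists what would be removed, and `reset_workspace("~/cs_workspace", dry_run=False)` actually deletes it. Directories are removed with `shutil.rmtree`, and symlinks are unlinked rather than followed, so the function does not delete anything a symlink points to outside the workspace.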


#### Clear cs_workspace (must be on compute server)

This works the same way as clearing home_workspace. However, cs_workspace is not synced between servers, so you must run this code separately on each compute server. As before, you can be more selective by using the Explorer running on the compute server. The only files you should consider keeping are model checkpoints in the `models` folder. Everything else is easily downloaded again when needed.

Run the cell below on a compute server to remove all checkpoint files, datasets, and pre-trained model weights from its cs_workspace and to restore the original directory structure.

In [3]:
import shutil
from pathlib import Path

# Resolve the full path to ~/cs_workspace
workspace_path = Path("~/cs_workspace").expanduser().resolve()

# Ensure the directory exists before proceeding
if workspace_path.is_dir():
    # Remove all contents inside ~/cs_workspace (this is permanent)
    for item in workspace_path.iterdir():
        if item.is_dir() and not item.is_symlink():
            shutil.rmtree(item)  # Remove directory and its contents
        else:
            item.unlink()  # Remove file or symlink

    print(f"Cleared all contents inside: {workspace_path}")

    # Recreate the standard subdirectories
    for subdir in ["data", "downloads", "models"]:
        new_dir = workspace_path / subdir
        new_dir.mkdir(parents=True, exist_ok=True)
        print(f"Created: {new_dir}")

else:
    print(f"Directory does not exist: {workspace_path}")


Cleared all contents inside: /home/user/cs_workspace
Created: /home/user/cs_workspace/data
Created: /home/user/cs_workspace/downloads
Created: /home/user/cs_workspace/models
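To be more selective than a full reset, it helps to see which files take up the most space before deleting anything. Below is a standard-library sketch. `largest_files` is a hypothetical helper, and the `~/cs_workspace` path in the usage lines is just an example.

```python
import heapq
import os
from pathlib import Path


def largest_files(root, n=10):
    """Return the n largest regular files under `root` as (size_bytes, path), biggest first."""
    root = Path(root).expanduser()
    sizes = []
    # os.walk yields nothing for a missing folder, so this returns [] in that case
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = Path(dirpath) / name
            if not p.is_symlink():
                sizes.append((p.stat().st_size, p))
    return heapq.nlargest(n, sizes, key=lambda t: t[0])


for size, path in largest_files("~/cs_workspace"):
    print(f"{size / 1e6:10.1f} MB  {path}")
```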


#### Clear the Hugging Face Cache

Run the cell below. Be careful if you change the path: deletion is permanent, and a wrong path could remove the wrong files. Even after you update the course package so that new downloads are cached in your workspace directories, older models may still be cached in `~/.cache/huggingface/hub`. This cell removes them.

In [1]:
import shutil
from pathlib import Path

# Resolve the full path to the old default Hugging Face hub cache
folder_path = Path("~/.cache/huggingface/hub").expanduser().resolve()

# Ensure the folder exists before attempting deletion (this is permanent)
if folder_path.exists():
    shutil.rmtree(folder_path)
    print(f"Removed: {folder_path}")
else:
    print(f"Folder does not exist: {folder_path}")


Removed: /home/user/.cache/huggingface/hub
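If you would rather remove only some cached models, you can first see which repositories the cache holds. This sketch assumes the Hugging Face hub cache layout. In that layout, each repository gets a folder such as `models--<org>--<name>` containing `blobs`, which hold the actual files, and `snapshots`, which are symlinks into `blobs`. `hub_cache_repos` is a hypothetical helper. The `huggingface_hub` library's `scan_cache_dir()` provides a fuller version of this.

```python
import os
from pathlib import Path


def hub_cache_repos(cache_dir):
    """List (repo_type, repo_id, size_bytes) for each repo folder in a Hugging Face hub cache.

    Only regular files are counted, so symlinked snapshot entries are not double-counted.
    """
    cache_dir = Path(cache_dir).expanduser()
    repos = []
    if not cache_dir.is_dir():
        return repos
    for entry in sorted(cache_dir.iterdir()):
        if not entry.is_dir() or "--" not in entry.name:
            continue  # skip .locks and other non-repo entries
        # "models--google--vit-base" -> ("models", "google/vit-base")
        repo_type, _, rest = entry.name.partition("--")
        size = sum(
            os.path.getsize(os.path.join(dp, f))
            for dp, _, files in os.walk(entry)
            for f in files
            if not os.path.islink(os.path.join(dp, f))
        )
        repos.append((repo_type, rest.replace("--", "/"), size))
    return repos


for repo_type, repo_id, size in hub_cache_repos("~/.cache/huggingface/hub"):
    print(f"{size / 1e6:10.1f} MB  {repo_type:<8} {repo_id}")
```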


#### Remove Datasets and Models from Other Folders

Use the Explorer on either the home server or a compute server to look for dataset or model files outside your workspace folders. You can delete these files, and running your notebooks again will download whatever is needed. For example, many of you created copies of the Flowers102 dataset in your Homework_05 folder. You can delete those copies. In the future, use DATA_PATH as the root directory for all torchvision datasets.
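To find candidate files, you can scan your Homework folders for large files with typical checkpoint or archive extensions. The extension list and the 1 MB threshold are assumptions, so adjust them as needed. `find_stray_files` is a hypothetical helper. Extracted image files, such as the Flowers102 images, won't match these extensions, so check for those in the Explorer.

```python
from pathlib import Path

# Assumed extensions for checkpoints, weights, and dataset archives; adjust as needed
CANDIDATE_SUFFIXES = {".pt", ".pth", ".ckpt", ".safetensors", ".bin", ".zip", ".tar", ".tgz", ".gz"}


def find_stray_files(root, suffixes=CANDIDATE_SUFFIXES, min_bytes=1_000_000):
    """Return (size_bytes, path) for files under `root` with a candidate suffix, largest first."""
    root = Path(root).expanduser()
    hits = []
    if not root.is_dir():
        return hits
    for p in root.rglob("*"):
        if p.is_file() and not p.is_symlink() and p.suffix.lower() in suffixes:
            size = p.stat().st_size
            if size >= min_bytes:
                hits.append((size, p))
    return sorted(hits, key=lambda t: t[0], reverse=True)


for size, path in find_stray_files("~/Homework"):
    print(f"{size / 1e6:8.1f} MB  {path}")
```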

### Reset Your Compute Server

* Click the Servers button on the left side of CoCalc.
* Click the Compute Servers tab.
* Click Settings on the compute server you wish to reset.
* Click Deprovision at the bottom of the popup window and agree to the terms.
* Restart the server and wait several minutes.
* Make sure this notebook is running on the compute server (use the button labeled Server at the top).
* Run the cell under "Update or Install Course Package" to reinstall the course package.