# Lab 1.1.3: NGC Container Setup - SOLUTIONS

This notebook contains solutions to the exercises in the NGC Container Setup notebook.

---

## Try It Yourself #1 Solution

**Task:** Pull the PyTorch container.

In [None]:
# Solution: Pull the PyTorch NGC container
# Run this in terminal for progress visibility:
#   docker pull nvcr.io/nvidia/pytorch:25.11-py3

# Or run it directly from the notebook (may take 10-30 minutes the first time):
!docker pull nvcr.io/nvidia/pytorch:25.11-py3

**Expected Output:**
```
25.11-py3: Pulling from nvidia/pytorch
...
Status: Downloaded newer image for nvcr.io/nvidia/pytorch:25.11-py3
nvcr.io/nvidia/pytorch:25.11-py3
```

**Explanation:**
- The image is large (roughly 20 GB or more), so the initial download takes time
- Subsequent pulls reuse cached layers
- Always use a specific version tag (e.g., `25.11-py3`) rather than `latest`
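
The tag advice above can be checked mechanically before a pull. The following is a minimal sketch, not part of the original lab: `image_tag` and `is_pinned` are hypothetical helper names, and the parsing assumes the common `registry/path:tag` and `name@sha256:...` reference forms.

```python
from typing import Optional


def image_tag(image_ref: str) -> Optional[str]:
    """Return the tag of a Docker image reference, or None if it has no tag."""
    # Drop a digest suffix such as "@sha256:..." before looking for a tag.
    name = image_ref.split("@", 1)[0]
    # A colon marks a tag only if it appears after the last "/";
    # an earlier colon belongs to a registry port such as "localhost:5000".
    last_segment = name.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def is_pinned(image_ref: str) -> bool:
    """True if the reference carries a digest or an explicit tag other than 'latest'."""
    if "@" in image_ref:
        return True
    tag = image_tag(image_ref)
    return tag is not None and tag != "latest"


for ref in [
    "nvcr.io/nvidia/pytorch:25.11-py3",
    "nvcr.io/nvidia/pytorch:latest",
    "nvcr.io/nvidia/pytorch",
]:
    status = "pinned" if is_pinned(ref) else "NOT pinned"
    print(f"{ref}: {status}")
```

An untagged reference is treated as unpinned because Docker resolves it to `latest`.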

---

## Challenge Solution

**Task:** Create a custom Dockerfile that extends the NGC PyTorch image with additional packages.

In [None]:
# Solution: Custom Dockerfile content
dockerfile_content = '''# Custom DGX Spark Development Environment
# Extends NGC PyTorch with common AI packages

FROM nvcr.io/nvidia/pytorch:25.11-py3

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Install additional Python packages
# Quote each specifier: in a shell-form RUN, an unquoted ">" is output redirection
RUN pip install --no-cache-dir \\
    "transformers>=4.40.0" \\
    "datasets>=2.18.0" \\
    "accelerate>=0.28.0" \\
    "peft>=0.10.0" \\
    "bitsandbytes>=0.43.0" \\
    "wandb>=0.16.0" \\
    "tensorboard>=2.16.0" \\
    "gradio>=4.0.0" \\
    "langchain>=0.1.0" \\
    "chromadb>=0.4.0"

# Configure Jupyter (empty token and password disable authentication; trusted networks only)
RUN mkdir -p /root/.jupyter
RUN echo "c.ServerApp.token = ''" >> /root/.jupyter/jupyter_lab_config.py
RUN echo "c.ServerApp.password = ''" >> /root/.jupyter/jupyter_lab_config.py
RUN echo "c.ServerApp.allow_root = True" >> /root/.jupyter/jupyter_lab_config.py

# Create startup script
COPY startup.sh /startup.sh
RUN chmod +x /startup.sh

# Set working directory
WORKDIR /workspace

# Default command
CMD ["/startup.sh"]
'''

# Save Dockerfile
with open('Dockerfile.custom', 'w') as f:
    f.write(dockerfile_content)

print("Dockerfile.custom created!")
print("\nContent preview:")
print(dockerfile_content[:500])

In [None]:
# Solution: Startup script that shows GPU status
startup_script = '''#!/bin/bash
# DGX Spark Custom Container Startup Script

echo "========================================"
echo "  DGX Spark Development Environment"
echo "========================================"
echo ""

# Show GPU info
echo "GPU Status:"
nvidia-smi --query-gpu=name,memory.total,memory.free,temperature.gpu --format=csv
echo ""

# Show PyTorch CUDA status
echo "PyTorch CUDA Check:"
python -c "import torch; print(f'  CUDA Available: {torch.cuda.is_available()}'); print(f'  Device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
echo ""

# Show memory
echo "System Memory:"
free -h | head -2
echo ""

echo "========================================"
echo "  Starting Jupyter Lab..."
echo "========================================"

# Start Jupyter Lab
exec jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser
'''

# Save startup script
with open('startup.sh', 'w') as f:
    f.write(startup_script)

import os
os.chmod('startup.sh', 0o755)

print("startup.sh created!")

In [None]:
# Solution: Build and run commands
print("To build and run your custom container:")
print("="*50)
print()
print("# Build the custom image")
print("docker build -f Dockerfile.custom -t dgx-spark-custom:latest .")
print()
print("# Run the custom container")
print("docker run --gpus all -it --rm \\")
print("    -v $HOME/workspace:/workspace \\")
print("    -v $HOME/.cache/huggingface:/root/.cache/huggingface \\")
print("    -p 8888:8888 \\")
print("    --ipc=host \\")
print("    dgx-spark-custom:latest")
print()
print("# Alternative: Just run bash")
print("docker run --gpus all -it --rm \\")
print("    -v $HOME/workspace:/workspace \\")
print("    --ipc=host \\")
print("    dgx-spark-custom:latest bash")

**Key Points:**

1. **Always extend NGC base images** - They have the correct CUDA/cuDNN setup for ARM64

2. **Use `--no-cache-dir`** for pip to reduce image size

3. **Include `--ipc=host`** when running - PyTorch DataLoader workers exchange data through shared memory, and Docker's default `/dev/shm` (64 MB) is often too small for that

4. **Configure Jupyter security** appropriately for your environment

5. **Constrain package versions** - `>=` sets only a minimum (e.g., `"transformers>=4.40.0"`); pin exact versions with `==` when you need reproducible builds
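
Points 2 and 5 interact with shell quoting: in a shell-form `RUN`, an unquoted `>` in a specifier such as `transformers>=4.40.0` is parsed as output redirection, so the constraint is silently dropped. A minimal sketch of generating a safely quoted line follows; `pip_install_run_line` is a hypothetical helper, and the exact versions in `exact` are illustrative only, not tested recommendations.

```python
import shlex


def pip_install_run_line(requirements):
    """Build a shell-form Dockerfile RUN line with every requirement shell-quoted."""
    # shlex.quote adds single quotes only when a string contains characters
    # that are special to the shell, such as ">".
    quoted = [shlex.quote(req) for req in requirements]
    return "RUN pip install --no-cache-dir " + " ".join(quoted)


# Minimum-version constraints: need quoting because of ">".
floating = ["transformers>=4.40.0", "peft>=0.10.0"]
# Exact pins (illustrative versions): "==" is safe unquoted.
exact = ["transformers==4.40.0", "peft==0.10.0"]

print(pip_install_run_line(floating))
print(pip_install_run_line(exact))
```

The first line comes out with each `>=` specifier wrapped in single quotes, while the `==` pins are left as-is because they contain nothing the shell treats specially.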

---

## Key Takeaways

1. **Use NGC containers** - a plain `pip install torch` on ARM64 may install a build without CUDA support
2. **Always use `--gpus all`** - GPU access must be explicitly enabled
3. **Always use `--ipc=host`** - Required for PyTorch multiprocessing
4. **Mount cache directories** - Avoid re-downloading models
5. **Custom containers extend NGC** - Never try to install CUDA/PyTorch yourself
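
Takeaways 2-4 can be baked into a small helper so the flags are never forgotten. This is a sketch under assumptions: `build_docker_run` is a hypothetical function, not part of Docker or the lab, and the host paths mirror the ones used in the run commands earlier in this notebook.

```python
import shlex
from pathlib import Path


def build_docker_run(image, mounts=None, port=None, command=None):
    """Return a `docker run` argument list with GPU access and host IPC enabled."""
    # --gpus all exposes the GPU; --ipc=host shares the host's shared memory.
    args = ["docker", "run", "--gpus", "all", "--ipc=host", "-it", "--rm"]
    # Each mount maps a host directory to a path inside the container.
    for host_dir, container_dir in (mounts or {}).items():
        args += ["-v", f"{host_dir}:{container_dir}"]
    if port is not None:
        args += ["-p", f"{port}:{port}"]
    args.append(image)
    # An optional command (e.g. ["bash"]) overrides the image's default CMD.
    if command:
        args += list(command)
    return args


home = Path.home()
cmd = build_docker_run(
    "dgx-spark-custom:latest",
    mounts={
        home / "workspace": "/workspace",
        home / ".cache" / "huggingface": "/root/.cache/huggingface",
    },
    port=8888,
)
# Print a copy-pasteable command line; nothing is executed here.
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without a shell, while `shlex.join` gives a readable form for the terminal.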

---

## Cleanup

In [None]:
# Cleanup resources and generated files
import gc
import os

# Remove generated files if they exist
for f in ["Dockerfile.custom", "startup.sh"]:
    if os.path.exists(f):
        os.remove(f)
        print(f"Removed {f}")

gc.collect()
print("Cleanup complete!")