# GPU usage etiquette on Tufts HPC

- Request only what you’ll actively use. Avoid rapid-fire short `srun` calls that thrash the scheduler.
- Prefer an interactive allocation sized for your exploration window (e.g., 1–4 GPUs for 1–3 hours), then iterate inside that shell.
- Release GPUs when idle. Don’t “park” large-GPU nodes while not running.
- For exploratory or spiky workloads, consider `preempt` partition (jobs may be preempted but start faster).
- Keep HOME clean; place all code, caches, envs, and models under `/cluster/tufts/datalab/zwu09` or class storage.
- Monitor your jobs: `squeue -u $USER`. Inspect partitions: `sinfo -o "%P %a %l %D %c %m %G"`.

Example interactive sessions:
```bash
# 2 GPUs (A100) for 2 hours
srun -p gpu --gres=gpu:a100:2 -t 02:00:00 -n 1 -c 16 --pty bash

# 4 GPUs (A100) for 2 hours
srun -p gpu --gres=gpu:a100:4 -t 02:00:00 -n 1 -c 32 --pty bash

# 2 GPUs (H100) on preempt (may queue/preempt)
srun -p preempt --gres=gpu:h100:2 -t 02:00:00 -n 1 -c 32 --pty bash
```


In [None]:
#Kill junk
scancel 15654372 15654410


In [None]:
# System checks (run inside an interactive shell)
!nvidia-smi -L || true
!python -c "import torch, sys; print('torch:', torch.__version__, 'cuda:', torch.version.cuda, 'is_cuda_available:', torch.cuda.is_available()); print(sys.version)"


In [None]:
# Storage and cluster discovery (run on login or interactive node)
!df -h /cluster/tufts/datalab
!df -h /cluster/tufts/em212class
!sinfo -o "%P %a %l %D %c %m %G" | sed -n '1,40p'
!squeue -u $USER | sed -n '1,30p'


Filesystem                       Size  Used Avail Use% Mounted on
10.246.194.88:/projects/datalab  2.3T  2.3T  6.7G 100% /cluster/tufts/datalab
Filesystem                          Size  Used Avail Use% Mounted on
10.246.194.84:/projects/em212class 1000G  8.9G  991G   1% /cluster/tufts/em212class
PARTITION AVAIL TIMELIMIT NODES CPUS MEMORY GRES
interactive up 4:00:00 2 36 248000 (null)
batch* up 7-00:00:00 71 36+ 120000+ (null)
mpi up 7-00:00:00 70 36+ 120000+ (null)
gpu up 7-00:00:00 1 64 190000 gpu:a100:2
gpu up 7-00:00:00 10 64+ 756121+ gpu:a100:8
gpu up 7-00:00:00 1 72 248000 gpu:p100:4
largemem up 7-00:00:00 2 36+ 1000000 (null)
preempt up 7-00:00:00 3 64+ 190000+ gpu:a100:2
preempt up 7-00:00:00 9 64+ 756121+ gpu:a100:8
preempt up 7-00:00:00 97 36+ 120000+ (null)
preempt up 7-00:00:00 9 128 248000+ gpu:l40:4
preempt up 7-00:00:00 1 64 368000 gpu:v100:3
preempt up 7-00:00:00 2 64+ 256000+ gpu:v100:4
preempt up 7-00:00:00 1 72 248000 gpu:p100:4
preempt up 7-00:00:00 1 72 256000 gpu:

In [1]:
# Runtime configuration for caches and model choice
import os
os.environ['HF_HOME'] = '/cluster/tufts/datalab/zwu09/caches/huggingface'
os.environ['TRANSFORMERS_CACHE'] = '/cluster/tufts/datalab/zwu09/caches/huggingface'
os.environ['PIP_CACHE_DIR'] = '/cluster/tufts/datalab/zwu09/caches/pip'
os.environ['TORCH_HOME'] = '/cluster/tufts/datalab/zwu09/caches/torch'
os.environ['TMPDIR'] = '/cluster/tufts/datalab/zwu09/tmp'

# Use a lightweight, fast, and memory-efficient SDXL variant for testing
model_id = os.environ.get('SD_MODEL_ID', 'stabilityai/sdxl-turbo')


In [2]:
import sys
# Installer (inside kernel) — A100-friendly stack
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!{sys.executable} -m pip install "diffusers[torch]" transformers accelerate safetensors bitsandbytes einops --quiet



Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-25.2-py3-none-any.whl (1.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m13.5 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pip
[0mSuccessfully installed pip-25.2
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://download.pytorch.org/whl/cu121
Collecting torch
  Downloading https://download.pytorch.org/whl/cu121/torch-2.5.1%2Bcu121-cp312-cp312-linux_x86_64.whl (780.4 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m780.4/780.4 MB[0m [31m72.9 MB/s[0m  [33m0:00:07[0m:00:01[0m00:01[0m
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
[2K     [90m━━━━━━━━━━━

In [3]:
# Diffusers demo (Stable Diffusion XL) — adjust model if needed
from diffusers import StableDiffusionXLPipeline
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"

pipe = StableDiffusionXLPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")

prompt = "A photorealistic portrait of a person wearing futuristic sunglasses, studio lighting"
image = pipe(prompt, guidance_scale=7.0, num_inference_steps=20).images[0]
image.save("sdxl_demo.png")
image


ImportError: cannot import name 'cached_download' from 'huggingface_hub' (/cluster/home/zwu09/.local/lib/python3.12/site-packages/huggingface_hub/__init__.py)

In [None]:
# Honor external model selection via env var
import os
SD_MODEL_ID = os.environ.get('SD_MODEL_ID', 'stabilityai/sdxl-turbo')
print('Using model:', SD_MODEL_ID)


In [None]:
# Small torch smoke test (device)
import torch
print('torch:', torch.__version__, 'cuda available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))


In [None]:
# TinyLlama demo (small LLM that fits on 1 GPU)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Write a 3-line explanation of why GPUs accelerate deep learning."
inputs = tokenizer(prompt, return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
