# Instructions for Mental Alpaca and Mental Flan T5 XXL

Instructions for 
- [Mental Flan T5 XXL](https://huggingface.co/NEU-HAI/mental-flan-t5-xxl)
- [Mental Alpaca](https://huggingface.co/NEU-HAI/mental-alpaca)

Note: conversational chat is not possible with mental-alpaca. Task instructions must follow specific formats, which you will need to explore yourself.
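
One way to explore instruction formats is to render the same input text with several candidate templates and compare the model outputs. The sketch below is only an illustration: the helper `build_prompts` and the second template are assumptions, and neither template is confirmed as a format the models were trained on.

```python
# Hypothetical candidate templates; none is confirmed as the trained format.
TEMPLATES = {
    "classify": (
        "Classify the following text by the author's mental health risk:\n\n"
        "Text: {text}\n\n"
        "Prediction:"
    ),
    "question": (
        'Consider this post: "{text}" '
        "Question: Is the author likely to be stressed? Answer:"
    ),
}


def build_prompts(text, templates=TEMPLATES):
    """Render one input text with every candidate template."""
    return {name: tpl.format(text=text.strip()) for name, tpl in templates.items()}


prompts = build_prompts("I feel constantly anxious and can't focus on my work lately.")
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```

Each rendered prompt can then be passed to the model in turn to see which phrasing yields usable predictions.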

More detailed instructions, including how to loop over queries and log the results, can be found in the `Instruction Medgemma` notebook.


## 1. Create LLM-Specific Virtual Environment

### Virtual Environment
For reusability, create a virtual environment that contains all packages needed to run a specific model; later you only need to load (activate) it. We need the packages provided in a pre-configured conda environment for GPU compute nodes, plus additional modules. A complete list of pre-configured kernels can be found [here](https://git.bihealth.org/charite-sc-public/sc-wiki/-/wikis/Resources/User%20Documentation/User%20Guide:%20HPC%20@Charite#shared-conda-environments) (see also the [yaml](https://git.bihealth.org/charite-sc-public/conda-envs/-/tree/main/) definitions).

Because we require newer versions of at least one package than those provided in the pre-configured, write-protected conda kernel conda_envs-gpulab, we clone the gpulab environment, install additional packages into the clone, and create a kernel that can be selected for this notebook:

#### Create Conda gpulab Clone
Create a clone called `llm_env` (or any other name) and activate it:
```shell
conda create --name llm_env --clone gpulab
conda activate llm_env
```

#### Register Jupyter Kernel
Register a Jupyter kernel for the currently active environment:
```shell
conda install ipykernel
python -m ipykernel install --user --name llm_env --display-name "Python (llm_env)"
```

#### Install Additional Modules
Install the missing packages and force an update of Jinja2:

```shell
conda env update -f environment.yaml --prune
```

#### Start Jupyter Notebook with Registered Kernel
Start the notebook by selecting it in the file browser that shows your home directory on the cluster. Then select the new environment: it is shown at the top right of the notebook and can be chosen from the drop-down menu, where it is likely listed as `conda env:.conda-llm_env`.
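
To confirm that the notebook is actually running on the new kernel, the interpreter path can be inspected from a cell. This is a minimal sketch; `active_env_name` is a hypothetical helper that simply takes the last component of the interpreter prefix, which for conda environments is usually the environment name.

```python
import sys
from pathlib import Path


def active_env_name(prefix=sys.prefix):
    """Return the last path component of the interpreter prefix."""
    return Path(prefix).name


print("interpreter:", sys.executable)
print("environment:", active_env_name())
```

If the printed environment is not `llm_env` (or the name you chose), the kernel selection did not take effect.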

### Update Torch
Run in a terminal after activating the `llm_env` environment:

```shell
pip install --upgrade --force-reinstall --no-cache-dir torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 --extra-index-url https://pypi.org/simple
```
Then downgrade NumPy to avoid incompatibilities:
```shell
pip install "numpy<2.0"
```
Note that this update requires Python processes such as this notebook to be shut down. Otherwise a lock is held on module files that the update needs to replace.
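
After the update, the installed versions can be checked from a fresh Python session in the terminal. The sketch below reads the package metadata through the standard library instead of importing the packages; `installed_version` is an illustrative helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ("torch", "torchvision", "torchaudio", "numpy"):
    print(f"{package:12s} {installed_version(package)}")
```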

## 2. Set Directories

In [None]:
import os
from pathlib import Path
import sys

# Otherwise use, e.g., your home directory via os.path.expanduser("~")
scratch_dir = "<your_granted_scratch_dir_on_hpc_see_mail>"
assert os.path.isdir(scratch_dir)
repo_id = "NEU-HAI/mental-flan-t5-xxl"
# repo_id = "NEU-HAI/mental-alpaca"
model_dir = Path(scratch_dir) / "data" / "models" / repo_id.split('/')[-1]
model_dir.mkdir(parents=True, exist_ok=True)

model_dir
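
Model checkpoints of this size can take tens of gigabytes on disk, so it may be worth checking the free space in the scratch directory before downloading. A minimal sketch using only the standard library; the helper name `free_space_gb` is illustrative, and `"."` stands in for your scratch directory.

```python
import shutil


def free_space_gb(path):
    """Return free space on the filesystem containing `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


# Placeholder path: replace "." with your scratch directory.
print(f"free space: {free_space_gb('.'):.1f} GB")
```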

## Set HF Token

In [None]:
# Option a) unsafe: copy-paste your HuggingFace token here and uncomment
# token = "<your_secret_HF_token>"

# Option b) set with 'export HF_TOKEN="secret_token"' in ~/.bashrc, followed by
# 'source ~/.bashrc' to take immediate effect; then load the environment variable with
token = os.environ.get("HF_TOKEN", "")

assert len(token) > 3, "ERROR\tToken not set!"
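
To avoid pasting the token into the notebook at all, the two options can be combined: read `HF_TOKEN` from the environment and fall back to a hidden interactive prompt. This is a sketch; `get_hf_token` is a hypothetical helper, not part of any library.

```python
import getpass
import os


def get_hf_token(env=os.environ, prompt_fn=getpass.getpass):
    """Return HF_TOKEN from the environment, else ask for it interactively."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # getpass hides the typed token, so it does not end up in the cell output.
        token = prompt_fn("HuggingFace token: ").strip()
    return token


# In the notebook: token = get_hf_token()
```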

## Download Model

Because I ran into subprocess issues when trying to download with multiple workers from a notebook, I recommend launching the download separately in a terminal. First set the arguments for the script in the terminal, then call the provided `download_model` script:
```shell
conda activate llm_env
repo_id="NEU-HAI/mental-flan-t5-xxl" 
# or repo_id="NEU-HAI/mental-alpaca"

local_dir="<your_scratch_dir>"
token="$HF_TOKEN"  # or set the token directly
```
Now, trigger the download:
```shell
python scripts/download_model.py --repo_id "$repo_id" --local_dir "$local_dir" --token "$token"
```
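
The provided script is not reproduced here. For orientation, a hypothetical minimal equivalent could be built on `huggingface_hub.snapshot_download`; the `--max_workers` option and the function names below are assumptions, and the actual `download_model.py` may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the same arguments that are passed on the command line above."""
    parser = argparse.ArgumentParser(description="Download a model snapshot from the HuggingFace Hub")
    parser.add_argument("--repo_id", required=True)
    parser.add_argument("--local_dir", required=True)
    parser.add_argument("--token", default=None)
    parser.add_argument("--max_workers", type=int, default=4)  # assumed extra option
    return parser.parse_args(argv)


def main(argv=None):
    # Imported lazily so that argument parsing works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    args = parse_args(argv)
    snapshot_download(
        repo_id=args.repo_id,
        local_dir=args.local_dir,
        token=args.token,
        max_workers=args.max_workers,
    )


# When saved as a script, call main() under the usual __main__ guard.
```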

In [None]:
import os
model_dir_str=str(model_dir.resolve())

print("Files in model directory:")
for file in os.listdir(model_dir_str):
    print(f"{file}")

# You should see files like `config.json`, `tokenizer_config.json`,
# `spiece.model` or `tokenizer.json`, `pytorch_model.bin`
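
Instead of reading the listing by eye, the check can be automated. The sketch below reports which expected files are absent; `missing_files` is an illustrative helper, the default file names are the ones mentioned above, and the demonstration runs on a temporary directory rather than the real model directory.

```python
import tempfile
from pathlib import Path


def missing_files(directory, expected=("config.json", "tokenizer_config.json")):
    """Return the expected file names that are absent from `directory`."""
    present = {p.name for p in Path(directory).iterdir()}
    return [name for name in expected if name not in present]


# Self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    demo_missing = missing_files(tmp)

print(demo_missing)
```

In the notebook, `missing_files(model_dir_str)` should return an empty list once the download has completed.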

In [None]:
import numpy
import torch

print("numpy version\t", numpy.__version__)
print("torch version\t", torch.__version__)

assert numpy.__version__.startswith("1.")
assert torch.__version__.startswith("2.6.")

In [None]:
from datetime import timedelta
import time

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM
model_dir_str=str(model_dir.resolve())

tokenizer = AutoTokenizer.from_pretrained(model_dir_str)

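# Load the model with the class that matches its architecture
# (the match statement requires Python >= 3.10)
match repo_id:
    case "NEU-HAI/mental-alpaca":
        model = AutoModelForCausalLM.from_pretrained(model_dir_str, device_map="auto", dtype="auto")
    case "NEU-HAI/mental-flan-t5-xxl":
        model = AutoModelForSeq2SeqLM.from_pretrained(model_dir_str, device_map="auto", dtype="auto")
    case _:
        raise ValueError(f"Unsupported repo_id: {repo_id}")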

# Example input prompt (Mental FLAN expects a plain text string, not a chat template)
prompt = (
    "Classify the following text by the author's mental health risk:\n\n"
    "Text: I feel constantly anxious and can't focus on my work lately.\n\n"
    "Prediction:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
t1 = time.time()
outputs = model.generate(**inputs, max_new_tokens=40)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
t2 = time.time()
print(response)
time_elapsed = str(timedelta(seconds=t2-t1))
print(f"response time: {time_elapsed}")
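
To run more than one prompt, the generate-and-time pattern above can be wrapped in a small loop. The sketch below is an assumption-laden illustration: `run_batch` is a hypothetical helper, and `fake_generate` is a stand-in for the real model call so that the example runs without a GPU.

```python
import time


def run_batch(texts, generate_fn):
    """Call `generate_fn` on each text and record the response and elapsed seconds."""
    results = []
    for text in texts:
        start = time.perf_counter()
        response = generate_fn(text)
        elapsed = time.perf_counter() - start
        results.append({"text": text, "response": response, "seconds": elapsed})
    return results


def fake_generate(text):
    """Stand-in for tokenizer + model.generate + decode."""
    return "placeholder"


results = run_batch(["first post", "second post"], fake_generate)
for row in results:
    print(f"{row['seconds']:.4f}s\t{row['response']}\t{row['text']}")
```

In the notebook, `generate_fn` would be a function that builds the prompt, tokenizes it, calls `model.generate`, and decodes the output as in the cell above.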

## Troubleshooting

### Memory Offloading 

1. Warnings like `Some parameters are on the meta device because they were offloaded to the cpu.` indicate that GPU memory (max. 40 GB) was insufficient to load the complete model onto the GPU. If the warning turns into an error, consider lowering the precision by explicitly setting `dtype="float16"` in `from_pretrained` (the argument is named `torch_dtype` in older transformers versions).
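
A rough back-of-the-envelope estimate shows why precision matters here: the weights alone need about (number of parameters) × (bytes per parameter). The parameter counts below are approximate assumptions (roughly 11 billion for the Flan-T5 XXL base and roughly 7 billion for the Alpaca base), not figures taken from the model cards, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumed, approximate parameter counts.
MODELS = {"mental-flan-t5-xxl": 11e9, "mental-alpaca": 7e9}
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

for name, n_params in MODELS.items():
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{name:20s} {dtype:8s} ~{weight_memory_gb(n_params, nbytes):.0f} GB")
```

Under these assumptions, the XXL model in float32 would exceed a 40 GB GPU on weights alone, while float16 roughly halves the requirement.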