## Gradio

[Gradio](https://www.gradio.app) lets you build simple web interfaces for your software. In this example, we use Gradio to provide a simple chat interface to a large language model.

**Note:** To get this notebook, execute the following command in a terminal in your JupyterHub:
```
cp /home/fs70824/trainee49/LLMs-on-supercomputers/D2_07_Gradio_example.ipynb ./
```

In [190]:
%%writefile gradio_example.py
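# Import necessary libraries
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
import gradio as gr

model_name = "microsoft/Phi-3-mini-4k-instruct"

# Load the tokenizer and the model; the weights are quantized to 4-bit NF4
# and computations run in bfloat16 to reduce GPU memory usage
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    ),
    trust_remote_code=True)

# Wrap model and tokenizer in a text-generation pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

def get_answer(question, history=None):
    # Gradio passes the chat history as the second argument; it is ignored here,
    # so every question is answered without the context of earlier turns
    messages = [
        {"role": "user", "content": question},
    ]
    result = pipe(messages, max_new_tokens=500, return_full_text=False)
    return result[0]['generated_text'].strip()

# Start the chat interface (Gradio's default port is 7860); share=False creates no public link
gr.ChatInterface(get_answer).launch(share=False)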

Overwriting gradio_example.py
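
The `get_answer` function above receives a `history` argument but only sends the latest question to the model, so the chat has no memory of earlier turns. The following is a minimal sketch of how the history could be folded into the `messages` list. `build_messages` is a hypothetical helper, not part of Gradio or Transformers. It assumes that, depending on the Gradio version and the `type` argument of `gr.ChatInterface`, the history arrives either as a list of `(user, assistant)` pairs or as a list of `{"role": ..., "content": ...}` dictionaries, and it accepts both.

```python
def build_messages(question, history=None):
    """Combine earlier chat turns with the new question (hypothetical helper)."""
    messages = []
    for item in history or []:
        if isinstance(item, dict):
            # Assumed "messages" format: {"role": ..., "content": ...}
            messages.append({"role": item["role"], "content": item["content"]})
        else:
            # Assumed "tuples" format: (user_text, assistant_text)
            user_text, assistant_text = item
            messages.append({"role": "user", "content": user_text})
            if assistant_text is not None:
                messages.append({"role": "assistant", "content": assistant_text})
    # The new question always comes last
    messages.append({"role": "user", "content": question})
    return messages


# Example with one earlier turn in the pair format
print(build_messages("And in winter?", [("Is Vienna warm in summer?", "Usually, yes.")]))
```

Inside `get_answer`, the hard-coded `messages` list could then be replaced by `messages = build_messages(question, history)`. Note that long conversations may eventually exceed the model's 4k-token context window; this sketch does not truncate the history.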


In [191]:
%%writefile gradio_example.slurm
#!/bin/bash
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos=zen3_0512_a100x2
#SBATCH --reservation=llm_supercomputer_gpu_day2
#SBATCH --account=p70824

#SBATCH --gres=gpu:1             # Number of GPUs (1 or 2 on VSC5)
#SBATCH --time=0-00:15:00        # Time limit. Format: Days-hours:minutes:seconds

# module purge                     # Start in a clean environment
# module load miniconda3           # Load conda
eval "$(conda shell.bash hook)"
source /opt/sw/jupyterhub/envs/conda/vsc5/jupyterhub-huggingface-v2/modules  # Activate the conda environment

pip install gradio
python gradio_example.py

Overwriting gradio_example.slurm


In [192]:
!sbatch gradio_example.slurm

sbatch: Allocating 50.0 % of cpu resources: 64 / 128.
sbatch: Number of tasks adjusted to 64.
Submitted batch job 3992403


In [193]:
!squeue --me

             JOBID            PARTITION     NAME     USER ST       TIME  NODES     NODELIST(REASON)
           3991665            zen3_0512 vsc5_jh_ trainee4  R    4:17:12      1            n3501-007
           3992403     zen3_0512_a100x2 gradio_e trainee4  R       0:29      1            n3071-013


In [196]:
!tail -c +0 slurm-3992403.out

activating: /opt/sw/jupyterhub/envs/conda/vsc5/jupyterhub-huggingface-v2 (jupyterhub-huggingface-v2)
the huggingface home directory now points to '/gpfs/data/fs70824/trainee49/hf-cache' (HF_HOME)
Defaulting to user installation because normal site-packages is not writeable
2024-09-26 12:49:10.128956: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-26 12:49:10.130157: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-26 12:49:10.141785: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-26 12:49:10.176775: I tensorflow/core/platform/cpu_feature_guard.cc:182] This Tenso

In [198]:
!squeue --me

             JOBID            PARTITION     NAME     USER ST       TIME  NODES     NODELIST(REASON)
           3991665            zen3_0512 vsc5_jh_ trainee4  R    4:24:03      1            n3501-007
           3992403     zen3_0512_a100x2 gradio_e trainee4  R       7:20      1            n3071-013
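
The last column of the `squeue` output shows the compute node that the Gradio job is running on (here `n3071-013`). If you prefer to extract it programmatically, the following sketch parses the default `squeue` table as shown above. `find_node` is a hypothetical helper; it assumes the default column layout and that the job name contains no whitespace.

```python
def find_node(squeue_output, job_id):
    """Return the node of a running job, or None if the job is not running or not listed."""
    for line in squeue_output.splitlines():
        fields = line.split()
        # Assumed default columns: JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
        if len(fields) >= 8 and fields[0] == str(job_id) and fields[4] == "R":
            return fields[7]
    return None


# Sample text copied from the squeue output above
sample = """\
             JOBID            PARTITION     NAME     USER ST       TIME  NODES     NODELIST(REASON)
           3991665            zen3_0512 vsc5_jh_ trainee4  R    4:24:03      1            n3501-007
           3992403     zen3_0512_a100x2 gradio_e trainee4  R       7:20      1            n3071-013
"""
print(find_node(sample, 3992403))  # prints n3071-013
```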


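Now create an SSH tunnel to the compute node so that you can open the Gradio web interface in a browser. Execute the following command on your own computer, replacing the trainee user number with yours and the node name with the node your Gradio job is running on (see the `NODELIST` column of `squeue --me`). Once the tunnel is up and Gradio has started on its default port, the chat interface should be reachable at http://127.0.0.1:7860.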
```
ssh -L 7860:127.0.0.1:7860 -t -J trainee49@vmos.vsc.ac.at,trainee49@vsc5.vsc.ac.at trainee49@n3071-013
```
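
To check from your own computer whether the local end of the tunnel is listening, a small sketch like the following can help. `port_is_open` is a hypothetical helper built on the standard-library `socket` module; the host and port match the tunnel command above. A successful connection only shows that something accepts connections on the local port. It does not prove that the model has finished loading or that Gradio is already serving on the compute node.

```python
import socket


def port_is_open(host="127.0.0.1", port=7860, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 7860 is open; try http://127.0.0.1:7860 in your browser.")
    else:
        print("Nothing is listening on port 7860 yet.")
```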

In [199]:
!scancel 3992403

In [200]:
!squeue --me

             JOBID            PARTITION     NAME     USER ST       TIME  NODES     NODELIST(REASON)
           3991665            zen3_0512 vsc5_jh_ trainee4  R    4:24:21      1            n3501-007
