# Fine-tuning Phi-3-Vision from Hugging Face

## Phi-3CookBook

### Notes

- Requires a Hugging Face dataset. `./nga-data-converter.ipynb` converts the `*.jsonl` files to the Arrow format that Hugging Face uses for saving datasets to and loading them from disk.

### Run it

1. Run all: [./nga-data-retriever.ipynb](./nga-data-retriever.ipynb)
2. Run all: [./nga-data-converter.ipynb](./nga-data-converter.ipynb)
3. Run all cells in this notebook.
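To run steps 1 and 2 from a terminal or script instead of clicking Run All, a sketch like the following could execute the notebooks in order. It assumes the `jupyter` command with `nbconvert` is on the PATH and that the notebooks sit in the current directory; `notebook_commands` and `run_notebooks` are hypothetical helpers.

```python
import subprocess

# Steps 1 and 2 from the list above, in order
NOTEBOOKS = ["./nga-data-retriever.ipynb", "./nga-data-converter.ipynb"]


def notebook_commands(notebooks: list[str]) -> list[list[str]]:
    """Build one `jupyter nbconvert` command per notebook.

    --execute runs every cell (like "Run All"); --inplace writes the
    outputs back into the same .ipynb file.
    """
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_notebooks(notebooks: list[str]) -> None:
    """Run the notebooks one after another, stopping at the first failure."""
    for cmd in notebook_commands(notebooks):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_notebooks(NOTEBOOKS)` stops before the converter if the retriever fails, which keeps step 2 from working on incomplete data.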

```python
import sys

!{sys.executable} -m pip install openai --quiet
!{sys.executable} -m pip install transformers==4.38.1 --quiet
!{sys.executable} -m pip install datasets --quiet
!{sys.executable} -m pip install accelerate==0.30.1 --quiet
!{sys.executable} -m pip install peft --quiet
!{sys.executable} -m pip install Levenshtein --quiet
!{sys.executable} -m pip install deepspeed==0.13.1 --quiet
!{sys.executable} -m pip install torchvision --quiet
```
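Because several packages are pinned to exact versions, it can help to confirm what actually got installed before launching training. A sketch using only the standard library's `importlib.metadata`; `check_pins` is a hypothetical helper, and `PINS` mirrors the pinned versions in the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact versions pinned in the install cell above
PINS = {
    "transformers": "4.38.1",
    "accelerate": "0.30.1",
    "deepspeed": "0.13.1",
}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for package, wanted in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if installed != wanted:
            problems[package] = f"installed {installed}, expected {wanted}"
    return problems
```

An empty result from `check_pins(PINS)` means every pinned package is present at the expected version.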

```python
# Fine-tune with 4 processes on this node (typically one per GPU)
! torchrun --nproc_per_node=4 phi-3-vision-finetune.py
```

```
W0808 13:36:09.227000 140529580269568 torch/distributed/run.py:779] 
W0808 13:36:09.227000 140529580269568 torch/distributed/run.py:779] *****************************************
W0808 13:36:09.227000 140529580269568 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0808 13:36:09.227000 140529580269568 torch/distributed/run.py:779] *****************************************
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-vision-128k-instruct:
- image_processing_phi3_v.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
```
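`torchrun --nproc_per_node=4` starts four copies of `phi-3-vision-finetune.py` and tells each copy which worker it is through environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below shows how a training script could read them; it assumes nothing about the actual fine-tuning script, and `DistInfo` and `read_dist_env` are hypothetical names. The defaults let the same code run as a single plain `python` process.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistInfo:
    rank: int        # global index of this process across all nodes
    local_rank: int  # index on this node; commonly used to pick the GPU
    world_size: int  # total number of processes

    @property
    def is_main(self) -> bool:
        # By convention only rank 0 logs and saves checkpoints
        return self.rank == 0


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read the variables torchrun sets, falling back to a single process."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )
```

With `--nproc_per_node=4` on one machine, the four processes would see `WORLD_SIZE=4` and ranks 0 through 3.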

## Test it

```python
import torch
from transformers import AutoModelForCausalLM

model_output_dir = "./output"
model_use_flash_attention = False
model_name = "SPECIALSAUCE"  # placeholder name for the fine-tuned model

model = AutoModelForCausalLM.from_pretrained(
    model_output_dir,  # defaults to './output/'
    # Phi-3-V was originally trained in bf16 with flash attention.
    # For fp16 mixed-precision training, load in fp32 to avoid an HF Accelerate error.
    torch_dtype=torch.bfloat16 if model_use_flash_attention else torch.float32,
    trust_remote_code=True,
    _attn_implementation='flash_attention_2' if model_use_flash_attention else 'eager',
)

# TODO: serve the fine-tuned model with Ollama. This assumes the model has already
# been imported into Ollama as `model_name`; how to do that is still an open question.
! ollama run {model_name}
```
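Before calling the API, it may be worth checking that Ollama is running and actually lists a model by that name. A sketch against Ollama's `GET /api/tags` endpoint, which returns the locally available models; it assumes Ollama's default address `http://localhost:11434`, the same host the next cell's client uses. `list_ollama_models` and `has_model` are hypothetical helpers. Ollama reports names with a tag such as `:latest`, so a bare name is compared against that default tag.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def list_ollama_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Return the model names Ollama reports via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(names: list[str], wanted: str) -> bool:
    """True if `wanted` is listed, treating a name without a tag as ':latest'."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in names
```

A quick check would be `has_model(list_ollama_models(), model_name)`; a connection error from `urlopen` usually means the Ollama server is not running.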

```python
import openai

# Ollama exposes an OpenAI-compatible API; the client requires an API key, but Ollama ignores it
client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",
)

test_image_url = ""  # placeholder: set to the image to describe

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You describe art for low vision folks. You will be given an image. Describe clearly and concisely only what is visible in the image. Avoid stylistic comparisons and suggestions."},
        {"role": "user", "content": [{"type": "image_url", "image_url": {"url": test_image_url}}]},
    ],
)

print("Response:")
print(response.choices[0].message.content)
```
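`test_image_url` is left empty above. Depending on the Ollama version, its OpenAI-compatible endpoint may accept only base64-encoded images rather than remote URLs, so one option is to embed a local file as a data URL. A sketch using only the standard library; `image_to_data_url` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an `image_url` content part."""
    mime, _ = mimetypes.guess_type(path)  # e.g. "image/jpeg" for .jpg
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For example, `test_image_url = image_to_data_url("painting.jpg")` (a placeholder file name) before rerunning the request.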