!["Persistent Logo"](https://raw.githubusercontent.com/dattarajrao/ml_workshop/main/logo.png)
# Fine-tuning Apple MLX framework 
### Instruction tune a 4B parameter LLM, on Mental Health dataset from Kaggle

Author: Dattaraj Rao - https://www.linkedin.com/in/dattarajrao/ <br/>
Model published at: https://huggingface.co/dattaraj/phi3-finetuned-mentalhealth
Dataset used: https://www.kaggle.com/datasets/narendrageek/mental-health-faq-for-chatbot

Objective: Leverage a compilation of mental health faq discussions compiled from real conversations. Use this dataset to fine tune a 4B parameter language model - Phi3 - usng Apple MLX framework. Evaluate if using the dataset resonses are more empathetic as compared to raw LLM responses.

### Check GPU availability on Apple Mac

In [43]:
import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print (x)
else:
    print ("MPS device not found.")

RuntimeError: MPS backend out of memory (MPS allocated: 8.00 MB, other allocations: 21.73 GB, max allowed: 20.40 GB). Tried to allocate 256 bytes on shared pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

### Compile data from CSV

Download CSV from: 

In [15]:
import pandas as pd
df = pd.read_csv("../../Datasets/Mental_Health_FAQ.csv")
df.head()

Unnamed: 0,Question_ID,Questions,Answers
0,1590140,What does it mean to have a mental illness?,Mental illnesses are health conditions that di...
1,2110618,Who does mental illness affect?,It is estimated that mental illness affects 1 ...
2,6361820,What causes mental illness?,It is estimated that mental illness affects 1 ...
3,9434130,What are some of the warning signs of mental i...,Symptoms of mental health disorders vary depen...
4,7657263,Can people with mental illness recover?,"When healing from mental illness, early identi..."


### Write traning and validation jsonl files

In [54]:
import json

with open('synth-mentalhealth/train.jsonl', 'w') as tr_file, open('synth-mentalhealth/valid.jsonl', 'w') as vl_file:
    tot_rows = len(df)
    counter = 0
    for index, row in df.iterrows():
        row['Questions'] = row['Questions'].encode('ascii', 'ignore')
        row['Answers'] = row['Answers'].encode('ascii', 'ignore')
        counter = counter + 1
        if counter < tot_rows - 25:
            tr_file.write(json.dumps({"text": f"<|user|>\n{row['Questions']} <|end|>\n<|assistant|>\n{row['Answers']} <|end|>"}) + "\n") 
        else:
            vl_file.write(json.dumps({"text": f"<|user|>\n{row['Questions']} <|end|>\n<|assistant|> \n{row['Answers']} <|end|>"}) + "\n") 

### Run MX framework inference

In [38]:
from mlx_lm import load, generate

model, tokenizer = load("microsoft/Phi-3-mini-4k-instruct")

response = generate(model, tokenizer, prompt="What is the evidence on vaping?", verbose=True, max_tokens=512)

The history saving thread hit an unexpected error (OperationalError('attempt to write a readonly database')).History will not be written to the database.


Fetching 13 files:   0%|          | 0/13 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Prompt: What is the evidence on vaping?


Vaping is a relatively new phenomenon, and there is still much to learn about its effects on health. However, some studies have suggested that vaping may have some benefits for smokers who want to quit or reduce their tobacco consumption. Here are some of the main findings from the research on vaping:

- Vaping may help smokers quit or cut down on smoking. A systematic review of 38 studies found that e-cigarette use was associated with higher rates of abstinence from smoking than non-use or use of other nicotine replacement therapies (NRTs). The review also found that e-cigarette use was more effective than NRTs in helping smokers quit or reduce their smoking. However, the review also noted that the quality of the evidence was low to moderate, and that more research is needed to confirm these results and to understand the long-term effects of vaping.
- Vaping may have fewer harmful effects than smoking. A review of 13 studies found that e-cigar

## Fine tune with MLX

In [18]:
%time
!python -m mlx_lm.lora --model microsoft/Phi-3-mini-4k-instruct --train --data ./synth-mentalhealth/ --iters 1000

CPU times: user 7 μs, sys: 16 μs, total: 23 μs
Wall time: 281 μs


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Loading pretrained model
Fetching 13 files: 100%|████████████████████| 13/13 [00:00<00:00, 144248.55it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading datasets
Training
Trainable parameters: 0.041% (1.573M/3821.080M)
Starting training..., iters: 1000
Iter 1: Val loss 1.913, Val took 20.490s
Iter 10: Train loss 2.184, Learning Rate 1.000e-05, It/sec 0.061, Tokens/sec 105.203, Trained Tokens 17139, Peak mem 40.406 GB
Iter 20: Train loss 2.053, Learning Rate 1.000e-05, It/sec 0.157, Tokens/sec 215.326, Trained Tokens 30891, Peak mem 40.406 GB
Iter 30: Train loss 1.971, Learning Rate 1.000e-05, It/sec 0.058, Tokens/sec 108.112, Trained Tokens 49555, Peak mem 40.406 GB
Iter 40: Train loss 1.807, Learning Rate 1.000e-05, It/sec 0.121, Tokens/sec 156.161, Trained Tokens 62419, Peak mem 40.406 GB
Iter 50: Train loss 1.760, Learning Rate 1.000e-05, It/sec 0.074, Tokens/sec 114.696, Trained Tokens 78018, Peak mem 40

### Inference with adapter

Save ./adapters folder as ./lora-mental-health

In [24]:
!python -m mlx_lm.generate --model microsoft/Phi-3-mini-4k-instruct --adapter-path ./lora-mental-health --prompt "How do I deal with a mental health patient who is violent?" --eos-token " <|end|>"

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Fetching 13 files: 100%|████████████████████| 13/13 [00:00<00:00, 128902.96it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Prompt: <|user|>
How do I deal with a mental health patient who is violent?<|end|>
<|assistant|>

Mental illness can impair an individuals ability to think clearly, recognize reality, and make reason decisions. In some cases, mental illness may lead to behavior that can be harmful to themselves or others. When a person is in crisis, they may require immediate intervention or hospitalization to stabilize their symptoms. This is known as an involuntary commitment. 

Involuntary commitment is a legal procedure in which someone who is considered a danger to
Prompt: 28.654 tokens-per-sec
Generation: 17.097 tokens-per-sec


### Without adapter

In [25]:
!python -m mlx_lm.generate --model microsoft/Phi-3-mini-4k-instruct --prompt "How do I deal with a mental health patient who is violent?" --eos-token "<|end|>"    

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Fetching 13 files: 100%|████████████████████| 13/13 [00:00<00:00, 153162.79it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Prompt: <|user|>
How do I deal with a mental health patient who is violent?<|end|>
<|assistant|>

As an AI, I must emphasize the importance of prioritizing safety and seeking professional guidance. Here are some general steps to approach a situation involving a violent mental health patient:

1. Assess the situation: Remain calm and assess the level of threat. Ensure you and others around you are in a safe location.

2. Call for help: If the situation is severe, call emergency services (911 in the United States) to report the incident
Prompt: 30.723 tokens-per-sec
Generation: 17.319 tokens-per-sec


### Fuse model

In [26]:
!python -m mlx_lm.fuse --model microsoft/Phi-3-mini-4k-instruct

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Loading pretrained model
Fetching 13 files: 100%|████████████████████| 13/13 [00:00<00:00, 119052.30it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### Create GGUF

In [36]:
!python llama.cpp/convert_hf_to_gguf.py './lora_fused_model' --outfile phi-3-mini-ft.gguf --outtype f16 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


INFO:hf-to-gguf:Loading model: lora_fused_model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:

## The GGUF file can now be published on HuggingFace Hub or used in libraries like Llama.CPP. 

# Happy fine-tuning!