!["Persistent Logo"](https://raw.githubusercontent.com/dattarajrao/ml_workshop/main/logo.png)
# Fine-tuning with the Apple MLX framework
### Instruction-tune the 4B-parameter Phi-3 LLM on a SQL query builder dataset from Hugging Face

Author: Dattaraj Rao - https://www.linkedin.com/in/dattarajrao/ <br/>
Model published at: https://huggingface.co/dattaraj/phi3-finetuned-mentalhealth <br/>
Dataset used: https://huggingface.co/datasets/b-mc2/sql-create-context <br/>
Reference: https://heidloff.net/article/apple-mlx-fine-tuning/

Objective: Use a dataset of SQL queries compiled from real cases to fine-tune a 4B-parameter language model, Phi-3, using the Apple MLX framework. Evaluate whether fine-tuning on this dataset yields better-quality SQL generations.

### Check GPU availability on Apple Mac

In [22]:
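import torch

# MPS (Metal Performance Shaders) is PyTorch's GPU backend on Apple silicon
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)  # allocate a small tensor on the GPU
    print(x)
else:
    print("MPS device not found.")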

tensor([1.], device='mps:0')
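
The check above goes through PyTorch, which MLX itself does not require. As a hedged alternative, the sketch below uses only the standard library to test for Apple silicon and then, if `mlx` happens to be installed, asks MLX for its default device. The helper names `is_apple_silicon` and `describe_mlx_device` are made up for this example.

```python
import platform


def is_apple_silicon() -> bool:
    """Return True on macOS running natively on an arm64 (M-series) chip."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def describe_mlx_device() -> str:
    """Report MLX's default device, or say that mlx is missing."""
    try:
        import mlx.core as mx  # optional dependency, imported lazily
    except ImportError:
        return "mlx is not installed"
    return f"MLX default device: {mx.default_device()}"


if __name__ == "__main__":
    print("Apple silicon:", is_apple_silicon())
    print(describe_mlx_device())
```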


### Compile data from HF Dataset

Dataset source (loaded below with `load_dataset`): https://huggingface.co/datasets/b-mc2/sql-create-context

In [2]:
import os
from datasets import load_dataset

folder = "my-data-chat/"
os.makedirs(folder, exist_ok=True)  # make sure the output folder exists before writing

system_message = """You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
{schema}"""
def create_conversation(sample):
  return {
    "messages": [
      {"role": "system", "content": system_message.format(schema=sample["context"])},
      {"role": "user", "content": sample["question"]},
      {"role": "assistant", "content": sample["answer"]}
    ]
  }

dataset = load_dataset("b-mc2/sql-create-context", split="train")
# Keep a random sample of 150 examples so the run stays short
dataset = dataset.shuffle().select(range(150))
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)
# Roughly two thirds go to training; the rest is split evenly into test and validation
dataset = dataset.train_test_split(test_size=50/150)
dataset_test_valid = dataset["test"].train_test_split(0.5)
print(dataset["train"][45]["messages"])

# Write the three splits as JSON Lines files
dataset["train"].to_json(folder + "train.jsonl", orient="records")
dataset_test_valid["train"].to_json(folder + "test.jsonl", orient="records")
dataset_test_valid["test"].to_json(folder + "valid.jsonl", orient="records")

[{'content': 'You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_16 (lane VARCHAR, heat INTEGER)', 'role': 'system'}, {'content': 'What is the total lane for a heat larger than 7?', 'role': 'user'}, {'content': 'SELECT COUNT(lane) FROM table_name_16 WHERE heat > 7', 'role': 'assistant'}]


11884
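
Before training, it can help to confirm that each file holds what the split is expected to produce. The following is a minimal sketch, not part of the original notebook: `check_chat_jsonl` is a hypothetical helper that counts the records in one file and verifies that every record carries the system, user, and assistant turns in that order.

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]


def check_chat_jsonl(path: str) -> int:
    """Count records in a chat-format JSONL file, raising on a malformed one."""
    count = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            roles = [message["role"] for message in record["messages"]]
            if roles != EXPECTED_ROLES:
                raise ValueError(f"{path}:{line_number} has roles {roles}")
            count += 1
    return count
```

Calling `check_chat_jsonl("my-data-chat/train.jsonl")` after the cell above should return the size of the training split, which is about 100 for the 150-example sample.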

In [5]:
import json
import os

folder_input = './my-data-chat/'
folder_output = './my-data-text/'
os.makedirs(folder_output, exist_ok=True)  # open(..., 'w') fails if the folder is missing
names = ['test', 'train', 'valid']

for name in names:
    # Read every chat-format record of this split
    with open(folder_input + name + '.jsonl', 'r') as json_file:
        json_list = list(json_file)
    # Flatten each conversation into a single "text" field for training
    with open(folder_output + name + '.jsonl', 'w') as outfile:
        for json_str in json_list:
            result = json.loads(json_str)
            message = result['messages']
            entry = {
                "text": message[0]['content'] + '\nUser:' + message[1]['content'] + '\nAssistant:' + message[2]['content']
            }
            json.dump(entry, outfile)
            outfile.write('\n')

### Run MLX framework inference

In [23]:
from mlx_lm import load, generate

prompt = "You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_71 (tie_no VARCHAR, away_team VARCHAR)\nUser:What is Tie No, when Away Team is \"Millwall\"?\nAssistant:SELECT tie_no FROM table_name_71 WHERE away_team = \"millwall\""
model, tokenizer = load("microsoft/Phi-3-mini-4k-instruct")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
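
The prompt above already ends with the reference query, so the model is asked to continue past a finished answer. To make the model write the query itself, the prompt can stop at `Assistant:`. A minimal sketch, assuming the same `User:`/`Assistant:` layout used for the training text; `build_prompt` and `prompt_open` are hypothetical names, not part of `mlx_lm`.

```python
# Wording kept identical to the training data, including the "an text" typo,
# so that the inference prompt matches what the model was tuned on.
SYSTEM_TEMPLATE = (
    "You are an text to SQL query translator. Users will ask you questions in English "
    "and you will generate a SQL query based on the provided SCHEMA.\n"
    "SCHEMA:\n{schema}"
)


def build_prompt(schema: str, question: str) -> str:
    """Format a prompt that ends where the model's answer should begin."""
    return SYSTEM_TEMPLATE.format(schema=schema) + "\nUser:" + question + "\nAssistant:"


prompt_open = build_prompt(
    "CREATE TABLE table_name_71 (tie_no VARCHAR, away_team VARCHAR)",
    'What is Tie No, when Away Team is "Millwall"?',
)
print(prompt_open)
```

Passing `prompt_open` to `generate` in place of `prompt` leaves the SQL for the model to produce.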


### Visualize the model

In [33]:
model

Model(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072)
    (layers.0): TransformerBlock(
      (self_attn): Attention(
        (qkv_proj): Linear(input_dims=3072, output_dims=9216, bias=False)
        (o_proj): Linear(input_dims=3072, output_dims=3072, bias=False)
        (rope): RoPE(96, traditional=False)
      )
      (mlp): MLP(
        (gate_up_proj): Linear(input_dims=3072, output_dims=16384, bias=False)
        (down_proj): Linear(input_dims=8192, output_dims=3072, bias=False)
      )
      (input_layernorm): RMSNorm(3072, eps=1e-05)
      (post_attention_layernorm): RMSNorm(3072, eps=1e-05)
    )
    (layers.1): TransformerBlock(
      (self_attn): Attention(
        (qkv_proj): Linear(input_dims=3072, output_dims=9216, bias=False)
        (o_proj): Linear(input_dims=3072, output_dims=3072, bias=False)
        (rope): RoPE(96, traditional=False)
      )
      (mlp): MLP(
        (gate_up_proj): Linear(input_dims=3072, output_dims=16384, bias=False)
        (dow
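
The shapes in this printout are enough to estimate the model's size. The sketch below sums the weights of one transformer block and scales up to the whole network. Two inputs are assumptions because the printout is cut off before reaching them: 32 transformer blocks, and a final norm plus an untied output projection of shape 3072 x 32064. Under those assumptions the total agrees with the 3821.080M reported in the training log later in this notebook.

```python
HIDDEN = 3072
VOCAB = 32064
QKV_OUT = 9216        # 3 * HIDDEN: query, key and value packed into one projection
GATE_UP_OUT = 16384   # 2 * 8192: gate and up projections packed together
MLP_INNER = 8192
NUM_BLOCKS = 32       # assumption: not visible in the truncated printout


def block_params() -> int:
    """Weights in one TransformerBlock, read off the printed layer shapes."""
    attention = HIDDEN * QKV_OUT + HIDDEN * HIDDEN      # qkv_proj + o_proj
    mlp = HIDDEN * GATE_UP_OUT + MLP_INNER * HIDDEN     # gate_up_proj + down_proj
    norms = 2 * HIDDEN                                  # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params() -> int:
    """Whole-model estimate under the assumptions stated above."""
    embedding = VOCAB * HIDDEN
    final_norm = HIDDEN          # assumption: one RMSNorm after the last block
    lm_head = HIDDEN * VOCAB     # assumption: output projection not tied to the embedding
    return embedding + NUM_BLOCKS * block_params() + final_norm + lm_head


print(f"{block_params():,} parameters per block")
print(f"{total_params() / 1e6:.3f}M parameters in total")
```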

In [34]:
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)

Prompt: You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
CREATE TABLE table_name_71 (tie_no VARCHAR, away_team VARCHAR)
User:What is Tie No, when Away Team is "Millwall"?
Assistant:SELECT tie_no FROM table_name_71 WHERE away_team = "millwall"




You are an advanced SQL query translator. Users will ask you complex questions in English, and you will generate a SQL query based on the provided SCHEMA. The questions may involve multiple tables, conditions, and specific data retrieval.

SCHEMA:
CREATE TABLE team_info (team_id VARCHAR, team_name VARCHAR, country VARCHAR, founded_year INT)
CREATE TABLE match_results (match_id VARCHAR, home_team VARCHAR,
Prompt: 4.456 tokens-per-sec
Generation: 14.924 tokens-per-sec





## Fine tune with MLX

In [7]:
%%time
!python -m mlx_lm.lora --model microsoft/Phi-3-mini-4k-instruct --train --data ./my-data-text/ --iters 1000


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Loading pretrained model
Fetching 13 files: 100%|█████████████████████| 13/13 [00:00<00:00, 47290.50it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading datasets
Training
Trainable parameters: 0.041% (1.573M/3821.080M)
Starting training..., iters: 1000
Iter 1: Val loss 2.599, Val took 9.977s
Iter 10: Train loss 2.295, Learning Rate 1.000e-05, It/sec 0.601, Tokens/sec 285.199, Trained Tokens 4745, Peak mem 9.629 GB
Iter 20: Train loss 1.533, Learning Rate 1.000e-05, It/sec 0.703, Tokens/sec 299.395, Trained Tokens 9005, Peak mem 9.629 GB
Iter 30: Train loss 1.052, Learning Rate 1.000e-05, It/sec 0.678, Tokens/sec 300.776, Trained Tokens 13440, Peak mem 9.629 GB
Iter 40: Train loss 0.916, Learning Rate 1.000e-05, It/sec 0.683, Tokens/sec 300.041, Trained Tokens 17834, Peak mem 9.629 GB
Iter 50: Train loss 0.830, Learning Rate 1.000e-05, It/sec 0.688, Tokens/sec 300.161, Trained Tokens 22194, Peak mem 9.629 GB
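
The log reports 1.573M trainable parameters out of 3821.080M. A LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `rank * (d_in + d_out)` weights. The notebook does not state which layers are adapted or at what rank, so the settings below (rank 8, the `qkv_proj` layer only, 16 blocks) are assumptions picked because they reproduce the logged figure; they are not read from the `mlx_lm` configuration.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights added by one LoRA adapter: a (d_in x rank) and a (rank x d_out) matrix."""
    return rank * (d_in + d_out)


RANK = 8                        # assumption
ADAPTED_BLOCKS = 16             # assumption
QKV_IN, QKV_OUT = 3072, 9216    # qkv_proj shape from the model printout
TOTAL_PARAMS = 3_821_080_000    # 3821.080M, as rounded in the training log

trainable = ADAPTED_BLOCKS * lora_params(QKV_IN, QKV_OUT, RANK)
share = 100 * trainable / TOTAL_PARAMS

print(f"{trainable / 1e6:.3f}M trainable parameters, {share:.3f}% of the model")
```

Other combinations of rank and target layers could give the same count, so this is a consistency check rather than a statement of the configuration used.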


### Inference with base model and with adapter

In [13]:
model, tokenizer = load("microsoft/Phi-3-mini-4k-instruct")

response = generate(model, tokenizer, prompt=prompt, verbose=True)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Prompt: You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
CREATE TABLE table_name_71 (tie_no VARCHAR, away_team VARCHAR)
User:What is Tie No, when Away Team is "Millwall"?
Assistant:SELECT tie_no FROM table_name_71 WHERE away_team = "millwall"




You are an advanced SQL query translator. Users will ask you complex questions in English, and you will generate a SQL query based on the provided SCHEMA. The questions may involve multiple tables, conditions, and specific data retrieval.

SCHEMA:
CREATE TABLE team_info (team_id VARCHAR, team_name VARCHAR, country VARCHAR, founded_year INT)
CREATE TABLE match_results (match_id VARCHAR, home_team VARCHAR,
Prompt: 118.658 tokens-per-sec
Generation: 17.072 tokens-per-sec


In [14]:
model, tokenizer = load("microsoft/Phi-3-mini-4k-instruct", adapter_path="./adapters/")

response = generate(model, tokenizer, prompt=prompt, verbose=True)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Prompt: You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
CREATE TABLE table_name_71 (tie_no VARCHAR, away_team VARCHAR)
User:What is Tie No, when Away Team is "Millwall"?
Assistant:SELECT tie_no FROM table_name_71 WHERE away_team = "millwall"

Prompt: 125.284 tokens-per-sec
Generation: 0.000 tokens-per-sec
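
A single prompt gives only an impression of quality. To address the objective of comparing SQL quality, one option is to score generations against the reference answers in `test.jsonl`. The sketch below shows a normalized exact-match metric; `normalize_sql` and `exact_match_rate` are hypothetical helpers, and exact match is a deliberately strict stand-in for real SQL equivalence.

```python
import re


def normalize_sql(query: str) -> str:
    """Lowercase, unify quote style, collapse whitespace, drop a trailing semicolon.

    Lowercasing also affects string literals, which is acceptable for a rough score.
    """
    text = query.strip().rstrip(";").lower().replace("'", '"')
    return re.sub(r"\s+", " ", text).strip()


def exact_match_rate(predictions, references) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Made-up predictions scored against two reference queries from this notebook
predictions = [
    'select tie_no  from table_name_71 where away_team = "millwall";',
    "SELECT COUNT(*) FROM table_name_16",
]
references = [
    'SELECT tie_no FROM table_name_71 WHERE away_team = "millwall"',
    "SELECT COUNT(lane) FROM table_name_16 WHERE heat > 7",
]
print(exact_match_rate(predictions, references))
```

Running the same scoring once with the base model's outputs and once with the adapter's outputs would give two comparable numbers.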


### Fuse model

In [19]:
!python -m mlx_lm.fuse --model microsoft/Phi-3-mini-4k-instruct --adapter-path adapters --save-path phi3-it-sql

Loading pretrained model
Fetching 13 files: 100%|█████████████████████| 13/13 [00:00<00:00, 40122.11it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
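
Fusing folds the adapter into the base weights so that inference no longer needs a separate adapter file. In the usual LoRA formulation the adapted layer computes `W x + s * B (A x)`, and the fused weight is `W + s * B A`. The toy example below, written with plain Python lists and made-up numbers, checks that both give the same output. It illustrates the idea only and makes no claim about how `mlx_lm.fuse` is implemented internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scaled(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


W = [[1, 2], [3, 4]]   # base weight, 2 x 2
A = [[1, 0]]           # LoRA down-projection with rank 1, 1 x 2
B = [[2], [1]]         # LoRA up-projection, 2 x 1
s = 2                  # LoRA scale
x = [[1], [1]]         # input as a column vector

# Adapter kept separate: base output plus the scaled low-rank correction
y_adapter = add(matmul(W, x), scaled(matmul(B, matmul(A, x)), s))

# Adapter fused: a single weight matrix, a single matmul at inference time
W_fused = add(W, scaled(matmul(B, A), s))
y_fused = matmul(W_fused, x)

print(y_adapter, y_fused)
```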


### Create GGUF

In [20]:
!python llama.cpp/convert_hf_to_gguf.py './phi3-it-sql' --outfile phi3-it-sql.gguf --outtype f16 

INFO:hf-to-gguf:Loading model: phi3-it-sql
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 32000
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 32000
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'user' %}{{'<|user|>
' + message['content'] + '<|end|>
'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
' + message['content'] + '<|end|>
'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf:
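
The chat template printed in the conversion log is a Jinja template stored with the tokenizer. As a rough illustration of what it produces, the sketch below mirrors its logic in plain Python. `render_phi3_chat` is a hypothetical helper, and the default `eos_token` string is an assumption, since the log only shows the token id (32000).

```python
def render_phi3_chat(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    """Mirror the logged template: one '<|role|>' block per message, then a suffix."""
    parts = []
    for message in messages:
        role = message["role"]
        if role in ("system", "user", "assistant"):
            parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    # Either open an assistant turn for generation or close the sequence
    parts.append("<|assistant|>\n" if add_generation_prompt else eos_token)
    return "".join(parts)


example = render_phi3_chat([
    {"role": "system", "content": "SCHEMA:\nCREATE TABLE table_name_16 (lane VARCHAR, heat INTEGER)"},
    {"role": "user", "content": "What is the total lane for a heat larger than 7?"},
])
print(example)
```

Hugging Face tokenizers expose this rendering through `apply_chat_template`, which is the safer route in real code because it reads the template shipped with the model.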

## The GGUF file can now be published on the Hugging Face Hub or used with libraries such as llama.cpp.

# Happy fine-tuning!