## Fine-tuning / training the LLM

### Downloading and inspecting the model

```
▶ huggingface-cli download mlx-community/Qwen3-1.7B-4bit

▶ ls ~/.cache/huggingface/hub/models--mlx-community--Qwen3-1.7B-4bit  
blobs     refs      snapshots

# Note: MLX models are stored in the same cache directory as other Hugging Face downloads
▶ huggingface-cli scan-cache
REPO ID                                     REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED  LAST_MODIFIED  REFS LOCAL PATH
------------------------------------------- --------- ------------ -------- -------------- -------------- ---- --------------------------------------------------------------------------------------------
gretelai/synthetic_text_to_sql              dataset          34.3M        8 48 minutes ago 48 minutes ago main /Users/deyarchit/.cache/huggingface/hub/datasets--gretelai--synthetic_text_to_sql
mistralai/Mistral-7B-Instruct-v0.3          model            29.0G       15 12 minutes ago 3 minutes ago  main /Users/deyarchit/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3
mlx-community/Mistral-7B-Instruct-v0.3-4bit model             4.1G        7 4 weeks ago    4 weeks ago    main /Users/deyarchit/.cache/huggingface/hub/models--mlx-community--Mistral-7B-Instruct-v0.3-4bit
mlx-community/gemma-3-4b-it-4bit            model             3.4G       12 4 weeks ago    4 weeks ago    main /Users/deyarchit/.cache/huggingface/hub/models--mlx-community--gemma-3-4b-it-4bit
```
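The folder names in the listing above appear to follow a fixed pattern: the repo type, pluralized, then the repo id with `/` replaced by `--`. A minimal sketch of that mapping, assuming the default cache location under `~/.cache/huggingface/hub` (the helper name `hub_cache_dir` is hypothetical, not part of any library):

```python
from pathlib import Path


def hub_cache_dir(repo_id: str, repo_type: str = "model") -> Path:
    """Guess the local cache folder for a Hugging Face repo.

    Assumes the default cache root; a custom HF_HOME would change it.
    """
    folder = f"{repo_type}s--" + repo_id.replace("/", "--")
    return Path.home() / ".cache" / "huggingface" / "hub" / folder


print(hub_cache_dir("mlx-community/Qwen3-1.7B-4bit"))
print(hub_cache_dir("gretelai/synthetic_text_to_sql", repo_type="dataset"))
```

This is only a convenience for locating the `blobs`, `refs`, and `snapshots` folders by hand; `huggingface-cli scan-cache` remains the reliable source for what is actually on disk.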

### Training

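Note: `--data data` points to a local `data` directory. `mlx_lm` expects it to contain `train.jsonl` and `valid.jsonl` for training, plus `test.jsonl` when running with `--test`.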

```
▶ python -m mlx_lm lora \
  --model mlx-community/Qwen3-1.7B-4bit \
  --fine-tune-type lora \
  --data data \
  --train \
  --batch-size 4 \
  --num-layers 16 \
  --iters 1000

Loading pretrained model
Fetching 9 files: 100%|████████████████████████████████| 9/9 [00:00<00:00, 21732.15it/s]
Loading datasets
Training
Trainable parameters: 0.053% (0.918M/1720.575M)
Starting training..., iters: 1000
Iter 1: Val loss 2.567, Val took 31.900s
Iter 10: Train loss 2.293, Learning Rate 1.000e-05, It/sec 0.358, Tokens/sec 226.100, Trained Tokens 6308, Peak mem 4.906 GB
.
.
.
Iter 1000: Val loss 0.689, Val took 35.469s
Iter 1000: Train loss 0.670, Learning Rate 1.000e-05, It/sec 0.290, Tokens/sec 190.825, Trained Tokens 637302, Peak mem 8.651 GB
Iter 1000: Saved adapter weights to adapters/adapters.safetensors and adapters/0001000_adapters.safetensors.
Saved final weights to adapters/adapters.safetensors.
```
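A few numbers in the training log can be cross-checked with simple arithmetic. The sketch below recomputes the trainable-parameter percentage and the average number of tokens processed per iteration from the figures printed above. It assumes `Trained Tokens` is a cumulative count and that every batch held exactly 4 examples; the function names are illustrative.

```python
def trainable_percent(trainable_millions: float, total_millions: float) -> float:
    """Share of parameters updated by LoRA, in percent."""
    return 100.0 * trainable_millions / total_millions


def avg_tokens_per_iter(trained_tokens: int, iters: int) -> float:
    """Average tokens consumed per training iteration."""
    return trained_tokens / iters


# Values copied from the log above.
pct = trainable_percent(0.918, 1720.575)
tokens_per_iter = avg_tokens_per_iter(637302, 1000)
tokens_per_example = tokens_per_iter / 4  # assumes --batch-size 4 throughout

print(f"Trainable parameters: {pct:.3f}%")
print(f"Average tokens per iteration: {tokens_per_iter:.1f}")
print(f"Average tokens per example: {tokens_per_example:.1f}")
```

The small trainable share is why the adapter files are tiny compared with the base model: only the LoRA matrices are saved under `adapters/`.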

## Evaluating the LLM

### Testing the fine-tuned LLM using test data:
```
▶ python -m mlx_lm lora \
  --model mlx-community/Qwen3-1.7B-4bit \
  --adapter-path adapters \
  --data data \
  --test
Loading pretrained model
Fetching 9 files: 100%|███████████████████████████████| 9/9 [00:00<00:00, 140329.87it/s]
Loading datasets
Testing
Test loss 0.657, Test ppl 1.929.

```
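The reported perplexity is consistent with taking the exponential of the test loss, assuming the loss is the mean per-token cross-entropy in nats. A minimal sketch of that relationship, using the loss values printed in the logs above:

```python
import math


def perplexity(mean_loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(mean_loss)


test_ppl = perplexity(0.657)       # test loss after fine-tuning
val_ppl_start = perplexity(2.567)  # validation loss at iteration 1
val_ppl_end = perplexity(0.689)    # validation loss at iteration 1000

print(f"Test ppl: {test_ppl:.3f}")
print(f"Val ppl at start: {val_ppl_start:.3f}, at end: {val_ppl_end:.3f}")
```

Because the logged losses are rounded to three decimals, the recomputed perplexities can differ from the tool's own output in the last digit.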



### Testing with this data point from the test set:

```
{"prompt":"What is the average explainability score of creative AI applications in 'Europe' and 'North America' in the 'creative_ai' table? with given SQL schema CREATE TABLE creative_ai (application_id INT, name TEXT, region TEXT, explainability_score FLOAT); INSERT INTO creative_ai (application_id, name, region, explainability_score) VALUES (1, 'ApplicationX', 'Europe', 0.87), (2, 'ApplicationY', 'North America', 0.91), (3, 'ApplicationZ', 'Europe', 0.84), (4, 'ApplicationAA', 'North America', 0.93), (5, 'ApplicationAB', 'Europe', 0.89);","completion":"SELECT AVG(explainability_score) FROM creative_ai WHERE region IN ('Europe', 'North America');"}
```
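The record above uses `prompt` and `completion` keys, with the question and the schema joined by the phrase "with given SQL schema". A sketch of how such a line could be produced from a dataset row is shown below. The input field names `sql_prompt`, `sql_context`, and `sql` are an assumption about the source dataset's columns, and the sample row is a shortened, illustrative version of the record above.

```python
import json


def to_prompt_completion(row: dict) -> dict:
    """Build one training record in prompt/completion form.

    The keys read from `row` are assumed column names, not confirmed ones.
    """
    prompt = f"{row['sql_prompt']} with given SQL schema {row['sql_context']}"
    return {"prompt": prompt, "completion": row["sql"]}


# Shortened, illustrative row (the INSERT statements are omitted).
row = {
    "sql_prompt": "What is the average explainability score of creative AI applications?",
    "sql_context": "CREATE TABLE creative_ai (application_id INT, name TEXT, region TEXT, explainability_score FLOAT);",
    "sql": "SELECT AVG(explainability_score) FROM creative_ai;",
}

line = json.dumps(to_prompt_completion(row))
print(line)  # one JSON object per line, ready to append to a .jsonl file
```

Writing one such line per row into `train.jsonl`, `valid.jsonl`, and `test.jsonl` would give the layout that the `--data` directory is expected to have.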

#### Prompting the original LLM:
```
▶ mlx_lm.generate \
     --model mlx-community/Qwen3-1.7B-4bit \
     --max-tokens 2000 \
     --prompt "Write a SQL query to find the average explainability score of creative AI applications in 'Europe' and 'North America' in the 'creative_ai' table? with given SQL schema CREATE TABLE creative_ai (application_id INT, name TEXT, region TEXT, explainability_score FLOAT); INSERT INTO creative_ai (application_id, name, region, explainability_score) VALUES (1, 'ApplicationX', 'Europe', 0.87), (2, 'ApplicationY', 'North America', 0.91), (3, 'ApplicationZ', 'Europe', 0.84), (4, 'ApplicationAA', 'North America', 0.93), (5, 'ApplicationAB', 'Europe', 0.89);"
Fetching 9 files: 100%|█████████████████████████████████| 9/9 [00:00<00:00, 80487.71it/s]
==========
To calculate the average explainability score for **creative AI applications in Europe** and **North America**, we can use a SQL query that groups the data by region and computes the average for each group.

SQL Query:

SELECT region, AVG(explainability_score) AS average_score
FROM creative_ai
GROUP BY region;


---

Explanation:

- **`SELECT region, AVG(explainability_score) AS average_score`**  
  This selects the region and the average explainability score for each region.

- **`FROM creative_ai`**  
  Specifies the table from which to retrieve data.

- **`GROUP BY region`**  
  Groups the rows by region, ensuring that the average is calculated for each region separately.

---

Output:

The query will return two rows:

| region | average_score |
|--------|---------------|
| Europe | 0.8667       |
| North America | 0.92     |

---

Notes:

- The `AVG(explainability_score)` function calculates the mean score for each region.
- The query assumes that the `region` column contains only values of `'Europe'` and `'North America'`.

Prompt: 163 tokens, 587.505 tokens-per-sec
Generation: 1751 tokens, 74.075 tokens-per-sec
Peak memory: 1.351 GB
```

#### Prompting the original LLM with trained adapters:

```
▶ mlx_lm.generate \
     --model mlx-community/Qwen3-1.7B-4bit \
     --max-tokens 2000 \
     --adapter-path adapters \
     --prompt "Write a SQL query to find the average explainability score of creative AI applications in 'Europe' and 'North America' in the 'creative_ai' table? with given SQL schema CREATE TABLE creative_ai (application_id INT, name TEXT, region TEXT, explainability_score FLOAT); INSERT INTO creative_ai (application_id, name, region, explainability_score) VALUES (1, 'ApplicationX', 'Europe', 0.87), (2, 'ApplicationY', 'North America', 0.91), (3, 'ApplicationZ', 'Europe', 0.84), (4, 'ApplicationAA', 'North America', 0.93), (5, 'ApplicationAB', 'Europe', 0.89);"
Fetching 9 files: 100%|████████████████████████████████| 9/9 [00:00<00:00, 155986.51it/s]


SELECT AVG(explainability_score) FROM creative_ai WHERE region IN ('Europe', 'North America');

Prompt: 163 tokens, 578.914 tokens-per-sec
Generation: 25 tokens, 80.978 tokens-per-sec
Peak memory: 1.295 GB

```
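One way to compare the two answers beyond reading them is to run each query against the schema from the prompt and compare the rows with those of the reference completion. The sketch below does this with Python's built-in `sqlite3` module. It is an illustrative check, not part of `mlx_lm`, and it assumes SQLite's dialect is close enough to the one the dataset targets.

```python
import sqlite3

# Schema and rows copied from the prompt above.
SCHEMA = """
CREATE TABLE creative_ai (application_id INT, name TEXT, region TEXT, explainability_score FLOAT);
INSERT INTO creative_ai (application_id, name, region, explainability_score) VALUES
  (1, 'ApplicationX', 'Europe', 0.87), (2, 'ApplicationY', 'North America', 0.91),
  (3, 'ApplicationZ', 'Europe', 0.84), (4, 'ApplicationAA', 'North America', 0.93),
  (5, 'ApplicationAB', 'Europe', 0.89);
"""

REFERENCE = "SELECT AVG(explainability_score) FROM creative_ai WHERE region IN ('Europe', 'North America');"
BASE_OUTPUT = "SELECT region, AVG(explainability_score) AS average_score FROM creative_ai GROUP BY region;"
TUNED_OUTPUT = "SELECT AVG(explainability_score) FROM creative_ai WHERE region IN ('Europe', 'North America');"


def run_query(sql: str) -> list:
    """Execute `sql` on a fresh in-memory copy of the schema and return sorted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return sorted(conn.execute(sql).fetchall())
    finally:
        conn.close()


def same_result(candidate: str, reference: str) -> bool:
    """True when both queries return the same rows on the sample data."""
    return run_query(candidate) == run_query(reference)


print("fine-tuned matches reference:", same_result(TUNED_OUTPUT, REFERENCE))
print("base model matches reference:", same_result(BASE_OUTPUT, REFERENCE))
```

On this data the base model's query returns one row per region, while the reference and the fine-tuned output return a single overall average, so the comparison separates them. A result-based check like this also accepts queries that are worded differently but equivalent on the sample rows, which plain string matching would reject.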