# Finetuning your first model

Now that you've created and tested your Docker environments, it is time to finetune your first model.

In this exercise, you will finetune a small model (Phi 1) on fictional data (garage-bAInd/Open-Platypus).

## Goal

Use Axolotl to finetune a small model (Phi 1) on fictional data. To keep the finetuning simple, you will not use Low-Rank Adapters (LoRA) yet.

The compute instances that you will use for this exercise contain only 1 GPU, and you will use only 1 node. To keep the exercise as simple as possible, we will not introduce Distributed Training yet (although you have seen in a previous notebook how to submit the same command to multiple nodes in a cluster).

## Introducing Axolotl

Axolotl is a framework that lets you pre-train and finetune a multitude of models by unifying their configuration in a standardized YAML file.

The workflow for Axolotl is as follows:
- Training: You perform training by running `accelerate launch -m axolotl.cli.train file.yaml`, where `file.yaml` is a YAML file that contains the Axolotl configuration. For more information about the structure and contents of this file, please refer to the Axolotl documentation.
- Inference: Once training finishes (how long it takes depends on the configuration, model size, hardware, and data size), your model checkpoints are stored locally on disk. To perform interactive inference on the newly pre-trained or finetuned model, run `accelerate launch -m axolotl.cli.inference file.yaml --model_dir=model_dir`, where `model_dir` is the directory where Axolotl has saved the checkpoint. This directory is typically named `lora_out` when LoRA is used, or `model-out` when LoRA is not used. You can find the output directory in the YAML file as `output_dir`.
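
As a minimal sketch, the two commands in the workflow above can be assembled programmatically, which is handy when the same command string is later passed to a job submission. The helper name `axolotl_commands` is hypothetical, and the `--model_dir` flag is copied from the text above rather than verified against a specific Axolotl version, so check the CLI of your installed release before relying on it.

```python
import shlex


def axolotl_commands(config_file, model_dir):
    """Return the (train, inference) command strings described above.

    `axolotl_commands` is a hypothetical helper for illustration only.
    The `--model_dir` flag is taken from this notebook's text and is an
    assumption; it may differ between Axolotl versions.
    """
    train = ["accelerate", "launch", "-m", "axolotl.cli.train", config_file]
    inference = [
        "accelerate", "launch", "-m", "axolotl.cli.inference",
        config_file, f"--model_dir={model_dir}",
    ]
    # shlex.join quotes arguments only when they contain unsafe characters.
    return shlex.join(train), shlex.join(inference)


train_cmd, inference_cmd = axolotl_commands("phi-ft.yml", "model-out")
print(train_cmd)
print(inference_cmd)
```

The training string produced here matches the `training_command` used later in this notebook.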

## Finetuning Phi (1) on the Open Platypus dataset, without using PEFT, and without Distributed Training

The Phi family of Small Language Models has been pre-trained on a mixed corpus of filtered, curated web data that has been augmented with synthetic data. The data used for training is predominantly focused on mathematics (data similar to GSM8K+) and on programming (de-duplicated data from TheStack v3). More information about the training workflow for Phi can be found in the technical report "Textbooks Are All You Need".

For our exercise, we will perform the following:
1. We will create a single-instance Standard_NC24ads_A100_v4 Virtual Machine. Feel free to experiment with other VM sizes for your job, or proceed to the next notebook to experiment with finetuning using LoRA / QLoRA.
2. We will store an Axolotl YAML file in this repo, which we will use to configure the finetuning process.
3. We will finetune the model.

> Note: Although you will finetune the model and it will be saved on local disk, the model WILL BE DELETED UPON COMPLETION. In this notebook, you will not interact with the finetuned model.

In [1]:
import azureml.core
workspace = azureml.core.Workspace.from_config()

config = {}
config["compute_size"] = "Standard_NC24ads_A100_v4"
config["compute_target"] = "a100cluster"
config["compute_node_count"] = 1
config["pytorch_configuration"] = {
    "node_count": 1,     # number of computers in the cluster
    "process_count": 1,  # gpus-per-computer * node_count
}
config["training_command"] = "accelerate launch -m axolotl.cli.train phi-ft.yml"
config["experiment"] = "Finetune_phi1"
config["source_directory"] = "src"
config["environment"] = "axolotl_acpt"
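

The `process_count` comment in the cell above (GPUs per computer multiplied by node count) can be written as a small helper, which makes it harder to get the value wrong when you later move to larger VM sizes or more nodes. This is a sketch: the function name `pytorch_process_count` is hypothetical, and the number of GPUs per node is something you must look up for your chosen VM size (1 for the instances used in this exercise).

```python
def pytorch_process_count(node_count, gpus_per_node):
    """Total number of training processes: one per GPU across all nodes.

    Hypothetical helper; `gpus_per_node` must match the chosen VM size.
    """
    if node_count < 1 or gpus_per_node < 1:
        raise ValueError("node_count and gpus_per_node must both be >= 1")
    return node_count * gpus_per_node


# This exercise: 1 node with 1 GPU.
single_node = {
    "node_count": 1,
    "process_count": pytorch_process_count(node_count=1, gpus_per_node=1),
}
print(single_node)
```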

In [2]:
try:
    cluster = azureml.core.compute.ComputeTarget(
        workspace=workspace, 
        name=config['compute_target']
    )
    print('Found existing compute cluster')
except azureml.core.compute_target.ComputeTargetException:
    compute_config = azureml.core.compute.AmlCompute.provisioning_configuration(
        vm_size=config['compute_size'],
        max_nodes=config['compute_node_count']
    )
    cluster = azureml.core.compute.ComputeTarget.create(
        workspace=workspace,
        name=config['compute_target'], 
        provisioning_configuration=compute_config
    )
    
cluster.wait_for_completion(show_output=True)

Found existing compute cluster
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


### Submitting the finetuning request

As with previous notebooks, you will create a new experiment and submit a job. This time, we will use only one of the two Docker images we've created.
We will load the YAML file to extract some properties, which we will add as tags to our job.

In [3]:
import yaml
with open('src/phi-ft.yml', 'r') as f:
    phi_config = yaml.safe_load(f)

In [4]:
phi_config["datasets"][0]["path"]

'garage-bAInd/Open-Platypus'
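
To illustrate the lookup above: once parsed, the configuration is an ordinary nested Python structure in which `datasets` is a list of dictionaries. The sketch below uses a hand-written dictionary in place of the real file, containing only the keys and values that appear in this notebook's output; the helper name `dataset_paths` is hypothetical, and the actual `phi-ft.yml` contains many more settings.

```python
def dataset_paths(axolotl_config):
    """Return the `path` of every entry under the `datasets` key.

    Hypothetical helper; returns an empty list if no datasets are defined.
    """
    return [entry["path"] for entry in axolotl_config.get("datasets", [])]


# Stand-in for the parsed YAML, restricted to values shown in this notebook.
example_config = {
    "datasets": [{"path": "garage-bAInd/Open-Platypus"}],
    "num_epochs": 1,
    "micro_batch_size": 2,
    "sequence_len": 2048,
}

print(dataset_paths(example_config))
```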

In [5]:
experiment = azureml.core.Experiment(workspace, config['experiment'])

distributed_job_config = azureml.core.runconfig.PyTorchConfiguration(**config['pytorch_configuration'])
aml_config = azureml.core.ScriptRunConfig(
    source_directory=config['source_directory'],
    command=config['training_command'],
    environment=azureml.core.Environment.get(workspace, name=config["environment"]),
    compute_target=config['compute_target'],
    distributed_job_config=distributed_job_config,
)
run = experiment.submit(aml_config)
run.set_tags({
    "environment":config["environment"],
    "epochs": phi_config["num_epochs"],
    "micro_batch_size": phi_config["micro_batch_size"],
    "sequence_len": phi_config["sequence_len"],
    "dataset": phi_config["datasets"][0]["path"]
})

print(f"View run details:\n{run.get_portal_url()}")

Converting non-string tag to string: (epochs: 1)
Converting non-string tag to string: (micro_batch_size: 2)
Converting non-string tag to string: (sequence_len: 2048)
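
The three messages above appear because the epoch count, micro batch size, and sequence length are integers in the YAML file, while job tags are stored as strings. One way to make the conversion explicit is to stringify the values before tagging, as in this sketch. The helper name `build_tags` is hypothetical, and the example values are the ones shown in the output above.

```python
def build_tags(environment, axolotl_config):
    """Collect the job tags used in this notebook, with every value as a string.

    Hypothetical helper; the keys read from `axolotl_config` are the ones
    used in the tagging cell above.
    """
    tags = {
        "environment": environment,
        "epochs": axolotl_config["num_epochs"],
        "micro_batch_size": axolotl_config["micro_batch_size"],
        "sequence_len": axolotl_config["sequence_len"],
        "dataset": axolotl_config["datasets"][0]["path"],
    }
    # Convert explicitly so no implicit conversion is needed at tagging time.
    return {key: str(value) for key, value in tags.items()}


tags = build_tags(
    "axolotl_acpt",
    {
        "num_epochs": 1,
        "micro_batch_size": 2,
        "sequence_len": 2048,
        "datasets": [{"path": "garage-bAInd/Open-Platypus"}],
    },
)
print(tags)
```

The resulting dictionary could then be passed to `run.set_tags(...)` in place of the inline dictionary.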


View run details:
https://ml.azure.com/runs/Finetune_phi1_1712567210_561031dc?wsid=/subscriptions/68092087-0161-4fb5-b51d-32f18ac56bf9/resourcegroups/aml-au/workspaces/aml-au&tid=16b3c013-d300-468d-ac64-7eda0820b6d3


The job succeeds after 35 minutes.
![Finetuning Phi1](img/ft_phi1.jpg)