# Finetuning your first model

Now that you've created your Azure curated environment and tested it, it is time to finetune your first model.

In this exercise, you will finetune a small model (Phi-1) on a fictional dataset (garage-bAInd/Open-Platypus). You will also integrate Axolotl with AzureML MLFlow, so that you can retrieve logged metrics from Azure, and you will save the output of the finetuning run as part of the Azure experiment.

## Goal

Use Axolotl to finetune a small model on fictional data. To make the finetuning easier, you will not use Low Rank Adapters yet.

The compute instance that you will use for this exercise contains only 1 GPU, and you will use only 1 node. To keep the exercise as simple as possible, we will not introduce Distributed Training yet (although you have seen in a previous notebook how to submit the same command to multiple nodes in a cluster). Future notebooks will demonstrate how to run a distributed GPU job.

## Introducing Axolotl

Axolotl is a framework that makes it easy to pre-train and finetune a multitude of models by unifying their configuration in a standardized YAML file.

The workflow for Axolotl is as follows:
- Training: You perform training by running `accelerate launch -m axolotl.cli.train file.yaml`, where `file.yaml` is a YAML file that contains the Axolotl configuration. For more information about the structure and contents of this file, please refer to the Axolotl documentation.
- Inference: Once your model is done training (how long this takes depends on the configuration, model size, hardware size, and data size), your model checkpoints will be stored locally on disk. To perform interactive inference on the newly pre-trained / finetuned model, run `accelerate launch -m axolotl.cli.inference file.yaml --model_dir=model_dir`, where `model_dir` is the directory where Axolotl has saved the checkpoint. This directory is typically named `lora_out` when LoRA is used, or `model-out` when LoRA is not used. You can find the output directory in the YAML file as `output_dir`. Since you are submitting a job to AML, you will not interact with the finetuned model immediately once the job finishes.

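The two commands in the workflow above can be assembled programmatically, which helps when the same YAML file is reused across training and inference. The sketch below is only an illustration: the helper name `axolotl_command` is hypothetical, and the `--model_dir` flag is copied from the inference command quoted above rather than verified against a specific Axolotl release.

```python
import shlex


def axolotl_command(stage, config_file, model_dir=None):
    """Build the `accelerate launch` command line for an Axolotl stage.

    `stage` is either "train" or "inference" (hypothetical helper, not part of Axolotl).
    """
    if stage not in ("train", "inference"):
        raise ValueError(f"unknown stage: {stage!r}")
    parts = ["accelerate", "launch", "-m", f"axolotl.cli.{stage}", config_file]
    # Assumption: the flag name mirrors the inference command quoted in the text;
    # check the Axolotl documentation for the flag used by your version.
    if stage == "inference" and model_dir is not None:
        parts.append(f"--model_dir={model_dir}")
    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(parts)


train_cmd = axolotl_command("train", "phi-ft.yml")
infer_cmd = axolotl_command("inference", "phi-ft.yml", model_dir="./outputs")
print(train_cmd)
print(infer_cmd)
```

Building the command from a list and quoting it at the end avoids surprises when a config path contains spaces.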
## Finetuning Phi-1 on the Open Platypus dataset, without PEFT and without Distributed Training

The Phi family of Small Language Models has been pre-trained on a mixed corpus of filtered, curated web data that has been augmented with synthetic data. The training data focuses predominantly on mathematics (data similar to GSM8K) and on programming (de-duplicated data from TheStack v3). More information about the training workflow for Phi can be found in the technical report "Textbooks Are All You Need".

For our exercise, we will perform the following:
1. We will create a single-instance Standard_NC24ads_A100_v4 Virtual Machine. Please experiment with other VM sizes for your job, or alternatively proceed to the next notebook to experiment with finetuning using LoRA / QLoRA.
2. We will retrieve information about our workspace, more specifically our AzureML MLFlow URI, which we will insert into the YAML configuration file. This way, we will be able to retrieve the learning metrics from the AzureML interface.
3. We will save the model's outputs to the folder `./outputs`, so that the finetuned model weights are saved permanently with the job submitted to AML.
4. We will store an Axolotl YAML file in this repo, which we will use to configure the finetuning process. We will modify this file using the information from steps 2 and 3.
5. We will finetune the model.

### 1. Creating a target compute cluster to run our experiments

In [1]:
import azureml.core
workspace = azureml.core.Workspace.from_config()

config = {}
config["compute_size"] = "Standard_NC24ads_A100_v4"
config["compute_target"] = "a100cluster"
config["compute_node_count"] = 1
config["pytorch_configuration"] = {
    "node_count": 1, # num of computers in cluster
    "process_count": 1} # gpus-per-computer * node_count
config["training_command"] = "accelerate launch -m axolotl.cli.train phi-ft-modified.yml"  # the config file written in steps 3 and 4
config["experiment"] = "Finetune_phi1"
config["source_directory"] = "src"
config["environment"] = "axolotl_acpt"

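The comments in `pytorch_configuration` state that `process_count` should equal the number of GPUs per node multiplied by `node_count`. A minimal sketch of that check is shown below; the helper names are hypothetical, and the GPU count per node is passed in by hand (1 for the VM size used here, as stated earlier in this notebook) rather than looked up from Azure.

```python
def expected_process_count(gpus_per_node, node_count):
    """Total number of training processes: one per GPU across all nodes."""
    return gpus_per_node * node_count


def check_pytorch_configuration(pytorch_configuration, gpus_per_node):
    """Raise if process_count does not match gpus_per_node * node_count."""
    expected = expected_process_count(
        gpus_per_node, pytorch_configuration["node_count"]
    )
    actual = pytorch_configuration["process_count"]
    if actual != expected:
        raise ValueError(
            f"process_count is {actual}, expected {expected} "
            f"({gpus_per_node} GPU(s) x {pytorch_configuration['node_count']} node(s))"
        )
    return actual


# Same values as the configuration cell above: 1 node with 1 GPU.
pytorch_configuration = {"node_count": 1, "process_count": 1}
checked = check_pytorch_configuration(pytorch_configuration, gpus_per_node=1)
print(f"process_count OK: {checked}")
```

Running a check like this before submitting the job catches a mismatched `process_count` locally instead of after the cluster has been provisioned.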
In [2]:
try:
    cluster = azureml.core.compute.ComputeTarget(
        workspace=workspace, 
        name=config['compute_target']
    )
    print('Found existing compute cluster')
except azureml.core.compute_target.ComputeTargetException:
    compute_config = azureml.core.compute.AmlCompute.provisioning_configuration(
        vm_size=config['compute_size'],
        max_nodes=config['compute_node_count']
    )
    cluster = azureml.core.compute.ComputeTarget.create(
        workspace=workspace,
        name=config['compute_target'], 
        provisioning_configuration=compute_config
    )
    
cluster.wait_for_completion(show_output=True)

InProgress..
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


### 2. Retrieve your MLFlow URI

To properly track your experiments, you will need to leverage MLFlow. MLFlow provides model tracking and registration, so that you can use tracked models in downstream processes.

In [1]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
mlflow_tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
print(mlflow_tracking_uri)

Found the config file in: /config.json


azureml://australiaeast.api.azureml.ms/mlflow/v1.0/subscriptions/68092087-0161-4fb5-b51d-32f18ac56bf9/resourceGroups/aml-au/providers/Microsoft.MachineLearningServices/workspaces/aml-au

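The tracking URI printed above encodes the region, subscription, resource group, and workspace. If you want to confirm that the URI points at the workspace you expect before writing it into the Axolotl configuration, it can be split into its parts. The sketch below assumes the URI layout shown in the output above; the function name and the placeholder subscription, resource group, and workspace values are hypothetical.

```python
from urllib.parse import urlparse


def parse_azureml_tracking_uri(uri):
    """Split an azureml:// MLFlow tracking URI into its components.

    Assumes the layout printed above:
    azureml://<region>.api.azureml.ms/mlflow/v1.0/subscriptions/<id>/
    resourceGroups/<group>/providers/<provider>/workspaces/<workspace>
    """
    parsed = urlparse(uri)
    if parsed.scheme != "azureml":
        raise ValueError(f"not an azureml tracking URI: {uri!r}")
    segments = [segment for segment in parsed.path.split("/") if segment]
    lowered = [segment.lower() for segment in segments]

    def value_after(key):
        # Each value directly follows its key segment in the path.
        return segments[lowered.index(key.lower()) + 1]

    return {
        # Assumption: the first label of the host name is the region.
        "region": parsed.netloc.split(".")[0],
        "subscription_id": value_after("subscriptions"),
        "resource_group": value_after("resourceGroups"),
        "workspace": value_after("workspaces"),
    }


# Placeholder values, not a real subscription or workspace.
example_uri = (
    "azureml://australiaeast.api.azureml.ms/mlflow/v1.0"
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg"
    "/providers/Microsoft.MachineLearningServices/workspaces/my-ws"
)
uri_parts = parse_azureml_tracking_uri(example_uri)
print(uri_parts)
```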


### 3 and 4. Load the Phi-1 Axolotl YML file

In the code snippet below, you will:

1. Load an existing YML file that ships with Axolotl. For other examples, please review their repository.
2. Use the MLFlow URI from above to insert new settings in the loaded dictionary.
3. Change the `output_dir` value to `./outputs`. This folder on AzureML enables the persistence of output data.
4. Save the configuration to a new file called `phi-ft-modified.yml`.


In [4]:
import yaml
with open('src/phi-ft.yml', 'r') as f:
    phi_ft_config = yaml.safe_load(f)

phi_ft_config["mlflow_tracking_uri"] = mlflow_tracking_uri
phi_ft_config["hf_mlflow_log_artifacts"] = False
phi_ft_config["mlflow_experiment_name"] = config["experiment"]
phi_ft_config["output_dir"] = "./outputs"

with open('src/phi-ft-modified.yml', 'w') as f:
    yaml.dump(phi_ft_config, f)

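The overrides applied in the cell above can also be expressed as a small pure function, which makes it easy to check that nothing required is missing before the job is submitted. This is a sketch under stated assumptions: the function names and the `REQUIRED_KEYS` tuple are hypothetical, the base dictionary is a stand-in for the loaded YML file, and the tracking URI is a placeholder.

```python
import copy

# Keys this notebook relies on for AzureML tracking and output persistence.
REQUIRED_KEYS = ("mlflow_tracking_uri", "mlflow_experiment_name", "output_dir")


def with_azureml_overrides(base_config, tracking_uri, experiment_name,
                           output_dir="./outputs"):
    """Return a copy of the Axolotl config with the AzureML settings applied."""
    cfg = copy.deepcopy(base_config)  # leave the loaded config untouched
    cfg.update({
        "mlflow_tracking_uri": tracking_uri,
        "hf_mlflow_log_artifacts": False,
        "mlflow_experiment_name": experiment_name,
        "output_dir": output_dir,
    })
    return cfg


def missing_keys(cfg):
    """List the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not cfg.get(key)]


# Stand-in for the dictionary loaded from the YML file.
base = {"output_dir": "./model-out", "num_epochs": 1}
modified = with_azureml_overrides(
    base,
    tracking_uri="azureml://example-placeholder",  # placeholder, not a real URI
    experiment_name="Finetune_phi1",
)
print(missing_keys(base))      # keys still to be set on the unmodified config
print(missing_keys(modified))  # empty when all overrides are present
```

Returning a copy keeps the originally loaded configuration available for comparison against the modified one.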
### 5. Submitting the finetuning request

As with previous notebooks, you will create a new experiment and submit a job. This time, you will use only one of the two Docker images we've created.
We will also read some properties from the loaded YAML configuration and add them as tags to the job.

In [5]:
experiment = azureml.core.Experiment(workspace, config['experiment'])

distributed_job_config = azureml.core.runconfig.PyTorchConfiguration(**config['pytorch_configuration'])
aml_config = azureml.core.ScriptRunConfig(
    source_directory=config['source_directory'],
    command=config['training_command'],
    environment=azureml.core.Environment.get(workspace, name=config["environment"]),
    compute_target=config['compute_target'],
    distributed_job_config=distributed_job_config,
)
run = experiment.submit(aml_config)
run.set_tags({
    "environment":config["environment"],
    "epochs": phi_ft_config["num_epochs"],
    "micro_batch_size": phi_ft_config["micro_batch_size"],
    "sequence_len": phi_ft_config["sequence_len"],
    "dataset": phi_ft_config["datasets"][0]["path"]
})

print(f"View run details:\n{run.get_portal_url()}")

Converting non-string tag to string: (epochs: 1)
Converting non-string tag to string: (micro_batch_size: 2)
Converting non-string tag to string: (sequence_len: 2048)


View run details:
https://ml.azure.com/runs/Finetune_phi1_1712713399_fc32d6dd?wsid=/subscriptions/68092087-0161-4fb5-b51d-32f18ac56bf9/resourcegroups/aml-au/workspaces/aml-au&tid=16b3c013-d300-468d-ac64-7eda0820b6d3

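The "Converting non-string tag to string" messages in the output above appear because some tag values are integers. One way to avoid them is to convert every value to a string before the tags are attached to the run. The helper name below is hypothetical, and the values are the ones shown in the output above.

```python
def stringify_tags(tags):
    """Convert every tag key and value to a string."""
    return {str(key): str(value) for key, value in tags.items()}


tags = stringify_tags({
    "environment": "axolotl_acpt",
    "epochs": 1,
    "micro_batch_size": 2,
    "sequence_len": 2048,
    "dataset": "garage-bAInd/Open-Platypus",
})
print(tags)
```

The resulting dictionary can be passed to `run.set_tags(...)` in place of the raw values.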

The job succeeds after around 35 minutes. Notice the standard output displayed in the right pane, and the contents of the `./outputs` directory, which show the persisted model checkpoints.
![Finetuning Phi1](img/phi_ft_1.png)

The integration with MLFlow can also be seen in the `metrics` tab:
![MLFlow integration](img/phi_ft_2.png)
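
The logged metrics can also be read back programmatically rather than through the portal. The sketch below is a rough outline, assuming that the `mlflow` and `azureml-mlflow` packages are installed, that your MLFlow version supports the `experiment_names` argument of `mlflow.search_runs`, and that you are authenticated against the workspace. The function names are hypothetical, and the example call is left commented out because it needs a live workspace.

```python
def metric_column_names(columns):
    """Keep only the columns that hold logged metrics ("metrics.<name>")."""
    return [column for column in columns if column.startswith("metrics.")]


def fetch_run_metrics(tracking_uri, experiment_name):
    """Return a DataFrame with one row per run and its logged metrics."""
    # Imported lazily so this sketch can be loaded without mlflow installed.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    runs = mlflow.search_runs(experiment_names=[experiment_name])
    return runs[["run_id", "status"] + metric_column_names(runs.columns)]


# Example call (requires access to the workspace):
# metrics = fetch_run_metrics(mlflow_tracking_uri, "Finetune_phi1")
# print(metrics.head())
```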