<a href="https://colab.research.google.com/github/BohdanPetryshyn/code-llama-fim-fine-tuning/blob/main/code_llama_fim_fine_tuning_inference_and_merging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Code Llama FIM fine-tuning inference and merging

If you find a problem with this notebook, please report it as an issue in the original GitHub [repo](https://github.com/BohdanPetryshyn/code-llama-fim-fine-tuning?tab=readme-ov-file).

## Install dependencies


In [None]:
!git clone https://github.com/BohdanPetryshyn/code-llama-fim-fine-tuning.git repo

In [None]:
!pip install ninja
!ninja --version

In [None]:
%cd repo
!pip install -r requirements.txt

## Load base Code Llama model

In [None]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer
)

base_model_id = "codellama/CodeLlama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

In [None]:
import torch

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=None,
    device_map=None,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

model.cuda()

## Load fine-tuned adapter

In [None]:
from peft import PeftModel

# Replace with your adapter ID
adapter_id = "BohdanPetryshyn/codellama-7b-openapi-completion-ctx-lvl-fim-05-spm"
# None selects the latest revision
revision = None
model = PeftModel.from_pretrained(model, adapter_id, revision=revision, adapter_name="my-adapter")
model.set_adapter("my-adapter")

## Inference test

In [7]:
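def get_completion(prefix, suffix, prompt=None):
    """Generate the missing middle between prefix and suffix.

    Returns the raw output token IDs and the decoded text (special tokens kept).
    """
    if prompt is None:
        # Fill-in-the-middle prompt: prefix, then suffix, then the <MID> marker
        # after which the model generates the middle part
        prompt = f"""<PRE> {prefix} <SUF>{suffix} <MID>"""
    if not isinstance(prompt, list):
        # Accept either a prompt string or a pre-tokenized list of token IDs
        prompt = tokenizer(prompt).input_ids

    model.eval()
    outputs = model.generate(
        input_ids=torch.tensor([prompt]).cuda(),
        max_new_tokens=256,
    )
    return (outputs, tokenizer.batch_decode(outputs, skip_special_tokens=False)[0])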

In [None]:
prefix = """
  /special-events:
    post:
      summary: Create special events
      description: Creates a new special event for the museum.
      operationId: createSpecialEvent
      tags:
        - Events
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateSpecialEventRequest'
            examples:
              default_example:
                $ref: '#/components/examples/CreateSpecialEventRequestExample'
      responses:
        '200':
          description: Success
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SpecialEventResponse'
              examples:
                default_example:
                  $ref: '#/components/examples/CreateSpecialEventResponseExample'
        '400':
          description: Bad request
        '404':
          description: Not found
    get:
      summary: List special events
      description: Return a list of upcoming special events at the museum.
      operationId: listSpecialEvents
      tags:
        - Events
      parameters:"""

suffix = """  /special-events/{eventId}:
    get:
      summary: Get special event
      description: Get details about a special event.
      operationId: getSpecialEvent
      tags:
        - Events
      parameters:
        - $ref: '#/components/parameters/EventId'
      responses:
        '200':
          description: Success
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SpecialEventResponse'
              examples:
                default_example:
                  $ref: '#/components/examples/GetSpecialEventResponseExample'
        '400':
          description: Bad request
        '404':
          description: Not found"""

tokens, result = get_completion(prefix, suffix)

print(result)
print([tokenizer.decode(token) for token in tokens[0]])
print(tokens[0].tolist())
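
The decoded `result` above contains the whole prompt followed by the generated middle. A minimal sketch for pulling out only the generated part is shown below. `extract_middle` is a hypothetical helper, not part of the original repository, and it assumes the decoded text contains the literal `<MID>` and `<EOT>` markers used in this notebook. The sample string is made up for illustration; the exact whitespace around special tokens in real output may differ.

```python
def extract_middle(decoded: str, mid_token: str = "<MID>", eot_token: str = "<EOT>") -> str:
    """Return the text generated after the <MID> marker, up to <EOT> if present."""
    # Everything up to the first <MID> marker is the prompt (prefix and suffix).
    _, found, completion = decoded.partition(mid_token)
    if not found:
        raise ValueError(f"{mid_token!r} not found in decoded output")
    # Generation may hit max_new_tokens before <EOT> is produced;
    # in that case the whole remainder is returned.
    middle, _, _ = completion.partition(eot_token)
    return middle


# Made-up decoded string, only to illustrate the call
sample = "<s> <PRE> a: 1 <SUF>c: 3 <MID> b: 2 <EOT></s>"
print(repr(extract_middle(sample)))
```

In the notebook this would be called as `extract_middle(result)` after the inference cell.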

## Merge and upload the model

In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:
# Replace with your desired merged model ID
merged_model_id = "code-llama-openapi-completion"

model = model.merge_and_unload()
model.save_pretrained(merged_model_id)
model.push_to_hub(merged_model_id)

### Upload Code Llama tokenizer to the model repository

To make the merged model fully functional within the HF Inference Endpoints platform, we have to add a tokenizer to the repository. For some reason, the following approach results in the model not generating the `<EOT>` token during inference:

```
tokenizer.push_to_hub(merged_model_id)
```

Instead, we copy the original Code Llama tokenizer files manually using the git CLI.

In [None]:
# Replace with your Hugging Face account name
full_merged_model_id = f"BohdanPetryshyn/{merged_model_id}"

%cd /content

# Clone the merged repo, skipping large (LFS) files
!export GIT_LFS_SKIP_SMUDGE=1 && git clone https://huggingface.co/{full_merged_model_id} merged_model

# Clone the original Code Llama repo to copy the tokenizer files
!export GIT_LFS_SKIP_SMUDGE=1 && git clone https://huggingface.co/codellama/CodeLlama-7b-hf original_code_llama
!cd original_code_llama && git lfs pull --include "tokenizer*"

%cd merged_model

!cp ../original_code_llama/special_tokens_map.json .
!cp ../original_code_llama/tokenizer* .

!git config --global user.email "code-llama-fim-fine-tuning@colab.research.google.com"
!git config --global user.name "Code Llama FIM Fine Tuning by Bohdan Petryshyn"
!git add .
!git commit -m "Add tokenizer from the original Code Llama model"
!git push origin main
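
As a possible alternative to the git commands above, the same files could be copied with the `huggingface_hub` Python API. This is a sketch under assumptions, not the method used by the original notebook: `is_tokenizer_file` and `copy_tokenizer_files` are hypothetical helper names, and it has not been verified here that this route preserves the `<EOT>` behavior, although it uploads the same files as the git approach.

```python
def is_tokenizer_file(filename: str) -> bool:
    """Match the files copied by the git cell: tokenizer* and special_tokens_map.json."""
    return filename.startswith("tokenizer") or filename == "special_tokens_map.json"


def copy_tokenizer_files(source_repo: str, target_repo: str) -> list[str]:
    """Download tokenizer files from source_repo and upload them to target_repo."""
    # Imported lazily so the filename filter works without the library installed.
    from huggingface_hub import HfApi, hf_hub_download

    api = HfApi()
    copied = []
    for filename in api.list_repo_files(source_repo):
        if not is_tokenizer_file(filename):
            continue
        # Download one file into the local cache, then upload it unchanged
        local_path = hf_hub_download(repo_id=source_repo, filename=filename)
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=filename,
            repo_id=target_repo,
            commit_message=f"Add {filename} from {source_repo}",
        )
        copied.append(filename)
    return copied


# Example call (requires notebook_login() with write access to the target repo):
# copy_tokenizer_files("codellama/CodeLlama-7b-hf", full_merged_model_id)
```

Each file is uploaded as its own commit, which differs from the single commit produced by the git cell.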