<a href="https://www.kaggle.com/code/loubl00m/llama-finetuned?scriptVersionId=194573016" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Loading modules

In [1]:
!pip install -U transformers peft bitsandbytes

Collecting transformers
  Downloading transformers-4.44.2-py3-none-any.whl.metadata (43 kB)
Collecting peft
  Downloading peft-0.12.0-py3-none-any.whl.metadata (13 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.43.3-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Downloading transformers-4.44.2-py3-none-any.whl (9.5 MB)
Downloading peft-0.12.0-py3-none-any.whl (296 kB)
Downloading bitsandbytes-0.43.3-py3-none-manylinux_2_24_x86_64.whl (137.5 MB)


In [2]:
import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient

In [3]:
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("hf_token")
login(token = hf_token)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# Loading the model from Hugging Face

In [4]:
torch_dtype = torch.float16
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch_dtype,
    bnb_4bit_use_double_quant = True,
)

# Hugging Face Hub IDs: change these to load a different base model or adapter.
base_path = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model_path = "LouayYahyaoui/llama-edu"

base_model = AutoModelForCausalLM.from_pretrained(
    base_path,
    quantization_config = bnb_config,
    device_map = device,
)
model = PeftModel.from_pretrained(base_model, model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

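To give a feel for what the 4-bit NF4 configuration buys, here is a back-of-the-envelope estimate of weight storage. It is only a sketch under stated assumptions: the parameter count (about 8.03 billion), the quantization block size (64 weights per 32-bit scaling constant) and the double-quantization block size (256 constants) are assumed values, not read from the loaded model. It also treats every parameter as quantized, whereas in practice some layers (for example embeddings) typically stay in higher precision, so the real footprint will be larger.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given bit width per parameter."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 8.03e9         # assumed parameter count for an 8B-class model
QUANT_BLOCK = 64          # assumed number of weights per quantization block
DOUBLE_QUANT_BLOCK = 256  # assumed block size used when quantizing the constants

# Without double quantization: one 32-bit scaling constant per block of weights.
overhead_plain = 32 / QUANT_BLOCK
# With double quantization: 8-bit constants, plus one 32-bit constant per 256 of them.
overhead_double = 8 / QUANT_BLOCK + 32 / (QUANT_BLOCK * DOUBLE_QUANT_BLOCK)

fp16_gib = weight_gib(N_PARAMS, 16)
nf4_gib = weight_gib(N_PARAMS, 4 + overhead_plain)
nf4_dq_gib = weight_gib(N_PARAMS, 4 + overhead_double)

print(f"float16 weights:         {fp16_gib:.1f} GiB")
print(f"NF4 weights:             {nf4_gib:.1f} GiB")
print(f"NF4 + double quant:      {nf4_dq_gib:.1f} GiB")
print(f"Overhead bits per param: {overhead_plain:.3f} -> {overhead_double:.3f}")
```

Under these assumptions, double quantization saves roughly 0.37 bits per parameter, which is small per weight but adds up across billions of parameters.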

# Interacting with the model

In [5]:
def answer(prompt):
    # Accept a plain string, a list of chat messages, or a dict with a "messages" key.
    if isinstance(prompt, str):
        prompt = [
            {'role': 'system',
             'content': 'You are an educational assistant that helps university students.'},
            {'role': 'user', 'content': prompt},
        ]
    elif isinstance(prompt, list):
        prompt = prompt[:2]  # keep only the first two messages
    elif isinstance(prompt, dict):
        prompt = prompt["messages"][:2]
    else:
        raise TypeError("prompt must be a str, a list of messages, or a dict with a 'messages' key")

    # Tokenize with the chat template and move the ids to the model's device.
    input_ids = tokenizer.apply_chat_template(
        prompt, tokenize = True, add_generation_prompt = True, return_tensors = "pt"
    ).to(device)

    # Passing attention_mask and pad_token_id explicitly silences generate() warnings.
    attention_mask = torch.ones_like(input_ids)
    pad_token_id = tokenizer.pad_token_id
    if pad_token_id is None:
        pad_token_id = tokenizer.eos_token_id

    out = model.generate(input_ids, attention_mask = attention_mask, max_new_tokens = 2048,
                         do_sample = True, temperature = 0.6, pad_token_id = pad_token_id)

    # Decode only the newly generated tokens, dropping the prompt.
    response = tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True).strip()
    return response
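
The function above accepts three input shapes: a plain string, a list of chat messages, or a dict with a `messages` key. The sketch below isolates just that normalization step so it can be tried without loading the model. The helper name `normalize_prompt`, the `SYSTEM_PROMPT` constant and the sample `record` are hypothetical and exist only for illustration.

```python
SYSTEM_PROMPT = "You are an educational assistant that helps university students."


def normalize_prompt(prompt):
    """Return a chat-message list from a str, a message list, or a {"messages": [...]} dict."""
    if isinstance(prompt, str):
        return [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ]
    if isinstance(prompt, list):
        return prompt[:2]  # keep only the first two messages
    if isinstance(prompt, dict):
        return prompt["messages"][:2]
    raise TypeError("prompt must be a str, a list of messages, or a dict with a 'messages' key")


# Hypothetical dataset-style record that already contains a reference answer.
record = {
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What is overfitting?"},
        {"role": "assistant", "content": "A reference answer from the dataset."},
    ]
}

print([m["role"] for m in normalize_prompt(record)])
print([m["role"] for m in normalize_prompt("What is overfitting?")])
```

Both calls print `['system', 'user']`. Truncating to the first two messages drops any reference answer in the record, so the model has to generate its own reply. This only holds when the record starts with a system message followed by a user message.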

In [6]:
# Change prompt here
prompt = "I've been fine-tuning a model to improve its advice-giving and summarization capabilities. Could you suggest some best practices or strategies for further enhancing these skills in the model?"
print(f"User prompt: {prompt}\n\nAssistant: {answer(prompt)}")


User prompt: I've been fine-tuning a model to improve its advice-giving and summarization capabilities. Could you suggest some best practices or strategies for further enhancing these skills in the model?

Assistant: You can enhance the model's advice-giving and summarization capabilities by incorporating more diverse training data, fine-tuning on specific domains, and integrating with other AI tools for multimodal processing. Additionally, experimenting with different architectures and techniques such as attention mechanisms and transfer learning can also improve performance. Lastly, testing the model on various tasks and scenarios can provide valuable insights for further refinement.
