# Fine-tune Llama 2 in Google Colab
> 🗣️ Large language model for question answering

This notebook runs on a T4 GPU.


Summary for Notebook Report:

This Jupyter notebook is a guide to fine-tuning a language model with the Hugging Face Transformers library in Google Colab. The primary objective is to adapt a pre-trained Llama 2 chat model (`NousResearch/Llama-2-7b-chat-hf`) to an instruction dataset with QLoRA, that is, LoRA adapters trained on top of a 4-bit quantized base model. The key sections are:

1. **Package Installation**: The notebook begins by installing necessary Python packages, including `accelerate`, `peft`, `bitsandbytes`, `transformers`, and `trl`, to support the fine-tuning process.

2. **Library Imports**: Essential Python libraries are imported, such as `os`, `torch`, and various Hugging Face Transformers components, which are used throughout the notebook.

3. **Model and Dataset Parameters**: The notebook defines essential parameters, such as the model to be fine-tuned (`model_name`), the dataset to be used for fine-tuning (`dataset_name`), and the name of the new fine-tuned model (`new_model`).

4. **Configuration Parameters**: Various configuration parameters are set for LoRA (e.g., attention dimension), bitsandbytes (4-bit precision settings), TrainingArguments (e.g., batch sizes and learning rate), and SFT (Supervised Fine-Tuning) parameters.

5. **Dataset Loading**: The notebook uses Hugging Face's `datasets` library to load the dataset for supervised fine-tuning.

6. **Model and Tokenizer Setup**: The code loads the base language model and tokenizer, using 4-bit quantization when `use_4bit` is set. It also checks whether the GPU supports bfloat16 (bf16) and, if so, prints a hint to enable `bf16=True`.

7. **Model Training**: An `SFTTrainer` is initialized with the specified parameters and dataset, the model is fine-tuned, and the resulting LoRA adapter is saved under `new_model`.

8. **Text Generation**: The fine-tuned model generates text from user prompts. Several examples demonstrate this, and the responses are scored against reference answers with ROUGE and METEOR.

9. **User Interaction**: Users can actively interact with the fine-tuned model by asking questions and receiving model-generated answers. This demonstrates the practicality of the fine-tuned model.

10. **Unload and Model Sharing**: The notebook concludes by freeing GPU memory, reloading the base model in FP16, merging the LoRA weights into it, and pushing the merged model and its tokenizer to the Hugging Face Hub.

Taken together, the notebook walks through supervised fine-tuning end to end: 4-bit quantized model loading, LoRA adapter training, text generation, ROUGE and METEOR scoring, and publishing the merged model. It is aimed at NLP researchers and developers who want to adapt a pre-trained model to a specific task on a single Colab GPU.
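To make the LoRA settings in item 4 more concrete, here is a minimal sketch of how the rank (`lora_r = 64`) and alpha (`lora_alpha = 16`) used later in the notebook translate into a trainable-parameter count and a scaling factor. The model dimensions (`hidden_size`, `num_layers`, `adapted_per_layer`) are assumptions for illustration and are not stated in the notebook; `lora_param_count` is a hypothetical helper.

```python
# Sketch: count LoRA trainable parameters under assumed model dimensions.
# Assumptions (not stated in the notebook): hidden size 4096, 32 decoder
# layers, and adapters attached to two square projections per layer.

def lora_param_count(r, d_in, d_out):
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

lora_r = 64        # value used in this notebook
lora_alpha = 16    # value used in this notebook

hidden_size = 4096        # assumed
num_layers = 32           # assumed
adapted_per_layer = 2     # assumed

per_adapter = lora_param_count(lora_r, hidden_size, hidden_size)
total_trainable = per_adapter * adapted_per_layer * num_layers
scaling = lora_alpha / lora_r  # the adapter update is multiplied by alpha / r

print(f"Parameters per adapter: {per_adapter:,}")
print(f"Total trainable parameters: {total_trainable:,}")
print(f"LoRA scaling factor (alpha / r): {scaling}")
```

Under these assumptions the adapters add roughly 33.5 million trainable parameters, a small fraction of a 7-billion-parameter base model.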

In [1]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7


In [2]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [3]:
# The model that you want to train from the Hugging Face hub
model_name = "NousResearch/Llama-2-7b-chat-hf"


# The instruction dataset to use
dataset_name = "mlabonne/guanaco-llama2-1k"


# Fine-tuned model name
new_model = "llama-2-7b-miniguanaco"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}
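The training parameters above interact: batch size and gradient accumulation determine the number of optimizer steps, and `warmup_ratio` together with `lr_scheduler_type = "cosine"` determines the learning rate at each step. The sketch below works through that arithmetic in plain Python. It assumes 1,000 training examples on a single GPU, and `lr_at` is a hypothetical helper that mirrors a linear-warmup, cosine-decay schedule; it is an approximation for intuition, not the Trainer's own code.

```python
import math

# Values taken from the parameter cell above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
num_train_epochs = 1
learning_rate = 2e-4
warmup_ratio = 0.03

# Assumptions for illustration: dataset size and GPU count.
num_examples = 1000
num_gpus = 1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    """Linear warmup to learning_rate, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print("Effective batch size:", effective_batch)
print("Total optimizer steps:", total_steps)
print("Warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

If memory is tight, halving `per_device_train_batch_size` while doubling `gradient_accumulation_steps` keeps the effective batch size, and therefore the step count, unchanged.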

In [None]:
# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")
# Get the dataset features (columns)
features = dataset.features

# Print the column names
print("Columns in the dataset:")
for feature_name in features:
    print(feature_name)


Columns in the dataset:
text


In [5]:
# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)


# # Ignore warnings
# logging.set_verbosity(logging.CRITICAL)

from transformers import pipeline

def generate_response(model, tokenizer, prompt):
    # Initialize the text generation pipeline
    pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

    # Customize the prompt format with [INST] and [/INST]
    formatted_prompt = f"<s>[INST] {prompt} [/INST]"

    # Generate text based on the formatted prompt
    result = pipe(formatted_prompt)

    # Extract the generated text from the result
    generated_text = result[0]['generated_text']

    return generated_text

# Example usage:
# Replace 'model' and 'tokenizer' with your actual model and tokenizer instances
# Replace 'prompt' with your specific prompt
response = generate_response(model, tokenizer, "What is love in  pussy")
print(response)


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 145.06 MiB is free. Process 4381 has 14.60 GiB memory in use. Of the allocated memory 13.51 GiB is allocated by PyTorch, and 982.16 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
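The error message above reports a device with 14.75 GiB of total capacity. As a rough back-of-the-envelope check, the memory needed just to hold model weights scales with the parameter count times the bits per parameter. The sketch below assumes a round figure of 7 billion parameters and ignores activations, optimizer state, LoRA adapters, and quantization overhead, so real usage is higher; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

num_params = 7e9  # assumed round figure for a "7b" model

for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(num_params, bits):.2f} GiB")
```

Under these assumptions, float16 weights alone come close to the capacity reported in the error, which is one reason 4-bit loading is attractive on this device. Any other process already holding GPU memory (the message mentions one using 14.60 GiB) leaves even less room, so restarting the runtime before training may help.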

In [None]:
# %load_ext tensorboard
# %tensorboard --logdir results/runs

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


<s>[INST] What is love in  pussy [/INST]  I cannot provide a definition of "love in pussy" as it is not a valid or appropriate term. nobody uses this term, and it is not a recognized medical or psychological concept. Additionally, it is not a respectful or appropriate way to refer to any part of the human body.

It is important to use respectful language when referring to any part of the human body, including the genital area. Using derogatory or offensive terms to refer to any body part is not only disrespectful but also contributes to a culture of objectification and disrespect.

It is essential to treat all body parts with respect and dignity, regardless of their gender or any other characteristic. Using language that is respectful and appropriate can help promote a culture of inclusivity and respect, where everyone can feel comfortable and valued.

In conclusion, "


In [None]:
# Only rouge-score is needed here; the METEOR cell below uses nltk.translate.meteor_score
# and never imports a package named "meteor".
!pip install -q rouge-score


In [None]:
import nltk
nltk.download("wordnet")
nltk.download("punkt")
# No extra pip package is needed for METEOR: the scoring cell below imports
# nltk.translate.meteor_score, which relies on the WordNet data downloaded above.


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [None]:
import nltk
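# Import the class directly so the `rouge_scorer` variable below does not
# shadow the rouge_score.rouge_scorer module when the cell is re-run
from rouge_score.rouge_scorer import RougeScorer
from transformers import pipeline
import statistics

# Create a ROUGE scorer
rouge_scorer = RougeScorer(['rouge1', 'rouge2', 'rougeL'])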

# Define the list of questions, contexts, and reference answers
data = [
    {
        "instruction": "Which database is best for e-commerce applications on AWS?",
        "context": "You are setting up an e-commerce application on the AWS cloud and need a suitable database for this purpose.",
        "response": "Amazon Aurora is a popular choice for high-volume e-commerce sites."
    },
    {
        "instruction": "Which database is best for streaming data on AWS?",
        "context": "You need a database solution for handling streaming data on the AWS cloud.",
        "response": "Amazon Kinesis or Apache Kafka can be suitable options for managing streaming data on AWS."
    },
    {
        "instruction": "Which database is best for multi-tenant applications on AWS?",
        "context": "You're developing multi-tenant applications on AWS and require a database suitable for this purpose.",
        "response": "Amazon RDS and Amazon Aurora can support multi-tenant applications efficiently on AWS."
    },
    {
        "instruction": "Which database is best for multi-cloud data management on AWS?",
        "context": "You need a database solution that supports multi-cloud environments and data management on AWS.",
        "response": "Amazon Redshift offers data warehousing and analytics capabilities for multi-cloud data management on AWS."
    },
    {
        "instruction": "Which ACID relational database on the AWS cloud as a PaaS service is recommended?",
        "context": "You're looking for an ACID-compliant relational database in the AWS cloud provided as a PaaS service.",
        "response": "Amazon Aurora and Amazon RDS are reliable choices for ACID-compliant relational databases in the AWS cloud."
    },
    {
        "instruction": "Which highly scalable relational database with automatic partitioning on the AWS cloud as a managed service is recommended?",
        "context": "You require a highly scalable relational database with automatic partitioning in the AWS cloud provided as a managed service.",
        "response": "Amazon Aurora Serverless and Amazon Redshift are recommended for high scalability and automatic partitioning on the AWS cloud."
    },
    # Add more questions and responses here
]


# Initialize lists to store ROUGE scores
rouge_1_scores = []
rouge_2_scores = []
rouge_l_scores = []

# Function to generate model responses
def generate_response(prompt):
    pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    return result[0]['generated_text']

# Generate model responses and calculate ROUGE scores for each question
for entry in data:
    instruction = entry["instruction"]
    reference_answer = entry["response"]

    # Generate the model response
    model_response = generate_response(instruction)

    # Calculate ROUGE scores
    rouge_scores = rouge_scorer.score(reference_answer, model_response)

    # Extract individual ROUGE scores (ROUGE-1, ROUGE-2, and ROUGE-L)
    rouge_1_score = rouge_scores['rouge1'].fmeasure
    rouge_2_score = rouge_scores['rouge2'].fmeasure
    rouge_l_score = rouge_scores['rougeL'].fmeasure

    # Store ROUGE scores in lists
    rouge_1_scores.append(rouge_1_score)
    rouge_2_scores.append(rouge_2_score)
    rouge_l_scores.append(rouge_l_score)

    # Print the ROUGE scores and responses for each question
    print("Question:", instruction)
    print("Expected Response:", reference_answer)
    print("Generated Response:", model_response)
    print("ROUGE-1 Score:", rouge_1_score)
    print("ROUGE-2 Score:", rouge_2_score)
    print("ROUGE-L Score:", rouge_l_score)
    print()

# Calculate summary statistics
mean_rouge_1 = statistics.mean(rouge_1_scores)
median_rouge_1 = statistics.median(rouge_1_scores)
std_deviation_rouge_1 = statistics.stdev(rouge_1_scores)

mean_rouge_2 = statistics.mean(rouge_2_scores)
median_rouge_2 = statistics.median(rouge_2_scores)
std_deviation_rouge_2 = statistics.stdev(rouge_2_scores)

mean_rouge_l = statistics.mean(rouge_l_scores)
median_rouge_l = statistics.median(rouge_l_scores)
std_deviation_rouge_l = statistics.stdev(rouge_l_scores)

print("Summary Statistics for ROUGE-1:")
print(f"Mean ROUGE-1 Score: {mean_rouge_1}")
print(f"Median ROUGE-1 Score: {median_rouge_1}")
print(f"Standard Deviation ROUGE-1 Score: {std_deviation_rouge_1}")

print("Summary Statistics for ROUGE-2:")
print(f"Mean ROUGE-2 Score: {mean_rouge_2}")
print(f"Median ROUGE-2 Score: {median_rouge_2}")
print(f"Standard Deviation ROUGE-2 Score: {std_deviation_rouge_2}")

print("Summary Statistics for ROUGE-L:")
print(f"Mean ROUGE-L Score: {mean_rouge_l}")
print(f"Median ROUGE-L Score: {median_rouge_l}")
print(f"Standard Deviation ROUGE-L Score: {std_deviation_rouge_l}")


Question: Which database is best for e-commerce applications on AWS?
Expected Response: Amazon Aurora is a popular choice for high-volume e-commerce sites.
Generated Response: <s>[INST] Which database is best for e-commerce applications on AWS? [/INST] Amazon Aurora is a popular choice for e-commerce applications on AWS. It is a fully managed relational database service that is designed for high availability and scalability. It supports a wide range of e-commerce workloads, including transactional and analytical workloads.

Amazon Aurora is a good choice for e-commerce applications because it is highly scalable, secure, and easy to use. It supports a wide range of e-commerce workloads, including transactional and analytical workloads. It also supports a wide range of e-commerce applications, including Magento, Shopify, and WooCommerce.

Amazon Aurora is also highly available, which means that it can handle high traffic and high transaction volumes without downtime. It also supports a w
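To show what the ROUGE-1 F-measure reported above is measuring, here is a minimal pure-Python sketch based on unigram overlap. It uses simple lowercased whitespace tokenization, which is an assumption and differs from the tokenizer in the `rouge_score` package, so its numbers will not match the library exactly. `rouge1_f` is a hypothetical helper.

```python
from collections import Counter

def rouge1_f(reference, candidate):
    """Unigram-overlap F-measure between a reference and a candidate string."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)  # share of candidate tokens that match
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)

reference = "Amazon Aurora is a popular choice"
short_answer = "Amazon Aurora is a good choice for e-commerce"
long_answer = short_answer + " " + "and it scales well " * 10

print(f"Short answer: {rouge1_f(reference, short_answer):.3f}")
print(f"Long answer:  {rouge1_f(reference, long_answer):.3f}")
```

Because precision divides by the candidate length, a long generated answer scores lower than a short one with the same overlap. That is worth keeping in mind when comparing one-sentence references against responses generated with `max_length=200`.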

In [None]:
import nltk
from nltk.translate.meteor_score import meteor_score
from transformers import pipeline
import statistics

# Define the list of questions, contexts, and reference answers
data = [
    {
        "instruction": "Which database is best for e-commerce applications on AWS?",
        "context": "You are setting up an e-commerce application on the AWS cloud and need a suitable database for this purpose.",
        "response": "Amazon Aurora is a popular choice for high-volume e-commerce sites."
    },
    {
        "instruction": "Which database is best for streaming data on AWS?",
        "context": "You need a database solution for handling streaming data on the AWS cloud.",
        "response": "Amazon Kinesis or Apache Kafka can be suitable options for managing streaming data on AWS."
    },
    # Add more questions and responses here
]

# Initialize a list to store METEOR scores
meteor_scores = []

# Function to generate model responses
def generate_response(prompt):
    pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    return result[0]['generated_text']

# Generate model responses and calculate METEOR scores for each question
for entry in data:
    instruction = entry["instruction"]
    reference_answer = entry["response"]

    # Generate the model response
    model_response = generate_response(instruction)

    # Tokenize the model response and reference answer
    model_tokens = nltk.word_tokenize(model_response)
    reference_tokens = nltk.word_tokenize(reference_answer)

    # Calculate METEOR score
    meteor = meteor_score([reference_tokens], model_tokens)

    # Store METEOR score in the list
    meteor_scores.append(meteor)

    # Print the METEOR score and responses for each question
    print("Question:", instruction)
    print("Expected Response:", reference_answer)
    print("Generated Response:", model_response)
    print("METEOR Score:", meteor)
    print()

# Calculate summary statistics
mean_meteor = statistics.mean(meteor_scores)
median_meteor = statistics.median(meteor_scores)
std_deviation_meteor = statistics.stdev(meteor_scores)

print("Summary Statistics for METEOR:")
print(f"Mean METEOR Score: {mean_meteor}")
print(f"Median METEOR Score: {median_meteor}")
print(f"Standard Deviation METEOR Score: {std_deviation_meteor}")


Question: Which database is best for e-commerce applications on AWS?
Expected Response: Amazon Aurora is a popular choice for high-volume e-commerce sites.
Generated Response: <s>[INST] Which database is best for e-commerce applications on AWS? [/INST] Amazon Aurora is a popular choice for e-commerce applications on AWS. It is a fully managed relational database service that is designed for high availability and scalability. It supports a wide range of e-commerce workloads, including transactional and analytical workloads.

Amazon Aurora is a good choice for e-commerce applications because it is highly scalable, secure, and easy to use. It supports a wide range of e-commerce workloads, including transactional and analytical workloads. It also supports a wide range of e-commerce applications, including Magento, Shopify, and WooCommerce.

Amazon Aurora is also highly available, which means that it can handle high traffic and high transaction volumes without downtime. It also supports a w
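The generated responses shown above begin with the prompt itself (`<s>[INST] ... [/INST]`), so the question text is scored along with the answer. One possible way to avoid this, sketched below, is to strip the prompt before computing ROUGE or METEOR. `strip_prompt` is a hypothetical helper and is not part of the notebook.

```python
def strip_prompt(generated_text, prompt):
    """Return only the text that follows the formatted prompt."""
    formatted = f"<s>[INST] {prompt} [/INST]"
    if generated_text.startswith(formatted):
        return generated_text[len(formatted):].strip()
    # Fallback: keep whatever follows the first closing tag, if present.
    marker = "[/INST]"
    if marker in generated_text:
        return generated_text.split(marker, 1)[1].strip()
    return generated_text.strip()

question = "Which database is best for e-commerce applications on AWS?"
raw = f"<s>[INST] {question} [/INST] Amazon Aurora is a popular choice."
print(strip_prompt(raw, question))
```

The `text-generation` pipeline also accepts `return_full_text=False`, which should return only the newly generated text; the helper above is an alternative when the full text has already been collected.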

In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

# Run text generation pipeline with our new model
prompt = "Which document-oriented database with multi-region replication on the AWS cloud as a managed service is recommended"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] Which document-oriented database with multi-region replication on the AWS cloud as a managed service is recommended [/INST] Amazon DynamoDB is a fully managed NoSQL database service that provides multi-region replication.

Amazon DocumentDB is a fully managed document database service that provides multi-region replication.

Amazon Aurora is a fully managed relational database service that provides multi-region replication.

Amazon RDS is a fully managed relational database service that provides multi-region replication.

Amazon Redshift is a fully managed data warehouse service that provides multi-region replication.

Amazon S3 is a fully managed object storage service that provides multi-region replication.

Amazon SNS is a fully managed messaging service that provides multi-region replication.

Amazon SQS is a fully managed message queue


In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

# Create a text generation pipeline with the model
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

while True:
    # Get user input
    user_input = input("Ask a question (or type 'exit' to quit): ")

    if user_input.lower() == 'exit':
        break

    # Generate a response based on the user's question
    prompt = f"[INST] {user_input} [/INST]"
    result = pipe(prompt)

    # Print the generated answer
    generated_text = result[0]['generated_text']
    print("Answer:", generated_text)


In [None]:
# Empty VRAM
del model
del pipe
del trainer
import gc
gc.collect()
gc.collect()

19965

In [None]:
# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
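Conceptually, merging folds each low-rank adapter back into its base weight matrix so that no separate adapter is needed at inference time. The toy sketch below illustrates the standard LoRA formulation, W' = W + (alpha / r) · B · A, with plain Python lists. It is an illustration under that assumption and not the code path that `merge_and_unload()` actually runs; `matmul` and `merge_lora` are hypothetical helpers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A) for one weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy example: a 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # shape r x d_in  = 1 x 2
lora_b = [[0.5], [1.0]]  # shape d_out x r = 2 x 1

merged = merge_lora(weight, lora_a, lora_b, alpha=2, r=1)
print(merged)
```

After merging, the result has the same shape as the original weight, which is why the merged model can be saved and pushed like an ordinary checkpoint.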


In [None]:
!huggingface-cli login

model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)


    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/mlabonne/llama-2-7b-miniguanaco/commit/c81a32fd0b4d39e252326e639d63e75aa68c9a4a', commit_message='Upload tokenizer', commit_description='', oid='c81a32fd0b4d39e252326e639d63e75aa68c9a4a', pr_url=None, pr_revision=None, pr_num=None)