In [1]:
from IPython.core.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
    horizontal-align: middle;
}
h1 {
    text-align: center;
    border-style: solid;
    border-width: 3px;
    background-color: #9467bd;
    padding: 20px;
    margin: 0;
    color: black;
    font-family: ariel;
    border-radius: 80px;
    border-color: #ff7f00;
}

h2 {
    text-align: center;
    border-style: solid;
    border-width: 3px;
    background-color: #de9ed6;
    padding: 12px;
    margin: 0;
    color: black;
    font-family: ariel;
    border-radius: 80px;
    border-color: #800080;
}

h3 {
    text-align: center;
    border-style: solid;
    border-width: 3px;
    background-color: #756bb1;
    padding: 12px;
    margin: 0;
    color: black;
    font-family: ariel;
    border-radius: 80px;
    border-color: #393b79;
}

body, p {
    font-family: ariel;
    font-size: 18px;
    color: black;
}
div {
    font-size: 14px;
    margin: 0;

}

h4 {
    padding: 0px;
    margin: 0;
    font-family: ariel;
    color: purple;
}

</style>
""")

# Inference

Now, let's serve the food which we already [cooked](https://www.kaggle.com/code/adityadawn/training-llm-using-accelerate-and-w-b).

## Installing required Libraries to load a trained LLM

In [2]:
!pip install peft==0.4.0
!pip install torch==2.0.0
!pip install datasets==2.1.0
!pip install bitsandbytes==0.41.1
!pip install transformers==4.30.2
!pip install transformers[torch]
!pip install transformers[sentencepiece]

!pip install huggingface_hub==0.16.4

from IPython.display import clear_output
clear_output()

## Importing Required Libraries and Modules

In [3]:
import torch
from peft import PeftModel
from peft import LoraConfig
from datasets import Dataset
from dataclasses import dataclass
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import BitsAndBytesConfig

import gc
import pandas as pd
from tqdm import tqdm
import huggingface_hub
from string import Template
from kaggle_secrets import UserSecretsClient



## Using Secret Ingredients 🤫

In [4]:
# Fetch Secrets
user_secrets=UserSecretsClient()

# Fetch secret keys
my_secret_hf_api_key=user_secrets.get_secret("hf_model_read")

# Use the secret keys to login
huggingface_hub.login(token=my_secret_hf_api_key, add_to_git_credential=False)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


## Helper Functions

### Unlocking BNB's Hidden Potential: Mastering Quantization Configuration

**Configuring the BNB Spice Blend**: Imagine crafting the perfect curry. You need the right mix of fiery chilies, aromatic spices, and creamy coconut milk. This function plays the chef, carefully preparing the "Bits & Bytes" configuration, ensuring your PEFT model has the right balance of precision and efficiency – a recipe for AI success!

In [5]:
def config_bnb(Config_BNB):
    """Creates a BitsAndBytesConfig object with settings from a configuration object.

   Args:
   - Config_BNB (object): A configuration object containing BNB-related settings.

   Returns:
   - BitsAndBytesConfig: A configured BitsAndBytesConfig object.

   Raises:
   - Exception: If any errors occur during configuration.
   """
    try:
        return BitsAndBytesConfig(
                    load_in_4bit=Config_BNB.LOAD_IN_4BIT,
                    bnb_4bit_quant_type=Config_BNB.BNB_4BIT_QUANT_TYPE,
                    bnb_4bit_compute_dtype=Config_BNB.BNB_4BIT_COMPUTE_DTYPE,
                    bnb_4bit_use_double_quant=Config_BNB.BNB_4BIT_USE_DOUBLE_QUANT,
                    load_in_8bit_fp32_cpu_offload=Config_BNB.LOAD_IN_8BIT_FP32_CPU_OFFLOAD
                )
    except Exception as e:
        raise(e)

### Charting the Course: Unveiling the Recipe Behind a PEFT Masterpiece

**Unveiling the Model's Recipe Book**: Every great dish starts with a plan. This function serves as your trusty sous-chef, pulling out the precise "model configuration" – the secret ingredients and instructions that tell your PEFT model what to do. With this knowledge, you can adjust flavors and cook up the perfect AI dish.

In [6]:
def get_model_config(peft_model_id: str):
    """Retrieves the model configuration for a given PEFT model ID.

    Args:
    - peft_model_id (str): The ID of the PEFT model to load.

    Returns:
    - LoraConfig: The loaded model configuration.

    Raises:
    - ValueError: If the model configuration cannot be loaded from the pretrained model.
    """
    return LoraConfig.from_pretrained(peft_model_id)

### From Blueprint to Brushstrokes: Assembling the Model from Configuration Blocks

**Summoning the AI Master Chef**: Picture the sizzle and steam as you assemble a magnificent paella. This function takes the model configuration and brings it to life, conjuring a complete PEFT model ready to work its magic. Imagine the possibilities – creating your own AI masterpiece!

In [6]:
def get_model(peft_model_id: str, model_config):
    """Loads a PEFT model with quantization configuration.

    Args:
    - peft_model_id (str): The ID of the PEFT model to load.
    - model_config (LoraConfig): The configuration of the model.

    Returns:
    - PeftModel: The loaded PEFT model.
    """
    model = AutoModelForCausalLM.from_pretrained(
                model_config.base_model_name_or_path,
                return_dict=True,
                quantization_config=config_bnb(Config_BNB),
                trust_remote_code=True,
                load_in_4bit=True,
                device_map={"":0},
            )
    model = PeftModel.from_pretrained(
                model, 
                peft_model_id, 
                device_map={"":0}
            )
    return model

### Unlocking the Code of Communication: Translating Human Words into AI Language

**Chopping the Words for AI Appetizers**: Communication is key in any kitchen, and AI needs its own language. This function acts as the "word chopper," transforming human speech into bite-sized "tokens" that the PEFT model can understand. It's the essential garnish that lets your AI feast on your linguistic offerings.

In [6]:
def get_tokenizer(model_config):
    """
    Loads the tokenizer associated with a model configuration.

    Args:
    - model_config (LoraConfig): The configuration of the model.

    Returns:
    - AutoTokenizer: The loaded tokenizer.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_config.base_model_name_or_path)
    tokenizer.pad_token = tokenizer.eos_token
    return tokenizer

### From Chaos to Clarity: Crafting a Brain-Teasing Prompt for AI

**Setting the AI Dinner Table**: Before the main course, you arrange the plates and utensils. This function preps the test "prompt," meticulously setting the stage for the PEFT model to show off its skills. It's the elegant presentation that makes your AI's performance truly shine.

In [7]:
def prepare_final_prompt_test(
        row_data: pd.core.series.Series,
        instruction: str='As your response to the prompt, select the most appropriate option among A, B, C, D, and E.'
    ):
    """
    Prepares a final prompt for testing a model's ability to select the correct answer from multiple choices.

    Args:
    - row_data (pd.core.series.Series): A pandas Series containing the prompt, options (A, B, C, D, E), and answer.
    - instruction (str, optional): An instruction to present to the model before the prompt. Defaults to 'As your response to the prompt, select the most appropriate option among A, B, C, D, and E.'.

    Returns:
    - str: The formatted prompt ready for model input.

    Raises:
    - Exception: If any errors occur during template substitution.
    """
        try:
            template = Template(
                            '### Instruction: \n \
                             $instruction \n\n   \
                             ### Prompt \n       \
                             $prompt \n\n        \
                             A) $a \n            \
                             B) $b \n            \
                             C) $c \n            \
                             D) $d \n            \
                             E) $e \n\n          \
                             ### Answer'
                        )
            final_prompt = template.substitute(
                                instruction = instruction,
                                prompt = row_data['prompt'],
                                a = row_data['A'],
                                b = row_data['B'],
                                c = row_data['C'],
                                d = row_data['D'],
                                e = row_data['E']
                            )
            
            del row_data
            gc.collect()
            
            return final_prompt
        except Exception as e:
            raise(e)

### Beyond Multiple Choice: Uncovering the AI's Choice with Confidence

**Extracting the AI's Inner Critic**: After a delicious meal, you ask for feedback. This function acts as the AI's "inner critic," peeking into its neural network and extracting the chosen answer, along with its confidence level. It's the secret sauce that lets you understand what went into the AI's culinary creation.

In [8]:
def get_ans(
        text,
        tokenizer,
        model
    ):
    """
    Predicts the answer to a prompt with multiple choices using a PEFT model.

    Args:
    - text (str): The prompt text, including multiple choices.
    - tokenizer (AutoTokenizer): The tokenizer to process the text.
    - model (PeftModel): The PEFT model to generate predictions.

    Returns:
    - str: The predicted answer (A, B, C, D, or E).

    Raises:
    - ValueError: If the model's prediction is invalid (e.g., outside the expected answer choices).
    """
    inputs = tokenizer(
                text, 
                return_tensors='pt'
            )
    logits = model(
                input_ids=inputs['input_ids'].cuda(), 
                attention_mask=inputs['attention_mask'].cuda()
            ).logits[0, -1]
    
    options_list = [
                        (logits[tokenizer('A').input_ids[-1]], 'A'), 
                        (logits[tokenizer('B').input_ids[-1]], 'B'), 
                        (logits[tokenizer('C').input_ids[-1]], 'C'), 
                        (logits[tokenizer('D').input_ids[-1]], 'D'), 
                        (logits[tokenizer('E').input_ids[-1]], 'E')
                    ] 
    options_list = sorted(options_list, reverse=True)
    ans = options_list[1]
        
    return ans

### From Prompt to Prophecy: Unveiling the Future with a PEFT's Predictions

**Predicting a Banquet of Solutions**: Imagine having a crystal ball that shows every possible dessert course. This function taps into the PEFT model's predictive power, generating an entire banquet of "predictions" for a whole dataset. It's like seeing the future of AI-powered solutions unfold before your eyes.

In [9]:
def get_predictions(
        test_dataset,
        tokenizer,
        model
    ):
    """
    Generates predictions for a test dataset using a PEFT model.

    Args:
    - test_dataset (Iterable[dict]): An iterable of dictionaries, each containing a 'final_prompt' key.
    - tokenizer (AutoTokenizer): The tokenizer to process the text.
    - model (PeftModel): The PEFT model to generate predictions.

    Returns:
    - Tuple[List[int], List[str]]: A tuple containing two lists:
        - ids (List[int]): A list of IDs corresponding to each data point in the test dataset.
        - predictions (List[str]): A list of predicted answers (A, B, C, D, or E) for each prompt.

    Raises:
        ValueError: If any errors occur during prediction generation.
    """
    id = []
    prediction = []

    for idx, text in tqdm(
                        enumerate(test_dataset),
                        total = len(test_dataset),
                        desc = 'Obtainineg Predictions'
                    ):
        prediction.append(
                            get_ans(
                                text['final_prompt'],
                                tokenizer,
                                model
                            )[1]
                         )
        id.append(idx)
    
    return id, prediction

### From Training Ground to Battlefield: Launching a PEFT's Inference Mission

**The Grand AI Buffet Opens**: It's finally time to indulge! This function throws open the doors, unleashing the PEFT model on real-world data. It's the grand reveal, the moment you witness the culmination of all your efforts – your AI creation, ready to serve up a feast of insights and solutions.

In [10]:
def inference(
        peft_model_id: str,
        test_csv_path: str
    ):
    """Performs inference using a PEFT model on a test dataset.

    Args:
    - peft_model_id (str): The ID of the PEFT model to use.
    - test_csv_path (str): The path to the CSV file containing test data.

    Returns:
    - Tuple[List[int], List[str]]: A tuple containing:
        - ids (List[int]): A list of IDs corresponding to each data point in the test dataset.
        - predictions (List[str]): A list of predicted answers for each prompt.
    """

    # Obtaining trained model and tokenizer
    model_config = get_model_config(peft_model_id)
    model = get_model(peft_model_id, model_config)
    tokenizer = get_tokenizer(model_config)
    
    # Prepraring Test dataset
    test_df = pd.read_csv(TEST_CSV_PATH)
    test_df['final_prompt'] = test_df.apply(prepare_final_prompt_test, axis=1)
    test_ds = Dataset.from_pandas(pd.DataFrame(test_df['final_prompt']))
    
    ids, predictions = get_predictions(
                            test_ds,
                            tokenizer,
                            model
                        )
    
    return ids, predictions

## Finally, let's prepare the table

In [11]:
PEFT_MODEL_ID = '/kaggle/input/kaggle-scienc-llm-model/kaggle/working/Kaggle-Science-LLM'
TEST_CSV_PATH = '/kaggle/input/kaggle-llm-science-exam/test.csv'

@dataclass(frozen=True)
class Config_BNB:
    LOAD_BIT: int=4
    LOAD_IN_4BIT: bool=True
    BNB_4BIT_QUANT_TYPE: str="nf4"
    BNB_4BIT_USE_DOUBLE_QUANT: bool=True
    BNB_4BIT_COMPUTE_DTYPE=torch.bfloat16
    LOAD_IN_8BIT_FP32_CPU_OFFLOAD: bool=True

## Let's serve the food

In [12]:
id, prediction = inference(PEFT_MODEL_ID, TEST_CSV_PATH)

prediction_df = pd.DataFrame(
                    data={
                        'id':id,
                        'prediction':prediction
                    }
                )

Downloading config.json:   0%|          | 0.00/609 [00:00<?, ?B/s]

Downloading (…)fetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

Downloading tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

Obtainineg Predictions: 100%|██████████| 200/200 [03:41<00:00,  1.11s/it]


## And ... Bon Appétit!

In [13]:
prediction_df

Unnamed: 0,id,prediction
0,0,D
1,1,C
2,2,D
3,3,D
4,4,D
...,...,...
195,195,B
196,196,D
197,197,D
198,198,B


Special Mentions:
1. [Falcon-7b LLM (QLORA)W&B](https://www.kaggle.com/code/mayank00rastogi/falcon-7b-llm-qlora-w-b)
2. [Falcon-7b🚀(Inferencing🤖)](https://www.kaggle.com/code/mayank00rastogi/falcon-7b-inferencing)

#### Endnote: This has been a wonderful journey. I had a great time learning all of these things! The main challenges were to make this work and also explain each step. In fact, at the end, I still think, 'I could have done that differently!', 'Maybe the writings are too illustrative', etc. So please, if you think I have made any mistakes or could have approached a specific step in the entire project differently, do let me know in the comments.

# Please upvote if you like!