In [None]:
#This notebook is by Anastasia Ruzmaikina for Kaggle Competition LLM Science Exam.

Inspired by the OpenBookQA dataset, this competition challenges participants to answer difficult science-based questions written by a Large Language Model.

Your work will help researchers better understand the ability of LLMs to test themselves, and the potential of LLMs that can be run in resource-constrained environments.

Context

As the scope of large language model capabilities expands, a growing area of research is using LLMs to characterize themselves. Because many preexisting NLP benchmarks have been shown to be trivial for state-of-the-art models, there has also been interesting work showing that LLMs can be used to create more challenging tasks to test ever more powerful models.

At the same time methods like quantization and knowledge distillation are being used to effectively shrink language models and run them on more modest hardware. The Kaggle environment provides a unique lens to study this as submissions are subject to both GPU and time limits.

The dataset for this challenge was generated by giving GPT-3.5 snippets of text on a range of scientific topics pulled from Wikipedia, and asking it to write a multiple choice question (with a known answer), then filtering out easy questions.

Right now we estimate that the largest models run on Kaggle are around 10 billion parameters, whereas GPT-3.5 clocks in at 175 billion parameters. If a question-answering model can ace a test written by a question-writing model more than 10 times its size, this would be a genuinely interesting result; on the other hand, if a larger model can effectively stump a smaller one, this has compelling implications for the ability of LLMs to benchmark and test themselves.

This notebook started with adapting Phil Culliton's notebook 'Fine-Tuning with Llama 2, Bits and Bytes, and QLoRA', which was written to answer questions from Jeopardy.
However, in LLM Science Exam the questions are multiple choice, so answering them correctly required prompt engineering. In addition, DeBERTa-V3 and Llama-2-7b did not work adequately, so Llama-3-8B-Instruct was used.
In this version of the notebook, Internet access must be turned on.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
# So far it is just returning back the answers
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session


/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/model-00003-of-00004.safetensors
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/config.json
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/model-00001-of-00004.safetensors
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/tokenizer.json
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/tokenizer_config.json
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/model-00004-of-00004.safetensors
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/special_tokens_map.json
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/model-00002-of-00004.safetensors
/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct/generation_config.json
/kaggle/input/einops/einops-0.8.0-py3-none-any.whl
/kaggle/input/200000-jeopardy-questions/JEOPARDY_CSV.csv
/kaggle/input/jepardy/jeopardy.csv
/kaggle/input/jepardy/This is Jeopardy_Solution.ipynb
/kaggl

QLoRA: Quantized Low Rank Adapters - this is a method for fine-tuning LLMs that uses a small number of quantized, updateable parameters to limit the complexity of training. This technique also allows those small sets of parameters to be added efficiently into the model itself, which means you can do fine-tuning on lots of data sets, potentially, and swap these "adapters" into your model when necessary.
Bits and Bytes: An excellent package by Tim Dettmers et al., which provides a lightweight wrapper around custom CUDA functions that make LLMs go faster - optimizers, matrix mults, and quantization. In this notebook we'll be using the library to load our model as efficiently as possible.
PEFT: An excellent Huggingface library that enables a number of Parameter Efficient Fine-tuning (PEFT) methods, which again make it less expensive to fine-tune LLMs - especially on more lightweight hardware like that present in Kaggle notebooks.
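To make the "adapter" idea above concrete, here is a minimal NumPy sketch of a low-rank update on a toy weight matrix. The sizes and the value of `alpha` are hypothetical, and this only illustrates the arithmetic; it is not how PEFT stores or applies adapters internally.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 16, 2   # toy sizes; real layers are far larger
alpha = 4                   # LoRA scaling numerator (hypothetical value)

W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # trainable down-projection
B = rng.normal(size=(d_out, r))     # trainable up-projection (zero-initialised in practice)
x = rng.normal(size=d_in)

scale = alpha / r

# Adapter kept separate: base path plus low-rank path
y_separate = W @ x + scale * (B @ (A @ x))

# Adapter merged into the weight matrix once, then used like a normal layer
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

full_params = W.size               # parameters in the frozen weight
adapter_params = A.size + B.size   # parameters that would actually be trained
print(np.allclose(y_separate, y_merged), full_params, adapter_params)
```

Both paths give the same output, which is why an adapter can either be swapped in at run time or folded into the base weights.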


In [2]:
#!pip install -qqq bitsandbytes==0.39.0
!pip install -qqq  /kaggle/input/llama3-1-dependencies/dependencies/torch-2.4.0-cp310-cp310-manylinux1_x86_64.whl      #torch   #==2.0.1
#!pip install -qqq -U git+https://github.com/huggingface/transformers.git@e03a9cc
#!pip install -qqq -U git+https://github.com/huggingface/peft.git@42a184f
#!pip install -qqq -U git+https://github.com/huggingface/accelerate.git@c9fbb71
#import torch
#torch.cuda.empty_cache()

!pip install  datasets    #==2.12.0
!pip install  loralib     #==0.1.1
!pip install einops  #-qqq /kaggle/input/d/parsahriri/einops/einops-0.8.0-py3-none-any.whl         #einops     #==0.6.1
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_nccl_cu12-2.20.5-py3-none-manylinux2014_x86_64.whl
!pip install -qqq /kaggle/input/llama3-1-dependencies/dependencies/nvidia_nvjitlink_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl

Collecting loralib
  Downloading loralib-0.1.2-py3-none-any.whl.metadata (15 kB)
Downloading loralib-0.1.2-py3-none-any.whl (10 kB)
Installing collected packages: loralib
Successfully installed loralib-0.1.2
Collecting einops
  Downloading einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Downloading einops-0.8.0-py3-none-any.whl (43 kB)
Installing collected packages: einops
Successfully installed einops-0.8.0


In [3]:
# Install package for inferences
!pip install -qq --no-deps /kaggle/input/daigt-pip/peft-0.6.0-py3-none-any.whl --use-deprecated=legacy-resolver
!pip install -qq --no-deps /kaggle/input/daigt-pip/transformers-4.35.0-py3-none-any.whl --use-deprecated=legacy-resolver
!pip install -qq --no-deps /kaggle/input/daigt-pip/tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl --use-deprecated=legacy-resolver
#!pip install -qq --no-deps /kaggle/input/accelerate-and-bitsandbytes/accelerate-0.29.3-py3-none-any.whl --use-deprecated=legacy-resolver
#!pip install -qq --no-deps /kaggle/input/accelerate-and-bitsandbytes/bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl --use-deprecated=legacy-resolver
!pip install -qq --no-deps /kaggle/input/daigt-pip/optimum-1.14.0-py3-none-any.whl --use-deprecated=legacy-resolver
!pip install -qq --no-deps /kaggle/input/llama3-1-dependencies/dependencies/bitsandbytes-0.43.2-py3-none-manylinux_2_24_x86_64.whl
!pip install -qq --no-deps /kaggle/input/llama3-1-dependencies/dependencies/accelerate-0.33.0-py3-none-any.whl

In [4]:
!pip install -qq --no-deps /kaggle/input/daigt-pip/peft-0.6.0-py3-none-any.whl --use-deprecated=legacy-resolver
#!pip install -qq --no-deps /kaggle/input/daigt-pip/transformers-4.35.0-py3-none-any.whl 
#!pip install -qq --no-deps /kaggle/input/daigt-pip/tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
!pip install -qq --no-deps /kaggle/input/daigt-pip/optimum-1.14.0-py3-none-any.whl --use-deprecated=legacy-resolver
#!pip install -qq --no-deps /kaggle/input/llm-detect-pip/accelerate-0.24.1-py3-none-any.whl
#!pip install -qq --no-deps /kaggle/input/llm-detect-pip/bitsandbytes-0.41.1-py3-none-any.whl --use-deprecated=legacy-resolver
#!pip install -qq --no-deps /kaggle/input/bitsandbytes-0-42-0 --use-deprecated=legacy-resolver
#!pip install -qq --no-deps /kaggle/input/bitsandbytes-0-42-0/bitsandbytes-0.42.0-py3-none-any.whl --use-deprecated=legacy-resolver

In [5]:
import pandas as pd
import json
import os
from pprint import pprint
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset, Dataset
from huggingface_hub import notebook_login

from peft import LoraConfig, PeftConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"





We're going to use the Llama-3-8B-Instruct model for our test. We'll be using Bits and Bytes to load it in 4-bit format, which should reduce memory consumption considerably, at a cost of some accuracy.

Note the parameters in BitsAndBytesConfig - this is a fairly standard 4-bit quantization configuration: the weights are loaded in 4-bit NormalFloat (nf4) format, and double quantization additionally quantizes the quantization constants to save a little more memory. For computation, the 4-bit weights are dequantized to bfloat16 (bnb_4bit_compute_dtype); the stored base weights stay in 4-bit form.
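As a rough illustration of what block-wise 4-bit quantization does, the sketch below quantizes a toy weight vector with one scale per block and then dequantizes it. It is deliberately simplified: it uses evenly spaced integer levels rather than the NF4 codebook, keeps the scales in full precision (no double quantization), and the function names and block size are assumptions for this example, not the bitsandbytes API.

```python
import numpy as np

def quantize_4bit_absmax(weights, block_size=64):
    """Quantize a 1-D float array block by block to integers in [-7, 7]."""
    blocks = weights.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)  # one scale per block
    scales[scales == 0] = 1.0                           # avoid division by zero
    codes = np.round(blocks / scales * 7).astype(np.int8)
    return codes, scales

def dequantize_4bit_absmax(codes, scales):
    """Map the integer codes back to approximate float values."""
    return (codes.astype(np.float32) / 7) * scales

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=4096).astype(np.float32)  # toy weights

codes, scales = quantize_4bit_absmax(w)
w_hat = dequantize_4bit_absmax(codes, scales).reshape(-1)
max_err = np.abs(w - w_hat).max()
print("largest reconstruction error:", max_err)
```

The reconstruction error per weight is bounded by half a quantization step of its block, which is the "cost of some accuracy" mentioned above.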

In [6]:
#!pip install accelerate
#!pip install -i https://test.pypi.org/simple/ bitsandbytes    #bitsandbytes


In [7]:
#!pip install accelerate
#!pip install bitsandbytes

In [8]:
# Alternative model paths (commented out):
#   /kaggle/input/llama3-70b-instruct-fp8-2xh100-trtllm-engine
#   /kaggle/input/llama2-7b-hf/Llama2-7b-hf
#   /kaggle/input/llama2-7b-hf
#   /kaggle/input/llama-7b-chat-jax
#   /kaggle/input/llama-2/pytorch/13b-chat-hf/1
MODEL_NAME = '/kaggle/input/meta-llama3-8b/Meta-Llama-3-8B-Instruct'

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="cuda:0",
    trust_remote_code=True,
    quantization_config=bnb_config
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Below, we'll use a nice PEFT wrapper to set up our model for training / fine-tuning. Specifically this function sets the output embedding layer to allow gradient updates, as well as performing some type casting on various components to ensure the model is ready to be updated.

In [9]:
model = prepare_model_for_kbit_training(model)


Below, we define some helper functions - their purpose is to properly identify our update layers so we can... update them!

In [10]:
import re

def get_num_layers(model):
    """Return the largest integer found in any parameter name (the last layer's index)."""
    numbers = set()
    for name, _ in model.named_parameters():
        for number in re.findall(r'\d+', name):
            numbers.add(int(number))
    return max(numbers)

def get_last_layer_linears(model):
    """Collect the names of all linear modules whose name contains the last layer index."""
    names = []
    num_layers = get_num_layers(model)
    for name, module in model.named_modules():
        if str(num_layers) in name and "encoder" not in name:
            if isinstance(module, torch.nn.Linear):
                names.append(name)
    return names
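To see what the name matching in these helpers selects, the following sketch runs the same logic on a short list of hypothetical module names (a made-up 4-layer decoder). It skips the `isinstance(module, torch.nn.Linear)` check, so it only illustrates the string matching, not the full helper.

```python
import re

# Hypothetical module names in the style of a small 4-layer decoder
module_names = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.3.self_attn.q_proj",
    "model.layers.3.mlp.down_proj",
    "model.norm",
    "lm_head",
]

def highest_index(names):
    """Largest integer appearing in any of the names."""
    return max(int(n) for name in names for n in re.findall(r"\d+", name))

def names_in_last_layer(names):
    """Names that contain the highest index as a substring."""
    last = str(highest_index(names))
    return [name for name in names if last in name and "encoder" not in name]

selected = names_in_last_layer(module_names)
print(selected)
```

Because this is a plain substring test, an index such as "3" would also match names containing "13" or "30" in a deeper model; matching on the highest index avoids that in practice, but a stricter pattern may be safer.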


LORA config
Some key elements from this configuration:

r is the width (rank) of the small update layer. In theory, this should be set wide enough to capture the complexity of the problem you're attempting to fine-tune for. Simpler problems may be able to get away with a smaller r. In our case, we'll go very small, largely for the sake of speed.
target_modules is set using our helper functions - every layer identified by that function will be included in the PEFT update.
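A quick way to get a feel for r is to count how many trainable parameters it adds. The sketch below assumes each targeted linear layer of shape (in_features, out_features) gains r * (in_features + out_features) adapter parameters. The layer shapes listed are assumed values for one Llama-3-8B decoder layer, not read from the checkpoint, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(shapes, r):
    """Trainable LoRA parameters for linear layers given as (in_features, out_features)."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)

# Assumed projection shapes for one Llama-3-8B decoder layer
last_layer_shapes = [
    (4096, 4096),   # q_proj
    (4096, 1024),   # k_proj
    (4096, 1024),   # v_proj
    (4096, 4096),   # o_proj
    (4096, 14336),  # gate_proj
    (4096, 14336),  # up_proj
    (14336, 4096),  # down_proj
]

# The count grows linearly with r
for r in (4, 16, 64):
    print(r, lora_param_count(last_layer_shapes, r))
```

Under these assumptions, r=4 on a single layer trains a few hundred thousand parameters, a tiny fraction of the 8 billion in the base model.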

In [11]:
config = LoraConfig(
    r=4,
    lora_alpha=32,
    target_modules=get_last_layer_linears(model),
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)


In [12]:
train_df = pd.read_csv('/kaggle/input/kaggle-llm-science-exam/train.csv')
test_df = pd.read_csv('/kaggle/input/kaggle-llm-science-exam/test.csv')
df = train_df.drop(['id','prompt','A','B','C','D','E','answer'], axis=1)
df["Question"] = "Prompt: " + train_df['prompt'] + '; A: ' + train_df.A + '; B: ' + train_df.B + '; C: ' + train_df.C + '; D: ' + train_df.D + '; E: ' + train_df.E +';' 
df['Answer'] =train_df.answer
data = Dataset.from_pandas(df)
prompt = df["Question"].values[0] + ". Answer as briefly as possible: ".strip()
prompt
df

Unnamed: 0,Question,Answer
0,Prompt: Which of the following statements accu...,D
1,Prompt: Which of the following is an accurate ...,A
2,Prompt: Which of the following statements accu...,A
3,Prompt: What is the significance of regulariza...,C
4,Prompt: Which of the following statements accu...,D
...,...,...
195,Prompt: What is the relation between the three...,C
196,"Prompt: What is the throttling process, and wh...",B
197,Prompt: What happens to excess base metal as a...,B
198,"Prompt: What is the relationship between mass,...",D


In [13]:
#df = pd.read_csv("/kaggle/input/200000-jeopardy-questions/JEOPARDY_CSV.csv", nrows=1000)
#df.columns = [str(q).strip() for q in df.columns]

#data = Dataset.from_pandas(df)
df["Question"].values[0:5]

array(['Prompt: Which of the following statements accurately describes the impact of Modified Newtonian Dynamics (MOND) on the observed "missing baryonic mass" discrepancy in galaxy clusters?; A: MOND is a theory that reduces the observed missing baryonic mass in galaxy clusters by postulating the existence of a new form of matter called "fuzzy dark matter."; B: MOND is a theory that increases the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 20.; C: MOND is a theory that explains the missing baryonic mass in galaxy clusters that was previously considered dark matter by demonstrating that the mass is in the form of neutrinos and axions.; D: MOND is a theory that reduces the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 2.; E: MOND is a theory that eliminates the

In [14]:
prompt = df["Question"].values[0] + ". Answer as briefly as possible: ".strip()
prompt

'Prompt: Which of the following statements accurately describes the impact of Modified Newtonian Dynamics (MOND) on the observed "missing baryonic mass" discrepancy in galaxy clusters?; A: MOND is a theory that reduces the observed missing baryonic mass in galaxy clusters by postulating the existence of a new form of matter called "fuzzy dark matter."; B: MOND is a theory that increases the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 20.; C: MOND is a theory that explains the missing baryonic mass in galaxy clusters that was previously considered dark matter by demonstrating that the mass is in the form of neutrinos and axions.; D: MOND is a theory that reduces the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 2.; E: MOND is a theory that eliminates the observ

Below we're setting up our generative model:

Top P: a method for sampling from the smallest set of most probable tokens whose cumulative probability reaches p, as opposed to greedily just taking the highest.
Temperature: a scaling applied to the logits before the softmax; values below 1 sharpen the output distribution and values above 1 flatten it.
We limit the return sequences to 1 - only one answer is allowed! - and deliberately force the answer to be short.
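The effect of these two settings can be sketched with NumPy on a handful of made-up logits. This is an illustrative approximation of temperature scaling and top-p (nucleus) filtering, not the exact code path used by `model.generate`; the function names and logits are hypothetical.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()

def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p, then renormalise."""
    order = np.argsort(probs)[::-1]          # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    keep = min(int(np.searchsorted(cumulative, top_p)) + 1, len(probs))
    filtered = np.zeros_like(probs)
    filtered[order[:keep]] = probs[order[:keep]]
    return filtered / filtered.sum()

logits = [1.0, 0.8, 0.2, -1.0, -2.0]  # hypothetical scores for five tokens
probs = softmax_with_temperature(logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.7)

# Sample one token index from the filtered distribution
rng = np.random.default_rng(0)
token = rng.choice(len(nucleus), p=nucleus)
print(probs.round(3), nucleus.round(3), token)
```

With these numbers only the two most probable tokens survive the filter, so the sampled token is always one of them.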

In [15]:
generation_config = model.generation_config
generation_config.max_new_tokens = 10
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [16]:
device = "cuda"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(
        input_ids = encoding.input_ids,
        attention_mask = encoding.attention_mask,
        generation_config = generation_config
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))




Prompt: Which of the following statements accurately describes the impact of Modified Newtonian Dynamics (MOND) on the observed "missing baryonic mass" discrepancy in galaxy clusters?; A: MOND is a theory that reduces the observed missing baryonic mass in galaxy clusters by postulating the existence of a new form of matter called "fuzzy dark matter."; B: MOND is a theory that increases the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 20.; C: MOND is a theory that explains the missing baryonic mass in galaxy clusters that was previously considered dark matter by demonstrating that the mass is in the form of neutrinos and axions.; D: MOND is a theory that reduces the discrepancy between the observed missing baryonic mass in galaxy clusters and the measured velocity dispersions from a factor of around 10 to a factor of about 2.; E: MOND is a theory that eliminates the observe

In [17]:
def generate_prompt(data_point):
    return f"""
            Classify the correct answer to the Question into one of the categories: A, B, C, D, E.
            Question: Prompt: What is the capital of France; + A: Moscow; + B: Bohn; + C: Paris; + D: London; + E: Zurich;
            Answer: C.
            {data_point["Question"]}. 
            Answer as briefly as possible: {data_point["Answer"]}
            """.strip()


def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt

data = data.shuffle().map(generate_and_tokenize_prompt)


Map:   0%|          | 0/200 [00:00<?, ? examples/s]

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


In [18]:
device_name = 'cuda:0' if torch.cuda.is_available() else 'cpu'
device = torch.device(device_name)
device

device(type='cuda', index=0)

Train!
Now, we'll use our data to update our model. Using the Huggingface transformers library, let's set up our training loop and then run it. Note that we are making only two passes (num_train_epochs=2) over this data.

In [19]:
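import os
import torch

# Make GPU 0 the current device
torch.cuda.set_device(0)
torch.cuda.current_device()

# Note: CUDA has already been initialised at this point, so these
# environment variables may not take effect in the current session.
os.environ['CUDA_DEVICE_ORDER']='PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES']='0'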




In [20]:
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    learning_rate=1e-5,
    fp16=True,
    output_dir="finetune_jeopardy",
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    report_to="none"
)
device_name = 'cuda:0' if torch.cuda.is_available() else 'cpu'
device = torch.device(device_name)
#device_map={'':torch.cuda.current_device()}  #` or `device_map={'':torch.xpu.current_device()}`"
peft_model = model.to(device)
trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=data,
    args=training_args,
    #torch.device == device(type='cuda', index=0),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False
trainer.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=100, training_loss=1.7260977172851562, metrics={'train_runtime': 1118.5605, 'train_samples_per_second': 0.358, 'train_steps_per_second': 0.089, 'total_flos': 4828188880060416.0, 'train_loss': 1.7260977172851562, 'epoch': 2.0})
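The global_step=100 reported above can be checked against the training arguments with a little arithmetic. The helper below is a hypothetical approximation of how the number of optimizer steps follows from the dataset size, batch size, gradient accumulation and epoch count; it assumes a single device and the 200 training examples shown in the map output.

```python
import math

def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps, epochs, num_devices=1):
    """Approximate number of optimizer updates for a full training run."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer update happens every grad_accum_steps batches
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return steps_per_epoch * epochs

steps = optimizer_steps(
    num_examples=200,
    per_device_batch_size=1,
    grad_accum_steps=4,
    epochs=2,
)
effective_batch_size = 1 * 4  # per-device batch size times accumulation steps
print(steps, effective_batch_size)
```

So each update averages gradients over four questions, and two epochs over 200 questions yield the 100 steps seen in the output.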

In [21]:
model.save_pretrained("trained-model")

PEFT_MODEL = "/kaggle/working/trained-model"

config = PeftConfig.from_pretrained(PEFT_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="cuda:0",  #auto
    trust_remote_code=True
)

tokenizer=AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

model = PeftModel.from_pretrained(model, PEFT_MODEL)


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [22]:
generation_config = model.generation_config
generation_config.max_new_tokens = 10
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id


In [23]:
import numpy as np
test_df["Question"] = "Prompt: " + test_df['prompt'] + '; A: ' + test_df.A + '; B: ' + test_df.B + '; C: ' + test_df.C + '; D: ' + test_df.D + '; E: ' + test_df.E +';' 


In [24]:
import string

def replace_lowercase_with_spaces(text):
    """Drop every lowercase character (despite the name, nothing is inserted in its place)."""
    return "".join(char for char in text if not char.islower())

def replace_punctuation_with_spaces(text):
    """Remove punctuation characters and spaces from a given text."""
    # Map each punctuation character to a space, then strip out all spaces
    translator = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    return text.translate(translator).replace(" ", "")


In [25]:
# Use a string column so letter answers can be assigned without a dtype warning
test_df['score'] = ""
device = "cuda"
few_shot = "Choose a correct multiple choice answer to a question. For example, Prompt: What is the capital of France?  A: Lisbon; B: Bohn; C: Paris; D: Munich; E: Zurich; Answer: C"
for i in range(len(test_df)):
    prompt = few_shot + test_df.Question[i] + "Answer as briefly as possible:"
    n = len(prompt)  # character offset where the generated text starts
    encoding = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model.generate(
            input_ids = encoding.input_ids,
            attention_mask = encoding.attention_mask,
            generation_config = generation_config
        )
    ans = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep the first three generated characters, then drop colons,
    # lowercase letters, punctuation and spaces to isolate the answer letter
    ans1 = str(ans[n:n+3]).replace(':', '')
    ans1 = replace_lowercase_with_spaces(ans1)
    ans1 = replace_punctuation_with_spaces(ans1)
    test_df.loc[i,'score'] = ans1.strip()

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
ans= tokenizer.decode(outputs[0], skip_special_tokens=True)
str(ans[n:n+3]).replace(':', '').strip()




Choose a correct multiple choice answer to a question. For example, Prompt: What is the capital of France?  A: Lisbon; B: Bohn; C: Paris; D: Munich; E: Zurich; Answer: CPrompt: What did Arthur Eddington discover about two of Einstein's types of gravitational waves?; A: Arthur Eddington showed that two of Einstein's types of waves were artifacts of the coordinate system he used, and could only be made to propagate at the speed of gravity by choosing appropriate coordinates.; B: Arthur Eddington showed that two of Einstein's types of waves were artifacts of the coordinate system he used, and could only be made to propagate at the speed of sound by choosing appropriate coordinates.; C: Arthur Eddington showed that two of Einstein's types of waves were artifacts of the coordinate system he used, and could be made to propagate at any speed by choosing appropriate coordinates.; D: Arthur Eddington showed that two of Einstein's types of waves were artifacts of the coordinate system he used, a

'A'
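Slicing the decoded text by character offset works only if the decoded output reproduces the prompt exactly. A slightly more forgiving alternative, sketched below, searches the continuation for the first standalone letter A to E. The function `extract_choice` is a hypothetical helper, not part of the notebook's pipeline, and it can still be fooled by a continuation that starts with the article "A".

```python
import re

def extract_choice(generated_text, prompt, valid="ABCDE"):
    """Return the first standalone answer letter after the prompt, or '' if none is found."""
    # Drop the echoed prompt when the decoded text starts with it
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    # A valid letter counts only when it is not part of a longer word
    match = re.search(rf"\b([{valid}])\b", continuation)
    return match.group(1) if match else ""

print(extract_choice("Question? Answer: D", "Question? Answer:"))
print(extract_choice("xyz (B) is right", "xyz"))
print(repr(extract_choice("abc not sure", "abc")))
```

An empty string signals that no letter was found, which makes such rows easy to spot before writing the submission file.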

In [26]:
test_df

Unnamed: 0,id,prompt,A,B,C,D,E,Question,score
0,0,Which of the following statements accurately d...,MOND is a theory that reduces the observed mis...,MOND is a theory that increases the discrepanc...,MOND is a theory that explains the missing bar...,MOND is a theory that reduces the discrepancy ...,MOND is a theory that eliminates the observed ...,Prompt: Which of the following statements accu...,D
1,1,Which of the following is an accurate definiti...,Dynamic scaling refers to the evolution of sel...,Dynamic scaling refers to the non-evolution of...,Dynamic scaling refers to the evolution of sel...,Dynamic scaling refers to the non-evolution of...,Dynamic scaling refers to the evolution of sel...,Prompt: Which of the following is an accurate ...,A
2,2,Which of the following statements accurately d...,The triskeles symbol was reconstructed as a fe...,The triskeles symbol is a representation of th...,The triskeles symbol is a representation of a ...,The triskeles symbol represents three interloc...,The triskeles symbol is a representation of th...,Prompt: Which of the following statements accu...,A
3,3,What is the significance of regularization in ...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Regularizing the mass-energy of an electron wi...,Prompt: What is the significance of regulariza...,A
4,4,Which of the following statements accurately d...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,The angular spacing of features in the diffrac...,Prompt: Which of the following statements accu...,D
...,...,...,...,...,...,...,...,...,...
195,195,What is the relation between the three moment ...,The three moment theorem expresses the relatio...,The three moment theorem is used to calculate ...,The three moment theorem describes the relatio...,The three moment theorem is used to calculate ...,The three moment theorem is used to derive the...,Prompt: What is the relation between the three...,C
196,196,"What is the throttling process, and why is it ...",The throttling process is a steady flow of a f...,The throttling process is a steady adiabatic f...,The throttling process is a steady adiabatic f...,The throttling process is a steady flow of a f...,The throttling process is a steady adiabatic f...,"Prompt: What is the throttling process, and wh...",A
197,197,What happens to excess base metal as a solutio...,"The excess base metal will often solidify, bec...",The excess base metal will often crystallize-o...,"The excess base metal will often dissolve, bec...","The excess base metal will often liquefy, beco...","The excess base metal will often evaporate, be...",Prompt: What happens to excess base metal as a...,B
198,198,"What is the relationship between mass, force, ...",Mass is a property that determines the weight ...,Mass is an inertial property that determines a...,Mass is an inertial property that determines a...,Mass is an inertial property that determines a...,Mass is a property that determines the size of...,"Prompt: What is the relationship between mass,...",A


In [27]:
sub = pd.DataFrame()
sub['id'] = test_df['id']
sub['prediction'] = test_df['score']
sub.to_csv("/kaggle/working/submission.csv", index=False)

In [28]:
submission = pd.read_csv("/kaggle/working/submission.csv")
submission

Unnamed: 0,id,prediction
0,0,D
1,1,A
2,2,A
3,3,A
4,4,D
...,...,...
195,195,C
196,196,A
197,197,B
198,198,A
