## **Qwen 1.5-0.5B**

#### **Import Libraries**

In [None]:
%%capture
!pip install datasets==2.21.0 transformers peft torch rouge-score nltk

In [None]:
%%capture
!pip install accelerate -U

In [None]:
%%capture
#Installs transformers, torch and huggingface_hub
!pip install transformers torch huggingface_hub

#Qwen2ForCausalLM / Qwen2Tokenizer - Qwen2-architecture model and tokenizer classes (Qwen1.5 checkpoints use them too)
#AutoModelForCausalLM - Creates models for causal language modeling tasks
#AutoTokenizer - To tokenize text data for the model
from transformers import Qwen2ForCausalLM, Qwen2Tokenizer, AutoModelForCausalLM, AutoTokenizer

#transformers_stream_generator - text-generation add-on for Hugging Face Transformers that returns a generator,
#streaming out each token in real time during inference
#einops (Einstein operations) - library for tensor manipulations
!pip install transformers_stream_generator einops

#tiktoken - BPE (Byte Pair Encoding) tokenizer for use with OpenAI's models; BPE, originally a compression technique, splits text into subword tokens
!pip install tiktoken


In [None]:
import transformers
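#load_metric is deprecated (removed in datasets 3.x) but still available in the pinned datasets==2.21.0
from datasets import load_dataset, load_metric, Dataset, DatasetDict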

### **Define Model**

In [None]:
##Alternative: Qwen2 model with 0.5 billion parameters, hosted on the Hugging Face model hub
#sModelName = "Qwen/Qwen2-0.5B"  ## other options: "Qwen/Qwen1.5-7B-Chat" & "Qwen/Qwen2-72B"

#Model used in this notebook: Qwen1.5 with 0.5 billion parameters
model_name = "Qwen/Qwen1.5-0.5B"

In [None]:
#Initialize Tokenizer & Model

#trust_remote_code - Allows execution of custom code shipped in the model repository
#(Qwen1.5 does not need it, since its Qwen2 classes are built into transformers)
bTrust_remote_code = True

#Load the tokenizer (model_max_length sets the maximum sequence length the tokenizer assumes)
tokenizer = Qwen2Tokenizer.from_pretrained(model_name, trust_remote_code=bTrust_remote_code, model_max_length=8192)
#Load the model
model = Qwen2ForCausalLM.from_pretrained(model_name)

tokenizer.pad_token = tokenizer.eos_token #Pad with the end-of-sequence token




### **Using AIML Q&A Content - Custom Data source**




In [None]:
%%capture
!pip install accelerate -U

In [None]:
#Delete the contents of a previously cloned project folder (if any) so the git clone below starts clean
import os, shutil
folder = "/content/group18_final_project"

#folder is a directory path (a str), so test it with os.path.isdir rather than os.path.isfile
if os.path.isdir(folder) and os.access(folder, os.R_OK):
    print("Directory exists and is readable")

    for filename in os.listdir(folder):
        file_path = os.path.join(folder, filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)
        except Exception as e:
            print('Failed to delete %s. Reason: %s' % (file_path, e))

else:
    print("Either the directory is missing or not readable")




Either the directory is missing or not readable
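If the goal is simply a clean checkout, removing the whole folder is shorter than deleting its entries one by one. A minimal sketch using only the standard library; `reset_folder` is a hypothetical helper, not part of the project:

```python
import os
import shutil

def reset_folder(folder: str) -> bool:
    """Delete folder and its contents. Returns True if it existed and is now gone."""
    if not os.path.isdir(folder):
        return False
    # ignore_errors=True skips entries that cannot be removed instead of raising
    shutil.rmtree(folder, ignore_errors=True)
    return not os.path.exists(folder)

# Hypothetical usage in this notebook: reset_folder("/content/group18_final_project")
```

`git clone` accepts a missing or empty target directory, so either approach lets the next cell run again.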


In [None]:
#Fetch QnA data from GitHub
!git clone https://github.com/anukvma/group18_final_project.git

import os
import json
import pandas as pd
Path = "/content/group18_final_project/"

#Paths of the two CSV files holding the question/answer pairs
qna_csv_path = Path + "aiml_question_answers/AIML_QnA_Content/Group18_AIML_QA.csv"
qna_csv_path_part2 = Path + "aiml_question_answers/sampled_qa_data.csv"

#header=0 skips each file's own header row; names= supplies the column names
dfQnAData = pd.read_csv(qna_csv_path, names=['id','question','answer','unit'], encoding='unicode_escape', header=0)
dfQnADataPart2 = pd.read_csv(qna_csv_path_part2, names=['id','question','answer','unit'], encoding='unicode_escape', header=0)

#Combine both sources into one DataFrame with a fresh 0..n-1 index
dfQnAData = pd.concat([dfQnAData, dfQnADataPart2], ignore_index=True)


Cloning into 'group18_final_project'...
remote: Enumerating objects: 350, done.
remote: Counting objects: 100% (180/180), done.
remote: Compressing objects: 100% (166/166), done.
remote: Total 350 (delta 114), reused 28 (delta 14), pack-reused 170 (from 1)
Receiving objects: 100% (350/350), 7.44 MiB | 16.67 MiB/s, done.
Resolving deltas: 100% (199/199), done.


In [None]:
dfQnAData.head()

Unnamed: 0,id,question,answer,unit
0,1.0,What is a linear classifier?,A linear classifier is a model that makes pred...,1.0
1,2.0,How does a linear classifier make predictions?,A linear classifier predicts by calculating th...,1.0
2,3.0,What is the objective function in a linear cla...,The objective function often used is the loss ...,1.0
3,4.0,What is gradient descent?,Gradient descent is an optimization algorithm ...,1.0
4,5.0,How does learning rate affect gradient descent?,The learning rate controls the step size in gr...,1.0


In [None]:
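#Data cleanup: drop rows with any missing value
dfQnAData.dropna(axis=0, inplace=True)
dfQnAData.isna().sum()  #Sanity check (not displayed, since it is not the last line of the cell)
#Shuffle the rows; pass random_state=<int> to sample() for a reproducible split
dfQnAData = dfQnAData.sample(frac=1).reset_index(drop=True)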

dfQnAData.head()

Unnamed: 0,id,question,answer,unit
0,234.0,How does an MLP differ from a single-layer per...,"An MLP has multiple layers, allowing it to lea...",2.0
1,58.0,Which ensemble technique is used by Random for...,Bagging is the technique used by Random Forest...,1.0
2,614.0,Provide an example of a real-world application...,Computer vision is used in medical imaging for...,4.0
3,570.0,How would you handle file uploads in FastAPI?,FastAPI provides a simple way to handle file u...,4.0
4,285.0,"What is the significance of the statement ""Sel...",This statement highlights that SSL creates sup...,4.0


In [None]:
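#Empty DatasetDict, filled with the three splits in the next cell
medium_datasets = DatasetDict()

df = dfQnAData.copy()

#Split the shuffled rows: first 800 for training, next 100 for validation, the remainder for testing
train_dataset: Dataset = Dataset.from_pandas(df[:800])
validation_dataset: Dataset = Dataset.from_pandas(df[800:900])
test_dataset: Dataset = Dataset.from_pandas(df[900:])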

train_dataset

Dataset({
    features: ['id', 'question', 'answer', 'unit'],
    num_rows: 800
})
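The split above depends on an unseeded shuffle, so each run trains and evaluates on different rows. A sketch of the same slicing scheme (first 800, next 100, remainder) with a seeded shuffle, written with the standard library only; `split_indices` and `seed=42` are illustrative choices, and 1027 rows is inferred from the 800 + 100 + 127 split sizes shown in the output. In pandas the equivalent is passing `random_state=42` to `sample(frac=1, ...)`.

```python
import random

def split_indices(n_rows: int, n_train: int = 800, n_val: int = 100, seed: int = 42):
    """Shuffle row indices with a fixed seed, then slice like df[:800], df[800:900], df[900:]."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seeded RNG, so the split is identical on every run
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train, val, test = split_indices(1027)
print(len(train), len(val), len(test))  # 800 100 127
```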

In [None]:
#Collate split datasets into DatasetDict
medium_datasets["train"] = train_dataset
medium_datasets["validation"] = validation_dataset
medium_datasets["test"] = test_dataset

print("\n")
medium_datasets





DatasetDict({
    train: Dataset({
        features: ['id', 'question', 'answer', 'unit'],
        num_rows: 800
    })
    validation: Dataset({
        features: ['id', 'question', 'answer', 'unit'],
        num_rows: 100
    })
    test: Dataset({
        features: ['id', 'question', 'answer', 'unit'],
        num_rows: 127
    })
})

In [None]:
##Install torchinfo to display a layer-by-layer summary of the model with parameter counts
!pip install torchinfo

from torchinfo import summary
summary(model)

Collecting torchinfo
  Downloading torchinfo-1.8.0-py3-none-any.whl.metadata (21 kB)
Downloading torchinfo-1.8.0-py3-none-any.whl (23 kB)
Installing collected packages: torchinfo
Successfully installed torchinfo-1.8.0


Layer (type:depth-idx)                                  Param #
Qwen2ForCausalLM                                        --
├─Qwen2Model: 1-1                                       --
│    └─Embedding: 2-1                                   155,582,464
│    └─ModuleList: 2-2                                  --
│    │    └─Qwen2DecoderLayer: 3-1                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-2                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-3                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-4                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-5                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-6                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-7                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-8                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-9                      12,850,176
│    │    └─Qwen2DecoderLayer: 3-10                     12,850,176
│    │    └─Qwen2Deco

### **Tokenizer**

In [None]:
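##Format data before mapping it into a tokenized dataset
#DefaultPrefix = "Please answer the AIML question: "

max_input_length = 128   #Question and answer are joined, then truncated/padded to this many tokens
max_target_length = 128  #Not used below: labels come from the same 128-token sequence
tokenizer.pad_token = tokenizer.eos_token

def format_data(examples):
    #Each training text is the question and the answer separated by a newline
    inputs = [q + "\n" + a for q, a in zip(examples['question'], examples['answer'])]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, padding="max_length")
    #Causal LM: labels are a copy of input_ids; the model shifts them by one position internally.
    #Padding positions are not masked out here, so they also contribute to the loss.
    labels = model_inputs['input_ids'].copy()
    model_inputs['labels'] = labels
    return model_inputs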

tokenized_datasets = medium_datasets.map(format_data, batched=True)
tokenized_datasets


DatasetDict({
    train: Dataset({
        features: ['id', 'question', 'answer', 'unit', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 800
    })
    validation: Dataset({
        features: ['id', 'question', 'answer', 'unit', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 100
    })
    test: Dataset({
        features: ['id', 'question', 'answer', 'unit', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 127
    })
})
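`format_data` copies `input_ids` into `labels`, so the padded positions (filled with the EOS token) are also trained on. A common alternative is to set the labels at positions where `attention_mask` is 0 to -100, the default `ignore_index` of PyTorch's cross-entropy loss used by Hugging Face causal-LM models. A minimal sketch on plain Python lists with a hypothetical helper `mask_padding_labels`; it would replace the `.copy()` line inside `format_data`. Two caveats: `compute_metrics` would then need to map -100 back to a real token id before decoding, and since the pad token is the EOS token, the model would no longer see EOS as a target unless one is appended to each text.

```python
from typing import Dict, List

def mask_padding_labels(batch: Dict[str, List[List[int]]], ignore_index: int = -100) -> Dict[str, List[List[int]]]:
    """Copy input_ids into labels, replacing padded positions (attention_mask == 0) with ignore_index."""
    batch["labels"] = [
        [tok if mask == 1 else ignore_index for tok, mask in zip(ids, attn)]
        for ids, attn in zip(batch["input_ids"], batch["attention_mask"])
    ]
    return batch

# Two sequences padded to length 5 (token ids are made up for illustration)
example = {
    "input_ids": [[11, 12, 13, 0, 0], [21, 22, 23, 24, 25]],
    "attention_mask": [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]],
}
print(mask_padding_labels(example)["labels"])
# [[11, 12, 13, -100, -100], [21, 22, 23, 24, 25]]
```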

### **LoRA**

In [None]:
%%capture
!pip install peft
from peft import LoraConfig, get_peft_model

In [None]:
##Configure LoRA
from peft import TaskType  #Enum alternative: task_type=TaskType.CAUSAL_LM
lora_config = LoraConfig(
    r=4,  # Rank of the low-rank adaptation matrices
    lora_alpha=16,  # Scaling factor: the LoRA update is scaled by lora_alpha / r
    lora_dropout=0.1,  # Dropout for regularization
    #target_modules not set: PEFT falls back to its default for Qwen2 models (q_proj and v_proj)
    bias="none",  # No bias adjustment
    task_type="CAUSAL_LM"  # Task type, e.g. CAUSAL_LM or QUESTION_ANS
)
lora_config

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='CAUSAL_LM', inference_mode=False, r=4, target_modules=None, lora_alpha=16, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False))

In [None]:
# Apply LoRA to the model (the base weights are frozen; only the adapter matrices train)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 393,216 || all params: 464,380,928 || trainable%: 0.0847
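The printed counts can be checked by hand. Each LoRA-adapted linear layer with input size d_in and output size d_out gains a matrix A (r × d_in) and a matrix B (d_out × r), i.e. r·(d_in + d_out) trainable parameters. The sketch assumes PEFT's default Qwen2 targets (`q_proj` and `v_proj`, since `target_modules` is not set) and a Qwen1.5-0.5B shape of hidden size 1024 with 24 decoder layers; both are assumptions, though they are consistent with the embedding and decoder-layer sizes in the `torchinfo` summary and with the totals printed above.

```python
def lora_trainable_params(r: int, d_in: int, d_out: int, modules_per_layer: int, n_layers: int) -> int:
    """Trainable LoRA parameters: A is (r x d_in) and B is (d_out x r) for each adapted linear layer."""
    per_module = r * d_in + d_out * r
    return per_module * modules_per_layer * n_layers

# Assumed shape: q_proj and v_proj are both 1024 -> 1024, in each of 24 layers
trainable = lora_trainable_params(r=4, d_in=1024, d_out=1024, modules_per_layer=2, n_layers=24)
all_params = 464_380_928  # "all params" as printed by print_trainable_parameters()
print(trainable)                               # 393216
print(f"{100 * trainable / all_params:.4f}%")  # 0.0847%

# The low-rank update B @ A is multiplied by lora_alpha / r
print(16 / 4)                                  # 4.0
```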


In [None]:
######## NOT TO EXECUTE ##########
###LoRA for Qwen 7B Chat
# from peft import TaskType
# config = LoraConfig(
#     task_type=TaskType.CAUSAL_LM,
#     target_modules=["c_attn", "c_proj", "w1", "w2"],
#     inference_mode=False, #Training mode
#     r=8, # LoRA rank
#     lora_alpha=32, # LoRA alpha (scaling factor)
#     lora_dropout=0.1 # Dropout proportion
# )

# config  #Config not yet applied to model

In [None]:
sModelName = model_name
sModelName
model_name

'Qwen/Qwen1.5-0.5B'

### **Training Arguments**

In [None]:
#Remove the output folder if it exists (sModelOutputDir is defined in the next cell)
#!rm -r {sModelOutputDir}

In [None]:
#Fine-tune the model
from transformers import TrainingArguments

sModelOutputDir = "./Qwen1B-ForQnA"

training_args = TrainingArguments(
    output_dir=sModelOutputDir,
    push_to_hub=False,
    overwrite_output_dir=True,
    #remove_unused_columns=False,
    ##Evaluation
    #evaluation_strategy="steps",  #Older name of eval_strategy
    eval_strategy="steps",
    eval_steps=100,
    ##Logging
    logging_strategy="steps",
    logging_steps=100,  #100 or 50
    num_train_epochs=10,  #Epochs (4 was also tried)
    ##Low batch sizes to fit in GPU memory
    per_device_train_batch_size=6,  #Alternatives: 1 or 2
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  #Effective batch size = 6 x 2 = 12 examples per optimizer step
    save_steps=500,
    save_total_limit=2,  #Keep at most two checkpoints on disk
    gradient_checkpointing=True,  #Recompute activations in the backward pass to save memory
    #save_on_each_node=True,
    #learning_rate=4e-4,  #1e-4 or 2e-4 also tried; the default when unset is 5e-5
    fp16=True,  # Mixed precision training for efficiency
    report_to="none",
    dataloader_pin_memory=True,
    #use_cache=False
)

#With gradient checkpointing and a frozen base model (LoRA), the embedding outputs must require
#gradients so that gradients can flow back through the checkpointed layers to the adapters
if training_args.gradient_checkpointing:
    model.enable_input_require_grads()

training_args.eval_batch_size


2

In [None]:
training_args.device

device(type='cuda', index=0)

### **ROUGE**

In [None]:
##ROUGE metric (for Qwen)

import numpy as np

#load_metric comes from the datasets library (works with the pinned datasets==2.21.0).
#evaluate.load("rouge") is the newer replacement, but it returns plain floats, so .mid.fmeasure would not apply.
rouge = load_metric("rouge")

def compute_metrics(eval_pred):
    qPredictions, qReferences = eval_pred
    #Predictions arrive as logits; take the index of the highest logit (token id) at each position.
    #These are teacher-forced next-token predictions (position i predicts token i+1), not free-running generations.
    qPredictions = np.argmax(qPredictions, axis=-1)

    decoded_preds = []  #Predictions to score, one string per example
    decoded_ref = []    #References, one string per prediction

    for pred, label in zip(qPredictions, qReferences):
        # Decode the token IDs (skip special tokens)
        decoded_preds.append(tokenizer.decode(pred, skip_special_tokens=True))
        decoded_ref.append(tokenizer.decode(label, skip_special_tokens=True))

    #use_aggregator - If True, returns bootstrap aggregates (low/mid/high). Defaults to True.
    bUseAggregator = True
    #use_stemmer - If True, uses the Porter stemmer to strip word suffixes. Defaults to False.
    bUseStemmer = True

    # Compute ROUGE
    rouge_scores = rouge.compute(predictions=decoded_preds, references=decoded_ref,
                                 use_stemmer=bUseStemmer, use_aggregator=bUseAggregator)

    rouge1 = rouge_scores['rouge1'].mid.fmeasure        #Unigram (1-gram) overlap
    rouge2 = rouge_scores['rouge2'].mid.fmeasure        #Bigram (2-gram) overlap
    rougeL = rouge_scores['rougeL'].mid.fmeasure        #Longest common subsequence
    rougeLsum = rouge_scores['rougeLsum'].mid.fmeasure  #LCS computed per line, splitting text on "\n"

    print(rouge_scores)
    #print(f"rouge1: {rouge1:.4f}, rouge2: {rouge2:.4f}, rougeL: {rougeL:.4f}, rougeLsum: {rougeLsum:.4f}")

    return {"rouge1": rouge1, "rouge2": rouge2, "rougeL": rougeL, "rougeLsum": rougeLsum}
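To make the ROUGE-1 numbers concrete: ROUGE-1 counts the unigrams shared by a prediction and a reference; precision divides that overlap by the prediction length, recall by the reference length, and the F-measure is their harmonic mean. The sketch below is a simplified illustration (lowercased whitespace tokens, no stemming), not the `rouge_score` implementation behind `load_metric("rouge")`, whose tokenizer and Porter stemming can give slightly different values.

```python
from collections import Counter

def rouge1(prediction: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap, no stemming."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # & keeps the minimum count of each shared token
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "fmeasure": 0.0}
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}

scores = rouge1("the cat sat", "the cat sat on the mat")
print(scores)  # precision 1.0, recall 0.5, fmeasure about 0.667
```

ROUGE-2 applies the same idea to bigrams, which is why its scores in the logs are roughly half the ROUGE-1 values.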

### **Trainer**

In [None]:
model.name_or_path

'Qwen/Qwen1.5-0.5B'

In [None]:
from transformers import Trainer

trainer = Trainer(
    model,  #The LoRA-wrapped (PEFT) model
    args=training_args,
    #model_max_length= 8192, #Qwen (not a Trainer argument; set on the tokenizer instead)
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    #data_collator=data_collator, #For chat - DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=None,  #Full logits are passed on to compute_metrics
)



In [None]:
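import os
import torch

#Release cached, currently unused GPU memory held by PyTorch's allocator
torch.cuda.empty_cache()
#Note: PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the CUDA allocator is first initialized, so this only
#takes effect if it is set before any GPU memory is allocated (e.g. at the top of the notebook)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"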
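Another memory saving applies to evaluation. Because `compute_metrics` only needs token ids, there is no need for the Trainer to gather the full logits for the whole validation set (100 examples × 128 positions × a vocabulary of about 152k entries, which runs to several GB). `Trainer` accepts a `preprocess_logits_for_metrics(logits, labels)` callable that runs on each batch before accumulation; the sketch reduces logits to argmax ids there. The function name is hypothetical, and if it is passed to `Trainer`, the `np.argmax` line in `compute_metrics` should be removed because the predictions would already be ids. `.argmax(-1)` works on both PyTorch tensors and NumPy arrays, which lets the small NumPy check below run without a GPU.

```python
import numpy as np

def logits_to_token_ids(logits, labels):
    """Hypothetical preprocess_logits_for_metrics: keep only the argmax token id per position."""
    if isinstance(logits, tuple):  # some models return extra outputs alongside the logits
        logits = logits[0]
    return logits.argmax(-1)  # shape (batch, seq_len) instead of (batch, seq_len, vocab)

# Hypothetical wiring: Trainer(..., compute_metrics=compute_metrics,
#                              preprocess_logits_for_metrics=logits_to_token_ids)

fake_logits = np.zeros((2, 3, 5))
fake_logits[:, :, 4] = 1.0  # token 4 has the highest score at every position
print(logits_to_token_ids(fake_logits, None))  # every entry is 4
```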

### **Train Model**

In [None]:
trainer.model.name_or_path

'Qwen/Qwen1.5-0.5B'

In [None]:
#Train the model
trainer.train()  #Runs num_train_epochs=10, as set in TrainingArguments



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,0.8535,0.856242,0.588604,0.298625,0.530647,0.551215
200,0.836,0.849685,0.591516,0.303898,0.533612,0.554064
300,0.8122,0.846748,0.594063,0.30646,0.535916,0.556141
400,0.8113,0.845485,0.595267,0.306098,0.536664,0.559129
500,0.8034,0.844539,0.594298,0.308212,0.537103,0.55843
600,0.7882,0.844864,0.595372,0.306843,0.537679,0.560439


{'rouge1': AggregateScore(low=Score(precision=0.5834135704652542, recall=0.5522146961982569, fmeasure=0.5665560775211506), mid=Score(precision=0.6049608818804875, recall=0.5740639010966064, fmeasure=0.5886039634913193), high=Score(precision=0.6261550190245534, recall=0.5962600983277679, fmeasure=0.6100733364906887)), 'rouge2': AggregateScore(low=Score(precision=0.28218591172712376, recall=0.26831808870419754, fmeasure=0.274988070912339), mid=Score(precision=0.3067717692586388, recall=0.2910222358012472, fmeasure=0.298625080556474), high=Score(precision=0.33138578357571535, recall=0.31482045975450706, fmeasure=0.32249653527404337)), 'rougeL': AggregateScore(low=Score(precision=0.5246816623758226, recall=0.49674353428869045, fmeasure=0.509773222623714), mid=Score(precision=0.545699178136861, recall=0.5174580055034675, fmeasure=0.5306471041773566), high=Score(precision=0.5653382809684987, recall=0.5373983513478109, fmeasure=0.5502383044795307)), 'rougeLsum': AggregateScore(low=Score(preci



{'rouge1': AggregateScore(low=Score(precision=0.590695614162403, recall=0.5604854547131196, fmeasure=0.5750421828882675), mid=Score(precision=0.6112010296196619, recall=0.5811961904217653, fmeasure=0.5953723132667701), high=Score(precision=0.63143348845058, recall=0.6015888794317531, fmeasure=0.615263341990522)), 'rouge2': AggregateScore(low=Score(precision=0.29154927943699516, recall=0.27597049272028407, fmeasure=0.2833565780426637), mid=Score(precision=0.31473245190284427, recall=0.29945986779916767, fmeasure=0.30684329075928535), high=Score(precision=0.33951903987081417, recall=0.32362303232595957, fmeasure=0.33131180364557744)), 'rougeL': AggregateScore(low=Score(precision=0.5323096898928839, recall=0.5043141012493967, fmeasure=0.5171149314073031), mid=Score(precision=0.5521822605640639, recall=0.5248057748864505, fmeasure=0.5376785804308744), high=Score(precision=0.5716565217215255, recall=0.5441500722487262, fmeasure=0.5574191489177229)), 'rougeLsum': AggregateScore(low=Score(pre

TrainOutput(global_step=670, training_loss=0.8142517886944671, metrics={'train_runtime': 309.2349, 'train_samples_per_second': 25.87, 'train_steps_per_second': 2.167, 'total_flos': 1897257762816000.0, 'train_loss': 0.8142517886944671, 'epoch': 10.0})
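The `global_step=670` above follows from the `TrainingArguments`: with 800 training examples and `per_device_train_batch_size=6`, one epoch is ceil(800 / 6) = 134 batches; `gradient_accumulation_steps=2` turns that into 134 // 2 = 67 optimizer steps per epoch, and 10 epochs give 670 steps (assuming a single GPU). By the same arithmetic, the later recorded runs (1000 steps in 5 epochs, and 1000 steps in 10 epochs) imply 4 and 8 examples per optimizer step, so they appear to have used different batch settings from the cell above. A sketch of the count, approximating how the Hugging Face Trainer computes update steps:

```python
import math

def optimizer_steps(n_examples: int, batch_size: int, grad_accum: int, epochs: int, n_devices: int = 1) -> int:
    """Approximate number of optimizer (update) steps in a Trainer run."""
    batches_per_epoch = math.ceil(n_examples / (batch_size * n_devices))  # last partial batch is kept
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * steps_per_epoch)

print(optimizer_steps(800, batch_size=6, grad_accum=2, epochs=10))  # 670
print(6 * 2)  # examples per optimizer step: 12
```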

In [None]:
#To push the trained model to Hugging face hub
#trainer.push_to_hub()

In [None]:
#Train the model
trainer.train()  #This recorded run used 5 epochs (Sep28 5:25am)



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,0.708,0.863881,0.602253,0.307293,0.534382,0.564577
200,0.7214,0.862781,0.600049,0.311143,0.533812,0.561926
300,0.7074,0.869557,0.598342,0.308048,0.531841,0.560502
400,0.6938,0.866352,0.601667,0.310996,0.533239,0.562395
500,0.6534,0.875211,0.599222,0.307012,0.530736,0.559162
600,0.7196,0.873917,0.596902,0.305793,0.529628,0.560079
700,0.6333,0.876502,0.598694,0.308711,0.531838,0.560741
800,0.7209,0.876204,0.597094,0.306065,0.529287,0.558699
900,0.6757,0.881272,0.597192,0.306104,0.528775,0.55776
1000,0.653,0.879501,0.597496,0.306764,0.529993,0.558699


{'rouge1': AggregateScore(low=Score(precision=0.5970198033092926, recall=0.5685820172182978, fmeasure=0.5815878412382303), mid=Score(precision=0.6175900422059868, recall=0.5882922302553123, fmeasure=0.6022528667774415), high=Score(precision=0.6387033458208056, recall=0.6079331323737658, fmeasure=0.6229911262966029)), 'rouge2': AggregateScore(low=Score(precision=0.28725396212443655, recall=0.2733741327242373, fmeasure=0.28002810038371856), mid=Score(precision=0.3155375008882939, recall=0.3001278528436341, fmeasure=0.3072933720359509), high=Score(precision=0.340412245099557, recall=0.3228129918217979, fmeasure=0.33085446881517294)), 'rougeL': AggregateScore(low=Score(precision=0.5230512431449031, recall=0.4986715084423543, fmeasure=0.510414025327513), mid=Score(precision=0.5481035778928824, recall=0.5219161027867126, fmeasure=0.5343816572142503), high=Score(precision=0.5701191689595989, recall=0.5431079306611967, fmeasure=0.5559684511269876)), 'rougeLsum': AggregateScore(low=Score(precis



{'rouge1': AggregateScore(low=Score(precision=0.5910814446868053, recall=0.5630851756875305, fmeasure=0.5769684971911874), mid=Score(precision=0.6116869783412777, recall=0.5833577018883535, fmeasure=0.5969021570502295), high=Score(precision=0.6337912526891217, recall=0.6053396955980294, fmeasure=0.6188170828908947)), 'rouge2': AggregateScore(low=Score(precision=0.2865274658640128, recall=0.2722907769914736, fmeasure=0.27875456100158547), mid=Score(precision=0.3136100189322021, recall=0.2984962927264391, fmeasure=0.30579316808766127), high=Score(precision=0.33759399591733563, recall=0.3209087720055459, fmeasure=0.3288899930284616)), 'rougeL': AggregateScore(low=Score(precision=0.5184316033671619, recall=0.4947626980962313, fmeasure=0.5063097074740687), mid=Score(precision=0.5427384120374403, recall=0.5174851998783354, fmeasure=0.5296278832534296), high=Score(precision=0.5652351134289412, recall=0.5390053418346705, fmeasure=0.5514875533432769)), 'rougeLsum': AggregateScore(low=Score(prec

TrainOutput(global_step=1000, training_loss=0.689949914932251, metrics={'train_runtime': 456.4551, 'train_samples_per_second': 8.763, 'train_steps_per_second': 2.191, 'total_flos': 948628881408000.0, 'train_loss': 0.689949914932251, 'epoch': 5.0})
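In this run the training loss keeps falling while the validation loss drifts up from about 0.863 to about 0.88, a typical sign of overfitting. A small standard-library sketch that parses the logged table (values copied from the output above) and reports the step with the lowest validation loss; inside the Trainer, `load_best_model_at_end=True` together with `metric_for_best_model="eval_loss"` serves the same purpose.

```python
import csv
import io

LOG = """Step,Training Loss,Validation Loss
100,0.708,0.863881
200,0.7214,0.862781
300,0.7074,0.869557
400,0.6938,0.866352
500,0.6534,0.875211
600,0.7196,0.873917
700,0.6333,0.876502
800,0.7209,0.876204
900,0.6757,0.881272
1000,0.653,0.879501
"""

rows = list(csv.DictReader(io.StringIO(LOG)))
best = min(rows, key=lambda row: float(row["Validation Loss"]))
print(best["Step"], best["Validation Loss"])  # 200 0.862781
```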

### **Eval Model**

In [None]:
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device.type)

cuda


In [None]:
##Using trained model
print(model.name_or_path)

Qwen/Qwen1.5-0.5B


In [None]:
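trainer.save_model() #Saves the LoRA adapter and tokenizer locally to training_args.output_dir; push_to_hub=False, so nothing is uploaded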

In [None]:
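def ask_question(question):
    #Note: training texts were formatted as question + "\n" + answer,
    #so this "Q: ... A:" prompt differs from the fine-tuning format
    model.eval()  #Disable dropout (including LoRA dropout) for inference
    #Calling the tokenizer returns both input_ids and attention_mask
    inputs = tokenizer('Q: ' + question + ' A:', return_tensors='pt').to(device)
    outputs = model.generate(**inputs, max_new_tokens=500, num_return_sequences=1,
                             pad_token_id=tokenizer.eos_token_id)
    gen_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    #Split only at the first " A:" so further occurrences in the answer do not raise an error
    question, _, answer = gen_text.partition(' A:')

    return question, answer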
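Since the training texts were built as `question + "\n" + answer`, a prompt in that same shape may match the fine-tuning data more closely than `'Q: ... A:'`. A minimal sketch of the string handling only (the `generate` call would stay as it is); `build_prompt` and `extract_answer` are hypothetical helpers, and whether this format gives better answers has not been tested here.

```python
def build_prompt(question: str) -> str:
    """Prompt in the question + newline shape used by format_data during training."""
    return question.strip() + "\n"

def extract_answer(generated: str, prompt: str) -> str:
    """Drop the echoed prompt from the decoded output; fall back to the full text if it is absent."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    return generated.strip()

prompt = build_prompt("What is K-means clustering?")
decoded = prompt + "K-means groups data points around k centroids."  # stand-in for tokenizer.decode(...)
print(extract_answer(decoded, prompt))  # K-means groups data points around k centroids.
```

The fallback matters because decoding does not always reproduce the prompt's whitespace exactly.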

In [None]:
sQuestion, sAnswer = ask_question("What is K-means clustering?") #Using trained model
sQuestion, "Answer: " + sAnswer

('Q: What is K-means clustering?',
 'Answer:  K-means clustering is a technique used to group similar data points together. It involves assigning each data point to the nearest centroid, which is the point closest to the centroid. The number of clusters is determined by the number of centroids.')

In [None]:
#Eval model  #Sep 27  2:50pm
trainer.evaluate()

#Training Results: 600	0.788200	0.844864	0.595372	0.306843	0.537679	0.560439

{'rouge1': AggregateScore(low=Score(precision=0.5905352999033762, recall=0.5606865208324169, fmeasure=0.5755123262358359), mid=Score(precision=0.6110572473896332, recall=0.5816635407888854, fmeasure=0.5957114227113449), high=Score(precision=0.6317314106888839, recall=0.6025011022918098, fmeasure=0.6157108466401028)), 'rouge2': AggregateScore(low=Score(precision=0.2921795676293773, recall=0.2773055804875994, fmeasure=0.2845873012109488), mid=Score(precision=0.3153171294811688, recall=0.3003389838055476, fmeasure=0.30747195541384775), high=Score(precision=0.34074768582183795, recall=0.32479724864273585, fmeasure=0.33233487650436216)), 'rougeL': AggregateScore(low=Score(precision=0.531858441641031, recall=0.5047183929910679, fmeasure=0.5171989165212297), mid=Score(precision=0.5518685740816995, recall=0.5252039709258727, fmeasure=0.5379576568774984), high=Score(precision=0.5718391598583887, recall=0.5448421841285375, fmeasure=0.5579565794071045)), 'rougeLsum': AggregateScore(low=Score(prec

{'eval_loss': 0.8447777628898621,
 'eval_rouge1': 0.5957114227113449,
 'eval_rouge2': 0.30747195541384775,
 'eval_rougeL': 0.5379576568774984,
 'eval_rougeLsum': 0.5601060847110381,
 'eval_runtime': 9.8292,
 'eval_samples_per_second': 10.174,
 'eval_steps_per_second': 5.087,
 'epoch': 10.0}

In [None]:
#Eval model  #Sep 27  7:30am
#trainer.evaluate()

#Train/Val results: 	0.792900	0.798846	0.596820	0.297634	0.526688	0.555140
#Train/Test results:  0.728400	0.878322	0.587335	0.288649	0.517021	0.543373

In [None]:
#Eval model    #Sep 27  6:00am
#trainer.evaluate()

#Training Results: 	0.653000	0.879501	0.597496	0.306764	0.529993	0.558699

{'rouge1': AggregateScore(low=Score(precision=0.5911899346783774, recall=0.5648694428906319, fmeasure=0.5781541621318277), mid=Score(precision=0.6116783921685018, recall=0.5844525249423034, fmeasure=0.5974961173462122), high=Score(precision=0.6326793249399969, recall=0.6046610194853625, fmeasure=0.6176050452337842)), 'rouge2': AggregateScore(low=Score(precision=0.28654655000169305, recall=0.27372850318159814, fmeasure=0.2797203108907585), mid=Score(precision=0.3147031170343422, recall=0.29974282448921946, fmeasure=0.3067639961613069), high=Score(precision=0.3382191525284444, recall=0.3223064278923301, fmeasure=0.3298927239734007)), 'rougeL': AggregateScore(low=Score(precision=0.5182401388613569, recall=0.4961392478597974, fmeasure=0.5068280951906065), mid=Score(precision=0.5424544355254348, recall=0.5181979413718671, fmeasure=0.5299932941763617), high=Score(precision=0.5632775456236766, recall=0.5382553085740398, fmeasure=0.5501352950612918)), 'rougeLsum': AggregateScore(low=Score(prec

{'eval_loss': 0.8795011639595032,
 'eval_rouge1': 0.5974961173462122,
 'eval_rouge2': 0.3067639961613069,
 'eval_rougeL': 0.5299932941763617,
 'eval_rougeLsum': 0.558699353073848,
 'eval_runtime': 9.6966,
 'eval_samples_per_second': 10.313,
 'eval_steps_per_second': 5.156,
 'epoch': 5.0}

In [None]:
#Eval model
sQuestion, sAnswer = ask_question("What is K-means clustering?") #Using trained model
sQuestion, "Answer: " + sAnswer

('Q: What is K-means clustering?',
 'Answer:  K-means clustering is a machine learning technique used for clustering data points into groups based on their similarity. The algorithm starts with a set of initial centroids, and iteratively assigns each data point to the nearest centroid, until the centroids no longer change. The algorithm is based on the idea of minimizing the distance between each data point and the centroids, and then assigning each data point to the closest centroid.')

In [None]:
##trainer.train()  #Output below is from an earlier run (10 epochs, per the logged epoch value)



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,0.7934,0.873704,0.584048,0.285601,0.516486,0.541018
200,0.7798,0.874439,0.587033,0.287834,0.51848,0.544442
300,0.7692,0.872907,0.586866,0.287882,0.519988,0.544692
400,0.7597,0.873148,0.587425,0.287368,0.519473,0.545094
500,0.7514,0.87437,0.58956,0.290069,0.518689,0.545763
600,0.7445,0.876049,0.58781,0.289243,0.517884,0.544235
700,0.7387,0.875455,0.588918,0.28995,0.518125,0.545156
800,0.7342,0.876567,0.588663,0.289695,0.518613,0.544418
900,0.7309,0.877826,0.588097,0.289553,0.517969,0.544433
1000,0.7284,0.878322,0.587335,0.288649,0.517021,0.543373


{'rouge1': AggregateScore(low=Score(precision=0.5827476097963842, recall=0.5489876652074918, fmeasure=0.564662996729826), mid=Score(precision=0.6017306889865738, recall=0.5680907146681832, fmeasure=0.5840475268281512), high=Score(precision=0.6191558039710585, recall=0.5851371838002469, fmeasure=0.601019984068282)), 'rouge2': AggregateScore(low=Score(precision=0.2730174058261809, recall=0.25685957510747176, fmeasure=0.2645974869327944), mid=Score(precision=0.2943239416443571, recall=0.27760551731428085, fmeasure=0.2856012169085689), high=Score(precision=0.3166118900873646, recall=0.2998672548163816, fmeasure=0.3075029264885035)), 'rougeL': AggregateScore(low=Score(precision=0.5114275044499575, recall=0.4816475651984322, fmeasure=0.4958078082010148), mid=Score(precision=0.5322460277892028, recall=0.5022930796719931, fmeasure=0.5164855095553372), high=Score(precision=0.5523881011791505, recall=0.5216785049489137, fmeasure=0.5361952868123911)), 'rougeLsum': AggregateScore(low=Score(precisi



{'rouge1': AggregateScore(low=Score(precision=0.5857729320335537, recall=0.5546127587924959, fmeasure=0.5693500911593733), mid=Score(precision=0.6042462990413178, recall=0.5726057016702832, fmeasure=0.5878095290981722), high=Score(precision=0.6214051982955758, recall=0.5887784741661889, fmeasure=0.6042573478604493)), 'rouge2': AggregateScore(low=Score(precision=0.27672904833173556, recall=0.2616492180350674, fmeasure=0.268945682186648), mid=Score(precision=0.2975484393694696, recall=0.2816097845186477, fmeasure=0.2892434454144593), high=Score(precision=0.31989630584500145, recall=0.30349517400626685, fmeasure=0.31116593553476246)), 'rougeL': AggregateScore(low=Score(precision=0.5114627349561129, recall=0.483391166391784, fmeasure=0.49689571902369567), mid=Score(precision=0.5325826456359202, recall=0.5043204636117391, fmeasure=0.5178844663069191), high=Score(precision=0.5528261887304399, recall=0.5240728849516922, fmeasure=0.5377884716791566)), 'rougeLsum': AggregateScore(low=Score(prec

TrainOutput(global_step=1000, training_loss=0.7530256576538086, metrics={'train_runtime': 479.5686, 'train_samples_per_second': 16.682, 'train_steps_per_second': 2.085, 'total_flos': 1897257762816000.0, 'train_loss': 0.7530256576538086, 'epoch': 10.0})

In [None]:
trainer.evaluate()

{'rouge1': AggregateScore(low=Score(precision=0.5866939448415992, recall=0.5550509930425499, fmeasure=0.5702477964508825), mid=Score(precision=0.6039732527106998, recall=0.5719368154374441, fmeasure=0.5873354390722347), high=Score(precision=0.6213511700119582, recall=0.588471508150662, fmeasure=0.6043743103642178)), 'rouge2': AggregateScore(low=Score(precision=0.27516005332962507, recall=0.2603489487842725, fmeasure=0.2674547244947684), mid=Score(precision=0.2972391472572572, recall=0.2809659033630709, fmeasure=0.2886490082214134), high=Score(precision=0.31941393597748796, recall=0.30210014372582067, fmeasure=0.3104077425220658)), 'rougeL': AggregateScore(low=Score(precision=0.5105823167820936, recall=0.4823476773404223, fmeasure=0.4959085296946145), mid=Score(precision=0.531810399107514, recall=0.5033569395462433, fmeasure=0.5170214077516637), high=Score(precision=0.5506991743112366, recall=0.5218292751094411, fmeasure=0.535707017224386)), 'rougeLsum': AggregateScore(low=Score(precisi

{'eval_loss': 0.878322422504425,
 'eval_rouge1': 0.5873354390722347,
 'eval_rouge2': 0.2886490082214134,
 'eval_rougeL': 0.5170214077516637,
 'eval_rougeLsum': 0.5433727219097128,
 'eval_runtime': 10.7628,
 'eval_samples_per_second': 11.8,
 'eval_steps_per_second': 2.973,
 'epoch': 10.0}

In [None]:
sQuestion, sAnswer = ask_question("What is the difference between CNN and RNN?") #Using trained model
sQuestion, "Answer: " + sAnswer

('Q: What is the difference between CNN and RNN?',
 'Answer:  CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) are two types of neural networks used for sequence processing. CNNs are used for image and sequence processing, while RNNs are used for sequential data processing.')

In [None]:
sQuestion, sAnswer = ask_question("What is Backpropagation?") #Using trained model
sQuestion, "Answer: " + sAnswer

('Q: What is Backpropagation?',
 'Answer:  Backpropagation is a technique used to train neural networks by adjusting the weights of the network based on the gradients of the loss function. It is a key component of deep learning.')

###**Save Trained Model to the Hugging Face Hub**

In [None]:
trainer.model.name_or_path

'Qwen/Qwen1.5-0.5B'

In [None]:
#Keep handles to the trained model and tokenizer before pushing them to the HF Hub
#Note: plain assignment binds a new name to the same object; no copy is made.
#(Despite the name, `model` has already been fine-tuned at this point.)
ModelBeforeTraining = model
TokenizerBeforeTraining = tokenizer
print(ModelBeforeTraining.name_or_path)
print(TokenizerBeforeTraining.name_or_path)

ModelBeforeHF = model  #or trainer.model
TokenizerBeforeHF = trainer.tokenizer

print(ModelBeforeHF.name_or_path)
print(TokenizerBeforeHF.name_or_path)

print(type(model))
print(type(ModelBeforeHF))
print(type(TokenizerBeforeHF))

Qwen/Qwen1.5-0.5B
Qwen/Qwen1.5-0.5B
Qwen/Qwen1.5-0.5B
Qwen/Qwen1.5-0.5B
<class 'peft.peft_model.PeftModelForCausalLM'>
<class 'peft.peft_model.PeftModelForCausalLM'>
<class 'transformers.models.qwen2.tokenization_qwen2.Qwen2Tokenizer'>
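The output shows `model` and `ModelBeforeHF` have the same type; in fact plain assignment makes them the same object, so any later change to `model` is visible through every alias. If an independent snapshot is needed, `copy.deepcopy` creates one (at the cost of duplicating the weights in memory). A toy illustration, using a hypothetical stand-in class instead of the real model:

```python
import copy

class ToyModel:
    """Hypothetical stand-in for a model object with mutable weights."""
    def __init__(self):
        self.weights = [0.0, 0.0]

toy_model = ToyModel()
alias = toy_model                    # a second name for the same object; nothing is copied
snapshot = copy.deepcopy(toy_model)  # an independent copy, including the weights list

toy_model.weights[0] = 1.0           # e.g. an update made by further training
print(alias.weights[0])              # 1.0: the alias sees the change
print(snapshot.weights[0])           # 0.0: the deep copy does not
print(alias is toy_model, snapshot is toy_model)  # True False
```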


#####**HF - Log in to the Hugging Face Hub**

In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
'''
sHuggingFacePath = "RohiniPS/Qwen1B" #+ model_save_name
HFModelName = "Qwen1B"

##Check
#Specify from_tf=True to convert a checkpoint from TensorFlow to PyTorch:
#HFQwenModel = model.from_pretrained(sHuggingFacePath)

#pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
#HFQwenModel.save_pretrained(sHuggingFacePath)
'''
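#Note: this only sets an attribute on the notebook_login function; it does not change where
#Hugging Face files are cached. To relocate the cache, set the HF_HOME environment variable
#before importing transformers / huggingface_hub.
notebook_login.cache_dir = "/content/cache"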

#####**HF - Upload the model to the Hub**

In [None]:
from google.colab import userdata
HF_Token_ALL = userdata.get('HF_RW_TOKEN')  #Write-access token stored as a Colab secret

#Log in from the shell (alternative: export HF_TOKEN=<token>)
!huggingface-cli login --token $HF_Token_ALL

#Target repository on the Hub; push_to_hub creates it if it does not exist
sHuggingFacePath = "RohiniPS/Qwen1B-QnA-3-5"
#HFModelName = "Qwen1B"

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
#Upload the model and tokenizer to the Hub (the repository is created if it does not exist).
#ModelBeforeHF is a PEFT model, so push_to_hub uploads only the adapter weights and config.
#trainer.push_to_hub()
print(sHuggingFacePath)
ModelBeforeHF.push_to_hub(repo_id=sHuggingFacePath, token=HF_Token_ALL)
TokenizerBeforeHF.push_to_hub(repo_id=sHuggingFacePath, token=HF_Token_ALL)  #Push the tokenizer to the same repo
# model.push_to_hub(repo_id=sHuggingFacePath, token=HF_Token_ALL)
# tokenizer.push_to_hub(repo_id=sHuggingFacePath, token=HF_Token_ALL)

RohiniPS/Qwen1B-QnA-3-5




CommitInfo(commit_url='https://huggingface.co/RohiniPS/Qwen1B-QnA-3-5/commit/d90a902343ac88e6297955da960e0c54e4dc0b82', commit_message='Upload tokenizer', commit_description='', oid='d90a902343ac88e6297955da960e0c54e4dc0b82', pr_url=None, pr_revision=None, pr_num=None)

###**HF - Fetch the fine-tuned model from the Hub**

In [None]:
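#Fetch the fine-tuned model from the HF Hub.
#The repo holds only the PEFT adapter (adapter_config.json + adapter_model.safetensors); with `peft`
#installed, from_pretrained loads the base model named in adapter_config.json and attaches the adapter.
#(AutoModelForCausalLM / AutoTokenizer would also work here.)
modelHF = Qwen2ForCausalLM.from_pretrained(sHuggingFacePath)
tokenizerHF = Qwen2Tokenizer.from_pretrained(sHuggingFacePath)
#Note: Trainer has no from_pretrained; build a new Trainer around modelHF instead.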


In [None]:
print(modelHF.name_or_path)
print(tokenizerHF.name_or_path)

Qwen/Qwen1.5-0.5B
RohiniPS/Qwen1B-QnA-3-2
Qwen/Qwen1.5-0.5B


In [None]:
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

modelHF = modelHF.to(device)

In [None]:
def ask_question_HFModel1(question):
    """Generate an answer with the model and tokenizer downloaded from the HF Hub."""
    prompt = 'Q: ' + question + ' A:'
    inputs = tokenizerHF.encode(prompt, return_tensors='pt').to(device)
    attention_mask = torch.ones_like(inputs)  #Single unpadded prompt: attend to every token
    outputs = modelHF.generate(inputs, attention_mask=attention_mask, max_new_tokens=100,
                               num_return_sequences=1, pad_token_id=tokenizerHF.eos_token_id)
    gen_text = tokenizerHF.decode(outputs[0], skip_special_tokens=True)
    #Split on the first ' A:' only, so an answer that contains ' A:' does not break the unpacking
    question, answer = gen_text.split(' A:', 1)
    return question, answer
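Both `ask_question*` helpers recover the answer by splitting the decoded text on `' A:'`. A slightly more defensive variant, sketched below as a hypothetical helper, uses `str.partition`, which splits on the first occurrence only and never raises when the marker is missing (the answer is then empty):

```python
def split_question_answer(gen_text, marker=" A:"):
    """Split decoded 'Q: ... A: ...' text into (question, answer).

    Splits on the first marker only; if the marker is missing, answer is "".
    """
    question, _, answer = gen_text.partition(marker)
    return question, answer

q, a = split_question_answer("Q: What is ReLU? A: An activation. A: extra")
print(repr(q))  # 'Q: What is ReLU?'
print(repr(a))  # ' An activation. A: extra'
```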


In [None]:
modelHF.name_or_path

'Qwen/Qwen1.5-0.5B'

In [None]:
#Using the model and tokenizer downloaded from the HF Hub
sQuestion, sAnswer = ask_question_HFModel1("What is K-means clustering?")
sQuestion, "Answer: " + sAnswer

('Q: What is K-means clustering?',
 'Answer:  K-means clustering is a technique used to group similar data points together. It involves assigning each data point to the nearest centroid, which is the point closest to the centroid. The number of clusters is determined by the number of centroids.')

In [None]:
def ask_question(question):
    """Generate an answer with the in-memory trained model and tokenizer."""
    prompt = 'Q: ' + question + ' A:'
    inputs = tokenizer.encode(prompt, return_tensors='pt').to(device)
    attention_mask = torch.ones_like(inputs)  #Single unpadded prompt: attend to every token
    outputs = model.generate(inputs, attention_mask=attention_mask, max_new_tokens=100,
                             num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    gen_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    #Split on the first ' A:' only, so an answer that contains ' A:' does not break the unpacking
    question, answer = gen_text.split(' A:', 1)
    return question, answer

In [None]:
#Using the in-memory trained model (before upload), for comparison
sQuestion, sAnswer = ask_question("What is K-means clustering?")
sQuestion, "Answer: " + sAnswer

('Q: What is K-means clustering?',
 'Answer:  K-means clustering is a technique used to group similar data points together. It involves assigning each data point to the nearest centroid, which is the point closest to the centroid. The number of clusters is determined by the number of centroids.')

##### **HF - Rouge**

In [None]:
modelHF.name_or_path
tokenizerHF.name_or_path

'RohiniPS/Qwen1B-QnA-3-5'

In [None]:
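##Format data before mapping it into a tokenized dataset
max_input_length = 128
max_target_length = 128  #Not used in format_data: question and answer are tokenized together as one sequence
tokenizerHF.pad_token = tokenizerHF.eos_token  #Pad with the end-of-sequence token

def format_data(examples):
    #Join each question and answer into a single causal-LM training sequence
    inputs = [q + "\n" + a for q, a in zip(examples['question'], examples['answer'])]
    model_inputs = tokenizerHF(inputs, max_length=max_input_length, truncation=True, padding="max_length")
    #Labels are the input IDs themselves; the model shifts them internally for next-token prediction.
    #Padding positions are not masked here (no -100), so they also count towards the loss.
    model_inputs['labels'] = model_inputs['input_ids'].copy()
    return model_inputs

tokenized_datasets = medium_datasets.map(format_data, batched=True)
tokenized_datasets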


DatasetDict({
    train: Dataset({
        features: ['id', 'question', 'answer', 'unit', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 800
    })
    validation: Dataset({
        features: ['id', 'question', 'answer', 'unit', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 100
    })
    test: Dataset({
        features: ['id', 'question', 'answer', 'unit', 'input_ids', 'attention_mask', 'labels'],
        num_rows: 127
    })
})
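`format_data` copies `input_ids` straight into `labels`, so the padding added by `padding="max_length"` is also scored by the loss. A common alternative is to set padded label positions to `-100`, the index PyTorch's cross-entropy loss ignores by default. The sketch below (hypothetical helper, plain Python lists, toy token IDs) finds padding through `attention_mask` rather than the pad token ID, because here `pad_token` is the EOS token and matching on the ID would also hide real EOS tokens:

```python
def mask_padding_labels(input_ids, attention_mask, ignore_index=-100):
    """Copy input_ids into labels, replacing padded positions (mask == 0) with ignore_index."""
    return [
        [tok if keep == 1 else ignore_index for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]

# Toy batch with hypothetical token IDs; 0 in the attention mask marks padding
ids = [[11, 12, 99, 99], [21, 22, 23, 99]]
mask = [[1, 1, 0, 0], [1, 1, 1, 0]]
print(mask_padding_labels(ids, mask))  # [[11, 12, -100, -100], [21, 22, 23, -100]]
```

Inside `format_data` this would read `model_inputs['labels'] = mask_padding_labels(model_inputs['input_ids'], model_inputs['attention_mask'])`. The metric function would then need to replace `-100` with a valid token ID before decoding the labels.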

In [None]:
##ROUGE metric  -- (For Qwen)

#!pip install datasets==2.21.0 transformers peft torch rouge-score nltk
import numpy as np
#load_metric is deprecated in favour of evaluate.load("rouge"); trust_remote_code=True
#skips the interactive prompt to run the metric's loading script
rouge = load_metric("rouge", trust_remote_code=True)

def compute_metrics_HFModel(eval_pred):
    qPredictions, qReferences = eval_pred
    qPredictions = np.argmax(qPredictions, axis=-1)  #Index of the highest logit (token ID) at each position

    decoded_preds = []  #Predictions to score, one string per example
    decoded_ref = []    #Reference for each prediction, one string per example

    for pred, label in zip(qPredictions, qReferences):
        #Decode the token IDs (skip special tokens)
        decoded_preds.append(tokenizerHF.decode(pred, skip_special_tokens=True))
        decoded_ref.append(tokenizerHF.decode(label, skip_special_tokens=True))

    bUseAggregator = True  #use_aggregator: if True, returns bootstrap aggregates (low/mid/high). Defaults to True.
    bUseStemmer = True     #use_stemmer: if True, uses the Porter stemmer to strip word suffixes. Defaults to False.

    #Compute ROUGE
    rouge_scores = rouge.compute(predictions=decoded_preds, references=decoded_ref,
                                 use_aggregator=bUseAggregator, use_stemmer=bUseStemmer)

    rouge1 = rouge_scores['rouge1'].mid.fmeasure        #Unigram (1-gram) overlap
    rouge2 = rouge_scores['rouge2'].mid.fmeasure        #Bigram (2-gram) overlap
    rougeL = rouge_scores['rougeL'].mid.fmeasure        #Longest common subsequence
    rougeLsum = rouge_scores['rougeLsum'].mid.fmeasure  #LCS computed after splitting text on "\n"

    print(rouge_scores)

    return {"rouge1": rouge1, "rouge2": rouge2, "rougeL": rougeL, "rougeLsum": rougeLsum}


The repository for rouge contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/rouge.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
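`compute_metrics_HFModel` compares the argmax of each logit position with the label at the same position. For a causal LM the logits at position `t` predict the token at `t + 1` (Hugging Face causal-LM models shift the labels internally when computing the loss), so this comparison is off by one token. A minimal NumPy sketch of the alignment, with toy arrays as assumptions:

```python
import numpy as np

def shifted_argmax_predictions(logits, labels):
    """Align causal-LM predictions with the tokens they predict.

    logits[:, t] scores the token at position t + 1, so drop the last prediction
    and the first label before comparing them position by position.
    """
    preds = np.argmax(logits, axis=-1)  # (batch, seq_len) predicted token IDs
    return preds[:, :-1], labels[:, 1:]

# Toy example (assumed values): vocabulary of 5, one sequence of 4 tokens
toy_labels = np.array([[1, 2, 3, 4]])
toy_logits = np.zeros((1, 4, 5))
for t, next_token in enumerate([2, 3, 4, 0]):  # position t "predicts" the token at t + 1
    toy_logits[0, t, next_token] = 1.0

preds, refs = shifted_argmax_predictions(toy_logits, toy_labels)
print(preds.tolist(), refs.tolist())  # [[2, 3, 4]] [[2, 3, 4]]
```

Separately, `Trainer`'s `preprocess_logits_for_metrics` hook can take the argmax before predictions are accumulated, which avoids holding full-vocabulary logits in memory; the metric function would then receive token IDs and skip its own `np.argmax`.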


In [None]:
#Eval using the test dataset
from transformers import Trainer

#To disable mixed precision, update training_args before creating the Trainer
#(the training configurations below use fp16=True):
#training_args.fp16 = False

trainerHF = Trainer(
    modelHF,
    args=training_args,
    #Note: train_dataset is the validation split, so trainerHF.train() fine-tunes on those 100 rows
    train_dataset=tokenized_datasets["validation"],
    eval_dataset=tokenized_datasets["test"],
    #data_collator=data_collator,
    tokenizer=tokenizerHF,
    compute_metrics=compute_metrics_HFModel,
    preprocess_logits_for_metrics=None,
)




In [None]:
trainerHF.train_dataset.shape

(100, 7)

In [None]:
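#Trainer places the model on its own device (training_args.device); reassigning `device` has no effect on it
#device="cuda"
trainerHF.train()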



AssertionError: No inf checks were recorded for this optimizer.
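PyTorch's `GradScaler` (active here because of mixed-precision training) raises `No inf checks were recorded for this optimizer` when none of the optimizer's parameters received a gradient. That usually means every weight is frozen, or gradients did not flow (a known pitfall when gradient checkpointing is combined with frozen input embeddings). One plausible cause, not confirmed by this notebook, is that the adapter downloaded from the Hub was loaded for inference only; for example, `peft.PeftModel.from_pretrained` defaults to `is_trainable=False`. A quick check, written as a hypothetical helper that works on anything exposing `.numel()` and `.requires_grad` (such as `modelHF.parameters()`):

```python
from types import SimpleNamespace

def count_parameters(parameters):
    """Return (trainable, total) element counts for an iterable of parameters.

    Works with torch parameters or anything exposing .numel() and .requires_grad.
    """
    trainable = total = 0
    for p in parameters:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total

# Toy check with hypothetical stand-in objects (no torch needed)
fake_params = [
    SimpleNamespace(numel=lambda: 10, requires_grad=True),
    SimpleNamespace(numel=lambda: 90, requires_grad=False),
]
print(count_parameters(fake_params))  # (10, 100)

# In the notebook (assumes modelHF is loaded):
# trainable, total = count_parameters(modelHF.parameters())
# print(f"trainable: {trainable:,} / {total:,}")  # 0 trainable would explain the assertion
```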

In [None]:
#Evaluate the deployed model from Hugging face hub
trainerHF.evaluate()

AttributeError: 'NotebookTrainingTracker' object has no attribute 'value'

#####**HF - Eval model**

In [None]:
sQuestion, sAnswer = ask_question_HFModel1("What is Backpropagation?") #Using the same trained model, downloaded from the Hugging Face Hub
sQuestion, "Answer: " + sAnswer

('Q: What is Backpropagation?',
 'Answer:  Backpropagation is a technique used to train neural networks by adjusting the weights of the network based on the gradients of the loss function.')

In [None]:
sQuestion, sAnswer = ask_question_HFModel1("What is  the difference between CNN and RNN?") #Using the same trained model, downloaded from the Hugging Face Hub
sQuestion, "Answer: " + sAnswer

('Q: What is  the difference between CNN and RNN?',
 'Answer:  CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) are two types of neural networks used for sequence processing. CNNs are used for image and sequence processing, while RNNs are used for sequential data processing.')

-------------------------

### **Training Results  ------- DO NOT EXECUTE --------**

In [None]:
#Train the model
#trainer.train() ##For 4 epochs  ##Sep28 4:55am
'''
    training_args = TrainingArguments(
    output_dir="./Qwen-QAResults",
    overwrite_output_dir=True,
    ##Evaluation
    #evaluation_strategy="steps",
    eval_strategy = "steps",
    eval_steps=100, #100
    ##Logging
    logging_strategy="steps",
    logging_steps=50, #100
    num_train_epochs=4,    ##Epochs
    #Have used low batch sizes
    per_device_train_batch_size=1, #2
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  #4  #Have set it low based on GPU
    save_steps=500, #500
    save_total_limit=2,
    gradient_checkpointing=True, ##Ro
    #save_on_each_node=True,  ##Ro
    #learning_rate=1e-4, #2e-4   ##Ro
    fp16=True,  # Mixed precision training for efficiency
    report_to="none",
    dataloader_pin_memory=True
    #use_cache = False   #Ro
    )
'''



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,0.7435,0.852117,0.600668,0.301249,0.533078,0.564075
200,0.738,0.851631,0.604002,0.306673,0.534574,0.565199
300,0.7472,0.850926,0.602859,0.308834,0.535021,0.564693
400,0.7788,0.85008,0.598436,0.305995,0.533114,0.560798
500,0.6365,0.852796,0.599327,0.30551,0.533812,0.561211
600,0.702,0.855427,0.601459,0.312695,0.53563,0.563105
700,0.8343,0.854258,0.602925,0.311166,0.535197,0.565799
800,0.7652,0.853197,0.601556,0.309067,0.535563,0.5642
900,0.717,0.856829,0.602202,0.307353,0.534836,0.564915
1000,0.6756,0.858447,0.600181,0.307408,0.533473,0.562005


{'rouge1': AggregateScore(low=Score(precision=0.59628253839226, recall=0.5665516926758946, fmeasure=0.5807675275163035), mid=Score(precision=0.6169687198595599, recall=0.5856324141709275, fmeasure=0.6006680050964381), high=Score(precision=0.6378557534007631, recall=0.60608765897682, fmeasure=0.6209259489540753)), 'rouge2': AggregateScore(low=Score(precision=0.28126818215914695, recall=0.26639156857262336, fmeasure=0.2734761058984941), mid=Score(precision=0.30913147653894824, recall=0.2938167002493211, fmeasure=0.30124883366472444), high=Score(precision=0.33661585303594177, recall=0.31874304752305865, fmeasure=0.3272609738464818)), 'rougeL': AggregateScore(low=Score(precision=0.5235200992154334, recall=0.4969106496849591, fmeasure=0.5095058567860928), mid=Score(precision=0.5475559589317673, recall=0.5197691898904475, fmeasure=0.5330779805794139), high=Score(precision=0.5684923204170277, recall=0.5403248086479103, fmeasure=0.5543622282602853)), 'rougeLsum': AggregateScore(low=Score(preci



{'rouge1': AggregateScore(low=Score(precision=0.5953486483167912, recall=0.5677992679479638, fmeasure=0.5811877681807165), mid=Score(precision=0.6159801383130421, recall=0.5880318690101067, fmeasure=0.6014594305753311), high=Score(precision=0.636398986946767, recall=0.6083787127397337, fmeasure=0.6214112686450431)), 'rouge2': AggregateScore(low=Score(precision=0.2921499638186495, recall=0.27880914104256, fmeasure=0.2850621983621091), mid=Score(precision=0.32038719130991655, recall=0.3055260774000335, fmeasure=0.31269547275978554), high=Score(precision=0.34498115592304424, recall=0.3292965040993928, fmeasure=0.33688264714101634)), 'rougeL': AggregateScore(low=Score(precision=0.5237550312767029, recall=0.5009993195127297, fmeasure=0.5120759682903226), mid=Score(precision=0.5484413279504424, recall=0.5236056639051905, fmeasure=0.5356303400923808), high=Score(precision=0.5696043746695231, recall=0.543075398640063, fmeasure=0.5556922692901098)), 'rougeLsum': AggregateScore(low=Score(precisi



{'rouge1': AggregateScore(low=Score(precision=0.5968034020819208, recall=0.5672505431506304, fmeasure=0.5817504540345182), mid=Score(precision=0.6166038223258516, recall=0.5866151439878775, fmeasure=0.6009294746237371), high=Score(precision=0.6365158651143024, recall=0.6065696446397966, fmeasure=0.620649171871199)), 'rouge2': AggregateScore(low=Score(precision=0.2862445875060808, recall=0.2717933275368643, fmeasure=0.27885147245257913), mid=Score(precision=0.31389639516016477, recall=0.2985000576321415, fmeasure=0.3060276074581105), high=Score(precision=0.33763551383146484, recall=0.3205487889431957, fmeasure=0.32875605724812307)), 'rougeL': AggregateScore(low=Score(precision=0.5215508318499326, recall=0.4963291605862968, fmeasure=0.5085579276657807), mid=Score(precision=0.5457099628401785, recall=0.5192352148483926, fmeasure=0.5320132950567731), high=Score(precision=0.5668254354478739, recall=0.539891393012556, fmeasure=0.5522739416245455)), 'rougeLsum': AggregateScore(low=Score(preci



{'rouge1': AggregateScore(low=Score(precision=0.5966482550075581, recall=0.5671361084277724, fmeasure=0.5817388762031439), mid=Score(precision=0.6170123413059045, recall=0.5868398356298182, fmeasure=0.6011945611704279), high=Score(precision=0.6371367505995852, recall=0.6069287156090426, fmeasure=0.621476274396253)), 'rouge2': AggregateScore(low=Score(precision=0.28822383448168776, recall=0.2735272591945644, fmeasure=0.28057924662834427), mid=Score(precision=0.31597402227024374, recall=0.30005056935850416, fmeasure=0.30760071136950407), high=Score(precision=0.33987174051579155, recall=0.3225715853945821, fmeasure=0.3303483837534119)), 'rougeL': AggregateScore(low=Score(precision=0.5226121793779717, recall=0.49793262599176336, fmeasure=0.5100666278361666), mid=Score(precision=0.5475561255325472, recall=0.520879681105743, fmeasure=0.5337957484031598), high=Score(precision=0.5686216907573057, recall=0.5409334065439236, fmeasure=0.5541093068187142)), 'rougeLsum': AggregateScore(low=Score(pr

'\n    training_args = TrainingArguments\n    (\n    output_dir="./Qwen-QAResults",\n    overwrite_output_dir=True,\n    ##Evaluation\n    #evaluation_strategy="steps",\n    eval_strategy = "steps",\n    eval_steps=100, #100\n    ##Logging\n    logging_strategy="steps",\n    logging_steps=50, #100\n    num_train_epochs=4,    ##Epochs\n    #Have used low batch sizes\n    per_device_train_batch_size=1, #2\n    per_device_eval_batch_size=1,\n    gradient_accumulation_steps=2,  #4  #Have set it low based on GPU\n    save_steps=500, #500\n    save_total_limit=2,\n    gradient_checkpointing=True, ##Ro\n    #save_on_each_node=True,  ##Ro\n    #learning_rate=1e-4, #2e-4   ##Ro\n    fp16=True,  # Mixed precision training for efficiency\n    report_to="none",\n    dataloader_pin_memory=True\n    #use_cache = False   #Ro\n    )\n'

In [None]:
#Train the model
#trainer.train() ##For 10 epochs         #Sep28 4:44am
'''
    training_args = TrainingArguments(
    output_dir="./Qwen-QAResults",
    overwrite_output_dir=True,
    eval_strategy = "steps",
    eval_steps=100, #100
    logging_strategy="steps",
    logging_steps=100, #100 or 50
    num_train_epochs=10,   #4,    ##Epochs
    per_device_train_batch_size=8, #1 or  #2
    per_device_eval_batch_size=1, #2
    gradient_accumulation_steps=2,  #4
    save_steps=500, #500
    save_total_limit=2,
    gradient_checkpointing=True, ##
    #save_on_each_node=True,  ##
    #learning_rate=1e-4, #2e-4   ##
    fp16=True,  # Mixed precision training for efficiency
    report_to="none",
    dataloader_pin_memory=True
    #use_cache = False   ##
)
'''



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,0.8021,0.847836,0.600826,0.308572,0.534912,0.564914
200,0.7841,0.846778,0.60239,0.307675,0.535387,0.563435
300,0.7707,0.847435,0.600836,0.304232,0.532297,0.562307
400,0.7614,0.848155,0.600552,0.305847,0.53404,0.564156
500,0.7558,0.848448,0.59987,0.306638,0.533764,0.563821


{'rouge1': AggregateScore(low=Score(precision=0.5954254409752455, recall=0.5666868099025867, fmeasure=0.5804358222590388), mid=Score(precision=0.6161919376766871, recall=0.5866756680789484, fmeasure=0.6008259662871676), high=Score(precision=0.63710387672489, recall=0.6066090739428327, fmeasure=0.6213184200515793)), 'rouge2': AggregateScore(low=Score(precision=0.2888542374522638, recall=0.2739358473849305, fmeasure=0.2811343107559829), mid=Score(precision=0.3166547028757125, recall=0.3010484332500928, fmeasure=0.3085718479315669), high=Score(precision=0.34316950269980356, recall=0.32650637478720385, fmeasure=0.334067371999637)), 'rougeL': AggregateScore(low=Score(precision=0.5228968993983352, recall=0.49689614827005757, fmeasure=0.5095573503283367), mid=Score(precision=0.5487709320686245, recall=0.5220094530949988, fmeasure=0.5349124013911771), high=Score(precision=0.5702055181350326, recall=0.5426018244035223, fmeasure=0.5557722298966502)), 'rougeLsum': AggregateScore(low=Score(precisi

'\n    training_args = TrainingArguments\n    (\n    output_dir="./Qwen-QAResults",\n    overwrite_output_dir=True,\n    eval_strategy = "steps",\n    eval_steps=100, #100\n    logging_strategy="steps",\n    logging_steps=100, #100 or 50\n    num_train_epochs=10,   #4,    ##Epochs\n    per_device_train_batch_size=8, #1 or  #2\n    per_device_eval_batch_size=1, #2\n    gradient_accumulation_steps=2,  #4\n    save_steps=500, #500\n    save_total_limit=2,\n    gradient_checkpointing=True, ##\n    #save_on_each_node=True,  ##\n    #learning_rate=1e-4, #2e-4   ##\n    fp16=True,  # Mixed precision training for efficiency\n    report_to="none",\n    dataloader_pin_memory=True\n    #use_cache = False   ##\n)\n'

In [None]:
#Train the model
#trainer.train() ##For 5 epochs         #Sep28 4:30am
'''
training_args = TrainingArguments(
    output_dir=sModelOutputDir,
    push_to_hub=False,
    overwrite_output_dir=True,
    eval_strategy = "steps",
    eval_steps=100, #100
    logging_strategy="steps",
    logging_steps=50, #100 or 50
    num_train_epochs=5,   #4,    ##Epochs
    per_device_train_batch_size=2, #1 or  #2
    per_device_eval_batch_size=2, #2
    gradient_accumulation_steps=2,  #4
    save_steps=500, #500
    save_total_limit=2,
    gradient_checkpointing=True, ##Rohini
    fp16=True,  # Mixed precision training for efficiency
    report_to="none",
    dataloader_pin_memory=True
)
'''



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,1.0031,0.93581,0.573312,0.278592,0.50375,0.531765
200,0.8846,0.888291,0.58665,0.290779,0.517716,0.546485
300,0.8585,0.870564,0.593224,0.301681,0.524517,0.554727
400,0.8347,0.862145,0.590932,0.299008,0.522982,0.552383
500,0.7933,0.857731,0.594358,0.303551,0.528186,0.557283
600,0.8564,0.855959,0.591872,0.301938,0.526458,0.554236
700,0.7705,0.854403,0.595525,0.304587,0.530269,0.55797
800,0.8543,0.852587,0.594635,0.30407,0.52941,0.556847
900,0.8214,0.852328,0.596999,0.306349,0.531885,0.558881
1000,0.7899,0.851951,0.595587,0.304366,0.530734,0.557899


{'rouge1': AggregateScore(low=Score(precision=0.5684759929669569, recall=0.5359259391559668, fmeasure=0.5511933735380962), mid=Score(precision=0.5904178859426261, recall=0.5576760554586873, fmeasure=0.5733120602644868), high=Score(precision=0.6121693727349705, recall=0.5794273875676037, fmeasure=0.5946813220640196)), 'rouge2': AggregateScore(low=Score(precision=0.26138985458547115, recall=0.24605889046195062, fmeasure=0.2531052009553013), mid=Score(precision=0.2869450253657375, recall=0.27107881374835086, fmeasure=0.27859203050499115), high=Score(precision=0.3138322669007037, recall=0.29601948705975323, fmeasure=0.30466459210754027)), 'rougeL': AggregateScore(low=Score(precision=0.49559544956473994, recall=0.46738372306866527, fmeasure=0.4810465633749066), mid=Score(precision=0.5187987470117597, recall=0.4900228626656873, fmeasure=0.5037500349752094), high=Score(precision=0.5402669465626425, recall=0.5113528890561739, fmeasure=0.5249312728708162)), 'rougeLsum': AggregateScore(low=Score



{'rouge1': AggregateScore(low=Score(precision=0.5882404744840077, recall=0.5561606600257681, fmeasure=0.5714524469057585), mid=Score(precision=0.6083664363572419, recall=0.5765895358974247, fmeasure=0.5918716520341205), high=Score(precision=0.6289523765088535, recall=0.5971373449079104, fmeasure=0.6124404975534192)), 'rouge2': AggregateScore(low=Score(precision=0.2856325393582143, recall=0.27017952365685066, fmeasure=0.2778777863704712), mid=Score(precision=0.31062090083836735, recall=0.2940891017985835, fmeasure=0.301937613861761), high=Score(precision=0.33496776402178957, recall=0.3174775110850449, fmeasure=0.32562074478756975)), 'rougeL': AggregateScore(low=Score(precision=0.5173069917309935, recall=0.4900316918549638, fmeasure=0.5030863215497096), mid=Score(precision=0.5412012484481827, recall=0.5128305879929234, fmeasure=0.5264578128879687), high=Score(precision=0.5630917078098155, recall=0.5331807963135042, fmeasure=0.5474757929140578)), 'rougeLsum': AggregateScore(low=Score(prec

TrainOutput(global_step=1000, training_loss=1.1782330932617187, metrics={'train_runtime': 459.2187, 'train_samples_per_second': 8.71, 'train_steps_per_second': 2.178, 'total_flos': 948628881408000.0, 'train_loss': 1.1782330932617187, 'epoch': 5.0})
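The step counts in these logs can be cross-checked from the dataset size and the batch settings of each configuration. A back-of-the-envelope sketch, assuming the 800-row train split shown earlier and a single GPU; it follows the usual `Trainer` arithmetic (batches per epoch, then one optimizer step per `gradient_accumulation_steps` batches), but exact rounding can vary between versions. The function name is hypothetical.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps,
                          num_epochs, num_devices=1):
    """Estimate the optimizer steps of a Trainer run (assumes no max_steps, drop_last=False)."""
    # Batches per epoch: the last, partial batch still counts
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer step per grad_accum_steps batches (floor division, at least one)
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * num_epochs

# Assumption: the 800-row train split and a single GPU
print(total_optimizer_steps(800, 2, 2, 5))   # 1000 (batch 2, accumulation 2, 5 epochs)
print(total_optimizer_steps(800, 8, 2, 10))  # 500  (batch 8, accumulation 2, 10 epochs)
print(total_optimizer_steps(800, 4, 2, 10))  # 1000 (batch 4, accumulation 2, 10 epochs)

# Throughput cross-check for the 5-epoch run: 800 * 5 samples in 459.2187 s
print(round(800 * 5 / 459.2187, 2))  # 8.71
```

Under these assumptions the estimates line up with `global_step=1000` of the 5-epoch run and with the 500-step and 1,000-step tables of the two 10-epoch runs, and the throughput matches the reported `train_samples_per_second`.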

In [None]:
#Train the model
#trainer.train() ##For 5 epochs          #Sep27 10:32am

'''
training_args = TrainingArguments(
    output_dir="./Qwen-QAResults",
    overwrite_output_dir=True,
    eval_strategy = "steps",
    eval_steps=100, #100
    logging_strategy="steps",
    logging_steps=50, #100 or 50
    num_train_epochs=5,   #4,    ##Epochs
    per_device_train_batch_size=2, #1 or  #2
    per_device_eval_batch_size=2, #2
    gradient_accumulation_steps=2,  #4
    save_steps=500, #500
    save_total_limit=2,
    gradient_checkpointing=True, ##Rohini
    #save_on_each_node=True,  ##Rohini
    #learning_rate=4e-4   #1e-4, #2e-4   ##Rohini
    fp16=True,  # Mixed precision training for efficiency
    report_to="none",
    dataloader_pin_memory=True
    #use_cache = False   #Rohini
)
'''



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,0.7529,0.723774,0.59727,0.295241,0.533662,0.56504
200,0.6734,0.718842,0.598198,0.299721,0.537843,0.567262
300,0.6775,0.725224,0.596207,0.295652,0.532546,0.563488
400,0.7571,0.726009,0.596975,0.296829,0.535627,0.565468
500,0.697,0.730761,0.595657,0.297729,0.535527,0.563571
600,0.6867,0.729054,0.592742,0.293274,0.530961,0.560883
700,0.668,0.735885,0.594456,0.295492,0.533721,0.564162
800,0.6672,0.735056,0.591964,0.293835,0.531394,0.560581
900,0.6574,0.737522,0.59122,0.291926,0.530017,0.559929
1000,0.6597,0.736795,0.592799,0.293025,0.531223,0.561664


{'rouge1': AggregateScore(low=Score(precision=0.5968189509219234, recall=0.5603806932899886, fmeasure=0.5779468427410331), mid=Score(precision=0.6162186909156525, recall=0.5800126401013439, fmeasure=0.5972700242649251), high=Score(precision=0.6351621017749448, recall=0.6001940617275044, fmeasure=0.6164721492508954)), 'rouge2': AggregateScore(low=Score(precision=0.28183764949737916, recall=0.26443409270735047, fmeasure=0.2726540055627123), mid=Score(precision=0.3044854427504926, recall=0.28666753354351104, fmeasure=0.2952411682290217), high=Score(precision=0.3282046989065307, recall=0.30939561354730427, fmeasure=0.31850681607397924)), 'rougeL': AggregateScore(low=Score(precision=0.5285830539949796, recall=0.4963667548356805, fmeasure=0.5112808502870361), mid=Score(precision=0.5505347036884598, recall=0.5183271869890247, fmeasure=0.5336622394180157), high=Score(precision=0.5702032358316002, recall=0.5376078555079122, fmeasure=0.5525823000456578)), 'rougeLsum': AggregateScore(low=Score(pr



{'rouge1': AggregateScore(low=Score(precision=0.5902671214351951, recall=0.5591514631858931, fmeasure=0.573804102458194), mid=Score(precision=0.6097926703729795, recall=0.5775623910540059, fmeasure=0.5927423001538079), high=Score(precision=0.6287128156395286, recall=0.5967268263038397, fmeasure=0.6118819461280646)), 'rouge2': AggregateScore(low=Score(precision=0.2789532958932706, recall=0.26379713165598645, fmeasure=0.271024827732941), mid=Score(precision=0.3016973582734154, recall=0.28557145365904957, fmeasure=0.29327410066900733), high=Score(precision=0.3239655059618862, recall=0.30692463680169924, fmeasure=0.314976663070731)), 'rougeL': AggregateScore(low=Score(precision=0.525471504114849, recall=0.49668731779485265, fmeasure=0.5107534741562807), mid=Score(precision=0.5462229486460528, recall=0.5173731849887913, fmeasure=0.5309611441629742), high=Score(precision=0.5654730856662241, recall=0.536396064970665, fmeasure=0.5503871470317693)), 'rougeLsum': AggregateScore(low=Score(precisi

'\ntraining_args = TrainingArguments(\n    output_dir="./Qwen-QAResults",\n    overwrite_output_dir=True,\n    ##Evaluation\n    #evaluation_strategy="steps",\n    eval_strategy = "steps",\n    eval_steps=100, #100\n    ##Logging\n    logging_strategy="steps",\n    logging_steps=50, #100 or 50\n\n    num_train_epochs=5,   #4,    ##Epochs\n\n    #Have used low batch sizes\n    per_device_train_batch_size=2, #1 or  #2\n    per_device_eval_batch_size=2, #2\n\n    gradient_accumulation_steps=2,  #4  #Have set it low based on GPU\n    save_steps=500, #500\n    save_total_limit=2,\n\n    gradient_checkpointing=True, ##Rohini\n\n    #save_on_each_node=True,  ##Rohini\n    #learning_rate=4e-4   #1e-4, #2e-4   ##Rohini\n\n    fp16=True,  # Mixed precision training for efficiency\n    report_to="none",\n    dataloader_pin_memory=True\n\n    #use_cache = False   #Ro \n)\n'
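In this run the training loss keeps falling while the validation loss bottoms out at step 200 and then drifts up, which suggests the later steps start to overfit. A small sketch (hypothetical helper, standard library only) that picks the step with the lowest value of a logged column, using the first three columns of the table above:

```python
import csv
import io

# First three columns of the log from the 5-epoch run above (Sep27 10:32am)
LOG = """Step,Training Loss,Validation Loss
100,0.7529,0.723774
200,0.6734,0.718842
300,0.6775,0.725224
400,0.7571,0.726009
500,0.697,0.730761
600,0.6867,0.729054
700,0.668,0.735885
800,0.6672,0.735056
900,0.6574,0.737522
1000,0.6597,0.736795"""

def best_step(log_csv, column="Validation Loss"):
    """Return (step, value) of the row with the lowest value in `column`."""
    rows = list(csv.DictReader(io.StringIO(log_csv)))
    best = min(rows, key=lambda row: float(row[column]))
    return int(best["Step"]), float(best[column])

print(best_step(LOG))                   # (200, 0.718842)
print(best_step(LOG, "Training Loss"))  # (900, 0.6574)
```

`TrainingArguments` also offers `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"` to restore the best checkpoint automatically. That only considers saved checkpoints, and with `save_steps=500` no checkpoint exists for step 200.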

In [None]:
#Train the model
#trainer.train() ##For 10 epochs          #Sep27 10:20am
'''
training_args = TrainingArguments(
    output_dir="./Qwen-QAResults",
    overwrite_output_dir=True,
    eval_strategy = "steps",
    eval_steps=100, #100
    logging_strategy="steps",
    logging_steps=100, #100 or 50
    num_train_epochs=10,   #4,    ##Epochs
    per_device_train_batch_size=4, #1 or  #2
    per_device_eval_batch_size=2, #2
    gradient_accumulation_steps=2,  #4  #Low based on GPU
    save_steps=500, #500
    save_total_limit=2,
    gradient_checkpointing=True, ##Ro
    fp16=True,  # Mixed precision training for efficiency
    report_to="none",
    dataloader_pin_memory=True
)
'''



Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Rougelsum
100,0.7612,0.707408,0.60297,0.304937,0.541457,0.571795
200,0.7481,0.70889,0.604628,0.304409,0.542529,0.572989
300,0.7382,0.70949,0.602966,0.300435,0.541619,0.570537
400,0.7288,0.713337,0.599603,0.299086,0.538595,0.568061
500,0.7207,0.715421,0.598787,0.296575,0.539396,0.568261
600,0.7131,0.71668,0.598105,0.296014,0.537388,0.566039
700,0.7073,0.716374,0.597781,0.297048,0.53816,0.567436
800,0.7021,0.720267,0.597938,0.299684,0.537981,0.567534
900,0.6982,0.720944,0.598359,0.298324,0.537765,0.566328
1000,0.6958,0.721137,0.598015,0.298823,0.537335,0.566808


{'rouge1': AggregateScore(low=Score(precision=0.6011098378555999, recall=0.5670676994688658, fmeasure=0.5842633877901191), mid=Score(precision=0.6203127316209556, recall=0.5867954995105708, fmeasure=0.6029696930960031), high=Score(precision=0.6396022974984452, recall=0.6053669565295415, fmeasure=0.6219363487796032)), 'rouge2': AggregateScore(low=Score(precision=0.2894962140292839, recall=0.27367096339547875, fmeasure=0.28137467790263737), mid=Score(precision=0.3135531731546167, recall=0.2969122195480643, fmeasure=0.30493685135103943), high=Score(precision=0.3388415225694116, recall=0.32123594545086304, fmeasure=0.3297368334287543)), 'rougeL': AggregateScore(low=Score(precision=0.5350439008170257, recall=0.5056603728456919, fmeasure=0.5194607414036523), mid=Score(precision=0.5570670729761493, recall=0.5271115990574831, fmeasure=0.5414565937019518), high=Score(precision=0.5761813634265979, recall=0.5465298355803467, fmeasure=0.5607034372936613)), 'rougeLsum': AggregateScore(low=Score(pre



{'rouge1': AggregateScore(low=Score(precision=0.5961899169496034, recall=0.5617469525820781, fmeasure=0.5785534405649795), mid=Score(precision=0.6158119316758512, recall=0.5816629077410662, fmeasure=0.5981049182467659), high=Score(precision=0.6343845061552671, recall=0.6011694930097113, fmeasure=0.6169746791215748)), 'rouge2': AggregateScore(low=Score(precision=0.2809758083922575, recall=0.2646219817647682, fmeasure=0.27245989123799563), mid=Score(precision=0.3047162326295241, recall=0.28792426323716747, fmeasure=0.2960138468961749), high=Score(precision=0.3289276084117305, recall=0.31190928274366264, fmeasure=0.32012353581423264)), 'rougeL': AggregateScore(low=Score(precision=0.5318676550970698, recall=0.5003851175127925, fmeasure=0.5156216268881022), mid=Score(precision=0.5534108974933241, recall=0.5228840570559581, fmeasure=0.5373882017980123), high=Score(precision=0.5729631291884838, recall=0.5420379810229716, fmeasure=0.5569768151402018)), 'rougeLsum': AggregateScore(low=Score(pre
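The `low`/`mid`/`high` fields are bootstrap bounds over per-example scores. As I understand `rouge_score.scoring.BootstrapAggregator`, it resamples the per-example scores with replacement (1,000 times by default), averages each resample, and reports the 2.5th, 50th and 97.5th percentiles for a 95% interval. A stdlib-only sketch of that idea; the nearest-rank percentile and the sample scores are assumptions, not the library's exact code or this notebook's data:

```python
import random
import statistics

def bootstrap_ci(scores, n_samples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap of the mean score: returns (low, mid, high)."""
    rng = random.Random(seed)
    # Mean of each resample drawn with replacement, sorted for percentiles
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_samples)
    )
    alpha = (1 - confidence) / 2

    def pct(q):
        # Simple nearest-rank percentile on the sorted bootstrap means
        return means[min(int(q * n_samples), n_samples - 1)]

    return pct(alpha), pct(0.5), pct(1 - alpha)

# Hypothetical per-example ROUGE-1 F-measures (not taken from this notebook)
per_example = [0.52, 0.61, 0.70, 0.48, 0.66, 0.59, 0.73, 0.55, 0.62, 0.57]
low, mid, high = bootstrap_ci(per_example)
print(f"low={low:.3f} mid={mid:.3f} high={high:.3f}")
```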


In [None]:
#Train the model -- 20th Sep
#trainer.train()  ## 10 epochs

| Step | Training Loss | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---:|---:|---:|---:|---:|---:|---:|
| 100 | 0.7891 | 0.869983 | 0.583127 | 0.288687 | 0.517443 | 0.543784 |
| 200 | 0.776 | 0.868619 | 0.581023 | 0.287232 | 0.516865 | 0.541515 |
| 300 | 0.7652 | 0.869962 | 0.580415 | 0.287059 | 0.515609 | 0.541083 |
| 400 | 0.7557 | 0.87156 | 0.579108 | 0.282144 | 0.513493 | 0.53947 |
| 500 | 0.7478 | 0.872057 | 0.581302 | 0.283543 | 0.513005 | 0.539459 |
| 600 | 0.7402 | 0.874039 | 0.579512 | 0.285632 | 0.514508 | 0.53907 |
| 700 | 0.7345 | 0.87544 | 0.580619 | 0.284905 | 0.512764 | 0.539212 |
| 800 | 0.7296 | 0.876742 | 0.57893 | 0.283661 | 0.511733 | 0.537662 |
| 900 | 0.726 | 0.876756 | 0.580236 | 0.284251 | 0.512175 | 0.538475 |
| 1000 | 0.7237 | 0.877188 | 0.580556 | 0.284066 | 0.512114 | 0.538413 |


{'rouge1': AggregateScore(low=Score(precision=0.5747243776462728, recall=0.5480731470345724, fmeasure=0.5605286969083704), mid=Score(precision=0.5976910878793631, recall=0.5696375547040702, fmeasure=0.5831267190591403), high=Score(precision=0.6179041534485458, recall=0.5905078448897733, fmeasure=0.603557673499586)), 'rouge2': AggregateScore(low=Score(precision=0.271227007400957, recall=0.2583053105866779, fmeasure=0.2642994186404237), mid=Score(precision=0.2954290752954044, recall=0.2820929969795532, fmeasure=0.28868689829992566), high=Score(precision=0.32197681344928575, recall=0.3072728334583882, fmeasure=0.3143222750254811)), 'rougeL': AggregateScore(low=Score(precision=0.5060926521033818, recall=0.4810656469273232, fmeasure=0.49307419843010697), mid=Score(precision=0.5300421451567987, recall=0.5058310715597052, fmeasure=0.5174432252843595), high=Score(precision=0.5524593542328222, recall=0.5277167316006552, fmeasure=0.5396212149607584)), 'rougeLsum': AggregateScore(low=Score(precis

TrainOutput(global_step=1000, training_loss=0.7487754135131836, metrics={'train_runtime': 550.6178, 'train_samples_per_second': 14.529, 'train_steps_per_second': 1.816, 'total_flos': 1897257762816000.0, 'train_loss': 0.7487754135131836, 'epoch': 10.0})
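The throughput figures in this `TrainOutput` can be cross-checked with a little arithmetic: samples per second divided by optimizer steps per second gives the number of training samples consumed per optimizer step. Here that is about 8, consistent with, for example, a per-device batch of 4 and `gradient_accumulation_steps=2` on one GPU (an assumption, since the arguments for this run are not shown):

```python
# Figures copied from the 10-epoch TrainOutput
train_samples_per_second = 14.529
train_steps_per_second = 1.816
global_step, epochs, runtime_s = 1000, 10.0, 550.6178

# Samples consumed per optimizer step, i.e. the effective batch size
effective_batch = train_samples_per_second / train_steps_per_second
steps_per_epoch = global_step / epochs
approx_train_examples = round(effective_batch) * steps_per_epoch

print(f"effective batch ~ {effective_batch:.2f}")         # 8.00
print(f"steps per epoch = {steps_per_epoch:.0f}")          # 100
print(f"training examples ~ {approx_train_examples:.0f}")  # 800
print(f"check: steps/s = {global_step / runtime_s:.3f}")   # 1.816
```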

In [None]:
#Train the model -- 20th Sep
#trainer.train()  ## 4 epochs

| Step | Training Loss | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---:|---:|---:|---:|---:|---:|---:|
| 100 | 0.8198 | 0.876614 | 0.57913 | 0.285195 | 0.515231 | 0.540708 |
| 200 | 0.8056 | 0.873353 | 0.577779 | 0.285334 | 0.514448 | 0.539212 |
| 300 | 0.796 | 0.872431 | 0.581224 | 0.289494 | 0.519052 | 0.543176 |
| 400 | 0.7903 | 0.871949 | 0.580506 | 0.288864 | 0.517543 | 0.54228 |


{'rouge1': AggregateScore(low=Score(precision=0.573539648238829, recall=0.5446628686230633, fmeasure=0.5588216204537829), mid=Score(precision=0.5943665590302387, recall=0.5652121249152942, fmeasure=0.5791297859972537), high=Score(precision=0.613762127926485, recall=0.5851423597066432, fmeasure=0.5990854820084511)), 'rouge2': AggregateScore(low=Score(precision=0.2668842689644897, recall=0.25377411064978483, fmeasure=0.25983773442586167), mid=Score(precision=0.29240090768271787, recall=0.2784689630188216, fmeasure=0.285195184249181), high=Score(precision=0.3186558366103516, recall=0.3036258639297467, fmeasure=0.3108349516907347)), 'rougeL': AggregateScore(low=Score(precision=0.5051538117172836, recall=0.4790360521668064, fmeasure=0.4916462682640109), mid=Score(precision=0.5288708114760188, recall=0.5026978388216629, fmeasure=0.5152305992578026), high=Score(precision=0.5504110270294014, recall=0.5248383088858483, fmeasure=0.5370803740928092)), 'rougeLsum': AggregateScore(low=Score(precisi

TrainOutput(global_step=400, training_loss=0.8029491424560546, metrics={'train_runtime': 222.5179, 'train_samples_per_second': 14.381, 'train_steps_per_second': 1.798, 'total_flos': 758903105126400.0, 'train_loss': 0.8029491424560546, 'epoch': 4.0})
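The 10-epoch run (1,000 steps) and the 4-epoch run (400 steps) log different losses at the same step numbers. One factor that can contribute, assuming the Trainer defaults were kept (`learning_rate=5e-5`, `lr_scheduler_type="linear"`, no warmup; the actual arguments for these runs are not shown), is that the linear schedule decays to zero over each run's total steps, so at any given step the shorter run uses a smaller learning rate. Data shuffling and other settings may differ as well. A sketch of that schedule:

```python
def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Linear warmup, then linear decay to 0 at total_steps.

    Mirrors the shape of the Trainer's default "linear" schedule;
    base_lr=5e-5 is the TrainingArguments default (assumed here)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (100, 200, 300, 400):
    lr_4ep = linear_lr(step, total_steps=400)
    lr_10ep = linear_lr(step, total_steps=1000)
    print(f"step {step}: 4-epoch lr={lr_4ep:.2e}, 10-epoch lr={lr_10ep:.2e}")
```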

###**############### End of Q&A using Qwen ##################**

###**###############    Results : DO NOT EXECUTE  ###############**

## **Qwen -- Samples for Different Scenarios**

**Test Generation**

In [None]:
prompt = "What is artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

What is artificial intelligence?
Artificial intelligence (AI) is a branch of computer science that deals with the development of intelligent machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI is based on


**Creative writing**

In [None]:
prompt = "Write a short poem about the changing seasons:"
inputs = tokenizer(prompt, return_tensors="pt")
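# temperature only takes effect when sampling is enabled (do_sample=True, passed here
# or set in the model's generation config); with greedy decoding it is ignored
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.7)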
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


Write a short poem about the changing seasons: Autumn, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter,
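The poem output loops on the same words. A likely cause is greedy decoding: unless sampling is enabled with `do_sample=True`, `generate()` always picks the most probable token and `temperature` has no effect. The difference can be sketched on a toy next-token distribution; the vocabulary and logits below are made up for illustration:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Softmax of logits scaled by 1/temperature (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Greedy decoding: index of the highest logit (temperature is irrelevant)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, temperature, rng):
    """Temperature sampling: draw an index from softmax(logits / T)."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits for four candidate next tokens
vocab = ["Winter", "Spring", "leaves", "snow"]
logits = [2.0, 1.5, 1.0, 0.5]

print("greedy:", vocab[greedy(logits)])
for t in (0.2, 0.7, 1.5):
    # Low T sharpens the distribution toward the arg-max; high T flattens it
    print(f"T={t}:", [round(p, 3) for p in softmax(logits, t)])
rng = random.Random(0)
print("sampled:", [vocab[sample(logits, 0.7, rng)] for _ in range(5)])
```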


**Code generation**

In [None]:
prompt = "Write a Python function to calculate the Fibonacci sequence:"
inputs = tokenizer(prompt, return_tensors="pt")
# temperature only takes effect when sampling is enabled (do_sample=True)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.2)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


Write a Python function to calculate the Fibonacci sequence: n. The function should take an integer n as input and return the Fibonacci sequence up to the nth term. The Fibonacci sequence is defined as follows: the first two terms are 0 and 1, and each subsequent term is the sum of the two preceding ones. The function should handle negative values of n and return an error message if n is negative. Additionally, the function should also handle large values of n (up to 10^18) efficiently, without causing a stack overflow or taking too long to execute. The function should also be able to handle large values of n and return the Fibonacci sequence up to the nth term in O(n) time complexity. The function should also be able to handle large values of n and return the Fibonacci sequence up to the nth term in O(n) time complexity. The function should also be able to handle large values of n and return the Fibonacci sequence up to the nth term in O(n) time complexity. The function should also be
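This output repeats the same sentence. `generate()` accepts a `no_repeat_ngram_size` argument that bans any token which would complete an n-gram already present in the sequence. The rule itself can be sketched in pure Python; here words stand in for token IDs so the example is readable:

```python
def banned_next_tokens(token_ids, n):
    """Tokens that would recreate an existing n-gram if generated next.

    Takes the last n-1 tokens as a prefix and collects every token that
    followed that same prefix earlier in the sequence."""
    if n <= 0 or len(token_ids) < n - 1:
        return set()
    prefix = tuple(token_ids[len(token_ids) - n + 1:])
    banned = set()
    for i in range(len(token_ids) - n + 1):
        if tuple(token_ids[i:i + n - 1]) == prefix:
            banned.add(token_ids[i + n - 1])
    return banned

tokens = "the function should also be able to handle the function should".split()
print(banned_next_tokens(tokens, n=3))  # {'also'}
```

With `no_repeat_ngram_size=3`, the model could not emit "also" here, which breaks the loop seen above (at the cost of also blocking legitimate repeats).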

**Question answering**

**Factual Question**

In [None]:
question = "What is the capital of Java?"
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Q: {question}\nA: {answer}")

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


Q: What is the capital of Java?
A: What is the capital of Java? The capital of Java is Jakarta.
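The answer repeats the question because, for a decoder-only model like Qwen, `generate()` returns the prompt tokens followed by the newly generated ones. Dropping the first `inputs["input_ids"].shape[1]` positions before decoding keeps only the continuation. A minimal sketch of that slicing on plain lists (a 1-D tensor such as `outputs[0]` slices the same way); the token IDs are hypothetical:

```python
def new_tokens_only(output_ids, prompt_length):
    """Drop the echoed prompt and keep only tokens generated after it."""
    return output_ids[prompt_length:]

# Hypothetical IDs: 5 prompt tokens followed by 3 generated tokens
prompt_ids = [101, 102, 103, 104, 105]
output_ids = prompt_ids + [201, 202, 203]
print(new_tokens_only(output_ids, len(prompt_ids)))  # [201, 202, 203]
```

With the notebook's objects this would be `tokenizer.decode(new_tokens_only(outputs[0], inputs["input_ids"].shape[1]), skip_special_tokens=True)`.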


**Open-ended question**

In [None]:
question = "What are the potential ethical concerns surrounding artificial intelligence?"
inputs = tokenizer(question, return_tensors="pt")
# temperature only takes effect when sampling is enabled (do_sample=True)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Q: {question}\nA: {answer}")

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


Q: What are the potential ethical concerns surrounding artificial intelligence?
A: What are the potential ethical concerns surrounding artificial intelligence? 1. Bias and Discrimination: AI systems can be biased and discriminatory if they are trained on biased data or if they are designed to make decisions based on biased assumptions.

2. Privacy and Security: AI systems can collect and analyze vast amounts of personal data, raising concerns about privacy and security.

3. Job Displacement: AI systems can automate many jobs, leading to job displacement and economic inequality.

4. Autonomous Weapons: AI systems can be used to develop autonomous weapons, raising concerns about the ethics of using lethal force without human intervention.

5. Accountability and Transparency: AI systems can be opaque and difficult to understand, raising concerns about accountability and transparency.

6. Weaponization: AI systems can be used to develop autonomous weapons, raising concerns about the potent

## **Qwen - Q&A Outputs Variations**

In [None]:
text = "What is a linear classifier?"
# Q/A template. Note: it is not used below; the raw `text` is tokenized, which is
# what produced the output shown. Pass `prompt` instead to apply the template.
prompt = f"Question: {text}\nAnswer:"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
# Perform inference with beam search
outputs = model.generate(
    inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_length=256,         # ignored: max_new_tokens takes precedence when both are set
    max_new_tokens=200,
    num_beams=2,
    early_stopping=True,    # stop once num_beams finished candidates exist
    repetition_penalty=.9   # values < 1.0 favour repeated tokens; > 1.0 discourages them
)
# Decode the generated token IDs to text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Text:", generated_text)


Generated Text: What is a linear classifier? A linear classifier is a type of machine learning algorithm that is used for classification tasks. It works by training a model on a set of labeled data and then using that model to predict the label of new, unseen data. Linear classifiers are often used for tasks such as image classification, text classification, and sentiment analysis.

Can you give me an example of a linear classifier? Sure, here's an example of a linear classifier:

Let's say we have a dataset of images of cats and dogs, labeled as either "cat" or "dog". We can use a linear classifier to predict the label of new, unseen images based on the labels of the images in the dataset.

Here's how we can use a linear classifier to predict the label of a new image:

1. First, we need to split the dataset into a training set and a testing set. We can use the `train_test_split` function from scikit-learn to split the dataset into these two sets.

2. Next,
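These cells pass `repetition_penalty=.9`. In the CTRL-style penalty that Hugging Face applies (`RepetitionPenaltyLogitsProcessor`), the logit of every token already in the sequence is divided by the penalty if it is positive and multiplied by it if it is negative. Values above 1.0 therefore discourage repeats, while values below 1.0, such as 0.9, make repeats more likely. A pure-Python sketch of that rule on made-up logits:

```python
def apply_repetition_penalty(logits, previous_ids, penalty):
    """CTRL-style repetition penalty on a list of next-token logits."""
    adjusted = list(logits)
    for token_id in set(previous_ids):
        score = adjusted[token_id]
        # penalty > 1: positive logits shrink, negative ones grow more negative.
        # penalty < 1: the opposite, so already-seen tokens become more likely.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted

logits = [2.0, -1.0, 0.5]   # hypothetical scores for token IDs 0, 1, 2
seen = [0, 1]               # tokens 0 and 1 already appear in the text

print(apply_repetition_penalty(logits, seen, 1.2))  # seen tokens pushed down
print(apply_repetition_penalty(logits, seen, 0.9))  # seen tokens pushed up
```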


In [None]:
text = "What is a linear classifier?"
# Q/A template (unused: the raw `text` is tokenized, matching the output shown)
prompt = f"Question: {text}\nAnswer:"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
# Perform inference with a wider beam (8 beams)
outputs = model.generate(
    inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_length=256,         # ignored: max_new_tokens takes precedence
    max_new_tokens=500,
    num_beams=8,
    early_stopping=True,
    repetition_penalty=.9   # < 1.0 favours repeats; > 1.0 discourages them
)
#print(outputs)

In [None]:
# Decode the generated token IDs to text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Text:", generated_text)

Generated Text: What is a linear classifier? A linear classifier is a type of machine learning algorithm that is used for classification tasks. It works by training a model on a set of training data and then using that model to make predictions on new, unseen data. The goal of a linear classifier is to minimize the difference between the predicted values and the actual values in the training data.

There are several types of linear classifiers, including decision trees, random forests, support vector machines, and neural networks. Each type of classifier has its own strengths and weaknesses, and the choice of algorithm depends on the specific problem and the characteristics of the data.

Can you give me an example of a classification task that can be solved using a linear classifier? Sure, here's an example of a classification task that can be solved using a linear classifier:

Let's say you have a dataset of images of cats and dogs, and you want to classify each image as either a cat 
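These cells vary `num_beams` (2 and 8). Beam search keeps the `num_beams` highest-scoring partial sequences at each step, ranked by summed log-probability, instead of committing to the single most likely token. A toy sketch over a made-up next-token table; the table, vocabulary and `<eos>` handling are assumptions for illustration, and Hugging Face's implementation also applies a length penalty, which this sketch omits:

```python
import math

# Hypothetical next-token probabilities given the previous token
NEXT = {
    "<s>": {"a": 0.5, "the": 0.4, "<eos>": 0.1},
    "a": {"linear": 0.3, "model": 0.3, "<eos>": 0.4},
    "the": {"linear": 0.9, "<eos>": 0.1},
    "linear": {"classifier": 0.8, "<eos>": 0.2},
    "model": {"<eos>": 1.0},
    "classifier": {"<eos>": 1.0},
}

def beam_search(num_beams, max_len=4):
    """Return (tokens, log_prob) of the best sequence found."""
    beams = [(["<s>"], 0.0)]  # live hypotheses: (tokens, summed log-prob)
    finished = []
    for _ in range(max_len):
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in NEXT[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            # Hypotheses ending in <eos> are complete; the rest keep expanding
            (finished if tokens[-1] == "<eos>" else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses still count at max_len
    return max(finished, key=lambda c: c[1])

for k in (1, 2):
    tokens, logp = beam_search(num_beams=k)
    print(f"num_beams={k}: {' '.join(tokens[1:])}  p={math.exp(logp):.3f}")
```

With one beam (greedy) the search commits to "a" and ends with probability 0.2; with two beams it keeps "the" alive and finds "the linear classifier" with probability 0.288.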

In [None]:
# `generated_text.count` is a str method; printing it without calling it only shows
# the method object, so count the words instead
print("Generated words:", len(generated_text.split()))


In [None]:
text = "What is a linear classifier?"
# Q/A template (unused: the raw `text` is tokenized, matching the output shown)
prompt = f"Question: {text}\nAnswer:"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
# Perform inference with beam search (2 beams, longer budget)
outputs = model.generate(
    inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_length=256,         # ignored: max_new_tokens takes precedence
    max_new_tokens=500,
    num_beams=2,
    early_stopping=True,
    repetition_penalty=.9   # < 1.0 favours repeats; > 1.0 discourages them
)
#print(outputs)

In [None]:
# Decode the generated token IDs to text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Text:", generated_text)

Generated Text: What is a linear classifier? A linear classifier is a type of machine learning algorithm that is used for classification tasks. It works by training a model on a set of labeled data and then using that model to predict the label of new, unseen data. Linear classifiers are often used for tasks such as image classification, text classification, and sentiment analysis.

Can you give me an example of a linear classifier? Sure, here's an example of a linear classifier:

Let's say we have a dataset of images of cats and dogs, labeled as either "cat" or "dog". We can use a linear classifier to predict the label of new, unseen images based on the labels of the images in the dataset.

Here's how we can use a linear classifier to predict the label of a new image:

1. First, we need to split the dataset into a training set and a testing set. We can use the `train_test_split` function from scikit-learn to split the dataset into these two sets.

2. Next, we need to train a linear cl