### Changing to the main directory

In [1]:
%cd ..

/home/isham/Desktop/machine-learning-projects/fine-tuning-q-and-a


### Importing Necessary Libraries

In [2]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

from utils import FINAL_MODEL_PATH, DEVICE, BASE_MODEL_PATH, MODEL_ID, PROCESSED_DATA_DIR, SENTENCE_EMBEDDING_MODEL
from utils import full_prompt, clear_gpu_memory, generate_response, cosine_similarity
from sentence_transformers import SentenceTransformer

import pandas as pd

In [3]:
print(full_prompt("What is cuda insight?"))

According to the following question:

What is cuda insight?
Answer:
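
The printed prompt suggests that `full_prompt` wraps the question in a fixed template. The helper itself lives in `utils` and is not shown in this notebook, so the following is only a minimal sketch of what it might look like. The name `full_prompt_sketch` and the exact whitespace are assumptions inferred from the output above.

```python
def full_prompt_sketch(question: str) -> str:
    """Hypothetical stand-in for utils.full_prompt.

    The template text is copied from the printed output; the exact
    newlines are a guess, not the author's confirmed implementation.
    """
    return f"According to the following question:\n\n{question}\nAnswer:\n"


if __name__ == "__main__":
    print(full_prompt_sketch("What is cuda insight?"))
```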




### Loading Trained Model

In [4]:
trained_model = AutoModelForSeq2SeqLM.from_pretrained(FINAL_MODEL_PATH).to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained(FINAL_MODEL_PATH)
clear_gpu_memory()

In [5]:
generate_response(model=trained_model, tokenizer=tokenizer, prompt="What is cuda insight?")

'cuda insight is a technology that provides insights into the behavior of cuda threads and their interactions enabling developers to optimize and optimize their cuda applications'

Let's compare this with the original base model.

In [6]:
original_model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL_PATH).to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
clear_gpu_memory()

In [7]:
generate_response(model=original_model, tokenizer=tokenizer, prompt="What is cuda insight?")

'(iv)'

Let's pick a few examples from the test dataset.

In [8]:
test_df = pd.read_csv(f"{PROCESSED_DATA_DIR}/test.csv")
test_sample = test_df.sample(n=5, random_state=42)
del test_df


In [10]:
test_sample['generated_answer_original'] = test_sample['question'].apply(lambda x: generate_response(model=original_model, tokenizer=tokenizer, prompt=full_prompt(x)))
test_sample['generated_answer_trained'] = test_sample['question'].apply(lambda x: generate_response(model=trained_model, tokenizer=tokenizer, prompt=full_prompt(x)))

In [11]:
test_sample

| Unnamed: 0 | question | answer | generated_answer_original | generated_answer_trained |
| --- | --- | --- | --- | --- |
| 732 | what principles guide the development and supp... | nvidia ai enterprise is guided by principles o... | (iv) | the nvidia ai enterprise aims to support the d... |
| 657 | how does unified memory simplify memory manage... | unified memory furnishes a unified memory spac... | (iv) | unified memory furnishes a unified virtual add... |
| 168 | how is the theoretical peak bandwidth of a gpu... | the theoretical peak bandwidth is calculated b... | (iv) | the theoretical peak bandwidth of a gpu is cal... |
| 86 | how can you define dependencies for a kit file | dependencies for a kit file are defined in the... | (iv) | dependencies for a kit file are defined in the... |
| 411 | how does hardwareaccelerated gpu scheduling af... | hardwareaccelerated gpu scheduling a model int... | (iv) | hardwareaccelerated gpu scheduling significant... |


In [12]:
del trained_model
del original_model
clear_gpu_memory()
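
Deleting the model references before calling `clear_gpu_memory()` matters: cached GPU memory can only be released once no Python object still points at the tensors. The helper is defined in `utils` and not shown here, so the following is a hedged sketch of a typical implementation. The name `clear_gpu_memory_sketch` is hypothetical, and the body assumes PyTorch is the backend.

```python
import gc


def clear_gpu_memory_sketch() -> None:
    """Hypothetical stand-in for utils.clear_gpu_memory."""
    # Collect unreachable Python objects first so their tensors are freed.
    gc.collect()
    try:
        import torch  # assumption: the models run on PyTorch
    except ImportError:
        return
    if torch.cuda.is_available():
        # Return cached, unused blocks from PyTorch's allocator to the driver.
        torch.cuda.empty_cache()


if __name__ == "__main__":
    clear_gpu_memory_sketch()
```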

Let's compare the similarity scores between the reference answers and the trained model's answers.

In [13]:
test_sample.drop(columns=['generated_answer_original'], inplace=True)

In [21]:
test_sample.reset_index(drop=True, inplace=True)

In [15]:
# Load the sentence transformer model
sentence_model = SentenceTransformer(SENTENCE_EMBEDDING_MODEL)

In [16]:
embeddings_original = sentence_model.encode(test_sample['answer'].tolist())
embeddings_generated = sentence_model.encode(test_sample['generated_answer_trained'].tolist())

In [17]:
# Calculate cosine similarities and add them to the dataframe
test_sample['cosine_similarity'] = [cosine_similarity(embeddings_original[i], embeddings_generated[i]) for i in range(len(test_sample))]
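
The `cosine_similarity` helper comes from `utils` and is not shown in this notebook. As a reminder of what the score measures, here is a dependency-free sketch of cosine similarity, the dot product of two vectors divided by the product of their lengths. The name `cosine_similarity_sketch` and the zero-vector handling are assumptions, not the author's implementation.

```python
import math
from typing import Sequence


def cosine_similarity_sketch(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity_sketch([1.0, 2.0], [2.0, 4.0]))  # parallel vectors
    print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Because the function only iterates over its inputs, it should also accept the one-dimensional NumPy arrays returned by `sentence_model.encode`.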

In [22]:
test_sample

| Unnamed: 0 | question | answer | generated_answer_trained | cosine_similarity |
| --- | --- | --- | --- | --- |
| 0 | what principles guide the development and supp... | nvidia ai enterprise is guided by principles o... | the nvidia ai enterprise aims to support the d... | 0.707371 |
| 1 | how does unified memory simplify memory manage... | unified memory furnishes a unified memory spac... | unified memory furnishes a unified virtual add... | 0.801617 |
| 2 | how is the theoretical peak bandwidth of a gpu... | the theoretical peak bandwidth is calculated b... | the theoretical peak bandwidth of a gpu is cal... | 0.688995 |
| 3 | how can you define dependencies for a kit file | dependencies for a kit file are defined in the... | dependencies for a kit file are defined in the... | 0.754526 |
| 4 | how does hardwareaccelerated gpu scheduling af... | hardwareaccelerated gpu scheduling a model int... | hardwareaccelerated gpu scheduling significant... | 0.806502 |


In [26]:
for i in range(5):
    print(f"Question {i+1}: {test_sample['question'][i]}")
    print(f"Actual Answer: {test_sample['answer'][i]}")
    print(f"Predicted Answer: {test_sample['generated_answer_trained'][i]}")
    print(f"Cosine Similarity: {test_sample['cosine_similarity'][i]}")
    print()

Question 1: what principles guide the development and support of nvidia ai enterprise
Actual Answer: nvidia ai enterprise is guided by principles of security stability api stability and enterprisegrade support
Predicted Answer: the nvidia ai enterprise aims to support the development and support of nvidia ai enterprise with a focus on enhancing enterprise support and fostering ai enterprise
Cosine Similarity: 0.7073711156845093

Question 2: how does unified memory simplify memory management in cuda
Actual Answer: unified memory furnishes a unified memory space accessible by gpus and cpus simplifying memory allocation and access across both processing units
Predicted Answer: unified memory furnishes a unified virtual address space allowing virtual memory allocations and allocations in cuda can be managed using unified memory this eliminates the need for explicit memory allocations in cuda cpu code
Cosine Similarity: 0.8016171455383301

Question 3: how is the theoretical peak bandwidth o

### Conclusion

- **Conclusion for Question 1:** The predicted answer captures the essence but misses specific details like "security" and "API stability." The emphasis on fostering and enhancing support aligns with the actual answer.
- **Conclusion for Question 2:** Both answers refer to the simplification of memory management with CUDA's unified memory, though the predicted answer adds the aspect of a virtual address space, which is accurate and provides a bit more technical insight.
- **Conclusion for Question 3:** The predicted answer is circular and doesn't provide a meaningful method for calculating bandwidth. The actual answer gives a clear method for calculation using specific properties.
- **Conclusion for Question 4:** Both answers provide a correct definition of where dependencies are defined in a kit file, with the predicted answer providing an example of a specific dependency.
- **Conclusion for Question 5:** The predicted answer is somewhat redundant and less specific than the actual answer. It states the effect on performance but not how it comes about: the details about reducing latency and improving throughput are missing.

Overall, the predicted answers show some alignment with the actual answers but tend to be less specific or to miss certain details. The model seems to understand the general context but lacks precision in technical specifics, and in one case (Question 3) the predicted answer was not useful. The cosine similarity scores, which range from about 0.69 to 0.81, reflect the varying degrees of alignment between the predicted and actual answers.
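
To put the five scores in one place, the short sketch below summarizes the cosine similarities copied from the results table. The variable names are illustrative, and a sample of five is far too small to support strong conclusions.

```python
from statistics import mean

# Cosine similarities copied from the results table (Questions 1 to 5).
scores = [0.707371, 0.801617, 0.688995, 0.754526, 0.806502]

mean_score = mean(scores)
lowest_question = scores.index(min(scores)) + 1   # 1-based question number
highest_question = scores.index(max(scores)) + 1

print(f"mean: {mean_score:.3f}")
print(f"lowest:  Question {lowest_question} ({min(scores):.3f})")
print(f"highest: Question {highest_question} ({max(scores):.3f})")
```

Note that the lowest score belongs to Question 3, the circular answer, yet it is still close to 0.69. This illustrates why embedding similarity alone may not separate a useful answer from an unhelpful one.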

### Suggestions for Improvement

- **Enhance Training Data:** Include more detailed and technical content in the training dataset to improve specificity.
- **Hyperparameter Search:** Try different combinations of hyperparameters (for example learning rate, batch size, and number of epochs) to improve technical accuracy and detail retention.
- **Use Domain-Specific Models:** Consider training or using a domain-specific model that specializes in technical and programming content.
- **Post-Processing Predictions:** Implement a post-processing step to check for and correct circular or redundant predictions.
- **Evaluation Metrics:** Use evaluation metrics beyond cosine similarity that better capture technical correctness, such as BLEU (originally designed for machine translation) or ROUGE, which measure n-gram overlap with the reference answer.
- **User Feedback Loop:** Incorporate user feedback to continuously improve model predictions over time.
- **Larger Base Model:** Use a base model with more parameters, e.g. flan-t5-large.
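
Two of the suggestions above, overlap-based metrics and a check for redundant predictions, can be prototyped without extra libraries. The sketch below is a simplified illustration, not a replacement for a proper BLEU or ROUGE implementation: `unigram_f1` is a ROUGE-1-style word-overlap score and `repetition_ratio` is a crude redundancy signal. Both function names and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def unigram_f1(reference: str, prediction: str) -> float:
    """F1 over word overlap between reference and prediction (ROUGE-1 style)."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def repetition_ratio(text: str) -> float:
    """Share of tokens that repeat an earlier token (0.0 means no repeats)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return 1.0 - len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # Question 1 from the test sample above.
    reference = ("nvidia ai enterprise is guided by principles of security "
                 "stability api stability and enterprisegrade support")
    prediction = ("the nvidia ai enterprise aims to support the development and "
                  "support of nvidia ai enterprise with a focus on enhancing "
                  "enterprise support and fostering ai enterprise")
    print(f"unigram F1: {unigram_f1(reference, prediction):.3f}")
    print(f"repetition ratio: {repetition_ratio(prediction):.3f}")
```

A high repetition ratio could be used to flag a prediction for regeneration or manual review; the threshold would need to be tuned on real outputs.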