In [None]:
# Dependencies
!pip install -q \
  transformers \
  datasets \
  peft \
  accelerate \
  huggingface_hub \
  fsspec==2024.12.0 \
  gcsfs==2024.12.0


In [None]:
# The First Working Fine-Tuning Code

# 1) Install python-docx
!pip install python-docx

from docx import Document
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, pipeline
from peft import PeftModel
import random



# Sample output from an earlier run (sampling with top_k = 50), kept for reference.
# Prompt: "Q: Based on the SDAccel guide, What are valid values for address qualifer?\nA:"
'''
Q: Based on the SDAccel guide, What are valid values for address qualifer?
A:
Answer: Valid values for address qualifier are:

1. No qualifier: This is the default value.
2. Q: This qualifier specifies that the address is a 32-bit word.
3. W: This qualifier specifies that the address is a 16-bit word.
4. D: This qualifier specifies that the address is a 8-bit word.
5. H: This qualifier specifies that the address is a 4-bit word.
6. L: This qualifier specifies that the address is a 2-bit word.
7. S: This qualifier specifies that the address is a 1-bit word.
8. M: This qualifier specifies that the address is a 0-bit word.
'''

# I hope this helps!


# Prompts in the same "Q: ...\nA:" format (plain strings; no f-string placeholders are needed)
questions = [
    "Q: Based on the SDAccel guide, Kernel name vs kernel vendor vs library?\nA:",
    "Q: Based on the SDAccel guide, What is local memory?\nA:",
    "Q: Based on the SDAccel guide, How do I compile an OpenCL kernel for Xilinx?\nA:",
]

# 2) Load tokenizer from your fine-tuned tokenizer directory
tokenizer = AutoTokenizer.from_pretrained(
    "/content/tokenizer",
    use_fast=False
)
tokenizer.pad_token = tokenizer.eos_token

# 3) Load base model & attach LoRA adapter
base_model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(
    model,
    "/content/lora_adapter"
)
model.to("cpu")

# 4) Configure generation
gen_cfg = GenerationConfig(
    do_sample=False,
    num_beams=5,
    early_stopping=True,
    repetition_penalty=1.1,
    no_repeat_ngram_size=3,
    max_new_tokens=100,
)

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg,
    device=-1
)

# 5) Run and print
for idx, prompt in enumerate(questions, 1):
    print(f"--- Prompt #{idx} ---")
    print(prompt)
    output = generator(prompt)[0]["generated_text"]
    # strip the prompt prefix if echoed
    answer = output[len(prompt):].strip()
    print("Answer:", answer, "\n")
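
Slicing with `len(prompt)` assumes the pipeline echoes the prompt verbatim at the start of `generated_text`. A minimal sketch of a more defensive variant is shown below; `strip_prompt` is a hypothetical helper name (not part of transformers), and the example strings are made up rather than real model output.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return the answer part of a generation, dropping the echoed prompt if present."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


# Made-up generation string for illustration (not real model output)
example_prompt = "Q: What is local memory?\nA:"
example_output = example_prompt + " A memory region shared by work items."
print(strip_prompt(example_output, example_prompt))
```

The text-generation pipeline also accepts `return_full_text=False`, which may make the manual slicing unnecessary.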


In [None]:
######################## The Output I got from FineTuning ########################

----------------------------------FINETUNING----------------------------------------------

Prompt: Q: What is local memory in OpenCL?
A:
/usr/local/lib/python3.11/dist-packages/transformers/generation/configuration_utils.py:679: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
Response: Q: What is local memory in OpenCL?
A: Local memory is a special type of memory that is accessible only within the current thread of execution. It is used to store temporary data that is needed for the current computation. Local memory is typically used for data that is not shared across threads, such as variables that are used only within a single thread.
--------------------------------------------------------------------------------
Prompt: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A:
Response: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A: Here's a simple example of how to compile an OpenCL kernel for Xilinx SDAccel:

1. Create a new OpenCL program in your preferred programming language.
2. Add the necessary OpenCL headers and libraries to your program.
3. Define the OpenCL kernel function in your program.
4. Compile your program using the OpenCL compiler provided by Xilinx SDAccel.
5. Load the compiled OpenCL kernel into the Xilinx SDAccel device.
---------------------------------------- 


--------------------------- Using RAG ---------------------------

    {
        "Query": "How do I compile an OpenCL kernel for Xilinx SDAccel?",
        "Answer": "You compile an OpenCL kernel using the Xilinx OpenCL Compiler (xocc)."
    },
    {
        "Query": "What is local memory in OpenCL?",
        "Answer": """Local memory in OpenCL is a region of memory that is only accessible to the OpenCL device.  The host processor cannot see or control operations in this memory space.  It allows read and write operations by work items within the same compute unit and is typically used to store and share data among multiple work items.  In FPGAs, local memory is implemented using block RAM elements within the FPGA fabric."""
    }

--------------------------- Using TinyLLM ---------------------------

What is local memory in OpenCL?

OpenCL is a programming model that provides data transmission among multiple processors executing on heterogeneous devices, including graphics processors (GPUs) and CPUs. The data used in execution of OpenCL algorithms is stored and transmitted in parallel in memory. A single message is transmitted through the data exchange pipeline between these devices. Local memory is a special kind of memory in OpenCL that is used only for inter-device communication.

In OpenCL, local memory refers to a system-defined or device-defined buffer that is physically located (i.e., residing on the local device) but is not shared with other devices in the same context. It supports simple exchange of data, including intra- and inter-device-communication transactions between OpenCL devices. In OpenCL, the size of a local buffer dimension is 1, but the buffer instance can refer to more than one buffer at the same time, so the maximum buffer dimension is expected to be a power of two.
------------------------------------------------------------

<!-- epoch-5 -->
Prompt: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A:
Response: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A: Here's a simple example of how to compile an OpenCL kernel for Xilinx SDAccel:

1. Create a new OpenCL program in your preferred programming language.
2. Add the necessary OpenCL headers and libraries to your program.
3. Define the OpenCL kernel function in your program.
4. Compile your program using the OpenCL compiler provided by Xilinx SDAccel.
5. Load the compiled OpenCL kernel into the Xilinx SDAccel device.
---------------------------------------- 

<!-- epoch-12 -->

Prompt: Q: What is local memory in OpenCL?
A:
Response: Q: What is local memory in OpenCL?
A: Local memory is a special type of memory that is accessible only within the current thread of execution. It is used to store temporary data that is needed for the current computation. Local memory is typically used for data that is not shared across threads, such as variables that are used only within a single thread.
------------------------------------------------------------
Prompt: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A:
Response: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A: Here's a simple example of how to compile an OpenCL kernel for Xilinx SDAccel:

1. Create a new project in Xilinx Vivado HLS.
2. Add a new module to your project.
3. In the module's top level file, add the following code:

```
#include "xil_types.h"
#include "xil_io.h"
#include "xil_exception.h"
#include
-----------

<!-- epoch-16 -->

Prompt: Q: What is local memory in OpenCL?
A:
Response: Q: What is local memory in OpenCL?
A: Local memory is a special type of memory that is accessible only within the current OpenCL context. It is used to store temporary data that is not shared with other OpenCL contexts.
------------------------------------------------------------
Prompt: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A:
Response: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A: To compile an OpenCL kernel for Xilinx SDAccel, you need to follow these steps:

1. Download the OpenCL kernel source code from the OpenCL website.
2. Extract the source code to a directory of your choice.
3. Open the directory in a text editor.
4. Find the file named "kernel.cl" in the directory.
5. Replace the contents of the file with the following code:

```
#include <CL/sycl
------------------------------------------------------------



<!-- epoch-18 -->

Prompt: Q: What is local memory in OpenCL?
A:
Response: Q: What is local memory in OpenCL?
A: Local memory is a special type of memory that is accessible only within the current OpenCL context. It is used to store temporary data that is not shared with other OpenCL contexts.
------------------------------------------------------------
Prompt: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A:
Response: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A: To compile an OpenCL kernel for Xilinx SDAccel, you need to follow these steps:

1. Download the OpenCL kernel source code from the OpenCL website.
2. Extract the source code to a directory of your choice.
3. Open the directory in a text editor.
4. Find the file named "kernel.cl" in the directory.
5. Replace the contents of the file with the following code:

```
#include <CL/sycl
------------------------------------------------------------


Observed issues: hallucinations, inaccuracies, and repetition.
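
If "repetition" is read as repeated phrasing inside a single response, it can be put on a rough numeric footing. The sketch below was not part of the original runs; `repeated_ngram_ratio` is a hypothetical helper that reports the share of word n-grams that occur more than once, and the strings are toy inputs rather than model output.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that belong to an n-gram seen more than once."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Toy strings for illustration only
print(repeated_ngram_ratio("a b c a b c", n=3))
print(repeated_ngram_ratio("one two three four", n=3))
```

A higher ratio suggests more looping text. It does not detect hallucinations or factual errors, which still need a manual check against the guide.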



In [None]:
# Comparing results of output between RAG vs Direct using TinyLlama vs FineTuned TinyLlama

# 🔧 Install required libraries first so the imports below succeed
!pip install -q nltk rouge-score bert-score sentence-transformers

import os

import nltk
from bert_score import score as bert_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

# Download the tokenizer data used by nltk.word_tokenize into a custom NLTK data path
NLTK_DATA_DIR = '/content/nltk_data'
os.makedirs(NLTK_DATA_DIR, exist_ok=True)
nltk.data.path.append(NLTK_DATA_DIR)  # Ensure NLTK looks in the correct place
nltk.download('punkt', download_dir=NLTK_DATA_DIR)
nltk.download('punkt_tab', download_dir=NLTK_DATA_DIR)

# 🔹 Define responses EXACTLY as given

finetuned_responses = [
    {
        "Query": "How do I compile an OpenCL kernel for Xilinx SDAccel?",
        "Answer": """Here's a simple example of how to compile an OpenCL kernel for Xilinx SDAccel:

1. Create a new OpenCL program in your preferred programming language.
2. Add the necessary OpenCL headers and libraries to your program.
3. Define the OpenCL kernel function in your program.
4. Compile your program using the OpenCL compiler provided by Xilinx SDAccel.
5. Load the compiled OpenCL kernel into the Xilinx SDAccel device."""
    },
    {
        "Query": "What is local memory in OpenCL?",
        "Answer": """Local memory is a special type of memory that is accessible only within the current thread of execution. It is used to store temporary data that is needed for the current computation. Local memory is typically used for data that is not shared across threads, such as variables that are used only within a single thread."""
    }
]

rag_responses = [
    {
        "Query": "How do I compile an OpenCL kernel for Xilinx SDAccel?",
        "Answer": "You compile an OpenCL kernel using the Xilinx OpenCL Compiler (xocc)."
    },
    {
        "Query": "What is local memory in OpenCL?",
        "Answer": """Local memory in OpenCL is a region of memory that is only accessible to the OpenCL device.  The host processor cannot see or control operations in this memory space.  It allows read and write operations by work items within the same compute unit and is typically used to store and share data among multiple work items.  In FPGAs, local memory is implemented using block RAM elements within the FPGA fabric."""
    }
]

# 🔹 Ground truth answers (from doc or trusted source)
ground_truth = {
    "How do I compile an OpenCL kernel for Xilinx SDAccel?":
        "You compile an OpenCL kernel using the Xilinx OpenCL Compiler (xocc).",
    "What is local memory in OpenCL?":
        "Local memory is defined as the region of system memory only accessible to the OpenCL device. The host processor has no visibility and no control. It allows read/write access among work items in the same compute unit. In FPGAs, it is implemented using block RAM."
}

# 🔹 Scoring functions
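def compute_metrics(reference, prediction):
    """Score one prediction against its reference with BLEU, ROUGE-L, and BERTScore."""
    # Sentence-level BLEU with smoothing, since short answers may share no higher-order n-grams
    smoother = SmoothingFunction().method2
    bleu = sentence_bleu(
        [nltk.word_tokenize(reference)],
        nltk.word_tokenize(prediction),
        smoothing_function=smoother,
    )
    # ROUGE-L F-measure (longest common subsequence, with stemming)
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, prediction)["rougeL"].fmeasure
    # BERTScore returns precision, recall, and F1 tensors; only F1 is reported
    _, _, f1 = bert_score([prediction], [reference], lang="en", verbose=False)
    return {
        "BLEU": round(bleu, 4),
        "ROUGE-L": round(rouge, 4),
        "BERTScore-F1": round(f1[0].item(), 4),
    }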

# 🔹 Evaluate all
def evaluate(model_outputs):
    results = {}
    for item in model_outputs:
        query = item["Query"]
        prediction = item["Answer"]
        reference = ground_truth[query]
        results[query] = compute_metrics(reference, prediction)
    return results

finetuned_metrics = evaluate(finetuned_responses)
rag_metrics = evaluate(rag_responses)

# 🔹 Display results
from pprint import pprint
print("\n🔎 Fine-Tuned Model Scores:")
pprint(finetuned_metrics)

print("\n🔎 RAG Model Scores:")
pprint(rag_metrics)
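
Overlap metrics reward verbatim copies, so it can help to flag any prediction that is string-identical to its reference before reading the scores. The sketch below assumes the same `Query`/`Answer` record layout as above; `find_exact_matches` is a hypothetical helper, and the sample data is a trimmed copy of the first query in this cell.

```python
def find_exact_matches(model_outputs, references):
    """Return the queries whose prediction equals the reference after whitespace normalization."""
    def normalize(text):
        return " ".join(text.split())

    return [
        item["Query"]
        for item in model_outputs
        if normalize(item["Answer"]) == normalize(references[item["Query"]])
    ]


# Trimmed copy of the first query from the cell above
sample_reference = {
    "How do I compile an OpenCL kernel for Xilinx SDAccel?":
        "You compile an OpenCL kernel using the Xilinx OpenCL Compiler (xocc).",
}
sample_rag = [{
    "Query": "How do I compile an OpenCL kernel for Xilinx SDAccel?",
    "Answer": "You compile an OpenCL kernel using the Xilinx OpenCL Compiler (xocc).",
}]
print(find_exact_matches(sample_rag, sample_reference))
```

Any query this returns will score 1.0 on all three metrics by construction, so its numbers say little about answer quality.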


# Interpretation of the results above

'''
Metric       | What It Measures                                                                             | Ideal Score
BLEU         | Precision of n-grams (how much of the predicted output matches the reference)                | Closer to 1.0
ROUGE-L      | Longest common subsequence between generated and reference answers (F-measure reported here) | Closer to 1.0
BERTScore-F1 | Semantic similarity between generated and reference text, based on contextual embeddings     | Closer to 1.0


1. Query: “How do I compile an OpenCL kernel for Xilinx SDAccel?”

Metric       | Fine-Tuned Model | RAG Model | Comments
BERTScore-F1 | 0.858            | 1.000     | RAG's output is identical to the reference, so it scores a perfect match
BLEU         | 0.0535           | 1.000     | RAG reproduced the reference exactly (the ground-truth string is word-for-word the same sentence)
ROUGE-L      | 0.200            | 1.000     | Fine-tuned model generated a verbose, less precise answer that never names xocc

2. Query: “What is local memory in OpenCL?”

Metric       | Fine-Tuned Model | RAG Model | Comments
BERTScore-F1 | 0.8684           | 0.9377    | RAG scores somewhat higher; both answers are on topic
BLEU         | 0.0576           | 0.2852    | RAG is closer to the reference phrasing; the fine-tuned answer uses different wording
ROUGE-L      | 0.2353           | 0.5690    | RAG overlaps more with the reference answer

Interpretation
- The RAG model scores higher because it retrieves ground-truth-like content directly from the documents, which matches the reference in both surface form (BLEU/ROUGE) and meaning (BERTScore). For query 1 the reference is identical to the RAG answer, so the perfect scores there are expected.

- The fine-tuned model is more generative: it tends to paraphrase, which can lower BLEU/ROUGE even when the meaning is valid.

'''
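
To make the ROUGE-L row concrete, here is a simplified sketch of the longest-common-subsequence computation behind it. It assumes lowercased whitespace tokens and no stemming, so its numbers will not necessarily match the `rouge_score` package used above; `lcs_length` and `rouge_l_f1` are hypothetical helper names, and the sentences are toy inputs.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, 1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, prediction: str) -> float:
    """Simplified ROUGE-L F-measure on lowercased whitespace tokens (no stemming)."""
    ref_tokens = reference.lower().split()
    pred_tokens = prediction.lower().split()
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy sentences for illustration only
print(round(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"), 4))
```

A long answer that contains only a few reference words in order gets a low precision term, which is one reason a verbose answer can score poorly even when it is on topic.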


'''
Response: Q: How do I compile an OpenCL kernel for Xilinx SDAccel?
A: Here's a simple example of how to compile an OpenCL kernel for Xilinx SDAccel:

1. Create a new OpenCL program in your preferred programming language.
2. Add the necessary OpenCL headers and libraries to your program.
3. Define the OpenCL kernel function in your program.
4. Compile your program using the OpenCL compiler provided by Xilinx SDAccel.
5. Load the compiled OpenCL kernel into the Xilinx SDAccel device.

----------- MY INTERPRETATION : (Manual one) <- Yet to put -----------

- A generative, programmer/user-friendly answer that the model composed on its own.
- It considered adding/importing the OpenCL headers and libraries.

Response: Q: What is local memory in OpenCL?
A: Local memory is a special type of memory that is accessible only within the current thread of execution.
It is used to store temporary data that is needed for the current computation. Local memory is typically used
for data that is not shared across threads, such as variables that are used only within a single thread.

----------- MY INTERPRETATION : (Manual one) <- Yet to put -----------

- The answer is too generic.
    (possibly due to too few training epochs, etc.)








'''
