# Experiments with Network Expert LLM


## We will use the torchtune library to fine-tune the Llama 3.2 1-billion-parameter model.

### We do the following steps:

1. Perform unsupervised pretraining with QLoRA on the wireless standards book.
https://pytorch.org/torchtune/stable/tutorials/qlora_finetune.html

2. Unsupervised pretraining makes the model lose its ability to converse with humans, so instruction fine-tuning is needed afterwards.
    Instruction fine-tuning trains the LLM to follow the instructions provided and to generate answers in the desired format. Without it, step 1 leaves us with just a sentence-completion model.
3. Once steps 1 and 2 are done, we evaluate the LLM on a set of multiple-choice questions (MCQs).

### Mount your Google Drive; it will contain the required folders

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Copy the contents of the folder to your current workspace.

In [3]:
!scp -r /content/drive/MyDrive/LLM-NetworkExpert/networkLLM/* /content/

## Install Dependencies

In [4]:
!pip install torch torchvision torchao
!pip install torchtune

Collecting torchao
  Downloading torchao-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading torchao-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 29.0 MB/s eta 0:00:00
Installing collected packages: torchao
Successfully installed torchao-0.6.1
Collecting torchtune
  Downloading torchtune-0.4.0-py3-none-any.whl.metadata (19 kB)
Collecting datasets (from torchtune)
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting tiktoken (from torchtune)
  Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting blobfile>=2 (from torchtune)
  Downloading blobfile-3.0.0-py3-none-any.whl.metadata (15 kB)
Collecting omegaconf (from torchtune)
  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting pycryptodomex>=3.8 (from blobfile>=2->t

### Download the original Meta Llama 3.2 1B Instruct model from Hugging Face

In [1]:
# Read the Hugging Face token from the HF_TOKEN environment variable instead of hard-coding it
!tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /content/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token $HF_TOKEN


Ignoring files matching the following patterns: original/consolidated.00.pth
Fetching 12 files:   0% 0/12 [00:00<?, ?it/s]
config.json: 100% 877/877 [00:00<00:00, 5.00MB/s]
model.safetensors:   0% 0.00/2.47G [00:00<?, ?B/s]
USE_POLICY.md: 100% 6.02k/6.02k [00:00<00:00, 26.3MB/s]
original/params.json: 100% 220/220 [00:00<00:00, 1.57MB/s]
.gitattributes: 100% 1.52k/1.52k [00:00<00:00, 5.85MB/s]
Fetching 12 files:   8% 1/12 [00:00<00:02,  4.32it/s]
LICENSE.txt: 100% 7.71k/7.71k [00:00<00:00, 15.7MB/s]
README.md: 100% 41.7k/41.7k [00:00<00:00, 6.64MB/s]
generation_config.json: 100% 189/189 [00:00<00:00, 930kB/s]
Fetching 12 files:  50% 6/12 [00:00<00:00, 20.44it/s]
special_tokens_map.json: 100% 296/296 [00:00<00:00, 1.59MB/s]
tokenizer.json:   0% 0.00/9.09M [00:00<?, ?B/s]
tokenizer_config.json: 100% 54.5k/54.5k [00:00<00:00, 8.17MB/s]
tokenizer.model:   0% 0.00/2.18M [00:00<?, ?B/s]
tokenizer.model: 100% 2.18M/2.18M [00:00<00:00, 31.4MB/s]
model.safet

## Unsupervised fine-tuning on the Wireless Technology Standards book.

The config "1B_qlora_single_device_custom.yaml" contains the parameters used to pretrain the model.
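For intuition on why a LoRA-style method is practical on a single Colab GPU: LoRA keeps each dense weight matrix frozen and trains only two small low-rank factors alongside it, and QLoRA additionally stores the frozen weights in quantized form. The sketch below only counts parameters for one matrix. The dimensions and rank are illustrative assumptions, not values read from the YAML config, and the helper names are hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix W (d_out x d_in)."""
    return d_in * d_out


# Illustrative sizes only (assumed, not taken from the config file)
d_in, d_out, rank = 2048, 2048, 8

trainable = lora_trainable_params(d_in, d_out, rank)
frozen = full_params(d_in, d_out)
print(f"frozen: {frozen:,}  trainable: {trainable:,}  ratio: {trainable / frozen:.4%}")
```

With these assumed sizes the trainable factors are well under one percent of the frozen matrix, which is why only a small set of adapter weights needs gradients and optimizer state.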

In [None]:
## Don't uncomment, finetuning takes a few hours


# !tune validate configs/1B_qlora_single_device_custom.yaml

# !tune run lora_finetune_single_device --config configs/1B_qlora_single_device_custom.yaml


## The new model will be saved in the Llama-3.2-1B-Instruct-tuned folder. After unsupervised fine-tuning, the model has lost the capability to interact with humans and is now a sentence-completion model. We therefore do instruction fine-tuning, where we give the model a question or an instruction and train it to produce the desired answer.

### For this task, we used the same wireless standards booklet and used ChatGPT to generate question-answer pairs for the instruction fine-tuning of our new model.

### The question-answer pairs are stored in the file: alpaca_questions.json

We use the llama_instruction_tune config to define the parameters of this supervised finetuning.

The new finetuned model will be saved in Llama-3.2-1B-Instruct-alpaca
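The file name suggests the Alpaca instruction format, in which each record has `instruction`, `input`, and `output` fields. A minimal sketch of how question-answer pairs could be converted into that layout follows. The helper `to_alpaca_records` and the sample pairs are hypothetical, and the actual contents of `alpaca_questions.json` may differ.

```python
import json


def to_alpaca_records(qa_pairs):
    """Convert (question, answer) tuples into Alpaca-style dictionaries."""
    return [
        {"instruction": question.strip(), "input": "", "output": answer.strip()}
        for question, answer in qa_pairs
        if question and answer  # skip incomplete pairs
    ]


# Hypothetical example pairs, for illustration only
qa_pairs = [
    ("What does the abbreviation STA stand for?", "STA stands for station."),
    ("", "An answer without a question is dropped."),
]

records = to_alpaca_records(qa_pairs)
print(json.dumps(records, indent=2))
```

Leaving `input` empty is a common choice when the question is fully contained in the instruction.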

In [None]:
## Don't uncomment, finetuning takes a few hours

# !tune validate configs/llama_instruction_tune.yaml

# !tune run lora_finetune_single_device --config configs/llama_instruction_tune.yaml


## Evaluation

### Let's evaluate the original model on 40 multiple-choice questions

In [None]:
import json
import subprocess
import re
import shlex

# Load the MCQ questionnaire from the JSON file
def load_questions(filename):
    with open(filename, 'r') as file:
        return json.load(file)

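# Function to extract the answer from the LLM's response
def extract_answer(llm_response):
    # Match "Answer", an optional colon and optional "is", then an upper-case letter A-D.
    # Only the words are case-insensitive, so the article "a" is not mistaken for option A.
    matches = re.findall(r"\b(?i:answer)\s*:?\s*(?:(?i:is)\s*)?([A-D])\b", llm_response)
    # Return the last match, since the prompt asks for the final answer at the end
    return matches[-1] if matches else None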

# Function to clean the options by removing None or empty values
def clean_options(options):
    # Filter out None or empty strings
    return [option for option in options if option not in (None, "")]

# Function to call torchtune and get the LLM's response
def generate_llm_response(question, options, config_name):
    # Universal prompt to be used before every question
    prompt = ("Answer the following multiple choice question. First provide a short reasoning. "
              "After this conclude with Answer in a single string whose values can be one of the 4 alphabets A, B, C, or D. "
              "For example, Answer option number in alphabet. Do not forget to give the final answer in the end from either A, B, C or D.")

    # Adding the question and options to the prompt
    prompt += f"\n\nQuestion: {question}\n"
    prompt += "Options:\n"
    for option in options:
        prompt += f"{option}\n"

    # Wrap the prompt in a YAML single-quoted string (embedded single quotes are doubled)
    # so that the override value is parsed as one string
    safe_prompt = "'" + prompt.replace("'", "''") + "'"

    # Command to run the LLM generation using torchtune
    command = [
        "tune", "run", "generate",
        "--config", config_name,
        f"prompt[user]={safe_prompt}"  # Pass the prompt as a safe quoted string
    ]

    # Run the subprocess; this code relies on torchtune logging the generated text to stderr
    try:
        print(f"Running torchtune command: {' '.join(command)}")
        result = subprocess.run(command, capture_output=True, text=True, check=True)

        # If stderr has the generated response, return it
        if result.stderr:
            return result.stderr.strip()
        else:
            print("Error: No output generated. STDERR is empty.")
            return None
    except subprocess.CalledProcessError as e:
        print(f"Error executing torchtune command: {e}")
        print(f"STDOUT: {e.stdout}")
        print(f"STDERR: {e.stderr}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Function to calculate the score for all questions
def calculate_score(questions, config_name):
    score = 0
    total_questions = len(questions)

    for question in questions[:]:
        question["options"] = clean_options(question["options"])
        # Get the correct answer from the question
        correct_answer = question['correct_option']

        # Generate the LLM response based on the question and options
        llm_response = generate_llm_response(question['question'], question['options'], config_name)

        if llm_response is None:
            print(f"Skipping question due to error: {question['question']}")
            continue

        # Extract the answer from the LLM's response
        llm_answer = extract_answer(llm_response)

        if llm_answer == correct_answer:
            score += 1

        # Optionally print each question and LLM answer for debugging
        print(f"Question: {question['question']}")
        print(f"Options: {', '.join(question['options'])}")
        print(f"Correct Answer: {correct_answer}")
        print(f"LLM Answer: {llm_answer}")
        print("-" * 50)

    # Calculate the final score as a percentage (skipped questions count as incorrect)
    score_percentage = (score / total_questions) * 100 if total_questions else 0.0
    print(f"Final Score: {score_percentage:.1f}% ({score}/{total_questions})")
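The functions above read three keys from each entry of `mcq_questions.json`: `question`, `options`, and `correct_option`. The sketch below shows one entry in that shape, based on the sample question in this notebook's output, and a hypothetical `validate_question` helper that flags malformed entries before a long evaluation run. The helper and the trailing `None` option are assumptions for illustration, not part of the original pipeline.

```python
def validate_question(entry):
    """Return a list of problems found in one MCQ entry (empty list if none)."""
    problems = []
    for key in ("question", "options", "correct_option"):
        if key not in entry:
            problems.append(f"missing key: {key}")
    if problems:
        return problems
    # Drop None or empty options, mirroring clean_options
    options = [o for o in entry["options"] if o not in (None, "")]
    if len(options) < 2:
        problems.append("fewer than two usable options")
    if entry["correct_option"] not in ("A", "B", "C", "D"):
        problems.append(f"unexpected correct_option: {entry['correct_option']!r}")
    return problems


# Entry shaped like the sample question shown in this notebook's output
example = {
    "question": "Which function does a GLK STA coordinate with to create a virtual "
                "point-to-point LAN between a general link and an associated GLK STA?",
    "options": [
        "A.GLK divergence function",
        "B.MAC convergence function",
        "C.GLK convergence function",
        "D.SAP convergence function",
        None,  # assumed padding value, removed by clean_options
    ],
    "correct_option": "C",
}

print(validate_question(example))                          # []
print(validate_question({"question": "Incomplete entry"}))
# ['missing key: options', 'missing key: correct_option']
```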



In [None]:
# Load the MCQ questions from the file
questions = load_questions('mcq_questions.json')

# Calculate the score for the LLM's answers
calculate_score(questions, config_name="configs/custom_generation_config_original_model.yaml")


Running torchtune command: tune run generate --config configs/custom_generation_config_original_model.yaml prompt[user]='Answer the following multiple choice question. First provide a short reasoning. After this conclude with Answer in a single string whose values can be one of the 4 alphabets A, B, C, or D. For example, Answer option number in alphabet. Do not forget to give the final answer in the end from either A, B, C or D.

Question: Which function does a GLK STA coordinate with to create a virtual point-to-point LAN between a general link and an associated GLK STA?
Options:
A.GLK divergence function 
B.MAC convergence function
C.GLK convergence function
D.SAP convergence function
'
Question: Which function does a GLK STA coordinate with to create a virtual point-to-point LAN between a general link and an associated GLK STA?
Options: A.GLK divergence function , B.MAC convergence function, C.GLK convergence function, D.SAP convergence function
Correct Answer: C
LLM Answer: B
-----

### Let's evaluate with our finetuned model

In [None]:
# Load the MCQ questions from the file
questions = load_questions('mcq_questions.json')

# Calculate the score for the LLM's answers
calculate_score(questions, config_name="configs/custom_generation_config_finetuned_model.yaml")


Running torchtune command: tune run generate --config configs/custom_generation_config_finetuned_model.yaml prompt[user]='Answer the following multiple choice question. First provide a short reasoning. After this conclude with Answer in a single string whose values can be one of the 4 alphabets A, B, C, or D. For example, Answer option number in alphabet. Do not forget to give the final answer in the end from either A, B, C or D.

Question: Which function does a GLK STA coordinate with to create a virtual point-to-point LAN between a general link and an associated GLK STA?
Options:
A.GLK divergence function 
B.MAC convergence function
C.GLK convergence function
D.SAP convergence function
'
Question: Which function does a GLK STA coordinate with to create a virtual point-to-point LAN between a general link and an associated GLK STA?
Options: A.GLK divergence function , B.MAC convergence function, C.GLK convergence function, D.SAP convergence function
Correct Answer: C
LLM Answer: None
-

## The fine-tuned model performs worse than the original model. We discuss potential reasons in the Conclusion section.
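In the output above, `LLM Answer: None` means that no answer letter could be extracted from the response, which is different from the model choosing a wrong letter. A minimal sketch of a tally that separates the two cases follows. The function `tally_results` and the sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter


def tally_results(results):
    """Count correct, wrong, and unparsed answers.

    `results` is an iterable of (llm_answer, correct_answer) pairs, where
    llm_answer is None when no answer letter could be extracted.
    """
    counts = Counter(correct=0, wrong=0, unparsed=0)
    for llm_answer, correct_answer in results:
        if llm_answer is None:
            counts["unparsed"] += 1
        elif llm_answer == correct_answer:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return counts


# Made-up results for illustration, not measurements from this notebook
sample = [("C", "C"), ("B", "C"), (None, "C"), (None, "A")]
counts = tally_results(sample)
print(dict(counts))  # {'correct': 1, 'wrong': 1, 'unparsed': 2}
```

A high unparsed count would point to a formatting problem in the responses rather than a lack of domain knowledge.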

## We can also ask some general questions

In [None]:
Question = "Do you know about wireless networks?"

In [4]:
!tune run generate --config configs/custom_generation_config_finetuned_model.yaml prompt[user]="Do you know about wireless networks?"

INFO:torchtune.utils._logging:Running InferenceRecipe with resolved config:

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: Llama-3.2-1B-Instruct-alpaca/
  checkpoint_files:
  - hf_model_0001_0.pt
  model_type: LLAMA3_2
  output_dir: Llama-3.2-1B-Instruct-alpaca/
  recipe_checkpoint: null
device: cpu
dtype: bf16
enable_kv_cache: true
max_new_tokens: 1000
model:
  _component_: torchtune.models.llama3_2.llama3_2_1b
prompt:
  system: null
  user: Do you know about wireless networks?
quantizer: null
seed: 1234
temperature: 0
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: Llama-3.2-1B-Instruct/original/tokenizer.model
  prompt_template: null
top_k: 300

DEBUG:torchtune.utils._logging:Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
INFO:torchtune.utils._logging:Model is initialized with precision torch.bfloat16.
INFO:torchtune.utils._logging:Do you know about wireless net

## Conclusion


1. We successfully fine-tuned a large language model, specifically the Llama 3.2 model with 1 billion parameters, on a wireless specification document.

2. We did not anticipate that unsupervised pretraining would make the model forget how to follow instructions.

3. To counter this, we performed supervised instruction tuning on question-answer pairs from the wireless specification handbook. These pairs were generated with the help of ChatGPT, as it is a much more capable model.

4. After supervised instruction tuning, the model was again able to converse in natural language in a more human, assistant-like way.

5. This demonstrates that the pipeline works and, given sufficient compute, could be scaled to much bigger models.



Some limitations of the current work:

1. Since we did not have access to much compute, we could run each type of fine-tuning for only one epoch, because Google Colab does not allow runtimes longer than 3-4 hours.
2. This restricted the number of experiments we could run, so the fine-tuned model does not necessarily perform better than the original model.
3. It is also possible that the original model had more general knowledge, which was forgotten when fine-tuning on our small text. A better method might be to do unsupervised fine-tuning on much more wireless networking literature, such as books and research papers, so that the model is specialized but also knowledgeable across a broad range of topics in wireless communication.