# F.E.R.B – Fine-tuned Encoder for Response and Behavior

## Installing Required Libraries for AI Model Deployment

This command installs the necessary Python libraries for running and fine-tuning transformer-based AI models efficiently:

- accelerate (v0.21.0): Runs training and inference across CPUs and one or more GPUs, with optional mixed precision.
- peft (v0.4.0): Enables efficient fine-tuning of large models using techniques like LoRA (Low-Rank Adaptation).
- bitsandbytes (v0.40.2): Supports 8-bit and 4-bit quantization, reducing memory usage for large models.
- transformers (v4.31.0): Hugging Face's library for loading and running pre-trained models such as LLaMA, GPT, and BERT.
- trl (v0.4.7): Training utilities for transformer language models, including SFTTrainer for supervised fine-tuning as well as reinforcement-learning trainers.

In [None]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

## Installing Hugging Face Hub for Model Access

This command installs the Hugging Face Hub library, which allows seamless access to pre-trained models, datasets, and tokenizers from Hugging Face. It enables functionality such as:

- Downloading and using models from the Hugging Face Model Hub.
- Uploading and sharing custom models and datasets.
- Managing authentication and API tokens for secure access.

In [None]:
!pip install huggingface_hub

## Upgrading Key Libraries for Efficient Model Execution

These commands upgrade the libraries to their latest releases, replacing the versions pinned in the first cell, and also install Datasets:

- BitsAndBytes: 8-bit and 4-bit quantization, reducing memory usage for large transformer models.
- Transformers: the core library for working with pre-trained models such as LLaMA, GPT, and BERT.
- Accelerate: multi-GPU and mixed-precision execution for faster, more memory-efficient training and inference.
- TRL (Transformer Reinforcement Learning): trainers for supervised fine-tuning (SFTTrainer) and reinforcement-learning-based fine-tuning.
- PEFT (Parameter-Efficient Fine-Tuning): LoRA and other parameter-efficient fine-tuning methods.
- Datasets: access to large-scale NLP datasets with easy preprocessing.

In [None]:
!pip install --upgrade bitsandbytes
!pip install --upgrade transformers accelerate
!pip install --upgrade trl peft datasets
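Since the --upgrade commands replace the versions pinned in the first cell, it can help to confirm what is actually installed before loading the model. The sketch below is a minimal check using only the standard library's importlib.metadata; installed_versions is a hypothetical helper, and the package list simply mirrors the installs above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Package names taken from the pip commands above
pkgs = ["accelerate", "peft", "bitsandbytes", "transformers", "trl", "datasets"]
for name, ver in installed_versions(pkgs).items():
    print(f"{name:>13}: {ver or 'not installed'}")
```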


## Checking NVIDIA CUDA Compiler Version

In [None]:
!nvcc --version
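nvcc reports the version of the CUDA toolkit compiler, which can differ from the driver's CUDA version shown by nvidia-smi and from the CUDA version PyTorch was built with. A minimal sketch for reading the release number programmatically, assuming nvcc's usual "Cuda compilation tools, release X.Y, ..." output line; the sample string and the helper names are illustrative.

```python
import re
import shutil
import subprocess

def parse_cuda_release(nvcc_output):
    """Extract the 'release X.Y' number from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

def local_cuda_release():
    """Run nvcc if it is on PATH; return None when the CUDA toolkit is absent."""
    if shutil.which("nvcc") is None:
        return None
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    return parse_cuda_release(out)

sample = "Cuda compilation tools, release 12.2, V12.2.140"  # illustrative line
print(parse_cuda_release(sample))  # 12.2
print(local_cuda_release())
```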


## Importing Libraries for Fine-Tuning LLMs

- Torch: PyTorch, the deep learning framework.
- TRL's SFTTrainer: supervised fine-tuning of transformer models.
- PEFT's LoraConfig: configuration for parameter-efficient fine-tuning with LoRA.
- Datasets (load_dataset): loading NLP datasets.
- Transformers (AutoModelForCausalLM, AutoTokenizer, TrainingArguments): loading and training LLMs such as LLaMA and GPT.
- BitsAndBytesConfig: configures 8-bit or 4-bit quantization for memory efficiency.
- pipeline: simplifies text generation and inference.

In [None]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)

## Loading and Quantizing a Fine-Tuned LLaMA Model

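AutoModelForCausalLM.from_pretrained(...) loads the model from "aboonaji/llama2finetune-v2".

BitsAndBytesConfig(load_in_4bit=True, ...) stores the weights in 4-bit precision, reducing memory consumption.

bnb_4bit_compute_dtype=torch.float16 dequantizes the weights to float16 for the actual matrix multiplications, which is faster than computing in float32.

bnb_4bit_quant_type="nf4" uses the 4-bit NormalFloat (NF4) data type, whose levels are spaced for normally distributed weights and are intended to represent them more accurately than evenly spaced 4-bit levels.

llama_model.config.use_cache = False disables the key/value cache, which speeds up generation but is not needed during training and only consumes memory there.

llama_model.config.pretraining_tp = 1 makes the LLaMA layers use the standard linear-layer computation; values greater than 1 reproduce the tensor-parallel slicing used during pretraining, which is slower.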
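To make the NF4 idea concrete, here is a simplified, pure-Python illustration of blockwise 4-bit quantization with a codebook built from normal-distribution quantiles. It is not the bitsandbytes implementation: the real NF4 table is asymmetric with an exact zero, and quantization runs on the GPU over fixed-size blocks. The helper names are hypothetical.

```python
from statistics import NormalDist

def normal_codebook(levels=16):
    """16 levels at evenly spaced quantiles of N(0, 1), scaled into [-1, 1].
    Simplified: the real NF4 table is asymmetric and contains an exact zero."""
    dist = NormalDist()
    qs = [dist.inv_cdf((i + 0.5) / levels) for i in range(levels)]
    scale = max(abs(q) for q in qs)
    return [q / scale for q in qs]

def quantize_block(values, codebook):
    """Absmax-normalize a block, then map each value to its nearest code index (0..15)."""
    absmax = max(abs(v) for v in values) or 1.0
    codes = [min(range(len(codebook)), key=lambda j: abs(codebook[j] - v / absmax))
             for v in values]
    return codes, absmax

def dequantize_block(codes, absmax, codebook):
    """Recover approximate values from 4-bit codes and the block's scale."""
    return [codebook[c] * absmax for c in codes]

codebook = normal_codebook()
block = [0.1, -0.5, 0.9, -1.2]
codes, absmax = quantize_block(block, codebook)
print(codes, absmax)
print(dequantize_block(codes, absmax, codebook))
```

Only the 4-bit codes plus one scale per block need to be stored, which is where the memory saving comes from.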

In [None]:
# 4-bit NF4 quantization with float16 compute
bnb_config = BitsAndBytesConfig(load_in_4bit = True,
                                bnb_4bit_compute_dtype = torch.float16,
                                bnb_4bit_quant_type = "nf4")
llama_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2",
                                                   quantization_config = bnb_config)
llama_model.config.use_cache = False    # KV cache is not needed while training
llama_model.config.pretraining_tp = 1   # standard (non-sliced) linear layers
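A rough estimate shows why 4-bit loading matters. The sketch assumes a 7-billion-parameter model (the size of aboonaji/llama2finetune-v2 is not stated here) and counts weight storage only, ignoring quantization constants, layers kept in higher precision, activations, and the KV cache.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_param / 8 / 2**30

params_7b = 7e9  # assumption: a LLaMA 2 7B-sized model
for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(params_7b, bits):6.2f} GiB")
# fp32 26.08, fp16 13.04, int8 6.52, 4-bit 3.26 (GiB)
```

For the real figure after loading, transformers models provide llama_model.get_memory_footprint(), which returns the size in bytes.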

## Loading and Configuring LLaMA Tokenizer

This code loads the LLaMA 2 tokenizer that matches the fine-tuned model and configures padding. The LLaMA 2 tokenizer does not define a padding token, so the end-of-sequence (EOS) token is reused for padding, and sequences are padded on the right. Without a pad token, batching sequences of different lengths fails during fine-tuning and batched inference.

In [None]:
llama_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2", trust_remote_code = True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"
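Because EOS doubles as the padding token, the attention mask is what tells the model which positions are real. A minimal pure-Python sketch of right padding; the token IDs are made up, and EOS_ID = 2 assumes LLaMA 2's usual id for </s>.

```python
def pad_right(batch, pad_id):
    """Right-pad token-id lists to the longest sequence and build attention masks."""
    width = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in batch]
    return input_ids, attention_mask

EOS_ID = 2  # assumption: LLaMA 2's </s> id, reused as the pad token above
ids, mask = pad_right([[1, 319, 4086], [1, 15043]], pad_id=EOS_ID)
print(ids)   # [[1, 319, 4086], [1, 15043, 2]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

One side effect worth checking: collators that mask every pad_token_id out of the loss will also mask genuine EOS tokens when the two share an id, which may weaken the model's ability to learn when to stop.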

## Defining Training Parameters for Fine-Tuning

This code sets up the training configuration: the output directory for checkpoints and logs, the batch size per device, and the maximum number of training steps. Together these control memory usage and training length. Step 5 redefines training_arguments with a more memory-conscious configuration, so these values serve only as a first baseline.

In [None]:
training_arguments = TrainingArguments(output_dir = "./results", per_device_train_batch_size = 4, max_steps = 100)

In [None]:
torch.cuda.empty_cache()

In [None]:
!nvidia-smi


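## Forcefully Clearing GPU Memory

This terminates every process that nvidia-smi reports as using the GPU. If the notebook's own kernel is among them, it is killed too, and everything loaded so far (including llama_model and llama_tokenizer) must be loaded again.

In [None]:
!nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -r kill -9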
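Before killing anything, it can be safer to list the GPU processes and leave out the notebook's own kernel. A minimal sketch using nvidia-smi's CSV query mode and the standard library; other_gpu_pids is a hypothetical helper, and it returns an empty list when nvidia-smi is unavailable.

```python
import os
import subprocess

def parse_pids(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid --format=csv,noheader` output."""
    return [int(token) for token in csv_text.split() if token.isdigit()]

def other_gpu_pids():
    """PIDs using the GPU, excluding this notebook's own kernel process."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []  # no NVIDIA driver or tooling available
    return [pid for pid in parse_pids(out) if pid != os.getpid()]

print(other_gpu_pids())
```

Each returned PID could then be passed to os.kill(pid, signal.SIGKILL), which keeps the current kernel and its loaded model alive.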


## Displaying GPU Status and Usage

This command shows the GPU status, memory usage, temperature, and active processes using nvidia-smi (NVIDIA System Management Interface). It helps monitor GPU utilization, running processes, and potential memory bottlenecks, making it essential for deep learning and high-performance computing tasks.

In [None]:
!nvidia-smi
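For a programmatic view of the memory numbers nvidia-smi displays, it can emit selected fields as CSV. A minimal sketch; gpu_memory and parse_memory are hypothetical helpers, and the sample row is made up.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_memory(csv_text):
    """Turn 'index, used, total' rows (MiB) into dicts with a usage fraction."""
    gpus = []
    for row in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in row.split(","))
        gpus.append({"gpu": index, "used_mib": used, "total_mib": total,
                     "fraction": used / total})
    return gpus

def gpu_memory():
    """Query every GPU's memory; returns [] when nvidia-smi is unavailable."""
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_memory(out)

print(parse_memory("0, 5120, 15360"))  # made-up sample row
print(gpu_memory())
```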


In [None]:

torch.cuda.empty_cache()


## Step 5: Fine-Tuning LLaMA 2 with LoRA and Optimized Training

This code fine-tunes a LLaMA 2 model on a medical dataset using LoRA (Low-Rank Adaptation) for efficient training. It first loads and tokenizes the dataset, then applies LoRA-based parameter-efficient fine-tuning to reduce GPU memory usage. The training configuration optimizes performance with gradient accumulation, mixed-precision (fp16), and logging via Weights & Biases (wandb). Finally, the SFTTrainer is initialized to handle the fine-tuning process efficiently.
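The Step 5 configuration trades batch size for gradient accumulation: gradients from several small batches are summed before each optimizer update. A quick calculation with its values (per_device_train_batch_size=1, gradient_accumulation_steps=8, max_steps=30), assuming a single GPU, shows how many examples one run actually sees. In the Hugging Face Trainer, max_steps counts optimizer updates, not individual batches.

```python
def effective_batch(per_device, grad_accum, num_devices=1):
    """Examples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices

def examples_seen(per_device, grad_accum, max_steps, num_devices=1):
    """Examples processed when training stops after max_steps optimizer updates."""
    return effective_batch(per_device, grad_accum, num_devices) * max_steps

print(effective_batch(1, 8))     # 8
print(examples_seen(1, 8, 30))   # 240
print(examples_seen(4, 1, 100))  # 400 (the earlier baseline configuration)
```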

In [None]:
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the medical-terms dataset (formatted for LLaMA 2, per its name)
train_dataset = load_dataset("aboonaji/wiki_medical_terms_llam2_format", split="train")

def preprocess_function(examples):
    # Tokenize the "text" column, truncating/padding every example to 512 tokens
    return llama_tokenizer(
        examples["text"], truncation=True, padding="max_length", max_length=512
    )

# Tokenize in batches using 4 worker processes
train_dataset = train_dataset.map(preprocess_function, batched=True, num_proc=4)

# LoRA configuration: rank-64 adapters, scaling alpha/r = 16/64, 10% dropout
peft_config = LoraConfig(
    task_type="CAUSAL_LM", r=64, lora_alpha=16, lora_dropout=0.1
)

# Training arguments kept small to fit limited GPU memory
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,             # upper bound; max_steps below takes precedence
    max_steps=30,                   # stop after 30 optimizer steps (adjust as needed)
    per_device_train_batch_size=1,  # small batch to avoid out-of-memory errors
    gradient_accumulation_steps=8,  # effective batch of 1 x 8 = 8 per update
    fp16=True,                      # mixed-precision training
    save_steps=100,                 # checkpoint interval (larger than max_steps here)
    save_total_limit=2,             # keep at most two checkpoints
    logging_steps=10,
    report_to="wandb"               # log metrics to Weights & Biases
)

# Supervised fine-tuning trainer that attaches the LoRA adapters to the 4-bit model
llama_sft_trainer = SFTTrainer(
    model=llama_model,
    args=training_arguments,
    train_dataset=train_dataset,
    peft_config=peft_config
)
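A back-of-the-envelope count shows why LoRA is cheap. For a weight of shape d_out x d_in, LoRA trains two factors, A (r x d_in) and B (d_out x r), and scales their product by lora_alpha / r. The sketch assumes a hidden size of 4096 and 32 layers (LLaMA 2 7B dimensions) and, as an assumption about PEFT's defaults for LLaMA, adapters on the q_proj and v_proj projections only, since peft_config sets no target_modules.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

hidden, num_layers = 4096, 32   # assumption: LLaMA 2 7B dimensions
r, lora_alpha = 64, 16          # values from peft_config above
adapted_per_layer = 2           # assumption: q_proj and v_proj only

full = hidden * hidden                    # one frozen 4096 x 4096 projection
added = lora_params(hidden, hidden, r)    # its two LoRA factors
total = added * adapted_per_layer * num_layers

print(full, added, f"{added / full:.3%}")     # 16777216 524288 3.125%
print("total LoRA parameters:", total)        # 33554432
print("scaling alpha / r =", lora_alpha / r)  # 0.25
```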


## Step 6: Training the model

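Training reports to Weights & Biases (report_to="wandb"), so you will be prompted for a W&B API key on the first run. Use your own key, available from https://wandb.ai/authorize after signing in.
Weights and Biases assists in hyperparameter tuning and debugging.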



In [None]:
llama_sft_trainer.train()

# Weights & Biases (W&B) tracks, visualizes, and logs key training metrics in real time.
# Because fine-tuning LLaMA 2 with LoRA involves several hyperparameters (batch size,
# gradient accumulation, fp16 training, etc.), W&B helps with:
# - Logging training progress: loss, learning rate, and other metrics over time.
# - Experiment tracking: runs are stored so results can be compared.
# - Reproducibility: configurations and logs are saved with each run.
# - Remote monitoring: a web dashboard shows training running on Colab or a cloud GPU.

In [None]:
!pip install gtts playsound pydub SpeechRecognition

import speech_recognition as sr

## Speech-to-Text Conversion Using Google Speech Recognition

This code records audio from the browser's microphone in Google Colab and converts the speech to text. A small JavaScript snippet records a WebM clip, which is returned to Python as a base64 data URL, decoded, and converted with pydub (via ffmpeg) into a 16 kHz mono WAV file. SpeechRecognition's recognize_google then transcribes it using Google's speech recognition API; each result is appended to recognized_texts and read back with pyttsx3. If the audio cannot be understood or the API request fails, an error message is printed and None is returned. Note that pyttsx3 speaks through the audio device of the machine running the kernel, so in Colab the spoken echo is not heard in the browser.

In [None]:
# Install required libraries
!pip install SpeechRecognition pyttsx3 pydub
!apt-get install -y ffmpeg espeak  # Install eSpeak for pyttsx3

# Import libraries
import speech_recognition as sr
import pyttsx3
from IPython.display import Javascript, display, Audio
from google.colab import output
import base64
import io
from pydub import AudioSegment

# JavaScript to record audio in WAV format
RECORD_JS = """
const sleep = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = () => resolve(reader.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' })  // Record in webm
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  await sleep(time)
  recorder.onstop = async () => {
    stream.getTracks().forEach(track => track.stop())  // release the microphone
    blob = new Blob(chunks, { type: 'audio/webm' })
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

# Global list to store recognized texts
recognized_texts = ['entry text']

def record_audio(seconds=10):
    """Starts audio recording and waits for result."""
    display(Javascript(RECORD_JS))
    audio_data = output.eval_js(f"record({seconds * 1000})")  # Wait for recording to finish
    return process_audio(audio_data)

def process_audio(audio_data):
    """Processes the recorded audio and converts speech to text."""
    global recognized_texts

    try:
        # Decode base64 audio
        if "," in audio_data:
            audio_bytes = base64.b64decode(audio_data.split(",")[1])
        else:
            print("Invalid audio data received.")
            return None

        # Convert webm to WAV using pydub
        webm_audio = AudioSegment.from_file(io.BytesIO(audio_bytes), format="webm")
        wav_audio = webm_audio.set_frame_rate(16000).set_channels(1)
        wav_audio.export("recorded_audio.wav", format="wav")
        print("Audio converted to WAV format")

        # Recognize speech
        recognizer = sr.Recognizer()
        with sr.AudioFile("recorded_audio.wav") as source:
            audio = recognizer.record(source)
            text = recognizer.recognize_google(audio)
            recognized_texts.append(text)  # Append to list
            print("You said:", text)
            SpeakText(text)
            return text

    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print(f"Google API error: {e}")

    return None  # Return None if recognition fails

def SpeakText(command):
    """Converts text to speech."""
    engine = pyttsx3.init()
    engine.say(command)
    engine.runAndWait()

# Start first recording
"""print("Recording... Speak now!")
text = record_audio(seconds=10)

# Print all stored texts
print("All Recorded Texts:", recognized_texts)"""
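process_audio only checks that a comma is present before decoding. A slightly stricter, standard-library sketch of the same step validates the data-URL header first; decode_data_url is a hypothetical helper, and the four payload bytes are the EBML signature that starts a WebM file, standing in for a real recording.

```python
import base64

def decode_data_url(data_url):
    """Split a 'data:<mime>;base64,<payload>' string into (mime, raw bytes)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)

# Hypothetical tiny payload standing in for the recorder's output
url = "data:audio/webm;base64," + base64.b64encode(b"\x1a\x45\xdf\xa3").decode()
print(decode_data_url(url))  # ('audio/webm', b'\x1aE\xdf\xa3')
```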


In [None]:
#print("Stored Text:", recognized_text)


In [None]:
"""print("Recording again...")
record_audio(seconds=10)
"""

In [None]:
#print("All Recorded Texts:", recognized_texts[-1])



In [None]:
!pip install googletrans==4.0.0-rc1

In [None]:
from googletrans import Translator

def translate_to_tamil(text):
    translator = Translator()
    translated_text = translator.translate(text, dest='ta').text
    return translated_text

# Example usage
english_text = "Hello, how are you?"
tamil_translation = translate_to_tamil(english_text)
print("English:", english_text)
print("Tamil:", tamil_translation)
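googletrans is an unofficial client for Google Translate's web service, so calls can fail on network errors or service changes, and an uncaught exception would end the chat loop. A minimal sketch of a defensive wrapper; safe_translate is a hypothetical helper that falls back to the original text.

```python
def safe_translate(text, translate_fn):
    """Call translate_fn(text), but never let a translation failure end the chat."""
    if not text or not text.strip():
        return ""  # nothing to translate
    try:
        return translate_fn(text)
    except Exception as exc:  # network errors, rate limits, changed responses
        print(f"Translation unavailable ({exc.__class__.__name__}); showing original text.")
        return text
```

In the chat loop, translate_to_tamil(cleaned_text) could become safe_translate(cleaned_text, translate_to_tamil).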




## Chatting with the model

This code implements a voice-based AI assistant named F.E.R.B that converses with the user through speech. It records a short clip and waits for the activation phrase "Hello", then greets the user using Google Text-to-Speech (gTTS). Each subsequent spoken query is transcribed and passed to the fine-tuned LLaMA 2 text-generation pipeline; the answer is printed, translated into Tamil, and played back as speech. The assistant keeps listening until the user says "Thank You", at which point it says goodbye and exits. This setup forms an interactive voice chatbot suited to virtual assistants and AI-driven conversations.
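The chat loop wraps each query in the LLaMA 2 instruction format and strips the tags from the output. Because the text-generation pipeline returns the prompt followed by the completion unless return_full_text=False is passed, removing only the tags can leave the user's question at the start of the spoken answer. A minimal sketch of both steps; build_prompt and extract_answer are hypothetical helpers.

```python
def build_prompt(user_text):
    """Wrap a query in the LLaMA 2 [INST] format used by the chat loop."""
    return f"<s>[INST] {user_text.strip()} [/INST]"

def extract_answer(generated_text, prompt):
    """Drop the echoed prompt, then any leftover special tags."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    for tag in ("<s>", "</s>", "[INST]", "[/INST]"):
        generated_text = generated_text.replace(tag, "")
    return generated_text.strip()

prompt = build_prompt("What is anemia?")
print(prompt)  # <s>[INST] What is anemia? [/INST]
# Made-up pipeline output: the prompt followed by the completion
print(extract_answer(prompt + " Anemia is a low red blood cell count.</s>", prompt))
# Anemia is a low red blood cell count.
```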

In [None]:
from gtts import gTTS
import IPython.display as ipd
from transformers import pipeline

# Assumes llama_model, llama_tokenizer, record_audio and translate_to_tamil
# were defined in the earlier cells.

def speak(text, filename):
    """Synthesize text with gTTS and show a playable audio widget."""
    tts = gTTS(text=text, lang='en')
    tts.save(filename)
    ipd.display(ipd.Audio(filename))

# Build the pipeline once instead of on every turn.
# return_full_text=False returns only the generated answer, not the prompt.
text_generation_pipeline = pipeline(task="text-generation", model=llama_model,
                                    tokenizer=llama_tokenizer, max_length=300,
                                    return_full_text=False)

print("Recording again...")
first_text = record_audio(seconds=5)
if first_text is not None and first_text.lower() == 'hello':
    print("Say 'Thank You' to exit.")
    intro = "F.E.R.B: Hi there! How may I help you?"
    speak(intro, "intro.mp3")
    print(intro)
    print(translate_to_tamil(intro))

    while True:
        print("Recording again...")
        query = record_audio(seconds=10)
        if query is None:
            # Recognition failed: ask again instead of re-answering the previous query
            continue
        if query.lower() == "thank you":
            outro = "F.E.R.B: Bye!"
            speak(outro, "outro.mp3")
            print(outro)
            print(translate_to_tamil(outro))
            break

        # Wrap the query in the LLaMA 2 instruction format
        model_answer = text_generation_pipeline(f"<s>[INST] {query} [/INST]")
        generated_text = model_answer[0]['generated_text']
        # Remove any instruction tags echoed back by the model
        cleaned_text = generated_text.replace('<s>[INST]', '').replace('[/INST]', '').strip()
        if not cleaned_text:
            continue  # gTTS raises an error on empty text

        print('F.E.R.B:', cleaned_text)
        print(translate_to_tamil(cleaned_text))
        speak(cleaned_text, "answer.mp3")
else:
    print("Say 'Hello' to start the model.")
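Separating the assistant's control flow from audio input and output makes it testable without a microphone. A minimal sketch of the loop's states (idle, chatting, done) as a pure function; next_state and the action names are hypothetical. As in the loop, anything other than "hello" at the start ends the session, and in this sketch a failed recognition while chatting asks again rather than re-answering the previous query.

```python
def next_state(state, utterance):
    """Return (new_state, action) for one recognized utterance (None = recognition failed)."""
    said = (utterance or "").strip().lower()
    if state == "idle":
        # Only "hello" starts a session; anything else (or silence) ends it
        return ("chatting", "greet") if said == "hello" else ("done", "prompt_hello")
    if state == "chatting":
        if utterance is None:
            return "chatting", "retry"  # ask again instead of re-answering
        if said == "thank you":
            return "done", "goodbye"
        return "chatting", "answer"
    return "done", None

state = "idle"
for utterance in ["Hello", "What is anemia?", None, "Thank you"]:
    state, action = next_state(state, utterance)
    print(f"{utterance!r:>18} -> {state:<8} {action}")
```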
