## Assignment Week 10

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning (PEFT) technique. Instead of updating all the weights of a pre-trained model, as full fine-tuning does, LoRA keeps the pre-trained weights frozen and trains only a pair of small low-rank matrices for each adapted weight matrix. Full fine-tuning is resource-intensive and time-consuming, so parameter-efficient methods such as LoRA can reduce compute and memory requirements while still giving good results on a specific task.
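Concretely, for a frozen pre-trained weight matrix W of shape d_out × d_in, LoRA learns two small matrices, A (r × d_in) and B (d_out × r), with a rank r much smaller than d_in and d_out. The forward pass then uses W + (α/r)·BA in place of W. The NumPy sketch below is a minimal illustration of that formulation. It assumes the common conventions of initialising B to zero, so training starts from the pre-trained behaviour, and scaling the update by α/r. The sizes 768 and r = 8 are illustrative choices, not values from this assignment.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """LoRA linear layer: frozen W plus a low-rank update B @ A scaled by alpha / r."""
    r = A.shape[0]
    scaling = alpha / r
    # x: (batch, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r)
    return x @ W.T + scaling * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 768, 8, 16

W = rng.standard_normal((d_out, d_in))     # pre-trained weight, kept frozen
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

x = rng.standard_normal((4, d_in))
y = lora_forward(x, W, A, B, alpha)

full_params = W.size          # weights full fine-tuning would update
lora_params = A.size + B.size # weights LoRA trains instead
print(full_params, lora_params)  # 589824 12288
```

Because B starts at zero, `y` initially equals the frozen layer's output `x @ W.T`. Only the 12,288 entries of A and B would receive gradients, compared with 589,824 entries in W.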

The LoRA approach is demonstrated below using `gpt2` as the foundation model, which is then fine-tuned with LoRA.
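The following sketch makes "smaller" concrete for `gpt2`. GPT-2 small has 12 blocks, a hidden size of 768, and roughly 124M parameters. Its fused query/key/value projection, `c_attn`, maps 768 input features to 2304 output features. Adapting only `c_attn` is a common choice in Hugging Face PEFT setups, for example via `LoraConfig(r=8, target_modules=["c_attn"], task_type="CAUSAL_LM")` passed to `get_peft_model`. In that case the number of trainable weights follows directly from the shapes of A and B. The choice of `c_attn` and the ranks below are assumptions for illustration. The sketch only does the arithmetic and does not load the model.

```python
# Assumed GPT-2 small dimensions (12 transformer blocks, hidden size 768).
N_LAYER = 12
N_EMBD = 768

def lora_params_for_c_attn(r: int, n_layer: int = N_LAYER, n_embd: int = N_EMBD) -> int:
    """Trainable LoRA weights when only the fused q/k/v projection `c_attn` is adapted."""
    d_in = n_embd            # c_attn input features
    d_out = 3 * n_embd       # c_attn output features (query, key and value fused)
    per_layer = r * d_in + d_out * r  # A is (r x d_in), B is (d_out x r)
    return n_layer * per_layer

# Frozen c_attn weight count across all blocks (biases excluded).
full_c_attn = N_LAYER * N_EMBD * 3 * N_EMBD

for r in (4, 8, 16):
    print(f"r={r}: {lora_params_for_c_attn(r):,} trainable vs {full_c_attn:,} frozen c_attn weights")
```

At r = 8 this comes to 294,912 trainable weights, against about 21.2M frozen `c_attn` weights. That is well under 1% of GPT-2 small's roughly 124M parameters.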

The demonstration shows how LoRA fine-tuning can direct a foundation model to perform a **quote tagging** task. First, a quote is passed to the base `gpt2` model for inference and the result is checked. Then `gpt2` is fine-tuned with LoRA on the [Quote tagging][1] dataset, which enables it to tag quotes rather than simply continue generating text. Finally, the fine-tuned model is tested to evaluate how much LoRA fine-tuning improves the response.

[1]: https://huggingface.co/datasets/Abirate/english_quotes 
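A causal language model such as `gpt2` can be turned toward tagging by rendering each record as text in which the tags follow the quote. At inference time, the model is then prompted with the quote and completes the tags. The sketch below assumes each record exposes a `quote` string and a `tags` list of strings; these field names are taken to match the dataset card and should be verified before use. The `Quote:`/`Tags:` template and the sample record are hypothetical. With the `datasets` library, such a function would typically be applied after `load_dataset("Abirate/english_quotes")` via `.map(...)`.

```python
from typing import Any, Dict, List

def build_tagging_prompt(quote: str) -> str:
    """Inference prompt: the model is expected to continue after 'Tags:'."""
    return f"Quote: {quote.strip()}\nTags:"

def build_tagging_example(record: Dict[str, Any]) -> str:
    """Training text: the prompt followed by the comma-separated target tags.

    Assumes `record` has a "quote" string and a "tags" list of strings.
    """
    tags: List[str] = [str(t) for t in record["tags"]]
    return f"{build_tagging_prompt(str(record['quote']))} {', '.join(tags)}"

# Hypothetical record, made up for illustration (not taken from the dataset).
sample = {"quote": "Small steps, taken daily, add up.", "author": "Anonymous", "tags": ["habits", "progress"]}
print(build_tagging_example(sample))
# Quote: Small steps, taken daily, add up.
# Tags: habits, progress
```

Keeping the prompt builder separate means the same template is used both when training and when querying the fine-tuned model.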

In [6]:
import os
import time
# import prometheus_client as prom
import mlflow
from openai import OpenAI
import tiktoken as tk
from colorama import Fore, Style, init
import json

As MLflow is running locally on port 8080, the MLflow tracking URI is `http://localhost:8080`.

In [11]:
MLFLOW_URI = "http://localhost:8080"
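MLflow can also read the tracking URI from the `MLFLOW_TRACKING_URI` environment variable, so the port does not have to be hard-coded in the notebook. The sketch below is minimal and assumes the server was started locally on port 8080, for example with `mlflow server --host 127.0.0.1 --port 8080`. `resolve_tracking_uri` is a hypothetical helper.

```python
import os

DEFAULT_MLFLOW_URI = "http://localhost:8080"  # local server assumed to listen on port 8080

def resolve_tracking_uri(default: str = DEFAULT_MLFLOW_URI) -> str:
    """Prefer MLFLOW_TRACKING_URI from the environment, else fall back to the local server."""
    return os.environ.get("MLFLOW_TRACKING_URI", default)
```

The result would then be passed to `mlflow.set_tracking_uri(resolve_tracking_uri())`.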

Set the constants for the OpenAI chat completion requests:

In [17]:
MODEL = "gpt-4o-mini"
TEMPERATURE = 0.7
TOP_P = 1
FREQUENCY_PENALTY = 0
PRESENCE_PENALTY = 0
MAX_TOKENS = 800
DEBUG = False
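`TEMPERATURE` and `TOP_P` control how the next token is sampled. The sketch below is a conceptual illustration of standard temperature scaling and top-p (nucleus) filtering. OpenAI does not document its server-side implementation, so this is not the API's code, and the token scores are hypothetical. OpenAI's API reference recommends adjusting either temperature or top_p, not both. This notebook changes only the temperature (0.7) and leaves `TOP_P = 1`, which keeps the full distribution.

```python
import math
from typing import Dict

def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax of logits / temperature; lower temperature gives a sharper distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def nucleus(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Top-p filtering: keep the most likely tokens until their mass reaches top_p, then renormalise."""
    if top_p >= 1.0:
        return dict(probs)  # TOP_P = 1 keeps the full distribution
    kept: Dict[str, float] = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Hypothetical next-token scores, just to show the effect of the settings.
logits = {"tea": 2.0, "coffee": 1.0, "water": 0.0}
cool = apply_temperature(logits, 0.7)   # TEMPERATURE = 0.7
warm = apply_temperature(logits, 1.0)
print(round(warm["tea"], 3), round(cool["tea"], 3))  # 0.665 0.771
```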

In [13]:
# Initialize colorama
init()

# get api key from file
with open("../../apikeys/openai-keys.json", "r") as key_file:
    api_key = json.load(key_file)["default_api_key"]

os.environ["OPENAI_API_KEY"] = api_key
# Initialize OpenAI client
client = OpenAI()
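The cell above reads the key from a relative JSON path and copies it into `OPENAI_API_KEY`, which the `OpenAI()` client reads by default. The sketch below is a slightly more defensive variant. It assumes the same file layout (`{"default_api_key": ...}`), prefers a key already present in the environment, and fails with a clear message when neither source is available. `load_api_key` is a hypothetical helper, not part of the OpenAI SDK.

```python
import json
import os
from pathlib import Path

def load_api_key(key_file: str, field: str = "default_api_key") -> str:
    """Return OPENAI_API_KEY if it is already set, else read `field` from a JSON key file."""
    env_key = os.environ.get("OPENAI_API_KEY")
    if env_key:
        return env_key
    path = Path(key_file)
    if not path.is_file():
        raise FileNotFoundError(f"No OPENAI_API_KEY in environment and no key file at {path}")
    with path.open("r", encoding="utf-8") as fh:
        data = json.load(fh)
    if field not in data:
        raise KeyError(f"Key file {path} has no '{field}' entry")
    return data[field]
```

In the notebook this could replace the file handling, for example `client = OpenAI(api_key=load_api_key("../../apikeys/openai-keys.json"))`.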

In [14]:
# Set MLflow tracking URI
mlflow.set_tracking_uri(MLFLOW_URI)
mlflow.set_experiment("week10_assignment")  # Replace with your experiment name

2025/02/07 14:34:06 INFO mlflow.tracking.fluent: Experiment with name 'week10_assignment' does not exist. Creating a new experiment.


<Experiment: artifact_location='/mlflow/artifacts/260420388858552430', creation_time=1738956846799, experiment_id='260420388858552430', last_update_time=1738956846799, lifecycle_stage='active', name='week10_assignment', tags={}>

In [15]:
# Print user input and AI output with colors
def print_user_input(text):
    print(f"{Fore.GREEN}You: {Style.RESET_ALL}", text)

def print_ai_output(text):
    print(f"{Fore.BLUE}AI Assistant:{Style.RESET_ALL}", text)

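# Count tokens locally with tiktoken.
# gpt-4o-mini uses the "o200k_base" encoding ("cl100k_base" is the GPT-4 / GPT-3.5-turbo encoding).
def count_tokens(string: str, encoding_name="o200k_base") -> int:
    # Get the encoding (tiktoken caches it after the first load)
    encoding = tk.get_encoding(encoding_name)

    # Encode the string, treating special-token text as ordinary text
    encoded_string = encoding.encode(string, disallowed_special=())

    # Count the number of tokens
    return len(encoded_string)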

# Generate a chat completion with the OpenAI API and log metrics and params to the active MLflow run
def generate_text(conversation, max_tokens=100) -> str:
    start_time = time.time()
    response = client.chat.completions.create(
        model=MODEL,
        messages=conversation,
        temperature=TEMPERATURE,
        max_tokens=max_tokens,
        top_p=TOP_P,
        frequency_penalty=FREQUENCY_PENALTY,
        presence_penalty=PRESENCE_PENALTY
    )
    latency = time.time() - start_time
    message_response = response.choices[0].message.content
    
    # Approximate token counts locally: latest message, whole conversation (as its Python repr), and the reply
    prompt_tokens = count_tokens(conversation[-1]['content'])
    conversation_tokens = count_tokens(str(conversation))
    completion_tokens = count_tokens(message_response)
    
    run = mlflow.active_run()
    if DEBUG:    
        print(f"Run ID: {run.info.run_id}")
        input("Press Enter to continue...")

    mlflow.log_metrics({
        "request_count": 1,
        "request_latency": latency,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "conversation_tokens": conversation_tokens
    })
    
    mlflow.log_params({
        "model": MODEL,
        "temperature": TEMPERATURE,
        "top_p": TOP_P,
        "frequency_penalty": FREQUENCY_PENALTY,
        "presence_penalty": PRESENCE_PENALTY
    })

    return message_response
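`generate_text` estimates token counts locally with tiktoken. The chat completion response also carries the server's own counts in `response.usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`). Note that the server's `prompt_tokens` covers every message sent, not only the latest user message. In addition, `mlflow.log_metrics` logs at step 0 when no `step` is given, so successive turns are not separated by step. The sketch below builds per-turn metrics from a usage object. `turn_metrics` and the stub values are hypothetical, chosen so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Any, Dict

def turn_metrics(usage: Any, latency: float) -> Dict[str, float]:
    """Per-turn metrics from a chat-completion `usage` object plus the measured latency."""
    return {
        "request_count": 1,
        "request_latency": latency,
        "prompt_tokens": usage.prompt_tokens,        # all messages sent in this request
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }

# Stand-in for response.usage so the sketch runs without calling the API.
fake_usage = SimpleNamespace(prompt_tokens=25, completion_tokens=120, total_tokens=145)
metrics = turn_metrics(fake_usage, latency=1.8)
print(metrics["total_tokens"])  # 145
```

Inside `generate_text`, this could be logged as `mlflow.log_metrics(turn_metrics(response.usage, latency), step=turn_index)`, where `turn_index` is a hypothetical counter passed in by the chat loop.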


In [18]:
mlflow.autolog()

# Start a new MLflow run
with mlflow.start_run() as run:
    conversation = [
        {"role": "system", "content": "You are a helpful assistant."},
    ]

    while True:
        user_input = input("User: ")
        if user_input.lower() in ["exit", "quit", "q", "e"]:
            break

        conversation.append({"role": "user", "content": user_input})
        ai_output = generate_text(conversation, MAX_TOKENS)
        print_ai_output(ai_output)
        conversation.append({"role": "assistant", "content": ai_output})

2025/02/07 14:37:55 INFO mlflow.tracking.fluent: Autologging successfully enabled for openai.


User:  recipe for cardamom tea


2025/02/07 14:38:19 INFO mlflow.tracking.fluent: Autologging successfully enabled for langchain.


AI Assistant: Certainly! Here's a simple recipe for making delicious cardamom tea:

### Cardamom Tea Recipe

#### Ingredients:
- 2 cups water
- 2-3 green cardamom pods (or to taste)
- 1-2 teaspoons loose black tea or 1-2 tea bags (Assam or Darjeeling work well)
- Milk (optional, to taste)
- Sugar or sweetener (optional, to taste)

#### Instructions:

1. **Crush the Cardamom:**
   - Lightly crush the cardamom pods using a mortar and pestle or the back of a spoon. This helps to release the flavor.

2. **Boil Water:**
   - In a saucepan, bring 2 cups of water to a boil.

3. **Add Cardamom:**
   - Once the water is boiling, add the crushed cardamom pods to the water. Let it simmer for about 2-3 minutes to infuse the flavor.

4. **Add Tea:**
   - Add the loose black tea or the tea bags to the boiling water. Let it steep for 3-5 minutes, depending on how strong you like your tea.

5. **Add Milk (Optional):**
   - If you prefer your tea with milk, add it at this stage. You can adjust the quan

User:  q


🏃 View run whimsical-ant-922 at: http://localhost:8080/#/experiments/260420388858552430/runs/a2df44b2d5a841058c0da5309faae13c
🧪 View experiment at: http://localhost:8080/#/experiments/260420388858552430
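The chat cell `In [18]` mixes console I/O with conversation bookkeeping. The refactor sketch below separates the two so the loop can be exercised with scripted inputs and a stand-in generator. It keeps the same exit commands and message roles. `run_chat` and `echo` are hypothetical names. Unlike the original, the sketch strips whitespace before checking for an exit command.

```python
from typing import Callable, Dict, Iterable, List

EXIT_COMMANDS = {"exit", "quit", "q", "e"}

def run_chat(inputs: Iterable[str],
             generate: Callable[[List[Dict[str, str]]], str],
             system_prompt: str = "You are a helpful assistant.") -> List[Dict[str, str]]:
    """Run the chat loop over the given inputs, stopping at an exit command; return the conversation."""
    conversation = [{"role": "system", "content": system_prompt}]
    for user_input in inputs:
        if user_input.strip().lower() in EXIT_COMMANDS:
            break
        conversation.append({"role": "user", "content": user_input})
        reply = generate(conversation)
        conversation.append({"role": "assistant", "content": reply})
    return conversation

# Hypothetical stand-in for generate_text so the loop runs without an API key.
def echo(conversation: List[Dict[str, str]]) -> str:
    return f"echo: {conversation[-1]['content']}"

history = run_chat(["recipe for cardamom tea", "q", "never reached"], echo)
print(len(history))  # 3
```

Inside `with mlflow.start_run():`, the interactive version would be `run_chat(iter(lambda: input("User: "), None), lambda conv: generate_text(conv, MAX_TOKENS))`. Printing of each reply is omitted here for brevity.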
