# Ternary Classifier of User Queries

This LLM-based classifier is designed to categorize incoming user queries into the following classes: 'problems', 'questions', and 'statements'.

First, let's import the Lamini Python SDK and set our API key. You can get your API key from [app.lamini.ai](https://app.lamini.ai/account).

*Note: Remember to pip install lamini*

In [10]:
import lamini

# Uncomment the line below to set the API key manually in-line.
# lamini.api_key = "<insert key from app.lamini.ai>"
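
If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `LAMINI_API_KEY` and a helper called `read_api_key`; both names are illustrative and do not come from this notebook.

```python
import os


def read_api_key(env=None, var_name="LAMINI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `var_name` and this helper are assumptions made for illustration.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running the notebook")
    return key


# Usage (uncomment once the variable is set in your shell):
# lamini.api_key = read_api_key()
```

Failing loudly when the variable is missing tends to be easier to debug than an authentication error later in the notebook.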

## Create Project:

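Here we create a project whose name starts with "Ternary_Classifier_Example", followed by a random four-digit suffix to keep project names distinct between runs. The project will leverage the Meta-Llama-3.1-8B-Instruct model.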

> #### In this notebook we are going to be working through a sample problem of categorizing user queries. All the examples are made up. 

Let's Get Started!

In [11]:
from lamini.classify.lamini_classifier import LaminiClassifier
import random
import textwrap


cls = LaminiClassifier(
    f"Ternary_Classifier_Example{random.randint(1000,9999)}"
)

### Define the classes:
Once the project is created, we define the classes. The more detailed the description, the higher your initial accuracy will be. It helps to give a few examples of keywords or phrases that are likely to appear in this category.

In [5]:

classes = {
    "problems": """Expressions of difficulty, distress, or challenges the user is facing. These often include negative emotions, obstacles, or situations causing concern. Common indicators include words like 'struggling', 'can't', 'worried', 'stressed', or descriptions of difficult situations.
    Two examples are:
    1. I've been feeling really overwhelmed at work lately and can't seem to get anything done
    2. My relationship with my mother is falling apart and I don't know how to fix it
    """,
    
    "statements": """Neutral or factual sharing of information, observations, or reflections about oneself, others, or situations. These are often descriptive rather than interrogative or problem-focused. May include opinions, realizations, or status updates.
    Two examples are:
    1. I noticed I felt more energetic after starting my morning walks
    2. My therapist suggested I try mindfulness meditation during our last session
    """,
    
    "questions": """Direct requests for information, advice, clarification, or guidance. These are typically marked by question marks and interrogative words (what, how, why, etc.), seeking specific answers or insights.
    Two examples are:
    1. What techniques can I use to manage anxiety during presentations?
    2. Why do we remember some things vividly?
    """
}


### Add Example(s) for each class

Now that we have descriptions for Lamini to use when creating the classifier, we can add tagged examples explicitly. These examples are training data. Adding example inputs is optional, but it helps with accuracy. You can always do this later; we'll add more as a follow-up step later in this notebook.

In [6]:
examples = {
    "problems": [
        "I feel like I'm constantly apologizing for things that aren't my fault",
        "I don't know how to make myself more approachable in social settings",
        "I've been wanting to try therapy, but I don't know where to start",
        "I feel like my life is just one endless routine right now",
        "I don't know how to express myself without worrying about judgment",
        "I've been trying to stay off social media, but it's addictive",
        "I feel like I'm not prioritizing my personal goals enough",
        "I don't know how to fix things with a friend I've drifted from",
        "I've been nervous about speaking up during meetings at work",
        "I feel like I'm always stuck doing tasks that no one else wants"
    ],
    "statements": [
        "Absolutely not",
        "Maybe later",
        "Who cares",
        "That's okay",
        "This is incredible",
        "Everything is fine",
        "I keep losing track of myself",
        "My dreams feel further away fuck",
        "I want to feel normal again",
        "Even if we disagree, I think there's value in talking it through"
    ],
    "questions": [
        "What's the best option?",
        "How do I learn?",
        "What is your capability?",
        "Are you always correct?",
        "How is this possible?",
        "What connects the physical universe to human consciousness?",
        "Why do humans create art to express themselves?",
        "Where is it?",
        "How do we?",
        "What is life?"
    ]
}
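
Before sending anything to the service, it can help to confirm that the two dictionaries line up. The sketch below is a hypothetical helper (`check_examples` is not part of the Lamini SDK) that reports classes without examples, examples filed under an unknown class, and duplicated texts.

```python
def check_examples(classes, examples):
    """Return a list of human-readable problems; an empty list means all good."""
    issues = []
    for name in classes:
        if not examples.get(name):
            issues.append(f"class '{name}' has no examples")
    for name, items in examples.items():
        if name not in classes:
            issues.append(f"examples given for unknown class '{name}'")
        seen = set()
        for text in items:
            normalized = text.strip().lower()
            if normalized in seen:
                issues.append(f"duplicate example in '{name}': {text!r}")
            seen.add(normalized)
    return issues


# Tiny stand-in data; in the notebook you would pass `classes` and `examples`.
demo_classes = {"problems": "...", "questions": "...", "statements": "..."}
demo_examples = {
    "problems": ["I can't sleep", "I can't sleep "],
    "questions": ["What is life?"],
    "typo_class": ["Hello"],
}
demo_issues = check_examples(demo_classes, demo_examples)
for issue in demo_issues:
    print(issue)
```
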

Below is a simple function that prints the examples in a readable format. We will use it again when we add data programmatically a bit later.

Let's review what we have so far.

In [12]:
# Function to word-wrap and display
def display_wrapped_content(data, width=200):
    wrapper = textwrap.TextWrapper(width=width)
    for key, conversations in data.items():
        print(f"Category: {key}\n" + "="*len(f"Category: {key}"))
        for i, conversation in enumerate(conversations, 1):
            print(f"\nConversation {i}:")
            print(wrapper.fill(conversation))
            print("\n" + "-"*width)
            
# Review the examples we defined above
display_wrapped_content(examples)

Category: problems

Conversation 1:
I feel like I'm constantly apologizing for things that aren't my fault

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Conversation 2:
I don't know how to make myself more approachable in social settings

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Conversation 3:
I've been wanting to try therapy, but I don't know where to start

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Conversation 4:
I feel like my life is just one endless routine right now

------------------------------------------

We have our class descriptions and training examples, so it's time to create the classifier. Use the **.initialize** endpoint to create a new project. This can take about a minute per class, so we'll add a simple polling loop to keep us updated on status.

In [13]:
resp = cls.initialize(classes, examples) 

import time

while True:
    print("Waiting for classifier to initialize")
    time.sleep(5)
    resp = cls.train_status()
    if resp["status"] == "completed":
        print("Model ID: " + resp["model_id"])
        first_model_id = resp["model_id"]
        break
    if resp["status"] == "failed":
        print(resp)
        raise Exception("failed training")


Waiting for classifier to initialize
Waiting for classifier to initialize
Model ID: 21e43229-2a88-4a39-8a00-4218ce11e213
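
The polling loop above runs until the service reports `completed` or `failed`. A variant with a timeout might look like the sketch below. `wait_for_training` is a hypothetical helper, the timeout defaults are arbitrary, and the fake status function stands in for `cls.train_status` so the example runs without the service.

```python
import time


def wait_for_training(get_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll `get_status()` until it reports 'completed' or 'failed'.

    `get_status` is any zero-argument callable returning a dict with a
    "status" key, for example `cls.train_status` from the cell above.
    The timeout values are arbitrary defaults, not Lamini recommendations.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = get_status()
        if resp["status"] == "completed":
            return resp
        if resp["status"] == "failed":
            raise RuntimeError(f"training failed: {resp}")
        if time.monotonic() >= deadline:
            raise TimeoutError("classifier did not finish training in time")
        time.sleep(poll_seconds)


# Stand-in for cls.train_status; "in_progress" is a placeholder for any
# status other than "completed" or "failed".
_fake_responses = iter([
    {"status": "in_progress"},
    {"status": "completed", "model_id": "demo-model-id"},
])
result = wait_for_training(lambda: next(_fake_responses), poll_seconds=0.01)
print("Model ID: " + result["model_id"])
```
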


Cool, we have our first model version! Let's try it out with a quick test.

In [14]:
import json

# target: questions (the user asks for information)
response = cls.classify(''' How does technology impact human cognitive abilities?
                        ''')

print(json.dumps(response))

{"classification": [[{"class_id": 2, "class_name": "questions", "prob": 0.47822750685789217}, {"class_id": 1, "class_name": "statements", "prob": 0.2845725850202026}, {"class_id": 0, "class_name": "problems", "prob": 0.23719990812190528}]]}


Let's take a quick look at the output above. We get a list of all the categories defined in our project, including a confidence score for each.
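
To pull the winning class out of that structure, something like the following should work. It assumes the response layout printed above, and both `top_prediction` and the `min_prob` cut-off are hypothetical choices for this example.

```python
def top_prediction(response, min_prob=0.6):
    """Return (class_name, prob, is_confident) for the first classified input.

    Assumes the response layout printed above: "classification" holds one
    list of per-class scores for each input. `min_prob` is an arbitrary
    cut-off chosen for this example.
    """
    scores = response["classification"][0]
    best = max(scores, key=lambda item: item["prob"])
    return best["class_name"], best["prob"], best["prob"] >= min_prob


# The response printed above, copied in so the sketch runs on its own.
sample_response = {"classification": [[
    {"class_id": 2, "class_name": "questions", "prob": 0.47822750685789217},
    {"class_id": 1, "class_name": "statements", "prob": 0.2845725850202026},
    {"class_id": 0, "class_name": "problems", "prob": 0.23719990812190528},
]]}

name, prob, confident = top_prediction(sample_response)
print(f"{name} ({prob:.2f}), confident: {confident}")
```

Here the top class matches the target, but its score falls below the chosen cut-off, which makes this input a candidate for closer review.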

We can go even further to easily quantify the accuracy of our classifier. Let's run an evaluation!

For a classifier, an evaluation works like this: you provide a set of inputs and their expected outputs, and Lamini tests the model's accuracy on those inputs, returning both overall metrics and a per-input assessment.

I've included elements that will test the classifier's ability to handle:

- Mixed content (e.g., questions that contain problem statements)
- Subtle distinctions (e.g., statements that include observations about problems)
- Multi-turn conversations
- Different lengths and complexities
- Various therapy-related topics
- Different emotional intensities
- Both explicit and implicit expressions

### Build Our Evaluation Dataset

In [15]:
questions_eval = [
    {
        "input": "Can you explain quantum mechanics?",
        "target": "questions"
    },
    {
        "input": "How do plants make oxygen?",
        "target": "questions"
    },
    {
        "input": "Where is it?",
        "target": "questions"
    },
    {
        "input": "How do we?",
        "target": "questions"
    },
    {
        "input": "What is life?",
        "target": "questions"
    },
    {
        "input": "Where at?",
        "target": "questions"
    },
    {
        "input": "Who's next?",
        "target": "questions"
    },
    {
        "input": "How bad?",
        "target": "questions"
    },
    {
        "input": "When again?",
        "target": "questions"
    },
    {
        "input": "What?",
        "target": "questions"
    }
]

problems_eval = [
    {
        "input": "I don't know how to stay consistent with my goals",
        "target": "problems"
    },
    {
        "input": "I can't seem to connect with people around me",
        "target": "problems"
    },
    {
        "input": "I don't know how to fix this mistake I made",
        "target": "problems"
    },
    {
        "input": "I feel like I'm not achieving anything meaningful",
        "target": "problems"
    },
    {
        "input": "I can't find the motivation to keep going",
        "target": "problems"
    },
    {
        "input": "I don't know how to overcome my fear of failure",
        "target": "problems"
    },
    {
        "input": "I feel like my efforts never pay off",
        "target": "problems"
    },
    {
        "input": "I can't stop worrying about what others think of me",
        "target": "problems"
    },
    {
        "input": "I don't know how to handle this conflict",
        "target": "problems"
    },
    {
        "input": "I can't figure out why I'm so unhappy",
        "target": "problems"
    }
]

statements_eval = [
    {
        "input": "I'm seeing things more clearly",
        "target": "statements"
    },
    {
        "input": "Every step teaches me something",
        "target": "statements"
    },
    {
        "input": "I'm realizing how this matters",
        "target": "statements"
    },
    {
        "input": "I keep losing track of myself",
        "target": "statements"
    },
    {
        "input": "My dreams feel further away fuck",
        "target": "statements"
    },
    {
        "input": "I just don't see how this could work out the way you're suggesting",
        "target": "statements"
    },
    {
        "input": "Welcome",
        "target": "statements"
    },
    {
        "input": "This is amazing",
        "target": "statements"
    },
    {
        "input": "I totally agree",
        "target": "statements"
    },
    {
        "input": "It's all good",
        "target": "statements"
    }
]
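
Writing every record by hand gets verbose as the eval set grows. As a rough sketch, the same `input`/`target` records could be generated from a dictionary of labels; `build_eval_records` is a hypothetical helper, not something the Lamini SDK provides.

```python
def build_eval_records(inputs_by_target):
    """Turn {"label": ["text", ...]} into [{"input": text, "target": label}, ...]."""
    return [
        {"input": text, "target": target}
        for target, texts in inputs_by_target.items()
        for text in texts
    ]


# A few inputs taken from the lists above.
records = build_eval_records({
    "questions": ["Where at?", "Who's next?"],
    "statements": ["Welcome"],
})
print(records)
```
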

### Run the evaluation

Using the evaluation endpoint, we simply pass the eval dataset we just built to the **eval_data** parameter of the Lamini Evaler.

In [16]:
from lamini.one_evaler.one_evaler import LaminiOneEvaler

# Named `evaler` so we don't shadow Python's built-in eval()
evaler = LaminiOneEvaler(
    test_model_id=first_model_id,
    eval_data_id=f"first_eval{random.randint(1000,9999)}",
    eval_data=questions_eval+problems_eval+statements_eval,
    test_eval_type="classifier",
)

full_eval = evaler.run()

print(json.dumps(full_eval, indent=2))

{
  "eval_job_id": "521147986",
  "eval_data_id": "first_eval4426",
  "metrics": {
    "tuned_accuracy": 0.7666666666666667,
    "tuned_precision": 1.0,
    "tuned_recall": 0.7666666666666667,
    "tuned_f1": 0.8679245283018868
  },
  "status": "COMPLETED",
  "predictions": [
    {
      "input": "Can you explain quantum mechanics?",
      "target": "questions",
      "test_output": "questions",
      "base_output": null
    },
    {
      "input": "How do plants make oxygen?",
      "target": "questions",
      "test_output": "questions",
      "base_output": null
    },
    {
      "input": "Where is it?",
      "target": "questions",
      "test_output": "questions",
      "base_output": null
    },
    {
      "input": "How do we?",
      "target": "questions",
      "test_output": "questions",
      "base_output": null
    },
    {
      "input": "What is life?",
      "target": "questions",
      "test_output": "questions",
      "base_output": null
    },
    {
      "input": "W

Even with just a few examples of each class, the model already got about 77% of the eval inputs correct (23 of 30). This is captured in the **tuned_accuracy** field of the eval output.
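
As a sanity check on the metrics, the reported F1 is the harmonic mean of the reported precision and recall. The arithmetic below uses only the numbers printed above. How Lamini defines precision and recall internally is not shown in this notebook, so the sketch only checks that the figures are consistent with one another.

```python
precision = 1.0
recall = 0.7666666666666667  # same value as tuned_accuracy, i.e. 23 of 30

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```
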

> Best practice: after you've added a few high-quality examples, run an eval and carefully review the ground truth labels to make sure they're aligned with the classifier's task and scope. The tags in a general eval set aren't always the best for a narrowly defined classifier agent to learn from. Don't just review inputs where the assigned class was completely wrong; also review inputs where the classifier's answer is correct but the confidence score is low.

> Reminder: We can view the confidence by using **cls.classify** like we did above.

Let's take a look at which examples the classifier missed. We can use the function below to print the eval examples that the model got wrong.

In [None]:
def print_missed_evals(eval_result):
    # Keep only the predictions whose output differs from the expected label
    missed = [
        pred for pred in eval_result['predictions']
        if pred['test_output'] != pred['target']
    ]
    
    if missed:
        print("Missed evals:")
        for prediction in missed:
            print(f"Input: {prediction['input']}")
            print(f"Expected: {prediction['target']}")
            print(f"Predicted: {prediction['test_output']}")
            print("-" * 80)
    else:
        print("No Missed Evals!")     

print_missed_evals(full_eval)

Missed evals:
Input: Who's next?
Expected: questions
Predicted: statements
--------------------------------------------------------------------------------
Input: How bad?
Expected: questions
Predicted: statements
--------------------------------------------------------------------------------
Input: When again?
Expected: questions
Predicted: statements
--------------------------------------------------------------------------------
Input: What?
Expected: questions
Predicted: statements
--------------------------------------------------------------------------------
Input: I don't know how to fix this mistake I made
Expected: problems
Predicted: questions
--------------------------------------------------------------------------------
Input: Every step teaches me something
Expected: statements
Predicted: questions
--------------------------------------------------------------------------------
Input: I keep losing track of myself
Expected: statements
Predicted: problems
---------------
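
To see which confusions dominate, the misses can be tallied by (expected, predicted) pair. This is a small sketch using the misses listed above; `count_confusions` is a hypothetical helper that works on the same `predictions` records the eval returns.

```python
from collections import Counter


def count_confusions(predictions):
    """Count (expected, predicted) pairs for the misclassified inputs only."""
    return Counter(
        (pred["target"], pred["test_output"])
        for pred in predictions
        if pred["test_output"] != pred["target"]
    )


# The misses listed above, written in the eval's prediction format.
missed = [
    {"input": "Who's next?", "target": "questions", "test_output": "statements"},
    {"input": "How bad?", "target": "questions", "test_output": "statements"},
    {"input": "When again?", "target": "questions", "test_output": "statements"},
    {"input": "What?", "target": "questions", "test_output": "statements"},
    {"input": "I don't know how to fix this mistake I made",
     "target": "problems", "test_output": "questions"},
    {"input": "Every step teaches me something",
     "target": "statements", "test_output": "questions"},
    {"input": "I keep losing track of myself",
     "target": "statements", "test_output": "problems"},
]

confusions = count_confusions(missed)
for (expected, predicted), n in confusions.most_common():
    print(f"{expected} -> {predicted}: {n}")
```

Most of the misses are very short questions labelled as statements, which hints at where additional training examples are likely to help.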

#### What should we do to get higher accuracy?

To start: more well-curated training data typically helps, so let's add more training examples. So far we have only added ten for each class.

### Load Training Examples from a file:

In most cases our data will be ingested in an unstructured format from our chat application. Let's look at how we might stage that data into the right format for training the classifier.

> Note: This is additional training data, examples we use here should be ideal examples that our Classifier agent is going to learn from.

The snippet below loads data from a file in the data directory of this repo, which is pre-populated with some data for us. When we run the code, it prints the loaded examples so we can review them briefly before training the classifier again (**.initialize** includes the first training run).

In [None]:
import json
import os

# Path to your training data file
filepath_with_examples = "data/training_data.jsonl"

def load_jsonl_file(relpath):
    data = []
    
    # Get the current directory where the notebook is running
    current_dir = os.getcwd()
    
    # Add the relative path to the data directory with our filename
    file_path = os.path.join(current_dir, relpath)
    
    print(f"Attempting to load file from: {file_path}")
    
    # Load data with explicit UTF-8 encoding
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            try:
                entry = json.loads(line)
                data.append(entry)
            except json.JSONDecodeError as e:
                print(f"Error parsing line: {line.strip()}")
                print(f"Error details: {str(e)}")
                continue
    
    print(f"\nTotal entries loaded: {len(data)}")
    return data

def format_examples_from_file(data):
    examples = {
        "questions": [],
        "problems": [],
        "statements": []
    }
    
    # Sort data by category first to ensure consistent ordering
    for entry in sorted(data, key=lambda x: x['category']):
        category = entry['category']
        if category in examples:
            examples[category].append(entry['conversation'])
    
    # Verify counts for each category
    for category, items in examples.items():
        print(f"Number of {category}: {len(items)}")
        # Print first and last entry of each category for verification
        if items:
            print(f"{category} first entry: {items[0]}")
            print(f"{category} last entry: {items[-1]}")
            print("-" * 50)
    
    return examples

def display_wrapped_content(data):
    for category, queries in sorted(data.items()):
        print(f"\nCategory: {category}")
        print("=" * 50)
        for i, query in enumerate(queries, 1):
            # Replace curly quotes and apostrophes with their ASCII equivalents
            query = query.replace('\u201c', '"').replace('\u201d', '"').replace('\u2019', "'")
            print(f"{i}. {query}")
        print("-" * 50)

# Load and display the data
try:
    loaded_data = load_jsonl_file(filepath_with_examples)
    examples_from_file = format_examples_from_file(loaded_data)
    display_wrapped_content(examples_from_file)
except Exception as e:
    print(f"Error loading or processing data: {str(e)}")
    print(f"Current working directory: {os.getcwd()}")

Attempting to load file from: c:\Users\atkin\Documents\AFUS\Ternary_Classifier\data/training_data.jsonl

Total entries loaded: 120
Number of questions: 40
questions first entry: Why?
questions last entry: Why care?
--------------------------------------------------
Number of problems: 40
problems first entry: I don’t know how to fix this broken relationship.
problems last entry: I don’t know how to regain control of my life.
--------------------------------------------------
Number of statements: 40
statements first entry: Hello
statements last entry: Even if it seems like the right choice, there’s a risk we’re overlooking.
--------------------------------------------------

Category: problems
1. I don’t know how to fix this broken relationship.
2. I’m struggling to pay my bills this month.
3. I can’t seem to lose weight no matter what I try.
4. I feel like I’m failing at my job.
5. I don’t know how to apologize without making it worse.
6. I can’t afford to fix my car right now.
7. I f
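
For reference, the loader above expects one JSON object per line, each with a `category` and a `conversation` field. The sketch below writes two such rows (taken from the output above) to a temporary file and reads them back; the temporary path is only for illustration and is not the repo's data file.

```python
import json
import os
import tempfile

# Two rows using the field names the loader reads: category and conversation.
rows = [
    {"category": "questions", "conversation": "Why?"},
    {"category": "statements", "conversation": "Hello"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "training_data.jsonl")
    # One JSON object per line is what makes the file JSONL.
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")

    # Read the file back, skipping blank lines.
    with open(path, "r", encoding="utf-8") as handle:
        loaded = [json.loads(line) for line in handle if line.strip()]

# Group conversations by category, as format_examples_from_file does.
grouped = {}
for row in loaded:
    grouped.setdefault(row["category"], []).append(row["conversation"])
print(grouped)
```
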

### Review Programmatically Loaded Examples Above

Our new examples look good!

Last time, we passed the training examples in right away when creating the classifier. This time we will add new examples to the same classifier to provide additional data. We reuse the **cls** object we already created and call its **.add** method, passing in a unique name for this dataset along with the examples.
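
Before adding a new dataset, it may be worth filtering out texts that an earlier dataset already contained. The sketch below shows one way to do that; `drop_seen` is a hypothetical helper, and whether duplicates actually matter for training is not something this notebook establishes.

```python
def drop_seen(new_examples, seen_examples):
    """Return a copy of `new_examples` without texts already in `seen_examples`.

    Comparison ignores case and surrounding whitespace, and looks across all
    classes so that one text cannot end up under two labels.
    """
    seen = {
        text.strip().lower()
        for texts in seen_examples.values()
        for text in texts
    }
    fresh = {}
    for label, texts in new_examples.items():
        kept = []
        for text in texts:
            key = text.strip().lower()
            if key not in seen:
                kept.append(text)
                seen.add(key)
        fresh[label] = kept
    return fresh


# Small made-up inputs; in the notebook these would be the example dictionaries.
already_used = {"questions": ["What is life?"], "statements": ["Maybe later"]}
candidate = {
    "questions": ["what is life? ", "Why care?"],
    "statements": ["Hello", "Hello"],
}
deduplicated = drop_seen(candidate, already_used)
print(deduplicated)
```
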

### Train the Classifier with the New Examples

Because we are adding additional data explicitly, we need to call **.train** once we have added all the data we want for the next model version in our classifier project.
> Note: We can add more than one dataset before training. Training is fast, so each iteration can typically add just one dataset as we work on improving the classifier.

In [None]:
resp = cls.add(
    f"additional_data{random.randint(1000,9999)}", examples_from_file
)

resp = cls.train()

while True:
    print("Waiting for classifier to train")
    time.sleep(5)
    resp = cls.train_status()
    if resp["status"] == "completed":
        print("Model ID: " + resp["model_id"])
        second_model_id = resp["model_id"]
        break
    if resp["status"] == "failed":
        print(resp["status"])
        raise Exception("failed training")


Waiting for classifier to train
Waiting for classifier to train
Model ID: df2b8819-62e5-4c4b-8e5a-1c082057a5b7


Great, now we have a second model version in our project! Let's run an eval and compare it to the first version. 

> #### Note: When comparing two models, we pass a few extra parameters to the Evaler. The test model is still the model whose behavior we want to understand (our most recent one). The model we compare it against is the base model.

### Let's see the results from the updated classifier!

In [None]:
print("Running comparison eval between model versions " + first_model_id + " and " + second_model_id)

eval_2 = LaminiOneEvaler(
    test_model_id=second_model_id,
    eval_data_id=f"second_eval{random.randint(1000,9999)}",
    eval_data=questions_eval+problems_eval+statements_eval,
    test_eval_type="classifier",
    base_model_id=first_model_id,
    sbs=True,
    fuzzy=True,
)

full_eval2 = eval_2.run()

print(json.dumps(full_eval2, indent=2))

Running comparison eval between model versions 21e43229-2a88-4a39-8a00-4218ce11e213 and df2b8819-62e5-4c4b-8e5a-1c082057a5b7
{
  "eval_job_id": "552029668",
  "eval_data_id": "second_eval6400",
  "metrics": {
    "base_accuracy": 0.7666666666666667,
    "base_precision": 1.0,
    "base_recall": 0.7666666666666667,
    "base_f1": 0.8679245283018868,
    "base_fuzzy_accuracy": 0.7666666666666667,
    "base_fuzzy_precision": 1.0,
    "base_fuzzy_recall": 0.7666666666666667,
    "base_fuzzy_f1": 0.8679245283018868,
    "tuned_accuracy": 0.9666666666666667,
    "tuned_precision": 1.0,
    "tuned_recall": 0.9666666666666667,
    "tuned_f1": 0.9830508474576272,
    "tuned_fuzzy_accuracy": 0.9666666666666667,
    "tuned_fuzzy_precision": 1.0,
    "tuned_fuzzy_recall": 0.9666666666666667,
    "tuned_fuzzy_f1": 0.9830508474576272,
    "tuned_win_loss_ratio": 6.0,
    "base_win_loss_ratio": 0.0
  },
  "status": "COMPLETED",
  "predictions": [
    {
      "input": "Can you explain quantum mechanic

### Tuned model success:

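The second model version (the tuned model in this comparison) outperformed the first version (the base), reaching an accuracy of about 97% (29 of 30 eval inputs) versus about 77%.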

### Eval comparison:

The eval output makes it easy to compare model versions overall, and to see exactly where the differences are, so you know exactly where to focus to improve your workflow.
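
One way to find those differences is to split the per-input predictions into inputs the newer version fixed and inputs it regressed on. The sketch below assumes each prediction in a comparison run carries `target`, `base_output`, and `test_output`, as the field names in the eval output suggest; `version_differences` is a hypothetical helper.

```python
def version_differences(predictions):
    """Split comparison predictions into fixes and regressions.

    Assumes each prediction carries `target`, `base_output` (older model) and
    `test_output` (newer model), as the field names in the eval output suggest.
    """
    fixed, regressed = [], []
    for pred in predictions:
        base_ok = pred["base_output"] == pred["target"]
        test_ok = pred["test_output"] == pred["target"]
        if test_ok and not base_ok:
            fixed.append(pred["input"])
        elif base_ok and not test_ok:
            regressed.append(pred["input"])
    return fixed, regressed


# Three rows reconstructed from the missed-eval listings in this notebook.
sample = [
    {"input": "Who's next?", "target": "questions",
     "base_output": "statements", "test_output": "questions"},
    {"input": "I keep losing track of myself", "target": "statements",
     "base_output": "problems", "test_output": "problems"},
    {"input": "What is life?", "target": "questions",
     "base_output": "questions", "test_output": "questions"},
]
fixed, regressed = version_differences(sample)
print("Fixed by the new version:", fixed)
print("Regressed in the new version:", regressed)
```
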

Let's take a look at all of the eval data points that the classifier did not get correct.

In [None]:
print_missed_evals(full_eval2)

Missed evals:
Input: I keep losing track of myself
Expected: statements
Predicted: problems
--------------------------------------------------------------------------------


However, I'm okay with this, as this entry is the most ambiguous one in the eval set.

## Applications:

This classifier can now be used in the backend to classify user queries with minimal latency and provide the state machine with crucial information on the nature of the query.
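
As a rough illustration of that hand-off, the classifier's scores could be mapped to the next state like this. The state names, the `min_prob` threshold, the `fallback` state, and the sample scores are all made up for this sketch; a real backend would define its own.

```python
def route_query(scores, min_prob=0.5, fallback="clarify"):
    """Pick the next state from per-class scores.

    `scores` is a list of {"class_name", "prob"} dicts like the classifier
    returns. State names, `min_prob` and `fallback` are made up for this sketch.
    """
    next_state = {
        "problems": "offer_support",
        "questions": "answer_question",
        "statements": "acknowledge",
    }
    best = max(scores, key=lambda item: item["prob"])
    if best["prob"] < min_prob:
        # Not confident enough: ask the user to clarify instead of guessing.
        return fallback
    return next_state.get(best["class_name"], fallback)


# Illustrative scores, not real classifier output.
confident_scores = [
    {"class_name": "questions", "prob": 0.91},
    {"class_name": "statements", "prob": 0.06},
    {"class_name": "problems", "prob": 0.03},
]
uncertain_scores = [
    {"class_name": "questions", "prob": 0.48},
    {"class_name": "statements", "prob": 0.28},
    {"class_name": "problems", "prob": 0.24},
]
print(route_query(confident_scores))
print(route_query(uncertain_scores))
```

Routing low-confidence inputs to a fallback state keeps a borderline classification from silently steering the conversation.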