<a href="https://colab.research.google.com/github/YasidaWanigatunga/Intellihack_Innovision_2/blob/luckseegan_LLM/Task2_innovation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### ***NLU Task 2 - Innovation***

In [3]:
import pandas as pd
import numpy as np
import string
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import CountVectorizer

In [4]:
df = pd.read_csv('data.csv')
df = df[['intent', 'example']]
df.head()

Unnamed: 0,intent,example
0,greet_smalltalk,o meet you- hi good morning
1,greet_smalltalk,- hey everyone
2,greet_smalltalk,- nice to meet you everyone
3,greet_smalltalk,- hello hello buddy today
4,greet_smalltalk,- hi hello buddy there



### **Preprocessing**



In [5]:

# Define a function to remove punctuation marks
def remove_punctuation(text):
    translator = str.maketrans('', '', string.punctuation)
    return text.translate(translator)

# Apply the function to remove punctuation marks from the example column
df['example'] = df['example'].apply(remove_punctuation)

# Define a function to count the number of words in a text
def count_words(text):
    return len(text.split())

# Apply the function to count words and create a new column 'word_count'
df['word_count'] = df['example'].apply(count_words)

# Lowercase the text in the example column (punctuation was already removed above)
df['example'] = df['example'].str.lower()


df.head()

Unnamed: 0,intent,example,word_count
0,greet_smalltalk,o meet you hi good morning,6
1,greet_smalltalk,hey everyone,2
2,greet_smalltalk,nice to meet you everyone,5
3,greet_smalltalk,hello hello buddy today,4
4,greet_smalltalk,hi hello buddy there,4


The preprocessing cell prepares the text examples for intent classification in three steps. First, punctuation is removed with a custom function, so that stray characters such as the leading `-` in the raw examples do not become part of the text. Second, the number of words in each example is counted and stored in the `word_count` column, which gives a quick view of how long the examples are. Third, all characters are converted to lowercase, so that "Hello" and "hello" are treated as the same word. Together, these steps make the examples consistent before they are vectorized and passed to the classifier.
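The same three steps can be sketched on a single string, without pandas. The helper name `preprocess` and the sample sentence are illustrative and not part of the notebook.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def preprocess(text):
    """Return (cleaned_text, word_count) for one raw example."""
    no_punct = text.translate(_PUNCT_TABLE)  # step 1: strip punctuation
    word_count = len(no_punct.split())       # step 2: count whitespace-separated words
    cleaned = no_punct.lower()               # step 3: lowercase
    return cleaned, word_count


cleaned, n_words = preprocess("- Nice to meet you, everyone!")
print(repr(cleaned), n_words)
```

Removing a leading `-` leaves a leading space behind; this does not affect the word count, and `str.strip()` could be applied if exact string equality matters.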

### **Model Building and Training**

In [6]:
# Split the dataset into training and testing sets
X = df['example']
y = df['intent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Vectorize the text data
vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train the Naive Bayes classifier
classifier = MultinomialNB()
classifier.fit(X_train_vectorized, y_train)

# Predict on the test set
y_pred = classifier.predict(X_test_vectorized)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 1.0


### Model Training and Evaluation

To build and evaluate the intent classification model, the dataset was split into training and testing sets using the `train_test_split` function from the `sklearn.model_selection` module. The feature data (`X`) consisted of the preprocessed text examples, while the target data (`y`) contained the corresponding intent labels. The dataset was split with a test size of 20% and a random state of 42 to ensure reproducibility.
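The idea behind a seeded hold-out split can be sketched in plain Python. This is only an illustration: `holdout_split` is a hypothetical helper, the toy data is made up, and the shuffle order it produces will not match the one `train_test_split` generates for the same seed.

```python
import math
import random


def holdout_split(examples, labels, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed, then hold out a `test_size` fraction."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = math.ceil(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(examples, train_idx), pick(examples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Made-up toy data, only to show the shapes of the result.
texts = [f"example {i}" for i in range(10)]
intents = ["greet_smalltalk" if i % 2 == 0 else "deny_smalltalk" for i in range(10)]

X_train, X_test, y_train, y_test = holdout_split(texts, intents)
print(len(X_train), len(X_test))
```

Calling the function again with the same seed returns the same partition, which is the property `random_state=42` provides in the notebook.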

For feature representation, the text data was vectorized using the `CountVectorizer` from `sklearn.feature_extraction.text`. The vectorizer builds a vocabulary from the training examples and converts each example into a numerical vector of word counts, with one feature per vocabulary word. The same fitted vocabulary is then used to transform the test examples.
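A bag-of-words representation can be sketched with the standard library to show what the count vectors look like. This is a simplified stand-in, not scikit-learn's implementation: it splits on whitespace only, whereas `CountVectorizer` by default also lowercases the text, drops single-character tokens, and returns a sparse matrix. The function names are hypothetical.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct token in the training texts to a column index."""
    tokens = sorted({tok for text in texts for tok in text.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(text, vocabulary):
    """Return per-column token counts; tokens outside the vocabulary are ignored."""
    counts = Counter(text.split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


train_texts = ["hello hello buddy today", "hi hello buddy there"]
vocab = build_vocabulary(train_texts)
print(vocab)                                               # token -> column index
print(to_count_vector("hello hello buddy today", vocab))   # [1, 2, 0, 0, 1]
print(to_count_vector("hello stranger", vocab))            # [0, 1, 0, 0, 0]
```

The last call shows why the vocabulary is fitted on the training set only: a word never seen in training ("stranger") has no column and contributes nothing to the vector.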

A Multinomial Naive Bayes classifier was then trained on the vectorized training data using the `MultinomialNB` class from `sklearn.naive_bayes`. Naive Bayes is a probabilistic classifier known for its simplicity and effectiveness in text classification tasks.
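To make the probabilistic model concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is an illustration under simplifying assumptions (whitespace tokens, a tiny made-up training set), not the code scikit-learn runs; the class name `TinyMultinomialNB` is hypothetical. The smoothing value `alpha=1.0` matches the default of `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes_ = sorted(set(labels))
        self.vocab_ = {tok for text in texts for tok in text.split()}
        self.token_counts_ = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts_[label].update(text.split())
        class_counts = Counter(labels)
        # Log prior: share of training examples that belong to each class.
        self.log_prior_ = {c: math.log(class_counts[c] / len(labels))
                           for c in self.classes_}
        # Total token count per class, used in the smoothed denominator.
        self.total_ = {c: sum(self.token_counts_[c].values())
                       for c in self.classes_}
        return self

    def _log_likelihood(self, token, c):
        numerator = self.token_counts_[c][token] + self.alpha
        denominator = self.total_[c] + self.alpha * len(self.vocab_)
        return math.log(numerator / denominator)

    def predict_proba(self, text):
        # Tokens never seen in training are skipped, as a count vector would do.
        tokens = [t for t in text.split() if t in self.vocab_]
        scores = {c: self.log_prior_[c]
                  + sum(self._log_likelihood(t, c) for t in tokens)
                  for c in self.classes_}
        # Normalize in a numerically stable way (subtract the max log score).
        top = max(scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


# Made-up toy training set; the label names follow the notebook's intents.
train_texts = ["hey everyone", "hi hello buddy there",
               "see you later", "bye bye for now"]
train_labels = ["greet_smalltalk", "greet_smalltalk",
                "goodbyes_smalltalk", "goodbyes_smalltalk"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
probs = model.predict_proba("hello there")
print(max(probs, key=probs.get), probs)
```

The "naive" part is the sum inside `predict_proba`: each token's log-likelihood is added independently, ignoring word order and interactions between words.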

After training the classifier, predictions were made on the test set, and the model's performance was evaluated using accuracy as the metric. Accuracy, calculated using the `accuracy_score` function from `sklearn.metrics`, measures the proportion of correctly classified instances out of all instances in the test set.
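The accuracy definition above amounts to a few lines of plain Python. The function below is a sketch of the metric rather than scikit-learn's code, and the label lists are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = ["greet_smalltalk", "deny_smalltalk", "bot_challenge", "affirm_smalltalk"]
y_pred = ["greet_smalltalk", "deny_smalltalk", "bot_challenge", "deny_smalltalk"]
print(accuracy(y_true, y_pred))  # 3 of 4 correct -> 0.75
```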

The model achieved an accuracy of 1.0 on the test set, meaning every held-out example was assigned its correct intent.


### **Confidence score evaluation**

In [7]:
# Minimum top-class probability required to accept a prediction
CONFIDENCE_THRESHOLD = 0.7

# Define a fallback mechanism
def handle_fallback():
    # Fallback mechanism logic goes here
    return "NLU fallback: Intent could not be confidently determined"

# Function to classify intent and calculate confidence score
def classify_intent(text):
    # Vectorize the text
    text_vectorized = vectorizer.transform([text])
    # Predict class probabilities for the single input (first and only row)
    probabilities = classifier.predict_proba(text_vectorized)[0]
    # The highest class probability serves as the confidence score
    best_index = probabilities.argmax()
    predicted_intent = classifier.classes_[best_index]
    confidence_score = probabilities[best_index]
    if confidence_score >= CONFIDENCE_THRESHOLD:
        return predicted_intent, confidence_score
    else:
        return "Fallback: Intent could not be confidently determined", confidence_score



Predicted Intent: bot_challenge, Confidence Score: 0.9381434014190176


In [14]:
# Define a function to classify intent and handle fallback
def classify_and_handle_fallback(input_text):
    # Classify intent
    predicted_intent, confidence_score = classify_intent(input_text)
    # Check if fallback is needed
    if predicted_intent.startswith("Fallback"):
        # Fallback triggered, handle fallback
        response = handle_fallback()
    else:
        # No fallback, return predicted intent and confidence score
        response = f"Predicted Intent: {predicted_intent}, Confidence Score: {confidence_score}"
    return response


inputs = [
    "real human",
    "Hello",
    "See you later",
    "happiness",
    "for sure",
    "not true",
    "weather like"
]

# Iterate over the inputs and classify intents
for input_text in inputs:
    response = classify_and_handle_fallback(input_text)
    print(f"Input Text: '{input_text}' - Response: {response}")


Input Text: 'real human' - Response: Predicted Intent: bot_challenge, Confidence Score: 0.9381434014190176
Input Text: 'Hello' - Response: Predicted Intent: greet_smalltalk, Confidence Score: 0.8841901166897334
Input Text: 'See you later' - Response: Predicted Intent: goodbyes_smalltalk, Confidence Score: 0.9990673999736447
Input Text: 'happiness' - Response: NLU fallback: Intent could not be confidently determined
Input Text: 'for sure' - Response: Predicted Intent: affirm_smalltalk, Confidence Score: 0.8062317884484574
Input Text: 'not true' - Response: Predicted Intent: deny_smalltalk, Confidence Score: 0.770831148460746
Input Text: 'weather like' - Response: NLU fallback: Intent could not be confidently determined




A fallback mechanism is implemented to handle cases where the intent classification confidence score falls below a predefined threshold. The `handle_fallback` function defines the logic for the fallback mechanism, returning a predefined message indicating that the intent could not be confidently determined.

The `classify_intent` function takes a text input, vectorizes it with the previously fitted `vectorizer`, and predicts the intent probabilities using the trained classifier. The maximum probability and the corresponding intent are extracted from the probabilities array. If the confidence score (the maximum probability) is at least the threshold of 0.7, the predicted intent and confidence score are returned. Otherwise, a message starting with "Fallback" is returned together with the confidence score, which signals to the caller that the fallback mechanism should be triggered.
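The thresholding step can be isolated from the model as a small function over a probability mapping. This is a sketch with assumptions: `pick_intent` is a hypothetical helper, the probability values are made up, and it returns `None` for a rejected prediction instead of the "Fallback" string used in the notebook.

```python
from typing import Dict, Optional, Tuple


def pick_intent(probabilities: Dict[str, float],
                threshold: float = 0.7) -> Tuple[Optional[str], float]:
    """Return (intent, confidence); intent is None when confidence < threshold."""
    intent, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return intent, confidence
    return None, confidence


# Made-up probability distributions for illustration.
confident = {"greet_smalltalk": 0.88, "deny_smalltalk": 0.07, "bot_challenge": 0.05}
uncertain = {"greet_smalltalk": 0.40, "deny_smalltalk": 0.35, "bot_challenge": 0.25}

print(pick_intent(confident))  # ('greet_smalltalk', 0.88)
print(pick_intent(uncertain))  # (None, 0.4)
```

Returning `None` rather than a sentinel string avoids any clash with a real intent whose name happens to start with "Fallback", at the cost of requiring the caller to check for `None`.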

The example usage classifies the input text "real human", which yields the intent `bot_challenge` with a confidence score of about 0.94. The `classify_and_handle_fallback` function then applies the same check to a list of inputs: it calls `classify_intent`, and if the returned intent starts with "Fallback" (meaning the confidence score was below the threshold), it invokes `handle_fallback` to provide the fallback response. Otherwise, it builds a response containing the predicted intent and confidence score.

In the output, five of the seven inputs are classified with confidence scores between roughly 0.77 and 1.00, while "happiness" and "weather like" fall below the threshold and receive the fallback response.
