# **Intent Detection Model**

# 1. Framing the Problem (Machine Learning)
**Problem Type:** This problem is a classic supervised learning task, specifically a classification problem. The goal is to classify user inputs into predefined intent categories based on labeled training data.

**Features and Labels:**

**Features:** The user input text (one sentence per example).

**Labels:** The intent categories (e.g., "EMI", "Order_Status", "Return_Exchange").

**Approach:** Preprocess the text with standard NLP steps (e.g., tokenization, lemmatization), then feed it to a text classification algorithm (e.g., Naive Bayes) as one end-to-end pipeline.

# 2. Pros and Cons of Formulations
**Approach Considered:**

**Naive Bayes Classifier with CountVectorizer:**

**Pros:** Simple, interpretable, efficient for text classification.

**Cons:** Assumes feature independence; may struggle with nuanced text.
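To make the independence assumption concrete, the standard formulation of multinomial Naive Bayes (the variant used later in this notebook) scores each intent $c$ for a sentence with word counts $x_1, \dots, x_V$ as shown below. The symbols $N_{ci}$, $N_c$, and $V$ are notation introduced here for illustration.

$$
\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{V} x_i \log P(w_i \mid c) \right],
\qquad
P(w_i \mid c) = \frac{N_{ci} + \alpha}{N_c + \alpha V}
$$

Here $N_{ci}$ is the number of times word $w_i$ occurs in training sentences of intent $c$, $N_c = \sum_i N_{ci}$, $V$ is the vocabulary size, and $\alpha$ is the smoothing parameter. Because the score is a plain sum over individual words, word order and word interactions are ignored, which is where nuanced phrasing can be lost.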

**Alternative Options:**

**Logistic Regression:** Does not assume feature independence and is often more accurate on text, but training is iterative and therefore slower than Naive Bayes.

**Deep Learning** (e.g., LSTM, Transformers): Higher potential accuracy but requires more data and computation.

**Preprocessing:**
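Using spaCy for lemmatization and stopword removal shrinks the vocabulary and merges inflected forms (e.g., "options" and "option"), at the cost of extra preprocessing time. Very short queries can lose most of their tokens once stopwords are removed.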

# 3. Building and Assessing the Model
**Model Pipeline:**

Data loading, preprocessing, model training, and evaluation.

**Results Interpretation:**
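Accuracy gives a single overall score, while the classification report breaks performance down by intent (precision, recall, F1-score, and support). If scores vary widely between classes, the usual causes are too few examples for some intents or overlapping vocabulary between intents, and the data or the model may need adjusting.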
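As a minimal sketch of reading per-class results, the snippet below builds a classification report and a confusion matrix for a handful of made-up predictions. The labels and predictions are hypothetical and not taken from the dataset. `zero_division=0` is passed so that a class that is never predicted gets a precision of 0.0 without raising a warning.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted intents, for illustration only.
y_true = ["EMI", "EMI", "COD", "COD", "OFFERS", "OFFERS"]
y_hat = ["EMI", "EMI", "COD", "EMI", "COD", "COD"]

labels = sorted(set(y_true))  # fixed label order for the report and the matrix

# output_dict=True returns nested dicts instead of a formatted string.
report = classification_report(
    y_true, y_hat, labels=labels, zero_division=0, output_dict=True
)

# Rows are true intents, columns are predicted intents.
cm = confusion_matrix(y_true, y_hat, labels=labels)

for label in labels:
    print(label, "precision:", report[label]["precision"],
          "recall:", report[label]["recall"])
print(cm)
```

Off-diagonal cells of the confusion matrix show which intents are mistaken for which, which is often more actionable than a single accuracy figure.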

# 4. Justification and Improvements
**Why Results Make Sense:**
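A simple model like Naive Bayes is a reasonable baseline for intent classification and tends to work well when each intent has distinctive vocabulary. On this dataset the test accuracy is about 0.61, and intents with only one or two test examples (e.g., COMPARISON, DELAY_IN_DELIVERY) score 0.00, which likely reflects how little data each class has rather than a flaw in the pipeline.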

**Potential Improvements:**
**Switch to TF-IDF Vectorization:** Reduces the weight of common words, potentially improving model generalization.
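A minimal sketch of this change, assuming the same pipeline shape as the notebook but with `TfidfVectorizer` in place of `CountVectorizer`. The toy sentences and labels are invented for illustration and are not from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data, for illustration only.
toy_texts = [
    "do you offer emi",
    "zero percent emi option",
    "where is my order",
    "track my order status",
    "how do i return the mattress",
    "exchange or return policy",
]
toy_labels = [
    "EMI", "EMI",
    "ORDER_STATUS", "ORDER_STATUS",
    "RETURN_EXCHANGE", "RETURN_EXCHANGE",
]

# TF-IDF down-weights words that appear in many sentences.
tfidf_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
tfidf_model.fit(toy_texts, toy_labels)

prediction = tfidf_model.predict(["emi option please"])
print(prediction[0])
```

Whether TF-IDF actually helps on the real data is an empirical question, so it is worth comparing both vectorizers on the same split.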

**Hyperparameter Tuning:** Adjusting the alpha parameter in Naive Bayes or exploring grid search for optimal settings.
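One way to sketch the grid search, assuming a pipeline built with `make_pipeline`, which names the classifier step `multinomialnb`. The candidate alpha values, the toy data, and the choice of macro-F1 as the scoring metric are illustrative assumptions, not settings from this notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data, for illustration only.
toy_texts = [
    "do you offer emi",
    "zero percent emi option",
    "where is my order",
    "track my order status",
    "how do i return the mattress",
    "exchange or return policy",
]
toy_labels = [
    "EMI", "EMI",
    "ORDER_STATUS", "ORDER_STATUS",
    "RETURN_EXCHANGE", "RETURN_EXCHANGE",
]

pipeline = make_pipeline(CountVectorizer(), MultinomialNB())

# Step parameters are addressed as "<step name>__<parameter>".
param_grid = {"multinomialnb__alpha": [0.01, 0.1, 0.5, 1.0]}

# cv=2 only because the toy data has two examples per class.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(toy_texts, toy_labels)

print(search.best_params_, search.best_score_)
```

On the real dataset a larger `cv` value is more appropriate, provided every intent has at least that many examples.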

**Model Upgrade:** Consider using LogisticRegression or tree-based algorithms (e.g., RandomForest) for better decision boundaries.
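A minimal sketch of the logistic regression variant. The TF-IDF features, `max_iter=1000`, and `class_weight="balanced"` are assumptions chosen for illustration, and the toy data is invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data, for illustration only.
toy_texts = [
    "do you offer emi",
    "zero percent emi option",
    "where is my order",
    "track my order status",
    "how do i return the mattress",
    "exchange or return policy",
]
toy_labels = [
    "EMI", "EMI",
    "ORDER_STATUS", "ORDER_STATUS",
    "RETURN_EXCHANGE", "RETURN_EXCHANGE",
]

# class_weight="balanced" reweights classes inversely to their frequency,
# which can help when some intents have very few examples.
lr_model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
lr_model.fit(toy_texts, toy_labels)

# Per-intent probabilities for one query, in the order of lr_model.classes_.
probs = lr_model.predict_proba(["can i return my order"])[0]
for label, p in zip(lr_model.classes_, probs):
    print(label, round(p, 3))
```

The per-intent probabilities also make it possible to route low-confidence queries to a fallback instead of forcing a prediction.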

**Advanced Models:** Implement spaCy-based embeddings or switch to deep learning models for richer semantic understanding.

**Rationale:** These adjustments should help the model generalize better to unseen data and potentially handle more complex intent nuances.

In [1]:
# Installing the libraries this notebook uses, plus the spaCy English model
!pip install pandas spacy scikit-learn
!python -m spacy download en_core_web_sm



In [2]:
# Importing necessary libraries
import pandas as pd
import spacy
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn import metrics

In [3]:
# Loading the spaCy language model
nlp = spacy.load('en_core_web_sm')

In [4]:
# Loading the dataset
data = pd.read_csv('/content/sofmattress_train.csv')

In [5]:
# Inspecting the first rows to confirm the 'sentence' and 'label' columns
print(data.head())

                                         sentence label
0                    You guys provide EMI option?   EMI
1  Do you offer Zero Percent EMI payment options?   EMI
2                                         0% EMI.   EMI
3                                             EMI   EMI
4                           I want in installment   EMI


In [6]:
# Preprocessing the text data using spaCy: lemmatize each sentence and drop
# stopwords and non-alphabetic tokens (NER and the parser are disabled for speed)
preprocessed_texts = []
for doc in nlp.pipe(data['sentence'], disable=["ner", "parser"]):
    tokens = [token.lemma_ for token in doc if not token.is_stop and token.is_alpha]
    preprocessed_texts.append(" ".join(tokens))

In [7]:
# Storing the preprocessed text in a new column
data['processed_text'] = preprocessed_texts

In [8]:
# Splitting the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data['processed_text'], data['label'], test_size=0.2, random_state=42
)
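The classification report further down shows several intents with only one or two test examples, so their per-class scores are noisy. One possible mitigation, not used in this notebook, is a stratified split, which keeps the class proportions the same in the training and test sets. The sketch below uses invented placeholder data; note that `stratify` requires at least two examples of every class.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical data: 3 intents with 5 placeholder sentences each.
toy_texts = [f"sentence {i}" for i in range(15)]
toy_labels = ["EMI"] * 5 + ["COD"] * 5 + ["OFFERS"] * 5

# stratify=toy_labels preserves the label distribution in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_texts, toy_labels, test_size=0.4, random_state=42, stratify=toy_labels
)

print(Counter(y_tr))
print(Counter(y_te))
```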

In [9]:
# Creating a pipeline with CountVectorizer and MultinomialNB classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())



In [10]:
# Training the model
model.fit(X_train, y_train)



In [11]:
# Predicting on the test set
y_pred = model.predict(X_test)



In [12]:
# Printing the classification report
print(metrics.classification_report(y_test, y_pred))



                       precision    recall  f1-score   support

100_NIGHT_TRIAL_OFFER       1.00      0.75      0.86         4
   ABOUT_SOF_MATTRESS       0.50      0.67      0.57         3
         CANCEL_ORDER       1.00      1.00      1.00         2
        CHECK_PINCODE       1.00      1.00      1.00         1
                  COD       0.67      1.00      0.80         2
           COMPARISON       0.00      0.00      0.00         1
    DELAY_IN_DELIVERY       0.00      0.00      0.00         2
         DISTRIBUTORS       0.60      0.75      0.67         8
                  EMI       0.57      0.80      0.67         5
        ERGO_FEATURES       1.00      0.25      0.40         4
             LEAD_GEN       0.33      0.25      0.29         4
        MATTRESS_COST       1.00      1.00      1.00         3
               OFFERS       1.00      0.67      0.80         3
         ORDER_STATUS       0.33      1.00      0.50         1
       ORTHO_FEATURES       1.00      1.00      1.00  

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


In [13]:
# Printing the accuracy score
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))


Accuracy: 0.6060606060606061
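Accuracy alone can hide weak classes when intents are imbalanced. As a hedged illustration with made-up labels (not the notebook's predictions), the sketch below compares accuracy with macro-averaged F1, which weights every intent equally regardless of its size.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced labels: the model always predicts the majority intent.
y_true = ["EMI"] * 8 + ["COD"] * 2
y_hat = ["EMI"] * 10

acc = accuracy_score(y_true, y_hat)

# Macro F1 is the unweighted mean of per-class F1 scores;
# zero_division=0 scores the never-predicted class as 0 without a warning.
macro_f1 = f1_score(y_true, y_hat, average="macro", zero_division=0)

print("Accuracy:", acc)
print("Macro F1:", macro_f1)
```

Here accuracy looks acceptable while macro F1 is much lower, because the minority intent is never predicted correctly.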
