<a href="https://colab.research.google.com/github/ScientificStephen/My-Projects/blob/main/A_I_Chatbot_Start.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [51]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Chatbot for Customer Support
Overview
This chatbot is designed to assist businesses in handling customer support queries efficiently. It uses Natural Language Processing (NLP) and Machine Learning (ML) to understand customer intents, provide accurate responses, and support multiple languages. With an accuracy of 97%, it ensures reliable and consistent customer interactions.

Key Features
1. Intent Classification
The chatbot can detect the intent behind customer queries, such as:

Canceling an order.

Tracking an order.

Checking refund policies.

Managing accounts (e.g., creating, deleting, or recovering passwords).

2. Multilingual Support
The chatbot supports multiple languages, including English, Spanish, French, and more.

It automatically detects the language of the customer's query and responds in the same language.

3. Sentiment Analysis
The chatbot analyzes the sentiment of customer queries (positive, negative, or neutral).

It provides empathetic responses to unhappy customers and escalates issues to human agents when necessary.

4. Entity Extraction
The chatbot can extract key information from customer queries, such as:

Order numbers.

Product names.

Dates.

This allows it to provide personalized responses and automate tasks like order tracking or refund processing.

5. High Accuracy
The chatbot achieves an accuracy of 97% in classifying customer intents, ensuring reliable and consistent responses.

How It Works
Input: The customer types a query (e.g., "How do I cancel my order?").

Language Detection: The chatbot detects the language of the query (e.g., English).

Intent Classification: The chatbot identifies the intent behind the query (e.g., cancel_order).

Entity Extraction: The chatbot extracts relevant information (e.g., order number).

Response Generation: The chatbot generates a response based on the detected intent and provides it in the customer's preferred language.

Example Interaction
Customer Query
"¿Cómo cancelo mi pedido?" (How do I cancel my order?)

Chatbot Response
Detects the language as Spanish.

Identifies the intent as cancel_order.

Responds in Spanish:
"Intención detectada: cancel_order. ¿Cómo puedo ayudarte más?"
(Detected intent: cancel_order. How can I assist you further?)

Accuracy
The chatbot achieves an accuracy of 97% in classifying customer intents, thanks to:

A robust machine learning model trained on a large dataset of customer queries.

Advanced NLP techniques for understanding and processing text.

Continuous improvement through feedback and updates.

Business Benefits
24/7 Support: The chatbot provides round-the-clock assistance to customers.

Cost-Effective: Reduces the need for a large customer support team.

Scalable: Handles multiple customer queries simultaneously.

Improved Customer Satisfaction: Provides quick and accurate responses, enhancing the customer experience.

Conclusion
This chatbot is a powerful tool for businesses looking to automate customer support. With its 97% accuracy, multilingual support, and advanced features, it ensures efficient and reliable customer interactions. Whether you're an e-commerce business, a bank, or a healthcare provider, this chatbot can be customized to meet your specific needs.

In [52]:
!pip install googletrans==4.0.0-rc1



In [53]:
!pip install -q kaggle

In [54]:
!pip install langdetect



In [55]:
!pip install translators



In [49]:
# ==============================================================
# STEP 1: IMPORT NECESSARY LIBRARIES
# ==============================================================
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score
import pickle
import nltk
from nltk.corpus import wordnet
import translators as ts  # <-- Using translators for translation
from langdetect import detect  # <-- Using langdetect for language detection
from textblob import TextBlob
import spacy
import logging
import random

# ==============================================================
# STEP 2: DOWNLOAD NECESSARY NLTK DATA AND LOAD spaCy MODEL
# ==============================================================
nltk.download('wordnet')
!python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# ==============================================================
# STEP 3: SET UP LOGGING
# ==============================================================
logging.basicConfig(filename='/content/chatbot.log', level=logging.INFO)

# ==============================================================
# STEP 4: DOWNLOAD AND LOAD THE DATASET
# ==============================================================
# Install Kaggle API and download the dataset
!pip install -q kaggle
from google.colab import files
files.upload()  # Upload your kaggle.json file
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d bitext/bitext-gen-ai-chatbot-customer-support-dataset
!unzip bitext-gen-ai-chatbot-customer-support-dataset.zip

# Load the dataset
data = pd.read_csv('Bitext_Sample_Customer_Support_Training_Dataset_27K_responses-v11.csv')
print("Dataset Overview:")
print(data.head())
print("\nDataset Info:")
print(data.info())

# ==============================================================
# STEP 5: PREPROCESS THE DATA
# ==============================================================
# Drop rows with missing values in 'instruction', 'intent', or 'response' columns
data = data.dropna(subset=['instruction', 'intent', 'response'])

# Combine 'instruction' and 'response' columns for exploratory analysis
data['combined'] = data['instruction'] + " " + data['response']

# ==============================================================
# STEP 6: TRAIN-TEST SPLIT
# ==============================================================
X = data['instruction']  # Input (user queries)
y = data['intent']  # Labels (intents)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ==============================================================
# STEP 7: VECTORIZE TEXT DATA
# ==============================================================
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# ==============================================================
# STEP 8: TRAIN INTENT CLASSIFICATION MODEL
# ==============================================================
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)

# ==============================================================
# STEP 9: EVALUATE THE MODEL
# ==============================================================
predictions = model.predict(X_test_tfidf)
accuracy = accuracy_score(y_test, predictions)
print("\nAccuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, predictions))

# ==============================================================
# STEP 10: SAVE THE VECTORIZER AND MODEL FOR DEPLOYMENT
# ==============================================================
with open('vectorizer.pkl', 'wb') as vec_file:
    pickle.dump(vectorizer, vec_file)
with open('intent_model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
print("\nModel and vectorizer saved successfully.")

# ==============================================================
# STEP 11: ENHANCED MULTILINGUAL CHATBOT
# ==============================================================
def enhanced_chatbot(user_input, target_language="en"):
    # Step 1: Detect language
    detected_lang = detect(user_input)
    print(f"Detected language: {detected_lang}")

    # Step 2: Translate input to English (if necessary)
    if detected_lang != "en":
        user_input_en = ts.translate_text(user_input, to_language="en")
    else:
        user_input_en = user_input

    # Step 3: Extract entities
    entities = {ent.text: ent.label_ for ent in nlp(user_input_en).ents}
    print(f"Extracted entities: {entities}")

    # Step 4: Analyze sentiment
    sentiment = TextBlob(user_input_en).sentiment.polarity
    print(f"Sentiment: {'Positive' if sentiment > 0 else 'Negative' if sentiment < 0 else 'Neutral'}")

    # Step 5: Generate response
    user_input_tfidf = vectorizer.transform([user_input_en])
    intent = model.predict(user_input_tfidf)[0]
    confidence = model.predict_proba(user_input_tfidf).max()
    response_en = f"Detected intent: {intent}. How can I assist you further?" if confidence >= 0.5 else "I'm sorry, I didn't understand that. Can you please rephrase?"

    # Step 6: Translate response to target language
    if target_language != "en":
        response = ts.translate_text(response_en, to_language=target_language)
    else:
        response = response_en

    # Step 7: Log conversation
    logging.info(f"User Input: {user_input}, Response: {response}")

    return response

# ==============================================================
# STEP 12: EXAMPLE USAGE OF THE ENHANCED CHATBOT
# ==============================================================
# Example 1: Spanish input
user_input = "¿Cómo cancelo mi pedido?"
response = enhanced_chatbot(user_input, target_language="es")
print("Response:", response)

# Example 2: French input
user_input = "Quelle est la politique de remboursement ?"
response = enhanced_chatbot(user_input, target_language="fr")
print("Response:", response)

# Example 3: English input
user_input = "How do I track my order?"
response = enhanced_chatbot(user_input, target_language="en")
print("Response:", response)

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.8/12.8 MB[0m [31m41.3 MB/s[0m eta [36m0:00:00[0m
[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')
[38;5;3m⚠ Restart to reload dependencies[0m
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


Saving archive (9).zip to archive (9) (1).zip
cp: cannot stat 'kaggle.json': No such file or directory
chmod: cannot access '/root/.kaggle/kaggle.json': No such file or directory
Dataset URL: https://www.kaggle.com/datasets/bitext/bitext-gen-ai-chatbot-customer-support-dataset
License(s): Community Data License Agreement - Sharing - Version 1.0
Downloading bitext-gen-ai-chatbot-customer-support-dataset.zip to /content
 70% 2.00M/2.87M [00:00<00:00, 2.53MB/s]
100% 2.87M/2.87M [00:01<00:00, 2.96MB/s]
Archive:  bitext-gen-ai-chatbot-customer-support-dataset.zip
  inflating: Bitext_Sample_Customer_Support_Training_Dataset_27K_responses-v11.csv  
Dataset Overview:
   flags                                        instruction category  \
0      B   question about cancelling order {{Order Number}}    ORDER   
1    BQZ  i have a question about cancelling oorder {{Or...    ORDER   
2   BLQZ    i need help cancelling puchase {{Order Number}}    ORDER   
3     BL         I need to cancel purchase {