### **Heuristic Approach:**
1. Rule-based system: Define rules for answering questions based on keywords, synonyms, and context.
2. Keyword extraction: Identify key phrases from questions and match with summary text.
3. Pattern matching: Use regular expressions or string matching algorithms.
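
The three heuristic ideas above can be combined in a few lines. The following is a minimal sketch using only the standard library; the rule patterns, the synonym map, and the two-keyword threshold are illustrative assumptions, not part of the dataset used later in this notebook.

```python
import re

# Hypothetical rules: (compiled pattern, canned answer). Illustrative only.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I assist you today?"),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye! Have a great day!"),
]

# Hypothetical synonym map used to normalise keywords before matching.
SYNONYMS = {"modify": "edit", "change": "edit", "request": "trip"}


def extract_keywords(text):
    """Lower-case, split on non-letters, and map synonyms to a canonical word."""
    words = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS.get(w, w) for w in words}


def rule_based_answer(user_input, faq):
    """Return an answer from the regex rules or the best keyword overlap, else None."""
    # 1. Pattern matching: the first rule whose regex matches wins.
    for pattern, answer in RULES:
        if pattern.search(user_input):
            return answer

    # 2. Keyword extraction: pick the FAQ question sharing the most keywords.
    user_keywords = extract_keywords(user_input)
    best_answer, best_overlap = None, 0
    for question, answer in faq.items():
        overlap = len(user_keywords & extract_keywords(question))
        if overlap > best_overlap:
            best_answer, best_overlap = answer, overlap

    # Require at least two shared keywords to avoid matching on one common word.
    return best_answer if best_overlap >= 2 else None


faq = {"Can a trip request be edited after submitting it?": "Yes, before it has been approved."}
print(rule_based_answer("hello there", faq))
print(rule_based_answer("can I modify my trip", faq))
print(rule_based_answer("weather", faq))
```

Returning `None` when nothing matches lets a later stage (for example an ML model) take over.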


### **Machine Learning Approach:**
1. Text classification: Train models to classify questions into categories.
2. Named Entity Recognition (NER): Identify entities in questions and summary.
3. Question Answering (QA) models: Utilize models like BERT, RoBERTa, or XLNet.
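
To make the text classification item concrete, here is a small sketch of a multinomial Naive Bayes classifier written with the standard library only. The category labels (`greeting`, `trip`, `support`) and the tiny training set are assumptions made for illustration; in practice scikit-learn's `MultinomialNB` would likely be used instead of this hand-rolled class.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen during training carry no information; ignore them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled examples.
texts = [
    "Hi", "Hello", "How are you?",
    "Can a trip request be edited after submitting it?",
    "How do I know if my trip has been approved?",
    "Who do I contact for technical support?",
]
labels = ["greeting", "greeting", "greeting", "trip", "trip", "support"]

clf = NaiveBayesClassifier().fit(texts, labels)
print(clf.predict("hello how are you"))
print(clf.predict("is my trip approved"))
```

Once a question is assigned to a category, a category-specific handler (a greeting reply, an FAQ lookup, a QA model) can produce the answer.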


### **Steps:**
1. Preprocess data: Tokenize, remove stop words, and lemmatize.
2. Train models: Use scikit-learn, TensorFlow, or PyTorch.
3. Integrate heuristic and ML approaches.
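
The steps above can be sketched as a preprocessing function plus a fallback chain. This is a simplified illustration: the stop-word list and the suffix-stripping `simple_lemma` are stand-ins for NLTK's stop words and `WordNetLemmatizer`, and `heuristic` and `ml_stub` are hypothetical placeholders for the real approaches.

```python
import re

# Small illustrative stop-word list; a real project would use NLTK's or spaCy's.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does", "i", "my",
              "to", "of", "what", "how", "can"}


def simple_lemma(word):
    """Crude suffix stripping, used here as a stand-in for a real lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, and reduce each token to a rough base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [simple_lemma(t) for t in tokens if t not in STOP_WORDS]


def answer_with_fallback(user_input, approaches, default="Sorry, I am not able to answer."):
    """Try each approach in order; the first non-None result wins."""
    tokens = preprocess(user_input)
    for approach in approaches:
        answer = approach(tokens)
        if answer is not None:
            return answer
    return default


def heuristic(tokens):
    # Hypothetical keyword rule.
    return "Go to 'My Trip Requests' and click 'Edit'." if {"edit", "trip"} <= set(tokens) else None


def ml_stub(tokens):
    # Placeholder for a trained model's prediction.
    return "Please contact the project support team." if "support" in tokens else None


print(preprocess("How can I edit my trips?"))
print(answer_with_fallback("How can I edit my trips?", [heuristic, ml_stub]))
print(answer_with_fallback("Who do I contact for technical support?", [heuristic, ml_stub]))
print(answer_with_fallback("weather", [heuristic, ml_stub]))
```

Each approach must return `None` when it has no confident answer; otherwise later approaches and the default message are never reached.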


### **Some popular ML models for QA:**
1. BERT (Bidirectional Encoder Representations from Transformers)
2. RoBERTa (Robustly optimized BERT approach)
3. XLNet (generalized autoregressive pretraining, built on Transformer-XL)
4. DistilBERT (Distilled BERT)


### **Tools and Libraries:**
1. NLTK (Natural Language Toolkit)
2. spaCy (Modern NLP library)
3. scikit-learn (Machine learning library)
4. TensorFlow or PyTorch (Deep learning libraries)
5. Rasa or Dialogflow (Chatbot development frameworks)



In [41]:
# Summary
summary_text = [
    'Initiate Trips table provides information about User login, Landing page, My Trip list, and Create new Trip.',
    'Driver Manager Actions table provides information about actions such as adding a driver, editing driver details, adding new trucks.',
    'Driver / Dispatcher table provides information about the Driver / Dispatcher List page, Truck Status, and Pre-Trip Checklist.',
    'Dispatcher Actions table provides information about actions related to dispatchers, including the Dispatcher List page.',
    'Compliance Assistant table provides information about the Compliance Assistant Trip List, Truck List, and checklists.',
    'Compliance Supervisor table provides information about the Compliance Supervisor Trip List and checklists.',
    'HV Assigner table provides information about Truck Assignment Board, Trip Detail, and Assign Load.',
    'HV Assigner Manager table provides information about Trip Approval Status Board and Trip Detail.',
    'Planning & Consolidation table provides information about the Segment Requester.',
    "JMS Coordinator table provides information about the JMS Coordinator List and the JMS id Entry form.",
    'Receiving table provides information about the HV Receiver Trip List and the HV Receiver Checklist.'
]
summary_questions = [
    'What is the Initiate Trips table about?',
    'What actions can the Driver Manager perform?',
    'What does the Driver / Dispatcher table provide?',
    'What actions does the Dispatcher Actions table cover?',
    'What does the Compliance Assistant table provide?',
    'What information does the Compliance Supervisor table contain?',
    'What does the HV Assigner table cover?',
    'What does the HV Assigner Manager table contain?',
    'What information is in the Planning & Consolidation table?',
    "What does the JMS Coordinator table include?",
    'What information is in the Receiving table?'
]

# Questions and answers
# Merged Dataset
questions = [
    {
        'question': "Hi",
        'answer': "Hello! How can I assist you today?"
    },
    {
        'question': "Hello",
        'answer': "Hi there! What can I do for you?"
    },
    {
        'question': "How are you?",
        'answer': "I'm just a program, but I'm doing great! How about you?"
    },
    {
        'question': "What is your name?",
        'answer': "I'm your friendly chatbot!"
    },
    {
        'question': "Tell me a joke.",
        'answer': "Why don't scientists trust atoms? Because they make up everything!"
    },
    {
        'question': "How fast are you?",
        'answer': "I'm as fast as a program can be!"
    },
    {
        'question': "Goodbye",
        'answer': "Goodbye! Have a great day!"
    },
    {
        'question': "Thank you",
        'answer': "You're welcome!"
    },
    {
        'question': "What can you do?",
        'answer': "I can answer questions and chat with you!"
    },
    {
        'question': 'Can a trip request be edited after submitting it?',
        'answer': "Yes, both you and approver 1 from your segment can edit your fleet request before it has been approved. Go to 'My Trip Requests', select the request you want to edit, and click 'Edit'."
    },
    {
        'question': 'Can I use data of a previously submitted trip in a new trip request?',
        'answer': "Yes. Go to 'My Trip Requests', select the request you want to copy data from and click “copy to new request”. You will be taken to the new trip request form with pre-filled data as per clicked Trip ID."
    },
    {
        'question': 'How do I know if my trip has been approved?',
        'answer': 'You will receive a notification once your request has been approved. You can also check the status on the ‘Trip Details’ page.'
    },
    {
        'question': 'What should I do if my submitted trip is denied for approval?',
        'answer': 'If your request is denied, you will receive a notification with the reason. You can submit a new trip after making the changes mentioned in the rejection remarks.'
    },
    {
        'question': 'Who do I contact for technical support?',
        'answer': 'For technical support, please contact our project support team at - Anjani (before 6:30 PM (KSA)) - anjani@oges.co - +91 96 5465 80 01 & Sanskar (after 6:30 PM (KSA)) - sanskar.jain@oges.co - +91 98 2134 88 11'
    },
    {
        'question': 'How can I report application bugs or feature improvement requests in HVMS application?',
        'answer': "You can report application bugs or feature improvement requests by clicking the 'Feedback' button on the right side."
    },
    {
        'question': 'Are there any training resources available for new users?',
        'answer': 'Yes. To learn the overall workflow of the HVMS application, you can download the application workflow help file from the “Help” section.'
    },
    {
        'question': 'User login',
        'answer': '''
                    Step 1 - To log in, the user must be connected to the NESR official network.
                    Step 2 - Open the link oges.nesr.com.
                    Step 3 - NESR employees can log in with their LDAP ID.
                    Step 4 - Click the login button.
                    Step 5 - You will then be logged in to the website.
                '''
    },
    {
        'question': 'Landing page',
        'answer': 'When landing on the NESR platform from different applications, to enter the HVMS application, click on the HVMS icon. Then you can access the HVMS platform.'
    },
    {
        'question': 'My Trip list',
        'answer': '''
                    Step 1 - For the Requester page, you will land on the Requester My Trip List page.
                    Step 2 - On the Trip List page, the requester can see all their requested trips.
                '''
    },
    {
        'question': 'Create new Trip',
        'answer': '''
                    Step 1 - Click on the 'Create New Trip' text link, which is located on the menu bar. By clicking on 'Create New Trip', the requester will enter the Trip Request form.
                    Step 2 - Every requester has a different segment. When the requester requests a trip, the trip goes under the requester's segment.
                    Step 3 - In the Requested Date and Time field, the requester can select the date on which the trip will start. 
                    Step 4 - The requester can select the Trip Type (e.g., Tool movement) & Priority (e.g., normal) of the trip. 
                    Step 5 - The user can fill in all the trip details and click the "Submit" button. If the user wants to add more loads, they can click the "Add Load" button to include additional loads in a single trip.
                    Step 6 - If the user does not have all the necessary data, they can save the request as a draft by clicking the "Save as Draft" button.
                    Step 7 - Once the trip is created, it will appear on the list page. On this page, the user can edit the trip by clicking the "Edit" button.
                    Step 8 - Once the user clicks the "Edit" button, they will land on the Edit Trip page. On this page, if the user makes any changes, they can click the "Submit" button. If they do not make any changes and wish to keep the existing data, they can click the "Discard Changes" button. By clicking either button, the user will return to the Trip List page.
                    Step 9 - If the user does not want to proceed with the trip, they can cancel it by clicking the "Cancel" button. After clicking the "Cancel" button, a cancellation popup will appear. In the popup, the terms and conditions will be mentioned. If the user agrees with them, they can click the "Cancel Trip" button, and the trip will be canceled.
                    Step 10 - On this page, only the trips that are saved as drafts will be displayed.
                    Step 11 - To complete those trips, click the "Edit" button, fill in all the data, and then click the "Submit" button. After clicking the "Submit" button, the trip will disappear from this page and will appear on the Trip List page.
                    Step 12 - On this page, all the requested trips are visible.
                    Step 13 - On the Trip Detail page, all the requested trips are visible.
                '''
    }
]


In [None]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB

# Download required NLTK resources
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Preprocess questions for semantic similarity
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform([q['question'] for q in questions])


# load dataset
def load_dataset(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    summary_text = []
    summary_questions = []
    questions = []

    section = None

    for line in lines:
        line = line.strip()
        if line.startswith("# Summary"):
            section = "summary"
            continue
        elif line.startswith("# Questions Answers"):
            section = "questions"
            continue
        
        if section == "summary":
            # Handle summary lines
            if line:
                # Use the text before " table" as the table name, e.g. "Initiate Trips"
                table_name = line.split(' table', 1)[0]
                summary_questions.append(f"What is the {table_name} table about?")
                summary_text.append(line)
        elif section == "questions":
            # Handle questions and answers; skip lines without the ' | ' separator
            if ' | ' in line:
                question, answer = line.split(' | ', 1)
                questions.append({'question': question.strip(), 'answer': answer.strip()})

    return summary_questions, summary_text, questions

# Greeting Conversation
def greeting_conversation(user_input: str) -> str:
    # Assumption: `normal_questions` and `responses` are not defined in this cell,
    # so they are derived here from the `questions` list defined above.
    normal_questions = [q['question'] for q in questions]
    responses = [q['answer'] for q in questions]

    # Create a TfidfVectorizer and fit it on the questions
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(normal_questions)

    # Train a simple classifier in which each question is its own class
    model = LogisticRegression()
    model.fit(X, np.arange(len(normal_questions)))

    user_input_vec = vectorizer.transform([user_input])
    predicted_index = model.predict(user_input_vec)[0]
    return responses[predicted_index]

# Heuristic Approach
def heuristic_approach(user_input: str) -> str:
    tokens = word_tokenize(user_input, language='english', preserve_line=True)
    lemmatizer = WordNetLemmatizer()

    for question in questions:
        keywords = [lemmatizer.lemmatize(word.lower()) for word in word_tokenize(question['question'], language='english', preserve_line=True)]
        match_count = sum(1 for token in tokens if lemmatizer.lemmatize(token.lower()) in keywords)

        # Adjust the threshold value as needed
        if match_count >= len(keywords) * 0.5:
            return question['answer']

    return None

# Semantic Similarity Approach
def semantic_similarity_approach(user_input: str):
    user_input_vector = tfidf_vectorizer.transform([user_input])
    similarities = cosine_similarity(user_input_vector, tfidf_matrix).flatten()

    # Return None for weak matches so the chatbot can fall through to the
    # next approach (0.5 is a tunable threshold)
    if np.max(similarities) < 0.5:
        return None

    # Find the index of the most similar question
    answer_index = np.argmax(similarities)
    return questions[answer_index]['answer']

# Machine Learning Approach (if needed)
def machine_learning_approach(user_input: str) -> str:
    # Prepare the data
    X = summary_questions  # Features
    y = summary_text  # Labels

    # Create a TF-IDF Vectorizer and Multinomial Naive Bayes model
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())

    # Fit the model
    model.fit(X, y)

    # Predict the summary line that best matches the user input and return it
    response = model.predict([user_input])
    return response[0]


# Hybrid Approach
def chatbot(user_input):
    # Check with the heuristic approach first
    answer = heuristic_approach(user_input)
    
    # If no answer found, check with semantic similarity
    if answer is None:
        answer = semantic_similarity_approach(user_input)

    # If still no answer, use the machine learning approach
    if answer is None:
        answer = machine_learning_approach(user_input)

    return answer

# Test Chatbot
if __name__ == "__main__":
    while True:
        user_input = input("Ask your question: ")
        if user_input.lower() in ['none', '', 'thank you']:
            break
        answer = chatbot(user_input)
        print(answer)


In [47]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import MultinomialNB

# Download required NLTK resources
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Preprocess questions for semantic similarity
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform([q['question'] for q in questions])

# Heuristic Approach
def heuristic_approach(user_input: str) -> str:
    tokens = word_tokenize(user_input, language='english', preserve_line=True)
    lemmatizer = WordNetLemmatizer()
    for question in questions:
        keywords = [lemmatizer.lemmatize(word.lower()) for word in word_tokenize(question['question'], language='english', preserve_line=True)]
        match_count = sum(1 for token in tokens if lemmatizer.lemmatize(token.lower()) in keywords)
        # Adjust the threshold value as needed
        if match_count >= len(keywords) * 0.5:
            return question['answer']
    return None

# Semantic Similarity Approach
def semantic_similarity_approach(user_input: str):
    user_input_vector = tfidf_vectorizer.transform([user_input])
    similarities = cosine_similarity(user_input_vector, tfidf_matrix).flatten()
    
    # Find the index of the most similar question
    max_similarity = np.max(similarities)
    
    # Set threshold: if less than 0.5 (50%), return None
    if max_similarity < 0.5:
        return None
    
    answer_index = np.argmax(similarities)
    return questions[answer_index]['answer']

# Machine Learning Approach for Summary
def machine_learning_approach_summary(user_input: str):
    # Prepare the data
    X = summary_questions  # Features
    y = summary_text      # Labels
    # Create a TF-IDF Vectorizer and Multinomial Naive Bayes model
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    # Fit the model
    model.fit(X, y)
    # Predict the response based on user input
    response = model.predict([user_input])
    return response[0]

# Hybrid Approach
def chatbot(user_input):
    # Check with the heuristic approach first
    answer = heuristic_approach(user_input)
    
    # If no answer found, check with semantic similarity
    if answer is None:
        answer = semantic_similarity_approach(user_input)

    # If still no answer, use the machine learning approach for summary questions
    if answer is None:
        answer = machine_learning_approach_summary(user_input)

    if answer is None:
        answer = "Sorry, I am not able to answer."

    return answer

# Test Chatbot
if __name__ == "__main__":
    # summary_questions, summary_text, questions = load_dataset('nesr_chatbot_dataset.txt')
    while True:
        user_input = input("User: ")
        if user_input.lower() in ['none', '', 'bye', 'exit']:
            print("Chatbot: Thank you for using the chatbot.")
            break
        answer = chatbot(user_input)
        print("Chatbot: ", answer)


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\vikas\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\vikas\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\vikas\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


User:  hii


Chatbot:  Compliance Assistant table provides information about the Compliance Assistant Trip List, Truck List, and checklists.


User:  what is hvms


Chatbot:  Receiving table provides information about the HV Receiver Trip List and the HV Receiver Checklist.


User:  How can I use hvms


Chatbot:  Driver Manager Actions table provides information about actions such as adding a driver, editing driver details, adding new trucks.


User:  bye


Chatbot: Thank you for using the chatbot.
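
In the transcript above, every input, including "hii", receives one of the summary lines. `model.predict` always returns one of the training labels, so the final `None` check in `chatbot` never triggers. One possible remedy, sketched below under stated assumptions, is to call the classifier only when the input shares at least one content word with the training questions. The stop-word list, the `min_overlap` parameter, and the stub classifier are hypothetical; in the notebook the classifier would be something like `lambda t: model.predict([t])[0]`.

```python
import re

# Hypothetical stop-word list; these words do not count as evidence of a match.
STOP_WORDS = {"what", "is", "the", "does", "can", "in", "about", "table", "how", "i", "a"}


def content_words(text):
    """Lower-cased alphabetic tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def build_vocabulary(training_questions):
    """Collect every content word that appears in the training questions."""
    vocab = set()
    for question in training_questions:
        vocab |= content_words(question)
    return vocab


def gated_answer(user_input, vocabulary, classifier, min_overlap=1):
    """Call `classifier` only when the input shares content words with the training data."""
    if len(content_words(user_input) & vocabulary) < min_overlap:
        return None
    return classifier(user_input)


summary_questions_sample = [
    'What is the Initiate Trips table about?',
    'What does the HV Assigner table cover?',
]
vocab = build_vocabulary(summary_questions_sample)
classifier = lambda text: "stub prediction"  # stands in for model.predict([text])[0]

print(gated_answer("hii", vocab, classifier))
print(gated_answer("what is hvms", vocab, classifier))
print(gated_answer("Tell me about the HV Assigner table", vocab, classifier))
```

An alternative would be to threshold the maximum of `model.predict_proba([user_input])`, which scikit-learn pipelines ending in `MultinomialNB` expose; the right cutoff would need to be tuned on real questions.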


In [45]:
!pip freeze > requirements.txt