### **Deep Learning Project Based Learning Group 7 (Roll no 9-12)**

**Title**: Develop a Chatbot using Deep Learning Algorithms


# Step 0: Installation

In [1]:
# Install the required libraries.
!pip install numpy tensorflow scikit-learn



# Step 1: Preparing the Training Data

In [2]:
# Training data (user messages)
train_data = [
    "Hello",
    "How are you?",
    "Good morning",
    "Good evening",
    "Nice to meet you",
    "What's up?",
    "How's your day going?",
    "Greetings!",
    "Good afternoon",
    "How can I assist you?",
    "Pleasure to see you",
    "Is there anything I can help with?",
    "What are you up to?",
    "How's the weather today?",
    "Long time no see!",
    "Hey there!",
    "What have you been doing?",
    "How was your weekend?",
    "Tell me about yourself.",
    "What's your favorite hobby?",
    "Do you like music?",
    "Any plans for today?",
    "Can you recommend a good book?",
    "What's the meaning of life?",
    "How do you define happiness?",
    "Are you a robot?",
    "Where are you from?",
    "What languages do you speak?",
    "What's the capital of France?",
]

# Output data (bot responses)
train_labels = [
    "Hi",
    "I'm fine, how about you?",
    "Good morning to you",
    "Good evening, how can I help you?",
    "Nice to meet you too",
    "Not much, just hanging out",
    "It's going well, thank you",
    "Hello!",
    "Good afternoon to you too",
    "I'm here to assist you",
    "Likewise!",
    "Yes, I have a question",
    "Just chilling, thanks for asking!",
    "It's sunny today, perfect weather!",
    "Indeed, it's been a while!",
    "Hey! How can I assist you today?",
    "Just working on improving myself!",
    "It was relaxing, thanks for asking!",
    "Sure, I'm a chatbot designed to assist you!",
    "I enjoy chatting with users like you!",
    "Yes, I'm programmed to appreciate music!",
    "Nothing specific, just here to help you out!",
    "Certainly! How about 'The Great Gatsby'?",
    "The meaning of life is subjective, what's your take?",
    "Happiness is feeling fulfilled and content.",
    "I'm an AI-powered chatbot!",
    "I exist in the digital realm!",
    "I speak the language of ones and zeros!",
    "The capital of France is Paris!",
]


# Step 2: Data Preprocessing
  
1.   Label Encoding
2.   Tokenisation
3.   Padding
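
As a rough illustration of the first step, label encoding can be sketched in plain Python. This mirrors the behaviour of scikit-learn's `LabelEncoder` (distinct labels are sorted, and each label's position in that sorted list becomes its integer ID), but the helper names `fit_label_encoder`, `encode`, and `decode` are hypothetical and used only for this sketch.

```python
def fit_label_encoder(labels):
    """Return the sorted list of distinct labels; a label's position is its ID."""
    return sorted(set(labels))


def encode(labels, classes):
    """Map each label to its integer ID."""
    lookup = {label: idx for idx, label in enumerate(classes)}
    return [lookup[label] for label in labels]


def decode(ids, classes):
    """Map integer IDs back to the original label text."""
    return [classes[i] for i in ids]


# A small sample of bot responses (one of them repeated on purpose).
sample_labels = ["Hi", "Likewise!", "Hello!", "Hi"]

classes = fit_label_encoder(sample_labels)
ids = encode(sample_labels, classes)

print(classes)               # ['Hello!', 'Hi', 'Likewise!']
print(ids)                   # [1, 2, 0, 1]
print(decode(ids, classes))  # ['Hi', 'Likewise!', 'Hello!', 'Hi']
```

Because the IDs come from the sorted distinct labels, they do not follow the order in which responses appear in `train_labels`, which is why the inverse mapping is needed when generating a reply.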

In [3]:
from sklearn.preprocessing import LabelEncoder
from tensorflow import keras

# 1. Label Encoding
# fit_transform learns the unique responses in train_labels and, in the same call,
# maps each one to an integer class ID.
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(train_labels)

# 2. Tokenisation
# 2.1 The Tokenizer is fitted on the train_data to learn the unique words and assign them integer values.
tokenizer = keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(train_data)

# 2.2 The texts_to_sequences method is used to convert the text data into sequences of integers based on the learned mapping.
train_sequences = tokenizer.texts_to_sequences(train_data)

# 3. Padding
# 3.1 pad_sequences pads every sequence with leading zeros to the length of the longest one,
#     so all inputs have the same length (no maxlen is given here, so nothing is truncated).
train_sequences = keras.preprocessing.sequence.pad_sequences(train_sequences)
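
To make the tokenisation and padding steps more concrete, here is a minimal pure-Python sketch of what they do under the default settings: lowercase the text, strip punctuation, number words by descending frequency starting at 1, and left-pad with zeros. It is an illustrative re-implementation, not the Keras code itself, and the names `simple_tokenize`, `build_word_index`, and `to_padded` are assumptions made for this example.

```python
from collections import Counter

# Punctuation stripped before splitting; the apostrophe is kept, so "what's" stays one word.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def simple_tokenize(text):
    """Lowercase the text, replace filtered characters with spaces, split on whitespace."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    return text.lower().translate(table).split()


def build_word_index(texts):
    """Assign IDs from 1 upward, most frequent word first (0 is reserved for padding)."""
    counts = Counter(word for t in texts for word in simple_tokenize(t))
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}


def to_padded(texts, word_index, maxlen=None):
    """Convert texts to ID sequences, drop unknown words, and left-pad with zeros."""
    seqs = [[word_index[w] for w in simple_tokenize(t) if w in word_index] for t in texts]
    maxlen = maxlen or max(len(s) for s in seqs)
    # Keep the last maxlen IDs if a sequence is too long, pad on the left if too short.
    return [[0] * (maxlen - len(s)) + s[-maxlen:] for s in seqs]


texts = ["Hello", "How are you?", "What are you up to?"]
word_index = build_word_index(texts)
padded = to_padded(texts, word_index)

print(word_index)
print(padded)  # [[0, 0, 0, 0, 3], [0, 0, 4, 1, 2], [5, 1, 2, 6, 7]]

# A word that was never seen during fitting is simply dropped.
print(to_padded(["Hello stranger"], word_index, maxlen=5))  # [[0, 0, 0, 0, 3]]
```

The last call hints at a limitation of this setup: words outside the training vocabulary contribute nothing to the input, so an unfamiliar message can collapse to a mostly-zero sequence.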


# Step 3: Building and Training the Chatbot Model

**Layers:**

**1.   Embedding Layer**: Maps each integer in the input sequence to a dense vector representation.

**2.   Flatten Layer**: Converts the multi-dimensional output of the Embedding layer into a one-dimensional vector.

**3.   Dense Layer with ReLU Activation**: Introduces non-linearity to the model, allowing it to learn complex patterns in the data.

**4.   Output Layer with Softmax Activation Function**: Produces a probability score for each class label.


**Compile the model using**
1. Adam optimizer
2. Sparse categorical crossentropy loss function

**Train the model using the fit method with the following parameters**
1. input sequences
2. encoded labels
3. epochs (number of times the model will iterate over the entire training dataset)
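
The softmax output and the sparse categorical crossentropy loss listed above can be worked through by hand. The sketch below uses only the standard library; the three-class `logits` vector is a made-up example, and the function names are chosen for illustration rather than taken from Keras.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_class, probs):
    """Loss for one example when the target is an integer class ID."""
    return -math.log(probs[true_class])


# Hypothetical raw scores for three response classes.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The loss is small when the true class received a high probability...
loss_if_class_0 = sparse_categorical_crossentropy(0, probs)
# ...and larger when the true class received a low probability.
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)

print([round(p, 3) for p in probs])
print(round(loss_if_class_0, 3), round(loss_if_class_2, 3))
```

The "sparse" variant takes the integer class ID directly, which is why the encoded labels can be passed to `fit` without one-hot encoding them first.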

In [4]:
# We will use an Embedding layer to handle the text data.
# The Embedding layer maps the input sequence of integers to a dense vector representation.
model = keras.models.Sequential()

# The Embedding layer takes two main arguments:
# 1. Vocabulary size, given by len(tokenizer.word_index) + 1: the number of unique words in the training data,
#    plus one because index 0 is reserved for padding.
# 2. Embedding dimension (100 in this case), which specifies the size of the dense vector representation for each word.
model.add(keras.layers.Embedding(len(tokenizer.word_index) + 1, 100, input_length=train_sequences.shape[1]))

# The Flatten layer is added to convert the multi-dimensional output of the Embedding layer into a one-dimensional vector.
model.add(keras.layers.Flatten())

# The dense layer with ReLU activation introduces non-linearity to the model, allowing it to learn complex patterns in the data.
model.add(keras.layers.Dense(64, activation='relu'))

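# Output layer with softmax activation: produces a probability score for each class label.
# One unit per distinct response; label_encoder.classes_ stays correct even if train_labels contains repeats.
model.add(keras.layers.Dense(len(label_encoder.classes_), activation='softmax'))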

# Compile the model using the Adam optimizer and the sparse categorical crossentropy loss function, which is suitable for multi-class classification problems.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model with the fit method on the padded input sequences and encoded labels for 50 epochs.
model.fit(train_sequences, encoded_labels, epochs=50)

Epoch 1/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x7f8ac9947a90>
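
To get a feel for the size of the network built above, the parameter count of each layer can be worked out by hand. The sketch below is plain arithmetic; the example values for `vocab_size`, `seq_len`, and `n_classes` are illustrative placeholders, and the actual figures for a trained model can be read from `model.summary()`.

```python
def count_parameters(vocab_size, seq_len, n_classes, embed_dim=100, hidden_units=64):
    """Parameter counts for Embedding -> Flatten -> Dense(ReLU) -> Dense(softmax)."""
    embedding = vocab_size * embed_dim                 # one vector per vocabulary entry
    flattened = seq_len * embed_dim                    # Flatten has no parameters, only reshapes
    hidden = flattened * hidden_units + hidden_units   # weights plus biases
    output = hidden_units * n_classes + n_classes      # weights plus biases
    return {
        "embedding": embedding,
        "flattened_size": flattened,
        "hidden": hidden,
        "output": output,
        "total": embedding + hidden + output,
    }


# Placeholder sizes chosen only for illustration.
sizes = count_parameters(vocab_size=60, seq_len=7, n_classes=29)
for name, value in sizes.items():
    print(f"{name}: {value}")
```

With these placeholder sizes, most of the parameters sit in the first Dense layer, because flattening multiplies the sequence length by the embedding dimension.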

# Step 4: Generating Responses

In [5]:
import numpy as np

def generate_response(text):
    """Return the bot response predicted for a single user message."""

    # Convert the message to integer word IDs using the fitted tokenizer.
    sequence = tokenizer.texts_to_sequences([text])

    # Pad (or truncate) to the sequence length the model was trained on.
    sequence = keras.preprocessing.sequence.pad_sequences(sequence, maxlen=train_sequences.shape[1])

    # Predict a probability for every response class.
    prediction = model.predict(sequence)

    # Take the most probable class and map it back to its response text
    # with the LabelEncoder's inverse_transform method.
    predicted_label = np.argmax(prediction)
    response = label_encoder.inverse_transform([predicted_label])[0]
    return response
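
Because `np.argmax` always selects some class, `generate_response` returns a reply even when the model is unsure. One possible refinement, which is not part of the original notebook, is to fall back to a default message when the top probability is low. The sketch below shows the idea in plain Python; the function name `pick_response`, the 0.5 threshold, and the fallback text are all assumptions for illustration.

```python
def pick_response(probabilities, responses, threshold=0.5,
                  fallback="Sorry, I didn't understand that."):
    """Return the most probable response, or a fallback if confidence is low."""
    # Index of the highest probability (a pure-Python equivalent of argmax).
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] < threshold:
        return fallback
    return responses[best]


# Hypothetical class list, in the order an encoder would hold it.
responses = ["Hello!", "Hi", "Likewise!"]

print(pick_response([0.1, 0.8, 0.1], responses))  # Hi
print(pick_response([0.4, 0.3, 0.3], responses))  # Sorry, I didn't understand that.
```

In the notebook's setting, `probabilities` would correspond to one row of the model's prediction and `responses` to `label_encoder.classes_`; a suitable threshold would need to be tuned by experiment.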

In [None]:
# Repeatedly prompt the user and print the reply from generate_response.
# Type "quit" or "exit" to stop the loop.
while True:
    user_input = input("Enter a message: ")
    if user_input.strip().lower() in {"quit", "exit"}:
        break
    response = generate_response(user_input)
    print("ChatBot: ", response)


Enter a message: Hello
ChatBot:  Hi
Enter a message: Tell me about yourself
ChatBot:  Sure, I'm a chatbot designed to assist you!
Enter a message: What are you up to?
ChatBot:  Just chilling, thanks for asking!
Enter a message: Do you like music?
ChatBot:  Yes, I'm programmed to appreciate music!
Enter a message: What's the capital of France?
ChatBot:  The capital of France is Paris!
Enter a message: What languages do you speak?
ChatBot:  I speak the language of ones and zeros!
Enter a message: Can you recommend a good book?
ChatBot:  Certainly! How about 'The Great Gatsby'?
Enter a message: weather today?
ChatBot:  Hi
