<a href="https://colab.research.google.com/github/ashishmission93/ML-PTOJECTS/blob/main/deep_learning_system_capable_of_achieving_human_level_performance_in_open_domain_conversation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


This notebook works toward a deep learning system for open-domain conversation: one that reaches human-level performance, passes the Turing Test, and adapts its conversational abilities in real time based on user feedback, cultural nuances, and context across multiple languages and dialects. As a starting point, the cells below train a small LSTM encoder-decoder (seq2seq) chatbot on a hand-written list of question-answer pairs.

In [20]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample conversation data
conversations = [
    ("Hello.", "Hi there!"),
    ("How are you?", "I'm good, thank you."),
    ("What's your name?", "I am an AI chatbot."),
    ("Nice to meet you!", "Likewise!"),
    ("What can you do?", "I can answer questions, tell jokes, and engage in conversation."),
    ("Tell me a joke.", "Why don't scientists trust atoms? Because they make up everything!"),
    ("Do you dream?", "No, I don't sleep, so I can't dream."),
    ("Where are you from?", "I was created by a team of developers."),
    ("What's the weather like?", "I'm sorry, I can't check the weather."),
    ("What's the meaning of life?", "The meaning of life is a philosophical question. Some say it's 42."),
    ("How old are you?", "I don't have an age, as I am just a computer program."),
    ("What's your favorite color?", "I don't have preferences, but I like the concept of colors."),
    ("Are you sentient?", "No, I am not sentient. I am a machine learning model."),
    ("Can you learn?", "I can improve over time with more data and training."),
    ("What's the capital of France?", "The capital of France is Paris."),
    ("Tell me about yourself.", "I am an AI chatbot designed to engage in conversation."),
    ("Are you a robot?", "In a way, yes. I exist as a program running on a computer."),
    ("Do you have emotions?", "No, I don't have emotions. I process information logically."),
    ("What's the best movie?", "That's subjective, but some popular choices include The Godfather and The Shawshank Redemption."),
    ("What's the square root of 144?", "The square root of 144 is 12."),
    ("Can you sing?", "I can't sing, but I can provide lyrics if you'd like."),
    ("Tell me something interesting.", "Did you know that the shortest war in history was between Britain and Zanzibar in 1896? It lasted only 38 minutes!"),
    ("Are you tired?", "I don't get tired like humans do."),
    ("What's the tallest mountain?", "Mount Everest is the tallest mountain in the world."),
    ("What languages do you speak?", "I can understand and communicate in multiple languages."),
    ("Tell me a story.", "Once upon a time, in a faraway land, there was a magical kingdom..."),
    ("What's your favorite food?", "As an AI, I don't have preferences for food."),
    ("What's the speed of light?", "The speed of light in a vacuum is approximately 299,792 kilometers per second."),
    ("Can you solve equations?", "Yes, I can solve mathematical equations."),
    ("What's the largest animal?", "The blue whale is the largest animal on Earth."),
    ("What's the longest river?", "The Nile River is the longest river in the world."),
    ("What's the capital of Japan?", "The capital of Japan is Tokyo."),
    ("What's the boiling point of water?", "The boiling point of water is 100 degrees Celsius."),
    ("Can you tell me about AI?", "Artificial intelligence is the simulation of human intelligence by machines."),
    ("Who is the president of the United States?", "As of my last update, the president of the United States is Joe Biden."),
    ("What's your favorite book?", "I don't have preferences for books, but I can recommend some popular titles."),
    ("What's the population of China?", "China has a population of over 1.4 billion people."),
    ("What's the largest desert?", "The Sahara is the largest hot desert in the world; the largest desert overall is Antarctica."),
    ("Tell me a riddle.", "What has keys but can't open locks?"),
    ("What's your favorite sport?", "As an AI, I don't have preferences for sports."),
    ("Can you tell me a bedtime story?", "Once upon a time, in a magical forest, there lived a family of..."),
    ("What's the seventh planet from the sun?", "The seventh planet from the sun is Uranus."),
    ("What's the chemical formula for water?", "The chemical formula for water is H2O."),
    ("Can you recommend a movie?", "Sure! How about watching The Matrix?"),
    ("Tell me about the universe.", "The universe is vast and contains billions of galaxies, stars, and planets."),
    ("What's the capital of Brazil?", "The capital of Brazil is Brasília."),
    ("Can you tell me a fun fact?", "Did you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still perfectly edible!"),
    ("What's the largest country by land area?", "Russia is the largest country by land area."),
    ("Can you tell me a tongue twister?", "Sure! How about this one: Peter Piper picked a peck of pickled peppers."),
    ("What's the smallest planet?", "Mercury is the smallest planet in our solar system."),
    ("What's the chemical symbol for gold?", "The chemical symbol for gold is Au."),
    ("Can you tell me a myth?", "In Greek mythology, Icarus was the son of Daedalus who flew too close to the sun with wings made of feathers and wax."),
    ("What's the tallest building?", "The tallest building in the world is the Burj Khalifa in Dubai, United Arab Emirates."),
    ("What's the speed of sound?", "The speed of sound in air at sea level is approximately 343 meters per second."),
    ("What's the capital of Italy?", "The capital of Italy is Rome."),
    ("Can you tell me about dinosaurs?", "Dinosaurs were a diverse group of reptiles that lived millions of years ago."),
    ("What's the largest ocean?", "The Pacific Ocean is the largest ocean on Earth."),
    ("Can you tell me about the human brain?", "The human brain is a complex organ that controls our thoughts, emotions, and actions."),
    ("What's the largest city by population?", "Tokyo, Japan, is the largest city in the world by population."),
    ("What's the boiling point of mercury?", "The boiling point of mercury is 356.7 degrees Celsius."),
    ("Can you tell me a famous quote?", "Sure! Here's one from Albert Einstein: 'Imagination is more important than knowledge.'"),
    ("What's the deepest ocean?", "The Pacific is the deepest ocean; its deepest point lies in the Mariana Trench."),
    ("Can you tell me about the Great Wall of China?", "The Great Wall of China is a series of fortifications made of stone, brick, tamped earth, wood, and other materials."),
    ("What's the chemical symbol for oxygen?", "The chemical symbol for oxygen is O."),
    ("Can you tell me about famous scientists?", "Sure! There are many famous scientists, including Albert Einstein, Isaac Newton, and Marie Curie."),
    ("What's the circumference of the Earth?", "The equatorial circumference of the Earth is approximately 40,075 kilometers."),
    ("Can you tell me a nursery rhyme?", "Sure! How about 'Twinkle, Twinkle, Little Star'?"),
    ("What's the capital of Australia?", "The capital of Australia is Canberra."),
    ("Can you tell me about black holes?", "Black holes are regions of spacetime where gravity is so strong that nothing, not even light, can escape."),
    ("What's the largest moon in the solar system?", "Ganymede, a moon of Jupiter, is the largest moon in the solar system."),
    ("What's the chemical symbol for carbon?", "The chemical symbol for carbon is C."),
    ("Can you tell me a famous painting?", "Sure! One of the most famous paintings is the Mona Lisa by Leonardo da Vinci."),
    ("What's the smallest bone in the human body?", "The stapes bone in the middle ear is the smallest bone in the human body."),
    ("Can you tell me a mythological creature?", "Sure! How about the phoenix, a mythical bird that is said to be reborn from its own ashes?"),
    ("What's the capital of South Africa?", "South Africa has three capitals: Pretoria (executive), Cape Town (legislative), and Bloemfontein (judicial)."),
    ("Can you tell me about space exploration?", "Space exploration is the discovery and exploration of outer space by means of space technology."),
    ("What's the chemical symbol for silver?", "The chemical symbol for silver is Ag."),
    ("Can you tell me about renewable energy?", "Renewable energy is energy that is collected from renewable resources, which are naturally replenished on a human timescale."),
    ("What's the population of India?", "India has a population of over 1.4 billion people."),
    ("Can you tell me about famous inventors?", "Sure! There are many famous inventors, including Thomas Edison, Alexander Graham Bell, and Nikola Tesla."),
    ("What's the smallest country in the world?", "The smallest country in the world is Vatican City."),
    ("Can you tell me about famous landmarks?", "Sure! There are many famous landmarks, including the Eiffel Tower, the Taj Mahal, and the Statue of Liberty."),
    ("What's the chemical symbol for helium?", "The chemical symbol for helium is He."),
    ("Can you tell me about the human heart?", "The human heart is a muscular organ that pumps blood throughout the body."),
    ("What's the circumference of the Moon?", "The circumference of the Moon is approximately 10,921 kilometers."),
    ("Can you tell me a famous speech?", "Sure! One famous speech is Martin Luther King Jr.'s 'I Have a Dream' speech."),
    ("What's the largest desert in North America?", "The largest desert in North America is the Chihuahuan Desert."),
    ("Can you tell me about the periodic table?", "The periodic table is a tabular arrangement of the chemical elements."),
    ("What's the capital of Canada?", "The capital of Canada is Ottawa."),
    ("Can you tell me about the human skeleton?", "The human skeleton is the internal framework of the body, consisting of bones, cartilage, and other connective tissues."),
    ("What's the population of Russia?", "Russia has the ninth-largest population in the world, with over 144 million people."),
    ("Can you tell me about famous landmarks?", "Sure! There are many famous landmarks, including the Great Wall of China, the Pyramids of Giza, and Machu Picchu."),
    ("What's the chemical symbol for sodium?", "The chemical symbol for sodium is Na."),
    ("Can you tell me about the Industrial Revolution?", "The Industrial Revolution was a period of major industrialization that transformed society from agrarian to industrial."),
    ("What's the circumference of the Sun?", "The circumference of the Sun is approximately 4,370,000 kilometers."),
    ("Can you tell me about famous musicians?", "Sure! There are many famous musicians, including Mozart, Beethoven, and The Beatles."),
    ("What's the highest mountain in Africa?", "The highest mountain in Africa is Mount Kilimanjaro."),
    ("Can you tell me about the human respiratory system?", "The human respiratory system is a series of organs responsible for taking in oxygen and expelling carbon dioxide."),
    ("What's the chemical symbol for potassium?", "The chemical symbol for potassium is K."),
    ("Can you tell me about the Silk Road?", "The Silk Road was a network of trade routes connecting the East and West."),
    ("What's the capital of China?", "The capital of China is Beijing."),
    ("Can you tell me about the Big Bang theory?", "The Big Bang theory is the prevailing cosmological model explaining the existence of the observable universe from the earliest known periods through its subsequent large-scale evolution."),
    ("What's the chemical symbol for nitrogen?", "The chemical symbol for nitrogen is N."),
    ("Can you tell me about famous philosophers?", "Sure! There are many famous philosophers, including Socrates, Plato, and Aristotle."),
    ("What's the longest river in Asia?", "The longest river in Asia is the Yangtze River."),
    ("Can you tell me about the history of photography?", "The history of photography spans nearly 200 years, from the first photograph to the digital photography of today."),
    ("What's the chemical symbol for iron?", "The chemical symbol for iron is Fe."),
    ("Can you tell me about the theory of relativity?", "The theory of relativity, developed by Albert Einstein, describes the fundamental interaction of gravitation as a geometric property of space and time."),
    ("What's the largest lake in Africa?", "The largest lake in Africa is Lake Victoria."),
    ("Can you tell me about the human digestive system?", "The human digestive system is a series of organs responsible for breaking down food and absorbing nutrients."),
    ("What's the chemical formula for carbon dioxide?", "The chemical formula for carbon dioxide is CO2."),
    ("Can you tell me about famous mathematicians?", "Sure! There are many famous mathematicians, including Pythagoras, Euclid, and Isaac Newton."),
    ("What's the deepest lake in the world?", "The deepest lake in the world is Lake Baikal."),
    ("Can you tell me about famous astronauts?", "Sure! There are many famous astronauts, including Neil Armstrong, Buzz Aldrin, and Sally Ride."),
    ("What's the chemical symbol for lead?", "The chemical symbol for lead is Pb."),
    ("Can you tell me about the history of cinema?", "The history of cinema dates back to the late 19th century, with the invention of motion picture cameras."),
    ("What's the chemical symbol for calcium?", "The chemical symbol for calcium is Ca."),
    ("Can you tell me about famous explorers?", "Sure! There are many famous explorers, including Christopher Columbus, Marco Polo, and Ferdinand Magellan."),
    ("What's the deepest cave in the world?", "The deepest known cave in the world is Veryovkina Cave."),
    ("Can you tell me about famous artists?", "Sure! There are many famous artists, including Leonardo da Vinci, Michelangelo, and Vincent van Gogh."),
    ("What's the chemical symbol for copper?", "The chemical symbol for copper is Cu."),
    ("Can you tell me about the history of aviation?", "The history of aviation dates back to the invention of the hot air balloon in the 18th century."),
    ("What's the chemical symbol for mercury?", "The chemical symbol for mercury is Hg."),
    ("Can you tell me about famous battles?", "Sure! There are many famous battles, including the Battle of Thermopylae, the Battle of Hastings, and the Battle of Gettysburg."),
    ("Can you tell me about famous authors?", "Sure! There are many famous authors, including William Shakespeare, Charles Dickens, and Jane Austen."),
    ("What's the chemical symbol for tin?", "The chemical symbol for tin is Sn."),
    ("Can you tell me about famous revolutions?", "Sure! There are many famous revolutions, including the French Revolution, the American Revolution, and the Russian Revolution."),
    ("What's the chemical symbol for uranium?", "The chemical symbol for uranium is U."),
    ("Can you tell me about famous inventions?", "Sure! There are many famous inventions, including the light bulb, the telephone, and the internet."),
    ("What's the chemical symbol for zinc?", "The chemical symbol for zinc is Zn."),
    ("Can you tell me about famous dictators?", "Sure! There are many famous dictators, including Adolf Hitler, Joseph Stalin, and Mao Zedong."),
    ("What's the chemical symbol for arsenic?", "The chemical symbol for arsenic is As."),
    ("Can you tell me about famous astronomers?", "Sure! There are many famous astronomers, including Galileo Galilei, Johannes Kepler, and Carl Sagan."),
    ("What's the chemical symbol for boron?", "The chemical symbol for boron is B."),
    ("Can you tell me about famous painters?", "Sure! There are many famous painters, including Leonardo da Vinci, Vincent van Gogh, and Pablo Picasso."),
    ("Can you tell me about famous writers?", "Sure! There are many famous writers, including William Shakespeare, Charles Dickens, and Jane Austen."),
    ("Can you tell me about famous sculptors?", "Sure! There are many famous sculptors, including Michelangelo, Auguste Rodin, and Donatello."),
    ("Can you tell me about famous architects?", "Sure! There are many famous architects, including Frank Lloyd Wright, Antoni Gaudí, and Zaha Hadid."),
    ("Can you tell me about famous chemists?", "Sure! There are many famous chemists, including Marie Curie, Dmitri Mendeleev, and Linus Pauling."),
    ("Can you tell me about famous biologists?", "Sure! There are many famous biologists, including Charles Darwin, Gregor Mendel, and Jane Goodall."),
    ("Can you tell me about famous physicists?", "Sure! There are many famous physicists, including Albert Einstein, Isaac Newton, and Stephen Hawking."),
    ("Can you tell me about famous engineers?", "Sure! There are many famous engineers, including Leonardo da Vinci, Nikola Tesla, and Thomas Edison."),
    ("Can you tell me about famous psychologists?", "Sure! There are many famous psychologists, including Sigmund Freud, B.F. Skinner, and Carl Jung."),
    ("Can you tell me about famous sociologists?", "Sure! There are many famous sociologists, including Karl Marx, Max Weber, and Émile Durkheim."),
    ("Can you tell me about famous anthropologists?", "Sure! There are many famous anthropologists, including Margaret Mead, Franz Boas, and Claude Lévi-Strauss."),
    ("Can you tell me about famous economists?", "Sure! There are many famous economists, including Adam Smith, John Maynard Keynes, and Milton Friedman."),
    # Add more conversation pairs
]

# Extract text from tuples
questions, answers = zip(*conversations)

# Wrap every answer in <start>/<end> markers so the decoder learns where a reply begins and ends
answers = ['<start> ' + answer + ' <end>' for answer in answers]

# Tokenization (filters='' keeps punctuation and the <start>/<end> markers intact)
tokenizer = Tokenizer(filters='')
tokenizer.fit_on_texts(list(questions) + answers)
vocab_size = len(tokenizer.word_index) + 1  # index 0 is reserved for padding

# Convert text to sequences
X_sequences = tokenizer.texts_to_sequences(questions)
y_sequences = tokenizer.texts_to_sequences(answers)

# Padding sequences
max_length = max(max(len(seq) for seq in X_sequences), max(len(seq) for seq in y_sequences))
X_padded = pad_sequences(X_sequences, maxlen=max_length, padding='post')
y_padded = pad_sequences(y_sequences, maxlen=max_length, padding='post')

# Teacher forcing: the decoder reads tokens 0..n-1 and is trained to predict tokens 1..n
decoder_input_data = y_padded[:, :-1]
decoder_target_data = y_padded[:, 1:]

# Define model architecture
latent_dim = 256

# Encoder: embed the token ids, then keep only the final LSTM states
encoder_inputs = Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(vocab_size, latent_dim, mask_zero=True)
encoder = LSTM(latent_dim, return_state=True)
_, state_h, state_c = encoder(encoder_embedding(encoder_inputs))
encoder_states = [state_h, state_c]

# Decoder: starts from the encoder states and predicts one token per time step
decoder_inputs = Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(vocab_size, latent_dim, mask_zero=True)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding(decoder_inputs), initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Training the model
model.fit([X_padded, decoder_input_data], decoder_target_data, batch_size=64, epochs=100)

# Inference: reuse the trained layers in separate encoder and single-step decoder models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_embedding(decoder_inputs), initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

def respond(input_text):
    # Encode the question into the initial decoder states
    input_seq = tokenizer.texts_to_sequences([input_text])
    input_seq = pad_sequences(input_seq, maxlen=max_length, padding='post')
    states_value = list(encoder_model.predict(input_seq, verbose=0))

    # Start decoding from the <start> token
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = tokenizer.word_index['<start>']

    words = []
    while len(words) < max_length:  # hard cap in case <end> is never produced
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value, verbose=0)

        # Greedy decoding: pick the most likely next token
        sampled_token_index = int(np.argmax(output_tokens[0, -1, :]))
        # Index 0 is padding and has no entry in index_word, so it maps to '<unk>'
        sampled_word = tokenizer.index_word.get(sampled_token_index, '<unk>')
        if sampled_word == '<end>':
            break
        words.append(sampled_word)

        # Feed the sampled token and the updated states back into the decoder
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index
        states_value = [h, c]

    return ' '.join(words)

# Example usage
user_input = "What's your name?"
response = respond(user_input)
print("Response:", response)


Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78
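
In an encoder-decoder model like the one above, the decoder is usually trained with teacher forcing: at each step it reads the previous target token and is asked to predict the next one, so the decoder input and the training labels are the same sequence offset by one position. The sketch below illustrates only that offset, on a toy padded batch. The ids `PAD_ID`, `START_ID`, `END_ID` and the helper `shift_for_teacher_forcing` are hypothetical names chosen for this example, not part of the notebook.

```python
import numpy as np

# Hypothetical token ids, chosen for this example only
PAD_ID, START_ID, END_ID = 0, 1, 2

def shift_for_teacher_forcing(padded_targets):
    """Split padded target sequences into decoder inputs and next-token labels."""
    decoder_inputs = padded_targets[:, :-1]  # everything except the last position
    decoder_labels = padded_targets[:, 1:]   # everything except the leading <start>
    return decoder_inputs, decoder_labels

# Two toy answers: "<start> 5 6 <end>" and "<start> 7 <end> <pad>"
batch = np.array([
    [START_ID, 5, 6, END_ID],
    [START_ID, 7, END_ID, PAD_ID],
])

dec_in, dec_out = shift_for_teacher_forcing(batch)
print(dec_in)   # what the decoder reads at each step
print(dec_out)  # what it should predict at each step
```

Position by position, each row of `dec_out` holds the token that follows the corresponding entry of `dec_in`, which is the relationship the loss needs in order to train a next-token predictor.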

In [21]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample conversation data
conversations = [
    ("Hello.", "Hi there!"),
    ("How are you?", "I'm good, thank you."),
    ("What's your name?", "I am an AI chatbot."),
    # Add more conversation pairs
]

# Extract text from tuples
questions, answers = zip(*conversations)

# Wrap every answer in <start>/<end> markers so the decoder learns where a reply begins and ends
answers = ['<start> ' + answer + ' <end>' for answer in answers]

# Tokenization (filters='' keeps punctuation and the <start>/<end> markers intact)
tokenizer = Tokenizer(filters='')
tokenizer.fit_on_texts(list(questions) + answers)
vocab_size = len(tokenizer.word_index) + 1  # index 0 is reserved for padding

# Convert text to sequences
X_sequences = tokenizer.texts_to_sequences(questions)
y_sequences = tokenizer.texts_to_sequences(answers)

# Padding sequences
max_length = max(max(len(seq) for seq in X_sequences), max(len(seq) for seq in y_sequences))
X_padded = pad_sequences(X_sequences, maxlen=max_length, padding='post')
y_padded = pad_sequences(y_sequences, maxlen=max_length, padding='post')

# Teacher forcing: the decoder reads tokens 0..n-1 and is trained to predict tokens 1..n
decoder_input_data = y_padded[:, :-1]
decoder_target_data = y_padded[:, 1:]

# Define model architecture
latent_dim = 256

# Encoder: embed the token ids, then keep only the final LSTM states
encoder_inputs = Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(vocab_size, latent_dim, mask_zero=True)
encoder = LSTM(latent_dim, return_state=True)
_, state_h, state_c = encoder(encoder_embedding(encoder_inputs))
encoder_states = [state_h, state_c]

# Decoder: starts from the encoder states and predicts one token per time step
decoder_inputs = Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(vocab_size, latent_dim, mask_zero=True)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding(decoder_inputs), initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Training the model
model.fit([X_padded, decoder_input_data], decoder_target_data, batch_size=64, epochs=100)

# Inference: reuse the trained layers in separate encoder and single-step decoder models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_embedding(decoder_inputs), initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

def respond(input_text):
    # Encode the question into the initial decoder states
    input_seq = tokenizer.texts_to_sequences([input_text])
    input_seq = pad_sequences(input_seq, maxlen=max_length, padding='post')
    states_value = list(encoder_model.predict(input_seq, verbose=0))

    # Start decoding from the <start> token
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = tokenizer.word_index['<start>']

    words = []
    while len(words) < max_length:  # hard cap in case <end> is never produced
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value, verbose=0)

        # Greedy decoding: pick the most likely next token
        sampled_token_index = int(np.argmax(output_tokens[0, -1, :]))
        # Index 0 is padding and has no entry in index_word, so it maps to '<unk>'
        sampled_word = tokenizer.index_word.get(sampled_token_index, '<unk>')
        if sampled_word == '<end>':
            break
        words.append(sampled_word)

        # Feed the sampled token and the updated states back into the decoder
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index
        states_value = [h, c]

    return ' '.join(words)

# Example usage
user_input = "Hy how are you"
response = respond(user_input)
print("Response:", response)


Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78
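
The prompt `"Hy how are you"` contains words the three training pairs never saw. A Keras `Tokenizer` created without an `oov_token` skips unseen words when converting text to sequences, and because `filters=''` leaves punctuation attached, `you` and `you?` count as different tokens. A minimal pure-Python sketch of the alternative, mapping unseen words to a reserved `<unk>` id so the encoder at least sees that something was there; `build_vocab` and `encode` are hypothetical helpers written for this illustration and do not reproduce the Keras implementation.

```python
def build_vocab(texts):
    """Assign ids to lowercase whitespace tokens; 0 is padding, 1 is <unk>."""
    vocab = {'<pad>': 0, '<unk>': 1}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Map each token to its id, falling back to <unk> for unseen tokens."""
    return [vocab.get(word, vocab['<unk>']) for word in text.lower().split()]

# The three questions from the cell above
questions = ["Hello.", "How are you?", "What's your name?"]
vocab = build_vocab(questions)

print(encode("How are you?", vocab))    # [3, 4, 5]
print(encode("Hy how are you", vocab))  # [1, 3, 4, 1]
```

Both `hy` and the unpunctuated `you` fall back to `<unk>` here, which suggests that stripping punctuation before tokenizing would help as much as adding an out-of-vocabulary token.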