
In [1]:
faqs = """Artificial intelligence (AI) has rapidly evolved over the past few decades, revolutionizing various industries and significantly impacting human lives.
One of the most exciting advancements in AI is natural language processing (NLP), which enables machines to understand, generate, and interact with human language.
From virtual assistants like Siri and Alexa to sophisticated chatbots and language translation tools, NLP has made human-computer interaction more seamless than ever before.
Deep learning, a powerful subset of machine learning, plays a crucial role in these advancements by allowing AI models to learn patterns from vast amounts of text data.
Recurrent Neural Networks (RNNs) and Transformer models, such as GPT (Generative Pre-trained Transformer), have demonstrated remarkable performance in text prediction, machine translation, and sentiment analysis.
These models rely on vast datasets and complex architectures to predict the next word in a sentence with high accuracy, making them valuable for applications such as auto-complete, spell checking, and AI-driven content creation.

As AI-powered language models continue to improve, their applications expand into education, healthcare, customer service, and content generation.
In education, AI-driven tutoring systems provide personalized learning experiences, adapting to students' strengths and weaknesses.
In healthcare, AI assists doctors by analyzing medical records, predicting potential diagnoses, and even generating patient reports.
Businesses leverage AI chatbots to enhance customer interactions, offering quick responses and support without human intervention.
Despite these advancements, challenges remain, including ethical concerns, bias in AI models, and the potential misuse of AI-generated content.
Researchers are working on developing fair, unbiased, and responsible AI systems that align with ethical standards.
With continuous improvements in deep learning and computational power,
the future of AI-driven text prediction holds immense potential, reshaping how humans interact with technology and unlocking new possibilities in artificial intelligence
"""

In [2]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [3]:
tokenizer = Tokenizer()

In [4]:
tokenizer.fit_on_texts([faqs])

In [5]:
len(tokenizer.word_index)

191

In [6]:
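input_sequences = []
for sentence in faqs.split('\n'):
  # convert one line of text into a list of word ids
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  # every prefix of length >= 2 becomes one training example
  for i in range(1,len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])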
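
The loop above turns each line into a growing list of prefixes, so that the last id of every prefix can later serve as the word to predict. A minimal pure-Python sketch of the same idea follows; `prefix_sequences`, `toy_index`, and the four-word sentence are hypothetical and only stand in for the real tokenizer output.

```python
def prefix_sequences(token_ids):
    """Return every prefix of length >= 2 (context ids followed by one target id)."""
    return [token_ids[:i + 1] for i in range(1, len(token_ids))]

# Hypothetical toy vocabulary standing in for tokenizer.word_index
toy_index = {"deep": 1, "learning": 2, "is": 3, "fun": 4}
toy_ids = [toy_index[w] for w in "deep learning is fun".split()]

toy_prefixes = prefix_sequences(toy_ids)
print(toy_prefixes)  # [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
```

A line with n tokens yields n - 1 prefixes, and a one-word or empty line yields none.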

In [7]:
input_sequences

[[20, 21],
 [20, 21, 2],
 [20, 21, 2, 22],
 [20, 21, 2, 22, 42],
 [20, 21, 2, 22, 42, 43],
 [20, 21, 2, 22, 42, 43, 44],
 [20, 21, 2, 22, 42, 43, 44, 5],
 [20, 21, 2, 22, 42, 43, 44, 5, 45],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48, 49],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48, 49, 50],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48, 49, 50, 1],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48, 49, 50, 1, 51],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48, 49, 50, 1, 51, 52],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48, 49, 50, 1, 51, 52, 9],
 [20, 21, 2, 22, 42, 43, 44, 5, 45, 46, 47, 48, 49, 50, 1, 51, 52, 9, 53],
 [54, 6],
 [54, 6, 5],
 [54, 6, 5, 55],
 [54, 6, 5, 55, 56],
 [54, 6, 5, 55, 56, 12],
 [54, 6, 5, 55, 56, 12, 3],
 [54, 6, 5, 55, 56, 12, 3, 2],
 [54, 6, 5, 55, 56, 12, 3, 2, 57],
 [54, 6, 5, 55, 56, 12, 3, 2, 57, 58],
 [54, 6

In [8]:
max_len = max([len(x) for x in input_sequences])

In [9]:
max_len

36

In [10]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_input_sequences = pad_sequences(input_sequences, maxlen = max_len, padding='pre')

In [11]:
padded_input_sequences

array([[  0,   0,   0, ...,   0,  20,  21],
       [  0,   0,   0, ...,  20,  21,   2],
       [  0,   0,   0, ...,  21,   2,  22],
       ...,
       [  0,   0,   0, ..., 190, 191,   3],
       [  0,   0,   0, ..., 191,   3,  20],
       [  0,   0,   0, ...,   3,  20,  21]], dtype=int32)

In [12]:
X = padded_input_sequences[:,:-1]

In [13]:
y = padded_input_sequences[:,-1]
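
To make the padding and the X / y split concrete, here is a small pure-Python sketch on toy data. `pad_pre` is a hypothetical helper that imitates what `pad_sequences(..., padding='pre')` does with its default settings (left-pad with 0, keep the last `maxlen` items when a sequence is too long); it is not the Keras implementation.

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad seq with `value` up to maxlen; keep only the last maxlen items if longer."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

# Toy prefix sequences (hypothetical word ids)
toy_sequences = [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
toy_max_len = max(len(s) for s in toy_sequences)

toy_padded = [pad_pre(s, toy_max_len) for s in toy_sequences]
toy_X = [row[:-1] for row in toy_padded]  # all columns except the last: context
toy_y = [row[-1] for row in toy_padded]   # last column: the word to predict

print(toy_padded)  # [[0, 0, 1, 2], [0, 1, 2, 3], [1, 2, 3, 4]]
print(toy_X)       # [[0, 0, 1], [0, 1, 2], [1, 2, 3]]
print(toy_y)       # [2, 3, 4]
```

Because the padding is on the left, the target word always sits in the last column, which is why a single slice separates inputs from labels.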

In [14]:
X

array([[  0,   0,   0, ...,   0,   0,  20],
       [  0,   0,   0, ...,   0,  20,  21],
       [  0,   0,   0, ...,  20,  21,   2],
       ...,
       [  0,   0,   0, ..., 189, 190, 191],
       [  0,   0,   0, ..., 190, 191,   3],
       [  0,   0,   0, ..., 191,   3,  20]], dtype=int32)

In [15]:
y

array([ 21,   2,  22,  42,  43,  44,   5,  45,  46,  47,  48,  49,  50,
         1,  51,  52,   9,  53,   6,   5,  55,  56,  12,   3,   2,  57,
        58,  10,  59,  23,  60,  61,  62,   4,  63,  64,   1,  24,   7,
         9,  10,  65,  66,  67,  68,   1,  69,   4,  70,  26,   1,  10,
        27,  71,  23,  22,  72,   9,  73,  74,  75,  76,  77,  78,  79,
        11,  13,  80,  81,   6,  29,  11,  82,  13,  83,  84,   3,  14,
        12,  30,  85,   2,   8,   4,  86,  87,  25,  31,  88,   6,  15,
        89,  91,  92,  93,   1,  32,   8,  33,  16,  94,  95,  96,  97,
        32,  98,  99, 100, 101,   3,  15,  34,  29,  27,   1, 102, 103,
         8, 104,  35,  31, 105,   1, 106, 107,   4, 108,   5, 109, 110,
         3,  13, 111,   7, 112, 113, 114, 115, 116, 117,  36,  33,  16,
       118, 119, 120, 121,   1,   2,  17,  18, 122,   2, 123,  10,   8,
       124,   4, 125, 126,  36, 127, 128,  37,  38,  39, 129,   1,  18,
       130,  37,   2,  17, 131,  40, 132, 133,  11, 134, 135,   

In [16]:
X.shape

(280, 35)

In [17]:
y.shape

(280,)

In [19]:
from tensorflow.keras.utils import to_categorical
y = to_categorical(y,num_classes=192)
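
The value 192 is the vocabulary size (191) plus one, since word ids start at 1 and index 0 is left for padding. The sketch below shows the one-hot step on toy labels in pure Python; `one_hot` and the toy sizes are hypothetical and only illustrate the shape of the result.

```python
def one_hot(labels, num_classes):
    """Return one row per label with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

toy_vocab_size = 4                    # hypothetical vocabulary of 4 words
toy_num_classes = toy_vocab_size + 1  # +1 because index 0 is the padding id
toy_targets = one_hot([2, 3, 4], toy_num_classes)

print(toy_targets[0])  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

If the one-hot matrix becomes too large for a bigger vocabulary, Keras also provides the `sparse_categorical_crossentropy` loss, which takes the integer labels directly.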

In [20]:
y.shape

(280, 192)

In [21]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM,Dense

In [22]:
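model = Sequential()
# 192 = 191 vocabulary words + 1 for the padding index 0
# input_length matches X.shape[1], i.e. max_len - 1 = 35
model.add(Embedding(192, 100, input_length=max_len - 1))
model.add(LSTM(150))
model.add(Dense(192, activation='softmax'))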



In [23]:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

In [24]:
model.summary()

In [26]:
model.fit(X,y,epochs=200)

Epoch 1/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9989 - loss: 0.1635
Epoch 2/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9979 - loss: 0.1525
Epoch 3/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9924 - loss: 0.1495
Epoch 4/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9984 - loss: 0.1430
Epoch 5/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9928 - loss: 0.1479
Epoch 6/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9928 - loss: 0.1401
Epoch 7/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9973 - loss: 0.1341
Epoch 8/200
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9973 - loss: 0.1303
Epoch 9/200
9/9 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7bd371096750>

In [28]:
import time
import numpy as np

text = "Machine learning is"

for i in range(10):
  # tokenize
  token_text = tokenizer.texts_to_sequences([text])[0]
  # padding (same input length the model was trained on: max_len - 1)
  padded_token_text = pad_sequences([token_text], maxlen=max_len - 1, padding='pre')
  # predict
  pos = np.argmax(model.predict(padded_token_text))

  for word,index in tokenizer.word_index.items():
    if index == pos:
      text = text + " " + word
      print(text)
      time.sleep(2)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 197ms/step
Machine learning is powerful
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
Machine learning is powerful subset
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
Machine learning is powerful subset of
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
Machine learning is powerful subset of machine
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step
Machine learning is powerful subset of machine learning
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
Machine learning is powerful subset of machine learning plays
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
Machine learning is powerful subset of machine learning plays a
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step
Machine learning is powerful subset of machine learning plays a crucial
1/1 ━
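
The generation cell scans `tokenizer.word_index` on every step to find the predicted word. One possible alternative is a direct id-to-word lookup (the Keras `Tokenizer` exposes this mapping as `index_word`). The sketch below shows that greedy loop in pure Python; `greedy_generate`, `toy_predict`, and `toy_index_word` are hypothetical stand-ins for the trained model and the real vocabulary, not the notebook's own code.

```python
def greedy_generate(seed_ids, predict_fn, index_word, steps, context_len):
    """Repeatedly pick the most probable next id and look its word up directly."""
    ids = list(seed_ids)
    words = []
    for _ in range(steps):
        # left-pad (or truncate) the running context to the model's input length
        context = ids[-context_len:]
        context = [0] * (context_len - len(context)) + context
        probs = predict_fn(context)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if next_id == 0:  # the padding id has no word attached
            break
        words.append(index_word[next_id])
        ids.append(next_id)
    return words

# Hypothetical stand-ins: a 4-word vocabulary and a fake "model"
toy_index_word = {1: "deep", 2: "learning", 3: "is", 4: "fun"}

def toy_predict(context):
    """Fake model: puts all probability on (last id + 1), capped at 4."""
    probs = [0.0] * 5
    probs[min(context[-1] + 1, 4)] = 1.0
    return probs

toy_words = greedy_generate([1], toy_predict, toy_index_word, steps=3, context_len=3)
print(" ".join(["deep"] + toy_words))  # deep learning is fun
```

With the real model, `predict_fn` would wrap `model.predict` on a padded array of shape `(1, max_len - 1)`, and `index_word` would be `tokenizer.index_word`.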

In [29]:
tokenizer.word_index

{'and': 1,
 'ai': 2,
 'in': 3,
 'to': 4,
 'the': 5,
 'of': 6,
 'with': 7,
 'models': 8,
 'human': 9,
 'language': 10,
 'learning': 11,
 'advancements': 12,
 'a': 13,
 'these': 14,
 'text': 15,
 'as': 16,
 'driven': 17,
 'content': 18,
 'potential': 19,
 'artificial': 20,
 'intelligence': 21,
 'has': 22,
 'nlp': 23,
 'interact': 24,
 'from': 25,
 'chatbots': 26,
 'translation': 27,
 'deep': 28,
 'machine': 29,
 'by': 30,
 'vast': 31,
 'transformer': 32,
 'such': 33,
 'prediction': 34,
 'on': 35,
 'applications': 36,
 'education': 37,
 'healthcare': 38,
 'customer': 39,
 'systems': 40,
 'ethical': 41,
 'rapidly': 42,
 'evolved': 43,
 'over': 44,
 'past': 45,
 'few': 46,
 'decades': 47,
 'revolutionizing': 48,
 'various': 49,
 'industries': 50,
 'significantly': 51,
 'impacting': 52,
 'lives': 53,
 'one': 54,
 'most': 55,
 'exciting': 56,
 'is': 57,
 'natural': 58,
 'processing': 59,
 'which': 60,
 'enables': 61,
 'machines': 62,
 'understand': 63,
 'generate': 64,
 'virtual': 65,
 'assis