In [5]:
dataset= """ Hey, how's your day going?
 It's been pretty busy, but good. How about you?
 I'm just relaxing at home, nothing much going on.
 Have you seen the new movie that came out last week?
 No, not yet! Is it worth watching?
 Yeah, it’s really good! The plot was intense.
 What time are we meeting tomorrow?
 Let's meet around 3 PM. Does that work for you?
 Sure, that works. Where should we go?
 How about the cafe downtown? I’ve heard it’s nice.
 Sounds good! I could use a coffee right now.
 Me too. I didn’t sleep well last night.
 Why’s that? Stayed up late again?
 Yeah, I was watching a documentary that went on forever.
 Oh, I love documentaries! Which one was it?
 It was about space exploration, really fascinating stuff.
 That sounds awesome. I’ve always been curious about space.
 Have you ever thought about going stargazing?
 Yeah, but I never got the chance. Maybe we should plan something.
 Definitely! A night under the stars would be amazing.
 I can't believe it's already October, time is flying by.
 I know, right? This year has been so fast.
 Any plans for the holidays yet?
 Not really, I’m thinking of just staying home with family.
 That sounds nice and peaceful. I might go on a short trip.
 Where to? Somewhere warm, I hope.
 Yeah, thinking about a beach getaway. I need some sun!
 Lucky you! I haven’t been to the beach in ages.
 You should totally come with us! The more, the merrier.
 I’ll think about it! When are you leaving?
 Probably mid-December. It’ll be great to unwind before the new year.
 True, we all need a break sometimes.
 Do you know how to cook? I’ve been trying to learn new recipes.
 Yeah, I love cooking! What are you trying to make?
 Just simple stuff like pasta and stir-fry. Nothing too complicated yet.
 That’s a great start! Pasta is always a good choice.
 What’s your favorite dish to cook?
 I love making homemade pizza. It’s so fun and tastes great.
 Oh, that sounds delicious! I need to try that sometime.
 You should! It’s easier than it looks.
 Do you play any instruments? I’ve been thinking about learning guitar.
 I play a little bit of piano. Guitar sounds cool though!
 I’ve always wanted to learn piano too. It seems so relaxing.
 It is! Once you get the hang of it, it’s really enjoyable.
 What music do you usually listen to?
 Mostly pop and a bit of indie. What about you?
 I’m more into rock and alternative. Love the energy.
 That’s awesome! Have you been to any concerts recently?
 Not lately, but I’m hoping to go to one next month.
 That sounds exciting! Live music is always the best.
"""

In [6]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer


**tokenizing every word**

In [7]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts([dataset])

In [8]:
len(tokenizer.word_index)
# tokenizer.word_index

241
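Below is a rough plain-Python sketch of what `fit_on_texts` does to produce these 241 entries. It assumes Keras's default `filters` string, `lower=True`, and whitespace splitting. `build_word_index` is a hypothetical helper, not part of Keras. Because `'` is not in the default filters, contractions like `how's` stay single tokens. The curly `’` in `it’s` is not filtered either, so `it’s` and `it's` end up as two different vocabulary entries.

```python
from collections import Counter

# Assumed to match Keras Tokenizer's default `filters`: these characters become spaces.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def build_word_index(text):
    """Hypothetical re-implementation of Tokenizer.fit_on_texts + word_index."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    words = text.lower().translate(table).split()
    counts = Counter(words)  # keys keep first-seen order
    # most frequent first; sorted() is stable, so ties keep first-seen order
    ranked = sorted(counts, key=counts.get, reverse=True)
    # indices start at 1 because 0 is reserved for padding
    return {word: i for i, word in enumerate(ranked, start=1)}

print(build_word_index("The cat saw the dog. The dog ran!"))
# {'the': 1, 'dog': 2, 'cat': 3, 'saw': 4, 'ran': 5}
```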

**making n-gram input sequences**

In [9]:
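input_sequences = []
for sentence in dataset.split('\n'):
  # convert the sentence into a list of word indices
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  # every prefix of length >= 2 becomes one training example:
  # its last token is the target, the tokens before it are the context
  for i in range(1,len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])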

In [10]:
len(input_sequences)

406
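The loop above turns a sentence of n tokens into n - 1 training sequences. One-word or empty lines contribute none, which is how the dialogue lines add up to 406 examples. The sketch below isolates that step on a single sentence. `ngram_prefixes` is a hypothetical name, and the token ids are borrowed from the first padded rows printed further down.

```python
def ngram_prefixes(tokens):
    """Every prefix of length >= 2, built the same way as the loop above."""
    return [tokens[: i + 1] for i in range(1, len(tokens))]

print(ngram_prefixes([78, 79, 44, 80]))
# [[78, 79], [78, 79, 44], [78, 79, 44, 80]]
```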

In [11]:
max_len=max([len(x) for x in input_sequences])  # longest sequence; X will have max_len - 1 columns

**adding 0-padding so every sequence has the same length**

In [12]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_input_sequences = pad_sequences(input_sequences, maxlen = max_len, padding='pre')

In [13]:
padded_input_sequences

array([[  0,   0,   0, ...,   0,  78,  79],
       [  0,   0,   0, ...,  78,  79,  44],
       [  0,   0,   0, ...,  79,  44,  80],
       ...,
       [  0,   0,   0, ...,  77,  11,  23],
       [  0,   0,   0, ...,  11,  23,   3],
       [  0,   0,   0, ...,  23,   3, 241]], dtype=int32)
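Padding goes on the left (`padding='pre'`) because the last column becomes the target. Pre-padding keeps the real tokens right next to it, so the LSTM reads them last. Here is a rough NumPy equivalent, assuming the default `truncating='pre'` (overlong sequences keep their last `maxlen` tokens). `pre_pad` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def pre_pad(sequences, maxlen):
    """Hypothetical stand-in for pad_sequences(..., padding='pre', truncating='pre')."""
    out = np.zeros((len(sequences), maxlen), dtype=np.int32)
    for row, seq in enumerate(sequences):
        tail = list(seq)[-maxlen:]             # keep at most the last maxlen tokens
        out[row, maxlen - len(tail):] = tail   # right-align; zeros stay on the left
    return out

padded = pre_pad([[78, 79], [78, 79, 44]], maxlen=4)
print(padded.tolist())  # [[0, 0, 78, 79], [0, 78, 79, 44]]
```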

**splitting into input (X) and output (Y)**

In [14]:
(padded_input_sequences[:,:-1]).shape

(406, 12)

In [15]:
X = padded_input_sequences[:,:-1]
Y = padded_input_sequences[:,-1]

In [16]:
X.shape , Y.shape

((406, 12), (406,))
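The split works the same way on every row: all columns except the last form the context, and the last column is the word to predict. The row below comes from the padded array above, with its leading zeros shortened to keep it readable.

```python
import numpy as np

# one padded row, leading zeros shortened for illustration
padded = np.array([[0, 0, 0, 78, 79, 44]])

X = padded[:, :-1]  # every column except the last -> context
Y = padded[:, -1]   # last column -> next word to predict

print(X.tolist(), Y.tolist())  # [[0, 0, 0, 78, 79]] [44]
```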

# building the model

In [17]:
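from tensorflow.keras.utils import to_categorical
# one class per word index; +1 because index 0 is reserved for padding (241 + 1 = 242)
Y = to_categorical(Y, num_classes=len(tokenizer.word_index) + 1)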

In [18]:
Y.shape

(406, 242)
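The labels run from 1 to 241, and 0 is the padding index, so the one-hot width has to be `len(word_index) + 1 = 242`. With a width of 241, the largest label would not fit. The sketch below is a hypothetical NumPy stand-in for `to_categorical` that shows the encoding. Another option is to keep `Y` as integers and compile with `loss='sparse_categorical_crossentropy'`, which skips building the 406 × 242 matrix.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical NumPy stand-in for to_categorical on integer labels."""
    labels = np.asarray(labels)
    out = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0  # a single 1 per row, at the label's index
    return out

print(one_hot([2, 0], num_classes=4).tolist())
# [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
```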

In [24]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Input, Dropout


In [22]:
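# Parameters
vocab_size = 242        # len(tokenizer.word_index) + 1; index 0 is reserved for padding
embedding_dim = 100
max_sequence_len = 12   # input length = X.shape[1] = max_len - 1
dropout_rate = 0.2      # only used by the commented-out Dropout layers below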

model = Sequential()
# Input Layer: Define input shape
model.add(Input(shape=(max_sequence_len,)))
# Embedding Layer.
model.add(Embedding(input_dim=vocab_size, output_dim=embedding_dim))
# LSTM Layer
model.add(LSTM(units=150, return_sequences=True))
# model.add(Dropout(dropout_rate))  # Regularization to avoid overfitting
# LSTM Layer 2
model.add(LSTM(units=150))
# model.add(Dropout(dropout_rate))
model.add(Dense(units=vocab_size, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()
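The `model.summary()` table is not shown in this notebook. The sketch below works out the parameter counts by hand, assuming the standard Keras LSTM formulation: four gates, each with an input kernel, a recurrent kernel, and one bias vector (`use_bias=True`). If that assumption holds, the totals should match the summary. The helper names are hypothetical.

```python
def embedding_params(vocab_size, embedding_dim):
    return vocab_size * embedding_dim  # one learned vector per token id

def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units  # weights + bias

layers = {
    "embedding": embedding_params(242, 100),  # output (batch, 12, 100)
    "lstm_1": lstm_params(100, 150),          # output (batch, 12, 150), return_sequences=True
    "lstm_2": lstm_params(150, 150),          # output (batch, 150)
    "dense": dense_params(150, 242),          # output (batch, 242), softmax over the vocabulary
}
for name, n in layers.items():
    print(f"{name:10s} {n:>8,}")
print(f"{'total':10s} {sum(layers.values()):>8,}")
```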


In [23]:
model.fit(X,Y,epochs=100)

Epoch 1/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.0079 - loss: 5.4835
Epoch 2/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 41ms/step - accuracy: 0.0245 - loss: 5.3144
Epoch 3/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 43ms/step - accuracy: 0.0328 - loss: 5.1724
Epoch 4/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 58ms/step - accuracy: 0.0344 - loss: 5.1466
Epoch 5/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 2s 81ms/step - accuracy: 0.0187 - loss: 5.0308
Epoch 6/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 68ms/step - accuracy: 0.0270 - loss: 5.0096
Epoch 7/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 44ms/step - accuracy: 0.0230 - loss: 5.0189
Epoch 8/100
13/13 ━━━━━━━━━━━━━━━━━━━━ 1s 44ms/step - accuracy: 0.0268 - loss: 4.9474
Epoch 9/100
13/13 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x7825c26572b0>
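A useful reference point for reading this log: a model that assigns equal probability to all 242 classes has a cross-entropy of ln(242) ≈ 5.489. Its expected accuracy is 1/242 ≈ 0.004. The epoch-1 loss of 5.4835 sits almost exactly at that baseline, so early training starts from near-uniform guessing. The arithmetic:

```python
import math

vocab_size = 242
# cross-entropy of a model that gives every class probability 1 / vocab_size
uniform_loss = -math.log(1 / vocab_size)  # = ln(242)
# expected accuracy of guessing uniformly at random
uniform_acc = 1 / vocab_size

print(round(uniform_loss, 4), round(uniform_acc, 4))  # 5.4889 0.0041
```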

In [91]:
model.save("lstm_model.keras")
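`model.save` stores only the network. Turning new text into the same ids at inference time also requires the tokenizer's vocabulary. Keras's `Tokenizer` offers `to_json()` together with `tokenizer_from_json` in `tensorflow.keras.preprocessing.text` for this. The sketch below takes a lighter approach and persists just `word_index` as JSON. The helper names and the file name are hypothetical, and the sample ids (`that` → 5, `what` → 19) are the ones printed later in this notebook.

```python
import json
import os
import tempfile

def save_vocab(word_index, path):
    """Write the word -> id mapping so inference can reuse the exact same ids."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_index, f, ensure_ascii=False)

def load_vocab(path):
    """Read word -> id back and build the inverse id -> word lookup."""
    with open(path, encoding="utf-8") as f:
        word_index = json.load(f)
    index_word = {i: w for w, i in word_index.items()}
    return word_index, index_word

# round-trip a tiny vocabulary; in the notebook you would pass tokenizer.word_index
path = os.path.join(tempfile.mkdtemp(), "word_index.json")  # hypothetical file name
save_vocab({"that": 5, "what": 19}, path)
word_index, index_word = load_vocab(path)
print(index_word[5])  # that
```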

In [89]:
import numpy as np
text = "where are"

# greedily append the most likely next word, 3 times
for i in range(3):
  # tokenize the text generated so far
  token_text = tokenizer.texts_to_sequences([text])[0]
  # pre-pad to the model's input length of 12 (longer inputs keep their last 12 tokens)
  padded_token_text = pad_sequences([token_text], maxlen=12, padding='pre', truncating='pre')
  # predict: index of the most probable next word
  pos = int(np.argmax(model.predict(padded_token_text)))

  # map the index back to its word; index 0 is padding and has no word
  word = tokenizer.index_word.get(pos)
  if word is None:
    break
  text = text + " " + word

print(text)


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step
where are we meeting tomorrow 

In [87]:
text

'where are we meeting tomorrow'
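The generation loop above is greedy decoding: at each step it appends the single most probable next id. The model-agnostic sketch below pulls that logic out of Keras so it can be checked without a trained network. `predict_fn` stands in for `model.predict`; in the notebook it could be `lambda x: model.predict(x, verbose=0)`, where `verbose=0` also silences the per-call progress bars. `fake_predict` is a hypothetical stub that always predicts "last id + 1".

```python
import numpy as np

def greedy_generate(seed_ids, predict_fn, n_words, maxlen):
    """Greedy decoding: repeatedly append the argmax next-token id."""
    ids = list(seed_ids)
    for _ in range(n_words):
        window = ids[-maxlen:]                          # like truncating='pre'
        padded = [0] * (maxlen - len(window)) + window  # like padding='pre'
        probs = predict_fn(np.array([padded]))          # shape (1, vocab_size)
        next_id = int(np.argmax(probs[0]))
        if next_id == 0:                                # 0 is padding, not a word
            break
        ids.append(next_id)
    return ids

def fake_predict(batch, vocab_size=10):
    """Hypothetical stub: puts all probability on (last id + 1) % vocab_size."""
    probs = np.zeros((batch.shape[0], vocab_size))
    probs[np.arange(batch.shape[0]), (batch[:, -1] + 1) % vocab_size] = 1.0
    return probs

print(greedy_generate([3, 4], fake_predict, n_words=3, maxlen=12))  # [3, 4, 5, 6, 7]
```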

In [75]:
text="what"
tokenizer.texts_to_sequences([text])[0]

[19]

In [76]:
pd=pad_sequences([[19]], maxlen=12, padding='pre')

In [77]:
(model.predict(pd)).shape

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step


(1, 242)

In [80]:
pos = np.argmax(model.predict(pd))
pos

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step


5

In [79]:
for word, index in tokenizer.word_index.items():
  if index == pos:
    print(word)

that
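Scanning all of `word_index` on every prediction costs time proportional to the vocabulary size. Inverting the mapping once makes each lookup a single dict access. Keras's `Tokenizer` already maintains this inverse as `tokenizer.index_word`, so `tokenizer.index_word[int(pos)]` should return the same word. The sketch uses only the two ids printed in this notebook.

```python
# a tiny slice of the vocabulary, using the ids printed above
word_index = {"that": 5, "what": 19}

# invert once: id -> word
index_word = {index: word for word, index in word_index.items()}

print(index_word[5])      # that
print(index_word.get(0))  # None (0 is the padding id, not a word)
```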
