<a href="https://colab.research.google.com/github/TAK-PRAVEEN/Deep-Learning-Notebooks/blob/main/Next_Word_Prediction_LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
lstm = """As I dive deeper into the fascinating world of deep learning, I've been exploring one of the most impactful advancements in recurrent neural networks: Long Short-Term Memory (LSTM). I’d like to share some insights into the core ideas behind LSTMs and their significance in handling sequence data.
### What are LSTMs?
Introduced by Hochreiter and Schmidhuber in 1997, LSTMs are a specialized type of recurrent neural network (RNN) designed to address some of the critical limitations of standard RNNs, particularly in learning from long sequences. The architecture of LSTMs incorporates unique structures that enable them to remember information for long periods, making them ideal for tasks such as natural language processing, time series forecasting, and more.
### Key Components of LSTMs
The core idea behind LSTMs lies in their cell state and the carefully designed gating mechanisms. Here’s a breakdown:
1. Cell State:
 - The cell state can carry relevant information for long periods, which is critical for tasks requiring context from earlier inputs.
2. Gating Mechanisms: LSTMs utilize three primary gates to control the flow of information:
 - Forget Gate: Decides what information from the previous cell state should be discarded. Using a sigmoid activation function, it outputs values between 0 (forget) and 1 (keep).
 - Input Gate: Determines which new information should be added to the cell state. It also uses a sigmoid activation function, creating a new candidate value that can be added to the cell state.
 - Output Gate: Controls what information from the cell state will be output to the next layer and what goes into the next time step.
3. Mathematical Representation:
 - The combination of these gates enables LSTMs to learn patterns and dependencies over extended sequences, effectively managing the flow of information.
### Advantages of LSTMs
- Long-Term Dependencies
- Flexibility
- Improved Performance
### Real-World Applications
LSTMs have found applications across multiple domains:
- Natural Language Processing: Used for tasks such as machine translation, sentiment analysis, and text generation.
- Time Series Prediction: Employed in stock price prediction, weather forecasting, and anomaly detection.
- Healthcare: Useful for predicting patient outcomes based on historical medical data.
### Conclusion
Understanding the core ideas behind Long Short-Term Memory networks has profoundly reshaped my perspective on deep learning. As we continue to refine and innovate our approaches to sequential data, LSTMs stand as a powerful tool in our arsenal.
I look forward to connecting with fellow learners and professionals who are passionate about AI and deep learning. Let’s exchange insights and grow together!
hashtag#DeepLearning hashtag#LSTM hashtag#MachineLearning hashtag#ArtificialIntelligence hashtag#DataScience hashtag#NLP hashtag#SequentialData hashtag#ContinuousLearning"""
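
The gate descriptions in the text above map onto a short computation. Below is a minimal NumPy sketch of one LSTM time step in its standard textbook form, where the three gates use sigmoids and the candidate values use tanh. The function name `lstm_step`, the stacked weight layout and the toy sizes are assumptions made for illustration; Keras stores and orders its weights differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D), U: (4H, H), b: (4H,). Rows are stacked as
    [forget, input, candidate, output]; this layout is an illustrative assumption.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much of the candidate to add
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g       # updated cell state
    h_t = o * np.tanh(c_t)         # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 3, 2                        # toy input and hidden sizes
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)
```

Because the forget and input gates lie strictly between 0 and 1, the cell state update is a soft blend of old memory and new candidate values rather than an overwrite.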

In [3]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [4]:
tokenizer = Tokenizer()

In [5]:
tokenizer.fit_on_texts([lstm])

In [6]:
tokenizer.word_index

{'the': 1,
 'and': 2,
 'of': 3,
 'to': 4,
 'lstms': 5,
 'hashtag': 6,
 'in': 7,
 'information': 8,
 'cell': 9,
 'state': 10,
 'long': 11,
 'a': 12,
 'for': 13,
 'as': 14,
 'learning': 15,
 'what': 16,
 'from': 17,
 'be': 18,
 'into': 19,
 'deep': 20,
 'term': 21,
 'core': 22,
 'behind': 23,
 'data': 24,
 'are': 25,
 'tasks': 26,
 'time': 27,
 'gate': 28,
 'i': 29,
 'world': 30,
 'recurrent': 31,
 'neural': 32,
 'networks': 33,
 'short': 34,
 'memory': 35,
 'lstm': 36,
 'some': 37,
 'insights': 38,
 'ideas': 39,
 'their': 40,
 'designed': 41,
 'critical': 42,
 'sequences': 43,
 'that': 44,
 'them': 45,
 'periods': 46,
 'such': 47,
 'natural': 48,
 'language': 49,
 'processing': 50,
 'series': 51,
 'forecasting': 52,
 'gating': 53,
 'mechanisms': 54,
 '1': 55,
 'can': 56,
 'which': 57,
 'gates': 58,
 'flow': 59,
 'forget': 60,
 'should': 61,
 'sigmoid': 62,
 'activation': 63,
 'function': 64,
 'it': 65,
 'new': 66,
 'added': 67,
 'output': 68,
 'next': 69,
 'dependencies': 70,
 'applicat
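
The ranking in `word_index` follows word frequency: the most frequent word gets index 1, and index 0 is left unused so it can serve as padding. The pure-Python sketch below approximates that behaviour. `build_word_index` is a hypothetical helper, and its punctuation handling only approximates the `Tokenizer` defaults.

```python
from collections import Counter
import string

def build_word_index(texts):
    """Frequency-ranked word index: the most frequent word gets index 1."""
    # Replace punctuation (except apostrophes) with spaces before splitting
    table = str.maketrans({ch: " " for ch in string.punctuation if ch != "'"})
    counts = Counter()
    for text in texts:
        counts.update(text.lower().translate(table).split())
    # most_common() sorts by count; ties keep first-seen order
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

toy_index = build_word_index(["The cell state and the gates control the flow."])
print(toy_index)
# {'the': 1, 'cell': 2, 'state': 3, 'and': 4, 'gates': 5, 'control': 6, 'flow': 7}
```

Treating `#` as a separator is also why `hashtag` shows up as its own high-frequency token in the output above.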

In [7]:
input_sequences = []
for sentence in lstm.split('\n'):
  # Convert one line of the corpus to a list of word indices
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]
  # Every prefix of length >= 2 becomes one training sequence (context + next word)
  for i in range(1, len(tokenized_sentence)):
    n_gram = tokenized_sentence[:i+1]
    input_sequences.append(n_gram)
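
To make the loop above concrete, here is a small sketch of the same prefix expansion on a toy list of token ids. The helper name `prefix_sequences` is an assumption for illustration; the ids are the first four from the corpus.

```python
def prefix_sequences(token_ids):
    """Return every prefix of length >= 2; the last id is the target, the rest is context."""
    return [token_ids[:i + 1] for i in range(1, len(token_ids))]

toy_ids = [14, 29, 75, 76]
for seq in prefix_sequences(toy_ids):
    print(seq[:-1], "->", seq[-1])
# [14] -> 29
# [14, 29] -> 75
# [14, 29, 75] -> 76
```

A line with n tokens therefore contributes n - 1 training sequences, and single-word lines contribute none.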

In [8]:
input_sequences

[[14, 29],
 [14, 29, 75],
 [14, 29, 75, 76],
 [14, 29, 75, 76, 19],
 [14, 29, 75, 76, 19, 1],
 [14, 29, 75, 76, 19, 1, 77],
 [14, 29, 75, 76, 19, 1, 77, 30],
 [14, 29, 75, 76, 19, 1, 77, 30, 3],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79, 80],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79, 80, 81],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79, 80, 81, 3],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79, 80, 81, 3, 1],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79, 80, 81, 3, 1, 82],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79, 80, 81, 3, 1, 82, 83],
 [14, 29, 75, 76, 19, 1, 77, 30, 3, 20, 15, 78, 79, 80, 81, 3, 1, 82, 83, 84],
 [14,
  29,
  75,
  76,
  19,
  1,
  77,
  30,
  3,
  20,
  15,
  78,
  79,
  80,
  81,
  3,
  1,
  82,
  83,
  84,
  7],
 [14,
  29,
  75,
  7

In [9]:
max_len = max([len(x) for x in input_sequences])
max_len

65

In [10]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_input_sequences = pad_sequences(input_sequences, maxlen=max_len, padding='pre')
padded_input_sequences

array([[  0,   0,   0, ...,   0,  14,  29],
       [  0,   0,   0, ...,  14,  29,  75],
       [  0,   0,   0, ...,  29,  75,  76],
       ...,
       [  0,   0,   0, ..., 233,   6, 234],
       [  0,   0,   0, ...,   6, 234,   6],
       [  0,   0,   0, ..., 234,   6, 235]], dtype=int32)
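
The effect of `padding='pre'` can be sketched without Keras. The helper `pad_pre` below is a hypothetical stand-in that left-pads with zeros and, like the default truncation in `pad_sequences`, keeps the last `maxlen` items of a sequence that is too long.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value`; keep only the last `maxlen` items if too long."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

print(pad_pre([[14, 29], [14, 29, 75]], maxlen=5))
# [[0, 0, 0, 14, 29], [0, 0, 14, 29, 75]]
```

Padding on the left keeps the most recent words next to the end of the sequence, so the target word always sits in the last column.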

In [11]:
X = padded_input_sequences[:,:-1]
y = padded_input_sequences[:,-1]
X, y

(array([[  0,   0,   0, ...,   0,   0,  14],
        [  0,   0,   0, ...,   0,  14,  29],
        [  0,   0,   0, ...,  14,  29,  75],
        ...,
        [  0,   0,   0, ...,   6, 233,   6],
        [  0,   0,   0, ..., 233,   6, 234],
        [  0,   0,   0, ...,   6, 234,   6]], dtype=int32),
 array([ 29,  75,  76,  19,   1,  77,  30,   3,  20,  15,  78,  79,  80,
         81,   3,   1,  82,  83,  84,   7,  31,  32,  33,  11,  34,  21,
         35,  36,  85,  86,   4,  87,  37,  38,  19,   1,  22,  39,  23,
          5,   2,  40,  88,   7,  89,  90,  24,  25,   5,  92,  93,   2,
         94,   7,  95,   5,  25,  12,  96,  97,   3,  31,  32,  98,  99,
         41,   4, 100,  37,   3,   1,  42, 101,   3, 102, 103, 104,   7,
         15,  17,  11,  43,   1, 105,   3,   5, 106, 107, 108,  44, 109,
         45,   4, 110,   8,  13,  11,  46, 111,  45, 112,  13,  26,  47,
         14,  48,  49,  50,  27,  51,  52,   2, 113, 115,   3,   5,  22,
        116,  23,   5, 117,   7,  40,   9,  1

In [12]:
X.shape, y.shape

((396, 64), (396,))

In [13]:
from tensorflow.keras.utils import to_categorical
y = to_categorical(y, num_classes=len(tokenizer.word_index)+1)
y

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])

In [14]:
y.shape

(396, 236)
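
The shapes above follow from two steps: the last column of each padded row becomes the target, and each target id is expanded into a vector of length `len(tokenizer.word_index) + 1` (235 words plus the unused padding index 0, giving 236). A pure-Python sketch of both steps on toy data, with hypothetical helper names:

```python
def split_context_target(padded_rows):
    """Split each row into its context (all but last id) and target (last id)."""
    contexts = [row[:-1] for row in padded_rows]
    targets = [row[-1] for row in padded_rows]
    return contexts, targets

def one_hot(labels, num_classes):
    """Expand each integer label into a 0/1 vector of length num_classes."""
    return [[1.0 if j == label else 0.0 for j in range(num_classes)] for label in labels]

toy_padded = [[0, 0, 1, 2], [0, 1, 2, 3]]
toy_vocab_size = 3                      # three distinct words, ids 1..3
toy_X, toy_y = split_context_target(toy_padded)
toy_y_onehot = one_hot(toy_y, num_classes=toy_vocab_size + 1)  # +1 for padding id 0

print(toy_X)         # [[0, 0, 1], [0, 1, 2]]
print(toy_y_onehot)  # [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```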

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, y_train.shape

((316, 64), (316, 236))

In [15]:
# from tensorflow.keras.models import Sequential
# from tensorflow.keras.layers import Dense, Embedding, LSTM

In [16]:
# model = Sequential()

# model.add(Embedding(236, 100, input_length=64))
# model.add(LSTM(150))
# model.add(Dense(236, activation='softmax'))

In [17]:
# model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [18]:
# model.summary()

In [16]:
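from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

vocab_size = len(tokenizer.word_index) + 1  # 236: 235 words plus padding index 0
seq_len = max_len - 1                       # 64: context length once the target is split off

# Token ids -> 100-dim embeddings -> 150-unit LSTM -> softmax over the vocabulary
input_layer = Input(shape=(seq_len,))
embedding_layer = Embedding(input_dim=vocab_size, output_dim=100)(input_layer)
lstm_layer = LSTM(150)(embedding_layer)
output_layer = Dense(vocab_size, activation='softmax')(lstm_layer)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()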
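
The output of `model.summary()` is not shown here. As a rough cross-check, the trainable parameter counts can be worked out by hand. The sketch below assumes default Keras layer settings (biases enabled, one bias vector per LSTM gate block); the helper names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary id
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # Four blocks (forget, input, candidate, output), each with
    # input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias
    return input_dim * units + units

counts = {
    "embedding": embedding_params(236, 100),
    "lstm": lstm_params(100, 150),
    "dense": dense_params(150, 236),
}
counts["total"] = sum(counts.values())
print(counts)
# {'embedding': 23600, 'lstm': 150600, 'dense': 35636, 'total': 209836}
```

With only 316 training rows, a model of roughly 210k parameters has ample capacity to memorise the training set.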

In [18]:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.9956 - loss: 0.1195 - val_accuracy: 0.0750 - val_loss: 8.8484
Epoch 2/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.9991 - loss: 0.1172 - val_accuracy: 0.0750 - val_loss: 8.8504
Epoch 3/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9991 - loss: 0.1168 - val_accuracy: 0.0750 - val_loss: 8.8710
Epoch 4/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9979 - loss: 0.1098 - val_accuracy: 0.0750 - val_loss: 8.8778
Epoch 5/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.9973 - loss: 0.1095 - val_accuracy: 0.0625 - val_loss: 8.8936
Epoch 6/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.9964 - loss: 0.1037 - val_accuracy: 0.0625 - val_loss: 8.8998
Epoch 7/100
10/10

In [33]:
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

text = "LSTM Deep Learning"

for _ in range(10):
  # Convert the running text to word indices
  token_text = tokenizer.texts_to_sequences([text])[0]

  # Left-pad to the model's input length (max_len - 1 = 64)
  padded_token_text = pad_sequences([token_text], maxlen=64, padding='pre')

  # Index of the most probable next word
  pos = int(np.argmax(model.predict(padded_token_text)))

  # Map the index back to its word; index 0 (padding) has no entry
  word = tokenizer.index_word.get(pos)
  if word is not None:
    text = text + " " + word
    print(text)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step
LSTM Deep Learning deeper
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step
LSTM Deep Learning deeper deeper
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
LSTM Deep Learning deeper deeper the
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
LSTM Deep Learning deeper deeper the cell
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step
LSTM Deep Learning deeper deeper the cell state
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step
LSTM Deep Learning deeper deeper the cell state should
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step
LSTM Deep Learning deeper deeper the cell state should be
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
LSTM Deep Learning deeper deeper the cell state should be output
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s