In [62]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [63]:
tokenizer = Tokenizer()

In [64]:
faqs = """

 What is Machine Learning?
   Machine Learning is a subset of artificial intelligence that focuses on the development of algorithms and models that allow computers to learn and make predictions or decisions without being explicitly programmed.
 What are the three main types of machine learning?
   The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning.
 What is supervised learning?
   Supervised learning is a type of machine learning where the model is trained on a labeled dataset, meaning that it learns from input-output pairs. It learns to make predictions based on input data.
 What is unsupervised learning?
   Unsupervised learning is a type of machine learning where the model is given unlabeled data and must discover patterns and relationships on its own.
 What is reinforcement learning?
   Reinforcement learning is a type of machine learning where an agent learns to make a sequence of decisions by interacting with an environment. It receives rewards or penalties for its actions.
 What is the difference between classification and regression in machine learning?
   Classification is used when the output variable is a category or label, while regression is used when the output variable is a continuous value.
 What is overfitting in machine learning?
   Overfitting occurs when a model learns the training data too well, to the point that it starts to memorize noise or random fluctuations. This leads to poor performance on unseen data.
 How can overfitting be prevented?
   Overfitting can be prevented by using techniques like cross-validation, regularization, and collecting more diverse training data.
 What is the bias-variance tradeoff?
   The bias-variance tradeoff is the balance between a model's ability to fit the training data closely (low bias) and its ability to generalize to unseen data (low variance).
 What is a decision tree in machine learning?
    A decision tree is a supervised learning algorithm that makes decisions by asking a series of questions about the features of the data. It splits the data into branches based on these questions, ultimately leading to a prediction.
 What is the curse of dimensionality in machine learning?
    The curse of dimensionality refers to the challenges and difficulties that arise when working with high-dimensional data, where the number of features or dimensions is very large. It can lead to increased computational complexity and a need for larger amounts of data.
 What is feature engineering in machine learning?
    Feature engineering is the process of selecting, transforming, or creating new features from the existing data to improve the performance of a machine learning model.
 What is a neural network?
    A neural network is a type of machine learning model inspired by the structure and functioning of the human brain. It consists of interconnected nodes (neurons) organized in layers, and it's used for tasks like classification, regression, and pattern recognition.
 What is a convolutional neural network (CNN)?
    A convolutional neural network (CNN) is a specific type of neural network designed for processing grid-like data, such as images. It uses layers of convolution operations to automatically and adaptively learn features from the data.
 What is the purpose of batch normalization in deep learning?
    Batch normalization is a technique used in deep learning to improve the training process and stability of neural networks. It normalizes the inputs to a layer, helping to reduce internal covariate shifts and accelerate convergence.
 What is transfer learning in machine learning?
    Transfer learning is a technique where a pre-trained model, originally trained on a large dataset for a specific task, is fine-tuned or adapted for a different but related task with a smaller dataset.
 What is an ensemble method in machine learning?
    An ensemble method combines multiple machine learning models to improve overall performance. Common techniques include bagging (e.g., Random Forest), boosting (e.g., AdaBoost), and stacking.
 What is natural language processing (NLP) in machine learning?
    Natural Language Processing (NLP) is a field of machine learning focused on the interaction between computers and human language. It involves tasks like language translation, sentiment analysis, and text generation.
 What is the difference between precision and recall in classification?
    Precision is the ratio of true positives to the sum of true positives and false positives, while recall is the ratio of true positives to the sum of true positives and false negatives. Precision emphasizes the accuracy of positive predictions, while recall focuses on finding all positive instances.
 What is deep reinforcement learning?
    Deep reinforcement learning combines deep learning techniques with reinforcement learning principles. It involves training neural networks to make decisions based on a reward system, often used in complex tasks like playing video games or controlling autonomous systems.
"""

In [65]:
tokenizer.fit_on_texts([faqs])

In [66]:
tokenizer.word_index

{'is': 1,
 'learning': 2,
 'the': 3,
 'a': 4,
 'of': 5,
 'and': 6,
 'to': 7,
 'what': 8,
 'machine': 9,
 'data': 10,
 'in': 11,
 'it': 12,
 'on': 13,
 'or': 14,
 'neural': 15,
 'that': 16,
 'reinforcement': 17,
 'model': 18,
 'for': 19,
 'type': 20,
 'where': 21,
 'used': 22,
 'training': 23,
 'like': 24,
 'network': 25,
 'deep': 26,
 'positives': 27,
 'make': 28,
 'decisions': 29,
 'supervised': 30,
 'learns': 31,
 'an': 32,
 'by': 33,
 'with': 34,
 'between': 35,
 'classification': 36,
 'when': 37,
 'overfitting': 38,
 'features': 39,
 'language': 40,
 'true': 41,
 'predictions': 42,
 'unsupervised': 43,
 'trained': 44,
 'dataset': 45,
 'from': 46,
 'output': 47,
 'based': 48,
 'its': 49,
 'regression': 50,
 'while': 51,
 'performance': 52,
 'can': 53,
 'techniques': 54,
 'bias': 55,
 'variance': 56,
 'improve': 57,
 'tasks': 58,
 'processing': 59,
 'precision': 60,
 'recall': 61,
 'focuses': 62,
 'models': 63,
 'computers': 64,
 'learn': 65,
 'are': 66,
 'three': 67,
 'main': 68,
 ...
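The indices above are ranked by frequency. `fit_on_texts` lowercases the text, turns punctuation into spaces (including `-`, which is why `bias` and `variance` appear as separate entries), counts the words, and numbers them from 1 in descending order of count. The pure-Python sketch below approximates that behaviour under stated assumptions: it hard-codes the default Keras filter string, ignores options such as `num_words`, `oov_token` and `char_level`, and breaks ties by first occurrence. `build_word_index` and `text_to_words` are hypothetical helpers, not Keras functions.

```python
from collections import OrderedDict

# Assumed to match the default Keras Tokenizer filters; the apostrophe is NOT included
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_words(text, filters=DEFAULT_FILTERS):
    """Lowercase, replace every filter character with a space, and split."""
    table = str.maketrans({c: " " for c in filters})
    return [w for w in text.lower().translate(table).split(" ") if w]


def build_word_index(texts):
    """Hypothetical helper approximating Tokenizer.fit_on_texts + word_index."""
    counts = OrderedDict()  # remembers first-occurrence order for tie-breaking
    for text in texts:
        for word in text_to_words(text):
            counts[word] = counts.get(word, 0) + 1
    # Stable sort: most frequent first, ties keep first-occurrence order
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    # Numbering starts at 1 because index 0 is later used for padding
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}


print(build_word_index(["What is Machine Learning? Machine learning is fun"]))
# {'is': 1, 'machine': 2, 'learning': 3, 'what': 4, 'fun': 5}
```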

In [70]:
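# Build n-gram training sequences: for every line of the FAQ text, keep each
# prefix of length 2..n of its token list. The last token of a prefix is the
# target word; the tokens before it are the input context.
input_sequences = []
for sentence in faqs.split('\n'):
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  for i in range(1, len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])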
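A line with n tokens contributes n - 1 training sequences, one per prefix of length 2 to n. The same expansion as a standalone function (the name `line_to_prefixes` is hypothetical), checked against the first FAQ line, "What is Machine Learning?", which maps to `[8, 1, 9, 2]` under the `word_index` above:

```python
def line_to_prefixes(tokens):
    """Return every prefix of `tokens` with length >= 2 (hypothetical helper).

    The last element of each prefix is the word to predict; the
    preceding elements are its context.
    """
    return [tokens[: i + 1] for i in range(1, len(tokens))]


# "What is Machine Learning?" -> [8, 1, 9, 2]
print(line_to_prefixes([8, 1, 9, 2]))
# [[8, 1], [8, 1, 9], [8, 1, 9, 2]]
```

Empty and single-word lines yield no prefixes at all, so the blank lines at the start of `faqs` add nothing to the training set.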


In [71]:
input_sequences

[[8, 1],
 [8, 1, 9],
 [8, 1, 9, 2],
 [9, 2],
 [9, 2, 1],
 [9, 2, 1, 4],
 [9, 2, 1, 4, 112],
 [9, 2, 1, 4, 112, 5],
 [9, 2, 1, 4, 112, 5, 113],
 [9, 2, 1, 4, 112, 5, 113, 114],
 [9, 2, 1, 4, 112, 5, 113, 114, 16],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115, 5],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115, 5, 116],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115, 5, 116, 6],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115, 5, 116, 6, 63],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115, 5, 116, 6, 63, 16],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115, 5, 116, 6, 63, 16, 117],
 [9, 2, 1, 4, 112, 5, 113, 114, 16, 62, 13, 3, 115, 5, 116, 6, 63, 16, 117, 64],
 ...]

In [72]:
# Length of the longest n-gram sequence; every sequence is padded to this length
max_len = max([len(x) for x in input_sequences])


In [73]:
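from tensorflow.keras.preprocessing.sequence import pad_sequences

# Left-pad ('pre') every sequence with zeros to max_len, so the real last token
# (the target word) always sits in the final column
padded_input_sequences = pad_sequences(input_sequences, maxlen=max_len, padding='pre')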
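With `padding='pre'` the zeros go on the left, so the last column always holds the real final token; that is what makes the `[:, -1]` target slice work. The NumPy sketch below mimics this for sequences no longer than `maxlen` (it raises instead of reproducing `pad_sequences`' truncation options); `pre_pad` is a hypothetical name.

```python
import numpy as np


def pre_pad(sequences, maxlen, value=0):
    """Left-pad each sequence with `value` to length `maxlen` (hypothetical helper).

    Assumes no sequence is longer than `maxlen`; unlike pad_sequences,
    it does not truncate.
    """
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        if len(seq) > maxlen:
            raise ValueError("sequence longer than maxlen")
        if len(seq) > 0:  # -len([]) is 0, which would select the whole row
            out[row, -len(seq):] = seq
    return out


print(pre_pad([[8, 1], [8, 1, 9], [8, 1, 9, 2]], maxlen=4))
```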

In [74]:
X = padded_input_sequences[:, :-1]  # inputs: every column except the last (the context)

In [75]:
y = padded_input_sequences[:, -1]   # target: the last column (the word to predict)
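Slicing the padded matrix this way turns each row into a (context, next word) pair. A small worked example on three pre-padded rows, assuming they come from the first FAQ line "what is machine learning" (token ids 8, 1, 9, 2 in the `word_index` above):

```python
import numpy as np

# Pre-padded n-grams of "what is machine learning"
padded = np.array([
    [0, 0, 8, 1],
    [0, 8, 1, 9],
    [8, 1, 9, 2],
])

X = padded[:, :-1]  # context: all columns except the last
y = padded[:, -1]   # target: the last column

for context, target in zip(X.tolist(), y.tolist()):
    print(context, "->", target)
# [0, 0, 8] -> 1
# [0, 8, 1] -> 9
# [8, 1, 9] -> 2
```

In the notebook the same slices leave `X` with `max_len - 1` columns, which matches the `input_length=47` given to the `Embedding` layer.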

In [84]:
len(tokenizer.word_index)

278

In [77]:
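from tensorflow.keras.utils import to_categorical

# One-hot encode the targets. num_classes must be at least
# len(tokenizer.word_index) + 1 = 279, because index 0 is reserved for padding;
# the hard-coded 305 also works but adds 26 classes that never occur as targets.
y = to_categorical(y, num_classes=305)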

In [78]:
y.shape

(727, 305)
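`to_categorical` turns each integer target into a one-hot row of length `num_classes`, so every target index must be smaller than `num_classes`. A NumPy sketch of the same mapping (the helper `one_hot` is hypothetical, not the Keras implementation):

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a 1 at each label."""
    labels = np.asarray(labels, dtype=np.int64)
    if labels.size and labels.max() >= num_classes:
        raise ValueError("num_classes must exceed the largest label")
    out = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out


targets = np.array([1, 9, 2])
encoded = one_hot(targets, num_classes=10)
print(encoded.shape)           # (3, 10)
print(encoded.argmax(axis=1))  # [1 9 2]
```

Here the largest index in `tokenizer.word_index` is 278, so any `num_classes` of 279 or more works. Compiling with `loss='sparse_categorical_crossentropy'` would accept the integer targets directly and skip this step.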

In [79]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [88]:
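model = Sequential()
# 305-entry vocabulary -> 100-dimensional vectors; inputs are 47 tokens long (max_len - 1)
model.add(Embedding(305, 100, input_length=47))
# A single LSTM layer with 151 units summarizes the context sequence
model.add(LSTM(151))
# Softmax over the 305 classes gives a probability for every possible next word
model.add(Dense(305, activation='softmax'))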

In [90]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
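With one-hot targets, categorical cross-entropy reduces to the negative log of the probability the softmax assigns to the true next word, averaged over the batch. A NumPy sketch of that reduction, ignoring the clipping Keras applies internally to avoid `log(0)`:

```python
import numpy as np


def categorical_crossentropy(one_hot_targets, probs):
    """Mean over the batch of -sum(target * log(prob))."""
    per_sample = -np.sum(one_hot_targets * np.log(probs), axis=1)
    return per_sample.mean()


# Two samples, vocabulary of 3 words; each row of `probs` is a softmax output
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
targets = np.array([[1, 0, 0],
                    [0, 0, 1]])

loss = categorical_crossentropy(targets, probs)
# Equivalent form: -log of the probability at each true index
same = -np.mean(np.log(probs[np.arange(2), [0, 2]]))
print(round(loss, 4), round(same, 4))
```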

In [91]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 47, 100)           30500     
                                                                 
 lstm_3 (LSTM)               (None, 151)               152208    
                                                                 
 dense_3 (Dense)             (None, 305)               46360     
                                                                 
Total params: 229068 (894.80 KB)
Trainable params: 229068 (894.80 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
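The parameter counts in the summary follow directly from the layer sizes: an Embedding stores one vector per vocabulary entry, an LSTM has four gates that each carry input weights, recurrent weights and a bias, and a Dense layer has one weight per input-output pair plus one bias per output. The arithmetic, written as small Python functions:

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-dimensional vector per vocabulary index
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with input weights
    # (input_dim x units), recurrent weights (units x units) and a bias (units)
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


emb = embedding_params(305, 100)
lstm = lstm_params(100, 151)
dense = dense_params(151, 305)
print(emb, lstm, dense, emb + lstm + dense)
# 30500 152208 46360 229068
```

At 4 bytes per float32 weight, 229,068 parameters occupy 916,272 bytes, which is the 894.80 KB reported (with 1 KB = 1024 bytes).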


In [92]:
model.fit(X, y, epochs=100)

Epoch 1/100
...
Epoch 77/100
...

<keras.src.callbacks.History at 0x7cfa395cda80>

In [50]:
X

array([[  0,   0,   0, ...,   0,   0,   8],
       [  0,   0,   0, ...,   0,   8,   1],
       [  0,   0,   0, ...,   8,   1,   9],
       ...,
       [  0,   0,   0, ..., 274, 275,  14],
       [  0,   0,   0, ..., 275,  14, 276],
       [  0,   0,   0, ...,  14, 276, 277]], dtype=int32)

In [None]:
text = "machine"


In [None]:
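import numpy as np

# Tokenize the seed text (the Tokenizer method is texts_to_sequences, plural)
token_text = tokenizer.texts_to_sequences([text])[0]
# Left-pad to the model's input length (max_len - 1 = 47 tokens)
padded_token_text = pad_sequences([token_text], maxlen=max_len - 1, padding='pre')
# Predict: index of the most probable next word
pos = int(np.argmax(model.predict(padded_token_text)))

# Map the index back to its word (index_word is the reverse of word_index);
# prints None if the model picks one of the unused classes 279..304
print(tokenizer.index_word.get(pos))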
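The cell above predicts a single word. Repeating the step and appending each prediction to the seed produces a longer continuation (greedy decoding). The sketch below is hypothetical: `predict_next` stands in for `model.predict` on a left-padded batch, and the padding is done in NumPy so the loop can be exercised with a toy model instead of TensorFlow. With the notebook's objects it could be called as `generate(tokenizer.texts_to_sequences([text])[0], lambda b: model.predict(b, verbose=0), 5, max_len - 1)`, and the resulting ids mapped back with `tokenizer.index_word`.

```python
import numpy as np


def generate(seed_ids, predict_next, n_words, input_len):
    """Greedy decoding: repeatedly append the most probable next token id.

    seed_ids:     list of token ids for the prompt
    predict_next: callable taking a (1, input_len) int array and returning
                  a (1, vocab_size) array of probabilities (assumed interface)
    input_len:    the model's input length; older tokens are dropped from the left
    """
    ids = list(seed_ids)
    for _ in range(n_words):
        context = ids[-input_len:]
        batch = np.zeros((1, input_len), dtype=np.int32)
        if context:
            batch[0, -len(context):] = context  # left-pad with zeros ('pre' padding)
        probs = predict_next(batch)
        ids.append(int(np.argmax(probs[0])))
    return ids


# Toy stand-in model: always predicts (last token + 1) modulo a vocabulary of 10
def toy_predict(batch):
    probs = np.zeros((1, 10))
    probs[0, (batch[0, -1] + 1) % 10] = 1.0
    return probs


print(generate([3], toy_predict, n_words=4, input_len=5))
# [3, 4, 5, 6, 7]
```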