In [74]:
faqs = """Deep learning is a subset of machine learning, which is essentially a neural network with three or more layers.
These neural networks attempt to simulate the behavior of the human brain—albeit far from matching its ability—allowing it
to “learn” from large amounts of data. While a neural network with a single layer can still make approximate predictions,
additional hidden layers can help to optimize and refine for accuracy.
Deep learning drives many artificial intelligence (AI) applications and services that improve automation,
performing analytical and physical tasks without human intervention. Deep learning technology lies behind everyday products and services
(such as digital assistants, voice-enabled TV remotes, and credit card fraud detection) as well as emerging technologies (such as self-driving cars).
If deep learning is a subset of machine learning, how do they differ? Deep learning distinguishes itself from classical machine learning by the type
of data that it works with and the methods in which it learns.
Machine learning algorithms leverage structured,
labeled data to make predictions—meaning that specific features are defined from the input data for the model and organized into tables.
This doesn’t necessarily mean that it doesn’t use unstructured data; it just means that if it does,
it generally goes through some pre-processing to organize it into a structured format.
Deep learning eliminates some of data pre-processing that is typically involved with machine learning.
These algorithms can ingest and process unstructured data, like text and images, and it automates feature extraction,
removing some of the dependency on human experts. For example, let’s say that we had a set of photos of different pets,
and we wanted to categorize by “cat”, “dog”, “hamster”, et cetera. Deep learning algorithms can determine
which features (e.g. ears) are most important to distinguish each animal from another.
In machine learning, this hierarchy of features is established manually by a human expert.
Then, through the processes of gradient descent and backpropagation, the deep learning algorithm adjusts and fits itself for accuracy,
allowing it to make predictions about a new photo of an animal with increased precision.
Machine learning and deep learning models are capable of different types of learning as well,
which are usually categorized as supervised learning,
unsupervised learning, and reinforcement learning. Supervised learning utilizes labeled datasets to categorize or make predictions;
this requires some kind of human intervention to label input data correctly.
In contrast, unsupervised learning doesn’t require labeled datasets, and instead, it detects patterns in the data,
clustering them by any distinguishing characteristics. Reinforcement learning is a process in which a model learns to
become more accurate for performing an action in an environment based on feedback in order to maximize the reward.
Deep learning neural networks, or artificial neural networks, attempts to mimic the human brain through a combination of data inputs,
weights, and bias. These elements work together to accurately recognize, classify, and describe objects within the data.
Deep neural networks consist of multiple layers of interconnected nodes, each building upon the previous layer to refine and
optimize the prediction or categorization. This progression of computations through the network is called forward propagation.
The input and output layers of a deep neural network are called visible layers.
The input layer is where the deep learning model ingests the data for processing, and the output layer is where the final prediction or classification is made.
Another process called backpropagation uses algorithms, like gradient descent, to calculate errors in predictions and then adjusts
the weights and biases of the function by moving backwards through the layers in an effort to train the model. Together,
forward propagation and backpropagation allow a neural network to make predictions and correct for any errors accordingly.
Over time, the algorithm becomes gradually more accurate.
The above describes the simplest type of deep neural network in the simplest terms. However,
deep learning algorithms are incredibly complex, and there are different types of neural networks to address specific problems or datasets. For example,
"""

In [75]:
%pip install tensorflow




In [109]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [77]:
# Create a Tokenizer instance
tokenizer = Tokenizer()

In [111]:
# Fit the Tokenizer on the FAQs text to build the vocabulary.
tokenizer.fit_on_texts([faqs])

In [79]:
len(tokenizer.word_index)

291
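For intuition, the vocabulary that `fit_on_texts` builds can be approximated in plain Python. This is only a sketch: it assumes Keras's default behaviour (lowercasing, stripping common punctuation, splitting on whitespace, ranking words by frequency, and leaving index 0 free for padding). `build_word_index` and `FILTERS` are hypothetical names, not part of the Keras API.

```python
from collections import Counter

# Punctuation stripped by this sketch (an assumption mirroring Keras's default filter set).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def build_word_index(texts):
    """Hypothetical helper: map each word to a rank, most frequent word first."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    counts = Counter()
    for text in texts:
        counts.update(text.lower().translate(table).split())
    # Indices start at 1; index 0 is left free for padding.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

toy_index = build_word_index(["Deep learning is learning.", "Learning is fun!"])
print(toy_index)
```

The most frequent word gets index 1, which is why common words such as "learning" end up with small indices in the real vocabulary.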

In [110]:

# Build training sequences: for each line of text, emit every prefix of length >= 2
# so the model can learn to predict each word from the words that precede it.
input_sequences = []
for sentence in faqs.split("\n"):
    tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]
    for i in range(1, len(tokenized_sentence)):
        input_sequences.append(tokenized_sentence[:i + 1])
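To make the prefix expansion concrete, here is a minimal sketch on a four-token toy line. `prefix_sequences` is a hypothetical helper that mirrors the inner loop above; the token ids are illustrative.

```python
def prefix_sequences(token_ids):
    """Return every prefix of length >= 2 of a tokenized line."""
    return [token_ids[:i + 1] for i in range(1, len(token_ids))]

toy_ids = [6, 2, 10, 7]
toy_prefixes = prefix_sequences(toy_ids)
for seq in toy_prefixes:
    print(seq)
```

A line of n tokens yields n - 1 training sequences; the last element of each sequence later becomes the word to predict.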


In [81]:
input_sequences

[[6, 2],
 [6, 2, 10],
 [6, 2, 10, 7],
 [6, 2, 10, 7, 49],
 [6, 2, 10, 7, 49, 4],
 [6, 2, 10, 7, 49, 4, 14],
 [6, 2, 10, 7, 49, 4, 14, 2],
 [6, 2, 10, 7, 49, 4, 14, 2, 22],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7, 11],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7, 11, 17],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7, 11, 17, 23],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7, 11, 17, 23, 97],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7, 11, 17, 23, 97, 18],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7, 11, 17, 23, 97, 18, 38],
 [6, 2, 10, 7, 49, 4, 14, 2, 22, 10, 96, 7, 11, 17, 23, 97, 18, 38, 19],
 [39, 11],
 [39, 11, 24],
 [39, 11, 24, 98],
 [39, 11, 24, 98, 5],
 [39, 11, 24, 98, 5, 99],
 [39, 11, 24, 98, 5, 99, 1],
 [39, 11, 24, 98, 5, 99, 1, 100],
 [39, 11, 24, 98, 5, 99, 1, 100, 4],
 [39, 11, 24, 98, 5, 99, 1, 100, 4, 1],
 [39, 11, 24, 98, 5, 99, 1, 100, 4, 1, 20],

In [82]:
Max_len = max(len(x) for x in input_sequences)

In [83]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Left-pad every sequence with zeros up to Max_len so that all rows have the
# same length and the word to predict always sits in the last column.

padded_input_sequences = pad_sequences(input_sequences, maxlen=Max_len, padding='pre')
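As a rough picture of what pre-padding does, the sketch below pads a few toy sequences by hand. `pad_pre` is a hypothetical stand-in, not the Keras implementation; it assumes that over-long sequences are also truncated from the left, which matches the `pad_sequences` default.

```python
def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper: left-pad (and left-truncate) each sequence to maxlen."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

toy_sequences = [[6, 2], [6, 2, 10], [6, 2, 10, 7]]
toy_padded = pad_pre(toy_sequences, maxlen=4)
for row in toy_padded:
    print(row)
```

Padding on the left keeps the most recent words next to the prediction position at the end of each row.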



In [84]:
padded_input_sequences

array([[  0,   0,   0, ...,   0,   6,   2],
       [  0,   0,   0, ...,   6,   2,  10],
       [  0,   0,   0, ...,   2,  10,   7],
       ...,
       [  0,   0,   0, ..., 291,  18,  47],
       [  0,   0,   0, ...,  18,  47,  13],
       [  0,   0,   0, ...,  47,  13,  70]], dtype=int32)

In [85]:
x = padded_input_sequences[:, :-1]  # every column except the last: the model input

In [86]:
x

array([[  0,   0,   0, ...,   0,   0,   6],
       [  0,   0,   0, ...,   0,   6,   2],
       [  0,   0,   0, ...,   6,   2,  10],
       ...,
       [  0,   0,   0, ...,  64, 291,  18],
       [  0,   0,   0, ..., 291,  18,  47],
       [  0,   0,   0, ...,  18,  47,  13]], dtype=int32)

In [88]:
y = padded_input_sequences[:, -1]  # last column: the word to predict
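The same input/target split can be sketched with plain lists. The toy rows below are illustrative, not taken from the real data.

```python
toy_rows = [[0, 0, 6, 2], [0, 6, 2, 10], [6, 2, 10, 7]]

# Inputs: all tokens but the last. Targets: the last token of each row.
toy_x = [row[:-1] for row in toy_rows]
toy_y = [row[-1] for row in toy_rows]

print(toy_x)
print(toy_y)
```

Each input row is one token shorter than the padded row, which is why `x` has 26 columns when the padded sequences have 27.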

In [89]:
y

array([  2,  10,   7,  49,   4,  14,   2,  22,  10,  96,   7,  11,  17,
        23,  97,  18,  38,  19,  11,  24,  98,   5,  99,   1, 100,   4,
         1,  20, 101, 102,  25, 103, 104, 105,   9, 106,  25, 107, 108,
         4,   8, 109,   7,  11,  17,  23,   7, 110,  31,  32, 111,  26,
       112,  27, 114,  19,  32, 115,   5,  50,   3,  51,  13,  52,   2,
       116, 117,  53, 118, 119, 120,   3,  54,  15, 121, 122, 123,   3,
       124, 125, 126,  20,  56,   6,   2, 127, 128, 129, 130, 131,   3,
        54,  21, 132, 133, 134, 135, 136, 137,   3, 138, 139, 140, 141,
        21,  58,  21, 142, 143,  57,  21, 144, 145, 146,   6,   2,  10,
         7,  49,   4,  14,   2, 147, 148, 149, 150,   6,   2, 151,  60,
        25, 152,  14,   2,  28,   1,  61,   8,  15,   9, 153,  23,   3,
         1, 154,  12,  22,   9,  62,   2,  29, 155,  63,   8,   5,  26,
       156,  15,  64,  41,  16, 157,  25,   1,  33,   8,  13,   1,  34,
         3, 158,  65, 159,  42, 160, 161,  15,   9,  42, 162,  6

In [90]:
x.shape

(621, 26)

In [91]:
y.shape

(621,)

In [92]:
from tensorflow.keras.utils import to_categorical
# One-hot encode the integer targets so they match the softmax output and the
# categorical_crossentropy loss. The class count is 291 words + 1 for padding index 0.
y = to_categorical(y, num_classes=len(tokenizer.word_index) + 1)
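One-hot encoding itself is simple enough to sketch by hand. `one_hot` below is a hypothetical helper that shows the idea on four classes; it is not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer labels into rows with a single 1.0."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

toy_one_hot = one_hot([2, 0, 3], num_classes=4)
for row in toy_one_hot:
    print(row)
```

With 621 targets and 292 classes, the encoded targets form a 621 x 292 matrix, one row per training sequence.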


In [93]:
y.shape

(621, 292)

In [94]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense


In [95]:
# Build a Sequential model for next-word prediction:
# Embedding -> LSTM -> Dense softmax over the vocabulary.
model = Sequential()
model.add(Embedding(292, 100, input_length=26))  # 292 = vocabulary size + 1, 26 = columns in x
model.add(LSTM(130))
model.add(Dense(292, activation='softmax'))

In [96]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


In [97]:
model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_4 (Embedding)     (None, 26, 100)           29200     
                                                                 
 lstm_4 (LSTM)               (None, 130)               120120    
                                                                 
 dense_4 (Dense)             (None, 292)               38252     
                                                                 
Total params: 187572 (732.70 KB)
Trainable params: 187572 (732.70 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
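The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for an embedding table, an LSTM layer (four gates, each with input weights, recurrent weights, and a bias), and a dense layer; the variable names are illustrative.

```python
vocab_size, embed_dim, lstm_units = 292, 100, 130

# Embedding: one embed_dim-sized vector per vocabulary index.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with (input + recurrent) weights plus a bias per unit.
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)

# Dense: one weight per (LSTM unit, class) pair plus one bias per class.
dense_params = lstm_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

These values agree with the `Param #` column and the total reported by `model.summary()`.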


In [98]:
model.fit(x, y, epochs=100)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.src.callbacks.History at 0x7d41ee2a6230>

In [104]:
# Words that are not in the vocabulary (here "what") are silently dropped by the tokenizer.
text = 'what is Deep'
tokn_text = tokenizer.texts_to_sequences([text])[0]

In [105]:
padded_text = pad_sequences([tokn_text], maxlen=26, padding="pre")

In [106]:
padded_text

array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0, 10,  6]], dtype=int32)

In [107]:
model.predict(padded_text)



array([[1.60626779e-09, 1.23012767e-04, 9.81046379e-01, 1.98365778e-05,
        1.11766225e-04, 2.21009777e-05, 6.41967563e-05, 1.52885987e-04,
        5.08441044e-05, 1.81788491e-05, 1.86489371e-04, 1.62703153e-02,
        7.13766494e-05, 8.40889243e-07, 6.38454512e-05, 4.93224361e-05,
        5.18480723e-04, 6.07061168e-07, 1.68418774e-05, 1.42460622e-05,
        1.26802649e-06, 1.39616468e-05, 6.60726619e-06, 2.44474122e-05,
        1.30468907e-05, 7.99750705e-06, 5.27535526e-09, 9.25887278e-09,
        1.48001959e-06, 2.25877811e-04, 4.59194125e-04, 1.31660590e-05,
        6.15831814e-05, 2.47636194e-06, 1.49608093e-06, 1.96017172e-05,
        2.39190631e-05, 6.74257365e-07, 1.93336845e-07, 5.21142740e-10,
        3.71907625e-08, 1.41736353e-04, 1.63308385e-07, 5.87127147e-09,
        3.95752477e-06, 2.69562946e-07, 1.31430058e-08, 5.61584386e-07,
        1.72433511e-06, 2.57373122e-05, 1.05779878e-10, 1.29692312e-09,
        1.15387984e-08, 3.62445256e-08, 3.39049372e-10, 2.899199

In [112]:

# Take the index with the highest predicted probability and map it back to its word.
import numpy as np
word_num = int(np.argmax(model.predict(padded_text)))
# index_word is the reverse of word_index; index 0 is padding and has no word.
print(tokenizer.index_word.get(word_num, ""))





learning
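Predicting a single word extends naturally to generating several: append the predicted word to the input and predict again. The sketch below shows this greedy loop in plain Python. `generate`, `toy_predict`, and `toy_index_word` are hypothetical; `toy_predict` stands in for the trained model so the example runs on its own.

```python
def generate(seed_ids, predict_probs, index_word, n_words, max_len):
    """Greedy decoding: repeatedly append the most probable next token.

    predict_probs is a hypothetical callable that takes a left-padded list of
    max_len token ids and returns one probability per vocabulary index.
    """
    ids = list(seed_ids)
    for _ in range(n_words):
        window = ids[-max_len:]                        # keep the most recent tokens
        padded = [0] * (max_len - len(window)) + window
        probs = predict_probs(padded)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if next_id == 0:                               # index 0 is padding, not a word
            break
        ids.append(next_id)
    return " ".join(index_word[i] for i in ids)

# Stand-in for the trained model (an assumption for illustration only):
# it always favours the token after the last one, cycling 1 -> 2 -> 3 -> 1.
def toy_predict(padded):
    probs = [0.0, 0.0, 0.0, 0.0]
    probs[padded[-1] % 3 + 1] = 1.0
    return probs

toy_index_word = {1: "deep", 2: "learning", 3: "models"}
toy_sentence = generate([1], toy_predict, toy_index_word, n_words=3, max_len=5)
print(toy_sentence)
```

With the notebook's objects, `predict_probs` could wrap `model.predict` on a one-row array and return its first row, with `max_len=26` and `tokenizer.index_word` as the lookup table. Greedy decoding always picks the single most likely word, so it tends to reproduce the training text closely on a corpus this small.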
