In [43]:
dataset = """ "Pride and Prejudice" by Jane Austen is a novel that delves into the complexities of love, societal expectations, and personal growth within the backdrop of early 19th-century England. At its core, the essence of "Pride and Prejudice" can be distilled into several key themes and character dynamics.

1. Love Triumphs Over First Impressions:
The central theme of the novel is the transformative power of love. Initially, the two main characters, Elizabeth Bennet and Mr. Fitzwilliam Darcy, are guided by their pride and prejudice. Elizabeth forms a negative opinion of Mr. Darcy based on her first impressions of his arrogance, while he is equally judgmental of her family's social standing. However, as the story unfolds, both characters undergo personal growth, learning to see beyond their initial biases and recognizing the true worth of each other. Their love story is a testament to the idea that genuine love can overcome the barriers of pride and prejudice.

2. The Social Landscape of 19th-Century England:
Austen provides a vivid portrayal of the social hierarchy and expectations of her time. The novel explores the importance of marriage in securing a woman's financial and social standing, particularly in the context of the Bennet family's modest estate. The characters' interactions with one another and their social circles are influenced by class, wealth, and reputation. The novel sheds light on the constraints and limitations imposed by societal norms, especially on women, and the consequences of going against these conventions.

3. Complex and Memorable Characters:
Austen's skill in creating multi-dimensional characters is a hallmark of "Pride and Prejudice." Elizabeth Bennet is celebrated for her intelligence, wit, and independence. She challenges the expectations placed upon women of her time by valuing personal integrity over social standing. Mr. Darcy, despite his initial haughtiness, undergoes a transformation, revealing himself to be a man of integrity and honor. The supporting cast of characters, including the Bennet family, Mr. Bingley, and Mr. Collins, each adds depth and humor to the story, contributing to the rich tapestry of personalities.

4. Satire and Wit:
Austen's writing is marked by sharp wit and satire. She uses humor and irony to critique the social norms and behaviors of her society. Through clever dialogue and commentary, she highlights the absurdities and hypocrisies of the characters and the society they inhabit. This satirical tone adds depth and entertainment to the novel, making it a joy to read.

5. The Impact of Personal Growth:
Throughout the story, several characters undergo significant personal growth. Elizabeth learns the importance of self-awareness and humility, while Mr. Darcy recognizes the value of treating others with respect and kindness. Their journeys towards self-improvement and understanding are central to the novel's message. It suggests that people can change, grow, and learn from their mistakes, ultimately leading to more meaningful and authentic connections with others.

In essence, "Pride and Prejudice" is a timeless exploration of the human condition, emphasizing the transformative nature of love and the significance of looking beyond surface impressions. It remains a beloved classic because of its enduring themes, memorable characters, and Jane Austen's skillful storytelling, making it a captivating and thought-provoking read for generations to come."""

In [44]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [45]:
tokenizer = Tokenizer()

In [46]:
tokenizer.fit_on_texts([dataset])

In [47]:
num_words = len(tokenizer.word_index)
num_words

259

In [48]:
# word_index maps every word to an integer index (1 = most frequent word)
tokenizer.word_index

{'the': 1,
 'and': 2,
 'of': 3,
 'a': 4,
 'to': 5,
 'is': 6,
 'characters': 7,
 'social': 8,
 'pride': 9,
 'prejudice': 10,
 'by': 11,
 'love': 12,
 'mr': 13,
 'their': 14,
 'her': 15,
 'novel': 16,
 'personal': 17,
 'growth': 18,
 'elizabeth': 19,
 'bennet': 20,
 'darcy': 21,
 'story': 22,
 'in': 23,
 'it': 24,
 'that': 25,
 'expectations': 26,
 'can': 27,
 'impressions': 28,
 'are': 29,
 'on': 30,
 'standing': 31,
 'with': 32,
 "austen's": 33,
 'wit': 34,
 'she': 35,
 'jane': 36,
 'austen': 37,
 'into': 38,
 'societal': 39,
 '19th': 40,
 'century': 41,
 'england': 42,
 'its': 43,
 'essence': 44,
 'be': 45,
 'several': 46,
 'themes': 47,
 'over': 48,
 'first': 49,
 'central': 50,
 'transformative': 51,
 'his': 52,
 'while': 53,
 "family's": 54,
 'undergo': 55,
 'beyond': 56,
 'initial': 57,
 'each': 58,
 'time': 59,
 'importance': 60,
 'norms': 61,
 'women': 62,
 'memorable': 63,
 'for': 64,
 'integrity': 65,
 'adds': 66,
 'depth': 67,
 'humor': 68,
 'satire': 69,
 'society': 70,
 'ma
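
The index above is ranked by word frequency. A minimal pure-Python sketch of that idea, assuming lowercasing, a punctuation filter that leaves apostrophes alone, and ties broken by first appearance; `build_word_index` and `FILTERS` are hypothetical names for illustration, not part of the Keras API:

```python
from collections import Counter

# Characters replaced by spaces in this sketch. The apostrophe is deliberately
# absent, which is why a token such as "austen's" stays in one piece.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def build_word_index(text):
    """Lowercase the text, strip filtered characters, rank words by frequency."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    words = text.lower().translate(table).split()
    counts = Counter(words)
    # sorted() is stable, so words with equal counts keep first-appearance order.
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    # Indices start at 1; 0 is left free for padding.
    return {word: i for i, word in enumerate(ranked, start=1)}


toy_index = build_word_index("The cat and the dog, and the bird. Austen's cat.")
print(toy_index)
```

Index 0 never appears in the mapping, which matters later when sequences are padded with zeros.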

In [49]:
# print the dataset line by line (split on newlines, so each "sentence" here is a whole line or paragraph)
for sentence in dataset.split("\n"):
  print(sentence)

 "Pride and Prejudice" by Jane Austen is a novel that delves into the complexities of love, societal expectations, and personal growth within the backdrop of early 19th-century England. At its core, the essence of "Pride and Prejudice" can be distilled into several key themes and character dynamics.

1. Love Triumphs Over First Impressions:
The central theme of the novel is the transformative power of love. Initially, the two main characters, Elizabeth Bennet and Mr. Fitzwilliam Darcy, are guided by their pride and prejudice. Elizabeth forms a negative opinion of Mr. Darcy based on her first impressions of his arrogance, while he is equally judgmental of her family's social standing. However, as the story unfolds, both characters undergo personal growth, learning to see beyond their initial biases and recognizing the true worth of each other. Their love story is a testament to the idea that genuine love can overcome the barriers of pride and prejudice.

2. The Social Landscape of 19th-

In [50]:
# now convert each line into a sequence of word indices
for sentence in dataset.split("\n"):
  print(tokenizer.texts_to_sequences([sentence])[0])

[9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3, 12, 39, 26, 2, 17, 18, 77, 1, 78, 3, 79, 40, 41, 42, 80, 43, 81, 1, 44, 3, 9, 2, 10, 27, 45, 82, 38, 46, 83, 47, 2, 84, 85]
[]
[86, 12, 87, 48, 49, 28]
[1, 50, 88, 3, 1, 16, 6, 1, 51, 89, 3, 12, 90, 1, 91, 92, 7, 19, 20, 2, 13, 93, 21, 29, 94, 11, 14, 9, 2, 10, 19, 95, 4, 96, 97, 3, 13, 21, 98, 30, 15, 49, 28, 3, 52, 99, 53, 100, 6, 101, 102, 3, 15, 54, 8, 31, 103, 104, 1, 22, 105, 106, 7, 55, 17, 18, 107, 5, 108, 56, 14, 57, 109, 2, 110, 1, 111, 112, 3, 58, 113, 14, 12, 22, 6, 4, 114, 5, 1, 115, 25, 116, 12, 27, 117, 1, 118, 3, 9, 2, 10]
[]
[119, 1, 8, 120, 3, 40, 41, 42]
[37, 121, 4, 122, 123, 3, 1, 8, 124, 2, 26, 3, 15, 59, 1, 16, 125, 1, 60, 3, 126, 23, 127, 4, 128, 129, 2, 8, 31, 130, 23, 1, 131, 3, 1, 20, 54, 132, 133, 1, 134, 135, 32, 136, 137, 2, 14, 8, 138, 29, 139, 11, 140, 141, 2, 142, 1, 16, 143, 144, 30, 1, 145, 2, 146, 147, 11, 39, 61, 148, 30, 62, 2, 1, 149, 3, 150, 151, 152, 153]
[]
[154, 155, 2, 63, 7]
[33, 156, 23

In [51]:
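# build training examples: every prefix (length >= 2) of every tokenized line
input_sequences = []
for sentence in dataset.split("\n"):
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  # the last token of each prefix is the word to predict from the tokens before it
  for i in range(1,len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])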
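
To make the prefix construction concrete, here is a small sketch applied to the token ids of the first heading line printed earlier (`[86, 12, 87, 48, 49, 28]`). The helper name `prefix_sequences` is an assumption for illustration only:

```python
def prefix_sequences(tokens):
    """Return every prefix of length >= 2, shortest first."""
    return [tokens[: i + 1] for i in range(1, len(tokens))]


line_tokens = [86, 12, 87, 48, 49, 28]  # ids of one tokenized line from the output above
prefixes = prefix_sequences(line_tokens)

# Each prefix is read as "context -> next word".
for p in prefixes:
    print(p[:-1], "->", p[-1])
```

A line with n tokens yields n - 1 examples, so empty lines and single-token lines contribute nothing.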

In [52]:
input_sequences

[[9, 2],
 [9, 2, 10],
 [9, 2, 10, 11],
 [9, 2, 10, 11, 36],
 [9, 2, 10, 11, 36, 37],
 [9, 2, 10, 11, 36, 37, 6],
 [9, 2, 10, 11, 36, 37, 6, 4],
 [9, 2, 10, 11, 36, 37, 6, 4, 16],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3, 12],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3, 12, 39],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3, 12, 39, 26],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3, 12, 39, 26, 2],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3, 12, 39, 26, 2, 17],
 [9, 2, 10, 11, 36, 37, 6, 4, 16, 25, 75, 38, 1, 76, 3, 12, 39, 26, 2, 17, 18],
 [9,
  2,
  10,
  11,
  36,
  37,
  6,
  4,
  16,
  25,
  75,
  38,
  1,
  76,
  3,
  12,
  39,
  26,
  2,
 

In [53]:
max_len = max(len(x) for x in input_sequences)
max_len

101

In [54]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [55]:
padded_input_sequences = pad_sequences(input_sequences, maxlen = max_len, padding = 'pre')

In [56]:
padded_input_sequences

array([[  0,   0,   0, ...,   0,   9,   2],
       [  0,   0,   0, ...,   9,   2,  10],
       [  0,   0,   0, ...,   2,  10,  11],
       ...,
       [  0,   0,   0, ...,  72,  64, 258],
       [  0,   0,   0, ...,  64, 258,   5],
       [  0,   0,   0, ..., 258,   5, 259]], dtype=int32)
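
What pre-padding does can be sketched in plain Python. This is an illustrative stand-in, not the library implementation; `pad_pre` is a hypothetical helper and it assumes `maxlen >= 1`:

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value` up to `maxlen` tokens."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # if too long, keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


print(pad_pre([[9, 2], [9, 2, 10], [9, 2, 10, 11]], maxlen=4))
```

Padding on the left keeps the most recent words, and the word to predict, at the right-hand end of every row.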

In [57]:
x = padded_input_sequences[:,:-1]
x

array([[  0,   0,   0, ...,   0,   0,   9],
       [  0,   0,   0, ...,   0,   9,   2],
       [  0,   0,   0, ...,   9,   2,  10],
       ...,
       [  0,   0,   0, ..., 257,  72,  64],
       [  0,   0,   0, ...,  72,  64, 258],
       [  0,   0,   0, ...,  64, 258,   5]], dtype=int32)

In [58]:
y = padded_input_sequences[:,-1]

In [59]:
y.shape

(516,)
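
Because every row is left-padded, the target word always sits in the last column, so one slice separates inputs from labels. A toy sketch with plain lists (the three rows are made up for illustration):

```python
rows = [
    [0, 0, 9, 2],
    [0, 9, 2, 10],
    [9, 2, 10, 11],
]

x_rows = [row[:-1] for row in rows]  # context: everything except the last token
y_vals = [row[-1] for row in rows]   # label: the last token

print(x_rows)
print(y_vals)
```

Each input row is therefore one token shorter than the padded row, which is why the model input length is `max_len - 1`.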

In [60]:
from tensorflow.keras.utils import to_categorical

In [61]:
# num_words + 1 classes: index 0 is reserved for padding, real words use 1..num_words
y = to_categorical(y, num_classes=num_words + 1)

print(y.shape)
print(y[0])

(516, 260)
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
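
The printed row has a single 1 at position 2, the index of "and". A minimal sketch of the same one-hot encoding in plain Python; `one_hot` is a hypothetical helper, not the Keras function:

```python
def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1.0 at the label's position."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


encoded = one_hot([2, 10], num_classes=260)
print(encoded[0][:5])
```

The class count is 260 rather than 259 because label values run from 0 (padding) to 259 inclusive.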


In [62]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [63]:
model = Sequential()

In [64]:
max_len

101

In [65]:
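# vocabulary size is num_words + 1 (index 0 is padding); each input holds max_len - 1 tokens
model.add(Embedding(num_words + 1, 100, input_length=max_len - 1))
model.add(LSTM(150))
model.add(Dense(num_words + 1, activation='softmax'))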

In [66]:
model.compile(loss='categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [67]:
model.summary()
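
The summary output is not reproduced in this export. As a rough cross-check, the parameter counts can be worked out by hand from the layer sizes used above, assuming default layer settings (one bias vector per layer, a standard four-gate LSTM) and a built model:

```python
vocab_size = 260   # num_words + 1
embed_dim = 100    # Embedding output size
lstm_units = 150   # LSTM hidden size

# Embedding: one embed_dim vector per vocabulary entry.
embedding_params = vocab_size * embed_dim

# LSTM: four gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)

# Dense softmax: weight matrix plus bias.
dense_params = lstm_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

Under those assumptions the three layers come to 26,000, 150,600 and 39,260 parameters, 215,860 in total.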

In [68]:
model.fit(x,y,epochs = 100)

Epoch 1/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 3s 58ms/step - accuracy: 0.0431 - loss: 5.5293
Epoch 2/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 1s 58ms/step - accuracy: 0.0595 - loss: 5.0165
Epoch 3/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 1s 69ms/step - accuracy: 0.0742 - loss: 4.9853
Epoch 4/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 1s 59ms/step - accuracy: 0.0745 - loss: 4.9660
Epoch 5/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 1s 58ms/step - accuracy: 0.0809 - loss: 4.9870
Epoch 6/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 1s 58ms/step - accuracy: 0.0840 - loss: 4.9078
Epoch 7/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 1s 58ms/step - accuracy: 0.1375 - loss: 4.8365
Epoch 8/100
17/17 ━━━━━━━━━━━━━━━━━━━━ 1s 67ms/step - accuracy: 0.1288 - loss: 4.7998
Epoch 9/100
17/17 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x2351254e510>
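
One way to read the first-epoch loss of about 5.53: a model that spreads probability evenly over all 260 classes has a categorical cross-entropy of ln(260). A short sketch of that baseline (the variable names are illustrative):

```python
import math

vocab_size = 260  # number of output classes used in this notebook

# Cross-entropy of a uniform guess: -log(1 / vocab_size) = log(vocab_size).
uniform_loss = math.log(vocab_size)
print(round(uniform_loss, 2))
```

The epoch 1 loss sits just under this value, so the network starts out close to guessing uniformly and improves from there.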

In [69]:
text = 'Pride'

# tokenize
token_text = tokenizer.texts_to_sequences([text])[0]
# pad to the input length used in training (max_len - 1 = 100)
padded_token_text = pad_sequences([token_text], maxlen=max_len - 1, padding='pre')
print(padded_token_text)

# predict
model.predict(padded_token_text)
model.predict(padded_token_text).shape

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9]]
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 193ms/step
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step


(1, 260)

In [70]:
import numpy as np

In [71]:
position = np.argmax(model.predict(padded_token_text))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step


In [72]:
for word,index in tokenizer.word_index.items():
  if index == position:
    print(word)

and
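
The loop above scans the whole vocabulary for every lookup. An inverted dictionary gives the same answer in one step; the sketch below uses only the first three entries of the index printed earlier, and `lookup` is a hypothetical helper:

```python
word_index = {"the": 1, "and": 2, "of": 3}  # first three entries of the printed index

# Invert the mapping once: index -> word.
index_word = {index: word for word, index in word_index.items()}


def lookup(position):
    """Return the word for a predicted index, or None for padding/unknown ids."""
    return index_word.get(position)


print(lookup(2))
```

The Keras `Tokenizer` also keeps an `index_word` attribute after fitting, which should serve the same purpose, though it is worth confirming against the installed version.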


In [73]:
import time

In [74]:
text = "Pride and"

for i in range(10):
  # tokenize
  token_text = tokenizer.texts_to_sequences([text])[0]
  # padding: use the same input length as in training (max_len - 1), not an arbitrary value
  padded_token_text = pad_sequences([token_text], maxlen=max_len - 1, padding='pre')
  # predict the index of the most likely next word
  pos = np.argmax(model.predict(padded_token_text))

  # map the index back to its word and append it to the running text
  for word,index in tokenizer.word_index.items():
    if index == pos:
      text = text + " " + word
      print(text)
      time.sleep(1)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 181ms/step
Pride and prejudice
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step
Pride and prejudice by
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
Pride and prejudice by jane
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step
Pride and prejudice by jane austen
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step
Pride and prejudice by jane austen is
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
Pride and prejudice by jane austen is a
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step
Pride and prejudice by jane austen is a novel
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step
Pride and prejudice by jane austen is a novel that
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step
Pride and prejudice by jane austen is a novel that delves
1/1

In [75]:
# test with a prompt whose words do not occur in the training text
text = 'My name'

for i in range(10):
  # tokenize
  token_text = tokenizer.texts_to_sequences([text])[0]
  # padding: use the same input length as in training (max_len - 1), not an arbitrary value
  padded_token_text = pad_sequences([token_text], maxlen=max_len - 1, padding='pre')
  # predict the index of the most likely next word
  pos = np.argmax(model.predict(padded_token_text))

  # map the index back to its word and append it to the running text
  for word,index in tokenizer.word_index.items():
    if index == pos:
      text = text + " " + word
      print(text)
      time.sleep(1)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step
My name love
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
My name love triumphs
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step
My name love triumphs over
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step
My name love triumphs over first
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
My name love triumphs over first impressions
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step
My name love triumphs over first impressions of
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step
My name love triumphs over first impressions of her
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
My name love triumphs over first impressions of her impressions
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
My name love triumphs over first impressions of 
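
A likely reason the continuation ignores the prompt: neither "my" nor "name" occurs in the dataset text, and a tokenizer created without an out-of-vocabulary token drops unknown words. The sketch below imitates that lookup in plain Python; `to_sequence` is a hypothetical helper and the small `word_index` holds three entries from the printed index:

```python
word_index = {"pride": 9, "and": 2, "prejudice": 10}  # entries taken from the printed index


def to_sequence(text, word_index):
    """Map known words to their ids and silently skip unknown ones."""
    return [word_index[w] for w in text.lower().split() if w in word_index]


known = to_sequence("Pride and", word_index)
unknown = to_sequence("My name", word_index)
print(known)
print(unknown)
```

With an empty sequence the padded input is all zeros at the first step, so the model is effectively continuing from no context at all rather than from "My name".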