In [2]:
text = """A black hole is a region of spacetime where gravity is so strong that nothing—no particles or even electromagnetic radiation such as light—can escape from it. This cosmic entity forms when a massive star collapses under its own gravity at the end of its life cycle. The boundary of no escape is called the event horizon. Although it has a great effect on the fate and trajectory of light that passes close to it, the event horizon itself emits no radiation. Because it reflects no light, it is incredibly difficult to observe directly.
At the center of a black hole lies a gravitational singularity, a point of infinite density where the laws of physics as we currently understand them break down. All the mass of the collapsed star is crushed into this single, infinitesimally small point. Black holes come in several sizes. Stellar-mass black holes are typically 5 to several tens of times the mass of our sun, while supermassive black holes, found at the center of most large galaxies, including our own Milky Way, can be millions or even billions of times more massive.
Despite their reputation for being inescapable, the physicist Stephen Hawking proposed that black holes are not entirely black. Due to quantum effects near the event horizon, they should theoretically emit a faint thermal radiation, now known as Hawking radiation. This process would cause the black hole to slowly lose mass and, over an immense timescale, eventually evaporate completely. In recent years, our ability to study these phenomena has taken a giant leap forward with the Event Horizon Telescope, a global network of radio telescopes that successfully captured the first-ever direct image of a black hole's shadow, providing stunning visual confirmation of these enigmatic objects."""

In [3]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [4]:
token = Tokenizer()
token.fit_on_texts([text])

In [5]:
len(token.word_index)

181

In [7]:
from re import S
for sentence in text.split('\n'):
  print(sentence)

A black hole is a region of spacetime where gravity is so strong that nothing—no particles or even electromagnetic radiation such as light—can escape from it. This cosmic entity forms when a massive star collapses under its own gravity at the end of its life cycle. The boundary of no escape is called the event horizon. Although it has a great effect on the fate and trajectory of light that passes close to it, the event horizon itself emits no radiation. Because it reflects no light, it is incredibly difficult to observe directly.
At the center of a black hole lies a gravitational singularity, a point of infinite density where the laws of physics as we currently understand them break down. All the mass of the collapsed star is crushed into this single, infinitesimally small point. Black holes come in several sizes. Stellar-mass black holes are typically 5 to several tens of times the mass of our sun, while supermassive black holes, found at the center of most large galaxies, including o

In [8]:
from re import S
for sentence in text.split('\n'):
  print(token.texts_to_sequences([sentence]))

[[3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44, 45, 22, 23, 46, 9, 47, 15, 48, 24, 49, 7, 16, 50, 51, 52, 53, 3, 25, 26, 54, 55, 27, 28, 21, 17, 1, 56, 2, 27, 57, 58, 1, 59, 2, 18, 24, 6, 60, 1, 10, 11, 61, 7, 29, 3, 62, 63, 64, 1, 65, 30, 66, 2, 31, 8, 67, 68, 5, 7, 1, 10, 11, 69, 70, 18, 9, 71, 7, 72, 18, 31, 7, 6, 73, 74, 5, 75, 76]]
[[17, 1, 32, 2, 3, 4, 14, 77, 3, 78, 79, 3, 33, 2, 80, 81, 20, 1, 82, 2, 83, 15, 84, 85, 86, 87, 88, 89, 90, 1, 12, 2, 1, 91, 26, 6, 92, 93, 16, 94, 95, 96, 33, 4, 13, 97, 34, 35, 98, 99, 12, 4, 13, 36, 100, 101, 5, 35, 102, 2, 37, 1, 12, 2, 19, 103, 104, 105, 4, 13, 106, 17, 1, 32, 2, 107, 108, 109, 110, 19, 28, 111, 112, 113, 114, 115, 22, 23, 116, 2, 37, 117, 25]]
[[118, 119, 120, 121, 122, 123, 1, 124, 125, 38, 126, 8, 4, 13, 36, 127, 128, 4, 129, 5, 130, 131, 132, 1, 10, 11, 133, 134, 135, 136, 3, 137, 138, 9, 139, 140, 15, 38, 9, 16, 141, 142, 143, 1, 4, 14, 5, 144, 145, 12, 30, 146, 147, 148, 149, 150, 151, 152, 34, 153, 154, 19, 155, 5, 1

In [9]:
input_list = []
for sentence in text.split('\n'):
  token_sentence = token.texts_to_sequences([sentence])[0]

  for i in range(1,len(token_sentence)):
    input_list.append(token_sentence[:i+1])

In [10]:
input_list

[[3, 4],
 [3, 4, 14],
 [3, 4, 14, 6],
 [3, 4, 14, 6, 3],
 [3, 4, 14, 6, 3, 40],
 [3, 4, 14, 6, 3, 40, 2],
 [3, 4, 14, 6, 3, 40, 2, 41],
 [3, 4, 14, 6, 3, 40, 2, 41, 20],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44, 45],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44, 45, 22],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44, 45, 22, 23],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44, 45, 22, 23, 46],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44, 45, 22, 23, 46, 9],
 [3, 4, 14, 6, 3, 40, 2, 41, 20, 21, 6, 42, 43, 8, 44, 45, 22, 23, 46, 9, 47],
 [3,
  4,
  14,
  6,
  3,
  40,
  2,
  41,
  20,
  21,
  6,
  42,
  43,
  8,
  44,
  45,
  22,
  23,
  46,
  9,
  47,
  15],
 [3,
 

In [11]:
max_len = max([len(x) for x in input_list])
max_len

105

In [12]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
pad_sequences(input_list,maxlen=max_len,padding='pre')

array([[  0,   0,   0, ...,   0,   3,   4],
       [  0,   0,   0, ...,   3,   4,  14],
       [  0,   0,   0, ...,   4,  14,   6],
       ...,
       [  0,   0, 118, ..., 179,   2,  39],
       [  0, 118, 119, ...,   2,  39, 180],
       [118, 119, 120, ...,  39, 180, 181]], dtype=int32)

In [13]:
padded_input = pad_sequences(input_list,maxlen=max_len,padding='pre')

In [15]:
x = padded_input[:,:-1]
y = padded_input[:,-1]
print(x)
print(y)

[[  0   0   0 ...   0   0   3]
 [  0   0   0 ...   0   3   4]
 [  0   0   0 ...   3   4  14]
 ...
 [  0   0 118 ... 178 179   2]
 [  0 118 119 ... 179   2  39]
 [118 119 120 ...   2  39 180]]
[  4  14   6   3  40   2  41  20  21   6  42  43   8  44  45  22  23  46
   9  47  15  48  24  49   7  16  50  51  52  53   3  25  26  54  55  27
  28  21  17   1  56   2  27  57  58   1  59   2  18  24   6  60   1  10
  11  61   7  29   3  62  63  64   1  65  30  66   2  31   8  67  68   5
   7   1  10  11  69  70  18   9  71   7  72  18  31   7   6  73  74   5
  75  76   1  32   2   3   4  14  77   3  78  79   3  33   2  80  81  20
   1  82   2  83  15  84  85  86  87  88  89  90   1  12   2   1  91  26
   6  92  93  16  94  95  96  33   4  13  97  34  35  98  99  12   4  13
  36 100 101   5  35 102   2  37   1  12   2  19 103 104 105   4  13 106
  17   1  32   2 107 108 109 110  19  28 111 112 113 114 115  22  23 116
   2  37 117  25 119 120 121 122 123   1 124 125  38 126   8   4  13  36
 127 

In [16]:
token.word_index

{'the': 1,
 'of': 2,
 'a': 3,
 'black': 4,
 'to': 5,
 'is': 6,
 'it': 7,
 'that': 8,
 'radiation': 9,
 'event': 10,
 'horizon': 11,
 'mass': 12,
 'holes': 13,
 'hole': 14,
 'as': 15,
 'this': 16,
 'at': 17,
 'no': 18,
 'our': 19,
 'where': 20,
 'gravity': 21,
 'or': 22,
 'even': 23,
 'escape': 24,
 'massive': 25,
 'star': 26,
 'its': 27,
 'own': 28,
 'has': 29,
 'and': 30,
 'light': 31,
 'center': 32,
 'point': 33,
 'in': 34,
 'several': 35,
 'are': 36,
 'times': 37,
 'hawking': 38,
 'these': 39,
 'region': 40,
 'spacetime': 41,
 'so': 42,
 'strong': 43,
 'nothing—no': 44,
 'particles': 45,
 'electromagnetic': 46,
 'such': 47,
 'light—can': 48,
 'from': 49,
 'cosmic': 50,
 'entity': 51,
 'forms': 52,
 'when': 53,
 'collapses': 54,
 'under': 55,
 'end': 56,
 'life': 57,
 'cycle': 58,
 'boundary': 59,
 'called': 60,
 'although': 61,
 'great': 62,
 'effect': 63,
 'on': 64,
 'fate': 65,
 'trajectory': 66,
 'passes': 67,
 'close': 68,
 'itself': 69,
 'emits': 70,
 'because': 71,
 'reflects'

In [30]:
from tensorflow.keras.utils import to_categorical
to_categorical(y,num_classes=182)

array([[[1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        ...,
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.]],

       [[1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        ...,
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.]],

       [[1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        ...,
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.]],

       ...,

       [[1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        ...,
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0.

In [31]:
y.shape

(288, 182)

In [32]:
x.shape

(288, 104)

In [40]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GRU, Embedding

In [41]:
model = Sequential()
model.add(Embedding(182,100,input_length=55))
model.add(GRU(150))
model.add(Dense(182,activation='softmax'))

In [42]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [43]:
model.summary()

In [44]:
x.shape

(288, 104)

In [45]:
y.shape

(288, 182)

In [46]:
model.fit(x,y,epochs=100)

Epoch 1/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 269ms/step - accuracy: 0.0191 - loss: 5.2022
Epoch 2/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 141ms/step - accuracy: 0.1097 - loss: 5.1715
Epoch 3/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 143ms/step - accuracy: 0.0770 - loss: 5.1242
Epoch 4/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 143ms/step - accuracy: 0.0703 - loss: 4.8902
Epoch 5/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 147ms/step - accuracy: 0.1096 - loss: 4.8485
Epoch 6/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 140ms/step - accuracy: 0.0937 - loss: 4.7820
Epoch 7/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 207ms/step - accuracy: 0.0876 - loss: 4.7217
Epoch 8/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 145ms/step - accuracy: 0.1124 - loss: 4.6105
Epoch 9/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x79154a55add0>

In [47]:
words_to_generate = 3
text_1 = "black hole"

token_text_1 =token.texts_to_sequences([text_1])[0]


In [48]:
padded_text = pad_sequences([token_text_1],maxlen=33,padding='pre')

In [49]:
model.predict(padded_text)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 262ms/step


array([[1.40467620e-08, 1.22717367e-02, 3.43349093e-05, 4.09582146e-02,
        8.94976954e-04, 1.77820995e-02, 9.03212309e-01, 1.33733696e-03,
        2.59631313e-04, 5.96857026e-05, 1.02108982e-06, 6.64075997e-05,
        3.89954628e-04, 1.01686172e-04, 9.36561089e-04, 2.03053914e-05,
        3.52861352e-05, 1.54220476e-03, 1.87797850e-04, 5.93297864e-06,
        3.78067452e-05, 5.93144323e-05, 6.48374066e-09, 5.80239089e-07,
        5.39165048e-04, 6.71874441e-05, 1.74183078e-04, 1.16858151e-04,
        1.22897845e-05, 7.14759692e-04, 9.12791147e-05, 1.66176382e-04,
        6.17507938e-03, 4.02033902e-06, 1.04747072e-04, 1.62617434e-04,
        2.22506686e-04, 3.78900336e-06, 4.03471586e-06, 1.59628371e-06,
        1.29230335e-04, 7.21835204e-06, 3.99991914e-05, 1.36058918e-06,
        1.81218212e-07, 7.93823389e-08, 6.01249099e-07, 1.93470496e-05,
        5.21685324e-05, 7.10268432e-05, 3.72860291e-06, 6.34120340e-07,
        2.30073215e-06, 9.79785000e-06, 7.36192169e-06, 2.593489

In [50]:
import numpy as np
pos = np.argmax(model.predict(padded_text))
pos

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step


np.int64(6)

In [51]:
for word, index in token.word_index.items():
  if index == pos:
    print(word)

is


In [52]:
text_2 = "A black hole is a region of"

token_text_2 =token.texts_to_sequences([text_2])[0]
padded_text_1 = pad_sequences([token_text_2],maxlen=33,padding='pre')
model.predict(padded_text_1)
pos_1 = np.argmax(model.predict(padded_text_1))

for word, index in token.word_index.items():
  if index == pos_1:
    print(word)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 81ms/step
spacetime


In [53]:
text_3 = "A black hole is a region of spacetime where gravity is so strong that nothing—no particles or even"

token_text_3 =token.texts_to_sequences([text_3])[0]
padded_text_2 = pad_sequences([token_text_3],maxlen=33,padding='pre')
model.predict(padded_text_2)
pos_2 = np.argmax(model.predict(padded_text_2))

for word, index in token.word_index.items():
  if index == pos_2:
    print(word)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 64ms/step
electromagnetic


In [61]:
text_4 = " being inescapable, the center of a black hole lies a"

token_text_4 =token.texts_to_sequences([text_4])[0]
padded_text_3 = pad_sequences([token_text_4],maxlen=33,padding='pre')
model.predict(padded_text_3)
pos_3 = np.argmax(model.predict(padded_text_3))

for word, index in token.word_index.items():
  if index == pos_3:
    print(word)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 42ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step
gravitational


In [65]:
text_predict = "A black hole is a region of spacetime where gravity is so strong that nothing—no particles or even"
words_to_generate = 5

for i in range(words_to_generate):
    tokenized_text = token.texts_to_sequences([text_predict])[0]
    padded_text = pad_sequences([tokenized_text], maxlen=max_len, padding='pre')
    predicted_index = np.argmax(model.predict(padded_text, verbose=0))

    predicted_word = ""
    for word, index in token.word_index.items():
        if index == predicted_index:
            predicted_word = word
            break
    text_predict = text_predict + " " + predicted_word

print(f"\npredicted text: '{text_predict}'")



Final predicted text: 'A black hole is a region of spacetime where gravity is so strong that nothing—no particles or even electromagnetic radiation such as light—can'
