In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [34]:
corpus = [
    "Sudhanshu's commitment to affordable education wasn't just a business strategy—it was his life's mission. Over the years, iNeuron has helped over 1.5 million students from 34+ countries, providing them with the skills they need to succeed in today's competitive job market. Many of these students, like Sudhanshu himself, came from disadvantaged backgrounds. They saw iNeuron as a lifeline—an opportunity to rise above their circumstances. By focusing on high-quality, practical, and industry-relevant content, iNeuron made advanced technologies like data science, machine learning, and artificial intelligence accessible to students who otherwise might never have had the chance to learn them.",
    
    "In 2022, iNeuron was acquired by PhysicsWallah in a deal worth ₹250 crore. While this acquisition was a significant milestone, Sudhanshu remained focused on his mission. Even after the acquisition, iNeuron continued to offer some of the most affordable and accessible tech courses in the world. This acquisition allowed iNeuron to expand its offerings, including specialized programs in deep learning, natural language processing, and computer vision, helping thousands of learners prepare for real-world projects and global job opportunities in the AI industry while maintaining the promise of affordable education for all.",
    
    "Deep learning is a branch of machine learning that focuses on neural networks with many layers, allowing models to learn complex patterns in data. It is widely used in computer vision, natural language processing, and generative models, enabling breakthroughs in image recognition, speech-to-text systems, and autonomous vehicles. By leveraging large datasets and high computational power, deep learning models like convolutional neural networks (CNNs) and transformers are pushing the boundaries of what machines can understand and generate in the context of human-like tasks.",
    
    "Natural language processing (NLP) is a field of AI that focuses on the interaction between computers and human language, enabling machines to understand, interpret, and generate text and speech. NLP includes tasks like language modeling, sentiment analysis, named entity recognition, and question answering, empowering systems to process vast amounts of unstructured data efficiently. Libraries such as spaCy, NLTK, and Hugging Face Transformers have made it easier for researchers and developers to build advanced NLP systems for real-world applications like chatbots, virtual assistants, and automated content generation tools.",
    
    "Artificial Intelligence (AI) is the future of technology, reshaping industries by automating complex tasks, optimizing processes, and creating intelligent systems that can learn and adapt. From healthcare and education to finance and transportation, AI is driving innovation, making systems smarter and more efficient. By leveraging machine learning and deep learning, AI models are able to process large-scale data, recognize patterns, and make predictions, contributing to the development of self-driving cars, personalized medicine, and intelligent tutoring systems.",
    
    "I enjoy teaching AI to students who are passionate about exploring technology and its potential to solve real-world problems. Through teaching, I get to witness the excitement of learners as they build machine learning models, implement deep learning architectures, and understand how algorithms work under the hood. It is fulfilling to guide students in developing projects like image classifiers, chatbots, and recommendation systems while nurturing a mindset that combines curiosity, critical thinking, and ethical considerations in AI development.",
    
    "Students love AI projects because they allow them to apply theoretical knowledge to practical scenarios, helping them develop a deeper understanding of concepts like supervised learning, unsupervised learning, and reinforcement learning. Working on projects involving computer vision, natural language processing, and predictive modeling provides students with hands-on experience, enhancing their problem-solving abilities and preparing them for industry-level challenges. AI projects also encourage creativity, as students explore innovative ways to address complex problems using technology.",
    
    "Learning new things is exciting, especially in the rapidly evolving field of artificial intelligence and machine learning, where new research and advancements emerge regularly. Exploring areas like generative AI, reinforcement learning, and advanced natural language processing helps learners stay updated and develop skills that are highly relevant in today's tech industry. Continuous learning fosters adaptability, which is essential for professionals looking to contribute to the development of intelligent systems that positively impact society.",
    
    "Teaching AI is rewarding as it allows educators to empower the next generation of data scientists, machine learning engineers, and AI researchers. By breaking down complex concepts into understandable modules and providing real-world examples, teachers can make learning engaging and practical for students. Educators play a crucial role in instilling ethical AI practices, ensuring that students understand the societal implications of technology and the importance of building fair, transparent, and unbiased AI systems for the benefit of all."
]


In [35]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)


In [36]:
tokenizer.index_word

{1: 'and',
 2: 'to',
 3: 'the',
 4: 'of',
 5: 'learning',
 6: 'in',
 7: 'ai',
 8: 'students',
 9: 'a',
 10: 'like',
 11: 'is',
 12: 'systems',
 13: 'for',
 14: 'language',
 15: 'that',
 16: 'ineuron',
 17: 'by',
 18: 'on',
 19: 'machine',
 20: 'them',
 21: 'as',
 22: 'data',
 23: 'world',
 24: 'deep',
 25: 'natural',
 26: 'processing',
 27: 'projects',
 28: 'models',
 29: 'they',
 30: 'industry',
 31: 'real',
 32: 'complex',
 33: 'it',
 34: 'are',
 35: 'understand',
 36: 'technology',
 37: 'affordable',
 38: 'education',
 39: 'was',
 40: 'from',
 41: 'with',
 42: 'practical',
 43: 'advanced',
 44: 'artificial',
 45: 'intelligence',
 46: 'learn',
 47: 'while',
 48: 'acquisition',
 49: 'computer',
 50: 'vision',
 51: 'learners',
 52: 'can',
 53: 'tasks',
 54: 'nlp',
 55: 'intelligent',
 56: 'development',
 57: 'teaching',
 58: 'his',
 59: 'mission',
 60: 'over',
 61: 'providing',
 62: 'skills',
 63: "today's",
 64: 'job',
 65: 'many',
 66: 'sudhanshu',
 67: 'their',
 68: 'high',
 69: 're

In [37]:
total_words = len(tokenizer.word_index) + 1

In [38]:
total_words

369

In [39]:
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)
        
        

In [40]:
input_sequences

[[114, 115],
 [114, 115, 2],
 [114, 115, 2, 37],
 [114, 115, 2, 37, 38],
 [114, 115, 2, 37, 38, 116],
 [114, 115, 2, 37, 38, 116, 117],
 [114, 115, 2, 37, 38, 116, 117, 9],
 [114, 115, 2, 37, 38, 116, 117, 9, 118],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60, 3],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60, 3, 121],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60, 3, 121, 16],
 [114,
  115,
  2,
  37,
  38,
  116,
  117,
  9,
  118,
  119,
  39,
  58,
  120,
  59,
  60,
  3,
  121,
  16,
  122],
 [114,
  115,
  2,
  37,
  38,
  116,
  117,
  9,
  118,
  119,
  39,
  58,
  120,
  59,
  6

In [41]:
max_seq_len = max(len(i) for i in input_sequence)
input_sequence = pad_sequences(input_sequence,maxlen=max_seq_len , padding='pre')

In [42]:
input_sequences

[[114, 115],
 [114, 115, 2],
 [114, 115, 2, 37],
 [114, 115, 2, 37, 38],
 [114, 115, 2, 37, 38, 116],
 [114, 115, 2, 37, 38, 116, 117],
 [114, 115, 2, 37, 38, 116, 117, 9],
 [114, 115, 2, 37, 38, 116, 117, 9, 118],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60, 3],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60, 3, 121],
 [114, 115, 2, 37, 38, 116, 117, 9, 118, 119, 39, 58, 120, 59, 60, 3, 121, 16],
 [114,
  115,
  2,
  37,
  38,
  116,
  117,
  9,
  118,
  119,
  39,
  58,
  120,
  59,
  60,
  3,
  121,
  16,
  122],
 [114,
  115,
  2,
  37,
  38,
  116,
  117,
  9,
  118,
  119,
  39,
  58,
  120,
  59,
  6

In [43]:
input_sequences_padded

array([[  0,   0,   0, ...,   0,  21,  22],
       [  0,   0,   0, ...,  21,  22,   5],
       [  0,   0,   0, ...,  22,   5,  12],
       ...,
       [  0,   0,   0, ...,   0,  20,   4],
       [  0,   0,   0, ...,  20,   4,   3],
       [  0,   0,   0, ...,   4,   3, 103]])

In [44]:
x = input_sequences_padded[:,:-1]
y = input_sequences_padded[:,-1] 

In [45]:
x

array([[ 0,  0,  0, ...,  0,  0, 21],
       [ 0,  0,  0, ...,  0, 21, 22],
       [ 0,  0,  0, ..., 21, 22,  5],
       ...,
       [ 0,  0,  0, ...,  0,  0, 20],
       [ 0,  0,  0, ...,  0, 20,  4],
       [ 0,  0,  0, ..., 20,  4,  3]])

In [46]:
y

array([ 22,   5,  12,  23,  24,  25,   1,  26,  27,   9,  13,  28,  14,
        15,   2,  29,   6,  30,  31,  15,  32,  33,  34,  10,  16,  35,
        36,  37,  38,  39,   2,  40,  17,  41,   5,  42,   7,  43,  44,
        45,  46,  47,   8,  48,  10,  49,  18,  50,  51,  16,  52,  53,
        17,  54,   6,  55,   1,  56,  57,   5,  58,  59,  60,  61,  62,
         6,   9,  63,  64,  65,   7,   1,  66,  67,  68,  69,  70,  71,
        19,   9,   1,  72,  73,  18,  74,  75,  76,  13,  14,  77,  78,
         2,  19,   6,  79,   5,  80,  81,   8,   2,  82,  12,  83,  84,
        85,  86,   7,   2,  87,  11,   3,   1,  89,   8,  90,  11,  92,
        93,   3,   1,  94,   8,   4,   3,   2,  95,  97,  20,   4,  98,
         4,  99, 100, 101,   3, 102,   4,   3, 103])

In [47]:
y  = tf.keras.utils.to_categorical(y,num_classes=total_words)

In [48]:
y

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [49]:
model = Sequential([
    Embedding(total_words,32),
    LSTM(64),
    Dense(total_words,activation='softmax')
])

In [50]:
total_words

369

In [51]:
model.compile(loss = 'categorical_crossentropy' , optimizer='adam' , metrics=['accuracy'])

In [52]:
model.fit(x,y,epochs=400,verbose=1)

Epoch 1/400


[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 13ms/step - accuracy: 0.0037 - loss: 5.9110  
Epoch 2/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 12ms/step - accuracy: 0.0491 - loss: 5.8995
Epoch 3/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 13ms/step - accuracy: 0.0402 - loss: 5.8821
Epoch 4/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 13ms/step - accuracy: 0.0654 - loss: 5.8093
Epoch 5/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 14ms/step - accuracy: 0.0372 - loss: 5.5246
Epoch 6/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 14ms/step - accuracy: 0.0341 - loss: 5.2508
Epoch 7/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 13ms/step - accuracy: 0.0328 - loss: 4.9745
Epoch 8/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 13ms/step - accuracy: 0.0302 - loss: 4.7986
Epoch 9/400
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1

<keras.src.callbacks.history.History at 0x20c49980a00>

In [53]:
tokenizer.word_index

{'and': 1,
 'to': 2,
 'the': 3,
 'of': 4,
 'learning': 5,
 'in': 6,
 'ai': 7,
 'students': 8,
 'a': 9,
 'like': 10,
 'is': 11,
 'systems': 12,
 'for': 13,
 'language': 14,
 'that': 15,
 'ineuron': 16,
 'by': 17,
 'on': 18,
 'machine': 19,
 'them': 20,
 'as': 21,
 'data': 22,
 'world': 23,
 'deep': 24,
 'natural': 25,
 'processing': 26,
 'projects': 27,
 'models': 28,
 'they': 29,
 'industry': 30,
 'real': 31,
 'complex': 32,
 'it': 33,
 'are': 34,
 'understand': 35,
 'technology': 36,
 'affordable': 37,
 'education': 38,
 'was': 39,
 'from': 40,
 'with': 41,
 'practical': 42,
 'advanced': 43,
 'artificial': 44,
 'intelligence': 45,
 'learn': 46,
 'while': 47,
 'acquisition': 48,
 'computer': 49,
 'vision': 50,
 'learners': 51,
 'can': 52,
 'tasks': 53,
 'nlp': 54,
 'intelligent': 55,
 'development': 56,
 'teaching': 57,
 'his': 58,
 'mission': 59,
 'over': 60,
 'providing': 61,
 'skills': 62,
 "today's": 63,
 'job': 64,
 'many': 65,
 'sudhanshu': 66,
 'their': 67,
 'high': 68,
 'releva

In [54]:
def predict_next_word(seed_text, num_words=5):
    for _ in range(num_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_seq_len - 1, padding='pre')
        predicted = model.predict(token_list, verbose=0)
        next_word_index = np.argmax(predicted)
        for word, index in tokenizer.word_index.items():
            if index == next_word_index:
                seed_text += ' ' + word
                break
    return seed_text


In [55]:
predict_next_word("natural language is  ")

'natural language is   the and generate students of'