# Practice project: next-word prediction with an LSTM

$$ ~Vraj Patel $$

In [18]:
faqs = """Online Python Bootcamp FAQ

Program Overview
What is the enrollment fee for Python Bootcamp 2026?
We charge a one-time fee of $149 for lifetime access to all materials and sessions.

How long does the bootcamp last?
The program spans 12 weeks with live coding sessions twice weekly, totaling 24 hours of instruction.

What topics are covered in the curriculum?
Core modules include:
Python Basics and Syntax
Data Structures & Algorithms
File I/O and Error Handling
Object-Oriented Programming
Web Scraping with BeautifulSoup
API Development with Flask
Database Integration (SQLite/PostgreSQL)
Testing with pytest
Deployment on Heroku
Real-world projects
Detailed outline: https://pythonbootcamp.io/syllabus

Does it include advanced topics like async programming?
Yes, Weeks 9-10 cover asyncio, threading, and concurrent.futures for scalable applications.

What if I miss a live coding session?
All sessions are recorded in HD and available in your private dashboard within 2 hours.

Where's the weekly schedule?
View the live calendar here: https://calendar.google.com/bootcamp-schedule

Typical session length?
Each live session runs 90 minutes: 45 min teaching + 45 min coding challenges.

Instructor language?
English with code walkthroughs; Q&A supports multiple languages via chat.

Class notifications?
Email reminders 24 hours before each session + Discord pings for enrolled students.

Suitable for complete beginners?
Yes – starts from zero programming knowledge with daily practice assignments.

Can I join after Week 1?
Yes, catch-up materials are unlocked immediately upon enrollment.

Access to previous weeks after late join?
Full lifetime access to all 12 weeks + bonus content from day one.

Assignment submission process?
Self-paced with automated testing; solutions + video explanations provided weekly.

Any capstone projects?
Yes, final Week 12: Build and deploy a full-stack web app portfolio piece.

Support contact?
Email support@pythonbootcamp.io or Discord #help channel (24/7 moderation).

Payment & Access Questions
Where do payments process?
Through our secure Stripe portal at pythonbootcamp.io/checkout.

Can I pay in installments?
Yes: 3 payments of $59 (paid monthly) or full $149 upfront (10% discount).

Subscription renewal date?
One-time payment = lifetime access. No recurring charges.

Refund policy details?
14-day money-back guarantee if unsatisfied after Week 1 completion.

International payment issues?
Contact support@pythonbootcamp.io – we accept PayPal, cards, and crypto.

After Enrollment Queries
Video access duration?
Lifetime access to all recordings, updates, and new bonus content forever.

Why no expiration on materials?
Single payment model ensures long-term value for serious learners.

Post-session doubt clearing?
Weekly office hours + private Discord channels for code reviews.

Late joiners: past content access?
Everything unlocks immediately – no catch-up fees required.

Payment failed internationally – next steps?
Reply to confirmation email or Discord #billing with transaction ID.

Certification & Career Support
Certificate eligibility?
Complete 80% of assignments + final project submission (auto-verified).

Late enrollment payment for missed weeks?
Pro-rated credits applied automatically to your dashboard.

What's included in career support?
Resume reviews (3 rounds)
Mock technical interviews (recorded)
LinkedIn profile optimization
Job board access (200+ Python roles)
Salary negotiation guides
No job guarantees – focus on skill-building
"""

In [19]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer

In [20]:
tokenizer = Tokenizer()

In [21]:
tokenizer.fit_on_texts([faqs])

In [22]:
tokenizer.word_index

{'access': 1,
 'for': 2,
 'and': 3,
 'with': 4,
 'the': 5,
 'to': 6,
 'payment': 7,
 'weeks': 8,
 'in': 9,
 'yes': 10,
 'session': 11,
 '–': 12,
 'support': 13,
 'python': 14,
 'bootcamp': 15,
 'enrollment': 16,
 'a': 17,
 'of': 18,
 'lifetime': 19,
 'all': 20,
 'live': 21,
 'weekly': 22,
 'hours': 23,
 'i': 24,
 'pythonbootcamp': 25,
 'io': 26,
 'discord': 27,
 'after': 28,
 'no': 29,
 'what': 30,
 'one': 31,
 'materials': 32,
 'sessions': 33,
 '12': 34,
 'coding': 35,
 '24': 36,
 'are': 37,
 'programming': 38,
 'on': 39,
 'email': 40,
 'week': 41,
 'late': 42,
 'full': 43,
 'content': 44,
 'or': 45,
 'program': 46,
 'fee': 47,
 'we': 48,
 'time': 49,
 '149': 50,
 'long': 51,
 'does': 52,
 'topics': 53,
 'include': 54,
 'web': 55,
 'testing': 56,
 'projects': 57,
 'https': 58,
 '10': 59,
 'if': 60,
 'recorded': 61,
 'your': 62,
 'private': 63,
 'dashboard': 64,
 'schedule': 65,
 'calendar': 66,
 'each': 67,
 '45': 68,
 'min': 69,
 'code': 70,
 'complete': 71,
 'from': 72,
 'assignment
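To make the numbering above less mysterious, here is a minimal pure-Python sketch of how a frequency-ranked word index can be built. It is a simplified stand-in, not the Keras implementation: `build_word_index` and the sample sentence are made up for illustration, and the filter string is assumed to match the Tokenizer's default. The filter only covers ASCII punctuation, which would explain why the en dash '–' survives as a token of its own in the output above.

```python
from collections import Counter

# Assumed to mirror the Tokenizer's default `filters` (ASCII punctuation only).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def build_word_index(text):
    """Hypothetical helper: lowercase, strip punctuation, rank words by frequency."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    words = text.lower().translate(table).split()
    counts = Counter(words)
    # The most frequent word gets index 1; index 0 stays free for padding.
    # sorted() is stable, so ties keep their order of first appearance.
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {word: rank for rank, word in enumerate(ranked, start=1)}

sample = "Lifetime access – no fees. Access to all sessions, access forever."
word_index = build_word_index(sample)
print(word_index)
```

Because the index starts at 1, a vocabulary of N words needs N + 1 class slots once 0 is used for padding.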

In [23]:
input_sequences = []
for sentence in faqs.split('\n'):
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  for i in range(1,len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])

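Each line of faqs is now a list of token ids, expanded into every prefix of two or more tokens. In each prefix, the last id is the word to predict and the ids before it are the context.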
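The prefix expansion can be sketched without TensorFlow. The helper name `prefix_sequences` is made up for illustration; the ids are the first four that appear in the notebook's own input_sequences output.

```python
def prefix_sequences(token_ids):
    """Return every prefix of length >= 2 of one tokenized line."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]

line = [92, 14, 15, 93]  # ids of the first line of the FAQ text
for seq in prefix_sequences(line):
    # Everything but the last id is context; the last id is the target.
    print(seq[:-1], "->", seq[-1])
```

A line with a single token produces no training example, since there is no context to predict from.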

In [24]:
input_sequences

[[92, 14],
 [92, 14, 15],
 [92, 14, 15, 93],
 [46, 94],
 [30, 95],
 [30, 95, 5],
 [30, 95, 5, 16],
 [30, 95, 5, 16, 47],
 [30, 95, 5, 16, 47, 2],
 [30, 95, 5, 16, 47, 2, 14],
 [30, 95, 5, 16, 47, 2, 14, 15],
 [30, 95, 5, 16, 47, 2, 14, 15, 96],
 [48, 97],
 [48, 97, 17],
 [48, 97, 17, 31],
 [48, 97, 17, 31, 49],
 [48, 97, 17, 31, 49, 47],
 [48, 97, 17, 31, 49, 47, 18],
 [48, 97, 17, 31, 49, 47, 18, 50],
 [48, 97, 17, 31, 49, 47, 18, 50, 2],
 [48, 97, 17, 31, 49, 47, 18, 50, 2, 19],
 [48, 97, 17, 31, 49, 47, 18, 50, 2, 19, 1],
 [48, 97, 17, 31, 49, 47, 18, 50, 2, 19, 1, 6],
 [48, 97, 17, 31, 49, 47, 18, 50, 2, 19, 1, 6, 20],
 [48, 97, 17, 31, 49, 47, 18, 50, 2, 19, 1, 6, 20, 32],
 [48, 97, 17, 31, 49, 47, 18, 50, 2, 19, 1, 6, 20, 32, 3],
 [48, 97, 17, 31, 49, 47, 18, 50, 2, 19, 1, 6, 20, 32, 3, 33],
 [98, 51],
 [98, 51, 52],
 [98, 51, 52, 5],
 [98, 51, 52, 5, 15],
 [98, 51, 52, 5, 15, 99],
 [5, 46],
 [5, 46, 100],
 [5, 46, 100, 34],
 [5, 46, 100, 34, 8],
 [5, 46, 100, 34, 8, 4],
 [5, 46,

In [25]:
# The sequences have different lengths, so we will zero-pad them to a common length.
# That common length is the length of the longest sequence.
max_len = max(len(x) for x in input_sequences)

In [44]:
max_len

16

In [26]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [27]:
padded_input_sequences = pad_sequences(input_sequences, maxlen = max_len, padding='pre')

In [29]:
padded_input_sequences[0]

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 92, 14],
      dtype=int32)
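For intuition, pre-padding can be sketched in plain Python. `pad_pre` is a hypothetical helper written for this note, not the Keras function; it assumes, as pad_sequences does by default, that sequences that are too long are cut from the front.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep the most recent tokens if too long
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

print(pad_pre([[92, 14]], maxlen=16)[0])
```

Padding on the left keeps the real tokens next to the final position, which is where the target word sits.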

In [30]:
# The last token of each padded row is the target word; the tokens before it are the input.
X = padded_input_sequences[:,:-1]
y = padded_input_sequences[:,-1]

In [33]:
X.shape

(432, 15)

In [32]:
y.shape

(432,)
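The same split on a toy example, using plain lists instead of NumPy slicing. The three rows are shortened, made-up versions of the padded sequences (width 4 instead of 16).

```python
rows = [
    [0, 0, 92, 14],
    [0, 92, 14, 15],
    [92, 14, 15, 93],
]

# Inputs: every column except the last. Targets: the last column.
X_demo = [row[:-1] for row in rows]
y_demo = [row[-1] for row in rows]

print(X_demo)
print(y_demo)
```

With the real data each row has 16 columns, so the inputs end up with 16 - 1 = 15 columns, matching the (432, 15) shape above.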

In [34]:
len(tokenizer.word_index)

315

In [36]:
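# Treat next-word prediction as multiclass classification: one class per
# vocabulary word, plus index 0, which is reserved for padding (315 + 1 = 316).

from tensorflow.keras.utils import to_categorical
y = to_categorical(y, num_classes=len(tokenizer.word_index) + 1)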

In [37]:
y.shape

(432, 316)
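A small sketch of what one-hot encoding does to a single label. `one_hot` is a made-up helper for illustration, not the Keras function. The label 34 is used because, in the notebook's output for y[213], the single 1 sits at position 34, which word_index maps to '12'.

```python
def one_hot(label, num_classes):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

vocab_size = 315 + 1  # 315 words in word_index, plus index 0 for padding
encoded = one_hot(34, vocab_size)
print(len(encoded), encoded.index(1.0))
```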

In [41]:
y[213]  # each label is now a one-hot vector of length 316

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0.

In [43]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [48]:
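vocab_size = len(tokenizer.word_index) + 1  # 315 words + index 0 reserved for padding
model = Sequential()
# X has max_len - 1 = 15 columns, because the last token of each row became the label
model.add(tf.keras.Input(shape=(max_len - 1,)))
model.add(Embedding(vocab_size, 100))
model.add(LSTM(100))
model.add(Dense(vocab_size, activation='softmax'))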

In [52]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
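A quick sanity check for categorical cross-entropy: a model that spreads its probability evenly over all 316 classes should score a loss of ln(316). The sketch below only does that arithmetic; it does not touch the model.

```python
import math

num_classes = 316  # vocabulary size plus the padding index
# Cross-entropy of a uniform guess: -ln(1 / num_classes) = ln(num_classes)
uniform_loss = math.log(num_classes)
print(round(uniform_loss, 4))
```

This comes out at roughly 5.756, very close to the 5.7553 reported for the first epoch, which suggests the untrained network starts out essentially guessing at random.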

In [53]:
model.summary()
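The summary output is not shown here, but the expected parameter counts can be worked out by hand. This sketch assumes the standard formulas for an embedding table, an LSTM layer with biases (four gates), and a dense layer with biases; the helper names are made up for illustration.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary index.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

counts = {
    "embedding": embedding_params(316, 100),
    "lstm": lstm_params(100, 100),
    "dense": dense_params(100, 316),
}
total = sum(counts.values())
print(counts, total)
```

Under these assumptions the model has about 144k trainable parameters for only 432 training rows, so it has more than enough capacity to memorize the FAQ text.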

In [54]:
model.fit(X,y,epochs=100)

Epoch 1/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.0026 - loss: 5.7553
Epoch 2/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.0219 - loss: 5.7155
Epoch 3/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0044 - loss: 5.5600
Epoch 4/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.0146 - loss: 5.4728
Epoch 5/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0185 - loss: 5.4021
Epoch 6/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0250 - loss: 5.3609
Epoch 7/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0282 - loss: 5.2276
Epoch 8/100
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0303 - loss: 5.2112
Epoch 9/100
14/14 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x7a089b6f78c0>

In [55]:
import time
text = "What is the enrollment"

for i in range(10):
  # tokenize
  token_text = tokenizer.texts_to_sequences([text])[0]
  # pad to the training input length (max_len - 1 = 15), not an arbitrary 56
  padded_token_text = pad_sequences([token_text], maxlen=max_len - 1, padding='pre')
  # predict: index of the most probable next word
  pos = int(np.argmax(model.predict(padded_token_text)))

  # map the index back to its word; index 0 is padding and has no word
  word = tokenizer.index_word.get(pos)
  if word is not None:
    text = text + " " + word
    print(text)
    time.sleep(2)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 150ms/step
What is the enrollment fee
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step
What is the enrollment fee for
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step
What is the enrollment fee for fee
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step
What is the enrollment fee for fee for
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step
What is the enrollment fee for fee for python
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step
What is the enrollment fee for fee for python bootcamp
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step
What is the enrollment fee for fee for python bootcamp faq
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step
What is the enrollment fee for fee for python bootcamp faq python
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51m