### 1. The Drawback of One-Hot Encoding (Sparse Data)
Earlier, we represented words like this:

Apple = [1, 0, 0, 0]

Orange = [0, 1, 0, 0]

This approach has two major problems:

Size: If your dictionary has 10,000 words, every single word needs a vector of 10,000 numbers, almost all of them zero. That wastes a lot of memory.

Relationship: One-hot vectors encode no relationship between "Apple" and "Orange". Every pair of distinct words is equally far apart, so to the computer "Apple" and "Orange" are just as different as "Apple" and "Sky".
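The "equally far apart" point can be checked numerically. The following is a minimal sketch with NumPy, assuming a hypothetical four-word vocabulary that extends the Apple/Orange example; the extra words are illustrative only.

```python
import numpy as np

# Hypothetical 4-word vocabulary (only "apple" and "orange" come from the notes)
vocab = ["apple", "orange", "sky", "car"]
one_hot_vectors = np.eye(len(vocab))  # row i is the one-hot vector of word i

apple, orange, sky = one_hot_vectors[0], one_hot_vectors[1], one_hot_vectors[2]

# Any two distinct one-hot vectors are the same distance apart
dist_apple_orange = np.linalg.norm(apple - orange)
dist_apple_sky = np.linalg.norm(apple - sky)
print(dist_apple_orange, dist_apple_sky)

# Their dot product is always zero, so no similarity is encoded
dot_apple_orange = float(apple @ orange)
print(dot_apple_orange)
```

Both distances come out identical, which is exactly why one-hot vectors cannot express that two fruits are more alike than a fruit and the sky.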



### 2. The Magic of Embeddings (Dense & Meaningful)
An embedding places words in a "vector space". This has three major benefits:

A. Similarity (Understanding Relationships)
In a trained embedding, words used in similar contexts end up close to each other.

"King" and "Queen" will be close together.

"Diabetes" and "Glucose" will be close together.
This lets the model learn that when a user says "Sugar", they most likely mean something closely related to "Glucose".
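"Close together" is usually measured with cosine similarity. Here is a small sketch of that idea; the three-dimensional vectors are hand-picked for illustration and are not taken from any trained model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-picked toy vectors, for illustration only
king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.75, 0.2])
apple = np.array([0.1, 0.2, 0.9])

sim_king_queen = cosine_similarity(king, queen)
sim_king_apple = cosine_similarity(king, apple)

# Related words score much higher than unrelated ones
print(sim_king_queen > sim_king_apple)
```

In a real model these vectors are learned during training rather than written by hand.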

B. Dimensions (Depth)
Just as we describe a person with a few attributes (height, weight, age), an embedding describes every word with a fixed number of learned features, typically 32 or 100.

Conceptually, one dimension might capture "fruit-ness".

Another might capture "color".
"Apple" and "Orange" would have similar values along the "fruit" dimension, so the model would treat them as belonging to the same category. In practice the learned dimensions are rarely this cleanly interpretable.

C. Saving Memory
Where one-hot encoding needs 10,000 numbers per word, an embedding does the same job with just 32 or 64 numbers (a dense vector). The model runs faster and uses less memory.
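A quick back-of-the-envelope calculation makes the saving concrete. This sketch assumes each number is stored as a 4-byte float32 and that the one-hot vectors are stored densely, one per vocabulary word; both are assumptions for the sake of the arithmetic.

```python
vocab_size = 10_000
embedding_dim = 32
bytes_per_float = 4  # assumption: float32 storage

# One dense one-hot vector of length vocab_size for every word
one_hot_bytes = vocab_size * vocab_size * bytes_per_float
# One dense embedding vector of length embedding_dim for every word
embedding_bytes = vocab_size * embedding_dim * bytes_per_float

print(one_hot_bytes / 1e6, "MB")    # 400.0 MB
print(embedding_bytes / 1e6, "MB")  # 1.28 MB
print(one_hot_bytes // embedding_bytes, "times smaller")  # 312 times smaller
```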

In [5]:
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

# 1. Sample Data
reviews = ['The movie was amazing', 'Total waste of time']

# Every word is hashed to an integer index in the range 1 to vocab_size - 1
# (1-499 here). Index 0 is never assigned to a word, so it can serve as padding.
vocab_size = 500  # Range of numbers for hashing

# 2. Convert to integer indices
# Despite its name, one_hot returns integer indices, not one-hot vectors.
# Because it hashes words, two different words can collide on the same index.
encoded_reviews = [one_hot(text, vocab_size) for text in reviews]

# 3. Padding (to make every sequence the same length)
max_length = 5
# Sentences have different lengths, but the RNN needs every input to have the same length
padded_reviews = pad_sequences(encoded_reviews, maxlen=max_length, padding='post')

print(padded_reviews)

[[ 36 241 112 170   0]
 [  7 227  16 433   0]]
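To see what happens in these two steps without TensorFlow, here is a pure-Python sketch. It assumes the hashing works as `hash(word) % (vocab_size - 1) + 1` and swaps in md5 so the result is reproducible across sessions; `hashed_index` and `pad_post` are hypothetical helper names, and the indices it prints will not match the notebook output above, which came from a different hash function.

```python
import hashlib

def hashed_index(word, vocab_size):
    """Map a word to an index in 1..vocab_size-1 using a stable md5 hash."""
    digest = int(hashlib.md5(word.lower().encode("utf-8")).hexdigest(), 16)
    return digest % (vocab_size - 1) + 1

def pad_post(seq, maxlen, value=0):
    """Keep the last maxlen items, then pad at the end with value."""
    seq = seq[-maxlen:]
    return seq + [value] * (maxlen - len(seq))

vocab_size, max_length = 500, 5
reviews = ['The movie was amazing', 'Total waste of time']

encoded = [[hashed_index(w, vocab_size) for w in text.split()] for text in reviews]
padded = [pad_post(seq, max_length) for seq in encoded]
print(padded)
```

Each four-word review becomes four indices between 1 and 499 followed by a single 0 of padding.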


#### Why not use sentence-transformers with an LSTM or RNN instead of the tensorflow.keras Embedding layer?

1. "Internal" vs "External" Embedding
Keras Embedding Layer: This lives inside the model. As you train the model, the layer learns word meanings from your specific data (movie reviews or medical terms).

Sentence Transformers: These come from outside. They are already pretrained on millions of sentences. To use them, you first convert the text to vectors and then pass those vectors to the LSTM.

2. Architectural Difference (Words vs Sentences)
What an LSTM does: An LSTM reads a sequence word by word. At every step it needs the vector of one word.

What a Sentence Transformer does: Its purpose is to produce a single vector for the entire sentence.

If you turn the whole sentence into one vector, there is no "sequence" left to feed the LSTM. At that point you don't need an LSTM at all; you can pass the vector straight to a simple Dense layer.
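The difference is easiest to see in the array shapes. This sketch uses random NumPy arrays as stand-ins for real vectors; the sentence vector size of 384 is a hypothetical value chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
max_length, embedding_dim = 5, 32
sentence_dim = 384  # hypothetical size of a sentence-level vector

# What an Embedding layer hands to an LSTM: one vector per word position
token_vectors = rng.normal(size=(max_length, embedding_dim))

# What a sentence encoder produces: one vector for the whole sentence
sentence_vector = rng.normal(size=(sentence_dim,))

print(token_vectors.shape)    # (5, 32)
print(sentence_vector.shape)  # (384,)
```

The first array has a time axis of length 5 for the LSTM to step through; the second has no time axis at all.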

In [12]:
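from tensorflow.keras import Input

model = Sequential([
    # Declare the input shape once: each sample is max_length word indices
    Input(shape=(max_length,)),
    # input_dim is your vocab_size; output_dim is the size of each word vector
    Embedding(input_dim=vocab_size, output_dim=32),
])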



2. Embedding(input_dim=vocab_size, output_dim=32, input_length=max_length)
This is the most important layer because it turns each word index into a dense vector of numbers.

input_dim=vocab_size: How many distinct word indices can occur? If you set it to 500, the model can tell apart at most 500 different indices (0-499).

output_dim=32: How many numbers describe each word? Just as a person is described by height, weight, and age, each word here gets 32 learned features (numbers). This is what lets the model pick up the connection between "Good" and "Great".

input_length=max_length: How many words each review contains after padding. Recent Keras versions deprecate this argument and take the sequence length from the input shape instead. The layer itself does not truncate anything; that happens earlier in pad_sequences. If a user writes 10 words and max_length=5, pad_sequences keeps only 5 of them, and with its default truncating='pre' those are the last 5 words, not the first 5 (pass truncating='post' to keep the first 5).
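Under the hood, an embedding layer behaves like a lookup table with one row per word index. The sketch below imitates that with a random NumPy matrix standing in for the trainable weights; the value range is chosen only to resemble the small numbers seen in the notebook output.

```python
import numpy as np

vocab_size, output_dim = 500, 32
rng = np.random.default_rng(0)

# Stand-in for the trainable weight matrix: one row of 32 numbers per index
embedding_table = rng.uniform(-0.05, 0.05, size=(vocab_size, output_dim))

padded_reviews = np.array([[36, 241, 112, 170, 0],
                           [7, 227, 16, 433, 0]])

# Looking up a batch of indices is plain row indexing
embedded = embedding_table[padded_reviews]
print(embedded.shape)  # (2, 5, 32)
```

Two reviews of five indices each become two sequences of five 32-number vectors, which is the same shape the notebook's `model.predict` call returns.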

3. LSTM(64)
This is the model's "memory cell", or "brain".

64 (Units): The layer has 64 units (neurons) whose state carries information along the sequence.

What it does: The LSTM reads the sentence from start to end. For the sentence "The movie was not good", the LSTM can remember that "not" came before "good", so the meaning is negative. A simple RNN often forgets this kind of context over longer sequences, while an LSTM retains it better.

4. Dense(1, activation='sigmoid')
This is the final "decision maker" layer.

1: We need just one final answer (whether the review is positive or negative).

activation='sigmoid': This squashes the result into the range 0 to 1.

If the result is 0.8 $\rightarrow$ there is an 80% chance that the review is positive.

If the result is 0.2 $\rightarrow$ the review is most likely negative.
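The three layers described here can be stacked as follows. This is a minimal sketch, assuming a TensorFlow installation and reusing the notebook's `vocab_size`, `max_length`, and padded indices; the model is untrained, so its outputs are not meaningful predictions.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 500
max_length = 5

model = Sequential([
    Input(shape=(max_length,)),                      # 5 word indices per review
    Embedding(input_dim=vocab_size, output_dim=32),  # -> (batch, 5, 32)
    LSTM(64),                                        # -> (batch, 64)
    Dense(1, activation='sigmoid'),                  # -> (batch, 1), value in (0, 1)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

padded_reviews = np.array([[36, 241, 112, 170, 0],
                           [7, 227, 16, 433, 0]])

probs = model.predict(padded_reviews, verbose=0)
print(probs.shape)  # (2, 1)
```

After training with `model.fit`, a common convention is to label a review positive when its output exceeds 0.5.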

In [13]:
model.compile(
    optimizer='adam', 
    loss='binary_crossentropy', 
    metrics=['accuracy']
)

In [14]:
model.summary()

In [15]:
model.predict(padded_reviews)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 220ms/step


array([[[-0.04456632, -0.00750878,  0.04759869, -0.01851822,
         -0.00873413, -0.04987632,  0.00406854, -0.03519976,
         -0.00211903,  0.00092519,  0.03283632,  0.0207273 ,
         -0.0460847 , -0.04569472, -0.02675011, -0.00558486,
          0.02653401,  0.00088926,  0.00463434, -0.01090608,
         -0.02194557, -0.01826362,  0.01472653, -0.01031782,
         -0.04495148,  0.04729522, -0.03960258,  0.00525799,
         -0.00978957,  0.00914575,  0.02304592, -0.04096989],
        [-0.04909236,  0.04885465, -0.02873277, -0.02348728,
         -0.01546708, -0.0474305 , -0.01624734,  0.01899756,
         -0.02433624, -0.00179497, -0.01582281,  0.02415912,
         -0.03897215, -0.04519848,  0.02969139,  0.04172743,
          0.00397807, -0.01042148, -0.04584536,  0.02621825,
         -0.03939726, -0.02421489, -0.03189909, -0.04971664,
          0.03967513,  0.03024052, -0.03661814, -0.01110115,
         -0.02595755,  0.03313417, -0.04156876,  0.03931458],
        [-0.02947985, 

In [16]:
padded_reviews[0]

array([ 36, 241, 112, 170,   0], dtype=int32)
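Indexing with `[0]` drops the batch axis, while slicing with `0:1` keeps it. A short NumPy sketch of the difference, using the same padded indices:

```python
import numpy as np

padded_reviews = np.array([[36, 241, 112, 170, 0],
                           [7, 227, 16, 433, 0]], dtype=np.int32)

single = padded_reviews[0]          # plain indexing removes the first axis
batch_of_one = padded_reviews[0:1]  # slicing keeps it

print(single.shape)        # (5,)
print(batch_of_one.shape)  # (1, 5)
```

A model built for input of shape (batch, max_length) expects the second form, even when the batch holds a single review.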

In [None]:
# This is the right way: slicing with 0:1 keeps the batch dimension, shape (1, 5)
prediction = model.predict(padded_reviews[0:1])
print(prediction)  # word embeddings of the first sentence, shape (1, 5, 32)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 191ms/step
[[[-0.04456632 -0.00750878  0.04759869 -0.01851822 -0.00873413
   -0.04987632  0.00406854 -0.03519976 -0.00211903  0.00092519
    0.03283632  0.0207273  -0.0460847  -0.04569472 -0.02675011
   -0.00558486  0.02653401  0.00088926  0.00463434 -0.01090608
   -0.02194557 -0.01826362  0.01472653 -0.01031782 -0.04495148
    0.04729522 -0.03960258  0.00525799 -0.00978957  0.00914575
    0.02304592 -0.04096989]
  [-0.04909236  0.04885465 -0.02873277 -0.02348728 -0.01546708
   -0.0474305  -0.01624734  0.01899756 -0.02433624 -0.00179497
   -0.01582281  0.02415912 -0.03897215 -0.04519848  0.02969139
    0.04172743  0.00397807 -0.01042148 -0.04584536  0.02621825
   -0.03939726 -0.02421489 -0.03189909 -0.04971664  0.03967513
    0.03024052 -0.03661814 -0.01110115 -0.02595755  0.03313417
   -0.04156876  0.03931458]
  [-0.02947985  0.02949066 -0.03890799  0.03475604  0.03569383
   -0.01547105 -0.01386439  0.02437616  0.03058403  