### Sarcasm Detection Using Positional Encoding, Self-Attention and Multi-Head Attention

## Project Overview

This project develops a deep learning model that detects sarcasm in a given sentence. The model is built around a Transformer encoder block, which is well suited to text classification tasks such as sarcasm detection. The project includes the following steps:

1. Data Collection: We use the headlines from the Sarcasm_Headlines_Dataset_v2.json dataset. Its rich, complex text provides a good challenge for our model.

2. Data Preprocessing: The text data is tokenized, converted into sequences, and padded to ensure uniform input lengths. The sequences are then split into training and validation sets (through the validation_split argument used during training).

3. Model Building: A Transformer-based model is constructed with:

  -> MultiHeadAttention
  -> Positional Encoding
  -> LayerNormalization
  -> Feed-Forward Network

4. Model Training: The model is trained using the prepared sequences.

5. Model Evaluation: The model is evaluated using a set of example sentences to test its ability to detect whether the given sentence is sarcastic or not.

In [None]:
# Step-1 : Import the necessary libraries
# Step-2 : Load the dataset from the JSON file
import pandas as pd
import json
with open('Sarcasm_Headlines_Dataset_v2.json', 'r') as f:
    data = [json.loads(line) for line in f]
df = pd.DataFrame(data)
df = df[['headline','is_sarcastic']]
print(df.head())

                                            headline  is_sarcastic
0  thirtysomething scientists unveil doomsday clo...             1
1  dem rep. totally nails why congress is falling...             0
2  eat your veggies: 9 deliciously different recipes             0
3  inclement weather prevents liar from getting t...             1
4  mother comes pretty close to using word 'strea...             1


### Code Explanation :

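-> import pandas as pd, import json : imports pandas for tabular data and json for parsing the file.
-> data = [json.loads(line) for line in f] : parses each line of the file as a separate JSON object.
-> df = pd.DataFrame(data) : builds a DataFrame from the parsed records.
-> df = df[['headline','is_sarcastic']] : keeps only the headline text and its label.
-> print(df.head()) : displays the first five rows to inspect the structure of the data.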
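
The loading cell applies json.loads line by line, which suggests the file is stored in JSON Lines form (one JSON object per line) rather than as a single JSON document. A minimal sketch of that parsing step, using two made-up records held in memory; the sample headlines and the `extra` field are assumptions for illustration and are not rows from the dataset:

```python
import io
import json

# Two hypothetical records in JSON Lines form: one JSON object per line.
raw = io.StringIO(
    '{"headline": "local man wins lottery", "is_sarcastic": 0, "extra": "x"}\n'
    '{"headline": "area cat declares itself mayor", "is_sarcastic": 1, "extra": "y"}\n'
)

# Parse each line independently, as the loading cell does with the real file.
records = [json.loads(line) for line in raw]

# Keep only the two fields the model needs.
rows = [(r["headline"], r["is_sarcastic"]) for r in records]
print(rows)
```

Calling json.load on such a file as a whole would fail, because the lines together do not form one valid JSON value.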

In [None]:
# Step-3 : Clean the text data
# Importing necessary libraries for text cleaning

import re

# Clean text function to remove unwanted characters and normalize
def clean_text(text):
    if isinstance(text, str):

        # Remove all characters except alphabets and space
        text = re.sub('[^a-zA-Z ]', ' ', text)

        # Remove extra spaces
        text = ' '.join(text.split())

        # Convert to lowercase
        return text.lower()
    return ""

# Apply cleaning to the headline column
df['headline'] = df['headline'].apply(clean_text)

### Code Explanation :

 TEXT CLEANING FOR CLASSIFICATION
-> import re
   -> Imports Python’s built-in Regular Expression (regex) module.
   -> Reason : Required for pattern-based text substitution and cleaning (e.g., removing non-alphabet characters).
   -> Purpose: Helps in cleaning the raw text by removing special characters, numbers, and extra whitespace

-> def clean_text(text):
   -> Defines a custom function named clean_text that takes a single argument text.
   -> Reason : Modularizes the text cleaning process, so it can be reused on multiple text fields.
   -> Purpose: To ensure all input and output text is cleaned in a consistent way before feeding it to the model.

-> if isinstance(text, str):
   -> Checks if text is a string.
   -> Reason : Prevents errors if text is NaN or another non-string data type.
   -> Purpose: Defensive programming — ensures cleaning is applied only on valid strings.

-> text = re.sub('[^a-zA-Z ]', ' ', text)
   -> Replaces everything except alphabets and spaces with a space.
   -> Reason : Removes numbers, punctuation, special characters (e.g., .,?!@).
   -> Purpose: Keeps the text simple and clean, with only words left. Note that this also splits contractions (don't becomes don t) and drops punctuation that can carry tone.

-> text = ' '.join(text.split())
   ->  Breaks the text into words (.split()), removes extra whitespace, and joins it back with single spaces.
   -> Reason : Handles multiple spaces or irregular spacing.
   -> Purpose: Ensures consistent word separation and formatting.

-> return text.lower()
   -> Converts all characters in the text to lowercase.
   -> Reason : To reduce vocabulary size. E.g., India and india should be treated the same.
   -> Purpose: Simplifies training and improves model generalization.

-> return ""
   -> If the input text is not a string, returns an empty string.
   -> Reason : Prevents the function from failing on None or other non-text inputs.
   -> Purpose: Robustness.

-> df['headline'] = df['headline'].apply(clean_text)
   -> Applies clean_text() to every row in the headline column.
   -> Reason : The headline is the model's input text; the label is_sarcastic is left unchanged.
   -> Purpose: Ensures the classifier learns from clean data.
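
To make the effect of each cleaning step concrete, the sketch below repeats clean_text as defined in the cell above and runs it on a few inputs. The sample strings are made up for illustration:

```python
import re

def clean_text(text):
    """Keep letters and spaces, collapse whitespace, and lowercase."""
    if isinstance(text, str):
        text = re.sub('[^a-zA-Z ]', ' ', text)   # digits and punctuation become spaces
        text = ' '.join(text.split())            # collapse runs of whitespace
        return text.lower()
    return ""                                    # non-string input (None, NaN, ...)

print(clean_text("Dem Rep. totally nails it: 9 ways!"))  # dem rep totally nails it ways
print(clean_text("don't   panic"))                       # don t panic
print(repr(clean_text(None)))                            # ''
```

The second call shows a side effect worth knowing about: apostrophes are replaced by spaces, so contractions are split into two tokens.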


In [None]:
# Step-4 : Tokenization and Padding
# Importing necessary libraries for tokenization and padding

import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df['headline'])

word_index = tokenizer.word_index
vocab_size = len(word_index)+1

# NOTE: len(x) on a string counts characters, not words, so this is the
# length of the longest headline measured in characters
max_sent_len = max([len(x) for x in df['headline']])
# Convert headlines to sequences
sequences = tokenizer.texts_to_sequences(df['headline'])
print('sequences', sequences[:5])
# Pad sequences to ensure uniform input size
# This is important for training neural networks
padded_sequences = pad_sequences(sequences, maxlen=max_sent_len, padding='pre')
print('padded_sequences', padded_sequences[:5])
# Convert labels to numpy array
labels = df['is_sarcastic'].values
print('labels', labels[:5])


sequences [[14808, 352, 3155, 6257, 2143, 2, 641, 1123], [7237, 1630, 732, 3156, 52, 233, 12, 1844, 984, 8, 1430, 1986, 1779], [872, 38, 10879, 14809, 618, 1478], [10880, 1533, 6258, 4519, 16, 145, 1, 147], [273, 486, 298, 923, 1, 565, 527, 3832, 6259]]
padded_sequences [[    0     0     0 ...     2   641  1123]
 [    0     0     0 ...  1430  1986  1779]
 [    0     0     0 ... 14809   618  1478]
 [    0     0     0 ...   145     1   147]
 [    0     0     0 ...   527  3832  6259]]
labels [1 0 0 1 1]


### Code Explanation :

-> import tensorflow
   -> Imports the TensorFlow library.
   -> Reason : TensorFlow is used to build and train deep learning models.
   -> Purpose: Required for using Keras layers, preprocessing tools, and models.

-> from tensorflow.keras.preprocessing.text import Tokenizer
   -> Imports the Tokenizer class from Keras.
   -> Reason : It tokenizes (converts) text into sequences of integers.
   -> Purpose: To convert words to indices so that neural networks can process them.

-> from tensorflow.keras.preprocessing.sequence import pad_sequences
   -> Purpose: This imports the pad_sequences function, which is used to ensure that all input sequences (lists of token IDs) are the same length, which is required for training most deep learning models.

-> tokenizer = Tokenizer()
-> tokenizer.fit_on_texts(df['headline'])
   -> Tokenizer(): Creates a tokenizer object. It’s used to convert text to a sequence of integers (each integer represents a word).
   -> fit_on_texts(): Goes through all the headlines and builds a vocabulary (i.e., a mapping from each word to a unique integer).

-> word_index = tokenizer.word_index
-> vocab_size = len(word_index) + 1
   -> word_index: A dictionary mapping words to their integer index.
   -> vocab_size: Number of unique words in the vocabulary, plus one.
   -> +1 is added because Keras reserves index 0 (used for padding), so the Embedding layer needs one more row than there are words.

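-> max_sent_len = max([len(x) for x in df['headline']])
   -> Purpose: Intended to find the length of the longest headline, so that all sequences can be padded to the same length.
   -> Caveat : df['headline'] holds strings, so len(x) counts characters rather than words. max_sent_len therefore ends up as the longest headline in characters (911, as the model summary shows), which is longer than any tokenized sequence. Taking max(len(s) for s in sequences) after tokenization would give the word-based length instead.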

-> sequences = tokenizer.texts_to_sequences(df['headline'])
-> print('sequences', sequences[:5])
   -> texts_to_sequences(): Converts each headline (sentence) into a list of integers where each integer represents a word based on the tokenizer’s vocabulary.
   -> Example: "the dog barked" → [1, 7, 56]

-> padded_sequences = pad_sequences(sequences, maxlen=max_sent_len, padding='pre')
-> print('padded_sequences', padded_sequences[:5])
   -> pad_sequences(): Makes all sequences the same length by padding them.
   -> maxlen=max_sent_len: Pads (or truncates) every sequence to exactly max_sent_len positions.
   -> padding='pre': Adds zeros at the beginning of shorter sequences (e.g., [0, 0, 1, 7, 56]).
   -> This is necessary because neural networks require inputs of uniform shape.

-> labels = df['is_sarcastic'].values
-> print('labels', labels[:5])
   -> df['is_sarcastic']: This is the label column (0 = not sarcastic, 1 = sarcastic).
   -> values: Converts the column to a NumPy array for training.
   -> We need this format for model training in TensorFlow/Keras.
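
The tokenization and padding steps above can be sketched in plain Python. This is a simplified stand-in written for illustration, not the Keras implementation: it assumes whitespace-separated, already-cleaned text, ranks words by frequency starting at index 1, and left-pads with zeros. The two sample sentences are made up.

```python
from collections import Counter

texts = ["the dog barked", "the cat sat on the mat"]

# Build a word index: the most frequent word gets 1; index 0 is left for padding.
counts = Counter(word for t in texts for word in t.split())
ranked = sorted(counts.items(), key=lambda kv: -kv[1])   # stable for ties
word_index = {word: i + 1 for i, (word, _) in enumerate(ranked)}
vocab_size = len(word_index) + 1

# Replace each word with its index.
sequences = [[word_index[w] for w in t.split()] for t in texts]

def pad_pre(seq, maxlen):
    """Left-pad with zeros, keeping the last maxlen tokens if seq is too long."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

max_chars = max(len(t) for t in texts)       # what len() on the strings measures
max_words = max(len(s) for s in sequences)   # length in tokens
padded = [pad_pre(s, max_words) for s in sequences]

print(sequences)             # [[1, 2, 3], [1, 4, 5, 6, 1, 7]]
print(max_chars, max_words)  # 22 6
print(padded[0])             # [0, 0, 0, 1, 2, 3]
```

The gap between max_chars and max_words is the point of the comparison: padding to the character count would add many more zeros per sequence than padding to the token count.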

In [None]:
# Step-5 : Positional Encoding
import numpy as np

def get_positional_encoding(maxlen, d_model):
    pos = np.arange(maxlen)[:, np.newaxis]
    i = np.arange(d_model)[np.newaxis, :]
    angle_rates = 1 / np.power(10000, (2 * (i//2)) / np.float32(d_model))
    angle_rads = pos * angle_rates
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    pos_encoding = angle_rads[np.newaxis, ...]
    return tf.cast(pos_encoding, dtype=tf.float32)

### Code Explanation :

-> Purpose of Positional Encoding:
   -> Transformers have no recurrence or convolution, so they need an explicit way to understand word order (position). Positional encoding injects position information into the input embeddings.

-> def get_positional_encoding(maxlen, d_model):
   -> maxlen: maximum number of tokens in a sequence (e.g., 30).
   -> d_model: embedding dimension (e.g., 64 or 128).
   -> The goal is to return a tensor of shape: (1, maxlen, d_model).

-> pos = np.arange(maxlen)[:, np.newaxis]
   -> np.arange(maxlen): creates a range like [0, 1, 2, ..., maxlen-1] → each row is a position index.
   -> [:, np.newaxis]: reshapes to a column vector with shape (maxlen, 1).

-> i = np.arange(d_model)[np.newaxis, :]
   -> Creates the list [0, 1, 2, ..., d_model-1] as a row.
   -> Shape becomes (1, d_model).
   -> Example (if d_model = 4): i = [[0, 1, 2, 3]]

-> Compute angle rates (formula from the original Transformer paper)
   -> angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
   -> This formula ensures that each dimension of the positional encoding follows a different frequency scale.
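   -> (i // 2): Pairs each even index with the odd index that follows it, so both share the same frequency.
   -> np.power(...): Raises 10000 to the exponent 2*(i//2)/d_model, so the frequencies decrease geometrically across the embedding dimensions.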

-> Multiply positions with angle rates
   -> angle_rads = pos * angle_rates
   -> Element-wise multiplication.
   -> Output shape: (maxlen, d_model)
   -> This creates the raw angle values that will be passed through sin and cos.

-> Apply sine to even indices
   -> angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
   -> 0::2 selects all even columns (0, 2, 4, ...).
   -> Applies sin() to these values.

->  Apply cosine to odd indices
    -> angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    -> 1::2 selects all odd columns (1, 3, 5, ...).
    -> Applies cos() to these values.

-> Add batch dimension and cast to Tensor
   -> pos_encoding = angle_rads[np.newaxis, ...]
   -> return tf.cast(pos_encoding, dtype=tf.float32)
   -> np.newaxis adds a batch dimension → shape becomes (1, maxlen, d_model)
   -> tf.cast(...): Converts to a TensorFlow tensor of float32, ready to use in models.

-> Final Output:
   -> A Tensor of shape (1, maxlen, d_model) containing sinusoidal positional encodings. This is added to the word embeddings before feeding into a Transformer layer.
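
As a quick check of the steps above, the following sketch repeats the same computation in NumPy only (leaving out the final tf.cast) and inspects a small case. The function name positional_encoding and the sizes maxlen=5, d_model=4 are chosen here for illustration:

```python
import numpy as np

def positional_encoding(maxlen, d_model):
    """Sinusoidal positional encoding, returned with shape (1, maxlen, d_model)."""
    pos = np.arange(maxlen)[:, np.newaxis]      # (maxlen, 1)
    i = np.arange(d_model)[np.newaxis, :]       # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / float(d_model))
    angles = pos * angle_rates                  # (maxlen, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])   # even columns
    angles[:, 1::2] = np.cos(angles[:, 1::2])   # odd columns
    return angles[np.newaxis, ...]

pe = positional_encoding(maxlen=5, d_model=4)
print(pe.shape)   # (1, 5, 4)
print(pe[0, 0])   # position 0: sin(0) = 0 and cos(0) = 1 alternate
print(pe[0, 1])   # position 1: sin(1), cos(1), sin(0.01), cos(0.01)
```

With d_model = 4 the two frequency pairs are 1 and 1/100, which is why the last two columns change much more slowly from one position to the next than the first two.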

In [None]:
# Step 6: Transformer Encoder Block
from tensorflow.keras import layers

class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential([
            layers.Dense(ff_dim, activation="relu"),
            layers.Dense(embed_dim),
        ])
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)

        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)


### Code Explanation :

-> Purpose of Transformer Encoder Block:
   -> The Transformer encoder block captures contextual relationships between words using multi-head self-attention, followed by a feed-forward network (FFN). Each part includes normalization and residual connections to help with gradient flow and training.

-> from tensorflow.keras import layers
   -> Import Keras layers to build custom components like attention, normalization, etc.

-> Class Definition and Initialization
   -> class TransformerBlock(layers.Layer):
   -> Creates a custom Keras layer that behaves like a Transformer encoder block.
-> def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
   -> embed_dim: Dimensionality of the embeddings (e.g., 64 or 128).
   -> num_heads: Number of attention heads.
   -> ff_dim: Hidden layer size in the feed-forward network.
   -> rate: Dropout rate to prevent overfitting.

-> Multi-Head Attention Layer(att)
   -> self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
   -> Creates a Multi-Head Self-Attention layer.
   -> This lets the model look at the entire sequence at once and learn relationships between words, even far apart.
   -> key_dim=embed_dim: size of each attention head's query and key vectors. Here every head uses the full embed_dim rather than embed_dim / num_heads.

-> Feed Forward Network (FFN)
        self.ffn = tf.keras.Sequential([
            layers.Dense(ff_dim, activation="relu"),
            layers.Dense(embed_dim),
        ])
 -> A small 2-layer feed-forward network:
 -> First Dense layer expands dimension (ff_dim) and applies ReLU.
 -> Second Dense layer projects it back to embed_dim.

-> Layer Normalization (helps training)
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
   -> Normalizes the output of each residual sum (one after attention, one after the FFN) to stabilize and speed up training.
   -> epsilon=1e-6: a small number to avoid division by zero.

-> Dropout (for regularization)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)
   -> Dropout helps prevent overfitting by randomly turning off some neurons during training.
   -> Applied after attention and FFN.

-> Call Method (Forward Pass)
    def call(self, inputs, training):
   -> inputs: Tensor with shape (batch_size, seq_len, embed_dim)
   -> training: Boolean flag for enabling/disabling dropout.

-> Step 1: Self-Attention
        attn_output = self.att(inputs, inputs)
   -> Applies multi-head attention to the input. Since it's self-attention, the query and value are both inputs, and the key defaults to the value.

-> Step 2: Dropout after Attention
        attn_output = self.dropout1(attn_output, training=training)
   -> Apply dropout only during training.

-> Step 3: First Residual Connection + LayerNorm
        out1 = self.layernorm1(inputs + attn_output)
   -> Residual connection: adds original inputs back to attention output.
   -> Layer normalization helps with training stability.

-> Step 4: Feed Forward Network
        ffn_output = self.ffn(out1)
   -> Passes the normalized output into a feed-forward network to further transform features.

-> Step 5: Dropout after FFN
        ffn_output = self.dropout2(ffn_output, training=training)
   -> Dropout after the FFN layer to prevent overfitting.

->  Step 6: Second Residual Connection + LayerNorm
        return self.layernorm2(out1 + ffn_output)
    -> Adds the FFN output back to out1 (which came from the attention block).
    -> Final LayerNorm keeps the block's output on a stable scale.
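
The attention, residual, and normalization steps can be illustrated with a stripped-down NumPy sketch. It is a simplification under stated assumptions: a single head, no learned query/key/value projections, no dropout, and a layer norm without the learned scale and shift, so it shows the data flow of Steps 1 and 3 rather than what layers.MultiHeadAttention computes exactly. All function names are chosen here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention with query = key = value = x."""
    d = x.shape[-1]
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(d)   # (batch, seq, seq)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ x, weights

def layer_norm(x, eps=1e-6):
    """Normalize each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))          # (batch_size, seq_len, embed_dim)

attn_out, weights = self_attention(x)
out1 = layer_norm(x + attn_out)         # residual connection, then LayerNorm

print(attn_out.shape, weights.shape, out1.shape)
```

The output keeps the input shape (batch_size, seq_len, embed_dim), which is what allows the residual addition and lets several such blocks be stacked.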

In [None]:
# Step 7: Build the full model
embed_dim = 64
num_heads = 2
ff_dim = 64

inputs = layers.Input(shape=(max_sent_len,))
embedding_layer = layers.Embedding(vocab_size, embed_dim)(inputs)
pos_encoding = get_positional_encoding(max_sent_len, embed_dim)
x = embedding_layer + pos_encoding

transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(x)

x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()


Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 911)]             0         
                                                                 
 embedding_1 (Embedding)     (None, 911, 64)           1658624   
                                                                 
 tf.__operators__.add (TFOp  (None, 911, 64)           0         
 Lambda)                                                         
                                                                 
 transformer_block (Transfo  (None, 911, 64)           41792     
 rmerBlock)                                                      
                                                                 
 global_average_pooling1d (  (None, 64)                0         
 GlobalAveragePooling1D)                                         
                                                             

### Code Explanation : 

-> embed_dim: Dimension of the word embeddings and attention vectors.
-> num_heads: Number of attention heads in the Transformer block.
-> ff_dim: Dimension of the feed-forward network inside the Transformer.

-> inputs = layers.Input(shape=(max_sent_len,))
   -> Defines the input layer of the model.
   -> Input shape is (max_sent_len,) — a sequence of integers (tokenized and padded text).

-> embedding_layer = layers.Embedding(vocab_size, embed_dim)(inputs)
   -> Creates an embedding layer that converts word indices into dense vectors of size embed_dim.
   -> vocab_size: Total number of unique tokens.
   -> embed_dim: Each word is mapped to a 64-dimensional vector.
   ->  Output shape after this: (batch_size, max_sent_len, embed_dim)

-> pos_encoding = get_positional_encoding(max_sent_len, embed_dim)
   -> Calls your previously defined function to compute positional encodings.
   -> Positional encoding adds information about word positions (since the Transformer doesn't have a built-in sense of order like RNNs or CNNs).

-> x = embedding_layer + pos_encoding
   -> Adds the positional encoding to the word embeddings.
   -> Helps the model understand the order of words in the sequence.

-> transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
   -> x = transformer_block(x)
   -> Instantiates a custom Transformer encoder block and applies it to x.
   -> This step uses multi-head attention to learn relationships between all tokens in the sequence.

-> x = layers.GlobalAveragePooling1D()(x)
   -> Averages the sequence of vectors (one vector per word) into a single vector (per example).
   -> This reduces dimensionality and makes it easier to pass into dense layers.

-> x = layers.Dropout(0.1)(x)
   -> Applies dropout to prevent overfitting.

-> x = layers.Dense(20, activation="relu")(x)
   -> A fully connected (dense) layer with 20 neurons.
   -> Uses ReLU activation to introduce non-linearity.

-> x = layers.Dropout(0.1)(x)
   -> Another dropout layer for regularization.

-> outputs = layers.Dense(1, activation="sigmoid")(x)
   -> Final output layer with 1 neuron and a sigmoid activation.
   -> Outputs a probability between 0 and 1 — perfect for binary classification (sarcastic vs. not sarcastic).

-> model = tf.keras.Model(inputs=inputs, outputs=outputs)
   -> Defines the full Keras Model with specified inputs and outputs.

-> model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
   -> Compiles the model for training.
   -> optimizer="adam": Adam, an optimizer that adapts the step size for each parameter.
   -> loss="binary_crossentropy": Used for binary classification.
   -> metrics=["accuracy"]: Tracks accuracy during training and validation.
-> model.summary()
   -> Prints a summary of the model architecture — including layer names, shapes, and parameter counts.
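
The parameter counts in the summary output can be reproduced by hand, which is a useful check on how the layers are sized. The sketch below is plain arithmetic; vocab_size is not printed anywhere in the notebook, so it is inferred here by dividing the Embedding row's count by embed_dim, and the helper dense_params is a name chosen for illustration:

```python
embed_dim, num_heads, ff_dim = 64, 2, 64
key_dim = embed_dim                  # as passed to MultiHeadAttention in TransformerBlock
vocab_size = 1658624 // embed_dim    # inferred from the Embedding row of the summary

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected projection."""
    return n_in * n_out + n_out

embedding = vocab_size * embed_dim

# Multi-head attention: query, key, and value projections plus the output projection.
qkv = 3 * dense_params(embed_dim, num_heads * key_dim)
attn_out = dense_params(num_heads * key_dim, embed_dim)

# Feed-forward network: Dense(ff_dim) followed by Dense(embed_dim).
ffn = dense_params(embed_dim, ff_dim) + dense_params(ff_dim, embed_dim)

# Two LayerNormalization layers, each with a scale and a shift per feature.
layer_norms = 2 * (2 * embed_dim)

block_total = qkv + attn_out + ffn + layer_norms
print(vocab_size, embedding, block_total)   # 25916 1658624 41792
```

The total matches the 41792 reported for transformer_block, which confirms that each of the two heads projects to the full 64 dimensions.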



In [None]:
# Step 8: Train the model (runs fast on CPU)
history = model.fit(padded_sequences, labels, epochs=3, batch_size=64, validation_split=0.2)

Epoch 1/3
Epoch 2/3
Epoch 3/3


### Code Explanation : 

#### Train the Model : 

-> history = model.fit(padded_sequences, labels, epochs=3, batch_size=64, validation_split=0.2)
   -> Purpose: Train the model using the input data and labels.

-> model.fit(...): This function trains the model.
-> padded_sequences: Input data (numerical representation of headlines, padded to same length).
-> labels: Ground truth (0 = not sarcastic, 1 = sarcastic).
-> epochs=3: Number of times the model will go through the entire dataset.
-> batch_size=64: Number of samples the model will process before updating weights.
-> validation_split=0.2: The last 20% of the samples are held out for validation (to monitor performance on data the model does not train on).
-> history: Stores training & validation accuracy/loss per epoch (used for plotting or analysis).
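
How batch_size and validation_split interact can be worked out with a little arithmetic. The sketch below assumes a hypothetical dataset of 1000 samples (the real dataset size is not shown in the notebook) and mirrors the documented Keras behaviour of taking the validation samples from the end of the arrays before any shuffling:

```python
import math

n_samples = 1000          # hypothetical dataset size, for illustration only
validation_split = 0.2
batch_size = 64

# The first 80% of the rows are used for training, the last 20% for validation.
split_at = int(math.floor(n_samples * (1.0 - validation_split)))
train_idx = range(0, split_at)
val_idx = range(split_at, n_samples)

# One weight update per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(len(train_idx) / batch_size)

print(len(train_idx), len(val_idx), steps_per_epoch)   # 800 200 13
```

Because the split is positional, a dataset that is sorted by label or source would give an unrepresentative validation set unless it is shuffled beforehand.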

In [None]:
# Step 9: Evaluate and predict
loss, acc = model.evaluate(padded_sequences, labels)
print(f"Accuracy: {acc:.2f}")

sample_texts = [
    "this is totally what I expected",
    "the food was amazing",
    "I just love being ignored",
    "What a fantastic waste of time"
]
sample_seq = tokenizer.texts_to_sequences(sample_texts)
sample_pad = pad_sequences(sample_seq, maxlen=max_sent_len, padding='post')
predictions = model.predict(sample_pad)

for text, pred in zip(sample_texts, predictions):
    print(f"{text} --> Sarcastic: {pred[0] > 0.5:.0f} (Confidence: {pred[0]:.2f})")


Accuracy: 0.52
this is totally what I expected --> Sarcastic: 0 (Confidence: 0.48)
the food was amazing --> Sarcastic: 0 (Confidence: 0.48)
I just love being ignored --> Sarcastic: 0 (Confidence: 0.48)
What a fantastic waste of time --> Sarcastic: 0 (Confidence: 0.48)


### Code Explanation : 

-> loss, acc = model.evaluate(padded_sequences, labels)
   -> Purpose: Check how well the trained model performs on the full dataset. Because this includes the samples used for training, the result is not a held-out test score.
-> model.evaluate(...): Returns the loss and accuracy on the given data.
-> padded_sequences: Input data used for evaluation.
-> labels: Ground truth labels.
-> loss: How bad the model's predictions are (lower is better).
-> acc: Accuracy, the fraction of correct predictions (0.52 in the run above).

-> print(f"Accuracy: {acc:.2f}")
   -> Purpose: Print the model's overall accuracy in a readable format (e.g., Accuracy: 0.91 for 91%).

->  Make Predictions on New Sentences
sample_texts = [
    "this is totally what I expected",
    "the food was amazing",
    "I just love being ignored",
    "What a fantastic waste of time"
]
-> Purpose: Define new sample headlines to check if the model can classify them as sarcastic or not sarcastic.
   -> These examples contain both genuine and sarcastic phrases.

-> sample_seq = tokenizer.texts_to_sequences(sample_texts)
   -> Purpose: Convert each text sample into a sequence of integers using the same tokenizer used for training.
   -> This step maps each word to its corresponding index in the vocabulary.
   -> The Tokenizer here was created without an oov_token, so words not seen during training are dropped from the sequence rather than mapped to an Out-Of-Vocabulary (OOV) index.

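-> sample_pad = pad_sequences(sample_seq, maxlen=max_sent_len, padding='post')
   -> Purpose: Pad sequences so all input samples have the same length (max_sent_len) as during training.
   -> padding='post': Adds zeros after the sentence if it’s shorter than the max length.
   -> Caveat : The training data was padded with padding='pre', so these samples are laid out differently from what the model saw during training. Using padding='pre' here as well would keep the two consistent.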

-> predictions = model.predict(sample_pad)
   -> Purpose: Use the trained model to predict sarcasm probabilities for the sample inputs.
   -> Returns an array of probabilities between 0 and 1.
   -> Closer to 1 → likely sarcastic.
   -> Closer to 0 → likely not sarcastic.

->  Display Predictions Clearly
for text, pred in zip(sample_texts, predictions):
    print(f"{text} --> Sarcastic: {pred[0] > 0.5:.0f} (Confidence: {pred[0]:.2f})")
-> Purpose: Loop through each sample and print whether the model classifies it as sarcastic or not.
   -> pred[0] > 0.5: If prediction > 0.5, we consider it sarcastic (1), else not (0).
   -> :.0f: Formats the sarcastic label as 0 or 1.
   -> :.2f: Formats confidence score to 2 decimal places.
-> Illustrative output format (the values here are made up; the actual run is shown above):
this is totally what I expected --> Sarcastic: 1 (Confidence: 0.87)
the food was amazing --> Sarcastic: 0 (Confidence: 0.10)
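
Two details of the prediction step are easy to miss: which side the zeros go on, and what happens to unseen words. The sketch below illustrates both in plain Python. The helper pad and the tiny word_index are hypothetical stand-ins written for this example, not the Keras objects used in the notebook:

```python
def pad(seq, maxlen, padding):
    """Zero-pad a short sequence on the left ('pre') or on the right ('post')."""
    zeros = [0] * (maxlen - len(seq))
    return zeros + seq if padding == 'pre' else seq + zeros

# Hypothetical word index; "ignored" is deliberately missing from it.
word_index = {"i": 1, "just": 2, "love": 3, "being": 4}

words = "I just love being ignored".lower().split()
# Without an OOV index, words outside the vocabulary are simply skipped.
seq = [word_index[w] for w in words if w in word_index]

pre = pad(seq, 8, 'pre')
post = pad(seq, 8, 'post')
print(seq)    # [1, 2, 3, 4]
print(pre)    # [0, 0, 0, 0, 1, 2, 3, 4]
print(post)   # [1, 2, 3, 4, 0, 0, 0, 0]

# The :.0f format turns the boolean threshold test into 0 or 1.
label = f"{(0.48 > 0.5):.0f}"
print(label)  # 0
```

The same tokens end up at different positions under the two padding modes, so a model trained on one layout sees unfamiliar positions when given the other.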


In [8]:
lstm_model = tf.keras.Sequential([
    layers.Embedding(vocab_size, 64, input_length=max_sent_len),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
lstm_model.fit(padded_sequences, labels, epochs=5, validation_split=0.2)



Epoch 1/5


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x1b3cc075330>

### Code Explanation : 

-> Model Definition
-> lstm_model = tf.keras.Sequential([
   ->  Purpose: Start building a sequential model, where layers are stacked one after the other in a linear fashion.

-> layers.Embedding(vocab_size, 64, input_length=max_sent_len),
   -> Purpose: This layer converts word indices into dense vectors (embeddings).
   -> vocab_size: Total number of unique words in the dataset (size of vocabulary).
   -> 64: The dimension of the embedding for each word (each word becomes a 64-length vector).
   -> input_length=max_sent_len: Specifies that each input sequence has max_sent_len tokens (padded if shorter).
   -> Reason : Neural networks can't operate directly on word strings. This turns each word index into a vector that captures semantic meaning.

-> layers.Bidirectional(layers.LSTM(64)),
   -> Purpose: This adds a Bidirectional LSTM layer with 64 units.
   -> LSTM(64): A standard Long Short-Term Memory (LSTM) unit with 64 memory cells.
   -> Bidirectional(...): Runs the LSTM both forward and backward, capturing information from past and future context.
   -> Reason : Bidirectional LSTMs are powerful for understanding text, especially when word order and context matter.
-> layers.Dense(64, activation='relu'),
   ->  Purpose: A fully connected (Dense) layer with 64 neurons and ReLU activation.
   -> activation='relu': Introduces non-linearity, helping the model learn complex relationships.
   -> Reason : After LSTM extracts temporal features, this layer helps in further learning patterns before the final prediction.

-> layers.Dense(1, activation='sigmoid')])
   -> Purpose: The final output layer.
   -> 1: Outputting a single number between 0 and 1.
   -> activation='sigmoid': Used for binary classification (sarcastic or not).
   -> Reason : The sigmoid output can be interpreted as a probability of the input being sarcastic.

->  Model Compilation
    -> lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    -> Purpose: Configure the model for training.
    -> optimizer='adam': Efficient optimization algorithm for faster convergence.
    -> loss='binary_crossentropy': Appropriate loss function for binary classification tasks.
    -> metrics=['accuracy']: Track the model's accuracy during training and evaluation.

-> Model Training
   -> lstm_model.fit(padded_sequences, labels, epochs=5, validation_split=0.2)
   -> Purpose: Train the model using your data.
   -> padded_sequences: Input data (tokenized and padded headlines).
   -> labels: Corresponding sarcasm labels (0 or 1).
   -> epochs=5: Train for 5 full passes through the data.
   -> validation_split=0.2: Use 20% of the data to validate model performance after each epoch.
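
For a sense of scale, the size of this model's layers (apart from the Embedding, which depends on vocab_size) can be estimated by hand. The notebook does not print a summary for lstm_model, so the figures below are a calculation from the standard LSTM and Dense parameter formulas rather than recorded output; the helper names are chosen for illustration:

```python
embed_dim, units = 64, 64

def lstm_params(n_in, n_units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((n_in + n_units) * n_units + n_units)

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

one_direction = lstm_params(embed_dim, units)
bidirectional = 2 * one_direction          # forward and backward LSTMs
bilstm_out = 2 * units                     # the two final states are concatenated

dense_hidden = dense_params(bilstm_out, 64)
dense_output = dense_params(64, 1)

print(bidirectional, dense_hidden, dense_output)   # 66048 8256 65
```

The Bidirectional wrapper doubles both the parameter count and the output width, which is why the following Dense layer receives 128 features rather than 64.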



In [9]:
loss, acc = lstm_model.evaluate(padded_sequences, labels)
print(f"Accuracy: {acc:.2f}")

sample_texts = [
    "this is totally what I expected",
    "the food was amazing",
    "I just love being ignored",
    "What a fantastic waste of time"
]
sample_seq1 = tokenizer.texts_to_sequences(sample_texts)
sample_pad1 = pad_sequences(sample_seq1, maxlen=max_sent_len, padding='post')
predictions = lstm_model.predict(sample_pad1)

for text, pred in zip(sample_texts, predictions):
    print(f"{text} --> Sarcastic: {pred[0] > 0.5:.0f} (Confidence: {pred[0]:.2f})")

Accuracy: 0.96
this is totally what I expected --> Sarcastic: 0 (Confidence: 0.06)
the food was amazing --> Sarcastic: 0 (Confidence: 0.06)
I just love being ignored --> Sarcastic: 0 (Confidence: 0.06)
What a fantastic waste of time --> Sarcastic: 0 (Confidence: 0.06)


### Code Explanation :

-> Model Evaluation
   -> loss, acc = lstm_model.evaluate(padded_sequences, labels)
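   -> Purpose: Evaluate the model’s performance on the entire dataset. This includes the samples used for training, so the accuracy reported is not a held-out test score.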
   -> padded_sequences: The input headlines (already tokenized and padded).
   -> labels: The corresponding binary labels (0 = not sarcastic, 1 = sarcastic).
   -> evaluate(...) returns:
   -> loss: How far the predictions are from the actual labels.
   -> acc: Accuracy of the model.

-> print(f"Accuracy: {acc:.2f}")
   -> Purpose: Prints the accuracy as a fraction between 0 and 1, rounded to 2 decimal places.
   -> Example: Accuracy: 0.91 means 91% correct predictions.

-> Prediction on Sample Inputs
sample_texts = [
    "this is totally what I expected",
    "the food was amazing",
    "I just love being ignored",
    "What a fantastic waste of time"
]

 -> Purpose: These are custom sentences you want to test for sarcasm.
 -> A mix of clearly sarcastic and sincere phrases.

-> sample_seq1 = tokenizer.texts_to_sequences(sample_texts)
  -> Purpose: Converts the raw text into sequences of word indices using the same tokenizer you trained on.
  -> "this is totally what I expected" → [42, 7, 58, 13, 31, 267] (for example)
  -> Reason : Neural networks don’t work directly with text; they need numerical input.

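-> sample_pad1 = pad_sequences(sample_seq1, maxlen=max_sent_len, padding='post')
   -> Purpose: Pads/truncates each sequence so they all have the same length as your training input (max_sent_len).
   -> padding='post': Adds zeros after the actual word indices if the sentence is too short.
   -> Reason : Models expect a fixed-size input for each example.
   -> Caveat : The training data was padded with padding='pre'. Padding the samples on the other side changes where the words sit in the sequence, which may contribute to all four samples receiving the same score in the run above.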

-> predictions = lstm_model.predict(sample_pad1)
   -> Purpose: Uses the trained model to predict sarcasm for each padded input.
   -> Returns a list of values between 0 and 1 for each input sentence.
   -> Example: [0.91], [0.02], [0.88], [0.95]
   -> The closer to 1 → more sarcastic. Closer to 0 → more sincere.

-> Print Predictions
for text, pred in zip(sample_texts, predictions):
    print(f"{text} --> Sarcastic: {pred[0] > 0.5:.0f} (Confidence: {pred[0]:.2f})")
  -> Purpose: Nicely formats and displays the prediction for each input sentence.
  -> pred[0] > 0.5: Checks if the prediction is above 0.5 (sarcastic if True).
  -> :.0f: Formats the boolean as 0 (no) or 1 (yes).
  -> :.2f: Shows the prediction confidence up to two decimals.
-> Illustrative output format (the values here are made up; the actual run is shown above):
this is totally what I expected --> Sarcastic: 1 (Confidence: 0.91)
the food was amazing --> Sarcastic: 0 (Confidence: 0.08)
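
The relationship between the predicted probabilities, the 0.5 threshold, and the accuracy figure can be shown with a few lines of plain Python. The probabilities and labels below are hypothetical values chosen for illustration, not model output:

```python
probs = [0.91, 0.08, 0.62, 0.40]   # hypothetical sigmoid outputs
labels = [1, 0, 0, 1]              # hypothetical ground-truth labels

# Apply the 0.5 threshold: 1 = sarcastic, 0 = not sarcastic.
preds = [int(p > 0.5) for p in probs]

# Accuracy is the fraction of predictions that match the labels.
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(preds)                          # [1, 0, 1, 0]
print(f"Accuracy: {accuracy:.2f}")    # Accuracy: 0.50
```

Accuracy only counts which side of the threshold each score falls on, so it says nothing about how confident the model was in each prediction.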
