# Text Summarization with Seq2Seq + Attention

In this notebook, we’ll build a **sequence-to-sequence model with attention** for abstractive text summarization.

- **Input:** A long text (e.g., news article)
- **Output:** A shorter version capturing the main points

We’ll use a small dataset to demonstrate the approach.

In [None]:
import numpy as np
import tensorflow as tf
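# Note: these are legacy Keras 2 text utilities; Tokenizer is deprecated and may be
# unavailable under Keras 3 (the default tf.keras from TF 2.16 on)
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences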
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding

print("TensorFlow version:", tf.__version__)

## Sample Dataset
We’ll use a **toy dataset** for demonstration.

In practice, you would use datasets like **CNN/DailyMail**, **Gigaword**, or **SAMSum**.

In [None]:
texts = [
    "The stock market crashed yesterday due to unexpected economic news.",
    "The football team won the championship after a thrilling match.",
    "Scientists discovered a new planet that might support life."
]

summaries = [
    "Stock market crash",
    "Team wins championship",
    "New planet discovered"
]

## Preprocessing
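- Tokenize the texts and the summaries with separate tokenizers (index 0 is never assigned to a word, so it can mark padding)
- Pad or truncate every sequence to a fixed length, adding zeros at the end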
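
To make these two steps concrete, here is a minimal pure-Python sketch of roughly what `Tokenizer` followed by `pad_sequences(..., padding='post')` produce. It is a simplification, not the Keras implementation: it lowercases, keeps only letters, digits, and apostrophes, ranks words by frequency (ties keep first-appearance order), and ignores the `num_words` cap. The helpers `build_vocab` and `encode_and_pad` are hypothetical names.

```python
import re
from collections import Counter

def build_vocab(texts):
    """Map each word to an index by descending frequency, starting at 1.

    Index 0 is deliberately left unused so that it can mean "padding".
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    # Counter keeps insertion order and sorted() is stable (also with
    # reverse=True), so equally frequent words keep first-appearance order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i + 1 for i, word in enumerate(ranked)}

def encode_and_pad(texts, vocab, maxlen):
    """Convert texts to index lists, then truncate and pad with zeros at the end."""
    rows = []
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        ids = [vocab[w] for w in words if w in vocab][:maxlen]  # truncate at the end
        rows.append(ids + [0] * (maxlen - len(ids)))              # pad at the end
    return rows

sample = ["Stock market crash", "Team wins championship", "New planet discovered"]
vocab = build_vocab(sample)
print(encode_and_pad(sample, vocab, maxlen=5))
# [[1, 2, 3, 0, 0], [4, 5, 6, 0, 0], [7, 8, 9, 0, 0]]
```

Because index 0 never appears in the vocabulary, `Embedding(..., mask_zero=True)` can safely treat every 0 as padding and skip it.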

In [None]:
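num_words = 1000       # vocabulary cap (Tokenizer keeps the num_words - 1 most frequent words); also the Embedding input size
max_text_len = 20      # tokens kept per input text
max_summary_len = 5    # tokens kept per summary

# Source texts get their own vocabulary
tokenizer_text = Tokenizer(num_words=num_words)
tokenizer_text.fit_on_texts(texts)
X = tokenizer_text.texts_to_sequences(texts)
# Pad and truncate at the end (the default truncating='pre' would drop the start of long articles)
X = pad_sequences(X, maxlen=max_text_len, padding='post', truncating='post')

# Summaries get a separate vocabulary
tokenizer_summary = Tokenizer(num_words=num_words)
tokenizer_summary.fit_on_texts(summaries)
y = tokenizer_summary.texts_to_sequences(summaries)
y = pad_sequences(y, maxlen=max_summary_len, padding='post', truncating='post')

print("Text shape:", X.shape)      # (num_examples, max_text_len)
print("Summary shape:", y.shape)   # (num_examples, max_summary_len)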

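## Seq2Seq Model with Attention
- Encoder: LSTM that reads the input text and returns a hidden state for every position
- Decoder: LSTM, initialized with the encoder's final states, that generates the summary
- Attention: at each decoder step, dot-product attention weights the encoder states by their similarity to the decoder state and sums them into a context vector
- Dense layer with softmax for word prediction, applied to the decoder state concatenated with the context vector

In [None]:
from tensorflow.keras.layers import Attention, Concatenate

embedding_dim = 50
latent_dim = 100

# Encoder: return every hidden state (attended over by the decoder)
# plus the final hidden/cell states (used to initialize the decoder)
encoder_inputs = Input(shape=(max_text_len,))
enc_emb = Embedding(num_words, embedding_dim, mask_zero=True)(encoder_inputs)
encoder_outputs, state_h, state_c = LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]

# Decoder: trained with teacher forcing, starting from the encoder's final states
decoder_inputs = Input(shape=(max_summary_len,))
dec_emb = Embedding(num_words, embedding_dim, mask_zero=True)(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

# Attention: each decoder step (query) attends over all encoder steps (value);
# padded encoder positions are excluded via the mask propagated from Embedding
context = Attention()([decoder_outputs, encoder_outputs])
decoder_concat = Concatenate(axis=-1)([decoder_outputs, context])

# Project [decoder state; context] to a distribution over the summary vocabulary
decoder_dense = Dense(num_words, activation='softmax')
decoder_outputs = decoder_dense(decoder_concat)

# Model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()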
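
For intuition, the sketch below computes dot-product (Luong-style) attention, the scoring `tf.keras.layers.Attention` uses by default, for a single decoder step in plain Python. The vectors are made-up toy values and `dot_product_attention` is a hypothetical helper; the Keras layer performs the same computation batched over all decoder steps, and uses the value tensor as keys when no separate key is passed.

```python
import math

def dot_product_attention(query, keys, values, mask=None):
    """Attend from one decoder state (query) over encoder states (keys/values).

    score_j = query . keys[j]
    weights = softmax(scores), with masked positions forced to weight 0
    context = sum_j weights[j] * values[j]
    Assumes at least one position is unmasked.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if mask is not None:
        # exp(-inf) == 0.0, so masked positions receive zero weight
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Toy example: three encoder states (the last one is padding) and one decoder state
toy_encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
toy_decoder_state = [2.0, 0.0]
toy_weights, toy_context = dot_product_attention(
    toy_decoder_state, toy_encoder_states, toy_encoder_states,
    mask=[True, True, False])
print([round(w, 3) for w in toy_weights])  # [0.881, 0.119, 0.0]
print([round(c, 3) for c in toy_context])  # [0.881, 0.119]
```

The decoder state is most similar to the first encoder state, so that state dominates the context vector, while the padded position contributes nothing. In Keras, `tf.keras.layers.Attention` applies the same exclusion when it receives a padding mask such as the one produced by `Embedding(mask_zero=True)`.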

## Train the Model
(On a real corpus such as CNN/DailyMail, training takes hours. Here we run only a few epochs on the toy data to show the setup.)

In [None]:
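# Teacher forcing: feed the summary shifted right by one step, led by a start token, and predict the unshifted summary
# (assumption: index num_words - 1 is unused by the summary vocabulary and serves as the start token)
start_ids = np.full((len(y), 1), num_words - 1, dtype=y.dtype)
decoder_input_data = np.concatenate([start_ids, y[:, :-1]], axis=1)
decoder_target_data = np.expand_dims(y, -1)
history = model.fit([X, decoder_input_data], decoder_target_data, batch_size=2, epochs=5)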

## Inference
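At inference time there is no reference summary to feed the decoder, so we would:
- Run the encoder once on the input text to obtain its states.
- Run the decoder one step at a time, feeding back the word it just predicted, until an end token is predicted or a maximum length is reached.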
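
The step-by-step loop described above is greedy decoding. Below is a framework-agnostic sketch in plain Python: `step_fn` is a hypothetical stand-in for one call of a decoder inference model (previous token and recurrent state in, next-token probabilities and new state out), and the start and end ids are illustrative assumptions, not values taken from the tokenizers above.

```python
def greedy_decode(step_fn, initial_state, start_id, end_id, max_len):
    """Generate token ids one at a time, always taking the most probable token.

    step_fn(prev_token, state) -> (probs, new_state) is a hypothetical stand-in
    for one decoder step; probs is a list indexed by token id.
    """
    tokens, prev, state = [], start_id, initial_state
    for _ in range(max_len):
        probs, state = step_fn(prev, state)
        prev = max(range(len(probs)), key=lambda i: probs[i])  # argmax
        if prev == end_id:
            break
        tokens.append(prev)
    return tokens

# Toy decoder over a 5-token vocabulary: deterministically predicts the next
# token from a fixed table (4 = start, 0 = end) and ignores the state
def toy_step(prev, state):
    next_token = {4: 1, 1: 2, 2: 3, 3: 0}[prev]
    probs = [0.0] * 5
    probs[next_token] = 1.0
    return probs, state

print(greedy_decode(toy_step, initial_state=None, start_id=4, end_id=0, max_len=10))
# [1, 2, 3]
```

Greedy decoding is the simplest strategy; beam search keeps several candidate summaries per step instead of one and usually produces better output.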

Wiring these steps to the trained layers requires separate encoder and decoder inference models; since this is a demo, we’ll stop here.