# Sequence-to-Sequence Models with Attention for Machine Translation

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand sequence-to-sequence (seq2seq) architectures
- Implement attention mechanisms for seq2seq models
- Build a machine translation model using seq2seq with attention
- Apply encoder-decoder architectures for sequence tasks

## 🔗 Prerequisites

- ✅ Unit 4: RNNs and LSTMs completed
- ✅ Understanding of encoder-decoder architectures
- ✅ Python, TensorFlow/Keras knowledge

---

## Official Structure Reference

This notebook covers practical activities from **Course 07, Unit 4**:
- Implementing machine translation model using seq2seq with attention mechanisms
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 4 Practical Content

---

## Introduction

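**Sequence-to-sequence (seq2seq)** models use an encoder-decoder architecture to map an input sequence to an output sequence, and the two may differ in length. The encoder reads the source sentence into a series of hidden states, and the decoder generates the target sentence one token at a time. **Attention mechanisms** let the decoder weight the encoder's hidden states at every step, so it can focus on the most relevant parts of the input instead of relying on a single fixed-size summary vector.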
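
To make the idea of "focusing on relevant parts of the input" concrete, here is a minimal NumPy sketch of dot-product attention for a single decoder step. The function names and the toy numbers are assumptions made for illustration; the Keras `Attention` layer used later in this notebook applies the same kind of computation to a whole batch of decoder steps at once.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dot_product_attention(decoder_state, encoder_states):
    """Return (context_vector, attention_weights) for one decoder step.

    decoder_state:  shape (hidden,)
    encoder_states: shape (src_len, hidden)
    """
    scores = encoder_states @ decoder_state   # one score per source position
    weights = softmax(scores)                 # non-negative, sums to 1
    context = weights @ encoder_states        # weighted sum of encoder states
    return context, weights

# Toy values (made up): 3 source positions, hidden size 2.
encoder_states = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])
decoder_state = np.array([0.0, 2.0])

context, weights = dot_product_attention(decoder_state, encoder_states)
print("attention weights:", np.round(weights, 3))
print("context vector:   ", np.round(context, 3))
```

Source positions whose hidden state aligns with the decoder state receive larger weights, so they contribute more to the context vector.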


In [None]:
import numpy as np

# Try importing TensorFlow/Keras
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import LSTM, Dense, Embedding, Input, Attention
    HAS_TF = True
    print("✅ TensorFlow/Keras available!")
except ImportError:
    HAS_TF = False
    print("⚠️  TensorFlow not available. Install with: pip install tensorflow")

print("✅ Libraries imported!")


## Part 1: Understanding Seq2Seq Architecture


In [None]:
print("=" * 60)
print("Sequence-to-Sequence (Seq2Seq) Architecture")
print("=" * 60)

print("\nKey Components:")
print("  1. Encoder: Processes input sequence")
print("  2. Decoder: Generates output sequence")
print("  3. Attention: Connects encoder and decoder")
print("  4. Context Vector: Attention-weighted summary of the encoder hidden states")

print("\n" + "-" * 60)
print("Seq2Seq with Attention:")
print("-" * 60)
print("  Encoder: Input sequence → Hidden states")
print("  Attention: Decoder state + encoder hidden states → Attention weights → Context vector")
print("  Decoder: Context vector + Previous output → Next token")
print("  Output: Generated sequence")

print("\n✅ Attention allows the model to focus on relevant parts of the input!")
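
The "Encoder: Input sequence → Hidden states" step can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the model built in Part 2: it swaps the LSTM for a plain tanh RNN to stay short, and the sizes, random weights, and token ids are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen arbitrarily for illustration.
vocab_size, embed_dim, hidden = 10, 4, 3

E = rng.normal(size=(vocab_size, embed_dim))       # embedding table
W_x = rng.normal(size=(embed_dim, hidden)) * 0.5   # input-to-hidden weights
W_h = rng.normal(size=(hidden, hidden)) * 0.5      # hidden-to-hidden weights
b = np.zeros(hidden)

def encode(token_ids):
    """Run a plain tanh RNN over the tokens.

    Returns all hidden states (one per token) and the final hidden state.
    """
    h = np.zeros(hidden)
    states = []
    for t in token_ids:
        h = np.tanh(E[t] @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states), h

source = [2, 5, 7, 1]   # hypothetical token ids for a 4-word sentence
encoder_states, final_state = encode(source)
print(encoder_states.shape)   # (4, 3): one hidden state per source token
```

A seq2seq model without attention passes only `final_state` to the decoder. With attention, the decoder can consult every row of `encoder_states`.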


## Part 2: Implementing Seq2Seq with Attention for Translation


In [None]:
if HAS_TF:
    print("=" * 60)
    print("Seq2Seq with Attention Implementation")
    print("=" * 60)
    
    print("\nImplementation Structure:")
    print("""
    # Simplified Seq2Seq with Attention
    
    # Encoder
    encoder_inputs = Input(shape=(max_len,))
    encoder_embedding = Embedding(vocab_size, embedding_dim)(encoder_inputs)
    encoder_lstm = LSTM(hidden_units, return_sequences=True, return_state=True)
    encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
    encoder_states = [state_h, state_c]
    
    # Decoder
    decoder_inputs = Input(shape=(max_len,))
    decoder_embedding = Embedding(vocab_size, embedding_dim)(decoder_inputs)
    decoder_lstm = LSTM(hidden_units, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
    
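    # Attention: decoder outputs act as queries, encoder outputs as values
    attention = Attention()
    context_vector = attention([decoder_outputs, encoder_outputs])
    
    # Output: combine each decoder state with its context vector
    # (a Concatenate layer is used because tf.concat may not accept Keras symbolic tensors)
    decoder_concat = tf.keras.layers.Concatenate(axis=-1)([decoder_outputs, context_vector])
    decoder_dense = Dense(vocab_size, activation='softmax')
    output = decoder_dense(decoder_concat)
    
    # Model
    model = Model([encoder_inputs, decoder_inputs], output)
    # 'categorical_crossentropy' expects one-hot targets; with integer token ids
    # use 'sparse_categorical_crossentropy' instead
    model.compile(optimizer='adam', loss='categorical_crossentropy')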
    """)
    
    print("\n✅ Model Architecture:")
    print("  - Encoder: LSTM processes source sequence")
    print("  - Decoder: LSTM generates target sequence")
    print("  - Attention: Focuses on relevant encoder states")
    print("  - Output: Probability distribution over target vocabulary")
    
    print("\n✅ Applications:")
    print("  - Machine Translation (EN→FR, EN→AR, etc.)")
    print("  - Text Summarization")
    print("  - Question Answering")
    print("  - Chatbot dialogue generation")
else:
    print("=" * 60)
    print("Seq2Seq with Attention (Installation Required)")
    print("=" * 60)
    print("""
    To implement seq2seq with attention:
    
    1. Install TensorFlow:
       pip install tensorflow
    
    2. Build encoder-decoder architecture:
       - Encoder: LSTM/GRU to encode input
       - Decoder: LSTM/GRU to decode output
       - Attention: Connect encoder and decoder
    
    3. Train on parallel corpora:
       - Input sequences (source language)
       - Target sequences (target language)
    
    4. Use attention to focus on relevant input parts
    """)

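Once a model like the one above is trained, translation proceeds one token at a time: the decoder's previous output is fed back in to predict the next token. The sketch below shows that loop with greedy selection. `TOY_STEP` is a hypothetical lookup table standing in for a trained decoder step, and the tokens and probabilities are invented; a real decoder would also condition on its LSTM state and the attention context vector.

```python
START, END = "<start>", "<end>"

# Hypothetical stand-in for a trained decoder step: maps the previous token
# to a probability distribution over the next token.
TOY_STEP = {
    "<start>": {"le": 0.7, "chat": 0.2, "<end>": 0.1},
    "le":      {"chat": 0.8, "le": 0.1, "<end>": 0.1},
    "chat":    {"<end>": 0.9, "le": 0.1},
}

def greedy_decode(step_fn, max_len=10):
    """Repeatedly pick the most probable next token until END or max_len."""
    tokens = []
    prev = START
    for _ in range(max_len):
        probs = step_fn(prev)
        prev = max(probs, key=probs.get)   # greedy choice: highest probability
        if prev == END:
            break
        tokens.append(prev)
    return tokens

translation = greedy_decode(lambda tok: TOY_STEP[tok])
print(translation)   # ['le', 'chat']
```

The `max_len` cap guards against a model that never emits the end marker.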

## Summary

### Key Concepts:
1. **Seq2Seq Architecture**: Encoder-decoder for sequence tasks
   - **Encoder**: Processes input sequence into hidden states
   - **Decoder**: Generates output sequence from hidden states
   - **Attention**: Allows decoder to focus on relevant encoder states

2. **Attention Mechanism**: 
   - Computes attention weights for each encoder state
   - Creates context vector from weighted encoder states
   - Improves translation quality for long sequences

3. **Machine Translation**:
   - Input: Source language sequence
   - Output: Target language sequence
   - Train on parallel corpora (aligned sentence pairs)
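
As a rough sketch of what "training on parallel corpora" involves, the snippet below turns aligned sentence pairs into integer arrays for the encoder and decoder. The two sentence pairs, the marker tokens, and the helper names are assumptions made for illustration. The one-position shift between decoder input and decoder target follows the common teacher-forcing convention, which this notebook does not cover in detail.

```python
# Tiny made-up parallel corpus (source, target); real training needs far more data.
pairs = [("i am cold", "j'ai froid"),
         ("thank you", "merci")]

PAD, START, END = "<pad>", "<start>", "<end>"

def build_vocab(sentences):
    """Map each token to an integer id; id 0 is reserved for padding."""
    vocab = {PAD: 0, START: 1, END: 2}
    for s in sentences:
        for tok in s.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(sentence, vocab, max_len, add_markers=False):
    """Convert a sentence to a list of token ids, padded to max_len."""
    toks = sentence.split()
    if add_markers:
        toks = [START] + toks + [END]
    ids = [vocab[t] for t in toks]
    return ids + [vocab[PAD]] * (max_len - len(ids))

src_vocab = build_vocab(s for s, _ in pairs)
tgt_vocab = build_vocab(t for _, t in pairs)

max_src = max(len(s.split()) for s, _ in pairs)
max_tgt = max(len(t.split()) for _, t in pairs) + 2   # room for START and END

encoder_input = [encode(s, src_vocab, max_src) for s, _ in pairs]
full_target = [encode(t, tgt_vocab, max_tgt, add_markers=True) for _, t in pairs]

# Teacher forcing: the decoder sees tokens 0..n-1 and must predict tokens 1..n.
decoder_input = [seq[:-1] for seq in full_target]
decoder_target = [seq[1:] for seq in full_target]

print(encoder_input)
print(decoder_input)
print(decoder_target)
```

Because `decoder_target` holds integer ids rather than one-hot vectors, it would pair with a sparse cross-entropy loss.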

### Applications:
- Machine Translation (Google Translate, DeepL)
- Text Summarization
- Question Answering
- Dialogue Systems

**Reference:** Course 07, Unit 4: "Deep Learning for NLP" - Seq2seq with attention practical content
