## 1. What are Vanilla Autoencoders?
**Answer:** Vanilla autoencoders are neural networks trained without labels to reproduce their own input. They consist of two parts: an encoder, which compresses the input data into a lower-dimensional representation (the latent space, or bottleneck), and a decoder, which reconstructs the original data from this representation. Training minimizes a reconstruction loss, such as the mean squared error between the input and the reconstructed output.
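
The encoder/decoder/reconstruction-loss loop can be sketched with a deliberately tiny linear autoencoder in plain Python. This is an illustration under stated assumptions, not a practical implementation: the 2-D toy data, the single latent unit, the learning rate, and all function names are made up for the example, and real autoencoders use nonlinear layers and a deep learning framework.

```python
def encode(x, w_enc):
    # Encoder: project the input onto a single latent value.
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(z, w_dec):
    # Decoder: expand the latent value back to input space.
    return [w * z for w in w_dec]

def reconstruction_loss(data, w_enc, w_dec):
    # Mean squared reconstruction error over the dataset.
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w_enc), w_dec)
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)

def train(data, w_enc, w_dec, lr=0.05, steps=500):
    # Plain batch gradient descent on the reconstruction loss.
    w_enc, w_dec = list(w_enc), list(w_dec)
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0] * len(w_enc)
        g_dec = [0.0] * len(w_dec)
        for x in data:
            z = encode(x, w_enc)
            residual = [xh - xi for xh, xi in zip(decode(z, w_dec), x)]
            # Error signal propagated back through the decoder to z.
            back = sum(r * w for r, w in zip(residual, w_dec))
            for i in range(len(w_dec)):
                g_dec[i] += 2 * residual[i] * z / n
            for i in range(len(w_enc)):
                g_enc[i] += 2 * back * x[i] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# Toy data lying on a line, so one latent dimension is enough.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_enc0, w_dec0 = [0.1, 0.1], [0.1, 0.1]
initial_loss = reconstruction_loss(data, w_enc0, w_dec0)
w_enc, w_dec = train(data, w_enc0, w_dec0)
final_loss = reconstruction_loss(data, w_enc, w_dec)
```

Because the toy data is one-dimensional in disguise, the single latent unit can represent it almost perfectly, which is why the loss drops close to zero.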

---

## 2. What are Sparse Autoencoders?
**Answer:** Sparse autoencoders are a variant of vanilla autoencoders that impose a sparsity constraint on the hidden units during training. This constraint encourages the model to learn a more efficient representation by activating only a few neurons in the hidden layer, which can lead to more meaningful features.

**Key Concepts:**
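- **Sparsity Penalty:** Added to the reconstruction loss. Common choices are an L1 penalty on the hidden activations, or a KL divergence term that pushes each hidden unit's average activation toward a small target value.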
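
Both penalty styles can be written in a few lines. The sketch below assumes hidden activations in the range (0, 1) for the KL version (for example sigmoid units); the names `l1_penalty`, `kl_sparsity_penalty`, the target sparsity `rho`, and the default weights are illustrative choices, not values from the text above.

```python
import math

def l1_penalty(activations, weight=1e-3):
    # L1 penalty: sum of absolute hidden activations, scaled by a weight.
    return weight * sum(abs(a) for a in activations)

def kl_sparsity_penalty(mean_activations, rho=0.05, eps=1e-8):
    # KL divergence between a target sparsity rho and each hidden unit's
    # mean activation rho_hat (averaged over a batch), summed over units.
    total = 0.0
    for rho_hat in mean_activations:
        rho_hat = min(max(rho_hat, eps), 1 - eps)  # avoid log(0)
        total += rho * math.log(rho / rho_hat)
        total += (1 - rho) * math.log((1 - rho) / (1 - rho_hat))
    return total

# A unit that fires half the time is penalized; one at the target is not.
dense_units = kl_sparsity_penalty([0.5, 0.5])
sparse_units = kl_sparsity_penalty([0.05, 0.05])
```

Either value would be added to the reconstruction loss before computing gradients.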

---

## 3. What are Denoising Autoencoders?
**Answer:** Denoising autoencoders are trained to reconstruct the original input from a corrupted version of it. During training, noise is added to the input (for example Gaussian noise or randomly zeroed-out features), while the reconstruction loss is still computed against the clean input. The model therefore has to learn to remove the noise, which makes the learned representations more robust.

**Purpose:**
- To make the model more robust to noise and improve generalization.
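
The key detail is that the network sees the corrupted input but is scored against the clean one. A minimal sketch of building such a training pair with masking noise follows; the function names, the `drop_prob` value, and the sample vector are assumptions made for the example.

```python
import random

def mask_corrupt(x, drop_prob, rng):
    # Masking noise: each feature is set to zero with probability drop_prob.
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def make_training_pair(clean, drop_prob=0.3, seed=0):
    # The model input is the corrupted vector; the target stays clean.
    rng = random.Random(seed)
    return mask_corrupt(clean, drop_prob, rng), list(clean)

clean = [0.2, 0.9, 0.4, 0.7]
noisy, target = make_training_pair(clean)
```

Gaussian noise could be swapped in by adding `rng.gauss(0.0, sigma)` to each feature instead of zeroing it.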

---

## 4. What are Convolutional Autoencoders?
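**Answer:** Convolutional autoencoders use convolutional layers instead of fully connected layers in both the encoder and the decoder. The encoder typically downsamples with strided convolutions or pooling, and the decoder upsamples with transposed convolutions or upsampling layers. They are particularly well suited to image data: convolutions capture spatial hierarchies, and sharing weights across positions keeps the number of parameters far lower than in a fully connected network of comparable size.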

**Application:**
- Image denoising, image compression, and feature extraction from images.
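
The parameter saving is easy to check with arithmetic. The layer sizes below (a 28x28 grayscale image, 16 output feature maps, a 3x3 kernel) are hypothetical numbers chosen only for illustration.

```python
def dense_param_count(n_in, n_out):
    # Fully connected layer: one weight per input/output pair, plus biases.
    return n_in * n_out + n_out

def conv2d_param_count(kernel_size, channels_in, channels_out):
    # Convolutional layer: one small kernel per (input, output) channel
    # pair, shared across all spatial positions, plus one bias per output.
    return kernel_size * kernel_size * channels_in * channels_out + channels_out

pixels = 28 * 28
# Mapping the image to 16 feature maps of the same spatial size.
dense_params = dense_param_count(pixels, 16 * pixels)
conv_params = conv2d_param_count(3, 1, 16)
```

The convolutional layer needs 160 parameters here, while the equivalent dense layer needs several million.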

---

## 5. What are Stacked Autoencoders?
**Answer:** Stacked autoencoders are deep neural networks formed by stacking multiple layers of autoencoders. Each layer is trained to encode the representation from the previous layer, allowing the network to learn more complex and hierarchical features.

**Training Method:**
- Often trained layer by layer in a greedy fashion to ensure proper initialization before fine-tuning the entire network.
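
The greedy layer-wise procedure can be outlined as a loop in which each new layer is trained on the codes produced by the layers before it. In this sketch `train_layer` is a hypothetical caller-supplied function that trains one autoencoder and returns its encoder; `toy_train_layer` is only a stand-in (it truncates the vector) so the control flow can be run without a real model.

```python
def greedy_layerwise_pretrain(data, layer_sizes, train_layer):
    # Train one autoencoder per layer; each sees the previous layer's codes.
    encoders = []
    current = data
    for size in layer_sizes:
        encoder = train_layer(current, size)  # returns an encoding function
        encoders.append(encoder)
        current = [encoder(x) for x in current]
    return encoders

def stacked_encode(x, encoders):
    # Run an input through every pretrained encoder in order.
    for encoder in encoders:
        x = encoder(x)
    return x

def toy_train_layer(inputs, size):
    # Stand-in for real training: "encodes" by keeping the first `size`
    # features. A real version would fit an autoencoder on `inputs`.
    return lambda x: x[:size]

data = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
encoders = greedy_layerwise_pretrain(data, [3, 2], toy_train_layer)
code = stacked_encode(data[0], encoders)
```

After this pretraining pass, the stacked encoders would normally be fine-tuned together end to end.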

---

## 6. Explain How to Generate Sentences Using LSTM Autoencoders
**Answer:** LSTM autoencoders can be used to generate sentences by first encoding an input sentence into a fixed-length vector using an LSTM encoder and then decoding this vector into a sequence of words using an LSTM decoder. During generation, the decoder can be conditioned on previously generated words to create coherent sentences.

**Steps:**
1. **Encoding:** Convert the input sentence into a latent representation.
2. **Decoding:** Use the latent representation to generate words one by one until an end-of-sentence token is produced.
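
The decoding step can be sketched as a greedy loop. Here `step_fn` stands in for one LSTM decoder step and is an assumption of this example: it takes the previous token and the current state and returns next-token probabilities plus the new state. The token names and the lookup-table "decoder" are toy placeholders, not a trained model.

```python
def greedy_decode(latent, step_fn, start_token="<s>", end_token="</s>", max_len=20):
    # Start from the encoder's latent vector and emit one word at a time,
    # feeding each chosen word back in as the next input.
    tokens = []
    prev, state = start_token, latent
    for _ in range(max_len):
        probs, state = step_fn(prev, state)
        prev = max(probs, key=probs.get)  # pick the most probable word
        if prev == end_token:
            break
        tokens.append(prev)
    return tokens

# Toy stand-in for a trained decoder: probabilities depend only on the
# previous word, and the state is passed through unchanged.
TOY_TABLE = {
    "<s>": {"the": 0.9, "cat": 0.1},
    "the": {"cat": 0.8, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}

def toy_step(prev_token, state):
    return TOY_TABLE[prev_token], state

sentence = greedy_decode(latent=[0.0, 0.0], step_fn=toy_step)
```

Sampling from `probs` instead of taking the maximum would give more varied sentences at the cost of determinism.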

---

## 7. Explain Extractive Summarization
**Answer:** Extractive summarization is a technique that involves selecting and extracting key sentences or phrases directly from the source text to create a summary. The goal is to retain the most important information from the original content without generating new text.

**Techniques:**
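- **Sentence Ranking:** Score each sentence by features such as term frequency (e.g. TF-IDF), position in the document, or similarity to the rest of the text, then keep the top-ranked sentences.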
- **TextRank Algorithm:** A graph-based ranking algorithm for extracting key sentences.
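
A simplified TextRank-style ranker can be sketched in plain Python: sentences are graph nodes, edge weights are pairwise similarities, and a PageRank-style iteration scores each sentence. This version uses Jaccard word overlap as the similarity, which is a simplification chosen for brevity rather than the similarity from the original algorithm; the function names and example sentences are also illustrative.

```python
def similarity(a, b):
    # Jaccard overlap between the word sets of two sentences.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def textrank(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Score flowing into sentence i from every similar sentence j.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

def extractive_summary(sentences, k=1):
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original order

sentences = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "stock prices fell sharply today",
]
scores = textrank(sentences)
summary = extractive_summary(sentences, k=2)
```

The unrelated third sentence shares no words with the others, so it receives only the baseline score and is left out of the summary.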

---

## 8. Explain Abstractive Summarization
**Answer:** Abstractive summarization involves generating new sentences that capture the essence of the source text, rather than just selecting existing sentences. This approach often requires a deeper understanding of the content and the ability to paraphrase and generalize information.

**Techniques:**
- **Sequence-to-Sequence Models:** Use encoder-decoder architectures to generate summaries.
- **Transformer Models:** Leverage self-attention mechanisms for more coherent summaries.

---

## 9. Explain Beam Search
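**Answer:** Beam search is a heuristic search algorithm used in sequence generation tasks such as machine translation and text summarization. At each time step it expands every retained partial sequence with its possible next tokens and keeps only the top-k candidates (beams), ranked by cumulative log-probability. With k = 1 it reduces to greedy decoding; a larger k explores more alternatives but still does not guarantee finding the most probable sequence.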

**Key Concepts:**
- **Beam Width:** The number of sequences retained at each step.
- **Pruning:** Discarding lower-probability sequences to focus on more promising candidates.
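
A compact sketch of the procedure follows. The `next_log_probs` callback is an assumption of this example: it stands in for a trained model and returns a dictionary of next-token log-probabilities for a given prefix. The toy probability table is constructed so that greedy decoding (beam width 1) misses the best sequence while a beam width of 2 finds it.

```python
import math

def beam_search(next_log_probs, beam_width, end_token="</s>", max_len=10):
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + (token,), score + logp))
        # Pruning: keep only the beam_width best candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])

# Toy model: "a" looks best at step one, but "b z" is the best pair.
TOY_PROBS = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"z": 0.9, "w": 0.1},
}

def toy_next_log_probs(seq):
    probs = TOY_PROBS.get(seq, {"</s>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}

greedy_seq, greedy_score = beam_search(toy_next_log_probs, beam_width=1)
beam_seq, beam_score = beam_search(toy_next_log_probs, beam_width=2)
```

Log-probabilities are summed rather than multiplying raw probabilities, which avoids numerical underflow on long sequences.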

---

## 10. Explain Length Normalization
**Answer:** Length normalization is a technique used in sequence generation to adjust sequence scores for length. A sequence's probability is the product of its per-word probabilities, each below 1, so every additional word lowers the raw score. Without normalization, the search therefore tends to favor shorter sequences.

**Equation:**
\[ \text{Normalized Score} = \frac{1}{L^\alpha} \cdot \text{Score} \]
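where \( L \) is the length of the sequence in tokens, \( \text{Score} \) is the sequence's cumulative log-probability, and \( \alpha \) is a hyperparameter, typically between 0 and 1: \( \alpha = 0 \) applies no normalization and \( \alpha = 1 \) divides by the full length.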
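
A small numeric example shows how the formula can change which hypothesis wins. It assumes the score is a cumulative log-probability; the two scores and lengths are invented for illustration.

```python
def length_normalized(score, length, alpha=0.7):
    # Divide the cumulative log-probability by length ** alpha.
    return score / (length ** alpha)

# Hypothetical hypotheses: (cumulative log-probability, length in tokens).
short_score, short_len = -2.0, 2
long_score, long_len = -3.0, 6

raw_winner = "short" if short_score > long_score else "long"
short_norm = length_normalized(short_score, short_len)
long_norm = length_normalized(long_score, long_len)
normalized_winner = "short" if short_norm > long_norm else "long"
```

On raw scores the short hypothesis wins; after normalization the longer one does, because its per-token log-probability is higher.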

---

## 11. Explain Coverage Normalization
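**Answer:** Coverage normalization is used in sequence generation to encourage the output to cover all relevant parts of the input. A coverage vector accumulates the attention weights assigned to each input position over the decoding steps so far. Positions that receive too much attention tend to be repeated in the output (over-generation), and positions that receive too little tend to be omitted (under-generation). The coverage vector is used to penalize both cases, either during training or when scoring candidate sequences.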

**Application:**
- Reduces redundancy and repetition in generated text.
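
One way to make the coverage vector concrete is to sum the attention weights over decoding steps and then score how well the input was covered. The penalty below is one published formulation (a log of clipped coverage, used when rescoring beam-search hypotheses); other systems instead add a coverage loss during training. The names, the `beta` weight, and the attention matrices are assumptions made for this sketch.

```python
import math

def coverage_vector(attention_steps):
    # Sum attention weights over decoding steps, per input position.
    coverage = [0.0] * len(attention_steps[0])
    for attention in attention_steps:
        coverage = [c + a for c, a in zip(coverage, attention)]
    return coverage

def coverage_penalty(attention_steps, beta=0.2, eps=1e-9):
    # Zero when every input position has total attention >= 1,
    # increasingly negative when positions are ignored.
    coverage = coverage_vector(attention_steps)
    return beta * sum(math.log(max(min(c, 1.0), eps)) for c in coverage)

# Rows are decoding steps, columns are input positions.
balanced = [[1.0, 0.0], [0.0, 1.0]]  # each input position attended once
lopsided = [[1.0, 0.0], [1.0, 0.0]]  # second input position ignored

balanced_penalty = coverage_penalty(balanced)
lopsided_penalty = coverage_penalty(lopsided)
```

The penalty would be added to a hypothesis's score, so the lopsided attention pattern ranks lower than the balanced one.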

---

## 12. Explain ROUGE Metric Evaluation
**Answer:** ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used to evaluate the quality of machine-generated text by comparing it to reference summaries. It measures the overlap of n-grams, word sequences, and word pairs between the generated and reference texts.

**Common Variants:**
- **ROUGE-N:** Measures n-gram overlap.
- **ROUGE-L:** Measures the longest common subsequence (LCS) between texts.
- **ROUGE-S:** Measures the overlap of skip-bigrams.

**Usage:**
- Most commonly used to evaluate summarization systems. It is also sometimes reported for machine translation, where BLEU is the more usual metric.
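
ROUGE-N and ROUGE-L can be sketched in recall form with whitespace tokenization. This is a simplified illustration: published ROUGE implementations also report precision and F-scores and apply their own tokenization and stemming, and the function names and example sentences here are made up.

```python
from collections import Counter

def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    # Fraction of reference n-grams that also appear in the candidate,
    # with counts clipped so repeated n-grams are not over-credited.
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum((cand & ref).values()) / total

def lcs_length(a, b):
    # Dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_recall(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
rouge_1 = rouge_n_recall(candidate, reference, n=1)
rouge_2 = rouge_n_recall(candidate, reference, n=2)
rouge_l = rouge_l_recall(candidate, reference)
```

Here five of the six reference unigrams and three of the five reference bigrams are matched, and the longest common subsequence ("the cat on the mat") has length five.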
