# Assignment Code: DS-AG-031
# Generative AI - Text Generation and Machine Translation

## Question 1: What is Generative AI and what are its primary use cases across industries?
- Generative AI is a type of artificial intelligence that creates new content rather than just analyzing existing data. Unlike traditional AI, which might classify data (e.g., "Is this email spam?"), Generative AI learns patterns from training data to generate fresh outputs like text, images, music, or code.

### Primary Use Cases Across Industries:

- Marketing & Sales: Writing personalized emails, creating ad copy, and generating blog posts.
- Healthcare: Discovering new drug molecules and summarizing patient medical records.
- Entertainment: Creating realistic background characters for video games or writing scripts.
- Software Development: Autocompleting code and fixing bugs (e.g., GitHub Copilot).
- Customer Service: Powering advanced chatbots that can answer complex questions naturally.

## Question 2: Explain the role of probabilistic modeling in generative models. How do these models differ from discriminative models?
- Probabilistic modeling is the engine behind generative AI. A generative model treats data as samples drawn from a probability distribution. Instead of saying, "The next word IS 'cat'," the model calculates, "There is an 80% probability the next word is 'cat' and a 20% probability it is 'dog'," and then samples from that distribution. Because the output is sampled rather than fixed, the AI can produce different results each time you run it.
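
To make the 80%/20% example concrete, here is a minimal sketch that samples the next word from a hand-written distribution. The probabilities and the helper names `sample_next_word` and `greedy_next_word` are illustrative assumptions, not taken from any real model.

```python
import random

def sample_next_word(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def greedy_next_word(probs):
    """Always return the single most likely word (no randomness)."""
    return max(probs, key=probs.get)

# Hypothetical next-word distribution from the example above.
next_word_probs = {"cat": 0.8, "dog": 0.2}

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_word(next_word_probs, rng) for _ in range(1000)]
cat_share = samples.count("cat") / len(samples)

print("greedy choice:", greedy_next_word(next_word_probs))
print(f"share of 'cat' in 1000 samples: {cat_share:.2f}")
```

Greedy selection always returns the same word, while sampling returns "dog" roughly one time in five, which is where the variety in generated text comes from.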

### Difference between Generative and Discriminative Models:

| **Feature**            | **Discriminative Models**                                                               | **Generative Models**                                                       |
| ---------------------- | --------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| **Goal**               | To classify or predict labels                                                           | To create/generate new data instances                                       |
| **What they learn**    | Decision boundary between classes                                                       | How the data is generated                                                   |
| **Analogy**            | **The Art Critic**: Looks at a painting and decides whether it is a Van Gogh or Picasso | **The Painter**: Tries to paint a new picture that looks like a Van Gogh    |
| **Math**               | Learns **P(Y \| X)** (probability of label **Y** given input **X**)                     | Learns **P(X, Y)** or **P(X \| Y)** (joint or class-conditional probability) |
| **Focus**              | Direct mapping from input → output                                                      | Models the full data distribution                                           |
| **Data Generation**    | Cannot generate new data                                                              |  Can generate new data                                                     |
| **Examples**           | Logistic Regression, SVM, Linear Regression, Neural Networks                            | Naïve Bayes, Gaussian Mixture Model, HMM, LDA                               |
| **Accuracy (usually)** | Higher for prediction tasks                                                             | Slightly lower for classification but richer modeling                       |

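The "Math" row of the table can be illustrated with a tiny hand-made joint distribution. This is only a sketch: the numbers and the function names are hypothetical. A purely discriminative model would learn P(Y | X) directly and never represent the joint table at all.

```python
import random

# Hypothetical joint distribution P(X, Y) over a word X and a label Y.
joint = {
    ("free", "spam"): 0.30,
    ("free", "ham"): 0.05,
    ("meeting", "spam"): 0.05,
    ("meeting", "ham"): 0.60,
}

def p_label_given_word(word):
    """Discriminative quantity P(Y | X = word), derived here from the joint."""
    p_word = sum(p for (w, _), p in joint.items() if w == word)
    return {y: p / p_word for (w, y), p in joint.items() if w == word}

def p_word_given_label(label):
    """Generative quantity P(X | Y = label), which can be sampled for new data."""
    p_label = sum(p for (_, y), p in joint.items() if y == label)
    return {w: p / p_label for (w, y), p in joint.items() if y == label}

posterior = p_label_given_word("free")    # used to classify
likelihood = p_word_given_label("spam")   # used to generate

# Generate a new "spam" word by sampling from P(X | Y = spam).
rng = random.Random(0)
words = list(likelihood)
new_word = rng.choices(words, weights=[likelihood[w] for w in words], k=1)[0]

print("P(label | word='free'):", posterior)
print("P(word | label='spam'):", likelihood)
print("sampled spam word:", new_word)
```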

## Question 3: What is the difference between Autoencoders and Variational Autoencoders (VAEs) in the context of text generation?
- Both models try to compress data (encode) and then recreate it (decode), but they differ in how they represent the compressed code in the hidden (latent) space.

- Autoencoders (AE): These map input text to a fixed, single point in the hidden space. They are good at compression but bad at generating new text. If you pick a random point in the hidden space, the output will likely be gibberish.

- Variational Autoencoders (VAEs): Instead of mapping to a single point, VAEs map input text to a probability distribution (a cloud of possible points). This makes the hidden space "smooth." Because it is smooth, you can sample random points from it to generate coherent, new sentences that are similar to the training data but not identical.
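
The difference between "a single point" and "a cloud of possible points" can be sketched in a few lines. The encoder outputs below are made-up numbers for one sentence, and `autoencoder_code` and `vae_sample` are hypothetical helper names. The sampling step is the reparameterization trick, z = mean + sigma * epsilon.

```python
import math
import random

# Hypothetical encoder outputs for one sentence (2-D latent space).
z_mean = [0.5, -1.0]
z_log_var = [-2.0, -2.0]

def autoencoder_code(mean):
    """A plain autoencoder maps the input to one fixed latent point."""
    return list(mean)

def vae_sample(mean, log_var, rng):
    """A VAE samples z = mean + sigma * epsilon, with epsilon ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

rng = random.Random(0)
ae_codes = [autoencoder_code(z_mean) for _ in range(3)]
vae_codes = [vae_sample(z_mean, z_log_var, rng) for _ in range(3)]

print("autoencoder codes:", ae_codes)   # identical every time
print("VAE samples:      ", vae_codes)  # scattered around z_mean
```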

## Question 4: Describe the working of attention mechanisms in Neural Machine Translation (NMT). Why are they critical?
- Attention mechanisms act like a spotlight. Older encoder-decoder translation models compressed the entire source sentence into a single fixed-size vector, so information from the beginning of long sentences was often lost.

- With attention, each time the model generates a target word it "looks back" at the source sentence and weights the source words by how relevant they are to that step. For example, when translating "European Economic Area" into French ("zone économique européenne"), the model attends most strongly to "European" while generating "européenne," even though the word order has changed.
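
The "spotlight" can be sketched as a softmax over similarity scores. This is a minimal illustration with made-up vectors, and it uses dot-product scoring for brevity rather than the additive scoring used in Solution 8. The names `attend`, `encoder_states` and `decoder_state` are assumptions.

```python
import math

source_words = ["European", "Economic", "Area"]
# Hypothetical encoder states, one vector per source word.
encoder_states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical decoder state at the step that generates "européenne".
decoder_state = [3.0, 0.5, 0.2]

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Score each source state against the query, then average the states."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[i] for w, key in zip(weights, keys))
               for i in range(len(keys[0]))]
    return weights, context

weights, context = attend(decoder_state, encoder_states)
for word, w in zip(source_words, weights):
    print(f"{word:>9}: {w:.2f}")
```

The context vector, a weighted average of the encoder states, is what the decoder combines with its own input to predict the next word.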

### Why they are critical:

- Long Sentences: They let models translate long sentences and paragraphs with far less loss of early context.

- Context: They help handle words that have multiple meanings based on surrounding words.

- Accuracy: They significantly improve the fluency and grammatical correctness of translations.

## Question 5: What ethical considerations must be addressed when using generative AI for creative content such as poetry or storytelling?
- When using AI for creativity, we must be careful about several issues:

    - Copyright and Plagiarism: AI models are trained on millions of books and articles. If the AI generates a story that looks exactly like a copyrighted book, who owns it? Does it infringe on the original author's rights?

    - Bias and Stereotypes: If the training data contains stereotypes (e.g., "doctors are always men"), the AI will repeat these biases in its stories, reinforcing harmful social views.

    - Misinformation: AI can write convincing stories that are factually untrue. In a "non-fiction" storytelling context, this could spread lies.

    - Deepfakes/Impersonation: AI can mimic the style of famous authors or living people, potentially damaging their reputation.

## Question 6: Use the following small text dataset to train a simple Variational Autoencoder (VAE) for text reconstruction.

In [None]:
# Solution 6

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers, models, backend as K

# 1. Preprocess the data
data = ["The sky is blue", "The sun is bright", "The grass is green",
        "The night is dark", "The stars are shining"]

# Tokenize
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data)
total_words = len(tokenizer.word_index) + 1
sequences = tokenizer.texts_to_sequences(data)
max_len = max([len(x) for x in sequences])
padded_data = pad_sequences(sequences, maxlen=max_len, padding='post')

print(f"Dictionary: {tokenizer.word_index}")
print(f"Padded Data Shape: {padded_data.shape}")

# 2. Build Basic VAE Components
latent_dim = 2
vocab_size = total_words
embedding_dim = 8

# Encoder
encoder_inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, embedding_dim)(encoder_inputs)
x = layers.Flatten()(x)
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)

# Sampling function (Reparameterization Trick)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_var])

# Decoder
decoder_inputs = layers.Input(shape=(latent_dim,))
x = layers.Dense(max_len * embedding_dim)(decoder_inputs)
x = layers.Reshape((max_len, embedding_dim))(x)
decoder_outputs = layers.Dense(vocab_size, activation='softmax')(x)

# Models
encoder = models.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")
decoder = models.Model(decoder_inputs, decoder_outputs, name="decoder")
vae_outputs = decoder(encoder(encoder_inputs)[2])
vae = models.Model(encoder_inputs, vae_outputs, name="vae")

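# Loss Function
# Note: this symbolic add_loss pattern targets tf.keras in TensorFlow 2.x (Keras 2).
# Keras 3 may require moving the loss into a custom layer or a custom train_step.
# Per-token cross-entropy has shape (batch, max_len); sum over tokens to get (batch,)
reconstruction_loss = tf.keras.losses.sparse_categorical_crossentropy(encoder_inputs, vae_outputs)
reconstruction_loss = K.sum(reconstruction_loss, axis=-1)
# KL divergence between N(z_mean, exp(z_log_var)) and the standard normal prior, shape (batch,)
kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
# Both terms now have shape (batch,), so they can be added and averaged
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)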

# 3. Train
vae.compile(optimizer='adam')
vae.fit(padded_data, epochs=100, verbose=0)

# Reconstruction Test
pred = vae.predict(padded_data)
# Convert prediction back to words
decoded_sentence = []
for i in range(len(pred[0])):
    token = np.argmax(pred[0][i])
    word = tokenizer.index_word.get(token, '')
    decoded_sentence.append(word)

print(f"\nOriginal: {data[0]}")
print(f"Reconstructed: {' '.join(decoded_sentence)}")


# Explanation: We convert sentences into numbers (tokens). The VAE compresses each sentence into a small "latent space" (2 numbers) and tries to expand it back into the original sentence. With enough training on these five sentences, the reconstruction should move toward the original, although the output can vary between runs because z is sampled.
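
The objective that Solution 6 aims to minimize is the usual VAE loss: a reconstruction term plus a KL term. As a sketch in standard notation, with symbols introduced here for illustration ($\mu$ and $\sigma^2$ correspond to `z_mean` and `exp(z_log_var)`, $T$ to `max_len`, and $d$ to `latent_dim`):

$$
\mathcal{L}(x) = -\sum_{t=1}^{T} \log p(x_t \mid z) \;-\; \frac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right),
\qquad z = \mu + \sigma \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
$$

The first sum is the cross-entropy over the tokens of the sentence, and the second is the KL divergence between the encoder's distribution and a standard normal prior.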

## Question 7: Use a pre-trained GPT model to translate a short English paragraph into French and German.

In [None]:
# Solution 7

from transformers import pipeline

# Load translation pipelines
# Note: GPT models are general text generators. Translation can be attempted by prompting GPT-2/3,
# but dedicated sequence-to-sequence models such as T5 or MarianMT usually translate more accurately.
# No model name is passed below, so the library loads its default translation checkpoint for each task.

translator_fr = pipeline("translation_en_to_fr")
translator_de = pipeline("translation_en_to_de")

text = "Artificial Intelligence is changing the world. It helps us solve complex problems."

# Translate
res_fr = translator_fr(text)
res_de = translator_de(text)

print(f"Original: {text}")
print(f"French: {res_fr[0]['translation_text']}")
print(f"German: {res_de[0]['translation_text']}")

# Sample output (illustrative; exact wording depends on the model and library version):
#
# Original: Artificial Intelligence is changing the world. It helps us solve complex problems.
# French: L'intelligence artificielle change le monde et nous aide à résoudre des problèmes complexes.
# German: Künstliche Intelligenz verändert die Welt und hilft uns, komplexe Probleme zu lösen.

## Question 8: Implement a simple attention-based encoder-decoder model for English-to-Spanish translation using PyTorch.

In [None]:
# Solution 8

import torch
import torch.nn as nn
import torch.nn.functional as F

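class SimpleAttention(nn.Module):
    """Additive (Bahdanau-style) attention over the encoder outputs."""

    def __init__(self, hidden_size):
        super().__init__()
        self.attn = nn.Linear(hidden_size * 2, hidden_size)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, hidden, encoder_outputs):
        # hidden: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
        seq_len = encoder_outputs.size(1)
        # Copy the decoder state once per source position: (batch, src_len, hidden)
        hidden = hidden.unsqueeze(1).repeat(1, seq_len, 1)
        # Score every source position against the decoder state
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
        attention = self.v(energy).squeeze(2)  # (batch, src_len)
        # Normalize the scores into weights that sum to 1 over source positions
        return F.softmax(attention, dim=1)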

class AttnDecoderRNN(nn.Module):
    """Single-step GRU decoder that attends over the encoder outputs."""

    def __init__(self, hidden_size, output_size, dropout_p=0.1):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.dropout = nn.Dropout(dropout_p)  # regularizes the embedded input token
        self.attention = SimpleAttention(hidden_size)
        self.gru = nn.GRU(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, encoder_outputs):
        # input: (batch,) token ids; hidden: (batch, hidden)
        # encoder_outputs: (batch, src_len, hidden)
        embedded = self.dropout(self.embedding(input)).unsqueeze(0)  # (1, batch, hidden)

        # Calculate attention weights: (batch, src_len)
        attn_weights = self.attention(hidden, encoder_outputs)

        # Apply attention to encoder outputs (weighted sum): (1, batch, hidden)
        context = attn_weights.unsqueeze(1).bmm(encoder_outputs).transpose(0, 1)

        # Combine embedded input and context: (1, batch, 2 * hidden)
        rnn_input = torch.cat((embedded, context), 2)
        output, hidden = self.gru(rnn_input, hidden.unsqueeze(0))

        # Log-probabilities over the target vocabulary: (batch, output_size)
        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden[0], attn_weights

# Example Model Instantiation
hidden_size = 256
vocab_size_spanish = 5000
decoder = AttnDecoderRNN(hidden_size, vocab_size_spanish)
print("Attention Decoder Model Created Successfully")
print(decoder)

## Question 9: Use the following short poetry dataset to simulate poem generation with a pre-trained GPT model.

In [None]:
# Solution 9

from transformers import pipeline, set_seed

# Set seed for reproducibility
set_seed(42)

# Initialize GPT-2 Text Generation
generator = pipeline('text-generation', model='gpt2')

# Data context (reference)
dataset_context = """
Roses are red, violets are blue,
Sugar is sweet, and so are you.
The moon glows bright in silent skies,
A bird sings where the soft wind sighs.
"""

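# Prompting the model to continue in a similar style:
# prepend the reference poem so GPT-2 sees the style before the new first line
prompt = "The stars above are shining bright,"
full_prompt = dataset_context.strip() + "\n" + prompt

# Generate poem (max_new_tokens limits only the continuation, not the prompt)
result = generator(full_prompt, max_new_tokens=40, num_return_sequences=1,
                   return_full_text=False)
# The pipeline returns only the continuation here, so put the prompt line back in front
generated_text = prompt + result[0]['generated_text']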

print("Prompt Used:", prompt)
print("\nGenerated Poem Extension:")
print(generated_text)

# Sample output (illustrative only; GPT-2 samples vary with the seed and library version):
#
# Prompt Used: The stars above are shining bright,
#
# Generated Poem Extension:
# The stars above are shining bright,
# And casting down their silver light.
# Upon the world so calm and deep,
# While all the flowers go to sleep.

## Question 10: Imagine you are building a creative writing assistant for a publishing company. Describe how you would design the system.
1. System Design:
    - Model Selection: I would use a Large Language Model (LLM) such as GPT-4 or LLaMA 3, since large models of this kind handle narrative structure and long-form coherence well.
    - Fine-Tuning: I would fine-tune the model on a curated dataset of high-quality novels, screenplays, and character biographies to specialize it in storytelling rather than general internet text.

2. Bias Mitigation:

    - Data Cleaning: Before training, I would filter out hate speech and stereotypical content from the dataset.

    - RLHF (Reinforcement Learning from Human Feedback): I would have human editors review the AI's output and "downvote" biased or boring plots, teaching the model to improve over time.

3. Evaluation Methods:

    - Perplexity Score: Measures how well the model predicts held-out text (lower is better). It is a rough proxy for fluency and says little about creativity.
    - Human Evaluation: The most important metric. Real writers would rate the outputs on "Creativity," "Coherence," and "Originality."

4. Real-World Challenges:
    - Hallucination: The AI might invent facts or lose track of the plot in long stories (e.g., a character dies in Chapter 1 but reappears in Chapter 5).
    - Context Window: The model can only take a limited amount of text into account at once. For a full novel, it may lose details from the beginning of the book.
    - Lack of "Soul": AI can mimic structure, but it often struggles with deep emotional subtext or genuine human insight.
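
The perplexity score listed under Evaluation Methods can be computed from the probabilities a model assigns to each actual token. The following is a minimal sketch. The per-token probabilities are hypothetical values, not output from a real model.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to each actual token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities for two candidate sentences.
fluent = [0.5, 0.4, 0.6, 0.5]     # the model found each token fairly likely
awkward = [0.1, 0.05, 0.2, 0.1]   # the model was surprised by most tokens

print(f"fluent:  {perplexity(fluent):.2f}")
print(f"awkward: {perplexity(awkward):.2f}")
```

A lower value means the model was less "surprised" by the text. A model that assigns probability 0.25 to every token has a perplexity of 4, as if it were choosing uniformly among four options at each step.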