1. What is Generative AI and what are its primary use cases across
industries?

Generative AI refers to a class of artificial intelligence models that can create new content—such as text, images, audio, code, and even synthetic data—by learning patterns from large datasets. Unlike traditional AI systems that only classify or predict, generative models produce original outputs that resemble human-created material. They rely on architectures like transformers, diffusion models, variational autoencoders, and GANs, enabling them to understand context, generate coherent sequences, and simulate real-world complexity.

Across industries, its applications are increasingly transformative. In business and marketing, generative AI produces campaign copy, customer-specific messaging, brand visuals, and automated reports. In software development, it accelerates coding, debugging, documentation, and prototyping. In healthcare, it supports drug-molecule generation, medical image synthesis, clinical note automation, and personalized patient communication. Finance uses generative models for scenario simulation, fraud pattern synthesis, and automated compliance documentation. Manufacturing and engineering benefit from AI-generated designs, optimization of product blueprints, and predictive maintenance documentation. In education, generative AI creates tailored lessons, question banks, explanations, and tutoring feedback. Meanwhile, creative industries leverage it for storytelling, artwork, music composition, 3D content, and game development. Overall, generative AI enhances productivity, reduces costs, and enables rapid innovation across nearly every sector.

2. Explain the role of probabilistic modeling in generative models. How do
these models differ from discriminative models?

Probabilistic modeling is central to generative models because it allows them to learn the joint probability distribution p(x,y) or, in unsupervised settings, the data distribution p(x). By estimating how data is likely to be generated, these models can sample new instances that follow similar statistical patterns. Techniques such as latent variables, likelihood estimation, sampling methods (e.g., Markov Chain Monte Carlo), and approximate inference (e.g., variational inference) help generative systems capture uncertainty, variation, and correlations within the data. This probabilistic foundation enables models such as VAEs, diffusion models, and autoregressive transformers to generate realistic text, images, or other modalities.

Generative models differ from discriminative models in their objective and the type of probability they learn:

Generative models learn how the data is produced.

    They estimate p(x,y) or p(x).

    They can generate new data points.

    Examples: GANs, VAEs, diffusion models, GPT-style transformers.

Discriminative models learn to distinguish between classes.

    They estimate p(y|x).

    Their goal is classification or prediction, not content creation.

    Examples: logistic regression, SVMs, traditional classifiers.

In essence, generative models focus on modeling the entire data distribution, whereas discriminative models focus only on decision boundaries.
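
To make the contrast concrete, here is a minimal sketch of the generative route on toy one-dimensional data. The data, the Gaussian class-conditional assumption, and the helper names (`gaussian_pdf`, `posterior`, `sample`) are all illustrative choices, not part of the original answer. The model fits p(y) and p(x|y), which together give p(x,y); it can then both sample new points and recover p(y|x) through Bayes' rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (assumed for illustration): class 0 near -2, class 1 near +2.
x0 = rng.normal(-2.0, 1.0, size=200)
x1 = rng.normal(2.0, 1.0, size=200)

# Generative view: estimate p(y) and p(x | y), which together define p(x, y).
prior = np.array([len(x0), len(x1)]) / (len(x0) + len(x1))
means = np.array([x0.mean(), x1.mean()])
stds = np.array([x0.std(), x1.std()])


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def posterior(x):
    """p(y | x), obtained from the fitted joint via Bayes' rule."""
    joint = prior * gaussian_pdf(x, means, stds)  # p(x, y) for y = 0 and y = 1
    return joint / joint.sum()


def sample(n):
    """Draw new (x, y) pairs from the fitted joint distribution."""
    y = rng.choice(2, size=n, p=prior)
    x = rng.normal(means[y], stds[y])
    return x, y


new_x, new_y = sample(5)
print("sampled points:", np.round(new_x, 2), new_y)
print("p(y | x=1.5):", np.round(posterior(1.5), 3))
```

A discriminative model such as logistic regression would fit p(y|x) directly and skip the `sample` step entirely, because it never models how x itself is distributed.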

3. What is the difference between Autoencoders and Variational
Autoencoders (VAEs) in the context of text generation?

Autoencoders (AEs) and Variational Autoencoders (VAEs) both learn compressed representations of data, but they differ fundamentally in how they model latent space—this difference is crucial for text generation.
Autoencoders
Autoencoders learn a deterministic mapping from input text to a latent representation and then back to reconstructed text.


The encoder compresses the input into a fixed latent vector.


The decoder attempts to reconstruct the original text exactly.


Latent space is not structured or smooth.
Because the latent space is irregular, sampling random points usually produces poor or incoherent text. AEs are mainly useful for dimensionality reduction or denoising, not high-quality text creation.


Variational Autoencoders (VAEs)
VAEs introduce probabilistic modeling into the latent space.


Instead of producing a single latent vector, the encoder outputs the parameters of a distribution, typically a diagonal Gaussian N(μ, σ²) with mean μ and variance σ².


A latent sample is drawn from this distribution and passed to the decoder.


A KL-divergence term pulls each encoded distribution toward a standard normal prior, which keeps the latent space smooth and continuous.


This continuous, well-structured latent space enables meaningful sampling, interpolation, and generation of diverse text outputs. VAEs can produce new sentences rather than simply reconstructing existing ones.
Key Difference
AEs reconstruct; VAEs generate.
VAEs’ probabilistic latent space makes them far more suitable for coherent and diverse text generation.
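
The two VAE ingredients described above, sampling from the encoder's distribution and the KL penalty, can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a diagonal Gaussian posterior and a standard normal prior; the function names and the example values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)


# Two example encodings in a 2-D latent space (values chosen for illustration).
mu = np.array([[0.0, 0.0], [1.0, -1.0]])
log_var = np.zeros((2, 2))  # log variance 0 means sigma = 1

z = reparameterize(mu, log_var)
kl = kl_to_standard_normal(mu, log_var)
print("latent samples:\n", z)
print("KL per example:", kl)
```

The first encoding already equals the prior, so its KL term vanishes; the second is shifted away from the origin and is penalized. A plain autoencoder has neither the sampling step nor this penalty, which is why its latent space is left unstructured.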

4. Describe the working of attention mechanisms in Neural Machine
Translation (NMT). Why are they critical?

Attention mechanisms allow Neural Machine Translation (NMT) systems to focus on the most relevant parts of the source sentence while generating each word in the target sentence. They were introduced to overcome the limitations of fixed-length encoder representations used in early sequence-to-sequence models.

How Attention Works

    Encoding: The encoder processes the source sentence and produces a sequence of hidden states, each representing contextual information for a specific source word.

    Alignment Scores: When the decoder is about to generate a target word, it computes a relevance score between its current hidden state and each encoder hidden state.
    Common scoring functions: dot-product, additive attention.

    Attention Weights: These scores are normalized using softmax to form attention weights. Each weight reflects how important a particular source word is for generating the current target word.

    Context Vector: A weighted sum of the encoder states is computed using the attention weights. This becomes the context vector supplied to the decoder.

    Decoding: The decoder uses both its hidden state and the context vector to predict the next target word.
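
The scoring, normalization, and context-vector steps above can be sketched for a single decoding step as follows. This is a minimal NumPy illustration that assumes dot-product scoring; the encoder states and decoder state are made-up values, and `attention_step` is a hypothetical helper name.

```python
import numpy as np


def attention_step(decoder_state, encoder_states):
    """One decoding step of dot-product attention.

    decoder_state:  [hidden]
    encoder_states: [src_len, hidden]
    """
    scores = encoder_states @ decoder_state   # alignment scores, [src_len]
    scores = scores - scores.max()            # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax, [src_len]
    context = weights @ encoder_states        # weighted sum, [hidden]
    return weights, context


# Three source positions with 2-D hidden states (illustrative values).
encoder_states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
decoder_state = np.array([2.0, 0.0])

weights, context = attention_step(decoder_state, encoder_states)
print("attention weights:", np.round(weights, 3))
print("context vector:", np.round(context, 3))
```

Source positions whose hidden states align with the decoder state receive most of the weight, and the context vector is dominated by them. Additive (Bahdanau) attention replaces the dot product with a small feed-forward scorer but keeps the same softmax and weighted-sum steps.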

Why Attention is Critical

    It avoids compressing an entire sentence into a single vector.

    It allows the model to handle long sentences effectively.

    It enables better word alignment, improving translation accuracy.

    It provides interpretability by showing which source words influenced each output word.

Overall, attention mechanisms greatly enhance fluency, context handling, and accuracy in modern NMT systems.

5. What ethical considerations must be addressed when using generative AI for creative content such as poetry or storytelling?

Using generative AI for creative content like poetry or storytelling raises several ethical concerns that must be addressed to ensure responsible use and fair creative practices.

a. Authorship and Ownership: AI-generated content blurs the line between creator and tool. Clear policies are needed to define who owns the output—the user, the developer, or the model’s training contributors.

b. Originality and Plagiarism:
Generative models may inadvertently reproduce patterns or phrases from their training data. Safeguards must ensure outputs do not mimic copyrighted works or replicate specific authors' styles too closely without permission.

c. Bias and Representation:
Training data often reflects societal biases. This can lead to creative outputs that reinforce stereotypes or exclude marginalized voices. Ethical use requires bias-aware training and fairness auditing.

d. Cultural Sensitivity:
Storytelling involves cultural symbols, myths, and identities. AI models must avoid misrepresenting cultures, appropriating traditions, or generating insensitive narratives.

e. Transparency:
Audiences should know when content is AI-generated. Lack of transparency can mislead readers or disrupt trust in creative industries.

f. Misuse and Manipulation:
Creative AI can be exploited to produce emotionally manipulative stories, propaganda, or deceptive narratives. Controls are essential to prevent harmful applications.

g. Impact on Human Creators:
Generative AI affects livelihoods in writing, art, and publishing. Ethical deployment requires supporting, not replacing, human creativity.

Overall, responsible use demands fairness, transparency, cultural respect, and safeguards against harm.

6. Use the following small text dataset to train a simple Variational
Autoencoder (VAE) for text reconstruction:
["The sky is blue", "The sun is bright", "The grass is green",
"The night is dark", "The stars are shining"]

a. Preprocess the data (tokenize and pad the sequences).

b. Build a basic VAE model for text reconstruction.

c. Train the model and show how it reconstructs or generates similar sentences.
Include your code, explanation, and sample outputs.


In [None]:
# Simple VAE for text reconstruction (small dataset)

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, RepeatVector, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import random
import os

In [None]:
# Reproducibility (not guaranteed across all ops/hardware)
seed = 42
np.random.seed(seed)
tf.random.set_seed(seed)
random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)

In [None]:
# Data
raw_texts = [
    "The sky is blue",
    "The sun is bright",
    "The grass is green",
    "The night is dark",
    "The stars are shining"
]

In [None]:
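# Add explicit start/end tokens (helps decoder).
# Note: the Tokenizer's default filters strip "<" and ">", so these markers
# appear as "start" and "end" in the vocabulary printed below.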
texts = [f"<start> {s.lower()} <end>" for s in raw_texts]

# ------------- Tokenize & Pad -------------
tokenizer = Tokenizer(oov_token="<oov>")
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index
index_word = {i: w for w, i in word_index.items()}

In [None]:
# Reserve 0 for padding
vocab_size = len(word_index) + 1  # +1 for padding=0
print("Vocab size:", vocab_size)
print("Word index:", word_index)

sequences = tokenizer.texts_to_sequences(texts)
max_len = max(len(s) for s in sequences)
padded = pad_sequences(sequences, maxlen=max_len, padding='post')

print("Max sequence length:", max_len)
print("Padded sequences:\n", padded)

Vocab size: 17
Word index: {'<oov>': 1, 'start': 2, 'the': 3, 'end': 4, 'is': 5, 'sky': 6, 'blue': 7, 'sun': 8, 'bright': 9, 'grass': 10, 'green': 11, 'night': 12, 'dark': 13, 'stars': 14, 'are': 15, 'shining': 16}
Max sequence length: 6
Padded sequences:
 [[ 2  3  6  5  7  4]
 [ 2  3  8  5  9  4]
 [ 2  3 10  5 11  4]
 [ 2  3 12  5 13  4]
 [ 2  3 14 15 16  4]]
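
For readers who want to see what the tokenizer and padding step do without Keras, here is a dependency-free sketch. It assumes whitespace tokenization, frequency-ordered indices with ties kept in first-seen order, index 1 for out-of-vocabulary words, and index 0 for padding; `encode` and `pad_post` are hypothetical helper names.

```python
from collections import Counter

raw_texts = [
    "The sky is blue",
    "The sun is bright",
    "The grass is green",
    "The night is dark",
    "The stars are shining",
]
# Plain "start"/"end" markers, matching how they appear in the printed vocabulary.
texts = [f"start {s.lower()} end" for s in raw_texts]

# Most frequent words get the smallest indices; ties keep first-seen order.
counts = Counter(word for t in texts for word in t.split())
ordered = sorted(counts, key=lambda w: -counts[w])
word_index = {"<oov>": 1}
for i, word in enumerate(ordered, start=2):
    word_index[word] = i


def encode(text):
    """Map each whitespace token to its index (1 for unknown words)."""
    return [word_index.get(w, 1) for w in text.split()]


def pad_post(seq, max_len):
    """Right-pad with 0; assumes len(seq) <= max_len."""
    return seq + [0] * (max_len - len(seq))


sequences = [encode(t) for t in texts]
max_len = max(len(s) for s in sequences)
padded = [pad_post(s, max_len) for s in sequences]

for row in padded:
    print(row)
print(pad_post(encode("start the end"), max_len))  # a shorter input gets zeros
```

Every sentence in this dataset has the same length, so padding only becomes visible for a shorter input such as the last line.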


In [None]:
# VAE model
embedding_dim = 32
encoder_lstm_units = 64
latent_dim = 16
decoder_lstm_units = 64

In [None]:
# Encoder
encoder_inputs = Input(shape=(max_len,), name="encoder_input")
x = Embedding(input_dim=vocab_size, output_dim=embedding_dim,
              mask_zero=True, name="embedding")(encoder_inputs)
encoder_lstm = LSTM(encoder_lstm_units, name="encoder_lstm")(x)
z_mean = Dense(latent_dim, name="z_mean")(encoder_lstm)
z_log_var = Dense(latent_dim, name="z_log_var")(encoder_lstm)

In [None]:
# Sampling (reparameterization trick)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=(tf.shape(z_mean)[0], latent_dim), mean=0., stddev=1.)
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

7. Use a pre-trained GPT model (like GPT-2 or GPT-3) to translate a short
English paragraph into French and German. Provide the original and translated text.

In [None]:
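# Hugging Face Transformers
# Note: t5-small is an encoder-decoder T5 model rather than a GPT model; it ships
# with English-to-French and English-to-German translation tasks.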
from transformers import pipeline

translator_fr = pipeline("translation_en_to_fr", model="t5-small")
translator_de = pipeline("translation_en_to_de", model="t5-small")

text = "Artificial intelligence is transforming the way we work, learn, and communicate. It helps people solve complex problems, automate routine tasks, and create new opportunities across industries."

print("French:", translator_fr(text)[0]['translation_text'])
print("German:", translator_de(text)[0]['translation_text'])



French: L'intelligence artificielle transforme notre façon de travailler, d'apprendre et de communiquer et aide les gens à résoudre des problèmes complexes, à automatiser les tâches courantes et à créer de nouvelles possibilités dans l'ensemble des industries.
German: Die künstliche Intelligenz verändert die Art und Weise, wie wir arbeiten, lernen und kommunizieren.


8. Implement a simple attention-based encoder-decoder model for
English-to-Spanish translation using Tensorflow or PyTorch.

In [None]:
# Run this cell in Colab (Runtime: GPU / T4)
import math
import random
import time
from typing import List, Tuple

import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

In [None]:
# Toy dataset (English -> Spanish)

pairs = [
    ("hello", "hola"),
    ("how are you", "como estas"),
    ("i am fine", "estoy bien"),
    ("thank you", "gracias"),
    ("good morning", "buenos dias"),
    ("see you later", "hasta luego"),
    ("what is your name", "como te llamas"),
    ("my name is avinash", "me llamo avinash"),
    ("i love you", "te quiero"),
    ("i don't know", "no se"),
]

In [None]:
# Simple tokenizer + Vocab

class Vocab:
    def __init__(self, tokens=None, min_freq=1, reserved_tokens=None):
        self.freq = {}
        self.itos = []
        self.stoi = {}
        reserved_tokens = reserved_tokens or []
        for t in reserved_tokens:
            self.add_token(t)
        if tokens:
            for tok in tokens:
                self.add_token(tok, count=True)
        # later finalize with build()

    def add_token(self, token, count=False):
        if count:
            self.freq[token] = self.freq.get(token, 0) + 1
        else:
            if token not in self.stoi:
                idx = len(self.itos)
                self.itos.append(token)
                self.stoi[token] = idx

    def build(self, min_freq=1):
        # Add special tokens if not present
        specials = ["<pad>", "<sos>", "<eos>", "<unk>"]
        for s in specials:
            if s not in self.stoi:
                idx = len(self.itos)
                self.itos.append(s)
                self.stoi[s] = idx
        # Add tokens sorted by freq
        sorted_tokens = sorted([(t, f) for t, f in self.freq.items() if f >= min_freq],
                               key=lambda x: -x[1])
        for t, _ in sorted_tokens:
            if t not in self.stoi:
                idx = len(self.itos)
                self.itos.append(t)
                self.stoi[t] = idx

    def __len__(self):
        return len(self.itos)

def tokenize(s: str) -> List[str]:
    return s.lower().strip().split()


In [None]:
# Build vocabularies
src_tokens = []
tgt_tokens = []
for s, t in pairs:
    src_tokens.extend(tokenize(s))
    tgt_tokens.extend(tokenize(t))

src_vocab = Vocab(tokens=src_tokens)  # the constructor counts token frequencies
src_vocab.build()                     # assigns indices: specials first, then by frequency

tgt_vocab = Vocab(tokens=tgt_tokens)
tgt_vocab.build()

PAD_IDX = tgt_vocab.stoi["<pad>"]
SOS_IDX = tgt_vocab.stoi["<sos>"]
EOS_IDX = tgt_vocab.stoi["<eos>"]
UNK_IDX = tgt_vocab.stoi["<unk>"]


In [None]:
# Dataset & DataLoader

def numericalize_src(sentence):
    return [src_vocab.stoi.get(tok, src_vocab.stoi["<unk>"]) for tok in tokenize(sentence)]

def numericalize_tgt(sentence):
    # add <sos> and <eos>
    tokens = [SOS_IDX] + [tgt_vocab.stoi.get(tok, UNK_IDX) for tok in tokenize(sentence)] + [EOS_IDX]
    return tokens

class TranslationDataset(Dataset):
    def __init__(self, pairs: List[Tuple[str, str]]):
        self.data = pairs

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        src, tgt = self.data[idx]
        src_ids = torch.tensor(numericalize_src(src), dtype=torch.long)
        tgt_ids = torch.tensor(numericalize_tgt(tgt), dtype=torch.long)
        return src_ids, tgt_ids

def collate_fn(batch):
    src_batch, tgt_batch = zip(*batch)
    src_lens = [len(s) for s in src_batch]
    tgt_lens = [len(t) for t in tgt_batch]
    src_padded = pad_sequence(src_batch, batch_first=True, padding_value=src_vocab.stoi["<pad>"])  # <pad> is index 0 in the source vocab
    tgt_padded = pad_sequence(tgt_batch, batch_first=True, padding_value=PAD_IDX)
    return src_padded, torch.tensor(src_lens), tgt_padded, torch.tensor(tgt_lens)

dataset = TranslationDataset(pairs)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collate_fn)

In [None]:
# Model: Encoder, Bahdanau Attention, Decoder

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers=1, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim, padding_idx=0)
        self.rnn = nn.GRU(emb_dim, hid_dim, num_layers=n_layers, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hid_dim * 2, hid_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, src_len):
        # src: [batch, src_len]
        embedded = self.dropout(self.embedding(src))  # [b, src_len, emb_dim]
        # pack_padded_sequence could be used — we keep it simple here
        outputs, hidden = self.rnn(embedded)  # outputs: [b, src_len, hid*2]
        # combine bidirectional hidden states
        # hidden: [n_layers*2, b, hid]
        # We'll concat last forward and backward and pass through fc to get decoder init
        h_forward = hidden[-2,:,:]
        h_backward = hidden[-1,:,:]
        h_cat = torch.cat((h_forward, h_backward), dim=1)  # [b, hid*2]
        init_hidden = torch.tanh(self.fc(h_cat)).unsqueeze(0)  # [1, b, hid]
        return outputs, init_hidden  # outputs for attention

In [None]:
class BahdanauAttention(nn.Module):
    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        self.attn = nn.Linear(enc_hid_dim + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, hidden, encoder_outputs, mask=None):
        # hidden: [1, b, dec_hid] -> [b, dec_hid]
        # encoder_outputs: [b, src_len, enc_hid]
        bsz = encoder_outputs.size(0)
        src_len = encoder_outputs.size(1)
        hidden = hidden.squeeze(0).unsqueeze(1).repeat(1, src_len, 1)  # [b, src_len, dec_hid]
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))  # [b, src_len, dec_hid]
        attention = self.v(energy).squeeze(2)  # [b, src_len]
        if mask is not None:
            attention = attention.masked_fill(mask == 0, -1e10)
        return torch.softmax(attention, dim=1)  # [b, src_len]

In [None]:
class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, enc_hid_dim, attention, dropout=0.1):
        super().__init__()
        self.output_dim = output_dim
        self.attention = attention
        self.embedding = nn.Embedding(output_dim, emb_dim, padding_idx=PAD_IDX)
        self.rnn = nn.GRU(enc_hid_dim + emb_dim, hid_dim, batch_first=True)
        self.fc_out = nn.Linear(enc_hid_dim + hid_dim + emb_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input_token, hidden, encoder_outputs, mask=None):
        # input_token: [b] (current token indices)
        input_token = input_token.unsqueeze(1)  # [b,1]
        embedded = self.dropout(self.embedding(input_token))  # [b,1,emb_dim]
        a = self.attention(hidden, encoder_outputs, mask)  # [b, src_len]
        a = a.unsqueeze(1)  # [b,1,src_len]
        # weighted sum
        weighted = torch.bmm(a, encoder_outputs)  # [b,1,enc_hid]
        rnn_input = torch.cat((embedded, weighted), dim=2)  # [b,1, emb + enc_hid]
        output, hidden = self.rnn(rnn_input, hidden)  # output: [b,1,dec_hid], hidden: [1,b,dec_hid]
        output = output.squeeze(1)
        weighted = weighted.squeeze(1)
        embedded = embedded.squeeze(1)
        prediction = self.fc_out(torch.cat((output, weighted, embedded), dim=1))  # [b, output_dim]
        return prediction, hidden, a.squeeze(1)  # attn weights [b, src_len]

In [None]:
# Instantiate model
INPUT_DIM = len(src_vocab)
OUTPUT_DIM = len(tgt_vocab)
ENC_EMB_DIM = 64
DEC_EMB_DIM = 64
HID_DIM = 128

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM).to(device)
attn = BahdanauAttention(enc_hid_dim=HID_DIM*2, dec_hid_dim=HID_DIM).to(device)  # encoder outputs hid*2
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, enc_hid_dim=HID_DIM*2, attention=attn).to(device)

In [None]:
# Training setup

optimizer = optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=0.001)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

def create_src_mask(src_tensor, src_lens):
    # src_tensor: [b, src_len]
    # Vocab.build() inserts "<pad>" first, so the source pad index is 0 ("<unk>" is 3)
    mask = (src_tensor != src_vocab.stoi["<pad>"]).to(device)  # [b, src_len]; True for real tokens
    return mask

def train_epoch(encoder, decoder, dataloader, optimizer, criterion, clip=1.0, teacher_forcing_ratio=0.5):
    encoder.train()
    decoder.train()
    epoch_loss = 0
    for src_batch, src_lens, tgt_batch, tgt_lens in dataloader:
        src_batch = src_batch.to(device)
        tgt_batch = tgt_batch.to(device)

        optimizer.zero_grad()
        encoder_outputs, hidden = encoder(src_batch, src_lens)
        # encoder_outputs: [b, src_len, hid*2]
        batch_size = src_batch.size(0)
        trg_len = tgt_batch.size(1)
        outputs = torch.zeros(batch_size, trg_len, OUTPUT_DIM).to(device)
        input_token = tgt_batch[:, 0]  # first token is SOS

        mask = create_src_mask(src_batch, src_lens)

        for t in range(1, trg_len):
            pred, hidden, _ = decoder(input_token, hidden, encoder_outputs, mask)
            outputs[:, t, :] = pred
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = pred.argmax(1)
            input_token = tgt_batch[:, t] if teacher_force else top1

        # reshape for loss: skip time step 0 (the <sos> position), which is never predicted
        outputs_flat = outputs[:, 1:].reshape(-1, OUTPUT_DIM)
        targets_flat = tgt_batch[:, 1:].reshape(-1)
        loss = criterion(outputs_flat, targets_flat)
        loss.backward()
        # gradient clipping
        torch.nn.utils.clip_grad_norm_(list(encoder.parameters()) + list(decoder.parameters()), clip)
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(dataloader)

In [None]:
def evaluate(encoder, decoder, dataloader, criterion):
    encoder.eval()
    decoder.eval()
    epoch_loss = 0
    with torch.no_grad():
        for src_batch, src_lens, tgt_batch, tgt_lens in dataloader:
            src_batch = src_batch.to(device)
            tgt_batch = tgt_batch.to(device)
            encoder_outputs, hidden = encoder(src_batch, src_lens)
            batch_size = src_batch.size(0)
            trg_len = tgt_batch.size(1)
            outputs = torch.zeros(batch_size, trg_len, OUTPUT_DIM).to(device)
            input_token = tgt_batch[:, 0]
            mask = create_src_mask(src_batch, src_lens)
            for t in range(1, trg_len):
                pred, hidden, _ = decoder(input_token, hidden, encoder_outputs, mask)
                outputs[:, t, :] = pred
                input_token = pred.argmax(1)
            outputs_flat = outputs[:, 1:].reshape(-1, OUTPUT_DIM)
            targets_flat = tgt_batch[:, 1:].reshape(-1)
            loss = criterion(outputs_flat, targets_flat)
            epoch_loss += loss.item()
    return epoch_loss / len(dataloader)

In [None]:
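# Train on the toy data. There is no held-out set, so "Val Loss" below is computed on the
# same training pairs with greedy decoding (no teacher forcing); it measures memorization,
# not generalization.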

N_EPOCHS = 200
best_loss = float('inf')
for epoch in range(1, N_EPOCHS+1):
    train_loss = train_epoch(enc, dec, dataloader, optimizer, criterion, teacher_forcing_ratio=0.5)
    val_loss = evaluate(enc, dec, dataloader, criterion)
    if val_loss < best_loss:
        best_loss = val_loss
        torch.save({
            'enc_state': enc.state_dict(),
            'dec_state': dec.state_dict(),
            'src_vocab': src_vocab.itos,
            'tgt_vocab': tgt_vocab.itos
        }, "seq2seq_attention.pt")
    if epoch % 20 == 0 or epoch == 1:
        print(f"Epoch {epoch:3d} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f}")

Epoch   1 | Train Loss: 3.0964 | Val Loss: 2.9138
Epoch  20 | Train Loss: 0.0249 | Val Loss: 0.0216
Epoch  40 | Train Loss: 0.0059 | Val Loss: 0.0048
Epoch  60 | Train Loss: 0.0025 | Val Loss: 0.0024
Epoch  80 | Train Loss: 0.0017 | Val Loss: 0.0014
Epoch 100 | Train Loss: 0.0010 | Val Loss: 0.0010
Epoch 120 | Train Loss: 0.0007 | Val Loss: 0.0007
Epoch 140 | Train Loss: 0.0007 | Val Loss: 0.0005
Epoch 160 | Train Loss: 0.0005 | Val Loss: 0.0004
Epoch 180 | Train Loss: 0.0003 | Val Loss: 0.0003
Epoch 200 | Train Loss: 0.0003 | Val Loss: 0.0003


In [None]:
# Inference: translate sentence

def translate(sentence: str, max_len=20):
    enc.eval(); dec.eval()
    src_ids = torch.tensor(numericalize_src(sentence)).unsqueeze(0).to(device)  # [1, src_len]
    src_len = torch.tensor([src_ids.size(1)])
    with torch.no_grad():
        encoder_outputs, hidden = enc(src_ids, src_len)
        input_token = torch.tensor([SOS_IDX], device=device)
        mask = create_src_mask(src_ids, src_len)
        tokens = []
        for _ in range(max_len):
            pred, hidden, attn = dec(input_token, hidden, encoder_outputs, mask)
            top1 = pred.argmax(1).item()
            if top1 == EOS_IDX:
                break
            tokens.append(tgt_vocab.itos[top1])
            input_token = torch.tensor([top1], device=device)
    return " ".join(tokens)

In [None]:
# Test translations
test_sentences = [
    "hello",
    "how are you",
    "i am fine",
    "what is your name",
    "my name is avinash",
    "see you later"
]
print("\nTranslations:")
for s in test_sentences:
    print(f"{s} -> {translate(s)}")


Translations:
hello -> hola
how are you -> como estas
i am fine -> estoy bien
what is your name -> como te llamas
my name is avinash -> me llamo avinash
see you later -> hasta luego


9. Use the following short poetry dataset to simulate poem generation with a pre-trained GPT model:

["Roses are red, violets are blue,",
"Sugar is sweet, and so are you.",
"The moon glows bright in silent skies,",
"A bird sings where the soft wind sighs."]

Using this dataset as a reference for poetic structure and language, generate a new 2-4 line poem using a pre-trained GPT model (such as GPT-2). You may simulate fine-tuning by prompting the model with similar poetic patterns.

Include your code, the prompt used, and the generated poem in your answer.

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

In [None]:
# Load GPT-2 model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")


In [None]:
# Small poetry dataset
poems = [
    "Roses are red, violets are blue,",
    "Sugar is sweet, and so are you.",
    "The moon glows bright in silent skies,",
    "A bird sings where the soft wind sighs."
]

In [None]:
# Simulated fine-tuning via prompting
prompt = (
    "Roses are red, violets are blue,\n"
    "Sugar is sweet, and so are you.\n"
    "The moon glows bright in silent skies,\n"
    "A bird sings where the soft wind sighs.\n\n"
    "Write a new poem in the same gentle style:\n"
)

inputs = tokenizer.encode(prompt, return_tensors="pt")

In [None]:
# Generate poem
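output = model.generate(
    inputs,
    attention_mask=torch.ones_like(inputs),  # single unpadded prompt: attend to every token
    pad_token_id=tokenizer.eos_token_id,     # GPT-2 has no pad token; reuse EOS explicitly
    max_length=80,  # counts the prompt tokens too, so only the remainder is left for the poem
    num_return_sequences=1,
    temperature=0.8,
    top_p=0.9,
    do_sample=True
)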



In [None]:
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Roses are red, violets are blue,
Sugar is sweet, and so are you.
The moon glows bright in silent skies,
A bird sings where the soft wind sighs.

Write a new poem in the same gentle style:

You are your mother, and you are my love

And you are my father.

The moon gl
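
The `temperature` and `top_p` arguments passed to `generate` above control how the next token is drawn. The following is a rough NumPy sketch of temperature scaling followed by nucleus (top-p) filtering, written under simplifying assumptions; it is not the library's exact implementation, and `top_p_sample` and the example logits are hypothetical.

```python
import numpy as np


def top_p_sample(logits, temperature=0.8, top_p=0.9, rng=None):
    """Sample one token index using temperature scaling and nucleus filtering."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature  # <1 sharpens, >1 flattens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    order = np.argsort(-probs)                # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    kept_probs = probs[keep] / probs[keep].sum()  # renormalize over the nucleus
    return int(rng.choice(keep, p=kept_probs))


# Four-token vocabulary with two plausible tokens and two unlikely ones.
logits = [5.0, 4.0, 0.0, -5.0]
rng = np.random.default_rng(0)
draws = [top_p_sample(logits, rng=rng) for _ in range(200)]
print("tokens ever sampled:", sorted(set(draws)))
```

With these logits the two unlikely tokens fall outside the nucleus and are never drawn, which is how top-p sampling trades a little diversity for fewer incoherent continuations.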


10. Imagine you are building a creative writing assistant for a publishing
company. The assistant should generate story plots and character descriptions using Generative AI. Describe how you would design the system, including model selection, training data, bias mitigation, and evaluation methods. Explain the real-world challenges you might face.


Designing a creative-writing assistant for a publishing company requires a balance of model quality, controllability, ethical safeguards, and editorial usability. The system would use a strong instruction-tuned Large Language Model as its core generator, supported by lightweight fine-tuned adapters trained on licensed in-house manuscripts and curated public-domain texts. This ensures the assistant captures the publisher’s preferred style without exposing the system to copyright issues. A retrieval component can supply style guides, genre conventions, and prior character notes so the model stays consistent with existing worlds and avoids factual drift.

Training data must be carefully prepared: deduplicated, licensed, genre-tagged, age-rated, and enriched with diverse authorial voices to prevent stylistic collapse or cultural homogenization. Controlled fine-tuning and prompt templates help the model respect user inputs such as genre, tone, pacing, and forbidden themes. Bias mitigation includes balanced datasets, stereotype-countering synthetic examples, toxicity filters, and mandatory human review for sensitive topics. A multi-layered safety system checks for offensive language, unintended stereotypes, and copyright resemblance.

Evaluation combines automated metrics—coherence, diversity, style alignment, and attribute fidelity—with editorial judgment. Human reviewers rate creativity, usefulness, clarity, and edit effort. Operational monitoring tracks latency, cost, and safety flags. Editors interact through a simple interface where they can adjust tone or novelty, request multiple variants, compare drafts, and export results.
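
As one example of an automated diversity metric, the sketch below computes distinct-n: the fraction of n-grams across a batch of generated plots that are unique. It assumes simple lowercase whitespace tokenization, and the function name and sample plots are hypothetical; a production system would pair it with the human ratings described above.

```python
def distinct_n(texts, n=2):
    """Share of unique n-grams across all texts (1.0 means no repetition)."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# Hypothetical plot variants returned for the same editor request.
variants = [
    "a young cartographer maps a city that rearranges itself each night",
    "a retired detective receives letters from a case she never solved",
    "a young cartographer maps a city that vanishes each night",
]
print("distinct-1:", round(distinct_n(variants, n=1), 3))
print("distinct-2:", round(distinct_n(variants, n=2), 3))
```

A score that drops over time across sampled variants would be one signal that the assistant is drifting toward formulaic output.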

Real-world challenges include hallucinations, subtle bias, and subjective notions of “good writing.” Maintaining legal compliance is nontrivial, especially around copyrighted works and derivative content. Creativity evaluation is costly, requiring ongoing human involvement. Ensuring the assistant produces fresh ideas rather than formulaic outputs demands constant dataset refresh and preference-based fine-tuning. Finally, balancing model quality with inference cost and speed is essential for a smooth editorial workflow.

This design aims to provide fast ideation support while preserving editorial control, originality, and ethical integrity.

