# Generative AI Insem 2 - Code with Problem Statements
This notebook contains runnable code examples, each preceded by its problem statement.

## 1. Basic Data Preprocessing for Generative AI
**Problem Statement:** Generate synthetic data and scale it between 0 and 1 using Min-Max scaling.

In [None]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

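# Generate a 10x5 matrix of synthetic integers in [0, 255); randint excludes the upper bound
data = np.random.randint(0, 255, (10, 5))
print("Original Data:\n", data)

# Scale each column independently to [0, 1] (MinMaxScaler works per feature)
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)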
print("Scaled Data:\n", scaled_data)
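
To make the scaling step concrete, the sketch below applies the min-max formula `(x - min) / (max - min)` column by column with plain NumPy. The seeded generator, the name `manual_scaled`, and the guard for constant columns are assumptions added for illustration, not part of the original cell.

```python
import numpy as np

# Hypothetical sample data; the seed is an assumption added for reproducibility
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(10, 5)).astype(float)

# Column-wise min-max scaling: (x - min) / (max - min)
col_min = data.min(axis=0)
col_max = data.max(axis=0)
# Guard against constant columns, where max - min would be zero
col_range = np.where(col_max > col_min, col_max - col_min, 1.0)
manual_scaled = (data - col_min) / col_range

print("Column minima after scaling:", manual_scaled.min(axis=0))
print("Column maxima after scaling:", manual_scaled.max(axis=0))
```

Every column ends up with minimum 0 and maximum 1, which is the per-feature behaviour the problem statement asks for.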

## 2. Visualizing Data Distributions
**Problem Statement:** Generate synthetic data for two groups and visualize their distributions.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
data_group1 = np.random.normal(loc=50, scale=10, size=500)
data_group2 = np.random.normal(loc=200, scale=15, size=500)

# Plot histogram
plt.hist(data_group1, label='Group 1', color='blue', alpha=0.7)
plt.hist(data_group2, label='Group 2', color='green', alpha=0.7)
plt.title("Data Distribution")
plt.xlabel("Data Values")
plt.ylabel("Frequency")
plt.legend()
plt.show()
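
As a numeric companion to the plot, the following sketch bins both groups on one shared set of bin edges so their counts are directly comparable. The seed, the choice of 40 bins, and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility
group1 = rng.normal(loc=50, scale=10, size=500)
group2 = rng.normal(loc=200, scale=15, size=500)

# One set of bin edges spanning both groups, so bar widths match
edges = np.histogram_bin_edges(np.concatenate([group1, group2]), bins=40)
counts1, _ = np.histogram(group1, bins=edges)
counts2, _ = np.histogram(group2, bins=edges)

print("Group 1 mean/std:", group1.mean(), group1.std())
print("Group 2 mean/std:", group2.mean(), group2.std())
print("Samples binned per group:", counts1.sum(), counts2.sum())
```

The same `edges` array can be passed as `bins=edges` to both `plt.hist` calls if equal bar widths are wanted in the figure.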

## 3. TensorFlow Computation Graph with Eager Execution
**Problem Statement:** Perform computations in eager and graph execution modes.

In [None]:
import tensorflow as tf

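# TensorFlow 2.x runs operations eagerly by default, so results are available immediately
a = tf.constant([5.0, 3.0])
b = tf.constant([2.0, 7.0])
c = a + b
print("Eager Execution Output:", c.numpy())

# tf.function traces the Python function into a graph on its first call
@tf.function
def multiply_tensors(x, y):
    return x * y

result = multiply_tensors(a, b)
print("Graph Mode Output:", result.numpy())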

## 4. Word2Vec Embeddings
**Problem Statement:** Train a Word2Vec model and find word similarities.

In [None]:
from gensim.models import Word2Vec

sentences = [
    ["artificial", "intelligence", "is", "cool"],
    ["machine", "learning", "is", "fun"],
    ["ai", "learning", "uses", "neural", "networks"]
]

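# sg=1 selects skip-gram; min_count=1 keeps every word of this tiny corpus
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1)

# With only three sentences, the similarity scores are illustrative rather than meaningful
print("Vector for 'learning':", model.wv['learning'])
print("Most similar to 'learning':", model.wv.most_similar('learning'))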
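
To show what a skip-gram model with `window=2` is trained on, the sketch below enumerates (center, context) word pairs for one sentence. The helper `skipgram_pairs` is hypothetical and simplified: real implementations typically add refinements such as negative sampling and randomly shrinking the window, which are ignored here.

```python
def skipgram_pairs(sentence, window=2):
    """Return (center, context) pairs for one tokenized sentence."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, sentence[j]))
    return pairs

pairs = skipgram_pairs(["machine", "learning", "is", "fun"], window=2)
for center, context in pairs:
    print(center, "->", context)
```

Words more than two positions apart, such as "machine" and "fun", never form a pair, which is why the window size controls how local the learned similarities are.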

## 5. GloVe Pre-trained Embeddings
**Problem Statement:** Load GloVe embeddings and compute similarity.

In [None]:
import gensim.downloader as api

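# Downloads the pre-trained 50-dimensional vectors on first use, then loads them from the local cache
glove_model = api.load("glove-wiki-gigaword-50")

print("Vector for 'computer':", glove_model['computer'])
# similarity() returns the cosine similarity of the two word vectors
print("Similarity between 'computer' and 'laptop':", glove_model.similarity('computer', 'laptop'))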
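
The similarity score reported for two word vectors is their cosine similarity. A minimal NumPy sketch of that formula follows; the three small vectors are made-up stand-ins, not real GloVe embeddings.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: u.v / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])    # same direction as a
c = np.array([-3.0, 0.0, 1.0])   # orthogonal to a

print("a vs b:", cosine_similarity(a, b))
print("a vs c:", cosine_similarity(a, c))
```

Because the measure depends only on direction, `a` and `b` score as identical even though their lengths differ.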

## 6. BERT Embeddings
**Problem Statement:** Extract embeddings using BERT.

In [None]:
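import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Generative AI creates realistic images", return_tensors="pt")
# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, seq_len, hidden_size)
print("BERT Output Shape:", outputs.last_hidden_state.shape)
# Position 0 is the [CLS] token; show its first 5 dimensions
print("[CLS] embedding (first 5 dims):", outputs.last_hidden_state[0][0][:5])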
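
A common way to turn per-token embeddings into a single sentence vector is masked mean pooling. The sketch below illustrates the idea on a tiny hand-written array instead of real BERT output; `mean_pool`, `hidden`, and `mask` are hypothetical names, and the mask plays the role of the tokenizer's attention mask.

```python
import numpy as np

def mean_pool(hidden, mask):
    """Average token vectors over the sequence, ignoring padded positions."""
    mask = mask[..., None].astype(hidden.dtype)      # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)             # (batch, hidden_size)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# One sequence of three tokens with 2-dimensional embeddings; the last token is padding
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])

sentence_vec = mean_pool(hidden, mask)
print("Sentence vector:", sentence_vec)
```

The padded token is excluded, so the result is the average of the first two token vectors only.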

## 7. FAISS Similarity Search
**Problem Statement:** Perform nearest neighbor search using FAISS.

In [None]:
import faiss
import numpy as np

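# Index 5 random 4-dimensional vectors (FAISS expects float32)
data = np.random.random((5, 4)).astype('float32')
index = faiss.IndexFlatL2(4)  # exact (brute-force) L2 search
index.add(data)

query = np.random.random((1, 4)).astype('float32')
# IndexFlatL2 reports squared L2 distances, sorted from nearest to farthest
distances, indices = index.search(query, k=3)

print("Query Vector:", query)
print("Top 3 Nearest Indices:", indices)
print("Squared L2 Distances:", distances)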
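
A flat L2 index performs an exhaustive search, so its result can be sketched in a few lines of NumPy. The helper `knn_l2` and the small hand-picked vectors are assumptions for illustration; like `IndexFlatL2`, the sketch returns squared L2 distances.

```python
import numpy as np

def knn_l2(database, query, k):
    """Brute-force k-nearest-neighbor search using squared L2 distance."""
    diffs = database[None, :, :] - query[:, None, :]   # (n_queries, n_db, dim)
    sq_dists = (diffs ** 2).sum(axis=-1)               # (n_queries, n_db)
    idx = np.argsort(sq_dists, axis=1)[:, :k]          # nearest first
    return np.take_along_axis(sq_dists, idx, axis=1), idx

database = np.array([[0, 0], [1, 0], [0, 3], [5, 5]], dtype="float32")
query = np.array([[0.9, 0.1]], dtype="float32")

dists, idx = knn_l2(database, query, k=2)
print("Nearest indices:", idx)
print("Squared L2 distances:", dists)
```

This costs one distance computation per stored vector for every query, which is fine for small collections and is the baseline that approximate indexes try to beat.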

## 8. Self-Attention Mechanism
**Problem Statement:** Simulate self-attention using PyTorch.

In [None]:
import torch
import torch.nn.functional as F

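x = torch.rand(1, 3, 4)  # (batch, seq_len, features)
# No learned projections here: queries, keys and values are all the raw input
Q, K, V = x, x, x

d_k = K.size(-1)
scores = torch.matmul(Q, K.transpose(-2, -1)) / (d_k ** 0.5)  # scaled dot product
weights = F.softmax(scores, dim=-1)  # each row sums to 1
output = torch.matmul(weights, V)    # weighted sum of values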

print("Attention Weights:", weights)
print("Output:", output)
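
In a Transformer layer, Q, K and V are usually separate linear projections of the input, not the input itself. The NumPy sketch below adds that step, assuming random matrices `W_q`, `W_k` and `W_v` as stand-ins for learned weights; all names and sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility
seq_len, d_model, d_k = 3, 4, 4
x = rng.random((seq_len, d_model))

# Random stand-ins for the learned projection matrices
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_k)      # scaled dot-product scores
weights = softmax(scores, axis=-1)   # one attention distribution per query
output = weights @ V                 # weighted sum of value vectors

print("Attention weights:\n", weights)
print("Output:\n", output)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors.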

## 9. Simulating Diffusion Denoising
**Problem Statement:** Simulate iterative denoising in a diffusion process.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

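image = np.random.rand(28, 28)
# Fix the color scale to [0, 1]; otherwise imshow rescales every frame and all steps look identical
plt.imshow(image, cmap='gray', vmin=0, vmax=1)
plt.title("Step 0: Noise")
plt.show()

for step in range(1, 4):
    image = image * 0.9  # shrink intensities by 10% as a crude stand-in for noise removal
    plt.imshow(image, cmap='gray', vmin=0, vmax=1)
    plt.title(f"Step {step}: Denoising")
    plt.show()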
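
The loop above only rescales pixel values. A slightly closer toy picture of iterative denoising is to start from a noisy version of a known image and remove part of the estimated noise at every step. In the sketch below the "denoiser" is an oracle that knows the clean image, which is an assumption standing in for a trained network; the square target, the seed, and the step size 0.5 are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility

# Hypothetical clean target: a bright square on a dark background
clean = np.zeros((28, 28))
clean[8:20, 8:20] = 1.0

# Start from a heavily noised version of the target
x = clean + rng.normal(scale=1.0, size=clean.shape)

errors = [np.mean((x - clean) ** 2)]
for step in range(1, 4):
    predicted_noise = x - clean      # oracle stand-in for a trained denoiser
    x = x - 0.5 * predicted_noise    # remove half of the remaining noise
    errors.append(np.mean((x - clean) ** 2))
    print(f"Step {step}: MSE to clean image = {errors[-1]:.4f}")
```

Halving the residual each step cuts the mean squared error to a quarter, so the image converges toward the clean target rather than simply fading.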

## 10. FID Calculation
**Problem Statement:** Compute the Fréchet Inception Distance (FID) between two Gaussian distributions described by their means and covariance matrices.

In [None]:
from scipy.linalg import sqrtm
import numpy as np

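def calculate_fid(mu1, sigma1, mu2, sigma2):
    # FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
    diff = mu1 - mu2
    covmean = sqrtm(sigma1.dot(sigma2))
    # sqrtm can return tiny imaginary parts from numerical error; keep the real part
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff.dot(diff) + np.trace(sigma1 + sigma2 - 2 * covmean)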

# Both covariances are the identity, so the trace term is 0 and the score is ||mu1 - mu2||^2
mu1, sigma1 = np.random.rand(3), np.eye(3)
mu2, sigma2 = np.random.rand(3), np.eye(3)

print("FID Score:", calculate_fid(mu1, sigma1, mu2, sigma2))
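
In practice the means and covariances are estimated from feature vectors of real and generated samples (in the standard metric, Inception-v3 activations). The sketch below assumes random 3-dimensional features as stand-ins and, to stay NumPy-only, obtains the trace of the matrix square root from the eigenvalues of `sigma1 @ sigma2` instead of calling `sqrtm`. The helpers `feature_stats` and `fid_from_stats` are hypothetical names.

```python
import numpy as np

def feature_stats(features):
    """Mean vector and covariance matrix of a (n_samples, dim) feature array."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma

def fid_from_stats(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians given their statistics."""
    diff = mu1 - mu2
    # Tr(sqrt(sigma1 @ sigma2)) equals the sum of square roots of its eigenvalues
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)

rng = np.random.default_rng(0)     # seed is an assumption, for reproducibility
real = rng.normal(size=(500, 3))   # stand-in for features of real images
fake = real + 2.0                  # same spread, mean shifted by 2 in every dimension

fid_same = fid_from_stats(*feature_stats(real), *feature_stats(real))
fid_shifted = fid_from_stats(*feature_stats(real), *feature_stats(fake))
print("FID of a set with itself:", fid_same)
print("FID after shifting the mean:", fid_shifted)
```

Identical feature sets score approximately 0, while the shifted set scores approximately 12, the squared distance between the means (3 dimensions times 2 squared), since the covariances match.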