# 📊 Generative AI Models Overview

| **Model Family**              | **Mechanism of Work**                                                                                                   | **Applications (Domains)**                                                                                          | **Example Models**                                   |
|--------------------------------|--------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------|
| **Generative Adversarial Networks (GANs)** | Two neural networks (Generator vs. Discriminator) in a minimax game. Generator tries to fool the Discriminator; Discriminator learns to distinguish real from fake. | Images (faces, objects, art), Videos (deepfakes), Audio (music, speech), 3D data | DCGAN, StyleGAN, CycleGAN, BigGAN                   |
| **Variational Autoencoders (VAEs)**       | Encoder compresses input into latent distribution (mean, variance). Decoder reconstructs data by sampling from latent space. Uses the **reparameterization trick**. | Images (denoising, synthesis), Audio (speech representation), Medical imaging   | β-VAE, VQ-VAE                                       |
| **Autoregressive Models**                 | Generate data sequentially (token by token / pixel by pixel), conditioning on previously generated outputs.          | Text (language models), Images (pixel-based), Audio (waveform models), Video (frame prediction) | GPT, PixelCNN, PixelRNN, WaveNet                   |
| **Normalizing Flows**                     | Learn invertible transformations between simple distributions (e.g., Gaussian) and complex data distributions with tractable likelihoods. | Density estimation, Image synthesis, Audio modeling, Molecular generation        | RealNVP, Glow, NICE                                 |
| **Diffusion Models**                      | Define a forward process that gradually adds noise to data and learn a reverse process that removes it; generation starts from pure noise and denoises iteratively. | Images (Stable Diffusion, Imagen, DALL·E 3), Audio (DiffWave, WaveGrad), Video (Video Diffusion Models, Pika), 3D/Multimodal | DDPM, DDIM, Latent Diffusion                        |
| **Energy-Based Models (EBMs)**            | Define an energy function over inputs (low energy = likely data); generation via sampling (e.g., Langevin dynamics). Less widely used because sampling and training are computationally costly. | Images, Physics simulations, Structured data                                    | Boltzmann Machines, Deep Energy Models              |
| **Transformers (Generative)**             | Self-attention mechanism models dependencies across sequences in parallel. Can be autoregressive (decoder-only) or seq2seq (encoder–decoder). | Text (ChatGPT, PaLM, LLaMA), Images (Image GPT, Parti), Audio (AudioLM, MusicLM), Video/Multimodal (PaLM-E, Flamingo) | GPT family, T5 (encoder–decoder), Image GPT, DALL·E (original) |

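The **reparameterization trick** named in the VAE row can be sketched in a few lines. This is a minimal, framework-free illustration rather than a reference implementation: the function names, the `mu` / `log_var` values, and the choice of a diagonal Gaussian posterior with a standard normal prior are assumptions made for the example.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    All randomness lives in eps, so z is a deterministic (and, in an
    autodiff framework, differentiable) function of mu and log_var.
    """
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)    # noise independent of the encoder outputs
        z.append(m + sigma * eps)
    return z


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu = [0.5, -1.0]        # hypothetical encoder outputs (assumed values)
log_var = [0.0, -2.0]
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```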
---

## 🧭 Key Insights

- **GANs**: Long the leading approach for high-fidelity image generation and deepfakes, with fast single-pass sampling, but hard to train (mode collapse, instability).  
- **VAEs**: Excel at structured latent spaces → useful for **representation learning** and **controllable generation**.  
- **Autoregressive Models**: Power **language generation** (GPT, LLaMA) and early **image/audio generation** (PixelRNN, WaveNet).  
- **Normalizing Flows**: Allow **exact likelihood estimation**, valuable in **scientific modeling** (molecular design, density estimation).  
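- **Diffusion Models**: Widely regarded as **state-of-the-art** for image synthesis (e.g., Stable Diffusion, Imagen) and increasingly used for audio and video, at the cost of slow, multi-step sampling.  
- **Transformers**: An architecture rather than a training objective; they serve as the backbone for **generative modeling across most modalities** (text, images, audio, video, multimodal) and combine with autoregressive or diffusion objectives.  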

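To make the **exact likelihood** claim for normalizing flows concrete, here is a small sketch of the change-of-variables formula for a one-dimensional affine flow. The function names and the `scale` / `shift` parameters are illustrative assumptions; models such as RealNVP and Glow stack many invertible layers, but the bookkeeping is the same: base log-density plus the log-determinant of the inverse transform.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, scale, shift):
    """Exact log-density of x = scale * z + shift with z ~ N(0, 1)."""
    z = (x - shift) / scale                    # inverse transform f^{-1}(x)
    log_det_inverse = -math.log(abs(scale))    # log |d f^{-1} / dx|
    return standard_normal_logpdf(z) + log_det_inverse


def gaussian_logpdf(x, mean, std):
    """Closed-form reference: log-density of N(mean, std^2)."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


flow_value = affine_flow_logpdf(1.3, scale=2.0, shift=0.5)
reference = gaussian_logpdf(1.3, mean=0.5, std=2.0)
```

For this toy flow the two values agree, since an affine map of a standard normal is itself Gaussian; with a learned, non-linear flow there is no closed-form reference, yet the same formula still yields the exact density.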

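The forward (noising) half of a diffusion model has a closed form, which is what makes training practical: a noisy sample at any step can be drawn directly from the clean data. The sketch below assumes a linear noise schedule with 1000 steps and endpoints `1e-4` and `0.02`; those values, along with the helper names, are assumptions for illustration and not the only valid choice.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced per-step noise variances (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product over s <= t of (1 - beta[s])."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t given x_0: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]                         # toy "data" vector
x_early = q_sample(x0, 0, alpha_bars, rng)    # almost unchanged
x_late = q_sample(x0, 999, alpha_bars, rng)   # almost pure noise
```

The reverse (denoising) process is the learned part: a network is trained to predict the noise `eps` from `x_t` and `t`, and generation runs that prediction backwards from pure noise.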
# 📚 Key Academic Works on Generative AI Models

| **Model Family**                                           | **Key Paper (Original / Seminal)**                                                             | **Authors / Year**                                                                                                                 | **Venue**                 |
|------------------------------------------------------------|------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|---------------------------|
| **GANs (Generative Adversarial Networks)**                 | *Generative Adversarial Nets*                                                                  | Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio (2014) | NeurIPS 2014              |
| **VAEs (Variational Autoencoders)**                        | *Auto-Encoding Variational Bayes*                                                              | Diederik P. Kingma, Max Welling (2013; published 2014)                                                                             | ICLR 2014                 |
| **Autoregressive Models (Seq2Seq / Pixel / Audio)**        | *Sequence to Sequence Learning with Neural Networks* (text)                                    | Ilya Sutskever, Oriol Vinyals, Quoc V. Le (2014)                                                                                   | NeurIPS 2014              |
|                                                            | *Pixel Recurrent Neural Networks* (images)                                                     | Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu (2016)                                                                     | ICML 2016                 |
|                                                            | *WaveNet: A Generative Model for Raw Audio* (audio)                                            | Aaron van den Oord et al. (2016)                                                                                                   | arXiv 2016                |
| **Normalizing Flows**                                      | *NICE: Non-linear Independent Components Estimation*                                           | Laurent Dinh, David Krueger, Yoshua Bengio (2014)                                                                                  | arXiv 2014                |
|                                                            | *Density Estimation using Real NVP*                                                            | Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio (2016)                                                                            | ICLR 2017                 |
| **Diffusion Models**                                       | *Deep Unsupervised Learning using Nonequilibrium Thermodynamics*                               | Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, Surya Ganguli (2015)                                                      | ICML 2015                 |
|                                                            | *Denoising Diffusion Probabilistic Models* (modern revival)                                    | Jonathan Ho, Ajay Jain, Pieter Abbeel (2020)                                                                                       | NeurIPS 2020              |
| **Energy-Based Models (EBMs)**                             | *A Learning Algorithm for Boltzmann Machines* (early EBMs)                                     | David H. Ackley, Geoffrey E. Hinton, Terrence J. Sejnowski (1985)                                                                  | Cognitive Science         |
|                                                            | *A Tutorial on Energy-Based Learning*                                                          | Yann LeCun, Sumit Chopra, Raia Hadsell, Marc'Aurelio Ranzato, Fu Jie Huang (2006)                                                  | Predicting Structured Data (MIT Press) |
| **Transformers (Generative)**                              | *Attention Is All You Need*                                                                    | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin (2017)    | NeurIPS 2017              |
