# LLM Learning Notes: Understanding Large Language Models

## What Are Models in AI?
- A model in AI/ML refers to a computational system trained on data to perform specific tasks
- For LLMs specifically, a model is a complex mathematical structure (typically a neural network) trained on vast amounts of text
- Frontier models are the most capable AI systems available at a given time
- Open-weight frontier models (often loosely called open-source) make their weights, and sometimes their code and training methodology, publicly available (e.g., Llama, Mistral, Falcon)

## How LLMs Work Programmatically

### Architecture and Components
- Modern LLMs are based on the Transformer architecture, introduced in the 2017 paper "Attention Is All You Need"
- Key components include:
  - **Tokenization**: Breaking text into tokens (words, subwords, or characters)
  - **Embedding Layer**: Converting tokens to numerical vectors
  - **Attention Mechanism**: Letting each token weigh how relevant every other token in the context is to it
  - **Feedforward Networks**: Transforming each token's representation independently after the attention step in every layer
  - **Layer Normalization**: Rescaling activations to stabilize training
  - **Residual Connections**: Adding each sublayer's input to its output so information and gradients can bypass the sublayer
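
The attention component is the least intuitive of these, so here is a minimal sketch of scaled dot-product attention for a single head, in plain Python. It assumes no masking and skips the learned query/key/value projections a real Transformer applies first; the toy vectors are made up for illustration.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head, without masking."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy token vectors (made-up numbers), used as Q, K and V alike.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(x, x, x)
```

Each row of `weights` shows how strongly one token attends to the others; the first token attends more to itself and the third token than to the second, because their dot products are larger.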

### Generation Process
1. Input is tokenized and embedded
2. Embeddings flow through multiple transformer layers
3. Each layer refines the representation of the text
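4. The final layer outputs a probability distribution over the vocabulary for the next token
5. The model samples from this distribution to select the next token
6. The selected token is appended to the input and the process repeats autoregressively until an end-of-sequence token or a length limit is reached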
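
The steps above can be sketched as a sampling loop. This is a toy illustration, not a real model: `TOY_LOGITS` is a hypothetical hand-written table that stands in for the transformer layers and only looks at the last token, and `VOCAB`, `generate`, and the `temperature` parameter are names chosen here for the example.

```python
import math
import random

VOCAB = ["<eos>", "the", "cat", "sat"]

# Hypothetical stand-in for a trained model: hand-written scores (logits)
# over VOCAB, keyed by the most recent token.
TOY_LOGITS = {
    "the": [0.0, -5.0, 3.0, 0.0],
    "cat": [0.0, -5.0, -5.0, 3.0],
    "sat": [3.0, 0.0, -5.0, -5.0],
}

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_new_tokens=10, temperature=1.0, rng=None):
    """Sample one token at a time, feeding each choice back in."""
    rng = rng or random.Random(0)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(TOY_LOGITS[tokens[-1]], temperature)
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]
        if next_token == "<eos>":  # stop on end-of-sequence
            break
        tokens.append(next_token)
    return tokens

result = generate(["the"], temperature=0.01)
```

With a very low temperature the sampling is effectively greedy, so `result` follows the highest-scoring path through the table; raising the temperature makes lower-probability tokens more likely to be picked.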

### No Explicit Memory
- LLMs don't directly access or search through training data
- All knowledge is encoded within billions of parameters (weights and biases)
- Knowledge is distributed across the entire network, not stored in specific locations
- The model leverages statistical patterns learned during training, not explicit facts

## Training LLMs

### Data Collection and Preparation
- Gathering massive text datasets from various sources
- Cleaning and filtering data to remove harmful or low-quality content
- Tokenizing the data
- Formatting into training examples
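
A minimal sketch of the last two steps, assuming whitespace tokenization purely for readability (production tokenizers use subword schemes such as BPE). `build_vocab` and `make_examples` are hypothetical helper names; each training example pairs a window of token IDs with the same window shifted by one position, so the target at every position is the next token.

```python
def build_vocab(text):
    """Map each distinct whitespace-separated word to an integer ID."""
    words = sorted(set(text.split()))
    return {w: i for i, w in enumerate(words)}

def make_examples(token_ids, context_len):
    """Slide a window over the IDs; targets are inputs shifted by one."""
    examples = []
    for start in range(len(token_ids) - context_len):
        inputs = token_ids[start:start + context_len]
        targets = token_ids[start + 1:start + context_len + 1]
        examples.append((inputs, targets))
    return examples

text = "the cat sat on the mat"
vocab = build_vocab(text)
ids = [vocab[w] for w in text.split()]       # [4, 0, 3, 2, 4, 1]
examples = make_examples(ids, context_len=3)
# First example: inputs [4, 0, 3] -> targets [0, 3, 2]
```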

### Model Architecture Definition
- Defining embedding layers, transformer blocks, and output layer
- Setting up hyperparameters like number of layers, hidden size, number of attention heads
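
One way to picture these hyperparameters is as a small config object. The sketch below is illustrative: `TransformerConfig` and its field names are assumptions made for this example, and `approx_params` uses a common back-of-the-envelope estimate (about 12 × hidden_size² weights per block, assuming a 4× feedforward expansion) that ignores biases, layer norms, and positional embeddings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformerConfig:
    vocab_size: int
    n_layers: int
    hidden_size: int
    n_heads: int
    context_len: int

    def __post_init__(self):
        # Each head gets an equal slice of the hidden dimension.
        if self.hidden_size % self.n_heads != 0:
            raise ValueError("hidden_size must be divisible by n_heads")

    @property
    def head_dim(self):
        return self.hidden_size // self.n_heads

    def approx_params(self):
        """Rough count: attention (~4*d^2) + feedforward (~8*d^2) per
        block, plus the token embedding table."""
        per_block = 12 * self.hidden_size ** 2
        embeddings = self.vocab_size * self.hidden_size
        return self.n_layers * per_block + embeddings

# Illustrative values for a small model.
cfg = TransformerConfig(vocab_size=50257, n_layers=12, hidden_size=768,
                        n_heads=12, context_len=1024)
print(cfg.head_dim, cfg.approx_params())
```

The estimate makes it easy to see that parameter count grows linearly with the number of layers but quadratically with hidden size.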

### Training Process
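- Forward pass: feeding batches through the model and computing the loss (typically cross-entropy on next-token prediction)
- Backpropagation to compute gradients, followed by an optimizer step to update the weights
- Gradient clipping to prevent exploding gradients
- Learning rate scheduling (e.g., warmup followed by decay)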
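
The same four steps can be shown on a deliberately tiny problem. The sketch below fits a line `y = w*x + b` with plain gradient descent instead of training an LLM; the hand-derived gradients play the role of backpropagation. The schedule (linear warmup, then cosine decay), the clipping threshold, and all function names are assumptions chosen for illustration.

```python
import math

def lr_at(step, total_steps, peak_lr=0.1, warmup_steps=10):
    """Linear warmup to peak_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their joint norm is too large."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        return [g * max_norm / norm for g in grads]
    return list(grads)

def train(data, total_steps=300, max_norm=1.0):
    w, b = 0.0, 0.0
    n = len(data)
    loss = 0.0
    for step in range(total_steps):
        # Forward pass: mean squared error and its gradients.
        loss = grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            loss += err * err / n
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        # Clip, look up the scheduled learning rate, then update.
        grad_w, grad_b = clip_by_global_norm([grad_w, grad_b], max_norm)
        lr = lr_at(step, total_steps)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss

data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b, final_loss = train(data)
```

Real LLM training replaces the hand-written gradients with automatic differentiation and usually uses an adaptive optimizer, but the loop has the same shape.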

### Distributed Training
- Training is distributed across hundreds or thousands of GPUs/TPUs
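- Large models are sharded across devices (model parallelism), while batches are split across model replicas (data parallelism)
- Data loading and gradient computation run in parallel, with gradients typically synchronized across devices before each weight update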
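
To make the data-parallel part concrete, here is a single-process simulation, not real multi-device code. Each simulated worker computes a gradient on its own shard of the batch, and an averaging step stands in for the all-reduce that real frameworks perform over the network. The one-parameter model and all helper names are hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def shard(batch, n_workers):
    """Split a batch into equal contiguous shards, one per worker."""
    if len(batch) % n_workers != 0:
        raise ValueError("batch size must be divisible by n_workers")
    size = len(batch) // n_workers
    return [batch[i * size:(i + 1) * size] for i in range(n_workers)]

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    return sum(values) / len(values)

batch = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.5

local_grads = [grad_mse(w, s) for s in shard(batch, n_workers=4)]
synced_grad = all_reduce_mean(local_grads)
full_grad = grad_mse(w, batch)
```

With equal-sized shards, `synced_grad` matches `full_grad`, which is why data parallelism can behave like one large batch on a single device.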

### Key Training Techniques
- Pre-training: Initial training on next-token prediction
- Fine-tuning: Further training on more specific datasets
- RLHF: Reinforcement Learning from Human Feedback to align with human preferences
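
The pre-training objective can be written in a few lines. The sketch below computes the mean cross-entropy between the model's predicted distribution at each position and the token that actually came next; `next_token_loss` is a hypothetical name and the logits are made-up numbers rather than real model outputs.

```python
import math

def next_token_loss(logits_per_position, target_ids):
    """Mean cross-entropy of the true next token at each position."""
    total = 0.0
    for logits, target in zip(logits_per_position, target_ids):
        m = max(logits)
        # log of the softmax denominator, computed stably
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[target]  # equals -log p(target)
    return total / len(target_ids)

# Vocabulary of 4 tokens, two positions, made-up logits.
confident_right = next_token_loss([[5.0, 0.0, 0.0, 0.0],
                                   [0.0, 5.0, 0.0, 0.0]], [0, 1])
confident_wrong = next_token_loss([[5.0, 0.0, 0.0, 0.0],
                                   [0.0, 5.0, 0.0, 0.0]], [2, 3])
uniform = next_token_loss([[0.0, 0.0, 0.0, 0.0]], [1])
```

A model that spreads probability evenly over four tokens gets a loss of `log(4)`; training pushes the loss below that by putting more probability on the tokens that actually follow.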

## Neural Networks
- Computational systems loosely inspired by the structure and function of the human brain
- Composed of interconnected nodes (neurons) organized in layers
- Basic structure includes input layer, hidden layers, and output layer
- Key components:
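  - Weights and connections: learnable parameters that scale the signal passed between neurons
  - Activation functions: nonlinearities applied to each neuron's weighted input
  - Forward pass: computing the output from the input, layer by layer
  - Learning process (backpropagation): computing gradients of the loss and using them to adjust the weights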
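
These components fit together in a few lines of code. The sketch below is a minimal two-layer network (2 inputs, 2 tanh hidden units, 1 linear output) with a hand-written backward pass and one gradient step; the weights, input, target, and learning rate are arbitrary numbers chosen for illustration.

```python
import math

def forward(params, x):
    """Forward pass: weighted sums, tanh activation, linear output."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, y

def loss_and_grads(params, x, target):
    """Squared-error loss and its gradients via backpropagation."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                          # dLoss/dy
    dW2 = [dy * hi for hi in h]
    db2 = dy
    # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dpre = [dy * w * (1 - hi * hi) for w, hi in zip(W2, h)]
    dW1 = [[d * xi for xi in x] for d in dpre]
    db1 = list(dpre)
    return loss, (dW1, db1, dW2, db2)

def sgd_step(params, grads, lr):
    """Move every parameter a small step against its gradient."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads
    return (
        [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, dW1)],
        [b - lr * g for b, g in zip(b1, db1)],
        [w - lr * g for w, g in zip(W2, dW2)],
        b2 - lr * db2,
    )

W1, b1 = [[0.5, -0.3], [0.8, 0.2]], [0.0, 0.1]
W2, b2 = [1.0, -1.0], 0.0
params = (W1, b1, W2, b2)
x, target = [1.0, 2.0], 1.0

loss_before, grads = loss_and_grads(params, x, target)
params_after = sgd_step(params, grads, lr=0.01)
loss_after, _ = loss_and_grads(params_after, x, target)
```

After one small step the loss on this example goes down, which is the basic mechanism that training repeats over many examples.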

## LLMs vs. Generative AI
- LLMs are a subset of generative AI, specifically focused on language
- Generative AI encompasses any AI system capable of generating new content
- While all LLMs are generative AI, not all generative AI systems are LLMs

## Other Subsets of Generative AI
- **Text Generation**: LLMs, specialized text generators, text-to-text transformers
- **Visual Generation**: Text-to-image models, image-to-image models, GAN-based generators, vector graphics
- **Video Generation**: Text-to-video models, image-to-video models, video editing AI
- **Audio Generation**: Text-to-speech, voice cloning, music generation, sound effects
- **3D Content Generation**: Text-to-3D models, sce