Hands-on workshop covering GenAI fundamentals and RAG pipelines.
| Topic | What You Learn |
|---|---|
| Tokenization | How LLMs break text into tokens using tiktoken (GPT-4's encoder) |
| Embeddings | Convert words to vectors using sentence-transformers |
| Similarity | Compare word meanings with cosine similarity |
| Temperature | Control LLM creativity (0 = factual, 1.5 = creative) |
# Quick Example - Tokenization
import tiktoken
encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("Hello AI!")Build a complete Retrieval-Augmented Generation system:
PDF → Chunks → Embeddings → Vector DB → Retrieval → LLM Answer
| Step | Tool Used |
|---|---|
| Load PDF | PyPDFLoader |
| Split Text | RecursiveCharacterTextSplitter (1000 chars, 200 overlap) |
| Embeddings | HuggingFaceEmbeddings (all-MiniLM-L6-v2) |
| Vector Store | ChromaDB |
| LLM | Groq (llama-3.3-70b-versatile) |
pip install -r requirements.txtSet your API key:
import os
os.environ["GROQ_API_KEY"] = "your-key-here"- Python 3.10+
- Groq API Key (free at groq.com)
- PDF file for RAG demo