Welcome! This mini-repository is intentionally didactic: each file is a self-contained lesson, and together they incrementally introduce the idea of Cache-Augmented Generation (CAG) – a technique in which we preload relevant documents into an LLM’s context so the model can answer many questions without performing a live retrieval step.
The code is kept short, dependency-free and thoroughly logged so that you can run each script with plain Node and see what is happening at every phase.
```bash
# run any demo
node cag_demo.js
```

| File | Pedagogical Focus | Key Concepts Introduced |
|---|---|---|
| cag_demo.js | 🍏 “Hello CAG” – the smallest viable example | • Pre-loaded text cache • Linear string matching |
| cag_demo_with_vector_store.js | 🍊 Adds semantic search via fake embeddings | • Document vectors • Cosine similarity • Similarity threshold (0.85 default) |
| cache_augmented_llm.js | 🍎 Modularises the code into a reusable CacheAugmentedLLM class and layers extra features | • Embedding-function injection • Vector-store plug-in stub • Query-result cache (performance) • Runtime similarity-threshold tuning |
| cache_augmented_llm_with_search.js | 🍉 Separates the pipeline even further to highlight each sub-step | • Dedicated helpers: `vectorizeQuery` & `searchVectorStore` • Clear trace of vectorise ➜ search ➜ answer |
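To make the first row concrete, here is a minimal sketch of a pre-loaded text cache answered by a linear string scan. It only illustrates the idea – the cache contents and helper names below are made up, not taken from `cag_demo.js`.

```js
// Illustrative "Hello CAG" sketch: a pre-loaded cache plus a linear string
// scan. Not the actual cag_demo.js implementation.
const contextCache = [
  { topic: 'cag', text: 'Cache-Augmented Generation preloads documents into the context.' },
  { topic: 'rag', text: 'Retrieval-Augmented Generation fetches documents at query time.' },
];

// Purely lexical matching: return the first cached entry that shares a
// keyword (4+ letters) with the question. A paraphrase will simply miss.
function answerFromCache(question) {
  const keywords = question.toLowerCase().split(/\W+/).filter(w => w.length > 3);
  for (const entry of contextCache) {
    const haystack = entry.text.toLowerCase();
    if (keywords.some(word => haystack.includes(word))) return entry.text;
  }
  return 'Not in cache – a real system would fall back to the LLM or to retrieval.';
}

console.log(answerFromCache('What does cache-augmented generation preload?')); // keyword hit
console.log(answerFromCache('Why skip live lookups when responding?'));        // lexical miss
```

The second query is a paraphrase that shares no long keyword with the cached text, so the lexical scan misses – exactly the limitation the vector-store demo addresses with embeddings.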
Tip for educators – Because each successive file only adds a single conceptual leap, you can walk learners through the scripts one after another, live-coding small deltas or using `git diff` to highlight the change.
- Run `cag_demo.js` to observe basic string matching and discuss its limitations (lexical vs semantic).
- Move to `cag_demo_with_vector_store.js` to show how embeddings plus cosine similarity overcome those limits.
- Graduate to `cache_augmented_llm.js` for a conversation about real-world concerns: external vector stores, pluggable embeddings and caching for latency.
- Finish with `cache_augmented_llm_with_search.js` to underline the standard retrieval pipeline that underpins most production systems (sketched below).
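The sketch below ties those later steps together: a fake embedding, cosine similarity against an in-memory vector store, and the vectorise ➜ search ➜ answer trace. The helper names echo the table above, but the bodies are simplified stand-ins, not the repository's actual implementations.

```js
// Simplified vectorise ➜ search ➜ answer pipeline (illustrative only).

// A fake "embedding": hash each word into a short count vector so that
// similarly worded strings end up with similar vectors.
function fakeEmbed(text, dims = 16) {
  const vec = new Array(dims).fill(0);
  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (const ch of word) h = (h * 31 + ch.charCodeAt(0)) % dims;
    vec[h] += 1;
  }
  return vec;
}

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Step 1: vectorise the query.
function vectorizeQuery(query) {
  console.log('[vectorise]', query);
  return fakeEmbed(query);
}

// Step 2: linear scan of the in-memory store, keeping the best match above
// the similarity threshold (0.85, mirroring the default mentioned above).
function searchVectorStore(queryVector, store, threshold = 0.85) {
  let best = null;
  for (const doc of store) {
    const score = cosineSimilarity(queryVector, doc.vector);
    console.log('[search]', score.toFixed(3), doc.text);
    if (score >= threshold && (!best || score > best.score)) best = { ...doc, score };
  }
  return best;
}

// Step 3: answer from the best cached document, or admit a miss.
function answer(query, store) {
  const hit = searchVectorStore(vectorizeQuery(query), store);
  console.log('[answer]', hit ? hit.text : 'No cached document was similar enough.');
}

const docs = [
  'CAG preloads documents into the context window.',
  'Cosine similarity compares two embedding vectors.',
];
const vectorStore = docs.map(text => ({ text, vector: fakeEmbed(text) }));

// A near-verbatim query: with fake embeddings only close wording clears 0.85.
answer('CAG preloads documents into the context window?', vectorStore);
```

Note that only near-identical wording scores above 0.85 with these fake embeddings – that gap is exactly what a real embedding model is meant to close.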
Retrieval-Augmented Generation (RAG) fetches documents at query time. CAG pre-loads a carefully selected subset into the model’s context (or fast in-memory vector store), trading memory for speed. This repo lets students experiment with that trade-off before touching heavyweight libraries or cloud services.
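A compact way to see that trade-off in code (illustrative only – `retrieve` and `generate` are injected stand-ins, not functions defined anywhere in this repository):

```js
// RAG: pay a retrieval step on every question.
async function answerWithRAG(question, retrieve, generate) {
  const docs = await retrieve(question); // index / network lookup per query
  return generate(question, docs);
}

// CAG: pay the loading cost once up front, then answer from memory.
function makeCAGAnswerer(preloadedDocs, generate) {
  const contextCache = [...preloadedDocs]; // held in memory for the session
  return question => generate(question, contextCache); // no live retrieval
}

// Tiny usage with a toy "generator" that just echoes its inputs.
const generate = (q, docs) => `Q: ${q}\nContext: ${docs.join(' | ')}`;
const answerWithCAG = makeCAGAnswerer(
  ['Doc A about caching.', 'Doc B about embeddings.'],
  generate
);
console.log(answerWithCAG('Which documents are already in memory?'));
```

Neither helper does anything clever; the point is only where the document-loading cost sits.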
All scripts rely only on the Node.js standard library.
```bash
# Execute a script
node cache_augmented_llm.js

# View verbose logs for learning
NODE_OPTIONS="--trace-warnings" node cache_augmented_llm_with_search.js
```

Feel free to modify the `contextCache` objects, tweak the `similarityThreshold`, or replace the fake embedding function with your own model to explore further.
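For instance, a replacement embedding function could look like the sketch below. The shape (text in, numeric array out) and the idea of raising the threshold come from the descriptions above; how you wire them into each script depends on that script's own code, so treat the names here as placeholders.

```js
// Hypothetical customisations – placeholder names, not the scripts' real API.

// 1. A replacement embedding: letter-frequency vectors instead of the
//    built-in fake embeddings. Any function mapping text -> number[] works.
function myEmbedding(text) {
  const vec = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const idx = ch.charCodeAt(0) - 97; // 'a' === 97
    if (idx >= 0 && idx < 26) vec[idx] += 1;
  }
  return vec;
}

// 2. A stricter match requirement than the 0.85 default.
const similarityThreshold = 0.9;

console.log(myEmbedding('cache'), similarityThreshold);
```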
Happy learning – and happy caching! 🎉
This repository and all provided assets are maintained by admin@nguyenhongquan.com.