
Lightweight JavaScript reference implementation and demos of Cache-Augmented Generation (CAG) – pre-loading relevant documents into an LLM’s in-memory cache, adding optional vector-store similarity search and simple ANN optimisation to deliver faster, retrieval-free responses.


Cache-Augmented Generation (CAG) – JavaScript Teaching Repository

Welcome! This mini-repository is intentionally didactic: every file is a self-contained lesson that incrementally introduces the idea of Cache-Augmented Generation (CAG) – a technique in which we preload relevant documents into an LLM’s context so the model can answer many questions without performing a live retrieval step.

The code is kept short, dependency-free and thoroughly logged so that you can run each script with plain Node and see what is happening at every phase.

```bash
# run any demo
node cag_demo.js
```
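
In spirit, that first demo boils down to a pre-loaded cache plus a linear scan. A simplified sketch of the idea (not a copy of the file – the cache contents and names here are illustrative):

```js
// Minimal CAG: answers come straight from a pre-loaded in-memory cache.
const contextCache = {
  "node.js": "Node.js is a JavaScript runtime built on the V8 engine.",
  "cag": "Cache-Augmented Generation pre-loads documents into the model's context.",
};

function answer(question) {
  const q = question.toLowerCase();
  // Linear string matching: scan every cached key for a lexical hit.
  for (const [key, doc] of Object.entries(contextCache)) {
    if (q.includes(key)) return doc; // cache hit – no retrieval step
  }
  return "No cached context matched.";
}

console.log(answer("What is Node.js?"));
```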

File Tour

| File | Pedagogical Focus | Key Concepts Introduced |
| --- | --- | --- |
| cag_demo.js | 🍏 “Hello CAG” – the smallest viable example | Pre-loaded text cache • Linear string matching |
| cag_demo_with_vector_store.js | 🍊 Adds semantic search via fake embeddings | Document vectors • Cosine similarity (sketched below) • Similarity threshold (0.85 default) |
| cache_augmented_llm.js | 🍎 Modularises the code into a reusable CacheAugmentedLLM class and layers extra features | Embedding-function injection • Vector-store plug-in stub • Query-result cache (performance) • Runtime similarity-threshold tuning |
| cache_augmented_llm_with_search.js | 🍉 Separates the pipeline even further to highlight each sub-step | Dedicated helpers: vectorizeQuery & searchVectorStore • Clear trace of vectorise ➜ search ➜ answer |
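
The two vector-store demos hinge on cosine similarity between embedding vectors. As a reference while reading them, here is a minimal sketch of that comparison (illustrative only – the helpers inside the scripts may be shaped differently):

```js
// Cosine similarity: dot(a, b) / (|a|·|b|); 1 means “same direction”.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A document only counts as a hit if it clears the threshold (0.85 by default).
const similarityThreshold = 0.85;
console.log(cosineSimilarity([1, 0, 1], [1, 0.1, 0.9]) >= similarityThreshold); // true
```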

Tip for educators – Because each successive file only adds a single conceptual leap, you can walk learners through the scripts one after another, live-coding small deltas or using git diff to highlight the change.


Suggested Learning Path

  1. Run cag_demo.js to observe basic string matching and discuss its limitations (lexical vs semantic).
  2. Move to cag_demo_with_vector_store.js to show how embeddings plus cosine similarity overcome those limits.
  3. Graduate to cache_augmented_llm.js for a conversation about real-world concerns: external vector stores, pluggable embeddings and caching for latency.
  4. Finish with cache_augmented_llm_with_search.js to underline the standard retrieval pipeline that underpins most production systems (sketched just below).
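
The sketch below makes that final pipeline concrete. The helper names vectorizeQuery and searchVectorStore come from the repo; their bodies here are simplified stand-ins (a fake character-bucket embedding, not a real model):

```js
// Fake embedding: bucket character codes into a fixed-length vector.
function vectorizeQuery(text, dims = 8) {
  const v = new Array(dims).fill(0);
  for (let i = 0; i < text.length; i++) v[text.charCodeAt(i) % dims] += 1;
  return v;
}

// Compact cosine similarity (same formula as in the sketch above).
const cosine = (a, b) => {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const mag = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (mag(a) * mag(b));
};

// Pre-loaded store: documents with pre-computed vectors.
const docs = [
  "CAG pre-loads documents into the model's context.",
  "RAG fetches documents at query time.",
].map((text) => ({ text, vector: vectorizeQuery(text) }));

function searchVectorStore(queryVector, threshold = 0.5) {
  let best = null;
  for (const doc of docs) {
    const score = cosine(queryVector, doc.vector);
    if (score >= threshold && (best === null || score > best.score)) best = { ...doc, score };
  }
  return best;
}

// vectorise ➜ search ➜ answer
const hit = searchVectorStore(vectorizeQuery("How does CAG pre-load documents?"));
console.log(hit ? hit.text : "No document cleared the threshold.");
```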

Why CAG instead of RAG?

Retrieval-Augmented Generation (RAG) fetches documents at query time. CAG pre-loads a carefully selected subset into the model’s context (or fast in-memory vector store), trading memory for speed. This repo lets students experiment with that trade-off before touching heavyweight libraries or cloud services.
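
To make that trade-off tangible, here is a self-contained toy that simulates both flows with artificial delays (nothing below talks to a real model or vector store; the latencies are invented):

```js
// Simulated latencies only: RAG pays a round-trip per question,
// CAG pays a one-time preload and then answers from memory.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function ragAnswer(question) {
  await sleep(50); // pretend network hop to a retriever
  return `answered "${question}" after live retrieval`;
}

let contextCache = [];
async function preload() {
  await sleep(200); // one-time cost, amortised over every later question
  contextCache = ["doc A", "doc B"];
}

function cagAnswer(question) {
  return `answered "${question}" from ${contextCache.length} pre-loaded docs`;
}

(async () => {
  await preload();
  console.time("RAG x10");
  for (let i = 0; i < 10; i++) await ragAnswer(`q${i}`);
  console.timeEnd("RAG x10"); // ≈ 500 ms
  console.time("CAG x10");
  for (let i = 0; i < 10; i++) cagAnswer(`q${i}`);
  console.timeEnd("CAG x10"); // ≈ 0 ms
})();
```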


Running the demos

All scripts rely only on the Node.js standard library.

```bash
# Execute a script
node cache_augmented_llm.js

# View verbose logs for learning
NODE_OPTIONS="--trace-warnings" node cache_augmented_llm_with_search.js
```

Feel free to modify the contextCache objects, tweak the similarityThreshold, or replace the fake embedding function with your own model to explore further.
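
For instance, the embedding-injection, query-cache and threshold-tuning ideas from cache_augmented_llm.js can be experimented with in a tiny stand-alone form first. A hypothetical, heavily simplified stand-in (the real class’s constructor options may well differ):

```js
// Hypothetical, simplified stand-in for the CacheAugmentedLLM pattern.
class TinyCacheAugmentedLLM {
  constructor({ embed, similarityThreshold = 0.85 } = {}) {
    this.embed = embed;                             // injected embedding function
    this.similarityThreshold = similarityThreshold; // tunable at runtime
    this.queryCache = new Map();                    // query-result cache
  }

  answer(question) {
    if (this.queryCache.has(question)) return this.queryCache.get(question); // repeat query: free
    const vector = this.embed(question);
    const result = `embedded to ${vector.length} dims, threshold ${this.similarityThreshold}`;
    this.queryCache.set(question, result);
    return result;
  }
}

// Swap in your own embedding function and loosen the threshold:
const llm = new TinyCacheAugmentedLLM({
  embed: (text) => Array.from(text, (c) => c.charCodeAt(0) / 255),
  similarityThreshold: 0.7,
});
console.log(llm.answer("What is CAG?"));
console.log(llm.answer("What is CAG?")); // second call served from the query-result cache
```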


Happy learning – and happy caching! 🎉


Repository owner

This repository and all provided assets are maintained by admin@nguyenhongquan.com.
