# LangChain in Go vs. LangChain in Python

A comprehensive suite of benchmarks comparing the performance, resource efficiency, and resiliency of LangChainGo against the original LangChain Python implementation under realistic, production-grade workloads.
This repository moves beyond simple "hello world" examples to provide quantitative data on how each framework performs on tasks that matter for building scalable, robust, and cost-effective AI applications. All tests are designed to run against a local Ollama server to isolate the performance of the client-side framework.
## Key Metrics Measured

- Throughput (ops/sec)
- Latency (Average, P95, P99)
- CPU Usage (User Time)
- Memory Usage (RSS & Allocation Churn)
- Garbage Collection Pauses (Go)
- Error Handling & Resiliency
## Table of Contents

- Key Metrics Measured
- Quick Start: Setup & Run All
- Prerequisites
- Environment Setup
- Running the Benchmarks (Step-by-Step)
  - Scenario 1: Core LLM Interaction
  - Scenario 2: Retrieval-Augmented Generation (RAG) Pipeline
  - Scenario 3: Agentic Behavior
  - Scenario 4: Concurrency & Scalability
  - Scenario 5: Long-Term Memory Footprint
  - Scenario 6: Resiliency & Error Handling
  - Scenario 7: Complex Workflows
  - Scenario 8: Data Processing Pipeline
  - Scenario 9: Observability Overhead
  - Scenario 10: GPU Saturation
- Summary of Results
- License
## Quick Start: Setup & Run All

For a quick setup, a shell script is provided to install all dependencies.

```bash
# Make the setup script executable
chmod +x setup_all.sh

# Run the script to install all Go and Python dependencies
./setup_all.sh
```

You can then navigate into each scenario's directory and run the benchmarks as described below.
## Prerequisites

Before you begin, ensure you have the following installed:

- Git: To clone the repository.
- Go: Version 1.22 or later.
- Python: Version 3.9 or later (with `pip`).
- Docker & Docker Compose: To run the ChromaDB vector store.
- Ollama: Installed and running (download from [ollama.com](https://ollama.com)).
## Environment Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/FareedKhan-dev/langchain-go-vs-python.git
   cd langchain-go-vs-python
   ```

2. Pull the Ollama model:

   ```bash
   ollama pull llama3:8b
   ```

3. Start services: bring up the ChromaDB vector store using Docker Compose. This service is required for all RAG-related benchmarks (Scenarios 2 and 4.1). Ensure your Ollama server application is also running in the background.

   ```bash
   docker-compose up -d
   ```

4. Install dependencies: you can install all dependencies at once using the provided helper script, `setup_all.sh` (create this file in the `langchain-go-vs-python` root):

   ```bash
   #!/bin/bash
   set -e

   echo "--- Installing all Go and Python dependencies ---"

   # Find all go.mod files and run `go mod tidy` in their directories
   find . -name "go.mod" -print0 | while IFS= read -r -d $'\0' file; do
       dir=$(dirname "$file")
       echo "Setting up Go dependencies in: $dir"
       (cd "$dir" && go mod tidy)
   done

   # Find all requirements.txt files and run `pip install`
   find . -name "requirements.txt" -print0 | while IFS= read -r -d $'\0' file; do
       dir=$(dirname "$file")
       echo "Setting up Python dependencies in: $dir"
       (cd "$dir" && pip install -r requirements.txt)
   done

   echo "--- All dependencies installed successfully! ---"
   ```

   Run it:

   ```bash
   chmod +x setup_all.sh
   ./setup_all.sh
   ```
## Running the Benchmarks (Step-by-Step)

Navigate into each directory to run the tests.
### Scenario 1: Core LLM Interaction

Objective: Measure baseline framework overhead and streaming responsiveness.

#### 1.1 Single-Turn Latency

- Go: `cd single_turn_latency && go run .`
- Python: `cd single_turn_latency && python latency.py`
- Analysis: Compares the raw speed of a simple request-response cycle. Expect Go to have slightly lower average latency and a tighter standard deviation, indicating more consistent performance. A sketch of this kind of measurement loop follows below.
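For reference, a minimal sketch of a latency-percentile measurement loop, assuming langchaingo's `llms/ollama` client and the `llama3:8b` model pulled above; the run count and prompt are illustrative, and the repository's actual harness may differ:

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	const runs = 20
	latencies := make([]time.Duration, 0, runs)
	var total time.Duration
	for i := 0; i < runs; i++ {
		start := time.Now()
		if _, err := llms.GenerateFromSinglePrompt(ctx, llm, "Reply with OK."); err != nil {
			panic(err)
		}
		d := time.Since(start)
		latencies = append(latencies, d)
		total += d
	}

	// Percentiles are read straight off the sorted sample.
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	p := func(q float64) time.Duration { return latencies[int(q*float64(runs-1))] }
	fmt.Printf("avg=%v p95=%v p99=%v\n", total/runs, p(0.95), p(0.99))
}
```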
#### 1.2 Time to First Token (TTFT) Streaming

- Go: `cd TTFT_streaming && go run .`
- Python: `cd TTFT_streaming && python ttft.py`
- Analysis: Measures how quickly the first piece of text appears. This is a critical UX metric: Go's efficient I/O and lightweight concurrency should yield a lower TTFT. See the streaming sketch below.
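A minimal sketch of how TTFT can be captured with langchaingo's streaming callback (`llms.WithStreamingFunc` is a standard langchaingo call option); the prompt and timing details here are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}

	var firstToken time.Duration
	start := time.Now()

	// Record the elapsed time when the first streamed chunk arrives.
	_, err = llms.GenerateFromSinglePrompt(context.Background(), llm,
		"Explain TTFT in one sentence.",
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			if firstToken == 0 {
				firstToken = time.Since(start)
			}
			return nil
		}),
	)
	if err != nil {
		panic(err)
	}
	fmt.Printf("TTFT: %v, total: %v\n", firstToken, time.Since(start))
}
```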
### Scenario 2: Retrieval-Augmented Generation (RAG) Pipeline

Objective: Benchmark the full RAG lifecycle, from data ingestion to query answering.

#### 2.1 Ingestion Throughput

- Go: `cd ingestion_throughput && go run .`
- Python: `cd ingestion_throughput && python ingestion.py`
- Analysis: This is a heavy I/O- and CPU-bound task. Go is expected to be significantly faster, especially in the document-splitting phase. Compare the `Total time` for a clear winner.
Important: before running the benchmarks below, populate the database by running the Go ingestion script (`ingestion_throughput/ingestion.go`) with the final `store.RemoveCollection()` line commented out.
#### 2.2 Retrieval Latency

- Go: `cd retrieval_latency && go run .`
- Python: `cd retrieval_latency && python retrieval.py`

#### 2.3 End-to-End RAG

- Go: `cd end_to_end_rag && go run .`
- Python: `cd end_to_end_rag && python rag.py`
- Analysis: Compare the `Average Latency` in both tests. The cumulative efficiency of Go's framework should result in faster retrieval and a quicker final answer. A sketch of the Go query path follows below.
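A hedged sketch of the Go-side query path, assuming langchaingo's `vectorstores/chroma` store pointed at the ChromaDB instance from `docker-compose` and a hypothetical `benchmark` namespace; the repository's scripts may wire this up differently:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/chains"
	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/ollama"
	"github.com/tmc/langchaingo/vectorstores"
	"github.com/tmc/langchaingo/vectorstores/chroma"
)

func main() {
	ctx := context.Background()
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		panic(err)
	}

	// Connect to the ChromaDB instance started via docker-compose.
	store, err := chroma.New(
		chroma.WithChromaURL("http://localhost:8000"),
		chroma.WithEmbedder(embedder),
		chroma.WithNameSpace("benchmark"), // hypothetical collection name
	)
	if err != nil {
		panic(err)
	}

	// Retrieve the top 4 chunks and stuff them into the answer prompt.
	qa := chains.NewRetrievalQAFromLLM(llm, vectorstores.ToRetriever(store, 4))

	start := time.Now()
	answer, err := chains.Run(ctx, qa, "What is the main topic of the corpus?")
	if err != nil {
		panic(err)
	}
	fmt.Printf("answer: %s\nlatency: %v\n", answer, time.Since(start))
}
```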
### Scenario 3: Agentic Behavior

Objective: Measure the overhead of the agent's "reasoning loop."

#### 3.1 Single Tool Call

- Go: `cd agent_single_tool && go run . single`
- Python: `cd agent_single_tool && python agent.py single`

#### 3.2 Multi-Hop Reasoning

- Go: `cd agent_multi_hop && go run . multi`
- Python: `cd agent_multi_hop && python agent.py multi`

#### 3.3 High-Frequency Tool Calling

- Go: `cd agent_high_frequency && go run . high_freq`
- Python: `cd agent_high_frequency && python agent.py high_freq`

Analysis: Compare the `End-to-End Latency` and resource metrics (`Total Memory Allocated` in Go vs. `RSS Increase` in Python). The performance gap should widen as the number of agent steps increases, highlighting Go's lower loop overhead. A minimal Go agent sketch follows below.
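For orientation, a minimal langchaingo agent with a single tool, using the library's `agents.Initialize` helper and built-in `tools.Calculator`; the benchmark's actual tools and prompts may differ:

```go
package main

import (
	"context"
	"fmt"

	"github.com/tmc/langchaingo/agents"
	"github.com/tmc/langchaingo/chains"
	"github.com/tmc/langchaingo/llms/ollama"
	"github.com/tmc/langchaingo/tools"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}

	// A single calculator tool; each extra tool call adds one reasoning hop.
	executor, err := agents.Initialize(
		llm,
		[]tools.Tool{tools.Calculator{}},
		agents.ZeroShotReactDescription,
	)
	if err != nil {
		panic(err)
	}

	answer, err := chains.Run(context.Background(), executor, "What is 23 * 17?")
	if err != nil {
		panic(err)
	}
	fmt.Println(answer)
}
```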
### Scenario 4: Concurrency & Scalability

Objective: Simulate a production server handling simultaneous user requests. This is a critical test of real-world performance.

#### 4.1 Concurrent RAG

- Go: `cd concurrent_rag && go run .`
- Python: `cd concurrent_rag && python rag.py`
- Analysis: The key metric is Throughput (ops/sec). Also compare the P99 Latency, which shows the worst-case performance for users under load. Go's true parallelism is expected to yield dramatically better results here.
#### 4.2 Concurrent Agents

- Go: `cd concurrent_agents && go run .`
- Python: `cd concurrent_agents && python agent.py`
- Analysis: This is the ultimate stress test. Compare Throughput and P99 Latency; expect Go to handle the complex, stateful concurrent load far more efficiently. A worker-pool sketch of this load pattern follows below.
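A worker-pool sketch of the concurrent load pattern in plain Go; worker and request counts are illustrative, and the real benchmarks issue RAG or agent calls instead of the simple prompt used here:

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	const workers, requests = 16, 64
	jobs := make(chan int)
	latencies := make([]time.Duration, requests)
	var wg sync.WaitGroup

	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				t := time.Now()
				// Each goroutine issues independent requests; errors are
				// ignored here to keep the sketch short.
				_, _ = llms.GenerateFromSinglePrompt(ctx, llm, "Reply with OK.")
				latencies[i] = time.Since(t)
			}
		}()
	}
	for i := 0; i < requests; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	elapsed := time.Since(start)
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	fmt.Printf("throughput: %.2f ops/sec  p99: %v\n",
		float64(requests)/elapsed.Seconds(),
		latencies[int(0.99*float64(requests-1))])
}
```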
### Scenario 5: Long-Term Memory Footprint

Objective: Evaluate resource efficiency for long-running, stateful conversations.

- Go: `cd memory_footprint && go run .`
- Python: `cd memory_footprint && python memory.py`
- Analysis: Examine the results table. Compare Go's low `Heap Alloc` and short `Total GC Pause` time against Python's steadily growing `Memory Increase (RSS)`. This demonstrates long-term stability and lower operational cost. The snippet below shows how the Go-side counters can be read.
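On the Go side, counters like these come from the runtime itself; a minimal sketch using `runtime.ReadMemStats`, with a synthetic string buffer standing in for a real chat-history memory object:

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"time"
)

// printMemStats reports heap usage, cumulative allocations, and total GC
// pause time, the counters the Go-side results tables are built from.
func printMemStats(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-8s heap=%6.1f MiB  totalAlloc=%7.1f MiB  gcPause=%v\n",
		label,
		float64(m.HeapAlloc)/(1<<20),
		float64(m.TotalAlloc)/(1<<20),
		time.Duration(m.PauseTotalNs))
}

func main() {
	printMemStats("start")

	// Simulate a long-running conversation by appending turns to a buffer.
	history := make([]string, 0, 10000)
	for i := 0; i < 10000; i++ {
		history = append(history, strings.Repeat("token ", 100))
	}
	printMemStats("loaded")

	runtime.GC()
	printMemStats("post-GC")
	_ = history
}
```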
### Scenario 6: Resiliency & Error Handling

Objective: Test how gracefully each framework handles common production failures.

#### 6.1 Request Timeout

- Go: `cd resiliency_timeout && go run .`
- Python: `cd resiliency_timeout && python timeout.py`
- Analysis: Verify that both scripts exit promptly around the 2-second mark and report a timeout error. Note Go's use of the standard `context.DeadlineExceeded`, a more idiomatic and flexible pattern; a sketch follows below.
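A minimal sketch of the Go timeout pattern, assuming the langchaingo client surfaces the context error (possibly wrapped, which `errors.Is` handles); the 2-second deadline matches the benchmark's setup:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}

	// A 2-second deadline: the context is cancelled mid-generation and the
	// call returns promptly instead of blocking on the slow server.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	_, err = llms.GenerateFromSinglePrompt(ctx, llm, "Write a very long essay.")
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		fmt.Println("timed out after 2s, as expected")
	case err != nil:
		fmt.Println("failed for another reason:", err)
	default:
		fmt.Println("completed before the deadline")
	}
}
```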
#### 6.2 Tool Failure

- Go: `cd resiliency_tool_failure && go run .`
- Python: `cd resiliency_tool_failure && python tool_fail.py`
- Analysis: Observe the different error-handling philosophies. Go fails fast, propagating the specific error for programmatic handling; Python's agent internally reasons about the failure and produces a conversational response.
#### 6.3 Parsing Failure

- Go: `cd resiliency_parsing_failure && go run .`
- Python: `cd resiliency_parsing_failure && python parsing.py`
- Analysis: Both fail correctly, but the Python ecosystem has the advantage of built-in retrying parsers such as `RetryOutputParser`, which provide automatic self-healing.
### Scenario 7: Complex Workflows

Objective: Measure the internal framework overhead for non-linear logic and data manipulation.

- Go: `cd workflow_routing && go run .` / `cd workflow_transformation && go run .`
- Python: `cd workflow_routing && python routing.py` / `cd workflow_transformation && python transformation.py`
- Analysis: The most important metric is Framework Logic Overhead (Go) vs. Process CPU Time Used (Python). This isolates the cost of the framework's "glue code" and shows Go's massive efficiency advantage.
### Scenario 8: Data Processing Pipeline

Objective: Deep-dive into the performance of CPU- and I/O-intensive data preparation for RAG.

#### 8.1 Text Splitting

- Go: `cd data_text_splitting && go run .`
- Python: `cd data_text_splitting && python splitter.py`
- Analysis: Compare Throughput (MB/s). This is a raw CPU benchmark where Go's compiled nature is expected to provide a near order-of-magnitude advantage. A throughput-measurement sketch follows below.
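A sketch of the MB/s measurement, assuming langchaingo's `textsplitter.NewRecursiveCharacter`; the synthetic corpus size and chunk parameters are illustrative:

```go
package main

import (
	"fmt"
	"strings"
	"time"

	"github.com/tmc/langchaingo/textsplitter"
)

func main() {
	// ~10 MB of synthetic text stands in for the benchmark corpus.
	doc := strings.Repeat("The quick brown fox jumps over the lazy dog. ", 220000)

	splitter := textsplitter.NewRecursiveCharacter(
		textsplitter.WithChunkSize(512),
		textsplitter.WithChunkOverlap(64),
	)

	start := time.Now()
	chunks, err := splitter.SplitText(doc)
	if err != nil {
		panic(err)
	}
	elapsed := time.Since(start)

	mb := float64(len(doc)) / (1 << 20)
	fmt.Printf("%d chunks, %.1f MB in %v (%.1f MB/s)\n",
		len(chunks), mb, elapsed, mb/elapsed.Seconds())
}
```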
#### 8.2 Embedding Batching

- Go: `cd data_embedding_batching && go run .`
- Python: `cd data_embedding_batching && python embedding.py`
- Analysis: Compare the Throughput (docs/sec) for both the Sequential and Concurrent tests. The performance gap should widen significantly in the concurrent test; see the fan-out sketch below.
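A fan-out sketch of concurrent embedding, assuming `llama3:8b` can also serve embeddings through langchaingo's `embeddings.NewEmbedder` (the repo may use a dedicated embedding model); document and batch counts are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	docs := make([]string, 256)
	for i := range docs {
		docs[i] = fmt.Sprintf("synthetic document number %d", i)
	}

	// Fan the batches out across goroutines; the server serialises work on
	// the GPU, but overlapping requests hides client- and network-side latency.
	const batch = 32
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < len(docs); i += batch {
		wg.Add(1)
		go func(b []string) {
			defer wg.Done()
			if _, err := embedder.EmbedDocuments(ctx, b); err != nil {
				fmt.Println("batch failed:", err)
			}
		}(docs[i : i+batch])
	}
	wg.Wait()

	elapsed := time.Since(start)
	fmt.Printf("embedded %d docs in %v (%.1f docs/sec)\n",
		len(docs), elapsed, float64(len(docs))/elapsed.Seconds())
}
```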
### Scenario 9: Observability Overhead

Objective: Quantify the performance cost of adding production-style logging.

- Go: `cd observability_overhead && go run .`
- Python: `cd observability_overhead && python observability.py`
- Analysis: Look at the final summary table. Throughput Degradation (%) shows the real-world cost of turning on monitoring; lower is better. A baseline-vs-instrumented sketch follows below.
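A baseline-vs-instrumented sketch using langchaingo's `callbacks.LogHandler` attached to the client; run counts are illustrative and results will vary with the model:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/callbacks"
	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

// timeRuns measures ops/sec over n sequential generations.
func timeRuns(llm llms.Model, n int) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		_, _ = llms.GenerateFromSinglePrompt(context.Background(), llm, "Reply with OK.")
	}
	return float64(n) / time.Since(start).Seconds()
}

func main() {
	const n = 10

	plain, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	baseline := timeRuns(plain, n)

	// Same client with a logging callback handler attached.
	logged, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	logged.CallbacksHandler = callbacks.LogHandler{}
	instrumented := timeRuns(logged, n)

	fmt.Printf("baseline: %.2f ops/sec  instrumented: %.2f ops/sec  degradation: %.1f%%\n",
		baseline, instrumented, 100*(1-instrumented/baseline))
}
```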
### Scenario 10: GPU Saturation

Objective: Test the client's ability to keep a GPU-powered LLM server fully utilized.

Instructions:

1. In a separate terminal, run `ollama ps` or `watch -n 1 nvidia-smi` to monitor the server.
2. Run the benchmark scripts:
   - Go: `cd gpu_saturation && go run .`
   - Python: `cd gpu_saturation && python saturation.py`

Analysis: While monitoring your GPU, compare the final Throughput (req/sec). Higher throughput means the client is more efficient at feeding the GPU. Also compare the client-side CPU and memory metrics to see the resource cost of managing the load.
## Summary of Results

After running all scenarios, the results paint a clear picture of the trade-offs:

| Scenario | Key Metric | Winner | Insight |
|---|---|---|---|
| 1. Core | Latency & TTFT | Go (Slight) | Lower framework overhead per call. More responsive streaming. |
| 2. RAG | Ingestion & Query Speed | Go (Significant) | Much faster data prep and lower latency on queries due to cumulative efficiency. |
| 3. Agents | Loop Overhead | Go (Moderate) | More efficient per-step reasoning; the advantage grows with task complexity. |
| 4. Concurrency | Throughput & P99 | Go (Massive) | The key production differentiator. Handles high user loads far more effectively. |
| 5. Memory | Memory Footprint | Go (Significant) | Drastically lower RAM usage and more efficient GC for long-running stateful apps. |
| 6. Resiliency | Predictability | Go | Idiomatic `context` and explicit error propagation are ideal for microservices. |
| 6. Resiliency | Self-Healing | Python | Better built-in tools for automatically recovering from parsing errors. |
| 7. Workflows | Framework Overhead | Go (Massive) | Over 100x lower overhead for internal logic, freeing up CPU for useful work. |
| 8. Data Pipeline | Processing Speed | Go (Massive) | Nearly 10x faster on CPU-bound text splitting. 50%+ higher throughput on concurrent embedding. |
| 9. Observability | Performance Cost | Go (Significant) | Minimal (~5%) overhead for callbacks vs. Python's substantial (~20-40%) performance hit. |
| 10. GPU Saturation | Throughput | Go (Significant) | Feeds the GPU more efficiently with far fewer client-side resources. |
## License

This project is licensed under the MIT License. See the LICENSE file for details.