# LangChain in Go vs. LangChain in Python

A comprehensive suite of benchmarks comparing the performance, resource efficiency, and resiliency of LangChainGo against the original LangChain Python implementation under realistic, production-grade workloads.
This repository moves beyond simple "hello world" examples to provide quantitative data on how each framework performs on tasks that matter for building scalable, robust, and cost-effective AI applications. All tests are designed to run against a local Ollama server to isolate the performance of the client-side framework.
## Key Metrics Measured

- Throughput (ops/sec)
- Latency (Average, P95, P99)
- CPU Usage (User Time)
- Memory Usage (RSS & Allocation Churn)
- Garbage Collection Pauses (Go)
- Error Handling & Resiliency
## Table of Contents

- Key Metrics Measured
- Quick Start: Setup & Run All
- Prerequisites
- Environment Setup
- Running the Benchmarks (Step-by-Step)
  - Scenario 1: Core LLM Interaction
  - Scenario 2: Retrieval-Augmented Generation (RAG) Pipeline
  - Scenario 3: Agentic Behavior
  - Scenario 4: Concurrency & Scalability
  - Scenario 5: Long-Term Memory Footprint
  - Scenario 6: Resiliency & Error Handling
  - Scenario 7: Complex Workflows
  - Scenario 8: Data Processing Pipeline
  - Scenario 9: Observability Overhead
  - Scenario 10: GPU Saturation
- Summary of Results
- License
## Quick Start: Setup & Run All

For a quick setup, a shell script is provided to install all dependencies.

```bash
# Make the setup script executable
chmod +x setup_all.sh

# Run the script to install all Go and Python dependencies
./setup_all.sh
```

You can then navigate into each scenario's directory and run the benchmarks as described below.
## Prerequisites

Before you begin, ensure you have the following installed:

- Git: To clone the repository.
- Go: Version 1.22 or later.
- Python: Version 3.9 or later (with `pip`).
- Docker & Docker Compose: To run the ChromaDB vector store.
- Ollama: Installed and running (download from [ollama.com](https://ollama.com)).
## Environment Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/FareedKhan-dev/langchain-go-vs-python.git
   cd langchain-go-vs-python
   ```

2. Pull the Ollama model:

   ```bash
   ollama pull llama3:8b
   ```

3. Start services: bring up the ChromaDB vector store using Docker Compose. This service is required for all RAG-related benchmarks (Scenarios 2 and 4.1). Ensure your Ollama server application is also running in the background.

   ```bash
   docker-compose up -d
   ```

4. Install dependencies: you can install all dependencies at once using the provided helper script, `setup_all.sh` (create this file in the `langchain-go-vs-python` root):

   ```bash
   #!/bin/bash
   set -e

   echo "--- Installing all Go and Python dependencies ---"

   # Find all go.mod files and run `go mod tidy` in their directories
   find . -name "go.mod" -print0 | while IFS= read -r -d $'\0' file; do
       dir=$(dirname "$file")
       echo "Setting up Go dependencies in: $dir"
       (cd "$dir" && go mod tidy)
   done

   # Find all requirements.txt files and run `pip install`
   find . -name "requirements.txt" -print0 | while IFS= read -r -d $'\0' file; do
       dir=$(dirname "$file")
       echo "Setting up Python dependencies in: $dir"
       (cd "$dir" && pip install -r requirements.txt)
   done

   echo "--- All dependencies installed successfully! ---"
   ```

   Run it:

   ```bash
   chmod +x setup_all.sh
   ./setup_all.sh
   ```
## Running the Benchmarks (Step-by-Step)

Navigate into each directory to run the tests.
### Scenario 1: Core LLM Interaction

Objective: Measure baseline framework overhead and streaming responsiveness.

#### 1.1 Single-Turn Latency

- Go: `cd single_turn_latency && go run .`
- Python: `cd single_turn_latency && python latency.py`
- Analysis: Compares the raw speed of a simple request-response cycle. Expect Go to have slightly lower average latency and a tighter standard deviation, indicating more consistent performance. A sketch of this kind of measurement loop follows below.
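For reference, a minimal sketch of a latency-percentile measurement loop, assuming langchaingo's `llms/ollama` client and the `llama3:8b` model pulled above; the run count and prompt are illustrative, and the repository's actual harness may differ:

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	const runs = 20
	latencies := make([]time.Duration, 0, runs)
	var total time.Duration
	for i := 0; i < runs; i++ {
		start := time.Now()
		if _, err := llms.GenerateFromSinglePrompt(ctx, llm, "Reply with OK."); err != nil {
			panic(err)
		}
		d := time.Since(start)
		latencies = append(latencies, d)
		total += d
	}

	// Percentiles are read straight off the sorted sample.
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	p := func(q float64) time.Duration { return latencies[int(q*float64(runs-1))] }
	fmt.Printf("avg=%v p95=%v p99=%v\n", total/runs, p(0.95), p(0.99))
}
```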
#### 1.2 Time to First Token (TTFT) Streaming

- Go: `cd TTFT_streaming && go run .`
- Python: `cd TTFT_streaming && python ttft.py`
- Analysis: Measures how quickly the first piece of text appears. This is a critical UX metric: Go's efficient I/O and lightweight concurrency should yield a lower TTFT. See the streaming sketch below.
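A minimal sketch of how TTFT can be captured with langchaingo's streaming callback (`llms.WithStreamingFunc` is a standard langchaingo call option); the prompt and timing details here are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}

	var firstToken time.Duration
	start := time.Now()

	// Record the elapsed time when the first streamed chunk arrives.
	_, err = llms.GenerateFromSinglePrompt(context.Background(), llm,
		"Explain TTFT in one sentence.",
		llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
			if firstToken == 0 {
				firstToken = time.Since(start)
			}
			return nil
		}),
	)
	if err != nil {
		panic(err)
	}
	fmt.Printf("TTFT: %v, total: %v\n", firstToken, time.Since(start))
}
```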
### Scenario 2: Retrieval-Augmented Generation (RAG) Pipeline

Objective: Benchmark the full RAG lifecycle, from data ingestion to query answering.

#### 2.1 Ingestion Throughput

- Go: `cd ingestion_throughput && go run .`
- Python: `cd ingestion_throughput && python ingestion.py`
- Analysis: This is a heavy I/O- and CPU-bound task. Go is expected to be significantly faster, especially in the document-splitting phase. Compare the `Total time` for a clear winner.
Important: before running the benchmarks below, populate the database by running the Go ingestion script (`ingestion_throughput/ingestion.go`) with the final `store.RemoveCollection()` line commented out.
#### 2.2 Retrieval Latency

- Go: `cd retrieval_latency && go run .`
- Python: `cd retrieval_latency && python retrieval.py`

#### 2.3 End-to-End RAG

- Go: `cd end_to_end_rag && go run .`
- Python: `cd end_to_end_rag && python rag.py`
- Analysis: Compare the `Average Latency` in both tests. The cumulative efficiency of Go's framework should result in faster retrieval and a quicker final answer. A sketch of the Go query path follows below.
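A hedged sketch of the Go-side query path, assuming langchaingo's `vectorstores/chroma` store pointed at the ChromaDB instance from `docker-compose` and a hypothetical `benchmark` namespace; the repository's scripts may wire this up differently:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/chains"
	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/ollama"
	"github.com/tmc/langchaingo/vectorstores"
	"github.com/tmc/langchaingo/vectorstores/chroma"
)

func main() {
	ctx := context.Background()
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		panic(err)
	}

	// Connect to the ChromaDB instance started via docker-compose.
	store, err := chroma.New(
		chroma.WithChromaURL("http://localhost:8000"),
		chroma.WithEmbedder(embedder),
		chroma.WithNameSpace("benchmark"), // hypothetical collection name
	)
	if err != nil {
		panic(err)
	}

	// Retrieve the top 4 chunks and stuff them into the answer prompt.
	qa := chains.NewRetrievalQAFromLLM(llm, vectorstores.ToRetriever(store, 4))

	start := time.Now()
	answer, err := chains.Run(ctx, qa, "What is the main topic of the corpus?")
	if err != nil {
		panic(err)
	}
	fmt.Printf("answer: %s\nlatency: %v\n", answer, time.Since(start))
}
```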
### Scenario 3: Agentic Behavior

Objective: Measure the overhead of the agent's "reasoning loop."

#### 3.1 Single Tool Call

- Go: `cd agent_single_tool && go run . single`
- Python: `cd agent_single_tool && python agent.py single`

#### 3.2 Multi-Hop Reasoning

- Go: `cd agent_multi_hop && go run . multi`
- Python: `cd agent_multi_hop && python agent.py multi`

#### 3.3 High-Frequency Tool Calling

- Go: `cd agent_high_frequency && go run . high_freq`
- Python: `cd agent_high_frequency && python agent.py high_freq`

Analysis: Compare the `End-to-End Latency` and resource metrics (`Total Memory Allocated` in Go vs. `RSS Increase` in Python). The performance gap should widen as the number of agent steps increases, highlighting Go's lower loop overhead. A minimal Go agent sketch follows below.
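For orientation, a minimal langchaingo agent with a single tool, using the library's `agents.Initialize` helper and built-in `tools.Calculator`; the benchmark's actual tools and prompts may differ:

```go
package main

import (
	"context"
	"fmt"

	"github.com/tmc/langchaingo/agents"
	"github.com/tmc/langchaingo/chains"
	"github.com/tmc/langchaingo/llms/ollama"
	"github.com/tmc/langchaingo/tools"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}

	// A single calculator tool; each extra tool call adds one reasoning hop.
	executor, err := agents.Initialize(
		llm,
		[]tools.Tool{tools.Calculator{}},
		agents.ZeroShotReactDescription,
	)
	if err != nil {
		panic(err)
	}

	answer, err := chains.Run(context.Background(), executor, "What is 23 * 17?")
	if err != nil {
		panic(err)
	}
	fmt.Println(answer)
}
```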
### Scenario 4: Concurrency & Scalability

Objective: Simulate a production server handling simultaneous user requests. This is a critical test of real-world performance.

#### 4.1 Concurrent RAG

- Go: `cd concurrent_rag && go run .`
- Python: `cd concurrent_rag && python rag.py`
- Analysis: The key metric is Throughput (ops/sec). Also compare the P99 Latency, which shows the worst-case performance for users under load. Go's true parallelism is expected to yield dramatically better results here.
#### 4.2 Concurrent Agents

- Go: `cd concurrent_agents && go run .`
- Python: `cd concurrent_agents && python agent.py`
- Analysis: This is the ultimate stress test. Compare Throughput and P99 Latency; expect Go to handle the complex, stateful concurrent load far more efficiently. A worker-pool sketch of this load pattern follows below.
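A worker-pool sketch of the concurrent load pattern in plain Go; worker and request counts are illustrative, and the real benchmarks issue RAG or agent calls instead of the simple prompt used here:

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	const workers, requests = 16, 64
	jobs := make(chan int)
	latencies := make([]time.Duration, requests)
	var wg sync.WaitGroup

	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				t := time.Now()
				// Each goroutine issues independent requests; errors are
				// ignored here to keep the sketch short.
				_, _ = llms.GenerateFromSinglePrompt(ctx, llm, "Reply with OK.")
				latencies[i] = time.Since(t)
			}
		}()
	}
	for i := 0; i < requests; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	elapsed := time.Since(start)
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	fmt.Printf("throughput: %.2f ops/sec  p99: %v\n",
		float64(requests)/elapsed.Seconds(),
		latencies[int(0.99*float64(requests-1))])
}
```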
### Scenario 5: Long-Term Memory Footprint

Objective: Evaluate resource efficiency for long-running, stateful conversations.

- Go: `cd memory_footprint && go run .`
- Python: `cd memory_footprint && python memory.py`
- Analysis: Examine the results table. Compare Go's low `Heap Alloc` and short `Total GC Pause` time against Python's steadily growing `Memory Increase (RSS)`. This demonstrates long-term stability and lower operational cost. The snippet below shows how the Go-side counters can be read.
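On the Go side, counters like these come from the runtime itself; a minimal sketch using `runtime.ReadMemStats`, with a synthetic string buffer standing in for a real chat-history memory object:

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"time"
)

// printMemStats reports heap usage, cumulative allocations, and total GC
// pause time, the counters the Go-side results tables are built from.
func printMemStats(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-8s heap=%6.1f MiB  totalAlloc=%7.1f MiB  gcPause=%v\n",
		label,
		float64(m.HeapAlloc)/(1<<20),
		float64(m.TotalAlloc)/(1<<20),
		time.Duration(m.PauseTotalNs))
}

func main() {
	printMemStats("start")

	// Simulate a long-running conversation by appending turns to a buffer.
	history := make([]string, 0, 10000)
	for i := 0; i < 10000; i++ {
		history = append(history, strings.Repeat("token ", 100))
	}
	printMemStats("loaded")

	runtime.GC()
	printMemStats("post-GC")
	_ = history
}
```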
### Scenario 6: Resiliency & Error Handling

Objective: Test how gracefully each framework handles common production failures.

#### 6.1 Request Timeout

- Go: `cd resiliency_timeout && go run .`
- Python: `cd resiliency_timeout && python timeout.py`
- Analysis: Verify that both scripts exit promptly around the 2-second mark and report a timeout error. Note Go's use of the standard `context.DeadlineExceeded`, a more idiomatic and flexible pattern; a sketch follows below.
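A minimal sketch of the Go timeout pattern, assuming the langchaingo client surfaces the context error (possibly wrapped, which `errors.Is` handles); the 2-second deadline matches the benchmark's setup:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}

	// A 2-second deadline: the context is cancelled mid-generation and the
	// call returns promptly instead of blocking on the slow server.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	_, err = llms.GenerateFromSinglePrompt(ctx, llm, "Write a very long essay.")
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		fmt.Println("timed out after 2s, as expected")
	case err != nil:
		fmt.Println("failed for another reason:", err)
	default:
		fmt.Println("completed before the deadline")
	}
}
```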
#### 6.2 Tool Failure

- Go: `cd resiliency_tool_failure && go run .`
- Python: `cd resiliency_tool_failure && python tool_fail.py`
- Analysis: Observe the different error-handling philosophies. Go fails fast, propagating the specific error for programmatic handling; Python's agent internally reasons about the failure and produces a conversational response.
#### 6.3 Parsing Failure

- Go: `cd resiliency_parsing_failure && go run .`
- Python: `cd resiliency_parsing_failure && python parsing.py`
- Analysis: Both fail correctly, but the Python ecosystem has the advantage of built-in retrying parsers such as `RetryOutputParser`, which provide automatic self-healing.
### Scenario 7: Complex Workflows

Objective: Measure the internal framework overhead for non-linear logic and data manipulation.

- Go: `cd workflow_routing && go run .` / `cd workflow_transformation && go run .`
- Python: `cd workflow_routing && python routing.py` / `cd workflow_transformation && python transformation.py`
- Analysis: The most important metric is Framework Logic Overhead (Go) vs. Process CPU Time Used (Python). This isolates the cost of the framework's "glue code" and shows Go's massive efficiency advantage.
### Scenario 8: Data Processing Pipeline

Objective: Deep-dive into the performance of CPU- and I/O-intensive data preparation for RAG.

#### 8.1 Text Splitting

- Go: `cd data_text_splitting && go run .`
- Python: `cd data_text_splitting && python splitter.py`
- Analysis: Compare Throughput (MB/s). This is a raw CPU benchmark where Go's compiled nature is expected to provide a near order-of-magnitude advantage. A throughput-measurement sketch follows below.
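A sketch of the MB/s measurement, assuming langchaingo's `textsplitter.NewRecursiveCharacter`; the synthetic corpus size and chunk parameters are illustrative:

```go
package main

import (
	"fmt"
	"strings"
	"time"

	"github.com/tmc/langchaingo/textsplitter"
)

func main() {
	// ~10 MB of synthetic text stands in for the benchmark corpus.
	doc := strings.Repeat("The quick brown fox jumps over the lazy dog. ", 220000)

	splitter := textsplitter.NewRecursiveCharacter(
		textsplitter.WithChunkSize(512),
		textsplitter.WithChunkOverlap(64),
	)

	start := time.Now()
	chunks, err := splitter.SplitText(doc)
	if err != nil {
		panic(err)
	}
	elapsed := time.Since(start)

	mb := float64(len(doc)) / (1 << 20)
	fmt.Printf("%d chunks, %.1f MB in %v (%.1f MB/s)\n",
		len(chunks), mb, elapsed, mb/elapsed.Seconds())
}
```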
#### 8.2 Embedding Batching

- Go: `cd data_embedding_batching && go run .`
- Python: `cd data_embedding_batching && python embedding.py`
- Analysis: Compare the Throughput (docs/sec) for both the Sequential and Concurrent tests. The performance gap should widen significantly in the concurrent test; see the fan-out sketch below.
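A fan-out sketch of concurrent embedding, assuming `llama3:8b` can also serve embeddings through langchaingo's `embeddings.NewEmbedder` (the repo may use a dedicated embedding model); document and batch counts are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/ollama"
)

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	docs := make([]string, 256)
	for i := range docs {
		docs[i] = fmt.Sprintf("synthetic document number %d", i)
	}

	// Fan the batches out across goroutines; the server serialises work on
	// the GPU, but overlapping requests hides client- and network-side latency.
	const batch = 32
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < len(docs); i += batch {
		wg.Add(1)
		go func(b []string) {
			defer wg.Done()
			if _, err := embedder.EmbedDocuments(ctx, b); err != nil {
				fmt.Println("batch failed:", err)
			}
		}(docs[i : i+batch])
	}
	wg.Wait()

	elapsed := time.Since(start)
	fmt.Printf("embedded %d docs in %v (%.1f docs/sec)\n",
		len(docs), elapsed, float64(len(docs))/elapsed.Seconds())
}
```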
### Scenario 9: Observability Overhead

Objective: Quantify the performance cost of adding production-style logging.

- Go: `cd observability_overhead && go run .`
- Python: `cd observability_overhead && python observability.py`
- Analysis: Look at the final summary table. Throughput Degradation (%) shows the real-world cost of turning on monitoring; lower is better. A baseline-vs-instrumented sketch follows below.
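A baseline-vs-instrumented sketch using langchaingo's `callbacks.LogHandler` attached to the client; run counts are illustrative and results will vary with the model:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/tmc/langchaingo/callbacks"
	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

// timeRuns measures ops/sec over n sequential generations.
func timeRuns(llm llms.Model, n int) float64 {
	start := time.Now()
	for i := 0; i < n; i++ {
		_, _ = llms.GenerateFromSinglePrompt(context.Background(), llm, "Reply with OK.")
	}
	return float64(n) / time.Since(start).Seconds()
}

func main() {
	const n = 10

	plain, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	baseline := timeRuns(plain, n)

	// Same client with a logging callback handler attached.
	logged, err := ollama.New(ollama.WithModel("llama3:8b"))
	if err != nil {
		panic(err)
	}
	logged.CallbacksHandler = callbacks.LogHandler{}
	instrumented := timeRuns(logged, n)

	fmt.Printf("baseline: %.2f ops/sec  instrumented: %.2f ops/sec  degradation: %.1f%%\n",
		baseline, instrumented, 100*(1-instrumented/baseline))
}
```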
### Scenario 10: GPU Saturation

Objective: Test the client's ability to keep a GPU-powered LLM server fully utilized.

Instructions:

1. In a separate terminal, run `ollama ps` or `watch -n 1 nvidia-smi` to monitor the server.
2. Run the benchmark scripts:
   - Go: `cd gpu_saturation && go run .`
   - Python: `cd gpu_saturation && python saturation.py`

Analysis: While monitoring your GPU, compare the final Throughput (req/sec). Higher throughput means the client is more efficient at feeding the GPU. Also compare the client-side CPU and memory metrics to see the resource cost of managing the load.
## Summary of Results

After running all scenarios, the results paint a clear picture of the trade-offs:

| Scenario | Key Metric | Winner | Insight |
|---|---|---|---|
| 1. Core | Latency & TTFT | Go (Slight) | Lower framework overhead per call. More responsive streaming. |
| 2. RAG | Ingestion & Query Speed | Go (Significant) | Much faster data prep and lower latency on queries due to cumulative efficiency. |
| 3. Agents | Loop Overhead | Go (Moderate) | More efficient per-step reasoning; the advantage grows with task complexity. |
| 4. Concurrency | Throughput & P99 | Go (Massive) | The key production differentiator. Handles high user loads far more effectively. |
| 5. Memory | Memory Footprint | Go (Significant) | Drastically lower RAM usage and more efficient GC for long-running stateful apps. |
| 6. Resiliency | Predictability | Go | Idiomatic `context` and explicit error propagation are ideal for microservices. |
| 6. Resiliency | Self-Healing | Python | Better built-in tools for automatically recovering from parsing errors. |
| 7. Workflows | Framework Overhead | Go (Massive) | Over 100x lower overhead for internal logic, freeing up CPU for useful work. |
| 8. Data Pipeline | Processing Speed | Go (Massive) | Nearly 10x faster on CPU-bound text splitting. 50%+ higher throughput on concurrent embedding. |
| 9. Observability | Performance Cost | Go (Significant) | Minimal (~5%) overhead for callbacks vs. Python's substantial (~20-40%) performance hit. |
| 10. GPU Saturation | Throughput | Go (Significant) | Feeds the GPU more efficiently with far fewer client-side resources. |
## License

This project is licensed under the MIT License. See the LICENSE file for details.