
LangChain in Go vs. LangChain in Python


A comprehensive suite of benchmarks to compare the performance, resource efficiency, and resiliency of LangChainGo against the original LangChain Python implementation under realistic, production-grade workloads.


This repository moves beyond simple "hello world" examples to provide quantitative data on how each framework performs on tasks that matter for building scalable, robust, and cost-effective AI applications. All tests are designed to run against a local Ollama server to isolate the performance of the client-side framework.

Key Metrics Measured

  • 🚀 Throughput (ops/sec)
  • ⏱️ Latency (Average, P95, P99)
  • 💻 CPU Usage (User Time)
  • 🧠 Memory Usage (RSS & Allocation Churn)
  • 🗑️ Garbage Collection Pauses (Go)
  • 💥 Error Handling & Resiliency


🚀 Quick Start: Setup & Run All

For a quick setup, the setup_all.sh helper script (its contents are given in step 4 of the Environment Setup section below) installs all Go and Python dependencies.

# Make the setup script executable
chmod +x setup_all.sh

# Run the script to install all Go and Python dependencies
./setup_all.sh

You can then navigate into each scenario's directory and run the benchmarks as described below.

✅ Prerequisites

Before you begin, ensure you have the following installed:

  1. Git: To clone the repository.
  2. Go: Version 1.22 or later.
  3. Python: Version 3.9 or later (with pip).
  4. Docker & Docker Compose: To run the ChromaDB vector store.
  5. Ollama: Installed and running (download from https://ollama.com).

🛠️ Environment Setup

  1. Clone the Repository:

    git clone https://github.com/FareedKhan-dev/langchain-go-vs-python.git
    cd langchain-go-vs-python
  2. Pull the Ollama Model:

    ollama pull llama3:8b
  3. Start Services: Start the ChromaDB vector store using Docker Compose. This service is required for all RAG-related benchmarks (Scenarios 2 and 4.1).

    docker-compose up -d

    Ensure your Ollama server application is running in the background.

  4. Install Dependencies: Install all Go and Python dependencies at once with the setup_all.sh helper script. Create setup_all.sh in the langchain-go-vs-python root with the following contents:

    #!/bin/bash
    set -e
    echo "--- Installing all Go and Python dependencies ---"
    
    # Find all go.mod files and run `go mod tidy` in their directories
    find . -name "go.mod" -print0 | while IFS= read -r -d $'\0' file; do
        dir=$(dirname "$file")
        echo "Setting up Go dependencies in: $dir"
        (cd "$dir" && go mod tidy)
    done
    
    # Find all requirements.txt files and run `pip install`
    find . -name "requirements.txt" -print0 | while IFS= read -r -d $'\0' file; do
        dir=$(dirname "$file")
        echo "Setting up Python dependencies in: $dir"
        (cd "$dir" && pip install -r requirements.txt)
    done
    
    echo "--- All dependencies installed successfully! ---"

    Run it:

    chmod +x setup_all.sh
    ./setup_all.sh

📊 Running the Benchmarks (Step-by-Step)

Navigate into each directory to run the tests.


📡 Scenario 1: Core LLM Interaction

Objective: Measure baseline framework overhead and streaming responsiveness.

1.1: Single-Turn Completion Latency

  • Go: cd single_turn_latency && go run .
  • Python: cd single_turn_latency && python latency.py
  • Analysis: Compares the raw speed of a simple request-response cycle. Expect Go to have slightly lower average latency and a tighter standard deviation, indicating more consistent performance.
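
The single-turn measurement is conceptually just a timed request-response cycle. As a rough, illustrative sketch (not the repository's actual script), assuming langchaingo's ollama client and the llms.GenerateFromSinglePrompt helper, with an arbitrary prompt:

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/ollama"
)

func main() {
    // Create a client for the local Ollama server (model name is illustrative).
    llm, err := ollama.New(ollama.WithModel("llama3:8b"))
    if err != nil {
        panic(err)
    }

    ctx := context.Background()
    start := time.Now()

    // One request-response cycle: the timer covers client-side framework
    // overhead plus server time, and the server cost is identical for Go
    // and Python, so differences reflect the framework.
    _, err = llms.GenerateFromSinglePrompt(ctx, llm, "Explain goroutines in one sentence.")
    if err != nil {
        panic(err)
    }

    fmt.Printf("single-turn latency: %s\n", time.Since(start))
}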

1.2: Streaming Time-to-First-Token (TTFT)

  • Go: cd TTFT_streaming && go run .
  • Python: cd TTFT_streaming && python ttft.py
  • Analysis: Measures how quickly the first piece of text appears. This is a critical UX metric. Go's efficient I/O and lightweight concurrency should yield a lower TTFT.
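
TTFT is measured by timestamping the first streamed chunk. A minimal sketch of the idea, assuming langchaingo's llms.WithStreamingFunc call option (the prompt and model are illustrative; the real measurement lives in TTFT_streaming/):

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/ollama"
)

func main() {
    llm, err := ollama.New(ollama.WithModel("llama3:8b"))
    if err != nil {
        panic(err)
    }

    start := time.Now()
    var firstToken time.Duration

    // The streaming callback fires for every chunk; record the elapsed
    // time when the first non-empty chunk arrives.
    _, err = llms.GenerateFromSinglePrompt(context.Background(), llm,
        "Write a short poem about Go.",
        llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
            if firstToken == 0 && len(chunk) > 0 {
                firstToken = time.Since(start)
            }
            return nil
        }),
    )
    if err != nil {
        panic(err)
    }

    fmt.Printf("time to first token: %s\n", firstToken)
}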

📚 Scenario 2: Retrieval-Augmented Generation (RAG) Pipeline

Objective: Benchmark the full RAG lifecycle, from data ingestion to query answering.

2.1: Ingestion Throughput

  • Go: cd ingestion_throughput && go run .
  • Python: cd ingestion_throughput && python ingestion.py
  • Analysis: This is a heavy I/O and CPU-bound task. Go is expected to be significantly faster, especially in the document splitting phase. Compare the Total time for a clear winner.

2.2 & 2.3: Retrieval and End-to-End Latency

Important: First, populate the database by running the Go ingestion script (ingestion_throughput/ingestion.go) after commenting out the final store.RemoveCollection() line.

  • Retrieval Latency:
    • Go: cd retrieval_latency && go run .
    • Python: cd retrieval_latency && python retrieval.py
  • End-to-End RAG:
    • Go: cd end_to_end_rag && go run .
    • Python: cd end_to_end_rag && python rag.py
  • Analysis: Compare the Average Latency in both tests. The cumulative efficiency of Go's framework should result in faster retrieval and a quicker final answer.

🤖 Scenario 3: Agentic Behavior

Objective: Measure the overhead of the agent's "reasoning loop."

  • Single Tool Call:
    • Go: cd agent_single_tool && go run . single
    • Python: cd agent_single_tool && python agent.py single
  • Multi-Hop Reasoning:
    • Go: cd agent_multi_hop && go run . multi
    • Python: cd agent_multi_hop && python agent.py multi
  • High-Frequency Tool Calling:
    • Go: cd agent_high_frequency && go run . high_freq
    • Python: cd agent_high_frequency && python agent.py high_freq
  • Analysis: Compare the End-to-End Latency and resource metrics (Total Memory Allocated in Go vs. RSS Increase in Python). The performance gap should widen as the number of agent steps increases, highlighting Go's lower loop overhead.
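
For context on what the agent benchmarks exercise: in langchaingo, a tool is any type with Name, Description, and Call methods (the assumed shape of the tools.Tool interface in current releases). The toy calculator below only illustrates that shape; it is not the repository's actual tool:

package main

import (
    "context"
    "fmt"
    "strconv"
    "strings"
)

// Calculator is a toy tool: it multiplies two integers passed as "a,b".
// The three methods match the assumed Name/Description/Call shape of
// langchaingo's tools.Tool interface, so a value of this type could be
// handed to an agent executor.
type Calculator struct{}

func (c Calculator) Name() string        { return "calculator" }
func (c Calculator) Description() string { return "Multiplies two integers given as 'a,b'." }

func (c Calculator) Call(ctx context.Context, input string) (string, error) {
    parts := strings.Split(input, ",")
    if len(parts) != 2 {
        return "", fmt.Errorf("expected 'a,b', got %q", input)
    }
    a, err := strconv.Atoi(strings.TrimSpace(parts[0]))
    if err != nil {
        return "", err
    }
    b, err := strconv.Atoi(strings.TrimSpace(parts[1]))
    if err != nil {
        return "", err
    }
    return strconv.Itoa(a * b), nil
}

func main() {
    out, _ := Calculator{}.Call(context.Background(), "6, 7")
    fmt.Println(out) // 42
}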

🌐 Scenario 4: Concurrency & Scalability

Objective: Simulate a production server handling simultaneous user requests. This is a critical test of real-world performance.

4.1: Concurrent RAG Queries

  • Go: cd concurrent_rag && go run .
  • Python: cd concurrent_rag && python rag.py
  • Analysis: The key metric is Throughput (ops/sec). Also, compare the P99 Latency, which shows the worst-case performance for users under load. Go's true parallelism is expected to yield dramatically better results here.
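
Throughput and tail latency are computed the same way in both harnesses: fire N requests across a fixed worker pool, record per-request latencies, then sort. A minimal Go sketch, with a placeholder doQuery standing in for the real RAG call:

package main

import (
    "fmt"
    "sort"
    "sync"
    "time"
)

// doQuery is a stand-in for one RAG query; the real benchmark issues a
// retrieval plus an LLM call here.
func doQuery(id int) { time.Sleep(50 * time.Millisecond) }

func main() {
    const concurrency, total = 16, 200

    latencies := make([]time.Duration, total)
    jobs := make(chan int)
    var wg sync.WaitGroup

    start := time.Now()
    for w := 0; w < concurrency; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := range jobs {
                t := time.Now()
                doQuery(i)
                latencies[i] = time.Since(t) // each index written by exactly one goroutine
            }
        }()
    }
    for i := 0; i < total; i++ {
        jobs <- i
    }
    close(jobs)
    wg.Wait()
    elapsed := time.Since(start)

    sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
    p99 := latencies[int(float64(total)*0.99)-1]
    fmt.Printf("throughput: %.1f ops/sec, p99: %s\n", float64(total)/elapsed.Seconds(), p99)
}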

4.2: Concurrent Agent Executions

  • Go: cd concurrent_agents && go run .
  • Python: cd concurrent_agents && python agent.py
  • Analysis: This is the ultimate stress test. Compare Throughput and P99 Latency. Expect Go to handle the complex, stateful concurrent load far more efficiently.

🧠 Scenario 5: Long-Term Memory Footprint

Objective: Evaluate resource efficiency for long-running, stateful conversations.

  • Go: cd memory_footprint && go run .
  • Python: cd memory_footprint && python memory.py
  • Analysis: Examine the results table. Compare Go's low Heap Alloc and efficient Total GC Pause time against Python's steadily growing Memory Increase (RSS). This demonstrates long-term stability and lower operational cost.
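
On the Go side, heap usage and GC pause totals come straight from the runtime; a sketch of that sampling, with a synthetic conversation history standing in for the real memory workload:

package main

import (
    "fmt"
    "runtime"
    "time"
)

// reportMemory prints the heap and GC counters that the benchmark
// compares against Python's RSS growth.
func reportMemory(label string) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("[%s] heap alloc: %.2f MiB, total GC pause: %s, GC cycles: %d\n",
        label, float64(m.HeapAlloc)/(1024*1024), time.Duration(m.PauseTotalNs), m.NumGC)
}

func main() {
    reportMemory("before")

    // Stand-in for a long conversation: grow an in-memory history.
    history := make([]string, 0, 1024)
    for i := 0; i < 100_000; i++ {
        history = append(history, "user message and model reply kept in memory")
    }

    reportMemory("after")
}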

🛡️ Scenario 6: Resiliency & Error Handling

Objective: Test how gracefully each framework handles common production failures.

6.1: LLM Service Timeout

  • Go: cd resiliency_timeout && go run .
  • Python: cd resiliency_timeout && python timeout.py
  • Analysis: Verify both scripts exit promptly around the 2-second mark and report a timeout error. Note Go's use of the standard context.DeadlineExceeded, a more idiomatic and flexible pattern.
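
The essential Go pattern is a context.WithTimeout wrapped around the call and an errors.Is check for context.DeadlineExceeded; the sketch below uses a placeholder slowLLMCall instead of a real model request:

package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// slowLLMCall stands in for a request to an unresponsive model server.
func slowLLMCall(ctx context.Context) error {
    select {
    case <-time.After(30 * time.Second): // pretend the server hangs
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

func main() {
    // Give the whole call a 2-second budget, matching the benchmark.
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()

    if err := slowLLMCall(ctx); errors.Is(err, context.DeadlineExceeded) {
        fmt.Println("timed out as expected:", err)
    }
}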

6.2: Agent Tool Failure

  • Go: cd resiliency_tool_failure && go run .
  • Python: cd resiliency_tool_failure && python tool_fail.py
  • Analysis: Observe the different error handling philosophies. Go fails fast, propagating the specific error for programmatic handling. Python's agent internally reasons about the failure and produces a conversational response.

6.3: Output Parsing Failure

  • Go: cd resiliency_parsing_failure && go run .
  • Python: cd resiliency_parsing_failure && python parsing.py
  • Analysis: Both fail correctly, but the Python ecosystem's advantage is the availability of built-in RetryOutputParsers, which provide automatic self-healing.
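
langchaingo has no direct counterpart to RetryOutputParser, but the same self-healing idea can be approximated with a retry loop around the parse step. The sketch below is purely illustrative: parseWithRetry and regenerate are hypothetical helpers, and regenerate would normally re-prompt the model with the parse error attached:

package main

import (
    "encoding/json"
    "fmt"
)

type Answer struct {
    Title string `json:"title"`
    Score int    `json:"score"`
}

// regenerate stands in for re-asking the LLM, ideally with the parse
// error appended to the prompt so the model can correct itself.
func regenerate(prev string, parseErr error) string {
    return `{"title": "fixed output", "score": 7}` // hypothetical corrected reply
}

// parseWithRetry retries parsing up to maxRetries times, regenerating
// the raw text after each failure.
func parseWithRetry(raw string, maxRetries int) (Answer, error) {
    var a Answer
    var err error
    for i := 0; i <= maxRetries; i++ {
        if err = json.Unmarshal([]byte(raw), &a); err == nil {
            return a, nil
        }
        raw = regenerate(raw, err)
    }
    return a, fmt.Errorf("still unparsable after %d retries: %w", maxRetries, err)
}

func main() {
    got, err := parseWithRetry(`not json at all`, 2)
    fmt.Println(got, err)
}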

⛓️ Scenario 7: Complex Workflows

Objective: Measure the internal framework overhead for non-linear logic and data manipulation.

7.1 & 7.2: Routing and Transformation

  • Routing:
    • Go: cd workflow_routing && go run .
    • Python: cd workflow_routing && python routing.py
  • Transformation:
    • Go: cd workflow_transformation && go run .
    • Python: cd workflow_transformation && python transformation.py
  • Analysis: The most important metric is Framework Logic Overhead (Go) vs. Process CPU Time Used (Python). This isolates the cost of the framework's "glue code" and shows Go's massive efficiency advantage.
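
One way to read "framework logic overhead" is total wall-clock time minus the time spent inside model calls. The bookkeeping looks roughly like the sketch below (the repository's scripts may compute it differently; the sleep stands in for an actual LLM call):

package main

import (
    "fmt"
    "time"
)

func main() {
    var llmTime time.Duration

    // timedLLMCall wraps every model invocation so its duration can be
    // subtracted from the total; the sleep is a stand-in for a real call.
    timedLLMCall := func() {
        t := time.Now()
        time.Sleep(120 * time.Millisecond)
        llmTime += time.Since(t)
    }

    start := time.Now()
    // Routing + transformation "glue": pick a branch, call the model,
    // post-process the output.
    timedLLMCall()
    timedLLMCall()
    total := time.Since(start)

    fmt.Printf("total: %s, llm: %s, framework overhead: %s\n",
        total, llmTime, total-llmTime)
}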

💾 Scenario 8: Data Processing Pipeline

Objective: Deep dive into the performance of CPU- and I/O-intensive data preparation for RAG.

8.1: Text Splitter Throughput

  • Go: cd data_text_splitting && go run .
  • Python: cd data_text_splitting && python splitter.py
  • Analysis: Compare Throughput (MB/s). This is a raw CPU benchmark where Go's compiled nature is expected to provide a near order-of-magnitude advantage.
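
MB/s falls out of dividing input size by split time. The sketch assumes langchaingo's textsplitter package exposes NewRecursiveCharacter with chunk-size/overlap options and a SplitText method, as in recent releases; the corpus and settings are illustrative:

package main

import (
    "fmt"
    "strings"
    "time"

    "github.com/tmc/langchaingo/textsplitter"
)

func main() {
    // Synthetic corpus: roughly 10 MB of repeated text (illustrative only).
    doc := strings.Repeat("LangChain benchmarks compare Go and Python. ", 250_000)

    splitter := textsplitter.NewRecursiveCharacter(
        textsplitter.WithChunkSize(512),
        textsplitter.WithChunkOverlap(64),
    )

    start := time.Now()
    chunks, err := splitter.SplitText(doc)
    if err != nil {
        panic(err)
    }
    elapsed := time.Since(start)

    mb := float64(len(doc)) / (1024 * 1024)
    fmt.Printf("%d chunks, %.2f MB in %s (%.2f MB/s)\n",
        len(chunks), mb, elapsed, mb/elapsed.Seconds())
}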

8.2: Embedding Batching Performance

  • Go: cd data_embedding_batching && go run .
  • Python: cd data_embedding_batching && python embedding.py
  • Analysis: Compare the Throughput (docs/sec) for both the Sequential and Concurrent tests. The performance gap should widen significantly in the concurrent test.
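
The concurrent variant simply fans batches out across goroutines. In the sketch below, embedBatch is a hypothetical placeholder for a real embedding call, so the absolute numbers are meaningless; it only shows how the sequential and concurrent docs/sec figures are produced:

package main

import (
    "fmt"
    "sync"
    "time"
)

// embedBatch is a placeholder for one embedding API call over a batch of
// documents (the real benchmark hits Ollama's embedding endpoint here).
func embedBatch(batch []string) { time.Sleep(30 * time.Millisecond) }

func main() {
    docs := make([]string, 1000)
    const batchSize = 50

    // Build batches of batchSize documents each.
    var batches [][]string
    for i := 0; i < len(docs); i += batchSize {
        end := i + batchSize
        if end > len(docs) {
            end = len(docs)
        }
        batches = append(batches, docs[i:end])
    }

    // Sequential: one batch at a time.
    start := time.Now()
    for _, b := range batches {
        embedBatch(b)
    }
    fmt.Printf("sequential: %.1f docs/sec\n", float64(len(docs))/time.Since(start).Seconds())

    // Concurrent: fan the same batches out across goroutines.
    start = time.Now()
    var wg sync.WaitGroup
    for _, b := range batches {
        wg.Add(1)
        go func(b []string) {
            defer wg.Done()
            embedBatch(b)
        }(b)
    }
    wg.Wait()
    fmt.Printf("concurrent: %.1f docs/sec\n", float64(len(docs))/time.Since(start).Seconds())
}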

🔭 Scenario 9: Observability Overhead

Objective: Quantify the performance cost of adding production-style logging.

  • Go: cd observability_overhead && go run .
  • Python: cd observability_overhead && python observability.py
  • Analysis: Look at the final summary table. Throughput Degradation (%) shows the real-world cost of turning on monitoring; the lower the percentage, the cheaper observability is in production.

🌑️ Scenario 10: GPU Saturation

Objective: Test the client's ability to keep a GPU-powered LLM server fully utilized.

Instructions:

  1. Open a separate terminal and monitor the GPU with watch -n 1 nvidia-smi (or check loaded models and their GPU usage with ollama ps).
  2. Run the benchmark scripts:
    • Go: cd gpu_saturation && go run .
    • Python: cd gpu_saturation && python saturation.py
  • Analysis: While monitoring your GPU, compare the final Throughput (req/sec). Higher throughput means the client is more efficient at feeding the GPU. Also, compare the client-side CPU and Memory metrics to see the resource cost of managing the load.

πŸ† Summary of Results

After running all scenarios, the results will paint a clear picture of the trade-offs:

Scenario | Key Metric | 🏆 Winner | Insight
1. Core | Latency & TTFT | Go (Slight) | Lower framework overhead per call. More responsive streaming.
2. RAG | Ingestion & Query Speed | Go (Significant) | Much faster data prep and lower query latency due to cumulative efficiency.
3. Agents | Loop Overhead | Go (Moderate) | More efficient per-step reasoning; the advantage grows with task complexity.
4. Concurrency | Throughput & P99 | Go (Massive) | The key production differentiator: handles high user loads far more effectively.
5. Memory | Memory Footprint | Go (Significant) | Drastically lower RAM usage and more efficient GC for long-running stateful apps.
6. Resiliency | Predictability | Go | Idiomatic context and explicit error propagation are ideal for microservices.
6. Resiliency | Self-Healing | Python | Better built-in tools for automatically recovering from parsing errors.
7. Workflows | Framework Overhead | Go (Massive) | Over 100x lower overhead for internal logic, freeing up CPU for useful work.
8. Data Pipeline | Processing Speed | Go (Massive) | Nearly 10x faster on CPU-bound text splitting; 50%+ higher throughput on concurrent embedding.
9. Observability | Performance Cost | Go (Significant) | Minimal (~5%) overhead for callbacks vs. Python's substantial (~20-40%) performance hit.
10. GPU Saturation | Throughput | Go (Significant) | Feeds the GPU more efficiently with far fewer client-side resources.

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.