
ERGO-Extended: Inference-Time Multi-Turn Context Consolidation

Overview

Multi-turn LLMs show strong potential across a wide range of domains and are increasingly deployed in long-context settings. However, they are vulnerable to errors, often caused in part by degraded context, and these errors can propagate through a conversation. Existing work reduces errors by giving feedback, retrieving a specific piece of context or external data, or resetting the context, but these solutions often lack robustness or can introduce new errors. We improve on existing error-correction work by detecting more signals of potential hallucination, consolidating degraded multi-turn context, and applying safeguards so that the model is neither distracted by irrelevant information nor prone to hallucinating misinformation. We combine existing methods for using context effectively and efficiently, so that the model can respond appropriately to its context while minimizing additional cost, latency, complexity, information loss, and hallucination risk.

We build upon ERGO for inference-time context rewriting, extending it with three signals that trigger context consolidation: Shannon entropy, probability, and perplexity.
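For intuition, here is a minimal Python sketch of how these signals can be computed from per-token log-probabilities. This is an illustration, not the repository's implementation: the function names, the truncated-distribution entropy estimate, and the any-signal-crosses-its-threshold trigger rule are all assumptions; only the threshold values mirror the Basic Usage example below.

import math

def consolidation_signals(token_logprobs, top_logprobs):
    """Illustrative computation of the three trigger signals.

    token_logprobs: list of log-probabilities of the sampled tokens.
    top_logprobs: per position, a dict mapping candidate tokens to
        their log-probabilities (a truncated distribution, so the
        entropy below is only an approximation).
    """
    n = len(token_logprobs)
    # Mean Shannon entropy across positions: H = -sum(p * ln p).
    H = sum(
        -sum(math.exp(lp) * lp for lp in dist.values())
        for dist in top_logprobs
    ) / n
    # Mean log-probability of the sampled tokens.
    logp = sum(token_logprobs) / n
    # Perplexity: exponential of the mean negative log-likelihood.
    ppl = math.exp(-logp)
    return H, logp, ppl

def should_consolidate(H, logp, ppl,
                       threshold_H=0.03, threshold_p=-0.1, threshold_PPL=50):
    # Hypothetical trigger rule: consolidate when any signal crosses
    # its threshold. The actual combination logic lives in the repo.
    return H > threshold_H or logp < threshold_p or ppl > threshold_PPL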

Quick Start

Prerequisites

# Clone the repository
git clone https://github.com/RETprojects/ERGO-Extended.git
cd ERGO-Extended

# Install dependencies
pip install -r requirements.txt
  • To use OpenAI models, set the environment variable "OPENAI_KEY" to your API key (e.g., export OPENAI_KEY=...).

  • You will need to download the sharded dataset from Laban et al.; its path is passed as dataset_path below.

Basic Usage

from experiments.runExperiment import RunExperiment

# Initialize experiment with your chosen model
experiment = RunExperiment(
    model_name="HuggingFaceTB/SmolLM-135M-Instruct",
    device="cpu",
    device_map=None,
    max_new_tokens=1000
)

# Run ERGO-Extended on GSM8K dataset
experiment.run_GSM8K(
    dataset_path="sharded_dataset.json",  # path to sharded dataset from Laban et al.
    num_Qs=20,                            # number of questions to run
    num_runs=1,                           # number of runs per question
    threshold_H=0.03,                     # Shannon entropy threshold
    threshold_p=-0.1,                     # probability threshold
    threshold_PPL=50,                     # perplexity threshold
    output_path="outputs/gsm8k_example.json"
)

Run the example from the repository root:

python -m main.example_main

Repository Structure

ERGO-Extended/
│
├── evaluation/         # Evaluation metrics and scoring
│   ├── evaluator.py
│   ├── utils.py
│   └── eval.bfcl.py    # Taken from Laban et al.
│
├── core/               # Core ERGO-Extended implementation
│   ├── dataset.py
│   ├── model.py
│   └── utils.py
│
├── experiments/        # Experiment runner
│   └── runExperiment.py
│
├── generation/         # Generate with ERGO-Extended
│   └── generator.py
│
└── main/               # Example scripts
    └── example_main.py

Evaluated Tasks

ERGO-Extended has been evaluated on three generation tasks:

| Task      | Dataset       | Description                         | Metric          |
|-----------|---------------|-------------------------------------|-----------------|
| Math      | GSM8K         | Elementary math word problems       | Exact Match     |
| Code      | LiveCodeBench | Python function generation          | Test Suite Pass |
| API Calls | Berkeley FCL  | Function calling from instructions  | Call Validity   |
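As a concrete example of the first metric, the sketch below shows a GSM8K-style exact-match check. It is an assumption about how scoring could work, not the logic in evaluation/evaluator.py: it simply compares the last number in the model's answer against the gold answer.

import re

def exact_match(prediction: str, gold: str) -> bool:
    # Extract every number from the prediction (thousands separators
    # stripped) and compare the last one against the gold answer.
    nums = re.findall(r"-?\d+(?:\.\d+)?", prediction.replace(",", ""))
    return bool(nums) and nums[-1] == gold.strip()

print(exact_match("So the total is 1,250 apples.", "1250"))  # True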

Citation

If you use ERGO-Extended in your research, please cite our paper:

@misc{toutin-etal-2026-ergo-extended,
    title = "ERGO-Extended: Multi-Signal Context Consolidation for Multi-Turn LLMs",
    author = "Toutin, Rémi  and
      Madisetti, Vijay K.",
    year = "2026"
}

Contact

Lead Author: Rémi Toutin
📧 rtoutin3@gatech.edu

Corresponding Author: Dr. Vijay K. Madisetti
📧 vkm@gatech.edu

Code References
