MegaSF/Week4_AI_Project
Signal Relay: Multi-Agent Telephone

Measure how meaning decays across a chain of LLM agents (a game of "telephone"), and evaluate error-correction strategies that preserve fidelity.

Project Structure

FinalProject/
├── main.py                      # CLI entry point
├── config.py                    # Configuration dataclasses
├── requirements.txt             # Python dependencies
├── Knowledge/
│   └── instructions.md          # Experiment design document
├── signal_relay/
│   ├── __init__.py
│   ├── schema.py                # Message, HopRecord data models
│   ├── prompts.py               # Relay prompt templates
│   ├── relay.py                 # RelayChain engine (LLM calls)
│   ├── metrics.py               # Scoring & fidelity metrics
│   ├── experiment.py            # Experiment runner & batch matrix
│   ├── tasks.py                 # Pre-built baseline task messages
│   └── visualize.py             # Decay curves & comparison plots
└── experiments/                 # Auto-created output directory
    └── <run_id>/
        ├── original.yaml
        ├── hop_01.yaml … hop_NN.yaml
        ├── metrics.csv
        ├── decay_curve.png
        └── run_meta.json
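The data models in `signal_relay/schema.py` carry each message and its per-hop scores down the chain. A minimal sketch of what they might look like — the field names here are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """A structured message passed down the relay chain (hypothetical fields)."""
    text: str                                             # full message body
    constraints: list[str] = field(default_factory=list)  # rules to preserve
    keywords: list[str] = field(default_factory=list)     # checksum keywords
    items: list[str] = field(default_factory=list)        # ordered content items

@dataclass
class HopRecord:
    """One hop's output plus its fidelity score (hypothetical fields)."""
    hop: int           # 1-based hop index
    message: Message   # what this hop's agent produced
    fidelity: float    # overall fidelity vs. the original, in [0, 1]
```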

Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Set your API key

export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."

3. List available tasks

python main.py tasks

4. Run a single experiment

# Baseline relay, 5 hops, solar flare task
python main.py run --task solar_flare --mode baseline --hops 5

# Error-corrected relay, 7 hops
python main.py run --task solar_flare --mode error_corrected --hops 7

# With periodic repair prompts every 3 hops
python main.py run --task recipe --mode error_corrected --hops 10 --repair

5. Run the full experiment matrix

# All tasks × both modes × depths 3,5,7,10
python main.py matrix

# Custom subset
python main.py matrix --tasks solar_flare,recipe --modes baseline,error_corrected --depths 3,5,7
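The matrix is just the cross product of tasks, modes, and depths — each combination becomes one run. A sketch of the expansion (the task list here is an illustrative subset):

```python
from itertools import product

tasks = ["solar_flare", "recipe"]            # illustrative subset of tasks
modes = ["baseline", "error_corrected"]
depths = [3, 5, 7, 10]

# Every (task, mode, depth) combination is one experiment run.
runs = list(product(tasks, modes, depths))
print(len(runs))  # 2 tasks × 2 modes × 4 depths = 16 runs
```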

6. Plot & compare results

# Plot a single run
python main.py plot --run-dir experiments/<run_id>

# Compare multiple runs
python main.py compare --run-dirs experiments/run1,experiments/run2 --labels "Baseline,Error-Corrected"

Configuration Options

| Flag | Default | Description |
| --- | --- | --- |
| `--provider` | `openai` | LLM provider (`openai`, `anthropic`) |
| `--model` | `gpt-4o-mini` | Model name |
| `--temperature` | `0.0` | Sampling temperature (0 = deterministic) |
| `--seed` | `42` | Random seed for reproducibility |
| `--output-dir` | `experiments` | Base output directory |
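These flags likely map onto a dataclass in `config.py`. A minimal sketch assuming the field names mirror the flags (the actual dataclass may differ):

```python
from dataclasses import dataclass

@dataclass
class RelayConfig:
    """Run configuration mirroring the CLI defaults above (assumed fields)."""
    provider: str = "openai"        # "openai" or "anthropic"
    model: str = "gpt-4o-mini"
    temperature: float = 0.0        # 0 = deterministic sampling
    seed: int = 42
    output_dir: str = "experiments"
```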

Metrics

Each hop is scored against the original message:

| Metric | Weight | Description |
| --- | --- | --- |
| Constraint Fidelity | 0.4 | % of constraints preserved exactly |
| Keyword Retention | 0.3 | % of checksum keywords still present |
| Item Retention | 0.2 | % of content items retained (fuzzy match ≥ 0.7) |
| Order Preservation | 0.1 | Longest common subsequence of item IDs |
| Overall Fidelity | — | Weighted aggregate of the above |

Additional tracked metrics: edit distance ratio, hallucination count, number retention.
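Overall fidelity is the weighted sum of the four component scores, using the weights from the table above. The component scores below are made-up example values:

```python
# Weights from the metrics table; each component score is in [0, 1].
WEIGHTS = {
    "constraint_fidelity": 0.4,
    "keyword_retention": 0.3,
    "item_retention": 0.2,
    "order_preservation": 0.1,
}

def overall_fidelity(scores: dict[str, float]) -> float:
    """Weighted aggregate of the per-component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

example = {
    "constraint_fidelity": 1.0,
    "keyword_retention": 0.8,
    "item_retention": 0.5,
    "order_preservation": 1.0,
}
print(round(overall_fidelity(example), 2))  # 0.84 = 0.40 + 0.24 + 0.10 + 0.10
```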

Modes

  • Baseline — Simple "rewrite for the next agent" instruction
  • Error-Corrected — Strict fidelity rules + self-check verification
  • Repair (optional) — Periodic drift-correction prompt every N hops
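The three modes differ only in which prompt each hop receives and whether a repair prompt is interleaved every N hops. A sketch of the relay loop under that assumption, with a stand-in `call_agent` stub where `relay.py` would make the real LLM call (prompt texts and names here are illustrative):

```python
# Illustrative prompt templates; the real ones live in signal_relay/prompts.py.
BASELINE_PROMPT = "Rewrite this message for the next agent."
STRICT_PROMPT = "Rewrite, preserving every constraint exactly; then self-check."
REPAIR_PROMPT = "Check against the checksum keywords and repair any drift."

def call_agent(prompt: str, text: str) -> str:
    """Stub for the real LLM call; here it just echoes the text unchanged."""
    return text

def relay(text: str, hops: int, mode: str = "baseline",
          repair_every: int = 0) -> list[str]:
    """Run `hops` rewrites, optionally repairing every `repair_every` hops."""
    prompt = STRICT_PROMPT if mode == "error_corrected" else BASELINE_PROMPT
    outputs = []
    for hop in range(1, hops + 1):
        text = call_agent(prompt, text)
        if repair_every and hop % repair_every == 0:
            text = call_agent(REPAIR_PROMPT, text)  # periodic drift correction
        outputs.append(text)
    return outputs
```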
