# RemyxCodeExecutor: Run Research Papers in 30 Seconds

**Search → Execute → Explore** research papers with pre-built Docker environments and AI-guided code execution.

## The Reproducibility Problem

Running code from research papers is a dependency resolution and environment configuration problem:
```bash
# Standard approach:
git clone https://github.com/author/paper-code
conda create -n paper python=3.8
pip install -r requirements.txt  
# 30+ minutes of downloading dependencies
# CUDA 11.3 required, you have 12.1
# PyTorch 1.9 conflicts with transformers>=4.0
# Missing apt packages, system libraries
# Papers don't specify exact environment
# Code written for one GPU, you have different hardware
```
Even when authors provide installation instructions, you still have to resolve:

* Conflicting transitive dependencies
* CUDA/cuDNN version matching
* System-level library requirements
* Hardware-specific configurations
* Undocumented environment assumptions

**Hours of environment debugging before running a single experiment.**

## What RemyxCodeExecutor Provides
RemyxCodeExecutor gives you executable Docker images for 1000+ arXiv papers, in which:

* All dependencies are pre-installed and version-locked
* CUDA/system libraries are pre-configured
* Environments are tested and reproducible
* Code execution is instrumented for programmatic access

Integration with AG2 (AutoGen) adds:

* Programmatic code execution in isolated containers
* AI agents that can read, explain, and modify paper code
* Interactive and batch execution modes
* Structured exploration of research codebases

This notebook demonstrates:

* Searching Remyx's paper index for Docker-enabled papers
* Instantiating RemyxCodeExecutor with a paper's environment
* Using AG2 agents to explore and run experiments programmatically

**Replace hours of dependency resolution with API calls to pre-built environments and accelerate your research and development.**

## Prerequisites

Install AG2 with Remyx support:
```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "ag2[remyx]"
```

Make sure you have the following dependencies:
* [Remyx AI API key](https://docs.remyx.ai/cli)
* [Docker](https://www.docker.com/get-started/)
* [OpenAI API key for LLM agents](https://platform.openai.com/api-keys)

Set your API tokens as environment variables:

```bash
export REMYXAI_API_KEY=your_remyxai_token
export OPENAI_API_KEY=your_openai_key
```

In [None]:
import os

# Ensure you have your API tokens set
assert os.getenv("REMYXAI_API_KEY"), "Please set REMYXAI_API_KEY environment variable"
assert os.getenv("OPENAI_API_KEY"), "Please set OPENAI_API_KEY environment variable"
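
The assertions above stop at the first missing key. As a minimal sketch using only the standard library, the check below reports every missing variable at once and also looks for a `docker` executable on the `PATH`. The helper names `missing_env_vars` and `preflight` are illustrative and are not part of AG2 or Remyx.

```python
import os
import shutil

REQUIRED_VARS = ("REMYXAI_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def preflight(env=None):
    """Collect human-readable problems instead of failing on the first one."""
    problems = [f"{name} is not set" for name in missing_env_vars(env=env)]
    # shutil.which only confirms the CLI is installed, not that the daemon is running.
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems


for problem in preflight():
    print(f"Preflight problem: {problem}")
```

An empty list from `preflight()` means both keys are set and the Docker CLI was found; it does not verify that the keys are valid.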

## Step 1: Discover Papers

Search 1000+ research papers with pre-built Docker environments:

In [None]:
from remyxai.client.search import SearchClient

client = SearchClient()

# Search for papers
papers = client.search(
    query="CLIP semantic alignment",
    has_docker=True,
    max_results=5
)

# Browse results
for paper in papers:
    print(f"📖 {paper.title[:50]}...")
    print(f"   arXiv: {paper.arxiv_id}")
    print(f"   image: {paper.docker_image}")
    print(f"   abstract: {paper.abstract}\n")

You'll see results like:
```text
📖 CLIPin: A Non-contrastive Plug-in to CLIP for Mult...
   arXiv: 2508.06434v1
   image: remyxai/2508.06434v1:latest
   abstract: Large-scale natural image-text datasets, especially those automatically
collected from the web, often suffer from loose semantic alignment due to weak
supervision, while medical datasets tend to have high cross-modal correlation
but low content diversity. These properties pose a common challenge for
contrastive language-image pretraining (CLIP): they hinder the model's ability
to learn robust and generalizable representations. In this work, we propose
CLIPin, a unified non-contrastive plug-in th

📖 COOkeD: Ensemble-based OOD detection in the era of...
   arXiv: 2507.22576v1
   image: remyxai/2507.22576v1:latest
   abstract: Out-of-distribution (OOD) detection is an important building block in
trustworthy image recognition systems as unknown classes may arise at
test-time. OOD detection methods typically revolve around a single classifier,
leading to a split in the research field between the classical supervised
setting (e.g. ResNet18 classifier trained on CIFAR100) vs. the zero-shot
setting (class names fed as prompts to CLIP). In both cases, an overarching
challenge is that the OOD detection performance is implici

📖 Mammo-CLIP Dissect: A Framework for Analysing Mamm...
   arXiv: 2509.21102v1
   image: remyxai/2509.21102v1:latest
   abstract: Understanding what deep learning (DL) models learn is essential for the safe
deployment of artificial intelligence (AI) in clinical settings. While previous
work has focused on pixel-based explainability methods, less attention has been
paid to the textual concepts learned by these models, which may better reflect
the reasoning used by clinicians. We introduce Mammo-CLIP Dissect, the first
concept-based explainability framework for systematically dissecting DL vision
models trained for mammograp

📖 CLASP: General-Purpose Clothes Manipulation with S...
   arXiv: 2507.19983v1
   image: remyxai/2507.19983v1:latest
   abstract: Clothes manipulation, such as folding or hanging, is a critical capability
for home service robots. Despite recent advances, most existing methods remain
limited to specific tasks and clothes types, due to the complex,
high-dimensional geometry of clothes. This paper presents CLothes mAnipulation
with Semantic keyPoints (CLASP), which aims at general-purpose clothes
manipulation over different clothes types, T-shirts, shorts, skirts, long
dresses, ... , as well as different tasks, folding, flatt

📖 Personalized Education with Ranking Alignment Reco...
   arXiv: 2507.23664v1
   image: remyxai/2507.23664v1:latest
   abstract: Personalized question recommendation aims to guide individual students
through questions to enhance their mastery of learning targets. Most previous
methods model this task as a Markov Decision Process and use reinforcement
learning to solve, but they struggle with efficient exploration, failing to
identify the best questions for each student during training. To address this,
we propose Ranking Alignment Recommendation (RAR), which incorporates
collaborative ideas into the exploration mechanism,
```
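
The search loop in this step always appends `...` to the title, even when nothing was cut, and prints the abstract at whatever length it arrives. As a sketch, the formatter below uses `textwrap.shorten` to truncate at word boundaries and add an ellipsis only when text is actually dropped. The `Paper` dataclass is a hypothetical stand-in that exposes the four attributes used in the loop; real results come from `SearchClient`.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical stand-in exposing the attributes printed by the search loop."""
    title: str
    arxiv_id: str
    docker_image: str
    abstract: str


def format_paper(paper, title_width=50, abstract_width=200):
    """Render one search result, adding an ellipsis only when text is cut."""
    title = textwrap.shorten(paper.title, width=title_width, placeholder="...")
    abstract = textwrap.shorten(paper.abstract, width=abstract_width, placeholder="...")
    return (
        f"{title}\n"
        f"   arXiv: {paper.arxiv_id}\n"
        f"   image: {paper.docker_image}\n"
        f"   abstract: {abstract}"
    )
```

Inside the loop, `print(format_paper(paper))` would replace the four `print` calls.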

## Step 2: Fast Exploration

You can quickly explore the contents of the codebase and environment using the `explore()` method of the `RemyxCodeExecutor`.

How it works:

1. Pulls the Docker image containing the paper's code and dependencies.
2. Creates two AI agents: one explores, one executes.
3. Starts an interactive session in which you guide the exploration.
4. Lets you ask free-form questions about the code, create your own tests, and expand upon the research.

You can either launch an interactive session, in which you chat with the agents, or run in batch mode, in which the default tests and exploration run automatically without pausing.

### Quick Start (Default Exploration)

In [None]:
from autogen.coding import RemyxCodeExecutor

executor = RemyxCodeExecutor(arxiv_id="2508.06434v1")
executor.explore()

### Batch Mode (Automated)

In [None]:
# Runs automatically without pausing
executor = RemyxCodeExecutor(arxiv_id="2508.06434v1")
result = executor.explore(
    goal="Run the default example quickstart",
    interactive=False,
    max_turns=20
)

print(f"✅ Completed {len(result.chat_history)} steps")
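
The step count is only a coarse summary. Assuming `result.chat_history` is a list of message dictionaries with `name`, `role`, and `content` keys (the shape AG2 chat results normally have, but worth confirming for your version), the sketch below counts turns per agent and returns the final message text. Both helper names are illustrative.

```python
from collections import Counter


def turns_per_speaker(chat_history):
    """Count messages per agent, falling back to the role when no name is set."""
    return Counter(
        message.get("name") or message.get("role", "unknown")
        for message in chat_history
    )


def last_message_text(chat_history):
    """Return the content of the final message, or an empty string."""
    if not chat_history:
        return ""
    return chat_history[-1].get("content") or ""
```

Calling `turns_per_speaker(result.chat_history)` after a batch run shows how the turns were split between the exploring agent and the executing agent.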

## Real-world Example: Exploring CLIPin
Let's explore the [CLIPin paper](https://arxiv.org/pdf/2508.06434), which describes a method that improves CLIP's semantic alignment using non-contrastive learning.


In [None]:
from autogen.coding import RemyxCodeExecutor

# Create executor for CLIPin paper
executor = RemyxCodeExecutor(arxiv_id="2508.06434v1")

# Start automated exploration (interactive=False runs without pausing)
result = executor.explore(
    goal="""Explore CLIPin step-by-step:

    Phase 1: Understanding
    - Show repository structure
    - Read the README
    - Find the CLIPin model code

    Phase 2: Architecture
    - Explain the non-contrastive approach
    - Show the loss function
    - Compare with standard CLIP

    Phase 3: Demo
    - Load a model
    - Run inference example

    Work step-by-step. Explain clearly.
    """,
    interactive=False
)

After several turns, expect output similar to:
```text
...

exitcode: 0 (execution succeeded)
Code output: Dummy Similarity Scores: tensor([[1.]])


--------------------------------------------------------------------------------
research_explorer (to code_executor):

The inference completed successfully, and we generated dummy similarity scores! The output indicates that the similarity between the randomly generated image features and text features resulted in a score of `1`. This output was produced using placeholder data since we didn't load actual model weights.

### Summary of What We Completed:
1. **Explored Repository**: We examined the repository structure, README, and model code.
2. **Learned About CLIPin**: We discussed the non-contrastive nature of CLIPin and its loss functions.
3. **Set Up Inference**: We created a placeholder setup to demonstrate the inference process, including generating a dummy image and running the model's inference logic.

Would you like to analyze anything further related to CLIPin, or is there another question you have in mind?

--------------------------------------------------------------------------------
```
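
Each execution in the transcript is reported on a line starting with `exitcode:`. Assuming that line format, as seen in the sample output in this section, a short parser can tell you whether every executed snippet succeeded. The function names are illustrative.

```python
import re

# Matches lines such as "exitcode: 0 (execution succeeded)".
EXITCODE_RE = re.compile(r"^exitcode:\s*(-?\d+)", re.MULTILINE)


def exit_codes(transcript):
    """Return every exit code reported in the transcript, in order."""
    return [int(code) for code in EXITCODE_RE.findall(transcript)]


def all_succeeded(transcript):
    """True only if at least one execution ran and none returned non-zero."""
    codes = exit_codes(transcript)
    return bool(codes) and all(code == 0 for code in codes)
```

A zero exit code only means the snippet ran; as the sample shows, the agents may still have used placeholder data rather than real model weights.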

## Building on Research
Use the paper's code as a starting point for your own projects and research.

In [None]:
from autogen import ConversableAgent
from autogen.coding import RemyxCodeExecutor

# Start with paper's environment
executor = RemyxCodeExecutor(arxiv_id="2508.06434v1")

# Create your own agent for custom experiments
agent = ConversableAgent(
    "my_researcher",
    llm_config=False,
    code_execution_config={"executor": executor},
    human_input_mode="NEVER"
)

# Run your custom code in paper's environment
agent.generate_reply(messages=[{
    "role": "user",
    "content": """```python
# Your custom experiment here
from clip.model import CLIPin
model = CLIPin.load_pretrained()
# ... your modifications ...
```"""
}])
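
The message in this section carries its code inside a fenced Markdown block, which is easy to get wrong when writing it by hand. As a sketch, the hypothetical helper `code_message` below dedents a snippet and wraps it in a fence for you.

```python
import textwrap

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def code_message(code, language="python"):
    """Wrap source code in a fenced block inside a user message dict."""
    body = textwrap.dedent(code).strip("\n")
    return {"role": "user", "content": f"{FENCE}{language}\n{body}\n{FENCE}"}
```

The returned dictionary has the same shape as the message passed to `agent.generate_reply(messages=[...])` in the cell before this one.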

## Advanced Features

### Custom Docker Args
Use `container_create_kwargs` to pass additional arguments when the container is created, for example to set extra environment variables or switch to a GPU-enabled container runtime.

In [None]:
executor = RemyxCodeExecutor(
    arxiv_id="2508.06434v1",
    timeout=600,
    container_create_kwargs={
        "environment": {
            "HF_TOKEN": os.getenv("HF_TOKEN"),
            "WANDB_API_KEY": os.getenv("WANDB_API_KEY"),
        },
        "mem_limit": "16g",
    }
)
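
`os.getenv` returns `None` for unset variables, so the dictionary in this cell can carry `None` values into the container configuration. As a minimal sketch, the hypothetical helper `forward_env` forwards only the variables that are actually set.

```python
import os


def forward_env(names, env=None):
    """Build an environment dict containing only variables that are set."""
    env = os.environ if env is None else env
    return {name: env[name] for name in names if env.get(name)}


container_create_kwargs = {
    "environment": forward_env(["HF_TOKEN", "WANDB_API_KEY"]),
    "mem_limit": "16g",
}
```

The resulting `container_create_kwargs` can be passed to `RemyxCodeExecutor` in the same way as the literal dictionary in the cell.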

### Paper Metadata

In [None]:
executor = RemyxCodeExecutor(arxiv_id="2508.06434v1")
context = executor.get_paper_context()
print(context)

### Direct Use of Docker Images

In [None]:
# If you know the image name
executor = RemyxCodeExecutor(
    image="remyxai/2508.06434v1:latest",
    timeout=300
)
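
The image names in the Step 1 search output all have the form `remyxai/<arxiv_id>:latest`. Assuming that pattern holds in general (it is inferred from the sample output, not from documented behavior), a small helper can validate an arXiv ID and build the image reference. Where a search result is available, prefer its `docker_image` attribute.

```python
import re

# New-style arXiv identifiers such as 2508.06434v1 (version suffix optional).
ARXIV_ID_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def image_for(arxiv_id, tag="latest"):
    """Build an image reference following the pattern seen in the search output."""
    if not ARXIV_ID_RE.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return f"remyxai/{arxiv_id}:{tag}"
```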

### Manual Agent Control

In [None]:
# For advanced users who want full control
executor = RemyxCodeExecutor(arxiv_id="2508.06434v1")
executor_agent, writer_agent = executor.create_agents(
    goal="Custom exploration",
    llm_model="gpt-4o"
)

# Customize the chat
result = executor_agent.initiate_chat(
    writer_agent,
    message="Custom starting message",
    max_turns=10
)

## Tips & Tricks


**Start with Search**

Browse the catalog of pre-built images to find papers you want to experiment with. You can search papers and their Docker images by full text, keywords, or arXiv ID.

In [None]:
from remyxai.client.search import SearchClient

papers = SearchClient().search(
    query="data synthesis techniques",
    has_docker=True,
    max_results=10
)

for p in papers:
    print(f"{p.arxiv_id}: {p.title[:50]}...")

**Use Interactive Mode for Learning**

Pause at each step to guide the agents in your exploration.

In [None]:
executor.explore(
    goal="Explain this paper's approach",
    interactive=True  # Lets you guide each step
)

**Use Batch Mode for Experiments**

Expand your experimentation by running multiple papers automatically:

In [None]:
paper_ids = ["2508.06434v1", "2103.00020v1", "2010.11929v2"]

results = {}
for arxiv_id in paper_ids:
    executor = RemyxCodeExecutor(arxiv_id=arxiv_id)
    result = executor.explore(
        goal="Run quickstart",
        interactive=False,
        verbose=False
    )
    results[arxiv_id] = result

# Compare results
for arxiv_id, result in results.items():
    print(f"{arxiv_id}: {len(result.chat_history)} steps")
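
In the batch loop in this tip, a single failure (for example, an image that cannot be pulled) aborts the remaining papers. The sketch below isolates failures per paper. `run_batch` and its `run_one` callback are illustrative names; `run_one` stands for whatever you do with one arXiv ID.

```python
def run_batch(paper_ids, run_one):
    """Run `run_one(arxiv_id)` for each ID, recording failures instead of aborting."""
    results, errors = {}, {}
    for arxiv_id in paper_ids:
        try:
            results[arxiv_id] = run_one(arxiv_id)
        except Exception as exc:  # one failing paper should not stop the others
            errors[arxiv_id] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

Here `run_one` could wrap the loop body from this tip: a function that builds `RemyxCodeExecutor(arxiv_id=arxiv_id)` and returns the result of `explore(goal="Run quickstart", interactive=False)`.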

**Check Metadata**

Get a quick summary of the resources available for a paper you want to explore further.

In [None]:
executor = RemyxCodeExecutor(arxiv_id="2508.06434v1")
print(executor.get_paper_context())
# Shows: title, GitHub, working directory, quickstart hints

## Summary

This notebook showed you how RemyxCodeExecutor transforms research paper execution:

**Three Powerful Modes:**
- **Quick Start**: `executor.explore()` for AI-guided exploration with defaults
- **Learning Mode**: Interactive, step-by-step exploration with custom goals
- **Batch Mode**: Automated experiments across multiple papers

**What Makes It Special:**
- Pre-configured Docker environments for 1000+ papers
- Zero dependency setup (everything pre-installed)
- AI agents that explain as they explore
- Reproducible execution every time


### Quick Reference
```python
# 1. Search
from remyxai.client.search import SearchClient
papers = SearchClient().search("your topic", has_docker=True)

# 2. Create executor
from autogen.coding import RemyxCodeExecutor
executor = RemyxCodeExecutor(arxiv_id=papers[0].arxiv_id)

# 3. Explore (pick one mode)
executor.explore()                                   # Quick start
executor.explore(goal="...", interactive=True)       # Learning
executor.explore(goal="...", interactive=False)      # Batch
```