## MLE-Dojo APIs and Interface
MLE-Dojo provides flexible, Gym-style APIs built on modular, well-defined interfaces, which makes it straightforward both to develop against the framework and to extend it.

In [None]:
# =====================================================================
# Basic Imports from mledojo
# =====================================================================
import os
from pathlib import Path
from mledojo.gym.competition import CompetitionRegistry, CompInfo, Competition
from mledojo.gym.interface import (
    Interface,
    InfoInterface,
    CodeValidationInterface,
    CodeExecutionInterface,
)
from mledojo.gym.sandbox import Sandbox
from mledojo.gym.env import KaggleEnvironment
from mledojo.gym.feedback import FeedbackManager, Feedback
from mledojo.utils import get_metric



In [16]:
# =====================================================================
# 1. Setup Configuration
# =====================================================================
competition_name = "random-acts-of-pizza"
base_dir = Path("../")
data_dir = base_dir / "data" / "prepared" / competition_name / "data"
output_dir = base_dir / "results" / competition_name
output_dir.mkdir(parents=True, exist_ok=True)

# GPU and Timeout Settings
gpu_device = 0
gpu_memory_limit = 32  # GiB
execution_timeout = 600 # Seconds (reduced for direct testing)

### Competition and Registry

The `CompetitionRegistry` class acts as a central hub for managing multiple machine learning competitions within the MLE-Dojo framework. It provides functionalities to:

*   **Register** new competitions, associating them with their name, data location, metadata (`CompInfo`), and evaluation metrics (`CompetitionMetrics`).
*   **Retrieve** specific `Competition` objects by their unique name.
*   **List** all currently registered competitions.
*   **Filter** and retrieve competitions based on specific criteria like category (e.g., "Tabular", "Vision") or difficulty level (e.g., "beginner", "intermediate").
*   **Check** if a competition with a given name is already registered.
*   **Unregister** competitions that are no longer needed.

This registry pattern simplifies the management and access of different competition environments.
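To make the registry operations concrete, here is a minimal plain-Python sketch of the same pattern. It is an illustration under our own assumptions, not MLE-Dojo's implementation: `ToyRegistry` and `ToyEntry` are hypothetical names, the second competition is made up, and the real `CompetitionRegistry` stores richer objects (`CompInfo`, metric classes, data paths).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEntry:
    """Hypothetical stand-in for a registered competition and its metadata."""
    name: str
    data_dir: str
    category: str
    level: str


class ToyRegistry:
    """Hypothetical name -> entry store mirroring the operations listed above."""

    def __init__(self) -> None:
        self._entries: dict[str, ToyEntry] = {}

    def register(self, entry: ToyEntry) -> None:
        # Refuse silent overwrites so two competitions cannot share a name.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def get(self, name: str) -> ToyEntry:
        return self._entries[name]  # KeyError for unknown names

    def list_competitions(self) -> list[str]:
        return sorted(self._entries)

    def by_category(self, category: str) -> list[ToyEntry]:
        return [e for e in self._entries.values() if e.category == category]

    def by_level(self, level: str) -> list[ToyEntry]:
        return [e for e in self._entries.values() if e.level == level]

    def unregister(self, name: str) -> None:
        self._entries.pop(name, None)  # no-op if the name is absent

    def __contains__(self, name: str) -> bool:
        return name in self._entries

    def __len__(self) -> int:
        return len(self._entries)


toy_registry = ToyRegistry()
toy_registry.register(ToyEntry("random-acts-of-pizza", "data/rap", "General", "beginner"))
toy_registry.register(ToyEntry("example-vision-comp", "data/vision", "Vision", "intermediate"))
print(toy_registry.list_competitions())                      # ['example-vision-comp', 'random-acts-of-pizza']
print([e.name for e in toy_registry.by_category("Vision")])  # ['example-vision-comp']
toy_registry.unregister("example-vision-comp")
print("example-vision-comp" in toy_registry, len(toy_registry))  # False 1
```

Implementing `__contains__` and `__len__` is what lets the notebook-style checks `name in registry` and `len(registry)` work on such an object.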

In [None]:
# =====================================================================
# 2. Competition and Registry
# =====================================================================
registry = CompetitionRegistry(
    name=competition_name,
    data_dir=str(data_dir),
    comp_info=CompInfo(
        category="General",
        level="beginner",
        output_type="submission.csv",
        higher_is_better=True
    ),
    metric_class=get_metric(competition_name)
)

# Get Competition
competition: Competition = registry.get(competition_name)

# Play with Competition
print(f"data_path: {competition.get_data_path()}\n\n")
print(f"public_data_path: {competition.get_public_data_path()}\n\n")
print(f"private_data_path: {competition.get_private_data_path()}\n\n")
print(f"metric_class: {competition.create_metrics()}\n\n")

# Play with Registry
print(f"competition: {registry.get(competition_name)}")
print(f"competitions: {registry.list_competitions()}")
# Single quotes inside the f-strings keep this cell valid on Python < 3.12
print(f"general competitions: {registry.get_competitions_by_category('General')}")
print(f"beginner competitions: {registry.get_competitions_by_level('beginner')}")
assert competition_name in registry
len(registry)

# Additional functionality
# registry.unregister(competition_name)
# assert competition_name not in registry




1

### Interface
`Interface` is the entry point for information requests and code interaction. It acts as a central hub that registers basic interface components, each handling one aspect of the interaction process. By default, it includes:

*   `InfoInterface`: Responsible for retrieving various types of competition-related information, such as the competition overview, data structure details, and sample submission formats.
*   `CodeValidationInterface`: Handles the validation of user-submitted code, checking for both syntax errors and basic runtime behavior within a secure sandbox environment before full execution.
*   `CodeExecutionInterface`: Manages the execution of user code within the sandbox, processes the generated submission file, and orchestrates the evaluation against the ground truth data.

This design allows for modularity and extensibility. Users can also dynamically register custom components using the `register` method, tailoring the interface to specific competition requirements or adding new functionalities.
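One way to picture the hub-and-components design is a dictionary from action name to handler, plus a single dispatch method. The sketch below is hypothetical (`ToyInterface`, `toy_info`, and `call` are our own names, not MLE-Dojo's API); it only illustrates how registering a component extends the set of supported actions without changing the hub itself.

```python
from typing import Any, Callable

Handler = Callable[..., dict]


class ToyInterface:
    """Hypothetical hub: maps an action name to the component that handles it."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Registering under an existing name replaces the old component.
        self._handlers[name] = handler

    def call(self, name: str, **kwargs: Any) -> dict:
        handler = self._handlers.get(name)
        if handler is None:
            return {"status": "FAILED", "error": f"no component named {name!r}"}
        return handler(**kwargs)


def toy_info(info_type: str) -> dict:
    """Toy info component backed by a fixed dictionary."""
    known = {"name": "random-acts-of-pizza", "output_type": "submission.csv"}
    if info_type not in known:
        return {"status": "FAILED", "error": f"unknown info_type {info_type!r}"}
    return {"status": "SUCCESS", "data": {info_type: known[info_type]}}


hub = ToyInterface()
hub.register("info", toy_info)
print(hub.call("info", info_type="name"))
print(hub.call("sub_validation"))  # not registered yet -> FAILED
hub.register("sub_validation", lambda: {"status": "SUCCESS", "data": "format OK"})
print(hub.call("sub_validation"))
```

Returning a status dictionary instead of raising keeps every component's result in the same shape, which is convenient when the results are later turned into feedback.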

In [18]:
# =====================================================================
# 3. Interface
# =====================================================================

# 3.1 Info Interface
info_interface = InfoInterface(competition, output_dir)
info_types = ["overview", "data_structure", 
              "sample_submission", "name", 
              "metadata", "data_path", "output_path"]

# Get various information from the competition
for info_type in info_types:
    info_result = info_interface.get_info(info_type)
    print(info_result)
try:
    # This should raise an error, since the info_type is not valid
    info_interface.get_info("error_test")
except Exception as e:
    print(e)

# Register a new info provider
info_interface.register_provider("error_test", lambda: "error_test")
# This should now work after the new provider is registered
info_interface.get_info("error_test")


{'status': 'SUCCESS', 'data': {'overview': '### Description\n\nDive into the Random Acts of Pizza competition, where participants will harness machine learning to predict the success of pizza requests made on Reddit. With a dataset of 5,671 requests, each accompanied by its outcome (successful or unsuccessful) and relevant meta-data, the challenge is to develop an algorithm that can accurately forecast which requests will receive a pizza. \n\nThis competition, hosted by Kaggle, is designed for the machine learning community to engage in a fun and practical exercise. The dataset, collected by Althoff et al., provides a unique opportunity to explore the dynamics of altruistic requests. \n\n### Evaluation\n\nSubmissions are evaluated on area under the ROC curve between the predicted probability that a request will get pizza and the observed outcomes.\n\n## Submission File\n\nFor each request in the test set, you should predict a real-valued probability that it resulted in a pizza. The fil

{'status': 'SUCCESS', 'data': {'error_test': 'error_test'}}

In [19]:
# 3.2 Code Validation Interface

# Initialize the Sandbox with specific resource limits
sandbox = Sandbox(
    gpu_device=gpu_device,
    gpu_memory_limit=gpu_memory_limit,
    execution_timeout=execution_timeout
)

# This should generate a validation.py file in the output directory
validation_interface = CodeValidationInterface()
code_to_validate = "import pandas as pd\nprint('Validation check successful!')"
validation_result = validation_interface.validate(
    code=code_to_validate,
    sandbox=sandbox,
    output_dir=output_dir
)


2025-04-16 17:35:38,519 - mledojo.sandbox - INFO - Starting execution of /tmp/tmpk_egtyck.py
2025-04-16 17:35:38,520 - mledojo.sandbox - INFO - Command: python3 /tmp/tmpk_egtyck.py
2025-04-16 17:35:38,521 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, 
2025-04-16 17:35:39,261 - mledojo.sandbox - INFO - STDOUT:
Validation check successful!

2025-04-16 17:35:39,263 - mledojo.sandbox - INFO - Execution completed in 0.74s with return code 0
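The log above shows the snippet being written to a temporary file and run with `python3`. A rough, standard-library approximation of the two checks described for `CodeValidationInterface` (a syntax check, then a runtime check with a timeout) might look like this. It is our own sketch, not MLE-Dojo's sandbox: it enforces no GPU or memory limits, and `validate_snippet` is a hypothetical name.

```python
import ast
import os
import subprocess
import sys
import tempfile


def validate_snippet(code: str, timeout: float = 10.0) -> dict:
    """Hypothetical two-stage check: parse first, then run in a child process."""
    # Stage 1: syntax only. ast.parse raises SyntaxError without executing anything.
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return {"syntax_ok": False, "runtime_ok": False, "error": str(exc)}

    # Stage 2: write the code to a temp file and run it in a separate interpreter.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"syntax_ok": True, "runtime_ok": False, "error": "timeout"}
    finally:
        os.remove(path)  # runs on both the success and the timeout path
    return {
        "syntax_ok": True,
        "runtime_ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "error": proc.stderr,
    }


print(validate_snippet("print('Validation check successful!')"))
print(validate_snippet("print('unclosed"))            # fails at the syntax stage
print(validate_snippet("raise RuntimeError('boom')"))  # fails at the runtime stage
```

Parsing before running is cheap and catches syntax errors without spending any of the execution budget.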


In [20]:

# 3.3 Code Execution Interface
# This should generate an execution.py file in the output directory
execution_interface = CodeExecutionInterface()

# Paths referenced inside code run by the Sandbox should be absolute,
# because that code may be executed from an arbitrary working directory
absolute_data_dir = Path(os.path.abspath(data_dir))
absolute_output_dir = Path(os.path.abspath(output_dir))
code_to_execute = f'''
import pandas as pd
submission = pd.read_csv('{absolute_data_dir / "public" / "sample_submission.csv"}')
submission.to_csv('{absolute_output_dir / "submission.csv"}', index=False)
print("Submission created successfully.")
'''
execution_result = execution_interface.execute(
    code=code_to_execute,
    sandbox=sandbox,
    competition=competition,
    output_dir=output_dir
)
print(execution_result)

2025-04-16 17:35:39,281 - mledojo.sandbox - INFO - Starting execution of /tmp/tmppaeh3l4t.py
2025-04-16 17:35:39,282 - mledojo.sandbox - INFO - Command: python3 /tmp/tmppaeh3l4t.py
2025-04-16 17:35:39,283 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, 
2025-04-16 17:35:39,738 - mledojo.sandbox - INFO - STDOUT:
Submission created successfully.

2025-04-16 17:35:39,741 - mledojo.sandbox - INFO - Execution completed in 0.46s with return code 0


   rank  score
0     1    1.0
1     2    1.0
2     3    1.0
3     4    1.0
4     5    1.0
   rank  score
0     1    1.0
1     2    1.0
2     3    1.0
3     4    1.0
4     5    1.0
{'execution': {'status': 'SUCCESS', 'output': 'Submission created successfully.\n', 'error': '', 'execution_time': '0.46s'}, 'submission': {'status': 'SUCCESS', 'raw_score': np.float64(0.5), 'details': 'Submission processed successfully', 'position_score': {'private': {'position': 418, 'total': 462, 'position_score': 0.09740259740259741}, 'public': {'position': 418, 'total': 462, 'position_score': 0.09740259740259741}, 'avg_score': 0.09740259740259741}}, 'status': 'SUCCESS'}
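Two numbers in this result can be reproduced by hand. The raw score of 0.5 is what ROC AUC (the metric named in the competition overview) gives when every prediction is identical, which is what we would expect if the sample submission holds one constant probability for all requests; that is an assumption, as we have not inspected the file. The position score is consistent with `(total - position + 1) / total = (462 - 418 + 1) / 462 = 45 / 462 ≈ 0.0974`; this formula is inferred from these numbers, not taken from the MLE-Dojo source. The sketch below checks both using only the standard library; `roc_auc` and `position_score` are our own helper names, and the labels are toy data.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(random positive scores above random negative), ties counted as 1/2.

    O(P * N) pairwise version, fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def position_score(position: int, total: int) -> float:
    """Inferred from the output above; not confirmed against the MLE-Dojo source."""
    return (total - position + 1) / total


labels = [0, 1, 0, 0, 1, 1, 0, 1]  # toy labels, not the competition's ground truth
print(roc_auc(labels, [0.0] * len(labels)))                          # 0.5
print(roc_auc(labels, [0.1, 0.9, 0.2, 0.3, 0.8, 0.7, 0.4, 0.6]))  # 1.0
print(round(position_score(418, 462), 4))                            # 0.0974
```

Under this reading, a constant-prediction submission lands near the bottom of the leaderboard, which is why its position score is low even though the raw score looks moderate.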


In [21]:
# 3.4 Design and register custom interfaces
# For example, users may want to check that a submission is well-formed without scoring it.
# This is useful in settings such as MLE-Bench, where the number of evaluations is limited.
class SubValidationInterface:
    def sub_validate(self, code: str, sandbox: Sandbox, output_dir: Path) -> dict:
        # Placeholder: see CodeExecutionInterface for a full implementation
        raise NotImplementedError

# Main Interface
interface = Interface(competition=competition, output_dir=output_dir)
interface.register("sub_validation", SubValidationInterface)
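As one hypothetical shape for the body of `sub_validate`, the sketch below checks a submission CSV against the sample submission (same header, same set of IDs, numeric probabilities in [0, 1]) without computing any score. It uses only the standard library. `check_submission_format`, `read_rows`, the column names `request_id` and `requester_received_pizza`, and the IDs are illustrative assumptions, not taken from MLE-Dojo.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path: Path) -> tuple[list[str], list[dict[str, str]]]:
    """Return (header, rows) for a CSV file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def check_submission_format(
    submission: Path, sample: Path, id_col: str, prob_col: str
) -> list[str]:
    """Hypothetical format-only check; an empty list means no problems were found."""
    if not submission.exists():
        return [f"no submission file at {submission}"]
    sub_header, sub_rows = read_rows(submission)
    ref_header, ref_rows = read_rows(sample)
    if sub_header != ref_header:
        return [f"header {sub_header} does not match {ref_header}"]
    problems: list[str] = []
    sub_ids = [r[id_col] for r in sub_rows]
    if len(sub_ids) != len(ref_rows) or set(sub_ids) != {r[id_col] for r in ref_rows}:
        problems.append("IDs do not match the sample submission")
    for row in sub_rows:
        try:
            p = float(row[prob_col])
        except ValueError:
            problems.append(f"non-numeric value {row[prob_col]!r} for {row[id_col]}")
            continue
        if not 0.0 <= p <= 1.0:  # also rejects NaN and infinities
            problems.append(f"value {p} outside [0, 1] for {row[id_col]}")
    return problems


header = "request_id,requester_received_pizza\n"
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    sample = tmp_dir / "sample_submission.csv"
    good = tmp_dir / "submission.csv"
    bad = tmp_dir / "bad_submission.csv"
    sample.write_text(header + "t3_a,0\nt3_b,0\n")
    good.write_text(header + "t3_a,0.8\nt3_b,0.1\n")
    bad.write_text(header + "t3_a,1.7\nt3_c,yes\n")
    cols = dict(id_col="request_id", prob_col="requester_received_pizza")
    good_problems = check_submission_format(good, sample, **cols)
    bad_problems = check_submission_format(bad, sample, **cols)
    missing_problems = check_submission_format(tmp_dir / "none.csv", sample, **cols)

print(good_problems)  # []
print(bad_problems)
print(missing_problems)
```

Because it never reads ground-truth labels, a check like this can run as often as needed without consuming a limited evaluation budget.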


### Feedback

The `Feedback` module provides a structured system for generating and managing feedback within the MLE-Dojo environment. It centralizes feedback generation from various sources, offering insights into code validation, execution performance, and potential improvements.

*   `FeedbackManager`: Acts as the central registry for different feedback providers. It manages the available feedback types and routes requests to the appropriate provider.
*   `BaseFeedback`: This is the default provider that generates automated feedback by processing the raw results from code validation (`CodeValidationInterface`) and code execution (`CodeExecutionInterface`). It formats technical results, scores, and errors into a human-readable summary.
*   `LLMFeedback`: A placeholder for integrating external Large Language Models (LLMs) to provide AI-driven code analysis, suggestions, and qualitative feedback (coming soon).
*   `HumanFeedback`: A placeholder for incorporating interactive feedback, allowing human users or instructors to provide input (coming soon).

The system is designed for extensibility, allowing new feedback providers (e.g., for specific error patterns, style checking) to be easily registered and integrated using the `FeedbackManager`. The `get_feedback` method of the manager allows retrieving feedback from multiple providers simultaneously.
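The routing idea, where one request dictionary can address several providers at once, can be sketched as follows. The request shape (`{provider_name: {"raw_results": ..., "env_context": ...}}`) loosely mirrors the feedback requests built in this notebook, but `ToyFeedbackManager`, `ScoreFeedback`, and the exact return layout are hypothetical, and `ScoreFeedback` assumes higher scores are better.

```python
from abc import ABC, abstractmethod
from typing import Any


class ToyFeedback(ABC):
    """Hypothetical provider contract: turn raw results into a feedback dict."""

    @abstractmethod
    def get_feedback(self, raw_results: dict, env_context: dict) -> dict: ...


class ScoreFeedback(ToyFeedback):
    """Toy provider comparing the current score with the best so far (higher is better)."""

    def get_feedback(self, raw_results: dict, env_context: dict) -> dict:
        score = raw_results.get("score")
        best = env_context.get("best_score")
        if score is None:
            return {"feedback_status": "FAILED", "feedback": "no score available"}
        if best is None or score > best:
            text = f"New best score: {score}"
        else:
            text = f"Score {score} did not beat best {best}"
        return {"feedback_status": "SUCCESS", "feedback": text}


class ToyFeedbackManager:
    """Hypothetical registry that fans one request out to several providers."""

    def __init__(self) -> None:
        self._providers: dict[str, ToyFeedback] = {}

    def register(self, name: str, provider: ToyFeedback) -> None:
        self._providers[name] = provider

    def get_feedback(self, request: dict[str, dict[str, Any]]) -> dict[str, dict]:
        results: dict[str, dict] = {}
        for name, params in request.items():
            provider = self._providers.get(name)
            if provider is None:
                # Unknown providers are reported in the result rather than raising.
                results[name] = {"feedback_status": "FAILED",
                                 "feedback": f"unknown provider {name!r}"}
                continue
            results[name] = provider.get_feedback(params["raw_results"],
                                                  params.get("env_context", {}))
        return results


toy_manager = ToyFeedbackManager()
toy_manager.register("score", ScoreFeedback())
feedback = toy_manager.get_feedback({
    "score": {"raw_results": {"score": 0.5}, "env_context": {"best_score": 0.75}},
    "llm": {"raw_results": {}},  # not registered in this toy
})
print(feedback)
```

Keying the result by provider name means a caller can request several kinds of feedback in one call and read each one back independently.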


In [22]:
# --- 4.1 Test BaseFeedback for Execution ---
# Prepare context for execution feedback (e.g., best scores if tracked)
feedback_manager = FeedbackManager()
exec_context = {
    "score_mode": "position",  # or "raw"
    "best_raw_score": 0.75, # Example best score
    "best_position_score": 0.9 # Example best score
}

# Pass the execution result from 3.3 to the base feedback provider
execution_feedback_request = {
    "base": {
        "interface_mode": "execute_code",
        "raw_results": execution_result,
        "env_context": exec_context
    }
}
execution_feedback = feedback_manager.get_feedback(execution_feedback_request)
print(execution_feedback)


# --- 4.2 Test BaseFeedback for Validation ---
validation_feedback_request = {
    "base": {
        "interface_mode": "validate_code",
        "raw_results": validation_result,
        "env_context": {}
    }
}
validation_feedback = feedback_manager.get_feedback(validation_feedback_request)
print(validation_feedback)

# --- 4.3 Test BaseFeedback for Info Request ---
# info_result holds the last value fetched in the 3.1 loop ("output_path")
info_feedback_request = {
    "base": {
        "interface_mode": "request_info",
        "raw_results": info_result,
        "env_context": {}
    }
}
info_feedback = feedback_manager.get_feedback(info_feedback_request)
print(info_feedback)


# --- 4.4 Define a custom feedback provider and register it ---
class MyFeedback(Feedback):
    def get_feedback(self, raw_results: dict, env_context: dict) -> dict:
        pass

feedback_manager.register("my_feedback", MyFeedback())




{'base': {'feedback_status': 'SUCCESS', 'feedback': '=== Code Execution Results ===\n                            Execution successful\nCode execution time: 0.46s\nCode output: Submission created successfully.\n\n\n                            === Submission Evaluation ===\n                                Submission successful\n                                Private Leaderboard: Position 418 / 462\nPublic Leaderboard: Position 418 / 462\n                                Raw Score: 0.5\nAverage Position Score: 0.0974\nBest Raw Score: 0.75\nBest Position Score: 0.9'}}
{'base': {'feedback_status': 'SUCCESS', 'feedback': '=== Code Validation Results ===\n                            Syntax check passed: Valid Python code\n                            Runtime check passed: Code executes without errors\nCode output: Validation check successful!\n\n                            Code execution time: 0.74s\n                            '}}
{'base': {'feedback_status': 'SUCCESS', 'feedback': "=== Compe

### Environment

The `KaggleEnvironment` class provides a standardized, Gymnasium-style interface for interacting with machine learning competitions within the MLE-Dojo framework. It orchestrates the entire competition workflow and serves as the primary entry point for users and automated agents. Key responsibilities include:

*   **Competition Management**: Leverages `CompetitionRegistry` to load and manage competition-specific details, data paths, and evaluation metrics.
*   **Interaction Interface**: Integrates an `Interface` object (by default containing `InfoInterface`, `CodeValidationInterface`, `CodeExecutionInterface`) to handle various actions like requesting information, validating code syntax/runtime, and executing submission code.
*   **Sandboxed Execution**: Utilizes a `Sandbox` to run user-submitted code securely within defined resource limits (GPU, CPU, memory, time), ensuring safe and fair execution.
*   **Feedback Generation**: Employs a `FeedbackManager` to process the results from the `Interface` actions and generate structured, informative feedback based on validation outcomes, execution performance, and scoring results.
*   **State Management & Tracking**: Mirrors the Gymnasium `Env` methods (`step`, `reset`, `render`, `close`) while maintaining internal state such as cumulative rewards, current and best scores (supporting both raw and position-based scoring), and a detailed interaction history.

This environment encapsulates the complexities of competition setup, code execution, evaluation, and feedback, offering a consistent and robust platform for developing and testing ML solutions.
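The `env.step` outputs in this notebook show a two-element tuple (an observation dict and a float reward), with the best score updated only when a submission is actually scored. The toy below reproduces that bookkeeping so the flow is easy to follow. `ToyEnv` and its fields are hypothetical; it performs no sandboxing or evaluation. Treating the reward as the submission's score is an assumption that matches the numbers shown here (0.0 for validation and for the failed execution, the position score for the successful one), not a documented rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyEnv:
    """Hypothetical environment loosely modeled on the step/reset flow described above."""
    best_score: Optional[float] = None
    total_reward: float = 0.0
    history: list[str] = field(default_factory=list)

    def reset(self) -> dict:
        self.best_score = None
        self.total_reward = 0.0
        self.history.clear()
        return {"history_summary": "Total Actions: 0"}

    def step(self, action: str, score: Optional[float] = None) -> tuple[dict, float]:
        self.history.append(action)
        status, reward = "SUCCESS", 0.0
        if action == "execute_code":
            if score is None:
                status = "FAILED"  # e.g. code ran but wrote no submission file
            else:
                reward = score  # assumption: reward equals the submission's score
                if self.best_score is None or score > self.best_score:
                    self.best_score = score
        self.total_reward += reward
        obs = {
            "action_status": status,
            "best_score": self.best_score,
            "history_summary": f"Total Actions: {len(self.history)}, Last Action: {action}",
        }
        return obs, reward


toy_env = ToyEnv()
toy_env.reset()
trajectory = [("request_info", None), ("validate_code", None),
              ("execute_code", None), ("execute_code", 0.0974)]
for action, score in trajectory:
    obs, reward = toy_env.step(action, score)
    print(action, obs["action_status"], reward, obs["best_score"])
```

Keeping the best score separate from the current score lets an agent see whether its latest attempt regressed without losing track of its best result so far.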

In [23]:
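# =====================================================================
# 5. Environment
# =====================================================================
# 5.1 Initialize the environment
env = KaggleEnvironment.make(
    competition_name=competition_name,
    output_dir=str(output_dir),
    competition_registry=registry,
    render_mode="human",
    score_mode="position",
    gpu_device=gpu_device,
    gpu_memory_limit=gpu_memory_limit,
    execution_timeout=execution_timeout,
)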

In [24]:
# All actions can now be invoked through env.step()
env.step("request_info", info_type="overview")



({'action_status': 'SUCCESS',
  'feedback': {'base': {'feedback_status': 'SUCCESS',
    'feedback': '=== Competition Info ===\n                        Your requested information: {\'overview\': \'### Description\\n\\nDive into the Random Acts of Pizza competition, where participants will harness machine learning to predict the success of pizza requests made on Reddit. With a dataset of 5,671 requests, each accompanied by its outcome (successful or unsuccessful) and relevant meta-data, the challenge is to develop an algorithm that can accurately forecast which requests will receive a pizza. \\n\\nThis competition, hosted by Kaggle, is designed for the machine learning community to engage in a fun and practical exercise. The dataset, collected by Althoff et al., provides a unique opportunity to explore the dynamics of altruistic requests. \\n\\n### Evaluation\\n\\nSubmissions are evaluated on area under the ROC curve between the predicted probability that a request will get pizza and the

In [25]:
env.step("validate_code", code="import pandas as pd\nprint('Validation check successful!')")

2025-04-16 17:35:41,055 - mledojo.sandbox - INFO - Starting execution of /tmp/tmp2895guar.py
2025-04-16 17:35:41,057 - mledojo.sandbox - INFO - Command: python3 /tmp/tmp2895guar.py
2025-04-16 17:35:41,057 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, 
2025-04-16 17:35:41,514 - mledojo.sandbox - INFO - STDOUT:
Validation check successful!

2025-04-16 17:35:41,516 - mledojo.sandbox - INFO - Execution completed in 0.46s with return code 0


({'action_status': 'SUCCESS',
  'feedback': {'base': {'feedback_status': 'SUCCESS',
    'feedback': '=== Code Validation Results ===\n                            Syntax check passed: Valid Python code\n                            Runtime check passed: Code executes without errors\nCode output: Validation check successful!\n\n                            Code execution time: 0.46s\n                            '}},
  'current_raw_score': 0.0,
  'current_position_score': 0.0,
  'best_raw_score': None,
  'best_position_score': None,
  'history_summary': 'Total Actions: 2, Last Action: validate_code'},
 0.0)

In [26]:
# This code writes no submission.csv, so the submission step is expected to fail
env.step("execute_code", code="import pandas as pd\nprint('Execution check successful!')")

2025-04-16 17:35:41,538 - mledojo.sandbox - INFO - Starting execution of /tmp/tmpe0bl7oo9.py
2025-04-16 17:35:41,539 - mledojo.sandbox - INFO - Command: python3 /tmp/tmpe0bl7oo9.py
2025-04-16 17:35:41,540 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, 
2025-04-16 17:35:41,997 - mledojo.sandbox - INFO - STDOUT:
Execution check successful!

2025-04-16 17:35:41,999 - mledojo.sandbox - INFO - Execution completed in 0.46s with return code 0


({'action_status': 'FAILED',
  'feedback': {'base': {'feedback_status': 'SUCCESS',
    'feedback': '=== Code Execution Results ===\n                            Execution successful\nCode execution time: 0.46s\nCode output: Execution check successful!\n\n\n                            === Submission Evaluation ===\n                                Submission error (SubmissionNotFoundError): Submission file not found\nError details: No submission file found at ../results/random-acts-of-pizza/submission.csv'}},
  'current_raw_score': 0.0,
  'current_position_score': 0.0,
  'best_raw_score': None,
  'best_position_score': None,
  'history_summary': 'Total Actions: 3, Last Action: execute_code'},
 0.0)

In [27]:
absolute_data_dir = Path(os.path.abspath(data_dir))
absolute_output_dir = Path(os.path.abspath(output_dir))
code_to_execute = f'''
import pandas as pd
submission = pd.read_csv('{absolute_data_dir / "public" / "sample_submission.csv"}')
submission.to_csv('{absolute_output_dir / "submission.csv"}', index=False)
print("Submission created successfully.")
'''

env.step("execute_code", code=code_to_execute)

2025-04-16 17:35:42,022 - mledojo.sandbox - INFO - Starting execution of /tmp/tmpr9rvoirn.py
2025-04-16 17:35:42,023 - mledojo.sandbox - INFO - Command: python3 /tmp/tmpr9rvoirn.py
2025-04-16 17:35:42,024 - mledojo.sandbox - INFO - Resource limits: CPU=Nones, GPU=0, GPU MEM=32.00GB, 
2025-04-16 17:35:42,539 - mledojo.sandbox - INFO - STDOUT:
Submission created successfully.

2025-04-16 17:35:42,542 - mledojo.sandbox - INFO - Execution completed in 0.52s with return code 0


   rank  score
0     1    1.0
1     2    1.0
2     3    1.0
3     4    1.0
4     5    1.0
   rank  score
0     1    1.0
1     2    1.0
2     3    1.0
3     4    1.0
4     5    1.0


({'action_status': 'SUCCESS',
  'feedback': {'base': {'feedback_status': 'SUCCESS',
    'feedback': '=== Code Execution Results ===\n                            Execution successful\nCode execution time: 0.52s\nCode output: Submission created successfully.\n\n\n                            === Submission Evaluation ===\n                                Submission successful\n                                Private Leaderboard: Position 418 / 462\nPublic Leaderboard: Position 418 / 462\n                                Raw Score: 0.5\nAverage Position Score: 0.0974\nBest Raw Score: 0.5\nBest Position Score: 0.09740259740259741'}},
  'current_raw_score': np.float64(0.5),
  'current_position_score': 0.09740259740259741,
  'best_raw_score': np.float64(0.5),
  'best_position_score': 0.09740259740259741,
  'history_summary': 'Total Actions: 4, Last Action: execute_code'},
 0.09740259740259741)

In [28]:
env.close()