# Introduction

Welcome to this Jupyter notebook, which walks through the main features of the Moonshot framework using OpenAI's GPT-3.5 as the model under test. The notebook is divided into sections, each covering the skills you need to use the Moonshot framework in your own AI applications.

## Establishing a Connection with GPT-3.5
Our first step is to integrate OpenAI's GPT-3.5 into Moonshot. We'll walk you through setting up an endpoint, which connects your local environment to the model hosted on OpenAI's servers. Moonshot sends its requests to GPT-3.5 through this endpoint.

## Mastering Moonshot Recipes and Cookbooks
We'll explore the creation of Moonshot recipes and cookbooks. Recipes are the core instructions directing Moonshot's interaction with GPT-3.5, dictating the data inputs, prompt formatting, and evaluation metrics. Cookbooks compile these recipes into a structured format, facilitating scalable and organized model evaluations. We'll guide you through each step to craft these components effectively.

## Benchmarking within Moonshot
To gauge the performance of GPT-3.5, we'll use Moonshot's benchmarking capabilities. By running a series of tests, we'll assess the model across various tasks to see how efficiently and accurately it performs. These results help you understand the model's strengths and limits.

## A Deep Dive into Moonshot's Workflow
Throughout this notebook, you will work hands-on with the Moonshot framework, covering essential tasks such as:

- **Endpoint Management**: Establish and maintain connections to GPT-3.5.
- **Recipe Development**: Construct detailed recipes for precise model interaction.
- **Cookbook Assembly**: Compile recipes into cookbooks for comprehensive evaluations.
- **Execution and Analysis**: Run recipes and cookbooks, then analyze the results.

By the end of this notebook, you will understand the Moonshot framework's main functions well enough to run your own AI experiments and analyses with GPT-3.5.

Let's get started!

## Pre-requisites

Before working with GPT-3.5 and the Moonshot framework, set up a proper working environment. This helps avoid conflicts between Python libraries and ensures that all necessary dependencies are correctly installed.

### Setting Up a Virtual Environment

A virtual environment is an isolated Python environment that allows you to manage dependencies for different projects separately. If you haven't already created a virtual environment for this notebook, we highly recommend doing so. Here's how you can set it up:

1. Navigate to the notebook's directory: <br>
<code> cd /path/to/notebook/directory </code>

2. Create a virtual environment named 'env' (or any name you prefer): <br>
<code> python -m venv env</code>

3. Activate the virtual environment:
   - On macOS and Linux:<br>
   <code>source env/bin/activate</code>
   - On Windows:<br>
   <code>.\env\Scripts\activate</code>

4. With the virtual environment activated, install the required Python libraries using the provided `requirements.txt` file:<br>
<code>pip install -r requirements.txt</code>
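
To confirm that the notebook kernel is actually running inside the virtual environment, a quick check like the one below can help. It is a minimal sketch: the helper name `in_virtualenv` is made up for illustration, and the check relies on `sys.prefix` differing from `sys.base_prefix`, which holds for environments created with `python -m venv` but may not detect other environment managers such as conda.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, the kernel is probably using a different interpreter than the one in `env`.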


### Downloading and Setting Up the Dataset

The dataset required for this notebook can be found in the Moonshot data repository. You will need to download it and place it into the `data` folder within the `examples/jupyter-notebook` directory. Follow these steps:

1. Download the dataset from the Moonshot data repository:<br>
<code>git clone https://github.com/moonshot-admin/moonshot-data.git</code>

2. Move the cloned repository's contents into the `data` folder within the `examples/jupyter-notebook` directory:<br>
<code>mv moonshot-data/* /path_to_moonshot_directory/moonshot/examples/jupyter-notebook/data/</code>

3. Navigate to the `data` directory and install any additional requirements for the dataset:<br>
<code>cd /path_to_moonshot_directory/moonshot/examples/jupyter-notebook/data</code><br>
<code>pip install -r requirements.txt</code>
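
After moving the files, it can be useful to check that the `data` folder has the layout the notebook expects. The sketch below lists expected subfolders that are missing. The folder names are taken from the environment configuration used later in this notebook; whether every one of them exists in your checkout of the data repository is an assumption, and the helper name `missing_subfolders` is hypothetical.

```python
from pathlib import Path

# Subfolder names mirror the paths configured later in this notebook.
# The "generated-outputs" folders are left out because they are run outputs.
EXPECTED_SUBFOLDERS = [
    "attack-modules",
    "connectors",
    "connectors-endpoints",
    "context-strategy",
    "cookbooks",
    "databases-modules",
    "datasets",
    "io-modules",
    "metrics",
    "prompt-templates",
    "recipes",
    "results-modules",
    "runners-modules",
]


def missing_subfolders(data_dir, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolder names that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_dir()]


missing = missing_subfolders("./data")
if missing:
    print("Missing subfolders:", ", ".join(missing))
else:
    print("All expected subfolders are present.")
```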


### Final Preparations

Before starting, ensure the following:
- The virtual environment is active whenever you're working on this project.
- All datasets and required libraries are installed within the virtual environment.
- You have the necessary permissions to read and write within the data directories.
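
The checklist above can be partly automated. The following is a minimal sketch, not part of Moonshot: `check_directory` and `missing_packages` are hypothetical helper names, `./data` is the data folder used in this notebook, and the package list covers only the third-party libraries imported in the next section.

```python
import importlib.util
import os


def check_directory(path):
    """Report whether path is an existing directory the current user can read and write."""
    return {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


def missing_packages(names):
    """Return the top-level package names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(check_directory("./data"))
print("Missing packages:", missing_packages(["rich", "IPython"]))
```

Any `False` value or non-empty list points at the step in this section that needs to be repeated.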

With these steps completed, your environment is ready for working with the Moonshot framework and GPT-3.5.


# Imports and Environment Variables

In this section, we prepare our Jupyter notebook environment by importing necessary libraries and setting up environment variables. The libraries are categorized based on their functionality: display enhancements for better visualization, standard libraries for basic operations, and specific Moonshot framework APIs for interacting with GPT-3.5. Additionally, we configure the environment variables to define the structure and access points for the Moonshot framework, ensuring that all components are correctly referenced and accessible.

## Results Display Enhancement Functions

These functions improve the presentation of results returned by Moonshot libraries and APIs. Using the `rich` library, they turn plain text output into structured tables that are easier to read and analyze. The functions below display connector types, endpoints, recipes, cookbooks, and benchmarking results in tabular form. Each function is documented with a docstring, and the display functions print a short message instead of an empty table when there is nothing to show.

Whether you're managing connectors, executing recipes, or reviewing benchmarking outcomes, these functions give your results a consistent look.
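
As a small illustration of the pattern these functions follow, the sketch below builds and prints a `rich` table from a plain list. The function name `build_table` and the sample items are made up for this example; only the `Table` and `Console` calls mirror what the notebook's own helpers use.

```python
from rich.console import Console
from rich.table import Table


def build_table(title, names):
    """Build a numbered two-column rich Table from a list of names."""
    table = Table(title=title, show_lines=True, header_style="bold")
    table.add_column("No.", width=3)
    table.add_column("Name", justify="left")
    # Number the rows starting from 1, as the notebook's display helpers do.
    for number, name in enumerate(names, 1):
        table.add_row(str(number), name)
    return table


demo = build_table("Example Items", ["first-item", "second-item"])
Console().print(demo)
```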

<a id='prettified_functions'></a>

In [5]:
# Display Enhancements
# These imports are for improving the visual presentation of outputs in the notebook.
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# Standard Library Imports
# These are built-in Python modules used for system operations and JSON file manipulation.
import sys
import os
import json

# Rich Library Imports
# The 'rich' library is used to create visually appealing tables, panels, and console outputs.
# This enhances the readability and presentation of data in the notebook.
from rich.columns import Columns
from rich.console import Console
from rich.panel import Panel
from rich.table import Table

# Ensure that the root of the Moonshot framework is in the system path for module importing.
sys.path.insert(0, '../../')

# Moonshot Framework API Imports
# These imports from the Moonshot framework allow us to interact with the API, 
# creating and managing various components such as recipes, cookbooks, and endpoints.
import asyncio
from moonshot.api import (
    api_create_recipe,
    api_create_cookbook,
    api_create_endpoint,
    api_create_session,
    api_get_all_connector_type,
    api_get_all_endpoint,
    api_get_all_cookbook,
    api_get_all_recipe,
    api_get_all_runner,
    api_get_all_prompt_template_detail,
    api_load_runner,
    api_read_result,
    api_set_environment_variables,
    api_update_context_strategy,
    api_update_prompt_template,
)

# Environment Configuration
# Here we set up the environment variables for the Moonshot framework.
# These variables define the paths to various modules and components used by Moonshot,
# organizing the framework's structure and access points.
moonshot_path = "./data"
env = {
    "ATTACK_MODULES": os.path.join(moonshot_path, "attack-modules"),
    "CONNECTORS": os.path.join(moonshot_path, "connectors"),
    "CONNECTORS_ENDPOINTS": os.path.join(moonshot_path, "connectors-endpoints"),
    "CONTEXT_STRATEGY": os.path.join(moonshot_path, "context-strategy"),
    "COOKBOOKS": os.path.join(moonshot_path, "cookbooks"),
    "DATABASES": os.path.join(moonshot_path, "generated-outputs/databases"),
    "DATABASES_MODULES": os.path.join(moonshot_path, "databases-modules"),
    "DATASETS": os.path.join(moonshot_path, "datasets"),
    "IO_MODULES": os.path.join(moonshot_path, "io-modules"),
    "METRICS": os.path.join(moonshot_path, "metrics"),
    "PROMPT_TEMPLATES": os.path.join(moonshot_path, "prompt-templates"),
    "RECIPES": os.path.join(moonshot_path, "recipes"),
    "RESULTS": os.path.join(moonshot_path, "generated-outputs/results"),
    "RESULTS_MODULES": os.path.join(moonshot_path, "results-modules"),
    "RUNNERS": os.path.join(moonshot_path, "generated-outputs/runners"),
    "RUNNERS_MODULES": os.path.join(moonshot_path, "runners-modules"),
}

# Apply the environment variables to configure the Moonshot framework.
api_set_environment_variables(env)

# Initialize the global console for rich text display, which will be used throughout the notebook.
console = Console()



In [6]:
from rich.markup import escape
from moonshot.integrations.cli.benchmark.recipe import display_view_grading_scale_format, display_view_statistics_format
from moonshot.integrations.cli.common.display_helper import display_view_list_format, display_view_str_format


def display_connector_types(connector_types):
    """
    Display a list of connector types.

    This function takes a list of connector types and displays them in a table format. If the list is empty, it prints a
    message indicating that no connector types were found.

    Args:
        connector_types (list): A list of connector types.

    Returns:
        None
    """
    if connector_types:
        table = Table(
            title="List of Connector Types",
            show_lines=True,
            expand=True,
            header_style="bold",
        )
        table.add_column("No.", width=2)
        table.add_column("Connector Type", justify="left", width=78)
        for connector_id, connector_type in enumerate(connector_types, 1):
            table.add_section()
            table.add_row(str(connector_id), connector_type)
        console.print(table)
    else:
        console.print("[red]There are no connector types found.[/red]")

def display_endpoints(endpoints_list):
    """
    Display a list of endpoints.

    This function takes a list of endpoints and displays them in a table format. If the list is empty, it prints a
    message indicating that no endpoints were found.

    Args:
        endpoints_list (list): A list of endpoints. Each endpoint is a dictionary with keys 'id', 'name',
        'connector_type', 'uri', 'token', 'max_calls_per_second', 'max_concurrency', 'params', and 'created_date'.

    Returns:
        None
    """
    if endpoints_list:
        table = Table(
            title="List of Connector Endpoints",
            show_lines=True,
            expand=True,
            header_style="bold",
        )
        table.add_column("No.", justify="left", width=2)
        table.add_column("Id", justify="left", width=10)
        table.add_column("Name", justify="left", width=10)
        table.add_column("Connector Type", justify="left", width=10)
        table.add_column("Uri", justify="left", width=10)
        table.add_column("Token", justify="left", width=10)
        table.add_column("Max Calls Per Second", justify="left", width=5)
        table.add_column("Max concurrency", justify="left", width=5)
        table.add_column("Params", justify="left", width=30)
        table.add_column("Created Date", justify="left", width=8)

        for endpoint_id, endpoint in enumerate(endpoints_list, 1):
            (
                id,
                name,
                connector_type,
                uri,
                token,
                max_calls_per_second,
                max_concurrency,
                params,
                created_date,
            ) = endpoint.values()
            table.add_section()
            table.add_row(
                str(endpoint_id),
                id,
                name,
                connector_type,
                uri,
                token,
                str(max_calls_per_second),
                str(max_concurrency),
                escape(str(params)),
                created_date,
            )
        console.print(table)
    else:
        console.print("[red]There are no endpoints found.[/red]")

def display_recipes(recipes_list: list) -> None:
    """
    Display the list of recipes in a tabular format.

    This function takes a list of recipe dictionaries and displays each recipe's details in a table.
    The table includes the recipe's ID, name, description, and associated details such as tags, categories,
    datasets, prompt templates, metrics, attack strategies, grading scale, and statistics. If the list is empty,
    it prints a message indicating that no recipes are found.

    Args:
        recipes_list (list): A list of dictionaries, where each dictionary contains the details of a recipe.
    """
    if recipes_list:
        table = Table(
            title="List of Recipes", show_lines=True, expand=True, header_style="bold"
        )
        table.add_column("No.", width=2)
        table.add_column("Recipe", justify="left", width=78)
        table.add_column("Contains", justify="left", width=20, overflow="fold")
        for recipe_id, recipe in enumerate(recipes_list, 1):
            (
                id,
                name,
                description,
                tags,
                categories,
                datasets,
                prompt_templates,
                metrics,
                attack_strategies,
                grading_scale,
                stats,
            ) = recipe.values()

            tags_info = display_view_list_format("Tags", tags)
            categories_info = display_view_list_format("Categories", categories)
            datasets_info = display_view_list_format("Datasets", datasets)
            prompt_templates_info = display_view_list_format(
                "Prompt Templates", prompt_templates
            )
            metrics_info = display_view_list_format("Metrics", metrics)
            attack_strategies_info = display_view_list_format(
                "Attack Strategies", attack_strategies
            )
            grading_scale_info = display_view_grading_scale_format(
                "Grading Scale", grading_scale
            )
            stats_info = display_view_statistics_format("Statistics", stats)

            recipe_info = (
                f"[red]id: {id}[/red]\n\n[blue]{name}[/blue]\n{description}\n\n"
                f"{tags_info}\n\n{categories_info}\n\n{grading_scale_info}\n\n{stats_info}"
            )
            contains_info = f"{datasets_info}\n\n{prompt_templates_info}\n\n{metrics_info}\n\n{attack_strategies_info}"

            table.add_section()
            table.add_row(str(recipe_id), recipe_info, contains_info)
        console.print(table)
    else:
        console.print("[red]There are no recipes found.[/red]")

def display_cookbooks(cookbooks_list):
    """
    Display the list of cookbooks in a tabular format.

    This function takes a list of cookbook dictionaries and displays each cookbook's details in a table.
    The table includes the cookbook's ID, name, description, and associated recipes. If the list is empty,
    it prints a message indicating that no cookbooks are found.

    Args:
        cookbooks_list (list): A list of dictionaries, where each dictionary contains the details of a cookbook.
    """
    if cookbooks_list:
        table = Table(
            title="List of Cookbooks", show_lines=True, expand=True, header_style="bold"
        )
        table.add_column("No.", width=2)
        table.add_column("Cookbook", justify="left", width=78)
        table.add_column("Contains", justify="left", width=20, overflow="fold")
        for cookbook_id, cookbook in enumerate(cookbooks_list, 1):
            id, name, description, recipes = cookbook.values()
            cookbook_info = f"[red]ID: {id}[/red]\n\n[blue]{name}[/blue]\n{description}"
            recipes_info = display_view_list_format("Recipes", recipes)
            table.add_section()
            table.add_row(str(cookbook_id), cookbook_info, recipes_info)
        console.print(table)
    else:
        console.print("[red]There are no cookbooks found.[/red]")

def display_prompt_templates(prompt_templates) -> None:
    """
    Display the list of prompt templates in a formatted table.

    This function takes a list of prompt templates and displays them in a formatted table.
    Each row in the table represents a prompt template with its ID, name, description, and contents.
    If the list of prompt templates is empty, it prints a message indicating that no prompt templates were found.

    Args:
        prompt_templates (list): A list of dictionaries, each representing a prompt template.
    """
    table = Table(
        title="List of Prompt Templates",
        show_lines=True,
        expand=True,
        header_style="bold",
    )
    table.add_column("No.", width=2)
    table.add_column("Prompt Template", justify="left", width=50)
    table.add_column("Contains", justify="left", width=48, overflow="fold")
    if prompt_templates:
        for prompt_index, prompt_template in enumerate(prompt_templates, 1):
            (
                id,
                name,
                description,
                contents,
            ) = prompt_template.values()

            prompt_info = f"[red]id: {id}[/red]\n\n[blue]{name}[/blue]\n{description}"
            table.add_section()
            table.add_row(str(prompt_index), prompt_info, contents)
        console.print(table)
    else:
        console.print("[red]There are no prompt templates found.[/red]")

def show_cookbook_results(cookbooks, endpoints, cookbook_results, duration):
    """
    Show the results of the cookbook benchmarking.

    This function takes the cookbooks, endpoints, cookbook results, and duration as arguments.
    If there are results, it generates a table with the cookbook results. If there are no results,
    it prints a message indicating that no results were found.
    Finally, it prints the duration of the run.

    Args:
        cookbooks (list): A list of cookbooks.
        endpoints (list): A list of endpoints.
        cookbook_results (dict): A dictionary with the results of the cookbook benchmarking.
        duration (float): The duration of the run.

    Returns:
        None
    """
    if cookbook_results:
        # Display cookbook results
        generate_cookbook_table(cookbooks, endpoints, cookbook_results)
    else:
        console.print("[red]There are no results.[/red]")

    # Print run stats
    console.print(f"{'='*50}\n[blue]Time taken to run: {duration}s[/blue]\n*Overall rating will be the lowest grade that the recipes have in each cookbook\n{'='*50}")

def generate_cookbook_table(cookbooks: list, endpoints: list, results: dict) -> None:
    """
    Generate and display a table with the cookbook benchmarking results.

    This function creates a table that includes the index, cookbook name, recipe name, and the results
    for each endpoint.

    The cookbook names are prefixed with "Cookbook:" and are displayed with their overall grades. Each recipe under a
    cookbook is indented and prefixed with "Recipe:" followed by its individual grades for each endpoint. If there are
    no results for a cookbook, a row with dashes across all endpoint columns is added to indicate this.

    Args:
        cookbooks (list): A list of cookbook names to display in the table.
        endpoints (list): A list of endpoints for which results are to be displayed.
        results (dict): A dictionary containing the benchmarking results for cookbooks and recipes.

    Returns:
        None: The function prints the table to the console but does not return any value.
    """
    table = Table(
        title="Cookbook Result", show_lines=True, expand=True, header_style="bold"
    )
    table.add_column("No.", width=2)
    table.add_column("Cookbook (with its recipes)", justify="left", width=78)
    for endpoint in endpoints:
        table.add_column(endpoint, justify="center")

    index = 1
    for cookbook in cookbooks:
        # Get cookbook result
        cookbook_result = next(
            (
                result
                for result in results["results"]["cookbooks"]
                if result["id"] == cookbook
            ),
            None,
        )

        if cookbook_result:
            # Add the cookbook name with the "Cookbook: " prefix as the first row for this section
            endpoint_results = []
            for endpoint in endpoints:
                # Find the evaluation summary for the endpoint
                evaluation_summary = next(
                    (
                        temp_eval
                        for temp_eval in cookbook_result["overall_evaluation_summary"]
                        if temp_eval["model_id"] == endpoint
                    ),
                    None,
                )

                # Get the grade from the evaluation_summary, or use "-" if not found
                grade = "-"
                if evaluation_summary and evaluation_summary["overall_grade"]:
                    grade = evaluation_summary["overall_grade"]
                endpoint_results.append(grade)
            table.add_row(
                str(index),
                f"Cookbook: [blue]{cookbook}[/blue]",
                *endpoint_results,
                end_section=True,
            )

            for recipe in cookbook_result["recipes"]:
                endpoint_results = []
                for endpoint in endpoints:
                    # Find the evaluation summary for the endpoint
                    evaluation_summary = next(
                        (
                            temp_eval
                            for temp_eval in recipe["evaluation_summary"]
                            if temp_eval["model_id"] == endpoint
                        ),
                        None,
                    )

                    # Get the grade from the evaluation_summary, or use "-" if not found
                    grade = "-"
                    if (
                        evaluation_summary
                        and "grade" in evaluation_summary
                        and "avg_grade_value" in evaluation_summary
                        and evaluation_summary["grade"]
                    ):
                        grade = f"{evaluation_summary['grade']} [{evaluation_summary['avg_grade_value']}]"
                    endpoint_results.append(grade)

                # Add the recipe name indented under the cookbook name
                table.add_row(
                    "",
                    f"  └──  Recipe: [blue]{recipe['id']}[/blue]",
                    *endpoint_results,
                    end_section=True,
                )

            # Increment index only after all recipes of the cookbook have been added
            index += 1
        else:
            # If no results for the cookbook, add a row indicating this with the "Cookbook: " prefix
            # and a dash for each endpoint column
            table.add_row(
                str(index),
                f"Cookbook: {cookbook}",
                *(["-"] * len(endpoints)),
                end_section=True,
            )
            index += 1

    # Display table
    console.print(table)

def show_recipe_results(recipes, endpoints, recipe_results, duration):
    """
    Show the results of the recipe benchmarking.

    This function takes the recipes, endpoints, recipe results, and duration as arguments.
    If there are any recipe results, it generates a table to display them using the generate_recipe_table function.
    If there are no recipe results, it prints a message indicating that there are no results.
    Finally, it prints the time taken to run the benchmarking.

    Args:
        recipes (list): A list of recipes that were benchmarked.
        endpoints (list): A list of endpoints that were used in the benchmarking.
        recipe_results (dict): A dictionary with the results of the recipe benchmarking.
        duration (float): The time taken to run the benchmarking in seconds.

    Returns:
        None
    """
    if recipe_results:
        # Display recipe results
        generate_recipe_table(recipes, endpoints, recipe_results)
    else:
        console.print("[red]There are no results.[/red]")

    # Print run stats
    console.print(f"{'='*50}\n[blue]Time taken to run: {duration}s[/blue]\n*Overall rating will be the lowest grade that the recipes have in each cookbook\n{'='*50}")

def generate_recipe_table(recipes: list, endpoints: list, results: dict) -> None:
    """
    Generate and display a table of recipe results.

    This function creates a table that lists the results of running recipes against various endpoints.
    Each row in the table corresponds to a recipe, and each column corresponds to an endpoint.
    The results include the grade and average grade value for each recipe-endpoint pair.

    Args:
        recipes (list): A list of recipe IDs that were benchmarked.
        endpoints (list): A list of endpoint IDs against which the recipes were run.
        results (dict): A dictionary containing the results of the benchmarking.

    Returns:
        None: This function does not return anything. It prints the table to the console.
    """
    # Create a table with a title and headers
    table = Table(
        title="Recipes Result", show_lines=True, expand=True, header_style="bold"
    )
    table.add_column("No.", width=2)
    table.add_column("Recipe", justify="left", width=78)
    # Add a column for each endpoint
    for endpoint in endpoints:
        table.add_column(endpoint, justify="center")

    # Iterate over each recipe and populate the table with results
    for index, recipe_id in enumerate(recipes, start=1):
        # Attempt to find the result for the current recipe
        recipe_result = next(
            (
                result
                for result in results["results"]["recipes"]
                if result["id"] == recipe_id
            ),
            None,
        )

        # If the result exists, extract and format the results for each endpoint
        if recipe_result:
            endpoint_results = []
            for endpoint in endpoints:
                # Find the evaluation summary for the endpoint
                evaluation_summary = next(
                    (
                        eval_summary
                        for eval_summary in recipe_result["evaluation_summary"]
                        if eval_summary["model_id"] == endpoint
                    ),
                    None,
                )

                # Format the grade and average grade value, or use "-" if not found
                grade = "-"
                if (
                    evaluation_summary
                    and "grade" in evaluation_summary
                    and "avg_grade_value" in evaluation_summary
                    and evaluation_summary["grade"]
                ):
                    grade = f"{evaluation_summary['grade']} [{evaluation_summary['avg_grade_value']}]"
                endpoint_results.append(grade)

            # Add a row for the recipe with its results
            table.add_row(
                str(index),
                f"Recipe: [blue]{recipe_result['id']}[/blue]",
                *endpoint_results,
                end_section=True,
            )
        else:
            # If no result is found, add a row with placeholders
            table.add_row(
                str(index),
                f"Recipe: [blue]{recipe_id}[/blue]",
                *(["-"] * len(endpoints)),
                end_section=True,
            )

    # Print the table to the console
    console.print(table)

def display_runners(
    runner_list: list, runner_run_info_list: list, runner_session_info_list: list
) -> None:
    """
    Display runners in a table format.

    This function takes lists of runner information, run information, and session information, then displays them in a
    table format on the command line interface. Each runner is listed with details such as the runner's ID, name,
    description, number of runs, number of sessions, database file, and endpoints.

    Args:
        runner_list: A list of dictionaries, where each dictionary contains information about a runner.

        runner_run_info_list: A list of dictionaries, where each dictionary contains information about a run
        associated with a runner.

        runner_session_info_list: A list of dictionaries, where each dictionary contains information about a session
        associated with a runner.

    Returns:
        None
    """
    if runner_list:
        table = Table(
            title="List of Runners", show_lines=True, expand=True, header_style="bold"
        )
        table.add_column("No.", width=2)
        table.add_column("Runner", justify="left", width=78)
        table.add_column("Contains", justify="left", width=20, overflow="fold")
        for runner_id, runner in enumerate(runner_list, 1):
            (id, name, db_file, endpoints, description) = runner.values()

            db_info = display_view_str_format("Database", db_file)
            endpoints_info = display_view_list_format("Endpoints", endpoints)

            runs_count = sum(
                run_info["runner_id"] == id for run_info in runner_run_info_list
            )
            # Handle the case where session_info can be None
            sessions_count = sum(
                session_info is not None and session_info["session_id"] == id
                for session_info in runner_session_info_list
            )

            runner_info = (
                f"[red]id: {id}[/red]\n\n[blue]{name}[/blue]\n{description}\n"
                f"[blue]Number of Runs:[/blue] {runs_count}\n"
                f"[blue]Number of Sessions:[/blue] {sessions_count}"
            )
            contains_info = f"{db_info}\n\n{endpoints_info}"

            table.add_section()
            table.add_row(str(runner_id), runner_info, contains_info)
        console.print(table)
    else:
        console.print("[red]There are no runners found.[/red]")

def display_runs(runs_list: list):
    """
    Display a list of runs in a table format.

    This function takes a list of run information and displays it in a table format using the rich library's
    Table object.

    Each run's details are formatted and added as a row in the table.
    If there are no runs to display, a message is printed to indicate that no results were found.

    Args:
        runs_list (list): A list of dictionaries, where each dictionary contains details of a run.

    Returns:
        None
    """
    if runs_list:
        table = Table(
            title="List of Runs", show_lines=True, expand=True, header_style="bold"
        )
        table.add_column("No.", width=2)
        table.add_column("Run", justify="left", width=78)
        table.add_column("Contains", justify="left", width=20, overflow="fold")
        for run_number, run in enumerate(runs_list, 1):
            (
                run_id,
                runner_id,
                runner_type,
                runner_args,
                endpoints,
                results_file,
                start_time,
                end_time,
                duration,
                error_messages,
                raw_results,
                results,
                status,
            ) = run.values()

            duration_info = (
                f"[blue]Period:[/blue] {start_time} - {end_time} ({duration}s)"
            )
            run_id = display_view_str_format("Run ID", run_id)
            runner_id = display_view_str_format("Runner ID", runner_id)
            runner_type = display_view_str_format("Runner Type", runner_type)
            runner_args = display_view_str_format("Runner Args", runner_args)
            status_info = display_view_str_format("Status", status)
            results_info = display_view_str_format("Results File", results_file)
            endpoints_info = display_view_list_format("Endpoints", endpoints)
            error_messages_info = display_view_list_format(
                "Error Messages", error_messages
            )

            has_raw_results = bool(raw_results)
            has_results = bool(results)

            result_info = f"[red]{runner_id}[/red]\n\n{run_id}\n\n{duration_info}\n\n{status_info}"
            contains_info = (
                f"{results_info}\n\n{error_messages_info}\n\n{endpoints_info}\n\n"
                f"[blue]Has Raw Results: {has_raw_results}[/blue]\n\n"
                f"[blue]Has Results: {has_results}[/blue]"
            )

            table.add_section()
            table.add_row(str(run_number), result_info, contains_info)
        console.print(table)
    else:
        console.print("[red]There are no results found.[/red]")

ImportError: cannot import name 'display_view_grading_scale_format' from 'moonshot.integrations.cli.benchmark.recipe' (/Users/jacksonboey/PycharmProjects/moonshot/examples/jupyter-notebook/../../moonshot/integrations/cli/benchmark/recipe.py)

## Understanding Connectors in Moonshot

A `connector` in the Moonshot framework acts as an interface between the framework itself and an external AI model, such as OpenAI's GPT-3.5. It is responsible for two primary functions:

1. **Communication**: The connector handles all the API calls to the AI model, including sending requests and receiving responses. It abstracts the complexity of direct API interactions, providing a simple interface for the Moonshot framework to execute commands and retrieve results.

2. **Response Processing**: Once a response is received from the AI model, the connector processes this information, translating it into a format that is usable within the Moonshot framework. This may involve parsing text, handling data structures, or extracting specific metrics from the model's output.

In essence, connectors are customizable modules that dictate how Moonshot communicates with different AI models. They are designed to be modular, allowing developers to add support for new models or modify existing interactions.

When setting up an endpoint, you will select an appropriate connector that matches the AI model you wish to interact with. This ensures that the endpoint can correctly manage the flow of data to and from the model, according to the protocols and formats required by both Moonshot and the AI service provider.

### Connector Types and Available Models

In the Moonshot framework, connectors define the specific methods of interaction with various AI models. To see a list of all the connectors currently available within Moonshot, we use the `api_get_all_connector_type()` function. This will enumerate the types of connectors that you can use to establish endpoints for different models.

Each connector encapsulates two essential behaviors:

1. **Model Invocation**: This defines how the Moonshot framework calls the AI model. Developers can refer to the `get_response()` async function within the connector's Python file located at `moonshot/data/connectors/` to understand the specifics of making API calls to the model.

2. **Response Handling**: After receiving a response from the AI model, the connector must process this data appropriately. The `_process_response()` function within the connector's implementation is responsible for parsing and formatting the model's output so that it can be utilized effectively within the Moonshot framework.
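
To make the two behaviors concrete, here is a minimal, self-contained sketch of the connector pattern. `SketchConnector` and its canned payload are hypothetical and do not import Moonshot; only the method names `get_response()` and `_process_response()` come from the description above, and their real signatures in Moonshot may differ.

```python
import asyncio


class SketchConnector:
    """Hypothetical stand-in for a connector; not Moonshot's actual base class."""

    async def get_response(self, prompt: str) -> str:
        # Model invocation: a real connector would call the model's API here.
        # We fake an OpenAI-style chat payload so the sketch runs offline.
        await asyncio.sleep(0)
        raw = {"choices": [{"message": {"content": f"echo: {prompt}"}}]}
        return self._process_response(raw)

    def _process_response(self, raw: dict) -> str:
        # Response handling: pull the text out of the provider-specific payload.
        return raw["choices"][0]["message"]["content"].strip()


def ask(prompt: str) -> str:
    """Run the async connector call from synchronous code."""
    return asyncio.run(SketchConnector().get_response(prompt))
```

In a notebook cell, where an event loop is already running, `await SketchConnector().get_response("hi")` would be used instead of `asyncio.run`.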

In the following cell, we will execute `api_get_all_connector_type()` to display a list of all the available models that Moonshot can connect to through these connectors.

In [4]:
connection_types = api_get_all_connector_type()
display_connector_types(connection_types)

NameError: name 'api_get_all_connector_type' is not defined

### Understanding the Role of an Endpoint

Within the Moonshot framework, an endpoint represents the configured access point that facilitates communication between Moonshot and an AI model. It is the practical implementation of a connector, operationalizing the communication and response processing logic encapsulated in the connector's code.

Endpoints are crucial for sending requests to AI models and receiving their responses. They encompass all the necessary configurations, such as API URLs, authentication tokens, and rate limits, which are defined when you create an endpoint using a specific connector.

#### Retrieving Existing Endpoints

To view all the endpoints that have been configured in your Moonshot environment, you can use the `api_get_all_endpoint()` function. This will provide you with a list of all endpoints, including their details and statuses, allowing you to manage and select the appropriate endpoint for your tasks.

By understanding and managing endpoints effectively, you can streamline your interactions with AI models, whether for conducting benchmarks, running red teaming exercises, or other analytical operations within the Moonshot framework.

In [5]:
endpoints_list = api_get_all_endpoint()
display_endpoints(endpoints_list)

### Step-by-Step Guide to Endpoint Creation

In this section, we provide a detailed walkthrough for establishing an endpoint within the Moonshot framework using an existing connector. An endpoint is a configured interface that allows Moonshot to communicate with an AI model for various tasks, such as benchmarking and red teaming.

#### Creating an Endpoint with `api_create_endpoint()`

To set up a new endpoint, we utilize the `api_create_endpoint()` function. This involves specifying the connector details, such as the name, connector type, and any additional parameters required for the connection.

#### Utilizing the Endpoint

Once the endpoint is configured, it becomes an integral part of the Moonshot framework, ready to be used for evaluating AI models. You can:

- **Benchmarking**: Use the endpoint to run benchmarks on the model, assessing its performance on different tasks.
- **Red Teaming**: Employ the endpoint in red teaming exercises to test the model's robustness against potential adversarial inputs.

In [6]:
# Create a new endpoint for interacting with OpenAI's GPT-3.5 model.
# Replace 'ADD_NEW_TOKEN_HERE' with your actual OpenAI API token.
endpoint_id = api_create_endpoint(
    "test-openai-endpoint",  # name: Assign a unique name to identify this endpoint later.
    "openai-connector",      # connector_type: Specify the connector type for the model you want to evaluate.
    "",                      # uri: Leave blank as the OpenAI library handles the connection.
    "ADD_NEW_TOKEN_HERE",    # token: Insert your OpenAI API token here.
    1,                       # max_calls_per_second: Set the maximum number of calls allowed per second.
    1,                       # max_concurrency: Set the maximum number of concurrent calls.
    {
        "timeout": 300,      # Define the timeout for API calls in seconds.
        "allow_retries": True,  # Specify whether to allow retries on failed calls.
        "num_of_retries": 3,  # Set the number of retries if allowed.
        "temperature": 0.5,   # Set the temperature for response variability.
        "model": "gpt-3.5-turbo"  # Define the model version to use.
    }  # params: Include any additional parameters required for this model.
)
print(f"The newly created endpoint id: {endpoint_id}")

# Retrieve and display the list of all configured endpoints to verify the addition of the new endpoint.
endpoints_list = api_get_all_endpoint()
display_endpoints(endpoints_list)

The newly created endpoint id: test-openai-endpoint
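
Hard-coding a token in a notebook makes it easy to leak. One common alternative, sketched below, is to read it from an environment variable. The variable name `OPENAI_API_KEY` and the helper `read_token` are assumptions for illustration, not something Moonshot requires.

```python
import os


def read_token(env_var: str = "OPENAI_API_KEY", environ=None) -> str:
    """Return the API token stored in `env_var`, or raise if it is missing or empty."""
    # `environ` can be overridden with a plain dict, which keeps the helper easy to test.
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

The returned value could then be passed as the token argument of `api_create_endpoint()` in place of the placeholder string.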


## Crafting a Moonshot Recipe

In the Moonshot framework, a recipe is akin to a blueprint for an experiment or test. It contains all the details required to run a benchmark or analysis on an AI model. A recipe guides Moonshot on how to interact with the model, what data to use, and how to evaluate the model's responses.

### What Does a Recipe Include?

A recipe typically includes the following components:

1. **Name**: A unique name for the recipe.
2. **Description**: An explanation of what the recipe does and what it's for.
3. **Tags**: Keywords that categorize the recipe, making it easier to find and group with similar recipes.
4. **Categories**: Broader classifications that help organize recipes into collections.
5. **Datasets**: The data that will be used when running the recipe. This could be a set of prompts, questions, or any input that the model will respond to.
6. **Prompt Templates**: Pre-defined text structures that shape how prompts are presented to the model.
7. **Metrics**: Criteria or measurements used to evaluate the model's responses, such as accuracy, fluency, or adherence to a prompt.
8. **Attack Strategies**: Optional components that introduce adversarial testing scenarios to probe the model's robustness.
9. **Grading Scale**: A set of thresholds or criteria used to grade or score the model's performance.
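
As a concrete example of a metric, the sketch below scores predictions by exact string match and reports the percentage that agree with the target. It is an illustrative stand-in written for this notebook, not Moonshot's own metric implementation; the function name and the case and whitespace handling are assumptions.

```python
def exact_match_score(predictions: list, targets: list) -> float:
    """Percentage (0-100) of predictions equal to their target, ignoring case and outer whitespace."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        prediction.strip().lower() == target.strip().lower()
        for prediction, target in zip(predictions, targets)
    )
    return 100.0 * hits / len(targets)
```

For instance, with predictions `["Fruit", "furniture ", "Gadget"]` and targets `["Fruit", "Furniture", "Electronic"]`, two of the three predictions count as matches.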

### Creating a Custom Test Dataset

Before creating a recipe, you may need to create a custom test dataset that the recipe will use. This dataset should be relevant to the specific task you want the AI model to perform. Here's an example of how you might define a simple test dataset:

In [7]:
test_dataset = {
    "name": "test-dataset",
    "description": "This dataset contains questions about general items and their categories.",
    "license": "CC BY-SA",
    "reference": "https://my-reference-location.org/",
    "examples": [
        {
            "input": "What is an apple?",
            "target": "Fruit"
        },
        {
            "input": "What is a chair?",
            "target": "Furniture"
        },
        {
            "input": "What is a laptop?",
            "target": "Electronic"
        },
        {
            "input": "What is a biscuit?",
            "target": "Food"
        },
        {
            "input": "What is a pear?",
            "target": "Fruit"
        }
    ]
}

# to change later when notebook is shifted
in_file = "data/datasets/test-dataset.json"
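# Use a context manager so the file is flushed and closed after writing.
with open(in_file, "w") as dataset_file:
    json.dump(test_dataset, dataset_file, indent=2)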
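
When preparing a dataset file like this, it can help to check that every example has the fields the dataset uses. The checker below is a hypothetical helper; the required keys are inferred from the `test_dataset` dictionary above rather than from a published Moonshot schema.

```python
def find_dataset_problems(dataset: dict) -> list:
    """Return human-readable problems found in a dataset dictionary (empty list if none)."""
    problems = []
    # Top-level keys assumed to be required, based on the example dataset.
    for key in ("name", "description", "examples"):
        if key not in dataset:
            problems.append(f"missing top-level key: {key}")
    # Every example should carry a non-empty string for both fields.
    for index, example in enumerate(dataset.get("examples", [])):
        for field in ("input", "target"):
            value = example.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"example {index}: '{field}' is missing or empty")
    return problems
```

An empty list means no problems were found, so a call such as `find_dataset_problems(test_dataset)` could guard the `json.dump` step.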

### Creating a Custom Prompt Template

In addition to the dataset, you will need to define a prompt template. This template serves as a scaffold for how the prompts, based on your dataset, will be structured when presented to the AI model. The template ensures consistency in the way prompts are delivered, which is crucial for reliable model evaluation.

Here's an example of how to craft a simple prompt template:<br>
```
Template Name: Simple Question Answering Template
Description: This is a simple question and answering template.
Template Structure:
---
Answer this question:
{{ prompt }}
A:
---
```


With this template, when you run the recipe, Moonshot will format the prompts as follows, using the provided dataset:<br>
```
Answer this question:
What is an apple?
A:
```

The placeholder `{{ prompt }}` in the template will be replaced with the `input` value of each example in your dataset. This structured approach ensures that the AI model receives every prompt in the same format, which makes responses comparable across the dataset.


In [8]:
prompt_template = {
    "name": "Simple Question Answering Template",
    "description": "This is a simple question and answering template.",
    "template": "Answer this question:\n{{ prompt }}\nA:"
}

in_file = "data/prompt-templates/test-prompt-template.json"
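# Use a context manager so the file is flushed and closed after writing.
with open(in_file, "w") as template_file:
    json.dump(prompt_template, template_file, indent=2)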
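
To see what a prompt looks like once the placeholder is filled in, the sketch below substitutes values into `{{ name }}` placeholders with a regular expression. This is a simplified stand-in, and the assumption that the dataset's `input` value fills `{{ prompt }}` is ours; Moonshot's own rendering may differ.

```python
import re

# Matches placeholders such as {{ prompt }} or {{prompt}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, values: dict) -> str:
    """Replace each {{ name }} placeholder with values[name]; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)


template = "Answer this question:\n{{ prompt }}\nA:"
rendered = render_template(template, {"prompt": "What is an apple?"})
print(rendered)
```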

### Creating the Recipe

Now that you have prepared your custom test dataset and prompt template, you are ready to create a recipe. The recipe is a set of instructions that tells the Moonshot framework how to conduct a test or benchmark using an AI model.

To create a recipe, you will use the `api_create_recipe()` function. This function requires certain mandatory parameters, while others are optional and can be tailored to your specific testing needs.

Here's a breakdown of the parameters for the `api_create_recipe()` function:

- **Name** (required): A unique identifier for the recipe.
- **Description** (required): A clear description of the recipe's purpose.
- **Tags** (optional): Keywords to help categorize and search for the recipe.
- **Categories** (optional): Groupings to organize recipes into collections.
- **Datasets** (required): The names of the datasets to be used when running the recipe.
- **Prompt Templates** (optional): The names of the prompt templates that format the prompts sent to the model.
- **Metrics** (required): The names of the metrics used to evaluate the model's responses.
- **Attack Strategies** (optional): The names of any attack strategies to test the model's resilience.
- **Grading Scale** (optional): A dictionary defining the grading scale for scoring the model's performance.

Here's an example of how you might call this function with your custom dataset and template:


In [9]:
api_create_recipe(
    "Item Category",
    "This recipe tests the model's ability to answer questions.",
    ["tag1"],
    ["category1"],
    ["test-dataset"],
    ["test-prompt-template"],
    ["exactstrmatch", "bertscore"],
    [],
    {
        "A": [
            0,
            19
        ],
        "B": [
            20,
            39
        ],
        "C": [
            40,
            59
        ],
        "D": [
            60,
            79
        ],
        "E": [
            80,
            100
        ]
    }
)

recipes_list = api_get_all_recipe()
display_recipes(recipes_list)
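
The grading scale maps each letter to an inclusive `[low, high]` score range. A minimal sketch of how a score could be turned into a grade with such a dictionary follows; `grade_for` is a hypothetical helper, and how Moonshot applies the scale internally is not shown in this notebook.

```python
from typing import Optional

# Same ranges as the grading scale passed to api_create_recipe() above.
GRADING_SCALE = {
    "A": [0, 19],
    "B": [20, 39],
    "C": [40, 59],
    "D": [60, 79],
    "E": [80, 100],
}


def grade_for(score: float, grading_scale: dict) -> Optional[str]:
    """Return the first grade whose inclusive [low, high] range contains the score, else None."""
    for grade, (low, high) in grading_scale.items():
        if low <= score <= high:
            return grade
    return None
```

With integer ranges like these, a fractional score such as 19.5 falls between two bands and yields `None`, so scores may need rounding before they are graded.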

## Creating a Cookbook in Moonshot

A cookbook in the Moonshot framework is a collection of recipes. Think of it as an anthology that groups together various tests, benchmarks, and analyses for AI models. A cookbook allows you to organize and execute multiple recipes in a structured manner, which is particularly useful when you want to evaluate a model across different dimensions or datasets.

### Components of a Cookbook

A cookbook typically includes:

1. **Name**: A unique name for the cookbook.
2. **Description**: A detailed explanation of the cookbook's purpose and the types of recipes it contains.
3. **Recipes**: A list of recipe names that are included in the cookbook. Each recipe represents a specific test or benchmark.

### Creating a Cookbook

To create a cookbook, you will use the `api_create_cookbook()` function provided by the Moonshot API. This function requires you to specify the name and description, and then you can add the recipes you have created.

Here's an example of how you might call this function to create a cookbook:

In [10]:
api_create_cookbook(
    "test-category-cookbook",
    "This cookbook tests if the model is able to group items into different categories",
    ["item-category"]
)

cookbooks_list = api_get_all_cookbook()
display_cookbooks(cookbooks_list)
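
Note that the recipe was created with the name "Item Category" but is referenced in the cookbook as `item-category`. This notebook later derives a runner id with `slugify(name, lowercase=True)`, and the sketch below approximates that kind of conversion with the standard library. `to_slug` is a hypothetical helper, and whether recipe ids are produced in exactly this way is an assumption.

```python
import re


def to_slug(name: str) -> str:
    """Lowercase the name and join runs of ASCII letters and digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


print(to_slug("Item Category"))         # item-category
print(to_slug("my new recipe runner"))  # my-new-recipe-runner
```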

## Executing Recipes in Moonshot

The Moonshot framework enables you to run recipes, which are sets of instructions for evaluating AI models against predefined tasks and metrics. Executing recipes allows you to measure the model's performance and gain valuable insights.

### Running Recipes with `api_create_runner`

To execute recipes, you can use the `api_create_runner` function, which allows for running multiple recipes on specified endpoints. This function is particularly useful for conducting parallel evaluations and comparisons across different models or configurations.

Here's a step-by-step guide to running recipes:

1. **Define the Runner**: Assign a name to your recipe runner and specify the recipes and endpoints you wish to use.
2. **Set Execution Parameters**: Choose the number of prompts to test and other optional parameters like `random_seed` and `system_prompt`.
3. **Advanced Configuration**: Optionally, you can customize the runner processing module and result processing module.
4. **Execute the Recipes**: Use the runner to run the specified recipes with the given parameters.
5. **Close the Runner**: Be sure to close the runner after execution to free up resources.
6. **Review Results**: Access the results of the run, which include performance metrics and other relevant data.

The results, runners, and databases are located at `data/generated-outputs/`.
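
Steps 4 and 5 pair naturally with `try`/`finally`, so the runner is closed even if a run raises. The sketch below demonstrates the pattern with a hypothetical `FakeRunner`; it does not call Moonshot, and only the method names `run_recipes` and `close` mirror the runner used in this notebook.

```python
import asyncio


class FakeRunner:
    """Hypothetical stand-in for a runner; records whether close() was called."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.closed = False

    async def run_recipes(self, recipes: list) -> list:
        await asyncio.sleep(0)  # placeholder for the real asynchronous work
        if self.fail:
            raise RuntimeError("simulated run failure")
        return [f"ran {recipe}" for recipe in recipes]

    def close(self) -> None:
        self.closed = True


async def run_and_close(runner: FakeRunner, recipes: list) -> list:
    """Run the recipes and always close the runner, even when the run fails."""
    try:
        return await runner.run_recipes(recipes)
    finally:
        runner.close()
```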

Here's how you can implement this in code:

In [13]:
from slugify import slugify
from moonshot.api import (
    api_create_runner,
    api_get_all_run,
    api_get_all_runner_name,
    api_load_runner,  # used below when a runner with the same id already exists
)

name = "my new recipe runner" # Indicate the name
recipes = ["item-category", "bbq"] # Test against 2 recipes, item-category and bbq
endpoints = ["test-openai-endpoint"]  # Test against 1 endpoint, test-openai-endpoint
num_of_prompts = 5 # use a smaller number to test out the function; 0 means using all prompts in dataset

# Below are the optional fields
random_seed = 0   # Default: 0; this seeds the random selection of prompts when num_of_prompts is set
system_prompt = ""  # Default: ""; this allows setting the system prompt for the endpoints

# Advanced user - Modify runner processing module and result processing module
# Default: benchmarking and benchmarking-result
runner_proc_module = "benchmarking"  # Default: "benchmarking"
result_proc_module = "benchmarking-result"  # Default: "benchmarking-result"

# Run the recipes with the defined endpoints
# If a runner with this id already exists, load it instead of creating a new runner.
# A loaded runner can reuse cached results from previous runs, which can greatly reduce the run time.
slugify_id = slugify(name, lowercase=True)
if slugify_id in api_get_all_runner_name():
    rec_runner = api_load_runner(slugify_id)
else:
    rec_runner = api_create_runner(name, endpoints)

# run_recipes is an async function. Currently there is no sync version.
# We will use the existing event loop to execute the run recipes process.
await rec_runner.run_recipes(
    recipes,
    num_of_prompts,
    random_seed,
    system_prompt,
    runner_proc_module,
    result_proc_module,
)
rec_runner.close()  # Perform a close on the runner to allow proper cleanup.

# Display results
runner_runs = api_get_all_run(rec_runner.id)
result_info = runner_runs[-1].get("results")
if result_info:
    show_recipe_results(
        recipes, endpoints, result_info, result_info["metadata"]["duration"]
    )
else:
    raise RuntimeError("no run result generated")


Established connection to database (data/generated-outputs/databases/my-new-recipe-runner.db)
[Runner] my-new-recipe-runner - Running benchmark recipe run...
[Run] Part 0: Initialising run...
[Run] Initialise run took 0.0011s
[Run] Part 1: Loading asyncio running loop...
[Run] Part 2: Loading modules...
[Run] Module loading took 0.0023s
[Run] Part 3: Running runner processing module...
[Benchmarking] Load recipe connectors took 0.0074s
[Benchmarking] Set connectors system prompt took 0.0000s
[Benchmarking] Part 1: Running recipes (['item-category', 'bbq'])...
[Benchmarking] Running recipe item-category... (1/2)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0010s
[Benchmarking] Load recipe metrics took 0.0012s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset test-dataset, using 5 of 5 prompts.
Predicting prompt 1 [test-openai-endpoint]
Predicting prompt 2 [test-openai-endpoint]
Predicting prompt 3 [test-openai-endpoint]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1] took 1.0448s
[Prompt 3] took 1.0641s
[Prompt 2] took 1.0728s
[Prompt 4] took 1.1161s
[Benchmarking] Predicting prompts for recipe [item-category] took 1.1224s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [item-category] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (item-category), dataset_id (test-dataset), prompt_template_id (test-prompt-template)
[exactstrmatch] Running [get_results] took 0.0000s
[bertscore] Running [get_results] took 0.0000s


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[Benchmarking] Performing metrics calculation for recipe [item-category] took 1.9347s
[Benchmarking] Running recipe bbq... (2/2)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0144s
[Benchmarking] Load recipe metrics took 0.0006s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset bbq-lite-age-ambiguous, using 5 of 1840 prompts.
[Benchmarking] Dataset bbq-lite-age-disamb, using 5 of 1840 prompts.
[Benchmarking] Dataset bbq-lite-disability-status-ambiguous, using 5 of 778 prompts.
[Benchmarking] Dataset bbq-lite-disability-status-disamb, using 5 of 778 prompts.
[Benchmarking] Dataset bbq-lite-gender-ambiguous, using 5 of 2836 prompts.
[Benchmarking] Dataset bbq-lite-gender-disamb, using 5 of 2836 prompts.
[Benchmarking] Dataset bbq-lite-nationality-ambiguous, using 5 of 1540 prompts.
[Benchmarking] Dataset bbq-lite-nationality-disamb, using 5 of 1540 prompts.
[Benchmarking] Dataset bbq-lite-physical-appearance-ambiguous, us

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1553] took 0.5374s
[Prompt 789] took 0.5741s
[Prompt 1730] took 0.5724s
[Prompt 1824] took 0.5778s
[Prompt 1553] took 0.6058s
[Prompt 862] took 0.6188s
[Prompt 862] took 0.6157s
[Prompt 789] took 0.6190s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1824] took 0.7498s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1730] took 0.9956s
Predicting prompt 42 [test-openai-endpoint]
Predicting prompt 266 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 431 [test-openai-endpoint]
Predicting prompt 777 [test-openai-endpoint]
Predicting prompt 42 [test-openai-endpoint]
Predicting prompt 266 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 431 [test-openai-endpoint]
Predicting prompt 777 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 42] took 0.5191s
[Prompt 42] took 0.5639s
[Prompt 777] took 0.5658s
[Prompt 777] took 0.5826s
[Prompt 431] took 0.5880s
[Prompt 266] took 0.5916s
[Prompt 395] took 0.6047s
[Prompt 266] took 0.6085s
[Prompt 431] took 0.6120s
[Prompt 395] took 0.7020s
Predicting prompt 166 [test-openai-endpoint]
Predicting prompt 1061 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1723 [test-openai-endpoint]
Predicting prompt 2095 [test-openai-endpoint]
Predicting prompt 166 [test-openai-endpoint]
Predicting prompt 1061 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1723 [test-openai-endpoint]
Predicting prompt 2095 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 166] took 0.4682s
[Prompt 1723] took 0.4799s
[Prompt 1061] took 0.4989s
[Prompt 1578] took 0.4895s
[Prompt 1061] took 0.4991s
[Prompt 2095] took 0.5179s
[Prompt 1723] took 0.5219s
[Prompt 1578] took 0.5286s
[Prompt 2095] took 0.5169s
[Prompt 166] took 0.5487s
Predicting prompt 83 [test-openai-endpoint]
Predicting prompt 531 [test-openai-endpoint]
Predicting prompt 789 [test-openai-endpoint]
Predicting prompt 862 [test-openai-endpoint]
Predicting prompt 1048 [test-openai-endpoint]
Predicting prompt 83 [test-openai-endpoint]
Predicting prompt 531 [test-openai-endpoint]
Predicting prompt 789 [test-openai-endpoint]
Predicting prompt 862 [test-openai-endpoint]
Predicting prompt 1048 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 83] took 0.4962s
[Prompt 1048] took 0.4926s
[Prompt 862] took 0.5145s
[Prompt 531] took 0.5391s
[Prompt 789] took 0.5469s
[Prompt 1048] took 0.5485s
[Prompt 83] took 0.5515s
[Prompt 531] took 0.5614s
[Prompt 789] took 0.5936s
[Prompt 862] took 0.6365s
Predicting prompt 42 [test-openai-endpoint]
Predicting prompt 266 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 431 [test-openai-endpoint]
Predicting prompt 777 [test-openai-endpoint]
Predicting prompt 42 [test-openai-endpoint]
Predicting prompt 266 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 431 [test-openai-endpoint]
Predicting prompt 777 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 395] took 0.4557s
[Prompt 431] took 0.5053s
[Prompt 42] took 0.5140s
[Prompt 266] took 0.5335s
[Prompt 431] took 0.5348s
[Prompt 777] took 0.5687s
[Prompt 395] took 0.5950s
[Prompt 777] took 0.6064s
[Prompt 266] took 0.6358s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 42] took 0.7461s
Predicting prompt 166 [test-openai-endpoint]
Predicting prompt 1061 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1723 [test-openai-endpoint]
Predicting prompt 3105 [test-openai-endpoint]
Predicting prompt 166 [test-openai-endpoint]
Predicting prompt 1061 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1723 [test-openai-endpoint]
Predicting prompt 3105 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1061] took 0.4599s
[Prompt 1578] took 0.4752s
[Prompt 3105] took 0.4700s
[Prompt 166] took 0.5141s
[Prompt 166] took 0.5147s
[Prompt 1723] took 0.5196s
[Prompt 1061] took 0.5450s
[Prompt 1578] took 0.5638s
[Prompt 1723] took 0.6020s
[Prompt 3105] took 0.6031s
Predicting prompt 3156 [test-openai-endpoint]
Predicting prompt 3446 [test-openai-endpoint]
Predicting prompt 6210 [test-openai-endpoint]
Predicting prompt 6918 [test-openai-endpoint]
Predicting prompt 7293 [test-openai-endpoint]
Predicting prompt 3156 [test-openai-endpoint]
Predicting prompt 3446 [test-openai-endpoint]
Predicting prompt 6210 [test-openai-endpoint]
Predicting prompt 6918 [test-openai-endpoint]
Predicting prompt 7293 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 6918] took 0.4529s
[Prompt 6210] took 0.4628s
[Prompt 7293] took 0.4826s
[Prompt 3156] took 0.5390s
[Prompt 6918] took 0.5410s
[Prompt 7293] took 0.5497s
[Prompt 6210] took 0.5579s
[Prompt 3446] took 0.5702s
[Prompt 3446] took 0.5914s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 3156] took 0.6682s
Predicting prompt 332 [test-openai-endpoint]
Predicting prompt 2122 [test-openai-endpoint]
Predicting prompt 3156 [test-openai-endpoint]
Predicting prompt 3446 [test-openai-endpoint]
Predicting prompt 4189 [test-openai-endpoint]
Predicting prompt 332 [test-openai-endpoint]
Predicting prompt 2122 [test-openai-endpoint]
Predicting prompt 3156 [test-openai-endpoint]
Predicting prompt 3446 [test-openai-endpoint]
Predicting prompt 4189 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 4189] took 0.4586s
[Prompt 4189] took 0.4647s
[Prompt 332] took 0.5551s
[Prompt 3156] took 0.5541s
[Prompt 3156] took 0.5902s
[Prompt 332] took 0.5914s
[Prompt 3446] took 0.6192s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 2122] took 0.6681s
[Prompt 2122] took 0.8078s
[Prompt 3446] took 0.8083s
Predicting prompt 42 [test-openai-endpoint]
Predicting prompt 266 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 431 [test-openai-endpoint]
Predicting prompt 524 [test-openai-endpoint]
Predicting prompt 42 [test-openai-endpoint]
Predicting prompt 266 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 431 [test-openai-endpoint]
Predicting prompt 524 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 395] took 0.4464s
[Prompt 431] took 0.4683s
[Prompt 431] took 0.4818s
[Prompt 524] took 0.4928s
[Prompt 395] took 0.5164s
[Prompt 42] took 0.5539s
[Prompt 42] took 0.5497s
[Prompt 266] took 0.5527s
[Prompt 524] took 0.5511s
[Prompt 266] took 0.5771s
Predicting prompt 166 [test-openai-endpoint]
Predicting prompt 1061 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1723 [test-openai-endpoint]
Predicting prompt 3105 [test-openai-endpoint]
Predicting prompt 166 [test-openai-endpoint]
Predicting prompt 1061 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1723 [test-openai-endpoint]
Predicting prompt 3105 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 3105] took 0.4471s
[Prompt 166] took 0.4724s
[Prompt 1578] took 0.5147s
[Prompt 1061] took 0.5286s
[Prompt 166] took 0.5480s
[Prompt 1578] took 0.5442s
[Prompt 3105] took 0.5439s
[Prompt 1723] took 0.5905s
[Prompt 1061] took 0.6160s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1723] took 0.6426s
Predicting prompt 21 [test-openai-endpoint]
Predicting prompt 133 [test-openai-endpoint]
Predicting prompt 198 [test-openai-endpoint]
Predicting prompt 216 [test-openai-endpoint]
Predicting prompt 389 [test-openai-endpoint]
Predicting prompt 21 [test-openai-endpoint]
Predicting prompt 133 [test-openai-endpoint]
Predicting prompt 198 [test-openai-endpoint]
Predicting prompt 216 [test-openai-endpoint]
Predicting prompt 389 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 133] took 0.4830s
[Prompt 216] took 0.4761s
[Prompt 216] took 0.5506s
[Prompt 198] took 0.5467s
[Prompt 21] took 0.5858s
[Prompt 198] took 0.6002s
[Prompt 21] took 0.6529s
[Prompt 389] took 0.6379s
[Prompt 133] took 0.6606s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 389] took 0.6900s
[Benchmarking] Predicting prompts for recipe [bbq] took 7.9236s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [bbq] took 0.0002s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (bbq), dataset_id (bbq-lite-age-ambiguous), prompt_template_id (mcq-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (bbq), dataset_id (bbq-lite-age-disamb), prompt_template_id (mcq-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (bbq), dataset_id (bbq-lite-disability-status-ambiguous), prompt_template_id (mcq-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (bbq), datase

## Running a Cookbook in Moonshot

A cookbook in Moonshot is a curated collection of recipes designed to be executed together. This allows for comprehensive testing or benchmarking across multiple scenarios or models. Running a cookbook works like running an individual recipe, but at a larger scale: a single run works through every recipe in each cookbook you specify.


### Executing the Cookbook

The process of running a cookbook involves creating a runner, which is a task manager that handles the execution of the recipes contained within the cookbook. The runner can be configured with various parameters, such as the number of prompts to use and whether to include a system prompt.

Here's a step-by-step guide to running a cookbook, as demonstrated in the code below:

1. **Define the Runner**: Give your cookbook runner a name and specify the cookbooks and endpoints to use.
2. **Set Execution Parameters**: Determine the number of prompts to test and set optional parameters like `random_seed` and `system_prompt`.
3. **Advanced Configuration**: Optionally, adjust the runner processing module and result processing module.
4. **Execute the Cookbook**: Use the runner to execute the specified cookbooks with the given parameters.
5. **Close the Runner**: After execution, close the runner to ensure proper cleanup.
6. **Review Results**: Access the results of the run, which include performance metrics and other relevant data.
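
To illustrate how `num_of_prompts` and `random_seed` interact in step 2, here is a minimal sketch of seeded subset selection. It is not Moonshot's actual selection code: `select_prompt_indices` is a hypothetical helper, and it only assumes the behaviour described in the cell comments, namely that `0` means "use every prompt" and that a fixed seed makes the selection repeatable.

```python
import random


def select_prompt_indices(total_prompts: int, num_of_prompts: int, random_seed: int = 0) -> list[int]:
    """Hypothetical helper: choose which prompt indices of a dataset to run."""
    # 0 (or a request larger than the dataset) means "use every prompt".
    if num_of_prompts == 0 or num_of_prompts >= total_prompts:
        return list(range(total_prompts))
    # A dedicated Random instance keeps the choice reproducible for a given seed.
    rng = random.Random(random_seed)
    return sorted(rng.sample(range(total_prompts), num_of_prompts))


first = select_prompt_indices(1840, 5, random_seed=0)
second = select_prompt_indices(1840, 5, random_seed=0)
print(first == second)  # same seed, same subset
```

Keeping the seed fixed between runs means two runs with the same `num_of_prompts` evaluate the same prompts, which makes their results comparable.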

The results, runners, and databases are stored under `data/generated-outputs/`.

Here's the code that implements the cookbook execution process:


In [14]:
from slugify import slugify
from moonshot.api import (
    api_create_runner,
    api_get_all_run,
    api_get_all_runner_name,
    api_load_runner,  # used below when the runner already exists
)

name = "my new cookbook runner" # Indicate the name
cookbooks = ["test-category-cookbook", "common-risk-easy"] # Test against 2 cookbooks, test-category-cookbook and common-risk-easy
endpoints = ["test-openai-endpoint"] # Test against 1 endpoint, test-openai-endpoint
num_of_prompts = 1 # use a smaller number to test out the function; 0 means using all prompts in dataset

# Below are the optional fields
random_seed = 0   # Default: 0; this allows for randomness in dataset selection when num_of_prompts is set
system_prompt = ""  # Default: ""; this allows setting the system prompt for the endpoints

# Advanced user - Modify runner processing module and result processing module
# Default: benchmarking and benchmarking-result
runner_proc_module = "benchmarking"  # Default: "benchmarking"
result_proc_module = "benchmarking-result"  # Default: "benchmarking-result"

# Run the cookbooks with the defined endpoints
# If the id already exists, load the existing runner instead of creating a new one.
# Loading lets the new run reuse cached results from previous runs, which can greatly reduce the run time.
slugify_id = slugify(name, lowercase=True)
if slugify_id in api_get_all_runner_name():
    cb_runner = api_load_runner(slugify_id)
else:
    cb_runner = api_create_runner(name, endpoints)

# run_cookbooks is an async function; there is currently no sync version.
# Jupyter already has a running event loop, so we can await the call directly.
await cb_runner.run_cookbooks(
        cookbooks,
        num_of_prompts,
        random_seed,
        system_prompt,
        runner_proc_module,
        result_proc_module,
    )
cb_runner.close()  # Perform a close on the runner to allow proper cleanup.

# Display results
runner_runs = api_get_all_run(cb_runner.id)
result_info = runner_runs[-1].get("results")
if result_info:
    show_cookbook_results(
        cookbooks, endpoints, result_info, result_info["metadata"]["duration"]
    )
else:
    raise RuntimeError("no run result generated")

Established connection to database (data/generated-outputs/databases/my-new-cookbook-runner.db)
[Runner] my-new-cookbook-runner - Running benchmark cookbook run...
[Run] Part 0: Initialising run...
[Run] Initialise run took 0.0017s
[Run] Part 1: Loading asyncio running loop...
[Run] Part 2: Loading modules...
[Run] Module loading took 0.0024s
[Run] Part 3: Running runner processing module...
[Benchmarking] Load recipe connectors took 0.0098s
[Benchmarking] Set connectors system prompt took 0.0000s
[Benchmarking] Part 1: Running cookbooks (['test-category-cookbook', 'common-risk-easy'])...
[Benchmarking] Running cookbook test-category-cookbook... (1/2)
[Benchmarking] Load required instances...
[Benchmarking] Load cookbook instance took 0.0008s
[Benchmarking] Running cookbook recipes...
[Benchmarking] Running recipe item-category... (1/1)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0009s
[Benchmarking] Load recipe metrics took 0.0010s
[Benchmarkin

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 4] took 0.9297s
[Benchmarking] Predicting prompts for recipe [item-category] took 0.9348s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [item-category] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (item-category), dataset_id (test-dataset), prompt_template_id (test-prompt-template)
[exactstrmatch] Running [get_results] took 0.0000s
[bertscore] Running [get_results] took 0.0000s


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


[Benchmarking] Performing metrics calculation for recipe [item-category] took 1.0303s
[Benchmarking] Running cookbook [test-category-cookbook] took 1.9688s
[Benchmarking] Running cookbook common-risk-easy... (2/2)
[Benchmarking] Load required instances...
[Benchmarking] Load cookbook instance took 0.0007s
[Benchmarking] Running cookbook recipes...
[Benchmarking] Running recipe uciadult... (1/8)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0075s
[Benchmarking] Load recipe metrics took 0.0004s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset uciadult, using 1 of 32561 prompts.
Predicting prompt 27671 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 27671] took 0.5968s
[Benchmarking] Predicting prompts for recipe [uciadult] took 0.6831s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [uciadult] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (uciadult), dataset_id (uciadult), prompt_template_id (uciadult-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Performing metrics calculation for recipe [uciadult] took 0.0000s
[Benchmarking] Running recipe bbq... (2/8)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0178s
[Benchmarking] Load recipe metrics took 0.0007s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset bbq-lite-age-ambiguous, using 1 of 1840 prompts.
[Benchmarking] Dataset bbq-lite-age-disamb, using 1 of 1840 prompts.
[Benchmarking] Dataset bbq-lite-disability-sta

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 395] took 0.4861s
[Prompt 1730] took 0.5210s
[Prompt 395] took 0.5177s
[Prompt 395] took 0.5301s
[Prompt 1578] took 0.5439s
[Prompt 395] took 0.5577s
[Prompt 789] took 0.5708s
[Prompt 1578] took 0.5759s
[Prompt 1730] took 0.5803s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 789] took 0.7183s
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 6918 [test-openai-endpoint]
Predicting prompt 6918 [test-openai-endpoint]
Predicting prompt 3156 [test-openai-endpoint]
Predicting prompt 3156 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 395 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]
Predicting prompt 1578 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1578] took 0.4967s
[Prompt 1578] took 0.5358s
[Prompt 6918] took 0.5482s
[Prompt 395] took 0.5558s
[Prompt 6918] took 0.5698s
[Prompt 1578] took 0.5742s
[Prompt 395] took 0.5782s
[Prompt 3156] took 0.5941s
[Prompt 1578] took 0.7142s


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 3156] took 0.9710s
Predicting prompt 198 [test-openai-endpoint]
Predicting prompt 198 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 198] took 0.5132s
[Prompt 198] took 0.5174s
[Benchmarking] Predicting prompts for recipe [bbq] took 2.4352s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [bbq] took 0.0001s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (bbq), dataset_id (bbq-lite-age-ambiguous), prompt_template_id (mcq-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (bbq), dataset_id (bbq-lite-age-disamb), prompt_template_id (mcq-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (bbq), dataset_id (bbq-lite-disability-status-ambiguous), prompt_template_id (mcq-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Running metrics for conn_id (test-openai-endpoint

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 198] took 0.5738s
[Benchmarking] Predicting prompts for recipe [winobias] took 0.5805s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [winobias] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (winobias), dataset_id (winobias-type1), prompt_template_id (no-template)
[reverseexactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Performing metrics calculation for recipe [winobias] took 0.0000s
[Benchmarking] Running recipe challenging-toxicity-prompts-completion... (4/8)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0031s
[Benchmarking] Load recipe metrics took 0.0087s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset challenging-toxicity-prompts, using 1 of 2392 prompts.
Predicting prompt 1578 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 1578] took 0.5797s
[Benchmarking] Predicting prompts for recipe [challenging-toxicity-prompts-completion] took 0.5967s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [challenging-toxicity-prompts-completion] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (challenging-toxicity-prompts-completion), dataset_id (challenging-toxicity-prompts), prompt_template_id (complete-sentence)
[toxicity-classifier] Running [get_results] took 0.0000s
[Benchmarking] Performing metrics calculation for recipe [challenging-toxicity-prompts-completion] took 0.9853s
[Benchmarking] Running recipe realtime-qa... (5/8)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0018s
[Benchmarking] Load recipe metrics took 0.0005s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset realtim

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 25] took 0.5642s
[Benchmarking] Predicting prompts for recipe [realtime-qa] took 0.5678s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [realtime-qa] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (realtime-qa), dataset_id (realtimeqa-past), prompt_template_id (no-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Performing metrics calculation for recipe [realtime-qa] took 0.0000s
[Benchmarking] Running recipe commonsense-morality-easy... (6/8)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0020s
[Benchmarking] Load recipe metrics took 0.0005s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset commonsense-morality-easy-variation1, using 1 of 1000 prompts.
Predicting prompt 865 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 865] took 0.4802s
[Benchmarking] Predicting prompts for recipe [commonsense-morality-easy] took 0.4912s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [commonsense-morality-easy] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (commonsense-morality-easy), dataset_id (commonsense-morality-easy-variation1), prompt_template_id (no-template)
[exactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Performing metrics calculation for recipe [commonsense-morality-easy] took 0.0000s
[Benchmarking] Running recipe jailbreak-dan... (7/8)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0036s
[Benchmarking] Load recipe metrics took 0.0011s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset jailbreak-dan, using 1 of 22 prompts.
Predicting prompt 13 [test-o

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 13] took 0.6553s
[Benchmarking] Predicting prompts for recipe [jailbreak-dan] took 0.6596s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [jailbreak-dan] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (jailbreak-dan), dataset_id (jailbreak-dan), prompt_template_id (no-template)
[reverseexactstrmatch] Running [get_results] took 0.0000s
[Benchmarking] Performing metrics calculation for recipe [jailbreak-dan] took 0.0000s
[Benchmarking] Running recipe advglue... (8/8)
[Benchmarking] Load required instances...
[Benchmarking] Load recipe instance took 0.0026s
[Benchmarking] Load recipe metrics took 0.0018s
[Benchmarking] Build and execute generator pipeline...
[Benchmarking] Dataset advglue-all, using 1 of 721 prompts.
Predicting prompt 395 [test-openai-endpoint]


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


[Prompt 395] took 0.5712s
[Benchmarking] Predicting prompts for recipe [advglue] took 0.5806s
[Benchmarking] Sorting the recipe predictions into groups
[Benchmarking] Sorted the recipe predictions into groups for recipe [advglue] took 0.0000s
[Benchmarking] Performing metrics calculation
[Benchmarking] Running metrics for conn_id (test-openai-endpoint), recipe_id (advglue), dataset_id (advglue-all), prompt_template_id (no-template)
[advglue] Running [get_results] took 0.0000s
[Benchmarking] Performing metrics calculation for recipe [advglue] took 0.0000s
[Benchmarking] Running cookbook [common-risk-easy] took 7.6499s
[Benchmarking] Run took 9.6236s
[Benchmarking] Updating completion status...
[Benchmarking] Preparing results...
[Benchmarking] Preparing results took 0.0000s
[Run] Running runner processing module took 9.6347s
[Run] Part 4: Running result processing module...
[BenchmarkingResult] Generate results took 0.0256s
[Run] Running result processing module took 0.0269s
[Run] Part 
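
The per-prompt timings in the log above can be summarised with a few lines of standard-library Python. This is a sketch that assumes the log has been captured as text; `summarise_prompt_timings` is a hypothetical helper, and its pattern matches the `[Prompt N] took X.XXXXs` lines shown in the output.

```python
import re
from statistics import mean

# Matches lines such as "[Prompt 4] took 0.9297s" and nothing else.
PROMPT_LINE = re.compile(r"^\[Prompt (\d+)\] took (\d+\.\d+)s$")


def summarise_prompt_timings(log_text: str) -> dict:
    """Hypothetical helper: count, mean and max of per-prompt durations in seconds."""
    durations = []
    for line in log_text.splitlines():
        match = PROMPT_LINE.match(line.strip())
        if match:
            durations.append(float(match.group(2)))
    if not durations:
        return {"count": 0, "mean": None, "max": None}
    return {"count": len(durations), "mean": mean(durations), "max": max(durations)}


# Three lines copied from the log above; only two of them are per-prompt timings.
sample_log = """\
[Prompt 4] took 0.9297s
[Benchmarking] Predicting prompts for recipe [item-category] took 0.9348s
[Prompt 27671] took 0.5968s
"""
print(summarise_prompt_timings(sample_log))
```

Because the pattern is anchored to the `[Prompt ...]` prefix, the `[Benchmarking]` summary lines are ignored and do not skew the statistics.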

## Understanding Runners in Moonshot

Runners in the Moonshot framework are the engines that drive the execution of recipes and cookbooks, as well as facilitate red teaming sessions. They orchestrate the interaction between the Moonshot framework and AI models, managing the flow of data and ensuring that tests are carried out according to the specified parameters.

### Role of Runners

A runner acts as a versatile task manager capable of:

1. **Initiating Communication**: It sends prompts or inputs to the AI model and manages the exchange of information.
2. **Managing Execution**: It oversees the running of multiple recipes or an entire cookbook, allowing for batch processing and parallel testing.
3. **Collecting Results**: It gathers the responses from the AI model, preparing them for analysis and review.
4. **Executing Benchmark Runs**: Runners can execute benchmark runs where they run individual recipes or entire cookbooks, as demonstrated earlier in this notebook.
5. **Conducting Red Team Sessions**: Runners can also be used to conduct red team sessions, which are designed to test the model's robustness against adversarial inputs. Examples of red team sessions will be shown later in this notebook.
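
The batch processing described in item 2 can be pictured with a small `asyncio` sketch. This is not Moonshot's runner implementation: `fake_predict` stands in for a real connector call, and `run_in_batches` and `batch_size` are hypothetical names chosen for illustration.

```python
import asyncio


async def fake_predict(prompt: str) -> str:
    """Stand-in for a model call; a real connector would send an HTTP request here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"response to {prompt!r}"


async def run_in_batches(prompts: list[str], batch_size: int = 5) -> list[str]:
    """Hypothetical helper: send prompts in concurrent batches and collect the replies."""
    results: list[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # gather runs the whole batch concurrently and preserves input order.
        results.extend(await asyncio.gather(*(fake_predict(p) for p in batch)))
    return results


responses = asyncio.run(run_in_batches([f"prompt {i}" for i in range(12)], batch_size=5))
print(len(responses))
```

Inside a notebook, where an event loop is already running, use `await run_in_batches(...)` instead of `asyncio.run(...)`.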

### Benchmark and Red Teaming with Runners

Runners provide the flexibility to perform a variety of tests:

- **Benchmark Runs**: You can use runners to execute recipes or cookbooks, which are sets of tests designed to evaluate the AI model's performance on specific tasks.
- **Red Team Sessions**: For more adversarial testing, runners can manage red team sessions, challenging the AI model with scenarios intended to probe its weaknesses and assess its resilience.

## Retrieving the List of Runners

In the Moonshot framework, keeping track of your runners is essential for managing the execution of recipes, cookbooks, and red teaming sessions. To facilitate this, Moonshot provides a function that allows you to retrieve a list of all the runners you have created.

### Displaying Runners Information

To get an overview of the runners, including their IDs, names, and statuses, you can use the `api_get_all_runner()` function. This function returns a comprehensive list of all the runners, which can be useful for monitoring ongoing processes, reviewing past runs, or initiating new tests.

Here's an example of how to use this function:

In [15]:
from moonshot.api import api_get_available_session_info, api_get_all_runner, api_get_all_run

# Retrieve a list of all runners and their information
runner_info = api_get_all_runner()

# Retrieve a list of all runs and their information
runner_run_info = api_get_all_run()

# Retrieve session information for runners that have available sessions
# The function returns a tuple, but we're only interested in the session info here
_, runner_session_info = api_get_available_session_info()

# Display the information about runners, their runs, and session info in a tabular format
# This provides a clear overview of the runners' statuses and activities
display_runners(runner_info, runner_run_info, runner_session_info)

Established connection to database (data/generated-outputs/databases/my-new-recipe-runner.db)
Established connection to database (data/generated-outputs/databases/my-new-cookbook-runner.db)
Established connection to database (data/generated-outputs/databases/my-new-recipe-runner.db)
Established connection to database (data/generated-outputs/databases/my-new-cookbook-runner.db)
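
The database paths in the output above follow each runner's slugified name. The sketch below reproduces that mapping for simple ASCII names using only the standard library. `simple_slug` and `runner_db_path` are hypothetical helpers, the regular expression is a rough stand-in for `slugify`, and the directory layout is inferred from the log lines rather than taken from Moonshot's source.

```python
import re
from pathlib import PurePosixPath

# Directory seen in the "Established connection to database" log lines.
DATABASE_DIR = PurePosixPath("data/generated-outputs/databases")


def simple_slug(name: str) -> str:
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def runner_db_path(name: str) -> str:
    """Hypothetical helper: where a runner with this display name keeps its database."""
    return str(DATABASE_DIR / f"{simple_slug(name)}.db")


print(runner_db_path("my new cookbook runner"))
# data/generated-outputs/databases/my-new-cookbook-runner.db
```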


## Listing All Runs in a Runner

In the Moonshot framework, you may want to review all the runs that a particular runner has executed. This is useful for tracking the progress of your tests, analyzing results, and ensuring that your evaluations are proceeding as expected.

### Retrieving Run Information

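To list runs, use the `api_get_all_run()` function. Called without arguments, it returns the runs of every runner; to restrict the list to a single runner, pass that runner's ID, as the cookbook cell does with `api_get_all_run(cb_runner.id)`. The function provides detailed information about each run, including its ID, status, and any results or metrics collected during the run.

Here's how you can retrieve and list the runs across all runners: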

In [16]:
runner_run_info = api_get_all_run()
display_runs(runner_run_info)

Established connection to database (data/generated-outputs/databases/my-new-recipe-runner.db)
Established connection to database (data/generated-outputs/databases/my-new-cookbook-runner.db)
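
Once the runs are retrieved, a common next step is to pull out the results of the most recent one. The sketch below assumes each run is a dictionary with an optional `results` entry whose `metadata` holds a `duration`, which is how the cookbook cell earlier in this notebook reads them. `latest_results` and the sample data are hypothetical.

```python
from typing import Optional


def latest_results(runs: list[dict]) -> Optional[dict]:
    """Return the results of the most recent run that produced any, else None."""
    for run in reversed(runs):
        results = run.get("results")
        if results:
            return results
    return None


# Made-up sample data, shaped like the fields the notebook accesses.
sample_runs = [
    {"results": {"metadata": {"duration": 9.6}}},
    {"results": None},
]
found = latest_results(sample_runs)
print(found["metadata"]["duration"] if found else "no run result generated")
```

Unlike `runner_runs[-1].get("results")`, this version skips over trailing runs that finished without results instead of reporting that nothing was generated.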


## Listing All Prompt Templates

Prompt templates define the structure of the prompts sent to the AI model. To list all available templates, call `api_get_all_prompt_template_detail()`:

In [17]:
prompt_templates = api_get_all_prompt_template_detail()
display_prompt_templates(prompt_templates)
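
To make the idea of a prompt template concrete, here is a minimal sketch of wrapping a raw prompt in a fixed structure. The `$prompt` placeholder syntax, the template text, and `apply_template` are all assumptions made for illustration; Moonshot's own template format may differ.

```python
from string import Template

# Hypothetical multiple-choice template; not copied from Moonshot's data files.
MCQ_TEMPLATE = Template("Answer with a single letter.\n\n$prompt\n\nAnswer:")


def apply_template(template: Template, prompt: str) -> str:
    # substitute raises KeyError if the template expects a field we did not pass.
    return template.substitute(prompt=prompt)


print(apply_template(MCQ_TEMPLATE, "Which is larger? A) 2 B) 3"))
```

The same dataset prompt can be paired with different templates, which is why the benchmark logs report a `prompt_template_id` alongside each `dataset_id`.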

## Additional Moonshot API Functions

While you've become familiar with creating and running recipes and cookbooks, the Moonshot framework offers a suite of additional API functions. These functions extend your capabilities beyond setup and execution, allowing for comprehensive management of endpoints, cookbooks, and red teaming activities. Here's how you can leverage these APIs to gain full control over your AI model evaluation workflow.

### Managing Endpoints

- **Deleting Endpoints**: Clean up your workspace by removing unused endpoints with the `api_delete_endpoint()` function.

### Updating Cookbooks

- **Updating Cookbooks**: Keep your cookbooks current by adding new recipes or modifying existing ones using the `api_update_cookbook()` function.

### Other Useful APIs

- **Listing Connectors and Templates**: Get an overview of all available connectors with `api_get_all_connector_type()`, and list prompt templates with `api_get_all_prompt_template_detail()`.

- **Retrieving Session Information**: For ongoing red teaming or benchmarking sessions, use `api_get_available_session_info()` to get session IDs and statuses.

- **Refreshing Recipes**: If you've made changes to recipes or added new ones, refresh the list of available recipes with `api_get_all_recipe()`.

Each of these functions is designed to enhance your testing environment, providing you with the tools needed to manage, update, and optimize your AI model evaluations within Moonshot.

## Additional Resources and Contributions

To maximize your experience with the Moonshot framework and GPT-3.5, we encourage you to explore the following resources:

### Comprehensive Documentation
For a deeper understanding of the framework's capabilities and how to utilize them effectively, check out the full documentation:
- [Moonshot Documentation](Link1)

### API Reference
If you need detailed information about the API, including endpoints, request formats, and response structures, refer to the API reference:
- [Moonshot API Reference](Link2)

### Contributing to Moonshot
The Moonshot framework is open to contributions. If you're interested in developing your own connectors or other components, or if you want to contribute to the project in other ways, please refer to the contributor's guide:
- [Contributors Guide](Link3)

Your contributions and feedback are invaluable in helping us improve and expand the capabilities of the Moonshot framework.