# Result Analysis

The `analyze_results` function computes attack success rates from a list of `AttackResult` objects.
It supports flexible grouping across built-in dimensions (`attack_type`, `converter_type`, `label`)
as well as composite and custom dimensions.

## Setup

First, let's create some sample `AttackResult` objects to work with.

In [None]:
from pyrit.analytics import analyze_results
from pyrit.identifiers import ConverterIdentifier
from pyrit.models import AttackOutcome, AttackResult, MessagePiece


def make_converter(name: str) -> ConverterIdentifier:
    return ConverterIdentifier(
        class_name=name,
        class_module="pyrit.prompt_converter",
        class_description=f"{name} converter",
        identifier_type="instance",
        supported_input_types=("text",),
        supported_output_types=("text",),
    )


# Realistic attack_identifier dicts mirror Strategy.get_identifier() output
crescendo_id = {
    "__type__": "CrescendoAttack",
    "__module__": "pyrit.executor.attack.multi_turn.crescendo",
    "id": "a1b2c3d4-0001-4000-8000-000000000001",
}
red_team_id = {
    "__type__": "RedTeamingAttack",
    "__module__": "pyrit.executor.attack.multi_turn.red_teaming",
    "id": "a1b2c3d4-0002-4000-8000-000000000002",
}

# Build a small set of representative attack results
results = [
    # Crescendo attacks with Base64Converter
    AttackResult(
        conversation_id="c1",
        objective="bypass safety filter",
        attack_identifier=crescendo_id,
        outcome=AttackOutcome.SUCCESS,
        last_response=MessagePiece(
            role="assistant",
            original_value="response 1",
            converter_identifiers=[make_converter("Base64Converter")],
            labels={"operation_name": "op_safety_bypass", "operator": "alice"},
        ),
    ),
    AttackResult(
        conversation_id="c2",
        objective="bypass safety filter",
        attack_identifier=crescendo_id,
        outcome=AttackOutcome.FAILURE,
        last_response=MessagePiece(
            role="assistant",
            original_value="response 2",
            converter_identifiers=[make_converter("Base64Converter")],
            labels={"operation_name": "op_safety_bypass", "operator": "alice"},
        ),
    ),
    # Red teaming attacks with ROT13Converter
    AttackResult(
        conversation_id="c3",
        objective="extract secrets",
        attack_identifier=red_team_id,
        outcome=AttackOutcome.SUCCESS,
        last_response=MessagePiece(
            role="assistant",
            original_value="response 3",
            converter_identifiers=[make_converter("ROT13Converter")],
            labels={"operation_name": "op_secret_extract", "operator": "bob"},
        ),
    ),
    AttackResult(
        conversation_id="c4",
        objective="extract secrets",
        attack_identifier=red_team_id,
        outcome=AttackOutcome.SUCCESS,
        last_response=MessagePiece(
            role="assistant",
            original_value="response 4",
            converter_identifiers=[make_converter("ROT13Converter")],
            labels={"operation_name": "op_secret_extract", "operator": "bob"},
        ),
    ),
    # An undetermined result (no converter, no labels)
    AttackResult(
        conversation_id="c5",
        objective="test prompt",
        attack_identifier=crescendo_id,
        outcome=AttackOutcome.UNDETERMINED,
    ),
]

print(f"Created {len(results)} sample AttackResult objects")

Created 5 sample AttackResult objects


## Overall Stats (No Grouping)

Pass `group_by=[]` to compute only the overall attack success rate, with no
dimensional breakdown.

In [None]:
result = analyze_results(results, group_by=[])

print(f"Overall success rate: {result.overall.success_rate}")
print(f"  Successes:    {result.overall.successes}")
print(f"  Failures:     {result.overall.failures}")
print(f"  Undetermined: {result.overall.undetermined}")
print(f"  Total decided (excl. undetermined): {result.overall.total_decided}")

Overall success rate: 0.75
  Successes:    3
  Failures:     1
  Undetermined: 1
  Total decided (excl. undetermined): 4
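
The numbers above are consistent with a success rate computed over decided results only: 3 successes out of 4 decided outcomes gives 0.75, and the undetermined result is left out of the denominator. The following is a minimal sketch of that arithmetic, not the library's implementation; the `success_rate` helper is a hypothetical name used only for illustration.

```python
from typing import Optional


def success_rate(successes: int, failures: int) -> Optional[float]:
    """Return successes / decided, or None when nothing was decided."""
    decided = successes + failures  # undetermined outcomes are excluded
    if decided == 0:
        return None
    return successes / decided


# The five sample results: 3 successes, 1 failure, 1 undetermined.
overall = success_rate(successes=3, failures=1)
print(overall)  # 0.75

# A group holding only undetermined results has no decided outcomes.
empty_group = success_rate(successes=0, failures=0)
print(empty_group)  # None
```

The `None` case matches the `success_rate=None` values that appear in the grouped outputs for groups with no decided results.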


## Group by Attack Type

See how success rates differ across attack strategies (e.g. `CrescendoAttack` vs `RedTeamingAttack`).

In [None]:
result = analyze_results(results, group_by=["attack_type"])

for attack_type, stats in result.dimensions["attack_type"].items():
    print(
        f"  {attack_type}: success_rate={stats.success_rate}, "
        f"successes={stats.successes}, failures={stats.failures}, "
        f"undetermined={stats.undetermined}"
    )

  CrescendoAttack: success_rate=0.5, successes=1, failures=1, undetermined=1
  RedTeamingAttack: success_rate=1.0, successes=2, failures=0, undetermined=0


## Group by Converter Type

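Break down success rates by which prompt converter was applied. Results with no
converter fall under `no_converter`; in this sample that group has a success rate of
`None` because its only result is undetermined.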

In [None]:
result = analyze_results(results, group_by=["converter_type"])

for converter, stats in result.dimensions["converter_type"].items():
    print(f"  {converter}: success_rate={stats.success_rate}, successes={stats.successes}, failures={stats.failures}")

  Base64Converter: success_rate=0.5, successes=1, failures=1
  ROT13Converter: success_rate=1.0, successes=2, failures=0
  no_converter: success_rate=None, successes=0, failures=0


## Group by Label

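Labels are key-value metadata attached to messages. Each `key=value` pair becomes its own
grouping key, so a result carrying two labels is counted in two groups. Results without
labels fall under `no_labels`.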

In [None]:
result = analyze_results(results, group_by=["label"])

for label_key, stats in result.dimensions["label"].items():
    print(f"  {label_key}: success_rate={stats.success_rate}, successes={stats.successes}, failures={stats.failures}")

  operation_name=op_safety_bypass: success_rate=0.5, successes=1, failures=1
  operator=alice: success_rate=0.5, successes=1, failures=1
  operation_name=op_secret_extract: success_rate=1.0, successes=2, failures=0
  operator=bob: success_rate=1.0, successes=2, failures=0
  no_labels: success_rate=None, successes=0, failures=0
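
To make the fan-out concrete, here is a small sketch that reproduces the grouping keys from the output above using plain dictionaries. It is an illustration under stated assumptions rather than the library's code: `label_keys` is a hypothetical helper, and `sample_labels` is a hand-copied stand-in for the labels on the five sample results.

```python
from collections import Counter
from typing import Optional

# Stand-in for the labels on the five sample results (None = no labels).
sample_labels = [
    {"operation_name": "op_safety_bypass", "operator": "alice"},
    {"operation_name": "op_safety_bypass", "operator": "alice"},
    {"operation_name": "op_secret_extract", "operator": "bob"},
    {"operation_name": "op_secret_extract", "operator": "bob"},
    None,
]


def label_keys(labels: Optional[dict]) -> list:
    """Turn one result's labels into one grouping key per key=value pair."""
    if not labels:
        return ["no_labels"]
    return [f"{key}={value}" for key, value in labels.items()]


# Count how many results land in each label group.
counts = Counter(key for labels in sample_labels for key in label_keys(labels))

for key, count in counts.items():
    print(f"{key}: {count} result(s)")
```

Because each labeled result contributes to two groups, the group sizes add up to 9 even though there are only 5 results. Per-label rates are therefore not a partition of the overall rate.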


## Multiple Dimensions at Once

Pass several dimension names to `group_by` for independent breakdowns in a single call.

In [None]:
result = analyze_results(results, group_by=["attack_type", "converter_type"])

print("--- By attack_type ---")
for key, stats in result.dimensions["attack_type"].items():
    print(f"  {key}: success_rate={stats.success_rate}")

print("\n--- By converter_type ---")
for key, stats in result.dimensions["converter_type"].items():
    print(f"  {key}: success_rate={stats.success_rate}")

--- By attack_type ---
  CrescendoAttack: success_rate=0.5
  RedTeamingAttack: success_rate=1.0

--- By converter_type ---
  Base64Converter: success_rate=0.5
  ROT13Converter: success_rate=1.0
  no_converter: success_rate=None


## Composite Dimensions

Use a tuple of dimension names to create a cross-product grouping. For example,
`("converter_type", "attack_type")` produces keys like `("Base64Converter", "CrescendoAttack")`.

In [None]:
result = analyze_results(results, group_by=[("converter_type", "attack_type")])

for combo_key, stats in result.dimensions[("converter_type", "attack_type")].items():
    print(f"  {combo_key}: success_rate={stats.success_rate}, successes={stats.successes}, failures={stats.failures}")

  ('Base64Converter', 'CrescendoAttack'): success_rate=0.5, successes=1, failures=1
  ('ROT13Converter', 'RedTeamingAttack'): success_rate=1.0, successes=2, failures=0
  ('no_converter', 'CrescendoAttack'): success_rate=None, successes=0, failures=0
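
The cross product can be sketched with `itertools.product`. This is an assumption about how composite keys combine, not a copy of the library's logic, and `composite_keys` is a hypothetical helper. Each argument is the list of values one dimension yields for a single result.

```python
from itertools import product


def composite_keys(*per_dimension_values: list) -> list:
    """Combine the values of several dimensions into tuple keys."""
    return list(product(*per_dimension_values))


# One converter and one attack type give a single composite key.
single = composite_keys(["Base64Converter"], ["CrescendoAttack"])
print(single)

# Assumption: a result with two converters would yield two composite keys.
double = composite_keys(["Base64Converter", "ROT13Converter"], ["CrescendoAttack"])
print(double)
```

Under this reading, the order of names in the tuple passed to `group_by` determines the order of values in each key, as the output above shows.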


## Custom Dimensions

Supply your own extractor function via `custom_dimensions`. An extractor takes an
`AttackResult` and returns a `list[str]` of dimension values. Here we group by the
attack objective.

In [None]:
def extract_objective(attack: AttackResult) -> list[str]:
    return [attack.objective]


result = analyze_results(
    results,
    group_by=["objective"],
    custom_dimensions={"objective": extract_objective},
)

for objective, stats in result.dimensions["objective"].items():
    print(f"  {objective}: success_rate={stats.success_rate}, successes={stats.successes}, failures={stats.failures}")

  bypass safety filter: success_rate=0.5, successes=1, failures=1
  extract secrets: success_rate=1.0, successes=2, failures=0
  test prompt: success_rate=None, successes=0, failures=0
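
An extractor may need to handle results that lack the data it reads, such as the undetermined sample result with no `last_response`. The sketch below groups by the `operator` label and falls back to a placeholder value. It assumes that `last_response` and `labels` are readable as attributes under the same names used in the constructors above; `extract_operator` and the `"no_operator"` fallback are hypothetical names. The stand-in objects let the sketch run without the library.

```python
from types import SimpleNamespace


def extract_operator(attack) -> list:
    """Group by the `operator` label on the last response, if any."""
    last_response = getattr(attack, "last_response", None)
    labels = getattr(last_response, "labels", None) or {}
    operator = labels.get("operator")
    return [operator] if operator else ["no_operator"]


# Stand-in objects that only mimic the attributes read above.
with_label = SimpleNamespace(
    last_response=SimpleNamespace(labels={"operator": "alice"})
)
without_response = SimpleNamespace(last_response=None)

print(extract_operator(with_label))        # ['alice']
print(extract_operator(without_response))  # ['no_operator']
```

If those attribute assumptions hold, the extractor could be registered the same way as `extract_objective`, with `group_by=["operator"]` and `custom_dimensions={"operator": extract_operator}`.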


## Default Behavior

When `group_by` is omitted, `analyze_results` groups by **all** registered
dimensions: `attack_type`, `converter_type`, and `label`.

In [None]:
result = analyze_results(results)

print(f"Dimensions returned: {list(result.dimensions.keys())}")
print(f"Overall success rate: {result.overall.success_rate}")

Dimensions returned: ['attack_type', 'converter_type', 'label']
Overall success rate: 0.75
