⚠️ Deprecated — this package is no longer maintained.
`assert_llm_tools` has been superseded by focused, independently-versioned packages:

| Capability | New package | Install |
|---|---|---|
| Summary evaluation | assert-eval | `pip install assert-eval` |
| Compliance note evaluation | assert-review | `pip install assert-review` |

Version 1.0.0 is the final release. No further updates will be made. Please migrate to the packages above.
Automated Summary Scoring & Evaluation of Retained Text
ASSERT LLM Tools is a lightweight Python library for LLM-based text evaluation. It provides two main capabilities:
- Summary evaluation — score a summary against source text for coverage, factual accuracy, coherence, and more
- Compliance note evaluation — evaluate adviser meeting notes against regulatory frameworks (FCA, MiFID II) and return a structured gap report
All evaluation is LLM-based. No PyTorch, no BERT, no heavy dependencies.
Install from PyPI:

```bash
pip install assert-llm-tools
```

Summary evaluation:

```python
from assert_llm_tools import evaluate_summary, LLMConfig

config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
)

results = evaluate_summary(
    full_text="Original long text goes here...",
    summary="Summary to evaluate goes here...",
    metrics=["coverage", "factual_consistency", "coherence"],
    llm_config=config,
)

print(results)
# {'coverage': 0.85, 'factual_consistency': 0.92, 'coherence': 0.88}
```

Compliance note evaluation:

```python
from assert_llm_tools import evaluate_note, LLMConfig

config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
)

report = evaluate_note(
    note_text="Client meeting note text goes here...",
    framework="fca_suitability_v1",
    llm_config=config,
)

print(report.overall_rating)  # "Compliant" / "Minor Gaps" / "Requires Attention" / "Non-Compliant"
print(report.overall_score)   # 0.0–1.0
print(report.passed)          # True / False

for item in report.items:
    print(f"{item.element_id}: {item.status} (score: {item.score:.2f})")
    if item.suggestions:
        for s in item.suggestions:
            print(f"  → {s}")
```

Available summary evaluation metrics:

| Metric | Description |
|---|---|
| `coverage` | How completely the summary captures claims from the source text |
| `factual_consistency` | Whether claims in the summary are supported by the source |
| `factual_alignment` | Combined coverage + consistency score |
| `topic_preservation` | How well the summary preserves the main topics |
| `conciseness` | Information density — does the summary avoid padding? |
| `redundancy` | Detects repetitive content within the summary |
| `coherence` | Logical flow and readability of the summary |
Deprecated names (still accepted for backwards compatibility):
`faithfulness` → use `coverage`; `hallucination` → use `factual_consistency`.
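A single call can request the full metric set. The sketch below assumes `text`, `summary`, and `config` are defined as in the quick-start example above; with `verbose` left off, each result is a score keyed by metric name, as in the quick-start output.

```python
from assert_llm_tools import evaluate_summary

# Run every documented summary metric in one call (text, summary and config
# are assumed to be defined as in the quick-start example above).
results = evaluate_summary(
    full_text=text,
    summary=summary,
    metrics=[
        "coverage",
        "factual_consistency",
        "factual_alignment",
        "topic_preservation",
        "conciseness",
        "redundancy",
        "coherence",
    ],
    llm_config=config,
)

for name, score in results.items():
    print(f"{name}: {score:.2f}")
```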
Tailor LLM evaluation criteria for your domain:
```python
results = evaluate_summary(
    full_text=text,
    summary=summary,
    metrics=["coverage", "factual_consistency"],
    llm_config=config,
    custom_prompt_instructions={
        "coverage": "Apply strict standards. Only mark a claim as covered if it is clearly and explicitly represented.",
        "factual_consistency": "Flag any claim that adds detail not present in the original text.",
    },
)
```

Pass `verbose=True` to include per-claim reasoning in the results:

```python
results = evaluate_summary(..., verbose=True)
```
⚠️ Experimental — do not use in live or production systems.
`evaluate_note()` is under active development. Outputs are non-deterministic (LLM-based), the API may change between releases, and results have not been validated against real regulatory decisions. This feature is intended for research, prototyping, and internal tooling only. It is not a substitute for qualified compliance review and must not be used to make or support live regulatory or client-facing decisions.
```python
from assert_llm_tools import evaluate_note, LLMConfig
from assert_llm_tools.metrics.note.models import PassPolicy

report = evaluate_note(
    note_text=note,
    framework="fca_suitability_v1",   # built-in ID or path to a custom YAML
    llm_config=config,
    mask_pii=False,                   # mask client PII before sending to LLM
    verbose=False,                    # include LLM reasoning in GapItem.notes
    custom_instruction=None,          # additional instruction appended to all element prompts
    pass_policy=None,                 # custom PassPolicy (see below)
    metadata={"note_id": "N-001"},    # arbitrary key/value pairs, passed through to GapReport
)
```

The returned `GapReport` has the following fields:

| Field | Type | Description |
|---|---|---|
| `framework_id` | `str` | Framework used for evaluation |
| `framework_version` | `str` | Framework version |
| `passed` | `bool` | Whether the note passes the framework's policy thresholds |
| `overall_score` | `float` | Weighted mean element score, 0.0–1.0 |
| `overall_rating` | `str` | Human-readable compliance rating (see below) |
| `items` | `List[GapItem]` | Per-element evaluation results |
| `summary` | `str` | LLM-generated narrative summary of the evaluation |
| `stats` | `GapReportStats` | Counts by status and severity |
| `pii_masked` | `bool` | Whether PII masking was applied |
| `metadata` | `dict` | Caller-supplied metadata, passed through unchanged |
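These are plain attributes, so a report can be flattened into an audit record. The snippet below is an illustrative sketch using only the fields listed above; whether `GapReport` also provides its own serialization helper is not covered here.

```python
import json

# Flatten the documented GapReport fields into a JSON-serialisable audit record.
# `report` is assumed to be the return value of evaluate_note(...).
record = {
    "framework": f"{report.framework_id}@{report.framework_version}",
    "passed": report.passed,
    "score": report.overall_score,
    "rating": report.overall_rating,
    "pii_masked": report.pii_masked,
    "metadata": report.metadata,   # caller-supplied, passed through unchanged
    "narrative": report.summary,   # LLM-generated summary of the evaluation
}
print(json.dumps(record, indent=2))
```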
Overall rating values:
| Rating | Meaning |
|---|---|
| Compliant | Passed — all elements fully present |
| Minor Gaps | Passed — but some elements are partial or optional elements missing |
| Requires Attention | Failed — high/medium gaps, no critical blockers |
| Non-Compliant | Failed — one or more critical required elements missing or below threshold |
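As a sketch of how the rating maps onto a workflow decision (this gate is illustrative, not part of the library): `passed` is `True` for the two "Passed" ratings and `False` otherwise.

```python
# Illustrative pre-submission gate using the documented rating semantics.
# `report` is assumed to come from evaluate_note(...) as above.
if report.passed:
    # "Compliant" or "Minor Gaps"
    print(f"OK to file ({report.overall_rating})")
else:
    # "Requires Attention" or "Non-Compliant"
    print(f"Needs rework ({report.overall_rating}); see report.items for gaps")
```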
Each `GapItem` has the following fields:

| Field | Type | Description |
|---|---|---|
| `element_id` | `str` | Element identifier from the framework |
| `status` | `str` | `"present"`, `"partial"`, or `"missing"` |
| `score` | `float` | 0.0–1.0 quality score for this element |
| `evidence` | `Optional[str]` | Quote or paraphrase from the note supporting the assessment. `None` when the element is missing. |
| `severity` | `str` | `"critical"`, `"high"`, `"medium"`, or `"low"` |
| `required` | `bool` | Whether this element is required by the framework |
| `suggestions` | `List[str]` | Actionable remediation suggestions for gaps (empty when `status == "present"`) |
| `notes` | `Optional[str]` | LLM reasoning (only populated when `verbose=True`) |
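These fields lend themselves to a simple triage pass. The snippet below is an illustrative sketch that uses only the fields documented above, with `report` assumed to come from `evaluate_note()`.

```python
# Illustrative triage of GapItem results: separate blocking gaps from
# lower-priority fixes. `report` is a GapReport returned by evaluate_note(...).
blockers, fixes = [], []
for item in report.items:
    if item.status == "present":
        continue
    entry = {
        "element": item.element_id,
        "status": item.status,
        "severity": item.severity,
        "score": item.score,
        "evidence": item.evidence,       # None when the element is missing
        "suggestions": item.suggestions,
    }
    if item.required and item.status == "missing":
        blockers.append(entry)
    else:
        fixes.append(entry)

print(f"{len(blockers)} blocking gap(s), {len(fixes)} other gap(s)")
```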
Built-in frameworks:

| Framework ID | Description |
|---|---|
| `fca_suitability_v1` | FCA suitability note requirements under COBS 9.2 / PS13/1 (9 elements) |
Pass a path to your own YAML file:
```python
report = evaluate_note(
    note_text=note,
    framework="/path/to/my_framework.yaml",
    llm_config=config,
)
```

The YAML schema mirrors the built-in frameworks. See `assert_llm_tools/frameworks/fca_suitability_v1.yaml` for a reference example.
Override the default pass/fail thresholds with a custom `PassPolicy`:

```python
from assert_llm_tools.metrics.note.models import PassPolicy

policy = PassPolicy(
    critical_partial_threshold=0.5,        # partial critical element treated as a blocker if score < this
    required_pass_threshold=0.6,           # required element must score >= this to pass
    score_correction_missing_cutoff=0.2,
    score_correction_present_min=0.5,
    score_correction_present_floor=0.7,
)

report = evaluate_note(note_text=note, framework="fca_suitability_v1", pass_policy=policy, llm_config=config)
```

Configure the LLM provider with `LLMConfig`:

```python
from assert_llm_tools import LLMConfig

# AWS Bedrock
config = LLMConfig(
    provider="bedrock",
    model_id="us.amazon.nova-pro-v1:0",
    region="us-east-1",
    # api_key / api_secret / aws_session_token for explicit credentials (optional — uses ~/.aws by default)
)

# OpenAI
config = LLMConfig(
    provider="openai",
    model_id="gpt-4o",
    api_key="your-openai-api-key",
)
```

Supported Bedrock model families:

| Model Family | Example Model IDs |
|---|---|
| Amazon Nova | us.amazon.nova-pro-v1:0, amazon.nova-lite-v1:0 |
| Anthropic Claude | anthropic.claude-3-sonnet-20240229-v1:0 |
| Meta Llama | meta.llama3-70b-instruct-v1:0 |
| Mistral AI | mistral.mistral-large-2402-v1:0 |
| Cohere Command | cohere.command-r-plus-v1:0 |
| AI21 Labs | ai21.jamba-1-5-large-v1:0 |
Requests to the LLM provider can be routed through a proxy:

```python
# Single proxy
config = LLMConfig(provider="bedrock", model_id="...", region="us-east-1",
                   proxy_url="http://proxy.example.com:8080")

# Protocol-specific
config = LLMConfig(provider="bedrock", model_id="...", region="us-east-1",
                   http_proxy="http://proxy.example.com:8080",
                   https_proxy="http://proxy.example.com:8443")

# Authenticated proxy
config = LLMConfig(provider="bedrock", model_id="...", region="us-east-1",
                   proxy_url="http://username:password@proxy.example.com:8080")
```

Standard `HTTP_PROXY` / `HTTPS_PROXY` environment variables are also respected.
Apply PII detection and masking before any text is sent to the LLM:
```python
# Summary evaluation
results = evaluate_summary(
    full_text=text, summary=summary, metrics=["coverage"],
    llm_config=config, mask_pii=True,
)

# Note evaluation
report = evaluate_note(note_text=note, framework="fca_suitability_v1",
                       llm_config=config, mask_pii=True)
```

Note: `mask_pii=False` is the default. For production use with real client data, set `mask_pii=True`. Output files (e.g. `--output report.json`) may contain verbatim evidence quotes — treat them accordingly.
License: MIT