# Sanctions Screening Evaluation

- **Purpose:** Evaluate sanctions screening accuracy and validate precision/recall targets
- **Author:** Devbrew LLC  
- **Last Updated:** November 17, 2025  
- **Status:** In progress  
- **License:** Apache 2.0

## Overview

This notebook implements the evaluation protocol for the sanctions screening module. It measures matching accuracy on a labeled test set and checks whether the system meets its production accuracy targets.

**Evaluation Metrics:**
- Precision@1: Percentage of queries where the top-ranked candidate is the correct match (target: ≥95%)
- Recall@top3: Percentage of queries where the ground-truth match appears among the top 3 candidates (target: ≥98%)
- False Positive Rate: Percentage of non-matching queries incorrectly flagged as matches
- Decision Accuracy: Percentage of queries whose predicted decision category matches the expected category

Together, these metrics check that the screening system identifies sanctioned entities while keeping false positives low, as required for production readiness.
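The metric definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's evaluation code: the `EvalCase` record, its field names, and the toy entity IDs are assumptions made for the example. It also assumes that Precision@1 and Recall@top3 are computed over queries that have a ground-truth match, and that the False Positive Rate is computed over queries that have none.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalCase:
    """One labeled query (hypothetical layout, for illustration only)."""
    ranked_ids: List[str]    # candidate IDs returned by screening, best first
    true_id: Optional[str]   # ground-truth entity ID, or None for a non-match query
    flagged: bool            # True if the system flagged the query as a match


def precision_at_1(cases: List[EvalCase]) -> float:
    """Share of true-match queries whose top candidate is the correct entity."""
    positives = [c for c in cases if c.true_id is not None]
    hits = sum(1 for c in positives if c.ranked_ids[:1] == [c.true_id])
    return hits / len(positives) if positives else 0.0


def recall_at_k(cases: List[EvalCase], k: int = 3) -> float:
    """Share of true-match queries whose correct entity is in the top k."""
    positives = [c for c in cases if c.true_id is not None]
    hits = sum(1 for c in positives if c.true_id in c.ranked_ids[:k])
    return hits / len(positives) if positives else 0.0


def false_positive_rate(cases: List[EvalCase]) -> float:
    """Share of non-match queries that were flagged as matches anyway."""
    negatives = [c for c in cases if c.true_id is None]
    flagged = sum(1 for c in negatives if c.flagged)
    return flagged / len(negatives) if negatives else 0.0


# Toy labeled set (made-up IDs, not real evaluation data)
cases = [
    EvalCase(["A", "B", "C"], "A", True),    # correct entity ranked first
    EvalCase(["B", "A", "C"], "A", True),    # correct entity in top 3, not first
    EvalCase(["X", "Y", "Z"], "Q", False),   # correct entity missed entirely
    EvalCase(["D"], "D", True),              # correct entity ranked first
    EvalCase([], None, False),               # non-match, correctly cleared
    EvalCase(["M"], None, True),             # non-match, wrongly flagged
]

print(f"Precision@1: {precision_at_1(cases):.2f}")          # Precision@1: 0.50
print(f"Recall@top3: {recall_at_k(cases, k=3):.2f}")        # Recall@top3: 0.75
print(f"False Positive Rate: {false_positive_rate(cases):.2f}")  # False Positive Rate: 0.50
```

Keeping the three denominators explicit makes it easier to see which queries each target (≥95%, ≥98%) is measured against.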

## Setup: Artifacts and Functions

The evaluation loads artifacts generated by the implementation pipeline:

- **Sanctions Index**: Canonicalized names and metadata (`sanctions_index.parquet`)
- **Blocking Indices**: Inverted indices for candidate retrieval (`blocking_indices.json`)
- **Metadata**: Version tracking and dataset statistics

Helper functions for text normalization, tokenization, and screening are loaded to enable independent evaluation runs without re-executing the full implementation pipeline.

### Environment Configuration

We configure the Python environment with standardized settings, import required libraries, and set a fixed random seed for reproducibility. This ensures consistent evaluation results across runs.

In [10]:
import sys
import warnings
from pathlib import Path
import json
import unicodedata
import re
from typing import Dict, Any, Optional, List, Tuple
import time
import random
from functools import lru_cache
from collections import OrderedDict

import pandas as pd
import numpy as np

import rapidfuzz as rf
from rapidfuzz import fuzz, process

# Configuration
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 100)
pd.set_option("display.float_format", '{:.2f}'.format)

# Reproducibility
RANDOM_STATE = 42
random.seed(RANDOM_STATE)
np.random.seed(RANDOM_STATE)

print("Environment configured successfully")
print(f" pandas: {pd.__version__}")
print(f" numpy: {np.__version__}")
print(f" rapidfuzz: {rf.__version__}")

Environment configured successfully
 pandas: 2.3.3
 numpy: 2.3.3
 rapidfuzz: 3.14.1


### Load Artifacts

The evaluation loads pre-computed artifacts from the implementation pipeline. The sanctions index contains 39,350 canonicalized name records with metadata. The blocking indices are inverted indices: each key lookup is an average-case O(1) dictionary access, so candidates are retrieved without scanning the full sanctions index.

In [6]:
# Path configuration
PROJECT_ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
MODELS_DIR = PROJECT_ROOT / "packages" / "models"
DATA_DIR = PROJECT_ROOT / "data_catalog" / "processed"


print("Loading artifacts...\n")

# Load sanctions index
sanctions_index_path = MODELS_DIR / "sanctions_index.parquet"
if not sanctions_index_path.exists():
    raise FileNotFoundError(f"Sanctions index not found: {sanctions_index_path}\n"
                          f"Please run notebooks/04_sanctions_screening.ipynb first to generate artifacts.")

sanctions_index = pd.read_parquet(sanctions_index_path)
print(f"Loaded sanctions index: {len(sanctions_index):,} records")

# Load blocking indices
blocking_indices_path = MODELS_DIR / "blocking_indices.json"
if not blocking_indices_path.exists():
    raise FileNotFoundError(f"Blocking indices not found: {blocking_indices_path}\n"
                          f"Please run notebooks/04_sanctions_screening.ipynb first to generate artifacts.")

# Read as UTF-8 explicitly so non-ASCII keys load the same way on every platform
with open(blocking_indices_path, 'r', encoding='utf-8') as f:
    blocking_indices = json.load(f)

# Shallow-copy each index into its own top-level dict
first_token_index = dict(blocking_indices['first_token'])
bucket_index = dict(blocking_indices['bucket'])
initials_index = dict(blocking_indices['initials'])

print("Loaded blocking indices:")
print(f" - First token index: {len(first_token_index):,} keys")
print(f" - Bucket index: {len(bucket_index):,} keys")
print(f" - Initials index: {len(initials_index):,} keys")

# Load metadata (optional, for version tracking)
metadata_path = MODELS_DIR / "sanctions_index_metadata.json"
if metadata_path.exists():
    with open(metadata_path, 'r', encoding='utf-8') as f:
        sanctions_index_metadata = json.load(f)
    print(f"\nLoaded metadata: version {sanctions_index_metadata.get('created_at', 'unknown')}")
else:
    sanctions_index_metadata = {}
    print("[Warning] Metadata not found (optional)")

print("\nAll artifacts loaded successfully")

Loading artifacts...

Loaded sanctions index: 39,350 records
Loaded blocking indices:
 - First token index: 15,597 keys
 - Bucket index: 4 keys
 - Initials index: 15,986 keys

Loaded metadata: version 2025-11-17T06:00:56.218723

All artifacts loaded successfully
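To illustrate how inverted blocking indices support candidate retrieval, the sketch below builds two toy indices and looks up a misspelled query. It is an illustration under stated assumptions: the key formats (first token, concatenated initials), the `build_indices` and `get_candidates` helpers, and the sample names are hypothetical and may differ from what `blocking_indices.json` actually stores. The bucket index is left out because its key scheme is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_indices(names: List[str]) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Build toy first-token and initials indices mapping key -> row IDs."""
    first_token_index: Dict[str, List[int]] = defaultdict(list)
    initials_index: Dict[str, List[int]] = defaultdict(list)
    for row_id, name in enumerate(names):
        tokens = name.lower().split()
        if not tokens:
            continue  # skip empty names
        first_token_index[tokens[0]].append(row_id)
        initials_index["".join(t[0] for t in tokens)].append(row_id)
    return dict(first_token_index), dict(initials_index)


def get_candidates(
    query: str,
    first_token_index: Dict[str, List[int]],
    initials_index: Dict[str, List[int]],
) -> Set[int]:
    """Union the row IDs found under the query's first token and its initials."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Each .get() is a single dictionary lookup; no scan over all names
    candidates = set(first_token_index.get(tokens[0], []))
    candidates.update(initials_index.get("".join(t[0] for t in tokens), []))
    return candidates


# Made-up names, not taken from any sanctions list
names = ["ivan petrov", "ivan sokolov", "maria lopez", "igor pavlov"]
toy_first_token_index, toy_initials_index = build_indices(names)

# The misspelled surname still reaches row 0 through both blocking keys
candidates = get_candidates("Ivan Petrof", toy_first_token_index, toy_initials_index)
print(sorted(candidates))  # [0, 1, 3]
```

Blocking trades a little recall for speed: only the rows sharing a key with the query go on to the more expensive fuzzy scoring step, so using several complementary keys reduces the chance that a true match is never retrieved.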


### Helper Functions

Text normalization and tokenization functions are imported from the shared `packages.compliance.sanctions` module. This module provides standardized functions used by both `04_sanctions_screening.ipynb` and this evaluation notebook, ensuring consistency across the screening pipeline.

The shared functions include:
- `normalize_text()`: Text normalization for robust fuzzy matching
- `tokenize()`: Tokenization with stopword filtering

In [19]:
from packages.compliance.sanctions import (
    normalize_text,
    tokenize
)

# Verify imports work
print("Helper functions imported successfully")
print(f"  - normalize_text: {normalize_text.__name__}")
print(f"  - tokenize: {tokenize.__name__}")


Helper functions imported successfully
  - normalize_text: normalize_text
  - tokenize: tokenize
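The shared module's source is not shown in this notebook, so the following is only a rough sketch of what normalization and stopword-filtered tokenization of this kind often look like. The names `normalize_text_sketch`, `tokenize_sketch`, and `STOPWORDS_SKETCH`, the specific normalization steps, and the stopword list are all assumptions for illustration; the real `normalize_text()` and `tokenize()` may behave differently.

```python
import re
import unicodedata
from typing import FrozenSet, List

# Hypothetical stopword list; the real module's list is not shown in this notebook
STOPWORDS_SKETCH: FrozenSet[str] = frozenset({"the", "of", "and", "ltd", "llc", "inc"})


def normalize_text_sketch(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # NFKD splits accented characters into a base character plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = without_marks.lower()
    # Replace everything except ASCII letters, digits, and whitespace with a space.
    # Note: this simplification also discards non-Latin scripts.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", cleaned).strip()


def tokenize_sketch(text: str, stopwords: FrozenSet[str] = STOPWORDS_SKETCH) -> List[str]:
    """Split normalized text on whitespace and drop stopwords."""
    return [tok for tok in normalize_text_sketch(text).split() if tok not in stopwords]


sample = "José  María O'Brien, Ltd."
print(normalize_text_sketch(sample))  # jose maria o brien ltd
print(tokenize_sketch(sample))        # ['jose', 'maria', 'o', 'brien']
```

Sharing one implementation of these steps between the screening and evaluation notebooks matters because any difference in normalization would change both the blocking keys and the fuzzy scores being evaluated.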
