# Compliance Agent (CRM Validator) - **Double-Check Logic**

This agent validates CRM messages against two regulations:
1. **Spam Regulation** (`불법스팸_방지_안내서_임베딩.json`)
2. **Cosmetics Regulation** (`화장품_지침_임베딩.json`)

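**Logic**:
- **Retrieval**: Query expansion (RAG). An LLM generates three legal search queries from the message; the top matches for each query, plus the original message, are retrieved from both databases and deduplicated.
- **Double-Check**: Validation runs twice, and each run regenerates its own search queries. If EITHER run fails, the final verdict is FAIL. This minimizes unsafe passes (non-compliant messages wrongly judged compliant).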
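
As a rough worked estimate of why a second pass reduces unsafe passes, suppose (an assumption for illustration, not something measured in this notebook) that a single run misses a real violation with probability $p$, wrongly flags a compliant message with probability $q$, and that the two runs err independently. Under the "either run fails" rule:

$$
P(\text{unsafe pass}) = p^2, \qquad P(\text{false fail}) = 1 - (1 - q)^2 = 2q - q^2
$$

For example, $p = 0.1$ gives an unsafe-pass rate of $0.01$, while $q = 0.05$ raises the false-fail rate to $0.0975$. Both runs share the same model, prompt, and `temperature=0`, so their errors are probably correlated and the real reduction is likely smaller than $p^2$ suggests.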

In [2]:
!pip install openai numpy scikit-learn



In [3]:
import os
import json
import getpass
import numpy as np
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity

# 1. Setup OpenAI
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    api_key = getpass.getpass("Enter your OpenAI API Key: ")

client = OpenAI(api_key=api_key)

def get_embedding(text, model="text-embedding-3-small"):
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

In [4]:
# 2. Load Vector Databases
spam_db_path = "불법스팸_방지_안내서_임베딩.json"
cosmetics_db_path = "화장품_지침_임베딩.json"

def load_db(path):
    if not os.path.exists(path):
        print(f"Warning: {path} not found.")
        return []
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)

spam_db = load_db(spam_db_path)
cosmetics_db = load_db(cosmetics_db_path)

print(f"Loaded Spam DB: {len(spam_db)} chunks")
print(f"Loaded Cosmetics DB: {len(cosmetics_db)} chunks")

Loaded Spam DB: 40 chunks
Loaded Cosmetics DB: 9 chunks
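
The later cells read `item['embedding']`, `item['metadata']['header']`, and `item['metadata']['content']` from every chunk, and `load_db` returns an empty list when a file is missing. A minimal sketch of a sanity check that could run right after loading is shown below; `validate_db` is a hypothetical helper, not part of the original notebook.

```python
def validate_db(db):
    """Return a list of human-readable problems found in a loaded vector DB."""
    if not db:
        # load_db() returns [] for a missing file, which would silently
        # leave the compliance check with no regulations to compare against.
        return ["database is empty"]
    problems = []
    dims = set()
    for i, item in enumerate(db):
        emb = item.get("embedding") if isinstance(item, dict) else None
        if not isinstance(emb, list) or not emb:
            problems.append(f"chunk {i}: missing or empty 'embedding'")
        else:
            dims.add(len(emb))
        meta = item.get("metadata") if isinstance(item, dict) else None
        if not isinstance(meta, dict):
            problems.append(f"chunk {i}: missing 'metadata'")
            continue
        for field in ("header", "content"):
            if not isinstance(meta.get(field), str):
                problems.append(f"chunk {i}: metadata['{field}'] is not a string")
    if len(dims) > 1:
        # Mixed dimensions would make the similarity computation fail later.
        problems.append(f"inconsistent embedding dimensions: {sorted(dims)}")
    return problems
```

In the notebook this could be called as `validate_db(spam_db)` and `validate_db(cosmetics_db)`, stopping before any validation run if either list of problems is non-empty.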


In [5]:
# 3. True RAG Retrieval Logic (Deep Retrieval with Query Expansion)
def retrieve_top_k(query_embedding, db, k=5):
    if not db:
        return []
    
    db_embeddings = [item['embedding'] for item in db]
    similarities = cosine_similarity([query_embedding], db_embeddings)[0]
    
    # Get top-k indices
    top_indices = similarities.argsort()[-k:][::-1]
    
    results = []
    for idx in top_indices:
        results.append({
            "score": similarities[idx],
            "metadata": db[idx]['metadata']
        })
    return results

def generate_legal_queries(crm_message):
    """
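    Ask the LLM for three legal search queries derived from the CRM message:
    SMS-specific rules, rules common to all advertising media, and
    product-specific restrictions. Returns a list of query strings.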
    """
    prompt = f"""
    Analyze the CRM message and generate 3 specific legal search queries.
    Goal: Retrieve rules that apply to SMS/LMS, but ALSO common rules for all advertising media (e.g., Article 50).
    
    CRM Message:
    {crm_message}
    
    Generate concise queries for:
    1. SMS-specific marking requirements (Opt-out, Sender ID).
    2. Common advertising prohibitions (False/Exaggerated claims, common to all media).
    3. Product-specific restrictions (e.g., Cosmetics medical claims).
    
    Output List only.
    """
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    
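    import re  # local import so this cell does not depend on another cell's imports

    # Strip list numbering or bullets and any surrounding quotes from each line.
    # Splitting on ". " would truncate queries that contain a period mid-sentence.
    lines = response.choices[0].message.content.strip().split("\n")
    queries = [re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", q).strip().strip("\"'") for q in lines]
    return [q for q in queries if q]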

def get_combined_context(message):
    # 1. Query Expansion
    search_queries = generate_legal_queries(message)
    print(f"Generated Search Queries: {search_queries}")
    
    all_spam_docs = []
    all_cosmetics_docs = []
    
    # 2. Retrieve for EACH query
    for q in search_queries:
        q_vec = get_embedding(q)
        all_spam_docs.extend(retrieve_top_k(q_vec, spam_db, k=3))
        all_cosmetics_docs.extend(retrieve_top_k(q_vec, cosmetics_db, k=3))
        
    # Also retrieve for original message
    original_vec = get_embedding(message)
    all_spam_docs.extend(retrieve_top_k(original_vec, spam_db, k=3))
    all_cosmetics_docs.extend(retrieve_top_k(original_vec, cosmetics_db, k=3))
    
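    # 3. Deduplicate: two chunks count as the same if they share a header
    #    and the first 30 characters of content; the first one seen is kept.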
    def deduplicate(docs):
        unique_docs = []
        seen_headers = set()
        for doc in docs:
            key = doc['metadata']['header'] + doc['metadata']['content'][:30]
            if key not in seen_headers:
                unique_docs.append(doc)
                seen_headers.add(key)
        return unique_docs

    final_spam_docs = deduplicate(all_spam_docs)
    final_cosmetics_docs = deduplicate(all_cosmetics_docs)
    
    context_text = f"-- [Regulation 1: Spam Prevention & IT Network Act (Total {len(final_spam_docs)})] --\n"
    for doc in final_spam_docs:
        context_text += f"Header: {doc['metadata']['header']}\nContent: {doc['metadata']['content']}\n\n"
        
    context_text += f"\n-- [Regulation 2: Cosmetics Guidelines (Total {len(final_cosmetics_docs)})] --\n"
    for doc in final_cosmetics_docs:
        context_text += f"Header: {doc['metadata']['header']}\nContent: {doc['metadata']['content']}\n\n"
        
    return context_text
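
The retrieval step above can be illustrated offline with toy vectors. The sketch below is an assumption-laden stand-in rather than the notebook's code: it replaces scikit-learn's `cosine_similarity` with a plain-Python `cosine` so it runs without dependencies, uses made-up 2-D "embeddings", and introduces a hypothetical `merge_hits` helper. Unlike `deduplicate`, which keeps the first hit seen, `merge_hits` keeps each chunk's best score and sorts the union so the most relevant rules come first in the context.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, db, k=3):
    """Return the k chunks most similar to query_vec, best first."""
    scored = [
        {"score": cosine(query_vec, item["embedding"]), "metadata": item["metadata"]}
        for item in db
    ]
    return sorted(scored, key=lambda h: h["score"], reverse=True)[:k]

def merge_hits(hit_lists):
    """Union several result lists, keeping each chunk's best score, best first."""
    best = {}
    for hits in hit_lists:
        for hit in hits:
            key = (hit["metadata"]["header"], hit["metadata"]["content"])
            if key not in best or hit["score"] > best[key]["score"]:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)

# Toy database: vectors and texts are invented purely for illustration.
toy_db = [
    {"embedding": [1.0, 0.0], "metadata": {"header": "A", "content": "opt-out rule"}},
    {"embedding": [0.0, 1.0], "metadata": {"header": "B", "content": "labeling rule"}},
    {"embedding": [1.0, 1.0], "metadata": {"header": "C", "content": "common rule"}},
]

hits_q1 = top_k([1.0, 0.1], toy_db, k=2)  # query close to chunk A
hits_q2 = top_k([0.1, 1.0], toy_db, k=2)  # query close to chunk B
merged = merge_hits([hits_q1, hits_q2])   # chunk C appears in both lists, kept once
print([(h["metadata"]["header"], round(h["score"], 3)) for h in merged])
```

Each query alone retrieves only two of the three chunks; the merged result covers all three, which is the point of expanding one message into several queries.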

In [6]:
# 4. Compliance Check Function (Double-Check Logic)
def _run_single_check(crm_message, run_id):
    """Internal function for a single pass"""
    print(f"  > Run {run_id}: Generating queries and validating...")
    
    # 1. Retrieve Context
    context = get_combined_context(crm_message)
    
    # 2. Construct Prompt
    system_prompt = """
    당신은 한국 기업의 엄격한 컴플라이언스(규제 준수) 담당자입니다.
    입력된 메시지는 **휴대폰 문자 메시지(SMS/LMS)**입니다.
    
    [규정 적용 원칙 - 중요]
    1. 매체 특수성: 문자 메시지 특유의 규칙은 최우선 적용하십시오.
        - 주의: 이메일 전용(제목란 등)이나 팩스 전용 규칙은 배제하십시오.
    2. 공통 규정 적용: 정보통신망법 제50조 등 "영리목적 광고성 정보 전송 시 공통 준수사항"은 매체와 무관하게 적용되므로 놓치지 마십시오.
       - 예: '전송자의 명칭 및 연락처 표시', '수신거부 비용 무료' 등은 공통사항입니다.
    
    [심사 Process]
    1. [Context Regulations]에서 SMS에 적용 가능한 조항과, 모든 매체에 적용되는 공통 조항을 식별하십시오.
    2. [CRM Message]가 해당 조항들을 문자 그대로 준수하는지 대조하십시오.
    
    [출력 양식]
    Case 1: 위반 사항 발견 (FAIL)
    - 판정: [실패]
    - 근거 규정: [Context 조항 명] (예: 정보통신망법 제50조 제4항)
    - 위반 설명: [구체적 내용]
    - 수정 제안 (Before -> After):
      1. [현재] -> [수정]
    
    Case 2: 문제 없음 (PASS)
    - 판정: [통과]
    - 심사 내용: [Context]의 공통 규정(명칭, 연락처, 무료수신거부) 및 SMS 특화 규정((광고)위치) 준수 확인됨.
    """
    
    user_prompt = f"""
    Context Regulations (Source of Truth):
    {context}
    
    CRM Message (SMS/LMS):
    {crm_message}
    
    Check for violations strictly against the Context Regulations above.
    """
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0
    )
    return response.choices[0].message.content

def check_compliance(crm_message):
    print("Analyzing message with Dual-Pass Logic (2 checks for safety)...\n")
    
    # Run 1
    result1 = _run_single_check(crm_message, 1)
    
    # Run 2
    result2 = _run_single_check(crm_message, 2)
    
    # Conservatism Rule: If ANY run fails, the final verdict is FAIL.
    if "[실패]" in result1:
        print("\n[Final Verdict]: FAIL (Detected in Run 1)")
        return result1
    elif "[실패]" in result2:
        print("\n[Final Verdict]: FAIL (Detected in Run 2 - Run 1 missed it)")
        return result2
    else:
        print("\n[Final Verdict]: PASS (Both runs confirmed safety)")
        return result1 # Doesn't matter which one, both are Pass
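
`check_compliance` treats any response without `[실패]` as a pass, so an empty or malformed model response would also be reported as PASS. A minimal sketch of a fail-closed variant of the same conservatism rule follows; `parse_verdict`, `combine_verdicts`, and the `REVIEW` outcome are hypothetical additions, and only the two markers come from the system prompt.

```python
FAIL_MARK = "[실패]"  # marker the system prompt requests when a violation is found
PASS_MARK = "[통과]"  # marker the system prompt requests when the message passes

def parse_verdict(result_text):
    """Map one model response to 'FAIL', 'PASS', or 'UNKNOWN'."""
    if FAIL_MARK in result_text:
        return "FAIL"
    if PASS_MARK in result_text:
        return "PASS"
    return "UNKNOWN"  # neither marker present: do not assume safety

def combine_verdicts(results):
    """FAIL if any run failed; PASS only if every run explicitly passed."""
    verdicts = [parse_verdict(r) for r in results]
    if "FAIL" in verdicts:
        return "FAIL"
    if verdicts and all(v == "PASS" for v in verdicts):
        return "PASS"
    return "REVIEW"  # at least one run gave no recognizable verdict

print(combine_verdicts(["- 판정: [통과]", "- 판정: [실패]"]))
```

With this rule, `combine_verdicts([result1, result2])` could replace the `if/elif/else` chain, and a `REVIEW` result could be routed to a human instead of being counted as safe.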

In [11]:
# 5. Run Test Cases
message = """
(광고) [삼성물산] 주말 특가 안내 - 삼성물산 : 02-123-4567
고객님, 이번 주말 전 품목 20% 할인 행사를 진행합니다.
무료수신거부 080-1234-5678
"""



print("--- Message ---")
print(check_compliance(message))


--- Message ---
Analyzing message with Dual-Pass Logic (2 checks for safety)...

  > Run 1: Generating queries and validating...
Generated Search Queries: ['SMS-specific marking requirements for opt-out and sender ID in advertising.', 'Common advertising prohibitions against false or exaggerated claims applicable to all media.', 'Product-specific advertising restrictions for cosmetics, including medical claims limitations.']
  > Run 2: Generating queries and validating...
Generated Search Queries: ['"SMS marketing legal requirements opt-out sender ID South Korea"', '"Common advertising prohibitions false exaggerated claims South Korea"', '"Product-specific advertising restrictions cosmetics medical claims South Korea"']

[Final Verdict]: PASS (Both runs confirmed safety)
- 판정: [통과]
- 심사 내용: [Context]의 공통 규정(명칭, 연락처, 무료수신거부) 및 SMS 특화 규정((광고)위치) 준수 확인됨.
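
The cell above exercises a single message. A small harness for running several labeled cases might look like the sketch below. `run_test_cases` and `stub_checker` are hypothetical names, and the stub's rule (pass only if the text starts with `(광고)`) exists only so the example runs without API calls; it is not a statement of what the regulations require.

```python
def run_test_cases(cases, checker):
    """Run checker on each (name, message, expected) case and record mismatches."""
    report = []
    for name, message, expected in cases:
        result = checker(message)
        # Same marker test that check_compliance uses for its verdict.
        actual = "FAIL" if "[실패]" in result else "PASS"
        report.append(
            {"name": name, "expected": expected, "actual": actual, "ok": actual == expected}
        )
    return report

def stub_checker(message):
    """Offline stand-in for check_compliance; its rule is made up for the demo."""
    if message.strip().startswith("(광고)"):
        return "- 판정: [통과]"
    return "- 판정: [실패]"

cases = [
    ("labeled", "(광고) [삼성물산] 주말 특가 안내\n무료수신거부 080-1234-5678", "PASS"),
    ("unlabeled", "주말 특가 안내", "FAIL"),
]

for row in run_test_cases(cases, stub_checker):
    status = "ok" if row["ok"] else "MISMATCH"
    print(row["name"], row["expected"], row["actual"], status)
```

In the notebook, passing `check_compliance` in place of `stub_checker` would run the real dual-pass check on every case, at the cost of several API calls per message.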
