In [6]:
%pip install sentence-transformers scikit-learn

Collecting torch>=1.6.0 (from sentence-transformers)
  Obtaining dependency information for torch>=1.6.0 from https://files.pythonhosted.org/packages/1e/86/477ec85bf1f122931f00a2f3889ed9322c091497415a563291ffc119dacc/torch-2.1.2-cp311-none-macosx_11_0_arm64.whl.metadata
  Using cached torch-2.1.2-cp311-none-macosx_11_0_arm64.whl.metadata (25 kB)
Using cached torch-2.1.2-cp311-none-macosx_11_0_arm64.whl (59.6 MB)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 2.4.1
    Uninstalling torch-2.4.1:
      Successfully uninstalled torch-2.4.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.4.1 requires torch==2.4.1, but you have torch 2.1.2 which is incompatible.
Successfully installed torch-2.1.2
Note: you may need to restart the kernel to use updated packages.


In [51]:
import google.generativeai as genai
import os
import fitz
import difflib
import re
import json
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from tqdm import tqdm
import time

In [46]:
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")


In [47]:
api_key = os.getenv("GOOGLE_API_KEY")

genai.configure(api_key=api_key)

# Initialize the Gemini model
model = genai.GenerativeModel("gemini-2.0-flash")

In [48]:
# Ask a question
response = model.generate_content("Explain quantum computing in simple terms.")

# Print the response
print(response.text)


Okay, imagine regular computers are like light switches: they can be either ON (representing 1) or OFF (representing 0).

Quantum computers are more like dimmer switches. They can be ON, OFF, or **anywhere in between**, or even a mix of both *at the same time*.  This "in-between" state is called **superposition**.

Here's the breakdown:

* **Classical Bits (Regular Computers):**  Think of a light switch. It's either ON (1) or OFF (0). That's a bit, the basic unit of information.

* **Quantum Bits (Qubits - Quantum Computers):** Think of a dimmer switch. It can be ON (1), OFF (0), or any brightness in between. This is a qubit. Because it can be in multiple states *simultaneously*, it's much more powerful.

**Why is this powerful?**

* **Superposition:**  A qubit can represent 0, 1, or a combination of both *at the same time*.  This means a quantum computer can explore many possibilities simultaneously.  Imagine trying to find the best route through a maze. A regular computer would try e

In [8]:
# Load the MiniLM embedding model from Hugging Face.
# Use a separate name so the Gemini `model` used by get_llm_verification is not overwritten.
embedder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

In [9]:
# 1. Example input: LLM-generated "contradicted" law statement
llm_generated_law = "According to Section 18 of the Federal Privacy Act, data must be deleted after 6 months."

# 2. Example real laws in your database (you can load these from a file or dataset)
real_laws = [
    "Under Section 18 of the Federal Privacy Act, users must be notified before data deletion.",
    "According to the Fair Credit Reporting Act, individuals have a right to dispute errors.",
    "The California Consumer Privacy Act requires companies to delete user data upon request.",
    "Section 11 of the Data Privacy Act mandates data retention for at least one year."
]

In [11]:
# 3. Encode both the LLM output and the real laws
llm_embedding = embedder.encode([llm_generated_law])
real_law_embeddings = embedder.encode(real_laws)

# 4. Compute cosine similarities
similarities = cosine_similarity(llm_embedding, real_law_embeddings)[0]

# 5. Get top match
best_match_idx = np.argmax(similarities)
best_match = real_laws[best_match_idx]
confidence = similarities[best_match_idx]

# 6. Display result
print(" LLM Generated Law:")
print(llm_generated_law)
print("\n Closest Real Law Match:")
print(best_match)
print(f"\n Similarity Score: {confidence:.4f}")

# Optional: Threshold-based judgment
if confidence > 0.8:
    print("\n Law is likely real or referenced correctly.")
else:
    print("\n Law may be hallucinated or does not closely match any known entry.")

 LLM Generated Law:
According to Section 18 of the Federal Privacy Act, data must be deleted after 6 months.

 Closest Real Law Match:
Under Section 18 of the Federal Privacy Act, users must be notified before data deletion.

 Similarity Score: 0.7701

 Law may be hallucinated or does not closely match any known entry.
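The cell above keeps only the single best match, so a near-tie between two statutes is invisible. A minimal stdlib sketch of the same cosine-similarity ranking that returns every candidate in descending order, together with the 0.8 threshold check; `rank_matches` is a hypothetical helper, and the 3-dimensional vectors are toy stand-ins for real embeddings (all-MiniLM-L6-v2 produces 384-dimensional vectors):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_matches(query_vec, candidate_vecs, labels, threshold=0.8):
    """Score every candidate against the query, sort by descending similarity,
    and report whether the best score clears the threshold."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in zip(labels, candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    supported = bool(scored) and scored[0][1] > threshold
    return scored, supported

# Toy 3-d vectors standing in for real sentence embeddings (hypothetical data)
query = [1.0, 0.0, 0.0]
candidates = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
labels = ["law_a", "law_b", "law_c"]

ranked, supported = rank_matches(query, candidates, labels)
for label, score in ranked:
    print(f"{label}: {score:.4f}")  # law_c: 1.0000, law_a: 0.7071, law_b: 0.0000
print("supported:", supported)  # supported: True
```

Printing the full ranking makes it easier to spot cases like the one above, where the closest entry shares the section number but states a different obligation.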


# Law Validation

In [126]:
INPUT_FOLDER = "scraped_laws_v7/ommision_legal/"
OUTPUT_FOLDER = "scraped_laws_v7/ommision_legal/law_validation2/"
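Once the verification loop below has written its results to `OUTPUT_FOLDER`, a tally of the decisions across all files gives a quick sanity check (for example, whether the model labels almost everything YES). A minimal sketch assuming the same `data[0]["perturbation"]` layout that `process_file` reads; `tally_decisions` is a hypothetical helper:

```python
import json
from collections import Counter
from pathlib import Path

def tally_decisions(folder, key="contradiction_exists"):
    """Count the values stored under `key` across all perturbations in the
    JSON files of `folder` (records without the key are counted as None)."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        root = data[0]  # same layout assumption as process_file
        for perturbation in root.get("perturbation", []):
            counts[perturbation.get(key)] += 1
    return counts

# Usage: print(tally_decisions(OUTPUT_FOLDER))
```

Passing `key="contradiction_score"` summarizes runs made with the earlier LOW/MEDIUM/HIGH prompt instead.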

### Changes
- The confidence score should depend on the justification.
- Use pre-trained BERT models to assess the tone and key words of the LLM justification and derive a confidence score from them (NOT DOING NOW).
- Use a contradiction_score instead, based on how strongly the law was contradicted.
- The contradiction_score should be based on the scraped snippets, the perturbation explanation, the law explanation, the original text, and the changed text.
- DISCUSS: confidence score vs. contradiction score
    - Confidence score: how sure the LLM is that its own output is correct or appropriate (LLM opinion)
    - Contradiction score: how strongly the modified legal text contradicts the real law
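The list above names the inputs the contradiction score should depend on. Before spending an API call, it may help to check that a record actually contains them. A minimal sketch; the key names are taken from the `perturbation.get(...)` calls in the prompt templates below, and `missing_fields` is a hypothetical helper:

```python
# Keys the contradiction assessment relies on (names from the prompt templates)
REQUIRED_FIELDS = (
    "original_text",
    "changed_text",
    "explanation",
    "law_explanation",
    "scraped_snippet_1",
    "scraped_snippet_2",
)

def missing_fields(perturbation):
    """List required fields that are absent, None, or blank in a perturbation record."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(perturbation.get(name) or "").strip()
    ]
```

Records with missing snippets could then be logged or skipped rather than scored on partial evidence.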

# Next Steps
- Keep binary verification
- Keep the links to the 3 scraped snippets

- Scrape the most up-to-date version of each law

## Human verification (TODO)
- Experts (someone with a legal background)
- Law students
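For the planned human verification, one common way to measure how well the LLM's YES/NO decisions match expert or law-student labels is raw percent agreement together with Cohen's kappa, which discounts the agreement expected by chance. A minimal stdlib sketch; `agreement_stats` is a hypothetical helper and the example labels are made up for illustration:

```python
def agreement_stats(llm_labels, human_labels):
    """Return (percent agreement, Cohen's kappa) for two equal-length label lists."""
    if len(llm_labels) != len(human_labels) or not llm_labels:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(llm_labels)
    observed = sum(a == b for a, b in zip(llm_labels, human_labels)) / n
    # Chance agreement: product of each rater's marginal frequency per category
    categories = set(llm_labels) | set(human_labels)
    expected = sum(
        (llm_labels.count(c) / n) * (human_labels.count(c) / n) for c in categories
    )
    # Kappa is undefined when chance agreement is 1 (both raters use one label)
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical labels for four perturbations
obs, kappa = agreement_stats(["YES", "YES", "NO", "NO"], ["YES", "NO", "NO", "NO"])
print(obs, kappa)  # 0.75 0.5
```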

In [117]:
"""
    You are a legal verification expert. Your task is to determine how strongly a legal contradiction was created when a contract was modified. This contradiction must be evaluated based on legal standards, using the information provided.

    You are given:
    - The **type** of perturbation introduced
    - The **original contract text**
    - The **changed (perturbed) version** of that text
    - A short **perturbation explanation**, describing how the legal obligation was altered
    - A **law explanation**, generated by an LLM, describing why it believes this change contradicts a specific law
    - **Two scraped law snippets**, pulled from official legal sources (e.g., .gov/.org sites)
    - The definitions for each type of Perturbation. 

    ---

    ### Perturbation Type:
    {perturbation.get("type", "")}

    ### Original Text:
    {perturbation.get("original_text", "")}

    ### Changed Text:
    {perturbation.get("changed_text", "")}

    ### Perturbation Explanation:
    {perturbation.get("explanation", "")}

    ### Law Explanation (Generated by LLM):
    {perturbation.get("law_explanation", "")}

    ### Scraped Law Snippet 1:
    {perturbation.get("scraped_snippet_1", "")}

    ### Scraped Law Snippet 2:
    {perturbation.get("scraped_snippet_2", "")}
    
    ### Perturbation Definitions:
    1. Ambiguity Legal Contradiction: Ambiguities occur when a legal statement is vague, leading to multiple interpretations. A **legal contradiction** under this category happens when an obligation is introduced ambiguously, making it difficult to enforce under state or national law. This can result in non-compliance with regulatory requirements, leaving legal obligations open to dispute.
    2. Omission Legal Contradiction: Omissions occur when a contract **removes essential information**, creating legal loopholes. A **legal contradiction** in this category happens when a contract omits **a legally mandated consumer protection**, making it non-compliant.
    3. Misaligned Terminology Legal Contradiction: Misaligned terminology occurs when a contract **uses terms that do not match their statutory meaning**, leading to non-compliance. A **legal contradiction** in this category arises when contractual language deviates from legally recognized definitions, making the contract unenforceable.
    4. Structural Flaws Legal Contradiction: Structural flaws occur when the **organization of a contract affects its clarity or enforceability**. A **legal contradiction** in this category arises when a required legal disclosure is **placed in an irrelevant or misleading section**, making it difficult for parties to locate or interpret.
    5. Inconsistencies Legal Contradiction: Inconsistencies arise when **time-sensitive obligations** in a contract do not align with legal requirements. A **legal contradiction** in this category happens when a contract sets **a deadline or requirement that violates federal or state law**, making the contractual terms unenforceable or illegal.
    
    ---

    ### Your Task (Strict Instructions):

    1. Analyze all the provided inputs thoroughly, along with the definitions to understand what perturbation type was used.
    2. Focus your contradiction assessment on the following 5 core elements:
       - Scraped Law Snippet 1
       - Scraped Law Snippet 2
       - Perturbation Explanation
       - Law Explanation
       - The comparison between the Original and Changed Text

    3. Consider **both scraped law snippets**, even if one is short or incomplete; if one is missing, rely on the other. If both are valid, use both to strengthen your understanding of the law’s scope and language.

    4. Assign a **contradiction_score** of LOW (little to no contradiction), MEDIUM (indirect contradiction), or HIGH (very strong, direct contradiction) based on how much the changed contract text violates or weakens the obligation stated in the actual law.
        

    5. Then provide a **justification**, explaining clearly:
       - How the law was contradicted (or not)
       - Which parts of the scraped snippets support your reasoning
       - Why the contradiction score is appropriate, without simply repeating the score

     Do not infer legal meaning beyond what’s in the snippets or explanations. Only use what's provided.

     The contradiction_score and justification must be **independently valid** — don’t base one on the other.

    ---

    ### Output Format (Strictly Follow This):

    {{
      "contradiction_score": "Assign LOW, MEDIUM, HIGH",
      "justification": "A legally grounded explanation of how (and if) the modified text contradicts the law, referencing the scraped snippets where helpful."
    }}

    Return ONLY this JSON object. No markdown, no notes, no commentary — just valid JSON with double quotes.
    """

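This draft asks for one of three labels, but the model may vary the casing or echo the placeholder text ("Assign LOW, MEDIUM, HIGH") verbatim. A minimal sketch of a validator for the draft's `contradiction_score` field; `parse_category` is a hypothetical helper that returns `None` for anything outside the three labels:

```python
VALID_CATEGORIES = {"LOW", "MEDIUM", "HIGH"}

def parse_category(result):
    """Return the normalized LOW/MEDIUM/HIGH label from a parsed LLM reply, or None."""
    value = result.get("contradiction_score")
    if not isinstance(value, str):
        return None
    value = value.strip().upper()
    return value if value in VALID_CATEGORIES else None
```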

In [118]:
"""
    You are a legal verification expert. Your task is to determine how strongly a legal contradiction was created when a contract was modified. This contradiction must be evaluated based on legal standards, using the information provided.

    You are given:
    - The **type** of perturbation introduced
    - The **original contract text**
    - The **changed (perturbed) version** of that text
    - A short **perturbation explanation**, describing how the legal obligation was altered
    - A **law explanation**, generated by an LLM, describing why it believes this change contradicts a specific law
    - **Two scraped law snippets**, pulled from official legal sources (e.g., .gov/.org sites)
    - The definitions for each type of Perturbation. 

    ---

    ### Perturbation Type:
    {perturbation.get("type", "")}

    ### Original Text:
    {perturbation.get("original_text", "")}

    ### Changed Text:
    {perturbation.get("changed_text", "")}

    ### Perturbation Explanation:
    {perturbation.get("explanation", "")}

    ### Law Explanation (Generated by LLM):
    {perturbation.get("law_explanation", "")}

    ### Scraped Law Snippet 1:
    {perturbation.get("scraped_snippet_1", "")}

    ### Scraped Law Snippet 2:
    {perturbation.get("scraped_snippet_2", "")}
    
    ### Perturbation Definitions:
    1. Ambiguity Legal Contradiction: Ambiguities occur when a legal statement is vague, leading to multiple interpretations. A **legal contradiction** under this category happens when an obligation is introduced ambiguously, making it difficult to enforce under state or national law. This can result in non-compliance with regulatory requirements, leaving legal obligations open to dispute.
    2. Omission Legal Contradiction: Omissions occur when a contract **removes essential information**, creating legal loopholes. A **legal contradiction** in this category happens when a contract omits **a legally mandated consumer protection**, making it non-compliant.
    3. Misaligned Terminology Legal Contradiction: Misaligned terminology occurs when a contract **uses terms that do not match their statutory meaning**, leading to non-compliance. A **legal contradiction** in this category arises when contractual language deviates from legally recognized definitions, making the contract unenforceable.
    4. Structural Flaws Legal Contradiction: Structural flaws occur when the **organization of a contract affects its clarity or enforceability**. A **legal contradiction** in this category arises when a required legal disclosure is **placed in an irrelevant or misleading section**, making it difficult for parties to locate or interpret.
    5. Inconsistencies Legal Contradiction: Inconsistencies arise when **time-sensitive obligations** in a contract do not align with legal requirements. A **legal contradiction** in this category happens when a contract sets **a deadline or requirement that violates federal or state law**, making the contractual terms unenforceable or illegal.
    
    ---

    ### Your Task (Strict Instructions):

    1. Analyze all the provided inputs thoroughly, along with the definitions to understand what perturbation type was used.
    2. Focus your contradiction assessment on the following 5 core elements:
       - Scraped Law Snippet 1
       - Scraped Law Snippet 2
       - Perturbation Explanation
       - Law Explanation
       - The comparison between the Original and Changed Text

    3. Consider **both scraped law snippets**, even if one is short or incomplete; if one is missing, rely on the other. If both are valid, use both to strengthen your understanding of the law’s scope and language.

    4. Assign a **contradiction_score** between **0.0** (no contradiction) and **1.0** (very strong, direct contradiction) based on how much the changed contract text violates or weakens the obligation stated in the actual law.

    5. Then provide a **justification**, explaining clearly:
       - How the law was contradicted (or not)
       - Which parts of the scraped snippets support your reasoning
       - Why the contradiction score is appropriate, without simply repeating the score

     Do not infer legal meaning beyond what’s in the snippets or explanations. Only use what's provided.

     The contradiction_score and justification must be **independently valid** — don’t base one on the other.

    ---

    ### Output Format (Strictly Follow This):

    {{
      "contradiction_score": float between 0.0 and 1.0,
      "justification": "A legally grounded explanation of how (and if) the modified text contradicts the law, referencing the scraped snippets where helpful."
    }}

    Return ONLY this JSON object. No markdown, no notes, no commentary — just valid JSON with double quotes.
    """

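This draft asks for a float, but the model may return it as a string, out of range, or not at all. A minimal sketch of validating the draft's `contradiction_score` before storing it; `parse_score` is a hypothetical helper that returns `None` for anything it cannot accept:

```python
import math

def parse_score(result):
    """Return contradiction_score as a float in [0.0, 1.0], or None if invalid."""
    try:
        score = float(result.get("contradiction_score"))
    except (TypeError, ValueError):  # None, or a non-numeric string
        return None
    if math.isnan(score) or not 0.0 <= score <= 1.0:
        return None
    return score
```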

# TODO
- add more examples
- Binary Classification (Yes, No)

# Law Validation

In [127]:
def generate_verification_prompt(perturbation):
    return f"""
    You are a legal verification expert. Your task is to determine whether a significant legal contradiction was created when a contract was modified. This contradiction must be evaluated based on legal standards, using the information provided.

    You are given:
    - The **type** of perturbation introduced
    - The **original contract text**
    - The **changed (perturbed) version** of that text
    - A short **perturbation explanation**, describing how the legal obligation was altered
    - A **law explanation**, generated by an LLM, describing why it believes this change contradicts a specific law
    - **Two scraped law snippets**, pulled from official legal sources (e.g., .gov/.org sites)
    - The definitions for each type of Perturbation. 

    ---

    ### Perturbation Type:
    {perturbation.get("type", "")}

    ### Original Text:
    {perturbation.get("original_text", "")}

    ### Changed Text:
    {perturbation.get("changed_text", "")}

    ### Perturbation Explanation:
    {perturbation.get("explanation", "")}

    ### Law Explanation (Generated by LLM):
    {perturbation.get("law_explanation", "")}

    ### Scraped Law Snippet 1:
    {perturbation.get("scraped_snippet_1", "")}

    ### Scraped Law Snippet 2:
    {perturbation.get("scraped_snippet_2", "")}
    
    ### Perturbation Definitions:
    1. Ambiguity Legal Contradiction: Ambiguities occur when a legal statement is vague, leading to multiple interpretations. A **legal contradiction** under this category happens when an obligation is introduced ambiguously, making it difficult to enforce under state or national law. This can result in non-compliance with regulatory requirements, leaving legal obligations open to dispute.
    2. Omission Legal Contradiction: Omissions occur when a contract **removes essential information**, creating legal loopholes. A **legal contradiction** in this category happens when a contract omits **a legally mandated consumer protection**, making it non-compliant.
    3. Misaligned Terminology Legal Contradiction: Misaligned terminology occurs when a contract **uses terms that do not match their statutory meaning**, leading to non-compliance. A **legal contradiction** in this category arises when contractual language deviates from legally recognized definitions, making the contract unenforceable.
    4. Structural Flaws Legal Contradiction: Structural flaws occur when the **organization of a contract affects its clarity or enforceability**. A **legal contradiction** in this category arises when a required legal disclosure is **placed in an irrelevant or misleading section**, making it difficult for parties to locate or interpret.
    5. Inconsistencies Legal Contradiction: Inconsistencies arise when **time-sensitive obligations** in a contract do not align with legal requirements. A **legal contradiction** in this category happens when a contract sets **a deadline or requirement that violates federal or state law**, making the contractual terms unenforceable or illegal.
    
    ---

    ### Your Task (Strict Instructions):

    1. Analyze all the provided inputs thoroughly, along with the definitions to understand what perturbation type was used.
    2. Focus your contradiction assessment on the following 5 core elements:
       - Scraped Law Snippet 1
       - Scraped Law Snippet 2
       - Perturbation Explanation
       - Law Explanation
       - The comparison between the Original and Changed Text

    3. Consider **both scraped law snippets**, even if one is short or incomplete; if one is missing, rely on the other. If both are valid, use both to strengthen your understanding of the law’s scope and language.

    4. ### Contradiction Decision (Choose ONE: "NO" or "YES")

        You must decide whether a contradiction exists, based on how significantly the modified contract text violates or undermines the legal obligation stated in the law.

        This decision should consider:
        - The **importance and clarity** of the legal requirement in the scraped law
        - How much the **modified text weakens, removes, or reverses** that legal requirement
        - Whether the contradiction is **explicit and impactful**, or **subtle and arguable**

        #### NO (Low or No Contradiction)

        Assign `NO` when:
        - The change introduces **minor weakening**, **softening**, or **interpretive ambiguity**
        - The modified text still **preserves the legal intent**, even if it's more flexible or vague
        - The contradiction is **context-dependent**, **non-obvious**, or requires legal interpretation to identify
        - The contract could **reasonably be interpreted** as still being compliant

        **Examples**:
        - Adjustments that make language more discretionary or subjective
        - Modifications that reduce precision, but not legal force
        - Contradictions that rely on tone or nuance

        #### YES (High Contradiction)

        Assign `YES` when:
        - The modification results in a **clear non-compliance** with the law
        - The changed text **removes or contradicts core legal obligations**
        - The contradiction is **direct, enforceable, and serious** from a legal or regulatory perspective
        - The modified clause would likely **fail legal scrutiny** or **nullify required protections or standards**

        **Examples**:
        - Contract change eliminates required action or accountability  
        - Legal mandates are replaced by optional or conflicting obligations  
        - Any change that would trigger a **legal violation** or breach of compliance
        

    5. Then provide a **justification**, explaining clearly:
       - How the law was contradicted (or not)
       - Which parts of the scraped snippets support your reasoning
       - Why the YES/NO decision is appropriate, without simply restating it

     Do not infer legal meaning beyond what’s in the snippets or explanations. Only use what's provided.

     The contradiction_exists decision and the justification must each be **independently valid**; do not derive one from the other.

    ---

    ### Output Format (Strictly Follow This):

    {{
      "contradiction_exists": "Assign NO or YES",
      "justification": "A legally grounded explanation of how (and if) the modified text contradicts the law, referencing the scraped snippets where helpful."
    }}

    Return ONLY this JSON object. No markdown, no notes, no commentary — just valid JSON with double quotes.
    """
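`perturbation.get(..., "")` leaves a prompt section silently empty when a field is missing, so the model cannot tell "no snippet was scraped" apart from a formatting problem. A minimal sketch of filling an explicit placeholder before building the prompt; `with_placeholders` and the `[NOT PROVIDED]` marker are hypothetical choices, and the field names are the ones the prompt reads:

```python
PROMPT_FIELDS = (
    "type", "original_text", "changed_text", "explanation",
    "law_explanation", "scraped_snippet_1", "scraped_snippet_2",
)

def with_placeholders(perturbation, placeholder="[NOT PROVIDED]"):
    """Return a copy of the record with missing or blank prompt fields replaced."""
    filled = dict(perturbation)  # shallow copy; the original record is untouched
    for name in PROMPT_FIELDS:
        value = perturbation.get(name)
        if value is None or not str(value).strip():
            filled[name] = placeholder
    return filled

# Usage (assumption): prompt = generate_verification_prompt(with_placeholders(perturbation))
```

Copying rather than mutating keeps the saved JSON free of placeholder text.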



# In-text Validation

In [None]:
def generate_intext_contradiction_prompt(perturbation):
    return f"""
You are a contract analyst. Your task is to determine whether a strong **in-text contradiction** was introduced by a modification to the document. An in-text contradiction occurs when one part of a contract directly or implicitly conflicts with another part of the same document, leading to uncertainty, inconsistency, or unenforceable terms.

You are given:
- The **type** of perturbation introduced
- The **original text** that was modified
- The **changed version** of that text
- A short **explanation** of how the change creates contradiction
- A **contradicted section** from elsewhere in the document where the contradiction is triggered
- The definitions of each perturbation type

---

### Perturbation Type:
{perturbation.get("type", "")}

### Original Text:
{perturbation.get("original_text", "")}

### Changed Text:
{perturbation.get("changed_text", "")}

### Perturbation Explanation:
{perturbation.get("explanation", "")}

### Contradicted Text:
{perturbation.get("contradicted_text", "")}

### Perturbation Definitions:
1. Ambiguities – In Text Contradiction: Ambiguities occur when key terms or obligations are defined inconsistently in different parts of a contract, making enforcement difficult or interpretation unclear.
2. Omissions – In Text Contradiction: Omissions occur when essential language is removed or altered in one section, creating a contradiction with obligations or constraints stated in other sections.

---

### Your Task (Strict Instructions):

1. **Compare the changed text** with the **contradicted text**.
2. Determine if the change creates a **clear conflict, inconsistency, or contradiction** within the document. This must be based solely on the provided excerpts.
3. Only assign `"contradiction_exists": "YES"` if the change:
   - Alters or narrows a definition, obligation, or entitlement that **directly breaks compatibility** with the contradicted section
   - Makes enforcement unclear or creates a logical inconsistency
4. Assign `"NO"` if:
   - The changed clause still aligns with the contradicted section
   - Any difference is minor, interpretative, or does not meaningfully impact understanding or enforcement

---

### Contradiction Classification Guidelines:

#### YES (Strong In-Text Contradiction)
Assign `"YES"` if the modified text:
- Removes key terms or items that are later required or referenced
- Changes timing, eligibility, or definitions in a way that breaks consistency
- Introduces ambiguity where the contradicted clause expects certainty

**Examples**:
- Section 2 defines "PRODUCT" excluding 'bags'; Section 5 requires the use of bags → **YES**
- Section 3: "Benefits start on Day 1" → Changed to: "May be eligible after probation"; Section 6: "Enroll in benefits in Week 1" → **YES**

#### NO (No Meaningful Contradiction)
Assign `"NO"` if:
- The change is additive or softens the language but does not conflict
- Both clauses can be reasonably interpreted as compatible
- The contradiction is only apparent with legal inference, not internal inconsistency

**Examples**:
- Changed: "Data must be stored securely" → "Data must be stored securely and backed up" — does not contradict another section → **NO**
- Changed: "Contractors comply with local codes" → "Contractors comply with local and state codes"; another section requires compliance with local codes → **NO**

---

### Output Format (Strictly Follow This):

{{
  "contradiction_exists": "Assign NO or YES",
  "justification": "An explanation of how (and if) the changed text conflicts with the contradicted section, quoting the relevant phrases from both."
}}

Return ONLY this JSON object. No markdown, no notes, no commentary; just valid JSON with double quotes.
"""
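Both the legal and the in-text prompts return `contradiction_exists` as a string. A minimal sketch of normalizing that field before saving it, so stray casing, whitespace, or an echoed placeholder ("Assign NO or YES") do not end up in the output files; `parse_decision` is a hypothetical helper:

```python
def parse_decision(result):
    """Normalize contradiction_exists to 'YES' or 'NO'; return None otherwise."""
    value = result.get("contradiction_exists")
    if isinstance(value, bool):  # tolerate a JSON true/false answer
        return "YES" if value else "NO"
    if not isinstance(value, str):
        return None
    value = value.strip().upper()
    return value if value in {"YES", "NO"} else None
```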

In [121]:
def get_llm_verification(prompt):
    # === Send prompt to LLM (e.g., Gemini) ===
    try:
        response = model.generate_content(prompt)
        content = response.text.strip()
        print("This is the response from LLM:", content)
        clean_json_text = re.sub(r"```json|```", "", content).strip()
        result = json.loads(clean_json_text)
        return result
    except Exception as e:
        print(f"[!] Error during Gemini call: {e}")
        return {"contradiction_exists": None, "justification": "Gemini call failed"}
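`get_llm_verification` strips code fences with a regex and then calls `json.loads` on what remains, so any extra prose around the object (for example a leading "Here is the JSON:") raises and falls through to the failure branch. A minimal sketch of a more forgiving parser that falls back to the span from the first `{` to the last `}`; `extract_json_object` is a hypothetical helper, and it returns `None` so the caller can keep its existing fallback dict:

```python
import json
import re

def extract_json_object(text):
    """Parse the JSON object in an LLM reply, tolerating code fences and
    surrounding prose. Returns None if nothing parses."""
    cleaned = re.sub(r"```(?:json)?", "", text).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span (assumes a single object in the reply)
    start = cleaned.find("{")
    end = cleaned.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError:
        return None
```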

In [122]:
def process_file(file_path, output_path):
    with open(file_path, 'r') as f:
        data = json.load(f)

    root = data[0]    
    
    for perturbation in root.get("perturbation", []):
        prompt = generate_verification_prompt(perturbation)
        result = get_llm_verification(prompt)
        time.sleep(4.5)
        # Attach the result to the current perturbation
        # Key matches the "contradiction_exists" field requested by generate_verification_prompt
        perturbation["contradiction_exists"] = result.get("contradiction_exists")
        perturbation["justification"] = result.get("justification")

    # Save to output folder
    with open(output_path, 'w') as f_out:
        json.dump(data, f_out, indent=2)
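`process_file` pauses a fixed 4.5 s after every call regardless of whether the API is throttling. A minimal sketch of exponential backoff with jitter instead, under the assumption that rate-limit errors surface as exceptions from `model.generate_content`; `call_with_backoff` is a hypothetical helper. Because `get_llm_verification` catches every exception itself, the wrapper would go around `model.generate_content`, not around `get_llm_verification`:

```python
import random
import time

def call_with_backoff(call, *args, max_attempts=4, base_delay=4.5,
                      sleep=time.sleep, **kwargs):
    """Call `call(*args, **kwargs)`; on any exception, wait
    base_delay * 2**attempt seconds plus up to 1 s of random jitter and retry.
    The last exception is re-raised after `max_attempts` failed attempts."""
    for attempt in range(max_attempts):
        try:
            return call(*args, **kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))

# Usage inside get_llm_verification (assumption):
# response = call_with_backoff(model.generate_content, prompt)
```

Retrying on every exception is a simplification; in practice it would likely be narrowed to the client's rate-limit and transient network errors. The injectable `sleep` argument exists only so the delays can be checked without waiting.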

In [123]:
os.makedirs(OUTPUT_FOLDER, exist_ok=True)
input_files = [f for f in os.listdir(INPUT_FOLDER) if f.endswith(".json")]

for file_name in tqdm(input_files, desc="Verifying contradictions"):
    print('this is file_name:', file_name)
    input_path = os.path.join(INPUT_FOLDER, file_name)
    output_path = os.path.join(OUTPUT_FOLDER, file_name)
    process_file(input_path, output_path)

Verifying contradictions:   0%|          | 0/20 [00:00<?, ?it/s]

this is file_name: perturbed_WHITESMOKE,INC_11_08_2011-EX-10.26-PROMOTIONANDDISTRIBUTIONAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The change from 'shall comply' and 'shall ensure' to 'shall endeavor to comply' and 'shall encourage' significantly weakens the binding nature of the Distributor's obligations. The original text created a direct legal duty to comply with the guidelines, while the modified text introduces ambiguity, suggesting a best-effort approach rather than a mandatory requirement. This contradicts the FTC Act, which, according to Scraped Law Snippet 1, aims to 'prevent unfair methods of competition and unfair or deceptive acts or practices in or affecting commerce'. The modified language makes it more difficult to enforce compliance with guidelines designed to prevent such practices, as the Distributor can argue that it 'endeavored' or 'encouraged' compliance without actually ensuring it. Scra

Verifying contradictions:   5%|▌         | 1/20 [00:18<05:59, 18.90s/it]

this is file_name: perturbed_DovaPharmaceuticalsInc_20181108_10-Q_EX-10.2_11414857_EX-10.2_PromotionAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modified text introduces ambiguity and weakens the original obligation, potentially leading to non-compliance with the FD&C Act, as indicated by the law explanation. Replacing 'shall instruct' with 'shall encourage,' reducing monitoring efforts from 'monitor and audit (in accordance with Valeant's standard practice)' to 'from time to time, monitor,' and allowing for 'generally in accordance with' instead of 'in adherence in all respects' significantly dilutes the mandatory nature of compliance. This is a HIGH contradiction because 21 USC 331(a) explicitly prohibits 'the introduction or delivery for introduction into interstate commerce of any food, drug, device, tobacco product, or cosmetic that is adulterated or misbranded.' The weakened language in the modified c

Verifying contradictions:  10%|█         | 2/20 [00:36<05:30, 18.34s/it]

this is file_name: perturbed_PACIRAPHARMACEUTICALS,INC.-A_RSTRATEGICLICENSING,DISTRIBUTIONANDMARKETINGAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "LOW",
  "justification": "The modification changes \"to make or have made\" to \"to consider making or having made,\" which introduces ambiguity regarding EKR's obligation to manufacture products. The Law Explanation suggests this could conflict with postmarketing reporting requirements under 21 CFR § 314.81(b)(3)(vii), which necessitates reporting discontinuance of manufacture, implying a need to maintain adequate supply. However, the scraped snippets (Snippet 1 about 21 CFR 314.81 and Snippet 2 generally referencing CFR Title 21 and FDA regulations) do not explicitly mandate continuous or guaranteed production. While the change introduces uncertainty, it doesn't directly contradict any explicit legal requirement in the provided snippets. The obligation is weakened, making it discretionary, b

Verifying contradictions:  15%|█▌        | 3/20 [00:55<05:11, 18.33s/it]

this is file_name: perturbed_PfHospitalityGroupInc_20150923_10-12G_EX-10.1_9266710_EX-10.1_FranchiseAgreement3.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "LOW",
  "justification": "The change from \"shall not\" to \"should endeavor not to\" weakens the obligation to protect confidential information. The Law Explanation says that this weakening can make it more difficult to enforce non-disclosure agreements, potentially conflicting with the DTSA's goal of protecting trade secrets. The provided snippet from the Defend Trade Secrets Act of 2016 indicates the act's existence and relates to trade secret misappropriation, but doesn't directly address the level of obligation required in contracts regarding confidential information. While the change introduces ambiguity and reduces the enforceability of the non-disclosure agreement, it doesn't explicitly violate the DTSA, as the contract could be interpreted as still intending to provide some level of pr

Verifying contradictions:  20%|██        | 4/20 [01:13<04:55, 18.45s/it]

this is file_name: perturbed_LohaCompanyltd_20191209_F-1_EX-10.16_11917878_EX-10.16_SupplyAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modification introduces ambiguities that significantly weaken the original contract's enforceability, creating a high contradiction. The original text provided clear timelines and obligations for the seller's acceptance or rejection of an order. The altered text replaces these clear terms with vague alternatives such as 'reasonable timeframe,' 'potentially unacceptable,' and 'suitable period.' This shift undermines the definiteness required for a contract to be legally binding. The LLM's Law Explanation references Article 17 of the Contract Law of the People's Republic of China, stating that 'An offer shall become effective when it reaches the offeree.' The change introduces ambiguity around acceptance, which makes it difficult to determine when the contract legally becomes 

Verifying contradictions:  25%|██▌       | 5/20 [01:32<04:36, 18.41s/it]

this is file_name: perturbed_CerenceInc_20191002_8-K_EX-10.4_11827494_EX-10.4_IntellectualPropertyAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "LOW",
  "justification": "The LLM explanation argues that making IP assignment execution and recordation optional undermines the ability to maintain accurate records, potentially violating Delaware law regarding corporate record-keeping. However, the provided scraped snippet (Delaware Code Online Title 8) focuses on mergers and consolidations of domestic corporations, specifically addressing agreements of merger or consolidation. It outlines requirements for such agreements, including terms, conditions, mode of effectuation, and the manner of converting shares. The snippet does not directly mandate how companies must record or maintain records of intellectual property, nor does it explicitly state that IP assignments must be executed without exception. Therefore, while weakening the obligation to 

Verifying contradictions:  30%|███       | 6/20 [01:50<04:18, 18.45s/it]

this is file_name: perturbed_IntegrityFunds_20200121_485BPOS_EX-99.EUNDRCONTR_11948727_EX-99.EUNDRCONTR_ServiceAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modified text weakens the obligation to comply with applicable laws by introducing the subjective standard of 'to the extent Integrity deems practical.' This directly contradicts the Securities Exchange Act of 1934, which mandates strict adherence to regulations, as the Law Explanation indicates. Scraped Law Snippet 2, 15 USC 78o, addresses the 'Registration and regulation of brokers and dealers,' establishing legal requirements for securities activities. By allowing Integrity to bypass legal requirements based on its own assessment of practicality, the modified text creates a significant contradiction with the Act's demand for full compliance and undermines the regulatory framework governing broker-dealer activities. The original 'shall comply' is a cle

Verifying contradictions:  35%|███▌      | 7/20 [02:08<03:59, 18.41s/it]

this is file_name: perturbed_ADAMSGOLFINC_03_21_2005-EX-10.17-ENDORSEMENTAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The original contract outlined specific conditions under which ADAMS GOLF could terminate the agreement, providing a clear framework for both parties. The modified text replaces these objective standards with subjective and vague language, such as 'reasonably believes,' 'somewhat affected,' and 'arguably acted.' This significant weakening of the termination clause directly undermines the implied duty of good faith and fair dealing as argued by the Law Explanation. While the scraped snippets are empty, the Law Explanation highlights that the changes introduce subjectivity, allowing ADAMS GOLF to potentially terminate based on easily manipulated interpretations, potentially acting in bad faith. This is a significant departure from the original contract's objective criteria, creating a high risk of

Verifying contradictions:  40%|████      | 8/20 [02:26<03:39, 18.25s/it]

this is file_name: perturbed_FTENETWORKS,INC_02_18_2016-EX-99.4-STRATEGICALLIANCEAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The change from \"shall comply\" to \"shall endeavor to comply with what they perceive to be applicable laws\" creates a HIGH contradiction because it undermines the mandatory nature of legal compliance, particularly in areas like safety and health regulations. Scraped Law Snippet 1, specifically 1926.20(a)(1), states that contractors and subcontractors cannot require laborers to work in unsanitary, hazardous, or dangerous conditions. By making compliance subjective ('what they perceive'), the modified text allows subcontractors to potentially disregard safety regulations, directly contradicting the legal requirement for maintaining safe working conditions. The original text requires strict adherence, while the altered text introduces ambiguity and discretion, making enforcement difficul

Verifying contradictions:  45%|████▌     | 9/20 [02:44<03:18, 18.01s/it]

this is file_name: perturbed_LIMEENERGYCO_09_09_1999-EX-10-DISTRIBUTORAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "LOW",
  "justification": "The modification introduces ambiguity by changing 'shall' to 'endeavor to' and 'will attempt to,' weakening the company's obligation. The LLM's Law Explanation suggests this could violate the implied covenant of good faith and fair dealing. However, Scraped Law Snippet 1 is about Euro conversion and doesn't address contractual obligations or good faith. Scraped Law Snippet 2 is missing. Without a relevant law snippet directly addressing the issue of 'shall' vs. 'endeavor to' in contractual obligations within the Illinois UCC, the contradiction is not definitively 'HIGH.' It hinges on the interpretation of 'good faith,' making it context-dependent and arguable, therefore categorized as 'LOW'."
}
```
This is the response from LLM: ```json
{
  "contradiction_score": "LOW",
  "justification": "The chang

Verifying contradictions:  50%|█████     | 10/20 [03:02<02:59, 17.91s/it]

this is file_name: perturbed_DOMINIADVISORTRUST_02_18_2005-EX-99.(H)(2)-SPONSORSHIPAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modification significantly weakens the Sponsor's obligations, creating a high risk of non-compliance with Section 15(a) of the Investment Company Act of 1940. The original text uses 'shall perform,' a clear mandate, while the modified text uses 'will endeavor to perform,' which is discretionary. Similarly, 'reasonably requested' becomes 'periodically discussed,' further reducing the Sponsor's commitment. The phrase 'potentially including' introduces ambiguity about the services to be provided. The shift from 'overseeing' to 'offering input' diminishes the Sponsor's responsibility. According to the law explanation, Section 15(a) requires investment advisory contracts to be specific, and these changes introduce vagueness. Scraped Law Snippet 1 refers to 15 USC 80a-15, which covers co

Verifying contradictions:  55%|█████▌    | 11/20 [03:20<02:42, 18.05s/it]

this is file_name: perturbed_MetLife,Inc.-RemarketingAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The original text imposes clear obligations on the Remarketing Agents, while the changed text weakens these obligations to mere contemplations and attempts. This undermines the implied covenant of good faith and fair dealing, which, according to the Law Explanation, requires parties to act honestly and reasonably. The modification introduces ambiguity that allows for a weaker effort, potentially violating the UCC § 1-304. as shown in both scraped snippets, which states \"Every contract or duty within the Uniform Commercial Code imposes an obligation of good faith in its performance and enforcement.\" By replacing definitive actions with vague ones, the modified text creates a high likelihood of non-compliance, directly contradicting the good faith requirement."
}
```
This is the response from LLM: ```json
{
  "cont

Verifying contradictions:  60%|██████    | 12/20 [03:38<02:25, 18.17s/it]

this is file_name: perturbed_VEONEER,INC_02_21_2020-EX-10.11-JOINTVENTUREAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The change from 'governed by and construed in accordance with the laws of Japan' to 'generally guided by principles similar to the laws of Japan' significantly weakens the legal obligation. The original text establishes a clear legal standard, while the modified text introduces ambiguity, making compliance uncertain and potentially unenforceable. While the scraped snippets do not explicitly outline the requirement, the Perturbation Explanation and Law Explanation highlight the risk of conflicting interpretations and diminished legal certainty due to the vague nature of 'generally guided.' This introduces a significant risk of non-compliance, thus creating a High contradiction."
}
```
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The original text m

Verifying contradictions:  65%|██████▌   | 13/20 [03:56<02:05, 17.91s/it]

this is file_name: perturbed_SIBANNAC,INC_12_04_2017-EX-2.1-StrategicAllianceAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modification significantly weakens the obligation to protect confidential information, potentially undermining the enforceability of trade secret protections under the Texas Uniform Trade Secrets Act (TUTSA). The original contract language, \"agree to maintain in strict confidence and not to disclose,\" creates a clear and binding duty. The changed language, \"endeavor to maintain in confidence and avoid disclosure,\" introduces ambiguity and reduces the stringency of the obligation. According to the Law Explanation, this ambiguity makes it more difficult to prove misappropriation under TUTSA because it's less clear what level of care is required. Scraped Law Snippet 1 defines \"Misappropriation\" as the \"disclosure or use of a trade secret of another without express or implied consent.

Verifying contradictions:  70%|███████   | 14/20 [04:15<01:49, 18.24s/it]

this is file_name: perturbed_PREMIERBIOMEDICALINC_05_14_2020-EX-10.2-INTELLECTUALPROPERTYAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modified text significantly weakens the obligation to protect confidential information by replacing 'agrees' with 'should endeavor' and introducing the 'unless business needs indicate otherwise' exception. While Scraped Law Snippet 1 is a 404 error, and Snippet 2 is empty, the Law Explanation correctly points out that PUTSA (Pennsylvania Uniform Trade Secrets Act) provides remedies for misappropriation of trade secrets. The added clause introduces significant ambiguity, potentially undermining the enforceability of confidentiality by allowing parties to disclose information based on a vague 'business needs' justification. This directly contradicts the intent of trade secret protection laws, as it makes proving misappropriation more difficult. Therefore, the changes create a H

Verifying contradictions:  75%|███████▌  | 15/20 [04:33<01:30, 18.14s/it]

this is file_name: perturbed_ThriventVariableInsuranceAccountB_20190701_N-6_EX-99.D(IV)_11720968_EX-99.D(IV)_EndorsementAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modified contract significantly weakens the original obligation for members to make extra payments if the Society becomes insolvent, introducing substantial ambiguity and subjectivity. The original text creates a clear obligation, while the modified text uses terms like 'might be asked to consider,' 'perceived fair share,' and 'reasonable time.' The Law Explanation states that the original text aligns with Wisconsin Statutes Section 611.26(2)(a) requiring a definite and enforceable assessment process. While Scraped Law Snippet 1 does not directly mention section 611.26(2)(a), the explanation uses that section as a basis for the contradiction. The ambiguity introduced by the changes would make the obligation unenforceable under Wisconsin law as t

Verifying contradictions:  80%|████████  | 16/20 [04:51<01:12, 18.09s/it]

this is file_name: perturbed_KIROMICBIOPHARMA,INC_05_11_2020-EX-10.23-CONSULTINGAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The modified text weakens the obligation to protect trade secrets by introducing ambiguity with terms like \"reasonable judgment\" and \"try to avoid.\" This directly undermines the intent of the Texas Uniform Trade Secrets Act (TUTSA). Scraped Law Snippet 1 defines \"Misappropriation\" which includes the \"disclosure or use of a trade secret of another without express or implied consent\" and also defines \"Improper means\" which includes \"breach or inducement of a breach of a duty to maintain secrecy\". By weakening the consultant's obligation, the modified text increases the risk of misappropriation through improper means. The original text provided a stronger safeguard against such actions, making the modified version a HIGH contradiction."
}
```
This is the response from LLM: ```jso

Verifying contradictions:  85%|████████▌ | 17/20 [05:08<00:53, 17.95s/it]

this is file_name: perturbed_NELNETINC_04_08_2020-EX-1-JOINTFILINGAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The change from \"shall be responsible\" to \"shall endeavor to be responsible\" significantly weakens the obligation to file accurate and timely amendments, creating a high contradiction. Additionally, changing the standard from 'knows or has reason to believe' to 'definitive proof' introduces a higher, and potentially unattainable, evidentiary burden. Scraped Law Snippet 1, referencing 17 CFR § 240.13d-1, requires the filing of a statement containing specific information within five business days after acquisition of beneficial ownership. The original contract language aligns with this requirement by mandating responsibility for timely and accurate filings. The modified language introduces ambiguity and allows for potential negligence, undermining the intent of the law to ensure transparency and acco

Verifying contradictions:  90%|█████████ | 18/20 [05:14<00:28, 14.40s/it]

this is file_name: perturbed_CENTRACKINTERNATIONALINC_10_29_1999-EX-10.3-WEBSITEHOSTINGAGREEMENT.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The changed text significantly weakens the original commitment to continuous operation, creating a high contradiction with Florida Statute 501.204, which prohibits unfair or deceptive acts in trade or commerce. The original text guaranteed continuous operation, only allowing for 'reasonable' maintenance within specific timeframes. The modified text replaces 'will maintain' with 'will endeavor to maintain,' changing a guarantee to a mere attempt, and replaces 'best efforts' with 'attempt to schedule,' further reducing the obligation. This directly undermines the consumer's expectation of service uptime. According to Scraped Law Snippet 1, F.S. 501.204(1) declares 'Unfair methods of competition, unconscionable acts or practices, and unfair or deceptive acts or practices in the conduc

Verifying contradictions:  95%|█████████▌| 19/20 [05:33<00:15, 15.66s/it]

this is file_name: perturbed_ReynoldsConsumerProductsInc_20191115_S-1_EX-10.18_11896469_EX-10.18_SupplyAgreement.txt.snippet.json
This is the response from LLM: ```json
{
  "contradiction_score": "HIGH",
  "justification": "The original text mandates strict compliance with applicable laws. The changed text weakens this obligation by using terms like \"generally formed\" and \"alignment with the spirit of the laws,\" which introduces ambiguity and potential for non-compliance. The Law Explanation supports this, stating that ILCS (Illinois Compiled Statutes) requires specific adherence to registration, licensing, and operational requirements, and that the modified text undermines the legal obligation to fully comply with all applicable laws. While Scraped Law Snippet 1, listing Illinois Aeronautics Act and other acts, does not directly demonstrate a specific legal requirement being violated, it provides context on the breadth and importance of statutory compliance within Illinois. The ch

Verifying contradictions: 100%|██████████| 20/20 [05:51<00:00, 17.56s/it]
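Each logged reply wraps its verdict in a ```` ```json ```` fence. Several replies are cut off in the log above (for example, the second reply for the LIMEENERGYCO file), so reading them back requires a parser that does not crash on unparseable text. The sketch below is one way to do that. It assumes each reply contains at most one fenced block. The helper name `parse_llm_json` is hypothetical and does not appear elsewhere in this notebook.

```python
import json
import re

# Hypothetical helper: matches a ```json ... ``` fenced block (the "json" label is optional).
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def parse_llm_json(response_text):
    """Return the JSON payload of an LLM reply, or None if it cannot be parsed."""
    match = _FENCE_RE.search(response_text)
    # Fall back to the raw text when the model answered without a fence.
    payload = match.group(1) if match else response_text.strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        # Truncated or malformed replies end up here instead of raising.
        return None
```

A reply that is cut off before its closing fence returns `None`. That gives the caller an explicit signal to re-query the model or skip the perturbation rather than recording a missing score.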


## Filter after Verification

In [124]:
def filter_high_perturbations(input_folder, output_folder):
    """Write a copy of each JSON file to output_folder, keeping only HIGH-scored perturbations."""
    os.makedirs(output_folder, exist_ok=True)

    for file_name in os.listdir(input_folder):
        if not file_name.endswith(".json"):
            continue

        input_path = os.path.join(input_folder, file_name)
        with open(input_path, "r", encoding="utf-8") as f:
            data = json.load(f)

        # Files are stored either as a single object or as a one-element list;
        # guard against an empty list, which would make data[0] raise IndexError
        root = data[0] if isinstance(data, list) and data else data
        if not isinstance(root, dict):
            print(f"Skipping file (unexpected JSON layout): {file_name}")
            continue

        # Keep only perturbations whose contradiction_score is HIGH (case- and whitespace-insensitive)
        high_perturbations = [
            pert for pert in root.get("perturbation", [])
            if str(pert.get("contradiction_score", "")).strip().upper() == "HIGH"
        ]

        # Skip files that have no HIGH perturbations
        if not high_perturbations:
            print(f"Skipping file (no HIGH found): {file_name}")
            continue

        # Build the filtered JSON, preserving the original list-wrapped layout
        filtered_json = [{
            "file_name": root.get("file_name"),
            "perturbation": high_perturbations
        }]

        # Write to the output folder under the same file name
        output_path = os.path.join(output_folder, file_name)
        with open(output_path, "w", encoding="utf-8") as f_out:
            json.dump(filtered_json, f_out, indent=2)

        print(f"Saved filtered file: {file_name} with {len(high_perturbations)} HIGH perturbation(s)")
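Before running the filter, it can help to see how the verifier's labels are distributed across a folder. This shows how much data the HIGH-only filter will discard. The sketch below assumes the same file layout that `filter_high_perturbations` reads (an object, or a one-element list, with a `perturbation` array). The function name `tally_contradiction_scores` is hypothetical.

```python
import json
import os
from collections import Counter


def tally_contradiction_scores(input_folder):
    """Count contradiction_score labels across all perturbation JSON files in a folder."""
    counts = Counter()
    for file_name in os.listdir(input_folder):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(input_folder, file_name), "r", encoding="utf-8") as f:
            data = json.load(f)
        # Assumed layout, mirroring filter_high_perturbations: a dict or a one-element list.
        root = data[0] if isinstance(data, list) and data else data
        if not isinstance(root, dict):
            continue
        for pert in root.get("perturbation", []):
            # Normalise the label the same way the filter does; count absent labels separately.
            label = str(pert.get("contradiction_score", "")).strip().upper() or "MISSING"
            counts[label] += 1
    return counts
```

Calling `tally_contradiction_scores("scraped_laws_v7/ambiguity_legal/law_validation3/")` before `filter_high_perturbations` would show how many perturbations fall into each label. A large `MISSING` count would point to truncated or unparsed verifier replies rather than genuinely low-risk edits.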

In [125]:
filter_high_perturbations(
    input_folder="scraped_laws_v7/ambiguity_legal/law_validation3/",
    output_folder="scraped_laws_v7/ambiguity_legal_cleaned/"
)

Saved filtered file: perturbed_WHITESMOKE,INC_11_08_2011-EX-10.26-PROMOTIONANDDISTRIBUTIONAGREEMENT.txt.snippet.json with 3 HIGH perturbation(s)
Saved filtered file: perturbed_DovaPharmaceuticalsInc_20181108_10-Q_EX-10.2_11414857_EX-10.2_PromotionAgreement.txt.snippet.json with 3 HIGH perturbation(s)
Saved filtered file: perturbed_PACIRAPHARMACEUTICALS,INC.-A_RSTRATEGICLICENSING,DISTRIBUTIONANDMARKETINGAGREEMENT.txt.snippet.json with 2 HIGH perturbation(s)
Saved filtered file: perturbed_PfHospitalityGroupInc_20150923_10-12G_EX-10.1_9266710_EX-10.1_FranchiseAgreement3.txt.snippet.json with 2 HIGH perturbation(s)
Saved filtered file: perturbed_LohaCompanyltd_20191209_F-1_EX-10.16_11917878_EX-10.16_SupplyAgreement.txt.snippet.json with 3 HIGH perturbation(s)
Saved filtered file: perturbed_CerenceInc_20191002_8-K_EX-10.4_11827494_EX-10.4_IntellectualPropertyAgreement.txt.snippet.json with 2 HIGH perturbation(s)
Saved filtered file: perturbed_IntegrityFunds_20200121_485BPOS_EX-99.EUNDRCONTR