# Contract Review Workflow

<a href="https://colab.research.google.com/github/run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/contract_review.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/contract_review.png?raw=1)

This tutorial shows how to build an agentic workflow that reviews a contract for compliance with a set of regulations. We parse the contract into a set of key clauses, match each clause against relevant passages from a guideline repository (here, the GDPR), and then produce a compliance summary.

In [2]:
!pip install llama-index llama-index-indices-managed-llama-cloud llama-cloud llama-parse


In [3]:
import nest_asyncio
nest_asyncio.apply()

## Setup

We set up an index over the guidelines. In this case, that is just the GDPR document.

We also set up our parser.

In [4]:
!mkdir -p data
!wget "https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679" -O data/gdpr.pdf



### Setup Index
Here we use the open-source VectorStoreIndex.

In [5]:
!pip install llama-index-llms-huggingface


In [7]:
!pip install llama-index-embeddings-huggingface json_repair llama-index-utils-workflow


In [8]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)


In [9]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load every file under data/ (here, only gdpr.pdf) and embed it with Settings.embed_model.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

In [10]:
retriever = index.as_retriever(similarity_top_k=1)
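
To make `similarity_top_k=1` concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It is not the `VectorStoreIndex` implementation: the three-dimensional vectors, the chunk names, and the helper functions `cosine_similarity` and `top_k` are invented for illustration, and real `bge-small-en-v1.5` embeddings have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=1):
    """Return (chunk_id, score) pairs for the k most similar chunks."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" (hypothetical values, one per guideline chunk).
chunks = {
    "consent": [0.9, 0.1, 0.0],
    "transfers": [0.1, 0.9, 0.1],
    "security": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # stands in for an embedded contract clause

best = top_k(query, chunks, k=1)
print(best[0][0])
```

With `k=1`, each clause is compared against a single guideline chunk, which keeps the prompt short but means the compliance check sees only the closest match.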

### Setup Parser

Here we use LlamaParse to parse the vendor agreement.

In [11]:
import os
from llama_parse import LlamaParse
os.environ['LLAMA_CLOUD_API_KEY']='<llama_cloud_token>'
# return the parsed document as markdown
parser = LlamaParse(result_type="markdown")

### Define Contract Output Schema

We want to extract the relevant clauses from the agreement so that we can match them against relevant clauses in the GDPR. This schema defines a way to structure the set of extracted clauses.

In [12]:
from typing import List, Optional
from pydantic import BaseModel, Field

class ContractClause(BaseModel):
    clause_text: str = Field(..., description="The exact text of the clause.")
    mentions_data_processing: bool = Field(False, description="True if the clause involves personal data collection or usage.")
    mentions_data_transfer: bool = Field(False, description="True if the clause involves transferring personal data, especially to third parties or across borders.")
    requires_consent: bool = Field(False, description="True if the clause explicitly states that user consent is needed for data activities.")
    specifies_purpose: bool = Field(False, description="True if the clause specifies a clear purpose for data handling or transfer.")
    mentions_safeguards: bool = Field(False, description="True if the clause mentions security measures or other safeguards for data.")

class ContractExtraction(BaseModel):
    vendor_name: Optional[str] = Field(None, description="The vendor's name if identifiable.")
    effective_date: Optional[str] = Field(None, description="The effective date of the agreement, if available.")
    governing_law: Optional[str] = Field(None, description="The governing law of the contract, if stated.")
    clauses: List[ContractClause] = Field(..., description="List of extracted clauses and their relevant indicators.")

### Define Compliance Check Schema

Define a schema that matches clauses with relevant guidelines in GDPR.

In [13]:
from typing import Optional
from pydantic import BaseModel, Field

class GuidelineMatch(BaseModel):
    guideline_text: str = Field(..., description="The single most relevant guideline excerpt related to this clause.")
    similarity_score: float = Field(..., description="Similarity score indicating how closely the guideline matches the clause, e.g., between 0 and 1.")
    relevance_explanation: Optional[str] = Field(None, description="Brief explanation of why this guideline is relevant.")



class ClauseComplianceCheck(BaseModel):
    clause_text: str = Field("Default clause text", description="The exact text of the clause from the contract.")
    matched_guideline: Optional[GuidelineMatch] = Field(None, description="The most relevant guideline extracted via vector retrieval.")
    compliant: bool = Field(True, description="Indicates whether the clause is considered compliant with the referenced guideline.")
    notes: Optional[str] = Field("No additional notes", description="Additional commentary or recommendations.")


### Define Final Output Schema

This is the schema for the final compliance report. It contains the vendor name, whether the contract is compliant overall, and summary notes.

It will be inferred from the individual checks for every clause.

In [14]:
from typing import Optional, List
from pydantic import BaseModel, Field

class ComplianceReport(BaseModel):
    vendor_name: Optional[str] = Field(None, description="The vendor's name if identified from the contract.")
    overall_compliant: bool = Field(..., description="Indicates if the contract is considered overall compliant.")
    summary_notes: Optional[str] = Field(None, description="General summary or recommendations for achieving full compliance.")
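
As a point of comparison, the sketch below infers the report fields from per-clause checks with a simple rule: the contract is compliant overall only if no clause is flagged. This is an assumption for illustration, not what the workflow does (the workflow asks the LLM to write the report). `ClauseResult` and `summarize` are hypothetical stand-ins that use dataclasses so the example runs without pydantic.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ClauseResult:
    """Hypothetical stand-in for ClauseComplianceCheck."""
    clause_text: str
    compliant: bool
    notes: Optional[str] = None

def summarize(results: List[ClauseResult]) -> dict:
    """Reduce per-clause checks to the fields of a compliance report."""
    failing = [r for r in results if not r.compliant]
    return {
        "overall_compliant": not failing,
        # Fall back to the clause text when a failing clause has no notes.
        "summary_notes": "; ".join(r.notes or r.clause_text for r in failing) or None,
    }

results = [
    ClauseResult("Purpose limitation clause", True),
    ClauseResult("Data transfer clause", False, "No transfer safeguards named."),
]
report = summarize(results)
print(report)
```

A rule like this can serve as a sanity check on the LLM-generated report: if any clause is flagged, `overall_compliant` should not come back as true.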

### Test Schema-to-Prompt Conversion

In [15]:
import re

GLM_JSON_RESPONSE_PREFIX = """You should always follow the instructions and output a valid JSON object.
You can find the structure of the JSON object in the instructions; use {format_json} as the default structure
if you are not sure about the structure.

And you should always end the block with a "```" to indicate the end of the JSON object.

<instructions>
"""

GLM_JSON_RESPONSE_SUFFIX = """Output:
</instructions>

"""
# TODO: a line break (\r\n) is needed below </instructions>

PATTERN = re.compile(r"```(?:json\s+)?(\W.*?)```", re.DOTALL)
"""Regex pattern to parse the output."""
from llama_index.core.output_parsers.pydantic import PydanticOutputParser


output_parser1=PydanticOutputParser(output_cls=ComplianceReport)

output_parser1.get_format_string()

'\nHere\'s a JSON schema to follow:\n{{"properties": {{"vendor_name": {{"anyOf": [{{"type": "string"}}, {{"type": "null"}}], "default": null, "description": "The vendor\'s name if identified from the contract.", "title": "Vendor Name"}}, "overall_compliant": {{"description": "Indicates if the contract is considered overall compliant.", "title": "Overall Compliant", "type": "boolean"}}, "summary_notes": {{"anyOf": [{{"type": "string"}}, {{"type": "null"}}], "default": null, "description": "General summary or recommendations for achieving full compliance.", "title": "Summary Notes"}}}}, "required": ["overall_compliant"], "title": "ComplianceReport", "type": "object"}}\n\nOutput a valid JSON object but do not repeat the schema.\n'

In [16]:
output_parser1.get_format_string(escape_json=False)

'\nHere\'s a JSON schema to follow:\n{"properties": {"vendor_name": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null, "description": "The vendor\'s name if identified from the contract.", "title": "Vendor Name"}, "overall_compliant": {"description": "Indicates if the contract is considered overall compliant.", "title": "Overall Compliant", "type": "boolean"}, "summary_notes": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null, "description": "General summary or recommendations for achieving full compliance.", "title": "Summary Notes"}}, "required": ["overall_compliant"], "title": "ComplianceReport", "type": "object"}\n\nOutput a valid JSON object but do not repeat the schema.\n'
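
The two outputs above differ only in brace escaping. The sketch below shows why that matters, using plain `str.format` as a stand-in for prompt templating; the tiny `schema` string is invented for illustration. Doubled braces are needed when the schema is part of the template text, while a schema passed in as a variable value should stay unescaped.

```python
# A tiny stand-in schema; the real one comes from PydanticOutputParser.
schema = '{"type": "object", "required": ["overall_compliant"]}'

# Escaped form: every brace doubled, so str.format() treats it as literal text.
escaped = schema.replace("{", "{{").replace("}", "}}")

# Case 1: the schema is part of the template text, so it must be escaped.
template = "Schema:\n" + escaped + "\n\nContract:\n{contract_data}"
prompt = template.format(contract_data="Vendor shall process Personal Data only ...")

# Case 2: the schema is passed as a value; values are not re-parsed, so no escaping.
as_value = "Schema:\n{format_json}".format(format_json=schema)

# Case 3: unescaped braces inside the template text make formatting fail.
try:
    ("Schema:\n" + schema + "\n{contract_data}").format(contract_data="x")
    unescaped_failed = False
except (KeyError, ValueError, IndexError):
    unescaped_failed = True

print(schema in prompt, schema in as_value, unescaped_failed)
```

In this notebook the schema is supplied through the `{format_json}` variable, which corresponds to case 2, so the unescaped form is the one that reaches the model with single braces.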

In [None]:
# messages = ChatPromptTemplate.from_messages([
#             ("system",  f"{GLM_JSON_RESPONSE_PREFIX}{output_parser1.format_string}"),
#             ("user",  f"{CONTRACT_EXTRACT_PROMPT}{GLM_JSON_RESPONSE_SUFFIX}")
#         ])
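
The `PATTERN` regex defined earlier is what pulls the JSON out of the model reply. Here is a small, self-contained sketch of that step. The reply text is invented, `json.loads` stands in for the notebook's `try_parse_json_object` helper, and the fence is built with string multiplication only to avoid writing a literal triple backtick inside this example.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick fence marker

# Same pattern as PATTERN above: capture the body of a fenced block,
# with an optional "json" language tag after the opening fence.
pattern = re.compile(FENCE + r"(?:json\s+)?(\W.*?)" + FENCE, re.DOTALL)

# Hypothetical model reply, invented for illustration.
reply = (
    "assistant: Here is the report.\n"
    + FENCE + "json\n"
    + '{"vendor_name": "EFG, Inc.", "overall_compliant": false, "summary_notes": null}\n'
    + FENCE
)

match = pattern.search(reply)
if match is None:
    raise ValueError("no fenced JSON block found in the reply")

report = json.loads(match.group(1).strip())
print(report["overall_compliant"])
```

Because the capture group starts with `\W`, the first captured character must be a non-word character such as `{` or a newline, which is how the pattern skips a bare language tag.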

## Setup Contract Review Workflow

Let's define the following contract review workflow:
1. Extract out structured data from the vendor agreement.
2. For each clause, do retrieval against GDPR to see if it's compliant with guidelines.
3. Generate a final summary.
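
The three steps above follow a fan-out/fan-in shape. The sketch below shows that shape with plain `asyncio` and no LLM; it is not the workflow implementation. `extract_clauses`, `check_clause`, and `review` are hypothetical, and a keyword rule stands in for retrieval plus LLM judgment.

```python
import asyncio

async def extract_clauses(contract_text):
    """Step 1 stand-in: one clause per non-empty line (the notebook uses an LLM)."""
    return [line.strip() for line in contract_text.splitlines() if line.strip()]

async def check_clause(clause):
    """Step 2 stand-in: a keyword rule replaces retrieval plus LLM judgment."""
    compliant = "no prior notification" not in clause.lower()
    return {"clause_text": clause, "compliant": compliant}

async def review(contract_text):
    clauses = await extract_clauses(contract_text)
    # Fan out: one check per clause, run concurrently; gather() preserves order.
    checks = await asyncio.gather(*(check_clause(c) for c in clauses))
    # Fan in: step 3 reduces the per-clause results to one report.
    failing = [c for c in checks if not c["compliant"]]
    return {"overall_compliant": not failing, "non_compliant_results": failing}

contract = """Vendor shall encrypt Personal Data in transit and at rest.
No prior notification required for new data storage locations."""

report = asyncio.run(review(contract))
print(report["overall_compliant"])
```

Inside a notebook cell, `await review(contract)` can be used in place of `asyncio.run(...)`, since the notebook already has a running event loop.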

In [17]:
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step,
)
from llama_index.core.llms import LLM
from typing import Optional
from pydantic import BaseModel
from llama_index.core import SimpleDirectoryReader
from llama_index.core.schema import Document
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.retrievers import BaseRetriever
from pathlib import Path
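import logging
import logging.config  # needed for logging.config.dictConfig below
import json
import os
# NOTE: `try_parse_json_object` and `logs` are helper modules that are not defined in this notebook;
# they must be importable from the working directory for this cell to run.
from try_parse_json_object import try_parse_json_object

from logs import get_config_dict, get_log_file, get_timestamp_ms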

logging_conf = get_config_dict(
        "debug",
        get_log_file(log_path="logs", sub_dir=f"local_{get_timestamp_ms()}"),
        1024 * 1024 * 1024 * 3,
        1024 * 1024 * 1024 * 3,
        )
logging.config.dictConfig(logging_conf)  # type: ignore

_logger = logging.getLogger(__name__)

CONTRACT_EXTRACT_PROMPT = """\
You are given contract data below. \
Please extract out relevant information from the contract into the defined schema - the schema is defined as a function call.\

{contract_data}
"""

CONTRACT_MATCH_PROMPT = """\
Given the following contract clause and the corresponding relevant guideline text, evaluate the compliance \
and provide a JSON object that matches the ClauseComplianceCheck schema.

**Contract Clause:**
{clause_text}

**Matched Guideline Text(s):**
{guideline_text}
"""


COMPLIANCE_REPORT_SYSTEM_PROMPT = """\
You are a compliance reporting assistant. Your task is to generate a final compliance report \
based on the results of clause compliance checks against \
a given set of guidelines.

Analyze the provided compliance results and produce a structured report according to the specified schema.
Ensure that if there are no noncompliant clauses, the report clearly indicates full compliance.
"""

COMPLIANCE_REPORT_USER_PROMPT = """\
A set of clauses within a contract were checked against GDPR compliance guidelines for the following vendor: {vendor_name}.
The set of noncompliant clauses are given below.

Each section includes:
- **Clause:** The exact text of the contract clause.
- **Guideline:** The relevant GDPR guideline text.
- **Compliance Status:** Should be `False` for noncompliant clauses.
- **Notes:** Additional information or explanations.

{compliance_results}

Based on the above compliance results, generate a final compliance report following the `ComplianceReport` schema below.
If there are no noncompliant clauses, the report should indicate that the contract is fully compliant.
"""


class ContractExtractionEvent(Event):
    contract_extraction: ContractExtraction


class MatchGuidelineEvent(Event):
    clause: ContractClause


class MatchGuidelineResultEvent(Event):
    result: ClauseComplianceCheck


class GenerateReportEvent(Event):
    match_results: List[ClauseComplianceCheck]


class LogEvent(Event):
    msg: str
    delta: bool = False


class ContractReviewWorkflow(Workflow):
    """Contract review workflow."""

    def __init__(
        self,
        parser: LlamaParse,
        guideline_retriever: BaseRetriever,
        llm: LLM | None = None,
        similarity_top_k: int = 20,
        output_dir: str = "data_out",
        **kwargs,
    ) -> None:
        """Init params."""
        super().__init__(**kwargs)

        self.parser = parser
        self.guideline_retriever = guideline_retriever

        self.llm = llm
        self.similarity_top_k = similarity_top_k

        # if not exists, create
        out_path = Path(output_dir) / "workflow_output"
        if not out_path.exists():
            out_path.mkdir(parents=True, exist_ok=True)
            os.chmod(str(out_path), 0o0777)
        self.output_dir = out_path

    @step
    async def parse_contract(
        self, ctx: Context, ev: StartEvent
    ) -> ContractExtractionEvent:
        # load output template file
        contract_extraction_path = Path(
            f"{self.output_dir}/contract_extraction.json"
        )
        if contract_extraction_path.exists():
            if self._verbose:
                ctx.write_event_to_stream(LogEvent(msg=">> Loading contract from cache"))
            # use a context manager so the cache file is closed after reading
            with open(contract_extraction_path, "r") as fp:
                contract_extraction_dict = json.load(fp)
            contract_extraction = ContractExtraction.model_validate(contract_extraction_dict)
        else:
            if self._verbose:
                ctx.write_event_to_stream(LogEvent(msg=">> Reading contract"))

            # no need to parse contract, it's already in markdown
            # you can use LlamaParse to parse more complex PDFs + other docs

            docs = SimpleDirectoryReader(input_files=[ev.contract_path]).load_data()

            # extract from contract

            output_parser=PydanticOutputParser(output_cls=ContractExtraction)
            prompts = ChatPromptTemplate.from_messages([
                ("system",  f"{GLM_JSON_RESPONSE_PREFIX}"),
                ("user",  f"{CONTRACT_EXTRACT_PROMPT}{GLM_JSON_RESPONSE_SUFFIX}")
            ])
            messages = prompts.format_messages(
                llm=self.llm,
                # format_json is substituted as a value, so pass the unescaped schema
                format_json=output_parser.get_format_string(escape_json=False),
                contract_data="\n".join([d.get_content(metadata_mode="all") for d in docs]),
            )

            # use the LLM passed to the workflow rather than a notebook-level global
            response = self.llm.chat(messages=messages)

            _logger.info(f"from CONTRACT_EXTRACT_PROMPT  output_parser {output_parser.get_format_string(escape_json=False)}")
            _logger.info(f"from CONTRACT_EXTRACT_PROMPT {response.message.content}")

            action_match = PATTERN.search(str(response))
            if action_match is None:
                raise ValueError("Invalid extraction from contract: no JSON block found in CONTRACT_EXTRACT_PROMPT response")

            json_text, json_object = try_parse_json_object(action_match.group(1).strip())

            contract_extraction = output_parser.parse(json_text)

            if not isinstance(contract_extraction, ContractExtraction):
                raise ValueError(f"Invalid extraction from contract: {contract_extraction}")
            # save output template to file
            with open(contract_extraction_path, "w") as fp:
                fp.write(contract_extraction.model_dump_json())
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=f">> Contract data: {contract_extraction.model_dump()}"))

        return ContractExtractionEvent(contract_extraction=contract_extraction)

    @step
    async def dispatch_guideline_match(
        self, ctx: Context, ev: ContractExtractionEvent
    ) -> MatchGuidelineEvent:
        """For each clause in the contract, find relevant guidelines.

        Use a map-reduce pattern.

        """
        await ctx.set("num_clauses", len(ev.contract_extraction.clauses))
        await ctx.set("vendor_name", ev.contract_extraction.vendor_name)

        for clause in ev.contract_extraction.clauses:
            ctx.send_event(MatchGuidelineEvent(clause=clause, vendor_name=ev.contract_extraction.vendor_name))

    @step
    async def handle_guideline_match(
        self, ctx: Context, ev: MatchGuidelineEvent
    ) -> MatchGuidelineResultEvent:
        """Handle matching clause against guideline."""

        # retrieve matching guideline
        query = f"""\
Please find the relevant guideline from {ev.vendor_name} that aligns with the following contract clause:

{ev.clause.clause_text}
"""
        guideline_docs = self.guideline_retriever.retrieve(query)
        guideline_text="\n\n".join([g.get_content() for g in guideline_docs])
        if self._verbose:
            ctx.write_event_to_stream(
                LogEvent(msg=f">> Found guidelines: {guideline_text[:200]}...")
            )

        # extract from contract

        output_parser=PydanticOutputParser(output_cls=ClauseComplianceCheck)
        prompts = ChatPromptTemplate.from_messages([
            ("system",  f"{GLM_JSON_RESPONSE_PREFIX}"),
            ("user",  f"{CONTRACT_MATCH_PROMPT}{GLM_JSON_RESPONSE_SUFFIX}")
        ])
        messages = prompts.format_messages(llm=self.llm,
                                           format_json=output_parser.get_format_string(escape_json=False),
                                           clause_text=ev.clause.model_dump_json(),
                                           guideline_text=guideline_text)
        # use the LLM passed to the workflow rather than a notebook-level global
        response = self.llm.chat(messages=messages)

        _logger.info(f"from CONTRACT_MATCH_PROMPT  output_parser {output_parser.format_string}")
        _logger.info(f"from CONTRACT_MATCH_PROMPT {response.message.content}")
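        try:
            # Extract the fenced JSON block from the response with the regex
            action_match = PATTERN.search(str(response))
            if action_match is None:
                raise ValueError("Invalid compliance check: no JSON block found in CONTRACT_MATCH_PROMPT response")

            # Try to parse the JSON object
            json_text, json_object = try_parse_json_object(action_match.group(1).strip())
            # Parse the JSON text with the output parser
            compliance_output = output_parser.parse(json_text)

            if not isinstance(compliance_output, ClauseComplianceCheck):
                raise ValueError(f"Invalid compliance check: {compliance_output}")

            # No exception occurred, so the compliance output can be processed further here
            print("Compliance check succeeded:", compliance_output)

            return MatchGuidelineResultEvent(result=compliance_output)

        except Exception as e:
            # Catch all exceptions and print the error message
            print(f"Error during processing: {e}")

        # Fall back to a default check so the gather step still receives one result per clause.
        # Note that the default has compliant=True, so a failed check is not reported as noncompliant.
        return MatchGuidelineResultEvent(result=ClauseComplianceCheck())

    @step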
    async def gather_guideline_match(
        self, ctx: Context, ev: MatchGuidelineResultEvent
    ) -> GenerateReportEvent:
        """Collect the per-clause results and emit a single report event."""
        num_clauses = await ctx.get("num_clauses")
        events = ctx.collect_events(ev, [MatchGuidelineResultEvent] * num_clauses)
        if events is None:
            return

        match_results = [e.result for e in events]
        # save match results
        match_results_path = Path(
            f"{self.output_dir}/match_results.jsonl"
        )
        with open(match_results_path, "w") as fp:
            for mr in match_results:
                fp.write(mr.model_dump_json() + "\n")


        return GenerateReportEvent(match_results=match_results)

    @step
    async def generate_output(
        self, ctx: Context, ev: GenerateReportEvent
    ) -> StopEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> Generating Compliance Report"))

        # if all clauses are compliant, return a compliant result
        non_compliant_results = [r for r in ev.match_results if not r.compliant]

        # generate compliance results string
        result_tmpl = """
1. **Clause**: {clause}
2. **Guideline:** {guideline}
3. **Compliance Status:** {compliance_status}
4. **Notes:** {notes}
"""
        non_compliant_strings = []
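        for nr in non_compliant_results:
            # matched_guideline is Optional, so guard against a missing match
            guideline = nr.matched_guideline.guideline_text if nr.matched_guideline else "N/A"
            non_compliant_strings.append(
                result_tmpl.format(
                    clause=nr.clause_text,
                    guideline=guideline,
                    compliance_status=nr.compliant,
                    notes=nr.notes,
                )
            )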
        non_compliant_str = "\n\n".join(non_compliant_strings)

        # prompt = ChatPromptTemplate.from_messages([
        #     ("system", COMPLIANCE_REPORT_SYSTEM_PROMPT),
        #     ("user", COMPLIANCE_REPORT_USER_PROMPT)
        # ])


        output_parser=PydanticOutputParser(output_cls=ComplianceReport)
        prompts = ChatPromptTemplate.from_messages([
            ("system",  f"{GLM_JSON_RESPONSE_PREFIX}{COMPLIANCE_REPORT_SYSTEM_PROMPT}"),
            ("user",  f"{COMPLIANCE_REPORT_USER_PROMPT}{GLM_JSON_RESPONSE_SUFFIX}")
        ])
        messages = prompts.format_messages(
            llm=self.llm,
            format_json=output_parser.get_format_string(escape_json=False),
            compliance_results=non_compliant_str,
            vendor_name=await ctx.get("vendor_name"),
        )

        # use the LLM passed to the workflow rather than a notebook-level global
        response = self.llm.chat(messages=messages)

        _logger.info(f"from COMPLIANCE_REPORT_USER_PROMPT  output_parser {output_parser.format_string}")
        _logger.info(f"from COMPLIANCE_REPORT_USER_PROMPT {response.message.content}")

        action_match = PATTERN.search(str(response))
        if action_match is None:
            raise ValueError("Invalid compliance report: no JSON block found in COMPLIANCE_REPORT_USER_PROMPT response")

        json_text, json_object = try_parse_json_object(action_match.group(1).strip())
        compliance_report = output_parser.parse(json_text)

        return StopEvent(result={"report": compliance_report, "non_compliant_results": non_compliant_results})
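
The `gather_guideline_match` step writes one JSON object per line to `match_results.jsonl`. The sketch below shows one way to read such a file back for inspection. `load_match_results` is a hypothetical helper, and the two rows are invented so the example runs on its own; with the default `output_dir`, the real file would be at `data_out/workflow_output/match_results.jsonl`.

```python
import json
import tempfile
from pathlib import Path

def load_match_results(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    with open(path, "r", encoding="utf-8") as fp:
        return [json.loads(line) for line in fp if line.strip()]

# Write a small invented file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "match_results.jsonl"
    rows = [
        {"clause_text": "Clause A", "matched_guideline": None, "compliant": True, "notes": None},
        {"clause_text": "Clause B", "matched_guideline": None, "compliant": False, "notes": "Missing safeguards."},
    ]
    sample_path.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    loaded = load_match_results(sample_path)

# List the clauses that were flagged as noncompliant.
flagged = [r["clause_text"] for r in loaded if not r["compliant"]]
print(flagged)
```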

In [18]:
# `llm` must be defined before this cell runs; this notebook creates it with ZhipuAI in the "Run the Workflow" section.

workflow = ContractReviewWorkflow(
    parser=parser,
    guideline_retriever=retriever,
    llm=llm,
    verbose=True,
    timeout=None,  # don't worry about timeout to make sure it completes
)

#### Visualize the workflow

In [19]:
from llama_index.utils.workflow import draw_all_possible_flows
draw_all_possible_flows(ContractReviewWorkflow, filename="contract_workflow.html")

<class 'NoneType'>
<class '__main__.MatchGuidelineEvent'>
<class '__main__.GenerateReportEvent'>
<class 'llama_index.core.workflow.events.StopEvent'>
<class '__main__.MatchGuidelineResultEvent'>
<class '__main__.ContractExtractionEvent'>
contract_workflow.html


## Run the Workflow

Let's run the full workflow and generate the output!

In [31]:
from llama_index.llms.zhipuai import ZhipuAI
llm = ZhipuAI(model="glm-4", api_key="<glm_token>")

In [32]:
from IPython.display import clear_output

handler = workflow.run(contract_path="/content/vendor_agreement.md")
async for event in handler.stream_events():
    if isinstance(event, LogEvent):
        if event.delta:
            print(event.msg, end="")
        else:
            print(event.msg)

response_dict = await handler
print(str(response_dict["report"]))

Running step parse_contract
Step parse_contract produced event ContractExtractionEvent
>> Loading contract from cache
>> Contract data: {'vendor_name': 'EFG, Inc.', 'effective_date': 'January 1, 2025', 'governing_law': 'France', 'clauses': [{'clause_text': 'Vendor shall process Personal Data only: - To fulfill orders and manage deliveries - To provide customer support services - To maintain business records - To comply with legal obligations', 'mentions_data_processing': True, 'mentions_data_transfer': False, 'requires_consent': False, 'specifies_purpose': True, 'mentions_safeguards': False}, {'clause_text': 'Vendor maintains primary data centers in the United States - Vendor may transfer data to any country where it maintains operations - No prior notification required for new data storage locations - Vendor will rely on its standard data transfer mechanisms - Data may be processed by staff operating outside the EEA', 'mentions_data_processing': False, 'mentions_data_transfer': True, 


2025-03-11 02:43:40,320 llama_index.core.indices.utils 642 DEBUG    > Top 1 nodes:
> [Node bc2ca95f-3152-4abf-9806-0ff638c9c205] [Similarity score:             0.791121] 2. The controller shall make reasonable eff or ts to ver ify in such cases that consent is given ...

Compliance check succeeded: clause_text='Vendor shall process Personal Data only: - To fulfill orders and manage deliveries - To provide customer support services - To maintain business records - To comply with legal obligations' matched_guideline=GuidelineMatch(guideline_text='Paragraph 1 shall not apply if one of the following applies: (a) the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject; (b) processing is necessary for the purposes of carrying out the obligations and exercising specific rights of the controller or of the data subject in the field of employment and social security and social protection law in so far as it is authorised by Union or Member State law or a collective agreement pursuant to Member State law providing for appropriate safeguards for the fundamental rights and the interests of

2025-03-11 02:43:51,882 llama_index.core.indices.utils 642 DEBUG    > Top 1 nodes:
> [Node 8567698f-99ee-4e9f-8c04-d903e3b7fe3a] [Similarity score:             0.736103] standard data-protecti on clauses in a wider contract, such as a contract between the processor a...

Compliance check succeeded: clause_text='Vendor maintains primary data centers in the United States - Vendor may transfer data to any country where it maintains operations - No prior notification required for new data storage locations - Vendor will rely on its standard data transfer mechanisms - Data may be processed by staff operating outside the EEA' matched_guideline=GuidelineMatch(guideline_text='standard data-protection clauses in a wider contract, such as a contract between the processor and another processor, nor from adding other clauses or additional safeguards provided that they do not contradict, directly or indirectly, the standard contractual clauses adopted by the Commission or by a supervisory authority or prejudice the fundamental rights or freedoms of the data subjects. Controllers and processors should be encouraged to provide additional safeguards via contractual commitments that supplement standard protection clauses.', similarity_score=0.7, relevance_explanation="The guideline sugge

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-03-11 02:44:02,989 llama_index.core.indices.utils 642 DEBUG    > Top 1 nodes:
> [Node 8567698f-99ee-4e9f-8c04-d903e3b7fe3a] [Similarity score:             0.781379] standard data-protection clauses in a wider contract, such as a contract between the processor a...
2025-03-11 02:44:02,992 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:02,992 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:02,994 httpcore.http11 642 DEBUG    send_request_headers.started request=<Request [b'POST']>
2025-03-11 02:44:02,995 httpcore.http11 642 DEBUG    send_request_headers.complete
2025-03-11 02:44:02,996 httpcore.http11 642 DEBUG    send_request_body.started request=<Request [b'POST']>
2025-03-11 02:44:02,997 httpcore.http11 642 DEBUG    send_request_body.complete
2025-03-11 02:44:02,998 httpcore.http11 642 DEBUG    receive_response_headers.started request=<Request [b'POST']>
2025-03

Compliance check succeeded: clause_text='Vendor shall implement appropriate measures including: - Encryption of Personal Data in transit and at rest - Access controls and authentication - Regular security testing and assessments - Employee training on data protection - Incident response procedures' matched_guideline=GuidelineMatch(guideline_text='Controllers and processors should be encouraged to provide additional safeguards via contractual commitments that supplement standard protection clauses.', similarity_score=0.8, relevance_explanation="The guideline emphasizes the importance of additional safeguards in contracts, which aligns with the vendor's obligations to implement various security measures.") compliant=True notes="The clause aligns with the guideline's emphasis on additional safeguards and seems to be compliant with data protection standards."
Step handle_guideline_match produced event MatchGuidelineResultEvent
Running step handle_guideline_match


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-03-11 02:44:10,942 llama_index.core.indices.utils 642 DEBUG    > Top 1 nodes:
> [Node 432d72ad-6b17-40c2-a409-35c7f3472668] [Similarity score:             0.744435] aware that a personal data breach has occurred, the controller should notify the personal data ...
2025-03-11 02:44:10,945 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:10,945 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:10,946 httpcore.http11 642 DEBUG    send_request_headers.started request=<Request [b'POST']>
2025-03-11 02:44:10,947 httpcore.http11 642 DEBUG    send_request_headers.complete
2025-03-11 02:44:10,948 httpcore.http11 642 DEBUG    send_request_body.started request=<Request [b'POST']>
2025-03-11 02:44:10,949 httpcore.http11 642 DEBUG    send_request_body.complete
2025-03-11 02:44:10,950 httpcore.http11 642 DEBUG    receive_response_headers.started request=<Request [b'POST']>
2025-03

Compliance check succeeded: clause_text="Vendor shall: - Notify Client of any Personal Data breach within 72 hours - Provide details necessary to meet regulatory requirements - Cooperate with Client's breach investigation - Maintain records of all data breaches" matched_guideline=GuidelineMatch(guideline_text='The controller should notify the personal data breach to the supervisory authority without undue delay and, where feasible, not later than 72 hours after having become aware of it, unless the controller is able to demonstrate, in accordance with the accountability principle, that the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons.', similarity_score=0.8, relevance_explanation='The guideline specifies the timeframe and requirements for reporting personal data breaches, which is directly relevant to the clause.') compliant=True notes='The clause aligns with the specified guideline, ensuring compliance with the requirement to report personal data bre

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-03-11 02:44:19,314 llama_index.core.indices.utils 642 DEBUG    > Top 1 nodes:
> [Node 8567698f-99ee-4e9f-8c04-d903e3b7fe3a] [Similarity score:             0.722007] standard data-protection clauses in a wider contract, such as a contract between the processor a...
2025-03-11 02:44:19,317 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:19,318 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:19,319 httpcore.http11 642 DEBUG    send_request_headers.started request=<Request [b'POST']>
2025-03-11 02:44:19,320 httpcore.http11 642 DEBUG    send_request_headers.complete
2025-03-11 02:44:19,321 httpcore.http11 642 DEBUG    send_request_body.started request=<Request [b'POST']>
2025-03-11 02:44:19,321 httpcore.http11 642 DEBUG    send_request_body.complete
2025-03-11 02:44:19,322 httpcore.http11 642 DEBUG    receive_response_headers.started request=<Request [b'POST']>
2025-03

Compliance check succeeded: clause_text='Upon termination of services: - Return all Personal Data in standard format - Delete existing copies within 30 days - Provide written confirmation of deletion - Cease all processing activities' matched_guideline=GuidelineMatch(guideline_text='Controllers and processors should be encouraged to provide additional safeguards via contractual commitments that supplement standard protection clauses.', similarity_score=0.7, relevance_explanation="The guideline suggests that additional safeguards can be provided through contractual commitments which aligns with the clause's requirements for returning and deleting personal data.") compliant=True notes='The clause appears to be compliant with the guideline. However, it is recommended to ensure that any additional safeguards are included in the contract to enhance data protection.'
Step handle_guideline_match produced event MatchGuidelineResultEvent
Running step handle_guideline_match


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

2025-03-11 02:44:26,214 llama_index.core.indices.utils 642 DEBUG    > Top 1 nodes:
> [Node 8567698f-99ee-4e9f-8c04-d903e3b7fe3a] [Similarity score:             0.719104] standard data-protection clauses in a wider contract, such as a contract between the processor a...
2025-03-11 02:44:26,217 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:26,217 zhipuai.api_resource.chat.completions 642 DEBUG    temperature:NOT_GIVEN, top_p:NOT_GIVEN
2025-03-11 02:44:26,219 httpcore.http11 642 DEBUG    send_request_headers.started request=<Request [b'POST']>
2025-03-11 02:44:26,219 httpcore.http11 642 DEBUG    send_request_headers.complete
2025-03-11 02:44:26,220 httpcore.http11 642 DEBUG    send_request_body.started request=<Request [b'POST']>
2025-03-11 02:44:26,221 httpcore.http11 642 DEBUG    send_request_body.complete
2025-03-11 02:44:26,222 httpcore.http11 642 DEBUG    receive_response_headers.started request=<Request [b'POST']>
2025-03

Compliance check succeeded: clause_text='Vendor shall maintain: - Records of all processing activities - Security measure documentation - Data transfer mechanisms - Subprocessor agreements' matched_guideline=GuidelineMatch(guideline_text='standard data-protection clauses in a wider contract, such as a contract between the processor and another processor, nor from adding other clauses or additional safeguards provided that they do not contradict, directly or indirectly, the standard contractual clauses adopted by the Commission or by a supervisory authority or prejudice the fundamental rights or freedoms of the data subjects. Controllers and processors should be encouraged to provide additional safeguards via contractual commitments that supplement standard protection clauses.', similarity_score=0.7, relevance_explanation="The guideline emphasizes the importance of maintaining standard data-protection clauses and additional safeguards, which aligns with the clause's requirement for the vendor to maintain 

2025-03-11 02:44:38,752 httpcore.http11 642 DEBUG    receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Server', b'ZenZGA/1.13'), (b'Date', b'Tue, 11 Mar 2025 02:44:38 GMT'), (b'Content-Type', b'application/json; charset=UTF-8'), (b'Transfer-Encoding', b'chunked'), (b'Connection', b'keep-alive'), (b'Vary', b'Accept-Encoding'), (b'Vary', b'Origin'), (b'Vary', b'Access-Control-Request-Method'), (b'Vary', b'Access-Control-Request-Headers'), (b'X-LOG-ID', b'202503111044359a0eb896cd5e4722'), (b'Vary', b'Origin'), (b'Vary', b'Access-Control-Request-Method'), (b'Vary', b'Access-Control-Request-Headers'), (b'Strict-Transport-Security', b'max-age=31536000; includeSubDomains'), (b'Content-Encoding', b'gzip')])
2025-03-11 02:44:38,753 httpx        642 INFO     HTTP Request: POST https://open.bigmodel.cn/api/paas/v4/chat/completions "HTTP/1.1 200 OK"
2025-03-11 02:44:38,754 httpcore.http11 642 DEBUG    receive_response_body.started request=<Request [b'POST']>
2025-03-11 0

Step generate_output produced event StopEvent
>> Found guidelines: 2. The controller shall make reasonable efforts to verify in such cases that consent is given or authorised by the
holder of parental responsibility over the child, taking into consideration av...
>> Found guidelines: standard data-protection clauses in a wider contract, such as a contract between the processor and another
processor, nor from adding other clauses or additional safeguards provided that they do ...
>> Found guidelines: standard data-protection clauses in a wider contract, such as a contract between the processor and another
processor, nor from adding other clauses or additional safeguards provided that they do ...
>> Found guidelines: aware that a personal data breach has occurred, the controller should notify the personal data breach to the
supervisory authority without undue delay and, where feasible, not later than 72 ho...
>> Found guidelines: standard data-protection clauses

In [35]:
print(response_dict["report"])

vendor_name='EFG, Inc.' overall_compliant=False summary_notes='The contract contains at least one clause that is not fully compliant with GDPR guidelines. It is recommended to revise the clause related to data transfer and storage to include additional safeguards and consent requirements as per GDPR recommendations.'
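
The raw `print` output is hard to scan, so a small formatter can help when reading results. The sketch below is a minimal illustration, not part of the workflow itself: `Report`, `ClauseCheck`, and `format_report` are hypothetical stand-ins that only assume the field names visible in the printed output (`vendor_name`, `overall_compliant`, `summary_notes`, `clause_text`, `compliant`, `notes`). The sample values are illustrative.

```python
import textwrap
from dataclasses import dataclass
from typing import List


@dataclass
class ClauseCheck:
    """Hypothetical stand-in for one per-clause compliance result."""
    clause_text: str
    compliant: bool
    notes: str


@dataclass
class Report:
    """Hypothetical stand-in for the overall report object."""
    vendor_name: str
    overall_compliant: bool
    summary_notes: str


def format_report(report: Report, checks: List[ClauseCheck], width: int = 80) -> str:
    """Render the report header, then list only the non-compliant clauses."""
    status = "COMPLIANT" if report.overall_compliant else "NON-COMPLIANT"
    lines = [
        f"Vendor:  {report.vendor_name}",
        f"Status:  {status}",
        f"Summary: {report.summary_notes}",
    ]
    flagged = [c for c in checks if not c.compliant]
    for i, check in enumerate(flagged, start=1):
        # Truncate long clause text so each finding fits on one line.
        clause = textwrap.shorten(check.clause_text, width=width, placeholder="...")
        lines.append(f"{i}. {clause}")
        lines.append(f"   Notes: {check.notes}")
    return "\n".join(lines)


# Illustrative data only; real values come from the workflow result.
sample_report = Report(
    vendor_name="EFG, Inc.",
    overall_compliant=False,
    summary_notes="At least one clause is not fully compliant.",
)
sample_checks = [
    ClauseCheck("Vendor shall implement appropriate measures", True, "Looks fine."),
    ClauseCheck("Vendor may transfer data to any country", False, "Needs safeguards."),
]
print(format_report(sample_report, sample_checks))
```

With the actual workflow objects, the same function should work as long as they expose attributes with these names; that is an assumption based on the printed representation, not a confirmed API.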


In [55]:
print(response_dict['non_compliant_results'])

[ClauseComplianceCheck(clause_text='Vendor maintains primary data centers in the United States - Vendor may transfer data to any country where it maintains operations - No prior notification required for new data storage locations - Vendor will rely on its standard data transfer mechanisms - Data may be processed by staff operating outside the EEA', matched_guideline=GuidelineMatch(guideline_text='standard data-protection clauses in a wider contract, such as a contract between the processor and another processor, nor from adding other clauses or additional safeguards provided that they do not contradict, directly or indirectly, the standard contractual clauses adopted by the Commission or by a supervisory authority or prejudice the fundamental rights or freedoms of the data subjects. Controllers and processors should be encouraged to provide additional safeguards via contractual commitments that supplement standard protection clauses.', similarity_score=0.7, relevance_explanation="The 