![hero](https://github.com/cerebraljam/llms-at-work/blob/43977899647beebef92c3b65c05ce7da0f179a93/hero.png)

# LLMs at Work: Outsourcing Vendor Assessment Toil to AI

This is the supplementary code for the blog post [LLMs at Work: Outsourcing Vendor Assessment Toil to AI](https://engineering.mercari.com/en/blog/entry/20241215-llms-at-work).

Technologies used:
* [OpenAI GPT-4o and GPT-4o mini](https://platform.openai.com/api-keys) through the API
* [Google Programmable Search Engine](https://programmablesearchengine.google.com/about/)
* [LangChain](https://www.langchain.com/) and [LangGraph](https://www.langchain.com/langgraph) libraries
* Hero image created with Stable Diffusion 2

# Installation instructions

This notebook was developed using Python 3.11 and Visual Studio Code.

## Install dependencies
```
pip install -r requirements.txt
```

## Configuration of API keys
You will need:
* OpenAI API Key: https://platform.openai.com/api-keys
* Google Search Programmable Search Engine ID: https://programmablesearchengine.google.com/
* Google Cloud API key: https://console.cloud.google.com/apis/credentials/key

Copy `env.example` to `.env` and fill in the values. The script should then be ready to run.
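
Before running the notebook, it can help to confirm that `.env` contains every value. The following is a minimal sketch using only the standard library. The variable names `OPENAI_API_KEY`, `GOOGLE_API_KEY`, and `GOOGLE_CSE_ID` are assumptions, so check `env.example` for the names the script actually reads.

```python
from pathlib import Path

# Assumed variable names; check env.example for the real ones.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


env_path = Path(".env")
if env_path.exists():
    print("Missing values:", missing_keys(env_path.read_text()) or "none")
```

An empty list from `missing_keys` means all three assumed values are present; it does not verify that the keys are valid.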

## Questions

Questions are configured in the `questions_code_sample.py` and `questions_code_complete.py` files.

```python
# By default, the notebook imports the short sample set of questions:
from questions_code_sample import prepare_questions
# To use the complete set of questions, replace the line above with:
# from questions_code_complete import prepare_questions
```

### Notes
> The script caches the pages downloaded in `./download_cache`, as well as previously given answers in `assessment_answers_{company}_{product}.json`. If you wish to force a re-execution, these files will have to be deleted.
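
To force a re-execution, the cached files can be removed by hand or with a small helper. This is a sketch only: `clear_cache` is a hypothetical function that is not part of the repository, and it assumes the company and product names appear in the file name exactly as given in the profile.

```python
import shutil
from pathlib import Path


def clear_cache(company: str, product: str, base_dir: str = ".") -> list[str]:
    """Delete the download cache and the saved answers for one vendor.

    Returns the paths that were actually removed.
    """
    base = Path(base_dir)
    removed = []

    # Pages downloaded during previous runs
    cache_dir = base / "download_cache"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        removed.append(str(cache_dir))

    # Previously given answers (assumes names are used verbatim in the file name)
    answers_file = base / f"assessment_answers_{company}_{product}.json"
    if answers_file.is_file():
        answers_file.unlink()
        removed.append(str(answers_file))

    return removed
```

Calling `clear_cache("Company Name", "Product Name")` from the notebook directory returns the list of paths that were deleted, or an empty list when nothing was cached.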

# Configuring Vendor Details

```python
import time

from llm_code import Profile

profile = Profile(
    company="Company Name",  # Enter the company name
    product="Product Name",  # Enter the product name
    url="https://www.company.com/product",  # Enter the product URL
)

# Record the start time so the total execution time can be reported at the end
start = time.time()
```

## Customizing questions

The questions are defined by a function in a Python module. The script passes the vendor profile to this function, which returns a questionnaire customized for that vendor.

Here is a short version for demonstration.

```python
from questions_code_sample import prepare_questions
# from questions_code_complete import prepare_questions
from IPython.display import Image, display, Markdown

questions = prepare_questions(profile)

# Show only the first three questions
for i, question in enumerate(questions):
    if i >= 3:
        break
    display(Markdown(f"## Question {i+1}: {question.get('label', 'General')}"))
    for key in question.keys():
        display(Markdown(f"**({key})**\n{question[key]}\n"))
```
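
To make the idea of a profile-driven questionnaire concrete, here is a minimal sketch of what such a function could look like. It is not the content of `questions_code_sample.py`: `DemoProfile`, `prepare_demo_questions`, and the `question` key are hypothetical. Only the `label` key is known from the display loop above.

```python
from dataclasses import dataclass


@dataclass
class DemoProfile:
    """Hypothetical stand-in for llm_code.Profile with the same three fields."""
    company: str
    product: str
    url: str


def prepare_demo_questions(profile: DemoProfile) -> list[dict]:
    """Build a questionnaire whose wording is tailored to one vendor."""
    return [
        {
            "label": "Privacy",  # 'label' is the key read by the display loop
            "question": (  # 'question' is an assumed key name
                f"Does {profile.company} publish a privacy policy "
                f"covering {profile.product}?"
            ),
        },
        {
            "label": "Security",
            "question": (
                f"Which security certifications does {profile.company} "
                f"hold for {profile.product}?"
            ),
        },
    ]


demo = prepare_demo_questions(
    DemoProfile("Company Name", "Product Name", "https://www.company.com/product")
)
print(demo[0]["question"])
```

Keeping the questions in a function rather than a static file lets each question mention the company and product by name, which tends to make search queries more specific.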

## Configuring an AI agent using LangGraph

The [LangGraph library](https://www.langchain.com/langgraph) provides a framework for controlling the execution flow of an AI agent. The agent can use tools to perform some of the tasks and an LLM to produce the final response to a question.

As described by the graph below, the agent
1. receives the question from the script,
2. decides whether it needs to use Google Search to find relevant documents,
3. passes the retrieved content back to the LLM, which decides what to do with it,
4. searches the internet again if the content isn't good enough, or gives up after too many attempts,
5. asks the LLM to answer the question.
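
The steps above can be sketched as a plain Python loop. This is not the graph built by `build_graph`, and it does not use LangGraph: `run_agent`, its callback parameters, and the limit of three attempts are all assumptions made for illustration. For simplicity the sketch always searches at least once, whereas the real agent decides whether a search is needed.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # assumed limit; the real value is defined in llm_code


def run_agent(
    question: str,
    search: Callable[[str], str],
    is_good_enough: Callable[[str, str], bool],
    answer: Callable[[str, str], str],
    max_attempts: int = MAX_ATTEMPTS,
) -> dict:
    """Search, grade the result, retry up to max_attempts, then answer."""
    content = ""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        content = search(question)             # step 2: call the search tool
        if is_good_enough(question, content):  # step 3: the LLM grades the content
            break                              # good content: stop searching
        # step 4: otherwise loop and search again until attempts run out
    # step 5: answer with whatever was gathered, even after giving up
    return {"answer": answer(question, content), "attempts": attempts}


# Usage with stub callbacks standing in for Google Search and the LLM
pages = iter(["", "The vendor publishes a SOC 2 report."])
result = run_agent(
    "Does the vendor have a SOC 2 report?",
    search=lambda q: next(pages),
    is_good_enough=lambda q, content: bool(content),
    answer=lambda q, content: content or "No information found.",
)
print(result)
```

Passing the search, grading, and answering steps as callbacks keeps the control flow testable without network access or an API key.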


```python
from llm_code import build_graph
from langchain_core.runnables.graph import MermaidDrawMethod  # CurveStyle, NodeStyles

graph = build_graph()

# Render the agent's graph as a PNG through the Mermaid API
display(
    Image(
        graph.get_graph().draw_mermaid_png(
            draw_method=MermaidDrawMethod.API,
        )
    )
)
```

Image 4: Visual representation of the agent’s workflow

## Asking the agent to answer each question

With the agent defined, we can then pass all our questions and ask it to search for answers.

```python
from llm_code import perform_assessment

answers = perform_assessment(questions, profile, graph)
```

After asking all questions and follow-up questions, answers are returned in JSON format, which allows us to easily manipulate them.


```python
display(Markdown(str(answers[0])))
```

# Producing the report

With the answers collected, we can ask the LLM to produce an executive summary and a detailed report.

```python
from llm_code import ask_llm
from prompt_code import make_summary_prompt
from reporting_code import summary_markdown, report_markdown

summary_prompt = make_summary_prompt(answers, profile)
summary = ask_llm(summary_prompt)
report = report_markdown(answers, profile)
```

```python
display(Markdown(summary_markdown(summary, profile)))
```

```python
display(Markdown(report))
```

# Reviewing the report

The Security Management Team (and any other teams involved in the review for the service) will then evaluate the reports to quickly gain a broad understanding of the service to guide their decision-making. To use their time as efficiently as possible, in most cases, they will read just the Executive Summary and only refer to the more detailed report if needed to confirm any specific concerns.

We can grasp whether sufficient information was available online to answer each question based on the ‘confidence score’ that the LLM assigns to each of its answers. If the confidence score is low, there was likely little information available. If the score is zero, there was nothing that the LLM thought it could use.

If the report contains many low or zero confidence scores, we can disregard it and fall back on the old-fashioned method of sending a questionnaire to the vendor. If there are just a few, we can reach out to the vendor and ask only those specific questions. We may then have answers within hours, or within minutes during a call, rather than the weeks (or longer) it typically takes to complete a full questionnaire.
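
This decision rule can be sketched in a few lines. The sketch is illustrative only: `triage` is a hypothetical helper, the `confidence` and `question` keys are assumed names for the answer fields, and both the 0-to-1 scale and the thresholds are assumptions rather than values taken from the script.

```python
LOW_CONFIDENCE = 0.5  # assumed cut-off on an assumed 0-to-1 scale
MAX_FOLLOW_UPS = 5    # assumed limit before a full questionnaire is sent


def triage(answers: list[dict]) -> dict:
    """Decide how to follow up based on low or zero confidence answers."""
    # A missing score is treated like a zero score
    low = [a for a in answers if a.get("confidence", 0) < LOW_CONFIDENCE]

    if not low:
        action = "accept_report"
    elif len(low) > MAX_FOLLOW_UPS:
        action = "send_full_questionnaire"
    else:
        action = "ask_follow_ups"

    return {"action": action, "follow_ups": [a["question"] for a in low]}


sample_answers = [
    {"question": "Is data encrypted at rest?", "confidence": 0.9},
    {"question": "Where is data stored?", "confidence": 0.0},
    {"question": "Is there a bug bounty?", "confidence": 0.3},
]
print(triage(sample_answers))
```

With the sample data, two of the three answers fall below the assumed cut-off, so the helper suggests asking the vendor only those two questions.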


```python
from reporting_code import report_confidence

confidence_report, improvements = report_confidence(answers, profile)

display(Markdown(confidence_report))
```

Some questions might fail, for example because the website isn’t friendly to automation, because the information isn’t where we expect to find it, or because the context window wasn’t large enough to read all the pages. For these questions, a manual check is likely to be necessary. We could also ask the vendor to improve their pages to cover these questions. See below for more about this.

# How much does executing this script cost?

The following code reports the estimated token costs.

```python
from reporting_code import calculate_token_counts, token_count_markdown

token_report = calculate_token_counts(profile)
display(Markdown(token_count_markdown(token_report)))
```
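
The arithmetic behind such an estimate is simple: token counts are multiplied by the per-token price for input and output separately. The sketch below is not the implementation in `reporting_code`. `estimate_cost` is a hypothetical helper, and the prices in the usage example are placeholders, not actual OpenAI prices.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of a run from token counts and per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices for illustration; look up the current prices for the model used
cost = estimate_cost(
    input_tokens=2_000_000,
    output_tokens=100_000,
    input_price_per_million=1.00,
    output_price_per_million=4.00,
)
print(f"Estimated cost: {cost:.2f}")
```

Input tokens usually dominate in this kind of workload because each question sends downloaded page content to the model while the answers stay short.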

# Asking the vendor to provide additional details on their website

We are now done with our assessment. This was a one-way process: our script searched the internet and collected answers to the questions we were interested in. As a bonus, *the vendor didn't have to do anything*, assuming all the information we needed was already published somewhere on their website.

But what if not all the information we needed was on their website? For information that we need in order to move forward, we will have to reach out to the vendor. One day, security teams across companies might talk to each other through APIs and secure handshakes. In the meantime, we can let the vendor know what we couldn’t find by signaling them through their corporate website.

The following step lists the questions for which our agent couldn't find answers and performs a GET request on `[vendor.domain]/compliance.txt` for each one with the question as a parameter.

Unlike `robots.txt` or `security.txt`, `compliance.txt` is not an established standard (at the time of writing), so the request is likely to fail. However, a vendor that monitors errors on their corporate website is likely to notice the hits on `/compliance.txt` and see the question. The user-agent configured for this request points back to the blog post.

The `compliance.txt` file can be empty, especially if everything is already documented on the vendor’s web pages. It could also contain, for example, the URL of the vendor’s Privacy Policy and any statements of evidence regarding their compliance. If those pages are hard to process through automation (e.g., pages rendered with JavaScript), publishing the terms of service, privacy policy, and other details about the company’s compliance status in plain text in this file could simplify the overall review process. Since the agent would then read vendor-controlled text, it is important to protect it against prompt injection attacks.
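
A request of this kind could be built as follows. This is a sketch using only the standard library, not the code of `request_for_improvement`: the function name `build_compliance_request`, the `question` parameter name, and the user-agent format are assumptions. The sketch only constructs the request and does not send it.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

BLOG_URL = "https://engineering.mercari.com/en/blog/entry/20241215-llms-at-work"


def build_compliance_request(product_url: str, question: str) -> Request:
    """Build a GET request for /compliance.txt on the vendor's domain."""
    parts = urlsplit(product_url)
    # The 'question' parameter name is an assumption
    query = urlencode({"question": question})
    target = f"{parts.scheme}://{parts.netloc}/compliance.txt?{query}"
    # Hypothetical user-agent format that points back to the blog post
    headers = {"User-Agent": f"vendor-assessment-bot (+{BLOG_URL})"}
    return Request(target, headers=headers, method="GET")


req = build_compliance_request(
    "https://www.company.com/product", "Do you encrypt data?"
)
print(req.full_url)
```

To send the request, pass it to `urllib.request.urlopen` with a timeout and catch `urllib.error.HTTPError`, since a 404 response is the expected outcome for most vendors.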


```python
from reporting_code import request_for_improvement

# Signal each unanswered question to the vendor through /compliance.txt
for answer in improvements:
    request_for_improvement(answer, profile)
```

```python
print(f"Total execution time: {time.time() - start:0.2f} seconds")
```