# Cluedo Client
This notebook demonstrates the workflow of a Retrieval-Augmented Generation (RAG) architecture. A Pinecone vector database was set up and populated with data in a separate notebook (cluedo_vdb_setup.ipynb).

The use case of this project is a murder mystery scenario inspired by the board game Cluedo. Users can submit questions about the characters, the events of the case, and—most importantly—the identity of the murderer.

The user’s query is first used to retrieve relevant contextual information from the vector database. This retrieved context is appended to the original question, forming an enhanced prompt that is then submitted to a large language model (LLM). If the provided context is insufficient to answer the question, the LLM is explicitly instructed to request additional information from the database.

For this project, the GPT-OSS-120B model with 120 billion parameters was selected. The model is accessed via the Inference API provided by Hugging Face (see https://huggingface.co/openai/gpt-oss-120b).

In [375]:
import yaml
import textwrap
from huggingface_hub import InferenceClient
from pinecone import Pinecone

In [376]:
# load the configuration; the context manager closes the file handle again
with open("conf.yml", "r") as conf_file:
    conf = yaml.safe_load(conf_file)

# credentials and configuration
hf_token = conf["hugging_face"]["token"]
pc_token = conf["pinecone"]["token"]
pc_index_name = "cluedo-py"
llm_name = "openai/gpt-oss-120b"
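The cell above expects `conf.yml` to contain a `hugging_face` and a `pinecone` section, each with a `token` entry. As a minimal sketch, the parsed configuration could be checked before any client is created. The helper name `missing_conf_keys` and the constant `REQUIRED_KEYS` are illustrative assumptions and not part of the original notebook.

```python
# Hypothetical helper (not in the original notebook): report required keys
# that are missing or empty in the parsed configuration dictionary.
REQUIRED_KEYS = {"hugging_face": ["token"], "pinecone": ["token"]}

def missing_conf_keys(conf):
    """Return dotted paths of required keys that are absent or empty."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = conf.get(section) or {}
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
    return missing

# Example with placeholder values; real tokens belong in conf.yml only.
example_conf = {"hugging_face": {"token": "hf_placeholder"}, "pinecone": {"token": ""}}
print(missing_conf_keys(example_conf))
```

Calling `missing_conf_keys(conf)` right after loading the file gives a readable list of gaps instead of a `KeyError` further down.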

## Retrieve context from vector database

In [362]:
pc_client = Pinecone(api_key=pc_token)

# retrieve context from the vector db based on a query parameter
def retrieve_context(query, context_size, verbose=False):
    index = pc_client.Index(pc_index_name)

    # search the dense index and rerank results
    reranked_results = index.search(
        namespace="__default__",
        query={
            "top_k": context_size,
            "inputs": {
                'text': query
            }
        },
        rerank={
            "model": "bge-reranker-v2-m3",
            "top_n": context_size,
            "rank_fields": ["chunk_text"]
        } 
    )

    # join the context entries to create one string
    if verbose:
        print("Retrieved context: ")
    
    context = ""
    for hit in reranked_results['result']['hits']:
        context += hit['fields']['chunk_text'] + " "
        
        if verbose:
            print(f"id: {hit['_id']:<5} | score: {round(hit['_score'], 2):<5} | text: {hit['fields']['chunk_text']:<50}")
    
    if verbose:
        print("---------")

    return context.strip()

## Send query to LLM

In [363]:
llm_client = InferenceClient(token=hf_token)

# send a query to the llm, return response
def submit_to_llm(query):
    instructions = """
        Analyze the contextual information about an old estate and the people living and working there. 
        There is more contextual information that you can ask about.
        If you don't have enough contextual information to answer the question with high certainty, write "Need more information about: X" 
        and specify X as one specific term mentioned in the context. 
        Do not ask about general concepts like "motive" or "alibi". 
        If you have enough contextual information to answer the question with high certainty, answer the question in one paragraph and argue your reasoning.
    """

    response = llm_client.chat.completions.create(
        model=llm_name,
        messages=[{ "role": "user", "content": instructions + query }]
    )

    return response

## Examples

In [368]:
def ask_cluedo_rag(query):
    curr_query = query
    context = ""
    answer = ""
    context_size = 10
    retrieval_count = 0

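    # repeat while there is no answer yet or the llm asks for more context (at most 5 retrievals)
    while retrieval_count < 5 and (not answer or answer.startswith("Need more information about: ")):
        # retrieve context to question and append it, separated by a space
        new_context = retrieve_context(curr_query, context_size, verbose=False)
        context = " ".join([context, new_context]).strip()
        enhanced_query = " ".join([context, query])
        retrieval_count += 1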

        # ask llm the question + context
        response = submit_to_llm(enhanced_query)

        # output the llm's answer
        answer = response.choices[0].message.content or ""
        
        # if the llm needs more information, update query and repeat
        if str.startswith(answer, "Need more information about: "):
            curr_query = str.removeprefix(answer, "Need more information about: ")
            print("Retrieving new context about "+curr_query)

    # Either print the answer or state that there is not enough information in the database
    if str.startswith(answer, "Need more information about: "):
        print("There is not enough information to answer this question.")
    else:
        if retrieval_count > 1:
            print("---------")
        print(textwrap.fill(answer, width=120))
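The loop above depends on the model replying with the exact prefix `Need more information about: ` and uses the rest of the reply as the next query. In the examples below the extracted term can carry trailing punctuation (e.g. "drink."). The following is a minimal sketch of a more tolerant parser. The function name `extract_followup_term` and the constant `FOLLOWUP_PREFIX` are illustrative assumptions and not part of the original notebook.

```python
# Hypothetical helper (not in the original notebook): extract the requested
# term from a follow-up reply and clean it up before the next retrieval.
FOLLOWUP_PREFIX = "Need more information about:"

def extract_followup_term(answer):
    """Return the requested term, or None if there is no usable follow-up request."""
    text = (answer or "").strip()
    if not text.startswith(FOLLOWUP_PREFIX):
        return None
    rest = text[len(FOLLOWUP_PREFIX):].strip()
    # keep only the first line in case the model adds an explanation afterwards
    first_line = rest.splitlines()[0] if rest else ""
    # drop surrounding quotes, spaces and trailing full stops
    return first_line.strip(" .\"'") or None

print(extract_followup_term("Need more information about: drink."))
```

A return value of `None` covers both a final answer and a follow-up request without a term, so the caller would still need to tell these two cases apart if that distinction matters.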

In [369]:
question = "Which locations does the estate have and what happens in these locations?"

ask_cluedo_rag(question)

The estate comprises several distinct locations, each with its own function: the **guesthouse**, where overnight
visitors are accommodated; the **garage**, used for storing and maintaining the lord’s vehicles; the **basement**, a
lower‑level space that likely houses utilities, storage, or service areas; the **large garden**, tended by the gardener
and providing outdoor space for recreation, landscaping, and possibly food production; and the **shed**, a restricted
area that can only be accessed by the gardener, the butler, and the lord, where tools, equipment, or supplies are kept.
The **lord** oversees the property, the **maid** works the household (though her specific duties aren’t detailed), the
**supplier** delivers food weekly to the estate (presumably to the kitchen or pantry), and the **gardener** maintains
the garden and accesses the shed for maintenance tasks.


In [370]:
question = "Which characters live and work around the estate?"

ask_cluedo_rag(question)

The people who both live and work on the estate, based on the provided details, are the lord (the owner and resident),
the maid (the newest employee), the gardener (who has shed access and drives to work), the butler (who also has shed
access), the chef (who drives to work with the gardener), and the supplier (who delivers food weekly). These six
individuals are explicitly mentioned as either staff members or the proprietor who are present on‑site for work or
residence.


In [371]:
question = "What do people at the estate like to eat and drink?"

ask_cluedo_rag(question)

Retrieving new context about drink.
---------
The residents’ and staff’s food and drink preferences can be deduced from the details given: the lord’s favourite meals
are meat‑based, so he regularly eats meat and, although his specific beverage isn’t stated, his gambling habit and the
household’s access to alcohol (via the chef) suggest he likely drinks alcohol as well; the lady is a vegetarian, so she
eats plant‑based dishes and would avoid meat; the chef’s alcohol problem indicates he himself enjoys drinking alcohol;
the chauffeur is a recovering alcoholic, so he deliberately refrains from alcohol; the gardener, maid and supplier have
no explicit preferences mentioned, so their eating and drinking habits remain unspecified.


In [373]:
question = "How disciplined are the employees of the lord?"

ask_cluedo_rag(question)

Retrieving new context about discipline.
Retrieving new context about punctuality
---------
The lord’s staff show a mixed level of discipline: the chauffeur appears reliable—he consistently cares for the lord’s
yellow Peugeot and drives him to appointments—while the butler enforces strict rules, constantly reprimanding the
gardener for taking “too many breaks,” indicating his own disciplined attitude. By contrast, the gardener’s absenteeism
(he is sick and missed work) and the chef’s known alcohol problem, plus his illegal driving (he has no licence yet still
drives to work with the gardener), reveal considerable lapses in personal discipline. The maid, supplier and gardener’s
routine duties are not described as problematic, suggesting they are at least nominally compliant. Overall, some
employees (the chauffeur and butler) are disciplined, but others (the chef and gardener) exhibit notable indiscipline.


In [374]:
question = "What are the details of the murder case?"

ask_cluedo_rag(question)

Retrieving new context about murderer.
---------
The guest—an employee of a bank who had been invited to the estate by the lord—was found dead in the garden, having been
bludgeoned to death with a shovel. The shovel is most plausibly one of the tools kept in the garden shed, which the
gardener routinely uses while working the grounds; the gardener is also the cousin of the chef, giving him easy access
to the weapon and the crime scene. Adding motive, the guest was carrying on an affair with the 25‑year‑old maid (the
butler’s niece and the estate’s newest employee), a relationship that could have provoked jealousy or a lethal
confrontation, while the lady of the house, a vegetarian, is in a personal feud with the chef (who is employed by the
lord), suggesting the chef might also have a grievance but no direct link to the shovel. Thus, the confirmed facts of
the case are: victim = bank employee guest; weapon = shovel from the garden shed; location = garden; relationship ties =
affair wit

## Conclusion
Prompting the model to explicitly request additional information on a given topic works well in several of the examples above: when the LLM issues a follow-up query (e.g., "drink" or "punctuality"), the term is often highly relevant and well suited for retrieval from the database. It remains unclear, however, under which conditions the LLM decides to request further information.

Future experiments with a larger dataset could show how the context size, i.e. the number of entries retrieved per query, affects answer quality. At present, each retrieval returns 10 entries. A context size that is too small may exclude relevant information (e.g., for the query "Who works at the estate?", the number of employees that can be retrieved is limited by the context size), whereas an excessively large one may introduce irrelevant information and thereby reduce model performance.

Further investigation should also consider the size of the total context. As the model requests additional information across multiple topics, newly retrieved content is appended to the existing context, so the prompt grows with every round. In the current implementation, the number of retrievals per question is capped at five (the initial retrieval plus up to four follow-up requests), which limits the accumulated context to at most 50 entries.
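One way to keep the accumulated context bounded is to drop chunks that were already retrieved in an earlier round and to stop adding chunks once a size budget is reached. The following is a minimal sketch under two assumptions: the retrieval step returns a list of chunk strings instead of one joined string, and a character budget (`max_chars`, an arbitrary illustrative value) is an acceptable stand-in for a token limit. The function name `merge_chunks` is hypothetical.

```python
# Hypothetical helper (not in the original notebook): merge newly retrieved
# chunks into the existing context without duplicates and within a budget.
def merge_chunks(existing, new, max_chars=4000):
    """Append unseen chunks to the context, stopping before max_chars is exceeded."""
    merged = list(existing)
    seen = set(existing)
    total = sum(len(chunk) for chunk in merged)
    for chunk in new:
        if chunk in seen:
            continue  # already retrieved in an earlier round
        if total + len(chunk) > max_chars:
            break  # budget exhausted; remaining chunks are ranked lower
        merged.append(chunk)
        seen.add(chunk)
        total += len(chunk)
    return merged

print(merge_chunks(["a", "b"], ["b", "c"], max_chars=10))
```

The merged list can then be joined with spaces to build the prompt, as the notebook already does for a single retrieval.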