# Midnight mystery exam
I will explore several approaches to work out from the dataset who the murderer is, and then combine the information each approach yields to narrow down the suspects.

---------------------------------------- <br>
To start, I uploaded the JSON file to Google Drive and imported it into the notebook from there.

In [8]:
import json

from google.colab import drive
drive.mount('/content/drive')


# path to the dataset on Google Drive
file_path = "/content/drive/MyDrive/murder_mystery_exam.json"

# function to load dataset
def load_dataset(path):
    try:
        with open(path, "r") as f:
            data = json.load(f)
        print("Dataset loaded")
        return data
    except Exception as e:
        print("Failed to load dataset:", e)
        return None

data = load_dataset(file_path)

if data:
    print("Case:", data["metadata"]["case_name"])
    print("Victim:", data["metadata"]["victim"])
    print("Interrogations:", len(data["interrogations"]))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Dataset loaded
Case: The Midnight Mystery
Victim: Solicitor Gray
Interrogations: 59


Then I install the `openai` package to call the LLM through OpenRouter.

In [9]:
%pip install openai



My first idea is to build two LLM-based functions that extract two types of information from the interrogations: <br>
- the self-reported information;
- the information about other people contained in each interrogation.
<br><br> I ask for this information in JSON format to make later manipulation easier.

<br> I also extract location information, which can then be used to detect inconsistencies among interrogations.

In [12]:
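import os
from openai import OpenAI

# read the OpenRouter key from the environment instead of hardcoding it in the notebook
# (the variable name OPENROUTER_API_KEY is a choice made here, set it before running)
client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1"
)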

def extract_self_info(text, model="openai/gpt-4.1"):

    messages = [
        {"role": "system", "content": "You are an assistant that extracts structured information from witness statements. Respond only with a JSON object."},
        {"role": "user", "content": f"Extract the self-related information from this witness statement:\n\n'{text}'\n\nReturn a compact JSON object with the following keys: location, activity, time, emotion, and note_mention (True/False)."}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        n=1,  # number of responses to generate (can be changed)

    )

    # Monitor token usage
    print("Prompt tokens:", response.usage.prompt_tokens)
    print("Completion tokens:", response.usage.completion_tokens)
    print("Total tokens:", response.usage.total_tokens)

    return response.choices[0].message.content  # text only; token usage is in response.usage




def extract_other_info(text, model="openai/gpt-4-turbo"):

    # keep the prompt short to limit token usage
    messages = [
        {"role": "system", "content": "You are an assistant that extracts structured relational observations from witness statements."},
        {"role": "user", "content": f"""
        From the following witness statement:

        '{text}'

        Extract all mentions of other people seen by the speaker.
        For each person, return a dictionary with the keys: person_name, location, time, and interaction (True/False).
        Return a JSON list of these dictionaries. If nobody is mentioned, return an empty list: [].
        """}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        n=1,  # number of responses to generate (can be changed)

    )

    # Monitor token usage
    print("Prompt tokens:", response.usage.prompt_tokens)
    print("Completion tokens:", response.usage.completion_tokens)
    print("Total tokens:", response.usage.total_tokens)

    return response.choices[0].message.content



To avoid wasting tokens and to make the extracted data easier to reuse, I save the results to a JSON file. <br> The code only processes guests that have not been processed yet, because some of them may fail and the cell then needs to be run again to retry extracting their features.

In [28]:
import os
import time

# Path for the output file on google drive
output_path = "/content/drive/MyDrive/llm_extracted_info_exam.json"

# check if cached results already exist
if os.path.exists(output_path):
    print("JSON file already exists on drive")
    with open(output_path, "r") as f:
        all_results = json.load(f)
else:
    all_results = []

# guests that were already processed
processed_guests = {r["guest"] for r in all_results}

for entry in data["interrogations"]:
    guest = entry["guest"]
    statement = entry["statement"]

    # avoid token waste
    if guest in processed_guests:
        print(f"skipping {guest} (already processed)")
        continue

    print(f"analyzing: {guest}")

    try:
        self_info_raw = extract_self_info(statement)
        print("DEBUG self_info:", self_info_raw)
        self_info = json.loads(self_info_raw)

        other_info_raw = extract_other_info(statement)
        print("DEBUG other_info:", other_info_raw)
        other_info = json.loads(other_info_raw)
    except Exception as e:
        print(f"Failed to process {guest}: {e}")
        continue

    # save results
    result = {
        "guest": guest,
        "self_info": self_info,
        "others_info": other_info
    }
    all_results.append(result)

    # pause added to avoid hitting rate limits or overloading the API
    time.sleep(0.7)


with open(output_path, "w") as f:
    json.dump(all_results, f, indent=2)

print("results saved to:", output_path)

JSON file already exists on drive
skipping Professor Blackstone (already processed)
skipping Ambassador Indigo (already processed)
skipping Baron Brown (already processed)
skipping Mister Fitzgerald (already processed)
skipping Commodore White (already processed)
skipping Magistrate Ochre (already processed)
skipping Doctor Scarlett (already processed)
skipping Miss Azure (already processed)
skipping Baron Blackwood (already processed)
skipping Counselor Scarlett (already processed)
skipping Rector Violet (already processed)
skipping Barrister Beaumont (already processed)
skipping Commodore Ebony (already processed)
skipping Baron Sienna (already processed)
skipping Magistrate Ruby (already processed)
skipping Colonel Ravenswood (already processed)
analyzing: Major Beaumont
Prompt tokens: 473
Completion tokens: 371
Total tokens: 844
DEBUG self_info: {
  "location": "billiard room",
  "activity": [
    "talking with friends",
    "discussing the evening's events with Miss Coral",
    "e

One problem with this approach is that the last cell is very slow to execute and some guests fail to process; given the large number of guests, token usage is also high.

After the first run, in which a good share of the guests failed, I re-executed the cell to see whether that would resolve the failures. Re-running recovered most of the guests that had failed, but after the third execution 3 guests still failed to process.
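
One possible cause of the failed guests (an assumption, since the failing replies are not shown here) is that the model sometimes wraps its JSON in a Markdown code fence, which `json.loads` rejects. A minimal sketch of a more tolerant parser follows; `parse_llm_json` is a hypothetical helper, not part of the notebook, and it would replace the direct `json.loads` calls in the loop.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_llm_json(raw):
    """Parse a model reply as JSON, tolerating a Markdown code fence around it."""
    lines = raw.strip().splitlines()
    # drop an opening fence line such as three backticks followed by "json"
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    # drop the closing fence line if present
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return json.loads("\n".join(lines))


fenced_reply = FENCE + 'json\n{"location": "study", "note_mention": true}\n' + FENCE
print(parse_llm_json(fenced_reply))
print(parse_llm_json("[]"))
```

Replies that are still not valid JSON raise `json.JSONDecodeError` as before, so the existing `except` branch keeps working.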

I proceed with this pipeline anyway: <br>

<br>From the JSON file created above I build DataFrames with pandas to make further analysis easier. <br> I create three tables: one for the self info, one for the others info, and one with the aggregated information.

In [42]:
import pandas as pd

with open("/content/drive/MyDrive/llm_extracted_info_exam.json", "r") as f:
    all_results = json.load(f)

# build a DataFrame with the self-related information

self_rows = []
for entry in all_results:
    guest = entry.get("guest")
    self_info = entry.get("self_info", {})

    self_rows.append({
        "guest": guest,
        "location": self_info.get("location"),
        "activity": self_info.get("activity"),
        "time": self_info.get("time"),
        "emotion": self_info.get("emotion"),
        "note_mention": self_info.get("note_mention")
    })

df_self = pd.DataFrame(self_rows)

Visualizing the resulting DataFrame:

In [43]:
print(df_self.columns)
df_self.info()
df_self.head()

Index(['guest', 'location', 'activity', 'time', 'emotion', 'note_mention'], dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   guest         56 non-null     object
 1   location      56 non-null     object
 2   activity      56 non-null     object
 3   time          56 non-null     object
 4   emotion       54 non-null     object
 5   note_mention  56 non-null     bool  
dtypes: bool(1), object(5)
memory usage: 2.4+ KB


Unnamed: 0,guest,location,activity,time,emotion,note_mention
0,Ambassador Indigo,conservatory,"listening to music, discussing events and busi...",from about 11:30pm until well after midnight,confused (couldn't understand why when I heard...,False
1,Baron Brown,study,"[having a drink, playing cards, chatting, talk...",from about 11:30pm until well after midnight,shocked,True
2,Magistrate Ochre,garden,"[having a drink, chatting, conversation about ...",evenings,"[routine, unable to process what happened, sur...",True
3,Doctor Scarlett,kitchen,"[writing letters, relaxing, talking with Count...",11:50pm,confused,True
4,Miss Azure,kitchen,"smoking a pipe, having drinks, discussing even...",around 11:50pm,surprise/disbelief,True


In [44]:
others_rows = []

for entry in all_results:
    observer = entry.get("guest")
    for obs in entry.get("others_info", []):
        others_rows.append({
            "observer": observer,
            "person_seen": obs.get("person_name"),
            "location": obs.get("location"),
            "time": obs.get("time"),
            "interaction": obs.get("interaction")
        })

df_others = pd.DataFrame(others_rows)

Visualizing the resulting DataFrame:

In [45]:
print(df_others.columns)

df_others.head()

Index(['observer', 'person_seen', 'location', 'time', 'interaction'], dtype='object')


Unnamed: 0,observer,person_seen,location,time,interaction
0,Ambassador Indigo,Magistrate Ruby,conservatory,11:30pm until well after midnight,True
1,Ambassador Indigo,Blackwood,conservatory,11:30pm until well after midnight,True
2,Ambassador Indigo,Brown,conservatory,11:30pm until well after midnight,True
3,Ambassador Indigo,Magistrate Ochre,conservatory,11:30pm until well after midnight,True
4,Ambassador Indigo,Crimson,conservatory,11:30pm until well after midnight,True


Aggregate the information into a single DataFrame:

In [46]:
# merge the two DataFrames for combined analysis
df_merged = df_others.merge(df_self, left_on="observer", right_on="guest", suffixes=("_seen", "_self"))

df_merged.head()

Unnamed: 0,observer,person_seen,location_seen,time_seen,interaction,guest,location_self,activity,time_self,emotion,note_mention
0,Ambassador Indigo,Magistrate Ruby,conservatory,11:30pm until well after midnight,True,Ambassador Indigo,conservatory,"listening to music, discussing events and busi...",from about 11:30pm until well after midnight,confused (couldn't understand why when I heard...,False
1,Ambassador Indigo,Blackwood,conservatory,11:30pm until well after midnight,True,Ambassador Indigo,conservatory,"listening to music, discussing events and busi...",from about 11:30pm until well after midnight,confused (couldn't understand why when I heard...,False
2,Ambassador Indigo,Brown,conservatory,11:30pm until well after midnight,True,Ambassador Indigo,conservatory,"listening to music, discussing events and busi...",from about 11:30pm until well after midnight,confused (couldn't understand why when I heard...,False
3,Ambassador Indigo,Magistrate Ochre,conservatory,11:30pm until well after midnight,True,Ambassador Indigo,conservatory,"listening to music, discussing events and busi...",from about 11:30pm until well after midnight,confused (couldn't understand why when I heard...,False
4,Ambassador Indigo,Crimson,conservatory,11:30pm until well after midnight,True,Ambassador Indigo,conservatory,"listening to music, discussing events and busi...",from about 11:30pm until well after midnight,confused (couldn't understand why when I heard...,False


In [47]:
df_merged.describe(percentiles=[])

Unnamed: 0,observer,person_seen,location_seen,time_seen,interaction,guest,location_self,activity,time_self,emotion,note_mention
count,1427,1427,1395,1395,1427,1427,1427,1427,1427,1372,1427
unique,54,101,15,12,2,54,15,54,14,29,2
top,Ambassador Indigo,Ravenswood,kitchen,11:50pm,True,Ambassador Indigo,kitchen,"listening to music, discussing events and busi...",11:50pm,worried,True
freq,35,26,152,503,1419,35,152,35,363,161,1195


From the DataFrame, we check whether any guest was not seen by anyone, which would make that guest a suspect:

In [49]:
# named all_guests to avoid shadowing the built-in all()
all_guests = set(df_others["observer"]).union(set(df_others["person_seen"]))

seen = set(df_others["person_seen"])

not_seen = all_guests - seen

print("guests not seen by anyone:")
print(not_seen)

guests not seen by anyone:
set()
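
Some `person_seen` values in the table above are bare surnames (for example `Blackwood` or `Brown`), while observers are stored with their title. A set difference on raw strings can therefore miss matches. The sketch below shows one way to map a mention back to full guest names; `resolve_name` is a hypothetical helper and the matching rule (exact match, or match on the last word) is an assumption that would need checking against the real data.

```python
def resolve_name(mention, guests):
    """Return the guests whose full name equals the mention or whose last word matches it."""
    key = mention.strip().lower()
    return [g for g in guests if g.lower() == key or g.lower().split()[-1] == key]


# a few guest names taken from the interrogation list
guests = ["Major Beaumont", "Barrister Beaumont", "Baron Brown", "Ambassador Indigo"]

for mention in ["Brown", "Beaumont", "Ambassador Indigo", "Crimson"]:
    matches = resolve_name(mention, guests)
    if len(matches) == 1:
        status = "resolved"
    elif matches:
        status = "ambiguous"
    else:
        status = "unknown"
    print(f"{mention}: {status} {matches}")
```

Ambiguous surnames (two guests named Beaumont, for instance) cannot be resolved from the name alone, so those sightings are better kept separate than assigned to one guest.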


No guest went unseen overall, so the check needs to be restricted to sightings at the time of the murder:

In [50]:
# see what values are there for the time
print(df_others["time"].unique())

['11:30pm until well after midnight' 'evening' '11:50pm' None
 '11:30pm until after midnight' 'late in the evening' 'evenings'
 'before midnight' 'unspecified' 'when the clock struck twelve'
 '11:30pm to after midnight'
 'from about 11:30pm until well after midnight' 'just before midnight']


All time references refer to the time of the murder, except for 'unspecified' (and the missing values):

In [56]:
df_others[df_others["time"].str.contains("unspecified", case=False, na=False)]

Unnamed: 0,observer,person_seen,location,time,interaction
212,Baron Sienna,Count Silver,cellar,unspecified,True
213,Baron Sienna,Blackwood,cellar,unspecified,True
214,Baron Sienna,Ivory,cellar,unspecified,True
215,Baron Sienna,Northbrook,cellar,unspecified,True
216,Baron Sienna,Peacock,cellar,unspecified,True
217,Baron Sienna,Dean Stonehaven,cellar,unspecified,True
218,Baron Sienna,Beaumont,cellar,unspecified,True
220,Baron Sienna,Ravenswood,cellar,unspecified,True
221,Baron Sienna,Magistrate Ruby,cellar,unspecified,True
222,Baron Sienna,Archbishop Coral,cellar,unspecified,True


Another direction that I have not had time to explore is modeling guest relations as a directed graph built from the DataFrame, in which nodes represent characters and edges indicate who saw whom. This graph could be analyzed to detect anomalous behavior or isolated guests, or used together with graph neural network (GNN) techniques to infer suspects based on social structure and statements.
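
As a rough sketch of that idea (not something that was run on the dataset), the graph can be held in a plain dictionary of sets. The edge list is a toy example with placeholder names, and `build_sighting_graph`, `in_degree` and `one_sided` are hypothetical helpers.

```python
from collections import defaultdict


def build_sighting_graph(edges):
    """Map each observer to the set of people they report having seen."""
    graph = defaultdict(set)
    for observer, person_seen in edges:
        graph[observer].add(person_seen)
    return graph


def in_degree(graph):
    """Count how many distinct observers report seeing each person."""
    counts = {}
    for observer, seen in graph.items():
        counts.setdefault(observer, 0)
        for person in seen:
            counts[person] = counts.get(person, 0) + 1
    return counts


def one_sided(graph):
    """Edges A -> B where B does not report seeing A back."""
    return sorted(
        (a, b) for a, seen in graph.items() for b in seen
        if a not in graph.get(b, set())
    )


# toy edges: (observer, person_seen)
edges = [("A", "B"), ("B", "A"), ("A", "C")]
graph = build_sighting_graph(edges)

print(in_degree(graph))
print(one_sided(graph))
```

A person with in-degree 0 was seen by nobody, and one-sided edges point to sightings that the other party does not confirm; both are candidates for a closer look.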










-------------------------------------

The next step is to compare the victim's text with the suspects' statements: I measure the similarity between the victim_note and the interrogations to identify semantic correspondences, such as consistent events, commonplaces, overlapping tenses or narrative contradictions. <br>
As a model, I used all-mpnet-base-v2, a Sentence-Transformers model built on MPNet:

In [20]:
%pip install sentence-transformers


In [21]:
from sentence_transformers import SentenceTransformer, util

# load model
model = SentenceTransformer("all-mpnet-base-v2")

victim_note = data["metadata"]["victim_note"]

interrogations = [(entry["guest"], entry["statement"]) for entry in data["interrogations"]]

# encode victim note
victim_embedding = model.encode(victim_note, convert_to_tensor=True)

# encode each interrogation and calculate cosine similarity
results = []
for guest, statement in interrogations:
    statement_embedding = model.encode(statement, convert_to_tensor=True)
    similarity = util.cos_sim(victim_embedding, statement_embedding).item()
    results.append((guest, similarity))

# sort by similarity descending
results.sort(key=lambda x: x[1], reverse=True)

for guest, score in results[:10]:
    print(f"{guest}: {score:.4f}")


Viscount Pemberton: 0.4467
Baron Sienna: 0.4099
Chancellor Harrington: 0.3936
Duchess Ravenswood: 0.3828
Dean Stonehaven: 0.3824
Baron Whitehall: 0.3816
Magistrate Ruby: 0.3814
Baron Brown: 0.3776
Countess Grimshaw: 0.3744
Magistrate Ochre: 0.3683


With this approach, some guests appear semantically similar to the victim's letter. We can keep the top 5 as possible suspects.
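
For reference, the score printed for each guest is a cosine similarity: the dot product of the two embedding vectors divided by the product of their lengths. A minimal pure-Python sketch on toy vectors (not real embeddings) follows; `cosine_similarity` is an illustrative helper written here, not the library function.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

The value only depends on direction, not on vector length, so a long statement and a short note can still score high if they are about the same things.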

Another possible approach is to compute embeddings of the interrogations and cluster them. <br> The idea is to use DBSCAN as the clustering method to detect possible outliers. <br>
The reasoning is that a semantically distant statement mentions things that no one else mentions: its author narrates something unique, and the story does not match the "story patterns" of the other guests.


In [30]:
from sklearn.cluster import DBSCAN
import numpy as np
import matplotlib.pyplot as plt


statements = [entry["statement"] for entry in data["interrogations"]]
guests = [entry["guest"] for entry in data["interrogations"]]

# compute the embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(statements)

# clustering with DBSCAN
db = DBSCAN(eps=0.4, min_samples=2, metric='cosine').fit(embeddings)
labels = db.labels_

for guest, label in zip(guests, labels):
    print(f"{guest}: cluster {label}")

# identification of outliers
outliers = [guests[i] for i, label in enumerate(labels) if label == -1]
print("\nOutlier (semantically distant):", outliers)

Professor Blackstone: cluster 0
Ambassador Indigo: cluster 0
Baron Brown: cluster 0
Mister Fitzgerald: cluster 0
Commodore White: cluster 0
Magistrate Ochre: cluster 0
Doctor Scarlett: cluster 0
Miss Azure: cluster 0
Baron Blackwood: cluster 0
Counselor Scarlett: cluster 0
Rector Violet: cluster 0
Barrister Beaumont: cluster 0
Commodore Ebony: cluster 0
Baron Sienna: cluster 0
Magistrate Ruby: cluster 0
Colonel Ravenswood: cluster 0
Major Beaumont: cluster 0
Earl Pearl: cluster 0
Ambassador Gold: cluster 0
Duchess Ravenswood: cluster 0
Brigadier Black: cluster 0
Miss Coral: cluster 0
Mister Onyx: cluster 0
Chancellor Harrington: cluster 0
Countess Grimshaw: cluster 0
Doctor Ashcroft: cluster -1
Archbishop Whitmore: cluster 0
Madame Northbrook: cluster 0
Viscountess White: cluster 0
Rector Sapphire: cluster 0
Solicitor Sinclair: cluster 0
Viscount Pemberton: cluster 0
Judge Winthrop: cluster 0
Lord Green: cluster 0
Baron Nightingale: cluster 0
General White: cluster 0
Baron Whitehall: c

In the first execution, with eps = 1, no outliers were identified (all guests were assigned to cluster 0). I ran it again with a stricter value (0.4): with this value DBSCAN identifies Doctor Ashcroft as an outlier, so we can add him to the suspects.
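
The sensitivity to eps has a simple reading when `min_samples=2`. In scikit-learn's DBSCAN, `min_samples` counts the point itself, so any statement with at least one other statement within eps is a core point, and the only points labeled -1 are those whose nearest neighbor is farther than eps. The sketch below computes that nearest-neighbor distance in pure Python on toy 2-D vectors (not real embeddings); the helper names are illustrative.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_neighbor_distances(vectors):
    """Distance from each vector to its closest other vector."""
    return [
        min(cosine_distance(u, v) for j, v in enumerate(vectors) if j != i)
        for i, u in enumerate(vectors)
    ]


def isolated_points(vectors, eps):
    """Indices with no other vector within eps (noise when min_samples=2)."""
    return [i for i, d in enumerate(nearest_neighbor_distances(vectors)) if d > eps]


# toy vectors: the first two point almost the same way, the third does not
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

print(nearest_neighbor_distances(vectors))
print(isolated_points(vectors, eps=0.4))
print(isolated_points(vectors, eps=1.0))
```

Sorting the real nearest-neighbor distances would show how far a flagged guest sits from the chosen eps, which helps judge whether the outlier is robust or an artifact of the threshold.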

<br> <br>

I also compute the guest with the highest average cosine distance in the embedding space (who would be expected to be flagged as an outlier by DBSCAN as well):

In [32]:
from sklearn.metrics.pairwise import cosine_distances

distance_matrix = cosine_distances(embeddings)

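# average distance for each witness (the zero self-distance on the diagonal is included in the mean)
avg_distances = distance_matrix.mean(axis=1)

# use `guests`, which has the same order and length as `embeddings`
# (all_results comes from the cached extraction and is not guaranteed to be aligned)
guest_names = guests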
# identification of the most distant witness (semantic outlier)
most_distant_idx = np.argmax(avg_distances)
most_distant_guest = guest_names[most_distant_idx]
most_distant_value = avg_distances[most_distant_idx]

print(f"The most semantically distant guest is: {most_distant_guest} (average distance: {most_distant_value:.4f})")

The most semantically distant guest is: Mister Coral (average distance: 0.4665)


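The output reports Mister Coral as the most semantically distant guest, so we can add him to the suspect list. This output was obtained with guest names read from `all_results`, whose cached entries (56 in the table above) are not guaranteed to line up with the 59 interrogation embeddings, so the name should be double-checked against the `guests` list.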

I repeat the same procedure, but instead of `all-MiniLM-L6-v2` I use `all-mpnet-base-v2`, the model used above to measure the similarity between victim_note and the interrogations, to see whether and how much the results change.

In [31]:
from sklearn.cluster import DBSCAN
import numpy as np
import matplotlib.pyplot as plt


statements = [entry["statement"] for entry in data["interrogations"]]
guests = [entry["guest"] for entry in data["interrogations"]]

# compute the embeddings
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(statements)

# clustering with DBSCAN
db = DBSCAN(eps=0.4, min_samples=2, metric='cosine').fit(embeddings)
labels = db.labels_

for guest, label in zip(guests, labels):
    print(f"{guest}: cluster {label}")

# identification of outliers
outliers = [guests[i] for i, label in enumerate(labels) if label == -1]
print("\nOutlier (semantically distant):", outliers)

Professor Blackstone: cluster 0
Ambassador Indigo: cluster 0
Baron Brown: cluster 0
Mister Fitzgerald: cluster 0
Commodore White: cluster 0
Magistrate Ochre: cluster 0
Doctor Scarlett: cluster 0
Miss Azure: cluster 0
Baron Blackwood: cluster 0
Counselor Scarlett: cluster 0
Rector Violet: cluster 0
Barrister Beaumont: cluster 0
Commodore Ebony: cluster 0
Baron Sienna: cluster 0
Magistrate Ruby: cluster 0
Colonel Ravenswood: cluster 0
Major Beaumont: cluster 0
Earl Pearl: cluster 0
Ambassador Gold: cluster 0
Duchess Ravenswood: cluster 0
Brigadier Black: cluster 0
Miss Coral: cluster 0
Mister Onyx: cluster 0
Chancellor Harrington: cluster 0
Countess Grimshaw: cluster 0
Doctor Ashcroft: cluster 0
Archbishop Whitmore: cluster 0
Madame Northbrook: cluster 0
Viscountess White: cluster 0
Rector Sapphire: cluster 0
Solicitor Sinclair: cluster 0
Viscount Pemberton: cluster 0
Judge Winthrop: cluster 0
Lord Green: cluster 0
Baron Nightingale: cluster 0
General White: cluster 0
Baron Whitehall: cl

With these embeddings there are no outliers: with eps = 0.4, all guests are assigned to the same cluster.

In [33]:
from sklearn.metrics.pairwise import cosine_distances

distance_matrix = cosine_distances(embeddings)

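# average distance for each witness (the zero self-distance on the diagonal is included in the mean)
avg_distances = distance_matrix.mean(axis=1)

# use `guests`, which has the same order and length as `embeddings`
# (all_results comes from the cached extraction and is not guaranteed to be aligned)
guest_names = guests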
# identification of the most distant witness (semantic outlier)
most_distant_idx = np.argmax(avg_distances)
most_distant_guest = guest_names[most_distant_idx]
most_distant_value = avg_distances[most_distant_idx]

print(f"The most semantically distant guest is: {most_distant_guest} (average distance: {most_distant_value:.4f})")

The most semantically distant guest is: Mister Coral (average distance: 0.4665)


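Here the output is also Mister Coral. The average distance is identical to the one above (0.4665): judging by the execution counters (In [32] and In [33] both ran after In [31]), both distance cells were computed on the `all-mpnet-base-v2` embeddings, so this is not an independent confirmation.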

Still focusing on the writing style of the guests, I ask the LLM for what I tried to compute with embeddings: a similarity score between each guest's writing style and the victim's letter.

In [None]:
def compare_style(reference_text, candidate_text, model="openai/gpt-4.1"):
    # renamed from extract_self_info so the extraction function defined earlier is not overwritten;
    # the two texts are separate arguments so the reference is not compared with itself

    messages = [
        {"role": "system", "content": "You are a linguistic expert that compares writing styles and returns a similarity score."},
        {"role": "user", "content": f"This is the reference text:\n\n'{reference_text}'\n\nCompare it with this text:\n\n'{candidate_text}'\n\nReturn a similarity score from 0 to 10 considering the style and tone."}
    ]

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        n=1,  # number of responses to generate (can be changed)

    )

    # Monitor token usage
    print("Prompt tokens:", response.usage.prompt_tokens)
    print("Completion tokens:", response.usage.completion_tokens)
    print("Total tokens:", response.usage.total_tokens)

    return response.choices[0].message.content  # text only; token usage is in response.usage
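
The style-comparison function above returns free text, so the score still has to be pulled out of the reply before guests can be ranked. A minimal sketch, assuming the reply contains the score as the first number in the 0 to 10 range; `parse_score` is a hypothetical helper and the sample replies are made up.

```python
import re


def parse_score(reply, low=0, high=10):
    """Return the first number found in the reply, or None if it is missing or out of range."""
    match = re.search(r"\b(\d+(?:\.\d+)?)\b", reply)
    if match is None:
        return None
    score = float(match.group(1))
    return score if low <= score <= high else None


# made-up replies, only to show the behavior
print(parse_score("Similarity: 7/10"))
print(parse_score("I would rate this 6.5 for tone."))
print(parse_score("The styles are quite different."))
```

Asking the model to answer with the number only would make this parsing step more reliable than scanning a longer explanation.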