# Identifying Interactions

Our analysis examines how Members of Parliament in the UK House of Commons interact with each other. To do this, we have to filter out irrelevant data or, to phrase it the other way around, we have to identify interactions.

To achieve this, we leveraged recent advancements in LLMs. We tried out different LLMs to classify whether two contributions are interacting or not. Almost all small-scale open-source LLMs tended to overclassify interactions, i.e. to label pairs as interacting when they were not.

The only model we found that did not show this tendency and could run in a reasonable timeframe was the gpt-oss-safeguard-8k:20b model by OpenAI, a policy fine-tuned reasoning model. This model performed to our satisfaction in a small qualitative analysis.

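To run this yourself, you ideally need a graphics card with at least 16 GB of VRAM; running on CPU is possible but unbearably slow. Install Ollama, then pull and run the gpt-oss-safeguard:20b model. It is then exposed locally and can be queried from this notebook. Note that the loop further down calls the tag gpt-oss-safeguard-8k:20b, so adjust the `model` argument there if the tag you pulled is named differently.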
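Before starting a full run it can help to confirm that the model answers at all. The following is a minimal sketch and not part of our pipeline: `ask_model` is a hypothetical helper that receives the chat function as an argument, so it can be tried with a stand-in before Ollama is set up. The model tag in the commented usage is an assumption and should match whatever you pulled.

```python
def ask_model(chat_fn, model, system_prompt, user_content):
    """Send one system + user message pair and return the stripped reply text.

    chat_fn is expected to behave like ollama.chat: it accepts `model` and
    `messages` keyword arguments and returns a response that can be indexed
    as response["message"]["content"].
    """
    response = chat_fn(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    return response["message"]["content"].strip()


# With Ollama running locally, a quick check could look like this:
# from ollama import chat
# print(ask_model(chat, "gpt-oss-safeguard:20b", "Answer only Yes or No.", "Is 2 greater than 1?"))
```

Passing the chat function in, rather than importing it inside the helper, keeps the sketch runnable on machines without a local model.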

It's actually our favorite part of our network analysis, but Sune said that he doesn't like methods sections. So oh well.

This notebook can be rerun as required: debates that already have a result file in the interactions directory are removed from the list of debates that are still open for analysis. If you want the raw extracted lists of interactions, you can reach out to s252890@dtu.dk
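The resume behaviour relies on the file naming scheme `debate_<ExtId>.json`. A minimal sketch of that idea follows, run on a temporary directory with made-up ids; `analysed_debate_ids` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import os
import tempfile


def analysed_debate_ids(directory):
    """Return the ids of debates that already have a debate_<id>.json file."""
    ids = set()
    for name in os.listdir(directory):
        if name.startswith("debate_") and name.endswith(".json"):
            # Strip the "debate_" prefix and the ".json" suffix to recover the id
            ids.add(name[len("debate_"):-len(".json")])
    return ids


# Demonstration on a temporary directory with one finished debate (made-up ids)
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "debate_A1.json"), "w").close()
    debates = [{"ExtId": "A1"}, {"ExtId": "B2"}]
    done = analysed_debate_ids(tmp)
    remaining = [d for d in debates if d["ExtId"] not in done]

print(remaining)
```

Because a result file is written for every processed debate, even one without any detected interaction, an interrupted run can simply be restarted.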


In [None]:
import os
import json
from ollama import chat
from datetime import datetime

In [None]:
with open("data.json", 'r', encoding="utf-8") as f:
    data = json.load(f)

In [None]:
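# Make sure the output directory exists, then collect the ids of debates that already have a result file
os.makedirs("interactions", exist_ok=True)
list_of_analysed_debates = {name.replace(".json", "").split("_", 1)[1] for name in os.listdir("interactions") if name.startswith("debate_") and name.endswith(".json")}

# Keep only the debates that have not been analysed yet
data = [debate for debate in data if debate['ExtId'] not in list_of_analysed_debates]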

In [None]:
# Analysis window (inclusive on both ends)
start = datetime.strptime('17-12-2019', '%d-%m-%Y')
end = datetime.strptime('30-05-2024', '%d-%m-%Y')

data = [debate for debate in data if start <= datetime.fromisoformat(debate['Date']) <= end]
len(data)

This is our prompt for the model to identify whether two consecutive contributions can be seen as interacting or not. It provides the model with the bare minimum of context and the policy it should follow.
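Alongside the system prompt, each request carries a user message with the two contributions as a JSON list. A sketch of what that message looks like is given here, assuming the raw field names `AttributedTo` and `Value` used in the loop further down; the two contributions themselves are invented for illustration.

```python
import json

# Two made-up raw contributions using the field names of the debate data
raw_pair = [
    {"AttributedTo": "Speaker A", "Value": "Will the Minister give way?", "MemberId": 1},
    {"AttributedTo": "Speaker B", "Value": "I thank the hon. Member for her question.", "MemberId": 2},
]

# Keep only speaker and text, renamed to the keys the prompt describes
rename_keys = {"AttributedTo": "Speaker", "Value": "Utterance"}
pair = [{new: c[old] for old, new in rename_keys.items()} for c in raw_pair]

# This string is what would be sent as the user message
user_message = json.dumps(pair, indent=2)
print(user_message)
```

Dropping everything except speaker and text keeps the request short, which matters when the model runs with a limited context window.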

In [None]:
SYSTEM_PROMPT =  """
You will be given a list of two contributions in a debate in the UK House of Commons. 
Each contribution is a dictionary with:
- Speaker: the name of the speaker
- Utterance: the text that the speaker said

Decide if the second contribution is interacting with the first contribution. 
A contribution is considered interacting if the second speaker, anywhere in their utterance:
1. Directly addresses the first speaker.
2. Builds upon a point made by the first speaker, or
3. Asks a question referring to the first speaker's contribution.

Answer only "Yes" if it is interacting, otherwise answer "No". 
Do not provide any explanations or extra text.
"""

Looping over all pairs of consecutive contributions in all debates to evaluate whether they are interacting or not. The interacting pairs of each debate are saved to one JSON file in the interactions directory.
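The loop compares the model's reply against exactly "yes" and "no" after stripping whitespace and lowercasing. A slightly more forgiving check could be sketched as follows; `parse_answer` is a hypothetical helper, and tolerating trailing punctuation is an assumption that goes beyond what the notebook itself does.

```python
def parse_answer(text):
    """Map a model reply to True (yes), False (no) or None (unexpected)."""
    # Remove surrounding whitespace, then stray punctuation or quotes
    cleaned = text.strip().strip(".!\"'").lower()
    if cleaned == "yes":
        return True
    if cleaned == "no":
        return False
    return None


for reply in ["Yes", " no.\n", "Maybe"]:
    print(repr(reply), "->", parse_answer(reply))
```

Returning `None` for anything unexpected makes it easy to log or retry those pairs instead of silently dropping them.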

In [None]:
path = "interactions"
os.makedirs(path, exist_ok=True)  # create the output directory if it does not exist yet
keys_to_keep = ["AttributedTo", "Value"]
rename_keys = {"Value": "Utterance", "AttributedTo": "Speaker"}


for debate in data:
    captured_interactions = []
    print("Debate id:", debate["ExtId"])

    for i in range(len(debate["Interactions"])-1):
        interaction_1 = debate["Interactions"][i]
        interaction_2 = debate["Interactions"][i+1]

        # Keep only the essential data
        interaction_1_filtered = {rename_keys[k]: interaction_1[k] for k in keys_to_keep}
        interaction_2_filtered = {rename_keys[k]: interaction_2[k] for k in keys_to_keep}

        # Serialise the pair as the JSON list described in the system prompt
        interactions = json.dumps([interaction_1_filtered, interaction_2_filtered], indent=2)

        # Prompt
        response = chat(
            model="gpt-oss-safeguard-8k:20b",  # change to "gpt-oss-safeguard:20b" if that is the tag you pulled
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": interactions}
            ]
        )

        # If there's interaction between A and B --> save data
        if response['message']['content'].strip().lower() == "yes":
            print("Yes")
            captured_interactions.append(
                {            
                    "speaker1_member_id": interaction_1["MemberId"],
                    "speaker2_member_id": interaction_2["MemberId"],
                    "speaker1_order"    : interaction_1["OrderInSection"],
                    "speaker2_order"    : interaction_2["OrderInSection"],
                    "speaker1_text"     : interaction_1["Value"],
                    "speaker2_text"     : interaction_2["Value"],
                    "debate_id"         : debate["ExtId"],
                    "debate_date"       : debate["Date"],
                    "Location"          : debate["Location"],
                    "Title"             : debate["Title"]
                }
            )

        elif response['message']['content'].strip().lower() == "no":
            print("No")

        # Check if there's an unexpected response
        else:
            print("\n\nSOMETHING WENT WRONG!\n\n")
            print("response:", response['message']['content'])


    file = "debate_" + debate["ExtId"] + ".json"
    
    with open(os.path.join(path, file), "w", encoding="utf-8") as f:
        json.dump(captured_interactions, f, ensure_ascii=False, indent=4)

    print("\n")
