# Fallacy detector

> This module implements a logical fallacy search tool for an LLM-based agent. Given the discussion history, it should find issues such as ad hominem attacks, self-contradicting statements and so on.

## Implementation

In [None]:
#| default_exp fallacy

In [None]:
#| export
from pino_inferior.core import DATA_DIR, PROMPTS_DIR, OPENAI_API_KEY
from langchain.schema.runnable import RunnableSequence
from langchain.llms.openai import BaseLLM
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains.transform import TransformChain
from langchain.schema.messages import AIMessage, AIMessageChunk
import os
from dataclasses import dataclass
from typing import Union, List, Callable
import json
from datetime import datetime
from pino_inferior.message import Message
import tiktoken

In [None]:
#| export
FALLACIES_FNAME = os.path.join(DATA_DIR, "fallacies.json")
FALLACIES_PROMPT_DIR = os.path.join(PROMPTS_DIR, "fallacies")

In [None]:
#| export
INPUT_FALLACIES = "fallacies"
INPUT_HISTORY = "history"
INPUT_CONTEXT = "context"
INPUT_QUERY = "query"

INTERMEDIATE_FALLACIES_STR = "fallacies_str"
INTERMEDIATE_HISTORY_STR = "history_str"
INTERMEDIATE_LAST_AUTHOR = "last_message_author"

OUTPUT_LLM_OUTPUT = "llm_output"
OUTPUT_SHORT_ANSWER = "answer"

LLM_OUTPUT_MARKER = "Therefore"

### Fallacy representation

The following structures represent a fallacy description and convert it to LLM-readable text.

In [None]:
#| export
@dataclass
class FallacyExample:
    """
    Example of a logical fallacy
    """
    text: str # Statement
    response: str # Fallacy detector response, explaining why it is a fallacy

    def __str__(self) -> str:
        return f"Example: {self.text}\nExample Response: {self.response}"


@dataclass
class Fallacy:
    """
    Fallacy representation
    """
    name: str # Fallacy name (like "ad hominem" and so)
    description: str # Fallacy description
    example: Union[FallacyExample, None] # Fallacy example

    def __str__(self):
        result = f"# {self.name}\n\n{self.description}"
        if self.example:
            result += "\n\n" + str(self.example)
        return result
    

def read_fallacies(fname: str) -> List[Fallacy]:
    """
    Read the file with fallacies markup
    :param fname: File name. Should contain JSON representing a list of `Fallacy`
    :returns: Fallacies list
    """
    with open(fname, "r", encoding="utf-8") as src:
        data = json.load(src)
    result = []
    for item in data:
        if item.get("example"):
            example = FallacyExample(**item["example"])
        else:
            example = None
        fallacy = Fallacy(name=item["name"], description=item["description"], example=example)
        result.append(fallacy)
    return result
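
As a rough illustration, the file read by `read_fallacies` probably looks like the JSON below. The shape is inferred from the keys the function accesses (`name`, `description` and an optional `example` with `text` and `response`); the entries themselves are invented and are not taken from `fallacies.json`. The helper `render_fallacy` is hypothetical and only mirrors the `__str__` methods above on plain dicts, so the snippet runs on its own.

```python
import json

# Hypothetical file content; the entries are illustrative only.
sample_json = """
[
  {
    "name": "Ad hominem",
    "description": "Attacking the person instead of the argument.",
    "example": {
      "text": "You can't trust his math, he failed art class.",
      "response": "The speaker's art grades say nothing about the math."
    }
  },
  {
    "name": "Circular reasoning",
    "description": "The conclusion is assumed in the premises."
  }
]
"""


def render_fallacy(item: dict) -> str:
    """Mirror Fallacy.__str__ / FallacyExample.__str__ on a plain dict."""
    result = f"# {item['name']}\n\n{item['description']}"
    example = item.get("example")  # The example is optional
    if example:
        result += f"\n\nExample: {example['text']}\nExample Response: {example['response']}"
    return result


items = json.loads(sample_json)
# Same separator that the detector uses when joining fallacies
rendered = "\n\n".join(render_fallacy(item) for item in items)
print(rendered)
```

Each fallacy becomes a small Markdown-like section, and an entry without an example is rendered as name and description only.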

### Prompts

Here I just read the predefined prompt templates.

In [None]:
#| export
system_prompt = SystemMessagePromptTemplate.from_template_file(
    os.path.join(FALLACIES_PROMPT_DIR, "system.txt"),
    input_variables=[]
)
instruction_prompt = HumanMessagePromptTemplate.from_template_file(
    os.path.join(FALLACIES_PROMPT_DIR, "instruction.txt"),
    input_variables=[INTERMEDIATE_FALLACIES_STR,
                     INTERMEDIATE_HISTORY_STR,
                     INTERMEDIATE_LAST_AUTHOR,
                     INPUT_CONTEXT,
                     INPUT_QUERY]
)
chat_prompt = ChatPromptTemplate.from_messages([system_prompt, instruction_prompt])
chat_prompt

ChatPromptTemplate(input_variables=['context', 'fallacies_str', 'history_str', 'last_message_author', 'query'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template="You are a logical fallacy search subsystem.\nYou're going to help us debating different topics, so you need to find our opponents potential weak points.")), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'fallacies_str', 'history_str', 'last_message_author', 'query'], template="Fallacies to find (do not use thes examples):\n<Fallacies>\n{fallacies_str}\n</Fallacies>\n\nAdditional context:\n<Context>\n{context}\n</Context>\n\nChat history:\n<Chat>\n{history_str}\n</Chat>\n\nQuery:\n<Query>\n{query}\n</Query>\n\nNow, review {last_message_author}'s messages, paying close attention to their last one. If query provided - pay attention to it.\n\nYour task is to identify any potential faults or fallacies.\n\nIf you identify any fallacies from the given list, provid

### Conversions

#### History & fallacies to strings

Initially, the message history and the fallacy types are represented as objects.

We need to convert them to strings.

Along the way we also check the lengths and, if necessary, drop the oldest messages so that the prompt fits into the LLM context window.

In [None]:
#| export
def _stringify(row: dict,
               length_function: Callable[[str], int],
               max_fallacies_length: int,
               max_messages_length: int) -> dict:
    fallacies: List[Fallacy] = row[INPUT_FALLACIES]
    history: List[Message] = row[INPUT_HISTORY]

    fallacies_str = "\n\n".join(map(str, fallacies))
    # Drop the oldest messages one by one until the history fits the limit
    while True:
        history_str = "\n\n".join(map(str, history))
        if length_function(history_str) <= max_messages_length:
            break
        else:
            history = history[1:]
    assert len(history) > 0, f"History is empty, or even its last message alone exceeds {max_messages_length}"
    assert length_function(fallacies_str) <= max_fallacies_length, f"Fallacies representation is too long. Expected up to {max_fallacies_length}, got {length_function(fallacies_str)}"

    return {
        INTERMEDIATE_FALLACIES_STR: fallacies_str,
        INTERMEDIATE_HISTORY_STR: history_str,
    }
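
The trimming loop in `_stringify` can be tried in isolation. The sketch below is a simplified stand-in, not the exported function: `trim_history` and `count_words` are hypothetical names, messages are plain strings instead of `Message` objects, and length is measured in whitespace-separated words rather than tokens.

```python
from typing import Callable, List


def trim_history(messages: List[str],
                 length_function: Callable[[str], int],
                 max_length: int) -> List[str]:
    """Drop the oldest messages until the joined text fits into max_length."""
    while messages:
        joined = "\n\n".join(messages)
        if length_function(joined) <= max_length:
            break
        messages = messages[1:]  # Remove the oldest message and retry
    return messages


def count_words(text: str) -> int:
    """Hypothetical length function: words instead of tokens."""
    return len(text.split())


history = [
    "A: one two three four",
    "B: five six",
    "A: seven eight nine",
]
trimmed = trim_history(history, count_words, max_length=7)
print(trimmed)
```

With a limit of 7 words the first message is dropped and the last two are kept. If even the last message alone is too long, the result is an empty list, which is the case the first assertion in `_stringify` guards against.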

#### Extract last user name

We explicitly tell the model to look for issues in the messages of the last author, since those are the ones we are interested in.

In [None]:
#| export
def _extract_last_user(row: dict) -> dict:
    history: List[Message] = row[INPUT_HISTORY]
    assert len(history) > 0
    return {
        INTERMEDIATE_LAST_AUTHOR: history[-1].author
    }

#### Short answer extraction

After getting the full answer from the LLM, we extract its short form: the text that follows the last `Therefore` marker and the first colon after it.

In [None]:
#| export
def _extract_answer_from_cot(row: dict) -> dict:
    response: Union[AIMessage, AIMessageChunk] = row[OUTPUT_LLM_OUTPUT]
    text: str = response.content
    text = text.split(LLM_OUTPUT_MARKER)[-1]
    text = text.split(":", maxsplit=1)[-1]
    text = text.strip()
    return {
        OUTPUT_SHORT_ANSWER: text
    }
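
To see what this extraction does, here is a minimal sketch on plain strings. `extract_short_answer` is a hypothetical stand-in that repeats the same three steps as `_extract_answer_from_cot` without the `AIMessage` wrapper, and the sample text is made up in the style of the model output.

```python
LLM_OUTPUT_MARKER = "Therefore"


def extract_short_answer(text: str) -> str:
    """Keep the text after the last marker and after the first colon that follows it."""
    tail = text.split(LLM_OUTPUT_MARKER)[-1]
    tail = tail.split(":", maxsplit=1)[-1]
    return tail.strip()


sample = (
    "- Possible fallacies:\n"
    "  - Hasty Generalization: the claim lacks evidence.\n\n"
    "Therefore, the answer is:\n"
    "  - Identified Fallacy 1: Hasty Generalization"
)
short = extract_short_answer(sample)
print(short)

# Without the marker the whole text is kept, minus everything up to the first colon
no_marker = extract_short_answer("No issues: none found")
print(no_marker)
```

Note that when the model omits the marker, the function does not fail; it silently returns whatever follows the first colon of the full response.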

#### Full chain

In [None]:
#| export
@dataclass
class LengthConfig:
    """
    Fallacy detector text length configuration
    """
    length_function: Callable[[str], int] # Text length getter
    max_messages_length: int = 2048 # Max history size
    max_fallacies_length: int = 4096 # Max fallacy list representation size


def build_fallacy_detection_chain(llm: BaseLLM, lengths: LengthConfig) -> RunnableSequence:
    """
    Build a sequential chain invoking fallacy detection
    :param llm: Language model to use inside fallacy detector (like ChatOpenAI)
    :param lengths: Fallacy detector text length configuration
    :returns: Sequential chain consuming message history and returning LLM output (and extracted short answer).
    """
    def _stringify_transform(row: dict) -> dict:
        return _stringify(
            row,
            lengths.length_function,
            lengths.max_fallacies_length,
            lengths.max_messages_length,
        )

    stringify_transform = TransformChain(
        input_variables=[INPUT_FALLACIES, INPUT_HISTORY],
        output_variables=[INTERMEDIATE_FALLACIES_STR, INTERMEDIATE_HISTORY_STR],
        transform=_stringify_transform,
    )
    extract_last_user_transform = TransformChain(
        input_variables=[INPUT_HISTORY],
        output_variables=[INTERMEDIATE_LAST_AUTHOR],
        transform=_extract_last_user,
    )
    extract_answer_transform = TransformChain(
        input_variables=[OUTPUT_LLM_OUTPUT],
        output_variables=[OUTPUT_SHORT_ANSWER],
        transform=_extract_answer_from_cot,
    )
    return stringify_transform | \
        extract_last_user_transform | \
        chat_prompt | \
        llm | \
        extract_answer_transform

### Example

In [None]:
llm = ChatOpenAI(
    model_name="gpt-4-0613",
    openai_api_key=OPENAI_API_KEY,
    streaming=True,
)
encoding = tiktoken.encoding_for_model(llm.model_name)
fallacies_detection_chain = build_fallacy_detection_chain(
    llm,
    LengthConfig(
        lambda text: len(encoding.encode(text)),
        max_messages_length=2048,
        max_fallacies_length=2048 + 1024,
    )
)

In [None]:
fallacies = read_fallacies(FALLACIES_FNAME)
history = [
    Message("Moonlight", datetime.now(),
            "Soon we will finish with Ukraine"),
    Message("alex4321", datetime.now(),
            "After six months of taking Bakhmut, did anything new happen?\n\n" + \
                "Well, so that there is a reason to suspect that it will happen soon, " + \
                "and not something regardless of the outcome - this will last for years."),
    Message("Moonlight", datetime.now(),
            "Time is a resource, we have plenty of it")
]
context = "Post about the war between Russia/Ukraine"
query = "Moonlight's argument about time being a resource in war"

In [None]:
fallacies_detection_chain.invoke({
    "fallacies": fallacies,
    "history": history,
    "context": context,
    "query": query,
})

{'llm_output': AIMessageChunk(content='- Possible Fallacies in Moonlight\'s messages:\n  - Argumentum ad Ignorantiam (Appeal to Ignorance):\n    Moonlight\'s assertion that "Time is a resource, we have plenty of it" could be seen as an appeal to ignorance. This statement assumes that because it has not been proven that time is not a plentiful resource in this context, it must be plentiful.\n  - Argumentum ad Baculum (Appeal to Force):\n    The statement "Soon we will finish with Ukraine" can be perceived as an appeal to force. Moonlight attempts to establish the inevitability of their position, not supported by logic or evidence but by the implied threat of force.\n  - Hasty Generalization:\n    Moonlight seems to make a hasty generalization when they say "Soon we will finish with Ukraine". This statement assumes that previous victories guarantee future success, without considering other variables that could affect the outcome of the war.\n\nTherefore, the answer is:\n  - Identified Fa

In [None]:
await fallacies_detection_chain.ainvoke({
    "fallacies": fallacies,
    "history": history,
    "context": context,
    "query": query,
})

TransformChain's atransform is not provided, falling back to synchronous transform


{'llm_output': AIMessageChunk(content='- Possible Fallacies in Moonlight\'s messages:\n  - Argumentum ad Ignorantiam (Appeal to Ignorance):\n    Moonlight\'s claim that "time is a resource, we have plenty of it" may be perceived as an appeal to ignorance. He suggests that because we don\'t know when the war will end, it\'s okay to assume it could last indefinitely.\n  - Argumentum ad Baculum (Appeal to Force):\n    Depending on the context, the statement "soon we will finish with Ukraine" could be interpreted as an appeal to force - implying that they will win the war through sheer force, without providing a logical or strategic argument.\n  - Hasty Generalization:\n    The assertion "soon we will finish with Ukraine" could be a hasty generalization, as it assumes a conclusion about the war\'s outcome without providing substantial evidence or considering potential variables.\n- Therefore, the answer is:\n  - Identified Fallacy 1: Argumentum ad Ignorantiam (Appeal to Ignorance)\n    Moo

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()