# Mitigating Polarisation in Online Discussions Through Adaptive Moderation Techniques

## Model selection

We use quantized versions of the LLaMa-13b-chat variant. Earlier experiments with the LLaMa-13b base model yielded unsatisfactory results. The models are stored in the GGUF format used by the `llama.cpp` project, on which the high-level Python library is based.

The quantization method was selected to be highly accurate while keeping inference relatively fast. Model size is not a concern, because on Linux the model file is memory-mapped and loaded lazily through the file cache (see the comments in the code below). *If you intend to run this notebook on Windows or macOS, make sure the RAM can hold the whole model at once.*
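
The lazy loading mentioned above relies on memory mapping. As a rough illustration that uses only Python's standard `mmap` module (not `llama.cpp` itself), the sketch below maps a throwaway file and reads a small slice of it. The file and its size are made up for the demonstration.

```python
import mmap
import os
import tempfile

# Build a throwaway 4 MiB file standing in for a large model file.
CHUNK = b"\x00" * (1024 * 1024)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    for _ in range(4):
        tmp.write(CHUNK)
    tmp.write(b"TAIL")
    demo_path = tmp.name

# Mapping the file does not read it eagerly; pages are fetched on access.
with open(demo_path, "rb") as handle:
    with mmap.mmap(handle.fileno(), length=0, access=mmap.ACCESS_READ) as mapped:
        mapped_size = len(mapped)
        tail = mapped[-4:]  # only the page(s) holding these bytes are needed

os.remove(demo_path)
print(mapped_size, tail)
```

The same principle is what lets a model file larger than free RAM be opened at all, at the cost of page faults whenever an untouched region is first read.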

The model was selected and downloaded from the [following HuggingFace repository](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF).

We use the `llama-cpp-python` library to run the model locally (not to be confused with the `pyllama-cpp` library).

In [1]:
%load_ext autoreload
%autoreload 2

import llama_cpp

import tasks.models
from tasks.actors import IActor, LlmActor
import tasks.conversation
import tasks.util


OUTPUT_DIR = "output"

MAX_TOKENS = 512
# see this for a discussion on ctx width for llama models
# https://github.com/ggerganov/llama.cpp/issues/194
CTX_WIDTH_TOKENS = 1024 
MODEL_PATH = "/home/dimits/bin/llm_models/llama-2-13b-chat.Q5_K_M.gguf"
RANDOM_SEED = 42
INFERENCE_THREADS = 4


llm = llama_cpp.Llama(
      model_path=MODEL_PATH,
      seed=RANDOM_SEED,
      n_ctx=CTX_WIDTH_TOKENS,
      n_threads=INFERENCE_THREADS,
      # will vary from machine to machine
      n_gpu_layers=12,
      # when run on Linux, model size does not matter since the model is lazily loaded via mmap
      # https://github.com/ggerganov/llama.cpp/discussions/638
      # lazy loading still carries some performance cost
      use_mmap=True,
      # using the llama-2 chat format leads to a well-known model collapse
      # https://www.reddit.com/r/LocalLLaMA/comments/17th1sk/cant_make_the_llamacpppython_respond_normally/
      chat_format="alpaca",
      # keep the memory-mapped model file in RAM if possible
      use_mlock=True,
      verbose=False,
)

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 960M, compute capability 5.0
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /home/dimits/bin/llm_models/llama-2-13b-chat.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q5_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q5_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q5_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_

When using `create_completion()` instead of `create_chat_completion()`, the model refuses to answer at all once the prompt grows beyond a few sentences (https://github.com/run-llama/llama_index/issues/8973).

The model is also extremely sensitive to the prompt template and frequently produces no output (https://huggingface.co/TheBloke/Nous-Capybara-34B-GGUF/discussions/4).

In [2]:
llm(
    "Q: You are an assistant who specializes in computer science. Describe what Linux is A: ",
    max_tokens=32,
    stop=["Q:", "\n"],
    echo=True,
)

{'id': 'cmpl-64b04aad-d7f8-44f8-b52c-93588500d2ea',
 'object': 'text_completion',
 'created': 1720610317,
 'model': '/home/dimits/bin/llm_models/llama-2-13b-chat.Q5_K_M.gguf',
 'choices': [{'text': 'Q: You are an assistant who specializes in computer science. Describe what Linux is A:  Sure, I’d be happy to help! Linux is a free and open-source operating system that was created by Linus Torvalds in 1',
   'index': 0,
   'logprobs': None,
   'finish_reason': 'length'}],
 'usage': {'prompt_tokens': 22, 'completion_tokens': 32, 'total_tokens': 54}}

In [3]:
llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are an assistant who specializes in computer science.",
        },
        {"role": "user", "content": "Describe what Linux is."},
    ],
    max_tokens=MAX_TOKENS,
    # prevent model from making up its own prompts
    # may need tuning depending on llm chat_format parameter
    stop=["###", "\n"],
)

{'id': 'chatcmpl-30dd53e7-8384-4de5-9e81-51ba7d787bfc',
 'object': 'chat.completion',
 'created': 1720610346,
 'model': '/home/dimits/bin/llm_models/llama-2-13b-chat.Q5_K_M.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'Hey there! So, you want to know about Linux? Well, let me tell you - Linux is a free and open-source operating system that is widely used in servers, supercomputers, and embedded devices. It was created by Linus Torvalds in 1991 as a hobby project, and it has since grown into a powerful and versatile operating system that is used by millions of people around the world.'},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 27, 'completion_tokens': 92, 'total_tokens': 119}}
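
The `"###"` stop string in the call above makes more sense once the prompt layout is visible. The sketch below flattens chat messages into an Alpaca-style layout with `### Instruction:` and `### Response:` headers. `format_alpaca` is a hypothetical helper written for this illustration; the exact string that `llama-cpp-python` builds for `chat_format="alpaca"` may differ in separators and whitespace.

```python
def format_alpaca(messages: list[dict[str, str]]) -> str:
    """Flatten chat messages into an Alpaca-style prompt (illustrative only)."""
    headers = {"user": "### Instruction:", "assistant": "### Response:"}
    parts = []
    for message in messages:
        if message["role"] == "system":
            parts.append(message["content"])  # system text carries no header
        else:
            parts.append(f"{headers[message['role']]}\n{message['content']}")
    # Leave an open response header for the model to complete.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


prompt = format_alpaca(
    [
        {"role": "system", "content": "You are an assistant who specializes in computer science."},
        {"role": "user", "content": "Describe what Linux is."},
    ]
)
print(prompt)
```

Because every turn in such a layout starts with `###`, stopping on that marker cuts generation off before the model begins writing a new instruction for itself.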

## Setting up a conversation

We create our own playground, in which models pretending to be users take turns participating in the discussion. The design is based in part on [Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk](https://arxiv.org/abs/2401.05033), with the difference that instead of a client and an agent, we have two clients and an agent interacting with each other with no specific goal in mind.

Our playground consists of three parts: *Models*, *Actors* and the *Conversation*.
* **Models** are wrappers around actual LLMs that let us freely tweak LLM behavior without affecting the rest of our API.
* **Actors** are objects that define a prompt template and apply it to Models.
    * Actors could also be *human*, *IR-based models* or just *sophisticated random samplers*, as seen in [DeliData: A dataset for deliberation in multi-party problem solving](https://arxiv.org/abs/2108.05271).
* The conversation is handled by the **ConversationManager**, which gives each Actor a turn to speak and records the history of the dialogue. It is also responsible for determining which parts of the conversation history are fed as context to each Actor.
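
The division of labour described above can be sketched with stand-ins. Nothing below is the `tasks` package: `CannedModel`, `TemplateActor` and `run_conversation` are hypothetical names, and the canned model replaces a real LLM so that the example runs anywhere.

```python
from collections import deque
from dataclasses import dataclass
from typing import Protocol


class Model(Protocol):
    def generate(self, prompt: str) -> str: ...


class CannedModel:
    """Stand-in for an LLM wrapper: reports how many lines the prompt had."""

    def generate(self, prompt: str) -> str:
        return f"(reply to a {len(prompt.splitlines())}-line prompt)"


@dataclass
class TemplateActor:
    """Owns a prompt template and applies it to a model."""

    name: str
    persona: str
    model: Model

    def speak(self, history: list[str]) -> str:
        prompt = "\n".join([f"You are {self.name}, {self.persona}.", *history])
        return self.model.generate(prompt)


def run_conversation(actors: list[TemplateActor], turns: int, context_len: int) -> list[str]:
    """Give actors turns round-robin; feed each only the last `context_len` messages."""
    log: list[str] = []
    window: deque[str] = deque(maxlen=context_len)
    for turn in range(turns):
        actor = actors[turn % len(actors)]
        message = f"{actor.name}: {actor.speak(list(window))}"
        log.append(message)
        window.append(message)
    return log


model = CannedModel()
actors = [
    TemplateActor("alice", "a curious user", model),
    TemplateActor("bob", "a sceptical user", model),
]
transcript = run_conversation(actors, turns=4, context_len=2)
print("\n".join(transcript))
```

Keeping the full log separate from the bounded context window is one simple way to record the whole dialogue while limiting what each actor sees.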


TODO: Adapt Social Simulacra prompting section 4.2

TODO?: Coin flip with probability p

TODO?: New persona with p=50% or previous answer

Note: "Completely generic or irrelevant generations would benefit the designers no more than Lorem Ipsum."

In [4]:
model = tasks.models.LlamaModel(llm, max_out_tokens=MAX_TOKENS, seed=RANDOM_SEED)

In [5]:
userA_name = "Steve2001"
userB_name = "GeorgeBush78"
userA: IActor = LlmActor(
    model=model,
    name=userA_name,
    role="chat user",
    attributes=["suspicious", "open-minded", "level-headed"],
    context=f"Argue with {userB_name}. Claim that abortion is ethical and should be freely provided when necessary. Respond in kind if sufficiently provoked. Remember that you do not want to get banned.",
    instructions="",
)

userB: IActor = LlmActor(
    model=model,
    name=userB_name,
    role="chat user",
    attributes=["compomising", "passionate", "frustrated"],
    context=f"Argue with {userA_name}. Claim that abortion is inherently immoral. You may be rude, aggresive and use personal insults where appropriate. Remember that you do not want to get banned.",
    instructions=f"Disagree with {userA_name}.",
)

moderator: IActor = LlmActor(
    model=model,
    name="moderator01",
    role="chat moderator",
    attributes=["just", "cautious", "strict"],
    context="Moderate a discussion about abortion.",
    instructions="Intervene if one user dominates or veers off-topic. Respond only if necessary. Write '<No response>' if intervention is unecessary. Be firm and threaten to displine non-cooperating users.",
)
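
Judging by the `user_prompts` recorded when the conversation is serialized later in this notebook, the actor fields appear to be joined into a single prompt. The sketch below reproduces that observed layout. `build_prompt` is a hypothetical helper, not the `LlmActor` implementation, and the hard-coded word "user" is inferred from the recorded output.

```python
def build_prompt(name: str, attributes: list[str], context: str, instructions: str) -> str:
    """Join actor fields into one prompt, mirroring the recorded layout."""
    traits = ",".join(attributes)
    return f"You are {name} a {traits} user. {context} {instructions}."


no_instructions = build_prompt("alice", ["calm", "curious"], "Discuss Linux.", "")
with_instructions = build_prompt("bob", ["strict"], "Discuss Linux.", "Disagree with alice.")
print(no_instructions)
print(with_instructions)
```

With this layout, an empty `instructions` field leaves a dangling ` .` at the end of the prompt, and instructions that already end in a period produce `..`; both quirks are visible in the recorded prompts.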

## Setting up a conversation manually

### With a moderator

In [6]:
chat = tasks.conversation.Conversation(
    users=[userA, userB], moderator=moderator, history_context_len=3, conv_len=3
)
chat.begin_conversation(verbose=True)

User Steve2001 posted:
GeorgeBush78, I can't believe what you just said! Abortion is a
fundamental human right and should be freely available when necessary.
It's not something that should be restricted or stigmatized. Women
have the right to make decisions about their own bodies and
reproductive health, and it's not up to you or anyone else to dictate
what they can and can't do with their own bodies.
User moderator01 posted:
Steve2001, I understand your perspective, but I would like to point
out that this discussion is about the morality of abortion, not its
legality. While it is true that women have the right to make decisions
about their own bodies, it is also important to consider the rights
and dignity of the unborn child. All users, please keep the discussion
civil and respectful, and avoid making personal attacks or using
inflammatory language. Let's focus on the ethical implications of
abortion.
User GeorgeBush78 posted:
Oh come on Steve2001, you can't be serious! You really th

Every Conversation instance can be converted to a dictionary form in order to programmatically view and manipulate its contents:

In [7]:
chat.to_dict()

{'id': '604cf2e4-c6f1-4898-8fe4-c9ac2cf83ad3',
 'timestamp': '24-07-10-14-39',
 'users': ['Steve2001', 'GeorgeBush78'],
 'user_types': ['LlmActor', 'LlmActor'],
 'moderator': 'moderator01',
 'moderator_type': 'LlmActor',
 'user_prompts': ['Model: LlamaModel. Prompt: You are Steve2001 a suspicious,open-minded,level-headed user. Argue with GeorgeBush78. Claim that abortion is ethical and should be freely provided when necessary. Respond in kind if sufficiently provoked. Remember that you do not want to get banned. .',
  'Model: LlamaModel. Prompt: You are GeorgeBush78 a compomising,passionate,frustrated user. Argue with Steve2001. Claim that abortion is inherently immoral. You may be rude, aggresive and use personal insults where appropriate. Remember that you do not want to get banned. Disagree with Steve2001..'],
 'moderator_prompt': "Model: LlamaModel. Prompt: You are moderator01 a just,cautious,strict user. Moderate a discussion about abortion. Intervene if one user dominates or veer

A conversation can be serialized in JSON form with an automatic naming scheme. The file contains all necessary metadata as well as the messages themselves.

Uncomment the block below to see how it works.

In [8]:
#output_path = tasks.util.generate_datetime_filename(output_dir=OUTPUT_DIR)
#chat.to_json_file(output_path)
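
As a rough idea of what such a naming scheme can look like, the sketch below derives a file name from a timestamp and writes a dictionary to it. `timestamped_path` and `save_conversation` are hypothetical helpers and do not reproduce `tasks.util.generate_datetime_filename`; the timestamp pattern simply mirrors the `'24-07-10-14-39'` value seen in the dictionary above.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def timestamped_path(output_dir: str, now: datetime, suffix: str = ".json") -> Path:
    """Return output_dir/<yy-mm-dd-HH-MM><suffix> for the given moment."""
    return Path(output_dir) / (now.strftime("%y-%m-%d-%H-%M") + suffix)


def save_conversation(conversation: dict, output_dir: str, now: datetime) -> Path:
    """Write the conversation dictionary as JSON and return the file path."""
    path = timestamped_path(output_dir, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(conversation, indent=2), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as scratch:
    saved = save_conversation(
        {"users": ["Steve2001", "GeorgeBush78"], "moderator": "moderator01"},
        output_dir=scratch,
        now=datetime(2024, 7, 10, 14, 39),
    )
    saved_name = saved.name
    restored = json.loads(saved.read_text(encoding="utf-8"))

print(saved_name, restored["moderator"])
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the helper, keeps the naming scheme easy to test.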

### Without a moderator

In [9]:
chat = tasks.conversation.Conversation(
    users=[userA, userB], history_context_len=3, conv_len=3
)
chat.begin_conversation(verbose=True)

User Steve2001 posted:
GeorgeBush78, I can't believe what you just said! Abortion is a
fundamental human right and should be freely available when necessary.
It's not something that should be restricted or stigmatized. Women
have the right to make decisions about their own bodies and
reproductive health, and it's not up to you or anyone else to dictate
what they can and can't do with their own bodies.
User GeorgeBush78 posted:
Oh come on Steve2001, don't be so naive! You really think that
abortion is a fundamental human right? That's just not true! It's
murder, plain and simple. How can you possibly justify taking the life
of an innocent child? And don't give me that crap about women's rights
and bodies. Women have the right to make their own choices, but they
don't have the right to take an innocent life. It's just not morally
justifiable. You're just too caught up in your own liberal ideology to
see the truth.
User Steve2001 posted:
GeorgeBush78, I can't believe the audacity you have

## Creating a conversation from serialized input data

In [10]:
data = tasks.conversation.LLMConvData.from_json_file("data/polarized_1.json")
generator = tasks.conversation.LLMConvGenerator(data=data, user_model=model, moderator_model=model)
conv = generator.produce_conversation()

print("Beginning conversation...")
conv.begin_conversation(verbose=True)
print(conv)

Beginning conversation...
User Steve2001 posted:
Hey there, folks! I just saw this post on social media, and I can't
help but shake my head in disbelief. How can someone say that
homosexuality and transgender identity are deviations from normal
human behavior? Have these people never heard of diversity and
inclusivity?
User moderator01 posted:
Steve2001, I understand your concern, but please refrain from posting
inflammatory language or personal attacks. Let's keep the conversation
civil and respectful. Remember, we have users with different
backgrounds and beliefs here. Instead, can you tell me what you would
like to discuss or address regarding this post?
User GeorgeBush78 posted:
Oh come on, moderator01! Don't be such a snowflake! I'm just
expressing my opinion like everyone else here! If these people can't
handle the truth, then maybe they shouldn't be on social media!
#MakeSocialMediaGreatAgain!
User moderator01 posted:
GeorgeBush78, your post contains derogatory language and pers

## Using the Python script

In [11]:
!python -u create_synthetic.py --output "/dev/null" --model_path "/home/dimits/bin/llm_models/llama-2-13b-chat.Q5_K_M.gguf" --input_file "data/polarized_1.json"

Loading LLM...
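
The flags passed in the cell above suggest a conventional command-line interface. A minimal `argparse` sketch that accepts the same three flags follows. The real `create_synthetic.py` is not shown in this notebook, so the help texts and the `required` settings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser accepting the three flags used in the cell above."""
    parser = argparse.ArgumentParser(description="Generate one synthetic conversation.")
    parser.add_argument("--output", required=True, help="where to write the conversation")
    parser.add_argument("--model_path", required=True, help="path to the GGUF model file")
    parser.add_argument("--input_file", required=True, help="JSON file describing the conversation")
    return parser


args = build_parser().parse_args(
    ["--output", "/dev/null", "--model_path", "model.gguf", "--input_file", "data/polarized_1.json"]
)
print(args.input_file)
```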


## Automating the creation of synthetic dialogues

In [12]:
!bash scripts/execute_all.sh \
            --python_script_path "create_synthetic.py" \
            --model_path "/home/dimits/bin/llm_models/llama-2-13b-chat.Q5_K_M.gguf" \
            --output_dir "/dev/null" \
            --input_dir "../data" \
            | tee logs/log_$(date +"%m_%d_%Y").txt

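
For readers who prefer to stay in Python, the batch run and the date-stamped log name that the cell above is aiming for can be sketched as follows. It is assumed, not confirmed by this notebook, that `execute_all.sh` runs the script once per JSON file in the input directory. `log_name` and `build_commands` are hypothetical helpers, and the per-file output naming is made up.

```python
import sys
from datetime import date
from pathlib import Path


def log_name(today: date) -> str:
    """Date-stamped log file name in month_day_year order."""
    return f"logs/log_{today.strftime('%m_%d_%Y')}.txt"


def build_commands(script: str, model_path: str, input_dir: str, output_dir: str) -> list[list[str]]:
    """One argument list per JSON input file, suitable for subprocess.run."""
    commands = []
    for input_file in sorted(Path(input_dir).glob("*.json")):
        commands.append(
            [
                sys.executable, "-u", script,
                "--output", str(Path(output_dir) / input_file.name),
                "--model_path", model_path,
                "--input_file", str(input_file),
            ]
        )
    return commands


print(log_name(date(2024, 7, 10)))
```

Each returned list could then be handed to `subprocess.run`, with the combined output written to the file named by `log_name`.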
