## Multi-agent Approach to Generating More Accurate Responses

This is an attempt at implementing a multi-agent solution for generating more accurate responses. It was inspired by Li et al., 2024 ("More Agents Is All You Need"), which improves LLM performance through ensembling, and Chen et al., 2024 ("Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems"). Chen et al. observe that adding more LLM calls tends to improve performance on easier queries but can hurt it on harder ones; anecdotally, open-ended or complex prompts also lead to inconsistent generation times.

A simple similarity-based scoring scheme is also implemented here to pick the "best" response out of all the generated responses.
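For short, closed-form answers (a number, a name), the simplest aggregation in this line of work is a plain majority vote over the sampled answers. The sketch below assumes answers can be compared after light normalization; `normalize` and `majority_vote` are hypothetical helpers, not code from either paper.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace so that
    # trivially different strings count as the same vote.
    return " ".join(answer.lower().split())


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    # most_common keeps first-encountered order for equal counts,
    # so ties resolve to the earliest answer.
    winner, _ = counts.most_common(1)[0]
    # Return the first original (un-normalized) answer matching the winner.
    return next(a for a in answers if normalize(a) == winner)
```

Free-form explanations like the ones generated in this notebook rarely match exactly, which is why exact voting gives way to similarity scoring here.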

### Implementing Multi-Agent Approach

In [2]:
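from llama_cpp import Llama
import spacy
# Note: en_core_web_sm ships without static word vectors, so spaCy warns that
# Doc.similarity results come from the tagger, parser and NER; the
# en_core_web_md and en_core_web_lg models include word vectors.
nlp = spacy.load('en_core_web_sm')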

In [4]:
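def perform_inference(llm, prompt):
    # Wrap the prompt in the Phi-3 chat template and sample one completion.
    output = llm(
        f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
        max_tokens=256,  # Generate up to 256 tokens
        stop=["<|end|>"],  # Stop at the end-of-turn marker
        temperature=0.7,  # Non-zero temperature so responses can differ between calls
        seed=-1,
    )
    return output['choices'][0]['text']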

In [5]:
llm_instances = [
    Llama(
        model_path="./Phi-3-mini-4k-instruct-q4.gguf",
        n_ctx=4096,
        n_threads=4,
        n_gpu_layers=35,
        verbose=True
    ) for _ in range(4)  # Create four instances (each one loads the model separately)
]

llama_model_loader: loaded meta data with 24 key-value pairs and 195 tensors from ./Phi-3-mini-4k-instruct-q4.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi3
llama_model_loader: - kv   1:                               general.name str              = Phi3
llama_model_loader: - kv   2:                        phi3.context_length u32              = 4096
llama_model_loader: - kv   3:                      phi3.embedding_length u32              = 3072
llama_model_loader: - kv   4:                   phi3.feed_forward_length u32              = 8192
llama_model_loader: - kv   5:                           phi3.block_count u32              = 32
llama_model_loader: - kv   6:                  phi3.attention.head_count u32              = 32
llama_model_loader: - kv   7:               phi3.attention.head_count_kv u32         

In [6]:
prompt = input('Enter your prompt: ')
print(prompt)

What is Newton's third law?


In [7]:
responses = []
for llm in llm_instances:
    response = perform_inference(llm, prompt)
    responses.append(response)


llama_print_timings:        load time =    3523.62 ms
llama_print_timings:      sample time =      35.86 ms /   178 runs   (    0.20 ms per token,  4964.30 tokens per second)
llama_print_timings: prompt eval time =    3523.36 ms /    14 tokens (  251.67 ms per token,     3.97 tokens per second)
llama_print_timings:        eval time =    6083.25 ms /   177 runs   (   34.37 ms per token,    29.10 tokens per second)
llama_print_timings:       total time =   10311.49 ms /   191 tokens

llama_print_timings:        load time =     706.23 ms
llama_print_timings:      sample time =      37.58 ms /   205 runs   (    0.18 ms per token,  5454.74 tokens per second)
llama_print_timings: prompt eval time =     705.80 ms /    14 tokens (   50.41 ms per token,    19.84 tokens per second)
llama_print_timings:        eval time =    6637.63 ms /   204 runs   (   32.54 ms per token,    30.73 tokens per second)
llama_print_timings:       total time =    8050.24 ms /   218 tokens

llama_print_timings:     
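The loop in `In [7]` queries the instances one after another, so a single instance sampled repeatedly at `temperature=0.7` should give comparably varied responses while loading the model only once, assuming each call is sampled with a different random seed. The sketch below also records wall-clock time per call, which makes the variation in generation time visible. `sample_responses` is a hypothetical helper, and `generate` stands in for any prompt-to-text callable.

```python
import time
from typing import Callable


def sample_responses(
    generate: Callable[[str], str], prompt: str, n: int = 4
) -> tuple[list[str], list[float]]:
    """Call `generate` n times on the same prompt, timing each call.

    Returns the responses and the per-call wall-clock durations in seconds.
    """
    responses, durations = [], []
    for _ in range(n):
        start = time.perf_counter()
        responses.append(generate(prompt))
        durations.append(time.perf_counter() - start)
    return responses, durations
```

With the objects defined above, this could be called as `sample_responses(lambda p: perform_inference(llm_instances[0], p), prompt)`.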

In [8]:
responses

[" Newton's Third Law, often referred to as the action-reaction law, states that for every action in nature there is an equal and opposite reaction. In other words, any force exerted onto a body will create a force of equal magnitude but in the opposite direction on the object which exerted the first force. \n\nFor example, if you push a wall with your hand, the wall pushes back with an equal amount of force. However, since the wall is much larger and more rigid than your hand, it doesn't move while your hand might experience some motion due to the reaction force from the wall.\n\nThis law explains why we are able to walk or swim; as our feet push against the ground (or water), the ground pushes back with an equal force in the opposite direction allowing us to propel forward.",
 " Newton's Third Law, also known as the action-reaction law, states that for every action in nature there is an equal and opposite reaction. In other words, any force exerted onto a body will create a force of 

### Scoring the Responses Based on Similarity

In [9]:
def calculate_similarity(text1, text2):
    # Parse both texts and return spaCy's cosine similarity of their document vectors.
    doc1 = nlp(text1)
    doc2 = nlp(text2)
    return doc1.similarity(doc2)
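Because `en_core_web_sm` has no static word vectors, spaCy typically warns that `Doc.similarity` may not give useful judgements with this model. One dependency-free alternative is bag-of-words cosine similarity. This is a swapped-in lexical measure, not what the notebook uses, and `bow_cosine_similarity` is a hypothetical name; it takes two strings like `calculate_similarity`, so it could be substituted for it.

```python
import math
import re
from collections import Counter


def bow_cosine_similarity(text1: str, text2: str) -> float:
    """Cosine similarity between lowercase word-count vectors of two texts."""
    # Simple tokenizer: runs of lowercase letters, digits, or apostrophes.
    tokens1 = Counter(re.findall(r"[a-z0-9']+", text1.lower()))
    tokens2 = Counter(re.findall(r"[a-z0-9']+", text2.lower()))
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * tokens2[word] for word, count in tokens1.items())
    norm1 = math.sqrt(sum(c * c for c in tokens1.values()))
    norm2 = math.sqrt(sum(c * c for c in tokens2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty text shares no words with anything
    return dot / (norm1 * norm2)
```

Lexical overlap ignores paraphrase (e.g. "action-reaction law" vs "law of action and reaction"), so it is cruder than a vector-based score, but it is deterministic and needs no model.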

In [10]:
def select_best_response(responses):
    # Pick the response with the highest total similarity to all the others.
    best_response = responses[0]
    highest_similarity = float('-inf')  # so negative similarity sums are still compared
    for i, candidate in enumerate(responses):
        similarity_sum = 0
        for j, other in enumerate(responses):
            if i != j:
                similarity_sum += calculate_similarity(candidate, other)
        if similarity_sum > highest_similarity:
            highest_similarity = similarity_sum
            best_response = candidate
    return best_response
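`select_best_response` scores every ordered pair, so each pair is compared twice and each response is re-parsed by spaCy on every comparison. Since cosine similarity is symmetric, a variant can score each unordered pair once and accept pre-parsed items. A minimal sketch; `select_by_consensus` is a hypothetical name and assumes the supplied `similarity` function is symmetric.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_by_consensus(
    items: Sequence[T], similarity: Callable[[T, T], float]
) -> int:
    """Return the index of the item with the highest summed similarity to all others.

    Each unordered pair is scored once, assuming `similarity` is symmetric.
    Ties go to the lowest index.
    """
    if not items:
        raise ValueError("need at least one item")
    totals = [0.0] * len(items)
    for i, j in combinations(range(len(items)), 2):
        score = similarity(items[i], items[j])
        # Credit the shared score to both members of the pair.
        totals[i] += score
        totals[j] += score
    # max() returns the first maximal index, so ties resolve to the lowest index.
    return max(range(len(items)), key=totals.__getitem__)
```

With spaCy, each response can then be parsed exactly once: `docs = [nlp(r) for r in responses]` followed by `responses[select_by_consensus(docs, lambda a, b: a.similarity(b))]`.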

In [11]:
final_response = select_best_response(responses)

# Print the final response
print("Final Response:", final_response)



Final Response:  Newton's third law, also known as the law of action and reaction, states that for every action in nature there is an equal and opposite reaction. In other words, any force exerted on a body will create a force of equal magnitude but in the opposite direction on the object that exerted the first force. This law explains how forces always occur in pairs: when one object pushes or pulls another object with a certain force, it experiences an opposing force of the same magnitude from the second object.

Here's an example to illustrate this principle: When you jump off a small boat onto the shore, your action is pushing down on the boat due to gravity. In reaction, the boat moves in the opposite direction with equal and opposite force. The reason for this motion of the boat can be attributed directly to Newton's third law; as you push downwards (action), the boat pushes upwards (reaction).


### References
- Li et al., 2024. More Agents Is All You Need. https://arxiv.org/abs/2402.05120
- Chen et al., 2024. Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems. https://arxiv.org/abs/2403.02419