<a href="https://colab.research.google.com/github/Fruetel/LLM-Research/blob/main/Mixtral_8x7b_1_vs_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title # Runtime Info
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of total RAM\n'.format(ram_gb))
if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')


In [None]:
#@title # Setup
%cd /content/
!rm -rf llama.cpp
!git clone --depth 1 https://github.com/ggerganov/llama.cpp.git
%cd llama.cpp
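# llama.cpp now builds with CMake (its Makefile was retired); GGML_CUDA replaces the old LLAMA_CUBLAS flag
!cmake -B build -DGGML_CUDA=ON
!cmake --build build --config Release -j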

In [None]:
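# CMAKE_ARGS builds the CUDA backend; a plain install is CPU-only and ignores n_gpu_layers
!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python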

In [None]:
!wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_0.gguf

In [None]:
#@title # Connect using the following address once a server is listening on port 31336.
from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(31336)"))

In [None]:
from llama_cpp import Llama

llm = Llama(
    model_path="/content/llama.cpp/mixtral-8x7b-instruct-v0.1.Q5_0.gguf",  # downloaded by the wget cell above
    n_ctx=2048,       # context window in tokens (prompt + reply); larger values need more memory
    n_threads=8,      # CPU threads for the layers that stay on the CPU; tune to your runtime
    n_gpu_layers=15,  # layers offloaded to the GPU (Mixtral has 32); -1 offloads all of them
)
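One rough way to choose `n_gpu_layers` is to divide the GGUF file size by the number of transformer layers (32 for Mixtral 8x7B) and count how many of those slices fit into GPU memory. The sketch below is a back-of-the-envelope estimate under stated assumptions: every layer is taken to be the same size, and a fixed reserve (2 GB here, a guess) is held back for the KV cache, scratch buffers and the CUDA context. `layers_that_fit` is a hypothetical helper, not part of llama-cpp-python, and the 16 GB figure merely stands in for a T4-class card.

```python
import math
import os


def layers_that_fit(model_bytes: float, n_layers: int, vram_bytes: float,
                    reserve_bytes: float = 2e9) -> int:
    """Estimate how many layers can be offloaded to the GPU.

    Assumes each layer occupies model_bytes / n_layers and keeps
    reserve_bytes free for the KV cache, buffers and CUDA context (a guess).
    """
    per_layer = model_bytes / n_layers
    usable = max(vram_bytes - reserve_bytes, 0)
    return min(n_layers, math.floor(usable / per_layer))


path = "/content/llama.cpp/mixtral-8x7b-instruct-v0.1.Q5_0.gguf"
if os.path.exists(path):
    # 16e9 bytes assumes a 16 GB GPU such as a T4; adjust for your runtime
    print(layers_that_fit(os.path.getsize(path), n_layers=32, vram_bytes=16e9))
```

The result is only a starting point: if loading runs out of GPU memory, lower the value; if VRAM is left over, raise it.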

In [None]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "What do you think about Klaus Kinski?"
    },
    {
        "role": "assistant",
        "content": "Klaus Kinski was a German actor known for his intense performances in both art-house and mainstream films. He worked with notable directors like Werner Herzog and Wim Wenders, and is considered a significant figure in German film history. However, he was also infamous for his volatile personality and difficult behavior on set."
    },
    {
        "role": "user",
        "content": "Tell me more about his personality. What do you mean by volatile?"
    }
]

output = llm.create_chat_completion(messages=messages)
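Before the model sees `messages`, `create_chat_completion` flattens them into a single prompt using a chat template. The template published with Mixtral-8x7B-Instruct-v0.1 wraps each user turn in `[INST] ... [/INST]`, ends each assistant turn with the end-of-sequence token, and has no system role. The function below is a hypothetical approximation of that layout which folds the system message into the first user turn; the template llama-cpp-python actually applies depends on its `chat_format` setting or the template stored in the GGUF metadata, so whitespace and system-message handling may differ.

```python
def render_mistral_prompt(messages: list[dict]) -> str:
    """Approximate the Mistral/Mixtral [INST] prompt for a chat history.

    Hypothetical helper: a leading system message is prepended to the
    first user turn because the format has no system role.
    """
    system = ""
    turns = list(messages)  # copy so the caller's list is not modified
    if turns and turns[0]["role"] == "system":
        system = turns.pop(0)["content"] + "\n\n"
    prompt = "<s>"
    for i, msg in enumerate(turns):
        if msg["role"] == "user":
            content = system + msg["content"] if i == 0 else msg["content"]
            prompt += f"[INST] {content} [/INST]"
        elif msg["role"] == "assistant":
            prompt += f"{msg['content']}</s>"  # assistant turns end with EOS
        else:
            raise ValueError(f"unsupported role: {msg['role']}")
    return prompt
```

Printing `render_mistral_prompt(messages)` for the Kinski conversation shows roughly what text the model is asked to continue; the reply is generated after the final `[/INST]`.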

In [None]:
print(output["choices"][0])
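`create_chat_completion` returns an OpenAI-style dictionary, so `output["choices"][0]` holds the reply message together with metadata such as `finish_reason`. To continue the conversation, the reply text can be appended to the history as an assistant turn. `append_reply` is a hypothetical helper, and the sample dictionary below is made up; it only mimics the shape of the real response.

```python
def append_reply(messages: list[dict], output: dict) -> str:
    """Append the first choice of a chat completion to the history.

    Returns the reply text. Warns when generation stopped at the token
    limit ("length") rather than at a natural end ("stop").
    """
    choice = output["choices"][0]
    if choice.get("finish_reason") == "length":
        print("warning: reply was cut off at the token limit")
    reply = choice["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply


# Usage with a made-up response of the same shape
history = [{"role": "user", "content": "Hi"}]
fake_output = {"choices": [{"index": 0,
                            "message": {"role": "assistant", "content": "Hello!"},
                            "finish_reason": "stop"}]}
print(append_reply(history, fake_output))  # Hello!
print(len(history))                        # 2
```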

In [None]:
editor_system_prompt = """You are an AI Editor Agent. Your job is to review a response given by a less intelligent chat assistant and improve it to a brilliant level. You will receive a conversation between a human user and the chat agent.

Perform your review in these steps:

1. Summarize the conversation up to the agent's last response.
2. Briefly describe the respective intentions of the user and the agent.
3. Review the agent's response, pointing out logical flaws or
missing information.
4. Provide an enhanced response based on the assistant's response and your
review.

Your output must be in JSON format as in this example:

```
{
  "summary": "The user and the assistant are having a conversation about the philosophical views of Oscar Wilde and Albert Camus",
  "intentions": {
    "user": "Determine how to apply the ideas of Wilde and Camus in his own life",
    "assistant": "Point out similarities and differences between the two philosophers' mental models"
  },
  "review": "The assistant's response is factually correct, but very verbose. A shorter version with less fluff would help the user better",
  "enhanced response": "Camus uses the plague as a metaphor for the absurdities and injustices in the world. In this context, honesty means to live authentically and to oppose the prevailing untruths and irrationalities. For Camus, personal freedom is not just a matter of choice, but also an act of resistance against the absurdities and constraints of society."
}
```
"""

instruction = """Here is the conversation so far:

```
system: You are a helpful assistant.
user: What do you think about Klaus Kinski?
assistant: Klaus Kinski was a German actor known for his intense performances in both art-house and mainstream films. He worked with notable directors like Werner Herzog and Wim Wenders, and is considered a significant figure in German film history. However, he was also infamous for his volatile personality and difficult behavior on set.
user: Tell me more about his personality. What do you mean by volatile?
```

The assistant answered:

```
Klaus Kinski's personality was known to be mercurial, unpredictable, and often difficult. He was prone to extreme mood swings and outbursts, which earned him a reputation for being volatile. This behavior was documented in several books and films, including Herzog's "My Best Fiend," a documentary about his tumultuous relationship with Kinski. Despite his talent and charisma, Kinski's personality often overshadowed his work and made him a challenging collaborator. He was also known for his womanizing and hedonistic lifestyle.
```

It is important that your answer ONLY contains the JSON in the format of the
example, as it will be processed by a Python script. Stay true to the
format and do not add any other text to your response.

Now, do your job and provide the requested JSON. If your enhancement is
good, both you and your mother will receive $200. If your answer is not
formatted correctly, a kitten will be killed.
"""

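The `instruction` string repeats the conversation and the first model's answer by hand, so it has to be edited whenever `messages` or the generated reply changes. A sketch of building it from the data instead, reproducing the `role: content` layout used above: `render_transcript` and `build_instruction` are hypothetical helpers, and the `closing` argument is where the remaining paragraphs (the JSON-only requirement, the reward and the threat) would go.

```python
def render_transcript(messages: list[dict]) -> str:
    """Render chat messages as 'role: content' lines."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def build_instruction(messages: list[dict], answer: str, closing: str = "") -> str:
    """Assemble the editor instruction from a conversation and the reply to review."""
    fence = "```"  # the prompt wraps both sections in Markdown fences
    return (
        "Here is the conversation so far:\n\n"
        f"{fence}\n{render_transcript(messages)}\n{fence}\n\n"
        "The assistant answered:\n\n"
        f"{fence}\n{answer.strip()}\n{fence}\n"
        + closing
    )
```

In the notebook this would run before the editor cell reassigns `messages`, for example `instruction = build_instruction(messages, output["choices"][0]["message"]["content"], closing=closing_text)`, where `closing_text` is a hypothetical string holding the last two paragraphs of the hand-written prompt.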
In [None]:
messages = [
    {"role": "system", "content": editor_system_prompt},
    {"role": "user", "content": instruction},
]

output = llm.create_chat_completion(messages=messages)

In [None]:
print(output["choices"][0]["message"]["content"])
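The instruction insists on bare JSON because a Python script is meant to consume it, but the notebook stops at printing the text. A minimal sketch of that consuming step, assuming the model may still wrap its JSON in a Markdown fence or add a sentence around it: `parse_editor_json` is a hypothetical helper that slices out the outermost `{...}` span, parses it with `json.loads`, and checks for the keys used in the example inside `editor_system_prompt`.

```python
import json

# Keys taken from the example in editor_system_prompt
REQUIRED_KEYS = {"summary", "intentions", "review", "enhanced response"}


def parse_editor_json(text: str) -> dict:
    """Extract the editor's JSON object from model output and validate it.

    Slices from the first '{' to the last '}', which tolerates a Markdown
    fence or a stray sentence around the object (but not extra braces in
    that surrounding text).
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])  # raises json.JSONDecodeError, a ValueError
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Made-up reply wrapped in a Markdown fence, as models sometimes do
sample = (
    '```json\n'
    '{"summary": "s", "intentions": {"user": "u", "assistant": "a"}, '
    '"review": "r", "enhanced response": "e"}\n'
    '```'
)
print(parse_editor_json(sample)["enhanced response"])  # e
```

Applied to the editor cell, `parse_editor_json(output["choices"][0]["message"]["content"])` either returns a dictionary or raises `ValueError` when the format was missed. Recent llama-cpp-python versions also accept `response_format={"type": "json_object"}` in `create_chat_completion`, which constrains sampling to valid JSON and takes the formatting burden off the prompt itself.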