## QA Generation

Last time I did this, it cost about $10 to create question:answer pairs. This time I'm going to use local models through Ollama instead.
This will iterate over the posts extracted from r/LocalLLaMA and generate a QA dataset.

In [26]:
import tqdm
import ollama
import pprint as pp


In [29]:
# test
response = ollama.chat(
    model='mistral:latest',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    # Sampling parameters belong in `options`, not inside the message dict.
    options={'temperature': 0.01},
)
print(response['message']['content'])

 The color of the sky appears blue due to a phenomenon called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with molecules and particles in the air, such as nitrogen and oxygen. Blue light has a shorter wavelength and gets scattered more easily than other colors in the visible spectrum. As a result, when we look up at the sky, we predominantly see the blue light that has been scattered, giving the sky its characteristic blue hue during a clear day. However, at sunrise or sunset, the sky can take on various shades of red, orange, and purple as the sunlight interacts with more particles in the atmosphere and the angle of the sun's rays changes.


---

In [30]:
DATA_PATH = "./_output/new/localllama-new-17-02-2024.txt"
with open(DATA_PATH, "r", encoding="utf-8") as file:
    data = file.read()

data_chunks = data.split("---\nPost ID:")

print(f"There are {len(data_chunks)} posts in total")
data_chunks[:3]

There are 975 posts in total


["Post ID: 1at0288\nTitle: Ok, which one of you was this? 🤣🤣🤣\nLink: https://redd.it/1at0288\nContent: \nReplies:\n- No, I don't think OpenAI would ever allow porn to be generated. I rather think that copies of Sora, recreated open source image generators will appear and fullfill this task. Porn is always one of the first use cases in any technologie that appeared and I don't think it'll take long for the industry to hop into this new tech. This is good for us as it further pushes open source AI technology for any use case.\n\n",
 ' 1aszy6f\nTitle: What are your favorite resources for evaluating text generation for stuff like readability, engagement (and other "soft" metrics)\nLink: https://redd.it/1aszy6f\nContent: Hi everyone, i\'m working on a thesis looking at different prompt engineering methods and trying to evaluate the quality of generated content for stuff like articles, newsletters = human read content. Most research focuses on stuff like factuality, reasoning but I\'m trying
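
Because of the split on `"---\nPost ID:"`, only the first chunk keeps its `Post ID:` prefix. Here's a rough sketch of normalising a chunk into named fields. `parse_post` is a hypothetical helper (not part of the notebook), and it assumes every post follows the `Title:` / `Link:` / `Content:` / `Replies:` layout seen in the output above; anything else is returned untouched under a `raw` key.

```python
import re

# Assumed layout: <id>\nTitle: ...\nLink: ...\nContent: ...\nReplies: ...
POST_PATTERN = re.compile(
    r"^(?P<post_id>\S+)\nTitle: (?P<title>.*?)\nLink: (?P<link>\S*)\n"
    r"Content:(?P<content>.*?)\nReplies:(?P<replies>.*)$",
    re.DOTALL,
)


def parse_post(chunk: str) -> dict:
    """Split one raw chunk into post_id, title, link, content and replies."""
    text = chunk.strip()
    # Only the first chunk still carries the "Post ID:" prefix after the split.
    if text.startswith("Post ID:"):
        text = text[len("Post ID:"):].lstrip()
    match = POST_PATTERN.match(text)
    if match is None:
        # Layout differs from the assumption, so keep the text as-is.
        return {"raw": text}
    return {key: value.strip() for key, value in match.groupdict().items()}
```

Something like `posts = [parse_post(c) for c in data_chunks]` would then give one dict per post, which makes it easier to skip posts with empty content or no replies before sending them to the model.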

In [33]:
response_chunks = []

print("Generating QA Pairs...")
for chunk in tqdm.tqdm(data_chunks):
    prompt = f"""
    ```
    {chunk}
    ```
    \n
    Your job is to look at this reddit post and to produce several question/answer pairs based on the content provided. 
    Look at the replies also and try to extract informative technical information. 
    Do not produce QA pairs for anything that is not in the provided text. 
    For longer posts (such as ones with a lot of information in the content or with many comments) produce a lot of QA pairs. 
    For posts with less content, produce fewer. Only include QA pairs with general useful information. 
    Do not include anything that could be considered personal information, opinion, or conversational text. 
    Only provide the QA pairs. Do NOT provide introductions or conclusions. Write your answer in this format:

    ```
    Q: What is the colour of the sky?
    A: The colour of the sky is blue.
    ---
    Q: How old is OpenAI? 
    A: OpenAI was founded in 2015, therefore it is 8 years old.
    ```
    """

    response_chunks.append(
        ollama.chat(
            model='mistral:latest',
            messages=[{'role': 'user', 'content': prompt}],
            # Sampling parameters belong in `options`, not inside the message dict.
            options={'temperature': 0.2},
        )
    )

print("Done!")

Generating QA Pairs...


100%|██████████| 975/975 [4:45:06<00:00, 17.55s/it]    

Done!
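
The prompt asks for `Q:` / `A:` pairs separated by `---`, so the raw responses still need to be split into individual pairs before they are usable as a dataset. A minimal sketch of that step follows. `parse_qa_pairs` is a hypothetical helper, and it assumes the model actually followed the requested format; blocks that don't contain both a `Q:` and an `A:` line are silently dropped.

```python
import re


def parse_qa_pairs(text: str) -> list[dict]:
    """Extract {"question", "answer"} dicts from one model response."""
    # Models sometimes echo the code fences from the prompt; drop them.
    text = text.replace("```", "")
    pairs = []
    # Pairs are separated by a line that contains only dashes.
    for block in re.split(r"^[ \t]*-{3,}[ \t]*$", text, flags=re.MULTILINE):
        match = re.search(r"Q:\s*(.+?)\s*\nA:\s*(.+)", block.strip(), flags=re.DOTALL)
        if match is None:
            continue  # intro text, empty block, or malformed pair
        question, answer = match.groups()
        pairs.append({
            "question": " ".join(question.split()),
            "answer": " ".join(answer.split()),  # collapse line breaks
        })
    return pairs
```

Running it over every response, e.g. `parse_qa_pairs(r['message']['content'])`, should give a flat list of pairs, and comparing the pair count per post is a cheap way to spot responses that ignored the format.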





In [81]:
pp.pprint(response_chunks[99]['message']['content'])

(' Q: What is the current capacity and bandwidth of consumer desktop RAM '
 'compared to that of high-end GPUs?\n'
 'A: Current high-end consumer desktops have up to 64GB of RAM with a maximum '
 'bandwidth of around 32 GBy/s, while high-end GPUs can have VRAM sizes '
 'ranging from 12GB to over 256GB and VRAM bandwidths exceeding 500 GBy/s.\n'
 '---\n'
 'Q: What is the significance of GPU architecture evolution for HPC and AIML?\n'
 'A: The lack of improvement in consumer desktop hardware, such as ECC RAM '
 'capacity, number of memory channels, and available PCIe lanes, hinders the '
 'development of high-performance computing (HPC) and artificial intelligence '
 'machine learning (AIML) applications. GPUs have vastly more VRAM bandwidth '
 'than current consumer desktops, which is crucial for HPC and AIML '
 'workloads.\n'
 '---\n'
 'Q: What is the Tesla M10 GPU and what are its advantages?\n'
 'A: The Tesla M10 is a GPU designed for server use, featuring 32GB of VRAM '
 "and relati

In [46]:
data_chunks[3]

' 1aszfil\nTitle: LM Studio prompt settings for Mixtral 11Bx2 MoE 19B GGUF?\nLink: https://redd.it/1aszfil\nContent: Hello friends.\n\nI downloaded this model, but I have a problem with Its prompt format.\n\n[TheBloke/Mixtral\\_11Bx2\\_MoE\\_19B-GGUF · Hugging Face](https://huggingface.co/TheBloke/Mixtral_11Bx2_MoE_19B-GGUF)\n\nIn Models card I can only see this:\n\nPrompt template: None\n\n{prompt}\n\nBut I think in LM Studio I\'m forced to select a preset. Default preset for LM Studio is not working well.\n\nCan you help me please?\n\nThank you\n\nEdit:\n\nGemini gave me this. Do you think it\'s OK?\n\n  \n\nJSON\n\n{\n\n"model\\_name": "Mixtral\\_11Bx2\\_MoE\\_19B-GGUF",\n\n"model\\_path": "/path/to/model/directory", # Replace with your actual model path\n\n"prompt\\_template": "{prompt}", # No specific prompt template needed based on model card\n\n"batch\\_size": 1,\n\n"sequence\\_length": 2048,\n\n"temperature": 0.7,\n\n"top\\_p": 0.9,\n\n"sampling\\_method": "nucleus",\n\n"nucleu

In [54]:
import pickle
with open('./_output/response1.pkl', "wb") as file:
    pickle.dump(response_chunks, file) 
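
The pickle keeps the full response objects, which is handy for resuming but awkward to inspect. Below is a sketch of loading it back and exporting just the generated text as JSON Lines. `load_responses` and `export_contents` are hypothetical helper names, and the output path is whatever you pass in; the only thing assumed about each response is the `['message']['content']` access already used above.

```python
import json
import pickle


def load_responses(path):
    """Read back a list of responses written with pickle.dump."""
    with open(path, "rb") as file:
        return pickle.load(file)


def export_contents(responses, out_path):
    """Write one JSON object per line holding only the generated text."""
    with open(out_path, "w", encoding="utf-8") as file:
        for index, response in enumerate(responses):
            record = {"index": index, "content": response["message"]["content"]}
            # ensure_ascii=False keeps emoji and other non-ASCII text readable.
            file.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The `index` field lines up with the position in `data_chunks`, so a generated answer can be traced back to the post it came from.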

### Ollama client

`OLLAMA_HOST=127.0.0.1:5050 ollama serve`

In [83]:
from ollama import Client
client = Client(host='http://127.0.0.1:5050')

response_chunks = []

print("Generating QA Pairs...")
for chunk in tqdm.tqdm(data_chunks):
    prompt = f"""
    ```
    {chunk}
    ```
    \n
    Your job is to look at this reddit post and to produce several question/answer pairs based on the content provided. 
    Look at the replies also and try to extract informative technical information. 
    Do not produce QA pairs for anything that is not in the provided text. 
    For longer posts (such as ones with a lot of information in the content or with many comments) produce a lot of QA pairs. 
    For posts with less content, produce fewer. Only include QA pairs with general useful information. 
    Do not include anything that could be considered personal information, opinion, or conversational text. 
    Only provide the QA pairs. Do NOT provide introductions or conclusions. Write your answer in this format:

    ```
    Q: What is the colour of the sky?
    A: The colour of the sky is blue.
    ---
    Q: How old is OpenAI? 
    A: OpenAI was founded in 2015, therefore it is 8 years old.
    ```
    """

    response_chunks.append(
        client.chat(
            model='llama2:latest',
            messages=[{'role': 'user', 'content': prompt}],
            # Sampling parameters belong in `options`, not inside the message dict.
            options={'temperature': 0.2},
        )
    )

print("Done!")

Generating QA Pairs...


  0%|          | 0/975 [00:00<?, ?it/s]

  0%|          | 2/975 [00:38<5:13:47, 19.35s/it]

In [None]:
with open('./_output/response2_llama.pkl', "wb") as file:
    pickle.dump(response_chunks, file) 
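
Both runs take around five hours and only get pickled at the very end, so a crash partway through loses everything. One possible way around that is sketched below: a loop that takes the model call as a function and writes a checkpoint every few posts. `generate_responses`, `chat_fn`, and the `every` parameter are all hypothetical names introduced for illustration, not something the notebook or Ollama provides.

```python
import pickle


def generate_responses(chunks, chat_fn, checkpoint_path, every=50):
    """Call chat_fn(chunk) for each chunk, pickling progress periodically."""
    responses = []
    for index, chunk in enumerate(chunks, start=1):
        try:
            responses.append(chat_fn(chunk))
        except Exception as error:
            # Record the failure and keep going so one bad post
            # does not abort a multi-hour run.
            responses.append({"error": str(error)})
        if index % every == 0:
            with open(checkpoint_path, "wb") as file:
                pickle.dump(responses, file)
    # Final write so the checkpoint always matches the returned list.
    with open(checkpoint_path, "wb") as file:
        pickle.dump(responses, file)
    return responses
```

With the client above, `chat_fn` could be a small function that builds the prompt for a chunk and returns `client.chat(model='llama2:latest', messages=[...], options={'temperature': 0.2})`. Keeping the model call separate also means the same loop works for both the Mistral and Llama 2 runs.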