## QA Generation

Last time I did this, it cost about $10 to create the question/answer pairs. This time I'm going to use local models through Ollama instead.
The cells below iterate over the posts extracted from r/LocalLLaMA and generate a QA dataset.

In [97]:
import tqdm
import ollama
import pickle
import pprint as pp


In [98]:
# test: sampling settings such as temperature go in `options`, not inside the message dict
response = ollama.chat(
    model='mistral:latest',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    options={'temperature': 0.01},
)
print(response['message']['content'])

 The reason why the sky appears blue during a clear day is due to a particular type of scattering called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with different gases and particles present in the air. Blue light has a shorter wavelength and gets scattered more easily than other colors in the visible spectrum like red or yellow. This scattered blue light then enters our eyes from all directions, giving the sky its characteristic blue hue.


In [99]:
response

{'model': 'mistral:latest',
 'created_at': '2024-02-18T09:16:28.075788Z',
 'message': {'role': 'assistant',
  'content': " The reason why the sky appears blue during a clear day is due to a particular type of scattering called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with different gases and particles present in the air. Blue light has a shorter wavelength and gets scattered more easily than other colors in the visible spectrum like red or yellow. This scattered blue light then enters our eyes from all directions, giving the sky its characteristic blue hue."},
 'done': True,
 'total_duration': 11635267083,
 'load_duration': 8825254416,
 'prompt_eval_count': 15,
 'prompt_eval_duration': 149672000,
 'eval_count': 93,
 'eval_duration': 2659788000}
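
The duration fields in this response are reported in nanoseconds, so they are easier to read after conversion. A minimal sketch using the values printed above; the helper name `tokens_per_second` is my own and not part of the ollama package.

```python
# Values copied from the response printed above; all durations are in nanoseconds.
stats = {
    'total_duration': 11635267083,
    'load_duration': 8825254416,
    'prompt_eval_count': 15,
    'prompt_eval_duration': 149672000,
    'eval_count': 93,
    'eval_duration': 2659788000,
}

NS_PER_S = 1_000_000_000


def tokens_per_second(count, duration_ns):
    """Convert a token count and a nanosecond duration into tokens per second."""
    return count / (duration_ns / NS_PER_S)


gen_tps = tokens_per_second(stats['eval_count'], stats['eval_duration'])
load_s = stats['load_duration'] / NS_PER_S
total_s = stats['total_duration'] / NS_PER_S

print(f"generation: {gen_tps:.1f} tokens/s")
print(f"model load: {load_s:.1f}s of {total_s:.1f}s total")
```

For this call, generation ran at roughly 35 tokens/s, and about 8.8 s of the 11.6 s total went to loading the model, a cost that should only apply to the first request while the model stays in memory.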

In [100]:
# test: generate() expects a `prompt` string, not `messages`, so this call raises a TypeError
response = ollama.generate(model='mistral:latest', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
    'temperature': 0.01,
  },
])
print(response['message']['content'])

TypeError: Client.generate() got an unexpected keyword argument 'messages'
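
The error comes from the difference between the two calls: `chat` takes a list of `messages`, while `generate` takes a single `prompt` string. In both cases sampling settings go in `options`. A small sketch that builds the keyword arguments for each call without contacting a server; the helper names `chat_kwargs` and `generate_kwargs` are hypothetical.

```python
def chat_kwargs(prompt, model='mistral:latest', temperature=0.2):
    """Keyword arguments for ollama.chat(): the prompt is wrapped in a messages list."""
    return {
        'model': model,
        'messages': [{'role': 'user', 'content': prompt}],
        'options': {'temperature': temperature},
    }


def generate_kwargs(prompt, model='mistral:latest', temperature=0.2):
    """Keyword arguments for ollama.generate(): the prompt is passed as a plain string."""
    return {
        'model': model,
        'prompt': prompt,
        'options': {'temperature': temperature},
    }


chat_req = chat_kwargs('Why is the sky blue?', temperature=0.01)
gen_req = generate_kwargs('Why is the sky blue?', temperature=0.01)
print(chat_req)
print(gen_req)
```

With a server running, `ollama.chat(**chat_req)['message']['content']` and `ollama.generate(**gen_req)['response']` should each return the generated text.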

---

### Ollama client

Start the Ollama server on port 5050 before running the cells below:

`OLLAMA_HOST=127.0.0.1:5050 ollama serve`

In [30]:
DATA_PATH = "./_output/new/localllama-new-17-02-2024.txt"
with open(DATA_PATH, "r") as file:
    data = file.read()

data_chunks = data.split("---\nPost ID:")

print(f"There are {len(data_chunks)} questions in total")
data_chunks[:3]

There are 975 questions in total


["Post ID: 1at0288\nTitle: Ok, which one of you was this? 🤣🤣🤣\nLink: https://redd.it/1at0288\nContent: \nReplies:\n- No, I don't think OpenAI would ever allow porn to be generated. I rather think that copies of Sora, recreated open source image generators will appear and fullfill this task. Porn is always one of the first use cases in any technologie that appeared and I don't think it'll take long for the industry to hop into this new tech. This is good for us as it further pushes open source AI technology for any use case.\n\n",
 ' 1aszy6f\nTitle: What are your favorite resources for evaluating text generation for stuff like readability, engagement (and other "soft" metrics)\nLink: https://redd.it/1aszy6f\nContent: Hi everyone, i\'m working on a thesis looking at different prompt engineering methods and trying to evaluate the quality of generated content for stuff like articles, newsletters = human read content. Most research focuses on stuff like factuality, reasoning but I\'m trying
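
Note that the split leaves the `Post ID:` prefix on the first chunk only; every later chunk starts directly with the ID. A sketch of how one chunk could be broken into fields, assuming the `Title:` / `Link:` / `Content:` / `Replies:` layout seen above and that each reply starts with `- ` at the beginning of a line. `parse_post` is a hypothetical helper, and the sample strings are shortened stand-ins for real chunks.

```python
import re

POST_PATTERN = re.compile(
    r'(?P<id>\S+)\nTitle: (?P<title>.*?)\nLink: (?P<link>\S+)'
    r'\nContent:(?P<content>.*?)\nReplies:(?P<replies>.*)',
    flags=re.DOTALL,
)


def parse_post(chunk):
    """Split one raw chunk into id, title, link, content and a list of replies."""
    text = chunk.strip()
    # Only the first chunk still carries the "Post ID:" prefix after the split.
    if text.startswith('Post ID:'):
        text = text[len('Post ID:'):].strip()
    match = POST_PATTERN.match(text)
    if match is None:
        return None  # chunk does not follow the expected layout
    post = {key: value.strip() for key, value in match.groupdict().items()}
    # Assumption: every reply begins with "- " at the start of a line.
    parts = re.split(r'^- ', post['replies'], flags=re.MULTILINE)
    post['replies'] = [part.strip() for part in parts if part.strip()]
    return post


first = parse_post(
    "Post ID: 1at0288\nTitle: Ok, which one of you was this?\n"
    "Link: https://redd.it/1at0288\nContent: \nReplies:\n- First reply\n\n"
)
later = parse_post(
    " 1aszy6f\nTitle: Some title\nLink: https://redd.it/1aszy6f\n"
    "Content: Hi everyone\nReplies:\n- a\n- b\n"
)
print(first)
print(later)
```

A reply that itself contains a markdown bullet list would be split into several entries by this approach, so treat the reply list as approximate.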

In [33]:
from ollama import Client
client = Client(host='http://127.0.0.1:5050')

response_chunks = []

print("Generating QA Pairs...")
for chunk in tqdm.tqdm(data_chunks):
    prompt = f"""
    ```
    {chunk}
    ```
    \n
    Your job is to look at this reddit post and to produce several question/answer pairs based on the content provided. 
    Look at the replies also and try to extract informative technical information. 
    Do not produce QA pairs for anything that is not in the provided text. 
    For longer posts (such as ones with a lot of information in the content or with many comments) produce a lot of QA pairs. 
    For posts with less content, produce fewer. Only include QA pairs with general useful information. 
    Do not include anything that could be considered personal information, opinion, or conversational text. 
    Only provide the QA pairs. Do NOT provide introductions or conclusions. Write your answer in this format:

    ```
    Q: What is the colour of the sky?
    A: The colour of the sky is blue.
    ---
    Q: How old is OpenAI? 
    A: OpenAI was founded in 2015, therefore it is 8 years old.
    ```
    """

    # chat() accepts `messages` (generate() does not); temperature belongs in `options`
    response_chunks.append(
        client.chat(
            model='mistral:latest',
            messages=[{'role': 'user', 'content': prompt}],
            options={'temperature': 0.2},
        )
    )

print("Done!")

Generating QA Pairs...


100%|██████████| 975/975 [4:45:06<00:00, 17.55s/it]    

Done!
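
A run of almost five hours is worth protecting against a crash partway through. One possible approach, sketched under assumptions: save partial results every few chunks and skip already-processed chunks on a rerun. `run_with_checkpoints`, `save_every`, and the checkpoint file name are hypothetical, and `fake_generate` stands in for the real `client.chat(...)` call so the example runs without a server.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(chunks, generate_fn, checkpoint_path, save_every=25):
    """Call generate_fn on each chunk, saving partial results so a rerun can resume."""
    results = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, 'rb') as f:
            results = pickle.load(f)
    # Chunks already covered by the checkpoint are skipped.
    for i in range(len(results), len(chunks)):
        results.append(generate_fn(chunks[i]))
        if (i + 1) % save_every == 0:
            with open(checkpoint_path, 'wb') as f:
                pickle.dump(results, f)
    # Final save so the file always reflects the completed run.
    with open(checkpoint_path, 'wb') as f:
        pickle.dump(results, f)
    return results


calls = []


def fake_generate(chunk):
    """Stand-in for the model call; records which chunks were actually processed."""
    calls.append(chunk)
    return {'message': {'content': f'QA pairs for {chunk}'}}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'responses.pkl')
    first = run_with_checkpoints(['a', 'b'], fake_generate, path, save_every=1)
    # Second run sees the checkpoint and only processes the new chunk 'c'.
    resumed = run_with_checkpoints(['a', 'b', 'c'], fake_generate, path, save_every=1)

print(len(first), len(resumed), calls)
```

If `generate_fn` raises, anything produced since the last save is lost, so a smaller `save_every` trades a little disk I/O for less repeated work.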





In [96]:
pp.pprint(data_chunks[99])

(' 1ar7lfq\n'
 'Title: The Dilemma of AI Accelerators: Bridging the Gap for Affordable '
 'Solutions\n'
 'Link: https://redd.it/1ar7lfq\n'
 'Content: Why is there no middle ground with AI accelerators? The current '
 'landscape presents us with either small USB AI accelerators lacking memory '
 'or massive data center solutions like GH200 Superchip, Gaudi, or Graphcore. '
 "It's time we address this issue and advocate for a more balanced approach.\n"
 '\n'
 'GPUs, despite their versatility, fall short in terms of cost-efficiency '
 'compared to purpose-built NPUs/TPUs. A simplified solution, like a 24GB '
 'A2000 or A4000 without unnecessary features such as NVENC or display outs, '
 'could be a game-changer. While not suitable for gaming, it would meet the '
 'majority of consumer AI needs at an affordable price point.\n'
 '\n'
 'Some might argue, "Isn\'t that what workstation/data center GPUs are for?" '
 'Well, not quite. The existing GPUs often have an imbalanced ratio of compute '

In [81]:
pp.pprint(response_chunks[99]['message']['content'])

(' Q: What is the current capacity and bandwidth of consumer desktop RAM '
 'compared to that of high-end GPUs?\n'
 'A: Current high-end consumer desktops have up to 64GB of RAM with a maximum '
 'bandwidth of around 32 GBy/s, while high-end GPUs can have VRAM sizes '
 'ranging from 12GB to over 256GB and VRAM bandwidths exceeding 500 GBy/s.\n'
 '---\n'
 'Q: What is the significance of GPU architecture evolution for HPC and AIML?\n'
 'A: The lack of improvement in consumer desktop hardware, such as ECC RAM '
 'capacity, number of memory channels, and available PCIe lanes, hinders the '
 'development of high-performance computing (HPC) and artificial intelligence '
 'machine learning (AIML) applications. GPUs have vastly more VRAM bandwidth '
 'than current consumer desktops, which is crucial for HPC and AIML '
 'workloads.\n'
 '---\n'
 'Q: What is the Tesla M10 GPU and what are its advantages?\n'
 'A: The Tesla M10 is a GPU designed for server use, featuring 32GB of VRAM '
 "and relati

In [54]:
# save as pickle file
with open('./_output/new/response1.pkl', "wb") as file:
    pickle.dump(response_chunks, file) 
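
Each saved response still holds its QA pairs as one string in the `Q: ... A: ... ---` format requested in the prompt. A sketch of how that string could be split into structured pairs, assuming the model followed the format; `parse_qa_pairs` is a hypothetical helper, and the sample text is the example from the prompt.

```python
import re


def parse_qa_pairs(text):
    """Split a 'Q: ... A: ... ---' formatted response into (question, answer) tuples."""
    pairs = []
    # Pairs are separated by a line containing only '---'.
    for block in re.split(r'^[ \t]*---[ \t]*$', text, flags=re.MULTILINE):
        match = re.search(
            r'Q:\s*(.*?)\s*^A:\s*(.*)', block, flags=re.DOTALL | re.MULTILINE
        )
        if match:
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs


sample = (
    " Q: What is the colour of the sky?\n"
    "A: The colour of the sky is blue.\n"
    "---\n"
    "Q: How old is OpenAI? \n"
    "A: OpenAI was founded in 2015, therefore it is 8 years old.\n"
)
pairs = parse_qa_pairs(sample)
for question, answer in pairs:
    print(question, '->', answer)
```

Blocks without both a `Q:` and an `A:` line are silently skipped, so comparing the number of parsed pairs against the number of `---` separators is a cheap way to spot malformed responses.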