In [1]:
prompt_template = """Generate two texts related to the topic of {topic}, both written in German and each being at least 100 words long. The texts should be generated in a style typical to answer the question. Never create generic texts explaining how or where to search. 

The first text should be a hard negative example. It should be related to the question or search string about {topic}, but it shouldn't answer the question:
{questions}
This text should talk about the topic in a similar way but avoid giving the answer under any circumstances. For instance, if the question is "When is Costco open?", the hard negative example might discuss Walmart's opening hours instead. Remember, the hard negative example should never give the answer to the questions.

The second text should be a positive example. It must provide the solution to the question:
{questions}
This text should be an accurate and informative piece that fully explores the topic and answers the questions. Craft a response that directly tackles the underlying question by providing a specific answer, search result, or solution, rather than giving broad advice or unrelated information. 
For example, if the question is "Search for information about the history of Berlin", provide a detailed account of Berlin's history, rather than general advice on how or where to search for historical information. Mimic the style of results the question searches for! Both texts should be of similar length to ensure consistency in comparison and should be written in German."""



# prompt_template = """You have been assigned a retrieval task {topic}
# With the following queries: 
# {questions}

# Your mission is to write one text retrieval example for this task with the following elements:
# - "positive_document": a relevant document for the query.
# - "hard_negative_document": a hard negative document that only appears relevant to the query.

# Please adhere to the following guidelines:
# - All documents must be created independent of the query. Avoid copying the query verbatim. It’s acceptable if some parts of the "positive_document" are not topically related to the query.
# - All documents should be at least 100 words long.
# - The "hard_negative_document" contains some useful information, but it should be less useful or comprehensive compared to the "positive_document".
# - The documents should be in German.
# - Do not provide any explanation in any document on why it is relevant or not relevant to the query.

# - Both the query and documents require college level education to understand."""





response_template = """Hard negative example (not containing the answer to the questions!):\n"""
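
The `response_template` above is later passed as the start of the assistant turn, so the model continues from that heading rather than opening its answer freely. The following is a minimal sketch of that prefilling idea, assuming a Mistral-style `[INST] ... [/INST]` layout. The helper names `build_prefilled_prompt` and `reassemble` are hypothetical, and the hand-written format string only stands in for whatever the tokenizer's chat template actually produces.

```python
# Sketch of response prefilling with a Mistral-style instruct format.
# The format string is an assumption for illustration only; the notebook
# relies on the tokenizer's chat template to build the exact prompt.

def build_prefilled_prompt(user_message: str, prefill: str) -> str:
    """Return a prompt that ends with the beginning of the assistant turn."""
    # No end-of-sequence token is appended, so generation continues the prefill.
    return f"<s>[INST] {user_message} [/INST]{prefill}"


def reassemble(prompt: str, completion: str) -> str:
    """Join the prefilled assistant text with the generated continuation."""
    prefill = prompt.split("[/INST]")[-1]
    return prefill + completion


prefill = "Hard negative example (not containing the answer to the questions!):\n"
prompt = build_prefilled_prompt("Generate two texts ...", prefill)
# The completion string is a made-up placeholder, not real model output.
full_text = reassemble(prompt, "Walmart hat montags bis samstags geöffnet.")
```

Because the completion returned by the model does not include the prefilled heading, the heading has to be added back when the result is stored. That is what `reassemble` illustrates.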

In [2]:
import torch
import vllm
import pandas as pd
from vllm import SamplingParams
from transformers import AutoTokenizer

model_name = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
# Low temperature keeps the generations close to deterministic.
sampling_params = SamplingParams(temperature=0.1, max_tokens=16000)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# Load the GPTQ-quantized checkpoint, sharded across two GPUs.
llm = vllm.LLM(
    model=model_name,
    quantization="gptq",
    dtype=torch.float16,
    tensor_parallel_size=2,
    max_model_len=16000,
    revision="gptq-4bit-32g-actorder_True",
    gpu_memory_utilization=0.75,
)



2024-01-30 14:24:35,555	INFO worker.py:1724 -- Started a local Ray instance.


INFO 01-30 14:24:36 llm_engine.py:72] Initializing an LLM engine with config: model='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer='TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ', tokenizer_mode=auto, revision=gptq-4bit-32g-actorder_True, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=16000, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=gptq, enforce_eager=False, seed=0)
INFO 01-30 14:24:41 weight_utils.py:164] Using model weights format ['*.safetensors']
(RayWorkerVllm pid=1715487) INFO 01-30 14:24:42 weight_utils.py:164] Using model weights format ['*.safetensors']
INFO 01-30 14:24:54 llm_engine.py:316] # GPU blocks: 1955, # CPU blocks: 4096
INFO 01-30 14:24:54 model_runner.py:625] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-30 14:24:54 model_runner



(RayWorkerVllm pid=1715487) INFO 01-30 14:25:29 model_runner.py:689] Graph capturing finished in 35 secs.
INFO 01-30 14:25:29 model_runner.py:689] Graph capturing finished in 35 secs.


In [3]:
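import pandas as pd
import numpy as np

# Load the parsed questions and add empty columns for the two generated texts.
df = pd.read_parquet("03_parsed_questions.parquet")
df[["Positive", "Hard Negative"]] = np.nan
# Reverse the row order so processing starts from the end of the file.
df = df.iloc[::-1]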

In [4]:
from tqdm import tqdm


def generate_prompt(row):
    """Build the chat-formatted prompt for one row, with the assistant turn prefilled."""
    row = row.fillna("")
    # Join the three query variants, stripping one surrounding double quote from each.
    questions = "\n".join(
        row[["Imperative Form", "Question", "Search String"]]
        .str.removesuffix('"')
        .str.removeprefix('"')
        .to_list()
    )
    topic = row["topic"]
    formatted_prompt = tokenizer.apply_chat_template(
        conversation=[
            {"role": "user", "content": prompt_template.replace("{questions}", str(questions)).replace("{topic}", str(topic))},
            {"role": "assistant", "content": response_template},
        ],
        tokenize=False,
    )
    # Drop the end-of-sequence token so the model continues the prefilled answer.
    return formatted_prompt.removesuffix("</s>")


BATCH_SIZE = 32

# Resume from the saved results: rows whose raw_texts equals the string "nan" are regenerated.
df = pd.read_parquet("04_results_texts_v3.parquet")
df_nan = df[df["raw_texts"] == "nan"]

for i in tqdm(range(0, len(df_nan), BATCH_SIZE)):
    batch = df_nan[["topic", "Imperative Form", "Question", "Search String"]].iloc[i:i + BATCH_SIZE]
    prompts = [generate_prompt(row) for _, row in batch.iterrows()]
    results = llm.generate(prompts, sampling_params=sampling_params)
    # Prepend the prefilled assistant text (everything after the last [/INST]) to each completion.
    results_adj = [result.prompt.split("[/INST]")[-1] + result.outputs[0].text for result in results]
    df.loc[batch.index, "raw_texts"] = results_adj
    # Save after every batch so an interrupted run can pick up where it stopped.
    df.to_parquet("04_results_texts_v3.parquet")



  0%|          | 0/2583 [00:00<?, ?it/s]
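
Each stored `raw_texts` entry starts with the prefilled heading from `response_template`, followed by the hard negative and then the positive text. The following is a minimal sketch of how such an entry might be split into the two parts, for example to fill the `Positive` and `Hard Negative` columns. It assumes the model introduces the second text with a line beginning with "Positive example" and ending in a colon. That heading is an assumption, not something the notebook guarantees, and `split_raw_text` is a hypothetical helper.

```python
import re

HARD_NEGATIVE_HEADER = "Hard negative example (not containing the answer to the questions!):"
# Assumed heading for the second text; adjust to what the model actually emits.
POSITIVE_PATTERN = re.compile(r"^\s*Positive example[^\n]*:\s*$", re.IGNORECASE | re.MULTILINE)


def split_raw_text(raw_text):
    """Return (hard_negative, positive), or None if the layout is unexpected."""
    text = raw_text.lstrip()
    if not text.startswith(HARD_NEGATIVE_HEADER):
        return None
    body = text[len(HARD_NEGATIVE_HEADER):]
    match = POSITIVE_PATTERN.search(body)
    if match is None:
        return None
    hard_negative = body[:match.start()].strip()
    positive = body[match.end():].strip()
    if not hard_negative or not positive:
        return None
    return hard_negative, positive


# Made-up sample text for illustration, not real model output.
sample = (
    "Hard negative example (not containing the answer to the questions!):\n"
    "Walmart hat täglich geöffnet.\n\n"
    "Positive example (containing the answer):\n"
    "Costco öffnet um 10 Uhr."
)
parsed = split_raw_text(sample)
```

Returning `None` for entries that do not match the expected layout makes it easy to flag them for regeneration instead of silently storing a malformed pair.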