# Batch Run Example

This example demonstrates how to build chain-of-thought (CoT) search results for a batch of question/answer pairs using the CoTBuilder class. Results are saved to files in jsonl format and are also kept in memory.

#### Imports

This example uses LMStudio for local model serving and OpenAI for generation via an external API. CoT-Forge also supports other major LLM providers such as Gemini, Groq, and Anthropic. To follow along, download LMStudio [here](https://lmstudio.ai/), download a model, and start a local server.

In [1]:
from datasets import load_dataset

from cot_forge.llm import LMStudioProvider, OpenAIProvider
from cot_forge.reasoning import CoTBuilder, NaiveLinearSearch
from cot_forge.reasoning.verifiers import LLMJudgeVerifier

#### Dataset
The dataset we will use is [medical-o1-verifiable-problem by FreedomIntelligence on HuggingFace](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem). This dataset contains a set of open-ended medical questions and ground truth answers.



In [2]:
ds = load_dataset("FreedomIntelligence/medical-o1-verifiable-problem", split="train[:200]")
problems = ds["Open-ended Verifiable Question"]
solutions = ds["Ground-True Answer"]

print(f"Problem #1: {problems[0]}")
print(f"Solution #1: {solutions[0]}")

Problem #1: An 88-year-old woman with osteoarthritis is experiencing mild epigastric discomfort and has vomited material resembling coffee grounds multiple times. Considering her use of naproxen, what is the most likely cause of her gastrointestinal blood loss?
Solution #1: Gastric ulcer


#### Using CoTBuilder class to run batch build

In [None]:
# gpt-4o is our search_llm and generates the reasoning steps.
# meta-llama-3-8b-instruct, served locally, is used for the verifier and post-processing.
llama = LMStudioProvider(model_name="meta-llama-3-8b-instruct")
gpt_4o = OpenAIProvider(model_name="gpt-4o")

builder = CoTBuilder(
    search_llm=gpt_4o, # generates reasoning steps
    search=NaiveLinearSearch(), # Naive linear search chooses random reasoning steps in a chain
    verifier=LLMJudgeVerifier(llm_provider=llama, strict=False), # llama to verify answers
    post_processing_llm=llama, # converts reasoning into natural language
    dataset_name="medical-o1-verifiable-problem", # dataset name, used for folder structure
    base_dir= "./data", # base directory to save the results
)

Let's process our dataset using the process_batch() method of CoTBuilder. This method can be run with multi-threading to speed up the process or with a single thread. Because we supplied a dataset_name parameter, the results will be saved to a file in jsonl format.

In [4]:
results = builder.process_batch(
    questions=problems, # List of questions
    ground_truth_answers=solutions, # List of ground truth answers
    multi_thread=True, # Use multi-threading for processing
    max_workers=4, # Number of workers to use for processing
    load_processed=True, # Load previously processed results if available
    only_successful=False, # If True, only successful results are processed into natural language
    overwrite=True, # Overwrite existing results if any collisions occur
    limit=20 # Limit the number of questions to process
)

Multi-thread processing question and ground truth answer pairs.: 100%|██████████| 20/20 [09:02<00:00, 27.11s/pair]
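
To give a feel for what `multi_thread`, `max_workers`, and `limit` control, here is a minimal sketch of thread-pool batch processing using only the standard library. It is not CoT-Forge's implementation: `process_pairs` and `fake_worker` are hypothetical names, and the worker stands in for the LLM calls that dominate the run time.

```python
from concurrent.futures import ThreadPoolExecutor


def process_pairs(pairs, worker, max_workers=4, limit=None):
    """Apply `worker` to each (question, answer) pair using a thread pool.

    Hypothetical helper for illustration only; it is not CoT-Forge code.
    """
    if limit is not None:
        pairs = pairs[:limit]  # cap the number of pairs, like `limit` above
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, even when
        # individual tasks finish out of order.
        return list(pool.map(lambda pair: worker(*pair), pairs))


def fake_worker(question, answer):
    # Stand-in for a slow, I/O-bound LLM call.
    return f"{question} -> {answer}"


pairs = [("q1", "a1"), ("q2", "a2"), ("q3", "a3")]
batch_output = process_pairs(pairs, fake_worker, max_workers=2, limit=2)
print(batch_output)
```

Threads suit this kind of workload because each task spends most of its time waiting on a network or local-server response rather than computing.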


#### Examining results

Results are returned as a list of tuples, each with a [SearchResult](../src/cot_forge/reasoning/types.py#L137) object and a dictionary with the natural language reasoning text.

In [29]:
sample_result, reasoning = results[7]
print(sample_result)

SearchResult(success=True, question=A 46-year-old Caucasian male w..., num_terminal_nodes=1, num_successful_nodes=1, successful_answers=["The most appropriate initial diagnostic test for this patient's acute neurological condition is a CT scan of the head. It offers a rapid evaluation to identify any significant lesions or hemorrhages."])


Our search result was a success! Let's dig in further.

In [30]:
# Each SearchResult object contains a list of terminal nodes.
# Because we used a linear search, there is only one chain and therefore one terminal node.
terminal_node = sample_result.terminal_nodes[0]
for i, node in enumerate(terminal_node.get_full_node_chain()):
  print(f"Step {i}: {node}")

Step 0: ReasoningNode(strategy=initialize, success=False, final=False, cot_steps=4)
Step 1: ReasoningNode(strategy=explore_new_paths, success=True, final=True, cot_steps=6)


#### Natural language reasoning
Let's examine the natural language reasoning text from our sample result.

In [32]:
print(reasoning['chain_of_thought_responses'][0])

<thinking>
Hmm, an HIV patient with a low CD4 count and neurological symptoms - that's a red flag. The sudden onset of right hand weakness and high fever makes me think something's going on in his brain.

Oh, I need to consider what could be causing this. Given the patient's history and symptoms, conditions like cerebral toxoplasmosis, primary CNS lymphoma, or progressive multifocal leukoencephalopathy (PML) come to mind. And with that fever, it's likely an infectious process is at play.

Wait a minute... in acute settings like this, speed is crucial. We need something quick and dirty to guide our next steps. MRI would be ideal for getting a detailed picture of what's going on in his brain, but it's not the fastest option.

Actually, a CT scan might be just what we need here. It's quicker than an MRI and can give us some immediate information about major lesions or hemorrhages. Plus, it can rule out life-threatening issues right off the bat.

Also, considering the patient's history of 

#### Results storage

In our CoTBuilder object, we specified `dataset_name="medical-o1-verifiable-problem"` and `base_dir= "./data"`. This means that the results will be saved to a folder called `./data/medical-o1-verifiable-problem` in the current working directory. Additionally, another folder at `./data/medical-o1-verifiable-problem/naive_linear_search` was created to store the results of the naive linear search. If a different search algorithm were used, the folder name would reflect that and results would be stored there instead to avoid overwriting.

Generally, the results will be saved in a folder structure like this:

```bash
base_dir/
└── dataset_name/
    ├── search_algorithm/
    │   ├── config.json
    │   ├── metadata.json
    │   ├── results.jsonl
    │   └── reasoning.jsonl
```
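
The layout above can be expressed as a small path helper. This is a sketch using `pathlib`; `results_dir` is a hypothetical name, not a CoT-Forge function, and the file names are taken from the tree above.

```python
from pathlib import Path


def results_dir(base_dir, dataset_name, search_name):
    """Build the results folder path following the layout shown above."""
    return Path(base_dir) / dataset_name / search_name


folder = results_dir("./data", "medical-o1-verifiable-problem", "naive_linear_search")
expected_files = ["config.json", "metadata.json", "results.jsonl", "reasoning.jsonl"]

# Report which of the expected files are not on disk yet.
missing = [name for name in expected_files if not (folder / name).exists()]
print(f"Results folder: {folder}")
print(f"Missing files: {missing}")
```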

In [46]:
import os

print(os.listdir("./data/medical-o1-verifiable-problem/naive_linear_search"))

['reasoning.jsonl', 'config.json', 'metadata.json', 'results.jsonl']


The files contain the following information:
- `config.json`: The configuration of the CoTBuilder object used to run the search.
- `metadata.json`: Metadata about search progress such as completed items, successful items, last updated time, etc.
- `results.jsonl`: Serialized [SearchResult](../src/cot_forge/reasoning/types.py#L137) objects in jsonl format. They can be deserialized using the SearchResult.deserialize() method.
- `reasoning.jsonl`: Dictionaries with the natural language reasoning text. Each dictionary contains the following keys:
  - `question`: The question that was asked.
  - `ground_truth`: The ground truth answer for the question.
  - `chain_of_thought_responses`: A list of strings, representing the processed reasoning text for each stored chain in a processed question.
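
Since jsonl files hold one JSON object per line, `reasoning.jsonl` can be read with the standard library alone. The sketch below uses a hypothetical `read_jsonl` helper and runs on a throwaway file with made-up records that use the keys listed above, so it works without the real results on disk.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo on a temporary file; the records are illustrative, not real results.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "reasoning.jsonl"
    records = [
        {"question": "Q1?", "ground_truth": "A1", "chain_of_thought_responses": ["text 1"]},
        {"question": "Q2?", "ground_truth": "A2", "chain_of_thought_responses": []},
    ]
    demo_path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
    loaded = list(read_jsonl(demo_path))

# Keep only the questions that have at least one stored reasoning chain.
with_reasoning = [r["question"] for r in loaded if r["chain_of_thought_responses"]]
print(with_reasoning)
```

Pointing `read_jsonl` at `./data/medical-o1-verifiable-problem/naive_linear_search/reasoning.jsonl` should iterate over the records written by the batch run.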

In [48]:
# Looking at the config file
import json

with open("./data/medical-o1-verifiable-problem/naive_linear_search/config.json") as f:
    config = json.load(f)
print(config)

{'search_llm': {'model_name': 'gpt-4o', 'input_token_limit': None, 'output_token_limit': None}, 'post_processing_llm': {'model_name': 'meta-llama-3-8b-instruct', 'input_token_limit': None, 'output_token_limit': None}, 'search': {'class_name': 'NaiveLinearSearch', 'name': 'naive_linear_search', 'description': 'A sequential search algorithm that randomly selects and applies reasoning strategies to build a chain of thought. Continues until verification succeeds or max depth is reached.', 'max_depth': 3}, 'verifier': {'name': 'llm_judge_verifier', 'description': 'A basic LLM judge verifier that compares a final answer with a ground truth answer.', 'llm_provider': {'model_name': 'meta-llama-3-8b-instruct', 'input_token_limit': None, 'output_token_limit': None}, 'llm_kwargs': {}, 'prompt_template': 'You are an answer judge.\nYou are tasked with verifying the correctness of an answer to a question.\nVerify if the provided answer successfully matches the ground truth answer.\nThey do not need 