In [1]:
# Import required libraries
from search import Search
from prompt import Methods, Prompt


## How to use the search tool

This notebook demonstrates how to use the production (prod) search environment and provides examples. It is based on the notebook found in the experimental environment, so please read that one first for more accurate information.
The notebook mainly focuses on:
- Creating `prompt` templates
- Creating and running a `Search`

### Requirements:
To use the environment, the following are required:
- All libraries listed in 'requirements.txt' must be installed.
- The dataset must have 'record_id' as the ID column and a 'label' column (0 or 1). In addition, at least one of the following must be present:
  * A column with 'openalex' in its name, containing a link to the article on the OpenAlex platform

  OR

  * 'title' and/or 'abstract' and/or 'keywords'
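
The column requirements above can be checked before loading a dataset. The following is a minimal sketch, not part of the search environment: the helper name `check_columns` is hypothetical, and it only inspects column names (it does not validate the 0/1 label values).

```python
def check_columns(columns):
    """Return a list of problems with a dataset's column names (empty if OK)."""
    cols = [c.strip() for c in columns]
    problems = []
    # 'record_id' and 'label' are always required.
    for required in ("record_id", "label"):
        if required not in cols:
            problems.append(f"missing required column '{required}'")
    # Either an OpenAlex link column or at least one text column must exist.
    has_openalex = any("openalex" in c for c in cols)
    has_text = any(c in ("title", "abstract", "keywords") for c in cols)
    if not (has_openalex or has_text):
        problems.append(
            "need a column with 'openalex' in its name or one of "
            "'title', 'abstract', 'keywords'"
        )
    return problems


print(check_columns(["record_id", "label", "title", "abstract"]))  # []
print(check_columns(["record_id", "openalex_url"]))  # ["missing required column 'label'"]
```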

### `prompt`
`prompt` lets you define the template for your search prompt. Like the experimental version, it supports RAG-style prompts with 'augmentation' (context) and 'prediction' parts. In prod, you have more flexibility in prompt structure and can use a wider variety of patterns and tokens.

Supported placeholders:
- `{record_id}`: The ID of the article in the dataset
- `{label_token}`: The label of the article, written as `positive_token` or `negative_token`
- `{title}`
- `{abstract}`
- `{keywords}`

*Note*: Use `{}` in the augmentation or prediction string to indicate where the list of items should appear.

#### `positive_token` and `negative_token`:
Set these for custom labeling schemes. The environment uses them to interpret LLM responses.
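
To make the placeholder mechanics concrete, here is a rough sketch of how an item pattern and the `{}` list slot could be filled. It illustrates the idea only and is not the environment's actual implementation: `render_item` and `render_prompt` are hypothetical helpers, and joining items with a newline is an assumption.

```python
def render_item(pattern, record, label_token=None):
    """Fill {record_id}, {title}, ... in an item pattern by plain replacement."""
    out = pattern
    for key in ("record_id", "title", "abstract", "keywords"):
        out = out.replace("{" + key + "}", str(record.get(key, "")))
    if label_token is not None:
        out = out.replace("{label_token}", label_token)
    return out


def render_prompt(template, item_pattern, records,
                  positive_token="<POSITIVE>", negative_token="<NEGATIVE>"):
    """Render every record with the item pattern and place the list at '{}'."""
    items = []
    for record in records:
        token = None
        if "label" in record:
            # Map the 0/1 label to the configured token.
            token = positive_token if record["label"] == 1 else negative_token
        items.append(render_item(item_pattern, record, token))
    # Assumption: items are joined with newlines; '{}' marks the list position.
    return template.replace("{}", "\n".join(items), 1)


records = [{"record_id": 7, "title": "Deep learning", "abstract": "A survey.", "label": 1}]
print(render_prompt(
    "given the following text: {}",
    "$$$ {title} {abstract} , STATUS={label_token}  $$$",
    records,
))
```

Plain string replacement is used here instead of `str.format` because item patterns may contain literal braces (for example JSON-like patterns), which `str.format` would try to interpret as fields.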

#### `prediction_method`
- `Methods.ID`: Model returns a list of IDs for relevant items.
- `Methods.TOKEN`: Model returns a token (e.g., '<POSITIVE>' or '<NEGATIVE>') for each item.
- `Methods.ID_TOKEN`: Model returns both an ID and a token for each item.
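
The three methods differ in what has to be extracted from the model's reply. The sketch below shows one way such replies could be parsed. It is an illustration under assumptions, not the environment's parser: the function names are hypothetical, record IDs are assumed to be integers, and the exact reply layout depends on your prompt.

```python
import re


def parse_ids(response):
    """ID-style reply: collect every integer ID mentioned in the response."""
    return [int(m) for m in re.findall(r"\d+", response)]


def parse_tokens(response, positive_token="<POSITIVE>", negative_token="<NEGATIVE>"):
    """TOKEN-style reply: map each token, in order, to 1 (positive) or 0 (negative)."""
    pattern = re.escape(positive_token) + "|" + re.escape(negative_token)
    return [1 if m == positive_token else 0 for m in re.findall(pattern, response)]


def parse_id_tokens(response, positive_token="<POSITIVE>", negative_token="<NEGATIVE>"):
    """ID_TOKEN-style reply: pair each ID with the token that follows it."""
    tokens = re.escape(positive_token) + "|" + re.escape(negative_token)
    pairs = re.findall(r"(\d+)\D*?(" + tokens + ")", response)
    return {int(i): 1 if t == positive_token else 0 for i, t in pairs}


print(parse_ids('["12", "40"]'))
print(parse_tokens("<POSITIVE>\n<NEGATIVE>\n<POSITIVE>"))
print(parse_id_tokens("12: <POSITIVE>\n40: <NEGATIVE>"))
```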

In [None]:
# Example: ID-based prompt for prod
prompt = Prompt(
    augmentation='You are given a list of items, each with an "ID" and a "content". Select the most relevant and return their IDs: {}',
    augmentation_item_pattern='{"ID":"{record_id}", content: {title} {abstract} }',
    prediction='{}',
    prediction_item_pattern='{"ID":"{record_id}", content: {title} {abstract} }',
    prediction_method=Methods.ID
)

# Example: TOKEN-based prompt for prod
prompt_token = Prompt(
    augmentation='given the following text: {}',
    augmentation_item_pattern='$$$ {title} {abstract} , STATUS={label_token}  $$$',
    prediction='Predict the STATUS, answer only with <POSITIVE> or <NEGATIVE>: {}',
    prediction_item_pattern='$$${title} {abstract}, STATUS= ',
    positive_token='<POSITIVE>',
    negative_token='<NEGATIVE>',
    prediction_method=Methods.TOKEN
)

### Search Object

The main interface in prod is the `Search` object. Unlike the experimental version's `Experiment`, `Search` is designed for flexible querying and supports a `user_input` parameter for custom queries.

In [None]:
# Example: create a Search object in prod
sch = Search('example_dataset_balanced.csv',
             columns=['title', 'abstract'],
             user_input='Example user query or keywords')

### Assign Prompt and Configure Search

Assign your prompt, set batch settings, and choose a model and approach. In prod, you can use more flexible batch and prompt settings than in the experimental version.

In [None]:
sch.prompt = prompt
sch.set_batch_settings(max_train_batch_size=50, max_predict_batch_size=10, batch_delay=1)
sch.model = 'gemini-2.0-flash'
sch.api_key = 'your_api_key_here'
sch.approach = 'active'  # or 'few-shot', 'zero-shot'
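
For intuition about the batch settings, the sketch below shows what size-capped batching with a pause between batches could look like. This is an assumption about the behavior, not the environment's code: `iter_batches` is a hypothetical helper, and `batch_delay` is assumed to be in seconds.

```python
import time


def iter_batches(records, max_batch_size, batch_delay=0.0):
    """Yield consecutive slices of at most max_batch_size records."""
    for start in range(0, len(records), max_batch_size):
        # Pause between batches (not before the first one), e.g. for rate limits.
        if start > 0 and batch_delay:
            time.sleep(batch_delay)
        yield records[start:start + max_batch_size]


batches = list(iter_batches(list(range(25)), max_batch_size=10))
print([len(b) for b in batches])  # [10, 10, 5]
```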

### Run

Run the search. The result of `run()` is stored in `predictions`.

In [None]:
predictions = sch.run()

### Adding Positive Examples with `set_initial_data`

You can provide additional positive examples to the Search object using `set_initial_data`. This is useful for few-shot or RAG-style prompting. The method expects a path to an augmentation dataset (CSV) and the columns to use. Only positive examples are expected in the augmentation data.

In [None]:
# Example: add positive examples from an augmentation dataset
sch.set_initial_data('augmentation_dataset.csv', columns=['title', 'abstract'])