## Installation 

This step is not required if you are running this notebook on Binder. If you run this notebook locally or in another environment (e.g. Colab, JupyterLab), you need to install the Python dependencies first; we recommend setting up a virtual Python environment beforehand. To install the dependencies, uncomment and run the commands in the following cell.

In [None]:
# !git clone https://github.com/Sydney-Informatics-Hub/LCT_sequencing
# %cd LCT_sequencing
# %pip install -r requirements.txt
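The text above recommends a virtual environment for local installs. A minimal sketch of creating one from Python with the standard-library `venv` module is shown below; the helper `create_virtualenv` and the folder name `.venv` are illustrative assumptions, not part of the repository. From a terminal, `python -m venv .venv` does the same.

```python
import os
import sys
import venv


def create_virtualenv(env_dir: str = ".venv", with_pip: bool = True) -> str:
    """Create a virtual environment in env_dir and return the path to its Python interpreter.

    Hypothetical helper for illustration; not part of LCT_sequencing.
    """
    # Build the environment; with_pip=True bootstraps pip into it via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ on POSIX systems.
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


# Usage (creates ./.venv in the current folder):
# python_path = create_virtualenv()
# then install the dependencies with: <python_path> -m pip install -r requirements.txt
```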

## Import dependencies

This assumes the notebook is run from a subfolder of a local copy of the repository, so that the parent directory contains the package folder `llm`, which is added to the import path below.

In [1]:
# Make the package folder "llm" in the parent directory importable
import os
import sys
import time
sys.path.insert(1, '..')
#from llm.load_schema_json import load_json, validate_json, json_to_dataframe
from llm.utils_llm import openai_apikey_input
from llm.llmprocess import LLMProcess
#from llm.excel_json_converter import excel_to_json

# To display tables in interactive mode, uncomment the following lines (requires the itables package)
#from itables import init_notebook_mode
#init_notebook_mode(all_interactive=True)
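Because `sys.path.insert(1, '..')` only makes the parent folder importable, the imports above fail if that folder does not contain the `llm` package. A minimal sketch of a check you could run first, using `importlib.util.find_spec`; the helper `package_available` is hypothetical. It also rejects a same-named package installed elsewhere (for example in site-packages), which would otherwise make the check pass for the wrong reason.

```python
import importlib.util
import os
import sys


def package_available(package: str, parent_dir: str = "..") -> bool:
    """Return True if `package` is importable from inside parent_dir.

    Hypothetical helper for illustration; not part of LCT_sequencing.
    """
    parent_dir = os.path.abspath(parent_dir)
    # Mirror the notebook: put the parent folder near the front of the search path.
    if parent_dir not in sys.path:
        sys.path.insert(1, parent_dir)
    # Clear finder caches in case the folder was created after start-up.
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(package)
    if spec is None or spec.submodule_search_locations is None:
        return False  # not found, or found only as a plain module
    # Accept the package only if it was found directly inside parent_dir.
    return any(os.path.dirname(os.path.abspath(loc)) == parent_dir
               for loc in spec.submodule_search_locations)


# Usage in the notebook (expects the folder llm/ in the parent directory):
# if not package_available("llm"):
#     raise ImportError("Package folder 'llm' not found in the parent directory.")
```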

## Schema and Input data

Define the locations of the input files: sequencing type definitions, instruction prompts, examples, the clause pairs, and the text to be processed.

In [2]:
# Output folder for all result files:
outpath = "./results_process_test/"

# Path to schemas, the sequencing definitions and the prompt instructions:
path_schema = "../schemas/"

# Path to data files:
path_data = "../tests"

# Filename for sequencing definitions (.json or .xlsx), assumed to be in folder path_schema:
filename_definitions = "sequencing_types.xlsx"

# Filename for prompt instructions, assumed to be in folder path_schema:
filename_zero_prompt = "instruction_multiprompt.txt"

# Filename for clause pairs, assumed to be in folder path_data:
filename_pairs = "sequences_test.csv"

# Filename for the text to be claused, assumed to be in folder path_data:
filename_text = "reference_text.txt"

# Filename for examples (.json or .xlsx), assumed to be in folder path_data:
filename_examples = "sequencing_examples.xlsx"
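Since the pipeline receives these paths directly, a missing or misnamed file only surfaces once `LLMProcess` tries to read it. A minimal sketch of checking them up front, assuming the variable names from the cell above; the helper `missing_files` is hypothetical.

```python
import os


def missing_files(paths: dict) -> list:
    """Return the labels of entries in `paths` that do not point to an existing file.

    Hypothetical helper for illustration; not part of LCT_sequencing.
    """
    return [label for label, path in paths.items() if not os.path.isfile(path)]


# Usage with the variables defined above (same joins as passed to LLMProcess):
# inputs = {
#     "definitions": os.path.join(path_schema, filename_definitions),
#     "zero_prompt": os.path.join(path_schema, filename_zero_prompt),
#     "pairs": os.path.join(path_data, filename_pairs),
#     "text": os.path.join(path_data, filename_text),
#     "examples": os.path.join(path_data, filename_examples),
# }
# missing = missing_files(inputs)
# if missing:
#     raise FileNotFoundError(f"Missing input files: {missing}")
```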

## OpenAI Authentication

Authenticate with your OpenAI API key to use GPT (default model: GPT-3.5).
Note that API usage incurs charges on your OpenAI account.
The widget below lets you enter your API key in an obfuscated (masked) text input box.

In [3]:
openai_apikey_input()

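If the widget does not render, the standard-library `getpass` module also reads a string without echoing it. The sketch below stores the key in the `OPENAI_API_KEY` environment variable, which the official `openai` Python client reads by default. Whether `LLMProcess` picks the key up from that variable, rather than only from `openai_apikey_input()`, is an assumption to verify in `llm/utils_llm.py`; the helper `set_openai_key` is hypothetical.

```python
import os
from getpass import getpass


def set_openai_key(prompt_fn=getpass, env_var: str = "OPENAI_API_KEY") -> None:
    """Prompt for an API key without echoing it and store it in an environment variable.

    Hypothetical alternative to openai_apikey_input(); assumes downstream code
    reads the key from OPENAI_API_KEY.
    """
    key = prompt_fn("OpenAI API key: ").strip()
    if not key:
        raise ValueError("No API key entered.")
    os.environ[env_var] = key


# Usage in the notebook:
# set_openai_key()
```

The prompt function is a parameter so the helper can be exercised without an interactive terminal.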

## Run process pipeline

All output files are saved in the folder set by `outpath` above.

In [4]:
# Initialise the LLM process with the input files and output folder defined above
llm_process = LLMProcess(filename_pairs=os.path.join(path_data, filename_pairs),
                         filename_text=os.path.join(path_data, filename_text),
                         filename_examples=os.path.join(path_data, filename_examples),
                         filename_definitions=os.path.join(path_schema, filename_definitions),
                         filename_zero_prompt=os.path.join(path_schema, filename_zero_prompt),
                         outpath=outpath)

### Estimate costs

In [6]:
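# Estimate compute time and API cost before running, using the pricing file in path_schema
compute_cost = llm_process.estimate_compute_cost(path_cost=os.path.join(path_schema, 'openai_pricing.json'))
print(compute_cost)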

{'compute_time': 18, 'costs': 0.0171}
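The estimate is returned as a dictionary with the keys `compute_time` and `costs`. The notebook does not show how `estimate_compute_cost` derives them, but OpenAI bills per token, with separate rates for input (prompt) and output (completion) tokens. A generic sketch of that arithmetic, with made-up token counts and prices; these are not the values in `openai_pricing.json` and not necessarily the method used by `LLMProcess`.

```python
def estimate_cost(n_input_tokens: int, n_output_tokens: int,
                  price_input_per_1k: float, price_output_per_1k: float) -> float:
    """Return the cost of API calls given token counts and per-1,000-token prices.

    Generic illustration; take token counts and prices from your own data.
    """
    return ((n_input_tokens / 1000) * price_input_per_1k
            + (n_output_tokens / 1000) * price_output_per_1k)


# Hypothetical numbers for illustration only (not real pricing):
cost = estimate_cost(12_000, 1_600, price_input_per_1k=0.001, price_output_per_1k=0.002)
print(round(cost, 4))  # prints 0.0152
```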


### Run process

In [10]:
# Run the full pipeline and measure its wall-clock time
start_time = time.time()

llm_process.run()

compute_time = time.time() - start_time

Processed samples 8 to 8. Number of sequencing classes found: 1

In [11]:
print(f'Compute time: {round(compute_time,1)} seconds')

Compute time: 16.3 seconds
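All outputs of the run are written to `outpath`, but the notebook does not list their names. A minimal sketch for inspecting what was produced, using only the standard library; the helper `list_outputs` is hypothetical.

```python
import os


def list_outputs(folder: str) -> list:
    """Return (filename, size in bytes, last-modified timestamp) for files in `folder`, newest first.

    Hypothetical helper for illustration; not part of LCT_sequencing.
    """
    entries = []
    with os.scandir(folder) as it:
        for entry in it:
            if entry.is_file():  # skip sub-folders
                stat = entry.stat()
                entries.append((entry.name, stat.st_size, stat.st_mtime))
    # Newest first, so files written by the most recent run appear at the top.
    return sorted(entries, key=lambda item: item[2], reverse=True)


# Usage after llm_process.run() (time was imported in the first cell):
# for name, size, mtime in list_outputs(outpath):
#     print(f"{name:40s} {size:>10d} bytes  {time.ctime(mtime)}")
```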
