In [1]:
#install requirements
!pip install -r requirements.txt

Looking in indexes: https://pypi.python.org/simple


In [3]:
from runner import Runner


## Instructions

Before use, ensure you have done the following:

- Downloaded the GPT4ALL model (see the readme for the link)
- Put your data sources in the sources_documents folder
- Installed all requirements from requirements.txt (run the first cell; note that you need Python 3.10 or above)

To execute this notebook, follow these steps:

1. Load your data sources into the 'sources_documents' folder, which is the default source directory. You can change the location in settings.py. Supported formats are currently .pdf, .txt and .csv, and embeddings can be created from multiple data sources.

2. Run the pipeline, using the args to specify which steps you would like to run.
<br>

***WARNING! The embedding step can take a long time depending on your hardware (e.g. whether you have a CUDA-capable GPU). Make sure it has finished before any live demo.***

<br>

3. Outputs are stored in the output folder, which defaults to data/output. A new .csv is created for each query. To save the result to a .csv, set the save_output flag to True in runner.run().
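
The prerequisites above (Python 3.10 or newer, and at least one .pdf, .txt or .csv file in the source folder) can be checked before starting a long embedding run. The following is a minimal sketch, not part of the project: the helper names `python_is_supported` and `find_supported_sources` are hypothetical, and the folder name is passed in because the default can be changed in settings.py.

```python
import sys
from pathlib import Path

# File types the instructions list as supported.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".csv"}


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def find_supported_sources(folder):
    """Return the supported files under `folder`, sorted; [] if it is missing."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


# Hypothetical usage: adjust the folder name to match settings.py.
print("Python OK:", python_is_supported())
print("Sources found:", [p.name for p in find_supported_sources("sources_documents")])
```

An empty list here means the embedding step would have nothing to ingest, which is cheaper to discover before the pipeline starts.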

### Passing in Queries

There are a few ways to pass in queries:

- Pass a query directly via the query argument of runner.run()
- Read queries from a .txt file. The default is queries/queries.txt; to change this, update settings.py. NB: passing in more than one query will likely lead to a memory error, because the vector DB is stored locally in memory. It may work on stronger hardware, but this hasn't been tested
- Enter queries as user input via the terminal
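
For the file-based option, the layout of the queries file is not described here. The sketch below assumes one query per line with blank lines ignored; `load_queries` is a hypothetical helper, not a function from this project.

```python
import os
import tempfile
from pathlib import Path


def load_queries(path):
    """Read queries from a text file, assuming one query per line."""
    text = Path(path).read_text(encoding="utf-8")
    # Drop surrounding whitespace and skip empty lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demonstration with a temporary file standing in for queries/queries.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "queries.txt")
    Path(demo_path).write_text(
        "what product to use to treat a cough\n\n"
        "what product to use to treat a sore throat\n",
        encoding="utf-8",
    )
    for query in load_queries(demo_path):
        print(query)
```

Given the memory caveat above, it is probably safest to send the loaded queries to the pipeline one at a time rather than all at once.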



In [4]:
!python runner.py --help


 Usage: runner.py [OPTIONS]

 specify the pipeline to run using arguments, if using user input, specify the
 question to ask and when done enter 'exit' to save output, if not, specify the
 questions to ask in the queries file

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --query                              TEXT  [default: None]                   │
│ --preprocess     --no-preprocess

In [3]:
### Run full pipeline ###
# preprocess input data
# get embeddings from input data
# run query

#query="what product to use to treat a cough"
#runner = Runner(config_name="config.yml")
#answers_df = runner.run(query=query, preprocess=True,ingest=True,save_output=False)

In [3]:
### Query only, using the writer LLM API. For this to work, the full pipeline cell above must have been run successfully. ###

query = "what product to use to treat a sore throat"
runner = Runner(config_name="writer_config.yml")
answers_df = runner.run(query=query, preprocess=False, ingest=False, save_output=False)

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
INFO:chromadb:Running Chroma using direct local API.
INFO:clickhouse_connect.driver.ctypes:Successfully imported ClickHouse Connect C data optimizations
INFO:clickhouse_connect.driver.ctypes:Successfully import ClickHouse Connect C/Numpy optimizations
INFO:clickhouse_connect.json_impl:Using python library for writing JSON byte strings


FloatProgress(value=0.0, layout=Layout(width='100%'), style=ProgressStyle(bar_color='black'))

INFO:chromadb.db.duckdb:loaded in 122148 embeddings
INFO:chromadb.db.duckdb:loaded in 1 collections
INFO:chromadb.db.duckdb:collection with name langchain already exists, returning existing collection


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

INFO:question_answer:

> Question:
INFO:question_answer:what product to use to treat a sore throat
INFO:question_answer:
> Answer:
INFO:question_answer:{"choices":[{"text":"The product to use to treat a sore throat is Throat Fever Cough Our Pharmacists Recommend Grape Flavor 4 FL OZ 118 mL 90345multisymptom fever  coldjpg.","logprobs":null}]}
INFO:question_answer:
> source_documents/drug_context_cleaned.csv:
INFO:question_answer:
> source_documents/drug_context_cleaned.csv:
INFO:question_answer:Throat Fever Cough Our Pharmacists Recommend Grape Flavor 4 FL OZ 118 mL 90345multisymptom fever  coldjpg
INFO:question_answer:
> source_documents/drug_context_cleaned.csv:
INFO:question_answer:If sore throat is severe persists for more than 2 days is accompanied or followed by fever headache rash nausea or vomiting consult a doctor promptly Do not use with any other drug containing acetaminophen prescription or nonprescription If you are not sure whether a drug contains acetaminophen ask a doct

In [4]:
answers_df


Unnamed: 0,question,answer,source_documents
0,what product to use to treat a sore throat,"{""choices"":[{""text"":""The product to use to tre...",{'sources_document': ['source_documents/drug_c...


In [5]:
answers_df['answer'][0]

'{"choices":[{"text":"The product to use to treat a sore throat is Throat Fever Cough Our Pharmacists Recommend Grape Flavor 4 FL OZ 118 mL 90345multisymptom fever  coldjpg.","logprobs":null}]}'
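
The answer above comes back as a raw JSON string, not as plain text. A minimal sketch of pulling out the generated text with the standard `json` module follows. `extract_answer_text` is a hypothetical helper, and the fallback for non-JSON answers is an assumption (other configs may return plain strings).

```python
import json


def extract_answer_text(raw_answer: str) -> str:
    """Return choices[0].text from a JSON answer, or the raw string if not JSON."""
    try:
        payload = json.loads(raw_answer)
    except json.JSONDecodeError:
        # Assumption: a non-JSON answer is already plain text.
        return raw_answer.strip()
    if not isinstance(payload, dict):
        return raw_answer.strip()
    choices = payload.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("text") or "").strip()


# The string below is copied from the cell output above.
raw = (
    '{"choices":[{"text":"The product to use to treat a sore throat is '
    'Throat Fever Cough Our Pharmacists Recommend Grape Flavor 4 FL OZ 118 mL '
    '90345multisymptom fever  coldjpg.","logprobs":null}]}'
)
print(extract_answer_text(raw))
```

In the notebook this could be applied to the whole column with `answers_df['answer'].map(extract_answer_text)`.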

In [6]:
answers_df['source_documents'][0]

{'sources_document': ['source_documents/drug_context_cleaned.csv'],
 'sources_row': [9967],
 'sources_content': ['If sore throat is severe persists for more than 2 days is accompanied or followed by fever headache rash nausea or vomiting consult a doctor promptly Do not use with any other drug containing acetaminophen prescription or nonprescription If you are not sure whether a drug contains acetaminophen ask a doctor or pharmacist if you are now taking a prescription monoamine oxidase inhibitor MAOI certain drugs for depression psychiatric or emotional conditions or Parkinsons disease or for 2 weeks after']}
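
The source dictionary above holds three lists. Assuming they are parallel (entry i of each list describes the same retrieved chunk, which the output suggests but does not confirm), they can be combined into one record per source. `flatten_sources` is a hypothetical helper.

```python
def flatten_sources(source_info: dict) -> list[dict]:
    """Combine the parallel source lists into one dict per retrieved chunk."""
    documents = source_info.get("sources_document", [])
    rows = source_info.get("sources_row", [])
    contents = source_info.get("sources_content", [])
    # zip stops at the shortest list, so mismatched lengths do not raise.
    return [
        {"document": doc, "row": row, "content": content}
        for doc, row, content in zip(documents, rows, contents)
    ]


# Example shaped like the output above (content shortened for readability).
example = {
    "sources_document": ["source_documents/drug_context_cleaned.csv"],
    "sources_row": [9967],
    "sources_content": ["If sore throat is severe persists for more than 2 days"],
}
for record in flatten_sources(example):
    print(record["document"], record["row"])
```

This makes it easier to look up the cited row in the original .csv when checking an answer.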

In [5]:
### Query only, using the writer LLM deployed as an inference endpoint. For this to work, the full pipeline cell above must have been run successfully. ###


query = "what product to use to treat a sore throat"
runner = Runner(config_name="hf_config.yml")
answers_df = runner.run(query=query, preprocess=False, ingest=False, save_output=False)

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
INFO:chromadb:Running Chroma using direct local API.
INFO:clickhouse_connect.driver.ctypes:Successfully imported ClickHouse Connect C data optimizations
INFO:clickhouse_connect.driver.ctypes:Successfully import ClickHouse Connect C/Numpy optimizations
INFO:clickhouse_connect.json_impl:Using python library for writing JSON byte strings


FloatProgress(value=0.0, layout=Layout(width='100%'), style=ProgressStyle(bar_color='black'))

INFO:chromadb.db.duckdb:loaded in 122148 embeddings
INFO:chromadb.db.duckdb:loaded in 1 collections
INFO:chromadb.db.duckdb:collection with name langchain already exists, returning existing collection


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

ValueError: Error raised by inference API: 'start_logits'
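
Because the call above raised before returning, `answers_df` was never reassigned, so it most likely still holds the result of the earlier `writer_config.yml` run. One way to avoid reading stale results is to catch the error and return `None`. This is a sketch only: `run_or_none` is a hypothetical wrapper, and `failing_run` is a stand-in for `runner.run`, which is not imported here.

```python
def run_or_none(run_fn, **kwargs):
    """Call run_fn(**kwargs); report a ValueError and return None instead of raising."""
    try:
        return run_fn(**kwargs)
    except ValueError as err:
        print(f"Query failed: {err}")
        return None


def failing_run(**kwargs):
    """Stand-in that fails the same way as the endpoint call above."""
    raise ValueError("Error raised by inference API: 'start_logits'")


result = run_or_none(failing_run, query="what product to use to treat a sore throat")
if result is None:
    print("No answer returned; do not reuse results from an earlier cell.")
```

In the notebook the call would look like `answers_df = run_or_none(runner.run, query=query, preprocess=False, ingest=False, save_output=False)`.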
