<h1> EXAMPLE: OLLAMA/LANGCHAIN </h1>

https://github.com/jmorganca/ollama/blob/main/docs/tutorials/langchainpy.md

https://python.langchain.com/docs/integrations/llms/ollama

Use langchain to have an ollama model query a database of word embeddings

You must have the ollama server running locally. If you are running on a local machine, simply make sure the ollama app is running. If you are on the ARCC (i.e., Open OnDemand), you will need to:

<ul>
    <li>Launch a new Terminal window and ssh into your ARCC home:<br> <b>ssh username@arcc2.uc.edu</b></li>
    <li>ssh into the compute node Jupyter is running on; this can be found under the "My Interactive Sessions" tab in Open OnDemand: <br><b>ssh compute-xx</b></li>
    <li>Launch the ollama server: <br><b>ollama serve</b></li>
    <li>Now, applications should be able to access the ollama server on localhost:11434 </li>
</ul>
<b>NOTES:</b>
<ul>
    <li>It's good practice to clone your base model to work off of: <br><b>ollama cp source_model:tags new_model</b></li>
    <li>It often takes a minute or more for the model to load each session - be patient</li>
    <li>If the model isn't loading into langchain, you may need to ssh into the node with a separate Terminal and run: <br><b>ollama pull model_name:tags</b></li>
    <li>Check the internet to make sure you are using the best prompt format for your model</li>
</ul>
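
Before running the cells below, it can help to confirm that the server is actually reachable. The following is a minimal sketch, assuming the server answers a plain HTTP GET on its base URL with status 200; the helper name `ollama_is_running` is hypothetical and not part of ollama or langchain.

```python
import urllib.request

def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET on base_url succeeds with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # connection refused, URL errors, and timeouts all derive from OSError
        return False

# prints True only if something answers on the default ollama port
print(ollama_is_running())
```

If this prints False on the ARCC, the usual cause is that `ollama serve` was started on a different node than the one Jupyter is running on.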

<h3>Basic query for ollama & mistral</h3>

In [None]:
''' [INST] Instruction [/INST] Example model answer(s) [/INST] Follow-up instructions [/INST] '''

<h1>Import packages</h1>

In [None]:
import os
import time
import pandas as pd
import json
import ast
import pprint as pp

#also needs pip install: GPT4All, chromadb
from langchain.llms import Ollama

#import text loaders
from langchain.docstore.document import Document
from langchain.document_loaders import WebBaseLoader
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders import UnstructuredFileLoader

import langchain_community.vectorstores.utils as vutils

from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain

#import character splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

#for storing embeddings
from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma

# for query
from langchain.chains import RetrievalQA

<h1> Simple query example:</h1>

In [None]:
#load model
ollama = Ollama(base_url='http://localhost:11434', model='mistral:latest')

#query
question = 'Why is the sky blue?'

#example query, for testing
response = ollama(''' [INST] Please be technical and answer in numbered paragraphs: {}
        [/INST] JSON format: {{1:"paragraph 1",2:"paragraph 2",...}}
        [/INST] If you don't know the answer, just say so [/INST] '''.format(question))

pp.pprint(response)

<h1>Sentiment analysis</h1>

<h3>On a list</h3>

In [None]:
#load model
ollama = Ollama(base_url='http://localhost:11434', model='mistral:latest',format='json')

#define text for SA
questions = [
    'i love apples',
    'i hate apples',
    'i am indifferent about apples',
    'apples are gross',
    'apples are delicious',
    'apples are yummy!',
    'apples are blechh',
    'apples are apples'
]

#output list
output = []

#for each text
for q in questions:
    
    #get response in json format
    response = ollama(''' [INST] What is the speaker's sentiment towards the apples?
    Please answer -1 for negative, 0 for neutral, and 1 for positive. Please return the json only: {}
        [/INST]Use this format: {{"sentence":"I like apples","sentiment":1,"explanation":"the speaker seems to enjoy apples"}}
        [/INST]Do not follow up with any text[/INST] '''.format(q))
    
    #append to output list
    output.append(response)

    #pause briefly
    time.sleep(.1)

#parse output into a JSON array
output_str = '[' + ','.join(output) + ']'
output_json = json.loads(output_str)

#make dataframe
pd.DataFrame.from_records(output_json)
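
Joining every response into one string means a single malformed reply makes the whole `json.loads` call fail. A more forgiving approach, sketched below, is to parse each response on its own and set aside the ones that are not valid JSON objects. The helper name `parse_json_responses` and the sample strings are hypothetical, made up for illustration.

```python
import json

def parse_json_responses(responses):
    """Parse each raw response separately; collect failures instead of raising."""
    records, failures = [], []
    for i, raw in enumerate(responses):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # keep the index so the original sentence can be retried later
            failures.append((i, raw))
            continue
        if isinstance(parsed, dict):
            records.append(parsed)
        else:
            # valid JSON, but not the object format the prompt asked for
            failures.append((i, raw))
    return records, failures

# hypothetical model outputs, for illustration only
sample_output = [
    '{"sentence": "i love apples", "sentiment": 1, "explanation": "positive wording"}',
    'Sure! Here is the JSON you asked for',
]
records, failures = parse_json_responses(sample_output)
print(len(records), len(failures))
```

The `records` list can be passed to `pd.DataFrame.from_records` exactly as above, and the indices in `failures` identify which sentences to send again.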

<h3>...OR Concatenate list into a single string</h3>

In [None]:
#load model
ollama = Ollama(base_url='http://localhost:11434', model='mistral:latest',format='json')

#define text for SA
question_list = [
    'i love apples',
    'i hate apples',
    'i am indifferent about apples',
    'apples are gross',
    'apples are delicious',
    'apples are yummy!',
    'apples are blechh',
    'apples are apples'
]

#join into single delimited string
question_string='; '.join(question_list)

#generate response
response = ollama(''' [INST] What is the speaker's sentiment towards the apples in each list element?
    Please answer -1 for negative, 0 for neutral, and 1 for positive. Please return the json only: {}
        [/INST]The list is delimited by semicolons. Use this output format: {{"sentence":"I like apples","sentiment":1,"explanation":"the speaker seems to enjoy apples"}}
        [/INST]Do not follow up with any text[/INST] '''.format(question_string))

pp.pprint(response)

#format into dataframe
pd.DataFrame.from_records(json.loads(response))

<h1>#1: Load text from web source</h1>

https://python.langchain.com/docs/integrations/document_loaders/web_base

In [None]:
loader = WebBaseLoader("https://americanliterature.com/author/benjamin-franklin/essay/the-morals-of-chess")
data = loader.load()

<h3>Split tokens into smaller chunks if needed</h3>

In [None]:
#may need to tweak chunk size/overlap for better answer
text_splitter=RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
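
To get a feel for how `chunk_size` and `chunk_overlap` interact, here is a simplified, pure-Python illustration using fixed-width windows. It is not how `RecursiveCharacterTextSplitter` works internally (that class tries to split on separators such as paragraph breaks first); the function `chunk_text` is a hypothetical stand-in for intuition only.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=0):
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=0))
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A larger overlap repeats more text across neighboring chunks, which can help when an answer straddles a chunk boundary, at the cost of more embeddings to compute.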

<h3>Create word embeddings using ollama model and store in vector database (may take a while)</h3>

In [None]:
#create embedding
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="mistral:latest")

# work with db in memory
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)

# save db to disk: https://python.langchain.com/docs/integrations/vectorstores/chroma
#vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed, persist_directory="./chroma_db")

# load db from disk
# vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=oembed)

<h3>Run query</h3>

In [None]:
#may take some time on first load
ollama = Ollama(base_url='http://localhost:11434', model='mistral:latest')

#queries database for similarities
question=''' [INST] What is Benjamin Franklin talking about here? Please summarize in 3 paragraphs.[/INST] '''

docs = vectorstore.similarity_search(question)

#formats similarities into text response
qachain=RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())
pp.pprint(qachain({'query': question}))
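
Conceptually, the similarity search compares the embedding of the question against the embedding of each stored chunk and returns the closest ones. The sketch below shows that idea with cosine similarity on made-up three-dimensional vectors. The vectors, labels, and helper names are hypothetical, and the distance metric Chroma actually uses depends on its configuration and may not be cosine.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# toy embeddings, invented for illustration; real embeddings have far more dimensions
toy_vectors = {
    "chess": [0.9, 0.1, 0.0],
    "caution": [0.7, 0.3, 0.1],
    "weather": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, toy_vectors))
```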

<h1>#2: Multiple local files - map to template</h1>

<h3>Load file data</h3>

In [None]:
#convert csv to json
def csv_to_json(csv_filepath):
    df = pd.read_csv(csv_filepath)
    json_data = df.to_json(orient='records')
    return json_data

CLINICAL_DATA_DIR = 'UC_hackathon_data_draft'

#get local file list
file_names = [f for f in os.listdir(CLINICAL_DATA_DIR) if f.endswith('.csv')]

#create list of imported clinical data (first row of each file) and convert to DataFrame
all_clinical_data_l=list() 
for file_name in file_names:
    
    clinical_file_path=os.path.join(CLINICAL_DATA_DIR,file_name)
    clinical_json_str=csv_to_json(clinical_file_path)
    clinical_json_dict=json.loads(clinical_json_str)[0]
    all_clinical_data_l.append(clinical_json_dict)
    
df=pd.DataFrame.from_dict(all_clinical_data_l)

df.head()

<h3>Create model and prompt</h3>

In [None]:
#connect to llm model
ollama = Ollama(base_url='http://localhost:11434', model='mistral:latest')

#take study titles and join into single delimited string
just_titles_l= [study['Study Title'] for study in all_clinical_data_l]
titles_string="; ".join(just_titles_l)

map_template = ''' [INST] You are a factual clinical review assistant.
    Based on the following list of clinical Study Titles please summarize the main medical condition being examined: 
    {clinical_data_list} [/INST] '''

#assign template text to PromptTemplate object
map_prompt = PromptTemplate.from_template(map_template)

#create llm chain
map_chain = LLMChain(llm=ollama, prompt=map_prompt,verbose=False)

#execute prompt based on template, mapping variables to input
response=map_chain.run(clinical_data_list=titles_string)

pp.pprint(response)

<h1>#3: Multiple local files - load from pandas</h1>


<h3>Load files</h3>

In [None]:
#convert csv to json
def csv_to_json(csv_filepath):
    df = pd.read_csv(csv_filepath)
    json_data = df.to_json(orient='records')
    return json_data

CLINICAL_DATA_DIR = 'UC_hackathon_data_draft'

#get local file list
file_names = [f for f in os.listdir(CLINICAL_DATA_DIR) if f.endswith('.csv')]

#create list of imported clinical data (first row of each file) and convert to DataFrame
all_clinical_data_l=list() 

for file_name in file_names:
    
    clinical_file_path=os.path.join(CLINICAL_DATA_DIR,file_name)
    clinical_json_str=csv_to_json(clinical_file_path)
    clinical_json_dict=json.loads(clinical_json_str)[0]
    
    all_clinical_data_l.append(clinical_json_dict)
    
df=pd.DataFrame.from_dict(all_clinical_data_l)

#load each Study Title as a document; the remaining columns become metadata
loader = DataFrameLoader(df,page_content_column="Study Title")
docs = loader.load()

df.head()

<h3>Create vector store database</h3>

In [None]:
print(len(docs))

#create embedding
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="mistral:latest")

#Filters out metadata types that are not supported for a vector store.
docs = vutils.filter_complex_metadata(docs)

# work with db in memory
vectorstore = Chroma.from_documents(documents=docs, embedding=oembed)

<h3>Create model and prompt</h3>

In [None]:
#may take some time on first load
ollama = Ollama(base_url='http://localhost:11434', model='mistral:latest')

#queries database for similarities
question = ''' [INST] You are a factual clinical review assistant. Based on the Study Title data,
please summarize the main medical condition being examined.[/INST] '''

docs = vectorstore.similarity_search(question)

#formats similarities into text response
qachain=RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())
pp.pprint(qachain({'query': question}))

<h1>#4: Load from local files</h1>

https://python.langchain.com/docs/integrations/document_loaders/csv