# Ollama Quickstart

In this quickstart you will learn how to use models from Ollama as a feedback function provider.

[Ollama](https://ollama.ai/) lets you run large language models locally.

Note: Ollama must be installed before you run this example.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/examples/expositional/models/local_and_OSS_models/ollama_quickstart.ipynb)

In [None]:
!pip install trulens trulens-apps-langchain trulens-providers-litellm litellm==1.11.1 langchain==0.0.351

## Setup

### Import from LangChain and TruLens

In [6]:
# LangChain components used to build the app, followed by the TruLens classes
# used to record and evaluate it. The install cell at the top of this notebook
# already provides these packages.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import ChatPromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate
from trulens.core import Feedback
from trulens.core import TruSession
from trulens.apps.langchain import TruChain

session = TruSession()
session.reset_database()

🦑 Initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `TruSession` to prevent this.


Updating app_name and app_version in apps table: 0it [00:00, ?it/s]
Updating app_id in records table: 0it [00:00, ?it/s]
Updating app_json in apps table: 0it [00:00, ?it/s]


### Let's first just test out a direct call to Ollama

In [None]:
!pip install langchain_community

In [None]:
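# The Ollama integration ships in the langchain_community package installed above.
from langchain_community.llms import Ollama

# base_url must point at a reachable Ollama server. A default local install
# listens on http://localhost:11434.
ollama = Ollama(base_url="http://vpn.jxit.net.cn:11434", model="qwq:latest")
print(ollama.invoke("why is the sky blue"))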
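
Before sending a prompt, it can help to confirm that the server at `base_url` is reachable and see which models it has pulled. The sketch below uses only the standard library and queries Ollama's `GET /api/tags` endpoint. The helper names `tags_url` and `list_models` are hypothetical and are not part of LangChain or TruLens.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return base_url.rstrip("/") + "/api/tags"


def list_models(base_url: str, timeout: float = 3.0) -> Optional[List[str]]:
    """Return the names of the pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None
    if not isinstance(payload, dict):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


# Building the URL needs no network access.
print(tags_url("http://localhost:11434"))
```

Calling `list_models("http://localhost:11434")` against a running server should return a list of model names. A result of `None` suggests the server is not running or `base_url` is wrong.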

### Create Simple LLM Application

This example uses the LangChain framework with Ollama.

In [16]:
full_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="Provide a helpful response with relevant background information for the following: {prompt}",
        input_variables=["prompt"],
    )
)

chat_prompt_template = ChatPromptTemplate.from_messages([full_prompt])

chain = LLMChain(llm=ollama, prompt=chat_prompt_template, verbose=True)
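
To see what text the model receives, the template above can be expanded by hand. `PromptTemplate` fills `{prompt}` placeholders in the same way as Python's `str.format` when the default f-string template format is used, so plain Python is enough for this sketch. The `render` helper is hypothetical and only illustrates the substitution.

```python
TEMPLATE = (
    "Provide a helpful response with relevant background information "
    "for the following: {prompt}"
)


def render(template: str, **values: str) -> str:
    """Substitute named placeholders, as the prompt template does for {prompt}."""
    return template.format(**values)


rendered = render(
    TEMPLATE,
    prompt="What is a good name for a store that sells colorful socks?",
)
print(rendered)
```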

### Send your first request

In [18]:
prompt_input = "What is a good name for a store that sells colorful socks?"

In [None]:
llm_response = chain(prompt_input)

display(llm_response)

## Initialize Feedback Function(s)

In [None]:
!pip install litellm
!pip install trulens-providers-litellm

In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

In [80]:
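import os
from subprocess import Popen
from time import sleep

# Start the Ollama server in the background.
p = Popen(["ollama", "serve"])  # long-running process
sleep(1)  # give the server a moment to start listening

# Pull the llama2 model, which is used below as the evaluating LLM.
# os.system returns the command's exit status (0 on success).
os.system("ollama pull llama2")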

0
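
A fixed `sleep(1)` may not be long enough on a slow machine. One alternative is to poll until the server answers. The sketch below is a generic polling helper. `wait_until` and the fake readiness check are hypothetical names used for illustration. A real check might request the server's `/api/tags` endpoint and return `True` when it responds.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Call check() repeatedly until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real readiness probe: reports success on the third call.
attempts = {"n": 0}


def fake_server_ready() -> bool:
    attempts["n"] += 1
    return attempts["n"] >= 3


ready = wait_until(fake_server_ready, timeout=5.0, interval=0.01)
print(ready, attempts["n"])
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.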

In [90]:
# Initialize LiteLLM-based feedback function collection class:
import litellm
import trulens
from trulens.providers.litellm import LiteLLM

litellm.set_verbose = True

ollama_provider = LiteLLM(
    model_engine="ollama/llama2", api_base="http://vpn.jxit.net.cn:11434"
)

ollama_provider.relevance_with_cot_reasons(
    "What is a good name for a store that sells colorful socks?",
    "Great question! Naming a store that sells colorful socks can be a fun and creative process. Here are some suggestions to consider: SoleMates: This name plays on the idea of socks being your soul mate or partner in crime for the day. It is catchy and easy to remember, and it conveys the idea that the store offers a wide variety of sock styles and colors.",
)

# Define a relevance function using LiteLLM
relevance = Feedback(
    ollama_provider.relevance_with_cot_reasons
).on_input_output()
# By default this will check relevance on the main app input and main app
# output.

WARNI [LiteLLM] `litellm.set_verbose` is deprecated. Please set `os.environ['LITELLM_LOG'] = 'DEBUG'` for debug logs.


SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'temperature': 0.0}


RuntimeError: Endpoint LiteLLMEndpoint request failed 4 time(s): 
	litellm.BadRequestError: Invalid Message passed in {'role': 'system', 'content': "You are a RELEVANCE grader; providing the relevance of the given RESPONSE to the given PROMPT.\nRespond only as a number from 0 to 3, where 0 is the lowest score according to the criteria and 3 is the highest possible score.\n\nCriteria for evaluating relevance:\n\n        - RESPONSE must be relevant to the entire PROMPT to get a maximum score of 3.\n        - RELEVANCE score should increase as the RESPONSE provides RELEVANT context to more parts of the PROMPT.\n        - RESPONSE that is RELEVANT to none of the PROMPT should get a minimum score of 0.\n        - RESPONSE that is RELEVANT and answers the entire PROMPT completely should get a score of 3.\n        - RESPONSE that confidently FALSE should get a score of 0.\n        - RESPONSE that is only seemingly RELEVANT should get a score of 0.\n        - Answers that intentionally do not answer the question, such as 'I don't know' and model refusals, should also be counted as the least RELEVANT and get a score of 0.\n    \n\nA few additional scoring guidelines:\n\n- Long RESPONSES should score equally well as short RESPONSES.\n\n- Never elaborate."}
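
The grader prompt shown in the log asks the evaluating model to reply with a single number from 0 to 3. Local models do not always comply: some add explanatory text or emit a `<think>` block first. The sketch below shows one way such a reply could be turned into a score between 0 and 1. It is an illustration under those assumptions, not TruLens's own parsing logic, and `parse_score` is a hypothetical helper.

```python
import re
from typing import Optional

# Reasoning blocks that some local models emit before their final answer.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_score(reply: str, low: int = 0, high: int = 3) -> Optional[float]:
    """Return the first integer in [low, high] found in `reply`, scaled to 0..1.

    Returns None when the reply contains no integer in that range.
    """
    visible = THINK_BLOCK.sub("", reply)
    for match in re.finditer(r"\d+", visible):
        value = int(match.group())
        if low <= value <= high:
            return (value - low) / (high - low)
    return None


print(parse_score("3"))
print(parse_score("<think>Maybe a 2?</think>\nScore: 0"))
print(parse_score("I cannot grade this."))
```

Returning `None` instead of a default score keeps unparseable replies distinguishable from genuinely low relevance.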

## Instrument chain for logging with TruLens

In [None]:
tru_recorder = TruChain(
    chain, app_name="Chain1_ChatApplication", feedbacks=[relevance]
)

In [None]:
with tru_recorder as recording:
    llm_response = chain(prompt_input)

display(llm_response)

In [48]:
session.get_records_and_feedback()[0]

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,app_name,app_version,latency,total_tokens,total_cost,cost_currency
0,app_hash_21a017db3a6de5923b722e0525fde47e,"{'tru_class_info': {'name': 'TruChain', 'modul...",LLMChain(langchain.chains.llm),record_hash_7f959fde0f19f846e9484de50e19be0b,What is a good name for a store that sells col...,"<think>\n\nOkay, so I need to come up with a g...",-,{'record_id': 'record_hash_7f959fde0f19f846e94...,"{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2025-03-13T12:33:12.912253"", ""...",2025-03-13T12:34:45.986705,Chain1_ChatApplication,base,93.074226,0,0.0,USD


## Explore in a Dashboard

In [49]:
from trulens.dashboard import run_dashboard

run_dashboard(session)  # open a local streamlit app to explore

# stop_dashboard(session) # stop if needed

Starting dashboard ...
npm warn exec The following package was not found and will be installed: localtunnel@2.0.2

Go to this url and submit the ip given here. your url is: https://pink-goats-speak.loca.lt

  Submit this IP Address: 34.106.255.215



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

## Or view results directly in your notebook

In [50]:
session.get_records_and_feedback()[0]

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,relevance_with_cot_reasons_calls,relevance_with_cot_reasons feedback cost in USD,app_name,app_version,latency,total_tokens,total_cost,cost_currency
0,app_hash_21a017db3a6de5923b722e0525fde47e,"{'tru_class_info': {'name': 'TruChain', 'modul...",LLMChain(langchain.chains.llm),record_hash_7f959fde0f19f846e9484de50e19be0b,What is a good name for a store that sells col...,"<think>\n\nOkay, so I need to come up with a g...",-,{'record_id': 'record_hash_7f959fde0f19f846e94...,"{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2025-03-13T12:33:12.912253"", ""...",2025-03-13T12:34:45.986705,[],0.0,Chain1_ChatApplication,base,93.074226,0,0.0,USD
