# Using LangChain with Ollama in Python

Let's imagine we are studying The Youth by Isaac Asimov. We might have a question about Slim and Red. If you ask llama3 for that information, it may hallucinate.

For example, llama3 gives a mediocre answer that attributes the story to the wrong author:

> In Stephen Crane's novel "The Youth", Slim and Red are two main characters who have a significant interaction. Here's a brief summary:

So let's figure out how we can use **LangChain** with Ollama to ask our question against the actual document, [The Youth](https://www.gutenberg.org/cache/epub/31547/pg31547-images.html) by Isaac Asimov, using Python.

Let's start by asking a simple question that we can get an answer to from the **Llama2** model using **Ollama**. First, we need to install the **LangChain** package:

`pip install langchain_community`

Then we can create a model and ask the question:

In [2]:
from langchain_community.llms import Ollama

question = "Can you summarize the interaction between Slim and Red in the Youth by Isaac Asimov?"
ollama = Ollama(
    base_url='http://localhost:11434',
    model="llama2"
)
print(ollama.invoke(question))


 I apologize, but there is no story called "The Youth by Isaac Asimov" that features a character named Slim or Red. It's possible that you may be thinking of a different author or work. Could you please provide more information or context about the story you are asking about?


That answer is wrong: the model claims the story does not exist.

### Create RAG System

Now let's load a document to ask questions against. I'll load up the Youth by Isaac Asimov, which you can find at Project Gutenberg. We will need **WebBaseLoader**, which is part of **LangChain** and loads text from a web page. On my machine, I also needed to install **bs4** to get that to work, so run `pip install bs4`.


In [4]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.gutenberg.org/cache/epub/31547/pg31547-images.html")
data = loader.load()

Llama2's context window is 4096 tokens, which means the full document won't fit into the model's context. So we need to split it up into smaller pieces.
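To make the idea of splitting concrete, here is a minimal plain-Python sketch of fixed-width chunking. It is only an illustration: `split_text` is a hypothetical helper, the sample text is a stand-in rather than the real story, and LangChain's `RecursiveCharacterTextSplitter` is smarter than this because it tries to break at paragraph and sentence boundaries.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Hypothetical helper for illustration; not LangChain's algorithm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Stand-in text (2000 characters), not the real story.
sample = "Slim and Red found two strange animals. " * 50
chunks = split_text(sample, chunk_size=500, chunk_overlap=0)
print(len(chunks), "chunks; first chunk has", len(chunks[0]), "characters")
```

With `chunk_overlap=0` the chunks are disjoint, so joining them reproduces the original text. A non-zero overlap repeats the tail of each chunk at the start of the next, which can help when an answer straddles a chunk boundary.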

Once the document is split up, we have to find the relevant splits and submit only those to the model. We can do this by creating embeddings and storing them in a vector database. We can use Ollama directly to instantiate an embedding model. In this example we will use ChromaDB, an open-source vector database that LangChain integrates with.

We also need to pull an embedding model: `ollama pull nomic-embed-text`

You can read more about Ollama supported embedding models [here](https://ollama.com/blog/embedding-models).
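As a rough picture of what the vector database does with those embeddings, the sketch below ranks a few stored chunks by cosine similarity to a query vector. Everything in it is an assumption made for illustration: the three-dimensional vectors are made up (real embedding vectors have hundreds of dimensions), the chunk texts are placeholders, and the metric and index that Chroma actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings and placeholder chunk texts.
toy_store = [
    ("Slim and Red talk about the circus", [0.9, 0.1, 0.0]),
    ("The adults negotiate a trade deal", [0.1, 0.9, 0.0]),
    ("Project Gutenberg license text", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]
best = top_k(query_vec, toy_store, k=2)
print(best)
```

In the real pipeline the query vector comes from the same embedding model as the stored chunks, which is why the question and the document are both embedded with `nomic-embed-text`.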


In [7]:
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# load the story from Project Gutenberg
loader = WebBaseLoader("https://www.gutenberg.org/cache/epub/31547/pg31547-images.html")
data = loader.load()
# split the document into chunks of at most 500 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
# create vector embeddings
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")

# load splits to vector db
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

Now let's ask a question from the document. **Can you summarize the interaction between Slim and Red in the Youth by Isaac Asimov?** Slim and Red are characters in the story, and the answer can be found in our text.

In [7]:
docs = vectorstore.similarity_search(question)
len(docs)  

4

This outputs the number of chunks that the similarity search returned for the question (four by default).

The next step is to send the question and the relevant parts of the document to the model to see if we can get a good answer. Because this stitches two parts of the process together (retrieval and generation), it is called a chain. This means we need to define a chain:

In [8]:
from langchain.chains import RetrievalQA
qachain=RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())
res = qachain.invoke({"query": question})
print(res['result'])

According to the provided context, here is a summary of the interaction between Slim and Red:

Slim comes into the room unexpectedly and starts talking about having something that can get them into the circus. He proposes starting their own circus and becoming the biggest circus-fellows in the world. Red initially agrees to go along with the plan, but then takes it back after realizing that their parents might not approve of their idea. Slim seems disappointed by this turn of events, as he had been excited about the prospect of having a space-ship scout-ship.
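Conceptually, the chain above retrieves the relevant chunks and "stuffs" them into a single prompt together with the question. The sketch below shows that idea in plain Python. The function name and the prompt wording are hypothetical and are not the template that `RetrievalQA` uses internally; the chunk texts are placeholders.

```python
def build_stuffed_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt string.

    Hypothetical wording for illustration; not LangChain's actual template.
    """
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for what the retriever would return.
example_chunks = [
    "Slim talks about getting into the circus.",
    "Red worries about what their parents will say.",
]
prompt_text = build_stuffed_prompt("What do Slim and Red plan?", example_chunks)
print(prompt_text)
```

The model only ever sees this combined string, which is why the chunks have to be small enough for several of them to fit into the context window alongside the question.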


## Tracing using LangSmith

In [3]:
# Load API key from secrets.json

import os
import json

os.environ["LANGCHAIN_TRACING_V2"] = "true"


def get_secrets():
    with open('secrets.json') as secrets_file:
        secrets = json.load(secrets_file)

    return secrets


if __name__ == "__main__":
    secrets = get_secrets()
    os.environ["LANGCHAIN_API_KEY"]  = secrets.get("LANGCHAIN_API_KEY")


### Create Dataset

In [14]:
from langsmith import Client

client = Client()

# Define dataset: these are your test cases
dataset_name = "The Youth QA Dataset"
dataset = client.create_dataset(dataset_name)
client.create_examples(
    inputs=[
        {"question": "What is the significance of the title 'Youth'?"},
        {"question": "Describe the relationship between Red and Slim in the story."},
        {"question": "How does Asimov use the theme of first contact in 'Youth'?"},
        {"question": "What is the twist at the end of the story 'Youth'?"},
        {"question": "What message does Asimov convey about the differences between children and adults?"},
    ],
    outputs=[
        {"answer": "The title 'Youth' reflects the story's focus on the perspectives and actions of the young characters, Red and Slim. Their innocence and adventurous spirit contrast sharply with the adult world of negotiations and hidden agendas. The title also highlights the theme of perception and misunderstanding, as the boys' innocent misinterpretation of the situation leads to the story's twist."},
        {"answer": "Red and Slim share a close friendship based on their mutual curiosity and love for adventure. They are typical boys, eager to explore and discover new things. Their relationship is marked by innocence and a sense of wonder, which contrasts with the more serious and cautious interactions of the adults in the story."},
        {"answer": "Asimov uses the theme of first contact to explore the potential for both cooperation and misunderstanding between different species. The adults are engaged in serious negotiations, unaware that the animals the boys found are actually alien beings. This twist highlights how assumptions can lead to misunderstandings and how the innocence of youth can reveal truths that adults might overlook."},
        {"answer": "The twist at the end of 'Youth' is that the two animals that Red and Slim have found are actually the offspring of an alien species. This revelation turns the story on its head, as the adults' negotiations and the boys' innocent play are shown to be interconnected in a way that neither group understood."},
        {"answer": "Asimov conveys that children and adults perceive the world very differently. Children see the world with innocence and curiosity, often leading them to discover truths that adults might miss due to their preconceived notions and serious concerns. The story suggests that a balance of both perspectives can be valuable, and that sometimes, the simplicity of a child's view can uncover profound truths."},
    ],
    dataset_id=dataset.id,
)

### Craft the Prompt using Prompt Template

In [4]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

from datetime import datetime

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful knowledgeable assistant, trained to answer"
            " questions about the Youth by Isaac Asimov."
            " 'Youth' is set in a future where humanity has achieved interstellar travel and encounters alien civilizations."
            "\nThe current time is {time}.\n\nRelevant documents will be retrieved in the following messages.",
        ),
        ("system", "{context}"),
        ("human", "{question}"),
    ]
).partial(time=str(datetime.now()))

model = Ollama(
    base_url='http://localhost:11434',
    model="llama2"
)
response_generator = prompt | model | StrOutputParser()

### Create an Evaluator LLM
Here we will use Llama3 to evaluate the responses of Llama2 on the dataset.

In [5]:
eval_model = Ollama(
    base_url='http://localhost:11434',
    model="llama3"
)

#### Finally, assemble the full chain!

In [8]:
# The full chain looks like the following
from operator import itemgetter

chain = (
    # The runnable map here routes the original inputs to a context and a question dictionary to pass to the response generator
    {
        "context": itemgetter("question")
        | retriever
        | (lambda docs: "\n".join([doc.page_content for doc in docs])),
        "question": itemgetter("question"),
    }
    | response_generator
)
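To see what the runnable map at the start of the chain does, here is a plain-Python sketch of the same routing step. `fake_retriever` and `route_inputs` are hypothetical stand-ins: the real retriever returns `Document` objects (hence `doc.page_content` in the chain), while this sketch returns plain strings with placeholder text.

```python
from operator import itemgetter


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for the vector store retriever."""
    return [
        "Slim said he had something that could get them into the circus.",
        "Red worried about what their parents would say.",
    ]


def route_inputs(inputs: dict) -> dict:
    """Turn {'question': ...} into the {'context', 'question'} dict the prompt expects."""
    question = itemgetter("question")(inputs)
    docs = fake_retriever(question)
    return {"context": "\n".join(docs), "question": question}


routed = route_inputs({"question": "What do Slim and Red plan?"})
print(routed["question"])
print(routed["context"])
```

The question is used twice: once as the retrieval query that produces the context, and once unchanged as the human message in the prompt.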

#### Evaluate the Chain

Manually comparing the results of chains in the UI is effective, but it can be time-consuming. Automated metrics and AI-assisted feedback can help evaluate your component's performance.

Below, we will create a custom run evaluator that logs a heuristic evaluation.

##### Heuristic evaluators

In [20]:
from langsmith.evaluation import EvaluationResult, run_evaluator
from langsmith.schemas import Example, Run


@run_evaluator
def check_not_idk(run: Run, example: Example):
    """Illustration of a custom evaluator."""
    agent_response = run.outputs["output"]
    if "don't know" in agent_response or "not sure" in agent_response:
        score = 0
    else:
        score = 1
    # You can access the dataset labels in example.outputs[key]
    # You can also access the model inputs in run.inputs[key]
    return EvaluationResult(
        key="not_uncertain",
        score=score,
    )
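The heuristic itself can be tried out without LangSmith. The sketch below is a standalone variant of the same check, under the assumption that matching should ignore letter case (the evaluator above is case-sensitive). The function name and sample responses are hypothetical.

```python
UNCERTAIN_PHRASES = ("don't know", "not sure")


def not_uncertain_score(response: str) -> int:
    """Return 0 if the response admits uncertainty, 1 otherwise."""
    lowered = response.lower()
    return 0 if any(phrase in lowered for phrase in UNCERTAIN_PHRASES) else 1


# Hypothetical sample responses.
samples = [
    "Slim and Red are close friends.",
    "I don't know who Slim is.",
    "I'm Not Sure about that.",
]
scores = [not_uncertain_score(text) for text in samples]
print(scores)
```

A check like this only catches the listed phrases, so it is best read as a cheap smoke test rather than a measure of answer quality.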

Below, we will configure the evaluation with the custom evaluator from above, as well as some pre-implemented run evaluators that do the following:

* Compare results against ground truth labels.
* Check whether the response has a sufficient amount of detail.
* Evaluate 'aspects' of the agent's response in a reference-free manner using custom criteria.


In [24]:

from langchain.evaluation import EvaluatorType
from langchain.smith import RunEvalConfig

evaluation_config = RunEvalConfig(
    eval_llm=eval_model,
    # Evaluators can either be an evaluator type (e.g., "qa", "criteria", "embedding_distance", etc.) or a configuration for that evaluator
    evaluators=[
        # Chain of thought question answering evaluator, which grades answers to questions using chain of thought ‘reasoning’.
        EvaluatorType.COT_QA,
        # Question answering evaluator that incorporates ‘context’ in the response.
        EvaluatorType.CONTEXT_QA,
        # Grade whether the output satisfies the stated criteria.
        RunEvalConfig.LabeledCriteria("detail"),
        # The LabeledScoreString evaluator outputs a score on a scale from 1-10.
        # You can use the default criteria or write your own rubric
        RunEvalConfig.LabeledScoreString(
            {
                "accuracy": """
Score [[1]]: The answer is completely unrelated to the reference.
Score [[3]]: The answer has minor relevance but does not align with the reference.
Score [[5]]: The answer has moderate relevance but contains inaccuracies.
Score [[7]]: The answer aligns with the reference but has minor errors or omissions.
Score [[10]]: The answer is completely accurate and aligns perfectly with the reference."""
            },
            normalize_by=10,
        ),
    ],
    custom_evaluators=[check_not_idk],

)
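The rubric asks the grader to report its rating in double brackets, and `normalize_by=10` rescales that rating to the 0-1 range. The sketch below illustrates the idea with a hypothetical parser. It is an assumption about the general approach, not LangChain's actual implementation, and the grader text is made up.

```python
import re

# Matches a rating written as [[N]], as requested by the rubric.
_SCORE_PATTERN = re.compile(r"\[\[(\d+)\]\]")


def parse_normalized_score(grader_text: str, normalize_by: float = 10.0) -> float:
    """Extract a 1-10 rating in double brackets and rescale it."""
    match = _SCORE_PATTERN.search(grader_text)
    if match is None:
        raise ValueError("no [[score]] found in grader output")
    score = int(match.group(1))
    if not 1 <= score <= 10:
        raise ValueError(f"score {score} is outside the 1-10 scale")
    return score / normalize_by


# Made-up grader output.
grader_output = "The answer aligns with the reference but omits the ending. Rating: [[7]]"
normalized = parse_normalized_score(grader_output)
print(normalized)
```

Under this reading, a raw rating of 7 is reported as 0.7 in the feedback column.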

Run the evaluation. This makes predictions over the dataset and then uses a set of evaluators to check the correctness on each data point.

In [26]:
from uuid import uuid4
unique_id = uuid4().hex[0:8]

chain_results = await client.arun_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=lambda: chain,
    evaluation=evaluation_config,
    project_name=f"youth-dataset-{unique_id}",
    verbose=True
)

View the evaluation results for project 'youth-dataset-217d9409' at:
https://smith.langchain.com/o/103e639e-1fea-5efb-81b6-6b537ff4132d/datasets/fc9382e7-72de-416e-a423-9e372b0ef23b/compare?selectedSessions=328ec4e9-fd05-42ad-b04c-f4b25120c60b

View all tests for Dataset The Youth QA Dataset at:
https://smith.langchain.com/o/103e639e-1fea-5efb-81b6-6b537ff4132d/datasets/fc9382e7-72de-416e-a423-9e372b0ef23b
[------------------------------------------------->] 5/5

| | feedback.score_string:accuracy | feedback.not_uncertain | error | execution_time | run_id |
|---|---|---|---|---|---|
| count | 5.0 | 5.0 | 0.0 | 5.0 | 5 |
| unique | | | 0.0 | | 5 |
| top | | | | | 9bb70685-e737-41d4-94f7-dcf5bdbe441e |
| freq | | | | | 1 |
| mean | 0.7 | 1.0 | | 69.761311 | |
| std | 0.0 | 0.0 | | 31.022643 | |
| min | 0.7 | 1.0 | | 33.22086 | |
| 25% | 0.7 | 1.0 | | 48.381462 | |
| 50% | 0.7 | 1.0 | | 68.690345 | |
| 75% | 0.7 | 1.0 | | 86.683496 | |


In [None]:
chain_results.to_dataframe()