# 📓 _LangChain_ Quickstart

In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/langchain_quickstart.ipynb)

## Setup
### Add API keys
For this quickstart you will need OpenAI and Hugging Face API keys.

In [13]:
!pip install trulens_eval openai langchain chromadb langchainhub bs4 tiktoken litellm bitsandbytes accelerate sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-2.6.1-py3-none-any.whl.metadata (11 kB)
Downloading sentence_transformers-2.6.1-py3-none-any.whl (163 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.6.1

In [3]:
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
from langchain import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
import torch

In [4]:
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=quantization_config
)

generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.0001
generation_config.top_p = 0.80
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15

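# Note: this assignment rebinds the name `pipeline` from the imported factory
# function to the pipeline object, so re-running this cell requires re-importing it.
pipeline = pipeline(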
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    generation_config=generation_config,
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


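To see why 4-bit loading matters here, the following is a rough, weights-only estimate rather than a measurement. It assumes the checkpoint downloaded in this run consists of three safetensors shards of 4.94 GB, 5.00 GB, and 4.54 GB stored at 16 bits per parameter, and it ignores activations, the KV cache, and the small overhead of quantization constants.

```python
# Back-of-the-envelope estimate of weight memory (weights only).
# Assumption: shard sizes are in GB (1e9 bytes) and the checkpoint
# stores each parameter in 16 bits (2 bytes).
SHARD_SIZES_GB = [4.94, 5.00, 4.54]

BYTES_PER_PARAM_16BIT = 2.0
BYTES_PER_PARAM_4BIT = 0.5  # 4 bits per parameter

sixteen_bit_gb = sum(SHARD_SIZES_GB)
n_params_billion = sixteen_bit_gb / BYTES_PER_PARAM_16BIT
four_bit_gb = n_params_billion * BYTES_PER_PARAM_4BIT

print(f"16-bit weights: ~{sixteen_bit_gb:.2f} GB")
print(f"parameters:     ~{n_params_billion:.2f} B")
print(f"4-bit weights:  ~{four_bit_gb:.2f} GB")
```

Under these assumptions the quantized weights need roughly a quarter of the 16-bit footprint, which is what makes the model practical on a single Colab GPU.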

### Import from LangChain and TruLens

In [48]:
# Imports main tools:
from trulens_eval.feedback.provider.litellm import LiteLLM
from trulens_eval import TruChain, Tru, Feedback
from trulens_eval.app import App
litellm_provider = LiteLLM(model_engine = "huggingface/mistralai/Mistral-7B-Instruct-v0.2")
tru = Tru()
tru.reset_database()

# Imports from LangChain to build app
import bs4
from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_core.runnables import RunnablePassthrough

### Load documents

In [11]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

### Create Vector Store

In [14]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

splits = text_splitter.split_documents(docs)

vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=HuggingFaceEmbeddings(
        model_name="thenlper/gte-large",
        model_kwargs={"device": "cuda"},
        encode_kwargs={"normalize_embeddings": True},
    ),
)

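The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-width splitter. This is only a sketch: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, whereas the hypothetical `chunk_text` helper below cuts at fixed character offsets.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters,
    where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=2)
for chunk in chunks:
    print(repr(chunk))
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, so a retrieved chunk is less likely to cut a relevant sentence in half.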

### Create RAG

In [15]:
retriever = vectorstore.as_retriever()

prompt = hub.pull("rlm/rag-prompt")
llm = HuggingFacePipeline(pipeline=pipeline)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
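The data flow through `rag_chain` can be sketched with plain functions. Everything prefixed with `fake_` below is a hypothetical stand-in, not part of LangChain; the sketch only shows the order of the steps: build a dict with `context` and `question`, render the prompt, call the model, then return the output as a string.

```python
def fake_retrieve(question: str) -> list[str]:
    # Stand-in for `retriever`: always returns the same two passages.
    return [
        "Task decomposition splits a task into steps.",
        "Agents plan before acting.",
    ]


def format_docs(docs: list[str]) -> str:
    # Same joining logic as the notebook, applied to plain strings.
    return "\n\n".join(docs)


def fake_prompt(inputs: dict) -> str:
    # Stand-in for the pulled prompt template.
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt_text: str) -> str:
    # Stand-in for the model: echoes the prompt it received.
    return "ANSWER based on: " + prompt_text


def run_chain(question: str) -> str:
    # Mirrors {"context": retriever | format_docs, "question": passthrough}.
    inputs = {"context": format_docs(fake_retrieve(question)), "question": question}
    return str(fake_llm(fake_prompt(inputs)))


answer = run_chain("What is Task Decomposition?")
print(answer)
```

In the real chain the `|` operator performs this composition, and the question string is passed unchanged into the `question` slot by `RunnablePassthrough()`.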

### Send your first request

## Initialize Feedback Function(s)

In [30]:
# Select the context to be used in feedback. The location of the context is app-specific.
context = App.select_context(rag_chain)
f_answer_relevance = (
    Feedback(litellm_provider.relevance)
    .on_input_output()
)

✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
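Conceptually, a feedback function wired with `.on_input_output()` receives the record's main input and main output and returns a score. The sketch below is a toy illustration of that shape only: `toy_relevance` is a hypothetical word-overlap scorer, not the LLM-based `relevance` method of the LiteLLM provider, and the 0 to 1 score range is an assumption.

```python
def toy_relevance(prompt: str, response: str) -> float:
    """Fraction of prompt words that also appear in the response (0.0 to 1.0)."""
    def words(text: str) -> set[str]:
        cleaned = {w.strip("?.,!").lower() for w in text.split()}
        cleaned.discard("")
        return cleaned

    prompt_words = words(prompt)
    if not prompt_words:
        return 0.0
    return len(prompt_words & words(response)) / len(prompt_words)


score = toy_relevance(
    "What is Task Decomposition?",
    "Task decomposition is a method.",
)
print(score)
```

Any callable with this `(input, output) -> score` shape fits the same slot; the provider-backed version replaces the word overlap with a model judgment.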


## Instrument chain for logging with TruLens

In [31]:
chain_recorder = TruChain(rag_chain, app_id="test_mistral", feedbacks=[f_answer_relevance])


In [42]:
# Registering the model with LiteLLM is needed for the relevance feedback to work,
# but token, cost, and n_requests tracking still does not work.

from litellm import register_model

# The values below are arbitrary placeholders for testing.
register_model({
    "mistralai/Mistral-7B-Instruct-v0.2": {
        "max_tokens": 32000,
        "input_cost_per_token": 50,
        "output_cost_per_token": 0.000000,
        "litellm_provider": "huggingface",
        "mode": "chat",
    },
})

{'mistralai/Mistral-7B-Instruct-v0.2': {'max_tokens': 32000,
  'input_cost_per_token': 50,
  'output_cost_per_token': 0.0,
  'litellm_provider': 'huggingface',
  'mode': 'chat'}}
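The registered per-token prices only matter if token counts are actually tracked. As a sketch, assuming cost is linear in prompt and completion tokens, the hypothetical `estimate_cost` helper below shows what the placeholder entry would yield; the token counts are made up for illustration.

```python
def estimate_cost(entry: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Linear cost model: tokens multiplied by the per-token price."""
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


# Same placeholder prices as the registered entry.
entry = {"input_cost_per_token": 50, "output_cost_per_token": 0.0}

# Hypothetical token counts, for illustration only.
cost = estimate_cost(entry, prompt_tokens=10, completion_tokens=200)
print(cost)
```

With such an exaggerated input price, any request whose tokens were counted would show a clearly nonzero cost, which makes it easy to tell whether tracking is working.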

In [49]:
# Set the API token for the LiteLLM provider.
# Replace the placeholder with your own Hugging Face token; do not commit real tokens.
import os
os.environ["HUGGINGFACE_API_KEY"] = "hf_..."

with chain_recorder as recording:
    llm_response = rag_chain.invoke("What is Task Decomposition?")

display(llm_response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


' Task decomposition is a method used by an autonomous agent system, such as a language model (LLM), to break down complex tasks into smaller, manageable ones. This can be achieved through various methods including simple prompting, task-specific instructions, or human inputs. For example, Chain of Thought and Tree of Thoughts techniques can be used to decompose problems into multiple thought steps and generate multiple thoughts per step, respectively. The AI assistant uses these tasks to create a plan for handling user requests.'
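Instead of writing the token into the notebook, it can be read from the environment before the chain is invoked. The helper below is a minimal sketch; `get_hf_token` is a hypothetical name, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os


def get_hf_token(env=None, var_name: str = "HUGGINGFACE_API_KEY") -> str:
    """Return the token stored in `var_name`, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} before running the chain.")
    return token


# Usage with an explicit mapping; the value here is a dummy, not a real token.
token = get_hf_token({"HUGGINGFACE_API_KEY": " hf_example "})
print(token)
```

In Colab the variable could be populated from the secrets panel or an interactive prompt, so the token never appears in the saved notebook.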

## Retrieve records and feedback

In [50]:
# The record of the app invocation can be retrieved from the `recording`:

rec = recording.records[0]
# recs = recording.records # use .records if multiple

display(rec)

Record(record_id='record_hash_1e2948154aecf61525fc2dd8bbdb23bc', app_id='test_mistral', cost=Cost(n_requests=0, n_successful_requests=0, n_classes=0, n_tokens=0, n_stream_chunks=0, n_prompt_tokens=0, n_completion_tokens=0, cost=0.0), perf=Perf(start_time=datetime.datetime(2024, 4, 15, 21, 35, 18, 903508), end_time=datetime.datetime(2024, 4, 15, 21, 35, 28, 756624)), ts=datetime.datetime(2024, 4, 15, 21, 35, 28, 756748), tags='-', meta=None, main_input='What is Task Decomposition?', main_output=' Task decomposition is a method used by an autonomous agent system, such as a language model (LLM), to break down complex tasks into smaller, manageable ones. This can be achieved through various methods including simple prompting, task-specific instructions, or human inputs. For example, Chain of Thought and Tree of Thoughts techniques can be used to decompose problems into multiple thought steps and generate multiple thoughts per step, respectively. The AI assistant uses these tasks to create 

In [51]:
# The results of the feedback functions can be retrieved from
# `Record.feedback_results` or with the `wait_for_feedback_results` method.
# Results retrieved directly are `Future` instances (see `concurrent.futures`).
# You can use `as_completed` to wait until they have finished evaluating,
# or use the utility method:

for feedback, feedback_result in rec.wait_for_feedback_results().items():
    print(feedback.name, feedback_result.result)

# See more about wait_for_feedback_results:
# help(rec.wait_for_feedback_results)

relevance 1.0
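The `Future` and `as_completed` pattern mentioned in the comment can be sketched with the standard library alone. The example does not use TruLens; `toy_feedback` and the feedback name `conciseness` are hypothetical, and the scores are hard-coded.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def toy_feedback(name: str, score: float) -> tuple[str, float]:
    # Stand-in for a feedback evaluation that runs in the background.
    return name, score


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(toy_feedback, "relevance", 1.0),
        pool.submit(toy_feedback, "conciseness", 0.5),
    ]
    # as_completed yields each future as soon as its result is ready.
    results = dict(future.result() for future in as_completed(futures))

for name, score in sorted(results.items()):
    print(name, score)
```

Calling `.result()` on a future blocks until that evaluation finishes, which is the waiting that a helper like `wait_for_feedback_results` takes care of.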


In [52]:
records, feedback = tru.get_records_and_feedback(app_ids=["test_mistral"])

records.head()

| | app_id | app_json | type | record_id | input | output | tags | record_json | cost_json | perf_json | ts | relevance | relevance_calls | latency | total_tokens | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | test_mistral | {"tru_class_info": {"name": "TruChain", "modul... | RunnableSequence(langchain_core.runnables.base) | record_hash_1e2948154aecf61525fc2dd8bbdb23bc | "What is Task Decomposition?" | " Task decomposition is a method used by an au... | - | {"record_id": "record_hash_1e2948154aecf61525f... | {"n_requests": 0, "n_successful_requests": 0, ... | {"start_time": "2024-04-15T21:35:18.903508", "... | 2024-04-15T21:35:28.756748 | 1.0 | [{'args': {'prompt': 'What is Task Decompositi... | 9 | 0 | 0.0 |


## Explore in a Dashboard

In [None]:
tru.run_dashboard() # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard.