<a href="https://colab.research.google.com/github/Vaibhav-sa30/Evaluating-LLM-Models-using-TruEra/blob/main/Evaluating_LLM_Models_using_TruEra.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. **!pip install -U trulens-eval:** Installs the Python package "trulens-eval", or upgrades it if it is already installed (that is what the -U flag does). The leading `!` tells the notebook to run the line as a shell command.

2. **!npm install localtunnel -q:** This command, typically used in a Node.js environment, installs the "localtunnel" package. "localtunnel" is a tool that allows you to expose a local web server to the internet, making it accessible remotely. The -q flag suppresses the output, making the installation process less verbose.

3. **!pip install -q streamlit==1.13.0:** This command installs a specific version (1.13.0) of the "streamlit" Python package. Streamlit is a framework for building web applications from Python scripts; the TruLens dashboard runs as a Streamlit app. As in the previous command, the -q flag suppresses output during installation.

In [None]:
!pip install -U trulens-eval

# Google Colab Dependencies
!npm install localtunnel -q
!pip install -q streamlit==1.13.0

# Langchain Quickstart

In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response.

## Setup
### Add API keys
For this quickstart you will need OpenAI and Hugging Face API keys.

- *os* module provides functions for interacting with the operating system
- set two environment variables, OPENAI_API_KEY and HUGGINGFACE_API_KEY

#### What is an environment variable?

An environment variable is a named value stored in a process's environment; the process and any child processes it starts can read it. Here, the environment variables hold your API keys for OpenAI and Hugging Face, which the client libraries read when they call those services.
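As a small illustration of how a program reads such a variable, here is a minimal sketch that uses only the standard library. The helper `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical names chosen for this example; they are not part of TruLens or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Setting a variable through os.environ affects the current process
# and any child processes it starts afterwards.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"  # hypothetical name and value
print(require_env("EXAMPLE_API_KEY"))
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by an API client.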

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["HUGGINGFACE_API_KEY"] = "..."

### Import from LangChain and TruLens

- **IPython.display** provides functions for displaying rich output in IPython notebooks; `JSON` displays JSON data.
- **trulens_eval** provides tools for logging and evaluating LLM applications.
- **TruChain** wraps a LangChain chain so that its calls can be logged and evaluated.
- **Feedback** wraps a feedback function, which scores part of a record (for example, an input and output pair), and specifies which parts of the record the function receives.
- **Huggingface** is a collection of feedback functions backed by the Hugging Face API.
- **Tru** is the entry point to TruLens: it gives access to the logged records and starts the dashboard.
- **LLMChain** is a LangChain chain that formats a prompt template and passes the result to an LLM (large language model).
- **OpenAI** is LangChain's wrapper for an OpenAI LLM.
- **ChatPromptTemplate** builds a chat prompt from a list of message templates.
- **HumanMessagePromptTemplate** is a message template for the human (user) turn; it is one of the building blocks passed to a ChatPromptTemplate.
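The prompt-template classes listed above come down to filling named placeholders in a string. The following plain-Python sketch shows that idea without LangChain; `render_prompt` is a hypothetical helper written for this illustration, and LangChain's own classes do more than this (validation, message roles, and so on).

```python
# The template text matches the one used later in this quickstart.
TEMPLATE = (
    "Provide a helpful response with relevant background information "
    "for the following: {prompt}"
)


def render_prompt(template: str, **variables: str) -> str:
    """Substitute named placeholders such as {prompt} with the given values."""
    return template.format(**variables)


rendered = render_prompt(TEMPLATE, prompt="¿que hora es?")
print(rendered)
```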

In [None]:
from IPython.display import JSON

# Imports main tools:
from trulens_eval import TruChain, Feedback, Huggingface, Tru
tru = Tru()

# Imports from langchain to build app. You may need to install langchain first
# with the following:
# ! pip install "langchain>=0.0.170"
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts.chat import ChatPromptTemplate, PromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate

### Create Simple LLM Application

This example uses the LangChain framework with an OpenAI LLM.

In [None]:
# First, create a HumanMessagePromptTemplate, a message template for the human (user) turn of the chat.

full_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=
        "Provide a helpful response with relevant background information for the following: {prompt}",  #The {prompt} placeholder is used to indicate where the human message will be inserted.
        input_variables=["prompt"],
    )
)

# Next, we create a ChatPromptTemplate object from the HumanMessagePromptTemplate object, which represents a chat prompt that can be used with a language model chain.
chat_prompt_template = ChatPromptTemplate.from_messages([full_prompt])

# Finally, create the OpenAI LLM and the LLMChain.
# temperature=0.9 makes sampling more random; max_tokens=128 caps the completion length.
llm = OpenAI(temperature=0.9, max_tokens=128)

# LLMChain formats the prompt template and sends the result to the LLM.
# verbose=True makes the chain print debugging information while it runs.
chain = LLMChain(llm=llm, prompt=chat_prompt_template, verbose=True)

### Send your first request

In [None]:
prompt_input = '¿que hora es?'

In [None]:
llm_response = chain(prompt_input)

display(llm_response)

## Initialize Feedback Function(s)
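Conceptually, a feedback function takes text from a record and returns a score between 0 and 1. The sketch below is a toy stand-in for a language-match check and is not how TruLens or Hugging Face implement it: the tiny word lists and the names `language_scores` and `toy_language_match` are assumptions made up for this illustration, whereas a real check would use a trained language-detection model.

```python
# Tiny hand-picked word lists, for illustration only.
COMMON_WORDS = {
    "en": {"the", "is", "what", "time", "it", "and", "of"},
    "es": {"que", "qué", "hora", "es", "el", "la", "y", "son", "las"},
}


def language_scores(text: str) -> dict:
    """Return a crude per-language score that sums to 1."""
    words = [w.strip("¿?¡!.,").lower() for w in text.split()]
    counts = {
        lang: sum(w in vocab for w in words)
        for lang, vocab in COMMON_WORDS.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No known words: fall back to a uniform guess.
        return {lang: 1 / len(COMMON_WORDS) for lang in COMMON_WORDS}
    return {lang: count / total for lang, count in counts.items()}


def toy_language_match(text1: str, text2: str) -> float:
    """Score near 1 when text2 looks like the language detected for text1."""
    scores1 = language_scores(text1)
    scores2 = language_scores(text2)
    top = max(scores1, key=scores1.get)  # most likely language of text1
    return 1.0 - abs(scores1[top] - scores2[top])


print(toy_language_match("¿que hora es?", "Son las tres y media"))
print(toy_language_match("¿que hora es?", "It is the time"))
```

Any callable with this shape (text in, float out) can play the same role, which is why feedback functions are easy to swap.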

In [None]:
# Initialize Huggingface-based feedback function collection class:
hugs = Huggingface()

# Define a language match feedback function using HuggingFace.
# language_match runs a Hugging Face language-detection model on two texts and
# returns a score that is high when both texts are in the same language.

f_lang_match = Feedback(hugs.language_match).on_input_output()
# By default this will check language match on the main app input and main app
# output.

## Instrument chain for logging with TruLens

In [None]:
truchain = TruChain(
    chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tags="prototype",
)

In [None]:
# Instrumented chain can operate like the original:
llm_response = truchain(prompt_input)

display(llm_response)

## Explore in a Dashboard

In [None]:
tru.run_dashboard() # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

### Chain Leaderboard

Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up.

Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best).

![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png)

To dive deeper on a particular chain, click "Select Chain".
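The "average feedback value" on the leaderboard is, in essence, a per-app mean of the scores logged for each record. Here is a minimal sketch of that aggregation over made-up data; the record layout, the second app ID, and the helper `average_feedback` are hypothetical and do not reflect how TruLens stores or computes its results.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: one feedback score per logged call.
records = [
    {"app_id": "Chain1_ChatApplication", "language_match": 1.0},
    {"app_id": "Chain1_ChatApplication", "language_match": 0.5},
    {"app_id": "Chain2_Hypothetical", "language_match": 0.0},
]


def average_feedback(rows, feedback_name):
    """Group rows by app_id and average the named feedback score."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["app_id"]].append(row[feedback_name])
    return {app_id: mean(values) for app_id, values in grouped.items()}


leaderboard = average_feedback(records, "language_match")
print(leaderboard)
```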

### Understand chain performance with Evaluations

To learn more about the performance of a particular chain or LLM, select it to view its evaluations at the record level. LLM quality is assessed through feedback functions: extensible methods for scoring the quality of LLM responses that can be applied to any downstream LLM task. Out of the box, TruLens provides a number of feedback functions for assessing model agreement, sentiment, relevance and more.

The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.

![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png)

### Deep dive into full chain metadata

Click on a record to dive deep into all of the details of your chain stack and underlying LLM, as captured by `truchain`.

![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png)

If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.

Note: Feedback functions evaluated in deferred mode can be seen on the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [None]:
tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all