### Configuration

* The **Opik** platform can be self-hosted, either locally in a container or on a cluster: [local hosting](https://www.comet.com/docs/opik/self-host/overview)
* Alternatively, there is a free cloud-based platform: [cloud version](https://www.comet.com/signup).
    * When using the cloud version, an API key and a workspace need to be specified.
    * Similarly to **DeepEval**, the cloud version of **Opik** has an intuitive UI, where datasets, evaluation results and more can be stored.
    * For the cloud-based approach, make sure to add the following to your `.env` file in the parent folder:
    ```bash
        OPIK_API_KEY=<your-api-key>
        OPIK_WORKSPACE=<your-workspace>
        OPIK_PROJECT_NAME=<project-name> # Setting this will automatically log traces for the project (Optional)
        OPIK_USAGE_REPORT_ENABLED=false  # Disable telemetry (Optional)
        # By default creates a file called ~/.opik.config (on Linux) if it doesn't exist
        OPIK_CONFIG_PATH=<filepath-to-your-config-file>  # Overwrite the file location (Optional)
    ```
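
Because a missing key or workspace only surfaces once the client tries to connect, it can help to check the environment first. The following is a minimal sketch using only the standard library; the helper name `missing_opik_settings` and the constant `REQUIRED_OPIK_VARS` are hypothetical and not part of the Opik SDK. It assumes only `OPIK_API_KEY` and `OPIK_WORKSPACE` are mandatory for the cloud version, as stated above.

```python
import os
from typing import List, Mapping

# Variable names come from the `.env` listing above; treating only these two
# as required is an assumption based on the cloud-version note.
REQUIRED_OPIK_VARS = ("OPIK_API_KEY", "OPIK_WORKSPACE")


def missing_opik_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_OPIK_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment
print(missing_opik_settings({"OPIK_API_KEY": "abc", "OPIK_WORKSPACE": ""}))
# ['OPIK_WORKSPACE']
```

Calling `missing_opik_settings()` without arguments after `load_dotenv(...)` checks the real process environment.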

In [1]:
import os
import opik
from dotenv import load_dotenv
from opik.exceptions import ConfigurationError

load_dotenv("../.env")

try:
    # If you're using the locally hosted version, set `use_local=True` and provide the `url`
    opik.configure(
        api_key=os.getenv("OPIK_API_KEY"),
        workspace=os.getenv("OPIK_WORKSPACE")
    )
except (ConfigurationError, ConnectionError) as ce:
    print(f"Error occurred: {ce}")

OPIK: Opik is already configured. You can check the settings by viewing the config file at /home/p3tr0vv/Desktop/Evaluation-Approaches-for-Retrieval-Augmented-Generation-RAG-/evaluation/opik/.opik.config


### LLM and tracing

* **Opik** uses **OpenAI** as the LLM provider by default. To override that, create a `LiteLLMChatModel` instance with the model you want to use and specify your [input parameters](https://docs.litellm.ai/docs/completion/input).
* Optionally, to trace all calls to `litellm` on the **Opik** platform, create an `OpikLogger` and register it in the `litellm` callbacks.

In [2]:
import os
import litellm
from dotenv import load_dotenv
from opik.evaluation.models import LiteLLMChatModel
from litellm.integrations.opik.opik import OpikLogger

load_dotenv("../../env/rag.env")

# https://docs.litellm.ai/docs/completion/input
eval_model = LiteLLMChatModel(
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    model_name=f"ollama/{os.getenv('CHAT_MODEL')}",
    temperature=float(os.getenv("TEMPERATURE")),
    top_p=float(os.getenv("TOP_P")),
    response_format={
        "type": "json_object"
    },
    api_base="http://localhost:11434",
    num_retries=3,
)

# This will trace all calls submitted to the LLM to Opik (Optional)
opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]

### Evaluation

For an evaluation/experiment in **Opik** the following things are required:
- a dataset
    - **Opik** supports datasets, which are a collection of key value pairs.
    - Datasets can be created and deleted.
- an evaluation task
    - maps a dataset item/sample to a dictionary object, which is submitted as a parameter to the `score` method of a metric.
- a set of metrics
    - the ones of relevance for the project are the LLM-as-a-judge ones
    - additionally, one can subclass the `BaseMetric` class to define a custom metric
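
To make the relationship between the three pieces concrete, here is a simplified, library-free sketch of the flow: each dataset item goes through the task, and the resulting dictionary is unpacked as keyword arguments into each metric's `score` method. This is not Opik's implementation; `ExactMatchMetric`, `toy_task` and `run_evaluation` are hypothetical stand-ins, and a real custom metric would extend `BaseMetric` as noted above.

```python
from typing import Any, Callable, Dict, List


class ExactMatchMetric:
    """Toy metric: 1.0 if the output equals the expected output, else 0.0."""

    name = "exact_match"

    def score(self, output: str, expected_output: str, **ignored: Any) -> float:
        # Extra keys from the task output (e.g. "input") are accepted and ignored
        return 1.0 if output.strip() == expected_output.strip() else 0.0


def toy_task(item: Dict[str, Any]) -> Dict[str, Any]:
    """Map a dataset item to the keyword arguments the metrics expect."""
    return {
        "input": item["input"],
        "output": item["actual_output"],
        "expected_output": item["expected_output"],
    }


def run_evaluation(
    items: List[Dict[str, Any]],
    task: Callable[[Dict[str, Any]], Dict[str, Any]],
    metrics: List[Any],
) -> List[Dict[str, float]]:
    """Apply the task to every item and score its output with every metric."""
    results = []
    for item in items:
        task_output = task(item)
        results.append({m.name: m.score(**task_output) for m in metrics})
    return results


items = [
    {"input": "2 + 2?", "actual_output": "4", "expected_output": "4"},
    {"input": "Capital of France?", "actual_output": "Lyon", "expected_output": "Paris"},
]
print(run_evaluation(items, toy_task, [ExactMatchMetric()]))
# [{'exact_match': 1.0}, {'exact_match': 0.0}]
```

The key point is that the dictionary keys returned by the task must match the parameter names of the metric's `score` method.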

In [3]:
from dotenv import load_dotenv
from typing import Dict, Any, List
from opik import Dataset
from opik.api_objects.dataset.rest_operations import ApiError

load_dotenv("../.env")

# Create an `Opik` client for interacting with the platform
opik_client = opik.Opik(
    project_name=os.getenv("OPIK_PROJECT_NAME"),
    workspace=os.getenv("OPIK_WORKSPACE"),
    api_key=os.getenv("OPIK_API_KEY"),
)

try:
    # Fetch the dataset
    opik_dataset: Dataset = opik_client.get_dataset(name=os.getenv("DATASET_ALIAS"))
except ApiError as ae:
    # If not available fetch it from `DeepEval`
    # Convert it into a list of Opik Dataset Items and upload to `Opik`
    from deepeval.dataset import EvaluationDataset
    from deepeval import login_with_confident_api_key
    
    print(f"{ae.status_code}: {ae.body['errors']}")
    print("Fetching from DeepEval and then uploading to the Opik platform")
    
    login_with_confident_api_key(os.getenv("DEEPEVAL_API_KEY"))
    deepeval_dataset = EvaluationDataset()
    deepeval_dataset.pull(
        alias=os.getenv("DATASET_ALIAS"),
        auto_convert_goldens_to_test_cases=True
    )
    
    opik_dataset: Dataset = opik_client.create_dataset(
        name=os.getenv("DATASET_ALIAS"),
        description="Evaluation dataset from DeepEval"
    )
    
    opik_dataset_items: List[Dict[str, Any]] = [vars(test_case) for test_case in deepeval_dataset.test_cases]
    opik_dataset.insert(opik_dataset_items)
    

In [4]:
from opik.api_objects.dataset.dataset_item import DatasetItem

# This function is used during evaluation
# For each item in the dataset, this function will be called
# The output of the function is a dictionary containing the relevant parameters for the metrics
def evaluation_task(item: DatasetItem) -> Dict[str, Any]:
    return {
        "input": item['input'],
        "output": item['actual_output'],
        "expected_output": item['expected_output'],
        "context": item['context']
    }

In [None]:
from opik.evaluation.metrics import Hallucination
from opik.evaluation.metrics.llm_judges.hallucination.template import generate_query # TODO: Override the prompt template

hallucination_metric = Hallucination(
    model=eval_model,
    project_name=os.getenv("OPIK_PROJECT_NAME")
)

In [6]:
from opik.evaluation import evaluate
from opik.evaluation.evaluation_result import EvaluationResult

eval_res: EvaluationResult = evaluate(
    dataset=opik_dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
    experiment_name="First evaluation ever using Opik",
    project_name=os.getenv("OPIK_PROJECT_NAME"),
    #experiment_config={}
)

Evaluation:   0%|          | 0/48 [00:00<?, ?it/s]OPIK: Started logging traces to the "Evaluation Approaches for RAG" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=019691d7-a282-716b-a3b2-a145dc28568b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
Evaluation: 100%|██████████| 48/48 [31:18<00:00, 39.14s/it]
