## Evidently Configuration Guide

This notebook demonstrates the basic usage of the `evidently` library. We'll cover:

- Logging test cases  
- Running evaluations  
- Viewing and saving results locally  
- Evaluating Evidently metrics through the Trace metrics API


In [None]:
!pip install sentence-transformers
!pip install -U evidently pandas

In [2]:
import evidently
print(evidently.__version__)

0.7.9


### Imports Explained

This notebook uses the **Evidently** library to evaluate large language model (LLM) outputs with a set of built-in descriptors. Here’s a breakdown of what each import does:


#### Evidently Core

```python
from evidently import Dataset
from evidently import DataDefinition
```

- **`Dataset`**: Wraps the input data for evaluation (typically created from a `pandas.DataFrame`).
- **`DataDefinition`**: Describes the structure of the dataset (e.g., which columns hold text such as the question, the response, or the context).

---

#### Evidently Descriptors

```python
from evidently.descriptors import (
    DeclineLLMEval,
    Sentiment,
    TextLength,
    NegativityLLMEval,
    PIILLMEval,
    BiasLLMEval,
    ToxicityLLMEval,
    ContextQualityLLMEval,
    ContextRelevance
)
```

These are **prebuilt descriptors** that evaluate specific aspects of LLM-generated responses:

- **`DeclineLLMEval`**: Uses an LLM judge to detect whether the response declines or refuses to answer.
- **`Sentiment`**: Scores the sentiment of the text on a scale from -1 (negative) to 1 (positive).
- **`TextLength`**: Measures the length of the text in characters.
- **`NegativityLLMEval`**: Uses an LLM judge to classify the text as negative or positive.
- **`PIILLMEval`**: Uses an LLM judge to detect whether personally identifiable information (PII) is present.
- **`BiasLLMEval`**: Uses an LLM judge to detect possible social, cultural, or political bias in the text.
- **`ToxicityLLMEval`**: Uses an LLM judge to identify toxic or harmful content.
- **`ContextQualityLLMEval`**: Uses an LLM judge to assess whether the provided context contains enough information to answer the question.
- **`ContextRelevance`**: Scores how relevant the provided context is to the question.

---

These descriptors allow automated, explainable evaluation of LLM responses using a consistent and extensible framework.

