# MLflow

Giskard is available as a plug-in for MLflow's `mlflow.evaluate()` API. With this integration,
you can log the vulnerability reports produced by Giskard's scanning [capabilities](https://docs.giskard.ai/en/latest/guides/scan/index.html) directly to the
MLflow platform. The integration also logs metrics, so you can compare the performance,
robustness, and ethical bias of different ML models.

## Setup
The following requirements are necessary to use the plug-in:

- Install `mlflow` to access the `mlflow.evaluate()` API.
- Install `giskard` (follow these [instructions](https://docs.giskard.ai/en/latest/guides/installation_library/index.html))
to access the `giskard` evaluator.

After installation, `giskard` should appear among MLflow's evaluators:

In [None]:
import mlflow
mlflow.models.list_evaluators() # ['default', 'giskard']
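
If `giskard` does not appear in that list, one possible cause is that a package is not installed in the active environment. The following minimal sketch uses only the standard library to report which of the required packages cannot be found. The helper name `missing_packages` is hypothetical and is not part of MLflow or Giskard.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two packages listed in the setup requirements above.
missing = missing_packages(["mlflow", "giskard"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("Both packages are importable.")
```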

## Example notebook
This example demonstrates how to efficiently scan two LLMs for hidden vulnerabilities using Giskard and interpret the results within MLflow through just a few lines of code. The LLMs used are:

| Model          | Description | Max Tokens | Training data   |
|----------------| ----------- | ----------- |-----------------|
| `text-ada-001` | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost.| 2049 tokens | Up to Oct 2019  |
| `text-davinci-001` | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality.| 2049 tokens | Up to Oct 2019  |

Based on the following simple prompt:

In [None]:
from langchain import PromptTemplate
prompt = PromptTemplate(template="Create a reader comment according to the following article summary: '{text}'",
                        input_variables=["text"])
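
To see what each model receives, it can help to render the template for a single summary. The sketch below does this with Python's built-in `str.format`, used here as a stand-in that is assumed to match the template's placeholder substitution for a simple case like this one. The function `render_prompt` and the sample summary are invented for illustration.

```python
# Same template string as in the prompt above.
TEMPLATE = "Create a reader comment according to the following article summary: '{text}'"


def render_prompt(summary):
    """Fill the {text} placeholder with one article summary."""
    return TEMPLATE.format(text=summary)


# Hypothetical summary, used only to show the rendered prompt.
example = render_prompt("City council approves new bike lanes.")
print(example)
```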

We will fill the prompt with 1000 article summaries sampled from the following [dataset](https://github.com/sunnysai12345/News_Summary), which consists of 4515 examples gathered from Hindu, Indian Times, and Guardian. The articles date from February to August 2017.

In [None]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/sunnysai12345/News_Summary/master/news_summary_more.csv')
df_sample = pd.DataFrame(df["text"].sample(1000, random_state=11))

With the prompt and dataset in place, we are ready to move forward with evaluating and comparing the LLMs. First, make sure to set up your OpenAI API key:

In [None]:
import openai
openai.api_key = "YOUR_OPENAI_API_KEY"
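
Hard-coding the key is convenient in a notebook, but it is easy to leak through version control. A common alternative, sketched below, reads it from the `OPENAI_API_KEY` environment variable. The helper `read_api_key` is hypothetical and only illustrates the pattern.

```python
import os


def read_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned value could then be assigned to `openai.api_key` in place of the literal string.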

First, we load the two models with the langchain library. Next, we log each model in MLflow and then evaluate it separately with the Giskard evaluator.

In [None]:
import mlflow
import openai
from langchain import llms, LLMChain

models = ["text-ada-001", "text-davinci-001"]

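for model in models:
    # Wrap the OpenAI completion model; temperature=0 keeps sampling randomness to a minimum.
    llm = llms.OpenAI(openai_api_key=openai.api_key,
                      request_timeout=20,
                      max_retries=100,
                      temperature=0,
                      model_name=model)

    # Combine the prompt template and the model into a single chain.
    chain = LLMChain(prompt=prompt, llm=llm)

    # One MLflow run per model, so the results can be compared side by side.
    with mlflow.start_run(run_name=model):
        # Log the chain as an MLflow model, then evaluate it with Giskard.
        model_uri = mlflow.langchain.log_model(chain, "langchain").model_uri
        mlflow.evaluate(model=model_uri,
                        model_type="text",
                        data=df_sample,
                        evaluators="giskard")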

After these steps, MLflow creates a folder named `mlruns` that contains all the results. Run `mlflow ui` from the directory that contains the `mlruns` folder, then open http://127.0.0.1:5000 to view them. The two LLMs appear there as separate runs that you can compare and analyze.

## Plug-in parameters

The giskard evaluator is configured entirely through the `evaluator_config` argument, which accepts three keys:

- `model_config`: to be filled according to this [page](https://docs.giskard.ai/en/latest/reference/models/index.html).
- `dataset_config`: to be filled according to this [page](https://docs.giskard.ai/en/latest/reference/datasets/index.html).
- `scan_config`: to be filled according to this [page](https://docs.giskard.ai/en/latest/reference/scan/index.html).

As an example:

In [None]:
evaluator_config = {"model_config":   {"classification_labels": ["no", "yes"]},
                    "dataset_config": {"name": "Articles"},
                    "scan_config":    {"params": {"text_perturbation": {"num_samples": 1000}}}}
mlflow.evaluate(model=model_uri,
                model_type="text",
                data=df_sample,
                evaluators="giskard",
                evaluator_config=evaluator_config)
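
Because the configuration is a plain nested dictionary, it can be convenient to check the top-level key names before calling `mlflow.evaluate()`. The sketch below assembles the configuration and rejects unknown top-level keys. The helper `build_evaluator_config` is hypothetical, not part of Giskard or MLflow, and it checks only the three key names listed above, not the contents of each section.

```python
ALLOWED_KEYS = ("model_config", "dataset_config", "scan_config")


def build_evaluator_config(**sections):
    """Return an evaluator_config dict, rejecting unknown top-level keys."""
    unknown = sorted(set(sections) - set(ALLOWED_KEYS))
    if unknown:
        raise ValueError(f"Unknown evaluator_config keys: {unknown}")
    # Keep only the sections that were actually provided.
    return {key: dict(value) for key, value in sections.items() if value is not None}


# Same values as in the example above.
evaluator_config = build_evaluator_config(
    model_config={"classification_labels": ["no", "yes"]},
    dataset_config={"name": "Articles"},
    scan_config={"params": {"text_perturbation": {"num_samples": 1000}}},
)
```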

## Logging giskard objects to MLflow
Four giskard objects can be logged to MLflow:

- A giskard [dataset](https://docs.giskard.ai/en/latest/guides/wrap_dataset/index.html)
- A giskard [model](https://docs.giskard.ai/en/latest/guides/wrap_model/index.html)
- The [scan](https://docs.giskard.ai/en/latest/guides/scan/index.html) results
- The [test-suite](https://docs.giskard.ai/en/latest/guides/scan/index.html) results

There are two ways to do this.

### Option 1 (via the fluent API)

In [None]:
import mlflow

with mlflow.start_run() as run:
    giskard_model.to_mlflow()
    giskard_dataset.to_mlflow()
    scan_results.to_mlflow()
    test_suite_results.to_mlflow()

### Option 2 (via MlflowClient)

In [None]:
from mlflow import MlflowClient

client = MlflowClient()
experiment_id = "0"
run = client.create_run(experiment_id)

giskard_model.to_mlflow(client, run.info.run_id)
giskard_dataset.to_mlflow(client, run.info.run_id)
scan_results.to_mlflow(client, run.info.run_id)
test_suite_results.to_mlflow(client, run.info.run_id)