# Introduction ✒️

The Ragas framework helps evaluate RAG systems by measuring how effectively information is retrieved from a database (vector-based or graph-based).

The evaluation measures how well the model, backed by the RAG pipeline, answers user queries. It scores the faithfulness of the information, the correctness of the answer, and other attributes, which are extracted from the text by a second generative language model.
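As a rough illustration of how a score such as faithfulness could be derived, the sketch below assumes the judge model has already split the answer into claims and marked each one as supported or unsupported by the retrieved context. The `faithfulness_ratio` helper and the verdict list are hypothetical and not part of the Ragas API; Ragas handles claim extraction and verification itself through prompts.

```python
def faithfulness_ratio(verdicts):
    """Fraction of answer claims that the context supports.

    `verdicts` is a hypothetical list of booleans, one per claim
    extracted from the answer (True = supported by the context).
    """
    if not verdicts:
        # No claims to check: the ratio is undefined, so signal that with NaN.
        return float("nan")
    return sum(verdicts) / len(verdicts)


# Example: 3 of 4 claims are supported by the retrieved context.
score = faithfulness_ratio([True, True, False, True])
print(score)
```

A score close to 1 would mean nearly every claim in the answer can be traced back to the context.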

Besides evaluation, the framework can generate the evaluation inputs from the source text, creating the required features. This notebook focuses on implementing the evaluation system offered by the framework.

Because only the evaluation system is implemented here, the RAG pipeline itself is not built; the answers that the generative language model would produce are already included in the dataset that will be loaded.

## Libraries 📚

In [1]:
!pip install datasets -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.
ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 17.0.0 which is incompatible.

In [2]:
!pip install ragas -q

In [3]:
!pip install google-generativeai -q

In [16]:
!pip install langchain -q
!pip install langchain_google_genai -q

In [5]:
import warnings
warnings.filterwarnings('ignore')

In [6]:
import os
import getpass

In [24]:
import google.auth

import google.generativeai as genai
from ragas.llms import LangchainLLMWrapper
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings


In [8]:
from datasets import load_dataset

In [29]:
from ragas import evaluate

from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
    answer_similarity,
    answer_correctness,
)
from ragas.metrics.critique import harmfulness

## Loading the dataset 💾

In [10]:
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

amnesty_qa

amnesty_qa.py:   0%|          | 0.00/5.72k [00:00<?, ?B/s]

README.md:   0%|          | 0.00/1.90k [00:00<?, ?B/s]

The repository for explodinggradients/amnesty_qa contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/explodinggradients/amnesty_qa.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


Repo card metadata block was not found. Setting CardData to empty.


english.json:   0%|          | 0.00/70.8k [00:00<?, ?B/s]

Generating eval split: 0 examples [00:00, ? examples/s]

DatasetDict({
    eval: Dataset({
        features: ['question', 'ground_truth', 'answer', 'contexts'],
        num_rows: 20
    })
})

## Inspecting the loaded dataset

In [11]:
"""
The loaded dataset exposes 4 features.
It follows the format used by Ragas, a framework for assessing
the answers of generative language models: for a given document,
each row holds a question, a ground truth, an answer,
and the retrieved contexts, respectively.
"""

amnesty_qa

DatasetDict({
    eval: Dataset({
        features: ['question', 'ground_truth', 'answer', 'contexts'],
        num_rows: 20
    })
})
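To make the expected shape of one evaluation row concrete, here is a minimal sketch that checks a record for the four features listed above. The `missing_fields` helper and the sample record are hypothetical, invented only for illustration; the field types (strings, plus a list of strings for `contexts`) are an assumption based on the printed samples.

```python
REQUIRED_FIELDS = ("question", "ground_truth", "answer", "contexts")


def missing_fields(record):
    """Return the required fields absent from a single evaluation record."""
    return [name for name in REQUIRED_FIELDS if name not in record]


# Hypothetical record, invented only to show the expected shape.
sample = {
    "question": "What does the report say about X?",
    "ground_truth": "The reference answer written by an annotator.",
    "answer": "The answer produced by the RAG pipeline.",
    "contexts": ["First retrieved passage.", "Second retrieved passage."],
}

complete = missing_fields(sample)
incomplete = missing_fields({"question": "Only a question"})
print(complete)
print(incomplete)
```

A check like this can be useful before calling the evaluator on a custom dataset, since a missing column would otherwise only surface once the metrics run.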

In [12]:
"""
Showing the first 2 entries of each feature.
"""

lista_features = ['question', 'ground_truth', 'answer', 'contexts']

for word in lista_features:
  print(f'First 2 entries of {word}:\n')
  print(amnesty_qa['eval'][word][:2])
  print('\n')

First 2 entries of question:

['What are the global implications of the USA Supreme Court ruling on abortion?', 'Which companies are the main contributors to GHG emissions and their role in global warming according to the Carbon Majors database?']


First 2 entries of ground_truth:

["The global implications of the USA Supreme Court ruling on abortion are significant. The ruling has led to limited or no access to abortion for one in three women and girls of reproductive age in states where abortion access is restricted. These states also have weaker maternal health support, higher maternal death rates, and higher child poverty rates. Additionally, the ruling has had an impact beyond national borders due to the USA's geopolitical and cultural influence globally. Organizations and activists worldwide are concerned that the ruling may inspire anti-abortion legislative and policy attacks in other countries. The ruling has also hindered progressive law reform and the imple

## Configuring the models ⚙️

In [15]:
# Setting the environment variables:

os.environ['GOOGLE_API_KEY'] = getpass.getpass()

genai.configure(api_key=os.environ.get('GOOGLE_API_KEY'))

··········
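When the cell above is re-run, it prompts for the key again. One possible variation, sketched below, prompts only when the variable is missing. The `ensure_api_key` helper is hypothetical and not part of any library used here; in the notebook, `ask` would be `getpass.getpass`.

```python
import os


def ensure_api_key(name, ask, env=os.environ):
    """Store a key in `env` under `name`, calling `ask()` only if it is missing."""
    if not env.get(name):
        env[name] = ask()
    return env[name]


# A stub replaces getpass.getpass so the example runs without prompting,
# and a plain dict stands in for os.environ.
fake_env = {}
ensure_api_key("GOOGLE_API_KEY", lambda: "dummy-key", env=fake_env)
ensure_api_key("GOOGLE_API_KEY", lambda: "ignored", env=fake_env)
print(fake_env["GOOGLE_API_KEY"])
```

The second call leaves the stored value untouched, which is the behavior wanted when a notebook cell is executed more than once.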


In [41]:
# Instantiating the Gemini model.

model = ChatGoogleGenerativeAI(
    model='gemini-1.5-pro-latest'
)

# Instantiating the embedding model.

embeddings = GoogleGenerativeAIEmbeddings(model='models/text-embedding-004')

In [23]:
# Testing the API integration with the Gemini model
# (the prompt is Portuguese for "Hello, how are you?").

model.invoke('Olá, como você está?').content

'Como um modelo de linguagem, não tenho sentimentos como humanos.  Mas estou aqui para te ajudar no que precisar! O que você gostaria de fazer hoje? \n'

In [27]:
# Testing the connection to the embedding model.

vector = embeddings.embed_query('Bring Me The Horizon')

"""
Inspecting the first 5 components of the embedding vector
generated for the input text.
"""

vector[:5]

[0.04387560859322548,
 -0.05744573101401329,
 -0.03193361684679985,
 -0.009691059589385986,
 0.049376215785741806]
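Embedding vectors like the one above are typically compared with cosine similarity, which is also the idea behind an embedding-based metric such as `answer_similarity`. The sketch below is a plain-Python version using made-up toy vectors; it illustrates the measure and is not necessarily how Ragas computes its score internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (not real embeddings).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing in the same direction score close to 1, while orthogonal vectors score 0; real embeddings of two related texts fall somewhere in between.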

## Evaluation

In [32]:
# Listing the metrics that will be used
# to analyze the answers in the dataset.

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_similarity,
    answer_correctness,
    ]

In [42]:
result = evaluate(
    amnesty_qa['eval'].select(range(1)),
    metrics = metrics,
    llm = LangchainLLMWrapper(model),
    embeddings = embeddings
)

Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]

ERROR:ragas.executor:Exception raised in Job[1]: TypeError(GenerativeServiceAsyncClient.generate_content() got an unexpected keyword argument 'temperature')
ERROR:ragas.executor:Exception raised in Job[0]: TypeError(GenerativeServiceAsyncClient.generate_content() got an unexpected keyword argument 'temperature')
ERROR:ragas.executor:Exception raised in Job[5]: TypeError(GenerativeServiceAsyncClient.generate_content() got an unexpected keyword argument 'temperature')
ERROR:ragas.executor:Exception raised in Job[2]: TypeError(GenerativeServiceAsyncClient.generate_content() got an unexpected keyword argument 'temperature')
ERROR:ragas.executor:Exception raised in Job[2]: TypeError(GenerativeServiceAsyncClient.generate_content() got an unexpected keyword argument 'temperature')
ERROR:ragas.executor:Exception raised in Job[0]: TypeError(GenerativeServiceAsyncClient.generate_content() got an unexpected keyword argument 'temperature')
ERROR:ragas.executor:Exception raised in Job[3]: TypeError

In [44]:
# Result obtained.

result

{'faithfulness': nan, 'answer_relevancy': nan, 'context_recall': nan, 'context_precision': nan, 'answer_similarity': 0.9379, 'answer_correctness': nan}
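Most scores above are `nan`, which lines up with the `TypeError` logged for the evaluation jobs; only `answer_similarity` produced a value, presumably because it relies on the embedding model rather than on the LLM call that failed. A small sketch for separating failed metrics from computed ones follows. The scores are copied by hand from the result above into a plain dict, so nothing here depends on the Ragas result object.

```python
import math

# Scores copied from the result shown above.
scores = {
    "faithfulness": float("nan"),
    "answer_relevancy": float("nan"),
    "context_recall": float("nan"),
    "context_precision": float("nan"),
    "answer_similarity": 0.9379,
    "answer_correctness": float("nan"),
}

# NaN never compares equal to itself, so math.isnan is the reliable check.
failed = [name for name, value in scores.items() if math.isnan(value)]
computed = {name: value for name, value in scores.items() if not math.isnan(value)}

print("failed:", failed)
print("computed:", computed)
```

Listing the failed metrics explicitly makes it harder to mistake a run that errored out for a run that scored poorly.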