# Evaluation pipeline for Compliance responses

This process evaluates the answers that the RAG system generates for compliance validation. It first answers the questions generated in the previous step, then compares each answer with the one that should have been obtained from the documentation. To do this, we use an evaluation framework based on an LLM-as-a-judge, which can assess both the retrieval step and the answer generation.

## RAGAS

This framework evaluates Retrieval Augmented Generation (RAG) pipelines.

In [2]:
import sys
import os

# Get the absolute path to the src directory
src_path = os.path.abspath('..')

# Add the src directory to the sys.path
if src_path not in sys.path:
    sys.path.insert(0, src_path)

# Set the __package__ attribute to simulate running as a package
__package__ = 'src'

In [11]:
import requests
import json

# Query the http://localhost:6000/questions-from-standard endpoint to get the questions for a particular standard

# Define the endpoint
url = "http://localhost:6000/questions-from-standard"

# Send the request and fail early on an HTTP error status
response = requests.get(url)
response.raise_for_status()


In [12]:
questions = json.loads(response.text)
questions = questions['questions']
questions[0:4]

['1. Has the borrower assessed the presence and impact of hazardous materials in the project activities?',
 '2. Does the project involve significant pest management issues, and if so, has a Pest Management Plan (PMP) been prepared?',
 "3. Has the borrower considered the project's proximity to areas of importance to biodiversity?",
 '4. Is the borrower complying with all relevant environmental requirements?']
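
The questions returned above carry a leading list number ("1. ", "2. ", and so on). If those prefixes are not wanted when the questions are later sent to the RAG endpoint, they could be stripped first. The following is a minimal sketch; `strip_numbering` and the regular expression are illustrative helpers, not part of the project code.

```python
import re

# Matches a leading list number such as "1. " or "12) " (illustrative pattern).
_NUMBER_PREFIX = re.compile(r"^\s*\d+[.)]\s+")


def strip_numbering(questions):
    """Return the questions with any leading 'N. ' style prefix removed."""
    return [_NUMBER_PREFIX.sub("", q, count=1) for q in questions]


# Two of the questions shown in the output above
sample = [
    "1. Has the borrower assessed the presence and impact of hazardous materials in the project activities?",
    "4. Is the borrower complying with all relevant environmental requirements?",
]
cleaned = strip_numbering(sample)
print(cleaned[0])
```

Questions without a numeric prefix pass through unchanged, so the helper can be applied to the whole list safely.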

In [3]:
# Now you can import your modules
from src import config
from src.models import Models

2024-08-07 17:11:02.641 | INFO     | src.config:<module>:13 - Secrets loaded from .env


In [None]:
llm, embed_model = Models.initialize_models()

test_embedding = embed_model.get_text_embedding(
    "Open AI new Embeddings models is great."
)

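print(test_embedding[0:4])
# Assuming get_text_embedding returns a plain list of floats (which has no .shape), print its length
print(len(test_embedding))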
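
Beyond printing the first few values, one possible sanity check for an embedding model is to compare two embeddings by cosine similarity. The sketch below uses toy vectors and assumes the embeddings are plain sequences of floats; `cosine_similarity` is a helper introduced here for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

With real embeddings, two paraphrases of the same sentence would be expected to score noticeably higher than two unrelated sentences.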


In [6]:
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# list of metrics we're going to use
metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    harmfulness,
]


from ragas.run_config import RunConfig

thread_timeout = 240  # currently unused (see the commented-out argument below)
timeout = 240
max_retries = 10
max_wait = 120  # defined but not passed to RunConfig
run_config = RunConfig(
    timeout=timeout,
    max_retries=max_retries,
    # thread_timeout=thread_timeout
)
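
To give a feel for what two of the metrics listed above measure, the RAGAS documentation describes faithfulness as the share of claims in the answer that the retrieved context supports, and context recall as the share of reference-answer sentences that can be attributed to the retrieved context. The sketch below only reproduces that final ratio. The verdict lists are hypothetical; in the real metrics they come from prompting the judge LLM, and `supported_fraction` is a name introduced here.

```python
import math


def supported_fraction(verdicts):
    """Fraction of True verdicts; NaN when there is nothing to judge."""
    if not verdicts:
        return math.nan
    return sum(1 for v in verdicts if v) / len(verdicts)


# Hypothetical verdicts for one answer: 3 claims, 2 supported by the context
faithfulness_example = supported_fraction([True, True, False])

# Hypothetical verdicts for one reference answer: 4 sentences, 3 found in the context
context_recall_example = supported_fraction([True, True, True, False])

print(round(faithfulness_example, 3), round(context_recall_example, 3))
```

Returning NaN for an empty verdict list is a choice made for this sketch, not a statement about how the library handles that case.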

In [7]:
from .index import IndexManager
storage_dir = "../storage_vector_store"
index = IndexManager.load_index(storage_dir)

In [57]:
import requests
import json


async def get_answers_from_project(query):
    """Send a query to the project endpoint and return the parsed JSON response."""
    # Define the endpoint
    url = "http://localhost:8540/query-project"

    json_data = {
        "query": query,
    }

    # Send the request (requests is blocking, so this coroutine does not yield while waiting)
    response = requests.post(url, json=json_data)
    return json.loads(response.text)


# The function is a coroutine, so it must be awaited to obtain the result
response_json = await get_answers_from_project("Jindal?")

In [46]:
import pprint

print("Response: ")
pprint.pprint(response_json["response"]["response"])
print("Source nodes: ")
pprint.pprint(response_json["response"]["source_nodes"][0]['node']["text"])
pprint.pprint(response_json["response"]["source_nodes"][1]['node']["text"])

Response: 
('Jindal Iron Ore (Pty) Ltd is involved in a project known as the Jindal MIOP, '
 'which aims to contribute to economic development and inclusive growth '
 'through revenue and tax generation, as well as the creation of employment '
 'opportunities. The project aligns with various strategic frameworks and '
 'policies, including the National Development Plan (NDP) and the Provincial '
 'Growth Development Strategy (PGDS) of KwaZulu-Natal (KZN). The project also '
 'emphasizes social investment and local economic development, particularly '
 'benefiting the communities directly affected by its establishment.')
Source nodes: 
('Jindal Iron Ore (Pty) Ltd SLR Project No: 720.10023.00001 \n'
 'Jindal MIOP EIA & EMPr   July 2023 \n'
 ' \n'
 ' \n'
 ' \n'
 ' 74  \n'
 ' \n'
 'Jindal Iron Ore Mine ESIA and EMPr - 09072023 FINAL Importantly, the NDP '
 'notes that while minerals beneficiation is a good way to increase '
 'productivity and export \n'
 'revenues and stimulate the develop
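
The cell above reads the answer and the first two source nodes by indexing into the response directly. A small helper could collect the answer together with the text of every returned node, which is the shape an evaluation step typically needs. This is a sketch that assumes the payload layout used above (`response` → `response` and `response` → `source_nodes` → `node` → `text`); `extract_answer_and_contexts` and the sample payload are hypothetical.

```python
def extract_answer_and_contexts(response_json):
    """Return (answer text, list of source-node texts) from an endpoint payload."""
    payload = response_json["response"]
    answer = payload["response"]
    # Keep every returned node rather than only the first two
    contexts = [item["node"]["text"] for item in payload.get("source_nodes", [])]
    return answer, contexts


# Hypothetical payload with the same nesting as the real response
fake_response = {
    "response": {
        "response": "Example answer.",
        "source_nodes": [
            {"node": {"text": "chunk A"}},
            {"node": {"text": "chunk B"}},
            {"node": {"text": "chunk C"}},
        ],
    }
}
answer, contexts = extract_answer_and_contexts(fake_response)
print(answer, len(contexts))
```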

In [44]:
expert_questions = ["Energy Efficiency: What measures are in place to ensure the efficient use of energy in mining operations? Are there any energy-saving technologies or practices being implemented?",
                    "Water Management: How is water usage monitored and managed to ensure efficiency? Are there any systems in place for recycling or reusing water within the mining operations?",
                    "Raw Material Usage: What steps are being taken to optimize the use of raw materials and minimize waste? Are there any benchmarking data available to compare the mine's resource efficiency with industry standards?",
                    "Air Quality Management: What measures are in place to monitor and control air emissions, including greenhouse gases and particulate matter, from mining activities?",
                    "Water Pollution Control: How is wastewater from mining operations treated before being discharged? Are there any measures to prevent contamination of local water bodies?",
                    "Soil and Land Management: What practices are in place to prevent soil erosion and land degradation due to mining activities? Are there any land reclamation plans post-mining?",
                    "Hazardous Waste Management: How is hazardous waste generated by the mine identified, stored, and disposed of? Are there any protocols for handling spills or accidental releases?",
                    "Non-Hazardous Waste Management: What systems are in place for the management of non-hazardous waste, including recycling and reduction initiatives?",] 

In [59]:
import pprint

import pandas as pd

rows = []
for question in expert_questions:
    response_json = await get_answers_from_project(question)
    answer = response_json["response"]["response"]
    source_nodes = response_json["response"]["source_nodes"]

    print("Question: ", question)
    print("Response: ")
    pprint.pprint(answer)
    print("Source nodes: ")
    pprint.pprint(source_nodes[0]['node']["text"])
    pprint.pprint(source_nodes[1]['node']["text"])
    print("")

    # Collect one row per question
    rows.append({
        'Question': question,
        'Response': answer,
        'Source node 1': source_nodes[0]['node']["text"],
        'Source node 2': source_nodes[1]['node']["text"],
    })

df = pd.DataFrame(rows)

# Save the results to an Excel file once all questions have been answered
df.to_excel("project-compliance-validation.xlsx", index=False)






Question:  Energy Efficiency: What measures are in place to ensure the efficient use of energy in mining operations? Are there any energy-saving technologies or practices being implemented?
Response: 
('To ensure the efficient use of energy in mining operations, several measures '
 'are in place. These include the decarbonisation of the electricity supply, '
 'which can be achieved through the integration of renewable energy sources '
 'into the national grid or the installation of on-site renewable energy '
 'systems. Additionally, the electrification of the mine vehicle fleet is '
 'being considered to reduce fuel consumption in mobile machinery. Regular '
 'servicing of vehicles and machinery is also mandated to maintain optimal '
 'fuel efficiency.')
Source nodes: 
('Engineer/ EO Visual inspection \n'
 'Grievance mechanism Bi-annually throughout \n'
 'operational phase \n'
 '12 \uf0b7 Use of construction \n'
 'equipment and other \n'
 'machinery during \n'
 'construction activities

In [60]:
df

| | Question | Response | Source node 1 | Source node 2 |
|---|---|---|---|---|
| 0 | Energy Efficiency: What measures are in place ... | To ensure the efficient use of energy in minin... | Engineer/ EO Visual inspection \nGrievance mec... | The planned mining activities should further... |
| 1 | Water Management: How is water usage monitored... | Water usage is monitored and managed through s... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720.... | Jindal Iron Ore (Pty) Ltd \nJindal MIOP EIA & ... |
| 2 | Raw Material Usage: What steps are being taken... | Steps being taken to optimize the use of raw m... | Engineer/ RE/ EO/ \nEnvironmental Manager Sign... | Engineer/ EO Visual inspection \nGrievance mec... |
| 3 | Air Quality Management: What measures are in p... | Measures to monitor and control air emissions ... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720.... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720... |
| 4 | Water Pollution Control: How is wastewater fro... | Wastewater from mining operations is treated t... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720.... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720.... |
| 5 | Soil and Land Management: What practices are i... | To prevent soil erosion and land degradation d... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720... | Jindal Iron Ore (Pty) Ltd \nJindal MIOP EIA & ... |
| 6 | Hazardous Waste Management: How is hazardous w... | Hazardous waste generated by the mine is manag... | Jindal Iron Ore (Pty) Ltd \nJindal MIOP EIA & ... | Jindal Iron Ore (Pty) Ltd \nJindal MIOP EIA & ... |
| 7 | Non-Hazardous Waste Management: What systems a... | The management of non-hazardous waste includes... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720.... | Jindal Iron Ore (Pty) Ltd SLR Project No: 720.... |
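
To pass these rows to the RAGAS metrics, they would need to be rearranged into the column layout the library expects. The sketch below assumes the RAGAS 0.1-style column names `question`, `answer`, `contexts` and `ground_truth`; check them against the installed version. `rows_to_ragas_columns` is a hypothetical helper, and the rows are assumed to be dictionaries such as those from `df.to_dict("records")`.

```python
def rows_to_ragas_columns(rows, ground_truths=None):
    """Convert per-question rows into a dict of columns (assumed RAGAS 0.1 names)."""
    columns = {"question": [], "answer": [], "contexts": []}
    for row in rows:
        columns["question"].append(row["Question"])
        columns["answer"].append(row["Response"])
        # Each sample carries a list of retrieved context strings
        columns["contexts"].append([row["Source node 1"], row["Source node 2"]])
    if ground_truths is not None:
        if len(ground_truths) != len(rows):
            raise ValueError("one ground truth is needed per row")
        columns["ground_truth"] = list(ground_truths)
    return columns


# Placeholder rows with the same keys as the table above
example_rows = [
    {"Question": "q1", "Response": "a1", "Source node 1": "ctx 1", "Source node 2": "ctx 2"},
    {"Question": "q2", "Response": "a2", "Source node 1": "ctx 3", "Source node 2": "ctx 4"},
]
columns = rows_to_ragas_columns(example_rows, ground_truths=["ref 1", "ref 2"])
print(columns["contexts"][0])
```

The resulting dictionary could then be wrapped with `datasets.Dataset.from_dict(columns)` before evaluation. Metrics that compare against a reference, such as `context_recall`, need the expert-written `ground_truth` answers, which this notebook has not collected yet.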
