## Generating answers
This notebook reads one or more of the files generated in the **explore_questions** notebook with the strategies described there. We answer those questions and design prompts that let the model answer from the context of the reform.

In [1]:
import os
from llama_index.core.evaluation import DatasetGenerator, RelevancyEvaluator
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.gradient import GradientBaseModelLLM
from llama_index.embeddings.gradient import GradientEmbedding
from llama_index.core import set_global_service_context
from llama_index.core import ServiceContext
import pandas as pd
import json
from tqdm import tqdm

In [29]:
import pickle

In [10]:
with open("/Users/jucampo/Desktop/Ideas/youtube_audios/credentials.json","r") as credentials:
    credentials_dict = json.load(credentials)
os.environ["OPENAI_API_KEY"] = credentials_dict['open_ai']
os.environ['GRADIENT_ACCESS_TOKEN'] = credentials_dict['gradient']
os.environ['GRADIENT_WORKSPACE_ID'] =  credentials_dict['gradient_id']
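The cell above assumes the JSON file contains all three keys. A minimal sketch of a loader that fails early with a clear message when one is missing is shown below. The helper name `load_credentials` and the constant `ENV_BY_KEY` are hypothetical; only the key names and environment variable names come from the cell above.

```python
import json
import os

# Mapping from keys in credentials.json to environment variable names,
# copied from the cell above.
ENV_BY_KEY = {
    "open_ai": "OPENAI_API_KEY",
    "gradient": "GRADIENT_ACCESS_TOKEN",
    "gradient_id": "GRADIENT_WORKSPACE_ID",
}


def load_credentials(path, env=None):
    """Read a JSON credentials file and export its values as env variables.

    `load_credentials` is a hypothetical helper, not part of any library.
    `env` defaults to os.environ; pass a plain dict to try it out safely.
    """
    env = os.environ if env is None else env
    with open(path, "r") as handle:
        credentials = json.load(handle)
    # Report every missing key at once instead of failing on the first one.
    missing = sorted(key for key in ENV_BY_KEY if key not in credentials)
    if missing:
        raise KeyError(f"Missing keys in {path}: {missing}")
    for key, variable in ENV_BY_KEY.items():
        env[variable] = credentials[key]
    return env
```

Passing a dict as `env` makes it possible to check a credentials file without touching the real environment.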

In [11]:
llm = GradientBaseModelLLM(
    base_model_slug="llama2-7b-chat",
    max_tokens=400,
)

In [None]:
embed_model = GradientEmbedding(
    gradient_access_token = os.environ["GRADIENT_ACCESS_TOKEN"],
    gradient_workspace_id = os.environ["GRADIENT_WORKSPACE_ID"],
    gradient_model_slug="bge-large",
)

In [None]:
service_context = ServiceContext.from_defaults(
    llm = llm,
    embed_model = embed_model,
    chunk_size=256,
)

set_global_service_context(service_context)

In [25]:
documents = SimpleDirectoryReader("/Users/jucampo/Desktop/Ideas/my_env/data/example_readers/reforma_salud/juntos").load_data()
print(f"Loaded {len(documents)} document(s).")
documents1 = SimpleDirectoryReader("/Users/jucampo/Desktop/Ideas/my_env/data/example_readers/reforma_salud/original").load_data()
print(f"Loaded {len(documents1)} document(s).")
documents2 = SimpleDirectoryReader("/Users/jucampo/Desktop/Ideas/my_env/data/example_readers/reforma_salud/cartilla").load_data()
print(f"Loaded {len(documents2)} document(s).")
documents3 = SimpleDirectoryReader("/Users/jucampo/Desktop/Ideas/my_env/data/example_readers/reforma_salud/texto_definitivo").load_data()
print(f"Loaded {len(documents3)} document(s).")

Loaded 426 document(s).
Loaded 291 document(s).
Loaded 32 document(s).
Loaded 103 document(s).


In [27]:
index = VectorStoreIndex.from_documents(documents,
                                        service_context=service_context)
query_engine = index.as_query_engine()

In [19]:
index1 = VectorStoreIndex.from_documents(documents1,
                                        service_context=service_context)
query_engine1 = index1.as_query_engine()

In [20]:
index2 = VectorStoreIndex.from_documents(documents2,
                                        service_context=service_context)
query_engine2 = index2.as_query_engine()

In [21]:
index3 = VectorStoreIndex.from_documents(documents3,
                                        service_context=service_context)
query_engine3 = index3.as_query_engine()
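Four query engines now exist, one per corpus. A minimal sketch for asking the same question to each of them and collecting the answers side by side is shown below. The helper `ask_all` is hypothetical; it only assumes that each engine exposes `.query(question)` and that the result has a `.response` string, which is how the notebook's own query loop uses `query_engine`.

```python
def ask_all(question, engines):
    """Return {label: answer_text} for one question.

    `engines` maps a label to any object exposing `.query(question)`
    whose result has a `.response` string attribute.
    """
    answers = {}
    for name, engine in engines.items():
        try:
            answers[name] = engine.query(question).response
        except Exception as error:
            # Keep going if one corpus fails; record the gap as None.
            answers[name] = None
            print(f"{name}: query failed ({error})")
    return answers
```

With the engines built above, a call could look like `ask_all(question, {"juntos": query_engine, "original": query_engine1, "cartilla": query_engine2, "texto_definitivo": query_engine3})`, where the labels are illustrative.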

## Reading the cleaned question files

In [30]:
with open('/Users/jucampo/Desktop/Ideas/my_env/notebooks/random_q.pkl', 'rb') as archivo:
    # Load the object from the file
    l_df_questions_label = pickle.load(archivo)

In [33]:

set(l_df_questions_label[0]["questions"])

{'How does the document address the issue of healthcare access for rural or remote populations, and what are the proposed solutions to improve access?',
 'What are some of the challenges associated with measuring effectiveness in healthcare, and how can they be addressed?',
 'What are some potential challenges or limitations of implementing complementary healthcare systems, according to the author, and how can these be addressed?',
 'What are the key areas of focus for the Proyecto Evaluación y Reestructuración de los Procesos, Estrategias y Organismos Públicos y Privados, according to the document?',
 'What are the three key elements of continuous quality improvement, according to the authors of the article "Continuous quality improvement: educating towards a culture of clinical governance"?',
 "What is the author's position on the relationship between the market and the containment of healthcare costs, and how does this relate to the overall efficiency of the healthcare system?",
 'W

In [28]:
def aprox_pricing(response):
    # Whitespace-separated words, used as a rough stand-in for model tokens
    tokens = response.response.split()

    # Count the tokens
    num_tokens = len(tokens)

    print("Number of tokens:", num_tokens)
    print("approx. pricing:", num_tokens*0.0001*4000)
    return num_tokens
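The same estimate can be written as a pure function that returns its values instead of printing them. This is only a sketch: `estimate_cost` and its parameter names are hypothetical, the defaults mirror the two constants in `aprox_pricing`, and reading them as a unit price and a multiplier is an assumption. A whitespace word count only approximates what a model tokenizer would report.

```python
def estimate_cost(text, price_per_token=0.0001, multiplier=4000):
    """Rough cost estimate from a whitespace word count.

    Returns (num_tokens, cost). The defaults copy the constants used in
    `aprox_pricing`; their interpretation is an assumption.
    """
    num_tokens = len(text.split())
    return num_tokens, num_tokens * price_per_token * multiplier
```

For a query result, the call would be `estimate_cost(response.response)`.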

In [40]:

q_0_0 = l_df_questions_label[0]["questions"][0]
q_0_1  = l_df_questions_label[0]["questions"][1]

In [53]:
l_df_questions_label[-1]

Unnamed: 0,questions,embeddings_mean,label_4,distance_centroids,level
0,"What is the main goal of the ""Proyecto Evaluac...","[-0.1621012470319069, 0.15285321504675917, -0....",0,0.0,0
1,What is the main objective of the Programa de ...,"[-0.21453212617586057, -0.049616373516619204, ...",1,1.617482,2
2,What is the author's position on the relations...,"[0.16202743230639277, -0.09692060806461282, -0...",2,1.927308,2
3,How does the document describe the main purpos...,"[-0.09942214165268273, 0.044726147669656525, -...",3,0.0,0
4,How does the document address the issue of equ...,"[0.23716902352221633, -0.17365308227422444, -0...",4,1.81464,2
5,How does the demand for healthcare services af...,"[-0.000584186905104181, -0.26242404309627804, ...",5,1.873806,2
6,What is the relationship between the use of PI...,"[0.28301494143903255, -0.15226855009794235, -0...",6,1.820577,2
7,What are some of the challenges that healthcar...,"[0.2300226914906694, -0.2109784034471358, -0.1...",7,1.712071,2
8,What are the potential limitations of using ho...,"[0.3896677957640754, -0.19142921655266373, -0....",8,1.918868,2
9,What is the relationship between continuous qu...,"[0.22583956296809696, -0.057738541583107275, 0...",9,1.42697,2


In [57]:
l_q_a = []
for df_q in tqdm(l_df_questions_label):
    # Fresh dict per DataFrame; reusing a single dict would leave every
    # entry of l_q_a pointing at the same (last) set of answers
    dict_q_a = {}
    for i_q in range(10):
        question = df_q["questions"][i_q]
        dict_q_a["question_"+str(i_q)] = question
        try:
            response = query_engine.query(question)
            response = response.response
        except Exception as e:
            print("question "+str(i_q)+" unanswered:", e)
            response = " "
        dict_q_a["answer_"+str(i_q)] = response

    l_q_a.append(dict_q_a)

100%|██████████| 10/10 [08:36<00:00, 51.67s/it]
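The loop stores each set of answers as a dict with `question_i` / `answer_i` keys. A minimal sketch for flattening that structure into rows, and for flagging the questions that fell back to the blank `" "` placeholder, is shown below. The helper `to_records` is hypothetical and uses only the standard library.

```python
def to_records(results, n_questions=10):
    """Flatten [{'question_0': ..., 'answer_0': ...}, ...] into rows.

    Each row is (set_index, question_index, question, answer, answered).
    A blank answer counts as unanswered, matching the " " placeholder
    the loop stores when a query fails.
    """
    rows = []
    for set_index, qa in enumerate(results):
        for i in range(n_questions):
            question = qa.get(f"question_{i}")
            if question is None:
                continue  # this set has fewer questions than n_questions
            answer = qa.get(f"answer_{i}", "")
            rows.append((set_index, i, question, answer, bool(answer.strip())))
    return rows
```

The rows can be passed to `pd.DataFrame(rows, columns=[...])` for inspection, with column names of your choosing.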


In [58]:
with open('random_q.pkl', 'wb') as f:
    pickle.dump(l_q_a, f)
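The cell above writes to `random_q.pkl`, the same file name the questions were loaded from. If the notebook runs in that directory, the input file is replaced by the answers. A minimal sketch of a save helper that refuses to overwrite by default is shown below; `save_pickle` is hypothetical, and a name such as `random_q_answers.pkl` would be an illustrative choice for the output.

```python
import pickle
from pathlib import Path


def save_pickle(obj, path, overwrite=False):
    """Pickle `obj` to `path`, refusing to replace an existing file by default."""
    path = Path(path)
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    with path.open("wb") as handle:
        pickle.dump(obj, handle)
    return path
```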

{'question_0': 'What is the main goal of the "Proyecto Evaluación y Reestructuración de los Procesos, Estrategias y Organismos Públicos y Privados" in relation to the health system?',
 'answer_0': ' Based on the provided context information, the main goal of the "Proyecto Evaluación y Reestructuración de los Procesos, Estrategias y Organismos Públicos y Privados" in relation to the health system is to achieve agility, transparency, and modernization of the National Institute of Public Health (INVIMA) through a reorganization guided by principles of public innovation.\n\nThe project aims to strengthen the human and technological resource of INVIMA, improve cybersecurity, and optimize processes and procedures. Specifically, the project aims to make the registration of sanitary products indefinite with periodic and permanent monitoring, responding to the level of risk involved. Additionally, the project aims to automate modifications to the registration of sanitary products that do not af