In [1]:
%cd ..

/home/eak/Documents/AI/LLMChat/medivocate


In [2]:
import os
from glob import glob
from src.rag_pipeline.rag_system import RAGSystem

In [3]:
docs_dir = "data/docs"
persist_directory_dir = "data/chroma_db"
batch_size = 64

# Initialize the RAG system
rag = RAGSystem(
    docs_dir,
    persist_directory_dir,
    batch_size
)

# Reuse a persisted index if one exists; otherwise build it from the documents
if glob(os.path.join(persist_directory_dir, "*/*.bin")):
    rag.initialize_vector_store()  # reuse the persisted vector store
else:
    # Load and index the documents
    documents = rag.load_documents()
    rag.initialize_vector_store(documents)  # build the vector store from scratch
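The cell above decides between reusing and rebuilding the index by looking for `.bin` files one directory below the persist directory. The `*/*.bin` layout is taken from the cell and presumably matches the per-collection HNSW files that Chroma writes; that is an assumption about the store's on-disk format. A minimal sketch of the same check with `pathlib`, using a hypothetical helper `has_persisted_index` that stops at the first match instead of building the full list:

```python
from pathlib import Path


def has_persisted_index(persist_dir: str, pattern: str = "*/*.bin") -> bool:
    """Return True if at least one file matching `pattern` exists under persist_dir.

    Hypothetical helper: the default pattern mirrors the notebook's check and
    assumes the vector store keeps its .bin files in per-collection subfolders.
    """
    # Path.glob yields matches lazily; any() stops at the first hit
    return any(Path(persist_dir).glob(pattern))
```

In the notebook it would replace the condition as `if has_persisted_index(persist_directory_dir): ...`. Note that a directory holding only metadata (no `.bin` files yet) still counts as "no index" and triggers a rebuild, exactly as the original check does.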

In [4]:
import os
import re
from glob import glob  # same function import as cell [2], so `glob` is not rebound to the module

def parse_questions_answers_with_regex_file(file):
    """
    Parse question-answer pairs from a single XML-like text file.

    Returns an empty list (and prints the error) if the file cannot be read
    or its <question> and <answer> counts do not match.
    """
    question_pattern = re.compile(r"<question>(.*?)</question>", re.DOTALL)
    answer_pattern = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

    try:
        with open(file, "r", encoding="utf-8") as f:
            content = f.read()

        # Find all questions and answers in the file
        questions = question_pattern.findall(content)
        answers = answer_pattern.findall(content)

        # Pairs are matched by position, so the counts must agree
        if len(questions) != len(answers):
            raise ValueError(
                f"{len(questions)} questions but {len(answers)} answers"
            )

        # Pair questions and answers, stripping surrounding whitespace
        return list(zip(map(str.strip, questions), map(str.strip, answers)))
    except (OSError, ValueError) as e:  # UnicodeDecodeError is a ValueError
        print(f"Error processing file {file}: {e}")
        return []

def parse_questions_answers_with_regex(folder_path):
    """
    Parse question-answer pairs from XML-like text files using regex.

    Args:
        folder_path (str): Path to the folder containing XML-like text files.

    Returns:
        list of tuples: Each tuple contains a question and its corresponding answer.
    """
    # List all text files in the folder, sorted for a deterministic order
    files = sorted(glob(os.path.join(folder_path, "*.txt")))
    qa_list = []

    for file in files:
        qa_list.extend(parse_questions_answers_with_regex_file(file))

    return qa_list


# Parse the question-answer pairs
questions_answers = parse_questions_answers_with_regex("data/evaluation")

len(questions_answers)


45
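Judging from the regexes, each evaluation file wraps a pair in `<question>...</question>` and `<answer>...</answer>` tags, and pairing is purely positional. A string-level variant makes that behavior easy to check without files; `parse_qa_text` and the sample text are hypothetical, written only to illustrate the parsing rules:

```python
import re

# Non-greedy (.*?) stops at the first closing tag; DOTALL lets '.' match newlines
QUESTION_RE = re.compile(r"<question>(.*?)</question>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_qa_text(content: str) -> list[tuple[str, str]]:
    """Pair <question>/<answer> blocks found in `content`, in document order."""
    questions = QUESTION_RE.findall(content)
    answers = ANSWER_RE.findall(content)
    if len(questions) != len(answers):
        # Positional pairing is only meaningful when the counts match
        raise ValueError(f"{len(questions)} questions but {len(answers)} answers")
    return [(q.strip(), a.strip()) for q, a in zip(questions, answers)]


sample = """
<question>What is tested?</question>
<answer>
  The parser.
</answer>
<question>Multi-line
question?</question><answer>Yes.</answer>
"""
pairs = parse_qa_text(sample)
```

The count check catches a missing tag, which would otherwise shift every later answer onto the wrong question, but it cannot detect a file where the counts happen to match while the order is wrong.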

In [5]:
questions_answers[:2]

[('How did specialization in crafts like tanning, weaving, and pottery contribute to economic diversity within the Hawsa region?',
  'The specialization of certain groups in specific crafts allowed for a more diverse range of products, from textiles and jewelry to agricultural surplus and artisanal goods, which supported both local consumption and trade.'),
 ('What role did the Kebbi and Zamfara regions play in the Hawsa economy based on the information provided?',
  'The Kebbi region focused on tanning, weaving, and pottery, while the Zamfara region specialized in silver jewelry and pottery. Both areas contributed to the regional economy through their unique crafts and products.')]

In [6]:
response = rag.query(questions_answers[0][0])
print(response["answer"])

Specialization in crafts such as tanning, weaving, and pottery contributed significantly to economic diversity within the Hawsa region by offering a range of products that could be traded. These crafts provided raw materials for other industries like clothing production, which was highly valued due to its quality. The variety of objects produced allowed the Hawsa region to engage in extensive trade networks with both local and international markets.

The specialization in these crafts also created distinct economic niches within the community. For instance, tanning contributed to leather goods manufacturing, weaving facilitated textile production, and pottery was integral to household items and possibly decorative arts. Each craft not only provided a means of livelihood but also allowed for the creation of specialized skills that could be traded or sold.

Moreover, these crafts often required specific resources like hides, fibers, and clay, which were available locally due to specializ

In [7]:
print(questions_answers[0][1])

The specialization of certain groups in specific crafts allowed for a more diverse range of products, from textiles and jewelry to agricultural surplus and artisanal goods, which supported both local consumption and trade.


In [8]:
import random

query, correct = random.choice(questions_answers)
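`random.choice` draws from the global, unseeded generator, so each run spot-checks a different question. For repeatable comparisons, one option is a private `random.Random` instance with a fixed seed; the helper name `sample_pairs` and the default seed are illustrative assumptions:

```python
import random


def sample_pairs(pairs, k, seed=0):
    """Draw k distinct (question, answer) pairs reproducibly.

    Uses its own random.Random instance, so the global RNG state is untouched
    and the same seed always yields the same selection for the same input list.
    """
    rng = random.Random(seed)
    return rng.sample(pairs, k)
```

For example, `for query, correct in sample_pairs(questions_answers, 3): ...` would revisit the same three questions on every run, which makes it easier to tell whether a change to the pipeline actually helped.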

In [9]:
response = rag.query(query)
print(response["answer"])

The nature of African states changed significantly over time in several key ways:

1. Complexity and Diversity: Initially, African societies were often seen as simple chiefdoms or tribal groups. However, research has revealed a much more complex and diverse range of political structures, including kingdoms, empires, city-states, and federations.

2. State Formation: Over centuries, many pre-state-level societies evolved into states with defined territories, bureaucracies, and ruling elites. This process was often gradual rather than sudden.

3. Centralization vs Decentralization: Some African states became highly centralized, with strong monarchs or councils exercising control over large areas. Others maintained more decentralized structures, with local chiefs or councils holding significant power.

4. Military Organization: States developed sophisticated military organizations to defend their territories and expand influence. This included standing armies, professional soldiers, and o

In [10]:
print(correct)

States became more hierarchical with central authority replacing traditional clan or community leadership. New forms of governance emerged, including makhzen maghrébins, mansaya, and emirates, reflecting changes in power dynamics.


In [11]:
query

'In what ways did the nature of African states change over time?'

In [12]:
query, correct = random.choice(questions_answers)

response = rag.query(query)
print(response["answer"])

Based on the information provided, Undi's control over Nsenga territories likely manifested in practice through a system of tribute and military enforcement. The passage states that "Undi was able to exact tribute from the Nsenga, who were not his subjects," suggesting he maintained influence or authority without formal sovereignty. Additionally, it mentions that "he had to use force to keep them under control," indicating that Undi likely used military means to enforce compliance and maintain his power over these territories.

The text also notes that "Undi was able to exact tribute from the Nsenga, who were not his subjects," which implies a form of indirect rule or control. This suggests that Undi maintained influence over the Nsenga by collecting tributes without having formal authority over them, and likely used force to ensure their compliance.

In summary, Undi's control was characterized by the use of tribute collection as well as military enforcement to maintain his power in t

In [13]:
print(correct)

Undi may have maintained influence through a combination of political subjugation, economic leverage (through trade and famine relief), and cultural integration by adopting the Nsenga system of chieftaincies adapted to their own customs.


In [14]:
print(query)

How did Undi's control over Nsenga territories likely manifest in practice?


In [15]:
from src.utilities.llm_models import get_llm_model_embedding

embedder = get_llm_model_embedding()
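The rest of the notebook only relies on `embedder.embed_documents(texts)` returning one vector per input text (the same method name as LangChain's `Embeddings` interface; what `get_llm_model_embedding` returns internally is not shown here). To exercise the similarity code offline, a hypothetical stand-in with that method can be swapped in. It is a hashing bag-of-words counter, not a semantic model:

```python
import hashlib
import re


class HashingEmbedder:
    """Hypothetical offline stand-in exposing embed_documents(texts) -> vectors.

    Each text becomes a token-count vector of length `dim`; tokens are assigned
    to buckets with a stable hash. Useful for testing metric code, not for
    measuring meaning.
    """

    def __init__(self, dim: int = 256):
        self.dim = dim

    def _bucket(self, token: str) -> int:
        # md5 is stable across runs, unlike the salted built-in hash()
        return int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % self.dim

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in re.findall(r"\w+", text.lower()):
                vec[self._bucket(token)] += 1.0
            vectors.append(vec)
        return vectors
```

Because `cosine_similarity` reads the global `embedder`, assigning `embedder = HashingEmbedder()` would route it through this stand-in. An empty string maps to the zero vector, which is a reminder that cosine similarity needs a guard against zero norms.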

In [16]:
# Embed the generated answer and the reference answer (one vector per text)
embeddings = embedder.embed_documents([
    response["answer"],
    correct
])
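The `embeddings` list computed here already holds both vectors, so the similarity can be scored without a second embedding call. A minimal NumPy sketch, assuming the vectors are plain lists of floats; the function name `cosine_from_vectors` and the return-0.0-on-zero-norm convention are choices made for this example:

```python
import numpy as np


def cosine_from_vectors(a, b) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # similarity is undefined here; report 0 by convention
    return float(np.dot(a, b) / denom)
```

Calling `cosine_from_vectors(embeddings[0], embeddings[1])` should match the value from the embedding-based function below, provided the embedding model returns the same vectors for the same text (cosine is symmetric, so argument order does not matter).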

In [17]:
import numpy as np

def cosine_similarity(expected, proposed):
    """Embed two texts with the global `embedder` and return the cosine of their vectors."""
    # Encode both texts to embeddings in a single batch call
    embeddings = np.array(embedder.embed_documents([
        expected,
        proposed
    ]))

    # Compute the dot product and norms
    dot_product = np.dot(embeddings[0], embeddings[1])
    norm_expected = np.linalg.norm(embeddings[0])
    norm_proposed = np.linalg.norm(embeddings[1])

    # Cosine similarity ranges from -1 to 1; higher means more similar
    similarity = dot_product / (norm_expected * norm_proposed)
    return float(similarity)

cosine_similarity(
    correct,
    response["answer"],
)

0.6740310174030234
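A single score such as 0.674 for one randomly chosen question is hard to interpret on its own. One way to get a fuller picture is to score all 45 pairs and summarize the distribution. The sketch below takes the answer and scoring steps as callables so it stays independent of the RAG code; `evaluate_pairs` and the summary keys are hypothetical:

```python
from statistics import mean, median
from typing import Callable, Sequence


def evaluate_pairs(
    pairs: Sequence[tuple[str, str]],
    answer_fn: Callable[[str], str],
    score_fn: Callable[[str, str], float],
) -> dict:
    """Score every (question, reference) pair and summarize the results.

    answer_fn maps a question to a generated answer;
    score_fn(reference, generated) returns a similarity score.
    """
    if not pairs:
        raise ValueError("no question-answer pairs to evaluate")
    scores = []
    for question, reference in pairs:
        generated = answer_fn(question)
        scores.append(score_fn(reference, generated))
    return {
        "n": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "min": min(scores),
        "scores": scores,  # per-pair scores, in the same order as `pairs`
    }
```

In this notebook the call would presumably be `evaluate_pairs(questions_answers, lambda q: rag.query(q)["answer"], cosine_similarity)`, which issues one RAG query and one embedding call per pair. Keeping the per-pair scores makes it possible to inspect the lowest-scoring questions rather than only the average.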