In [1]:
%cd ..

/home/eak/Documents/AI/LLMChat/medivocate


In [2]:
import os
from glob import glob
from src.rag_pipeline.rag_system import RAGSystem

In [3]:
docs_dir="data/docs"
persist_directory_dir="data/chroma_db"
batch_size=64

# Initialize RAG system
rag = RAGSystem(
    docs_dir,
    persist_directory_dir,
    batch_size
)

if len(glob(os.path.join(persist_directory_dir, "*/*.bin"))):
    rag.initialize_vector_store() # vector store initialized
else:
    # Load and index documents
    documents = rag.load_documents()
    rag.initialize_vector_store(documents) # documents

In [4]:
from datasets import load_dataset

ds = load_dataset("alexneakameni/qa_africa", split="train")

ds

Dataset({
    features: ['question_number', 'question_text', 'answer_choices', 'correct_answers', 'explanation', 'A', 'B', 'C', 'D', 'E'],
    num_rows: 89782
})

In [5]:
ds[0]

{'question_number': 'c612edc8-f835-4a75-865e-aad8cfd64cb0_1',
 'question_text': 'Quels étaient les principaux objectifs de la politique élaborée par Gallieni en 1897 ?',
 'answer_choices': {'A': "Favoriser l'autonomie politique des grandes régions de l'île et isoler le pouvoir centralisé merina",
  'B': "Tout simplement profiter de l'autonomie pour effectuer la colonisation",
  'C': "Isoliser l'ennemi principal et profiter de son autonomie pour effectuer la colonisation",
  'D': None,
  'E': None},
 'correct_answers': ['C'],
 'explanation': "Gallieni répondait à un «\u2009triple objectif\u2009: isoler et réduire l'ennemi principal, le pouvoir centralisé merina\u2009; favoriser contre lui l'autonomie politique des grandes régions de l'île, selon le principe ‘diviser pour régner’\u2009; profiter de cette autonomie pour effectuer la colonisation aux moindres frais. ",
 'A': "Favoriser l'autonomie politique des grandes régions de l'île et isoler le pouvoir centralisé merina",
 'B': "Tout s

In [6]:
responses = "\n".join(
    [k + ". " + j for k, j in ds[0]["answer_choices"].items() if j is not None]
)

query = f"""
{ds[0]["question_text"]}
{responses}
"""

print(query)


Quels étaient les principaux objectifs de la politique élaborée par Gallieni en 1897 ?
A. Favoriser l'autonomie politique des grandes régions de l'île et isoler le pouvoir centralisé merina
B. Tout simplement profiter de l'autonomie pour effectuer la colonisation
C. Isoliser l'ennemi principal et profiter de son autonomie pour effectuer la colonisation



In [7]:
response = rag.query(query)
print(response["answer"])

Il n'y a pas de réponse directement associée dans le texte que vous avez fourni. Le texte discute plutôt des aspects historiques et économiques du commerce transatlantique d'esclaves, ainsi que des tendances de dépendance et de sous-développement dans certaines régions. Il n'y a aucune information sur la politique de Gallieni en 1897. Les options A, B et C ne sont pas soutenues par le texte fourni.


In [8]:
def get_question_answer(data: dict):
    responses = "\n".join(
        [k + ". " + j for k, j in data["answer_choices"].items() if j is not None]
    )

    query = f"""
    {data["question_text"]}
    {responses}
    """
    return query, data["correct_answers"], data["explanation"]

In [9]:
import random

data = random.choice(ds)
query, correct, explanation = get_question_answer(data)

In [10]:
response = rag.query(query)
print(response["answer"])

Je n'ai pas de contexte pertinent dans les informations fournies pour répondre à cette question sur la capacité crânienne des types d'Homo. Les informations fournies concernent plutôt l'art africain et la traite négrière, sans aucune référence aux capacités crâniennes des différentes espèces d'Homo.


In [11]:
correct

['A']

In [12]:
print(query)


    Quelle est la capacité crânienne plus élevée connue pour le type Homo primitive
    A. Homo erectus
B. Homo habilis
C. Homo neanderthalensis
    


In [13]:
query, correct, explanation = get_question_answer(random.choice(ds))

response = rag.query(query)
print(response["answer"])

C. That Africa was not a continent with historical significance

This answer is inferred from the context provided, which mentions that Hegel's Philosophy of History did not consider Africa as an entity with its own history and destiny, suggesting that this was seen as a primary issue. The other options do not align with the given context.


In [14]:
correct

['C']

In [15]:
explanation

"Hegel's Philosophy of History has been criticized for its portrayal of Africa as not having a significant or valuable history."

In [16]:
print(query)


    What is the primary issue with Hegel's Philosophy of History regarding Africa?
    A. That Europeans should prioritize studying their own history over the history of other continents
B. That Africa had a unique and valuable history that needed to be studied
C. That Africa was not a continent with historical significance
    


In [17]:
query, correct, explanation = get_question_answer(random.choice(ds))

response = rag.query(query)
print(response["answer"])

A. Gold and other precious minerals

The passage indicates that in the 19th century, miners primarily relied on gold and other precious minerals as their primary source of income. While it mentions trade at the Salaga market and riverbank crops, these were not the main sources of income for miners during this period.


In [18]:
correct

['A']

In [19]:
explanation

'Miners during the 19th century primarily worked for gold and other precious minerals, which were in high demand. The Salaga market was a major trading hub where merchants bought and sold various goods.'

In [20]:
query = "What were the main objectives of the first phase (1965-1969) of UNESCO’s work on African history?"
answer = """
The main objectives of the first phase were:

Documentation and planning for the project.
Conducting operational activities such as collecting oral traditions, creating regional documentation centers for oral tradition, and collecting unpublished manuscripts in Arabic and "ajami" (African languages written in Arabic script).
Inventorying archives and preparing a Guide des sources de l’histoire de l’Afrique, which was based on European archives and libraries and published in nine volumes.
Organizing meetings where African and international specialists discussed methodology and outlined the project after examining the available sources.
"""

response = rag.query(query)
print(response["answer"])

The main objective of the first phase (1965-1969) of UNESCO's work on African history was to conduct documentation and planning for the publication. This involved conducting field activities such as collecting oral tradition, establishing regional centers for oral tradition documentation, gathering unpublished manuscripts in Arabic and "ajami" (African scripts written in Arabic), inventorying archives, and preparing a Guide for sources of African history based on archives and libraries from European countries published over nine volumes.
