#### RAG with LangChain's RetrievalQA and LCEL (LangChain Expression Language) chains
After experimenting with a basic RAG chain in the v2 notebooks, this v3 notebook tries two other chains:
* RetrievalQA: a legacy LangChain chain, used here with the `refine` chain type, which improves an initial answer by iterating successively over the documents returned by the retriever
* LCEL: the more recent approach recommended by the LangChain documentation, lighter and more flexible

Below is a comparison of the three methods:
### Comparison Table

| Feature           | Method 1: Basic RAG Chain                  | Method 2: RetrievalQA Chain                | Method 3: LCEL RAG Chain                   |
|-------------------|-------------------------------------------|--------------------------------------------|-------------------------------------------|
| Structure         | Linear pipeline                           | Abstracted, built-in class                 | Programmatic, flexible pipeline           |
| Customization     | Low                                       | Medium                                     | High                                      |
| Ease of Use       | Easy                                      | Very easy                                  | Moderate (requires LCEL knowledge)        |
| Output Control    | Basic (string output)                     | Can return source documents                | Flexible (customizable output)            |
| Use Case          | Simple RAG tasks                          | Quick, out-of-the-box RAG                  | Advanced, complex RAG workflows           |

### Which Method to Use?

- **Method 1:** Use this for simple RAG tasks where you don’t need much customization.
- **Method 2:** Use this for quick implementations or when you want a pre-built solution with minimal setup.
- **Method 3:** Use this for advanced use cases where you need full control over the pipeline or want to implement complex logic.
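
To give a feel for what "programmatic, flexible pipeline" means for Method 3, here is a toy sketch of pipe-style composition. It is not LangChain's implementation: the `Step` class and the `fake_retriever`, `fake_prompt`, and `fake_llm` stand-ins are hypothetical names invented for illustration. It only mimics the idea that LCEL components are chained with the `|` operator and run with `invoke`.

```python
# Toy illustration of pipe-style composition; NOT LangChain's implementation.
from typing import Any, Callable


class Step:
    """Wraps a function so that steps can be chained with the `|` operator."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # (a | b).invoke(x) is equivalent to b.invoke(a.invoke(x))
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Hypothetical stand-ins for a retriever, a prompt template, and an LLM.
fake_retriever = Step(lambda q: {"question": q, "context": ["doc A", "doc B"]})
fake_prompt = Step(
    lambda d: f"Context: {' | '.join(d['context'])}\nQuestion: {d['question']}"
)
fake_llm = Step(lambda p: f"ANSWER based on <{p.splitlines()[0]}>")

# Each stage can be swapped or extended independently of the others.
toy_chain = fake_retriever | fake_prompt | fake_llm
toy_answer = toy_chain.invoke("What is the project about?")
print(toy_answer)
```

Because every stage is an ordinary object, inserting an extra step (for example a document formatter between the retriever and the prompt) only changes the composition line.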



### Libraries

In [5]:
# ! pip install langchain_community tiktoken langchain-openai langchain-ollama langchainhub chromadb faiss-cpu langchain pypdf python-dotenv

Environment variables

In [1]:
import dotenv

dotenv.load_dotenv("/home/chougar/Documents/GitHub/Formation_datascientest/DL-NLP/.env")

True

## Library imports

In [None]:
# Libraries and model setup
import os
from time import time as timing

from langchain.chains import RetrievalQA
from langchain.retrievers import SelfQueryRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Document loaders and vector stores are provided by langchain_community
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma, FAISS
from langchain_ollama.llms import OllamaLLM
# OpenAIEmbeddings is imported once, from langchain_openai
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


#### LLM and embedding model to use


In [3]:
# model_name="qwen2.5:3b"
# model_name="mistral:latest"
# model_name="deepseek-r1:8b"
# model_qa_name="hf.co/bartowski/Falcon3-7B-Instruct-GGUF:Q4_0"
# model_qa_alias="falcon3-7b-mamba"
# llm_qa = OllamaLLM(model=model_qa_name, temperature=0.2)

model_qa_name="gpt-4o-mini"
model_qa_alias="gpt-4-mini"
llm_qa = ChatOpenAI(model_name=model_qa_name, temperature=0.2)



# Embeddings model definition
# model_emb_name='text-embedding-ada-002'
model_emb_name="text-embedding-3-small"
embedding_model = OpenAIEmbeddings(model=model_emb_name)


#### Evaluators
1. LLM

In [4]:
llm_evaluator = ChatOpenAI(model_name="gpt-4o", temperature=0.2)

2. Specialized model:
https://huggingface.co/lytang/MiniCheck-Flan-T5-Large

In [None]:
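import os

# Hide all GPUs so that MiniCheck runs on CPU. The variable is set before the
# library (and torch) is imported; an empty string means "no visible device".
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from minicheck.minicheck import MiniCheck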

loaded_models={"specialized_evaluator": False}

doc = "A group of students gather in the school library to study for their upcoming final exams."
claim_1 = "The students are preparing for an examination."
claim_2 = "The students are on vacation."

# model_name can be one of ['roberta-large', 'deberta-v3-large', 'flan-t5-large', 'Bespoke-MiniCheck-7B']
specialized_evaluator = MiniCheck(model_name='flan-t5-large', cache_dir='./ckpts')
loaded_models["specialized_evaluator"]=True

pred_label, raw_prob, _, _ = specialized_evaluator.score(docs=[doc, doc], claims=[claim_1, claim_2])
# pred_label, raw_prob, _, _ = scorer.score(docs=[doc], claims=[claim_2])

print(pred_label) # [1, 0]
print(raw_prob)   # [0.9805923700332642, 0.007121307775378227]



[1, 0]
[0.9805923700332642, 0.007121290545910597]





In [43]:


# def purge_vram(specialized_evaluator=specialized_evaluator):
#     import torch
#     torch.cuda.empty_cache()

#     if loaded_models["specialized_evaluator"]:
#         del specialized_evaluator
#         specialized_evaluator=None
#         loaded_models["specialized_evaluator"]=False
#         print("specialized_evaluator deleted")
        
# # purge_vram(specialized_evaluator)

#### Step 2: Load the project document (PDF)

In [9]:
pdf_file_path = './data/PROJECT DOCUMENT MAHAKAM 2023-2025_balise.pdf'

# Create a loader instance
loader = PyPDFLoader(pdf_file_path)

# Load the data from the PDF
data = loader.load()

#### Step 3: Split the PDF into chunks
Based on the experiments in the `rag-v2-gridSearch` notebook, the best embedding model and chunk size/overlap are `text-embedding-ada-002` and 2000 / 400
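
To make the size/overlap pair concrete, here is a minimal sketch of fixed-window chunking. It is a simplification: `RecursiveCharacterTextSplitter` prefers to cut on separators (paragraphs, lines, spaces), so its chunks are usually shorter than the maximum. The helper name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that share
    `chunk_overlap` characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between two window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

In this simplified model, a size of 2000 with an overlap of 400 means each window starts 1600 characters after the previous one, so a sentence cut at a boundary still appears whole in one of the two neighbouring chunks as long as it is shorter than the overlap.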


In [10]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=400)

docs = text_splitter.split_documents(data)

In [11]:
# Cleanup: remove recurring noise strings from every chunk
bruits=["Planète Urgence | FOREST Programme"]
for doc in docs:
    for bruit in bruits:
        if bruit in doc.page_content:
            doc.page_content=doc.page_content.replace(bruit, "")

#### Step 4: Create the vector stores (Chroma and FAISS)

In [None]:
db_path = "./"

chroma_db = Chroma.from_documents(docs, embedding_model, persist_directory=db_path,)

faiss_db = FAISS.from_documents(docs, embedding_model,)



#### Step 5: Retrieve the relevant documents (RETRIEVAL phase)
Based on the experiments in the `rag-v2-gridSearch` notebook, the best k is 12 and the best search method is `mmr`
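
For readers unfamiliar with `mmr` (maximal marginal relevance), the sketch below shows the idea in plain Python: fetch the `fetch_k` chunks closest to the query, then greedily keep `k` of them, trading relevance against redundancy with the chunks already kept. This is an illustrative re-implementation under stated assumptions, not the code the vector stores run: the function names, the 2-D toy vectors, and the `lambda_mult=0.5` weight are all assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, fetch_k, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    # Step 1: keep the fetch_k documents most similar to the query.
    by_relevance = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_relevance[:fetch_k]
    selected = []
    # Step 2: greedily add the candidate with the best relevance/novelty trade-off.
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


def unit(deg: float) -> list[float]:
    """Toy 2-D embedding: a unit vector at the given angle in degrees."""
    r = math.radians(deg)
    return [math.cos(r), math.sin(r)]


query_vec = unit(0)
doc_vecs = [unit(10), unit(12), unit(-40), unit(90)]  # docs 0 and 1 are near-duplicates
picked = mmr_select(query_vec, doc_vecs, k=2, fetch_k=3)
print(picked)
```

With these toy vectors, a plain similarity search would return the two near-duplicates (indices 0 and 1), whereas MMR keeps index 0 and then prefers the less redundant index 2.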

In [19]:
# Create the retrievers
# retriever = chroma_db.as_retriever(k=12, search_type="mmr")

retriever_chroma = chroma_db.as_retriever(
    search_type="mmr",
    # MMR fetches the fetch_k=10 closest chunks, then keeps k=5 of them
    search_kwargs={"k": 5, "fetch_k": 10},
)

retriever_faiss = faiss_db.as_retriever(
    search_type="mmr",
    # Same settings as the Chroma retriever so that timings are comparable
    search_kwargs={"k": 5, "fetch_k": 10},
)


# Test
query = "What is the topic of the PDF?"

t=timing()
result = retriever_chroma.invoke(query)
print(f"t1 :{timing()-t}")
print(f"chroma_db: {result}")

t=timing()
result = retriever_faiss.invoke(query)
print(f"t2 :{timing()-t}")
print(f"faiss: {result}")

t1 :0.7505199909210205
chroma_db: [Document(metadata={'page': 5, 'source': './data/PROJECT DOCUMENT MAHAKAM 2023-2025_balise.pdf'}, page_content='(sustainable ponds and sylvo fishery models) 22\nActivity 3.2. Train community groups particularly women group on the \nfinancial management and product marketing. 22\nOUTPUT 4. THE SUSTAINABLE COASTAL GOVERNANCE IN DELTA MAHAKAM AND ADANG BAY ARE \nSTRENGTHENED 23\nActivity 4.1. Facilitate coordination meeting among Mahakam Delta and \nAdang Bay’s actors to synergize the program 23\n7. PARTNERSHIPS 26\n8. CROSS-CUTTING APPROACHES 29\n9. KNOWLEDGE MANAGEMENT, CAPITALIZATION & COMMUNICATION 31\n10. SUSTAINABILITY, SCALING-UP AND/OR EXIT STRATEGY 36\n11. RESULTS FRAMEWORK 38\n12. MONITORING & EVALUATION 42\n13. TIMELINE 44'), Document(metadata={'page': 5, 'source': './data/PROJECT DOCUMENT MAHAKAM 2023-2025_balise.pdf'}, page_content='(sustainable ponds and sylvo fishery models) 22\nActivity 3.2. Train community groups particularly women group 

In [20]:
# Evaluate the RAG reply with an LLM
def score_reference_vs_rag_with_gpt(question, reference_text, ragReply, llm=llm_evaluator):
    # prompt=f"""
    #     Help me to compare a RAG answer against a reference text to a given question

    #     Question:\n{question}

    #     Reference text:\n{reference_text} 

    #     RAG anwser:\n{ragReply}

    #     Provide a score for 1 to 10 to evaluate the quality of the RAG answer against the reference text (coherence, coverage, clarity ..)
    #     When the RAG answer states it does have enough information to reply entierly to the question, don't penalize it in your evaluation.
    #     Respond only with the score (1,2 ... 10)
    # """

    prompt=f"""
        Evaluate the quality of a Retrieval-Augmented Generation (RAG) answer by comparing it against a reference text for a given question.  

        ### Inputs:  
        - **Question:** {question}  
        - **Reference Text:** {reference_text}  
        - **RAG Answer:** {ragReply}  

        ### Evaluation Criteria:  
        Rate the RAG answer on a scale of **1 to 10** based on the following aspects:  
        1. **Coherence:** How logically structured and consistent the answer is.  
        2. **Coverage:** How well the answer addresses key points found in the reference text.  
        3. **Clarity:** How clearly the answer conveys information.  

        **Important Consideration:**  
        - If the RAG answer explicitly states that it lacks sufficient information to fully answer the question, do **not** penalize it.  

        ### Output Format:  
        Respond **only** with a single number between **1 and 10** (e.g., 1, 2, ..., 10).  

    """

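    resp = llm.invoke(prompt)

    # Return the score as an int when the model complies with the output format;
    # otherwise return the raw reply so that it can be inspected
    try:
        return int(str(resp.content).strip())
    except ValueError:
        return resp.content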
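The evaluator is asked to reply with a single number, but LLMs sometimes wrap it in extra text ("Score: 7/10", "**8**"). A possible hardening, sketched below with a hypothetical helper `parse_score` that is not part of the original notebook, is to extract the first standalone integer between 1 and 10 and to return `None` otherwise. Decimal replies such as "8.5" are deliberately rejected rather than truncated.

```python
import re

# A standalone integer from 1 to 10: not glued to other digits, and not part
# of a decimal number such as "8.5" or "3,5".
_SCORE_RE = re.compile(r"(?<![\d.,])(10|[1-9])(?![\d.,]*\d)")


def parse_score(raw: str):
    """Return the first integer score in [1, 10] found in `raw`, else None."""
    match = _SCORE_RE.search(raw)
    return int(match.group(1)) if match else None


for reply in ["8", "Score: 7/10", "**10**", "8.5", "I cannot score this"]:
    print(repr(reply), "->", parse_score(reply))
```

Returning `None` instead of the raw text keeps the score column numeric, at the cost of losing the evaluator's explanation when it does not follow the format.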





#### Step 6: RAG with the legacy RetrievalQA method

Classic `RetrievalQA` method
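
Conceptually, the `refine` strategy answers from the first retrieved document, then asks the model to revise that answer once per remaining document. The sketch below illustrates that loop only: it is not LangChain's code, the prompts are invented, and `refine_answer` and `counting_llm` are hypothetical names, with `counting_llm` standing in for a real model.

```python
from typing import Callable


def refine_answer(question: str, documents: list[str], llm: Callable[[str], str]) -> str:
    """Build an answer from the first document, then refine it with each next one."""
    if not documents:
        raise ValueError("at least one document is required")
    # Initial answer from the first document only.
    answer = llm(
        f"Context: {documents[0]}\nAnswer the question using this context.\n"
        f"Question: {question}"
    )
    # One extra LLM call per remaining document, each seeing the current answer.
    for doc in documents[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {doc}\n"
            f"Refine the existing answer only if the new context helps.\n"
            f"Question: {question}"
        )
    return answer


calls = []


def counting_llm(prompt: str) -> str:
    """Stand-in for a real LLM: records the prompt and returns a numbered draft."""
    calls.append(prompt)
    return f"draft-{len(calls)}"


final = refine_answer(
    "What is the project about?", ["chunk 1", "chunk 2", "chunk 3"], counting_llm
)
print(final, len(calls))
```

The loop makes the cost visible: the number of sequential LLM calls grows with the number of retrieved documents, so under this model a retriever returning 5 chunks implies 5 calls per question.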

In [None]:

retrievalQA_chain = RetrievalQA.from_chain_type(
    llm=llm_qa,
    chain_type="refine",
    retriever=retriever_chroma,
    return_source_documents=True
)

# Call the RAG chain with a user query
# question = "What is the main topic discussed in the PDF?"
question="Context, environment, project rationale and challenges"
response = retrievalQA_chain.invoke({"query": question})
print(response)


{'query': 'Context, environment, project rationale and challenges', 'result': "**Context:**\nThe Covid-19 pandemic has significantly impacted life in Indonesia, particularly in East Kalimantan, which has one of the highest rates of exposure to the virus outside of Java. Since early 2020, the government has imposed lockdowns and restrictions on community activities to curb the spread of the virus. As of early 2022, while community activities began to normalize, the emergence of the Omicron variant posed new challenges. In this ongoing pandemic context, it is crucial for projects aimed at environmental restoration to continue while prioritizing health and safety protocols.\n\n**Environment:**\nEast Kalimantan is home to rich biodiversity, especially within its mangrove ecosystems, which are vital for local livelihoods and community resilience against climate change. However, these ecosystems are under threat due to degradation, which not only impacts the environment but also the local ec

In [21]:
# The 'reference_answer' is copied manually from the document and serves as the reference text during evaluation

questions=[
    {
        "q_#": 1,
        "question": "Description of the project", 
        "reference_answer": """    
            Brief project description
            East Kalimantan Province in 2021/2022 received great attention nationally
            because of the moving of the state capital city (Jakarta) to a location near
            the city of Balikpapan and Penajam Paser Utara in East Kalimantan Province.
            The development of the new capital will start in 2022. Although the
            Indonesia President commit to develop the new capital as Forest and Smart
            City, the surrounding area particularly the coastal area such as Delta
            Mahakam and Adang Bay might get high pressure as the consequence of the
            new development and the movement of 1.5 million people to the new
            capital.
            Delta Mahakam, in the eastern part of East Kalimantan, is an area that is
            relatively close to the prospective center of the State capital (about 100 km).
            Mahakam Delta is naturally a mangrove habitat, but due to excessive land
            clearing for extensive aquaculture about 47.5 % of the mangrove ecosystem
            is degraded to be converted into aquaculture (2017). Despite various
            conservation efforts by different parties and the government, land clearing
            still continues. Delta Mahakam land ownership is government land that has
            designated as a production forest, but this area has been inhabited by
            residents from generation to generation.
            Adang Bay is one of the coastal villages in Adang Bay, Paser Regency, on the
            southern part of East Kalimantan Province (about 100 km from the new
            capital). This area is also experiencing land conversion to increase
            aquaculture, besides there are several locations in coastal areas that are
            affected by abrasion. Restoration activities in East Kalimantan Province are
            needed to restore a degraded environment, as well as to support the vision
            of the nation's capital as a green city.
            The ecosystem in Delta Mahakam and Adang Bay 1 are also home to
            critically endangered species, such as the nasal monkey (proboscis
            monkey), endemic to the island of Borneo. On a global scale, the mangrove
            is a key ecosystem to answer the challenge of carbon sequestration and
            fight against climate change.
            The objective of the project is therefore to contribute to restore the
            degraded mangrove forest in East Kalimantan (Delta Mahakam and Adang
            Bay) as home of endemic and endangered species including proboscis
            monkey and key ecosystem to mitigate and to adapt the impact of climate
            change; and this, through four main actions: raising awareness of the
            stakeholders, rehabilitating degraded mangrove forest, supporting the
        """
    },
    {
        "q_#": 2,
        "question": "Country and city", 
        "reference_answer": """
            The location of the project is in Paser District (Adang Bay village) and Kutai
            Kartanegara district (Delta Mahakam) East Kalimantan Province. The location
            of project is nearby the new capital of Indonesia which is in the Penajam
            Paser Utara (around 130-160 km)
        """
    },
    {
        "q_#": 3,
        "question": "Target beneficiaries", 
        "reference_answer": """
            Number of direct beneficiaries of the pilot project: 3245 people with the
            proportion of 30% women and 70% men.
            Number of indirect beneficiaries: 3000 people by assuming at least the
            project will give benefit indirectly to 1500 people per location including in
            East Kalimantan and Indonesia.
            The target groups include:
            - School children (primary schools and secondary schools)
            - Teachers (primary school teachers)
            - Community members (villagers, consists of fish farmers, women group,
            and youth)
            - Village officials
            - Stakeholders from various institutions (government institutions,
            universities, and non-government organizations)
            - Public audience in general (reached by Media)
            Other potential groups:
            - High school and university students
            - Environmental activists 
        """
    },
    {
        "q_#": 4,
        "question": "Number of people concerned", 
        "reference_answer":"""
            Number of direct beneficiaries of the pilot project: 3245 people with the
            proportion of 30% women and 70% men.
            Number of indirect beneficiaries: 3000 people by assuming at least the
            project will give benefit indirectly to 1500 people per location including in
            East Kalimantan and Indonesia.
        """ 
    },
    {
        "q_#": 5,
        "question": "Context, environment, project rationale and challenges", 
        "reference_answer": """
            Context & environment and development challenges
            Geographic and socio-economic context
            East Kalimantan is one of the richest provinces in Indonesia and the main
            contribution to the national GDP. Before palm oil and mining coal booming in
            early 2000, forestry, mining and gas sectors are the backbone of economic
            development in East Kalimantan. Because too much depending on the
            unrenewable natural resources, the economic growth of East Kalimantan
            gradually declines and, in 2016, reached the minus point because of the
            lowest price of coal at the global level. In 2019, Indonesia government has
            decided to move the capital of Indonesia from Jakarta to East Kalimantan.
            Currently, the government accelerate the infrastructure development of new
            capital.

            The project will be implemented in several regions of Mahakam Delta and
            Adang Bay. Mahakam Delta is located on the eastern coast of the island of
            Borneo, in East Kalimantan province, which is one of the five provinces that
            has the lowest population density in Indonesia. This province is also the main
            contributor to the national GDP, mainly for its wealth in oil and gas. It is
            nevertheless aquaculture activities which constitute the main source of
            income for the local population. About 90% of the population depend on it
            for their livelihood. As a result, 54.19% of the Mahakam Delta has been
            converted to shrimp ponds. The majority of exports from the area are made
            up of tiger shrimp and white shrimp that are farmed in the delta ponds and
            along the Paser District's shore.

            Paser District is located on the east coast of East Kalimantan Province. The
            village of Adang Bay is located in the coastal area of this district, in Adang
            Bay. This area is a conservation area managed by the Ministry of Forestry
            (KPHP). Therefore, limited economic activities are allowed in this area.
            However, since the late 1990s, massive clearing for the construction of
            aquaculture ponds has destroyed the mangrove forest in the area. According    
            to the District Pastor's Investment Agency, about 1,506 people live in Adang
            Bay. Most of them work as fishermen, fish farmers and swallowers. The
            village government and the community have made a strong commitment to
            conserve the area by adhering to the jurisdictional REDD+ approach, funded
            by the World Bank's FCPF or Forest Carbon Partnership Facility Project.
            Environmental context
            Largest archipelago in the world (more than 13,000 islands), Indonesia has
            an area of 1,905,000 km2 of which less than 50% is still covered by forests
            today, while the country is part of the 3rd largest planetary tropical
            forest zone (after the Amazon and the Congo Basin). More than half of
            Indonesian forests have disappeared since 1960. However, they are home to
            a large part of the world's biodiversity (more than 10% respectively of plant,
            mammal, reptile and bird species). Today, the country counts for 3 to 5% of
            annual global greenhouse gas emissions (among the 10 most emitting
            countries) including more than 50% due to land use, their change of land use
            and the exploitation of forests.
            Indonesia is home to almost 1/4 of the world's mangroves (20%). This
            maritime ecosystem, made up of a set of mainly woody plants (the most
            notable species being the mangrove), develops in the swinging area of the
            tides of the low coasts and in marshes at the mouth of certain rivers. Of the
            nearly 3.2 million hectares of mangrove forest 2 in the country today, more
            than 50,000 ha are lost each year.
            The mangrove is one of the most productive ecosystems on the planet,
            home to a particularly abundant biomass. The mangroves' root system is
            notably a biotope where a variety of fish and crabs live and reproduce. The
            mangrove thus provides important resources (forestry and fishery) to coastal
            populations, a natural “buffer” zone adapted to salinity, filtering sediment
            and pollution carried by rivers and the sea, and preserving the fresh water
            resources of the land. They are a food security and livelihood issue, in
            particular providing income to fishing communities. This ecosystem is also
            an important natural fount of carbon, with Indonesian mangroves storing
            around 5 times more carbon per hectare than terrestrial forests. The
            government of Indonesia has taken into account this ecosystem in its REDD+
            strategy, implemented in the only pilot province of East-Kalimantan, with the
            Provincial Council on Climate Change (DDPI) with the support of the World
            Bank in the framework of the “Forest Carbon Partnership Facility Project”.
            Finally, the mangrove plays a key role in natural defense. The complex
            network of mangrove roots can help reduce wave power, which limits erosion
            and protects coastal communities from the destructive forces of tropical
            storms. Mangroves provide protection against extreme weather events and
            tsunamis, and can adapt to rising sea levels and subsidence. They therefore    
            contribute to reducing the risk of disasters, to the resilience of communities
            and ecosystems and to their adaptation to climate change.
            In Mahakam Delta, results from a study conducted in 2018 and 2019 by the
            Kutai Kartanegara District has shown that 47.8% of mangrove forests are
            deteriorated.
            Table 1. Critical Criteria of Mahakam Delta Mangrove3
            Critical Criteria Land Area (ha) Percentage
            Damaged 7,034 5.6
            Severe 52,945 42.2
            Undamaged 65,522 52.2
            Total 125,502 100.0
            
            Source: The Result of Spatial Analysis of Mangrove Damage Level (2018)

            With the plan to move the state capital to Penajam Paser Utara District
            (PPU), the development activities to create this new big city will take place
            massively. The central government has planned to create a green city for the
            new capital, which construction will start in 2022, but various problems still
            pose challenges in locations outside the new capital. On the one hand, a
            close government center can control the surrounding environment to keep it
            conserved, but the gap in the quality of human resources and plans to move
            a large number of people from Jakarta to this area will certainly cause
            pressure on the environment.

            Biodiversity issues
            The mangrove of Mahakam Delta conceals a rich marine and arboreal
            biodiversity, characterized by a large variety of fish, arthropods, reptiles
            such as the marine crocodile (Crocodylus porosus), aquatic mammals such
            as the Irrawaddy dolphin (Orcaella brevirostris) or terrestrial like the nasal
            monkey (Nasalis larvatus), these last 2 species being considered as being
            “endangered” by the IUCN.
            The deforestation of Mahakam Delta’s mangrove hampers the effort to
            conserve this type of species, for example by fragmenting the habitat of the
            nasal monkey, whose interaction between populations strongly depends on
            the continuity of the canopy. The isolation of these populations makes them
            more vulnerable to poaching. The long-nosed monkey, endemic of Borneo
            Island is listed as “Endangered” by the IUCN as it has undergone extensive    
            population reductions across its range, and ongoing hunting and habitat
            destruction continue to threaten most populations. Numbers have
            declined by more than 50% (but probably less than 80%) over the past 3
            generations (approximately 36-40 years).4 At the scale of Mahakam Delta,
            only 2 censuses have been conducted to monitor this specie, respectively in
            1997 and 2005 which reflects the lack of resources of local institutions to
            conserve and protect this biodiversity.
            In addition, the degradation of this ecosystem leads to a decrease in fish
            stocks in the delta, threatening both fishermen and species such as sea
            crocodiles and dolphins. The situation is currently might threatening to
            exacerbate human-animal conflicts and therefore to further decrease the
            populations of the above-mentioned species, even threatening them with
            extinction. Hence, beside preventing the mangrove forest conversion into
            palm oil plantation, aquaculture ponds, and other usages, the reforestation
            activity is necessary to improve the degraded mangrove ecosystem in the
            coastal area.
            Paradoxically, the considerable modification of delta habitats resulted in a
            very substantial increase in populations of birds associated with open
            wet areas, such as egrets (100 individuals in 1987 to nearly 15,000
            individuals in 2013). Likewise, some species of heron have seen their
            population sizes increase considerably, such as the purple heron or the Javan
            pond-heron, the lesser adjutant, ducks, Sunda teal and the wandering
            whistling-duck also seem to have used the habitats created by the clearings
            to considerably increase their populations.
            The populations of these species have benefited of new feeding areas when
            the shrimp ponds were developed. Indeed, egrets, ducks, and waders use
            the shrimp ponds in high numbers on cyclical basis when shrimp ponds are
            emptied for shrimp harvesting. The presence of pristine areas, with large
            trees or dense copses of smaller species (Nypa) removed from human
            presence, is also favourable for the reproduction of these species. Here they
            find quiet conditions for reproduction or gatherings (dormitories). Amongst
            the species observed in 2013 and those not observed in 1987, eight dwell in
            an aquatic environment and directly depend on the shrimp ponds: darter,
            stilts, grey heron, black-crowned night heron, intermediate egret, western
            marsh-harrier and the Garganey. The opening of shrimp ponds was the
            obvious factor leading to the growth of all these bird populations.
            
            Institutional Context
            The key players in coastal region in East Kalimantan including in Delta
            Mahakam Ulu (Delta Mahakam) and Delta Mahakam (Adang Bay) are
            relatively similar. Since the area located or nearby the conservation area and
            forest production area, the Ministry of Forestry via Nature Conservancy
            Agency in East Kalimantan and Forest Management Unit (provincial
            government agency) are the most influence actor. They have authority to
            determine the activities which allowed and not allowed in the area. However,
            they cannot control the vast area of conservation area since 50 percent of
            mangrove forest in the region have been degraded. Besides government, the
            others key actors are fishermen, fish farmers, swallow workers and investors
            in aquaculture sectors. Those actors have shaped the landscape of coastal
            area in Delta Mahakam and Adang Bay over the past 20 years. In their hand
            the future of sustainable aquaculture is determined. Environmental and
            development NGOs, oil and gas company and other parties has programme
            in their area. Most of the programme focus on improving the livelihood of the
            local people and restoring the mangrove forest.
            
            The Movement of Indonesian New Capital
            Paser District (East Kalimantan Province) will soon be the site of Indonesia's
            new political capital, Nusantara, as part of the plan to move the country's
            capital from the island of Java to the island of Borneo, which is home to one
            of the world’s largest rainforests.
            Jakarta, the current political capital which will become the country's
            economic capital by 2045, is currently facing several environmental, climatic
            and demographic problems and challenges: overpopulation, heavy pollution,
            rising water levels, frequent flooding, etc. In order to deal with the inevitable
            future security issue, the Indonesian government has decided to build a new
            capital 2,000 km away from Jakarta, in the province of East Kalimantan,
            more precisely between the towns of Balikpapan and Samarinda. With the
            legislation for the relocation of the new capital published, the physical
            development of the new capital will begin in 2022. In August 2024, the
            President plans to celebrate Indonesia's Independence Day in the new
            capital.
            The government plans to make the new capital a "forest city" by strongly
            preserving forest areas and using sustainable energy. However, many argue
            that the development of the new capital could lead to environmental
            degradation and loss of essential biodiversity, especially in the mangrove
            forest. The majority of the Indonesian population, including the local
            population, supports the new capital movement by echoing the effect of
            equitable development. Indeed, for decades, the natural resources of
            Kalimantan Island have been exploited to support Indonesian development,
            especially that of Java Island.
            The location of the project (Delta Mahakam and Adang Bay -Adang Bay) is an
            area relatively close to the potential centre of the state capital (about 100-
            200 km).

            Environment and development challenges
            a) Aquaculture industry
            Mahakam Delta area is under pressure from both the industrial and
            agricultural sectors, including aquaculture facing a national high dynamic.
            From 2015 to 2035 it is expected a destruction of 600,000 ha of mangrove
            for shrimp farm at the national scale. The World Bank (2013) estimates a
            pressure to double cultivated shrimp production from currently 300,000 t
            (produced by 600,000 ha of ponds) to 600,000 t/1,000,000 t by 2030 to fulfil
            the demand. However, with improvements in brackish water aquaculture
            productivity, halting palm oil concession to use mangroves, along with
            maintaining other mangrove use pressures at moderate levels, the net loss
            of mangroves in the next two decades could be reduced to around 23,000 ha
            at this same scale.
            The East-Kalimantan Province is the new area to develop aquaculture ponds
            as Java, Sumatra and Sulawesi islands are facing a decrease of the
            production and the destruction of their environment due to unsustainable
            practices.
            Feature 1: Forecasted mangrove loss at six mangrove regions in Indonesia
            in the next two decades due to land use change under pessimistic scenario.
            Circle size indicates potential loss areas in Sumatra, Kalimantan, and Papua;
            as for Java, Sulawesi and Maluku potential loss areas are represented by the
            smaller circles.
            Scientific studies also show that the percentage of mangrove natural
            recovery is higher in East-Kalimantan with 1.4%/year against 0.7%/year in
            other islands in inactive ponds. This suggests to consider conservation
            activities in specific areas of Mahakam Delta. At the scale of Mahakam Delta,
            the table below for which the percentage (43.7%) is as higher as the
            remaining mangrove forest (48.5%) highlights the dominance of aquaculture.

            
            b) Demography
            The demographic issue must also be considered. Indeed, the announcement
            in 2019 of the relocation of the political and administrative capital of Jakarta
            to the province of East Kalimantan, between the cities of Balikpapan and
            Samarinda, suggests strong migrations, the development of infrastructures
            but also a growing demand for aquaculture products. By 2024, the
            Indonesian Minister of Planning hopes to transfer nearly 1.5 million public
            officials and political representatives in East Kalimantan.
            Delta Mahakam Ulu village, which belongs to Delta Mahakam district, is
            located in the northern part of the Mahakam Delta. The location of Delta
            Mahakam sub-district is close to the state-owned oil company (Pertamina),
            formerly VICO. Due to the proximity of a fairly large company, the
            community's economy is quite dynamic and the area offers a variety of jobs.
            However, the number of people who still carry out the traditional work of
            fishermen and fish farmers is still quite high, especially in the coastal areas.
            Working as a fish farmer has become one of the choices of the community as
            land is available for opening ponds. The conversion of mangrove forests into
            ponds has been going on for decades, but the production of fish and shrimp
            has decreased from time to time. Based on various studies and research,
            planting a number of mangroves in ponds can improve the soil and water
            quality in the ponds so that they can provide sustainable production. The
            farmer groups in Delta Mahakam Ulu ponds are beginning to realise the
            importance of planting mangroves in the ponds, and therefore need support
            from various parties.
            c) Other issues
            The table below represents a summary of estimation of potential loss and
            gain of mangroves in six major regions by 2035. The Kalimantan Island is the
            one to analyze in order to justify Planète Urgence and partners’ information.
            The analysis does not yet consider the movement of new capital issue which
            very likely affect the mangrove forest in East Kalimantan as well.
            
            This table highlight the multiple and complex context in which mangrove loss
            depends and confirms challenges faced in Mahakam Delta area. The lack of
            resources (financial, human resources, material) of local authorities coupled
            with a lack of transparency, coordination and communication around
            responsibilities of each actors impacts the management of mangrove forests,
            natural resources and territorial development.    
            
            Another issue that has also had a major impact on life in Indonesia, including
            East Kalimantan, is the Covid 19 global pandemic that has attacked the
            entire world since early 2020. The Covid 19 pandemic has had a major
            impact on life in Indonesia. East Kalimantan is a province outside Java Island
            with the highest rate of exposure to Covid, which has resulted in the
            government imposing a lockdown and restrictions on community activities.
            At the beginning of 2022, community activities began to return to normal,
            but a new variant emerged, namely Omicron, which spread very quickly.
            Facing a pandemic situation that has not ended, of course, the project must
            continue but still pay attention to security, safety, and practice health
            protocols.
            
            3. Strategy & theory of change
            The three years project aims to contribute to restore the degraded of
            mangrove ecosystem in Production Forest (Mahakamm Delta) and
            Conservation area (Adang Bay). In doing so, the project will address the key
            problems in those regions:
            a. Lack of awareness of local people on mangrove ecosystem,
            biodiversity issue and waste
            b. Huge area of degraded mangrove forest which affect the resilience of
            local people in facing climate change, the habitat of endangered
            species and local economy;
            c. Lack of alternative sustainable livelihood in coastal area;
            d. Poor governance particularly on mangrove ecosystem and its
            environmental and economy issue.
            To overcome those problems, Planet Urgence and its partners will work by
            implementing the PU FORET strategy which rely on three components:
            1. Restore degraded forest;
            2. Environmental awareness;
            3. Strengthening livelihood of local people.
            In addition, the involvement of local NGOs, local community and volunteer is
            key for the successful of the project and the sustainability the impact of the
            project. Therefore, PU will reinforce the capacity of those local stakeholders
            to ensure they can carry out the project activities and together achieve the
            long-term goal of the project.    
            """
    },
    {
        "q_#": 6,
        "question": "Project start date / end date", 
        "reference_answer": "March 2023 – February 2026"
    },
    {
        "q_#": 7,
        "question": """Project budget 
            Total amount of the project (in Euros) 
            
            Amount of donation requested from the Foundation (in Euros) 
            
            Detailed provisional project budget 
            
            Detailed project budget for current year
        """, 
        "reference_answer": "The total required resources is 818 341 € for the period 2023-2026"
    },                

]
 



#### Rewriting the initial question

In [22]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

"v1"
# system = """You are a question re-writer that converts an input question to a better version that is optimized \n
#      for vectorstore retrieval. Re-write the input in an interrogative, clear and concise way."""

"v2"
# system="""
#     You are a **question rewriter** that refines input questions to optimize them for **vector store retrieval**.  

#     ### **Instructions:**  
#     - Convert the input into a **clear, concise, and well-structured interrogative sentence**.  
#     - Ensure the reworded question **preserves the original intent** while improving retrieval effectiveness.  
#     - Remove ambiguity and redundant phrasing to enhance search relevance.  

#     Here is the initial question: \n\n {question} \n Formulate an improved question.

#     ### **Optimized Output:**  
#     Provide only the rewritten question, without any additional text.  

# """

"v3"
system="""
    You are a question rewriter tasked with improving input questions to optimize them for vector store retrieval. 
    Your mission is to refine, rephrase, and enhance the provided questions to ensure they are:
    * Clear and easy to understand.
    * Concise and focused.
    * Optimized for effective retrieval by removing ambiguities, unnecessary words, and redundancies.
    * Written in an interrogative form while preserving the original intent.

    #### Input Fields to Rework:
    * Project Description: Reword questions that focus on the project’s overall scope and objectives.
    * Country and City: Refine questions to specifically inquire about the project’s location.
    * Target Beneficiaries: Enhance questions to clarify the population or group that benefits from the project.
    * Number of People Concerned: Rework questions to quantify how many people the project impacts.
    * Context, Environment, Project Rationale, and Challenges: Rephrase questions that ask for background information, challenges, and the reasoning behind the project.
    * Project Start Date / End Date: Rework questions regarding the project’s timeline.
    * Financial Information:
        * Project Budget: Reword questions about the overall project budget.
        * Total Project Cost: Rephrase inquiries about the total cost of the project.
        * Donation Request Amount: Refine questions asking about the amount of funding requested.
        * Provisional Project Budget: Rework questions about the detailed provisional budget for the project.
        * Current Year Budget: Enhance questions related to the budget specific to the current year.


    #### Response Format:
    For each input question, rephrase it in a clear, concise, and interrogative form, optimized for vector store retrieval. Return only the reworked question.
"""

re_write_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        (
            "human",
            "Here is the initial question: \n\n {question} \n Formulate an improved question.",
        ),
    ]
)
llm_rewriter = ChatOpenAI(model_name="gpt-4o-mini", temperature=0.2)
question_rewriter = re_write_prompt | llm_rewriter | StrOutputParser()
# question="Context, environment, project rationale and challenges"
# question="Country and City"
question="Number of people concerned"
# question="Description of the project"
print(f"Initial question:\n{question}")
question_rewriter.invoke({"question": question})

Initial question:
Number of people concerned


'How many people are impacted by the project?'

#### New method: Runnable / LCEL / "chain"

In [None]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnableLambda


# 1. Optimized prompt

# rag_prompt = ChatPromptTemplate.from_template("""Answer the question based on the context below. 
# If the context contains enough information to provide a complete or a partial answer, try to exploit it.
# If the context doesn't contain any information related to the question, say "I don't know".

# Context: {context}

# Question: {question}

# Answer in a helpful, factual and detailed way:""")

rag_prompt = ChatPromptTemplate.from_template("""
    Answer the question based **only** on the provided context.  

    - If the context contains enough information to provide a complete or partial answer, use it to formulate a detailed and factual response.  
    - If the context lacks relevant information, respond with: "I don't know."  

    ### **Context:**  
    {context}  

    ### **Question:**  
    {question}  

    ### **Answer:**  
    Provide a clear, factual, and well-structured response based on the available context. Avoid speculation or adding external knowledge.  
""")

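# 2. Configure retriever
# `k` is read from `search_kwargs`; passed as a bare keyword it is not applied to the search
retriever = chroma_db.as_retriever(
    # search_type="mmr",  # Try different search types
    # search_kwargs={"k": 5, "fetch_k": 10}  # Retrieve more documents
    search_kwargs={"k": 6}
)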

# 3. Chain with refinement
# lcel_qa_chain = (
#     RunnableParallel({
#         "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
#         "question": RunnablePassthrough(),
#     })
#     | rag_prompt
#     | llm_qa
#     | StrOutputParser()
# )

lcel_qa_chain = (
    RunnableParallel({
        "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
        "sources": retriever,
        "question": RunnablePassthrough(),
    })
    | (lambda inputs: {
        "answer": (rag_prompt | llm_qa | StrOutputParser()).invoke(inputs),
        "sources": inputs["sources"]  
    })
)

# Usage
question="Number of people concerned"
reference_answer=[v["reference_answer"] for v in questions if v["question"]==question][0]

enhanced_question=question_rewriter.invoke({"question": question})
print("Enhanced_question:\n", enhanced_question)
print("\n----\nReference_answer:\n", reference_answer)
response = lcel_qa_chain.invoke(enhanced_question)
print("\n----\nLCEL_qa_chain answer:\n", response["answer"])
print("\n----\nLCEL_qa_chain sources:\n", response["sources"])



# eval with llm (score the answer text, not the whole response dict)
llm_score= score_reference_vs_rag_with_gpt(enhanced_question, reference_answer, response["answer"])

# eval with specialized model
t=timing()
pred_label, raw_prob, _, _ = specialized_evaluator.score(docs=[reference_answer], claims=[response["answer"]])
print(f"mini check timing: {timing()-t}")
print(f"""LLM score: {llm_score}\n""")
print(f"""specialized model score: {pred_label}, {raw_prob}""")


Enhanced_question:
 How many people are impacted by the project?

----
Reference_answer:
 
            Number of direct beneficiaries of the pilot project: 3245 people with the
            proportion of 30% women and 70% men.
            Number of indirect beneficiaries: 3000 people by assuming at least the
            project will give benefit indirectly to 1500 people per location including in
            East Kalimantan and Indonesia.
        

----
LCEL_qa_chain answer:
 The project impacts a total of 6,245 people. This includes 3,245 direct beneficiaries, with a gender proportion of 30% women and 70% men, and 3,000 indirect beneficiaries, who are assumed to benefit indirectly from the project.

----
LCEL_qa_chain sources:
 [Document(metadata={'page': 18, 'source': './data/PROJECT DOCUMENT MAHAKAM 2023-2025_balise.pdf'}, page_content='\nFeature 3. Theory of Change\n4. Beneficiaries\nNumber of direct beneficiaries of the pilot project: 3245 people with the\nproportion of 30% women a

Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.26it/s]

mini check timing: 0.7997894287109375
LLM score: 8

specialized model score: [0], [0.10934196412563324]





### Search types
The Chroma vector store offers three search types (each with its own arguments).

Below is a simple test.

In [29]:
# cosine similarity
def test_retrievers(db, question, chroma_db=chroma_db, faiss_db=faiss_db):
    if db=='chroma':
        db=chroma_db
    elif db=='faiss':
        db=faiss_db

    retriever_cosine_sim = db.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 6}
    )

    # cosine similarity, keeping only the docs whose score is above a given threshold
    retriever_sst = db.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"score_threshold": 0.6, "k": 6},
    )

    # Maximal Marginal Relevance (MMR): still based on cosine similarity; it selects the most
    # relevant docs while preserving some diversity (lambda_mult)
    retriever_mmr = db.as_retriever(
        search_type="mmr",
        search_kwargs={"fetch_k": 20, "lambda_mult": 0.5, "k": 6}
    )

    print("Cosine sim 'basic'")
    display(retriever_cosine_sim.invoke(question))
    print("\n---------\nSimilarity_score_threshold")
    display(retriever_sst.invoke(question))
    print("\n---------\nMMR")
    display(retriever_mmr.invoke(question))

# question="Number of people concerned"
# question="Country and city"
# Example with the question 'Context, environment...': I know from experience that the information is on pages 10 to 18
question="Context, environment, project rationale and challenges"
test_retrievers(db="faiss", question=question)

if question.startswith("Context"):
    print("""No retriever returns the right documents (expected pages 10 to 18)""")
elif question.startswith("Number of"):
    print("""The first two methods return only the most relevant fragments, 
    whereas MMR returns more diverse sources""")

Cosine sim 'basic'


[Document(metadata={'source': './data/PROJECT DOCUMENT MAHAKAM 2023-2025_balise.pdf', 'page': 17}, page_content='\nAnother issue that has also had a major impact on life in Indonesia, including\nEast Kalimantan, is the Covid 19 global pandemic that has attacked the\nentire world since early 2020.  The  Covid 19 pandemic has had a major\nimpact on life in Indonesia. East Kalimantan is a province outside Java Island\nwith  the  highest  rate  of  exposure  to  Covid,  which  has  resulted  in  the\ngovernment imposing a lockdown and restrictions on community activities.\nAt the beginning of 2022, community activities began to return to normal,\nbut a new variant emerged, namely Omicron, which spread very quickly.\nFacing a pandemic situation that has not ended, of course, the project must\ncontinue  but  still  pay  attention  to  security,  safety,  and  practice  health\nprotocols.\n3. Strategy & theory of change\nThe  three  years  project  aims  to  contribute  to  restore  the  degr


---------
Similarity_score_threshold


No relevant docs were retrieved using the relevance score threshold 0.6


[]


---------
MMR


[Document(metadata={'source': './data/PROJECT DOCUMENT MAHAKAM 2023-2025_balise.pdf', 'page': 17}, page_content='\nAnother issue that has also had a major impact on life in Indonesia, including\nEast Kalimantan, is the Covid 19 global pandemic that has attacked the\nentire world since early 2020.  The  Covid 19 pandemic has had a major\nimpact on life in Indonesia. East Kalimantan is a province outside Java Island\nwith  the  highest  rate  of  exposure  to  Covid,  which  has  resulted  in  the\ngovernment imposing a lockdown and restrictions on community activities.\nAt the beginning of 2022, community activities began to return to normal,\nbut a new variant emerged, namely Omicron, which spread very quickly.\nFacing a pandemic situation that has not ended, of course, the project must\ncontinue  but  still  pay  attention  to  security,  safety,  and  practice  health\nprotocols.\n3. Strategy & theory of change\nThe  three  years  project  aims  to  contribute  to  restore  the  degr

No retriever returns the right documents (expected pages 10 to 18)
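
To make the role of `lambda_mult` more concrete, here is a minimal pure-Python sketch of MMR selection. It is not the vector store's own implementation: the toy vectors and the `cosine` and `mmr_select` helpers are illustrative assumptions, and in the real retriever the candidate pool would be the `fetch_k` chunks closest to the query.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k documents chosen by Maximal Marginal Relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: highest similarity to a document that is already selected
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical 2-D "embeddings", for illustration only
query = [1.0, 0.0]
docs = [
    [0.9, 0.1],   # 0: very relevant
    [0.9, 0.12],  # 1: near-duplicate of document 0
    [0.6, -0.8],  # 2: less relevant, but different from document 0
]

relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
balanced = mmr_select(query, docs, k=2, lambda_mult=0.5)
print(relevance_only, balanced)
```

With `lambda_mult=1.0` the selection is driven by relevance alone and the near-duplicate is kept; with `lambda_mult=0.5` the second pick moves to the more distinct vector.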


#### Search grid for the two methods
Since `similarity_score_threshold` is a refined version of `similarity`, there is no need to test the latter.
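
As a rough illustration of why a threshold can empty the result set, the sketch below filters hypothetical (chunk, relevance score) pairs the way a score threshold would. The scores are invented and `filter_by_threshold` is an illustrative helper, not a library function; real scores depend on the embedding model and on how each store normalises distances, so it may be worth inspecting them (for instance with `similarity_search_with_relevance_scores`) before fixing a threshold.

```python
def filter_by_threshold(scored_docs, score_threshold, k):
    """Keep at most k (doc, score) pairs whose relevance score is >= score_threshold."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical relevance scores in [0, 1], for illustration only
scored = [("chunk A", 0.55), ("chunk B", 0.41), ("chunk C", 0.38)]

strict = filter_by_threshold(scored, score_threshold=0.6, k=6)
lenient = filter_by_threshold(scored, score_threshold=0.4, k=6)
print(strict)
print(lenient)
```

If the best score sits below the threshold, as in the `strict` case, nothing is returned and the chain has no context to answer from.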

In [32]:
from itertools import product
""" search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.7, "k": 8},

    search_type="mmr",
    search_kwargs={"fetch_k": 20, "lambda_mult": 0.5, "k": 8}
"""

search_types=["similarity_score_threshold", "mmr",]
db_types=["chroma", "faiss"]

search_type_args={
    "similarity_score_threshold": {"score_threshold": 0.7, "k": 8},

    "mmr": {"fetch_k": 30, "lambda_mult": 0.5, "k": 8}
}

gridSearch=list(product(search_types, db_types))
gridSearch

[('similarity_score_threshold', 'chroma'),
 ('similarity_score_threshold', 'faiss'),
 ('mmr', 'chroma'),
 ('mmr', 'faiss')]
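
The grid above only crosses search types with vector stores. If the search arguments themselves were to be swept as well, one possible sketch is shown below. The candidate values for `score_threshold` and `lambda_mult` are hypothetical; `k` and `fetch_k` are copied from `search_type_args`.

```python
from itertools import product

db_types = ["chroma", "faiss"]
score_thresholds = [0.3, 0.4, 0.5]  # hypothetical candidate values
lambda_mults = [0.25, 0.5, 0.75]    # hypothetical candidate values

# One dict per configuration, ready to be passed to `as_retriever`
configs = [
    {
        "search_type": "similarity_score_threshold",
        "db_type": db,
        "search_kwargs": {"score_threshold": threshold, "k": 8},
    }
    for db, threshold in product(db_types, score_thresholds)
] + [
    {
        "search_type": "mmr",
        "db_type": db,
        "search_kwargs": {"fetch_k": 30, "lambda_mult": lambda_mult, "k": 8},
    }
    for db, lambda_mult in product(db_types, lambda_mults)
]

print(len(configs))
print(configs[0])
```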

### Test with the legacy method

In [38]:
def update_retrieval_chain(search_type, search_type_args=search_type_args, llm_qa=llm_qa):
    retriever = chroma_db.as_retriever(
        search_type=search_type,
        search_kwargs=search_type_args[search_type],
    )
    
    rag_chain = RetrievalQA.from_chain_type(
        llm=llm_qa,
        chain_type="refine",
        retriever=retriever,
        return_source_documents=True
    )   

    return rag_chain
 
evaluations_retrieval_qa=[]
def retrievalQA_loop(evaluations_retrieval_qa=evaluations_retrieval_qa):
    for search_type in search_types:
        print(f"Search type: {search_type}")
        for q in questions:   
            enhanced_question= question_rewriter.invoke({"question": q["question"]})
            questions_variations=[
                {"question": q["question"], "enhanced_question": False},
                {"question": enhanced_question, "enhanced_question": True},
            ]

            for q_v in questions_variations:
                question=q_v["question"]
                retrievalQA_chain=update_retrieval_chain(search_type)
                try:
                    rag_qa = retrievalQA_chain.invoke({"query": question})
                except Exception as e:
                    # On failure, log the error and record it as the answer so the loop can continue
                    print(f"RetrievalQA failed: {e}")
                    rag_qa = {"result": str(e), "source_documents": None}
                
                

                eval_score=score_reference_vs_rag_with_gpt(q["question"], q["reference_answer"], rag_qa["result"])

                print(f"""Question: {question}""")
                print(f"""Eval_score: {eval_score}""")

                evaluations_retrieval_qa.append(
                    {
                        "question_#":q["q_#"], 
                        "question": question,  
                        "reference_answer": q["reference_answer"],
                        "rag_answer": rag_qa["result"],
                        "eval_score_llm": eval_score,
                        "enhanced_question": q_v["enhanced_question"],
                        "search_type": search_type,
                        "source_documents": rag_qa["source_documents"]
                    }
                )
                print("\n-------")
                
            print("\n--------------------")

# retrievalQA_loop()
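
Once a loop like the one above has filled its list of records, the scores can be averaged per configuration. The sketch below is one possible way to do it with the standard library only. It assumes `eval_score_llm` is numeric (it would need converting first if the scorer returns a string); the sample records are made up, and `summarise` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def summarise(evaluations):
    """Average eval_score_llm per (search_type, enhanced_question) pair."""
    groups = defaultdict(list)
    for record in evaluations:
        key = (record["search_type"], record["enhanced_question"])
        groups[key].append(record["eval_score_llm"])
    return {key: mean(scores) for key, scores in groups.items()}


# Made-up records with the same keys as the ones appended in the loop
sample = [
    {"search_type": "mmr", "enhanced_question": False, "eval_score_llm": 4},
    {"search_type": "mmr", "enhanced_question": False, "eval_score_llm": 6},
    {"search_type": "mmr", "enhanced_question": True, "eval_score_llm": 8},
]

summary = summarise(sample)
for (search_type, enhanced), score in summary.items():
    print(f"{search_type} (enhanced={enhanced}): {score}")
```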

### Test with the LCEL method

In [36]:
def update_lcel_chain(search_type, db_type, search_type_args=search_type_args, llm_qa=llm_qa, rag_prompt=rag_prompt, docs=docs, chroma_db=chroma_db, faiss_db=faiss_db):
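    if db_type=='chroma':
        test_db=chroma_db
    elif db_type=='faiss':
        test_db=faiss_db
    else:
        # Fail early instead of hitting an undefined `test_db` below
        raise ValueError(f"Unknown db_type: {db_type!r}")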

    retriever = test_db.as_retriever(
        search_type=search_type,
        search_kwargs=search_type_args[search_type],
    )
    

    lcel_qa_chain = (
        RunnableParallel({
            "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
            "sources": retriever,
            "question": RunnablePassthrough(),
        })
        | (lambda inputs: {
            "answer": (rag_prompt | llm_qa | StrOutputParser()).invoke(inputs),
            "sources": inputs["sources"]  
        })
    )

    return lcel_qa_chain
 


evaluations_lcel=[]
import torch

def retrieval_LCEL_loop(evaluations_lcel=evaluations_lcel):
    for search_type, db_type in gridSearch:
        print(f"Search type: {search_type}")
        print(f"DB type: {db_type}")
        for q in questions:   
            enhanced_question= question_rewriter.invoke({"question": q["question"]})
            questions_variations=[
                {"question": q["question"], "enhanced_question": False},
                {"question": enhanced_question, "enhanced_question": True},
            ]

            for q_v in questions_variations:
                question=q_v["question"]
                retrievalQA_chain=update_lcel_chain(search_type, db_type)
                rag_qa = retrievalQA_chain.invoke(question)
            
                

                eval_score=score_reference_vs_rag_with_gpt(q["question"], q["reference_answer"], rag_qa["answer"])

                pred_label, raw_prob, _, _ = specialized_evaluator.score(docs=[q['reference_answer']], claims=[rag_qa["answer"]])        
                # torch.cuda.empty_cache()
        

                print(f"""Question: {question}""")
                print(f"""LLM Eval_score: {eval_score}""")
                # print(f"""MiniCheck score: {raw_prob}""")

                evaluations_lcel.append(
                    {
                        "question_#":q["q_#"], 
                        "question": question,  
                        "reference_answer": q["reference_answer"],
                        "rag_answer": rag_qa["answer"],
                        "rag_sources": rag_qa["sources"],
                        "eval_score_llm": eval_score,
                        "eval_score_minicheck": raw_prob,
                        "enhanced_question": q_v["enhanced_question"],
                        "search_type": search_type,
                        "db_type": db_type
                    }
                )
                print("\n-------")
                
            print("\n--------------------")

retrieval_LCEL_loop()

Search type: similarity_score_threshold
DB type: chroma


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:03<00:00,  3.25s/it]


Question: Description of the project
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:03<00:00,  3.19s/it]


Question: What is the description of the project?
LLM Eval_score: 1

-------

--------------------


  self.vectorstore.similarity_search_with_relevance_scores(
No relevant docs were retrieved using the relevance score threshold 0.7
  self.vectorstore.similarity_search_with_relevance_scores(
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.50it/s]


Question: Country and city
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.63it/s]


Question: What is the specific country and city where the project is located?
LLM Eval_score: 3

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:01<00:00,  1.14s/it]


Question: Target beneficiaries
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.03it/s]


Question: Who are the target beneficiaries of the project?
LLM Eval_score: 1

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.87it/s]


Question: Number of people concerned
LLM Eval_score: 2

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.83it/s]


Question: How many people are impacted by the project?
LLM Eval_score: 2

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:32<00:00, 32.04s/it]


Question: Context, environment, project rationale and challenges
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:29<00:00, 29.85s/it]


Question: What are the context, environment, rationale, and challenges of the project?
LLM Eval_score: 1

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.74it/s]


Question: Project start date / end date
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.86it/s]


Question: What are the start and end dates of the project?
LLM Eval_score: 3

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.35it/s]


Question: Project budget 
            Total amount of the project (in Euros) 
            
            Amount of donation requested from the Foundation (in Euros) 
            
            Detailed provisional project budget 
            
            Detailed project budget for current year
        
LLM Eval_score: 3

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.82it/s]


Question: What is the total project budget in Euros, including the amount requested from the Foundation, the detailed provisional budget, and the budget for the current year?
LLM Eval_score: 3

-------

--------------------
Search type: similarity_score_threshold
DB type: faiss


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:03<00:00,  3.16s/it]
No relevant docs were retrieved using the relevance score threshold 0.7


Question: Description of the project
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:03<00:00,  3.30s/it]


Question: What is the description of the project?
LLM Eval_score: 1

-------

--------------------


  self.vectorstore.similarity_search_with_relevance_scores(
No relevant docs were retrieved using the relevance score threshold 0.7
  self.vectorstore.similarity_search_with_relevance_scores(
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.29it/s]
No relevant docs were retrieved using the relevance score threshold 0.7


Question: Country and city
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.30it/s]


Question: What is the specific country and city where the project is located?
LLM Eval_score: 2

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:01<00:00,  1.12s/it]


Question: Target beneficiaries
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:01<00:00,  1.02s/it]


Question: Who are the target beneficiaries of the project?
LLM Eval_score: 1

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.63it/s]


Question: Number of people concerned
LLM Eval_score: 2

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.84it/s]


Question: How many people are impacted by the project?
LLM Eval_score: 2

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:35<00:00, 35.78s/it]


Question: Context, environment, project rationale and challenges
LLM Eval_score: 1

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:35<00:00, 35.14s/it]


Question: What are the context, environment, rationale, and challenges of the project?
LLM Eval_score: 1

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.75it/s]


Question: Project start date / end date
LLM Eval_score: 2

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.62it/s]


Question: What are the start and end dates of the project?
LLM Eval_score: 3

-------

--------------------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.38it/s]


Question: Project budget 
            Total amount of the project (in Euros) 
            
            Amount of donation requested from the Foundation (in Euros) 
            
            Detailed provisional project budget 
            
            Detailed project budget for current year
        
LLM Eval_score: 3

-------


No relevant docs were retrieved using the relevance score threshold 0.7
No relevant docs were retrieved using the relevance score threshold 0.7
Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.34it/s]


Question: What is the total project budget in Euros, including the amount requested from the Foundation, the detailed provisional budget, and the budget for the current year?
LLM Eval_score: 3

-------

--------------------
Search type: mmr
DB type: chroma


Evaluating: 100%|██████████| 1/1 [00:08<00:00,  8.41s/it]


Question: Description of the project
LLM Eval_score: 6

-------


Evaluating: 100%|██████████| 1/1 [00:06<00:00,  6.27s/it]


Question: What is the description of the project?
LLM Eval_score: 6

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.07it/s]


Question: Country and city
LLM Eval_score: 5

-------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.05it/s]


Question: What is the specific country and city where the project is located?
LLM Eval_score: 8

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:01<00:00,  1.74s/it]


Question: Target beneficiaries
LLM Eval_score: 7

-------


Evaluating: 100%|██████████| 1/1 [00:01<00:00,  1.61s/it]


Question: Who are the target beneficiaries of the project?
LLM Eval_score: 8

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:01<00:00,  1.52s/it]


Question: Number of people concerned
LLM Eval_score: 7

-------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.09it/s]


Question: How many people are impacted by the project?
LLM Eval_score: 8

-------

--------------------


Evaluating: 100%|██████████| 1/1 [01:00<00:00, 60.66s/it]


Question: Context, environment, project rationale and challenges
LLM Eval_score: 6

-------


Evaluating: 100%|██████████| 1/1 [01:02<00:00, 62.33s/it]


Question: What are the context, environment, rationale, and challenges of the project?
LLM Eval_score: 6

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.33it/s]


Question: Project start date / end date
LLM Eval_score: 10

-------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.41it/s]


Question: What are the start and end dates of the project?
LLM Eval_score: 10

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.73it/s]


Question: Project budget 
            Total amount of the project (in Euros) 
            
            Amount of donation requested from the Foundation (in Euros) 
            
            Detailed provisional project budget 
            
            Detailed project budget for current year
        
LLM Eval_score: 8

-------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.31it/s]


Question: What is the total project budget in Euros, including the requested donation amount from the Foundation, the detailed provisional budget, and the budget for the current year?
LLM Eval_score: 8

-------

--------------------
Search type: mmr
DB type: faiss


Evaluating: 100%|██████████| 1/1 [00:07<00:00,  7.97s/it]


Question: Description of the project
LLM Eval_score: 6

-------


Evaluating: 100%|██████████| 1/1 [00:07<00:00,  7.08s/it]


Question: What is the description of the project?
LLM Eval_score: 7

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.07it/s]


Question: Country and city
LLM Eval_score: 7

-------


Evaluating: 100%|██████████| 1/1 [00:01<00:00,  1.12s/it]


Question: What is the specific country and city where the project is located?
LLM Eval_score: 8

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:02<00:00,  2.10s/it]


Question: Target beneficiaries
LLM Eval_score: 7

-------


Evaluating: 100%|██████████| 1/1 [00:02<00:00,  2.01s/it]


Question: Who are the target beneficiaries of the project?
LLM Eval_score: 9

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.18it/s]


Question: Number of people concerned
LLM Eval_score: 9

-------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.13it/s]


Question: How many people are impacted by the project?
LLM Eval_score: 8

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:57<00:00, 57.95s/it]


Question: Context, environment, project rationale and challenges
LLM Eval_score: 7

-------


Evaluating: 100%|██████████| 1/1 [00:56<00:00, 56.97s/it]


Question: What is the context, environment, rationale, and the challenges associated with the project?
LLM Eval_score: 7

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.33it/s]


Question: Project start date / end date
LLM Eval_score: 10

-------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  2.64it/s]


Question: What are the start and end dates of the project?
LLM Eval_score: 10

-------

--------------------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.00it/s]


Question: Project budget 
            Total amount of the project (in Euros) 
            
            Amount of donation requested from the Foundation (in Euros) 
            
            Detailed provisional project budget 
            
            Detailed project budget for current year
        
LLM Eval_score: 7

-------


Evaluating: 100%|██████████| 1/1 [00:00<00:00,  1.39it/s]

Question: What is the total project budget in Euros, including the amount requested from the Foundation, the detailed provisional budget, and the budget for the current year?
LLM Eval_score: 8

-------

--------------------





Results analysis
1. RetrievalQA

In [39]:
import pandas as pd

if len(evaluations_retrieval_qa) > 0:
    df_evaluations_retrievalQA = pd.DataFrame(evaluations_retrieval_qa)

    # (search_type, enhanced_question, label) for each configuration to report
    configs = [
        ("similarity_score_threshold", False, "'similarity_score_threshold', simple prompt"),
        ("similarity_score_threshold", True, "'similarity_score_threshold', rewritten prompt"),
        ("mmr", False, "'MMR', simple prompt"),
        ("mmr", True, "'MMR', rewritten prompt"),
    ]

    def select(search_type, enhanced):
        # Rows matching one search type and one prompt variant
        df = df_evaluations_retrievalQA
        return df[(df["enhanced_question"] == enhanced) & (df["search_type"] == search_type)]

    # Detailed tables, one per configuration
    for search_type, enhanced, label in configs:
        print(f"Table {label}:")
        display(select(search_type, enhanced))
        print("\n-----------------\n")

    # Mean LLM evaluation score per configuration
    for search_type, enhanced, label in configs:
        print(f"Mean score {label}:")
        display(select(search_type, enhanced)["eval_score_llm"].mean())



LCEL results

In [52]:
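import pandas as pd
from itertools import product  # builds the parameter grid below

# Named differently from the loop variable so the list is not overwritten
enhanced_question_options=[False, True]
params=list(product(search_types, db_types, enhanced_question_options))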

df_evaluations_lecel=pd.DataFrame(evaluations_lcel)
best_params=[]

for search_type, db_type, enhanced_question in params:
    print(f"Search_type: {search_type},\n DB type: {db_type},\n enhanced prompt: {enhanced_question}")
    
    _df=df_evaluations_lecel[(df_evaluations_lecel["search_type"]==search_type)&
        (df_evaluations_lecel["db_type"]==db_type)&
        (df_evaluations_lecel["enhanced_question"]==enhanced_question)]
    
    eval_score_llm=_df["eval_score_llm"].mean()
    print(f"Score moyen llm eval: {eval_score_llm}")
    if eval_score_llm>=5:
        display(_df)

    best_params.append((search_type, db_type, enhanced_question, eval_score_llm))
    print('\n-----------\n')



Search_type: similarity_score_threshold,
 DB type: chroma,
 enhanced prompt: False
Score moyen llm eval: 1.4285714285714286

-----------

Search_type: similarity_score_threshold,
 DB type: chroma,
 enhanced prompt: True
Score moyen llm eval: 2.0

-----------

Search_type: similarity_score_threshold,
 DB type: faiss,
 enhanced prompt: False
Score moyen llm eval: 1.5714285714285714

-----------

Search_type: similarity_score_threshold,
 DB type: faiss,
 enhanced prompt: True
Score moyen llm eval: 1.8571428571428572

-----------

Search_type: mmr,
 DB type: chroma,
 enhanced prompt: False
Score moyen llm eval: 7.0


Unnamed: 0,question_#,question,reference_answer,rag_answer,rag_sources,eval_score_llm,eval_score_minicheck,enhanced_question,search_type,db_type
28,1,Description of the project,\n Brief project description\n ...,The project is focused on sustainable developm...,[page_content='\nMain Sustainable Development ...,6,[0.063147634267807],False,mmr,chroma
30,2,Country and city,\n The location of the project is i...,"The country is Indonesia, and the city is Nusa...","[page_content='' metadata={'page': 6, 'source'...",5,[0.0427585132420063],False,mmr,chroma
32,3,Target beneficiaries,\n Number of direct beneficiaries o...,The target beneficiaries of the pilot project ...,[page_content='\nFeature 3. Theory of Change\n...,7,[0.11114823073148727],False,mmr,chroma
34,4,Number of people concerned,\n Number of direct beneficiaries o...,"The number of people concerned, based on the p...",[page_content='\nList of \nparticipants\nMoU D...,7,[0.8117417097091675],False,mmr,chroma
36,5,"Context, environment, project rationale and ch...",\n Context & environment and develo...,The context of the project is set against the ...,[page_content='\nAnother issue that has also h...,6,[0.03525557741522789],False,mmr,chroma
38,6,Project start date / end date,March 2023 – February 2026,"The project start date is March 2023, and the ...",[page_content='\n Identification and mapping ...,10,[0.25516605377197266],False,mmr,chroma
40,7,Project budget \n Total amount of t...,The total required resources is 818 341 € for ...,"The total amount of the project is €818,341. T...",[page_content='\nMain Sustainable Development ...,8,[0.31312617659568787],False,mmr,chroma



-----------

Search_type: mmr,
 DB type: chroma,
 enhanced prompt: True
Score moyen llm eval: 7.714285714285714


Unnamed: 0,question_#,question,reference_answer,rag_answer,rag_sources,eval_score_llm,eval_score_minicheck,enhanced_question,search_type,db_type
29,1,What is the description of the project?,\n Brief project description\n ...,The project is focused on raising awareness an...,[page_content='\nMain Sustainable Development ...,6,[0.7461361289024353],True,mmr,chroma
31,2,What is the specific country and city where th...,\n The location of the project is i...,"The project is located in Indonesia, specifica...",[page_content='\nMain Sustainable Development ...,8,[0.927157461643219],True,mmr,chroma
33,3,Who are the target beneficiaries of the project?,\n Number of direct beneficiaries o...,The target beneficiaries of the project includ...,[page_content='\nFeature 3. Theory of Change\n...,8,[0.8615285754203796],True,mmr,chroma
35,4,How many people are impacted by the project?,\n Number of direct beneficiaries o...,"The project impacts a total of 6,245 people. T...",[page_content='\nFeature 3. Theory of Change\n...,8,[0.13911452889442444],True,mmr,chroma
37,5,"What are the context, environment, rationale, ...",\n Context & environment and develo...,The context of the project is significantly in...,[page_content='\nAnother issue that has also h...,6,[0.969774067401886],True,mmr,chroma
39,6,What are the start and end dates of the project?,March 2023 – February 2026,The project starts in March 2023 and ends in F...,[page_content='\nMain Sustainable Development ...,10,[0.7474793791770935],True,mmr,chroma
41,7,"What is the total project budget in Euros, inc...",The total required resources is 818 341 € for ...,The total project budget for the period from 2...,[page_content='\nMain Sustainable Development ...,8,[0.345091313123703],True,mmr,chroma



-----------

Search_type: mmr,
 DB type: faiss,
 enhanced prompt: False
Score moyen llm eval: 7.571428571428571


Unnamed: 0,question_#,question,reference_answer,rag_answer,rag_sources,eval_score_llm,eval_score_minicheck,enhanced_question,search_type,db_type
42,1,Description of the project,\n Brief project description\n ...,The project is focused on sustainable developm...,[page_content='\nMain Sustainable Development ...,6,[0.027476046234369278],False,mmr,faiss
44,2,Country and city,\n The location of the project is i...,"The country is Indonesia, and the city related...",[page_content='' metadata={'source': './data/P...,7,[0.03515425696969032],False,mmr,faiss
46,3,Target beneficiaries,\n Number of direct beneficiaries o...,The target beneficiaries of the pilot project ...,[page_content='\nFeature 3. Theory of Change\n...,7,[0.1638336330652237],False,mmr,faiss
48,4,Number of people concerned,\n Number of direct beneficiaries o...,The number of direct beneficiaries of the pilo...,[page_content='\nList of \nparticipants\nMoU D...,9,[0.1603330671787262],False,mmr,faiss
50,5,"Context, environment, project rationale and ch...",\n Context & environment and develo...,The context outlines a project aimed at restor...,[page_content='\nAnother issue that has also h...,7,[0.9523455500602722],False,mmr,faiss
52,6,Project start date / end date,March 2023 – February 2026,"The project start date is March 2023, and the ...",[page_content='\n Identification and mapping ...,10,[0.25516605377197266],False,mmr,faiss
54,7,Project budget \n Total amount of t...,The total required resources is 818 341 € for ...,- **Total amount of the project (in Euros):** ...,[page_content='\nMain Sustainable Development ...,7,[0.4989793300628662],False,mmr,faiss



-----------

Search_type: mmr,
 DB type: faiss,
 enhanced prompt: True
Score moyen llm eval: 8.142857142857142


Unnamed: 0,question_#,question,reference_answer,rag_answer,rag_sources,eval_score_llm,eval_score_minicheck,enhanced_question,search_type,db_type
43,1,What is the description of the project?,\n Brief project description\n ...,The project is focused on sustainable developm...,[page_content='\nMain Sustainable Development ...,7,[0.020879501476883888],True,mmr,faiss
45,2,What is the specific country and city where th...,\n The location of the project is i...,"The project is located in Indonesia, specifica...",[page_content='\nMain Sustainable Development ...,8,[0.927157461643219],True,mmr,faiss
47,3,Who are the target beneficiaries of the project?,\n Number of direct beneficiaries o...,The target beneficiaries of the project includ...,[page_content='\nFeature 3. Theory of Change\n...,9,[0.5990713238716125],True,mmr,faiss
49,4,How many people are impacted by the project?,\n Number of direct beneficiaries o...,"The project directly benefits 3,245 people, wi...",[page_content='\nFeature 3. Theory of Change\n...,8,[0.15475529432296753],True,mmr,faiss
51,5,"What is the context, environment, rationale, a...",\n Context & environment and develo...,The project is set in the context of East Kali...,[page_content='\nAnother issue that has also h...,7,[0.4610614478588104],True,mmr,faiss
53,6,What are the start and end dates of the project?,March 2023 – February 2026,The project starts in March 2023 and ends in F...,[page_content='\nMain Sustainable Development ...,10,[0.7474793791770935],True,mmr,faiss
55,7,"What is the total project budget in Euros, inc...",The total required resources is 818 341 € for ...,"The total project budget is €818,341 for the d...",[page_content='\nMain Sustainable Development ...,8,[0.14075498282909393],True,mmr,faiss



-----------



In [67]:
_best_params= pd.DataFrame(best_params).sort_values(by=3, ascending=False).head(1).values[0]
display(_best_params)
df_evaluations_lecel[
    (df_evaluations_lecel["search_type"]==_best_params[0])&
    (df_evaluations_lecel["db_type"]==_best_params[1])&
    (df_evaluations_lecel["enhanced_question"]==_best_params[2])
].to_csv("./rag-mahakam-best-params.csv", index=False)


array(['mmr', 'faiss', True, 8.142857142857142], dtype=object)

In [70]:
pd.pivot_table(
    df_evaluations_lecel[df_evaluations_lecel["search_type"]=="mmr"],
    values="eval_score_llm",
    index="db_type",
    columns="enhanced_question",
    aggfunc="mean" 
).round(2)

| db_type | enhanced_question=False | enhanced_question=True |
|---------|-------------------------|------------------------|
| chroma  | 7.0                     | 7.71                   |
| faiss   | 7.57                    | 8.14                   |
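
The gaps between configurations can be read directly off this table. A minimal sketch, with the four means hard-coded from the pivot output above (the names `mmr_means`, `rewrite_gain` and `faiss_gain` are illustrative and do not come from the notebook):

```python
# Mean LLM eval scores for the MMR runs, copied from the pivot table above.
# Keys are (db_type, enhanced_question).
mmr_means = {
    ("chroma", False): 7.00,
    ("chroma", True): 7.71,
    ("faiss", False): 7.57,
    ("faiss", True): 8.14,
}

# Gain from question rewriting, per vector store.
rewrite_gain = {
    db: round(mmr_means[(db, True)] - mmr_means[(db, False)], 2)
    for db in ("chroma", "faiss")
}

# Advantage of FAISS over Chroma, per prompt variant.
faiss_gain = {
    enhanced: round(mmr_means[("faiss", enhanced)] - mmr_means[("chroma", enhanced)], 2)
    for enhanced in (False, True)
}

print(rewrite_gain)  # {'chroma': 0.71, 'faiss': 0.57}
print(faiss_gain)    # {False: 0.57, True: 0.43}
```

With only seven questions per configuration, differences of this size are indicative rather than conclusive.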


In [74]:
import textwrap
num_question=7
search_type="mmr"
db_type="faiss"

row=df_evaluations_lecel[(df_evaluations_lecel["enhanced_question"]==True)&
                            (df_evaluations_lecel["search_type"]==search_type)&
                            (df_evaluations_lecel["db_type"]==db_type)&
                            (df_evaluations_lecel["question_#"]==num_question)]



fragments=row["rag_sources"]

print("Question:\n", row["question"].values[0])

print("\n------\nreference_answer:\n", row["reference_answer"].values[0])

print((f"\n---------\nrag_answer:\n{row['rag_answer'].values[0]}"))

print((f"\n---------\nRetrieved docs:\n{fragments.values[0]}"))


Question:
 What is the total project budget in Euros, including the amount requested from the Foundation, the detailed provisional budget, and the budget for the current year?

------
reference_answer:
 The total required resources is 818 341 € for the period 2023-2026

---------
rag_answer:
The total project budget is €818,341 for the duration of the project from March 2023 to February 2026. The context does not specify the amount requested from the Foundation or provide a detailed provisional budget or the budget for the current year (2023). Therefore, I cannot provide that information.

---------
Retrieved docs:


#### Conclusions:
> Recommended type of RAG chain:
* An LCEL-based chain is the one to keep. The `RetrievalQA`-based chain should be ruled out entirely: it is far too costly in time and tokens, and it gives worse answers.
> Recommended DB:
* FAISS scores slightly higher than Chroma (about 0.4 to 0.6 points on the MMR runs: 7.57 vs 7.0 with simple prompts, 8.14 vs 7.71 with rewritten prompts), with a retrieval time 4 times faster.
> Recommended search type for the retriever:
* `MMR` gives good results thanks to its balance between diversity and relevance of the returned documents. The simpler methods based only on cosine similarity (here `similarity_score_threshold`) give very poor results (mean scores of 1.4 to 2.0); the logs repeatedly report that no document passed the 0.7 relevance threshold.
> Automatic question rewriting:
* Rewriting helps (mean score up by about 0.6 to 0.7 points with MMR), but here it relies on explicit examples drawn from the call for projects (AAP) handled in this notebook (on-site submission). It remains to be tested whether these examples still work for other AAPs, or whether the examples from the document "Guide pratique pour les nouveaux arrivants" need to be used instead.
> Evaluation of results:
* Delegating the evaluation to an advanced LLM (GPT-4o here) gives good confidence in how well the answers match the reference texts, and captures far better the nuances that make an answer factually correct with respect to the reference. A specialized model such as `MiniCheck flan T5`, by contrast, fails when the structure and narrative of the answer differ too much from the reference text, even though the reported facts (places, figures) are exactly the same (see questions/answers 4 and 7).
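
To make the diversity/relevance trade-off of MMR concrete, here is a minimal, self-contained sketch of maximal marginal relevance selection over toy vectors. It illustrates the general technique, not LangChain's internal implementation. The function names `cosine` and `mmr_select`, the `lambda_mult` weighting parameter as written here, and the toy embeddings are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query, docs, k=2, lambda_mult=0.5):
    """Greedy MMR: pick k document indices balancing relevance and novelty.

    score = lambda_mult * sim(query, doc)
            - (1 - lambda_mult) * max(sim(doc, already selected docs))
    """
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy embeddings (made up): docs 0 and 1 are near-duplicates,
# doc 2 is less similar to the query but brings different content.
query = [1.0, 0.2]
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]

pure_relevance = mmr_select(query, docs, k=2, lambda_mult=1.0)
balanced = mmr_select(query, docs, k=2, lambda_mult=0.5)
print(pure_relevance)  # [1, 0]: the two near-duplicates
print(balanced)        # [1, 2]: the duplicate is replaced by the distinct doc
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns two near-identical passages; lowering it penalizes redundancy, which is the behaviour credited above for the better MMR scores.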

#### Next steps:
* Experiment with a sparse retriever (based on word frequency) and a lexical retriever (based on word indexing)
* Experiment with a reranker
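
As a possible starting point for the sparse-retriever experiment, here is a minimal sketch of word-frequency scoring with the standard Okapi BM25 formula in pure Python. It is not a LangChain component and was not part of the runs above. The function name `bm25_scores`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the toy passages are assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy passages (made up for the example, not taken from the source PDF).
corpus = [
    "start and end dates of the project",
    "total budget of the project in euros",
    "target beneficiaries and people concerned",
]
scores = bm25_scores("project budget", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best)  # index of the best-matching passage
```

Unlike the dense retrievers compared above, this scoring needs no embeddings: a passage that shares no word with the query simply scores zero.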