## Test Notebook for the API

Before you use the notebook, please ensure that:

1. You have followed the instructions in the README.
2. The Postgres database is up and running. If you restarted your laptop, device, or environment, you can start it with:
```bash
sudo service postgresql start
```
If you are using Windows, start the Postgres application instead.

To confirm that the database is running, connect to it:

```bash
psql -U your_username -d your_database_name
```
Then enter your password and confirm that the connection succeeds.


3. The Flask app has been started. You can start it with the following command:

```bash
python flask_app.py
```

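Before running the cells, it can help to confirm that both services accept connections. The following is a minimal sketch, not part of the project: `is_port_open` is a hypothetical helper, port 5432 is assumed because it is the PostgreSQL default, and port 8080 is taken from the `BASE_URL` used in this notebook. Adjust both if your setup differs.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


# Assumed ports: 5432 (PostgreSQL default) and 8080 (this notebook's BASE_URL).
for name, port in (("PostgreSQL", 5432), ("Flask app", 8080)):
    state = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove that the credentials or the Flask routes work.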
In [1]:
import requests
import json

# Set the base URL for the Flask API
BASE_URL = "http://localhost:8080"  # Adjust as necessary

def test_upload_file(file_path):
    """Test the file upload endpoint."""
    with open(file_path, 'rb') as file:
        response = requests.post(f"{BASE_URL}/upload", files={"file": file})

    # response.ok is True for any status below 400; the upload endpoint replies with 202 (Accepted)
    if response.ok:
        print("Upload successful:", response.json())
    else:
        print("Upload failed:", response.status_code, response.json())
    
    return response

def test_process_query(query):
    """Test the process query endpoint."""
    payload = {"query": query}
    response = requests.post(f"{BASE_URL}/query", json=payload)

    if response.status_code == 200:
        print("Query processing successful:", response.json())
    else:
        print("Query processing failed:", response.status_code, response.json())
    
    return response

def test_evaluate_user_answer(test_question_id, user_answer):
    """Test the evaluate user answer endpoint."""
    payload = {
        "answer": user_answer,
        "test_question_id": test_question_id
    }
    response = requests.post(f"{BASE_URL}/evaluate", json=payload)

    if response.status_code == 200:
        print("Evaluation successful:", response.json())
    else:
        print("Evaluation failed:", response.status_code, response.json())
    
    return response


In [2]:
sample_file_path = "/teamspace/studios/this_studio/QATSystem/sample_file/2005.11401v4.pdf"

In [3]:
test_upload_file(sample_file_path)

Upload failed: 202 {'message': 'File uploaded successfully. Knowledge base creation started.'}


<Response [202]>

Note: a 202 (Accepted) status means the upload succeeded and knowledge base creation continues in the background. The "Upload failed" label in this recorded output comes from a check that accepted only status 200.

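Because the upload response says that knowledge base creation has only started, a query sent immediately afterwards may run before that work has finished. The sketch below shows one generic way to wait: poll a check function until it succeeds or a timeout expires. `wait_until` is a hypothetical helper, and treating a 200 response from the query endpoint as "ready" is an assumption; the source does not document a readiness signal.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call `check()` repeatedly until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example use with this notebook's helper (assumes a 200 status means "ready"):
# wait_until(lambda: test_process_query("What is RAG?").status_code == 200)
```

The check always runs at least once, so a timeout of 0 still performs a single attempt.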
In [4]:
response = test_process_query("What are the disadvantages of RAG?")

Query processing successful: {'answer': 'Based on the information provided in the documents, one potential disadvantage of RAG is that it may require many forward passes for longer output sequences during the decoding process, which can be less efficient. Additionally, retrieving more documents leads to higher Rouge-L for RAG-Token at the expense of Bleu-1, meaning that while the model may match more words from the reference answer, it may not generate the exact correct answer as frequently.\n\nAnother potential downside of RAG is that, like any language model that relies on external knowledge sources, it is only as factual and unbiased as the knowledge source it is trained on, in this case, Wikipedia. This means that there is a risk of generating factually incorrect or biased information.\n\nAdditionally, RAG could potentially be used to generate abuse, fake or misleading content, similar to concerns with other language models such as GPT-2. However, the document suggests that these c

In [5]:
response.json()

{'answer': 'Based on the information provided in the documents, one potential disadvantage of RAG is that it may require many forward passes for longer output sequences during the decoding process, which can be less efficient. Additionally, retrieving more documents leads to higher Rouge-L for RAG-Token at the expense of Bleu-1, meaning that while the model may match more words from the reference answer, it may not generate the exact correct answer as frequently.\n\nAnother potential downside of RAG is that, like any language model that relies on external knowledge sources, it is only as factual and unbiased as the knowledge source it is trained on, in this case, Wikipedia. This means that there is a risk of generating factually incorrect or biased information.\n\nAdditionally, RAG could potentially be used to generate abuse, fake or misleading content, similar to concerns with other language models such as GPT-2. However, the document suggests that these concerns may be valid but to a

In [6]:
response.json()['bullet_points']

["Bullet Point 1: RAG's decoding process, specifically for RAG-Sequence, may require many forward passes for longer output sequences, which can lead to reduced efficiency and increased computational requirements. This is because the likelihood of an output sequence is calculated by running a beam search for each document, scoring each hypothesis, and then estimating the probability of a hypothesis by running additional forward passes for documents where the hypothesis did not appear in the beam.",
 'Bullet Point 2: Retrieving more documents can lead to higher Rouge-L scores for RAG-Token, indicating that the model matches more words from the reference answer. However, this comes at the expense of Bleu-1 scores, which means that the exact correct answer is not generated as frequently. This trade-off between Rouge-L and Bleu-1 scores suggests that while RAG can generate responses that are more similar to the reference answer, it may not always generate the exact correct answer.',
 'Bulle

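Each entry in `bullet_points` starts with a "Bullet Point N:" label, as the output above shows. If you want only the text, a small helper can strip that label. This is a sketch: `strip_bullet_prefix` is a hypothetical name, and the pattern is inferred from the output shown here rather than from any documented response format.

```python
import re

# Matches a leading label such as "Bullet Point 3: " (format inferred from the sample output).
_BULLET_PREFIX = re.compile(r"^\s*Bullet Point \d+:\s*")


def strip_bullet_prefix(points):
    """Return the bullet points with any leading 'Bullet Point N:' label removed."""
    return [_BULLET_PREFIX.sub("", point) for point in points]


# Example use with the query response from this notebook:
# for point in strip_bullet_prefix(response.json()['bullet_points']):
#     print("-", point)
```

Entries without the label are returned unchanged.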
In [15]:
from pprint import pprint

In [16]:
pprint(response.json()['answer'])

('Based on the information provided in the documents, one potential '
 'disadvantage of RAG is that it may require many forward passes for longer '
 'output sequences during the decoding process, which can be less efficient. '
 'Additionally, retrieving more documents leads to higher Rouge-L for '
 'RAG-Token at the expense of Bleu-1, meaning that while the model may match '
 'more words from the reference answer, it may not generate the exact correct '
 'answer as frequently.\n'
 '\n'
 'Another potential downside of RAG is that, like any language model that '
 'relies on external knowledge sources, it is only as factual and unbiased as '
 'the knowledge source it is trained on, in this case, Wikipedia. This means '
 'that there is a risk of generating factually incorrect or biased '
 'information.\n'
 '\n'
 'Additionally, RAG could potentially be used to generate abuse, fake or '
 'misleading content, similar to concerns with other language models such as '
 'GPT-2. However, the docum

In [8]:
response.json()['test_question']

'How might the efficiency of RAG be impacted during the decoding process, and what trade-off could arise when retrieving more documents in RAG-Token?'

In [7]:
test_question_id = response.json()['test_question_id']

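The cells above read four keys from the query response: `answer`, `bullet_points`, `test_question`, and `test_question_id`. Indexing a missing key raises a `KeyError`, so it can be useful to check for them first. The following is a minimal sketch; `missing_keys` is a hypothetical helper, and the key list reflects only what this notebook accesses, not a documented schema.

```python
# Keys read from the query response in this notebook (not an official schema).
EXPECTED_KEYS = ("answer", "bullet_points", "test_question", "test_question_id")


def missing_keys(data: dict, expected=EXPECTED_KEYS) -> list:
    """Return the expected keys that are absent from `data`, in their listed order."""
    return [key for key in expected if key not in data]


# Example use with the query response from this notebook:
# absent = missing_keys(response.json())
# if absent:
#     print("Response is missing:", absent)
```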

To check that the evaluation works, you can submit two examples: a wrong answer and a correct answer.

In [9]:
wrong_generated_answer = """I don't know the answer"""

In [10]:
test_evaluate_user_answer(test_question_id,
wrong_generated_answer)

Evaluation successful: {'knowledge_confidence': 5, 'knowledge_understood': False}


<Response [200]>

In [11]:
correct_generated_answer = """The quality of retrieved documents significantly impacts the performance of RAG (Retrieval Augmented Generation) models. If the retrieved documents are irrelevant, inaccurate, or incomplete, the model will likely generate incorrect or misleading responses. Therefore, it's crucial to ensure the quality of the document corpus used for RAG.

Here are some potential solutions to mitigate the dependence on document quality:

Document Filtering and Curation: Implement mechanisms to filter and curate the document corpus, removing irrelevant or low-quality documents. This can involve using keyword matching, topic modeling, or other techniques to identify relevant documents.

Document Ranking: Rank retrieved documents based on their relevance to the query. This can be achieved using techniques like cosine similarity, TF-IDF, or more advanced ranking algorithms. By prioritizing relevant documents, the model can generate more accurate responses.

Multiple Document Retrieval: Retrieve multiple documents related to the query and combine their information to generate a more comprehensive response. This can help mitigate the impact of individual document quality issues.

Contextual Awareness: Enhance the model's ability to understand the context of the query and the retrieved documents. This can involve using techniques like semantic analysis or knowledge graphs to identify relationships between concepts and information.

Feedback Mechanisms: Incorporate feedback mechanisms to allow users to provide feedback on the generated responses. This feedback can be used to improve the model's performance over time by identifying and addressing issues related to document quality and response accuracy.

By addressing these factors, RAG models can become more robust and less reliant on the quality of individual documents, leading to improved performance and more accurate responses.."""

In [12]:
test_evaluate_user_answer(test_question_id,
correct_generated_answer)

Evaluation successful: {'knowledge_confidence': 68, 'knowledge_understood': True}


<Response [200]>

In [18]:
test_evaluate_user_answer(test_question_id,
wrong_generated_answer)

Evaluation successful: {'knowledge_confidence': 11, 'knowledge_understood': False}


<Response [200]>
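
The same wrong answer received a `knowledge_confidence` of 5 in one run and 11 in another, while `knowledge_understood` stayed `False`. When comparing repeated runs, it may be more useful to look at the score range and the verdict than at one exact number. The sketch below summarizes a list of evaluation results; `summarize_evaluations` is a hypothetical helper, and the sample data is copied from the two outputs recorded in this notebook.

```python
def summarize_evaluations(results):
    """Summarize the confidence range and verdict agreement across evaluation results."""
    scores = [r["knowledge_confidence"] for r in results]
    verdicts = {r["knowledge_understood"] for r in results}
    return {
        "min_confidence": min(scores),
        "max_confidence": max(scores),
        "verdict_consistent": len(verdicts) == 1,
    }


# The two results recorded above for the wrong answer.
wrong_runs = [
    {"knowledge_confidence": 5, "knowledge_understood": False},
    {"knowledge_confidence": 11, "knowledge_understood": False},
]
print(summarize_evaluations(wrong_runs))
# {'min_confidence': 5, 'max_confidence': 11, 'verdict_consistent': True}
```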