# Documentation for backend/citations.py

In [None]:
import os
from roman import toRoman

Explanation:
1. `import os`

Imports the os module, which provides functions to interact with the operating system.
It allows the script to perform tasks like reading or modifying environment variables, handling file paths, or executing system commands.

2. `from roman import toRoman`

Imports the toRoman function from the roman module.
The toRoman function converts an integer (e.g., 10) to its Roman numeral representation (e.g., X). It is used for converting numbers into Roman numerals within the script.

- In the code shown here, `os` is used to extract the file name from the document path with `os.path.basename`, and `toRoman` produces the Roman-numeral labels for pages that precede Chapter 1.
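To make the two imports concrete, the sketch below calls `os.path.basename` on a short made-up path and converts a few integers to Roman numerals. Because the third-party `roman` package may not be installed, `to_roman_sketch` is a hypothetical pure-Python stand-in written for this example; it is not the library's implementation.

```python
import os

# Value/symbol pairs in descending order, including the subtractive forms.
_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman_sketch(n):
    """Hypothetical stand-in for roman.toRoman, for positive integers only."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    parts = []
    for value, symbol in _NUMERALS:
        count, n = divmod(n, value)  # how many times this symbol fits
        parts.append(symbol * count)
    return "".join(parts)

print(os.path.basename("data/default/textbook/book.pdf"))  # book.pdf
print(to_roman_sketch(10))  # X
print(to_roman_sketch(32))  # XXXII
```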

In [None]:
def get_answer_with_source(
	response, 
	document_path="data/default/textbook//Roger S. Pressman_ Bruce R. Maxim - Software Engineering_ A Practitioner's Approach-McGraw-Hill Education (2019).pdf"
):
  """
  Extract the answer and relevant source information from the response.
  This function processes the response from the RAG chain, extracting the answer
  and up to 3 source references (page numbers) from the context documents.

  Args:
    response (dict): The response dictionary from the RAG chain, containing 'answer' and 'context' keys.
  Returns:
    str: A formatted string containing the answer followed by source information.
  """
  answer = response.get('answer', 'No answer found.') # Extract the answer
  sources = [] # Handle multiple contexts in the response (assuming response['context'] is a list)

1. `get_answer_with_source`
Extracts the answer and relevant source information from a RAG (Retrieval-Augmented Generation) response, returning a formatted string with the answer and citations from the source document.

2. `response (dict):`
A dictionary containing:
'answer' (str): The answer generated by the RAG chain.
'context' (list): A list of the retrieved documents from which the answer is derived. Each document carries a `metadata` dictionary that holds its page number.

3. `document_path (str):`
The file path to the source document. It defaults to the textbook PDF and can be changed to point to a different document.

4. Returns `str`: a formatted string that includes:
The extracted answer, or the default message 'No answer found.' if the response has no 'answer' key.
Source citations for up to five context documents, each formatted as a linked page label. (The docstring says "up to 3", but the loop slices `response['context'][:5]`.)

_Process Overview_:
1. Retrieve the Answer:
Use response.get('answer', 'No answer found.') to extract the answer. If the answer key is missing, a default message 'No answer found.' is assigned.

2. Initialize Sources List:
Create an empty list sources to hold the relevant source references.


3. Extract Source Information:
Iterate through `response['context']` (assumed to be a list of documents) and read each document's page number from its metadata.
Limit the number of sources collected to a maximum of five, matching the `[:5]` slice in the code.


4. Format the Output:
Construct a string that includes: The answer.
A formatted citation section that lists the collected source references, if any.


5. Return the Formatted String:
Return the complete formatted string, which combines the answer and the source information.
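As a rough illustration of the input this function expects, the sketch below builds a response dictionary by hand. `FakeDoc` is a hypothetical stand-in for whatever document class the RAG chain returns; the only property assumed is a `metadata` dict with a `page` key, which is what the function reads. The answer text and page values are made up.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved context document."""
    metadata: dict = field(default_factory=dict)

# Made-up response with the two keys the function reads.
response = {
    "answer": "Sample answer text.",
    "context": [FakeDoc({"page": 33}), FakeDoc({"page": 57})],
}

# Step 1: .get() falls back to the default when 'answer' is missing.
answer = response.get("answer", "No answer found.")
missing = {}.get("answer", "No answer found.")

# Step 3: each context document exposes its zero-based page index in metadata.
pages = [doc.metadata.get("page") for doc in response["context"]]

print(answer)   # Sample answer text.
print(missing)  # No answer found.
print(pages)    # [33, 57]
```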

In [None]:
  # Iterate over the list of context documents and collect up to 5 sources
  for doc in response['context'][:5]:
    file_name = os.path.basename(document_path)
    page = doc.metadata.get('page', 'Unknown page')

    # The documents are zero indexed. So page 1 of the pdf is page 0 in docs
    # Chapter 1 starts at doc 33 (page 34 of the PDF)
    adjusted_page = page - 33 
    if adjusted_page >= 0: # For pages starting from Chapter 1
      link = f'<a href="/team3/?view=pdf&file={file_name}&page={page + 1}" target="_blank">[{adjusted_page + 1}]</a>'
    else: # For pages before Chapter 1
      adjusted_page = "Cover" if page == 0 else toRoman(page)
      link = f'<a href="/team3/?view=pdf&file={file_name}&page={page + 1}" target="_blank">[{adjusted_page}]</a>'
    sources.append(link)

  # Join the collected source links (no separator) after the "Sources: " label
  sources_info = "\nSources: " + "".join(sources)
  final_answer = f"{answer}\n\n{sources_info}"
  return final_answer

Explanation:

1. Iterate over the List of Context Documents:

`for doc in response['context'][:5]:`
- This line starts a loop over the first five documents in the `response['context']` list.
- `response['context']` presumably contains the documents retrieved for the query. The slice keeps only the first five and simply yields fewer if the list is shorter.
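The slicing behavior can be checked in isolation. The sketch below uses a hypothetical helper, `top_context`, which is not part of the original module. Unlike the original `response['context']` lookup, which raises `KeyError` when the key is absent, this variant assumes an empty list is an acceptable fallback.

```python
def top_context(response, limit=5):
    """Hypothetical helper: return at most `limit` context items."""
    # .get() with a default avoids a KeyError when 'context' is missing.
    return response.get("context", [])[:limit]

print(len(top_context({"context": list(range(8))})))  # 5
print(top_context({"context": ["a", "b"]}))           # ['a', 'b']
print(top_context({}))                                # []
```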

2. Extracting File Name:

`file_name = os.path.basename(document_path)`

- `document_path` is a variable that contains the path to the document.
- `os.path.basename(document_path)` extracts just the filename from the complete path, discarding the directory information.

3. Get Page Number:

`page = doc.metadata.get('page', 'Unknown page')`

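- This retrieves the zero-based page number of the current document (doc) from its metadata.
- If the page number is not found, it defaults to the string 'Unknown page'. Note that the next line subtracts 33 from this value, so a document without a 'page' entry would raise a TypeError rather than produce an "Unknown page" citation.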
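One possible way to guard against a missing page is sketched below. `get_page_index` is a hypothetical helper, not part of the original module; it returns `None` instead of a string so that the caller can skip the arithmetic. The first part reproduces the failure mode described above.

```python
# Subtracting an int from the string default fails at runtime.
try:
    "Unknown page" - 33
except TypeError as exc:
    error_name = type(exc).__name__
print(error_name)  # TypeError

def get_page_index(metadata):
    """Hypothetical helper: return the page index if it is an int, else None."""
    page = metadata.get("page")
    return page if isinstance(page, int) else None

print(get_page_index({"page": 12}))  # 12
print(get_page_index({}))            # None
```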

4. Adjust Page Number for Zero Indexing:

`adjusted_page = page - 33`

- The code subtracts 33 from the zero-based page index. Per the code comments, Chapter 1 starts at index 33 (page 34 of the PDF), so `adjusted_page` is 0 for the first page of Chapter 1 and negative for the front matter that precedes it.
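The offset arithmetic can be worked through with a few values. In this sketch, `CHAPTER_ONE_INDEX` and `printed_page` are hypothetical names introduced for illustration; the value 33 comes from the comments in the source.

```python
CHAPTER_ONE_INDEX = 33  # zero-based index of the first Chapter 1 page

def printed_page(page_index):
    """Return the page number counted from Chapter 1, or None for front matter."""
    adjusted = page_index - CHAPTER_ONE_INDEX
    # adjusted is zero-based, so add 1 for the label shown to the reader.
    return adjusted + 1 if adjusted >= 0 else None

print(printed_page(33))  # 1
print(printed_page(40))  # 8
print(printed_page(10))  # None
```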

5. Conditional Logic for Page Link Creation:

` if adjusted_page >= 0:  # For pages starting from Chapter 1
    link = f'<a href="/team3/?view=pdf&file={file_name}&page={page + 1}" target="_blank">[{adjusted_page + 1}]</a>'
  else:  # For pages before Chapter 1
    adjusted_page = "Cover" if page == 0 else toRoman(page)
    link = f'<a href="/team3/?view=pdf&file={file_name}&page={page + 1}" target="_blank">[{adjusted_page}]</a>'` 
    
- If Condition: Checks whether adjusted_page is non-negative (meaning the page is in Chapter 1 or later).
    - If true, it constructs a link whose URL uses page + 1, because the PDF is numbered from 1 while the document index starts at 0. The visible label is adjusted_page + 1, the page number counted from the start of Chapter 1.
    - The link is formatted as an HTML anchor tag that opens in a new tab (target="_blank").

- Else Condition: If adjusted_page is negative (for pages before Chapter 1):
    - It checks whether the original page is 0 (the cover page) and assigns the label "Cover". For any other front-matter page (indices 1 through 32), it converts the page index to a Roman numeral using the function toRoman(page).
    - The link is built the same way, with this label in place of the Chapter 1 page number.
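Both branches build the same anchor tag and differ only in the label, so the link construction can be sketched as one function. `build_link` is a hypothetical helper, not part of the original module. It also URL-encodes the file name with `urllib.parse.quote`, which the original code does not do; that is an assumption about what the PDF viewer expects, useful when file names contain spaces or apostrophes as the default textbook path does.

```python
from urllib.parse import quote

def build_link(file_name, page_index, label):
    """Hypothetical helper: build the citation anchor for one source."""
    # page_index is zero-based; the viewer URL in the source uses page_index + 1.
    # quote() is an addition not present in the original code.
    href = f"/team3/?view=pdf&file={quote(file_name)}&page={page_index + 1}"
    return f'<a href="{href}" target="_blank">[{label}]</a>'

print(build_link("a b.pdf", 33, 1))
# <a href="/team3/?view=pdf&file=a%20b.pdf&page=34" target="_blank">[1]</a>
print(build_link("book.pdf", 0, "Cover"))
# <a href="/team3/?view=pdf&file=book.pdf&page=1" target="_blank">[Cover]</a>
```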

6. Collecting the Links:

`sources.append(link)` 

- Each constructed link is appended to the sources list, which presumably collects all the links for later display.

7. Join the Links and Prepare Final Answer:

` sources_info = "\nSources: " + "".join(sources)
final_answer = f"{answer}\n\n{sources_info}"
return final_answer`

- The links in the sources list are joined into a single string with no separator between them, prefixed with "Sources: ".
- The final answer combines the original answer with the formatted sources, separating them with newlines.
- The complete string (final_answer) is returned.
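The string assembly can be traced with placeholder values. In this sketch, the bracketed labels stand in for the full anchor tags and the answer text is made up. Because `sources_info` itself begins with a newline, the answer and the "Sources:" line end up separated by three newline characters.

```python
answer = "Sample answer text."
sources = ["[1]", "[2]"]  # placeholders for the HTML links

# Same two statements as in the function.
sources_info = "\nSources: " + "".join(sources)
final_answer = f"{answer}\n\n{sources_info}"

print(repr(final_answer))
# 'Sample answer text.\n\n\nSources: [1][2]'
```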

_Process Overview_:
This code iterates through a set of context documents, adjusts their page numbers for display purposes, constructs links to specific pages in a PDF file, and formats these links as HTML. Finally, it combines this information into a single output string, ready to be presented to the user.