# Libraries and preparation

refs:
- https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb
- https://github.com/langchain-ai/langgraph/blob/main/examples/multi_agent/agent_supervisor.ipynb?ref=blog.langchain.dev
- https://github.com/langchain-ai/langgraph/blob/main/examples/multi_agent/hierarchical_agent_teams.ipynb?ref=blog.langchain.dev

Map-reduce: https://langchain-ai.github.io/langgraph/how-tos/map-reduce/

In [None]:
import subprocess
import threading

# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

In [None]:
def start_ollama():
    t = threading.Thread(target=lambda: subprocess.run(["ollama", "serve"]),daemon=True)
    t.start()

In [None]:
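def pull_model(local_llm):
    # Download the model weights (the Ollama server must already be running)
    subprocess.run(["ollama", "pull", local_llm], check=True)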

In [None]:
def start_model(local_llm):        
    t2 = threading.Thread(target=lambda: subprocess.run(["ollama", "run", local_llm]),daemon=True)
    t2.start()
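`start_ollama()` returns immediately, so a `pull_model()` or chat call issued right afterwards can fail while `ollama serve` is still booting. A minimal readiness check, assuming Ollama listens on its default port 11434 on localhost, polls the TCP port until it accepts connections. `wait_for_port` is a hypothetical helper written for this notebook, not part of Ollama or LangChain.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP server accepts connections on (host, port) or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused or timed out) while nothing is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage (assumes Ollama's default port 11434):
# start_ollama()
# if not wait_for_port("127.0.0.1", 11434):
#     raise RuntimeError("ollama serve did not start in time")
# pull_model(local_llm)
```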

In [None]:
%%capture --no-stderr
%pip install -U langchain-ai21 langchain-pinecone langchain-nomic langchain_community tiktoken langchainhub chromadb langchain langgraph tavily-python nomic[local] langchain-text-splitters

In [None]:
# Tracing and API keys
import os
from getpass import getpass

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

# Do not hard-code secrets in the notebook: read them from the environment, or prompt for them
for key_name in ["TAVILY_API_KEY", "LANGCHAIN_API_KEY", "PINECONE_API_KEY", "AI21_API_KEY"]:
    if not os.environ.get(key_name):
        os.environ[key_name] = getpass(f"{key_name}: ")

Bias detection model:

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline
import torch

device = 0 if torch.cuda.is_available() else -1

bias_model_tokenizer = AutoTokenizer.from_pretrained("d4data/bias-detection-model")
bias_model = AutoModelForSequenceClassification.from_pretrained("d4data/bias-detection-model",from_tf=True)

- https://shap.readthedocs.io/en/latest/example_notebooks/text_examples/text_entailment/Textual%20Entailment%20Explanation%20Demo.html
- https://huggingface.co/facebook/bart-large-mnli

Entailment model (BART):

In [None]:
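from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

device = 0 if torch.cuda.is_available() else -1  # pipeline() convention: GPU index, or -1 for CPU
torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# from_pretrained() does not place the model on a `device` argument: load it first, then move it
bart_model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli").to(torch_device)
bart_model.eval()  # inference only: disable dropout
bart_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")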

In [None]:
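def BART_prediction(premise,hypothesis):
    """Return the MNLI label ('contradiction', 'neutral' or 'entailment') of premise -> hypothesis."""
    print(f"Premise: {premise}")
    print(f"Hypo: {hypothesis}")
    # Encode the pair and move it to the same device as the model
    input_ids = bart_tokenizer.encode(premise, hypothesis, return_tensors="pt", truncation=True).to(bart_model.device)
    with torch.no_grad():  # inference only: no gradients needed
        logits = bart_model(input_ids)[0]
    probs = logits.softmax(dim=1)

    max_index = torch.argmax(probs).item()

    bart_label_map = {0: "contradiction", 1: "neutral", 2: "entailment"}
    return bart_label_map[max_index]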
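`BART_prediction` turns the three MNLI logits into probabilities with a softmax and picks the most likely class; since softmax is monotonic, the arg-max over probabilities equals the arg-max over the raw logits. The pure-Python sketch below reproduces that step on hypothetical logit vectors, so the label mapping can be checked without loading the model. The label order is assumed to match `bart_label_map` above.

```python
import math

# Assumed to match bart_label_map for facebook/bart-large-mnli
LABELS = ["contradiction", "neutral", "entailment"]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits: list[float]) -> tuple[str, dict[str, float]]:
    """Return the arg-max label and the per-label probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


label, probs = label_from_logits([-1.2, 0.3, 2.5])
print(label)  # entailment
```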

In [None]:
# test model
"""
premise = "I love sea dsf afdas f"
hypothesis = "I love sun sdf  asfdfsdss"
input_ids = bart_tokenizer.encode(premise, hypothesis, return_tensors="pt")
print(f"Input ids: {input_ids.shape}")
logits = bart_model(input_ids)[0]
probs = logits.softmax(dim=1)

max_index = torch.argmax(probs).item()

bart_label_map = {0: "contradiction", 1: "neutral", 2: "entailment"}
print(bart_label_map[max_index])
for i, lab in bart_label_map.items():
    print(f"{lab} probability: {probs[0][i] * 100:0.2f}%")
"""

# Tools

refs:
- https://python.langchain.com/v0.2/docs/integrations/tools/tavily_search/

In [None]:
### Search
from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(max_results=2)  # number of search results to return

# Indexing

Organizing external sources for the LLM: indexing and chunking of documents. Refs:
- https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/
- https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/
- https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/
- Nomic embeddings: https://docs.nomic.ai/atlas/capabilities/embeddings#selecting-a-device

Note: the indexes can be inspected directly at https://app.pinecone.io/organizations/-O2Tiw_0VD7HTOASPJE5/projects/2a95c518-e514-4d39-bed8-4b12fd90ad44/indexes

Note on chunking: https://dev.to/peterabel/what-chunk-size-and-chunk-overlap-should-you-use-4338
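To make the two chunking parameters concrete: `chunk_size` bounds the length of each chunk (measured in tiktoken tokens when the splitter is built with `from_tiktoken_encoder`), and `chunk_overlap` is how many tokens consecutive chunks share. The sketch below is a simplified fixed-window illustration on a list of token ids, not the algorithm of `RecursiveCharacterTextSplitter`, which first splits on separators (paragraphs, newlines, spaces) and then merges the pieces up to `chunk_size`.

```python
def sliding_window_chunks(tokens: list, chunk_size: int, chunk_overlap: int) -> list[list]:
    """Split tokens into windows of chunk_size; consecutive windows share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks(list(range(10)), chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With `chunk_overlap=0`, as in the cells below, chunks are disjoint, so a sentence that straddles a boundary is split between two chunks.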

In [None]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_pinecone import PineconeVectorStore
from langchain_ai21 import AI21Embeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def create_vectorstore(urls: list[str]):
    docs = [WebBaseLoader(url).load() for url in urls]  # text + metadata of each page
    docs_list = [item for sublist in docs for item in sublist]  # flatten: we need each Document's page_content

    # Chunking
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=250, chunk_overlap=0
    )

    doc_splits = text_splitter.split_documents(docs_list)
    index_name = "vectorstore"  # the Pinecone index must already exist

    # Add to vectorDB
    vectorstore = PineconeVectorStore.from_documents(
        documents=doc_splits,
        #embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local", device="cuda"),
        embedding=AI21Embeddings(),  # embeddings are computed by the AI21 API: no local device to select
        index_name=index_name
    )
    return vectorstore.as_retriever()

# Indexing KBTs

In [None]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_pinecone import PineconeVectorStore
from langchain_ai21 import AI21Embeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


# Build one retriever per aspect, each backed by its own Pinecone KBT index
def create_KBTs(aspects: list[str], urls_list: list[list[str]]):
    if len(aspects) != len(urls_list):
        raise ValueError("aspects and urls_list must have the same length")
    retriever_list = []
    for aspect, urls in zip(aspects, urls_list):
        docs = [WebBaseLoader(url).load() for url in urls]  # text + metadata of each page
        docs_list = [item for sublist in docs for item in sublist]  # flatten: we need each Document's page_content

        # Chunking
        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
            chunk_size=250, chunk_overlap=0
        )

        doc_splits = text_splitter.split_documents(docs_list)
        index_name = f"{aspect.lower()}-kbt"  # one (pre-existing) Pinecone index per aspect

        # debug:
        # print(doc_splits)

        # Add to vectorDB
        vectorstore_KBT = PineconeVectorStore.from_documents(
            documents=doc_splits,
            #embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local", device="cuda"),
            embedding=AI21Embeddings(),  # remote API embeddings: no local device to select
            index_name=index_name
        )
        retriever_list.append(vectorstore_KBT.as_retriever())
    return retriever_list
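`create_KBTs` derives each index name as `f"{aspect.lower()}-kbt"`, which would likely be rejected as soon as an aspect contains a space or punctuation (e.g. "Public Health"). Assuming Pinecone's rule that index names use only lowercase alphanumerics and hyphens, a hypothetical helper like `kbt_index_name` could normalize the aspect first. It is a sketch, not something the notebook currently calls.

```python
import re


def kbt_index_name(aspect: str, suffix: str = "kbt") -> str:
    """Build an index name from lowercase letters, digits and single hyphens."""
    # Collapse every run of other characters (spaces, punctuation, accents) into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", aspect.lower()).strip("-")
    if not slug:
        raise ValueError(f"aspect {aspect!r} yields an empty index name")
    return f"{slug}-{suffix}"


print(kbt_index_name("Public Health"))  # public-health-kbt
print(kbt_index_name("Economy"))        # economy-kbt
```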

# Query generation (multi-aspect)

In [None]:
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_core.prompts import PromptTemplate

def query_generator(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You have to generate multiple
        search queries based on some specified aspects. Return the answer as a Python list containing one
        generated query per aspect, in the same order as the aspects. NO PREAMBLE: return only the list.
        Here are some examples:
        Original query: "What about COVID19?"
        Aspects: ["Health","Economy"]
        Answer: ["Symptoms of COVID19","Economic consequences of COVID19"]
        \n ----- \n
        Original query: "COVID19 was fake?"
        Aspects: ["Health","Society"]
        Answer: ["Is COVID19 just a cold?","What people think about COVID19?"]
        \n ----- \n
        <|eot_id|><|start_header_id|>user<|end_header_id|>
        Original query: {original_query}
        Aspects: {aspects}
        Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["aspects","original_query"],
    )
    llm = ChatOllama(model=local_llm, temperature=0)
    query_generator = prompt | llm | StrOutputParser()
    return query_generator


#import ast
#original_query = "Covid19 was a hoax?"
#aspects = ["Health","Economy"]
#generation = query_generator(local_llm).invoke({"original_query": original_query, "aspects":aspects})
#print(ast.literal_eval(generation))  # safer than eval() on model output


# # Reciprocal Rank Fusion algorithm
# def reciprocal_rank_fusion(search_results_dict, k=60):
#     fused_scores = {}
#     print("Initial individual search result ranks:")
#     for query, doc_scores in search_results_dict.items():
#         print(f"For query '{query}': {doc_scores}")

#     for query, doc_scores in search_results_dict.items():
#         for rank, (doc, score) in enumerate(sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)):
#             if doc not in fused_scores:
#                 fused_scores[doc] = 0
#             previous_score = fused_scores[doc]
#             fused_scores[doc] += 1 / (rank + k)
#             print(f"Updating score for {doc} from {previous_score} to {fused_scores[doc]} based on rank {rank} in query '{query}'")

#     reranked_results = {doc: score for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)}
#     print("Final reranked results:", reranked_results)
#     return reranked_results
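The commented-out `reciprocal_rank_fusion` above works on per-query score dictionaries and uses 0-based ranks. For comparison, here is a compact sketch of the usual formulation on plain ranked lists: every document receives 1 / (k + rank) from each list it appears in (1-based ranks, k = 60 by convention), and the sums decide the fused order. It is not wired into the graph, and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several rankings: each document scores the sum of 1 / (k + rank), with 1-based ranks."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical document ids returned by two aspect queries
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "d"]])
print([doc for doc, _ in fused])  # ['b', 'c', 'a', 'd']
```

Documents that rank reasonably well in several lists ("b", "c") overtake a document that is first in only one list ("a"); the constant k damps the advantage of the very top positions.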

# Organizing outputs

In [None]:
def final_answer(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant that organizes and combines multiple answers.
        Summarize each answer if necessary, then combine them while keeping the discussion coherent. No preamble, just give the final output as JSON.
        <|eot_id|><|start_header_id|>user<|end_header_id|>
        Here are the answers: {answers}
        Result answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["answers"],
    )
    llm = ChatOllama(model=local_llm, format="json", temperature=0)
    final_answer = prompt | llm | JsonOutputParser()
    return final_answer

#final_output = final_answer(local_llm).invoke({"answers": answers})
#print(final_output)

# Retrieval

In [None]:
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_core.prompts import PromptTemplate

def retrieval_grader(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing relevance
        of a retrieved document to a user question. If the document contains keywords related to the user question,
        grade it as relevant. It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
        Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question. \n
        Answer only 'yes' or 'no', NOTHING ELSE. NO PREAMBLE. NO EXPLANATION.
        <|eot_id|><|start_header_id|>user<|end_header_id|>
        Here is the retrieved document: \n\n {document} \n\n
        Here is the user question: {question} \n <|eot_id|><|start_header_id|>assistant<|end_header_id|>
        """,
        input_variables=["question", "document"],
    )
    llm = ChatOllama(model=local_llm, temperature=0) #higher temperature more likely hallucinations
    retrieval_grader = prompt | llm | StrOutputParser()
    return retrieval_grader

# question = "agent memory"
# docs = retriever.invoke(question)
# doc_txt = docs[1].page_content
# print(retrieval_grader(local_llm).invoke({"question": question, "document": doc_txt}))
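`grade_documents` compares the lower-cased grader reply with exactly `"yes"`, so replies such as `"Yes."` or `"'yes'"`, which local models often produce despite the NO PREAMBLE instruction, would be graded as not relevant and trigger a web search. A more tolerant parser, sketched below as a hypothetical helper, accepts a leading yes/no in any case and reports anything else as unparseable.

```python
import re
from typing import Optional


def parse_yes_no(text: str) -> Optional[bool]:
    """Map a free-form LLM grade to True ('yes'), False ('no') or None (unparseable)."""
    # Optional leading whitespace/quotes, then a whole-word yes or no, case-insensitive
    match = re.match(r"\s*[\"'`]*(yes|no)\b", text, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


print(parse_yes_no("Yes."))   # True
print(parse_yes_no("'no'"))   # False
print(parse_yes_no("nope"))   # None
```

In `grade_documents`, a check like `parse_yes_no(score) is True` could then replace the string comparison, with `None` treated as "not relevant" or retried.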

# Generating answer

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def rag_chain(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks.
        Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know.
        Use three sentences maximum and keep the answer concise <|eot_id|><|start_header_id|>user<|end_header_id|>
        Question: {question}
        Context: {context}
        Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["question", "context"],
    )
    llm = ChatOllama(model=local_llm, temperature=0)
    # Chain
    rag_chain = prompt | llm | StrOutputParser()
    return rag_chain

# Run
# question = "agent memory"
# docs = retriever.invoke(question)
# generation = rag_chain(local_llm).invoke({"context": format_docs(docs), "question": question})
# print(generation)

# Hallucination check (not used)

The hallucination check is not used for now.

In [None]:
def hallucination_grader(local_llm):
    prompt = PromptTemplate(
        template=""" <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether
        an answer is grounded in / supported by a set of facts. Give a binary score 'yes' or 'no' (both in lower case) to indicate
        whether the answer is grounded in / supported by a set of facts. Provide the binary score as a JSON with a
        SINGLE KEY 'score' and NO preamble or explanation. <|eot_id|><|start_header_id|>user<|end_header_id|>
        Here are the facts:
        \n ------- \n
        {documents}
        \n ------- \n
        Here is the answer: {generation}  <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["generation", "documents"],
    )
    llm = ChatOllama(model=local_llm, format="json", temperature=0)
    hallucination_grader = prompt | llm | JsonOutputParser()
    return hallucination_grader

#hallucination_grader(local_llm).invoke({"documents": docs, "generation": generation})

# Answer check

In [None]:
def answer_grader(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether an
        answer is useful to resolve a question. Give a binary score 'yes' or 'no' (both in lower case) to indicate whether the answer is
        useful to resolve a question. Provide the binary score as a JSON with a SINGLE KEY 'score' and NO preamble or explanation.
         <|eot_id|><|start_header_id|>user<|end_header_id|> Here is the answer:
        \n ------- \n
        {generation}
        \n ------- \n
        Here is the question: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["generation", "question"],
    )
    llm = ChatOllama(model=local_llm, format="json", temperature=0)
    answer_grader = prompt | llm | JsonOutputParser()
    return answer_grader

#answer_grader(local_llm).invoke({"question": question, "generation": generation})

# Routing (not used)

Routing is not applied for now.

In [None]:
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

def question_router(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an expert at routing a
        user question to a vectorstore or web search. Use the vectorstore for questions on LLM agents,
        prompt engineering, and adversarial attacks. You do not need to be stringent with the keywords
        in the question related to these topics. Otherwise, use web-search. Give a binary choice 'web_search'
        or 'vectorstore' based on the question. Return a JSON with a single key 'datasource' and
        no preamble or explanation. Question to route: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["question"],
    )
    llm = ChatOllama(model=local_llm, format="json", temperature=0)
    question_router = prompt | llm | JsonOutputParser()
    return question_router
    
# question = "llm agent memory"
# docs = retriever.invoke(question)
# doc_txt = docs[1].page_content
# print(question_router(local_llm).invoke({"question": question}))

# Document check (Entailment)

In [None]:
def entailment_checker(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You have to perform a textual entailment task
        between two documents. Answer with one of three categories: 'contradiction', 'neutral' or 'entailment' (all in lower case).
        Two documents are in 'contradiction' if they contain contradicting statements.
        There is an 'entailment' between the first document and the second document if what is stated in the second document can likely be deduced from the first document.
        Two documents are 'neutral' if they are neither in contradiction nor in entailment.
        Provide only the category, NOTHING ELSE. NO PREAMBLE. NO EXPLANATION.
         <|eot_id|><|start_header_id|>user<|end_header_id|> First document:
        {first_doc}
        \n ------- \n
        Second document:
        \n ------- \n
        {second_doc}
        \n ------- \n
        <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["first_doc", "second_doc"],
    )
    llm = ChatOllama(model=local_llm, temperature=0)
    entailment_checker = prompt | llm | StrOutputParser()
    return entailment_checker
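The `entailment_filter` node of the aspect agents compares each retrieved document against every KBT document and then combines the resulting labels with either a skeptical or a credulous strategy. That decision rule can be isolated as a pure function. This is a sketch of the intended semantics as we read them (skeptical: any contradiction rejects the document; credulous: any entailment accepts it; a document that is neutral against every KBT document is kept only when `neutral_acceptance` is set), not a copy of the notebook code; `keep_document` is a hypothetical name.

```python
from typing import Iterable


def keep_document(labels: Iterable[str], skeptical: bool, neutral_acceptance: bool) -> bool:
    """Decide whether a retrieved document is kept, given its labels against each KBT document."""
    entailed = False
    all_neutral = True
    for label in labels:
        if label != "neutral":
            all_neutral = False
        if label == "contradiction" and skeptical:
            return False  # skeptical: one contradicting KBT document rejects the document
        if label == "entailment":
            if not skeptical:
                return True  # credulous: one entailing KBT document accepts the document
            entailed = True
    # No early decision: keep it if some KBT document entails it (skeptical, no contradiction),
    # or if every comparison was neutral and neutral documents are accepted
    return entailed or (all_neutral and neutral_acceptance)


print(keep_document(["neutral", "entailment"], skeptical=True, neutral_acceptance=False))  # True
print(keep_document(["entailment", "contradiction"], skeptical=True, neutral_acceptance=True))  # False
```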

# Debiasing

https://learnprompting.org/docs/reliability/debiasing

In [None]:
def debiasing_answer(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> The user will pass you text containing biases.
        Your role is to give a debiased version of the text. Give ONLY the debiased text, NO PREAMBLE, NO EXPLANATIONS.
        Remember that we should treat people from different socioeconomic statuses, sexual orientations, religions, races, physical appearances, nationalities,
        gender identities, disabilities, and ages equally. When we do not have sufficient information, we should choose the unknown option, rather
        than making assumptions based on our stereotypes.
         <|eot_id|><|start_header_id|>user<|end_header_id|> 
        {text}
        <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["text"],
    )
    llm = ChatOllama(model=local_llm, temperature=0)
    debiasing = prompt | llm | StrOutputParser()
    return debiasing

# Hate speech detection

https://arxiv.org/html/2401.03346v1/#S4

In [None]:
def hate_speech_detection(local_llm):
    prompt = PromptTemplate(
        template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant who has to detect the presence of hate speech.
            Hate speech is speech that attacks a person or group based on attributes such as race, religion, ethnic origin, national origin, sex, disability, sexual orientation, or gender identity.
            You have to answer "yes" if it contains hate speech, or "no" if it doesn't contain hate speech. NO PREAMBLE, NO EXPLANATIONS.
            <|eot_id|><|start_header_id|>user<|end_header_id|>
            Do you think this document contains hate speech? document: {document}.
            <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
        input_variables=["document"],
    )
    llm = ChatOllama(model=local_llm, temperature=0)
    hate_speech_detection = prompt | llm | StrOutputParser()
    return hate_speech_detection

# Aspect agents

refs:
- https://www.langchain.com/langgraph
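In the `GraphState` of the next cell, `answers_agent` is declared as `Annotated[List[str], operator.add]`: LangGraph uses the second `Annotated` argument as a reducer, so a node returning `{"answers_agent": [x]}` appends to the list instead of overwriting it, while keys without a reducer (such as `my_answer`) are simply replaced. This is why `generate` writes only `my_answer` when the fairness check is on, leaving `debiasing` (or the unchanged answer) to decide what gets appended. The sketch below is a simplified model of that merge rule, written for illustration; the helper `apply_update` is hypothetical and ignores LangGraph internals such as channels and parallel supersteps.

```python
import operator
from typing import Any, Callable


def apply_update(state: dict, update: dict, reducers: dict[str, Callable[[Any, Any], Any]]) -> dict:
    """Merge a node's partial update: reduce keys that have a reducer, overwrite the others."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)  # e.g. list concatenation
        else:
            merged[key] = value
    return merged


reducers = {"answers_agent": operator.add}
state = {"query": "q", "answers_agent": []}
state = apply_update(state, {"my_answer": "draft", "answers_agent": ["answer from agent 1"]}, reducers)
state = apply_update(state, {"answers_agent": ["answer from agent 2"]}, reducers)
print(state["answers_agent"])  # ['answer from agent 1', 'answer from agent 2']
```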

In [None]:
from pprint import pprint
from typing import List, Annotated
import operator
import functools

from langchain_core.documents import Document
from typing_extensions import TypedDict

from langgraph.graph import END, StateGraph, START

### State
class GraphState(TypedDict):
    """
    Represents the state of graph of aspect agents.
    """
    
    query: str
    aspect_id: int
    answers_agent: Annotated[List[str], operator.add]  # reducer: updates are appended, not overwritten
    my_answer: str
    web_search: str  # "Yes" / "No"
    documents: List[Document]
    documents_kbt: List[Document]


def retrieve(state,verbose,retriever,retrievers_KBT):
    """
    Retrieve documents from vectorstore and from KBT

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    if verbose: 
        print("---RETRIEVE---")
        print(f"State: {state}")
        
    query = state["query"]
    aspect_id = state["aspect_id"]

    # Retrieval
    documents = retriever.invoke(query)
    documents_kbt = retrievers_KBT[aspect_id].invoke(query)
    
    #pprint(f"Documents retrieved: {documents}")
    #pprint(f"Documents KBT retrieved: {documents_kbt}")
    
    return {"documents": documents, "documents_kbt": documents_kbt, "query": query}


def generate(state,verbose,llm,fairness):
    """
    Generate answer using RAG on retrieved documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): my_answer, the LLM generation; it is also appended to answers_agent unless fairness is enabled (then the bias check decides)
    """
    if verbose:
        print("---GENERATE---")
        print(f"State: {state}")
    query = state["query"]
    documents = state["documents"]

    # RAG generation
    generation = rag_chain(llm).invoke({"context": format_docs(documents), "question": query})
    if fairness:
        return {"documents": documents, "query": query, "my_answer": generation}
    return {"documents": documents, "query": query, "my_answer": generation, "answers_agent": [generation]}


def grade_documents(state,verbose,llm):
    """
    Determines whether the retrieved documents are relevant to the question
    If any document is not relevant, we will set a flag to run web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Filtered out irrelevant documents and updated web_search state
    """
    if verbose:
        print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
        print(f"State: {state}")
    query = state["query"]
    documents = state["documents"]

    # Score each doc
    grader = retrieval_grader(llm)  # build the chain once, not once per document
    filtered_docs = []
    web_search = "No"
    for d in documents:
        score = grader.invoke({"question": query, "document": d.page_content})
        if score.strip().lower().rstrip(".") == "yes":
            # Document relevant
            if verbose: print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            # Document not relevant: drop it and flag that a web search is needed
            if verbose: print("---GRADE: DOCUMENT NOT RELEVANT---")
            web_search = "Yes"
    return {"documents": filtered_docs, "query": query, "web_search": web_search}


def web_search(state,verbose):
    """
    Web search based on the query

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to documents
    """
    if verbose:
        print("---WEB SEARCH---")
        print(f"State: {state}")
    query = state["query"]
    documents = state["documents"]

    # Web search
    docs = web_search_tool.invoke({"query": query})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=250, chunk_overlap=0
    )

    doc_splits = text_splitter.split_documents([web_results])
    for doc in doc_splits:
        if documents is None:
            documents = [doc]
        else:
            documents.append(doc)
    return {"documents": documents, "query": query}


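def hate_speech_filter(state,verbose,llm):
    """
    Drop the documents that the LLM flags as containing hate speech

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Documents without hate speech
    """
    if verbose:
        print("---HATE SPEECH FILTER---")
        print(f"State: {state}")
    documents = state["documents"]

    # Score each doc
    detector = hate_speech_detection(llm)  # build the chain once, not once per document
    filtered_docs = []
    for d in documents:
        score = detector.invoke({"document": d.page_content})
        if score.strip().lower().rstrip(".") == "no":
            if verbose: print("---DOCUMENT ACCEPTED---")
            filtered_docs.append(d)

    return {"documents": filtered_docs}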


def entailment_filter(state,BART_model,strategy_entailment,neutral_acceptance,verbose,llm):
    """
    Filter the retrieved documents by textual entailment against the KBT documents

    Each KBT document is a premise, each retrieved document a hypothesis.
    - Skeptical strategy (strategy_entailment=True): the document is rejected as soon as a
      KBT document contradicts it, and kept if at least one KBT document entails it.
    - Credulous strategy (strategy_entailment=False): the document is kept as soon as a
      KBT document entails it.
    In both cases, a document that is neutral w.r.t. every KBT document is kept only if
    neutral_acceptance is True.

    Args:
        state (dict): The current graph state
        BART_model (bool): use the BART MNLI model instead of the LLM entailment checker

    Returns:
        state (dict): Documents that passed the entailment check
    """
    if verbose:
        print("---ENTAILMENT FILTER---")
        print(f"State: {state}")
    documents = state["documents"]
    documents_KBT = state["documents_kbt"]
    checker = None if BART_model else entailment_checker(llm)  # build the chain once

    def entailment_label(premise, hypothesis):
        if BART_model:
            return BART_prediction(premise, hypothesis)
        return checker.invoke({"first_doc": premise, "second_doc": hypothesis}).strip().strip("'\".").lower()

    filtered_docs = []
    for d in documents:
        neutral = True  # every comparison so far was 'neutral'
        entailed = False
        contradicted = False
        for d_kbt in documents_KBT:  # compare against the KBT documents, not the retrieved ones
            score = entailment_label(d_kbt.page_content, d.page_content)
            if score != "neutral":
                neutral = False
            if score == "contradiction":
                contradicted = True
                if strategy_entailment:  # Skeptical: one contradiction is enough to reject
                    break
            elif score == "entailment":
                entailed = True
                if not strategy_entailment:  # Credulous: one entailment is enough to accept
                    break
        if strategy_entailment:
            keep = not contradicted and (entailed or (neutral and neutral_acceptance))
        else:
            keep = entailed or (neutral and neutral_acceptance)
        if keep:
            filtered_docs.append(d)
            if verbose: print("---DOCUMENT ENTAILED---")

    return {"documents": filtered_docs}


def debiasing(state,verbose,llm):
    if verbose:
        print("---DEBIASING FILTER---")
        print(f"State: {state}")
        
    answer = state["my_answer"]
    
    unbiased_answer = debiasing_answer(llm).invoke({"text": answer})
    
    return {"answers_agent": [unbiased_answer]}


### Conditional edge

# Build the bias classifier once instead of at every call
bias_classifier = pipeline('text-classification', model=bias_model, tokenizer=bias_model_tokenizer, device=device)  # device: GPU index, or -1 for CPU

def bias_detection(state,verbose):
    """
    Classify the generated answer as 'Biased' or 'Non-biased' (used as a conditional edge)
    """
    if verbose:
        print("---BIAS DETECTION---")
        print(f"State: {state}")

    answer = state["my_answer"]
    prediction = bias_classifier(answer)
    label = prediction[0]['label']  # 'Biased' or 'Non-biased'
    if verbose: print(label)

    return label

    
#Not used
def route_question(state,verbose,llm):
    """
    Route question to web search or RAG.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """
    if verbose:
        print("---ROUTE QUESTION---")
        print(f"State: {state}")
    query = state["query"]
    #print(queries)
    source = question_router(llm).invoke({"question": query})
    #print(source)
    #print(source["datasource"])
    if source["datasource"] == "web_search":
        if verbose: print("---ROUTE QUESTION TO WEB SEARCH---")
        return "websearch"
    elif source["datasource"] == "vectorstore":
        if verbose: print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"


def decide_to_generate(state,verbose):
    """
    Determines whether to generate an answer, or add web search

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """
    if verbose:
        print("---ASSESS GRADED DOCUMENTS---")
        print(f"State: {state}")
    web_search = state["web_search"]

    if web_search == "Yes":
        # At least one document was filtered out as not relevant:
        # complement the remaining ones with a web search
        if verbose: print(
                "---DECISION: SOME DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---"
            )
        return "websearch"
    else:
        # We have relevant documents, so generate answer
        if verbose: print("---DECISION: RELEVANT---")
        return "relevant"


# Not used
def grade_generation_v_documents_and_question(state,verbose,llm):
    """
    Determines whether the generation is grounded in the document and answers question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """
    
    if verbose:
        print("---CHECK HALLUCINATIONS---")
        print(f"State: {state}")
    query = state["query"]
    documents = state["documents"]
    my_answer = state["my_answer"]

    score = hallucination_grader(llm).invoke(
        {"documents": documents, "generation": my_answer} #answers_agent[0]
    )
    if verbose: print(f"score: {score}")
    grade = score["score"]

    # Check hallucination
    if grade == "yes":
        if verbose: print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        if verbose: print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader(llm).invoke({"question": query, "generation": my_answer})
        grade = score["score"]
        if grade == "yes":
            if verbose: print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            if verbose: print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        if verbose: pprint("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"

**Building graph with edges**
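The next cell wires the optional filters (`hate_speech_filter`, `entailment_filter`, `debiasing_filter`) into a single chain whose shape depends on the `configs` flags. As a sanity check of that wiring, the hypothetical helper below, written for illustration only, lists the nodes one run would visit for a given combination of flags and runtime decisions (whether web search is triggered, whether the answer is classified as biased).

```python
def node_path(safeness: bool, trustworthiness: bool, needs_web_search: bool,
              fairness: bool = False, biased: bool = False) -> list[str]:
    """Sequence of nodes one run visits, following the edge rules of workflow_aspect_agent."""
    path = ["retrieve", "grade_documents"]
    if needs_web_search:          # decide_to_generate returned "websearch"
        path.append("websearch")
    if safeness:                  # hate speech filter comes first when enabled
        path.append("hate_speech_filter")
    if trustworthiness:           # then the entailment filter
        path.append("entailment_filter")
    path.append("generate")
    if fairness and biased:       # bias_detection returned "Biased"
        path.append("debiasing_filter")
    return path                   # every path ends at END


print(node_path(safeness=True, trustworthiness=True, needs_web_search=True))
# ['retrieve', 'grade_documents', 'websearch', 'hate_speech_filter', 'entailment_filter', 'generate']
```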

In [None]:
# Conditional workflow
def workflow_aspect_agent(configs):
    # Build graph
    workflow = StateGraph(GraphState)

    # Define the nodes
    workflow.add_node("websearch", functools.partial(web_search, verbose=configs.verbose))  # web search
    workflow.add_node("retrieve", functools.partial(retrieve, verbose=configs.verbose, retriever=configs.retriever, retrievers_KBT=configs.retrievers_KBT))  # retrieve
    workflow.add_node("grade_documents", functools.partial(grade_documents, verbose=configs.verbose, llm=configs.local_llm))  # grade documents
    workflow.add_node("generate", functools.partial(generate, verbose=configs.verbose, llm=configs.local_llm, fairness=configs.fairness))  # generate
    if configs.safeness:
        workflow.add_node("hate_speech_filter", functools.partial(hate_speech_filter, verbose=configs.verbose, llm=configs.local_llm))
    if configs.trustworthiness:
        workflow.add_node("entailment_filter", functools.partial(entailment_filter, BART_model=configs.BART_model, strategy_entailment=configs.strategy_entailment, neutral_acceptance=configs.neutral_acceptance, verbose=configs.verbose, llm=configs.local_llm))  # entailment
    if configs.fairness:  
        workflow.add_node("debiasing_filter", functools.partial(debiasing, verbose=configs.verbose, llm=configs.local_llm))

    # Question routing is not used: every query goes straight to "retrieve"
    """
    workflow.add_conditional_edges(
        START,
        route_question,
        {
            "websearch": "websearch",  # if the router answers websearch, go to the websearch node
            "vectorstore": "retrieve",  # if the router answers vectorstore, go to the retrieve node
        },
    )
    """
    
    workflow.add_edge(START, "retrieve")
    workflow.add_edge("retrieve", "grade_documents")
    
    workflow.add_conditional_edges(
        "grade_documents",
        functools.partial(decide_to_generate, verbose=configs.verbose),
        {
            "websearch": "websearch",
            "relevant": "hate_speech_filter" if configs.safeness else "entailment_filter" if configs.trustworthiness else "generate"
        },
    )
    
    if configs.safeness:
        workflow.add_edge("websearch", "hate_speech_filter")
        if configs.trustworthiness:
            workflow.add_edge("hate_speech_filter", "entailment_filter")
            workflow.add_edge("entailment_filter", "generate")
        else:
            workflow.add_edge("hate_speech_filter","generate")
    elif configs.trustworthiness:
        workflow.add_edge("websearch", "entailment_filter")
        workflow.add_edge("entailment_filter", "generate")
    else:
        workflow.add_edge("websearch", "generate")
    
    if configs.fairness:
        workflow.add_conditional_edges(
            "generate",
            functools.partial(bias_detection, verbose=configs.verbose),
            {
                "Biased": "debiasing_filter",
                "Non-biased": END,
            },
        )
        workflow.add_edge("debiasing_filter", END)
    else:
        workflow.add_edge("generate", END)

    # The hallucination check is disabled
    """
    workflow.add_conditional_edges(
        "generate",
        grade_generation_v_documents_and_question,
        {
            "not supported": "generate",
            "useful": END,
            "not useful": "websearch",
        },
    )
    """
    workflow_compiled = workflow.compile()
    return workflow_compiled
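The if/elif wiring above always produces a linear chain: the optional filters sit between the retrieval/web-search step and `generate` in a fixed order (hate speech first, then entailment), and the `"relevant"` route of `grade_documents` targets the first node of that chain. The sketch below derives the same chain from the flags. `filter_chain` and `static_edges` are hypothetical helpers, not part of LangGraph or of the notebook.

```python
from typing import List, Tuple


def filter_chain(safeness: bool, trustworthiness: bool) -> List[str]:
    """Nodes that follow document grading, up to and including "generate"."""
    chain: List[str] = []
    if safeness:
        chain.append("hate_speech_filter")   # the hate speech filter always comes first
    if trustworthiness:
        chain.append("entailment_filter")    # then the KBT entailment check
    chain.append("generate")
    return chain


def static_edges(safeness: bool, trustworthiness: bool) -> List[Tuple[str, str]]:
    """Unconditional edges from "websearch" through the enabled filters to "generate"."""
    nodes = ["websearch"] + filter_chain(safeness, trustworthiness)
    return list(zip(nodes, nodes[1:]))


# The "relevant" branch of grade_documents targets the first node of the chain.
print(filter_chain(True, False)[0])   # hate_speech_filter
print(static_edges(False, True))      # [('websearch', 'entailment_filter'), ('entailment_filter', 'generate')]
```

Inside `workflow_aspect_agent`, the nested branches could then become a loop of `workflow.add_edge(src, dst)` over `static_edges(configs.safeness, configs.trustworthiness)`, with `filter_chain(...)[0]` as the `"relevant"` target; the fairness branch after `generate` stays as it is.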

In [None]:
from IPython.display import Image, display

display(Image(workflow_aspect_agent(configs).get_graph().draw_mermaid_png()))

# Master agent

In [None]:
from typing import Annotated
import operator
from langgraph.constants import Send


### Super Graph State
class SuperGraphState(TypedDict):
    """
    Represents the state of our super-graph.
    """
    
    question: str
    aspects: List[str]
    queries: List[str]
    answers_agent: Annotated[List[str], operator.add]
    final_answer: str


def generate_queries(state, verbose, llm):
    """
    Generate multi-aspect queries from the starting question.

    Args:
        state (dict): The current graph state
        verbose (bool): Whether to print progress information
        llm (str): Name of the local model used by the query generator

    Returns:
        state (dict): New key added to state, queries, that contains the multi-aspect queries
    """
    if verbose: 
        print("---GENERATE MULTI-ASPECTS QUERIES---")
        print(f"State: {state}")
    question = state["question"]
    aspects = state["aspects"]

    generation = query_generator(llm).invoke({"original_query": question, "aspects": aspects})
    #print(list(generation.values()))
    return {"queries": eval(generation)}

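def send_queries(state, verbose):
    """Fan out one aspect-agent run per generated query (the map step)."""
    if verbose:
        print("---SEND MULTI-ASPECTS QUERIES---")
        print(f"State: {state}")
    # enumerate gives each query a distinct aspect_id, even when two queries are identical
    return [Send("aspect_agent_node", {"query": q, "aspect_id": i}) for i, q in enumerate(state["queries"])]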

def organize_outputs(state,verbose,llm):
    """
    Organize the outputs of the agents.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, final_answer, that contains the final answer to give to the user
    """
    if verbose:
        print("---ORGANIZE OUTPUTS---")
        print(f"State: {state}")
    answers_agent = state["answers_agent"]

    final_output = final_answer(llm).invoke({"answers": answers_agent})
    return {"final_answer": final_output}
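The helpers in this cell perform three small data transformations: parse the generator's reply (a Python list literal, judging by the `'queries': ['Symptoms of COVID19']` state in the run below), fan out one payload per query, and merge the agents' answers through the `operator.add` reducer declared on `answers_agent`. The plain-Python sketch below walks through those steps without LangGraph; `parse_queries`, `fan_out_payloads` and `merge_answers` are hypothetical names, and the strings are illustrative. It uses `ast.literal_eval`, which only accepts Python literals, as a safer alternative to `eval` on model output.

```python
import ast
import operator
from functools import reduce
from typing import Dict, List, Union


def parse_queries(generation: str) -> List[str]:
    """Parse a reply such as "['q1', 'q2']" into a list of strings.

    ast.literal_eval only accepts Python literals, so unlike eval() it cannot
    run arbitrary code that might appear in the model output.
    """
    value = ast.literal_eval(generation.strip())
    if not isinstance(value, list) or not all(isinstance(q, str) for q in value):
        raise ValueError(f"Expected a list of strings, got: {generation!r}")
    return value


def fan_out_payloads(queries: List[str]) -> List[Dict[str, Union[str, int]]]:
    """One input state per query; enumerate gives distinct ids even for duplicates."""
    return [{"query": q, "aspect_id": i} for i, q in enumerate(queries)]


def merge_answers(updates: List[List[str]]) -> List[str]:
    """Combine per-agent updates the way an operator.add reducer concatenates lists."""
    return reduce(operator.add, updates, [])


queries = parse_queries("['Symptoms of COVID19', 'Symptoms of COVID19']")
print([queries.index(q) for q in queries])                  # [0, 0]: index() repeats ids
print([p["aspect_id"] for p in fan_out_payloads(queries)])  # [0, 1]
print(merge_answers([["answer 0"], ["answer 1"]]))          # ['answer 0', 'answer 1']
```

Two consequences of this design: a reply that wraps the list in extra prose makes `parse_queries` raise instead of silently executing anything, and each agent must return `answers_agent` as a list, because `[] + "text"` raises `TypeError`.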

In [None]:
def master_flow(configs):
    master_flow = StateGraph(SuperGraphState)

    # Define the nodes
    master_flow.add_node("generate_queries", functools.partial(generate_queries, verbose=configs.verbose, llm=configs.local_llm))
    master_flow.add_node("organize_queries", functools.partial(organize_outputs, verbose=configs.verbose, llm=configs.local_llm))
    master_flow.add_node("aspect_agent_node",workflow_aspect_agent(configs))

    # Build graph
    master_flow.add_edge(START, "generate_queries")
    master_flow.add_conditional_edges("generate_queries", functools.partial(send_queries, verbose=configs.verbose), ["aspect_agent_node"])
    master_flow.add_edge("aspect_agent_node", "organize_queries")
    master_flow.add_edge("organize_queries", END)

    master_compiled = master_flow.compile()
    return master_compiled
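`master_flow` follows the map-reduce pattern from the LangGraph how-to linked at the top of the notebook: one node generates queries, `Send` fans out one aspect-agent run per query, and a final node reduces the collected answers. As a rough mental model only, the sketch below expresses the same shape in plain Python with a thread pool; `map_reduce` and the lambdas are hypothetical stand-ins for the real nodes. It ignores LangGraph's state handling and tracing, and the order in which LangGraph merges parallel updates is not guaranteed to match `Executor.map`'s input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce(question: str,
               generate: Callable[[str], List[str]],
               agent: Callable[[str], str],
               organize: Callable[[List[str]], str]) -> str:
    """Plain-Python analogue of master_flow: one agent call per query, then a reduce step."""
    queries = generate(question)                  # like the "generate_queries" node
    with ThreadPoolExecutor() as pool:            # like parallel "aspect_agent_node" runs
        answers = list(pool.map(agent, queries))  # Executor.map keeps input order
    return organize(answers)                      # like the "organize_queries" node


result = map_reduce(
    "What about covid19?",
    generate=lambda q: [f"{q} (health)", f"{q} (economy)"],
    agent=lambda q: f"answer to {q}",
    organize=" | ".join,
)
print(result)
```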

In [None]:
from IPython.display import Image, display

# Setting xray to 1 shows the internal structure of the nested graph
display(Image(master_flow(configs).get_graph(xray=1).draw_mermaid_png()))

# Configuration and app-launching

Vectorstore configuration

In [None]:
urls = [
    "https://bmjgroup.com/celebrity-tweets-likely-shaped-us-negative-public-opinion-of-covid-19-pandemic/",
    "https://eu.usatoday.com/story/news/health/2024/07/26/covid-vaccine-us-china-propaganda/74555829007/",
    "https://www.theguardian.com/society/2023/jun/13/quarter-in-uk-believe-covid-was-a-hoax-poll-on-conspiracy-theories-finds",
]
# Requires PDF loading:
# "https://www.ourcommons.ca/Content/Committee/441/HESA/Brief/BR11822476/br-external/LutzMitchell-e.pdf"
retriever = create_vectorstore(urls)

KBT configuration

In [None]:
# Aspects and KBT configuration
# aspects and urls_list must have the same length: urls_list[i] holds the KBT URLs for aspects[i].
aspects = ["Health"]  # Possible aspects: Health, Economy
urls_list = [
    ["https://www.who.int/emergencies/diseases/novel-coronavirus-2019/covid-19-vaccines",
     "https://www.who.int/news-room/questions-and-answers/item/vaccines-and-immunization-vaccine-safety",
     "https://www.bbc.com/news/stories-52731624",
     "https://www.bbc.com/news/technology-52903680"],
    # Economy:
    # ["https://www.worldbank.org/en/publication/wdr2022/brief/chapter-1-introduction-the-economic-impacts-of-the-covid-19-crisis"],
]
retrievers_KBT = create_KBTs(aspects, urls_list)
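The comment above requires `aspects` and `urls_list` to be aligned index by index, so a cheap check before building the KBT retrievers can catch shape mistakes early. One easy mistake is a stray trailing comma after the closing bracket, which silently turns `urls_list` into a one-element tuple wrapping the real list. `check_kbt_config` is a hypothetical helper, not part of the notebook.

```python
from typing import List, Sequence


def check_kbt_config(aspects: List[str], urls_list: Sequence[List[str]]) -> None:
    """Raise ValueError unless urls_list[i] is a non-empty list of URL strings for aspects[i]."""
    if len(aspects) != len(urls_list):
        raise ValueError(f"{len(aspects)} aspects but {len(urls_list)} URL lists")
    for aspect, urls in zip(aspects, urls_list):
        if not isinstance(urls, list) or not urls or not all(isinstance(u, str) for u in urls):
            raise ValueError(f"Aspect {aspect!r} needs a non-empty list of URL strings, got {urls!r}")


check_kbt_config(["Health"], [["https://www.who.int/"]])  # passes silently
```

Calling `check_kbt_config(aspects, urls_list)` right before `create_KBTs(...)` turns a misaligned configuration into an immediate, readable error instead of a confusing failure inside the loaders.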

In [None]:
class Config(object):
    def __init__(self,retriever,retrievers_KBT,aspects):
        self.local_llm = "llama3.1"  # alternative: "llama3.1:70b"
        
        self.retriever = retriever
        self.retrievers_KBT = retrievers_KBT
        
        self.aspects = aspects
        
        # Set to True to print every step of the process
        self.verbose = True

        # Controlling properties
        self.safeness = True  # add the hate speech detection module
        self.trustworthiness = True  # add the entailment module with KBT
        self.fairness = True  # add the debiasing module

        # Controlling entailment
        # Entailment strategy: False = "Credulous", True = "Skeptical"
        self.strategy_entailment = True

        # How to handle documents that are neutral with respect to every KBT document:
        # True = accept them, False = discard them
        self.neutral_acceptance = True

        # True: use the BART model for entailment, False: use the LLM
        self.BART_model = True

configs = Config(retriever,retrievers_KBT,aspects)
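The Config comments name two entailment strategies and a neutral-acceptance switch, but the rule that combines them lives in `entailment_filter`, which is defined earlier in the notebook and not shown in this section. The sketch below is one plausible reading, stated as an assumption rather than the author's implementation: each retrieved document gets one NLI label per KBT document, a credulous strategy keeps the document if at least one KBT document entails it, a skeptical strategy drops it if any KBT document contradicts it, and `neutral_acceptance` decides the all-neutral case. `accept_document` is a hypothetical name.

```python
from typing import List


def accept_document(labels: List[str], skeptical: bool, neutral_acceptance: bool) -> bool:
    """Hypothetical aggregation of one document's NLI labels against every KBT document.

    labels holds one of "entailment", "neutral", "contradiction" per KBT document.
    """
    if all(label == "neutral" for label in labels):
        return neutral_acceptance               # no KBT document takes a position
    if skeptical:
        return "contradiction" not in labels    # any contradiction is disqualifying
    return "entailment" in labels               # one supporting document is enough


mixed = ["entailment", "neutral", "contradiction"]
print(accept_document(mixed, skeptical=False, neutral_acceptance=True))  # True
print(accept_document(mixed, skeptical=True, neutral_acceptance=True))   # False
```

Under this reading, the two strategies disagree exactly on documents that some KBT sources support and others contradict, which is where the choice of `strategy_entailment` matters most.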

Starting language model

In [None]:
start_ollama()
pull_model(configs.local_llm)
start_model(configs.local_llm)
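`start_ollama()` launches `ollama serve` in a background thread and returns immediately, so the pull that follows can start before the server is listening. A small readiness poll avoids that race. The sketch assumes Ollama's default address `http://127.0.0.1:11434` (adjust it if `OLLAMA_HOST` is set); `wait_for_ollama` is a hypothetical helper built only on the standard library.

```python
import time
import urllib.request


def wait_for_ollama(url: str = "http://127.0.0.1:11434", timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Poll the Ollama HTTP endpoint until it answers with 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError (connection refused, HTTP errors) is a subclass of OSError,
            # as are socket timeouts: the server is not ready yet.
            pass
        time.sleep(interval)
    return False
```

In the cell above, calling `wait_for_ollama()` between `start_ollama()` and `pull_model(configs.local_llm)`, and raising an error when it returns `False`, makes the pull wait for the server instead of failing on a connection error.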

In [66]:
inputs = {"question": "What about covid19?", "aspects": configs.aspects}

for output in master_flow(configs).stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["final_answer"])
answer = value["final_answer"]

---GENERATE MULTI-ASPECTS QUERIES---
State: {'question': 'What about covid19?', 'aspects': ['Health'], 'queries': None, 'answers_agent': [], 'final_answer': None}


---SEND MULTI-ASPECTS QUERIES---
State: {'queries': ['Symptoms of COVID19'], 'question': 'What about covid19?', 'aspects': ['Health'], 'answers_agent': []}
'Finished running: generate_queries:'
---RETRIEVE---
State: {'query': 'Symptoms of COVID19', 'aspect_id': 0, 'answers_agent': [], 'my_answer': None, 'web_search': None, 'documents': None, 'documents_kbt': None}
---CHECK DOCUMENT RELEVANCE TO QUESTION---
State: {'query': 'Symptoms of COVID19', 'aspect_id': 0, 'answers_agent': [], 'my_answer': None, 'web_search': None, 'documents': [Document(metadata={'description': 'Posts by politicians and news anchors had greatest impact, analysis suggests Data might be used to bolster public health messaging and counter misinformation Tweets by people in the public eye likely increasingly shaped negative public opinion of the COVID-19 pandemic as it progressed in the US, suggests an analysis of sentiments ex