## Graph RAG using Groq

In this notebook I use CrewAI to show how you can do RAG over a GraphQL representation of your Obsidian notes. The example runs inference on Groq for faster response times.

```bash
pip install crewai langchain langchain-groq "gql[requests]" python-dotenv
```


I expose the notes as a local GraphQL server using this [plugin](https://github.com/TwIStOy/obsidian-local-graphql).
Once it is running, you can explore the schema in the Apollo UI at [http://localhost:28123/](http://localhost:28123/).
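Before wiring up any agents, it can help to confirm that the server answers a query directly. The sketch below is a minimal, standard-library-only probe. It assumes the plugin accepts standard GraphQL-over-HTTP POST requests at `http://localhost:28123/` and exposes the `vault { fuzzySearch(query: ...) }` field used by the tool later in this notebook; `build_search_payload` and `probe_server` are hypothetical helpers, not part of the plugin.

```python
import json
import urllib.request

# Address taken from the plugin link above (assumed default port)
GRAPHQL_URL = "http://localhost:28123/"

# Same shape as the notebook's fuzzySearch query, trimmed to name and score
SEARCH_QUERY = """
query ExampleQuery($query: String!) {
  vault {
    fuzzySearch(query: $query) {
      file { name }
      score
    }
  }
}
"""


def build_search_payload(term: str) -> bytes:
    """Encode a standard GraphQL-over-HTTP request body (query + variables)."""
    body = {"query": SEARCH_QUERY, "variables": {"query": term}}
    return json.dumps(body).encode("utf-8")


def probe_server(term: str, url: str = GRAPHQL_URL, timeout: float = 5.0) -> dict:
    """POST the search query to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_search_payload(term),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Obsidian open and the plugin enabled, `probe_server("bias")["data"]["vault"]["fuzzySearch"]` should list the matching files and their scores; a connection error here means the agents will not be able to reach the vault either.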

```python
import os
import pprint as pp

from langchain.tools import tool
from langchain_groq import ChatGroq
from crewai import Agent, Task, Crew, Process

from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

from dotenv import load_dotenv, find_dotenv

# Load GROQ_API_KEY from a local .env file
load_dotenv(find_dotenv('.env'))
```

```python
class ObsidianGraphTool:
    @tool("get_documents")
    def get_documents(query: str) -> str:
        """Fuzzy-search the Obsidian vault over GraphQL and return the top matching notes with their content."""

        def define_query_string(search_type):
            # Build the GraphQL query; search_type selects the vault search field (e.g. fuzzySearch)
            string = f""" \
            query ExampleQuery($query: String!) {{
                vault {{
                {search_type}(query: $query) {{
                    file {{
                    name
                    readContent
                    stat {{
                        size
                    }}
                    }}
                    score
                }}
                }}
            }}
            """
            return gql(string)


        def filter_results(result, min_neg_score, search_type, top_n):
            """
            Keep results scoring above min_neg_score, sort them by score descending (closest to 0 first),
            and format the top_n of them (-1 keeps all) as a single string.
            """
            result_string = ""
            top_items = sorted(
                (item for item in result["vault"][search_type] if item["score"] > min_neg_score),
                key=lambda x: x["score"],
                reverse=True
            )

            # Get top N results
            if top_n != -1:
                top_items = top_items[:top_n]

            for item in top_items:
                file_name = item["file"]["name"]
                score = item["score"]
                read_content = item['file']['readContent']

                file_string = f"File Name: {file_name}\nScore: {score}\nContent:\n{read_content}"
                result_string += file_string + "\n"

            return result_string

        # Configure the client for the plugin's local GraphQL endpoint
        transport = RequestsHTTPTransport(
            url='http://localhost:28123/',  # address from the plugin setup above
            use_json=True,
            headers={'Content-Type': 'application/json'},
            verify=True,
            retries=3,
        )

        client = Client(transport=transport, fetch_schema_from_transport=True)

        SIMPLE_SEARCH_TYPE = "simpleSearch"
        FUZZY_SEARCH_TYPE = "fuzzySearch"
        MIN_NEG_SCORE = -3  # fuzzy scores are negative here; drop matches at or below -3
        TOP_N = 3           # number of notes passed back to the agent

        # simple_search_query = define_query_string(SIMPLE_SEARCH_TYPE)
        fuzzy_search_query = define_query_string(FUZZY_SEARCH_TYPE)

        # Execute the query with variables
        variables = {
            "query": query
        }

        result = client.execute(fuzzy_search_query, variable_values=variables)
        result = filter_results(result, MIN_NEG_SCORE, FUZZY_SEARCH_TYPE, TOP_N)
        return result
```
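In this notebook the fuzzy-search scores are negative, and `filter_results` treats values closer to 0 as better matches: it drops anything at or below `MIN_NEG_SCORE`, sorts the rest in descending order, and keeps the first `TOP_N`. The sketch below replays that logic on a hand-made response so the thresholding can be checked without a running server. `top_matches` is a hypothetical helper, and the file names and scores are invented for illustration.

```python
from typing import Any


def top_matches(result: dict[str, Any], search_type: str,
                min_neg_score: float, top_n: int) -> list[tuple[str, float]]:
    """Mirror filter_results: drop scores <= min_neg_score, sort best-first, cap at top_n (-1 = no cap)."""
    kept = sorted(
        (item for item in result["vault"][search_type] if item["score"] > min_neg_score),
        key=lambda item: item["score"],
        reverse=True,
    )
    if top_n != -1:
        kept = kept[:top_n]
    return [(item["file"]["name"], item["score"]) for item in kept]


# Hypothetical response in the shape client.execute returns (no "data" envelope)
fake_result = {
    "vault": {
        "fuzzySearch": [
            {"file": {"name": "a.md"}, "score": -2.5},
            {"file": {"name": "b.md"}, "score": -0.1},
            {"file": {"name": "c.md"}, "score": -3.0},  # equal to the threshold, so dropped
            {"file": {"name": "d.md"}, "score": -1.2},
            {"file": {"name": "e.md"}, "score": -4.7},  # below the threshold, so dropped
        ]
    }
}

print(top_matches(fake_result, "fuzzySearch", min_neg_score=-3, top_n=3))
# → [('b.md', -0.1), ('d.md', -1.2), ('a.md', -2.5)]
```

Raising `MIN_NEG_SCORE` toward 0 makes retrieval stricter, while `TOP_N = -1` disables the cap and passes every surviving note to the agent.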

## Defining the Crew



```python
def define_crew(question):
    """Set up a crew that extracts a keyword from a question and searches the Obsidian graph with it."""

    # llm = Ollama(model=model_name, base_url=api_base)  # local alternative (needs an Ollama import)
    # This Mixtral model ID may since have been retired by Groq; check Groq's current model list
    llm = ChatGroq(temperature=0, model_name="mixtral-8x7b-32768", groq_api_key=os.environ.get("GROQ_API_KEY"))

    # ------ Agents ------
    # Agent that boils the question down to a single search keyword
    extract_agent = Agent(
        role="Extractor",
        goal="Extract a single entity within the query, and boil it down to a single word to search for.",
        backstory="Skilled in identifying a single key word from complex questions.",
        llm=llm,
        # tools=human_tools,
        verbose=True,
        allow_delegation=False,
    )

    # Agent that searches the graph database with ObsidianGraphTool and answers the question
    search_agent = Agent(
        role="Answerer",
        goal="Search the graph database using the extracted search term and use that information to ANSWER the user's query.",
        backstory="Helpful assistant who uses the get_documents tool to get answers and provides clear explanations",
        llm=llm,
        tools=[ObsidianGraphTool.get_documents],  # the @tool-decorated function, accessed as a class attribute
        verbose=True,
        allow_delegation=False,
        memory=True
    )

    # Answer-only agent; it is not added to the crew below, where search_agent answers instead
    answer_agent = Agent(
        role="Answerer",
        goal="Provide a concise answer using information from the graph database.",
        backstory="Adept at synthesizing information to craft clear, accurate answers.",
        llm=llm,
        verbose=False,
        allow_delegation=False,
    )

    # ------ Tasks ------
    extract_task = Task(
        description=f"Extract a search term from this question: {question}\n\n DO NOT include spaces in your answer. DO NOT try to infer acronyms, just use them as is.",
        expected_output="A single word to search for within a fuzzy search",
        agent=extract_agent,
    )

    # f-string so the question is actually inserted (kickoff() is called without inputs)
    search_task = Task(
        description=f"""
        1. Pass the extracted search term to the provided get_documents tool to search the graph database.
        2. Answer the user's question using the search results.

        Rules:
        - You MUST only use the provided get_documents tool once
        - You MUST use the information from the search as the basis for your answer

        This is the user's question: {question}\n
        """,
        expected_output="An informative answer to the question",
        agent=search_agent,
        context=[extract_task]  # Pass the output of the extract_task as context
    )

    # Create the crew: tasks run in order, and search_task receives extract_task's output
    graph_search_crew = Crew(
        agents=[extract_agent, search_agent],
        tasks=[extract_task, search_task],
        process=Process.sequential,
        manager_llm=llm  # only needed for Process.hierarchical
    )

    return graph_search_crew
```
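With `Process.sequential`, the crew runs `extract_task` first and then `search_task`, which receives the extractor's output because it lists `extract_task` in `context`. The stdlib sketch below models that hand-off with plain functions standing in for the agents. `StubTask`, `run_sequential`, `extract_keyword`, and `search_and_answer` are hypothetical stubs (no LLM or GraphQL calls), so treat this as a simplified mental model of the flow, not as how CrewAI is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StubTask:
    """A task is a function of (question, outputs of its context tasks)."""
    name: str
    run: Callable[[str, list[str]], str]
    context: list["StubTask"] = field(default_factory=list)


def run_sequential(tasks: list[StubTask], question: str) -> str:
    """Run tasks in order, feeding each one the outputs of the tasks in its context."""
    outputs: dict[str, str] = {}
    for task in tasks:
        context_outputs = [outputs[dep.name] for dep in task.context]
        outputs[task.name] = task.run(question, context_outputs)
    return outputs[tasks[-1].name]  # like the crew, return the last task's output


# Hypothetical stand-in for extract_agent: return a single keyword
def extract_keyword(question: str, _context: list[str]) -> str:
    return "bias" if "bias" in question.lower() else question.split()[-1]


# Hypothetical stand-in for search_agent: a fake index replaces get_documents
def search_and_answer(question: str, context: list[str]) -> str:
    term = context[0]
    notes = {"bias": "8. Aequitas.md"}
    return f"Searched for '{term}', found: {notes.get(term, 'nothing')}"


extract = StubTask("extract", extract_keyword)
search = StubTask("search", search_and_answer, context=[extract])

print(run_sequential([extract, search], "How should i measure bias in an ML project?"))
# → Searched for 'bias', found: 8. Aequitas.md
```

The point of the model is that the search step never sees the raw question as its search term; it only sees whatever single word the extraction step produced, which is why the extractor's prompt insists on one word without spaces.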

```python
# Execute the crew
question = "How should i measure bias in an ML project?"

graph_search_crew = define_crew(question)
result = graph_search_crew.kickoff()
pp.pprint(result)
```

> Entering new CrewAgentExecutor chain...
The question is asking about measuring bias in a Machine Learning (ML) project. The key term here is "bias" as it is the main subject of the question. The task is to extract a single word to search for, so the answer should be "bias".

Thought: I now can give a great answer

Final Answer: bias

> Finished chain.


> Entering new CrewAgentExecutor chain...
To answer the user's question, I need to first use the `get_documents` tool to search the graph database for information related to the search term "bias". Then, I will use this information to provide an informative answer to the user's question.

Action: get_documents
Action Input: {"query": "bias"}

File Name: 8. Aequitas.md
Score: -0.14600000000000002
Content:
---
tags:
  - data
  - ml
  - model-bias
  - mlops
  - model-testing
---
> [[Model Evaluation]], [[ML Pipelines]], [[Model Bias (Fairness)]], [[7.  Model Bias]]
### Aeq