## Graph RAG Examples

In this notebook I use CrewAI to showcase how you can do RAG over a GraphQL representation of your Obsidian notes.<br>
To start the Ollama server on macOS, run the following:

- `ollama pull mistral:7b-instruct-v0.2-fp16`
- `OLLAMA_HOST=127.0.0.1:5050 ollama serve`

These settings need to be available as environment variables. You can export them from your shell, but I set them in the code because none of them is sensitive information.

```bash
export OPENAI_API_BASE="http://localhost:5050"
export OPENAI_MODEL_NAME="mistral:7b-instruct-v0.2-fp16"
export OPENAI_API_KEY=""
```
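If you want exported shell values to take precedence over the values hard-coded in the notebook, one option is to apply the notebook values only as defaults. This is a minimal sketch, not part of the original notebook; `apply_defaults` and `DEFAULTS` are hypothetical names, and the values are the ones used in the code cells of this notebook.

```python
import os

# Values copied from this notebook; adjust them if your Ollama server differs.
DEFAULTS = {
    "OPENAI_API_BASE": "http://localhost:5050",
    "OPENAI_MODEL_NAME": "mistral:7b-instruct-v0.2-fp16",
    "OPENAI_API_KEY": "",
}


def apply_defaults(env=os.environ, defaults=DEFAULTS):
    """Fill in missing settings without overwriting values that are already set."""
    for key, value in defaults.items():
        env.setdefault(key, value)
    # Return the effective settings so they can be inspected in a cell.
    return {key: env[key] for key in defaults}
```

Calling `apply_defaults()` in a cell returns the effective settings, so you can check which base URL and model the crew will use.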

I expose the notes as a local GraphQL server using this [plugin](https://github.com/TwIStOy/obsidian-local-graphql).<br>
Access the Apollo client UI at [http://localhost:28123/](http://localhost:28123/).
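The server can also be queried outside the Apollo UI with a plain GraphQL-over-HTTP request. The sketch below uses only the standard library and assumes the `vault`, `fuzzySearch`, `file`, `name` and `score` fields that the tool later in this notebook relies on; `build_payload` and `run_search` are hypothetical helper names, and `run_search` needs Obsidian and the plugin running.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:28123/"  # endpoint mentioned above

# Trimmed-down version of the query used by the tool later in this notebook.
SEARCH_QUERY = """
query ExampleQuery($query: String!) {
  vault {
    fuzzySearch(query: $query) {
      file { name }
      score
    }
  }
}
"""


def build_payload(search_term: str) -> bytes:
    """Encode a GraphQL query and its variables as a JSON request body."""
    body = {"query": SEARCH_QUERY, "variables": {"query": search_term}}
    return json.dumps(body).encode("utf-8")


def run_search(search_term: str, url: str = GRAPHQL_URL) -> dict:
    """POST the query to the local server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(search_term),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server running, `run_search("LoRA")` should return a dictionary with a `data` key on success, which is a quick way to confirm the endpoint works before involving any agents.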

In [1]:
import os 

# import ollama
from langchain.tools import tool
from langchain.agents import load_tools
from langchain_community.llms import Ollama  
from crewai import Agent, Task, Crew, Process

from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

In [2]:
os.environ["OPENAI_API_BASE"] = "http://localhost:5050"
os.environ["OPENAI_MODEL_NAME"] = "mistral:7b-instruct-v0.2-fp16"
os.environ["OPENAI_API_KEY"] = ""

In [3]:
# Check the value is correct
os.getenv("OPENAI_API_BASE")

'http://localhost:5050'

In [4]:
class ObsidianGraphTool:
    @tool("get_documents")
    def get_documents(query: str) -> str:
        """Get documents from the Obsidian graph and parse them."""

        def define_query_string(search_type):
            # Define your query with the variable
            string = f""" \
            query ExampleQuery($query: String!) {{
                vault {{
                {search_type}(query: $query) {{
                    file {{
                    name
                    readContent
                    stat {{
                        size
                    }}
                    }}
                    score
                }}
                }}
            }}
            """
            return gql(string)


        def filter_results(result, min_neg_score, search_type, top_n):
            """
            Filter results based on min_neg_score threshold and sort by score descending (closest to 0)
            """
            result_string = ""
            top_items = sorted(
                (item for item in result["vault"][search_type] if item["score"] > min_neg_score),
                key=lambda x: x["score"], 
                reverse=True
            )

            # Get top N results
            if top_n != -1:
                top_items = top_items[:top_n]

            for item in top_items:
                file_name = item["file"]["name"]
                score = item["score"]
                read_content = item['file']['readContent']

                file_string = f"File Name: {file_name}\nScore: {score}\nContent:\n{read_content}"        
                result_string += file_string + "\n"

            return result_string

        # Configure the client
        transport = RequestsHTTPTransport(
            url='http://localhost:28123/',  # Make sure this URL is correct
            use_json=True,
            headers={'Content-Type': 'application/json'},
            verify=True,
            retries=3,
        )

        client = Client(transport=transport, fetch_schema_from_transport=True)

        SIMPLE_SEARCH_TYPE = "simpleSearch"
        FUZZY_SEARCH_TYPE = "fuzzySearch"
        MIN_NEG_SCORE = -3
        TOP_N = 3

        # simple_search_query = define_query_string(SIMPLE_SEARCH_TYPE)
        fuzzy_search_query = define_query_string(FUZZY_SEARCH_TYPE)

        # Execute the query with variables
        variables = {
            "query": query
        }

        result = client.execute(fuzzy_search_query, variable_values=variables)
        result = filter_results(result, MIN_NEG_SCORE, FUZZY_SEARCH_TYPE, TOP_N)
        return result
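The threshold-and-sort step inside `get_documents` can be tried in isolation, without a running server. This is a small sketch of the same logic; `top_matches` is a hypothetical name, and the sample records are made up to mimic the shape of the plugin's response (scores are negative, and values closer to 0 are better matches).

```python
def top_matches(items, min_score=-3, top_n=3):
    """Keep items scoring above min_score, best (closest to 0) first."""
    kept = [item for item in items if item["score"] > min_score]
    kept.sort(key=lambda item: item["score"], reverse=True)
    # top_n == -1 means "return everything that passed the threshold".
    return kept if top_n == -1 else kept[:top_n]


# Made-up sample data in the shape returned under result["vault"]["fuzzySearch"].
sample = [
    {"file": {"name": "LoRA.md"}, "score": -0.25},
    {"file": {"name": "Quantization.md"}, "score": -0.18},
    {"file": {"name": "Unrelated.md"}, "score": -7.5},
]

names = [item["file"]["name"] for item in top_matches(sample)]
print(names)  # ['Quantization.md', 'LoRA.md']
```

Note that a higher fuzzy score does not guarantee the most relevant note comes first, so `MIN_NEG_SCORE` and `TOP_N` are worth tuning against your own vault.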

## Defining the Crew



In [None]:
human_tools = load_tools(["human"])
human_tools[0]

In [20]:
def define_crew(question):
    """Set up a crew that extracts a search term from a question and searches the graph database with it."""
    
    # Verify environment setup
    model_name = os.getenv("OPENAI_MODEL_NAME")
    api_base = os.getenv("OPENAI_API_BASE")
    if not model_name or not api_base:
        raise EnvironmentError("Missing OPENAI_MODEL_NAME or OPENAI_API_BASE environment variables")

    llm = Ollama(model=model_name, base_url=api_base)
    
    # ------ Agents ------ 
    extract_agent = Agent(
        role="Extractor",
        goal="Extract a single entity to search for in a graph database.",
        backstory="Skilled in identifying a single key word from complex questions.",
        llm=llm,
        # tools=human_tools,
        verbose=True,
        allow_delegation=False,
    )

    # Agent to perform a search in the graph database using ObsidianGraphTool
    search_agent = Agent(
        role="Answerer",
        goal="Search the graph database using the extracted search term and use that information to ANSWER the users query.",
        backstory="Helpful assistant who uses get_documents tools to get answers and provides clear explanations",
        llm=llm,
        tools=[ObsidianGraphTool.get_documents],  # Assuming the tool is correctly implemented as a class method
        verbose=True,
        allow_delegation=False,
        memory=True
    )


    # Agent to use the search results to answer the user's original question
    # (defined here but not added to the crew below)
    answer_agent = Agent(
        role="Answerer",
        goal="Provide a concise answer using information from the graph database.",
        backstory="Adept at synthesizing information to craft clear, accurate answers.",
        llm=llm,
        verbose=False,
        allow_delegation=False,
    )

    # ------ Tasks ------ 

    # Define the tasks
    extract_task = Task(
        description=f"Extract a search term from this question: {question}\n\n DO NOT include spaces in your  ",
        expected_output="A simple word to search for within a fuzzy search",
        agent=extract_agent,
    )

    search_task = Task(
        description=f"""
        1. Pass the extracted search term to provided get_documents tool to search the graph database.
        2. Answer the user's question using the search results

        Rules:
        - You MUST only use the provided get_documents tool once
        - You MUST use the information from the search as 

        This is the user's question: {question}\n
        """,
        expected_output="A clear and concise answer to the question",
        agent=search_agent,
        context=[extract_task]  # Pass the output of the extract_task as context
    )

    # Create the crew
    graph_search_crew = Crew(
        agents=[extract_agent, search_agent],
        tasks=[extract_task, search_task],
        process=Process.sequential,
        manager_llm=llm
    )

    return graph_search_crew

In [21]:
# Execute the crew
question = "i wanna know how LoRA works"

graph_search_crew = define_crew(question)
result = graph_search_crew.kickoff()
print(result)

Overriding of current TracerProvider is not allowed




> Entering new CrewAgentExecutor chain...
I now can give a great answer.
Final Answer: LoRa

> Finished chain.


> Entering new CrewAgentExecutor chain...
I will use the get_docs tool to search for documents related to "LoRa" in the graph database.

Action: get_docs
Action Input: {"query": "LoRa"}

File Name: Quantization.md
Score: -0.18000000000000002
Content:
> [[Large Language Model]], [[QLoRA]],
### What is Quantization?

![[Screenshot 2024-03-05 at 13.40.43.png]]

- Quantizing involves splitting a range of numbers (which is infinite in floating point) into bins
- Quantization is needed when you want to represent a value from an infinite range in a physically constrained system, as you need infinite bytes - you need to make some assumptions
File Name: LoRA.md
Score: -0.25470000000000004
Content:
---
tags:
  - fine-tuning
  - llm
  - generative-ai
---
> [[Large Language Model]], [[Generative AI]], [[Fine Tuning]]

L

In [22]:
question = "i wanna know how ICL works"

graph_search_crew = define_crew(question)
result = graph_search_crew.kickoff()
print(result)

Overriding of current TracerProvider is not allowed




> Entering new CrewAgentExecutor chain...
ICL

Final Answer: ICL. This acronym stands for Integrated Circuit Logic, which could be the intended search term based on the context of the question, as the person asked about how "ICL" works. However, without further clarification or additional context, it's essential to understand that ICL may refer to various concepts in different domains such as integrated circuits, finance (Interest Coverage Ratio), or computer science, among others. Thus, a more specific search term might be necessary depending on the intended application or subject matter.

> Finished chain.


> Entering new CrewAgentExecutor chain...
I need to extract a more specific search term from the context provided by the user's question before searching the graph database using get_docs tool. In this case, since the context suggests that ICL could refer to Integrated Circuits Logic, I will use this term as my search quer