Feature/add yc kg cookbook rebased #439

Merged
merged 4 commits into from
Jun 13, 2024
5 changes: 3 additions & 2 deletions README.md
@@ -8,7 +8,7 @@

<img src="./docs/pages/r2r.png" alt="Sciphi Framework">
<h3 align="center">
Build, deploy, observe, and optimize your RAG system.
Build, deploy, observe, and optimize your RAG engine.
</h3>

# About
@@ -30,7 +30,7 @@ For a more complete view of R2R, check out our [documentation](https://r2r-docs.

## Table of Contents
1. [Quick Install](#quick-install)
2. [R2R Python SDK Demo](#r2r-demo)
2. [R2R Python SDK Demo](#r2r-python-sdk-demo)
3. [R2R Dashboard](#r2r-dashboard)
4. [Community and Support](#community-and-support)
5. [Contributing](#contributing)
@@ -340,6 +340,7 @@ There are a number of helpful tutorials and cookbooks that can be found in the [
- [Local RAG](https://r2r-docs.sciphi.ai/cookbooks/local-rag): A quick cookbook demonstration of how to run R2R with local LLMs.
- [Hybrid Search](https://r2r-docs.sciphi.ai/cookbooks/hybrid-search): A brief introduction to running hybrid search with R2R.
- [Reranking](https://r2r-docs.sciphi.ai/cookbooks/rerank-search): A short guide on how to apply reranking to R2R results.
- [GraphRAG](https://r2r-docs.sciphi.ai/cookbooks/knowledge-graph): A walkthrough of automatic knowledge graph generation with R2R.
- [Dashboard](https://r2r-docs.sciphi.ai/cookbooks/dashboard): A how-to guide on connecting with the R2R Dashboard.
- [SciPhi Cloud Docs](https://docs.sciphi.ai/): SciPhi Cloud documentation.

2 changes: 1 addition & 1 deletion config.json
@@ -19,7 +19,7 @@
"rerank_model": "None"
},
"kg": {
"provider": "None",
"provider": "neo4j",
"batch_size": 1,
"text_splitter": {
"type": "recursive_character",
293 changes: 293 additions & 0 deletions docs/pages/cookbooks/knowledge-graph.mdx

Large diffs are not rendered by default.

38 changes: 22 additions & 16 deletions docs/pages/cookbooks/local-rag.mdx
@@ -1,25 +1,28 @@
import { Tabs } from 'nextra/components'
import { Callout } from 'nextra/components'

## Building a Local RAG System with R2R

### Installation

<Tabs items={['Docker', 'Pip']}>



<Tabs.Tab>

<Callout type="info" emoji="🐳">
Docker makes it convenient to run R2R without managing your local environment.
</Callout>

<details>
<summary>Docker</summary>
First, download the latest R2R image from Docker Hub:

To run R2R using Docker, you can use the following commands:

```bash filename="bash" copy
docker pull emrgntcmplxty/r2r:latest
```

This will pull the latest R2R Docker image.

Then, run the container with:
Then, run the service:

```bash filename="bash" copy
docker run -d \
@@ -40,9 +43,18 @@ This command starts the R2R container with the following options:
- `-e CONFIG_OPTION=local_ollama`: Selects the "local_ollama" configuration option.
- `emrgntcmplxty/r2r:latest`: Specifies the Docker image to use.

</details>

We can start by using `pip` to install R2R with the local-embedding dependencies:
Lastly, install the R2R client using `pip`:

```bash filename="bash" copy
pip install 'r2r'
```

</Tabs.Tab>

<Tabs.Tab>

We can install `r2r` with the necessary optional dependencies to run locally using `pip`:

```bash filename="bash" copy
pip install 'r2r[local-embedding]'
@@ -52,7 +64,6 @@ R2R supports `Ollama`, a popular tool for Local LLM inference. Ollama is provid

Ollama must be installed independently. You can install Ollama by following the instructions on their [official website](https://ollama.com/) or by referring to their [GitHub README](https://github.com/ollama/ollama).


### Configuration

Let's move on to setting up the R2R pipeline. R2R relies on a `config.json` file for defining various settings, such as embedding models and chunk sizes. By default, the `config.json` found in the R2R GitHub repository's root directory is set up for cloud-based services.
@@ -98,15 +109,10 @@ This chosen config modification above instructs R2R to use the `sentence-transfo

A local vector database will be used to store the embeddings. The current default is a minimal SQLite implementation.

### Server Standup
</Tabs.Tab>

</Tabs>

```bash filename="bash" copy
# cd $WORKDIR
python -m r2r.examples.servers.configurable_pipeline --host 0.0.0.0 --port 8000 --config local_ollama --pipeline_type qna
```

The server exposes a REST API for interacting with the R2R RAG pipeline and application. See the [API docs](/getting-started/app-api) for more details on the available endpoints.

## Ingesting and Embedding Documents

14 changes: 9 additions & 5 deletions docs/pages/index.mdx
@@ -14,11 +14,13 @@ R2R was conceived to help developers bridge the gap between local LLM experiment

## Key Features

- **🔧 Build**: Effortlessly create and manage observable, high-performance RAG pipelines with our robust framework, including multimodal RAG, hybrid search, and the latest methods such as HyDE.
- **🚀 Deploy**: Launch production-ready asynchronous RAG pipelines with seamless streaming capabilities. Begin serving users immediately with built-in user and document management features.
- **🧩 Customize**: Easily tailor your pipeline using intuitive configuration files to meet your specific needs.
- **🔌 Extend**: Enhance and extend your pipeline with custom code integrations to add new functionalities.
- **🤖 OSS**: Leverage a framework developed by the open-source community, ensuring flexibility, scalability, and ease of deployment.
- **📁 Multimodal Support**: Ingest files ranging from `.txt`, `.pdf`, `.json` to `.png`, `.mp3`, and more.
- **🔍 Hybrid Search**: Combine semantic and keyword search with reciprocal rank fusion for enhanced relevancy.
- **🔗 Graph RAG**: Automatically extract relationships and build knowledge graphs.
- **🗂️ App Management**: Efficiently manage documents and users with rich observability and analytics.
- **🌐 Client-Server**: RESTful API support out of the box.
- **🧩 Configurable**: Provision your application using intuitive configuration files.
- **🔌 Extensible**: Develop your application further with a convenient builder and factory pattern.

## Demo(s)
The [R2R Demo](/getting-started/r2r-demo) provides a step-by-step guide to running the default R2R Retrieval-Augmented Generation (RAG) backend. The demo ingests the provided documents and illustrates search and RAG functionality, logging, analytics, and document management.
@@ -33,6 +35,8 @@ To get started with R2R, we recommend setting up the framework and following an
- [Local RAG](/cookbooks/local-rag): A quick cookbook demonstration of how to run R2R with local LLMs.
- [Hybrid Search](/cookbooks/hybrid-search): A brief introduction to running hybrid search with R2R.
- [Reranking](/cookbooks/rerank-search): A short guide on how to apply reranking to R2R results.
- [GraphRAG](https://r2r-docs.sciphi.ai/cookbooks/knowledge-graph): A walkthrough of automatic knowledge graph generation with R2R.
- [Dashboard](https://r2r-docs.sciphi.ai/cookbooks/dashboard): A how-to guide on connecting with the R2R Dashboard.
- [SciPhi Cloud Docs](https://docs.sciphi.ai/): SciPhi Cloud documentation.

## Community
88 changes: 52 additions & 36 deletions r2r/core/abstractions/document.py
@@ -1,13 +1,16 @@
"""Abstractions for documents and their extractions."""

import json
import logging
import uuid
from datetime import datetime
from enum import Enum
from typing import Optional, Union

from pydantic import BaseModel

logger = logging.getLogger(__name__)

DataType = Union[str, bytes]


@@ -115,20 +118,12 @@ class Entity(BaseModel):
sub_category: Optional[str] = None
value: str


def extract_entities(entity_data: dict[str, str]) -> list[Entity]:
entities = []
for entity_key, entity_value in entity_data.items():
parts = entity_value.split(":")
if len(parts) == 2:
category, value = parts
sub_category = None
else:
category, sub_category, value = parts
entities.append(
Entity(category=category, sub_category=sub_category, value=value)
def __str__(self):
return (
f"{self.category}:{self.sub_category}:{self.value}"
if self.sub_category
else f"{self.category}:{self.value}"
)
return entities
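The new `Entity.__str__` above renders entities as `category:sub_category:value` (or `category:value` when no sub-category is set). A quick stand-in using a plain dataclass instead of pydantic (an illustrative simplification, not the R2R class) shows the same behavior:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entity:
    """Dataclass stand-in mirroring the diff's Entity string formatting."""
    category: str
    value: str
    sub_category: Optional[str] = None

    def __str__(self):
        # Include the sub-category segment only when one is present.
        return (
            f"{self.category}:{self.sub_category}:{self.value}"
            if self.sub_category
            else f"{self.category}:{self.value}"
        )
```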


class Triple(BaseModel):
@@ -139,36 +134,57 @@ class Triple(BaseModel):
object: str


def extract_triples(
triplet_data: list[str], entities: dict[str, str]
) -> list[Triple]:
triples = []
for triplet in triplet_data:
parts = triplet.split(": ")
subject_key = parts[0]
predicate = parts[1]
object_key = parts[2]

subject = entities[subject_key]
if object_key in entities:
object = entities[object_key]
else:
for entities_key, entities_value in entities.items():
if entities_key in object_key:
object_key = object_key.replace(
entities_key, entities_value
def extract_entities(llm_payload: list[str]) -> dict[str, Entity]:
entities = {}
for entry in llm_payload:
try:
if "], " in entry: # Check if the entry is an entity
entry_val = entry.split("], ")[0] + "]"
entry = entry.split("], ")[1]
colon_count = entry.count(":")

if colon_count == 1:
category, value = entry.split(":")
sub_category = None
elif colon_count == 2:
category, sub_category, value = entry.split(":")
elif colon_count > 2:
parts = entry.split(":", 2)
category, sub_category, value = (
parts[0],
parts[1],
parts[2],
)
else:
raise ValueError("Unexpected entry format")
except Exception as e:
logger.error(f"Error processing entity {entry}: {e}")
continue
return entities
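As committed, the loop in `extract_entities` parses `category`, `sub_category`, and `value` but never stores them, so the returned dict stays empty. A minimal corrected sketch of the same parsing logic follows; the `[key], category[:sub_category]:value` payload shape is an assumption inferred from the diff, not a confirmed R2R contract:

```python
import logging

logger = logging.getLogger(__name__)

def extract_entities_sketch(llm_payload):
    """Sketch: parse '[key], category[:sub_category]:value' entries into a dict.

    Unlike the diff, this version actually stores each parsed entity.
    """
    entities = {}
    for entry in llm_payload:
        try:
            key = None
            if "], " in entry:  # split off the leading '[key]' tag
                key, entry = entry.split("], ", 1)
                key = key + "]"
            colon_count = entry.count(":")
            if colon_count == 1:
                category, value = entry.split(":")
                sub_category = None
            elif colon_count >= 2:
                # Split at most twice so values may themselves contain colons.
                category, sub_category, value = entry.split(":", 2)
            else:
                raise ValueError("Unexpected entry format")
            # The missing step in the diff: record the parsed entity.
            entities[key or value] = (category, sub_category, value)
        except Exception as e:
            logger.error(f"Error processing entity {entry}: {e}")
            continue
    return entities
```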

object = object_key

triples.append(
Triple(subject=subject, predicate=predicate, object=object)
)
def extract_triples(
llm_payload: list[str], entities: dict[str, Entity]
) -> list[Triple]:
triples = []
for entry in llm_payload:
try:
if "], " not in entry: # Check if the entry is an entity
subject, predicate, object = entry.split(" ")
subject = str(entities[subject])
if "[" in object and "]" in object:
object = str(entities[object])
triples.append(
Triple(subject=subject, predicate=predicate, object=object)
)
except Exception as e:
logger.error(f"Error processing triplet {entry}: {e}")
continue
return triples
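The triple parser above expects entries shaped like `subject predicate object`, where bracketed tokens are looked up in the entity table. A self-contained sketch of that flow, with plain strings standing in for the `Entity`/`Triple` models (an illustrative assumption, not the exact R2R types):

```python
def extract_triples_sketch(llm_payload, entities):
    """Sketch: parse 'subject predicate object' entries, resolving [key] refs."""
    triples = []
    for entry in llm_payload:
        try:
            if "], " not in entry:  # entity entries contain '], '; skip them
                subject, predicate, obj = entry.split(" ")
                subject = entities[subject]
                if "[" in obj and "]" in obj:
                    # Bracketed objects are keys into the entity table.
                    obj = entities[obj]
                triples.append((subject, predicate, obj))
        except Exception:
            continue
    return triples
```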


class KGExtraction(BaseModel):
"""An extraction from a document that is part of a knowledge graph."""

entities: list[Entity]
entities: dict[str, Entity]
triples: list[Triple]
4 changes: 3 additions & 1 deletion r2r/core/logging/log_processor.py
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,9 @@ def calculate_basic_statistics(logs, key):
else None
)
std_dev = round(statistics.stdev(values) if len(values) > 1 else 0, 3)
variance = round(statistics.variance(values) if len(values) > 1 else 0, 3)
variance = round(
statistics.variance(values) if len(values) > 1 else 0, 3
)

return {
"Mean": mean,
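The change above only rewraps the `statistics.variance` call to fit the line-length limit; behavior is unchanged, as a quick standard-library check confirms:

```python
import statistics

# Same computation as calculate_basic_statistics, on a toy sample.
values = [1, 2, 3]
mean = round(statistics.mean(values), 3)
std_dev = round(statistics.stdev(values) if len(values) > 1 else 0, 3)
variance = round(statistics.variance(values) if len(values) > 1 else 0, 3)
```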
11 changes: 11 additions & 0 deletions r2r/examples/configs/neo4j_kg.json
@@ -0,0 +1,11 @@
{
"kg": {
"provider": "neo4j",
"batch_size": 1,
"text_splitter": {
"type": "recursive_character",
"chunk_size": 2048,
"chunk_overlap": 0
}
}
}
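The `neo4j_kg.json` config above requests a recursive-character splitter with `chunk_size` 2048 and no overlap. As a rough intuition for those two parameters, here is a naive fixed-window splitter (illustrative only; R2R's actual recursive splitter also respects separator boundaries):

```python
def split_text(text, chunk_size=2048, chunk_overlap=0):
    """Naive fixed-window splitter: advance by chunk_size - chunk_overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With `chunk_overlap` 0, chunks tile the text back to back; a positive overlap repeats the tail of each chunk at the head of the next, which can help keep entities and relations intact across chunk boundaries during KG extraction.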