Feature/add yc kg cookbook rebased #439

Merged
merged 4 commits into from
Jun 13, 2024
5 changes: 3 additions & 2 deletions README.md
@@ -8,7 +8,7 @@

<img src="./docs/pages/r2r.png" alt="Sciphi Framework">
<h3 align="center">
Build, deploy, observe, and optimize your RAG system.
Build, deploy, observe, and optimize your RAG engine.
</h3>

# About
@@ -30,7 +30,7 @@ For a more complete view of R2R, check out our [documentation](https://r2r-docs.

## Table of Contents
1. [Quick Install](#quick-install)
2. [R2R Python SDK Demo](#r2r-demo)
2. [R2R Python SDK Demo](#r2r-python-sdk-demo)
3. [R2R Dashboard](#r2r-dashboard)
4. [Community and Support](#community-and-support)
5. [Contributing](#contributing)
@@ -340,6 +340,7 @@ There are a number of helpful tutorials and cookbooks that can be found in the [
- [Local RAG](https://r2r-docs.sciphi.ai/cookbooks/local-rag): A quick cookbook demonstration of how to run R2R with local LLMs.
- [Hybrid Search](https://r2r-docs.sciphi.ai/cookbooks/hybrid-search): A brief introduction to running hybrid search with R2R.
- [Reranking](https://r2r-docs.sciphi.ai/cookbooks/rerank-search): A short guide on how to apply reranking to R2R results.
- [GraphRAG](https://r2r-docs.sciphi.ai/cookbooks/knowledge-graph): A walkthrough of automatic knowledge graph generation with R2R.
- [Dashboard](https://r2r-docs.sciphi.ai/cookbooks/dashboard): A how-to guide on connecting with the R2R Dashboard.
- [SciPhi Cloud Docs](https://docs.sciphi.ai/): SciPhi Cloud documentation.

2 changes: 1 addition & 1 deletion config.json
@@ -19,7 +19,7 @@
"rerank_model": "None"
},
"kg": {
"provider": "None",
"provider": "neo4j",
"batch_size": 1,
"text_splitter": {
"type": "recursive_character",
293 changes: 293 additions & 0 deletions docs/pages/cookbooks/knowledge-graph.mdx

Large diffs are not rendered by default.

38 changes: 22 additions & 16 deletions docs/pages/cookbooks/local-rag.mdx
@@ -1,25 +1,28 @@
import { Tabs } from 'nextra/components'
import { Callout } from 'nextra/components'

## Building a Local RAG System with R2R

### Installation

<Tabs items={['Docker', 'Pip']}>



<Tabs.Tab>

<Callout type="info" emoji="🐳">
Docker makes it convenient to run R2R without managing your local environment.
</Callout>

<details>
<summary>Docker</summary>
First, download the latest R2R image from Docker Hub:

To run R2R using Docker, you can use the following commands:

```bash filename="bash" copy
docker pull emrgntcmplxty/r2r:latest
```

This will pull the latest R2R Docker image.

Then, run the container with:
Then, run the service:

```bash filename="bash" copy
docker run -d \
@@ -40,9 +43,18 @@ This command starts the R2R container with the following options:
- `-e CONFIG_OPTION=local_ollama`: Selects the "local_ollama" configuration option.
- `emrgntcmplxty/r2r:latest`: Specifies the Docker image to use.

</details>

We can start by using `pip` to install R2R with the local-embedding dependencies:
Lastly, install the R2R client using `pip`:

```bash filename="bash" copy
pip install 'r2r'
```

</Tabs.Tab>

<Tabs.Tab>

We can install `r2r` with the necessary optional dependencies to run locally using `pip`:

```bash filename="bash" copy
pip install 'r2r[local-embedding]'
@@ -52,7 +64,6 @@ R2R supports `Ollama`, a popular tool for Local LLM inference. Ollama is provid

Ollama must be installed independently. You can install Ollama by following the instructions on their [official website](https://ollama.com/) or by referring to their [GitHub README](https://github.com/ollama/ollama).


### Configuration

Let's move on to setting up the R2R pipeline. R2R relies on a `config.json` file for defining various settings, such as embedding models and chunk sizes. By default, the `config.json` found in the R2R GitHub repository's root directory is set up for cloud-based services.
@@ -98,15 +109,10 @@ This chosen config modification above instructs R2R to use the `sentence-transfo

A local vector database will be used to store the embeddings. The current default is a minimal SQLite implementation.

### Server Standup
</Tabs.Tab>

</Tabs>

```bash filename="bash" copy
# cd $WORKDIR
python -m r2r.examples.servers.configurable_pipeline --host 0.0.0.0 --port 8000 --config local_ollama --pipeline_type qna
```

The server exposes a REST API for interacting with the R2R RAG pipeline and application. See the [API docs](/getting-started/app-api) for more details on the available endpoints.

## Ingesting and Embedding Documents

14 changes: 9 additions & 5 deletions docs/pages/index.mdx
@@ -14,11 +14,13 @@ R2R was conceived to help developers bridge the gap between local LLM experiment

## Key Features

- **🔧 Build**: Effortlessly create and manage observable, high-performance RAG pipelines with our robust framework, including multimodal RAG, hybrid search, and the latest methods such as HyDE.
- **🚀 Deploy**: Launch production-ready asynchronous RAG pipelines with seamless streaming capabilities. Begin serving users immediately with built-in user and document management features.
- **🧩 Customize**: Easily tailor your pipeline using intuitive configuration files to meet your specific needs.
- **🔌 Extend**: Enhance and extend your pipeline with custom code integrations to add new functionalities.
- **🤖 OSS**: Leverage a framework developed by the open-source community, ensuring flexibility, scalability, and ease of deployment.
- **📁 Multimodal Support**: Ingest files ranging from `.txt`, `.pdf`, `.json` to `.png`, `.mp3`, and more.
- **🔍 Hybrid Search**: Combine semantic and keyword search with reciprocal rank fusion for enhanced relevancy.
- **🔗 Graph RAG**: Automatically extract relationships and build knowledge graphs.
- **🗂️ App Management**: Efficiently manage documents and users with rich observability and analytics.
- **🌐 Client-Server**: RESTful API support out of the box.
- **🧩 Configurable**: Provision your application using intuitive configuration files.
- **🔌 Extensible**: Develop your application further with a convenient builder and factory pattern.

## Demo(s)
The [R2R Demo](/getting-started/r2r-demo) provides a step-by-step guide to running the default R2R Retrieval-Augmented Generation (RAG) backend. The demo ingests the provided documents and illustrates search and RAG functionality, logging, analytics, and document management.
@@ -33,6 +35,8 @@ To get started with R2R, we recommend setting up the framework and following an
- [Local RAG](/cookbooks/local-rag): A quick cookbook demonstration of how to run R2R with local LLMs.
- [Hybrid Search](/cookbooks/hybrid-search): A brief introduction to running hybrid search with R2R.
- [Reranking](/cookbooks/rerank-search): A short guide on how to apply reranking to R2R results.
- [GraphRAG](https://r2r-docs.sciphi.ai/cookbooks/knowledge-graph): A walkthrough of automatic knowledge graph generation with R2R.
- [Dashboard](https://r2r-docs.sciphi.ai/cookbooks/dashboard): A how-to guide on connecting with the R2R Dashboard.
- [SciPhi Cloud Docs](https://docs.sciphi.ai/): SciPhi Cloud documentation.

## Community
88 changes: 52 additions & 36 deletions r2r/core/abstractions/document.py
@@ -1,13 +1,16 @@
"""Abstractions for documents and their extractions."""

import json
import logging
import uuid
from datetime import datetime
from enum import Enum
from typing import Optional, Union

from pydantic import BaseModel

logger = logging.getLogger(__name__)

DataType = Union[str, bytes]


@@ -115,20 +118,12 @@ class Entity(BaseModel):
sub_category: Optional[str] = None
value: str


def extract_entities(entity_data: dict[str, str]) -> list[Entity]:
entities = []
for entity_key, entity_value in entity_data.items():
parts = entity_value.split(":")
if len(parts) == 2:
category, value = parts
sub_category = None
else:
category, sub_category, value = parts
entities.append(
Entity(category=category, sub_category=sub_category, value=value)
def __str__(self):
return (
f"{self.category}:{self.sub_category}:{self.value}"
if self.sub_category
else f"{self.category}:{self.value}"
)
return entities
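The new `Entity.__str__` above renders entities as `category:sub_category:value` (or `category:value` when no sub-category is set). A quick stand-in using a plain dataclass instead of pydantic (an illustrative simplification, not the R2R class) shows the same behavior:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entity:
    """Dataclass stand-in mirroring the diff's Entity string formatting."""
    category: str
    value: str
    sub_category: Optional[str] = None

    def __str__(self):
        # Include the sub-category segment only when one is present.
        return (
            f"{self.category}:{self.sub_category}:{self.value}"
            if self.sub_category
            else f"{self.category}:{self.value}"
        )
```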


class Triple(BaseModel):
@@ -139,36 +134,57 @@ class Triple(BaseModel):
object: str


def extract_triples(
triplet_data: list[str], entities: dict[str, str]
) -> list[Triple]:
triples = []
for triplet in triplet_data:
parts = triplet.split(": ")
subject_key = parts[0]
predicate = parts[1]
object_key = parts[2]

subject = entities[subject_key]
if object_key in entities:
object = entities[object_key]
else:
for entities_key, entities_value in entities.items():
if entities_key in object_key:
object_key = object_key.replace(
entities_key, entities_value
def extract_entities(llm_payload: list[str]) -> dict[str, Entity]:
entities = {}
for entry in llm_payload:
try:
if "], " in entry: # Check if the entry is an entity
entry_val = entry.split("], ")[0] + "]"
entry = entry.split("], ")[1]
colon_count = entry.count(":")

if colon_count == 1:
category, value = entry.split(":")
sub_category = None
elif colon_count == 2:
category, sub_category, value = entry.split(":")
elif colon_count > 2:
parts = entry.split(":", 2)
category, sub_category, value = (
parts[0],
parts[1],
parts[2],
)
else:
raise ValueError("Unexpected entry format")
except Exception as e:
logger.error(f"Error processing entity {entry}: {e}")
continue
return entities
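As committed, the loop in `extract_entities` parses `category`, `sub_category`, and `value` but never stores them, so the returned dict stays empty. A minimal corrected sketch of the same parsing logic follows; the `[key], category[:sub_category]:value` payload shape is an assumption inferred from the diff, not a confirmed R2R contract:

```python
import logging

logger = logging.getLogger(__name__)

def extract_entities_sketch(llm_payload):
    """Sketch: parse '[key], category[:sub_category]:value' entries into a dict.

    Unlike the diff, this version actually stores each parsed entity.
    """
    entities = {}
    for entry in llm_payload:
        try:
            key = None
            if "], " in entry:  # split off the leading '[key]' tag
                key, entry = entry.split("], ", 1)
                key = key + "]"
            colon_count = entry.count(":")
            if colon_count == 1:
                category, value = entry.split(":")
                sub_category = None
            elif colon_count >= 2:
                # Split at most twice so values may themselves contain colons.
                category, sub_category, value = entry.split(":", 2)
            else:
                raise ValueError("Unexpected entry format")
            # The missing step in the diff: record the parsed entity.
            entities[key or value] = (category, sub_category, value)
        except Exception as e:
            logger.error(f"Error processing entity {entry}: {e}")
            continue
    return entities
```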

object = object_key

triples.append(
Triple(subject=subject, predicate=predicate, object=object)
)
def extract_triples(
llm_payload: list[str], entities: dict[str, Entity]
) -> list[Triple]:
triples = []
for entry in llm_payload:
try:
if "], " not in entry: # Check if the entry is an entity
subject, predicate, object = entry.split(" ")
subject = str(entities[subject])
if "[" in object and "]" in object:
object = str(entities[object])
triples.append(
Triple(subject=subject, predicate=predicate, object=object)
)
except Exception as e:
logger.error(f"Error processing triplet {entry}: {e}")
continue
return triples
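The triple parser above expects entries shaped like `subject predicate object`, where bracketed tokens are looked up in the entity table. A self-contained sketch of that flow, with plain strings standing in for the `Entity`/`Triple` models (an illustrative assumption, not the exact R2R types):

```python
def extract_triples_sketch(llm_payload, entities):
    """Sketch: parse 'subject predicate object' entries, resolving [key] refs."""
    triples = []
    for entry in llm_payload:
        try:
            if "], " not in entry:  # entity entries contain '], '; skip them
                subject, predicate, obj = entry.split(" ")
                subject = entities[subject]
                if "[" in obj and "]" in obj:
                    # Bracketed objects are keys into the entity table.
                    obj = entities[obj]
                triples.append((subject, predicate, obj))
        except Exception:
            continue
    return triples
```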


class KGExtraction(BaseModel):
"""An extraction from a document that is part of a knowledge graph."""

entities: list[Entity]
entities: dict[str, Entity]
triples: list[Triple]
4 changes: 3 additions & 1 deletion r2r/core/logging/log_processor.py
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,9 @@ def calculate_basic_statistics(logs, key):
else None
)
std_dev = round(statistics.stdev(values) if len(values) > 1 else 0, 3)
variance = round(statistics.variance(values) if len(values) > 1 else 0, 3)
variance = round(
statistics.variance(values) if len(values) > 1 else 0, 3
)

return {
"Mean": mean,
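The change above only rewraps the `statistics.variance` call to fit the line-length limit; behavior is unchanged, as a quick standard-library check confirms:

```python
import statistics

# Same computation as calculate_basic_statistics, on a toy sample.
values = [1, 2, 3]
mean = round(statistics.mean(values), 3)
std_dev = round(statistics.stdev(values) if len(values) > 1 else 0, 3)
variance = round(statistics.variance(values) if len(values) > 1 else 0, 3)
```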
11 changes: 11 additions & 0 deletions r2r/examples/configs/neo4j_kg.json
@@ -0,0 +1,11 @@
{
"kg": {
"provider": "neo4j",
"batch_size": 1,
"text_splitter": {
"type": "recursive_character",
"chunk_size": 2048,
"chunk_overlap": 0
}
}
}
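The `neo4j_kg.json` config above requests a recursive-character splitter with `chunk_size` 2048 and no overlap. As a rough intuition for those two parameters, here is a naive fixed-window splitter (illustrative only; R2R's actual recursive splitter also respects separator boundaries):

```python
def split_text(text, chunk_size=2048, chunk_overlap=0):
    """Naive fixed-window splitter: advance by chunk_size - chunk_overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

With `chunk_overlap` 0, chunks tile the text back to back; a positive overlap repeats the tail of each chunk at the head of the next, which can help keep entities and relations intact across chunk boundaries during KG extraction.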