1 change: 1 addition & 0 deletions README.md
@@ -96,6 +96,7 @@ Go to the [examples directory](examples) to try out with any of the examples, fo
| [PDF Embedding](examples/pdf_embedding) | Parse PDF and index text embeddings for semantic search |
| [Manuals LLM Extraction](examples/manuals_llm_extraction) | Extract structured information from a manual using LLM |
| [Google Drive Text Embedding](examples/gdrive_text_embedding) | Index text documents from Google Drive |
| [Docs to Knowledge Graph](examples/docs_to_kg) | Extract relationships from Markdown documents and build a knowledge graph |

More coming and stay tuned! If there's any specific examples you would like to see, please let us know in our [Discord community](https://discord.com/invite/zpA9S2DR7s) 🌱.

2 changes: 2 additions & 0 deletions examples/docs_to_kg/.env
@@ -0,0 +1,2 @@
# Postgres database address for cocoindex
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
61 changes: 61 additions & 0 deletions examples/docs_to_kg/README.md
@@ -0,0 +1,61 @@
# Build Knowledge Graph from Markdown Documents, with OpenAI, Neo4j and CocoIndex

In this example, we

* Extract relationships from Markdown documents.
* Build a knowledge graph from the relationships.
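Conceptually, each extracted relationship is a (source, relationship, target) triple, and the graph is what you get by merging triples whose endpoints share a name. A minimal in-memory sketch of that idea, assuming entities are deduplicated by their exact name (roughly what keying the `Entity` nodes on a single field does in the Neo4j export, which upserts into the database rather than building Python sets); `build_graph` is a hypothetical helper, not part of CocoIndex:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One extracted triple, mirroring the Relationship dataclass in main.py."""
    source: str
    relationship_name: str
    target: str


def build_graph(triples: list[Relationship]) -> tuple[set[str], list[tuple[str, str, str]]]:
    """Merge triples into a node set and an edge list.

    Nodes are deduplicated by exact name, so "CocoIndex" mentioned in two
    different chunks becomes a single node with edges from both.
    """
    nodes: set[str] = set()
    edges: list[tuple[str, str, str]] = []
    for t in triples:
        nodes.add(t.source)
        nodes.add(t.target)
        edges.append((t.source, t.relationship_name, t.target))
    return nodes, edges


triples = [
    Relationship("CocoIndex", "supports", "Neo4j"),
    Relationship("CocoIndex", "supports", "Postgres"),
    Relationship("Flow", "exports to", "Neo4j"),
]
nodes, edges = build_graph(triples)
print(sorted(nodes))  # ['CocoIndex', 'Flow', 'Neo4j', 'Postgres']
print(len(edges))     # 3
```

Exact-name matching is the simplest merge rule; variants such as "Cocoindex" and "CocoIndex" would stay separate nodes under it.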

Please give [CocoIndex on GitHub](https://github.com/cocoindex-io/cocoindex) a star to support us if you like our work. Thank you so much, with a warm coconut hug 🥥🤗. [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)

## Prerequisites

Before running the example, you need to:

* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
* [Install Neo4j](https://cocoindex.io/docs/getting_started/installation#-install-neo4j) if you don't have one.
* Install and configure an LLM API. This example uses OpenAI, so you need to [configure an OpenAI API key](https://cocoindex.io/docs/ai/llm#openai) before running it. Alternatively, you can follow the comments in the source code to switch to Ollama, which runs LLM models locally; get it ready by following [this guide](https://cocoindex.io/docs/ai/llm#ollama).
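Before running the commands below, it can help to confirm the configuration is in place. A minimal sketch, assuming the database address is read from `COCOINDEX_DATABASE_URL` (as set in `.env`) and the OpenAI key from the `OPENAI_API_KEY` environment variable; `missing_env_vars` is a hypothetical helper:

```python
import os
from typing import Mapping, Optional


def missing_env_vars(required: list[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names in `required` that are unset or empty.

    Reads from `env` if given, otherwise from the process environment (os.environ).
    """
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name)]


# Variable names assumed for this example: the database URL from .env and the
# standard OpenAI API key variable.
REQUIRED = ["COCOINDEX_DATABASE_URL", "OPENAI_API_KEY"]
```

For example, `missing_env_vars(REQUIRED)` returns an empty list once both variables are set. Values kept in `.env` are only visible after they are loaded, as `main.py` does with `load_dotenv`.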

## Run

### Build the index

Install dependencies:

```bash
pip install -e .
```

Setup:

```bash
python main.py cocoindex setup
```

Update index:

```bash
python main.py cocoindex update
```

### Browse the knowledge graph

After the knowledge graph is built, you can explore it in Neo4j Browser.
You can open it at [http://localhost:7474](http://localhost:7474), and run the following Cypher query to get all relationships:

```cypher
MATCH p=()-->() RETURN p
```
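The query above returns every relationship, which can be a lot on a large corpus. To look at one entity's neighborhood instead, a minimal sketch that builds a parameterized Cypher query, assuming the `Entity` label, the `RELATIONSHIP` type, and a `value` key property as configured in `main.py`; `neighborhood_query` is a hypothetical helper:

```python
def neighborhood_query(entity: str, limit: int = 50) -> tuple[str, dict[str, object]]:
    """Build a Cypher query (plus parameters) for relationships touching one entity.

    Parameters are passed separately instead of being formatted into the string,
    so entity names containing quotes cannot break or inject into the query.
    The undirected pattern matches relationships in either direction.
    """
    query = (
        "MATCH p=(e:Entity {value: $value})-[:RELATIONSHIP]-(:Entity) "
        "RETURN p LIMIT $limit"
    )
    return query, {"value": entity, "limit": limit}


query, params = neighborhood_query("CocoIndex")
print(query)
print(params)  # {'value': 'CocoIndex', 'limit': 50}
```

With the official `neo4j` Python driver, the pair could be run as `session.run(query, params)` inside a `driver.session()` block, using the Bolt URI and credentials from `main.py`. In Neo4j Browser you can paste the query and set the parameters with `:param`.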

## CocoInsight
CocoInsight is a tool to help you understand your data pipeline and data index. CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).

Run CocoInsight to understand your data pipeline:

```bash
python main.py cocoindex server -c https://cocoindex.io
```

Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight). It connects to your local CocoIndex server with zero data retention.

You can view the pipeline flow and the data preview in the CocoInsight UI:
![CocoInsight UI](https://cocoindex.io/blogs/assets/images/cocoinsight-edd71690dcc35b6c5cf1cb31b51b6f6f.png)
85 changes: 85 additions & 0 deletions examples/docs_to_kg/main.py
@@ -0,0 +1,85 @@
"""
This example shows how to extract relationships from Markdown documents and build a knowledge graph.
"""
import dataclasses
from dotenv import load_dotenv
import cocoindex


@dataclasses.dataclass
class Relationship:
    """Describe a relationship between two nodes."""
    source: str
    relationship_name: str
    target: str

@dataclasses.dataclass
class Relationships:
    """Describe a list of relationships extracted from a chunk."""
    relationships: list[Relationship]

@cocoindex.flow_def(name="DocsToKG")
def docs_to_kg_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    """
    Define an example flow that extracts relationship triples from files and builds a knowledge graph.
    """

    conn_spec = cocoindex.add_auth_entry(
        "Neo4jConnection",
        cocoindex.storages.Neo4jConnectionSpec(
            uri="bolt://localhost:7687",
            user="neo4j",
            password="cocoindex",
        ))

    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="../../docs/docs/core",
                                    included_patterns=["*.md", "*.mdx"]))

    relationships = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        # Split each document into large Markdown-aware chunks.
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=10000)

        with doc["chunks"].row() as chunk:
            # Ask the LLM for structured output matching the Relationships dataclass.
            chunk["relationships"] = chunk["text"].transform(
                cocoindex.functions.ExtractByLlm(
                    llm_spec=cocoindex.LlmSpec(
                        api_type=cocoindex.LlmApiType.OPENAI, model="gpt-4o"),
                    # To run the LLM locally with Ollama instead, replace llm_spec with e.g.:
                    #   llm_spec=cocoindex.LlmSpec(
                    #       api_type=cocoindex.LlmApiType.OLLAMA, model="llama3.2"),
                    output_type=Relationships,
                    instruction=(
                        "Please extract relationships from CocoIndex documents. "
                        "Focus on concepts and ignore specific examples. "
                        "Each relationship should be a tuple of (source, relationship, target).")))

            # Collect one row per extracted relationship.
            with chunk["relationships"]["relationships"].row() as relationship:
                relationships.collect(
                    id=cocoindex.GeneratedField.UUID,
                    source=relationship["source"],
                    relationship_name=relationship["relationship_name"],
                    target=relationship["target"],
                )

    # Export to Neo4j: source/target values become Entity nodes, rows become RELATIONSHIP edges.
    relationships.export(
        "relationships",
        cocoindex.storages.Neo4jRelationship(
            connection=conn_spec,
            relationship="RELATIONSHIP",
            source=cocoindex.storages.Neo4jRelationshipEndSpec(field_name="source", label="Entity"),
            target=cocoindex.storages.Neo4jRelationshipEndSpec(field_name="target", label="Entity"),
            nodes={
                "Entity": cocoindex.storages.Neo4jRelationshipNodeSpec(key_field_name="value"),
            },
        ),
        primary_key_fields=["id"],
    )

@cocoindex.main_fn()
def _run():
    pass

if __name__ == "__main__":
    load_dotenv(override=True)
    _run()
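The nested `.row()` blocks in `docs_to_kg_flow` turn one LLM result per chunk into one collected row per relationship, each keyed by a generated UUID. A plain-Python sketch of that reshaping, for illustration only: it is not how CocoIndex executes the flow, the `extract` callable is a hypothetical stand-in for the `ExtractByLlm` call, and `uuid.uuid4()` here makes no attempt to match the behavior of `GeneratedField.UUID`:

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass
class Relationship:
    source: str
    relationship_name: str
    target: str


@dataclass
class Relationships:
    relationships: list[Relationship]


def collect_rows(
    chunks: list[str], extract: Callable[[str], Relationships]
) -> list[dict[str, str]]:
    """Flatten per-chunk extraction results into one row per relationship."""
    rows: list[dict[str, str]] = []
    for text in chunks:                      # like doc["chunks"].row()
        result = extract(text)               # like chunk["text"].transform(ExtractByLlm(...))
        for rel in result.relationships:     # like chunk["relationships"]["relationships"].row()
            rows.append({
                "id": str(uuid.uuid4()),     # stand-in for GeneratedField.UUID
                "source": rel.source,
                "relationship_name": rel.relationship_name,
                "target": rel.target,
            })
    return rows


def fake_extract(text: str) -> Relationships:
    # Hypothetical stand-in for the LLM: one triple per "A -> B" line.
    rels = []
    for line in text.splitlines():
        if "->" in line:
            a, b = (part.strip() for part in line.split("->", 1))
            rels.append(Relationship(a, "relates to", b))
    return Relationships(rels)


rows = collect_rows(["Flow -> Source\nFlow -> Target", "Collector -> Export"], fake_extract)
print(len(rows))  # 3
```

Because each relationship gets its own row and id, two chunks that produce the same triple yield two rows; merging happens only at the node level in Neo4j.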