# 📊 Dynamic Knowledge Graph Construction with CAMEL

This notebook demonstrates how to build a dynamic knowledge graph using CAMEL's Knowledge Graph Agent and Neo4j. The knowledge graph is constructed by parsing PDF documents, extracting entities and relationships, and storing them in a Neo4j database. Each relationship is stored with the time it was inserted, so the graph can then be queried for timestamped relationships.

In this notebook, you'll explore:

- **CAMEL**: A powerful multi-agent framework that enables the construction of knowledge graphs from unstructured data.
- **Neo4j**: A graph database used to store and query the knowledge graph.
- **Mistral and SambaVerse Models**: Large language models used to extract graph elements from parsed documents. Mistral embeddings are also used to compare nodes during deduplication.
- **Deduplication**: Techniques to ensure the uniqueness of nodes and relationships in the graph.

The workflow is a practical example of LLM-driven knowledge graph construction, and it can be adapted to other scenarios that require dynamic graph generation and querying.

## 📦 Installation

First, install the CAMEL package with all its dependencies:

In [None]:
!pip install "camel-ai[all]==0.2.16"
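
Because the install command pins a specific release, it can help to confirm which version actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of CAMEL.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Compare the installed release with the one pinned in the install cell.
found = installed_version("camel-ai")
if found is None:
    print("camel-ai is not installed; run the pip command first.")
elif found != "0.2.16":
    print(f"camel-ai {found} is installed; this notebook pins 0.2.16.")
else:
    print("camel-ai 0.2.16 is installed.")
```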

## 🔑 Setting Up API Keys

You'll need API keys for Mistral and SambaVerse. The cell below reads them with `getpass`, so they are not echoed to the screen, and stores them in environment variables for the later cells to use.

In [None]:
import os
from getpass import getpass

# Prompt for the Mistral API key securely
mistral_api_key = getpass('Enter your Mistral API key: ')
os.environ["MISTRAL_API_KEY"] = mistral_api_key

# Prompt for the SambaVerse API key securely
sambaverse_api_key = getpass('Enter your SambaVerse API key: ')
os.environ["SAMBA_API_KEY"] = sambaverse_api_key

Enter your Mistral API key: ··········
Enter your SambaVerse API key: ··········
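
When a notebook is re-run often, prompting for the keys every time gets tedious. One possible variation, sketched below, prompts only when the environment variable is missing or empty. The helper `ensure_env_key` and its `ask` parameter are hypothetical names introduced for this illustration.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env_key(name: str, prompt: str, ask: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], prompting for it only when it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # `ask` defaults to getpass, so the key is not echoed while typing.
        value = ask(prompt).strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value


# Hypothetical usage, mirroring the cell above:
# ensure_env_key("MISTRAL_API_KEY", "Enter your Mistral API key: ")
# ensure_env_key("SAMBA_API_KEY", "Enter your SambaVerse API key: ")
```

Passing the prompt function as a parameter keeps the helper easy to test without an interactive terminal.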


## 🛠️ Setting Up Neo4j

To store and query the knowledge graph, you'll need a Neo4j instance. If you don't have one, you can set up a local instance or use a cloud service like Neo4j Aura.

1. **Local Setup**: Download and install Neo4j Desktop from [here](https://neo4j.com/download/).
2. **Cloud Setup**: Sign up for Neo4j Aura [here](https://neo4j.com/cloud/aura/).

Once you have your Neo4j instance running, set up the connection details:

In [None]:
neo4j_uri = input('Enter your Neo4j URI: ')
neo4j_username = input('Enter your Neo4j username: ')
neo4j_password = getpass('Enter your Neo4j password: ')

os.environ["NEO4J_URI"] = neo4j_uri
os.environ["NEO4J_USERNAME"] = neo4j_username
os.environ["NEO4J_PASSWORD"] = neo4j_password

Enter your Neo4j URI: bolt://localhost:7687
Enter your Neo4j username: neo4j
Enter your Neo4j password: ··········
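
A mistyped URI usually surfaces only later as a connection error. As a rough sketch, the URI can be sanity-checked before it is stored. The scheme list reflects the URI schemes used by the official Neo4j drivers; the helper name `check_neo4j_uri` is hypothetical, and the check does not verify that a server is actually reachable.

```python
from urllib.parse import urlparse

# URI schemes used by the official Neo4j drivers.
NEO4J_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def check_neo4j_uri(uri: str) -> str:
    """Return the stripped URI, or raise ValueError if it is clearly malformed."""
    uri = uri.strip()
    parsed = urlparse(uri)
    if parsed.scheme not in NEO4J_SCHEMES:
        raise ValueError(
            f"Unsupported scheme {parsed.scheme!r}; expected one of {sorted(NEO4J_SCHEMES)}"
        )
    if not parsed.hostname:
        raise ValueError("The URI is missing a host name")
    return uri


print(check_neo4j_uri("bolt://localhost:7687"))
```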


## 🧠 Setting Up the Knowledge Graph Agent

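We will use CAMEL's Knowledge Graph Agent to extract entities and relationships from parsed PDF chunks so they can be stored in the Neo4j database. The cell below creates two agents, one backed by Mistral Large 2 and one backed by Llama 3.1 405B served through the SambaNova API. The build loop later in the notebook uses the Llama agent, while Mistral embeddings are used for deduplication.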

In [None]:
from camel.agents import KnowledgeGraphAgent
from camel.models import ModelFactory
from camel.storages import Neo4jGraph
from camel.loaders import UnstructuredIO
from camel.embeddings import MistralEmbedding
from camel.configs import MistralConfig, SambaCloudAPIConfig
from camel.types import ModelPlatformType, ModelType
from tqdm import tqdm
from pathlib import Path
import time
import camel.utils.kg_deduplication_utils as dedupe_util

# Set up Neo4j connection
neo4j_graph = Neo4jGraph(
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)

# Clear the Neo4j database before starting.
# WARNING: this deletes every node and relationship in the connected database.
print("Clearing Neo4j database...")
neo4j_graph.query("MATCH (n) DETACH DELETE n")
print("✅ Neo4j database cleared successfully.")

# Set up Mistral Large 2 model
mistral_large_2 = ModelFactory.create(
    model_platform=ModelPlatformType.MISTRAL,
    model_type=ModelType.MISTRAL_LARGE,
    model_config_dict=MistralConfig(temperature=0.2).as_dict(),
)

# Set up the Llama 3.1 405B model served through the SambaNova API
sambaverse_api_model = ModelFactory.create(
    model_platform=ModelPlatformType.SAMBA,
    model_type="Meta-Llama-3.1-405B-Instruct",
    model_config_dict=SambaCloudAPIConfig(max_tokens=2048).as_dict(),
    api_key=os.environ["SAMBA_API_KEY"],
    url="https://api.sambanova.ai/v1",
)

# Set up the example files
example_file_dir = Path("pdf")
assert example_file_dir.exists(), "Please set the correct path to the example pdf files."

example_pdf_files = list(example_file_dir.glob("*.pdf"))
print(f"Found {len(example_pdf_files)} PDF files.")

# UnstructuredIO is a tool to parse and chunk the documents.
uio = UnstructuredIO()

# Knowledge graph agent backed by Mistral Large 2 (not used in the build loop below).
mistral_kg_agent = KnowledgeGraphAgent(model=mistral_large_2)

# Knowledge graph agent backed by Llama 3.1 405B; the build loop below uses this one.
llama_405b_kg_agent = KnowledgeGraphAgent(model=sambaverse_api_model)

## 🏗️ Building the Knowledge Graph

We will now parse the PDF files, extract entities and relationships, and store them in the Neo4j database. The process involves chunking the documents, generating graph elements, deduplicating nodes, and adding relationships to the graph.
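
To make the deduplication step more concrete, here is a minimal, self-contained sketch of threshold-based deduplication over embeddings. It assumes that two nodes count as duplicates when the cosine similarity of their embeddings reaches the threshold; this illustrates the general idea and is not necessarily how `deduplicate_nodes_internally` is implemented. The function names and the toy 2-D vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: Dict[str, Sequence[float]], threshold: float = 0.65) -> List[str]:
    """Greedily keep a node ID unless it is too similar to an already kept one."""
    kept: List[str] = []
    for node_id, vector in embeddings.items():
        is_duplicate = any(
            cosine_similarity(vector, embeddings[other]) >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(node_id)
    return kept


# Made-up 2-D vectors standing in for real embedding vectors.
toy_embeddings = {
    "CAMEL_AI": [1.0, 0.0],
    "CAMEL": [0.9, 0.1],
    "Neo4j": [0.0, 1.0],
}
print(deduplicate(toy_embeddings))
```

In this toy example `CAMEL` is dropped because it points in almost the same direction as `CAMEL_AI`, while `Neo4j` is kept. A lower threshold merges more aggressively, and a higher one keeps more near-duplicates.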

In [None]:
import hashlib

def normalize_name(name: str, max_length: int = 64) -> str:
    """Normalize a name so that it complies with Neo4j's naming rules."""
    # Replace every non-alphanumeric character (including spaces) with an underscore
    normalized = "".join(c if c.isalnum() else "_" for c in name)
    # Ensure it does not start with a digit (the check is skipped for an empty string)
    if normalized and normalized[0].isdigit():
        normalized = "id_" + normalized
    # Collapse repeated underscores and strip leading and trailing ones
    normalized = "_".join(filter(None, normalized.split("_")))

    # If the ID is too long, replace it with a SHA-1 hex digest
    if len(normalized) > max_length:
        hash_value = hashlib.sha1(normalized.encode()).hexdigest()
        # The digest is 40 hex characters; truncate it if max_length is smaller
        normalized = hash_value[:max_length]

    return normalized

for file in example_pdf_files:
    elements = uio.parse_file_or_url(str(file))
    chunk_elements = uio.chunk_elements(
        elements, chunk_type="chunk_by_title", max_characters=2048
    )

    for element in tqdm(chunk_elements):
        graph_element = llama_405b_kg_agent.run(element, parse_graph_elements=True)

        # Rename problematic node types and normalize node IDs
        for node in graph_element.nodes:
            if node.type == "Date":
                node.type = "TimePoint"  # or another name that is not a reserved keyword
            elif node.type == "{self.type}":
                node.type = "Node"  # looks like an unfilled template placeholder; use a default type
            node.id = normalize_name(node.id, max_length=64)  # Ensure ID length does not exceed 64

        # Generate embeddings for nodes
        node_embeddings = dedupe_util.generate_node_embeddings(graph_element.nodes, MistralEmbedding())

        # Perform internal deduplication
        deduplication_result = dedupe_util.deduplicate_nodes_internally(
            graph_element.nodes, node_embeddings, threshold=0.65
        )

        # Get unique nodes
        unique_nodes = deduplication_result.unique_nodes
        unique_node_ids = {node.id for node in unique_nodes}

        # Filter relationships
        unique_relationships = []
        for rel in graph_element.relationships:
            if rel.subj.id in unique_node_ids and rel.obj.id in unique_node_ids:
                unique_relationships.append(rel)

        # Add the remaining relationships to Neo4j, each stamped with the insertion time
        for rel in unique_relationships:
            current_time = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime())
            neo4j_graph.add_triplet(subj=rel.subj.id, obj=rel.obj.id, rel=rel.type, timestamp=current_time)
    break  # Stops after the first PDF; remove this line to process every file
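
To see what the `normalize_name` helper above is aiming for, the standalone sketch below applies the same steps to a few made-up names. It is a stricter variant written for illustration: the name `normalize_identifier`, the `id_empty` fallback, and the `h` prefix on hashed IDs are assumptions of this sketch, not part of the notebook's helper.

```python
import hashlib


def normalize_identifier(name: str, max_length: int = 64) -> str:
    """Illustrative variant of the notebook's normalization steps."""
    # Replace non-alphanumeric characters with underscores, then collapse
    # repeated underscores and strip them from both ends.
    replaced = "".join(c if c.isalnum() else "_" for c in name)
    normalized = "_".join(part for part in replaced.split("_") if part)
    if not normalized:
        return "id_empty"  # hypothetical fallback for names with no usable characters
    if normalized[0].isdigit():
        normalized = "id_" + normalized
    if len(normalized) > max_length:
        # Prefix the digest with a letter so the result never starts with a digit.
        digest = hashlib.sha1(normalized.encode()).hexdigest()
        normalized = ("h" + digest)[:max_length]
    return normalized


for raw in ["Knowledge Graph (2024)", "2024-01-15", "***", "a" * 100]:
    print(repr(raw[:24]), "->", normalize_identifier(raw))
```

Note that hashing makes long IDs unreadable, so it trades interpretability for a bounded length.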

## 🔍 Querying the Knowledge Graph

Now that the knowledge graph is built, we can retrieve every stored triplet together with the timestamp that was recorded when it was inserted.

In [None]:
# Query all triplets
all_triplets = neo4j_graph.get_triplet()
if all_triplets:
    for triplet in all_triplets:
        print(f"Subject: {triplet['subj']}, Object: {triplet['obj']}, Relationship: {triplet['rel']}, Timestamp: {triplet['timestamp']}")
else:
    print("No triplets found in the database.")
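
Since the timestamps are written in the `%Y-%m-%dT%H:%M:%S` format during insertion, the retrieved triplets can also be filtered by time on the client side. The sketch below assumes each triplet is a dictionary with the `subj`, `obj`, `rel`, and `timestamp` keys used in the print loop above; the helper `triplets_between` and the sample data are made up for illustration.

```python
from datetime import datetime
from typing import Any, Dict, List

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"  # same format used when inserting triplets


def triplets_between(
    triplets: List[Dict[str, Any]], start: str, end: str
) -> List[Dict[str, Any]]:
    """Return the triplets whose timestamp lies in the inclusive range [start, end]."""
    start_dt = datetime.strptime(start, TIMESTAMP_FORMAT)
    end_dt = datetime.strptime(end, TIMESTAMP_FORMAT)
    selected = []
    for triplet in triplets:
        stamp = triplet.get("timestamp")
        if stamp is None:
            continue  # skip triplets stored without a timestamp
        if start_dt <= datetime.strptime(stamp, TIMESTAMP_FORMAT) <= end_dt:
            selected.append(triplet)
    return selected


# Made-up sample data shaped like the triplets printed above.
sample = [
    {"subj": "CAMEL", "obj": "Neo4j", "rel": "STORES_GRAPH_IN", "timestamp": "2025-01-10T09:30:00"},
    {"subj": "Agent", "obj": "PDF", "rel": "PARSES", "timestamp": "2025-01-12T14:00:00"},
    {"subj": "Node", "obj": "Embedding", "rel": "HAS", "timestamp": None},
]
recent = triplets_between(sample, "2025-01-11T00:00:00", "2025-01-13T00:00:00")
print([t["rel"] for t in recent])
```

For large graphs it would be more efficient to push the time filter into the database query instead of filtering in Python.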

## 🌟 Highlights

This notebook has guided you through setting up and running a dynamic knowledge graph construction workflow using CAMEL's Knowledge Graph Agent and Neo4j. You can adapt and expand this example for various other scenarios requiring dynamic graph generation and querying.

Key tools utilized in this notebook include:

- **CAMEL**: A powerful multi-agent framework that enables the construction of knowledge graphs from unstructured data.
- **Neo4j**: A graph database used to store and query the knowledge graph.
- **Mistral and SambaVerse Models**: Large language models used to generate the knowledge graph from parsed documents.
- **Deduplication**: Techniques to ensure the uniqueness of nodes and relationships in the graph.

<div class="align-center">
  <a href="https://www.camel-ai.org/"><img src="https://i.postimg.cc/KzQ5rfBC/button.png" width="150"></a>
  <a href="https://discord.camel-ai.org"><img src="https://i.postimg.cc/L4wPdG9N/join-2.png" width="150"></a>

⭐ <i>Star us on [*GitHub*](https://github.com/camel-ai/camel), join our [*Discord*](https://discord.camel-ai.org) or follow us on [*X*](https://x.com/camelaiorg)</i>
</div>