<a href="https://colab.research.google.com/github/okonp07/GraphRAG-Pipeline-Deployment-in-Python/blob/main/Knowledge_Base_%26_GraphRAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Knowledge Base & GraphRAG (Coding Challenge)**
### Deploying a GraphRAG Pipeline with Neo4j and FAISS Using Python
|  |  |
|:---|:---|
| **Estimated Runtime** | ~10–20 minutes (depending on dataset size and system specs) |
| **Prior Knowledge** | Python, basic NLP/embeddings, graph databases (Neo4j), cybersecurity concepts (Domain Knowledge) |
| **Key Libraries** | `datasets`, `sentence-transformers`, `faiss-cpu`, `neo4j`, `spacy`, `tqdm` |
| **Model Used** | `all-mpnet-base-v2` (from `sentence-transformers`) |
| **Primary Use Case** | GraphRAG-based question answering for cybersecurity knowledge extraction |
| **Graph Database** | Neo4j (used to construct and query the knowledge graph) |
| **Vector Store** | FAISS (used for fast semantic similarity search) |
| **System Requirements** | Minimum: 4GB RAM; Recommended: 8GB+ for faster embedding and indexing |
| **Deployment Format** | Jupyter Notebook |
| **Author** | Okon Prince — Data Scientist, AI/ML Engineer & Knowledge Graph Enthusiast |
| **Specialization** | AI for cybersecurity, knowledge graphs, and real-world LLM applications |

*This project implements a lightweight GraphRAG (Graph-based Retrieval-Augmented Generation) system using Python to enhance context-aware language generation. It combines semantic search with knowledge graph construction to retrieve relevant document chunks and generate informed responses via a fine-tuned TinyLlama language model. Designed for cybersecurity applications, the system is optimized for low-resource environments and offers a simple Gradio interface for interaction.*

### **Configuring the Environment and Installing Dependencies**
The next few cells configure the environment and install the dependencies and libraries the project needs.

In [2]:
!pip install datasets
!pip install faiss-cpu

In [5]:
!pip install neo4j

Collecting neo4j
  Downloading neo4j-5.28.1-py3-none-any.whl.metadata (5.9 kB)
Downloading neo4j-5.28.1-py3-none-any.whl (312 kB)
Installing collected packages: neo4j
Successfully installed neo4j-5.28.1


In [7]:
# 1. Install required packages
!pip install -q datasets sentence-transformers torch faiss-cpu neo4j python-dotenv transformers spacy networkx matplotlib pytextrank tqdm ipywidgets

# 2. Download spaCy model
!python -m spacy download en_core_web_sm > /dev/null

# 3. Install Java 11 and download Neo4j Community 4.4 (the server is started in a later cell)
!apt-get update > /dev/null
!apt-get install -y openjdk-11-jdk > /dev/null
# wget reports progress on stderr, so use -q rather than redirecting stdout
!wget -q -O neo4j.tar.gz https://dist.neo4j.org/neo4j-community-4.4.14-unix.tar.gz
!tar -xf neo4j.tar.gz
!mv neo4j-community-4.4.14 neo4j

In [6]:
# Import Libraries
import os
import json
import torch
import faiss
import numpy as np
import pandas as pd
from typing import List, Dict, Any
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from neo4j import GraphDatabase
import re
import spacy
from tqdm.notebook import tqdm

### **Configure and Start Neo4j**
We start by configuring and launching the Neo4j graph database instance. The cell below sets Neo4j to listen on all interfaces for Bolt (Neo4j's binary protocol, port 7687) and HTTP (port 7474), disables authentication, and then starts the service.

In [8]:
# Configure and start Neo4j
!echo "dbms.connector.bolt.listen_address=0.0.0.0:7687" >> neo4j/conf/neo4j.conf
!echo "dbms.connector.http.listen_address=0.0.0.0:7474" >> neo4j/conf/neo4j.conf
!echo "dbms.security.auth_enabled=false" >> neo4j/conf/neo4j.conf
!echo "dbms.default_listen_address=0.0.0.0" >> neo4j/conf/neo4j.conf
!cd neo4j && bin/neo4j start > /dev/null

The next cell prints a startup message and then pauses for 10 seconds so the server has time to initialize before the first connection attempt.

In [9]:
import time
print("Starting Neo4j server...")
time.sleep(10)  # Give Neo4j time to start

Starting Neo4j server...
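
A fixed 10-second pause can be too short on a slow machine and wasteful on a fast one. The sketch below polls the Bolt port until it accepts TCP connections. `wait_for_port` is a hypothetical helper that is not part of the notebook, and an open port only shows that something is listening; it does not guarantee the database is ready for queries.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and retry
            time.sleep(interval)
    return False

# Possible use in place of the fixed sleep (assumes the Bolt port configured above):
# if not wait_for_port("localhost", 7687):
#     raise RuntimeError("Neo4j did not open the Bolt port in time")
```

Polling returns as soon as the port opens, so the usual case is faster than a fixed sleep while slow starts still get the full timeout.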


### **Loading the Dataset**
The code below loads the first 100 rows of the `zeroshot/cybersecurity-corpus` training split from Hugging Face, a subset chosen to keep the demo fast, and prints the number of documents loaded.

In [10]:
# Load dataset
print("Loading cybersecurity corpus...")
dataset = load_dataset("zeroshot/cybersecurity-corpus", split="train[:100]")  # Load a subset for demo
print(f"Dataset loaded with {len(dataset)} documents")

Loading cybersecurity corpus...
Dataset loaded with 100 documents


### **Data Preparation**
This code extracts the `text` column from the loaded dataset into `documents` and builds `document_ids`, a list of integer identifiers (0 to 99), one per document.

In [11]:
# Prepare data
documents = dataset['text']
document_ids = list(range(len(documents)))

This code iterates through the documents, converts any non-string entries to strings, and cleans them by replacing newlines and carriage returns with spaces, removing extra whitespace, and stripping leading/trailing whitespace before storing the cleaned documents. Finally, it prints the number of processed documents.

In [12]:
# Clean documents
clean_documents = []
for doc in documents:
    if not isinstance(doc, str):
        doc = str(doc)
    doc = doc.replace('\n', ' ').replace('\r', ' ')
    doc = re.sub(r'\s+', ' ', doc)
    doc = doc.strip()
    clean_documents.append(doc)

documents = clean_documents
print(f"Processed {len(documents)} documents")

Processed 100 documents
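
The cleaning step can also be written as a small function, which makes it easy to try on individual strings. This is a minimal sketch rather than the notebook's own code; `clean_text` and the sample inputs are made up for illustration.

```python
import re

def clean_text(doc) -> str:
    """Coerce to str and collapse every whitespace run to a single space."""
    # \s+ already matches \n and \r, so no separate replace calls are needed
    return re.sub(r"\s+", " ", str(doc)).strip()

# Hypothetical inputs: stray line breaks, a non-string value, mixed whitespace
sample = ["  Beware\r\nemail   malware ", 42, "one\ttwo\n\nthree"]
print([clean_text(d) for d in sample])
# prints ['Beware email malware', '42', 'one two three']
```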


### **Embedding**
Below, we generate dense vector embeddings for the cleaned documents with the `all-mpnet-base-v2` SentenceTransformer model and print how many embeddings were produced and their dimension (768).

In [13]:
# Generate embeddings
print("Generating document embeddings...")
embedder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
embeddings = embedder.encode(documents, show_progress_bar=True)
print(f"Generated {len(embeddings)} embeddings of dimension {embeddings.shape[1]}")

Generating document embeddings...
Generated 100 embeddings of dimension 768


### **Creating the FAISS Index**
Next we create a FAISS index, a data structure for similarity search. `IndexFlatL2` stores the vectors as they are and compares a query against every one of them by L2 (Euclidean) distance, so a smaller distance means a closer match. The index is initialized with the embedding dimension and then filled with the document embeddings.

In [14]:
# Create FAISS index
print("Creating FAISS index...")
embedding_size = embeddings.shape[1]
index = faiss.IndexFlatL2(embedding_size)
index.add(np.array(embeddings).astype('float32'))
print(f"Added {index.ntotal} vectors to FAISS index")

Creating FAISS index...
Added 100 vectors to FAISS index
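
To make the meaning of the returned scores concrete, the sketch below reproduces in plain Python what an exhaustive L2 index computes: the squared Euclidean distance from the query to every stored vector, sorted ascending. It is an illustration only, not FAISS itself; `flat_l2_search` and the toy vectors are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors, query, top_k):
    """Exact search: score every stored vector, return the top_k closest as (distance, index)."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    return scored[:top_k]

# Toy 2-dimensional "embeddings"
vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]
print(flat_l2_search(vectors, query, top_k=2))
# prints [(0.0, 0), (2.0, 1)]
```

For unit-length vectors the squared L2 distance equals 2 − 2·cos(θ), so ranking by ascending distance gives the same order as ranking by descending cosine similarity.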


The `document_map` dictionary maps each document's index (as a string) to a dictionary holding the document's ID, text, and embedding index. Because the embeddings were added to FAISS in document order, an index returned by a FAISS search can be looked up here directly.

In [15]:
# Create document map
document_map = {str(i): {"id": i, "text": documents[i], "embedding_index": i} for i in range(len(documents))}

### **Connect to the Neo4j Database**
We create a Neo4j driver for the local Bolt address and then print a confirmation message. Authentication was disabled in the server configuration above, so the password is left empty. The driver is the communication channel for all later interaction with the database.

In [16]:
# Set up Neo4j connection
print("Connecting to Neo4j...")
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", ""))
driver.verify_connectivity()  # The driver connects lazily; fail here if the server is unreachable
print("Connected to Neo4j")

Connecting to Neo4j...
Connected to Neo4j


### **Reset the Database and Create Constraints**

Here, we define two functions, `clear_database` and `create_constraints`, to manage the Neo4j database. `clear_database` deletes all nodes and relationships, while `create_constraints` enforces unique `id` values on `Document` nodes and unique `name` values on `Entity` nodes. Both run inside a Neo4j session, which resets the database and establishes the uniqueness constraints.

In [17]:
# Set up the knowledge graph
def clear_database(tx):
    # Delete every node together with its relationships
    tx.run("MATCH (n) DETACH DELETE n")

def create_constraints(tx):
    try:
        # Named constraints with IF NOT EXISTS (Neo4j 4.4 syntax)
        tx.run("CREATE CONSTRAINT document_id IF NOT EXISTS ON (d:Document) ASSERT d.id IS UNIQUE")
        tx.run("CREATE CONSTRAINT entity_name IF NOT EXISTS ON (e:Entity) ASSERT e.name IS UNIQUE")
    except Exception:
        # Fallback for older Neo4j versions
        tx.run("CREATE CONSTRAINT ON (d:Document) ASSERT d.id IS UNIQUE")
        tx.run("CREATE CONSTRAINT ON (e:Entity) ASSERT e.name IS UNIQUE")

with driver.session() as session:
    # execute_write replaces write_transaction, which is deprecated in the 5.x driver
    session.execute_write(clear_database)
    session.execute_write(create_constraints)
    print("Database cleared and constraints created")


Database cleared and constraints created


### **Map Keywords to Entity Types**
This code defines a dictionary, `cs_terms`, which maps cybersecurity keywords (such as "malware" and "phishing") to broader entity types (such as `THREAT` and `ATTACK_VECTOR`). These types are stored on the entity nodes in the graph.

In [18]:
# Define cybersecurity entity types
cs_terms = {
    "malware": "THREAT",
    "ransomware": "THREAT",
    "phishing": "ATTACK_VECTOR",
    "vulnerability": "VULNERABILITY",
    "exploit": "ATTACK_TECHNIQUE",
    "cve": "VULNERABILITY",
    "ddos": "ATTACK_TECHNIQUE",
    "firewall": "DEFENSE_MECHANISM",
    "encryption": "DEFENSE_TECHNIQUE",
    "authentication": "DEFENSE_TECHNIQUE"
}
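
A keyword table like this is typically applied to text with a simple matcher. The sketch below shows one possible variant that uses word boundaries, so a short term such as "cve" cannot match inside a longer token. It is an assumption-laden illustration, not the notebook's extraction code: `terms` is a hypothetical subset of the table, and `match_terms` and the sample sentence are made up.

```python
import re

# Hypothetical subset of the keyword table, for illustration only
terms = {"cve": "VULNERABILITY", "exploit": "ATTACK_TECHNIQUE", "phishing": "ATTACK_VECTOR"}

# One alternation pattern; \b keeps "cve" from matching inside a longer token
pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)

def match_terms(text):
    """Return each matched keyword once, in order of first appearance."""
    seen = {}
    for m in pattern.finditer(text):
        name = m.group(1).lower()
        seen.setdefault(name, terms[name])
    return [{"name": n, "type": t} for n, t in seen.items()]

print(match_terms("New Phishing kit uses an exploit; see CVE-2019-0708"))
```

The trade-off is that whole-word matching also skips inflected forms such as "exploits", which a plain substring test would catch.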

Next we load spaCy's small English pipeline, `en_core_web_sm`. Note that entity extraction in this notebook relies only on keyword matches against `cs_terms`; the `nlp` object is loaded here but is not called by the later cells.

In [19]:
# Load spaCy model for entity extraction
nlp = spacy.load("en_core_web_sm")

### **Building the Knowledge Graph**

The functions below extract cybersecurity entities from text and build the knowledge graph in Neo4j. `extract_entities` finds entities by substring match against `cs_terms`; the remaining helpers create `Document` and `Entity` nodes, link them with `MENTIONS` relationships, and add simplified entity-to-entity relationships such as `EXPLOITS` and `MITIGATES`. The final loop walks through the documents, extracts entities, and uses Neo4j transactions to populate the graph.

In [21]:
# Extract entities and build knowledge graph
def extract_entities(text):
    text_lower = text.lower()
    entities = []

    # Check for domain-specific terms
    for term, entity_type in cs_terms.items():
        if term in text_lower:
            entities.append({"name": term, "type": entity_type})

    return entities

def create_document_node(tx, doc_id, doc_text):
    tx.run(
        "CREATE (d:Document {id: $id, text: $text})",
        id=str(doc_id), text=doc_text[:500]  # Truncate text to avoid large nodes
    )

def create_entity_node(tx, entity_name, entity_type):
    tx.run(
        "MERGE (e:Entity {name: $name, type: $type})",
        name=entity_name.lower(), type=entity_type
    )

def create_document_entity_relationship(tx, doc_id, entity_name):
    tx.run(
        """
        MATCH (d:Document {id: $doc_id})
        MATCH (e:Entity {name: $entity_name})
        MERGE (d)-[:MENTIONS]->(e)
        """,
        doc_id=str(doc_id), entity_name=entity_name.lower()
    )

def create_entity_entity_relationship(tx, entity1_name, entity2_name, relation_type):
    tx.run(
        f"""
        MATCH (e1:Entity {{name: $entity1_name}})
        MATCH (e2:Entity {{name: $entity2_name}})
        MERGE (e1)-[:{relation_type}]->(e2)
        """,
        entity1_name=entity1_name.lower(), entity2_name=entity2_name.lower()
    )

print("Building knowledge graph...")
with driver.session() as session:
    for doc_id, doc in tqdm(zip(document_ids, documents), total=len(documents)):
        # Create document node
        session.execute_write(create_document_node, doc_id, doc)

        # Extract entities
        entities = extract_entities(doc)

        # Create entity nodes and relationships
        for entity in entities:
            session.execute_write(create_entity_node, entity["name"], entity["type"])
            session.execute_write(create_document_entity_relationship, doc_id, entity["name"])

        # Create entity-entity relationships (simplified).
        # Check every ordered pair: entities arrive in cs_terms order, so looking
        # only at later items would never pair a firewall with a preceding threat.
        for entity1 in entities:
            for entity2 in entities:
                if entity1["name"] == entity2["name"]:
                    continue
                if entity1["type"] == "THREAT" and entity2["type"] == "VULNERABILITY":
                    session.execute_write(create_entity_entity_relationship,
                                          entity1["name"], entity2["name"], "EXPLOITS")
                elif entity1["type"] == "DEFENSE_MECHANISM" and entity2["type"] == "THREAT":
                    session.execute_write(create_entity_entity_relationship,
                                          entity1["name"], entity2["name"], "MITIGATES")

print("Knowledge graph built successfully")

Building knowledge graph...
Knowledge graph built successfully
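
The entity-to-entity rule can be isolated as a pure function, which makes it possible to check without a database which relationships a set of entities would produce. The sketch below is one way to express it and assumes the same two rules used in the graph-building cell; `RULES` and `derive_relations` are hypothetical names.

```python
from itertools import permutations

# Assumed rule table mirroring the two rules used when building the graph
RULES = {
    ("THREAT", "VULNERABILITY"): "EXPLOITS",
    ("DEFENSE_MECHANISM", "THREAT"): "MITIGATES",
}

def derive_relations(entities):
    """Return (source, relation, target) triples for every ordered pair matching a rule."""
    triples = []
    for e1, e2 in permutations(entities, 2):
        relation = RULES.get((e1["type"], e2["type"]))
        if relation and e1["name"] != e2["name"]:
            triples.append((e1["name"], relation, e2["name"]))
    return triples

entities = [
    {"name": "malware", "type": "THREAT"},
    {"name": "cve", "type": "VULNERABILITY"},
    {"name": "firewall", "type": "DEFENSE_MECHANISM"},
]
print(derive_relations(entities))
# prints [('malware', 'EXPLOITS', 'cve'), ('firewall', 'MITIGATES', 'malware')]
```

Iterating over ordered pairs matters here: a rule such as `DEFENSE_MECHANISM` to `THREAT` only fires if the defense entity is also tried as the source, regardless of where it appears in the list.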


### **Perform Retrieval (RAG)**
This code defines the retrieval functions, which combine vector search with graph search in Neo4j. `vector_search` finds the nearest documents in the FAISS index, `extract_query_entities` identifies `cs_terms` keywords in the query, `graph_search` retrieves documents that mention those entities, and `hybrid_search` merges both result sets. Note that the two scores are on different scales: vector scores are L2 distances, where lower means closer, while graph scores are match counts, where higher means more relevant, so the descending sort in `hybrid_search` does not rank the two sources consistently.

In [22]:
# Define the GraphRAG query functions
def vector_search(query_text, top_k=5):
    # Convert query to embedding
    query_embedding = embedder.encode(query_text)

    # Search FAISS
    query_embedding = np.array([query_embedding]).astype('float32')
    distances, indices = index.search(query_embedding, top_k)

    # Get results
    results = []
    for i, idx in enumerate(indices[0]):
        doc_id = str(int(idx))
        if doc_id in document_map:
            doc = document_map[doc_id]
            results.append({
                "document_id": doc_id,
                "text": doc["text"],
                "score": float(distances[0][i]),
                "source": "vector"
            })

    return results

def extract_query_entities(query_text):
    entities = []
    query_lower = query_text.lower()

    for term, entity_type in cs_terms.items():
        if term in query_lower:
            entities.append({"name": term, "type": entity_type})

    return entities

def graph_search(entities, top_k=5):
    if not entities:
        return []

    entity_names = [e["name"].lower() for e in entities]

    # Pass values as query parameters instead of formatting them into the Cypher string
    query = """
    MATCH (d:Document)-[:MENTIONS]->(e:Entity)
    WHERE e.name IN $names
    RETURN d.id AS document_id, d.text AS text,
           e.name AS entity_name, e.type AS entity_type,
           count(e) AS relevance
    ORDER BY relevance DESC
    LIMIT $top_k
    """

    results = []
    with driver.session() as session:
        records = session.run(query, names=entity_names, top_k=top_k)
        for record in records:
            results.append({
                "document_id": record["document_id"],
                "text": record["text"],
                "entity": record["entity_name"],
                "entity_type": record["entity_type"],
                "score": record["relevance"],
                "source": "graph"
            })

    return results

def hybrid_search(query_text, top_k=5):
    # Get vector results
    vector_results = vector_search(query_text, top_k)

    # Get graph results
    query_entities = extract_query_entities(query_text)
    graph_results = graph_search(query_entities, top_k)

    # Combine results
    combined_results = {}

    # Add vector results
    for result in vector_results:
        doc_id = result["document_id"]
        combined_results[doc_id] = result

    # Add graph results
    for result in graph_results:
        doc_id = result["document_id"]
        if doc_id not in combined_results:
            combined_results[doc_id] = result

    # Convert back to list and sort by score
    results_list = list(combined_results.values())
    results_list.sort(key=lambda x: x["score"], reverse=True)

    return results_list[:top_k]
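
Because L2 distances and match counts are not directly comparable, one common alternative is to merge the two result lists by rank instead of by raw score. The sketch below uses reciprocal rank fusion, a different technique from the score sort in `hybrid_search`. It assumes each list is already ordered best first; the document IDs are hypothetical and `k=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_k=5):
    """Fuse several best-first lists of document IDs by summing 1 / (k + rank)."""
    fused = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score is better; ties keep first-seen order (sort is stable)
    ordered = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ordered[:top_k]]

vector_ranking = ["42", "93", "15"]   # hypothetical IDs, closest first
graph_ranking = ["93", "7"]           # hypothetical IDs, most relevant first
print(reciprocal_rank_fusion([vector_ranking, graph_ranking]))
# prints ['93', '42', '7', '15']
```

A document that appears in both lists accumulates two contributions, so agreement between the vector and graph retrievers moves it toward the top.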

### **Query Examples**
This code defines `query_graphrag`, which runs a hybrid search for a query and prints each result's document ID, score, source, and matched entity (for graph results), followed by a text snippet of up to 300 characters.

In [23]:
# Demo query function
def query_graphrag(query_text, top_k=5):
    print(f"\nQuery: {query_text}")
    print(f"Retrieving top {top_k} results...")

    results = hybrid_search(query_text, top_k)

    print(f"Found {len(results)} results:\n")
    for i, result in enumerate(results):
        print(f"Result {i+1}:")
        print(f"Document ID: {result['document_id']}")
        print(f"Score: {result.get('score', 'N/A')}")
        print(f"Source: {result.get('source', 'unknown')}")

        if 'entity' in result:
            print(f"Entity: {result['entity']} ({result['entity_type']})")

        # Show a snippet of the text
        text = result['text']
        if len(text) > 300:
            text = text[:300] + "..."
        print(f"Text: {text}")
        print("---")

    return results

In [24]:
# Run some example queries
print("\nExample queries:")
query_graphrag("What are common phishing techniques?")
query_graphrag("How to protect against ransomware?")
query_graphrag("What are vulnerabilities in authentication systems?")
query_graphrag("What is the relationship between malware and firewalls?")


Example queries:

Query: What are common phishing techniques?
Retrieving top 5 results...
Found 5 results:

Result 1:
Document ID: 15
Score: 1.361126184463501
Source: vector
Text: Lessons Learned from the Facebook Breach: Why Logic Errors Are So Hard to Catch https://t.co/IAaTx3QZiY by @JGamblin #facebook
---
Result 2:
Document ID: 9
Score: 1.3301184177398682
Source: vector
Text: The Tricky Balance in Declining or Accepting Online Payments - https://t.co/aiPOpOnk59 #Fraud #cybercrime
---
Result 3:
Document ID: 93
Score: 1.3050832748413086
Source: vector
Text: Beware email malware posing as an 'overdue invoice': http://t.co/b6KDAzMSeF #malware http://t.co/NqmCL28Xck
---
Result 4:
Document ID: 42
Score: 1.2703430652618408
Source: vector
Text: Summer: A Time for Vacations &amp; Cyberattacks? https://t.co/80wZg2XDXC by @roblemos #summer2019 #cybercrime #phishing #malware
---
Result 5:
Document ID: 70
Score: 1.2587177753448486
Source: vector
Text: RT @CISecurity: Positive this list could b

[{'document_id': '25',
  'text': 'Good news and bad news about the BadUSB malware...http://t.co/L9IDgg8q0u http://t.co/QMJ5aJ8rLI',
  'score': 1.4803177118301392,
  'source': 'vector'},
 {'document_id': '8',
  'text': 'New @ESET Research: Fake or Fake: Keeping up with #OceanLotus decoys #malware #cybersecurity #InfoSec https://t.co/5pRvuRRMis',
  'score': 1.4726920127868652,
  'source': 'vector'},
 {'document_id': '17',
  'text': 'RT @CiscoSecurity: Unless security solutions talk to each other, legitimate threats slip through the cracks. That’s a good reason to shift…',
  'score': 1.4114975929260254,
  'source': 'vector'},
 {'document_id': '42',
  'text': 'Summer: A Time for Vacations &amp; Cyberattacks? https://t.co/80wZg2XDXC by @roblemos #summer2019 #cybercrime #phishing #malware',
  'score': 1.3853330612182617,
  'source': 'vector'},
 {'document_id': '68',
  'text': 'A US policy shift emphasizes forward defense against cyber operations. Supporters say this can reduce attacks; criti

### **Closing the Session**
This code closes the Neo4j driver to release its connections and then prints a message indicating that the demo is complete.

In [25]:
# Cleanup
driver.close()
print("\nDemo completed successfully!")


Demo completed successfully!
