# Create a database with Neo4j

### Prerequisites

1) Install Neo4j 4.4.11 locally ( https://neo4j.com/download/ ).
2) Confirm 'neo4j-admin' is on your PATH, or specify its absolute path.
3) Have a .dump file available, either from a previous 'neo4j-admin dump' or from another source (this notebook downloads one).
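
The second prerequisite can be checked from Python. The sketch below is one possible way to do it; `find_neo4j_admin` is a hypothetical helper name, not part of Neo4j. It looks on PATH first and then falls back to the `bin` directory of an install location that you pass in.

```python
import os
import shutil


def find_neo4j_admin(neo4j_home=None):
    """Return a path to neo4j-admin, or None if it cannot be found.

    Hypothetical helper: checks PATH first, then <neo4j_home>/bin.
    """
    on_path = shutil.which("neo4j-admin")
    if on_path:
        return on_path
    if neo4j_home:
        candidate = os.path.join(neo4j_home, "bin", "neo4j-admin")
        # Accept the candidate only if it is an existing, executable file
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

A `None` result suggests that either PATH or the install location needs correcting before the load step is attempted.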

### Load a new Neo4j 4.4.11 database from a .dump file

Create or overwrite a local database from the .dump file.

IMPORTANT: 
 - If you have an active Neo4j server running, you may need to stop it first 
   so that 'neo4j-admin load' doesn't conflict with a running database.
 - The database name is the name you want to store your data under.
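
Because the database name is chosen freely, it can help to sanity-check it before running the load. The sketch below encodes the naming rules as I recall them from the Neo4j 4.x documentation (3 to 63 characters, starting with an ASCII letter, followed by ASCII letters, digits, dots, or dashes, with the `system` prefix reserved). Treat the pattern as an assumption and verify it against the documentation for your version; `looks_like_valid_db_name` is a hypothetical helper.

```python
import re

# Assumed pattern (verify against the Neo4j docs for your version):
# one ASCII letter, then 2-62 ASCII letters, digits, dots, or dashes.
_DB_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.\-]{2,62}")


def looks_like_valid_db_name(name):
    """Return True if name matches the assumed naming pattern."""
    if name.lower().startswith("system"):
        return False  # assumed: names with the "system" prefix are reserved
    return _DB_NAME_RE.fullmatch(name) is not None
```

A name containing dashes, such as the one used in this notebook, has to be wrapped in backticks when it appears in a Cypher statement.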

Use Cases:
 - Migrating data from one instance to another.
 - Replacing a local dev DB with a known data set.

In [None]:
import os
import subprocess

from IPython.display import display, Markdown
import requests

# ------------------------------
# Step 1: Define your configuration and download the DB file
# ------------------------------

NEO4J_VERSION = "4.4.11"  # for reference or checks
NEO4J_HOME = "/path/to/neo4j-4.4.11"  # e.g. your local Neo4j 4.4.11 install
DATABASE_NAME = "dr-neo4j-fraud-ai-accelerator"  # name of the DB to create
DUMP_FILE_PATH = (
    "./fraud-detection-neo4j-4-4-11-Feb-6-2025-01-35-39.dump"  # path to your .dump file
)

# Download the dump file from S3
DUMP_URL = "https://s3.us-east-1.amazonaws.com/datarobot_public_datasets/ai_accelerators/fraud-detection-neo4j-4-4-11-Feb-6-2025-01-35-39.dump"

# Stream the response so the whole file is never held in memory;
# the timeout keeps the cell from hanging if the server stops responding.
with requests.get(DUMP_URL, stream=True, timeout=60) as response:
    response.raise_for_status()  # raise HTTPError on a 4xx/5xx status
    with open(DUMP_FILE_PATH, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:  # filter out keep-alive chunks
                f.write(chunk)
print(f"File downloaded successfully and saved to {DUMP_FILE_PATH}")

# Build the full path to neo4j-admin from NEO4J_HOME, so it works even if neo4j-admin is not on PATH:
NEO4J_ADMIN_CMD = os.path.join(NEO4J_HOME, "bin", "neo4j-admin")

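# If your environment uses systemd or another service manager to start/stop Neo4j, adapt as needed,
# e.g. "sudo systemctl stop neo4j", or "neo4j stop" if you're using older scripts.
# Note: these strings are split on whitespace before they are run, so NEO4J_HOME must not contain spaces.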
STOP_NEO4J_CMD = os.path.join(NEO4J_HOME, "bin", "neo4j") + " stop"
START_NEO4J_CMD = os.path.join(NEO4J_HOME, "bin", "neo4j") + " start"
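
Before loading, it may be worth confirming that the download produced a complete, non-empty file. The sketch below computes the size and SHA-256 digest of a file; `describe_file` is a hypothetical helper. The source does not publish a checksum for the dump, so the digest is only useful for comparing against a value you obtain separately or for comparing two downloads.

```python
import hashlib


def describe_file(path, chunk_size=8192):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large dumps are not loaded into memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

For example, `size, sha = describe_file(DUMP_FILE_PATH)` followed by a check that `size > 0` would catch an empty download before `neo4j-admin load` runs.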

## Stop any running Neo4j instance

This step is optional: skip it if no Neo4j server is running.

In [None]:
print("Stopping any running Neo4j instance (optional if not running)...")

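try:
    subprocess.run(STOP_NEO4J_CMD.split(), check=True)
    print("Neo4j service stopped.")
except FileNotFoundError as e:
    # The neo4j script itself was not found, so NEO4J_HOME is probably wrong
    print("Warning: neo4j executable not found; check NEO4J_HOME.")
    print(e)
except subprocess.CalledProcessError as e:
    print("Warning: Attempt to stop Neo4j failed or Neo4j was not running.")
    print(e)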
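
To confirm that the server is really down before loading, one option is to test whether anything still accepts connections on the Bolt port. The sketch below assumes the default Bolt port 7687 on localhost; adjust both if your configuration differs. `is_port_open` is a hypothetical helper.

```python
import socket


def is_port_open(host="localhost", port=7687, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False
```

If `is_port_open()` still returns `True` after the stop command, some process (possibly a different Neo4j instance) is still listening on that port.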

## Load the database

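The next cell runs a command of the form `neo4j-admin load --database=DATABASE_NAME --from=DUMP_FILE_PATH --force`.

The `--force` flag overwrites the database if it already exists. In 4.4.x, `neo4j-admin load` is the command for restoring a database from a dump file. If the database did not exist before the load, it may also need to be created with `CREATE DATABASE` (run against the `system` database) once Neo4j is running.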

In [None]:
load_cmd = [
    NEO4J_ADMIN_CMD,
    "load",
    "--database",
    DATABASE_NAME,
    "--from",
    DUMP_FILE_PATH,
    "--force",
]

print("\nLoading DB with command:", " ".join(load_cmd))
try:
    subprocess.run(load_cmd, check=True)
    print(f"Successfully loaded {DUMP_FILE_PATH} into '{DATABASE_NAME}'.")
except subprocess.CalledProcessError as e:
    print("Error while loading DB from dump:")
    print(e)
    raise SystemExit("Failed to load database.")
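
Printing a `CalledProcessError` shows the command and its exit status but not what the tool wrote to stderr, and in a notebook the uncaptured output of a subprocess may not appear in the cell at all. The sketch below shows one way to capture both streams so they can be printed; `run_and_capture` is a hypothetical helper, not something the notebook defines.

```python
import subprocess


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

Called as `rc, out, err = run_and_capture(load_cmd)`, a non-zero `rc` together with the text in `err` usually says more about a failed load than the exception message alone.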

## (Re)start Neo4j

In [None]:
print("\nStarting Neo4j service...")

try:
    subprocess.run(START_NEO4J_CMD.split(), check=True)
    print("Neo4j started successfully.")
except subprocess.CalledProcessError as e:
    print("Error starting Neo4j service:")
    print(e)
    raise SystemExit("Failed to start Neo4j.")
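
The start command can return before the server accepts connections, so a short wait may be needed before connecting. The sketch below polls a TCP port until it opens or a deadline passes. It assumes the default Bolt port 7687 on localhost; `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host="localhost", port=7687, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet; give up once the deadline has passed
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that the server is listening; the individual database may still need a moment to come online.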

## Provide a summary

In [None]:
msg = f"""
**Summary**:
- Neo4j {NEO4J_VERSION} home: `{NEO4J_HOME}`
- Database name: `{DATABASE_NAME}`
- Dump file used: `{DUMP_FILE_PATH}`

You have now loaded and started Neo4j {NEO4J_VERSION}. Connect to the
`{DATABASE_NAME}` database to confirm that your data is there.

**Example** (run in Neo4j Browser):

- `:use {DATABASE_NAME}`
- `MATCH (n) RETURN n LIMIT 10;`

**Done.**
"""
display(Markdown(msg))
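
The same check can be made from Python instead of Neo4j Browser. The sketch below assumes that the official `neo4j` Python driver is installed in a version compatible with your server. The URI, user name, and password in the usage note are placeholders, and both function names are hypothetical.

```python
def count_nodes(session):
    """Return the number of nodes visible through an open driver session."""
    record = session.run("MATCH (n) RETURN count(n) AS node_count").single()
    return record["node_count"]


def count_nodes_in_database(uri, user, password, database):
    """Open a driver, count the nodes in `database`, then close the driver."""
    # Imported lazily so the rest of the notebook works without the driver installed
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session(database=database) as session:
            return count_nodes(session)
    finally:
        driver.close()
```

A call such as `count_nodes_in_database("bolt://localhost:7687", "neo4j", "<your-password>", DATABASE_NAME)` should return a non-zero count if the dump was loaded into that database.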