# Step 5 — Importing HALDxAI Data into PostgreSQL and Neo4j


This step explains how to:

1. Reset PostgreSQL schemas
2. Reset the Neo4j database
3. Import HALDxAI database tables into PostgreSQL
4. Import nodes & relations into Neo4j
5. Export backups of the final databases

These steps prepare the HALDxAI Knowledge Graph for use in APIs, the WebApp, and large-scale querying.

## 1. Reset PostgreSQL Schemas

This drops all existing tables and recreates the `public` and `hald` schemas from scratch.

```python
from pathlib import Path
import pandas as pd

from haldxai.database.pg_utils import wipe as pg_wipe, wipe_schema

ROOT      = Path("/path/to/HALDxAI-Project")
DB_FOLDER = ROOT / "data" / "database"

# ① Drop and recreate both schemas (public + hald)
pg_wipe(drop_schema=True)      # or pg_wipe() to keep schema
wipe_schema("hald")            # remove hald schema completely before reloading schema_hald.sql
```

Expected output:

```
✅ [PG] public schema dropped & recreated
✅ [hald] schema dropped & recreated
```

After running this, execute your schema SQL:

```bash
psql -h localhost -p 5432 -U postgres -d postgres \
    -f /path/to/HALDxAI-Project/configs/schema_hald.sql
```

## 2. Reset Neo4j Database

This removes all nodes and relationships from Neo4j.

```python
from pathlib import Path
from haldxai.database.neo4j_utils import wipe as neo_wipe

ROOT      = Path("/path/to/HALDxAI-Project")
DB_FOLDER = ROOT / "data" / "database"

# ① Clear Neo4j database
neo_wipe()
```

## 3. Import Data into PostgreSQL

HALDxAI provides a workflow module for loading CSV files into PostgreSQL.

### Common usage patterns

#### ① Import all tables (replace existing data)

```bash
python -m haldxai.workflow.import_to_pg \
    --dir /path/to/HALDxAI-Project/data/database \
    --mode replace
```

#### ② Import a single table (append mode)

```bash
python -m haldxai.workflow.import_to_pg --table articles --mode append
```

#### ③ Import multiple tables at once

```bash
python -m haldxai.workflow.import_to_pg --table articles,entity_catalog_ext --mode replace
```
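The difference between the two modes can be pictured with a toy example. This sketch assumes that `replace` empties the target table before loading and that `append` adds rows to what is already there. SQLite from the standard library stands in for PostgreSQL, and `load_rows`, `count_rows`, and the `title` column are hypothetical names used only for illustration, not part of `import_to_pg`.

```python
import sqlite3


def load_rows(conn, table, rows, mode):
    """Load (id, title) rows into `table` using an assumed replace/append mode."""
    if mode not in ("replace", "append"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "replace":
        # Assumed semantics: remove existing rows first, then load.
        conn.execute(f"DELETE FROM {table}")
    conn.executemany(f"INSERT INTO {table} (id, title) VALUES (?, ?)", rows)
    conn.commit()


def count_rows(conn, table):
    """Return the number of rows currently in `table`."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")

load_rows(conn, "articles", [(1, "a"), (2, "b")], mode="replace")
load_rows(conn, "articles", [(3, "c")], mode="append")
after_append = count_rows(conn, "articles")    # two old rows plus one new row

load_rows(conn, "articles", [(4, "d")], mode="replace")
after_replace = count_rows(conn, "articles")   # only the newly loaded row remains
```

Under these assumptions, running an `append` import twice with the same file would duplicate rows, which is why `replace` is the safer choice for a full reload.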

To preview the database status:

```python
from haldxai.database.inspectors import preview_postgres

preview_postgres()
```

## 4. Import Data into Neo4j

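Before importing, check the node and relation files for referential integrity:

* `validate_graph` checks that every `start_id` and `end_id` in the relations file refers to an existing node.
* If any references are missing, `clean_relationships` writes a cleaned relations file (`relations.cleaned.csv`).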

```python
from pathlib import Path
from haldxai.database.validate_utils import validate_graph, clean_relationships
from haldxai.database.inspectors import preview_postgres, preview_neo4j

ROOT = Path("/path/to/HALDxAI-Project")

# 1) Validate node & relation integrity
ok, missing_df = validate_graph(
    ROOT,
    "data/database/nodes.csv",
    "data/database/relations.csv"
)

# 2) Clean relations if needed
if not ok:
    clean_relationships(
        ROOT,
        nodes_path="data/database/nodes.csv",
        rels_path="data/database/relations.csv",
        output_path="data/database/relations.cleaned.csv",
    )
```
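To make the integrity check concrete, here is a minimal sketch of the idea using only the standard library. It is not the implementation of `validate_graph` or `clean_relationships`. The function `find_dangling` and the `id`, `name`, and `type` columns are hypothetical, and only `start_id` and `end_id` are taken from the description above.

```python
import csv
import io


def find_dangling(nodes_csv, rels_csv, id_col="id"):
    """Split relations into those whose endpoints exist and those that dangle."""
    # Collect every known node ID (the column name is an assumption).
    node_ids = {row[id_col] for row in csv.DictReader(nodes_csv)}

    kept, dangling = [], []
    for row in csv.DictReader(rels_csv):
        if row["start_id"] in node_ids and row["end_id"] in node_ids:
            kept.append(row)
        else:
            dangling.append(row)
    return kept, dangling


# Tiny in-memory stand-ins for nodes.csv and relations.csv.
nodes = io.StringIO("id,name\nn1,geneA\nn2,diseaseB\n")
rels = io.StringIO(
    "start_id,end_id,type\n"
    "n1,n2,ASSOCIATED_WITH\n"
    "n1,n9,ASSOCIATED_WITH\n"   # n9 is not a known node
)

kept, dangling = find_dangling(nodes, rels)
```

Here `kept` holds the relations that are safe to import, and `dangling` holds the ones a cleaning step would drop.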

### Neo4j Import Command

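Use `neo4j-admin database import full`. If validation passed and no `relations.cleaned.csv` was written, pass `relations.csv` to `--relationships` instead: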

```bash
neo4j-admin database import full neo4j \
    --overwrite-destination=true \
    --nodes=/path/to/HALDxAI-Project/data/database/nodes.csv \
    --relationships=/path/to/HALDxAI-Project/data/database/relations.cleaned.csv
```
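Because the cleaned relations file is only written when validation fails, a small helper can pick the right input before running the import. The sketch below is one possible approach, not part of HALDxAI. The function `build_import_cmd` is hypothetical, and it assembles the same flags as the command above without executing anything.

```python
from pathlib import Path
import tempfile


def build_import_cmd(db_dir, database="neo4j"):
    """Build the neo4j-admin argument list, preferring the cleaned relations file."""
    db_dir = Path(db_dir)
    cleaned = db_dir / "relations.cleaned.csv"
    # Fall back to the original file when no cleaned copy exists.
    rels = cleaned if cleaned.exists() else db_dir / "relations.csv"
    return [
        "neo4j-admin", "database", "import", "full", database,
        "--overwrite-destination=true",
        f"--nodes={db_dir / 'nodes.csv'}",
        f"--relationships={rels}",
    ]


# Demonstrate both cases in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cmd_without_cleaned = build_import_cmd(tmp)
    (Path(tmp) / "relations.cleaned.csv").write_text("start_id,end_id\n")
    cmd_with_cleaned = build_import_cmd(tmp)
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the paths point at the real `data/database` folder.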

Preview a sample:

```python
from haldxai.database.inspectors import preview_neo4j

preview_neo4j(sample=5)
```

## 5. Export PostgreSQL Data (Backup)

To create a PostgreSQL dump:

```bash
pg_dump -U postgres -d postgres -n hald -F d -j 4 -f hald_dump_dir
```

This command:

* Dumps the entire `hald` schema, including all of its tables (`-n hald`)
* Runs four parallel dump jobs (`-j 4`)
* Writes a directory-format dump (`-F d`), which parallel dumps require and which suits large databases
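For scripted backups, the dump command can be assembled in Python and paired with a matching restore. The sketch below only builds the command lines and does not run them. The helper names are hypothetical, and the `pg_restore` call is an assumption about how the dump would be restored, not something this project prescribes.

```python
import shlex


def pg_dump_cmd(schema="hald", out_dir="hald_dump_dir", jobs=4,
                user="postgres", db="postgres"):
    """Argument list for a parallel, directory-format dump of one schema."""
    return ["pg_dump", "-U", user, "-d", db, "-n", schema,
            "-F", "d", "-j", str(jobs), "-f", out_dir]


def pg_restore_cmd(dump_dir="hald_dump_dir", jobs=4,
                   user="postgres", db="postgres"):
    """Argument list for restoring a directory-format dump in parallel."""
    return ["pg_restore", "-U", user, "-d", db, "-j", str(jobs), dump_dir]


# Render the commands as shell-safe strings for logging or review.
dump_line = shlex.join(pg_dump_cmd())
restore_line = shlex.join(pg_restore_cmd())
```

Passing either list to `subprocess.run(..., check=True)` would execute it. Building the arguments as a list avoids shell quoting problems with paths that contain spaces.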