Refactor lexical-graph-hybrid-dev: consolidate config, replace FalkorDB with Neo4j, fix notebooks and scripts by mykola-pereyma · Pull Request #189 · awslabs/graphrag-toolkit

mykola-pereyma · 2026-04-08T22:40:06Z

Description

Refactors examples/lexical-graph-hybrid-dev/ to consolidate environment configuration, replace FalkorDB with Neo4j, standardize naming, and fix multiple bugs found during Docker integration testing.

Changes

Environment Configuration

Delete redundant docker/.env.jupyter and docker/.env.template; consolidate into single notebooks/.env.template
Replace 3 S3 bucket vars (LOCAL_EXTRACT_S3, PROMPT_S3, S3_BUCKET_EXTRACK_BUILD_BATCH_NAME) with single S3_BUCKET_NAME + key prefixes
Align DynamoDB table name (graphrag-toolkit-batch-table) across scripts and config
Update models to claude-sonnet-4-6; change the AWS_PROFILE default from padmin to default

Docker

Replace FalkorDB with Neo4j 5.25-community + APOC in dev compose
Rename all services to hybrid convention (neo4j-hybrid, pgvector-hybrid, jupyter-hybrid)
Add neo4j Python driver and build-essential to dev Dockerfile
Fix dev-reset.sh: run docker compose down before rebuilding
Remove lexical-graph-src mount from main compose files (fixes dev mode always being detected as True)
Fix dev compose mount path; remove non-existent mysql schema mount

AWS Setup Scripts

Add bedrock:InvokeModel permission to batch inference IAM role policy
Add S3 prompt file upload (extracts text from JSON, uploads as .txt)
Align bucket naming to graphrag-toolkit-ACCOUNT_ID pattern
Apply all fixes to both .sh and .ps1 scripts

Notebooks

Fix titles to match numbering (00-Setup through 04-Cloud Querying)
Standardize dotenv loading (%dotenv magic)
Fix collection_id mismatch (web-docs → best-practices) in notebook 01
Add Neo4jGraphStoreFactory registration to notebook 03 (self-contained kernel)
Add Bedrock Prompt Management section to notebook 04
Add GPU requirement warning for BGEReranker
Remove hardcoded ARNs, empty cells; clear all outputs

Documentation

Reorder README Quick Start (run setup script before configuring .env)
Add .env sync instructions after setup script section
Update model references, fix BATCH_ROLE_NAME, align DynamoDB name in docs

Testing

Docker integration tested on macOS ARM (Podman 5.8.1)
Full pipeline validated: extract → S3 → build graph → query (notebooks 00-04)
AWS resources created/verified/cleaned up (S3, DynamoDB, IAM, Bedrock prompts)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- Delete redundant docker/.env.jupyter and docker/.env.template
- Consolidate into single notebooks/.env.template (source of truth)
- Replace 3 S3 bucket vars with single S3_BUCKET_NAME + prefixes
- Align DynamoDB table name (graphrag-toolkit-batch-table)
- Fix env var names (INCLUDE_DOMAIN_LABELS, AWS_REGION)
- Update models to claude-sonnet-4-6, AWS_PROFILE to default
- Add .env and output.log to .gitignore

- Replace FalkorDB with Neo4j 5.25-community + APOC in dev compose
- Rename all services to hybrid convention across all compose files
- Add neo4j Python driver and build-essential to dev Dockerfile
- Fix dev-reset.sh: run docker compose down before rebuilding
- Add mysql cleanup to dev-reset.sh
- Update container/volume names in reset scripts
- Remove lexical-graph-src mount from main compose (fix dev mode detection)
- Fix dev compose mount path and remove non-existent mysql schema mount

- Add bedrock:InvokeModel to batch inference IAM role policy
- Align bucket name to graphrag-toolkit-ACCOUNT_ID pattern
- Fix AWS profile default from padmin to default
- Add S3 prompt file upload (extract from JSON, upload as .txt)
- Apply all fixes to both .sh and .ps1 scripts

- Fix titles to match numbering (00-Setup through 04-Cloud-Querying)
- Standardize dotenv loading (%dotenv magic)
- Fix collection_id web-docs to best-practices in notebook 01
- Add Neo4jGraphStoreFactory registration to notebook 03
- Add Bedrock prompt provider section to notebook 04
- Add GPU requirement warning for BGEReranker
- Remove hardcoded ARNs, use placeholder comments
- Fix hardcoded region to os.environ in notebook 03
- Remove empty trailing cells, clear all outputs

- Update README Quick Start to correct setup order
- Add .env sync instructions after setup script
- Update model references to claude-sonnet-4-6
- Add RESPONSE_MODEL and EVALUATION_MODEL to docs
- Fix BATCH_ROLE_NAME to bedrock-batch-inference-role
- Align DynamoDB table name in batch_processing.md and aws_integration.md
- Replace padmin with your-profile in setup docs
- Replace ccms-rag-extract with graphrag-toolkit bucket names

acarbonetto

Looks good. Please take a look at the comments.

acarbonetto · 2026-04-09T15:48:36Z

+bash setup-bedrock-batch.sh
+```
+
+This creates `graphrag-toolkit-<ACCOUNT_ID>` (S3), `graphrag-toolkit-batch-table` (DynamoDB), and `bedrock-batch-inference-role` (IAM).


I don't like how this creates resources with static names for DynamoDB and IAM. That means we replace/reuse these if the stack deploys twice.
And if this is running in parallel, we could run into many difficulties.

mykola-pereyma · 2026-04-09

The script already handles re-runs: it checks for existing resources before creating them (S3 via head-bucket; DynamoDB and IAM via create-or-skip with "already exists" messages), so re-running is safe and idempotent.
As for the parallel-users scenario, this is a single-developer dev example, so static names are an intentional simplification.
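For readers following the thread, the check-before-create idea for the S3 bucket might look roughly like the sketch below. It is not the code from setup-bedrock-batch.sh: the function name `ensure_bucket` and the messages are made up for illustration. Note also that `create-bucket` outside us-east-1 additionally needs `--create-bucket-configuration`, which is omitted here.

```shell
#!/bin/sh
# Illustrative sketch only; ensure_bucket is a hypothetical helper,
# not a function from the PR's setup script.

# Create the bucket only when head-bucket reports it is missing.
# Caveat: head-bucket also fails if the bucket exists but is not accessible
# to the caller, in which case create-bucket will fail as well.
ensure_bucket() {
  bucket="$1"
  if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "Bucket $bucket already exists, skipping"
  else
    aws s3api create-bucket --bucket "$bucket" >/dev/null
    echo "Created bucket $bucket"
  fi
}
```

Calling `ensure_bucket "graphrag-toolkit-<ACCOUNT_ID>"` twice in a row would create the bucket on the first run and skip it on the second, which is the re-run behavior described in the reply.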


acarbonetto · 2026-04-10T15:57:08Z


 ```bash
-bash setup-bedrock-batch.sh padmin
+bash setup-bedrock-batch.sh your-profile


Suggested change

-bash setup-bedrock-batch.sh your-profile
+bash setup-bedrock-batch.sh [your-profile]

acarbonetto · 2026-04-10T15:58:37Z

 # Usage: .\setup-graphrag.ps1 [-Profile <aws_profile>]
 param(
-    [string]$Profile = "padmin"
+    [string]$Profile = "default"


We can remove the default: the check below doesn't pass a profile if none is specified.

Remove hardcoded 'default' profile from setup-bedrock-batch.sh and .ps1. Scripts now accept an optional profile argument; when omitted, AWS CLI uses its default credential chain (env vars, instance profile, etc.).

- .sh: use PROFILE_ARGS conditional variable
- .ps1: use @ProfileArgs splatting
- Update README.md and setup-bedrock-batch-doc.md accordingly
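A conditional PROFILE_ARGS variable of the kind this commit describes could be sketched as follows. Only the variable name PROFILE_ARGS comes from the commit message; the helper `set_profile_args` and the echoed command are assumptions made for illustration, not the script's actual contents.

```shell
#!/bin/sh
# Sketch under stated assumptions: set_profile_args is a hypothetical helper.

# Set PROFILE_ARGS to "--profile <name>" when a name is given, else leave it empty
# so the AWS CLI falls back to its default credential chain.
set_profile_args() {
  if [ -n "${1:-}" ]; then
    PROFILE_ARGS="--profile $1"
  else
    PROFILE_ARGS=""
  fi
}

set_profile_args "${1:-}"

# PROFILE_ARGS is intentionally unquoted so it expands to zero or two words.
# 'echo' prints the command instead of running it; drop it to make the real call.
echo aws sts get-caller-identity $PROFILE_ARGS
```

Relying on word splitting keeps the sketch portable to plain sh, at the cost of not supporting profile names that contain spaces.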

mykola-pereyma added 5 commits April 8, 2026 14:13

acarbonetto reviewed Apr 9, 2026


mykola-pereyma and others added 6 commits April 9, 2026 14:55

Update examples/lexical-graph-hybrid-dev/notebooks/.env.template

023e48f

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>

Update examples/lexical-graph-hybrid-dev/README.md

2cf0d00

Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>

Remove quotes from env vars, add AWS CLI quickstart link

c0f2590

Remove quotes from all env var values in README.md and .env.template. Add link to AWS CLI quickstart guide and aws sts get-caller-identity verification command in prerequisites.

Remove redundant .env paths from .gitignore

2c31a61

The bare .env pattern on line 31 already matches .env files in all subdirectories, making the explicit paths for local-dev and hybrid-dev notebooks redundant.
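The claim that a bare `.env` pattern covers nested directories can be checked locally with `git check-ignore`. The sketch below uses a throwaway repository and a made-up directory path (`examples/demo/notebooks`), so nothing in it refers to the real repository layout.

```shell
#!/bin/sh
# Sketch: confirm that a slash-free ".env" pattern ignores nested .env files.
# The directory names are hypothetical and exist only in a temp repo.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
git init -q .

# A pattern with no slash matches at every directory level.
printf '.env\n' > .gitignore

mkdir -p examples/demo/notebooks
touch examples/demo/notebooks/.env

# Prints the .gitignore rule that matches the nested file; exits 0 when ignored.
git check-ignore -v examples/demo/notebooks/.env
```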

Replace duplicated env vars in aws_integration.md with link to .env.template

a75cbab

Remove duplicate .env configuration sections from README

cd9bfb8

Replace repeated env vars listing and setup instructions with a link to notebooks/.env.template as the single source of truth.

oussamahansal approved these changes Apr 10, 2026


acarbonetto approved these changes Apr 10, 2026


mykola-pereyma added 2 commits April 16, 2026 12:01

Use brackets for optional profile arg in setup docs

94b98a6

acarbonetto merged commit 9cc2d58 into awslabs:main Apr 16, 2026

acarbonetto merged 13 commits into awslabs:main from mykola-pereyma:fix/examples-lexical-graph-hybrid-dev


3 participants
