Refactor lexical-graph-hybrid-dev: consolidate config, replace FalkorDB with Neo4j, fix notebooks and scripts #189

Merged
acarbonetto merged 13 commits into awslabs:main from mykola-pereyma:fix/examples-lexical-graph-hybrid-dev
Apr 16, 2026

Conversation

@mykola-pereyma
Contributor

Description

Refactors examples/lexical-graph-hybrid-dev/ to consolidate environment configuration, replace FalkorDB with Neo4j, standardize naming, and fix multiple bugs found during Docker integration testing.

Changes

Environment Configuration

  • Delete redundant docker/.env.jupyter and docker/.env.template; consolidate into single notebooks/.env.template
  • Replace 3 S3 bucket vars (LOCAL_EXTRACT_S3, PROMPT_S3, S3_BUCKET_EXTRACK_BUILD_BATCH_NAME) with single S3_BUCKET_NAME + key prefixes
  • Align DynamoDB table name (graphrag-toolkit-batch-table) across scripts and config
  • Update models to claude-sonnet-4-6; change the AWS_PROFILE default from padmin to default
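
As a rough illustration of the single-bucket layout, the sketch below builds object URIs from `S3_BUCKET_NAME` plus a key prefix. The helper name `s3_uri`, the prefix values (`extract`, `prompts`), and the example bucket and file names are assumptions made for illustration; the variable names actually used are defined in notebooks/.env.template.

```bash
# Hypothetical default; the real value comes from notebooks/.env.template.
S3_BUCKET_NAME="${S3_BUCKET_NAME:-graphrag-toolkit-123456789012}"

# s3_uri PREFIX KEY: join the shared bucket, a key prefix, and an object key.
s3_uri() {
  local prefix="${1%/}"   # drop one trailing slash, if present
  printf 's3://%s/%s/%s\n' "$S3_BUCKET_NAME" "$prefix" "$2"
}

# Example calls with made-up prefixes and keys.
s3_uri extract/ batch-input.jsonl
s3_uri prompts system_prompt.txt
```

With one bucket, every consumer only needs to agree on the prefix, rather than on three separately named buckets.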

Docker

  • Replace FalkorDB with Neo4j 5.25-community + APOC in dev compose
  • Rename all services to hybrid convention (neo4j-hybrid, pgvector-hybrid, jupyter-hybrid)
  • Add neo4j Python driver and build-essential to dev Dockerfile
  • Fix dev-reset.sh: run docker compose down before rebuilding
  • Remove lexical-graph-src mount from main compose files (fixes dev mode always being detected as True)
  • Fix dev compose mount path; remove non-existent mysql schema mount
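
The dev-reset.sh ordering fix can be pictured roughly as follows. This is a minimal sketch, not the script from this PR: the function name `dev_reset` and the compose file name are hypothetical, and the real script also handles container and volume cleanup.

```bash
# dev_reset [COMPOSE_FILE]: tear the stack down, then rebuild and restart it.
dev_reset() {
  local compose_file="${1:-docker-compose-dev.yml}"   # hypothetical file name

  # Stop and remove the existing containers first so the rebuild starts clean.
  docker compose -f "$compose_file" down || return 1

  # Rebuild the images and start the services in the background.
  docker compose -f "$compose_file" up -d --build
}
```

Calling `dev_reset` with no argument would act on the assumed default file; pass a path to target a different compose file.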

AWS Setup Scripts

  • Add bedrock:InvokeModel permission to batch inference IAM role policy
  • Add S3 prompt file upload (extracts text from JSON, uploads as .txt)
  • Align bucket naming to graphrag-toolkit-ACCOUNT_ID pattern
  • Apply all fixes to both .sh and .ps1 scripts
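
For readers unfamiliar with the graphrag-toolkit-ACCOUNT_ID pattern, one way to derive such a name is sketched below. The helper names `account_id` and `bucket_name` are hypothetical and may not match the setup scripts; the `aws sts get-caller-identity` call is the standard CLI lookup for the caller's account.

```bash
# account_id [AWS CLI ARGS...]: print the caller's 12-digit AWS account ID.
# Extra arguments (for example --profile NAME) are passed through unchanged.
account_id() {
  aws sts get-caller-identity --query Account --output text "$@"
}

# bucket_name ACCOUNT_ID: apply the graphrag-toolkit-ACCOUNT_ID naming pattern.
bucket_name() {
  printf 'graphrag-toolkit-%s\n' "$1"
}

# Typical use (requires configured AWS credentials):
#   BUCKET="$(bucket_name "$(account_id)")"
```

Because S3 bucket names are globally unique, including the account ID keeps the name predictable while avoiding collisions between accounts.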

Notebooks

  • Fix titles to match numbering (00-Setup through 04-Cloud Querying)
  • Standardize dotenv loading (%dotenv magic)
  • Fix collection_id mismatch (web-docs → best-practices) in notebook 01
  • Add Neo4jGraphStoreFactory registration to notebook 03 (self-contained kernel)
  • Add Bedrock Prompt Management section to notebook 04
  • Add GPU requirement warning for BGEReranker
  • Remove hardcoded ARNs, empty cells; clear all outputs

Documentation

  • Reorder README Quick Start (run setup script before configuring .env)
  • Add .env sync instructions after setup script section
  • Update model references, fix BATCH_ROLE_NAME, align DynamoDB name in docs

Testing

  • Docker integration tested on macOS ARM (Podman 5.8.1)
  • Full pipeline validated: extract → S3 → build graph → query (notebooks 00-04)
  • AWS resources created/verified/cleaned up (S3, DynamoDB, IAM, Bedrock prompts)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- Delete redundant docker/.env.jupyter and docker/.env.template
- Consolidate into single notebooks/.env.template (source of truth)
- Replace 3 S3 bucket vars with single S3_BUCKET_NAME + prefixes
- Align DynamoDB table name (graphrag-toolkit-batch-table)
- Fix env var names (INCLUDE_DOMAIN_LABELS, AWS_REGION)
- Update models to claude-sonnet-4-6, AWS_PROFILE to default
- Add .env and output.log to .gitignore
- Replace FalkorDB with Neo4j 5.25-community + APOC in dev compose
- Rename all services to hybrid convention across all compose files
- Add neo4j Python driver and build-essential to dev Dockerfile
- Fix dev-reset.sh: run docker compose down before rebuilding
- Add mysql cleanup to dev-reset.sh
- Update container/volume names in reset scripts
- Remove lexical-graph-src mount from main compose (fix dev mode detection)
- Fix dev compose mount path and remove non-existent mysql schema mount
- Add bedrock:InvokeModel to batch inference IAM role policy
- Align bucket name to graphrag-toolkit-ACCOUNT_ID pattern
- Fix AWS profile default from padmin to default
- Add S3 prompt file upload (extract from JSON, upload as .txt)
- Apply all fixes to both .sh and .ps1 scripts
- Fix titles to match numbering (00-Setup through 04-Cloud-Querying)
- Standardize dotenv loading (%dotenv magic)
- Fix collection_id web-docs to best-practices in notebook 01
- Add Neo4jGraphStoreFactory registration to notebook 03
- Add Bedrock prompt provider section to notebook 04
- Add GPU requirement warning for BGEReranker
- Remove hardcoded ARNs, use placeholder comments
- Fix hardcoded region to os.environ in notebook 03
- Remove empty trailing cells, clear all outputs
- Update README Quick Start to correct setup order
- Add .env sync instructions after setup script
- Update model references to claude-sonnet-4-6
- Add RESPONSE_MODEL and EVALUATION_MODEL to docs
- Fix BATCH_ROLE_NAME to bedrock-batch-inference-role
- Align DynamoDB table name in batch_processing.md and aws_integration.md
- Replace padmin with your-profile in setup docs
- Replace ccms-rag-extract with graphrag-toolkit bucket names
Collaborator

@acarbonetto acarbonetto left a comment

Looks good. Please take a look at the comments.

Comment thread examples/lexical-graph-hybrid-dev/README.md Outdated
```bash
bash setup-bedrock-batch.sh
```

This creates `graphrag-toolkit-<ACCOUNT_ID>` (S3), `graphrag-toolkit-batch-table` (DynamoDB), and `bedrock-batch-inference-role` (IAM).
Collaborator

I don't like how this creates resources with static names for DynamoDB and IAM. That means we replace/reuse these if the stack deploys twice.
And if this is running in parallel, we could run into many difficulties.

Contributor Author

The script already handles re-runs — it checks for existing resources before creation (S3 via head-bucket, DynamoDB and IAM via create-or-skip with "already exists" messages). Re-running is safe and idempotent.
Re parallel-users scenario — this is a single-developer dev example, so static names are an intentional simplicity here.
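
The check-before-create behaviour described in this reply could look roughly like the sketch below. It is an illustration only, not the contents of setup-bedrock-batch.sh: the function names `ensure_bucket` and `create_or_skip` are hypothetical, and the error strings matched are the ones the AWS CLI normally reports for an existing DynamoDB table or IAM role.

```bash
# ensure_bucket BUCKET: create the bucket only if head-bucket cannot find it.
ensure_bucket() {
  local bucket="$1"
  if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "Bucket $bucket already exists, skipping"
  else
    # Outside us-east-1, create-bucket also needs
    # --create-bucket-configuration LocationConstraint=<region>.
    aws s3api create-bucket --bucket "$bucket" >/dev/null || return 1
    echo "Created bucket $bucket"
  fi
}

# create_or_skip LABEL COMMAND...: run a create command and treat an
# "already exists" style failure as success, so re-runs are harmless.
create_or_skip() {
  local label="$1"; shift
  local out
  if out="$("$@" 2>&1)"; then
    echo "Created $label"
  elif printf '%s' "$out" | grep -Eq 'ResourceInUseException|EntityAlreadyExists|already exists'; then
    echo "$label already exists, skipping"
  else
    printf '%s\n' "$out" >&2   # surface any other error
    return 1
  fi
}
```

This pattern makes repeated runs safe for one developer, but it does not address the reviewer's concern about two deployments sharing the same static names.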

Comment thread examples/lexical-graph-hybrid-dev/README.md Outdated
Comment thread examples/lexical-graph-hybrid-dev/README.md Outdated
Comment thread examples/lexical-graph-hybrid-dev/README.md Outdated
Comment thread .gitignore Outdated
Comment thread examples/lexical-graph-hybrid-dev/notebooks/.env.template Outdated
Comment thread examples/lexical-graph-hybrid-dev/notebooks/.env.template Outdated
Comment thread examples/lexical-graph-hybrid-dev/docs/aws_integration.md Outdated
Comment thread examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch-doc.md Outdated
mykola-pereyma and others added 6 commits April 9, 2026 14:55
Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Remove quotes from all env var values in README.md and .env.template.

Add link to AWS CLI quickstart guide and aws sts get-caller-identity
verification command in prerequisites.

The bare .env pattern on line 31 already matches .env files in all
subdirectories, making the explicit paths for local-dev and hybrid-dev
notebooks redundant.

Replace repeated env vars listing and setup instructions with a link
to notebooks/.env.template as the single source of truth.

```diff
-bash setup-bedrock-batch.sh padmin
+bash setup-bedrock-batch.sh your-profile
```
Collaborator

Suggested change
bash setup-bedrock-batch.sh your-profile
bash setup-bedrock-batch.sh [your-profile]

# Usage: .\setup-graphrag.ps1 [-Profile <aws_profile>]
param(
[string]$Profile = "padmin"
[string]$Profile = "default"
Collaborator

we can remove the default - the check below doesn't pass profile if profile isn't specified

Remove hardcoded 'default' profile from setup-bedrock-batch.sh and .ps1.
Scripts now accept an optional profile argument — when omitted, AWS CLI
uses its default credential chain (env vars, instance profile, etc.).

- .sh: use PROFILE_ARGS conditional variable
- .ps1: use @ProfileArgs splatting
- Update README.md and setup-bedrock-batch-doc.md accordingly
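
A conditional `PROFILE_ARGS` variable of the kind mentioned for the .sh script might be built as in the sketch below. The array form and the helper name `build_profile_args` are assumptions; the actual script may construct the variable differently.

```bash
# build_profile_args [PROFILE]: set PROFILE_ARGS to (--profile PROFILE) when a
# profile is given, or to an empty array so the AWS CLI falls back to its
# default credential chain.
build_profile_args() {
  PROFILE_ARGS=()
  if [ -n "${1:-}" ]; then
    PROFILE_ARGS=(--profile "$1")
  fi
}

build_profile_args "${1:-}"

# Print the command instead of running it, to show how the array expands.
echo aws sts get-caller-identity "${PROFILE_ARGS[@]}"
```

An empty array expands to no arguments at all, so the same command line works with and without a profile. Note that bash versions before 4.4 reject empty-array expansion under `set -u`, which matters on stock macOS bash.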
@acarbonetto acarbonetto merged commit 9cc2d58 into awslabs:main Apr 16, 2026

3 participants