Refactor lexical-graph-hybrid-dev: consolidate config, replace FalkorDB with Neo4j, fix notebooks and scripts #189
- Delete redundant docker/.env.jupyter and docker/.env.template
- Consolidate into single notebooks/.env.template (source of truth)
- Replace 3 S3 bucket vars with single S3_BUCKET_NAME + prefixes
- Align DynamoDB table name (graphrag-toolkit-batch-table)
- Fix env var names (INCLUDE_DOMAIN_LABELS, AWS_REGION)
- Update models to claude-sonnet-4-6, AWS_PROFILE to default
- Add .env and output.log to .gitignore

- Replace FalkorDB with Neo4j 5.25-community + APOC in dev compose
- Rename all services to hybrid convention across all compose files
- Add neo4j Python driver and build-essential to dev Dockerfile
- Fix dev-reset.sh: run docker compose down before rebuilding
- Add mysql cleanup to dev-reset.sh
- Update container/volume names in reset scripts
- Remove lexical-graph-src mount from main compose (fix dev mode detection)
- Fix dev compose mount path and remove non-existent mysql schema mount

- Add bedrock:InvokeModel to batch inference IAM role policy
- Align bucket name to graphrag-toolkit-ACCOUNT_ID pattern
- Fix AWS profile default from padmin to default
- Add S3 prompt file upload (extract from JSON, upload as .txt)
- Apply all fixes to both .sh and .ps1 scripts

- Fix titles to match numbering (00-Setup through 04-Cloud-Querying)
- Standardize dotenv loading (%dotenv magic)
- Fix collection_id web-docs to best-practices in notebook 01
- Add Neo4jGraphStoreFactory registration to notebook 03
- Add Bedrock prompt provider section to notebook 04
- Add GPU requirement warning for BGEReranker
- Remove hardcoded ARNs, use placeholder comments
- Fix hardcoded region to os.environ in notebook 03
- Remove empty trailing cells, clear all outputs

- Update README Quick Start to correct setup order
- Add .env sync instructions after setup script
- Update model references to claude-sonnet-4-6
- Add RESPONSE_MODEL and EVALUATION_MODEL to docs
- Fix BATCH_ROLE_NAME to bedrock-batch-inference-role
- Align DynamoDB table name in batch_processing.md and aws_integration.md
- Replace padmin with your-profile in setup docs
- Replace ccms-rag-extract with graphrag-toolkit bucket names
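
The single-bucket layout described in these commits (one `S3_BUCKET_NAME` plus key prefixes, with the bucket named after the `graphrag-toolkit-ACCOUNT_ID` pattern) could look roughly like the sketch below. The helper names `bucket_name` and `s3_uri`, the account ID, and the `extract/` and `prompts` prefixes are hypothetical placeholders chosen for illustration; the actual prefix variables live in `notebooks/.env.template`.

```shell
# Sketch only: helper names, account ID, and prefixes are illustrative assumptions.

bucket_name() {
  # $1 = AWS account ID; follows the graphrag-toolkit-ACCOUNT_ID pattern
  printf 'graphrag-toolkit-%s\n' "$1"
}

s3_uri() {
  # $1 = bucket name, $2 = key prefix (trailing slash optional), $3 = object name
  printf 's3://%s/%s/%s\n' "$1" "${2%/}" "$3"
}

S3_BUCKET_NAME="$(bucket_name 123456789012)"

# One bucket, different prefixes, instead of one bucket variable per purpose.
s3_uri "$S3_BUCKET_NAME" "extract/" "batch-input.jsonl"
s3_uri "$S3_BUCKET_NAME" "prompts" "system_prompt.txt"
```

Keeping the bucket in one variable means a renamed bucket only has to be changed in one place, while each consumer still gets its own key space through its prefix.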
acarbonetto left a comment
Looks good. Please take a look at the comments.
bash setup-bedrock-batch.sh

This creates `graphrag-toolkit-<ACCOUNT_ID>` (S3), `graphrag-toolkit-batch-table` (DynamoDB), and `bedrock-batch-inference-role` (IAM).
I don't like how this creates resources with static names for DynamoDB and IAM. That means we replace/reuse these if the stack deploys twice.
And if this is running in parallel, we could run into many difficulties.
The script already handles re-runs: it checks for existing resources before creating them (S3 via head-bucket; DynamoDB and IAM via create-or-skip with "already exists" messages), so re-running is safe and idempotent.
As for the parallel-users scenario, this is a single-developer dev example, so the static names are an intentional simplification.
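
The create-or-skip behaviour described in this reply can be illustrated with a minimal sketch. It is not the setup script's actual code: `ensure_resource` is a hypothetical helper, and a local directory stands in for an AWS resource so the example runs without credentials. In the real script the existence check would presumably be an AWS CLI call (the reply mentions head-bucket for S3).

```shell
# Sketch only: ensure_resource is a hypothetical helper, and a directory
# stands in for an S3 bucket, DynamoDB table, or IAM role.

ensure_resource() {
  # $1 = label, $2 = command that succeeds if the resource exists,
  # $3 = command that creates the resource
  label=$1
  exists_cmd=$2
  create_cmd=$3
  if $exists_cmd >/dev/null 2>&1; then
    echo "$label already exists, skipping"
  else
    $create_cmd && echo "$label created"
  fi
}

workdir="$(mktemp -d)"

# First run creates the stand-in resource; second run detects it and skips.
ensure_resource "demo-resource" "test -d $workdir/demo" "mkdir $workdir/demo"
ensure_resource "demo-resource" "test -d $workdir/demo" "mkdir $workdir/demo"
```

Because the check runs before every create, repeating the whole script converges on the same end state, which is what makes static names tolerable for a single developer. It does not protect two users racing on the same account, which is the reviewer's concern.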
Co-authored-by: Andrew Carbonetto <andrew.carbonetto@improving.com>
Remove quotes from all env var values in README.md and .env.template. Add a link to the AWS CLI quickstart guide and the `aws sts get-caller-identity` verification command to the prerequisites.
The bare .env pattern on line 31 already matches .env files in all subdirectories, making the explicit paths for local-dev and hybrid-dev notebooks redundant.
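
The claim about the bare `.env` pattern can be checked with `git check-ignore` in a scratch repository. The sketch below assumes `git` is installed; `is_ignored` is a hypothetical helper, and the scratch `.gitignore` contains only the single pattern under discussion rather than the project's real file.

```shell
# Sketch only: uses a throwaway repo whose .gitignore holds just the bare pattern.

demo_repo="$(mktemp -d)"
git init -q "$demo_repo"
printf '.env\n' > "$demo_repo/.gitignore"

is_ignored() {
  # check-ignore exits 0 when the path matches an ignore rule
  if git -C "$demo_repo" check-ignore -q "$1"; then
    echo "ignored"
  else
    echo "not ignored"
  fi
}

is_ignored ".env"
is_ignored "examples/lexical-graph-hybrid-dev/notebooks/.env"
is_ignored "examples/lexical-graph-hybrid-dev/notebooks/.env.template"
```

A pattern without a slash matches at every directory level, so the nested `.env` is covered, while `.env.template` is a different file name and stays trackable.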
Replace repeated env vars listing and setup instructions with a link to notebooks/.env.template as the single source of truth.
- bash setup-bedrock-batch.sh padmin
+ bash setup-bedrock-batch.sh your-profile
Suggested change:
- bash setup-bedrock-batch.sh your-profile
+ bash setup-bedrock-batch.sh [your-profile]
# Usage: .\setup-graphrag.ps1 [-Profile <aws_profile>]
param(
- [string]$Profile = "padmin"
+ [string]$Profile = "default"
We can remove the default: the check below doesn't pass a profile if none is specified.
Remove hardcoded 'default' profile from setup-bedrock-batch.sh and .ps1. Scripts now accept an optional profile argument; when omitted, AWS CLI uses its default credential chain (env vars, instance profile, etc.).
- .sh: use PROFILE_ARGS conditional variable
- .ps1: use @ProfileArgs splatting
- Update README.md and setup-bedrock-batch-doc.md accordingly
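
A minimal sketch of the conditional-variable approach named in this commit is shown below. `PROFILE_ARGS` is the name from the commit message; the helper `profile_args` and the exact structure are assumptions, not the script's actual code. The command is printed rather than executed so the sketch runs without AWS credentials.

```shell
# Sketch only: profile_args is a hypothetical helper; the real script may differ.

profile_args() {
  # $1 = optional AWS profile name; prints nothing when it is empty or missing
  if [ -n "${1:-}" ]; then
    printf '%s' "--profile $1"
  fi
}

PROFILE_ARGS="$(profile_args "${1:-}")"

# Unquoted on purpose: an empty PROFILE_ARGS expands to no arguments at all,
# so the AWS CLI falls back to its default credential chain.
echo aws sts get-caller-identity $PROFILE_ARGS
```

Relying on unquoted expansion keeps the sketch portable to plain `sh`, at the cost of breaking on profile names that contain spaces. A bash-only script could use an array instead.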
Description
Refactors `examples/lexical-graph-hybrid-dev/` to consolidate environment configuration, replace FalkorDB with Neo4j, standardize naming, and fix multiple bugs found during Docker integration testing.

Changes

Environment Configuration
- Delete redundant `docker/.env.jupyter` and `docker/.env.template`; consolidate into single `notebooks/.env.template`
- Replace 3 S3 bucket vars (`LOCAL_EXTRACT_S3`, `PROMPT_S3`, `S3_BUCKET_EXTRACK_BUILD_BATCH_NAME`) with single `S3_BUCKET_NAME` + key prefixes
- Align DynamoDB table name (`graphrag-toolkit-batch-table`) across scripts and config
- Update models to `claude-sonnet-4-6`, fix `AWS_PROFILE` default to `default`

Docker
- Rename all services to hybrid convention (`neo4j-hybrid`, `pgvector-hybrid`, `jupyter-hybrid`)
- Add `neo4j` Python driver and `build-essential` to dev Dockerfile
- Fix dev-reset.sh: run `docker compose down` before rebuilding
- Remove `lexical-graph-src` mount from main compose files (fixes dev mode always being detected as True)

AWS Setup Scripts
- Add `bedrock:InvokeModel` permission to batch inference IAM role policy
- Add S3 prompt file upload (extract from JSON, upload as `.txt`)
- Align bucket name to `graphrag-toolkit-ACCOUNT_ID` pattern
- Apply all fixes to both `.sh` and `.ps1` scripts

Notebooks
- Standardize dotenv loading (`%dotenv` magic)
- Fix `collection_id` mismatch (`web-docs` → `best-practices`) in notebook 01
- Add `Neo4jGraphStoreFactory` registration to notebook 03 (self-contained kernel)

Documentation
- Update README Quick Start to correct setup order ([…] `.env`)
- Add `.env` sync instructions after setup script section
- Fix `BATCH_ROLE_NAME`, align DynamoDB name in docs

Testing
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.