feat: add stress tests and CI benchmark integration#298

Merged
mlwelles merged 460 commits into main from feature/stress-tests
Feb 18, 2026

@mlwelles (Contributor) commented Feb 6, 2026

Summary

Adds comprehensive stress tests for the sync and async clients and integrates pytest-benchmark into CI. Also merges main, which brings in the picklable AbortedError fix (#299).

Stress Tests

  • Concurrent read queries and mutations using ThreadPoolExecutor (sync) and asyncio.gather (async)
  • Mixed workload tests combining queries, mutations, commits, and discards
  • Transaction conflict handling and upsert tests
  • Retry utilities (retry(), retry_async(), with_retry(), with_retry_async(), run_transaction(), run_transaction_async())
  • Deadlock regression tests validating the fix from #296 (fix: prevent asyncio.Lock deadlock in AsyncTxn.do_request error handling)
  • All tests have consistent _sync or _async suffixes for clear identification
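
A minimal sketch of the two concurrency patterns named above, assuming each operation is a plain callable (sync) or coroutine function (async). `run_batch_sync`, `run_batch_async`, and `op` are hypothetical names for illustration, not helpers from this PR or from pydgraph:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_batch_sync(op, workers, ops):
    """Run op(i) for i in range(ops) across `workers` threads.

    Returns (results, errors) so one failure does not hide the rest.
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(op, i) for i in range(ops)]
        for future in futures:
            try:
                results.append(future.result())
            except Exception as exc:  # collect failures instead of stopping early
                errors.append(exc)
    return results, errors


async def run_batch_async(op, ops):
    """Await op(i) for i in range(ops) concurrently; returns (results, errors)."""
    outcomes = await asyncio.gather(
        *(op(i) for i in range(ops)), return_exceptions=True
    )
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    errors = [o for o in outcomes if isinstance(o, BaseException)]
    return results, errors
```

In a real stress test, `op` would wrap a query or mutation against a client; collecting errors rather than raising makes it possible to assert on the number of aborted transactions afterwards.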

Targeted Benchmarks

Individual operation benchmarks to pinpoint regression root causes:

| Category | Operations benchmarked |
| --- | --- |
| Query | Simple, with variables, best-effort |
| Mutation | commit_now, explicit commit, discard, N-Quads, delete |
| Transaction | Upsert, batch mutations, run_transaction helper |
| Client | check_version, alter schema |

26 total benchmarks (13 sync + 13 async). When a stress test regresses, compare individual operation times to identify the exact bottleneck.

Test Resources

  • Movie dataset (1million.schema, 1million.rdf.gz) downloaded on demand from dgraph-benchmarks repo
  • Session-scoped fixtures: movies_schema(), movies_rdf_gz(), movies_rdf()
  • Automatic decompression with temp directory cleanup
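
The decompress-then-clean-up step could look roughly like the sketch below. It assumes the `.gz` file is already on disk; `decompressed_copy` is a hypothetical helper name, and the PR's actual fixtures may be structured differently:

```python
import gzip
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def decompressed_copy(gz_path):
    """Yield a Path to a decompressed copy of gz_path; the copy is deleted on exit."""
    gz_path = Path(gz_path)
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Path.stem drops only the last suffix: "1million.rdf.gz" -> "1million.rdf"
        target = Path(tmp_dir) / gz_path.stem
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream in chunks rather than read it all
        yield target
        # Leaving the TemporaryDirectory block removes the directory and the copy.
```

A session-scoped pytest fixture could wrap this by entering the context manager, yielding the path, and letting teardown trigger the cleanup.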

CI Benchmark Integration

  • pytest-benchmark fixtures added to stress tests
  • New benchmarks job in PR/main CI workflow (STRESS_TEST_MODE=moderate)
  • New workflow for semver tag releases
  • JSON + SVG histogram artifacts uploaded

Makefile Improvements

  • make test PYTEST_ARGS="..." syntax (exports propagate to scripts)
  • make benchmark delegates to test target
  • Default PYTEST_ARGS=-v --benchmark-disable
  • STRESS_TEST_MODE and DGRAPH_IMAGE_TAG exported

Stress Test Modes

Each mode uses a rounds parameter that repeats each test's concurrent batch to create sustained load:

| Mode | Workers | Ops/round | Rounds | load_movies | Target duration |
| --- | --- | --- | --- | --- | --- |
| quick (default) | 20 | 200 | 50 | No | ~30s |
| moderate | 10 | 200 | 8 | Yes (1M triples) | 5-8 min |
| full | 15 | 500 | 15 | Yes (1M triples) | 12-16 min |
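
As an illustration, the modes above could be resolved from `STRESS_TEST_MODE` with a lookup like the following. The dictionary keys and the `stress_config` function are assumptions made for this sketch; only the numbers and the environment variable name come from the PR description:

```python
import os

# Values copied from the mode table; the key names are illustrative.
STRESS_MODES = {
    "quick": {"workers": 20, "ops": 200, "rounds": 50, "load_movies": False},
    "moderate": {"workers": 10, "ops": 200, "rounds": 8, "load_movies": True},
    "full": {"workers": 15, "ops": 500, "rounds": 15, "load_movies": True},
}


def stress_config(env=None):
    """Return the settings for the selected mode, defaulting to quick."""
    env = os.environ if env is None else env
    # `or` covers both an unset variable and one exported as an empty string.
    mode = env.get("STRESS_TEST_MODE") or "quick"
    if mode not in STRESS_MODES:
        raise ValueError(f"unknown STRESS_TEST_MODE: {mode!r}")
    return {"mode": mode, **STRESS_MODES[mode]}
```

Because `load_movies` is part of the resolved config, a data fixture can check it before downloading anything, which is one way to keep quick mode free of network access.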

Housekeeping

  • Removed deprecated unittest.makeSuite usage (removed in Python 3.13)
  • Replaced manual suite() functions with unittest.main() in test files

all-seeing-code and others added 30 commits September 8, 2020 00:04
Add grpc v1.19.0 note
docs(fix): add setup reference to grpcio v1.19.0 package
…allowing one query and one mutation

Per DGRAPH-2777
chore: enabled syntax highlighting
Docs (discuss feedback): fix incorrect statement about upsert blocks allowing one query and one mutation
- Add 6 unit tests verifying the asyncio.Lock deadlock fix
- Rename _discard_internal → _locked_discard (communicates lock precondition)
- Rename _common_discard → _prepare_discard (in both AsyncTxn and Txn)
- Replace assert with RuntimeError for lock-held check (bandit B101)
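
The locking pattern behind these renames can be sketched as follows. `SketchTxn` is a hypothetical stand-in written for illustration and is not pydgraph's `AsyncTxn`; only the `_locked_discard` name and the `RuntimeError` check are taken from the commit message:

```python
import asyncio


class SketchTxn:
    """Hypothetical transaction showing the lock-precondition pattern."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.finished = False

    async def discard(self):
        # Public entry point: acquires the lock, then delegates.
        async with self._lock:
            await self._locked_discard()

    async def _locked_discard(self):
        # Precondition: the lock is held. asyncio.Lock cannot tell who holds
        # it, so this only verifies that someone does.
        if not self._lock.locked():
            raise RuntimeError("_locked_discard() called without holding the lock")
        self.finished = True

    async def do_request(self, fail=False):
        async with self._lock:
            try:
                if fail:
                    raise ValueError("simulated request failure")
                return "ok"
            except ValueError:
                # Calling self.discard() here would wait forever on the lock
                # this coroutine already holds: asyncio.Lock is not reentrant.
                await self._locked_discard()
                raise
```

Using an explicit `RuntimeError` instead of `assert` keeps the check active when Python runs with `-O`, which is the concern bandit's B101 rule raises.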
Enable continuous performance tracking by triggering benchmarks on main
branch pushes in addition to release tags.
- Remove 1million.rdf.gz and 1million.schema from repo (16MB saved)
- Add _downloaded_data_fixture_path() to fetch from dgraph-benchmarks
- Add DATA_FIXTURE_DIR and DATA_FIXTURE_BASE_URL configuration
- Update .gitignore to exclude downloaded test data
- Replace SYNTHETIC_SCHEMA with movies_schema from dgraph-benchmarks
- Add lazy fixture evaluation to only download data when STRESS_TEST_MODE
  is moderate or full (quick mode skips download entirely)
- Support both numeric UIDs and UUIDs in RDF data conversion
- Update all test queries/mutations to use movies schema predicates
@mlwelles force-pushed the feature/stress-tests branch from 164be7e to 8f15345 on February 11, 2026 21:12
- sync_client → _sync_client (internal function-scoped)
- session_sync_client → sync_client (session-scoped, main client)
- sync_client_with_movies_schema → stress_test_sync_client
- async_client_with_movies_schema → stress_test_async_client
- async_client_with_movies_schema_for_benchmark → stress_test_async_client_for_benchmark
# Conflicts:
#	.gitignore
#	.pre-commit-config.yaml
#	CONTRIBUTING.md
#	Makefile
#	PUBLISHING.md
#	pydgraph/__init__.py
#	pydgraph/errors.py
#	pyproject.toml
#	scripts/local-test.sh
#	tests/test_async_client.py
#	tests/test_retry.py
#	uv.lock
Replace manual suite() functions using makeSuite (removed in Python
3.13) with unittest.main() which handles test discovery automatically.
@mlwelles force-pushed the feature/stress-tests branch 3 times, most recently from c7e6233 to dde7356, on February 12, 2026 00:40
Each test file now defines its own client fixtures instead of sharing
them via conftest.py. This makes dependencies explicit and allows
shorter, context-appropriate names:

- test_stress_sync.py / test_benchmark_sync.py: sync_client
- test_stress_async.py: async_client (native) + benchmark_client (sync-wrapped)
- test_benchmark_async.py: benchmark_client (sync-wrapped)

Also splits the former tuple-returning fixture into separate
event_loop and client fixtures, and adds proper type annotations
to fix mypy no-untyped-def errors.
@mlwelles force-pushed the feature/stress-tests branch from dde7356 to 336a5d0 on February 12, 2026 00:46
Remove _sync_client and sync_client_clean fixtures from conftest.
Consumers now use session-scoped sync_client with explicit drop_all
calls. Also rename movies_schema → movies_schema_path (Path) and
movies_schema_content → movies_schema (str) for clarity.
…chmarks by default

- Remove async_client_clean fixture; use async_client with explicit drop_all
- Convert async stress fixtures to sync (benchmark_event_loop.run_until_complete)
  to avoid pytest-asyncio ScopeMismatch with module-scoped fixtures
- Consolidate stress_client and benchmark_client into single stress_client
  fixture in test_stress_async.py
- Make stress_client module-scoped in both stress test files to reuse across tests
- Skip drop_all/schema alter when movies_data_loaded is True (moderate/full modes)
- Default PYTEST_ARGS now includes --benchmark-disable for faster test runs
Replace iterations config with rounds parameter that repeats each
test's concurrent batch to create sustained load. Tune quick mode
(workers=20, ops=200, rounds=50) to achieve ~30s execution time
with --benchmark-disable.
- Set GRPC_ENABLE_FORK_SUPPORT=0 in conftest to prevent a gRPC atfork
  crash when pytest-benchmark forks a subprocess to collect
  machine info while gRPC channels are still open
- Change make benchmark to use STRESS_TEST_MODE=quick instead of
  moderate — the 1M movie dataset overwhelms the local Docker compose
  cluster when pytest-benchmark runs calibration rounds
- Add STRESS_TEST_ROUNDS env var override so make benchmark can set
  rounds=1, letting pytest-benchmark handle repetition instead of
  our custom rounds loop
Makefile exports STRESS_TEST_ROUNDS as empty string when not set,
causing int('') ValueError. Use truthiness check instead of
None check to handle both unset and empty cases.
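
The difference between the two checks can be shown in a few lines. `rounds_override` is a hypothetical function name; only the `STRESS_TEST_ROUNDS` variable comes from the commit message:

```python
import os


def rounds_override(default, env=None):
    """Return STRESS_TEST_ROUNDS as an int, or `default` if unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("STRESS_TEST_ROUNDS")
    # `raw is not None` would let "" through, and int("") raises ValueError.
    # A truthiness check treats both None and "" as "no override".
    return int(raw) if raw else default
```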
Replace benchmark() with benchmark.pedantic(rounds=1, iterations=1,
warmup_rounds=0) in all 13 stress test functions so pytest-benchmark
doesn't auto-calibrate and repeat heavy concurrent workloads.

This is the proper fix for benchmark timeouts — the stress test's own
inner loop (stress_config["rounds"]) already handles repetition, so
letting pytest-benchmark compound on top overwhelms the Dgraph cluster.

With pedantic() as the primary control:
- Remove STRESS_TEST_ROUNDS env var (no longer needed anywhere)
- Split benchmark target into two phases with separate Docker clusters
  (Phase 1 targeted benchmarks do drop_all which destabilises the alpha)
- Add _wait_for_alpha_ready() health check after bulk-loading 1M triples
- Add LOG variable support for injecting --log-cli-level into pytest
- Document benchmark output files in Makefile comments
- Add stress-benchmark-results.json to .gitignore
@matthewmcneely (Contributor) commented:

@mlwelles Can you resolve the conflicts and then merge?

@mlwelles mlwelles merged commit c544061 into main Feb 18, 2026
26 checks passed
@mlwelles mlwelles deleted the feature/stress-tests branch February 18, 2026 22:14
mlwelles added a commit that referenced this pull request Feb 25, 2026
## Summary

- Bumps VERSION from 25.1.0 to 25.2.0 in `pydgraph/meta.py`
- Adds v25.2.0 changelog entry covering 12 commits since v25.1.0

### Highlights in this release

- **Stress tests & CI benchmarks** for performance regression tracking
(#298)
- **AbortedError pickling fix** for Celery/multiprocessing compatibility
(#299)
- **asyncio.Lock deadlock fix** in `AsyncTxn.do_request` error handling
- Dependabot security vulnerability fixes
- CI/CD pipeline improvements

### Verification

- `make build` produces `pydgraph-25.2.0.tar.gz` and
`pydgraph-25.2.0-py3-none-any.whl`
- `make test` passes (219/219)
- `trunk fmt CHANGELOG.md` passes