feat: add LDBC dataset integration by DecisionNerd · Pull Request #91 · DecisionNerd/graphforge

DecisionNerd · 2026-02-05T20:47:47Z

Summary

Implements support for loading LDBC Social Network Benchmark (SNB) datasets, completing the final piece of the temporal/spatial/LDBC integration plan.

Background

LDBC provides industry-standard benchmark datasets for graph databases. The SNB models a social network (similar to Facebook) with realistic data including:

Nodes: Person, Post, Comment, Forum, Organisation, Place, Tag, TagClass
Relationships: KNOWS, LIKES, HAS_CREATOR, CONTAINER_OF, HAS_MEMBER, etc.
Temporal properties: Uses DATE and DATETIME types (requires temporal types from feat: implement temporal types (date, datetime, time, duration) #86)

Changes

1. Compression Support (`compression.py`)

Handles .tar.zst archives (Zstandard compression)
Also supports .tar.gz and plain .tar
Optional zstandard dependency with graceful fallback

2. LDBC Schema (`ldbc_schema.py`)

8 node types with property mappings
3 relationship types (more can be added)
Type converters for temporal properties (date, datetime)
Required vs optional property handling

3. LDBC Loader (`ldbc.py`)

Multi-file CSV parser (pipe-delimited)
Node-first loading (builds cache for relationships)
Subdirectory search (handles nested archive structures)
Property validation (required fields, type conversion)

4. Dataset Registration (`ldbc.py`)

SF0.001: 3K nodes, 17K edges (0.5 MB)
SF0.1: 32K nodes, 180K edges (5 MB)
SF1: 328K nodes, 1.8M edges (50 MB)
SF10: 3.3M nodes, 18M edges (500 MB)

Architecture Decisions

Dataclasses for Schemas (Not Pydantic)

Following CLAUDE.md guidance:

Schema definitions: Use dataclasses (internal, performance-critical)
Dataset metadata: Use Pydantic (user-facing, needs validation)

The PropertyMapping, NodeSchema, and RelationshipSchema classes are:

Hard-coded (not user input)
Used in tight loops during CSV parsing
Never exposed as public API
Don't need runtime validation

Temporal Type Integration

LDBC datasets use temporal properties extensively:

creationDate: DATETIME (ISO 8601: 2023-01-15T10:30:00.000+0000)
birthday: DATE (1990-01-15)

This leverages temporal types from #86, demonstrating the value of completing the type system first.

Testing

Unit Tests (54 total)

Compression: 11 tests (tar.gz, .tar, .tar.zst detection/extraction)
Schema: 25 tests (type parsers, property mappings, node/relationship schemas)
Loader: 18 tests (property parsing, CSV finding, node/relationship loading)

Coverage

New code: 100% coverage on LDBC modules
Total: 93.67% (exceeds 85% threshold)
All pre-push checks passing

Example Usage

from graphforge import GraphForge
from graphforge.datasets import load_dataset

# Load smallest scale factor
gf = GraphForge()
load_dataset(gf, "ldbc-snb-sf0.001")

# Query social network
results = gf.execute("""
    MATCH (p:Person)-[k:KNOWS]->(friend:Person)
    WHERE p.birthday > date('1990-01-01')
    RETURN p.firstName, friend.firstName, k.creationDate
    ORDER BY k.creationDate DESC
    LIMIT 10
""")

# Use temporal types
results = gf.execute("""
    MATCH (p:Person)-[:LIKES]->(post:Post)
    WHERE post.creationDate > datetime('2023-01-01T00:00:00+00:00')
    RETURN p.firstName, post.content
""")

Dependencies

Optional: zstandard>=0.22.0 for .tar.zst support

If not installed:

.tar.zst files will raise ImportError with install instructions
.tar.gz and .tar files work without it

Implementation Sequence

This completes the approved plan:

✅ Temporal types (Issue feat: implement temporal types (date, datetime, time, duration) #86, PR feat: implement temporal types (date, datetime, time, duration) #87) - merged
✅ Spatial types (Issue feat: implement spatial types (point, distance) #88, PR feat: implement spatial types (point, distance) #89) - awaiting merge
✅ LDBC datasets (Issue feat: add LDBC dataset integration #51, this PR)

All three pieces now work together:

LDBC uses temporal types for dates/times
Queries can filter by date ranges
Future queries can use spatial types for location data

Future Enhancements

The current implementation includes 3 relationship types. Additional LDBC relationships can be easily added to ldbc_schema.py:

HAS_CREATOR, CONTAINER_OF, HAS_MEMBER
HAS_MODERATOR, HAS_TAG, HAS_TYPE
IS_LOCATED_IN, IS_PART_OF, REPLY_OF
STUDY_AT, WORK_AT

The schema is designed to be incrementally extended.

Performance Notes

The loader uses a node-first strategy:

Load all nodes into graph (by type)
Build cache: (label, id) -> NodeRef
Load relationships (lookup from cache)

This ensures O(1) relationship creation and handles missing nodes gracefully.

Closes #51

Summary by CodeRabbit

New Features
- Added an LDBC Social Network Benchmark loader (ldbc format) with schema-aware CSV import and optional compressed-archive handling.
- Added utilities to detect and extract compressed dataset archives.
- Registered four LDBC dataset variants (SF0.001, SF0.1, SF1, SF10) with metadata.
API
- Exposed LDBCLoader as a public loader for easy discovery and reuse.
Tests
- Added comprehensive tests for compression utilities, loader behavior, and schema parsing.

Implements support for loading LDBC Social Network Benchmark (SNB) datasets from .tar.zst archives with multi-file CSV format. ## Changes **Compression Support:** - Add `compression.py` module for .tar.zst extraction - Support .tar.gz, .tar, and .tar.zst formats - Uses zstandard library (optional dependency) **LDBC Schema:** - Define 8 node types (Person, Post, Comment, Forum, etc.) - Define 3 relationship types (KNOWS, LIKES) - Property mappings with type converters - Temporal property support (uses date/datetime) **LDBC Loader:** - Multi-file CSV parser with pipe delimiter - Node-first loading strategy (cache for relationships) - Required/optional property validation - Subdirectory search for flexible archive structures **Dataset Registration:** - Register 4 scale factors (SF0.001, SF0.1, SF1, SF10) - Metadata includes URLs, node/edge counts, schemas - Auto-registration on import ## Implementation Details Uses dataclasses for internal schema definitions (performance) and Pydantic for dataset metadata (validation). Follows existing CSV loader patterns but handles multi-file format. Temporal types (DATE, DATETIME) work with LDBC timestamps via ISO 8601 parsing. Properties with years/months use isodate.Duration. ## Testing - 11 compression utility tests - 25 schema definition tests - 18 loader functionality tests - 54 total tests, 100% coverage on new code - All pre-push checks passing (93.67% total coverage) ## Example Usage ```python from graphforge import GraphForge from graphforge.datasets import load_dataset gf = GraphForge() load_dataset(gf, "ldbc-snb-sf0.001") # Smallest scale # Query the data results = gf.execute(""" MATCH (p:Person)-[k:KNOWS]->(friend:Person) WHERE p.firstName = 'Alice' RETURN friend.firstName AS name, k.creationDate AS since """) ``` Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2026-02-05T20:48:11Z

Warning

Rate limit exceeded

@DecisionNerd has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 10 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds LDBC SNB support: new LDBCLoader, LDBC schema definitions, compression utilities for .tar.zst/.tar.gz/.tar, dataset registrations for multiple scale factors, and unit tests for compression, loader, and schema parsing.

Changes

Cohort / File(s)	Summary
Loader Package Export `src/graphforge/datasets/loaders/__init__.py`	Exports `LDBCLoader` from the new ldbc loader module.
Compression Utilities `src/graphforge/datasets/loaders/compression.py`	New archive helpers: `ZSTD_AVAILABLE`, `is_compressed_archive`, `extract_tar_zst`, `extract_archive` with format detection and error handling.
LDBC Loader Implementation `src/graphforge/datasets/loaders/ldbc.py`	New `LDBCLoader` class: handles archive extraction or directory input, CSV discovery, two-phase loading (nodes then relationships), property parsing, and node caching.
LDBC Schema Definitions `src/graphforge/datasets/loaders/ldbc_schema.py`	New dataclasses and parsers: `PropertyMapping`, `NodeSchema`, `RelationshipSchema`, type converters, and concrete `NODE_SCHEMAS` / `RELATIONSHIP_SCHEMAS`.
Dataset Source Registration `src/graphforge/datasets/sources/__init__.py`, `src/graphforge/datasets/sources/ldbc.py`	Imports and invokes `register_ldbc_datasets()`; new `register_ldbc_datasets()` registers loader and multiple LDBC dataset variants (SF0.001, SF0.1, SF1, SF10) with metadata.
Tests `tests/unit/datasets/test_compression.py`, `tests/unit/datasets/test_ldbc_loader.py`, `tests/unit/datasets/test_ldbc_schema.py`	Adds unit tests for archive detection/extraction, loader behavior (format, delimiter, parsing, errors, node/relationship creation), and schema parsing/validation.

Sequence Diagram

sequenceDiagram
    actor User
    participant LDBCLoader
    participant Compression
    participant FileSystem
    participant SchemaParser
    participant GraphForge

    User->>LDBCLoader: load(gf, path)
    LDBCLoader->>FileSystem: validate path exists
    alt compressed archive detected
        LDBCLoader->>Compression: extract_archive(path)
        Compression->>FileSystem: read archive (.tar.zst / .tar.gz / .tar)
        Compression->>FileSystem: extract to temp directory
        Compression-->>LDBCLoader: return extraction path
    else directory
        LDBCLoader->>FileSystem: use path directly
    end

    LDBCLoader->>LDBCLoader: _load_nodes(gf, data_dir)
    loop node schemas
        LDBCLoader->>FileSystem: find CSV file
        LDBCLoader->>FileSystem: read CSV rows
        loop rows
            LDBCLoader->>SchemaParser: _parse_properties(row, mappings)
            SchemaParser-->>LDBCLoader: parsed properties
            LDBCLoader->>GraphForge: create_node(label, properties)
            LDBCLoader->>LDBCLoader: cache node by (label,id)
        end
    end

    LDBCLoader->>LDBCLoader: _load_relationships(gf, data_dir, node_cache)
    loop relationship schemas
        LDBCLoader->>FileSystem: find CSV file
        LDBCLoader->>FileSystem: read CSV rows
        loop rows
            LDBCLoader->>LDBCLoader: resolve source/target from cache
            LDBCLoader->>SchemaParser: _parse_properties(row, mappings)
            SchemaParser-->>LDBCLoader: parsed properties
            alt nodes exist
                LDBCLoader->>GraphForge: create_relationship(source, target, type, properties)
            else
                LDBCLoader->>LDBCLoader: skip relationship
            end
        end
    end

    LDBCLoader-->>User: loading complete

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly Related PRs

feat: implement core dataset loading infrastructure #68 — Adds the datasets/loaders and datasets/sources scaffolding that this change extends (loader export and registration points).
feat: add API validation and Pydantic serialization system #85 — Modifies DatasetInfo/Dataset models; register_ldbc_datasets() interacts with that metadata API and may require compatibility checks.

Suggested labels

v0.3

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: add LDBC dataset integration' accurately captures the main change—adding support for loading LDBC SNB datasets across multiple scale factors.
Description check	✅ Passed	The PR description is comprehensive and well-structured, covering summary, background, detailed changes across four modules, architecture decisions, testing with coverage metrics, usage examples, dependencies, and implementation sequence.
Linked Issues check	✅ Passed	The PR successfully addresses all coding requirements from issue `#51`: LDBCLoader class is implemented, multiple scale factors (SF0.001 to SF10) are supported, schema mappings for all primary entity types are provided, node/edge counts are tested, and integration with the type system is demonstrated.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#51` requirements. No extraneous changes detected; modifications include compression utilities (necessary for dataset handling), schema definitions, loader implementation, dataset registration, and corresponding unit tests.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/51-ldbc-datasets

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-02-05T20:50:47Z

Codecov Report

❌ Patch coverage is 81.06796% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.48%. Comparing base (6b7b49c) to head (9f06b2b).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #91      +/-   ##
==========================================
- Coverage   91.17%   90.48%   -0.69%     
==========================================
  Files          21       25       +4     
  Lines        2821     3027     +206     
  Branches      700      737      +37     
==========================================
+ Hits         2572     2739     +167     
- Misses        109      136      +27     
- Partials      140      152      +12

Flag	Coverage Δ
full-coverage	`90.48% <81.06%> (-0.69%)`	⬇️
unittests	`74.66% <42.71%> (-2.34%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
parser	`95.62% <ø> (ø)`
planner	`92.15% <ø> (ø)`
executor	`88.51% <ø> (ø)`
storage	`99.62% <ø> (ø)`
ast	`83.18% <ø> (ø)`
types	`99.00% <ø> (ø)`

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b7b49c...9f06b2b. Read the comment docs.

coderabbitai

Actionable comments posted: 6

🤖 Fix all issues with AI agents

In `@src/graphforge/datasets/loaders/compression.py`:
- Around line 49-55: The tar extraction using tar.extractall(compressed_file) is
vulnerable to path traversal; replace it by iterating TarFile members and
validate each member's final path before extraction (e.g., compute target_path =
os.path.join(extract_to, member.name), resolve both target_path and extract_to
and ensure target_path.startswith(resolved_extract_to)) and only extract
validated members via tar.extract(member, extract_to). Implement this as a
shared helper (e.g., safe_extract_tar(tar: tarfile.TarFile, extract_to: str))
and call it from the code paths that currently use tar.extractall (the block
that opens tar via dctx.stream_reader and the other tar-open block), ensuring no
absolute paths or parent-traversal entries are allowed.
- Around line 3-7: The module docstring incorrectly claims support for .gz and
.zip; update the top-level docstring to only list the formats actually handled
by the code (e.g., .tar.zst and .tar.gz) so it matches the behavior of
extract_archive() and is_compressed_archive(); reference those function names in
the docstring change to clarify supported formats and remove .gz/.zip from the
list (or, if you prefer to implement support, add handling for .gz/.zip inside
extract_archive() and is_compressed_archive() instead—pick one approach and make
docstring and functions consistent).

In `@src/graphforge/datasets/loaders/ldbc.py`:
- Around line 60-69: The current extraction branch (is_compressed_archive)
creates a persistent folder (.{path.stem}_extracted) and never removes it;
change extract logic in the loader so extraction uses a temporary directory
(e.g., tempfile.TemporaryDirectory or tempfile.mkdtemp with explicit cleanup)
instead of path.parent / f".{path.stem}_extracted", set data_dir to that temp
dir, and ensure the temp dir is removed after loading completes (or use a
context manager so cleanup is automatic). Update the code around
is_compressed_archive, extract_archive, and how data_dir is used so any
downstream processing runs while the temp dir is alive and is cleaned up
afterwards.

In `@src/graphforge/datasets/sources/ldbc.py`:
- Around line 23-67: Make the LDBC loader/dataset registration idempotent by
checking for existing registrations before registering: guard the
register_loader("ldbc", LDBCLoader) call (or catch ValueError) and likewise
guard register_dataset(DatasetInfo(...)) so repeated imports/calls don't raise;
reference the symbols register_loader, register_dataset, LDBCLoader and the
DatasetInfo block and implement the same pattern SNAP uses (check-if-registered
or try/except around registration) to skip re-registration.

In `@tests/unit/datasets/test_compression.py`:
- Line 13: Move the I/O-heavy compression tests out of the unit suite by
relocating tests/unit/datasets/test_compression.py to an appropriate
slow/integration directory (e.g., tests/integration/datasets/ or
tests/slow/datasets/) and change the module marker from pytestmark =
pytest.mark.unit to a more appropriate marker such as pytestmark =
pytest.mark.integration or pytestmark = pytest.mark.slow; ensure any test
collection/config (pytest.ini or conftest.py) is aware of the new
directory/marker so these tests run in the slower suite rather than unit.

In `@tests/unit/datasets/test_ldbc_loader.py`:
- Line 11: This test file is an integration test (it instantiates GraphForge and
runs Cypher) but is marked and located as unit; move the test from tests/unit to
tests/integration, change the module-level marker from pytest.mark.unit to
pytest.mark.integration (or add `@pytest.mark.integration` on the test), and
ensure any imports or test utilities referenced (e.g., GraphForge) still resolve
after the move; update test metadata accordingly so the test runner treats it as
an integration test.

🧹 Nitpick comments (3)

src/graphforge/datasets/loaders/ldbc.py (2)
98-109: Missing column access could produce cryptic KeyError.

If the CSV file doesn't contain the expected id_column, line 100 will raise a raw KeyError without context about which file or schema failed.
💡 Suggested defensive access with better error message
                 for row in reader:
                     # Extract node ID
-                    node_id = row[schema.id_column]
+                    try:
+                        node_id = row[schema.id_column]
+                    except KeyError:
+                        raise ValueError(
+                            f"Missing ID column '{schema.id_column}' in {csv_path}"
+                        ) from None
135-146: Same KeyError risk for relationship ID columns.

Similar to node loading, accessing row[schema.source_id_column] and row[schema.target_id_column] without guards will produce cryptic errors if columns are missing.

The silent skip for missing nodes (lines 144-146) is appropriate for handling partial datasets, though a debug-level log could aid troubleshooting.
src/graphforge/datasets/loaders/ldbc_schema.py (1)
103-107: Redundant empty check.

The condition not value or value == "" is redundant since an empty string is falsy.
💅 Simplification
 def parse_list(value: str) -> list[str]:
     """Parse semicolon-separated list."""
-    if not value or value == "":
+    if not value:
         return []
     return [item.strip() for item in value.split(";") if item.strip()]

coderabbitai · 2026-02-05T20:53:33Z

+    is_compressed_archive,
+)
+
+pytestmark = pytest.mark.unit


⚠️ Potential issue | 🟠 Major

Reclassify these I/O-heavy tests out of the unit suite.

Creating/extracting tar archives is not <1ms and doesn’t target parser/planner/executor/storage layers. Please move these tests out of tests/unit/ (e.g., to a slow/integration suite) and update the marker accordingly.

🧭 Suggested marker change (paired with file move)

-pytestmark = pytest.mark.unit +pytestmark = pytest.mark.slow

As per coding guidelines: “Unit tests MUST test ONE component in isolation and complete in < 1ms” and “tests/unit/**/*.py: Unit tests in tests/unit/ MUST test parser, planner, executor, and storage layers in isolation with minimal mocking.”

🤖 Prompt for AI Agents

In `@tests/unit/datasets/test_compression.py` at line 13, Move the I/O-heavy compression tests out of the unit suite by relocating tests/unit/datasets/test_compression.py to an appropriate slow/integration directory (e.g., tests/integration/datasets/ or tests/slow/datasets/) and change the module marker from pytestmark = pytest.mark.unit to a more appropriate marker such as pytestmark = pytest.mark.integration or pytestmark = pytest.mark.slow; ensure any test collection/config (pytest.ini or conftest.py) is aware of the new directory/marker so these tests run in the slower suite rather than unit.

…urity) Fixes 3 high-severity Bandit findings (B202:tarfile_unsafe_members). ## Security Issue tar.extractall() without validation can be exploited by malicious archives containing files with paths like ../../etc/passwd that escape the intended extraction directory (CWE-22: Path Traversal). ## Fix Added filter='data' parameter to all tar.extractall() calls: - extract_tar_zst(): .tar.zst extraction - extract_archive(): .tar.gz extraction - extract_archive(): .tar extraction The 'data' filter (Python 3.12+) is the safest option: - Strips leading slashes from paths - Refuses absolute paths - Refuses symlinks pointing outside extraction dir - Refuses device files ## Verification - Bandit scan: 0 issues (was 3 high-severity) - All tests passing - All pre-push checks passing Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/graphforge/datasets/loaders/compression.py`:
- Around line 53-55: The three calls to
tar.extractall(tar.extractall(extract_to, filter="data")) use the Python
3.11-only filter kwarg and will TypeError on Python 3.10; update each occurrence
where variable tar and extract_to are used (the tar.zst, tar.gz and uncompressed
tar extraction blocks) to guard on sys.version_info >= (3,11) and call
tar.extractall(extract_to, filter="data") only for 3.11+, otherwise perform a
safe fallback: iterate tar.getmembers() and validate member paths (prevent path
traversal) before extracting (e.g., build the target path from extract_to and
member.name and skip/raise on unsafe paths) so the extraction logic is secure on
Python 3.10.

Resolves all issues from code review: ## Security (compression.py) - Replace filter='data' with manual path validation - Implement safe_extract_tar() helper function - Validate each tar member before extraction - Refuse absolute paths and parent directory references - Ensure extracted paths remain within target directory ## Documentation (compression.py) - Fix module docstring to only list supported formats - Update to mention .tar.zst, .tar.gz, .tar (removed .gz, .zip) - Reference extract_archive() and is_compressed_archive() functions ## Resource Cleanup (ldbc.py) - Use tempfile.TemporaryDirectory for extraction - Automatic cleanup on context manager exit - No more persistent .{stem}_extracted directories ## Idempotent Registration (ldbc.py sources) - Check _DATASET_REGISTRY before register_dataset() - Try/except around register_loader() like SNAP does - Safe for repeated imports and function calls ## Test Organization - Move test_compression.py: unit -> integration - Move test_ldbc_loader.py: unit -> integration - Update pytestmark to pytest.mark.integration - Update docstrings to explain integration classification All tests passing (54/54), Bandit clean (0 issues), pre-push checks pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

DecisionNerd added the enhancement New feature or request label Feb 5, 2026

coderabbitai Bot reviewed Feb 5, 2026

View reviewed changes

Comment thread src/graphforge/datasets/loaders/compression.py Outdated

DecisionNerd merged commit d6fa4a9 into main Feb 5, 2026
22 checks passed

This was referenced Feb 6, 2026

feat: implement JSON Graph interchange format with typed properties #96

Merged

feat: implement GraphML loader for NetworkRepository datasets #101

Merged

DecisionNerd deleted the feature/51-ldbc-datasets branch February 7, 2026 12:50

This was referenced Feb 13, 2026

docs: refine CLAUDE.md to under 40k characters #140

Merged

fix: TCK step definition mismatch for ORDER BY tests #149

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add LDBC dataset integration#91

feat: add LDBC dataset integration#91
DecisionNerd merged 3 commits into
mainfrom
feature/51-ldbc-datasets

DecisionNerd commented Feb 5, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Feb 5, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

codecov Bot commented Feb 5, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Feb 5, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DecisionNerd commented Feb 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Changes

1. Compression Support (compression.py)

2. LDBC Schema (ldbc_schema.py)

3. LDBC Loader (ldbc.py)

4. Dataset Registration (ldbc.py)

Architecture Decisions

Dataclasses for Schemas (Not Pydantic)

Temporal Type Integration

Testing

Unit Tests (54 total)

Coverage

Example Usage

Dependencies

Implementation Sequence

Future Enhancements

Performance Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related PRs

Suggested labels

Uh oh!

codecov Bot commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Feb 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DecisionNerd commented Feb 5, 2026 •

edited by coderabbitai Bot

Loading

1. Compression Support (`compression.py`)

2. LDBC Schema (`ldbc_schema.py`)

3. LDBC Loader (`ldbc.py`)

4. Dataset Registration (`ldbc.py`)

coderabbitai Bot commented Feb 5, 2026 •

edited

Loading

codecov Bot commented Feb 5, 2026 •

edited

Loading