Skip to content

fix undirected graphs handling#6

Merged
adsharma merged 3 commits into
Ladybug-Memory:mainfrom
aheev:fix-undirected-check
May 27, 2026
Merged

fix undirected graphs handling#6
adsharma merged 3 commits into
Ladybug-Memory:mainfrom
aheev:fix-undirected-check

Conversation

@aheev
Copy link
Copy Markdown
Contributor

@aheev aheev commented May 26, 2026

  • move convert_arrow_tables_to_csr under IcebugMemGraph
  • directed -> undirected
  • add undirected graph validation checks: from_table == to_table
  • add test_cli
  • bump version to 1.0.0

@aheev aheev force-pushed the fix-undirected-check branch from 16f35e3 to fe7cb86 Compare May 26, 2026 15:39
@adsharma
Copy link
Copy Markdown
Collaborator

The changes look good to me. Since the code touches some of the perf sensitive duckdb queries, suggest testing a large graph such as livejournal or wikidata to ensure that there are no regressions.

I recall wikidata needing at least 32GB of RAM to process.

@adsharma
Copy link
Copy Markdown
Collaborator

adsharma commented May 27, 2026

I tested it on a large dataset. Baseline worked fine. This branch OOM'ed a 64GB machine.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
10037 arun      20   0   96.6g  61.7g   2644 R 200.6  98.3  30:50.29 python3

Log from a successful run:

=== DuckDB to CSR Format Converter ===

Creating CSR graph on FULL DATASET
Source database: wikidata-20250625.db
CSR output database: wikidata-20250625_csr.duckdb
CSR table prefix: wikidata_20250625
Directed: True
DuckDB memory limit: 53.9GB
Storage path: ./wikidata-20250625_csr

=== Creating CSR Graph Data (Optimized SQL Approach) ===
Step 0: Loading edges and nodes from original DB into new DB...
Discovered node tables: ['nodes']
Discovered edge tables: ['edges']
Node type to table mapping: {'nodes': 'nodes'}
  Copied node table: nodes -> wikidata_20250625_nodes

Step 1: Building per-edge-table CSR structures...

  Processing edges: using fallback node table nodes
    Edges: 829,151,317
    indptr: 115475325 entries
    indices: 829151317 entries

✓ Built CSR format: 115475324 nodes, 829151317 edges across 1 edge types
✓ Saved CSR graph data to wikidata-20250625_csr.duckdb

=== Exporting to Parquet and Generating schema.cypher ===
Parquet output directory: wikidata-20250625_csr
  Exported: wikidata_20250625_nodes -> nodes_nodes.parquet
  Exported: wikidata_20250625_indices_edges -> indices_edges.parquet
  Exported: wikidata_20250625_indptr_edges -> indptr_edges.parquet
  Generated: schema.cypher
✓ Export complete. Files saved to: wikidata-20250625_csr

All data saved to: wikidata-20250625_csr.duckdb

=== Conversion Completed Successfully! ===

@adsharma
Copy link
Copy Markdown
Collaborator

I thought about the directed vs undirected confusion a bit more. That flag is not a property of the graph. It's a property of the algorithm (pagerank doesn't need it, but leiden needs it). Or its a property of the storage system (ladybug wants edges stored in both directions).

We could make it explicit by renaming it to --add-reverse-edges. Think we can land this PR and clean up in a separate PR on top.

@adsharma adsharma merged commit 7e8b036 into Ladybug-Memory:main May 27, 2026
1 check passed
@aheev
Copy link
Copy Markdown
Contributor Author

aheev commented May 27, 2026

I thought about the directed vs undirected confusion a bit more. That flag is not a property of the graph. It's a property of the algorithm (pagerank doesn't need it, but leiden needs it). Or its a property of the storage system (ladybug wants edges stored in both directions).

isn't it the property of input graph? we are just changing the internal structure

@adsharma
Copy link
Copy Markdown
Collaborator

See the explanation in #9

user - [:livesin] -> city

is already a directed graph. We can't be converting a directed graph to a directed graph by using --undirected with help="Treat graph as undirected (default: directed)"

By changing the flag to represent what the code is doing rather than "faking a directed graph to be undirected" we reduce confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants