@yangxk1
Contributor

@yangxk1 yangxk1 commented Nov 5, 2025

Reason for this PR

close #798

This PR refactors and enhances the Python module of Apache GraphAr (incubating), transforming it from a CLI-only tool into a full-featured Python SDK with both programmatic APIs and a command-line interface.

The original Python implementation was tightly coupled to its role as a CLI utility (graphar_cli). To support broader use cases, this PR restructures the module into a proper API SDK while preserving and improving the CLI functionality.

What changes are included in this PR?

Module restructuring: The cli/ directory is renamed to python/, and internal code is reorganized into logical packages:

  • graphar/: Core Python API (e.g., GraphInfo, VertexInfo, EdgeInfo)
  • client/: CLI implementation (graphar command)
  • graphar/importer/: Data import utilities

New public APIs: Exposes C++ GraphAr types (e.g., Type, FileType, AdjListType) and metadata classes via pybind11 bindings, enabling programmatic access to GraphAr metadata.

Improved CLI:

  • Adds --version flag
  • Updates CI workflows to test both CLI and SDK

Testing:

  • Adds comprehensive unit tests for GraphInfo, vertex/edge metadata, and propertyGroups
  • Includes CLI integration tests
  • Enables pytest in CI for both Ubuntu and macOS

Are these changes tested?

Yes, pytest tests have been added.

Are there any user-facing changes?

Yes. Users can now import graphar and use its APIs to read GraphAr metadata:

import graphar

graphar.graph_info.GraphInfo.load(graph_path)

@yangxk1
Contributor Author

yangxk1 commented Nov 5, 2025

cc @adsharma

@SemyonSinchenko
Member

@yangxk1 Nice work! Does it contain Arrow/PyArrow bindings to the C++ module? As I understand it, that should be possible in a zero-copy way. It would also be nice to expose conversion to Arrow types.

@yangxk1
Contributor Author

yangxk1 commented Nov 5, 2025

@SemyonSinchenko, this PR only involves the schema YAML.

I am also working on another PR to build the high-level API. Neither PR is designed for Arrow operations. With these two APIs, our Python SDK can provide the core functions.

After completing these two tasks, I will work on the C Arrow and PyArrow integration, which will allow users to operate on the data set more precisely.

@Thespica
Contributor

Thespica commented Nov 5, 2025

I think it would be odd to name the CLI (command-line interface) package client. Consider renaming it back to cli?

@yangxk1
Contributor Author

yangxk1 commented Nov 5, 2025

> I think it would be odd to name the CLI package client. Consider renaming it back to cli?

I got an error when I used cli as the package name. Maybe it conflicts with an existing Python package? I haven't come up with a good solution yet.
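
One way to check whether a package name such as cli is already taken in the current environment is to ask the import system where that name resolves. The following is a minimal sketch using only the standard library. The helper name locate_module is hypothetical, and the check only diagnoses name shadowing; nothing in this thread confirms that shadowing was the cause of the error.

```python
import importlib.util


def locate_module(name):
    """Return where a top-level module name resolves, or None if the name is free.

    Hypothetical helper for illustration; not part of GraphAr.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken or unusual modules; treat as unresolved.
        return None
    if spec is None:
        return None
    # Namespace packages have no single origin, so fall back to their search paths.
    return spec.origin or str(list(spec.submodule_search_locations or []))


if __name__ == "__main__":
    # "cli" and "client" are the names discussed above; "json" is a known stdlib module.
    for candidate in ("cli", "client", "json"):
        print(candidate, "->", locate_module(candidate))
```

If the name resolves to a path outside the project, another installed package or a stray directory on sys.path is shadowing it.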

@yangxk1
Contributor Author

yangxk1 commented Nov 5, 2025

> I think it would be odd to name the CLI (command-line interface) package client. Consider renaming it back to cli?

Renamed client -> cli.

@adsharma

adsharma commented Nov 5, 2025

Looks great! This is what I had in mind before I got busy with LadybugDB.

One high-level thing to consider: instead of exposing GraphAr types via types_binding.cc, consider using Python's built-in ctypes.

Example:
https://github.com/py2many/py2many/blob/main/pycpp/clike.py#L15-L29

@yangxk1
Contributor Author

yangxk1 commented Nov 6, 2025

> One high-level thing to consider: instead of exposing GraphAr types via types_binding.cc, consider using Python's built-in ctypes.

Thanks for the suggestion, I will study ctypes carefully.
However, types_binding.cc mainly deals with enum classes in C++, and I don't think ctypes can replace that.

@adsharma

adsharma commented Nov 6, 2025

I'm sure there are implementation details such as C++ enums. I was thinking about it from the end user's perspective:

from ctypes import c_int64 as i64

- assert id_property_type.to_type_name() == "int64"
+ assert id_property_type == i64

There may be type checkers that understand integer overflow and similar concepts. They could catch bugs when you use those types that they could not catch with GraphAr-specific C++ enums.

A thin translation layer between the C++ and Python APIs could map the types, so that knowledge of these Python types doesn't carry over to the rest of the code base.
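
A thin translation layer of this kind could be sketched roughly as follows. This is an illustration under stated assumptions, not GraphAr's actual API: the function to_ctype and the mapping table are hypothetical, and only the type name "int64" appears in this thread; the other names are assumed for the example.

```python
import ctypes

# Assumption: only "int64" is taken from the discussion above; the remaining
# type names are illustrative guesses at what a mapping might contain.
_TYPE_NAME_TO_CTYPE = {
    "bool": ctypes.c_bool,
    "int32": ctypes.c_int32,
    "int64": ctypes.c_int64,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}


def to_ctype(type_name):
    """Map a type name string to the matching ctypes class (hypothetical helper)."""
    try:
        return _TYPE_NAME_TO_CTYPE[type_name]
    except KeyError:
        # Types without a fixed-width C equivalent are rejected explicitly.
        raise ValueError(f"no ctypes equivalent for type name {type_name!r}") from None


# End-user view: compare against a standard Python type instead of a string.
i64 = ctypes.c_int64
assert to_ctype("int64") == i64
```

Keeping the mapping in one table means the rest of the code base can keep working with type names, and only the public Python surface deals in ctypes classes.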

@yangxk1
Contributor Author

yangxk1 commented Nov 6, 2025

Looks great, I'll consider using ctypes to optimize the code!

@yangxk1 yangxk1 merged commit 03b8a32 into apache:main Nov 6, 2025
3 checks passed

Development

Successfully merging this pull request may close these issues.

feat(python): Refactor python(cli) module