feat(interactive): Add support for loading graph with odps table as data source #3305

Merged: 23 commits merged on Dec 5, 2023

@zhanglei1949 (Collaborator) commented Oct 23, 2023

  • Implement ODPSFragmentLoader to support loading vertices/edges from an ODPS table: odps://project/table_name
  • As both ODPSFragmentLoader and CSVFragmentLoader read raw data into an arrow::Table, we extract the common part into AbstractArrowFragmentLoader, which provides the following interface.
class IRecordBatchSupplier {
 public:
  // Called repeatedly until GetNextBatch() returns nullptr.
  virtual std::shared_ptr<arrow::RecordBatch> GetNextBatch() = 0;
};

class AbstractArrowFragmentLoader : public IFragmentLoader {
  void AddVerticesRecordBatch(
      label_t v_label_id, const std::vector<std::string>& input_paths,
      std::function<std::shared_ptr<IRecordBatchSupplier>(
          label_t, const std::string&, const LoadingConfig&)>
          supplier_creator);

  void AddEdgesRecordBatch(
      label_t src_label_id, label_t dst_label_id, label_t edge_label_id,
      const std::vector<std::string>& input_paths,
      std::function<std::shared_ptr<IRecordBatchSupplier>(
          label_t, label_t, label_t, const std::string&, const LoadingConfig&)>
          supplier_creator);
};

ODPSFragmentLoader and CSVFragmentLoader simply inherit from this abstract class and call AddVerticesRecordBatch and AddEdgesRecordBatch with a lambda that specifies how to produce RecordBatches from each input path. For CSVFragmentLoader, we produce RecordBatches with Arrow readers; for ODPSFragmentLoader, we produce them with ODPSReadClient.
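
To illustrate the supplier pattern described above, here is a minimal sketch that compiles without Arrow. It is not the code from this PR: `RecordBatch` is a stand-in for `arrow::RecordBatch`, and `InMemorySupplier`, `SupplierCreator`, and `CountRows` are hypothetical names introduced only for illustration. The creator signature is also simplified to take just the input path, whereas the real callbacks additionally receive label ids and a `LoadingConfig`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for arrow::RecordBatch (assumption) so the sketch needs no Arrow.
struct RecordBatch {
  std::vector<std::string> rows;
};

class IRecordBatchSupplier {
 public:
  virtual ~IRecordBatchSupplier() = default;
  // Called repeatedly until it returns nullptr.
  virtual std::shared_ptr<RecordBatch> GetNextBatch() = 0;
};

// Hypothetical supplier that hands out pre-built batches, one per call.
class InMemorySupplier : public IRecordBatchSupplier {
 public:
  explicit InMemorySupplier(std::vector<std::shared_ptr<RecordBatch>> batches)
      : batches_(std::move(batches)) {}

  std::shared_ptr<RecordBatch> GetNextBatch() override {
    if (next_ >= batches_.size()) return nullptr;  // exhausted
    return batches_[next_++];
  }

 private:
  std::vector<std::shared_ptr<RecordBatch>> batches_;
  std::size_t next_ = 0;
};

// Simplified creator signature (assumption): input path only.
using SupplierCreator =
    std::function<std::shared_ptr<IRecordBatchSupplier>(const std::string&)>;

// Mirrors the shape of AddVerticesRecordBatch: create one supplier per input
// path and drain it until nullptr. Returns the total number of rows seen.
std::size_t CountRows(const std::vector<std::string>& input_paths,
                      const SupplierCreator& supplier_creator) {
  std::size_t total = 0;
  for (const auto& path : input_paths) {
    auto supplier = supplier_creator(path);
    assert(supplier != nullptr);
    while (auto batch = supplier->GetNextBatch()) {
      total += batch->rows.size();
    }
  }
  return total;
}
```

With this shape, the common loop never needs to know whether batches come from a CSV reader or an ODPS client; only the lambda passed as `supplier_creator` differs per loader.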

  • For a customized FragmentLoader, users can specify the path to the library via FLEX_OTHER_LOADERS and call the Register() function when the customized FragmentLoader class is initialized.
    For example, the built-in CSVFragmentLoader is registered for scheme=file and format=csv.
const bool CSVFragmentLoader::registered_ = LoaderFactory::Register(
    "file", "csv",
    static_cast<LoaderFactory::loader_initializer_t>(&CSVFragmentLoader::Make));
  • ODPS-related code is placed at flex/third_party/odps/include. This code will be open-sourced by the ODPS team but is not yet ready, so we copy it here to make ODPSFragmentLoader work. After odps-cpp-sdk is open-sourced, we will replace the dependency with a git submodule.
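
The scheme/format registration mentioned in the list above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual `LoaderFactory` from this PR: the `IFragmentLoader` interface is reduced to a single hypothetical `Name()` method, `Make()` takes no arguments here, and `Create()` and `DummyCsvLoader` are hypothetical names. The registry lives in a function-local static so that registrations running during static initialization always find it constructed.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Reduced stand-in for the real loader interface (assumption).
struct IFragmentLoader {
  virtual ~IFragmentLoader() = default;
  virtual std::string Name() const = 0;
};

class LoaderFactory {
 public:
  // Simplified initializer type (assumption): no constructor arguments.
  using loader_initializer_t = std::shared_ptr<IFragmentLoader> (*)();

  // Associates an initializer with a (scheme, format) pair.
  static bool Register(const std::string& scheme, const std::string& format,
                       loader_initializer_t initializer) {
    assert(initializer != nullptr);
    Registry()[std::make_pair(scheme, format)] = initializer;
    return true;
  }

  // Hypothetical lookup: returns nullptr when nothing is registered.
  static std::shared_ptr<IFragmentLoader> Create(const std::string& scheme,
                                                 const std::string& format) {
    auto it = Registry().find(std::make_pair(scheme, format));
    if (it == Registry().end()) return nullptr;
    return it->second();
  }

 private:
  using Key = std::pair<std::string, std::string>;
  // Function-local static avoids static-initialization-order problems.
  static std::map<Key, loader_initializer_t>& Registry() {
    static std::map<Key, loader_initializer_t> registry;
    return registry;
  }
};

// Hypothetical loader that registers itself for scheme=file, format=csv.
class DummyCsvLoader : public IFragmentLoader {
 public:
  static std::shared_ptr<IFragmentLoader> Make() {
    return std::make_shared<DummyCsvLoader>();
  }
  std::string Name() const override { return "csv"; }

 private:
  static const bool registered_;
};

// Runs before main(), so the loader is available without explicit setup.
const bool DummyCsvLoader::registered_ = LoaderFactory::Register(
    "file", "csv",
    static_cast<LoaderFactory::loader_initializer_t>(&DummyCsvLoader::Make));
```

A caller would then obtain a loader with something like `LoaderFactory::Create("file", "csv")`. A loader shipped in a separate shared library can presumably register itself the same way once that library is loaded, which is what makes a plugin path such as FLEX_OTHER_LOADERS workable.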

Fix #3396

@zhanglei1949 zhanglei1949 changed the title feat(Interactive): Add support for loading graph with customized data source feat(Interactive): Add support for loading graph with odps table as data source Dec 4, 2023
@zhanglei1949 zhanglei1949 changed the title feat(Interactive): Add support for loading graph with odps table as data source feat(interactive): Add support for loading graph with odps table as data source Dec 5, 2023
@zhanglei1949 zhanglei1949 merged commit ac31a91 into alibaba:main Dec 5, 2023
28 of 29 checks passed
Successfully merging this pull request may close these issues.

feat(interactive): Support loading data from odps table.
3 participants