Creates a new function that unifies a list of GraphFrames into a single GraphFrame #10

Closed
wants to merge 12 commits into from

Conversation

ilumsden
Collaborator

@ilumsden ilumsden commented Feb 3, 2022

This PR implements a new function, unify_ensemble, that takes a list of GraphFrame objects with equal graphs and returns a new GraphFrame containing the data of all the inputs. The output DataFrame gains a new column, dataset, that identifies which input GraphFrame each row came from. If a GraphFrame's dataset attribute (explained below) is set, that value is used for its rows in the output. Otherwise, the string "gframe_#" is used, where "#" is the index of the GraphFrame in the input list.
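The labeling rule described above can be sketched without Hatchet itself. This is only an illustration, not the code in this PR: `ToyGraphFrame` and `unify_ensemble_sketch` are hypothetical names, the graph is reduced to a tuple, the DataFrame is reduced to a list of dict rows, and the fallback index is assumed to be zero-based.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGraphFrame:
    """Hypothetical stand-in for GraphFrame, for illustration only."""
    graph: tuple                   # stand-in for the graph; compared with ==
    rows: list                     # stand-in for the DataFrame: list of dict rows
    dataset: Optional[str] = None  # label for this GraphFrame, if any


def unify_ensemble_sketch(gframes):
    """Combine the rows of all inputs, tagging each row with its origin."""
    if not gframes:
        raise ValueError("expected at least one GraphFrame")
    graph = gframes[0].graph
    unified_rows = []
    for index, gf in enumerate(gframes):
        if gf.graph != graph:
            raise ValueError("all GraphFrames must have equal graphs")
        # Use the dataset attribute when set, otherwise fall back to
        # "gframe_#" (zero-based index assumed here).
        label = gf.dataset if gf.dataset is not None else "gframe_{}".format(index)
        for row in gf.rows:
            # Copy each row so the input GraphFrames are left unmodified.
            unified_rows.append({**row, "dataset": label})
    return ToyGraphFrame(graph=graph, rows=unified_rows)


# "run_a.json" is a made-up label used only for this example.
first = ToyGraphFrame(("main", "foo"), [{"node": "main", "time": 1.0}], dataset="run_a.json")
second = ToyGraphFrame(("main", "foo"), [{"node": "main", "time": 2.0}])
unified = unify_ensemble_sketch([first, second])
```

In this sketch, the row from `first` is tagged with its own label and the row from `second` gets the index-based fallback.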

To help link output data to input data, this PR also adds a new dataset attribute to the GraphFrame class and a graphframe_reader decorator that helps set it. The dataset attribute is a string that labels the GraphFrame. For most readers, it is set automatically by the graphframe_reader decorator, which is meant to be applied to the from_X static methods of the GraphFrame class. The decorator does three things:

  1. Runs the from_X function it decorates
  2. If the from_X function did not set the dataset attribute and its first argument is a string, treats that argument as the path to the data that was read and uses it to set dataset
  3. Returns the (potentially) modified GraphFrame produced by from_X
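The three steps above can be sketched as a small decorator. This is a minimal illustration under assumptions, not the implementation in this PR: `ToyGraphFrame` and its `from_*` methods are hypothetical, an unset dataset is assumed to be `None`, and only a positional first argument is inspected.

```python
import functools


def graphframe_reader(from_x):
    """Sketch of a decorator that fills in `dataset` after a reader runs."""
    @functools.wraps(from_x)
    def wrapper(*args, **kwargs):
        # Step 1: run the decorated from_X function.
        gf = from_x(*args, **kwargs)
        # Step 2: if the reader left dataset unset and the first positional
        # argument is a string, treat that string as the path to the data.
        if gf.dataset is None and args and isinstance(args[0], str):
            gf.dataset = args[0]
        # Step 3: return the (potentially) modified GraphFrame.
        return gf
    return wrapper


class ToyGraphFrame:
    """Hypothetical stand-in for GraphFrame, for illustration only."""

    def __init__(self, dataset=None):
        self.dataset = dataset

    @staticmethod
    @graphframe_reader
    def from_file(path):
        # A reader that does not set dataset itself.
        return ToyGraphFrame()

    @staticmethod
    @graphframe_reader
    def from_labeled(path):
        # A reader that sets dataset; the decorator leaves it alone.
        return ToyGraphFrame(dataset="custom")

    @staticmethod
    @graphframe_reader
    def from_literal(data):
        # First argument is not a string, so dataset stays unset.
        return ToyGraphFrame()


# "runs/a.json" is a made-up path used only for this example.
gf_file = ToyGraphFrame.from_file("runs/a.json")
gf_labeled = ToyGraphFrame.from_labeled("runs/a.json")
gf_literal = ToyGraphFrame.from_literal({"name": "main"})
```

Placing `@staticmethod` above `@graphframe_reader` means the decorator wraps the plain function first, so the wrapper sees the reader's arguments unchanged.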

@ilumsden ilumsden added the area-graphframe, area-utils, priority-normal, status-ready-for-review, and type-feature labels Feb 3, 2022
@ilumsden ilumsden self-assigned this Feb 3, 2022
@slabasan slabasan modified the milestone: 1.10.0 Feb 3, 2022
@ilumsden ilumsden added this to the 1.10.0 milestone Feb 10, 2022
@ilumsden ilumsden added status-work-in-progress and removed status-ready-for-review labels Feb 17, 2022
@ilumsden ilumsden added status-ready-for-review and removed status-work-in-progress labels Feb 21, 2022
hatchet/graphframe.py Outdated (resolved)
@slabasan slabasan force-pushed the develop branch 16 times, most recently from b461833 to 48d44ce Compare August 9, 2022 05:03
@ilumsden
Collaborator Author

Closing because this functionality is going elsewhere

@ilumsden ilumsden closed this Sep 14, 2022
Labels
area-graphframe: Issues and PRs involving Hatchet's core GraphFrame datastructure and associated classes
area-utils: Issues and PRs related to Hatchet's high-level API and other utility libraries
priority-normal: Normal priority issues and PRs
status-ready-for-review: This PR is ready to be reviewed by assigned reviewers
type-feature: Requests for new features or PRs which implement new features

2 participants