Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug fix: have ops on subclassed datapanels return subclass type #57

Merged
merged 2 commits into from
Jun 9, 2021

Conversation

ad12
Copy link
Collaborator

@ad12 ad12 commented Jun 6, 2021

Issue

When datapanels are subclasses, ops on these datapanels (getitem, merge, append, etc) returned instances of mosaic.DataPanel instead of the subclass type

Fix

All class methods can be called from the instance. This will automatically pass the type of the instance (e.g. subclass) to the class method. For example,

import mosaic as ms
class DataPanelSubclass(ms.DataPanel):
    pass

dp = DataPanelSubclass(...)
dp.from_batch(...)  # called with cls=DataPanelSubclass

When from_batch is used from instance methods, we call self.from_batch instead of DataPanel.from_batch.

Consequences and stopgap solutions

By calling self.from_batch, the initializer of the subclass is called as from_batch calls cls(...). This can be an issue if there is state in the current instance (e.g. self) that needs to be passed to the new DataPanel subclass instance.

An example of this is EntityDataPanel, where the index_column should be passed from the current instance to the newly constructed instance. Because there is no way to plumb that information through different calls, the initializer of EntityDataPanel gets called with EntityDataPanel(index_column=None) even if the current instance has an index column. This results in a new column "_ent_index" being added to the new EntityDataPanel.

As a stopgap solution, we have a method that removes the extra "_ent_idx" column any time a new data panel is created, but this is not a scalable solution, because there may be many parameters that need to be plumbed through to construct the subclassed data panel.

We propose a more long-term solution in #56

@ad12 ad12 merged commit cffce64 into dev Jun 9, 2021
@ad12 ad12 deleted the feat-dp-subclass branch June 23, 2021 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant