
Add pandas result builder that converts to long format #26

Open
HamiltonRepoMigrationBot opened this issue Feb 26, 2023 · 2 comments
Labels: enhancement (New feature or request), migrated-from-old-repo (Migrated from old repository)


@HamiltonRepoMigrationBot
Collaborator

Issue by skrawcz
Monday Apr 25, 2022 at 21:53 GMT
Originally opened as stitchfix/hamilton#121


Is your feature request related to a problem? Please describe.
Hamilton works on "wide" columns, not "long" ones. However, the "tidy" data ethos holds that data should be in long format, and that format does make some things easier to do.
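
To make the distinction concrete, here is a minimal pandas sketch (the column names are made up for illustration) of a wide frame, shaped like a Hamilton DataFrame result, and the long equivalent produced by `pd.melt()`:

```python
import pandas as pd

# A "wide" frame: one row per date, one column per computed output.
# Column names here are illustrative only.
wide_df = pd.DataFrame(
    {
        "date": ["2022-04-01", "2022-04-02"],
        "spend": [10.0, 20.0],
        "signups": [1.0, 3.0],
    }
)

# melt() keeps "date" as an identifier and stacks the remaining columns
# into (output, value) pairs: one row per (date, output) combination.
long_df = pd.melt(wide_df, id_vars=["date"], var_name="output", value_name="value")
print(long_df)
```

The long frame has four rows (two dates times two outputs), which is the "tidy" shape this issue asks the result builder to produce.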

Describe the solution you'd like
Add a ResultBuilder variant that takes in how you want to collapse the resulting pandas DataFrame.

Describe alternatives you've considered
People do this manually, but doing it in the result builder might make more sense.

Additional context
Prerequisites for someone picking this up:

  • Know pandas.
  • Know Python.
  • Can write the pandas code to go from wide to long.
  • Can read the Hamilton codebase to figure out where to add it.

HamiltonRepoMigrationBot added the enhancement (New feature or request) label on Feb 26, 2023
@HamiltonRepoMigrationBot
Collaborator Author

Comment by skrawcz
Saturday Apr 30, 2022 at 05:24 GMT


So this doesn't appear to be as simple as I thought it would be.

The issue with going from wide to long is that you need some context to know how to collapse things. To pass that context in, you cannot use a static method, since it can't reference self, and build_result() in the ResultMixin is a static method.
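
The constraint can be illustrated without Hamilton. In this sketch, `StaticLongResult` and `InstanceLongResult` are hypothetical stand-ins, not Hamilton classes: a static `build_result()` has nowhere to read the collapse context from, while an instance method can use arguments captured in `__init__` (here assumed to be `melt()`-style `id_vars`, `var_name` and `value_name`).

```python
from typing import Any, Dict, List

import pandas as pd


class StaticLongResult:
    """Hypothetical builder mirroring a static build_result(): no access to self."""

    @staticmethod
    def build_result(**outputs: Any) -> pd.DataFrame:
        # There is nowhere to read "which columns identify a row" from,
        # so the best a static method can do is return the wide frame.
        return pd.DataFrame(outputs)


class InstanceLongResult:
    """Hypothetical builder that stores the collapse context on the instance."""

    def __init__(self, id_vars: List[str], var_name: str = "variable", value_name: str = "value"):
        self.id_vars = id_vars
        self.var_name = var_name
        self.value_name = value_name

    def build_result(self, **outputs: Any) -> pd.DataFrame:
        wide_df = pd.DataFrame(outputs)
        # The context passed to __init__ is available here through self.
        return pd.melt(
            wide_df,
            id_vars=self.id_vars,
            var_name=self.var_name,
            value_name=self.value_name,
        )


outputs: Dict[str, Any] = {
    "region": pd.Series(["east", "west"]),
    "spend": pd.Series([10.0, 20.0]),
    "signups": pd.Series([1.0, 3.0]),
}
print(StaticLongResult.build_result(**outputs).shape)
print(InstanceLongResult(id_vars=["region"]).build_result(**outputs))
```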

Here's some possible code; however, its use is limited to non-distributed (non-cluster) computation settings.

import typing
from typing import Any

import pandas as pd
from hamilton.base import SimplePythonDataFrameGraphAdapter


class SimplePythonLongFormatDataFrameGraphAdapter(SimplePythonDataFrameGraphAdapter):
    """Adapter for building a long format pandas dataframe from the result.

    There are two pandas methods that could be used:
     - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
    or
     - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html

    The user must tell this object which one to use, and provide the correct arguments.
    """

    def __init__(self, method_name: str, **method_kwargs: Any):
        """

        :param method_name: the name of the pandas function to use for going from wide to long format.
            Currently "melt" or "wide_to_long".
        :param method_kwargs: the arguments, other than the dataframe, to provide for that specific method.
            See:
             - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
             - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html
            for information on what arguments to pass in.
        """
        super().__init__()  # let the parent adapter set up any state it needs.
        if method_name not in ['melt', 'wide_to_long']:
            raise ValueError(
                f"Error, unknown method {method_name!r} provided. It should be one of ['melt', 'wide_to_long']"
            )
        self.method_name = method_name
        self.method_kwargs = method_kwargs

    def build_result(self, **outputs: typing.Dict[str, typing.Any]) -> typing.Any:
        """Builds the wide dataframe via the parent adapter, then converts it to long format."""
        # The parent's build_result() assembles the wide pandas DataFrame from the outputs.
        wide_df = super().build_result(**outputs)
        # Look up pd.melt or pd.wide_to_long by name and apply the user's arguments.
        pandas_method = getattr(pd, self.method_name)
        long_df = pandas_method(wide_df, **self.method_kwargs)
        del wide_df  # clean this representation up.
        return long_df
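
The dispatch inside `build_result()` can also be exercised without Hamilton. The sketch below uses a hypothetical `to_long()` helper that performs the same `getattr(pd, method_name)` lookup, to show what `method_kwargs` might look like for each supported method (the `store`/`sales_*` columns are illustrative):

```python
from typing import Any

import pandas as pd


def to_long(wide_df: pd.DataFrame, method_name: str, **method_kwargs: Any) -> pd.DataFrame:
    """Hypothetical helper mirroring the adapter's dispatch logic, minus Hamilton."""
    if method_name not in ("melt", "wide_to_long"):
        raise ValueError(f"Unknown method {method_name!r}; expected 'melt' or 'wide_to_long'.")
    # Same lookup as the adapter: pd.melt or pd.wide_to_long, frame passed first.
    return getattr(pd, method_name)(wide_df, **method_kwargs)


wide_df = pd.DataFrame(
    {
        "store": ["a", "b"],
        "sales_2021": [1.0, 2.0],
        "sales_2022": [3.0, 4.0],
    }
)

# melt: every non-id column becomes a (variable, value) row.
melted = to_long(wide_df, "melt", id_vars=["store"])

# wide_to_long: columns sharing the "sales" stub are split on "_", and the
# numeric suffix becomes a new "year" index level alongside "store".
stubbed = to_long(wide_df, "wide_to_long", stubnames="sales", i="store", j="year", sep="_")
print(melted)
print(stubbed)
```

With `wide_to_long`, the caller has to supply more structure (a stub name, a separator, and the name of the new `j` index level), which is pandas-specific knowledge needed to configure the adapter.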

@HamiltonRepoMigrationBot
Collaborator Author

Comment by elijahbenizzy
Saturday Oct 29, 2022 at 17:12 GMT


@skrawcz I'm not sure I like the abstraction above; it's way too coupled to pandas specifics/APIs. Rather, we should come up with a simple API (or several) that expresses exactly what we want. melt has a massive amount of complex code; I'm pretty sure wide_to_long just calls it and is more user-friendly. And we should be able to use similar parameters.
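
One possible reading of this suggestion, as a sketch only: describe the collapse with a small, library-neutral spec and keep the pandas call behind it. `LongFormat` and `build_long_result()` are hypothetical names; the fields deliberately mirror a subset of `melt()`'s parameters, in line with the idea of reusing similar parameters.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

import pandas as pd


@dataclass(frozen=True)
class LongFormat:
    """Hypothetical, pandas-agnostic description of how to collapse outputs."""

    id_columns: List[str]            # outputs that identify a row and stay as columns
    variable_name: str = "variable"  # column that will hold the former column names
    value_name: str = "value"        # column that will hold the values


def build_long_result(spec: LongFormat, outputs: Dict[str, Any]) -> pd.DataFrame:
    """Builds a wide frame from the outputs, then collapses it according to spec.

    pandas is used only here; another backend could interpret the same spec.
    """
    wide_df = pd.DataFrame(outputs)
    return wide_df.melt(
        id_vars=list(spec.id_columns),
        var_name=spec.variable_name,
        value_name=spec.value_name,
    )


spec = LongFormat(id_columns=["date"], variable_name="metric")
result = build_long_result(spec, {"date": ["2022-10-01"], "spend": [5.0], "signups": [2.0]})
print(result)
```

Because the spec holds only column names, it does not expose `melt()` or `wide_to_long()` directly to users, at the cost of supporting fewer of their options.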
