Add pandas result builder that converts to long format #26
Comments
Comment by skrawcz
So this doesn't appear to be as simple as I thought it would be. The issue with going from wide to long is that you need some context to know how to collapse things. To pass that context in, you cannot use a static method, since a static method cannot reference instance state.
Here's some possible code. However, it's limited to non-distributed (non-cluster) computation settings.

    import typing
    from typing import Any

    import pandas as pd
    from hamilton.base import SimplePythonDataFrameGraphAdapter


    class SimplePythonLongFormatDataFrameGraphAdapter(SimplePythonDataFrameGraphAdapter):
        """Adapter for building a long format pandas dataframe from the result.

        There are two pandas methods that could be used:
         - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
        or
         - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html
        The user must tell this object which one to use, and provide the correct arguments.
        """

        def __init__(self, method_name: str, **method_kwargs: Any):
            """
            :param method_name: the name of the pandas function to use for going from wide to long format.
                Currently "melt" and "wide_to_long".
            :param method_kwargs: the arguments, other than the dataframe, to provide for that specific method.
                For information on what arguments to pass in, see:
                 - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
                 - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html
            """
            if method_name not in ['melt', 'wide_to_long']:
                raise ValueError(f"Error, unknown {method_name} provided. It should be one of ['melt', 'wide_to_long']")
            self.method_name = method_name
            self.method_kwargs = method_kwargs

        def build_result(self, **outputs: typing.Any) -> typing.Any:
            """Builds the wide dataframe, then converts it to long format."""
            # Let the parent class assemble the usual wide dataframe first.
            wide_df = super().build_result(**outputs)
            # Look up pd.melt or pd.wide_to_long by name and apply it.
            pandas_method = getattr(pd, self.method_name)
            long_df = pandas_method(wide_df, **self.method_kwargs)
            del wide_df  # clean this representation up.
            return long_df
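For reference, here is a minimal sketch of the same name-based dispatch outside of Hamilton, applied with `wide_to_long()`. The function name `to_long`, the column names, and the sample values are all made up for illustration; only `pd.melt` and `pd.wide_to_long` are real pandas functions.

```python
from typing import Any

import pandas as pd

SUPPORTED_METHODS = ("melt", "wide_to_long")


def to_long(wide_df: pd.DataFrame, method_name: str, **method_kwargs: Any) -> pd.DataFrame:
    """Hypothetical helper: apply pd.melt or pd.wide_to_long, chosen by name."""
    if method_name not in SUPPORTED_METHODS:
        raise ValueError(
            f"Unknown method {method_name!r}. It should be one of {SUPPORTED_METHODS}"
        )
    pandas_method = getattr(pd, method_name)
    return pandas_method(wide_df, **method_kwargs)


# Illustrative wide dataframe: one "sales_<year>" column per year.
wide_df = pd.DataFrame(
    {
        "id": [1, 2],
        "sales_2021": [100, 150],
        "sales_2022": [110, 160],
    }
)

# Collapse the "sales_<year>" columns into a single "sales" column,
# indexed by (id, year).
long_df = to_long(
    wide_df,
    "wide_to_long",
    stubnames="sales",
    i="id",
    j="year",
    sep="_",
)
print(long_df)
```

The keyword arguments are the "context" mentioned above: without `stubnames`, `i`, and `j`, there is no way to know how the columns should be collapsed.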
Comment by elijahbenizzy
@skrawcz I'm not sure I like the abstraction above. It's way too coupled to pandas specifics/APIs. Rather, we should come up with a pretty simple API (or multiple) that expresses what, exactly, we want.
Issue by skrawcz
Monday Apr 25, 2022 at 21:53 GMT
Originally opened as stitchfix/hamilton#121
Is your feature request related to a problem? Please describe.
Hamilton works on "wide" columns, not "long" ones. However, the "tidy" data ethos holds that data should be in a long format, which does make some things easier to do.
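To make the wide/long distinction concrete, here is a small sketch using `pd.melt`. The column names and values are invented for illustration and do not come from any Hamilton example.

```python
import pandas as pd

# Wide format: one column per computed output, one row per date.
wide_df = pd.DataFrame(
    {
        "date": ["2022-04-01", "2022-04-02"],
        "spend": [10.0, 20.0],
        "signups": [1, 4],
    }
)

# Long ("tidy") format: one row per (date, variable) pair.
long_df = pd.melt(
    wide_df,
    id_vars=["date"],
    var_name="variable",
    value_name="value",
)
print(long_df)
```

The two output columns `spend` and `signups` become values of a single `variable` column, so the long frame has four rows where the wide one had two.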
Describe the solution you'd like
Add a ResultBuilder variant that takes in how you'd want to collapse the resulting pandas dataframe.
Describe alternatives you've considered
People do this manually, but perhaps doing it in the result builder makes more sense.
Additional context
Prerequisites for someone picking this up: