How to use #3

Closed
ericmjl opened this issue Jul 16, 2018 · 4 comments

@ericmjl
Member

ericmjl commented Jul 16, 2018

Hey @Zsailer, great to meet you at SciPy 2018!

I think pandas_flavor is what I'd like to switch over to in pyjanitor, where I simply register functions as a pandas accessor rather than subclass the entire dataframe outright.

There is something a bit magical about how pandas_flavor works though. With subclassing, everything is quite transparent - I subclass pandas DataFrames, then have the users wrap their existing dataframe inside a Janitor dataframe, following which, all of the data cleaning methods are available:

import pandas as pd
import janitor as jn

df = pd.DataFrame(...)
df = jn.DataFrame(df).clean_names()...

Say I decorated the Janitor functions as pandas accessors. What would things look like for an end-user? Would it be like the following?

import pandas as pd

df = pd.DataFrame(...).clean_names().remove_empty()...

I guess I'm just wondering, where and when does a decorated function get exposed up to pandas?

Thanks again for putting this out!

@Zsailer
Collaborator

Zsailer commented Jul 18, 2018

Hi @ericmjl, great to meet you too!

There are two ways you could expose pyjanitor methods to users:

1. Add an accessor with methods underneath

The recommended way is to add them underneath an accessor object. This would look like:

import pandas as pd
import janitor

df = pd.DataFrame(...)
df = df.janitor.clean_names()
df = df.janitor.remove_empty()

When you import janitor, it registers/attaches the .janitor accessor to the pandas DataFrame. All the janitor methods live underneath this accessor. This keeps the janitor methods self-contained. It also means that every DataFrame in the namespace will have the janitor accessor.

To add an accessor and methods:

import pandas_flavor

@pandas_flavor.register_dataframe_accessor('janitor')
class JanitorAccessor(object):

    def __init__(self, df):
        self.df = df

    def clean_names(self):
        ...

2. Add methods directly to the DataFrame

Your second option is to add methods directly to the DataFrame. This would allow you to chain commands like in your example above. The methods are attached to the DataFrame class itself when janitor is imported, so they are available on every DataFrame.

This would look like:

import pandas as pd
import janitor

df = pd.DataFrame(...).clean_names().remove_empty()

To add methods, simply write them as functions and register them with the DataFrame.

import pandas_flavor

@pandas_flavor.register_dataframe_method
def clean_names(df):
    ...
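As a rough illustration of where and when a decorated function becomes visible, the sketch below attaches functions to a class at decoration time. It assumes nothing about pandas_flavor's internals: `Frame` is a hypothetical stand-in for `pandas.DataFrame`, `register_method` is an illustrative decorator, and `drop_unnamed` is a made-up cleaning step.

```python
class Frame:
    """Hypothetical stand-in for pandas.DataFrame; holds column names only."""

    def __init__(self, columns):
        self.columns = list(columns)


def register_method(func):
    """Toy analogue of register_dataframe_method (an assumption, not library code)."""
    # Attach the function to the class; instances look it up at call time.
    setattr(Frame, func.__name__, func)
    # Return the function so the decorated name stays usable, e.g. in tests.
    return func


# Created before any method has been registered.
df = Frame(["First Name", "Unnamed"])


@register_method
def clean_names(df):
    return Frame(c.lower().replace(" ", "_") for c in df.columns)


@register_method
def drop_unnamed(df):
    return Frame(c for c in df.columns if c != "unnamed")


# The earlier instance sees both methods, and they chain.
result = df.clean_names().drop_unnamed()
print(result.columns)
```

In this sketch the registration happens as soon as the decorated definitions run, which is why a bare `import janitor` would be enough on the user's side.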

Does this help answer your question?

@ericmjl
Member Author

ericmjl commented Jul 18, 2018

The part that I was missing was that I just had to import janitor, and do nothing with it afterwards 😄. Thanks for clarifying!

One thing that does happen with Pyjanitor though, is that upon decoration, my functions (which all return a dataframe) now return None, which makes them untestable. I think I know what's going on (there is no return statement when registering a function); is this hypothesis correct? If so, would it make sense to put in a PR to return the original function as well, or will this break the functionality of the pandas_flavor?

@Zsailer
Collaborator

Zsailer commented Jul 18, 2018

Ah, you're totally right! There should be return statements inside the inner function of the register_dataframe_method and register_series_method decorators. This won't break functionality and should allow you to run tests.

We need to add a return statement after these lines:
https://github.com/Zsailer/pandas_flavor/blob/bb892346dbe42c04725f0182c79e401496211bda/pandas_flavor/register.py#L31-L32

and

https://github.com/Zsailer/pandas_flavor/blob/bb892346dbe42c04725f0182c79e401496211bda/pandas_flavor/register.py#L51-L52
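For anyone following along, the effect of the missing return can be reproduced with a generic decorator. This is a sketch under the assumption that the decorator does its work in an inner function and returns that function's result; `registry`, `register_broken`, and `register_fixed` are hypothetical names, not the code in register.py.

```python
# Hypothetical registry standing in for "methods attached to the DataFrame".
registry = {}


def register_broken(method):
    def inner():
        registry[method.__name__] = method
        # No return statement here, so inner() evaluates to None.

    return inner()


def register_fixed(method):
    def inner():
        registry[method.__name__] = method
        return method  # hand the original function back to the caller

    return inner()


@register_broken
def broken(df):
    return df


@register_fixed
def fixed(df):
    return df


# The decorated name is rebound to whatever the decorator returned.
print(broken is None)
print(fixed is registry["fixed"])
```

Both functions are still registered, but only `fixed` remains importable and callable under its own name, which is what a test suite needs.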

If you'd like to put in a PR, that would be great! Otherwise, I can do it later today.

Thanks!

@ericmjl
Member Author

ericmjl commented Jul 18, 2018

I'm on it!
