Drop the usage of meta #72

Closed
ManuelAlvarezC opened this issue Nov 19, 2018 · 1 comment · Fixed by #101

@ManuelAlvarezC
Contributor

RDT should neither load data from disk nor need a metadata.json file to operate. Its behavior should be the following:

  • Both the HyperTransformer and the individual Transformers should receive as input data that is already in the required format. That means that the HyperTransformer should not load data from disk, and the individual transformers should not prepare it.

  • The decision of whether to fill missing values should be taken out of the individual transformers and handled by the HyperTransformer, as specified in this issue.
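A minimal sketch of the second point, under a loudly stated assumption: the issue does not say how missing values are filled, so the hypothetical helper fill_missing_values below simply uses the column mean for numeric columns and the most frequent value otherwise. It works on a DataFrame the caller has already built in memory, so the individual transformers would only ever see null-free columns.

```python
import numpy as np
import pandas as pd


def fill_missing_values(data):
    """Hypothetical HyperTransformer-level helper that fills nulls per column.

    Assumed strategy (not specified in this issue): numeric columns get their
    mean, any other column gets its most frequent value.
    """
    filled = data.copy()  # never modify the caller's DataFrame
    for column in filled.columns:
        series = filled[column]
        if not series.isnull().any():
            continue
        if pd.api.types.is_numeric_dtype(series):
            filled[column] = series.fillna(series.mean())
        else:
            mode = series.mode()
            if not mode.empty:  # an all-null column has no mode; leave it as-is
                filled[column] = series.fillna(mode.iloc[0])
    return filled


# The DataFrame is built in memory by the caller: nothing is read from disk
# and no metadata.json file is involved.
data = pd.DataFrame({"age": [20.0, np.nan, 40.0], "color": ["red", None, "red"]})
clean = fill_missing_values(data)
```

Each individual transformer would then receive columns such as clean["age"] that no longer contain nulls, keeping the null-handling policy in one place.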

ManuelAlvarezC added the "internal" label (The issue doesn't change the API or functionality) on Nov 19, 2018
ManuelAlvarezC added this to the 0.2.0 milestone on Nov 19, 2018
ManuelAlvarezC self-assigned this on Nov 19, 2018
ManuelAlvarezC modified the milestone: 0.2.0 on Jan 6, 2019
@csala
Contributor

csala commented Sep 24, 2019

The new behavior will be as follows:

Transformers

The Transformers' __init__ method will expect either no arguments at all (the most usual case) or only the arguments needed to know how to handle a particular column.
The Transformers' fit, transform and fit_transform methods will expect a single argument, called data, which will be either a pandas Series or a numpy array. If a Series is given, it will be converted into a numpy array before anything else is done. The output will be a numpy array in all cases.
The Transformers' reverse_transform method will expect a single argument, called data, which will be a numpy array.

Transformers will also be agnostic to the column name within the original DataFrame.
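The transformer contract described above can be sketched as follows. CategoricalCodesTransformer is a hypothetical class written for illustration, not an actual RDT transformer: it takes no __init__ arguments, accepts either a Series or a numpy array, converts a Series into a numpy array first, always returns numpy arrays, and never sees the name of the column it handles.

```python
import numpy as np
import pandas as pd


class CategoricalCodesTransformer:
    """Hypothetical transformer that maps each category to an integer code.

    It only ever receives a Series or a numpy array, so it knows nothing
    about the name of the column it handles.
    """

    def __init__(self):
        # No arguments needed: the most usual case.
        self.categories = None
        self._codes = None

    @staticmethod
    def _to_array(data):
        # A Series is converted into a numpy array before anything else.
        if isinstance(data, pd.Series):
            return data.to_numpy()
        return np.asarray(data)

    def fit(self, data):
        data = self._to_array(data)
        # The sorted unique values define the code of each category.
        self.categories = np.unique(data)
        self._codes = {value: code for code, value in enumerate(self.categories)}

    def transform(self, data):
        data = self._to_array(data)
        # Raises KeyError for values that were not seen during fit.
        return np.array([self._codes[value] for value in data], dtype=int)

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)

    def reverse_transform(self, data):
        # data is a numpy array of integer codes.
        return self.categories[np.asarray(data, dtype=int)]


transformer = CategoricalCodesTransformer()
codes = transformer.fit_transform(pd.Series(["b", "a", "b", "c"]))
restored = transformer.reverse_transform(codes)
print(codes)  # [1 0 1 2]
```

Because the transformer receives bare arrays, the same instance could be reused for any column; mapping columns to transformers is left entirely to the HyperTransformer.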

HyperTransformer

The __init__ method will expect an argument called transformers: a dictionary that maps each column name to a single transformer class name and its kwargs. It will also accept an optional boolean copy argument, defaulting to True, which indicates whether to make a copy of the input DataFrame before processing it.
The fit, fit_transform, transform and reverse_transform methods will expect a single argument, called data, which will be a DataFrame containing at least the columns listed in the transformers argument of the __init__ method. The output of the transform and fit_transform methods will be a DataFrame like the one passed as input, with those columns replaced by their transformed counterparts.

The overall usage will look like this:

>>> transformers = {
...     "<column_name>": {
...         "class": "CategoricalTransformer",
...         "kwargs": {
...             "anonymize": True,
...         }
...     }
... }
>>> hyper_transformer = HyperTransformer(transformers, copy=True)
>>> transformed = hyper_transformer.fit_transform(data)
>>> restored = hyper_transformer.reverse_transform(transformed)
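To make the proposed interface concrete, here is a minimal sketch of a HyperTransformer that would support the usage above. It is not RDT's implementation: TRANSFORMER_CLASSES is a hypothetical registry that resolves the "class" names in the transformers dictionary, and a hypothetical stand-in transformer replaces CategoricalTransformer (its anonymize option is not reproduced).

```python
import numpy as np
import pandas as pd


class CategoricalCodesTransformer:
    """Hypothetical stand-in transformer: categories <-> integer codes."""

    def __init__(self):
        self.categories = None
        self._codes = None

    def fit(self, data):
        self.categories = np.unique(np.asarray(data))
        self._codes = {value: code for code, value in enumerate(self.categories)}

    def transform(self, data):
        return np.array([self._codes[value] for value in np.asarray(data)], dtype=int)

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)

    def reverse_transform(self, data):
        return self.categories[np.asarray(data, dtype=int)]


# Hypothetical registry that resolves the "class" names used in the transformers dict.
TRANSFORMER_CLASSES = {"CategoricalCodesTransformer": CategoricalCodesTransformer}


class HyperTransformer:
    """Sketch of the proposed interface; not RDT's actual implementation."""

    def __init__(self, transformers, copy=True):
        self.transformers = transformers
        self.copy = copy
        self._fitted = {}

    def _prepare(self, data):
        # Work on a copy of the input DataFrame unless copy=False.
        return data.copy() if self.copy else data

    def fit(self, data):
        for column, spec in self.transformers.items():
            transformer_class = TRANSFORMER_CLASSES[spec["class"]]
            transformer = transformer_class(**spec.get("kwargs", {}))
            # Each transformer only sees the column values, never its name.
            transformer.fit(data[column].to_numpy())
            self._fitted[column] = transformer
        return self

    def transform(self, data):
        data = self._prepare(data)
        for column, transformer in self._fitted.items():
            data[column] = transformer.transform(data[column].to_numpy())
        return data

    def fit_transform(self, data):
        return self.fit(data).transform(data)

    def reverse_transform(self, data):
        data = self._prepare(data)
        for column, transformer in self._fitted.items():
            data[column] = transformer.reverse_transform(data[column].to_numpy())
        return data


data = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})
transformers = {"color": {"class": "CategoricalCodesTransformer", "kwargs": {}}}
hyper_transformer = HyperTransformer(transformers, copy=True)
transformed = hyper_transformer.fit_transform(data)
restored = hyper_transformer.reverse_transform(transformed)
```

With copy=True the caller's DataFrame is left untouched; with copy=False the listed columns are replaced in place. Columns not listed in transformers (size here) pass through unchanged.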
