Drop the usage of meta #72

ManuelAlvarezC · 2018-11-19T10:36:35Z

RDT shouldn't load data from disk, nor should it need a metadata.json file to operate. Its behavior should be the following:

Both HyperTransformer and the individual Transformers should receive input data that is already in the required format. That means HyperTransformer shouldn't load data from disk, and the individual transformers shouldn't prepare it.
Deciding whether to fill missing values should be taken out of the individual transformers and handled by the HyperTransformer, as specified in this issue.

csala · 2019-09-24T15:18:37Z

The new behavior will be as follows:

Transformers

The Transformers' __init__ method will expect either no arguments at all (the most usual case) or just the arguments needed to know how to handle a particular column.
The Transformers' fit, transform and fit_transform methods will expect a single argument called data, which will be either a pandas Series or a numpy array. If a Series is given, it will be converted into a numpy array before doing anything else. The output will be a numpy array in all cases.
Transformers reverse_transform will expect a single argument called data which will be a numpy array.

Transformers will also be agnostic to the column name within the original DataFrame.
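
As a rough illustration of the Transformer contract described above, here is a minimal sketch. The class name MinMaxSketchTransformer and its min-max scaling logic are hypothetical and are not part of RDT; only the method names and the Series-to-array conversion come from the description.

```python
import numpy as np
import pandas as pd


class MinMaxSketchTransformer:
    """Hypothetical transformer following the proposed contract (not an RDT class)."""

    def __init__(self):
        # No arguments needed: the most usual case described above.
        self.min_ = None
        self.max_ = None

    @staticmethod
    def _to_array(data):
        # A Series is converted into a numpy array before doing anything else,
        # so the column name in the original DataFrame is never used.
        if isinstance(data, pd.Series):
            data = data.to_numpy()
        return np.asarray(data, dtype=float)

    def fit(self, data):
        data = self._to_array(data)
        self.min_ = data.min()
        self.max_ = data.max()

    def transform(self, data):
        data = self._to_array(data)
        span = self.max_ - self.min_
        if span == 0:
            # Constant column: map everything to zero.
            return np.zeros_like(data)
        return (data - self.min_) / span

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)

    def reverse_transform(self, data):
        # Expects a numpy array, as stated in the proposal.
        return data * (self.max_ - self.min_) + self.min_
```

Because the Series name is discarded on entry, the same instance behaves identically whatever the column was called in the source DataFrame.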

HyperTransformer

The __init__ method will expect an input, called transformers, which will be a dictionary mapping each column name to a single transformer class name and its kwargs. It will also accept an optional boolean copy argument, defaulting to True, which indicates whether to make a copy of the input DataFrame before processing it.
The fit, fit_transform, transform and reverse_transform methods will expect a single argument, called data, which will be a DataFrame containing at least the columns indicated in the transformers input of the __init__ method. The output from the transform and fit_transform methods will be a DataFrame like the one passed as input, with the indicated columns replaced by their transformed counterparts.

The overall usage will look like this:

>>> transformers = {
...     "<column_name>": {
...         "class": "CategoricalTransformer",
...         "kwargs": {
...             "anonymize": True,
...         }
...     }
... }
>>> hyper_transformer = HyperTransformer(transformers, copy=True)
>>> transformed = hyper_transformer.fit_transform(data)
>>> restored = hyper_transformer.reverse_transform(transformed)
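
To make the proposed HyperTransformer behavior more concrete, the following is a minimal sketch under stated assumptions, not the actual RDT implementation. HyperTransformerSketch, ScaleSketch and the REGISTRY lookup are hypothetical names introduced only for illustration; how RDT resolves a class name to a class is not specified in this issue.

```python
import numpy as np
import pandas as pd


class ScaleSketch:
    """Hypothetical stand-in transformer: divides values by a fixed factor."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, data):
        pass  # nothing to learn for this toy transformer

    def transform(self, data):
        return np.asarray(data, dtype=float) / self.factor

    def reverse_transform(self, data):
        return np.asarray(data, dtype=float) * self.factor


# Assumption: class names are resolved through a simple dictionary.
REGISTRY = {"ScaleSketch": ScaleSketch}


class HyperTransformerSketch:
    """Hypothetical sketch of the proposed interface (not the RDT class)."""

    def __init__(self, transformers, copy=True):
        self.copy = copy
        # Build one transformer instance per configured column.
        self._transformers = {
            column: REGISTRY[spec["class"]](**spec.get("kwargs", {}))
            for column, spec in transformers.items()
        }

    def _prepare(self, data):
        # Work on a copy unless the caller opted out with copy=False.
        return data.copy() if self.copy else data

    def fit(self, data):
        for column, transformer in self._transformers.items():
            transformer.fit(data[column])

    def transform(self, data):
        data = self._prepare(data)
        for column, transformer in self._transformers.items():
            data[column] = transformer.transform(data[column])
        return data

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)

    def reverse_transform(self, data):
        data = self._prepare(data)
        for column, transformer in self._transformers.items():
            data[column] = transformer.reverse_transform(data[column].to_numpy())
        return data
```

Columns that are not listed in the transformers dictionary pass through untouched, and with copy=True the caller's DataFrame is left unmodified.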

ManuelAlvarezC added the internal The issue doesn't change the API or functionality label Nov 19, 2018

ManuelAlvarezC added this to the 0.2.0 milestone Nov 19, 2018

ManuelAlvarezC self-assigned this Nov 19, 2018

ManuelAlvarezC modified the milestone: 0.2.0 Jan 6, 2019

csala unassigned ManuelAlvarezC Sep 12, 2019

csala assigned JDTheRipperPC Sep 24, 2019

This was referenced Sep 24, 2019

Refactor unittests #78

Closed

Create an Identity Transformer #88

Closed

Fix HyperTransformer.reverse_transform column lookup behavior #91

Closed

Bad usage of 'missing' attribute, HyperTransformer class #97

Closed

csala added enhancement and removed internal The issue doesn't change the API or functionality labels Sep 24, 2019

JDTheRipperPC mentioned this issue Sep 30, 2019

compatibility with rdt issue 72 sdv-dev/SDV#120

Closed

JDTheRipperPC mentioned this issue Oct 15, 2019

Issue 72 drop usage of meta #101

Merged

csala closed this as completed in #101 Oct 15, 2019

csala mentioned this issue Oct 15, 2019

Split HyperTransfomer in two #59

Closed
