Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix cat encode #50

Merged
merged 3 commits into from
Sep 7, 2022
Merged

fix cat encode #50

merged 3 commits into from
Sep 7, 2022

Conversation

eromoe
Copy link
Contributor

@eromoe eromoe commented Sep 7, 2022

No description provided.

@@ -56,7 +56,6 @@ def __init__(self):
def fit(self,testx, y=None):
### Do not change this since Rare class combiner requires this test ##
if isinstance(testx, tuple):
y = testx[1]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need in fit .

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have made a mistake in below code:

test_result = MLB.fit_transform(train[everycol])

Instead it should be:

test_result = MLB.transform(test[everycol])

Do you see the difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right , my mistake , now fixed

@eromoe
Copy link
Contributor Author

eromoe commented Sep 7, 2022

@AutoViML
I haven't look into why rare class need label encoder return tuple , could you mind to make an explanation ?

If , tuple return is only used for something relate to rare class . Don't you think two label_encoder are more correct ?
Such as

  1. One only fit one string array and return one transformed array
  2. The other fit (string array, y ) and return (transformed array, y )

@eromoe
Copy link
Contributor Author

eromoe commented Sep 7, 2022

@AutoViML
I think you are mainly working on kaggle competition ?

Let's think about OOT (out of time) Testing , we usually have 3 datasets ,

  • train ( data from 2022.09.01 ~2022.09.05)
  • test ( data from 2022.09.06)
  • production ( data on today 2022.09.07 )

Transformer only fit_transform by train , then apply to test and do validation .

Acually there is no need to allow input both train and test in one funtion .
just one input is enough

def FE_convert_all_object_columns_to_numeric(train, features=[])

Just call same funtion again if new data comming .Or do you have another consideration ?

@AutoViML
Copy link
Owner

AutoViML commented Sep 7, 2022

At the moment let's close this thread with this change. If you feel another change is needed elsewhere, you can open another pull request. Okay?

@AutoViML AutoViML closed this Sep 7, 2022
@AutoViML AutoViML reopened this Sep 7, 2022
@AutoViML AutoViML merged commit b30b73d into AutoViML:main Sep 7, 2022
@eromoe
Copy link
Contributor Author

eromoe commented Sep 7, 2022

OK, I won't make pr in recently I think , just gona make some discussion :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants