
should include 5 different experiments for each learner #8

Open
4 of 5 tasks
WeiFoo opened this issue Nov 28, 2015 · 7 comments


WeiFoo commented Nov 28, 2015

  • learner_naive
  • learner_smote
  • learner_tuned
  • learner_tunedSmote
  • learner_tunedBoth
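
A minimal sketch of how these five arms might be wired up. This is an assumption-laden illustration, not the repo's actual code: it assumes scikit-learn's RandomForestClassifier as the learner, imbalanced-learn's SMOTE, and hypothetical tune()/tune_both() helpers for the parameter search.

```python
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

def learner_naive(X_tr, y_tr):
    # off-the-shelf learner: no preprocessing, no tuning
    return RandomForestClassifier().fit(X_tr, y_tr)

def learner_smote(X_tr, y_tr):
    # default SMOTE, then the default learner
    X_res, y_res = SMOTE().fit_resample(X_tr, y_tr)
    return RandomForestClassifier().fit(X_res, y_res)

def learner_tuned(X_tr, y_tr, X_tu, y_tu):
    # tune() is a hypothetical search over learner parameters, scored on (X_tu, y_tu)
    params = tune(RandomForestClassifier, X_tr, y_tr, X_tu, y_tu)
    return RandomForestClassifier(**params).fit(X_tr, y_tr)

def learner_tunedSmote(X_tr, y_tr, X_tu, y_tu):
    # tune SMOTE's parameters, keep the learner at its defaults
    smote_params = tune(SMOTE, X_tr, y_tr, X_tu, y_tu)
    X_res, y_res = SMOTE(**smote_params).fit_resample(X_tr, y_tr)
    return RandomForestClassifier().fit(X_res, y_res)

def learner_tunedBoth(X_tr, y_tr, X_tu, y_tu):
    # tune_both() is a hypothetical joint search over SMOTE and learner parameters
    smote_params, rf_params = tune_both(X_tr, y_tr, X_tu, y_tu)
    X_res, y_res = SMOTE(**smote_params).fit_resample(X_tr, y_tr)
    return RandomForestClassifier(**rf_params).fit(X_res, y_res)
```
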
WeiFoo added the ToDo label Nov 28, 2015

WeiFoo commented Nov 29, 2015

SMOTE baseline vs. Tune SMOTE

[image: img_2597 2]
https://cloud.githubusercontent.com/assets/7039841/11459690/64d22b66-96aa-11e5-96c2-e471b5732ca3.JPG

vivekaxl commented Nov 29, 2015

I think that in the Phase 2 testing, you should SMOTE both the training and the tuning data.



WeiFoo commented Nov 29, 2015

I can do that, but somehow it seems like overfitting: using the testing data as training data again... I don't know; I will try that once the whole framework works.


timm commented Nov 29, 2015

@WeiFoo I see you have recalled the ICSE'16 comments.

@vivekaxl if we take your advice re Phase 2, what data do we use to evaluate the different tunings?

vivekaxl commented Nov 29, 2015

@timm Phase 1 would be exactly as described by @WeiFoo. In Phase 2, he seems to be using only the training data, whereas in the baseline approach he is using both the training data and the tuning data (assuming that training_data + tuning_data = baseline_training_data). It seems that this gives the baseline approach an undue advantage of having more data to train on (and, if some classes are missing from the training_data in the SMOTEing phase, the tuned arm never sees them at all). A sketch of the two data budgets follows.
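
A minimal sketch of the data budgets under discussion. The names A, B, and baseline_training_data come from the thread; the stand-in arrays, the 70/30 split ratio, and scikit-learn's train_test_split are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(100, 5), np.random.randint(0, 2, 100)  # stand-in data

# baseline arm: trains on the whole pool
baseline_X, baseline_y = X, y  # baseline_training_data = training + tuning

# tuned arm: the same pool, split into new_training_data (A) and tuning_data (B)
A_X, B_X, A_y, B_y = train_test_split(X, y, test_size=0.3, random_state=1)

# @WeiFoo's Phase 2 trains only on A, so the baseline sees more data.
# @vivekaxl's suggestion is to SMOTE and train on A + B in Phase 2,
# so both arms see the same pool.
phase2_X = np.vstack([A_X, B_X])
phase2_y = np.concatenate([A_y, B_y])
```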


WeiFoo commented Nov 29, 2015

training_data + tuning_data = baseline_training_data: this is right!

For the baseline experiment, the baseline training data is used for training.
For the tuning experiment, the same amount of data is used, but split into new_training_data (A) and tuning_data (B). Using A and B, a set of tuned parameters for SMOTE is obtained.

@vivekaxl You suggest using A+B, the same baseline_training_data, for Phase 2. My concern is that the parameters obtained from tuning are not based on B being part of the training data; during tuning, B was actually the tuning test data. If B is included in Phase 2, will the parameters for SMOTE still work well? It seems unfair to SMOTE, because you are giving it extra data that was never used as training data. How could we expect the tuned SMOTE to work well?

My point is that for the baseline and tuning experiments, I used the same amount of data to build the learner before prediction, but in different ways (see the sketch below):

  • the baseline simply uses all of it as training data
  • tuning uses some of it as tuning/validation data and the rest as training data, and only that training data should be used for prediction

Yes, here we're trying to strike a balance and not give an advantage to either side.
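
A sketch of the protocol @WeiFoo describes: tune SMOTE's parameters on the A/B split, then build the final learner from A only, so B is never reused as training data. imbalanced-learn's SMOTE, scikit-learn's RandomForestClassifier, the F1 objective, and the candidate grid for k_neighbors are all assumptions for illustration.

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def tune_smote(A_X, A_y, B_X, B_y, candidates=(1, 3, 5)):
    # score each candidate by training on SMOTEd A and testing on B,
    # so B acts only as tuning test data
    best_k, best_score = None, -1.0
    for k in candidates:
        X_res, y_res = SMOTE(k_neighbors=k).fit_resample(A_X, A_y)
        model = RandomForestClassifier().fit(X_res, y_res)
        score = f1_score(B_y, model.predict(B_X))
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def final_learner(A_X, A_y, best_k):
    # Phase 2: only A is SMOTEd and used for training; B stays held out,
    # which is @WeiFoo's point above
    X_res, y_res = SMOTE(k_neighbors=best_k).fit_resample(A_X, A_y)
    return RandomForestClassifier().fit(X_res, y_res)
```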


timm commented Nov 29, 2015

go. do.
