
should include 5 different experiments for each learner #8

Open
4 of 5 tasks
WeiFoo opened this issue Nov 28, 2015 · 7 comments


WeiFoo commented Nov 28, 2015

  • learner_naive
  • learner_smote
  • learner_tuned
  • learner_tunedSmote
  • learner_tunedBoth
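
A minimal sketch of how these five arms might be wired up. This is an assumption-laden illustration, not the repo's actual code: it assumes scikit-learn's RandomForestClassifier as the learner, imbalanced-learn's SMOTE, and hypothetical tune()/tune_both() helpers for the parameter search.

```python
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

def learner_naive(X_tr, y_tr):
    # off-the-shelf learner: no preprocessing, no tuning
    return RandomForestClassifier().fit(X_tr, y_tr)

def learner_smote(X_tr, y_tr):
    # default SMOTE, then the default learner
    X_res, y_res = SMOTE().fit_resample(X_tr, y_tr)
    return RandomForestClassifier().fit(X_res, y_res)

def learner_tuned(X_tr, y_tr, X_tu, y_tu):
    # tune() is a hypothetical search over learner parameters, scored on (X_tu, y_tu)
    params = tune(RandomForestClassifier, X_tr, y_tr, X_tu, y_tu)
    return RandomForestClassifier(**params).fit(X_tr, y_tr)

def learner_tunedSmote(X_tr, y_tr, X_tu, y_tu):
    # tune SMOTE's parameters, keep the learner at its defaults
    smote_params = tune(SMOTE, X_tr, y_tr, X_tu, y_tu)
    X_res, y_res = SMOTE(**smote_params).fit_resample(X_tr, y_tr)
    return RandomForestClassifier().fit(X_res, y_res)

def learner_tunedBoth(X_tr, y_tr, X_tu, y_tu):
    # tune_both() is a hypothetical joint search over SMOTE and learner parameters
    smote_params, rf_params = tune_both(X_tr, y_tr, X_tu, y_tu)
    X_res, y_res = SMOTE(**smote_params).fit_resample(X_tr, y_tr)
    return RandomForestClassifier(**rf_params).fit(X_res, y_res)
```
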
WeiFoo added the ToDo label Nov 28, 2015

WeiFoo commented Nov 29, 2015

SMOTE baseline vs. Tune SMOTE

[image: img_2597 2]
https://cloud.githubusercontent.com/assets/7039841/11459690/64d22b66-96aa-11e5-96c2-e471b5732ca3.JPG

vivekaxl commented Nov 29, 2015

I think that in the Phase 2 testing, you should SMOTE both the training and the tuning data.



WeiFoo commented Nov 29, 2015

I can do that, but somehow it seems like overfitting: using the testing data as training data again... I don't know; I will try that once the whole framework works.


timm commented Nov 29, 2015

@WeiFoo I see you have recalled the ICSE'16 comments.

@vivekaxl if we take your advice re Phase 2, what data do we use to evaluate the different tunings?

vivekaxl commented Nov 29, 2015

@timm Phase 1 would be exactly as described by @WeiFoo. In Phase 2, he seems to be using only the training data, whereas in the baseline approach he is using both the training data and the tuning data (assuming that training_data + tuning_data = baseline_training_data). It seems that this gives the baseline approach an undue advantage of having more data to train on (and, if some classes are missing from the training_data in the SMOTEing phase, the tuned arm never sees them at all). A sketch of the two data budgets follows.
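
A minimal sketch of the data budgets under discussion. The names A, B, and baseline_training_data come from the thread; the stand-in arrays, the 70/30 split ratio, and scikit-learn's train_test_split are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(100, 5), np.random.randint(0, 2, 100)  # stand-in data

# baseline arm: trains on the whole pool
baseline_X, baseline_y = X, y  # baseline_training_data = training + tuning

# tuned arm: the same pool, split into new_training_data (A) and tuning_data (B)
A_X, B_X, A_y, B_y = train_test_split(X, y, test_size=0.3, random_state=1)

# @WeiFoo's Phase 2 trains only on A, so the baseline sees more data.
# @vivekaxl's suggestion is to SMOTE and train on A + B in Phase 2,
# so both arms see the same pool.
phase2_X = np.vstack([A_X, B_X])
phase2_y = np.concatenate([A_y, B_y])
```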


WeiFoo commented Nov 29, 2015

training_data + tuning_data = baseline_training_data: this is right!

For the baseline experiment, the baseline training data is used for training.
For the tuning experiment, the same amount of data is used, but split into new_training_data (A) and tuning_data (B). Using A and B, a set of tuned parameters for SMOTE is obtained.

@vivekaxl You suggest using A+B, the same baseline_training_data, for Phase 2. My concern is that the parameters obtained from tuning are not based on B being part of the training data; during tuning, B was actually the tuning test data. If B is included in Phase 2, will the parameters for SMOTE still work well? It seems unfair to SMOTE, because you are giving it extra data that was never used as training data. How could we expect the tuned SMOTE to work well?

My point is that for the baseline and tuning experiments, I used the same amount of data to build the learner before prediction, but in different ways (see the sketch below):

  • the baseline simply uses all of it as training data
  • tuning uses some of it as tuning/validation data and the rest as training data, and only that training data should be used for prediction

Yes, here we're trying to strike a balance and not give an advantage to either side.
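
A sketch of the protocol @WeiFoo describes: tune SMOTE's parameters on the A/B split, then build the final learner from A only, so B is never reused as training data. imbalanced-learn's SMOTE, scikit-learn's RandomForestClassifier, the F1 objective, and the candidate grid for k_neighbors are all assumptions for illustration.

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def tune_smote(A_X, A_y, B_X, B_y, candidates=(1, 3, 5)):
    # score each candidate by training on SMOTEd A and testing on B,
    # so B acts only as tuning test data
    best_k, best_score = None, -1.0
    for k in candidates:
        X_res, y_res = SMOTE(k_neighbors=k).fit_resample(A_X, A_y)
        model = RandomForestClassifier().fit(X_res, y_res)
        score = f1_score(B_y, model.predict(B_X))
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def final_learner(A_X, A_y, best_k):
    # Phase 2: only A is SMOTEd and used for training; B stays held out,
    # which is @WeiFoo's point above
    X_res, y_res = SMOTE(k_neighbors=best_k).fit_resample(A_X, A_y)
    return RandomForestClassifier().fit(X_res, y_res)
```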


timm commented Nov 29, 2015

go. do.
