Classification using LDA #26

Open
amritbhanu opened this issue Jul 14, 2016 · 8 comments


amritbhanu commented Jul 14, 2016

Experiment Setup

  • Datasets: Manney Generator of Stack Exchange sites; 25 datasets.
  • Running the tuning experiment with a 5-term overlap; select the parameters with the maximum stability score.
  • Find clusters, and assign each topic a sequential label (1, 2, 3, ...).
  • Now each document will be labelled 1, 2, 3, ... rather than with tags.
  • Run SVM. Binary classification.

We have the baseline results for SVM without SMOTE and SVM with SMOTE.
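For the tuning step above (picking the parameters with the maximum stability score from a 5-term overlap), here is a minimal sketch of how such a score could be computed. This is only an illustration under assumptions, not the actual tuning code: it takes stability to be the median overlap of each topic's top-5 terms between two LDA runs with different seeds, and uses scikit-learn's LDA for brevity.

```python
# Illustrative sketch (assumptions, not the project's tuning code): score a
# parameter setting (k, alpha, beta) by how well the top-5 terms of its topics
# overlap across two independently seeded LDA runs.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def top_terms(lda, vocab, n=5):
    """Top-n terms for every topic of a fitted LDA model."""
    return [set(vocab[i] for i in comp.argsort()[-n:]) for comp in lda.components_]

def stability(docs, k, alpha, beta, n=5):
    """Median best-match overlap of top-n terms between two LDA runs."""
    vec = CountVectorizer(stop_words='english')
    X = vec.fit_transform(docs)
    vocab = vec.get_feature_names_out()
    runs = [LatentDirichletAllocation(n_components=k, doc_topic_prior=alpha,
                                      topic_word_prior=beta, random_state=s).fit(X)
            for s in (0, 1)]
    terms1, terms2 = (top_terms(m, vocab, n) for m in runs)
    # for each topic in run 1, take its best-matching topic (largest overlap) in run 2
    overlaps = [max(len(a & b) for b in terms2) for a in terms1]
    return np.median(overlaps)

# tuning would keep whichever (k, alpha, beta) maximises stability(docs, k, alpha, beta)
```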

amritbhanu added this to the To Dos milestone Jul 14, 2016

timm commented Jul 18, 2016

amrit... is the paper all done? like do that before moving on

t

@amritbhanu

I am on it, prof.


amritbhanu commented Aug 4, 2016

@timm Here is the result of using LDA to automatically label the documents and then using a learner.

We cannot reproduce the results from the paper, because:

  • Mylyn, Eclipse, Firefox, and NetBeans projects: the preprocessed datasets are not available, nor are the exact preprocessing steps. They followed some naming conventions which they haven't described.

Experiment:

  • Took this paper as an example: http://dl.acm.org/citation.cfm?id=2390074
  • After running LDA, they labeled each document with its top-weighted topic.
  • Each document will have a label 1, 2, 3, ...
  • Selected one target label as "yes" and the rest as "no", converting the task into binary classification.
  • 5-by-5 cross-validation. Hashing trick with 10k features. SVM classifier. (A rough sketch of this pipeline follows the list.)
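To make the steps above concrete, here is a rough sketch of the pipeline. The dataset loading, the number of topics, and the choice of scikit-learn classes are assumptions for illustration, not the actual experiment script.

```python
# Sketch (illustrative assumptions): LDA topic labels -> binary target ->
# hashing trick with 10k features -> linear SVM, 5x5 cross-validation.
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

docs = ["..."]  # placeholder: one Stack Exchange document per entry

# 1. LDA over raw term counts; each document gets its top-weighted topic as its label.
counts = CountVectorizer(stop_words='english').fit_transform(docs)
lda = LatentDirichletAllocation(n_components=10, random_state=0)  # k=10 is an assumed value
topic_labels = lda.fit_transform(counts).argmax(axis=1)

# 2. Binary target: the chosen target topic is "yes" (1), everything else "no" (0).
target_topic = 0
y = (topic_labels == target_topic).astype(int)

# 3. Hashing trick with 10k features, linear SVM, 5-by-5 cross-validation.
X = HashingVectorizer(n_features=10_000, alternate_sign=False).fit_transform(docs)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(LinearSVC(), X, y, cv=cv, scoring='f1')
print(scores.mean())
```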

Conclusion

  • Baseline SVM didn't perform well; this might be because of the tags we used to label the Stack Exchange sites. This can affect all the previous results we showed to LN: the numbers will change, though the conclusions may or may not remain the same.
  • LDA is able to correctly label the documents.

Results:

[attached results file]


timm commented Aug 4, 2016

am now lost in the details.

please bust fscore into precision and recall

this looks like no win with tuning... right?

please write this up as a 2-4 page pdf doc. define all your terms. don't worry about the start-up sections (motivation, background)

but what is your justification for "baseline"? what papers use "baseline"?

t

@amritbhanu

Yes, no win with tuning, but the result numbers we showed to LN might change. The conclusions may or may not remain the same.

My baseline results are from our BIGDSE paper, where we just used the hashing trick with SVM as the baseline.

I will compile all these terms and my thoughts into a white paper soon.


timm commented Aug 5, 2016

fyi- you may need to tune (1) the feature extraction (of the topics) AND (2) the learner to get improved performance.

right now you're just tuning (1), right?

without doing (2), what you could do is show conclusion instability (a venn diagram of documents classified X, Y, Z via untuned feature extraction, repeated 10 times on 10 different data orderings).

with (2) you might get the kinds of improvements Wei reported
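One way to read the Venn-diagram suggestion (a sketch under assumptions, not a prescribed implementation): repeat the untuned LDA labeling on shuffled data orderings and measure how often pairs of documents agree on landing in the same topic, since raw topic ids are not comparable across runs.

```python
# Sketch (illustrative only) of a conclusion-instability check: repeat untuned
# LDA labeling on shuffled data orderings and measure how consistently pairs of
# documents land in the same topic across runs.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def labels_for_ordering(docs, k, seed):
    """Shuffle the documents, fit untuned LDA, return each doc's top topic."""
    order = np.random.RandomState(seed).permutation(len(docs))
    X = CountVectorizer(stop_words='english').fit_transform([docs[i] for i in order])
    # online updates make the document order matter, which is the point of the shuffle
    lda = LatentDirichletAllocation(n_components=k, learning_method='online',
                                    random_state=seed)
    topics = lda.fit_transform(X).argmax(axis=1)
    labels = np.empty(len(docs), dtype=int)
    labels[order] = topics  # map labels back to the original document order
    return labels

def agreement(docs, k=10, repeats=10):
    """Mean pairwise agreement, across runs, on whether two docs share a topic."""
    runs = [labels_for_ordering(docs, k, seed) for seed in range(repeats)]
    # topic ids are arbitrary across runs, so compare co-membership, not raw labels
    same = [np.equal.outer(r, r) for r in runs]
    pairs = [(a, b) for i, a in enumerate(same) for b in same[i + 1:]]
    return np.mean([np.mean(a == b) for a, b in pairs])
```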


amritbhanu commented Aug 5, 2016

  • I did (1) tuning and then tried labeling the documents with topics X, Y, Z. The original datasets (Stack Exchange sites, i.e. the so-called manny dataset generator) were labeled with tags instead. Once I labeled the documents using LDA, (2) the feature extraction used was the feature hasher (hashing trick), followed by a learner.
    • My conclusion: with or without tuning, both performed better than the baseline results. So this has to do with the dataset (wrong data) which we used during the LN work.
  • Per your suggestion, I will try tuning both (1) the feature extraction of the topics and (2) the learner.
  • I didn't understand the Venn diagram point. From tuned results I will have documents classified as X1, Y1, Z1, ... and from untuned results documents classified as X2, Y2, Z2. What do you mean?
