===============Folder Description============
1 Paper_Latex         The LaTeX source files of the paper.
2 archive             All the data from LDA and Labeled LDA, and the data for plotting.
3 /src, /bin, /data   These are for the source code.

=================Describe the Archive folder==============================
This section describes the files in the folder /doc.

Doc
 ---- HTC
       ---- Regular LDA (regular-lda-htc.scala, htc_35_topics, document-topic-distribution.csv, /2500)
       ---- HTC.csv
       ---- Labeled LDA (labeled_lda_htc.scala, htc_topics_72.txt, document-topic-distribution.csv, /2500)
 ---- Moto
       ---- Regular LDA (the same types as HTC)
       ---- moto.csv
       ---- Labeled LDA (the same types as HTC)
 ---- bug-num-over-time.xlsx (data for plotting the bugs over time in Topic Mining and Analysis)
 ---- topic_relevance_analysis.xlsx (data for plotting the topic relevance in Topic Mining and Analysis)
 ---- similarity_comparison_of_LDA_Labeled_LDA.xlsx (data for plotting the similarity of regular LDA and Labeled LDA for both vendors)

0 HTC.csv/moto.csv                  They contain all bugs and the labels manually assigned to them.
1 regular-lda-x.scala               The script given as input to the TMT tools to get the topics from regular/Labeled LDA.
2 x_35_topics                       The topics from regular/Labeled LDA, which we labeled. The order follows the output from LDA.
3 document-topic-distribution.csv   Generated by regular/Labeled LDA. Bugs are the rows, topics are the columns, and each value is the relevance of the bug to that topic.
4 /2500                             The output from the TMT tools.

=====================For Source Code Description================================
1 source code
   1 The main source code is in the src/evaluation folder.
   2 src/sample is the first version. Its only valuable part is the calculation of the threshold that lets us find the best distribution for LDA. The result is 0.2.
2 data in the folder /data/<vendor>
   1 bug-time.csv                   Used to get the bugs sorted over time.
   2 formated-bug-time.csv          Intermediate data. It contains the bug id and the formatted time for each bug.
   3 label-bug-topic-entire.csv     Includes the bug id, bug time, bug title, bug description, and the manually assigned bug label.
   4 label-topic-distribut.csv      From Labeled LDA. It contains the bug id and the relevance value for each topic.
   5 label-topics.csv               Labels from manual work.
   6 matrix-relevance-l.csv         An output file. It holds the average bug-topic relevance distribution for regular LDA by month. The columns are the topics, the rows are the time periods.
   7 matrix-relevance-r.csv         An output file. It has the same format as "matrix-relevance-l.csv", but it is for Labeled LDA.
   8 regular-topic-distribute.csv   Generated by regular LDA. The columns are the topics (there are no topic names, but the index is the same as in regular-topic.csv), the rows are the bug ids.
   9 regular-topcis.csv             Topics from regular LDA. We labeled them.
   10 topics-bugs-inter.csv         A matrix whose rows are the topics from regular LDA and whose columns are the topics from Labeled LDA. Each value is the number of bugs shared by the regular LDA topic and the Labeled LDA topic.
   11 topics-bug-l.csv              A matrix in which only the row is meaningful. It shows the number of bugs for each topic from Labeled LDA.
   12 topcis-bug-r.csv              A matrix in which only the column is meaningful. It shows the number of bugs for each topic from regular LDA.
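
=====================Example (illustrative sketch)================================
The following Python sketch illustrates how matrices shaped like matrix-relevance-*.csv and topics-bugs-inter.csv could be derived from per-bug topic distributions. It is not the code in src/evaluation. The function names, the in-memory data layout, and the toy values are all hypothetical. In particular, treating 0.2 as the cutoff above which a bug "belongs" to a topic is an assumption; this README only states that 0.2 is the threshold computed in src/sample.

```python
from collections import defaultdict

# 0.2 is the threshold value reported in this README; using it as a
# bug-to-topic membership cutoff is an assumption of this sketch.
THRESHOLD = 0.2


def bugs_per_topic(distribution, threshold=THRESHOLD):
    """Map each topic index to the set of bug ids with relevance >= threshold."""
    members = defaultdict(set)
    for bug_id, relevances in distribution.items():
        for topic, value in enumerate(relevances):
            if value >= threshold:
                members[topic].add(bug_id)
    return members


def intersection_matrix(regular, labeled, n_regular, n_labeled,
                        threshold=THRESHOLD):
    """Rows: regular LDA topics. Columns: Labeled LDA topics.
    Each cell counts the bugs assigned to both topics."""
    reg = bugs_per_topic(regular, threshold)
    lab = bugs_per_topic(labeled, threshold)
    return [[len(reg[r] & lab[c]) for c in range(n_labeled)]
            for r in range(n_regular)]


def monthly_average(distribution, bug_month):
    """Rows: time periods. Columns: topics.
    Each cell is the mean relevance of that period's bugs to the topic."""
    sums, counts = {}, defaultdict(int)
    for bug_id, relevances in distribution.items():
        month = bug_month[bug_id]
        acc = sums.setdefault(month, [0.0] * len(relevances))
        for topic, value in enumerate(relevances):
            acc[topic] += value
        counts[month] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}


# Toy data (made up): bug id -> relevance per topic.
regular = {"b1": [0.7, 0.3], "b2": [0.1, 0.9], "b3": [0.5, 0.5]}
labeled = {"b1": [0.6, 0.1, 0.3], "b2": [0.0, 0.8, 0.2],
           "b3": [0.25, 0.25, 0.5]}
bug_month = {"b1": "m1", "b2": "m1", "b3": "m2"}  # hypothetical periods

print(intersection_matrix(regular, labeled, 2, 3))  # [[2, 1, 2], [2, 2, 3]]
print(monthly_average(regular, bug_month))
```

In this sketch a bug may count toward several topics at once, because every topic whose relevance reaches the cutoff is kept rather than only the most relevant one.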