No description, website, or topics provided.
Clone or download
Pull request Compare This branch is 19 commits behind danhan:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


===============Folder Description============
1 Paper_Latex
   The source file of paper in Latex

2 archive
   All the data from LDA and Labeled LDA, and the data for ploting

3 /src, /bin , /data
  They are for source code.

=================Describe the Archive folder==============================

This file is to describe the files in the folder /doc

              ---- Regular LDA (regular-lda-htc.scala,htc_35_topics,document-topic-distribution.csv /2500)
    ---- HTC  ---- HTC.csv
              ---- Labeled LDA (labeled_lda_htc.scala,htc_topics_72.txt,document-topic-distribution.csv,/2500 )
              ---- Regular LDA (the same types as HTC)
    ----Moto  ---- moto.csv
              ---- Labeled LDA (the same types as HTC)

    ----bug-num-over-time.xlsx (data for ploting the bugs over time in Topic Mining and Analysis)
    ----topic_relevance_analysis.xlsx (data for ploting the topic relevance in Topic Mining and Analysis)
    ----similarity_comparison_of_LDA_Labeled_LDA.xlsx (data for ploting the similarity of regular LDA and Labeled LDA for both vendors)

0 HTC.csv/moto.csv
  They contains all bugs and the labels manually put on.

1 regular-lda-x.scala
  The script which is the input to TMT tools to get the topics from regular/Labeled LDA

2 x_35_topics
  It is from regular/Labeled lDA we labeled, the sequence follows the output from LDA

3 document-topic-distribution.csv
  It is generated by regular/Labeled LDA. bugs as the row, topics as the columns, the value is the relevance of bug to this topic

4 /2500
  It is the output result from TMT tools.

=====================For Source Code Description================================

1 source code 
  1 main source code is in src/evaluation folder
  2 src/sample is the first version, the only valable information is to caculate the the threshold which can make us find the best distribution for LDA. The result is 0.2

2 data
 in the folder /data/<vendor>
 1 bug-time.csv
   it helps to get the sorted bugs over time.
 2 formated-bug-time.csv
   it is the intermediate data.There are bug id and formated time for each bug
 3 label-bug-topic-entire.csv
   the file includes bug id, bug time, bug title, bug description and bug label from manually work
 4 label-topic-distribut.csv
  It is from Labeled LDA, the bug id, and the relevenace value for each topic
 5 label-topics.csv
   labels from manual work
 6 matrix-relevance-l.csv
   It is the output file. It is the bug and topic average relevance distribution for regular LDA by month. The column is the topics, the row is the time period. 7 matrix-relevance-r.csv
   It is the output file. It has the same format as "matrix-relevance-r.csv". But it is for label LDA.
  8 regular-topic-distribute.csv
   It is generated by regular LDA. the column is the topics(there is no topic name, but the indexing id is the same as what it is in the regular-topic.csv), the row is the bug id
  9 regular-topcis.csv
   It is topics from regular LDA. We labeled them.
  10 topics-bugs-inter.csv
   This is a matrix, whose rows are the topics from regular LDA, the columns are the topics from Label LDA. The value is the number of bugs intersection between regular and label LDA.
  11 topics-bug-l.csv
    This is a matrix, but only the row is valuable, which shows the number of bugs for topics from label LDA
  12 topcis-bug-r.csv
     This is a matrix, but only the column is valuable, which shows the number of bugs for topics from regular LDA.