LDA-GA #8

Open
amritbhanu opened this issue Apr 5, 2016 · 4 comments

@amritbhanu

How to Effectively Use Topic Models for Software Engineering Tasks? An Approach Based on Genetic Algorithms

bibtex:

@inproceedings{panichella2013effectively,
  title={How to effectively use topic models for software engineering tasks? an approach based on genetic algorithms},
  author={Panichella, Annibale and Dit, Bogdan and Oliveto, Rocco and Di Penta, Massimiliano and Poshyvanyk, Denys and De Lucia, Andrea},
  booktitle={Proceedings of the 2013 International Conference on Software Engineering},
  pages={522--531},
  year={2013},
  organization={IEEE Press}
}

Approaches:

  • Posterior Distribution over the assignments of words to topics
  • Computing the harmonic mean of posterior distribution
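
The second bullet is terse. One common reading (an assumption here, since the note does not spell it out) is the harmonic mean estimator: the harmonic mean of the likelihoods obtained from several Gibbs samples is used as an estimate of how well a configuration explains the corpus. A minimal sketch in log space, where `log_likelihoods` is a hypothetical list of per-sample log-likelihoods:

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Return log(HM) where HM = M / sum(1 / L_m) and L_m = exp(log_likelihoods[m]).

    Working in log space avoids underflow, because corpus likelihoods are
    usually far too small to represent directly as floats.
    """
    m = len(log_likelihoods)
    neg = [-ll for ll in log_likelihoods]          # log(1 / L_m)
    shift = max(neg)                               # log-sum-exp shift for stability
    log_sum_inv = shift + math.log(sum(math.exp(v - shift) for v in neg))
    return math.log(m) - log_sum_inv

# Harmonic mean of the likelihoods 1, 2 and 4 is 3 / 1.75
print(math.exp(log_harmonic_mean([math.log(1), math.log(2), math.log(4)])))
```

The estimate is dominated by the samples with the lowest likelihood, so it can be unstable when only a few samples are available.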

Parameters up for tuning:

  • k, n, α, β: the number of topics (k), the number of Gibbs sampling iterations (n), and the Dirichlet hyperparameters α and β.

Definitions:

  • Dominant topic: Let θ be the topic-by-document matrix generated by a particular LDA configuration P = [k, n, α, β]. A generic document dj has a dominant topic ti if and only if θ(i,j) = max{θ(h,j), h = 1, ..., k}.
  • Good clustering: documents sharing a dominant topic form a cluster; a good clustering has high inter-cluster distance and low intra-cluster distance.
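
The dominant-topic definition can be sketched as a column-wise argmax over θ. The matrix below is made-up illustration data, and `dominant_topics` is a hypothetical helper name:

```python
def dominant_topics(theta):
    """theta[i][j] is the weight of topic i in document j (k rows, one column per document).

    Returns, for each document j, the index i that maximises theta[i][j].
    Ties go to the lowest topic index.
    """
    k = len(theta)
    n_docs = len(theta[0])
    return [max(range(k), key=lambda i: theta[i][j]) for j in range(n_docs)]

# 3 topics (rows) x 3 documents (columns); values are invented for illustration
theta = [
    [0.7, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.1, 0.5],
]
print(dominant_topics(theta))  # [0, 1, 2]
```

Documents that share a dominant topic end up in the same cluster, which is what the cluster-quality measures are then computed on.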

Evaluation criteria:

  • Internal: cohesion (intra-cluster) and separation (inter-cluster), combined in the Silhouette coefficient, which ranges from -1 to 1.
  • External: requires external information about the expected clustering.
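
As a rough illustration of the internal criterion, here is the textbook Silhouette coefficient in plain Python. This is a sketch under assumptions: the paper may define cohesion and separation slightly differently, and representing each document as a point (for example by its column of θ) is my assumption, not something stated in these notes.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette(points, labels):
    """Mean silhouette over all points: s = (b - a) / max(a, b).

    a = mean distance to the other points in the same cluster (cohesion)
    b = smallest mean distance to the points of any other cluster (separation)
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])   # tight, well separated clusters
bad = silhouette(points, [0, 1, 0, 1])    # clusters that mix far-apart points
print(good > bad)
```

Values near 1 indicate compact, well-separated clusters; negative values indicate points that sit closer to another cluster than to their own.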

Needs clarification: how to convert text into data points so that the cluster goodness evaluation can be done.

Actual LDA-GA

  • A GA is a stochastic search technique based on the mechanisms of natural selection and natural genetics.
  • Stochastic search is the method of choice for solving many hard combinatorial problems.
  • Multiple solutions (individuals) evolve in parallel to explore different parts of the search space.
  • An individual (or chromosome) is a particular LDA configuration, and the population is a set of different LDA configurations.
  • The fitness function that drives the GA evolution is the Silhouette coefficient.
  • α and β were varied between 0 and 1; alternatively, they can be set to the defaults of 50/k and 0.1, respectively.
  • The LDA-GA has been implemented in R [37] using the topicmodels and GA libraries
  • GA settings and stopping criterion: "For GA, we used the following settings: a crossover probability of 0.6, a mutation probability of 0.01, a population of 100 individuals, and an elitism of 2 individuals. As a stopping criterion for the GA, we terminated the evolution if the best results achieved did not improve for 10 generations; otherwise we stopped after 100 generations."
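
The settings above can be sketched as a small GA loop. This is not the authors' R implementation. The search ranges for k and n, the tournament selection, the one-point crossover, and `toy_fitness` (a cheap stand-in for "run LDA and compute the Silhouette coefficient") are all assumptions made for illustration.

```python
import random

# Hypothetical ranges; the notes only say that alpha and beta vary in 0-1.
BOUNDS = {"k": (2, 50), "n": (50, 500), "alpha": (0.0, 1.0), "beta": (0.0, 1.0)}
GENES = ("k", "n", "alpha", "beta")

def random_gene(name, rng):
    lo, hi = BOUNDS[name]
    return rng.randint(lo, hi) if name in ("k", "n") else rng.uniform(lo, hi)

def random_individual(rng):
    return [random_gene(name, rng) for name in GENES]

def toy_fitness(ind):
    # Placeholder only: a real run would fit LDA with [k, n, alpha, beta]
    # and return the Silhouette coefficient of the resulting clusters.
    k, n, alpha, beta = ind
    return 1.0 - (abs(k - 10) / 48 + abs(n - 200) / 450
                  + abs(alpha - 0.3) + abs(beta - 0.1)) / 4

def tournament(pop, fits, rng, size=2):
    picks = rng.sample(range(len(pop)), size)
    return pop[max(picks, key=lambda i: fits[i])]

def evolve(fitness, rng, pop_size=100, p_cross=0.6, p_mut=0.01,
           elitism=2, patience=10, max_gen=100):
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_fit, stale = None, float("-inf"), 0
    for _ in range(max_gen):
        fits = [fitness(ind) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda i: fits[i], reverse=True)
        if fits[ranked[0]] > best_fit:
            best, best_fit, stale = list(pop[ranked[0]]), fits[ranked[0]], 0
        else:
            stale += 1
            if stale >= patience:      # no improvement for `patience` generations
                break
        nxt = [list(pop[i]) for i in ranked[:elitism]]   # elites survive unchanged
        while len(nxt) < pop_size:
            a = list(tournament(pop, fits, rng))
            b = list(tournament(pop, fits, rng))
            if rng.random() < p_cross:                   # one-point crossover
                cut = rng.randint(1, len(GENES) - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for g, name in enumerate(GENES):
                    if rng.random() < p_mut:             # resample the mutated gene
                        child[g] = random_gene(name, rng)
                nxt.append(child)
        pop = nxt[:pop_size]
    return best, best_fit

best, best_fit = evolve(toy_fitness, random.Random(0))
print(best, best_fit)
```

Swapping `toy_fitness` for a function that trains LDA and scores the dominant-topic clusters would turn this skeleton into the search described in the notes, at a much higher cost per evaluation.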

Assumptions:

  • Top 10 words belonging to the topic with the highest probability in the obtained topic distribution were then used to label the class
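
The labeling assumption can be sketched as follows. `label_document`, the tiny vocabulary, and the probabilities are hypothetical illustration data; only the "top words of the highest-probability topic" rule comes from the notes.

```python
def label_document(doc_topics, topic_words, vocab, top_n=10):
    """Label a document with the top_n words of its highest-probability topic.

    doc_topics[t]     : probability of topic t in the document
    topic_words[t][w] : probability of word vocab[w] in topic t
    """
    best_topic = max(range(len(doc_topics)), key=lambda t: doc_topics[t])
    weights = topic_words[best_topic]
    ranked = sorted(range(len(vocab)), key=lambda w: weights[w], reverse=True)
    return [vocab[w] for w in ranked[:top_n]]

vocab = ["parse", "token", "socket", "port"]
topic_words = [
    [0.50, 0.40, 0.05, 0.05],   # topic 0: parsing-related words
    [0.05, 0.05, 0.60, 0.30],   # topic 1: networking-related words
]
print(label_document([0.2, 0.8], topic_words, vocab, top_n=2))  # ['socket', 'port']
```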
@timm

timm commented Apr 6, 2016

nice

@WeiFoo

WeiFoo commented May 31, 2017

@amritbhanu, I searched for this paper on GitHub and tried to find the datasets used in it.

Two questions for you:

  1. Why didn't you use the same case studies in your LDA-DE as this LDA-GA paper? They have feature localization, traceability link recovery, and software artifact labeling tasks. Remember, one of the ICSE reviewers complained about your case study.
  2. Did you try to find the datasets?

@WeiFoo WeiFoo reopened this May 31, 2017
@amritbhanu

Yes, I tried, but the link provided by the authors is not working, so I cannot replicate any of their studies. The same goes for their datasets. I also contacted the authors but got no response.

@apanichella

Hi, I saw this post kinda late... I was searching for other things. Anyhow, the replication package is at the link https://dibt.unimol.it/reports/LDA-GA/
