Jul-06-2016 #7

Closed
azhe825 opened this issue Jul 6, 2016 · 0 comments

Baseline paper chosen

Wahono, Romi Satria. "A systematic literature review of software defect prediction: research trends, datasets, methods and frameworks." Journal of Software Engineering 1, no. 1 (2015): 1-16.

Reason: transparency. The paper reports each step of its review protocol:

  1. Construct RQs
  2. Define a search string to retrieve defect prediction papers.
    • The search string is: "(software OR applicati* OR systems) AND (fault* OR defect* OR quality OR error-prone) AND (predict* OR prone* OR probability OR assess* OR detect* OR estimat* OR classificat*)"
  3. Search 5 databases, yielding 2117 results:
    • ACM Digital Library (dl.acm.org)
    • IEEE eXplore (ieeexplore.ieee.org)
    • ScienceDirect (sciencedirect.com)
    • Springer (springerlink.com)
    • Scopus (scopus.com)
  4. Review the titles and abstracts of the 2117 papers; 213 remain.
  5. Review the full text of the 213 papers; 71 remain.
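
The search string in step 2 can be approximated locally as a boolean filter over title and abstract text. This is only a minimal sketch: it assumes a trailing `*` means "any word ending" and that terms match whole words case-insensitively, which may differ from how each of the 5 databases interprets the string. The names `GROUPS`, `term_to_regex` and `matches_search_string` are made up for illustration.

```python
import re

# Each inner list is one OR-group; the groups are ANDed together.
# A trailing * is treated as a wildcard for any word ending (an assumption).
GROUPS = [
    ["software", "applicati*", "systems"],
    ["fault*", "defect*", "quality", "error-prone"],
    ["predict*", "prone*", "probability", "assess*", "detect*",
     "estimat*", "classificat*"],
]

def term_to_regex(term):
    """Turn one search term into a whole-word, case-insensitive pattern."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

PATTERNS = [[term_to_regex(t) for t in group] for group in GROUPS]

def matches_search_string(text):
    """True if every AND-group has at least one term present in text."""
    return all(any(p.search(text) for p in group) for group in PATTERNS)
```

For example, `matches_search_string("A systematic literature review of software defect prediction")` is true because each of the three groups contributes a hit ("software", "defect", "prediction").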

Experiment Design

  1. Load the 2117 papers into an Elasticsearch database
  2. Tag the 71 papers as relevant
  3. Compare two ways to find the 71 papers among the 2117
    • traditional linear review
    • random seed + active learning (review N randomly chosen papers, then start active learning: uncertainty sampling first, switching to certainty sampling at some point)
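
The second review strategy can be simulated roughly as follows. This is a sketch under stated assumptions, not the planned implementation: the classifier is a small naive Bayes style scorer written for illustration, each paper is a `(tokens, is_relevant)` pair, and the switch from uncertainty to certainty sampling is triggered by the number of relevant papers found so far (`switch_after`), which is only one possible reading of "at some point". All function and parameter names are hypothetical.

```python
import math
import random
from collections import Counter

def train(labeled):
    """Collect per-class document frequencies from (tokens, is_relevant) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for tokens, label in labeled:
        counts[label].update(set(tokens))
        docs[label] += 1
    return counts, docs

def log_odds(tokens, model):
    """Naive Bayes style log-odds of relevance, with Laplace smoothing."""
    counts, docs = model
    score = math.log((docs[True] + 1) / (docs[False] + 1))
    for t in set(tokens):
        p_rel = (counts[True][t] + 1) / (docs[True] + 2)
        p_irr = (counts[False][t] + 1) / (docs[False] + 2)
        score += math.log(p_rel / p_irr)
    return score

def simulate_review(corpus, seed_size=10, switch_after=5,
                    target_recall=0.85, rng_seed=0):
    """Return how many papers are reviewed before target_recall is reached.

    A paper's label is only read at the moment it is "reviewed".
    """
    rng = random.Random(rng_seed)
    total_relevant = sum(1 for _, rel in corpus if rel)
    needed = math.ceil(target_recall * total_relevant)
    pool = list(range(len(corpus)))
    rng.shuffle(pool)
    # Random seed phase: review seed_size randomly chosen papers.
    reviewed = [pool.pop() for _ in range(min(seed_size, len(pool)))]
    found = sum(corpus[i][1] for i in reviewed)
    while pool and found < needed:
        model = train([corpus[i] for i in reviewed])
        scores = {i: log_odds(corpus[i][0], model) for i in pool}
        if found < switch_after:
            # Uncertainty sampling: the paper the model is least sure about.
            pick = min(pool, key=lambda i: abs(scores[i]))
        else:
            # Certainty sampling: the paper most likely to be relevant.
            pick = max(pool, key=lambda i: scores[i])
        pool.remove(pick)
        reviewed.append(pick)
        found += corpus[pick][1]
    return len(reviewed)
```

Running `simulate_review` on the 2117 tagged papers and comparing the returned count with the linear baseline would give the comparison described in step 3. How well it does depends heavily on `seed_size` and `switch_after`, which would need tuning.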

Expected result:

  • traditional linear review will need to review 2117 × 0.85 (about 1,800) papers to find 71 × 0.85 (about 60) relevant ones.
  • learning-based review will only need to review 2117 × 0.15 (about 318) papers to find 71 × 0.85 (about 60) relevant ones.
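
The arithmetic behind these expectations, reading the figures as 2117 × 0.85, 2117 × 0.15 and 71 × 0.85, can be checked with a few lines. The 0.85 and 0.15 fractions are the hoped-for values stated above, not measured results, and the constant names are illustrative.

```python
TOTAL_PAPERS = 2117
RELEVANT_PAPERS = 71
TARGET_RECALL = 0.85    # fraction of the 71 relevant papers to find
LINEAR_EFFORT = 0.85    # fraction of the corpus read under linear review
LEARNING_EFFORT = 0.15  # hoped-for fraction read with active learning

def expected_counts(effort_fraction):
    """Return (papers reviewed, relevant papers found), rounded to integers."""
    reviewed = round(TOTAL_PAPERS * effort_fraction)
    found = round(RELEVANT_PAPERS * TARGET_RECALL)
    return reviewed, found

linear = expected_counts(LINEAR_EFFORT)      # (1799, 60)
learning = expected_counts(LEARNING_EFFORT)  # (318, 60)
papers_saved = linear[0] - learning[0]       # 1481
```

If the expectation holds, the learning-based review would reach the same recall while reading roughly 1,480 fewer papers.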

Current Progress

  1. Load the 2117 papers into an Elasticsearch database

Loaded citemap.csv (provided by George) into Elasticsearch.

Problems:

  • querying with the search string returns only 576 results (compared to the expected 2117)
  • none of the 576 results is in the list of 71 😭
  • the data contains only abstracts, no full text
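
One way to reproduce the first two problems is to send the search string as an Elasticsearch `query_string` query and compare the returned titles with the list of 71. The sketch below only builds the request body and does the comparison offline. The field names `title` and `abstract` are assumptions about how citemap.csv was indexed, and `build_query`, `normalize` and `overlap` are hypothetical helpers.

```python
import re

SEARCH_STRING = (
    "(software OR applicati* OR systems) AND "
    "(fault* OR defect* OR quality OR error-prone) AND "
    "(predict* OR prone* OR probability OR assess* OR detect* OR "
    "estimat* OR classificat*)"
)

def build_query(fields=("title", "abstract"), size=3000):
    """Request body for a query_string search; field names are assumed."""
    return {
        "size": size,
        "query": {
            "query_string": {"query": SEARCH_STRING, "fields": list(fields)}
        },
    }

def normalize(title):
    """Lowercase and collapse punctuation/whitespace so titles compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def overlap(hit_titles, relevant_titles):
    """Return the relevant titles that also appear among the search hits."""
    hits = {normalize(t) for t in hit_titles}
    return sorted(t for t in relevant_titles if normalize(t) in hits)
```

An empty result from `overlap` for all 71 titles would confirm the second problem. Running the same query against `title` alone and `abstract` alone may help show whether the gap comes from the data or from how the string is interpreted.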

To Do

  1. Try to load the original 300 GB of data we have into Elasticsearch.
  2. If step 1 does not work out, use the APIs of the 5 databases to reconstruct the set of 2117 papers.
@azhe825 azhe825 closed this as completed Nov 17, 2016