Zhe's milestone (undetermined) #17

azhe825 · 2016-06-21T13:22:21Z

Current Baseline Approach:
Searching + filtering, then linear review.

Challenges:

To Do:

Work bench
- get data (linear review, time recorded --- baseline approach)
- construct citation matrix
Basic method
- searching (Elasticsearch) [1, 2, 3]
- learning (lexical analysis with term frequency, L2 normalization, SVM...)
- active learning (uncertainty sampling, certainty sampling) [1, 2]
- data balancing? need to test. [2]
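
A minimal sketch of the learning and active learning items above, assuming whitespace tokenization and a linear model whose weights are already trained (for example by a linear SVM). The pool, the weights, and every function name here are hypothetical and chosen only for illustration; a real setup would probably rely on a library rather than this hand-rolled version.

```python
import math
from collections import Counter


def tf_l2(text):
    """Term-frequency vector for one document, scaled to unit L2 norm."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def score(weights, bias, vec):
    """Signed score from a linear model; positive means 'relevant'."""
    return sum(weights.get(t, 0.0) * v for t, v in vec.items()) + bias


def uncertainty_sampling(scores, k):
    """Indices of the k documents closest to the decision boundary."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]))[:k]


def certainty_sampling(scores, k):
    """Indices of the k documents the model scores as most relevant."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


# Hypothetical unlabeled pool and hand-picked weights (assumptions, not real data).
pool = [
    "active learning for literature review",
    "cooking pasta at home",
    "review of learning methods",
]
weights = {"learning": 1.0, "review": 1.0, "pasta": -1.0}
vectors = [tf_l2(doc) for doc in pool]
scores = [score(weights, -0.5, v) for v in vectors]

next_uncertain = uncertainty_sampling(scores, 1)  # document to review next
next_certain = certainty_sampling(scores, 1)      # most confident "relevant"
```

Uncertainty sampling asks the reviewer about the document the model is least sure of, while certainty sampling surfaces the document it believes is most likely relevant; which one saves more review effort is what the experiments would need to show.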
Obtain a better initial training set for active learning with fewer reviews [1, 2]
- baseline: random sampling
- clustering based on lexical analysis [1, 2]
- spectral clustering based on citation matrix (citemap?) (Possible expansion: utilize citation matrix on learning) [1, 2]
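
A rough sketch of how the random baseline and the lexical clustering option above could pick the first documents to review. It assumes spherical k-means over L2-normalized term-frequency vectors with a deterministic start (the first k documents); the function names and the toy documents are hypothetical, and the spectral clustering variant on the citation matrix is not covered here.

```python
import math
import random
from collections import Counter


def tf_l2(text):
    """Term-frequency vector scaled to unit L2 norm."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def mean_vector(vectors):
    """Sum of the vectors, rescaled to unit length."""
    total = {}
    for vec in vectors:
        for t, v in vec.items():
            total[t] = total.get(t, 0.0) + v
    norm = math.sqrt(sum(v * v for v in total.values()))
    return {t: v / norm for t, v in total.items()}


def random_seeds(n_docs, k, seed=0):
    """Baseline: k documents chosen uniformly at random."""
    return random.Random(seed).sample(range(n_docs), k)


def cluster_seeds(docs, k, iterations=10):
    """Cluster the documents and return one representative index per cluster."""
    vectors = [tf_l2(d) for d in docs]
    centers = [vectors[i] for i in range(k)]  # assumption: first k docs as start
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    seeds = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # The document closest to the cluster center is reviewed first.
            seeds.append(max(members, key=lambda i: cosine(vectors[i], centers[c])))
    return seeds, labels


# Hypothetical toy corpus with two obvious topics.
docs = [
    "svm text classification",
    "deep sea fish biology",
    "text classification with svm kernels",
    "fish biology of the deep sea",
]
seeds, labels = cluster_seeds(docs, k=2)
baseline = random_seeds(len(docs), k=2)
```

The idea is that one review per cluster covers the topics in the pool with fewer labels than random sampling would need, which is the hypothesis the comparison against the baseline would test.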
Get the user involved [3]
- show important features and let the user mask them [3]
- allow user to re-review important documents (support vectors) [3]
- let user explore the clusters [3]
Visualization of Learning Result (not sure how this can help right now) [1, 3]
- pretty graphs (d3.js, kibana)
- present results in each cluster

azhe825 · 2016-10-06T14:21:44Z

Implementations of machine assisted reading

(Multi-objective) optimization: reduce the number of evaluations by ranking candidates. Ongoing.
Defect prediction:
- standard machine assisted reading on projects with no labeled data. (test->train->rank->test->train->rank->...)
- updating with machine assisted reading on new versions of the code
- reuse with machine assisted reading on similar projects
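
The test->train->rank loop for a project with no labeled data could look roughly like the sketch below. The learner is a stand-in (difference between the mean feature vectors of defective and clean modules), not the actual model; the feature values, the `oracle` callback, and all names are assumptions made for illustration.

```python
def _mean(rows, d):
    """Mean of feature d over a list of feature rows."""
    return sum(r[d] for r in rows) / len(rows)


def train(labeled):
    """Stand-in learner: mean(defective) - mean(clean) as a weight vector."""
    pos = [x for x, y in labeled if y]
    neg = [x for x, y in labeled if not y]
    if not pos or not neg:
        return None  # need at least one example of each class
    dims = len(labeled[0][0])
    return [_mean(pos, d) - _mean(neg, d) for d in range(dims)]


def rank(model, pool, candidates):
    """Order unlabeled candidates, most likely defective first."""
    if model is None:
        return sorted(candidates)  # no model yet: fall back to index order
    return sorted(
        candidates,
        key=lambda i: -sum(w * x for w, x in zip(model, pool[i])),
    )


def assisted_loop(pool, oracle, budget):
    """Repeat test -> train -> rank until the evaluation budget is spent."""
    labeled, order = {}, []
    candidates = set(range(len(pool)))
    model = None
    while candidates and len(order) < budget:
        nxt = rank(model, pool, candidates)[0]
        labeled[nxt] = oracle(nxt)  # "test": the expensive evaluation
        order.append(nxt)
        candidates.remove(nxt)
        model = train([(pool[i], y) for i, y in labeled.items()])
    return order


# Hypothetical modules described by two made-up metrics; 1, 3 and 5 are defective.
pool = [[1, 0], [9, 8], [2, 1], [8, 9], [1, 1], [9, 9]]
defective = {1, 3, 5}
order = assisted_loop(pool, oracle=lambda i: i in defective, budget=4)
found = [i for i in order if i in defective]
```

In this toy run all three defective modules are reached within four evaluations, because once one example of each class is known the ranking pushes the remaining defective modules to the front. The updating and reuse cases would start the same loop with a model trained on the previous version or on a similar project instead of `None`.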

timm · 2016-10-12T02:32:19Z

Let's talk about this next time we meet.

azhe825 · 2016-10-12T03:43:44Z

Sure

timm assigned timm and azhe825 Oct 12, 2016