This first post is continuously updated based on the discussions in this thread.
In the active learning cycle, the model incrementally improves its predictions on the remaining unlabeled records, and ideally all relevant records are identified as early in the process as possible. The reviewer either stops at some point during the process to conserve resources or continues until all records have been labeled. In the latter case no time is saved, so the main question is when to stop: that is, how to determine the point at which the cost of labeling more papers exceeds the cost of the errors made by the current model (e.g., Cohen, 2011). Finding 100% of the relevant papers appears to be almost impossible, even for human annotators (Wang, Nayfeh, Tetzlaff, O’Blenis, & Murad, 2020). Therefore, we typically aim to find 95% of the inclusions. However, in an unlabeled dataset you do not know how many relevant papers are left to be found. Researchers might therefore either stop too early and potentially miss many relevant papers, or stop too late and do unnecessary further reading (Z. Yu, N. Kraft, & T. Menzies, 2018a).
Several potential stopping rules, which still have to be implemented, estimate the number of potentially relevant papers or look for an inflection point (Cormack & Grossman, 2015, 2016; Kastner, Straus, McKibbon, & Goldsmith, 2009; Stelfox, Foster, Niven, Kirkpatrick, & Goldsmith, 2013; Ros, Bjarnason, & Runeson, 2017; Wallace et al., 2010, 2012; Webster & Kemp, 2013; Yu & Menzies, 2019).
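To make the 95% target concrete, here is a minimal sketch of how the stopping point could be measured in a simulation, where every record already has a known label. The function name `records_needed_for_recall` and its arguments are hypothetical and only for illustration. In real screening the total number of relevant papers is unknown, which is exactly why this cannot be used as a stopping rule there.

```python
import math
from typing import Sequence


def records_needed_for_recall(labels_in_order: Sequence[int], target: float = 0.95) -> int:
    """Return how many records must be screened to reach the target recall.

    labels_in_order: known labels (1 = relevant, 0 = irrelevant) in the order
    the records were presented to the reviewer. Only available in simulations.
    """
    total_relevant = sum(labels_in_order)
    if total_relevant == 0:
        return 0  # nothing to find, so no screening is needed

    # Smallest number of relevant records that satisfies the target recall.
    needed = math.ceil(target * total_relevant)

    found = 0
    for n_screened, label in enumerate(labels_in_order, start=1):
        found += label
        if found >= needed:
            return n_screened
    return len(labels_in_order)


# Hypothetical screening order with 4 relevant records out of 10.
order = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(records_needed_for_recall(order, target=0.75))
print(records_needed_for_recall(order, target=1.0))
```

Comparing the returned number with the size of the dataset gives a rough idea of how much reading a perfect stopping rule would have saved for that particular ordering.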
Another option is to use heuristics (Bloodgood & Vijay-Shanker, 2014; Olsson & Tomanek, 2009; Vlachos, 2008), for example:
Time-based strategy: You decide to stop after a fixed amount of time. This strategy can be useful when you have a limited amount of time to screen.
Data-driven strategy: You decide to stop after, for example, a fixed number of consecutive irrelevant papers (this number can be found in the statistics panel). Whether you choose 50, 100, 250, 500, etc. depends on the size of the dataset and the goal of the user. You can ask yourself: how important is it to find all the relevant papers?
Mixed strategy: You stop after a fixed amount of time, or earlier if you reach the predetermined threshold of consecutive irrelevant papers before that time.
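The three heuristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `consecutive_irrelevant` and `should_stop`, and the parameters `max_minutes` and `max_irrelevant`, are made up for this example and are not part of any existing tool.

```python
from typing import Optional, Sequence


def consecutive_irrelevant(labels: Sequence[int]) -> int:
    """Count the irrelevant (0) labels at the end of the screening history."""
    count = 0
    for label in reversed(labels):
        if label == 1:
            break
        count += 1
    return count


def should_stop(
    labels: Sequence[int],
    elapsed_minutes: float,
    max_minutes: Optional[float] = None,
    max_irrelevant: Optional[int] = None,
) -> bool:
    """Decide whether to stop screening.

    Time-based strategy: pass only max_minutes.
    Data-driven strategy: pass only max_irrelevant.
    Mixed strategy: pass both; whichever limit is hit first stops the screening.
    """
    time_up = max_minutes is not None and elapsed_minutes >= max_minutes
    run_reached = (
        max_irrelevant is not None
        and consecutive_irrelevant(labels) >= max_irrelevant
    )
    return time_up or run_reached


# Hypothetical history: 1 = relevant, 0 = irrelevant, in screening order.
history = [1, 0, 1, 0, 0, 0]
print(should_stop(history, elapsed_minutes=10, max_minutes=60))      # time-based
print(should_stop(history, elapsed_minutes=10, max_irrelevant=3))    # data-driven
print(should_stop(history, elapsed_minutes=10, max_minutes=60, max_irrelevant=3))  # mixed
```

Note that none of these heuristics says anything about how many relevant papers are still missing; they only bound the effort spent.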
Below we discuss more options in detail. Join the discussion!!
Some useful references: