(FSRS) Use median in calculating cost and remove outliers #3172

Closed
user1823 opened this issue Apr 27, 2024 · 1 comment · Fixed by #3181

@user1823
Contributor

Corresponding PR in the Python optimizer:

Changes:

  • Use median instead of mean to calculate average costs
  • Remove entries where time = 0 ms
  • Remove entries where time > 1200000 ms (20 min)

The second change was probably already made in
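A minimal sketch of the three listed changes, assuming costs arrive as raw `taken_millis` values. The helper names `median` and `median_cost_secs`, the `MAX_MILLIS` constant, and averaging the two middle values for an even count are illustrative assumptions. They are not taken from the Anki or Python optimizer code.

```rust
/// Returns the median of `values`, or `None` if the slice is empty.
/// Assumption: for an even count, the two middle values are averaged.
fn median(values: &mut [f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    values.sort_by(|a, b| a.total_cmp(b));
    let mid = values.len() / 2;
    Some(if values.len() % 2 == 0 {
        (values[mid - 1] + values[mid]) / 2.0
    } else {
        values[mid]
    })
}

/// Hypothetical constant: entries longer than 20 minutes (1,200,000 ms) are dropped.
const MAX_MILLIS: u32 = 1_200_000;

/// Median cost in seconds after removing outliers (0 ms and > 20 min).
fn median_cost_secs(taken_millis: &[u32]) -> Option<f64> {
    let mut kept: Vec<f64> = taken_millis
        .iter()
        .copied()
        .filter(|&ms| ms > 0 && ms <= MAX_MILLIS)
        .map(|ms| ms as f64 / 1000.0)
        .collect();
    median(&mut kept)
}

fn main() {
    // The 0 ms entry and the 30-minute entry are discarded as outliers.
    let times = [0, 4_000, 6_000, 8_000, 1_800_000];
    println!("{:?}", median_cost_secs(&times)); // Some(6.0)
}
```

The median depends only on the middle of the sorted values, so a few very long sessions shift it far less than they shift a mean.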

@L-M-Sherlock
Contributor

Related code:

```rust
let recall_costs = {
    let default = [14.0, 14.0, 10.0, 6.0];
    let mut arr = default;
    revlogs
        .iter()
        .filter(|r| {
            r.review_kind == RevlogReviewKind::Review
                && r.button_chosen > 0
                && r.taken_millis > 0
        })
        .sorted_by(|a, b| a.button_chosen.cmp(&b.button_chosen))
        .group_by(|r| r.button_chosen)
        .into_iter()
        .for_each(|(button_chosen, group)| {
            let group_vec = group.into_iter().map(|r| r.taken_millis).collect_vec();
            let average_secs =
                group_vec.iter().sum::<u32>() as f64 / group_vec.len() as f64 / 1000.0;
            arr[button_chosen as usize - 1] = average_secs
        });
    if arr == default {
        return Err(AnkiError::FsrsInsufficientData);
    }
    arr
};
let learn_cost = {
    let revlogs_filter = revlogs
        .iter()
        .filter(|r| {
            r.review_kind == RevlogReviewKind::Learning
                && r.button_chosen >= 1
                && r.taken_millis > 0
        })
        .map(|r| r.taken_millis);
    let length = revlogs_filter.clone().count() as f64;
    if length > 0.0 {
        revlogs_filter.sum::<u32>() as f64 / length / 1000.0
    } else {
        return Err(AnkiError::FsrsInsufficientData);
    }
};
let forget_cost = {
    let review_kind_to_total_millis = revlogs
        .iter()
        .sorted_by(|a, b| a.cid.cmp(&b.cid).then(a.id.cmp(&b.id)))
        .group_by(|r| r.review_kind)
        /*
        For example:
        o x x o o x x x o o x x o x
         |<->|   |<--->|   |<->| |<>|
        x means forgotten; this card has 4 separate runs of consecutive relearning entries.
        Each run is summed separately (following code), and then all runs are averaged,
        which is why the entries are first sorted by cid and id.
        */
        .into_iter()
        .map(|(review_kind, group)| {
            let total_millis: u32 = group.into_iter().map(|r| r.taken_millis).sum();
            (review_kind, total_millis)
        })
        .collect_vec();
    let mut group_sec_by_review_kind: [Vec<_>; 5] = Default::default();
    for (review_kind, sec) in review_kind_to_total_millis.into_iter() {
        group_sec_by_review_kind[review_kind as usize].push(sec)
    }
    let mut arr = [0.0; 5];
    for (review_kind, group) in group_sec_by_review_kind.iter().enumerate() {
        let average_secs = group.iter().sum::<u32>() as f64 / group.len() as f64 / 1000.0;
        arr[review_kind] = if average_secs.is_nan() {
            0.0
        } else {
            average_secs
        }
    }
    arr
};
let forget_cost = forget_cost[RevlogReviewKind::Relearning as usize] + recall_costs[0];
```
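The comment in the `forget_cost` block describes summing each run of consecutive relearning entries and then averaging the run totals. Below is a minimal sketch of the same idea with the median swapped in for the mean. It assumes a hypothetical, simplified `Entry` struct (a `relearning` flag instead of `RevlogReviewKind`) rather than Anki's real revlog type. Outlier filtering is left out so that run boundaries match the quoted code. One deliberate difference: this sketch also ends a run when `cid` changes. By contrast, `group_by(|r| r.review_kind)` looks only at the review kind, so a relearning run at the end of one card and one at the start of the next could be merged.

```rust
/// Hypothetical, simplified review-log entry for illustration only; Anki's
/// real revlog type has more fields and uses `RevlogReviewKind`.
#[derive(Clone, Copy, Debug)]
struct Entry {
    cid: i64,
    id: i64,
    relearning: bool,
    taken_millis: u32,
}

/// Sums `taken_millis` over each run of consecutive relearning entries.
/// Entries are sorted by (cid, id) first; a run ends at a non-relearning
/// entry or when the card id changes.
fn relearning_run_totals(entries: &[Entry]) -> Vec<u32> {
    let mut sorted = entries.to_vec();
    sorted.sort_by_key(|e| (e.cid, e.id));

    let mut totals = Vec::new();
    // (card id, running total) of the run currently being summed.
    let mut current: Option<(i64, u32)> = None;
    for e in &sorted {
        let continues_run =
            e.relearning && matches!(current, Some((cid, _)) if cid == e.cid);
        if continues_run {
            if let Some((_, total)) = current.as_mut() {
                *total += e.taken_millis;
            }
        } else {
            // Close the previous run, if any.
            if let Some((_, total)) = current.take() {
                totals.push(total);
            }
            // Start a new run if this entry is itself a relearning step.
            if e.relearning {
                current = Some((e.cid, e.taken_millis));
            }
        }
    }
    if let Some((_, total)) = current {
        totals.push(total);
    }
    totals
}

/// Median of the run totals in seconds; `None` if there are no runs.
fn median_secs(millis: &[u32]) -> Option<f64> {
    if millis.is_empty() {
        return None;
    }
    let mut v = millis.to_vec();
    v.sort_unstable();
    let mid = v.len() / 2;
    let m = if v.len() % 2 == 0 {
        (v[mid - 1] as f64 + v[mid] as f64) / 2.0
    } else {
        v[mid] as f64
    };
    Some(m / 1000.0)
}

fn main() {
    let e = |cid: i64, id: i64, relearning: bool, taken_millis: u32| Entry {
        cid,
        id,
        relearning,
        taken_millis,
    };
    // Deliberately unsorted input spanning two cards.
    let entries = [
        e(2, 7, false, 1_000),
        e(1, 1, false, 5_000),
        e(1, 2, true, 3_000),
        e(1, 3, true, 2_000),
        e(1, 4, false, 4_000),
        e(1, 5, true, 10_000),
        e(2, 6, true, 7_000),
    ];
    let totals = relearning_run_totals(&entries);
    println!("{:?}", totals); // [5000, 10000, 7000]
    println!("{:?}", median_secs(&totals)); // Some(7.0)
}
```

With this sample data, the last relearning step of card 1 and the first of card 2 form separate runs (10 s and 7 s) instead of a single 17 s run.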
