Fill optimal retention parameters #2661
I think it should not be derived from the collection directly, because the deck size would increase over time, and the simulator for finding the optimal retention is meant to simulate the future.
Let me quote some code from the Python version of the optimizer. This is method-body code: `df` is the revlog DataFrame and durations are in milliseconds; the state constants and dict initializations are included below for context.

```python
import numpy as np

# Revlog state codes used by the optimizer.
New, Learning, Review, Relearning = 0, 1, 2, 3

# Probability of each first rating (1=Again .. 4=Easy) on new cards.
new_card_revlog = df[(df["review_state"] == New)]
self.first_rating_prob = np.zeros(4)
self.first_rating_prob[
    new_card_revlog["review_rating"].value_counts().index - 1
] = (
    new_card_revlog["review_rating"].value_counts()
    / new_card_revlog["review_rating"].count()
)

# Probability of each rating (2=Hard .. 4=Easy) on successful reviews.
recall_card_revlog = df[
    (df["review_state"] == Review) & (df["review_rating"] != 1)
]
self.review_rating_prob = np.zeros(3)
self.review_rating_prob[
    recall_card_revlog["review_rating"].value_counts().index - 2
] = (
    recall_card_revlog["review_rating"].value_counts()
    / recall_card_revlog["review_rating"].count()
)

# Treat the first review of a card as part of learning.
df["review_state"] = df["review_state"].map(
    lambda x: x if x != New else Learning
)

# Mean review time in seconds per successful-review rating.
self.recall_costs = np.zeros(3)
recall_costs = recall_card_revlog.groupby(by="review_rating")[
    "review_duration"
].mean()
self.recall_costs[recall_costs.index - 2] = recall_costs / 1000

self.state_sequence = np.array(df["review_state"])
self.duration_sequence = np.array(df["review_duration"])

# Average learning time per card in seconds (New was mapped to Learning above).
self.learn_cost = round(
    df[df["review_state"] == Learning]["review_duration"].sum()
    / len(df["card_id"].unique())
    / 1000,
    1,
)

# Count reviews, total durations, and contiguous same-state blocks.
state_block = {}
state_count = {}
state_duration = {}
last_state = self.state_sequence[0]
state_block[last_state] = 1
state_count[last_state] = 1
state_duration[last_state] = self.duration_sequence[0]
for i, state in enumerate(self.state_sequence[1:]):
    state_count[state] = state_count.setdefault(state, 0) + 1
    state_duration[state] = (
        # i + 1 because the loop starts at the second element.
        state_duration.setdefault(state, 0) + self.duration_sequence[i + 1]
    )
    if state != last_state:
        state_block[state] = state_block.setdefault(state, 0) + 1
    last_state = state

# A run of consecutive Relearning reviews is one "block" per lapse, so
# forget_cost = mean block duration in seconds + the average recall cost
# (recall_cost, singular, is computed elsewhere in the optimizer).
if Relearning in state_count and Relearning in state_block:
    forget_cost = round(
        state_duration[Relearning] / state_block[Relearning] / 1000
        + recall_cost,
        1,
    )
```
To be honest, I'm not very good at following untyped numpy code 😅 Would this be something @asukaminato0721 would be interested in? Re deck size, do you suggest just leaving it at the default 10000 then? I have no preference.
OK. I can write some Rust code to implement it in https://github.com/open-spaced-repetition/fsrs-rs. For deck size, it's OK to just leave it at 10000.
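As a taste of what the port could look like, here is a minimal sketch in plain Rust (hypothetical names, not the actual fsrs-rs API): the numpy `value_counts().index - 1` trick above is just a histogram normalized into a fixed-size probability array.

```rust
/// Probability of each first rating (1=Again .. 4=Easy) on new cards.
fn first_rating_prob(first_ratings: &[u8]) -> [f64; 4] {
    let mut counts = [0usize; 4];
    for &r in first_ratings {
        counts[(r - 1) as usize] += 1; // shift 1-based ratings to 0-based slots
    }
    let total = first_ratings.len() as f64;
    let mut prob = [0.0; 4];
    for (p, &c) in prob.iter_mut().zip(counts.iter()) {
        *p = c as f64 / total;
    }
    prob
}

fn main() {
    // Toy data: Good, Good, Again, Easy, Good, Hard, Good, Again.
    let ratings: [u8; 8] = [3, 3, 1, 4, 3, 2, 3, 1];
    println!("{:?}", first_rating_prob(&ratings)); // [0.25, 0.125, 0.5, 0.125]
}
```

`review_rating_prob` and `recall_costs` are the same pattern over the successful-review subset, with ratings 2..=4 shifted by 2 instead of 1.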
ok
So I take it deckSize, daysToSimulate, and seconds per day should be set by the user. Do we need to make the other values customizable, or should we just hide them from the user and automatically calculate them as part of the optimizer step, i.e. remove the separate Get Params button?
It's OK to calculate them automatically, if the data is enough to generate stable stats. For new users, I recommend hiding the entire optimal retention module.
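For reference, a rough sketch of that split (field names are hypothetical, not Anki's actual config):

```rust
/// Hypothetical grouping of the optimal-retention simulator inputs.
struct SimulatorConfig {
    // Set by the user:
    deck_size: usize,        // default 10000, per the discussion above
    days_to_simulate: usize,
    max_seconds_per_day: u32,
    // Derived automatically from the revlog during optimization:
    first_rating_prob: [f64; 4],  // P(rating) for the first review of a card
    review_rating_prob: [f64; 3], // P(rating) for successful reviews (Hard/Good/Easy)
    recall_costs: [f64; 3],       // mean seconds per successful review, by rating
    learn_cost: f64,              // mean learning seconds per new card
    forget_cost: f64,             // mean seconds per lapse (relearning block + recall)
}
```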
I've added a stub, but the values still need to be derived from the revlog. Help welcome!
anki/rslib/src/scheduler/fsrs/retention.rs, line 59 (commit 59759b4)
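In case it helps with the stub, here is a sketch of the trickiest derivation, the forget cost, translated from the Python snippet above (plain Rust with made-up types, not the rslib revlog API):

```rust
#[derive(Clone, Copy, PartialEq)]
enum State {
    Learning,
    Review,
    Relearning,
}

/// Mean duration of a contiguous Relearning "block" (one block per lapse),
/// in seconds, plus the average recall cost, mirroring the Python code.
fn forget_cost(states: &[State], durations_ms: &[u32], recall_cost_s: f64) -> f64 {
    let mut blocks = 0u32;
    let mut total_ms = 0u64;
    let mut last: Option<State> = None;
    for (&s, &d) in states.iter().zip(durations_ms.iter()) {
        if s == State::Relearning {
            total_ms += u64::from(d);
            if last != Some(State::Relearning) {
                blocks += 1; // a new contiguous relearning block begins
            }
        }
        last = Some(s);
    }
    if blocks == 0 {
        return 0.0; // no lapses in the revlog
    }
    total_ms as f64 / f64::from(blocks) / 1000.0 + recall_cost_s
}
```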
Also, on naming: we currently call it 'deck size', but the search counts all cards using that preset. We should either change the name or limit the search to the current deck. Which do you think is better?