
Fill optimal retention parameters #2661

Closed
dae opened this issue Sep 17, 2023 · 6 comments
dae commented Sep 17, 2023

I've added a stub, but the values still need to be derived from the revlog. Help welcome!

pub fn get_optimal_retention_parameters(

Also, on naming: we currently call it 'deck size', but the search counts all cards that use the preset. We should either change the name, or limit the search to the current deck. Which do you think is better?

L-M-Sherlock (Contributor) commented:

> deck size

I don't think it should be derived from the collection directly, because the deck size will increase over time, and the simulator for finding the optimal retention is used to simulate the future.

> Help welcome!

Let me collect some code from the Python version of the optimizer:

# `New`, `Learning`, `Review`, and `Relearning` are the revlog state constants.
new_card_revlog = df[(df["review_state"] == New)]
self.first_rating_prob = np.zeros(4)
self.first_rating_prob[
    new_card_revlog["review_rating"].value_counts().index - 1
] = (
    new_card_revlog["review_rating"].value_counts()
    / new_card_revlog["review_rating"].count()
)
recall_card_revlog = df[
    (df["review_state"] == Review) & (df["review_rating"] != 1)
]
self.review_rating_prob = np.zeros(3)
self.review_rating_prob[
    recall_card_revlog["review_rating"].value_counts().index - 2
] = (
    recall_card_revlog["review_rating"].value_counts()
    / recall_card_revlog["review_rating"].count()
)
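As a sanity check, here is how the first-rating calculation above behaves on a toy revlog. The state constants and column names are assumptions made to match the optimizer's conventions; they are not defined in the snippet itself:

```python
import numpy as np
import pandas as pd

# Assumed revlog state constants, matching the optimizer's convention.
New, Learning, Review, Relearning = 0, 1, 2, 3

# Toy revlog: three first reviews of new cards rated 3, 3, 4,
# plus two ordinary reviews that the filter below ignores.
df = pd.DataFrame({
    "review_state": [New, New, New, Review, Review],
    "review_rating": [3, 3, 4, 1, 3],
})

new_card_revlog = df[df["review_state"] == New]
first_rating_prob = np.zeros(4)
counts = new_card_revlog["review_rating"].value_counts()
# Slot each observed rating's share into position rating - 1.
first_rating_prob[counts.index - 1] = counts / counts.sum()

print(first_rating_prob)  # ~[0, 0, 0.667, 0.333]
```

Ratings that never occur keep a probability of zero, so the resulting vector always has four entries regardless of which ratings appear in the data.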


df["review_state"] = df["review_state"].map(
    lambda x: x if x != New else Learning
)


self.recall_costs = np.zeros(3)
recall_costs = recall_card_revlog.groupby(by="review_rating")[
    "review_duration"
].mean()
self.recall_costs[recall_costs.index - 2] = recall_costs / 1000
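A minimal sketch of the per-rating recall cost computation on toy data (the rating-1 rows are assumed to have been filtered out upstream, as in the snippet above; durations are milliseconds):

```python
import numpy as np
import pandas as pd

# Toy successful-review rows: ratings 2 (Hard), 3 (Good), 4 (Easy).
recall_card_revlog = pd.DataFrame({
    "review_rating": [2, 3, 3, 4],
    "review_duration": [12000, 8000, 6000, 4000],  # milliseconds
})

recall_costs = np.zeros(3)
# Mean duration per rating, indexed into positions 0..2 via rating - 2.
mean_costs = recall_card_revlog.groupby(by="review_rating")["review_duration"].mean()
recall_costs[mean_costs.index - 2] = mean_costs / 1000

print(recall_costs)  # [12.  7.  4.] seconds for Hard/Good/Easy
```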


self.state_sequence = np.array(df["review_state"])
self.duration_sequence = np.array(df["review_duration"])
self.learn_cost = round(
    df[df["review_state"] == Learning]["review_duration"].sum()
    / len(df["card_id"].unique())
    / 1000,
    1,
)
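A small worked example of the learn_cost formula above, with toy numbers (milliseconds in, seconds out):

```python
import pandas as pd

Learning = 1  # assumed state constant, as in the optimizer

# Two cards: 8 s + 4 s of learning on card 1, 6 s on card 2.
df = pd.DataFrame({
    "card_id": [1, 1, 2],
    "review_state": [Learning, Learning, Learning],
    "review_duration": [8000, 4000, 6000],  # milliseconds
})

# Total learning time divided by distinct cards, converted to seconds.
learn_cost = round(
    df[df["review_state"] == Learning]["review_duration"].sum()
    / df["card_id"].nunique()
    / 1000,
    1,
)
print(learn_cost)  # 9.0 seconds per new card
```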

# state_block / state_count / state_duration track contiguous runs of each
# state, how often it occurs, and the total time spent in it.
state_block, state_count, state_duration = {}, {}, {}
last_state = self.state_sequence[0]
state_block[last_state] = 1
state_count[last_state] = 1
state_duration[last_state] = self.duration_sequence[0]
for i, state in enumerate(self.state_sequence[1:]):
    state_count[state] = state_count.setdefault(state, 0) + 1
    state_duration[state] = (
        # i + 1: enumeration starts at the second state
        state_duration.setdefault(state, 0) + self.duration_sequence[i + 1]
    )
    if state != last_state:
        state_block[state] = state_block.setdefault(state, 0) + 1
    last_state = state

# recall_cost (not shown here) is the average duration of a successful
# review in seconds; forget_cost adds the relearning time per lapse.
if Relearning in state_count and Relearning in state_block:
    forget_cost = round(
        state_duration[Relearning] / state_block[Relearning] / 1000
        + recall_cost,
        1,
    )
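To make the block-counting loop concrete, here is a self-contained run on a toy state sequence. The dict initialization and the `recall_cost` value are assumptions (the snippet above does not show where they come from), and the duration index is aligned with the enumerated states:

```python
import numpy as np

# Assumed state constants, matching the optimizer's convention.
Learning, Review, Relearning = 1, 2, 3

# Toy history: learn, successful review, two relearning steps, review.
state_sequence = np.array([Learning, Review, Relearning, Relearning, Review])
duration_sequence = np.array([8000, 5000, 3000, 2000, 4000])  # milliseconds

state_block, state_count, state_duration = {}, {}, {}
last_state = state_sequence[0]
state_block[last_state] = 1
state_count[last_state] = 1
state_duration[last_state] = duration_sequence[0]
for i, state in enumerate(state_sequence[1:]):
    state_count[state] = state_count.setdefault(state, 0) + 1
    # i + 1 keeps durations aligned with the states being enumerated.
    state_duration[state] = (
        state_duration.setdefault(state, 0) + duration_sequence[i + 1]
    )
    if state != last_state:
        # a new contiguous run (block) of this state begins
        state_block[state] = state_block.setdefault(state, 0) + 1
    last_state = state

# recall_cost is a stand-in value: mean successful-review time in seconds.
recall_cost = 7.0
forget_cost = round(
    state_duration[Relearning] / state_block[Relearning] / 1000 + recall_cost,
    1,
)
print(state_block[Relearning], forget_cost)  # 1 relearning block, 12.0 s per lapse
```

The two consecutive Relearning steps count as one block, so forget_cost charges the whole relearning run (plus one successful review) to a single lapse.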


dae commented Sep 17, 2023

To be honest I'm not very good at following untyped numpy code 😅 Would this be something @asukaminato0721 would be interested in?

Re deck size, do you suggest just leaving it at the default 10000 then? I have no preference.

L-M-Sherlock (Contributor) commented:

OK. I can write some Rust code to implement it in https://github.com/open-spaced-repetition/fsrs-rs

For deck size, it's OK to just leave it at 10000.

asukaminato0721 commented:

> To be honest I'm not very good at following untyped numpy code 😅 Would this be something @asukaminato0721 would be interested in?

ok


dae commented Sep 18, 2023

So I take it deckSize, daysToSimulate, and seconds per day should be set by the user. Do we need to make the other values customizable, or should we hide them from the user and calculate them automatically as part of the optimizer step, i.e. remove the separate Get Params button?

L-M-Sherlock (Contributor) commented:

That's fine as long as there is enough data to generate stable stats. For new users, I recommend hiding the entire optimal retention module.

dae closed this as completed in 6074865 on Sep 18, 2023.