update base code #1

Merged
merged 15 commits into code_comments
Apr 20, 2024

avalanchesiqi
Owner

No description provided.

Brad Miller and others added 15 commits March 22, 2024 15:05
…guide

Topic model rating filter guide update
Topic seed token prefix adjustments and model updates
Guide updates for new thresholds at which writing ability is locked and unlocked.
This commit splits the monolithic scoring binary into two separate scoring binaries (that may still be run sequentially):
1. Prescoring: do expensive pre-computation to learn user and note parameters
2. [Final] Scoring: ingest prescoring outputs in order to save computation time, then run scoring like it is today.

This commit does not change the final scoring result. Going forward, the split makes it possible to simplify the final scorer considerably.
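
To illustrate the two-stage split described above, here is a minimal sketch, not the actual scorer. All names (`prescore`, `final_score`, `run_sequentially`, the JSON file, the status labels, and the 0.5 threshold) are hypothetical, and plain averages stand in for the expensive parameter learning.

```python
import json
import os
import tempfile
from collections import defaultdict


def prescore(ratings, out_path):
    """Stage 1 (hypothetical): learn per-user and per-note parameters, then save them.

    Plain averages stand in for the expensive model fitting.
    """
    by_user, by_note = defaultdict(list), defaultdict(list)
    for user_id, note_id, rating in ratings:
        by_user[user_id].append(rating)
        by_note[note_id].append(rating)
    params = {
        "user": {u: sum(r) / len(r) for u, r in by_user.items()},
        "note": {n: sum(r) / len(r) for n, r in by_note.items()},
    }
    with open(out_path, "w") as f:
        json.dump(params, f)


def final_score(params_path, threshold=0.5):
    """Stage 2 (hypothetical): reuse the saved parameters instead of recomputing them."""
    with open(params_path) as f:
        params = json.load(f)
    return {
        note_id: "above_threshold" if value >= threshold else "below_threshold"
        for note_id, value in params["note"].items()
    }


def run_sequentially(ratings):
    """Run both stages back to back, which should match a single monolithic run."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prescoring_output.json")
        prescore(ratings, path)
        return final_score(path)
```

Because the only coupling between the stages is the saved file, the second stage can be rerun or simplified without repeating the first.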
…scorer

Split scoring binary into separate prescoring and final scoring binaries
…udoraters

Final scoring and prescoring each run about 10 minutes faster when run in parallel on one large CPU machine, because large dataframes are now shared in memory across processes instead of being re-read by each one.

The core scorer itself also runs about 10 minutes faster, due to the removal of unused pseudorater computations (uncertainty estimation).
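
As a rough sketch of the shared-memory idea only: the example below uses Python's standard `multiprocessing.shared_memory` module with a flat block of floats as a stand-in for a dataframe. The helper names (`create_shared`, `sum_slice`, `parallel_sum`) are hypothetical and do not come from this PR. The actual change may share data differently.

```python
import struct
from multiprocessing import Pool, shared_memory

DOUBLE = struct.Struct("d")  # one 8-byte float per slot


def create_shared(values):
    """Copy values into a new shared block once; the caller must close and unlink it."""
    shm = shared_memory.SharedMemory(create=True, size=max(1, len(values) * DOUBLE.size))
    for i, value in enumerate(values):
        DOUBLE.pack_into(shm.buf, i * DOUBLE.size, value)
    return shm


def sum_slice(task):
    """Worker: attach to the existing block by name and sum one slice of it."""
    shm_name, start, stop = task
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        total = 0.0
        for i in range(start, stop):
            (value,) = DOUBLE.unpack_from(shm.buf, i * DOUBLE.size)
            total += value
        return total
    finally:
        shm.close()  # detach only; the creating process owns the block


def parallel_sum(values, n_workers=2):
    """Share the data once, then let each worker read its own slice in place."""
    shm = create_shared(values)
    try:
        step = max(1, -(-len(values) // n_workers))  # ceiling division
        tasks = [
            (shm.name, start, min(start + step, len(values)))
            for start in range(0, len(values), step)
        ]
        with Pool(n_workers) as pool:
            return sum(pool.map(sum_slice, tasks))
    finally:
        shm.close()
        shm.unlink()
```

Only the block's name and slice bounds are sent to each worker, so the data itself is neither re-read nor pickled per process. On platforms that start workers with `spawn`, call `parallel_sum` from under an `if __name__ == "__main__":` guard.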
…_memory_optimization

Optimize scorer: used shared memory across processes, and streamline uncertainty estimation
@avalanchesiqi avalanchesiqi merged commit 1e86c32 into code_comments Apr 20, 2024
4 participants