Optimise skip-index performance. #71

Open
hamishmorgan opened this issue May 11, 2012 · 0 comments

Space usage could be reduced further under skip indexing by optimising the order in which features and entries are indexed. This would require two additional re-indexing stages:

  • Firstly, after counting. Feature and entry indices should be assigned in descending order of frequency. Currently they are only loosely ordered by frequency, because high-frequency words are very likely to be encountered early in the input, but the ordering could be made exact.
  • Secondly, after filtering. All filtered features should be moved to the highest indices, so that they never cause gaps in the ordered indices.
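A minimal Java sketch of the two re-indexing stages, under stated assumptions. It assumes skip indexing stores each index in a record as the gap from the previous index, written as tab-separated decimal text. The class and method names (`SkipIndexReorder`, `buildRemapping`, `remap`, `skipEncode`, `textLength`) are hypothetical and are not the project's actual code. Kept features are re-indexed in descending order of frequency, filtered features are pushed to the top of the index range, and the toy frequencies in `main` are made up purely for illustration.

```java
import java.util.Arrays;
import java.util.Set;

/**
 * Hypothetical sketch, not the project's real implementation. Assumes skip indexing
 * stores each index in a record as the gap from the previous index (delta encoding).
 */
public final class SkipIndexReorder {

    /**
     * Returns newIndex, where newIndex[oldIndex] is the re-assigned index. Kept features
     * come first in descending order of frequency (ties keep their old relative order);
     * filtered features are moved to the highest indices so they leave no gaps below.
     */
    static int[] buildRemapping(long[] frequency, Set<Integer> filtered) {
        Integer[] order = new Integer[frequency.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> {
            boolean fa = filtered.contains(a);
            boolean fb = filtered.contains(b);
            if (fa != fb) {
                return fa ? 1 : -1; // filtered features sort last
            }
            int byFrequency = Long.compare(frequency[b], frequency[a]); // descending frequency
            return byFrequency != 0 ? byFrequency : Integer.compare(a, b);
        });
        int[] newIndex = new int[frequency.length];
        for (int rank = 0; rank < order.length; rank++) {
            newIndex[order[rank]] = rank;
        }
        return newIndex;
    }

    /** Translates a record's feature indices through the remapping. */
    static int[] remap(int[] indices, int[] newIndex) {
        int[] result = new int[indices.length];
        for (int k = 0; k < indices.length; k++) {
            result[k] = newIndex[indices[k]];
        }
        return result;
    }

    /** Skip-encodes a record: sorts the indices and stores each one as the gap from its predecessor. */
    static int[] skipEncode(int[] indices) {
        int[] sorted = indices.clone();
        Arrays.sort(sorted);
        int[] gaps = new int[sorted.length];
        int previous = 0;
        for (int k = 0; k < sorted.length; k++) {
            gaps[k] = sorted[k] - previous;
            previous = sorted[k];
        }
        return gaps;
    }

    /** Characters needed to write the values as tab-separated decimal text. */
    static int textLength(int[] values) {
        int length = Math.max(0, values.length - 1); // one tab between adjacent values
        for (int v : values) {
            length += Integer.toString(v).length();
        }
        return length;
    }

    public static void main(String[] args) {
        long[] frequency = new long[1000];
        Arrays.fill(frequency, 1);
        frequency[997] = 900;
        frequency[998] = 800;
        frequency[999] = 700;
        frequency[500] = 600;
        int[] newIndex = buildRemapping(frequency, Set.of(42)); // feature 42 is filtered out

        int[] record = {500, 997, 998, 999}; // a record made of the four most frequent features
        int before = textLength(skipEncode(record));
        int after = textLength(skipEncode(remap(record, newIndex)));
        System.out.println("before: " + before + " chars, after: " + after + " chars");
    }
}
```

Running `main` prints `before: 11 chars, after: 7 chars`: once re-indexed, the four frequent features occupy indices 0 to 3, so every gap shrinks to a single digit. Sorting a boxed `Integer[]` keeps the sketch short; a re-indexer over a large vocabulary would more likely sort a primitive array to avoid boxing overhead.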

This feature would require substantial reworking, so it is probably not worthwhile unless space usage is considered critical. The space saving is unlikely to exceed 25%.
