Load corpora with mmap #23

Open
andreasvc opened this issue May 5, 2016 · 1 comment

Comments

@andreasvc

Would it be possible to load corpora with mmap? This would make it possible to work with corpora larger than the available RAM, and it is much more efficient when only a small part of a file is going to be used anyway.
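
To illustrate the idea, here is a minimal sketch in Python rather than in the project's own code. The helper `read_slice` and the temporary demo file are hypothetical stand-ins, not part of any existing API. It maps a file read-only and pulls out one small byte range, so the operating system only pages in the parts that are actually touched.

```python
import mmap
import os
import tempfile


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset` without reading the whole file.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # A length of 0 maps the entire file; the OS loads pages lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing a mmap object copies just the requested range into bytes.
            return mm[offset:offset + length]


# Small temporary file standing in for a large corpus (assumption for the demo).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nsecond line\nthird line\n")
    demo_path = tmp.name

chunk = read_slice(demo_path, 11, 11)
os.remove(demo_path)
```

With a real corpus the byte offsets would have to come from some index; how that would look for encoded corpora is left open here.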

@proycon
Owner

proycon commented May 25, 2016

When (encoded) corpora are read to build a pattern model, they are already read line by line and not kept in memory.

Mmap is an interesting suggestion, though. I'd have to dive into it more deeply to see if there are possibilities.
