
Indexing a large number of files increases Albert's memory usage drastically #1

Closed
hotice opened this issue Jan 19, 2015 · 19 comments
Labels
Bug, P3 Medium (Fix, but can wait if there's more important stuff to do.)


@hotice

hotice commented Jan 19, 2015

Albert uses just about 9-10 MiB of RAM on my system which is really great. But after I set it to index some folders containing a large number of files (they contain roughly 280,000 files), its memory usage jumped to 280-300 MiB of RAM.

Tested under Ubuntu 14.10 64bit.

@ManuelSchneid3r
Member

Yes, the reason for Albert's performance is that the files are indexed in the form the search algorithm needs. The word match builds an inverted index, which means there are as many entries as there are words in your file names. The fuzzy index is more complex and, depending on the config, uses about three to five times as much memory.
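
For illustration, a minimal sketch of such a word-match inverted index in Qt-style C++; this shows the general technique, not Albert's actual code, and the helper name buildInvertedIndex is made up for the example:

```cpp
#include <QHash>
#include <QRegularExpression>
#include <QSet>
#include <QString>
#include <QStringList>

// Every word occurring in a file name maps to the set of files containing
// it, so the number of index entries grows with the number of distinct
// words in the indexed file names -- which is where the memory goes.
QHash<QString, QSet<QString>> buildInvertedIndex(const QStringList &paths)
{
    QHash<QString, QSet<QString>> index;
    const QRegularExpression separators("\\W+");
    for (const QString &path : paths) {
        const QString name = path.section('/', -1).toLower();  // file name only
        for (const QString &word : name.split(separators, Qt::SkipEmptyParts))
            index[word].insert(path);  // one entry per (word, file) pair
    }
    return index;
}
```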

The key to further development is a space-efficient index structure for the file index, which is planned. Another idea is to reduce the number of indexed files, e.g. by MIME type; that is planned too, but for the far future.

If you really need all of those 280,000 files to be launchable from Albert, then I would say you are an advanced user and have to accept the amount of memory used (although the first idea would still help reduce memory usage). If not, you can still choose which folders should be indexed (and, especially, which should not).

I will leave this open and come back once one of the ideas is done.

@hotice
Author

hotice commented Jan 19, 2015

Thanks for the answer!

@anibalardid

Hi Manuel. I really love this app. It's cute, fast and functional!

Do you have an ETA for new updates?

Regards!

@simrc

simrc commented May 2, 2015

Hi Manuel, congratulations on the idea; the software is designed very well! I have a bug: after a few days of use, it begins to show results in Chinese, no longer looks in the directory, and gives no results at all. I hope this is helpful, and I look forward to new updates on this great application!

@ManuelSchneid3r
Member

Yes, currently this is the main problem, I know. The plugin system is mostly done. At the moment I am working on the "ports" of the modules to plugins. I spent a lot of time on the files plugin in particular, trying to figure out how to reduce the memory usage. I tried different space-efficient, in-memory data structures like folder maps or radix trees, but all of them are quite cumbersome to handle. Currently I am testing an SQLite database containing the data. I guess this will be the solution for the future, if it does not slow down the lookup too much. This has one essential advantage: no more optimizing for space, and no limit on future ideas that may imply larger memory usage, e.g. metadata, aliases and the like, since the megabytes will not hurt on disk.
The Chinese stuff is somehow related to the serialization. This will not be an issue anymore with the database.
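
For illustration, a rough sketch of what such an SQLite-backed index could look like, using Qt's SQL module; the schema, table and column names here are assumptions for the example, not Albert's actual design:

```cpp
#include <QDebug>
#include <QSqlDatabase>
#include <QSqlQuery>
#include <QVariant>

int main()
{
    // The (word, path) pairs of the inverted index live on disk, so
    // lookups cost I/O instead of resident memory.
    QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
    db.setDatabaseName("albert-index.db");
    if (!db.open())
        return 1;

    QSqlQuery query(db);
    query.exec("CREATE TABLE IF NOT EXISTS file_index ("
               " word TEXT NOT NULL,"
               " path TEXT NOT NULL,"
               " PRIMARY KEY (word, path))");

    query.prepare("INSERT OR IGNORE INTO file_index VALUES (:word, :path)");
    query.bindValue(":word", "report");
    query.bindValue(":path", "/home/user/docs/report.pdf");
    query.exec();

    // Prefix search, served from disk rather than an in-memory map.
    query.prepare("SELECT path FROM file_index WHERE word LIKE :prefix || '%'");
    query.bindValue(":prefix", "rep");
    query.exec();
    while (query.next())
        qDebug() << query.value(0).toString();
    return 0;
}
```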

@anibalardid I guess not, this is still a hobby project. If some people volunteer to contribute I might make it into an organization project. But at the moment my work on Albert and the releases depend heavily on my studies.

Regards

@drgibbon

@ManuelSchneid3r I wonder if using an established search library might make things easier? For example, Recoll (a nice Qt desktop search program) uses the Xapian backend and it's extremely fast and efficient, even with very large numbers of files.

@ManuelSchneid3r
Member

Since the introduction of selective indexing this should not be an issue anymore. Please reopen if I'm wrong.

@ManuelSchneid3r
Member

@drgibbon Yes, Xapian is a nice idea. It will get a dedicated plugin at some point, but there are tons of other things to do.

@baltazarortiz

Still an issue, unfortunately: it uses about 1.5 GB of memory to index my home folder. Arguably worth the fast search speed compared to other launchers I've tried, though something like Recoll's solution would be interesting. I'll have to look into how that works once I learn a bit more about the Albert code.

@ManuelSchneid3r ManuelSchneid3r added the Bug and P3 Medium labels Apr 7, 2016
@ManuelSchneid3r
Member

Recoll's backend is Xapian, which has been discussed. Its scope is a different one, though no less relevant. Xapian is a software system containing lexers, indexers and searchers too, with additional scientific features like stemming and sophisticated ranking algorithms like BM25. There are reasons why a (future) Xapian extension (XE) and the file extension (FE) will never be merged:

  • FE uses util::offlineIndex, which supports a static tokenizer and two searches, prefix and fuzzy, with their index counterparts, an inverted index and a q-gram index (see the sketch after this list). Xapian has its own tokenizer, indices and searches. Which, by the way, means that, as far as I know, Xapian does not support error-tolerant search at the moment.
  • FE supports usage counters; Xapian does not (afaik).
  • However, Xapian has the advantage of document full-text search, which clearly separates its use case from the FE. FE works on the file level for most MIME types; Xapian works on the document content level, which is fine though.
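
For illustration, a minimal sketch of the q-gram technique mentioned in the first bullet; this shows the general approach, not util::offlineIndex itself, and both function names are made up for the example:

```cpp
#include <QHash>
#include <QSet>
#include <QString>
#include <QStringList>

using QGramIndex = QHash<QString, QSet<QString>>;

// Index every q-character substring (q = 3) of each word. This multiplies
// the number of entries, which is why the fuzzy index costs several times
// the memory of the plain inverted index.
QGramIndex buildQGramIndex(const QStringList &words, int q = 3)
{
    QGramIndex index;
    for (const QString &w : words)
        for (int i = 0; i + q <= w.size(); ++i)
            index[w.mid(i, q)].insert(w);  // every q-gram points to its words
    return index;
}

// A misspelled query still shares most of its q-grams with the intended
// word; a real implementation would rank candidates by q-gram overlap.
QSet<QString> fuzzyCandidates(const QGramIndex &index, const QString &query, int q = 3)
{
    QSet<QString> candidates;
    for (int i = 0; i + q <= query.size(); ++i)
        candidates.unite(index.value(query.mid(i, q)));
    return candidates;
}
```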

So there are different use cases for each, and there is definitely a need for a Xapian (or any other text analysis tool) integration. But right now I have plenty of core work to do. I'd appreciate it if you took part in development and integrated it, but please communicate it if you do so.

Back to topic: the memory problem is still the same; the exclusion of irrelevant MIME types is just a temporary solution. The current architecture is completely in memory and (simplified) as follows:

Wherever you see stars there is room for even naive optimizations.

A serious problem is the size of QString (illustrated in the sketch after the following list). There are some ideas that may or may not be trivial to implement:

  • Use radix/prefix trees
  • Use special maps that do not store the key value (hash maps)
  • Completely outsource the indexing, e.g. to a database like SQLite
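
For illustration, a back-of-the-envelope sketch of the QString size problem; the per-allocation overhead constant is an assumption about a typical Qt 5 build:

```cpp
#include <QString>
#include <QStringList>
#include <cstdio>

// QString stores UTF-16 code units on the heap behind a ref-counted header,
// so even an ASCII path costs roughly twice its byte length, plus
// per-allocation overhead, plus the QString handle held by the container.
static long long approxBytes(const QString &s)
{
    const long long header  = 24;              // assumed allocation-header overhead
    const long long payload = 2LL * s.size();  // UTF-16: 2 bytes per code unit
    return sizeof(QString) + header + payload;
}

int main()
{
    const QStringList paths = { "/home/user/docs/report.pdf",
                                "/home/user/music/song.mp3" };
    long long total = 0;
    for (const QString &p : paths)
        total += approxBytes(p);
    // With hundreds of thousands of paths (and one index entry per word),
    // this kind of overhead easily reaches hundreds of MiB.
    std::printf("approx. %lld bytes for %d paths\n", total, (int)paths.size());
    return 0;
}
```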

But all of that is highly involved, both practically and theoretically, and takes a lot of time. Besides, getting such a huge number of files indexed is an advanced requirement; my 20,000 files do not even have a noticeable impact on memory.

@baltazarortiz how many files did you index? Another aspect is that Albert does not need the complete 1.5 GB permanently. It was allocated while indexing and freed afterwards. The kernel may get some of it back if it really needs it. Well, virtual memory management is complicated.

@baltazarortiz

I'm indexing ~800,000 files with fuzzy search on, so like I said before, I'm not expecting any program to be able to go through all of that without some amount of memory usage. It hasn't noticeably impacted system performance, so my system must either not need more memory or be able to grab some of the freed memory like you said. I'm pretty new to software development (and especially open source work), but I've been wanting to learn more about Qt, so this could be something fun to look into in what free time I have right now :)

On a side note, are the index data and other settings supposed to be persisted anywhere? Every time I reboot, Albert loses any settings changes I've made, whether to the appearance or to any of the plugins.

@somas95

somas95 commented Jun 8, 2016

Different plugins for different DE backends would be nice: one for Tracker (GNOME) and one for Nepomuk (KDE).
Thanks for this wonderful program!

@ManuelSchneid3r ManuelSchneid3r modified the milestone: v0.8.12 Sep 30, 2016
@ManuelSchneid3r ManuelSchneid3r self-assigned this Oct 5, 2016
@idkCpp idkCpp mentioned this issue Oct 18, 2016
@ManuelSchneid3r ManuelSchneid3r modified the milestones: v0.8.12, v0.9, 0.9.1 Jan 4, 2017
@ayoisaiah

Is it possible to add a feature to ignore certain folders in the index, no matter which directory they appear in? For example, I have several node_modules/ folders scattered around my home directory, usually containing thousands of files. I would like to ignore them globally.

@ManuelSchneid3r
Member

Will come.

@PaulBGD

PaulBGD commented Mar 12, 2017

We definitely need a way to ignore node_modules; those folders are huge!

@ManuelSchneid3r
Member

ManuelSchneid3r commented Mar 12, 2017

Can I let Albert ignore certain files/folders? Just for reference.

@PaulBGD

PaulBGD commented Mar 12, 2017

@ManuelSchneid3r Oh wow, thanks! I figured this issue would be updated.

@ManuelSchneid3r
Member

Does v0.11 help reduce the size? It does not touch the way things are stored; it just offers the user the opportunity to reduce the indexed files to those really needed. Unfortunately, everything else would be a tradeoff between space and speed.

@ManuelSchneid3r ManuelSchneid3r modified the milestones: v0.11, v0.12 Apr 15, 2017
@ManuelSchneid3r
Member

ManuelSchneid3r commented May 14, 2017

I am closing this issue. There will be no way to asymptotically reduce the space complexity, only constant factors. Maybe I am wrong; maybe a Patricia trie or some other data structure would help, but those do not come for free either. I have even stopped investigating how to improve the space requirements (it takes too much time). Therefore the best way to reduce space is to set proper filters. There are MIME filters. A proper implementation of (global) name filters will come soon. For now there are .albertignore files.
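
For illustration, a hypothetical .albertignore placed in an indexed directory; a gitignore-like, one-wildcard-pattern-per-line syntax is assumed here, so check the documentation for the exact matching rules:

```
node_modules
.cache
*.o
```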

I cannot imagine a realistic use case in which anybody needs several tens or hundreds of thousands of files at hand.

Development

No branches or pull requests

9 participants