From 1a00d2a51ace9f7977cf5297f5e9e770b7d829d2 Mon Sep 17 00:00:00 2001
From: Robert van Engelen
Date: Sat, 9 Dec 2023 22:12:56 -0500
Subject: [PATCH] Update README.md

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 1612ec7..730e5f4 100644
--- a/README.md
+++ b/README.md
@@ -251,7 +251,7 @@ Future enhancements
 Q&A
 ---
 
-### How does it work?
+### Q: How does it work?
 
 Indexing adds a hidden index file `._UG#_Store` to each directory indexed.
 Files indexed are scanned (never changed!) by ugrep-indexer to generate index
@@ -389,7 +389,7 @@ in an indexed file, whereas a standard Bloom filter might have a false positive
 match. Furthermore, the bit addressing used to index the hashes table enables
 efficient table compression.
 
-### What is indexing accuracy?
+### Q: What is indexing accuracy?
 
 Indexing is a form of lossy compression. The higher the indexing accuracy, the
 faster ugrep search performance should be by skipping more files that do not
@@ -407,12 +407,12 @@ many files are not skipped from searching due to indexing noise (i.e. false
 positives), then a higher accuracy helps to increase the effectiveness of
 indexing, which may speed up searching.
 
-### What about UTF-16 and UTF-32 files?
+### Q: What about UTF-16 and UTF-32 files?
 
 UTF-16 and UTF-32 files are indexed too. The indexer treats them as UTF-8
 after internally converting them.
 
-### Why bother indexing archives and compressed files?
+### Q: Why bother indexing archives and compressed files?
 
 Archiving (zip/tar/pax/cpio) and compressing files saves disk space. On the
 other hand, searching archives and compressed files is slower than searching
@@ -424,7 +424,7 @@ the addition of index file entries for archives and compressed files.
 Note that when archives and compressed files contain binaries, option `-I`
 ignores these archived/compressed binaries.
 
-### Why is the start-up time of ugrep higher with option --index?
+### Q: Why is the start-up time of ugrep higher with option --index?
 
 The start-up overhead of `ugrep --index` to construct indexing hash tables
 depends on the regex patterns. If a regex pattern is very "permissive", i.e.
@@ -435,7 +435,7 @@ Unicode character classes and wildcards are used, especially with the unlimited
 `ugrep --index -r PATTERN /dev/null --stats=vm` to search /dev/null with your
 PATTERN.
 
-### Why are index files not compressed?
+### Q: Why are index files not compressed?
 
 Index files should be very dense in information content and that is the case
 with this new indexing algorithm for ugrep that I designed and implemented.