Update README.md
genivia-inc committed Dec 10, 2023
1 parent 8c40064 commit 1a00d2a
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -251,7 +251,7 @@ Future enhancements
Q&A
---

-### How does it work?
+### Q: How does it work?

Indexing adds a hidden index file `._UG#_Store` to each directory indexed.
Files indexed are scanned (never changed!) by ugrep-indexer to generate index
@@ -389,7 +389,7 @@ in an indexed file, whereas a standard Bloom filter might have a false positive
match. Furthermore, the bit addressing used to index the hashes table enables
efficient table compression.
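
For comparison, the following is a minimal sketch of a *standard* Bloom filter, the structure the paragraph above contrasts against. It is not ugrep-indexer's algorithm; the class name `BloomFilter`, its parameters, and the choice of salted BLAKE2b hashes are assumptions made for illustration. It shows why lookups never miss an added item but can report an item that was never added.

```python
import hashlib


class BloomFilter:
    """A textbook Bloom filter over byte strings (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: bytes) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # True if all derived bits are set. Bits set by other items can
        # make this True for an item that was never added (false positive).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

An added item always tests positive, so a filter like this can only err by reporting a match that is not there, which in an indexing context would mean searching a file that could have been skipped.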

-### What is indexing accuracy?
+### Q: What is indexing accuracy?

Indexing is a form of lossy compression. The higher the indexing accuracy, the
faster ugrep search performance should be by skipping more files that do not
@@ -407,12 +407,12 @@ many files are not skipped from searching due to indexing noise (i.e. false
positives), then a higher accuracy helps to increase the effectiveness of
indexing, which may speed up searching.

-### What about UTF-16 and UTF-32 files?
+### Q: What about UTF-16 and UTF-32 files?

UTF-16 and UTF-32 files are indexed too. The indexer treats them as UTF-8
after internally converting them.
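
As a rough illustration of such a conversion, the sketch below normalizes UTF-16 and UTF-32 input to UTF-8 before further processing. How ugrep-indexer detects the encoding is not stated here; detection by byte order mark (BOM) and the helper name `to_utf8` are assumptions.

```python
import codecs

# UTF-32 BOMs are checked first because the UTF-32 LE BOM begins with
# the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def to_utf8(data: bytes) -> bytes:
    """Return data re-encoded as UTF-8 if it starts with a UTF-16/32 BOM."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # The "utf-16"/"utf-32" codecs consume the BOM and use it to
            # pick the byte order.
            return data.decode(encoding).encode("utf-8")
    # No UTF-16/32 BOM: assume the bytes are already UTF-8 (or binary).
    return data
```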

-### Why bother indexing archives and compressed files?
+### Q: Why bother indexing archives and compressed files?

Archiving (zip/tar/pax/cpio) and compressing files saves disk space. On the
other hand, searching archives and compressed files is slower than searching
@@ -424,7 +424,7 @@ the addition of index file entries for archives and compressed files. Note
that when archives and compressed files contain binaries, option `-I` ignores
these archived/compressed binaries.

-### Why is the start-up time of ugrep higher with option --index?
+### Q: Why is the start-up time of ugrep higher with option --index?

The start-up overhead of `ugrep --index` to construct indexing hash tables
depends on the regex patterns. If a regex pattern is very "permissive", i.e.
@@ -435,7 +435,7 @@ Unicode character classes and wildcards are used, especially with the unlimited
`ugrep --index -r PATTERN /dev/null --stats=vm` to search /dev/null with your
PATTERN.

-### Why are index files not compressed?
+### Q: Why are index files not compressed?

Index files should be very dense in information content and that is the case
with this new indexing algorithm for ugrep that I designed and implemented.