Commit
Add index to speed up incremental deduplication
The deduplication script says that we need that index when using the -i option:

    $ ./manage.py deduplicate --help
    [...]
      -i SINCE, --incremental-scan=SINCE
                attempts deduplication using an incremental table scan
                with 1 query filtering sentences added between now and
                date `d` in `yyyy-mm-dd` format or as a time delta
                `{n}y {n}m {n}d {n}h {n}min {n}s ago`, then a query per
                row to find duplicates. DO NOT USE THIS WITHOUT A
                (text, lang) INDEX.

I tried to deduplicate 325 sentences after running 'RESET QUERY CACHE;'.
Without the index, it took about 8m40s. With the index, about 1s.

Refs #1722.
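
To illustrate why the per-row duplicate lookup depends on a (text, lang) index, here is a minimal sketch. It is not the project's code: it uses an in-memory SQLite database in place of the real database server, and the table name `sentences`, the index name `idx_text_lang`, and the sample data are assumptions made for the example. Only the indexed columns (text, lang) come from the help text above.

```python
import sqlite3


def query_plan(conn, sql, params):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the (text, lang) pair is taken from the source.
conn.execute(
    "CREATE TABLE sentences (id INTEGER PRIMARY KEY, text TEXT, lang TEXT)"
)
# 2000 rows with 500 distinct texts, so each text appears four times.
conn.executemany(
    "INSERT INTO sentences (text, lang) VALUES (?, ?)",
    [("sentence %d" % (i % 500), "eng") for i in range(2000)],
)

# The kind of per-row query an incremental scan would issue.
lookup = "SELECT id FROM sentences WHERE text = ? AND lang = ?"
args = ("sentence 42", "eng")

# Without an index, every lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, args)

# With a composite index, the same lookup becomes an index search.
conn.execute("CREATE INDEX idx_text_lang ON sentences (text, lang)")
plan_after = query_plan(conn, lookup, args)

duplicate_ids = [row[0] for row in conn.execute(lookup + " ORDER BY id", args)]

print("before:", plan_before)
print("after: ", plan_after)
print("rows sharing (text, lang):", duplicate_ids)
```

Because the incremental scan runs one such lookup per recently added sentence, a full table scan per lookup multiplies quickly, which is consistent with the gap between 8m40s and 1s reported above. The exact plan wording differs between SQLite versions and between database engines.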