Skip to content

VADOSWARE/fts-benchmark

Repository files navigation

Postgres Full Text Search ("FTS") benchmark

This is a benchmark of Postgres FTS versus other solutions:

Prerequisites

To run the tests, please ensure you have the following installed on your machine:

Quickstart

To set up testing data and run the full benchmark with all FTS engines:

make # equivalent to `make setup run-all`

To run only a single benchmark (in this case, with Postgres FTS):

FTS_ENGINE=pg make setup run

(FTS_ENGINE = 'pg' | 'meilisearch' | 'typesense' | 'opensearch' | 'sqlite-disk')

To only install dependencies:

make setup

Dataset

The benchmark in this repository uses the a public domain movie dataset:

  • On Kaggle

  • On HuggingFace, in particular the following columns:

  • homepage

  • title

  • original_title

  • overview

  • production_companies

  • spoken_languages

  • tagline

Data is processed from CSV into newline delimited JSON (see movies.ndjson.json.gz).

How the benchmark works

Environment variables

ENV Variable Default Example Description
FTS_ENGINE N/A pg The FTS engine to use
DEBUG N/A true Enable debug mode
TIMING N/A true Enable timing information display
DATA_MOVIES_CSV_ZIPPED_PATH ./movies.csv.gz /path/to/movies.csv.gz Path to the movie data set
DATA_MOVIES_CSV_PATH ./movies.csv /path/to/movies.csv Path to the movie data set, uncompressed
DATA_MOVIES_NDJSON_PATH ./movies.ndjson.json /path/to/movies.ndjson.json Path to the newline delimited JSON data for movies
SEARCH_PHRASES_NDJSON_PATH ./search-phrases.ndjson.json /path/to/search-phrases.ndjson.json Path to search phrases to use as newline delimited JSON

Some variables are used per-run and are normally set by more ergonomic top-level Makefile targets:

ENV Variable Default Example Description
INPUT_CSV_PATH $(DATA_MOVIES_CSV_ZIPPED_PATH) /path/to/movies2.csv.gz Path to compressed CSV (normally unzipped by Makefile target)
OP N/A ingest Operation to perform
SQLITE_DISK_DB_PATH ./fts-sqlite-disk-db.sqlite :memory: SQLite DB path
PG_URL postgres://$(PG_USER):$(PG_PASSWORD)@$(PG_HOST):$(PG_PORT)/$(PG_DB) postgres://localhost Postgres DB path
TYPESENSE_HOST localhost typesense.domain.tld Hostname for Typesense server
TYPESENSE_PORT 8108 8109 Port for Typesense server
TYPESENSE_API_KEY badtypesenseapikey tttttttttttttttt API key for Typesense server
MEILI_HOST localhost meili.domain.tld Hostname for MeiliSearch server
MEILI_PORT 7700 7701 Port for MeiliSearch
MEILI_URL http://$(MEILI_HOST):$(MEILI_PORT) https://meili.domain.tld Full URL to use when accessing Meilisearch
MEILI_API_KEY $(MEILI_MASTER_KEY) xxxxxxxxxxxxxxxxxxx MeiliSearch API key
OPENSEARCH_PROTOCOL http https Protocol to use when accessing OpenSearch service
OPENSEARCH_HOST localhost opensearch.domain.tld Host for OpenSearch server
OPENSEARCH_PORT 9200 9201 Port for OpenSearch server
OPENSEARCH_AUTH_USERNAME admin admin Admin username for OpenSearch server
OPENSEARCH_AUTH_PASSWORD admin hunter2 Admin password for OpenSearch server

See Makefile for the code and other variables that might be excluded here.

Running a single benchmark

A single benchmark can be run with the following command:

FTS_ENGINE=<engine> make setup run

Options for FTS_ENGINE:

  • pg
  • meilisearch
  • typesense
  • sqlite.

To run the ingest & query tests with Postgres:

TIMING=true FTS_ENGINE=pg make run

If an error occurs during set up, consider tearing down the existing FTS_ENGINE:

FTS_ENGINE=pg make engine-stop

Setup/Teardown of a single backing service

To control the setup/teardown of a single backing service, use the engine-start and engine-stop top level targets.

For example, if you wanted to start MeiliSearch and poke around on the instance:

FTS_ENGINE=meilisearch make engine-start

After this command returns, you should have an instance of meilisearch running with a stable name (fts-$(FTS_ENGINE)):

$ docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED         STATUS         PORTS                                            NAMES
4d7c0efdf5cf   getmeili/meilisearch:v0.28.1   "tini -- /bin/meilis…"   7 seconds ago   Up 6 seconds   127.0.0.1:7700->7700/tcp                         fts-meili

To stop the service:

FTS_ENGINE=meilisearch make engine-stop

Ingesting documents

Ingesting data into each separate solution is different, and code to do each can be found under src/driver/<engine>.js. For example, the src/driver/pg.mjs contains the code to enable document ingestion to Postgres.

Performing queries

Queries to be performed in the test are specified via YAML and stored in search-phrases.ndjson.json.

This file is read by the automation and related scripts.

Clearing data

To clear all the data inbetween runs:

sudo make clean # sudo is likely needed to clear docker container data folders

About

FTS benchmark comparing Postgres, Typesense, Meilisearch, OpenSearch and SQLite

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published