This is a small Clojure(script) + PostgreSQL application, which, on its surface, is a user interface that allows for searching and content retrieval of large texts. The purpose and reason-to-be for this application is twofold:
Recently, we have been working on two (2) extensions for PostgreSQL, both of which are focussed on PGSQL's text search functionality, performance and limitations.
The PG_SEMANTIC_HEADLINE extension provides PGSQL with the ability to highlight multi-word phrases and dratically improves search result highlighting of multi-word phrases and boolean expressions. Likewise, the extension offers a pattern for pre-realizing lookup vectors and content retrieval arrays to a DB table, that purports to outperform the built-in PGSQL functions by a time factor of 5x 20x faster.
PostgreSQL's Text Search functionality has a number of limitation (read: hard limits) on the size and scope of text it is able to properly index. The PG_LARGE_TEXT_SEARCH extension captures these limitations, and given a large text, will divide that text into n fragments, where each fragment has been proven to conform to the PGSQL limitations. In indexing a large text file, one would save n fragments in a lookup table, each with a valid and conformant TSVector lookup column.
We wanted to create an opportunity to explore modern tools for Clojurescript, and more specifically novel approaches to React.JS, effects and hooks. The goal is to broaden our understanding and patterning of frontend state management without component lifecycle methods, and take advantage of next-generation tooling in an otherwise stable Clojure ecosystem.
Frontend
UiX- https://github.com/pitch-io/uix - Introduces React Effects and Hooks instead of React Lifecycle methods; doea away withhiccupto greatly improve clinet-side rendering speedcljs-ajax- https://github.com/JulianBirch/cljs-ajax - Preferred tocljs-httpfor its overt declaration ofon-successandon-errorhandlers, and its elimination of clojure.core.async/go.semantic-ui-react- https://react.semantic-ui.com/ - Brilliant semantic classing structure, quite attractive, and has a name befitting of this project.shadow-cljs- https://github.com/thheller/shadow-cljs - compile your ClojureScript code with a focus on simplicity and ease of use. Backend- PostgreSQL - https://www.postgresql.org/ - open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance... except when it comes to the performance of full-text search... :)
- Flyway - https://flywaydb.org/ - Database Schema Management and Automation
- HoneySQL - https://github.com/seancorfield/honeysql - Version 2!
- Next.JDBC - https://github.com/seancorfield/next-jdbc
- Reitit Router - https://github.com/metosin/reitit
- Babashka - https://github.com/babashka/babashka
| fast TS_FAST_HEADLINE | builtin TS_HEADLINE |
|---|---|
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
- PostgreSQL 14+
- Flyway : https://www.red-gate.com/products/flyway/community/download/
- make : O/S-dependant CLI tool
- yarn : https://yarnpkg.com/
- Babashka : https://github.com/babashka/babashka
In order to build to latest version of the PGSQL Extensions we are using, where each is linked as a git submodule under the project root, we will execute the following command:
cd BookSearch
make
The make command will install the following PostgreSQL extensions:
pg_large_text_search- https://github.com/thevermeer/pg_large_text_search - tools for dividing text into fragments that conform to TSVector limitationspg_ts_semantic_headline- https://github.com/thevermeer/pg_ts_semantic_headline - tools for improve ts_headline functionality and performing content retrieval 5x-20x than built-ints_headlinefunction.pg_tap- https://pgtap.org/ - Pure SQL Test automation framework.
Setup requires an install of PostgreSQL 14+, and given a connection, will create its own database via flyway. Providing your PGSQL username below:
make create_db DB_USER=<<USER NAME>>This will create a ROLE for the application to connect to Postgres with, and make that ROLE the OWNER of the booksearch database. With the new booksearch database, we install our extensions (ie. CREATE EXTENSTION), and then run flyway migrate to build our table schema from a sequential series of migrations found in the sql/booksearch folder.
The building of search indices is done via TRIGGER in the table schema; that is, all we need to do to index content, is to add texts to the files table. One could easily do this by inserting into files, like so:
INSERT INTO files
(filename, author, title, content)
('TwoCities.txt', 'Charles Dickens', 'A tale of Two Cities', 'It was the best of times, it was the worst of times...')
That said, we also have filled public_domain_texts with a selection of the 50 most dowloaded texts from Project Gutenberg (https://www.gutenberg.org/browse/scores/top), and they can be imported using:
make load_files DB_USER=booksearch DB_PASS=bookitysearch
Careful! This script can take a few minutes to run. Grab a drink and take a breath.
Create a directory ~/.config/booksearch, and then the file ~/.config/booksearch/db.edn with contents:
{:dbtype "postgres"
:host #profile {:dev "localhost"
:prod "example.us-east-2.rds.amazonaws.com"}
:port #long #or [#env "db_port" 5432]
:dbname #or [#env "db_name" "booksearch"]
:user #or [#env "db_user" "booksearch"]
:password #or [#env "db_password" "bookitysearch"]}By default, the server will load the :dev profile and connect to your localhost database. Edit src/app/server/handlers.clj to change the profile to be loaded to connect to RDS from your local.
For a full AWS deployment, the config is sufficiently different as to be hosted in another repository.
From the BookSearch directory, run the following to install backend dependancies:
clj -M:repl
To start the application REPL in the core server namespace. Once loaded, you will see a app.server=> prompt. In that prompt, type:
(-main)
to open the server to requests. This should produce:
app.server=> (-main)
server running in port 3000
#object[org.eclipse.jetty.server.Server 0x4678320a "Server@4678320a{STARTED}[11.0.18,sto=0]"]
To start the client application for development and hot-reloading of code, use:
yarn # install NPM deps
yarn dev # run dev build in watch mode with CLJS REPLand this should produce something like:
shadow-cljs - HTTP server available at http://localhost:8080
shadow-cljs - server version: 2.25.8 running at http://localhost:9630
shadow-cljs - nREPL server started on port 50868
shadow-cljs - watching build :app
[:app] Configuring build.
[:app] Compiling ...
[:app] Build completed. (845 files, 0 compiled, 0 warnings, 4.40s)
Visit http://localhost:8080 to see the application running.
yarn release # build production bundleclj -M:repl
(-main) ;; to start the server
(reset) ;; to update the server with new changes










