Skip to content

MatthewDarling/BookSearch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

94 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

image

BookSearch

This is a small Clojure(script) + PostgreSQL application, which, on its surface, is a user interface that allows for searching and content retrieval of large texts. The purpose and reason-to-be for this application is twofold:

Purpose 1. Vetting Performance & Functionality of PostgreSQL Full-Text Indexing Techniques

Recently, we have been working on two (2) extensions for PostgreSQL, both of which are focussed on PGSQL's text search functionality, performance and limitations.

1A. pg_ts_semantic_headline

The PG_SEMANTIC_HEADLINE extension provides PGSQL with the ability to highlight multi-word phrases and dratically improves search result highlighting of multi-word phrases and boolean expressions. Likewise, the extension offers a pattern for pre-realizing lookup vectors and content retrieval arrays to a DB table, that purports to outperform the built-in PGSQL functions by a time factor of 5x 20x faster.

1B. pg_large_text_search

PostgreSQL's Text Search functionality has a number of limitation (read: hard limits) on the size and scope of text it is able to properly index. The PG_LARGE_TEXT_SEARCH extension captures these limitations, and given a large text, will divide that text into n fragments, where each fragment has been proven to conform to the PGSQL limitations. In indexing a large text file, one would save n fragments in a lookup table, each with a valid and conformant TSVector lookup column.

Purpose 2. Greenfield Clojure/Clojurescript Development

We wanted to create an opportunity to explore modern tools for Clojurescript, and more specifically novel approaches to React.JS, effects and hooks. The goal is to broaden our understanding and patterning of frontend state management without component lifecycle methods, and take advantage of next-generation tooling in an otherwise stable Clojure ecosystem.

Technologies Used

Frontend

UI in Pictures

image

fast TS_FAST_HEADLINE builtin TS_HEADLINE
image image
image image
image image
image image
image image

Setup

Prerequisites

Step 1: Make PGSQL extensions from submodules

In order to build to latest version of the PGSQL Extensions we are using, where each is linked as a git submodule under the project root, we will execute the following command:

cd BookSearch
make

The make command will install the following PostgreSQL extensions:

Step 2: Create the Database

Setup requires an install of PostgreSQL 14+, and given a connection, will create its own database via flyway. Providing your PGSQL username below:

make create_db DB_USER=<<USER NAME>>

This will create a ROLE for the application to connect to Postgres with, and make that ROLE the OWNER of the booksearch database. With the new booksearch database, we install our extensions (ie. CREATE EXTENSTION), and then run flyway migrate to build our table schema from a sequential series of migrations found in the sql/booksearch folder.

Step 3: Fill the Database with Content

The building of search indices is done via TRIGGER in the table schema; that is, all we need to do to index content, is to add texts to the files table. One could easily do this by inserting into files, like so:

INSERT INTO files
(filename, author, title, content)
('TwoCities.txt', 'Charles Dickens', 'A tale of Two Cities', 'It was the best of times, it was the worst of times...')

That said, we also have filled public_domain_texts with a selection of the 50 most dowloaded texts from Project Gutenberg (https://www.gutenberg.org/browse/scores/top), and they can be imported using:

make load_files DB_USER=booksearch DB_PASS=bookitysearch

Careful! This script can take a few minutes to run. Grab a drink and take a breath.

Step 4: Configure the Server

Create a directory ~/.config/booksearch, and then the file ~/.config/booksearch/db.edn with contents:

{:dbtype "postgres"
 :host #profile {:dev "localhost"
                 :prod "example.us-east-2.rds.amazonaws.com"}
 :port #long #or [#env "db_port" 5432]
 :dbname #or [#env "db_name" "booksearch"]
 :user #or [#env "db_user" "booksearch"]
 :password #or [#env "db_password" "bookitysearch"]}

By default, the server will load the :dev profile and connect to your localhost database. Edit src/app/server/handlers.clj to change the profile to be loaded to connect to RDS from your local.

For a full AWS deployment, the config is sufficiently different as to be hosted in another repository.

Step 5: Start the Server

From the BookSearch directory, run the following to install backend dependancies:

clj -M:repl

To start the application REPL in the core server namespace. Once loaded, you will see a app.server=> prompt. In that prompt, type:

(-main)

to open the server to requests. This should produce:

app.server=> (-main)
server running in port 3000
#object[org.eclipse.jetty.server.Server 0x4678320a "Server@4678320a{STARTED}[11.0.18,sto=0]"]

Step 6: Start the Client Application

To start the client application for development and hot-reloading of code, use:

yarn      # install NPM deps
yarn dev  # run dev build in watch mode with CLJS REPL

and this should produce something like:

shadow-cljs - HTTP server available at http://localhost:8080
shadow-cljs - server version: 2.25.8 running at http://localhost:9630
shadow-cljs - nREPL server started on port 50868
shadow-cljs - watching build :app
[:app] Configuring build.
[:app] Compiling ...
[:app] Build completed. (845 files, 0 compiled, 0 warnings, 4.40s)

Visit http://localhost:8080 to see the application running.

Production

yarn release # build production bundle

Start a Clojure REPL for the server

clj -M:repl

Manage the server

(-main) ;; to start the server
(reset) ;; to update the server with new changes

About

Application for viewing and searching inside books

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors