# Introduction to Full-Text Search (FTS)

In the last notebook, we built an inverted index by hand and saw its limitations. Now we'll explore PostgreSQL's built-in **Full-Text Search (FTS)** engine, which handles these problems for us.

FTS provides the functions and data types needed to parse text documents, handle language-specific rules (like stop words and stemming), and perform complex queries efficiently.

We will cover the three core components of FTS:
1.  **`tsvector`**: A data type for storing optimized, pre-processed documents.
2.  **`tsquery`**: A data type for representing search queries.
3.  **`@@` Operator**: The match operator, which returns `true` when a `tsvector` matches a `tsquery`.

--- 
## Setup

As always, we load the `ipython-sql` extension and connect to our database.

In [1]:
%load_ext sql
%sql postgresql://fahad:secret@localhost:5432/people

--- 
## The `tsvector`: Pre-processing Documents

The `to_tsvector()` function is the heart of FTS. It takes raw text and a language configuration (e.g., `'english'`) and converts it into a `tsvector`.

During this conversion, it automatically:
- **Tokenizes** the text into words.
- **Normalizes** each word into a "lexeme" (e.g., `running` and `runs` both become `run`). This is called **stemming**. Because stemming strips suffixes by rule, irregular forms such as `ran` are left unchanged.
- **Removes stop words** (like 'a', 'is', 'the') based on the language's dictionary.
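The three steps above can be sketched as a toy pipeline in Python. This is an illustration under stated assumptions, not PostgreSQL's implementation: the stop list is a tiny hand-picked set, and `crude_stem` is a hypothetical suffix stripper standing in for the Snowball stemmer that the `'english'` configuration uses. The result maps each lexeme to its word positions and is printed in the same `'lexeme':position` style.

```python
import re

# Assumption: a tiny hand-picked stop list; PostgreSQL's english
# configuration uses a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "is", "of", "the", "to"}

# Assumption: crude suffix stripping for illustration only. This is NOT
# the Snowball stemmer PostgreSQL uses, so some stems will differ.
SUFFIXES = ("ing", "ful", "es", "s", "e")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_tsvector(text):
    """Map each lexeme to its 1-based word positions."""
    vector = {}
    # Tokenize: lowercase, then take runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    for position, word in enumerate(words, start=1):
        # Stop words are skipped but still use up a position.
        if word in STOP_WORDS:
            continue
        vector.setdefault(crude_stem(word), []).append(position)
    return vector


def format_tsvector(vector):
    """Render lexemes in sorted order as 'lexeme':positions."""
    return " ".join(
        f"'{lexeme}':{','.join(str(p) for p in positions)}"
        for lexeme, positions in sorted(vector.items())
    )


sentence = "PostgreSQL is a powerful and amazing relational database for searching"
print(format_tsvector(toy_tsvector(sentence)))
# 'amaz':6 'databas':8 'postgresql':1 'power':4 'relational':7 'search':10
```

Compared with the PostgreSQL output in the next cell, only `relational` differs (PostgreSQL produces `'relat'`). A rule set this crude also fails on other words: it turns `running` into `runn`, which a real stemmer avoids.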

Let's see it in action.

In [2]:
%%sql
SELECT to_tsvector('english', 'PostgreSQL is a powerful and amazing relational database for searching');

 * postgresql://fahad:***@localhost:5432/people
1 rows affected.


| to_tsvector |
|---|
| 'amaz':6 'databas':8 'postgresql':1 'power':4 'relat':7 'search':10 |


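Notice the output. The stop words 'is', 'a', 'and', and 'for' are gone, `searching` has been stemmed to `'search'`, and `amazing` is now `'amaz'`. The lexemes are listed in alphabetical order, and the number after each one is its position in the original text. Removed stop words still count toward those positions, which is why positions 2, 3, 5, and 9 are missing. Positions are used by phrase searches and by ranking, where matches that appear close together can score higher.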
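To make the role of positions concrete, here is a small Python sketch that parses the tsvector text printed above into a dictionary and measures how far apart two lexemes were in the original sentence. `parse_tsvector` and `min_gap` are hypothetical helpers; the parser assumes no weight labels and no quotes inside lexemes, and this is not how PostgreSQL's ranking functions compute their scores.

```python
import re

# The tsvector text PostgreSQL printed in the cell above.
TSVECTOR_TEXT = "'amaz':6 'databas':8 'postgresql':1 'power':4 'relat':7 'search':10"


def parse_tsvector(text):
    """Parse 'lexeme':pos entries into {lexeme: [positions]}.

    Simplified: assumes no weight labels (such as 4A) and no quotes
    inside lexemes.
    """
    return {
        lexeme: [int(p) for p in positions.split(",")]
        for lexeme, positions in re.findall(r"'([^']*)':([\d,]+)", text)
    }


def min_gap(vector, a, b):
    """Smallest distance between any position of lexeme a and of lexeme b."""
    return min(abs(pa - pb) for pa in vector[a] for pb in vector[b])


vector = parse_tsvector(TSVECTOR_TEXT)
# 'power' (4) and 'amaz' (6): the removed word 'and' still occupied position 5.
print(min_gap(vector, "power", "amaz"))  # 2
```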

--- 
## The `tsquery`: Creating a Search Query

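The `to_tsquery()` function converts a search string into a `tsquery`. Like `to_tsvector()`, it stems each word and removes stop words, and it understands the boolean operators `&` (AND), `|` (OR), and `!` (NOT). The input must already use this operator syntax: words separated only by spaces cause a syntax error, so for raw user input PostgreSQL also provides `plainto_tsquery()` and `websearch_to_tsquery()`.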
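The boolean operators can be modeled as a small expression tree evaluated against a document's set of lexemes. This Python sketch assumes the query has already been parsed and stemmed into nested tuples (a hypothetical representation, not PostgreSQL's internal format) and ignores positions entirely.

```python
# A hand-built query tree (assumption: already parsed and stemmed).
# Leaves are lexemes; inner nodes are ("&", ...), ("|", ...), or ("!", x).


def evaluate(query, lexemes):
    """Return True if the document's lexeme set satisfies the query tree."""
    if isinstance(query, str):
        return query in lexemes
    op, *args = query
    if op == "&":
        return all(evaluate(arg, lexemes) for arg in args)
    if op == "|":
        return any(evaluate(arg, lexemes) for arg in args)
    if op == "!":
        return not evaluate(args[0], lexemes)
    raise ValueError(f"unknown operator: {op!r}")


# Lexemes from the tsvector printed earlier, positions dropped.
doc = {"amaz", "databas", "postgresql", "power", "relat", "search"}

print(evaluate(("&", "power", "databas"), doc))          # True
print(evaluate(("&", "power", ("!", "databas")), doc))   # False
print(evaluate(("|", "sql", "search"), doc))             # True
```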

In [3]:
%%sql
-- A search for 'powerful databases'
SELECT to_tsquery('english', 'powerful & databases');

 * postgresql://fahad:***@localhost:5432/people
1 rows affected.


| to_tsquery |
|---|
| 'power' & 'databas' |


Notice that `databases` was stemmed to `'databas'` and `powerful` to `'power'`, the same lexemes that appear in our `tsvector`.

--- 
## Searching with the `@@` Operator

The `@@` operator returns `true` if the `tsquery` matches the `tsvector`. Let's create a table and perform a real search.

In [4]:
%%sql
DROP TABLE IF EXISTS docs04;
CREATE TABLE docs04 (
    id SERIAL PRIMARY KEY,
    doc TEXT
);

INSERT INTO docs04 (doc) VALUES
('PostgreSQL is a powerful open source relational database'),
('We can use SQL to query the database and retrieve data'),
('Full-text searching is a powerful feature of PostgreSQL');

 * postgresql://fahad:***@localhost:5432/people
Done.
Done.
3 rows affected.



In [5]:
%%sql
-- Find documents containing both 'searching' AND 'powerful' (compared by their stems)
SELECT id, doc FROM docs04
WHERE to_tsvector('english', doc) @@ to_tsquery('english', 'searching & powerful');

 * postgresql://fahad:***@localhost:5432/people
1 rows affected.


| id | doc |
|---|---|
| 3 | Full-text searching is a powerful feature of PostgreSQL |


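The query found only document 3, because it is the only one containing both lexemes (`search` and `power`), even though the original words were `'searching'` and `'powerful'`. Document 1 also contains `powerful`, but it has no `search` lexeme, so the `&` operator excludes it.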
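The `WHERE` clause above can be mimicked end to end in plain Python: normalize both the documents and the search terms with the same function, then keep a document only if it contains every query lexeme, which is what `&` requires. This is a toy sketch: the stop list and `crude_stem` are simplified assumptions rather than PostgreSQL's english dictionary, and `search_all` is a hypothetical helper that supports only AND.

```python
import re

# Assumptions: a tiny stop list and crude suffix stripping,
# standing in for PostgreSQL's english configuration.
STOP_WORDS = {"a", "and", "is", "of", "the", "to"}
SUFFIXES = ("ing", "ful", "es", "s", "e")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lexemes(text):
    """Tokenize, drop stop words, and stem: a set-only stand-in for to_tsvector."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {crude_stem(w) for w in words if w not in STOP_WORDS}


def search_all(docs, *terms):
    """Return ids of docs containing every term: a toy 'vector @@ a & b & ...'."""
    wanted = lexemes(" ".join(terms))  # same normalization as the documents
    if not wanted:
        return []  # in this sketch, an empty query matches nothing
    return [doc_id for doc_id, text in docs.items() if wanted <= lexemes(text)]


docs04 = {
    1: "PostgreSQL is a powerful open source relational database",
    2: "We can use SQL to query the database and retrieve data",
    3: "Full-text searching is a powerful feature of PostgreSQL",
}

print(search_all(docs04, "searching", "powerful"))  # [3]
```

Because both sides pass through the same normalization, `search_all(docs04, "databases")` also finds ids 1 and 2, whose text says `database`. Treating an all-stop-word query as matching nothing is a choice made for this sketch.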

--- 
## Conclusion

In this notebook, we introduced the core components of PostgreSQL's Full-Text Search:

- **`to_tsvector()`** preprocesses text by tokenizing, stemming, and removing stop words.
- **`to_tsquery()`** creates a parsed query that understands boolean logic and also uses stemming.
- The **`@@`** operator tests whether a processed document matches a query. In our example query, which has no index, `to_tsvector()` runs on every row at query time.

This built-in engine is far more capable and efficient than our manual inverted index. In the next and final notebook of this module, we will explore advanced FTS features like ranking results and using specialized indexes to make these searches fast.