# Advanced Full-Text Search Techniques

Now that we understand the basics of `tsvector`, `tsquery`, and the `@@` operator, we can explore the advanced features that make PostgreSQL a truly powerful platform for building search applications.

In this notebook, we will cover three critical topics:
1.  **Ranking Results (`ts_rank_cd`)**: How to score documents based on relevance so the best matches appear first.
2.  **Highlighting Matches (`ts_headline`)**: How to show users a snippet of the document with their search terms highlighted.
3.  **Performance Indexing (`GIN`)**: How to use a Generalized Inverted Index (GIN) to make full-text searches lightning fast on large datasets.

--- 
## Setup

As always, we load the `ipython-sql` extension and connect to our database.

In [1]:
%load_ext sql
%sql postgresql://fahad:secret@localhost:5432/people

--- 
## Ranking Search Results with `ts_rank_cd`

Finding matching documents isn't enough; we also need to know which ones are the *most relevant*. The `ts_rank_cd` function computes a cover density ranking: the score rises when the search terms appear more often and closer together in the document.

Let's create a table where one document is clearly a better match than the others.

In [2]:
%%sql
DROP TABLE IF EXISTS docs05;
CREATE TABLE docs05 (
    id SERIAL PRIMARY KEY,
    doc TEXT
);

INSERT INTO docs05 (doc) VALUES
('PostgreSQL is a powerful database. I love PostgreSQL.'),
('SQL is a powerful language for querying a database.'),
('Python is a popular programming language.');

 * postgresql://fahad:***@localhost:5432/people
Done.
Done.
3 rows affected.


[]

Now, let's search for `postgresql & powerful` and rank the results by score.

In [3]:
%%sql
SELECT id, doc, ts_rank_cd(to_tsvector('english', doc), to_tsquery('english', 'postgresql & powerful')) AS score
FROM docs05
WHERE to_tsvector('english', doc) @@ to_tsquery('english', 'postgresql & powerful')
ORDER BY score DESC;

 * postgresql://fahad:***@localhost:5432/people
1 rows affected.


id,doc,score
1,PostgreSQL is a powerful database. I love PostgreSQL.,0.058333334


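Only document 1 is returned, because it is the only row that contains both `postgresql` and `powerful`; the `WHERE` clause filters out the other two rows before any ranking happens. Its score reflects the two occurrences of 'PostgreSQL', each fairly close to 'powerful'. When several rows match, `ORDER BY score DESC` puts the most relevant ones first.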
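
To make the score less of a black box, the sketch below reproduces it by hand. It is a simplified model of cover density ranking, not PostgreSQL's implementation: it assumes a plain AND query, the default weight of 0.1 for every lexeme, no normalization, and the lexeme positions listed in the comment. The function names are hypothetical.

```python
# Simplified model of cover-density scoring for an AND query.
# Assumptions: every lexeme has the default weight 0.1 and no
# normalization is applied; this is an illustration, not PostgreSQL's code.

DEFAULT_WEIGHT = 0.1


def minimal_covers(positions_by_term):
    """Return (start, end, hit_count) for each minimal window holding all terms."""
    hits = sorted(
        (pos, term)
        for term, positions in positions_by_term.items()
        for pos in positions
    )
    terms = set(positions_by_term)
    windows = []
    for i, (start, _) in enumerate(hits):
        seen = set()
        for j in range(i, len(hits)):
            seen.add(hits[j][1])
            if seen == terms:
                windows.append((start, hits[j][0], j - i + 1))
                break
    # Drop any window that fully contains another, shorter window.
    return [
        w for w in windows
        if not any(o != w and w[0] <= o[0] and o[1] <= w[1] for o in windows)
    ]


def cover_density_score(positions_by_term):
    """Sum one contribution per window; tighter windows contribute more."""
    score = 0.0
    for start, end, hit_count in minimal_covers(positions_by_term):
        # "Noise" = words inside the window that are not query-term hits.
        noise = (end - start) - (hit_count - 1)
        score += DEFAULT_WEIGHT / (1 + noise)
    return score


# Assumed positions in document 1 (stop words are dropped but still counted):
# PostgreSQL(1) is(2) a(3) powerful(4) database(5) I(6) love(7) PostgreSQL(8)
doc1 = {"postgresql": [1, 8], "power": [4]}
score = cover_density_score(doc1)
print(round(score, 6))  # 0.058333, in line with the score in the result above
```

Under these assumptions there are two windows, positions 1 to 4 and 4 to 8, contributing 0.1/3 and 0.1/4. Each window counts for more when fewer other words sit between the query terms, which is why proximity matters.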

--- 
## Highlighting Matches with `ts_headline`

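To improve user experience, we can show a snippet of the document with the search terms highlighted. The `ts_headline` function does this for us, working on the original text and wrapping matches in `<b>` tags by default. The `StartSel` and `StopSel` options let you choose different delimiters.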

In [4]:
%%sql
SELECT 
    id, 
    ts_headline('english', doc, to_tsquery('english', 'postgresql & powerful')) AS snippet,
    ts_rank_cd(to_tsvector('english', doc), to_tsquery('english', 'postgresql & powerful')) AS score
FROM docs05
WHERE to_tsvector('english', doc) @@ to_tsquery('english', 'postgresql & powerful')
ORDER BY score DESC;

 * postgresql://fahad:***@localhost:5432/people
1 rows affected.


id,snippet,score
1,<b>PostgreSQL</b> is a <b>powerful</b> database. I love <b>PostgreSQL</b>.,0.058333334
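
As a rough mental model of what `ts_headline` returns, here is a toy highlighter in plain Python. It is only an imitation: the real function compares normalized lexemes (so the query lexeme `power` matches `powerful`) and can also select fragments of long documents, while this sketch matches exact words, case-insensitively. The name `toy_headline` and its `start_sel` / `stop_sel` parameters are hypothetical, chosen to mirror the `StartSel` / `StopSel` options.

```python
import re


def toy_headline(text, query_words, start_sel="<b>", stop_sel="</b>"):
    """Wrap each word of `text` that equals a query word (case-insensitive)."""
    wanted = {w.lower() for w in query_words}

    def wrap(match):
        word = match.group(0)
        return f"{start_sel}{word}{stop_sel}" if word.lower() in wanted else word

    # Visit every word and leave punctuation and spacing untouched.
    return re.sub(r"\w+", wrap, text)


snippet = toy_headline(
    "PostgreSQL is a powerful database. I love PostgreSQL.",
    ["postgresql", "powerful"],
)
print(snippet)
```

For document 1 this yields the same snippet as the query above, because every highlighted word happens to match the query words exactly.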


--- 
## Supercharging Performance with GIN Indexes

Running `to_tsvector` in the `WHERE` clause of every query is inefficient because the database has to parse and normalize every document on every search. On large tables this becomes very slow.

The usual way to handle this is:
1.  Add a dedicated `tsvector` column to your table.
2.  Pre-calculate and store the `tsvector` for each document in this new column, and keep it in sync whenever `doc` changes (for example with a trigger or a generated column).
3.  Create a **GIN (Generalized Inverted Index)** on the `tsvector` column. This is a special index type highly optimized for FTS.
4.  Query against the pre-indexed column.
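
The idea behind an inverted index can be sketched in a few lines of Python. This is a conceptual toy, not how GIN is implemented: `tokenize` is a crude stand-in for `to_tsvector` (lowercasing only, no stemming or stop-word removal), and all function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very rough stand-in for to_tsvector: lowercase words, no stemming."""
    return set(re.findall(r"\w+", text.lower()))


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, terms):
    """Return ids of documents containing every term (an AND query)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "PostgreSQL is a powerful database. I love PostgreSQL.",
    2: "SQL is a powerful language for querying a database.",
    3: "Python is a popular programming language.",
}
index = build_inverted_index(docs)  # built once, like steps 2 and 3
matches = search_all(index, ["postgresql", "powerful"])  # like step 4
print(matches)  # {1}
```

The search touches only the entries for the two query terms and intersects their document sets, instead of re-reading every document. That is the property that makes an inverted index pay off as the table grows.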

In [6]:
%%sql
-- Step 1 & 2: Add a tsvector column and populate it
ALTER TABLE docs05 ADD COLUMN tsv tsvector;
UPDATE docs05 SET tsv = to_tsvector('english', doc);

 * postgresql://fahad:***@localhost:5432/people
Done.
3 rows affected.


[]

In [7]:
%%sql
-- Step 3: Create the GIN index
CREATE INDEX idx_docs05_tsv ON docs05 USING GIN(tsv);

 * postgresql://fahad:***@localhost:5432/people
Done.


[]

Now, let's run our search query again, but this time against the indexed `tsv` column. We use `EXPLAIN ANALYZE` to see which access path PostgreSQL chooses.

In [8]:
%%sql
-- Step 4: Query against the indexed column
EXPLAIN ANALYZE SELECT id, doc FROM docs05
WHERE tsv @@ to_tsquery('english', 'postgresql & powerful');

 * postgresql://fahad:***@localhost:5432/people
5 rows affected.


QUERY PLAN
Seq Scan on docs05 (cost=0.00..1.04 rows=1 width=36) (actual time=0.011..0.013 rows=1 loops=1)
Filter: (tsv @@ '''postgresql'' & ''power'''::tsquery)
Rows Removed by Filter: 2
Planning Time: 0.374 ms
Execution Time: 0.024 ms


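The plan above shows a `Seq Scan`, not an index scan. That is expected: with only three rows, the planner estimates that reading the whole table is cheaper than consulting the index. On a large table, the plan would typically show a **`Bitmap Index Scan on idx_docs05_tsv`** feeding a `Bitmap Heap Scan`, and the query can be orders of magnitude faster than recomputing `to_tsvector` for every row. To see the index being used on this tiny table, you can run `SET enable_seqscan = off;` in the session before repeating the `EXPLAIN ANALYZE`, which should steer the planner toward the GIN index.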

--- 
## Conclusion

This notebook completes our tour of PostgreSQL's natural language features for this module. We've learned how to:

- **Rank** results by relevance using `ts_rank_cd`.
- **Highlight** matches in a snippet with `ts_headline`.
- **Achieve high performance** on large datasets by using a dedicated `tsvector` column with a **GIN index**.

With these tools, you can build sophisticated and fast search functionality directly into your database.