# Natural Language Indexing in PostgreSQL

This notebook demonstrates how to build an inverted index for natural language search in PostgreSQL. Each document is stored as a `tsvector` and queried with a `tsquery`; the `english` text search configuration handles stemming and stop-word removal, and `ts_rank` orders the matches by relevance.

---
## Setup

Load the SQL extension and connect to the database.

In [2]:
%load_ext sql
%sql postgresql://fahad:secret@localhost:5432/people

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


---
## Step 1: Table Creation

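Create a table `docs_nl` with a `doc` column for the raw text and a `tsvector` column, `tsv`, for its indexed form, then insert three sample documents. The `tsv` column stays empty until Step 2.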

In [3]:
%%sql
DROP TABLE IF EXISTS docs_nl;
CREATE TABLE docs_nl (
    id SERIAL PRIMARY KEY,
    doc TEXT,
    tsv tsvector
);

INSERT INTO docs_nl (doc) VALUES
('PostgreSQL can perform full-text search with natural language processing'),
('GIN indexes help accelerate text search queries'),
('We can rank results using ts_rank and ts_rank_cd functions');

 * postgresql://fahad:***@localhost:5432/people
Done.
Done.
3 rows affected.


[]
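As an alternative to filling `tsv` with a separate statement, the column can be declared as a stored generated column so that PostgreSQL recomputes it on every insert or update. The following is a minimal sketch, assuming PostgreSQL 12 or later; the table name `docs_nl_gen` is hypothetical and used only so the sketch does not interfere with `docs_nl`.

```sql
-- Hypothetical table name (docs_nl_gen), used only for this sketch.
-- Generated columns require PostgreSQL 12 or later.
DROP TABLE IF EXISTS docs_nl_gen;
CREATE TABLE docs_nl_gen (
    id  SERIAL PRIMARY KEY,
    doc TEXT,
    -- Recomputed by PostgreSQL whenever doc is inserted or updated.
    tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', doc)) STORED
);

INSERT INTO docs_nl_gen (doc) VALUES
('GIN indexes help accelerate text search queries');

-- tsv is already populated; no separate UPDATE is needed.
SELECT id, tsv FROM docs_nl_gen;
```

The configuration name must be given explicitly (`'english'`) in the generation expression, because the one-argument form of `to_tsvector` depends on a session setting and is not allowed there. The rest of this notebook keeps the plain `tsv` column and populates it by hand.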

---
## Step 2: Populate the `tsvector` Column

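Use `to_tsvector('english', doc)` to normalize each document into lexemes: the text is split into tokens, stop words are dropped, and the remaining words are stemmed. This `UPDATE` runs once, so rows inserted later need the same treatment.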

In [4]:
%%sql
UPDATE docs_nl SET tsv = to_tsvector('english', doc);

 * postgresql://fahad:***@localhost:5432/people
3 rows affected.


[]
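To see what was actually stored, it can help to look at the `tsv` column directly and to trace how individual tokens are processed. The queries below are a small sketch using the built-in `ts_debug` function; the sample sentence is taken from the third document.

```sql
-- Show the stored lexemes and their word positions for each document.
SELECT id, tsv
FROM docs_nl
ORDER BY id;

-- Trace each token through the 'english' configuration.
-- An empty lexemes array ({}) means the token was recognized as a stop word.
SELECT alias, token, lexemes
FROM ts_debug('english', 'We can rank results using ts_rank');
```

Comparing the `token` and `lexemes` columns shows both effects at once: which words are discarded and which are reduced to a stem.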

---
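## Step 3: Creating a GIN Index for Natural Language Search

A GIN index maps each lexeme to the rows whose `tsvector` contains it, so a `@@` search can look up matching rows instead of scanning the whole table.
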
In [5]:
%%sql
CREATE INDEX idx_docs_nl_tsv ON docs_nl USING GIN(tsv);

 * postgresql://fahad:***@localhost:5432/people
Done.


[]
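Whether a query actually uses the index can be checked with `EXPLAIN`. On a three-row table the planner will usually prefer a sequential scan because it is cheaper, so the sketch below temporarily discourages sequential scans for demonstration purposes only; this setting is not something to leave on in normal use.

```sql
-- Plan chosen by default; on a tiny table this is typically a sequential scan.
EXPLAIN
SELECT id, doc
FROM docs_nl
WHERE tsv @@ to_tsquery('english', 'search & text');

-- For demonstration only: discourage sequential scans in this session,
-- so the plan shows how idx_docs_nl_tsv would be used on a larger table.
SET enable_seqscan = off;

EXPLAIN
SELECT id, doc
FROM docs_nl
WHERE tsv @@ to_tsquery('english', 'search & text');

-- Restore the default planner behavior.
RESET enable_seqscan;
```

With sequential scans discouraged, the plan would typically show a bitmap index scan on `idx_docs_nl_tsv`.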

---
## Step 4: Performing a Search

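Match documents against a query with the `@@` operator. In `to_tsquery`, `&` means AND, so the query below returns only documents that contain both `search` and `text` after normalization. Document 1 qualifies because the parser indexes the parts of the hyphenated word `full-text` as well as the compound.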

In [6]:
%%sql
SELECT id, doc
FROM docs_nl
WHERE tsv @@ to_tsquery('english', 'search & text');

 * postgresql://fahad:***@localhost:5432/people
2 rows affected.


id,doc
1,PostgreSQL can perform full-text search with natural language processing
2,GIN indexes help accelerate text search queries


---
## Step 5: Ranking Results

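Use `ts_rank` to score each matching document by how frequently the query lexemes occur in it, then sort by that score so the most relevant documents come first. In this small data set both matches receive the same score.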

In [7]:
%%sql
SELECT id, doc, ts_rank(tsv, to_tsquery('english', 'search & text')) AS rank
FROM docs_nl
WHERE tsv @@ to_tsquery('english', 'search & text')
ORDER BY rank DESC;

 * postgresql://fahad:***@localhost:5432/people
2 rows affected.


id,doc,rank
1,PostgreSQL can perform full-text search with natural language processing,0.09910322
2,GIN indexes help accelerate text search queries,0.09910322


---
## Conclusion

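- A `tsvector` column with a GIN index supports indexed natural language search through the `@@` operator.
- The `english` configuration removes stop words and stems the remaining words, so a query matches different forms of the same word.
- Ranking functions such as `ts_rank` and `ts_rank_cd` order results by relevance, which a search feature needs before it is useful in practice.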