# Building and Loading Text Search in PostgreSQL

## OUTLINE
 1. [PostgreSQL Text storage](#PG_text)
 1. [Task at hand](#task)
 1. [Buiding our Text Document Retrieval DB](#build_it)
 1. [Loading Data](#load_it)
 1. [Executing Queries, Google-lite...very very lite](#search_me) 
 



--- 
<a id='PG_text' ></a>

## PostgreSQL Text Storage

This notebook documents the building of the `BookLines` table using the Information Retrieval (IR) based extension, _full text search_.


<a id='task' /> </a>

## Task at Hand

This lab walks through the process of creating full text search capability within PostgreSQL for integration into other analytical processes of lines for a book (with sub-books).


### Database of Unstructured Text Files 

As was used in the lab, we are going to use this collection of text files.
It is 4.3 megabytes of text and 31 thousand lines, sounds fun!

```BASH
$ ls /dsa/data/all_datasets/book/*
book/1chron.txt    book/acts.txt      book/isaiah.txt    book/nahum.txt
book/1corinth.txt  book/amos.txt      book/james.txt     book/nehemiah.txt
book/1john.txt     book/colossia.txt  book/jeremiah.txt  book/numbers.txt
book/1kings.txt    book/daniel.txt    book/job.txt       book/obadiah.txt
book/1peter.txt    book/deut.txt      book/joel.txt      book/philemon.txt
book/1samuel.txt   book/eccl.txt      book/john.txt      book/philipp.txt
book/1thess.txt    book/ephesian.txt  book/jonah.txt     book/proverbs.txt
book/1timothy.txt  book/esther.txt    book/joshua.txt    book/psalms.txt
book/2chron.txt    book/exodus.txt    book/jude.txt      book/rev.txt
book/2corinth.txt  book/ezekiel.txt   book/judges.txt    book/romans.txt
book/2john.txt     book/ezra.txt      book/lament.txt    book/ruth.txt
book/2kings.txt    book/galatian.txt  book/levit.txt     book/song.txt
book/2peter.txt    book/genesis.txt   book/luke.txt      book/titus.txt
book/2samuel.txt   book/habakkuk.txt  book/malachi.txt   book/zech.txt
book/2thess.txt    book/haggai.txt    book/mark.txt      book/zeph.txt
book/2timothy.txt  book/hebrews.txt   book/matthew.txt
book/3john.txt     book/hosea.txt     book/micah.txt

$ du -skh /dsa/data/all_datasets/book
4.6M	/dsa/data/all_datasets/book
$ wc -l book/*  | tail -n1
  31258 total
```

### However, now we are going to index it line-by-line.

<span style="color:red">
**You will need create and load the database similarly to how you interacted with PostgreSQL in the Database and Analytics course.**
</span>

Remember a few key things:
 1. You will use your pawprint as your user name, and the password you will type in is your normal MU password.
 1. The database is: `dsa_student`
 1. The database host is: `pgsql.dsa.lan`
 1. The schema name is the same as your pawprint.


<a id='build_it' /> </a>

## Building a Text Retrieval Database

#### Examples of all the commands are available [here](../resources/PG_Build_Lines_Search.sql).

You will need to open the terminal, then connect to the database to build your schema tables.

<span style="background-color:yellow">For the commands below, replace the schema name `sebcq5` with your own pawprint.</span>

# in terminal:
#    psql -h pgsql.dsa.lan dsa_student

### Data repository within database.

```SQL
-------------------------
-- Basic Table 
-------------------------
CREATE TABLE dlfy6.BookLines(
        id SERIAL NOT NULL,
        name varchar(250) NOT NULL,
        line_no INT NOT NULL,
        line text NOT NULL
);

ALTER TABLE dlfy6.BookLines
ADD CONSTRAINT pk_BookLines PRIMARY KEY (id);
```

### A column that implements the vector model

```SQL
-------------------------
-- Separate Ts_Vector column
-------------------------
-- TS_Vector for GIN INDEX
ALTER TABLE dlfy6.BookLines
  ADD COLUMN line_tsv_gin tsvector;

UPDATE dlfy6.BookLines
SET line_tsv_gin = to_tsvector('pg_catalog.english', line);
```

### Another column that implements the vector model

```SQL
-- TS_Vector for GIST INDEX
ALTER TABLE dlfy6.BookLines
  ADD COLUMN line_tsv_gist tsvector;

UPDATE dlfy6.BookLines
SET line_tsv_gist = to_tsvector('pg_catalog.english', line);
```

### Step 4: Set up database triggers to parse all new content loaded into the vector models.

```SQL
--TRIGGER
CREATE TRIGGER tsv_gin_update 
	BEFORE INSERT OR UPDATE
	ON dlfy6.BookLines 
	FOR EACH ROW 
	EXECUTE PROCEDURE 
	tsvector_update_trigger(line_tsv_gin,'pg_catalog.english',line);

CREATE TRIGGER tsv_gist_update 
	BEFORE INSERT OR UPDATE
	ON dlfy6.BookLines 
	FOR EACH ROW 
    EXECUTE PROCEDURE
	tsvector_update_trigger(line_tsv_gist,'pg_catalog.english',line);

```

### Step 5:  Add a specialized indexing to the vector models.

```SQL
-------------------------
-- Create Indexes
-------------------------

-- Index on line (Trigram needed,to use Gin Index)
-- CREATE EXTENSION pg_trgm;  -- Done by DB Admin

CREATE INDEX BookLines_line
ON dlfy6.BookLines USING GIN(line gin_trgm_ops);

-- GIN INDEX on line_tsv_gin
CREATE INDEX BookLines_line_tsv_gin
ON dlfy6.BookLines USING GIN(line_tsv_gin);

-- GIST INDEX on line_tsv_gist
CREATE INDEX BookLines_line_tsv_gist
ON dlfy6.BookLines USING GIST(line_tsv_gist);


```

### Complete additional steps to build your IR backend

**<span style='background:yellow'>[See lab](../labs/FullText_PostgreSQL.ipynb)</span>**

---

### Result


Finally, take a look at the resulting table definition:

```SQL
dsa_student=# \dt sebcq5.booklines
          List of relations
 Schema |   Name    | Type  | Owner
--------+-----------+-------+--------
 sebcq5 | booklines | table | sebcq5
(1 row)

dsa_student=# \d sebcq5.booklines
                                       Table "sebcq5.booklines"
    Column     |          Type          | Collation | Nullable |                Default
---------------+------------------------+-----------+----------+---------------------------------------
 id            | integer                |           | not null | nextval('booklines_id_seq'::regclass)
 name          | character varying(250) |           | not null |
 line_no       | integer                |           | not null |
 line          | text                   |           | not null |
 line_tsv_gin  | tsvector               |           |          |
 line_tsv_gist | tsvector               |           |          |
Indexes:
    "pk_booklines" PRIMARY KEY, btree (id)
    "booklines_line" gin (line gin_trgm_ops)
    "booklines_line_tsv_gin" gin (line_tsv_gin)
    "booklines_line_tsv_gist" gist (line_tsv_gist)
Triggers:
    tsv_gin_update BEFORE INSERT OR UPDATE ON booklines FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('line_tsv_gin', 'pg_catalog.english', 'line')
    tsv_gist_update BEFORE INSERT OR UPDATE ON booklines FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('line_tsv_gist', 'pg_catalog.english', 'line')
    
```

<a id='load_it' /> </a>

## Loading Data

To load the data, we will use a python script with follow the basic crawling behavior

 1. For each file/folder in the specified starting folder:
 1. If it is a folder, recurse into folder and process contents
 1. If it is a file, read contents and load into database, one line at a time.

In [1]:
import getpass
# This collects a masked password from the user
mypasswd = getpass.getpass()

········


In [2]:
myuserid = "dlfy6" #change to your pawprint
dbname = "dsa_student"

In [6]:
import os
import psycopg2

try:
    conn = psycopg2.connect("host='pgsql.dsa.lan' port='5432' dbname='{}' user='{}' password='{}'".format(dbname,myuserid,mypasswd))
except:
    print("I am unable to connect to the database")

def loadFile(filename):
    '''
    Read file contents, load into database.
    
    Returns: The document ID that was created
    '''
    line_no = 1
    with conn, conn.cursor() as curs:
        with open(filename, 'r') as infile:
            for line in infile:
                line = line.rstrip('\n')
                ###############################
                # Review the Printout
                ###############################
                #print("Loading: {},{} = {}".format(filename,line_no,line))
                ###############################
                # When you are ready
                # Fill in the SQL variable
                # and Un-comment the curs.execute()
                ###############################
                SQL = "INSERT INTO {}.booklines(name,line_no,line)VALUES (%s,%s,%s) RETURNING id;".format(myuserid) 
                curs.execute(SQL,(filename,line_no,line))
                #row_id = curs.fetchone()[0]
                line_no += 1
    return line_no


#### Use the cell below to test your code edits for above.

##### After testing, when you are ready
 1. comment out the print statements 
 1. Un-comment the cursor execute
 1. Reload the edited cells
 1. Load the cell that defines processFolder
 1. Execute `processFolder()`

In [7]:
def processFolder(folder):
    '''
    Process a folder for files and subfolders
    '''
    
    print('Processing folder: ',folder)
    
    for root, dirs, files in os.walk(folder):
        
        print("root = ", root)
        
        # Process Files
        for file in files:
            if file.endswith(".txt"):
                filename = os.path.join(root, file)
                print('Processing File:',filename)
                document_id = 0
                # Comment out this line to watch the next cell walk the tree
                lines_loaded = loadFile(filename)
                print("Lines Loaded: {}".format(lines_loaded))
                
            elif file.endswith(".html"):
                print("HTML Files Not Handled Yet")
        # Recurse into subfolders
        for d in dirs:
            #print("recursing into ",d)
            processFolder(d)

In [8]:
###########################
# Launch the Parsing
###########################


processFolder('/dsa/data/all_datasets/book');

Processing folder:  /dsa/data/all_datasets/book
root =  /dsa/data/all_datasets/book
Processing File: /dsa/data/all_datasets/book/song.txt
Lines Loaded: 120
Processing File: /dsa/data/all_datasets/book/1chron.txt
Lines Loaded: 945
Processing File: /dsa/data/all_datasets/book/ruth.txt
Lines Loaded: 88
Processing File: /dsa/data/all_datasets/book/1corinth.txt
Lines Loaded: 440
Processing File: /dsa/data/all_datasets/book/titus.txt
Lines Loaded: 49
Processing File: /dsa/data/all_datasets/book/1john.txt
Lines Loaded: 108
Processing File: /dsa/data/all_datasets/book/1kings.txt
Lines Loaded: 819
Processing File: /dsa/data/all_datasets/book/1peter.txt
Lines Loaded: 107
Processing File: /dsa/data/all_datasets/book/1samuel.txt
Lines Loaded: 813
Processing File: /dsa/data/all_datasets/book/1thess.txt
Lines Loaded: 91
Processing File: /dsa/data/all_datasets/book/1timothy.txt
Lines Loaded: 116
Processing File: /dsa/data/all_datasets/book/2chron.txt
Lines Loaded: 825
Processing File: /dsa/data/all_d

##### Example, similar output for the above is available [here](../resources/PG_FTS_Lines_Load.txt).

### Check the Results

```SQL
dsa_student=# select count(*),sum(length(line)) from sebcq5.booklines;
 count |   sum
-------+---------
 31259 | 4315223
(1 row)                                   
```

#### 31K lines

#### Looking at a random line that was added:

```SQL
dsa_student=# \x 
Expanded display is on.
dsa_student=# select * from sebcq5.BookLines where id = 9352;
-[ RECORD 1 ]-+-------------------------------------------------------------------------------------------
id            | 9352
name          | /dsa/data/all_datasets/book/ephesian.txt
line_no       | 135
line          | 6:3: That it may be well with thee, and thou mayest live long on the earth.
line_tsv_gin  | '3':2 '6':1 'earth':17 'live':13 'long':14 'may':5 'mayest':12 'thee':9 'thou':11 'well':7
line_tsv_gist | '3':2 '6':1 'earth':17 'live':13 'long':14 'may':5 'mayest':12 'thee':9 'thou':11 'well':7
```

Notice that we have built a document vector that is stemmed and has removed common (stop) words.



<a id='search_me' /> </a>

## Executing Queries,
### Google-lite...very very lite

Recall, from the video lecture,
the database is now a collection of vectors. 

Now, to query the database we must convert our queries into vectors for matching.

For full documentation, you will want to consult the PostgreSQL documentation.
  * https://www.postgresql.org/docs/current/static/textsearch.html
  * https://www.postgresql.org/docs/current/static/textsearch-controls.html
  * https://www.postgresql.org/docs/current/static/textsearch-features.html

Below we show a few examples, which you can play with and adjust as you see fit.

<span style="color:red">**The following cells are for you to execute.**</span>

#### Basic connection with the DSA Readonly User

To prepare your DB to be read, you will need to grant the `dsa_ro_user` 
schema access and select privileges on your table.

```SQL
GRANT USAGE ON SCHEMA dlfy6 TO dsa_ro_user;
GRANT SELECT ON dlfy6.BookLines TO dsa_ro_user;
```

In [9]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dsa_student

'Connected: dsa_ro_user@dsa_student'

#### A couple of query examples

NOTE:
```
%%sql
```
... allows multi-line SQL statements

NOTE:
Query terms can be joined with boolean operators, 
  * `|` is "or" 
  * `&` is "and"
 

**Activity 1:**
Select id, name, line_no, line, and cover density rank for the following search terms. Refer to the lab and documentation as needed. 
- love or hate, using to_tsquery()

In [10]:
%%sql

SELECT id, name,line_no,line,ts_rank_cd(line_tsv_gin, query) AS rank
FROM dlfy6.booklines, to_tsquery('love | hate') query
WHERE query @@ line_tsv_gin
ORDER BY rank DESC LIMIT 10;


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_student
10 rows affected.


id,name,line_no,line,rank
22075,/dsa/data/all_datasets/book/luke.txt,286,"6:32: For if ye love them which love you, what thank have ye? for sinners also love those that love them.",0.4
5879,/dsa/data/all_datasets/book/2samuel.txt,311,"13:15: Then Amnon hated her exceedingly; so that the hatred wherewith he hated her was greater than the love wherewith he had loved her. And Amnon said unto her, Arise, be gone.",0.4
19153,/dsa/data/all_datasets/book/john.txt,621,"13:34: A new commandment I give unto you, That ye love one another; as I have loved you, that ye also love one another.",0.3
1663,/dsa/data/all_datasets/book/1john.txt,26,"2:15: Love not the world, neither the things that are in the world. If any man love the world, the love of the Father is not in him.",0.3
14447,/dsa/data/all_datasets/book/hosea.txt,36,"3:1: Then said the LORD unto me, Go yet, love a woman beloved of her friend, yet an adulteress, according to the love of the LORD toward the children of Israel, who look to other gods, and love flagons of wine.",0.3
1711,/dsa/data/all_datasets/book/1john.txt,74,"4:10: Herein is love, not that we loved God, but that he loved us, and sent his Son to be the propitiation for our sins.",0.3
1719,/dsa/data/all_datasets/book/1john.txt,82,4:18: There is no fear in love; but perfect love casteth out fear: because fear hath torment. He that feareth is not made perfect in love.,0.3
1717,/dsa/data/all_datasets/book/1john.txt,80,"4:16: And we have known and believed the love that God hath to us. God is love; and he that dwelleth in love dwelleth in God, and God in him.",0.3
3204,/dsa/data/all_datasets/book/1samuel.txt,536,"20:17: And Jonathan caused David to swear again, because he loved him: for he loved him as he loved his own soul.",0.3
8603,/dsa/data/all_datasets/book/deut.txt,574,"21:15: If a man have two wives, one beloved, and another hated, and they have born him children, both the beloved and the hated; and if the firstborn son be hers that was hated:",0.3


**Activity 2:**
Select id, name, line_no, line, and cover density rank for the following search terms. Refer to the lab and documentation as needed. 
- love and hate, using to_tsquery()

In [11]:
%%sql

SELECT id, name,line_no,line,ts_rank_cd(line_tsv_gin, query) AS rank
FROM dlfy6.booklines, to_tsquery('love & hate') query
WHERE query @@ line_tsv_gin
ORDER BY rank DESC LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_student
10 rows affected.


id,name,line_no,line,rank
27006,/dsa/data/all_datasets/book/proverbs.txt,239,8:36: But he that sinneth against me wrongeth his own soul: all they that hate me love death.,0.05
14116,/dsa/data/all_datasets/book/hebrews.txt,11,"1:9: Thou hast loved righteousness, and hated iniquity; therefore God, even thy God, hath anointed thee with the oil of gladness above thy fellows.",0.0333333
4070,/dsa/data/all_datasets/book/2chron.txt,385,"19:2: And Jehu the son of Hanani the seer went out to meet him, and said to king Jehoshaphat, Shouldest thou help the ungodly, and love them that hate the LORD? therefore is wrath upon thee from before the LORD.",0.0333333
29234,/dsa/data/all_datasets/book/psalms.txt,1550,"97:10: Ye that love the LORD, hate evil: he preserveth the souls of his saints; he delivereth them out of the hand of the wicked.",0.0333333
15810,/dsa/data/all_datasets/book/isaiah.txt,1200,"61:8: For I the LORD love judgment, I hate robbery for burnt offering; and I will direct their work in truth, and I will make an everlasting covenant with them.",0.0333333
7500,/dsa/data/all_datasets/book/amos.txt,75,"5:15: Hate the evil, and love the good, and establish judgment in the gate: it may be that the LORD God of hosts will be gracious unto the remnant of Joseph.",0.025
23843,/dsa/data/all_datasets/book/matthew.txt,163,"6:24: No man can serve two masters: for either he will hate the one, and love the other; or else he will hold to the one, and despise the other. Ye cannot serve God and mammon.",0.025
22530,/dsa/data/all_datasets/book/luke.txt,741,"16:13: No servant can serve two masters: for either he will hate the one, and love the other; or else he will hold to the one, and despise the other. Ye cannot serve God and mammon.",0.025
24785,/dsa/data/all_datasets/book/micah.txt,32,"3:2: Who hate the good, and love the evil; who pluck off their skin from off them, and their flesh from off their bones;",0.025
23814,/dsa/data/all_datasets/book/matthew.txt,134,"5:43: Ye have heard that it hath been said, Thou shalt love thy neighbour, and hate thine enemy.",0.025


**Activity 3:**
Select id, name, line_no, line, and cover density rank for the following search terms. Refer to the lab and documentation as needed. 
- love, using to_tsquery()

In [12]:
%%sql

SELECT id, name, line_no, line, ts_rank_cd(line_tsv_gin, query) AS rank
FROM dlfy6.booklines, to_tsquery('love') query
WHERE query @@ line_tsv_gin
ORDER BY rank DESC LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_student
10 rows affected.


id,name,line_no,line,rank
22075,/dsa/data/all_datasets/book/luke.txt,286,"6:32: For if ye love them which love you, what thank have ye? for sinners also love those that love them.",0.4
19153,/dsa/data/all_datasets/book/john.txt,621,"13:34: A new commandment I give unto you, That ye love one another; as I have loved you, that ye also love one another.",0.3
1719,/dsa/data/all_datasets/book/1john.txt,82,4:18: There is no fear in love; but perfect love casteth out fear: because fear hath torment. He that feareth is not made perfect in love.,0.3
3204,/dsa/data/all_datasets/book/1samuel.txt,536,"20:17: And Jonathan caused David to swear again, because he loved him: for he loved him as he loved his own soul.",0.3
14447,/dsa/data/all_datasets/book/hosea.txt,36,"3:1: Then said the LORD unto me, Go yet, love a woman beloved of her friend, yet an adulteress, according to the love of the LORD toward the children of Israel, who look to other gods, and love flagons of wine.",0.3
19197,/dsa/data/all_datasets/book/john.txt,665,"15:9: As the Father hath loved me, so have I loved you: continue ye in my love.",0.3
1711,/dsa/data/all_datasets/book/1john.txt,74,"4:10: Herein is love, not that we loved God, but that he loved us, and sent his Son to be the propitiation for our sins.",0.3
1717,/dsa/data/all_datasets/book/1john.txt,80,"4:16: And we have known and believed the love that God hath to us. God is love; and he that dwelleth in love dwelleth in God, and God in him.",0.3
1663,/dsa/data/all_datasets/book/1john.txt,26,"2:15: Love not the world, neither the things that are in the world. If any man love the world, the love of the Father is not in him.",0.3
22945,/dsa/data/all_datasets/book/malachi.txt,3,"1:2: I have loved you, saith the LORD. Yet ye say, Wherein hast thou loved us? Was not Esau Jacob's brother? saith the LORD: yet I loved Jacob,",0.3


**Activity 4:**
Select id, name, line_no, line, and cover density rank for the following search terms. Refer to the lab and documentation as needed. 
- test file, using plainto_tsquery()

In [13]:
%%sql

SELECT id, name, line_no, line, ts_rank_cd(line_tsv_gin, query) AS rank
FROM dlfy6.booklines, plainto_tsquery('test file') query
WHERE query @@ line_tsv_gin
ORDER BY rank DESC LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_student
1 rows affected.


id,name,line_no,line,rank
31259,/dsa/data/all_datasets/book/one_level_down/two_levels_down/test.txt,1,This is just a test file,0.1


<span style="background-color:yellow">_Question:_ what is the effect of the above search? Click here and type your answer.</span>

This plainto_tsquery search for 'test file' is finding lines with both 'test' and 'file'.


**Activity 5:**
Select id, name, line_no, line, and cover density rank for the following search terms. Refer to the lab and documentation as needed. 
- The equivalent query from Activity 4 using to_tsquery()


In [14]:
%%sql
SELECT id, name, line_no, line, ts_rank_cd(line_tsv_gin, query) AS rank
FROM dlfy6.booklines, to_tsquery('test & file') query
WHERE query @@ line_tsv_gin
ORDER BY rank DESC LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_student
1 rows affected.


id,name,line_no,line,rank
31259,/dsa/data/all_datasets/book/one_level_down/two_levels_down/test.txt,1,This is just a test file,0.1


**Activity 6:**
Select id, name, line_no, line, and cover density rank for the following search terms. Refer to the lab and documentation as needed. 
- love, using plainto_tsquery()

In [15]:
%%sql
SELECT id, name, line_no, line, ts_rank_cd(line_tsv_gin, query) AS rank
FROM dlfy6.booklines, plainto_tsquery('love') query
WHERE query @@ line_tsv_gin
ORDER BY rank DESC LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_student
10 rows affected.


id,name,line_no,line,rank
22075,/dsa/data/all_datasets/book/luke.txt,286,"6:32: For if ye love them which love you, what thank have ye? for sinners also love those that love them.",0.4
19153,/dsa/data/all_datasets/book/john.txt,621,"13:34: A new commandment I give unto you, That ye love one another; as I have loved you, that ye also love one another.",0.3
1719,/dsa/data/all_datasets/book/1john.txt,82,4:18: There is no fear in love; but perfect love casteth out fear: because fear hath torment. He that feareth is not made perfect in love.,0.3
3204,/dsa/data/all_datasets/book/1samuel.txt,536,"20:17: And Jonathan caused David to swear again, because he loved him: for he loved him as he loved his own soul.",0.3
14447,/dsa/data/all_datasets/book/hosea.txt,36,"3:1: Then said the LORD unto me, Go yet, love a woman beloved of her friend, yet an adulteress, according to the love of the LORD toward the children of Israel, who look to other gods, and love flagons of wine.",0.3
19197,/dsa/data/all_datasets/book/john.txt,665,"15:9: As the Father hath loved me, so have I loved you: continue ye in my love.",0.3
1711,/dsa/data/all_datasets/book/1john.txt,74,"4:10: Herein is love, not that we loved God, but that he loved us, and sent his Son to be the propitiation for our sins.",0.3
1717,/dsa/data/all_datasets/book/1john.txt,80,"4:16: And we have known and believed the love that God hath to us. God is love; and he that dwelleth in love dwelleth in God, and God in him.",0.3
1663,/dsa/data/all_datasets/book/1john.txt,26,"2:15: Love not the world, neither the things that are in the world. If any man love the world, the love of the Father is not in him.",0.3
22945,/dsa/data/all_datasets/book/malachi.txt,3,"1:2: I have loved you, saith the LORD. Yet ye say, Wherein hast thou loved us? Was not Esau Jacob's brother? saith the LORD: yet I loved Jacob,",0.3


# Please explore different queries

  1. Explore different queries below.
  2. Observe how the ranking score is changed with different queries and different numbers of search terms.

In [18]:
%%sql
SELECT id, name, line_no, line, ts_rank_cd(line_tsv_gin, query) AS rank
FROM dlfy6.booklines, to_tsquery('hello | world') query
WHERE query @@ line_tsv_gin
ORDER BY rank DESC LIMIT 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_student
10 rows affected.


id,name,line_no,line,rank
19207,/dsa/data/all_datasets/book/john.txt,675,"15:19: If ye were of the world, the world would love his own: but because ye are not of the world, but I have chosen you out of the world, therefore the world hateth you.",0.5
18543,/dsa/data/all_datasets/book/john.txt,11,"1:10: He was in the world, and the world was made by him, and the world knew him not.",0.3
1706,/dsa/data/all_datasets/book/1john.txt,69,"4:5: They are of the world: therefore speak they of the world, and the world heareth them.",0.3
19262,/dsa/data/all_datasets/book/john.txt,730,"17:14: I have given them thy word; and the world hath hated them, because they are not of the world, even as I am not of the world.",0.3
18626,/dsa/data/all_datasets/book/john.txt,94,3:17: For God sent not his Son into the world to condemn the world; but that the world through him might be saved.,0.3
1663,/dsa/data/all_datasets/book/1john.txt,26,"2:15: Love not the world, neither the things that are in the world. If any man love the world, the love of the Father is not in him.",0.3
1189,/dsa/data/all_datasets/book/1corinth.txt,39,"2:6: Howbeit we speak wisdom among them that are perfect: yet not the wisdom of this world, nor of the princes of this world, that come to nought:",0.2
1664,/dsa/data/all_datasets/book/1john.txt,27,"2:16: For all that is in the world, the lust of the flesh, and the lust of the eyes, and the pride of life, is not of the Father, but is of the world.",0.2
1172,/dsa/data/all_datasets/book/1corinth.txt,22,1:20: Where is the wise? where is the scribe? where is the disputer of this world? hath not God made foolish the wisdom of this world?,0.2
1307,/dsa/data/all_datasets/book/1corinth.txt,157,"7:31: And they that use this world, as not abusing it: for the fashion of this world passeth away.",0.2


# Save your notebook, then `File > Close and Halt`