## Full Text Search (FTS)
Full Text Search (FTS) lets you create, manage, and query full-text indexes on JSON documents stored in Couchbase buckets. It uses natural language processing to index and query documents, scores query results by relevance, and provides fast indexes that support a wide range of text searches.

Supported query types include simple queries such as Match and Term queries, range queries such as Date Range and Numeric Range queries, and compound queries such as conjunction, disjunction, and boolean queries.
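
To make these query types more concrete, here is a minimal sketch of how they can be written as JSON query bodies, built as plain Python dictionaries. The key names follow the Search service's JSON query format as I understand it, and the field names (`description`, `name`, `reviews.ratings.Overall`, `reviews.date`) and search values are illustrative assumptions, not something this notebook defines. The rest of the notebook uses the Python SDK's `couchbase.search` classes instead of raw JSON.

```python
import json

# Simple queries: analyzed text match and exact term match.
# Field names and values here are illustrative assumptions.
match_query = {"match": "free breakfast", "field": "description"}
term_query = {"term": "hostel", "field": "name"}

# Range queries: numeric and date ranges on a field.
numeric_range_query = {
    "min": 4,
    "max": 5,
    "inclusive_max": True,
    "field": "reviews.ratings.Overall",
}
date_range_query = {
    "start": "2014-01-01T00:00:00Z",
    "end": "2015-01-01T00:00:00Z",
    "field": "reviews.date",
}

# Compound queries combine other queries.
conjunction_query = {"conjuncts": [match_query, numeric_range_query]}  # all must match
disjunction_query = {"disjuncts": [match_query, term_query], "min": 1}  # at least one must match
boolean_query = {
    "must": {"conjuncts": [match_query]},
    "should": {"disjuncts": [term_query]},
    "must_not": {"disjuncts": [date_range_query]},
}

print(json.dumps({"query": boolean_query}, indent=2))
```

Compound queries simply nest the simpler ones, which is why the conjunction and disjunction bodies reuse the dictionaries defined above them.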

### Configuring the Couchbase Cluster Information for Examples

The configuration is stored in an environment file, `.env` in this folder. 

Note that you might have to check for hidden files to see this file on Unix environments.

This file can be used to update the connection settings.
* DB_HOST: Set to `couchbase://couchbase` by default for connecting to the Couchbase cluster in the docker environment via Docker Compose. If you are running Couchbase locally on your machine via docker or installation, you can change the connection string to `couchbase://localhost`.
* DB_USER: Set to `Administrator` by default. If it is different for your cluster, please update the file.
* DB_PASSWORD: Set to `Password` by default. If it is different for your cluster, please update the file.
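
As a small sketch of how these settings behave, the snippet below reads the three variables and falls back to the defaults listed above when a variable is unset. The names `DEFAULTS` and `read_settings` are hypothetical helpers introduced for illustration; the notebook itself reads the values directly with `os.getenv`.

```python
import os

# Defaults as described in the list above.
DEFAULTS = {
    "DB_HOST": "couchbase://couchbase",
    "DB_USER": "Administrator",
    "DB_PASSWORD": "Password",
}


def read_settings(environ=os.environ):
    """Return the connection settings, using the documented default for unset variables."""
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}


# Example: only DB_HOST is overridden, as for a local installation.
settings = read_settings({"DB_HOST": "couchbase://localhost"})
print(settings["DB_HOST"])  # couchbase://localhost
print(settings["DB_USER"])  # Administrator
```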


In [None]:
# Read the Database information from .env file
from dotenv import load_dotenv
import os

load_dotenv()  # take environment variables from .env file.

In [None]:
DB_HOST = os.getenv("DB_HOST")
DB_USER = os.getenv("DB_USER")
DB_PASSWORD = os.getenv("DB_PASSWORD")
print(f"Environment Settings \n{DB_HOST=} \n{DB_USER=} \n{DB_PASSWORD=}")

### Connecting to Couchbase Cluster
- Connection String: `couchbase://couchbase` would connect to the Couchbase instance.
- PasswordAuthenticator: It specifies the username & password used to access the Cluster.

#### Note
If you are running Couchbase locally on your machine via docker or installation, you can change the connection string to `couchbase://localhost` via the configuration file `.env`

In [None]:
import couchbase.search as search
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions, SearchOptions
from couchbase.exceptions import CouchbaseException

In [None]:
# get a reference to our cluster
cluster = Cluster(
    DB_HOST,
    ClusterOptions(PasswordAuthenticator(DB_USER, DB_PASSWORD)),
)

## Creating FTS Index using the Web Console to search all documents in Travel-Sample 
- Go to `Search` Tab in Cluster and click on `Add Index`

![Create-FTS](./img/Create-FTS.png)

- Create the Index `travel-sample-search` to search across all documents in the default collection as shown in the image
You need to do the following:
    - Add the Index Name as `travel-sample-search`
    - Select the Bucket as `travel-sample`
    - In Advanced, check the box that says `Store Dynamic Fields`. It ensures inclusion of field-content in returned results.

![Travel-Search-Index](./img/Travel-Search-Index.png)

Building the index could take a few minutes. Until the index is completely built, you can get different results based on the current status of the index. You can try running the next cell a few times to see the changing results.
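
Instead of re-running a cell by hand, one option is to poll until the reported row count stops changing. The helper below is a hypothetical sketch (`wait_for_stable_count` is not part of the Couchbase SDK), and a stable count is only a heuristic for "the index has finished building". It takes any function that returns the current count, so it can be tried here with fake values.

```python
import time


def wait_for_stable_count(get_count, interval_seconds=5.0, max_attempts=20):
    """Call get_count() until it returns the same value twice in a row.

    Returns the last observed count, even if it never stabilized.
    """
    previous = None
    for _ in range(max_attempts):
        current = get_count()
        if current == previous:
            return current
        previous = current
        time.sleep(interval_seconds)
    return previous


# Demonstration with made-up counts standing in for a growing index.
fake_counts = iter([10, 42, 95, 95])
print(wait_for_stable_count(lambda: next(fake_counts), interval_seconds=0))  # 95
```

With a real cluster, `get_count` could run a search, iterate over `result.rows()`, and return `result.metadata().metrics().total_rows()`, as the search cells in this notebook do.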

## Querying for Documents using FTS 
Now that we have created a search index, we can use it to find documents based on search strings. You can also try changing the search strings and observe the matched documents.

## Scoring 
Search results are scored at query time. The results of a search request are ordered by score (relevance) in descending order unless you explicitly specify a different sort order.

Couchbase uses a slightly modified version of the standard tf-idf algorithm. The modification normalizes the score.

For more details on scoring, refer to the [documentation](https://docs.couchbase.com/server/current/fts/fts-scoring.html).
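
To give an intuition for tf-idf, here is a minimal sketch of the textbook formula in plain Python. It is not Couchbase's exact scoring function (which, as noted above, is a modified and normalized variant); the function name `tf_idf_scores`, the smoothing term, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(term, documents):
    """Score each document for `term` using a textbook tf-idf formula."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: number of documents containing the term at least once.
    doc_freq = sum(1 for tokens in tokenized if term in tokens)
    if doc_freq == 0:
        return [0.0] * len(documents)
    # Rare terms get a higher weight; the +1 keeps the weight positive.
    idf = math.log(len(documents) / doc_freq) + 1.0
    scores = []
    for tokens in tokenized:
        # Term frequency: share of the document's tokens that equal the term.
        tf = Counter(tokens)[term] / len(tokens) if tokens else 0.0
        scores.append(tf * idf)
    return scores


docs = [
    "ibis hotel near the station",
    "quiet hotel with a garden",
    "ibis budget ibis styles",
]
scores = tf_idf_scores("ibis", docs)

# Print documents from most to least relevant, as a search result would be ordered.
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

The third document ranks first because the term makes up a larger share of its text, and the second document scores zero because it does not contain the term at all.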

In [None]:
# Search for a specified string query across all documents
try:
    result = cluster.search_query(
        "travel-sample-search", search.QueryStringQuery("ibis")
    )

    for row in result.rows():
        print(f"Found row: {row}")

    print(f"Reported total rows: {result.metadata().metrics().total_rows()}")

except CouchbaseException as ex:
    import traceback

    traceback.print_exc()

## Search Returning Specified Fields
To specify which fields of the document you are interested in, set the `fields` attribute in `SearchOptions()`. These fields are returned for each document that matches the search string.

You can access the returned fields through the `fields` attribute of each result row.

In [None]:
# Search for a specified string query across all documents returning the specified fields name and description
try:
    result = cluster.search_query(
        "travel-sample-search",
        search.QueryStringQuery("ibis"),
        SearchOptions(fields=["name", "description"]),
    )

    for row in result.rows():
        print(f"Found row with Score: {row.score}\n{row.fields}")

    print(f"Reported total rows: {result.metadata().metrics().total_rows()}")

except Exception as e:
    print(e)

## Highlight Fields matching Search Results
It is possible to enable highlighting for matched fields. You can either rely on the default highlighting style or provide a specific one. The following snippet uses HTML formatting for two fields, `description` and `name`. If the search matches these fields, the matches are highlighted using `<mark> </mark>` tags. The highlighted fields can be seen in the `fragments` attribute of each result row.

This is useful for debugging or highlighting the search results on the frontend for your app.

In [None]:
try:
    result = cluster.search_query(
        "travel-sample-search",
        search.QueryStringQuery("downtown"),
        SearchOptions(
            highlight_style=search.HighlightStyle.Html,
            highlight_fields=["description", "name"],
        ),
    )
    for row in result:
        print(row.score, row.fragments)
except Exception as e:
    print(e)

## Search for exact Matches
To search for documents with exact matches, you can use `TermQuery()` instead of the `QueryStringQuery()` we have been using so far. It returns only documents that contain the exact search term. Note that the match may occur in a field other than the ones you asked to be returned, so the returned fields might not contain the term.

This is useful when you need to search for exact matches in your application.

In [None]:
# Search for an exact term across all documents, returning name and description
try:
    result = cluster.search_query(
        "travel-sample-search",
        search.TermQuery("marriot"),
        SearchOptions(fields=["name", "description"]),
    )
    for row in result:
        print(row.score, row.fields)
except Exception as e:
    print(e)

## Search Facets
Facets are aggregate information collected on a result set and are useful for categorizing result data. The SDK allows you to provide many different facet configurations to the Search engine. The following example shows how to create a facet based on a term.

Facets are useful in providing filters that indicate the number of documents that match the search. You can have the same term matching across different types of documents. Facets provide an aggregation of the documents that match the search term.
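
Conceptually, a term facet counts the matching documents per distinct value of a field and keeps the most common values. The sketch below illustrates that idea in plain Python on made-up rows; `term_facet` is a hypothetical helper, not the SDK's `TermFacet`, and the rows are not real query output.

```python
from collections import Counter

# Made-up search matches; only the "type" field matters for the facet.
matches = [
    {"name": "North Beach", "type": "landmark"},
    {"name": "North Star Inn", "type": "hotel"},
    {"name": "North Pier", "type": "landmark"},
    {"name": "North Field", "type": "airport"},
]


def term_facet(rows, field, size):
    """Count rows per distinct value of `field`, keeping the `size` most common values."""
    counts = Counter(row[field] for row in rows if field in row)
    return counts.most_common(size)


print(term_facet(matches, "type", 5))
# [('landmark', 2), ('hotel', 1), ('airport', 1)]
```

In the SDK call that follows, `search.TermFacet("type", 5)` plays the same role: it asks for up to five values of `type`, each with its count.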

In [None]:
try:
    result = cluster.search_query(
        "travel-sample-search",
        search.QueryStringQuery("north"),
        SearchOptions(
            facets={"types": search.TermFacet("type", 5)}, fields=["name", "type"]
        ),
    )
    for row in result:
        print(row.fields)

    # This result indicates that there are 391 records of type landmark, 150 of type hotel & 29 of type airport matching the search term
    print(result.facets()["types"].terms)

except Exception as e:
    print(e)

## Creating an FTS Index on Specific Fields in the Database
- Create a Quick Index from Search Menu
![Quick-Index](./img/Quick-Index.png)

- Select the configuration as defined in the image
    - Set the name of the index as `hotel_address` 
    - Set the Keyspace as \`travel-sample\`.inventory.hotel
    - In Select Fields, select `address` as the field to index. 
    - Set the type as text & check "Include in search results" so that the addresses are stored along with the index.
    - Leave the rest of the default settings and click on `Add`.
    - Create the Index by clicking on `Create Index`
    
![Hotels-Address-Index](./img/Hotel-Address-Index.png)

## Difference between Quick Index and Classic Index Editor
If you were following the steps closely, you might have noticed that we created the Index in a different way than before. That is because we used the Quick Index Editor this time around.

The classic editor is an advanced tool in which users directly configure the index mapping. The quick editor allows users to configure the mapping by working with sample documents and higher-level abstractions.

The classic editor is intended for users that are already familiar with the concepts of full-text search, while the quick editor is intended for new users who are still learning about full-text search.

Therefore, if you’re not sure where to start, try the Quick Editor.

Now let us try to run some queries against the hotel addresses.

In [None]:
# Search for "north" in addresses returning the address
result = cluster.search_query(
    "hotel_address", search.TermQuery("north"), SearchOptions(fields=["address"])
)

for row in result.rows():
    print(f"Score: {row.score}, {row.fields}")

In [None]:
# Search for "street" in addresses returning the address
result = cluster.search_query(
    "hotel_address",
    search.QueryStringQuery("street"),
    SearchOptions(fields=["address"]),
)

for row in result.rows():
    print(f"Score: {row.score}, {row.fields}")

In [None]:
# Search for "bush" in addresses returning the address
result = cluster.search_query(
    "hotel_address", search.QueryStringQuery("bush"), SearchOptions(fields=["address"])
)

for row in result.rows():
    print(row.score, row.fields)

## Example Use Case: Create an Index to Query the Hotel Reviews
Steps to create the index
1. On the Full Text Search UI, click “Add Index”.
2. Specify an index name,  “hotel_reviews”, and select the travel-sample bucket. 
3. Since each document in the travel-sample bucket has a “type” field indicating the type of document, leave “JSON type field” set to “type”.
4. Under type mappings:  
- Click “+ Add Type Mapping”, and specify “hotel” as the type name, since the requirement is to search all hotel documents.  
- A list of available analyzers can be accessed by means of the pull-down menu to the right of the type name field.  For this use case, leave “inherit” selected so that the type mapping inherits the default analyzer from the index. You can read more about the types of analyzers supported in Couchbase [here](https://docs.couchbase.com/server/current/fts/fts-analyzers.html).
- Since the requirement is to search the hotel review content fields, check “only index specified fields”.  With this checked, only user-specified fields from the document are included in the index for the hotel type mapping. The mapping will then not be dynamic; a dynamic mapping would consider all fields available for indexing. 
- Click OK.  
- Mouse over the row with the hotel type mapping, click the + button, and then click “insert child mapping”.  Child mappings are used to specify a document-field whose value is a JSON object. This will allow the array of review sub-documents to be included in the index.  Enter the property name “reviews”, leave “inherit” selected in the analyzer drop-down, check “only index specified fields”, and click OK. 
- Mouse over the row with the reviews child mapping, click the + button, and then click “insert child field”.  The option insert child field allows a field to be individually included for (or excluded from) indexing, provided that it contains a single value or an array rather than a JSON object. This will allow the content field from the array of review sub-documents to be included in the index.  Specify the following: 
    - field: Enter the name of the field to be indexed, “content”.
    - type: Leave this set to text for the content field.
    - searchable as: Leave this the same as the field name for the current use case.  It can be used to indicate an alternate field name. 
    - analyzer: As was done for the type mapping, for this use case, leave “inherit” selected so that the field inherits the default analyzer.
    - index checkbox: Leave this checked, so that the field is included in the index.  Unchecking the box would explicitly remove the field from the index.
    - store checkbox: Check this setting to include the field content in the search results, which permits highlighting of matched expressions in the results.  This is useful for testing the index, but it is not recommended in a production environment if highlighting isn’t required, since it increases index size.
    - “include in _all field” checkbox: Check this since the use case requirement is to search multiple fields. 
    - “include term vectors” checkbox: Check this too during development and testing of our index to allow highlighting of results.  
    - docvalues checkbox: Uncheck this setting.  This setting stores the field values in the index which provides support for Search Facets, and for the sorting of search results based on field values, neither of which we need in this use case. 
    - Click OK.
- Finally, uncheck the checkbox next to the “default” type mapping.  If the default mapping is left enabled, all documents in the bucket are included in the index, regardless of whether the user actively specifies type mappings. Only the hotel documents are required, and they are included by the hotel type mapping added previously. 
- Click on "Create Index" to create the index.

![Hotel-Reviews-Index](./img/Hotel-Reviews-Index.png)
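
The choices made in the steps above end up in the index definition as a nested type mapping. The sketch below builds an approximation of that structure as a Python dictionary so the nesting is easier to see. The key names reflect my understanding of the JSON index definition format and should be treated as assumptions; the definition is also deliberately partial (for example, analyzer and source type settings are omitted), so compare it against the definition shown in the Web Console rather than relying on it as-is.

```python
import json

# Field-level settings for reviews.content, mirroring the checkboxes in the steps above.
content_field = {
    "name": "content",             # "searchable as" left equal to the field name
    "type": "text",
    "index": True,                 # index checkbox
    "store": True,                 # store checkbox
    "include_in_all": True,        # "include in _all field" checkbox
    "include_term_vectors": True,  # "include term vectors" checkbox
    "docvalues": False,            # docvalues checkbox unchecked
}

index_definition = {
    "name": "hotel_reviews",
    "sourceName": "travel-sample",
    "params": {
        # The document's "type" field selects the type mapping.
        "doc_config": {"mode": "type_field", "type_field": "type"},
        "mapping": {
            # Default mapping disabled so that only hotel documents are indexed.
            "default_mapping": {"enabled": False, "dynamic": True},
            "types": {
                "hotel": {
                    "enabled": True,
                    "dynamic": False,  # "only index specified fields"
                    "properties": {
                        "reviews": {  # child mapping for the review sub-documents
                            "enabled": True,
                            "dynamic": False,
                            "properties": {
                                "content": {  # child field inside reviews
                                    "enabled": True,
                                    "dynamic": False,
                                    "fields": [content_field],
                                }
                            },
                        }
                    },
                }
            },
        },
    },
}

print(json.dumps(index_definition, indent=2))
```

The nesting `hotel` → `reviews` → `content` is why the queries below ask for the field `reviews.content`.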

In [None]:
# Let us try to search for reviews mentioning breakfast
try:
    result = cluster.search_query(
        "hotel_reviews",
        search.QueryStringQuery("breakfast"),
        SearchOptions(fields=["reviews.content"]),
    )

    for row in result.rows():
        print(f"Score: {row.score}")
        print(f"Document Id: {row.id}")
        print(row.fields)

except Exception as e:
    print(e)

In [None]:
# Let us try to search for reviews mentioning staff
try:
    result = cluster.search_query(
        "hotel_reviews",
        search.QueryStringQuery("staff"),
        SearchOptions(fields=["reviews.content"]),
    )

    for row in result.rows():
        print(f"Score: {row.score}")
        print(f"Document Id: {row.id}")
        print(row.fields)

except Exception as e:
    print(e)

## Exercise 5.1
1. Create an index to search for war museums in the landmarks collection and highlight the matches
2. Use the same index to find bridges
3. Find the Youth Hostels in the travel-sample bucket

## Solutions
### Index Creation

In [None]:
# War Museums


In [None]:
# Bridges


In [None]:
# Youth Hostels


## References
- [Full Text Search](https://docs.couchbase.com/server/current/fts/fts-introduction.html)
- [Creating Full Text Search Indexes](https://docs.couchbase.com/server/current/fts/fts.html)
- [FTS using Python SDK](https://docs.couchbase.com/python-sdk/current/howtos/full-text-searching-with-sdk.html)