# Synonyms API quick start

<a target="_blank" href="https://colab.research.google.com/github/Mikep86/elasticsearch-labs/blob/main/notebooks/search/06-quick-start.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This interactive notebook will introduce you to the [Synonyms API](https://www.elastic.co/blog/update-synonyms-elasticsearch-introducing-synonyms-api) using the official [Elasticsearch Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html). You'll create & update synonym sets, configure an index to use synonyms, and run queries that leverage synonyms for enhanced relevancy.

## Create Elastic Cloud deployment

If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.

Once logged in to your Elastic Cloud account, go to the [Create deployment](https://cloud.elastic.co/deployments/create) page and select **Create deployment**. Leave all settings with their default values.

## Install packages and import modules

To get started, we'll need to connect to our Elastic deployment using the Python client.
Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.

First we need to install the `elasticsearch` Python client.

In [None]:
!pip install -qU elasticsearch

## Initialize the Elasticsearch client

Now we can instantiate the [Elasticsearch Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/index.html), providing the Cloud ID and password for your deployment.

In [None]:
from elasticsearch import Elasticsearch
from getpass import getpass

CLOUD_ID = getpass("Elastic Cloud ID")
CLOUD_PASSWORD = getpass("Elastic Password")

# Create the client instance
client = Elasticsearch(
    cloud_id=CLOUD_ID,
    basic_auth=("elastic", CLOUD_PASSWORD)
)

Elastic Cloud ID··········
Elastic Password··········


If you're running Elasticsearch locally or self-managed, you can pass in the Elasticsearch host instead. [Read more](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#_verifying_https_with_certificate_fingerprints_python_3_10_or_later) on how to connect to Elasticsearch locally.
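As a rough sketch of the self-managed case, the helper below assembles keyword arguments for the `Elasticsearch` constructor. The function name `local_client_kwargs`, the host URL, and the password are placeholders invented for illustration. `hosts`, `basic_auth`, and `ssl_assert_fingerprint` are constructor parameters of the Python client, but check the linked connection guide for the options that match your setup.

```python
def local_client_kwargs(host, password, ca_fingerprint=None, username="elastic"):
    """Build constructor arguments for a self-managed cluster (hypothetical helper)."""
    kwargs = {
        "hosts": [host],
        "basic_auth": (username, password),
    }
    # Only pin the certificate fingerprint when one is supplied.
    if ca_fingerprint:
        kwargs["ssl_assert_fingerprint"] = ca_fingerprint
    return kwargs


# Placeholder values; replace them with your own host and credentials.
example_kwargs = local_client_kwargs("https://localhost:9200", "changeme")
print(sorted(example_kwargs))

# With the client installed, the arguments would be passed like this:
# client = Elasticsearch(**example_kwargs)
```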

Confirm that the client has connected with this test.

In [None]:
print(client.info())

{'name': 'instance-0000000000', 'cluster_name': '8e1234500ddb440b9c31084ad2cf0d2a', 'cluster_uuid': '4cSLpFXlQdewP_artWe7CA', 'version': {'number': '8.10.3', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'c63272efed16b5a1c25f3ce500715b7fddf9a9fb', 'build_date': '2023-10-05T10:15:55.152563867Z', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}


## Configure & populate the index

Our client is set up and connected to our Elastic deployment. Now we need to configure the index that will store our test data and populate it with some documents. We'll use a small index of books with the following fields:

- `title`
- `authors`
- `publish_date`
- `num_reviews`
- `publisher`
- `summary`

### Configure the index

First ensure that you do not have a previously created index with the name `book_index`.

In [4]:
client.indices.delete(index="book_index", ignore_unavailable=True)

ObjectApiResponse({'acknowledged': True})

🔐 NOTE: at any time you can come back to this section and run the `delete` function above to remove your index and start from scratch.

Let's create our initial synonyms set next.

In [5]:
synonyms_set = [
    {
        "id": "synonym-1",
        "synonyms": "js, javascript, java script"
    }
]

client.synonyms.put_synonym(id="my-synonyms-set", synonyms_set=synonyms_set)

ObjectApiResponse({'result': 'created', 'reload_analyzers_details': {'_shards': {'total': 19, 'successful': 17, 'failed': 0}, 'reload_details': []}})
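When a synonyms set grows beyond one rule, writing the dictionaries by hand gets repetitive. The sketch below builds the same structure from plain Python lists. The helper name `build_synonyms_set` and the `synonym-N` ID scheme are illustrative choices, not part of the Synonyms API.

```python
def build_synonyms_set(groups, id_prefix="synonym"):
    """Turn lists of equivalent terms into synonym rule dictionaries."""
    rules = []
    for number, terms in enumerate(groups, start=1):
        rules.append(
            {
                "id": f"{id_prefix}-{number}",
                # Equivalent terms are written as one comma-separated string.
                "synonyms": ", ".join(terms),
            }
        )
    return rules


synonyms_set = build_synonyms_set([["js", "javascript", "java script"]])
print(synonyms_set)

# The result could then be passed to the same call used above:
# client.synonyms.put_synonym(id="my-synonyms-set", synonyms_set=synonyms_set)
```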



In order to use synonyms, we need to define a [custom analyzer](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html) that uses the [`synonym`](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html) or [`synonym_graph`](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html) token filter. Let's create an index that's configured to use an appropriate custom analyzer.


In [6]:
settings = {
    "analysis": {
        "analyzer": {
            "my_custom_index_analyzer": {
                "tokenizer": "standard",
                "filter": [
                    "lowercase"
                ]
            },
            "my_custom_search_analyzer": {
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "my_synonym_filter"
                ]
            }
        },
        "filter": {
            "my_synonym_filter": {
                "type": "synonym_graph",
                "synonyms_set": "my-synonyms-set",
                "updateable": True
            }
        }
    }
}

mappings = {
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "my_custom_index_analyzer",
            "search_analyzer": "my_custom_search_analyzer"
        },
        "summary": {
            "type": "text",
            "analyzer": "my_custom_index_analyzer",
            "search_analyzer": "my_custom_search_analyzer"
        }
    }
}

client.indices.create(index='book_index', mappings=mappings, settings=settings)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'book_index'})

There are a few things to note in the configuration:

- We are using the `synonym_graph` token filter.
- We have defined two analyzers: `my_custom_index_analyzer` and `my_custom_search_analyzer`. `my_custom_search_analyzer` is used as a [search analyzer](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html).
- `my_synonym_filter` is used only in `my_custom_search_analyzer`.

This configuration lets us a) use multi-word synonyms and b) update synonyms without reindexing, because the synonym filter is applied only at search time and is marked as `updateable`.

Multi-word synonym handling is an important topic to understand if you want to leverage synonyms to their maximum benefit. See [the subtleties of Elasticsearch synonyms](https://mauricius.dev/the-subtleties-of-elasticsearch-synonyms/) for more information.
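One way to see what the search analyzer does with a query string is the analyze API, which returns the tokens an analyzer produces. The helper below is a minimal sketch: the name `search_time_tokens` is made up for this example, and it assumes the `book_index` index and the `client` object created earlier in this notebook.

```python
def search_time_tokens(client, text, index="book_index",
                       analyzer="my_custom_search_analyzer"):
    """Return (position, token) pairs produced by the given analyzer."""
    response = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    # Tokens that share a position are alternatives added by the synonym filter.
    return [(token["position"], token["token"]) for token in response["tokens"]]


# Example usage against a live deployment:
# for position, token in search_time_tokens(client, "java script"):
#     print(position, token)
```

Running the same text through `my_custom_index_analyzer` should show only the lowercased original tokens, since that analyzer has no synonym filter.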

### Populate the index

Run the following command to upload some test data, containing information about 10 popular programming books from this [dataset](https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/data.json).

In [7]:
import json
from urllib.request import urlopen

url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/data.json"
response = urlopen(url)
books = json.loads(response.read())

operations = []
for book in books:
    operations.append({"index": {"_index": "book_index"}})
    operations.append(book)
client.bulk(index="book_index", operations=operations, refresh=True)

ObjectApiResponse({'errors': False, 'took': 37, 'items': [{'index': {'_index': 'book_index', '_id': 'NtdiSYsBGHjk6-WLAKIM', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1, 'status': 201}}, {'index': {'_index': 'book_index', '_id': 'N9diSYsBGHjk6-WLAKIM', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1, 'status': 201}}, {'index': {'_index': 'book_index', '_id': 'ONdiSYsBGHjk6-WLAKIM', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1, 'status': 201}}, {'index': {'_index': 'book_index', '_id': 'OddiSYsBGHjk6-WLAKIM', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1, 'status': 201}}, {'index': {'_i
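The raw bulk response is long, so it can help to reduce it to a short summary. The function below is a sketch that relies only on the response shape shown in the output above (an `items` list whose entries carry a `status` code). The name `summarize_bulk` is hypothetical.

```python
def summarize_bulk(response):
    """Count the operations in a bulk response and collect the failed ones."""
    failed = []
    for item in response["items"]:
        # Each item has a single key naming the action, for example "index".
        action, result = next(iter(item.items()))
        if result.get("status", 0) >= 300:
            failed.append({"action": action, **result})
    return {"total": len(response["items"]), "failed": failed}


# Example usage with the response of the bulk call above:
# summary = summarize_bulk(client.bulk(index="book_index", operations=operations, refresh=True))
# print(summary["total"], "operations,", len(summary["failed"]), "failed")
```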

## Aside: Pretty printing Elasticsearch search results

Your `search` API calls will return hard-to-read nested JSON.
We'll create a little function called `pretty_search_response` to return nice, human-readable outputs from our examples.

In [8]:
def pretty_search_response(response):
    if len(response['hits']['hits']) == 0:
        print('Your search returned no results.')
    else:
        for hit in response['hits']['hits']:
            # Use doc_id rather than id to avoid shadowing Python's built-in id().
            doc_id = hit['_id']
            publication_date = hit['_source']['publish_date']
            score = hit['_score']
            title = hit['_source']['title']
            summary = hit['_source']['summary']
            publisher = hit["_source"]["publisher"]
            num_reviews = hit["_source"]["num_reviews"]
            authors = hit["_source"]["authors"]
            pretty_output = (f"\nID: {doc_id}\nPublication date: {publication_date}\nTitle: {title}\nSummary: {summary}\nPublisher: {publisher}\nReviews: {num_reviews}\nAuthors: {authors}\nScore: {score}")
            print(pretty_output)

## Run queries

Let's use our synonyms in some Elasticsearch queries. We'll start by searching for books about JavaScript.

In [12]:
response = client.search(
    index="book_index",
    query={
        "multi_match": {
            "query": "java script",
            "fields": [
                "title^10",
                "summary",
            ]
        }
    }
)

pretty_search_response(response)


ID: O9diSYsBGHjk6-WLAKIM
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 20.307524

ID: OtdiSYsBGHjk6-WLAKIM
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Publisher: oreilly
Reviews: 36
Authors: ['kyle simpson']
Score: 19.787104

ID: PtdiSYsBGHjk6-WLAKIM
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 17.064087


Notice that even though we searched for the term "java script", we got results containing the terms "JS" and "JavaScript". Our synonyms are working!
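To check that these matches come from the synonym filter, one option is to rerun the query with an analyzer that has no synonym filter and compare the hits. The sketch below wraps the query body in a helper. `book_query` is a hypothetical name, and the optional `analyzer` argument maps to the `analyzer` parameter of the `multi_match` query.

```python
def book_query(text, analyzer=None, title_boost=10):
    """Build the multi_match query used in this notebook."""
    multi_match = {
        "query": text,
        "fields": [f"title^{title_boost}", "summary"],
    }
    # Overriding the analyzer replaces the field's search analyzer for this query.
    if analyzer is not None:
        multi_match["analyzer"] = analyzer
    return {"multi_match": multi_match}


with_synonyms = book_query("java script")
without_synonyms = book_query("java script", analyzer="my_custom_index_analyzer")
print(without_synonyms)

# Example usage against a live deployment:
# pretty_search_response(client.search(index="book_index", query=without_synonyms))
```

If the synonyms account for the matches, the second query should return fewer hits than the first, or none at all.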

Now let's try searching for books about AI.

In [13]:
response = client.search(
    index="book_index",
    query={
        "multi_match": {
            "query": "AI",
            "fields": [
                "title^10",
                "summary",
            ]
        }
    }
)

pretty_search_response(response)

Your search returned no results.


We don't get any results! Let's try using the Synonyms API to add a new synonym rule for AI.



In [15]:
client.synonyms.put_synonym_rule(set_id="my-synonyms-set", rule_id="synonym-2", synonyms="ai, artificial intelligence")

ObjectApiResponse({'result': 'updated', 'reload_analyzers_details': {'_shards': {'total': 21, 'successful': 18, 'failed': 0}, 'reload_details': [{'index': 'book_index', 'reloaded_analyzers': ['my_custom_search_analyzer'], 'reloaded_node_ids': ['biL-gxQYS76-6xpHQkZu4Q']}]}})
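The response above reports which analyzers were reloaded, which is how the new rule takes effect without reindexing. As a small sketch, the hypothetical helper below extracts that information. The sample dictionary is abridged from the output above.

```python
def reloaded_analyzers(response):
    """Map each index name to the analyzers that were reloaded for it."""
    details = response["reload_analyzers_details"]["reload_details"]
    return {entry["index"]: entry["reloaded_analyzers"] for entry in details}


# Abridged copy of the response printed above.
sample_response = {
    "result": "updated",
    "reload_analyzers_details": {
        "reload_details": [
            {
                "index": "book_index",
                "reloaded_analyzers": ["my_custom_search_analyzer"],
            }
        ]
    },
}
print(reloaded_analyzers(sample_response))
```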

If we run the query again, we should now get some results.

In [16]:
response = client.search(
    index="book_index",
    query={
        "multi_match": {
            "query": "AI",
            "fields": [
                "title^10",
                "summary",
            ]
        }
    }
)

pretty_search_response(response)


ID: R9eBSYsBGHjk6-WL76KJ
Publication date: 2020-04-06
Title: Artificial Intelligence: A Modern Approach
Summary: Comprehensive introduction to the theory and practice of artificial intelligence
Publisher: pearson
Reviews: 39
Authors: ['stuart russell', 'peter norvig']
Score: 42.500813
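
Once you are done experimenting, you may want to remove the test resources. The sketch below assumes the index and synonyms set names used in this notebook. `clean_up` is a hypothetical helper. It deletes the index first because, to the best of our knowledge, Elasticsearch refuses to delete a synonyms set that an index analyzer still uses.

```python
def clean_up(client, index="book_index", set_id="my-synonyms-set"):
    """Delete the test index, then the synonyms set it referenced."""
    # Remove the index first so the synonyms set is no longer in use.
    client.indices.delete(index=index, ignore_unavailable=True)
    client.synonyms.delete_synonym(id=set_id)


# Example usage against a live deployment:
# clean_up(client)
```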
