# Exploring Elasticsearch

**Mehdi Boustani** - S221594  
**Nicolas Schneiders** - S203005  
**Maxim Piron** - S211493  
**Andreas Stistrup** - S212891  

*Faculty of Applied Sciences, University of Liège*

April 28, 2025


# Introduction

# Installation & configuration

## Docker

### Installing Elasticsearch
If you don't have Docker installed yet, you can download and install it from the [official website](https://www.docker.com/). 

Once Docker is running on your machine, launch Elasticsearch using the following command:

In [None]:
!docker run -p 127.0.0.1:9200:9200 -d --name elasticsearch \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "xpack.license.self_generated.type=trial" \
  -v "elasticsearch-data:/usr/share/elasticsearch/data" \
  docker.elastic.co/elasticsearch/elasticsearch:8.15.0


## Dependencies  
Let's install all the necessary Python packages we'll be using throughout this tutorial.


In [None]:
# requests       → to interact with the Elasticsearch REST API
# elasticsearch  → official Elasticsearch Python client
# pandas         → for handling and analyzing tabular data (e.g., dataset exploration)
# matplotlib     → for optional data visualization (e.g., query stats or aggregations)

!pip install requests elasticsearch==8.15.0 pandas matplotlib

## Connection

In [None]:
from pprint import pprint
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch('http://localhost:9200')
info = es.info()

print('Connected to Elasticsearch!')
pprint(info.body)


## Importing data with the Bulk API

To efficiently load a large dataset into Elasticsearch, we use the Bulk API. This method allows us to insert multiple documents in a single request, which is much faster and more efficient than indexing documents one by one. In this example, we will import the contents of our `apod.json` file—where each element is a document—into a new index called `apod`.

In [None]:
import json

with open("apod.json", "r") as f:
    data = json.load(f)

# Prepare the actions for the bulk
actions = [
    {
        "_index": "apod",
        "_id": doc["title"], # We use the title as the document id since it is a unique field (uniqueness is important!)
        "_source": doc
    }
    for doc in data
]

# We import the data in bulk
try:
    helpers.bulk(es, actions)
    print("Bulk import complete!")
except Exception as e:
    print(e)
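
Because the title doubles as the document `_id`, two entries sharing a title would silently overwrite each other during the bulk import. Below is a minimal sketch of a pre-flight check, assuming each document is a dict with a `title` key as in `apod.json`. The helper names `find_duplicate_titles` and `generate_actions` are hypothetical, and the sample documents are made up for illustration.

```python
from collections import Counter

def find_duplicate_titles(docs):
    """Return the titles that appear more than once, sorted alphabetically."""
    counts = Counter(doc["title"] for doc in docs)
    return sorted(title for title, n in counts.items() if n > 1)

def generate_actions(docs, index_name):
    """Yield one bulk action per document, lazily."""
    for doc in docs:
        yield {"_index": index_name, "_id": doc["title"], "_source": doc}

# Made-up sample: the first and last entries share a title.
sample = [
    {"title": "A Hazy Harvest Moon", "date": "2024-09-19"},
    {"title": "Comet at Dawn", "date": "2024-09-27"},
    {"title": "A Hazy Harvest Moon", "date": "2020-01-01"},
]

duplicates = find_duplicate_titles(sample)
actions = list(generate_actions(sample, "apod"))
print("Duplicate titles:", duplicates)
```

Since `helpers.bulk` accepts any iterable of actions, the generator could also be passed to it directly, which avoids building the whole list in memory for large files.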

## Basic queries in Elasticsearch

Elasticsearch exposes a RESTful API, which means you interact with it using standard HTTP methods. Here are the most common operations:

- **GET**: Read a document or perform a search
- **POST**: Add a new document
- **PUT**: Create or replace a document or an index
- **DELETE**: Remove a document or an index
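
The Python client used in this tutorial hides these HTTP calls. As a rough illustration of what travels over the wire, the sketch below builds the method and URL for each operation on the document endpoint (`/<index>/_doc/<id>`). The helper `doc_url` is a hypothetical name, and the base URL assumes the local Docker container started earlier.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumed: the local container from the Docker step

def doc_url(index_name, doc_id=None):
    """Build the document endpoint URL, percent-encoding the id if there is one."""
    url = f"{BASE_URL}/{index_name}/_doc"
    if doc_id is not None:
        url += "/" + quote(doc_id, safe="")
    return url

operations = [
    ("GET", doc_url("apod", "A Hazy Harvest Moon")),   # read a document
    ("POST", doc_url("apod")),                         # add a document, id generated
    ("PUT", doc_url("apod", "Replaced APOD")),         # create or replace at this id
    ("DELETE", doc_url("apod", "Replaced APOD")),      # remove a document
]
for method, url in operations:
    print(f"{method:6} {url}")
```

Ids containing spaces, such as our titles, have to be percent-encoded in the URL; the client takes care of this automatically.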

### GET method

The GET method is used to retrieve a document from an index by providing its id. If no document with the specified id exists, the Python client raises a `NotFoundError`.

In [None]:
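from elasticsearch import NotFoundError

try:
    doc = es.get(index="apod", id="A Hazy Harvest Moon")
    pprint(doc['_source'])

# Catch only the "not found" case so that other errors (e.g. connection issues) stay visible
except NotFoundError:
    print("A document with this id doesn't exist!")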

### POST method

The POST method is used to create a new document. When `index()` is called without an id, Elasticsearch generates one automatically, so this document's id is not its title, unlike the documents imported in bulk.

In [None]:
from datetime import datetime

new_id = "A New APOD"

new_doc = {
    "date": datetime.now().strftime("%Y-%m-%d"),
    "title": new_id,
    "explanation": "This is a new document added via POST.",
    "image_url": "https://apod.nasa.gov/apod/image/2410/new_apod.jpg",
    "authors": "Mehdi Boustani"
}

res = es.index(index="apod", document=new_doc)

# The response contains the id generated by Elasticsearch
print(f"Document added successfully with generated id '{res['_id']}'")

### PUT method

The PUT method is used to create or replace a document at a specified id. If a document with that id already exists, it will be overwritten.

In [None]:
replaced_id = "Replaced APOD"

doc = {
    "date": "2024-10-02",
    "title": replaced_id,
    "explanation": "This document replaces any previous one with the same ID.",
    "image_url": "https://apod.nasa.gov/apod/image/2410/new_apod.jpg",
    "authors": "Mehdi Boustani"
}

# Create the document at this id, or overwrite it if one already exists there
es.index(index="apod", id=doc["title"], document=doc)

print(f"Document with id '{replaced_id}' created or replaced")

### DELETE method

This method is used to delete a document by its id. The id must be known and specified in the request.

In [None]:
from elasticsearch import NotFoundError

try:
    es.delete(index="apod", id=replaced_id)
    print(f"Document with ID {replaced_id} deleted.")

except NotFoundError:
    print("The specified document to delete doesn't exist")

# Delete the entire index (be careful, this command is irreversible)
# es.indices.delete(index="apod")
# print("Index 'apod' deleted.")

## DSL vs SQL

Elasticsearch doesn't use traditional SQL language to query data, but rather a **DSL (Domain Specific Language)** based on JSON.

### Main differences

1. **Query Structure**
   - **SQL**: Uses a strict syntax with clauses like `SELECT`, `FROM`, `WHERE`
   - **DSL**: Uses a nested JSON format, offering more flexibility in how queries are expressed

2. **Search Types**
   - **SQL**: Focuses mainly on exact matches
   - **DSL**: Supports advanced search techniques like full-text search, fuzzy matching, and range queries

Let's explore practical examples comparing SQL concepts with Elasticsearch's DSL.

### Full-text search

In [None]:
query = {
    "query": {
        "match": {
            "title": "moon"
        }
    }
}

response = es.search(index="apod", body=query)
for hit in response["hits"]["hits"]:
    print(hit["_source"])

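**Approximate SQL equivalent:** `SELECT * FROM apod WHERE title LIKE '%moon%'` (approximate because `match` compares whole analyzed words, whereas `LIKE` matches any substring)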
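
The comparison with `LIKE` is only approximate, because a `match` query works on analyzed tokens rather than on raw substrings. The toy comparison below illustrates the difference. It assumes a simplified analyzer (lowercasing and splitting on non-alphanumeric characters) as a stand-in for the real one; the function names and the second title are made up.

```python
import re

def simple_tokens(text):
    """Lowercase and split on non-alphanumeric characters (simplified analyzer)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def like_contains(title, needle):
    """Mimic a case-insensitive LIKE '%needle%': plain substring test."""
    return needle.lower() in title.lower()

def match_word(title, word):
    """Mimic a single-word match query: the word must equal a whole token."""
    return word.lower() in simple_tokens(title)

for title in ["A Hazy Harvest Moon", "Moonlight over the Alps"]:
    print(f"{title!r}: LIKE={like_contains(title, 'moon')}, match={match_word(title, 'moon')}")
```

In this sketch both titles satisfy the substring test, but only the first contains `moon` as a whole token.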

### Exact match with term

In [None]:
query = {
    "query": {
        "term": {
            "title.keyword": {
                "value": "A Hazy Harvest Moon"
            }
        }
    }
}

response = es.search(index="apod", body=query)
for hit in response["hits"]["hits"]:
    print(hit["_source"])


**SQL equivalent:** `SELECT * FROM apod WHERE title = 'A Hazy Harvest Moon'`

### Range Query (Numeric/date filtering)

In [None]:
# We filter documents by a date range
query = {
    "query": {
        "range": {
            "date": {
                "gte": "2020-01-01",
                "lte": "2020-01-15"
            }
        }
    }
}

response = es.search(index="apod", body=query)
for hit in response["hits"]["hits"]:
    print(hit["_source"])


**SQL equivalent:** `SELECT * FROM apod WHERE date BETWEEN '2020-01-01' AND '2020-01-15'`

### Fuzzy Query (Typo-tolerant search)

In [None]:
# Typo-tolerant search with fuzzy
query = {
    "query": {
        "fuzzy": {
            "title": {
                "value": "Galaxi",
                "fuzziness": "AUTO"
            }
        }
    }
}

response = es.search(index="apod", body=query)
for hit in response["hits"]["hits"]:
    print(hit["_source"])

**SQL equivalent:** No direct equivalent, similar to a `LIKE` with typos
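
Fuzzy matching is based on edit distance: the number of single-character changes needed to turn one term into another. The sketch below computes the classic Levenshtein distance and the number of edits that `"fuzziness": "AUTO"` allows for a term of a given length (0 for 1-2 characters, 1 for 3-5, 2 beyond that). It is a simplified illustration, not Elasticsearch's implementation: by default Elasticsearch also counts a swap of two adjacent characters as a single edit, which this plain version does not.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution
            ))
        prev = curr
    return prev[-1]

def auto_max_edits(term):
    """Edits allowed by fuzziness AUTO, based on the term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

typed, indexed = "galaxi", "galaxy"
distance = levenshtein(typed, indexed)
print(distance, "edit(s); AUTO allows up to", auto_max_edits(typed))
```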

# Elasticsearch as a search engine

The goal of Elasticsearch is to empower client workflows to retrieve data from your database using powerful, flexible queries. To customize search behavior, Elasticsearch offers several fine-tuning parameters. In this section, we’ll explore the filter, must, must_not, and should clauses.

A key concept here is document scoring. When you run a query, Elasticsearch calculates a relevance score for each candidate document and orders results accordingly. You then return the top n documents based on that ranking. To further control how scores influence ordering, you can use the boost parameter to adjust relevance and achieve custom ranking.

## Filter requests

When you apply a filter criterion to your query, you define one or more clauses that documents must satisfy to be included. Filters are score-neutral: they don't alter a document's relevance score, they only prune non-matching hits. Below we'll explore a selection of the most common filter clauses.

In [None]:
from pprint import pprint
index_name = "apod"

# This query will filter out documents that do not match the date "2024-09-27"
term_query = {
    "term": {"date": "2024-09-27"}
}

# This query will filter documents with a date between "2024-09-09" and "2024-09-30"
range_query = {
    "range": {
        "date": {"gte": "2024-09-09", "lte": "2024-09-30"}
    }
}

# This query will filter documents that have a non-null value for the field "note"
exists_query = {
    "exists": {"field": "note"}
}

# This query will filter documents that have the exact term "David Martinez Delgado et al." in the "authors" field
term_authors_query = {
    "term": {"authors.keyword": "David Martinez Delgado et al."}
}

# This query will filter documents that have a title starting with "Comet"
prefix_query = {
    "prefix": {"title.keyword": "Comet"}
}


print("=== Term Query on date ===")
res = es.search(index=index_name, body={"query": {"bool": {"filter": [term_query]}}})
for hit in res["hits"]["hits"]:
    pprint(hit["_source"])

print("\n=== Range Query on date ===")
res = es.search(index=index_name, body={"query": {"bool": {"filter": [range_query]}}})
for hit in res["hits"]["hits"]:
    pprint(hit["_source"])

print("\n=== Exists Query on note ===")
res = es.search(index=index_name, body={"query": {"bool": {"filter": [exists_query]}}})
for hit in res["hits"]["hits"]:
    pprint(hit["_source"])

print("\n=== Exact Term Query on authors ===")
res = es.search(index=index_name, body={"query": {"bool": {"filter": [term_authors_query]}}})
for hit in res["hits"]["hits"]:
    pprint(hit["_source"])

print("\n=== Prefix Query on title ===")
res = es.search(index=index_name, body={"query": {"bool": {"filter": [prefix_query]}}})
for hit in res["hits"]["hits"]:
    pprint(hit["_source"])
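
The five searches above share the same shape, and several clauses placed in the same `filter` list are combined with a logical AND. A small helper can make both points explicit; this is only a sketch, and `as_filter` is a hypothetical name, not part of the Elasticsearch client.

```python
def as_filter(*clauses, size=10):
    """Wrap one or more query clauses in a bool/filter search body."""
    return {"size": size, "query": {"bool": {"filter": list(clauses)}}}

# Documents dated in the given range AND whose title starts with "Comet"
body = as_filter(
    {"range": {"date": {"gte": "2024-09-09", "lte": "2024-09-30"}}},
    {"prefix": {"title.keyword": "Comet"}},
)
print(body)
```

The resulting dictionary can be passed to `es.search` in the same way as the bodies built by hand above.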


## Must requests

The must criterion works much like filter in that it determines which documents are eligible, with one key difference: each must clause that a document matches also contributes to its relevance score. You can include multiple must clauses, and they're combined with a logical AND (i.e., a document must satisfy all of them).

In [None]:
must_title = {
    "match": {"title": "Comet"}
}

must_explanation = {
    "match": {"explanation": "nebula"}
}

print("=== Must: Title Contains 'Comet' (size=2) ===")
res = es.search(
    index=index_name,
    # The size parameter limits the number of results returned. Default is 10.
    body={
        "size": 2,
        "query": {
            "bool": {
                "must": [must_title]
            }
        }
    }
)
for hit in res["hits"]["hits"]:
    print(f"_score={hit['_score']:.2f}")
    pprint(hit["_source"])


print("\n=== Must: Title Contains 'Comet' AND Explanation Contains 'nebula' (size=2) ===")
res = es.search(
    index=index_name,
    body={
        "size": 2,
        "query": {
            "bool": {
                "must": [
                    must_title,
                    must_explanation
                ]
            }
        }
    }
)
for hit in res["hits"]["hits"]:
    print(f"_score={hit['_score']:.2f}")
    pprint(hit["_source"])


In the first query, you'll notice that the first document's _score is higher than the second's: must clauses contribute to relevance scoring, and hits are returned in descending order of score.

## Must_not requests

Although its name might imply the opposite of must, must_not actually behaves like a negated filter. Any document that matches a must_not clause is simply removed from the set of candidates.

In [None]:
must_not_image = {
    "exists": {"field": "image_url"}
}

print("=== Example 1: must_not exists image_url (size=2) ===")
res1 = es.search(
    index=index_name,
    body={
        "size": 2,
        "query": {
            "bool": {
                "must_not": [must_not_image]
            }
        }
    }
)
for hit in res1["hits"]["hits"]:
    pprint(hit["_source"])


filter_date = {
    "range": {
        "date": {"gte": "2024-09-09", "lte": "2024-09-30"}
    }
}
must_not_comet = {
    "prefix": {"title.keyword": "Comet"}
}

print("\n=== Example 2: range on date AND must_not prefix 'Comet' (size=2) ===")
res2 = es.search(
    index=index_name,
    body={
        "size": 2,
        "query": {
            "bool": {
                "filter":   [filter_date],
                "must_not": [must_not_comet]
            }
        }
    }
)
for hit in res2["hits"]["hits"]:
    # Only docs dated 2024-09-09 to 2024-09-30 whose title does NOT start with "Comet"
    pprint(hit["_source"])


## Should requests

The should clause in a bool query implements a logical OR across its clauses:

1. **Standalone should (no must clauses)**  
   - A document only needs to match at least one should clause to be included.

2. **Combined must + should**  
   - All must clauses remain required: a document has to satisfy every one of them.  
   - Each should clause that matches simply boosts the document’s relevance score; non-matching should clauses do not exclude the document.
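
The two cases above can be summarised with a toy model of which documents a bool query lets through (scoring is ignored). Each clause is modelled as a plain Python predicate; the function names and sample documents are made up for illustration. The rule encoded here reflects the default `minimum_should_match`: 1 when the query has no must or filter clause, 0 otherwise.

```python
def bool_matches(doc, must=(), filters=(), should=(), must_not=()):
    """Toy model of bool inclusion rules; each clause is a predicate on the document."""
    if not all(clause(doc) for clause in (*must, *filters)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Without any must/filter clause, at least one should clause has to match.
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True

def title_has(word):
    return lambda doc: word in doc["title"].lower()

def in_september_2024(doc):
    return "2024-09-01" <= doc["date"] <= "2024-09-30"

# Made-up documents
comet = {"title": "Comet at Dawn", "date": "2024-09-27"}
moon = {"title": "A Hazy Harvest Moon", "date": "2024-09-19"}

standalone = [bool_matches(d, should=[title_has("comet")]) for d in (comet, moon)]
combined = [
    bool_matches(d, must=[in_september_2024], should=[title_has("comet")])
    for d in (comet, moon)
]
print("standalone should:", standalone)
print("must + should:   ", combined)
```

In the standalone case the moon document is excluded; once a must clause is present, it is included even though no should clause matches it.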


In [None]:
# This query will keep documents that match at least one of the specified conditions
query1 = {
    "size": 2,
    "query": {
        "bool": {
            "should": [
                { "match": { "title": "Comet" } },
                { "match": { "explanation": "nebula" } },
                { "match": { "authors": "David Martinez Delgado et al." } }
            ],
        }
    }
}

print("=== Query 1: Standalone should (title OR explanation OR authors) ===")
res1 = es.search(index=index_name, body=query1)
for hit in res1["hits"]["hits"]:
    print(f"_score={hit['_score']:.2f}")
    pprint(hit["_source"])


# This query keeps documents dated between "2024-09-20" and "2024-09-30" and boosts the score if the title or explanation matches
query2 = {
    "size": 2,
    "query": {
        "bool": {
            "must": [
                { "range": { "date": { "gte": "2024-09-20", "lte": "2024-09-30" } } }
            ],
            "should": [
                { "match": { "title": "Comet" } },
                { "match": { "explanation": "comet" } }
            ]
        }
    }
}

print("\n=== Query 2: must range 2024-09-20 to 2024-09-30 + should (boost if title/explanation) ===")
res2 = es.search(index=index_name, body=query2)
for hit in res2["hits"]["hits"]:
    print(f"_score={hit['_score']:.2f}")
    pprint(hit["_source"])


## Boosting request

When you want to increase the relevance of certain documents, you attach a positive boost to your query clauses (for example, a match or term query) by adding a boost parameter—this simply multiplies that clause’s score contribution in the final _score. To softly penalize documents without filtering them out entirely, you use the boosting query: it takes a required positive query and a negative query, and for any document that matches the negative clause, it multiplies its overall score by a negative_boost factor (a value between 0 and 1).
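
The arithmetic described above can be sketched in a few lines. This is a simplified model under stated assumptions: the base scores are made-up numbers (real scores depend on the index contents), and `boosted_score` is a hypothetical helper, not an Elasticsearch function.

```python
def boosted_score(base_score, boost=1.0, matches_negative=False, negative_boost=1.0):
    """Apply a positive boost, then the negative_boost if the negative clause matched."""
    score = base_score * boost
    if matches_negative:
        score *= negative_boost
    return score

# A document matching "meteor" with a made-up base score of 3.0
only_positive = boosted_score(3.0, boost=2.0)
# The same document also mentions "comet", so it is demoted
demoted = boosted_score(3.0, boost=2.0, matches_negative=True, negative_boost=0.5)
print(only_positive, demoted)
```

A demoted document stays in the result list; it simply ranks lower than it otherwise would.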

In [None]:
from pprint import pprint
from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')
index_name = "apod"

# 1️⃣ Positive boost: increase score when explanation contains "meteor"
#    Uses `boost` directly on a match query.
print("=== Positive Boost: explanation contains 'meteor' (boost=2.0) ===")
pos_query = {
    "size": 2,
    "query": {
        "match": {
            "explanation": {
                "query": "meteor",
                "boost": 2.0      # positive boost on match
            }
        }
    }
}
res = es.search(index=index_name, body=pos_query)
for hit in res["hits"]["hits"]:
    print(f"_score={hit['_score']:.2f}") 
    pprint(hit["_source"])

# 2️⃣ Negative boost: de-emphasize docs where explanation contains "comet"
#    Uses the boosting query with match_all as the positive clause.
print("\n=== Negative Boost: explanation contains 'comet' (negative_boost=0.5) ===")
neg_query = {
    "size": 2,
    "query": {
        "boosting": {
            "positive": { "match_all": {} },        # match everything
            "negative": {                           # demote these
                "match": { "explanation": "comet" }
            },
            "negative_boost": 0.5                   # reduce score by 50% on match
        }
    }
}
res = es.search(index=index_name, body=neg_query)
for hit in res["hits"]["hits"]:
    print(f"_score={hit['_score']:.2f}")
    pprint(hit["_source"])

# 3️⃣ Combined boost: promote "meteor" matches and demote "comet" matches
print("\n=== Combined Boost: x2.0 for 'meteor', x0.5 for 'comet' ===")
both_query = {
    "size": 2,
    "query": {
        "boosting": {
            "positive": {
                "match": {
                    "explanation": {
                        "query": "meteor",
                        "boost": 2.0
                    }
                }
            },
            "negative": {
                "match": {
                    "explanation": "comet"
                }
            },
            "negative_boost": 0.5
        }
    }
}
res = es.search(index=index_name, body=both_query)
for hit in res["hits"]["hits"]:
    print(f"_score={hit['_score']:.2f}")
    pprint(hit["_source"])


# Advanced features

In this section, we’ll dive into a handful of Elasticsearch’s most powerful advanced features. While Elasticsearch offers a wealth of capabilities beyond what we cover here, the three topics we’ll focus on are:

- **Aggregations**: Real-time analytics and data summarization  
- **Highlighting**: Extracting and emphasizing matching text snippets  
- **Autocomplete**: Instant-search experiences via suggesters and search-as-you-type  

Each feature can be mixed and matched or extended with dozens of other Elasticsearch tools to build rich, high-performance search applications tailored to your needs.


## Aggregations

Aggregations don't return individual documents; instead, they compute analytics over the set of matched records (setting `size` to 0, as in the example below, leaves the hits out of the response entirely). Elasticsearch supports three main aggregation types:

- **Bucket aggregations** group documents into “buckets” based on shared values (e.g., terms, date intervals, numeric ranges, or histograms).  
- **Metric aggregations** calculate statistics (such as count, sum, average, min/max) over those documents.  
- **Pipeline aggregations** take the output of one or more aggregations and run further calculations on it, enabling you to chain operations together.

In this tutorial, we’ll use those aggregations to show how you can both segment your data and then perform successive analyses on those segments.  

In [None]:
# We want to get the number of documents per month
# To do so, we will use a date_histogram aggregation to group documents by month
# and a cumulative_sum aggregation to get the cumulative count of documents over time.
body = {
    "size": 0,
    "aggs": {
        "entries_per_month": {
            # Bucket agg that uses a date_histogram to group documents by month
            "date_histogram": {
                "field":             "date",
                "calendar_interval": "month",
                "format":            "yyyy-MM"
            },
            # Sub-aggregations computed inside each monthly bucket
            "aggs": {
                # Metric agg that counts the number of documents in each month
                "monthly_count": {
                    "value_count": { "field": "date" }
                },
                # Pipeline agg that calculates the cumulative sum of the monthly counts
                "cumulative_entries": {
                    "cumulative_sum": {
                        "buckets_path": "monthly_count"
                    }
                }
            }
        }
    }
}

response = es.search(index="apod", body=body)

for bucket in response["aggregations"]["entries_per_month"]["buckets"]:
    month = bucket["key_as_string"]
    count = bucket["monthly_count"]["value"]
    cum   = bucket["cumulative_entries"]["value"]
    print(f"{month} → count: {count}, cumulative: {cum}")
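
To make the result easier to interpret, here is what this bucket, metric and pipeline chain computes, reproduced in plain Python on a made-up list of dates. It is only an illustration of the logic, not of how Elasticsearch executes it.

```python
from collections import Counter
from itertools import accumulate

# Made-up sample of document dates
dates = ["2024-08-30", "2024-09-09", "2024-09-27", "2024-10-02"]

monthly = Counter(d[:7] for d in dates)  # bucket step: group by "yyyy-MM"
months = sorted(monthly)
counts = [monthly[m] for m in months]    # metric step: documents per bucket
cumulative = list(accumulate(counts))    # pipeline step: running total

for month, count, cum in zip(months, counts, cumulative):
    print(f"{month} -> count: {count}, cumulative: {cum}")
```

One difference worth noting: `date_histogram` also returns empty buckets for months without documents by default, whereas this sketch only lists months that appear in the data.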


## Highlighting

Highlighting in Elasticsearch works by surrounding each occurrence of a query term in your document text with customizable tags (by default `<em>` and `</em>`, but you can use any HTML or marker you like). When you include a highlight section in your search request, Elasticsearch will:

1. Analyze the specified field(s) to find where your query terms fall.
2. Extract short snippets (fragments) around each match.
3. Wrap each matching term in your chosen pre_tags and post_tags.
4. Return those snippets alongside each hit in a top-level highlight block.

This makes it easy to show users exactly where, and in what context, their search terms appeared.
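
The wrapping step (step 3) can be imitated with a regular expression. The sketch below is a toy version that works on raw text, whereas Elasticsearch works on analyzed terms; `highlight` is a hypothetical helper and the sentence is made up.

```python
import re

def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap each whole-word, case-insensitive occurrence of `term` in tags."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)

sentence = "Comet tails point away from the Sun; this comet has two."
marked = highlight(sentence, "comet", pre_tag="<mark>", post_tag="</mark>")
print(marked)
```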

In [None]:
query = {
  "query": {
    "match": {
      "explanation": "comet"
    }
  },
  "highlight": {
    # Customize the highlight tags
    "pre_tags":  ["<mark>"],
    "post_tags": ["</mark>"],
    # Specify in which fields we want to highlight
    "fields": {
      "explanation": {
        # This skips fragmentation and returns the whole text with each match highlighted
        "number_of_fragments": 0, 
      }
    }
  }
}

response = es.search(index="apod", body=query)

print("=== Highlighted Results ===")
for hit in response["hits"]["hits"]:
    print(f"Title: {hit['_source']['title']}")
    print("Highlighted explanation:")
    for fragment in hit["highlight"]["explanation"]:
        print(fragment)
    print("\n")


## Autocomplete

The final feature we’ll cover is autocomplete. Elasticsearch offers four different approaches, but we’ll choose the simplest one with the least setup: the search-as-you-type mechanism.

The first step is to reindex your data to add the field required for this feature.


In [None]:
if es.indices.exists(index="apod_v2"):
    es.indices.delete(index="apod_v2")  

# New index mapping with search_as_you_type
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type":"search_as_you_type", # Enable autocomplete search
                "max_shingle_size": 3
            },
            "date":        { "type": "date",   "format": "yyyy-MM-dd" },
            "explanation": { "type": "text" },
            "image_url":   { "type": "keyword" },
            "authors":     { "type": "text" }
        }
    }
}

es.indices.create(index="apod_v2", body=mapping)

es.reindex(
    body={
        "source": { "index": "apod" },
        "dest":   { "index": "apod_v2" }
    },
    wait_for_completion=True,
)

print("===Reindexing complete===")
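
A `search_as_you_type` field indexes the title several times: once as individual terms, and once per shingle size up to `max_shingle_size` (the `title._2gram` and `title._3gram` subfields), plus a prefix subfield. Shingles are runs of consecutive words. The sketch below shows roughly what they look like for one title, assuming a simplified tokenization (lowercasing and splitting on whitespace); `shingles` is a hypothetical helper.

```python
def shingles(text, size):
    """Return all runs of `size` consecutive words from the text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

title = "A Hazy Harvest Moon"
two_grams = shingles(title, 2)
three_grams = shingles(title, 3)
print(two_grams)
print(three_grams)
```

Matching against these word runs is what lets a partially typed phrase such as "harvest mo" be matched in word order rather than as unrelated words.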


To demonstrate search-as-you-type in this notebook, we’ll embed an ipywidgets.Combobox as our search bar. Under the hood, each time you type a character, a tiny Python function sends a bool_prefix query to our search_as_you_type index and updates the dropdown with the matching titles. Note that you might need to press the enter key once to activate the search bar.

In [None]:
!pip install ipywidgets jupyterlab_widgets                      
!jupyter nbextension enable --py --sys-prefix widgetsnbextension
!pip install jupyterlab
!jupyter labextension install @jupyter-widgets/jupyterlab-manager

In [None]:
# Written with the help of ChatGPT
import ipywidgets as widgets
from IPython.display import display
import threading
import time

# Create the Search bar
combo = widgets.Combobox(
    placeholder='Type to search titles…',
    options=[],
    description='Search:',
    ensure_option=False,
    continuous_update=True
)
display(combo)

last_call = 0
lock = threading.Lock()

def fetch_suggestions(text):
    # Elasticsearch bool_prefix query against search-as-you-type
    body = {
        "query": {
            "multi_match": {
                "query": text,
                "type":  "bool_prefix",
                "fields": ["title","title._2gram","title._3gram"]
            }
        },
        "_source": ["title"],
        "size": 5
    }
    resp = es.search(index="apod_v2", body=body)
    return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]

def on_value_change(change):
    global last_call
    value = change['new']
    now = time.time()
    with lock:
        if now - last_call < 0.3:
            return
        last_call = now
    if value:
        combo.options = fetch_suggestions(value)
    else:
        combo.options = []

combo.observe(on_value_change, names='value')

# Limitations and comparison with SQL
One of the most commonly mentioned limitations in forums and blogs is that Elasticsearch performance and data durability can be heavily constrained by the resources allocated to the cluster. To maintain good performance and ensure data survivability, it may be necessary to increase the number of nodes and replicate data across them. This is not a technical limitation per se, but rather an infrastructure constraint that depends on available resources.

From a technical standpoint, Elasticsearch is built on top of the Lucene engine, which is optimized for full-text search rather than relational queries such as joins or transactions. While Elasticsearch does offer limited support for join-like operations, they are far less powerful than those available in SQL databases and are typically very resource-intensive. As a result, these features are often disabled in production configurations.

In terms of consistency, Elasticsearch follows an eventual consistency model rather than strong consistency. This means that in certain race conditions, inconsistencies in query results can occur shortly after data is written or updated.

Although a portion of Elasticsearch is available for free, accessing the full feature set—particularly advanced security, monitoring, and machine learning capabilities—requires a paid license.

Nevertheless, Elasticsearch remains one of the most powerful and widely adopted solutions for full-text search, offering high performance, scalability, and a rich query language. Its ability to index and search large volumes of textual data in near real-time makes it a strong choice for use cases such as log analytics, product search, and document indexing.

Alternatives to Elasticsearch include Apache Solr, which is also built on Lucene and excels in traditional search applications, and OpenSearch, a community-driven fork of Elasticsearch that retains many of its features under a fully open-source license.

### Comparison between relational databases and Elasticsearch
As with most decisions in technology, selecting the appropriate tools hinges on your specific use case. It's essential to assess the features you require, understand your typical workflows, and determine the data handling properties that are most critical for your operations. The following table provides a comparative overview of two prominent technologies, aiding in an informed decision-making process.

| **Aspect**                | **Elasticsearch**                                                                                                                             | **MySQL**                                                                                                                       |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|
| **Data Model**            | Document-oriented (NoSQL); stores data as JSON documents.                                                                                     | Relational (SQL); stores data in structured tables with predefined schemas.                                                     |
| **Primary Use Cases**     | Full-text search, log and event data analysis, real-time analytics, and applications requiring complex search capabilities.                   | Transactional applications, structured data storage, and scenarios requiring complex joins and ACID compliance.                 |
| **Query Language**        | Elasticsearch Query DSL (Domain Specific Language); designed for flexible and complex search operations.                                      | SQL (Structured Query Language); widely adopted for structured data querying and manipulation.                                  |
| **Joins & Relationships** | Limited support for joins; alternatives like nested documents and parent-child relationships exist but can be complex and resource-intensive. | Robust support for joins, foreign keys, and complex relational queries.                                                         |
| **Schema Flexibility**    | Schema-less; allows dynamic mapping, making it adaptable to varying data structures.                                                          | Schema-based; requires predefined schemas, offering strict data validation and integrity.                                       |
| **Consistency Model**     | Eventually consistent; suitable for scenarios where immediate consistency is not critical.                                                    | Strong consistency with ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring reliable transactions.        |
| **Scalability**           | Horizontally scalable; designed to handle large volumes of data across distributed systems.                                                   | Vertically scalable; can be scaled horizontally with additional configurations but is primarily optimized for vertical scaling. |
| **Performance**           | Optimized for search operations; excels in scenarios requiring rapid full-text search and analytics.                                          | Optimized for transactional operations; performs well in scenarios requiring complex transactions and data integrity.           |
| **Security Features**     | Core security (authentication, TLS, role-based access control) is free; advanced features such as field-level security and SSO are paid.      | Offers robust security features, including user authentication, SSL support, and role-based access control.                     |
| **Licensing**             | Open-source with a dual license model; some advanced features require a commercial license.                                                   | Open-source (GPL) with commercial support available; widely adopted and supported by a large community.                         |


# Conclusion

We began this tutorial by introducing the goals of Elasticsearch and how it leverages complex indexing mechanisms to deliver high performance and scalability. We also provided a brief recap of what indexes are, how they are used in relational databases, and the limitations and performance costs they can introduce.

As we progressed, we walked through setting up an Elasticsearch cluster using Docker, connecting to it, and uploading bulk data. We then explored the basic API operations for document manipulation by ID and highlighted the differences between Query DSL and SQL. From there, we reviewed a broad range of advanced query types and features specifically tailored for building powerful search engine experiences.

Using Elasticsearch proved to be quite intuitive. Even though we only covered a subset of its capabilities, it quickly became clear how the technology fits into real-world applications. Along the way, we encountered many configuration parameters that allow you to fine-tune Elasticsearch's behavior to meet specific needs.

While Elasticsearch excels at full-text search, it may require more resources than a traditional database system. We also noted that its eventual consistency model differs from conventional databases, making it less suitable for some workflows. Nevertheless, if your application requires a robust and flexible full-text search engine, Elasticsearch is one of the most powerful solutions available, thanks to its rich feature set and high configurability.

By following this tutorial, you should now have the knowledge and skills to build your first search engine application. With a thoughtfully designed search interface and parameterized queries, you can now deliver accurate and relevant results to your users efficiently and effectively.

# References

1. <a id="freecodecamp"></a> [Elasticsearch Course for Beginners - FreeCodeCamp](https://www.youtube.com/watch?v=a4HBKEda_F8&ab_channel=freeCodeCamp.org)

2. <a id="elasticdoc"></a> [Elastic Official Documentation](https://www.elastic.co/docs/get-started)

3. <a id="elasticlab"></a> [Elastic Search Lab - Tutorials](https://www.elastic.co/search-labs/tutorials)

4. <a id="elasticlabBoolQueries"></a> [Elastic Search Lab - Tutorial - Filters](https://www.elastic.co/search-labs/tutorials/search-tutorial/full-text-search/filters)

5. <a id="elasticscore"></a> [Elastic Search Lab - Understanding Elasticsearch scoring and the Explain API](https://www.elastic.co/search-labs/blog/elasticsearch-scoring-and-explain-api)

6. <a id="mustnot"></a> [Soumendra - Stack Overflow - Difference between must_not and filter in elasticsearch](https://stackoverflow.com/questions/47226479/difference-between-must-not-and-filter-in-elasticsearch)

7. <a id="aggregations"></a> [logz.io - Daniel Berman - A Basic Guide To Elasticsearch Aggregations](https://logz.io/blog/elasticsearch-aggregations)

8. <a id="highlighting"></a> [Elastic Official Documentation - Highlighting](https://www.elastic.co/docs/reference/elasticsearch/rest-apis/highlighting)

9. <a id="autocomplete"></a> [Opster - Amit Khandelwal - Elasticsearch Autocomplete Search](https://opster.com/guides/elasticsearch/how-tos/elasticsearch-auto-complete-guide/)

10. <a id="JoinQueries"></a> [Elastic Official Documentation - Joining Queries](https://www.elastic.co/docs/reference/query-languages/query-dsl/joining-queries)

11. <a id="ressourcesLimit"></a> [Reddit - What are the limits of elastic search?](https://www.reddit.com/r/elasticsearch/comments/6xm8wv/what_are_the_limits_of_elastic_search/)

12. <a id="ESalternative1"></a> [sematext - 11 Alternatives to Elasticsearch, OpenSearch, and Solr](https://sematext.com/blog/elasticsearch-opensearch-solr-alternatives/)

13. <a id="ESalternative2"></a> [BIGDATA - Elasticsearch Alternatives - The Ultimate Guide](https://bigdataboutique.com/blog/elasticsearch-alternatives-the-ultimate-guide-59ad00)

14. <a id="ESvsSQL1"></a> [Medium - Elasticsearch vs. Traditional Databases: Diving into Elastic search's Strengths](https://medium.com/@rajeevprasanna/elasticsearch-vs-traditional-databases-diving-into-elastic-searchs-strengths-c6f55b9b449f)

15. <a id="ESvsSQL2"></a> [knowi - Elasticsearch vs. MySQL: What to Choose?](https://www.knowi.com/blog/elasticsearch-vs-mysql-what-to-choose/)

16. <a id="TableGen"></a> [Table Generator](https://www.tablesgenerator.com/markdown_tables)

# Process and Work Distribution
