Skip to content

Latest commit

 

History

History
55 lines (28 loc) · 4.69 KB

101.md

File metadata and controls

55 lines (28 loc) · 4.69 KB

Elastic 101

Abstract

If you've not worked with Elasticsearch before, you may find the experience frustrating when search doesn't work like you expect.

This document gives you the high-level overview you may not have had, and shares some strategies and tactics to use make sense of Elasticsearch.

It is not intended to be an Elasticsearch tutorial, rather, a page outlining various important principles and gotchas that may have slipped your attention whilst attemtpting to get to grips with making queries and updating data.

TL;DR

Elastic doesn't directly store or search for data using the search terms you supply; it both stores and searches using a representation of the data and search terms you supply.

As such the query syntax you naturally think might work, such as match does not perform 1:1 comparisons against the stored documents (like SQL does) but operates instead on the stored representations.

If you attempt to perform a term match (which in Elastic terms is an exact match) you will often get zero results, because "exact term" !== "stored representation".

The rest of this document will expand on these concepts and get you up to speed.

A rough primer on how Elasticsearch works

In SQL, to search for a known phrase you might query the database using the exact term or a LIKE query. The database would compare, character by character, the search and column value, and if the characters match, the row is returned.

Because Elasticsearch works with "natural language" it doesn't match 1:1 (out of the box) and instead uses a variety of strategies to store "representations" of the document, primarily using "tokens" and "analysers" to compare search input against stored data.

As an example, the phrase the foxes jumped over the wooden gate might be tokenised and stored (along with the original document) as fox ,jump, wood, gate .

When a user then searches for jumping foxes Elastic tokenises this phrase into jump, fox and then those tokens are used to perform the search against the the tokens of any previously-stored documents, returning multiple matching (or closely matching!) documents along with a "score" on how good each match is.

Note that full text searches are tuned to work with whole tokens (this is why searches are so fast) so using the match query to search for fo would return no results, but matching against fox or using the match_phrase_prefix would. This is also why using term queries often fail; the hit needs to be the whole term and in the right case, not just a partial match within the text, such as foxes.

The following documents properly expand on this intro:

What to know about adding data

Elasticsearch is designed to be flexible and easy to get started; you PUT some data into an index, and BOOM! It's ready to be queried; no annoying schemas or setup. Not only that, but you can add new fields as you think of them, and Elastic is happy to add them on the fly (unlike SQL, for example).

To the new or casual users, what you probably don't know is that Elastic is applying a schema – or mapping – in the background, and by being casual about how the data is interpreted by Elastic, you've opted-in to full-text search (i.e. tokenisation and analysis) for all fields.

What this means in practice is that if you want to just use Elastic as a simple key:value store – and you haven't specifically supplied document field mappings – subsequent searches will play by the tokenisation rules discussed earlier. This can make basic searching counter-intuitive as things like partial words will miss, and "exact" phrases won't seem to match no matter how you juggle Elastic's myriad query clauses.

The solution to this is to have a schema in mind, just as SQL, and map your fields when you get started. If you've alresdy started and want to change existing mapping, you'll need to create a new index with the correct mapping and reindex your data into that index.

The following documents shed more light on this matter:

What to know about making queries

Now you know a little about how Elasticsearch works, let's cover a few gotchas about queries.