[DOCS] Add concepts section to analysis topic (elastic#50801)
This helps the topic better match the structure of
our machine learning docs, e.g.
https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html

This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts'
child page, but I plan to add other concepts, such as 'Index time vs.
search time', with later PRs.
jrodewig authored and SivagurunathanV committed Jan 21, 2020
1 parent 94c781a commit 073a113
Showing 3 changed files with 17 additions and 14 deletions.
2 changes: 1 addition & 1 deletion docs/reference/analysis.asciidoc
@@ -144,7 +144,7 @@ looking for:

include::analysis/overview.asciidoc[]

-include::analysis/anatomy.asciidoc[]
+include::analysis/concepts.asciidoc[]

include::analysis/testing.asciidoc[]

18 changes: 5 additions & 13 deletions docs/reference/analysis/anatomy.asciidoc
@@ -1,5 +1,5 @@
[[analyzer-anatomy]]
-== Anatomy of an analyzer
+=== Anatomy of an analyzer

An _analyzer_ -- whether built-in or custom -- is just a package which
contains three lower-level building blocks: _character filters_,
@@ -10,8 +10,7 @@ blocks into analyzers suitable for different languages and types of text.
Elasticsearch also exposes the individual building blocks so that they can be
combined to define new <<analysis-custom-analyzer,`custom`>> analyzers.

-[float]
-=== Character filters
+==== Character filters

A _character filter_ receives the original text as a stream of characters and
can transform the stream by adding, removing, or changing characters. For
@@ -22,8 +21,7 @@ elements like `<b>` from the stream.
An analyzer may have *zero or more* <<analysis-charfilters,character filters>>,
which are applied in order.

-[float]
-=== Tokenizer
+==== Tokenizer

A _tokenizer_ receives a stream of characters, breaks it up into individual
_tokens_ (usually individual words), and outputs a stream of _tokens_. For
@@ -37,9 +35,7 @@ the term represents.

An analyzer must have *exactly one* <<analysis-tokenizers,tokenizer>>.

-
-[float]
-=== Token filters
+==== Token filters

A _token filter_ receives the token stream and may add, remove, or change
tokens. For example, a <<analysis-lowercase-tokenfilter,`lowercase`>> token
Expand All @@ -53,8 +49,4 @@ Token filters are not allowed to change the position or character offsets of
each token.

An analyzer may have *zero or more* <<analysis-tokenfilters,token filters>>,
-which are applied in order.
-
-
-
-
+which are applied in order.
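The three building blocks described in the anatomy page above (zero or more character filters, exactly one tokenizer, zero or more token filters, each applied in order) can be sketched as a toy pipeline. This is an illustrative Python sketch only; the helper names (`strip_html`, `whitespace_tokenizer`, `lowercase_filter`, `analyze`) are assumptions for demonstration, not Elasticsearch APIs.

```python
import re

def strip_html(text):
    """Character filter: remove HTML elements like <b> from the character stream."""
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenizer(text):
    """Tokenizer: break the character stream into individual tokens."""
    return text.split()

def lowercase_filter(tokens):
    """Token filter: lowercase every token in the token stream."""
    return [t.lower() for t in tokens]

def analyze(text, char_filters=(), tokenizer=whitespace_tokenizer, token_filters=()):
    # Zero or more character filters, applied in order
    for cf in char_filters:
        text = cf(text)
    # Exactly one tokenizer
    tokens = tokenizer(text)
    # Zero or more token filters, applied in order
    for tf in token_filters:
        tokens = tf(tokens)
    return tokens

print(analyze("The <b>QUICK</b> Brown Fox",
              char_filters=[strip_html],
              token_filters=[lowercase_filter]))
# → ['the', 'quick', 'brown', 'fox']
```

The same ordering constraints apply in Elasticsearch's real `custom` analyzer definitions: character filters and token filters are lists applied in sequence around a single tokenizer.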
11 changes: 11 additions & 0 deletions docs/reference/analysis/concepts.asciidoc
@@ -0,0 +1,11 @@
+[[analysis-concepts]]
+== Text analysis concepts
+++++
+<titleabbrev>Concepts</titleabbrev>
+++++
+
+This section explains the fundamental concepts of text analysis in {es}.
+
+* <<analyzer-anatomy>>
+
+include::anatomy.asciidoc[]
