[Analysis] Deprecate Standard Html Strip Analyzer in master #26719

Merged
johtani merged 5 commits into elastic:master from the remove_standard_html_analyzer branch on Jan 9, 2019

Conversation

johtani
Contributor

@johtani johtani commented Sep 20, 2017

Deprecate standard_html_strip analyzer in 6.x

  • Add deprecation log if using the analyzer
  • Cannot create index with the analyzer after 6.1.0

I will open a separate PR to remove the analyzer on master.

Closes #4704

@johtani johtani added :Search/Analysis How text is split into tokens v6.1.0 labels Sep 20, 2017
@jasontedor
Member

> Cannot create index with the analyzer after 6.1.0

This concerns me: it is a breaking change, which we should not make in a minor release. I think we can only deprecate in 6.x and must not break anything until 7.0.0.

@johtani
Contributor Author

johtani commented Sep 20, 2017

@jasontedor You are right. I will change it to only log a deprecation warning.
Can we remove the analyzer in 7.0?

@jasontedor
Member

> Can we remove the analyzer in 7.0?

Is it needed in 7.x to support indices created in 6.x with the analyzer? If so, I think we have to wait until 8.0.0 to remove it.

@johtani johtani force-pushed the remove_standard_html_analyzer branch 2 times, most recently from 7a49f86 to 1364d3f Compare October 7, 2017 12:19
@lcawl lcawl added v6.2.0 and removed v6.1.0 labels Dec 12, 2017
@colings86 colings86 added v6.3.0 and removed v6.2.0 labels Jan 22, 2018
@romseygeek
Contributor

cc @elastic/es-search-aggs

@colings86
Contributor

@johtani @romseygeek Is this still something that needs to be done? If so what should we do to get this merged?

@romseygeek
Contributor

We should still do this, I think. @johtani do you have time to take this up?

@johtani
Contributor Author

johtani commented Oct 25, 2018

@colings86 @romseygeek Oh, sorry. Yes, I have time. I will update this PR this week.

@johtani johtani force-pushed the remove_standard_html_analyzer branch from 1364d3f to af9ccc5 Compare October 26, 2018 05:44
@johtani
Contributor Author

johtani commented Oct 28, 2018

@romseygeek I've updated this. Could you review this?

Contributor

@romseygeek romseygeek left a comment

Thanks @johtani. I left some comments.

@@ -1,8 +1,5 @@
[[breaking-changes-6.1]]
== Breaking changes in 6.1
++++
Contributor

I think this is left over from a previous version?

Contributor Author

Good catch...

if (indexSettings.getIndexVersionCreated().onOrAfter(Version.V_6_5_0)) {
DEPRECATION_LOGGER.deprecatedAndMaybeLog("standard_html_strip_deprecation",
"Deprecated analyzer [standard_html_strip] used, " +
"replaced by using [html_strip] char_filter");
Contributor

nit: s/replaced/replace/

Contributor

Should probably also be explicit that you need to replace it with a custom analyzer using standard tokenizer and html_strip char_filter, plus lowercase filter and any other filters you have been using.
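
For readers looking for the concrete replacement, here is a minimal sketch of index settings for such a custom analyzer, built as a JSON string in Java. The class name, method name, and the analyzer name `my_html_analyzer` are hypothetical and chosen only for illustration; the `html_strip` char_filter, `standard` tokenizer, and `lowercase` filter come from the comment above. Any other filters you relied on would need to be added to the `filter` array.

```java
// Hypothetical helper, not part of Elasticsearch: assembles index settings
// for a custom analyzer built from the components named in the review comment.
final class HtmlStripAnalyzerSettings {

    // analyzerName is whatever name you choose for the replacement analyzer.
    static String customAnalyzerSettings(String analyzerName) {
        return "{\n"
            + "  \"settings\": {\n"
            + "    \"analysis\": {\n"
            + "      \"analyzer\": {\n"
            + "        \"" + analyzerName + "\": {\n"
            + "          \"type\": \"custom\",\n"
            + "          \"char_filter\": [\"html_strip\"],\n"
            + "          \"tokenizer\": \"standard\",\n"
            + "          \"filter\": [\"lowercase\"]\n"
            + "        }\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Prints a request body that could be sent when creating an index.
        System.out.println(customAnalyzerSettings("my_html_analyzer"));
    }
}
```

The printed JSON is intended as the body of an index-creation request; fields that previously used `standard_html_strip` would then reference the new analyzer name in their mappings.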

/**
* Check that the deprecated analyzer name "standard_html_strip" issues a deprecation warning for indices created before 6.4.0
*/
public void testStandardHtmlStripAnalyzerNoDeprecationPre6_5() throws IOException {
Contributor

This test doesn't seem to do what the comment suggests it should do? I think it can be repurposed to check that an exception is thrown if you try to create an analyzer of type standard_html_strip with an index created version greater than or equal to 7.0, and the test above can just check that the warning is emitted on any version before 7.0
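
The behaviour proposed in this comment can be sketched in isolation as follows. This is a simplified illustration, not the code in this PR: `AnalyzerVersionGate` and `resolve` are hypothetical names, the index-created version is reduced to a major version number, and the message wording is only modelled on the snippets quoted in this review.

```java
// Hypothetical, simplified stand-in for the version check discussed above;
// none of these names are Elasticsearch classes.
final class AnalyzerVersionGate {

    /**
     * Returns a deprecation warning for indices created before 7.0 and
     * throws for indices created on or after 7.0.
     *
     * @param indexCreatedMajor major version the index was created with
     */
    static String resolve(int indexCreatedMajor) {
        if (indexCreatedMajor >= 7) {
            // New indices may no longer use the analyzer at all.
            throw new IllegalArgumentException(
                "[standard_html_strip] analyzer is not supported for new indices, "
                    + "use a custom analyzer using [standard] tokenizer and "
                    + "[html_strip] char_filter, plus [lowercase] filter");
        }
        // Older indices stay usable, but every use is flagged as deprecated.
        return "Deprecated analyzer [standard_html_strip] used";
    }

    public static void main(String[] args) {
        System.out.println(resolve(6));
        try {
            resolve(7);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Two tests then fall out naturally: one asserting the warning for any pre-7.0 version, and one asserting the exception for 7.0 and later.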

Contributor Author

Ah, good catch. I forgot to add "does NOT" after copy & paste...

I'm not sure I understand your comment about 7.0. Is it about the PR for the master branch?

Contributor

Ah, I hadn't noticed that this was against 6.x. Do you have a separate PR against master? I think it makes most sense to do this against master and then backport?

Contributor Author

Ah, I made this PR only to log the deprecation, and at that time I thought we would remove this analyzer in 7.

I agree that we should do this against master first. I will change this PR to target 7. Sorry.

} else if ("standard_html_strip".equals(analyzer)) {
DEPRECATION_LOGGER.deprecatedAndMaybeLog("standard_html_strip_deprecation",
"Deprecated analyzer [standard_html_strip] used, " +
"replaced by using [html_strip] char_filter");
Contributor

s/replaced/replace/

==== Deprecated standard_html_strip analyzer

Deprecated standard_html_strip analyser in 6.5.
This will be not create in 7.0 and be removed in 8.0.
Contributor

How about:

The standard_html_strip analyzer has been deprecated, and should be replaced with a combination of the standard tokenizer and html_strip char_filter. Indexes created using this analyzer will still be readable in elasticsearch 7.0, but it will not be possible to create new indexes using it.

@johtani johtani force-pushed the remove_standard_html_analyzer branch from d6bf840 to 613c6de Compare November 1, 2018 17:10
@johtani johtani changed the base branch from 6.x to master November 1, 2018 17:10
@johtani johtani changed the title from "[Analysis] Deprecate Standard Html Strip Analyzer in 6.x" to "[Analysis] Deprecate Standard Html Strip Analyzer in master" Nov 2, 2018
@johtani johtani force-pushed the remove_standard_html_analyzer branch 2 times, most recently from fe0ed0c to 76a42c5 Compare November 2, 2018 06:01
@johtani
Contributor Author

johtani commented Nov 2, 2018

@romseygeek I changed this PR to target the master branch. Please review it again if you have time :)

Contributor

@romseygeek romseygeek left a comment

I left a couple more comments.

"use a custom analyzer using [standard] tokenizer and [html_strip] char_filter, plus [lowercase] filter");
} catch (Exception e) {
fail("expected IAE");
}
Contributor

Can you use expectThrows here?
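
As an illustration of the pattern being requested, the sketch below replaces a try/catch/fail block with an `expectThrows`-style helper. The helper defined here is a minimal stand-in written for this example so that it runs on its own (the test framework supplies its own version), and `createAnalyzer` is a hypothetical method standing in for the code under test.

```java
// Self-contained sketch of the expectThrows pattern; the helper below is a
// simplified stand-in, not the test framework's implementation.
final class ExpectThrowsSketch {

    /** Runs the action and returns the exception if it has the expected type. */
    static <T extends Throwable> T expectThrows(Class<T> expectedType, Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("Expected " + expectedType.getSimpleName() + " but nothing was thrown");
    }

    /** Hypothetical code under test: rejects the analyzer on 7.0 and later. */
    static String createAnalyzer(String name, int indexCreatedMajor) {
        if ("standard_html_strip".equals(name) && indexCreatedMajor >= 7) {
            throw new IllegalArgumentException("[standard_html_strip] analyzer is not supported for new indices");
        }
        return name;
    }

    public static void main(String[] args) {
        // One expression replaces the try block, the catch block, and fail().
        IllegalArgumentException e = expectThrows(
            IllegalArgumentException.class,
            () -> createAnalyzer("standard_html_strip", 7));
        System.out.println(e.getMessage());
    }
}
```

Because the helper returns the caught exception, the test can assert on its message directly, and a missing exception fails the test without a separate `fail(...)` call.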

@johtani
Contributor Author

johtani commented Nov 5, 2018

@romseygeek Thanks for your comments. I fixed these. Your 2nd comment means that we should issue a deprecation log for any index created with 6.x, right?

I have a question about the test.

How do we test 6.6.1 <= versions < 7.0.0_alpha1 with VersionUtils.randomVersionBetween(...) ?

@romseygeek
Contributor

> How do we test 6.6.1 <= versions < 7.0.0_alpha1 with VersionUtils.randomVersionBetween(...) ?

You can use VersionUtils.getPreviousVersion(Version.V_7_0_0_alpha1) to get the upper bound, I think?

LGTM otherwise. Thanks!
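
The idea behind this suggestion, turning an exclusive upper bound into an inclusive one by stepping back one release, can be sketched on its own. The class below is a simplified model that uses version strings and a short, made-up release list; its two methods only borrow the names of the `VersionUtils` helpers mentioned above and are not their implementation.

```java
import java.util.List;
import java.util.Random;

// Simplified model of picking a random version below an exclusive bound.
// The release list is illustrative and far shorter than the real one.
final class VersionRangeSketch {

    static final List<String> RELEASES = List.of("6.0.0", "6.5.0", "6.6.0", "7.0.0", "7.1.0");

    /** Returns the release immediately before the given one. */
    static String getPreviousVersion(String version) {
        int index = RELEASES.indexOf(version);
        if (index <= 0) {
            throw new IllegalArgumentException("no known release before " + version);
        }
        return RELEASES.get(index - 1);
    }

    /** Picks a release uniformly from the inclusive range [min, max]. */
    static String randomVersionBetween(Random random, String min, String max) {
        int low = RELEASES.indexOf(min);
        int high = RELEASES.indexOf(max);
        if (low < 0 || high < low) {
            throw new IllegalArgumentException("invalid range [" + min + ", " + max + "]");
        }
        return RELEASES.get(low + random.nextInt(high - low + 1));
    }

    public static void main(String[] args) {
        // "Anything before 7.0.0" becomes the inclusive range [6.0.0, previous(7.0.0)].
        String upper = getPreviousVersion("7.0.0");
        System.out.println(randomVersionBetween(new Random(), "6.0.0", upper));
    }
}
```

With the upper bound computed this way, the test never has to hard-code the last 6.x release, so it keeps working as new 6.x versions are added.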

@johtani
Contributor Author

johtani commented Nov 8, 2018

retest this please

@johtani johtani force-pushed the remove_standard_html_analyzer branch from 2175c32 to b2bbf27 Compare November 14, 2018 06:48
@pcsanwald
Contributor

@elasticmachine test this please

@tomcallahan tomcallahan added v7.0.0 and removed v6.4.1 labels Jan 3, 2019
Deprecate only Standard Html Strip Analyzer
If a user creates an index with the analyzer on 7.0 or later, ES throws an exception.
If an index was created before 7.0, ES issues a deprecation log.
We will remove it in 8.0.

Related elastic#4704
Use expectThrows
Change versions for deprecation log

Related elastic#4704
Change to use getPreviousVersion

Related elastic#4704
@johtani johtani force-pushed the remove_standard_html_analyzer branch 2 times, most recently from 1531b72 to af6c151 Compare January 8, 2019 06:20
Change version to 7_0_0 and fix deprecationlogger error

Related elastic#4704
@johtani johtani force-pushed the remove_standard_html_analyzer branch from af6c151 to 51bdace Compare January 8, 2019 07:41
@johtani johtani merged commit 38b698d into elastic:master Jan 9, 2019
johtani added a commit to johtani/elasticsearch that referenced this pull request Jan 11, 2019
johtani added a commit to johtani/elasticsearch that referenced this pull request Jan 11, 2019
johtani added a commit to johtani/elasticsearch that referenced this pull request Jan 17, 2019
johtani added a commit to johtani/elasticsearch that referenced this pull request Jan 18, 2019
johtani added a commit that referenced this pull request Jan 22, 2019
Backport #26719 to 6.x

Related #4704

(cherry picked from commit 38b698d)
8 participants