build(docs-infra): improve search quality #25750

petebacondarwin · 2018-08-30T16:38:38Z

mary-poppins · 2018-08-30T16:46:37Z

You can preview f2fa79e at https://pr25750-f2fa79e.ngbuilds.io/.

IgorMinar · 2018-08-30T20:53:29Z

This seems to be better. @jenniferfell can you test your queries with this version please?

gkalpak · 2018-09-05T09:21:19Z

aio/tools/transforms/angular-base-package/processors/generateKeywords.js

+          token = token
+              .trim()
+              .replace(/^[_\-"'`({[<$*]+/, '')
+              .replace(/[_\-"'`)}\]>.]+$/, '');


This will not strip away matching brackets (e.g. foo(), bar[]). Is that intentional?
Is there any downside in including both opening and closing symbols in both .replace() RegExps?

Hmm, good spot. This means that we might end up with things like foo( in the index...

In the query lexers (on the browser) we now split the terms by whitespace only. So the following query, https://pr25750-f2fa79e.ngbuilds.io/?search=downgradeModule(), results in no terms matching. Given that we will not keep such punctuation in the index, we should modify the lexer appropriately.

petebacondarwin · 2018-09-05T10:42:20Z

We considered just zipping up and downloading the entire text content of the docs, then building the index directly from the original content. This is problematic in a number of ways:

the rendered content hides structural information such as headings, titles and most importantly members of classes
the parsing of the rendered content does not result in improvements to the searching issues highlighted
the size of the content (even zipped) is bigger than the current download
the building of the index is considerably longer (from ~3secs to up to 10secs!).

petebacondarwin · 2018-09-05T10:43:43Z

aio/src/app/search/search-worker.js

+        // E.g. if the search is "ngCont guide" then we search for "ngCont guide titleWords:ngCont*"
+        var titleQuery = 'titleWords:*' + query.split(' ', 1)[0] + '*';
+        results = index.search(query + ' ' + titleQuery);
+      }


Basically we now attempt a straight search with the given query; and only fall back on a more relaxed title based search if the original query returns nothing. This results in fewer false positives.

petebacondarwin · 2018-09-05T10:44:42Z

aio/src/app/search/search-worker.js

@@ -27,12 +27,13 @@ self.onmessage = handleMessage;
 // Create the lunr index - the docs should be an array of objects, each object containing
 // the path and search terms for a page
 function createIndex(addFn) {
+  lunr.QueryLexer.termSeparator = lunr.tokenizer.separator = /\s+/;


The built-in lexer was splitting on some punctuation, which are actually significant when searching code. This was causing false negatives. Now we just split on whitespace.

petebacondarwin · 2018-09-05T10:45:53Z

aio/tools/transforms/angular.io-package/index.js

@@ -37,7 +37,7 @@ module.exports = new Package('angular.io', [gitPackage, apiPackage, contentPacka
    checkAnchorLinksProcessor.$runBefore = ['convertToJsonProcessor'];
    checkAnchorLinksProcessor.$runAfter = ['fixInternalDocumentLinks'];
    // We only want to check docs that are going to be output as JSON docs.
-    checkAnchorLinksProcessor.checkDoc = (doc) => doc.path && doc.outputPath && extname(doc.outputPath) === '.json';
+    checkAnchorLinksProcessor.checkDoc = (doc) => doc.path && doc.outputPath && extname(doc.outputPath) === '.json' && doc.docType !== 'json-doc';


If we don't ignore json-doc files, the anchor checker get confused by the JSON escaped URLs in the rendered content of the search term file.

petebacondarwin · 2018-09-05T10:47:09Z

@jenniferfell has suggested that supporting quoted "phrase searching", to get fewer false positives, but lunr does not support this out of the box, and effort required to implement it on top of lunr, the benefit seemed unworthy.

petebacondarwin · 2018-09-05T10:47:41Z

Still to be done (in another PR):

lower the ranking of deprecated API docs

mary-poppins · 2018-09-05T15:09:21Z

You can preview 66784da at https://pr25750-66784da.ngbuilds.io/.

mary-poppins · 2018-09-14T13:19:45Z

You can preview d64323c at https://pr25750-d64323c.ngbuilds.io/.

mary-poppins · 2018-09-20T08:07:00Z

You can preview 5aaa179 at https://pr25750-5aaa179.ngbuilds.io/.

mary-poppins · 2018-09-20T08:46:02Z

You can preview 4bfae61 at https://pr25750-4bfae61.ngbuilds.io/.

jenniferfell · 2018-09-20T13:35:09Z

Site search for http:

API results are much better. Putting all the deprecated results last is a big improvement.
API results with strikethrough for deprecated APIs LGTM. (fwiw...for others reading this who might not know...Google Dev Doc Style Guide doesn't include this convention, nor do Android dev docs, but I do see it in xcode and other 3rd party docs. It's an emerging convention, and it's much cleaner than the word "deprecated" in the results list.)
Hmmm....Guide results are odd. HTTPClient is 3rd in the results. But at least still in the top 3. Not sure if anything can be done on this without breaking something else. We might need to look at why and then maybe tweak content of guides slightly? Is this because matches in titles aren't taking precedence any more?

Site search for node.js

Celebration! :-) Perfect!

API index page:

Strikethrough of deprecated LGTM
When package type selected, disabled status field LGTM

Badges:

Having "deprecated" back again is great, but it would be nice to use a more alert-like color than gray. Can we switch and have ngmodules be gray and deprecated be the red that ngmodules is right now? Or maybe an orange, because security risk is (beautifully and perfectly) red?

API package and type pages for deprecated APIs:

LGTM. Pages for types such as HttpModule page LGTM. I like the deprecated badge and the deprecated bold in-line title at the end of the description.
Package overview pages. Having strikethrough and Deprecated in bold for things that are deprecated is great. It's a bit heavy when the entire package is deprecated, but I prefer keeping that for consistency...and I really like having it in packages like Common where only a few things are deprecated.

No show stoppers. MUCH IMPROVED!!

petebacondarwin · 2018-09-20T13:42:23Z

Great. I will fix the broken Travis test and swap the colours. Then mark it for merge.

mary-poppins · 2018-09-20T18:41:21Z

You can preview c316ecd at https://pr25750-c316ecd.ngbuilds.io/.

petebacondarwin · 2018-09-20T18:47:47Z

Regarding colour changes. This is what I came up with:

mary-poppins · 2018-09-20T18:54:56Z

You can preview e638808 at https://pr25750-e638808.ngbuilds.io/.

JoostK · 2018-09-20T19:01:41Z

There's something weird going on with the search results, it seems.

I was exploring the various modules to see their badge color and started searching for ReactiveFormsModule. If you arrive at typing React (I'm very sorry 😆) then all of a sudden NgElementStrategy is the only result, whereas one character less, or more, will resurface ReactiveFormsModule. The current deployment on angular.io does not show the same behaviour.

Regardless of this minor thing, this PR brings a huge improvement to the search results overall!

petebacondarwin · 2018-09-21T08:45:41Z

@JoostK thanks for testing this. I can explain why this is happening but I am not sure we can do much better. Here is what happens...

reac
- The stemmer does not convert reac into any matching keyword
- So no results from a normal query
- trigger the special title query, which is much more relaxed and matches any chars in the title
- ReactiveFormsModule API page and Reactive Forms guide page match
react
- The stemmer does convert this to match keywords such as react and reacts, but not reactive it seems.
- NgElementStrategy contains the word react in its text, so is included in the results
- Component Interaction contains the word reacts in its text, so is included in the results
- ReactiveFormsModule only contains the word reactive and not react or reacts etc, so is not included in the results
reacti
- once again the stemmer fails to match keywords and so we are back to the special title query again, with the same results as for reac.

We don't have any control over the stemmer, AFAIK, so there is not so much we can do.

mary-poppins · 2018-09-21T09:07:47Z

You can preview 8ca28bb at https://pr25750-8ca28bb.ngbuilds.io/.

PR Close #25750

…5750) PR Close #25750

PR Close #25750

…5750) PR Close #25750

PR Close angular#25750

…gular#25750) PR Close angular#25750

angular-automatic-lock-bot · 2019-09-14T01:52:51Z

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

_{This action has been performed automatically by a bot.}

petebacondarwin added action: review The PR is still awaiting reviews from at least one requested reviewer effort1: hours severity1: confusing target: patch This PR is targeted for the next patch release labels Aug 30, 2018

petebacondarwin added this to the Backlog milestone Aug 30, 2018

googlebot added the cla: yes label Aug 30, 2018

ngbot bot modified the milestones: Backlog, needsTriage Aug 30, 2018

petebacondarwin added this to REVIEW in docs-infra Aug 31, 2018

gkalpak approved these changes Sep 5, 2018

View reviewed changes

petebacondarwin commented Sep 5, 2018

View reviewed changes

jenniferfell self-requested a review September 13, 2018 20:27

petebacondarwin added action: merge The PR is ready for merge by the caretaker and removed action: review The PR is still awaiting reviews from at least one requested reviewer labels Sep 15, 2018

petebacondarwin moved this from REVIEW to MERGE in docs-infra Sep 15, 2018

petebacondarwin modified the milestones: needsTriage, Backlog Sep 17, 2018

ngbot bot modified the milestones: Backlog, needsTriage Sep 17, 2018

petebacondarwin added the merge: caretaker note Alert the caretaker performing the merge to check the PR for an out of normal action needed or note label Sep 17, 2018

petebacondarwin force-pushed the aio-search-improvments branch from 82d5d0c to 5aaa179 Compare September 20, 2018 07:58

petebacondarwin force-pushed the aio-search-improvments branch from 5aaa179 to 4bfae61 Compare September 20, 2018 08:36

jenniferfell approved these changes Sep 20, 2018

View reviewed changes

petebacondarwin force-pushed the aio-search-improvments branch from 4bfae61 to c316ecd Compare September 20, 2018 18:33

petebacondarwin force-pushed the aio-search-improvments branch from c316ecd to e638808 Compare September 20, 2018 18:47

build(docs-infra): expose deprecated status on items more clearly

8ca28bb

petebacondarwin force-pushed the aio-search-improvments branch from e638808 to 8ca28bb Compare September 21, 2018 09:00

petebacondarwin added the action: merge The PR is ready for merge by the caretaker label Sep 21, 2018

kara pushed a commit that referenced this pull request Sep 21, 2018

build(docs-infra): improve search quality (#25750)

ec96332

PR Close #25750

kara pushed a commit that referenced this pull request Sep 21, 2018

fix(docs-infra): render security risk labels (#25750)

4f6bef5

PR Close #25750

kara pushed a commit that referenced this pull request Sep 21, 2018

build(docs-infra): expose deprecated status on items more clearly (#2…

6945223

…5750) PR Close #25750

kara closed this in 96f9f03 Sep 21, 2018

kara pushed a commit that referenced this pull request Sep 21, 2018

fix(docs-infra): render security risk labels (#25750)

cea2e04

PR Close #25750

kara pushed a commit that referenced this pull request Sep 21, 2018

build(docs-infra): expose deprecated status on items more clearly (#2…

026b60c

…5750) PR Close #25750

petebacondarwin deleted the aio-search-improvments branch September 21, 2018 18:41

FrederikSchlemmer pushed a commit to FrederikSchlemmer/angular that referenced this pull request Jan 3, 2019

build(docs-infra): improve search quality (angular#25750)

8b617c6

PR Close angular#25750

FrederikSchlemmer pushed a commit to FrederikSchlemmer/angular that referenced this pull request Jan 3, 2019

fix(docs-infra): render security risk labels (angular#25750)

6fae71a

PR Close angular#25750

FrederikSchlemmer pushed a commit to FrederikSchlemmer/angular that referenced this pull request Jan 3, 2019

build(docs-infra): expose deprecated status on items more clearly (an…

3cbcdff

…gular#25750) PR Close angular#25750

angular-automatic-lock-bot bot locked and limited conversation to collaborators Sep 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

build(docs-infra): improve search quality #25750

build(docs-infra): improve search quality #25750

petebacondarwin commented Aug 30, 2018 •

edited

mary-poppins commented Aug 30, 2018

IgorMinar commented Aug 30, 2018

gkalpak Sep 5, 2018

petebacondarwin Sep 5, 2018

petebacondarwin Sep 5, 2018

petebacondarwin Sep 5, 2018

petebacondarwin commented Sep 5, 2018

petebacondarwin Sep 5, 2018

petebacondarwin Sep 5, 2018

petebacondarwin Sep 5, 2018

petebacondarwin commented Sep 5, 2018

petebacondarwin commented Sep 5, 2018

mary-poppins commented Sep 5, 2018

mary-poppins commented Sep 14, 2018

mary-poppins commented Sep 20, 2018

mary-poppins commented Sep 20, 2018

jenniferfell commented Sep 20, 2018

petebacondarwin commented Sep 20, 2018

mary-poppins commented Sep 20, 2018

petebacondarwin commented Sep 20, 2018

mary-poppins commented Sep 20, 2018

JoostK commented Sep 20, 2018

petebacondarwin commented Sep 21, 2018 •

edited

mary-poppins commented Sep 21, 2018

angular-automatic-lock-bot bot commented Sep 14, 2019

build(docs-infra): improve search quality #25750

build(docs-infra): improve search quality #25750

Conversation

petebacondarwin commented Aug 30, 2018 • edited

mary-poppins commented Aug 30, 2018

IgorMinar commented Aug 30, 2018

gkalpak Sep 5, 2018

Choose a reason for hiding this comment

petebacondarwin Sep 5, 2018

Choose a reason for hiding this comment

petebacondarwin Sep 5, 2018

Choose a reason for hiding this comment

petebacondarwin Sep 5, 2018

Choose a reason for hiding this comment

petebacondarwin commented Sep 5, 2018

petebacondarwin Sep 5, 2018

Choose a reason for hiding this comment

petebacondarwin Sep 5, 2018

Choose a reason for hiding this comment

petebacondarwin Sep 5, 2018

Choose a reason for hiding this comment

petebacondarwin commented Sep 5, 2018

petebacondarwin commented Sep 5, 2018

mary-poppins commented Sep 5, 2018

mary-poppins commented Sep 14, 2018

mary-poppins commented Sep 20, 2018

mary-poppins commented Sep 20, 2018

jenniferfell commented Sep 20, 2018

petebacondarwin commented Sep 20, 2018

mary-poppins commented Sep 20, 2018

petebacondarwin commented Sep 20, 2018

mary-poppins commented Sep 20, 2018

JoostK commented Sep 20, 2018

petebacondarwin commented Sep 21, 2018 • edited

mary-poppins commented Sep 21, 2018

angular-automatic-lock-bot bot commented Sep 14, 2019

petebacondarwin commented Aug 30, 2018 •

edited

petebacondarwin commented Sep 21, 2018 •

edited