Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build(docs-infra): improve search quality #25750

Closed

Conversation

petebacondarwin
Copy link
Member

@petebacondarwin petebacondarwin commented Aug 30, 2018

Fixes #24380
Fixes #25721

@petebacondarwin petebacondarwin added action: review The PR is still awaiting reviews from at least one requested reviewer effort1: hours severity1: confusing target: patch This PR is targeted for the next patch release labels Aug 30, 2018
@petebacondarwin petebacondarwin added this to the Backlog milestone Aug 30, 2018
@ngbot ngbot bot modified the milestones: Backlog, needsTriage Aug 30, 2018
@mary-poppins
Copy link

You can preview f2fa79e at https://pr25750-f2fa79e.ngbuilds.io/.

@IgorMinar
Copy link
Contributor

This seems to be better. @jenniferfell can you test your queries with this version please?

@petebacondarwin petebacondarwin added this to REVIEW in docs-infra Aug 31, 2018
token = token
.trim()
.replace(/^[_\-"'`({[<$*]+/, '')
.replace(/[_\-"'`)}\]>.]+$/, '');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not strip away matching brackets (e.g. foo(), bar[]). Is that intentional?
Is there any downside in including both opening and closing symbols in both .replace() RegExps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good spot. This means that we might end up with things like foo( in the index...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the query lexers (on the browser) we now split the terms by whitespace only. So the following query, https://pr25750-f2fa79e.ngbuilds.io/?search=downgradeModule(), results in no terms matching. Given that we will not keep such punctuation in the index, we should modify the lexer appropriately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@petebacondarwin
Copy link
Member Author

We considered just zipping up and downloading the entire text content of the docs, then building the index directly from the original content. This is problematic in a number of ways:

  • the rendered content hides structural information such as headings, titles and most importantly members of classes
  • the parsing of the rendered content does not result in improvements to the searching issues highlighted
  • the size of the content (even zipped) is bigger than the current download
  • the building of the index is considerably longer (from ~3secs to up to 10secs!).

// E.g. if the search is "ngCont guide" then we search for "ngCont guide titleWords:ngCont*"
var titleQuery = 'titleWords:*' + query.split(' ', 1)[0] + '*';
results = index.search(query + ' ' + titleQuery);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we now attempt a straight search with the given query; and only fall back on a more relaxed title based search if the original query returns nothing. This results in fewer false positives.

@@ -27,12 +27,13 @@ self.onmessage = handleMessage;
// Create the lunr index - the docs should be an array of objects, each object containing
// the path and search terms for a page
function createIndex(addFn) {
lunr.QueryLexer.termSeparator = lunr.tokenizer.separator = /\s+/;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The built-in lexer was splitting on some punctuation, which are actually significant when searching code. This was causing false negatives. Now we just split on whitespace.

@@ -37,7 +37,7 @@ module.exports = new Package('angular.io', [gitPackage, apiPackage, contentPacka
checkAnchorLinksProcessor.$runBefore = ['convertToJsonProcessor'];
checkAnchorLinksProcessor.$runAfter = ['fixInternalDocumentLinks'];
// We only want to check docs that are going to be output as JSON docs.
checkAnchorLinksProcessor.checkDoc = (doc) => doc.path && doc.outputPath && extname(doc.outputPath) === '.json';
checkAnchorLinksProcessor.checkDoc = (doc) => doc.path && doc.outputPath && extname(doc.outputPath) === '.json' && doc.docType !== 'json-doc';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't ignore json-doc files, the anchor checker get confused by the JSON escaped URLs in the rendered content of the search term file.

@petebacondarwin
Copy link
Member Author

@jenniferfell has suggested that supporting quoted "phrase searching", to get fewer false positives, but lunr does not support this out of the box, and effort required to implement it on top of lunr, the benefit seemed unworthy.

@petebacondarwin
Copy link
Member Author

Still to be done (in another PR):

  • lower the ranking of deprecated API docs

@mary-poppins
Copy link

You can preview 66784da at https://pr25750-66784da.ngbuilds.io/.

@mary-poppins
Copy link

You can preview d64323c at https://pr25750-d64323c.ngbuilds.io/.

@petebacondarwin petebacondarwin added action: merge The PR is ready for merge by the caretaker and removed action: review The PR is still awaiting reviews from at least one requested reviewer labels Sep 15, 2018
@petebacondarwin petebacondarwin moved this from REVIEW to MERGE in docs-infra Sep 15, 2018
@petebacondarwin petebacondarwin modified the milestones: needsTriage, Backlog Sep 17, 2018
@ngbot ngbot bot modified the milestones: Backlog, needsTriage Sep 17, 2018
@petebacondarwin petebacondarwin added the merge: caretaker note Alert the caretaker performing the merge to check the PR for an out of normal action needed or note label Sep 17, 2018
@mary-poppins
Copy link

You can preview 5aaa179 at https://pr25750-5aaa179.ngbuilds.io/.

@mary-poppins
Copy link

You can preview 4bfae61 at https://pr25750-4bfae61.ngbuilds.io/.

@jenniferfell
Copy link
Contributor

Site search for http:

  • API results are much better. Putting all the deprecated results last is a big improvement.

  • API results with strikethrough for deprecated APIs LGTM. (fwiw...for others reading this who might not know...Google Dev Doc Style Guide doesn't include this convention, nor do Android dev docs, but I do see it in xcode and other 3rd party docs. It's an emerging convention, and it's much cleaner than the word "deprecated" in the results list.)

  • Hmmm....Guide results are odd. HTTPClient is 3rd in the results. But at least still in the top 3. Not sure if anything can be done on this without breaking something else. We might need to look at why and then maybe tweak content of guides slightly? Is this because matches in titles aren't taking precedence any more?

Site search for node.js

  • Celebration! :-) Perfect!

API index page:

  • Strikethrough of deprecated LGTM
  • When package type selected, disabled status field LGTM

Badges:

  • Having "deprecated" back again is great, but it would be nice to use a more alert-like color than gray. Can we switch and have ngmodules be gray and deprecated be the red that ngmodules is right now? Or maybe an orange, because security risk is (beautifully and perfectly) red?

API package and type pages for deprecated APIs:

  • LGTM. Pages for types such as HttpModule page LGTM. I like the deprecated badge and the deprecated bold in-line title at the end of the description.
  • Package overview pages. Having strikethrough and Deprecated in bold for things that are deprecated is great. It's a bit heavy when the entire package is deprecated, but I prefer keeping that for consistency...and I really like having it in packages like Common where only a few things are deprecated.

No show stoppers. MUCH IMPROVED!!

@petebacondarwin
Copy link
Member Author

Great. I will fix the broken Travis test and swap the colours. Then mark it for merge.

@mary-poppins
Copy link

You can preview c316ecd at https://pr25750-c316ecd.ngbuilds.io/.

@petebacondarwin
Copy link
Member Author

Regarding colour changes. This is what I came up with:

screen shot 2018-09-20 at 19 47 13

@mary-poppins
Copy link

You can preview e638808 at https://pr25750-e638808.ngbuilds.io/.

@JoostK
Copy link
Member

JoostK commented Sep 20, 2018

There's something weird going on with the search results, it seems.

I was exploring the various modules to see their badge color and started searching for ReactiveFormsModule. If you arrive at typing React (I'm very sorry 😆) then all of a sudden NgElementStrategy is the only result, whereas one character less, or more, will resurface ReactiveFormsModule. The current deployment on angular.io does not show the same behaviour.

Regardless of this minor thing, this PR brings a huge improvement to the search results overall!

@petebacondarwin
Copy link
Member Author

petebacondarwin commented Sep 21, 2018

@JoostK thanks for testing this. I can explain why this is happening but I am not sure we can do much better. Here is what happens...

  • reac
    • The stemmer does not convert reac into any matching keyword
    • So no results from a normal query
    • trigger the special title query, which is much more relaxed and matches any chars in the title
    • ReactiveFormsModule API page and Reactive Forms guide page match
  • react
    • The stemmer does convert this to match keywords such as react and reacts, but not reactive it seems.
    • NgElementStrategy contains the word react in its text, so is included in the results
    • Component Interaction contains the word reacts in its text, so is included in the results
    • ReactiveFormsModule only contains the word reactive and not react or reacts etc, so is not included in the results
  • reacti
    • once again the stemmer fails to match keywords and so we are back to the special title query again, with the same results as for reac.

We don't have any control over the stemmer, AFAIK, so there is not so much we can do.

@mary-poppins
Copy link

You can preview 8ca28bb at https://pr25750-8ca28bb.ngbuilds.io/.

@petebacondarwin petebacondarwin added the action: merge The PR is ready for merge by the caretaker label Sep 21, 2018
kara pushed a commit that referenced this pull request Sep 21, 2018
kara pushed a commit that referenced this pull request Sep 21, 2018
@kara kara closed this in 96f9f03 Sep 21, 2018
kara pushed a commit that referenced this pull request Sep 21, 2018
@petebacondarwin petebacondarwin deleted the aio-search-improvments branch September 21, 2018 18:41
FrederikSchlemmer pushed a commit to FrederikSchlemmer/angular that referenced this pull request Jan 3, 2019
FrederikSchlemmer pushed a commit to FrederikSchlemmer/angular that referenced this pull request Jan 3, 2019
FrederikSchlemmer pushed a commit to FrederikSchlemmer/angular that referenced this pull request Jan 3, 2019
@angular-automatic-lock-bot
Copy link

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Sep 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
action: merge The PR is ready for merge by the caretaker cla: yes effort1: hours target: patch This PR is targeted for the next patch release
Projects
Development

Successfully merging this pull request may close these issues.

Indicate deprecated APIs in package overview pages provide more cues for deprecated APIs
7 participants