Searchable documentation? #8209

Closed
ivirshup opened this issue Sep 4, 2018 · 30 comments · Fixed by #9330

@ivirshup ivirshup commented Sep 4, 2018

There used to be a search bar in the documentation, which now appears to have been removed. For me, this makes the documentation site a lot less helpful.

[Screenshot, 2018-09-04 1:26:55 pm]

[Screenshot, 2018-09-04 1:27:06 pm]

Is this something that's coming back?

Member

@bryevdv bryevdv commented Sep 4, 2018

Unfortunately, Google shut down "Google Site Search", which powered the old search box (see #5911 for their email). I'd love to have a useful (ad-free) search box, but I don't know how to make that happen. Getting it back will require someone with relevant expertise to decide to step in and help lead the effort.

Bing is a possible alternative:

https://blogs.bing.com/2017-05/build-your-own-web-search-service-with-bing-custom-search
https://azure.microsoft.com/en-us/pricing/details/cognitive-services/bing-custom-search/

We would have to find a way to fund a budget for using it through NumFOCUS, unless there are any free offerings for OSS projects.

Alternatively, the docs are actually hosted on a server. If there is some indexing engine that we can run ourselves, that might be an option. Again, help from anyone with relevant expertise would be greatly appreciated.

Contributor

@mattpap mattpap commented Sep 4, 2018

Since Google still indexes Bokeh's docs, a temporary workaround is to use queries like site:bokeh.pydata.org gmapplot, which limit the search to Bokeh's website. Also, is there a reason why we don't just use sphinx's (client side) search functionality?

Member

@bryevdv bryevdv commented Sep 4, 2018

> Also, is there a reason why we don't just use sphinx's (client side) search functionality?

It's not very good, and requires dumping the entire index into a huge JS blob on every page, IIRC. I'm not sure it would work well with the way we version our docs, either. But maybe things are better now; it could be worth a try, too.
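
To make the trade-off concrete, here is a minimal sketch of what client-side search involves: the whole index has to reach the browser before any query can run. This is an illustration only and is not Sphinx's actual index format or search code. The names `buildIndex`, `search`, and the sample pages are hypothetical.

```javascript
// Hypothetical sample pages; a real docs site would have far more.
const pages = [
  { url: "user_guide/tools.html", text: "Configuring hover tools and tooltips" },
  { url: "user_guide/graph.html", text: "Network graph plots with nodes and edges" },
  { url: "reference/models.html", text: "Reference for models including hover tools" },
];

// Build an inverted index: word -> list of page URLs containing that word.
function buildIndex(docs) {
  // A null-prototype object avoids clashes with names like "constructor".
  const index = Object.create(null);
  for (const doc of docs) {
    const words = new Set(doc.text.toLowerCase().match(/[a-z0-9]+/g) || []);
    for (const word of words) {
      (index[word] = index[word] || []).push(doc.url);
    }
  }
  return index;
}

// Return the URLs of pages that contain every word in the query.
function search(index, query) {
  const words = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  if (words.length === 0) return [];
  return words
    .map((word) => index[word] || [])
    .reduce((acc, urls) => acc.filter((url) => urls.includes(url)));
}

const searchIndex = buildIndex(pages);
// Length of the serialized index: this is what every visitor would download.
const serializedLength = JSON.stringify(searchIndex).length;
```

The serialized index grows with the number of distinct words and pages, which is the "huge JS blob" concern raised above.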

Author

@ivirshup ivirshup commented Sep 5, 2018

Out of curiosity, how are the datashader, pyviz, and holoviews sites doing their search?

Contributor

@jbednar jbednar commented Sep 5, 2018

We copied the site configuration for all those sites from Bokeh originally, so alas, we have had the same problem Bokeh has; the search now just returns bogus full-web search results. :-(

We'd love to have something set up that's more useful, even if it's just a specially crafted URL for google to focus the search to our site.

@bryevdv bryevdv modified the milestone: short-term Sep 11, 2018

@hklarner hklarner commented Feb 12, 2019

A message explaining the disappearance, or a link to this issue in place of the search box, would prevent annoyed bewilderment. It's just very uncommon to offer documentation that is not searchable.

Author

@ivirshup ivirshup commented Feb 12, 2019

Would it be possible to just have the search route you to something like: https://www.google.com/search?q=site:bokeh.pydata.org+inurl:{current_version}+{query}?

Member

@bryevdv bryevdv commented Feb 12, 2019

I didn't have any luck:

[Screenshot, 2019-02-11 23:36:20]

I also tried without the {}, and with double quotes around the text. Google does not seem to like the . characters. At one point it also said I was sending suspicious traffic and made me pass a CAPTCHA. So, I can't say that's a promising start. But it gets to my earlier point:

> Getting it back will require someone with relevant expertise to decide to step in and help lead the effort.

OSS is a collaboration. It has to be. There is more work to be done to support hundreds of thousands of users than ~2 people (at present) can possibly ever do alone. If someone with the right experience (or time) can prove this out and get something further along and more demonstrably useful, I'd love to try working it into the docs.

Author

@ivirshup ivirshup commented Feb 12, 2019

Oh, that was kind of a rough f-string style template. A little more formally (using python conventions) it'd be:

f"https://www.google.com/search?q=site:bokeh.pydata.org+inurl:{version_in_url}+{query.replace(' ', '+')}"

For example, searching for "network plot" on the latest docs would send you to https://www.google.com/search?q=site:bokeh.pydata.org+inurl:latest+network+plot

Of course, I'm really glad you're doing the work maintaining this library. I'd love to help fix this, but I don't think I've got the relevant experience to fix it.

Member

@bryevdv bryevdv commented Feb 12, 2019

@ivirshup Ah, until we ditch Python 2 later this year, f-strings are not usable by us so I have not really used them either.

Unfortunately, while that works for "latest" (which was also true of what I tried earlier), it does not work for "1.0.4". Again, Google really seems to hate the dots:

[Screenshot, 2019-02-12 13:26:28]

I also tried with quotes around the version

https://www.google.com/search?q=site:bokeh.pydata.org+inurl:"1.0.4"+network+plot

Author

@ivirshup ivirshup commented Feb 13, 2019

Ah yeah, I had only checked whether it worked for inurl:latest. Anyway, I'd assumed this would get implemented in JavaScript instead of Python, since it only needs information from the current page.

So I guess like:

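function search(query) {
    // Version string for the docs page currently being viewed.
    const ver = window._BOKEH_CURRENT_VERSION
    // Join the query terms with "+" so they form a single q= parameter.
    const url = `https://www.google.com/search?q=site:bokeh.pydata.org+${ver}+${query.replace(/ /g, "+")}`
    // Navigate the current tab to the results page.
    window.open(url, "_self")
}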

So search("network plots interactive") would take you to: https://www.google.com/search?q=site:bokeh.pydata.org+1.0.4+network+plots+interactive

But maybe with some URI cleaning?
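
One possible form of that URI cleaning, as a sketch: let the standard `URLSearchParams` class do the escaping instead of replacing spaces by hand. The function name `buildSearchUrl` is hypothetical, and the site and version are passed as arguments (rather than read from `window._BOKEH_CURRENT_VERSION`) so the function also runs outside a browser. This only addresses escaping; whether Google honors the version term is a separate question.

```javascript
// Build a Google search URL restricted to one site.
// Hypothetical helper: `site` and `version` are plain arguments here.
function buildSearchUrl(site, version, query) {
  // Split on any run of whitespace so "a   b" does not yield empty terms.
  const terms = query.trim().split(/\s+/).filter(Boolean);
  // URLSearchParams percent-encodes reserved characters such as "&" and ":"
  // and writes spaces as "+".
  const params = new URLSearchParams({
    q: [`site:${site}`, version, ...terms].join(" "),
  });
  return `https://www.google.com/search?${params}`;
}

const example = buildSearchUrl("bokeh.pydata.org", "1.0.4", "network plots interactive");
// → https://www.google.com/search?q=site%3Abokeh.pydata.org+1.0.4+network+plots+interactive
```

The colon after `site` comes out as `%3A`, which is the escaped form of the same query text. A query containing `&` or `#` no longer breaks the URL, which the manual `replace` approach would not guarantee.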

Member

@bryevdv bryevdv commented Feb 13, 2019

@ivirshup That's promising! I will see if I am able to incorporate something before the next release.

EDIT: Actually, I spoke too soon. After changing the version to 1.0.2, everything still goes to latest:

[Screenshot, 2019-02-12 18:13:31]

Member

@bryevdv bryevdv commented Feb 13, 2019

At this point I think I will give the sphinx search plugin another shot. It's been several years, perhaps things have improved somewhat.

Member

@bryevdv bryevdv commented Feb 13, 2019

Another option would be to try and publish to RTD, since they have their own search indexing AFAIK, but that seems like a huge undertaking. Our docs build is very complicated and seems to not fit neatly into their expectations.

Member

@bryevdv bryevdv commented Feb 19, 2019

So, to give an update: I am going to work towards migrating our docs to RTD, possibly even engaging with them for commercial consulting on the project. I can't speculate how long this might take, but my hope is sometime in the next few months. It's a big task; if anyone wants to pitch in, help is welcome.

Member

@bryevdv bryevdv commented Apr 13, 2019

Further update: we have submitted a NumFOCUS small project grant proposal to fund paying RTD to implement a migration. Failing that, there are some other funds we can allocate. Best hope for completion is mid-2019.


@blaise-sumo blaise-sumo commented May 21, 2019

@bryevdv The RTD folks are great, and IIRC they have a special deal for open source projects on their commercial site readthedocs.com

Thanks for doing so much for the docs.

> Further update: we have submitted a NumFocus small project grant proposal to fund paying RTD to implement a migration. failing that, there are some other funds we can allocate. Best hope for completion is mid 2019

@bryevdv bryevdv added this to the 1.4 milestone Jul 16, 2019
Member

@bryevdv bryevdv commented Aug 17, 2019

Unfortunately, despite everyone's best intentions, the RTD integration did not succeed, for a variety of reasons. Our intent is now to migrate the docs off the pydata server to a static site behind CloudFront, and integrate an FTS (full-text search) service such as Algolia at their free or cheap tier. I already have an S3 bucket with the historical docs modified to work on a static site. This will still probably take a few additional months, given the time I have outside my day job to work on this.

Author

@ivirshup ivirshup commented Aug 18, 2019

Thanks for the update!

I noticed there was a search tab on the docs site over the past week or so that seems to be gone now. Any chance that could stick around until a more permanent solution is found?

Member

@bryevdv bryevdv commented Aug 18, 2019

It was never actually active on the real deployed site (the link tab did not actually function), only on the test deploys to RTD (which were not public). In any case, I have had to revert all that work. That search is 100% dependent on RTD builds, which we are not using.

Member

@bryevdv bryevdv commented Sep 30, 2019

OK, I was feeling pretty despondent about this. All the fully managed SaaS search offerings are way too expensive (starting at 50 USD/mo and going up from there). Client-side search (e.g. lunr.js) would be slow and/or require loading an enormous JS index file. Running your own FTS is both super complicated and still more expensive than we can afford (it might raise our current AWS spend by 50% or more).

There is a free service called DocSearch provided by Algolia. The default (fairly unconfigurable) result seemed really subpar to me, only showing up to 5 results in an instant search box. But I finally stumbled across this fiddle (created by one of the Algolia devs) that shows how to use their index in a more sophisticated way to get a much larger paginated set of results:

https://jsfiddle.net/maxiloc/oemnhuv4/

I am waiting to get the project enrolled in DocSearch to test this out, but fingers crossed...
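
For readers unfamiliar with paginated search, the bookkeeping involved can be sketched as follows. This is not the code from the fiddle and it makes no network request. As far as I know, `page` (zero-based) and `hitsPerPage` are Algolia search parameters and `nbHits` is a field in its search response; the helper names `pageRequest` and `pageCount` and the default page size of 20 are assumptions for illustration.

```javascript
// Build the parameters for one page of results (pages are zero-based).
// Hypothetical helper; the default of 20 hits per page is an arbitrary choice.
function pageRequest(query, page, hitsPerPage = 20) {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  return { query, page, hitsPerPage };
}

// Number of pages needed to display `nbHits` results.
function pageCount(nbHits, hitsPerPage = 20) {
  return Math.ceil(nbHits / hitsPerPage);
}
```

With 41 hits and 20 per page, `pageCount` reports three pages, and the last one is requested with `pageRequest(query, 2)`.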

Member

@bryevdv bryevdv commented Oct 1, 2019

Very happy news, got the necessary information back and plugged it in and it seems to work just fine. Here is the same fiddle above tailored to Bokeh's new index:

https://jsfiddle.net/xk1hd4qg/

I will proceed on this basis!

Member

@bryevdv bryevdv commented Oct 8, 2019

I'm afraid I have to tamp down expectations here once again. The default configured "public docsearch" is very poor. I had thought I could improve it as described above, but the Algolia staff seem to be frustrated by my asks. They told me to "run the scraper myself first" to sort out the config, but apparently just doing that and nothing else was enough to generate "64000 operations" (whatever those are), which is far enough over budget that the account was suspended. If they merge the latest PR then perhaps things will work out. If not, this was probably another dead end.

In that case, I guess realistically the only option (and the cheapest workable one I know of) would be to run our own FTS and server, but I'm not confident we have the budget for it. A DO box powerful enough might push 20 USD/mo.

Frankly this is frustrating beyond measure...

Author

@ivirshup ivirshup commented Oct 9, 2019

@bryevdv, just another possibility I stumbled on: Google's current custom site search seems to work all right. Here's an example pointed at the latest docs. There are free plans (where it runs in the user's browser), and it looks like results can be customized, which would be nice for giving reference docs higher priority.

Member

@bryevdv bryevdv commented Oct 9, 2019

@ivirshup Thank you for the suggestion, but unless something has changed that I don't know about, GCSE requires admitting ads, which I am not willing to do. (But if I am mistaken please correct me!)

EDIT: I guess they have an ad-free non-profit tier. I am not sure if we could get in under NumFOCUS status or not, since it is a sponsored project there. The Bokeh project itself is not a 501c3 (it's not any legal entity at all, for that matter). I can look into it though.

> where it runs in the user's browser

What exactly does this mean? That a search requires a huge JS bundle à la lunr or the Sphinx static search? That doesn't seem to be the case, but I really don't know how else to interpret it.

> which would be nice for giving reference docs higher priority.

Points to there not being one solution that works for everyone, I suppose. I personally would rather have narrative docs show up first.

I also would really like a solution that affords faceting by release version

Author

@ivirshup ivirshup commented Oct 9, 2019

I was thinking since the docs are hosted under pydata, maybe that could count for the non-profit tier.

> runs in the user's browser ... What exactly does this mean?

This is a little unclear to me as well. There's some mention of it being javascript based, but the page isn't too huge. I think they might build an index on their servers and ship that to the client?

> I personally would rather have narrative docs show up first.

Definitely agree with this, but it would be nice to get the reference docs for HoverTool in the first page if I search for hover. I think pages with the search term in a section heading could use a boost in the rankings.
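
The heading-boost idea can be sketched as a simple scoring function. This is only an illustration of the ranking rule described above, not how any of the search services discussed here rank results. `HEADING_BOOST`, the function names, and the two sample pages are hypothetical.

```javascript
// A heading match counts more than a body match.
// HEADING_BOOST is an arbitrary illustrative weight, not a tuned value.
const HEADING_BOOST = 5;

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// True if any word contains the term, so "hover" matches "hovertool".
function hasMatch(words, term) {
  return words.some((word) => word.includes(term));
}

function score(page, query) {
  const headingWords = tokenize(page.headings.join(" "));
  const bodyWords = tokenize(page.body);
  let total = 0;
  for (const term of tokenize(query)) {
    if (hasMatch(headingWords, term)) total += HEADING_BOOST;
    if (hasMatch(bodyWords, term)) total += 1;
  }
  return total;
}

// Return matching pages, highest score first.
function rank(pages, query) {
  return pages
    .map((page) => ({ url: page.url, score: score(page, query) }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}

// Hypothetical sample pages.
const samplePages = [
  { url: "user_guide/tools.html", headings: ["Configuring Plot Tools"],
    body: "Use the hover tool to show tooltips when hovering over glyphs." },
  { url: "reference/models/tools.html", headings: ["HoverTool"],
    body: "Render a tooltip over points." },
];
const ranked = rank(samplePages, "hover");
```

Here the reference page ranks first for "hover" because the term appears in its heading, while the narrative page still appears in the results on the strength of its body text.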

> I also would really like a solution that affords faceting by release version

This looks possible. The example I sent is limited to https://bokeh.pydata.org/en/latest/.
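
As a sketch of what faceting by version could look like on the client, the version can be read from the URL path, given that the docs URLs follow the `/en/<version>/` pattern shown above. The function names and the page paths in the usage example are hypothetical, and the code assumes absolute URLs (`new URL` throws on anything else).

```javascript
// Extract the version segment from a docs URL of the form
// https://<host>/en/<version>/...; returns null if the path does not match.
function versionOf(url) {
  const match = new URL(url).pathname.match(/^\/en\/([^/]+)(?:\/|$)/);
  return match ? match[1] : null;
}

// Keep only the results whose URL belongs to the given version.
function filterByVersion(results, version) {
  return results.filter((result) => versionOf(result.url) === version);
}

// Hypothetical search results spanning two versions.
const results = [
  { url: "https://bokeh.pydata.org/en/latest/docs/user_guide.html" },
  { url: "https://bokeh.pydata.org/en/1.0.4/docs/user_guide.html" },
  { url: "https://bokeh.pydata.org/en/1.0.4/docs/reference.html" },
];
const onlyOld = filterByVersion(results, "1.0.4");
```

Filtering after the fact like this only works if the search backend returns results from every version; restricting the index itself to one version, as in the example above, is the other approach.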

Member

@bryevdv bryevdv commented Oct 19, 2019

@mattpap @bokeh/core FYI NumFOCUS is working on registering the appropriate non-profit status with Google, at which point we should be able to create a CSE for docs.bokeh.org with ads disabled. It may take another week or so to complete. I'd propose to stall 1.4 release until week after next (if necessary) to try and get a search into the 1.4 docs.

Contributor

@mattpap mattpap commented Oct 20, 2019

> I'd propose to stall 1.4 release until week after next (if necessary) (...)

That's fine. Though it would be good to have the release done before November 2nd.

Member

@bryevdv bryevdv commented Oct 20, 2019

> That's fine. Though it would be good to have the release done before November 2nd.

Agree 100%. If the search is not available by then, I intend to just release 1.4 anyway.

Member

@bryevdv bryevdv commented Oct 28, 2019

For now, I got the client-side Sphinx search working again. We will switch to ad-free GCSE under NumFOCUS status, if we are able.
