Implemented SearchIndex module based on lunr.js #118

Merged
merged 9 commits into Wyamio:develop on Nov 9, 2015

Member

reliak commented Nov 8, 2015

To add items to the search index, set the SearchIndexItem metadata on each document, for example:

Meta("SearchIndexItem", new SearchIndexItem((string)@doc["Url"],
(string)@doc["Title"], @doc.Content))

Values for the document URL, title and content are required.

On the JavaScript side, lunr.min.js is a prerequisite and ships with the example. jQuery is not a dependency, but the example project uses it for convenience.
The SearchIndex module generates JavaScript code that lunr.js can load.
All code is wrapped in a searchModule; call searchModule.search(query) to search the index. The function returns an array of objects of the following shape:

{ url, title, description }

This information can be used to display the results in a pleasant way and to generate a URL that links to the respective document.
The example for the module is based on the GeneratedBlog example.
It uses jQuery to dynamically show search results while typing.
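
As a rough sketch of how a page might consume the results, the snippet below turns the array returned by searchModule.search(query) into HTML links. The generated searchModule is replaced here by a hand-written stand-in that only mimics the { url, title, description } result shape, and renderResults and escapeHtml are hypothetical helper names, not part of the module.

```javascript
// Stand-in for the generated searchModule; the real one is backed by a lunr.js index.
// Only the result shape { url, title, description } is taken from the description above.
const searchModule = {
  items: [
    { url: "/posts/first", title: "First Post", description: "Hello from the first post" },
    { url: "/posts/second", title: "Second Post", description: "More about search" }
  ],
  // Naive substring match, used here only so the example runs without lunr.js.
  search(query) {
    const q = query.toLowerCase();
    return this.items.filter(item =>
      (item.title + " " + item.description).toLowerCase().includes(q));
  }
};

// Escape text before placing it into markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build one list item per search result.
function renderResults(results) {
  return results
    .map(r => `<li><a href="${escapeHtml(r.url)}">${escapeHtml(r.title)}</a>: ${escapeHtml(r.description)}</li>`)
    .join("\n");
}

const html = renderResults(searchModule.search("search"));
console.log(html);
```

In the example project the same idea is wired to a text box with jQuery, so the list is rebuilt on each keystroke.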

I added the option to supply a stopword list to reduce the size of the generated index. The stoplist built into lunr.js is disabled. Maintaining the stopword list via Wyam is more flexible, because users can easily specify stopword lists for other languages.
By default, the lunr.js stemmer is disabled because I think it produces too many false positives, but it can be enabled in the constructor.
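
To illustrate why a stopword list shrinks the index, the sketch below tokenizes a piece of text and drops stopwords before the terms would be indexed. This is not the module's actual implementation; the tokenizer, the stopword list, and the function name indexTerms are assumptions made for the example.

```javascript
// Illustrative stopword list; a real list would be longer and language-specific.
const stopwords = new Set(["the", "a", "is", "and", "of", "to"]);

// Hypothetical helper: split text into lowercase terms and drop stopwords,
// returning the distinct terms that would end up in the index.
// The simple a-z/0-9 tokenizer is an assumption and ignores non-ASCII letters.
function indexTerms(text, stopwordSet) {
  const terms = text.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
  return [...new Set(terms.filter(t => !stopwordSet.has(t)))];
}

const content = "The index is a map of terms to documents and the map is large";
console.log(indexTerms(content, new Set()).length);  // distinct terms without a stopword list
console.log(indexTerms(content, stopwords).length);  // distinct terms with the stopword list
```

Fewer distinct terms means fewer entries in the generated index file, which is the size reduction described above.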

Member

daveaglick commented Nov 9, 2015

Looking at this now - don't worry about rebasing the PR, I've resolved the conflicts on my local branch.

daveaglick merged commit 8d9e6d4 into Wyamio:develop on Nov 9, 2015

Member

daveaglick commented Nov 9, 2015

This was awesome, really excellent work. The example really sells it too, so thanks for including that. I'm going to push this in the next release - as far as I know, we've now got the only static site generator that can automatically generate client-side searching support.

I've merged the PR because I want to start using it right away. However, I do have a couple comments and observations that may warrant discussion.

  • I'm still thinking about this module and the Sitemap module pulling in all documents in the build vs. using input documents. I know we added a fluent method to Sitemap to use the input documents instead, but I'm leaning towards flipping the default and making the behavior of using all documents opt-in in these two modules. That would better match the other existing modules - in fact, these two are the only place that IExecutionContext.Documents is called outside of the Documents module. Perhaps I'll even create a global extension, .WithAllDocuments() that would turn on this behavior for any module. That would make them all consistent and would just require one extra fluent method to make any module use all documents from the build instead of the input documents as these do now. Thoughts?
  • I've also been thinking about your use of a special configuration object that gets placed in the metadata of documents to be processed that communicates settings to the module for each of these documents. I think it's great - I hadn't ever considered this more document-centric approach to configuring modules before and can see a lot of benefits. That said, there are cases where we may want or need to configure via a more functional paradigm (for example, to limit memory consumption by the metadata items when using large numbers of documents). I'll add an extra fluent method and/or constructor to SearchIndex that takes a delegate which can be used for creating the SearchIndexItem on the fly instead of storing in metadata (and the same for Sitemap). That won't impact the current usage at all, but will just enable an additional way of configuring it if the user wants to.
  • It's interesting that SearchIndex concats the output with the input documents. I'm not strongly opposed to that behavior, but it is different than other modules and might be unexpected. If SearchIndex were to only output the new document it generates, the same concat behavior could be achieved by simply wrapping the SearchIndex module with a Concat module. Same goes for Sitemap - after a closer look, that module is writing its output directly to disk. It should probably output the sitemap as a new document result so that downstream modules can do what they need to with it (in which case the filename that gets passed in would be used to set the result document's RelativeFilePath, etc. metadata). The only module that should really be writing to disk is WriteFiles - that supports the broadest range of use cases since we may eventually enable scenarios where a build can be directly uploaded to FTP, etc. without going to the disk first.
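
The composition idea in the last point can be modeled in a few lines. The sketch below is a language-neutral model written in JavaScript, not Wyam's API: a "module" is just a function from input documents to output documents, and buildIndex and concat are hypothetical names chosen for the example.

```javascript
// A "module" here is a function from input documents to output documents.
// This models the idea discussed above; it is not Wyam's actual API.

// Hypothetical index module: emits only the one document it generates.
function buildIndex(inputs) {
  const entries = inputs.map(doc => ({ url: doc.url, title: doc.title }));
  return [{ path: "searchIndex.js", content: JSON.stringify(entries) }];
}

// Hypothetical wrapper: passes the inputs through and appends the wrapped module's output.
function concat(inner) {
  return inputs => inputs.concat(inner(inputs));
}

const pages = [
  { path: "a.html", url: "/a", title: "A" },
  { path: "b.html", url: "/b", title: "B" }
];

console.log(buildIndex(pages).map(d => d.path));          // only the generated document
console.log(concat(buildIndex)(pages).map(d => d.path));  // inputs plus the generated document
```

Keeping the index module free of pass-through logic leaves the choice to the pipeline author: wrap it to get the combined output, or use it bare to get only the index document.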

Altogether, fantastic work. I'm really excited to have you on board as a team member.

Member

reliak commented Nov 9, 2015

Thanks for your kind words, I'm glad you like it.

Regarding your points:

  • I think you are right, we should keep it more consistent with the rest of Wyam.
    In fact, I have begun modifying the Sitemap module so that it doesn't write the content itself
    but sends it back to the stream, so that a following WriteFiles module can do the job.
  • I'm totally with you. If you find a way to make it more functional and thus reduce memory consumption, go for it.
  • You're totally right. That's almost the same as what I wanted to write, after better understanding the workflow and internals of Wyam :-)
    As pointed out in point 1 (and in your comment), I will rework the Sitemap and also the SearchIndex module accordingly.

It would be nice if you kept me up to date on how the search index works for your documentation.
You might need to extend the stopword list to reduce the size of the search index.
I'm especially interested in how big the index file gets in your case. Some more work might be required to reduce the size of the index.

"I'm really excited to have you on board as a team member."

Glad to be on board :)

daveaglick added a commit that referenced this pull request Nov 9, 2015

SearchIndex now uses a delegate for getting the SearchIndexItem and reads from the input documents instead of getting all documents, re #118

Member

daveaglick commented Nov 9, 2015

I was really anxious to see how this worked, so I went ahead and made the changes discussed above so that I could integrate into the new Wyam site. It worked perfectly! Here's a screen shot:

[Screenshot: 2015-11-09_16h30_26]

I've limited the search index to the names of classes and interfaces to make sure it was small enough, but that's really all I need right now. At some point I may investigate how to get the JS search index file to load asynchronously in the background so that we can load larger index files without noticeable lag, but this works great for now.
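
One way the background loading mentioned above could look is sketched below: the index script is injected with the async flag and a callback fires once it has loaded. This is only a sketch under assumptions; loadSearchIndex and enableSearchBox are hypothetical names, and the script path is assumed for illustration.

```javascript
// Hypothetical helper: inject the index script with the async flag so the page
// keeps rendering while the file downloads; onReady runs once it has loaded.
// The document object is passed in so the function can also be exercised outside a browser.
function loadSearchIndex(doc, src, onReady) {
  const script = doc.createElement("script");
  script.src = src;
  script.async = true;
  script.onload = onReady;
  doc.head.appendChild(script);
  return script;
}

// In a browser this might be called as (path and callback assumed for illustration):
// loadSearchIndex(document, "/Scripts/searchIndex.js", () => enableSearchBox());
```

Until the callback has run, the search box would need to stay disabled or queue queries, since searchModule does not exist yet.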

For context, here are the relevant segments of the config file:

Pipelines.Add("Code",
    //ReadSolution(@"..\Code\Wyam\Wyam.sln")  // Read from the master Wyam branch in the Git submodule
    ReadSolution(@"..\..\Wyam\Wyam.sln")  // Read from the current branch on disk in the actual repo
        .WhereProject(x => !x.EndsWith(".Tests"))
);

Pipelines.Add("API",
    Documents("Code"),
    AnalyzeCSharp()
        .WhereNamespaces(false)
        .WhereNamespaces(x => !x.StartsWith("Wyam.Modules.Razor.Microsoft"))
        .WherePublic()
        .WithCssClasses("pre", "prettyprint")
        .WithWritePathPrefix("api"),
    Razor()
        .WithViewStart(@"api\_ApiViewStart.cshtml"),
    ConcatBranch(
        Where(@doc.String("Kind") == "NamedType"),
        SearchIndex(new SearchIndexItem(
            "/" + @doc.String("WritePath").Replace("/index.html", string.Empty), 
            @doc.String("DisplayName"), 
            @doc.String("DisplayName"))),
        Meta("WritePath", @"Scripts\searchIndex.js")
    ),
    WriteFiles()
);

That loads the source files in the "Code" pipeline, then uses the AnalyzeCSharp module to output one document per symbol with some specific settings, runs each symbol document through a Razor template, and finally concatenates the search index. You can see I'm using the new delegate support to create a SearchIndexItem on the fly for each input document (notice the magic @doc syntax, which translates to a lambda).

Great work!

Member

reliak commented Nov 9, 2015

Cool, that really looks good! The flexibility of Wyam really shines here.

daveaglick added a commit that referenced this pull request Nov 22, 2015
