Plugin needlessly indexes syntax highlighting in code snippets #58

saralilyb · 2018-03-26T14:24:44Z

I want to report a bug:

When I use jekyll algolia to index my blog, it chokes on one particular post, which has a lot of syntax highlighting. https://jtth.net/notes/choosing-ruby-or-clojure/. It doesn't continue, or move on, or anything, just dies.

What is the current behavior?

I get the error One of your records weights 37.36 Kb and has been rejected.

What is your expected behavior?

Indexing the post without ten thousand <span> tags for syntax highlighting non-linguistically queryable data. Or some way, at least, to exclude the post, or something.

Git repository to reproduce the issue:

Log file here: https://gist.github.com/jtth/0be2db8ab1e3780990cbf5a283cc235c

Ruby version used:

ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin17]

Jekyll version used:

jekyll 3.7.3


pixelastic · 2018-03-27T10:04:07Z

Thanks for the report and especially for the log file, it makes debugging the issue much easier.

What is happening is that the plugin is assigning an excerpt to each record. It does that automatically by calling the .excerpt method directly from Jekyll. For whatever reason, your excerpt here is including the whole syntax highlight of your first code snippet.

I've received a bunch of excerpt-related issues lately, so I think I'll have to rewrite this part and stop using the Jekyll default method.

In the meantime, you can work around the issue by adding a custom hook that will clear the excerpt. Something like this should work:

module Jekyll
  module Algolia
    module Hooks
      def self.before_indexing_each(record, node)
        record[:excerpt] = nil
        record[:excerpt_html] = nil

        record
      end
    end
  end
end

Let me know how it goes. I'll fix it in one of the next releases.

DirtyF · 2018-03-27T10:51:18Z

Could you test it on jekyll:master?
I think jekyll/jekyll#6724 tries to address this

saralilyb · 2018-03-27T15:53:53Z

@pixelastic Using that custom hook gave a different error, but I'm still somehow over the size limit.
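When a record is still over the limit, it can help to see which keys carry the weight. The following is only a rough sketch: `record_size_in_bytes` and `heaviest_keys` are hypothetical helpers, not part of the plugin, and the JSON byte count is an approximation that may not match how Algolia measures a record.

```ruby
require 'json'

# Hypothetical helper (not part of jekyll-algolia): approximate size of a
# record, measured as the byte length of its JSON serialization.
def record_size_in_bytes(record)
  JSON.generate(record).bytesize
end

# Hypothetical helper: list the keys whose serialized values are largest,
# biggest first, to spot what is inflating the record.
def heaviest_keys(record, top = 3)
  record.map { |key, value| [key, JSON.generate(value).bytesize] }
        .sort_by { |_key, size| -size }
        .first(top)
end
```

Called from inside a `before_indexing_each` hook, `heaviest_keys(record)` would show whether the excerpt keys or the content itself are responsible.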

I solved the problem, at least for myself, by just dropping an excerpt separator in that particular file before all the code, but I don't usually use breaks. I guess I will use them more.

@DirtyF As much as I'm thankful for your product and the free tier, I don't wanna build jekyll from source on my local machine and my server.

pixelastic · 2018-03-27T16:49:16Z

@jtth Would you have a repository where I could reproduce the issue? I would need access to the original markdown file, before the syntax highlighting transforms it into <span>s.

@DirtyF I don't think this is related. The issue you mention makes sure that Liquid tags are not cut in the middle by the excerpt and get correctly transformed. Here the issue is that the Liquid highlight tag is actually correctly transformed, but a few lines of code become hundreds of lines of HTML highlight soup.

Even if the latest Jekyll would fix it, I think I would need to have my own way of extracting the excerpt from the plugin, in order to maintain backward compatibility with older Jekyll versions. I might simply take the first of the nodes_to_index of the page/post (the first <p> in most cases, which is also Jekyll's default).
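As a rough illustration of the "first <p>" idea, and not the plugin's actual implementation, a helper could pull the text of the first paragraph out of the rendered HTML. `first_paragraph_text` is a hypothetical name, and the regex approach is a deliberate simplification; a real implementation would more likely use an HTML parser.

```ruby
# Hypothetical helper (not part of jekyll-algolia): return the plain text of
# the first <p> element in an HTML string, or nil when there is none.
def first_paragraph_text(html)
  # \b keeps <p ...> from matching <pre>; the m flag lets . span newlines.
  inner = html[%r{<p\b[^>]*>(.*?)</p>}m, 1]
  return nil if inner.nil?

  # Drop any inline tags (<em>, <span>, ...) left inside the paragraph.
  inner.gsub(/<[^>]+>/, '').strip
end
```

With this approach a highlighted code block that precedes the first paragraph is skipped rather than copied into the excerpt.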

saralilyb · 2018-03-27T16:53:34Z

@pixelastic Here's the raw markdown. https://gist.github.com/jtth/92e9d1f9b5ae4e060cc341f4ae55cf79.

pixelastic · 2018-03-27T16:54:37Z

@jtth I had a typo in my previous example. You should set record[:excerpt_text] to nil instead of record[:excerpt] and that should work better.
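With that correction applied, the hook from the earlier comment would look roughly like this. It is a sketch that assumes the same two-argument hook signature as above and that the file is loaded as a Jekyll plugin (for example from `_plugins/`, which is an assumption here).

```ruby
module Jekyll
  module Algolia
    module Hooks
      # Runs on each record before it is indexed.
      # Clears both excerpt keys so the highlighted markup is not pushed.
      def self.before_indexing_each(record, node)
        record[:excerpt_text] = nil
        record[:excerpt_html] = nil

        # Return the record so the rest of its content is still indexed.
        record
      end
    end
  end
end
```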

I've also pushed a potential fix on the excerpt branch of the plugin. I'd like to test it on your website before releasing a new version. Thanks for the markdown file! Would you mind sharing your _config.yml as well? I assume you changed the default excerpt_separator value?

saralilyb · 2018-03-27T17:12:27Z

@pixelastic My config is here: https://gist.github.com/jtth/87b1c0296a441701c2d69dde2cb74bcf.

I use a custom separator. I didn't even realize that a newline was the default; this is an old theme I've customized quite a bit. That makes a few other bugs make sense, like how my list of blog posts (not on index) submitted a record of the first parts of all the posts themselves rather than ending after the first paragraph. I just added separators to prevent this behavior. Maybe I'll change it back to a newline later, but now I have full text search through Algolia rather than partial search of titles or tags. (Which I'm inferring is the default behavior with a \n\n separator, right?)

Changing record[:excerpt] to record[:excerpt_text] made it work without the separator.

pixelastic · 2018-03-28T14:23:21Z

> Maybe I'll change it back to a newline later, but now I have full text search through Algolia rather than partial search of titles or tags. (Which I'm inferring is the default behavior with a \n\n separator, right?)

The default behavior is to have full text search, not only titles and tags, on the content of every page :)

I exclude a few pages that I know for sure you wouldn't want to index (like the pagination pages, or the 404). In your case, you have a preview of all your posts in /notes/, and the plugin will index that, which you don't want (otherwise if a query matches the content of the first paragraph of any post, it will suggest /notes). I see that you added notes.md to the list of files to exclude, which is exactly how you should solve this problem :)

saralilyb · 2018-03-28T14:56:02Z

Interesting. Then why would adding a separator to the one problematic post fix things?

Y'all have provided a lot of good answers already in the documentation; I just didn't read it all before diving in.

pixelastic · 2018-03-28T17:07:44Z

> Interesting. Then why would adding a separator to the one problematic post fix things?

I don't really know TBH, I would need access to your repo to dig deeper. But as the issue seems to be fixed I won't bother :)

I'm closing this one, but feel free to repost or open a new one if you're still having issues.

Cheers,

pixelastic added this to TODO in Jekyll via automation Mar 27, 2018

pixelastic moved this from TODO to In progress in Jekyll Mar 27, 2018

pixelastic closed this as completed Mar 28, 2018

Jekyll automation moved this from In progress to Done Mar 28, 2018
