Plugin needlessly indexes syntax highlighting in code snippets #58

Closed
saralilyb opened this issue Mar 26, 2018 · 10 comments

@saralilyb

I want to report a bug:

When I use jekyll algolia to index my blog, it chokes on one particular post, which has a lot of syntax highlighting. https://jtth.net/notes/choosing-ruby-or-clojure/. It doesn't continue, or move on, or anything, just dies.

What is the current behavior?

I get the error "One of your records weights 37.36 Kb and has been rejected."

What is your expected behavior?

The post should be indexed without the ten thousand <span> tags that syntax highlighting adds, since those are not linguistically queryable data. Failing that, there should at least be some way to exclude the post.

Git repository to reproduce the issue:

Log file here: https://gist.github.com/jtth/0be2db8ab1e3780990cbf5a283cc235c

Ruby version used:

ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin17]

Jekyll version used:

jekyll 3.7.3

@pixelastic
Collaborator

pixelastic commented Mar 27, 2018

Thanks for the report and especially for the log file, it makes debugging the issue much easier.

What is happening is that the plugin assigns an excerpt to each record. It does that automatically by calling the .excerpt method directly from Jekyll. For whatever reason, your excerpt here includes the whole syntax-highlighted markup of your first code snippet.
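To see which field pushes a record over the size limit, one option is to measure each field's JSON-encoded size. The following is a minimal sketch, not part of the plugin: the helper name `field_sizes` and the sample record are hypothetical, and only the `excerpt_html` and `excerpt_text` keys come from this thread.

```ruby
require 'json'

# Hypothetical helper (not part of jekyll-algolia): returns [key, bytes] pairs
# for a record, sorted so the largest JSON-encoded field comes first.
def field_sizes(record)
  record.map { |key, value| [key, value.to_json.bytesize] }
        .sort_by { |_key, size| -size }
end

# Made-up record imitating an excerpt bloated by highlighting markup.
record = {
  title: 'Choosing Ruby or Clojure',
  excerpt_text: 'def ' * 1000,
  excerpt_html: '<span class="k">def</span> ' * 1000
}

field_sizes(record).each do |key, size|
  puts format('%-14s %8d bytes', key, size)
end
```

Called from inside a hook, a helper like this would show whether the excerpt fields are what make a record too large.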

I have received a bunch of excerpt-related issues lately, so I think I'll have to rewrite this part and stop using the Jekyll default method.

In the meantime, you can work around the issue by adding a custom hook that will clear the excerpt. Something like this should work:

module Jekyll
  module Algolia
    module Hooks
      def self.before_indexing_each(record, node)
        record[:excerpt] = nil
        record[:excerpt_html] = nil

        record
      end
    end
  end
end

Let me know how it goes. I'll fix it in one of the next releases.

@pixelastic pixelastic added this to TODO in Jekyll via automation Mar 27, 2018
@pixelastic pixelastic moved this from TODO to In progress in Jekyll Mar 27, 2018
@DirtyF
Contributor

DirtyF commented Mar 27, 2018

Could you test it on jekyll:master?
I think jekyll/jekyll#6724 tries to address this

@saralilyb
Author

@pixelastic Using that custom hook gave a different error, but I'm still somehow over the size limit.

I solved the problem, at least for myself, by just dropping <!--more--> in that particular file before all the code, but I don't usually use breaks. I guess I will use them more.

@DirtyF As much as I'm thankful for your product and the free tier, I don't wanna build jekyll from source on my local machine and my server.

@pixelastic
Collaborator

@jtth Would you have a repository where I could reproduce the issue? I would need access to the original markdown file, before the syntax highlighting transforms it into <span>s.

@DirtyF I don't think this is related. The issue you mention makes sure that Liquid tags are not cut in the middle by the excerpt and get correctly transformed. Here the issue is that the Liquid highlight tag is actually correctly transformed, but a few lines of code become hundreds of lines of HTML highlight soup.

Even if the latest Jekyll fixed it, I think I would need my own way of extracting the excerpt in the plugin, in order to maintain backward compatibility with older Jekyll versions. I might simply take the first nodes_to_index of the page or post (the first <p> in most cases, which is also Jekyll's default).

@saralilyb
Author

@pixelastic
Collaborator

@jtth I had a typo in my previous example. You should set record[:excerpt_text] to nil instead of record[:excerpt] and that should work better.
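With that correction applied, the workaround hook would look roughly like this. It is a sketch based only on the example earlier in this thread: the method name and two-argument signature are copied from that example, and the plugin's hook signature may differ between versions.

```ruby
module Jekyll
  module Algolia
    module Hooks
      # Clear both excerpt fields so the highlighted markup is not indexed.
      # Signature copied from the earlier example in this thread.
      def self.before_indexing_each(record, node)
        record[:excerpt_text] = nil
        record[:excerpt_html] = nil

        record
      end
    end
  end
end
```

Jekyll loads Ruby files placed in the site's _plugins directory, so the hook can live in a file there.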

I've also pushed a potential fix on the excerpt branch of the plugin. I'd like to test it on your website before releasing a new version. Thanks for the markdown file! Would you mind sharing your _config.yml as well? I assume you changed the default excerpt_separator value?

@saralilyb
Author

@pixelastic My config is here: https://gist.github.com/jtth/87b1c0296a441701c2d69dde2cb74bcf.

My separator is <!--more-->. I didn't even realize that a newline was the default; this is an old theme I've customized quite a bit. That makes a few other bugs make sense, like how my list of blog posts (not on index) submitted a record of the first parts of all the posts themselves rather than ending after the first paragraph. I just added separators to prevent this behavior. Maybe I'll change it back to a newline later, but now I have full text search through Algolia rather than partial search of titles or tags. (Which I'm inferring is the default behavior with a \n\n separator, right?)
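The effect of the separator can be illustrated with a simplified model of excerpt extraction. This is only a sketch and not Jekyll's actual implementation: `simple_excerpt` is a hypothetical helper that returns everything before the first occurrence of the separator, and the sample post is made up.

```ruby
# Hypothetical, simplified model of an excerpt: the content that comes
# before the first occurrence of the separator (Jekyll's default is "\n\n").
def simple_excerpt(content, separator = "\n\n")
  content.split(separator, 2).first.to_s
end

# Made-up post: a short intro, then the separator, then a code-heavy body.
post = "Intro paragraph.\n<!--more-->\n\nLong code-heavy body that should stay out of the excerpt.\n"

puts simple_excerpt(post, '<!--more-->')
```

In this model, a post with no separator before its first code block gets an excerpt containing the whole block, while putting <!--more--> above the code keeps the excerpt short.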

Changing record[:excerpt] to record[:excerpt_text] made it work without the separator.

@pixelastic
Collaborator

Maybe I'll change it back to a newline later, but now I have full text search through Algolia rather than partial search of titles or tags. (Which I'm inferring is the default behavior with a \n\n separator, right?)

The default behavior is to have full text search, not only titles and tags, on the content of every page :)

I exclude a few pages that I know for sure you wouldn't want to index (like the pagination pages, or the 404). In your case, you have a preview of all your posts in /notes/, and the plugin will index that, which you don't want (otherwise if a query matches the content of the first paragraph of any post, it will suggest /notes). I see that you added notes.md to the list of files to exclude, which is exactly how you should solve this problem :)

@saralilyb
Author

Interesting. Then why would adding <!--more--> to the one problematic post fix things?

Y'all have provided a lot of good answers already in the documentation; I just didn't read it all before diving in.

@pixelastic
Collaborator

Interesting. Then why would adding <!--more--> to the one problematic post fix things?

I don't really know TBH, I would need access to your repo to dig deeper. But as the issue seems to be fixed I won't bother :)

I'm closing this one, but feel free to repost or open a new one if you're still having issues.

Cheers,

Jekyll automation moved this from In progress to Done Mar 28, 2018