RFC: lazy-load items #3

denisdefreyne · 2016-01-03T13:37:12Z

The implementation of this RFC depends on not having a preprocessor, or having a preprocessor whose effects can be analysed. Work on such a preprocessor is pending (see #7).

denisdefreyne · 2016-01-03T13:37:47Z

Question: does it make sense to lazy-load layouts? I don’t think it does, given that a site will have a handful of layouts at most.

denisdefreyne · 2016-01-03T13:45:54Z

CC @RubenVerborgh — this is the RFC you are looking for (still quite WIP though). The idea you brought up is described in the “alternatives” section.

RubenVerborgh · 2016-01-03T16:06:28Z

Would it make sense to make compilation speed part of the motivation? It makes a major difference for my use case (BibTeX datasource). Not tested yet with other datasources.

> the content and attributes for each item needs to be loaded at some point anyway, in order for the checksum to be calculated.

The content needs to be loaded indeed, but not the attributes since nanoc/nanoc#793.

denisdefreyne · 2016-01-03T16:07:39Z

I’d welcome a PR to make attribute loading lazy! My idea would be to allow attributes to be a lambda that evaluates to the attributes.
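A minimal sketch of the "attributes as a lambda" idea, assuming a hypothetical `LazyItem` class (this is not Nanoc's actual item API). Attributes may be passed either as a plain Hash or as something callable. The callable is only invoked the first time the attributes are read, and the result is memoized.

```ruby
# Hypothetical item class illustrating lazily evaluated attributes.
# Not Nanoc's real API; all names here are made up for this sketch.
class LazyItem
  attr_reader :content, :identifier

  # `attributes` may be a Hash, or a lambda/Proc that returns a Hash.
  def initialize(content, attributes, identifier)
    @content = content
    @identifier = identifier
    @attributes_source = attributes
    @attributes_loaded = false
  end

  # Calls the lambda on first access only, then reuses the result.
  def attributes
    unless @attributes_loaded
      src = @attributes_source
      @attributes = src.respond_to?(:call) ? src.call : src
      @attributes_loaded = true
    end
    @attributes
  end

  def [](key)
    attributes[key]
  end
end
```

Reading `content` or `identifier` never triggers the lambda, so expensive work such as YAML parsing is skipped for items whose attributes are never looked at.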

RubenVerborgh · 2016-01-03T16:10:37Z

That would indeed be cool. Would it make sense to do the same with content in the same pass?

denisdefreyne · 2016-01-03T16:17:53Z

Hmm, not sure. The content will be loaded anyway (for the checksum) so lazy-loading it likely won’t make a difference. I’d say “no” and go for attributes only (to eliminate the YAML parsing overhead).
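To illustrate why lazy attributes still pay off when content is needed for the checksum, here is a sketch assuming a hypothetical file-backed `FileItem` with YAML front matter (not how Nanoc's filesystem data source is actually implemented). The checksum is computed from the raw file data, so the YAML is only parsed if attributes or content are actually requested.

```ruby
require 'digest/sha1'
require 'yaml'

# Hypothetical sketch: a file-backed item whose checksum only needs the
# raw file data, so YAML front matter is parsed on demand.
class FileItem
  attr_reader :raw

  def initialize(raw)
    @raw = raw
    @parsed = nil
  end

  # Checksum over the raw file data; never triggers YAML parsing.
  def checksum
    Digest::SHA1.hexdigest(@raw)
  end

  def attributes
    parse[:attributes]
  end

  def content
    parse[:content]
  end

  private

  # Splits "---\n<yaml>\n---\n<body>" once, then memoizes the result.
  def parse
    @parsed ||=
      if (m = @raw.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m))
        { attributes: YAML.safe_load(m[1]) || {}, content: m[2] }
      else
        { attributes: {}, content: @raw }
      end
  end
end
```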

RubenVerborgh · 2016-01-03T16:25:58Z

Okay, I'll proceed with attributes.

I do have a use case for lazy content, though: the content for BibTeX datasource items is generated after parsing. I don't want to keep on nagging about my own project of course 😉, but it's just a reminder that not all datasources are filesystem-based.

For the sake of discussion, here is a brief sketch of how the BibTeX datasource works:
– input: a folder with .bib files, each of which contains multiple entries (100+ entries are not an exception)
– as checksum data, the file contents are used (no parsing needed)
– to determine the item identifiers, the file has to be split
– to determine the contents (= .bib entry with special markup removed), the entry has to be parsed
– to determine the attributes, the parsed fields have to be unescaped

The last two steps take 33% of my compilation time.
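The pipeline described above can be sketched with lazy content and attributes. This is a heavily simplified illustration, not the actual BibTeX datasource: `BibItem` and `NaiveBibSource` are hypothetical names, and the parsing is naive (one `name = {value}` field per line; no `@string` macros, comments, or multi-line values). The file checksum and the cheap split for identifiers happen eagerly, while the expensive parsing and unescaping steps are deferred into lambdas.

```ruby
require 'digest/sha1'

# Hypothetical item: identifier and checksum are eager, content and
# attributes are computed on first access and then memoized.
BibItem = Struct.new(:identifier, :checksum, :content_loader, :attributes_loader) do
  def content
    @content ||= content_loader.call
  end

  def attributes
    @attributes ||= attributes_loader.call
  end
end

module NaiveBibSource
  ESCAPE = /\\([&%$#_{}])/

  def self.items_for(path, raw)
    # Checksum data: the raw file contents, no parsing needed.
    checksum = Digest::SHA1.hexdigest(raw)
    # Identifiers: split the file into entries (cheap).
    raw.split(/^(?=@)/).filter_map do |entry|
      key = entry[/\A@\w+\s*\{\s*([^,\s]+)\s*,/, 1]
      next unless key

      BibItem.new(
        "#{path}/#{key}",
        checksum,
        -> { strip_markup(entry) },  # deferred: content
        -> { parse_fields(entry) }   # deferred: attributes
      )
    end
  end

  # Naive field parser: one `name = {value}` per line.
  def self.parse_fields(entry)
    entry.scan(/^\s*(\w+)\s*=\s*\{(.*)\},?\s*$/).to_h do |name, value|
      [name.downcase, unescape(value)]
    end
  end

  # Drops unescaped grouping braces, then resolves escapes like \&.
  def self.unescape(value)
    value.gsub(/(?<!\\)[{}]/, '').gsub(ESCAPE, '\1')
  end

  # Content: the entry with backslash escapes resolved.
  def self.strip_markup(entry)
    entry.gsub(ESCAPE, '\1')
  end
end
```

With this split, a compile that never reads an entry's attributes or content never pays for parsing or unescaping it.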

denisdefreyne · 2016-01-03T16:30:15Z

In that case, it makes sense for content to be (optionally) lazy too. 👍 for lazifying both content and attributes.

connorshea · 2016-11-30T07:23:59Z

Is there any progress on this? I'm currently working on adding "versions" to GitLab's documentation website and it's causing some issues with compile time (e.g. going from 4 unique sets of documentation to 8 is causing the compile time to grow from 4 minutes to 13), I suspect in part due to the problems described here.

When running nanoc compile, a single Ruby process uses 2.4GB of RAM, and it takes a few minutes before any pages actually start being created.

tmp/compiled_content is 93.7MB. Site compiled in 789.85s.

connorshea · 2016-11-30T07:39:21Z

Hm, upon further testing it seems that much of the time is taken because Nanoc is comparing the older version of the site to see what it should recompile.

For our repo:
– master (no public/, no items in tmp/): 61.70s
– add-versions (public/, items in tmp/): 789.85s
– add-versions (no public/, no items in tmp/): 382.10s

This doesn't really explain why it's taking an unholy amount of time on CI, though (add-versions takes 15 minutes vs. 4 minutes for master), since nothing is cached there. Perhaps it's hitting a RAM limit? I'll also need to check how much of that time goes into pulling down repos.

denisdefreyne · 2016-11-30T07:41:34Z

@connorshea Are you on the latest version of Nanoc? There have been some performance issues recently, which have all been fixed. (Ironically, a recent optimisation made things far worse in terms of compilation speed.)

connorshea · 2016-11-30T07:44:59Z

@ddfreyne I was on 4.4.2 for master and 4.4.0 for add-versions; I'll test again with 4.4.2 on add-versions.

One thing I just realized about the test where it takes 789.85s is that the public and tmp folders had content from master, which has a fairly dissimilar structure from add-versions (each directory under content has subdirectories for each respective version), so the compiler was likely confused by the completely different structure of the content when I switched over. Hence the discrepancy in the compile times.

denisdefreyne · 2016-11-30T07:45:08Z

As for the state of the RFC: it’s work in progress and blocked by having a redesigned preprocessor. The preprocessor as it stands now is a bottleneck to future optimisations in terms of CPU and RAM usage, as it’s a black box where anything can happen, and its effects cannot be analysed. This means that unless the preprocessor is replaced with something smarter, Nanoc will have to keep loading all items into memory.

connorshea · 2016-11-30T07:49:52Z

383.98s on 4.4.2, so about the same.

denisdefreyne · 2016-11-30T07:51:33Z

@connorshea Interesting… Nanoc 4.3.8 fixed the last known performance issue, so it might be related to having more content in the add-versions branch. Is there a place where I can check out this repository?

connorshea · 2016-11-30T07:57:05Z

@ddfreyne Yup: https://gitlab.com/gitlab-com/gitlab-docs/

It's a bit odd in that it doesn't have most of the content in content/ initially. You need to run the Rake task (rake pull_repos), which shallow-clones the repositories into the tmp/ directory and then copies the content from the doc/ directory of each repo into its respective folder in content/. It's a bit convoluted, I suppose, but it works pretty well all things considered.

Here's pretty much what you need to run to test this:

```shell
git clone https://gitlab.com/gitlab-com/gitlab-docs.git
cd gitlab-docs
bundle install
rake pull_repos
nanoc compile
git checkout add-versions
# RAKE_FORCE_DELETE deletes the content directories, since the Rake task doesn't currently overwrite them.
# I've never had it do anything wrong, but if you're paranoid you can delete the `tmp/ce/`, `content/ce/`, `tmp/ee/`, `content/ee/`,
# `tmp/omnibus/`, `content/omnibus/`, `tmp/runner/`, and `content/runner/` directories manually.
RAKE_FORCE_DELETE=true rake pull_repos
```

denisdefreyne · 2016-11-30T08:06:19Z

@connorshea I’d prefer not to pollute the discussion on this PR. Shall we move the conversation to the Google group or Gitter?

connorshea · 2016-11-30T08:12:29Z

@ddfreyne of course, here's a Google Group thread: https://groups.google.com/forum/#!topic/nanoc/4iLK826kO7A

– 430626f Add RFC: lazy-load items
– denisdefreyne added the work in progress label (Jan 3, 2016)
– denisdefreyne force-pushed the lazy-load-items branch from 9c2972b to 3dc706a (January 3, 2016 13:46)
– 5816096 Describe lazy-loading content alternative
– denisdefreyne force-pushed the lazy-load-items branch from 3dc706a to 5816096 (January 3, 2016 13:47)
– 919f8dd Document dependency on #1
– denisdefreyne added the has dependency label (Jan 3, 2016)
– RubenVerborgh mentioned this pull request in nanoc/nanoc#794, "Lazy-load items" (merged) (Jan 3, 2016)
