Keep references on which documents reference which #336

Open
jdomingos opened this Issue Oct 29, 2012 · 16 comments

Comments

jdomingos commented Oct 29, 2012

Hi

The problem is that when I add an html.eco file to the documents folder, it regenerates everything.
I think it should only generate the added file and continue to serve on the port (the http server stalls while it is generating).
I believe that in docpad 4.x something of this sort was done, am I wrong?

> docpad run
info: Welcome to DocPad v6.10.0
info: Plugins: associatedfiles, buildr, cachr, cleanurls, eco, marked, partials, text
info: Environment: development
info: Generating...
info: Currently on renderFiles at 0%
info: Currently on renderFiles at 0%
info: Currently on renderFiles at 0%
info: Currently on renderFiles at 0%
info: Currently on writeFiles at 0%
info: Currently on writeFiles at 0%
info: Generated all 1842 files in 177.831 seconds
info: DocPad listening to http://localhost:9778/ on directory ...
info: Watching setup starting...
info: Watching setup
info: The action completed successfully
/* added some new html.eco file with article layout */
info: Regenerating at 12:05:52
info: Generating...
info: Generated 0 files in 0.002 seconds
info: Regenerated at 12:05:52

In my opinion, if a layout is changed, it should regenerate all. But if a file with no dependencies is changed, only that file should be rendered and written.

> docpad run
info: Welcome to DocPad v6.10.0
info: Plugins: associatedfiles, cachr, cleanurls, eco, marked, partials, text
info: Environment: development
info: Generating...
info: Currently on renderFiles at 0%
info: Currently on renderFiles at 0%
info: Currently on renderFiles at 0%
info: Currently on writeFiles at 0%
info: Currently on writeFiles at 0%
info: Currently on writeFiles at 0%
info: Currently on writeFiles at 52%
info: Generated all 1842 files in 200.292 seconds
info: DocPad listening to http://localhost:9778/ on directory ...
info: Watching setup starting...
info: Watching setup
info: The action completed successfully
/* changed article */
info: Regenerating at 10:40:46
info: Generating...
info: Generated 440 files in 97.845 seconds
info: Regenerated at 10:42:24

Thanks.



Member

balupton commented Oct 30, 2012

Good question. Differential rendering is definitely in there. However there are a few nuances, so let me explain.

  1. If a document (or any of its layouts) calls getCollection or the like, then we mark that document as referencesOthers: true
  2. If something is modified, we regenerate that document, as well as anything that has referencesOthers: true - unless the modified document has standalone: true in its meta data.

Without this referencesOthers capability, things that reference other documents (for instance blog post listings) would not be updated when a document they reference is updated (e.g. a blog post changes its title).

Of course, it could be more efficient by having referencesOthers know exactly which documents it actually references, so that modifying a stylesheet doesn't re-render all the referencesOthers documents. However, such intelligence also brings risk. DocPad has done really well so far in being as risk free as possible: if there is a risk of us rendering incorrect data, we don't do it. To get around this efficiency problem, we've got the standalone: true header, which I use a lot for stylesheets etc. that I modify often.
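To make the two rules above concrete, here is a minimal sketch of the selection step. It is not DocPad's actual code: the `Doc` shape, the `selectForRegeneration` name and the sample paths are assumptions made for illustration, and it is written in TypeScript rather than DocPad's CoffeeScript so that it can be run standalone.

```typescript
// Hypothetical, simplified document shape; not DocPad's real model.
interface Doc {
  id: string;
  referencesOthers: boolean; // set when the document (or a layout) calls getCollection or the like
  standalone: boolean;       // the `standalone: true` meta data header
}

// Returns the ids to regenerate when the document `changedId` is modified.
function selectForRegeneration(docs: Doc[], changedId: string): string[] {
  const changed = docs.filter((doc) => doc.id === changedId)[0];
  if (changed === undefined) return [];
  // A standalone document never drags the referencesOthers documents along.
  if (changed.standalone) return [changed.id];
  // Otherwise: the changed document plus everything flagged referencesOthers.
  const dependents = docs.filter((doc) => doc.referencesOthers && doc.id !== changedId);
  return [changed.id].concat(dependents.map((doc) => doc.id));
}

const docs: Doc[] = [
  { id: "posts/hello.html.md", referencesOthers: false, standalone: false },
  { id: "index.html.eco", referencesOthers: true, standalone: false },
  { id: "styles/main.css.styl", referencesOthers: false, standalone: true },
];

// Editing the post also regenerates the listing page that references others.
const afterPostEdit = selectForRegeneration(docs, "posts/hello.html.md");
// Editing the standalone stylesheet regenerates only the stylesheet.
const afterStyleEdit = selectForRegeneration(docs, "styles/main.css.styl");
```

On a site where most documents reach getCollection through a shared layout, almost every document ends up flagged, which would explain a change to one article regenerating hundreds of files.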

Btw, you can tell if it is re-rendering everything as it will say Generated all, whereas if it is just re-rendering some, it won't say the all bit. Give the standalone: true header a go and let me know if that suits your needs as a workaround.

jdomingos commented Oct 31, 2012

The standalone flag is a good option but not for what I need.
In fact, with your explanation, I understood why all my documents were regenerating - they were referencing others and that makes perfect sense.

The only issue with massive regeneration is that docpad does not continue to serve while it's regenerating and that is bad in my case. While it's regenerating (200+ seconds, takes longer and more memory each time), if I add one more file (or change one) there is a good possibility that the rendering stalls (I don't know for how long or if forever). After 10 minutes I usually restart it.

I think the website should continue to be live while a regeneration is in place and then update the files at once, don't you think?

Member

balupton commented Oct 31, 2012

> I think the website should continue to be live while a regeneration is in place and then update the files at once, don't you think?

Definitely. It's something I've noticed as well.

One thought I have is to create a new database for the regeneration, and once it is done, swap out the old database for the new one. The problem with this is that we would have to require people to attach their listeners to the database during an event, rather than just calling docpadInstance.getDatabase().on, as otherwise their listeners would be lost in the swap. Alternatively, we could reset the original database with the new data - but then child collections and things would play up.
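The swap idea can be sketched as follows. This is only an illustration under stated assumptions: `DatabaseHandle`, `Snapshot` and `onSwap` are hypothetical names, not DocPad APIs. The point it shows is that if listeners are attached to a long-lived handle rather than to the database object itself, replacing the database does not lose them, and readers only ever see a complete old or new snapshot.

```typescript
// Hypothetical snapshot type: output path -> rendered content.
type Snapshot = { [path: string]: string };
type Listener = (next: Snapshot) => void;

class DatabaseHandle {
  private current: Snapshot;
  private listeners: Listener[] = [];

  constructor(initial: Snapshot) {
    this.current = initial;
  }

  // Listeners live on the handle, so they survive a swap.
  onSwap(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Readers (e.g. the HTTP server) always see one complete snapshot.
  get(path: string): string | undefined {
    return this.current[path];
  }

  // Replace the old snapshot with a fully built new one in a single assignment,
  // then notify everyone who subscribed through the handle.
  swap(next: Snapshot): void {
    this.current = next;
    this.listeners.forEach((listener) => listener(next));
  }
}

const db = new DatabaseHandle({ "/index.html": "old" });
let swaps = 0;
db.onSwap(() => { swaps += 1; });

// A regeneration builds the new snapshot off to the side; until swap() is
// called, db.get("/index.html") keeps returning the old content.
const rebuilt: Snapshot = { "/index.html": "new", "/about.html": "about" };
db.swap(rebuilt);
```

The sketch sidesteps the child-collection problem mentioned above entirely; live child collections derived from the old database would still need to be rebuilt or re-pointed after a swap.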

When doing this rewrite, we should take #335 into account as well.

jdomingos commented Nov 2, 2012

Thanks for the explanation.

What would you suggest as a suitable workaround?

My idea is to have one process running 'docpad server' and then have a script run 'docpad generate' every time anything changes (file added or changed).

Do you think this will create concurrency issues? It seems to work and continue to serve whilst a regeneration is in place (docpad watch command returns an error when anything changes: ' layout not found').

Member

balupton commented Nov 6, 2012

For the time being there isn't really a workaround. Though over the past week I've done a ton of work on improving the performance of DocPad - so with the latest version of DocPad (and any plugins - especially the partials plugin) - performance should be quite good.

However, we still have the same issue of the moment of downtime while a regeneration occurs. A quick workaround could be getting the serveDocument middleware to check if a generation is currently occurring, and if it is, forward on to the static middleware by calling next. This would break dynamic pages - however, it would be a suitable workaround in most cases.
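A minimal sketch of that quick workaround, assuming an Express/Connect-style `(req, res, next)` middleware signature. The `SiteState` object, its `generating` flag and `renderDocument` are hypothetical stand-ins for DocPad's internals, and `Req`/`Res` are cut-down types so the example runs without any dependency.

```typescript
// Minimal stand-ins for Express-style request/response types (assumptions).
interface Req { url: string }
interface Res { send(body: string): void }
type Next = () => void;

// Hypothetical state object; `generating` would be set by the generate action.
interface SiteState {
  generating: boolean;
  renderDocument(url: string): string | null; // null when no document matches
}

function serveDocument(state: SiteState) {
  return function (req: Req, res: Res, next: Next): void {
    // While a generation is in progress the database is in flux, so defer to
    // the static middleware, which serves the output files already on disk.
    if (state.generating) return next();
    const body = state.renderDocument(req.url);
    if (body === null) return next();
    res.send(body);
  };
}
```

As noted above, dynamic pages cannot be served this way during a generation, because the static middleware only knows about files that have already been written.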

A proper fix would be doing the database swap thing - however, unless it turns out to be trivial, it will be quite involved...

@balupton balupton referenced this issue Nov 26, 2012

Closed

Performance #352

iammerrick commented Nov 26, 2012

Thanks @balupton, I am a man in need of performance; I was just scraping for ideas.

Member

balupton commented Jul 27, 2013

Closing in favour of #359

@balupton balupton closed this Jul 27, 2013

@balupton balupton reopened this Jul 27, 2013

Member

balupton commented Jul 27, 2013

Actually, that issue, while related, should not deprecate this one. Updating the title of this one now to avoid ambiguity.

Member

balupton commented Dec 7, 2013

Member

balupton commented Jan 28, 2014

This is the next step for performance optimisation.

@ghost ghost assigned balupton Jan 28, 2014

Member

balupton commented Jan 28, 2014

So with DocPad v6.62, html files will no longer cause css files to regenerate, but we are still plagued by the reverse: when a css file is modified, html files that reference others will still regenerate.

The only solution to this is keeping track of specifically which documents reference which, which is possible. The only concern here is garbage collection and dead pointers.

The idea is to introduce a few new properties:

referencesOthers: true
referencesCollections: ['stylesheet']
referencesFiles: ['./a.html', './b.html']

Things that will modify referencesCollections:

  • @getDatabase()
  • @getCollection('stylesheet')
  • @getFiles(query)

Things that will modify referencesFiles:

  • @include('./a.html')
  • @getFile('./a.html')

Probably some more that I forget.
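To illustrate how the proposed properties could narrow a regeneration, here is a hedged sketch. `TrackedDoc`, `dependentsOf` and the sample paths are hypothetical; only the `referencesCollections` and `referencesFiles` property names come from the proposal above. It handles direct references only and ignores both transitive references and the "references the whole database" case that @getDatabase() would need.

```typescript
interface TrackedDoc {
  id: string;                      // e.g. a relative path
  collections: string[];           // collections this document belongs to
  referencesCollections: string[]; // recorded by @getCollection and friends
  referencesFiles: string[];       // recorded by @include and @getFile
}

// True when the two lists share at least one entry.
function overlaps(a: string[], b: string[]): boolean {
  return a.some((item) => b.indexOf(item) !== -1);
}

// Ids of the documents that must regenerate because `changed` was modified:
// those that reference it directly, or reference a collection it belongs to.
function dependentsOf(docs: TrackedDoc[], changed: TrackedDoc): string[] {
  return docs
    .filter((doc) => doc.id !== changed.id)
    .filter((doc) =>
      doc.referencesFiles.indexOf(changed.id) !== -1 ||
      overlaps(doc.referencesCollections, changed.collections))
    .map((doc) => doc.id);
}

const css: TrackedDoc = { id: "styles/main.css", collections: ["stylesheet"], referencesCollections: [], referencesFiles: [] };
const post: TrackedDoc = { id: "posts/hello.html", collections: ["posts"], referencesCollections: [], referencesFiles: [] };
const docs: TrackedDoc[] = [
  css,
  post,
  { id: "layouts/default.html", collections: [], referencesCollections: ["stylesheet"], referencesFiles: [] },
  { id: "index.html", collections: [], referencesCollections: ["posts"], referencesFiles: [] },
  { id: "about.html", collections: [], referencesCollections: [], referencesFiles: ["posts/hello.html"] },
];

// A stylesheet change now touches only what references the stylesheet collection.
const cssDependents = dependentsOf(docs, css);
// A post change touches the listing and the page that includes the post.
const postDependents = dependentsOf(docs, post);
```

With the coarse referencesOthers flag, all three referencing documents would regenerate on either change; with the tracked references, each change regenerates only its own dependents.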

Naatan commented Feb 18, 2014

When this gets implemented it will definitely need some type of pattern matching. As this is intended for big sites that are slowed down by large collections of documents, it'd be infeasible to add all those files manually to your meta.

Additionally, I would personally favor being able to control these references in a central config file, rather than in a document's meta tags. I think that'd be easier to implement too. Perhaps an idea is to have a phase 1 with a central config file and a phase 2 that adds optional overrides through meta tags.

Member

balupton commented Feb 18, 2014

For the most part, a user should never need to set the reference headers themselves, as the template helpers should do that automatically.

E.g. currently we have the template helpers:

            # Set that we reference other files
            referencesOthers: (flag) ->
                document = @getDocument()
                document.referencesOthers()
                return null

            # Get a pre-defined collection
            getCollection: (name) ->
                @referencesOthers()
                return docpad.getCollection(name)

            # Get another file's URL based on a relative path
            getFile: (query,sorting,paging) ->
                @referencesOthers()
                result = docpad.getFile(query,sorting,paging)
                return result

Instead, these would be changed to:

            # Get a pre-defined collection
            getCollection: (name) ->
                collection = docpad.getCollection(name)
                @getDocument()?.referencesCollections.add(collection)
                return collection

            # Get another file's URL based on a relative path
            getFile: (query,sorting,paging) ->
                file = docpad.getFile(query,sorting,paging)
                @getDocument()?.referencesFiles.add(file)
                return file

But maybe it actually makes more sense to have this in a central database like your suggestion, so something like:

            # Get a pre-defined collection
            getCollection: (name) ->
                collection = docpad.getCollection(name)
                # push mutates the stored list; concat would return a new array and discard it
                (docpad.referencesCollections[@document.id] ?= []).push(collection.name)  if collection? and @document?.id
                return collection

            # Get another file based on a query
            getFile: (query,sorting,paging) ->
                file = docpad.getFile(query,sorting,paging)
                # only record the reference when a file was actually found
                (docpad.referencesFiles[@document.id] ?= []).push(file.id)  if file? and @document?.id
                return file

This could reduce memory usage and make it easier to wipe the references on partial: false generations. It would make querying of references more difficult, however I'm not too concerned about that, as I doubt it is a valid use case.
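A central store along these lines might look like the sketch below. `ReferenceRegistry` and its method names are hypothetical, not part of DocPad. It shows the two clean-up paths discussed in this thread: wiping everything before a full (partial: false) generation, and forgetting a removed document both as a referrer and as a target so that no dead pointers are left behind.

```typescript
type Table = { [docId: string]: string[] };

class ReferenceRegistry {
  private collections: Table = {};
  private files: Table = {};

  // Append `value` to the list for `docId`, creating the list and skipping duplicates.
  private static add(table: Table, docId: string, value: string): void {
    const list = table[docId] || (table[docId] = []);
    if (list.indexOf(value) === -1) list.push(value);
  }

  addCollection(docId: string, collectionName: string): void {
    ReferenceRegistry.add(this.collections, docId, collectionName);
  }

  addFile(docId: string, fileId: string): void {
    ReferenceRegistry.add(this.files, docId, fileId);
  }

  // Wipe all references, e.g. before a full (partial: false) generation.
  clear(): void {
    this.collections = {};
    this.files = {};
  }

  // Remove a deleted document as a referrer and as a target of file references.
  forget(docId: string): void {
    delete this.collections[docId];
    delete this.files[docId];
    Object.keys(this.files).forEach((id) => {
      this.files[id] = (this.files[id] || []).filter((fileId) => fileId !== docId);
    });
  }

  // Which documents reference the given file?
  referrersOfFile(fileId: string): string[] {
    return Object.keys(this.files)
      .filter((id) => (this.files[id] || []).indexOf(fileId) !== -1);
  }
}

const registry = new ReferenceRegistry();
registry.addFile("index.html", "a.html");
registry.addFile("about.html", "a.html");
registry.addCollection("index.html", "stylesheet");
```

The reverse lookup in `referrersOfFile` scans every entry, which is the "querying of references is more difficult" cost mentioned above; a second table keyed by target would trade memory for faster lookups.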

Naatan commented Feb 18, 2014

Ah, I misunderstood what you were intending to do. That's certainly the best solution, but the number of gotchas is bound to be quite terrifying. Still, definitely the ideal solution and one that will make DocPad stand out amongst the crowd.

Perhaps it'd still be a good idea to cut it into 2 phases though. Phase 1 being the manual phase, wherein the core logic for referencing other files directly is working and manually managed. Phase 2 would be the gotcha phase wherein these references would be managed automatically by docpad. That way you could already get a feel for the way the system works once phase 1 is done and make adjustments as needed before delving into phase 2.

I'll try to help with this as much as I can once time allows it. Been meaning to get into the DocPad innards.

Member

balupton commented Feb 18, 2014

> Ah I misunderstood what you were intending to do.

No worries, I mustn't have explained things as well as I could have!

Member

balupton commented Feb 6, 2016

I'm considering doing a crowd fund for this at around $5000USD. Is that something people are interested in? It would more than halve the time of initial generations for most users, as well as bring subsequent generations down to seconds or less for most users.

https://bevry.slack.com/archives/funding/p1454439317000003
