Discussion: Importer Speed Optimisations #590
Importing 500 Tumblr posts takes a very long time due to the nature of the API. Currently this means that the initial generation for that import takes about 3 minutes before you can access your site and start hacking. We need a better way of doing this.
Here are some options.
Defer importing. We could defer the loading of Tumblr articles until later: do an initial quick generation without importing anything, then do another generation once everything is imported.
Lazy importing. Another option is to defer the importing, and then generate as we import new things. Perhaps the importer could even listen constantly for new things (this would be awesome for a pub/sub importer relationship, where new documents could be created at any time and we would be notified and import them as they come).
Caching. Another option is once we've imported things, we could use
Not sure how we could accomplish these. I'm sure it's possible, just not sure how yet, or which way is best.
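As a rough illustration of the "defer importing" option above, here is a minimal sketch in Python. It is not DocPad code: `Site`, `slow_import` and `generate_with_deferred_import` are hypothetical names invented for this example, and the importer is a stand-in for a slow remote API.

```python
import threading


class Site:
    """Hypothetical stand-in for a site generator; not DocPad's API."""

    def __init__(self):
        self.documents = []
        self.generations = []  # document count seen by each generation

    def generate(self):
        self.generations.append(len(self.documents))


def slow_import():
    """Hypothetical importer; a real one would page through a remote API."""
    return [{"id": i, "title": f"Post {i}"} for i in range(3)]


def generate_with_deferred_import(site, importer):
    # First pass: generate immediately so the site is usable right away.
    site.generate()

    def worker():
        # Second pass: regenerate once the slow import has finished.
        site.documents.extend(importer())
        site.generate()

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


site = Site()
thread = generate_with_deferred_import(site, slow_import)
thread.join()  # only so the example finishes deterministically
```

The lazy variant would call `site.generate()` inside the worker after each imported post (or batch of posts) rather than once at the end.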
Personally I would like to see the caching route, where importing the posts would just take a while the first time. We've been deeply involved with Tumblr for several years now, and it is far more common to see blogs with 5,000+ posts than 500.
Then, after the initial archive of posts has been written, subsequent imports would only load and cache the newer posts. The only thing that would need to be thought through is a way to force the entire set of posts to be imported and cached again. This would be helpful in cases where imported posts have been deleted or edited.
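One way the caching idea could work is sketched below in Python. Everything here is an assumption made for illustration: `import_posts`, the `fetch_posts(since_id)` callable, the JSON cache file and the `force` flag are hypothetical, and the sketch assumes post ids increase over time so that "newer" can be decided by id.

```python
import json
import tempfile
from pathlib import Path


def import_posts(fetch_posts, cache_path, force=False):
    """Return all posts, fetching only those newer than the cached ones.

    fetch_posts(since_id) is a hypothetical callable returning posts whose
    id is greater than since_id, or every post when since_id is None.
    """
    cache_path = Path(cache_path)
    cached = []
    if cache_path.exists() and not force:
        cached = json.loads(cache_path.read_text())

    # With an empty or ignored cache, newest_id is None: a full import.
    newest_id = max((post["id"] for post in cached), default=None)
    fresh = fetch_posts(newest_id)

    posts = cached + fresh
    cache_path.write_text(json.dumps(posts))
    return posts


# Fake remote API for the demo (assumption: ids grow over time).
ALL_POSTS = [{"id": 1}, {"id": 2}, {"id": 3}]
calls = []


def fake_fetch(since_id):
    calls.append(since_id)
    return [p for p in ALL_POSTS if since_id is None or p["id"] > since_id]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "posts.json"
    first = import_posts(fake_fetch, cache)               # full import
    second = import_posts(fake_fetch, cache)              # nothing new to fetch
    forced = import_posts(fake_fetch, cache, force=True)  # full re-import
```

The `force=True` path discards the cache and rebuilds it, which is what picks up posts that were edited or deleted after the first import.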
It would be nice to have a command-line switch like
When in this mode, the plugin could write the imported data to the documents dir (or wherever the user specifies).
We could also provide a date from which to retrieve posts, i.e. anything before the date is ignored and anything on or after the date is imported. That way we don't re-import data.
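The date cutoff rule could be expressed along these lines. This is only a sketch in Python: `posts_since` and the shape of the post records are hypothetical, not part of any existing plugin.

```python
from datetime import date


def posts_since(posts, cutoff):
    """Keep posts dated on or after cutoff; earlier posts are ignored."""
    return [post for post in posts if post["date"] >= cutoff]


posts = [
    {"title": "old", "date": date(2013, 1, 15)},
    {"title": "edge", "date": date(2013, 9, 3)},
    {"title": "new", "date": date(2013, 10, 1)},
]

# A post dated exactly on the cutoff is kept, matching the ">=" rule.
recent = posts_since(posts, date(2013, 9, 3))
```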
This is how it could look from the command line
I think this would also mean we need to be able to define a plugin type i.e.
As per https://discuss.bevry.me/t/deprecating-in-memory-docpad-importers-exporters/87 this issue is now outside the scope of DocPad.