Introduction of raw directory that is not parsed #276

Closed
Delapouite opened this issue Jul 24, 2012 · 38 comments

Comments

Contributor

@Delapouite Delapouite commented Jul 24, 2012

Update: See the raw plugin.

Hi

If I remember correctly, in the first versions of Docpad, only the content of the src/documents folder was processed.
The public folder (now 'files') was just straight copied to the out folder.

Now, the content of src/files is processed as well. This behavior is handy for a lot of reasons, like the attached plugin.
But it also slows down the whole generation for nothing if we don't need to do anything special with those files.

Would it be wise to bring back a specific folder in src (src/clone?) whose content is just recursively copied to out, without the funky Backbone collection adding and so on?

I know it may end up causing confusion, and maybe it only deserves a plugin that hooks onto one of the final events, like serverBefore, to perform this operation.


@dimitarkolev dimitarkolev commented Jul 26, 2012

I vote for this. I used to have a lot of files in the files (public) folder, but since it is processed nowadays it really slows things down, in my case from 5-6 sec to 2 min. I just implemented what is written above in bash with the help of rsync :) and it gets executed after docpad has finished generating. I put all those files in a folder outside the src folder in order not to mess with the docpad structure. It would be nice to have that kind of data/public/files/clone folder back in docpad :)

Member

@balupton balupton commented Jul 27, 2012

Hrmmm, interesting.

I'm up for re-adding it, though I'd like to keep it as a power-user feature rather than something taught to everyone, as it does add more complexity to the decision making process of "where do I put my file". I feel that even with two directories there is still too much complexity!

Perhaps we can call that directory raw as in we don't do anything special to it?


@dimitarkolev dimitarkolev commented Jul 30, 2012

raw is a nice name

Member

@balupton balupton commented Oct 24, 2012

With large files, this is definitely becoming an issue. There are a few options for implementation here:

  1. Use the operating system copy abilities - fastest, but we cannot detect which files have changed, meaning that we copy everything over regardless if they exist, or don't copy over files that already exist
  2. Use node.js for copying - slower, but we can detect which files have changed since last time, and copy only those that have changed
  3. Hybrid between the two - medium, use node.js for detecting file changes, however use OS for copying changed files

I'd like to get your feedback on this before I proceed.
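
For reference, the change detection in options 2 and 3 could be sketched roughly as below, in plain Node.js. This is only an illustration, not DocPad code: `needsCopy` and `changedFiles` are hypothetical names, and comparing file size and modification time is an assumed heuristic rather than anything decided in this thread.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: true when `dest` is missing, differs in size,
// or is older than `src` (assumed heuristic: size + modification time).
function needsCopy(src, dest) {
  const srcStat = fs.statSync(src);
  let destStat;
  try {
    destStat = fs.statSync(dest);
  } catch (err) {
    if (err.code === 'ENOENT') return true; // never copied before
    throw err;
  }
  return srcStat.size !== destStat.size || srcStat.mtimeMs > destStat.mtimeMs;
}

// Hypothetical helper: walk `srcDir` recursively and return the relative
// paths of the files that would need to be copied into `outDir`.
function changedFiles(srcDir, outDir, rel = '') {
  const changed = [];
  const entries = fs.readdirSync(path.join(srcDir, rel), { withFileTypes: true });
  for (const entry of entries) {
    const relPath = path.join(rel, entry.name);
    if (entry.isDirectory()) {
      changed.push(...changedFiles(srcDir, outDir, relPath));
    } else if (entry.isFile() &&
               needsCopy(path.join(srcDir, relPath), path.join(outDir, relPath))) {
      changed.push(relPath);
    }
  }
  return changed;
}
```

With option 2 the returned paths would be copied by node.js itself; with option 3 they would be handed to an OS-level copy command.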


@zenorocha zenorocha commented Oct 24, 2012

I'm looking forward to this feature. Nowadays I have a 500 MB JavaScript library in my project and this slows down my generation process a lot.


@zenorocha zenorocha commented Oct 24, 2012

Btw, the hybrid alternative seems better.


@djalmaaraujo djalmaaraujo commented Oct 25, 2012

+1


@eduardolundgren eduardolundgren commented Oct 25, 2012

I need that too


@cirdes cirdes commented Nov 11, 2012

+1

Member

@balupton balupton commented Nov 13, 2012

Considering the interest, I'll look into getting this up in the next batch of work.

Will probably just go with option 1 at the start, as that will provide the least overhead and the quickest solution which is what the raw directory is all about.

In the meantime, you could run npm install --save bal-util then add the following to your docpad configuration file:

    # =================================
    # DocPad Events

    events:

        # Copy over our Raw Directory when DocPad Starts Up
        docpadReady: (opts,next) ->
            # Prepare
            docpad = @docpad
            config = docpad.getConfig()
            balUtil = require('bal-util')
            rawPath = require('path').join(config.srcPath, 'raw', '*')

            command = ['cp', '-Rn', rawPath, config.outPath]
            docpad.log('info', 'raw directory copying over...')
            balUtil.spawn command, {output:true}, (err) ->
                return next(err)  if err
                docpad.log('info', 'raw directory copied over')
                return next()

and that should work (I haven't tested the above code snippet, so it could be incorrect, but it won't be far off).

Actually, if someone wants to make that a plugin, that'll be awesome!


@zenorocha zenorocha commented Nov 14, 2012

Thanks for looking into this, I've tried to use this snippet and I got:

error: An error occured: exited with a non-zero status code Error: exited with a non-zero status code

I'll try to make it work.

Member

@balupton balupton commented Nov 14, 2012

To debug, try this:

            balUtil.spawn command, (err,args...) ->
                if err
                    console.log(args)
                    return next(err)

@zenorocha zenorocha commented Nov 22, 2012

I see that there's an ignorePattern attribute in docpad.coffee. Could this help? Where can I find documentation for it?

Member

@balupton balupton commented Apr 9, 2013

Any chance of someone being thrilled enough to make this into a plugin? Would love it if someone could!


@Hypercubed Hypercubed commented Apr 13, 2013

FYI. I have some TSV data files that are getting mangled before being copied to the out directory. I either need a raw directory as discussed or some way to prevent docpad from converting tabs to spaces in data files. This seems like a common use case, not necessarily a power-user feature requiring a plugin, but that's just my opinion FWIW.


@Hypercubed Hypercubed commented Apr 23, 2013

It also appears that d3js JavaScript library files (https://github.com/mbostock/d3/blob/master/d3.js) are unusable after being processed from the files directory. Is it just me?


@Hypercubed Hypercubed commented May 10, 2013

So, issue #491 fixed my problem with tabs in tab-delimited files. However, in some cases I still need a /raw directory for large data files. When I place a 427 MB file in /files I get the following error:

FATAL ERROR: CALL_AND_RETRY_0 Allocation failed - process out of memory

I guess I still don't understand the technical reason Docpad is altering the /files files rather than just copying them.

Member

@balupton balupton commented May 10, 2013

DocPad doesn't alter files at all and just copies them across. The issue is that the way this happens in DocPad is to load the files into memory, then write them. That is why we run out of memory when doing so for large projects.

We need to load them into memory as documents may want to query them. Therefore, the solution is to have a new raw directory.

Just tested the following locally and it worked great:

module.exports =

    events:

        # Copy over our raw directory
        writeAfter: (opts,next) ->
            # Prepare
            docpad = @docpad
            config = docpad.getConfig()
            balUtil = require('bal-util')
            rawPath = config.srcPath+'/raw/'
            # the trailing / indicates to cp that the files of this directory should be copied over
            # rather than the directory itself

            command = ['cp', '-Rn', rawPath, config.outPath]
            docpad.log('debug', 'Copying raw directory')
            balUtil.spawn command, {output:true}, (err) ->
                # return next(err)  if err
                docpad.log('debug', 'Copied raw directory')
                return next()

If we can move that to a plugin and add some tests, I'll be a happy man.


@Hypercubed Hypercubed commented May 10, 2013

Is it possible to change the way docpad copies the files (using streams for example) to avoid the memory issue? I have to do something similar while processing these large files in my custom route.

I'd love to tackle making this a plugin but may take me some time (I'm fairly new to the whole node.js/docpad system).

By the way. The command needs to be:

command = ['cp', '-Rn', rawPath, config.outPath+'/']

or the copy will fail if the out directory doesn't already exist.


@dimitarkolev dimitarkolev commented May 10, 2013

I believe that simply copying the files will not work when there are thousands of files in the raw folder (we do have projects with more than 10k files). It will make the whole generation process extremely slow. It should work more like rsync, or at least do a "date modified" comparison before copying the file.


@Hypercubed Hypercubed commented May 10, 2013

Actually the -n (--no-clobber) flag in the cp command will prevent the files from being overwritten so the files will only be copied once. However, if the files change a docpad clean would be needed. An rsync would be better.


@Hypercubed Hypercubed commented May 10, 2013

Ok... I've got a repo. I couldn't figure out the tests. Maybe next week.

https://github.com/Hypercubed/docpad-plugin-raw


@Hypercubed Hypercubed commented May 10, 2013

Well, a real test is going to be hard. The tester also hits an out-of-memory issue when comparing test/out/ to test/out-expected/.

FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory

Member

@balupton balupton commented May 10, 2013

@Hypercubed haha, okay, I wonder if there is some native command we can use to compare the two directories... but perhaps it's time we start doing streams based comparisons.... happy to accept pull requests and discuss if anything needs to be done


@Hypercubed Hypercubed commented May 11, 2013

Does anyone think it is a good/bad idea to use a hard link instead of copy:

command = ['cp', '-Rnl', rawPath, config.outPath+'/']

It seems to work and is much faster.


@dimitarkolev dimitarkolev commented May 11, 2013

Great idea, but what about Windows support? It's not my problem (no Windows here), but I believe there are many people doing dev on Windows.


@Hypercubed Hypercubed commented May 11, 2013

I'm developing using cygwin under windows and the cp -Rnl command works great. I think for dos we would need to use another command entirely.


@dimitarkolev dimitarkolev commented May 12, 2013

Cygwin is an option, but it will add another requirement for running docpad under Windows. The solution might be to put it as configuration in docpad.cson. This way I can use rsync or any other external application. Thinking about it, I am getting closer to the idea of having a simple plugin that does just that :)

Member

@balupton balupton commented May 12, 2013

Sure, so the raw plugin can be rewritten to use a node.js solution that uses streams for cross-platform compatibility; however, that will increase the time needed to do its task. The native executables for the operating system are always way faster.

How does rsync work?


@Hypercubed Hypercubed commented May 13, 2013

Ok, I've pushed a version that uses xcopy /e instead of cp on Windows without cygwin in the path. You can also change the command (e.g. to rsync) by setting the plugins.raw.command in docpadConfig. This is all seeming more and more hackish by the day but is working great for my >1 GB files.
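
The selection logic described above might look roughly like the sketch below. It is not taken from the plugin source: `rawCopyCommand` and its `options` argument are hypothetical, and checking only the platform (rather than also looking for cygwin in the path) is a simplifying assumption.

```javascript
// Hypothetical helper: pick the copy command for the raw directory.
// Assumption: a user-supplied command always wins; otherwise Windows gets
// xcopy and every other platform gets cp with no-clobber.
function rawCopyCommand(rawPath, outPath, options = {}) {
  const platform = options.platform || process.platform;
  if (Array.isArray(options.command)) {
    return options.command; // e.g. an rsync command from the config
  }
  if (platform === 'win32') {
    return ['xcopy', '/e', rawPath, outPath];
  }
  return ['cp', '-Rn', rawPath, outPath];
}

// Usage: the returned array has the same shape as the `command` arrays
// passed to balUtil.spawn earlier in this thread.
console.log(rawCopyCommand('./src/raw/', './out/'));
```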


@dimitarkolev dimitarkolev commented May 13, 2013

Here is my rsync command: rsync --exclude ".svn" -a ./src/raw/ ./out/. Using rsync solves many problems; I'll try to list some:

  1. Exclude source control folders; having them in out might lead to problems
  2. Very fast for thousands of small files that change from time to time, copying only those that have changed
  3. When working with large files only differences are copied (rsync has its ingenious way to do it), which saves a lot of time
  4. Available everywhere (on Windows with cygwin)

As a side note, we can add the --delete option, which will delete the files that are no longer in the raw folder, but that's not docpad's default behavior so I don't use it.


@Hypercubed Hypercubed commented May 15, 2013

@dimitarkolev Try the plugin with this in your config file:

plugins:
    raw:
        commands:
            raw: ['rsync', '-a', './src/raw/', './out/' ]

Member

@balupton balupton commented Jun 25, 2013

For what it's worth, here is a gist that rsyncs for deployment: https://gist.github.com/Hypercubed/5804999

Member

@balupton balupton commented Jul 12, 2013

@Hypercubed happy to make your raw plugin an official plugin on the docpad org. That means we'll test it with each new docpad version. Let me know if you're down for that.


@Hypercubed Hypercubed commented Jul 15, 2013

@balupton I don't mind. I'm still not completely satisfied with this solution because it relies on shell commands, but I don't have anything better right now.

Member

@balupton balupton commented Sep 27, 2013

I guess the decision here, is should we proceed with an execute command based approach, or a node.js stream pipe approach.
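
For comparison, a minimal sketch of the node.js stream pipe approach is below. It is an illustration only: `copyFileStreamed` is a hypothetical name, and it assumes a Node.js version that provides `stream.pipeline` and recursive `fs.mkdir`, both of which are newer than this thread.

```javascript
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream');

// Hypothetical helper: copy one file by piping a read stream into a write
// stream, so the file moves through memory in chunks instead of being
// loaded whole. `done` is a node-style callback: done(err) or done().
function copyFileStreamed(src, dest, done) {
  // Make sure the destination directory exists before writing.
  fs.mkdir(path.dirname(dest), { recursive: true }, (mkdirErr) => {
    if (mkdirErr) return done(mkdirErr);
    pipeline(fs.createReadStream(src), fs.createWriteStream(dest), done);
  });
}
```

This stays cross-platform, at the cost of doing in JavaScript what cp, xcopy or rsync do natively.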

Member

@balupton balupton commented Nov 28, 2013

@balupton balupton closed this Nov 28, 2013

Contributor

@pflannery pflannery commented Nov 28, 2013

There is also a plugin for running bash, sh, cmd or powershell scripts as documents or during specified docpad events.
