Skip to content
This repository

[WIP] Drop find and grep in favor of ag / nak #2369

Open
wants to merge 45 commits into from
Garen Torikian

This PR represents a culmination of various 20% time I took over the last few weeks. It is designed to drop support for find/grep for filelist and search. Pull up a chair and a cup of coffee while I describe a change for a feature no one asked for. :sweat_smile:

Why?

find/grep should be dropped for the following reasons:

  1. The code is hard to maintain. (I should know, I wrote or worked on a lot of it.)
    The code is a mess because we have to account for several annoyances, such as BSD vs. GNU flavors, bash-strings-as-scripts passing (which requires all sorts of escaping functions), and, in the case of find/grep missing, we actually compile PCRE support to provide our own binaries.
  2. There's no Windows support. Conceivably, a user could SSH into a Windows box and run Cloud9. Dependence on find/grep limits this.
  3. There's no easy way to maintain a list of accepted file types and ignored directories (see below for more). For example, we aren't overlooking *-backup~ from sm or .c9revisions.
  4. There's no support for users to define their own search settings or exclusions.
  5. There's no support (or simple support) for files without extensions.
    Want to search a Makefile, or a script starting with #!/usr/bin/env? Our current method of assembling in Javascript, writing a grep search, and tossing it over to the system via bash -c isn't supporting this. Good luck figuring out where to start.
  6. It's slow--slow enough to not overlook all the other deficiencies listed above.

At first, I was pretty keen on just supporting ack to overcome all of this. It's written in Perl, it supports ignore files, and it should be good enough, right?

Well, not quite.

Advantages and Problems with Ack

ack 1.96, which is the current stable version, maintains a list of "safe" code files (with and without extensions). In fact, the current searchinfiles code was designed to emulate ack's behavior, and contains a list of known extensions to use.

ack 2.0 will move beyond using extensions to define files. This makes complete sense: every day another new technology drops, and no one wants to keep maintaining a list to add items like .jade, .scss, .styl and so on. So, ack 2.0 uses Perl's -B switch to determine if a file is binary. If it's not, it's searched. Perfect.

Unfortunately, ack 2.0 is alpha, and has been "under development" for around two years. As such, it doesn't really hold up to its promises of just searching non-binary. On top of all that, relying on ack for the filelist module wouldn't work, because it really doesn't even list binary files--it just outright ignores them.

Rather than hack up a solution in Perl, I went and chose something else.

About Ag

ag is written in C. It's fast. It supports all of the above mentioned requirements (can list binaries, but only searches text; allows users to provide their own search settings; supports Windows, e.t.c.), and can be used to replace both find and grep.

Perfect! I figured I would just use ag, and, for replacing files, continue our technique of just using Perl to replace file contents.

Then, I had a panic attack. What if, conceivably, a user's machine didn't have Perl? They'd be SOL. So I needed another solution for replacing contents in files that didn't use Perl.

Enter Nak

After reading Felix's Faster than C notes, I became inspired to just write a fast ack clone, in Node.js. After all, if a user is SSHing into a box, it is guaranteed that they have Node.js installed.

Previously, TJ had written an ack clone in Node, but his code was not very performant. At least, it was slower than ack.

I benchmarked and rewrote and learned a lot. I created nak which, while not supporting everything ack does, does nearly everything ag does, at least, and 100% supports everything we need for Cloud9.

Benchmarks

You like numbers? Me too. They're fun.

Here's the average time for grabbing the filelist in cloud9infra five times (about 33,761 files). The commands do the exact same thing (get hidden files, e.t.c.) and exclude the same nonsense directories ( .git, .c9revisions, sm backups, e.t.c.). :

ag nak ack find
0m0.184s 0m0.940s 0m1.103s 0m1.093s

Here are benchmarks for finding the phrase "va" in cloud9infra:

ag nak ack grep
0m2.599s 0m20.021s 0m29.876s 0m20.891s

What ends up happening is that, with ag, the filelist comes up before the IDE even finishes initializing. Similarly, search results return before the panel finishes animating up. It's some impressive shit.

Ignoring Files

There are now three ways to ignore files:

  1. With a standard Cloud9 IDE .agignore file. This will be loaded for every filelist and searchinfiles command. It follows gitignore-style wildcards. It excludes uninteresting directories, minified JS files, and Cloud9 extraneous content. This makes mainting our own rule-list much, much easier.

  2. Users can define their own .agignore on a per-directory basis in their workspaces.

  3. While searching, users can exclude filetypes with -; for example, -*_test.js excludes all Javascript files ending with "_test.js".

Testing

Mike did a huge favor by rewriting the tests to be compatible with Mocha. I extended his test suite, and also duplicated the commands, one for ag and one for nak; each test also validates that the results of ag and nak are the exact same.

Final Thoughts

My idea is that we provide various binaries for ag, and, in the case that we don't support a platform, we fall back to nak.

By my count, we're missing the following binaries:

  • freebsd
  • sunos
  • win32

In any event, though, all these binaries are built on an 64-bit machines, and if I learned anything from working on Cloud9 Local, it's that people are still using 32-bit machines (I tried to bundle x64 Node binaries for users without Node on their machine, and it wouldn't work for those running x32).

We won't really need to continue dropping new ag binaries, since it's currently fast enough, and, through the mocha tests, we've already covered every feature we need at the moment.

A more real problem, though, is that I'm not at all sure if one Linux binary (built on Ubuntu) will support other Linux distros (Fedora, CentOS, Gentoo...). If that's the case, then we should definitely fall back to nak. I couldn't figure out a way--using process or the os module--to see if we could specifically identify the type of the Linux distro.

Finally, it'd be great to also pull in these two PRs, which rename the packed files to end in .min.js:

I had a lot of fun writing performant Node code. Hooray!

@ajaxorg/liskov

@lennartcl @mostafaeweda @nightwing (as seen above, I am pretty sure non-team members cannot @ a team).

Geoff Greer

If I ever add user quotes to the Ag readme, "It's some impressive shit." is definitely going in there.

Just FYI, Windows support is almost certainly broken in Ag by now. I barely had it working back in 0.1. The wiki and readme are out of date. :(

But all the other OSes should be fine, including FreeBSD and Solaris.

Matt

This is perhaps the most epic PR I've ever seen. Lots of thought and care obviously went into this.

Just one problem: We're going to have to make the console output bar slide up faster.

Great job Garen.

Bas
Owner

Very impressive PR indeed. I like seeing you updated unit tests as well.

plugins-server/cloud9.ide.search/search.js
((26 lines not shown))
114 95
         if (!query)
115 96
             return;
116 97
 
117  
-        // grep has a funny way of handling new lines (that is to say, it's non-existent)
118  
-        // if we're not doing a regex search, then we must split everything between the
119  
-        // new lines, escape the content, and then smush it back together; due to
120  
-        // new lines, this is also why we're  now passing -P as default to grep
121  
-        if (!options.replaceAll && !options.regexp) {
122  
-            var splitQuery = query.split("\\n");
  98
+        if (this.env.useAg) {
1

This if statement should be split up into two separate plugins or code-files or anything. Makes unit testing also easier because switching environments isn't necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
plugins-server/cloud9.ide.filelist/filelist.js
((26 lines not shown))
27 21
         });
28 22
     };
29 23
 
  24
+    this.isAgAvailable = function() {
  25
+        return Fs.existsSync(this.env.agCmd);
1

Uh wtf, sync code?!?!?!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
plugins-server/cloud9.ide.filelist/filelist.js
((5 lines not shown))
11 10
 var Path = require("path");
  11
+var Fs = require("fs");
  12
+Fs.existsSync = Fs.existsSync || Path.existsSync;
1

My eyes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
plugins-server/cloud9.ide.filelist/filelist-plugin.js
((17 lines not shown))
21 26
     Filelist.setEnv({
22  
-        findCmd: options.findCmd,
23  
-        platform: options.platform
  27
+        platform:  platform,
  28
+        arch:  arch,
  29
+        agCmd:  options.agCmd || path.join(__dirname, "..", "cloud9.ide.search", [platform, arch].join("_"), "ag"),
1

So why isn't this logic all in the config?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Jan Jongboom

I'd rather have plugins-server/cloud9.ide.search/darwin_x64/ag (plus evt. other builds) in a separate repo like we always do.

plugins-server/cloud9.ide.search/search.js
((25 lines not shown))
26 21
         });
27 22
     };
28 23
 
  24
+    this.isAgAvailable = function() {
  25
+        return Fs.existsSync(this.env.agCmd);
1
Sergi Mansilla
sergi added a note October 30, 2012

Please, make it async!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Zef Hemel

I Instapaper'ed this pull request.

Jan Jongboom

Maybe we should publish it through http://www.lulu.com/

Garen Torikian

@ggreer Thanks for the head's up, and thanks for your work on ag.

@sergi @janjongboom Sheesh, so much anger for such a tiny synch! Ok, I took the sync logic out; I also split the ag/nak "assembleCommands" logic into two separate files. This also cleans up the setEnv stuff considerably because, as Jan points out, we're not ever really going to switch environments.

Sergi Mansilla

Sorry Garen, I didn't want to transmit any anger about that! It is just that the sync/async stuff is an ongoing theme here, and we were just being idiots by over-commenting on it.

This is the most impressive PR I've seen, if only everybody put some thought on explaining PR rationale like you did the world would be definitely a better place. Or more documented, at least :)

Thanks for this, I've been waiting for ag super-speed for so long!!

Jan Jongboom

So here's one other idea: if we load the ack/ng stuff through architect the code between switching can be done in config rather than in the plugin itself.

Zef Hemel

Question: once this is into cloud9 OSS. How can we deploy this in infra? For the gear projects we can just rsync in a few binaries, but what about SSH projects? We currently are very hands-off and don't intall any tools into SSH projects.

Jan Jongboom

Fall back on nak when it's not present? Currently we assume that grep is on the machines, so this is already better.

Zef Hemel

Right, but then we have to somehow load that into node process with VFS. I think now it just runs it as an external exec call, but I may have read that wrong.

Jan Jongboom

But that'll work just fine on SSH machines?

Zef Hemel

@janjongboom Will nack be installed on the SSH machine? If so, how?

Garen Torikian

So here's one other idea: if we load the ack/ng stuff through architect the code between switching can be done in config rather than in the plugin itself.

@janjongboom I'm not really clear on how to do this. I guess I would do something like:

    {
        packagePath: "./cloud9.ide.search",
        useAg: true
    },

and then use that property later on?

Jan Jongboom

No, Ag en Nak both have a plugin, and they both provide a file-searcher thingy or something. In the config we'll load either one of them and cloud9.ide.search only consumes the file-searcher.

Garen Torikian

I factored out the plugins into their own separate modules but didn't have time to redo the tests. I'll likely finish that up tomorrow and hopefully this can get merged soon ?

Garen Torikian

Oh hell, who needs sleep.

We now have two plugins, cloud9.search.ag and search.nak, that provide the "codesearcher" function. (filesearch and codesearch were already taken by cloud9.fs. damn).

default.js determines when to use which plugin. filelist and searchinfiles just consume it. tests have been rewritten to take account for this change and they're passing.

Garen Torikian

There was a known issue in ag that directory globs were not being properly ignored: ggreer/the_silver_searcher#100

I updated the ag binaries and the tests to support directory ignoring. I also found a bug around this in nak, and updated that package as well.

Garen Torikian

@basdewachter BTW why are some tests skipped?

./plugins-server/cloud9.ide.filelist/filelist_test.js SKIPPED
./plugins-server/cloud9.ide.revisions/revisions_test.js SKIPPED
./plugins-server/cloud9.ide.search/search_test.js SKIPPED
Sergi Mansilla

@gjtorikian Because they are blacklisted, they are obsolete and have to be updated. My PR #2301 re-enables the revisions one, but unfortunately it has been waiting for a month.

Sergi Mansilla

Pinging @ajaxorg/liskov to review this soon. There is a lot of work and awesomeness put into this PR.

Mostafa Eweda
Collaborator

I see beautiful stuff here
However, I wonder how come after all those comments & still pinging someone from liskov to review this !

Some review: remarks:

  • The tests included are mocha tests & they didn't work for when used mocha. can it be changed to asyncjs tests ? & checked to not fail ?
Garen Torikian

@mostafaeweda See above--these tests are marked as ignored anyway so they're not running.

Plus, it was @mikedeboer who originally converted/wrote these tests into mocha. My understanding is that those sorts of tests will be supported soon.

Also very interesting that it begin to fail after your commits :) So I'll see why that happened suddenly.

Mostafa Eweda
Collaborator

Yes, strange ! - I double checked my changes because of that :)

Also, plz include a c9 infra PR representing how will this (work / fallback) in various configurations: ssh, ftp & openshift ?

Garen Torikian

@mostafaeweda Ok I found the root cause and fixed the problem.

The build server uses linux_ia32 as their os_architecture. The tests, unlike the code, does not check if "ag" is present on a machine before running. I changed the test to only run ag if the binary is still there.

On top of that, I decided to build the 32-bit linux binary for ag, so that our tests pass with ag (and it'll help users sshing into such machines). Hooray!

plugins-server/cloud9.search.nak/codesearcher-nak.js
((5 lines not shown))
  5
+ * @license GPLv3 <http://www.gnu.org/licenses/gpl.txt>
  6
+ */
  7
+
  8
+"use strict";
  9
+
  10
+var path = require("path");
  11
+
  12
+module.exports = function setup(options, imports, register) {
  13
+  var nakCmd = options.nakCmd;
  14
+
  15
+  var codesearcher = {
  16
+      assembleFilelistCommand: function (options) {
  17
+        var args;
  18
+
  19
+        args = ["-l",                                                     // filenames only   
  20
+                "-p", path.join(__dirname, "..", "cloud9.ide.search", ".agignore")]; // use the Cloud9 ignore file                     
2
Harutyun Amirjanyan Collaborator

i think this won't work over vfs, since in that case args are sent to remote machine where __dirname doesn't exist

With who can I confirm this further? It works on EC2/cloud9infra so I'm not sure what else to check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Bert Belder

@ggreer

Just FYI, Windows support is almost certainly broken in Ag by now. I barely had it working back in 0.1. The wiki and readme are out of date. :(

What is the issue with that? I can probably help you out here.

PS, do I know you as angryparsley or did I get some people mixed up?

Garen Torikian

Ah yeah, this whole house of cards has come tumbling down. @nightwing was right (for a different reason), this apparently does not work on cloud9infra. At least, I couldn't get the right paths for either ag or nak after cloning a project and placing it on an openshift server.

This makes me sad. I'll need someone's help figuring this piecing out, because I'm not sure how the architecture works in this area.

Geoff Greer

@piscisaureus

I don't know. I don't have a Windows VM anymore. I added pthreads a little while ago, and that's not something that normally comes with Windows. There are probably a couple other issues, but the code itself shouldn't be that hard to port. I've made sure it's always C89 in the hopes that Visual Studio could one day compile it.

Also yes I am AngryParsley. Small world!

Jan Jongboom

@gjtorikian Put it in the c9pm repo and run c9pm install ag on first use of the plugin? (Can even do this from the client, we have some nifty 'run this on server' command that we use for deployments as well. if which heroku doesnt result into anything -> install tools). After that it's always in the same place (~/lib/pkg/ag f.e.)

Mostafa Eweda
Collaborator

Maybe easier & better to put it on the shared gear
We now have openshift shared space among all gears
in openshift.js
PATH += ":/usr/libexec/openshift/cartridges/c9-0.1/root/bin"
& add all the needed binaries there (ag, node 0.6 & node 0.8, ....etc)

That maybe better than waiting for installation & syncing stuff to gears

Jan Jongboom

Yes that seems nicer

Garen Torikian

Will putting it in openshift.js work for SSH machines? @mostafaeweda

Zef Hemel

No, that is a good point...

Mostafa Eweda
Collaborator

For SSH machines, it won't either have the c9pm command
We have to fallback to something
Maybe back to find & grep :-1: ?
OR how SSH machines will have that anyway ?

Zef Hemel

Or we upload a OS-specific binary, but that's nasty. Or we require the tool to be installed, if it's not, we show instructions on how to do that.

Mostafa Eweda
Collaborator

Can't we run something like npm install -g nack upon connection initiation with SSH machines ?

Zef Hemel

That could work. Generally I'm not a big fan of installing stuff in our SSH workspaces, but in this case it may be the best way to go.

Garen Torikian

What if we just stop nak as an npm dependency, and I just rewrite the search plugin to use the nak code ?

At this point, any updates to the code would be made for performance reasons. We don't have to have it as a separate NPM module.

Zef Hemel

That code still has to make it to the SSH server/gear right? Perhaps a VFS extension can be used for that. Maybe @creationix could help out there.

Garen Torikian

In case someone is out there thinking "dear Lord what's going on with this PR?!" we've decided to drop the ag binaries (sorry @ggreer :( ) and implement a pure Node solution that's going to be streamed through a VFS extension to work on openshift and ssh.

On top of that, I did some optimizations that turned the above searched in cloud9infra from 20s to 11s. That's damned good enough.

c9bot
Collaborator

Build Status: :broken_heart:Failed (17bd2ce, job info)

c9bot
Collaborator

Build Status: :broken_heart:Failed (97c32a1, job info)

c9bot
Collaborator

Build Status: :broken_heart:Failed (56b06bd, job info)

Sergi Mansilla

Dear Lord what's going on with this PR?!

Garen Torikian

I already explained, @sergi !!!!!!

No but nak now works nicely with VFS; @mattpardee has more info on this, as he's got a project using it.

Ruben Daniels
Owner

I'm gonna make sure this PR gets into Cloud9 v3. This is epic.

Sergi Mansilla
sergi commented March 15, 2013

:+1: BLAZING fast!

Garen Torikian

Hey guys, glad you like it. :)

I made some non-insignificant changes recently and bumped the minor version to 0.2.0. As far as I can remember, nothing should break regarding Cloud9's implementation.

I did also, however, drop a huge boatload of changes to shave something like another 150ms off large queries. This will come out in 0.2.1 in the next few days. If you guys need or want something please let me know or log an issue and I'll make it happen.

Ruben Daniels
Owner

@gjtorikian I need your help! :). Please talk to me on Skype, Cell or Mail. Thanks!

Chris Manson

Hey Guys, any update on this? I don't see it marked as any milestone and it's been quite for quite a while now

Ruben Daniels
Owner
Chris Manson

So.. will it be committed to this repo when it is done?

Ruben Daniels
Owner

We'll release that new version as OSS, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Showing 45 unique commits by 2 authors.

Jul 02, 2012
Garen Torikian Add ack 41a15a3
Aug 26, 2012
Garen Torikian Merge branch 'master' into ack 78b86a7
Aug 31, 2012
Garen Torikian Merge branch 'master' into ack ec53a8a
Garen Torikian Stash initial work for ag 92839f6
Garen Torikian Stash fc7e4a0
Sep 01, 2012
Garen Torikian New ag, update filelist. b7af213
Sep 04, 2012
Garen Torikian Update work 71a70c6
Garen Torikian Continue updates 6ff33ff
Sep 05, 2012
Garen Torikian Stash binary 86099b8
Sep 25, 2012
Garen Torikian Stash 75657ae
Oct 06, 2012
Garen Torikian Mergee master 986ad07
Garen Torikian Get ag working for search in files, too 794e2e8
Garen Torikian Start ack work a9b947f
Oct 08, 2012
Garen Torikian Start ack support f48584c
Oct 15, 2012
Garen Torikian Quick update of agignore 584b8e0
Garen Torikian More updates 55d00cd
Oct 26, 2012
Garen Torikian Merge master (and search refactor) 9305666
Garen Torikian Get ag/nak filelist working, fix ag search, start nak search 1ffd9c0
Garen Torikian Merge master e98da1c
Garen Torikian Fully functional 530db5d
Garen Torikian Testing updates bea7256
Garen Torikian Add linux/openshift binary 4b7cfb3
Garen Torikian Update test b80b943
Garen Torikian Continue work on search tests ed6de2d
Oct 27, 2012
Garen Torikian Shock: test driven development DOES find bugs in code bbf847e
Oct 28, 2012
Garen Torikian Finalize tests and fix broken code. Huzzah b54e368
Garen Torikian By default, colors are off 6c24109
Oct 29, 2012
Garen Torikian Add hint in searchinfiles 3005ecb
Garen Torikian Add license file 1c8b1b2
Oct 30, 2012
Garen Torikian Split ag and nak logic into separate files 85e27e7
Nov 01, 2012
Garen Torikian Remove unneeded FS 1640748
Garen Torikian Remove unneeded FS 7fe987f
Nov 07, 2012
Garen Torikian Provide separate nak/ag plugins for architect afe23d0
Garen Torikian Update ag/nak to support dir ignores, update testing for this too 68e80de
Nov 08, 2012
Mostafa Eweda Some cleanup bf8e36b
Mostafa Eweda Merge branch 'master' into ack fba3815
Mostafa Eweda Fix failing node 0.6 c8de2b9
Garen Torikian Update ag binary for linux aed5b2f
Garen Torikian Update ag binary for osx d1ec74b
Garen Torikian Fix tests in case of lack of ag 7827395
Garen Torikian Add 32bit linux binary for ag 1369556
Nov 09, 2012
Garen Torikian Merge branch 'master' of github.com:ajaxorg/cloud9 into ack adc8580
Nov 29, 2012
Garen Torikian Merge master 17bd2ce
Garen Torikian Reimplement filelist with nak as VFS 97c32a1
Nov 30, 2012
Garen Torikian Implement VFS nak for searchinfiles 56b06bd
Something went wrong with that request. Please try again.