hilightLine speed improvement #28

lyokha · 2015-10-31T13:29:08Z

hilightLine now concatenates whole areas delimited by color stack
instead of every next character separately.

Nicola, I added a smaller improvement in hilightLine. Not that I really detected real speed improvement after it, but the change made concatenation algorithm cleaner by retirement of concatenation of every character by (++).

hilightLine now concatenates whole areas delimited by color stack instead of every next character separately

hilightLine speed improvement

awgn · 2015-10-31T13:34:10Z

Merged! 👍

lyokha · 2015-10-31T13:50:43Z

Hope I won't do any changes till tomorrow :) By the way simple test of matching any not empty line to a file with approx. 300 lines

time cgrep --color -r --lang=Cpp -i -P '.+' src/CexmcChargeExchangeReconstructor.cc

shows that a new version approx. 3 times faster. And I also found that by some reason -P '.*' does not work (it accepts * to be a literal character) whereas -G '.*' works fine, but ... it has another issue: it prints

cgrep: Prelude.!!: index too large

no matter whether --color is set or not.

lyokha · 2015-10-31T14:09:55Z

Another issue I just found. If I have literal ## in code (let it be src/CGrep/Parser/Cpp/Token.hs) then using format option with #l will replace it by found match, i.e.

cgrep --format='#f ⎜ #n #l' -r "$@" -P '\*' ./src/CGrep/Parser/Cpp/Token.hs

will replace line

./src/CGrep/Parser/Cpp/Token.hs:362:                             "##", "<:", ":>", "<%", "%>", "%:", "::", ".*", "+=", "-=",

with

./src/CGrep/Parser/Cpp/Token.hs ⎜ 362                              "*", "<:", ":>", "<%", "%>", "%:", "::", ".*", "+=", "-=",

If I do not set any format then cgrep shows original line correctly.

awgn · 2015-10-31T14:22:24Z

I'm investigating on the first one right now...

On Sat, Oct 31, 2015 at 3:09 PM, Alexey Radkov notifications@github.com
wrote:

Another issue I just found. If I have literal ## in code (let it be
src/CGrep/Parser/Cpp/Token.hs) then using format option with #l will
replace it by found match, i.e.

cgrep --format='#f ⎜ #n #l' -r "$@" -P '*' ./src/CGrep/Parser/Cpp/Token.hs

will replace line

./src/CGrep/Parser/Cpp/Token.hs:362: "##", "<:", ":>", "<%", "%>", "%:", "::", ".*", "+=", "-=",

with

./src/CGrep/Parser/Cpp/Token.hs ⎜ 362 "", "<:", ":>", "<%", "%>", "%:", "::", ".", "+=", "-=",

If I do not set any format then cgrep shows original line correctly.

—
Reply to this email directly or view it on GitHub
#28 (comment).

The difference between theory and practice is bigger in practice than in
theory.

lyokha · 2015-10-31T14:40:55Z

I suspect that matching with empty line can make harm to mkOutput. At least in my case I see that matches can be empty

cgrep --color -r --lang=Cpp -i -P -d2 '.*' src/CexmcChargeExchangeReconstructor.cc 
Cgrep 6.5.9!
options   : Options {file = "", word_match = False, prefix_match = False, suffix_match = False, edit_dist = False, regex_posix = False, regex_pcre = True, ignore_case = True, code = False, comment = False, literal = False, semantic = False, identifier = False, keyword = False, directive = False, header = False, number = False, string = False, char = False, oper = False, no_filename = False, no_linenumber = False, lang = ["Cpp"], lang_maps = False, force_language = Nothing, multiline = 1, recursive = True, prune_dir = [], deference_recursive = False, invert_match = False, max_count = 9223372036854775807, count = False, show_match = False, color = True, format = Nothing, json = False, xml = False, jobs = 1, cores = 0, chunk = 16, asynch = False, debug = 2, no_turbo = False, others = [".*","src/CexmcChargeExchangeReconstructor.cc"]}
languages : [Cpp]
pattern   : [".*"]
files     : ["src/CexmcChargeExchangeReconstructor.cc"]
isTermIn  : True
isTermOut : True
strategy  : running regex (pcre) search on src/CexmcChargeExchangeReconstructor.cc...
tokens    : [(0,"/*"),(2,"")]
src/CexmcChargeExchangeReconstructor.cc:1:/*

and this Prelude index error in case of -G '.*' option seems to be the consequence of subtracting of 1 from 0 in mkOutput and then taking index from that by (!!) in ls !! (n-1).

awgn · 2015-10-31T15:26:44Z

Trying to fix the second one right now...

awgn · 2015-10-31T16:18:21Z

Alexey,

I believe to have fixed both bugs. You can pull from master and check it
out.

If everything is fine tomorrow we can release it on hackage.

Ciao,
N.

On Sat, Oct 31, 2015 at 3:40 PM, Alexey Radkov notifications@github.com
wrote:

I suspect that matching with empty line can make harm to mkOutput. At
least in my case I see that matches can be empty

cgrep --color -r --lang=Cpp -i -P -d2 '.' src/CexmcChargeExchangeReconstructor.cc
Cgrep 6.5.9!
options : Options {file = "", word_match = False, prefix_match = False, suffix_match = False, edit_dist = False, regex_posix = False, regex_pcre = True, ignore_case = True, code = False, comment = False, literal = False, semantic = False, identifier = False, keyword = False, directive = False, header = False, number = False, string = False, char = False, oper = False, no_filename = False, no_linenumber = False, lang = ["Cpp"], lang_maps = False, force_language = Nothing, multiline = 1, recursive = True, prune_dir = [], deference_recursive = False, invert_match = False, max_count = 9223372036854775807, count = False, show_match = False, color = True, format = Nothing, json = False, xml = False, jobs = 1, cores = 0, chunk = 16, asynch = False, debug = 2, no_turbo = False, others = [".","src/CexmcChargeExchangeReconstructor.cc"]}
languages : [Cpp]
pattern : ["."]
files : ["src/CexmcChargeExchangeReconstructor.cc"]
isTermIn : True
isTermOut : True
strategy : running regex (pcre) search on src/CexmcChargeExchangeReconstructor.cc...
tokens : [(0,"/"),(2,"")]
src/CexmcChargeExchangeReconstructor.cc:1:/*

and this Prelude index error in case of -G '.*' option seems to be the
consequence of subtracting of 1 from 0 in mkOutput and then taking index
from that by (!!) in ls !! (n-1).

—
Reply to this email directly or view it on GitHub
#28 (comment).

The difference between theory and practice is bigger in practice than in
theory.

lyokha · 2015-10-31T16:50:52Z

Array index error has really gone! But

-P '.*' stopping at first empty match
Replacement of ## by the pattern in case when filter option is set

are still here.

And color escape sequences algorithm requires additional patch from me for cases when the match is empty:

diff --git a/src/CGrep/Output.hs b/src/CGrep/Output.hs
index d4cf6df..33b4ecd 100644
--- a/src/CGrep/Output.hs
+++ b/src/CGrep/Output.hs
@@ -243,5 +243,5 @@ hilightLine ts =  hilightLine' (hilightIndicies ts, 0, 0)
                   (next, rest) = splitAt nn s

 hilightIndicies :: [Token] -> [(Int, Int)]
-hilightIndicies = foldr (\t a -> let b = fst t in (b, b + length (snd t) - 1) : a) []
+hilightIndicies = foldr (\t a -> let b = fst t in (b, b + length (snd t) - 1) : a) [] . filter ((>0) . length . snd)

Could you patch it? ... because otherwise I will have to open a new Pull Request. :)

lyokha · 2015-10-31T17:11:36Z

Hmm. Incredible. Point 1 is an error in PCRE library (or I somehow misunderstand how to use patterns there). Here is a part of my ghci session.

Prelude> :m +Data.Array
Prelude Data.Array> :set -hide-package regex-pcre-builtin
package flags have changed, resetting and loading new packages...
Prelude Data.Array> import Text.Regex.PCRE as P
Prelude Data.Array P> import Text.Regex.Posix as G
Prelude Data.Array P G> P.getAllTextMatches $ "/*\n *\nHell" P.=~ ".*" :: (Array Int) (MatchText String)
array (0,1) [(0,array (0,0) [(0,("/*",(0,2)))]),(1,array (0,0) [(0,("",(2,0)))])]
Prelude Data.Array P G> G.getAllTextMatches $ "/*\n *\nHell" G.=~ ".*" :: (Array Int) (MatchText String)
array (0,5) [(0,array (0,0) [(0,("/*",(0,2)))]),(1,array (0,0) [(0,("",(2,0)))]),(2,array (0,0) [(0,(" *",(3,2)))]),(3,array (0,0) [(0,("",(5,0)))]),(4,array (0,0) [(0,("Hell",(6,4)))]),(5,array (0,0) [(0,("",(10,0)))])]

So this point is not related to cgrep.

lyokha · 2015-10-31T17:17:32Z

Though it may be not an error but special behaviour of PCRE implementation in multiline environment.

lyokha · 2015-10-31T17:36:56Z

Ah, the second issue (## replacement) has also gone! (Probably I ran some cached version.) So it seems that now it does not have any issues found so far! (Except necessity to apply the patch I sent in a comment above). And I also noticed up to 2 times faster runs in some cases after your patch applied! Today was a really nice work!

awgn · 2015-10-31T17:44:18Z

I have just applied your patch 👍
Now I'm going to celebrate Halloween...

Thanks for your valuable collaboration :-)

speed improvement

f2fffe1

hilightLine now concatenates whole areas delimited by color stack instead of every next character separately

awgn added a commit that referenced this pull request Oct 31, 2015

Merge pull request #28 from lyokha/master

cbdc391

hilightLine speed improvement

awgn merged commit cbdc391 into awgn:master Oct 31, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hilightLine speed improvement #28

hilightLine speed improvement #28

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015

awgn commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015

hilightLine speed improvement #28

hilightLine speed improvement #28

Conversation

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015

awgn commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

lyokha commented Oct 31, 2015

awgn commented Oct 31, 2015