Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hilightLine speed improvement #28

Merged
merged 1 commit into from
Oct 31, 2015
Merged

hilightLine speed improvement #28

merged 1 commit into from
Oct 31, 2015

Conversation

lyokha
Copy link
Collaborator

@lyokha lyokha commented Oct 31, 2015

hilightLine now concatenates whole areas delimited by color stack
instead of every next character separately.

Nicola, I added a smaller improvement in hilightLine. Not that I really detected real speed improvement after it, but the change made concatenation algorithm cleaner by retirement of concatenation of every character by (++).

hilightLine now concatenates whole areas delimited by color stack
instead of every next character separately
awgn added a commit that referenced this pull request Oct 31, 2015
hilightLine speed improvement
@awgn awgn merged commit cbdc391 into awgn:master Oct 31, 2015
@awgn
Copy link
Owner

awgn commented Oct 31, 2015

Merged! 👍

@lyokha
Copy link
Collaborator Author

lyokha commented Oct 31, 2015

Hope I won't do any changes till tomorrow :) By the way simple test of matching any not empty line to a file with approx. 300 lines

time cgrep --color -r --lang=Cpp -i -P '.+' src/CexmcChargeExchangeReconstructor.cc

shows that a new version approx. 3 times faster. And I also found that by some reason -P '.*' does not work (it accepts * to be a literal character) whereas -G '.*' works fine, but ... it has another issue: it prints

cgrep: Prelude.!!: index too large

no matter whether --color is set or not.

@lyokha
Copy link
Collaborator Author

lyokha commented Oct 31, 2015

Another issue I just found. If I have literal ## in code (let it be src/CGrep/Parser/Cpp/Token.hs) then using format option with #l will replace it by found match, i.e.

cgrep --format='#f ⎜ #n #l' -r "$@" -P '\*' ./src/CGrep/Parser/Cpp/Token.hs

will replace line

./src/CGrep/Parser/Cpp/Token.hs:362:                             "##", "<:", ":>", "<%", "%>", "%:", "::", ".*", "+=", "-=",

with

./src/CGrep/Parser/Cpp/Token.hs ⎜ 362                              "*", "<:", ":>", "<%", "%>", "%:", "::", ".*", "+=", "-=",

If I do not set any format then cgrep shows original line correctly.

@awgn
Copy link
Owner

awgn commented Oct 31, 2015

I'm investigating on the first one right now...

On Sat, Oct 31, 2015 at 3:09 PM, Alexey Radkov notifications@github.com
wrote:

Another issue I just found. If I have literal ## in code (let it be
src/CGrep/Parser/Cpp/Token.hs) then using format option with #l will
replace it by found match, i.e.

cgrep --format='#f ⎜ #n #l' -r "$@" -P '*' ./src/CGrep/Parser/Cpp/Token.hs

will replace line

./src/CGrep/Parser/Cpp/Token.hs:362: "##", "<:", ":>", "<%", "%>", "%:", "::", ".*", "+=", "-=",

with

./src/CGrep/Parser/Cpp/Token.hs ⎜ 362 "", "<:", ":>", "<%", "%>", "%:", "::", ".", "+=", "-=",

If I do not set any format then cgrep shows original line correctly.


Reply to this email directly or view it on GitHub
#28 (comment).

The difference between theory and practice is bigger in practice than in
theory.

@lyokha
Copy link
Collaborator Author

lyokha commented Oct 31, 2015

I suspect that matching with empty line can make harm to mkOutput. At least in my case I see that matches can be empty

cgrep --color -r --lang=Cpp -i -P -d2 '.*' src/CexmcChargeExchangeReconstructor.cc 
Cgrep 6.5.9!
options   : Options {file = "", word_match = False, prefix_match = False, suffix_match = False, edit_dist = False, regex_posix = False, regex_pcre = True, ignore_case = True, code = False, comment = False, literal = False, semantic = False, identifier = False, keyword = False, directive = False, header = False, number = False, string = False, char = False, oper = False, no_filename = False, no_linenumber = False, lang = ["Cpp"], lang_maps = False, force_language = Nothing, multiline = 1, recursive = True, prune_dir = [], deference_recursive = False, invert_match = False, max_count = 9223372036854775807, count = False, show_match = False, color = True, format = Nothing, json = False, xml = False, jobs = 1, cores = 0, chunk = 16, asynch = False, debug = 2, no_turbo = False, others = [".*","src/CexmcChargeExchangeReconstructor.cc"]}
languages : [Cpp]
pattern   : [".*"]
files     : ["src/CexmcChargeExchangeReconstructor.cc"]
isTermIn  : True
isTermOut : True
strategy  : running regex (pcre) search on src/CexmcChargeExchangeReconstructor.cc...
tokens    : [(0,"/*"),(2,"")]
src/CexmcChargeExchangeReconstructor.cc:1:/*

and this Prelude index error in case of -G '.*' option seems to be the consequence of subtracting of 1 from 0 in mkOutput and then taking index from that by (!!) in ls !! (n-1).

@awgn
Copy link
Owner

awgn commented Oct 31, 2015

Trying to fix the second one right now...

@awgn
Copy link
Owner

awgn commented Oct 31, 2015

Alexey,

I believe to have fixed both bugs. You can pull from master and check it
out.

If everything is fine tomorrow we can release it on hackage.

Ciao,
N.

On Sat, Oct 31, 2015 at 3:40 PM, Alexey Radkov notifications@github.com
wrote:

I suspect that matching with empty line can make harm to mkOutput. At
least in my case I see that matches can be empty

cgrep --color -r --lang=Cpp -i -P -d2 '.' src/CexmcChargeExchangeReconstructor.cc
Cgrep 6.5.9!
options : Options {file = "", word_match = False, prefix_match = False, suffix_match = False, edit_dist = False, regex_posix = False, regex_pcre = True, ignore_case = True, code = False, comment = False, literal = False, semantic = False, identifier = False, keyword = False, directive = False, header = False, number = False, string = False, char = False, oper = False, no_filename = False, no_linenumber = False, lang = ["Cpp"], lang_maps = False, force_language = Nothing, multiline = 1, recursive = True, prune_dir = [], deference_recursive = False, invert_match = False, max_count = 9223372036854775807, count = False, show_match = False, color = True, format = Nothing, json = False, xml = False, jobs = 1, cores = 0, chunk = 16, asynch = False, debug = 2, no_turbo = False, others = [".
","src/CexmcChargeExchangeReconstructor.cc"]}
languages : [Cpp]
pattern : ["."]
files : ["src/CexmcChargeExchangeReconstructor.cc"]
isTermIn : True
isTermOut : True
strategy : running regex (pcre) search on src/CexmcChargeExchangeReconstructor.cc...
tokens : [(0,"/
"),(2,"")]
src/CexmcChargeExchangeReconstructor.cc:1:/*

and this Prelude index error in case of -G '.*' option seems to be the
consequence of subtracting of 1 from 0 in mkOutput and then taking index
from that by (!!) in ls !! (n-1).


Reply to this email directly or view it on GitHub
#28 (comment).

The difference between theory and practice is bigger in practice than in
theory.

@lyokha
Copy link
Collaborator Author

lyokha commented Oct 31, 2015

Array index error has really gone! But

  1. -P '.*' stopping at first empty match
  2. Replacement of ## by the pattern in case when filter option is set

are still here.

And color escape sequences algorithm requires additional patch from me for cases when the match is empty:

diff --git a/src/CGrep/Output.hs b/src/CGrep/Output.hs
index d4cf6df..33b4ecd 100644
--- a/src/CGrep/Output.hs
+++ b/src/CGrep/Output.hs
@@ -243,5 +243,5 @@ hilightLine ts =  hilightLine' (hilightIndicies ts, 0, 0)
                   (next, rest) = splitAt nn s

 hilightIndicies :: [Token] -> [(Int, Int)]
-hilightIndicies = foldr (\t a -> let b = fst t in (b, b + length (snd t) - 1) : a) []
+hilightIndicies = foldr (\t a -> let b = fst t in (b, b + length (snd t) - 1) : a) [] . filter ((>0) . length . snd)

Could you patch it? ... because otherwise I will have to open a new Pull Request. :)

@lyokha
Copy link
Collaborator Author

lyokha commented Oct 31, 2015

Hmm. Incredible. Point 1 is an error in PCRE library (or I somehow misunderstand how to use patterns there). Here is a part of my ghci session.

Prelude> :m +Data.Array
Prelude Data.Array> :set -hide-package regex-pcre-builtin
package flags have changed, resetting and loading new packages...
Prelude Data.Array> import Text.Regex.PCRE as P
Prelude Data.Array P> import Text.Regex.Posix as G
Prelude Data.Array P G> P.getAllTextMatches $ "/*\n *\nHell" P.=~ ".*" :: (Array Int) (MatchText String)
array (0,1) [(0,array (0,0) [(0,("/*",(0,2)))]),(1,array (0,0) [(0,("",(2,0)))])]
Prelude Data.Array P G> G.getAllTextMatches $ "/*\n *\nHell" G.=~ ".*" :: (Array Int) (MatchText String)
array (0,5) [(0,array (0,0) [(0,("/*",(0,2)))]),(1,array (0,0) [(0,("",(2,0)))]),(2,array (0,0) [(0,(" *",(3,2)))]),(3,array (0,0) [(0,("",(5,0)))]),(4,array (0,0) [(0,("Hell",(6,4)))]),(5,array (0,0) [(0,("",(10,0)))])]

So this point is not related to cgrep.

@lyokha
Copy link
Collaborator Author

lyokha commented Oct 31, 2015

Though it may be not an error but special behaviour of PCRE implementation in multiline environment.

@lyokha
Copy link
Collaborator Author

lyokha commented Oct 31, 2015

Ah, the second issue (## replacement) has also gone! (Probably I ran some cached version.) So it seems that now it does not have any issues found so far! (Except necessity to apply the patch I sent in a comment above). And I also noticed up to 2 times faster runs in some cases after your patch applied! Today was a really nice work!

@awgn
Copy link
Owner

awgn commented Oct 31, 2015

I have just applied your patch 👍
Now I'm going to celebrate Halloween...

Thanks for your valuable collaboration :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants