Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for two problems #38

Merged
merged 2 commits into from
Jan 25, 2017
Merged

Fixes for two problems #38

merged 2 commits into from
Jan 25, 2017

Conversation

frederik-h
Copy link
Contributor

Hi,
these changes fix two problems.

  1. --invert-match is broken
    $ cat <<END | stack exec cgrep -- -v foo
    foo
    bar
    END
    Using 'cgreprc' configuration file...
    cgrep: Prelude.head: empty list

With my change, I get the expected result:
$cat <<END | stack exec cgrep -- -v foo
foo
bar
END
Using 'cgreprc' configuration file...
:2:bar

  1. Output.getOffset2d has a memory leak

When run on a long file with many matching lines,
cgrep becomes very slow. For instance,
I used a list of english-words and searched for the word
"house" which matches many lines:

$ git clone https://github.com/dwyl/english-words
$ time stack exec cgrep -- house english-words/words.txt
Using 'cgreprc' configuration file...
english-words/words.txt:7345:4:alehouse
english-words/words.txt:7346:4:alehouses
english-words/words.txt:8584:5:almshouse
english-words/words.txt:8585:5:almshouses
(...)
english-words/words.txt:351557:5:workhouse
english-words/words.txt:351558:5:workhoused
english-words/words.txt:351559:5:workhouses

real 1m29.212s
user 0m26.680s
sys 0m10.784s

The problem is a memory leak in Output.getOffset2d caused by the sharing of the
list prc defined in this function.
Doing the same search with my change to Output.getOffset2d yields the following statistics:

real 0m3.212s
user 0m3.116s
sys 0m0.260s

Frederik

Frederik Harwath added 2 commits January 25, 2017 13:52
* Handle empty token lists in Output.showLineCol

Inverting the matching lines leads to empty tokens list
in calls to the function showLineCol.
Previous definition led to sharing of the list prc
which resulted in a memory leak.
This made the program extremly slow in the case
of many matching lines.
@awgn awgn merged commit fc8b800 into awgn:master Jan 25, 2017
@awgn
Copy link
Owner

awgn commented Jan 25, 2017

Thanks for the fixes!

@frederik-h frederik-h deleted the fixes branch January 25, 2017 14:00
@frederik-h
Copy link
Contributor Author

You're welcome. I also have a more efficient implementation of Output.mkOutput.
If you are interested, I can also file a pull request for this.
For the example above, it yields a running time of about 0.6s on my computer.

@awgn
Copy link
Owner

awgn commented Jan 27, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants