This repository has been archived by the owner on Oct 18, 2019. It is now read-only.

add haskell implementations #34

Merged: 4 commits, Mar 22, 2018


tippenein
Contributor

@tippenein tippenein commented May 30, 2016

Adds 2 implementations

  1. using a ByteString index search (labeled `indice` in the results)
  2. using a regex (regex-tdfa)
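
For readers unfamiliar with the first approach, here is a minimal sketch of an index-style substring search over `ByteString` values. It is not the code from this PR: it uses `breakSubstring` from the `bytestring` package as a stand-in, the search is case-sensitive, and the names `firstIndex` and `countMatches` and the sample lines are made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (isJust)

-- | Index of the first occurrence of the needle, if any.
-- breakSubstring splits the haystack into (before, rest); rest is empty
-- when the needle does not occur.
firstIndex :: B.ByteString -> B.ByteString -> Maybe Int
firstIndex needle haystack =
  let (before, rest) = B.breakSubstring needle haystack
  in if B.null rest then Nothing else Just (B.length before)

-- | Number of lines that contain the needle at least once.
countMatches :: B.ByteString -> [B.ByteString] -> Int
countMatches needle = length . filter (isJust . firstIndex needle)

main :: IO ()
main = do
  -- Hypothetical sample input, not data from the benchmark.
  let tweets = map B.pack ["go knicks", "nice weather", "knicks win"]
  print (firstIndex (B.pack "knicks") (B.pack "go knicks")) -- Just 3
  print (countMatches (B.pack "knicks") tweets)             -- 2
```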

The results are included in benchmark.prof and on my machine are:
regex - 7.652 s (7.358 s .. 7.819 s)
indice - 2.887 s (2.830 s .. 2.919 s)

The Makefile includes the commands needed to run these benchmarks yourself. (make benchmark)

Thanks to @Gabriel439 for the majority of this implementation - here

edit: Added memory command to makefile to show memory usage.
indice: 28 MB total memory in use
regex: 9 MB total memory in use

@tippenein tippenein force-pushed the regex-and-indice-impl-and-benchmarks branch 2 times, most recently from b71f067 to 7d6c835 Compare May 30, 2016 23:43
1. implemented using a bytestring indice search
2. implemented with regex (regex-tdfa)
@tippenein tippenein force-pushed the regex-and-indice-impl-and-benchmarks branch from 7d6c835 to 1aef479 Compare May 30, 2016 23:51
@dimroc
Owner

dimroc commented Jun 9, 2016

Thanks for the contribution, guys. I especially love the memory consumption data. I've been hoping to do that on quite a few other implementations.

Two things to note:

  1. Output isn't sorted. All other implementations sort the output in descending order, with highest matching neighborhood being at the top. This does affect benchmarks.
  2. The regex results seem off. According to tmp/haskell_regex_results.txt, park-slope-gowanus mentioned the knicks 119016 times, which doesn't match the indice result: haskell_indice_results.txt shows 258 matches for park-slope-gowanus, which is correct.
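
The sorting in point 1 could be sketched roughly as below. This is only an illustration: the `(neighborhood, count)` pair shape, the name tie-break, the tab-separated output, and every sample row other than park-slope-gowanus with 258 are assumptions, not taken from the implementation.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- | Sort (neighborhood, count) pairs by count, highest first.
-- Ties are broken by name (an assumption) so the output is stable
-- enough to diff against other implementations.
sortDescending :: [(String, Int)] -> [(String, Int)]
sortDescending = sortBy (comparing (\(name, n) -> (Down n, name)))

main :: IO ()
main = mapM_ render (sortDescending sample)
  where
    render (name, n) = putStrLn (name ++ "\t" ++ show n)
    -- Hypothetical rows, except park-slope-gowanus, whose count is
    -- the one quoted in this thread.
    sample =
      [ ("neighborhood-b", 3)
      , ("park-slope-gowanus", 258)
      , ("neighborhood-a", 3)
      ]
```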

Feel free to run one of the other implementations and compare results. Once you have the output sorted, you can just diff against the other outputs. I've attached the regex result below for you to see what I see.

haskell_regex_results.txt

Looking forward to the next commit.

@tippenein tippenein force-pushed the regex-and-indice-impl-and-benchmarks branch 3 times, most recently from 0bb0770 to 4e9ae86 Compare July 25, 2017 18:08
@tippenein tippenein force-pushed the regex-and-indice-impl-and-benchmarks branch from 4e9ae86 to 5cb62c1 Compare July 25, 2017 18:20
@tippenein
Contributor Author

tippenein commented Jul 25, 2017

The regex results were wrong because I was taking any Right result from the Either instead of checking the Maybe inside the Right. Fixed in 691d50f.
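
A minimal sketch of that class of bug, assuming the matcher returns `Either String (Maybe a)`: `Left` for an engine error, `Right Nothing` for no match, `Right (Just _)` for a match. This is not the code from 691d50f. `runMatcher` is a hypothetical stand-in built on `isInfixOf` so the example runs without regex-tdfa, and the sample lines are made up.

```haskell
import Data.List (isInfixOf)

-- | Hypothetical stand-in for a regex call with the assumed result shape.
runMatcher :: String -> String -> Either String (Maybe String)
runMatcher pat line
  | null pat = Left "empty pattern"
  | pat `isInfixOf` line = Right (Just pat)
  | otherwise = Right Nothing

-- | Buggy: every Right counts as a match, including Right Nothing,
-- so nearly every line is counted.
isMatchBuggy :: Either String (Maybe a) -> Bool
isMatchBuggy (Right _) = True
isMatchBuggy (Left _) = False

-- | Fixed: only Right (Just _) is a match.
isMatchFixed :: Either String (Maybe a) -> Bool
isMatchFixed (Right (Just _)) = True
isMatchFixed _ = False

main :: IO ()
main = do
  let results = map (runMatcher "knicks") ["go knicks", "nice weather", "knicks win"]
  print (length (filter isMatchBuggy results)) -- 3
  print (length (filter isMatchFixed results)) -- 2
```

Counting every `Right` would explain a total far above the true number of matches, which is consistent with the 119016 versus 258 discrepancy reported above.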

Sorting didn't have much effect on the time, but I've actually gotten a weaker CPU since the first time I ran this perf 😄

Files are identical and sorted

It's been ~1 year since I touched this, but I actually came back to this code recently for some processing I needed to do at work, so... here it is 👍

@dimroc dimroc merged commit 1c82057 into dimroc:master Mar 22, 2018
@dimroc
Owner

dimroc commented Mar 22, 2018

🎉
