> {-# LANGUAGE OverloadedStrings #-}

We use Criterion to run micro benchmarks that match different regular
expressions against sample strings, both as full matches and as
partial matches.

> import Text.RegExp
> import Text.RegExp.Matching.Leftmost as Leftmost
> import Text.RegExp.Matching.Longest as Longest
> import Text.RegExp.Matching.LeftLong as LeftLong
>
> import Criterion.Main
>
> main :: IO ()
> main = defaultMain
> [ bgroup "full"
> [ bgroup mode
> [ bench name $ call re str
> | (name, re, str) <-
> [ ("phone", phone're, phone'str)
> , ("html" , html're , html'str)
> ]
> ]
> | (mode, call) <-
> [ ("accept", whnf . acceptFull)
> , ("count" , whnf . (matchingCount :: RegExp Char -> String -> Int))
> ]
> ]
> , bgroup "partial"
> [ bgroup mode
> [ bench name $ call re str
> | (name, re, str) <-
> [ ("rna", rna're, rna'str)
> ]
> ]
> | (mode, call) <-
> [ ("accept" , whnf . acceptPartial)
> , ("leftmost", whnf . Leftmost.matching)
> , ("longest" , whnf . Longest.matching)
> , ("leftlong", whnf . LeftLong.matching)
> ]
> ]
> ]
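
The nested `bgroup`s in `main` expand into eight individual
benchmarks: two modes times two inputs under `full`, and four modes
times one input under `partial`. The following sketch models that
expansion in plain Haskell. The `Tree` type and the `suite` and
`names` functions are hypothetical stand-ins, not Criterion's
`Benchmark` type. Only the nesting of group names is modelled, joined
with `/` as Criterion does when it reports nested groups.

```haskell
-- Hypothetical stand-in for Criterion's Benchmark type: it records only
-- group and benchmark names, not the functions being measured.
data Tree = Group String [Tree] | Leaf String

-- Same nesting as `main`: group, then mode, then input name.
suite :: [Tree]
suite =
  [ Group "full"
      [ Group mode [Leaf name | name <- ["phone", "html"]]
      | mode <- ["accept", "count"]
      ]
  , Group "partial"
      [ Group mode [Leaf name | name <- ["rna"]]
      | mode <- ["accept", "leftmost", "longest", "leftlong"]
      ]
  ]

-- Flatten the tree into slash-separated benchmark names.
names :: [Tree] -> [String]
names = concatMap (go "")
  where
    go prefix (Leaf n) = [prefix ++ n]
    go prefix (Group g ts) = concatMap (go (prefix ++ g ++ "/")) ts
```

In GHCi, `names suite` lists the names from `"full/accept/phone"` to
`"partial/leftlong/rna"`. Criterion's `defaultMain` accepts such names
(or prefixes such as `partial`) as command-line arguments, which is
convenient for running a single group.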

The following regular expression for phone numbers matches strings
like the phone number below in exactly one way.

> phone're :: RegExp Char
> phone're = "[0-9]+(-[0-9]+)*"
>
> phone'str :: String
> phone'str = "0431-880-7267"
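
To see why the match is unique, consider a small list-of-successes
matcher for the same expression, written by hand in plain Haskell
rather than with the `weighted-regexp` library; `digitsPlus`,
`dashGroups`, and `phoneWays` are hypothetical names. Each parser
returns the remaining input for every way it can consume a prefix, so
counting empty remainders counts the distinct matches. Because every
repetition of `(-[0-9]+)` must begin with `-`, each `[0-9]+` can only
stop right before a `-` or at the end of the string, which leaves a
single parse.

```haskell
import Data.Char (isDigit)

-- [0-9]+ : consume one or more digits; one result per possible length.
digitsPlus :: String -> [String]
digitsPlus s = [drop k s | k <- [1 .. length (takeWhile isDigit s)]]

-- (-[0-9]+)* : zero or more repetitions of a dash followed by digits.
dashGroups :: String -> [String]
dashGroups s =
  s : [r' | '-' : t <- [s], r <- digitsPlus t, r' <- dashGroups r]

-- Number of ways [0-9]+(-[0-9]+)* matches the entire string.
phoneWays :: String -> Int
phoneWays s = length [() | r <- digitsPlus s, r' <- dashGroups r, null r']
```

With this sketch, `phoneWays "0431-880-7267"` evaluates to `1`, and a
malformed number such as `"0431--880"` yields `0`.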

As an example of an ambiguous match, we use the following regular
expression, which resembles HTML documents.

> html're :: RegExp Char
> html're = "(<\\w*>.*</\\w*>)*"
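
This expression matches the string below in two different ways: as a
single element whose `.*` spans `some</p><p>text`, or as the two
consecutive elements `<p>some</p>` and `<p>text</p>`.
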
> html'str :: String
> html'str = "<p>some</p><p>text</p>"
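
The two parses can be counted with a brute-force sketch in plain
Haskell. It is again independent of `weighted-regexp`; `isWordChar`,
`segmentWays`, and `starWays` are hypothetical names, and `\w` is
assumed to mean letters, digits, and underscore. `segmentWays` counts
the ways one `<\w*>.*</\w*>` element matches a whole string, and
`starWays` counts the ways to split a string into a sequence of such
elements.

```haskell
import Data.Char (isAlphaNum)

-- Assumed meaning of \w: letters, digits, and underscore.
isWordChar :: Char -> Bool
isWordChar c = isAlphaNum c || c == '_'

-- Ways a string matches <\w*>.*</\w*> as a whole. Since \w* cannot
-- contain '>', the opening tag is determined uniquely; only the extent
-- of .* can vary, so try every length k for .* and check that the rest
-- is exactly a closing tag.
segmentWays :: String -> Int
segmentWays ('<' : s) =
  case span isWordChar s of
    (_, '>' : rest) -> length [k | k <- [0 .. length rest], closing (drop k rest)]
    _ -> 0
  where
    closing ('<' : '/' : u) = case span isWordChar u of
      (_, ">") -> True
      _ -> False
    closing _ = False
segmentWays _ = 0

-- Ways a string matches (<\w*>.*</\w*>)*: choose a non-empty first
-- element and recurse on the remainder. The empty string matches once.
starWays :: String -> Int
starWays [] = 1
starWays s =
  sum [segmentWays (take k s) * starWays (drop k s) | k <- [1 .. length s]]
```

`starWays "<p>some</p><p>text</p>"` evaluates to `2`, one count for the
single-element reading and one for the two-element reading.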

To benchmark partial matching, we search for a protein-coding sequence
in an RNA sequence. Such sequences start with the start codon `AUG`,
continue with codons (triplets) built from the bases adenine (`A`),
cytosine (`C`), guanine (`G`), and uracil (`U`), and end with one of
the stop codons `UAG`, `UGA`, or `UAA`.

> rna're :: RegExp Char
> rna're = "AUG([ACGU][ACGU][ACGU])*(UAG|UGA|UAA)"
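
Every part of this expression has a length that is a multiple of
three, so a full match can be checked directly: the string starts with
`AUG`, ends with a stop codon, uses only the four bases, and has a
length of the form 6 + 3k. The following plain-Haskell sketch encodes
exactly these conditions; the name `isCodingSeq` is hypothetical and
not part of the benchmarked library.

```haskell
-- Does the entire string match AUG([ACGU][ACGU][ACGU])*(UAG|UGA|UAA)?
-- Start codon (3) + k codons (3k) + stop codon (3) gives length 6 + 3k.
isCodingSeq :: String -> Bool
isCodingSeq s =
  n >= 6
    && n `mod` 3 == 0
    && take 3 s == "AUG"
    && drop (n - 3) s `elem` ["UAG", "UGA", "UAA"]
    && all (`elem` "ACGU") s
  where
    n = length s
```

For instance, `isCodingSeq "AUGUAA"` is `True`, while
`isCodingSeq "AUGACU"` is `False` because `ACU` is not a stop codon.
A stop codon may also occur in the middle, since the expression
allows any triplet there.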
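
For example, the following RNA sequence contains the protein-coding
sequence `AUGACACUUGAAUGA`, that is, the start codon `AUG`, the codons
`ACA`, `CUU`, and `GAA`, and the stop codon `UGA`.
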
> rna'str :: String
> rna'str = "UUACGGAUGACACUUGAAUGACUGA"
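
As a naive reference for what a partial match finds (not the
algorithm used by `acceptPartial` or by the `Leftmost`, `Longest`, and
`LeftLong` matchers), the sketch below tries every start position and
every admissible length. It repeats `isCodingSeq` so that it is
self-contained; both `isCodingSeq` and `codingMatches` are
hypothetical names.

```haskell
import Data.List (tails)

-- Full match of AUG([ACGU][ACGU][ACGU])*(UAG|UGA|UAA); lengths are 6 + 3k.
isCodingSeq :: String -> Bool
isCodingSeq s =
  n >= 6
    && n `mod` 3 == 0
    && take 3 s == "AUG"
    && drop (n - 3) s `elem` ["UAG", "UGA", "UAA"]
    && all (`elem` "ACGU") s
  where
    n = length s

-- All (0-based start index, length) pairs whose substring matches.
-- Quadratic brute force, meant only for illustration.
codingMatches :: String -> [(Int, Int)]
codingMatches s =
  [ (i, len)
  | (i, suffix) <- zip [0 ..] (tails s)
  , len <- [6, 9 .. length suffix]
  , isCodingSeq (take len suffix)
  ]
```

`codingMatches "UUACGGAUGACACUUGAAUGACUGA"` returns `[(6,15)]`: the
only match is `AUGACACUUGAAUGA`. The second `AUG` in the string, at
index 17, is not followed in frame by a stop codon.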