defmodule NimbleParsec do
@moduledoc "README.md"
|> File.read!()
|> String.split("<!-- MDOC !-->")
|> Enum.fetch!(1)
defmacrop is_combinator(combinator) do
quote do
is_list(unquote(combinator))
end
end
@doc """
Defines a parser (and a combinator) with the given `name` and `opts`.
The parser is a function that receives two arguments, the binary
to be parsed and a set of options. You can consult the documentation
of the generated parser function for more information.
This function will also define a combinator that can be used as
`parsec(name)` when building other parsers. See `parsec/2` for
more information on invoking compiled combinators.
## Beware!
`defparsec/3` is executed during compilation. This means you can't
invoke a function defined in the same module. The following will error
because the `date` function has not yet been defined:
defmodule MyParser do
import NimbleParsec
def date do
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
end
defparsec :date, date()
end
This can be solved in different ways. You may simply
compose a long parser using variables. For example:
defmodule MyParser do
import NimbleParsec
date =
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
defparsec :date, date
end
Alternatively, you may define a `Helpers` module with many
convenience combinators, and then invoke them in your parser
module:
defmodule MyParser.Helpers do
import NimbleParsec
def date do
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
end
end
defmodule MyParser do
import NimbleParsec
import MyParser.Helpers
defparsec :date, date()
end
Using helper modules is the preferred way
of composing parsers in `NimbleParsec`.
## Options
* `:inline` - when true, inlines clauses that work as redirection for
other clauses. It is disabled by default because of a bug in Elixir
v1.5 and v1.6 where unused functions that are inlined cause a
compilation error
* `:debug` - when true, writes generated clauses to `:stderr` for debugging
"""
defmacro defparsec(name, combinator, opts \\ []) do
compile(:def, :defp, name, combinator, opts)
end
@doc """
Defines a private parser (and a combinator) with the given `name` and `opts`.
The same as `defparsec/3` but the parsing function is private.
"""
defmacro defparsecp(name, combinator, opts \\ []) do
compile(:defp, :defp, name, combinator, opts)
end
@doc """
Defines a combinator with the given `name` and `opts`.
It is similar to `defparsec/3` except it does not define
an entry-point parsing function, just the combinator function
to be used with `parsec/2`.
"""
defmacro defcombinator(name, combinator, opts \\ []) do
compile(nil, :def, name, combinator, opts)
end
@doc """
Defines a combinator with the given `name` and `opts`.
It is similar to `defparsecp/3` except it does not define
an entry-point parsing function, just the combinator function
to be used with `parsec/2`.
"""
defmacro defcombinatorp(name, combinator, opts \\ []) do
compile(nil, :defp, name, combinator, opts)
end
defp compile(parser_kind, combinator_kind, name, combinator, opts) do
combinator =
quote bind_quoted: [
parser_kind: parser_kind,
combinator_kind: combinator_kind,
name: name,
combinator: combinator,
opts: opts
] do
{defs, inline} = NimbleParsec.Compiler.compile(name, combinator, opts)
NimbleParsec.Recorder.record(
__MODULE__,
parser_kind,
combinator_kind,
name,
defs,
inline,
opts
)
if opts[:export_metadata] do
def __nimble_parsec__(unquote(name)),
do: unquote(combinator |> Enum.reverse() |> Macro.escape())
end
if inline != [] do
@compile {:inline, inline}
end
if combinator_kind == :def do
for {name, args, guards, body} <- defs do
def unquote(name)(unquote_splicing(args)) when unquote(guards), do: unquote(body)
end
else
for {name, args, guards, body} <- defs do
defp unquote(name)(unquote_splicing(args)) when unquote(guards), do: unquote(body)
end
end
end
parser = compile_parser(name, parser_kind)
quote do
unquote(parser)
unquote(combinator)
end
end
defp compile_parser(_name, nil) do
:ok
end
defp compile_parser(name, :def) do
quote bind_quoted: [name: name] do
{doc, spec, {name, args, guards, body}} = NimbleParsec.Compiler.entry_point(name)
Module.get_attribute(__MODULE__, :doc) || @doc doc
@spec unquote(spec)
def unquote(name)(unquote_splicing(args)) when unquote(guards), do: unquote(body)
end
end
defp compile_parser(name, :defp) do
quote bind_quoted: [name: name] do
{_doc, spec, {name, args, guards, body}} = NimbleParsec.Compiler.entry_point(name)
@spec unquote(spec)
defp unquote(name)(unquote_splicing(args)) when unquote(guards), do: unquote(body)
end
end
@opaque t :: [combinator]
@type bin_modifier :: :integer | :utf8 | :utf16 | :utf32
@type range :: inclusive_range | exclusive_range
@type inclusive_range :: Range.t() | char
@type exclusive_range :: {:not, Range.t()} | {:not, char}
@type min_and_max :: {:min, non_neg_integer} | {:max, pos_integer}
@type call :: mfargs | fargs | atom
@type mfargs :: {module, atom, args :: [term]}
@type fargs :: {atom, args :: [term]}
@type gen_times :: Range.t() | non_neg_integer | nil
@type gen_weights :: [pos_integer] | nil
@type opts :: Keyword.t()
# Steps to add a new combinator:
#
# 1. Update the combinator type below
# 2. Update the compiler with combinator
# 3. Update the compiler with label step
#
@typep combinator :: bound_combinator | maybe_bound_combinator | unbound_combinator
@typep bound_combinator ::
{:bin_segment, [inclusive_range], [exclusive_range], bin_modifier}
| {:string, binary}
| :eos
@typep maybe_bound_combinator ::
{:label, t, binary}
| {:traverse, t, :pre | :post | :constant, [mfargs]}
@typep unbound_combinator ::
{:choice, [t], gen_weights}
| {:eventually, t}
| {:lookahead, t, :positive | :negative}
| {:parsec, atom | {module, atom}}
| {:repeat, t, mfargs, gen_times}
| {:times, t, pos_integer}
@doc ~S"""
Generate a random binary from the given parsec.
Let's see an example:
import NimbleParsec
generate(choice([string("foo"), string("bar")]))
The command above will return either "foo" or "bar". `generate/1`
is often used with pre-defined parsecs. In this case, the
`:export_metadata` flag must be set:
defmodule SomeModule do
import NimbleParsec
defparsec :parse,
choice([string("foo"), string("bar")]),
export_metadata: true
end
# Reference the parsec and generate from it
NimbleParsec.parsec({SomeModule, :parse})
|> NimbleParsec.generate()
|> IO.puts()
`generate/1` can run forever on recursive parsecs.
Read the notes below and use the `:gen_weights` and `:gen_times`
options of certain parsecs to control the recursion depth.
## Notes
This feature is currently experimental and may change in many ways.
Overall, there is no guarantee over the generated output, except
that it will generate a binary that is parseable by the parsec
itself, but even this guarantee may be broken by parsers that have
custom validations. Keep in mind the following:
* `generate/1` is not compatible with parsers dumped via
`mix nimble_parsec.compile`;
* `parsec/2` requires the referenced parsec to set `export_metadata: true`
on its definition;
* `choice/2` will be generated evenly. You can pass `:gen_weights`
as a list of positive integer weights to balance your choices.
This is particularly important for recursive algorithms;
* `repeat/2` and `repeat_while/3` will repeat between 0 and 3 times unless
a `:gen_times` option is given to these operations. `times/3` without a `:max`
will also repeat between 0 and 3 additional times unless `:gen_times` is given.
The `:gen_times` option can be either an integer, giving the number of times to
repeat, or a range from which a random value is picked;
* `eventually/2` always generates the eventually parsec immediately;
* `lookahead/2` and `lookahead_not/2` are simply discarded;
* Validations done in any of the traverse definitions are not taken into account
by the generator. Therefore, if a parsec does validations, the generator may
generate binaries invalid to said parsec;
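
As a minimal sketch of the `:gen_weights` note above (the variable name
`weighted` is illustrative, and `choice/3` is called with an explicit
`empty()` combinator only to keep the keyword options unambiguous),
a weighted choice could be generated like this:

```elixir
import NimbleParsec

# Assumption for illustration: weights are listed in the same order
# as the choices, so "foo" has weight 1 and "bar" has weight 3.
weighted =
  choice(empty(), [string("foo"), string("bar")], gen_weights: [1, 3])

# Returns either "foo" or "bar"; with these weights "bar" should be
# picked roughly three times as often as "foo".
generate(weighted)
```

The output is random, so only its membership in the set of choices can be relied upon.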
"""
def generate(parsecs) do
parsecs
|> Enum.reverse()
|> generate(nil, [])
|> elem(0)
|> IO.iodata_to_binary()
end
defp generate([{:parsec, fun} | _parsecs], nil, _acc) when is_atom(fun) do
raise "cannot generate parsec(#{inspect(fun)}), use a remote parsec instead"
end
defp generate([{:parsec, fun} | parsecs], mod, acc) when is_atom(fun) do
generate([{:parsec, {mod, fun}} | parsecs], mod, acc)
end
defp generate([{:parsec, {mod, fun}} | outer_parsecs], outer_mod, acc) do
{gen, _} = generate(gen_export(mod, fun), mod, [])
generate(outer_parsecs, outer_mod, [gen | acc])
end
defp generate([{:string, string} | parsecs], mod, acc) do
generate(parsecs, mod, [string | acc])
end
defp generate([{:bin_segment, inclusive, exclusive, modifier} | parsecs], mod, acc) do
gen = gen_bin_segment(inclusive, exclusive)
gen =
if modifier == :integer,
do: gen,
else: :unicode.characters_to_binary([gen], :unicode, modifier)
generate(parsecs, mod, [gen | acc])
end
defp generate([:eos | parsecs], mod, acc) do
if parsecs == [] do
generate([], mod, acc)
else
raise ArgumentError, "found :eos not at the end of parsecs"
end
end
defp generate([{:traverse, t, _, _} | parsecs], mod, acc) do
generate(t ++ parsecs, mod, acc)
end
defp generate([{:label, t, _} | parsecs], mod, acc) do
generate(t ++ parsecs, mod, acc)
end
defp generate([{:choice, choices, weights} | parsecs], mod, acc) do
pick = if weights, do: weighted_random(choices, weights), else: list_random(choices)
{gen, _aborted?} = generate(pick, mod, [])
generate(parsecs, mod, [gen | acc])
end
defp generate([{:lookahead, _, _} | parsecs], mod, acc) do
generate(parsecs, mod, acc)
end
defp generate([{:repeat, t, _, gen} | parsecs], mod, acc) do
generate(parsecs, mod, gen_times(t, int_random(gen), mod, acc))
end
defp generate([{:times, t, max} | parsecs], mod, acc) do
generate(parsecs, mod, gen_times(t, Enum.random(0..max), mod, acc))
end
defp generate([], _mod, acc), do: {Enum.reverse(acc), false}
defp gen_export(mod, fun) do
unless Code.ensure_loaded?(mod) do
raise "cannot handle parsec(#{inspect({mod, fun})}) because #{inspect(mod)} is not available"
end
try do
mod.__nimble_parsec__(fun)
rescue
_ ->
raise "cannot handle parsec(#{inspect({mod, fun})}) because #{inspect(mod)} " <>
"did not set :export_metadata when defining #{fun}"
end
end
defp gen_times(_t, 0, _mod, acc), do: acc
defp gen_times(t, n, mod, acc) do
case generate(t, mod, []) do
{gen, true} -> [gen | acc]
{gen, false} -> gen_times(t, n - 1, mod, [gen | acc])
end
end
defp gen_bin_segment(inclusive, exclusive) do
gen =
if(inclusive == [], do: [0..255], else: inclusive)
|> list_random()
|> int_random()
if Enum.any?(exclusive, &exclude_bin_segment?(&1, gen)) do
gen_bin_segment(inclusive, exclusive)
else
gen
end
end
defp exclude_bin_segment?({:not, min..max}, gen), do: gen >= min and gen <= max
defp exclude_bin_segment?({:not, char}, gen) when is_integer(char), do: char == gen
defp int_random(nil), do: Enum.random(0..3)
defp int_random(_.._ = range), do: Enum.random(range)
defp int_random(int) when is_integer(int), do: int
# Enum.random uses reservoir sampling but our lists are short, so we use length + fetch!
defp list_random(list) when is_list(list),
do: Enum.fetch!(list, :rand.uniform(length(list)) - 1)
defp weighted_random(list, weights) do
weighted_random(list, weights, :rand.uniform(Enum.sum(weights)))
end
defp weighted_random([elem | _], [weight | _], chosen) when chosen <= weight,
do: elem
defp weighted_random([_ | list], [weight | weights], chosen),
do: weighted_random(list, weights, chosen - weight)
@doc ~S"""
Returns an empty combinator.
An empty combinator cannot be compiled on its own.
"""
@spec empty() :: t
def empty() do
[]
end
@doc """
Invokes an already compiled combinator with name `name` in the
same module.
Every parser defined via `defparsec/3` or `defparsecp/3` can be
used as combinators. However, the `defparsec/3` and `defparsecp/3`
functions also define an entry-point parsing function, as implied
by their names. If you want to define a combinator with the sole
purpose of using it in other combinators, use `defcombinatorp/3` instead.
## Use cases
`parsec/2` is useful to implement recursive definitions.
Note that while `parsec/2` can be used to compose smaller combinators,
the preferred mechanism for doing composition is via regular functions
and not via `parsec/2`. Let's see a practical example. Imagine
that you have this module:
defmodule MyParser do
import NimbleParsec
date =
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
time =
integer(2)
|> ignore(string(":"))
|> integer(2)
|> ignore(string(":"))
|> integer(2)
|> optional(string("Z"))
defparsec :datetime, date |> ignore(string("T")) |> concat(time), debug: true
end
Now imagine that you want to break `date` and `time` apart
into helper functions, as you use them on other occasions.
Generally speaking, you should **NOT** do this:
defmodule MyParser do
import NimbleParsec
defcombinatorp :date,
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
defcombinatorp :time,
integer(2)
|> ignore(string(":"))
|> integer(2)
|> ignore(string(":"))
|> integer(2)
|> optional(string("Z"))
defparsec :datetime,
parsec(:date) |> ignore(string("T")) |> concat(parsec(:time))
end
The above is not recommended because each
`parsec/2` combinator ends up adding a stacktrace entry during
parsing, which limits the ability of `NimbleParsec` to optimize
code. If the goal is to compose combinators, you can do so
with modules and functions:
defmodule MyParser.Helpers do
import NimbleParsec
def date do
integer(4)
|> ignore(string("-"))
|> integer(2)
|> ignore(string("-"))
|> integer(2)
end
def time do
integer(2)
|> ignore(string(":"))
|> integer(2)
|> ignore(string(":"))
|> integer(2)
|> optional(string("Z"))
end
end
defmodule MyParser do
import NimbleParsec
import MyParser.Helpers
defparsec :datetime,
date() |> ignore(string("T")) |> concat(time())
end
The implementation above will compile to the most
efficient format possible without forcing new stacktrace
entries.
The only situation where you should use `parsec/2` for composition
is when a large parser is reused so often that compilation times
become high. In that case, you can use `parsec/2` to improve
compilation time at the cost of runtime performance:
the tree built at compile time will be smaller, but each `parsec`
introduces a stacktrace entry at runtime.
## Remote combinators
You can also reference combinators in other modules by passing
a tuple with the module name and a function to `parsec/2` as follows:
defmodule RemoteCombinatorModule do
defcombinator :upcase_unicode, utf8_char([...long, list, of, unicode, chars...])
end
defmodule LocalModule do
# Parsec that depends on `:upcase_unicode`
defparsec :parsec_name,
...
|> ascii_char([?a..?z])
|> parsec({RemoteCombinatorModule, :upcase_unicode})
end
Remote combinators are useful when breaking the compilation of
large modules apart in order to use Elixir's ability to compile
modules in parallel.
## Examples
A good example of using `parsec` is with recursive parsers.
A limited but recursive XML parser could be written as follows:
defmodule SimpleXML do
import NimbleParsec
tag = ascii_string([?a..?z, ?A..?Z], min: 1)
text = ascii_string([not: ?<], min: 1)
opening_tag =
ignore(string("<"))
|> concat(tag)
|> ignore(string(">"))
closing_tag =
ignore(string("</"))
|> concat(tag)
|> ignore(string(">"))
defparsec :xml,
opening_tag
|> repeat(lookahead_not(string("</")) |> choice([parsec(:xml), text]))
|> concat(closing_tag)
|> wrap()
end
SimpleXML.xml("<foo>bar</foo>")
#=> {:ok, [["foo", "bar", "foo"]], "", %{}, {1, 0}, 14}
In the example above, `defparsec/3` has defined the entry-point
parsing function as well as a combinator which we have invoked
with `parsec(:xml)`.
In many cases, however, you want to define recursive combinators
without the entry-point parsing function. We can do this by
replacing `defparsec/3` by `defcombinatorp`:
defcombinatorp :xml,
opening_tag
|> repeat(lookahead_not(string("</")) |> choice([parsec(:xml), text]))
|> concat(closing_tag)
|> wrap()
When using `defcombinatorp`, you can no longer invoke
`SimpleXML.xml(xml)` as there is no associated parsing function.
You can only access the combinator above via `parsec/2`.
"""
@spec parsec(name :: atom) :: t
@spec parsec(t, name :: atom) :: t
@spec parsec({module, function_name :: atom}) :: t
@spec parsec(t, {module, function_name :: atom}) :: t
def parsec(combinator \\ empty(), name)
def parsec(combinator, name) when is_combinator(combinator) and is_atom(name) do
[{:parsec, name} | combinator]
end
def parsec(combinator, {module, function})
when is_combinator(combinator) and is_atom(module) and is_atom(function) do
[{:parsec, {module, function}} | combinator]
end
@doc ~S"""
Defines a single ASCII codepoint in the given ranges.
`ranges` is a list containing one of:
* a `min..max` range expressing supported codepoints
* a `codepoint` integer expressing a supported codepoint
* `{:not, min..max}` expressing not supported codepoints
* `{:not, codepoint}` expressing a not supported codepoint
## Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_lowercase,
empty()
|> ascii_char([?0..?9])
|> ascii_char([?a..?z])
end
MyParser.digit_and_lowercase("1a")
#=> {:ok, [?1, ?a], "", %{}, {1, 0}, 2}
MyParser.digit_and_lowercase("a1")
#=> {:error, "expected ASCII character in the range '0' to '9', followed by ASCII character in the range 'a' to 'z'", "a1", %{}, {1, 0}, 0}
"""
@spec ascii_char([range]) :: t
@spec ascii_char(t, [range]) :: t
def ascii_char(combinator \\ empty(), ranges)
when is_combinator(combinator) and is_list(ranges) do
{inclusive, exclusive} = split_ranges!(ranges, "ascii_char")
bin_segment(combinator, inclusive, exclusive, :integer)
end
@doc ~S"""
Defines a single UTF-8 codepoint in the given ranges.
`ranges` is a list containing one of:
* a `min..max` range expressing supported codepoints
* a `codepoint` integer expressing a supported codepoint
* `{:not, min..max}` expressing not supported codepoints
* `{:not, codepoint}` expressing a not supported codepoint
## Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_utf8,
empty()
|> utf8_char([?0..?9])
|> utf8_char([])
end
MyParser.digit_and_utf8("1é")
#=> {:ok, [?1, ?é], "", %{}, {1, 0}, 3}
MyParser.digit_and_utf8("a1")
#=> {:error, "expected utf8 codepoint in the range '0' to '9', followed by utf8 codepoint", "a1", %{}, {1, 0}, 0}
"""
@spec utf8_char([range]) :: t
@spec utf8_char(t, [range]) :: t
def utf8_char(combinator \\ empty(), ranges)
when is_combinator(combinator) and is_list(ranges) do
{inclusive, exclusive} = split_ranges!(ranges, "utf8_char")
bin_segment(combinator, inclusive, exclusive, :utf8)
end
@doc ~S"""
Adds a label to the combinator to be used in error reports.
## Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_and_lowercase,
empty()
|> ascii_char([?0..?9])
|> ascii_char([?a..?z])
|> label("digit followed by lowercase letter")
end
MyParser.digit_and_lowercase("1a")
#=> {:ok, [?1, ?a], "", %{}, {1, 0}, 2}
MyParser.digit_and_lowercase("a1")
#=> {:error, "expected digit followed by lowercase letter", "a1", %{}, {1, 0}, 0}
"""
@spec label(t, String.t()) :: t
@spec label(t, t, String.t()) :: t
def label(combinator \\ empty(), to_label, label)
when is_combinator(combinator) and is_combinator(to_label) and is_binary(label) do
non_empty!(to_label, "label")
[{:label, Enum.reverse(to_label), label} | combinator]
end
@doc ~S"""
Defines an integer combinator with an exact length or `min` and `max`
length.
If you want an integer of unknown size, use `integer(min: 1)`.
This combinator does not parse the sign and is always on base 10.
## Examples
With exact length:
defmodule MyParser do
import NimbleParsec
defparsec :two_digits_integer, integer(2)
end
MyParser.two_digits_integer("123")
#=> {:ok, [12], "3", %{}, {1, 0}, 2}
MyParser.two_digits_integer("1a3")
#=> {:error, "expected ASCII character in the range '0' to '9', followed by ASCII character in the range '0' to '9'", "1a3", %{}, {1, 0}, 0}
With min and max:
defmodule MyParser do
import NimbleParsec
defparsec :two_digits_integer, integer(min: 2, max: 4)
end
MyParser.two_digits_integer("123")
#=> {:ok, [123], "", %{}, {1, 0}, 3}
MyParser.two_digits_integer("1a3")
#=> {:error, "expected ASCII character in the range '0' to '9', followed by ASCII character in the range '0' to '9'", "1a3", %{}, {1, 0}, 0}
If the size of the integer has a min and max close to each other, such as
from 2 to 4 or from 1 to 2, using choice may emit more efficient code:
choice([integer(4), integer(3), integer(2)])
Note you should start from bigger to smaller.
"""
@spec integer(pos_integer | [min_and_max]) :: t
@spec integer(t, pos_integer | [min_and_max]) :: t
def integer(combinator \\ empty(), count_or_opts)
when is_combinator(combinator) and (is_integer(count_or_opts) or is_list(count_or_opts)) do
validate_min_and_max!(count_or_opts, 1)
min_max_compile_runtime_chars(
combinator,
ascii_char([?0..?9]),
count_or_opts,
:__compile_integer__,
:__runtime_integer__,
[]
)
end
@doc ~S"""
Defines an ASCII string combinator with an exact length or `min` and `max`
length.
The `ranges` specify the allowed characters in the ASCII string.
See `ascii_char/2` for more information.
If you want a string of unknown size, use `ascii_string(ranges, min: 1)`.
If you want a literal string, use `string/2`.
## Examples
defmodule MyParser do
import NimbleParsec
defparsec :two_lowercase_letters, ascii_string([?a..?z], 2)
end
MyParser.two_lowercase_letters("abc")
#=> {:ok, ["ab"], "c", %{}, {1, 0}, 2}
"""
@spec ascii_string([range], pos_integer | [min_and_max]) :: t
@spec ascii_string(t, [range], pos_integer | [min_and_max]) :: t
def ascii_string(combinator \\ empty(), range, count_or_opts)
when is_combinator(combinator) and is_list(range) and
(is_integer(count_or_opts) or is_list(count_or_opts)) do
min_max_compile_runtime_chars(
combinator,
ascii_char(range),
count_or_opts,
:__compile_string__,
:__runtime_string__,
[quote(do: integer)]
)
end
@doc ~S"""
Defines a UTF-8 string combinator with an exact length or `min` and `max`
codepoint length.
The `ranges` specify the allowed characters in the UTF8 string.
See `utf8_char/2` for more information.
If you want a string of unknown size, use `utf8_string(ranges, min: 1)`.
If you want a literal string, use `string/2`.
Note that the combinator matches on codepoints, not graphemes. Therefore
results may vary depending on whether the input is in `nfc` or `nfd`
normalized form.
## Examples
defmodule MyParser do
import NimbleParsec
defparsec :two_letters, utf8_string([], 2)
end
MyParser.two_letters("áé")
#=> {:ok, ["áé"], "", %{}, {1, 0}, 4}
"""
@spec utf8_string([range], pos_integer | [min_and_max]) :: t
@spec utf8_string(t, [range], pos_integer | [min_and_max]) :: t
def utf8_string(combinator \\ empty(), range, count_or_opts)
when is_combinator(combinator) and is_list(range) and
(is_integer(count_or_opts) or is_list(count_or_opts)) do
min_max_compile_runtime_chars(
combinator,
utf8_char(range),
count_or_opts,
:__compile_string__,
:__runtime_string__,
[quote(do: utf8)]
)
end
@doc ~S"""
Defines an end of string combinator.
The end of string does not produce a token and can be parsed multiple times.
This function is useful to avoid having to check for an empty remainder after
a successful parse.
## Examples
defmodule MyParser do
import NimbleParsec
defparsec :letter_pairs, utf8_string([], 2) |> repeat() |> eos()
end
MyParser.letter_pairs("hi")
#=> {:ok, ["hi"], "", %{}, {1, 0}, 2}
MyParser.letter_pairs("hello")
#=> {:error, "expected end of string", "o", %{}, {1, 0}, 4}
"""
@spec eos :: t
@spec eos(t) :: t
def eos(combinator \\ empty()) do
[:eos | combinator]
end
@doc ~S"""
Concatenates two combinators.
## Examples
defmodule MyParser do
import NimbleParsec
defparsec :digit_upper_lower_plus,
concat(
concat(ascii_char([?0..?9]), ascii_char([?A..?Z])),
concat(ascii_char([?a..?z]), ascii_char([?+..?+]))
)
end
MyParser.digit_upper_lower_plus("1Az+")
#=> {:ok, [?1, ?A, ?z, ?+], "", %{}, {1, 0}, 4}
"""
@spec concat(t, t) :: t
def concat(left, right) when is_combinator(left) and is_combinator(right) do
right ++ left
end
@doc """
Duplicates the combinator `to_duplicate` `n` times.
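
## Examples

A minimal sketch of how `duplicate/3` might be used; the module and
parser names below are illustrative only:

```elixir
defmodule MyParser do
  import NimbleParsec

  # Equivalent to chaining ascii_char([?0..?9]) three times.
  defparsec :three_digits, duplicate(ascii_char([?0..?9]), 3)
end

MyParser.three_digits("1234")
#=> {:ok, [?1, ?2, ?3], "4", %{}, {1, 0}, 3}
```

Because the combinator is repeated at compile time, `n` must be a
literal or otherwise known when the parser is defined.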
"""
@spec duplicate(t, non_neg_integer) :: t
@spec duplicate(t, t, non_neg_integer) :: t
def duplicate(combinator \\ empty(), to_duplicate, n)
def duplicate(combinator, to_duplicate, 0)
when is_combinator(combinator) and is_combinator(to_duplicate) do
combinator
end
def duplicate(combinator, to_duplicate, n)
when is_combinator(combinator) and is_combinator(to_duplicate) and is_integer(n) and n >= 1 do
Enum.reduce(1..n, combinator, fn _, acc -> to_duplicate ++ acc end)
end
@doc """
Puts the result of the given combinator as the first element
of a tuple with the `byte_offset` as second element.
`byte_offset` is a non-negative integer.
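
## Examples

A minimal sketch, assuming a hypothetical `:word` parser, of wrapping
a combinator with `byte_offset/2`:

```elixir
defmodule MyParser do
  import NimbleParsec

  # Pairs the results of the lowercase word with the byte offset
  # reached once the word has been parsed.
  defparsec :word, byte_offset(ascii_string([?a..?z], min: 1))
end

# The single result is a tuple holding the word's results and the offset.
MyParser.word("abc")
```

The offset is measured in bytes, so multi-byte UTF-8 codepoints advance it
by more than one.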
"""
@spec byte_offset(t) :: t
@spec byte_offset(t, t) :: t
def byte_offset(combinator \\ empty(), to_wrap)
when is_combinator(combinator) and is_combinator(to_wrap) do
quoted_post_traverse(combinator, to_wrap, {__MODULE__, :__byte_offset__, []})
end
@doc """
Puts the result of the given combinator as the first element
of a tuple with the `line` as second element.
`line` is a tuple where the first element is the current line
and the second element is the byte offset immediately after
the newline.
"""
@spec line(t) :: t
@spec line(t, t) :: t
def line(combinator \\ empty(), to_wrap)
when is_combinator(combinator) and is_combinator(to_wrap) do
quoted_post_traverse(combinator, to_wrap, {__MODULE__, :__line__, []})
end
@doc ~S"""
Traverses the combinator results with the remote or local function `call`.
`call` is either a `{module, function, args}` representing
a remote call, a `{function, args}` representing a local call
or an atom `function` representing `{function, []}`.
The function given in `call` will receive 5 additional arguments.
The rest of the parsed binary, the parser results to be post_traversed,
the parser context, the current line and the current offset will
be prepended to the given `args`. The `args` will be injected at
the compile site and therefore must be escapable via `Macro.escape/1`.
The line and offset will represent the location after the combinators.
To retrieve the position before the combinators, use `pre_traverse/3`.
The `call` must return a tuple `{rest, acc, context}` with the rest of
the binary as first element, the list of results to be added to the
accumulator as second element, and the context as third element. It may
also return `{:error, reason}` to stop processing. Notice the received
results are in reverse order and must be returned in reverse order too.
The number of elements returned does not need to be
the same as the number of elements given.
This is a low-level function for changing the parsed result.
On top of this function, other functions are built, such as