Merge pull request #4 from bishabosha/split-logic-extract
split logic for extraction
bishabosha committed Feb 10, 2024
2 parents a41346a + eeabb95 commit ff19a3f
Showing 5 changed files with 268 additions and 105 deletions.
91 changes: 91 additions & 0 deletions _docs/index.md
@@ -1,16 +1,107 @@
# Enhanced String Interpolator

_You can read some background motivation for this project in my [blog post](https://bishabosha.github.io/articles/simple-parsing-with-strings.html)._

## Basic usage

Currently all functionality is accessed with one import:
```scala
import stringmatching.regex.Interpolators.r
```

### Example 1

As a simple example, we can parse some basic text representing a sequence of integers, delimited by `"["` and `"]"` and separated by `", "`.
With a single pattern, you can extract a typed value `xs: IndexedSeq[Int]` as follows:

```scala sc:nocompile
"[23, 56, 71]" match
  case r"[${r"$xs%d"}...(, )]" => xs.sum // 150
```

To get an intuition for how `r` works: it behaves like the [simple string matcher](https://www.scala-lang.org/api/3.3.1/scala/StringContext$s$.html#unapplySeq-fffffd22) from the standard library (i.e. `case s"$foo $bar"`), except that after each splice you can append an optional format string, such as `%d`.
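For comparison, here is a rough standard-library-only sketch of what Example 1 checks, written with the `s` matcher. It assumes the split and `%d` semantics described under Possible Formats below, including the assumption that the nested `%d` fails the whole match if any element is not an integer; `sumBracketed` is a hypothetical name and this is not the library's implementation.

```scala
// Hypothetical stdlib-only equivalent of:
//   case r"[${r"$xs%d"}...(, )]" => xs.sum
def sumBracketed(str: String): Option[Int] = str match
  case s"[$inner]" =>
    // `...(, )`: split the captured text on the regex ", "
    val parts: IndexedSeq[String] = inner.split(", ").toIndexedSeq
    // nested `%d` (assumed): every element must parse as an Int, otherwise no match
    val ints: IndexedSeq[Option[Int]] = parts.map(_.toIntOption)
    if ints.forall(_.isDefined) then Some(ints.flatten.sum) else None
  case _ => None
```

Under those assumptions, `sumBracketed("[23, 56, 71]")` gives `Some(150)`, matching the comment in Example 1, while `sumBracketed("[1, x]")` gives `None`.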

### Example 2

The format pattern is stripped before matching against the string, so it only acts as a directive for how to interpret the splice.

To illustrate, take the following example:

```scala sc:nocompile
"age: 23, year: 2019" match
  case r"age: $n%d, year: $y%d" => n + y
```

The two `%d` format strings are removed before matching, and tell the interpolator to treat `n` and `y` as integer patterns. This means that the behavior of the above snippet is equivalent to the following:

```scala sc:nocompile
"age: 23, year: 2019" match
  case str @ s"age: $n0, year: $y0" =>
    (n0.toIntOption, y0.toIntOption) match
      case (Some(n), Some(y)) => n + y
      case _ => throw MatchError(str)
```
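The stripping step can be pictured with a small helper that splits the literal part following a splice into an optional directive and the text that is actually matched. `Directive` and `stripFormat` are hypothetical names, and the sketch assumes a directive is always the two characters right after the splice (as in `$n%d`), using the four formats listed under Possible Formats; the library's own parser may differ.

```scala
// Hypothetical model of the format directives listed in this document.
enum Directive:
  case AsInt, AsLong, AsFloat, AsDouble

// Splits the literal part that follows a splice into an optional
// directive and the remaining literal text that is actually matched.
def stripFormat(part: String): (Option[Directive], String) =
  if part.startsWith("%") && part.length >= 2 then
    val fmt = part.charAt(1) match
      case 'd' => Directive.AsInt
      case 'L' => Directive.AsLong
      case 'f' => Directive.AsFloat
      case 'g' => Directive.AsDouble
      case c   => throw IllegalArgumentException(s"unsupported format: `%$c`")
    (Some(fmt), part.drop(2))
  else (None, part)
```

For Example 2, the parts after each splice are `"%d, year: "` and `"%d"`; stripping them leaves `", year: "` and `""`, which together with the leading `"age: "` give the plain pattern `s"age: $n0, year: $y0"` used in the equivalent snippet.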

## Possible Formats

### String Pattern

e.g. `$foo`, which extracts `val foo: String`.

### Int Pattern

e.g. `$foo%d`, which extracts `val foo: Int`.

### Long Pattern

e.g. `$foo%L`, which extracts `val foo: Long`.

### Float Pattern

e.g. `$foo%f`, which extracts `val foo: Float`.

### Double Pattern

e.g. `$foo%g`, which extracts `val foo: Double`.
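As a sketch, assuming each numeric format converts the captured text the way Example 2's desugaring does for `%d` (via the standard `toXxxOption` methods, with `None` meaning the case does not match), the four formats correspond to the hypothetical helpers below. Using `toLongOption`, `toFloatOption` and `toDoubleOption` for `%L`, `%f` and `%g` is an assumption by analogy with `%d`.

```scala
// Hypothetical helpers: how each scalar format could convert a captured String.
// Assumption: conversions behave like the standard `toXxxOption` methods,
// and `None` means the surrounding case does not match.
def asInt(s: String): Option[Int]       = s.toIntOption    // $foo%d
def asLong(s: String): Option[Long]     = s.toLongOption   // $foo%L
def asFloat(s: String): Option[Float]   = s.toFloatOption  // $foo%f
def asDouble(s: String): Option[Double] = s.toDoubleOption // $foo%g
```

Under that assumption, `"3000000000"` fails `%d` because it overflows `Int` but succeeds with `%L`, and `" 23"` fails `%d` because the leading space is not trimmed, so any surrounding spaces belong in the literal part of the pattern.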

### Split Pattern

e.g. `$foo...(<regex>)`, which extracts `val foo: IndexedSeq[String]`.

This is equivalent to extracting with `$foo` and then performing `foo.split(raw"<regex>").toIndexedSeq`.

This means that inside the `<regex>` you may put any valid regex accepted by `scala.util.matching.Regex`.
String escape sequences are not processed within the regex, just as in a `raw` string.
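Since the split is described as equivalent to calling `split`, it should inherit the rules of `java.lang.String.split`: trailing empty strings are discarded, while a leading empty string is kept when the input starts with a match of the delimiter. A minimal sketch with a hypothetical `splitLike` helper, using the standard library only:

```scala
// Hypothetical helper mirroring the documented equivalence:
//   $foo...(<regex>)  is like  foo.split(raw"<regex>").toIndexedSeq
def splitLike(foo: String, regex: String): IndexedSeq[String] =
  foo.split(regex).toIndexedSeq

// Behavior inherited from java.lang.String.split:
val basic    = splitLike("23, 56, 71", ", ") // IndexedSeq("23", "56", "71")
val trailing = splitLike("a, b, ", ", ")     // IndexedSeq("a", "b"): trailing empty dropped
val leading  = splitLike(", a", ", ")        // IndexedSeq("", "a"): leading empty kept
val digits   = splitLike("a1b22c", raw"\d+") // IndexedSeq("a", "b", "c")
```

The leading empty string in the third case is exactly what the `..!` variant exists to drop.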

As a special case, if the first element of the split sequence is expected to be empty, you can drop it with the `$foo..!(<regex>)` pattern.

Putting this all together, you could split Windows-style paths with the following pattern:

```scala sc:nocompile
raw"C:\foo\bar\baz.pdf" match
  case r"C:$elems..!(\\)" => elems.mkString("/")
  // yields "foo/bar/baz.pdf"
```
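A stdlib-only sketch of the Windows example, assuming `..!` behaves like `...` followed by dropping a leading empty element. `splitDropLeadingEmpty` and `windowsToSlashes` are hypothetical names; what the library does when the first element is not empty is not covered here, and this sketch simply leaves the sequence unchanged in that case.

```scala
// Hypothetical model of `$foo..!(<regex>)`: split as with `...`,
// then drop the first element when it is the empty string.
def splitDropLeadingEmpty(foo: String, regex: String): IndexedSeq[String] =
  val parts = foo.split(regex).toIndexedSeq
  if parts.headOption.contains("") then parts.tail else parts

// Stdlib equivalent of: case r"C:$elems..!(\\)" => elems.mkString("/")
// raw"\\" is a regex matching one backslash.
def windowsToSlashes(path: String): Option[String] = path match
  case s"C:$rest" => Some(splitDropLeadingEmpty(rest, raw"\\").mkString("/"))
  case _          => None
```

Here `rest` is `\foo\bar\baz.pdf`, which splits into `""`, `"foo"`, `"bar"`, `"baz.pdf"`; dropping the leading empty string gives `foo/bar/baz.pdf`.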

### Nested Patterns

The `r` interpolator can also match on a `Seq` of strings, nested to any depth.

For example:

```scala sc:nocompile
val strings: Seq[String] = ???

val foo: Seq[Int] = strings match
  case r"$foo%d" => foo
```

or even

```scala sc:nocompile
val stringss: Seq[Seq[String]] = ???

val foo: Seq[Seq[Int]] = stringss match
  case r"$foo%d" => foo
```
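One way to picture nested matching is as an all-or-nothing traversal: the format is applied to every element, and the whole match fails if any element fails. That failure semantics is an assumption here, and `traverseAll`, `ints` and `intss` are hypothetical helpers, not part of the library.

```scala
// Hypothetical: apply `f` to every element; succeed only if all succeed.
def traverseAll[A, B](xs: Seq[A])(f: A => Option[B]): Option[Seq[B]] =
  xs.foldRight(Option(Seq.empty[B])) { (a, acc) =>
    for
      b  <- f(a)
      bs <- acc
    yield b +: bs
  }

// Models `$foo%d` against a Seq[String]
def ints(strings: Seq[String]): Option[Seq[Int]] =
  traverseAll(strings)(_.toIntOption)

// Models `$foo%d` against a Seq[Seq[String]]: the same traversal, one level deeper
def intss(stringss: Seq[Seq[String]]): Option[Seq[Seq[Int]]] =
  traverseAll(stringss)(ints)
```

Each extra level of nesting just wraps the previous traversal, which is why the extracted type gains one `Seq` per level.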
1 change: 1 addition & 0 deletions project.scala
@@ -1,5 +1,6 @@
// Main
//> using scala "3.4.0-RC3"
//> using options -source:future -Yexplicit-nulls
//> using options -project enhanced-string-interpolator -siteroot ${.}

//> using publish.ci.computeVersion "git:tag"
5 changes: 4 additions & 1 deletion src/main/scala/stringmatching/regex/Interpolators.scala
@@ -21,7 +21,10 @@ object Interpolators:
end PatternElement

/** Holder for the pattern elements described by a string interpolated with `r`. */
case class Pattern(elements: Seq[PatternElement])
enum Pattern:
case Literal(glob: String)
case Single(glob: String, pattern: PatternElement)
case Multiple(glob: String, patterns: Seq[PatternElement])

extension (inline sc: StringContext)
/** use in patterns like `case r"$foo...(, )" => println(foo)` */
66 changes: 43 additions & 23 deletions src/main/scala/stringmatching/regex/Macros.scala
@@ -20,7 +20,8 @@ object Macros:
'{ new RSStringContext[t]($patternExpr) }
end rsApplyExpr

/** Process a `RSStringContext` into a well-typed call to [[stringmatching.regex.Runtime.extract]]
/** Process a `RSStringContext` into a well-typed call to
* [[stringmatching.regex.Runtime.unsafeExtract]]
*/
def rsUnapplyExpr[R: Type, Base: Type](
rsSCExpr: Expr[RSStringContext[R]],
@@ -35,16 +36,24 @@ object Macros:
returnType match
case '[t] =>
'{
Runtime.extract[Base, t]($patternExpr.elements, levels = $levelsExpr)($scrutinee)
Runtime.unsafeExtract[Base, t]($patternExpr, levels = $levelsExpr)($scrutinee)
}
end match
end rsUnapplyExpr

private object Reify:

given PatternToExpr: ToExpr[Pattern] with
def apply(pattern: Pattern)(using Quotes): Expr[Pattern] =
'{ Pattern(${ Expr.ofSeq(pattern.elements.map(Expr(_))) }) }
import Pattern.*

def apply(pattern: Pattern)(using Quotes): Expr[Pattern] = pattern match
case Literal(glob) =>
'{ Literal(${ Expr(glob) }) }
case Single(glob, pattern) =>
'{ Single(${ Expr(glob) }, ${ Expr(pattern) }) }
case Multiple(glob, patterns) =>
'{ Multiple(${ Expr(glob) }, ${ Expr.ofSeq(patterns.map(Expr(_))) }) }
end PatternToExpr

given FormatPatternToExpr: ToExpr[FormatPattern] with
import FormatPattern.*
@@ -145,29 +154,40 @@ object Macros:
case _ => report.errorAndAbort(s"unsupported format: `%$format`")
case rest =>
PatternElement.Glob(globPattern(rest))
Pattern(PatternElement.Glob(globPattern(g)) +: rest0)

val g0 = globPattern(g)

if rest0.isEmpty then Pattern.Literal(g0)
else if rest0.sizeIs == 1 then Pattern.Single(g0, rest0.head)
else Pattern.Multiple(g0, rest0)
end parsed

private def refineResult(pattern: Pattern)(using Quotes): quotes.reflect.TypeRepr =
import quotes.reflect.*
val args = pattern.elements
.drop(1)
.map:
case PatternElement.Glob(_) => TypeRepr.of[String]
case PatternElement.Split(_, _) => TypeRepr.of[IndexedSeq[String]]
case PatternElement.SplitEmpty(_, _) => TypeRepr.of[IndexedSeq[String]]
case PatternElement.Format(format, _) =>
format match
case FormatPattern.AsInt => TypeRepr.of[Int]
case FormatPattern.AsLong => TypeRepr.of[Long]
case FormatPattern.AsDouble => TypeRepr.of[Double]
case FormatPattern.AsFloat => TypeRepr.of[Float]
if args.size == 0 then TypeRepr.of[EmptyTuple]
else if args.size == 1 then args.head
else if args.size <= 22 then AppliedType(defn.TupleClass(args.size).typeRef, args.toList)
else
report.errorAndAbort(s"too many captures: ${args.size} (implementation restriction: max 22)")
end if

def typeOfPattern(element: PatternElement) = element match
case PatternElement.Glob(_) => TypeRepr.of[String]
case PatternElement.Split(_, _) => TypeRepr.of[IndexedSeq[String]]
case PatternElement.SplitEmpty(_, _) => TypeRepr.of[IndexedSeq[String]]
case PatternElement.Format(format, _) =>
format match
case FormatPattern.AsInt => TypeRepr.of[Int]
case FormatPattern.AsLong => TypeRepr.of[Long]
case FormatPattern.AsDouble => TypeRepr.of[Double]
case FormatPattern.AsFloat => TypeRepr.of[Float]

pattern match
case Pattern.Literal(_) => TypeRepr.of[EmptyTuple]
case Pattern.Single(_, pattern) => typeOfPattern(pattern)
case Pattern.Multiple(_, elements) =>
val args = elements.map(typeOfPattern)
if args.size <= 22 then AppliedType(defn.TupleClass(args.size).typeRef, args.toList)
else
report.errorAndAbort(
s"too many captures: ${args.size} (implementation restriction: max 22)"
)
end if
end match
end refineResult

private def wrapping[Base: Type](using Quotes): Int =
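This commit replaces the flat `Pattern(elements)` with three cases, so `refineResult` no longer has to drop the leading glob before computing the extracted type. The sketch below is a plain-Scala model of that dispatch, with types rendered as strings rather than `TypeRepr`; `Capture`, `Shape`, `classify`, `typeOf` and `resultType` are hypothetical names, and only the case structure and type mapping follow `parsed` and `refineResult` in the diff above.

```scala
// Hypothetical model of the capture kinds in `PatternElement`/`FormatPattern`.
enum Capture:
  case Glob, Split, SplitEmpty
  case AsInt, AsLong, AsDouble, AsFloat

// Hypothetical mirror of the new `Pattern` enum.
enum Shape:
  case Literal(glob: String)
  case Single(glob: String, capture: Capture)
  case Multiple(glob: String, captures: Seq[Capture])

// Mirrors the end of `parsed`: choose a case by the number of captures.
def classify(glob: String, captures: Seq[Capture]): Shape =
  if captures.isEmpty then Shape.Literal(glob)
  else if captures.sizeIs == 1 then Shape.Single(glob, captures.head)
  else Shape.Multiple(glob, captures)

def typeOf(c: Capture): String = c match
  case Capture.Glob                       => "String"
  case Capture.Split | Capture.SplitEmpty => "IndexedSeq[String]"
  case Capture.AsInt                      => "Int"
  case Capture.AsLong                     => "Long"
  case Capture.AsDouble                   => "Double"
  case Capture.AsFloat                    => "Float"

// Mirrors `refineResult`; the real macro reports a compile-time error
// for more than 22 captures, whereas this model throws at runtime.
def resultType(shape: Shape): String = shape match
  case Shape.Literal(_)      => "EmptyTuple"
  case Shape.Single(_, c)    => typeOf(c)
  case Shape.Multiple(_, cs) =>
    require(cs.size <= 22, s"too many captures: ${cs.size} (implementation restriction: max 22)")
    cs.map(typeOf).mkString("(", ", ", ")")
```

For instance, `r"age: $n%d, year: $y%d"` has two `%d` captures, so in this model it becomes `Multiple` with result type `(Int, Int)`, while a single capture yields its element type directly rather than a one-element tuple.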