In [1]:
import { requireCytoscape, requireCarbon } from "./lib/draw";

requireCarbon();
requireCytoscape();

# Domain Specific Languages (DSL) and Regular Expressions

## Where Were We?

1. Language primitives (i.e., building blocks of languages)
2. **Language paradigms** (i.e., combinations of language primitives)
    - Last time: React
    - This time: **domain specific languages** (DSL)
3. Building a language (i.e., designing your own language)

## Goal

1. Introduce programming with **domain specific languages** (DSLs).
2. Introduce **regular expressions** as an example of a DSL.

## Why DSLs?

1. Unlike a general purpose programming language, a DSL is designed to solve a class of problems in a specific domain. Consequently, a DSL is not necessarily Turing complete.
2. The downside is that there are some programs that you will not be able to write in a DSL.
3. The upside is that your programs can have special properties that may be useful for your specific setting.

## Examples of DSLs

1. Mathematica: solving mathematical equations
2. Matlab: scientific computing
3. Gradle: build system
4. YACC: parser generator
5. SQL: relational database query language

### Mathematica

Example code:
```
solve x^2 + 4x + 6 = 0
```

[https://www.wolframalpha.com/](https://www.wolframalpha.com/)

### Matlab


Example code:

```
Ts = 1/50;
t = 0:Ts:10-Ts;                     
x = sin(2*pi*15*t) + sin(2*pi*20*t);
plot(t,x)
xlabel('Time (seconds)')
ylabel('Amplitude')
```

[https://www.mathworks.com/products/matlab.html](https://www.mathworks.com/products/matlab.html)

### Gradle

Example code:
```
dependencies {                              
    api("junit:junit:4.13")
    implementation("junit:junit:4.13")
    testImplementation("junit:junit:4.13")
}
```

[https://docs.gradle.org/current/userguide/userguide.html](https://docs.gradle.org/current/userguide/userguide.html)

### SQL (Structured Query Language)

Example code:

```
SELECT column1, column2 FROM table1, table2 WHERE column2='value';
```

[https://www.w3schools.com/sql/](https://www.w3schools.com/sql/)

### YACC (Yet Another Compiler Compiler)

Example code:
```
input :
   | input line
;
line : '\n'
   | exp '\n'  { printf ("\t%.10g\n", $1); }
;
exp : NUM             { $$ = $1;         }
   | exp exp '+'     { $$ = $1 + $2;    }
   | exp exp '-'     { $$ = $1 - $2;    }
   | exp exp '*'     { $$ = $1 * $2;    }
   | exp exp '/'     { $$ = $1 / $2;    }
   /* Exponentiation */
   | exp exp '^'     { $$ = pow ($1, $2); }
   /* Unary minus    */
   | exp 'n'         { $$ = -$1;        }
;
```

[https://www.cs.ccu.edu.tw/~naiwei/cs5605/YaccBison.html](https://www.cs.ccu.edu.tw/~naiwei/cs5605/YaccBison.html)

Question: does YACC code remind you of something that you have seen in class?

## Common Problem: Pattern Matching on Strings

### Let's start with simple patterns

In [2]:
const files = ["hw1.ts", "hw1.js", "hw2.ts", "hw2.js"];

In [3]:
// Get all strings that end with ".ts"
files.filter((x) => x.endsWith(".ts"));

[ 'hw1.ts', 'hw2.ts' ]


In [4]:
// Get all strings that end with ".js"
files.filter((x) => x.endsWith(".js"));

[ 'hw1.js', 'hw2.js' ]


In [5]:
// Get all strings that begin with "hw1"
files.filter((x) => x.startsWith("hw1"));

[ 'hw1.ts', 'hw1.js' ]


In [6]:
// Get all strings that begin with "hw1" and end with ".ts"
files.filter((x) => x.startsWith("hw1") && x.endsWith(".ts"));

[ 'hw1.ts' ]


### More complex patterns

Suppose you want to check if phone numbers are valid.

In [7]:
const phoneNumbers = ["123-456-7890", "(123) 456-7890", "1234567890", "+1 1234567890"]; // phone numbers

In [8]:
function replaceAll(s: string, find: string, replace: string): string {
    let prev = s;
    let curr = s.replace(find, replace);
    while (prev !== curr) {
        prev = curr;
        curr = curr.replace(find, replace);
    }
    return curr;
}
replaceAll("123-456-7890", "-", "")

1234567890


In [9]:
const phoneNumbers2 = phoneNumbers.map((x) => replaceAll(x, "-", ""))
                                  .map((x) => replaceAll(x, " ", ""))
                                  .map((x) => replaceAll(x, "(", ""))
                                  .map((x) => replaceAll(x, ")", ""))
                                  .map((x) => replaceAll(x, "+", ""))
phoneNumbers2

[ '1234567890', '1234567890', '1234567890', '11234567890' ]


In [10]:
function isNumber(s: string): boolean {
    for (const c of s) {
        if (! (c === "0" || c === "1" || c === "2" || c === "3" || c === "4" ||
               c === "5" || c === "6" || c === "7" || c === "8" || c === "9") ) {
            return false;
        }
    }
    return true
}

In [11]:
phoneNumbers2.filter(isNumber)

[ '1234567890', '1234567890', '1234567890', '11234567890' ]


### Problems with Solution

1. Loses information:
    * +1 signifies country code
    * (123) signifies area code
2. This information may be useful for checking the validity of phone numbers, e.g., not all 10-digit numbers are valid phone numbers. The grouping of the digits carries geographic information.
3. Does not enforce that phone numbers have a certain number of digits. For example, is "12389348762342134" a valid phone number?
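
The third problem is easy to reproduce. The following is a minimal sketch that condenses the strip-then-check approach from the cells above into three hypothetical helpers (`stripSeparators`, `isAllDigits`, and `looksLikePhoneNumber` are names introduced here for illustration, not part of the original cells). Unlike `isNumber`, `isAllDigits` is assumed to reject the empty string.

```typescript
// Hypothetical helper: remove the separator characters that the cells above strip one by one.
function stripSeparators(s: string): string {
    return s.split("").filter((c) => !"-() +".includes(c)).join("");
}

// Hypothetical helper: true if s is non-empty and consists only of the digits 0-9.
function isAllDigits(s: string): boolean {
    return s.length > 0 && s.split("").every((c) => c >= "0" && c <= "9");
}

// Hypothetical helper: the whole "validation" pipeline in one function.
function looksLikePhoneNumber(s: string): boolean {
    return isAllDigits(stripSeparators(s));
}

// One well-formed number, one number that is far too long, and one string of misplaced separators.
const candidates = ["123-456-7890", "12389348762342134", ")1(2-3+"];
const accepted = candidates.filter(looksLikePhoneNumber);
console.log(accepted);
```

All three candidates pass the check, because neither the number of digits nor the position of the separators is ever inspected.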

### Other examples of common patterns

1. URLs
    * https://www.google.com, www.google.com
2. ZIP codes
    * 12345 vs. 12345-6789
3. Valid variable names in a programming language
    * Cannot start variables with a number in TypeScript
4. Extract emails and links from text

## What is the idea of a DSL?

Claim: Combining string functions with general-purpose code is a poor fit for this problem, for several reasons.
1. It requires a non-programmer to know how to program in a general-purpose language.
2. The non-programmer may find a more familiar syntax easier to understand.
3. Therefore, we should design a language that is more familiar and easier to use.

## Regular Expressions

1. Regular expressions address the string pattern-matching problem, which comes up constantly in practice.
2. Rich connections to formal language theory.
     * Take CSC 520.
     * [https://en.wikipedia.org/wiki/Chomsky_hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy)
     * Example of a DSL designed by computer scientists for computer scientists.

### Regular Expressions for File Extensions

In [12]:
let regexpFileExt: RegExp = /.*\.ts$/;

In [13]:
const files = [".ts", "hw1.js", "hw2.ts", "hw2.js"];
files.filter((x) => regexpFileExt.test(x))

[ '.ts', 'hw2.ts' ]


Key

1. `/` and `/` signify the start and end of a regular expression similar to "" for strings.
2. `.` is a **wildcard**, i.e., it matches any single character (by default, any character other than a line terminator).
3. `\.` escapes `.` so that it matches a literal period similar to escaping characters in a string.
4. `t` and `s` stand for literal characters to match.
5. `$` means end of string.
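
To see why the escape and the `$` both matter, the following sketch compares the regular expression above with two hypothetical variants: one with the backslash dropped and one with the `$` dropped. The file names are made up for illustration.

```typescript
const escaped: RegExp = /.*\.ts$/;    // the regular expression from the cell above
const unescaped: RegExp = /.*.ts$/;   // hypothetical variant: `.` before "ts" is now a wildcard
const unanchored: RegExp = /.*\.ts/;  // hypothetical variant: no `$`, so ".ts" may appear anywhere

const fileNames = ["hw1.ts", "hw1-ts", "hw1.tsx", "notes"];  // hypothetical inputs

const withEscape = fileNames.filter((x) => escaped.test(x));
const withoutEscape = fileNames.filter((x) => unescaped.test(x));
const withoutAnchor = fileNames.filter((x) => unanchored.test(x));

console.log(withEscape);     // [ 'hw1.ts' ]
console.log(withoutEscape);  // [ 'hw1.ts', 'hw1-ts' ]
console.log(withoutAnchor);  // [ 'hw1.ts', 'hw1.tsx' ]
```

Without the escape, any character may stand in for the period. Without `$`, the pattern only requires `.ts` to occur somewhere in the string.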

### Regular Expressions for Phone Numbers

#### Phone numbers 1

In [14]:
const phoneNumbers = ["123-456-7890", "(123) 456-7890", "1234567890", "+1 1234567890"]; // phone numbers

In [15]:
let regexpPhone: RegExp = /^[0-9]{3}[0-9]{3}[0-9]{4}$/;
phoneNumbers.filter((x) => regexpPhone.test(x));

[ '1234567890' ]


Key

1. `^` means start of string.
2. `[0-9]` means any single character between `0` and `9`.
3. `{x}` means exactly x repetitions of the preceding expression.
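
The anchors `^` and `$` are what enforce the total length. As a small sketch, compare the regular expression above with a hypothetical copy that omits both anchors (the input strings are made up for illustration):

```typescript
const anchored: RegExp = /^[0-9]{3}[0-9]{3}[0-9]{4}$/;  // the regular expression from the cell above
const unanchored: RegExp = /[0-9]{3}[0-9]{3}[0-9]{4}/;  // hypothetical variant without ^ and $

const inputs = ["1234567890", "12389348762342134", "call 1234567890 now"];  // hypothetical inputs

const exact = inputs.filter((x) => anchored.test(x));
const anywhere = inputs.filter((x) => unanchored.test(x));

console.log(exact);     // [ '1234567890' ]
console.log(anywhere);  // all three inputs: each contains ten consecutive digits somewhere
```

With the anchors, the ten digits must make up the entire string, so the overly long number is rejected.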

#### Phone numbers 2

In [16]:
let regexpPhone2: RegExp = /^[0-9]{3}\s*-?\s*[0-9]{3}\s*-?\s*[0-9]{4}$/;
phoneNumbers.filter((x) => regexpPhone2.test(x));

[ '123-456-7890', '1234567890' ]


Key

1. `\s` means any whitespace character.
2. `*` means 0 or more occurrences of the preceding expression.
3. `?` means 0 or 1 occurrences of the preceding expression.
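
A quantifier applies to the single expression immediately before it, which is one character unless parentheses group more. The patterns and inputs in this sketch are hypothetical and chosen only to show the difference:

```typescript
const starOnChar: RegExp = /^ab*$/;     // `*` applies only to `b`
const starOnGroup: RegExp = /^(ab)*$/;  // `*` applies to the whole group `ab`
const optionalU: RegExp = /^colou?r$/;  // `?` makes the single character `u` optional

const inputs = ["", "a", "abbb", "abab"];  // hypothetical inputs

const charMatches = inputs.filter((x) => starOnChar.test(x));
const groupMatches = inputs.filter((x) => starOnGroup.test(x));
const spellings = ["color", "colour", "colouur"].filter((x) => optionalU.test(x));

console.log(charMatches);   // [ 'a', 'abbb' ]
console.log(groupMatches);  // [ '', 'abab' ]
console.log(spellings);     // [ 'color', 'colour' ]
```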

#### Phone numbers 3

In [17]:
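// The outer parentheses limit the alternation to the area code, so that ^ and the
// rest of the pattern apply to both alternatives.
let regexpPhone3: RegExp = /^((\d{3})|(\(\d{3}\)))\s*-?\s*\d{3}\s*-?\s*\d{4}$/;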

In [18]:
phoneNumbers.filter((x) => regexpPhone3.test(x))

[ '123-456-7890', '(123) 456-7890', '1234567890' ]


Key

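1. `|` means either the left or the right should match. It binds more loosely than concatenation, so use parentheses to limit what counts as the left and the right.
2. `\d` = `[0-9]`
3. `\(` matches a literal `(`. The escape is needed because `(` and `)` are part of the regular expression language, where they group subexpressions.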
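
Because `|` binds so loosely, it is easy to write an alternation that covers more of the pattern than intended. A small sketch with hypothetical patterns and inputs:

```typescript
const ungrouped: RegExp = /^cat|dog$/;  // read as (^cat) | (dog$)
const grouped: RegExp = /^(cat|dog)$/;  // read as ^ (cat | dog) $

const inputs = ["cat", "dog", "catalog", "hotdog"];  // hypothetical inputs

const loose = inputs.filter((x) => ungrouped.test(x));
const strict = inputs.filter((x) => grouped.test(x));

console.log(loose);   // [ 'cat', 'dog', 'catalog', 'hotdog' ]
console.log(strict);  // [ 'cat', 'dog' ]
```

The ungrouped version accepts anything that starts with `cat` or ends with `dog`. The same reasoning applies to an area-code alternation inside a phone-number pattern: it should be wrapped in its own parentheses.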

#### Phone numbers 4

In [19]:
// As before, the area-code alternation is wrapped in its own parentheses.
let regexpPhone4: RegExp = /^(\+\d+)?\s*((\d{3})|(\(\d{3}\)))\s*-?\s*\d{3}\s*-?\s*\d{4}$/;

In [20]:
phoneNumbers.filter((x) => regexpPhone4.test(x))

[ '123-456-7890', '(123) 456-7890', '1234567890', '+1 1234567890' ]


Key

1. `+` means 1 or more occurrences of the preceding expression. `\+` matches a literal `+`.
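
Parentheses also capture the text they match, which addresses the earlier complaint that stripping separators loses the country code and area code. The following is a sketch only: `phonePattern`, `PhoneParts`, and `parsePhone` are hypothetical names, and the field names `exchange` and `line` are labels chosen here for the last two digit groups.

```typescript
// Hypothetical pattern with one capture group per part of the number.
// Group 1: country code, group 2: area code, group 3: next three digits, group 4: last four digits.
const phonePattern: RegExp = /^(\+\d+)?\s*(\d{3}|\(\d{3}\))\s*-?\s*(\d{3})\s*-?\s*(\d{4})$/;

interface PhoneParts {
    countryCode: string | null;  // null when no country code is present
    areaCode: string;
    exchange: string;
    line: string;
}

// Returns the parts of a phone number, or null if the string does not match the pattern.
function parsePhone(s: string): PhoneParts | null {
    const m = phonePattern.exec(s);
    if (m === null) {
        return null;
    }
    return {
        countryCode: m[1] ?? null,                     // optional group: undefined when absent
        areaCode: (m[2] ?? "").replace(/[()]/g, ""),  // drop the parentheses around the area code
        exchange: m[3] ?? "",
        line: m[4] ?? "",
    };
}

const withCountry = parsePhone("+1 1234567890");
const withParens = parsePhone("(123) 456-7890");
const tooLong = parsePhone("12389348762342134");
console.log(withCountry, withParens, tooLong);
```

Here `withCountry` has `countryCode` `"+1"` and `areaCode` `"123"`, `withParens` has `countryCode` `null`, and `tooLong` is `null`.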

### Regular Expressions for Emails

In [21]:
let regexpEmail: RegExp = /^[\w.-]+@[\w.-]+$/;
console.log(regexpEmail.test("bob@sfsu.edu"));
console.log(regexpEmail.test("bobsfsu.edu"));
console.log(regexpEmail.test("bob@sfsu"));

true
false
true
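
Note that `bob@sfsu` is accepted even though it has no top-level domain. One possible tightening, shown here as a sketch rather than a complete email validator, requires at least one dot-separated label after the first part of the domain. The names `regexpEmailStrict`, `regexpEmailGlobal`, and `sampleText` are hypothetical. The second half of the sketch drops the anchors and adds the `g` (global) flag to extract every address from a longer text.

```typescript
// Hypothetical stricter pattern: the domain must contain at least one ".label" part.
const regexpEmailStrict: RegExp = /^[\w.-]+@[\w-]+(\.[\w-]+)+$/;

const strictWithDomain = regexpEmailStrict.test("bob@sfsu.edu");
const strictWithoutDomain = regexpEmailStrict.test("bob@sfsu");
console.log(strictWithDomain);     // true
console.log(strictWithoutDomain);  // false

// Without ^ and $, and with the g flag, String.prototype.match returns all matches in the text.
const regexpEmailGlobal: RegExp = /[\w.-]+@[\w-]+(\.[\w-]+)+/g;
const sampleText = "Contact bob@sfsu.edu or alice@example.org for details.";  // hypothetical text
const foundEmails = sampleText.match(regexpEmailGlobal) ?? [];
console.log(foundEmails);  // [ 'bob@sfsu.edu', 'alice@example.org' ]
```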


### Regular Expression Summary

1. `/` and `/` signify the start and end of a regular expression similar to "" for strings.
2. `$` means end of string.
3. `^` means start of string.

4. `t` and `s` stand for literal characters to match.
5. `.` is a **wildcard**, i.e., it matches any single character (by default, any character other than a line terminator).
6. `\.` escapes `.` so that it matches a literal period similar to escaping characters in a string.

7. `\s` means any whitespace character.
8. `\d` = `[0-9]`
9. `[0-9]` means any single character between `0` and `9`.

10. `|` means either the left or the right should match.
11. `{x}` means exactly x repetitions of the preceding expression.
12. `*` means 0 or more occurrences of the preceding expression.
13. `?` means 0 or 1 occurrences of the preceding expression.
14. `+` means 1 or more occurrences of the preceding expression.
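
The constructs in this summary are enough to handle two of the common patterns listed earlier. The sketch below is illustrative only: `regexpZip` and `regexpIdent` are hypothetical names, and the identifier pattern is a simplification that assumes ASCII names and ignores reserved words.

```typescript
// ZIP code: five digits, optionally followed by a hyphen and four more digits.
const regexpZip: RegExp = /^\d{5}(-\d{4})?$/;

// Simplified identifier: a letter, `_`, or `$`, followed by any number of letters, digits, `_`, or `$`.
// Inside square brackets, `$` is an ordinary character rather than the end-of-string anchor.
const regexpIdent: RegExp = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

const zips = ["12345", "12345-6789", "12345-678", "1234"].filter((x) => regexpZip.test(x));
const idents = ["x1", "_tmp", "1x", ""].filter((x) => regexpIdent.test(x));

console.log(zips);    // [ '12345', '12345-6789' ]
console.log(idents);  // [ 'x1', '_tmp' ]
```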

### Implementing Regular Expressions

In [22]:
type RegExp =
  | { tag: "VOID" }   // matches nothing
  | { tag: "EMPTY" }  // matches only the empty string, i.e., ""
  | { tag: "CHAR", char: string } // match specific character, i.e., character case
  | { tag: "STAR", re: RegExp } // match zero or more repetitions of re, i.e., *
  | { tag: "CONCAT", re1: RegExp, re2: RegExp } // match re1 followed by re2, i.e., (re1)(re2)
  | { tag: "OR", re1: RegExp, re2: RegExp }  // match re1 or match re2, i.e., |

In [23]:
function newVoid(): RegExp { return { tag: "VOID" } }
function newEmpty(): RegExp { return { tag: "EMPTY" } }
function newChar(char: string): RegExp { return { tag: "CHAR", char: char } }
function newStar(re: RegExp): RegExp { return { tag: "STAR", re: re } }
function newConcat(re1: RegExp, re2: RegExp): RegExp { return { tag: "CONCAT", re1: re1, re2: re2 } }
function newOr(re1: RegExp, re2: RegExp): RegExp { return { tag: "OR", re1: re1, re2: re2 } }
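
These six constructors are enough to express the other operators from the summary. The following sketch shows how `?`, `+`, literal strings, and character classes could be derived from them. It uses a renamed copy of the type (`Re` instead of `RegExp`, to avoid clashing with the built-in type), and the helper names `optional`, `plus`, `literal`, and `oneOf` are hypothetical.

```typescript
// Renamed copy of the syntax tree type above.
type Re =
  | { tag: "VOID" }
  | { tag: "EMPTY" }
  | { tag: "CHAR", char: string }
  | { tag: "STAR", re: Re }
  | { tag: "CONCAT", re1: Re, re2: Re }
  | { tag: "OR", re1: Re, re2: Re };

const empty: Re = { tag: "EMPTY" };
const chr = (char: string): Re => ({ tag: "CHAR", char });
const star = (re: Re): Re => ({ tag: "STAR", re });
const concat = (re1: Re, re2: Re): Re => ({ tag: "CONCAT", re1, re2 });
const or = (re1: Re, re2: Re): Re => ({ tag: "OR", re1, re2 });

// `?`: zero or one occurrence, i.e., re or the empty string.
const optional = (re: Re): Re => or(re, empty);

// `+`: one or more occurrences, i.e., re followed by re*.
const plus = (re: Re): Re => concat(re, star(re));

// A literal string: the concatenation of its characters, ending in EMPTY.
const literal = (s: string): Re =>
    s.split("").reduceRight<Re>((rest, c) => concat(chr(c), rest), empty);

// A character class such as [0-9]: an OR over the listed characters, starting from VOID.
const oneOf = (chars: string): Re =>
    chars.split("").reduce<Re>((acc, c) => or(acc, chr(c)), { tag: "VOID" });

// Example: the syntax tree for \d+ built from the derived helpers.
const digits: Re = plus(oneOf("0123456789"));
console.log(JSON.stringify(literal("ab")));
```

The design point is that the core language can stay small: every derived operator is just a function that builds a larger tree out of the six primitives.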

In [24]:
// The input string is represented as an array of characters, e.g.,
// "asdfasdfasdf" becomes [a, s, d, f, a, s, d, f, a, s, d, f].

function regexpTest(arr: string[], re: RegExp): boolean {
    switch (re.tag) {
        case "VOID": {
            return false;
        }
        case "EMPTY": {
            return arr.length === 0;
        }
        case "CHAR": {
            return arr.length === 1 ? arr[0] === re.char : false;
        }
        case "STAR": {
            // Zero repetitions match the empty input.
            if (arr.length === 0) {
                return true;
            }
            // Otherwise, some non-empty prefix must match one repetition of re.re,
            // and the remainder must match the starred expression again.
            // Prefixes are non-empty (i >= 1), so the recursion terminates.
            for (let i = 1; i <= arr.length; i++) {
                if (regexpTest(arr.slice(0, i), re.re) && regexpTest(arr.slice(i), re)) {
                    return true;
                }
            }
            return false;
        }
        case "CONCAT": {            
            for (let i = 0; i <= arr.length; i++) {
                const left = arr.slice(0, i);
                const right = arr.slice(i);
                if (regexpTest(left, re.re1) && regexpTest(right, re.re2)) {
                    return true;
                }
            }
            return false;
        }
        case "OR": {
            return regexpTest(arr, re.re1) || regexpTest(arr, re.re2);
        }
    }
}
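
A useful test case for any STAR implementation is one where the repetitions have different lengths, such as `(a|ab)*` against `"aab"` (one `a` followed by one `ab`). The sketch below is a condensed variant of the matcher with hypothetical names (`Re`, `matches`) that works on strings instead of character arrays. Its STAR case tries every non-empty prefix as one repetition and matches the rest against the starred expression again. This is one simple approach that can take exponential time, not a description of how production regular expression engines work.

```typescript
type Re =
  | { tag: "VOID" }
  | { tag: "EMPTY" }
  | { tag: "CHAR", char: string }
  | { tag: "STAR", re: Re }
  | { tag: "CONCAT", re1: Re, re2: Re }
  | { tag: "OR", re1: Re, re2: Re };

// Returns true if the whole string s matches re. Assumes CHAR holds a single character.
function matches(s: string, re: Re): boolean {
    switch (re.tag) {
        case "VOID": return false;
        case "EMPTY": return s.length === 0;
        case "CHAR": return s === re.char;
        case "OR": return matches(s, re.re1) || matches(s, re.re2);
        case "CONCAT": {
            // Try every split point between the left and right parts.
            for (let i = 0; i <= s.length; i++) {
                if (matches(s.slice(0, i), re.re1) && matches(s.slice(i), re.re2)) return true;
            }
            return false;
        }
        case "STAR": {
            if (s.length === 0) return true;  // zero repetitions
            // One repetition consumes a non-empty prefix; the rest must match the star again.
            for (let i = 1; i <= s.length; i++) {
                if (matches(s.slice(0, i), re.re) && matches(s.slice(i), re)) return true;
            }
            return false;
        }
    }
}

const a: Re = { tag: "CHAR", char: "a" };
const b: Re = { tag: "CHAR", char: "b" };
const ab: Re = { tag: "CONCAT", re1: a, re2: b };
const aOrAbStar: Re = { tag: "STAR", re: { tag: "OR", re1: a, re2: ab } };  // (a|ab)*

console.log(matches("aab", aOrAbStar));  // true: "a" then "ab"
console.log(matches("aba", aOrAbStar));  // true: "ab" then "a"
console.log(matches("abb", aOrAbStar));  // false: the second "b" cannot be matched
```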

In [25]:
const re1 = newConcat(newChar('a'), newConcat(newChar('b'), newChar('c')));
console.log(regexpTest(['a', 'b', 'c'], re1))
console.log(regexpTest(['a', 'b'], re1))
console.log(regexpTest(['a', 'a', 'b', 'c'], re1))

true
false
false


In [26]:
const re2_ = newConcat(newChar('a'), newChar('b'));
const re2 = newOr(re1, re2_);
console.log(regexpTest(['a', 'b', 'c'], re2))
console.log(regexpTest(['a', 'b'], re2))
console.log(regexpTest(['a', 'a', 'b', 'c'], re2))

true
true
false


In [27]:
const re3 = newStar(re1);
console.log(regexpTest([], re3))
console.log(regexpTest(['a', 'b', 'c'], re3))
console.log(regexpTest(['a', 'b', 'c', 'a', 'b', 'c'], re3))
console.log(regexpTest(['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'], re3))
console.log(regexpTest(['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a'], re3))

true
true
true
true
false


## Summary

1. We introduced the idea of a DSL and saw many examples of DSLs in different domains.
2. We focused on **regular expressions** as a DSL central to computer science.
3. Regular expressions can be used for matching patterns on strings.