# JavaScript - Regular Expressions (regex)

### Note: the code in this notebook is Javascript and needs to run in the debugger/console, or through driver.execute_script

Ways to write/test JavaScript:
    
    - debugger/console (F12 or control+shift+C)
    - www.jsconsole.com
    - through Selenium (execute_script)
    
There are also websites where you can write/test regular expressions:
    
    - www.regexr.com
    - www.regex101.com
    
A typical workflow is to copy the HTML into regexr or regex101, and write the regular expression. 
Once it works, move the regex into JavaScript/jQuery (run through driver.execute_script), or use it with Python's re module.

So we have several options for scraping:
    
    - Selenium locator functions
    - jquery (through Selenium driver.execute_script)
    - requests (simple/non-dynamic pages) with only regex
    - select element through Selenium locator, and use regex on innerHTML of element
    - etc
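
As a rough sketch of the 'regex on innerHTML' option above: the HTML string, the pattern, and the names `html`, `linkRe` and `links` are illustrative assumptions. In practice the string might be an element's innerHTML or a snippet copied out of regexr/regex101.

```javascript
// Hypothetical HTML, standing in for element.innerHTML or a copied page source.
const html = '<ul><li><a href="/page/1">First</a></li>' +
             '<li><a href="/page/2">Second</a></li></ul>';

// Named groups capture the link target and the link text.
const linkRe = /<a href="(?<url>[^"]+)">(?<label>[^<]+)<\/a>/g;

// matchAll yields one match object per link; keep only the named groups.
const links = [...html.matchAll(linkRe)].map(m => ({
  url: m.groups.url,
  label: m.groups.label,
}));

console.log(links);
// [ { url: '/page/1', label: 'First' }, { url: '/page/2', label: 'Second' } ]
```

A pattern like this only suits small, predictable snippets; for anything nested, the locator functions or jquery are usually the safer choice.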

## Main functions 

If you are coding the regex yourself (not regexr or regex101), these are the main functions:
    
    - test: tests whether there is a match (returns true/false)
    - match: like test, but returns the match (or, with the g flag, all matches)
    - matchAll: like match with the g flag, but also returns the capture groups of every match
    - replace: Match and replace
    - exec: 'older' function used to capture groups (matchAll is easier to use)
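
A minimal sketch of how exec can be used to capture groups in a loop. The sample text and the names `dateText`, `dateRe` and `found` are made up for illustration.

```javascript
// Sample input and variable names are illustrative.
const dateText = 'Start 2016-01-02, end 2017-03-04';
const dateRe = /(\d{4})-(\d{2})-(\d{2})/g;

const found = [];
let m;
// With the g flag, each exec call resumes from dateRe.lastIndex
// and returns null once there are no more matches.
while ((m = dateRe.exec(dateText)) !== null) {
  found.push({ year: m[1], month: m[2], day: m[3], index: m.index });
}

console.log(found);
// [ { year: '2016', month: '01', day: '02', index: 6 },
//   { year: '2017', month: '03', day: '04', index: 22 } ]
```

The same result can be had with matchAll and a for...of loop, which avoids the manual null check.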

## Flags

Flags are used to give options to the regex. Most important ones are:
    
    - i: case insensitive
    - m: multiline, special characters ^ (begin) and $ (end) will work on a line-by-line basis (not full text)
    - g: global (meaning, allow multiple matches)
    - s (dotall): the special character '.' (that matches any character) will also match newlines
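
The effect of each flag can be seen by running the same pattern with and without it. This is a small sketch; the sample text and variable names are illustrative.

```javascript
// The sample text and variable names are illustrative.
const text = 'Hello\nhello world';

// i: ignore case
const caseSensitive = /hello/.test('Hello');     // false
const caseInsensitive = /hello/i.test('Hello');  // true

// m: ^ and $ match at the start/end of every line, not only of the whole text
// (g is added so match returns all matches)
const withoutM = text.match(/^hello/gi);   // ["Hello"]
const withM = text.match(/^hello/gim);     // ["Hello", "hello"]

// s: '.' also matches the newline between the two lines
const withoutS = /Hello.hello/.test(text);   // false
const withS = /Hello.hello/s.test(text);     // true

console.log({ caseSensitive, caseInsensitive, withoutM, withM, withoutS, withS });
```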

## Quantifiers

Quantifiers specify how many times a pattern needs to be repeated:
    
    - ?: zero or once
    - +: once or more
    - *: any (zero or more)
    - {}: a specified range, for example {2,4} means at least 2 times but at most 4 times (no space inside the braces)
        
> Note: as shown under 'greedy vs non-greedy', a '?' placed after another quantifier makes it 'non-greedy'
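
A short sketch of each quantifier in action; the sample words and variable names are illustrative.

```javascript
// ?: the 'u' may appear zero times or once
const optional = ['color', 'colour'].map(w => /^colou?r$/.test(w));  // [true, true]

// +: one or more 'a'
const oneOrMore = 'caaat'.match(/a+/)[0];  // "aaa"

// *: zero or more 'b'
const zeroOrMore = ['ac', 'abbbc'].map(w => /^ab*c$/.test(w));  // [true, true]

// {2,4}: between 2 and 4 digits (\b keeps longer numbers from matching partially)
const ranged = '1 22 333 4444 55555'.match(/\b\d{2,4}\b/g);  // ["22", "333", "4444"]

// With a space, {2, 4} is no longer a quantifier: the characters are matched literally
const withSpace = /^a{2, 4}$/.test('aaa');  // false

console.log({ optional, oneOrMore, zeroOrMore, ranged, withSpace });
```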

## Examples

### test

```
const str = 'hello world!';
const result = /world/.test(str);

console.log(result); // true
```

The symbol '^' means start of string, so this tests whether the string starts with 'world':
```
const str = 'hello world!';
const result = /^world/.test(str);

console.log(result); // false
```

### match

```
let strdate = '2016-01-02';
const re_date = /[0-9]+/g; // g flag: return every match, not just the first
let result = strdate.match(re_date);
console.log(result);  // ["2016", "01", "02"]
```

### matchAll

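matchAll returns an iterator rather than an array, so loop over it (or spread it into an array) to read the matches. If there are no matches, the iterator is empty. The regex must have the g flag, otherwise matchAll throws a TypeError.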
    

```
const string = "I am learning JavaScript not Java.";
const re = /Java[a-z]*/gi;

let result = string.matchAll(re);

for (const match of result) {
  console.log(match[0]); // match[0] is the matched text
}
// JavaScript
// Java

```
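
Because the result of matchAll is an iterator, it can only be walked once. Spreading it into an array is one way to keep the matches around. A minimal sketch, with illustrative variable names:

```javascript
const sentence = 'I am learning JavaScript not Java.';

// Spread the iterator into a regular array of match objects.
const matches = [...sentence.matchAll(/Java[a-z]*/gi)];
const words = matches.map(m => m[0]);

// No matches: the iterator is empty, so the array is empty too.
const none = [...sentence.matchAll(/Python/g)];

console.log(words);        // ["JavaScript", "Java"]
console.log(none.length);  // 0
```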

#### Capturing a group 

```
const string = "I am learning JavaScript not Java.";
const re = /(Java[a-z]*)/gi;

let result = string.matchAll(re);

for (const m of result) {
    // m[0] is the full match, m[1] is the first captured group
    console.log(`Found "${m[1]}" at index ${m.index}.`)
}

// Found "JavaScript" at index 14.
// Found "Java" at index 29.
```

#### Capturing a group (named)
```
const string = "I am learning JavaScript not Java.";
const re = /(?<name>Java[a-z]*)/gi;

let result = string.matchAll(re);

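for (const m of result) {
    console.log(`Found "${m[0]}" at index ${m.index}. Captured name = ${m.groups['name']}`)
}

// Found "JavaScript" at index 14. Captured name = JavaScript
// Found "Java" at index 29. Captured name = Java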
```

See: https://www.programiz.com/javascript/library/string/matchall

### replace

```
const str = "Are you Ok? Yes I'm OK";
const result = str.replace(/OK/gi, 'fine');
console.log(result); // Are you fine? Yes I'm fine
```
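
Two further points about replace, sketched with made-up sample strings: without the g flag only the first match is replaced, and captured groups can be reused in the replacement string as $1, $2, and so on.

```javascript
// Sample strings and variable names are illustrative.
const firstOnly = 'a-b-c'.replace(/-/, '+');   // "a+b-c" (no g flag: first match only)
const everyOne = 'a-b-c'.replace(/-/g, '+');   // "a+b+c"

// $1, $2, $3 refer to the captured year, month and day.
const isoDate = '2016-01-02';
const reordered = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');  // "02/01/2016"

console.log(firstOnly, everyOne, reordered);
```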

### greedy vs non-greedy

Compare the greedy version (.* grabs as much as it can):

```
'<h1>Hello, world!</h1>'.match(/<.*>/)[0];
// "<h1>Hello, world!</h1>"
```
with the non-greedy version (the ? makes .* stop as soon as it can):
```
'<h1>Hello, world!</h1>'.match(/<.*?>/)[0];
// "<h1>"
```

Now adding the 'g' flag, so match returns every match:
```
'<h1>Hello, world!</h1>'.match(/<.*?>/g);
// ["<h1>", "</h1>"]
```

## Regex overview/cheat sheet

See https://github.com/zeeshanu/learn-regex
