⁉️ A simple REGular EXpression tutorial in JavaScript
Switch branches/tags
Nothing to show
Clone or download
Latest commit bfe2c52 Sep 22, 2016
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore Initial commit Nov 12, 2013
README.md Update README.md Sep 22, 2016

README.md

learn-regex [Work-in-Progress]

A simple REGular EXpression tutorial in JavaScript

Regular Expression XKCD http://xkcd.com/208/

If you have ever wondered how search works, its all about finding patterns.
While a search engine will have sophisticated search algorithms at their core are searches for patterns in text.

I am yet to find a Regular Expression tutorial for complete beginners. So I'm writing one!
Note: this tutorial is specific to JavaScript.

What is a Regular Expression?

A regular expression is a pattern we search for in text.

Imagine you want to check if an email address is well correct when someone is filling in a form. One way of doing this would be to try sending an email to the address and waiting for a reply or "bounce" (error)... (obviously this is a terrible idea! sending email is time consuming) A far more efficient way of checking email validity is to use a regular expression that will tell us in a milisecond if the email matches a valid pattern.

email regex

Searching for Character(s) in a Block of Text

There are 30 bits of knowledge you need to grasp in order to use regular expressions effectively. Each one will only take a minute to learn.

Basic Symbols

At the most basic level we need /pattern/flags two forward slashes to show the beginning and end of the pattern we are searching for and after the last forward slash we have (optional) "flags" indicate how to search (see below for the other available flags).

For example the regular expression /th/g will find all occurences of th within a block of text. (the g flag means "global" match - i.e. find all)

var text = "The cat with the hat sat on the mat."
var pattern = /th/gi;
var match, matches = [];
while ( (match = pattern.exec(text)) ) {
    matches.push(match.index);
}
console.log(matches); // >> [ 0, 10, 13, 28 ]

Try: https://repl.it/MYU/2

. (period) Matches any single character. For example the regular expression c.t would match the strings cat, cut, c t, but not cart or clot (because there is more than one character between the c and the t)

var text = "Its raining cats and dogs";
var pattern = /c.t/;
var match = text.match(pattern);
console.log(match.index);       // >> 12

Try: https://repl.it/MYS/1

^ (caret or "circumflex") Matches the beginning of a line. For example, the regular expression ^Once will match the beginning of "Once upon a time" but would not match "I only saw her cry once..."

var text = "wake up its a beautiful morning!";
var pattern = /^wake/;
var match = text.match(pattern);
console.log(match.index);       // >> 0 

var text = "I'm wide awake";
var pattern = /^wake/;
var match = text.match(pattern);
console.log(match);             // >> null

Try: https://repl.it/MYS/3

$ (dollar) Matches the end of a line. The pattern: away$ would match the end of the string "I need to get away" but not the string "Up Up and away!" (because the away pattern is not at the end of the text/string/line)

var text = "this is the end";
var pattern = /end$/;
var match = text.match(pattern);
console.log(match.index);       // >> 12

var text = "is this the end?";
var pattern = /end$/;
var match = text.match(pattern);
console.log(match);             // >> null

Try: https://repl.it/MYS/2

* (asterisk) Matches zero or more occurences of the character immediately preceding. ma* means match any occurance of the characters ma followed by any other character so it will find matt or marmite but ignore mom because it does not contain ma.

var text = "The matrix is all around us...";
var pattern = /ma*/;
var match = text.match(pattern);
console.log(match.index); // >> 4

Try: https://repl.it/MbE

__ (backslash) The quoting/escaping character, use it to treat the next character as an ordinary character. For example: $ is used to match the dollar sign ($) rather than the end of a line. Similarly, the expression . is used to match the period character rather than any single character.

var text = "My Raspberry Pi Cost $45!";
var pattern = /\$/;
var match = text.match(pattern);
console.log(match.index);       // >> 21

var text = "Is this the end...?"
var pattern = /\./g;
var match, matches = [];
while ( (match = pattern.exec(text)) ) {
    matches.push(match.index);
}
console.log(matches); // >> [ 15, 16, 17 ]

Try: https://repl.it/MYY

[ ] (square brackets) Matches any one of the characters between the brackets For example, the regular expression r[aou]t matches rat, rot, and rut, but not ret.

var text = "The caged rat felt stuck in a rut.";
var pattern = /r[aou]t/g;
var match, matches = [];
while ( (match = pattern.exec(text)) ) {
    matches.push(match.index);
}
console.log(matches); // >> [ 10, 30 ]

Try: https://repl.it/MY2

An example of a range of characters we want to look for is: [0-9] this means means find any digit (its much easier than writing [0123456789])

var text = "The 49ers are an American football team based in San Francisco, California";
var pattern = /[0-9]/;
var match = text.match(pattern);
console.log(match.index);       // >> 4

var text = "The 49ers are an American football team based in San Francisco, California";
var pattern = /[0123456789]/;
var match = text.match(pattern);
console.log(match.index);       // >> 4

Try: https://repl.it/MbG

We can find all the numbers recursively in a block of text:

var text = "2486 runners started the race but only 1865 finished!";
var pattern = /[0-9]/g;
var match, matches = [];
while ( (match = pattern.exec(text)) ) {
    matches.push(match.index);
}
console.log(matches); // >> [ 0, 1, 2, 3, 39, 40, 41, 42 ]

Try: https://repl.it/MY4

| (verticle bar or "pipe") Allows us to serach for two conditions together. For example (him|her) matches the line "it belongs to him" and matches the line "it belongs to her" but does not match the line "it belongs to them."

var text = "2486 runners started the race but only 1865 finished!";
var pattern = /[0-9]/g;
var match, matches = [];
while ( (match = pattern.exec(text)) ) {
    matches.push(match.index);
}
console.log(matches); // >> [ 0, 1, 2, 3, 39, 40, 41, 42 ]

Flags Tell us How to Search

Flags are placed at the end of the pattern (after the last forward slash) and tell the regular expression parser how we want to search.

  • g : "global search" or "find all occurences"
  • i : "case insensitive" so find both CAT and cat
  • m : "multiline" will match your pattern over multiple lines

Special Characters

\d Matches a digit character in the basic Latin alphabet. Equivalent to [0-9]

\D Matches any character that is not a digit in the basic Latin alphabet. Equivalent to [^0-9]

Moar: http://linuxreviews.org/beginner/tao_of_regular_expressions/

Background

Reading

Video

Books

Bad Examples