### Compilation Flags

* When compiling a pattern string into a pattern object, you can modify the standard behavior of the pattern using compilation flags.

* Multiple compilation flags can be combined using the bitwise OR operator (|).
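As a quick illustration of combining flags, here is a minimal sketch. The sample text and the pattern are made up for this example; only `re.I`, `re.M` and the `|` operator come from the notes above.

```python
import re

# Sample text invented for this illustration.
text = "Apple pie\napple tart\nBanana split"

# re.I lets "apple" match "Apple"; re.M lets ^ match at the start of every line.
pattern = re.compile(r"^apple", flags=re.I | re.M)

matches = pattern.findall(text)
print(matches)  # ['Apple', 'apple']
```

With only one of the two flags, this pattern would find fewer matches: `re.I` alone finds just the first line, and `re.M` alone finds just the second.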

Here is a list of all the compilation flags:



| Syntax | Meaning |
| --- | --- |
| re.IGNORECASE or re.I | ignore case. |
| re.MULTILINE or re.M | make begin/end boundary matchers (^, $) consider each line. |
| re.DOTALL or re.S | make . match newline too. |
| re.UNICODE or re.U | make {\w, \W, \b, \B} follow Unicode rules. |
| re.LOCALE or re.L | make {\w, \W, \b, \B} follow locale. |
| re.ASCII or re.A | make {\w, \W, \b, \B} perform ASCII-only matching. |
| re.VERBOSE or re.X | allow comments in regex. |
| re.DEBUG | get information about the compilation pattern. |

Let's go through them one by one.

### 1. re.IGNORECASE or re.I
This flag makes a regex pattern case-insensitive.

Let's check out an example that finds all occurrences of the and The in the given text.

In [1]:
import re

from colorama import Back, Style


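def highlight_regex_matches(pattern, text, print_output=True):
	"""Highlight every match of the compiled pattern in text using ANSI colors."""
	output = text
	len_inc = 0  # number of characters added to output by earlier highlights
	for match in pattern.finditer(text):
		# Shift the match positions by the length of the escape codes inserted so far.
		start, end = match.start() + len_inc, match.end() + len_inc
		output = output[:start] + Back.YELLOW + Style.BRIGHT + output[start:end] + Style.RESET_ALL + output[end:]
		len_inc = len(output) - len(text)

	# Print the highlighted text, or return it when print_output is False.
	if print_output:
		print(output)
	else:
		return output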

In [2]:
txt = """
The best thing about regex is that it makes the task of string manipulation so easy.
"""

In [4]:
pattern = re.compile("the",flags=re.I)

In [5]:
pattern

re.compile(r'the', re.IGNORECASE|re.UNICODE)

In [6]:
highlight_regex_matches(pattern,txt)


[43m[1mThe[0m best thing about regex is that it makes [43m[1mthe[0m task of string manipulation so easy.



### 2. re.MULTILINE or re.M
This flag makes the begin/end boundary matchers (^, $) match at the start and end of each line of the given text, not only at the start and end of the whole string.
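To see the difference the flag makes, here is a small sketch that runs the same patterns with and without `re.M`. The three-line sample string is made up for this illustration.

```python
import re

# Sample text invented for this illustration.
text = "first line\nsecond line\nthird line"

# Without re.M, ^ matches only at the very start of the string
# and $ only at the very end.
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# With re.M, ^ also matches right after each newline
# and $ right before each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.M)
ends_multiline = re.findall(r"\w+$", text, flags=re.M)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second', 'third']
print(ends_default)      # ['line']
print(ends_multiline)    # ['line', 'line', 'line']
```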

Let's check out an example to find all lines starting with A.

In [34]:
txt = """
man was car crossing the road.
Suddenly, a car passed before him in a very high speed.
He was terrified
And shocked.
"""

In [35]:
pattern = re.compile(r"^A.+", flags=re.M)

In [36]:
highlight_regex_matches(pattern,txt)


man was car crossing the road.
Suddenly, a car passed before him in a very high speed.
He was terrified
[43m[1mAnd shocked.[0m



### 3. re.DOTALL or re.S
The . metacharacter matches any character except a newline. To make . match newlines as well, set this flag.
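A minimal sketch of that behavior, using a made-up sample string: the same pattern stops at the first newline by default and runs to the end of the string with `re.S`.

```python
import re

# Sample text invented for this illustration.
text = "start here\nmiddle\nend"

# By default . does not match "\n", so the match ends with the first line.
default_match = re.search(r"start.+", text).group()

# With re.S, . also consumes newlines, so the match runs to the end of the string.
dotall_match = re.search(r"start.+", text, flags=re.S).group()

print(repr(default_match))  # 'start here'
print(repr(dotall_match))   # 'start here\nmiddle\nend'
```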

Let's consider an example that matches all the text after (and including) the first occurrence of car.

In [41]:
pattern = re.compile("car.+",flags=re.S)

In [42]:
highlight_regex_matches(pattern,txt)


man was [43m[1mcar crossing the road.
Suddenly, a car passed before him in a very high speed.
He was terrified
And shocked.
[0m


### 4. re.UNICODE or re.U
With this flag, the pattern characters {\w, \W, \b, \B} depend on the Unicode character properties database.

re.UNICODE is the default for str patterns in Python 3, so setting it explicitly is redundant.

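Let's consider an example that works with Hindi text. Notice that the built-in re module does not treat the Devanagari vowel signs (combining marks) as word characters, so \w+ breaks the words apart, while the third-party regex module keeps each word whole.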

In [43]:
txt = "मुझे किताबें पढ़ना बहुत पसंद है।"

In [44]:
pattern = re.compile(r"\w+")

In [45]:
pattern.findall(txt)

['म', 'झ', 'क', 'त', 'ब', 'पढ', 'न', 'बह', 'त', 'पस', 'द', 'ह']

In [48]:
!pip install regex
import regex




In [49]:
pattern = regex.compile(r"\w+")

pattern.findall(txt)


['मुझे', 'किताबें', 'पढ़ना', 'बहुत', 'पसंद', 'है']
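The split words in the re output above can be traced to Unicode general categories. The sketch below uses only the standard library: it inspects the first word of the sample sentence, written with escape sequences so that the code points are unambiguous, and checks each character against `\w`.

```python
import re
import unicodedata

# The first word of the sample sentence, "मुझे", spelled out code point by code point.
word = "\u092e\u0941\u091d\u0947"

# Show each code point, its general category, and whether re's \w accepts it.
for ch in word:
    is_word_char = re.fullmatch(r"\w", ch) is not None
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_word_char)

categories = [unicodedata.category(ch) for ch in word]
pieces = re.findall(r"\w+", word)
print(categories)  # ['Lo', 'Mn', 'Lo', 'Mn']
print(pieces)      # the two consonants, without their vowel signs
```

The consonants are letters (category Lo) and match `\w`, while the vowel signs are nonspacing marks (category Mn) and do not, which is why re cuts the word at every vowel sign.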

### 5. re.LOCALE or re.L
A locale is a set of environment variables that defines the language, country, and character encoding settings (or any other special variant preferences) for your applications.

This flag makes the word patterns {\w, \W} and the boundary patterns {\b, \B} dependent on the current locale.

The use of this flag is discouraged in Python 3 because the locale mechanism is very unreliable: it only handles one “culture” at a time, and it only works with 8-bit locales. Since Python 3.6, re.LOCALE can be used only with bytes patterns. Unicode matching is already enabled by default in Python 3 for Unicode (str) patterns, and it is able to handle different locales/languages.
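Here is a small sketch of the bytes-only restriction, assuming Python 3.6 or later. The sample bytes string is made up, and the exact error message may differ between versions, so the code only records it.

```python
import re

# re.L is accepted for a bytes pattern.
bytes_pattern = re.compile(rb"\w+", flags=re.L)
words = bytes_pattern.findall(b"plain ascii words")
print(words)  # [b'plain', b'ascii', b'words']

# The same flag is rejected for a str pattern (Python 3.6+).
try:
    re.compile(r"\w+", flags=re.L)
    str_error = None
except ValueError as exc:
    str_error = str(exc)

print(str_error)
```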

### 6. re.ASCII or re.A
This flag makes the word patterns {\w, \W} and the boundary patterns {\b, \B} perform ASCII-only matching, i.e. only A-Z, a-z, 0-9 and the underscore are considered word characters.
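The Python documentation also lists \d, \D, \s and \S as affected by re.ASCII. A minimal sketch with digits, using a made-up string that mixes Devanagari and ASCII digits (written with escape sequences):

```python
import re

# Devanagari digits one, two, three, followed by the ASCII digits 123.
text = "\u0967\u0968\u0969 123"

# By default, \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# With re.A, \d matches only 0-9.
ascii_digits = re.findall(r"\d+", text, flags=re.A)

print(unicode_digits)  # both groups of digits
print(ascii_digits)    # ['123']
```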

Let us see an example below:

In [51]:
pattern = re.compile(r"\w")

In [53]:
chars = ''.join(chr(i) for i in range(256))

In [54]:
print(chars)

 	
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ


In [55]:
highlight_regex_matches(pattern,chars)

 	
 !"#$%&'()*+,-./[43m[1m0[0m[43m[1m1[0m[43m[1m2[0m[43m[1m3[0m[43m[1m4[0m[43m[1m5[0m[43m[1m6[0m[43m[1m7[0m[43m[1m8[0m[43m[1m9[0m:;<=>?@[43m[1mA[0m[43m[1mB[0m[43m[1mC[0m[43m[1mD[0m[43m[1mE[0m[43m[1mF[0m[43m[1mG[0m[43m[1mH[0m[43m[1mI[0m[43m[1mJ[0m[43m[1mK[0m[43m[1mL[0m[43m[1mM[0m[43m[1mN[0m[43m[1mO[0m[43m[1mP[0m[43m[1mQ[0m[43m[1mR[0m[43m[1mS[0m[43m[1mT[0m[43m[1mU[0m[43m[1mV[0m[43m[1mW[0m[43m[1mX[0m[43m[1mY[0m[43m[1mZ[0m[\]^[43m[1m_[0m`[43m[1ma[0m[43m[1mb[0m[43m[1mc[0m[43m[1md[0m[43m[1me[0m[43m[1mf[0m[43m[1mg[0m[43m[1mh[0m[43m[1mi[0m[43m[1mj[0m[43m[1mk[0m[43m[1ml[0m[43m[1mm[0m[43m[1mn[0m[43m[1mo[0m[43m[1mp[0m[43m[1mq[0m[43m[1mr[0m[43m[1ms[0m[43m[1mt[0m[43m[1mu[0m[43m[1mv[0m[43m[1mw[0m[43m[1mx[0m[43m[1my[0m[43m[1mz[0m{|}~ ¡¢£¤¥¦§¨©[43m[1mª[0m«¬

In [56]:
pattern = re.compile(r"\w", flags=re.A)

highlight_regex_matches(pattern,chars)

 	
 !"#$%&'()*+,-./[43m[1m0[0m[43m[1m1[0m[43m[1m2[0m[43m[1m3[0m[43m[1m4[0m[43m[1m5[0m[43m[1m6[0m[43m[1m7[0m[43m[1m8[0m[43m[1m9[0m:;<=>?@[43m[1mA[0m[43m[1mB[0m[43m[1mC[0m[43m[1mD[0m[43m[1mE[0m[43m[1mF[0m[43m[1mG[0m[43m[1mH[0m[43m[1mI[0m[43m[1mJ[0m[43m[1mK[0m[43m[1mL[0m[43m[1mM[0m[43m[1mN[0m[43m[1mO[0m[43m[1mP[0m[43m[1mQ[0m[43m[1mR[0m[43m[1mS[0m[43m[1mT[0m[43m[1mU[0m[43m[1mV[0m[43m[1mW[0m[43m[1mX[0m[43m[1mY[0m[43m[1mZ[0m[\]^[43m[1m_[0m`[43m[1ma[0m[43m[1mb[0m[43m[1mc[0m[43m[1md[0m[43m[1me[0m[43m[1mf[0m[43m[1mg[0m[43m[1mh[0m[43m[1mi[0m[43m[1mj[0m[43m[1mk[0m[43m[1ml[0m[43m[1mm[0m[43m[1mn[0m[43m[1mo[0m[43m[1mp[0m[43m[1mq[0m[43m[1mr[0m[43m[1ms[0m[43m[1mt[0m[43m[1mu[0m[43m[1mv[0m[43m[1mw[0m[43m[1mx[0m[43m[1my[0m[43m[1mz[0m{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹

### 7. re.VERBOSE or re.X

This flag changes the regex syntax to allow you to add annotations to a regex.

* Whitespace within the pattern is ignored, except when it is in a character class or preceded by an unescaped backslash.

* When a line contains a # that is neither in a character class nor preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.

In [59]:
txt = """this is a sample text q3142"""

In [60]:
pattern = re.compile(r"\w + # this is a comment", flags=re.X)

In [61]:
pattern

re.compile(r'\w + # this is a comment', re.UNICODE|re.VERBOSE)

In [62]:
pattern.findall(txt)

['this', 'is', 'a', 'sample', 'text', 'q3142']
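The flag is most useful when a pattern is spread over several lines with one comment per part. Here is a minimal sketch; the decimal-number pattern and the sample sentence are made up for this illustration.

```python
import re

# Each part of the pattern gets its own line and comment.
# The whitespace and the comments are ignored because of re.X.
decimal = re.compile(r"""
    \d+   # integer part
    \.    # literal decimal point
    \d*   # optional fractional digits
""", flags=re.X)

found = decimal.findall("pi is 3.14, and 42. is fine, but 7 is not")
print(found)  # ['3.14', '42.']
```

Without re.X, the same pattern would have to be written as the much denser `\d+\.\d*`.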

### 8. re.DEBUG

When set, this flag prints debugging information about the compiled pattern.
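A minimal sketch of the flag in use. The pattern and the sample text are made up, and the debug dump is printed at compile time in a format that varies between Python versions, so it is not reproduced here.

```python
import re

# Compiling with re.DEBUG prints a description of the parsed pattern to stdout.
pattern = re.compile(r"\d+", flags=re.DEBUG)

# The compiled pattern then behaves like any other pattern object.
numbers = pattern.findall("room 101, floor 7")
print(numbers)  # ['101', '7']
```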