
# Regular Expressions in Python

A regular expression (RE or regex) is a sequence of characters that describes a text pattern.
The re module provides a set of functions and methods for common operations with regular expressions.

Required module:

In [2]:
import re

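The re module offers a set of functions for different tasks. The ones used in this notebook are:

* re.compile(): builds a reusable Regex object from a pattern string.
* search(): finds the first match in a string.
* findall(): returns every match in a string.
* sub(): replaces matches with another string.
* split(): splits a string wherever the pattern matches.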

In [None]:
# Display the output of every expression in a notebook cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Create a Regex object

Passing a pattern string to re.compile() returns a compiled pattern object, referred to in this notebook as a Regex object.

In [None]:
# RegEx object to match a 10-digit mobile number

mobileNum_RegEx = re.compile(r'\d\d\d\d\d\d\d\d\d\d')

In [None]:
type(mobileNum_RegEx)

In [None]:
mobileNum_RegEx

#### Note:

* Remember that escape characters in Python use the backslash (\).
* For example, value '\n' represents a single newline character, not a backslash followed by a lowercase n.
* You need to enter the escape character \\ to print a single backslash.
* However, by putting an r before the first quote of the string value,
you can mark the string as a raw string, which does not escape characters.
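
The difference between a normal string and a raw string can be checked directly. The following is a minimal sketch; the variable names are illustrative only.

```python
# Compare a normal string literal with a raw string literal.
normal = '\n'   # one character: a newline
raw = r'\n'     # two characters: a backslash followed by the letter n

print(len(normal))
print(len(raw))

# For a regex, the raw string r'\d' and the escaped string '\\d'
# contain exactly the same two characters.
same_pattern = (r'\d' == '\\d')
print(same_pattern)
```

Raw strings are the usual choice for regex patterns because the backslashes reach the re module unchanged.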

### Match RegEx Object

In [None]:
# Create a RegEx object for toll free number

mytext1 = 'Interested in Jio? Talk to us on	1860-893-3333'

# create a re pattern object
tollFreeNum_RegEx = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

### search() method
A Regex object’s search() method searches the string it is passed for any matches to the regex.


* The search() method will return None if the regex pattern is not found in the string.
* If the pattern is found, the search() method returns a Match object.
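
Because search() can return None, it is usually worth checking the result before using it. A minimal sketch, assuming a hypothetical helper named has_toll_free_number and made-up sample sentences:

```python
import re

toll_free = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

def has_toll_free_number(text):
    """Return True if text contains a number shaped like 1860-893-3333."""
    mo = toll_free.search(text)   # Match object, or None when nothing matches
    return mo is not None

found = has_toll_free_number('Talk to us on 1860-893-3333')
missing = has_toll_free_number('No number in this sentence')
print(found)
print(missing)
```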

In [None]:
tfn_no = tollFreeNum_RegEx.search(mytext1)

In [None]:
type(tfn_no)

In [None]:
print(tfn_no)

## Extract the text from this web page

https://www.jio.com/en-in/help-support/call-us#/

#### group() method
Match objects have a group() method that will return the actual matched text from the searched string.


In [None]:
# re.compile() -> Pattern object -> search() -> Match object -> group()

print('Identified number is: ' + tfn_no.group())
print('Identified number is: ' + tfn_no.group(0))

# The pattern used so far defines no groups, so group(1), group(2), ...
# are not available here; calling them raises "IndexError: no such group".

### Grouping with Parentheses
Adding parentheses () will create groups in the regex: (\d\d\d\d)-(\d\d\d)-(\d\d\d\d).


Then you can use the group() match object method to grab the matching text from just one group.

In [None]:
tollFreeNum_RegEx = re.compile(r'(\d\d\d\d)-((\d\d\d)-(\d\d\d\d))')

In [None]:
tfn_mo = tollFreeNum_RegEx.search(mytext1)
print(tfn_mo)

In [None]:
# Extract different parts of the matched text by passing a group number (1, 2, 3, ...).

# Passing 0 or nothing to the group() method returns the entire matched text.

print('Identified number is: ' + tfn_mo.group())
print('Identified number is: ' + tfn_mo.group(0))
print('Identified number is: ' + tfn_mo.group(1))
print('Identified number is: ' + tfn_mo.group(2))
print('Identified number is: ' + tfn_mo.group(3))
print('Identified number is: ' + tfn_mo.group(4))
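
The pattern in this cell nests one pair of parentheses inside another, which is why it has four groups rather than three. Groups are numbered by the position of their opening parenthesis, counted from the left. A small sketch that prints every group with its number (the sample sentence is shortened for illustration):

```python
import re

pattern = re.compile(r'(\d\d\d\d)-((\d\d\d)-(\d\d\d\d))')
mo = pattern.search('Talk to us on 1860-893-3333')

# pattern.groups holds the number of capturing groups in the pattern.
for number in range(pattern.groups + 1):
    print(number, mo.group(number))
```

Group 2 is the outer pair and therefore contains the text of groups 3 and 4 together.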

### groups() on Match Object

In [None]:
# To retrieve all the groups at once, use groups().

# groups() returns a tuple of values, so you can use multiple assignment to unpack it.
print(tfn_mo.groups())
maincode, x, subcode, number = tfn_mo.groups()

print(maincode, x, subcode, number)

## Matching parentheses ()

In [None]:
mytext2 = 'Interested in Jio? Talk to us on	(1860)-893-3333'

In [None]:
# Create a RegEx object for toll free number

tollFreeNum_RegEx = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

# create a Match Object using the search() method on the RegEx Object

tfn_mo = tollFreeNum_RegEx.search(mytext2)
print(tfn_mo)


In [None]:
# Create a RegEx object for toll free number

tollFreeNum_RegEx =  re.compile(r'(\d\d\d\d)-(\d\d\d)-(\d\d\d\d)')

# create a Match Object using the search() method on the RegEx Object

tfn_mo = tollFreeNum_RegEx.search(mytext2)

print(tfn_mo)

### Use Escape character
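Parentheses have a special meaning in a regex (they create groups). To match a literal ( or ), escape it with a backslash: \( and \).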

In [None]:
# Create a RegEx object for toll free number

tollFreeNum_RegEx =  re.compile(r'\(\d\d\d\d\)-\d\d\d-\d\d\d\d')

# create a Match Object using the search() method on the RegEx Object

tfn_mo = tollFreeNum_RegEx.search(mytext2)

print(tfn_mo)

## Matching Multiple Groups with the Pipe
The | character is called a pipe. You can use it anywhere you want to match one of many expressions.



When both patterns occur in the searched string, the first occurrence of matching text will be returned as the Match object.

In [None]:
mytext3 = '''Our experts are available for your assistance 24x7 (Monday - Sunday)

Interested in Jio? Talk to us on 1860-893-3333
For recharge plans, data balance, validity, recharge confirmation & offers	1991
For Queries	199
For Complaints	198
Calling from Non Jio number	1800-889-9999'''

In [None]:
tollFreeNum_RegEx = re.compile(r'(\d\d\d\d)|(\d\d\d\d)-(\d\d\d)-(\d\d\d\d)')

In [None]:
tfn_mo = tollFreeNum_RegEx.search(mytext3)
print('Identified number is: ' + tfn_mo.group())
print(tfn_mo.groups())
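
The order of the alternatives matters: at any position, the regex engine tries them from left to right and stops at the first one that succeeds. A minimal sketch comparing the two orders on a shortened sample sentence (the variable names are illustrative):

```python
import re

text = 'Talk to us on 1860-893-3333'

# Short alternative first: it succeeds on '1860' and the long one is never tried.
short_first = re.compile(r'\d\d\d\d|\d\d\d\d-\d\d\d-\d\d\d\d')

# Long alternative first: the full number is matched when it is present.
long_first = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d|\d\d\d\d')

short_match = short_first.search(text).group()
long_match = long_first.search(text).group()
print(short_match)
print(long_match)
```

Putting the longer, more specific alternative first is a common way to avoid partial matches like the one in the cell above.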

In [None]:
moviename_RegEx = re.compile(r'(Iron|Bat|Ant|He) Man')

movie_mo = moviename_RegEx.search('Bat Man 2 is a 2010 American superhero film')

print(movie_mo.group(1)) # returns the matched text from the group inside parentheses
print(movie_mo.group(0))

**Note:** To find & match pipe (|) symbol use \|.

## Solve the below question

In [None]:
mytext4 = '''For support on International Roaming 
(accessible only when roaming abroad)	
+91-7018899999 (charges applicable)'''

In [None]:
mobno = re.compile(r'(\d\d)-(\d\d\d\d\d\d\d\d\d\d)')

In [None]:
mobilnum = mobno.search(mytext4)
print(mobilnum)
print(mobilnum.groups())

In [None]:
len(mytext4)
mytext4[78:91]

## Optional Matching with the Question Mark
Sometimes there is a pattern that you want to match only optionally.


The `?` character marks the group that precedes it as optional: the regex matches whether or not that group is present in the text.

In [None]:
mytext4 = '''For support on International Roaming 
(accessible only when roaming abroad)	
+91-7018899999 (charges applicable)'''

In [None]:
mobilenum_RegEx = re.compile(r'(\d\d)?(-)?(\d\d\d\d\d\d\d\d\d\d)')

In [None]:
mo = mobilenum_RegEx.search(mytext4)
print(mo)
print(mo.group())
print(mo.groups())

In [None]:
mytext4 = '''For support on International Roaming 
(accessible only when roaming abroad)	
7018899999 (charges applicable)'''  # the "+91-" prefix has been removed from this text

In [None]:
mobilenum_RegEx = re.compile(r'(\d\d)?(-)?(\d\d\d\d\d\d\d\d\d\d)')
mo3 = mobilenum_RegEx.search(mytext4)

mo3.group()
mo3.groups()

**Note:** To use actual question mark (?), use \?.

## Matching Zero or More with the Star
The * (star or asterisk) means match zero or more: the group that precedes the star can occur any number of times in the text. It can be completely absent or repeated over and over again.

In [5]:
mytext5 = '''Our experts are available for your assistance 24x7 (Monday - Sunday)

Interested in Jio? Talk to us on	1860-893-3333
For recharge plans, data balance, validity, recharge confirmation & offers	1991
For Queries	199
For Complaints	198
Calling from Non Jio number	1800-889-9999
Tele-verification to activate both HD voice & data services	1977
For support on International Roaming (accessible only 
when roaming abroad)	+917018899999 (charges applicable)'''

In [4]:
number_RegEx = re.compile(r'\d\d\d')

### findall() in regex
In addition to the search() method, Regex objects also have a findall() method.

While search() will return a Match object of the first matched text in the searched string, the findall() method will return the strings of every match in the searched string.

In [6]:
print(number_RegEx.findall(mytext5))

['186', '893', '333', '199', '199', '198', '180', '889', '999', '197', '917', '018', '899', '999']


In [7]:
# use * for zero or more matches
number_RegEx = re.compile(r'(\d\d\d(\d)*)')
print(number_RegEx.findall(mytext5))

[('1860', '0'), ('893', ''), ('3333', '3'), ('1991', '1'), ('199', ''), ('198', ''), ('1800', '0'), ('889', ''), ('9999', '9'), ('1977', '7'), ('917018899999', '9')]
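
The output above is a list of tuples because the pattern contains groups: when a pattern has groups, findall() returns one tuple per match with the text of each group, and an empty string for a group that did not take part in the match. The sketch below compares this with two ways of getting plain strings back. The non-capturing group (?:...) is not used elsewhere in this notebook and is shown here only as an option; the sample text is made up.

```python
import re

text = 'Call 1860-893-3333 or 199'

# Two capturing groups: each result is a tuple (outer group, inner group).
with_groups = re.findall(r'(\d\d\d(\d)*)', text)

# No groups at all: each result is the matched string.
no_groups = re.findall(r'\d\d\d\d*', text)

# (?:...) groups without capturing, so the results are strings as well.
non_capturing = re.findall(r'\d\d\d(?:\d)*', text)

print(with_groups)
print(no_groups)
print(non_capturing)
```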


Note: If you need to match an actual star character, prefix the star in the regular expression with a backslash, \*.

### Matching One or More with the Plus

In [8]:
# use + for one or more matches
number_RegEx = re.compile(r'(\d\d\d(\d)+)')

print(number_RegEx.findall(mytext5))

[('1860', '0'), ('3333', '3'), ('1991', '1'), ('1800', '0'), ('9999', '9'), ('1977', '7'), ('917018899999', '9')]


Note: If you need to match an actual plus sign character, prefix the plus sign with a backslash to escape it: \+.

## Matching Specific Repetitions with Curly Brackets

In [10]:
number_RegEx = re.compile(r'(\d\d\d(\d){1})')

# the above pattern is same as \d\d\d\d

print(number_RegEx.findall(mytext5))

[('1860', '0'), ('3333', '3'), ('1991', '1'), ('1800', '0'), ('9999', '9'), ('1977', '7'), ('9170', '0'), ('1889', '9'), ('9999', '9')]


In [14]:
number_RegEx = re.compile(r'(\d\d\d(\d){2})')

# the above pattern is the same as \d\d\d\d\d (3 + 2 = 5 digits)

print(number_RegEx.findall(mytext5))

[('91701', '1'), ('88999', '9')]


In [15]:
number_RegEx = re.compile(r'(\d\d\d(\d){3,})')

# the above pattern matches at least 6 digits (3 + 3 or more), with no upper limit

print(number_RegEx.findall(mytext5))

[('917018899999', '9')]


In [26]:
number_RegEx = re.compile(r'(\d\d\d(\d){6,9})')

# the above pattern matches at least 9 and at most 12 digits (3 + 6 up to 3 + 9)

print(number_RegEx.findall(mytext5))

[('917018899999', '9')]
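
The three forms of the curly-bracket quantifier used above are {n} (exactly n), {n,} (n or more) and {n,m} (between n and m). The following sketch applies them directly to \d, without groups, on a short made-up string so the results are plain strings:

```python
import re

text = '199 1991 917018899999'

# {4}: exactly four digits, the same as writing \d\d\d\d.
exactly_four = re.findall(r'\d{4}', text)
written_out = re.findall(r'\d\d\d\d', text)

# {3,4}: three or four digits; the regex takes four whenever it can.
three_or_four = re.findall(r'\d{3,4}', text)

# {6,}: six or more digits, with no upper limit.
six_or_more = re.findall(r'\d{6,}', text)

print(exactly_four)
print(three_or_four)
print(six_or_more)
```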


## Character Classes

| Character class | Description |
| --- | --- |
| \d | Any numeric digit from 0 to 9. |
| \D | Any character that is not a numeric digit from 0 to 9. |
| \w | Any letter, numeric digit, or the underscore character. |
| \W | Any character that is not a letter, numeric digit, or the underscore character. |
| \s | Any space, tab, or newline character. (Think of this as matching “space” characters.) |
| \S | Any character that is not a space, tab, or newline. |
| \b | Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. |


* Character classes are nice for shortening regular expressions.

\d is shorthand for the regular expression (0|1|2|3|4|5|6|7|8|9).

In [27]:
print(mytext5)

Our experts are available for your assistance 24x7 (Monday - Sunday)

Interested in Jio? Talk to us on	1860-893-3333
For recharge plans, data balance, validity, recharge confirmation & offers	1991
For Queries	199
For Complaints	198
Calling from Non Jio number	1800-889-9999
Tele-verification to activate both HD voice & data services	1977
For support on International Roaming (accessible only 
when roaming abroad)	+917018899999 (charges applicable)


In [28]:
myregex = re.compile(r'\d+\s\w+')
# this is same as
# (0|1|2|3|4|5|6|7|8|9)+(space/tab space/new line)(a-z/A-Z/0-9/_)+

In [29]:
print(myregex.findall(mytext5))

['3333\nFor', '1991\nFor', '199\nFor', '198\nCalling', '9999\nTele', '1977\nFor']


## Match a set of characters using square brackets []

In [34]:
# [] used for building custom patterns
myregex = re.compile(r'[a-zA-Z0-9]{10}')
print(myregex.findall(mytext5))

['assistance', 'Interested', 'confirmati', 'Complaints', 'verificati', 'Internatio', 'accessible', '9170188999', 'applicable']


## To-Do tasks
* Match and identify PAN numbers
* Build a pattern to match Email Id
* Build a pattern to match car number plate

In [35]:
myregex = re.compile(r'[a-zA-Z0-9\s]{10}')
print(myregex.findall(mytext5))

['Our expert', 's are avai', 'lable for ', 'your assis', 'tance 24x7', '\n\nInterest', ' Talk to u', '3333\nFor r', 'echarge pl', ' data bala', ' recharge ', 'confirmati', ' offers\t19', '91\nFor Que', 'ries\t199\nF', 'or Complai', 'nts\t198\nCa', 'lling from', ' Non Jio n', 'umber\t1800', 'verificati', 'on to acti', 'vate both ', ' data serv', 'ices\t1977\n', 'For suppor', 't on Inter', 'national R', 'accessible', ' only \nwhe', 'n roaming ', '9170188999', 'charges ap']


In [36]:
myregex = re.compile(r'[a-zA-Z0-9\s]+')
print(myregex.findall(mytext5))

['Our experts are available for your assistance 24x7 ', 'Monday ', ' Sunday', '\n\nInterested in Jio', ' Talk to us on\t1860', '893', '3333\nFor recharge plans', ' data balance', ' validity', ' recharge confirmation ', ' offers\t1991\nFor Queries\t199\nFor Complaints\t198\nCalling from Non Jio number\t1800', '889', '9999\nTele', 'verification to activate both HD voice ', ' data services\t1977\nFor support on International Roaming ', 'accessible only \nwhen roaming abroad', '\t', '917018899999 ', 'charges applicable']


In [37]:
myregex = re.compile(r'[a-zA-Z0-9\s()]+')
print(myregex.findall(mytext5))

['Our experts are available for your assistance 24x7 (Monday ', ' Sunday)\n\nInterested in Jio', ' Talk to us on\t1860', '893', '3333\nFor recharge plans', ' data balance', ' validity', ' recharge confirmation ', ' offers\t1991\nFor Queries\t199\nFor Complaints\t198\nCalling from Non Jio number\t1800', '889', '9999\nTele', 'verification to activate both HD voice ', ' data services\t1977\nFor support on International Roaming (accessible only \nwhen roaming abroad)\t', '917018899999 (charges applicable)']


In [38]:
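# Caution: inside [], ")-?" is read as a range from ")" to "?", which also
# covers * + , - . / : ; < = > and the digits, not only the characters ) - ?
myregex = re.compile(r'[a-zA-Z0-9\s()-?]+')
print(myregex.findall(mytext5))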

['Our experts are available for your assistance 24x7 (Monday - Sunday)\n\nInterested in Jio? Talk to us on\t1860-893-3333\nFor recharge plans, data balance, validity, recharge confirmation ', ' offers\t1991\nFor Queries\t199\nFor Complaints\t198\nCalling from Non Jio number\t1800-889-9999\nTele-verification to activate both HD voice ', ' data services\t1977\nFor support on International Roaming (accessible only \nwhen roaming abroad)\t+917018899999 (charges applicable)']


In [40]:
myregex = re.compile(r'[a-zA-Z0-9\s()-?,]+')  # adding ',' changes nothing: the ")-?" range already includes it
print(myregex.findall(mytext5))

['Our experts are available for your assistance 24x7 (Monday - Sunday)\n\nInterested in Jio? Talk to us on\t1860-893-3333\nFor recharge plans, data balance, validity, recharge confirmation ', ' offers\t1991\nFor Queries\t199\nFor Complaints\t198\nCalling from Non Jio number\t1800-889-9999\nTele-verification to activate both HD voice ', ' data services\t1977\nFor support on International Roaming (accessible only \nwhen roaming abroad)\t+917018899999 (charges applicable)']
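
The two outputs above are identical because a hyphen between two characters inside square brackets defines a range. In these patterns, ")-?" covers every character from ")" to "?", which already includes the comma and the plus sign. A minimal sketch of the pitfall and two ways to ask for a literal hyphen (the sample string is made up):

```python
import re

sample = 'a+b,c-d'

# ")-?" is a range, so '+', ',' and '-' all match.
accidental_range = re.findall(r'[)-?]', sample)

# A hyphen placed last in the set is taken literally.
hyphen_last = re.findall(r'[()?-]', sample)

# An escaped hyphen is taken literally in any position.
hyphen_escaped = re.findall(r'[()\-?]', sample)

print(accidental_range)
print(hyphen_last)
print(hyphen_escaped)
```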


In [41]:
myregex = re.compile(r'[a-zA-Z]{7,}')
print(myregex.findall(mytext5))

['experts', 'available', 'assistance', 'Interested', 'recharge', 'balance', 'validity', 'recharge', 'confirmation', 'Queries', 'Complaints', 'Calling', 'verification', 'activate', 'services', 'support', 'International', 'Roaming', 'accessible', 'roaming', 'charges', 'applicable']


In [42]:
# exactly 5 letter complete words
myregex = re.compile(r'\b[a-zA-Z]{5}\b')
print(myregex.findall(mytext5))

# exactly 6 letter complete words
myregex = re.compile(r'\b[a-zA-Z]{6}\b')
print(myregex.findall(mytext5))

# exactly 7 letter complete words
myregex = re.compile(r'\b[a-zA-Z]{7}\b')
print(myregex.findall(mytext5))

# exactly 10 letter complete words
myregex = re.compile(r'\b[a-zA-Z]{10}\b')
print(myregex.findall(mytext5))

['plans', 'voice']
['Monday', 'Sunday', 'offers', 'number', 'abroad']
['experts', 'balance', 'Queries', 'Calling', 'support', 'Roaming', 'roaming', 'charges']
['assistance', 'Interested', 'Complaints', 'accessible', 'applicable']


In [43]:
myregex = re.compile(r'[a-zA-Z0-9\s]')  # raw string, so \s reaches the regex engine unchanged
print(myregex.findall(mytext5))

['O', 'u', 'r', ' ', 'e', 'x', 'p', 'e', 'r', 't', 's', ' ', 'a', 'r', 'e', ' ', 'a', 'v', 'a', 'i', 'l', 'a', 'b', 'l', 'e', ' ', 'f', 'o', 'r', ' ', 'y', 'o', 'u', 'r', ' ', 'a', 's', 's', 'i', 's', 't', 'a', 'n', 'c', 'e', ' ', '2', '4', 'x', '7', ' ', 'M', 'o', 'n', 'd', 'a', 'y', ' ', ' ', 'S', 'u', 'n', 'd', 'a', 'y', '\n', '\n', 'I', 'n', 't', 'e', 'r', 'e', 's', 't', 'e', 'd', ' ', 'i', 'n', ' ', 'J', 'i', 'o', ' ', 'T', 'a', 'l', 'k', ' ', 't', 'o', ' ', 'u', 's', ' ', 'o', 'n', '\t', '1', '8', '6', '0', '8', '9', '3', '3', '3', '3', '3', '\n', 'F', 'o', 'r', ' ', 'r', 'e', 'c', 'h', 'a', 'r', 'g', 'e', ' ', 'p', 'l', 'a', 'n', 's', ' ', 'd', 'a', 't', 'a', ' ', 'b', 'a', 'l', 'a', 'n', 'c', 'e', ' ', 'v', 'a', 'l', 'i', 'd', 'i', 't', 'y', ' ', 'r', 'e', 'c', 'h', 'a', 'r', 'g', 'e', ' ', 'c', 'o', 'n', 'f', 'i', 'r', 'm', 'a', 't', 'i', 'o', 'n', ' ', ' ', 'o', 'f', 'f', 'e', 'r', 's', '\t', '1', '9', '9', '1', '\n', 'F', 'o', 'r', ' ', 'Q', 'u', 'e', 'r', 'i', 'e', 's', '\t

In [44]:
myregex = re.compile(r'[\S]+')
print(myregex.findall(mytext5))


['Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', '24x7', '(Monday', '-', 'Sunday)', 'Interested', 'in', 'Jio?', 'Talk', 'to', 'us', 'on', '1860-893-3333', 'For', 'recharge', 'plans,', 'data', 'balance,', 'validity,', 'recharge', 'confirmation', '&', 'offers', '1991', 'For', 'Queries', '199', 'For', 'Complaints', '198', 'Calling', 'from', 'Non', 'Jio', 'number', '1800-889-9999', 'Tele-verification', 'to', 'activate', 'both', 'HD', 'voice', '&', 'data', 'services', '1977', 'For', 'support', 'on', 'International', 'Roaming', '(accessible', 'only', 'when', 'roaming', 'abroad)', '+917018899999', '(charges', 'applicable)']


In [45]:
myregex = re.compile(r'[\w]+')
print(myregex.findall(mytext5))

['Our', 'experts', 'are', 'available', 'for', 'your', 'assistance', '24x7', 'Monday', 'Sunday', 'Interested', 'in', 'Jio', 'Talk', 'to', 'us', 'on', '1860', '893', '3333', 'For', 'recharge', 'plans', 'data', 'balance', 'validity', 'recharge', 'confirmation', 'offers', '1991', 'For', 'Queries', '199', 'For', 'Complaints', '198', 'Calling', 'from', 'Non', 'Jio', 'number', '1800', '889', '9999', 'Tele', 'verification', 'to', 'activate', 'both', 'HD', 'voice', 'data', 'services', '1977', 'For', 'support', 'on', 'International', 'Roaming', 'accessible', 'only', 'when', 'roaming', 'abroad', '917018899999', 'charges', 'applicable']


In [46]:
myregex = re.compile('[a-z]+')
print(myregex.findall(mytext5))


['ur', 'experts', 'are', 'available', 'for', 'your', 'assistance', 'x', 'onday', 'unday', 'nterested', 'in', 'io', 'alk', 'to', 'us', 'on', 'or', 'recharge', 'plans', 'data', 'balance', 'validity', 'recharge', 'confirmation', 'offers', 'or', 'ueries', 'or', 'omplaints', 'alling', 'from', 'on', 'io', 'number', 'ele', 'verification', 'to', 'activate', 'both', 'voice', 'data', 'services', 'or', 'support', 'on', 'nternational', 'oaming', 'accessible', 'only', 'when', 'roaming', 'abroad', 'charges', 'applicable']


In [47]:
# [^...] matches any character EXCEPT those listed in the set
myregex = re.compile(r'[^\w]+')
print(myregex.findall(mytext5))

print('')

# only the first ^ negates the set; a second ^ would be a literal caret
myregex = re.compile(r'[^\w\s]+')
print(myregex.findall(mytext5))

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' (', ' - ', ')\n\n', ' ', ' ', '? ', ' ', ' ', ' ', '\t', '-', '-', '\n', ' ', ' ', ', ', ' ', ', ', ', ', ' ', ' & ', '\t', '\n', ' ', '\t', '\n', ' ', '\t', '\n', ' ', ' ', ' ', ' ', '\t', '-', '-', '\n', '-', ' ', ' ', ' ', ' ', ' ', ' & ', ' ', '\t', '\n', ' ', ' ', ' ', ' ', ' (', ' ', ' \n', ' ', ' ', ')\t+', ' (', ' ', ')']

['(', '-', ')', '?', '-', '-', ',', ',', ',', '&', '-', '-', '-', '&', '(', ')', '+', '(', ')']


## ^ (Beginning) & $ (End) of a String

A caret (^) at the start of a regex indicates that the match must occur at the beginning of the searched text.

A dollar sign ($) at the end of a regex indicates that the searched text must end with the pattern.

In [48]:
myregex = re.compile(r'^\w+')
mo = myregex.search(mytext5)
mo.group()

'Our'

In [49]:
myregex = re.compile(r'\w+\W+$')
mo = myregex.search(mytext5)
mo.group()

'applicable)'
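
Used together, ^ and $ require the whole string to fit the pattern, which is handy for validating input. A minimal sketch, assuming a hypothetical helper named is_ten_digit_number:

```python
import re

ten_digits = re.compile(r'^\d{10}$')

def is_ten_digit_number(text):
    """Return True if text consists of exactly ten digits and nothing else."""
    return ten_digits.search(text) is not None

plain = is_ten_digit_number('7018899999')
with_prefix = is_ten_digit_number('+917018899999')
with_space = is_ten_digit_number('70188 99999')
print(plain, with_prefix, with_space)
```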

### Wildcard Character
The . (or dot) character in a regular expression is called a wildcard and will match any character except for a newline.

In [50]:
dotRegex = re.compile(r'.at')
dotRegex.findall(mytext5)

['dat', 'mat', 'cat', 'vat', 'dat', 'nat']

In [51]:
dotRegex = re.compile(r'..at')
dotRegex.findall(mytext5)

dotRegex = re.compile(r'.at.')
dotRegex.findall(mytext5)

dotRegex = re.compile(r'.on')
dotRegex.findall(mytext5)

['Mon', ' on', 'con', 'ion', 'Non', 'ion', ' on', 'ion', ' on']
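
Since the dot does not match a newline by default, a pattern such as .* stops at the end of the first line. The re.DOTALL flag, which is not used elsewhere in this notebook, makes the dot match newlines as well. A short sketch on a made-up two-line string:

```python
import re

text = 'first line\nsecond line'

# Default behaviour: the dot stops at the newline.
default_dot = re.search(r'.*', text).group()

# With re.DOTALL the dot also matches the newline character.
dotall_dot = re.search(r'.*', text, re.DOTALL).group()

print(repr(default_dot))
print(repr(dotall_dot))
```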

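## sub() method
A Regex object's sub() method returns a new string in which every match of the pattern is replaced with the given replacement text.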

In [52]:
mytext5

'Our experts are available for your assistance 24x7 (Monday - Sunday)\n\nInterested in Jio? Talk to us on\t1860-893-3333\nFor recharge plans, data balance, validity, recharge confirmation & offers\t1991\nFor Queries\t199\nFor Complaints\t198\nCalling from Non Jio number\t1800-889-9999\nTele-verification to activate both HD voice & data services\t1977\nFor support on International Roaming (accessible only \nwhen roaming abroad)\t+917018899999 (charges applicable)'

In [53]:
subreg = re.compile(r'\t|\n|\?')
subreg.sub('----',mytext5)

'Our experts are available for your assistance 24x7 (Monday - Sunday)--------Interested in Jio---- Talk to us on----1860-893-3333----For recharge plans, data balance, validity, recharge confirmation & offers----1991----For Queries----199----For Complaints----198----Calling from Non Jio number----1800-889-9999----Tele-verification to activate both HD voice & data services----1977----For support on International Roaming (accessible only ----when roaming abroad)----+917018899999 (charges applicable)'

In [54]:
subreg = re.compile(r'\d')
subreg.sub('*',mytext5)

'Our experts are available for your assistance **x* (Monday - Sunday)\n\nInterested in Jio? Talk to us on\t****-***-****\nFor recharge plans, data balance, validity, recharge confirmation & offers\t****\nFor Queries\t***\nFor Complaints\t***\nCalling from Non Jio number\t****-***-****\nTele-verification to activate both HD voice & data services\t****\nFor support on International Roaming (accessible only \nwhen roaming abroad)\t+************ (charges applicable)'

In [55]:
subreg = re.compile(r'[a-zA-Z\s]')
subreg.sub('*',mytext5)

'**********************************************24*7*(*******-*******)*******************?***************1860-893-3333*******************,*************,*********,***********************&********1991*************199****************198*****************************1800-889-9999*****-***************************************&***************1977**************************************(************************************)*+917018899999*(******************)'

In [56]:
subreg = re.compile(r'[^\d]')
subreg.sub('*',mytext5)

'**********************************************24*7*****************************************************1860*893*3333****************************************************************************1991*************199****************198*****************************1800*889*9999*************************************************************1977******************************************************************************917018899999*********************'
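
The replacement passed to sub() can also reuse parts of the match: \1, \2 and so on refer to the text captured by the corresponding group. The sketch below hides only the middle block of a toll-free number; the masking format is an arbitrary choice for illustration.

```python
import re

text = 'Talk to us on 1860-893-3333'

# Capture the first and last blocks of digits; the middle block is not captured.
masker = re.compile(r'(\d{4})-\d{3}-(\d{4})')

# \1 and \2 in the replacement stand for the two captured groups.
masked = masker.sub(r'\1-***-\2', text)
print(masked)
```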

## split() method
re.split() splits a string at every match of the pattern and returns the pieces as a list.

In [57]:
subreg = re.compile(r'\t|\n|\?')
re.split(subreg,mytext5)

['Our experts are available for your assistance 24x7 (Monday - Sunday)',
 '',
 'Interested in Jio',
 ' Talk to us on',
 '1860-893-3333',
 'For recharge plans, data balance, validity, recharge confirmation & offers',
 '1991',
 'For Queries',
 '199',
 'For Complaints',
 '198',
 'Calling from Non Jio number',
 '1800-889-9999',
 'Tele-verification to activate both HD voice & data services',
 '1977',
 'For support on International Roaming (accessible only ',
 'when roaming abroad)',
 '+917018899999 (charges applicable)']

In [58]:
subRegEx = re.compile(r' ')
list1 = re.split(subRegEx, mytext5)

subRegEx = re.compile(r'\s')
list2 = re.split(subRegEx, mytext5)

In [62]:
print(len(list1) == len(list2))
print(len(list1)); len(list2)

False
51


67

In [63]:
subRegEx = re.compile(r'\d+')
re.split(subRegEx, mytext5)

['Our experts are available for your assistance ',
 'x',
 ' (Monday - Sunday)\n\nInterested in Jio? Talk to us on\t',
 '-',
 '-',
 '\nFor recharge plans, data balance, validity, recharge confirmation & offers\t',
 '\nFor Queries\t',
 '\nFor Complaints\t',
 '\nCalling from Non Jio number\t',
 '-',
 '-',
 '\nTele-verification to activate both HD voice & data services\t',
 '\nFor support on International Roaming (accessible only \nwhen roaming abroad)\t+',
 ' (charges applicable)']
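
Two variations on split() are sometimes useful: wrapping the pattern in a capturing group keeps the separators in the result, and the maxsplit argument limits the number of splits. A small sketch on a made-up string:

```python
import re

text = 'a1b22c'

# Plain split: the digits are dropped.
pieces = re.split(r'\d+', text)

# Capturing group: the matched digits are kept in the result list.
pieces_with_separators = re.split(r'(\d+)', text)

# maxsplit=1: split only at the first match.
first_split_only = re.split(r'\d+', text, maxsplit=1)

print(pieces)
print(pieces_with_separators)
print(first_split_only)
```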

In [64]:
myregex = re.compile(r'[0-9]{10}')
print(myregex.findall(mytext5))

['9170188999']
