# Split using RegEx

> Almost every language provides a split operation on strings. The `split` in the `re` module is more powerful because the delimiter can be a regex: the string is split at every match of the pattern.

### `split(string[, maxsplit])`

- Every pattern object has a `split()` method which splits the input string at all positions where a match is found.

- `maxsplit` is an optional argument (default `0`) that specifies the maximum number of splits. A value of `0` means there is no limit on the number of splits.

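- The matched text is not included in any of the substrings obtained after splitting, unless the pattern contains a capturing group, in which case the captured text is also returned in the list.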
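As a minimal sketch of the points above (the sample string and variable names are made up for illustration), the following shows the default unlimited split, a capped split with `maxsplit`, and that the matched delimiter is dropped from the results:

```python
import re

# Hypothetical sample data, not taken from the examples in this section.
pattern = re.compile(r",\s*")
data = "a, b,c,  d"

# maxsplit defaults to 0, which means "no limit".
all_parts = pattern.split(data)

# At most 2 splits; the remainder is kept whole as the last element.
two_splits = pattern.split(data, maxsplit=2)

print(all_parts)
print(two_splits)
```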

#### Example 1

Let us try to split a string to get individual lines in it.

In [3]:
import re

In [2]:
txt = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated."""

In [3]:
pattern = re.compile("\n")

In [4]:
pattern.split(txt)

['Beautiful is better than ugly.',
 'Explicit is better than implicit.',
 'Simple is better than complex.',
 'Complex is better than complicated.']

#### Example 2

Let us try one more example in which we want to get all the words in the given text.

In [5]:
pattern = re.compile(r"\W")  # raw string avoids an invalid-escape warning

In [6]:
pattern.split(txt)

['Beautiful',
 'is',
 'better',
 'than',
 'ugly',
 '',
 'Explicit',
 'is',
 'better',
 'than',
 'implicit',
 '',
 'Simple',
 'is',
 'better',
 'than',
 'complex',
 '',
 'Complex',
 'is',
 'better',
 'than',
 'complicated',
 '']

In [7]:
list(filter(lambda x: x != "", pattern.split(txt)))

['Beautiful',
 'is',
 'better',
 'than',
 'ugly',
 'Explicit',
 'is',
 'better',
 'than',
 'implicit',
 'Simple',
 'is',
 'better',
 'than',
 'complex',
 'Complex',
 'is',
 'better',
 'than',
 'complicated']
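The empty strings appear because `\W` matches one character at a time, so a `.` followed by `\n` produces two adjacent matches with nothing between them. One possible alternative, sketched here on a shortened sample text (an assumption for brevity), is to split on runs with `\W+` or to collect the words directly with `findall`:

```python
import re

sample = "Beautiful is better than ugly.\nExplicit is better than implicit."

# \W+ treats ".\n" as a single delimiter, so no empty string appears mid-list.
# The trailing "." still leaves one empty string at the end.
parts = re.split(r"\W+", sample)

# findall returns only the matches, so there is nothing to filter out.
words = re.findall(r"\w+", sample)

print(parts)
print(words)
```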

#### Example 3

What if we want only the first 3 words? We need to split only 3 times in this case, which can be done by setting `maxsplit` to 3. The last element of the result holds the unsplit remainder of the string.

In [8]:
pattern.split(txt, maxsplit=7)

['Beautiful',
 'is',
 'better',
 'than',
 'ugly',
 '',
 'Explicit',
 'is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.']

In [9]:
pattern.split(txt, maxsplit=3)

['Beautiful',
 'is',
 'better',
 'than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.']
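A capped split always keeps the unsplit remainder as its last element. To get just the first three words, one option (a sketch on a one-line sample, chosen here for brevity) is to slice that remainder off:

```python
import re

line = "Beautiful is better than ugly."
pattern = re.compile(r"\W")

# maxsplit=3 yields three words plus the untouched remainder.
parts = pattern.split(line, maxsplit=3)
first_three = parts[:3]

print(parts)
print(first_three)
```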

In [1]:
# Just practice from made easy

In [4]:
#Example 1
re.split(r'\.', 'Today is sunny. I want go to the park. I want to eat ice cream.')

['Today is sunny', ' I want go to the park', ' I want to eat ice cream', '']

In [5]:
# a capturing group keeps the split point (the delimiter) in the result
re.split(r'(\.)', 'Today is sunny. I want go to the park. I want to eat ice cream.')

['Today is sunny',
 '.',
 ' I want go to the park',
 '.',
 ' I want to eat ice cream',
 '.',
 '']

In [6]:
split = '.'
[i + split for i in re.split(r'\.', 'Today is sunny. I want go to the park. I want to eat ice cream.')]

['Today is sunny.',
 ' I want go to the park.',
 ' I want to eat ice cream.',
 '.']
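The last element above is a stray `'.'` because the split ends with an empty string, which then gets the delimiter appended. A possible workaround, assuming every sentence ends with a period, is to skip empty pieces before re-attaching it:

```python
import re

text = 'Today is sunny. I want go to the park. I want to eat ice cream.'

# Skip empty pieces so the trailing empty string does not become a lone '.'.
sentences = [part + '.' for part in re.split(r'\.', text) if part]

print(sentences)
```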

In [7]:
#Example 2:

string = '<p>My mother has <span style="color:blue">blue</span> eyes.</p>'


In [8]:
re.split(r'<\w+>', string)   # only <p> matches: \w+ cannot match "/" or the attribute in <span ...>

['', 'My mother has <span style="color:blue">blue</span> eyes.</p>']

In [9]:
re.split('<.+>', string)  # greedy: .+ runs to the last ">", so the whole string is one match

['', '']
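One way around the greediness is a lazy quantifier: `.+?` stops at the first `>` it can reach. A minimal sketch using the same string:

```python
import re

string = '<p>My mother has <span style="color:blue">blue</span> eyes.</p>'

# .+? matches as few characters as possible, so each tag is matched separately.
parts = re.split(r'<.+?>', string)

print(parts)
```

The tags at the very start and end of the string still leave an empty string on each side of the result.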

In [10]:
re.split("<[^<>]+>", string) #empty string problem

['', 'My mother has ', 'blue', ' eyes.', '']

In [11]:
re.split(',', ',happy, birthday,')  # a delimiter at the start or end leaves an empty string there

['', 'happy', ' birthday', '']

In [12]:
# list comprehensions
[i for i in re.split("<[^<>]+>", string) if i != '']

['My mother has ', 'blue', ' eyes.']

In [13]:
#Alternatives to split --
string = '<p>My mother has <span style="color:blue">blue</span> eyes.</p>'

In [14]:
re.findall('>([^<]+)<',string)  #findall

['My mother has ', 'blue', ' eyes.']

In [15]:
re.split(',', ',happy, birthday,')

['', 'happy', ' birthday', '']

In [16]:
string = ',happy, birthday,'

In [17]:
list(filter(None, string.split(',')))

['happy', ' birthday']

![](images/memes/meme18.png)