# Text Processing
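Python's built-in string methods and the `re` module cover many common text-processing tasks. The examples below split a phrase into words, filter those words, and search them with regular expressions.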
## Splitting a Phrase into Words

In [17]:
phrase = 'A phrase containing some words'
print(phrase)
words = phrase.split(' ')
print(words)
print(type(words))
print(words[0])
print(words[-1])  # -1 means the last element

A phrase containing some words
['A', 'phrase', 'containing', 'some', 'words']
<class 'list'>
A
words
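
Splitting on a single space works for the tidy phrase above, but it leaves empty strings behind when the text has leading, trailing, or repeated whitespace. As a small sketch (the `messy` string is a made-up example, not from the text above), calling `split()` with no argument splits on any run of whitespace instead:

```python
# A made-up phrase with extra spaces and a tab, for illustration only
messy = '  A phrase   containing\tsome words '

# Splitting on one space keeps empty strings and does not split on the tab
print(messy.split(' '))

# Splitting with no argument treats any run of whitespace as one separator
print(messy.split())
```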


## Finding Words with Length > 4

In [4]:
for word in words:
    if len(word) > 4:
        print(word)

phrase
containing
words


## Doing Something Similar with a List Comprehension

In [5]:
[word for word in words if len(word) > 4]

['phrase', 'containing', 'words']

## Locating Text Between Parentheses
Let’s say you have some text between parentheses that you want to extract. You can use the string method `index` to find the position of each parenthesis, then slice between them:

In [1]:
line = 'Many pets (cats, dogs, fish) enjoy water'
left = line.index('(')
right = line.index(')')
line[left + 1 : right]

'cats, dogs, fish'
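
Note that `index` raises a `ValueError` when the character is not found. If the input might not contain parentheses, one possible approach is to use `find`, which returns -1 instead. The helper name `between_parens` below is hypothetical, and this is only a sketch that handles the first pair of parentheses:

```python
def between_parens(text):
    """Return the text inside the first '(' ... ')' pair, or None if absent."""
    left = text.find('(')
    # Look for the closing parenthesis only after the opening one
    right = text.find(')', left + 1)
    if left == -1 or right == -1:
        return None
    return text[left + 1 : right]

print(between_parens('Many pets (cats, dogs, fish) enjoy water'))
print(between_parens('No parentheses here'))
```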

## Regular Expressions
[Reference](https://docs.python.org/3/library/re.html)

Regular expressions help us find patterns in words. For example, words containing an “r”:

In [11]:
import re
r = re.compile('r')
[word for word in words if r.search(word)]

['phrase', 'words']

Or words containing at least two “n”s (the `.*` allows any characters between them):

In [10]:
r = re.compile('n.*n')
[word for word in words if r.search(word)]

['containing']

Or words containing an “n” or an “r”:

In [12]:
r = re.compile('[nr]')
[word for word in words if r.search(word)]

['phrase', 'containing', 'words']

Or words containing a vowel:

In [14]:
r = re.compile('[aeiou]', re.IGNORECASE)
[word for word in words if r.search(word)]

['A', 'phrase', 'containing', 'some', 'words']
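
The `re.IGNORECASE` flag is what lets the uppercase “A” match here. As a quick comparison (the variable names are chosen for illustration), the same pattern without the flag skips it:

```python
import re

words = ['A', 'phrase', 'containing', 'some', 'words']

# Without re.IGNORECASE, the character class matches lowercase vowels only
case_sensitive = re.compile('[aeiou]')
matches = [word for word in words if case_sensitive.search(word)]
print(matches)  # 'A' is no longer in the result
```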

Or words containing two vowels in a row:

In [15]:
r = re.compile('[aeiou]{2}', re.IGNORECASE)
[word for word in words if r.search(word)]

['containing']

Or words containing at least two vowels anywhere:

In [16]:
r = re.compile('([aeiou].*){2}', re.IGNORECASE)
[word for word in words if r.search(word)]

['phrase', 'containing', 'some']
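
The pattern above matches any word with two or more vowels, which is why “containing” (four vowels) appears in the result. To keep only words with exactly two vowels, one option is to count the matches with `findall` rather than encode the count in the pattern. This is a sketch of that alternative, not the only way to do it:

```python
import re

words = ['A', 'phrase', 'containing', 'some', 'words']

# findall returns every vowel found, so its length is the vowel count
vowel = re.compile('[aeiou]', re.IGNORECASE)
exactly_two = [word for word in words if len(vowel.findall(word)) == 2]
print(exactly_two)  # ['phrase', 'some']
```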