# Regular Expression for Pattern Searching

---

A regular expression (RegEx) is a sequence of characters that specifies a search pattern in text. Such patterns are typically used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques were developed in theoretical computer science and formal language theory.

https://en.wikipedia.org/wiki/Regular_expression

In [1]:
# Import Python's built-in regular expression module
import re

In [2]:
# Sample sentence
kalimat = "Halo, Saya sedang belajar bahasa pemrograman Python. Python adalah bahasa pemrograman yang sangat mudah dipelajari."
print(kalimat)

Halo, Saya sedang belajar bahasa pemrograman Python. Python adalah bahasa pemrograman yang sangat mudah dipelajari.


## Split a sentence using the built-in string method

```python
# Create a separator, then apply the .split() method
separator = ' '
kalimat.split(separator)
```
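The choice of separator affects the result. The following is a minimal sketch, using a made-up string `sample` (not part of the original notebook), that compares `.split(' ')` with `.split()` called without an argument.

```python
# Hypothetical sample with two spaces after the comma
sample = "Halo,  Saya sedang belajar Python."

# An explicit separator keeps an empty string between repeated separators
by_space = sample.split(' ')

# With no argument, any run of whitespace counts as a single separator
by_whitespace = sample.split()

print(by_space)
print(by_whitespace)
```

Note that neither form removes punctuation: `Halo,` and `Python.` keep their trailing characters.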

In [3]:
# Split sentence into words by space


In [4]:
# Split sentence into words by punctuation (.,)



## Using RegEx to split by space
```python
re.split('[ ]', kalimat)

# or simply
re.split(' ', kalimat)
```
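A character class can list several separators at once, which the plain string method cannot do in a single call. The sketch below uses a made-up string `sample` and the pattern `[ ,.]+` (one or more spaces, commas, or periods); both are illustrative assumptions, not part of the original notebook.

```python
import re

# Hypothetical sample sentence
sample = "Halo, Saya sedang belajar Python. Python mudah."

# Split on one or more of: space, comma, period
tokens = re.split(r'[ ,.]+', sample)
print(tokens)

# The final period leaves an empty string at the end; filter it out
words = [t for t in tokens if t]
print(words)
```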

In [5]:
# Split sentence into words by space


In [6]:
# Split sentence into words by punctuation (.,)



In [7]:
multiline_paragraph = """
Halo, Saya sedang belajar bahasa pemrograman Python. 
Python adalah bahasa pemrograman yang sangat mudah dipelajari.
Selain itu, Python juga memiliki banyak library yang dapat digunakan untuk keperluan data science.
"""
print(multiline_paragraph)


Halo, Saya sedang belajar bahasa pemrograman Python. 
Python adalah bahasa pemrograman yang sangat mudah dipelajari.
Selain itu, Python juga memiliki banyak library yang dapat digunakan untuk keperluan data science.



In [8]:
# Split multiline paragraph into sentences by new line



## Finding pattern matches using Findall and Search

In [9]:
matkul = """
101 COM   Computers
205 MAT   Mathematics
189 ENG   English
""" 
print(matkul)


101 COM   Computers
205 MAT   Mathematics
189 ENG   English



To find every occurrence of a pattern in a text, use `findall()`, which returns the matches as a list of strings.
```python
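# Compile the pattern so it can be reused; compiling is optional,
# since re.findall(r'\d', matkul) gives the same result.
# The raw string (r'...') keeps the backslash in \d intact.
pattern = re.compile(r'\d')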
pattern.findall(matkul)
```
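How much each match covers depends on the quantifier, and capture groups change what `findall()` returns. The sketch below uses a made-up string `inventory` (an assumption for illustration, not data from this notebook) to show the difference.

```python
import re

# Hypothetical sample text
inventory = "apel 12 kg, jeruk 7 kg, mangga 105 kg"

# \d matches exactly one digit per match
single_digits = re.findall(r'\d', inventory)

# \d+ matches a whole run of digits
whole_numbers = re.findall(r'\d+', inventory)

# With capture groups, findall returns a tuple of the groups for each match
pairs = re.findall(r'(\w+) (\d+) kg', inventory)

print(single_digits)
print(whole_numbers)
print(pairs)
```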

In [10]:
# Find all course (matkul) numbers using \d


In [11]:
# Find all course (matkul) numbers using [0-9]


In [12]:
# Find all course (matkul) codes (3 capital letters)


In [13]:
# Find all course (matkul) numbers, limited to 2 digits



In [14]:
# Find all course (matkul) names (English, Computers, Mathematics)



To find the position of a matching pattern, use `.search()`, which returns a match object for the first match only.
```python
kalimat = 'umur saya 21 tahun'

# \d+ matches one or more consecutive digits
pattern = re.compile(r'\d+')
search = pattern.search(kalimat)
print('Starting Position: ', search.start())
print('Ending Position: ', search.end())
```
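`.search()` returns `None` when nothing matches, so calling `.start()` on the result directly raises an `AttributeError` in that case. The sketch below shows one way to guard against this; the helper name `describe_first_number` is hypothetical and introduced only for illustration.

```python
import re

pattern = re.compile(r'\d+')

def describe_first_number(text):
    """Return (matched text, start, end) for the first number, or None."""
    match = pattern.search(text)
    if match is None:
        # No digits found, so there is no position to report
        return None
    return match.group(), match.start(), match.end()

found = describe_first_number('umur saya 21 tahun')
missing = describe_first_number('tidak ada angka')

print(found)
print(missing)
```

The end position is exclusive, so `text[start:end]` yields the matched text.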

### Exercise:
Using `findall()`, extract the unique email addresses from this text:
```
text_email = """
From: aryo@binar.com
To: TeamBinar@binar.co.id

Dear all,
Terlampir adalah materi pertemuan minggu ini.
bila ada kesulitan, 
silahkan hubungi Tim Facil di facil@binar.com atau ke saya di aryo@binar.com

Regards,
Aryo
"""
```

In [15]:
# Extract the email address from the text


## Replace Text by using Regex
To replace text that matches a pattern, use `re.sub()`.
```python
# Mask the phone number in the text by replacing it with XXX
chat = 'Silahkan hubungi 081567234645 (Pak Rudi) untuk nego dan cek lokasi'
re.sub(r'\d+', 'XXX', chat)
```
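A pattern like `\d+` replaces every run of digits, not only phone numbers. The sketch below compares it with a narrower pattern on a made-up string `note`. The phone format it assumes (a leading `0` followed by 9 to 12 digits) is an illustrative assumption, not a rule from this notebook.

```python
import re

# Hypothetical sample text containing two phone numbers and an hour
note = 'Hubungi 081567234645 sebelum jam 17 atau 081200001111 setelahnya'

# \d+ masks every number, including the hour
masked_all = re.sub(r'\d+', 'XXX', note)

# Assumed phone format: a leading 0 followed by 9 to 12 more digits
phone_pattern = r'\b0\d{9,12}\b'
masked_phones = re.sub(phone_pattern, 'XXX', note)

# re.subn also reports how many replacements were made
_, replaced_count = re.subn(phone_pattern, 'XXX', note)

print(masked_all)
print(masked_phones)
print(replaced_count)
```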

In [16]:
# Practice Here



In [17]:
# Mask the phone number in the text by replacing it with XXX
chat = 'Jl. kenari no 413 harganya Rp.1.750.000 perbulan, Silahkan hubungi 081567234645 (Pak Rudi) untuk cek lokasi'



### Exercise: 
```python
# Mask the price, phone number, and website in the text by replacing them with XXX
chat = """
kondisi mulus gan, harga 2.500.000 nego,
chat aja 081567234645 ato klik https://www.tokopedia.com/produk/1234
"""
```