**Importing and practicing regular expressions (regex)**

In [1]:
import re

In [5]:
text = "Patient's phone is 7321119999. Bill amount is 120$"
pattern = r'\d+'

match = re.findall(pattern, text)
match

## Check that the pattern works: \d+ matches every run of digits, so we pull both the phone number and the bill amount from the text. ##

['7321119999', '120']

In [6]:
text = "Patient's phone is 7321119999. Bill amount is 120$"
pattern = r'\d{10}'

match = re.findall(pattern, text)
match

## To get only the phone number, we use \d{10}, since a phone number here is exactly 10 digits. ##

['7321119999']
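One thing worth checking: `\d{10}` also matches the first ten digits of any longer number. A minimal sketch of that behavior, using a made-up string (the order id is not from our data) and `\b` word boundaries as one possible way to require exactly ten digits:

```python
import re

# Hypothetical text: a 14-digit order id next to a 10-digit phone number.
text = "Order id 12345678901234, phone 7321119999"

# Plain \d{10} also grabs the first ten digits of the longer number.
loose = re.findall(r'\d{10}', text)

# \b on both sides requires the ten digits to stand alone.
strict = re.findall(r'\b\d{10}\b', text)

print(loose)
print(strict)
```

For our sample sentence both patterns return the same result, so this only matters if the text can contain longer digit runs.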

**What if the phone number has brackets and hyphens?**

In [12]:
## With the help of regex101.com, we learn that we need the regex \(\d{3}\)-\d{3}-\d{4}|\d{10} ##

text = "Patient's phone is (732)-111-9999. Emergency contact is 7143093209 Bill amount is 120$"
pattern = r'\(\d{3}\)-\d{3}-\d{4}|\d{10}'

match = re.findall(pattern, text)
match

['(732)-111-9999', '7143093209']
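To see how the two alternatives behave on their own, here is a small sketch that compiles the same pattern once and runs it over a few sample strings. The sample strings are made up for illustration.

```python
import re

# Same pattern as above: bracket-and-hyphen form OR ten bare digits.
phone_pattern = re.compile(r'\(\d{3}\)-\d{3}-\d{4}|\d{10}')

# Hypothetical sample strings, not part of the original data.
samples = [
    "Call (732)-111-9999 today",
    "Emergency contact is 7143093209",
    "No phone on file",
]

# findall returns an empty list when nothing matches.
results = [phone_pattern.findall(s) for s in samples]
print(results)
```

Compiling with `re.compile` is optional here; it just avoids repeating the pattern string when the same regex is used many times.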

**Extracting phone number and bill separately**

In [18]:
## To group, we wrap each expression in parentheses and use \D+ to skip the non-digit text between the number and the bill. ##
## The pattern can be changed if our data is formatted differently, but for this exercise we know it will be (number)(text)(amount$). ##

text = "Patient's phone is 7321119999. Bill amount is 120$"
pattern = r'(\d{10})\D+(\d+)\$'

match = re.search(pattern, text)
match

<re.Match object; span=(19, 50), match='7321119999. Bill amount is 120$'>

In [19]:
phone_number, bill_amount = match.groups()

## match.groups() returns the captured groups as a tuple of strings, which we unpack into phone_number and bill_amount. ##
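As a variation, the groups can be given names so we do not depend on their order. This is a sketch only; the group names `phone` and `bill` are our own choice, and converting the bill to `int` is an assumption about how it will be used later.

```python
import re

text = "Patient's phone is 7321119999. Bill amount is 120$"

# Same pattern as above, with the two groups named for readability.
pattern = r'(?P<phone>\d{10})\D+(?P<bill>\d+)\$'

match = re.search(pattern, text)
if match:  # re.search returns None when nothing matches
    phone_number = match.group('phone')
    bill_amount = int(match.group('bill'))  # groups are strings; convert if needed
    print(phone_number, bill_amount)
```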

In [21]:
text = '''
Name: Marta Sharapova Date: 5/11/2022

Address: 9 tennis court, new Russia, DC

Prednisone 20 mg
Lialda 2.4 gram

Directions:

Prednisone, Taper 5 mg every 3 days,
Finish in 2.5 weeks a
Lialda - take 2 pill everyday for 1 month

Refill: 2 times
'''

## We take the text from our prescription parser. We will extract each field (name, address, medicines) in turn. ##

In [22]:
pattern = "Name:(.*)Date"

match = re.findall(pattern, text)
match[0].strip()

'Marta Sharapova'

In [23]:
pattern = "Address:(.*)\n"
match = re.findall(pattern, text)
match[0].strip()

'9 tennis court, new Russia, DC'

In [25]:
pattern = "Address[^\n]*(.*)Directions"
match = re.findall(pattern, text, flags = re.DOTALL)
print(match[0].strip())

## Because the medicines span multiple lines, we pass flags=re.DOTALL to .findall() so that . also matches newline characters. ##

Prednisone 20 mg
Lialda 2.4 gram
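The three extractions above can be collected into one function. This is a minimal sketch, assuming the text always follows the Name / Address / Directions layout; `parse_prescription` and its inner helper `first` are hypothetical names, and the sample below is a shortened copy of the prescription text.

```python
import re

def parse_prescription(text):
    """Hypothetical helper that applies the three patterns used above."""
    def first(pattern, flags=0):
        # Return the first captured group, stripped, or None if no match.
        found = re.findall(pattern, text, flags=flags)
        return found[0].strip() if found else None

    return {
        "name": first(r"Name:(.*)Date"),
        "address": first(r"Address:(.*)\n"),
        # DOTALL lets . match newlines, so the group can span several lines.
        "medicines": first(r"Address[^\n]*(.*)Directions", flags=re.DOTALL),
    }

# Shortened copy of the sample prescription text.
sample = """
Name: Marta Sharapova Date: 5/11/2022

Address: 9 tennis court, new Russia, DC

Prednisone 20 mg
Lialda 2.4 gram

Directions:

Refill: 2 times
"""

result = parse_prescription(sample)
print(result)
```

Returning `None` for a missing field, instead of raising `IndexError` on `match[0]`, is a design choice for this sketch.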
