In [1]:
import pandas as pd

time_sentences = ["Monday: The doctor's appointment is at 2:45pm.", 
                  "Tuesday: The dentist's appointment is at 11:30 am.",
                  "Wednesday: At 7:00pm, there is a basketball game!",
                  "Thursday: Be back home by 11:15 pm at the latest.",
                  "Friday: Take the train at 08:10 am, arrive at 09:00am."]

df = pd.DataFrame(time_sentences, columns=['text'])
df

                                                text
0     Monday: The doctor's appointment is at 2:45pm.
1  Tuesday: The dentist's appointment is at 11:30...
2  Wednesday: At 7:00pm, there is a basketball game!
3  Thursday: Be back home by 11:15 pm at the latest.
4  Friday: Take the train at 08:10 am, arrive at ...


In [2]:
# find the number of characters for each string in df['text']
df.text.str.len()

0    46
1    50
2    49
3    49
4    54
Name: text, dtype: int64

In [3]:
# find the number of whitespace-separated tokens for each string in df['text']
df.text.str.split().str.len()

0     7
1     8
2     8
3    10
4    10
Name: text, dtype: int64

In [8]:
# find which entries contain the word 'appointment'
df.text.str.contains('appointment')

0     True
1     True
2    False
3    False
4    False
Name: text, dtype: bool

In [10]:
# find how many times a digit occurs in each string

df.text.str.count(r'\d')

0    3
1    4
2    3
3    4
4    8
Name: text, dtype: int64

In [9]:
# find all occurrences of the digits
df.text.str.findall(r'\d')

0                   [2, 4, 5]
1                [1, 1, 3, 0]
2                   [7, 0, 0]
3                [1, 1, 1, 5]
4    [0, 8, 1, 0, 0, 9, 0, 0]
Name: text, dtype: object

In [11]:
# group and find the hours and minutes
df.text.str.findall(r'(\d?\d):(\d\d)')

0               [(2, 45)]
1              [(11, 30)]
2               [(7, 00)]
3              [(11, 15)]
4    [(08, 10), (09, 00)]
Name: text, dtype: object

**1st Capturing Group `(\d?\d)`**

* `\d` matches a digit (equivalent to `[0-9]`)

* `?` matches the previous token zero or one time, preferring one (greedy), so both one-digit and two-digit hours match

* `\d` matches a digit (equivalent to `[0-9]`)

**Separator `:`**

* `:` matches the character `:` literally (code point 58 in decimal, 3A in hexadecimal, 72 in octal); it sits between the two groups and is not captured

**2nd Capturing Group `(\d\d)`**

* `\d` matches a digit (equivalent to `[0-9]`)

* `\d` matches a digit (equivalent to `[0-9]`)

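**Pattern flags**

* `g` (global) in regex testers means all matches are returned, not only the first. `str.findall` already returns every match, so no flag is needed

* `m` (multi line) makes `^` and `$` match at the start and end of each line, not only of the whole string (`re.MULTILINE` in Python). This pattern uses neither anchor, so the flag has no effect here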
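
The breakdown above can be checked on single sentences with Python's built-in `re` module. This is a minimal sketch rather than part of the original notebook: the strings are copied from the Friday and Monday rows, and the names `pattern`, `sample`, `times`, and `single` are introduced here for illustration only.

```python
import re

# Same pattern as in the cell above: an optional digit plus a digit,
# a literal colon, then exactly two digits.
pattern = re.compile(r'(\d?\d):(\d\d)')

sample = "Friday: Take the train at 08:10 am, arrive at 09:00am."

# With two capturing groups, findall returns one (hour, minute) tuple per match.
times = pattern.findall(sample)
print(times)

# The optional first digit lets the same pattern match one-digit hours.
single = pattern.findall("Monday: The doctor's appointment is at 2:45pm.")
print(single)
```

Each match comes back as a tuple of strings, one entry per capturing group. Convert with `int()` if numeric hours and minutes are needed.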

In [12]:
# replace weekdays with '???'
df.text.str.replace(r'\w+day\b', '???', regex=True)

0          ???: The doctor's appointment is at 2:45pm.
1       ???: The dentist's appointment is at 11:30 am.
2          ???: At 7:00pm, there is a basketball game!
3         ???: Be back home by 11:15 pm at the latest.
4    ???: Take the train at 08:10 am, arrive at 09:...
Name: text, dtype: object

In [13]:
# replace weekdays with 3-letter abbreviations
df.text.str.replace(r'(\w+day\b)', lambda x: x.groups()[0][:3], regex=True)


0          Mon: The doctor's appointment is at 2:45pm.
1       Tue: The dentist's appointment is at 11:30 am.
2          Wed: At 7:00pm, there is a basketball game!
3         Thu: Be back home by 11:15 pm at the latest.
4    Fri: Take the train at 08:10 am, arrive at 09:...
Name: text, dtype: object
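
The callable given to `str.replace` receives a regex match object for each match and must return the replacement string, which is the same contract as `re.sub`. The sketch below shows that mechanism on one sentence with the standard library only. The helper name `abbreviate` and the variables `sentence` and `shortened` are hypothetical and not from the notebook.

```python
import re

def abbreviate(match):
    # match.group(1) is the text captured by (\w+day\b), e.g. "Thursday".
    # Returning its first three characters yields the abbreviation.
    return match.group(1)[:3]

sentence = "Thursday: Be back home by 11:15 pm at the latest."

# re.sub calls abbreviate once per match and splices in the returned string.
shortened = re.sub(r'(\w+day\b)', abbreviate, sentence)
print(shortened)
```

A named function does the same job as the lambda in the cell above. It is easier to read once the replacement logic grows beyond a single slice.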

* `\w` matches any word character (equivalent to `[a-zA-Z0-9_]` for ASCII text; in Python 3 it also matches Unicode letters and digits unless `re.ASCII` is set)

* `+` matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)

* `day` matches the characters `day` literally (case sensitive)

* `\b` assert position at a word boundary: `(^\w|\w$|\W\w|\w\W)`
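
To see how `\w+` and `\b` interact, the pattern can be tried on a few short strings. This is an illustrative sketch: the strings in `texts` are made up for the demonstration and do not come from the notebook's data.

```python
import re

weekday = re.compile(r'\w+day\b')

# Hypothetical test strings chosen to exercise each part of the pattern.
texts = [
    "Monday: gym",        # "day" is followed by ":", so \b succeeds
    "holidays are near",  # "day" is followed by "s", so \b fails
    "daylight saving",    # nothing precedes "day", so \w+ cannot match
    "my birthday party",  # any word ending in "day" matches, not only weekdays
]

results = [weekday.findall(t) for t in texts]
for text, found in zip(texts, results):
    print(f"{text!r}: {found}")
```

The last case shows that the pattern is not specific to weekdays. On free text, an explicit alternation of the seven day names would be a stricter choice.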