# Solutions 1: Regex

In [1]:
import re

### Finding numbers

Write a regex to find numbers in a text:

- Decimal separator can be `.` or `,`, examples: `1`, `2.3` or `4,5`.
- Thousands separator can be `.` or `,`, examples: `1.000` or `1,234.56`.

In [None]:
# Very simple form.
pattern = r"[0-9\.,]+"
re.search(pattern, "The costs were 2,345.67 euro.")

In [None]:
# But it also grabs the sentence-final dot...
re.search(pattern, "The year is now 2023.")

In [None]:
# A bit more precise.
pattern = r"""
(
    ([0-9]*[\.,])*     # Capture zero or more groups of digits, each followed by a separator.
    [0-9]+             # Capture one final set of digits.
)
"""
re.search(pattern, "The year is now 2023.", re.X)

In [None]:
# Also captures numbers correctly.
re.search(pattern, "The costs were 2,345.67 euro.", re.X)

In [None]:
# But it matches IP addresses too...
re.search(pattern, "Localhost has IP address 127.0.0.1", re.X)
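
One possible way to reject IP addresses is to describe the number structure more strictly and to forbid digits or separators directly around the match with lookarounds. The cell below is only a sketch under that assumption; the names `NUMBER` and `find_numbers` are made up for this example and are not part of the exercise.

```python
import re

# Hypothetical stricter pattern (not part of the original solution).
NUMBER = re.compile(r"""
    (?<![\d.,])                  # Not directly preceded by a digit or separator.
    (?:
        \d{1,3}                  # Leading group of 1 to 3 digits,
        ([.,])\d{3}              # a thousands separator plus exactly 3 digits,
        (?:\1\d{3})*             # more groups using the same separator,
        (?:(?!\1)[.,]\d+)?       # optional decimals using the other separator.
      |
        \d+                      # Or: plain digits,
        (?:[.,]\d+)?             # with optional decimals.
    )
    (?![.,]?\d)                  # Not directly followed by more number parts.
""", re.X)


def find_numbers(text):
    """Return all number-like substrings found in text."""
    # finditer + group(0) returns the full match even though the pattern has a group.
    return [m.group(0) for m in NUMBER.finditer(text)]


print(find_numbers("The costs were 2,345.67 euro."))       # ['2,345.67']
print(find_numbers("The year is now 2023."))               # ['2023']
print(find_numbers("Localhost has IP address 127.0.0.1"))  # []
```

The trade-off is that this sketch also skips a number glued to a preceding comma or dot (for example `a,5`), and it still cannot tell whether `1.000` means one thousand or one with three decimals.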

### Finding dates

Write a regex to find dates in a text:

- Assume date format DD-MM-YYYY
- Optional `0` before day or month allowed.

In [None]:
# Somewhat naive approach...
pattern = r"\d{1,2}-\d{1,2}-\d{4}"
re.search(pattern, "31-1-2023")

In [None]:
# Whoops, this also matches impossible dates...
re.search(pattern, "33-22-1111")

In [9]:
# More accurate pattern.
pattern = r"""(
    ( 0?[1-9] | [12][0-9] | 3[01] ) -    # Day:   1-9 or 01-09 | 10-29 | 30-31
    ( 0?[1-9] | 1[012] ) -               # Month: 1-9 or 01-09 | 10-12
    \d{4}                                # Year
)"""

In [None]:
# Works on correct dates.
re.search(pattern, "31-1-2023", re.X)

In [11]:
# No match on bad dates.
re.search(pattern, "33-22-1111", re.X)

In [None]:
# Can still match part of a bad date: here it finds "3-11-1111"...
re.search(pattern, "33-11-1111", re.X)
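
The partial match can probably be avoided by forbidding a digit directly before the day and directly after the year. The regex also cannot know how many days each month has, so the sketch below additionally checks each candidate with `datetime`. The names `DATE` and `find_dates` are assumptions made for this example.

```python
import re
from datetime import datetime

# Hypothetical variant of the pattern above, with lookarounds added.
DATE = re.compile(r"""
    (?<!\d)                              # No digit directly before the day.
    ( 0?[1-9] | [12][0-9] | 3[01] ) -    # Day:   1-9 or 01-09 | 10-29 | 30-31
    ( 0?[1-9] | 1[012] ) -               # Month: 1-9 or 01-09 | 10-12
    ( \d{4} )                            # Year
    (?!\d)                               # No digit directly after the year.
""", re.X)


def find_dates(text):
    """Return the DD-MM-YYYY dates in text that exist on the calendar."""
    dates = []
    for match in DATE.finditer(text):
        day, month, year = (int(part) for part in match.groups())
        try:
            datetime(year, month, day)   # Raises ValueError for e.g. 31-02-2023.
        except ValueError:
            continue
        dates.append(match.group(0))
    return dates


print(find_dates("31-1-2023"))    # ['31-1-2023']
print(find_dates("33-11-1111"))   # []
print(find_dates("31-02-2023"))   # []
```

Leaving the calendar check to `datetime` keeps the regex readable; encoding month lengths and leap years in the pattern itself is possible but quickly becomes hard to maintain.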