
# Exercise #3.1: Regular Expressions

## Introduction
In this hands-on exercise, you will extend a Python program that currently uses regular expressions to recognize integers and real numbers, so that it also recognizes prices, email addresses, and Python identifiers.

## Program Behavior
The program reads lines from standard input and tries each line against a list of regular expressions, in order. It prints the line followed by the name of the first pattern that matches, or `unknown` if none match. Typing `quit` or entering an empty line exits the program.

## Task Description
The program's `main()` function defines the following list of tuples. Each tuple contains a compiled regular expression and the name of the pattern it recognizes:

```python
patterns = [
    (re.compile(r'^\d+$'), 'integer'),
    (re.compile(r'^\d+\.\d+$'), 'real number'),
]
```
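The first-match lookup that `main()` performs can be sketched as a pure function, which makes it easy to test patterns without typing input by hand. `classify` and `PATTERNS` are hypothetical names introduced here for illustration; they are not part of the provided program. The sketch also shows why the `^` and `$` anchors matter: `re.match` anchors only at the start of the string.

```python
import re

# The two starter patterns, tried in order (the first match wins).
PATTERNS = [
    (re.compile(r'^\d+$'), 'integer'),
    (re.compile(r'^\d+\.\d+$'), 'real number'),
]


def classify(text, patterns=PATTERNS):
    """Return the name of the first matching pattern, or 'unknown'."""
    for pattern, name in patterns:
        if pattern.match(text):
            return name
    return 'unknown'


for sample in ['42', '3.14', '3.', '12a']:
    print(f"{sample}: {classify(sample)}")

# re.match anchors only at the start of the string, so without the
# trailing $ a string that merely begins with digits would be accepted.
print(bool(re.match(r'\d+', '12a')))    # True
print(bool(re.match(r'^\d+$', '12a')))  # False
```

Appending your new `(compiled_pattern, name)` tuples to `PATTERNS` lets you check each list of valid and invalid examples below in a single loop.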

Your task is to add additional entries to this list to match the following types of strings:

### Price
A price in Singapore dollars (SGD), written with a leading `$`. The cents are optional, but if they are shown there must be exactly two digits. Commas may optionally separate thousands, millions, etc.; when they are used, each comma must be followed by exactly three digits.

**Valid prices**:
- `$1`
- `$20`
- `$1.99`
- `$10.00`
- `$1500.50`
- `$2,000.99`
- `$1,234,567.89`

**Invalid prices**:
- `$1.9` (cents must have two digits if present)
- `$10,23.4` (improper comma placement, and only one cents digit)
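One possible price pattern, sketched under the assumption that the leading `$` is always required (every example above has one). The name `PRICE` is hypothetical. The pattern tries a comma-grouped form first (one to three digits, then groups of exactly three digits after each comma) and falls back to a plain run of digits; in either case the cents, if present, must be a period followed by exactly two digits.

```python
import re

# Sketch of a price pattern (assumption: a leading "$" is required).
#   \$                     literal dollar sign
#   (?:\d{1,3}(?:,\d{3})+  1-3 digits followed by one or more ",ddd" groups
#     |\d+)                ...or a plain run of digits with no commas
#   (?:\.\d{2})?           optional cents: a period and exactly two digits
PRICE = re.compile(r'^\$(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?$')

valid = ['$1', '$20', '$1.99', '$10.00', '$1500.50', '$2,000.99', '$1,234,567.89']
invalid = ['$1.9', '$10,23.4']

for s in valid + invalid:
    print(f"{s:>15}  {'price' if PRICE.match(s) else 'no match'}")
```

Splitting the integer part into two alternatives keeps commas all-or-nothing: `$1500.50` and `$1,500.50` both match, but a mixed form such as `$1500,000` does not. The sketch does not reject leading zeros (e.g. `$007`), which the rules above leave unspecified. To use it in the program, append `(PRICE, 'price')` to the `patterns` list in `main()`.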

### Email Address
Capturing all the rules for what makes a valid email address is complex, so we will use a simplified definition of a valid email address. This definition generally works just fine for extracting email addresses from documents.

The first part of the email address is the username portion, and it must not contain whitespace or the @ symbol. The username portion is followed by the @ symbol. After the @ symbol is the domain, which does not contain any whitespace or the @ symbol. The domain contains two or more non-empty components which are separated by periods. The final component must consist of only letters from the English alphabet.

**Valid email addresses**:
- `nsommer@smu.edu`
- `n.sommer@phdcs.smu.edu`
- `yippee_skippy@yee-haw.edu`
- `fun-times@Taylor.hall.smu.edu`

**Invalid email addresses**:
- `n@sommer@smu.edu` (multiple '@' symbols)
- `n sommer@smu.edu` (spaces not allowed)
- `nsommer@smu..edu` (consecutive periods not allowed)
- `nsommer@smu.edu-org` (final domain component contains a hyphen)
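A sketch of the simplified email rules as a single pattern; the name `EMAIL` is hypothetical. Each part of the regex corresponds to one sentence of the definition above, and every character class excludes whitespace and `@`.

```python
import re

# Sketch of the simplified email rules:
#   [^\s@]+            username: one or more characters that are neither
#                      whitespace nor "@"
#   @                  exactly one "@" separator
#   (?:[^\s@.]+\.)+    one or more non-empty domain components, each
#                      followed by a period
#   [A-Za-z]+          final component: English letters only
EMAIL = re.compile(r'^[^\s@]+@(?:[^\s@.]+\.)+[A-Za-z]+$')

valid = ['nsommer@smu.edu', 'n.sommer@phdcs.smu.edu',
         'yippee_skippy@yee-haw.edu', 'fun-times@Taylor.hall.smu.edu']
invalid = ['n@sommer@smu.edu', 'n sommer@smu.edu',
           'nsommer@smu..edu', 'nsommer@smu.edu-org']

for s in valid + invalid:
    print(f"{s:>32}  {'email' if EMAIL.match(s) else 'no match'}")
```

Because the repeated group needs at least one `component.` before the final letters-only component, the domain always has two or more parts, so `nsommer@edu` is rejected. Periods are allowed in the username, as `n.sommer@phdcs.smu.edu` requires.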

### Python Identifiers
A Python identifier is the name of a function, variable, etc. in a Python program. It must contain only letters, digits, and underscores, and its first character must be a letter or an underscore.

**Valid Python identifiers**:
- `x`
- `x1y2`
- `_hello`
- `funName`
- `FunName`

**Invalid Python identifiers**:
- `1x` (cannot start with a digit)
- `bad name` (spaces are not allowed)
- `!name` (special characters other than underscore are not allowed)
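A sketch of the identifier rule; `IDENTIFIER` is a hypothetical name. The character classes are spelled out as ASCII ranges rather than `\w`, because `\w` in a Python 3 `str` pattern also matches non-ASCII letters and digits.

```python
import re

# Sketch: a letter or underscore, then any number of letters, digits, or
# underscores (ASCII only, per the exercise's definition).
IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')

valid = ['x', 'x1y2', '_hello', 'funName', 'FunName']
invalid = ['1x', 'bad name', '!name']

for s in valid + invalid:
    print(f"{s:>10}  {'identifier' if IDENTIFIER.match(s) else 'no match'}")
```

This follows the exercise's narrower definition. Python itself also accepts many non-ASCII letters in identifiers (`'café'.isidentifier()` is `True`), and the pattern accepts keywords such as `if`, which satisfy the letter/digit/underscore rule but cannot be used as names.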

```python
import re


def main():
    # Each entry pairs a compiled regular expression with the name of the
    # pattern it recognizes. Patterns are tried in order; the first match wins.
    patterns = [
        (re.compile(r'^\d+$'), 'integer'),
        (re.compile(r'^\d+\.\d+$'), 'real number'),
        # TODO: implement the following patterns:
        #   - a pattern verifying whether the input is a valid price
        #   - a pattern verifying whether the input is an email address
        #   - a pattern verifying whether the input is a Python identifier
    ]

    print("Reading from standard input. Enter lines to match, "
          "or type 'quit' (or an empty line) to exit.")

    while True:
        try:
            input_line = input("Enter a string: ")
        except (EOFError, KeyboardInterrupt):
            # End of input (e.g. Ctrl-D or the end of a piped file) or Ctrl-C.
            print("Exiting program.")
            break

        # "quit" (in any letter case) or an empty line ends the session.
        if input_line.lower() in ("quit", ""):
            print("Exiting program.")
            break

        # Report the first pattern that matches, or "unknown" if none do.
        for pattern, name in patterns:
            if pattern.match(input_line):
                print(f"{input_line}: {name}")
                break
        else:
            print(f"{input_line}: unknown")


if __name__ == "__main__":
    main()
```


# Exercise #3.2: Language Modeling - Introduction to N-grams
