__TO ADD:__
* regex functions
* fuzzy string matching examples & best practices
* Levenshtein distance: calculation & examples

In [1]:
import re

#### Python Regex Flags
- Reference: https://pynative.com/python-regex-flags/
- Named regex group: `(?P<group_name>regexp)` captures whatever `regexp` matches under the name `group_name` (used in the `re.VERBOSE` sample)
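A minimal sketch of named groups with the standard `re` module. The sample string is reused from the `re.VERBOSE` entry; here `\w{5}` is used so the `five_letter` group matches exactly five word characters. Each named group can be read back with `Match.group(name)` or all at once with `Match.groupdict()`.

```python
import re

# Named groups: (?P<name>...) stores the submatch under a readable name.
# re.VERBOSE lets the pattern span several lines and carry # comments.
pattern = re.compile(
    r"""
    ^(?P<five_letter>\w{5})   # exactly five word characters at the start
    .+                        # anything in between (greedy)
    (?P<four_digit>\d{4})$    # exactly four digits at the end
    """,
    re.VERBOSE,
)

m = pattern.search('Jessa is a Python developer, and her salary is 8000')
print(m.group('five_letter'))  # Jessa
print(m.groupdict())           # {'five_letter': 'Jessa', 'four_digit': '8000'}
```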

In [2]:
regex_flag_sample_dict = {
    're.IGNORECASE' : {
        'target_str' : 'KELLy is a Python developer at a PYnative. kelly loves ML and AI',
        'pattern' : r'kelly',
        'flag' : re.IGNORECASE,
        'short_description' : 'case insensitive matching.',
        'comparison_flag' : True
    },
    
    're.DOTALL' : {
        'target_str' : 'ML\nand AI',
        'pattern' : r'.+',
        'flag' : re.DOTALL,
        'short_description' : ('enables the DOT(.) metacharacter to match any possible character, '
                               'including the new line character.'),
        'comparison_flag' : True
    },
    
    're.VERBOSE' : {
        'target_str' : 'Jessa is a Python developer, and her salary is 8000',
        'pattern' : r"""(?P<five_letter>^\w{2,}) # match 5-letter word at the start
               .+
               (?P<four_digit>\d{4}$) # match 4-digit number at the end """,
        'flag' : re.VERBOSE,  # the pattern uses named groups; \w{2,} matches any word of two or more characters
        'short_description' : ('allows us \n'
                               '# 1) Better spacing, indentation, and a clean format for more extended and intricate patterns. \n'
                               '# 2) Allows us to add comments right inside the pattern for later reference using the hash sign (#).'),
        'comparison_flag' : False
    },
    
    're.MULTILINE' : {
        'target_str' : 'Joy lucky number is 75\nTom lucky number is 25',
        'pattern' : r"^\w{3}(?:[\D]*)\d{2}$",
        'flag' : re.MULTILINE,  # ^ and $ match at the start/end of every line, not only of the whole string
        'short_description' : 'perform a match inside a multiline block of text - match for each line.',
        'comparison_flag' : True
    },
    
    're.ASCII' : {
        'target_str' : '虎太郎 and Jessa are friends',
        'pattern' : r'\b\w{3}\b',
        'flag' : re.ASCII,  # \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only
        'short_description' : 'perform ASCII-only matching instead of full Unicode matching.',
        'comparison_flag' : True
    }
    
}
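The dictionary above demonstrates one flag at a time. As a further sketch of standard `re` behaviour (not one of the original samples; the text `TOM lucky number is 25` is a made-up variant of the `re.MULTILINE` input), flags can be combined with the bitwise OR operator `|`, or set inline at the very start of the pattern with `(?i)`, `(?m)`, `(?s)`, `(?x)` or `(?a)`:

```python
import re

text = 'Joy lucky number is 75\nTOM lucky number is 25'

# Combine flags with bitwise OR: ^/$ match per line AND case is ignored.
combined = re.findall(r'^tom\D*\d{2}$', text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['TOM lucky number is 25']

# The same two flags set inline at the start of the pattern.
inline = re.findall(r'(?im)^tom\D*\d{2}$', text)
print(inline)    # ['TOM lucky number is 25']

# Without MULTILINE, ^ only matches at the very start of the string.
no_multiline = re.findall(r'^tom\D*\d{2}$', text, re.IGNORECASE)
print(no_multiline)  # []
```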

In [3]:
# demonstrate each regex flag with a before-vs-after comparison

def regex_flag_demonstration(target_str, pattern, flag, short_description, comparison_flag):
    print('>>>>> %s: >>>>>' % str(flag))
    print('Short description: %s\n' % short_description)
    print('Input string: %s' % target_str)
    print('Regex pattern: %s' % pattern)

    # the flagless run is skipped when comparison_flag is False (e.g. the
    # re.VERBOSE sample, whose whitespace and comments would otherwise be
    # matched literally)
    if comparison_flag:
        result_without = re.findall(pattern, target_str)
        print("Without %s: " % str(flag), result_without)
    result_with = re.findall(pattern, target_str, flag)
    print("With %s: " % str(flag), result_with)
    print()

for key, value in regex_flag_sample_dict.items():
    regex_flag_demonstration(**value)

>>>>> re.IGNORECASE: >>>>>
Short description: case insensitive matching.

Input string: KELLy is a Python developer at a PYnative. kelly loves ML and AI
Regex pattern: kelly
Without re.IGNORECASE:  ['kelly']
With re.IGNORECASE:  ['KELLy', 'kelly']

>>>>> re.DOTALL: >>>>>
Short description: enables the DOT(.) metacharacter to match any possible character, including the new line character.

Input string: ML
and AI
Regex pattern: .+
Without re.DOTALL:  ['ML', 'and AI']
With re.DOTALL:  ['ML\nand AI']

>>>>> re.VERBOSE: >>>>>
Short description: allows us 
# 1) Better spacing, indentation, and a clean format for more extended and intricate patterns. 
# 2) Allows us to add comments right inside the pattern for later reference using the hash sign (#).

Input string: Jessa is a Python developer, and her salary is 8000
Regex pattern: (?P<five_letter>^\w{2,}) # match 5-letter word at the start
               .+
               (?P<four_digit>\d{4}$) # match 4-digit number at the end 
With re.VERB