<h1 align=center>RegEx in Python</h1>

[Python documentation](https://docs.python.org/2/library/re.html#regular-expression-syntax)

Python provides a module for regular expressions called `re`.

Most used functions:
- sub()
- findall()
- match()
- search()
- compile()

### import `re` module

In [1]:
import re

<h3 align=center>Using Specific Words</h3>

#### `sub()` function

In [2]:
text = 'Hossam plays footbal.'

re.sub('Hossam', 'Wssam', text)

'Wssam plays footbal.'

In [3]:
text.replace('a', 'W')

'HossWm plWys footbWl.'

**`text.replace()` is faster than `re.sub()`, so prefer it unless you need patterns.**
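A rough way to check the speed claim on your own machine is a small `timeit` sketch like the one below. The repetition count of 10,000 is an arbitrary choice, and the timings vary between machines and Python versions, so treat the numbers as indicative only.

```python
import re
import timeit

text = 'Hossam plays footbal.'

# Both calls perform the same literal replacement.
replaced = text.replace('Hossam', 'Wssam')
substituted = re.sub('Hossam', 'Wssam', text)
print(replaced == substituted)

# Time each approach; absolute numbers depend on the machine.
replace_time = timeit.timeit(lambda: text.replace('Hossam', 'Wssam'), number=10_000)
sub_time = timeit.timeit(lambda: re.sub('Hossam', 'Wssam', text), number=10_000)
print(f'str.replace: {replace_time:.4f}s, re.sub: {sub_time:.4f}s')
```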

In [4]:
re.sub(r'\s', '_', text)

'Hossam_plays_footbal.'
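As a small extension of the cell above, this sketch shows two things `str.replace()` cannot do directly: collapsing a run of mixed whitespace into one character, and limiting the substitution with the `count` argument of `re.sub()`. The `messy` string is made up for illustration.

```python
import re

messy = 'Hossam   plays \t football.'

# \s+ matches a run of one or more whitespace characters,
# so each run collapses into a single underscore.
collapsed = re.sub(r'\s+', '_', messy)

# count=1 limits the substitution to the first match only.
first_only = re.sub(r'\s+', '_', messy, count=1)

print(collapsed)
print(first_only)
```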

#### `findall()` function

In [5]:
text = '''
            #Hossam plays footbal.
            Ahmed tells Hossam to #pass the ball.
'''

re.findall(r'#\S+', text)

['#Hossam', '#pass']

<h3 align=center>Using Patterns</h3>


[RegEx 101](https://regex101.com/)

**Basic patterns that match single chars**


| Character  | function |
| ------------- | ------------- |
| a-z, 0-9  | ordinary characters just match themselves exactly.|
| . (dot)  | matches any single character except newline '\n'  |
| \w | matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_] |
| \W | matches any non-word character |
| \b | boundary between word and non-word |
| \s | matches a single whitespace character -- space, newline, return, tab |
| \S | matches any non-whitespace character |
| \t, \n, \r | tab, newline, return |
| \d | decimal digit [0-9] |
| ^ | matches start of the string |
| $ | matches the end of the string |
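A few rows of the table can be tried out with a quick sketch like this one. The `sample` string is an invented example, not part of the original notebook.

```python
import re

sample = 'Call 555 1234 now'

digits = re.findall(r'\d', sample)                # every single digit
starts_with_call = re.findall(r'^Call', sample)   # anchored at the start
ends_with_now = re.findall(r'now$', sample)       # anchored at the end
whole_word_no = re.findall(r'\bno\b', sample)     # 'no' only appears inside 'now'
dot_match = re.findall(r'5.5', sample)            # dot matches any one character

print(digits)
print(starts_with_call, ends_with_now)
print(whole_word_no)
print(dot_match)
```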

In [6]:
text = 'This is a string! But it has punctuation. How can we remove it? '

print(re.findall(r'\w+\W\s', text))

['string! ', 'punctuation. ', 'it? ']


In [17]:
print(re.findall(r'(\w+)\W\s', text))

['string', 'punctuation', 'it']


In [18]:
text = 'this is wssma hassan my mail is wssamhassan96@yahoo.com'

re.findall(r'\S+@(\S+)', text)

['yahoo.com']
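The parentheses in the pattern above are a capturing group, which is why `findall()` returns only the domain. A minimal sketch of what happens with two groups, reusing the same text:

```python
import re

text = 'this is wssma hassan my mail is wssamhassan96@yahoo.com'

# With one group, findall returns only that group's text.
domains = re.findall(r'\S+@(\S+)', text)

# With two groups, findall returns one (user, domain) tuple per match.
pairs = re.findall(r'(\S+)@(\S+)', text)

print(domains)
print(pairs)
```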

In [19]:
text = '''regular expression is a special sequence of characters
that helps you match or find other strings or sets of strings,
using regular expression pattern. regular expressions are widely used in UNIX world.'''

re.findall(r'^expression', text)

[]

In [24]:
# replace 'regular' only where it appears at the start of the string

re.sub(r'^regular', 'Reg', text)

'Reg expression is a special sequence of characters\nthat helps you match or find other strings or sets of strings,\nusing regular expression pattern. regular expressions are widely used in UNIX world.'

### Exercise: Valid Usernames

In [26]:
user_input = input('Enter your username: ').strip()

invalid_input = re.findall(r'\W', user_input)

if invalid_input:
    print('Warning! invalid username')
else:
    print("Username {} is valid.".format(user_input))

Enter your username: fad fa
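An alternative sketch for the same exercise uses `re.fullmatch()`, which succeeds only when the whole string matches. The helper name `is_valid_username` is hypothetical, and rejecting the empty string is an assumption about what "valid" should mean here (the `findall()` check above accepts it).

```python
import re

def is_valid_username(username):
    """Return True if the username consists only of word characters."""
    # fullmatch succeeds only when the whole string matches the pattern;
    # \w+ requires at least one character, so '' is rejected too.
    return re.fullmatch(r'\w+', username) is not None

for candidate in ['hossam_96', 'fad fa', '']:
    print(repr(candidate), is_valid_username(candidate))
```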


### Using Patterns Cont.

In [27]:
"""
[abc]edf
aedf
bedf
cedf


a b c x y z
A

"""

'\n[abc]edf\naedf\nbedf\ncedf\n\n\na b c x y z\nA\n\n'

**Intermediate patterns**

| Example  | description |
| --- | --- |
| [Pp]ython | Match "Python" or "python" |
| rub[ye] | Match "ruby" or "rube" |
| [aeiou] | Match any one lowercase vowel |
| [0-9] | Match any digit; same as [0123456789] |
| [a-z] | Match any lowercase ASCII letter |
| [A-Z] | Match any uppercase ASCII letter |
| [a-zA-Z0-9] | Match any ASCII letter or digit |
| [^aeiou] | Match anything other than a lowercase vowel |
| [^0-9] | Match anything other than a digit |

> We can use **OR** (`|`) to combine multiple patterns into one regex.


In [13]:
texts = [
  "python is a great language",
  "i lov to write in py",
  "what a cool language Python is",
  "the pyramids of giza are so huge!"
]


for text in texts:
    python_detected = re.findall(r"[Pp]ython|\b[Pp]y\b", text)
    if python_detected:
        print("talking about python")
    else:
        print("something else")

talking about python
talking about python
talking about python
something else


**Repetition Cases**

| Example | description |
| --- | --- |
| ruby? | Match "rub" or "ruby": the y is optional |
| ruby* | Match "rub" plus 0 or more y(s) |
| ruby+ | Match "rub" plus 1 or more y(s) |
| \d{3} | Match exactly 3 digits |
| \d{3,} | Match 3 or more digits |
| \d{3,5} | Match 3, 4, or 5 digits |

In [14]:
"""

hello+
helloooooo
heoooooo

hel*o+


\d{11}

"""



In [28]:
text = "it's 2022, happy new year! and bye bye 2021. My age is 26"

re.findall(r"\d+", text)

['2022', '2021', '26']

### Exercise: Valid Phone Number

In [30]:
user_input = input('Enter your phone number: ').strip()

valid_number = re.findall(r'\d{11}', user_input)

if valid_number:
    print("Your phone number is {}.".format(user_input))
else:
    print('Warning! invalid phone number')

Enter your phone number: 423423
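Note that `re.findall(r'\d{11}', ...)` also accepts inputs that merely contain 11 consecutive digits, such as a 15-digit number. A stricter sketch, assuming a valid number is exactly 11 digits and nothing else, could use `re.fullmatch()`. The helper name `is_valid_phone` is hypothetical.

```python
import re

def is_valid_phone(number):
    """Return True only if the input is exactly 11 digits."""
    # fullmatch requires the pattern to cover the entire string.
    return re.fullmatch(r'\d{11}', number) is not None

for candidate in ['01234567890', '423423', '012345678901234']:
    print(candidate, is_valid_phone(candidate))
```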


#### `search()` function

Searches for the first substring that matches the pattern and returns a match object, or `None` if nothing matches.

In [None]:
text = 'My mail is wssamhassan96@gmail.com and anything@gmail.com'

print(re.search(r'\S+@\S+', text))

<re.Match object; span=(11, 34), match='wssamhassan96@gmail.com'>


In [None]:
print(re.search(r'\S+@\S+', text).group(0))

wssamhassan96@gmail.com
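The match object returned by `search()` carries more than the matched text. The sketch below adds two capturing groups to the pattern (an addition for illustration) and checks for `None` before using the result, since calling `.group()` on `None` raises an `AttributeError`.

```python
import re

text = 'My mail is wssamhassan96@gmail.com and anything@gmail.com'

match = re.search(r'(\S+)@(\S+)', text)
if match is not None:
    print(match.group(0))  # the whole match
    print(match.group(1))  # first group: the user part
    print(match.group(2))  # second group: the domain part
    print(match.span())    # (start, end) indexes in the text

# search returns None when the pattern is not found anywhere.
missing = re.search(r'\d{5}', text)
print(missing)
```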



#### `match()` function

Matches the pattern only at the beginning of the text, similar to `search()` with a pattern anchored by `^`.

In [31]:
text = 'wssamhassan96@gmail.com is my mail.'
print(re.match(r'\S+@\S+', text).group())

text = 'My mail is wssamhassan96@gmail.com and anything@gmail.com'
print(re.match(r'\S+@\S+', text))

wssamhassan96@gmail.com
None


#### `findall()` VS `search()` VS `match()`

In [32]:
text = 'My mail is wssamhassan96@gmail.com and anything@gmail.com'

find_all = re.findall(r'\S+@\S+',  text)
searches = re.search(r'\S+@\S+',  text)
matches = re.match(r'\S+@\S+',  text)

if searches is not None:
    searches = searches.group()

if matches is not None:
    matches = matches.group()

In [33]:
print(find_all)
print(searches)
print(matches)

['wssamhassan96@gmail.com', 'anything@gmail.com']
wssamhassan96@gmail.com
None


#### `compile()` function

In [None]:
text = 'Hossam plays footbal.'

extract_words = re.compile(r'\S+')
extract_words.findall(text)

['Hossam', 'plays', 'footbal.']

In [None]:
re.findall(r'\S+', text)

['Hossam', 'plays', 'footbal.']

> Both are equivalent, but using `re.compile()` and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
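A typical reuse scenario might look like the sketch below, where one compiled pattern is applied to several lines. The `lines` list and the name `EMAIL_PATTERN` are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object for every line.
EMAIL_PATTERN = re.compile(r'\S+@\S+')

lines = [
    'My mail is wssamhassan96@gmail.com',
    'no address here',
    'write to anything@gmail.com today',
]

emails = []
for line in lines:
    emails.extend(EMAIL_PATTERN.findall(line))

print(emails)
```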