___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:100%; text-align:center; border-radius:10px 10px;">WAY TO REINVENT YOURSELF</p>

<img src=https://i.ibb.co/6gCsHd6/1200px-Pandas-logo-svg.png width="700" height="200">

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:200%; text-align:center; border-radius:10px 10px;">Data Analysis with Python</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#4d77cf; font-size:200%; text-align:center; border-radius:10px 10px;">RegEx in Python</p>

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Content</p>

* [IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK](#0)
* [RegEx in PYTHON](#1)
* [RAW STRING ("r/ R")](#2)
* [COMMON PYTHON RegEx FUNCTIONS](#3)    
* [PANDAS FUNCTIONS ACCEPTING RegEx](#4)    
* [THE END OF THE SESSION - 07](#5)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Importing Libraries Needed in This Notebook</p>

<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Once NumPy and pandas are installed, import them together with Python's built-in `re` module:

```python
import numpy as np
import pandas as pd
import re
```

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">RegEx in Python</p>

<a id="1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

- A **Reg**ular **Ex**pression (RegEx) is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.

- The **Python module** **``re``** provides full support for regular expressions in Python [Source 01](https://docs.python.org/3/library/re.html#re-objects), [Source 02](https://www.tutorialspoint.com/python/python_reg_expressions.htm) & [Source 03](https://www.w3schools.com/python/python_regex.asp).


### Common Expressions

**``\d``** Any decimal digit, such as ``0`` to ``9`` (for `str` patterns this also includes other Unicode decimal digits).

**``\D``** Any character that is not a decimal digit. This is the opposite of ``\d``.

**``\w``** Any letter, numeric digit, or the underscore character. (Think of this as matching "word" characters.)

**``\W``** Any character that is not a letter, numeric digit, or the underscore character. This is the opposite of ``\w``.

**``\s``** Any whitespace character: space, tab, newline, carriage return, form feed or vertical tab.

**``\S``** Any character that is not whitespace. This is the opposite of ``\s``.
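A quick check of these character classes on a made-up sample string (the string is an assumption, chosen only because it contains letters, digits, a tab and spaces):

```python
import re

text = "Room 42\tis free"  # hypothetical sample string

digits = re.findall(r"\d", text)      # every single digit
non_digits = re.findall(r"\D", text)  # every character that is not a digit
words = re.findall(r"\w+", text)      # runs of "word" characters
spaces = re.findall(r"\s", text)      # whitespace characters (space, tab, ...)

print(digits)  # ['4', '2']
print(words)   # ['Room', '42', 'is', 'free']
print(spaces)  # [' ', '\t', ' ']
```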


### Common Metacharacters

| Metacharacter | Meaning | Example |
|---|---|---|
| `[]` | A set of characters | `"[a-m]"` |
| `\` | Signals a special sequence (can also be used to escape special characters) | `"\d"` |
| `.` | Any character except a newline | `"h.llo"` |
| `^` | Starts with | `"^hello"` |
| `$` | Ends with | `"world$"` |
| `*` | Zero or more occurrences of the preceding element | `"ab*"` |
| `+` | One or more occurrences of the preceding element | `"ab+"` |
| `?` | Zero or one occurrence of the preceding element | `"colou?r"` |
| `{}` | Exactly the specified number of occurrences (`{m}`), or a range (`{m,n}`) | `"\d{3}"` |
| `\|` | Either or | `"falls\|stays"` |
| `()` | Capture and group | `"(\d{3})-(\d{4})"` |

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Raw String ("r / R")</p>

<a id="2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

- A raw string is created by prefixing a string literal with **`r` or `R`**.
- A raw string treats the **``backslash (\)``** as a literal character. This is useful when a string contains backslashes that should not be treated as escape characters, which is exactly the case for regex patterns such as `\d` or `\b` [Source 01](https://blog.devgenius.io/beauty-of-raw-strings-in-python-fa627d674cbf) & [Source 02](https://stackoverflow.com/questions/26318287/what-does-r-mean-before-a-regex-pattern#:~:text=The%20r%20means%20that%20the,escape%20codes%20will%20be%20ignored.).

```python
print("Backslash: \\")
print("New line char: \\n")
```

```
Backslash: \
New line char: \n
```

```python
print(r"Backslash: \\")
print(r"New line char: \\n")
```

```
Backslash: \\
New line char: \\n
```


## Invalid Raw String

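```python
# Each of the following lines is a SyntaxError, so they are kept commented out:
# print("\")        # \" escapes the closing quote, so the literal never ends
# print(r"\")       # a raw string cannot end with an odd number of backslashes
# print(r"abc\")    # same problem: the final backslash escapes the quote
# print(r"abc\\\")  # three trailing backslashes is still an odd number

print(r"abc\\")     # an even number of trailing backslashes is fine
```

```
abc\\
```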
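Why this matters for regular expressions: in a normal string `"\b"` is the backspace character, while `r"\b"` reaches the `re` module unchanged and means "word boundary". A small sketch (the sentence is reused from the `findall()` section of this notebook):

```python
import re

text = "which foot or hand fell fastest"

# In a normal string literal "\b" is the backspace character (\x08),
# so this pattern looks for a backspace followed by "f" and finds nothing
plain = re.findall("\bf", text)

# In a raw string the backslash survives, so \b is a word boundary for the regex engine
raw = re.findall(r"\bf\w*", text)

# Doubling every backslash in a normal string is equivalent to the raw string
escaped = re.findall("\\bf\\w*", text)

print(plain)  # []
print(raw)    # ['foot', 'fell', 'fastest']
```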


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Common Python RegEx Functions</p>

<a id="3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

- **re.search():** Scan through string looking for a match to the pattern.
- **re.match():** Try to apply the pattern at the start of the string.
- **re.fullmatch():** Try to apply the pattern to all of the string.
- **re.findall():** Return a list of all non-overlapping matches in the string.
- **re.sub():** Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.
- **re.split():** Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.

## ``re.search(pattern, string, flags=0)``

Scan through string looking for a match to the pattern, returning a Match object, or None if no match was found [Source](https://www.pythontutorial.net/python-regex/python-regex-flags/).

#### Find numeric digits with search function

```python
text = "A78L41K"
```
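The solution cells for this exercise are not included; a minimal sketch, assuming the first step is a plain (non-regex) search for a single digit:

```python
import re

text = "A78L41K"

# search() scans the whole string and returns a Match object for the first hit
match = re.search("7", text)
print(match)          # <re.Match object; span=(1, 2), match='7'>
print(match.group())  # 7
print(match.span())   # (1, 2)

# When nothing matches, search() returns None
print(re.search("9", text))  # None
```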

#### with regular expressions
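A sketch of the same search using the ``\d`` class instead of a literal digit:

```python
import re

text = "A78L41K"

first_digit = re.search(r"\d", text)   # \d = any digit
print(first_digit.group())             # 7

two_digits = re.search(r"\d\d", text)  # two digits in a row
print(two_digits.group())              # 78

# search() stops at the first match; findall() returns all of them
print(re.findall(r"\d\d", text))       # ['78', '41']
```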

#### with compile() method
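A sketch with `re.compile()`, which builds a reusable `Pattern` object whose `search()`/`findall()` methods mirror the module-level functions:

```python
import re

text = "A78L41K"

digit_pattern = re.compile(r"\d+")  # compiled once, reusable on many strings
m = digit_pattern.search(text)
print(m.group())                    # 78
print(digit_pattern.findall(text))  # ['78', '41']
```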

#### Find non decimal digits with search function

```python
text = "8PM19MIN"
```
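A minimal sketch with ``\D`` (any character that is not a digit):

```python
import re

text = "8PM19MIN"

first_non_digit = re.search(r"\D", text)
print(first_non_digit.group(), first_non_digit.span())  # P (1, 2)

# \D+ grabs the whole first run of non-digits
print(re.search(r"\D+", text).group())                  # PM
```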

#### Find phone number pattern

```python
text = 'My phone number is 1234567890'
```
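A sketch assuming the phone number is ten consecutive digits:

```python
import re

text = 'My phone number is 1234567890'

# \d{10} is shorthand for writing \d ten times
phone = re.search(r"\d{10}", text)
print(phone.group())  # 1234567890
```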

#### Find phone number pattern by grouping
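A sketch that reuses the same `text` and assumes a 3-3-4 split of the number (the split is an assumption for illustration):

```python
import re

text = 'My phone number is 1234567890'

# Each pair of parentheses creates one capture group
phone = re.search(r"(\d{3})(\d{3})(\d{4})", text)
print(phone.group())   # 1234567890  (group 0 is the whole match)
print(phone.group(1))  # 123
print(phone.groups())  # ('123', '456', '7890')
```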

#### Escape the parentheses and create two groups: the first group is `(415)`, the second is `555-1212`

```python
text = 'My phone number is (415) 555-1212'
```
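A sketch: ``\(`` and ``\)`` match literal parentheses, while the unescaped outer `(...)` pairs are the capture groups:

```python
import re

text = 'My phone number is (415) 555-1212'

phone = re.search(r"(\(\d{3}\)) (\d{3}-\d{4})", text)
print(phone.group(1))  # (415)
print(phone.group(2))  # 555-1212
```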

## ``re.match(pattern, string, flags=0)``

Try to apply the pattern at the start of the string, returning a Match object, or None if no match was found.

To locate a match anywhere in the string, use `search()` instead of `match()`.

```python
text = "A78L41K"
```
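A sketch contrasting `match()` (anchored at the start) with `search()`:

```python
import re

text = "A78L41K"

print(re.match(r"[A-Z]", text))  # <re.Match object; span=(0, 1), match='A'>
print(re.match(r"\d", text))     # None: the string does not start with a digit
print(re.search(r"\d", text))    # <re.Match object; span=(1, 2), match='7'>
```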

## ``re.fullmatch(pattern, string, flags=0)``

Try to apply the pattern to all of the string, returning a Match object only if the entire string matches, or None otherwise.

```python
text = "A78L41K"
```
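A sketch: the pattern has to describe the whole string for `fullmatch()` to succeed, whereas `match()` accepts a matching prefix:

```python
import re

text = "A78L41K"

whole = re.fullmatch(r"[A-Z]\d{2}[A-Z]\d{2}[A-Z]", text)  # describes all 7 characters
prefix_only = re.fullmatch(r"[A-Z]\d{2}", text)            # only 'A78' fits
print(whole.group())                           # A78L41K
print(prefix_only)                             # None
print(re.match(r"[A-Z]\d{2}", text).group())   # A78
```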

## ``re.findall(pattern, string, flags=0)``

Return a list of all non-overlapping matches in the string.

#### Extract numbers from text as a list

```python
text = "O 1, t 10, o 100. 100000"
```
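A minimal sketch: ``\d+`` matches each run of consecutive digits, so every number comes back as one string:

```python
import re

text = "O 1, t 10, o 100. 100000"

numbers = re.findall(r"\d+", text)
print(numbers)                    # ['1', '10', '100', '100000']
print([int(n) for n in numbers])  # [1, 10, 100, 100000]
```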

#### Extract words beginning with "f"

```python
text = 'which foot or hand fell fastest'
```
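A sketch: ``\b`` anchors the match at the start of a word, so only words that begin with "f" are returned:

```python
import re

text = 'which foot or hand fell fastest'

f_words = re.findall(r"\bf\w+", text)
print(f_words)  # ['foot', 'fell', 'fastest']
```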

#### Extract assignments such as `width=20` made up of words and numbers

```python
text = 'set width=20 and height=10'
```
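A sketch showing both forms: without groups `findall()` returns whole matches, with groups it returns tuples of the captured parts:

```python
import re

text = 'set width=20 and height=10'

pairs_text = re.findall(r"\w+=\d+", text)  # whole expressions
pairs = re.findall(r"(\w+)=(\d+)", text)   # (name, value) tuples
print(pairs_text)   # ['width=20', 'height=10']
print(pairs)        # [('width', '20'), ('height', '10')]
print(dict(pairs))  # {'width': '20', 'height': '10'}
```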

#### Check if the string starts with 'hello'

```python
text = "hello world"
```
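A minimal sketch with the ``^`` anchor:

```python
import re

text = "hello world"

print(re.findall(r"^hello", text))       # ['hello']
print(re.findall(r"^world", text))       # []
print(bool(re.search(r"^hello", text)))  # True
```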

#### Check if the string ends with 'world'
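The mirror image with the ``$`` anchor (same sample string as above):

```python
import re

text = "hello world"

print(re.findall(r"world$", text))  # ['world']
print(re.findall(r"hello$", text))  # []
```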

## ``re.sub(pattern, repl, string, count=0, flags=0)``

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.  repl can be either a string or a callable; if a string, backslash escapes in it are processed.  If it is a callable, it's passed the Match object and must return a replacement string to be used.
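To illustrate the two kinds of `repl` described above, a sketch on the `width=20` sentence used in the `findall()` examples:

```python
import re

text = 'set width=20 and height=10'

# repl as a string: \1 and \2 refer back to the captured groups
swapped = re.sub(r"(\w+)=(\d+)", r"\2=\1", text)
print(swapped)     # set 20=width and 10=height

# repl as a callable: it receives each Match object and returns the replacement
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)     # set width=40 and height=20

# count limits how many occurrences are replaced (0, the default, means all)
first_only = re.sub(r"\d+", "N", text, count=1)
print(first_only)  # set width=N and height=10
```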

#### Remove anything other than digits

```python
text = "2004-959-559 # This is Phone Number"
```
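A minimal sketch: replace every non-digit (``\D``) with an empty string:

```python
import re

text = "2004-959-559 # This is Phone Number"

digits_only = re.sub(r"\D", "", text)
print(digits_only)  # 2004959559
```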

#### Replace the digits with "."
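The exercise can be read two ways, so both are sketched: replacing every single digit, or every run of digits, with `.`:

```python
import re

text = "2004-959-559 # This is Phone Number"

each_digit = re.sub(r"\d", ".", text)  # one dot per digit
each_run = re.sub(r"\d+", ".", text)   # one dot per run of digits
print(each_digit)  # ....-...-... # This is Phone Number
print(each_run)    # .-.-. # This is Phone Number
```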

## ``re.split(pattern, string, maxsplit=0, flags=0)``

Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.

```python
text = "ab56cd78_de fg3hıi49"
```
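A sketch splitting on runs of digits; because the string ends with digits, the result ends with an empty string. `maxsplit` limits the number of splits:

```python
import re

text = "ab56cd78_de fg3hıi49"

parts = re.split(r"\d+", text)
print(parts)        # ['ab', 'cd', '_de fg', 'hıi', '']

first_split = re.split(r"\d+", text, maxsplit=1)
print(first_split)  # ['ab', 'cd78_de fg3hıi49']
```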

```python
re.findall(r"\d+", text)
```

```
['56', '78', '3', '49']
```

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Pandas Functions Accepting RegEx</p>

<a id="4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

- **count():** Count occurrences of pattern in each string of the Series/Index
- **replace():** Replace the search string or pattern with the given value
- **contains():** Test if pattern or regex is contained within a string of a Series or Index. Calls re.search() and returns a boolean
- **findall():** Find all occurrences of pattern or regular expression in the Series/Index. Equivalent to applying re.findall() on all elements
- **match():** Determine if each string matches a regular expression. Calls re.match() and returns a boolean
- **split():** Split strings around given separator/delimiter and accepts string or regular expression to split on
- **extract():** Extract capture groups in the regex pat as columns in a DataFrame and returns the captured groups

```python
data = [['Evert van Dijk', 'Carmine-pink, salmon-pink streaks, stripes, flecks. #94569# Warm pink, clear carmine pink, rose pink shaded salmon.  Mild fragrance.  Large, very double, in small clusters, high-centered bloom form.  Blooms in flushes throughout the season.'],
        ['Every Good Gift', 'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.'],
        ['Evghenya', 'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.'],
        ['Evita', 'White or white blend.  None to mild fragrance.  35 petals #9897#.  Large, full (26-40 petals), high-centered bloom form.  Blooms in flushes throughout the season.'],
        ['Evrathin', 'Light pink. [Deep pink.]  Outer petals white. Expand rarely #679754YH89#.  Mild fragrance.  35 to 40 petals.  Average diameter 2.5".  Medium, double (17-25 petals), full (26-40 petals), cluster-flowered, in small clusters bloom form.  Prolific, once-blooming spring or summer.  Glandular sepals, leafy sepals, long sepals buds.'],
        ['Evita 2', 'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.']]

df = pd.DataFrame(data, columns=['name', 'bloom'])
df
```

|   | name | bloom |
|---|------|-------|
| 0 | Evert van Dijk | Carmine-pink, salmon-pink streaks, stripes, fl... |
| 1 | Every Good Gift | Red. Flowers velvety red. #079463895689# Mod... |
| 2 | Evghenya | Orange-pink. 75 petals. Large, very double #... |
| 3 | Evita | White or white blend. None to mild fragrance.... |
| 4 | Evrathin | Light pink. [Deep pink.] Outer petals white. ... |
| 5 | Evita 2 | White, blush shading. Mild, wild rose fragran... |


## ``pandas.Series.str.count(pat, flags=0)``

Count occurrences of pattern in each string of the Series/Index.

It counts how many times a regex pattern occurs in each string element of the Series.

#### How many numerical values are there in each row of "bloom" feature?
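A self-contained sketch: it rebuilds rows 1, 2 and 5 of the "bloom" column as a Series named `bloom` (a subset chosen for brevity); in the notebook, `df["bloom"]` can be used directly. Each run of digits counts as one value, so `1.25` counts twice unless decimals are allowed for:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

num_counts = bloom.str.count(r"\d+")                 # every run of digits
print(num_counts.tolist())                           # [4, 3, 5]

num_counts_dec = bloom.str.count(r"\d+(?:\.\d+)?")   # 1.25 counted as one value
print(num_counts_dec.tolist())                       # [4, 3, 4]
```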

#### How many characters are there in each row of "bloom" feature?
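A sketch on the same three-row subset of "bloom" (use `df["bloom"]` in the notebook). "Characters" is read here as all characters; the letters-only count is an alternative reading:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

char_counts = bloom.str.len()                  # every character, spaces included
regex_counts = bloom.str.count(r".")           # "." matches any character except "\n"
letter_counts = bloom.str.count(r"[A-Za-z]")   # letters only
print(char_counts.tolist())
print((char_counts == regex_counts).all())     # True: these strings contain no newlines
```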

#### How many sentences are there in each row of "bloom" feature?
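A heuristic sketch on the same three-row subset: a sentence is assumed to end with "." followed by whitespace or the end of the string. A bare ``\.`` would also count the decimal point in `1.25`:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

sentence_counts = bloom.str.count(r"\.(?=\s|$)")  # "." followed by whitespace or end
naive_counts = bloom.str.count(r"\.")             # every dot, decimals included
print(sentence_counts.tolist())  # [6, 4, 6]
print(naive_counts.tolist())     # [6, 4, 7]
```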

## ``pandas.Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=None)``

Replace each occurrence of pattern/regex in the Series/Index.

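Equivalent to `str.replace()` or `re.sub()`, depending on the `regex` value. Since pandas 2.0 `regex` defaults to `False`, so pass `regex=True` explicitly when `pat` is a regular expression.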

#### Remove the values between two "#" characters (including the "#" characters themselves) from each row of the "bloom" feature
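A sketch on the same three-row subset of "bloom". `#.*?#` means a "#", the shortest possible run of characters, then the next "#"; `regex=True` is passed explicitly because of the pandas 2.0 default:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

cleaned = bloom.str.replace(r"#.*?#", "", regex=True)
print(cleaned[1])
# Orange-pink.  75 petals.  Large, very double  bloom form.  Blooms in flushes throughout the season.
```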

## ``pandas.Series.str.contains(pat, case=True, flags=0, na=None, regex=True)``

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

#### Which rows of the "bloom" feature include a "diameter" value?
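A sketch on the same three-row subset of "bloom"; the boolean result can be used directly as a filter:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

has_diameter = bloom.str.contains("diameter")
print(has_diameter.tolist())               # [True, False, True]
print(bloom[has_diameter].index.tolist())  # [0, 2]
```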

## ``pandas.Series.str.findall(pat, flags=0)``

Find all occurrences of pattern or regular expression in the Series/Index.

Equivalent to applying re.findall() to all the elements in the Series/Index.

#### Find all numeric values in each row of the "bloom" feature
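A sketch on the same three-row subset; the optional non-capturing group `(?:\.\d+)?` keeps decimals such as `1.25` together:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

numbers = bloom.str.findall(r"\d+(?:\.\d+)?")
print(numbers.tolist())
# [['079463895689', '4', '26', '40'], ['75', '68345', '686'], ['876', '20', '25', '1.25']]
```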

#### Find the diameter values in each row of the "bloom" feature
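A sketch on the same three-row subset, assuming the value always follows the word "diameter" and a space; with one capture group, `findall()` returns only the captured number:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

diameters = bloom.str.findall(r"diameter (\d+(?:\.\d+)?)")
print(diameters.tolist())  # [['4'], [], ['1.25']]
```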

## ``pandas.Series.str.match(pat, case=True, flags=0, na=None)``

Determine if each string starts with a match of a regular expression.

#### Find the rows of pink blooms (this information is available in the first words of the rows)
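A sketch on the same three-row subset. `str.match()` is anchored at the start of each string; the pattern ``\w+[- ]pink`` (a first word followed by "-pink" or " pink") is an assumption about how the colours are written. On the full `df` it would also pick up "Carmine-pink" (row 0) and "Light pink" (row 4):

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

is_pink = bloom.str.match(r"\w+[- ]pink")
print(is_pink.tolist())                  # [False, True, False]
print(bloom[is_pink].str[:11].tolist())  # ['Orange-pink']
```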

## ``pandas.Series.str.split(pat=None, n=-1, expand=False, *, regex=None)``

Split strings around given separator/delimiter.

Splits the string in the Series/Index from the beginning, at the specified delimiter string.

#### Split each row of the "bloom" feature into sentences at the dot character
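A sketch on the same three-row subset. It splits at a dot followed by whitespace, so the decimal point in `1.25` is not a split point; the `regex` keyword assumes pandas 1.4 or later. Adding `expand=True` would return a DataFrame instead of lists:

```python
import pandas as pd

# Rows 1, 2 and 5 of df["bloom"], copied verbatim
bloom = pd.Series([
    'Red.  Flowers velvety red.  #079463895689# Moderate fragrance.  Average diameter 4".  Medium-large, full (26-40 petals), borne mostly solitary bloom form.  Blooms in flushes throughout the season.',
    'Orange-pink.  75 petals.  Large, very double #68345_686# bloom form.  Blooms in flushes throughout the season.',
    'White, blush shading.  Mild, wild rose fragrance #AGHJS876IOP#.  20 to 25 petals.  Average diameter 1.25".  Small, very double, cluster-flowered bloom form.  Blooms in flushes throughout the season.',
])

sentences = bloom.str.split(r"\.\s+", regex=True)
print(sentences[1])
# ['Orange-pink', '75 petals', 'Large, very double #68345_686# bloom form', 'Blooms in flushes throughout the season.']
```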

```python
info = ["id:345, age:25, salary:1200", "id:346, age:32, salary:1500", "id:347, age:28, salary:1400"]
s = pd.Series(info)
s
```

```
0    id:345, age:25, salary:1200
1    id:346, age:32, salary:1500
2    id:347, age:28, salary:1400
dtype: object
```

#### Split the Series to create a DataFrame with "id", "age" and "salary" columns.
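One possible sketch: split on ":" or "," (plus any following spaces), keep the value columns 1, 3 and 5, then rename and convert them. The `regex` keyword assumes pandas 1.4 or later:

```python
import pandas as pd

info = ["id:345, age:25, salary:1200", "id:346, age:32, salary:1500", "id:347, age:28, salary:1400"]
s = pd.Series(info)

# columns 0..5 hold: 'id', '345', 'age', '25', 'salary', '1200'
parts = s.str.split(r"[:,]\s*", expand=True, regex=True)
df_split = (parts[[1, 3, 5]]
            .rename(columns={1: "id", 3: "age", 5: "salary"})
            .astype(int))
print(df_split)
```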

## ``pandas.Series.str.extract(pat, flags=0, expand=True)``

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.

#### Extract just numbers

```python
s = pd.Series(['a3aa', 'b4aa', 'c5aa'])
s
```

```
0    a3aa
1    b4aa
2    c5aa
dtype: object
```
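A sketch: by default `extract()` returns a DataFrame (one column per group); with a single group and `expand=False` it returns a Series:

```python
import pandas as pd

s = pd.Series(['a3aa', 'b4aa', 'c5aa'])

digits_df = s.str.extract(r"(\d)")             # DataFrame with one column named 0
digits = s.str.extract(r"(\d)", expand=False)  # Series
print(digits.tolist())            # ['3', '4', '5']
print(digits.astype(int).sum())   # 12
```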

#### Extract just letters
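A sketch using two named groups, one for the letters before the digit and one for those after it (the group names are made up for illustration); the names become the column labels:

```python
import pandas as pd

s = pd.Series(['a3aa', 'b4aa', 'c5aa'])

letters = s.str.extract(r"(?P<first>[a-z]+)\d(?P<rest>[a-z]+)")
print(letters)
```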

#### Extract "id, age and salary" values to create a dataframe consisting of "id, age and salary" columns.

```python
info = ["id:345, age:25, salary:1200", "id:346, age:32, salary:1500", "id:347, age:28, salary:1400"]
s = pd.Series(info)
s
```

```
0    id:345, age:25, salary:1200
1    id:346, age:32, salary:1500
2    id:347, age:28, salary:1400
dtype: object
```
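A sketch with one named group per field; `extract()` turns the group names into column names:

```python
import pandas as pd

info = ["id:345, age:25, salary:1200", "id:346, age:32, salary:1500", "id:347, age:28, salary:1400"]
s = pd.Series(info)

pattern = r"id:(?P<id>\d+), age:(?P<age>\d+), salary:(?P<salary>\d+)"
df_info = s.str.extract(pattern).astype(int)
print(df_info)
```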

#### Extract first number

```python
s = pd.Series(['40 l/100 km (comb)',
               '38 l/100 km (comb)', '6.4 l/100 km (comb)',
               '8.3 kg/100 km (comb)', '5.1 kg/100 km (comb)',
               '5.4 l/100 km (comb)', '6.7 l/100 km (comb)',
               '6.2 l/100 km (comb)', '7.3 l/100 km (comb)',
               '6.3 l/100 km (comb)', '5.7 l/100 km (comb)',
               '6.1 l/100 km (comb)', '6.8 l/100 km (comb)',
               '7.5 l/100 km (comb)', '7.4 l/100 km (comb)',
               '3.6 kg/100 km (comb)', '0 l/100 km (comb)',
               '7.8 l/100 km (comb)'])
s
```

```
0       40 l/100 km (comb)
1       38 l/100 km (comb)
2      6.4 l/100 km (comb)
3     8.3 kg/100 km (comb)
4     5.1 kg/100 km (comb)
5      5.4 l/100 km (comb)
6      6.7 l/100 km (comb)
7      6.2 l/100 km (comb)
8      7.3 l/100 km (comb)
9      6.3 l/100 km (comb)
10     5.7 l/100 km (comb)
11     6.1 l/100 km (comb)
12     6.8 l/100 km (comb)
13     7.5 l/100 km (comb)
14     7.4 l/100 km (comb)
15    3.6 kg/100 km (comb)
16       0 l/100 km (comb)
17     7.8 l/100 km (comb)
dtype: object
```
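A sketch on four rows of the Series above (a subset for brevity): the leading number may be an integer or a decimal, so the decimal part is optional:

```python
import pandas as pd

# A subset of the Series above
s = pd.Series(['40 l/100 km (comb)', '6.4 l/100 km (comb)',
               '8.3 kg/100 km (comb)', '0 l/100 km (comb)'])

first_number = s.str.extract(r"^(\d+(?:\.\d+)?)", expand=False).astype(float)
print(first_number.tolist())  # [40.0, 6.4, 8.3, 0.0]
```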

#### Extract first and second number
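A sketch on the same subset: group 1 is the leading number, then the unit ("l" or "kg") and a slash are skipped, and group 2 is the second number:

```python
import pandas as pd

# A subset of the Series above
s = pd.Series(['40 l/100 km (comb)', '6.4 l/100 km (comb)',
               '8.3 kg/100 km (comb)', '0 l/100 km (comb)'])

numbers = s.str.extract(r"^(\d+(?:\.\d+)?)\s*\w+/(\d+)")
print(numbers)
```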

#### Extract date as month and year separately

```python
s = pd.Series(['06/2020\n\n4.9 l/100 km (comb)',
               '11/2020\n\n166 g CO2/km (comb)',
               '10/2019\n\n5.3 l/100 km (comb)',
               '05/2022\n\n6.3 l/100 km (comb)',
               '07/2019\n\n128 g CO2/km (comb)',
               '06/2022\n\n112 g CO2/km (comb)',
               '01/2022\n\n5.8 l/100 km (comb)',
               '11/2020\n\n106 g CO2/km (comb)',
               '04/2019\n\n105 g CO2/km (comb)',
               '08/2020\n\n133 g CO2/km (comb)',
               '04/2022\n\n133 g CO2/km (comb)'])
s
```

```
0     06/2020\n\n4.9 l/100 km (comb)
1     11/2020\n\n166 g CO2/km (comb)
2     10/2019\n\n5.3 l/100 km (comb)
3     05/2022\n\n6.3 l/100 km (comb)
4     07/2019\n\n128 g CO2/km (comb)
5     06/2022\n\n112 g CO2/km (comb)
6     01/2022\n\n5.8 l/100 km (comb)
7     11/2020\n\n106 g CO2/km (comb)
8     04/2019\n\n105 g CO2/km (comb)
9     08/2020\n\n133 g CO2/km (comb)
10    04/2022\n\n133 g CO2/km (comb)
dtype: object
```
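A sketch on the first three rows of the Series above, with named groups for month and year:

```python
import pandas as pd

# First three rows of the Series above
s = pd.Series(['06/2020\n\n4.9 l/100 km (comb)',
               '11/2020\n\n166 g CO2/km (comb)',
               '10/2019\n\n5.3 l/100 km (comb)'])

dates = s.str.extract(r"(?P<month>\d{2})/(?P<year>\d{4})")
print(dates)
```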

#### Extract the date and the consumption value (e.g. `4.9`)
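A sketch on the same first three rows: ``\s+`` skips the two newlines between the date and the value. Some rows hold CO2 emissions (`g CO2/km`) rather than fuel consumption, so the value column mixes units:

```python
import pandas as pd

# First three rows of the Series above
s = pd.Series(['06/2020\n\n4.9 l/100 km (comb)',
               '11/2020\n\n166 g CO2/km (comb)',
               '10/2019\n\n5.3 l/100 km (comb)'])

parsed = s.str.extract(r"(?P<date>\d{2}/\d{4})\s+(?P<value>\d+(?:\.\d+)?)")
print(parsed)
```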

#### Extract date as month and year separately (the value now comes before the date)

```python
s = pd.Series(['\n\n4.9 06/2020 l/100 km (comb)',
               '\n\n166 11/2020 g CO2/km (comb)',
               '\n\n5.3 10/2019 l/100 km (comb)',
               '\n\n6.3 05/2022 l/100 km (comb)',
               '\n\n128 07/2019 g CO2/km (comb)',
               '\n\n112 06/2022 g CO2/km (comb)',
               '\n\n5.8 01/2022 l/100 km (comb)'])
s
```

```
0    \n\n4.9 06/2020 l/100 km (comb)
1    \n\n166 11/2020 g CO2/km (comb)
2    \n\n5.3 10/2019 l/100 km (comb)
3    \n\n6.3 05/2022 l/100 km (comb)
4    \n\n128 07/2019 g CO2/km (comb)
5    \n\n112 06/2022 g CO2/km (comb)
6    \n\n5.8 01/2022 l/100 km (comb)
dtype: object
```
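A sketch on the first three rows of this Series. `extract()` looks for the first match anywhere in the string (like `re.search()`), so the same date pattern still works when the value comes first; a longer pattern can capture the value too:

```python
import pandas as pd

# First three rows of the Series above
s = pd.Series(['\n\n4.9 06/2020 l/100 km (comb)',
               '\n\n166 11/2020 g CO2/km (comb)',
               '\n\n5.3 10/2019 l/100 km (comb)'])

dates = s.str.extract(r"(?P<month>\d{2})/(?P<year>\d{4})")
parsed = s.str.extract(r"(?P<value>\d+(?:\.\d+)?)\s+(?P<month>\d{2})/(?P<year>\d{4})")
print(parsed)
```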

## Example For Slides

```python
text = "my email address is example@gmail.com"
```
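A simplified sketch for pulling the address out of the sentence. The pattern (local part, "@", domain, ".", top-level domain) is an illustration only and does not cover every valid e-mail format:

```python
import re

text = "my email address is example@gmail.com"

email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
print(email.group())  # example@gmail.com
```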

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:150%; text-align:center; border-radius:10px 10px;">The End of The Session - 11 (Part - 01)</p>

<a id="5"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>


____