# Regular Expressions in Python

## 1. Basic Concepts of String Manipulation

### 1.1 Introduction to string manipulation

#### 1.1.1 Strings
* **Quotes: single quotes `''` or double quotes `""`**

In [1]:
my_string = "This is a string"
my_string2 = 'This is also a string'

In [3]:
my_string = 'And this? It's the wrong string'

SyntaxError: unterminated string literal (detected at line 1) (1324863342.py, line 1)

In [4]:
my_string = "And this? It's the correct string"

* **Length: `len()`**

In [6]:
my_string = "Awesome day"
len(my_string)

11

* **Convert to string: `str()`**

In [7]:
str(123)

'123'

#### 1.1.2 Concatenation: `+` operator

In [9]:
my_string1 = "Awesome day"
my_string2 = "for biking"
print(my_string1 + " " + my_string2)

Awesome day for biking


#### 1.1.3 Indexing: Bracket notation `[]`
* **Indices start at 0 from the left and at -1 from the right**

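|Character      | M | Y | _ | S | T | R | I | N | G |
|:--------------|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|Index          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|Reverse index  | -9| -8| -7| -6| -5| -4| -3| -2| -1|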

In [27]:
my_string = "MY_STRING"
print(my_string[3])
print(my_string[-1])

S
G


#### 1.1.4 Slicing: Bracket notation and colon notation `[:]`
* **Include left and exclude right**

In [28]:
my_string = "MY_STRING"
print(my_string[0:3])

MY_


* **Omitting the start or end index slices from the beginning or to the end**

In [29]:
print(my_string[:5])
print(my_string[5:])

MY_ST
RING


#### 1.1.5 Stride: `[start:end:step]`

In [30]:
my_string = "MY_STRING"
print(my_string[0:6:2])
print(my_string[::-1])

M_T
GNIRTS_YM


### 1.2 String operations

#### 1.2.1 Adjusting cases

* **Converting to lowercase: `.lower()`**

In [16]:
my_string = "tHis Is a niCe StriNg"
print(my_string.lower())

this is a nice string


* **Converting to uppercase: `.upper()`**

In [17]:
my_string = "tHis Is a niCe StriNg"
print(my_string.upper())

THIS IS A NICE STRING


* **Capitalizing the first character and lowercasing the rest: `.capitalize()`**

my_string = "tHis Is a niCe StriNg"
print(my_string.capitalize())

This is a nice string

#### 1.2.2 Splitting 
* **Splitting a string into a list of substrings: `.split()`, `.rsplit()`**

In [21]:
my_string = "This string will be split"
my_string.split(sep=" ", maxsplit=2)

['This', 'string', 'will be split']

In [22]:
my_string.rsplit(sep=" ", maxsplit=2)

['This string will', 'be', 'split']

* **Escape Sequence**

|Escape Sequence |Character       |
|:--------------:|:--------------:|
| `\n`           | Newline        |
| `\r`           | Carriage return|

In [23]:
my_string = "This string will be split\nin two"
print(my_string)

This string will be split
in two


In [32]:
my_string = "This string will be split\rin two"
print(my_string)

in twotring will be split


* **Breaking at line boundaries: `.splitlines()`**

In [34]:
my_string = "This string will be split\nin two"
my_string.splitlines()

['This string will be split', 'in two']

#### 1.2.3 Joining

* **Concatenate strings from list or another iterable: `sep.join(iterable)`**

In [35]:
my_list = ["this", "would", "be", "a", "string"]
print(" ".join(my_list))

this would be a string


#### 1.2.4 Stripping characters

* **Strip characters from both ends: `.strip()`**  
  With no argument, it removes leading and trailing whitespace, here the leading space and the trailing newline

In [36]:
my_string = " This string will be stripped\n"
my_string.strip()

'This string will be stripped'

* **Remove characters from the right end: `.rstrip()`**

In [37]:
my_string = " This string will be stripped\n"
my_string.rstrip()

' This string will be stripped'

* **Remove characters from the left end: `.lstrip()`**

In [38]:
my_string = " This string will be stripped\n"
my_string.lstrip()

'This string will be stripped\n'
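As a small sketch of the behaviour above (the sample strings and variable names are invented for illustration): the strip methods only touch the ends of a string, and each of them accepts an optional argument listing the characters to remove instead of whitespace.

```python
# Illustrative values only; not taken from the original notebook.
padded = "  \tdata science  \n"

# With no argument, whitespace (spaces, tabs, newlines) is removed from the
# ends only; the space inside the string is kept.
print(repr(padded.strip()))    # 'data science'
print(repr(padded.lstrip()))   # 'data science  \n'
print(repr(padded.rstrip()))   # '  \tdata science'

# An optional argument lists the characters to strip instead of whitespace.
tagged = "$$price$$"
print(tagged.strip("$"))       # price
```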

### 1.3 Finding and replacing

#### 1.3.1 Finding substrings

* **Search target string for a specified substring: `string.find(substring, start, end)`**

In [39]:
my_string = "Where's Waldo?"
my_string.find("Waldo")

8

In [49]:
my_string.find("Wenda")
# the substring is not found, so the method returns -1

-1

In [41]:
my_string.find("Waldo", 0, 6)

-1

#### 1.3.2 Index function

* **Search target string for a specified substring: `string.index(substring, start, end)`**

In [42]:
my_string = "Where's Waldo?"
my_string.index("Waldo")

8

In [50]:
my_string.index("Wenda")
# the substring is not found, so the method raises a ValueError

ValueError: substring not found

In [44]:
my_string = "Where's Waldo?"
try:
    my_string.index("Wenda")
except ValueError:
    print("Not found")

Not found


#### 1.3.3 Counting occurrences

* **Return number of occurrences for a specified substring: `string.count(substring, start, end)`**

In [45]:
my_string = "How many fruits do you have in your fruit basket?"
my_string.count("fruit")

2

In [46]:
my_string.count("fruit", 0, 16)

1

#### 1.3.4 Replacing substring

* **Replace occurrences of substring with new substring: `string.replace(old, new, count)`**

In [47]:
my_string = "The red house is between the blue house and the old house"
print(my_string.replace("house", "car"))

The red car is between the blue car and the old car


In [51]:
print(my_string.replace("house", "car", 2))
# replace first two occurrences

The red car is between the blue car and the old house


## 2. Formatting Strings

### 2.1 Positional formatting

#### 2.1.1 What is string formatting?

* **String interpolation**
* **Insert a custom string or variable in predefined text**

In [52]:
custom_string = "String formatting"
print(f"{custom_string} is a powerful technique")

String formatting is a powerful technique


* **Usage:**
    * Title in a graph
    * Show message or error
    * Pass a statement to a function

#### 2.1.2 Methods for formatting

* Positional formatting
* Formatted string literals
* Template method
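To make the three options concrete, the sketch below builds the same sentence with each of them. The variable names and sample values are illustrative assumptions; `string.Template` is the standard-library class usually meant by the template method.

```python
from string import Template

# Sample values, chosen only for illustration.
tool = "Python"
task = "string formatting"

# 1. Positional formatting with str.format()
by_format = "{} makes {} easy".format(tool, task)

# 2. Formatted string literal (f-string)
by_fstring = f"{tool} makes {task} easy"

# 3. Template string: $name placeholders are filled in by substitute()
by_template = Template("$tool makes $task easy").substitute(tool=tool, task=task)

print(by_format)
print(by_fstring)
print(by_template)
```

All three lines print `Python makes string formatting easy`; the methods differ in syntax, not in the result.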

#### 2.1.3 Positional formatting: `str.format()`

* Placeholders are replaced by values: `'text{}'.format(value)`

In [53]:
print("Machine learning provides {} the ability to learn {}".format("systems", "automatically"))

Machine learning provides systems the ability to learn automatically


* Use variables for both the initial string and the values passed into the method

In [54]:
my_string = "{} rely on {} datasets"
method = "Supervised algorithms"
condition = "labeled"
print(my_string.format(method, condition))

Supervised algorithms rely on labeled datasets


#### 2.1.4 Reordering values

* Include an index number in the placeholders to reorder values

In [55]:
print("{} has a friend called {} and a sister called {}".format("Betty", "Linda", "Daisy"))

Betty has a friend called Linda and a sister called Daisy


In [56]:
print("{2} has a friend called {0} and a sister called {1}".format("Betty", "Linda", "Daisy"))

Daisy has a friend called Betty and a sister called Linda


#### 2.1.5 Named placeholders: specify a name for the placeholders

In [57]:
tool="Unsupervised algorithms"
goal="patterns"
print("{title} try to find {aim} in the dataset".format(title=tool, aim=goal))

Unsupervised algorithms try to find patterns in the dataset


In [58]:
my_methods = {"tool": "Unsupervised algorithms", "goal": "patterns"}
print('{data[tool]} try to find {data[goal]} in the dataset'.format(data=my_methods))

Unsupervised algorithms try to find patterns in the dataset


#### 2.1.6 Format specifier

* Specify data type to be used: `{index:specifier}`

In [59]:
print("Only {0:f}% of the {1} produced worldwide is {2}!".format(0.5155675, "data", "analyzed"))

Only 0.515567% of the data produced worldwide is analyzed!


In [61]:
print("Only {0:.2f}% of the {1} produced worldwide is {2}!".format(0.5155675, "data", "analyzed"))

Only 0.52% of the data produced worldwide is analyzed!


#### 2.1.7 Formatting datetime

In [62]:
from datetime import datetime
print(datetime.now())

2023-07-11 15:30:53.090008


In [63]:
print("Today's date is {:%Y-%m-%d %H:%M}".format(datetime.now()))

Today's date is 2023-07-11 15:31


### 2.2 Formatted string literal

#### 2.2.1 f-strings

* Minimal syntax
* Add prefix `f` to string: `f"literal string {expression}"`

In [64]:
way = "code"
method = "learning Python faster"
print(f"Practicing how to {way} is the best method for {method}")

Practicing how to code is the best method for learning Python faster


#### 2.2.2 Type conversion

* Allowed conversions:
    * `!s` (string version)
    * `!r` (string containing a printable representation, i.e. with quotes)
    * `!a` (same as `!r`, but escapes non-ASCII characters)

In [66]:
name = "Python"
print(f"Python is called {name!r} due to a comedy series")

Python is called 'Python' due to a comedy series
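The three conversions can be compared side by side. This is a minimal sketch; the variable `city` and its value are made up, chosen because the string contains a non-ASCII character.

```python
# Illustrative value containing a non-ASCII character.
city = "Zürich"

print(f"{city!s}")   # Zürich       (same as str(city))
print(f"{city!r}")   # 'Zürich'     (same as repr(city))
print(f"{city!a}")   # 'Z\xfcrich'  (same as ascii(city))
```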


#### 2.2.3 Format specifiers
* Standard format specifiers:
    * `e` (scientific notation, e.g. 5e3, i.e. 5 × 10^3)
    * `d` (decimal integer, e.g. 4)
    * `f` (fixed-point float, e.g. 4.5353)

In [67]:
number = 90.41890417471841
print(f"In the last 2 years, {number:.2f}% of the data was produced worldwide!")

In the last 2 years, 90.42% of the data was produced worldwide!
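The example above only uses `f`. A short sketch of the other two specifiers, with made-up values, might look like this:

```python
# Illustrative values only.
distance = 5000.0
count = 4

print(f"{distance:e}")     # 5.000000e+03
print(f"{distance:.1e}")   # 5.0e+03
print(f"{count:d}")        # 4
print(f"{count:03d}")      # 004 (zero-padded to width 3)
```

Note that `d` only accepts integers; applying it to a float raises a `ValueError`.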


* `datetime`

In [70]:
from datetime import datetime
my_today = datetime.now()
print(f"Today's date is {my_today:%B %d, %Y}")

Today's date is July 11, 2023


#### 2.2.4 Index lookups

In [71]:
family = {"dad": "John", "siblings": "Peter"}
print("Is your dad called {family[dad]}?".format(family=family))

Is your dad called John?


* In f-strings, dictionary keys must be quoted in index lookups (unlike in `str.format()`)

In [92]:
family['dad']

'John'

In [90]:
print(f"Is your dad called {family['dad']}?")

Is your dad called John?


#### 2.2.5 Escape sequences

* Escape sequences: backslashes `\`

In [83]:
print("My dad is called "John"")

SyntaxError: invalid syntax. Perhaps you forgot a comma? (1677833228.py, line 1)

In [85]:
my_string = "My dad is called \"John\""
print(my_string)

My dad is called "John"


* Before Python 3.12, backslashes are not allowed inside the braces (the expression part) of an f-string; use a different quote type instead

In [87]:
family = {"dad": "John", "siblings": "Peter"}
print(f"Is your dad called {family[\"dad\"]}?")

SyntaxError: f-string expression part cannot include a backslash (2870725557.py, line 2)

In [88]:
print(f"Is your dad called {family['dad']}?")

Is your dad called John?


#### 2.2.6 Inline operations

* Advantage: evaluate expressions and call functions inline
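A minimal sketch of both cases follows. The variable names, values, and the helper `my_function` are hypothetical and serve only as an illustration.

```python
# Illustrative values and helper; none of them come from the original notebook.
my_number = 4
my_multiplier = 7


def my_function(a, b):
    """Return the quotient of a and b."""
    return a / b


# An arithmetic expression evaluated inside the braces
print(f"{my_number} multiplied by {my_multiplier} is {my_number * my_multiplier}")

# A function call evaluated inside the braces
print(f"If you divide 10 by 2, the result is {my_function(10, 2)}")
```

The first line prints `4 multiplied by 7 is 28` and the second prints `If you divide 10 by 2, the result is 5.0`.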

* **Template method**