### What are Strings?

<p>The following example shows a string enclosed in double quotation marks:</p>

In [1]:
"This is a string."

'This is a string.'

<p>We can also use single quotation marks:</p>

In [2]:
'This is a string.'

'This is a string.'

<p>A string can be a combination of spaces and digits:</p>

In [3]:
"1 2 3 4 5 6 7 8 9"

'1 2 3 4 5 6 7 8 9'

<p>A string can also be a combination of special characters:</p>

In [4]:
"@#2_#]&*^%$"

'@#2_#]&*^%$'

<p>We can print our string using the <code>print</code> function:</p>

In [5]:
print("Hello!")

Hello!


<p>We can bind or assign a string to a variable:</p>

In [6]:
name = "SuperHero"
name

'SuperHero'

### Indexing

<p>It is helpful to think of a string as an ordered sequence. Each element in the sequence can be accessed using its index, shown in the second row of the table:</p>

| S | u | p | e | r | H | e | r | o |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

<div class="alert alert-success alertsuccess" style="margin-top: 20px">
[Tip]: Because indexing starts at 0, the first element is at index 0.
</div>

In [7]:
print(name[0])

S


<p>We can access index 6:</p>

In [8]:
print(name[6])

e


<p>Moreover, we can access index 8, which holds the last element:</p>

In [9]:
print(name[8])

o


### Negative Indexing

<p>We can also use negative indexing with strings:</p>

| S  | u  | p  | e  | r  | H  | e  | r  | o  |
|----|----|----|----|----|----|----|----|----|
| -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

<p>Negative indexing counts elements from the end of the string.</p>

<p>The last element is given by the index -1:</p>

In [10]:
print(name[-1])

o


<p>The first element can be obtained by index -9:</p>

In [11]:
print(name[-9])

S


<p>We can find the number of characters in a string by using <code>len</code>, short for length:</p>

In [12]:
# Find the length of string
len(name)

9
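
The relationship between `len` and negative indexing can be sketched as follows. This is a minimal illustration that reuses the `name` string from above; the variable names `length`, `last_from_end`, `last_from_start`, `first_from_end`, and `out_of_range` are introduced here only for the example.

```python
# Assumes the same example string used throughout this section.
name = "SuperHero"
length = len(name)  # 9

# A negative index -k refers to the same element as index len(name) - k.
last_from_end = name[-1]
last_from_start = name[length - 1]
first_from_end = name[-length]

print(last_from_end, last_from_start, first_from_end)  # o o S

# Valid indices run from -len(name) to len(name) - 1.
# Anything outside that range raises IndexError.
try:
    name[length]
    out_of_range = False
except IndexError:
    out_of_range = True

print(out_of_range)  # True
```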

### Slicing

<p>We can obtain multiple characters from a string using slicing. For example, we can obtain the elements at indices 0 to 4 and the elements at indices 6 to 8:</p>

<table style="border: 1px solid">
<tr>
    <td style="border: 1px solid; text-align: center">S</td>
    <td style="border: 1px solid; text-align: center">u</td>
    <td style="border: 1px solid; text-align: center">p</td>
    <td style="border: 1px solid; text-align: center">e</td>
    <td style="border: 1px solid; text-align: center">r</td>
    <td style="border: 1px solid; text-align: center">H</td>
    <td style="border: 1px solid; text-align: center">e</td>
    <td style="border: 1px solid; text-align: center">r</td>
    <td style="border: 1px solid; text-align: center">o</td>
</tr>
<tr>
    <td style="border: 1px solid; text-align: center">0</td>
    <td style="border: 1px solid; text-align: center">1</td>
    <td style="border: 1px solid; text-align: center">2</td>
    <td style="border: 1px solid; text-align: center">3</td>
    <td style="border: 1px solid; text-align: center">4</td>
    <td style="border: 1px solid; text-align: center">5</td>
    <td style="border: 1px solid; text-align: center">6</td>
    <td style="border: 1px solid; text-align: center">7</td>
    <td style="border: 1px solid; text-align: center">8</td>
</tr>
<tr>
    <td style="border: 1px solid; text-align: center" colspan="5">name[0:5]</td>
    <td style="border: 1px solid; text-align: center" colspan="1"></td>
    <td style="border: 1px solid; text-align: center" colspan="3">name[6:9]</td>
</tr>
</table>

<div class="alert alert-success alertsuccess" style="margin-top: 20px">
[Tip]: When taking a slice, the first number is the start index (counting from 0) and the second number is the stop index, which is not included in the result. For example, name[0:5] returns the characters at indices 0 to 4.
</div>

In [13]:
name[0:5]

'Super'

In [14]:
name[6:9]

'ero'
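
As a short sketch of the slicing rules, the example below shows that the stop index is excluded and that either bound may be omitted. The variable names `first_part` and `second_part` are illustrative and do not appear in the original cells.

```python
name = "SuperHero"

first_part = name[0:5]   # indices 0, 1, 2, 3, 4 -> 'Super'
second_part = name[5:9]  # indices 5, 6, 7, 8 -> 'Hero'

# The stop index is excluded, so the slice length is stop - start.
print(len(first_part))  # 5

# Omitting a bound means "from the beginning" or "to the end".
print(name[:5])  # Super
print(name[5:])  # Hero

# Two slices that share a boundary rebuild the original string.
print(first_part + second_part == name)  # True
```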

### Stride

<p>We can also input a stride value as follows, with the '2' indicating that we are selecting every second character:</p>

| S | u | p | e | r | H | e | r | o |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

`name[::2]` evaluates to "Spreo".

In [15]:
# Get every second element: the elements at indices 0, 2, 4, ...
name[::2]

'Spreo'

<p>We can also combine slicing with a stride. In this case, we select the first five elements and then take every second one:</p>

In [16]:
# Get every second element in the range from index 0 to index 4
name[0:5:2]

'Spr'
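
A few further stride variations, given as a hedged sketch rather than part of the original notebook: starting at index 1 selects the odd positions, and a negative stride walks the string backwards. The variable names are chosen for this example only.

```python
name = "SuperHero"

even_positions = name[::2]   # indices 0, 2, 4, 6, 8
odd_positions = name[1::2]   # indices 1, 3, 5, 7
reversed_name = name[::-1]   # a stride of -1 steps from the end to the start

print(even_positions)  # Spreo
print(odd_positions)   # ueHr
print(reversed_name)   # oreHrepuS
```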

### Concatenate Strings

<p>We can concatenate or combine strings by using the addition operator (<code>+</code>), and the result is a new string that is a combination of both:</p>

In [17]:
statement = "I want to be a " + name + "."
statement

'I want to be a SuperHero.'

<p>To replicate values of a string we simply multiply the string by the number of times we would like to replicate it. In this case, the number is three. The result is a new string, and this new string consists of three copies of the original string:</p>

In [18]:
3 * name

'SuperHeroSuperHeroSuperHero'

You can also reassign a variable to a new string built from its old value. Concatenating <code>name</code> with another string produces a new string, so <code>name</code> changes from "SuperHero" to "SuperHero is a special nickname of mine.".

In [19]:
name = "SuperHero"
name = name + " is a special nickname of mine."
name

'SuperHero is a special nickname of mine.'
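
The phrase "the result is a new string" can be illustrated with a small sketch. It assumes a second variable, `original`, which is not in the notebook and is introduced only to keep a reference to the old value.

```python
name = "SuperHero"
original = name  # a second reference to the same string object

# Concatenation builds a new string and rebinds the variable name.
name = name + " is a special nickname of mine."

print(original)  # SuperHero
print(name)      # SuperHero is a special nickname of mine.

# Strings are immutable, so assigning to a single position is rejected.
try:
    original[0] = "s"
    assignment_allowed = True
except TypeError:
    assignment_allowed = False

print(assignment_allowed)  # False
```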

### Escape Sequences

<p>A backslash marks the beginning of an escape sequence. Escape sequences represent characters that may be difficult to type directly. For example, <code>\n</code> represents a new line, so the output continues on a new line wherever <code>\n</code> appears:</p>

In [20]:
print("SuperHero\n is one of the special symbols in my childhood.")

SuperHero
 is one of the special symbols in my childhood.


<p>Similarly, <code>\t</code> represents a tab:</p>

In [21]:
print("SuperHero \t is one of the special symbols in my childhood.")

SuperHero 	 is one of the special symbols in my childhood.


<p>If you want to place a backslash in your string, use a double backslash:</p>

In [22]:
print("SuperHero \\ is one of the special symbols in my childhood.")

SuperHero \ is one of the special symbols in my childhood.


<p>We can also place an <code>r</code> before the opening quotation mark to create a raw string, in which backslashes are treated as literal characters:</p>

In [23]:
print(r"SuperHero \ is one of the special symbols in my childhood.")

SuperHero \ is one of the special symbols in my childhood.
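
To make the difference between an escape sequence and a raw string concrete, the sketch below compares string lengths. The variables `escaped` and `raw` are illustrative names, not part of the original cells.

```python
escaped = "a\nb"  # 3 characters: 'a', a newline, 'b'
raw = r"a\nb"     # 4 characters: 'a', a backslash, 'n', 'b'

print(len(escaped), len(raw))  # 3 4

# A raw string is equivalent to doubling every backslash.
print(raw == "a\\nb")  # True

# Printing shows the difference: the first spans two lines, the second does not.
print(escaped)
print(raw)
```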


### String Manipulation Operations

<p>Python provides many string methods for manipulating text. We are going to try some of the basic ones.</p>

<p>Let's try with the method <code>upper</code>; this method converts lower case characters to upper case characters:</p>

In [24]:
a = "I love Python very very much!"
print("Before upper: ", a)
b = a.upper()
print("After upper: ", b)

Before upper:  I love Python very very much!
After upper:  I LOVE PYTHON VERY VERY MUCH!


<p>The method <code>replace</code> replaces a segment of the string, i.e. a substring, with a new string. The first argument is the part of the string we would like to change. The second argument is what we would like to replace it with, and the result is a new string with the segment changed:</p>

In [25]:
a = "My favourite singer is Jay Chow."
b = a.replace("Jay Chow", "Eason Chan")
b

'My favourite singer is Eason Chan.'
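
As a brief sketch of how `replace` behaves with repeated substrings: by default every occurrence is replaced, and an optional third argument limits the number of replacements. The sample sentence below is made up for this illustration.

```python
a = "very very very good"

all_replaced = a.replace("very", "really")
first_only = a.replace("very", "really", 1)  # replace at most one occurrence

print(all_replaced)  # really really really good
print(first_only)    # really very very good

# The original string is left untouched; replace returns a new string.
print(a)  # very very very good
```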

<p>The method <code>find</code> finds a substring. The argument is the substring you would like to find, and the output is the index at which its first occurrence starts. We can find the substring <code>Super</code> or <code>Hero</code>.</p>

<table style="border: 1px solid">
<tr>
    <td style="border: 1px solid; text-align: center">S</td>
    <td style="border: 1px solid; text-align: center">u</td>
    <td style="border: 1px solid; text-align: center">p</td>
    <td style="border: 1px solid; text-align: center">e</td>
    <td style="border: 1px solid; text-align: center">r</td>
    <td style="border: 1px solid; text-align: center">H</td>
    <td style="border: 1px solid; text-align: center">e</td>
    <td style="border: 1px solid; text-align: center">r</td>
    <td style="border: 1px solid; text-align: center">o</td>
</tr>
<tr>
    <td style="border: 1px solid; text-align: center">0</td>
    <td style="border: 1px solid; text-align: center">1</td>
    <td style="border: 1px solid; text-align: center">2</td>
    <td style="border: 1px solid; text-align: center">3</td>
    <td style="border: 1px solid; text-align: center">4</td>
    <td style="border: 1px solid; text-align: center">5</td>
    <td style="border: 1px solid; text-align: center">6</td>
    <td style="border: 1px solid; text-align: center">7</td>
    <td style="border: 1px solid; text-align: center">8</td>
</tr>
<tr>
    <td style="border: 1px solid; text-align: center" colspan="5">name.find("Super"): 0</td>
    <td style="border: 1px solid; text-align: center" colspan="4">name.find("Hero"): 5</td>
</tr>
</table>

In [26]:
name = "SuperHero"
name.find("Super")

0

In [27]:
name.find("Hero")

5

<p>If the substring is not in the string, then the output is <code>-1</code>. For example, 'Niko' is not a substring of <code>name</code>:</p>

In [28]:
name.find("Niko")

-1
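
Because `-1` is also a valid negative index, it is worth checking the result of `find` before using it. The sketch below assumes a hypothetical helper, `describe`, written only for this example; it also shows the `in` operator as a simpler membership test.

```python
name = "SuperHero"


def describe(substring):
    """Report where substring starts in name, or that it is missing."""
    position = name.find(substring)
    if position == -1:
        return f"{substring!r} not found"
    return f"{substring!r} starts at index {position}"


print(describe("Hero"))  # 'Hero' starts at index 5
print(describe("Niko"))  # 'Niko' not found

# find is case-sensitive, so "hero" is not the same as "Hero".
print(name.find("hero"))  # -1

# When only a yes/no answer is needed, the in operator is enough.
print("Hero" in name)  # True
```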

The method <code>split</code> splits the string at the specified separator and returns a list.

**Syntax**

<code>string.split(separator, maxsplit)</code>

**Parameters**
- separator (optional): This is the delimiter at which the string will be split. If not provided, the default separator is any whitespace.
- maxsplit (optional): This specifies the maximum number of splits to perform. If not provided, there is no limit on the number of splits.

**Return Value**:

The method returns a list of substrings.

In [29]:
split_string = name.split()
split_string

['SuperHero']
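
Since `name` contains no whitespace, the call above returns a single-element list. The following sketch uses made-up sample strings (`sentence`, `csv_line`) to show the `separator` and `maxsplit` parameters in action.

```python
sentence = "I want to be a SuperHero"

words = sentence.split()          # split on any run of whitespace
limited = sentence.split(" ", 2)  # perform at most two splits

print(words)    # ['I', 'want', 'to', 'be', 'a', 'SuperHero']
print(limited)  # ['I', 'want', 'to be a SuperHero']

# A custom separator, here a comma.
csv_line = "red,green,blue"
colours = csv_line.split(",")
print(colours)  # ['red', 'green', 'blue']

# With no separator, runs of whitespace count as one delimiter.
# With an explicit " ", each single space is a delimiter.
print("a  b".split())     # ['a', 'b']
print("a  b".split(" "))  # ['a', '', 'b']
```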

### RegEx

<p>In Python, RegEx (short for Regular Expression) is a tool for matching and handling strings.</p>

<p>Python provides a built-in module called <code>re</code>, which allows you to work with regular expressions. This module provides several functions, including <code>search</code>, <code>split</code>, <code>findall</code>, and <code>sub</code>.</p>

<p>First, import the <code>re</code> module.</p>

In [30]:
import re

<p>The <code>search()</code> function searches for a specified pattern within a string and returns a match object for the first occurrence, or <code>None</code> if there is none. Here is an example that uses <code>search()</code> to look for the word "Body" in the string "The BodyGuard is the best album.".</p>

In [31]:
s1 = "The BodyGuard is the best album."

# Define the pattern to search for
pattern = r"Body"

# Use the search() function to search for the pattern in the string
result = re.search(pattern=pattern, string=s1)

# Check if a match was found
matched = "Match found!" if result else "No match!"
matched

'Match found!'
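
The match object returned by `search()` carries more than a yes/no answer. As a minimal sketch, the example below reads the position of the match and shows that matching is case-sensitive unless a flag is passed. The variable names `lower_case` and `ignoring_case` are illustrative.

```python
import re

s1 = "The BodyGuard is the best album."

result = re.search(r"Body", s1)

# start(), end(), and span() report where the match sits in the string.
print(result.start(), result.end())  # 4 8
print(result.span())                 # (4, 8)
print(s1[result.start():result.end()])  # Body

# Matching is case-sensitive by default, so "body" is not found ...
lower_case = re.search(r"body", s1)
print(lower_case)  # None

# ... unless the re.IGNORECASE flag is supplied.
ignoring_case = re.search(r"body", s1, flags=re.IGNORECASE)
print(ignoring_case.group())  # Body
```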

Regular expressions (RegEx) are patterns used to match and manipulate strings of text. There are several special sequences in RegEx that can be used to match specific characters or patterns.

| Special Sequence | Meaning | Example |
| ---------------- | ------- | ------- |
| \d | Matches any digit character (0-9) | "123" matches "\d\d\d" |
| \D | Matches any non-digit character | "hello" matches "\D\D\D\D\D" |
| \w | Matches any word character (a-z, A-Z, 0-9, and _) | "hello_world" matches "\w\w\w\w\w\w\w\w\w\w\w" |
| \W | Matches any non-word character | "@#$%" matches "\W\W\W\W" |
| \s | Matches any whitespace character (space, tab, newline, etc.) | "hello world" matches "\w\w\w\w\w\s\w\w\w\w\w" |
| \S | Matches any non-whitespace character | "hello_world" matches "\S\S\S\S\S\S\S\S\S\S\S" |
| \b | Matches the boundary between a word character and a non-word character | "cat" matches "\bcat\b" in "The cat sat on the mat" |
| \B | Matches any position that is not a word boundary | "cat" matches "\Bcat\B" in "concatenate" but not in "The cat sat on the mat" |
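
The word-boundary sequences are the least intuitive entries in the table, so here is a small sketch contrasting `\b` and `\B`. The word "concatenate" is a sample chosen for this illustration because it has "cat" in the middle.

```python
import re

sentence = "The cat sat on the mat"
word = "concatenate"

# \bcat\b requires "cat" to stand alone as a whole word.
whole_word = re.search(r"\bcat\b", sentence)
print(whole_word.span())  # (4, 7)
print(re.search(r"\bcat\b", word))  # None

# \Bcat\B requires word characters on both sides of "cat".
inside_word = re.search(r"\Bcat\B", word)
print(inside_word.span())  # (3, 6)
print(re.search(r"\Bcat\B", sentence))  # None
```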

<p>Special Sequence Examples:</p>

<p>A simple example of using the <code>\d</code> special sequence in a regular expression pattern with Python code:
</p>

In [32]:
pattern = r"\d\d\d\d\d\d\d\d"
text = "My phone number is 88888888."
result = re.search(pattern=pattern, string=text)
matched = f"Match found: {result.group()}" if result else "No match!"
matched

'Match found: 88888888'

The match.group() method of a match object retrieves the part of the string where the regular expression pattern matched:

**Purpose**
- Extract Matched Text: match.group() returns the exact substring that matched the pattern.

**Usage**
- When you use functions like re.search() or re.match(), they return a match object if the pattern is found. You can then use match.group() to get the matched text.

In the example above, the match object is stored in `result`, so `result.group()` retrieves the substring 88888888 from the text, which is the part that matched the pattern.
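
Because `re.search()` returns `None` when nothing matches, calling `group()` without a check can fail. The sketch below guards against that case and uses the quantifier `\d{8}` as shorthand for eight `\d` in a row. The names `match`, `matched_text`, and `missing` are assumptions made for this example.

```python
import re

text = "My phone number is 88888888."

# \d{8} means "exactly eight digits in a row".
match = re.search(r"\d{8}", text)
matched_text = match.group() if match else None
print(matched_text)  # 88888888

# When there is no match, search returns None instead of a match object.
missing = re.search(r"\d{8}", "No digits here")
print(missing)  # None

# Calling missing.group() would raise AttributeError, so check first.
if missing is None:
    print("No match!")
```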

<p>A simple example of using the <code>\W</code> special sequence in a regular expression pattern with Python code:</p>

In [33]:
pattern = r"\W"
text = "Hello, my friend!"
result = re.findall(pattern=pattern, string=text)
matched = f"Match found: '{result}'" if result else "No match!"
matched

"Match found: '[',', ' ', ' ', '!']'"

<p>The regular expression pattern is defined as r"\W", which uses the \W special sequence to match any character that is not a word character (a-z, A-Z, 0-9, or _). The string we are searching is "Hello, my friend!", and the matches are the comma, the two spaces, and the exclamation mark.</p>

<p>The <code>findall()</code> function finds all occurrences of a specified pattern within a string.</p>

In [34]:
s2 = "The BodyGuard is the best album of 'Whitney Houston'."


# Use the findall() function to find all occurrences of "st" in the string
result = re.findall("st", s2)

# Display the list of matched substrings
result

['st', 'st']

<p>The <code>re</code> module's <code>split()</code> function splits a string into a list of substrings based on a specified pattern.</p>

In [35]:
split_array = re.split(r"\s", s2)
split_array

['The',
 'BodyGuard',
 'is',
 'the',
 'best',
 'album',
 'of',
 "'Whitney",
 "Houston'."]

Here's a breakdown of <code>re.split(r"\s", s2)</code>:

- **re.split**: This function splits a string at each occurrence of a pattern.
- **r"\s"**: This is a regular expression pattern that matches any single whitespace character (space, tab, newline, etc.).
- **s2**: This is the string that you want to split.
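
One detail worth seeing in action: `\s` matches a single whitespace character, so consecutive spaces produce empty strings in the result. The sketch below uses a made-up sample string, `spaced`, and compares `\s` with `\s+` (one or more whitespace characters).

```python
import re

# Sample text with a double space and a tab (not from the notebook).
spaced = "The  BodyGuard\tis the best"

single = re.split(r"\s", spaced)  # split at every whitespace character
runs = re.split(r"\s+", spaced)   # treat each run of whitespace as one split

print(single)  # ['The', '', 'BodyGuard', 'is', 'the', 'best']
print(runs)    # ['The', 'BodyGuard', 'is', 'the', 'best']
```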

<p>The <code>sub</code> function of a regular expression in Python is used to replace all occurrences of a pattern in a string with a specified replacement.</p>

In [36]:
# Define the regular expression pattern to search for
pattern = r"Whitney Houston"

# Define the replacement string
replacement = "legend"

# Use the sub function to replace the pattern with the replacement string
new_string = re.sub(pattern=pattern, repl=replacement, string=s2, flags=re.IGNORECASE)

# The new_string contains the original string with the pattern replaced by the replacement string
new_string

"The BodyGuard is the best album of 'legend'."

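To show what the `flags` and `count` arguments of `sub` change, here is a hedged sketch that reuses `s2`. It also uses `re.subn`, which returns the new string together with the number of replacements made. The variable names are chosen for this example only.

```python
import re

s2 = "The BodyGuard is the best album of 'Whitney Houston'."

# Without a flag the lower-case pattern does not match, so nothing changes.
no_flag = re.sub(r"whitney houston", "legend", s2)
print(no_flag == s2)  # True

# With re.IGNORECASE the pattern matches regardless of letter case.
with_flag = re.sub(r"whitney houston", "legend", s2, flags=re.IGNORECASE)
print(with_flag)  # The BodyGuard is the best album of 'legend'.

# count limits how many occurrences are replaced.
first_only = re.sub(r"the", "THE", s2, count=1, flags=re.IGNORECASE)
print(first_only)  # THE BodyGuard is the best album of 'Whitney Houston'.

# subn also reports how many replacements were made.
new_text, replacements = re.subn(r"the", "THE", s2, flags=re.IGNORECASE)
print(replacements)  # 2
```
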
### Quiz on Strings

<p>What is the value of the variable <code>a</code> after the following code is executed?</p>

In [37]:
a = "1"
a

'1'

<p>What is the value of the variable <code>b</code> after the following code is executed?</p>

In [38]:
b = "2"
b

'2'

<p>What is the value of the variable <code>c</code> after the following code is executed?</p>

In [39]:
c = a + b
c

'12'

<p>Consider the variable <code>d</code>. Use slicing to print out the first three elements:</p>

In [40]:
d = "ABCDEFG"
d[:3]

'ABC'

<p>Use a stride value of 2 to print out every second character of the string <code>e</code>:</p>

In [41]:
e = "clocrkr1e1c1t"
e[::2]

'correct'

<p>Print out a backslash:</p>

In [42]:
print("\\")

\


<p>Convert the variable <code>f</code> to uppercase:</p>

In [43]:
f = "You are wrong!"
f.upper()

'YOU ARE WRONG!'

<p>Convert the variable <code>f2</code> to lowercase:</p>

In [44]:
f2 = "OK, YOU ARE RIGHT."
f2.lower()

'ok, you are right.'

<p>Consider the variable <code>g</code>, and find the first index of the sub-string <code>snow</code>:</p>

In [45]:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"

g.find("snow")

95

<p>In the variable <code>g</code>, replace the sub-string <code>Mary</code> with <code>Bob</code>:</p>

In [46]:
g.replace("Mary", "Bob")

'Bob had a little lamb Little lamb, little lamb Bob had a little lamb Its fleece was white as snow And everywhere that Bob went Bob went, Bob went Everywhere that Bob went The lamb was sure to go'

<p>In the variable <code>g</code>, replace the sub-string <code>,</code> with <code>.</code>:</p>

In [47]:
g.replace(",", ".")

'Mary had a little lamb Little lamb. little lamb Mary had a little lamb Its fleece was white as snow And everywhere that Mary went Mary went. Mary went Everywhere that Mary went The lamb was sure to go'

<p>Split the string in the variable <code>g</code> into a list of words:</p>

In [48]:
g.split()

['Mary',
 'had',
 'a',
 'little',
 'lamb',
 'Little',
 'lamb,',
 'little',
 'lamb',
 'Mary',
 'had',
 'a',
 'little',
 'lamb',
 'Its',
 'fleece',
 'was',
 'white',
 'as',
 'snow',
 'And',
 'everywhere',
 'that',
 'Mary',
 'went',
 'Mary',
 'went,',
 'Mary',
 'went',
 'Everywhere',
 'that',
 'Mary',
 'went',
 'The',
 'lamb',
 'was',
 'sure',
 'to',
 'go']

<p>In the string <code>s3</code>, check whether a four-digit number is present using the <code>\d</code> special sequence and the <code>search()</code> function:</p>

In [49]:
pattern = r"\d\d\d\d"
s3 = "House number - 3601"

result = re.search(pattern=pattern, string=s3)
result.group()

'3601'

<p>In the string <code>str1</code>, replace the substring <code>fox</code> with <code>bear</code> using the <code>sub()</code> function:</p>

In [50]:
pattern = "fox"
replacement = "bear"
str1 = "The quick brown fox jumps over the lazy dog."

result = re.sub(pattern=pattern, repl=replacement, string=str1, flags=re.IGNORECASE)
result

'The quick brown bear jumps over the lazy dog.'

<p>In the string <code>str2</code>, find all the occurrences of <code>woo</code> using the <code>findall()</code> function:</p>

In [51]:
str2 = "How much wood would a woodchuck chuck, if a woodchuck could chuck wood?"
pattern = r"woo"

result = re.findall(pattern=pattern, string=str2)
result

['woo', 'woo', 'woo', 'woo']

****
### Congratulations!
This is the end of this file.
All assignments in this notebook are completed.
Thanks for reviewing my work!
****