# Learning Objectives

*   Work with Strings
*   Perform operations on String
*   Manipulate Strings using indexing and escape sequences

# What are Strings?

In Python, a string is a sequence of characters enclosed within either single quotes (' ') or double quotes (" "). Strings are one of the fundamental data types in Python and are used to represent text. Here are some key points about strings in Python:

1. **Immutable**: Once a string is created, its contents cannot be changed. Operations that appear to modify a string actually create a new string and leave the original untouched.

2. **Sequence of Characters**: Each character in a string has a specific position or index. Python uses zero-based indexing, meaning the first character of a string has an index of 0, the second character has an index of 1, and so on.

3. **Escape Characters**: Strings can contain special characters that are difficult to type or are non-printable, such as newline (\n), tab (\t), or backslash itself (\\). These are written with a leading backslash (\) and are known as escape sequences.

4. **Concatenation**: Strings can be concatenated, or joined together, using the concatenation operator (+). Adjacent string literals (for example, "Hello, " "World!") are also joined automatically, but this works only for literals, not for variables.

5. **Length**: The length of a string, i.e., the number of characters it contains, can be obtained using the len() function.

6. **Indexing and Slicing**: Individual characters in a string can be accessed using indexing. Slicing allows you to extract a portion of a string by specifying a range of indices.

Here's an example demonstrating strings in Python:

In [2]:
# Define a string
my_string = "Hello, World!"

# Print the string
print(my_string)  # Output: Hello, World!

# Get the length of the string
print(len(my_string))  # Output: 13

# Accessing individual characters
print(my_string[0])  # Output: H
print(my_string[-1])  # Output: !

# Slicing
print(my_string[0:5])  # Output: Hello
print(my_string[7:])   # Output: World!

Hello, World!
13
H
!
Hello
World!
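
The immutability mentioned in point 1 can be illustrated with a minimal sketch. The variable names below are illustrative and not part of the original example. Assigning to an index raises a `TypeError`, so a changed string has to be built as a new object:

```python
greeting = "Hello, World!"

# Item assignment is not allowed on a string
try:
    greeting[0] = "J"
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Build a new string instead; the original is unchanged
new_greeting = "J" + greeting[1:]

print(assignment_worked)  # False
print(new_greeting)       # Jello, World!
print(greeting)           # Hello, World!
```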


In [3]:
# Digits and spaces in string

'1 2 3 4 5 6 '

'1 2 3 4 5 6 '

In [4]:
# Special characters in string

'@#2_#]&*^%$'

'@#2_#]&*^%$'

## Indexing

It is helpful to think of a string as an ordered sequence. Each element in the sequence can be accessed using an index represented by the array of numbers:

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%201/images/StringsIndex.png" width="600" align="center">


In [7]:
# Assign string to variable

name = "Michael Jackson"
name

'Michael Jackson'

In [8]:
# Print the first element in the string

print(name[0])

M


In [9]:
# Print the element on index 6 in the string

print(name[6])

l


In [10]:
# Print the element on the 13th index in the string

print(name[13])

o


## Negative Indexing

We can also use negative indexing with strings. A negative index counts elements from the end of the string, so -1 refers to the last element.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%201/images/StringsNeg.png" width="600" align="center">


In [12]:
# Print the last element in the string

print(name[-1])

n


In [13]:
# Print the first element in the string

print(name[-15])

M
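
As a small sketch of how the two numbering schemes relate (assuming the same `name` string as above), a negative index `-k` refers to the same character as the positive index `len(name) - k`:

```python
name = "Michael Jackson"

# Check the relationship for every valid negative index
for k in range(1, len(name) + 1):
    assert name[-k] == name[len(name) - k]

print(len(name))           # 15
print(name[-1], name[14])  # n n
print(name[-15], name[0])  # M M
```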


# Slicing

We can obtain multiple characters from a string using slicing. For example, we can obtain the elements at index 0 to 3 and at index 8 to 11.
When taking a slice, the first number is the index where the slice starts (counting from 0), and the second number is the index where the slice stops. The element at the stop index is not included, so the slice contains stop minus start characters.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%201/images/StringsSlice.png" width="600" align="center">


In [14]:
# Take the slice on variable name with only index 0 to index 3

name[0:4]

'Mich'

In [15]:
# Take the slice on variable name with only index 8 to index 11

name[8:12]

'Jack'
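
The following sketch, which reuses the `name` string from above, shows that the stop index is excluded and that the length of a slice equals stop minus start. It also shows that either bound may be omitted:

```python
name = "Michael Jackson"

first = name[0:4]    # indices 0, 1, 2, 3
second = name[8:12]  # indices 8, 9, 10, 11

print(first, len(first))    # Mich 4
print(second, len(second))  # Jack 4

# Omitting a bound defaults to the start or the end of the string
print(name[:7])  # Michael
print(name[8:])  # Jackson
```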

# Stride

<p> Stride in Python refers to the step size used when traversing a sequence such as a string, list, or tuple. It determines how many elements or characters to skip between successive elements during iteration.</p>

In Python, stride can be specified as the third parameter in a slicing operation. The general syntax for slicing is:
    
    sequence[start:end:stride]
    
<p> Here's what each part of the slicing syntax represents:
<ol>
    <li>start: The starting index of the slice (inclusive).</li>
    <li>end: The ending index of the slice (exclusive).</li>
    <li>stride: The step size or the number of elements to skip between successive elements.</li>
</ol>
If the stride parameter is omitted, it defaults to 1, meaning that every element in the specified range is included in the slice.</p>

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%201/images/StringsStride.png" width="600" align="center">

In [16]:
# Get every second element. The elements on index 0, 2, 4 ...

name[::2]

'McalJcsn'

In [17]:
# Get every second element in the range from index 0 to index 4

name[0:5:2]

'Mca'
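
To make the effect of the stride explicit, this sketch lists the indices that `name[::2]` visits and rebuilds the same result character by character. The helper names `indices` and `picked` are illustrative:

```python
name = "Michael Jackson"

# Indices visited by name[::2]: start at 0, step by 2
indices = list(range(0, len(name), 2))
picked = "".join(name[i] for i in indices)

print(indices)              # [0, 2, 4, 6, 8, 10, 12, 14]
print(picked)               # McalJcsn
print(picked == name[::2])  # True
```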

## Concatenate Strings

We can concatenate or combine strings by using the addition operator (+). The result is a new string that is a combination of both:

In [18]:
# Concatenate two strings

statement = name + "is the best"
statement

'Michael Jacksonis the best'

To replicate values of a string we simply multiply the string by the number of times we would like to replicate it. In this case, the number is three. The result is a new string, and this new string consists of three copies of the original string:

In [19]:
# Repeat the string 3 times

3 * "Michael Jackson"

'Michael JacksonMichael JacksonMichael Jackson'

You can also assign the result of a concatenation back to the original variable. The variable name then refers to a new string, changing from "Michael Jackson" to "Michael Jackson is the best".

In [20]:
# Concatenate strings

name = "Michael Jackson"
name = name + " is the best"
name

'Michael Jackson is the best'

# Escape Sequences

A backslash marks the beginning of an escape sequence. Escape sequences represent characters that may be difficult to type. For example, backslash "n" represents a new line. The output continues on a new line after the backslash "n" is encountered:

In [21]:
# New line escape sequence

print(" Michael Jackson \n is the best" )

 Michael Jackson 
 is the best


Similarly, back slash "t" represents a tab:

In [22]:
# Tab escape sequence

print(" Michael Jackson \t is the best" )

 Michael Jackson 	 is the best


If you want to place a back slash in your string, use a double back slash:

In [23]:
# Include back slash in string

print(" Michael Jackson \\ is the best" )

 Michael Jackson \ is the best


We can also place an "r" before the opening quote to create a raw string, in which backslashes are kept as literal characters:

In [24]:
# r tells Python to treat the string as a raw string

print(r" Michael Jackson \ is the best" )

 Michael Jackson \ is the best
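
A quick way to see the difference between an escape sequence and a raw string is to compare lengths. This is a minimal sketch with illustrative variable names:

```python
escaped = "a\nb"  # \n is a single newline character
raw = r"a\nb"     # backslash and n are two separate characters

print(len(escaped))    # 3
print(len(raw))        # 4
print(raw == "a\\nb")  # True
```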


# String Manipulation Operations

There are many string operation methods in Python that can be used to manipulate the data. We are going to use some basic string operations on the data.
Let's try with the method <code>upper</code>; this method converts lower case characters to upper case characters:

In [25]:
# Convert all the characters in string to upper case

a = "Thriller is the sixth studio album"
print("before upper:", a)
b = a.upper()
print("After upper:", b)

before upper: Thriller is the sixth studio album
After upper: THRILLER IS THE SIXTH STUDIO ALBUM


The method replace replaces a segment of the string, i.e. a substring with a new string. We input the part of the string we would like to change. The second argument is what we would like to exchange the segment with, and the result is a new string with the segment changed:

In [26]:
# Replace the old substring with the new target substring if the segment is found in the string

a = "Michael Jackson is the best"
b = a.replace('Michael', 'Janet')
b

'Janet Jackson is the best'

The method find finds a sub-string. The argument is the substring you would like to find, and the output is the index at which the substring first begins. We can find the sub-string Jack or el.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%201/images/StringsFind.png" width="600" align="center">


In [27]:
# Find the substring in the string. Only the index of the first element of the substring will be the output

name = "Michael Jackson"
name.find('el')

5

In [28]:
# Find the substring in the string.

name.find('Jack')

8

If the sub-string is not in the string, the output is -1. For example, the string 'Jasdfasdasdf' is not a substring:

In [29]:
# If cannot find the substring in the string

name.find('Jasdfasdasdf')

-1
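
Because find is case-sensitive, searching for "jack" instead of "Jack" also returns -1. The sketch below shows this, together with the `in` operator (not used elsewhere in this notebook) as a simple way to test for presence when the index is not needed:

```python
name = "Michael Jackson"

print(name.find("Jack"))          # 8
print(name.find("jack"))          # -1, the search is case-sensitive
print("Jack" in name)             # True
print(name.lower().find("jack"))  # 8
```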

The method <code>split</code> splits the string at the specified separator (whitespace by default) and returns a list:

In [30]:
# Split the string into a list of substrings
name = "Michael Jackson"
split_string = (name.split())
split_string

['Michael', 'Jackson']
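
The example above relies on the default separator. As a sketch of passing an explicit separator, the following uses a made-up comma-separated string (`csv_line` is hypothetical sample data):

```python
csv_line = "Thriller,1982,Pop"  # hypothetical sample data

# Split at every comma
fields = csv_line.split(",")
print(fields)  # ['Thriller', '1982', 'Pop']

# With no argument, split() separates on runs of whitespace
words = "Michael   Jackson".split()
print(words)   # ['Michael', 'Jackson']
```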

# RegEx

In Python, RegEx (short for Regular Expression) is a tool for matching and handling strings.

Python provides a built-in module called <code>re</code>, which allows you to work with regular expressions. This module provides several functions, including <code>search</code>, <code>split</code>, <code>findall</code>, and <code>sub</code>.

First, import the <code>re</code> module:

In [31]:
import re

The search() function searches for specified patterns within a string. Here is an example that explains how to use the search() function to search for the word "Jackson" in the string "Michael Jackson is the best".

In [32]:
s1 = "Michael Jackson is the best"

# Define the pattern to search for
pattern = r"Jackson"

# Use the search() function to search for the pattern in the string
result = re.search(pattern, s1)

# Check if a match was found
if result:
    print("Match found!")
else:
    print("Match not found.")


Match found!


Regular expressions (RegEx) are patterns used to match and manipulate strings of text. There are several special sequences in RegEx that can be used to match specific characters or patterns.

| Special Sequence | Meaning | Example |
| ---------------- | ------- | ------- |
| \d | Matches any digit character (0-9) | "123" matches "\d\d\d" |
| \D | Matches any non-digit character | "hello" matches "\D\D\D\D\D" |
| \w | Matches any word character (a-z, A-Z, 0-9, and _) | "hello_world" matches "\w\w\w\w\w\w\w\w\w\w\w" |
| \W | Matches any non-word character | "@#$%" matches "\W\W\W\W" |
| \s | Matches any whitespace character (space, tab, newline, etc.) | "hello world" matches "\w\w\w\w\w\s\w\w\w\w\w" |
| \S | Matches any non-whitespace character | "hello_world" matches "\S\S\S\S\S\S\S\S\S\S\S" |
| \b | Matches the boundary between a word character and a non-word character | "cat" matches "\bcat\b" in "The cat sat on the mat" |
| \B | Matches any position that is not a word boundary | "cat" matches "\Bcat\B" in "concatenate" but not in "The cat sat on the mat" |
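
The special sequences can be checked directly in Python. The sketch below is one way to do so, not the only one: it uses `re.fullmatch` (which requires the pattern to cover the whole string) and builds repeated sequences with string multiplication. The word "concatenate" is chosen here only because it contains "cat" surrounded by word characters:

```python
import re

# (pattern, text) pairs; each pattern should match its whole text
checks = [
    (r"\d" * 3, "123"),
    (r"\D" * 5, "hello"),
    (r"\w" * 11, "hello_world"),
    (r"\W" * 4, "@#$%"),
    (r"\w" * 5 + r"\s" + r"\w" * 5, "hello world"),
    (r"\S" * 11, "hello_world"),
]
results = [re.fullmatch(p, t) is not None for p, t in checks]
print(results)  # [True, True, True, True, True, True]

# Word boundaries are positions, so they are tested with search()
print(bool(re.search(r"\bcat\b", "The cat sat on the mat")))  # True
print(bool(re.search(r"\Bcat\B", "concatenate")))             # True
print(bool(re.search(r"\Bcat\B", "The cat sat on the mat")))  # False
```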


Special Sequence Examples:

A simple example of using the <code>\d</code> special sequence in a regular expression pattern with Python code:


In [33]:
pattern = r"\d\d\d\d\d\d\d\d\d\d"  # Matches any ten consecutive digits
text = "My Phone number is 1234567890"
match = re.search(pattern, text)

if match:
    print("Phone number found:", match.group())
else:
    print("No match")

Phone number found: 1234567890


The regular expression pattern is defined as r"\d\d\d\d\d\d\d\d\d\d". It uses the \d special sequence, which matches any digit character (0-9), repeated ten times to match ten consecutive digits.

A simple example of using the <code>\W</code> special sequence in a regular expression pattern with Python code:


In [34]:
pattern = r"\W"  # Matches any non-word character
text = "Hello, world!"
matches = re.findall(pattern, text)

print("Matches:", matches)

Matches: [',', ' ', '!']


The regular expression pattern is defined as r"\W", which uses the \W special sequence to match any character that is not a word character (a-z, A-Z, 0-9, or _). The string we're searching for matches in is "Hello, world!".

The findall() function finds all occurrences of a specified pattern within a string.

In [35]:
s2 = "Michael Jackson was a singer and known as the 'King of Pop'"


# Use the findall() function to find all occurrences of the "as" in the string
result = re.findall("as", s2)

# Print out the list of matched words
print(result)

['as', 'as']


A regular expression's split() function splits a string into a list of substrings based on a specified pattern.

In [36]:
# Use the split function to split the string by the "\s" pattern (a raw string keeps the backslash literal)
split_array = re.split(r"\s", s2)

# The split_array contains all the substrings, split by whitespace characters
print(split_array) 

['Michael', 'Jackson', 'was', 'a', 'singer', 'and', 'known', 'as', 'the', "'King", 'of', "Pop'"]
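
One difference from the string method split() is worth noting. The pattern "\s" matches a single whitespace character, so consecutive spaces produce empty strings in the result. A small sketch with an illustrative string:

```python
import re

text = "King  of Pop"  # two spaces after "King"

print(re.split(r"\s", text))  # ['King', '', 'of', 'Pop']
print(text.split())           # ['King', 'of', 'Pop']
```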


The <code>sub</code> function of a regular expression in Python is used to replace all occurrences of a pattern in a string with a specified replacement.


In [37]:
# Define the regular expression pattern to search for
pattern = r"King of Pop"

# Define the replacement string
replacement = "legend"

# Use the sub function to replace the pattern with the replacement string
new_string = re.sub(pattern, replacement, s2, flags=re.IGNORECASE)

# The new_string contains the original string with the pattern replaced by the replacement string
print(new_string) 

Michael Jackson was a singer and known as the 'legend'
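
In the example above the pattern already matches the case of the text, so the effect of `flags=re.IGNORECASE` is not visible. The sketch below uses a lower-case variant of the phrase (an illustrative string, not from the original) to show what the flag changes:

```python
import re

text = "known as the 'king of pop'"  # lower-case variant, for illustration

with_flag = re.sub(r"King of Pop", "legend", text, flags=re.IGNORECASE)
without_flag = re.sub(r"King of Pop", "legend", text)

print(with_flag)     # known as the 'legend'
print(without_flag)  # known as the 'king of pop'
```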


What is the value of the variable a after the following code is executed?

In [39]:
a = "1"
a

'1'

What is the value of the variable b after the following code is executed?

In [40]:
b = "2"
b

'2'

What is the value of the variable c after the following code is executed?

In [41]:
c = a + b
c

'12'

Consider the variable d. Use slicing to print out the first three elements:

In [44]:
d = 'ABCDEFG'
d[0:3]

'ABC'

Use a stride value of 2 to print out every second character of the string e:

In [47]:
e = 'clocrkr1e1c1t'
e[::2]

'correct'

Print out a backslash:

In [50]:
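print("\\")

# or, as a raw string (a raw string cannot end in a single backslash, so this prints a backslash followed by a space)

print(r"\ ")

\
\ 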


Convert the variable f to uppercase:

In [51]:
f = "You are wrong"
F = f.upper()
print(F)

YOU ARE WRONG


Convert the variable f2 to lowercase:

In [53]:
f2 = "YOU ARE RIGHT"
f2 = f2.lower()
print(f2)

you are right


Consider the variable g, and find the first index of the sub-string snow:

In [54]:
g = "Mary had a little lamb Little lamb, little lamb Mary had a little lamb \
Its fleece was white as snow And everywhere that Mary went Mary went, Mary went \
Everywhere that Mary went The lamb was sure to go"
g.find('snow')

95

In the variable g, replace the sub-string Mary with Bob:

In [55]:
g.replace('Mary','Bob')

'Bob had a little lamb Little lamb, little lamb Bob had a little lamb Its fleece was white as snow And everywhere that Bob went Bob went, Bob went Everywhere that Bob went The lamb was sure to go'

In the variable g, replace the sub-string , with .:

In [56]:
g.replace(',','.')

'Mary had a little lamb Little lamb. little lamb Mary had a little lamb Its fleece was white as snow And everywhere that Mary went Mary went. Mary went Everywhere that Mary went The lamb was sure to go'

In the variable g, split the string into a list of substrings:

In [57]:
g.split()

['Mary',
 'had',
 'a',
 'little',
 'lamb',
 'Little',
 'lamb,',
 'little',
 'lamb',
 'Mary',
 'had',
 'a',
 'little',
 'lamb',
 'Its',
 'fleece',
 'was',
 'white',
 'as',
 'snow',
 'And',
 'everywhere',
 'that',
 'Mary',
 'went',
 'Mary',
 'went,',
 'Mary',
 'went',
 'Everywhere',
 'that',
 'Mary',
 'went',
 'The',
 'lamb',
 'was',
 'sure',
 'to',
 'go']

In the string s3, find whether the digit is present or not using the \d and search() function:

In [58]:
s3 = "House number- 1105"
# Use the search() function to search for the "\d" in the string
result = re.search(r"\d", s3)

# Check if a match was found
if result:
    print("Digit found")
else:
    print("Digit not found.")

Digit found


In the string str1, replace the sub-string fox with bear using sub() function:

In [59]:
str1= "The quick brown fox jumps over the lazy dog."
# Use re.sub() to replace "fox" with "bear"
new_str1 = re.sub(r"fox", "bear", str1)

print(new_str1)

The quick brown bear jumps over the lazy dog.


In the string str2 find all the occurrences of woo using findall() function:

In [60]:
str2= "How much wood would a woodchuck chuck, if a woodchuck could chuck wood?"
# Use re.findall() to find all occurrences of "woo"
matches = re.findall(r"woo", str2)

print(matches)

['woo', 'woo', 'woo', 'woo']
