<table class="table table-bordered">
    <tr>
        <th style="text-align:center; width:35%"><img src='https://drive.google.com/uc?export=view&id=1zIB3Nw_z8N2SJSSdd2yWQIsDS0MGPYKm' style="width: 300px; height: 90px; "></th>
        <th style="text-align:center;"><h3>IS111 - Notebook 6</h3><h2>Strings</h2></th>
    </tr>
</table>

## Learning Outcomes

At the end of this lesson, you should be able to:
<ul>
    <li> Access individual characters in strings using <code>[ ]</code>.</li>
    <li> Understand what an <code>IndexError</code> is.</li>
    <li> Understand that strings are immutable.</li>
    <li> Extract substrings using slicing.</li>
    <li> Use <code>in</code> to check if a character or a string is inside another string.</li>
    <li> Use a <b>for-loop</b> to iterate through the characters inside a string.</li>
    <li> Use the following string methods: <code>upper()</code>, <code>lower()</code>, <code>find()</code> and <code>split()</code>.</li>
</ul>

In this notebook we look at the `str` (string) data type in more detail.

## I. Review of String Basics

Recall that a string is a sequence of characters. We usually create strings simply by enclosing characters in single quotes or double quotes. We can concatenate two strings using the `+` operator.

Below is some sample code to remind you of the basic usage of strings.

```python
s1 = 'Hello World!'
s2 = "Fun with Python!"
s3 = "Today is Ben's birthday."
s4 = 'The title of the book is "Peter Pan".'
s5 = s1 + ' ' + s2

print(s1)
print(s2)
print(s3)
print(s4)
print(s5)
```

```
Hello World!
Fun with Python!
Today is Ben's birthday.
The title of the book is "Peter Pan".
Hello World! Fun with Python!
```


Also recall that we can use `==` and `!=` to compare whether two strings are the same or not. The sample code below shows some examples:

```python
answer = input("Do you want to play a game? (Please answer Yes or No.) ")
if answer == "Yes":
    print("Let's start the game.")
else:
    print("Good-bye!")
```

```
Do you want to play a game? (Please answer Yes or No.) Yes
Let's start the game.
```
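
The cell above only demonstrates `==`. Below is a minimal sketch of `!=`. It assumes a hard-coded `answer` in place of `input()` so that it runs without typing. It also shows that string comparison is case-sensitive, which is why typing `yes` in the game above would print `Good-bye!`.

```python
# A fixed value stands in for input() so the example runs without typing.
answer = "yes"

if answer != "Yes":
    print("That is not exactly 'Yes'.")

# Comparison is case-sensitive: "yes" and "Yes" are different strings.
print(answer == "Yes")  # False
print(answer == "yes")  # True
```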


## II. Accessing Individual Characters of a String

Previously, we saw that for tuples, which are also sequences, we can use square brackets `[ ]` to access individual elements.

Strings are also sequences, so we can likewise use square brackets to access the individual characters inside a string.

When a string is created, each of its characters is placed at an <b>index</b>, starting from 0 for the first character and ending at the length of the string minus 1 for the last character, as shown in the picture below for the string "MY NAME IS":

<img align="left" src='https://drive.google.com/uc?export=view&id=0B3N0YMArBdSBdUF2b2dkS1dMS0U' style="width: 1000px">

Based on the picture above, what do you think the following code will print?

```python
s = 'MY NAME IS'
print(s[3])
print(s[6])
```

```
N
E
```


### Length of a String

To find the length of a string, we can use Python's built-in function `len()`. For example, the following code gets the length of the string `s` defined above:

```python
length = len(s)
print(length)
```

```
10
```
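
Because indices start at 0, the last valid index of a string is always its length minus 1. Below is a small sketch that combines `len()` with indexing. The name `last_index` is only for illustration.

```python
s = 'MY NAME IS'

# Valid indices run from 0 to len(s) - 1.
last_index = len(s) - 1
print(last_index)     # 9
print(s[last_index])  # S
```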


### IndexError

What if we use an index that is outside of the range of valid indices for a string?

```python
print(s[10])
```

```
IndexError: string index out of range
```

We can see that the code above generates an `IndexError`. This is because we tried to access the character at index 10 of `s`, but `s` has length 10, so its largest valid index is 9.
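
One way to avoid an `IndexError` is to compare the index with `len(s)` before using it. Here is a minimal sketch. It assumes the index is not negative, and the variable `index` is only for illustration.

```python
s = 'MY NAME IS'
index = 10

# This check assumes index is not negative: valid indices are 0 to len(s) - 1.
if index < len(s):
    print(s[index])
else:
    print("Index", index, "is out of range for a string of length", len(s))
```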

### Strings Are Immutable

Similar to tuples, strings are also immutable, which means we cannot change any character inside a string.

Run the following code and see what happens:

```python
s[0] = 'm'
```

```
TypeError: 'str' object does not support item assignment
```

We get a `TypeError` with the message `'str' object does not support item assignment`. This is the same kind of error we get when trying to modify an element of a tuple. It means that the elements of a string (i.e., its characters) cannot be modified.
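
Note that immutability applies to the string value, not to the variable that refers to it. In the short sketch below, `s[0] = 'm'` would fail, but the variable `s` can still be reassigned to refer to an entirely new string:

```python
s = 'MY NAME IS'

# s[0] = 'm' would raise a TypeError, but the variable s can be
# reassigned so that it refers to a completely new string.
s = 'my name is'
print(s)  # my name is
```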

## III. Slicing to Extract Substrings

Besides accessing a single character from a string, we can also use a special operation called <b>slicing</b> to access a consecutive sequence of characters within a given string, or in other words, a substring of a given string. To do this, we simply need to specify the beginning and the end indices of the substring to be extracted.

Specifically, we can use `[m:n]` on a string to return the part of the string from the character at <b>index `m`</b> up to the character at <b>index `n`</b>, <u>including</u> the character at index `m` but <u>excluding</u> the character at index `n`.

The following code shows some examples of slicing:

```python
fruit = "mango"

substring1 = fruit[0:1]
substring2 = fruit[0:2]
substring3 = fruit[2:5]

print(substring1)
print(substring2)
print(substring3)
```

```
m
ma
ngo
```
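
Two points about `[m:n]` are worth checking. First, when both indices are within the string, the slice contains `n - m` characters. Second, because strings are immutable, combining slicing with `+` is how we build a modified copy of a string. The sketch below shows both; the name `capitalized` is only for illustration.

```python
fruit = "mango"

# fruit[2:5] contains 5 - 2 = 3 characters.
print(len(fruit[2:5]))  # 3

# Strings are immutable, so to "change" the first letter we build a new
# string from a new first character plus a slice of the rest.
capitalized = 'M' + fruit[1:5]
print(capitalized)  # Mango
print(fruit)        # mango (unchanged)
```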


### Step

Besides the start and end indices (`m` and `n` in the description above), we can optionally provide a third number called a <b>step</b>, which specifies how far to move ahead each time a character is extracted.

For example, `s[0:10:2]` extracts the characters of `s` at positions `0`, `2`, `4`, `6` and `8`, and `s[0:10:3]` extracts the characters at positions `0`, `3`, `6` and `9`.

Run the code below to understand how step works:

```python
s = 'abcdefghijklmn'

print(s[0:10])
print(s[0:10:2])
print(s[0:10:3])
```

```
abcdefghij
acegi
adgj
```
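
Changing the start index changes which characters are picked. For example, starting at index 1 with a step of 2 picks the characters at the odd positions:

```python
s = 'abcdefghijklmn'

# Start at index 1 and take every second character up to (not including)
# index 10: positions 1, 3, 5, 7 and 9.
print(s[1:10:2])  # bdfhj
```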


### Negative step

The step can also be a negative integer. However, when you use a negative step, the start and end indices need to be reversed: `m` should be greater than `n`, because you are traversing the string backwards (otherwise the result is an empty string). Again, the character at position `m` is <u>included</u> but the character at position `n` is <u>excluded</u>.

See the example below:

```python
s = 'abcdefg'
print(s[6:0:-1])
print(s[6:0:-2])
print(s[::-1])
```

```
gfedcb
gec
gfedcba
```
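
The sketch below shows how the order of the indices matters with a negative step. When the start index is greater than the end index, characters are extracted backwards. When it is smaller, nothing is selected and the result is an empty string.

```python
s = 'abcdefg'

# Negative step, start greater than end: indices 5, 4, 3 and 2.
print(s[5:1:-1])  # fedc

# Negative step, but start smaller than end: nothing is selected.
print(s[1:5:-1] == '')  # True
```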


Note that if you want to <b>include</b> the character at position 0, you can leave out the value for `n` when using a negative step, as shown below:

```python
s = 'abcdefg'

print(s[6::-1])
print(s[6::-2])
```

```
gfedcba
geca
```


## IV. Reverse a String

The most common use of a negative step is to reverse a string. To reverse a string, we can apply <b>[::-1]</b> to it, as shown in the following code. When the start and end indices are left out, their default values are used. With a step of -1, this means slicing starts from the last character and goes backwards through the entire string, extracting the characters one by one.

```python
word = "stressed"
reversed_word = word[::-1]

print(word)
print(reversed_word)
```

```
stressed
desserts
```
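
A common application of `[::-1]` is checking whether a word is a palindrome, i.e., whether it reads the same forwards and backwards. Below is a minimal sketch. Note that `is_palindrome` is a hypothetical helper name, not a built-in function.

```python
def is_palindrome(word):
    # A palindrome is equal to its own reverse.
    return word == word[::-1]

print(is_palindrome("level"))     # True
print(is_palindrome("stressed"))  # False
```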


## V. Use `in` to Check Existence of a Character or a Substring inside a String

If we'd like to check if a character or a string appears in another string, we can do the following:

```python
s = 'abcdefg'

if 'a' in s:
    print("'a' is in the string.")

if 'z' in s:
    print("'z' is in the string.")

if 'ab' in s:
    print("'ab' is in the string.")

if 'ac' in s:
    print("'ac' is in the string.")
```

```
'a' is in the string.
'ab' is in the string.
```


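Here `in` is an operator that checks whether the string on the left appears in the string on the right as a consecutive sequence of characters. This is why `'ac'` is not found in `'abcdefg'`: both `'a'` and `'c'` appear in the string, but not next to each other.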
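
An `in` expression is not limited to `if` statements. It evaluates to `True` or `False`, so its value can be printed or stored in a variable. The variable `found` in the short sketch below is only for illustration.

```python
s = 'abcdefg'

# An `in` expression evaluates to a Boolean value, True or False,
# so its result can be printed or stored in a variable.
found = 'cde' in s
print(found)     # True: 'c', 'd' and 'e' appear next to each other in s
print('A' in s)  # False: the check is case-sensitive
```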

## VI. Iterate through a String

We are now going to learn a very useful kind of statement in Python: <b>for-loops</b>. We will start by using a for-loop to iterate through a string.

Take a look at the following code and observe what it does:

```python
my_str = "SMU SIS"
for ch in my_str:
    print("current value of ch: " + ch)
```

```
current value of ch: S
current value of ch: M
current value of ch: U
current value of ch:  
current value of ch: S
current value of ch: I
current value of ch: S
```


Basically, Line 2 above sets up a <b>for-loop</b> that goes through all the characters of the string `my_str` one by one, using the variable `ch` to store these characters. At any point in time, `ch` stores one particular character of `my_str`. Line 3 is the <b>body</b> of the for-loop. It defines what is to be done with the current value of `ch`.

You can see that although there is only a single `print()` statement in Line 3, the code produces multiple lines of output. In other words, Line 3 is executed <b>repeatedly</b>, once for each character in `my_str`. This is the effect of a for-loop.
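
The body of a for-loop can contain more than one statement, including an `if` statement. Below is a minimal sketch that counts how many times `'S'` appears in `my_str`. The variable `count` is only for illustration.

```python
my_str = "SMU SIS"
count = 0

# The loop body runs once for every character in my_str.
for ch in my_str:
    if ch == 'S':
        count = count + 1

print(count)  # 3
```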

<img align="left" src='https://drive.google.com/uc?export=view&id=0B08uY8vosNfobDBuOXVXQWVxMFE' style="width: 60px; height: 60px;"><br />Let's do an exercise!

Modify the for-loop below so that the characters inside `my_str` are printed one by one, separated by spaces, but whenever the letter `'S'` is encountered, an asterisk (`*`) is printed in its place. So when you run the code below, you should see

`* M U   * I * `  

as the output.

```python
my_str = "SMU SIS"
for ch in my_str:
    # Modify the code below to produce the output you see above.
    print(ch, end=' ')
```

```
S M U   S I S 
```

## VII. String Methods

There are many other pre-defined actions we can perform on strings. These are often in the form of string <b>methods</b>.

A method is very similar to a function. Recall that a function defines a sequence of actions that collectively perform a task. Methods are similar to functions, except that a method must be applied to a value of a particular data type. For example, methods defined for the `str` data type have to be applied to a `str` value.

The example code below shows how a method called `lower()` can be used:

```python
name = "Singapore Management University"
name_in_lowercase = name.lower()

print(name)
print(name_in_lowercase)
```

```
Singapore Management University
singapore management university
```


We can see that to call a method, we write the string the method is applied to, followed by a dot (`.`), the name of the method, and a pair of parentheses, as in `name.lower()`.

Methods can also take arguments, in the same way that functions take arguments. Similarly, methods can return values just like functions.

In the example above, the method `lower()` returns a new string in which all characters of the original string have been converted to lowercase.

The table below shows a few commonly used string methods:
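
`lower()` is handy for making comparisons case-insensitive. The sketch below is a version of the earlier game prompt that accepts `Yes`, `yes` or `YES`. It assumes a hard-coded `answer` in place of `input()`.

```python
# A fixed value stands in for input() so the example runs without typing.
answer = "YES"

# Converting to lowercase first makes "Yes", "yes" and "YES" all match.
if answer.lower() == "yes":
    print("Let's start the game.")
else:
    print("Good-bye!")
```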

<table class="table table-bordered">
<tr>
<th style="text-align:center; width:10%">Method</th>
<th style="text-align:center; width:17%">Usage</th>
<th style="text-align:center; width:45%">Description</th>
<th style="text-align:center;">Example</th>
</tr>
<tr>
<td style="text-align:center;"><b>upper</b></td>
<td style="text-align:center;">upper()</td>
<td style="text-align:left;">Returns a <b>copy</b> of the string with all its characters in uppercase.</td>
<td>'Hello World'.upper()<br/>returns 'HELLO WORLD'</td>
</tr>
<tr>
<td style="text-align:center;"><b>lower</b></td>
<td style="text-align:center;">lower()</td>
<td style="text-align:left;">Returns a <b>copy</b> of the string with all its characters in lowercase.</td>
<td>'Hello World'.lower()<br/>returns 'hello world'</td>
</tr>
<tr>
<td style="text-align:center;"><b>find</b></td>
<td style="text-align:center;">find(substr)</td>
<td style="text-align:left;">Returns the <b>lowest index</b> in the string where substring <b>substr</b> is found. Returns <b>-1</b> if <b>substr</b> is not found.</td>
<td>'banana'.find('an')<br />
returns 1<br /><br />
'banana'.find('e')<br />
returns -1
</td>
</tr>
<tr>
<td style="text-align:center;"><b>split</b></td>
<td style="text-align:center;">split([sep])</td>
<td style="text-align:left;">Returns a list of the words in the string, using <b>sep</b> as the delimiter string. If <b>sep</b> is not provided, the string is split on whitespace: any run of consecutive spaces (or other whitespace characters) counts as a single delimiter.</td>
<td>'Rain is coming'.split()<br />
returns ['Rain', 'is', 'coming']<br /><br />
'one&#8727;two&#8727;three&#8727;four'.split('&#8727;')<br />
returns ['one', 'two', 'three', 'four']</td>
</tr>
</table>
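
The examples in the table can be checked by running them directly. In the sketch below, the `split()` example uses a plain asterisk `*` as the separator:

```python
print('Hello World'.upper())            # HELLO WORLD
print('Hello World'.lower())            # hello world
print('banana'.find('an'))              # 1
print('banana'.find('e'))               # -1
print('one*two*three*four'.split('*'))  # ['one', 'two', 'three', 'four']
```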


Run the code below to observe how `find()` and `split()` work:

```python
s = 'SMU SIS'
index = s.find('M')
print(index)
```

```
1
```


We see that the code prints `1` because `M` appears at position `1` inside the string `s`.
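
`find()` also works with multi-character substrings, returning the index where the first match starts. Its result can also be used as a slice index. Below is a sketch; the email address is made up for illustration.

```python
s = 'SMU SIS'

# For a multi-character substring, find() returns the index where it starts.
print(s.find('SIS'))  # 4

# Hypothetical example: use find() together with slicing to take
# the part of an email address before the '@'.
email = 'ben@smu.edu.sg'
at_index = email.find('@')
print(email[0:at_index])  # ben
```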

```python
s = 'SMU School of Information Systems'
print(s.split())
```

```
['SMU', 'School', 'of', 'Information', 'Systems']
```


We see that `s.split()` above returns a <b>list</b> of strings. We will learn about lists in a later week.
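
Calling `split()` with no argument is not quite the same as calling `split(' ')`. Without an argument, any run of whitespace counts as one separator, whereas `' '` splits at every single space. Also, `len()` works on the returned list, so it can be used to count words. A sketch:

```python
s = 'SMU School of Information Systems'

# len() also works on the list returned by split(), giving the number of words.
print(len(s.split()))  # 5

# With no argument, split() treats any run of whitespace as one separator.
print('Rain  is   coming'.split())     # ['Rain', 'is', 'coming']

# With ' ' as the separator, every single space splits, leaving empty strings.
print('Rain  is   coming'.split(' '))  # ['Rain', '', 'is', '', '', 'coming']
```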

### String Methods Do Not Modify Strings

You can see that the methods `lower()` and `upper()` create copies of the original strings with the necessary changes. They cannot modify the original strings, because strings are immutable.
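
Since methods such as `upper()` return a new string, the result is lost unless we store it. The minimal sketch below keeps the uppercase version by assigning the returned string back to the variable:

```python
name = "Singapore Management University"

# upper() returns a new string; name itself is not changed.
name.upper()
print(name)  # Singapore Management University

# To keep the result, assign the returned string to a variable.
name = name.upper()
print(name)  # SINGAPORE MANAGEMENT UNIVERSITY
```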