### APS106 Lecture Notes - Week 3, Lecture 1
# Strings, Strings, Everywhere

### Lecture Structure
1. [Type Conversions](#section1)
2. [String Indexing and Slicing](#section2)
3. [Modifying Strings](#section3)
4. [String Methods](#section4)

<a id='section1'></a>
## 1. Type Conversions
**Convert to str**

The built-in function `str` takes any value and returns a string representation of that value.

In [None]:
str(2)
my_str = str(2) + str(2)
print(my_str)

In [None]:
x = str(42.8)
print(x)     #even though this looks like a float, it's actually a string
#print(type(x))
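A short sketch (the values are arbitrary examples) showing that `str` also converts booleans, `None`, and numbers, and that the result is always of type `str`:

```python
# str() turns values of many types into their string representations
examples = [str(True), str(None), str(-7), str(3.0)]
print(examples)                # ['True', 'None', '-7', '3.0']

# Every result has type str, even when it "looks like" a number
print(type(str(3.0)))          # <class 'str'>

# Adding strings concatenates them; adding numbers adds them
print(str(2) + str(2), 2 + 2)  # 22 4
```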

**Convert to int**

In [None]:
y = int('12345')
print(y)
print(type(y))

In [None]:
print(int('-99'))

In [None]:
print(int('99.9'))

In [None]:
print(int(99.9))  #notice the difference between this and above

If function `int` is called with a string that does not represent a whole number, such as `'99.9'` or `'twelve'`, a `ValueError` is raised. (An optional leading `+` or `-` sign, as in `'-99'`, is allowed.)
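As a preview, a `try`/`except` statement can catch that `ValueError` instead of letting the program stop. The helper `to_int_or_none` below is hypothetical, written only for this illustration:

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not a whole number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings like '99.9' or 'twelve'
        return None

print(to_int_or_none('-99'))     # -99
print(to_int_or_none('+7'))      # 7
print(to_int_or_none(' 42 '))    # 42 (surrounding whitespace is ignored)
print(to_int_or_none('99.9'))    # None
print(to_int_or_none('twelve'))  # None
```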

**Convert to float**

In [None]:
print(float('99.9'))

In [None]:
print(int('99.9'))  # still a ValueError, just like above; but since float('99.9') works...

In [None]:
#How about this?
print(int(float('99.9')))
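Converting in two steps works, but note what `int` does to the fractional part. A small sketch contrasting it with the built-in `round`:

```python
# int() on a float drops the fractional part (truncates toward zero);
# it does not round to the nearest integer
print(int(float('99.9')))    # 99
print(int(float('-99.9')))   # -99, not -100
print(round(float('99.9')))  # 100: round() goes to the nearest integer instead
```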

In [None]:
print(float('-43.2'))

In [None]:
print(float('453'))

If function `float` is called with a string that can't be converted, a `ValueError` is raised.

In [None]:
print(float('-9.9.9'))
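For comparison, here is a sketch of a few string forms that `float` does accept, plus the failing case from above caught with `try`/`except` so the cell keeps running:

```python
# A few strings that float() can parse
print(float('  3.5  '))  # 3.5    (surrounding whitespace is ignored)
print(float('1e3'))      # 1000.0 (scientific notation: 1 x 10**3)
print(float('-.5'))      # -0.5   (the leading zero is optional)

# A string with two decimal points cannot be parsed
try:
    float('-9.9.9')
except ValueError as err:
    print('ValueError:', err)
```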

<a id='section2'></a>
## 2. str Indexing and Slicing

**Indexing**

An index is a position within the string. Positive indices count from the left-hand side with the first character at index 0, the second at index 1, and so on. Negative indices count from the right-hand side with the last character at index -1, the second last at index -2, and so on. For the string "I Love Cats", the indices are:

| Positive index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Character | I | (space) | L | o | v | e | (space) | C | a | t | s |
| Negative index | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

The first character of the string is at index 0 and can be accessed using square brackets.

In [8]:
s = "I Love Cats"
print(s[0])

I


In [9]:
print(s[1])

 


Negative indices are used to count from the end (from the right-hand side):

In [10]:
print(s[-1])

s


In [11]:
print(s[-2])

t
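The two numbering schemes line up: a negative index `-k` refers to the same position as `len(s) - k`. A short sketch of that relationship, and of what happens when an index falls outside the string:

```python
s = "I Love Cats"

# A negative index -k refers to the same position as len(s) - k
print(s[-1] == s[len(s) - 1])  # True
print(s[-4] == s[len(s) - 4])  # True (both are 'C')

# Indexing past either end raises an IndexError
try:
    s[11]
except IndexError as err:
    print('IndexError:', err)  # IndexError: string index out of range
```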


**Slicing Strings**

| Positive index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Character | I | (space) | L | o | v | e | (space) | C | a | t | s |
| Negative index | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

We can get at more than one character using slicing. A slice is a substring from the start index up to **but not including** the end index. For example:

In [12]:
print(s[0:3])
sub_str = s[0:3]
print(sub_str)

I L
I L
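Because the end index is excluded, a slice with in-range indices always contains `end - start` characters. A quick check of that rule (the indices 2 and 6 are just an example):

```python
s = "I Love Cats"
start, end = 2, 6

# The slice includes start but stops just before end,
# so (for in-range indices) it contains end - start characters
piece = s[start:end]
print(piece)                      # Love
print(len(piece) == end - start)  # True
```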


In [13]:
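print('this contains something: ', s[6:2:-1])  # a step of -1 reads right to left: indices 6, 5, 4, 3
print('this does not: ', s[6:2:1])  # a step of 1 reads left to right, but start 6 is already past end 2, so the slice is empty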

this contains something:   evo
this does not:  


What if you want to display all characters from an index to the end of the string, but you don’t want to manually count each character?

There are multiple ways to do this.

1. Find the length of the string. Then you can select characters from the index you want to the end.

In [14]:
print(len(s))

11


In [18]:
print(s[7:len(s)])

Cats


2. Use the default end index, which *is* `len(s)`! If the end index is left empty, it defaults to the length of the string (one past the index of the last character), so the slice runs through the last character.

In [16]:
print(s[7:])

Cats
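Unlike indexing, slicing does not raise an error when an index runs past the end of the string; the index is effectively clamped to `len(s)`. A small sketch:

```python
s = "I Love Cats"

print(s[7:100])      # Cats (an end past the string stops at len(s))
print(repr(s[20:]))  # ''   (a start past the end gives an empty string)
# By contrast, s[100] would raise an IndexError
```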


Similarly, if the start index is omitted, the slice starts from index 0. Leaving out both indices gives the whole string:

In [17]:
print('sliced:', s[:])
print('original:', s)

sliced: I Love Cats
original: I Love Cats
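Since one slice stops exactly where the next begins, splitting a string at any position `k` and joining the pieces gives back the original. A short sketch that checks every split point:

```python
s = "I Love Cats"

# For any k, the prefix s[:k] and the suffix s[k:] fit back together exactly,
# because the end of the first slice is the start of the second
for k in range(len(s) + 1):
    assert s[:k] + s[k:] == s

print(s[:6], '|', s[6:])  # I Love |  Cats
```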


You can also slice using negative indices.

In [19]:
print(s[1:4])

 Lo


In [20]:
print(s[1:-4]) #notice the space before and after in the string below

 Love 


In [22]:
print(s[-11:-8])
print(s[-11:-8:-1])  # empty: a step of -1 reads leftward, but -8 is to the right of -11

I L



In [23]:
print(s[2:])
#print(s)

Love Cats
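Negative indices are especially handy with an omitted start or end, because they let you describe a slice relative to the end of the string without knowing its length. A few examples on the same string:

```python
s = "I Love Cats"

print(s[-4:])    # Cats   (the last four characters)
print(s[:-5])    # I Love (everything except the last five characters)
print(s[-9:-5])  # Love
```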


**Another Example**

In [24]:
x = "Today is: 24/01/2022"

In [25]:
print(x[13:15])

01


In [26]:
date = input("Enter a date (YYYYMMDD): ")

Enter a date (YYYYMMDD): 20240521


How would you extract the month?

In [27]:
month = date[4:6]
print(month)

05


The day?

In [28]:
day = date[6:8]
print(day)

21


The year?

In [29]:
year = date[0:4]
print(year)

2024


In [30]:
date = input("Enter a date (YYYYMMDD): ")
year = date[0:4]
month = date[4:6]
day = date[6:8]

print("The date in day/month/year format: " + day + "/" + month + "/" + year)


Enter a date (YYYYMMDD): 20240521
The date in day/month/year format: 21/05/2024
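The same slicing steps can be wrapped in a function so they are easy to reuse and test. This is a sketch; the name `to_day_month_year` and the length/digit check are assumptions added for illustration, not part of the lecture code:

```python
def to_day_month_year(date):
    """Hypothetical helper: convert 'YYYYMMDD' to 'DD/MM/YYYY' using slices."""
    if len(date) != 8 or not date.isdigit():
        # Basic sanity check only; it does not verify that the date exists
        raise ValueError("expected 8 digits in YYYYMMDD format")
    year = date[0:4]
    month = date[4:6]
    day = date[6:8]
    return day + "/" + month + "/" + year

print(to_day_month_year("20240521"))  # 21/05/2024
```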


We can slice (select) every nth character by providing a third value, using the syntax `[start:finish:step]`, where:

- `start` is the index where the slice begins

- `finish` is the index one past the last character included in the slice

- `step` is how many positions we move between selected characters (a negative step moves from right to left)

When `step` is not provided, it defaults to 1.

| Positive index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Character | I | (space) | L | o | v | e | (space) | C | a | t | s |
| Negative index | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

In [31]:
print(s)

I Love Cats


In [32]:
print(s[::2])
print(s[2:9:3])

ILv as
Lea
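A few more step values on the same string, including the common `[::-1]` idiom for reversing a string:

```python
s = "I Love Cats"

print(s[1::2])    # " oeCt": odd indices 1, 3, 5, 7, 9
print(s[::-1])    # staC evoL I: a step of -1 walks from the end to the start
print(s[6:2:-1])  # " evo": with a negative step, start should be greater than finish
```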


<a id='section3'></a>
## 3. Modifying Strings

The slicing and indexing operations **do not modify** the string that they act on, so the string that the variable refers to is unchanged by the operations above. 

**In fact, we cannot change a string at all**. Strings are *immutable*: once a string is created, it cannot be changed.

Operations like the following result in errors:

In [33]:
x = "This sounds like a bad idea"
x[0:4] = "That"

TypeError: 'str' object does not support item assignment

Suppose we start with `s = "I Love Cats"` and would like to change `s` to refer to "I Loved Cats". How might we do that?


In [35]:
s = "I Love Cats"
s_new = s[:6] + 'd' + s[6:]
print(s)
print(s_new)

I Love Cats
I Loved Cats


Variable `s_new` refers to the new string built by `s[:6] + 'd' + s[6:]`.

Remember, we cannot modify strings. We can only create a new string and change which string a variable refers to.

Alternatively, we could have reassigned `s` directly:

In [None]:
s = "I Love Cats"
s = s[:6] + 'd' + s[6:]
print(s)

In [36]:
s = 'I Love Cats'
s = s[:2] + 'Like' + s[6:]
print(s)

I Like Cats
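Both edits above follow the same pattern: keep everything before the part you want to change, add the new text, then keep everything after it. A sketch of that pattern as a function; `replace_slice` is a hypothetical name chosen for this example:

```python
def replace_slice(text, start, end, new):
    """Hypothetical helper: return a NEW string where text[start:end] is swapped for new.

    The original string is not modified (strings are immutable).
    """
    return text[:start] + new + text[end:]

s = 'I Love Cats'
print(replace_slice(s, 2, 6, 'Like'))  # I Like Cats
print(replace_slice(s, 6, 6, 'd'))     # I Loved Cats (empty slice: a pure insertion)
print(s)                               # I Love Cats (unchanged)
```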


Q: In terms of expression evaluation and strings, what is going on in the second line?

We can also use the augmented assignment operator `+=` to add onto the end of a string.

In [37]:
s = "I Love Cats"
s += " (or not...)"
#s = s + ' (or not...)'
print(s)

I Love Cats (or not...)
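Even `+=` does not modify the string: it builds a new string and makes the variable refer to it. One way to see this is with a second variable that still refers to the original string:

```python
s = "I Love Cats"
t = s          # t and s refer to the same string object

s += "!"       # builds a new string and makes s refer to it
print(s)       # I Love Cats!
print(t)       # I Love Cats (t still refers to the original string)
print(s is t)  # False
```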


<a id='section4'></a>
## 4. str Methods

**Methods**

Remember methods? A method is a function that is applied to a particular object. The general form of a method call is:

`object.method(arguments)`

Similar to the turtle objects we've seen, strings are objects. Just like turtles, they have associated methods that are valid only for those objects (for turtles, e.g. `tina.forward(20)` and `tina.color("red")`).
 
**String Methods**

Consider the code:

In [38]:
white_rabbit = "I'm late! I'm late! For a very important date!"

To list the methods that strings provide, use the built-in function `dir`:

In [39]:
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


To get information about a method, such as the lower method, use the `help` function:

In [40]:
help(str.lower)

Help on method_descriptor:

lower(self, /)
    Return a copy of the string converted to lowercase.
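The `dir(str)` listing also includes names that start and end with double underscores, which Python uses behind the scenes (for example, `__add__` supports the `+` operator). A sketch that keeps only the ordinary methods and checks whether a method exists:

```python
# Keep only names that do not start with an underscore
public_methods = []
for name in dir(str):
    if not name.startswith('_'):
        public_methods.append(name)

print(public_methods[:5])         # ['capitalize', 'casefold', 'center', 'count', 'encode']
print('upper' in public_methods)  # True
print('shout' in public_methods)  # False: there is no str method called shout
```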



Many string methods return a new string. Since strings are immutable, the original string is left unchanged. For example, calling the method `lower` on `white_rabbit` returns a lowercase copy of the string it refers to:

In [41]:
print(white_rabbit.lower())

i'm late! i'm late! for a very important date!


In [42]:
print(white_rabbit)  #We just said that strings are immutable, and cannot be changed!

I'm late! I'm late! For a very important date!


In [43]:
small_rabbit = white_rabbit.lower()
print(small_rabbit)
print(white_rabbit)

i'm late! i'm late! for a very important date!
I'm late! I'm late! For a very important date!


`white_rabbit` hasn't changed!

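But if you want `white_rabbit` to refer to the lowercase version, you can reassign the variable. The built-in function `id` returns an integer that identifies an object, so comparing the `id` values before and after shows that `white_rabbit` now refers to a different string object: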

In [44]:
white_rabbit = "I'm late! I'm late! For a very important date!"
print("Before: ", white_rabbit, 'id:', id(white_rabbit))
white_rabbit = white_rabbit.lower()
print("After: ", white_rabbit, 'id:', id(white_rabbit))

Before:  I'm late! I'm late! For a very important date! id: 4414740912
After:  i'm late! i'm late! for a very important date! id: 4436647952
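Because `lower` returns a new string, it is handy for comparing text while ignoring case. A small sketch with a made-up input:

```python
answer = "YES"                  # e.g. text typed by a user
print(answer == "yes")          # False: comparison is case-sensitive
print(answer.lower() == "yes")  # True: compare a lowercase copy instead
print(answer)                   # YES (the original is unchanged)
```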


**More `str` methods**

`capitalize`: returns a copy of the string with the first character uppercased and the rest lowercased.

In [45]:
name = 'joseph'
print("Why, hello there " + name.capitalize() + "!")

Why, hello there Joseph!


In [46]:
name = 'joseph sebastian'
print("Why, hello there " + name.capitalize() + "!")

Why, hello there Joseph sebastian!
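`capitalize` only affects the first character of the whole string. If every word should start with a capital letter, the `title` method is one option. A short comparison:

```python
name = 'joseph sebastian'
print(name.capitalize())      # Joseph sebastian (only the first character)
print(name.title())           # Joseph Sebastian (first letter of every word)
print('jOSEPH'.capitalize())  # Joseph (the rest is lowercased)
```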


`upper`: returns a copy of the string with all letters converted to uppercase.

In [47]:
scream = 'Why are you screaming?'
print(scream)
print(scream.upper())

Why are you screaming?
WHY ARE YOU SCREAMING?


`find(sub)` returns the **first** (lowest) index where the substring `sub` is found, and `rfind(sub)` returns the **last** (highest) index. Both return -1 if the substring is not found.

In [48]:
str1 = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
str2 = "wood"

In [49]:
print(str1.find(str2))

9


In [50]:
print('Length of string:', len(str1))
print(str1.rfind(str2))

Length of string: 70
65


Here the method `rfind` is called on `str1` to search for `str2`.

If we swap `str1` and `str2`, we are searching for the long sentence inside the short word "wood". There is no match, so the result is -1:

In [51]:
print(str2.rfind(str1))

-1


In [52]:
print(str1.rfind("Ben"))

-1
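`find` also accepts an optional second argument, the index to start searching from, which makes it possible to locate every occurrence, not just the first and last. A sketch; `find_all` is a hypothetical helper, and the built-in `count` method is shown for comparison:

```python
def find_all(text, sub):
    """Hypothetical helper: return a list of every index where sub starts in text."""
    positions = []
    index = text.find(sub)
    while index != -1:
        positions.append(index)
        # Resume the search one character after the last match
        index = text.find(sub, index + 1)
    return positions

str1 = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
print(find_all(str1, "wood"))  # [9, 22, 43, 65]
print(str1.count("wood"))      # 4
```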


`replace`: We can also replace the word "wood" with something else using the method `replace`.

The method `replace(old, new, count)` returns a copy of the string in which occurrences of `old` are replaced with `new`. If the optional argument `count` is given, only the first `count` occurrences are replaced; otherwise, all of them are.

In [53]:
help(str.replace)

Help on method_descriptor:

replace(self, old, new, count=-1, /)
    Return a copy with all occurrences of substring old replaced by new.
    
      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.
    
    If the optional argument count is given, only the first count occurrences are
    replaced.



In [54]:
str1 = "How old is Seb?"
str2 = str1.replace("Seb","Ben")

In [55]:
print(str1)
print(str2)

How old is Seb?
How old is Ben?


In [56]:
str1 = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
str2 = str1.replace("wood", "steel")
str3 = str1.replace("wood", "steel",3)

In [58]:
print(str1)
print(str2)
print(str3)

How much wood would a woodchuck chuck if a woodchuck could chuck wood?
How much steel would a steelchuck chuck if a steelchuck could chuck steel?
How much steel would a steelchuck chuck if a steelchuck could chuck wood?
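A quick way to confirm what `count=3` did is to count the words afterwards; and when `old` does not occur at all, `replace` simply returns an equal string:

```python
str1 = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
str3 = str1.replace("wood", "steel", 3)

print(str3.count("steel"), str3.count("wood"))  # 3 1
print(str1.replace("Ben", "Seb") == str1)       # True: no match, so nothing changes
```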


<div class="alert alert-block alert-info">
<big><b>This Lecture</b></big>
<ul>  
 <li>str conversions</li>  
 <li>getting inside a string: indexing and slicing </li>  
 <li>you can't modify a string: immutability</li>
    <li>but you can reassign string variables</li>
 <li>string methods</li>
</ul>  
</div>