# Lecture 8 - Strings 

* Strings - lots of cool things you can do with strings
  * String operators
  * Length function
  * Slicing
  * Immutability
  * String comparison
  * For loops
  * In operator
  * Convenience functions
    * Find
    * Split
    * Join
    * etc.
  * Format method and other ways to do string substitution

# String operators

Python provides some surprising ways to manipulate strings

In [1]:
# You can concatenate strings together
s = "Lets" + "add" + "together" + "strings"

print(s) # Note it just puts them one after the other 
# (i.e. it doesn't add any whitespace)

Letsaddtogetherstrings


In [2]:
s = "Hello" * 10 # The multiplication operator repeats a string, 
# here making ten copies joined end to end

print(s) 

HelloHelloHelloHelloHelloHelloHelloHelloHelloHello


In [3]:
# Note this doesn't work

s = "You can't" - "subtract strings" # What would this even do?

TypeError: unsupported operand type(s) for -: 'str' and 'str'

In [4]:
# Nor does this

s = "You can't" / "divide strings either"

TypeError: unsupported operand type(s) for /: 'str' and 'str'

# Length function

The length of a string is given by the len() function

In [1]:
s = "A long string"

len(s)

13

In [6]:
s = "" # The empty string case

len(s)

0

# Selecting Characters from a String



In [1]:
s = "A long string"

s[0] # Let's select the first character

'A'

In [8]:
s[1] # The second character

' '

In [9]:
s[100] # Trying to address a character beyond the length of the string creates 
# an error

IndexError: string index out of range

**Quick but extremely vital aside: Python indexes sequences from 0 (zero-based coordinates)**

So for a string s of length n, s[0] is the first character, s[1] is the second, etc., and s[n-1] is the last

This is true for strings and all sequences, including lists (as we saw). 

This is consistent with functions like range() and the previous for loop iterations we saw.

In [8]:
# It is also important to realise that a character is just another string in Python

# In some languages, like C/C++, individual characters are not strings but have a different
# type, but Python being a high-level language just treats them as a single character string

print(s)
print(s[0])
print(s[0][0][0][0][0][0][0][0][0][0][0][0][0][0])

A long string
A
A


In [11]:
len(s[0]) # It is a string with length 1

1

# Slices

You will often find you want to work with substrings: sub-portions of a string. Python is really nice for this.

In [9]:
# Beyond indexing single characters, you can slice strings to create substrings

s = "A long string"

s[0:6] # The 'prefix' substring of the first 6 characters

'A long'

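Slices use zero-based, half-open interval coordinates: for a slice s[x:y], x is the index of the first character included and y is the index one past the last character included, so s[x:y] contains the characters s[x], s[x+1], ..., s[y-1].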
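
A small sketch of why the half-open convention is convenient. The names `x`, `y`, `k` and `piece` are just illustrative, and the length rule assumes 0 <= x <= y <= len(s).

```python
s = "A long string"
x, y = 2, 6

piece = s[x:y]                 # characters at indices 2, 3, 4 and 5 (index 6 is excluded)
print(piece)                   # long
print(len(piece) == y - x)     # True: the length of the slice is simply y - x

k = 6
print(s[:k] + s[k:] == s)      # True: cutting at k never drops or repeats a character
```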

In [13]:
# Zero length case

s = "A long string"

s[6:6] # The interval [6, 6) is empty



''

In [14]:
# Negative length strings?

s = "A long string"

s[6:0] # If the second index occurs before the first index it won't
# throw an error, just make a zero length (empty) string

''

In [15]:
# Python also gives you a useful shorthand where you omit one or both ends of the range 

s[:6] # This is the same as s[0:6]

# s[:n] is called a prefix of s

'A long'

In [16]:
s[6:] # This is the same as s[6:13] or s[6:len(s)]

# s[n:] is called a suffix of s

' string'

In [7]:
s[:] 
# This is just s[0:len(s)], i.e. the whole string

'A long string'

In [18]:
s[::2] # This is every second character! (step of 2)

'Aln tig'

# Challenge 1

In [19]:
s = "A long string"

# Write down an expression that concatenates two slices of s to get "long ring"

s[2:7]+s[9:]


'long ring'

# Negative slicing coordinates

In [20]:
# Negative coordinates let you slice from the other end of the string
# (it's surprising how often this proves to be useful)

s = "A long string"

s[-1] # This is the last character of s

'g'

In [21]:
s[-2] # The second to last

'n'

In [22]:
s[-100] # This throws an error, because it implies a character before 
# the start of the string

IndexError: string index out of range

In [23]:
# You can also slice using negative coordinates:

s[:-1] # Get the prefix of length n-1 (everything except the last character)

'A long strin'

In [24]:
s[-2:-1] # Get the penultimate character

'n'
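
One way to read negative coordinates: the index -k refers to the same position as len(s) - k. The sketch below checks that on the same example string; the variable names are only for illustration.

```python
s = "A long string"
n = len(s)

last_matches = s[-1] == s[n - 1]
suffix = s[-6:]                # the last 6 characters
prefix = s[:-7]                # everything except the last 7 characters

print(last_matches)            # True
print(suffix)                  # string
print(prefix)                  # A long
```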

# Challenge 2

In [22]:
s = "A long string"

# Give a slice of s that reverses s ! (hint: try a negative step)
s[::-1]




'gnirts gnol A'

# Immutability

Strings are immutable: you can't edit a string in place, you can only build new strings from existing ones.

In [26]:
x = "Strings can't be changed"

# This doesn't work

x[0] = 's'


TypeError: 'str' object does not support item assignment

In [27]:
x = "Strings can't be changed"

# This doesn't work

#x[0] = 's'

# To make the first character of x lower case you could instead do:

x = 's' + x[1:]

print(x)

strings can't be changed


Making things immutable has some nice, simplifying properties.

Notably, immutable data is easy to share across different parts of a program, because we are guaranteed that one bit of the code can't change the data and cause unexpected behaviour in another part of the program that was not expecting these changes. 

Ints, floats, booleans and strings are all immutable in Python.
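
Here is a minimal sketch of the sharing argument above. The function `shout` and the variable names are made up for illustration: the function cannot alter the caller's string, and "changing" one name only rebinds that name to a new string.

```python
def shout(text):
    # upper() returns a new string; the string passed in is left untouched
    return text.upper()

greeting = "hello"
alias = greeting          # both names refer to the same string

loud = shout(greeting)
alias += "!"              # builds a new string and rebinds alias only

print(greeting)           # hello
print(loud)               # HELLO
print(alias)              # hello!
```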

# String comparison


In [28]:
# We saw this already, but Python compares strings lexicographically

x = "Aardvarks"
y = "Apples"

x < y # This is true, because Aardvarks is before (less than) Apples in the dictionary

True

In [29]:
x == "aardvarks" # This is false because string comparison is case sensitive

False

In [30]:
x.lower() == "aardvarks" # The call to .lower() returns a lower-case copy of the string (x itself is unchanged)

True
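
One caveat worth seeing in code: the "dictionary order" Python uses compares characters by their numeric code points, so every upper-case letter sorts before every lower-case letter. The word list below is just an illustrative example.

```python
print(ord("A"), ord("Z"), ord("a"))    # 65 90 97

print("Zebra" < "apple")               # True: 'Z' (90) comes before 'a' (97)
print("apple" < "apples")              # True: a prefix sorts before the longer string

words = ["banana", "Cherry", "apple"]
plain = sorted(words)                  # code point order
ignoring_case = sorted(words, key=str.lower)

print(plain)                           # ['Cherry', 'apple', 'banana']
print(ignoring_case)                   # ['apple', 'banana', 'Cherry']
```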

# In operator

We can easily search within a string to find if it contains a given substring:

In [31]:
s = "once upon a time there lived a wicked teacher"

s2 = "once upon a time"

s2 in s

True

In [32]:
# You can also use 'not in'

s2 not in s


False

# For loops on strings

In [33]:
# You can easily iterate through the characters in a string
# using a for loop:

alphabet = "abcdefghijklmnopqrstuvwxyz"
for i in range(0, len(alphabet)):
  print("The next letter is:", alphabet[i])

The next letter is: a
The next letter is: b
The next letter is: c
The next letter is: d
The next letter is: e
The next letter is: f
The next letter is: g
The next letter is: h
The next letter is: i
The next letter is: j
The next letter is: k
The next letter is: l
The next letter is: m
The next letter is: n
The next letter is: o
The next letter is: p
The next letter is: q
The next letter is: r
The next letter is: s
The next letter is: t
The next letter is: u
The next letter is: v
The next letter is: w
The next letter is: x
The next letter is: y
The next letter is: z


In [34]:
# Or better:

for i in alphabet:
  print("The next letter is:", i) # At each iteration of the loop we get the next 
  # character in the string

The next letter is: a
The next letter is: b
The next letter is: c
The next letter is: d
The next letter is: e
The next letter is: f
The next letter is: g
The next letter is: h
The next letter is: i
The next letter is: j
The next letter is: k
The next letter is: l
The next letter is: m
The next letter is: n
The next letter is: o
The next letter is: p
The next letter is: q
The next letter is: r
The next letter is: s
The next letter is: t
The next letter is: u
The next letter is: v
The next letter is: w
The next letter is: x
The next letter is: y
The next letter is: z
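
If you need both the position and the character, the built-in enumerate() gives you the two together, so you can avoid indexing by hand. A short sketch on a made-up three-letter string:

```python
word = "abc"

pairs = []
for i, ch in enumerate(word):   # i is the index, ch is the character at that index
    print(i, ch)
    pairs.append((i, ch))

# Prints:
# 0 a
# 1 b
# 2 c
```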


# Challenge 3

In [4]:
s = "this is a test"

# Use two nested loops and string slices to print out all possible non-zero length substrings of s 
# (a substring is any sequence of contiguous characters in the string)

count = 0
for start in range(0,len(s)):
    for end in range(start, len(s)):
        print(s[start:end+1])
        count += 1
        
count


t
th
thi
this
this 
this i
this is
this is 
this is a
this is a 
this is a t
this is a te
this is a tes
this is a test
h
hi
his
his 
his i
his is
his is 
his is a
his is a 
his is a t
his is a te
his is a tes
his is a test
i
is
is 
is i
is is
is is 
is is a
is is a 
is is a t
is is a te
is is a tes
is is a test
s
s 
s i
s is
s is 
s is a
s is a 
s is a t
s is a te
s is a tes
s is a test
 
 i
 is
 is 
 is a
 is a 
 is a t
 is a te
 is a tes
 is a test
i
is
is 
is a
is a 
is a t
is a te
is a tes
is a test
s
s 
s a
s a 
s a t
s a te
s a tes
s a test
 
 a
 a 
 a t
 a te
 a tes
 a test
a
a 
a t
a te
a tes
a test
 
 t
 te
 tes
 test
t
te
tes
test
e
es
est
s
st
t


105

# Examples of functions processing strings

In [2]:
# We can use loops to do neat processing on strings

def remove_vowels(s): 
  """Remove vowels from a string
  """
  vowels = "aeiouAEIOU"
  r = ""
  for x in s: # For each character in s
    if x not in vowels: # If not a vowel
      r += x # This makes a new string
  return r

remove_vowels("compsci")


'cmpsc'

In [37]:
# Search for the first instance of a character in a string

def find_character(s, ch):
  """
  Find the first occurrence of a given character
  in a string s and return the position, 
  otherwise if not present, return -1
  """
  for i in range(len(s)):
    if s[i] == ch:
      return i
  return -1

find_character("once upon a time", 'u')

5

# Convenience functions

Python provides several useful functions on strings. Here's a non-exhaustive look:



**find**

In [38]:
# Find generalizes the find_character function above to search for substrings

s = "once upon a time there lived, a time"

s2 = "a time"

s.find(s2) # Find first instance of s2 in s
  
  

10
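
Two more things about find that are easy to miss, sketched on the same string: like find_character it returns -1 when nothing matches, and it accepts an optional start position, which lets you look for later occurrences. The variable names are illustrative.

```python
s = "once upon a time there lived, a time"
s2 = "a time"

first = s.find(s2)                # 10, the first occurrence
second = s.find(s2, first + 1)    # resume the search just past the first hit
missing = s.find("dragon")        # -1 signals "not found"

print(first, second, missing)     # 10 30 -1
```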

**split**

In [39]:
# Split is useful for splitting strings into "tokens"

s = "once      upon\ta time\nthere lived"

s.split()

['once', 'upon', 'a', 'time', 'there', 'lived']

In [40]:
# You can use it to split on specific characters, 
# consider comma separated data (csv data):

s = "0.5,0.9,17,20"

s.split(",") # ',' is used as the split character

['0.5', '0.9', '17', '20']

In [41]:
# You can also do this with tabs (e.g. tsv data)

s = "0.5\t0.9\t17\t20"

s.split("\t") # a tab is used as the split character

['0.5', '0.9', '17', '20']
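
Split always returns strings, so numeric data needs a conversion step afterwards. The sketch below also shows one difference between split() with no argument and split(" "): the latter does not merge runs of spaces. The variable names are just for illustration.

```python
line = "0.5,0.9,17,20"

fields = line.split(",")                # a list of strings
values = [float(f) for f in fields]     # convert each token to a number

print(values)                           # [0.5, 0.9, 17.0, 20.0]
print(max(values))                      # 20.0

spaced = "a  b"                         # two spaces between the letters
print(spaced.split())                   # ['a', 'b']
print(spaced.split(" "))                # ['a', '', 'b']
```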

**join**

In [4]:
# Join lets you concatenate a sequence of strings

l = ['once', 'upon', 'a', 'time', 'there', 'lived']

" ".join(l) # the string " " is used as the joining sequence

'once upon a time there lived'

In [7]:
# This therefore works too

",".join(l)

'once,upon,a,time,there,lived'
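
Join only accepts strings, so a list of numbers has to be converted first. A minimal sketch, using a made-up list of numbers, which also shows that split and join undo each other when the same separator is used:

```python
numbers = [0.5, 0.9, 17, 20]

# ",".join(numbers) would raise a TypeError, because the items are not strings
row = ",".join(str(n) for n in numbers)
print(row)                               # 0.5,0.9,17,20

tokens = row.split(",")
print(tokens)                            # ['0.5', '0.9', '17', '20']
print(",".join(tokens) == row)           # True
```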

**case changing functions**

In [44]:
s = "once upon a time there lived"

s.upper() # When you feel like shouting

'ONCE UPON A TIME THERE LIVED'

In [45]:
s = "SHOUTING" 

s.lower() # The opposite

'shouting'

**lots more...**

Python strings provide lots of useful methods, see:

https://docs.python.org/3/library/stdtypes.html#string-methods

In [3]:
dir('')

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [13]:
"foo".swapcase

<function str.swapcase>

In [12]:
help("asd".swapcase)

Help on built-in function swapcase:

swapcase(...) method of builtins.str instance
    S.swapcase() -> str
    
    Return a copy of S with uppercase characters converted to lowercase
    and vice versa.



# Format method

As an alternative to using f-strings, Python provides the format method. If you are using an older version of Python you may have to use this method instead, so it is worth knowing.

In [47]:
# Format

s = """Dear {},\nI'd like to invite you to a {}, 
which will be held at {} on {}.\n"""

print(s)

print(s.format("John", "party", "my house", "Friday"))

print(s.format("Kishwar", "seance", "the haunted castle", "Monday at midnight"))

Dear {},
I'd like to invite you to a {}, 
which will be held at {} on {}.

Dear John,
I'd like to invite you to a party, 
which will be held at my house on Friday.

Dear Kishwar,
I'd like to invite you to a seance, 
which will be held at the haunted castle on Monday at midnight.



In [3]:
# You can use named arguments

s = """
Dear {name},
I'd like to invite you to a {event}, 
which will be held at {place} on {time}.\n
"""

print(s.format(name="Xian", place="the library", time="Monday morning", 
               event="fun study section"))


Dear Xian,
I'd like to invite you to a fun study section, 
which will be held at the library on Monday morning.




In [49]:
# You can also use numbered arguments

s = "{1} {0}"

s.format("world", "hello") # This uses positional ordering to match the numbered arguments

'hello world'

In [50]:
# Which allows you to reuse the arguments

"{0}{1}{0}".format("abra", "cad")

'abracadabra'

There is a lot more to learn about f-strings and the format method, see: 

https://docs.python.org/3/library/string.html
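
As a small taste of what that page covers, a replacement field can carry a format spec after a colon to control precision, width and alignment. The values and variable names below are only illustrative, and the f-string line assumes Python 3.6 or later.

```python
price = 3.14159
count = 42

two_places = "{:.2f}".format(price)     # two digits after the decimal point
right = "{:>8}|".format(count)          # right-aligned in 8 columns
left = "{:<8}|".format(count)           # left-aligned in 8 columns
padded = "{:05d}".format(count)         # integer zero-padded to width 5

print(two_places)                       # 3.14
print(right)                            #       42|
print(left)                             # 42      |
print(padded)                           # 00042

# The same specs work inside an f-string
combined = f"{price:.2f} and {count:05d}"
print(combined)                         # 3.14 and 00042
```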

In [51]:
# One final note, Python also supports an older style string formatting
# which uses C like syntax for formatting arguments

s = "Dear %s,\nI'd like to invite you to a %s, which will be held at %s on %s.\n"


#print(s) 

print(s % ("John", "party", "my house", "Friday"))

Dear John,
I'd like to invite you to a party, which will be held at my house on Friday.



You can use f-strings, the format method or the old style, but f-strings seem more flexible and powerful, and are less likely to be deprecated in future Python versions, IMO.

# Challenge 4

In [18]:
s = """Hi {name}, 
I regret to inform you that your {noun} has been {verb}.
Apologies for the inconvenience,\nbest regards\n{you}"""

# Use string formatting to customize the above string however you like. Print the result to the screen.
# Try using both positional (numbered) and named arguments


Hi u, I regret to inform you that your cat has been rescued.
Apologies for the inconvenience,
best regards
fire dept


# Challenge 5

In [20]:
# Complete the format command to print '0005.69'
print("{}".format(5.6872))

0005.69


# Challenge 6

In [10]:
# Complete the function:

def how_many_occurrences(s, s2):
  """
  Returns the number of times s2 occurs as a substring of s
  """
  # CODE TO WRITE

how_many_occurrences("mississsippi", "ss") # Correct answer should be 3 

# PS: yes, I know mississsippi (sic) is spelled wrong, 
# just to make you think about overlapping occurrences

# Reading

* Read chapter 8: http://openbookproject.net/thinkcs/python/english3e/strings.html


# Homework

* ZyBook Reading 8
* Go to Canvas and complete the lecture quiz, which involves completing each challenge problem
