#### GISC 420 T1 2022
# Strings and string manipulations
This notebook supplements material in Chapter 8 of [Think Python](https://greenteapress.com/wp/think-python-2e/).

Strings are a commonly encountered data type in programming, and they are very different from numerical data types. As such, we need a different set of operations to manipulate them.

## Strings are sequences
As far as Python is concerned, a `string` is a *sequence* of *characters*. We can access a single character in a string using an index, in square brackets.

In [None]:
s = 'Hello world'
s[1]

The number inside the `[]` is an *index value* which picks out a single character of `'Hello world'`. The single most important thing about indexes in Python is that **they start at 0, _not_ at 1**! The first character in a string is at index position 0:

In [None]:
s[0]

The value inside the `[]` can be an expression:

In [None]:
s[2 ** 2]

The index must be an `integer`; anything else, such as a `float`, raises a `TypeError`:

In [None]:
s[1.5]
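As a minimal sketch of what goes wrong with a bad index, the cell below catches the two errors you are most likely to meet: a `TypeError` for a non-integer index, and an `IndexError` for an integer index past the end of the string. The variable names (`float_error`, `range_error`, `converted`) are just illustrative.

```python
s = 'Hello world'

# A float index raises TypeError, even when it holds a whole number
try:
    s[1.0]
except TypeError as err:
    float_error = str(err)
    print("TypeError:", float_error)

# An integer index past the last character raises IndexError
try:
    s[len(s)]
except IndexError as err:
    range_error = str(err)
    print("IndexError:", range_error)

# Converting to int first gives a valid index (int() truncates 1.5 to 1)
converted = s[int(1.5)]
print(converted)
```

Because indexes start at 0, the last valid index is `len(s) - 1`, which is why `s[len(s)]` fails.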

## Loops and stepping through a string
We can get the length of a string using the `len()` function and can use this to step through the characters in a string:

In [None]:
i = 0
while i < len(s):
    print(s[i])
    i = i + 1

That works fine, but it is not considered _pythonic_. It's essentially borrowed from other languages. Python provides a much more expressive way to iterate over the characters in a string. The _right way_ to iterate over the letters in a string is to use the `for ... in ...` construct:

In [None]:
for character in s:
    print(character)

The format of the `for` loop is `for <item> in <sequence>`, and as we shall see, strings are just one example of a sequence in Python.

In [None]:
# Another example
alphabet = 'abcdefghijklmnopqrstuvwxyz'
for letter in alphabet:
    print(letter)

Sometimes you want _both_ the index position and the value of each element in a sequence. It's tempting to use `for index in range(len(sequence))` for this, but the pythonic approach is much nicer and uses `enumerate`. This produces a series of pairs inside the loop, the first being the index position, the second the value:

In [None]:
for i, letter in enumerate(alphabet):
    print(f"Letter {i + 1} in the alphabet is {letter}")
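If you only need counting to begin at 1, `enumerate` also accepts an optional `start` argument, which saves writing `i + 1` inside the loop. This is a small variation on the cell above, not a different technique; the name `position` is just illustrative.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

# start=1 makes the counter begin at 1 instead of 0
for position, letter in enumerate(alphabet, start=1):
    print(f"Letter {position} in the alphabet is {letter}")
```

Note that `start` only changes the numbers handed to the loop. It does not change where iteration begins, and `position` is no longer a valid index into `alphabet`.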

## Slices from sequences
We can index more than one element in a sequence using *slice* operations, which extract a segment from a string like this:

In [None]:
s[0:5]

In [None]:
s[6:11]

The index before the colon is the first character in the slice, while the second index is the one *after* the last one included. This is a bit confusing, but you do get used to it. It is really a side-effect of starting index values from 0, and also means that **the length of the slice you get is equal to the difference between the two index values** (check this in the examples above). It also means that

In [None]:
s[0:len(s)]

returns the full string.
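The length rule can be checked directly. The sketch below assumes both index values lie between `0` and `len(s)` and that the first is not larger than the second; outside that range the rule no longer holds exactly.

```python
s = 'Hello world'
start, stop = 6, 11

# The slice runs from index start up to, but not including, index stop
piece = s[start:stop]
print(piece, len(piece), stop - start)

# Two slices that share an index value fit back together with no
# overlap and no gap
rejoined = s[:start] + s[start:]
print(rejoined == s)
```

The second check shows a practical benefit of the 'one after the last' convention: the same number can end one slice and begin the next.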

If the index before the `:` is missing then `0` is assumed and if the index after the `:` is missing then `len(s)` is assumed.

In [None]:
s[:5]

In [None]:
s[6:]

In [None]:
s[:]

Here is an example of the kind of thing you can do with this:

In [None]:
i = 0
s = 'ACROSTIC'
while i < len(s):
    print(s[i:] + s[:i])
    i = i + 1

### More slice-y goodness
You can also count back from the end of a string with negative indices:

In [None]:
s[-1]

In [None]:
s[-2]

In [None]:
s[:-4]

In [None]:
s[-3:]
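One way to read a negative index is as shorthand for counting from `len(s)`. The sketch below spells that out for the same string used in the cells above.

```python
s = 'ACROSTIC'

# s[-k] refers to the same character as s[len(s) - k]
last = s[-1]
same_last = s[len(s) - 1]
print(last, same_last)

# Negative values work the same way inside slices
head = s[:-4]   # everything except the last four characters
tail = s[-3:]   # just the last three characters
print(head, tail)
```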

## String functions
Strings can be manipulated with a large number of built-in functions (strictly speaking these are *methods*, which are called using the dot syntax `s.method()`). Some of these are shown below.

In [None]:
s = 'The quick brown fox jumps over the lazy dog'

s.upper()

In [None]:
s.lower()

In [None]:
s.replace('dog', 'cat')

In [None]:
s.find('a')

In [None]:
s.count('o')

In [None]:
s.split()

The last of these splits the string into a `list` of strings. By default it does this at whitespace. We can also specify a split string:

In [None]:
s.split('o')

A list is another kind of sequence, which we will look at in another notebook.
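Going back to `find` from the cells above, two details are easy to miss: it reports only the *first* match, and it returns `-1` rather than raising an error when there is no match. Both it and `count` are case-sensitive. A short sketch:

```python
s = 'The quick brown fox jumps over the lazy dog'

# find returns the index of the first match...
first_a = s.find('a')
print(first_a, s[first_a])

# ...and -1 when the search string does not occur at all
missing = s.find('cat')
print(missing)

# Matching is case-sensitive: 'the' does not match the 'The' at index 0
lower_the = s.find('the')
print(lower_the, s.count('the'), s.count('The'))
```

Since `-1` is also a valid index (the last character), it is worth checking the result of `find` before using it to index or slice.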

## Strings are _immutable_
Finally, it is important to note that once a string has been created, its characters cannot be changed. Strings are said to be _immutable_. This means that if you want to change a string, you have to assign the result of any of the above functions back to the variable that names it. None of the above functions have changed `s`:

In [None]:
s

We can't 'reach in' to the string and change a character; trying to do so raises a `TypeError`:

In [None]:
s[0] = "T"

But if we reassign the result of a function to the same name, then we can 'update' it. Each such change builds a new string rather than modifying the old one, which can lead to some performance issues if you are doing a lot of it, so be careful!

In [None]:
s = s.capitalize()
s
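As a hedged illustration of the performance point, the sketch below builds the same result in two ways: by reassigning a string repeatedly, and by collecting the pieces in a list and joining them once with the `join` method. For a handful of words the difference is negligible; the second pattern is commonly suggested when many pieces are involved. The names `shouted`, `pieces` and `joined` are just illustrative.

```python
words = 'The quick brown fox'.split()

# Repeated reassignment: every pass through the loop creates a new string
shouted = ''
for word in words:
    shouted = shouted + word.upper() + ' '
shouted = shouted.strip()   # remove the trailing space

# Alternative: collect the pieces, then join them in a single step
pieces = []
for word in words:
    pieces.append(word.upper())
joined = ' '.join(pieces)

print(shouted)
print(joined)
```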