# Strings

String manipulation is widely regarded as one of Python's strong points. Strings, which are segments of text, come with a rich set of features, and it is straightforward in Python to pull them apart and recombine them in interesting ways.

Working with strings is an important skill in Python. For one, reading data into a program will often take the form of a file containing text, and so we must be proficient in extracting the bits we're interested in. Moreover, we are humans. We do not want the computer to merely spit out a list of numbers after performing a calculation; we want our outputs presented in a way that we can actually read.

Strings are an immutable data type. The operations we perform involve forging new strings from old, rather than modifying the original string.
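
As a small illustration of what immutability means in practice (the variable names are just for this sketch), trying to change a character in place fails, while a string method hands back a brand new string and leaves the original alone:

```python
greeting = "Hallo Welt"

# Strings do not support item assignment, so this raises a TypeError.
try:
    greeting[0] = "J"
except TypeError as error:
    print("Cannot modify in place:", error)

# upper() returns a new string; the original is untouched.
shout = greeting.upper()
print(shout)     # HALLO WELT
print(greeting)  # Hallo Welt
```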

## Slicing and dicing

Getting a "slice" of a string is similar to getting an item from a list (in fact, we can also get slices of lists with the same syntax). However, we specify a start and an end index, rather than just one index. The slice starts at the start index and runs up to, but does not include, the end index:

In [5]:
a_string = "This is a piece of string"
print(a_string[8:16])

a piece 


We can also leave one end "open", in which case the slice runs all the way to that end of the string:

In [8]:
print(a_string[:4])
print(a_string[5:])

This
is a piece of string


To get the <i>last</i> few characters, we can count backwards with negative numbers:

In [9]:
print(a_string[-6:]) # will get the last 6 letters!

string
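
To back up the remark that lists can be sliced in the same way, here is a short sketch. The list <code>words</code> is made up for the example; everything else uses only the slicing syntax shown above.

```python
a_string = "This is a piece of string"
words = ["This", "is", "a", "piece", "of", "string"]  # example list

# The same slice syntax works on lists.
first_two = words[:2]
last_two = words[-2:]
print(first_two)  # ['This', 'is']
print(last_two)   # ['of', 'string']

# The end index is excluded, so two slices that share an index
# fit back together with no overlap and no gap.
print(a_string[:4] + a_string[4:] == a_string)  # True
```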


Strings can be concatenated (joined together) using the addition operator:

In [11]:
print("Hallo " + "Welt")

Hallo Welt


If we wish to use this technique to include, for instance, numbers, then we must first convert the number into a string, as + has a different meaning for strings and numbers.

In [5]:
year = 2017
print("The year is " + str(year) + "!")

The year is 2017!
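
If we forget the conversion, Python refuses to guess what we meant. The following sketch shows the error being raised and caught, and also the opposite conversion with <code>int()</code>:

```python
year = 2017

# Adding a string to an integer raises a TypeError.
try:
    message = "The year is " + year
except TypeError as error:
    print("Needed str():", error)
    message = "The year is " + str(year)

print(message)          # The year is 2017

# Going the other way, int() turns a string of digits into a number.
print(int("2017") + 1)  # 2018
```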


These tools already provide a flexible system for string manipulation. Many of the functions that work on sequence-like data will also work on strings. For instance:

In [2]:
print(len("How long is a piece of string?"))

30


In [3]:
print(list("A list of letters"))

['A', ' ', 'l', 'i', 's', 't', ' ', 'o', 'f', ' ', 'l', 'e', 't', 't', 'e', 'r', 's']


## Special characters and escape sequences

Certain special characters are represented by an escape sequence: a backslash followed by another character, often a letter. Python considers this pairing to be a single character, even though it looks like two characters on the screen. For example, a new line is represented with <code>\n</code>.

In [4]:
print("Split a string\nonto two lines")

Split a string
onto two lines


Escape sequences also allow you to use quotation marks inside a string without Python thinking you are closing the string: write <code>\"</code> or <code>\'</code>. Finally, if you actually do want a backslash in your string, use the escape sequence <code>\\</code>.
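
A quick sketch of these escape sequences in action. The file path is a hypothetical one, chosen only because it contains backslashes:

```python
quote = "She said \"hello\" to me"
path = "C:\\temp\\notes.txt"  # hypothetical path, for illustration only

print(quote)  # She said "hello" to me
print(path)   # shows single backslashes: C:\temp\notes.txt

# Each escape sequence counts as one character.
print(len("\n"))  # 1
print(len("\\"))  # 1
```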

A full list of escape sequences can be found here http://www.techpaste.com/2014/06/escape-sequences-python/. Probably you don't know what all of these do. Neither do I. Why not try some out anyway?

## Formatting strings

Many languages, such as C and its derivatives, allow a segment of text to receive inputs using a funny looking syntax in which % signs appear everywhere. Python supports this syntax, and in Python 2 this was the usual way to format strings. In Python 3, however, we have the more powerful <code>.format()</code> method.
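
For comparison, here is a minimal sketch of the two styles side by side. The values are example data; <code>%s</code> and <code>%d</code> are the standard placeholders for a string and an integer in the older syntax.

```python
destination = "St. Ives"  # example values
wivescount = 7

# Old C-style formatting with the % operator.
old_style = "Going to %s with %d wives" % (destination, wivescount)

# The same text built with .format().
new_style = "Going to {} with {} wives".format(destination, wivescount)

print(old_style)               # Going to St. Ives with 7 wives
print(old_style == new_style)  # True
```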

Have a good look at this section, as it provides the ideal tools for giving useful, readable outputs. However, string formatting is virtually its own mini-language, and is a lot to take in at once. The important thing is to know that Python <i>can</i> do all these things. You can work out the details as and when you need them.

The most basic use of string formatting is to insert data from your program into a string. This can be any data that has a suitable string representation, such as numbers. There are many options for doing this. The first is simply by position:

In [7]:
destination = "St. Ives"
wivescount = 7
poem = "As I was going to {} I met a man with {} wives".format(destination, wivescount)
print(poem)

As I was going to St. Ives I met a man with 7 wives


Notice that we have two <code>{}</code>s and provide two arguments to the <code>format()</code> method. The arguments are inserted into the text in the order in which we provide them. We can reference the arguments more explicitly by including the position (this is useful if the same argument will appear several times):

In [12]:
poem = """As I was going to {0} I met a man with {1} wives,
those {1} wives had {1} sacks.""".format(destination, wivescount) # triple quotes allow multi-line paragraphs
print(poem)

As I was going to St. Ives I met a man with 7 wives,
those 7 wives had 7 sacks.


So here we can see the zero'th argument is referenced once; the one'th argument appears 3 times. If we don't want to worry about position, we can use keyword arguments:

In [14]:
poem = """Those {count} sacks had {count} {animals}""".format(animals="cats", count=wivescount)
print(poem)

Those 7 sacks had 7 cats


We can also pass <code>.format()</code> a data structure such as a list, and access its items by index inside the braces:

In [21]:
travellers = ["wives", "sacks", "cats", "kits"]
# access list items
poem = "{0[3]}, {0[2]}, {0[1]} and {0[0]}, how many going to {1}?".format(travellers, destination)
print(poem)

kits, cats, sacks and wives, how many going to St. Ives?
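
The same idea works for dictionaries. In this sketch the dictionary <code>sack</code> is made up for the example; note that inside the braces the keys are written without quotation marks.

```python
sack = {"animal": "cats", "count": 7}  # example data

# Dictionary keys appear unquoted inside the replacement field.
line = "Each sack had {0[count]} {0[animal]}".format(sack)
print(line)  # Each sack had 7 cats
```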


<code>.format()</code> also provides ways to represent floats and large integers.

In [24]:
large_int = 23455453424
print("There were {0:,} travellers to {1}".format(large_int, destination))
# :, adds comma as thousands separator.

There were 23,455,453,424 travellers to St. Ives


In [39]:
correct_answer = 2802
# default (6 decimal places)
rebuke = "That answer is {0:f} times too big!".format(large_int/correct_answer)
print(rebuke)
# two decimal places
rebuke = "That answer is {0:.2f} times too big!".format(large_int/correct_answer)
print(rebuke)
# two decimal places and comma separators
rebuke = "That answer is {0:,.2f} times too big!".format(large_int/correct_answer)
print(rebuke)

That answer is 8370968.388294 times too big!
That answer is 8370968.39 times too big!
That answer is 8,370,968.39 times too big!


Yet another use for string formatting is in aligning text. Text can be left-aligned, right-aligned, or centered. Let's place 3 words on separate lines, with line-width of 30 characters, each with a different alignment:

In [36]:
print( "{:<30}\n{:>30}\n{:^30}".format("left", "right", "center") )

left                          
                         right
            center            


So, this is no more mysterious than <code>"{}\n{}\n{}"</code>, except that inside each pair of braces we write a colon, then one of the characters <, > or ^ (for left, right and center alignment respectively), followed by the line width.
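
Alignment becomes really useful when lining up columns. The sketch below uses made-up items and prices, and combines a width with the <code>.2f</code> float format from earlier. It also shows that a fill character may be placed in front of the alignment character.

```python
shopping = [("bread", 1.2), ("bananas", 0.85), ("beer", 4.5)]  # made-up prices

rows = []
for item, price in shopping:
    # item: left-aligned in 10 characters
    # price: right-aligned in 8 characters, with two decimal places
    rows.append("{:<10}{:>8.2f}".format(item, price))

for row in rows:
    print(row)

# A fill character can go before the alignment character.
banner = "{:*^20}".format(" menu ")
print(banner)  # ******* menu *******
```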

## More ways to carve up strings

In the Data Structures example video, we met a method called <code>partition()</code> that splits up a string around a separator you provide. There are many, many string methods that perform similar tasks.

For example, we have splitting and joining, which allow easy conversion from lists to strings and strings to lists. A string can be split into individual words using the <code>split()</code> method:

In [40]:
listofwords = "These words will form a list".split()
print(listofwords)

['These', 'words', 'will', 'form', 'a', 'list']


<code>split()</code> also takes an optional argument to specify a different delimiter.
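
For instance, a comma-separated record can be broken into its fields like this. The record itself is invented for the example:

```python
record = "2017-03-14,St. Ives,7"  # made-up comma-separated record

fields = record.split(",")
print(fields)  # ['2017-03-14', 'St. Ives', '7']

# An optional second argument limits how many splits are made.
date_parts = fields[0].split("-", 1)
print(date_parts)  # ['2017', '03-14']
```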

The counterpart to this is <code>join()</code>, which is called on the string you wish to use as a separator, and takes a list of strings as its argument. This is a faster way of gluing together a bunch of words than repeated use of the + operator, and it can easily be combined with a list comprehension, too.

In [42]:
# just straight up join, good for making a word
# we here use join on an empty string to glue the letters directly
word = "".join(['H', 'e', 'l', 'l', 'o'])
print(word)

Hello


In [43]:
# here we use join with spaces to separate, making a sentence
reunited_words = " ".join(listofwords)
print(reunited_words)

These words will form a list


In [46]:
# more complicated example. the joining string here is a comma followed by new line
# we also use a list comprehension to capitalize each item in the list

shopping_list = ['bread', 'bananas', 'beans', 'beer']
readable_shopping = ",\n".join([item.capitalize() for item in shopping_list])
print(readable_shopping)

Bread,
Bananas,
Beans,
Beer


## Cutting off the beginning or end of a line; the string module

It is somewhat common when working with strings to wish to remove a chunk of text from the beginning or end of a line. As an example, suppose I have copied and pasted a numbered list from the internet, and I wish to remove the numbers. The trouble is, the numbers have different numbers of digits, so I can't just do a straight up slice on each line.

In [52]:
best_python_books = """1. Dive Into Python 3
2. Automate The Boring Stuff With Python
3. Python For Everyone
4. Python Cookbook, 3rd Ed
5. Python For Data Analysis
6. Fluent Python
7. Violent Python
8. Think Python
9. Learn Python The Hard Way
10. Problem Solving with Algorithms and Data Structures Using Python
11. Python Crash Course
"""

The tool that comes to our rescue is <code>.strip()</code>, which takes as its argument a collection of characters as a string. Python will remove those characters from both the beginning and the end of a string, stopping at each end when it reaches a character not contained in the argument. Its relatives <code>.lstrip()</code> and <code>.rstrip()</code> do the same at the beginning only or at the end only. If we give no argument, they just remove whitespace (spaces, tabs and newlines). To break the list into separate lines, we'll use <code>.splitlines()</code>, which is like <code>split()</code>, but splits at line breaks.

In [53]:
lines = best_python_books.splitlines()
print(lines)

['1. Dive Into Python 3', '2. Automate The Boring Stuff With Python', '3. Python For Everyone', '4. Python Cookbook, 3rd Ed', '5. Python For Data Analysis', '6. Fluent Python', '7. Violent Python', '8. Think Python', '9. Learn Python The Hard Way', '10. Problem Solving with Algorithms and Data Structures Using Python', '11. Python Crash Course']


Now we need to remove the leading characters. Python provides a useful module called <code>string</code> in its standard library, that contains lots of useful strings, as well as additional functions for working with strings. We want to remove the numbers at the start, so we can say:

In [55]:
import string
print(string.digits) # a string containing all the digits

0123456789


While this actually requires more keystrokes than simply writing the numbers <code>"0123456789"</code>, the string module contains many other collections of characters like this, such as punctuation and letters:

In [56]:
print(string.punctuation)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


In [58]:
print(string.ascii_letters)

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ


These strings are useful for checking facts about other strings. For example, this snippet will check if there is any punctuation in a string:

In [59]:
import string

def has_punc(text):
    # return True as soon as any punctuation character is found in text
    for c in string.punctuation:
        if c in text:
            return True
    return False

print( has_punc("Has no punctuation") )
print( has_punc("Has punctuation."))


False
True


Anyway, back to our problem. We want to remove the numbers, full stops and whitespace from the start of each string in the list. Here we go:

In [64]:
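chars_to_remove = ". " + string.digits # make a string containing all bad chars

# lstrip() only strips from the start of each line;
# strip() would also eat the trailing "3" in "Dive Into Python 3"
nice_list = [book.lstrip(chars_to_remove) for book in lines]

readable_list = "\n".join(nice_list)

print(readable_list)

Dive Into Python 3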
Automate The Boring Stuff With Python
Python For Everyone
Python Cookbook, 3rd Ed
Python For Data Analysis
Fluent Python
Violent Python
Think Python
Learn Python The Hard Way
Problem Solving with Algorithms and Data Structures Using Python
Python Crash Course


Clearly, to use <code>strip()</code> and its relatives, we have to be pretty confident about the format of our data. If I had a book called "20 Cool Python Programs" in the list, then the "20" part would have been stripped out as well.
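
One possible way around this, sketched here under the assumption that every line really does begin with a number followed by a full stop and a space, is to cut each line at the first ". " using <code>partition()</code>. The last entry in the list is the hypothetical book mentioned above, given a made-up number.

```python
numbered = [
    "1. Dive Into Python 3",
    "11. Python Crash Course",
    "12. 20 Cool Python Programs",  # hypothetical entry
]

titles = []
for line in numbered:
    # partition() cuts at the first ". " only,
    # giving (text before, the separator, text after)
    number, separator, title = line.partition(". ")
    titles.append(title)

print(titles)
# ['Dive Into Python 3', 'Python Crash Course', '20 Cool Python Programs']
```

If a line did not contain the separator at all, <code>title</code> would come back empty, so this approach also relies on knowing the format of the data.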

## Some quick transformations of strings

The developers of Python kindly include many single-word methods for making quick adjustments to strings. We just demonstrate a bunch of them here; their functioning should be self-explanatory:



In [68]:
my_string = "Some advanced string theory"

print(my_string.capitalize())
print(my_string.lower())
print(my_string.upper())
print(my_string.swapcase())
print(my_string.title())
print(my_string.replace("advanced", "basic"))

Some advanced string theory
some advanced string theory
SOME ADVANCED STRING THEORY
sOME ADVANCED STRING THEORY
Some Advanced String Theory
Some basic string theory


You can read more about how to use strings here
https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str
and about the string module here
https://docs.python.org/3/library/string.html