# Python Strings

## Working with Strings

A *string* variable in Python can be created with the assignment operator *=* and by enclosing the text in either single or double quotes. Either is fine, but the start and end quotes must be the same!

In [1]:
cityname = "Dublin"
address = 'University College Dublin, Belfield'

Strings that span multiple lines can be defined in several different ways. One way is to include a backslash character \ as the last character on the line, which continues the string on the next line without adding a line break to it:

In [2]:
s1 = "This is all \
part of the same string"

Alternatively, we can use triple quotes (""" or ''') to enclose a multi-line string. Here the line breaks become part of the string:

In [3]:
s2 = """
This also has multiple lines
defined in a different way
"""

Strings can be viewed as lists of individual characters (although, unlike lists, they cannot be modified in place). We can apply many standard list operations and functions to strings, such as accessing individual characters by index or slicing.

In [4]:
cityname[0]

'D'

In [5]:
address[0:3]

'Uni'
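
A minimal sketch of a few more indexing and slicing forms, reusing the `cityname` value from above. The other variable names are illustrative only. The last part shows that item assignment, which works on a list, is rejected for a string:

```python
cityname = "Dublin"

last_char = cityname[-1]     # negative indices count from the end
first_three = cityname[:3]   # omitting the start means "from the beginning"
rest = cityname[3:]          # omitting the end means "up to the end"

# Unlike lists, strings are immutable: item assignment raises a TypeError
try:
    cityname[0] = "d"
    changed = True
except TypeError:
    changed = False

print(last_char, first_three, rest, changed)
```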

We can use the built-in *len()* function to check the length of a string (i.e. how many characters it contains):

In [6]:
len(cityname)

6

Strings can be concatenated using the + operator:

In [7]:
address + ", " + cityname

'University College Dublin, Belfield, Dublin'

**Special characters**: Backslashes are used to introduce special characters. This use of backslashes is referred to as an *escape sequence*. For instance '\t' represents a tab character and '\n' represents a newline character.

In [8]:
s = "\tIreland\n\tGermany"

In [9]:
print(s)

	Ireland
	Germany


## String Functions

Strings have a range of associated functions to perform basic operations on them. Note: These functions return a modified copy of the string; they do not change the original!

For example, you can change the case of the characters in a string:

In [10]:
country = "Ireland"
country.upper()  # make a copy of country, but in upper case characters

'IRELAND'

In [11]:
country.lower() # make a copy of country, but in lower case characters

'ireland'

We can remove leading and trailing whitespace characters (spaces, tabs and line breaks) from strings with the *strip()* function:

In [12]:
s = "Hello World   "
s.strip()

'Hello World'
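
As a small sketch of how *strip()* treats each end of a string, the example below also uses the related built-in *lstrip()* and *rstrip()* functions. The variable names are illustrative, and `repr()` is used only to make the remaining whitespace visible:

```python
padded = "  \tHello World \n"

both = padded.strip()     # removes whitespace from both ends
left = padded.lstrip()    # removes leading whitespace only
right = padded.rstrip()   # removes trailing whitespace only

print(repr(both))
print(repr(left))
print(repr(right))
```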

Strings have simple functions for finding and replacing characters and substrings (i.e. strings contained within other strings). We can search for the index (position) of a substring within a longer string using the *find()* function:

In [13]:
s.find("World")  # what index does that substring 'World' start at?

6

If we try to find a substring that does not exist within another string, we get an index of -1.

In [14]:
s.find("bye")  

-1
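
Because a match at the very start of a string has index 0, the result of *find()* should be compared against -1 explicitly rather than tested as true or false. A minimal sketch, where `describe` is a hypothetical helper written for this example:

```python
greeting = "Hello World"

def describe(text, sub):
    """Hypothetical helper: report where sub occurs in text, if at all."""
    position = text.find(sub)
    if position == -1:            # -1 means "not found"
        return f"'{sub}' not found"
    return f"'{sub}' starts at index {position}"

print(describe(greeting, "World"))
print(describe(greeting, "Hello"))   # index 0 is a valid match
print(describe(greeting, "bye"))

# If only a yes/no answer is needed, the 'in' operator is simpler
contains_world = "World" in greeting
```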

We can count the number of times a character or substring occurs in a longer string using the *count()* function:

In [15]:
s.count("o")

2

We can also replace characters or substrings in another string. Note: The *replace()* function makes a copy of the string; it does not change the original string.

In [16]:
rep = s.replace("l","z")
print( s )    # original string
print( rep )  # the new copy, where replacements were made

Hello World   
Hezzo Worzd   


We can separate a single string into a list of one or more strings based on a *delimiter* (a separator character or string) using the *split()* function:

In [17]:
names = "lisa;john;alex;alice"
names.split(";")  # split this string based on the ; character

['lisa', 'john', 'alex', 'alice']

In [18]:
words = "the python programming language"
words.split(" ")  # split the string based on the space character

['the', 'python', 'programming', 'language']
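
Splitting on a single space can give surprising results when the text contains repeated spaces. The sketch below compares it with calling *split()* with no argument, which splits on runs of whitespace. The `messy` string is made up for illustration:

```python
messy = "the  python   programming language"   # note the repeated spaces

by_space = messy.split(" ")    # every single space is a delimiter
by_whitespace = messy.split()  # any run of whitespace is one delimiter

print(by_space)
print(by_whitespace)
```

With an explicit `" "` delimiter, each extra space produces an empty string in the result.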

In reverse, we can merge a list containing multiple strings into a single string using the *join()* function, where the values from the list are separated by a specified character or string.

In [19]:
l = ["dublin","cork","galway"]
" & ".join( l )   # note, the function is called on the separator string!

'dublin & cork & galway'

In [20]:
initials = ["J", "R", "R"]
".".join( initials )

'J.R.R'
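
Since *split()* and *join()* are inverse operations, they can be combined to change the delimiter of a string. A short sketch reusing the `names` value from above; the other variable names are illustrative:

```python
names = "lisa;john;alex;alice"

parts = names.split(";")            # string -> list of strings
comma_separated = ", ".join(parts)  # list -> string with a new separator
restored = ";".join(parts)          # joining with the old separator undoes the split

print(comma_separated)
print(restored == names)
```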

## Converting Between Types

Recall that mixing incompatible types is not permitted in Python, so trying to concatenate a string and a number will give an error message.

Instead, we use conversion functions to change a value between basic types in Python. Use the built-in *str()* function to convert any variable to a string:

In [21]:
# convert an integer to a string
str(30)

'30'
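
The following sketch shows the error mentioned above and how *str()* avoids it. The `age` value and the message text are made up for illustration:

```python
age = 30

try:
    message = "Age: " + age      # string + integer is not allowed
except TypeError as err:
    message = None
    print("TypeError:", err)

fixed = "Age: " + str(age)       # convert the number first, then concatenate
print(fixed)
```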

We can convert string values to other types using various built-in functions - most commonly *int()* to convert to an integer and *float()* to convert to a floating point (real) value:

In [22]:
int("3500")

3500

In [23]:
float("5.04")

5.04

Not all strings are suitable for conversion to a number. Attempting to convert an unsuitable string results in an error message:

In [24]:
int("UCD")

ValueError: invalid literal for int() with base 10: 'UCD'
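
When the input may not be a valid number, the error can be caught rather than allowed to stop the program. A minimal sketch, where `to_int` and its `default` parameter are hypothetical names chosen for this example:

```python
def to_int(text, default=None):
    """Hypothetical helper: convert text to an integer, or return default."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError when the text is not a valid integer
        return default

print(to_int("3500"))
print(to_int("UCD"))
print(to_int("UCD", default=0))
```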

## Formatting Strings

Python contains a number of different approaches for formatting strings. In this notebook we discuss the traditional approach from Python 2, and two newer approaches introduced in more recent versions of Python.

### Traditional String Formatting

The traditional type of string formatting makes use of the **%-operator**. This approach was common in Python 2 and is still supported in Python 3.

With this approach, special placeholder codes are used when building a format string. Each placeholder should correspond to the type of the value that will replace it. For numbers, we can use *format specifiers* to change the precision:
- %d: integer
- %f: floating point, with default precision
- %.Nf: floating point to *N* decimal places
- %s: a string (or any value)

In [25]:
"%s and %s and %s" % ("one", "two", "three")

'one and two and three'

In [26]:
"%d and %d and %d" % (1, 2, 3)

'1 and 2 and 3'

In [27]:
"%.2f and %.2f and %.2f" % (1, 2, 3)

'1.00 and 2.00 and 3.00'

In [28]:
title = "News of the World"
lead_actor = "Tom Hanks"
year = 2020
rating = 6.879
"The %d movie %s starring %s has an IMDB rating of %.1f" % (year, title, lead_actor, rating)

'The 2020 movie News of the World starring Tom Hanks has an IMDB rating of 6.9'

In [29]:
scores = [0.8914, 0.9142, 0.0343, 0.2313]
print("%.1f, %.1f, %.1f, %.1f" % (scores[0], scores[1], scores[2], scores[3]))
print("%.2f, %.2f, %.2f, %.2f" % (scores[0], scores[1], scores[2], scores[3]))
# add zeros in front
print("%08.3f, %08.3f, %08.3f, %08.3f" % (scores[0], scores[1], scores[2], scores[3]))
# add spaces in front
print("%8.3f, %8.3f, %8.3f, %8.3f" % (scores[0], scores[1], scores[2], scores[3]))

0.9, 0.9, 0.0, 0.2
0.89, 0.91, 0.03, 0.23
0000.891, 0000.914, 0000.034, 0000.231
   0.891,    0.914,    0.034,    0.231
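
When the values are already stored in a list, as with `scores` above, the repeated indexing can be avoided. The sketch below shows two possible alternatives: converting the list to a tuple for the %-operator, or formatting each value separately and joining the results. The names `line1` and `line2` are illustrative:

```python
scores = [0.8914, 0.9142, 0.0343, 0.2313]

# The %-operator takes its values from a tuple, so convert the list first
line1 = "%.2f, %.2f, %.2f, %.2f" % tuple(scores)

# Formatting values one at a time works for a list of any length
line2 = ", ".join("%.2f" % value for value in scores)

print(line1)
print(line2)
```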


### Formatting with str.format()

However, once you start using several parameters and longer strings, the previous approach becomes difficult to read.

Python 3 introduced a new approach to string formatting (also made available in Python 2.6) that removes the need for the %-operator. Formatting is instead handled by calling the **.format()** function on a string value.

For a comprehensive set of examples for this formatting syntax, see https://pyformat.info/

With *str.format()*, the placeholder fields are marked by curly brackets:

In [30]:
name = "Alice"
"Hello, {}".format(name)

'Hello, Alice'

In [31]:
s = "{} and {} and {}"
s.format("one", "two", "three")

'one and two and three'

In [32]:
s.format(1, 2, 3)

'1 and 2 and 3'

We can use the arguments in any order by referring to their positional index:

In [33]:
name, age = "Bob", 25
when = "today"
"{2} is {1} years old {0}.".format(when, age, name)

'Bob is 25 years old today.'

We can also add named placeholders:

In [34]:
'Capital of {country}: {city}, {country}'.format(city='Dublin', country='Ireland')

'Capital of Ireland: Dublin, Ireland'

We can format or pad numbers as with the previous approach, using placeholders and *format specifiers*:

In [35]:
'{:d}'.format(42)

'42'

In [36]:
# floating point, 4 decimal places
'{:.4f}'.format(42)

'42.0000'

In [37]:
x = 13.543646
# format to 2 decimal places
print("{:.2f}".format(x))
# output is padded with leading zeros to at least 6 characters, with 2 decimal places
print("{:06.2f}".format(x))

13.54
013.54


We can left, right or centre align strings:

In [38]:
# align left
"{:12}".format("Dublin")

'Dublin      '

In [39]:
# align right
"{:>12}".format("Dublin")

'      Dublin'

In [40]:
# centre the string
"{:^12}".format("Dublin")

'   Dublin   '
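
Alignment is mostly useful for lining up columns of text. The sketch below builds a small two-column table from the city list used earlier, pairing each name with its length purely for illustration. It also shows the explicit `<` left-align marker and a custom fill character:

```python
cities = ["dublin", "cork", "galway"]

lines = []
for city in cities:
    # name left-aligned in 10 characters, number right-aligned in 3 characters
    lines.append("{:<10}|{:>3}".format(city, len(city)))

for line in lines:
    print(line)

# A fill character can be placed before the alignment marker
banner = "{:*^12}".format("Dublin")
print(banner)
```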

We can also truncate long strings to make them shorter:

In [41]:
"Word: {:.5}".format("xylophone")

'Word: xylop'

### Formatting with F-Strings

A third approach to string formatting, called formatted string literals or *f-strings*, was introduced in Python 3.6.

Here we prefix the formatting string with 'f':

In [42]:
name = "Alice"
f"Hello, {name}!"

'Hello, Alice!'

In [43]:
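# be careful with quote characters: inside the braces, use a different quote type from the one enclosing the f-string (required before Python 3.12)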
f"{'one'} and {'two'} and {'three'}"

'one and two and three'

We can format numbers like in the two previous formatting methods, using format specifiers:

In [44]:
x = 0.234535
# 2 decimal places
print( f"{x:.2f}" )
# padded with leading zeros to at least 8 characters, with 3 decimal places
print( f"{x:08.3f}" )
# percentage with 1 decimal place
print( f"{x:.1%}" )

0.23
0000.235
23.5%


We can reference existing variables in an f-string and perform operations on them:

In [45]:
a = 6
b = 10
f"Six plus ten is {a + b} and not {2 * (a + b)}"

'Six plus ten is 16 and not 32'

To make curly brackets appear in a string, we need to use double braces:

In [46]:
f"{{{50 * 5}}}"

'{250}'

F-strings are useful when working with dictionaries, where we can specify key names:

In [47]:
person = {'name': 'Lionel Messi', 'year': 1987, 'country' : 'Argentina'}
f"The football player {person['name']} was born in {person['country']} in {person['year']}."

'The football player Lionel Messi was born in Argentina in 1987.'
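
To compare the three approaches covered in this section, the sketch below builds the same sentence with each of them, reusing the `name` and `age` values from an earlier example. The result variable names are illustrative:

```python
name, age = "Bob", 25

percent_style = "%s is %d years old." % (name, age)      # %-operator
format_style = "{} is {} years old.".format(name, age)   # str.format()
fstring_style = f"{name} is {age} years old."            # f-string

print(percent_style)
print(format_style)
print(fstring_style)
```

All three produce the same text; the f-string keeps each value next to the place where it appears in the output.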