# Strings

This section covers Python strings and string formatting. String literals can be written with single, double, or triple quotes.

In [3]:
my_string_single = 'Hello world'
my_string_double = "Hello again world"
my_multi_line_string = '''This is a string
over two lines'''

We can see that string variables are of type `str`...

In [4]:
print(str(type(my_multi_line_string))[1:-1])

class 'str'


Quotes can be nested by using one quote style inside the other

In [20]:
my_nested_string = "my nested 'string'"
print(my_nested_string)
my_nested_string = 'my nested "string"'
print(my_nested_string)


my nested 'string'
my nested "string"


We can concatenate strings using the `+` operator...

In [5]:
print("You can " + "concatenate" + " strings like this")
print('You can ' + "mix" + ' quotes but not recommended')

You can concatenate strings like this
You can mix quotes but not recommended


String repetition can be achieved using the `*` operator

In [6]:
bob = "again.." * 5
print(bob)

again..again..again..again..again..


Strings can include escape characters...

In [8]:
print("This is a\n string with the '\\n' escape character")
print('This works for single\n quoted strings as well')

This is a
 string with the '\n' escape character
This works for single
 quoted strings as well
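
A few other common escape sequences, as a small sketch. The variable names are illustrative and not part of the original notebook.

```python
# Illustrative names; each escape sequence below produces a single character.
tabbed = "name\tage"          # \t is a horizontal tab
quoted = 'It\'s escaped'      # \' embeds a single quote in a single-quoted string
windows_path = "C:\\temp"     # \\ is one literal backslash

print(tabbed.split("\t"))
print(quoted)
print(windows_path, len(windows_path))
```

The length of `windows_path` counts the doubled backslash once, because `\\` is stored as a single character.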


Strings have methods, and you can call a method directly on a literal...

In [9]:
print("capitalise this sentence with a method.".capitalize())
print("Concatenate using join: ", " and ".join(('Bob', 'Sue', 'Sam')))

Capitalise this sentence with a method.
Concatenate using join:  Bob and Sue and Sam
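
As a further sketch, a few other string methods can be chained together. The sample data and variable names are assumptions made for illustration. Note that `join` only accepts strings, so other types need converting first.

```python
# Hypothetical input with stray whitespace around a comma-separated list.
raw_names = "  Bob, Sue, Sam  "

names = raw_names.strip().split(", ")     # trim the ends, then split into a list
joined = " and ".join(names)              # the inverse of split
replaced = joined.replace("and", "&")     # returns a new string; str is immutable

# join needs strings, so convert the integers first
numbers_joined = ", ".join(str(n) for n in [1, 2, 3])

print(names)
print(joined)
print(replaced)
print(numbers_joined)
```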


`print` accepts several arguments and separates them with a single space by default, so no trailing space is needed in the string

In [10]:
my_integer = 10
print("Integer value is", my_integer)

Integer value is 10
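
The separator and line ending can be changed with the `sep` and `end` keyword arguments of `print`. The sketch below writes to an in-memory buffer (an assumption made only so the result can be inspected as a string) and then prints what was captured.

```python
import io

buffer = io.StringIO()

print("Integer value is", 10, file=buffer)               # default: one space between arguments
print("Integer value is", 10, sep="", file=buffer)       # no separator at all
print("2024", "01", "31", sep="-", end="!\n", file=buffer)  # custom separator and line ending

captured = buffer.getvalue()
print(captured, end="")
```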


String formatting using the `%` operator or `format()` method

In [11]:
print("Integer value = %d" % my_integer)
print("Integer value = {0}".format(my_integer))        # using format method
print("Integer value in hexadecimal = 0x{bob:X}".format(bob=my_integer))
big_number = 100000000
print("Big number in thousands = {0:,}".format(big_number))

Integer value = 10
Integer value = 10
Integer value in hexadecimal = 0xA
Big number in thousands = 100,000,000
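
The format specification after the colon also controls width, alignment, padding and precision. A minimal sketch with made-up values:

```python
value = 3.14159   # illustrative value, not from the original notebook

padded = "{:>8.2f}".format(value)     # right-aligned in 8 characters, 2 decimal places
zero_filled = "{:05d}".format(42)     # zero-padded to 5 digits
percent = "{:.1%}".format(0.256)      # multiplied by 100 and shown with a % sign
old_style = "%05.1f" % value          # the % operator supports similar specifiers

print(repr(padded))
print(zero_filled)
print(percent)
print(old_style)
```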


Remember that formatting is a feature of strings, not of `print`

In [10]:
formatted_string = "with formatted string the integer value is %d" % my_integer
print(formatted_string)

with formatted string the integer value is 10


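Raw strings use the `r` prefix, which stops backslashes being treated as escape sequences. Below we compare the raw string `r"\n"` (a backslash followed by the letter `n`) with the normal string `"\n"` (a single ASCII line feed).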

In [12]:
def print_hex(my_string):
    """Print a string in hexadecimal."""
    print(":".join("{:02x}".format(ord(c)) for c in my_string))

raw_string = r"\n"
normal_string = "\n"
print_hex(raw_string)
print_hex(normal_string)

5c:6e
0a
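
Raw strings are commonly used for regular expressions, where backslashes would otherwise need doubling. The following sketch uses the standard `re` module; the sample sentence is invented for illustration.

```python
import re

raw_pattern = r"\d+"        # backslash kept as-is
normal_pattern = "\\d+"     # the same two characters, written with an escaped backslash

same_pattern = raw_pattern == normal_pattern
numbers = re.findall(raw_pattern, "order 66 shipped in 3 days")

print(same_pattern)
print(numbers)
print(len(r"\n"), len("\n"))   # raw: two characters; normal: one line feed
```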


In Python 3 all strings are Unicode by default, so the `u` prefix required in Python 2 is no longer needed

In [25]:
my_string = 'A unicode # string'
my_string_with_unicode_chars = 'A unicode \u018e string'
print(my_string)
print(my_string_with_unicode_chars)

A unicode # string
A unicode Ǝ string


The two strings are the same length, because each Unicode code point counts as a single character

In [24]:
print("my_string length = %d" % len(my_string))
print("my_string_with_unicode_chars length = %d" % len(my_string_with_unicode_chars))

my_string length = 18
my_string_with_unicode_chars length = 18


The `isascii()` method is new in Python 3.7

In [18]:
print("Is my_string ASCII: %r" % my_string.isascii())
print("Is my_string_with_unicode_chars ASCII: %r" % my_string_with_unicode_chars.isascii())

Is my_string ASCII: True
Is my_string_with_unicode_chars ASCII: False


Encoding to UTF-8 converts a string into a sequence of bytes, hence the `b` prefix when printed to the console. In this case the input string is ASCII, so the UTF-8 mapping is one to one.

In [20]:
utf8_encoded = my_string.encode('utf-8')
print(utf8_encoded)
print(str(type(utf8_encoded))[1:-1])

b'A unicode # string'
class 'bytes'


See https://en.wikipedia.org/wiki/UTF-8 for how UTF-8 encodes non-ASCII characters, that is, Unicode code points of 128 and above. Here we see the conversion of Unicode code points to UTF-8.

In [26]:
utf8_encoded = my_string_with_unicode_chars.encode('utf-8')
# here you see the large Unicode value 0x018e being encoded as 2 bytes \xc6\x8e
print(utf8_encoded)

b'A unicode \xc6\x8e string'
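
UTF-8 is a variable-width encoding, so the number of bytes depends on the code point. A short sketch with sample characters chosen purely for illustration:

```python
# One character from each UTF-8 width: 1, 2, 3 and 4 bytes.
samples = ["a", "\u018e", "\u20ac", "\U0001f600"]

byte_lengths = {
    "U+{:04X}".format(ord(ch)): len(ch.encode("utf-8"))
    for ch in samples
}

for code_point, n_bytes in byte_lengths.items():
    print(code_point, "->", n_bytes, "byte(s)")

# Character count and byte count differ once non-ASCII characters appear.
text = "A unicode \u018e string"
print(len(text), len(text.encode("utf-8")))
```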


Decoding the UTF-8 bytes back to a Unicode string

In [27]:
recovered_ustring = utf8_encoded.decode('utf-8')
print(recovered_ustring)
print(str(type(recovered_ustring))[1:-1])

A unicode Ǝ string
class 'str'
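
Decoding only round-trips when the same encoding is used in both directions. The sketch below shows what can happen when the bytes are decoded with the wrong codec, and how the `errors` argument of `decode` changes that behaviour. The variable names are illustrative.

```python
data = "A unicode \u018e string".encode("utf-8")

# The bytes 0xc6 and 0x8e are not valid ASCII, so a strict decode fails.
try:
    data.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = data.decode("ascii", errors="replace")   # each bad byte becomes U+FFFD
ignored = data.decode("ascii", errors="ignore")     # bad bytes are dropped

print(strict_failed)
print(replaced)
print(ignored)
```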


Converting between characters and their integer code points using `ord` and `chr`

In [28]:
ord_value = ord('a')
print(ord_value)
print(str(type(ord_value))[1:-1])
ord_value = ord('\u018e')
print(ord_value)
print(str(type(ord_value))[1:-1])

# and back again using chr
print(chr(ord_value))

97
class 'int'
398
class 'int'
Ǝ
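
Because `ord` and `chr` are inverses, a string can be turned into a list of code points and rebuilt from it. A minimal sketch with illustrative names:

```python
text = "a\u018e"

code_points = [ord(ch) for ch in text]                 # one integer per character
labels = ["U+{:04X}".format(n) for n in code_points]   # conventional U+XXXX notation
rebuilt = "".join(chr(n) for n in code_points)         # back to the original string

next_letter = chr(ord("a") + 1)   # arithmetic on code points

print(code_points)
print(labels)
print(rebuilt == text)
print(next_letter)
```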


f-strings are available from Python 3.6 onwards. You can use the same format specifications as with `format`, and you can even call functions within the f-string braces.

In [31]:
my_integer = 10
my_list = ['bob', 'dave']
my_fstring = f"integer = {my_integer} and my_list is {my_list}"     # uses the f-prefix
print("my_fstring:", my_fstring)
my_fstring = f"my_integer**2 = {my_integer**2} and max(my_list) is {max(my_list)}"     # uses the f-prefix
print("my_fstring:", my_fstring)
print(f"my_integer = 0x{my_integer:02X}")

my_fstring: integer = 10 and my_list is ['bob', 'dave']
my_fstring: my_integer**2 = 100 and max(my_list) is dave
my_integer = 0x0A
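
A few more f-string features, sketched with made-up values: the `!r` conversion, numeric format specifications, alignment, and method calls inside the braces.

```python
name = "bob"      # illustrative values, not from the original notebook
price = 1234.5

line_repr = f"{name!r}"                  # !r applies repr(), so the quotes are shown
line_money = f"{price:,.2f}"             # thousands separator and 2 decimal places
line_padded = f"[{name:>6}]"             # right-aligned in a field of width 6
line_call = f"{name.upper()} has {len(name)} letters"

print(line_repr)
print(line_money)
print(line_padded)
print(line_call)
```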
