# Strings

An overview of Python strings and string formatting. String literals can be written with single, double or triple quotes.

In [1]:
my_string_single = 'Hello world'
my_string_double = "Hello again world"
my_multi_line_string = '''This is a string
over two lines'''
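A small sketch of what the triple-quoted literal actually stores: the line break becomes an ordinary `'\n'` character, so it should compare equal to a single-line literal with an explicit escape. The names `multi_line` and `single_line` are only for illustration.

```python
# A triple-quoted literal keeps the line break as a '\n' character.
multi_line = '''This is a string
over two lines'''
single_line = 'This is a string\nover two lines'

print(multi_line == single_line)   # True
print(multi_line.splitlines())     # ['This is a string', 'over two lines']
```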

We can see that string variables are of type 'str'...

In [2]:
print(type(my_string_single))

<class 'str'>


Quotes within quotes

In [20]:
my_nested_string = "my nested 'string'"
print(my_nested_string)
my_nested_string = 'my nested "string"'
print(my_nested_string)


my nested 'string'
my nested "string"
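When a string needs both kinds of quote, one option is to escape the quote that matches the delimiter with a backslash. A minimal sketch, with illustrative variable names:

```python
# Escape the quote character that matches the string delimiter.
escaped_single = 'it\'s a "quoted" word'
escaped_double = "it's a \"quoted\" word"

print(escaped_single)
print(escaped_single == escaped_double)   # True, both hold the same characters
```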


We can concatenate strings using the '+' operator...

In [4]:
print("you can " + "concatenate" + " strings like this")
print('you can ' + "mix" + ' quotes but not recommended')

you can concatenate strings like this
you can mix quotes but not recommended
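Note that `+` only joins strings to strings. The sketch below (using a made-up `count` value) shows the `TypeError` raised when an integer is added to a string, and the usual fix of converting with `str()` first.

```python
count = 3  # hypothetical value for illustration

try:
    message = "count is " + count        # str + int raises TypeError
except TypeError:
    message = "count is " + str(count)   # convert explicitly before concatenating

print(message)   # count is 3
```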


Repetition with the '*' operator...

In [5]:
bob = "again.." * 5
print(bob)

again..again..again..again..again..


Escape characters...

In [6]:
print("this is a \n string with the '\\n' escape character")
print('this works for single\n quoted strings as well')

this is a 
 string with the '\n' escape character
this works for single
 quoted strings as well
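A few other common escapes, as a quick sketch. The last line shows that an escape sequence is a single character, while an escaped backslash followed by `n` is two.

```python
print("tab\tseparated")        # \t is a tab character
print("a backslash: \\")       # \\ is one literal backslash
print(len("\n"), len("\\n"))   # 1 2
```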


Strings have methods, and you can apply a method directly to a literal...

In [7]:
print("capitalise this sentence with a method.".capitalize())
print("concatenate using ".join("the join method"))  # the separator is placed between every character of the argument

Capitalise this sentence with a method.
tconcatenate using hconcatenate using econcatenate using  concatenate using jconcatenate using oconcatenate using iconcatenate using nconcatenate using  concatenate using mconcatenate using econcatenate using tconcatenate using hconcatenate using oconcatenate using d
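The output above looks odd because `join()` treats its argument as an iterable, and iterating over a string yields single characters. The more typical use is to call `join()` on the separator and pass a list of strings. A minimal sketch, with an illustrative `words` list:

```python
words = ["the", "join", "method"]

# The string the method is called on is the separator.
joined = " ".join(words)
print(joined)   # the join method

# Non-string items must be converted first.
numbers = ", ".join(str(n) for n in [1, 2, 3])
print(numbers)  # 1, 2, 3
```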


Passing several arguments to print. The arguments are separated by a space automatically, so no trailing space is needed in the string.

In [8]:
my_integer = 10
print("integer value is", my_integer)

integer value is 10
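The separator and the line ending used by `print` can be changed with its `sep` and `end` keyword arguments. A short sketch:

```python
my_integer = 10

print("integer value is", my_integer, sep="")   # integer value is10
print("a", "b", "c", sep="-", end="!\n")        # a-b-c!
```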


String formatting using the `%` operator or the `format()` method

In [9]:
print("integer value is %d" % my_integer)
print("integer value is {0}".format(my_integer))        # using format method
print("integer value in hex 0x{bob:X}".format(bob=my_integer))
big_number = 100000000
print("big number in thousands {0:,}".format(big_number))


integer value is 10
integer value is 10
integer value in hex 0xA
big number in thousands 100,000,000
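The format specification after the colon also controls precision, width and alignment. A few illustrative cases follow; `pi_estimate` is a made-up value.

```python
pi_estimate = 3.14159  # hypothetical value for illustration

two_places = "{:.2f}".format(pi_estimate)     # fixed point, 2 decimal places
right_aligned = "{:>8}|".format("right")      # right-align in a field 8 wide
zero_padded = "{:08.3f}".format(pi_estimate)  # total width 8, padded with zeros
percent_style = "%5d|%-5d|" % (42, 42)        # % operator: right- then left-aligned

print(two_places)      # 3.14
print(right_aligned)   #    right|
print(zero_padded)     # 0003.142
print(percent_style)   #    42|42   |
```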


Remember that formatting is a feature of the string itself, not of print

In [10]:
formatted_string = "with formatted string the integer value is %d" % my_integer
print(formatted_string)

with formatted string the integer value is 10


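Raw strings use the 'r' prefix, which stops backslashes being treated as escape characters. Below we compare the hex values of the raw string r"\n" (a backslash followed by the letter n) with those of the normal string "\n" (a single ASCII line feed).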

In [11]:
def print_hex(my_string):
    """Print a string in hexadecimal."""
    print(":".join("{:02x}".format(ord(c)) for c in my_string))

raw_string = r"\n"
normal_string = "\n"
print_hex(raw_string)
print_hex(normal_string)


5c:6e
0a
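Raw strings are most often seen with Windows-style paths and regular expressions, where backslashes are frequent. A sketch using a made-up path and the standard `re` module:

```python
import re

# Hypothetical path: without the r prefix, \n and \t would become control characters.
windows_path = r"C:\new_folder\table.txt"
print(windows_path)

print(len(r"\t"), len("\t"))   # 2 1

# In a regular expression, r"\d+" passes the backslash through to the re module.
match = re.search(r"\d+", "order 66")
print(match.group())           # 66
```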


In Python 3 all strings are Unicode by default, so the u prefix is not needed

In [12]:
my_string = 'A unicode # string'
my_ustring = 'A unicode \u018e string'

The two strings have the same length, because each Unicode code point counts as a single character

In [13]:
print("my_string length = %d" % len(my_string))
print("my_ustring length = %d" % len(my_ustring))

my_string length = 18
my_ustring length = 18


`isascii()` is new in Python 3.7

In [14]:
print("is my_string ascii: %r" % my_string.isascii())
print("is my_ustring ascii: %r" % my_ustring.isascii())

is my_string ascii: True
is my_ustring ascii: False


Encoding to UTF-8 converts a string into a sequence of bytes, hence the b prefix when printing. In this case the string is pure ASCII, so each character maps to exactly one byte.

In [15]:
utf8_encoded = my_string.encode('utf-8')
print(type(utf8_encoded))

<class 'bytes'>


See https://en.wikipedia.org/wiki/UTF-8 for how UTF-8 encodes non-ASCII characters, that is, Unicode code points above 127. Here we see the conversion of Unicode code points to UTF-8

In [16]:
utf8_encoded = my_ustring.encode('utf-8')
# here you see the code point 0x018e (above the ASCII range) being encoded as 2 bytes \xc6\x8e
print(utf8_encoded)

b'A unicode \xc6\x8e string'
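The number of bytes per character grows with the code point. As a rough illustration, the sketch below compares the character count with the encoded byte count; the euro sign and emoji are extra examples not used elsewhere in this notebook.

```python
my_ustring = 'A unicode \u018e string'

print(len(my_ustring))                    # 18 characters
print(len(my_ustring.encode('utf-8')))    # 19 bytes, U+018E takes 2 bytes

euro_bytes = '\u20ac'.encode('utf-8')       # EURO SIGN
emoji_bytes = '\U0001F600'.encode('utf-8')  # GRINNING FACE
print(euro_bytes, len(euro_bytes))        # b'\xe2\x82\xac' 3
print(emoji_bytes, len(emoji_bytes))      # b'\xf0\x9f\x98\x80' 4
```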


Decoding the UTF-8 bytes back to a Unicode string

In [17]:
recovered_ustring = utf8_encoded.decode('utf-8')
print(recovered_ustring)
print(type(recovered_ustring))

A unicode Ǝ string
<class 'str'>
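Decoding only works if the codec matches the bytes. A sketch of what happens when the UTF-8 bytes are decoded as ASCII, and how the `errors` argument of `decode()` can substitute a replacement character instead of raising:

```python
utf8_encoded = 'A unicode \u018e string'.encode('utf-8')

try:
    utf8_encoded.decode('ascii')           # bytes above 0x7f are not valid ASCII
except UnicodeDecodeError as error:
    print("decode failed:", error.reason)

# errors='replace' swaps each undecodable byte for U+FFFD instead of raising.
replaced = utf8_encoded.decode('ascii', errors='replace')
print(replaced)
```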


Character conversions...

In [18]:
ord_value = ord('a')
print(ord_value)
print(type(ord_value))
ord_value = ord('\u018e')
print(ord_value)
print(type(ord_value))

# and back again using chr
print(chr(ord_value))

97
<class 'int'>
398
<class 'int'>
Ǝ
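`ord()` and `chr()` are inverses of each other, which the short sketch below checks for one ASCII and one non-ASCII character. Because code points are plain integers, arithmetic on them also works.

```python
for character in "a\u018e":
    code_point = ord(character)
    # chr() should give back the original character
    print(character, code_point, hex(code_point), chr(code_point) == character)

next_letter = chr(ord('a') + 1)
print(next_letter)   # b
```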


f-strings are available in Python 3.6 and later

In [19]:
my_integer = 10
my_list = ['bob', 'dave']
my_fstring = f"integer = {my_integer} and my_list is {my_list}"     # uses the f-prefix
print("my_fstring:", my_fstring)
print("you can call functions in f-strings")
my_fstring = f"integer**2 = {my_integer**2} and max(my_list) is {max(my_list)}"     # uses the f-prefix
print("my_fstring:", my_fstring)
print("use same formatting as with format")
print(f"integer = 0x{my_integer:02X}")

my_fstring: integer = 10 and my_list is ['bob', 'dave']
you can call functions in f-strings
my_fstring: integer**2 = 100 and max(my_list) is dave
use same formatting as with format
integer = 0x0A
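A few more format specifications inside f-strings, as a sketch. They follow the same mini-language as `format()`; `ratio` is a made-up value.

```python
big_number = 100000000
ratio = 0.256  # hypothetical value for illustration

with_commas = f"{big_number:,}"                # thousands separator
as_percent = f"{ratio:.1%}"                    # multiply by 100, add a % sign
aligned = f"{'left':<6}|{'right':>6}|"         # left- and right-aligned fields

print(with_commas)   # 100,000,000
print(as_percent)    # 25.6%
print(aligned)       # left  | right|
```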
