# Working with strings

When we looked at the basic data types, we noticed that Python does not have a dedicated data type for characters. <br/>
However, it has out-of-the-box support for strings.

As we mentioned, strings are typically defined using either single or double quotes.

In [1]:
m = "Hello World"
m = 'Hello World'

Alternatively, we can define multi-line strings using triple quotes.

In [2]:
m = """Hello 
World"""
m = '''Hello 
World'''

Note that the Python string class provides many nice features that come in handy in practice.

### Basic string operations

In [3]:
# Convert all letters to upper case
m = "Hello World".upper()
print(m)

HELLO WORLD


In [4]:
# Convert all letters to lower case
m = "Hello World".lower()
print(m)

hello world


In [5]:
# Capitalize the first character only
m = "hello world".capitalize()
print(m)

Hello world


In [6]:
# Check whether all letters are lower or upper case
print('hello world'.islower())
print('hello World'.islower())
print('hello world'.isupper())
print('HELLO WORLD'.isupper())

True
False
False
True


In [7]:
# Sometimes, we are given a number as a string and want to convert it to a float or int.
# However, before doing so, we might first want to check whether the string contains only digits.
print('123'.isdigit())
print('A123'.isdigit())

True
False
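
Note that `isdigit()` also returns `False` for signed numbers such as `'-5'`. As an alternative, here is a minimal sketch that simply attempts the conversion and catches the `ValueError`. The helper name `to_int_or_none` is hypothetical and only used for illustration.

```python
def to_int_or_none(text):
    """Return text converted to an int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for input such as 'A123' or '12.5'
        return None

print(to_int_or_none('123'))   # 123
print(to_int_or_none('A123'))  # None
print(to_int_or_none('-5'))    # -5
print('-5'.isdigit())          # False
```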


In [8]:
# Likewise, we can also check whether the string contains only alphabetical characters
print('ABC'.isalpha())
print('ABC234'.isalpha())

True
False


In [9]:
# Another functionality we often need in practice is a way to check whether a string starts or ends with a certain substring.

# Check whether a string starts or ends with "ABC"
print('ABCHello World'.startswith('ABC'))
print('ABCHello World'.endswith('ABC'))

True
False


In [10]:
# We can also check whether a certain substring appears somewhere in the string.
# Python has a dedicated operator (a.k.a. membership operator) for this
print('orld' in 'Hello World')
print('orld' in 'Hello World')

# We can also do the opposite and check whether a substring is not contained in a string
print('orld' not in 'Hello World')
print('orld' not in 'Hello World')

True
True
False
False


In [11]:
# But what if we want to know where exactly a substring appears in a string?
print('Hello World'.find('ello')) # Starts at pos 1
print('Hello World'.find('allo')) # Does not contain the substring

1
-1
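
Since `find()` accepts an optional start position, it can also be used to locate every occurrence of a substring. The following is a small sketch; the helper name `find_all` is hypothetical and not part of the string class.

```python
def find_all(text, sub):
    """Return the start positions of all occurrences of sub in text."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching one position after the previous match
        start = text.find(sub, start + 1)
    return positions

print(find_all('Hello World', 'o'))    # [4, 7]
print(find_all('Hello World', 'l'))    # [2, 3, 9]
print(find_all('Hello World', 'xyz'))  # []
```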


In [12]:
# How do we compare two strings character-wise? ==> With the == operator
print('Hello' == 'World')
print('Hello' == 'Hell')
print('Hello' == 'Hello')

False
False
True


In [13]:
# We can also check whether two strings are not equal
print('Hello' != 'World')
print('Hello' != 'Hello')

True
False


In [14]:
# If necessary, we can also access individual characters in a string using s[<pos>] (indices start at 0)
m = "Hello World"
print('The character at index 4 is: ', m[4])

# Similarly, we can also get an entire substring simply by specifying a range s[<start>:<end>] (the end index is excluded)
print(m[1:4])

The character at index 4 is:  o
ell


In [15]:
m = "Hello World"
# We can also easily get the first n-characters ...
print(m[:4]) # Gets the first 4 characters

# Or all characters from index 4 onward (i.e., starting with the fifth character)
print(m[4:])

Hell
o World
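
The same slicing syntax also accepts negative indices and an optional step size. A short sketch (the variable names are chosen for illustration only):

```python
m = "Hello World"

last_char = m[-1]      # negative indices count from the end
last_five = m[-5:]     # the last five characters
every_second = m[::2]  # a third slice value sets the step size
reversed_m = m[::-1]   # a step of -1 walks the string backwards

print(last_char)     # d
print(last_five)     # World
print(every_second)  # HloWrd
print(reversed_m)    # dlroW olleH
```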


### Format conversion

Format conversion from str to another type (and vice versa) is very easy to achieve. For example, ...

#### String to boolean and vice versa

In [16]:
# Convert booleans to string
print(str(True), str(False))

True False


In [17]:
# And from string to boolean
print(bool('True'))

True


In [18]:
print(bool('true'))

True


In [19]:
# Oops, apparently this does not work as intended. Why?
print(bool('False'))

True


In [20]:
# Remember what we said when we looked at the basic operators.
# bool() does not parse the text. Any non-empty string evaluates to True, and 'False' is a non-empty string. Hence, it evaluates to True.
# However, if we check ...
print(bool(''))

# '' => the empty string (length 0) is the only string that evaluates to False

False


In [21]:
# To do this conversion correctly, we can compare the string with 'True' ...
s_true = 'True'
s_false = 'False'

print(s_true == 'True')
print(s_false == 'True')

True
False
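
If the input may also be spelled `'true'` or `'FALSE'`, the comparison can be wrapped in a small helper. This is only a sketch under the assumption that nothing but the words "true" and "false" should be accepted; the function name `parse_bool` is hypothetical.

```python
def parse_bool(text):
    """Map 'true'/'false' (any case, surrounding whitespace ignored) to a bool."""
    normalized = text.strip().lower()
    if normalized == 'true':
        return True
    if normalized == 'false':
        return False
    # Anything else is rejected instead of being silently treated as True
    raise ValueError(f'Cannot interpret {text!r} as a boolean')

print(parse_bool('True'))     # True
print(parse_bool('false'))    # False
print(parse_bool(' FALSE '))  # False
```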


#### String to int/float and vice versa

In [22]:
# Converting strings to integer/float is pretty straightforward
print(int("123"))
print(float("123.5"))

123
123.5


In [23]:
# Results in an error
print(int("123.5434"))

ValueError: invalid literal for int() with base 10: '123.5434'

In [None]:
# To cut off the digits after the decimal point we could do ...
print(int(float("123.5434")))
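
Keep in mind that `int()` truncates toward zero, which differs from rounding down for negative numbers. A brief sketch comparing the options (the negative example value is chosen here purely for illustration):

```python
import math

value = float("-123.5434")

truncated = int(value)       # drops the fractional part (rounds toward zero)
floored = math.floor(value)  # rounds down toward negative infinity
rounded = round(value)       # rounds to the nearest integer

print(truncated, floored, rounded)  # -123 -124 -124
```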

In [None]:
# What's nice is that int() can also handle, e.g., hexadecimal numbers. We only have to specify the right base
print(int('123ab', 16))

# Let's verify if it works correctly
print(int('a', 16)) # => 10
print(int('1a', 16)) # => 16+10 = 26

In [None]:
# Convert float/int to string
print(str(123))
print(str(123.123))

### Strings are immutable

As previously mentioned, we can access characters in a string using `[pos]`. <br/>
But what if we want to modify certain characters in a string? For example, by writing ...

In [None]:
m = "Hello"
m[1] = 'a'

This raises a `TypeError` because strings are immutable. In other words, an existing string cannot be modified. However, we can always build a new, modified string.

In [24]:
m = 'Hello'
m_new = m[:1] + 'a' + m[2:]
print(m_new)

Hallo
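
Going back to the in-place assignment attempted above: the following sketch catches the resulting error so that we can inspect it without stopping the program. The variable name `error_message` is only used for illustration.

```python
m = "Hello"
error_message = None

try:
    m[1] = 'a'  # item assignment is not supported for str objects
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# The original string is unchanged
print(m)  # Hello
```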


In [25]:
# Both strings will have different locations in memory. But let's check ...
print(id(m) == id(m_new))

False


In [26]:
# Note that even functions such as upper() do not modify the existing string and instead return a new one.
print(id('Hello') == id('Hello'.upper()))

False


**Important:** <br/>
Never use the `is` operator to compare the equality of strings. Always use the `==` operator to check whether two strings are equal.

In fact, `is` compares object identity rather than content, so the result that you get when comparing strings with `is` depends on implementation details such as **string interning**. String interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient.

This can be illustrated with the following example:

In [37]:
m1 = 'Hello'
m2 = 'Hello'

# m1 and m2 share the same memory location due to string interning
print(m1 is m2)

m1 = 'Hello$'
m2 = 'Hello$'

# In CPython, strings that contain special characters are not interned by default (the result may differ depending on how the code is compiled)
print(m1 is m2)

m1 = 'Hello'
m2 = 'H' + m1[1:]

# This also creates a new string object
print(m1 is m2)

True
False
False
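
If identical objects are really needed, strings can be interned explicitly with `sys.intern()` from the standard library. A minimal sketch, assuming CPython; the strings are built at runtime so that the compiler cannot merge identical literals:

```python
import sys

# Build two equal strings at runtime
a = ''.join(['Hello', '$'])
b = ''.join(['Hello', '$'])

print(a == b)  # True: both strings contain the same characters

# sys.intern() returns the single shared copy for each distinct value
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True
```

Even so, `==` remains the right tool for comparing string content.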


## Formatted strings

Whether a program interacts with a user or writes results to a text file, almost any program has to compose a string out of multiple variables at some point.

An obvious way to do this is simply by converting each variable to a string and then concatenating these strings. 

In [None]:
m = "Let's print a string with two variables " + str(3) + ' and ' + str(3.4)
print(m)

Unfortunately, this is not very convenient if there are many variables, in particular because it is easy to miss whitespace between the individual parts.

#### Formatting strings using C-style formatting

Another option to create nicely formatted strings is by using **C-style string formatting**. <br/>
This formatting is also similar to the formatting used in Java.

In [None]:
m = 'A string formatted using c-style formatting --- Values %i and %f' % (3, 3.4)
print(m)

In [None]:
# Like in C, we can control the number of digits after the decimal point, etc.
# See: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
m = 'A string formatted using c-style formatting. Values %3i and %.3f' % (3, 3.4)
print(m)

However, this way of formatting strings is rather inconvenient. <br />
Let's assume that we want to change the text or the order of the variables in the text. In this case, we also need to change the order of the elements that appear in the tuple.

Luckily, Python provides a way to specify the variable name.

In [None]:
m = 'A string formatted using c-style formatting. Values %(x)i and %(y).3f' % {'x': 3, 'y': 3.4}
print(m)

#### Formatting strings using `format`

Nevertheless, C-style formatting isn't very Pythonic ... <br/>
That's why the `format` method was added to the string class in Python 3 (and also made available in Python 2.6).

In [None]:
m = 'This is a string formatted using the format method. Values {x} and {y}'.format(x=3, y=3.4)
print(m)

In [None]:
# Reordering the values in the text now becomes straightforward. The format() method simply takes the values as named arguments.
m = 'This is a string formatted using the format method. Values {y} and {x}'.format(x=3, y=3.4)
print(m)

In [None]:
# We can also add format specifiers (e.g., zero padding or the number of decimal places) if required
# See: https://docs.python.org/3/library/string.html#formatstrings
m = 'This is a string formatted using the format method. Values {x:03d} and {y:.4f}'.format(x=3, y=3.4)
print(m)
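
Besides named arguments, `format` also supports positional arguments and alignment specifiers. A short sketch (the example values are made up for illustration):

```python
# '<', '>' and '^' align the value left, right, or centered within the given width
row = '{:<10}|{:>10}|{:^10}'.format('left', 'right', 'center')
print(row)

# Positional arguments can be referenced by index and reused
repeated = '{0} and {1} and {0}'.format('a', 'b')
print(repeated)  # a and b and a
```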

#### Formatting strings using f-strings

However, since Python 3.6 there exists an even simpler way to format strings using so-called **f-strings**.

In [None]:
x = 3
y = 3.4
m = f'This is an f-string which is easy to use.\nWe simply name our variable and tadaa ... Values {x} and {y}'
print(m)

In [None]:
m = f'With formatting ... Values {x:.3f} and {y}'
print(m)

In [None]:
m = f'We can even do calculations ... Values {x+y}'
print(m)
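
F-strings accept the same format specifiers as `format`, which makes it easy to print aligned columns. The following is a small sketch; the item data is invented for illustration.

```python
# Hypothetical example data: (name, count, price)
items = [('apples', 3, 0.5), ('cherries', 120, 0.25)]

lines = []
for name, count, price in items:
    # Left-align the name (width 10), right-align the numbers
    lines.append(f'{name:<10}{count:>5d}{price:>8.2f}')

for line in lines:
    print(line)

# A comma in the format specifier adds thousands separators
total = f'{1234567.891:,.2f}'
print(total)  # 1,234,567.89
```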