<!-- dom:TITLE: Introduction to Python (MOD510): Strings -->
# Introduction to Python (MOD510): Strings
<!-- dom:AUTHOR: Oddbjørn Nødland -->
<!-- Author: -->  
**Oddbjørn Nødland**

Date: **Aug 20, 2019**

**Summary.**
The aim of this workbook is to provide a rapid introduction to Python
string objects, and to show some examples of how you can work with them.




Python strings (*str* objects) are immutable: once created, their contents
cannot be changed. They can be created using single, double, or triple quotes;
single quotes are a common default (PEP 8 only asks that you pick one style and
use it consistently). Only some very simple examples of usage are provided
here; for more information see, e.g., the
[official documentation](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str).
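Because strings are immutable, you cannot change a character in place. String methods instead return a *new* string and leave the original untouched. A minimal sketch illustrating this (the variable names are our own, chosen for illustration):

```python
word = 'spam'

# Attempting to change a single character raises a TypeError:
try:
    word[0] = 'S'
except TypeError as err:
    print('TypeError:', err)

# String methods return a new object; the original string is unchanged:
capitalized = word.capitalize()
print(word)         # spam
print(capitalized)  # Spam
```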

In [1]:
# String creation and basic manipulation:
text = 'We are all individuals!'
text2 = "We are all individuals!"
text3 = """We are all individuals!"""

if text == text2:
    # Use double quotes for the string if you want to include single quotes in it:
    print("The two strings 'text' and 'text2' are identical.")
if text == text3:
    # We can of course do it the other way around too:
    print('The two strings "text" and "text3" are identical.')

print('Length of text =', len(text))  # includes whitespace

# A number also has a string representation:
y = 1234
y_as_string = str(y)
print(y_as_string)
print('y_as_string is an object of type:', type(y_as_string))
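The conversion also works in the other direction: the built-in `int` and `float` constructors turn a numeric string back into a number. A short sketch, reusing the names from the cell above; note that `+` on two strings concatenates rather than adds:

```python
y = 1234
y_as_string = str(y)

# Convert back to numbers with the built-in int() and float() constructors:
y_int = int(y_as_string)
y_float = float(y_as_string)
print(y_int + 1)    # 1235 (integer arithmetic)
print(y_float / 2)  # 617.0

# For strings, '+' means concatenation, not arithmetic:
joined = y_as_string + '1'
print(joined)       # 12341
```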

In [2]:
# Triple quotes allow a string to span multiple lines:
multiple_lines_of_text = """This
is a
string
spanning
multiple
lines."""
print(multiple_lines_of_text)
print('Number of characters: {:d}.'.format(len(multiple_lines_of_text)))

# Show the 'official' string representation (hidden characters such as newlines appear as escape sequences, e.g. \n):
print(repr(multiple_lines_of_text))

# Another way: adjacent string literals inside parentheses are joined automatically:
long_text = ('This '
             'is '
             'a '
             'string '
             'spanning '
             'multiple '
             'lines.')
print(long_text)
print('Number of characters: {:d}.'.format(len(long_text)))
print(repr(long_text))

# The two strings are _NOT_ the same if you compare them for equality:
print(multiple_lines_of_text==long_text)
# Why not?

In [3]:
# Replace every occurrence of a substring with another substring:
first_string = 'All we hear is radio ga ga.'
second_string = first_string.replace('ga', 'goo')
print(first_string)
print(second_string)
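`str.replace` also accepts an optional third argument that limits how many occurrences are replaced, and it matches case exactly. A brief sketch on the same sentence:

```python
first_string = 'All we hear is radio ga ga.'

# The optional count argument limits the number of replacements:
only_first = first_string.replace('ga', 'goo', 1)
print(only_first)  # All we hear is radio goo ga.

# Matching is case sensitive, so nothing is replaced here:
unchanged = first_string.replace('GA', 'goo')
print(unchanged)   # All we hear is radio ga ga.
```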

In [4]:
# Use 'join' to combine a sequence of strings into a single new one:
list_of_words = ['This', 'is', 'a', 'complete', 'sentence.']
# Separate the words by a single space:
sentence = ' '.join(list_of_words)
print(sentence)

list_of_things = ['rock', 'paper', 'scissors']
# Separate the words by a comma followed by a space:
print(', '.join(list_of_things))
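Note that `join` only accepts strings: calling `', '.join([1, 2, 3])` raises a `TypeError`. A minimal sketch of the usual workaround, converting each item with `str` first:

```python
numbers = [1, 2, 3]

# ', '.join(numbers) would raise a TypeError, since the items are ints.
# Convert each item with str() first, here via a generator expression:
csv_line = ', '.join(str(n) for n in numbers)
print(csv_line)  # 1, 2, 3

# map() gives the same result:
csv_line_2 = ', '.join(map(str, numbers))
print(csv_line_2)
```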

In [5]:
# Convert to all upper / lower case:
s = 'Caps lock'.upper()
print(s)

s = 'lOWeR caSe'.lower()
print(s)
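A common use of case conversion is comparing strings without regard to case. The sketch below also shows a few related methods (`title`, `swapcase`, `casefold`); `casefold` is a more aggressive version of `lower` intended for caseless matching of Unicode text:

```python
answer = 'Yes'

# Compare two strings without regard to case by normalizing both sides:
is_match = answer.lower() == 'YES'.lower()
print(is_match)  # True

# A few related methods:
title_case = 'monty python'.title()
swapped = 'Caps Lock'.swapcase()
folded = 'Straße'.casefold()
print(title_case)  # Monty Python
print(swapped)     # cAPS lOCK
print(folded)      # strasse (lower() would keep the 'ß')
```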

In [6]:
# Check whether a string starts with a specific substring:
text = 'The Knights Who Say Ni demand a sacrifice!'
if text.startswith('The'):
    print("The sentence starts with 'The'")

# Find the index of the first occurrence of a substring:
string = 'That is the question, whether "tis nobler in the mind to suffer slings and arrows..."'
idx = string.find('slings and arrows')
if idx != -1:  # find returns -1 if there are no occurrences (0 is a valid index).
    print(string[idx:])
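If you only need to know *whether* a substring occurs, the `in` operator is simpler than `find`. The related method `str.index` behaves like `find` but raises a `ValueError` when nothing is found. A short sketch on the same string:

```python
string = 'That is the question, whether "tis nobler in the mind to suffer slings and arrows..."'

# Simple membership tests:
print('slings' in string)      # True
print('outrageous' in string)  # False

# str.index raises ValueError instead of returning -1:
try:
    string.index('outrageous fortune')
except ValueError:
    print("'outrageous fortune' not found")

# A match at the very start gives index 0, which is why 'idx > 0' is not a safe test:
print(string.find('That'))     # 0
```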

In [7]:
# Access the first character (indexing starts at 0):
text = 'We are all individuals!'
a = text[0]
print('The first letter in the sentence is', a)

# Slice the string (the end index 11 is excluded, so this gives 'all '):
subtext = text[7:11]
print(subtext)
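Indices may also be negative (counting from the end of the string), and a slice can take a third *step* value. A brief sketch on the same sentence:

```python
text = 'We are all individuals!'

last_char = text[-1]     # negative indices count from the end
last_word = text[-12:]   # from the 12th-last character to the end
every_second = text[::2] # every second character
reversed_text = text[::-1]  # a step of -1 reverses the string

print(last_char)      # !
print(last_word)      # individuals!
print(every_second)
print(reversed_text)
```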

# Split string into list (separated by whitespace):
text_list = text.split()
print(text_list)

# Concatenate strings using a for loop:
text_built_from_words = ''
for word in text_list:
    # NB! Creates a new string object each time:
    text_built_from_words += word
print(text_built_from_words)  # notice no whitespace

# Again, a more 'Pythonic' way to achieve the same effect is to use 'join':
text_built_from_words = ''.join(text_list)
print(text_built_from_words)

comma_separated_string = 'These,words,are,separated,by,commas,only'
word_list = comma_separated_string.split(',')
print(word_list)
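Calling `split()` without an argument is not quite the same as `split(' ')`: the former treats any run of whitespace as a single separator, while the latter produces empty strings between consecutive spaces. A sketch of this and of the optional `maxsplit` argument (the example strings are our own):

```python
spaced = 'too   many    spaces'

# With no argument, any run of whitespace counts as one separator:
words = spaced.split()
print(words)        # ['too', 'many', 'spaces']

# With an explicit ' ' separator, consecutive spaces yield empty strings:
raw_parts = spaced.split(' ')
print(raw_parts)    # ['too', '', '', 'many', '', '', '', 'spaces']

# maxsplit limits the number of splits performed:
first_and_rest = 'a,b,c,d'.split(',', 1)
print(first_and_rest)  # ['a', 'b,c,d']

# strip() removes leading and trailing whitespace:
stripped = '  padded  '.strip()
print(stripped)     # padded
```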

For historical reasons, [there are many ways to format strings in Python](https://realpython.com/python-string-formatting/). Some examples of the old [*C-style* kind of
formatting](https://docs.python.org/2.4/lib/typesseq-strings.html) are shown
below:

In [8]:
# Examples of C-style formatting of strings:
g = 9.81

# Display as float with two decimal places:
msg1 = 'The constant of gravitational acceleration is %.2f m/s^2.' % g
print(msg1)

# Display as float with one decimal place:
msg2 = 'The constant of gravitational acceleration is %.1f m/s^2.' % g
print(msg2)

person_1 = 'John'
person_2 = 'Terry'
msg3 = '%s is better than %s at being silly.' % (person_1, person_2)
print(msg3)
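A few more C-style conversions that often come in handy: scientific notation, zero padding, a literal percent sign, and looking up values by name from a dictionary. A short sketch (the dictionary keys `first` and `second` are our own, chosen for illustration):

```python
g = 9.81

sci = '%e' % g             # scientific notation, six decimals by default
padded = '%05d' % 42       # width 5, padded with zeros
percent = '%.1f%%' % 99.5  # '%%' produces a literal percent sign
named = '%(first)s and %(second)s' % {'first': 'John', 'second': 'Terry'}

print(sci)      # 9.810000e+00
print(padded)   # 00042
print(percent)  # 99.5%
print(named)    # John and Terry
```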

Alternatively, the same strings can be created with the *str.format* method,
introduced in Python 2.6:

In [9]:
# Examples of str.format:
g = 9.81

msg4 = 'The constant of gravitational acceleration is {:.2f} m/s^2.'.format(g)
print(msg4)

msg5 = 'The constant of gravitational acceleration is {:.1f} m/s^2.'.format(g)
print(msg5)

person_1 = 'John'
person_2 = 'Terry'
msg6 = '{} is better than {} at being silly.'.format(person_1, person_2)
print(msg6)

# Check that the two approaches produce identical strings:
print(msg1==msg4)
print(msg2==msg5)
print(msg3==msg6)
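Since Python 3.6 there is also a third option, *f-strings* (formatted string literals), where expressions are written directly inside the braces and use the same format specifications as *str.format*. A minimal sketch reproducing two of the messages above (the names `msg7` and `msg8` are our own):

```python
g = 9.81
person_1 = 'John'
person_2 = 'Terry'

msg7 = f'The constant of gravitational acceleration is {g:.2f} m/s^2.'
msg8 = f'{person_1} is better than {person_2} at being silly.'
print(msg7)
print(msg8)

# Same results as with str.format:
print(msg7 == 'The constant of gravitational acceleration is {:.2f} m/s^2.'.format(g))
print(msg8 == '{} is better than {} at being silly.'.format(person_1, person_2))
```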

In [10]:
# Padding and aligning strings example:
text1 = '%10s' % ('text',)  # use field of 10 characters
# defaults to right-alignment in the 'old style'
print(text1)

# ...whereas the 'new style' left-aligns strings by default (numbers are right-aligned):
text2 = '{:10}'.format('text')
print(text2)
# Therefore, the two strings are not equal:
print(text1==text2)

text3 = '{:>10}'.format('text')  # <-- now both are right-aligned
print(text3)
print(text1==text3)
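Alignment can also be made explicit: in the new style `<`, `>` and `^` mean left, right and centered, optionally preceded by a fill character, while the old style left-aligns with a minus sign in the field width. A short sketch of these options, plus the equivalent `center` string method:

```python
centered = '{:*^10}'.format('text')  # fill with '*', center in a 10-character field

# Old-style left alignment uses a minus sign:
left_old = '%-10s' % 'text'
left_new = '{:10}'.format('text')
print(left_old == left_new)  # True

# Numbers are right-aligned by default in the new style:
num_new = '{:10}'.format(42)
num_old = '%10d' % 42
print(num_new == num_old)    # True

# The string method center() does the same job as '^':
print(centered)                 # ***text***
print('text'.center(10, '*'))   # ***text***
```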

For yet more advanced treatment of strings, one could delve into topics such
as [regular expressions](https://docs.python.org/3/library/re.html). Also,
sooner or later most programmers will run into problems related to using
[different character sets](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/).
These topics are, however, beyond the scope of this course.