# DATA SCIENCE SESSIONS VOL. 3
### A Foundational Python Data Science Course
## Session 05: Strings in Python

[&larr; Back to course webpage](https://datakolektiv.com/)

Feedback should be sent to [goran.milovanovic@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com). 

These notebooks accompany the DATA SCIENCE SESSIONS VOL. 3 :: A Foundational Python Data Science Course.

![](../img/IntroRDataScience_NonTech-1.jpg)

### Lecturers

[Goran S. Milovanović, PhD, DataKolektiv, Chief Scientist & Owner](https://www.linkedin.com/in/gmilovanovic/)

[Aleksandar Cvetković, PhD, DataKolektiv, Consultant](https://www.linkedin.com/in/alegzndr/)

[Ilija Lazarević, MA, DataKolektiv, Consultant](https://www.linkedin.com/in/ilijalazarevic/)

![](../img/DK_Logo_100.png)

***

### 0. What do we want to do today?

Our goal in **Session05** is to learn

- about Strings in Python,
- operations with Strings,
- useful String methods defined in Python.

### 1. Where am I?

You are (or should be...) in the `session05` directory, where we find 
- this notebook, 
- its HTML version, 
- a subdirectory `_data` that contains a text file named `python_zen.txt`.

In [1]:
import os
work_dir = os.getcwd()
print(work_dir)
print(os.listdir(work_dir))
data_dir = os.path.join(work_dir, "_data")
print(os.listdir(data_dir))

/home/ikacikac/workspace/dss03python2023/session05
['dss03_py_session05.ipynb', 'dss03_py_session05.html', '_data']
['python_zen.txt']


### Things we have learned so far

Defining the string variable:

In [2]:
foo = 'This is a one sentence text.'

# pangram
bar = 'The quick brown fox jumps over the lazy dog.'

Can we 'sum' the strings?

In [3]:
foo + bar

'This is a one sentence text.The quick brown fox jumps over the lazy dog.'

Yes: for strings, Python interprets 'summing' as joining them end to end, an operation called string concatenation. Note that Python simply 'tapes' the second string onto the end of the first one. If you want a space between the end of the first sentence and the start of the second, you have to add it yourself.

In [4]:
foo + ' ' + bar

'This is a one sentence text. The quick brown fox jumps over the lazy dog.'

In one of the previous sessions, we mentioned `mutability`, `sequences` and `iterables`. How is that related to strings?

Well, we said that:
- mutable objects allow changing values of their attributes or their representation,
- sequences preserve the order of inserted elements, and you can refer to each of these elements through its index,
- iterables are objects that we can iterate through.

How does this reflect on strings?

#### Strings are immutable

In [5]:
foo

'This is a one sentence text.'

In [6]:
foo[0] = 'A'

TypeError: 'str' object does not support item assignment
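Because strings are immutable, "changing" a string always means building a new one. A minimal sketch: to replace the first character of `foo`, combine a new first character with a slice of the rest of the string. The original string is left untouched.

```python
foo = 'This is a one sentence text.'

# foo[0] = 'A' would raise TypeError, so build a new string instead
new_foo = 'A' + foo[1:]

print(new_foo)  # Ahis is a one sentence text.
print(foo)      # This is a one sentence text.  (unchanged)
```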

#### Strings as sequences

We have already mentioned some of these operations, but it doesn't hurt to refresh our memory. They apply to all sequence types; let's see how they work on strings.

In [7]:
foo + bar

'This is a one sentence text.The quick brown fox jumps over the lazy dog.'

In [8]:
foo * 2

'This is a one sentence text.This is a one sentence text.'

In [9]:
foo[0]

'T'

In [10]:
foo[:3]

'Thi'

In [11]:
len(foo)

28

In [12]:
list(range(len(foo)))

[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27]

Let's define a helper function that gives us an overview of a string and its index positions. Don't worry about the "complexity" of the code; you will be able to write this yourself by the end of this session.

In [173]:
def scheme_string(s):
    print('')
    print('String:', s)
    print('')
    print('Scheme:')
    print('|'.join(f'{x: >3}' for x in range(len(s))))
    print('|'.join([f'{x: >3}' for x in s]))
    print('|'.join(f'{-x: >3}' for x in range(len(s), 0, -1)))
    print()
    print('Length:', len(s))

In [174]:
scheme_string(foo)


String: This is a one sentence text.

Scheme:
  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 27
  T|  h|  i|  s|   |  i|  s|   |  a|   |  o|  n|  e|   |  s|  e|  n|  t|  e|  n|  c|  e|   |  t|  e|  x|  t|  .
-28|-27|-26|-25|-24|-23|-22|-21|-20|-19|-18|-17|-16|-15|-14|-13|-12|-11|-10| -9| -8| -7| -6| -5| -4| -3| -2| -1

Length: 28


In [154]:
foo[0:9]

'This is a'

In [16]:
foo[14: 22]

'sentence'

In [17]:
print(list(range(14, 22)))

[14, 15, 16, 17, 18, 19, 20, 21]


In [18]:
foo[-22:-13]

's a one s'

In [19]:
foo[-16:-24:-1]

'eno a si'

In [20]:
foo[21:13:-1]

'ecnetnes'

In [21]:
scheme_string(foo)


String: This is a one sentence text.

Scheme:
  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 27
  T|  h|  i|  s|   |  i|  s|   |  a|   |  o|  n|  e|   |  s|  e|  n|  t|  e|  n|  c|  e|   |  t|  e|  x|  t|  .
-28|-27|-26|-25|-24|-23|-22|-21|-20|-19|-18|-17|-16|-15|-14|-13|-12|-11|-10| -9| -8| -7| -6| -5| -4| -3| -2| -1


In [22]:
foo[14:22:-1]

''

In [23]:
list(range(21, 13, -1))

[21, 20, 19, 18, 17, 16, 15, 14]
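The pattern above holds in general: a slice `s[start:stop:step]` picks the characters at the indices `range(start, stop, step)`, so the character at `stop` is never included, and an empty range gives an empty string. Below is a short sketch that checks this on `foo` with a hypothetical helper `slice_via_range` (written only for illustration, and only meant for non-negative indices inside the string). Omitting the bounds, as in `foo[::-1]`, means "from one end to the other".

```python
foo = 'This is a one sentence text.'

def slice_via_range(s, start, stop, step=1):
    # collect the characters at the indices produced by range(start, stop, step)
    return ''.join(s[i] for i in range(start, stop, step))

print(foo[14:22], slice_via_range(foo, 14, 22))         # sentence sentence
print(foo[21:13:-1], slice_via_range(foo, 21, 13, -1))  # ecnetnes ecnetnes
print(repr(foo[14:22:-1]))                              # '' (nothing to step through)

# omitted bounds with a negative step: the whole string, reversed
print(foo[::-1])
```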

In [24]:
foo.index('te')

17

In [25]:
foo.rindex('te')

23

In [26]:
foo.index('te', 10, 20)

17

In [27]:
foo.index('te', 10, 15)

ValueError: substring not found

In [28]:
[1,2,3,5,3,2,5,6].index(3, 4)

4

In [29]:
foo.count('te')

2

In [30]:
foo.count('what')

0

In [31]:
'te' in foo

True

The official Python documentation says: 
> In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. 

But, how does that work on strings?

In [32]:
'This' == 'This'

True

In [33]:
'This' == 'This '

False

Interesting! This seems to hold for strings too.

Wait, what does `lexicographically` even mean here? It means sequences are compared the way words are ordered in a dictionary: letter by corresponding letter.

Of course, this means you also have to think about spaces at both ends of the string.

In [34]:
' This' == 'This'

False

How do we deal with this?

Well, strings have their own methods, and we will go through some of them in the next section.

But before that, let's try something that may not make any sense at all.

Let's try to find the minimum and maximum of the string sequence!

In [35]:
min(foo)

' '

In [36]:
max(foo)

'x'

Hmmm, this is strange. How does a string have a minimum, and why is the "x" character the string's maximum? 

We know how to compare the values of numbers, so letters must have their numerical representation, right? 

We will describe what is happening behind the scenes later on.

#### Interesting string methods

Now we will go over some of the most useful string methods.

In [37]:
'This is a sentence '.strip()

'This is a sentence'

In [38]:
' This is a sentence '.strip()

'This is a sentence'

In [39]:
' This is a sentence '.rstrip()

' This is a sentence'

In [40]:
' This is a sentence '.lstrip()

'This is a sentence '

In [41]:
str.strip?

Signature: str.strip(self, chars=None, /)
Docstring:
Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.
Type:      method_descriptor


In [42]:
foo.strip('.')

'This is a one sentence text'

In [43]:
foo.strip('this.')

'This is a one sentence tex'

In [44]:
foo.lstrip('This ')

'a one sentence text.'
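Note what happened in the last two cells: the argument of `strip`/`lstrip` is treated as a *set of characters*, not as a prefix. `lstrip('This ')` keeps removing any of the characters `T`, `h`, `i`, `s` and space until it meets one that is not in that set, which is why the second `is ` disappeared too. The sketch below contrasts this with `removeprefix` (available since Python 3.9), which removes an exact prefix at most once.

```python
foo = 'This is a one sentence text.'

# lstrip: removes any leading characters from the set {'T', 'h', 'i', 's', ' '}
print(foo.lstrip('This '))        # a one sentence text.

# removeprefix: removes the exact string 'This ' once, if present (Python 3.9+)
print(foo.removeprefix('This '))  # is a one sentence text.

# a string made only of characters from the set is stripped away completely
print(repr('his hiss'.lstrip('This ')))  # ''
```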

In [45]:
foo.lstrip('This ').capitalize()

'A one sentence text.'

In [46]:
foo.lstrip('This ').upper()

'A ONE SENTENCE TEXT.'

In [47]:
foo.lstrip('This ').upper().lower()

'a one sentence text.'

In [48]:
bar

'The quick brown fox jumps over the lazy dog.'

In [49]:
bar.casefold?

Signature: bar.casefold()
Docstring: Return a version of the string suitable for caseless comparisons.
Type:      builtin_function_or_method


In [50]:
bar.casefold()

'the quick brown fox jumps over the lazy dog.'

Wait, this looks like what `.lower()` should return, right?

In [51]:
bar.lower()

'the quick brown fox jumps over the lazy dog.'

Well yes, but... no!

In [52]:
"der Fluß".lower()

'der fluß'

In [53]:
"der Fluß".casefold()

'der fluss'

How is this possible? Strings are more complicated than they appear. Hint: Unicode. We will come back to this later.
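This difference matters when you compare words without caring about case. A small sketch, assuming we want `'der Fluß'` and `'DER FLUSS'` (the same German phrase, once in lowercase and once in capitals) to count as equal: `lower()` keeps `'ß'`, because it is already a lowercase letter, while `casefold()` maps it to `'ss'`.

```python
word_1 = 'der Fluß'
word_2 = 'DER FLUSS'

print(word_1.lower(), '|', word_2.lower())     # der fluß | der fluss
print(word_1.lower() == word_2.lower())        # False
print(word_1.casefold() == word_2.casefold())  # True
```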

In [54]:
foo.lstrip('This ').capitalize()

'A one sentence text.'

In [55]:
foo.lstrip('This ').title()

'A One Sentence Text.'

In [56]:
foo.lstrip('This ').title().swapcase()

'a oNE sENTENCE tEXT.'

In [57]:
foo.startswith('This')

True

In [58]:
foo.startswith('his')

False

In [59]:
foo.endswith('.')

True

In [60]:
foo.index('te')

17

In [61]:
foo.index('What')

ValueError: substring not found

In [62]:
scheme_string(foo)


String: This is a one sentence text.

Scheme:
  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 27
  T|  h|  i|  s|   |  i|  s|   |  a|   |  o|  n|  e|   |  s|  e|  n|  t|  e|  n|  c|  e|   |  t|  e|  x|  t|  .
-28|-27|-26|-25|-24|-23|-22|-21|-20|-19|-18|-17|-16|-15|-14|-13|-12|-11|-10| -9| -8| -7| -6| -5| -4| -3| -2| -1


In [63]:
foo.find('What')

-1

In [64]:
foo.find('This')

0

In [65]:
foo.find('te')

17

In [66]:
foo.rfind('te')

23
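One caution when choosing between the two: `index` raises a `ValueError` when the substring is missing, while `find` returns `-1`, which is also a valid (negative) index. If you use the result of `find` for slicing without checking it first, you silently get the wrong part of the string. A short sketch:

```python
foo = 'This is a one sentence text.'

pos = foo.find('What')
print(pos)        # -1
print(foo[pos:])  # . (slicing from -1 returns the last character, not an error)

# safer: check the result before using it
if pos != -1:
    print(foo[pos:])
else:
    print('substring not found')
```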

In [67]:
foo.ljust(35, '<')

'This is a one sentence text.<<<<<<<'

In [68]:
scheme_string(foo.ljust(35, '<'))


String: This is a one sentence text.<<<<<<<

Scheme:
  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 27| 28| 29| 30| 31| 32| 33| 34
  T|  h|  i|  s|   |  i|  s|   |  a|   |  o|  n|  e|   |  s|  e|  n|  t|  e|  n|  c|  e|   |  t|  e|  x|  t|  .|  <|  <|  <|  <|  <|  <|  <
-35|-34|-33|-32|-31|-30|-29|-28|-27|-26|-25|-24|-23|-22|-21|-20|-19|-18|-17|-16|-15|-14|-13|-12|-11|-10| -9| -8| -7| -6| -5| -4| -3| -2| -1


In [69]:
scheme_string(foo.rjust(35, '<'))


String: <<<<<<<This is a one sentence text.

Scheme:
  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 27| 28| 29| 30| 31| 32| 33| 34
  <|  <|  <|  <|  <|  <|  <|  T|  h|  i|  s|   |  i|  s|   |  a|   |  o|  n|  e|   |  s|  e|  n|  t|  e|  n|  c|  e|   |  t|  e|  x|  t|  .
-35|-34|-33|-32|-31|-30|-29|-28|-27|-26|-25|-24|-23|-22|-21|-20|-19|-18|-17|-16|-15|-14|-13|-12|-11|-10| -9| -8| -7| -6| -5| -4| -3| -2| -1


In [70]:
foo.removeprefix('This')

' is a one sentence text.'

In [71]:
foo.removesuffix('text.')

'This is a one sentence '

In [72]:
foo.removesuffix('tex.')

'This is a one sentence text.'

In [73]:
foo

'This is a one sentence text.'

In [74]:
foo.split()

['This', 'is', 'a', 'one', 'sentence', 'text.']

In [75]:
foo.rsplit()

['This', 'is', 'a', 'one', 'sentence', 'text.']

Hmm, no difference. At least, so it seems.

In [76]:
foo.split(maxsplit=3)

['This', 'is', 'a', 'one sentence text.']

In [77]:
foo.rsplit(maxsplit=3)

['This is a', 'one', 'sentence', 'text.']

In [78]:
foo.partition('is')

('Th', 'is', ' is a one sentence text.')

In [79]:
foo.rpartition('a')

('This is ', 'a', ' one sentence text.')
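`partition` always returns a 3-tuple (before, separator, after), which makes it handy for splitting a string exactly once around a separator. A minimal sketch, assuming simple `key=value` strings (a made-up format for illustration); when the separator is missing, the last two elements are empty strings.

```python
setting = 'encoding=utf-8'
key, sep, value = setting.partition('=')
print(key, value)              # encoding utf-8

missing = 'encoding'
print(missing.partition('='))  # ('encoding', '', '')

# rpartition splits at the last occurrence instead of the first
print('a=b=c'.partition('='))   # ('a', '=', 'b=c')
print('a=b=c'.rpartition('='))  # ('a=b', '=', 'c')
```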

In [80]:
foo.replace('e', 'b')

'This is a onb sbntbncb tbxt.'

In [81]:
foo.replace('e', 'b', 2)

'This is a onb sbntence text.'

There are also methods that check if strings are of a certain format. For example:

In [82]:
foo.istitle()

False

In [83]:
foo.isupper()

False

In [84]:
foo.isspace()

False

In [85]:
'   '.isspace()

True

So let's give an overview of the string methods we have used:
- `index` and `rindex`,
- `find` and `rfind`,
- `strip`, `lstrip` and `rstrip`,
- `count`,
- `capitalize`, `title`, `upper`, `lower`, `swapcase` and `casefold`,
- `startswith` and `endswith`,
- `ljust` and `rjust`,
- `removeprefix` and `removesuffix`,
- `split` and `rsplit`,
- `partition` and `rpartition`,
- `replace`,
- `isupper`, `islower`, `isspace`, `istitle`. 

But there is also a way of checking if string characters are part of a certain character subset. What does this mean?

Well, as humans, we can certainly differentiate between '1' and 'a'.

'1' is the string representation of a digit, and 'a' is a lowercase letter of the alphabet.

How do we test a string based on its characters?

Let's take some of the different types of characters that can end up in a string. Then we will test each of them with some of the string methods and print out a table.

In [86]:
# examples of different kinds of characters in strings
str_list = [
    '123',
    '123a',
    'ab12',
    'abc',
    '¼',
    '一',
    '10.2',
    '10\u00B2',
    '٢',
    '\N{ROMAN NUMERAL ONE}' + '\N{ROMAN NUMERAL TEN}',
    '\N{ROMAN NUMERAL TEN}',
    '\N{BLACK CHESS QUEEN}'
]

In [87]:
# some of the string methods for testing string content
method_list = [str.isalnum, str.isalpha, str.isascii, str.isdecimal, str.isnumeric, str.isdigit]

In [88]:
# let's print out scheme of method string representation
scheme_string(str(method_list[0]))


String: <method 'isalnum' of 'str' objects>

Scheme:
  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11| 12| 13| 14| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 27| 28| 29| 30| 31| 32| 33| 34
  <|  m|  e|  t|  h|  o|  d|   |  '|  i|  s|  a|  l|  n|  u|  m|  '|   |  o|  f|   |  '|  s|  t|  r|  '|   |  o|  b|  j|  e|  c|  t|  s|  >
-35|-34|-33|-32|-31|-30|-29|-28|-27|-26|-25|-24|-23|-22|-21|-20|-19|-18|-17|-16|-15|-14|-13|-12|-11|-10| -9| -8| -7| -6| -5| -4| -3| -2| -1


Do not pay attention to the code and its complexity. Just look at the output.

In [89]:
print(' ' * 10 + ''.join([i.rjust(6) for i in str_list]))
for method in method_list:
    m = str(method)
    method_name = m[m.index('is'):m.index('of')-2]
    print(method_name.rjust(10) + ''.join([f'{method(i): >6}' for i in str_list]))

             123  123a  ab12   abc     ¼     一  10.2   10²     ٢    ⅠⅩ     Ⅹ     ♛
   isalnum     1     1     1     1     1     1     0     1     1     1     1     0
   isalpha     0     0     0     1     0     1     0     0     0     0     0     0
   isascii     1     1     1     1     0     0     1     0     0     0     0     0
 isdecimal     1     0     0     0     0     0     0     0     1     0     0     0
 isnumeric     1     0     0     0     1     1     0     1     1     1     1     0
   isdigit     1     0     0     0     0     0     0     1     1     0     0     0


Essentially, Python can recognize different types of characters, including letters and digits from many writing systems. For example, ٢ is the digit 2 in Arabic-Indic numerals, so Python treats it as alphanumeric, numeric, decimal and a digit.
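If you are curious what Python "knows" about a character, the standard library module `unicodedata` gives access to the Unicode character database. A short sketch for some of the characters from the table above (`numeric` returns the given default, here `None`, for characters without a numeric value):

```python
import unicodedata

for ch in ['٢', '¼', 'Ⅹ', '♛']:
    # official Unicode name and numeric value (None if the character has none)
    print(ch, unicodedata.name(ch), unicodedata.numeric(ch, None))

# decimal characters such as the Arabic-Indic digit two can be parsed as integers
print(int('٢'))  # 2
```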

Now is a good time to talk about these special characters.

#### Encodings: ASCII, Unicode and UTF-8

The idea of encoding information goes back a long way in human history. Smoke signals, for example, were an early form of wireless message transmission. Another widely known encoding is Morse code, which assigns a sequence of short and long signals to each letter, digit and some punctuation marks. With the invention of computers, data had to be stored in memory and sent over the wire. Since computers work only with 0s and 1s, we needed a way to encode alphabet letters as sequences of 0s and 1s. Although earlier character encodings existed, the most widely known one from the early computer era is ASCII.

ASCII encodes the letters of the English alphabet (lowercase and uppercase), digits, punctuation marks, and a number of control characters for special use. It defines 128 code points (0 to 127), so each character fits in 7 bits. Later, 8-bit "extended ASCII" encodings used the remaining 128 values of a byte (128 to 255) for accented Latin letters and other symbols. Still, 256 values were nowhere near enough to support all the languages on the planet, let alone hundreds of special symbols such as emojis.

This is the reason Unicode was invented. Unicode is a newer and far more comprehensive standard; it succeeded ASCII and is backward compatible with it (the first 128 Unicode code points are the ASCII characters). It assigns a numeric code point to each character, and its code space has room for more than a million code points, enough for the symbols that are or were in use all across the planet, e.g., Egyptian hieroglyphs. This comes at the cost of complexity, so we will not go into much detail at this point. Assigning code points, however, does not solve the problem of storing characters in memory, i.e., translating them into 0s and 1s. For that we need additional standards, called encodings. There are many, but the most widely used is UTF-8. ASCII can also serve as such an encoding, but only for text that contains nothing beyond its 128 characters.

What is important to know is that:
- Python 3 treats all strings (`str`) as sequences of Unicode characters. This is why you can have, for example, '¼', '٢' or '♛' in your string,
- when you store your text in files or send it over the internet, it gets encoded.
- the way you **encode it on the sending side** and the way it gets **decoded on the receiving side must be the same**.

Most of the time this is handled automatically by the operating system and Python, but every once in a while you might get errors while trying to read a text file in Python, and it is good to know what could be causing them.

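What about 'der Fluß' and the `casefold` method? In Unicode, converting to lowercase and *case folding* are two different operations. The German 'ß' is already a lowercase letter, so `lower()` leaves it unchanged, while case folding, which is designed for caseless comparison, maps it to 'ss'.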

So, let's see how strings are encoded.

In [90]:
unique_string = '¼ ٢ ♛'

In [91]:
unique_string.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode character '\xbc' in position 0: ordinal not in range(128)

Wow, ASCII really doesn't know how to encode these characters. Let's try UTF-8.

In [92]:
unique_string.encode('utf8')

b'\xc2\xbc \xd9\xa2 \xe2\x99\x9b'

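Okay, this apparently works, but what is this? It is a `bytes` object (note the `b` prefix): the sequence of bytes that represents `unique_string` in UTF-8, with each byte shown as a hexadecimal escape such as `\xc2`. You can think of it as roughly how our `unique_string` gets stored in memory or written to a file, which is more than enough for you to know at this point. Notice that different characters need different numbers of bytes: '¼' and '٢' take two each, '♛' takes three, and each space takes one.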
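The receiving side turns bytes back into a string with `decode`. A sketch of the rule "encode and decode with the same encoding": decoding with the matching codec restores the original string, while decoding the same bytes with a different codec either fails or produces garbled text. Latin-1 is used here only as an example of a mismatched codec.

```python
unique_string = '¼ ٢ ♛'
data = unique_string.encode('utf-8')

print(len(unique_string), len(data))  # 5 9 (5 characters, 9 bytes)

# matching codec: the round trip restores the original
print(data.decode('utf-8') == unique_string)  # True

# mismatched codec: Latin-1 maps every byte to some character,
# so there is no error, but the text is garbled
print(data.decode('latin-1'))

# ASCII cannot decode bytes above 127 at all
try:
    data.decode('ascii')
except UnicodeDecodeError as err:
    print('UnicodeDecodeError:', err)
```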

One more thing. Do you remember when we got the `min` and `max` values of a string? Well, this has to do with encodings too. Each character has its own code point, which is simply a number (in ASCII, a number between 0 and 127). Comparing characters means comparing these numbers, which is why you can compare them at all! In `foo`, the space has the smallest code point (32) and 'x' the largest (120).

In [93]:
'a' > 'b'

False

In [94]:
'b' > 'a'

True

Oh, but what about the values of capital letters?

In [95]:
'A' > 'a'

False

In [96]:
'B' > 'A'

True

In [97]:
foo

'This is a one sentence text.'

Python has a built-in function, `ord()`, that returns the numerical representation (the code point) of a character.

In [98]:
ord('a')

97

In [99]:
ord('b')

98

Or you can go the other way around with `chr()`.

In [100]:
chr(98)

'b'

In [101]:
chr(99)

'c'

Interesting, isn't it?

This way of interpreting characters gives us the ability to sort strings lexicographically.

In [102]:
'aaa' > 'aaa'

False

In [103]:
'aaa' > 'aab'

False

In [104]:
'aab' > 'aaa'

True

In [105]:
'aaaa' > 'aaa'

True
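To make the rule concrete, here is a hypothetical helper `lex_less` (not a built-in, written only for illustration) that mimics how Python compares two strings with `<`: it walks both strings in parallel, lets the first position where the code points differ decide, and, if one string is a prefix of the other, treats the shorter one as smaller.

```python
def lex_less(a, b):
    """Return True if a < b under lexicographic comparison of code points."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # the first differing character decides
            return ord(ch_a) < ord(ch_b)
    # all compared characters are equal: the shorter string is smaller
    return len(a) < len(b)

pairs = [('aaa', 'aaa'), ('aaa', 'aab'), ('aab', 'aaa'), ('aaa', 'aaaa'), ('A', 'a')]
for x, y in pairs:
    # the helper and Python's built-in comparison should agree
    print(repr(x), repr(y), lex_less(x, y), x < y)
```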

Let's sort the string and see the result.

In [106]:
bar

'The quick brown fox jumps over the lazy dog.'

In [107]:
sorted(bar)

[' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 ' ',
 '.',
 'T',
 'a',
 'b',
 'c',
 'd',
 'e',
 'e',
 'e',
 'f',
 'g',
 'h',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'o',
 'o',
 'o',
 'p',
 'q',
 'r',
 'r',
 's',
 't',
 'u',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

In [108]:
''.join(sorted(bar))

'        .Tabcdeeefghhijklmnoooopqrrstuuvwxyz'
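Notice that `'T'` ends up before all lowercase letters: `ord('T')` is 84, which is smaller than `ord('a')`, which is 97. When you want an alphabetical order that ignores case, you can pass a `key` function to `sorted`. In this sketch `str.casefold` is used, so every item is compared in its case-folded form while the items themselves are returned unchanged:

```python
words = ['banana', 'apple', 'Cherry']

# default: uppercase letters have smaller code points, so 'Cherry' comes first
print(sorted(words))                    # ['Cherry', 'apple', 'banana']

# compare case-folded versions, but return the original strings
print(sorted(words, key=str.casefold))  # ['apple', 'banana', 'Cherry']
```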




Now let's move on to something practical. Let's load the Zen of Python text file from the `_data` directory and print the result.

In [109]:
with open('_data/python_zen.txt', 'r', encoding='utf-8') as f:
    text_lines = f.readlines()

In [110]:
text_lines

['Beautiful is better than ugly.\n',
 'Explicit is better than implicit.\n',
 'Simple is better than complex.\n',
 'Complex is better than complicated.\n',
 'Flat is better than nested.\n',
 'Sparse is better than dense.\n',
 'Readability counts.\n',
 "Special cases aren't special enough to break the rules.\n",
 'Although practicality beats purity.\n',
 'Errors should never pass silently   .\n',
 'Unless explicitly silenced.\n',
 'In the face of ambiguity, refuse the temptation to guess.\n',
 'There should be one-- and preferably only one --obvious way to do it.\n',
 "Although that way may not be obvious at first unless you're Dutch.\n",
 'Now is better than never.\n',
 'Although never is often better than *right* now.\n',
 "If the implementation is hard to explain, it's a bad idea.\n",
 'If the implementation is easy to explain, it may be a good idea.\n',
 "Namespaces are one honking great idea -- let's do more of those!\n"]

Obviously, this is a list of strings, where each string is a line of text from the file. But what is `\n`? Remember the special symbols in ASCII? Well, `'\n'` is one of them. It is the **new line** symbol, and it tells the computer where to break the text into a new line.

In [111]:
print('This is one line\nThis is another one.')

This is one line
This is another one.


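There are several of these, but the most important ones are:
- **'\n'** - new line or line feed (LF)
- **'\r'** - carriage return (CR)
- **'\t'** - tab
- **'\b'** - backspace

**Important!** Line endings differ between operating systems: Windows traditionally uses CRLF (`'\r\n'`), while Unix-like systems such as Linux and macOS use LF (`'\n'`).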
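You usually do not have to handle this by hand: when a file is opened in text mode with the default `newline=None`, Python translates `'\r\n'` (and a lone `'\r'`) to `'\n'` while reading. A sketch that writes a small file with Windows-style line endings into a temporary directory (the file name is made up) and reads it back both ways:

```python
import os
import tempfile

# hypothetical file name inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'crlf_demo.txt')

# write raw bytes with Windows-style CRLF line endings
with open(path, 'wb') as f:
    f.write(b'first line\r\nsecond line\r\n')

# default text mode (newline=None): line endings are translated to '\n'
with open(path, 'r', encoding='utf-8') as f:
    translated = f.readlines()

# newline='': lines are still split, but the endings are returned untranslated
with open(path, 'r', encoding='utf-8', newline='') as f:
    untranslated = f.readlines()

print(translated)    # ['first line\n', 'second line\n']
print(untranslated)  # ['first line\r\n', 'second line\r\n']
```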

In [112]:
print('This is one text row\nThis is another one\nBut\twhat\tis\tthis?\nAnd what\bwhot is this?')

This is one text row
This is another one
But	what	is	this?
And whawhot is this?


So what do we do with these new lines at the end of each string? We strip them!

In [113]:
[l.strip() for l in text_lines]

['Beautiful is better than ugly.',
 'Explicit is better than implicit.',
 'Simple is better than complex.',
 'Complex is better than complicated.',
 'Flat is better than nested.',
 'Sparse is better than dense.',
 'Readability counts.',
 "Special cases aren't special enough to break the rules.",
 'Although practicality beats purity.',
 'Errors should never pass silently   .',
 'Unless explicitly silenced.',
 'In the face of ambiguity, refuse the temptation to guess.',
 'There should be one-- and preferably only one --obvious way to do it.',
 "Although that way may not be obvious at first unless you're Dutch.",
 'Now is better than never.',
 'Although never is often better than *right* now.',
 "If the implementation is hard to explain, it's a bad idea.",
 'If the implementation is easy to explain, it may be a good idea.',
 "Namespaces are one honking great idea -- let's do more of those!"]

Easy as that. Now you can work with each line, or you can join them together.

In [114]:
print(' '.join([l.strip() for l in text_lines]))

Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently   . Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those!


Or join them with an empty string, which keeps the original line breaks and gives a more readable preview.

In [115]:
print(''.join(text_lines))

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently   .
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
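One detail to keep in mind about the stripping step above: `strip()` removes *all* leading and trailing whitespace, not only the `'\n'`. That is fine for the Zen of Python, but if indentation at the start of a line matters, `rstrip('\n')` removes only the line break. `str.splitlines()` is another option that splits a whole text into lines without their line endings. A sketch on a made-up two-line text:

```python
raw_text = '    indented line\nplain line\n'

# keepends=True keeps the '\n' at the end of each line, similar to readlines()
lines_with_endings = raw_text.splitlines(keepends=True)

print([l.strip() for l in lines_with_endings])       # ['indented line', 'plain line']
print([l.rstrip('\n') for l in lines_with_endings])  # ['    indented line', 'plain line']
print(raw_text.splitlines())                         # ['    indented line', 'plain line']
```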



To wrap up this part, let's introduce one more way of defining strings. But first, let's recall the two approaches we have learned so far.

In [116]:
print('This is the test string')

This is the test string


In [117]:
print("This isn't the test string.")

This isn't the test string.


In [118]:
test_string = """
Well, this is the new kind of defining a string!
Isn't this awesome?

I can even quote with double quotes: "The quick brown fox jumps over the lazy dog."
"""

In [119]:
print(test_string)


Well, this is the new kind of defining a string!
Isn't this awesome?

I can even quote with double quotes: "The quick brown fox jumps over the lazy dog."



In [120]:
test_string.split('\n')

['',
 'Well, this is the new kind of defining a string!',
 "Isn't this awesome?",
 '',
 'I can even quote with double quotes: "The quick brown fox jumps over the lazy dog."',
 '']

In [121]:
test_string.strip().split('\n')

['Well, this is the new kind of defining a string!',
 "Isn't this awesome?",
 '',
 'I can even quote with double quotes: "The quick brown fox jumps over the lazy dog."']

In [148]:
test_string = "This is a test string." \
              "But this is also part of it"

In [149]:
print(test_string)

This is a test string.But this is also part of it


In [150]:
test_string = "This is a test string." \
              "\nBut this is also part of it"

In [151]:
print(test_string)

This is a test string.
But this is also part of it
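Two mechanisms are at work in the last two examples: the trailing backslash continues the statement on the next line, and Python joins adjacent string literals into a single string. That is why no space appears between the two parts unless you add one yourself. Wrapping the literals in parentheses gives the same result without the backslash and is a common, arguably more readable, alternative:

```python
test_string = (
    "This is a test string. "  # note the trailing space
    "But this is also part of it"
)
print(test_string)  # This is a test string. But this is also part of it
```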


#### String formatting

So far, we have defined string variables, played with string methods, and talked about reading text files from the file system.

What about the scenario where you have some intermediate results of your code that you want to print out or even save to a file? How do you include variable values in strings for displaying and saving?

Until now, we have used one simple way:

In [122]:
a = 5
b = 6

In [123]:
print('This is the result of summing a and b:', a+b)

This is the result of summing a and b: 11


What if the variable `a` holds a floating-point number?

In [124]:
a = 12.3954382934823

In [125]:
print('This is the result of summing a and b:', a+b)

This is the result of summing a and b: 18.3954382934823


Do we really need this many decimal places? And what if we want to present the output in a more readable form? For example:

In [126]:
print('Report:')
print('a =', a)
print('b =', b)
print('-'*20)
print('a + b =', a+b)
print('a * b =', a*b)
print('(a + b)/a =', (a+b)/a)
print('-'*20)

Report:
a = 12.3954382934823
b = 6
--------------------
a + b = 18.3954382934823
a * b = 74.3726297608938
(a + b)/a = 1.4840490394885744
--------------------


Of course, you could play with spaces until you get something that suits your needs. But Python offers a much easier way: f-strings (formatted string literals).

An f-string is written like a regular string, prefixed with the letter `f`.

In [127]:
print('Test string')

Test string


In [128]:
print(f'Test string')

Test string


Same thing, right? Well, f-strings give you more. Look:

In [129]:
print(f'This is the sum of a and b: {a+b}')

This is the sum of a and b: 18.3954382934823


Too many decimal places? Not a problem!

In [130]:
print(f'This is the sum of a and b: {a+b:.2f}')

This is the sum of a and b: 18.40


Informally speaking, the format for inserting a variable value into an f-string is:

`{variable_name:formatting}`

The **variable name** part can be a previously defined variable, an expression such as `a+b`, or the result of calling a function or a method.

The **formatting** part is optional, and what it can contain depends on the data type of the value. We said that there are four primitive data types: `int`, `float`, `str`, and `bool`.
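A few format specifications for each of these types, as a sketch (the variable names are made up for illustration; the specifiers `d`, `,`, `x`, `.2f`, `%` and `>` are part of Python's format specification mini-language):

```python
count = 1234567
ratio = 0.4567
name = 'foo'
flag = True

examples = [
    f'{count:d}',    # int as a decimal integer
    f'{count:,}',    # int with thousands separators
    f'{255:x}',      # int in hexadecimal
    f'{ratio:.2f}',  # float with 2 decimal places
    f'{ratio:.1%}',  # float as a percentage with 1 decimal place
    f'{name:>6}',    # str right-aligned in a field of width 6
    f'{flag}',       # bool printed as True/False
]
for e in examples:
    print(repr(e))
```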

In [131]:
type(a)

float

In [132]:
print(f'This is a method name: {str.isalnum}')

This is a method name: <method 'isalnum' of 'str' objects>


Here Python inserts the string representation of the method `isalnum` itself. But we can also call a method inside an f-string.

In [133]:
print(f'This is uppercase foo: {foo.upper()}')

This is uppercase foo: THIS IS A ONE SENTENCE TEXT.


The formatting adapts to the type of the value. The previous examples relied on the default conversion, but we can be explicit and tell the f-string how to format the value, e.g., with `s` for strings:

In [134]:
print(f'This is uppercase foo: {foo.upper():s}')

This is uppercase foo: THIS IS A ONE SENTENCE TEXT.


In [135]:
print(f'This is a:{a}')

This is a:12.3954382934823


In [136]:
print(f'This is a:{a+b}')

This is a:18.3954382934823


In [137]:
# 'n' (number): for floats, the same as 'g' (general format, 6 significant digits by default),
# except that it uses the current locale setting to insert the appropriate number separator characters.
print(f'This is a:{a:n}.') 

This is a:12.3954.


In [138]:
print(f'This is a:{a:.2f}.')

This is a:12.40.


In [139]:
print(f'This is a:{a:.2}.') # .2 with no type means 2 significant digits; here that switches to scientific notation

This is a:1.2e+01.


In [140]:
# Exponent notation. Prints the number in scientific notation using the letter ‘e’
# to indicate the exponent. The default precision is 6.
print(f'This is a:{a:e}.') # explicitly set scientific notation 

This is a:1.239544e+01.


This is all great. We can format and display numbers in a more appealing way. For prices, we surely do not need 10 decimal places. For scientific notation, we can use explicit formatting.

F-strings offer even more options. One of them is aligning a value within a field of a given width.

In [141]:
print(f"{'Report':+>30s}")

++++++++++++++++++++++++Report


In [142]:
print(f"{'Report':+<30s}")

Report++++++++++++++++++++++++
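The specification after the colon reads as *fill character*, *alignment* (`<` left, `>` right, `^` center) and *width*. The width does not have to be a literal: it can itself be a replacement field, which helps when the layout width is computed at runtime. A sketch (the variable `width` is just an example):

```python
width = 20
title = 'Report'

print(f'{title:*^{width}}')  # *******Report*******
print(f'{title:<{width}}|')  # 'Report' padded with spaces up to 20 characters, then '|'
print(f'{42:0>6}')           # 000042 (fill with zeros, right-aligned)
```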


In [143]:
print(f"{'Report':-^50s}")
print(f"We have two variables:")
print(f"a = {a}")
print(f"b = {b}")
print('-'*50)
print(f"Here are some of the calculations:")
print(f"a + b = {a+b}")
print(f"a * b = {a*b}")
print(f"(a + b) / 2 = {(a+b)/2}")
print('-'*50)

----------------------Report----------------------
We have two variables:
a = 12.3954382934823
b = 6
--------------------------------------------------
Here are some of the calculations:
a + b = 18.3954382934823
a * b = 74.3726297608938
(a + b) / 2 = 9.19771914674115
--------------------------------------------------


So we can position our strings inside the result of the f-string. Cool! We also saw that we can format floats and drop unnecessary decimal places. Let's do that!

In [144]:
print(f"{'Report':-^50s}")
print(f"We have two variables:")
print(f"a = {a:.2f}")
print(f"b = {b:.2f}")
print('-'*50)
print(f"Here are some of the calculations:")
print(f"a + b = {a+b:.2f}")
print(f"a * b = {a*b:.2f}")
print(f"(a + b) / 2 = {(a+b)/2:.2f}")
print('-'*50)

----------------------Report----------------------
We have two variables:
a = 12.40
b = 6.00
--------------------------------------------------
Here are some of the calculations:
a + b = 18.40
a * b = 74.37
(a + b) / 2 = 9.20
--------------------------------------------------


Cool. Now let's use the alignment options to line up the values in the f-strings.

In [145]:
print(f"{'Report':-^50s}")
print(f"We have two variables:")
print(f"a = {a:>46.2f}")
print(f"b = {b:>46.2f}")
print('-'*50)
print(f"Here are some of the calculations:")
print(f"a + b = {a+b:>42.2f}")
print(f"a * b = {a*b:>42.2f}")
print(f"(a + b) / 2 = {(a+b)/2:>36.2f}")
print('-'*50)

----------------------Report----------------------
We have two variables:
a =                                          12.40
b =                                           6.00
--------------------------------------------------
Here are some of the calculations:
a + b =                                      18.40
a * b =                                      74.37
(a + b) / 2 =                                 9.20
--------------------------------------------------


But there is an even better way to achieve all of this. Remember `"""`? F-strings can be defined that way too!

In [146]:
report_string = f"""
{'Report':-^50s}
a = {a:>46.2f}
b = {b:>46.2f}
{'-'*50}
Here are some of the calculations:
a + b = {a+b:>42.2f}
a * b = {a*b:>42.2f}
(a + b) / 2 = {(a+b)/2:>36.2f}
"""

In [147]:
print(report_string)


----------------------Report----------------------
a =                                          12.40
b =                                           6.00
--------------------------------------------------
Here are some of the calculations:
a + b =                                      18.40
a * b =                                      74.37
(a + b) / 2 =                                 9.20
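In the report above, the field widths (46, 42, 36) were worked out by hand so that every line is 50 characters wide. A hypothetical helper `report_line` (not part of the original material) can compute the width from the label instead, so the lines stay aligned if a label changes:

```python
a = 12.3954382934823
b = 6

def report_line(label, value, total_width=50):
    # the number gets whatever space is left after the label
    value_width = total_width - len(label)
    return f'{label}{value:>{value_width}.2f}'

print(f"{'Report':-^50s}")
print(report_line('a = ', a))
print(report_line('b = ', b))
print(report_line('a + b = ', a + b))
print(report_line('(a + b) / 2 = ', (a + b) / 2))
```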



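Now, let's look at the helper function from the beginning of the session. We said you would be able to understand it by the end.

```python
def scheme_string(s):
    print('')
    print('String:', s)
    print('')
    print('Scheme:')
    # positive indices 0 .. len(s) - 1, each right-aligned in a field of width 3 (fill: space)
    print('|'.join(f'{x: >3}' for x in range(len(s))))
    # the characters of s, aligned the same way
    print('|'.join([f'{x: >3}' for x in s]))
    # negative indices -len(s) .. -1
    print('|'.join(f'{-x: >3}' for x in range(len(s), 0, -1)))
    print()
    print('Length:', len(s))
```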

### Readings and Videos
- [Bill Lubanovic, Introducing Python, 2nd Edition](https://www.oreilly.com/library/view/introducing-python-2nd/9781492051374/), Chapter 5.
- [freeCodeCamp.org Intermediate Python Programming Course](https://www.youtube.com/watch?v=HGOBQPFzWKo), Section 5 (Strings)

### A highly recommended To Do
- Watch brief intro videos about ASCII and Unicode: [Understanding ASCII and Unicode (GCSE)](https://youtu.be/5aJKKgSEUnY), [What Is Unicode? And Why Do I Need To Use Unicode?](https://youtu.be/EGtcgMlyBhU), and [Unicode and character encoding](https://youtu.be/JwWoVQXQ24k)
- Read about f-strings [Python 3's f-Strings: An Improved String Formatting Syntax (Guide)](https://realpython.com/python-f-strings/)

<hr>

DataKolektiv, 2022/23.

[hello@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com)

![](../img/DK_Logo_100.png)

<font size=1>License: [GPLv3](https://www.gnu.org/licenses/gpl-3.0.txt) This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.</font>