# Lesson 36: String Operations II - Formatting

**WARNING:** The reference notebook is meant **ONLY** for a teacher. Please **DO NOT** share it with any student. The contents of the reference notebook are meant only to prepare a teacher for a class. To conduct the class, use the class copy of the reference notebook.

|Particulars|Description|
|-|-|
|**Topic**|String Operations II - Formatting|
|||
|**Class Description**|In this class, a student will learn different ways to format a string in detail|
|||
|**Class**|C36|
|||
|**Class Time**|45 minutes|
|||
|**Goals**|Format a string in different ways using the `format()` function and the `f` literal|
||Compare the formatting speeds of the `format()` function and the `f` literal|
|||
|**Teacher Resources**|Google Account|
||Link to Lesson 36 Colab reference notebook|
||Laptop with internet connectivity|
||Earphones with mic|
|||
|**Student Resources**|Google Account|
||Laptop with internet connectivity|
||Earphones with mic|

---

### Teacher-Student Activities

In this class, we learn how to format a string using the `format()` function and the `f-string` technique in detail.

Let's continue the class from the `format()` function section.

---

#### Recap

In the previous class, we learnt how to format a string using the `format()` function. Here are the examples:

```
Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that DERARA HURISA                    
BIB No. 19
officially completed the marathon
(distance 42.2 km)
in the men's category
with a finish time of 2:08:09
and rank 1.                   
--------------------------------------------------
```

Here's the link to the CSV file containing data for each runner.

https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/tmm-2020-race-results/TMM2020_RaceResults.csv


In [None]:
# Placeholders without labels: Print the certificates for each TMM 2020 runner in the prescribed format.
import pandas as pd

df = pd.read_csv('https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/tmm-2020-race-results/TMM2020_RaceResults.csv')

certificate = """
  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that {}
BIB No. {}
officially completed the marathon
(distance 42.2 km)
in the men's category
with a finish time of {}
and rank {}.
{}
"""
for i in df.index: # The 'index' attribute holds the labels of all the rows in a DataFrame.
  print(certificate.format(df.loc[i, 'NAME'], df.loc[i, 'BIB'], df.loc[i, 'Net Finish Time'], df.loc[i, 'Finish Position'], '-' * 50))


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that DERARA HURISA                    
BIB No. 19
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:09 
and rank 1.                   
--------------------------------------------------


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that AYELE ABSHERO                    
BIB No. 2
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:20 
and rank 2.                   
--------------------------------------------------


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that BIRHANU TESHOME                    
BIB No. 8
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:26 
and rank 3.                   
--------------------------------------------------


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This 


**Labelling Placeholders**

Label the placeholders so that each label describes the value that goes into it.

In [None]:
# Placeholders with labels: Print the certificates for each TMM 2020 runner in the prescribed format.
certificate = """
  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that {name}
BIB No. {bib}
officially completed the marathon
(distance 42.2 km)
in the men's category
with a finish time of {finish_time}
and rank {rank}.
{hyphens}
"""
for i in df.index: # The 'index' attribute holds the labels of all the rows in a DataFrame.
  print(certificate.format(name=df.loc[i, 'NAME'],
                           bib=df.loc[i, 'BIB'],
                           finish_time=df.loc[i, 'Net Finish Time'],
                           rank=df.loc[i, 'Finish Position'],
                           hyphens='-' * 50))


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that DERARA HURISA                    
BIB No. 19
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:09 
and rank 1.                   
--------------------------------------------------


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that AYELE ABSHERO                    
BIB No. 2
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:20 
and rank 2.                   
--------------------------------------------------


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that BIRHANU TESHOME                    
BIB No. 8
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:26 
and rank 3.                   
--------------------------------------------------


  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This 

**Syntax:** `string.format(placeholder1 = value1, placeholder2 = value2, ..., placeholderN = valueN)` where `N` is the number of values to assign or the number of placeholders to fill.
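
The syntax above can be tried without the CSV file. The snippet below is a minimal sketch: the `template` string is a hypothetical one-line stand-in for the certificate, and the values are copied from the winner's certificate shown earlier. It illustrates that labelled values can be passed in any order.

```python
# Hypothetical one-line template, used only to illustrate labelled placeholders.
template = "Runner {name} (BIB No. {bib}) finished at rank {rank}."

# Keyword arguments are matched to the placeholders by label, not by position.
line = template.format(rank=1, name="DERARA HURISA", bib=19)
print(line)  # Runner DERARA HURISA (BIB No. 19) finished at rank 1.
```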


----

#### Activity 1: The `format()` Function

We can do a lot more with the `format()` function. E.g., we can retrieve values from a dictionary more crisply and plug them into a string. Let's learn this part by creating the certificate for the winner of TMM 2020 only.




In [None]:
# Teacher Action: Print all the columns of the 'df' DataFrame.
df.columns

Index(['Finish Position', 'BIB', 'NAME', 'NATIONALITY', 'Net Finish Time'], dtype='object')

In [None]:
# Teacher Action: Create a dictionary for the TMM 2020 winner and then generate a certificate for him.
winner_dict = dict(df.loc[0, ['NAME', 'BIB', 'Net Finish Time', 'Finish Position']])
print(winner_dict)
winner_certi = """
  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that {0[NAME]}
BIB No. {1[BIB]}
officially completed the marathon
(distance 42.2 km)
in the men's category
with a finish time of {2[Net Finish Time]}
and rank {3[Finish Position]}.
{4}
"""

print(winner_certi.format(winner_dict, winner_dict, winner_dict, winner_dict, '-' * 50))

{'NAME': 'DERARA HURISA', 'BIB': 19, 'Net Finish Time': '2:08:09', 'Finish Position': 1}

  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that DERARA HURISA                    
BIB No. 19
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:09 
and rank 1.                   
--------------------------------------------------



The values passed to the `format()` function are stored in order, just like the items of a list or a tuple. It means each value occupies a unique index inside the function. So, the first value will occupy `index = 0`, the second value will occupy `index = 1` and so on.

Here, we have passed the `winner_dict` variable to the `format()` function four times. Each copy acts as a separate item inside the `format()` function. So, `winner_dict` occupies four indices. They are `0, 1, 2` and `3`. Additionally, we have passed each of these indices inside the placeholders along with the keys of the dictionary stored in the `winner_dict` variable. Therefore:

- In the first placeholder, `0[NAME]` becomes `winner_dict['NAME']` because `winner_dict` is present at `index = 0` inside the `format()` function.

- In the second placeholder, `1[BIB]` becomes `winner_dict['BIB']` because `winner_dict` is present at `index = 1` as well inside the `format()` function.

- In the third placeholder, `2[Net Finish Time]` becomes `winner_dict['Net Finish Time']` because `winner_dict` is present at `index = 2` as well inside the `format()` function.

- In the fourth placeholder, `3[Finish Position]` becomes `winner_dict['Finish Position']` because `winner_dict` is present at `index = 3` as well inside the `format()` function.

- In the fifth placeholder, `{4}` becomes `'-' * 50` because it is present at `index = 4` inside the `format()` function.
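
The index-and-key lookup described in the list above can be checked on a small scale. This is a sketch only: the `runner` dictionary is a hypothetical, shortened version of `winner_dict`, with values copied from the printed output, and the row of hyphens is shortened to five characters to keep the result readable.

```python
# Shortened stand-in for 'winner_dict'; values copied from the output above.
runner = {'NAME': 'DERARA HURISA', 'BIB': 19}

# 'runner' sits at positions 0 and 1, so {0[NAME]} and {1[BIB]} both read from it.
# {2} reads the third positional argument directly.
text = "{0[NAME]} | {1[BIB]} | {2}".format(runner, runner, '-' * 5)
print(text)  # DERARA HURISA | 19 | -----
```

Note that the dictionary keys are written without quotes inside the placeholders.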

There is an even neater way of formatting the string stored in the `winner_certi` variable. Pass the `winner_dict` dictionary only once to the `format()` function and insert `0` inside the first four placeholders along with the keys of the `winner_dict` dictionary.

In [None]:
# Student Action: Pass the 'winner_dict' dictionary only once to the 'format()' function.
# Also, insert 0 inside the first four placeholders along with the keys of the 'winner_dict' dictionary.
winner_dict = dict(df.loc[0, ['NAME', 'BIB', 'Net Finish Time', 'Finish Position']])
print(winner_dict)
winner_certi = """
  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that {0[NAME]}
BIB No. {0[BIB]}
officially completed the marathon
(distance 42.2 km)
in the men's category
with a finish time of {0[Net Finish Time]}
and rank {0[Finish Position]}.
{1}
"""

print(winner_certi.format(winner_dict, '-' * 50))

{'NAME': 'DERARA HURISA', 'BIB': 19, 'Net Finish Time': '2:08:09', 'Finish Position': 1}

  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that DERARA HURISA                    
BIB No. 19
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:09 
and rank 1.                   
--------------------------------------------------



In the above code:

- In the first placeholder, `0[NAME]` becomes `winner_dict['NAME']` because `winner_dict` is present at `index = 0` inside the `format()` function.

- In the second placeholder, `0[BIB]` becomes `winner_dict['BIB']` because `winner_dict` is present at `index = 0` inside the `format()` function.

- In the third placeholder, `0[Net Finish Time]` becomes `winner_dict['Net Finish Time']` because `winner_dict` is present at `index = 0` inside the `format()` function.

- In the fourth placeholder, `0[Finish Position]` becomes `winner_dict['Finish Position']` because `winner_dict` is present at `index = 0` inside the `format()` function.

- In the fifth placeholder, `{1}` becomes `'-' * 50` because it is present at `index = 1` inside the `format()` function.

Similarly, you can retrieve values from a list or tuple and substitute them into a string by replacing the keys of a dictionary inside the placeholders with the indices of the items in a tuple or a list. However, you cannot use negative indices while working with placeholders.
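
The restriction on negative indices can be seen in a short experiment. The sketch below assumes a tuple holding the winner's row from the dataset. A negative index inside a placeholder is treated as text rather than as an integer, so the lookup raises a `TypeError`.

```python
# The winner's row from the dataset, written out as a tuple.
winner_tuple = (1, 19, 'DERARA HURISA', 'ETH', '2:08:09')

# A non-negative index inside a placeholder works as expected.
last_by_position = "{0[4]}".format(winner_tuple)
print(last_by_position)  # 2:08:09

# A negative index is not converted to an integer, so the lookup fails.
try:
  "{0[-1]}".format(winner_tuple)
  negative_index_failed = False
except TypeError as error:
  negative_index_failed = True
  print("TypeError:", error)
```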

Repeat the above exercise by creating a tuple for the winner of TMM 2020 and then generate the certificate for him.

In [None]:
# Student Action: Repeat the above exercise by creating a tuple for the TMM 2020 winner and then generate the certificate for him.
winner_tuple = tuple(df.loc[0, :])
print(winner_tuple)
winner_certi = """
  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that {0[2]}
BIB No. {0[1]}
officially completed the marathon
(distance 42.2 km)
in the men's category
with a finish time of {0[4]}
and rank {0[0]}.
{1}
"""

print(winner_certi.format(winner_tuple, '-' * 50))

(1, 19, 'DERARA HURISA', 'ETH', '2:08:09')

  Tata Mumbai Marathon 2020

      CONGRATULATIONS

This is to certify that DERARA HURISA                    
BIB No. 19
officially completed the marathon 
(distance 42.2 km) 
in the men's category 
with a finish time of 2:08:09 
and rank 1.  
--------------------------------------------------                 



In the above code:

- In the first placeholder, `0[2]` becomes `winner_tuple[2]` because `winner_tuple` is present at `index = 0` inside the `format()` function. The name of the runner exists at `index = 2` in the `winner_tuple`.

- In the second placeholder, `0[1]` becomes `winner_tuple[1]` because `winner_tuple` is present at `index = 0` inside the `format()` function. The BIB number of the runner exists at `index = 1` in the `winner_tuple`.

- In the third placeholder, `0[4]` becomes `winner_tuple[4]` because `winner_tuple` is present at `index = 0` inside the `format()` function. The finish time of the runner exists at `index = 4` in the `winner_tuple`.

- In the fourth placeholder, `0[0]` becomes `winner_tuple[0]` because `winner_tuple` is present at `index = 0` inside the `format()` function. The finish position of the runner exists at `index = 0` in the `winner_tuple`.

- In the fifth placeholder, `{1}` becomes `'-' * 50` because it is present at `index = 1` inside the `format()` function.

---

#### Activity 2: Zero Padding

You can use the `format()` function to pad a number with leading zeros so that it is printed with at least $n$ digits. E.g., you can print 7 as 007. To do this, you have to put the `:0n` flag inside a placeholder where `n` is the minimum number of digits to be displayed in the final number.

Let's learn this concept with the help of an example.

In [None]:
# Student Action: Print the natural numbers between 5 and 15 as three-digit numbers.
for i in range(5, 16):
  print("{:03}".format(i))

005
006
007
008
009
010
011
012
013
014
015
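
As a small additional sketch (the numbers 42 and 1234 are arbitrary), the snippet below shows that the number in the `:0n` flag is a minimum width: shorter numbers are padded with zeros, while longer numbers are printed in full.

```python
# The number after '0' is a minimum width, not a fixed width.
padded = "{:05}".format(42)       # shorter numbers are padded with zeros
unchanged = "{:03}".format(1234)  # longer numbers are printed in full
print(padded)     # 00042
print(unchanged)  # 1234
```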


---

#### Activity 3: Digits After Decimal

Similarly, you can also define the number of digits to be printed after the decimal point using the `format()` function. E.g., the value of $\pi$, i.e., 3.14159, correct up to five decimal places can be reported as 3.14, i.e., correct up to two decimal places.

To do this, you have to put the `:.nf` flag inside a placeholder where `n` is the number of digits to be displayed after the decimal point.

Consider the tuple below.

`india_gdp_2011_2018 = (5.2413, 5.4564, 6.3861, 7.4102, 7.9963, 8.1695, 7.1679, 6.8114)`

Let's print the numbers contained in the `india_gdp_2011_2018` tuple. Also, display only two digits after the decimal point.

In [None]:
# Student Action: Print the numbers contained in the 'india_gdp_2011_2018' tuple. Also, display only two digits after the decimal point.
india_gdp_2011_2018 = (5.2413, 5.4564, 6.3861, 7.4102, 7.9963, 8.1695, 7.1679, 6.8114)
for i in range(len(india_gdp_2011_2018)):
  print("{:.2f}".format(india_gdp_2011_2018[i]))

5.24
5.46
6.39
7.41
8.00
8.17
7.17
6.81
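
The value 7.9963 is printed as 8.00 above. As a brief sketch of why the `:.nf` flag is useful for display, the snippet below compares it with Python's built-in `round()` function: `round()` returns a number, so trailing zeros are dropped, whereas `format()` returns a string with exactly two digits after the decimal point.

```python
value = 7.9963

rounded_number = round(value, 2)         # a float: trailing zeros are not kept
formatted_text = "{:.2f}".format(value)  # a string: always two digits after the point

print(rounded_number)  # 8.0
print(formatted_text)  # 8.00
```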


---

#### Activity 4: Commas After Digits

You can put commas in a large number using the `format()` function. E.g., the number 2384759 can be printed as 2,384,759 using the `format()` function. This is a very helpful feature to read an astronomically large number.

To do this, you need to put the `:,` flag inside a placeholder.

In [None]:
# Student Action: Create a tuple containing five random integers; each integer having 11 digits. Print the integers with commas included in them.
import random
rand_ints = tuple(random.randint(int(2.71e10), int(3.14e10)) for i in range(5)) # 'randint()' expects integer bounds.
print(rand_ints, "\n")
for i in range(len(rand_ints)):
  print("Number at index = {} is {:,}".format(i, rand_ints[i]))

(28852230363, 29457041501, 28165615643, 28492906021, 28738573546) 

Number at index = 0 is 28,852,230,363
Number at index = 1 is 29,457,041,501
Number at index = 2 is 28,165,615,643
Number at index = 3 is 28,492,906,021
Number at index = 4 is 28,738,573,546


In the above case, you can also print digits after the decimal point along with the commas.

In [None]:
# Student Action: Create a tuple containing five random integers; each integer having 11 digits. Print the integers with commas included in them.
# Also, print the numbers with three digits after the decimal point.
import random
rand_ints = tuple(random.randint(int(2.71e10), int(3.14e10)) for i in range(5)) # 'randint()' expects integer bounds.
print(rand_ints, "\n")
for i in range(len(rand_ints)):
  print("Number at index = {} is {:,.3f}".format(i, rand_ints[i]))

(27270572965, 29252474033, 31074523595, 29992136886, 30047428819) 

Number at index = 0 is 27,270,572,965.000
Number at index = 1 is 29,252,474,033.000
Number at index = 2 is 31,074,523,595.000
Number at index = 3 is 29,992,136,886.000
Number at index = 4 is 30,047,428,819.000


As you can see, the numbers contain both the comma and digits after the decimal point.
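
Because the cells above use random numbers, their output changes on every run. The following sketch repeats the same two flags on the fixed number 2384759 mentioned at the start of this activity, so the result can be checked by hand.

```python
number = 2384759

with_commas = "{:,}".format(number)
with_commas_and_decimals = "{:,.2f}".format(number)
print(with_commas)               # 2,384,759
print(with_commas_and_decimals)  # 2,384,759.00
```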

---

#### Activity 5: The `f-strings`^^

An `f-string` is a cleaner and faster way of formatting a string as compared to the `format()` function. The concepts of zero padding, digits after the decimal point, commas after digits etc. also apply to formatting through the `f` literal in exactly the same way. However, `f-strings` are available only in the newer versions of Python, specifically, version 3.6 and above.

The introduction of `f-strings` does not necessarily mean that the `format()` function has become irrelevant. Many Python programmers still use the `format()` function to make their code compatible with the older versions of Python. It can also be more convenient while working with multiline strings.

Before we learn formatting through the literal `f`, let's check the Python version installed on Colab notebooks by Google. You can do this by writing the `!python --version` bash command.

**Note:**

1. Every notebook has the same version of Python installed.

2. If you are working on a terminal window of a Linux/Unix based machine, then you don't have to use the exclamation mark in the bash command, i.e., the command becomes `python --version`.

In [None]:
# Student Action: Check the version of Python installed on Colab notebooks.
!python --version

Python 3.6.9


The literal `f` works much like the `format()` function. You simply have to write the final version of a string containing placeholders and prefix it with `f`. The placeholders need to be labelled with the names of the variables that contain the values to be placed in the respective placeholders.

Let's write code to get the following output.

```
The annual GDP of India in 2011 was 5.2413%.
The annual GDP of India in 2012 was 5.4564%.
The annual GDP of India in 2013 was 6.3861%.
The annual GDP of India in 2014 was 7.4102%.
The annual GDP of India in 2015 was 7.9963%.
The annual GDP of India in 2016 was 8.1695%.
The annual GDP of India in 2017 was 7.1679%.
The annual GDP of India in 2018 was 6.8114%.
```

Use the literal `f` to format a string.

In [None]:
# Student Action: Write a code to get the above output. Strictly use the literal 'f' for string formatting.
import numpy as np

np_years = np.arange(2011, 2019) # You can also create a tuple or a list.
india_gdp_2011_2018 = (5.2413, 5.4564, 6.3861, 7.4102, 7.9963, 8.1695, 7.1679, 6.8114)
for year, gdp in tuple(zip(np_years, india_gdp_2011_2018)):
  print(f'The annual GDP of India in {year} was {gdp}%.')

The annual GDP of India in 2011 was 5.2413%.
The annual GDP of India in 2012 was 5.4564%.
The annual GDP of India in 2013 was 6.3861%.
The annual GDP of India in 2014 was 7.4102%.
The annual GDP of India in 2015 was 7.9963%.
The annual GDP of India in 2016 was 8.1695%.
The annual GDP of India in 2017 was 7.1679%.
The annual GDP of India in 2018 was 6.8114%.


In the above code:

- The `zip()` function joins the NumPy array containing the year values with the corresponding GDP values.

- The `tuple()` function creates a tuple containing a collection of tuples of year and GDP values.

- The `for` loop iterates through each item of the tuple.

- The literal `f` formats the string in the desired format. The skeleton of the string is

  `'The annual GDP of India in {year} was {gdp}%.'`

  The placeholders in the above string are labelled as `year` and `gdp` which one-by-one store the year and GDP values for each item contained in the tuple.

**The `f-string` syntax:** The syntax to format a string using the `f` literal is

`f"Some string having placeholders {variable1} {variable2} ... {variableN}"`

where `variable1, variable2 ... variableN` are the variables storing the values that need to go to their corresponding labelled placeholders.
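
The flags from Activities 2 to 4 can be placed after the variable name inside an `f-string` placeholder. The snippet below is a minimal sketch; the variable names and the `population` figure are hypothetical and used only for illustration.

```python
count = 7
growth = 7.9963
population = 2384759  # hypothetical figure, used only for illustration

padded = f"{count:03}"           # zero padding
two_decimals = f"{growth:.2f}"   # digits after the decimal point
with_commas = f"{population:,}"  # commas after digits
print(padded, two_decimals, with_commas)  # 007 8.00 2,384,759
```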



---

#### Activity 6: Mathematical Operations Inside Placeholders^^^

You can also perform mathematical operations inside the placeholders.

Let's understand this concept with the help of an example. Let's calculate the year-on-year change in the GDP figures of India between 2011 and 2018. Here's the expected output:

```
The increase in the GDP from 2011 to 2012 is 0.22 pp.
The increase in the GDP from 2012 to 2013 is 0.93 pp.
The increase in the GDP from 2013 to 2014 is 1.02 pp.
The increase in the GDP from 2014 to 2015 is 0.59 pp.
The increase in the GDP from 2015 to 2016 is 0.17 pp.
The decrease in the GDP from 2016 to 2017 is 1.00 pp.
The decrease in the GDP from 2017 to 2018 is 0.36 pp.
```

The abbreviation `pp` stands for percentage points, i.e., the difference between two percentages. Use the `f-string` formatting technique to get the above output.

In [None]:
# Student Action: Write a code to solve the above problem statement. Strictly use the 'f-string' formatting technique.
for i in range(len(india_gdp_2011_2018) - 1):
  for j in range(i + 1, i + 2):
    if india_gdp_2011_2018[j] - india_gdp_2011_2018[i] > 0:
      print(f"The increase in the GDP from {np_years[i]} to {np_years[j]} is {(india_gdp_2011_2018[j] - india_gdp_2011_2018[i]):.2f} pp.")
    elif india_gdp_2011_2018[j] - india_gdp_2011_2018[i] < 0:
      print(f"The decrease in the GDP from {np_years[i]} to {np_years[j]} is {-(india_gdp_2011_2018[j] - india_gdp_2011_2018[i]):.2f} pp.")
    else:
      print(f"The change in the GDP from {np_years[i]} to {np_years[j]} is {(india_gdp_2011_2018[j] - india_gdp_2011_2018[i]):.2f} pp.")

The increase in the GDP from 2011 to 2012 is 0.22 pp.
The increase in the GDP from 2012 to 2013 is 0.93 pp.
The increase in the GDP from 2013 to 2014 is 1.02 pp.
The increase in the GDP from 2014 to 2015 is 0.59 pp.
The increase in the GDP from 2015 to 2016 is 0.17 pp.
The decrease in the GDP from 2016 to 2017 is 1.00 pp.
The decrease in the GDP from 2017 to 2018 is 0.36 pp.


In the above code:

- The nested `for` loop iterates through each value stored in the `india_gdp_2011_2018` tuple and `np_years` array such that only two consecutive values are taken at a time.

- In the last placeholder, the difference between two consecutive GDP values is calculated. At the same time, the difference is formatted to display two digits after the decimal point.
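
One possible alternative to the nested loop is sketched below. It pairs consecutive items with `zip()` and uses a conditional expression and the `abs()` function inside the placeholders. It assumes that no two consecutive GDP values are equal, which holds for this data; the names `pairs`, `change`, `direction` and `lines` are introduced only for this illustration.

```python
years = range(2011, 2019)
india_gdp_2011_2018 = (5.2413, 5.4564, 6.3861, 7.4102, 7.9963, 8.1695, 7.1679, 6.8114)

lines = []
# Pair each (year, gdp) item with the next one: 2011 with 2012, 2012 with 2013, ...
pairs = list(zip(years, india_gdp_2011_2018))
for (year1, gdp1), (year2, gdp2) in zip(pairs, pairs[1:]):
  change = gdp2 - gdp1
  # Assumption: 'change' is never exactly zero for this data.
  direction = "increase" if change > 0 else "decrease"
  lines.append(f"The {direction} in the GDP from {year1} to {year2} is {abs(change):.2f} pp.")

print("\n".join(lines))
```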

---

#### Activity 7: Performance^

Formatting through the `f` literal is generally faster than formatting through the `format()` function.

Let's calculate the time taken to format the same string 100,000 times using the `f` literal as well as the `format()` function individually.

In [None]:
# Student Action: Run the code below to compare the formatting speeds of the same string using the 'f' literal & the 'format()' function.
import timeit

s1 = """
name='Bruce Wayne'
age=28
f'My name is {name}. I am {age} years old.'
"""

s2 = """
name='Bruce Wayne'
age=28
'My name is {}. I am {} years old.'.format(name, age)
"""

time1 = timeit.timeit(stmt=s1, number=100000)
time2 = timeit.timeit(stmt=s2, number=100000)

print(f"Time taken to format a string using the 'f' literal is {time1 :.4f} seconds.")
print(f"Time taken to format a string using the 'format()' function is {time2 :.4f} seconds.")
print(f"\nThe f-strings are {(time2 / time1):.2f} times faster than the 'format()' function.") # This ratio is expected to be greater than 1, though exact timings vary between runs.

Time taken to format a string using the 'f' literal is 0.0292 seconds.
Time taken to format a string using the 'format()' function is 0.0452 seconds.

The f-strings are 1.55 times faster than the 'format()' function.
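
A single timing can be affected by whatever else the machine is doing. The sketch below is one possible refinement, not the only way to measure: it moves the variable assignments into the `setup` argument so that only the formatting step is timed, and uses `timeit.repeat()` to keep the fastest of five runs. The exact timings will differ from machine to machine.

```python
import timeit

# The variables are created once in 'setup' and are not part of the timed statement.
setup = "name = 'Bruce Wayne'; age = 28"
f_stmt = "f'My name is {name}. I am {age} years old.'"
format_stmt = "'My name is {}. I am {} years old.'.format(name, age)"

# Repeat each measurement five times and keep the fastest run to reduce noise.
f_time = min(timeit.repeat(stmt=f_stmt, setup=setup, number=100000, repeat=5))
format_time = min(timeit.repeat(stmt=format_stmt, setup=setup, number=100000, repeat=5))

print(f"f-string: {f_time:.4f} s, format(): {format_time:.4f} s")
```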


Let's pause here. Up next, we have the Capstone class. That means it is time to put your coding skills to the test! In the upcoming class, we will dive deeper into string operations that are critical to analysing textual data.

Please ask your parents to join the class.


---

### Activities

**Teacher Activities**

1. String Operations II - Formatting (Class Copy)

  Link on Panel
    
2. String Operations II - Formatting (Reference)

   Link on Panel

---