### What is string formatting?

String formatting, also called string interpolation, is the process of inserting a custom string or variable into a predefined text. As a data scientist, you will use it to insert a title into a graph, show an error message, or pass a statement to a function.

### Methods for formatting

Modern versions of Python have three main approaches to string formatting:

- positional formatting with the .format() method,


- formatted string literals (f-strings), and


- template strings.
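To compare the three approaches before looking at each one, here is a minimal sketch that builds the same sentence with all of them. The variables field and tool and their values are illustrative only.

```python
from string import Template

field = "data science"
tool = "pandas"

# 1. Positional formatting: placeholders filled by str.format()
positional = "{} is widely used in {}".format(tool, field)

# 2. Formatted string literal (f-string): expressions written inside the braces
literal = f"{tool} is widely used in {field}"

# 3. Template string: $identifiers filled by Template.substitute()
templated = Template("$tool is widely used in $field").substitute(tool=tool, field=field)

print(positional)
print(literal)
print(templated)
```

All three print the same text; the following sections cover each approach in detail.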


### Positional formatting

Positional formatting works as follows. We put placeholders, defined by pairs of curly braces, in a text. We call the string's .format() method and pass it the desired values. The method replaces the placeholders with the values in order of appearance.

<img src="pf.jpg" style="max-width:600px">

Let's examine the example. We define a string containing two placeholders and pass two strings to the method, which inserts them in order to produce the output below. We can also use variables for both the string and the values passed to the method.

In [1]:
print("Machine learning provides {} the ability to learn {}".format("systems", "automatically"))

Machine learning provides systems the ability to learn automatically


We can also keep the string and the values in variables: we define a string with placeholders along with variables holding the values, then apply the format method to the string, passing those variables.

The method reads the string, replaces the placeholders with the given values, and returns the resulting string.
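As a small illustration of that behavior, the sketch below reuses one template with different values. It relies only on the built-in str.format(); the template text is borrowed from the next example.

```python
template = "{} rely on {} datasets"

# .format() returns a new string; the template itself is left unchanged
supervised = template.format("Supervised algorithms", "labeled")
unsupervised = template.format("Unsupervised algorithms", "unlabeled")

print(supervised)    # Supervised algorithms rely on labeled datasets
print(unsupervised)  # Unsupervised algorithms rely on unlabeled datasets
print(template)      # {} rely on {} datasets

# Passing fewer values than there are placeholders raises an IndexError
try:
    template.format("Supervised algorithms")
except IndexError as err:
    print("IndexError:", err)
```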

### Reordering values

We can add index numbers inside the curly braces to control which value replaces each placeholder. In the next example, we leave the placeholders empty, so the method replaces them with the values in the order given.

In [3]:
my_string = "{} rely on {} datasets"
method = "Supervised algorithms"
condition = "labeled"

print(my_string.format(method, condition))

Supervised algorithms rely on labeled datasets


If we add index numbers, the replacement order changes accordingly, as you can observe by comparing the two outputs below: {2} takes the third argument, {0} the first and {1} the second.

In [4]:
print("{} has a friend called {} and a sister called {}".format("Betty", "Linda", "Daisy"))

Betty has a friend called Linda and a sister called Daisy


In [5]:
print("{2} has a friend called {0} and a sister called {1}".format("Betty", "Linda", "Daisy"))

Daisy has a friend called Betty and a sister called Linda
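Two more details about numbered placeholders may be useful. The sketch below, with illustrative values, shows that an index can be reused, and that mixing empty and numbered placeholders in the same string is rejected.

```python
# The same index can appear several times
reused = "{0} has a friend called {1}, and {1} likes {0}".format("Betty", "Linda")
print(reused)  # Betty has a friend called Linda, and Linda likes Betty

# Empty {} and numbered {0} placeholders cannot be combined in one string
mixing_failed = False
try:
    "{} has a friend called {0}".format("Betty", "Linda")
except ValueError as err:
    mixing_failed = True
    print("ValueError:", err)
```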


### Named placeholders

We can also use named placeholders, which are filled by keyword arguments with the same name.

In the example code, we put keywords inside the placeholders. Then, in the format method, we pass a keyword argument for each of them, assigning the variable whose value it should receive, resulting in the following output.

In [6]:
tool="Unsupervised algorithms"
goal="patterns"

print("{title} try to find {aim} in the dataset".format(title=tool, aim=goal))

Unsupervised algorithms try to find patterns in the dataset


In [7]:
my_methods = {"tool": "Unsupervised algorithms",
              "goal": "patterns"}

print('{data[tool]} try to find {data[goal]} in the dataset'.format(data=my_methods))

Unsupervised algorithms try to find patterns in the dataset


Let's examine this code. We have defined a dictionary with the keys tool and goal, and we want to insert their values into a string. Inside the placeholders, we use bracket notation to specify the value associated with the key tool of the variable data.

Pay attention to the code: data is the dictionary passed to the method, and tool is a key present in that dictionary. So, we get the desired output shown above.

Be careful! Inside the placeholders, you need to write the key without quotes.
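Two shortcuts can help when the values already live in a dictionary, and the last lines show why the quotes must be left out. This is a sketch using the my_methods dictionary from above; `**` unpacking and str.format_map() are standard Python features.

```python
my_methods = {"tool": "Unsupervised algorithms", "goal": "patterns"}

# Unpack the dictionary into keyword arguments with **
unpacked = "{tool} try to find {goal} in the dataset".format(**my_methods)

# Or pass the mapping directly with str.format_map()
mapped = "{tool} try to find {goal} in the dataset".format_map(my_methods)

print(unpacked)
print(mapped)

# With quotes, the key becomes the literal text 'tool' (quotes included),
# which is not in the dictionary, so a KeyError is raised
quoted_key_failed = False
try:
    "{data['tool']}".format(data=my_methods)
except KeyError as err:
    quoted_key_failed = True
    print("KeyError:", err)
```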

### Format specifier

We can also use format specifiers inside the curly braces. They define how individual values are presented. The syntax is {index:specifier}: the index, a colon, and the specifier. One of the most common format specifiers is f, which formats a number as a float in fixed-point notation.

In [8]:
print("Only {0:f}% of the {1} produced worldwide is {2}!".format(0.5155675, "data", "analyzed"))

Only 0.515567% of the data produced worldwide is analyzed!


In the code, we specified that the value passed with index 0 should be formatted as a float. By default, f shows six decimal places, which produces the output above.

We could also use .2f to indicate that we want the float to have two decimal places, as seen in the resulting output.

In [9]:
print("Only {0:.2f}% of the {1} produced worldwide is {2}!".format(0.5155675, "data", "analyzed"))

Only 0.52% of the data produced worldwide is analyzed!
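The specifier can also set a field width or display a fraction as a percentage. A short sketch, reusing the value from the example above:

```python
share = 0.5155675

print("{0:f}".format(share))       # f alone uses six decimal places
print("{0:.2f}".format(share))     # 0.52
print("{0:.2%}".format(share))     # % multiplies by 100 and adds a percent sign: 51.56%
print("[{0:8.2f}]".format(share))  # total width of 8 characters, right-aligned: [    0.52]
```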


### Formatting datetime

Python has a module called datetime that allows us, for example, to get the current date and time.

In [10]:
from datetime import datetime
print(datetime.now())

2022-02-03 20:30:14.195017


The default format includes microseconds and is not very readable.

We can use format specifiers such as %Y (year), %m (month), %d (day), %H (hour) and %M (minutes) to adjust the format to something more familiar.

In [11]:
print("Today's date is {:%Y-%m-%d %H:%M}".format(datetime.now()))

Today's date is 2022-02-03 20:31
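Because datetime.now() changes on every run, a fixed datetime makes the behavior easier to check. The sketch below uses an arbitrary timestamp and also shows the strftime() method, which accepts the same % codes.

```python
from datetime import datetime

# An arbitrary fixed timestamp, so the output is reproducible
moment = datetime(2022, 2, 3, 20, 31, 14)

formatted = "Today's date is {:%Y-%m-%d %H:%M}".format(moment)
print(formatted)  # Today's date is 2022-02-03 20:31

# strftime() accepts the same codes outside of a format string
day_first = moment.strftime("%d/%m/%Y")
print(day_first)  # 03/02/2022
```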


### Exercise 1: Put it in order!

Your company is analyzing the best way to provide users with different online courses. Your job is to scrape Wikipedia pages searching for tools used in Data Science subfields. You'll store the tool and field name in a database. 

After a text analysis, you realize that the information appears at specific positions in the text, but sometimes the field name comes first and the tool second, while in other cases it's the other way around.

You decide to use positional formatting to handle these situations because it provides a way to reorder placeholders.

The text of one article has already been saved in the variable 'wikipedia_article'. Also, the empty list 'my_list' is already defined.

In [14]:
wikipedia_article ='In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals.'

wikipedia_article

'In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals.'

In [17]:
## Empty list

my_list = []

- Assign the substrings going from the 4th to the 19th character inclusive, and from the 22nd to the 44th character inclusive of wikipedia_article to the variables first_pos and second_pos, respectively. Adjust the strings to be lowercase.


- Define a string with the text "The tool is used in" adding placeholders after the word 'tool' and the word 'in' for future positional formatting. Append it to the list my_list.


- Define a string with the text "The tool is used in", adding placeholders after the word 'tool' and the word 'in', but reorder them so that the second argument passed to the method replaces the first placeholder. Append it to the list my_list.


- Create a for loop that uses the .format() method with the variables first_pos and second_pos to print out every string in my_list.

In [15]:
## 1st question

first_pos = wikipedia_article[3:19].lower()
print(first_pos)

second_pos = wikipedia_article[21:44].lower()
print(second_pos)

computer science
artificial intelligence


In [18]:
## 2nd question
my_list.append("The tool {} is used in {}")

In [19]:
## 3rd question
my_list.append("The tool {1} is used in {0}")

In [20]:
my_list

['The tool {} is used in {}', 'The tool {1} is used in {0}']

In [21]:
## 4th q
for string in my_list:
    print(string.format(first_pos,second_pos))

The tool computer science is used in artificial intelligence
The tool artificial intelligence is used in computer science


### Exercise 2: Calling by its name

You have created your database with the tools and the different Data Science subfields they are used in. The company is considering creating courses using these tools and sending personalized emails to the users recommending the different topics. They have asked you to make this process more time-efficient. 

To do this, you want to create a template email with a standard message in which only the tool and the corresponding field name change.

First, you want to try this with just one example as a proof of concept. You use positional formatting with named placeholders to access the values stored in a dictionary.

The variable 'courses' containing one tool and one field name has been saved. 

In [22]:
courses = ['artificial intelligence', 'neural networks']
courses

['artificial intelligence', 'neural networks']

In [34]:
my_message = "If you are interested in ____, you can take the course related to ____"

- Create a dictionary assigning the first and second element appearing in the list courses to the keys "field" and "tool" respectively.


- Complete the placeholders so that they access the values linked to the keys field and tool in the dictionary data.


- Print out the resulting message using the .format() method, passing the plan dictionary to replace the data placeholders.

In [23]:
## 1st
plan = {"field" : courses[0],
        "tool" : courses[1]}
plan

{'field': 'artificial intelligence', 'tool': 'neural networks'}

In [27]:
## 2nd and 3rd
my_message = "If you are interested in {data[field]}, you can take the course related to {data[tool]}"
my_message.format(data=plan)

'If you are interested in artificial intelligence, you can take the course related to neural networks'

In [None]:
# Create a dictionary
plan = {
    "field": courses[0],
    "tool": courses[1]
}

# Complete the placeholders accessing elements of field and tool keys in the data dictionary
my_message = "If you are interested in {data[field]}, you can take the course related to {data[tool]}"

# Use the plan dictionary to replace placeholders
print(my_message.format(data=plan))

### Exercise 3: What day is today?

It's lunchtime and you are talking with some of your colleagues. They say they wish someone would send them a reminder every morning of what day it is, so they can check their calendar for that day's assignments.

You want to help out, so you decide to write a small script that takes the current date and time and builds a message that is sent to your colleagues every morning. You can use the datetime module along with named placeholders to achieve your goal.

The date should be expressed as Month day, year, e.g. `April 16, 2019` and the time as hh:mm, e.g. `16:30`.

You write down some specifiers to help you: `%d` (day), `%B` (month name), `%m` (month number), `%Y` (year), `%H` (hour) and `%M` (minutes).

In [28]:
message = "Good morning. Today is {____:____ ____, ____}. It's {today:___:____} ... time to work!"

- Import the datetime class from the datetime module.


- Obtain the current date and time and assign it to the variable get_date.


- Complete the string message by naming the placeholders today and adding format specifiers that format the date as 'month_name day, year' and the time as 'hour:minutes'.


- Print the message using the .format() method and the variable get_date to replace the named placeholder.

In [31]:
## 1st and 2nd
from datetime import datetime

get_date = datetime.now()
print(get_date)

2022-02-03 21:00:59.130076


In [32]:
## 3rd
message = "Good morning. Today is {today:%B %d, %Y}. It's {today:%H:%M} ... time to work!"

In [33]:
## 4th
print(message.format(today = get_date))

Good morning. Today is February 03, 2022. It's 21:00 ... time to work!
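To check the script without waiting for a particular morning, the message can be wrapped in a function that takes the datetime as an argument. The function name morning_message is hypothetical; the date is the example from the exercise statement.

```python
from datetime import datetime

def morning_message(when):
    """Hypothetical helper: build the reminder for any datetime."""
    message = "Good morning. Today is {today:%B %d, %Y}. It's {today:%H:%M} ... time to work!"
    return message.format(today=when)

print(morning_message(datetime(2019, 4, 16, 16, 30)))
# Good morning. Today is April 16, 2019. It's 16:30 ... time to work!

# In the daily script, the current time would be passed instead
print(morning_message(datetime.now()))
```

Note that %B depends on the locale settings; with Python's default settings it prints English month names.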


### Formatted string literal

Python 3.6 introduced a powerful new way to format strings: formatted string literals.

### f-strings

The strings defined by this method are called f-strings, and they have a minimal syntax. To define one, add the prefix f before the opening quote. Inside the quotes, put your text along with curly braces, which mark the placeholders where you insert expressions.

<img src="fs.jpg" style="max-width:600px">

In the example code, we have two variables, way and method. Then, we define an f-string and put the two variables inside curly braces. Python evaluates each expression and replaces the placeholder with its value, giving the following output.

In [35]:
way ="code"
method ="learning Python faster"

print(f"Practicing how to {way} is the best method for {method}")

Practicing how to code is the best method for learning Python faster
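One detail worth keeping in mind: the expressions in an f-string are evaluated when the string is created, not later. A minimal sketch, where the helper advice is hypothetical:

```python
way = "code"
method = "learning Python faster"

sentence = f"Practicing how to {way} is the best method for {method}"

# Changing the variable afterwards does not affect the existing string
way = "read code"
print(sentence)  # Practicing how to code is the best method for learning Python faster

# To reuse the pattern with new values, build the f-string inside a function
def advice(way, method):
    return f"Practicing how to {way} is the best method for {method}"

print(advice(way, method))  # Practicing how to read code is the best method for learning Python faster
```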


### Type conversion

f-strings allow us to apply a conversion to an expression's value before it is inserted:

- !s converts the value with str(),


- !r converts it with repr() to get a printable representation (for strings, this adds quotes), or


- !a converts it with ascii(), which also escapes non-ASCII characters.


<img src="tc.jpg" style="max-width:600px">

Let's imagine we define the variable name as shown below, and we want it to appear surrounded by quotes in the resulting string. We can add the !r conversion after the variable, which returns a printable representation of the string.

In [36]:
name = "Python"
print(f"Python is called {name!r} due to a comedy series")

Python is called 'Python' due to a comedy series


In the output, we can see that quotes are surrounding the variable.
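The three conversions side by side, in a minimal sketch; the string "café" is an arbitrary example containing a non-ASCII character.

```python
name = "Python"
cafe = "café"

print(f"{name!s}")  # Python    -> str(), same as no conversion for a string
print(f"{name!r}")  # 'Python'  -> repr(), adds quotes
print(f"{cafe!r}")  # 'café'    -> repr() keeps the é
print(f"{cafe!a}")  # 'caf\xe9' -> ascii(), escapes the é
```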

### Format specifiers

We can also use format specifiers such as e for scientific notation, d for integers and f for floats.

<img src="fsp.jpg" style="max-width:600px">

In the example code, we define a variable containing a number and insert it into the f-string, specifying with .2f that we want only two decimal places. We get the following string.

In [37]:
number = 90.41890417471841
print(f"In the last 2 years, {number:.2f}% of the data was produced worldwide!")

In the last 2 years, 90.42% of the data was produced worldwide!
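The e and d specifiers in a short sketch, reusing numbers that appear later in this notebook:

```python
big = 2500000000000000000
count = 21
share = 90.41890417471841

print(f"{big:e}")      # 2.500000e+18 (six decimals by default)
print(f"{big:.1e}")    # 2.5e+18
print(f"{count:d}")    # 21
print(f"{share:.2f}")  # 90.42

# d only works with integers; a float raises a ValueError
d_rejects_float = False
try:
    f"{share:d}"
except ValueError as err:
    d_rejects_float = True
    print("ValueError:", err)
```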


### Format specifiers (datetime)

We can also format datetime objects. We insert the variable containing the datetime object, followed by a colon and the specifiers for month name (%B), day (%d) and year (%Y). We get a string containing the date, as shown in the code below.

In [38]:
my_today = datetime.now()
print(f"Today's date is {my_today:%B %d, %Y}")

Today's date is February 03, 2022
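The specifier after the colon can itself come from a variable, using nested braces. A minimal sketch with a fixed date; the names us_style and iso_style are illustrative:

```python
from datetime import datetime

my_day = datetime(2022, 2, 3)
us_style = "%B %d, %Y"
iso_style = "%Y-%m-%d"

# The inner braces are replaced first, then used as the format specifier
print(f"Today's date is {my_day:{us_style}}")   # Today's date is February 03, 2022
print(f"Today's date is {my_day:{iso_style}}")  # Today's date is 2022-02-03
```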


### Index lookups

Do you remember how we accessed dictionaries with the string format method? To insert the value associated with a specific key, we wrote the key without quotes.

Let's try the same in f-strings. As we see in the code, Python raises an error telling us that it cannot find a variable called dad.

This happens because the expression inside the braces of an f-string is evaluated as regular Python code, so dad is treated as a variable name. We need to surround the key with quotes.

In [39]:
## string format method
family = {"dad": "John", "siblings": "Peter"}
print("Is your dad called {family[dad]}?".format(family=family))

Is your dad called John?


In [40]:
print(f"Is your dad called {family[dad]}?")

NameError: name 'dad' is not defined

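To fix this, we put the key in quotes. Because the f-string itself is delimited by double quotes, we use single quotes for the key (reusing the same quote type inside the braces is only allowed from Python 3.12 onward). We get the desired output.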

In [41]:
print(f"Is your dad called {family['dad']}?")

Is your dad called John?
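Because the brackets in an f-string hold an ordinary Python expression, the key can also come from a variable. A short sketch; the variable relative is illustrative:

```python
family = {"dad": "John", "siblings": "Peter"}

# A literal key needs quotes different from the ones around the f-string
print(f"Is your dad called {family['dad']}?")  # Is your dad called John?

# A key stored in a variable needs no quotes at all
relative = "siblings"
print(f"Your sibling is called {family[relative]}")  # Your sibling is called Peter
```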


### Inline operations

One of the biggest advantages of f-strings is that they allow us to perform inline operations. 

In the example, we define two numeric variables and insert them into the f-string. We also multiply them inside an expression, and the result of that operation appears in the output.

In [42]:
my_number = 4
my_multiplier = 7

print(f'{my_number} multiplied by {my_multiplier} is {my_number * my_multiplier}')

4 multiplied by 7 is 28
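Any expression is allowed, including conditional expressions, and a format specifier can be applied to the result. A minimal sketch with the same numbers:

```python
my_number = 4
my_multiplier = 7
product = my_number * my_multiplier

# A conditional expression inside the braces
parity = f"{product} is {'even' if product % 2 == 0 else 'odd'}"
print(parity)  # 28 is even

# A format specifier applied to the result of an inline division
ratio = f"{my_number} divided by {my_multiplier} is {my_number / my_multiplier:.3f}"
print(ratio)  # 4 divided by 7 is 0.571
```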


### Calling functions

We can also call functions inside the expression of f-strings. 

In the example, we define a function. Then, inside the f-string, we call this function with two numbers, and its return value is inserted into the final string.

In [43]:
def my_function(a, b):
    return a + b

print(f"If you sum up 10 and 20 the result is {my_function(10, 20)}")

If you sum up 10 and 20 the result is 30
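Built-in functions and string methods work the same way. A sketch with an illustrative list of tools:

```python
tools = ["pandas", "NumPy", "scikit-learn"]

# len() and str.join() called inside the placeholders
summary = f"We use {len(tools)} tools: {', '.join(tools)}"
print(summary)  # We use 3 tools: pandas, NumPy, scikit-learn

# A method call on an indexed element
shout = f"{tools[0].upper()} first!"
print(shout)  # PANDAS first!
```

Before Python 3.12, the quotes used inside the braces (here around ', ') must differ from the quotes around the f-string.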


### Exercise 4: Literally formatting

You remember that you created a website that displayed data science facts, but it was too slow. You think this could be due to the string formatting method you used. Because f-strings are fast and easy to use, you decide to rewrite that project.

The variables 'field1', 'field2' and 'field3' containing character strings as well as the numeric variables 'fact1', 'fact2', 'fact3' and 'fact4' have been saved.

In [44]:
field1 = 'sexiest job'
field2 = 'data is produced daily'
field3 = 'Individuals'

In [45]:
fact1 = 21
fact2 = 2500000000000000000
fact3 = 72.41415415151
fact4 = 1.09

- Complete the f-string to include the variable field1 with quotes and the variable fact1 as an integer.

In [47]:
# Complete the f-string
print(f"Data science is considered ____ in the ____st century")

Data science is considered ____ in the ____st century


- Complete the f-string to include the variable fact2 using exponential notation, and the variable field2.

In [48]:
# Complete the f-string
print(f"About ____ of ____ in the world")

About ____ of ____ in the world


- Complete the f-string to include field3 and fact3 rounded to 2 decimals, and then fact4 rounded to one decimal.

In [49]:
# Complete the f-string
print(f"____ create around ____% of the data but only ____% is analyzed")

____ create around ____% of the data but only ____% is analyzed


In [54]:
## answers

In [50]:
## q1
print(f"Data science is considered {field1!r} in the {fact1:d}st century")

Data science is considered 'sexiest job' in the 21st century


In [51]:
## q2
print(f"About {fact2:e} of {field2} in the world")

About 2.500000e+18 of data is produced daily in the world


In [53]:
## q3
print(f"{field3} create around {fact3:0.2f}% of the data but only {fact4:0.1f}% is analyzed")

Individuals create around 72.41% of the data but only 1.1% is analyzed


### Exercise 5: Make this function

You plan to rewrite more of your old code. You now know that f-strings evaluate expressions where they appear, including function and method calls. You decide to use them in a project where you analyze 120 tweets to check whether they include links to news sites. That way, you expect the code to be cleaner and more readable.

The variables 'number1', 'number2','string1', and 'list_links' have already been pre-loaded.

In [55]:
number1 = 120
number2 = 7

string1 = 'httpswww.datacamp.com'

list_links = ['www.news.com','www.google.com','www.yahoo.com','www.bbc.com',
              'www.msn.com','www.facebook.com','www.news.google.com']

In [56]:
# Include both variables and the result of dividing them 
print(f"___ tweets were downloaded in ____ minutes indicating a speed of ____ tweets per min")

___ tweets were downloaded in ____ minutes indicating a speed of ____ tweets per min


- Inside the f-string, include number1, number2 and the result of dividing number1 by number2, rounded to one decimal.

In [59]:
# Replace the substring https by an empty string
print(f"{____.____(____, ____)}")

- Inside the f-string, use .replace() to replace the substring https with an empty string in string1.

In [60]:
# Divide the length of list by 120 rounded to two decimals
print(f"Only ____% of the posts contain links")

Only ____% of the posts contain links


- Inside the f-string, get list_links length, multiply it by 100 and divide it by 120. Round the result to two decimals.

In [61]:
## Answers

In [64]:
number1,number2

(120, 7)

In [65]:
## q1
print(f"{number1} tweets were downloaded in {number2} minutes indicating a speed of {number1/number2:0.1f} tweets per min")

120 tweets were downloaded in 7 minutes indicating a speed of 17.1 tweets per min


In [63]:
string1

'httpswww.datacamp.com'

In [66]:
## q2
print(f"{string1.replace('https', '')}")

www.datacamp.com


In [67]:
list_links

['www.news.com',
 'www.google.com',
 'www.yahoo.com',
 'www.bbc.com',
 'www.msn.com',
 'www.facebook.com',
 'www.news.google.com']

In [69]:
## q3
print(f"Only {len(list_links)*100/120:0.2f}% of the posts contain links")

Only 5.83% of the posts contain links


### Exercise 6: On time

Lastly, you want to rewrite an old real estate prediction project. At the time, you obtained historical information about house prices and used it to make a prediction on future values.

The date was stored as a datetime object, e.g. datetime.datetime(1990, 3, 17), but to print it out you formatted it as month-day-year, e.g. 03-17-1990.

You also remember that you defined a dictionary for each neighborhood. Now, you believe that you can handle both types of data better with f-strings.

Two dictionaries, east and west, both with the keys date and price, have already been loaded.

In [71]:
east = {'date': datetime(2007, 4, 20, 0, 0), 'price': 1232443}
print(east)

west = {'date': datetime(2006, 5, 26, 0, 0), 'price': 1432673}
print(west)

{'date': datetime.datetime(2007, 4, 20, 0, 0), 'price': 1232443}
{'date': datetime.datetime(2006, 5, 26, 0, 0), 'price': 1432673}


In [74]:
# Access values of date and price in east dictionary
print(f"The price for a house in the east neighborhood was ${____[____]} in {____[____]:____-____-____}")

- Inside the f-string, access the values of the keys price and date in east dictionary. Format the date to month-day-year.

In [75]:
# Access values of date and price in west dictionary
print(f"The price for a house in the west neighborhood was $____ in ____.")

- Inside the f-string, access the values of the keys price and date in west dictionary. Format the date to month-day-year.

In [76]:
## Answers

In [77]:
east

{'date': datetime.datetime(2007, 4, 20, 0, 0), 'price': 1232443}

In [78]:
## q1
print(f"The price for a house in the east neighborhood was ${east['price']} in {east['date']:%m-%d-%Y}")

The price for a house in the east neighborhood was $1232443 in 04-20-2007


In [80]:
west

{'date': datetime.datetime(2006, 5, 26, 0, 0), 'price': 1432673}

In [79]:
## q2
print(f"The price for a house in the west neighborhood was ${west['price']} in {west['date']:%m-%d-%Y}")

The price for a house in the west neighborhood was $1432673 in 05-26-2006
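Since both answers share the same pattern, a single function could serve every neighborhood. The sketch below is one possible design, not the exercise's required solution: describe_sale is a hypothetical helper, and the `,` specifier (thousands separator) is an optional addition.

```python
from datetime import datetime

east = {"date": datetime(2007, 4, 20), "price": 1232443}
west = {"date": datetime(2006, 5, 26), "price": 1432673}

def describe_sale(neighborhood, record):
    """Hypothetical helper: format one neighborhood dictionary."""
    # :, inserts thousands separators; %m-%d-%Y gives month-day-year
    return (f"The price for a house in the {neighborhood} neighborhood was "
            f"${record['price']:,} in {record['date']:%m-%d-%Y}")

sales = [describe_sale(name, record) for name, record in [("east", east), ("west", west)]]
for sale in sales:
    print(sale)
# The price for a house in the east neighborhood was $1,232,443 in 04-20-2007
# The price for a house in the west neighborhood was $1,432,673 in 05-26-2006
```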


### Template strings

Template strings have a simpler syntax. They are slower than the other methods and don't support format specifiers. Still, they are well suited to some specific situations, especially when working with externally supplied strings that you don't have control over.

<img src="ts.jpg" style="max-width:600px">

### Basic syntax

Template strings are not built into the language syntax. You need to import the Template class from the standard library's string module.

- First, create the template string with the Template constructor, which takes only the string. Template strings use dollar signs to mark placeholders, called identifiers.


- Then, call the substitute() method to replace the identifiers with string values, passing each identifier name as a keyword argument set to its replacement string. We get the output shown below.

In [81]:
from string import Template

In [82]:
my_string = Template('Data science has been called $identifier')
my_string.substitute(identifier="sexiest job of the 21st century")

'Data science has been called sexiest job of the 21st century'
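Besides keyword arguments, substitute() also accepts a mapping such as a dictionary. A short sketch:

```python
from string import Template

my_string = Template("Data science has been called $identifier")

# Values can come from a dictionary...
from_dict = my_string.substitute({"identifier": "sexiest job of the 21st century"})

# ...and when a name is given both ways, the keyword argument wins
mixed = my_string.substitute({"identifier": "a fad"},
                             identifier="sexiest job of the 21st century")

print(from_dict)  # Data science has been called sexiest job of the 21st century
print(mixed)      # Data science has been called sexiest job of the 21st century
```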

### Substitution

We can use several identifiers in a template and pass variables as the replacement values.

In the example code, we define two variables containing strings and create a template with two named identifiers. Afterward, we call the substitute method, assigning a variable to each identifier, and get the following output.

In [83]:
job = "Data science"
name = "sexiest job of the 21st century"
my_string = Template('$title has been called $description')
my_string.substitute(title=job, description=name)

'Data science has been called sexiest job of the 21st century'

### Substitution (enclosing the identifier)

Sometimes we need to add curly braces after the dollar sign to enclose the identifier. This is required when valid identifier characters (letters, digits or underscores) follow the identifier but are not part of it.

In [84]:
my_string = Template('I find Python very ${noun}ing but my sister has lost $noun')
my_string.substitute(noun="interest")

'I find Python very interesting but my sister has lost interest'

Let's clarify this. In the example, we need to add 'ing' immediately after the first identifier, so we enclose it in curly braces. Otherwise, Python would read 'ing' as part of the identifier name, looking for $nouning instead of $noun. Both identifiers are replaced by the value of noun, giving the output shown above.
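To see why the braces matter, here is a sketch of what happens when they are left out:

```python
from string import Template

# Without braces, $nouning is read as a single identifier called nouning
unbraced = Template("I find Python very $nouning")
missing_key = None
try:
    unbraced.substitute(noun="interest")
except KeyError as err:
    missing_key = err.args[0]
    print("KeyError:", err)  # KeyError: 'nouning'

# With braces, the identifier ends at the closing brace
braced = Template("I find Python very ${noun}ing").substitute(noun="interest")
print(braced)  # I find Python very interesting
```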

### Substitution (escaping the dollar sign)

- Use $$ to escape the dollar sign

Let's imagine now that you are working with prices and want to include the dollar sign as part of a string. Because dollar signs are used for identifiers, you need to escape this character by adding an extra dollar sign, which gives the correct output shown below.

In [86]:
my_string = Template('I paid for the Python course only $$ $price, amazing!')
my_string.substitute(price="12.50")

'I paid for the Python course only $ 12.50, amazing!'
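Two related cases, sketched below: $$ directly followed by an identifier, and a $ followed by something that is not a valid identifier, which substitute() rejects with a ValueError.

```python
from string import Template

# "$$" becomes one literal "$"; the next "$" starts the identifier
no_space = Template("I paid only $$$price, amazing!").substitute(price="12.50")
print(no_space)  # I paid only $12.50, amazing!

# "$5" is not a valid identifier, so substitute() raises a ValueError
invalid_rejected = False
try:
    Template("It costs $5").substitute()
except ValueError as err:
    invalid_rejected = True
    print("ValueError:", err)
```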

### Substitution (dictionary)

In the example code, we have defined a dictionary with only one key. However, when we define our template string, we include two identifiers. 

What would happen if we pass this dictionary to the method substitute? 

In [87]:
favorite = dict(flavor="chocolate")
favorite

{'flavor': 'chocolate'}

In [88]:
my_string = Template('I love $flavor $cake very much')
my_string.substitute(favorite)

KeyError: 'cake'

Python raises a KeyError: substitute() tries to replace every placeholder, and the value for cake is missing.

We could handle this with a try/except block.

In [89]:
favorite = dict(flavor="chocolate")
my_string = Template('I love $flavor $cake very much')

In [90]:
try:
    my_string.substitute(favorite)
except KeyError:
    print("missing information")

missing information


The try block runs the given code. If a KeyError is raised, the except block runs instead, producing the output above.
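The exception also tells us which identifier was missing, since the KeyError carries the missing name as its first argument. A minimal sketch:

```python
from string import Template

favorite = dict(flavor="chocolate")
my_string = Template("I love $flavor $cake very much")

try:
    result = my_string.substitute(favorite)
except KeyError as err:
    missing = err.args[0]  # the name of the identifier without a value
    result = f"missing information: {missing}"

print(result)  # missing information: cake
```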

### Safe substitution

A better way to handle this situation is to use the safe_substitute() method. This method always tries to return a usable string. How?

It leaves any placeholders that have no value unchanged in the resulting string.

Let's say we have the same situation as before. If we pass the dictionary to safe_substitute(), we don't get an error. Instead, the identifier $cake stays in the resulting string, as you can see in the output.

In [91]:
favorite = dict(flavor="chocolate")
my_string = Template('I love $flavor $cake very much')
my_string.safe_substitute(favorite)

'I love chocolate $cake very much'

Template strings are a simpler string substitution mechanism. Even though they are less powerful, they are a good choice when you are not sure about the source of the strings, for example when they come from users.
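One reason often given for this: a str.format() pattern supplied by someone else can reach the attributes of the objects you pass in, while a Template identifier is only a plain name. The sketch below illustrates this with a hypothetical Settings class.

```python
from string import Template

class Settings:
    """Hypothetical object with a value that should stay private."""
    secret_key = "do-not-show"

    def __init__(self, user):
        self.user = user

settings = Settings("Ana")

# A user-written format pattern can use attribute access on the arguments
user_pattern = "Hello {s.user}, key={s.secret_key}"
leaked = user_pattern.format(s=settings)
print(leaked)  # Hello Ana, key=do-not-show

# In a template, the identifier stops at the dot, so no attribute is looked up
user_template = Template("Hello $name, key=$s.secret_key")
safe = user_template.safe_substitute(name=settings.user)
print(safe)  # Hello Ana, key=$s.secret_key
```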

### Exercise 7: Preparing a report

Once again, you scraped Wikipedia pages. This time, you searched for descriptions of useful text mining tools. Your first task is to prepare a report about the different tools you found. You want to format the information contained in the dataset to be printed out as: 

(tool) is a (description).

In this case, template strings are the best solution to interpolate data generated by external sources into an already created template.

For this example, the variables 'tool1', 'tool2' and 'tool3' contain three article titles. Each variable 'description1', 'description2' and 'description3' contains the corresponding article description.

In [94]:
tool1 = 'Natural Language Toolkit'
tool2 = 'TextBlob'
tool3 = 'Gensim'

In [93]:
description1 = 'suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania.'
print(description1,"\n")

description2 = 'Python library for processing textual data. It provides a simple API for diving into common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.'
print(description2,"\n")

description3 = 'robust open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy, SciPy and optionally Cython for performance. Gensim is specifically designed to handle large text collections, using data streaming and efficient incremental algorithms, which differentiates it from most other scientific software packages that only target batch and in-memory processing.'
print(description3)

suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. 

Python library for processing textual data. It provides a simple API for diving into common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. 

robust open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy, SciPy and optionally Cython for performance. Gensim is specifically designed to handle large text collections, using data streaming and efficient incremental algorithms, which differentiates it from most other scientific software packages that only target batch and in-memory processing.


- First of all, import Template from the string module.

In [None]:
# Create a template
wikipedia = Template("____ is a ____")

- Complete the template using the '$tool' and '$description' identifiers.

In [None]:
# Substitute variables in template
print(wikipedia.____(____=____, ____=____))
print(wikipedia.____(____=____, ____=____))
print(wikipedia.____(____=____, ____=____))

- Substitute identifiers with the correct tool and description variables in the template and print out the results.

In [95]:
from string import Template

In [101]:
## q2
wikipedia = Template("$tool is a $description")

In [102]:
## q3
# Substitute variables in template
print(wikipedia.substitute(tool=tool1, description=description1), "\n")
print(wikipedia.substitute(tool=tool2, description=description2), "\n")
print(wikipedia.substitute(tool=tool3, description=description3))

Natural Language Toolkit is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. 

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. 

Gensim is a robust open-source vector space modeling and topic modeling toolkit implemented in Python. It uses NumPy, SciPy and optionally Cython for performance. Gensim is specifically designed to handle large text collections, using data streaming and efficient incremental algorithms, which differentiates it from most other scientific software packages that only target batch and in-memory processing.
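With more tools, the three print calls could be replaced by a loop. A sketch, assuming the tools and descriptions are kept in two parallel lists; here each description is shortened to its first sentence:

```python
from string import Template

wikipedia = Template("$tool is a $description")

tools = ["Natural Language Toolkit", "TextBlob", "Gensim"]
descriptions = [
    "suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language.",
    "Python library for processing textual data.",
    "robust open-source vector space modeling and topic modeling toolkit implemented in Python.",
]

# zip() pairs each tool with its description
reports = [wikipedia.substitute(tool=t, description=d)
           for t, d in zip(tools, descriptions)]
for report in reports:
    print(report, "\n")
```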


### Exercise 8: Identifying prices

After you showed your report to your boss, he came up with the idea of offering the company's users courses on some of the tools you studied. To run a pilot test, you will send an email offering a course about one of the tools, chosen at random from your dataset. You also mention that the estimated fee is paid monthly.

For writing the email, you will use Template strings. You remember that you need to be careful when you use the dollar sign since it is used for identifiers in this case.

For this example, the list tools contains the corresponding 'tool name', 'fee' and 'payment type' for the product offer. 

In [103]:
tools = ['Natural Language Toolkit', '20', 'month']

In [None]:
# Import template
from string import Template

# Select variables
our_tool = ____
our_fee = ____
our_pay = ____

# Create template
course = ____("We are offering a 3-month beginner course on ____ just for ____ ____ ____")

# Substitute identifiers with three variables
print(____.____(____=____, ____=____, ____=____))

- Assign the first, second, and third element of tools to the variables our_tool, our_fee and our_pay respectively.


- Complete the template string using $tool, $fee, and $pay as identifiers. Add a literal dollar sign before the $fee identifier and add the characters ly directly after the $pay identifier.


- Substitute identifiers with the three variables you created and print out the results.

In [104]:
## q1
# Select variables
our_tool = tools[0]
our_fee = tools[1]
our_pay = tools[2]

In [105]:
## q2
# Create template
course = Template("We are offering a 3-month beginner course on $tool just for $$ $fee ${pay}ly")

## q3
course.substitute(tool = our_tool, fee = our_fee, pay = our_pay)

'We are offering a 3-month beginner course on Natural Language Toolkit just for $ 20 monthly'

### Exercise 9: Playing safe

You are in charge of a new project! Your job is to start collecting information from the company's main application users. You will make an online quiz and ask your users to voluntarily answer two questions. However, it is not mandatory for the user to answer both. 

You will be handling user-provided strings, so you decide to use template strings to print the input information. This allows users to double-check their answers before submitting them.

The answer of one user has been stored in the dictionary 'answers'. 

In [106]:
answers = {'answer1': 'I really like the app. But there are some features that can be improved'}

In [None]:
# Complete template string using identifiers
the_answers = ____("Check your answer 1: ____, and your answer 2: ____")

# Use substitute to replace identifiers
try:
    print(____.____(____))
except KeyError:
    print("Missing information")
    
# Use safe_substitute to replace identifiers
try:
    print(____.____(____))
except KeyError:
    print("Missing information")

- Complete the template string using $answer1 and $answer2 as identifiers.


- Use the method .substitute() to replace the identifiers with the values in answers in the predefined template.


- Use the method .safe_substitute() to replace the identifiers with the values in answers in the predefined template.

In [107]:
# q1
# Complete template string using identifiers
the_answers = Template("Check your answer 1: $answer1, and your answer 2: $answer2")

In [108]:
## q2
# Use substitute to replace identifiers
try:
    print(the_answers.substitute(answers))
except KeyError:
    print("Missing information")

Missing information


In [109]:
## q3

# Use safe_substitute to replace identifiers
try:
    print(the_answers.safe_substitute(answers))
except KeyError:
    print("Missing information")

Check your answer 1: I really like the app. But there are some features that can be improved, and your answer 2: $answer2
