
# **FORMATTING STRINGS IN PYTHON**


In [0]:
# What's your quarantine nickname?
# How you feel + last thing you ate

name = "Juliana"
team = "Revenue"
feeling = "Forgetful"
food = "Everything Bagel"

print("Hi I'm %s, I'm an engineer in %s. You can call me %s %s" % (name, team, feeling, food))




What is a string?  

*Pre-Python 3.6* 

## **1) %-Formatting**

- Uses the `%` (modulo) operator.
- Each `%` conversion specifier in the string is replaced with the corresponding value from the right-hand side.

In [0]:
day = "Tuesday"
'Today is %s' % day

If the format string needs more than one argument, the values must be a tuple with exactly the number of items the format string specifies, or a single mapping object (such as a dictionary).


In [0]:
company = "edX"
language = "Python"
version = 3.8

"At %s, we will soon use %s %f" % (company, language, version)

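P.S. → `%s` converts the value with `str()`, `%f` and `%g` format floats (`%f` shows six decimal places by default, padding with trailing zeros; `%g` drops them), `%d` formats an integer, and `%x` formats an integer in hexadecimal.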
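A quick side-by-side of those specifiers, using arbitrary example values (the `examples` dictionary is just a container for the demo):

```python
version = 3.8

# Each entry maps a specifier to the string it produces
examples = {
    "%s": "%s" % version,      # '3.8'       str() of the value
    "%f": "%f" % version,      # '3.800000'  six decimal places by default
    "%.1f": "%.1f" % version,  # '3.8'       explicit precision
    "%g": "%g" % version,      # '3.8'       trailing zeros dropped
    "%d": "%d" % 42,           # '42'
    "%x": "%x" % 255,          # 'ff'        lowercase hexadecimal
}

for spec, result in examples.items():
    print(spec, "->", result)
```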

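👎Why it can be BAD?
- Readability suffers as the number of values grows.

- The Python docs also warn that printf-style formatting has quirks that lead to common errors (such as failing to display tuples and dictionaries correctly), and point to f-strings, `str.format()`, or template strings instead.
https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting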
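One of those tuple quirks, sketched with an arbitrary `point` value: a tuple on the right of `%` is treated as the argument list, so a single value that happens to be a tuple has to be wrapped in a one-element tuple.

```python
point = (3, 4)

# A bare tuple on the right of % is treated as the argument list,
# so two values arrive for a single %s
error_message = None
try:
    "Point: %s" % point
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # not all arguments converted during string formatting

# Wrapping it in a one-element tuple formats the tuple itself
wrapped = "Point: %s" % (point,)
print(wrapped)  # Point: (3, 4)
```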

In [0]:
order = "sandwich"
bread = "Ben Holt's homemade sourdough"
spread = "roasted red pepper cashew"
meat = "sliced turkey"
vegetable = "arugula"
cheese = "fresh mozzarella"
extras = "pickles"

"Hello, I would like a %s, with %s bread, and %s spread, some %s, don't forget the %s and the %s, and extra %s." % (order, bread, spread, meat, vegetable, cheese, extras)

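Does the order of the variables matter? Yes: values are matched to the `%s` specifiers by position, so swapping two values in the tuple silently swaps them in the sentence.

A workaround is to refer to the replacements by name, passing a mapping to the `%` operator. The order doesn't matter in this case.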

In [0]:
order = "sandwich"
bread = "Ben Holt's homemade sourdough"
spread = "roasted red pepper cashew"
meat = "sliced turkey"
vegetable = "arugula"
cheese = "fresh mozzarella"
extras = "pickles"

"Hello, I would like a %(order)s, with %(bread)s bread, and %(spread)s spread, some %(meat)s, don't forget the %(vegetable)s and the %(cheese)s, and extra %(extras)s." % {"bread": bread, "order": order, "spread": spread, "meat": meat, "vegetable": vegetable, "cheese": cheese, "extras": extras}

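👍Why it can be GOOD?
- It is the style the `logging` module expects: pass the format string and its arguments separately, and logging does the interpolation only when the message is actually emitted.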

In [0]:
import logging
logging.error("Found an error: %s", "file not found")  # logging fills in %s itself

> The primary benefit of this is not performance (doing the string interpolation will be quick compared to whatever you're doing with the output from logging, e.g displaying in a terminal, saving to disk) It is that if you have a logging aggregator, it can tell you "you got 12 instances of this error message", even if they all had different 'some_info' values. If the string formatting is done before passing the string to log.debug, then this is impossible. The aggregator can only say "you had 12 different log messages" 

Source: https://stackoverflow.com/questions/5082452/string-formatting-vs-format
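A minimal sketch of the aggregation point above. `TemplateCounter` is a hypothetical handler written for this demo, not part of the `logging` module; it relies on the real `LogRecord.msg` attribute, which holds the message template before the arguments are merged in.

```python
import logging


class TemplateCounter(logging.Handler):
    """Hypothetical handler that counts records per unformatted message template."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def emit(self, record):
        # record.msg is the raw template; record.getMessage() would return the merged text
        self.counts[record.msg] = self.counts.get(record.msg, 0) + 1


logger = logging.getLogger("formatting_demo")
logger.propagate = False  # keep the demo records away from the root logger
counter = TemplateCounter()
logger.addHandler(counter)

for path in ["a.txt", "b.txt", "c.txt"]:
    logger.error("Could not open %s", path)   # lazy: the template survives
    logger.error("Could not read %s" % path)  # eager: every message is unique

print(counter.counts)
# {'Could not open %s': 3, 'Could not read a.txt': 1, 'Could not read b.txt': 1, 'Could not read c.txt': 1}
```

The lazy call groups into one bucket of three, while the pre-formatted call produces three unrelated messages.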



🕵️‍♀️Example from edX's code here: 

https://github.com/edx/edx-platform/blob/61e1eda20df2825a409db3e2d36c69d7c36d3e2d/openedx/core/djangoapps/verified_track_content/tasks.py#L41

## **2) str.format()**

Call `.format()` on a string object; its replacement fields are marked by curly braces `{}`.

In [0]:
day = "Tuesday"
activity = "Python Study Group"
"Today is {}. And that means it's a good day to go to {}!".format(day, activity)

Can access arguments by position:

In [0]:
'{2}, {1}, {0}'.format('one', 'two', 'three')

Fields can also be named and filled with keyword arguments, so the order of the arguments does not matter and the fields are easy to rearrange.

In [0]:
day = "Tuesday"
activity = "Python Study Group"
"Today is {day}. And that means it's a good day to go to {activity}!".format(activity=activity, day=day)

📌Neat tricks:

Unpack a dictionary into keyword arguments with `**`:

In [0]:
measurements = {'length': 48, 'width': 30}
"This table is {length} inches long, {width} inches wide.".format(**measurements) 

You can also unpack an argument sequence with `*`, use an argument more than once, and pass in a dictionary that contains extra, unused keys.

In [0]:
'{2}, {1}, {0}'.format(*'abc')

In [0]:
'{0}{1}{0}'.format('abra', 'cad') 
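The unused-keys point, sketched with an arbitrary `engineer` dictionary: `.format()` ignores keys and positional arguments that no field asks for, but a field with no matching argument raises an error.

```python
engineer = {"name": "Juliana", "team": "Revenue", "food": "Everything Bagel"}

# Extra keys ("food") are simply ignored
intro = "{name} works in {team}.".format(**engineer)
print(intro)  # Juliana works in Revenue.

# Extra positional arguments are ignored too
pair = "{0} and {1}".format("a", "b", "c")
print(pair)  # a and b

# A field with no matching argument raises KeyError
try:
    "{name} likes {snack}".format(**engineer)
except KeyError as missing:
    print("Missing field:", missing)  # Missing field: 'snack'
```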

Dealing with numbers:

In [0]:
'{:,}'.format(1000000)

In [0]:
pto = 15
workdays = 262
'Off from work: {:.2%}'.format(pto/workdays)

In [0]:
'Average min/max temperature in Boston: {:+d} to {:+d} °F'.format(-1, 85) 

In [0]:
'Real feel min/max temperature in Boston: {:-d} to {:-d} °F'.format(-20, 110) 
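The sign option has a third value, and it combines with width, zero padding, grouping and precision in the same spec. The numbers below are arbitrary:

```python
# '+' always shows a sign, '-' only for negatives (the default),
# ' ' puts a space in front of positive numbers so columns line up
signs = ["{:+d}".format(5), "{:-d}".format(5), "{: d}".format(5), "{: d}".format(-5)]
print(signs)  # ['+5', '5', ' 5', '-5']

# Width 8, zero padded, 2 decimal places
padded = "{:08.2f}".format(3.14159)
print(padded)  # 00003.14

# Thousands separator plus precision
grouped = "{:,.2f}".format(1234567.891)
print(grouped)  # 1,234,567.89
```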

Text aligning:

In [0]:
'{:<50}'.format('left')

In [0]:
'{:>50}'.format('right')

In [0]:
'{:^50}'.format('centered')
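By default the padding is spaces; a fill character placed just before the alignment sign replaces them (narrower widths here so the fill is easy to see):

```python
centered = "{:*^20}".format("centered")  # fill '*', center in 20 characters
left = "{:-<10}".format("left")          # fill '-', left-align in 10
right = "{:.>10}".format("right")        # fill '.', right-align in 10

print(centered)  # ******centered******
print(left)      # left------
print(right)     # .....right
```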

Combine with built-in functions such as `locals()`, which returns a dictionary of the current local variables:

In [0]:
locals()

In [0]:
'My name is {name}'.format(**locals())

In [0]:
'My name is {name}'.format_map(locals())
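The difference between the last two cells: `format(**mapping)` copies the mapping into a new dictionary of keyword arguments, while `format_map(mapping)` uses the mapping directly. So a `dict` subclass with `__missing__`, like the hypothetical `Default` class below (modeled on the pattern in the Python docs), only works with `format_map()`.

```python
class Default(dict):
    """Hypothetical helper: a missing key is replaced by its own name."""

    def __missing__(self, key):
        return key


info = Default(name="Juliana")

filled = "{name} works in {team}".format_map(info)
print(filled)  # Juliana works in team

# ** copies into a plain dict, so __missing__ is never consulted
try:
    "{name} works in {team}".format(**info)
except KeyError as missing:
    print("format(**info) raised KeyError:", missing)
```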

👎Why it can be BAD?
- Readability improves, but the call still gets cumbersome with many parameters and long strings, since every name has to be written twice.

In [0]:
order = "sandwich"
bread = "Ben Holt's homemade sourdough"
spread = "roasted red pepper cashew"
meat = "sliced turkey"
vegetable = "arugula"
cheese = "fresh mozzarella"
extras = "pickles"

"Hello, I would like a {order}, with {bread} bread, and {spread} spread, some {meat}, don't forget the {cheese} and the {vegetable}, and extra {extras}.".format(order=order, bread=bread, spread=spread, meat=meat, cheese=cheese, vegetable=vegetable, extras=extras)


🕵️‍♀️Example from edX's code here:

https://github.com/edx/edx-notifications/blob/ef448f5d3488dd1f8b87d2ff86f7afc4a7f24d55/edx_notifications/server/web/utils.py#L52

THERE'S GOT TO BE A BETTER WAY TO DO THIS! 🧐

*Post-Python 3.6*

## **3) f-strings**
(*aka formatted string literals*)
- Prefix the string with an `f` and put expressions inside curly braces; each expression is replaced with its value.



👍Why it can be GOOD?

- More readable.
- Less error-prone: each expression sits where its value appears, so there is no separate argument list to keep in order.

In [0]:
day = "Tuesday"
date = "June 2"

f"Today is {day}, {date}."
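For comparison, here is the sandwich order from the `.format()` section rewritten as an f-string, plus a reminder that the format spec after `:` works the same way as in `.format()`:

```python
order = "sandwich"
bread = "Ben Holt's homemade sourdough"
spread = "roasted red pepper cashew"
meat = "sliced turkey"
vegetable = "arugula"
cheese = "fresh mozzarella"
extras = "pickles"

# Every name appears once, right where its value goes
sandwich_order = (
    f"Hello, I would like a {order}, with {bread} bread, and {spread} spread, "
    f"some {meat}, don't forget the {cheese} and the {vegetable}, and extra {extras}."
)
print(sandwich_order)

# Same format spec mini-language as .format()
pto = 15
workdays = 262
time_off = f"Off from work: {pto/workdays:.2%}"
print(time_off)  # Off from work: 5.73%
```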

- Evaluated at runtime, so the braces can hold any Python expression, including function calls!

In [0]:
def say_something_loudly(text):
  return text.upper()

message = "Don't give up"  # avoid naming it `input`, which shadows the built-in
f"Loud and clear: {say_something_loudly(message)}"

- Multiline strings: adjacent f-strings inside parentheses are joined into one string.

In [0]:
name = "Juliana"
squad = "Revenue"
intro = (
    f"Hi my name is {name}. "
    f"I'm an engineer in {squad}."
)

intro
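Two things worth knowing about multiline f-strings, using the same example values: each piece of an implicitly joined string needs its own `f` prefix, and a triple-quoted f-string spans lines while keeping the newlines.

```python
name = "Juliana"
squad = "Revenue"

# The second piece has no f prefix, so {squad} is kept literally
forgot_prefix = (
    f"Hi my name is {name}. "
    "I'm an engineer in {squad}."
)
print(forgot_prefix)  # Hi my name is Juliana. I'm an engineer in {squad}.

# Triple quotes: one f-string, newlines included
card = f"""Name: {name}
Squad: {squad}"""
print(card)
```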

- Faster 🏃‍♀

In [0]:
import timeit

variables = """
day = "Tuesday"
date = "June 2"
"""
modulo_operator = "'Today is %s, %s.' % (day, date)"
dot_format = "'Today is {}, {}.'.format(day, date)"
f_string = "f'Today is {day}, {date}.'"


def get_time(formatting):
  return f'{timeit.timeit(formatting, variables, number=10000)}'

print(f'Time in %-formatting: {get_time(modulo_operator)}')
print(f'Time in .format(): {get_time(dot_format)}')
print(f'Time in f-string: {get_time(f_string)}')
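A hedged look at one likely reason for the gap in CPython: the standard `dis` module shows that an f-string compiles to dedicated bytecode that assembles the string directly (`BUILD_STRING`), while `.format()` has to look up and call a method. The helper `opnames` is just for this demo, and the exact opcode lists differ between Python versions, so only the presence of `BUILD_STRING` is checked.

```python
import dis


def opnames(source):
    """Return the bytecode instruction names for a single expression."""
    code = compile(source, "<demo>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


f_string_ops = opnames("f'Today is {day}, {date}.'")
format_ops = opnames("'Today is {}, {}.'.format(day, date)")

print(f_string_ops)  # includes BUILD_STRING
print(format_ops)    # a method lookup and call instead
```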

👎Why it can be BAD?

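The risk comes from format strings supplied by a user, so strictly it applies to `str.format()`: an f-string is a literal in your source code and cannot come from user input. Both share the same replacement-field syntax, though, and that syntax can follow attributes and index into objects. If a user controls a format string and you pass an object into `.format()`, they can walk from that object to your module's global variables: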

In [0]:
super_duper_secret_key = "supercalifragilisticexpialidocious"

class Event(object):
  def __init__(self):
    pass

# read data from global namespace
malicious_input = '{event.__init__.__globals__[super_duper_secret_key]}'

e = Event()
print(malicious_input.format(event=e))

supercalifragilisticexpialidocious


📌Gotchas:
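- Quotes matter. Before Python 3.12, an expression inside the braces cannot reuse the quote character that delimits the f-string, so the second and third examples below raise `SyntaxError` on older versions (Python 3.12 lifted this restriction, see PEP 701).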

In [0]:
f"{'Hello!'}"

In [0]:
f"{"Try again"}"

In [0]:
engineer = {"name": "Juliana", "team": "Revenue"}
f"Try again {engineer["name"]}"

In [0]:
f"""Goodbye!"""

In [0]:
f'That\'s crazy!'

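Backslashes and quotes: before Python 3.12, any backslash inside the braces is a `SyntaxError`, and the first example below fails in every version because `\"name\"` is not a valid expression. A backslash in the literal part of the string is fine, and the `!r` conversion inserts the value's `repr()`, quotes included: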

In [0]:
name = "Juliana"
f"Hello, {\"name\"}!"

In [0]:
name = "Juliana"
f"Hello, \'{name}\'!"

In [0]:
name = "Juliana"
f"Hello, {name!r}!"
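A common workaround when you need a backslash character such as a newline inside the braces on Python versions before 3.12: bind it to a name first. The list of names is borrowed from the thank-you section; the `newline` variable is just this sketch's choice.

```python
names = ["Becca", "Chris", "Ben", "Ned"]

# Writing '\n'.join(names) inside the braces is a SyntaxError before Python 3.12;
# a variable holding the newline avoids the backslash in the expression
newline = "\n"
thanks = f"Thank you:{newline}{newline.join(names)}"
print(thanks)
```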

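- Braces/Curlies. Doubled braces `{{` and `}}` produce literal braces instead of a replacement field; triple braces put literal braces around the evaluated value.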

In [0]:
f"{1+1}"

In [0]:
f"{{1+1}}"

In [0]:
f"{{{1+1}}}"

*Extras: f-string and special characters*



In [0]:
c = "彁"
f"U+{ord(c):06X}"  # zero-padded to 6 digits, uppercase hexadecimal

## **4) Template Strings**

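- Placeholders use `$name` or `${name}`, which looks like JavaScript template literals; unlike JavaScript, only plain names are substituted and no expressions are evaluated.
- Import the `Template` class from Python's built-in `string` module.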

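👍Why it CAN be good?

Use case: preventing security vulnerabilities when format strings are supplied by users. A `Template` placeholder can only name a value, so it cannot reach attributes or globals the way the `.format()` example above did.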

In [0]:
from string import Template

super_duper_secret_key = "supercalifragilisticexpialidocious"

class Event(object):
  def __init__(self):
    pass

# read data from global namespace
malicious_input = '${event.__init__.__globals__[super_duper_secret_key]}'

e = Event()
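# Template only substitutes plain $name / ${name} placeholders, so attribute access
# and indexing are never evaluated. substitute() would raise
# ValueError("Invalid placeholder in string ..."); safe_substitute() returns the text unchanged.
Template(malicious_input).safe_substitute(event=e)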
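For everyday use, a short sketch of the `Template` API with made-up values: `$$` escapes a literal dollar sign, `substitute()` raises `KeyError` for a missing name, and `safe_substitute()` leaves the placeholder in place.

```python
from string import Template

# $name and ${name} are placeholders; $$ is a literal dollar sign
greeting = Template("Hi $name, welcome to ${team}! Lunch costs $$5.")

full = greeting.substitute(name="Juliana", team="Revenue")
print(full)  # Hi Juliana, welcome to Revenue! Lunch costs $5.

# substitute() insists on every placeholder ...
try:
    greeting.substitute(name="Juliana")
except KeyError as missing:
    print("Missing placeholder:", missing)  # Missing placeholder: 'team'

# ... while safe_substitute() leaves unknown ones untouched
partial = greeting.safe_substitute(name="Juliana")
print(partial)  # Hi Juliana, welcome to ${team}! Lunch costs $5.
```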

# **How to know when to use what?**

**Pre-Python 3.6?**

Logs: %-formatting

Otherwise: %-formatting or .format()

**Post-Python 3.6?**

Logs: %-formatting, passing the arguments to the logging call

If the format string is not supplied by a user: f-string

If the format string comes from user input: template strings

Great flowchart: https://files.realpython.com/media/python-string-formatting-flowchart.4ecf0148fd87.png

🤓**Links and resources:**

Python docs on string built-in methods 
https://docs.python.org/2/library/stdtypes.html#string-methods

Python docs on format strings 
https://docs.python.org/3/library/string.html#formatstrings

Python docs on f-strings
https://docs.python.org/3/reference/lexical_analysis.html#f-strings

Timeit
https://docs.python.org/2/library/timeit.html

Stackoverflow about logging
https://stackoverflow.com/questions/5082452/string-formatting-vs-format

More info on old vs. new
https://pyformat.info/

https://realpython.com/python-string-formatting/



🤩**Thank yous:**

Becca and Chris for kindly "asking" me to do today's study group session

Ben and Ned for being available to ask questions!

