<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Python's-str-type" data-toc-modified-id="Python's-str-type-1">Python's str type</a></span></li><li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes-2">Learning Outcomes</a></span></li><li><span><a href="#Create-a-string" data-toc-modified-id="Create-a-string-3">Create a string</a></span></li><li><span><a href="#One-way-to-update-immutable-strings" data-toc-modified-id="One-way-to-update-immutable-strings-4">One way to update immutable strings</a></span></li><li><span><a href="#String-methods" data-toc-modified-id="String-methods-5">String methods</a></span></li><li><span><a href="#Fluent-Interface-" data-toc-modified-id="Fluent-Interface--6">Fluent Interface </a></span></li><li><span><a href="#Split-hack" data-toc-modified-id="Split-hack-7">Split hack</a></span></li><li><span><a href="#String-formatting" data-toc-modified-id="String-formatting-8">String formatting</a></span></li><li><span><a href="#Takeaways" data-toc-modified-id="Takeaways-9">Takeaways</a></span></li><li><span><a href="#Bonus-Material" data-toc-modified-id="Bonus-Material-10">Bonus Material</a></span></li><li><span><a href="#Convert-`&quot;&quot;.join`-to-a-function" data-toc-modified-id="Convert-`&quot;&quot;.join`-to-a-function-11">Convert `"".join` to a function</a></span></li><li><span><a href="#More-on-strings" data-toc-modified-id="More-on-strings-12">More on strings</a></span></li></ul></div>

<center><h2>Python's str type</h2></center>

<center><h2>Learning Outcomes</h2></center>

__By the end of this session, you should be able to__:

- Use common methods of `str` objects.
- Explain why double quotes are better than single quotes for Data Science programming.
- Chain commands together with the fluent interface.
- Use string formatting to better display data.


Create a string
-----

In [43]:
reset -fs

In [44]:
# Note - I prefer "" to ''
# "" handles human language better: apostrophes need no escaping
string = "Hello, World!" 

In [45]:
type(string)

str
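A minimal sketch of why double quotes suit natural-language text: apostrophes can appear without escaping. The sample sentence and variable names are illustrative assumptions, not part of the original notebook.

```python
# Apostrophes are common in English text, so double quotes avoid escaping.
double_quoted = "Don't panic, it's only a string."

# The same text in single quotes needs a backslash before each apostrophe.
single_quoted = 'Don\'t panic, it\'s only a string.'

# Both literals produce the same value; only the source code differs.
same_text = double_quoted == single_quoted
print(same_text)
```

Either quoting style creates an ordinary `str`; the choice only affects how readable the literal is in source code.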

String methods
----

In [48]:
# str.<tab> 
# str.

__How to learn string methods__

[Chunking](https://en.wikipedia.org/wiki/Chunking_(psychology)) means breaking items into semantically similar groups. Learn at group level so there is less to learn at one time.

One way to chunk `str` methods:

- Sequence methods - `str.index` and similar. They work like the methods on lists and tuples.

- Boolean methods - `str.isalpha` and similar. They check a string for a specific condition.

- String manipulation - `str.lower` and similar. They return a new, updated string.

- Python string specific - `str.encode` and `bytes.decode`. See the Unicode notebook.
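The four groups above can be sketched with one call from each. The sample string and variable names are assumptions chosen for illustration.

```python
text = "Hello, World!"  # Illustrative sample string

# Sequence methods: behave much like their list and tuple counterparts
position = text.index("World")     # Index where the substring starts

# Boolean methods: answer a yes/no question about the string
is_alpha = text.isalpha()          # Contains ",", " " and "!", so not all letters
word_is_alpha = "Hello".isalpha()  # Letters only

# String manipulation: returns a new string, the original is unchanged
lowered = text.lower()

# Python string specific: convert between str and bytes
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

print(position, is_alpha, word_is_alpha, lowered, decoded)
```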

In [80]:
# Sequence-type methods
string.count('o')

2

In [81]:
# String manipulation methods
string.lower() 

'hello, world!'

In [82]:
# Use str.replace to delete specific characters
string.replace("!", "")

'Hello, World'

In [83]:
# Separate words into a list
# The default separator is white space
string.split()

['Hello,', 'World!']

In [53]:
# Can take an argument to be more specific about splitting
string.split(", ") 

['Hello', 'World!']

Fluent Interface 
-----

Recall the functional sandwich, which chains functions by nesting them:

```python


```

A fluent interface chains methods instead.

Each method call returns a new object, so you can call another method on it.
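As a rough illustration of the contrast, the same three steps can be written in nested form and in fluent form. The nested version here calls the `str` methods as plain functions; it is an assumed example, not the notebook's own sandwich.

```python
words = "Hello, World!"  # Illustrative sample string

# Nested ("sandwich") style: read from the inside out
nested = str.split(str.lower(str.replace(words, "!", "")), ", ")

# Fluent style: read left to right, one step per method call
chained = words.replace("!", "").lower().split(", ")

print(nested)
print(chained)
```

Both expressions perform the same steps in the same order; the fluent form lists them in the order they happen.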

In [77]:
string = "Hello, World!"

In [78]:
string.replace("!", "").lower().split(", ")

['hello', 'world']

In [79]:
# Wrap the chained methods in parentheses to split across lines
# Easier to read and comment
(string             # Starting string
 .replace("!", "")  # Remove punctuation
 .lower()           # Normalize to lower case
 .split(", ")       # Convert to list (string methods no longer apply)
)

['hello', 'world']

Pandas also supports a fluent interface (method chaining).

In [63]:
import pandas as pd

In [64]:
df = pd.DataFrame(data={'col_1': [10, 20], 'col_2': [30, 40]})

In [65]:
df.col_1.tolist()

[10, 20]

<center><h2>Split hack</h2></center>

Use `str.split()` to save keystrokes when creating a list of strings

In [54]:
rainbow_colors = "red orange yellow green blue violet".split()
rainbow_colors

['red', 'orange', 'yellow', 'green', 'blue', 'violet']

In [55]:
# Add another color by just typing the word (not adding "" and ,)
rainbow_colors = "red orange yellow green blue indigo violet".split()
rainbow_colors

['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

In [56]:
# How do we put it back together?
"".join(rainbow_colors)

'redorangeyellowgreenblueindigoviolet'

In [57]:
# The string that .join is called on is the separator
", ".join(rainbow_colors)

'red, orange, yellow, green, blue, indigo, violet'
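A short sketch of `split` and `join` as a round trip. The variable names are assumptions; the caveat about whitespace follows from `str.split()` with no argument treating any run of whitespace as one separator.

```python
colors_text = "red orange yellow green blue indigo violet"

colors = colors_text.split()   # str -> list of words
rebuilt = " ".join(colors)     # list of words -> str
round_trip_ok = rebuilt == colors_text

# Caveat: runs of whitespace collapse, so the round trip
# is only exact for text separated by single spaces.
messy = "red   orange\tyellow"
messy_rebuilt = " ".join(messy.split())

print(round_trip_ok)
print(messy_rebuilt)
```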

<center><h2>String formatting</h2></center> 

__Machine-readable vs human-readable__

The goal of machine-readable output is as much precision as possible.

The goal of human-readable output is to show only what matters for understanding the data and making a decision.



In [66]:
# Let's explore tau
from math import tau

tau

6.283185307179586

Let's format for humans

In [67]:
str(tau)

'6.283185307179586'

In [68]:
# f means formatted string
f"{tau}"

'6.283185307179586'

In [69]:
# Same as above, only with more typing
"{}".format(tau)

'6.283185307179586'

In [76]:
# 4 significant digits, including the digit to the left of the decimal point
f"{tau:.4}"

'6.283'

In [None]:
# 4 digits after the decimal point (fixed-point notation)
f"{tau:.4f}"

In [None]:
# Given a ratio number, return human-centric view as percentage.
number_raw = 0.1234
f"{number_raw:.2%}"

In [None]:
# Automatically rounds
number_raw = 0.999999
f"{number_raw:.2%}"
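For reference, the format specifications used in the cells above can be collected in one place. The variable names are assumptions for illustration.

```python
from math import tau

four_significant = f"{tau:.4}"       # General format: 4 significant digits
four_decimals = f"{tau:.4f}"         # Fixed-point: 4 digits after the decimal point
ratio_as_percent = f"{0.1234:.2%}"   # Multiply by 100, append "%"
rounded_percent = f"{0.999999:.2%}"  # Rounding happens during formatting

print(four_significant)   # 6.283
print(four_decimals)      # 6.2832
print(ratio_as_percent)   # 12.34%
print(rounded_percent)    # 100.00%
```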

String formatting is important when reporting numbers in Data Science.

Your manager or the CEO does __not__ want to see all the precision. It hides the important numbers. The leading digits matter most, so present only those.

Think of string formatting as creating a 'view' of the underlying data for a specific purpose.
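A minimal sketch of the 'view' idea: formatting produces a new string and leaves the underlying number untouched. The variable names are illustrative assumptions.

```python
from math import tau

report_value = f"{tau:.2f}"  # Human-readable view for a report
print(report_value)          # 6.28

# The underlying float keeps its full precision.
unchanged = tau == 6.283185307179586

# Converting the view back to a number loses the dropped digits,
# so keep computing with the original value, not the formatted one.
lossy = float(report_value)
print(unchanged, lossy)
```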

Learn more [here](https://stackabuse.com/formatting-strings-with-python/)

In [None]:
# %load string_formatting_examples.py
"String formatting examples"

from datetime import datetime
from math import tau

# Significant digits
print(f"τ: {tau:.4}")

# Print only 2 decimals
print(f"{tau:.2f}")

# Print as integer
print(f"{tau:.0f}")

# Currency
print(f"${3.66666:.2f}")

# Padding
day = 1
print(f"{day:04}") # Pad to 4 places with zeros

# Align
print("Options: \t {:>6} {:>6} {:>6} {:>6}".format(*"A B C D".split()))

# Center text
print(f"{'foo':^10}") #=> '   foo    '

# Unpack list, then format
data = [1, 2, 3, 4]
print("The numbers are {}, {}, {}, and {}".format(*data))

# Pretty print function name
def my_function():
    pass

print(" ".join(my_function.__name__.title().split("_")))

# Unicode names
print(f' \N{Hatching Chick} ') # 🐣
print(f' \N{long rightwards arrow} ')  # ⟶
print(f' \N{Vulgar Fraction One Quarter} ') # ¼ 

# Replace datetime.strftime()
print(f"{datetime.now():%m/%d/%y}") #=> '05/15/19'

# Show numbers in another base / notation
print(f"{878:b}") #=> '1101101110'

print(f"{878:x}") #=> '36e'

print(f"{878:e}") #=> '8.780000e+02' 


<center><h2>Takeaways</h2></center>

- Default to double quotes to handle human language.
- `str` objects have nice methods, use them.
- Many objects, including `str` and pandas DataFrames, support a fluent interface. Use it to write code more quickly.
- Slice `str` like all other sequence objects.
- `str` objects are immutable. Make a new copy (or, even better, have Python make a new copy for you).
- String formatting allows you to write more human-readable strings, especially for numbers.
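Slicing appears in the takeaways, so here is a brief sketch showing that `str` uses the same slice syntax as a list. The sample values are assumptions for illustration.

```python
greeting = "Hello, World!"

first_word = greeting[:5]       # Characters 0 through 4
last_char = greeting[-1]        # Negative indices count from the end
reversed_text = greeting[::-1]  # A step of -1 walks backwards

# The same syntax works on other sequences, such as lists.
numbers = [1, 2, 3, 4, 5]
first_three = numbers[:3]

print(first_word, last_char, reversed_text, first_three)
```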


Bonus Material
----

One way to update immutable strings
-----

In [46]:
# Strings are immutable, so item assignment raises a TypeError
# string = "Hello, world!"
# string[-1] = "?"

In [47]:
# Update and re-assign pattern
string = "Hello, world!" 
string = string[:-1] + "?"
string

'Hello, world?'
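Slicing is one option; `str.replace` is another. A small sketch comparing the two, with assumed variable names. Note that `str.replace` swaps every occurrence by default, so it only matches the slice version when the character appears once.

```python
greeting = "Hello, world!"

# Slice off the last character, then append the new one
sliced = greeting[:-1] + "?"

# Let Python build the new string; replaces all "!" occurrences
replaced = greeting.replace("!", "?")

# Neither approach changes the original string.
print(sliced, replaced, greeting)
```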

<center><h2>Convert `"".join` to a function</h2></center>

In [58]:
# Convert that odd syntax to a function
cat = "".join # Short for conCATenate

In [59]:
# What is this going to return?
cat(rainbow_colors)

'redorangeyellowgreenblueindigoviolet'

HT: Peter Norvig

<center><img src="https://imgs.xkcd.com/comics/coordinate_precision.png" width="75%"/></center>

In [72]:
# Shuffle the characters of a string

from random import sample

s = 'hello world'
''.join(sample(s, k=len(s)))

'rleoldwl ho'

In [73]:
# Reduce with a string

from functools import reduce

letters = ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!']

reduce(lambda phrase, letter: phrase+letter, letters)

'Hello, world!'
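The `reduce` call above gives the same result as `"".join`. A short sketch of the comparison, using `operator.add` in place of the lambda; the variable names are assumptions.

```python
from functools import reduce
from operator import add

letters = list("Hello, world!")

# reduce concatenates pairwise, creating an intermediate string at each step
via_reduce = reduce(add, letters)

# join is the idiomatic way to concatenate many strings
via_join = "".join(letters)

print(via_reduce == via_join)
```

For everyday code `"".join` is usually the clearer choice; the `reduce` version is mainly useful for seeing how concatenation accumulates.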

More on strings
------

- https://developers.google.com/edu/python/strings
- https://www.listendata.com/2019/06/python-string-functions.html