# Python Objects I: Strings <em style="color: Gray;">(Hill 2.3)</em>
### PHYS 240
### Dr. Wolf

# Strings are Python objects that encode text
- No numerical value whatsoever (though number-like text can be converted to integers, floats, etc.)
- Examples:
    - a password
    - your full name
    - the entire text of Moby Dick
    

# Basics: Literals for Strings

We can use either single quotes: `'Hello, World'` or double quotes: `"Hello, World"`. **There is no difference.** Pick one and stick to it unless you have a good reason to switch.

Example: what if a piece of text has the string delimiter in it?

In [None]:
'Dr. Wolf isn't very good at this'
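
The cell above fails because the apostrophe in `isn't` closes the string early. One simple way out, sketched here with example sentences of our own choosing, is to delimit the string with the *other* kind of quote:

```python
# The apostrophe no longer ends the string because the delimiters are double quotes.
sentence = "Dr. Wolf isn't very good at this"
print(sentence)

# The reverse case: double quotes inside a single-quote-delimited string.
# (This second sentence is a made-up example, not from the lecture.)
quoted = 'She said "hello" and left.'
print(quoted)
```

This is the kind of "good reason to switch" delimiters mentioned above.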

# Basics: Concatenating Strings
**Concatenation** is the act of bringing two or more strings together into a new, combined string.

To concatenate, we can simply use the `+` operator between string literals/variables.

In [None]:
first = 'William'
last = 'Wolf'
full = first + ' ' + last
full

# Escape Sequences
Many characters do not appear on your keyboard (like µ or Ω) or may be difficult to decipher (four spaces or a tab?). **Escape Sequences** are special combinations of characters that Python will interpret as a special character.

Escape sequences always start with a backslash (`\`; same symbol used for most $\rm\LaTeX$ commands).

Most common and basic escape sequences:
- single quote: `\'` (works even in a single-quote-delimited string)
- double quote: `\"` (works even in a double-quote-delimited string)
- linefeed (or newline): `\n` (creates a new line in the text)
- tab: `\t`
- backspace: `\b`
- the backslash character: `\\`
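
As a quick illustration of the list above, here is a minimal sketch that uses several of these escape sequences. The variable names and the sample text (including the `C:\Users` path) are made up for the example.

```python
# Each escape sequence below becomes a single character in the resulting string.
quote_demo = 'It\'s "fine"'    # \' lets an apostrophe live inside single quotes
two_lines = 'first\nsecond'    # \n starts a new line
tabbed = 'name\tvalue'         # \t inserts a tab
path = 'C:\\Users'             # \\ produces one literal backslash

print(quote_demo)
print(two_lines)
print(tabbed)
print(path)
```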

# Escape Sequence Challenge
Let's create a literal that replicates this grocery list. Items are set off by tabs.
```
Bill's Grocery List
    - Diet Coke
    - Newfangled "Lady Gaga" Oreos
```

Note: for the tabs and newlines to render properly on the screen, we need to pass the string to the `print` function. You'll explore this more in lab.

# Unicode Strings
**Unicode** is a standard that allows displaying more than 100,000 different characters (Asian languages, special accents, and even our beloved emoji) from all over the world. Essentially it is a mapping from binary numbers to what a computer screen should display.

### Using `\u` and `\N`
If you know the 16- or 32-bit hex (hexadecimal) value, simply use `\u` or `\U` *immediately* followed by the code. Or, if you know the official Unicode name, you can use `\N{}`, and put the name in all caps inside the curly brackets.

**Note:** `\u` is for 16-bit hexadecimal codes, which have **four characters** of 0–9 and A–F. `\U` is for 32-bit codes, which have **eight characters**. If using `\U`, you may need to put zeros at the beginning of a code to get it to 8.

The internet is your friend in finding these special names, or you can copy and paste characters in manually.

In [None]:
print('\u26C4')
print('\N{FREEZING FACE}')
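
To make the difference between `\u`, `\U`, and `\N` concrete, here is a small sketch that writes one character, the capital omega (code point 03A9), in all three ways. The variable names are just for illustration.

```python
# Three ways to write the same character, U+03A9 (capital omega).
short_code = '\u03A9'                        # 16-bit form: exactly four hex digits
long_code = '\U000003A9'                     # 32-bit form: padded with zeros to eight digits
by_name = '\N{GREEK CAPITAL LETTER OMEGA}'   # official Unicode name

print(short_code, long_code, by_name)
print(short_code == long_code == by_name)    # True: all three are the same string
```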

# Escaping from Escape Sequences: Raw Strings
If you want to render a string EXACTLY as you typed it, without interpreting any escape sequences, you can use a **raw string**. To mark a literal as a raw string, put the letter "r" directly in front of the opening quotation mark.

In [None]:
print("The code for a freezing face (\N{FREEZING FACE}) is " + r'\N{FREEZING FACE}')
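
Raw strings are handy whenever the text itself is full of backslashes. The sketch below uses a hypothetical Windows-style path (not from the lecture) to show what goes wrong without the `r` prefix.

```python
# Hypothetical path, chosen because it happens to contain \n and \t.
normal = 'C:\new_folder\table.txt'   # \n and \t are interpreted as newline and tab
raw = r'C:\new_folder\table.txt'     # every backslash is kept exactly as typed

print(normal)
print(raw)
```

The first `print` produces garbled output spread over two lines; the second shows the path as written.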

# Challenge: Unicorn Emoji
Using the almighty Google, find a way to encode the unicorn emoji (🦄) into a string using the `\u`, `\U`, or `\N` escape sequences and print it out. Do not simply copy and paste the emoji itself from somewhere.

# Getting the Length of a String
We can use Python's built-in `len` function, which computes the lengths of many things (we'll use it more later on other objects).

In [None]:
course = 'PHYS 240'
len(course)

In [None]:
# we can cast integers to strings using the `str` constructor method
# but we'll have more robust ways to convert numbers to strings later
print('"' + course + '" has ' + str(len(course)) + ' characters.')

Note that `len` is a function. It is **not** a ***method*** of strings (we'll deal with those later), so the following fails:

In [None]:
'PHYS 240'.len()

# Indexing and Slicing
To get at individual characters or sequences of characters in a string, we can **slice** the string using **indexes**. First let's focus on getting at individual characters.

## Getting a single character
Imagine that each character in a string occupies a slot that is labeled by an index. The very first slot is labeled "0", the next by "1", and so on. This means that the *last* character is at a position equal to the length of the string **minus 1**. So the string `PHYS 240` would be organized thusly:

```
characters: P|H|Y|S| |2|4|0
indexes:    0|1|2|3|4|5|6|7
```
To retrieve the character located at some index, we add square brackets containing an integer literal or variable to the end of a string literal or variable.

In [None]:
course = 'PHYS 240'
course[3]

# Indexing and Slicing
## Getting the last character
We can get at characters from the end of a string, too. The last character can be referenced as `-1`. The one before it is at position `-2`, etc. So in reality, each character has two valid indexes.

In [None]:
print(course)
print(course[len(course) - 1])
print(course[-1])
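
A short sketch of the "two valid indexes" idea: a negative index `-k` refers to the same slot as `len(course) - k`.

```python
course = 'PHYS 240'

# Positive and negative indexes that point at the same slots.
print(course[0], course[-8])                 # P P
print(course[7], course[-1])                 # 0 0
print(course[-3], course[len(course) - 3])   # 2 2
```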

# Indexing and Slicing
## Getting a substring
We can **slice** a string by providing two indices, producing a **substring**. The first one indicates where we want to start taking characters, and the second indicates the first character that we **don't** want. These two indices are placed in square brackets and delimited by a colon.

In [None]:
print(course)
print(course[0:4])

If you leave off the first index (and start with a colon), the substring is assumed to start from the beginning, and if you leave off the second one, it is assumed to go on until it runs out of characters.

In [None]:
print(course[:4])
print(course[-3:])

# Indexing and Slicing
## Setting the stride
A third, optional, number in the list of indices for a slice sets the **stride**. The stride says, "take every n<sup>th</sup> character", where n is the stride. By default, it is 1. Notably, passing a stride of `-1` will walk the string backwards.

In [None]:
# every second character, starting at the one at index 1
print(course[1::2])
# all characters, but in reverse order
print(course[::-1])

# Indexing and Slicing
## Determining if a substring is in a string
You can check for the inclusion of a substring within a string using the Python keyword `in`.

In [None]:
course = 'PHYS 240'
'PHYS' in course

It returns a boolean (`True` or `False`), indicating whether any contiguous substring of the string is equal to the one being checked for.
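
A few more cases, sketched with made-up variable names, show what "contiguous" means in practice and that the check is case-sensitive:

```python
course = 'PHYS 240'

has_number = '240' in course   # True: '240' appears as one unbroken piece
lowercase = 'phys' in course   # False: the comparison is case-sensitive
scattered = 'P2' in course     # False: 'P' and '2' are both present, but not side by side

print(has_number, lowercase, scattered)
```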

# Challenge: Find the Hidden Message!
There is a hidden message in the string below. Try different combinations of starting characters, ending characters, and strides to see if you can find it!

In [None]:
original = '?P!%so agll zeo;2t:p nny ja#kW'
# play around with the slicing to see if you can find the hidden message.
print(original[0::1])

# String Methods
Strings, unlike numbers, have a large number of useful **methods**. Remember, methods are little functions attached to string objects that perform useful operations on them. We'll go over a few important ones, but you should look at Table 2.11 in Hill for more examples.

## `center(width)`
Creates a new string that contains the original string, padded with whitespace on either side so that it is centered and of a fixed width.

In [None]:
course.center(20)
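
The padding is easier to see with `repr`, a built-in (not covered in this lecture) that shows a string with its quotes. The sketch below also uses the optional second argument of `center`, a fill character.

```python
course = 'PHYS 240'

centered = course.center(20)
print(repr(centered))       # '      PHYS 240      '
print(len(centered))        # 20

starred = course.center(20, '*')   # optional second argument: the padding character
print(starred)              # ******PHYS 240******

unchanged = course.center(4)       # width smaller than the string: nothing is cut off
print(unchanged)            # PHYS 240
```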

# String Methods
## `strip([chars])`
With no arguments, removes all whitespace on either side of a string. If characters are given as a single string argument, those characters will be removed from the returned string if they are at the beginning or end.

**Note:** `rstrip` and `lstrip` variants work only on the right and left ends of the string, respectively.

In [None]:
# new concept: method chaining! calling center returns a new string, which we can then immediately call methods like strip on
course.center(20).strip()

In [None]:
course.strip('P0')
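
Here is a small sketch comparing `strip` with its one-sided variants. The last line, which uses a made-up example word, is a reminder that the argument is treated as a *set* of characters to remove, not as a substring.

```python
course = 'PHYS 240'

both = course.strip('P0')     # 'HYS 24'  : trims from both ends
left = course.lstrip('P0')    # 'HYS 240' : trims only the left end
right = course.rstrip('P0')   # 'PHYS 24' : trims only the right end
print(both, left, right, sep='|')

# Any mix of the listed characters is removed from the ends, in any order.
set_demo = 'banana'.strip('ab')
print(set_demo)               # nan
```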

# String Methods
## `index(substring)`
Returns the index where a substring (possibly a single character) **first begins**. Raises an error if the substring isn't found (be careful!).

In [None]:
course.index('240')

In [None]:
# try searching for a substring that isn't in `course`
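
If an error is not what you want, one alternative (not covered in this lecture, so treat it as an aside) is the related string method `find`, which returns `-1` instead of raising an error when the substring is missing.

```python
course = 'PHYS 240'

position = course.index('240')   # 5: the substring starts at index 5
found = course.find('240')       # 5: same answer when the substring exists
missing = course.find('241')     # -1: no error, just a sentinel value
print(position, found, missing)

# course.index('241') would raise "ValueError: substring not found"
```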

# String Methods
## `replace(old, new)`
Returns a new string in which every instance of a particular substring is replaced with a new one. The original string is left unchanged.

In [None]:
# let's fix this. Replace "terrible" with "terrific"
'Professor Wolf is a terrible professor doing a terrible job.'
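
One possible way to carry out the fix, sketched with variable names of our own choosing:

```python
review = 'Professor Wolf is a terrible professor doing a terrible job.'

# Both occurrences of 'terrible' are replaced in the new string.
fixed = review.replace('terrible', 'terrific')
print(fixed)    # Professor Wolf is a terrific professor doing a terrific job.

# The original string is untouched; replace returned a new one.
print(review)
```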

# Advanced Usage of `print`
What does `print` really do? In short, it converts each of its arguments to a string (or fails in trying) and then displays the result on the screen, so that characters such as newlines and tabs appear as actual line breaks and tabs, ultimately returning `None`.

But it also has some other tricks up its sleeve.

## Printing multiple objects
By providing multiple arguments that are comma-separated, you can print multiple strings in one command.

In [None]:
print('The', 'Power', 'of', '[AND]')

# Using `print`: Changing the Separation String
When printing multiple strings, default behavior is to separate the strings by spaces. We can customize this by using the optional **keyword argument** `sep`.

A keyword argument is an argument given near the end of a function call that has the format `KEYWORD=VALUE` where `KEYWORD` is the special name of the argument, and `VALUE` is the actual value.

In [None]:
print('The', 'Power', 'of', '[AND]', sep="\n")

Note that the last argument is the keyword argument. This order is important! Keyword arguments must always go at the end of a function call, after all the ordinary (positional) arguments.

# Using `print`: Changing the Termination String
You might not notice it, but `print` automatically appends a newline (`"\n"`) to the end of every sequence of strings passed to it. That is, subsequent calls to `print` will result in text appearing on a new line.

We can customize what string, if any, is added to the end of a call to `print` by using the `end` keyword argument.

In [None]:
# add nothing to end of string
print('The', end='')
# add a single space after printing the string
print('Power', end=' ')
# add a space and an opening square bracket
print('of', end=' [')
# add closing square bracket
print('AND', end=']')
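
To see exactly which characters those four calls produce, the sketch below redirects them into an in-memory text buffer using `print`'s `file` keyword argument and `io.StringIO`. Both are outside the scope of this lecture and are used here only as an inspection tool.

```python
import io

screen = io.StringIO()   # in-memory stand-in for the screen

# The same four calls as above, written to the buffer instead of the screen.
print('The', end='', file=screen)
print('Power', end=' ', file=screen)
print('of', end=' [', file=screen)
print('AND', end=']', file=screen)

captured = screen.getvalue()
print(repr(captured))    # 'ThePower of [AND]'
```

Because the first call ends with an empty string, "The" and "Power" run together, and no newline appears anywhere in the result.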

# Formatting Strings
The `format` method of Python strings allows for fine control over how data should be **interpolated** (or turned into a string). Most important application for us: formatting numbers (How to do scientific notation? How many digits of precision?).

Here's how it works in its simplest form.
- Create a string with one or more pairs of curly braces in it. Within the curly braces, optionally embed **format codes**, which Python uses to format the inserted values.
- Call that string's `format` method, and provide a series of arguments, one for each pair of curly braces.
- Python then converts each argument passed to `format` into text, according to the format code in its corresponding pair of curly braces, and inserts it there.

# Formatting Strings Example: Simple Interpolation
Suppose we have several string variables that we simply want to insert into a template string.

In [None]:
student = 'Ben Bitdiddle'
course = 'PHYS 240'
print('Hello, {}, and welcome to {}'.format(student, course))

The curly braces represent where the arguments to `format` should be inserted. You can also get fancy with specifying *which* arguments you want to use by inserting the index (starting at zero) of the argument to insert within the curly braces.

In [None]:
print('Welcome to {0}\'s class, where the one and only {0} will teach you to code just as professionally as {0} does!'.format('Bill Wolf'))
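
Indexes also let you insert arguments in a different order than they were passed. The sketch below shows that, plus named fields, a feature of `format` that this lecture does not cover. The template strings are made up for illustration.

```python
# Indexes pick arguments by position, so the order in the template is free.
by_index = '{1}, {0}: enrolled in {2}'.format('Ben', 'Bitdiddle', 'PHYS 240')
print(by_index)   # Bitdiddle, Ben: enrolled in PHYS 240

# Named fields are filled from keyword arguments with matching names.
by_name = '{student} is in {course}'.format(student='Ben Bitdiddle', course='PHYS 240')
print(by_name)    # Ben Bitdiddle is in PHYS 240
```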

# Formatting Strings Example: Centering a String

In [None]:
'{:^40}'.format('The Power of [AND]')

Let's break this down.
- Main string: `'{:^40}'`. This string is JUST a format code. Format codes start with a colon, and the interesting part is `^40`. This means, center the string (`^`) and make it 40 characters wide.
- We call the `format` method, which seeks out curly brace pairs, and then applies the format code in each to the corresponding argument.
- Argument list: `'The Power of [AND]'`. This is the object (in this case, another string) that is to be formatted. It has the format code applied to it, and appears as the centered version of itself in the output.
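
Centering is one of three alignment options. As a sketch, here are left (`<`), right (`>`), and centered (`^`) versions of the same string, the last one with `*` as a fill character placed before the alignment symbol. The width of 24 is arbitrary.

```python
title = 'The Power of [AND]'   # 18 characters long

left = '{:<24}'.format(title)       # padded on the right
right = '{:>24}'.format(title)      # padded on the left
starred = '{:*^24}'.format(title)   # centered, padded with * instead of spaces

print(repr(left))
print(repr(right))
print(starred)                      # ***The Power of [AND]***
```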

# Formatting numbers

In [None]:
"The square root of 2 is {:.3f} and Avogadro's number is {:.2e}.".format(2**(1/2), 6.0221415e23)

Here we have a more interesting base string that has two format codes embedded in it.

1. standard (i.e. not scientific notation) floating point number with three places after the decimal point
2. scientific notation with a lower case "e" separating the mantissa from the power of 10 and two digits after the decimal point

# Format code details (paraphrased from [realpython](https://realpython.com/python-formatted-output/#the-format_spec-component))

General structure of a format code:

```
:[[<fill>]<align>][<sign>][#][0][<width>][<group>][.<prec>][<type>]
```
| Component | Description |
|:---------:|:------------|
|`fill` |What to pad extra characters with|
|`align`|left, right, or center align|
|`sign` |Whether or not to always provide an explicit sign|
|`#`|Whether to use alternate format (special)|
|`0`|Whether to pad extra spaces with zeros|
|`width`|Minimum width of formatted string|
|`group`|Character to use as a thousands separator|
|`.prec`|How many digits after the decimal point to display|
|`type`|Broad type of data to be displayed|

**Most** are optional, but too many to go over! We'll look at the most important: type, width, precision.
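
For the curious, here is a brief sketch that exercises a few of the components we won't cover in detail (sign, zero padding, grouping, alternate form, and fill). The numbers are arbitrary.

```python
signed = '{:+08.2f}'.format(3.14159)   # sign, zero padding, width 8, precision 2
grouped = '{:,}'.format(1234567)       # comma as the thousands separator
alternate = '{:#x}'.format(255)        # alternate form adds the 0x prefix to hex
filled = '{:*>10}'.format('hi')        # fill with *, right-align, width 10

print(signed)      # +0003.14
print(grouped)     # 1,234,567
print(alternate)   # 0xff
print(filled)      # ********hi
```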

# Format Codes: types
| Value | Presentation Type | Example |
|:------|:------------------|:-------:|
|`b`    |Binary integer     |`'{:b}'.format(15) => '1111'` |
|`c`    |Single character   |`'{:c}'.format(65) => 'A'`|
|`d`    |Decimal integer    |`'{:d}'.format(1_000) => '1000'`|
|`e` or `E`|Exponential     |`'{:E}'.format(6.02e23) => '6.020000E+23'`|
|`f` or `F`|Floating Point  |`'{:f}'.format(1.34e-5) => '0.000013'`|
|`g` or `G`| Floating point/Exponential|`'{:g}'.format(1.34e-5) => '1.34e-05'`|
|`o`    |Octal integer      |`'{:o}'.format(17) => '21'`|
|`s`    |String             |`'{:s}'.format('Hello!') => 'Hello!'`|
|`x` or `X`| Hexadecimal integer|`'{:X}'.format(28) => '1C'`|
|`%`    |Percentage|`'{:%}'.format(0.0249) => '2.490000%'`|

# Format Codes: width
The width sets the **minimum** number of characters in the output string.

In [None]:
# decimal integer is only 2 characters wide, but we force the string to be four wide
'{:4d}'.format(10)

Combine with an alignment for nice effect!

In [None]:
'{:^20d}'.format(10)
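
Two details worth seeing in action, sketched below with arbitrary values: without an explicit alignment, numbers are right-aligned and strings are left-aligned, and a value wider than the requested width is never cut off.

```python
number = '{:4d}'.format(10)        # numbers are right-aligned by default
text = '{:6s}'.format('ab')        # strings are left-aligned by default
too_long = '{:2d}'.format(12345)   # width is only a minimum, so nothing is truncated

print(repr(number))     # '  10'
print(repr(text))       # 'ab    '
print(repr(too_long))   # '12345'
```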

# Format Codes: precision
For floating point numbers, precision goes **after the width** (if you provide one) and a period, but **before the type** (one of `f/F`, `g/G`, `e/E`, or `%`).

In [None]:
'{:20.4e}'.format(6.02214076e23)

In [None]:
'{:.2%}'.format(0.7733529)
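
For reference, this sketch shows what the two cells above evaluate to, plus one more combination of width and precision using the square root of 2.

```python
avogadro = '{:20.4e}'.format(6.02214076e23)   # width 20, four digits after the point
percent = '{:.2%}'.format(0.7733529)          # multiplied by 100, two digits, % sign
root = '{:8.3f}'.format(2 ** 0.5)             # width 8, three digits after the point

print(repr(avogadro))   # '          6.0221e+23'
print(repr(percent))    # '77.34%'
print(repr(root))       # '   1.414'
```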

# New(ish) Feature! `f`-strings
Written like raw strings, but with an `f` prefix instead of `r`. In an f-string (available in Python 3.6 and later), we can embed values and their format codes directly into a string.

In [None]:
prof = 'Dr. Wolf'
course = 'PHYS 240'
print(f'{prof} is teaching {course} this semester.')

You can use the colon+format code syntax to format numbers, too!

In [None]:
from math import pi
print(f"\N{GREEK SMALL LETTER PI} = {pi:.5f}")
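
The curly braces of an f-string can hold any expression, not just a variable name. As a sketch, here is the earlier `len` example rewritten without concatenation or `str`, along with a number in scientific notation.

```python
course = 'PHYS 240'
avogadro = 6.0221415e23

# A function call inside the braces is evaluated and inserted.
length_line = f'"{course}" has {len(course)} characters.'
print(length_line)   # "PHYS 240" has 8 characters.

# Format codes work exactly as they do with the format method.
number_line = f"Avogadro's number is {avogadro:.2e}."
print(number_line)   # Avogadro's number is 6.02e+23.
```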