<img src='img/logo.png' />

<img src='img/title.png'>

<img src='img/py3k.png'>

# Table of Contents
* [Data types & data structures](#Data-types-&-data-structures)
* [Learning Objectives:](#Learning-Objectives:)
* [`None` object](#None-object)
* [Numeric types](#Numeric-types)
	* [`bool` (boolean) type](#bool-%28boolean%29-type)
	* [`int` (integer) type](#int-%28integer%29-type)
	* [`float` (floating-point number) type](#float-%28floating-point-number%29-type)
	* [`complex` (complex number) type](#complex-%28complex-number%29-type)
	* [Exercise (Python as a calculator)](#Exercise-%28Python-as-a-calculator%29)
* [`str` (string) type](#str-%28string%29-type)
	* [String methods](#String-methods)
	* [String indexing](#String-indexing)
		* [Immutability of Python `str` type](#Immutability-of-Python-str-type)
	* [String slicing](#String-slicing)
	* [String conversions and `format`](#String-conversions-and-format)
		* [Format strings (old-style)](#Format-strings-%28old-style%29)
		* [The `str.format()` mini-language](#The-str.format%28%29-mini-language)
		* [Details on the `str.format()` mini-language](#Details-on-the-str.format%28%29-mini-language)

# Data types & data structures

# Learning Objectives:

After completion of this module, learners should be able to:

* use & distinguish builtin Python numeric types: `bool`, `int`, `float`, `complex`
* use & explain Python rules for type conversion & casting (e.g., combining operators & types)
* apply common methods associated with builtin Python data types
* use `help` (and other documentation) to learn about methods associated with builtin types
* apply string indexing rules
* apply the `str.format` mini-language to generate formatted output

# `None` object

Although not really a data structure, the special value `None` in Python is often used where `NULL` or `nil` or `Nothing` are used in other languages.  It is a frequent placeholder to say, "We don't yet know how to handle this item, but we know to check whether it *is* `None`."  Hence code like this is common:

```python
for item in collection:
    if item is not None:
        process(item)
    else:
        pass
```

`None` is also special in that it is a *singleton*.  That is to say, there can only be one `None` object in a Python program, and hence every `None` is not merely equal to, but *identical to*, every other `None`.
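
As a small illustration of the singleton property, the sketch below uses a hypothetical helper, `first_or_none` (not part of Python or of this module), that returns `None` when it has nothing to return. Because only one `None` object exists, an identity test with `is` is the usual way to check for it.

```python
def first_or_none(items):
    """Hypothetical helper: return the first item, or None if there is none."""
    return items[0] if items else None

result = first_or_none([])

print(result is None)      # identity test, the idiomatic check
print(result == None)      # equality also holds, but `is` is preferred
print(type(None))          # None is the sole instance of NoneType
```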

# Numeric types

Python has three distinct built-in numeric classes (or types): integers, floating-point numbers, and complex numbers. Numbers in Python are instantiated from numeric literal expressions in code or are the results returned by functions, methods, and operators.

## `bool` (boolean) type

Booleans are a subclass of integers. There are two values: `True` and `False`.

In [None]:
# Booleans: Is the Boolean value "True" simply the integer value "1"?
True == 1

In [None]:
False < 1 # Arithmetic comparison equivalent to "0<1"

In [None]:
(True + 1) 

In [None]:
isinstance(True, bool)

In [None]:
True is 1 # This is not quite what some expect

In [None]:
type(True), type(1)

In [None]:
issubclass(bool, int)

## `int` (integer) type

Numeric literals that contain no decimal point and no `e` (which express floating-point numbers) and no trailing `j` (which express complex numbers) are (decimal) integers.

In [None]:
a = 12345678     # Replace with any sequence of digit characters without spaces
print(a,type(a))

Representations of integers in certain numeral bases other than 10 can be entered as numeric literals: `0x` or `0X` prefix literal hexadecimal integers, `0o` or `0O` prefix literal octal integers, and `0b` or `0B` prefix literal binary integers.

In [None]:
# Hexadecimal integers
print(0xFF03) # 15*16**3 + 15*16**2 +0*16**1 +3*16**0
print(0XFF03) # 15*16**3 + 15*16**2 +0*16**1 +3*16**0
print(15*16**3 + 15*16**2 +0*16**1 +3*16**0) # Verify result above

In [None]:
# Octal integers
print(0o76543) # 7*8**4 + 6*8**3 + 5*8**2 + 4*8**1 + 3*8**0
print(0O76543)
print(7*8**4 + 6*8**3 + 5*8**2 + 4*8**1 + 3*8**0) # Verify result above

In [None]:
# Binary integers
print(0b10010011) # 1*2**7 + 0*2**6 + 0*2**5 + 1*2**4 + 0*2**3 + 0*2**2 + 1*2**1 + 1*2**0
print(0B10010011) # 1*2**7 + 0*2**6 + 0*2**5 + 1*2**4 + 0*2**3 + 0*2**2 + 1*2**1 + 1*2**0
# Verify result above
print(1*2**7 + 0*2**6 + 0*2**5 + 1*2**4 + 0*2**3 + 0*2**2 + 1*2**1 + 1*2**0) 

In [None]:
# A number in base 6
int("3421", 6)
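
The conversions above also run in the other direction. The following sketch, which uses only built-in functions, shows how an integer can be rendered in base 2, 8, or 16, and how `int(text, base)` parses such text back into an integer.

```python
n = 0xFF03                    # the hexadecimal literal used above

# Integer -> string, with a base prefix
print(bin(n), oct(n), hex(n))

# Integer -> string, without a prefix, via format specifiers
print(format(n, 'b'), format(n, 'o'), format(n, 'x'), format(n, 'X'))

# String -> integer, for any base from 2 to 36
print(int('ff03', 16), int('10010011', 2), int('3421', 6))

# Round trip: parsing the formatted text recovers the original value
print(int(format(n, 'x'), 16) == n)
```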

In Python 3, integers have no fixed size; their length is limited only by available memory. When an arithmetic result would overflow the current representation, Python automatically switches to a representation with a longer bit pattern.

In [None]:
# Some integers
print(2**8)
print(2**32)
print(2**64)
print(2**65)
print(2**129)

In [None]:
# Even very large integers can be represented in Python
2**(2**16)
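
One way to watch the representation grow is the `int.bit_length()` method, which reports how many bits a value needs. This is only a sketch for exploration; the exponents are the ones printed above.

```python
for exponent in (8, 32, 64, 65, 129):
    value = 2**exponent
    # bit_length() is the number of bits needed to represent the value;
    # len(str(value)) is the number of decimal digits
    print(exponent, value.bit_length(), len(str(value)))

# Arithmetic stays exact well past the 64-bit boundary
print((2**64 + 1) - 2**64)
```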

In addition to standard arithmetic operations, certain bitwise operations can be applied to integers.

In [None]:
print(3 << 4)  # shift left (int only)
print(33 >> 4) # shift right (int only)
print(3 & 4)   # bitwise and (int only)
print(3 | 4)   # bitwise or (int only)
print(3 ^ 4)   # bitwise xor (int only)
print(~ 3)     # bitwise not (int only)
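
A common use of these operators is packing several on/off flags into one integer. The flag names below are hypothetical and chosen only for illustration; `bin()` is used to make the bit patterns visible.

```python
# Hypothetical permission flags, one bit each (names are illustrative only)
READ, WRITE, EXECUTE = 0b100, 0b010, 0b001

perms = READ | WRITE              # set two flags
print(bin(perms))                 # show the bit pattern
print(bool(perms & WRITE))        # test whether WRITE is set
print(bool(perms & EXECUTE))      # test whether EXECUTE is set

perms ^= WRITE                    # toggle WRITE off
print(bin(perms))

# Shifts multiply or floor-divide by powers of two
print((3 << 4) == 3 * 2**4, (33 >> 4) == 33 // 2**4)
```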

## `float` (floating-point number) type

Real number literals (expressed in base 10 scientific notation) are distinguished from integer literals by either a decimal point or an explicit mantissa and exponent separated by `e` or `E` (optionally with a decimal point as well).

In [None]:
a = 4.732
print(a, type(a))

In [None]:
print(123e2)     # 12300; "123e2" means "123 times 10**2"
print(456E-4)    # 0.0456; "456e-4" means "456 times 10**(-4)"
print(-.7436e3)  # -743.6
print(1476.3e20) # 1.4763*10**23

In [None]:
-12345.6e78

In [None]:
# "int(23.45678e4)" means "23.45678 times 10 to the power 4 truncated to an integer"
int(23.45678e4)

By default, Python floating-point numbers are stored internally using 8 bytes (i.e., double precision or `double` in C). Greater precision is attainable using the `decimal` and `numpy` modules. Specific details about internal representation of floating-point numbers can be determined using `sys.float_info`.
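
As a brief sketch of the `decimal` option mentioned above: `Decimal` values built from strings are stored exactly in base 10, and the working precision is adjustable through the module's context. The choice of 50 digits here is arbitrary.

```python
from decimal import Decimal, getcontext

# Binary floats cannot represent 0.1, 0.2, or 0.3 exactly
print(0.1 + 0.2 == 0.3)

# Decimal values built from strings are stored exactly in base 10
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))

# The working precision (in significant digits) is adjustable
getcontext().prec = 50
print(Decimal(1) / Decimal(7))
```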

Some fundamental facts to know about floating-point numbers:
* The largest positive floating-point value is about $10^{308}$ in double precision; larger values *overflow* to $+\infty$.
* The smallest positive floating-point value is about $10^{-324}$ in double precision; smaller values *underflow* to 0.
* Double precision floating-point values are represented with a 52-bit mantissa (plus one implicit bit). In practice, this translates to roughly 15&ndash;16 decimal digits of precision at best.
* Certain computations&mdash;e.g., $-\infty/\infty$, etc.&mdash;result in *Not-a-Number* (also denoted *NaN* or *nan*).
* More details on floating-point numbers & arithmetic:
    * [Floating-point numbers](https://en.wikipedia.org/wiki/Floating_point)
    * [The Floating-Point Guide](http://floating-point-gui.de/formats/fp/)
    * [IEEE 754 standard](http://en.wikipedia.org/wiki/IEEE_754-2008)

IEEE-754 arithmetic approximates the behavior of real numbers in mathematics using a fixed and moderate amount of memory for each number. Because the format is inexact, in many cases it does not precisely obey mathematically expected properties such as [associativity](https://en.wikipedia.org/wiki/Associative_property), [distributivity](https://en.wikipedia.org/wiki/Distributive_property), and the existence of a [multiplicative inverse](https://en.wikipedia.org/wiki/Multiplicative_inverse).

In an old discussion on the comp.lang.python Usenet group, a commenter noted that "anyone who claims to understand IEEE-754 floating point math fully is either a liar or Tim Peters!"  Tim Peters (author of the Zen of Python, inventor of the widely used [Timsort](https://en.wikipedia.org/wiki/Timsort), and contributor #2 to Python itself) replied "It could be both."

In [None]:
import sys
sys.float_info
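
The individual fields of `sys.float_info` correspond to the facts listed earlier. The sketch below prints a few of them and checks the underflow boundary; it assumes the usual IEEE-754 double-precision `float`, which is what CPython uses on common platforms.

```python
import sys

info = sys.float_info
print(info.max)        # largest finite float
print(info.min)        # smallest positive *normalized* float
print(info.epsilon)    # gap between 1.0 and the next larger float
print(info.mant_dig)   # bits of mantissa precision, implicit bit included
print(info.dig)        # decimal digits that always survive a round trip

# Values below info.min are still representable as subnormals, with reduced
# precision, which is why underflow to 0 occurs only near 1e-324
print(5e-324 > 0.0, 1e-324 == 0.0)
```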

In [None]:
-1.23456e310 # overflows to -infinity

In [None]:
float('Inf') / float('-inf') # Evaluates to +inf / -inf == nan

In [None]:
for exponent in range(308, 400):
    float_string = "1e-{:d}".format(exponent)
    print("Attempting to represent {} as a float...".format(float_string))
    float_val = float(float_string)
    if float_val == 0:
        print("Underflow to 0 at {}".format(float_string))
        break

`nan` means "Not a Number"; it results from, e.g., `inf/inf`, `inf-inf`, or almost any arithmetic operation involving `nan`.

In [None]:
inf = float('inf')
inf-inf, inf/inf

Every infinity is equal to, but not necessarily identical to, every other infinity of the same sign.  However, every NaN is unequal to every other NaN (and even to itself).

In [None]:
-1.23456e310 == -inf

In [None]:
inf == inf+2 == float('inf')

In [None]:
inf is inf+2

In [None]:
inf-inf == inf/inf

In [None]:
float('nan') == float('nan')
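
Since a NaN never compares equal to anything, `==` cannot be used to detect one. A minimal sketch of the usual alternative, using the standard `math` module:

```python
import math

nan = float('nan')
inf = float('inf')

print(nan == nan)            # NaN is unequal even to itself
print(math.isnan(nan))       # the reliable test for NaN
print(math.isinf(-inf))      # recognizes infinities of either sign
print(math.isfinite(1e308))  # large, but still finite

# Scanning with isnan finds NaN values that an == comparison would miss
values = [1.0, inf - inf, 2.0]
print(any(math.isnan(v) for v in values))
```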

Comparing floating-point values for equality is generally inadvisable. Minor rounding errors in the least significant bits can prevent even simple calculations from producing the expected results. It is usually better to specify a small tolerance (cf. the square-root iteration from Module 1) to test for approximate equality of floating-point values.

In [None]:
b = sum([1/7]*7) # equivalent to "1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7"
print("1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 != 1.0")
print(b, "!=", 1.0)

In [None]:
# Grouping matters: floating-point addition is not associative
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b)

In [None]:
# Grouping can even decide whether an intermediate result overflows
a = (1e307*100) / 100
b = 1e307 * (100/100)
print(a, b)

In [None]:
delta = 0.0001   # Set our tolerance delta

abs(3.14159265 - 3.1415) < delta

In [None]:
type(delta)
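
The hand-written tolerance test above has a standard-library counterpart, `math.isclose` (available since Python 3.5). The sketch below applies it to the sums computed earlier; the `abs_tol` value is an arbitrary choice for illustration.

```python
import math

a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a == b)                # exact comparison is defeated by rounding
print(math.isclose(a, b))    # relative tolerance, 1e-09 by default

# Near zero a relative tolerance is too strict, so supply an absolute one
residual = 0.1 + 0.2 - 0.3
print(math.isclose(residual, 0.0))
print(math.isclose(residual, 0.0, abs_tol=1e-12))
```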

It is also important to be wary of the distinction between integer ("floor") division with `//` as opposed to regular floating-point division with `/`. In addition, dividing by zero raises an exception.

Notice Python permits mixed arithmetic; when values of `int` and `float` type are combined in arithmetic expressions, the `int` is promoted ("cast") to a `float`. The type of a value can be explicitly cast using the constructors `int()` or `float()` (with appropriate rounding/truncation).
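
The promotion and truncation rules just described can be checked directly. This short sketch also contrasts `int()` with `round()` and with floor division, since the three treat negative and halfway values differently.

```python
print(type(3 + 4.0))            # int + float gives float

# int() truncates toward zero; it does not round
print(int(3.99), int(-3.99))

# round() goes to the nearest integer, with exact ties going to the even neighbour
print(round(2.5), round(3.5))

# Floor division rounds toward negative infinity, so it differs from
# truncation for negative operands
print(-7 // 2, int(-7 / 2))

# float() converts integers, and also strings holding numeric literals
print(float(3), float('2.5e-3'))
```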

In [None]:
1.0/0

In [None]:
print(1/5)

In [None]:
# Types of division (different from Python 2.x)
print(2/3)
print(2//3)
print(2.0//3.0)

In [None]:
denominators = [3, 4, 6, 0, 3]
for d in denominators:
    print("d = %d" % d)
    print(7/d)

In [None]:
denominators = [3, 4, 6, 0, 3]
for d in denominators:
    print("d = %d" % d)
    try:
        print(7/d)
    except ZeroDivisionError:
        print("Attempt to divide by zero")

## `complex` (complex number) type

In mathematics, it is common to refer to the square root of $-1$ as $i$ or $j$; in Python, we'll use the symbol $\mathtt{j}$ to denote $\sqrt{-1}$. Then *complex numbers* are expressible as a combination of the form $x+yj$ where $x$ and $y$ are real numbers.

* In the expression $x+yj$, $x$ is said to be the *real* part and $y$ is said to be the *imaginary* part.
* In Python, a complex numeric literal is (a) a real numeric literal with the symbol `j` as a suffix or (b) a real numeric literal added to or subtracted from a real numeric literal with the suffix `j`. Along with the exponent marker `e` and the base prefixes `0x`, `0o`, and `0b`, this is one of the few cases in Python where a token mixing numerals and alphabetic characters can begin with a numeral.

In [None]:
# Complex numbers
print(4.56e-3+7.5e1j)
complex_one = 1 + 0j 
print(complex_one == 1.0)
print(complex_one is 1.0)
type(complex_one)   # complex_one has type "complex" even though the imaginary part is zero

When an `int` or a `float` value is combined in an arithmetic expression with a `complex` value, the result is cast to a `complex`. The type of a value can be explicitly cast using the constructors `int()`, `float()`, or `complex()` (with rounding or zeros introduced appropriately).

In [None]:
a = 3  # Try replacing with various integer or floating-point values
print(type(a))
a += complex_one # casts resulting value to a complex value
print(a, type(a))

In [None]:
x, y = 3, 4.0
print("x is of type {} and y is of type {}".format(type(x), type(y)))
# cast to "higher type" as needed
print("x + y == {} is of type {}".format(x+y, type(x+y))) 

In [None]:
x, y = complex(3,4), 4.0
print("x is of type {} and y is of type {}".format(type(x), type(y)))
# cast to "higher type" as needed
print("x * y == {} is of type {}".format(x*y, type(x*y))) 

* Python `complex` values are essentially represented as a pair of Python `float` values.
* Complex numbers are not ordered; as such, comparisons with the "less than" and "greater than" operators raise a `TypeError` when applied to complex values.
* If `z==x+y*1j` is a Python complex value, the Python object `z` has attributes *`z.real==x`* and *`z.imag==y`* corresponding to the real and imaginary parts respectively.
* The function `abs` returns the *modulus* of a complex value (i.e., $\mathtt{abs}(x+yj)=\sqrt{x^2+y^2}$).
* If `z==x+y*1j==complex(x,y)` (where `x` & `y` are `float` or `int` values), the method `z.conjugate()` returns the value `x-y*1j` (the *complex conjugate* of `z`).
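
The properties in this list can be tried out in a few lines. The sketch below also uses the standard `cmath` module (the complex counterpart of `math`) and sorts complex values by an explicit key, since they have no natural ordering.

```python
import cmath

z = complex(3, 4)                      # same value as 3 + 4j

print(z.real, z.imag)                  # parts are stored as floats
print(abs(z))                          # modulus
print(z.conjugate())                   # complex conjugate
print(z * z.conjugate() == abs(z)**2)  # z times its conjugate is |z|**2

# cmath functions accept and return complex values
print(cmath.sqrt(-1))

# Complex values have no ordering, but they can be sorted by an explicit key
print(sorted([3+4j, 1+0j, -2j], key=abs))
```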

In [None]:
# Raises TypeError: complex numbers have no ordering
1+1j < -1-.5j

In [None]:
3+4j < 4+3j

In [None]:
1+0j == 1, 2+0j == 2, 1 < 2

In [None]:
# Also a TypeError, even though both imaginary parts are zero
1+0j < 2+0j

In [None]:
z = -1.43e-1+0.5e2j
print("real(z) = {:.3e}\nimag(z) = {:.3e}".format(z.real, z.imag))
print("conjugate(z) =", z.conjugate())

In [None]:
abs(3+4j), abs(4+3j), abs(3+4j) < abs(4+3j)

In summary, when working with numeric data types in Python, the [Python documentation on numeric types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex) is of great value. When trying to understand how certain operators are computing results, it is useful to keep in mind that the results can differ when the operands are of different numeric type.

In [None]:
from builtins import pow   # make sure `pow` is the built-in, not a shadowing name
print(3 + 4)          # addition
print(3 - 4)          # subtraction
print(3 * 4)          # multiplication
print(3 / 4)          # "true division"
print(3 // 4)         # floor division
print(13 % 4)         # modulo
print(3 ** 4)         # power
print(abs(3-4))       # absolute value
print(pow(3.0j,4))    # expect 81+0j
print(pow(3, 4, 5))   # power with optional modulo: (3**4) % 5
print(divmod(3, 4))   # division with remainder
print(int(3.14))      # convert to an int
print(float(3))       # convert to a float

## Exercise (Python as a calculator)

Play around with evaluating numeric expressions you'd like to calculate.  Perhaps you want to use capabilities in the `math` module we have seen briefly.  

Does anything seem surprising in the syntax or available functions? What did you learn about Python syntax and semantics?

# `str` (string) type

Strings (or `str` objects) are textual data with *delimiters* to denote where the string starts and ends. String literals are constructed with single quote characters (i.e., `'`), double quote characters (i.e., `"`), or a trio of single or double quote characters (i.e., `'''` or `"""`) as delimiters. Triple-quoted strings can span multiple lines; all enclosed whitespace, including line breaks, is included in the string.

In [None]:
a = 'a string'
print(a, type(a))

In [None]:
string_1 = 'Single quotes as delimiters permit "double" quotes inside.'
string_2 = "Double quotes as delimiters don't have problems with 'single' quotes inside."
string_3 = '''
Triple (single) quotes don't have problems with 'single' or "double" quotes
inside. They don't even have problems with line breaks!
'''
print(string_1)
print(string_2)
print(string_3)

To embed a single quote character (one or more) within a string delimited by single quotes, a backslash character is needed as an *escape character*. The same applies for double quote characters embedded within strings delimited by double quote characters. Other escaped string literals can be found in the [Python documentation](https://docs.python.org/3/reference/lexical_analysis.html#strings). Notice as of Python 3, string characters are [Unicode code points](https://en.wikipedia.org/wiki/Code_point).

In [None]:
empty_str = ''
string_1 = 'Single quotes as delimiters permit \'escaped single\' quotes inside.'
string_2 = "Double quotes as delimiters permit \"escaped double\" quotes inside."
string_3 = '''Other escaped characters include the literal backslash \\,
Unicode characters with hex values like \\u00CC == \u00CC,'''
string_4 = 'the\ttab character \\t &\nthe line feed \\n.'
print('empty_str = %r' % empty_str)
print(string_1)
print(string_2)
print(string_3,string_4)

In [None]:
print("Unicode characters may be entered by name: \N{GREEK SMALL LETTER DELTA}. ",
      "And also by codepoint: \u03B4")

In [None]:
print("Strings can contain either literal non-ASCII characters",
      "Say in Русский.  Or they can contain escapes to codepoints",
      "such as \u0420\u0443\u0441\u0441\u043a\u0438\u0439")

In [None]:
import unicodedata
unicodedata.lookup("GREEK SMALL LETTER DELTA")

In [None]:
hex(ord("δ"))

In [None]:
unicodedata.name("δ")
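
Because a `str` is a sequence of code points rather than bytes, its length counts characters, not storage. The following sketch relates a character to its code point with `ord` and `chr`, and to a byte sequence with `str.encode` (UTF-8 is chosen here as a typical encoding).

```python
delta = "\u03B4"                      # GREEK SMALL LETTER DELTA

print(len(delta))                     # one code point
print(ord(delta), hex(ord(delta)))    # its integer value
print(chr(0x03B4) == delta)           # chr() is the inverse of ord()

encoded = delta.encode("utf-8")       # str -> bytes
print(encoded, len(encoded))          # two bytes in UTF-8
print(encoded.decode("utf-8") == delta)
```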

In [None]:
old_s = 'Mary had a little lamb\nIts fleece was white as snow\nAnd everywhere that Mary went\nThe lamb was sure to go.'
print(old_s)

In [None]:
new_s = """Mary had a little lamb
Its fleece was white as snow
And everywhere that Mary went
The lamb was sure to go."""
print(new_s)

In [None]:
# Check whether the strings new_s and old_s are identical in every way.
print(new_s == old_s) 
new_s is old_s

In [None]:
s = "Ain't it a shame?!"  # Single quote in double quotes
s

In [None]:
s = 'Ain\'t it a shame?!' # Another example of escaping characters within strings
print(s)

In [None]:
s = """He said "Ain't that a shame"!""" # Triple quotes to include both single/double
print(s)

In [None]:
print('He said "Hi" to me')

In [None]:
print("He said \"Hi\" to me") # Another example of escaping characters within strings

## String methods

As objects, strings have a variety of *methods* (functions) that can be invoked and operate on data contained in the calling `str` object.

| | | | |
| :-: | :-: | :-: | :-: |
| `capitalize` | `casefold` | `center` | `count` |
| `encode` | `endswith` | `expandtabs` | `find` |
| `format` | `format_map` | `index` | `isalnum` |
| `isalpha` | `isdecimal` | `isdigit` | `isidentifier` |
| `islower` | `isnumeric` | `isprintable` | `isspace` |
| `istitle` | `isupper` | `join` | `ljust` |
| `lower` | `lstrip` | `maketrans` | `partition` |
| `replace` | `rfind` | `rindex` | `rjust` |
| `rpartition` | `rsplit` | `rstrip` | `split` |
| `splitlines` | `startswith` | `strip` | `swapcase` |
| `title` | `translate` | `upper` | `zfill` |

Many of these methods have purposes indicated clearly by their names. We can use `help` or the [Python documentation](https://docs.python.org/3/library/stdtypes.html#string-methods) to determine their function. Let's examine a few here.

Given a `str` object with identifier, say, `a_string`, any string *`method`* is invoked using `a_string.`*`method()`* (that is, the string on which the method operates is written as a prefix of the method name in the call). Of course, other arguments may be required within the parentheses, depending on which method is used.

With strings, as with all objects, the object instance itself is the first thing passed to the method, defined in the class.  So, for example, writing `a_string.method(other, args)` does the same thing as calling `str.method(a_string, other, args)`.
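
The equivalence of the two call styles can be seen in a short sketch. The sample strings are arbitrary; `str.strip`, `str.title`, and `str.lower` are ordinary string methods.

```python
a_string = "  an aging willow  "

# Calling through the instance...
print(a_string.strip().title())

# ...is equivalent to calling through the class with the instance passed explicitly
print(str.title(str.strip(a_string)))

# The class-level form is handy when a function object is needed, e.g. as a sort key
print(sorted(["willow", "Image", "stream"], key=str.lower))
```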

In [None]:
haiku = """
    an aging willow
    its image unsteady
    in the flowing stream
"""
print(haiku) # We construct a multi-line string to experiment with first.

In [None]:
# haiku.count('i') returns the number of times the character 'i' occurs
haiku.count('i')

In [None]:
# haiku.count('in') returns the number of times the substring 'in' occurs
haiku.count('in')

In [None]:
haiku.split()

In [None]:
sum('in' == word for word in haiku.split())

In [None]:
sum('in' in word for word in haiku.split())

In [None]:
haiku.count('x') # returns 0 because 'x' is not a substring of haiku

In [None]:
print(haiku.strip()) # Removes leading/trailing whitespace (but not internal whitespace)

In [None]:
# Splits string haiku on line feed characters; returns a *list*
lines = haiku.split('\n')
lines

In [None]:
# We're going to jump ahead slightly in this example, i.e., using list comprehension
# The following removes empty lines as well as leading/trailing whitespace
[line.strip() for line in haiku.split('\n') if line]

In [None]:
# joining pieces back together
print("\n".join([line.strip() for line in haiku.split('\n') if line]))

In [None]:
print(haiku.upper()) # Convert to upper case, return a new string
print(haiku)

In [None]:
# replaces a source substring with target substring, return as new string
print(haiku.replace('unsteady','wavering'))

In [None]:
# nothing happens when the source substring is not found
print(haiku.replace('uneasy','wavering')) 

In [None]:
'uneasy' in haiku # Should evaluate to False

In [None]:
'unsteady' in haiku

In [None]:
haiku.endswith('stream') # Whoops, need to strip the whitespace...

In [None]:
haiku.rstrip().endswith('stream') # This is what we expected...

In [None]:
# Another jump ahead to map().  This is another way of iterating implicitly
lines = map(str.strip, haiku.split('\n'))
[x for x in lines if x.endswith(('willow','stream'))]

In [None]:
haiku.isalpha() # Only True when all characters are alphabetic

In [None]:
"David".isalpha() # Should be True

In [None]:
haiku.isdigit()

In [None]:
"12345".isdigit()

In [None]:
# This asks "are all the *letters* lowercase?",
# not "are all the characters lowercase letters?"
haiku.islower()

In [None]:
"abc123#$%^&".islower()

In [None]:
# However, there must also *be* some letters for this to be true
"12345".islower()

In [None]:
help(str.islower)

In [None]:
haiku.isupper()

In [None]:
# Converts string haiku into a list of words
# ... more specifically, divide the string around any sequence of whitespace
words = haiku.split() 
words

In [None]:
print(words)
"__".join(words)

In [None]:
"_".join(haiku) # Treats string haiku as list of letters; joins all with '_'

In [None]:
prefixes = ('Ti', 'Da', 'Le')
"David".startswith(prefixes)

In [None]:
# Replaces line-feed with empty strings; puts all on one line
print(haiku.replace('\n',''))
# split into substrings on 'w'; returns list
haiku.replace('\n','').split('w')

In [None]:
# Returns leading index of first occurrence of 'aging' inside 'haiku'
haiku.find('aging')

In [None]:
# haiku.find(substring) returns -1 if the substring is not found
print(haiku.find('old'))

In [None]:
help(str.find)

In [None]:
# str.rfind() searches from the right; str.rindex() also exists, and behaves as expected
haiku.rfind('st'), haiku.find('st')

In [None]:
# haiku.index is like haiku.find() with different error-handling behaviour
haiku.index('aging') 

In [None]:
haiku.index('old') # ValueError because 'old' not substring of haiku

In [None]:
haiku[0], haiku[10]

In [None]:
haiku[:8]

In [None]:
haiku[8:]

In [None]:
haiku[8:20]

In [None]:
haiku[8:20] + haiku[20:30] == haiku[8:30]

In [None]:
haiku[-1]

In [None]:
haiku[-20:-10]

The distinct behaviors of `str.find` and `str.index` suggest two distinct methods for safeguarding output from a program. The first method, `str.find`, returns `-1` when the substring argument does not produce a match. By contrast, the second method, `str.index`, raises an *exception* (in particular, a `ValueError`) to inform us that the substring was not found in the string.

Using `str.find`, we can construct an `if-else` block to flag the error. Notice that if we don't check for the return value `-1`, the statement `print(haiku[position:])` prints the last character of `haiku` (which happens to be `\n`, a line feed character).

Note on good programming practice: It is more dangerous to let your program *succeed* in returning a wrong answer than it is to raise an uncaught exception that you *have to* fix before working with the program.  The philosophy behind this is often expressed with the slogan *"Fail early, fail hard!"*

In [None]:
# pos = haiku.find('old')
# end = haiku[pos:]
haiku[-1:]

In [None]:
position = haiku.find('old')
if position != -1:
    print(haiku[position:])
else:
    print("Not found")

A more Pythonic idiom to catch an error is to use a *`try-except`* block instead. With the `try-except` block, the Python interpreter attempts to execute the statement `position = haiku.index('old')`. In this case, rather than returning an innocuous value `-1` (as `haiku.find` would do), `haiku.index` *raises an exception* (in this case, the exception `ValueError`). When an exception is raised within a `try` block, the code within the `except` block executes instead. It is generally considered better practice to raise exceptions in functions/modules that can be caught in higher-level namespaces.

Programmers who have worked with languages such as C++ and Java may think of exceptions as terrible events that indicate a program is badly broken.  In contrast, Pythonic code follows the philosophy that *"Exceptions are not that exceptional!"*  Allowing exceptions to occur, and catching them in the appropriate place is good and expected coding style.

In [None]:
# More Pythonic not to allow a bad answer to pass silently
try:
    position = haiku.index('old')
    print(haiku[position:])
except Exception as e: # Exception is the broad class of all exceptions
    # This except block catches *any* exception whatsoever
    print(repr(e))

In [None]:
# Even more Pythonic to catch only the exception we know how to deal with
try:
    pos = haiku.index('old')
    print(haiku[pos:])
except ValueError as e: # In this version, we flag the particular exception
    # This except block executes only with a ValueError.
    print("Not found")
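
The advice about raising exceptions in functions and catching them in higher-level code can be sketched as follows. The helper `text_from` is hypothetical; it simply lets the `ValueError` from `str.index` propagate, and the calling loop decides how to respond.

```python
def text_from(text, marker):
    """Hypothetical helper: return `text` from the first `marker` onward.

    str.index raises ValueError when `marker` is absent; the helper lets
    that exception propagate rather than returning a misleading slice.
    """
    return text[text.index(marker):]

line = "in the flowing stream"

for marker in ("flowing", "old"):
    try:
        print(text_from(line, marker))
    except ValueError:
        # The caller decides what a missing marker means
        print("{!r} not found".format(marker))
```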

## String indexing

An important feature of string manipulation in Python is *string indexing* and *string slicing*. *Indexing* refers to extracting individual elements (characters) from Python strings. The syntax for indexing uses square brackets around an integer index to refer to a character inside the string.
* Indexing starts at 0 at the beginning (left) of the string, e.g., the reference `s[3]` refers to the *fourth* character of the string `s` counting from the left.
  * Sometimes using neologisms that construct ordinal names directly from cardinal numbers makes clear the difference between "which character" and "which index position," e.g., "zeroeth", "oneth", "two-eth", "three-eth" to name indices.  If the use of fake words pains you, don't use these.
* Negative indices start from the end (right) of the string, e.g., `s[-2]` is the second to last character in the string.
* Trying to index with an index too large for the string raises an exception (an `IndexError`).

In [None]:
s = "My name is David"
print(s[11])  # Indexed from zero, so s[11] is the 12th character; expect 'D'
print(s[-3])  # expect 'v'
print(len(s)) # Prints the length of the string

In [None]:
# Should raise an IndexError (last character in the string has index 15)
print(s[16])

In [None]:
s[-17]

In [None]:
# We could have checked the length using:
len(s)   # Indices 0 .. 15

### Immutability of Python `str` type

A confusing feature for newcomers coming from C-family languages to Python is that strings are *immutable*; that is, individual characters/substrings within a string object cannot be overwritten once the string has been created. Thus, expressions involving string indices (or slices, see below) can occur on the right-hand side of an assignment operator, but never on the left-hand side. There are a handful of other immutable data structures in Python whose items cannot be reassigned after the object has been created. Having certain data structures being immutable enables optimizations in using dictionaries (see below).

In [None]:
print(s)
# This assignment works; s[3] is on the right-hand side of the assignment operator
c3 = s[3]
c3

In [None]:
s[3] = 'g' # This raises an exception (TypeError)
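
Since assignment into a string fails, the usual remedy is to build a new string. Two common ways are sketched below, along with one practical consequence of immutability: strings are hashable and can serve as dictionary keys.

```python
s = "My name is David"

# Build a new string from slices around the position to change
t = s[:3] + 'g' + s[4:]
print(t)

# Or convert to a (mutable) list of characters, edit, and join
chars = list(s)
chars[3] = 'g'
print(''.join(chars))

# The original string is untouched and, being immutable, it is hashable,
# so it can serve as a dictionary key
print(s)
print({s: len(s)})
```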

## String slicing

Beyond indexing individual elements of strings, we can also extract *slices* (substrings) from strings by specifying a (half-open) range of indices within brackets.
* The slice `my_string[a:b]` extracts a substring with characters `my_string[a]`, `my_string[a+1]`, `my_string[a+2]`, … `my_string[b-2]`, `my_string[b-1]` from the string `my_string` (under the assumption `b>a≥0`).
* If `a≥b` (with both indices non-negative and the default step), the slice `my_string[a:b]` is the empty string.
* If `step>0` and `b>a≥0`, then the slice `my_string[a:b:step]` extracts a substring from `my_string` starting from position `a` up to (but not including) position `b` in steps of length `step`. Of course, the endpoints can be given as negative integers as well, in which case, positions are measured from the right of the string.
* If `step<0`, the slice `my_string[a:b:step]` extracts a substring traversing the string `my_string` from right to left starting at `a`, terminating at (but not including) `b`.
* The slice `my_string[:b]` slices from the beginning of the string (position `0`) up to (but not including) position `b`.
* The slice `my_string[a:]` slices from position `a` up to *and including* the end of the string.

These rules are more easily understood looking at examples.

In [None]:
print(s)
# Slicing: pull out char 4 up to (but *not* including) char 9 from string s
print(s[4:9]) 

In [None]:
s[11:] # From s[11] to the very end

In [None]:
s[:11] # From the very start up to (but not including) character 11

In [None]:
s[:11] + s[11:]

In [None]:
s[-5] # s[-5] means the fifth character from the end

In [None]:
s[-5:] # Equivalent to s[11:] for this string

In [None]:
s[-5:-3] # Again, remember slicing is half-open (non-inclusive)

In [None]:
s[1:10:3] # Specify stride of length 3 (i.e., count in steps of 3)

In [None]:
s[15:8:-2]
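
A few further cases of the slicing rules are worth trying. The sketch below reuses the same string and shows omitted endpoints with a step, and the fact that out-of-range slice endpoints are clipped rather than raising an `IndexError`.

```python
s = "My name is David"

print(s[::-1])             # a negative step with omitted endpoints reverses the string
print(s[::2], s[1::2])     # characters at even and at odd positions
print(repr(s[11:100]))     # an endpoint past the end is clipped to the end
print(repr(s[50:60]))      # a slice entirely out of range is the empty string
```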

The next couple of cells show why using half-open intervals makes reasoning about slices easier.  The end of one slice adds seamlessly to the start of one with the same index.  This helps avoid what are called "[fence-post errors](https://en.wikipedia.org/wiki/Off-by-one_error)."

There's an old computer science joke that illustrates this:

> There are two hard things in computer science: cache invalidation; naming things; and off-by-one errors.

In [None]:
s[3:7] + s[7:10] # Remember, + operator concatenates strings...

In [None]:
s[:5] + s[5:]

## String conversions and `format`

It is possible to cast numeric values from strings if the strings represent appropriate numeric literals.

In [None]:
print(float("3.14"), int("-8"))
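
When a string does not hold a valid literal, `int()` and `float()` raise a `ValueError`. The helper `to_number` below is hypothetical and written only to illustrate this; it tries `int` first and falls back to `float`.

```python
def to_number(text):
    """Hypothetical helper: parse `text` as an int if possible, else as a float.

    Raises ValueError when `text` is not a valid numeric literal.
    """
    try:
        return int(text)
    except ValueError:
        return float(text)

for text in ("-8", "3.14", " 42 ", "1e3", "twelve"):
    try:
        value = to_number(text)
        print(repr(text), "->", repr(value))
    except ValueError as err:
        print(repr(text), "-> ValueError:", err)
```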

### Format strings (old-style)

Prior to Python 2.6, the principal way of converting numeric data and other Python data into strings was *string interpolation*.  The syntax of this style resembles conventions from the C programming language's `printf` function. The basic trick is to use a `%` character preceding one of the characters in the table below to specify what will be substituted into the format string.

String interpolation remains widely used, but the newer `str.format()` method is more powerful, albeit also often more complicated.

| Conversion | Meaning
| :-:   | :-:
|`d`     |      Signed integer decimal.
|`i`     |      Signed integer decimal.
|`o`     |      Signed octal.
|`u`     |      Obsolete type; identical to `d`.
|`x`     |      Signed hexadecimal (lowercase).
|`X`     |      Signed hexadecimal (uppercase).
|`e`     |      Floating point exponential format (lowercase).
|`E`     |      Floating point exponential format (uppercase).
|`f`     |      Floating point decimal format.
|`F`     |      Floating point decimal format.
|`g`     |      Floating point format. Uses lowercase exponential format (`e`) if the exponent is less than `-4` or not less than the precision, decimal format (`f`) otherwise.
|`G`     |      Floating point format. Uses uppercase exponential format (`E`) if the exponent is less than `-4` or not less than the precision, decimal format (`F`) otherwise.
|`c`     |      Single character (accepts integer or single character string).
|`r`     |      String (converts any Python object using `repr()`).
|`s`     |      String (converts any Python object using `str()`).
|`%`     |      No argument is converted, results in a "`%`" character in the result.

The format string, followed by the `%` operator and a Python tuple of values to convert, makes up the "string interpolation" expression. The generic syntax for a conversion specifier in a format string is

        %[flags][width][.precision]type

where `flags`, `width`, and `precision` are optional parameters.
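
To make the pieces of this syntax concrete, the sketch below combines flags, width, and precision on the value of `pi`, and also shows the mapping-key form `%(name)s`, which takes a dictionary instead of a tuple. The dictionary contents are illustrative only.

```python
from math import pi

# A mapping on the right-hand side lets conversions refer to values by key
record = {"name": "pi", "value": pi}     # illustrative data
print("%(name)s is about %(value).4f" % record)

# Flags, width, and precision combine in the order given above
print("[%-8s]" % "left")       # '-' flag: left-justify in a field of width 8
print("[%8.3f]" % pi)          # width 8, 3 digits after the point
print("[%+08.2f]" % pi)        # '+' and '0' flags together
print("%d%%" % 95)             # '%%' yields a literal percent sign
```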

In [None]:
a = str(42.5)
print(a)
# String interpolation (C-style, more or less)
# i.e. %[flags][width][.precision]type
from math import pi
print("Pi is about %d, in %s" % (pi, "Indiana"))
"For rough use, we often just use %0.4f" % pi

In [None]:
# Notice that if just one value is being interpolated into the string, 
# we can give the bare value.  However, if multiple, must use a tuple.
"For rough use, we often just use %0.7f" % pi

In [None]:
"Better precision is %.17f" % pi

In [None]:
"Past 17 digits, floating point precision is meaningless: %.30f" % pi

In [None]:
"Octal %o; Decimal %i; HEX %X; hex %x; Octal w/ marker %#o; Hex w/ marker %#X" % (
       13,         13,     13,     13,                 13,                13)

In [None]:
"Explicit signs %+d, %+d" % (-13, 13)

In [None]:
"Zero padded ints %+06d, %+06d" % (-13, 13)

In [None]:
"Space padded ints %6d, %6d" % (-13, 13)

In [None]:
"A scientific notation format %.3e using 'e'" % 1234567890

### The `str.format()` mini-language

The `format()` function and `str.format()` method of strings are enormously powerful, and occasionally enormously confusing. An excellent summary of the differences between the old and new styles (with examples) can be found at [Pyformat](https://pyformat.info/).

Let's try a few examples both with old-style string interpolation and with `str.format`.

In [None]:
# Define a tuple of numeric values, say dollar amounts.
expenses = (1234.5678, 9900000.1, 83, .02)
for n, item in enumerate(expenses):
    print("Purchase %d:\t$%.2f" % (n+1, item))

We can do better than the last cell using a `format` specifier. In particular, two things we want in formatted currencies are comma separators in large numbers and right alignment.

In [None]:
format_string = "Purchase {}:\t${:>13,.2f}" # Format string using new format mini-language
for n, item in enumerate(expenses):
    print(format_string.format(n+1, item))

We compactly described the currency format above. However, we may rather have the dollar sign close to its amount. This needs to be done in two stages.

In [None]:
format_string = "Purchase {}:\t{:>14}"
for n, item in enumerate(expenses):
    amount = "${:,.2f}".format(item)
    print(format_string.format(n+1, amount))

### Details on the `str.format()` mini-language

For a complete description of the `str.format()` mini-language, see https://docs.python.org/3.4/library/string.html#formatstrings

| Option | Meaning
|:------:|:--------------------------------------
| `<`      | The field will be left-aligned within the available space. The default for strings.
| `>`      | The field will be right-aligned within the available space. The default for numbers.
| `=`      | Forces the padding to be placed after the sign (if any) but before the digits. This is used for printing fields in the form "`+000000120`". This alignment option is only valid for numeric types.
| `^`      | Forces the field to be centered within the available space.
| `+`      | A sign should be used for both positive as well as negative numbers.
| `-`      | A sign should be used only for negative numbers; the default behavior.
| `space`  | A leading space should be used on positive numbers, a minus sign on negative numbers.
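
Each option in the table can be tried in isolation. In the sketch below the square brackets are only there to make the field boundaries visible, and the width of 9 or 10 is an arbitrary choice.

```python
print("[{:<9}]".format("left"))      # left-aligned (the default for strings)
print("[{:>9}]".format("right"))     # right-aligned (the default for numbers)
print("[{:^9}]".format("mid"))       # centered
print("[{:*^9}]".format("mid"))      # a fill character may precede the alignment

print("[{:0=+10d}]".format(120))     # '=' puts the zero padding after the sign
print("[{:+d}] [{:+d}]".format(120, -120))   # sign always shown
print("[{: d}] [{: d}]".format(120, -120))   # space for positive, minus for negative
```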

Notice that `str.format()` permits several data structures for specifying values.

In [None]:
# parameters can be out of order...
print("The capital of {1:s} is {2:s}, a {0:s} city".format(
                      "Northern", "California", "Sacramento", "USA"))

In [None]:
# Using keyword arguments to specify values to format
print("The capital of {state} is {capital}".format(
                      capital="Sacramento", state="California", country="USA"))
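
Values can also come from existing dictionaries, sequences, or object attributes. The data below is illustrative only (the coordinates are approximate); `format_map` is the `str` method that accepts a mapping directly.

```python
# Illustrative data (not from the examples above)
city = {"capital": "Sacramento", "state": "California"}
coords = (38.58, -121.49)   # approximate latitude and longitude

# ** unpacks a dictionary into keyword arguments
print("The capital of {state} is {capital}".format(**city))

# format_map takes the mapping directly
print("The capital of {state} is {capital}".format_map(city))

# Fields can index into sequences and mappings, or read attributes
print("Latitude {0[0]:.1f}, longitude {0[1]:.1f}".format(coords))
print("{c[capital]} is the capital".format(c=city))
print("z = {z.real} + {z.imag}j".format(z=3+4j))
```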

See the documents linked above for full details.

<img src='img/copyright.png'>