# Data types & data structures

<img src="https://www.continuum.io/sites/default/files/Continuum-Wordmark.png">
This material is all available at: http://j.mp/academy-intro-saleswk

Table of Contents
----------------
* [Data types & data structures](#Data-types-&-data-structures)
* [Learning Objectives:](#Learning-Objectives:)
	* [Some comments on pedagogical style](#Some-comments-on-pedagogical-style)
* [`None` object](#None-object)
* [Numeric types](#Numeric-types)
	* [`bool` (boolean) type](#bool-%28boolean%29-type)
	* [`int` (integer) type](#int-%28integer%29-type)
	* [`float` (floating-point number) type](#float-%28floating-point-number%29-type)
	* [`complex` (complex number) type](#complex-%28complex-number%29-type)
* [`str` (string) type](#str-%28string%29-type)
	* [String methods](#String-methods)
	* [String indexing](#String-indexing)
		* [Immutability of Python `str` type](#Immutability-of-Python-str-type)
	* [String slicing](#String-slicing)
	* [String conversions and `format`](#String-conversions-and-format)
		* [Format strings (old-style)](#Format-strings-%28old-style%29)
		* [The `str.format()` mini-language](#The-str.format%28%29-mini-language)
		* [Details on the `str.format()` mini-language](#Details-on-the-str.format%28%29-mini-language)
* [Data structures](#Data-structures)
	* [`tuple` type](#tuple-type)
	* [`list` type](#list-type)
		* [Iteration over a collection](#Iteration-over-a-collection)
	* [`dict` (dictionary) type](#dict-%28dictionary%29-type)
	* [`set` type](#set-type)
* [Data types from the Python Standard Library](#Data-types-from-the-Python-Standard-Library)
	* [`datetime` module](#datetime-module)
	* [`collections` module](#collections-module)
		* [`collections.namedtuple`](#collections.namedtuple)
		* [`collections.OrderedDict`](#collections.OrderedDict)
	* [`decimal` module](#decimal-module)
	* [`fractions` module](#fractions-module)


**Developed by**

* David Mertz &lt;[dmertz@continuum.io](mailto:dmertz@continuum.io)&gt;
* Dhavide Aruliah &lt;[daruliah@continuum.io](mailto:daruliah@continuum.io)&gt;

**Taught by**

* David Mertz

# Learning Objectives:

After completion of this module, learners should be able to:
    
* use & distinguish builtin Python numeric types: `bool`, `int`, `float`, `complex`
* use & distinguish builtin Python data collections: `list`, `tuple`, `dict`, `set`
* apply the `str.format` mini-language to generate formatted output
* use & explain Python rules for type conversion & casting (e.g., combining operators & types)
* apply common methods associated with builtin Python data types
* use `help` (and other documentation) to learn about methods associated with builtin types
* apply Python rules for indexing & slicing strings, lists, & tuples
* apply Python idioms for looping over builtin data collections
* explain the distinction between *mutable* and *immutable* objects
* import & apply common extensions to Python builtin data types (e.g., `datetime`, `collections`, etc.)

## Some comments on pedagogical style

For the most part, we will introduce new concepts incrementally. Sometimes examples will deliberately introduce something you haven't seen yet with suggestions of how to proceed. You can probably guess what these new constructs are doing along the way, but if they are unclear, **please** stop and ask for more information.  

Even if you think you understand&mdash;either something covered explicitly, or something thrown in early&mdash;and would like to play with that construct some more, *let's do it!* The great thing about Notebooks (and about the Python interactive shell itself) is that we can play around, try things out, interactively discover options, and so on. We'll learn this material *together,* and experimentation and exploration is a much better way to learn than passive observation. 

* Discussion mixed with exercises
* Please ask questions
* Don't hesitate to ask for more detail

<hr/>
This tutorial, and Python in general, run more smoothly under Python 3.x.

Whether you're running on Python 2 or Python 3, please install [Python-Future](http://python-future.org/futurize.html):
```bash
conda install future
```

In [None]:
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)
from future import standard_library
standard_library.install_aliases()
from future.builtins import (
         bytes, dict, int, list, object, range, str,
         ascii, chr, hex, input, next, oct, open,
         pow, round, super, filter, map, zip)

# `None` object

Not really a data structure, but the special value `None` in Python is often used where `NULL` or `nil` or `Nothing` are used in other languages.  It is a frequent placeholder to say, "We don't yet know how to handle this item, but we know to check whether it *is* `None`."  Hence code like this is common:

```python
for item in collection:
    if item is not None:
        process(item)
    else:
        pass
```

`None` is also special in that it is a *singleton*.  That is to say, there can only be one `None` object in a Python program, and hence every `None` is not merely equal to, but *identical to*, every other `None`.
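
The singleton property is why the idiomatic test is `is None` rather than `== None`. A minimal sketch follows; `find_first_even` is a hypothetical helper introduced only for illustration.

```python
def find_first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            return n
    return None  # explicit here; falling off the end of a function also returns None

result = find_first_even([1, 3, 5])
print(result is None)           # identity test, the idiomatic check
print(id(result) == id(None))   # the very same object, not merely an equal one
```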

# Numeric types

Python has three distinct built-in numeric classes (or types): integers, floating-point numbers, and complex numbers. Numbers in Python are instantiated from numeric literal expressions in code or are the results returned by functions, methods, and operators.

## `bool` (boolean) type

Booleans are a subclass of integers. There are two values: `True` and `False`.

In [None]:
# Booleans: Is the Boolean value "True" simply the integer value "1"?
True == 1

In [None]:
False < 1 # Arithmetic comparison equivalent to "0<1"

In [None]:
(True + 1) 

In [None]:
isinstance(True, bool)

In [None]:
True is 1 # This is not quite what some expect

In [None]:
type(True), type(1)

In [None]:
issubclass(bool, int)

## `int` (integer) type

Numeric literals that contain no decimal point and no `e` (which express floating-point numbers) and no trailing `j` (which express complex numbers) are (decimal) integers.

In [1]:
a = 12345678     # Replace with any sequence of digits (no spaces)
print(a,type(a))

12345678 <class 'int'>


Representations of integers in certain numeral bases other than 10 can be entered as numeric literals: `0x` or `0X` prefix literal hexadecimal integers, `0o` or `0O` prefix literal octal integers, and `0b` or `0B` prefix literal binary integers.

In [2]:
# Hexadecimal integers
print(0xFF03) # 15*16**3 + 15*16**2 +0*16**1 +3*16**0
print(0XFF03) # 15*16**3 + 15*16**2 +0*16**1 +3*16**0
print(15*16**3 + 15*16**2 +0*16**1 +3*16**0) # Verify result above

65283
65283
65283


In [3]:
# Octal integers
print(0o76543) # 7*8**4 + 6*8**3 + 5*8**2 + 4*8**1 + 3*8**0
print(0O76543)
print(7*8**4 + 6*8**3 + 5*8**2 + 4*8**1 + 3*8**0) # Verify result above

32099
32099
32099


In [None]:
# Binary integers
print(0b10010011) # 1*2**7 + 0*2**6 + 0*2**5 + 1*2**4 + 0*2**3 + 0*2**2 + 1*2**1 + 1*2**0
print(0B10010011) # 1*2**7 + 0*2**6 + 0*2**5 + 1*2**4 + 0*2**3 + 0*2**2 + 1*2**1 + 1*2**0
# Verify result above
print(1*2**7 + 0*2**6 + 0*2**5 + 1*2**4 + 0*2**3 + 0*2**2 + 1*2**1 + 1*2**0) 

In [4]:
# A number in base 6
int("3421", 6)

805
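
Going the other way, the built-in functions `bin`, `oct`, and `hex` produce string representations in bases 2, 8, and 16. The sketch below round-trips the value from the previous cell; the variable name `n` is just for illustration.

```python
n = int("3421", 6)              # 805, as in the cell above
print(bin(n), oct(n), hex(n))   # string representations in bases 2, 8, and 16

# int() with the matching base reverses each conversion (the prefix is accepted)
print(int(bin(n), 2) == int(oct(n), 8) == int(hex(n), 16) == n)

# Base 0 asks int() to infer the base from the literal's prefix
print(int("0xFF03", 0))
```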

In Python 3, integers can have unlimited length in principle; as arithmetic operations produce results that overflow, representations of integers with longer bit patterns are adapted as required.

In [None]:
# Some integers
print(2**8)
print(2**32)
print(2**64)
print(2**65)
print(2**129)

In [None]:
# Even very large integers can be represented in Python
2**(2**16)

In addition to standard arithmetic operations, certain bitwise operations can be applied to integers.

In [None]:
print(3 << 4)  # shift left (int only)
print(33 >> 4) # shift right (int only)
print(3 & 4)   # bitwise and (int only)
print(3 | 4)   # bitwise or (int only)
print(3 ^ 4)   # bitwise xor (int only)
print(~ 3)     # bitwise not (int only)

## `float` (floating-point number) type

Real number literals (expressed in base 10 scientific notation) are distinguished from integer literals by either a decimal point or an explicit mantissa and exponent separated by `e` or `E` (optionally with a decimal point as well).

In [None]:
a = 4.732
print(a, type(a))

In [None]:
print(123e2)     # 12300; "123e2" means "123 times 10**2"
print(456E-4)    # 0.0456; "456e-4" means "456 times 10**(-4)"
print(-.7436e3)  # -743.6
print(1476.3e20) # 1.4763*10**23

In [None]:
-12345.6e78

In [None]:
# "int(23.45678e4)" means "23.45678 times 10 to the power 4 truncated to an integer"
int(23.45678e4)

By default, Python floating-point numbers are stored internally using 8 bytes (i.e., double precision or `double` in C). Greater precision is attainable using the `decimal` and `numpy` modules. Specific details about internal representation of floating-point numbers can be determined using `sys.float_info`.

Some fundamental facts to know about floating-point numbers:
* The largest positive floating-point value is about $10^{308}$ in double precision; larger values *overflow* to $+\infty$.
* The smallest positive floating-point value is about $10^{-324}$ in double precision; smaller values *underflow* to 0.
* Double precision floating-point values are represented with a 52-bit mantissa (plus one implicit bit). In practice, this translates to roughly 15&ndash;16 decimal digits of precision at best.
* Certain computations&mdash;e.g., $-\infty/\infty$, etc.&mdash;result in *Not-a-Number* (also denoted *NaN* or *nan*).
* More details on floating-point numbers & arithmetic:
    * [Floating-point numbers](https://en.wikipedia.org/wiki/Floating_point)
    * [The Floating-Point Guide](http://floating-point-gui.de/formats/fp/)
    * [IEEE 754 standard](http://en.wikipedia.org/wiki/IEEE_754-2008)

IEEE-754 arithmetic approximates the behavior of real numbers in mathematics using a fixed and moderate amount of memory for each number. Because the format is inexact, in many cases it does not precisely obey mathematically expected properties such as [associativity](https://en.wikipedia.org/wiki/Associative_property), [distributivity](https://en.wikipedia.org/wiki/Distributive_property), and the existence of an exact [multiplicative inverse](https://en.wikipedia.org/wiki/Multiplicative_inverse).

In an old discussion on the comp.lang.python Usenet group, a commenter noted that "anyone who claims to understand IEEE-754 floating point math fully is either a liar or Tim Peters!"  Tim Peters (author of the Zen of Python, inventor of the widely used [Timsort](https://en.wikipedia.org/wiki/Timsort), and contributor #2 to Python itself) replied "It could be both."

In [5]:
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

In [6]:
-1.23456e310 # overflows to -infinity

-inf

In [7]:
float('Inf') / float('-inf') # Evaluates to +inf / -inf == nan

nan

In [None]:
for exponent in range(308, 400):
    float_string = "1e-{:d}".format(exponent)
    print("Attempting to represent {} as a float...".format(float_string))
    float_val = float(float_string)
    if float_val == 0:
        print("Underflow to 0 at {}".format(float_string))
        break

"nan" means "Not a Number", e.g., inf/inf, inf-inf, or any operation involving nan

In [27]:
inf = float('inf')
inf-inf, inf/inf

(nan, nan)

Every infinity is equal to, but not necessarily identical to, every other infinity of the same sign.  However, every NaN is unequal to every other NaN (and even to itself).

In [21]:
-1.23456e310 == -inf

True

In [29]:
inf == inf+2 == float('inf')

True

In [23]:
inf is inf+2

False

In [25]:
inf-inf == inf/inf

False

In [14]:
float('nan') == float('nan')

False
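
Because a NaN never compares equal to anything, an equality test cannot be used to detect one. A short sketch using the standard library's `math.isnan` instead (the names `values` and `clean` are illustrative):

```python
import math

nan = float('nan')
values = [1.0, nan, 2.5]

print(nan == nan)        # equality cannot detect NaN
print(math.isnan(nan))   # the reliable test

# Filter NaN values out of a collection
clean = [v for v in values if not math.isnan(v)]
print(clean)
```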

Comparing floating-point values for equality is generally inadvisable. Minor rounding errors in the least significant bits can prevent even simple calculations from producing exactly the expected result. It is usually better to specify a small tolerance (cf. the square-root iteration from Module 1) and test for approximate equality of floating-point values.

In [None]:
b = sum([1/7]*7) # equivalent to "1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7"
print("1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 + 1/7 != 1.0")
print(b, "!=", 1.0)

In [None]:
# Floating-point addition is not associative: grouping changes the rounding
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b)

In [None]:
# Or overflows depending on associativity
a = (1e307*100) / 100
b = 1e307 * (100/100)
print(a, b)

In [None]:
delta = 0.0001   # Set our tolerance delta
δ = delta 
abs(3.14159265 - 3.1415) < δ

In [None]:
type(δ)
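
The same tolerance idea is available ready-made as `math.isclose`. This sketch assumes Python 3.5 or later, where that function was added; it is not available under Python 2.

```python
import math

b = sum([1/7] * 7)
print(b == 1.0)               # exact comparison is fragile
print(math.isclose(b, 1.0))   # relative tolerance, 1e-09 by default

# A relative tolerance is of little use near zero; supply an absolute one
print(math.isclose(1e-12, 0.0))
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))
```

The default comparison is relative to the larger operand, which is why a separate `abs_tol` is needed when one of the values is (or should be) zero.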

It is also important to be wary of the distinction between integer ("floor") division with `//` as opposed to regular floating-point division with `/`. In addition, dividing by zero raises an exception.

Notice Python permits mixed arithmetic; when values of `int` and `float` type are combined in arithmetic expressions, the `int` is promoted ("cast") to a `float`. The type of a value can be explicitly cast using the constructors `int()`, or `float()` (with appropriate rounding/truncation).

In [None]:
1.0/0

In [None]:
print(1/5)

In [None]:
# Types of division (different from Python 2.x)
print(2/3)
print(2//3)
print(2.0//3.0)

In [None]:
denominators = [3, 4, 6, 0, 3]
for d in denominators:
    print("d = %d" % d)
    print(7/d)

In [None]:
denominators = [3, 4, 6, 0, 3]
for d in denominators:
    print("d = %d" % d)
    try:
        print(7/d)
    except ZeroDivisionError:
        print("Attempt to divide by zero")
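
The promotion and conversion rules described above can be checked directly. This is a small illustrative sketch under Python 3 semantics; note in particular that `int()` truncates toward zero while `//` rounds toward negative infinity.

```python
print(type(3 + 4.0))             # the int is promoted to float in mixed arithmetic
print(int(2.7), int(-2.7))       # int() truncates toward zero
print(round(2.7), round(-2.7))   # round() goes to the nearest integer
print(7 // 2, -7 // 2)           # floor division rounds toward negative infinity
print(float(3))                  # explicit conversion of an int to a float
```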

## `complex` (complex number) type

In mathematics, it is common to refer to the square root of $-1$ as $i$ or $j$; in Python, we'll use the symbol $\mathtt{j}$ to denote $\sqrt{-1}$. Then *complex numbers* are expressible as a combination of the form $x+yj$ where $x$ and $y$ are real numbers.

* In the expression $x+yj$, $x$ is said to be the *real* part and $y$ is said to be the *imaginary* part.
* In Python, a complex numeric literal is (a) a real numeric literal with the symbol `j` as a suffix or (b) a real numeric literal added to or subtracted from a real numeric literal with the suffix `j`. Notice that this is one of the few cases in Python where a token mixing numerals and alphabetic characters can begin with a numeral (others include prefixed integers such as `0xFF` and exponents such as `1e5`).

In [None]:
# Complex numbers
print(4.56e-3+7.5e1j)
complex_one = 1 + 0j 
print(complex_one == 1.0)
print(complex_one is 1.0)
type(complex_one)   # complex_one has type "complex" even though the imaginary part is zero

When an `int` or a `float` value is combined in an arithmetic expression with a `complex` value, the result is cast to a `complex`. The type of a value can be explicitly cast using the constructors `int()`, `float()`, or `complex()` (with rounding or zeros introduced appropriately).

In [None]:
a = 3  # Try replacing with various integer or floating-point values
print(type(a))
a += complex_one # casts resulting value to a complex value
print(a, type(a))

In [None]:
x, y = 3, 4.0
print("x is of type {} and y is of type {}".format(type(x), type(y)))
# cast to "higher type" as needed
print("x + y == {} is of type {}".format(x+y, type(x+y))) 

In [None]:
x, y = complex(3,4), 4.0
print("x is of type {} and y is of type {}".format(type(x), type(y)))
# cast to "higher type" as needed
print("x * y == {} is of type {}".format(x*y, type(x*y))) 

* Python `complex` values are essentially represented as a pair of Python `float` values.
* Complex numbers are not ordered; as such, comparisons with the "less than" and "greater than" operators raise a `TypeError` when applied to complex values.
* If `z==x+y*1j` is a Python complex value, the Python object `z` has attributes *`z.real==x`* and *`z.imag==y`* corresponding to the real and imaginary parts respectively.
* The function `abs` returns the *modulus* of a complex value (i.e., $\mathtt{abs}(x+yj)=\sqrt{x^2+y^2}$).
* If `z==x+y*1j==complex(x,y)` (where `x` & `y` are `float` or `int` values), the method `z.conjugate()` returns the value `x-y*1j` (the *complex conjugate* of `z`).

In [None]:
1+1j < -1-.5j

In [None]:
3+4j < 4+3j

In [None]:
1+0j == 1, 2+0j == 2, 1 < 2

In [None]:
1+0j < 2+0j

In [38]:
z = -1.43e-1+0.5e2j
print("real(z) = {:.3e}\nimag(z) = {:.3e}".format(z.real, z.imag))
print("conjugate(z) =", z.conjugate())

real(z) = -1.430e-01
imag(z) = 5.000e+01
conjugate(z) = (-0.143-50j)


In [30]:
abs(3+4j), abs(4+3j), abs(3+4j) < abs(4+3j)

(5.0, 5.0, False)

In summary, when working with numeric data types in Python, the [Python documentation on numeric types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex) is of great value. When trying to understand how certain operators are computing results, it is useful to keep in mind that the results can differ when the operands are of different numeric type.

In [None]:
pow = __builtins__.pow
print(3 + 4)          # addition
print(3 - 4)          # subtraction
print(3 * 4)          # multiplication
print(3 / 4)          # "true division"
print(3 // 4)         # floor division
print(13 % 4)         # modulo
print(3 ** 4)         # power
print(abs(3-4))       # absolute value
print(pow(3.0j,4))    # expect 81+0j
print(pow(3, 4, 5))   # power with optional modulo: (3**4) % 5
print(divmod(3, 4))   # division with remainder
print(int(3.14))      # convert to an int
print(float(3))       # convert to a float

# `str` (string) type

Strings (or `str` objects) are textual data with *delimiters* to denote where the string starts and ends. String literals are constructed with single quote characters (i.e., `'`), double quote characters (i.e., `"`), or a trio of single or double quote characters (i.e., `'''` or `"""`) as delimiters. Triple-quoted strings can span multiple lines; all whitespace between the delimiters, including line breaks, is included in the string.

In [8]:
a = 'a string'
print(a, type(a))

a string <class 'str'>


In [9]:
string_1 = 'Single quotes as delimiters permit "double" quotes inside.'
string_2 = "Double quotes as delimiters don't have problems with 'single' quotes inside."
string_3 = '''
Triple (single) quotes don't have problems with 'single' or "double" quotes
inside. They don't even have problems with line breaks!
'''
print(string_1)
print(string_2)
print(string_3)

Single quotes as delimiters permit "double" quotes inside.
Double quotes as delimiters don't have problems with 'single' quotes inside.

Triple (single) quotes don't have problems with 'single' or "double" quotes
inside. They don't even have problems with line breaks!



To embed a single quote character (one or more) within a string delimited by single quotes, a backslash character is needed as an *escape character*. The same applies for double quote characters embedded within strings delimited by double quote characters. Other escaped string literals can be found in the [Python documentation](https://docs.python.org/3/reference/lexical_analysis.html#strings). Notice that, as of Python 3, string characters are [Unicode code points](https://en.wikipedia.org/wiki/Code_point).

In [10]:
empty_str = ''
string_1 = 'Single quotes as delimiters permit \'escaped single\' quotes inside.'
string_2 = "Double quotes as delimiters \"escaped double\" quotes inside."
string_3 = '''Other escaped characters include the literal backslash \\,
Unicode characters with hex values like \\u00CC == \u00CC,'''
string_4 = 'the\ttab character \\t &\nthe line feed \\n.'
print('empty_str = %r' % empty_str)
print(string_1)
print(string_2)
print(string_3,string_4)

empty_str = ''
Single quotes as delimiters permit 'escaped single' quotes inside.
Double quotes as delimiters "escaped double" quotes inside.
Other escaped characters include the literal backslash \,
Unicode characters with hex values like \u00CC == Ì, the	tab character \t &
the line feed \n.


In [11]:
print("Unicode characters may be entered by name: \N{GREEK SMALL LETTER DELTA}. ",
      "And also by codepoint: \u03B4")

Unicode characters may be entered by name: δ.  And also by codepoint: δ


In [12]:
print("Strings can contain either literal non-ASCII characters",
      "Say in Русский.  Or they can contain escapes to codepoints",
      "such as \u0420\u0443\u0441\u0441\u043a\u0438\u0439")

Strings can contain either literal non-ASCII characters Say in Русский.  Or they can contain escapes to codepoints such as Русский


In [13]:
import unicodedata
unicodedata.lookup("GREEK SMALL LETTER DELTA")

'δ'

In [14]:
hex(ord("δ"))

'0x3b4'

In [15]:
unicodedata.name("δ")

'GREEK SMALL LETTER DELTA'

In [16]:
old_s = 'Mary had a little lamb\nIts fleece was white as snow\nAnd everywhere that Mary went\nThe lamb was sure to go.'
print(old_s)

Mary had a little lamb
Its fleece was white as snow
And everywhere that Mary went
The lamb was sure to go.


In [17]:
new_s = """Mary had a little lamb
Its fleece was white as snow
And everywhere that Mary went
The lamb was sure to go."""
print(new_s)

Mary had a little lamb
Its fleece was white as snow
And everywhere that Mary went
The lamb was sure to go.


In [18]:
# Check whether the strings new_s and old_s are identical in every way.
print(new_s == old_s) 
new_s is old_s

True


False

In [19]:
s = "Ain't it a shame?!"  # Single quote in double quotes
s

"Ain't it a shame?!"

In [20]:
s = 'Ain\'t it a shame?!' # Another example of escaping characters within strings
print(s)

Ain't it a shame?!


In [None]:
s = """He said "Ain't that a shame"!""" # Triple quotes to include both single/double
print(s)

In [None]:
print('He said "Hi" to me')

In [None]:
print("He said \"Hi\" to me") # Another example of escaping characters within strings

## String methods

As objects, strings have a variety of *methods* (functions) that can be invoked and operate on data contained in the calling `str` object.

| | | | |
| :-: | :-: | :-: | :-: |
| `capitalize` | `casefold` | `center` | `count` |
| `encode` | `endswith` | `expandtabs` | `find` |
| `format` | `format_map` | `index` | `isalnum` |
| `isalpha` | `isdecimal` | `isdigit` | `isidentifier` |
| `islower` | `isnumeric` | `isprintable` | `isspace` |
| `istitle` | `isupper` | `join` | `ljust` |
| `lower` | `lstrip` | `maketrans` | `partition` |
| `replace` | `rfind` | `rindex` | `rjust` |
| `rpartition` | `rsplit` | `rstrip` | `split` |
| `splitlines` | `strip` | `swapcase` | `title` |
| `translate` | `upper` | `zfill` | |

Many of these methods have purposes indicated clearly by their names. We can use `help` or the [Python documentation](https://docs.python.org/3/library/stdtypes.html#string-methods) to determine their function. Let's examine a few here.

Given a `str` object with identifier, say, `a_string`, any string *`method`* is invoked using `a_string.`*`method()`* (that is, the string as an argument to the method is positioned as a prefix of the method in the call). Of course, other arguments may be required within the parentheses, depending on which method is used.

With strings, as with all objects, the object instance itself is the first thing passed to the method, defined in the class.  So, for example, writing `a_string.method(other, args)` does the same thing as calling `str.method(a_string, other, args)`.
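
A brief sketch of that equivalence, using a throwaway string and word list chosen only for illustration:

```python
a_string = "an aging willow"

bound = a_string.replace("aging", "ancient")          # usual method-call syntax
unbound = str.replace(a_string, "aging", "ancient")   # same call made through the class
print(bound == unbound)

# The class-level form is handy wherever a plain function is expected,
# for example as a sort key
print(sorted(["pear", "Apple", "fig"], key=str.lower))
```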

In [23]:
haiku = """
    an aging willow
    its image unsteady
    in the flowing stream
"""
print(haiku) # We construct a multi-line string to experiment with first.


    an aging willow
    its image unsteady
    in the flowing stream



In [42]:
# haiku.count('i') returns the number of times the character 'i' occurs
haiku.count('i')

6

In [43]:
# haiku.count('in') returns the number of times the substring 'in' occurs
haiku.count('in')

3

In [22]:
haiku.split()

['an',
 'aging',
 'willow',
 'its',
 'image',
 'unsteady',
 'in',
 'the',
 'flowing',
 'stream']

In [45]:
sum('in' == word for word in haiku.split())

1

In [46]:
sum('in' in word for word in haiku.split())

3

In [None]:
haiku.count('x') # returns 0 because 'x' is not a substring of haiku

In [None]:
print(haiku.strip()) # Removes leading/trailing whitespace (but not internal whitespace)

In [None]:
# Splits string haiku on line feed characters; returns a *list*
lines = haiku.split('\n')
lines

In [25]:
# We're going to jump ahead slightly in this example, i.e., using a list comprehension
# The following removes empty lines as well as leading/trailing whitespace
[line.strip() for line in haiku.split('\n') if line]

['an aging willow', 'its image unsteady', 'in the flowing stream']

In [24]:
# joining pieces back together
print("\n".join([line.strip() for line in haiku.split('\n') if line]))

an aging willow
its image unsteady
in the flowing stream


In [26]:
print(haiku.upper()) # Convert to upper case, return a new string
print(haiku)


    AN AGING WILLOW
    ITS IMAGE UNSTEADY
    IN THE FLOWING STREAM


    an aging willow
    its image unsteady
    in the flowing stream



In [None]:
# replaces a source substring with target substring, return as new string
print(haiku.replace('unsteady','wavering'))

In [None]:
# nothing happens when the source substring is not found
print(haiku.replace('uneasy','wavering')) 

In [None]:
'uneasy' in haiku # Should evaluate to False

In [None]:
'unsteady' in haiku

In [27]:
haiku.endswith('stream') # Whoops, need to strip the whitespace...

False

In [28]:
haiku.rstrip().endswith('stream') # This is what we expected...

True

In [None]:
# Another jump ahead to map().  This is another way of iterating implicitly
lines = map(str.strip, haiku.split('\n'))
[x for x in lines if x.endswith(('willow','stream'))]

In [29]:
haiku.isalpha() # Only True when all characters are alphabetic (letters)

False

In [None]:
"David".isalpha() # Should be True

In [None]:
haiku.isdigit()

In [None]:
"12345".isdigit()

In [None]:
# This asks "are all the *letters* lowercase?",
# not "are all the characters lowercase letters?"
haiku.islower()

In [None]:
"abc123#$%^&".islower()

In [None]:
# However, there must also *be* some letters for this to be true
"12345".islower()

In [None]:
help(str.islower)

In [None]:
haiku.isupper()

In [31]:
# Converts string haiku into a list of words
# ... more specifically, divide the string around any sequence of whitespace
words = haiku.split() 
words

['an',
 'aging',
 'willow',
 'its',
 'image',
 'unsteady',
 'in',
 'the',
 'flowing',
 'stream']

In [32]:
print(words)
"__".join(words)

['an', 'aging', 'willow', 'its', 'image', 'unsteady', 'in', 'the', 'flowing', 'stream']


'an__aging__willow__its__image__unsteady__in__the__flowing__stream'

In [30]:
"_".join(haiku) # Treats string haiku as list of letters; joins all with '_'

'\n_ _ _ _ _a_n_ _a_g_i_n_g_ _w_i_l_l_o_w_\n_ _ _ _ _i_t_s_ _i_m_a_g_e_ _u_n_s_t_e_a_d_y_\n_ _ _ _ _i_n_ _t_h_e_ _f_l_o_w_i_n_g_ _s_t_r_e_a_m_\n'

In [33]:
prefixes = ('Ti', 'Da', 'Le')
"David".startswith(prefixes)

True

In [34]:
# Replaces line-feed with empty strings; puts all on one line
print(haiku.replace('\n',''))
# split into substrings on 'w'; returns list
haiku.replace('\n','').split('w')

    an aging willow    its image unsteady    in the flowing stream


['    an aging ', 'illo', '    its image unsteady    in the flo', 'ing stream']

In [None]:
# Returns leading index of first occurrence of 'aging' inside 'haiku'
haiku.find('aging')

In [None]:
# haiku.find(substring) returns -1 if the substring is not found
print(haiku.find('old'))

In [None]:
help(str.find)

In [None]:
# str.rfind() searches from the right; str.rindex() also exists, mirroring str.index()
haiku.rfind('st'), haiku.find('st')

In [None]:
# haiku.index is like haiku.find() with different error-handling behaviour
haiku.index('aging') 

In [None]:
haiku.index('old') # ValueError because 'old' not substring of haiku

In [38]:
haiku[0], haiku[10]

('\n', 'i')

In [None]:
haiku[:8]

In [37]:
haiku[8:]

'aging willow\n    its image unsteady\n    in the flowing stream\n'

In [None]:
haiku[8:20]

In [36]:
haiku[8:20] + haiku[20:30] == haiku[8:30]

True

In [35]:
haiku[-1]

'\n'

In [None]:
haiku[-20:-10]

The distinct behaviors of `str.find` and `str.index` suggest two distinct ways of guarding a program against a missing substring. The first method, `str.find`, returns `-1` when the substring argument does not produce a match. By contrast, the second method, `str.index`, raises an *exception* (specifically, a `ValueError`) to inform us that the substring was not found in the string.

Using `str.find`, we can construct an `if-else` block to flag the error. Notice that if we don't check for the return value `-1`, the expression `haiku[position:]` becomes `haiku[-1:]`, so the statement `print(haiku[position:])` silently prints the last character of `haiku` (which happens to be `\n`, a line feed character).

Note on good programming practice: It is more dangerous to let your program *succeed* in returning a wrong answer than it is to raise an uncaught exception that you *have to* fix before working with the program.  The philosophy behind this is often expressed with the slogan *"Fail early, fail hard!"*

In [39]:
# pos = haiku.find('old')
# end = haiku[pos:]
haiku[-1:]

'\n'

In [None]:
position = haiku.find('old')
if position != -1:
    print(haiku[position:])
else:
    print("Not found")

A more Pythonic idiom to catch an error is to use a *`try-except`* block instead. With the `try-except` block, the Python interpreter attempts to execute the statement `position = haiku.index('old')`. In this case, rather than returning an innocuous value `-1` (as `haiku.find` would do), `haiku.index` *raises an exception* (in this case, the exception `ValueError`). When an exception is raised within a `try` block, the code within the `except` block executes instead. It is generally considered better practice to raise exceptions in functions/modules that can be caught in higher-level namespaces.

Programmers who have worked with languages such as C++ and Java may think of exceptions as terrible events that indicate a program is badly broken.  In contrast, Pythonic code follows the philosophy that *"Exceptions are not that exceptional!"*  Allowing exceptions to occur, and catching them in the appropriate place is good and expected coding style.

In [40]:
# More Pythonic not to allow a bad answer to pass silently
try:
    position = haiku.index('old')
    print(haiku[position:])
except Exception as e: # Exception is the broad class of all exceptions
    # This except block catches *any* exception whatsoever
    print(repr(e))

ValueError('substring not found')


In [None]:
# Even more Pythonic to catch only the exception we know how to deal with
try:
    pos = haiku.index('old')
    print(haiku[pos:])
except ValueError as e: # In this version, we flag the particular exception
    # This except block executes only with a ValueError.
    print("Not found")

## String indexing

An important feature of string manipulation in Python is *string indexing* and *string slicing*. *Indexing* refers to extracting individual elements (characters) from Python strings. The syntax for indexing uses square brackets around an integer index to refer to a character inside the string.
* Indexing starts at 0 at the beginning (left) of the string, e.g., the reference `s[3]` refers to the *fourth* character of the string `s` counting from the left.
  * Sometimes neologisms that construct ordinal names from cardinal numbers make clear the difference between "which character" and "which index position", e.g., "zeroeth", "oneth", "two-eth", "three-eth" to name indices.  If the use of fake words pains you, don't use these.
* Negative indices start from the end (right) of the string, e.g., `s[-2]` is the second to last character in the string.
* Trying to index with an index too large for the string raises an exception (an `IndexError`).

In [41]:
s = "My name is David"
print(s[11])  # Indexed from zero, so s[11] is the 12th character; expect 'D'
print(s[-3])  # expect 'v'
print(len(s)) # Prints the length of the string

D
v
16


In [42]:
# Should raise an IndexError (last character in the string has index 15)
print(s[16])

IndexError: string index out of range

In [43]:
s[-17]

IndexError: string index out of range

In [44]:
# We could have checked the length using:
len(s)   # Indices 0 .. 15

16

### Immutability of Python `str` type

A confusing feature for newcomers coming from C-family languages to Python is that strings are *immutable*; that is, individual characters/substrings within a string object cannot be overwritten once the string has been created. Thus, expressions involving string indices (or slices, see below) can occur on the right-hand side of an assignment operator, but never on the left-hand side. There are a handful of other immutable data structures in Python whose items cannot be reassigned after the object has been created. Having certain data structures being immutable enables optimizations in using dictionaries (see below).

In [60]:
print(s)
# This assignment works; the slice s[3:] is on the right-hand side of the assignment operator
c3 = s[3:]
c3

My name is David


'name is David'

In [64]:
s[-1] = 'g' # This raises an exception (TypeError)

TypeError: 'str' object does not support item assignment
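
Since a string cannot be changed in place, the usual workaround is to build a *new* string from pieces of the old one (and, if desired, rebind a name to it). A minimal sketch, where the names `t` and `u` are purely illustrative:

```python
s = "My name is David"

# Build new strings from pieces of the old one; `s` itself is never modified
t = s[:-1] + 'g'                    # "replace" the last character
u = s.replace("David", "Dhavide")   # "replace" a substring

print(t)   # My name is Davig
print(u)   # My name is Dhavide
print(s)   # My name is David  (unchanged)
```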

## String slicing

Beyond indexing individual elements of strings, we can also extract *slices* (substrings) from strings by specifying a (half-open) range of indices within brackets.
* The slice `my_string[a:b]` extracts a substring with characters `my_string[a]`, `my_string[a+1]`, `my_string[a+2]`, … `my_string[b-2]`, `my_string[b-1]` from the string `my_string` (under the assumption `b>a≥0`).
* If `a≥b`, the slice `my_string[a:b]` is the empty string.
* If `step>0` and `b>a>0`, then the slice `my_string[a:b:step]` extracts a substring from `my_string` starting from position `a` up to (but not including) position `b` in steps of length `step`. Of course, the endpoints can be given as negative integers as well, in which case, positions are measured from the right of the string.
* If `step<0`, the slice `my_string[a:b:step]` extracts a substring traversing the string `my_string` from right to left starting at `a`, terminating at (but not including) `b`.
* The slice `my_string[:b]` slices from the beginning of the string (position `0`) up to (but not including) position `b`.
* The slice `my_string[a:]` slices from position `a` up to *and including* the end of the string.

These rules are more easily understood looking at examples.

In [65]:
print(s)
# Slicing: pull out char 4 up to (but *not* including) char 9 from string s
print(s[4:9]) 

My name is David
ame i


In [66]:
s[11:] # From s[11] to the very end

'David'

In [67]:
s[:11] # From the very start up to (but not including) character 11

'My name is '

In [68]:
s[:11] + s[11:]

'My name is David'

In [69]:
s[-5] # s[-5] is the fifth character counting back from the end

'D'

In [70]:
s[-5:] # Equivalent to s[11:] for this string

'David'

In [73]:
s[-5:-3] # Again, remember slicing is half-open (non-inclusive)

'Da'

In [72]:
s[1:10:3] # Specify stride of length 3 (i.e., count in steps of 3)

'ya '

In [74]:
s[15:8:-2]

'dvDs'
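
A few more cases may help round out the rules. The sketch below reuses the string `s` from the cells above; the variable names are only for illustration. Note in particular that, unlike indexing, slicing with endpoints beyond the end of the string does not raise an exception.

```python
s = "My name is David"

reversed_s = s[::-1]       # omit both endpoints, step backwards through all of s
last_five = s[-1:-6:-1]    # indices 15, 14, 13, 12, 11
tail = s[10:100]           # an endpoint past the end is clipped, not an error
empty = s[100:]            # a slice entirely out of range is just empty

print(reversed_s)    # divaD si eman yM
print(last_five)     # divaD
print(repr(tail))    # ' David'
print(repr(empty))   # ''
```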

The next couple cells show why using half-open intervals makes reasoning about slices easier.  The end of one slice adds seamlessly to the start of one with the same index.  This helps avoid what are called "[fence-post errors](https://en.wikipedia.org/wiki/Off-by-one_error)."

There's an old computer science joke that illustrates this:

> There are two hard things in computer science: cache invalidation; naming things; and off-by-one errors.

In [75]:
s[3:7] + s[7:10] # Remember, + operator concatenates strings...

'name is'

In [76]:
s[:5] + s[5:]

'My name is David'
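
The same property can be checked for every split point at once. This is only a quick sanity check on the string `s` used above, not a proof:

```python
s = "My name is David"

# For any integer k (negative or beyond the end included),
# the two halves s[:k] and s[k:] rejoin to give s
all_rejoin = all(s[:k] + s[k:] == s for k in range(-20, 21))

# For 0 <= a <= b <= len(s), the slice s[a:b] has exactly b - a characters
a, b = 3, 10
length_ok = len(s[a:b]) == b - a

print(all_rejoin, length_ok)   # True True
```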

## String conversions and `format`

It is possible to cast numeric values from strings if the strings represent appropriate numeric literals.

In [77]:
print(float("3.14"), int("-8"))

3.14 -8
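
Conversions fail with a `ValueError` when the string is not an appropriate literal for the target type. A small sketch (the variable names are illustrative):

```python
n = int("ff", 16)       # int() accepts an optional base
x = float("1e-3")       # scientific notation is a valid float literal

try:
    int("3.14")         # "3.14" is not an integer literal
    failed = False
except ValueError as e:
    failed = True
    print(repr(e))

print(n, x, failed)     # 255 0.001 True
```

If you really do want an integer from `"3.14"`, convert in two steps: `int(float("3.14"))`.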


### Format strings (old-style)

Prior to Python 2.6, the principal way of converting numeric data and other Python data into strings was using *string interpolation*.  The syntax of this style resembles conventions from the C programming language's `printf` function. The basic trick was to use a `%` character preceding one of the characters in the table below to specify what would be substituted into the format string.

String interpolation remains widely used, but the newer `str.format()` method is more powerful, albeit also often more complicated.

| Conversion | Meaning
| :-:   | :-:
|`d`     |      Signed integer decimal.
|`i`     |      Signed integer decimal.
|`o`     |      Signed octal.
|`u`     |      Obsolete; identical to `d`.
|`x`     |      Signed hexadecimal (lowercase).
|`X`     |      Signed hexadecimal (uppercase).
|`e`     |      Floating point exponential format (lowercase).
|`E`     |      Floating point exponential format (uppercase).
|`f`     |      Floating point decimal format.
|`F`     |      Floating point decimal format.
|`g`     |      Same as "`e`" if exponent is less than `-4` or not less than precision, "`f`" otherwise.
|`G`     |      Same as "`E`" if exponent is less than `-4` or not less than precision, "`F`" otherwise.
|`c`     |      Single character (accepts integer or single character string).
|`r`     |      String (converts any Python object using `repr()`).
|`s`     |      String (converts any Python object using `str()`).
|`%`     |      No argument is converted, results in a "`%`" character in the result.

A *string interpolation* expression consists of a format string followed by the `%` operator and a Python tuple of values to convert. The generic syntax for variable substitution in a format string is

        %[flags][width][.precision]type

where `flags`, `width`, and `precision` are optional parameters.
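
The `g` conversion is the least obvious entry in the table, so here is a short sketch of how it switches between fixed-point and exponential notation. It uses exponential notation when the exponent is less than `-4` or not less than the precision (which defaults to 6). The sample values are arbitrary.

```python
samples = [0.0001, 0.00001, 123456.0, 1234567.0]
formatted = ["%g" % value for value in samples]

for value, text in zip(samples, formatted):
    print(repr(value), "->", text)

# 0.0001    -> 0.0001        (exponent -4: fixed-point)
# 1e-05     -> 1e-05         (exponent -5: exponential)
# 123456.0  -> 123456        (exponent 5 < precision 6: fixed-point)
# 1234567.0 -> 1.23457e+06   (exponent 6: exponential)
```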

In [78]:
a = str(42.5)
print(a)
# String interpolation (C-style, more or less)
# i.e. %[flags][width][.precision]type
from math import pi
print("Pi is about %d, in %s" % (pi, "Indiana"))
"For rough use, we often just use %0.4f" % pi

42.5
Pi is about 3, in Indiana


'For rough use, we often just use 3.1416'

In [79]:
# Notice that if just one value is being interpolated into the string, 
# we can give the bare value.  However, if multiple, must use a tuple.
"For rough use, we often just use %0.7f" % pi

'For rough use, we often just use 3.1415927'

In [80]:
"Better precision is %.17f" % pi

'Better precision is 3.14159265358979312'

In [81]:
"Past 17 digits, floating point precision is meaningless: %.30f" % pi

'Past 17 digits, floating point precision is meaningless: 3.141592653589793115997963468544'

In [82]:
"Octal %o; Decimal %i; HEX %X; hex %x; Octal w/ marker %#o; Hex w/ marker %#X" % (
       13,         13,     13,     13,                 13,                13)

'Octal 15; Decimal 13; HEX D; hex d; Octal w/ marker 0o15; Hex w/ marker 0XD'

In [83]:
"Explicit signs %+d, %+d" % (-13, 13)

'Explicit signs -13, +13'

In [84]:
"Zero padded ints %+06d, %+06d" % (-13, 13)

'Zero padded ints -00013, +00013'

In [None]:
"Space padded ints %6d, %6d" % (-13, 13)

In [None]:
"A scientific notation format %.3e using 'e'" % 1234567890

### The `str.format()` mini-language

The `format()` function and `str.format()` method of strings are enormously powerful, and occasionally enormously confusing. An excellent summary of the differences between old-style interpolation and `str.format()` (with examples) can be found at [Pyformat](https://pyformat.info/).

Let's try a few examples both with old-style string interpolation and with `str.format`.

In [None]:
# Define a tuple of numeric values, say dollar amounts.
expenses = (1234.5678, 9900000.1, 83, .02)
for n, item in enumerate(expenses):
    print("Purchase %d:\t$%.2f" % (n+1, item))

We can do better than the last cell using a `format` specifier. In particular, two things we want in formatted currencies are comma separators in large numbers and right alignment.

In [None]:
format_string = "Purchase {}:\t${:>13,.2f}" # Format string using new format mini-language
for n, item in enumerate(expenses):
    print(format_string.format(n+1, item))

We compactly described the currency format above. However, we may rather have the dollar sign close to its amount. This needs to be done in two stages.

In [None]:
format_string = "Purchase {}:\t{:>14}"
for n, item in enumerate(expenses):
    amount = "${:,.2f}".format(item)
    print(format_string.format(n+1, amount))

### Details on the `str.format()` mini-language

Take a look at this for a complete description of the `str.format()` mini language: https://docs.python.org/3.4/library/string.html#formatstrings

| Option | Meaning
|:------:|:--------------------------------------
| `<`      | The field will be left-aligned within the available space. The default for strings.
| `>`      | The field will be right-aligned within the available space. The default for numbers.
| `=`      | Forces the padding to be placed after the sign (if any) but before the digits. This is used for printing fields in the form "`+000000120`". This alignment option is only valid for numeric types.
| `^`      | Forces the field to be centered within the available space.
| `+`      | A sign should be used for both positive as well as negative numbers.
| `-`      | A sign should be used only for negative numbers; the default behavior.
| `space`  | A leading space should be used on positive numbers, a minus sign on negative numbers.
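
A quick way to get a feel for these options is to format the same value several ways. In this sketch the square brackets are only there to make the padding visible; the `*` before `^` is an optional fill character.

```python
cells = [
    "[{:<8}]".format("ab"),             # left-aligned in 8 columns
    "[{:>8}]".format("ab"),             # right-aligned
    "[{:^8}]".format("ab"),             # centered
    "[{:*^8}]".format("ab"),            # centered, padded with '*'
    "[{:0=+10d}]".format(120),          # sign first, then zero padding
    "[{:+d}] [{: d}]".format(5, 5),     # explicit sign vs. leading space
]
for cell in cells:
    print(cell)

# [ab      ]
# [      ab]
# [   ab   ]
# [***ab***]
# [+000000120]
# [+5] [ 5]
```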

Notice that `str.format()` permits several data structures for specifying values.

In [47]:
# parameters can be out of order...
print("The capital of {1:s} is {2:s}, a {0:s} city".format(
                      "Northern", "California", "Sacramento", "USA"))

The capital of California is Sacramento, a Northern city


In [48]:
# Using keyword arguments to specify values to format
print("The capital of {state} is {capital}".format(
                      capital="Sacramento", state="California", country="USA"))

The capital of California is Sacramento


See the documents linked above for full details.
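
Two further variations are worth knowing: replacement fields can index into their arguments, and a dictionary can be unpacked into keyword arguments with `**`. A brief sketch, in which the names `point` and `stock` are made up for illustration:

```python
point = (3, 4.7)
stock = {'symbol': 'GOOG', 'shares': 267}

a = "x={0[0]}, y={0[1]}".format(point)                      # index into a tuple
b = "{symbol}: {shares} shares".format(**stock)             # unpack dict as keywords
c = "{s[symbol]}: {s[shares]:>5d} shares".format(s=stock)   # index into a dict (no quotes)

print(a)   # x=3, y=4.7
print(b)   # GOOG: 267 shares
print(c)   # GOOG:   267 shares
```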

# Data structures

Python has a rich set of builtin data types that act as containers for other data types, including
* `tuple`
* `list`
* `dict` (dictionary)
* `set`

## `tuple` type

A tuple is a heterogeneous and immutable sequence of items. Tuples are constructed by separating values with commas, but parentheses `()` usually enclose them for clarity (and sometimes for disambiguation, depending on context).

* The empty tuple is denoted `()`.
* A single-item tuple requires a trailing comma to distinguish, e.g., `(a,)` from parentheses merely grouping an expression like `(a)` (which would *not* be a tuple).
  * To see why this is strictly required, contrast `(a+b)*3` with `(a+b,)*3`.
* For tuples of length greater than 1, the grouping parentheses are in fact optional. As such, the tuple `(a,b,c)` is equivalent to the tuple `a,b,c` with no delimiting parentheses.
* Tuples can be sliced and indexed exactly like strings with the same rules (but slicing extracts "sub-tuples" rather than substrings).
* As with strings, tuples are immutable, and thus references to tuple items by indexing or slicing cannot occur on the left-hand side of an assignment statement.
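
The contrast suggested above between `(a+b)*3` and `(a+b,)*3` is easy to try. The values of `a` and `b` here are arbitrary:

```python
a, b = 1, 2

grouped = (a + b) * 3      # parentheses only group: 3 * 3
repeated = (a + b,) * 3    # trailing comma makes a 1-tuple, which is repeated

print(grouped)             # 9
print(repeated)            # (3, 3, 3)
print(type((a)), type((a,)))   # <class 'int'> <class 'tuple'>
```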

In [None]:
# Some special syntax issues
empty_tup = ()              # Empty tuple
singleton_tup = ('Foo',)    # One-item tuple (comma required)
no_parens = "this", "that"  # Parentheses sometimes optional
print('empty_tup == %s has length %d.' % (empty_tup, len(empty_tup)))
print(singleton_tup)
print(no_parens)

In [None]:
my_tup = (4, 'a', 'giraffe', 3-4j)
print(my_tup)
print(type(my_tup))
print(my_tup[2:])

In [None]:
my_tup[4] # raises an IndexError exception

One feature of Python tuples is that they can be used for assigning values to several identifiers at once. This is a surprising, but convenient, feature to programmers coming from other languages.

In [50]:
# Multiple assignments equivalent to following three individual assignments.
# x = 3
# y = 4.7
# z = "My dog"
x, y, z = 3, 4.7, "My dog"  # We could write '(x, y, z) = (3, 4.7, "My dog")' instead
print('x=%d, y=%3.1f, z=%s' % (x,y,z))
my_tup = x, y, z
print('x=%d, y=%3.1f, z=%s' % my_tup)

x=3, y=4.7, z=My dog
x=3, y=4.7, z=My dog


In [51]:
print("x={:d}, y={:3.1f}, z={:s}".format(x, y, z))
print("x={:d}, y={:3.1f}, z={:s}".format(*my_tup))
print("x={2:d}, y={1:3.1f}, z={0:s}".format(z, y, x))

x=3, y=4.7, z=My dog
x=3, y=4.7, z=My dog
x=3, y=4.7, z=My dog


In [52]:
"Just one %s" % "thing"

'Just one thing'

In [None]:
a,b,c = range(2,5) # tuple unpacking the range object
print('a=%d, b=%d, c=%d' % (a,b,c))

In [None]:
a, b, c = range(2,7) # raises ValueError: too many values to unpack (expected 3)

In [None]:
# List unpacking works also
[a, b, c] = [1, 2, 3]

In [None]:
print(a, b)
a, b = b, a # Pythonic idiom for swapping variables by tuple assignment
print(a, b)

In [None]:
# More special syntax
a, b, c = 4, 5, 6    # Tuple on right is "unpacked" into vars on left
print(a, b, c)

In [None]:
# Even more special: we can gather "rest of values" using *name
# prefix for identifiers assigned to
a, b, *c = range(10)
print('a=%s, b=%s, c=%s' % (a, b, c))

In [None]:
a, *b, c = range(10)
print('a=%s, b=%s, c=%s' % (a, b, c))

In [None]:
*a, b, c = range(10)
a, b, c

In [None]:
*a, b, a = range(12) # Legal but confusing: `a` is assigned twice, and the later assignment wins
a, b

In [None]:
a, *b, *c, d = range(100) # raises SyntaxError (at most one starred target is allowed)

In [53]:
# Splitting the middle would be ambiguous, but you could do it explicitly
a, *b, c = range(30)
b1, b2 = b[:10], b[10:]
a, b1, b2, c

(0,
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28],
 29)

In [None]:
my_tup = (1,2,3)
try:
    my_tup[1] = 33
except TypeError:
    print("Trying to assign to immutable thing")

In [None]:
tuple(reversed(my_tup)) # using builtin function reversed

In [None]:
reversed(my_tup), my_tup

Tuples have very few methods.
 
| | |
|:-: | :-:|
|`count` | `index`|

The `tuple` methods `count` and `index` behave like the methods of the same names in the `str` class. The only distinction is that they compare arbitrary items to elements of a tuple in looking for a match.

In [None]:
my_tup = (4, 'a', 'giraffe', 3-4j)
print(my_tup)
my_tup.index('giraffe')

In [None]:
my_tup.index(2) # raises ValueError: 2 is not an item of my_tup

In [None]:
letters = ('aaa', 'bbb', 'ccc')
letters.index('bbb')

In [None]:
tup2 = tuple("Mary had a little lamb")
print(tup2)
tup2.count('a')

In [None]:
# We can add (or multiply) tuples to make new ones (like with strings and lists)
x = (1,2,"A") + (4,5,6)
y = (1,2,"A") * 3
print(x)
print(y)

In [None]:
# Occasionally we'd like to "replace an element" in a tuple
# ... which really means, "Create a new tuple with that element differing"
tup10 = tuple(range(10))
newtup = tup10[:5] + ('five',) + tup10[6:]
newtup

## `list` type

Perhaps the most common (mutable) data structure used in Python is the list. Lists are efficient, ordered sequences of heterogeneous elements. Lists can expand, and have `O(1)` access and amortized `O(1)` append. In essence, for those who think in C terms, a Python list is mostly an array of references to Python objects.

Sometimes programmers coming from other languages attempt to implement linked-lists, queues, deques, or other special sequence-type structures. In Python, this is largely unnecessary; a list would be easier and usually faster (that said, the standard library provides queues and deques also). The most notable exception where lists are not as fast is when using homogeneous arrays of uniform data type. In those instances, the Python module `numpy` has data structures for homogeneous arrays and associated functions/methods that have been implemented to give very good performance. In either case, a beginning Python programmer need not implement sophisticated sequence/array-type data structures from scratch.
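
As a small illustration of the standard-library alternative mentioned above, `collections.deque` supports fast appends and pops at *both* ends, which is the one operation where a plain list is slow. The queue contents here are made up.

```python
from collections import deque

queue = deque(["first", "second"])
queue.append("third")         # add on the right, as with a list
queue.appendleft("zeroth")    # add on the left (slow for a plain list)
front = queue.popleft()       # remove from the left

print(front)         # zeroth
print(list(queue))   # ['first', 'second', 'third']
```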

Both the `list` and the `tuple` type are heterogeneous ordered sequences, but their roles are conceptually different. A tuple is much more like a *record* that holds a collection of associated information, whereas a list is a sequence of items that are to be treated similarly. Although neither tuples nor lists need be homogeneous in type, the former are immutable while the latter are mutable; it is fairly common to iterate or loop over the elements of a list doing some kind of repeated computation (possibly mutating the list in place). As such, we expect lists to have *similar* elements (e.g., so that they can be sorted or processed similarly); by contrast, tuples are not generally expected to contain similar elements.

* The empty list is denoted by empty brackets `[]`.
* The simplest way to construct lists is to enclose a comma-separated sequence of items with brackets `[]`.
* Lists can be indexed & sliced. The rules are exactly the same as for tuples & strings.
* The `+` and `*` operators apply to the `list` type as they do to the `str` & `tuple` types.
* Unlike the `tuple` and `str` types, the `list` type is *mutable*. That is, individual items in lists can be reassigned values after the list has been instantiated. In particular, this also means that lists can be extended (by appending or inserting elements) or shortened (by removing/deleting elements).

In [None]:
empty_list = []
print('empty_list = %s' % empty_list)
my_list = [4, 'a', 'giraffe', 3-4j]
print(my_list)
print(type(my_list))
print(my_list[2:])

The `list` type has more methods than the `tuple` type and fewer methods than the `str` type.

| | | | |
|:-: | :-: | :-: | :-:|
|`append` | `clear` | `copy` | `count`|
|`extend` | `index` | `insert`| `pop`|
|`remove` | `reverse` | `sort` | |

Most `list` methods (all except `copy`, `count`, `index`, and `pop`) return `None`. These are *mutator methods* that modify lists in-place (`pop` also modifies the list in-place, but returns the removed item). As such, it is important to avoid errors of the form, e.g.,
```python
my_list = my_list.reverse()
```

that overwrite the original variable name with `None`. This is a common mistake when using methods that modify mutable types in-place. Immutable types are safeguarded from these kinds of errors, which is one reason Python has immutable types.
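
To see the pitfall concretely, here is a small sketch (the list contents are arbitrary). If a *new* reversed list is what you want, a slice with a negative step is one way to get it without touching the original.

```python
my_list = [3, 1, 2]

result = my_list.reverse()   # mutates my_list in place and returns None
print(result)                # None
print(my_list)               # [2, 1, 3]

new_list = my_list[::-1]     # a new list; my_list is left alone
print(new_list)              # [3, 1, 2]
print(my_list)               # [2, 1, 3]
```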

In [None]:
fruit_list = ['banana','apple','mango','cantaloupe']
print('Initial: fruit_list = %s' % fruit_list)

In [None]:
# Common mutation of a list: appending elements
fruit_list.append("guava")
print('After appending: fruit_list = %s' % fruit_list)

In [None]:
# Related but different mutation: extending a list with elements from another list
fruit_list.extend(['pineapple','coconut','pear'])
print('After extending: fruit_list = %s' % fruit_list)

In [None]:
fruit_list.sort() # Sorting list in-place
print('After sorting: fruit_list = %s' % fruit_list)

In [None]:
animal_list = ['dog', 'cat', 'frog']
print(animal_list)
animal_list.extend("pig")
# Notice the difference between ["pig"] and "pig" as an argument to list.extend
print(animal_list)

In [None]:
# Remove items either by index, by slice, or by value
del animal_list[-3:]
print(animal_list)

In [None]:
animal_list.append('pig')
animal_list

In [None]:
# You can insert into the middle of lists also, 
# but less efficiently O(n)
animal_list.insert(2, 'horse')
print(animal_list)

In [None]:
# Removing list elements by value
animal_list.remove('cat')
print(animal_list)

In [None]:
# Removing list elements that are not present: raises exception ValueError
animal_list.remove('gopher')
print(animal_list)

In [None]:
# Concatenating lists using the addition (+) operator
print(animal_list)
animal_list = ['bat'] + animal_list
print(animal_list)
animal_list += ['dingo']
print(animal_list)

A common task with, say, a list of strings of dates is to extract the specific date information from each date into numeric lists of months, days, and years. Notice these dates are ambiguous; it is not obvious whether they are encoded as `DD/MM/YYYY` or as `MM/DD/YYYY` (which is why ISO standard dates are `YYYY-MM-DD` unambiguously).

In [None]:
# Decompose elements into parallel lists
dates = ["09/05/1984", "05/04/1999", "01/02/1921"]
months, days, years = [], [], []
for date in dates:
    mm, dd, yyyy = date.split('/') # Use tuple-unpacking to extract substrings
    months.append(int(mm))
    days.append(int(dd))
    years.append(int(yyyy))
    print('months=%s, days=%s, years=%s' % (months, days, years))
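
The same decomposition can also be written more compactly with a list comprehension and `zip`. This is just one possible alternative, sketched under the assumption that the dates are `MM/DD/YYYY`; note that `zip` produces tuples rather than lists.

```python
dates = ["09/05/1984", "05/04/1999", "01/02/1921"]

# One [month, day, year] list of ints per date string
parts = [[int(field) for field in date.split('/')] for date in dates]

# zip(*parts) "transposes" the rows into three parallel tuples
months, days, years = zip(*parts)

print(months)   # (9, 5, 1)
print(days)     # (5, 4, 2)
print(years)    # (1984, 1999, 1921)
```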

Just assigning a list to a new name still points that name to the same list object.  Hence changing the list under one name also changes what the other name points to.

To make an actual copy of a list:
* Use the `list.copy()` method or
* Take a slice of the entire list

To make a *deep copy*, you can use the `copy` module's `deepcopy()` function.  The difference here is that a list might contain inside it more lists (or other mutable objects), and a *shallow* copy still contains references to those.  For lists of immutable objects, like strings and numbers, deep versus shallow makes no difference.
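
The `is` operator tests whether two names refer to the very same object, which makes the three situations easy to tell apart. A minimal sketch with made-up data:

```python
original = [1, [2, 3]]
alias = original              # a second name for the same list
shallow = original.copy()     # a new outer list...

print(alias is original)             # True: same object
print(shallow is original)           # False: a different outer list
print(shallow == original)           # True: with equal contents
print(shallow[1] is original[1])     # True: ...but the inner list is shared
```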

In [None]:
print(animal_list)
animals2 = animal_list
animals2[3] = 'donkey'
print(animal_list)

In [None]:
backup = animal_list[:]   # Same result as `animal_list.copy()`
print('Initially: animal_list=%s' % animal_list)
print('Popped value:', backup.pop(0))
print('After pop: backup=%s' % backup)
print('After pop: animal_list=%s' % animal_list)

In [None]:
list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
list_of_lists

In [None]:
# Oops! We still changed the original after modifying the copy
lol2 = list_of_lists.copy()
lol2[1].append(55)
lol2.append([11, 12, 13])
print("Original:    ", list_of_lists)
print("Shallow copy:", lol2)

In [None]:
from copy import deepcopy
list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
lol3 = deepcopy(list_of_lists)
lol3[1].extend([55, 66, 77])
print("Original: ", list_of_lists)
print("Deep copy:", lol3)

Pay attention to the distinctions between the method `list.reverse()` and the builtin `reversed` function. The same concern applies when using the method `list.sort()` versus the builtin `sorted` function: the methods mutate the list in place and return `None`, whereas the builtin functions leave the original list untouched.

* The output from `reversed` is a *reverse iterator* object. This object evaluates the reverse sequence in a *lazy* manner; that is, elements are not computed until needed. *Lazy evaluation* is useful in the event that the sequence required has more elements than the memory can easily accommodate.
* The output from `sorted`, by contrast, is a new `list` that is built eagerly.
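
A short sketch of the two builtins side by side (the list `nums` is arbitrary). `sorted` hands back a new list, while `reversed` hands back an iterator that can be consumed only once:

```python
nums = [3, 1, 2]

s_out = sorted(nums)       # a new, sorted list
r_out = reversed(nums)     # a lazy reverse iterator

first_pass = list(r_out)   # consuming the iterator yields the items...
second_pass = list(r_out)  # ...after which it is exhausted

print(s_out)         # [1, 2, 3]
print(first_pass)    # [2, 1, 3]
print(second_pass)   # []
print(nums)          # [3, 1, 2]  (never modified)
```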

In [None]:
# Reversing (by mutation)
animal_list.reverse()
print(animal_list)

In [None]:
# Reversing (and returning result)
reverse_animal_list = reversed(animal_list)
print(animal_list)
print(reverse_animal_list)

In [None]:
# That was strange! Python 3 is often "lazy" and doesn't 
# compute something until needed
# Cast iterable to list to make printing explicit
print(list(reverse_animal_list))

In [None]:
# Pretend `animal_list` is a billion item list (where a copy would eat memory)
reverse_animal_list = reversed(animal_list)  
for item in reverse_animal_list:
    print(item)

In [None]:
# Sorting (by mutation, in-place)
animals = animal_list[:]
animals.sort()
print(animal_list)
print(animals)

In [None]:
# Another gotcha! Not all elements can be compared to be sorted
animals.insert(2, None)
print(animals)
animals.sort()
print(animals)

In [None]:
# We can use a custom sort key function
def my_key(x):
    return "" if x is None else x
    
animals.sort(key=my_key)
animals

In [None]:
# But it still won't sort on *everything*
animals.append(1+1j)
animals.sort(key=my_key)

### Iteration over a collection

Notwithstanding the many cool things we can do with lists by applying methods, calling functions, indexing and slicing, the most common idiom for working with lists is to loop over them.  In fact, this idiom is extremely common for working with many Python objects.  

There are only two looping constructs in Python, `for` and `while`, and the former is far more common than the latter.
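
For completeness, here is what the less common construct looks like. A `while` loop fits best when the number of iterations is not known in advance; this toy example (the numbers are arbitrary) counts how many times a value can be halved before reaching 1.

```python
value, halvings = 100, 0
while value > 1:
    value //= 2        # integer division: 100, 50, 25, 12, 6, 3, 1
    halvings += 1

print(halvings)        # 6
```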

In [54]:
# Loop over a collection of "similar" items
class OddThing(object):
    def split(self):
        return "Thomas", "Bayes"
    
name_list = ["Rene Descartes", "Marie Sophie Germain", "Leonhard Euler", 
             "Ada Lovelace", "Isaac Newton", "Georg Cantor", OddThing()]
for name in name_list:
    *first, last = name.split()
    print(first[-1].lower(), last.upper())

rene DESCARTES
sophie GERMAIN
leonhard EULER
ada LOVELACE
isaac NEWTON
georg CANTOR
thomas BAYES


If you come from working in some other languages, you may be unduly tempted to think about the index positions of items in a list.  You *can* "write C in Python."  And you can "write FORTRAN in Python."  And you can "write Perl in Python."  *Ad nauseam*.  Generally it's best to avoid doing these things.

In [57]:
# A C programming veteran's first Python loop using explicit index
name_list = ["Rene Descartes", "Marie Sophie Germain", "Leonhard Euler", 
             "Ada Lovelace", "Isaac Newton", "Georg Cantor"]
count = 0
for k in range(len(name_list)):
    print(k, name_list[k])
    if ('r' in name_list[k]):
        count += 1 # equivalent to count = count + 1
print("The value of count is %d" % count)

0 Rene Descartes
1 Marie Sophie Germain
2 Leonhard Euler
3 Ada Lovelace
4 Isaac Newton
5 Georg Cantor
The value of count is 4


While looping over the indices of a collection obviously *does* work, as in the prior example, it is not *Pythonic* to do so. In Python, we like to think about collections themselves, and only rarely about how we index into them. The *Pythonic* style is to loop over the *elements* of the collection *themselves* (i.e., construct a meaningful identifier describing the entity we actually want to work with). In this case, it is the *names* in `name_list` so let's use the identifier `name` (which is much more informative than the identifier `k` as a loop index).

In [60]:
# "Pythonic" loop without explicit index
name_list = ["Rene Descartes", "Marie Sophie Germain", "Leonhard Euler", 
             "Ada Lovelace", "Isaac Newton", "Georg Cantor"]
count = 0
for name in name_list:
    print(name)
    count += 'r' in name
print("The value of count is %d" % count)

Rene Descartes
Marie Sophie Germain
Leonhard Euler
Ada Lovelace
Isaac Newton
Georg Cantor
The value of count is 4
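
The line `count += 'r' in name` works because `bool` is a numeric type: `True` counts as 1 and `False` as 0. Taking that one step further, the whole loop can be collapsed into a call to `sum`. This is a sketch of an alternative, not a claim that it is always clearer.

```python
name_list = ["Rene Descartes", "Marie Sophie Germain", "Leonhard Euler",
             "Ada Lovelace", "Isaac Newton", "Georg Cantor"]

print(True + True)     # 2: booleans behave as the integers 1 and 0

# Sum one boolean per name
count = sum('r' in name for name in name_list)
print("The value of count is %d" % count)   # The value of count is 4
```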


An even more Pythonic implementation of a `for` loop is to use a *list comprehension*. Here is an example, but we'll see more later.

In [58]:
# Returns list containing the first character of each name in name_list
[name[0] for name in name_list]

['R', 'M', 'L', 'A', 'I', 'G']

In [59]:
[name for name in name_list if 'r' in name] # List of "names with r" counted above

['Rene Descartes', 'Marie Sophie Germain', 'Leonhard Euler', 'Georg Cantor']

For occasions where we want to have access to both the list element and its index, we can use the built-in function `enumerate`. This function takes an iterable sequence and returns an iterable object that yields `(index, element)` tuples pairing each element of the original sequence with its index. This *lazy evaluation* is useful in the event that the input sequence has more elements than the memory can easily accommodate.

In [61]:
# Sometimes we genuinely do want both an index and an item
for k, name in enumerate(name_list):
    print("(%d) %s" % (k, name))

(0) Rene Descartes
(1) Marie Sophie Germain
(2) Leonhard Euler
(3) Ada Lovelace
(4) Isaac Newton
(5) Georg Cantor
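
When the numbering is meant for human readers, `enumerate` accepts an optional `start` argument so that counting begins at 1 (or anywhere else). A brief sketch on a shortened list:

```python
name_list = ["Rene Descartes", "Marie Sophie Germain", "Leonhard Euler"]

pairs = list(enumerate(name_list))   # force the lazy object into a list
print(pairs[0])                      # (0, 'Rene Descartes')

numbered = ["(%d) %s" % (k, name) for k, name in enumerate(name_list, start=1)]
for line in numbered:
    print(line)
# (1) Rene Descartes
# (2) Marie Sophie Germain
# (3) Leonhard Euler
```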


## `dict` (dictionary) type

The `dict` (or *dictionary*) type is one of the most interesting data structures built into Python. As a motivating example, suppose you want to keep track of a group of stocks purchased. You might store the stock symbols in one list, the number of shares in another list, the price in another list, and so on.

In [None]:
symbol = ['GOOG', 'AAPL', 'MSFT', 'YHOO']
shares = [267, 349, 123, 181]
price = [396.85,  545.79, 914.49, 169.67]
for k in range(4):
    value = shares[k] * price[k]
    print('The value of %d %s shares is $%10.2f.' % (shares[k], symbol[k], value))

The problem with the preceding code is that its correct functioning depends on the lists all being aligned properly. If another stock symbol is inserted into the list `symbol`, the corresponding number of stocks and the price need to be inserted into the lists `shares` and `price` respectively in the same location. A better data structure to represent the problem would couple properties of each stock more closely.

This is what dictionaries do. A dictionary is an associative array (hash table) that gives average-case `O(1)` lookup and insertion of elements. Before Python 3.7, dictionaries had no inherent order; since Python 3.7 they preserve the order in which keys were inserted. A dictionary maps zero or more (immutable) *keys* to (possibly mutable) *values*.  Dictionaries are mutable, and keys within them can be added, deleted, or their values modified.

Here is what the code above would look like using a list of dictionaries.

In [None]:
stocks = [ {'symbol':'GOOG', 'shares':267, 'price':396.85},
           {'symbol':'AAPL', 'shares':349, 'price':545.79},
           {'symbol':'MSFT', 'shares':123, 'price':914.49},
           {'symbol':'YHOO', 'shares':181, 'price':169.67}
         ]
for stock in stocks:
    value = stock['shares'] * stock['price']
    print('The value of %d %s shares is $%10.2f.' % (
                        stock['shares'], stock['symbol'], value))

* Each stock is represented by its own *dictionary* in the list of dictionaries. Observe how the loop is closer to natural language and that the data for each stock is coupled more closely in its own dictionary.
* A `dict` is delimited by braces `{}` with a comma-separated sequence of *`key: value`* pairs.
* The *keys* of the dictionary are used to access the values stored within. The idiom resembles indexing with strings, lists, and tuples, except that the indices here are strings.
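
Going one step further, the stocks themselves could be stored in a dictionary keyed by symbol, so that any one stock can be looked up directly instead of by searching a list. This is a sketch of one possible layout; the name `portfolio` is hypothetical and the figures are taken from the example above.

```python
portfolio = {
    'GOOG': {'shares': 267, 'price': 396.85},
    'MSFT': {'shares': 123, 'price': 914.49},
}

# Direct lookup by key, then by inner key
msft_value = portfolio['MSFT']['shares'] * portfolio['MSFT']['price']
print('MSFT holding is worth $%.2f' % msft_value)

# .items() yields (key, value) pairs, here (symbol, inner dict)
for symbol, info in portfolio.items():
    print(symbol, info['shares'])
```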

In [None]:
empty_dict = {}
print('empty_dict has type %s.' % type(empty_dict))

In [None]:
goog = {'symbol': 'GOOG', 'shares': 267, 'price': 396.85}
print('The dict goog:', goog)
print(goog['symbol'])

In [None]:
# Iterating over a dictionary (keys)
for key in goog:
    print(key, goog[key])

Dictionaries have several useful methods.

| | | | | |
|:-: | :-: | :-: | :-: | :-:|
|`clear` | `copy` | `fromkeys` | `get` | `items`|
|`keys` | `pop` | `popitem` | `setdefault` | `update`|

* The `.get()` method behaves like indexing with brackets when the key exists, i.e., `goog[key] == goog.get(key)`.  However, if the key does not exist, `dict.get()` returns a default value (`None` unless another default is given), whereas indexing raises a `KeyError` exception.
* The `.keys()` and `.items()` methods return iterable sequences.
* Looping over a `dict` `D` is equivalent to looping over `D.keys()`
* Dictionaries themselves are mutable, but the *keys* of a `dict` *must* be immutable (e.g., `str` or `tuple`). The values associated with each key may be *mutable* or *immutable*.

In [None]:
for key in goog:  # Equivalent to "for key in goog.keys()"
    print(key)

In [None]:
goog['acquired']  # raises KeyError: there is no such key

In [None]:
goog.get('acquired', '1980-01-01')  # Return a default if not in dict

In [None]:
print(goog.get('shares', 99))
print(goog['shares'])

In [None]:
print('Before: %s' % goog)
goog['broker'] = 'Sergey Brin'    # Modifying the dict on the fly
print('After adding broker:\n\t%s' % goog)
goog['price'] = 452.32
print('After price increases:\n\t%s' % goog)

In [None]:
# Remove a key (and its value) with the `del` statement
print('Before: %s' % goog)
del goog['broker']
print('After removing broker:\n\t%s' % goog)

In [None]:
# The 'update' method uses a dict to modify key-value pairs in-place
goog.update({'shares':300, 'price':540.12, 'hometown':'Mountain View'})
print('After updating in-place:\n\t%s' % goog)

In [None]:
'shares' in goog

## `set` type

Sets are much like dictionaries that lack values, but only have keys.  In fact, for many years, in very old versions of Python, a standard idiom was to use dictionaries as sets, but set all the values to `None`.  Sets allow a few standard set-theoretic operations (intersection, union, subset, etc).
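
Because a set holds each item at most once, converting a list to a set is a common way to remove duplicates. The sketch below (with made-up data) also shows the old dictionary idiom mentioned above, using `dict.fromkeys`, which sets every value to `None`.

```python
visits = ["ann", "bob", "ann", "cat", "bob"]

unique = set(visits)                # duplicates collapse
print(len(visits), len(unique))     # 5 3
print(sorted(unique))               # ['ann', 'bob', 'cat']

# The old idiom: a dict whose values are all None
as_dict = dict.fromkeys(visits)
print(as_dict['ann'])               # None
print(set(as_dict) == unique)       # True
```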

In [None]:
instructors = {"Ben Zaitlen", "Christine Doig"}
instructors.add("David Mertz")
print(instructors)
instructors.add("Dhavide Aruliah")
instructors

In [None]:
list_of_Ds = ["David Mertz", "Christine Doig", "John Doe"]
names_with_D = set(list_of_Ds)
names_with_D.add("Dhavide Aruliah")
names_with_D | instructors    # Union of sets

In [None]:
names_with_D & instructors    # Intersection of sets

In [None]:
names_with_D ^ instructors    # Symmetric difference

In [None]:
names_with_D - instructors    # Difference of sets

In [None]:
names_with_D <= instructors   # Test for improper subset

In [None]:
{'Ben Zaitlen', 'Christine Doig'} < instructors   # Test for proper subset

In [None]:
names_with_D >= {"John Doe", "David Mertz"}  # Test for superset (equality allowed)

In [None]:
"John Doe" in names_with_D    # Membership in set

In [None]:
# No effect to add something a second time (or Nth time)
names_with_D.add('David Mertz')
names_with_D

In [None]:
names_with_D.remove('David Mertz')
names_with_D

In [None]:
{x for x in (instructors | names_with_D) if 'Do' in x}
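
The `remove` call above raises `KeyError` when the element is absent, whereas `discard` does not. A minimal sketch of the difference, using a small hypothetical `names` set, which also shows that the method forms of the operators accept any iterable:

```python
names = {"David Mertz", "John Doe"}   # hypothetical sample data

names.discard("Nobody Here")          # absent element: silently ignored

removal_failed = False
try:
    names.remove("Nobody Here")       # absent element: raises KeyError
except KeyError:
    removal_failed = True

# Method forms such as union() accept any iterable, not only sets
union = names.union(["Christine Doig"])
is_subset = names.issubset(union)
print(removal_failed, sorted(union), is_subset)
```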

# Data types from the Python Standard Library

There are many other Python builtin data types and classes we could study in more detail (e.g., `bytes`, `bytearray`, iterators, etc.). Beyond the builtins, the Python Standard Library provides further data types in modules such as:
* `datetime`: a module for manipulating calendar dates and times;
* `collections`: a module extending standard builtin data collections (`list`, `dict`, etc.);
* `decimal`: a module for manipulating arbitrary precision decimal (i.e., base-10) numbers; and
* `fractions`: a module for manipulating rational numbers (i.e., fractions).

## `datetime` module

Computations involving dates are required frequently (e.g., calculating the number of days, hours, and minutes elapsed between 23:52 on Feb. 13, 2012 and 05:30 on Sept. 17, 2014). When we start to consider adjustments for leap years, time zones, and Daylight Savings Time, these calculations become quite involved. Rather than having to construct routines for date calculations from scratch, Python programmers can use routines from the `datetime` module to answer date-related questions. Of course, it is necessary to use an `import` statement to bring the relevant data structures and routines into the working namespace.

* The `datetime` module provides routines for formatting dates for output as well as routines for computation with dates.
* Complications in working with precise dates/times include time zones, leap seconds, and daylight-savings time corrections. The objects of the `datetime` module are classified as *naïve* or *aware* according to how well they are able to resolve times accurately with regard to such complications. There is a lot of subtlety involved in getting these details correct; suffice it to say that getting `datetime` computations accurate to within a day is likely possible, but getting accuracy to within a second, a minute, or even an hour is not guaranteed.
* The complications above notwithstanding, the most relevant data types from the `datetime` module are `datetime.date` for representing dates, `datetime.time` for representing times, `datetime.datetime` for representing both dates & times, and `datetime.timedelta` for representing time intervals (i.e., time elapsed between two specific `datetime` events).
* Useful functions include `datetime.date.today()` to return the current date, and `datetime.date.isoformat()` to return a date as a string in ISO 8601 format (`YYYY-MM-DD`).
* More details are in the [`datetime` module documentation](https://docs.python.org/3/library/datetime.html)
* At heart, naïve datetime arithmetic is simply a "mixed radix" measure of seconds; i.e., it combines base 60, base 24, and base 365 (and sometimes base 366) in the way it displays and recognizes numbers.
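
The naïve/aware distinction can be sketched as follows. The fixed UTC-05:00 offset is an assumption chosen for illustration; a fixed offset does not model daylight-saving transitions.

```python
import datetime as dt

# A naive datetime carries no time-zone information.
naive = dt.datetime(2014, 9, 17, 5, 30)

# An aware datetime has a tzinfo; here UTC and a fixed UTC-05:00 offset.
utc_time = dt.datetime(2014, 9, 17, 5, 30, tzinfo=dt.timezone.utc)
minus_five = dt.timezone(dt.timedelta(hours=-5))
local_time = utc_time.astimezone(minus_five)   # same instant, shifted clock

print(local_time.isoformat())
print(utc_time == local_time)   # aware values compare by instant

# Mixing naive and aware values in arithmetic is an error.
mixing_failed = False
try:
    utc_time - naive
except TypeError as err:
    mixing_failed = True
    print("TypeError:", err)
```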

In [None]:
import datetime as dt

In [None]:
today = dt.date.today()
# datetime.date.isoformat() returns date formatted in ISO format
print('Today is %s' % today.isoformat()) 
# datetime.date.strftime returns date formatted as described by a format string
print('Today is %s' % today.strftime('%A, %B %d, %Y')) # Details in documentation

In [None]:
earlier = dt.datetime(2012,2,13,23,52) # 23:52 on Feb. 13, 2012
later = dt.datetime(2014,9,17,5,30) # 05:30 on Sept. 17, 2014
print('Earlier date = %s' % earlier.strftime('%A, %B %d, %Y at %H:%M'))
print('Later   date = %s' %   later.strftime('%A, %B %d, %Y at %H:%M'))

In [None]:
time_elapsed = later - earlier
print('The time elapsed is %s' % time_elapsed)
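
To report the elapsed time as days, hours, and minutes, the `timedelta` can be split by hand. A minimal sketch, repeating the two dates from above so that it runs on its own (the name `elapsed` is introduced here for illustration):

```python
import datetime as dt

earlier = dt.datetime(2012, 2, 13, 23, 52)   # 23:52 on Feb. 13, 2012
later = dt.datetime(2014, 9, 17, 5, 30)      # 05:30 on Sept. 17, 2014
elapsed = later - earlier

# A timedelta stores days, seconds, and microseconds; split the seconds
# component into hours and minutes with divmod.
hours, remainder = divmod(elapsed.seconds, 3600)
minutes, seconds = divmod(remainder, 60)
print('%d days, %d hours, %d minutes' % (elapsed.days, hours, minutes))
print('Total seconds: %.0f' % elapsed.total_seconds())
```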

## `collections` module

The `collections` module extends the base Python data collection types with a few useful variations. Three particular extensions are:

* `collections.namedtuple`: a function for creating `tuple`-type objects with named fields
* `collections.OrderedDict`: a subclass of `dict` objects with ordered keys
* `collections.Counter`: a subclass of `dict` that works like a "bag" or "multiset" (it's good for counting things, as the name indicates).
* More details are available from the [`collections` module documentation](https://docs.python.org/3/library/collections.html)

Other useful collection types include `collections.deque` and `collections.defaultdict`, as well as `queue.Queue` (also `queue.LifoQueue` and `queue.PriorityQueue`) from the separate `queue` module.
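
As a brief sketch of two of these types, the example below groups values with a `defaultdict` and keeps a bounded history with a `deque`. The `trades` list and the prices are hypothetical sample data.

```python
from collections import defaultdict, deque

# defaultdict: a missing key is created on first access using the factory
# (here `list`), so grouping needs no "is the key present?" check.
trades = [('GOOG', 100), ('IBM', 500), ('GOOG', 50)]  # hypothetical data
shares_by_symbol = defaultdict(list)
for symbol, shares in trades:
    shares_by_symbol[symbol].append(shares)
print(dict(shares_by_symbol))

# deque: appends and pops at either end; maxlen discards the oldest items.
recent_prices = deque(maxlen=3)
for price in [521.78, 538.22, 540.12, 452.32]:
    recent_prices.append(price)
print(list(recent_prices))

# Adding on the left of a full deque pushes out the item on the right.
recent_prices.appendleft(500.0)
print(list(recent_prices))
```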

### `collections.namedtuple`

A `namedtuple` is worth considering very often for clean code.  It requires no extra memory per `tuple`, but allows us to *name* each index position in a `tuple`. This provides for better documentation of our intent when using tuples.

In [None]:
from collections import namedtuple

In [None]:
# We declare a new data type with the identifier "Account"
# As well as indexing tuple entries by position, we can use the labels
# "accountID", "firstname", & "lastname" to retrieve entries from a namedtuple
account_fields = ["accountID", "firstname", "lastname"]
Account = namedtuple('Account', account_fields)
newton = Account('123456789', 'Isaac', 'Newton')
leibnitz = Account('987654321', 'Gottfried', 'Leibnitz')
print(newton)
print(leibnitz)

In [None]:
print(leibnitz[1])
print(leibnitz.firstname) # Same as above
print(newton.accountID)

In [None]:
import datetime as dt # We'll use datetime to represent dates

# We declare a new data type with the identifier "Stock"
# As well as indexing tuple entries by position, we can use the labels
# "symbol", "shares", "price", & "acquired" to retrieve entries from a namedtuple
# Space separated field names
Stock = namedtuple("Stock", "symbol shares price acquired") 

In [None]:
# Having defined the namedtuple data-type Stock, we create a value of type Stock
goog = Stock('GOOG', 100, 538.22, dt.date(2015, 1, 15))
print(goog)
print(goog[2])     # We can extract values from the namedtuple using tuple position
print(goog.price)  # ... or we can use an attribute with the appropriate name.

In [None]:
ibm = Stock('IBM', 500, 172.68, dt.date(1952, 6, 1))
aapl = Stock('AAPL', 250, 127.62, dt.date(1999, 3, 14))
print(ibm)
print(aapl)
print(ibm.symbol)

In [None]:
mystocks = [goog, ibm, aapl] # Construct a list of the stocks
mystocks

In [None]:
# One readable way to compute the total asset value
asset_value = 0
for stock in mystocks:
    asset_value += stock.shares * stock.price
print(asset_value)

In [None]:
sum(stock.shares * stock.price for stock in mystocks)
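
Because a `namedtuple` is still a tuple, its fields cannot be reassigned. The sketch below redefines `Stock` so that it runs on its own, and shows the `_replace` and `_asdict` helper methods; the updated price is a made-up value.

```python
from collections import namedtuple
import datetime as dt

Stock = namedtuple("Stock", "symbol shares price acquired")
goog = Stock('GOOG', 100, 538.22, dt.date(2015, 1, 15))

# Fields are read-only, just as with a plain tuple.
assignment_failed = False
try:
    goog.price = 540.12
except AttributeError:
    assignment_failed = True

# _replace returns a new namedtuple with the given fields changed.
goog_updated = goog._replace(price=540.12)   # hypothetical new price
print(goog.price, goog_updated.price)

# _asdict converts to a dictionary keyed by field name.
as_dict = goog_updated._asdict()
print(as_dict['symbol'], as_dict['shares'])

# Ordinary tuple unpacking still works.
symbol, shares, price, acquired = goog_updated
```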

### `collections.OrderedDict`

Before Python 3.7, generic `dict` objects did not guarantee any particular key order; the ordering was implementation-dependent and could vary. (Since Python 3.7, plain `dict` objects also preserve insertion order.) The `OrderedDict` from the `collections` module is a data type in the Python standard library that acts as a dictionary but also retains the insertion order of its keys. If it is important to maintain a particular ordering for the keys (which may be useful when looping over the keys), an `OrderedDict` makes that ordering explicit.

In [None]:
from collections import OrderedDict

# Define a few key-value pairs as a list of tuples
key_value_pairs = [('broker','Roberto Cruz'), 
                   ('price',521.78),
                   ('shares',100), 
                   ('symbol','GOOG')]
plain_dict   = dict(key_value_pairs)
ordered_dict = OrderedDict(key_value_pairs)

In [None]:
print(list(plain_dict.keys()))  # Order not guaranteed before Python 3.7
print(plain_dict)

In [None]:
print(list(ordered_dict.keys()))   # Keys in specific order of insertion
print(ordered_dict)

In [None]:
ordered_dict['symbol']

In [None]:
ordered_dict['location'] = "Mountain View"

In [None]:
ordered_dict

In [None]:
del ordered_dict['broker']
ordered_dict
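
An `OrderedDict` also treats order as part of its identity and offers methods for reordering. A minimal sketch, using a shortened hypothetical list of pairs:

```python
from collections import OrderedDict

pairs = [('broker', 'Roberto Cruz'), ('price', 521.78), ('shares', 100)]
first = OrderedDict(pairs)
second = OrderedDict(reversed(pairs))

# Plain dict equality ignores order; OrderedDict equality does not.
same_as_dicts = dict(first) == dict(second)
same_as_ordered = first == second
print(same_as_dicts, same_as_ordered)

# move_to_end repositions an existing key (to the front with last=False).
first.move_to_end('broker')
first.move_to_end('shares', last=False)
order_after_moves = list(first)
print(order_after_moves)

# popitem removes and returns the last pair (or the first with last=False).
last_pair = first.popitem()
print(last_pair)
```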

### `collections.Counter`

In [None]:
from collections import Counter
c = Counter('abracadabra')
c.most_common()

In [None]:
sorted(c)

In [None]:
c['r']

In [None]:
c['r'] += 4
c['r']

In [None]:
c['x']

In [None]:
c.update("abracadabra")
c

In [None]:
from random import randint
nums = [randint(1,9) for _ in range(100)]
numcount = Counter(nums)
numcount.most_common()

In [None]:
numcount

In [None]:
numcount.most_common(3)

In [None]:
numcount.subtract([7,7,7,7])
numcount.most_common()

In [None]:
numcount[8] -= 2
numcount.most_common()
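
Since a `Counter` behaves like a multiset, two counters can be combined with arithmetic and set-style operators. A small sketch with hypothetical counters `left` and `right`:

```python
from collections import Counter

left = Counter('aab')     # counts: a=2, b=1
right = Counter('abc')    # counts: a=1, b=1, c=1

total = left + right          # add counts
difference = left - right     # subtract counts, dropping results <= 0
common = left & right         # intersection: minimum of each count
either = left | right         # union: maximum of each count
print(total, difference, common, either)

# elements() expands the bag back into its individual items
expanded = sorted(left.elements())
print(expanded)
```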

## `decimal` module

The `decimal` module provides fast decimal (i.e., base-10) floating-point arithmetic (by contrast with base-2 arithmetic that is carried out using `float` types).

* `decimal` incorporates a base-10 floating-point model (like the one we learn at school).
* The `decimal.getcontext()` function returns the current arithmetic context; setting its `prec` attribute lets users extend the precision of decimal arithmetic.
* Generally, extended precision is slower than builtin `float` arithmetic. For large, data-intensive applications, exact decimal arithmetic carries a price.
* Decimal numbers that cannot be represented exactly using the `float` type can be expressed exactly using the `decimal.Decimal` type. For instance, with standard `float` values, `1.1 + 2.2` gives `3.3000000000000003` because the number $\frac{1}{10}=0.1$ requires an infinitely repeating binary bit pattern in its binary representation (hence the finite precision `float` computation necessarily makes minor rounding errors).
* However, not all rational numbers can be represented precisely as decimals (in fact, only a vanishingly small proportion of them can).  E.g., $\frac{1}{3}$ and $\frac{2}{7}$ are not expressible exactly as decimals.
* Exact (finite) decimal representations can be constructed from strings, e.g., `decimal.Decimal('0.1')`. When computing with these values, typical decimal arithmetic is recovered (within the limits of the precision of the current context).
* More information is available at
    * [`decimal` module documentation](https://docs.python.org/3/library/decimal.html)
    * [IBM General Decimal Arithmetic Specification](http://speleotrove.com/decimal/decarith.html) (Version 1.70, Apr 2009)
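
A short sketch of the points above. It compares float and `Decimal` sums, then uses `decimal.localcontext()` to change the precision temporarily; the precision of 5 digits is an arbitrary choice for illustration.

```python
import decimal as dec
from decimal import Decimal as D

# Binary floats versus decimals constructed from strings
float_sum_exact = (1.1 + 2.2 == 3.3)
decimal_sum_exact = (D('1.1') + D('2.2') == D('3.3'))
print(float_sum_exact, decimal_sum_exact)

# localcontext() provides a temporary copy of the current context, so a
# precision change inside the block does not leak out of it.
outer_prec = dec.getcontext().prec
with dec.localcontext() as ctx:
    ctx.prec = 5
    third = D(1) / D(3)      # rounded to 5 significant digits
print(third)
print(dec.getcontext().prec == outer_prec)
```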

In [None]:
import decimal as dec
# Find out "context" of decimal arithmetic at present
print(dec.getcontext())

In [None]:
# Decimal numbers 
from decimal import Decimal as D
sum_float = 0.1 + 0.1 + 0.1 - 0.3
sum_dec = D('0.1') + D('0.1') + D('0.1') - D('0.3')
print('The sum 0.1 + 0.1 + 0.1 - 0.3 is %8.2e with regular floats.' % sum_float)
print('The sum 0.1 + 0.1 + 0.1 - 0.3 is %s with decimals.' % sum_dec)

In [None]:
# Notice the difference:
print(D(.4))    # 'Decimal(0.4)' converts inexact float value to a "nearby" Decimal
print(D(".4"))  # 'Decimal("0.4")' converts a string value to an exact Decimal

In [None]:
dec.getcontext().prec = 16
print('The current context is %d digits of precision in decimal arithmetic.'
       % dec.getcontext().prec)
one, seven = dec.Decimal(1), dec.Decimal(7)
print("In the current context, 1/7 is %s" % (one/seven))

In [None]:
dec.getcontext().prec = 50
print('The current context is %d digits of precision in decimal arithmetic.'
       % dec.getcontext().prec)
print("In the current context, 1/7 is %s" % (one/seven))

In [None]:
# Decimal numbers, like floats, have their own rounding errors
one/seven * 7

In [None]:
one/seven * seven

In [None]:
context = dec.getcontext()
context?

## `fractions` module

The `fractions` module provides support for rational number arithmetic, i.e., exact arithmetic involving ratios of integers.
* `fractions.Fraction` values can be constructed from pairs of integers, strings, floats, `decimal.Decimal` types, and other `fractions.Fraction` values.
* Observe that `fractions.Fraction(1.1)` is not $11/10$ as we might expect. The `float` value for `1.1` is a binary (base-2) approximation of the value $1.1$, so the resulting `fractions.Fraction` value is a nearby approximation.
* If the denominator is zero, the `fractions.Fraction` constructor gives a `ZeroDivisionError`.
* Generally, exact arithmetic is much slower than builtin `float` arithmetic. For large, data-intensive applications, exact rational arithmetic carries a price.
* More information is at [`fractions` module documentation](https://docs.python.org/3.4/library/fractions.html).
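
A few of the points above can be sketched as follows; the particular values are chosen only for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# A Decimal built from a string is exact, so the resulting Fraction is too.
from_decimal = Fraction(Decimal('1.1'))
print(from_decimal)

# Fractions are reduced to lowest terms automatically.
reduced = Fraction(6, 8)
print(reduced)

# Exact arithmetic: one third times three is exactly one.
third = Fraction(1, 3)
exact = (third * 3 == 1)
print(exact)

# A zero denominator is rejected.
zero_denominator_failed = False
try:
    Fraction(1, 0)
except ZeroDivisionError as err:
    zero_denominator_failed = True
    print("ZeroDivisionError:", err)
```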

In [62]:
from fractions import Fraction
value = Fraction(1.1)
print('The value of 1.1 as a fraction (cast from a float) is %s.' % value)

The value of 1.1 as a fraction (cast from a float) is 2476979795053773/2251799813685248.


In [63]:
value = Fraction(11,10)
print('The value of 1.1 as a fraction (cast from integers) is %s.' % value)
value = Fraction("1.1")
print('The value of 1.1 as a fraction (cast from a string) is %s.' % value)

The value of 1.1 as a fraction (cast from integers) is 11/10.
The value of 1.1 as a fraction (cast from a string) is 11/10.


In [64]:
a = Fraction(0.4)
print('The value of 0.4 as a fraction (cast from a float) is %s.' % a)
print('The value of 0.4 as a fraction (cast from a float, limiting the denominator) is %s' \
      % a.limit_denominator(10000000))

The value of 0.4 as a fraction (cast from a float) is 3602879701896397/9007199254740992.
The value of 0.4 as a fraction (cast from a float, limiting the denominator) is 2/5


In [66]:
# Exactly how far to limit denominator is not self-evident
a.limit_denominator(4)

Fraction(1, 3)

In [None]:
# Exact rational arithmetic
x = Fraction(3,4)
y = Fraction(1,3)
print('The value of %s + %s is %s.' % (x, y, x+y))

In [None]:
x.numerator, x.denominator