### Built-in data types
There are three Python data type categories:
- numeric
- sequence
- mapping

There is also the `None` object, which represents a null value, i.e. the absence of a value. Other objects such as classes, files, and exceptions can also properly be considered *types*; however, they are not covered here.

Every value in Python has a data type. Unlike many programming languages, in Python you do not need to explicitly declare the type of a variable. Python keeps track of object types internally.

| Category | Name | Description |
| :------- | :--- | :---------- |
| None | `None` | It is a null object. |
| Numeric | `int` | This is an integer data type. |
| Numeric | `float` | This data type can store a floating-point number. |
| Numeric | `complex` | It stores a complex number. |
| Numeric | `bool` | It is the Boolean type, with the values `True` and `False`. |
| Sequences | `str` | It is used to store a string of characters. |
| Sequences | `list` | It can store a list of arbitrary objects. |
| Sequences | `tuple` | It can store a group of arbitrary items. |
| Sequences | `range` | It is used to create a range of integers. |
| Mapping | `dict` | It is a dictionary data type that stores data in *key, value* pairs. |
| Mapping | `set` | It is a mutable and unordered collection of unique items. |
| Mapping | `frozenset` | It is an immutable set. |

### None type
`None` is an immutable object. It is used to show the absence of a value; it is similar to `null` in many other programming languages, such as C and C++. Functions return `None` when there is nothing explicit to return. `None` also evaluates to `False` in Boolean expressions. `None` is often used as a default value for function arguments to detect whether a caller has passed a value or not.
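
As a minimal sketch of the default-argument idiom described above (the function names `append_item` and `log` and the parameter `bucket` are illustrative, not from the text), `None` can stand in for "no value was passed":

```python
def append_item(item, bucket=None):
    """Append item to bucket, creating a fresh list when none is passed."""
    # `is None` compares identity, the usual way to test for None
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

def log(message):
    """Print a message; there is no return statement."""
    print(message)

first = append_item(1)            # no bucket passed, so a new list is created
second = append_item(2)           # another new list, not shared with `first`
third = append_item(3, [1, 2])    # an existing list is extended
print(first, second, third)       # [1] [2] [1, 2, 3]

result = log("hello")             # a function without `return` gives back None
print(result is None)             # True
print(bool(None))                 # False
```

Using `None` rather than `[]` as the default means each call gets its own list, because a default value is created only once, when the function is defined.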

### Numeric types
The numeric types are listed in the table above. Python provides the `int` data type, on which the standard arithmetic operators work as they do in other programming languages. The Boolean data type has two possible values, `True` and `False`, which are mapped to `1` and `0`, respectively.

In [1]:
a = 4; b = 5  # = operator assigns ints to the variable names
print(a, "is of type", type(a))

# See the division of two ints
c = b / a
print(c, "is of type", type(c))

4 is of type <class 'int'>
1.25 is of type <class 'float'>


The `a` and `b` variables are of the `int` type and `c` is of the `float` type. The division operator (`/`) always returns a `float` when applied to integers; if you wish to get an `int` after division, you can use the floor division operator (`//`), which discards any fractional part and returns the largest integer that is less than or equal to the exact quotient. For negative results this rounds toward negative infinity, not toward zero.

In [2]:
d = b // a
print(d, "is of type", type(d))

print(-7 / 5)
print(-7 // 5)

1 is of type <class 'int'>
-1.4
-2


Use the division operator carefully, as its behavior differs according to the Python version. In Python 2, dividing two integers with `/` returns an integer rather than a `float`.

The exponent operator `**` can be used to get the power of a number, and the modulus operator `%` returns the remainder of the division.

In [3]:
a = 7; b = 5
e = b ** a
m = a % b
print(e, m)

78125 2
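
Floor division and modulus are tied together by the identity `a == (a // b) * b + a % b`. The following sketch checks it for a few sign combinations using the built-in `divmod()`; the variable names `num`, `den`, and `results` are illustrative.

```python
results = []
for num, den in [(7, 5), (-7, 5), (7, -5)]:
    q, r = divmod(num, den)          # same as (num // den, num % den)
    assert num == q * den + r        # the identity holds for every sign combination
    results.append((num, den, q, r))
    print(f"{num} // {den} = {q}, {num} % {den} = {r}")
```

Note that the remainder takes the sign of the divisor, which follows from `//` rounding toward negative infinity.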


Complex numbers are represented by two floating-point numbers. They are written using the `j` suffix to signify the imaginary part of the complex number. We can access the real and imaginary parts with `f.real` and `f.imag`, respectively. Complex numbers are generally used for scientific computations. Python supports addition, subtraction, multiplication, power, conjugates, and so forth on complex numbers.

In [4]:
f = 3 + 5j
print(f, "is of type", type(f))
print("Real:", f.real, "Imaginary:", f.imag)
print(2 * f)
print(f + 3)
print(f - 1)

(3+5j) is of type <class 'complex'>
Real: 3.0 Imaginary: 5.0
(6+10j)
(6+5j)
(2+5j)
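
The conjugate and power operations mentioned above can be sketched as follows; `conj` is an illustrative name.

```python
f = 3 + 5j

conj = f.conjugate()     # flips the sign of the imaginary part
print(conj)              # (3-5j)
print(f * conj)          # (34+0j), i.e. 3**2 + 5**2
print(abs(f))            # magnitude of f, the square root of 34
print(f ** 2)            # squares the complex number
```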


In Python, Boolean types are represented using the truth values `True` and `False`, which behave like `1` and `0`. Boolean values can be combined with the logical operators `and`, `or`, and `not`. The built-in `bool` class converts any value to `True` or `False`:

In [5]:
print(bool(2))
print(bool(-2))
print(bool(0))

True
True
False


A Boolean operation on Boolean operands returns either `True` or `False`. Boolean operations are ordered by priority, so if more than one Boolean operation occurs in an expression, the operation with the highest priority is evaluated first. The following table outlines the three Boolean operators in **descending** order of priority:

| **Operator** | **Description** |
| :----------- | :---------- |
| `not x` | It returns `False` if `x` is `True`, and returns `True` if `x` is `False` |
| `x and y` | It returns `True` if `x` and `y` are both `True`; otherwise returns `False` |
| `x or y` | It returns `True` if `x` or `y` are `True`; otherwise returns `False` |

Python evaluates Boolean expressions lazily: it only evaluates an operand if it needs to. For example, if `x` is `True` when Python evaluates `x or y`, it returns `True` without evaluating `y`. This is known as short-circuit evaluation.
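
A small sketch makes the skipped evaluation visible. The helper `check` and the list `calls` are illustrative names; the helper records which operands Python actually evaluated.

```python
calls = []

def check(name, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(name)
    return value

result_or = check("x", True) or check("y", False)     # y is never evaluated
result_and = check("p", False) and check("q", True)   # q is never evaluated
print(result_or, result_and, calls)                   # True False ['x', 'p']

# `and` / `or` return one of their operands, not necessarily a bool
print(0 or "fallback")                                # fallback
print([] and "unused")                                # []
```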

The comparison operators (`<`, `<=`, `>`, `>=`, `==`, and `!=`) work with numbers, lists, and other collection objects and return `True` if the condition holds. For sequences such as lists and tuples, the ordering operators compare element by element (lexicographically), and the equivalence operator (`==`) returns `True` if both objects are of the same type, have the same length, and each pair of corresponding elements is equal.

In [6]:
see_boolean = (4 * 3 > 10) and (6 + 5 >= 11)
print(see_boolean)

if (see_boolean):
    print("Boolean expression returned True.")
else:
    print("Boolean expression returned False.")

True
Boolean expression returned True.
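
As a brief sketch of how comparison operators behave on sequences (the variable names are illustrative), the first pair of differing elements decides the ordering:

```python
differs_at_end = [1, 2, 3] < [1, 2, 4]     # decided by 3 < 4
differs_early = [1, 2, 3] < [1, 3]         # decided at index 1 (2 < 3), despite the longer left side
prefix_first = [1, 2] < [1, 2, 0]          # a proper prefix sorts before the longer sequence
same_tuple = (1, 2) == (1, 2)              # same type, same elements
list_vs_tuple = [1, 2] == (1, 2)           # different types are never equal

print(differs_at_end, differs_early, prefix_first, same_tuple, list_vs_tuple)
# True True True True False
```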


### Representation error
It should be noted that the native double precision representation of floating-point numbers leads to some unexpected results:

In [7]:
print(1 - 0.9)
print(1 - 0.9 == 0.1)

0.09999999999999998
False


This is a result of the fact that most decimal fractions are not exactly representable as a binary fraction, which is how most underlying hardware represents floating-point numbers. For algorithms or applications where this may be an issue, Python provides a decimal module. This module allows for the exact representation of decimal numbers and facilitates greater control of properties, such as rounding behavior, number of significant digits, and precision. It defines two objects, a `Decimal` type, representing decimal numbers, and a `Context` type, representing the various computational parameters such as precision, rounding, and error handling.

In [8]:
import decimal

x = decimal.Decimal(3.14)
y = decimal.Decimal(2.74)

print(x * y)

# Set the context precision to 4 significant digits (here, this rounds to the thousandths place)
decimal.getcontext().prec = 4
print(x * y)

8.603600000000001010036498883
8.604


Here we have modified the global context and set the precision to 4. A `Decimal` object can be treated pretty much as you would treat an `int` or a `float`. It is subject to the same mathematical operations and can be used as a dictionary key, placed in a set, and so on. In addition, `Decimal` objects have several methods for mathematical operations, such as natural exponents, `x.exp()`; natural logarithms, `x.ln()`; and base 10 logarithms, `x.log10()`.
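
One point worth illustrating is that a `Decimal` built from a float literal inherits the float's binary rounding error, whereas one built from a string is exact. The sketch below also uses `decimal.localcontext()` so that the precision change stays inside a `with` block instead of altering the global context; the variable names are illustrative.

```python
from decimal import Decimal, localcontext

# A float literal carries its binary rounding error into the Decimal
from_float_equal = Decimal(0.1) == Decimal("0.1")
print(from_float_equal)                       # False

with localcontext() as ctx:
    ctx.prec = 28                             # the usual default precision
    exact_sum = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
    full = Decimal("3.14") * Decimal("2.74")
print(exact_sum, full)                        # True 8.6036

with localcontext() as ctx:
    ctx.prec = 4                              # 4 significant digits, only inside this block
    rounded = Decimal("3.14") * Decimal("2.74")
print(rounded)                                # 8.604
```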

Python also has a `fractions` module that implements a rational number type. The following example shows several ways to create fractions:

In [9]:
import fractions
print(fractions.Fraction(3, 4))
print(fractions.Fraction(0.5))
print(fractions.Fraction("0.25"))

3/4
1/2
1/4


There is also the NumPy extension, a third-party package. It provides types for mathematical objects, such as arrays, vectors, and matrices, and capabilities for linear algebra, calculation of Fourier transforms, eigenvectors, logical operations, and much more.

### Membership, identity, and logical operations
Membership operators (`in` and `not in`) test whether a value occurs in a sequence, such as a list or string, and do what you would expect; `x in y` returns `True` if `x` is found in `y`. The `is` operator compares object identity.

In [10]:
x = [1, 2, 3]
y = [1, 2, 3]
print(x == y)  # Tests equivalence
print(x is y)  # Tests object identities of x and y. Returns False;
               # equal contents, but x and y refer to two different list objects
x = y
print(x is y)  # Now these variables reference the same object, so returns True

True
False
True


### Sequences
Sequences are ordered collections of objects indexed by non-negative integers. Sequences include `str`, `list`, `tuple`, and `range` objects. Lists and tuples are sequences of arbitrary objects, whereas strings are sequences of characters. The `str`, `tuple`, and `range` objects are immutable, whereas the `list` object is mutable. All sequence types have a number of operations in common. Note that, for the immutable types, an operation can only return a new value; it never changes the original object.

For all sequences, the indexing and slicing operators apply as described in the previous chapter. The `str` and `list` data types were discussed in detail in chapter 1. Here, we look at important functions and operations that are common to all sequence types:

| Method | Description |
| :----- | :---------- |
| `len(s)` | Returns the number of elements in `s`. |
| `min(s[, default=obj, key=func])` | Returns the minimum value in `s`, alphabetically for strings. |
| `max(s[, default=obj, key=func])` | Returns the maximum value in `s`, alphabetically for strings. |
| `sum(s[, start=0])` | Returns the sum of the elements; raises `TypeError` if the elements of `s` are not numeric. |
| `all(s)` | Returns `True` if all elements in `s` are truthy, that is, none is `0`, `False`, `None`, or empty. |
| `any(s)` | Checks whether any item in `s` is truthy. |

In addition, sequences support the following operations (`range` objects do not support concatenation or repetition):

| Operation | Description |
| :-------- | :---------- |
| `s + r` | Concatenates two sequences of the same type. |
| `s * n` | Makes `n` copies of `s`, where `n` is an integer. |
| `v1, v2, ..., vn = s` | Unpacks `n` variables from `s` to `v1`, `v2`, and so on. |
| `s[i]` | Indexing returns the i-th element of `s`. |
| `s[i:j:stride]` | Slicing returns elements between `i` and `j` with optional stride. |
| `x in s` | Returns `True` if the `x` element is in `s`. |
| `x not in s` | Returns `True` if the `x` element is not in `s`. |
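
To show that these functions and operations really are shared, the following sketch applies them to one `str`, one `tuple`, one `list`, and one `range`. The names `samples`, `reversed_samples`, and `range_error` are illustrative.

```python
samples = ["abc", (1, 2, 3), [1, 2, 3], range(1, 4)]

for s in samples:
    first, second, third = s     # unpacking works on any sequence of length 3
    print(type(s).__name__, len(s), min(s), max(s), s[0], s[-1], second in s)

# Slicing returns a new object of the same type as the original
reversed_samples = [s[::-1] for s in samples]
print(reversed_samples)

# Concatenation and repetition work for str, tuple, and list
print("ab" + "cd", (1, 2) * 2, [0] * 3)

# range is the exception: it does not support + or *
try:
    range(3) + range(3)
except TypeError as err:
    range_error = str(err)
    print("TypeError:", range_error)
```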

In [11]:
print(list())              # an empty list
list1 = [1, 2, 3, 4]
list1.append(1)            # append value 1 at the end of the list
print(list1)

# Make two copies of the list
list2 = list1 * 2
print(list2)

print(min(list1), max(list1))

# Insert a value, 2, into the list at index 0
list1.insert(0, 2)
print(list1)

# Reverse the order of the list
list1.reverse()
print(list1)

# Return the list to the normal order
list1.reverse()
print(list1)

# Remake list2
list2 = [11, 12]
list1.extend(list2)   # Adds 11 and 12 to list1

# Print the new list1
print(list1)

# Print the sum of list1
print(sum(list1))     # 36

# Print the length of list1
print(len(list1))

list1.sort()
print(list1)          # Should print the numeric list in ascending order

# Remove the first occurrence of 12; raises ValueError if the value is not in the list
list1.remove(12)

[]
[1, 2, 3, 4, 1]
[1, 2, 3, 4, 1, 1, 2, 3, 4, 1]
1 4
[2, 1, 2, 3, 4, 1]
[1, 4, 3, 2, 1, 2]
[2, 1, 2, 3, 4, 1]
[2, 1, 2, 3, 4, 1, 11, 12]
36
8
[1, 1, 2, 2, 3, 4, 11, 12]


### Learning about tuples
Tuples are immutable sequences of arbitrary objects. A tuple is a comma-separated sequence of values; however, it is common practice to enclose them in parentheses. Tuples are very useful when we want to set up multiple variables in one line, or to allow a function to return multiple values of different types. A tuple is an ordered sequence of items, similar to the `list` data type, and is indexed by integers starting at zero. Tuples are **hashable** as long as all of their elements are hashable, which means they can be used as keys to dictionaries; they can also be compared, so we can sort lists of them.

We can also create a tuple using the built-in function `tuple()`. With no argument, this creates an empty tuple. If the argument to `tuple()` is a sequence, then this creates a tuple of the elements of that sequence. It is important to remember the trailing comma when creating a tuple with one element; without it, the parentheses are treated as ordinary grouping, so `('a')` is just the string `'a'`. An important use of tuples is to allow us to assign more than one variable at a time by placing a tuple on the left-hand side of an assignment.

In [12]:
t = tuple()    # Creates an empty tuple
print(type(t))

t = ('a', )    # Creates a tuple with one element, a string
print(t, "is of type", type(t))

# A tuple with more than one element
tpl = ('a', 'b', 'c')
print(tpl)

print(tuple('sequence'))

x, y, z = tpl  # Multiple assignment, each variable is assigned the matching index of the tuple in order shown

print(x, y, z)
print('a' in tpl, 'z' in tpl)  # Returns True, False

<class 'tuple'>
('a',) is of type <class 'tuple'>
('a', 'b', 'c')
('s', 'e', 'q', 'u', 'e', 'n', 'c', 'e')
a b c
True False
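
The trailing-comma rule, returning several values from a function, and using tuples as dictionary keys can be sketched together. The names `min_max` and `grid` are hypothetical, chosen only for illustration.

```python
not_a_tuple = ('a')       # parentheses alone just group the expression
one_tuple = ('a',)        # the trailing comma is what makes the tuple
print(type(not_a_tuple).__name__, type(one_tuple).__name__)   # str tuple

def min_max(values):
    """Return two results at once; the comma builds a tuple."""
    return min(values), max(values)

result = min_max([4, 1, 7])
low, high = result        # unpack the returned tuple into two names
print(result, low, high)  # (1, 7) 1 7

# Tuples of hashable items can serve as dictionary keys
grid = {(0, 0): "origin", (1, 2): "point"}
print(grid[(1, 2)])       # point
```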


Most operators, such as those for slicing and indexing, work as they do on lists. However, because tuples are immutable, trying to modify an element of a tuple will give you `TypeError`. We can compare tuples in the same way that we compare other sequences, using the `==`, `>`, and `<` operators.

In [13]:
# Make a tuple() for example
tupl = 1, 2, 3, 4, 5 # parentheses are optional
print("Tuple value at index 1 is ", tupl[1])
print("Tuple[1:3] is ", tupl[1:3])

# Make new tuples
tupl2 = (11, 12, 13)
tupl3 = tupl + tupl2
print(tupl3) # Returns (1, 2, 3, 4, 5, 11, 12, 13)

# Print two copies of the first tuple
print(tupl * 2)
# Check if 5 is in the first tuple
print(5 in tupl)
# Print the last element of the first tuple using negative indexing
print(tupl[-1])
# Print the length of the tuple
print(len(tupl))
# Print the minimum and maximum values of the first tuple
print(min(tupl), max(tupl))
# Remember tuples are immutable, so this sort of assignment does not work
tupl[1] = 5

Tuple value at index 1 is  2
Tuple[1:3] is  (2, 3)
(1, 2, 3, 4, 5, 11, 12, 13)
(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
True
5
5
1 5


TypeError: 'tuple' object does not support item assignment

In [14]:
print(tupl == tupl2)   # False, the tuples are clearly different
print(tupl > tupl2)    # False

False
False


In [15]:
l = ['one', 'two']
x, y = l
print(x, y)
x, y = y, x # Swap the two values using tuple assignment
print(x, y)

one two
two one


### Beginning with dictionaries
In Python, the `dict` (dictionary) data type is one of the most popular and useful data types. A dictionary stores data as a mapping of key and value pairs. Dictionaries are collections of objects indexed by numbers, strings, or any other immutable (hashable) objects. Keys must be unique within a dictionary; the values, however, can be changed and need not be unique. Python dictionaries are the only built-in mapping type; they can be thought of as a mapping from a set of keys to a set of values. They are created using the `{key: value}` syntax. For example, the following code creates a dictionary that maps words to numerals using different methods:

In [17]:
a = {
    'Monday': 1,
    'Tuesday': 2,
    'Wednesday': 3
}
b = dict({
    'Monday': 1,
    'Tuesday': 2,
    'Wednesday': 3
})
c = dict(zip(['Monday', 'Tuesday', 'Wednesday'], [1, 2, 3]))
d = dict([('Monday', 1), ('Tuesday', 2), ('Wednesday', 3)])
print(a, b, c, d)
print(type(a), type(b), type(c), type(d))

{'Monday': 1, 'Tuesday': 2, 'Wednesday': 3} {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3} {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3} {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3}
<class 'dict'> <class 'dict'> <class 'dict'> <class 'dict'>


We can add keys and values. We can also update multiple values, and test for the occurrence of a key using the `in` operator, as shown in the following code example:

In [18]:
# Adding another key and value(s) to dictionary d
d['Thursday'] = 4         # Adds an item to d
d.update({
    'Friday': 5,
    'Saturday': 6
})

print(d)                  # Prints our updated dictionary
print('Wednesday' in d)   # Membership test, only available for keys
print(5 in d)             # False: `in` checks keys only; use `5 in d.values()` to test values

{'Monday': 1, 'Tuesday': 2, 'Wednesday': 3, 'Thursday': 4, 'Friday': 5, 'Saturday': 6}
True
False


Using the `in` operator to find an element in a list can take a long time if the list is long: the running time required to look up an element in a list increases linearly with the size of the list. In contrast, the `in` operator on dictionaries uses a hashing function, which makes dictionaries very efficient, as the time taken to look up an element is, on average, independent of the size of the dictionary.

Notice that the `{key: value}` pairs of a dictionary are not indexed by position. Since Python 3.7, dictionaries preserve insertion order when printed or iterated; earlier versions made no ordering guarantee. Either way, this is not a problem since we use specified keys to look up each dictionary value rather than an ordered sequence of integers, as is the case for strings and lists:

In [19]:
print(dict(zip('packt', range(5))))
# Assign this dictionary to a variable name
a = dict(zip('packt', range(5)))
# See the length of this dictionary (5)
print(len(a))
# Check to see the value of a key, 'c'
print(a['c'])
# Using pop does the same thing, but removes the key and corresponding values
print(a.pop('a'))
# Make a copy of a, and assign it to b
b = a.copy()
print(b)
# See the keys of a
print(a.keys())
# See the values in dictionary a
print(a.values())
# Prints the key: value pairs
print(a.items())
# Update the dictionary, adding the 'a' key: value pair back to the dictionary
a.update({'a': 1})
print(a)
# Change the value assigned to 'a'
a.update({'a': 22})
print(a)

{'p': 0, 'a': 1, 'c': 2, 'k': 3, 't': 4}
5
2
1
{'p': 0, 'c': 2, 'k': 3, 't': 4}
dict_keys(['p', 'c', 'k', 't'])
dict_values([0, 2, 3, 4])
dict_items([('p', 0), ('c', 2), ('k', 3), ('t', 4)])
{'p': 0, 'c': 2, 'k': 3, 't': 4, 'a': 1}
{'p': 0, 'c': 2, 'k': 3, 't': 4, 'a': 22}


| Method | Description |
| :----- | :---------- |
| `len(d)` | Returns total number of items in the dictionary, `d`. |
| `d.clear()` | Removes all of the items from the dictionary, `d`. |
| `d.copy()` | Returns a shallow copy of the dictionary, `d`. |
| `d.fromkeys(s[, value])` | Returns a new dictionary with keys from the `s` sequence and <br> values set to `value`. |
| `d.get(k[, v])` | Returns `d[k]` if it is found; otherwise it returns `v` (`None` if `v` is <br> not given). |
| `d.items()` | Returns all of the `key: value` pairs of the dictionary, `d`. |
| `d.keys()` | Returns all of the keys defined in the dictionary, `d`. |
| `d.pop(k[, default])` | Returns `d[k]` and removes it from `d`; returns `default` if `k` is not found and a default is given. |
| `d.popitem()` | Removes a `key: value` pair from the dictionary, `d`, <br> and returns it as a tuple (the most recently inserted pair since Python 3.7; an arbitrary pair before). |
| `d.setdefault(k[,v])` | Returns `d[k]`. If it is not found, it returns `v` and sets `d[k]` to `v`. |
| `d.update(b)` | Adds all of the objects from the `b` dictionary to the `d` dictionary, overwriting the values of existing keys. |
| `d.values()` | Returns all of the values in the dictionary, `d`. |
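
A few of the less obvious methods in the table can be sketched as follows. The dictionary contents and variable names are illustrative, and the `popitem()` result assumes Python 3.7 or later, where the most recently inserted pair is removed.

```python
d = {'p': 0, 'c': 2}

got = d.get('z', -1)                 # missing key: returns the fallback, no KeyError
print(d.get('c'), d.get('z'), got)   # 2 None -1

added = d.setdefault('k', 3)         # key absent: stores 3 and returns it
kept = d.setdefault('p', 99)         # key present: existing value is left alone
print(added, kept, d)                # 3 0 {'p': 0, 'c': 2, 'k': 3}

missing = d.pop('z', 'missing')      # the default avoids a KeyError
last = d.popitem()                   # removes the most recently inserted pair
print(missing, last, d)              # missing ('k', 3) {'p': 0, 'c': 2}

fresh = dict.fromkeys('ab', 0)       # one key per character, all set to 0
print(fresh)                         # {'a': 0, 'b': 0}
```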

### The `in` operator and time complexity
It should be noted that the `in` operator, when applied to dictionaries, works in a slightly different way to when it is applied to a list. When we use the `in` operator on a list, the relationship between the time it takes to find an element and the size of the list is linear. That is, as the list gets bigger, the time it takes to find an element grows, at most, linearly. The relationship between the time an algorithm takes to run and the size of its input is often referred to as its time complexity. We will talk more about this important topic in the next few chapters.

In contrast to the `list` object, when the `in` operator is applied to dictionaries, it uses a hashing algorithm, and this has the effect of making each lookup time almost independent of the size of the dictionary. This makes dictionaries extremely useful for looking up items in large collections of data. Rates of growth and hashing are discussed in chapter 4.

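
A rough way to observe the difference is to time the same membership test on a list and on a dictionary with the standard `timeit` module. This is only a sketch: the sizes and repeat counts are arbitrary assumptions, and the measured times will vary from machine to machine.

```python
import timeit

n = 50_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)     # same items, stored as dictionary keys

target = n - 1                       # worst case for the list: the last element

# Each call repeats the membership test 50 times and returns the total seconds
list_time = timeit.timeit(lambda: target in as_list, number=50)
dict_time = timeit.timeit(lambda: target in as_dict, number=50)

print(f"list: {list_time:.6f} s, dict: {dict_time:.6f} s")
```

Increasing `n` should make the list timing grow roughly in proportion, while the dictionary timing stays about the same.
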
### Sorting dictionaries
If we want to do a simple sort on either the keys or values of a dictionary, we can do this:

In [20]:
d = {
    'one': 1,
    'two': 2,
    'three': 3,
    'four': 4,
    'five': 5,
    'six': 6
}

d_srtd = sorted(list(d))        # Returns the keys sorted in ascending alphabetical order (they are strings)
print(d_srtd)
print(sorted(list(d.values()))) # Returns the sorted values from the dictionary

['five', 'four', 'one', 'six', 'three', 'two']
[1, 2, 3, 4, 5, 6]


Note that the first line in the preceding code sorts the keys alphabetically and the second line sorts the values in order of the integer value.

The `sorted()` function has two optional arguments that are of interest: `key` and `reverse`. The `key` argument has nothing to do with the dictionary keys, but rather is a way of passing a function to the sort algorithm to determine the sort order. For example, in the following code, we use the `__getitem__` special method to sort the dictionary keys according to their dictionary values.

In [21]:
sorted(list(d), key = d.__getitem__)

['one', 'two', 'three', 'four', 'five', 'six']

Essentially, what the preceding code is doing is, for every key in `d`, it uses the corresponding value to sort. We can also sort the values according to the sorted order of the dictionary keys. However, since dictionaries do not have a method to return a key by using its value, the equivalent of the `list.index` method for lists, using the optional key argument to do this is a little tricky. An alternative approach is to use a list comprehension:

In [22]:
[value for (key, value) in sorted(d.items())]

[5, 4, 1, 6, 3, 2]
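
Another way to relate keys and values while sorting, sketched here as an alternative rather than as the approach used above, is to sort the `(key, value)` pairs directly with `operator.itemgetter`. The variable names are illustrative, and rebuilding an ordered dictionary assumes Python 3.7 or later.

```python
from operator import itemgetter

d = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6}

# Sort the (key, value) pairs on the value, which is item 1 of each pair
by_value = sorted(d.items(), key=itemgetter(1))
print(by_value)

# Rebuild a dictionary whose insertion order follows the values, largest first
descending = dict(sorted(d.items(), key=itemgetter(1), reverse=True))
print(list(descending))      # ['six', 'five', 'four', 'three', 'two', 'one']

# Reverse lookup: collect the keys that map to a given value
keys_for_three = [key for key, value in d.items() if value == 3]
print(keys_for_three)        # ['three']
```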

The `sorted()` function also has an optional `reverse` argument, and unsurprisingly this does exactly what it says, reversing the order of the sorted list:

In [23]:
sorted(list(d), key = d.__getitem__, reverse = True)

['six', 'five', 'four', 'three', 'two', 'one']

We want to order this dictionary of English to French numbers in correct numerical order by the French numbers:

In [24]:
d2 = {
    'one': 'uno',
    'two': 'deux',
    'three': 'trois',
    'four': 'quatre',
    'five': 'cinq',
    'six': 'six'
}

Of course, when we print this dictionary out, it will be unlikely to print in the correct order. Because all keys and values are strings, we have no context for numerical ordering. To place these items in the correct order, we need to use the first dictionary we created, mapping words to numerals as a way to order our English to French dictionary:

In [25]:
sorted(d2, key = d.__getitem__)

['one', 'two', 'three', 'four', 'five', 'six']

Notice we are using the values of the first dictionary, `d`, to sort the keys of the second dictionary, `d2`. Since our keys in both dictionaries are the same, we can use a list comprehension to sort the values of the English to French dictionary:

In [26]:
[d2[i] for i in sorted(d2, key = d.__getitem__)]

['uno', 'deux', 'trois', 'quatre', 'cinq', 'six']

We can, of course, define our own custom function to use as the `key` argument to `sorted()`. For example, here we define a function that simply returns the last letter of a string:

In [28]:
def corder(string):
    return string[-1]   # Last character of the string

# We can use this function to sort each element by the last character
sorted(d2.values(), key = corder)

['quatre', 'uno', 'cinq', 'trois', 'deux', 'six']

### Dictionaries for text analysis
A common use of dictionaries is to count the occurrences of like items in a sequence; a typical example is counting the occurrences of words in a body of text. The following code creates a dictionary where each word in the text is used as a key and the number of occurrences as its value. This uses a very common idiom of nested loops. Here we are using it to traverse the lines in a file in an outer loop and the words of each line in the inner loop:

In [29]:
def wordcount(fname):
    try:
        fhand = open(fname)   # Open the file for reading
    except OSError:           # Catch only file-related errors
        print("File cannot be opened.")
        return {}             # Nothing to count; avoids exiting the interpreter

    count = dict()
    with fhand:               # Close the file once all lines are read
        for line in fhand:
            # Outer loop reads the file line by line and splits each line into words
            words = line.split()
            for word in words:
                # Create a new key in the dictionary when a new word is found
                if word not in count:
                    count[word] = 1
                # Add one to the count if the word has already been seen
                else:
                    count[word] += 1

    return count

In [33]:
count = wordcount("alice.txt")
# Only keep words with fewer than 20 and more than 16 occurrences (that is, 17 to 19)
filtered = {
    key:value for key, value in count.items() if value < 20 and value > 16
}
print(filtered)

{'once': 18, 'eyes': 18, 'There': 19, 'this,': 17, 'before': 19, 'take': 18, 'tried': 18, 'even': 17, 'things': 19, 'sort': 17, 'her,': 18, '`And': 17, 'sat': 17, '`But': 19, "it,'": 18, 'cried': 18, '`Oh,': 19, 'and,': 19, "`I'm": 19, 'voice': 17, 'being': 19, 'till': 19, 'Mouse': 17, '`but': 19, 'Queen,': 17}


Note the **dictionary comprehension** used to construct the filtered dictionary. Dictionary comprehensions work in the same way as the list comprehensions we looked at in chapter 1.

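
The same counting pattern is available ready-made in `collections.Counter`, a dictionary subclass from the standard library. The sketch below is an alternative to the hand-written loop, not a replacement prescribed by the text; it counts words in a short made-up string instead of a file so that it runs on its own.

```python
from collections import Counter

# A small stand-in for the contents of a text file
text = """the quick brown fox
jumps over the lazy dog
the end"""

count = Counter()
for line in text.splitlines():
    count.update(line.split())       # adds one for every word on the line

print(count["the"], count["fox"], count["missing"])   # 3 1 0
print(count.most_common(1))                            # [('the', 3)]

# The dictionary comprehension filter works on a Counter as well
filtered = {word: n for word, n in count.items() if n > 1}
print(filtered)                                        # {'the': 3}
```

A `Counter` returns `0` for a word it has never seen, so no membership test is needed before incrementing.
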
### Sets
Sets are unordered collections of unique items. Sets are themselves mutable, so we can add and remove items from them; however, the items themselves must be immutable (hashable). An important distinction with sets is that they cannot contain duplicate items. Sets are typically used to perform mathematical operations such as intersection, union, difference, and complement.

Unlike sequence types, set types do not provide any indexing or slicing operations. There are two types of set objects in Python, the mutable `set` object and the immutable `frozenset` object. Sets are created using comma-separated values within curly braces. Note that we cannot create an empty set using `a = {}`, because this will create a dictionary. To create an empty set, we write either `a = set()` or `a = frozenset()`.

| Method | Description |
| :----- | :---------- |
| `len(a)` | Provides the total number of elements in set `a` |
| `a.copy()` | Provides another copy of set `a` |
| `a.difference(t)` | Provides a set of elements that are in the set `a` but not in `t` |
| `a.intersection(t)` | Provides a set of elements that are in both sets, `a` and `t` |
| `a.isdisjoint(t)` | Returns `True` if no element is common in both the sets, `a` and `t` |
| `a.issubset(t)` | Returns `True` if all of the elements of the set `a` are also in <br> the `t` set. |
| `a.issuperset(t)` | Returns `True` if all of the elements of the set `t` are also in the set `a`. |
| `a.symmetric_difference(t)` | Returns a set of elements that are in either set `a` or `t`, <br> but not in both. |
| `a.union(t)` | Returns a set of elements that are in either set `a` or `t`. |

In the preceding table, the `t` parameter can be any Python object that supports iteration, and all methods are available to both `set` and `frozenset` objects. It is important to be aware that the operator versions of these methods require their arguments to be sets, whereas the methods themselves can accept any iterable type. For example, `s - [1, 2, 3]`, for any set `s`, will raise a `TypeError` for unsupported operand types, whereas the equivalent `s.difference([1, 2, 3])` will return a result. Mutable `set` objects have additional methods, described in the following table:

| Method | Description |
| :----- | :---------- |
| `s.add(item)` | Adds an item to `s`; nothing happens if the item is <br> already added. |
| `s.clear()` | Removes all elements from the set `s`. |
| `s.difference_update(t)` | Removes those elements from the set `s` that are <br> also in the other set, `t`. |
| `s.discard(item)` | Removes the item from the set, `s`, if it is present; does nothing otherwise. |
| `s.intersection_update(t)` | Removes the items from the set, `s`, which are not in <br> the intersection of the sets, `s` and `t`. |
| `s.pop()` | Removes an arbitrary item from the set, `s`, and <br> returns it. |
| `s.remove(item)` | Deletes the item from the set `s`; raises `KeyError` if the item is not present. |
| `s.symmetric_difference_update(t)` | Updates the set `s` to the symmetric difference of `s` and `t`: elements <br> common to both are removed and elements only in `t` are added. |
| `s.update(t)` | Adds all of the items in an iterable object, `t`, to <br> the `s` set. |
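
The difference between the method form and the operator form can be sketched as follows; the variable names are illustrative.

```python
s = {1, 2, 3, 4}

via_method = s.difference([1, 2])      # the method accepts any iterable
print(sorted(via_method))              # [3, 4]

try:
    s - [1, 2]                         # the operator insists on a set
except TypeError as err:
    operator_error = str(err)
    print("TypeError:", operator_error)

via_operator = s - set([1, 2])         # convert first, then use the operator
print(via_operator == via_method)      # True

s.difference_update([4])               # in-place method, also accepts an iterable
s.update("ab")                         # adds 'a' and 'b'
print(len(s))                          # 5
```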

In [34]:
# Show the addition, removal, discard, and clear operations
s1 = set()
s1.add(1); s1.add(2); s1.add(3); s1.add(4)
print(s1)
# Remove 4
s1.remove(4)
print(s1)
s1.discard(3)
print(s1)
s1.clear()
print(s1)

{1, 2, 3, 4}
{1, 2, 3}
{1, 2}
set()
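
The `remove()` and `discard()` methods differ only when the item is absent, which the cell above does not exercise. A minimal sketch, with illustrative names:

```python
s1 = {1, 2}

s1.discard(99)                # absent item: silently ignored
try:
    s1.remove(99)             # absent item: raises KeyError
    raised = False
except KeyError:
    raised = True

print(raised, s1)             # True {1, 2}
```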


In [35]:
s1 = {'ab', 3, 4, (5, 6)}
s2 = {'ab', 7, (7, 6)}
# s1 - s2 = s1.difference(s2)
print(s1 - s2, s1.difference(s2))
# Show the intersection of s1 and s2: {'ab'}
print(s1.intersection(s2))
# Show the union of s1 and s2: {'ab', 3, 4, (5, 6), 7, (7, 6)}
print(s1.union(s2))

{(5, 6), 3, 4} {(5, 6), 3, 4}
{'ab'}
{3, 4, (5, 6), 7, 'ab', (7, 6)}


Notice that the `set` object does not care that its members are not all of the same type, as long as they are all immutable. If you try to use a mutable object such as a list or dictionary in a set, you will receive an unhashable type error. Hashable types all have a hash value that does not change throughout the lifetime of the instance. All built-in immutable types are hashable. All built-in mutable types are not hashable, so they cannot be used as elements of sets or keys to dictionaries.

Notice also in the preceding code that when we print out the union of `s1` and `s2`, there is only one element with the value `ab`. This is a natural property of sets in that they do not include duplicates.
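
The hashability rule can be checked directly with the built-in `hash()` function. In this sketch the names `candidates` and `unhashable` are illustrative; note that a tuple is only hashable when everything inside it is hashable.

```python
# Equal immutable objects produce equal hash values
print(hash((1, 2)) == hash((1, 2)))      # True

candidates = [[1, 2], {'a': 1}, {1, 2}, (1, [2])]
unhashable = []
for candidate in candidates:
    try:
        hash(candidate)
    except TypeError as err:
        unhashable.append(type(candidate).__name__)
        print(type(candidate).__name__, "->", err)

print(unhashable)                        # ['list', 'dict', 'set', 'tuple']
```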

In addition to these built-in methods, there are a number of other operations that we can perform on sets. For example, to test for membership of a set, use the following:

In [36]:
print('ab' in s1)     # True
print('ab' not in s1) # False
# We can loop through the elements in the set as well
for element in s1: print(element)

True
False
ab
3
4
(5, 6)


### Immutable sets
Python has an immutable set type called `frozenset`. It works almost exactly like `set`, apart from not allowing methods or operations that change values such as `add()` or `clear()` methods. There are several ways that this immutability can be useful.

For example, since normal sets are mutable and therefore not hashable, they cannot be used as members of other sets. On the other hand, `frozenset` is immutable and can therefore be used as a member of a set:

In [37]:
# You cannot add a set to another set, you have to take its union at best
s1.add(s2)

TypeError: unhashable type: 'set'

In [38]:
s1.add(frozenset(s2))
print(s1)

{3, 4, (5, 6), 'ab', frozenset({'ab', (7, 6), 7})}


The immutable property of `frozenset` means we can use it for a key to a dictionary as in the following example:

In [39]:
fs1 = frozenset(s1)
fs2 = frozenset(s2)

print({fs1: 'fs1', fs2: 'fs2'})

{frozenset({3, 4, 'ab', (5, 6), frozenset({'ab', (7, 6), 7})}): 'fs1', frozenset({'ab', (7, 6), 7}): 'fs2'}


### Modules for data structures and algorithms
In addition to the built-in types, there are several Python modules that we can use to extend the built-in types and functions. In many cases, these Python modules may offer efficiency and programming advantages that allow us to simplify our code.

So far, we have looked at the built-in datatypes of strings, lists, sets, and dictionaries as well as the `decimal` and `fractions` modules. They are often described by the term **Abstract Data Types (ADTs)**. ADTs can be considered mathematical specifications for the set of operations that can be performed on data. They are defined by their behavior rather than their implementation. In addition to the ADTs that we have looked at, there are several Python libraries that provide extensions to the built-in datatypes. These will be discussed in the following section.

### Collections
The `collections` module provides more specialized, high-performance alternatives for the built-in data types as well as a utility function to create named tuples. The following table lists the datatypes and operations of the collections module and their descriptions:

| Datatype or operation | Description |
| :-------------------- | :---------- |
| `namedtuple()` | Creates tuple subclasses with named fields. |
| `deque` | Lists with fast appends and pops at either end. |
| `ChainMap` | Dictionary-like class to create a single view of multiple <br> mappings. |
| `Counter` | Dictionary subclass for counting hashable objects. |
| `OrderedDict` | Dictionary subclass that remembers the entry order. |
| `defaultdict` | Dictionary subclass that calls a function to supply missing <br> values. |
| `UserDict`, `UserList`, `UserString` | These three data types are simply wrappers for their <br> underlying base classes. Their use has largely been supplanted <br> by the ability to subclass their respective base classes directly. <br> Can be used to access the underlying object as an attribute. |

### Deques
Double-ended queues, or deques (usually pronounced *decks*), are list-like objects that support thread-safe, memory-efficient appends. Deques are mutable and support some of the operations of lists, such as indexing. Deques can be assigned by index, for example, `dq[1] = z`; however, we cannot directly slice deques. For example, `dq[1:2]` results in a `TypeError`.

The major advantage of deques over lists is that inserting items at the beginning of a deque is much faster than inserting items at the beginning of a list, although inserting items at the end of a deque is very slightly slower than the equivalent operation on a list. Deques are thread-safe and can be serialized using the `pickle` module.

A useful way of thinking about deques is in terms of populating and consuming items. Items in deques are usually populated and consumed sequentially from either end.

In [42]:
from collections import deque

dq = deque('abc')     # Creates deque(['a', 'b', 'c'])
dq.append('d')        # Adds the value d to the right/"end" of the deque
dq.appendleft('z')    # Adds the value z to the left/"beginning" of the deque
dq.extend("efg")      # Adds multiple items to the right/"end" of the deque
dq.extendleft("yxw")  # Adds multiple items to the left/"beginning" of the deque

print(dq)

deque(['w', 'x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g'])


We can use the `pop()` and `popleft()` methods for consuming items in the deque:

In [43]:
print(dq.pop())       # Returns and removes an item from the right
print(dq.popleft())   # Returns and removes an item from the left
print(dq)

g
w
deque(['x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f'])


We can also use the `rotate(n)` method to rotate all items `n` steps to the right when `n` is a positive integer, or `abs(n)` steps to the left when `n` is negative. Items that move past one end wrap around to the other:

In [44]:
dq.rotate(2)  # Rotates two places to the right: e and f wrap around to the front
print(dq)
dq.rotate(-2) # Rotates two places to the left, restoring the original order
print(dq)

deque(['e', 'f', 'x', 'y', 'z', 'a', 'b', 'c', 'd'])
deque(['x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f'])


Note that we can use the `rotate` and `pop` methods to delete selected elements. Also worth knowing is a simple way to return a slice of a deque, as a list, which can be done as follows:

In [45]:
import itertools

# See the current deque
print(dq)

# Here is how we take the elements at indices 3 to 8 from our deque, as a list
list(itertools.islice(dq, 3, 9))

deque(['x', 'y', 'z', 'a', 'b', 'c', 'd', 'e', 'f'])


['a', 'b', 'c', 'd', 'e', 'f']

The `itertools.islice()` function works in a similar way to slicing a list, except that rather than taking a list as an argument, it takes any iterable together with start and stop indices, and returns an iterator over the selected values. Here we pass that iterator to `list()` to obtain a list.
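
The earlier remark that `rotate` and `pop` can be combined to delete a selected element can be sketched as follows. The `delete_at` helper is a hypothetical name introduced only for this illustration; it rotates the target to the left end, pops it, and rotates back.

```python
from collections import deque

def delete_at(dq, index):
    """Remove and return the item at `index` using only rotate and popleft.

    Hypothetical helper for illustration; assumes 0 <= index < len(dq).
    """
    dq.rotate(-index)     # bring the target item to the left end
    item = dq.popleft()   # remove it
    dq.rotate(index)      # restore the order of the remaining items
    return item

letters = deque('xyzabcdef')
removed = delete_at(letters, 3)
print(removed)   # a
print(letters)   # deque(['x', 'y', 'z', 'b', 'c', 'd', 'e', 'f'])
```

In everyday code, `del dq[i]` or `dq.remove(value)` achieves the same result more directly; the sketch above only shows how the rotation idiom works.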

A useful feature of deques is that they support an optional `maxlen` parameter that restricts the size of the deque. This makes them ideally suited to implementing a data structure known as a **circular buffer**. This is a fixed-size structure that is effectively connected end to end, and it is typically used for buffering data streams.

In [46]:
dq2 = deque([], maxlen = 3)

for i in range(6):
    dq2.append(i)
    print(dq2)

deque([0], maxlen=3)
deque([0, 1], maxlen=3)
deque([0, 1, 2], maxlen=3)
deque([1, 2, 3], maxlen=3)
deque([2, 3, 4], maxlen=3)
deque([3, 4, 5], maxlen=3)


In this example, we are populating the deque from the right and consuming from the left. Notice that once the buffer is full, the oldest values are discarded first and new values are added on the right. We will look at circular buffers in chapter 4, when implementing circular lists.
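
As one possible use of such a buffer, the sketch below computes a moving average over the most recent values of a stream. The function name `moving_average` and the sample data are assumptions made for this illustration only.

```python
from collections import deque

def moving_average(stream, window=3):
    """Return the running mean over the last `window` values of `stream`."""
    buf = deque(maxlen=window)   # the oldest value is dropped automatically when full
    averages = []
    for value in stream:
        buf.append(value)
        averages.append(sum(buf) / len(buf))
    return averages

result = moving_average([10, 20, 30, 40, 50], window=3)
print(result)   # [10.0, 15.0, 20.0, 30.0, 40.0]
```

Because the deque never grows beyond `maxlen`, memory use stays constant no matter how long the stream is.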
### ChainMap objects
The `collections.ChainMap` class was added in Python 3.3, and it provides a way to link a number of dictionaries, or other mappings, so that they can be treated as one object. In addition, there is a `maps` attribute, a `new_child()` method, and a `parents` property. The underlying mappings for `ChainMap` objects are stored in a list and are accessible using the `maps` attribute, where `maps[i]` retrieves the `i`th dictionary. Note that a `ChainMap` keeps its underlying mappings in an ordered list, and lookups search that list from first to last.

`ChainMap` is useful in applications where we are using a number of dictionaries containing related data. The consuming application expects data in terms of a priority: when the same key occurs in two dictionaries, the value from the dictionary nearer the beginning of the underlying list takes priority. `ChainMap` is typically used to simulate nested contexts, such as when we have multiple overriding configuration settings. The following example demonstrates a possible use case for `ChainMap`.

In [48]:
import collections

dict1 = {
    'a': 1,
    'b': 2,
    'c': 3
}

dict2 = {
    'd': 4,
    'e': 5
}

chainmap = collections.ChainMap(dict1, dict2)   # Linking two dictionaries
print(chainmap)
print(chainmap.maps)
print(chainmap.values())

# Accessing values
print(chainmap['b'], chainmap['e'])

ChainMap({'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5})
[{'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5}]
ValuesView(ChainMap({'a': 1, 'b': 2, 'c': 3}, {'d': 4, 'e': 5}))
2 5


The advantage of using `ChainMap` objects, rather than just a dictionary, is that we retain previously set values. Adding a child context overrides values for the same key, but it does not remove it from the data structure. This can be useful when we may need to keep a record of changes so that we can easily roll back to a previous setting.

We can retrieve and change any value in any of the dictionaries by indexing the `maps` attribute with an appropriate index. This index represents a dictionary in `ChainMap`. Also, we can retrieve the parent settings, that is, the default settings, by using the `parents` property:

In [58]:
from collections import ChainMap
defaults = {
    'theme': 'Default',
    'language': 'eng',
    'showIndex': True,
    'showFooter': True
}

cm = ChainMap(defaults)         # Creates a ChainMap with defaults
print(cm.maps)
print(cm.values())

# Make a new ChainMap child that overrides the parent
cm2 = cm.new_child({
    'theme': 'bluesky'
})

print(cm2['theme'])             # Returns the overridden theme 'bluesky'
print(cm2.pop('theme'))         # Removes the child theme value
print(cm2['theme'])             # Goes back to the default values
print(cm2.maps)
print(cm2.parents)

[{'theme': 'Default', 'language': 'eng', 'showIndex': True, 'showFooter': True}]
ValuesView(ChainMap({'theme': 'Default', 'language': 'eng', 'showIndex': True, 'showFooter': True}))
bluesky
bluesky
Default
[{}, {'theme': 'Default', 'language': 'eng', 'showIndex': True, 'showFooter': True}]
ChainMap({'theme': 'Default', 'language': 'eng', 'showIndex': True, 'showFooter': True})
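
The lookup priority and the role of `maps` described above can be sketched with a two-layer configuration. The dictionary names `user` and `defaults` and their contents are assumptions for this illustration; the point is that lookups fall through the list of mappings, while writes through the `ChainMap` itself only touch the first mapping.

```python
from collections import ChainMap

defaults = {'theme': 'Default', 'language': 'eng'}
user = {'theme': 'bluesky'}
settings = ChainMap(user, defaults)   # user takes priority over defaults

print(settings['theme'])      # bluesky (found in the first mapping)
print(settings['language'])   # eng (falls through to defaults)

settings['language'] = 'fra'  # writes go to the first mapping only
print(user)                   # {'theme': 'bluesky', 'language': 'fra'}
print(defaults)               # {'theme': 'Default', 'language': 'eng'}

settings.maps[1]['theme'] = 'Plain'   # edit a specific mapping through maps
print(settings.parents['theme'])      # Plain
```

Since the defaults are never overwritten by ordinary assignments, removing the user layer is enough to roll back to the previous settings.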


### Counter objects
`Counter` is a subclass of `dict` where each key is a hashable object and the associated value is an integer count of that object. There are three ways to initialize a counter. We can pass it any sequence (or other iterable), a dictionary of `key: value` pairs, or keyword arguments of the form `key=value, ...`, as in the following example:

In [61]:
from collections import Counter

# Example of a counter object for a sequence
Counter('anysequence')
# Making multiple counter objects
c1 = Counter('anysequence')
c2 = Counter({'a': 1, 'c': 1, 'e': 3})
c3 = Counter(a = 1, c = 1, e = 3)
print(c1, c2, c3)

Counter({'e': 3, 'n': 2, 'a': 1, 'y': 1, 's': 1, 'q': 1, 'u': 1, 'c': 1}) Counter({'e': 3, 'a': 1, 'c': 1}) Counter({'e': 3, 'a': 1, 'c': 1})


We can also create an empty counter object and populate it by passing an iterable or a dictionary to its `update()` method. Notice how the `update()` method adds the counts rather than replacing them with new values. Once the counter is populated, we can access stored values in the same way we would do for dictionaries, as in the following example:

In [62]:
ct = Counter()
ct.update('abca')     # Populates the object
print(ct)
ct.update({'a': 3})   # Adds 3 to the count of As
print(ct)

for item in ct:
    print('%s: %d' % (item, ct[item]))

Counter({'a': 2, 'b': 1, 'c': 1})
Counter({'a': 5, 'b': 1, 'c': 1})
a: 5
b: 1
c: 1


The most notable difference between counter objects and dictionaries is that counter objects return a zero count for missing items rather than raising a `KeyError`. We can create an iterator out of a `Counter` object by using its `elements()` method. This returns an iterator that repeats each element as many times as its count; elements with counts below one are not included, and the order is not guaranteed. In the following code, we perform some updates, create an iterator from the `Counter` elements, and use `sorted()` to sort the elements alphabetically:

In [63]:
print(ct)      # Prints Counter({'a': 5, 'b': 1, 'c': 1})
print(ct['x']) # Prints 0
ct.update({'a': -3, 'b': -2, 'e': 2})
print(ct)      # Prints Counter({'a': 2, 'e': 2, 'c': 1, 'b': -1})
sorted(ct.elements())

Counter({'a': 5, 'b': 1, 'c': 1})
0
Counter({'a': 2, 'e': 2, 'c': 1, 'b': -1})


['a', 'a', 'c', 'e', 'e']

Two other `Counter` methods worth mentioning are `most_common()` and `subtract()`. The `most_common()` method takes an optional positive integer argument that determines the number of most common elements to return; if it is omitted, all elements are returned. Elements are returned as a list of (key, value) tuples, ordered from the most common to the least.

The `subtract()` method works exactly like `update()` except that, instead of adding values, it subtracts them, as in the following example:

In [64]:
print(ct.most_common())
ct.subtract({'e': 2})
print(ct)

[('a', 2), ('e', 2), ('c', 1), ('b', -1)]
Counter({'a': 2, 'c': 1, 'e': 0, 'b': -1})
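
The example above calls `most_common()` without an argument. As a small sketch of the integer argument, and of `subtract()` with an iterable rather than a dictionary, consider the following; the word list is sample data chosen for this illustration.

```python
from collections import Counter

words = "red blue green red yellow blue red green green red".split()
counts = Counter(words)

top_two = counts.most_common(2)   # only the two most common elements
print(top_two)                    # [('red', 4), ('green', 3)]

counts.subtract(['red', 'red', 'purple'])   # subtracts one per occurrence
print(counts['red'])      # 2
print(counts['purple'])   # -1 (counts may go below zero)
```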


### Ordered dictionaries
The important thing about ordered dictionaries is that they remember the insertion order, so when we iterate over them, they return values in the order they were inserted. Before Python 3.7, the iteration order of a normal dictionary was not guaranteed; since Python 3.7, normal dictionaries also preserve insertion order, but they still ignore it when compared. When we test whether two normal dictionaries are equal, the result depends only on their keys and values. With `OrderedDict`, the insertion order is also considered: an equality test between two `OrderedDict` objects with the same keys and values but a different insertion order returns `False`:

In [65]:
od1 = collections.OrderedDict()
od1['one'] = 1
od1['two'] = 2
od2 = collections.OrderedDict()
od2['two'] = 2
od2['one'] = 1

print(od1 == od2)      # False

False


Similarly, when we add values from a list using `update`, `OrderedDict` will retain the same order as the list. This is the order that is returned when we iterate over the values:

In [66]:
kvs = [('three', 3), ('four', 4), ('five', 5)]
od1.update(kvs)
print(od1)

# print out the key, value pairs
for k, v in od1.items(): print(k, v)

OrderedDict([('one', 1), ('two', 2), ('three', 3), ('four', 4), ('five', 5)])
one 1
two 2
three 3
four 4
five 5


`OrderedDict` is often used in conjunction with the built-in `sorted()` function to create a sorted dictionary. In the following example, we use a lambda function as the sort key; here the key is a numerical expression, `4 * v - v ** 2`, computed from each integer value `v`:

In [67]:
od3 = collections.OrderedDict(sorted(od1.items(), key = lambda t: (4 * t[1]) - t[1] ** 2))
print(od3)
print(od3.values())

OrderedDict([('five', 5), ('four', 4), ('one', 1), ('three', 3), ('two', 2)])
odict_values([5, 4, 1, 3, 2])
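
The sort key used above is deliberately unusual. As a plainer sketch of the same pattern, the following sorts by key and by value; the variable names are assumptions for this illustration. It also contrasts order-sensitive `OrderedDict` equality with ordinary `dict` equality.

```python
from collections import OrderedDict

od = OrderedDict([('one', 1), ('two', 2), ('three', 3), ('four', 4), ('five', 5)])

# Sort alphabetically by key
by_key = OrderedDict(sorted(od.items(), key=lambda t: t[0]))
print(list(by_key))          # ['five', 'four', 'one', 'three', 'two']

# Sort by value, largest first
by_value_desc = OrderedDict(sorted(od.items(), key=lambda t: t[1], reverse=True))
print(list(by_value_desc))   # ['five', 'four', 'three', 'two', 'one']

print(od == by_key)               # False: same items, different order
print(dict(od) == dict(by_key))   # True: plain dicts ignore order when compared
```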


### defaultdict
The `defaultdict` object is a subclass of `dict`, and therefore they share methods and operations. It acts as a convenient way to initialize dictionaries. With `dict`, Python will raise a `KeyError` when attempting to access a key that is not already in the dictionary. The `defaultdict` overrides one method, `__missing__(key)`, and adds a new instance variable, `default_factory`. Rather than raising an error, `defaultdict` calls the function supplied as the `default_factory` argument, with no arguments, to generate a value for the missing key. A simple use of `defaultdict` is to set `default_factory` to `int` and use it to quickly tally the counts of items in the dictionary:

In [68]:
dd = collections.defaultdict(int)
words = str.split("red blue green red yellow blue red green green red")

for word in words: dd[word] += 1
    
dd

defaultdict(int, {'red': 4, 'blue': 2, 'green': 3, 'yellow': 1})

You will notice that if we tried to do this with an ordinary dictionary, we would get a `KeyError` when we tried to increment the count for the first key. The `int` we supplied as an argument to the `defaultdict` is really the `int()` function, which returns zero when called with no arguments.

We can, of course, create a function that will determine the dictionary's values. For example, the following function returns `True` if the supplied argument is a primary color, that is `red`, `green`, or `blue`, or returns `False` otherwise:

In [69]:
def isprimary(c):
    if (c == 'red') or (c == 'blue') or (c == 'green'):
        return True
    else:
        return False
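
The cell above only defines `isprimary`. One possible way to combine it with `defaultdict` is sketched below. Note that `default_factory` is called with no arguments, so `isprimary` cannot itself be the factory; here we assume `bool` as the factory (so unknown colors default to `False`) and store the function's result explicitly. The `positions` example with `list` as the factory is an additional illustration, not part of the original example.

```python
from collections import defaultdict

def isprimary(c):
    """Return True if c is one of the primary colors red, green, or blue."""
    return c in ('red', 'green', 'blue')

words = "red blue green red yellow blue red green green red".split()

primary = defaultdict(bool)        # bool() returns False for missing keys
for word in words:
    primary[word] = isprimary(word)

print(dict(primary))       # {'red': True, 'blue': True, 'green': True, 'yellow': False}
print(primary['purple'])   # False (the key is created with the default value)

# Using list as the factory groups the positions of each word
positions = defaultdict(list)
for i, word in enumerate(words):
    positions[word].append(i)

print(positions['blue'])   # [1, 5]
```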

### Learning about named tuples
The `namedtuple()` function returns a tuple subclass whose fields are accessible by name as well as by the integer indices of normal tuples. This allows for code that is, to a certain extent, self-documenting and more readable. It can be especially useful in an application where there are a large number of tuples and we need to easily keep track of what each tuple represents. Furthermore, a named tuple inherits methods from `tuple` and is backward-compatible with it.

The field names are passed to the `namedtuple()` function as a string of comma and/or whitespace-separated values. They can also be passed as a sequence of strings. Each field name is a single string, and it can be any valid Python identifier that is not a keyword and does not begin with an underscore:

In [81]:
space = collections.namedtuple("space", "x y z")
s1 = space(x = 2., y = 4., z = 10)               # We can also use space(2., 4., 10)
print(s1)
print(s1.x * s1.y * s1.z)

space(x=2.0, y=4.0, z=10)
80.0


In addition to the inherited tuple methods, the named tuple also defines three methods of its own, `_make()`, `_asdict()`, and `_replace()`. These methods begin with an underscore to prevent potential conflicts with field names. The `_make()` method takes an iterable as an argument and turns it into a named tuple object:

In [82]:
s1 = [4, 5, 6]
print(space._make(s1))   # Builds a named tuple from the list

space(x=4, y=5, z=6)


The `_asdict()` method returns a dictionary that maps the field names to their values (a regular `dict` since Python 3.8, an `OrderedDict` in earlier versions). The `_replace()` method returns a new instance of the tuple, replacing the specified values. In addition, the `_fields` attribute is a tuple of strings listing the field names, and the `_field_defaults` attribute is a dictionary mapping field names to their default values.

In [87]:
space = collections.namedtuple("space", "x y z")
s1 = space(x = 2., y = 4., z = 10)
print(s1._asdict())
print(s1._fields)

{'x': 2.0, 'y': 4.0, 'z': 10}
('x', 'y', 'z')
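
The `_replace()` method and the field defaults are described above but not demonstrated. A minimal sketch follows, assuming Python 3.7 or later (where `namedtuple()` accepts a `defaults` argument); the class name `Space` and the default value are chosen for this illustration.

```python
from collections import namedtuple

# defaults apply to the rightmost fields, so here only z has a default
Space = namedtuple("Space", "x y z", defaults=[0.0])

p = Space(2.0, 4.0)
print(p)                       # Space(x=2.0, y=4.0, z=0.0)

q = p._replace(z=10)           # returns a new tuple; p is unchanged
print(q)                       # Space(x=2.0, y=4.0, z=10)
print(p)                       # Space(x=2.0, y=4.0, z=0.0)

print(Space._field_defaults)   # {'z': 0.0}
```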


### Arrays
The `array` module defines an array data type that is similar to the list data type, except for the constraint that its contents must all be of a single type. The type is chosen when the array is created, using one of the type codes below, and the size of its underlying representation is determined by the machine architecture or underlying C implementation:

| Code | C type | Python type | Minimum bytes |
| :--- | :----- | :---------- | :------------ |
| `b` | `signed char` | int | 1 |
| `B` | `unsigned char` | int | 1 |
| `u` | `Py_UNICODE` | Unicode character | 2 |
| `h` | `signed short` | int | 2 |
| `H` | `unsigned short` | int | 2 |
| `i` | `signed int` | int | 2 |
| `I` | `unsigned int` | int | 2 |
| `l` | `signed long` | int | 4 |
| `L` | `unsigned long` | int | 4 |
| `q` | `signed long long` | int | 8 |
| `Q` | `unsigned long long` | int | 8 |
| `f` | `float` | float | 4 |
| `d` | `double` | float | 8 |

Array objects support the following attributes and methods:

| Attribute or method | Description |
| :------------------ | :---------- |
| `a.itemsize` | The size of one array item in bytes. |
| `a.append(x)` | Appends element `x` at the end of array `a`. |
| `a.buffer_info()` | Returns a tuple containing the current memory location and length of <br> the buffer used to store the array `a`. |
| `a.byteswap()` | Swaps the byte order of each item in the array `a`. |
| `a.count(x)` | Returns the number of occurrences of `x` in the array `a`. |
| `a.extend(b)` | Appends all the elements from iterable `b` at the end of array `a`. |
| `a.frombytes(s)` | Appends elements from the bytes object `s`, interpreting it as an array of <br> machine values. |
| `a.fromfile(f, n)` | Reads `n` machine values from the file `f` and appends them at the end of <br> the array. |
| `a.fromlist(l)` | Appends all of the elements from the list `l` to the array `a`. |
| `a.fromunicode(s)` | Extends an array of the `u` (unicode) type with the Unicode string, `s`. |
| `a.index(x)` | Returns the first (smallest) index of the `x` element in `a`. |
| `a.insert(i, x)` | Inserts a new item with the value `x` into the array before index <br> position `i`. |
| `a.pop([i])` | Removes the item at index `i` from the array and returns it. If `i` is omitted, <br> the last item is removed. |
| `a.remove(x)` | Removes the first occurrence of `x` from the array. |
| `a.reverse()` | Reverses the order of items in the array `a`. |
| `a.tofile(f)` | Writes all the elements to the file object `f`. |
| `a.tolist()` | Converts the array into a list. |
| `a.tounicode()` | Converts an array of the `u` type into a Unicode string. |

Array objects support all of the normal sequence operations such as indexing, slicing, concatenation, and multiplication.
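
A brief sketch of these sequence operations, together with the single-type constraint, is shown below. The values are sample data for this illustration; the exact wording of the `TypeError` message varies between Python versions, so it is not reproduced here.

```python
from array import array

a = array('i', [1, 2, 3])
a.append(4)
a.extend([5, 6])

print(a[0])                  # 1 (indexing)
print(a[1:4])                # array('i', [2, 3, 4]) (slicing returns an array)
print(a + array('i', [7]))   # array('i', [1, 2, 3, 4, 5, 6, 7]) (concatenation)
print(array('i', [0]) * 3)   # array('i', [0, 0, 0]) (multiplication)

# Items must match the type code: a float is rejected by an 'i' array
try:
    a.append(1.5)
    rejected = False
except TypeError:
    rejected = True
print(rejected)              # True
```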

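Using arrays, as opposed to lists, is a much **more efficient way to store data that is of the same type.** In the following example, we have created an integer array of the integers from 0 to one million minus 1, and an identical list. In this run, storing one million integers in an integer array requires around 45% of the memory of an equivalent list; the exact ratio depends on the Python version and platform, and `sys.getsizeof()` on a list counts only its references, not the integer objects they point to: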

In [88]:
import array

ba = array.array('i', range(10 ** 6))
bl = list(range(10 ** 6))

import sys
# Compare the sizes of these two objects:
100. * sys.getsizeof(ba) / sys.getsizeof(bl)

45.46534532014713

Because we are interested in saving space, that is, we are dealing with large datasets and limited memory size, we usually perform in-place operations on arrays, and only create copies when we need to. Typically, `enumerate()` is used to perform an operation on each element: it yields each index together with its item, so the result can be assigned back to the same position. A simple example of such an operation is adding one to each item in the array.
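
The in-place update just described can be sketched as follows, using a small array so that the result is easy to inspect. The array contents are sample data for this illustration.

```python
from array import array

ba = array('i', range(5))

# Update each item in place; no second array or list is created
for i, x in enumerate(ba):
    ba[i] = x + 1

print(ba)   # array('i', [1, 2, 3, 4, 5])
```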

It should be noted that when performing operations on arrays that create lists, such as list comprehensions, the memory efficiency gains of using an array in the first place will be negated. When we need to create a new data object, a solution is to use a generator expression to perform the operation.
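
As a sketch of the generator-expression approach, the following builds a new array and computes a sum without creating an intermediate list. The variable names and values are assumptions for this illustration.

```python
from array import array

ba = array('i', range(5))

# The generator yields one value at a time, so no temporary list is built
squares = array('i', (x * x for x in ba))
print(squares)   # array('i', [0, 1, 4, 9, 16])

# Aggregations can consume a generator directly as well
total = sum(x + 1 for x in ba)
print(total)     # 15
```

By contrast, `array('i', [x * x for x in ba])` would first materialize a full list of Python integers before copying it into the array.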

Arrays created with this module are unsuitable for work that requires matrix or vector operations. Numerical work of that kind can be done with the NumPy extension.