# Fluent Python: Chapter 2
### Clear, Concise, and Effective Programming
### O'Reilly -- Ramalho

Mastering list comprehensions opens up the door to generator expressions, which -- among other uses -- can produce elements to fill up sequences of any type.  Both are the subject of the next section.

## List Comprehensions and Generator Expressions

A quick way to build a sequence is using a list comprehension (or listcomp) if the target is a list, or a generator expression (or genexp) for all other types of sequences.

##### Example 2-2:

In [6]:
symbols = '$œ∑®†¥' #Note different symbols than textbook used
codes = [ord(symbol) for symbol in symbols]
codes

[36, 339, 8721, 174, 8224, 165]
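For comparison, the same list can be built with an explicit `for` loop. The sketch below reuses `symbols` from Example 2-2; `codes_loop` is a name introduced here only for illustration.

```python
symbols = '$œ∑®†¥'

# Explicit loop: start with an empty list, then append one code point at a time
codes_loop = []
for symbol in symbols:
    codes_loop.append(ord(symbol))

# List comprehension: the same result in a single expression
codes = [ord(symbol) for symbol in symbols]

print(codes_loop == codes)  # True
```

Both forms produce the same list; the listcomp states the intent (build a new list) more directly.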

## Cartesian Products

##### Example 2-4 Cartesian Products using list comprehensions

In [7]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors 
                         for size in sizes]

In [8]:
tshirts

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

This generates a list of tuples arranged by color, then size. Note how the resulting list is arranged as if the for loops were nested in the same order as they appear in the listcomp.

Adding a line break between the `for` clauses enhances readability.
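The ordering rule can be checked against explicit nested loops. This is a minimal sketch; `tshirts_loop` and `by_size` are names introduced here for illustration.

```python
colors = ['black', 'white']
sizes = ['S', 'M', 'L']

# Nested loops in the same order as the for clauses of the listcomp
tshirts_loop = []
for color in colors:
    for size in sizes:
        tshirts_loop.append((color, size))

tshirts = [(color, size) for color in colors
                         for size in sizes]
print(tshirts_loop == tshirts)  # True

# Swapping the for clauses arranges the result by size, then color
by_size = [(color, size) for size in sizes
                         for color in colors]
print(by_size[:2])  # [('black', 'S'), ('white', 'S')]
```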

## Generator Expression

To initialize tuples, arrays, and other sequences, you could start from a listcomp, but genexps are more efficient because they yield items one by one instead of first building a whole list in memory.

Genexps use the same syntax as listcomps, but are enclosed in parentheses rather than brackets. 

##### Example 2-5. Initializing a tuple and an array from a generator expression

In [23]:
example_tuple = tuple(ord(symbol) for symbol in symbols)
example_tuple

(36, 339, 8721, 174, 8224, 165)

In [10]:
type(example_tuple)

tuple

In [11]:
import array 

In [12]:
example_array = array.array('I', (ord(symbol) for symbol in symbols))

In [13]:
type(example_array)

array.array

In [14]:
example_array

array('I', [36, 339, 8721, 174, 8224, 165])

Note that the `array` constructor takes two arguments. The first, the typecode `'I'`, selects the storage type (unsigned int). Because the genexp is not the only argument in the call, it must be enclosed in its own parentheses.
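The parenthesization rule can be seen side by side in the short sketch below. It uses an ASCII string as a stand-in for `symbols`; the names `code_tuple` and `code_array` are introduced here for illustration.

```python
import array

symbols = 'abc'  # ASCII stand-in for the symbols used above

# Sole argument: the call's parentheses double as the genexp's parentheses
code_tuple = tuple(ord(symbol) for symbol in symbols)

# Second of two arguments: the genexp needs its own parentheses
code_array = array.array('I', (ord(symbol) for symbol in symbols))

print(code_tuple)        # (97, 98, 99)
print(list(code_array))  # [97, 98, 99]
```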

##### Example 2-6. Cartesian product in a generator expression

In [15]:
for tshirt in ('%s %s' % (c, z) for c in colors for z in sizes):
    print(tshirt)

black S
black M
black L
white S
white M
white L


### Tuples as Records

##### Example 2-7. Tuples used as records

In [16]:
lax_coordinates = (33.9425, -118.408056)
# Example of tuple unpacking by assigning tuple values to multiple variables in a single statement
# This is called parallel assignment
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)
traveler_ids = [('USA','31195855'),('BRA', 'CE342567'), ('ESP', 'XDA205856')]
for passport in traveler_ids: # iterating over the list, passport is bound to each tuple
    #Example of tuple unpacking using the % operator
    print('%s/%s' % passport) # The % formatting operator understands tuples and treats each item as a separate field

USA/31195855
BRA/CE342567
ESP/XDA205856


The `for` loop knows how to retrieve the items of a tuple separately; this is called "unpacking." Here we are not interested in the second item, so it is assigned to `_`, a dummy variable.

In [25]:
for country, _ in traveler_ids:
    print(country)

USA
BRA
ESP


### Tuple Unpacking

Another example of tuple unpacking is prefixing an argument with a star when calling a function:

In [52]:
divmod(20,8)

(2, 4)

In [53]:
t = (20,8)
divmod(*t)

(2, 4)

Without the star, as below, `divmod` receives a single tuple instead of two values:

In [54]:
divmod(t)

TypeError: divmod expected 2 arguments, got 1

In [55]:
quotient, remainder = divmod(*t)
print('Quotient: '+str(quotient) + '  Remainder: ' + str(remainder))

Quotient: 2  Remainder: 4


The `os.path.split()` function builds a tuple `(path, last_part)` from a filesystem path. The dummy variable `_` can be used as a placeholder for the unpacked parts of the tuple we do not care about. 

In [56]:
import os 
_, filename = os.path.split('/home/luciano/.ssh/idrsa.pub')
filename

'idrsa.pub'

#### Using * to grab excess items

Defining function parameters with `*args` to grab arbitrary excess arguments is a classic Python feature. In Python 3, this idea was extended to apply to parallel assignment as well:

In [57]:
a, b, *rest = range(5)
a, b, rest

(0, 1, [2, 3, 4])

In the context of parallel assignment, the `*` prefix can be applied to exactly one variable, but it can appear in any position:

In [32]:
a, *body, c, d = range(5)
a, body, c, d 

(0, [1, 2], 3, 4)

In [34]:
*head, a, b, c, d = range(7)
head, a, b, c, d

([0, 1, 2], 3, 4, 5, 6)
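Two edge cases of the starred target are sketched below: it always receives a list (possibly empty), and a second starred target in the same assignment is rejected when the code is compiled. The flag `two_stars_rejected` is a name introduced here for illustration.

```python
# The starred target always receives a list, even when nothing is left over
first, *middle, last = ['a', 'b']
print(first, middle, last)  # a [] b

# Two starred targets in one assignment do not compile
two_stars_rejected = False
try:
    compile('a, *b, *c = range(5)', '<example>', 'exec')
except SyntaxError:
    two_stars_rejected = True
print(two_stars_rejected)  # True
```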

### Nested Tuple Unpacking

##### Example 2-8. Unpacking nested tuples to access the longitude

In [50]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.7,139.7)),
    ('Delhi NCR', 'IN', 21.9, (28.6, 77.2)),
    ('Mexico City', 'MX', 21.9, (28.6, -99.2)),
    ('New York-Newark', 'US', 20.1, (40.8, -74.275687657))
]

In [51]:
print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt =  '{:15} | {:^9.4f} | {:^9.4f}' # print exactly 4 places past the decimal
for name, cc, pop, (latitude, longitude) in metro_areas:
    if longitude <= 0:
        print(fmt.format(name, latitude, longitude))

                |   lat.    |   long.  
Mexico City     |  28.6000  | -99.2000 
New York-Newark |  40.8000  | -74.2757 


When using tuples as records, it is sometimes desirable to name the fields. Named tuples do exactly that.

### Named Tuples 

The `collections.namedtuple` function is a factory that produces subclasses of `tuple` enhanced with field names and a class name -- which helps debugging.

Recall the card example from Chapter 1's Example 1-1 

In [60]:
import collections
Card = collections.namedtuple('Card', ['rank','suit'])

##### Example 2-9. Defining and using a named tuple type

In [98]:
# collections is a module that implements specialized container datatypes 
# providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.
from collections import namedtuple

In [81]:
# Two parameters are required to create a named tuple type: a class name and a list of field names,
# which can be given as an iterable of strings or as a single space-delimited string
City = namedtuple('city', 'name country population coordinates') # lowercase 'city' is the class name shown in the repr

# Data must be passed as positional arguments to the constructor
# (in contrast, the `tuple` constructor takes a single iterable)
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.2342))
tokyo

city(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.2342))

You can access the fields by name or position:

In [77]:
tokyo.population

36.933

In [78]:
tokyo.coordinates

(35.689722, 139.2342)

In [79]:
tokyo[1]

'JP'

A named tuple type has a few attributes in addition to those inherited from `tuple`. Example 2-10 shows the most useful: the `_fields` class attribute, the class method `_make(iterable)`, and the `_asdict()` instance method.

##### Example 2-10. Named tuples attributes and methods (continued from the previous example)

In [85]:
City._fields # _fields is a tuple with the field names of the class 
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
# _make() is a class method that allows you to instantiate a named tuple from an iterable.
# City(*delhi_data) would do the same
delhi = City._make(delhi_data)

In [90]:
delhi

city(name='Delhi NCR', country='IN', population=21.935, coordinates=LatLong(lat=28.613889, long=77.208889))

In [91]:
delhi.coordinates

LatLong(lat=28.613889, long=77.208889)

In [92]:
delhi.coordinates.lat

28.613889

`delhi` is an instance of the `City` class, so we can call an instance method on it. `_asdict()` returns a `dict` built from the named tuple instance (a regular `dict` as of Python 3.8, as in the output below; earlier versions returned a `collections.OrderedDict`). That can be used to produce a nice display of city data.

In [99]:
delhi._asdict()

{'name': 'Delhi NCR',
 'country': 'IN',
 'population': 21.935,
 'coordinates': LatLong(lat=28.613889, long=77.208889)}
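One way to turn the result of `_asdict()` into a readable display is to loop over its items. The sketch below is self-contained, so it redefines the named tuple types (here with the class name `'City'`); `lines` is a name introduced for illustration.

```python
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
LatLong = namedtuple('LatLong', 'lat long')
delhi = City('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))

# One "field: value" line per field, in field order
lines = [f'{key}: {value}' for key, value in delhi._asdict().items()]
print('\n'.join(lines))
# name: Delhi NCR
# country: IN
# population: 21.935
# coordinates: LatLong(lat=28.613889, long=77.208889)
```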

Tuples are not only powerful as records. They also act as an immutable variant of the `list` type.

### Slicing

##### Example 2-11. Line items from a flat-file invoice

In [67]:
invoice="""
0.....6..........................40.......52...55........
1909  Pimoroni PiBrella                  $17.50     3    $52.50
1489  6mm Tactile Switch x20             $4.95      2    $9.90
1510  Panavise Jr. - PV-201              $28.00     1    $28.00
"""

In [68]:
SKU = slice(0,6)
DESCRIPTION = slice(6,40)
UNIT_PRICE = slice(40,52)
QUANTITY = slice(52,55)
ITEM_TOTAL = slice(55,None)
# make each line an element in a list
# Get rid of the first 2 lines: the empty line and the line with the periods
line_items = invoice.split('\n')[2:]

In [74]:
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

 $17.50      Pimoroni PiBrella                 
 $4.95       6mm Tactile Switch x20            
 $28.00      Panavise Jr. - PV-201             
 


In [28]:
invoice.split('\n')[0:]

['',
 '0.....6..........................40...........52...55......',
 '1909  Pimoroni PiBrella          $17.50  3  $52.50',
 '1489  6mm Tactile Switch x20     $4.95   2  $9.90',
 '1510  Panavise Jr. - PV-201      $28.00  1  $28.00',
 '']

### Multidimensional Slicing and Ellipsis

Python recognizes `...` (the `Ellipsis` object) as a shortcut when slicing arrays of many dimensions. For example, if `x` is a 4D array, `x[i, ...]` is a shortcut for `x[i, :, :, :]`.
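The built-in sequence types do not support multidimensional indexing, but the mechanics can be observed with a small probe class. `ShowIndex` below is a hypothetical helper, not part of any library: its `__getitem__` simply returns whatever index object Python passes in.

```python
class ShowIndex:
    """Hypothetical probe: returns the index object it receives."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()

# Comma-separated indices arrive as a tuple; ... arrives as the Ellipsis object
print(probe[1, ...])       # (1, Ellipsis)

# Each a:b:c item arrives as a slice object
print(probe[1:3, ::2])     # (slice(1, 3, None), slice(None, None, 2))
```

Libraries that implement multidimensional arrays interpret these tuples in their own `__getitem__` methods.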

### Building Lists of Lists

##### Example 2-12. A list with three lists of length 3 can represent a tic-tac-toe board

In [81]:
board = [['-']*3 for i in range(3)]
board[1][2] = 'X'
board

[['-', '-', '-'], ['-', '-', 'X'], ['-', '-', '-']]

This is equivalent to building the board row by row, creating a new row list on each iteration:

In [86]:
board = [] 
for i in range(3):
    row = ['-']*3
    board.append(row)
board

[['-', '-', '-'], ['-', '-', '-'], ['-', '-', '-']]

##### Example 2-13. A list with 3 references to the same list is useless

In [82]:
weird_board = [['-']*3]*3
weird_board[1][2] = 'O'
weird_board

[['-', '-', 'O'], ['-', '-', 'O'], ['-', '-', 'O']]

CAUTION: In the example above, the outer list is made of 3 references to the same inner list. All rows are aliases referring to the same object. The mechanics and pitfalls of references and mutable objects are explored in Chapter 8, which explains why this behavior exists. In effect, Example 2-13 behaves like the following code, which appends the same `row` object three times:

In [99]:
weird_board = [] 
row = ['-']*3
for i in range(3):
    weird_board.append(row)
weird_board

[['-', '-', '-'], ['-', '-', '-'], ['-', '-', '-']]

In [100]:
weird_board[1][2] = 'O'
weird_board

[['-', '-', 'O'], ['-', '-', 'O'], ['-', '-', 'O']]
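The aliasing can be confirmed directly with the `is` operator, which tests object identity. This is a minimal sketch; `good_board` is a name introduced here for the listcomp version.

```python
weird_board = [['-'] * 3] * 3                 # three references to one row
good_board = [['-'] * 3 for _ in range(3)]    # three distinct rows

print(weird_board[0] is weird_board[1])  # True: same object
print(good_board[0] is good_board[1])    # False: separate objects

# The rows compare equal in both boards; only identity differs
print(weird_board == good_board)         # True
```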

### Augmented Assignment with Sequences

The augmented assignment operators += and *= behave very differently depending on the first operand. The special method that makes `+=` work is `__iadd__` (for "in-place addition"). However, if `__iadd__` is not implemented, Python falls back to calling `__add__`. Mutable sequences (`list`, `bytearray`, `array.array`) can implement `__iadd__`. 

Here, `l` is a list and is changed in place, as evidenced by its `id` staying the same after the operation.

In [102]:
l = [1, 2, 3]
id(l)

140252835140224

In [103]:
l *= 2
print(id(l))
print(l)

140252835140224
[1, 2, 3, 1, 2, 3]


Here, `t` is a tuple and is not changed in place. After multiplication, a new tuple is created.

In [107]:
t = (1, 2, 3)
print(id(t))
t *= 2 
print(id(t))
t

140252835226368
140252832693984


(1, 2, 3, 1, 2, 3)
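The fallback from `__iadd__` to `__add__` can be illustrated with two toy classes. `AddOnly` and `InPlace` are hypothetical names used only for this sketch; they are not from the book.

```python
class AddOnly:
    """Implements only __add__, so += builds a new object."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return AddOnly(self.items + list(other))


class InPlace(AddOnly):
    """Also implements __iadd__, so += mutates the existing object."""

    def __iadd__(self, other):
        self.items.extend(other)
        return self  # augmented assignment rebinds the name to this result


a = AddOnly([1])
a_before = a
a += [2]
print(a is a_before)   # False: += fell back to __add__ and rebound a

b = InPlace([1])
b_before = b
b += [2]
print(b is b_before)   # True: __iadd__ changed b in place
```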

##### Example 2-15. The unexpected result: item `t[2]` is changed and an exception is raised

In [108]:
t = (1, 2, [30, 40])
t[2] += [50, 60]

TypeError: 'tuple' object does not support item assignment

In [109]:
t

(1, 2, [30, 40, 50, 60])

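Python first performs the in-place addition on the mutable list in `t[2]`, which succeeds and extends the list. Then it tries to assign that same list object back to `t[2]`, and the tuple raises a `TypeError` because it does not support item assignment. We can look at the bytecode to see how this happens.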

##### Example 2-16. Bytecode for the expression s[a] += b

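The `dis` module supports the analysis of CPython bytecode by disassembling it. The exact instructions vary between CPython versions, so the listing below may differ from what your interpreter prints.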

In [112]:
import dis
dis.dis('s[a] += b')

  1           0 LOAD_NAME                0 (s)
              2 LOAD_NAME                1 (a)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_NAME                2 (b)
             10 INPLACE_ADD
             12 ROT_THREE
             14 STORE_SUBSCR
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE


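Offset 6 (`BINARY_SUBSCR`): Put the value of `s[a]` on the top of the stack (TOS) <br>
Offset 10 (`INPLACE_ADD`): Perform TOS `+= b`. This succeeds if TOS refers to a mutable object, such as the list in example 2-15 <br>
Offset 14 (`STORE_SUBSCR`): Assign `s[a]` = TOS. This fails if `s` is immutable, as in the tuple `t` in example 2-15 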


Takeaways from this example: <br>
$\bullet$ Putting mutable items in tuples is not a good idea <br>
$\bullet$ Augmented assignment is not an atomic operation -- we just saw it raise an exception after doing part of its job (an atomic operation either completes entirely or has no effect; it cannot stop partway through) <br>
$\bullet$ Inspecting Python bytecode is not too difficult, and is often helpful to see what is going on under the hood.
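The partial effect can be reproduced in a script by catching the exception. The sketch below also shows that calling `extend` on the inner list avoids the failing assignment step; the names `error` and `u` are introduced here for illustration.

```python
t = (1, 2, [30, 40])
error = None
try:
    t[2] += [50, 60]          # the list is extended, then the assignment fails
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
print(t)                     # (1, 2, [30, 40, 50, 60])

# Mutating the list through a method involves no assignment to the tuple item
u = (1, 2, [30, 40])
u[2].extend([50, 60])
print(u)                     # (1, 2, [30, 40, 50, 60])
```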

### list.sort and the sorted Built-In Function

The `list.sort` method sorts a list in place -- that is, without making a copy. It returns `None` to remind us that it changes the target object, and does not create a new list. This is an important Python API convention: functions or methods that change an object in place should return `None` to make it clear to the caller that the object itself was changed, and no new object was created. The same behavior can be seen, for example, in the `random.shuffle` function. <br>
In contrast, the built-in function `sorted` creates and returns a new list. It accepts any iterable object as an argument, including immutable sequences and generators.  <br>
Both `list.sort` and `sorted` take two optional, keyword-only arguments: <br>
$\bullet$ `reverse`: If `True`, the items are returned in descending order. The default is `False`. <br>
$\bullet$ `key`: A one-argument function that will be applied to each item to produce its sorting key. For example, when sorting a list of strings, `key=str.lower` can be used to perform a case-insensitive sort, and `key=len` will sort the strings by character length. The default is the identity function (i.e., the items themselves are compared).
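The effect of `key=str.lower` is easiest to see with mixed-case strings. This is a small sketch; the `words` list is made up for illustration.

```python
words = ['banana', 'Cherry', 'apple']

# Default comparison: uppercase letters sort before lowercase ones
print(sorted(words))                  # ['Cherry', 'apple', 'banana']

# key=str.lower compares lowercased copies; the items themselves are unchanged
print(sorted(words, key=str.lower))   # ['apple', 'banana', 'Cherry']

# key and reverse can be combined
print(sorted(words, key=str.lower, reverse=True))  # ['Cherry', 'banana', 'apple']
```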

In [115]:
fruits = ['grape', 'rasberry', 'apple', 'banana']
print(id(fruits))
alphabetical_fruits = sorted(fruits)
print(alphabetical_fruits)
print(id(alphabetical_fruits))

140252834934080
['apple', 'banana', 'grape', 'rasberry']
140252835650048


In [117]:
sorted(fruits, reverse=True)

['rasberry', 'grape', 'banana', 'apple']

Note: because `len('grape') == len('apple')`, `grape` is sorted before `apple`, since that is the relative order in which they appear in `fruits`.

In [118]:
sorted(fruits, key=len)

['grape', 'apple', 'banana', 'rasberry']

These strings are sorted in descending order of length. It is not the reverse of the previous result because the sorting is stable, so again, `grape` appears before `apple`.

In [119]:
sorted(fruits, key=len, reverse=True)

['rasberry', 'banana', 'grape', 'apple']

The ordering of `fruits` has remained unchanged.

In [122]:
fruits

['grape', 'rasberry', 'apple', 'banana']

Applying `.sort()` sorts the list in place, and returns None (which the console omits)

In [123]:
fruits.sort()

In [124]:
fruits

['apple', 'banana', 'grape', 'rasberry']

In [125]:
return_val = fruits.sort()

In [130]:
print(return_val)

None


In [131]:
type(return_val)

NoneType

Once your sequences are sorted, they can be efficiently searched. Fortunately, the standard binary search algorithm is already provided in the `bisect` module of the Python standard library.

### Managing Ordered Sequences with bisect

The `bisect` module offers two main functions -- `bisect` and `insort` -- that use the binary search algorithm to quickly find and insert items in any sorted sequence.

#### Searching with bisect

`bisect(haystack, needle)` does a binary search for `needle` in `haystack` -- which must be a sorted sequence -- to locate the position where `needle` can be inserted while maintaining `haystack` in ascending order. In other words, all items appearing up to that position are less than or equal to `needle`. You could use the result of `bisect(haystack, needle)` as the `index` argument to `haystack.insert(index, needle)` -- however, using `insort` does both steps, and is faster.

##### Example 2-17. bisect finds insertion points for items in a sorted sequence

In [132]:
import bisect
import sys

In [148]:
HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]

ROW_FMT = '{0:2d} @ {1:2d}    {2}{0:<2d}'

def demo(bisect_fn):
    for needle in reversed(NEEDLES):
        position = bisect_fn(HAYSTACK, needle)  # <1>
        offset = position * '  |'  # <2>
        print(ROW_FMT.format(needle, position, offset))  # <3>

if __name__ == '__main__':

    if sys.argv[-1] == 'left':    # <4>: Choose the `bisect` function to use according to last command-line argument
        bisect_fn = bisect.bisect_left
    else:
        bisect_fn = bisect.bisect

    print('DEMO:', bisect_fn.__name__)  # <5>
    print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK))
    demo(bisect_fn)

DEMO: bisect_right
haystack ->  1  4  5  6  8 12 15 20 21 23 23 26 29 30
31 @ 14      |  |  |  |  |  |  |  |  |  |  |  |  |  |31
30 @ 14      |  |  |  |  |  |  |  |  |  |  |  |  |  |30
29 @ 13      |  |  |  |  |  |  |  |  |  |  |  |  |29
23 @ 11      |  |  |  |  |  |  |  |  |  |  |23
22 @  9      |  |  |  |  |  |  |  |  |22
10 @  5      |  |  |  |  |10
 8 @  5      |  |  |  |  |8 
 5 @  3      |  |  |5 
 2 @  1      |2 
 1 @  1      |1 
 0 @  0    0 


Each row above starts with the notation needle @ position, and the needle value appears again below its insertion point in the haystack.
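The demo output above comes from `bisect_right` (the function that `bisect.bisect` refers to). The difference from `bisect_left` only shows up when the needle is already in the haystack, as this short sketch illustrates with the same haystack values.

```python
import bisect

haystack = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]

# Needle present: bisect_left points at the first equal item,
# bisect_right points just past the last equal item
print(bisect.bisect_left(haystack, 23))    # 9
print(bisect.bisect_right(haystack, 23))   # 11

# Needle absent: both functions agree
print(bisect.bisect_left(haystack, 10))    # 5
print(bisect.bisect_right(haystack, 10))   # 5
```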

In [151]:
HAYSTACK

[1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]

##### Example 2-18. Given a test score, grade returns the corresponding letter grade

In [152]:
def grade(score, breakpoints = [60, 70, 80, 90], grades = 'FDCBA'):
    i = bisect.bisect(breakpoints, score)
    return grades[i]

In [153]:
[grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]

['F', 'A', 'C', 'C', 'B', 'A', 'A']

#### Inserting with bisect.insort

Sorting is expensive, so once you have a sorted sequence, it's good to keep it that way. That is why `bisect.insort` was created.

##### Example 2-19. `insort` keeps a sorted sequence always sorted

In [154]:
import bisect
import random

In [156]:
SIZE = 7

random.seed(1729)

my_list = []
for i in range(SIZE):
    new_item = random.randrange(SIZE*2)
    bisect.insort(my_list, new_item)
    print('%2d ->' % new_item, my_list)

10 -> [10]
 0 -> [0, 10]
 6 -> [0, 6, 10]
 8 -> [0, 6, 8, 10]
 7 -> [0, 6, 7, 8, 10]
 2 -> [0, 2, 6, 7, 8, 10]
10 -> [0, 2, 6, 7, 8, 10, 10]
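As a cross-check, `insort` gives the same result as finding the position with `bisect` and then calling `insert`. This is a minimal sketch with made-up values; `via_insort` and `via_insert` are names introduced for illustration.

```python
import bisect

via_insort = [0, 6, 8, 10]
via_insert = list(via_insort)

# One step: insort finds the position and inserts the item
bisect.insort(via_insort, 7)

# Two steps: locate the position, then insert
position = bisect.bisect(via_insert, 7)
via_insert.insert(position, 7)

print(via_insort)                # [0, 6, 7, 8, 10]
print(via_insort == via_insert)  # True
```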


### When a List is Not the Answer

The `list` type is flexible and easy to use, but depending on specific requirements, there are better options. For example, if you need to store 10 million floating-point values, an `array` is much more efficient, because an `array` does not actually hold full-fledged `float` objects, but only the packed bytes representing their machine values -- just like an array in the C language.

General tip: If your code does a lot of containment checks (e.g. `item in my_collection`), consider using a `set` for `my_collection`, especially if it holds a large number of items. Sets are optimized for fast membership checking. But they are not sequences (their content is unordered). We cover them in Chapter 3.  
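A rough way to see the difference is to time the same containment check against a list and a set. The sketch below uses `timeit` with arbitrary sizes chosen for illustration; the timings depend on the machine, so no expected numbers are shown.

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # worst case for the list: the last item

# The list is scanned item by item; the set uses a hash lookup
list_time = timeit.timeit(lambda: needle in haystack_list, number=100)
set_time = timeit.timeit(lambda: needle in haystack_set, number=100)

print(f'list: {list_time:.6f} s   set: {set_time:.6f} s')
```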

The remainder of this chapter discusses mutable sequence types that can replace lists in many cases, starting with arrays.

### Arrays 

If the list will only contain numbers, an `array.array` is more efficient than a `list`: it supports all mutable sequence operations (including `.pop`, `.insert`, and `.extend`), and additional methods for fast loading and saving such as `.frombytes` and `.tofile`.

When creating an `array`, you provide a typecode, a letter to determine the underlying C type used to store each item in the array. For example, `b` is the typecode for `signed char`. If you create `array('b')`, then each item will be stored in a single byte and interpreted as an integer from -128 to 127. For large sequences of numbers, this saves a lot of memory.
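The one-byte storage and the value range of typecode `'b'` can be checked directly. This is a small sketch; `small` and `overflowed` are names introduced for illustration.

```python
from array import array

small = array('b', [-128, 0, 127])
print(small.itemsize)   # 1 (bytes per item)

# Values outside the signed char range are rejected
overflowed = False
try:
    small.append(128)
except OverflowError:
    overflowed = True

print(overflowed)       # True
print(small)            # array('b', [-128, 0, 127])
```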

##### Example 2-20. Creating, saving, and loading a large array of floats

In [165]:
from array import array 
from random import random

floats = array('d', (random() for i in range(10**7)))

In [166]:
floats[-1]

0.7138849263281074

In [169]:
with open('aux_files/floats.bin', 'wb') as fp:  # the with block closes the file even if tofile raises
    floats.tofile(fp)
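Loading the data back uses `array.fromfile`, which reads a given number of items from a binary file. The sketch below is a scaled-down round trip under stated assumptions: it uses 1,000 items instead of 10 million and writes to a temporary directory rather than `aux_files/`; `loaded` is a name introduced for illustration.

```python
import os
import tempfile
from array import array
from random import random

floats = array('d', (random() for _ in range(1000)))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'floats.bin')

    # Save the raw machine values
    with open(path, 'wb') as fp:
        floats.tofile(fp)

    # Load them into a new, empty array of the same typecode
    loaded = array('d')
    with open(path, 'rb') as fp:
        loaded.fromfile(fp, len(floats))

print(loaded == floats)  # True
```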