## Unpacking Elements from Iterables of Arbitrary Length

Python 3 **'star expressions'** solve this problem: a starred variable collects all the elements that the other variables don't claim.


In [10]:
user_record = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212')
name, email, *phone_numbers = user_record

print ("Name: ", name)
print ("Email: ", email)
print ("Phone Number: ", phone_numbers)

Name:  Dave
Email:  dave@example.com
Phone Number:  ['773-555-1212', '847-555-1212']
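The output above shows that the starred variable is a list. It is always a list, even when there is nothing left to collect. The sketch below uses a hypothetical record that has no phone numbers:

```python
# A hypothetical record with no phone numbers at all
record = ('Dave', 'dave@example.com')
name, email, *phone_numbers = record

print(phone_numbers)        # []
print(type(phone_numbers))  # <class 'list'>

# Even when the source is a tuple, the starred target is a list
_, _, *rest = ('Dave', 'dave@example.com', '773-555-1212')
print(rest)                 # ['773-555-1212']
```

Because the starred variable is never None, code that loops over `phone_numbers` doesn't need a special case for records without phone numbers.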


<hr>
#### The starred variable can also be the first one in the list.

In [20]:
*trailing, current = [10, 8, 7, 1, 9, 5, 10, 3]
print (trailing)
print (current)

[10, 8, 7, 1, 9, 5, 10]
3


#### It is worth noting that the star syntax can be especially useful when iterating over a sequence of tuples of varying length.

In [12]:

records = [
    ('foo', 1, 2),
    ('bar', 'hello'),
    ('foo', 3, 4),
]
def do_foo(x, y):
    print('foo', x, y)
def do_bar(s):
    print('bar', s)
for tag, *args in records:
    if tag == 'foo':
        do_foo(*args)
    elif tag == 'bar':
        do_bar(*args)

foo 1 2
bar hello
foo 3 4


<hr>
#### Star unpacking can also be useful when combined with certain kinds of string processing operations, such as splitting.

In [22]:

line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'
uname, *fields, homedir, sh = line.split(':')
print (uname)
print (homedir)
print (sh)


nobody
/var/empty
/usr/bin/false
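Starred unpacking also helps when you want to throw some values away. Python has no dedicated syntax for this. A common convention is to unpack unwanted values into a throwaway name such as `_`. The sketch below reuses the `line` above and adds a hypothetical nested `record` tuple:

```python
line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'

# Keep only the first and last two fields; '_' collects the rest
uname, *_, homedir, sh = line.split(':')
print(uname, homedir, sh)   # nobody /var/empty /usr/bin/false

# Starred targets also work inside nested unpacking
record = ('ACME', 50, 123.45, (12, 18, 2012))
name, *_, (*_, year) = record
print(name, year)           # ACME 2012
```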


<hr>
### Keeping the Last N Items
#### You want to keep a limited history of the last few items seen during iteration or during some other kind of processing.

Keeping a limited history is a perfect use for a `collections.deque`. Using `deque(maxlen=N)` creates a fixed-size queue. When a new item is added and the queue is full, the item at the opposite end is automatically removed.

In [5]:
from collections import deque

q = deque(maxlen=3)
q.append(1)
q.append(2)
q.append(3)
print ('New Deque of maximum length created: ')
print (q)

q.append(4)
print ('Added item to right and the leftmost item is automatically popped: ')
print (q)

q.appendleft(5)
print ('Added item to left and the rightmost item is automatically popped: ')
print (q)

New Deque of maximum length created: 
deque([1, 2, 3], maxlen=3)
Added item to right and the leftmost item is automatically popped: 
deque([2, 3, 4], maxlen=3)
Added item to left and the rightmost item is automatically popped: 
deque([5, 2, 3], maxlen=3)


Although you could perform such operations manually on a list (appending, deleting, and so on), the deque solution is more elegant and faster. Adding or popping items at either end of a deque is O(1). On a list, inserting or removing items at the front is O(N).
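A typical application is remembering the last few lines seen while scanning text. The sketch below uses a hypothetical `search()` generator, which is not a library function. For each matching line, it yields the line together with up to `history` preceding lines. It yields a copy of the deque so that later appends do not change results that were already handed out.

```python
from collections import deque

def search(lines, pattern, history=5):
    """Hypothetical helper: yield (line, previous_lines) for each match."""
    previous_lines = deque(maxlen=history)   # keeps only the last `history` lines
    for line in lines:
        if pattern in line:
            # Yield a snapshot, because the deque keeps changing afterwards
            yield line, list(previous_lines)
        previous_lines.append(line)

log = ['start', 'load config', 'connect db', 'ERROR timeout', 'retry', 'ERROR refused']
results = list(search(log, 'ERROR', history=2))
for match, context in results:
    print(match, context)
# ERROR timeout ['load config', 'connect db']
# ERROR refused ['ERROR timeout', 'retry']
```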

<hr>
### Finding the Largest or Smallest N Items
#### You want to make a list of the largest or smallest N items in a collection.

The heapq module has two functions, nlargest() and nsmallest(), that do exactly
what you want.


In [6]:
import heapq

nums = [1, 6, 3, 5, 33, -34, 45, 66, -87, 0, 21]

print (heapq.nlargest(3, nums))

print (heapq.nsmallest(3, nums))


[66, 45, 33]
[-87, -34, 0]


Both functions also accept a key parameter that allows them to be used with more complicated data structures.

In [12]:
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]

cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
print (cheap)

max_share = heapq.nlargest(1, portfolio, key=lambda s: s['shares'])
print (max_share)

[{'name': 'YHOO', 'shares': 45, 'price': 16.35}, {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}]
[{'name': 'FB', 'shares': 200, 'price': 21.09}]


If you are looking for the N smallest or largest items and N is small compared to the overall size of the collection, these functions provide superior performance. Underneath the covers, they work by first converting the data into a list where items are ordered as a heap.

In [18]:
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)

heapq.heapify(heap)
print (heap)

[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]


The most important feature of a heap is that heap[0] is always the smallest item. You can then find the subsequent items with the heapq.heappop() function. It pops off the first item and replaces it with the next smallest item. This is an O(log N) operation, where N is the size of the heap.

In [19]:
print (heapq.heappop(heap))
print (heap)

print (heapq.heappop(heap))
print (heap)

-4
[1, 2, 2, 23, 7, 8, 18, 23, 42, 37]
1
[2, 2, 8, 23, 7, 37, 18, 23, 42]
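Because heap[0] is always the smallest remaining item, popping until the heap is empty returns every item in ascending order. The sketch below uses the same `nums` list. The assert inside the loop only checks the heap property at each step.

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)
heapq.heapify(heap)

drained = []
while heap:
    assert heap[0] == min(heap)        # smallest remaining item is at index 0
    drained.append(heapq.heappop(heap))

print(drained)                  # [-4, 1, 2, 2, 7, 8, 18, 23, 23, 37, 42]
print(drained == sorted(nums))  # True
```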


The **nlargest()** and **nsmallest()** functions are most appropriate if you are trying to find a relatively small number of items. 

If you are simply trying to find the single smallest or largest item (N=1), it is faster to use min() and max(). 

Similarly, if N is about the same size as the collection itself, it is usually faster to sort it first and take a slice (i.e.,
use sorted(items)[:N] or sorted(items)[-N:]).

It should be noted that the actual
implementation of nlargest() and nsmallest() is adaptive in how it operates and will
carry out some of these optimizations on your behalf (e.g., using sorting if N is close to
the same size as the input).
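The rules of thumb above can be checked for correctness, though not for speed. The sketch below uses the same `nums` list, with `n = 9` as an arbitrary choice close to `len(nums)`. Note that `sorted(items)[-N:]` returns the largest items in ascending order, whereas nlargest() returns them in descending order. The slice is therefore reversed before comparing.

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
n = 9

# N == 1: min() and max() answer the same question directly
print(heapq.nsmallest(1, nums), min(nums))    # [-4] -4
print(heapq.nlargest(1, nums), max(nums))     # [42] 42

# N close to len(nums): sort once and slice
print(heapq.nsmallest(n, nums) == sorted(nums)[:n])          # True
# sorted(...)[-n:] is ascending, nlargest() is descending
print(heapq.nlargest(n, nums) == sorted(nums)[-n:][::-1])    # True
```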

<hr>
### Implementing a Priority Queue
#### Implement a queue that sorts items by a given priority and always returns the item with the highest priority on each pop operation.



In [9]:
import heapq

class PriorityQueue:
    def __init__(self):
        self._queue = []
        self._index = 0
        
    def push(self, item, priority):
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1
        
    def pop(self):
        return heapq.heappop(self._queue)[-1]
    

class Item:
    def __init__(self, name):
        self.name = name
        
    def __repr__(self):
        return 'Item(%s)' % self.name
    
q = PriorityQueue()
q.push(Item('foo'), 1)
q.push(Item('bar'), 5)
q.push(Item('spam'), 4)
q.push(Item('grok'), 1)

print (q.pop())
print (q.pop())
print (q.pop())
print (q.pop())

Item(bar)
Item(spam)
Item(foo)
Item(grok)


The queue consists of tuples of the form (-priority, index, item). The
priority value is negated so that the queue sorts items from highest priority to lowest.
This is the opposite of the normal heap ordering, which sorts from lowest to highest
value.

The index variable orders items with the same priority level. Because the index increases
with every push, items of equal priority come out in the order in which they were inserted.
The index also makes the comparison operations work for items with the same priority level.
No two tuples share an index, so two entries with equal priority are ordered by their indexes,
and Python never has to compare the Item instances themselves. Item instances do not support
ordering, so comparing them would raise a TypeError.
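To see why the index matters for comparisons, the sketch below first pushes bare `(priority, item)` tuples with equal priority and no index. It then pushes the same items with an increasing index. The `Item` class mirrors the one defined earlier.

```python
import heapq

class Item:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'Item({})'.format(self.name)

# Without an index: equal priorities force a comparison of Item objects
heap = []
heapq.heappush(heap, (-1, Item('foo')))
tie_failed = False
try:
    heapq.heappush(heap, (-1, Item('grok')))   # compares Item('grok') < Item('foo')
except TypeError:
    tie_failed = True
print(tie_failed)                 # True

# With an increasing index, ties are settled before reaching the items
heap = []
heapq.heappush(heap, (-1, 0, Item('foo')))
heapq.heappush(heap, (-1, 1, Item('grok')))
first = heapq.heappop(heap)[-1]
print(first)                      # Item(foo)
```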


<hr>
### Mapping Keys to Multiple Values in a Dictionary

#### You want to make a dictionary that maps keys to more than one value (a so-called “multidict”).

A dictionary is a mapping where each key is mapped to a single value. If you want to
map keys to multiple values, you need to store the multiple values in another container
such as a list or set.

To easily construct such dictionaries, you can use defaultdict in the collections
module. A feature of defaultdict is that it automatically initializes the first value so
you can simply focus on adding items.
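Whether the values go in a list or a set depends on the problem. A list keeps insertion order and duplicates, while a set discards duplicates. As an alternative to defaultdict, a plain dict with setdefault() builds the same structure. However, setdefault() creates a new empty list on every call, even when the key already exists. A sketch of both approaches:

```python
from collections import defaultdict

# A set-valued multidict silently drops duplicate values
s = defaultdict(set)
s['a'].add(1)
s['a'].add(2)
s['a'].add(1)          # duplicate, ignored
s['b'].add(4)

# The same list-valued multidict with a plain dict and setdefault()
d = {}
d.setdefault('a', []).append(1)
d.setdefault('a', []).append(2)
d.setdefault('b', []).append(4)
print(d)               # {'a': [1, 2], 'b': [4]}
```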

In [11]:
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)
d['b'].append(3)
d['a'].append(5)

print (d)


city_list = [('TX','Austin'), ('TX','Houston'),
             ('NY','Albany'), ('NY', 'Syracuse'),
             ('NY', 'Buffalo'), ('NY', 'Rochester'),
             ('TX', 'Dallas'), ('CA','Sacramento'),
             ('CA', 'Palo Alto'), ('GA', 'Atlanta')]


cities_by_state = defaultdict(list)
for state, city in city_list:
    cities_by_state[state].append(city)

print (cities_by_state)


defaultdict(<class 'list'>, {'a': [1, 5], 'b': [3]})
defaultdict(<class 'list'>, {'TX': ['Austin', 'Houston', 'Dallas'], 'NY': ['Albany', 'Syracuse', 'Buffalo', 'Rochester'], 'CA': ['Sacramento', 'Palo Alto'], 'GA': ['Atlanta']})


This recipe is strongly related to the problem of grouping records together in data processing
problems.

<hr>
### Keeping Dictionaries in Order

#### You want to create a dictionary, and you also want to control the order of items when iterating or serializing.

To control the order of items in a dictionary, you can use an OrderedDict from the
collections module. It exactly preserves the original insertion order of data when
iterating.


In [12]:
from collections import OrderedDict

d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
d['d'] = 4

for key in d:
    print (key, d[key])


a 1
b 2
c 3
d 4


An OrderedDict can be particularly useful when you want to build a mapping that you
may want to later serialize or encode into a different format. For example, if you want
to precisely control the order of fields appearing in a JSON encoding, first building the
data in an OrderedDict will do the trick:

In [15]:
import json

print (json.dumps(d))

{"a": 1, "b": 2, "c": 3, "d": 4}
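Since Python 3.7, the built-in dict is also guaranteed to preserve insertion order. As a result, json.dumps() on a plain dict emits keys in the order they were added. OrderedDict still offers order-specific operations such as move_to_end(). A small sketch:

```python
import json
from collections import OrderedDict

# Python 3.7+: a plain dict keeps insertion order
plain = {}
plain['b'] = 2
plain['a'] = 1
print(json.dumps(plain))        # {"b": 2, "a": 1}

# OrderedDict adds order-specific methods such as move_to_end()
od = OrderedDict(a=1, b=2, c=3)
od.move_to_end('a')             # move key 'a' to the end
print(list(od))                 # ['b', 'c', 'a']
print(json.dumps(od))           # {"b": 2, "c": 3, "a": 1}
```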


<hr>
### Calculating with dictionaries

#### You want to perform various calculations (e.g., minimum value, maximum value, sorting, etc.) on a dictionary of data.

In order to perform useful calculations on the dictionary contents, it is often useful to
invert the keys and values of the dictionary using zip(). For example, here is how to
find the minimum and maximum price and stock name:

In [43]:
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

min_price = min(zip(prices.values(), prices.keys()))
print ('Minimum price: ', min_price)

max_price = max(zip(prices.values(), prices.keys()))
print ('Maximum price: ', max_price)

prices_sorted = sorted(zip(prices.values(), prices.keys()))
print ('Prices sorted in ascending order', prices_sorted)

print ('\nAnother method to get key and value')
print ('key: ', min(prices, key=lambda k:prices[k]))
print ('value: ', prices[min(prices, key=lambda k:prices[k])])

Minimum price:  (10.75, 'FB')
Maximum price:  (612.78, 'AAPL')
Prices sorted in ascending order [(10.75, 'FB'), (37.2, 'HPQ'), (45.23, 'ACME'), (205.55, 'IBM'), (612.78, 'AAPL')]

Another method to get key and value
key:  FB
value:  10.75


The solution involving zip() solves the problem by “inverting” the dictionary into a
sequence of (value, key) pairs. When performing comparisons on such tuples, the
value element is compared first, followed by the key. This gives you exactly the behavior
that you want and allows reductions and sorting to be easily performed on the dictionary
contents using a single statement.
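This single-statement style has one caveat. zip() returns an iterator that can be consumed only once, so reusing the same zip object for a second reduction fails. A sketch, with `prices` copied from above:

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}

prices_and_names = zip(prices.values(), prices.keys())
print(min(prices_and_names))      # (10.75, 'FB')

# The zip iterator is now exhausted, so max() sees an empty sequence
exhausted = False
try:
    max(prices_and_names)
except ValueError:
    exhausted = True
print(exhausted)                  # True

# Materialize the pairs once if you need them more than once
pairs = list(zip(prices.values(), prices.keys()))
print(min(pairs), max(pairs))     # (10.75, 'FB') (612.78, 'AAPL')
```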

It should be noted that in calculations involving (value, key) pairs, the key will be
used to determine the result in instances where multiple entries happen to have the same
value. For instance, in calculations such as min() and max(), the entry with the smallest
or largest key will be returned if there happen to be duplicate values. For example:

In [26]:
prices = { 'AAA' : 45.23, 'ZZZ': 45.23 }
print (min(zip(prices.values(), prices.keys())))
print (max(zip(prices.values(), prices.keys())))

(45.23, 'AAA')
(45.23, 'ZZZ')


<hr>
### Finding Commonalities in Two Dictionaries

#### You have two dictionaries and want to find out what they might have in common (same keys, same values, etc.).


In [None]:
a = {
    'x': 1,
    'y': 2,
    'z': 3
}

b = {
    'w': 10,
    'x': 11,
    'y': 2
}

print ('Find keys in common: ')
print (a.keys() & b.keys())

print ('\nFind keys in a that are not in b: ')
print (a.keys() - b.keys())

print ('\nFind (key, value) pairs in common')
print (a.items() & b.items())

print ('\nMake a new dictionary with certain keys removed')
c = {key:a[key] for key in a.keys() - {'z', 'w'}}
print (c)

A dictionary is a mapping between a set of keys and values. The keys() method of a
dictionary returns a keys-view object that exposes the keys. A little-known feature of
keys views is that they also support common set operations such as unions, intersections,
and differences. Thus, if you need to perform common set operations with dictionary
keys, you can often just use the keys-view objects directly without first converting them
into a set.

The items() method of a dictionary returns an items-view object consisting of (key,
value) pairs. This object supports similar set operations and can be used to perform
operations such as finding out which key-value pairs two dictionaries have in common.

Although similar, the values() method of a dictionary does not support the set operations
described in this recipe. In part, this is due to the fact that unlike keys, the items
contained in a values view aren’t guaranteed to be unique. This alone makes certain set
operations of questionable utility. However, if you must perform such calculations, they
can be accomplished by simply converting the values to a set first.
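Here is a sketch of that conversion, using the `a` and `b` dictionaries from the example above. Duplicate values collapse when converted to a set.

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

# Values views have no & operator, so this raises TypeError
values_view_failed = False
try:
    a.values() & b.values()
except TypeError:
    values_view_failed = True
print(values_view_failed)        # True

# Converting to sets first works
common_values = set(a.values()) & set(b.values())
print(common_values)             # {2}
```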

<hr>
### Removing Duplicates from a Sequence While Maintaining Order

If the values in the sequence are hashable, the problem can be easily solved using a set
and a generator.

In [38]:
def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)
            
a = [1, 5, 6, 2, 1, 9, 2, 5, 10]

print (list(dedupe(a)))


[1, 5, 6, 2, 9, 10]


If you are trying to eliminate
duplicates in a sequence of unhashable types (such as dicts), you can make a slight
change to this recipe, as follows:

In [63]:
def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)          
            

a = [ {'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
print (list(dedupe(a, key=lambda d: (d['x'], d['y']))))

print (list(dedupe(a, key=lambda d: (d['x']))))

[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]


<hr>
### Determining the Most Frequently Occurring Items in a Sequence

#### You have a sequence of items, and you’d like to determine the most frequently occurring items in the sequence.

The collections.Counter class is designed for just such a problem. It even comes with
a handy most_common() method that will give you the answer.

In [64]:
from collections import Counter

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]

word_counts = Counter(words)
top_three = word_counts.most_common(3)
print (top_three)

[('eyes', 8), ('the', 5), ('look', 4)]


To add more data to the counts, you can increment them manually or use the update() method.

In [66]:
morewords = ['why','are','you','not','looking','in','my','eyes']

word_counts.update(morewords)

print (word_counts)

Counter({'eyes': 9, 'the': 5, 'look': 4, 'my': 4, 'into': 3, 'not': 2, 'around': 2, "don't": 1, "you're": 1, 'under': 1, 'why': 1, 'are': 1, 'you': 1, 'looking': 1, 'in': 1})
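Incrementing by hand works because a Counter returns 0 for a missing key instead of raising KeyError. The sketch below uses a shortened word list. It checks that a manual loop and update() agree, and it also shows that Counters combine with the `+` and `-` operators.

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'look']
morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

# Manual increment: missing keys start at 0
manual = Counter(words)
for word in morewords:
    manual[word] += 1

# Same result with update()
updated = Counter(words)
updated.update(morewords)
print(manual == updated)   # True

# Counters also combine with arithmetic operators
a = Counter(words)
b = Counter(morewords)
print((a + b)['eyes'])     # 2
print((a - b)['look'])     # 2
```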


<hr>
### Sorting a list of dictionaries by common key

#### You have a list of dictionaries and you would like to sort the entries according to one or more of the dictionary values.

Sorting this type of structure is easy using the operator module’s itemgetter function.

In [7]:
from operator import itemgetter

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]

rows_by_fname = sorted(rows, key=itemgetter('fname'))
rows_by_uid = sorted(rows, key=itemgetter('uid'))

print ('Rows sorted by firstname: ')
print (rows_by_fname)
print ('\nRows sorted by uid: ')
print (rows_by_uid)

rows_by_lfname = sorted(rows, key=lambda r: (r['lname'],r['fname']))
print ('\nRows sorted by last and first name: ')
print (rows_by_lfname)

print ('\nlowest uid row: ', min(rows, key=itemgetter('uid')))
print ('\nhighest uid row', max(rows, key=itemgetter('uid')))

Rows sorted by firstname: 
[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]

Rows sorted by uid: 
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]

Rows sorted by last and first name: 
[{'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}]

lowest uid row:  {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}

highest uid row {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
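The last-name/first-name sort above uses a lambda. itemgetter() also accepts several keys, in which case it returns a tuple, so it gives the same ordering. The sketch below uses the same rows. The `reverse=True` call is added for illustration.

```python
from operator import itemgetter

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]

# itemgetter() with several keys returns a tuple, like the lambda above
by_lname_fname = sorted(rows, key=itemgetter('lname', 'fname'))
print([r['uid'] for r in by_lname_fname])    # [1002, 1001, 1004, 1003]

# Descending order by uid
by_uid_desc = sorted(rows, key=itemgetter('uid'), reverse=True)
print([r['uid'] for r in by_uid_desc])       # [1004, 1003, 1002, 1001]
```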


<hr>
### Sorting objects without native comparison support

#### You want to sort objects of the same class, but they don’t natively support comparison operations.

The built-in sorted() function takes a key argument. This is a callable that receives each object
and returns the value that sorted() should use to compare the objects.

For
example, if you have a sequence of User instances in your application, and you want to
sort them by their user_id attribute, you would supply a callable that takes a User
instance as input and returns the user_id.

In [12]:
class User:
    def __init__(self, user_id):
        self.user_id = user_id
        
    def __repr__(self):
        return 'User({})'.format(self.user_id)
    
users = [User(23), User(3), User(99)]

print(sorted(users, key=lambda u: u.user_id))

[User(3), User(23), User(99)]


Instead of a lambda, you can use operator.attrgetter():

In [11]:
from operator import attrgetter

print (sorted(users, key=attrgetter('user_id')))

[User(3), User(23), User(99)]


<hr>
### Grouping Records together based on a field

#### You have a sequence of dictionaries or instances and you want to iterate over the data in groups based on the value of a particular field, such as date.

The itertools.groupby() function is particularly useful for grouping data together like this.

In [18]:
from operator import itemgetter
from itertools import groupby

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

rows.sort(key=itemgetter('date'))

for date, items in groupby(rows, key=itemgetter('date')):
    print (date)
    for i in items:
        print ('       ', i)

07/01/2012
        {'address': '5412 N CLARK', 'date': '07/01/2012'}
        {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
        {'address': '5800 E 58TH', 'date': '07/02/2012'}
        {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
        {'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
        {'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
        {'address': '5148 N CLARK', 'date': '07/04/2012'}
        {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}


The groupby() function works by scanning a sequence and finding sequential “runs”
of identical values (or values returned by the given key function). On each iteration, it
returns the value along with an iterator that produces all of the items in a group with
the same value.
An important preliminary step is sorting the data according to the field of interest. Since
groupby() only examines consecutive items, failing to sort first won’t group the records
as you want.
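The sketch below shows what goes wrong without the sort, using three of the rows above. groupby() starts a new group every time the key changes, so the same date can appear twice.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
]

# Unsorted input: '07/01/2012' shows up as two separate groups
unsorted_keys = [date for date, _ in groupby(rows, key=itemgetter('date'))]
print(unsorted_keys)   # ['07/01/2012', '07/04/2012', '07/01/2012']

# Sorting by the same key first merges the runs
sorted_rows = sorted(rows, key=itemgetter('date'))
sorted_keys = [date for date, _ in groupby(sorted_rows, key=itemgetter('date'))]
print(sorted_keys)     # ['07/01/2012', '07/04/2012']
```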

If your goal is to simply group the data together by dates into a large data structure that
allows random access, you may have better luck using defaultdict() to build a
multidict.

In [30]:
from collections import defaultdict

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

rows_by_date = defaultdict(list)

for row in rows:
    rows_by_date[row['date']].append(row)

print ('The defaultdict with the updated data: ')
print (rows_by_date)

for r in rows_by_date:    ### the way to retrieve data
    print ('\n', rows_by_date[r])

The defaultdict with the updated data: 
defaultdict(<class 'list'>, {'07/01/2012': [{'address': '5412 N CLARK', 'date': '07/01/2012'}, {'address': '4801 N BROADWAY', 'date': '07/01/2012'}], '07/04/2012': [{'address': '5148 N CLARK', 'date': '07/04/2012'}, {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}], '07/02/2012': [{'address': '5800 E 58TH', 'date': '07/02/2012'}, {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}, {'address': '1060 W ADDISON', 'date': '07/02/2012'}], '07/03/2012': [{'address': '2122 N CLARK', 'date': '07/03/2012'}]})

 [{'address': '5412 N CLARK', 'date': '07/01/2012'}, {'address': '4801 N BROADWAY', 'date': '07/01/2012'}]

 [{'address': '5148 N CLARK', 'date': '07/04/2012'}, {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}]

 [{'address': '5800 E 58TH', 'date': '07/02/2012'}, {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}, {'address': '1060 W ADDISON', 'date': '07/02/2012'}]

 [{'address': '2122 N CLARK', 'date': '07/03/2012'}]


<hr>
### Filtering sequence elements

#### You have data inside of a sequence, and need to extract values or reduce the sequence using some criteria.

The easiest way to filter sequence data is often to use a list comprehension. One potential downside of a list comprehension is that it can produce a large
result if the original input is large. If this is a concern, you can use a generator expression
to produce the filtered values iteratively.

In [38]:
mylist = [1, 4, -5, 10, -7, 2, 5, -4]

list_com = [i for i in mylist if i > 0]
print ('new list created: ', list_com)

generator_exp = (i for i in mylist if i > 0)
print ('generator expression: ', generator_exp)

print ('Access data from generator object: ')
for x in generator_exp:
    print (x)

new list created:  [1, 4, 10, 2, 5]
generator expression:  <generator object <genexpr> at 0x0000000005560CA8>
Access data from generator object: 
1
4
10
2
5


Sometimes, the filtering criteria cannot be easily expressed in a list comprehension or
generator expression. For example, suppose that the filtering process involves exception
handling or some other complicated detail. For this, put the filtering code into its own
function and use the built-in filter() function.

In [1]:
values = ['1', '2', '-3', '-', '4', 'N/A', '5']

def is_int(val):
    try:
        int(val)          # raises ValueError if val is not an integer string
        return True
    except ValueError:
        return False
    
ivals = list(filter(is_int, values))  ## filter creates an iterator
print (ivals)

['1', '2', '-3', '4', '5']


One variation on filtering involves replacing the values that don’t meet the criteria with
a new value instead of discarding them. For example, perhaps instead of just finding
positive values, you want to also clip bad values to fit within a specified range. This is
often easily accomplished by moving the filter criterion into a conditional expression:

In [4]:
mylist = [1, 4, -5, 10, -7, 2, 5, -4]

replace_neg_with_zero = [n if n > 0 else 0 for n in mylist]
print (replace_neg_with_zero)

replace_pos_with_zero = [n if n < 0 else 0 for n in mylist]
print (replace_pos_with_zero)

[1, 4, 0, 10, 0, 2, 5, 0]
[0, 0, -5, 0, -7, 0, 0, -4]


Another notable filtering tool is `itertools.compress()`, which takes an iterable and
an accompanying Boolean selector sequence as input. As output, it gives you all of the
items in the iterable where the corresponding element in the selector is True. This can
be useful if you’re trying to apply the results of filtering one sequence to another related
sequence.

Suppose you want to make a list of all addresses where the corresponding count value is greater than 5:

In [7]:
from itertools import compress

addresses = [
    '5412 N CLARK',
    '5148 N CLARK',
    '5800 E 58TH',
    '2122 N CLARK',
    '5645 N RAVENSWOOD',
    '1060 W ADDISON',
    '4801 N BROADWAY',
    '1039 W GRANVILLE',
]

counts = [ 0, 3, 10, 4, 1, 7, 6, 1]

more5 = [n > 5 for n in counts]
print (more5)

new_address_list = list(compress(addresses, more5))
print ('\nnew_address_list: ', new_address_list)

[False, False, True, False, False, True, True, False]

new_address_list:  ['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']


The key here is to first create a sequence of Booleans that indicates which elements
satisfy the desired condition. The compress() function then picks out the items corresponding
to True values.

Like filter(), compress() normally returns an iterator. Thus, you need to use list()
to turn the results into a list if desired.

<hr>
### Extracting a Subset of a Dictionary

#### You want to make a dictionary that is a subset of another dictionary.

This is easily accomplished using a dictionary comprehension.

In [11]:
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

p1 = { key:value for key, value in prices.items() if value > 200}
print ('dictionary of all prices over 200: ', p1)

tech_names = { 'AAPL', 'IBM', 'HPQ', 'MSFT' }
p2 = {key: value for key, value in prices.items() if key in tech_names}
print ('Dictionary with keys from a set: ', p2)

dictionary of all prices over 200:  {'AAPL': 612.78, 'IBM': 205.55}
Dictionary with keys from a set:  {'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.2}
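The keys-view set operations from the earlier recipe offer another way to build the second dictionary. In the sketch below, the keys come from a set intersection, so the resulting dictionary's key order is not guaranteed to match `prices`. Only the contents are compared.

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}

# Intersect the keys view with the set, then look up each value
p2 = {key: prices[key] for key in prices.keys() & tech_names}
print(sorted(p2))   # ['AAPL', 'HPQ', 'IBM']
```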


<hr>
### Mapping names to sequence elements

#### You have code that accesses list or tuple elements by position, but this makes the code somewhat difficult to read at times. You’d also like to be less dependent on position in the structure, by accessing the elements by name.

`collections.namedtuple()` provides these benefits while adding minimal overhead
compared with a normal tuple object. `collections.namedtuple()` is actually a factory
function that returns a subclass of the standard Python tuple type. You feed it a type
name and the fields it should have. It returns a class that you can instantiate, passing
in values for the fields you’ve defined.

In [15]:
from collections import namedtuple

Subscriber = namedtuple('Subscriber', ['email', 'joined'])

sub = Subscriber('jon@example.com', '2017-10-10')
print (sub)
print ('Subscriber\'s email: ', sub.email)
print ('Subscriber\'s joining date: ', sub.joined)

Subscriber(email='jon@example.com', joined='2017-10-10')
Subscriber's email:  jon@example.com
Subscriber's joining date:  2017-10-10


A major use case for named tuples is decoupling your code from the position of the
elements it manipulates. So, if you get back a large list of tuples from a database call,
then manipulate them by accessing the positional elements, your code could break if,
say, you added a new column to your table. Not so if you first cast the returned tuples
to namedtuples.

In [50]:
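from collections import namedtuple

# Position-based version: breaks if the column layout changes
def compute_cost(records):
    total = 0.0
    for r in records:
        total += r[1] * r[2]
    return total


### With namedtuple: fields are accessed by name
Stock = namedtuple('Stock', 'name shares price')

def compute_cost_named(records):
    total = 0.0
    for r in records:
        s = Stock(*r)
        total += s.shares * s.price
    return total

records = [('ACME', 100, 123.45), ('IBM', 50, 91.1)]   # sample rows
print (compute_cost(records) == compute_cost_named(records))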

One possible use of a namedtuple is as a replacement for a dictionary, which requires
more space to store. Thus, if you are building large data structures involving dictionaries,
use of a namedtuple will be more efficient. However, be aware that unlike a dictionary,
a namedtuple is immutable.

In [53]:
from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price'])

s = Stock('ACME', 100, 123.45)
s.shares = 75

AttributeError: can't set attribute

If you need to change any of the attributes, it can be done using the _replace() method
of a namedtuple instance, which makes an entirely new namedtuple with specified values
replaced.

In [54]:
from collections import namedtuple

Stock = namedtuple('Stock', 'name shares price')

s = Stock(name='ACME', shares=100, price=123.45)
s = s._replace(shares=75)
print (s)

Stock(name='ACME', shares=75, price=123.45)


A subtle use of the _replace() method is that it can be a convenient way to populate
named tuples that have optional or missing fields. To do this, you make a prototype
tuple containing the default values and then use _replace() to create new instances
with values replaced.

In [39]:
from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'])

# Create a prototype instance
stock_prototype = Stock('', 0, 0.0, None, None)

# Function to convert a dictionary to a Stock
def dict_to_stock(s):
    return stock_prototype._replace(**s)

a = {'name': 'ACME', 'shares': 100, 'price': 123.45}
print (dict_to_stock(a))

b = {'name': 'ACME', 'shares': 100, 'price': 123.45, 'date': '12/17/2012'}
print (dict_to_stock(b))

Stock(name='ACME', shares=100, price=123.45, date=None, time=None)
Stock(name='ACME', shares=100, price=123.45, date='12/17/2012', time=None)


<hr>
### Transforming and Reducing Data at the Same Time

#### You need to execute a reduction function (e.g., sum(), min(), max()), but first need to transform or filter the data.

A very elegant way to combine a data reduction and a transformation is to use a
generator-expression argument.

In [37]:
nums = [1, 2, 3, 4, 5]
s = sum(x**2 for x in nums)
print (s)


import os
files = os.listdir('C:/Pharji_projects')   # machine-specific path; point it at a directory on your system
if any(name.endswith('.py') for name in files):
    print('python')
else:
    print ('No python')

from operator import itemgetter

portfolio = [
    {'name':'GOOG', 'shares': 50},
    {'name':'YHOO', 'shares': 75},
    {'name':'AOL', 'shares': 20},
    {'name':'SCOX', 'shares': 65}
]
min_shares = min(s['shares'] for s in portfolio)
print (min_shares)

min_shares = min(portfolio, key=itemgetter('shares'))  ## alternative
print (min_shares)

55
No python
20
{'name': 'AOL', 'shares': 20}
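When a generator expression is the only argument, its own parentheses can be dropped, as in the sum() call above. A list comprehension gives the same answer but builds a temporary list first. With extra arguments, the generator needs its own parentheses. A sketch:

```python
nums = [1, 2, 3, 4, 5]

# Sole argument: the generator's own parentheses can be omitted
s1 = sum(x * x for x in nums)

# Equivalent, but builds a temporary list first
s2 = sum([x * x for x in nums])

# With extra arguments the generator needs its own parentheses;
# sum(x * x for x in nums, 100) would be a SyntaxError
s3 = sum((x * x for x in nums), 100)

print(s1, s2, s3)   # 55 55 155
```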


<hr>
### Combining Multiple mappings into single mapping

#### You have multiple dictionaries or mappings that you want to logically combine into a single mapping to perform certain operations, such as looking up values or checking for the existence of keys.

An easy way to do this is to use the `ChainMap` class from the collections module.

In [16]:
from collections import ChainMap

a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }

c = ChainMap(a, b)
print(c['x'])   # Outputs 1 (from a)
print(c['y'])   # Outputs 2 (from b)
print(c['z'])   # Outputs 3 (from a)



1
2
3
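ChainMap keeps the original mappings in a list (`c.maps`) instead of merging them. Lookups search the mappings in order, but writes and deletions only affect the first mapping. The sketch below uses fresh copies of `a` and `b`:

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
c = ChainMap(a, b)

# Writes and deletions go to the first mapping only
c['z'] = 10        # updates a, leaves b['z'] alone
c['w'] = 40        # new key lands in a
del c['x']         # deleting a key that is only in b would raise KeyError
print(a)           # {'z': 10, 'w': 40}
print(b)           # {'y': 2, 'z': 4}

# Lookups still fall through to later mappings
print(c['y'], len(c), sorted(c))   # 2 3 ['w', 'y', 'z']
print(c.maps[0] is a)              # True
```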


In [18]:
values = ChainMap()
values['x'] = 1
# Add a new mapping
values = values.new_child()
values['x'] = 2
# Add a new mapping
values = values.new_child()
values['x'] = 3
print ('ChainMap with 3 mappings: ', values)

print (values['x'])

# Discard last mapping
values = values.parents
print (values['x'])

# Discard last mapping
values = values.parents
print (values['x'])

print (values)

ChainMap with 3 mappings:  ChainMap({'x': 3}, {'x': 2}, {'x': 1})
3
2
1
ChainMap({'x': 1})


As an alternative to ChainMap, you might consider merging dictionaries together using the update() method.

In [20]:
a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }

merged = dict(b)
merged.update(a)

print (merged)

print (merged['x'])
print (merged['y'])
print (merged['z'])

{'y': 2, 'z': 3, 'x': 1}
1
2
3


This works, but it requires you to make a completely separate dictionary object (or
destructively alter one of the existing dictionaries). Also, if any of the original dictionaries
mutate, the changes don’t get reflected in the merged dictionary.

In [26]:
a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }

merged = dict(b)
merged.update(a)

a['x'] = 13
print (merged['x'])   ### Value did not change


a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }
merged = ChainMap(a, b)
print (merged['x'])
a['x'] = 42
print (merged['x'])   ### Value is changed 

1
1
42
