# Lists and Dictionaries

### Item 11: Know How to Slice Sequences

✦ Avoid being verbose when slicing: Don’t supply 0 for the start index or the length of the sequence for the end index.

✦ Slicing is forgiving of start or end indexes that are out of bounds, which means it’s easy to express slices on the front or back boundaries of a sequence (like a[:20] or a[-20:]).

✦ Assigning to a list slice replaces that range in the original sequence with what’s referenced even if the lengths are different.

✦ Slicing can be extended to any Python class that implements the __getitem__ and __setitem__ special methods.
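
As a minimal sketch of that last point, the hypothetical Playlist class below (not part of the original notes) supports slice reads and slice assignment by forwarding to an internal list. When a subscript uses colon syntax, Python passes a slice object to these special methods.

```python
class Playlist:
    """Hypothetical container used only to illustrate slicing support."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __getitem__(self, index):
        # For p[1:3], index is slice(1, 3, None); for p[0], it is the int 0.
        if isinstance(index, slice):
            return Playlist(self._songs[index])
        return self._songs[index]

    def __setitem__(self, index, value):
        # Delegating to the list handles both ints and slice objects.
        self._songs[index] = value

    def __len__(self):
        return len(self._songs)

    def __repr__(self):
        return f'Playlist({self._songs!r})'


p = Playlist(['a', 'b', 'c', 'd'])
first_two = p[:2]   # Slice read returns a new Playlist
p[1:3] = ['x']      # Slice assignment may change the length
print(first_two)
print(p)
```

Returning a new Playlist from a slice read mirrors how slicing a list returns a new list; that is a design choice of this sketch, not a requirement.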

The basic form of the slicing syntax is somelist[start:end], where start is inclusive and end is exclusive:

In [1]:
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print('Middle two:  ', a[3:5])
print('All but ends:', a[1:7])

Middle two:   ['d', 'e']
All but ends: ['b', 'c', 'd', 'e', 'f', 'g']


When slicing from the start of a list, you should leave out the zero index to reduce visual noise:


In [None]:
assert a[:5] == a[0:5]

When slicing to the end of a list, you should leave out the final index because it’s redundant:


In [None]:
assert a[5:] == a[5:len(a)]

Negative numbers are useful for slicing at offsets relative to the end of a list:

In [None]:
print(a[-4:])
print(a[-4:-1])

Slicing deals properly with start and end indexes that are beyond the boundaries of a list by silently omitting missing items.

In [None]:
print(a[:30])
print(a[-20:])
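
For contrast, here is a small sketch showing that direct indexing is not forgiving in the same way. The variable names are only for illustration.

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Slices are clamped to the bounds of the list.
first_twenty = a[:20]
last_twenty = a[-20:]

# Direct indexing with the same out-of-range offset raises IndexError.
try:
    a[20]
except IndexError as e:
    index_error = str(e)
else:
    index_error = None

print(first_twenty == a, last_twenty == a)
print('Indexing failed with:', index_error)
```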

The result of slicing a list is a whole new list. References to the objects from the original list are maintained. Modifying the result of slicing won’t affect the original list:

In [None]:
b = a[3:]
print('Before:   ', b)
b[1] = 99
print('After:    ', b)
print('No change:', a)

When used in assignments, slices replace the specified range in the original list.

In [None]:
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print('Before ', a)
a[2:7] = [99,  14]
print('After  ', a)


print('Before ', a)
a[2:3] = [47, 11]
print('After  ', a)

If you leave out both the start and the end indexes when slicing, you end up with a copy of the original list:

In [None]:
b = a[:]
assert b == a and b is not a
if(b == a and b is not a):
    print('hi')

If you assign to a slice with no start or end indexes, you replace the entire contents of the list with a copy of what’s referenced (instead of allocating a new list):

In [None]:
b=a
print('Before a', a)
print('Before b', b) 
a[:] = [101, 102, 103] 
assert a is b
print('After a ', a)
print('After b ', b)

### Item 12: Avoid Striding and Slicing in a Single Expression

✦ Specifying start, end, and stride in a slice can be extremely confusing.

✦ Prefer using positive stride values in slices without start or end indexes. Avoid negative stride values if possible.

✦ Avoid using start, end, and stride together in a single slice. If you need all three parameters, consider doing two assignments (one to stride and another to slice) or using islice from the itertools built-in module.

Python has special syntax for the stride of a slice in the form somelist[start:end:stride]. This lets you take every nth item when slicing a sequence.

In [None]:
x = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']
odds = x[::2]
evens = x[1::2]
print(odds)
print(evens)

The point is that the stride part of the slicing syntax can be extremely confusing. Having three numbers within the brackets is hard enough to read because of its density. 
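
A short sketch of why this gets confusing, especially with negative strides. Each expression is valid, but it takes effort to predict the result from the brackets alone.

```python
x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

reversed_x = x[::-1]       # Whole list, back to front
every_other = x[::-2]      # Every second item, starting from the end
from_two_back = x[2::-2]   # Start at index 2 and walk backward
middle_back = x[-2:2:-2]   # Start at index 6, stop before index 2

print(reversed_x)
print(every_other)
print(from_two_back)
print(middle_back)
```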

If you must use a stride with start or end indexes, consider using one assignment for striding and another for slicing:

In [None]:
x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y = x[::2]   # ['a', 'c', 'e', 'g']
z = y[1:-1]  # ['c', 'e']
print(z)

Striding and then slicing creates an extra shallow copy of the data. The first operation should try to reduce the size of the resulting slice by as much as possible. If your program can’t afford the time or memory required for two steps, consider using the islice function from the itertools built-in module (see Item 36: “Consider itertools for Working with Iterators and Generators”), which is clearer to read and doesn’t permit negative values for start, end, or stride.
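
A minimal sketch of the islice alternative, assuming the same list as the two-step example. The start, stop, and step values here were chosen by hand to match that result.

```python
from itertools import islice

x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Two-step version: stride first, then slice.
y = x[::2]
z = y[1:-1]

# islice(iterable, start, stop, step) yields items lazily and
# never builds the intermediate list.
z_lazy = list(islice(x, 2, 6, 2))

# Negative arguments are rejected instead of being interpreted.
try:
    islice(x, -2, None)
    negative_rejected = False
except ValueError:
    negative_rejected = True

print(z, z_lazy, negative_rejected)
```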

### Item 13: Prefer Catch-All Unpacking Over Slicing

✦ Unpacking assignments may use a starred expression to catch all values that weren’t assigned to the other parts of the unpacking pattern into a list.

✦ Starred expressions may appear in any position, and they will always become a list containing the zero or more values they receive.

✦ When dividing a list into non-overlapping pieces, catch-all unpacking is much less error prone than slicing and indexing.

Python supports catch-all unpacking through a starred expression. This syntax allows one part of the unpacking assignment to receive all values that didn’t match any other part of the unpacking pattern. 

In [None]:
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
oldest, second_oldest, *others = car_ages_descending
print(oldest, second_oldest, others)


A starred expression may appear in any position, so you can get the benefits of catch-all unpacking anytime you need to extract one slice:

In [None]:
oldest, *others, youngest = car_ages_descending
print(oldest, youngest, others)
*others, second_youngest, youngest = car_ages_descending
print(youngest, second_youngest, others)

However, to unpack assignments that contain a starred expression, you must have at least one required part, or else you’ll get a SyntaxError. You can’t use a catch-all expression on its own:

In [None]:
# *others = car_ages_descending


You also can’t use multiple catch-all expressions in a single-level unpacking pattern:

In [None]:
# first, *middle, *second_middle, last = [1, 2, 3, 4]

If there are no leftover items from the sequence being unpacked, the catch-all part will be an empty list. This is especially useful when you’re processing a sequence that you know in advance has at least N elements:

In [None]:
short_list = [1, 2]
first, second, *rest = short_list
print(first, second, rest)

Catch-all unpacking also works with arbitrary iterators. Processing the results of the generator below using indexes and slices is fine, but it requires multiple lines and is visually noisy:

In [None]:
def generate_csv():
    yield ('Date', 'Make', 'Model', 'Year', 'Price')
    ...
all_csv_rows = list(generate_csv())
print(all_csv_rows)
header = all_csv_rows[0]
rows = all_csv_rows[1:]
print('CSV Header:', header)
print('Row count: ', len(rows))

Unpacking with a starred expression makes it easy to process the first row—the header—separately from the rest of the iterator’s contents. This is much clearer:

In [None]:
it = generate_csv()
header, *rows = it
print('CSV Header:', header)
print('Row count: ', len(rows))

Keep in mind, however, that because a starred expression is always turned into a list, unpacking an iterator also risks the potential of using up all of the memory on your computer and causing your program to crash. So you should only use catch-all unpacking on iterators when you have good reason to believe that the result data will all fit in memory.
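
When the data might not fit in memory, one option is to take the header with next and then loop over the remaining rows one at a time. This is a sketch only; the rows yielded by this version of generate_csv are made-up placeholder data.

```python
def generate_csv():
    # Hypothetical stand-in for a generator that could yield many rows.
    yield ('Date', 'Make', 'Model', 'Year', 'Price')
    for i in range(100):
        yield ('2000-01-01', 'make', 'model', '2000', str(1000 + i))

it = generate_csv()
header = next(it)   # Consume only the first row
row_count = 0
for row in it:      # Remaining rows are handled one at a time
    row_count += 1

print('CSV Header:', header)
print('Row count: ', row_count)
```

This keeps only one row in memory at a time, at the cost of being a little longer than the starred unpacking.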

### Item 14: Sort by Complex Criteria Using the key Parameter


✦ The sort method of the list type can be used to rearrange a list’s contents by the natural ordering of built-in types like strings, integers, tuples, and so on.

✦ The sort method doesn’t work for objects unless they define a natural ordering using special methods, which is uncommon.

✦ The key parameter of the sort method can be used to supply a helper function that returns the value to use for sorting in place of each item from the list.

✦ Returning a tuple from the key function allows you to combine multiple sorting criteria together. The unary minus operator can be used to reverse individual sort orders for types that allow it.

✦ For types that can’t be negated, you can combine many sorting criteria together by calling the sort method multiple times using different key functions and reverse values, in the order of lowest rank sort call to highest rank sort call.

By default, sort will order a list’s contents by the natural ascending order of the items.

In [None]:
numbers = [93, 86, 11, 68, 70]
numbers.sort()
print(numbers)

Sorting objects of a custom class such as Tool, defined below, doesn’t work because the sort method tries to call comparison special methods that aren’t defined by the class:

In [None]:
class Tool:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
    def __repr__(self):
        return f'Tool({self.name!r}, {self.weight})'
    
tools = [
    Tool('level', 3.5),
    Tool('hammer', 1.25),
    Tool('screwdriver', 0.5),
    Tool('chisel', 0.25),
]

tools.sort()  # Raises TypeError: '<' not supported between instances of 'Tool' and 'Tool'
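
For contrast, here is a sketch of a class that does define a natural ordering. WeightedTool is a hypothetical variant introduced only for this illustration; it orders instances by weight through the __lt__ special method.

```python
class WeightedTool:
    """Hypothetical variant of Tool that defines a natural ordering."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    def __lt__(self, other):
        # list.sort only needs the < comparison between items.
        return self.weight < other.weight

    def __repr__(self):
        return f'WeightedTool({self.name!r}, {self.weight})'


weighted = [
    WeightedTool('level', 3.5),
    WeightedTool('hammer', 1.25),
    WeightedTool('screwdriver', 0.5),
    WeightedTool('chisel', 0.25),
]
weighted.sort()  # No key needed because __lt__ is defined
print(weighted)
```

Baking one ordering into the class is rarely what you want when objects need to be sorted in several different ways.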

Often there’s an attribute on the object that you’d like to use for sorting. To support this use case, the sort method accepts a key parameter that’s expected to be a function. The key function is passed a single argument, which is an item from the list that is being sorted. The return value of the key function should be a comparable value (i.e., with a natural ordering) to use in place of an item for sorting purposes.



Within the lambda function passed as the key parameter you can access attributes of items as I’ve done here, index into items (for sequences, tuples, and dictionaries), or use any other valid expression.

In [None]:
tools.sort(key=lambda x: x.weight)
print('By weight:', tools)

In [None]:
places = ['home', 'work', 'New York', 'Paris']
places.sort()
print('Case sensitive:  ', places)
places.sort(key=lambda x: x.lower())
print('Case insensitive:', places)

In [None]:
def test(input):
    return input[1]

a = ["abaa","bab","dc"]
a.sort(key=test)
print(a)


Sometimes you may need to use multiple criteria for sorting. For example, say that I have a list of power tools and I want to sort them first by weight and then by name. How can I accomplish this?

In [None]:

power_tools = [
    Tool('drill', 4),
    Tool('circular saw', 5),
    Tool('jackhammer', 40),
    Tool('sander', 4),
]

The simplest solution in Python is to use the tuple type. Tuples are immutable sequences of arbitrary Python values. Tuples are comparable by default and have a natural ordering, meaning that they implement all of the special methods, such as __lt__, that are required by the sort method. Tuples implement these special method comparators by iterating over each position in the tuple and comparing the corresponding values one index at a time. Here, I show how this works when one tool is heavier than another:

In [None]:
saw = (5, 'circular saw')
jackhammer = (40, 'jackhammer')
assert not (jackhammer < saw)  # Matches expectations

If the first positions of the tuples being compared are equal (weight in this case), then the tuple comparison moves on to the second position, and so on:

In [None]:
drill = (4, 'drill')
sander = (4, 'sander')
assert drill[0] == sander[0]  # Same weight
assert drill[1] < sander[1]   # Alphabetically less
assert drill < sander         # Thus, drill comes first

In [None]:

power_tools.sort(key=lambda x: (x.weight, x.name), reverse=True)
print(power_tools)

For numerical values it’s possible to mix sorting directions by using the unary minus operator in the key function. This negates one of the values in the returned tuple, effectively reversing its sort order while leaving the others intact. Here, I use this approach to sort by weight descending, and then by name ascending (note how 'sander' now comes after 'drill' instead of before):

In [None]:
power_tools = [
    Tool('drill', 4),
    Tool('circular saw', 5),
    Tool('jackhammer', 40),
    Tool('sander', 4),
]
power_tools.sort(key=lambda x: (-x.weight, x.name))
print(power_tools)



Unfortunately, unary negation isn’t possible for all types. Here, I try to achieve the same outcome by using the reverse argument to sort by weight descending and then negating name to put it in ascending order:

In [None]:
power_tools = [
    Tool('drill', 4),
    Tool('circular saw', 5),
    Tool('jackhammer', 40),
    Tool('sander', 4),
]
# Raises TypeError: bad operand type for unary -: 'str'
power_tools.sort(key=lambda x: (x.weight, -x.name), reverse=True)



For situations like this, Python provides a stable sorting algorithm. The sort method of the list type will preserve the order of the input list when the key function returns values that are equal to each other. This means that I can call sort multiple times on the same list to combine different criteria together.

In [None]:
power_tools = [
    Tool('drill', 4),
    Tool('circular saw', 5),
    Tool('jackhammer', 40),
    Tool('sander', 4),
]
power_tools.sort(key=lambda x: x.name)
print(power_tools)
power_tools.sort(key=lambda x: x.weight,
                 reverse=True)
print(power_tools)

In [None]:
power_tools = [
    Tool('drill', 4),
    Tool('circular saw', 5),
    Tool('jackhammer', 40),
    Tool('sander', 4),
]
power_tools.sort(key=lambda x: x.weight,
                 reverse=True)
print(power_tools)




This same approach can be used to combine as many different types of sorting criteria as you’d like in any direction, respectively. You just need to make sure that you execute the sorts in the opposite sequence of what you want the final list to contain. In this example, I wanted the sort order to be by weight descending and then by name ascending, so I had to do the name sort first, followed by the weight sort.

That said, the approach of having the key function return a tuple, and using unary negation to mix sort orders, is simpler to read and requires less code. I recommend only using multiple calls to sort if it’s absolutely necessary.
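
Here is a sketch of a case where multiple calls are needed: weight ascending, then name descending. A name is a string and can’t be negated, so the lower-priority name sort runs first and the stable weight sort runs last. The Tool class is repeated so the block runs on its own.

```python
class Tool:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    def __repr__(self):
        return f'Tool({self.name!r}, {self.weight})'


power_tools = [
    Tool('drill', 4),
    Tool('circular saw', 5),
    Tool('jackhammer', 40),
    Tool('sander', 4),
]

# Lowest-priority criterion first: name, descending.
power_tools.sort(key=lambda x: x.name, reverse=True)
# Highest-priority criterion last: weight, ascending.
# Stability keeps 'sander' ahead of 'drill' among equal weights.
power_tools.sort(key=lambda x: x.weight)
print(power_tools)
```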

### Item 15: Be Cautious When Relying on dict Insertion Ordering

✦ Since Python 3.7, you can rely on the fact that iterating a dict instance’s contents will occur in the same order in which the keys were initially added.

✦ Python makes it easy to define objects that act like dictionaries but that aren’t dict instances. For these types, you can’t assume that insertion ordering will be preserved.

✦ There are three ways to be careful about dictionary-like classes: Write code that doesn’t rely on insertion ordering, explicitly check for the dict type at runtime, or require dict values using type annotations and static analysis.


Starting with Python 3.6, and officially part of the Python specification in version 3.7, dictionaries will preserve insertion order. Now, this code will always print the dictionary in the same way it was originally created by the programmer:

In [None]:
baby_names = {
    'cat': 'kitten',
    'dog': 'puppy',
}
print(baby_names)
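
The same ordering shows up in the other dict methods. A brief sketch, assuming Python 3.7 or later; the extra 'cow' entry is added only for illustration.

```python
baby_names = {
    'cat': 'kitten',
    'dog': 'puppy',
}
baby_names['cow'] = 'calf'  # New keys go at the end

keys = list(baby_names.keys())
values = list(baby_names.values())
last_item = baby_names.popitem()  # Removes the most recently inserted pair

print(keys)
print(values)
print(last_item)
```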

Instance fields behave the same way: you can now assume that the order in which they are assigned will be reflected in __dict__:

In [None]:
class MyClass:
    def __init__(self):
        self.alligator = 'hatchling'
        self.elephant = 'calf'
a = MyClass()
for key, value in a.__dict__.items():
    print(f'{key} = {value}')



However, you shouldn’t always assume that insertion ordering behavior will be present when you’re handling dictionaries. Python makes it easy for programmers to define their own custom container types that emulate the standard protocols matching list, dict, and other types (see Item 43: “Inherit from collections.abc for Custom Container Types”). Python is not statically typed, so most code relies on duck typing (where an object’s behavior is its de facto type) instead of rigid class hierarchies. This can result in surprising gotchas.

In [None]:
votes = {
    'otter': 1281,
    'polar bear': 587,
    'fox': 863,
}

def populate_ranks(votes, ranks):
    names = list(votes.keys())
    names.sort(key=votes.get, reverse=True)
    for i, name in enumerate(names, 1):
        ranks[name] = i

def get_winner(ranks):
    return next(iter(ranks))

ranks = {}
populate_ranks(votes, ranks)
print(ranks)
winner = get_winner(ranks)
print(winner)


Now, imagine that the requirements of this program have changed. The UI element that shows the results should be in alphabetical order instead of rank order. To accomplish this, I can use the collections.abc built-in module to define a new dictionary-like class that iterates its contents in alphabetical order:

In [None]:
from collections.abc import MutableMapping
class SortedDict(MutableMapping):
    def __init__(self):
        self.data = {}
    def __getitem__(self, key):
        return self.data[key]
    def __setitem__(self, key, value):
        self.data[key] = value
    def __delitem__(self, key):
        del self.data[key]
    def __iter__(self):
        keys = list(self.data.keys())
        keys.sort()
        for key in keys:
            yield key
    def __len__(self):
        return len(self.data)

I can use a SortedDict instance in place of a standard dict with the functions from before and no errors will be raised since this class conforms to the protocol of a standard dictionary. However, the result is incorrect:

In [None]:
sorted_ranks = SortedDict()
populate_ranks(votes, sorted_ranks)
print(sorted_ranks.data)
winner = get_winner(sorted_ranks)

# The result is not what we expected
print(winner)

The problem here is that the implementation of get_winner assumes that the dictionary’s iteration is in insertion order to match populate_ranks. This code is using SortedDict instead of dict, so that assumption is no longer true. Thus, the value returned for the winner is 'fox', which is alphabetically first.

There are three ways to mitigate this problem. First, I can reimplement the get_winner function to no longer assume that the ranks dictionary has a specific iteration order. This is the most conservative and robust solution:

In [None]:
def get_winner(ranks):
    for name, rank in ranks.items():
        if rank == 1:
            return name
winner = get_winner(sorted_ranks)
print(winner)

The second approach is to add an explicit check to the top of the function to ensure that the type of ranks matches my expectations, and to raise an exception if not. This solution likely has better runtime performance than the more conservative approach:

In [None]:

def get_winner(ranks):
    if not isinstance(ranks, dict):
        raise TypeError('must provide a dict instance')
    return next(iter(ranks))
get_winner(sorted_ranks)

The third alternative is to use type annotations to enforce that the value passed to get_winner is a dict instance and not a MutableMapping with dictionary-like behavior (see Item 90: “Consider Static Analysis via typing to Obviate Bugs”). Here, I run the mypy tool in strict mode on an annotated version of the code above:

In [None]:
from typing import Dict, MutableMapping
def populate_ranks(votes: Dict[str, int],
                   ranks: Dict[str, int]) -> None:
    names = list(votes.keys())
    names.sort(key=votes.get, reverse=True)
    for i, name in enumerate(names, 1):
        ranks[name] = i
def get_winner(ranks: Dict[str, int]) -> str:
    return next(iter(ranks))
class SortedDict(MutableMapping[str, int]):
    ...
votes = {
    'otter': 1281,
    'polar bear': 587,
    'fox': 863,
}
sorted_ranks = SortedDict()
populate_ranks(votes, sorted_ranks)
print(sorted_ranks.data)
winner = get_winner(sorted_ranks)
print(winner)

### Item 16: Prefer get Over in and KeyError to Handle Missing Dictionary Keys


✦ There are four common ways to detect and handle missing keys in dictionaries: using in expressions, KeyError exceptions, the get method, and the setdefault method.

✦ The get method is best for dictionaries that contain basic types like counters, and it is preferable along with assignment expressions when creating dictionary values has a high cost or may raise exceptions.

✦ When the setdefault method of dict seems like the best fit for your problem, you should consider using defaultdict instead.

In [None]:
counters = {
    'pumpernickel': 2,
    'sourdough': 1,
}

To increment the counter for a new vote, I need to see if the key exists, insert the key with a default counter value of zero if it’s missing, and then increment the counter’s value. This requires accessing the key two times and assigning it once. Here, I accomplish this task using an if statement with an in expression that returns True when the key is present:

In [None]:
key = 'wheat'

In [None]:
if key in counters:
    count = counters[key]
else:
    count = 0
counters[key] = count + 1

Another way to accomplish the same behavior is by relying on how dictionaries raise a KeyError exception when you try to get the value for a key that doesn’t exist. This approach is more efficient because it requires only one access and one assignment:

In [None]:
try:
    count = counters[key]
except KeyError:
    count = 0
counters[key] = count + 1

This flow of fetching a key that exists or returning a default value is so common that the dict built-in type provides the get method to accomplish this task. The second parameter to get is the default value to return in the case that the key—the first parameter—isn’t present. This also requires only one access and one assignment, but it’s much shorter than the KeyError example:

In [None]:
count = counters.get(key, 0)
counters[key] = count + 1

What if the values of the dictionary are a more complex type, like a list? For example, say that instead of only counting votes, I also want to know who voted for each type of bread. Here, I do this by associating a list of names with each key:

In [25]:
votes = {
    'baguette': ['Bob', 'Alice'],
    'ciabatta': ['Coco', 'Deb'],
}
key = 'brioche'
who = 'Elmer'
if key in votes:
    names = votes[key]
else:
    votes[key] = names = []
names.append(who)
print(votes)


{'baguette': ['Bob', 'Alice'], 'ciabatta': ['Coco', 'Deb'], 'brioche': ['Elmer']}


The triple assignment statement (votes[key] = names = []) populates the key in one line instead of two. Once the default value has been inserted into the dictionary, I don’t need to assign it again because the list is modified by reference in the later call to append.

Similarly, you can use the get method to fetch a list value when the key is present, or do one fetch and one assignment if the key isn’t present:


In [26]:
names = votes.get(key)
if names is None:
    votes[key] = names = []

The approach that involves using get to fetch list values can further be shortened by one line if you use an assignment expression (introduced in Python 3.8; see Item 10: “Prevent Repetition with Assignment Expressions”) in the if statement, which improves readability:

In [35]:
votes = {
    'baguette': ['Bob', 'Alice'],
    'ciabatta': ['Coco', 'Deb'],
}
key = 'brioche'
who = 'Elmer'
if (names := votes.get(key)) is None:
    votes[key] = names = []
names.append(who)
print(votes)

{'baguette': ['Bob', 'Alice'], 'ciabatta': ['Coco', 'Deb'], 'brioche': ['Elmer']}


The dict type also provides the setdefault method to help shorten this pattern even further. setdefault tries to fetch the value of a key in the dictionary. If the key isn’t present, the method assigns that key to the default value provided. And then the method returns the value for that key: either the originally present value or the newly inserted default value. Here, I use setdefault to implement the same logic as in the get example above:

In [28]:
names = votes.setdefault(key, [])
names.append(who)


There’s also one important gotcha: The default value passed to setdefault is assigned directly into the dictionary when the key is missing instead of being copied. Here, I demonstrate the effect of this when the value is a list:

In [30]:
data = {}
key = 'foo'
value = []
data.setdefault(key, value)
print('Before:', data)
value.append('hello')
print('After: ', data)

Before: {'foo': []}
After:  {'foo': ['hello']}


This example shows that Python variables are references to objects. Appending to a list mutates the object in place, so its identity (the value returned by id, which in CPython is its memory address) stays the same and the change is visible through the dictionary. Assigning a new list to value rebinds the name to a different object with a different identity, so later changes to it no longer affect the dictionary:

In [50]:
data = {}
key = 'foo'
value = []
print(id(value))
data.setdefault(key, value)
print('Before:', data)
value.append('hello')
print('After: ', data)
print(id(value))

# key = 'ddd'
value = []
print(id(value))
# data.setdefault(key, value)
value.append('ccccc')
print('Finally: ', data)



140368831295040
Before: {'foo': []}
After:  {'foo': ['hello']}
140368831295040
140368817745344
Finally:  {'foo': ['hello']}
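
One way to avoid the shared reference is to hand setdefault a fresh object. The sketch below stores the same list under one key and a shallow copy under another; the key names are only for illustration.

```python
data = {}
value = []

data.setdefault('shared', value)          # Stores the very same object
data.setdefault('separate', list(value))  # Stores a shallow copy

value.append('hello')

print(data)
print(data['shared'] is value, data['separate'] is value)
```

Passing a literal such as [] directly in the call has the same effect as the copy, because no other name refers to the new list.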


By contrast, the get-based approach constructs a new list for each missing key, so the values stored under different keys are independent:

In [34]:
votes = {
    'baguette': ['Bob', 'Alice'],
    'ciabatta': ['Coco', 'Deb'],
}
key = 'brioche'
who = 'Elmer'
if (names := votes.get(key)) is None:
    votes[key] = names = []
names.append(who)
key = 'dddd'
who = 'c'
if (names := votes.get(key)) is None:
    votes[key] = names = []
names.append(who)
print(votes)

{'baguette': ['Bob', 'Alice'], 'ciabatta': ['Coco', 'Deb'], 'brioche': ['Elmer'], 'dddd': ['c']}


Going back to the earlier example that used counters for dictionary values instead of lists of who voted: Why not also use the setdefault method in that case? Here, I reimplement the same example using this approach:

In [51]:
count = counters.setdefault(key, 0)
counters[key] = count + 1

The problem here is that the call to setdefault is superfluous. You always need to assign the key in the dictionary to a new value after you increment the counter, so the extra assignment done by setdefault is unnecessary. The earlier approach of using get for counter updates requires only one access and one assignment, whereas using setdefault requires one access and two assignments.
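
For dictionaries of counters specifically, the Counter class from the collections built-in module is also worth considering. A brief sketch using the same bread names; 'multigrain' is a made-up key used to show a read of a missing entry.

```python
from collections import Counter

counters = Counter({'pumpernickel': 2, 'sourdough': 1})

counters['wheat'] += 1      # Missing keys read as 0, so no setup is needed
counters['sourdough'] += 1

missing = counters['multigrain']  # Reading a missing key returns 0

print(counters)
print(missing, 'multigrain' in counters)
```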

### Item 17: Prefer defaultdict Over setdefault to Handle Missing Items in Internal State


✦ If you’re creating a dictionary to manage an arbitrary set of potential keys, then you should prefer using a defaultdict instance from the collections built-in module if it suits your problem.

✦ If a dictionary of arbitrary keys is passed to you, and you don’t control its creation, then you should prefer the get method to access its items. However, it’s worth considering using the setdefault method for the few situations in which it leads to shorter code.

When working with a dictionary that you didn’t create, there are a variety of ways to handle missing keys (see Item 16: “Prefer get Over in and KeyError to Handle Missing Dictionary Keys”). Although using the get method is a better approach than using in expressions and KeyError exceptions, for some use cases setdefault appears to be the shortest option.


For example, say that I want to keep track of the cities I’ve visited in countries around the world. Here, I do this by using a dictionary that maps country names to a set instance containing corresponding city names:

In [1]:
visits = {
    'Mexico': {'Tulum', 'Puerto Vallarta'},
    'Japan': {'Hakone'},
}

I can use the setdefault method to add new cities to the sets, whether the country name is already present in the dictionary or not. This approach is much shorter than achieving the same behavior with the get method and an assignment expression (which is available as of Python 3.8):


In [2]:

visits.setdefault('France', set()).add('Arles')  # Short
if (japan := visits.get('Japan')) is None: # Long
    visits['Japan'] = japan = set()
japan.add('Kyoto')
print(visits)

{'Mexico': {'Puerto Vallarta', 'Tulum'}, 'Japan': {'Kyoto', 'Hakone'}, 'France': {'Arles'}}


What about the situation when you do control creation of the dictionary being accessed? This is generally the case when you’re using a dictionary instance to keep track of the internal state of a class, for example. Here, I wrap the example above in a class with helper methods to access the dynamic inner state stored in a dictionary:

In [3]:
class Visits:
    def __init__(self):
        self.data = {}
    def add(self, country, city):
        city_set = self.data.setdefault(country, set())
        city_set.add(city)

In [4]:
visits = Visits()
visits.add('Russia', 'Yekaterinburg')
visits.add('Tanzania', 'Zanzibar')
print(visits.data)

{'Russia': {'Yekaterinburg'}, 'Tanzania': {'Zanzibar'}}


However, the implementation of the Visits.add method still isn’t ideal. The setdefault method is still confusingly named, which makes it more difficult for a new reader of the code to immediately understand what’s happening. And the implementation isn’t efficient because it constructs a new set instance on every call, regardless of whether the given country was already present in the data dictionary.

Luckily, the defaultdict class from the collections built-in module simplifies this common use case by automatically storing a default value when a key doesn’t exist. All you have to do is provide a function that will return the default value to use each time a key is missing (an example of Item 38: “Accept Functions Instead of Classes for Simple Interfaces”). Here, I rewrite the Visits class to use defaultdict:

In [11]:
from collections import defaultdict
class Visits:
    def __init__(self):
        self.data = defaultdict(set)
    def add(self, country, city):
        self.data[country].add(city)
visits = Visits()
visits.add('England', 'Bath')
visits.add('England', 'London')
visits.add('Australia', 'Sydney')
print(visits.data)
for i in visits.data.items():
    print(i)

defaultdict(<class 'set'>, {'England': {'London', 'Bath'}, 'Australia': {'Sydney'}})
('England', {'London', 'Bath'})
('Australia', {'Sydney'})
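
One behavior worth keeping in mind is that a subscript read on a missing key stores the default, while in and get do not. A small sketch, with 'France' as an arbitrary missing key:

```python
from collections import defaultdict

visits = defaultdict(set)
visits['England'].add('Bath')

has_france = 'France' in visits     # Membership test: no insertion
france_get = visits.get('France')   # get returns None, no insertion
keys_before = list(visits)

france_read = visits['France']      # Subscript read inserts set()
keys_after = list(visits)

print(has_france, france_get, keys_before)
print(france_read, keys_after)
```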


Another common use of defaultdict is counting, with int as the default factory:

In [2]:
from collections import defaultdict

# Using int as a default value factory. For non-existent keys, int() will return 0
count = defaultdict(int)
# A simple list
my_list = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Counting occurrences of each element
for item in my_list:
    count[item] += 1

print(count)  # Output: defaultdict(<class 'int'>, {'apple': 3, 'banana': 2, 'orange': 1})


defaultdict(<class 'int'>, {'apple': 3, 'banana': 2, 'orange': 1})

### Item 18: Know How to Construct Key-Dependent Default Values with __missing__



✦ The setdefault method of dict is a bad fit when creating the default value has high computational cost or may raise exceptions.

✦ The function passed to defaultdict must not require any arguments, which makes it impossible to have the default value depend on the key being accessed.

✦ You can define your own dict subclass with a __missing__ method in order to construct default values that must know which key was being accessed.

The built-in dict type’s setdefault method results in shorter code when handling missing keys in some specific circumstances (see Item 16: “Prefer get Over in and KeyError to Handle Missing Dictionary Keys” for examples). For many of those situations, the better tool for the job is the defaultdict type from the collections built-in module (see Item 17: “Prefer defaultdict Over setdefault to Handle Missing Items in Internal State” for why). However, there are times when neither setdefault nor defaultdict is the right fit.

In [3]:
pictures = {}
path = 'profile_1234.png'
if (handle := pictures.get(path)) is None:
    try:
        handle = open(path, 'a+b')
    except OSError:
        print(f'Failed to open path {path}')
        raise 
    else:
        pictures[path] = handle
handle.seek(0)
image_data = handle.read()
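
To see why defaultdict doesn’t fit here, consider a sketch with a key-dependent default. The make_label function is a hypothetical stand-in for opening a file; defaultdict calls its factory with no arguments, so the lookup fails.

```python
from collections import defaultdict

def make_label(key):
    # Hypothetical key-dependent default; stands in for opening a file.
    return f'label for {key}'

labels = defaultdict(make_label)

try:
    labels['profile_1234.png']
except TypeError as e:
    # The factory was called as make_label(), without the key.
    error = e
else:
    error = None

print(type(error).__name__, error)
```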

Fortunately, this situation is common enough that Python has another built-in solution. You can subclass the dict type and implement the __missing__ special method to add custom logic for handling missing keys. Here, I do this by defining a new class that relies on an open_picture helper function, which wraps the open call and error handling from the example above:

In [4]:
def open_picture(profile_path):
    # Helper reconstructed from the try/except logic in the previous cell
    try:
        return open(profile_path, 'a+b')
    except OSError:
        print(f'Failed to open path {profile_path}')
        raise

class Pictures(dict):
    def __missing__(self, key):
        value = open_picture(key)
        self[key] = value
        return value

pictures = Pictures()
handle = pictures[path]
handle.seek(0)
image_data = handle.read()
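
The same mechanism can be sketched without touching the filesystem. Labels is a hypothetical dict subclass whose default depends on the key; the misses counter exists only to show how often __missing__ runs.

```python
class Labels(dict):
    """Hypothetical dict subclass; the default depends on the key."""

    def __init__(self):
        super().__init__()
        self.misses = 0

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        self.misses += 1
        value = f'label for {key}'
        self[key] = value
        return value


labels = Labels()
first = labels['a.png']         # Triggers __missing__ and stores the value
second = labels['a.png']        # Served from the dict, no second call
via_get = labels.get('b.png')   # get does not call __missing__

print(first, second, via_get, labels.misses)
```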