# Python - `dict` and `defaultdict`

Notes from: https://realpython.com/python-defaultdict/#handling-missing-keys-in-dictionaries

## `dict`

### Defining a `dict`

#### With key:value pairs

In [1]:
MLB_team = {
    'Colorado' : 'Rockies',
    'Boston'   : 'Red Sox',
    'Minnesota': 'Twins',
    'Milwaukee': 'Brewers',
    'Seattle'  : 'Mariners'
}

#### With a constructor

In [2]:
MLB_team = dict([
    ('Colorado', 'Rockies'),
    ('Boston', 'Red Sox'),
    ('Minnesota', 'Twins'),
    ('Milwaukee', 'Brewers'),
    ('Seattle', 'Mariners')
])
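A `dict` can also be built from keyword arguments, from two parallel sequences joined with `zip()`, or with a dict comprehension. The sketch below reuses the team data above; the names `cities` and `teams` are only illustrative.

```python
# Keyword arguments work when every key is a valid Python identifier
by_kwargs = dict(Colorado='Rockies', Boston='Red Sox', Minnesota='Twins',
                 Milwaukee='Brewers', Seattle='Mariners')

# zip() pairs up two parallel sequences into (key, value) tuples
cities = ['Colorado', 'Boston', 'Minnesota', 'Milwaukee', 'Seattle']
teams = ['Rockies', 'Red Sox', 'Twins', 'Brewers', 'Mariners']
by_zip = dict(zip(cities, teams))

# A dict comprehension builds the same mapping and also allows filtering or transforming
by_comprehension = {city: team for city, team in zip(cities, teams)}

print(by_kwargs == by_zip == by_comprehension)  # True
```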

### Accessing `dict` values

In [3]:
MLB_team['Minnesota']

'Twins'

```python
# raises a KeyError exception
MLB_team['Toronto']
```

### Adding and updating an entry

In [4]:
MLB_team['Toronto'] = 'Blue'

MLB_team

{'Colorado': 'Rockies',
 'Boston': 'Red Sox',
 'Minnesota': 'Twins',
 'Milwaukee': 'Brewers',
 'Seattle': 'Mariners',
 'Toronto': 'Blue'}

In [5]:
MLB_team['Toronto'] = 'Blue Jays'

MLB_team

{'Colorado': 'Rockies',
 'Boston': 'Red Sox',
 'Minnesota': 'Twins',
 'Milwaukee': 'Brewers',
 'Seattle': 'Mariners',
 'Toronto': 'Blue Jays'}

### Deleting an entry

In [6]:
del MLB_team['Toronto']

MLB_team

{'Colorado': 'Rockies',
 'Boston': 'Red Sox',
 'Minnesota': 'Twins',
 'Milwaukee': 'Brewers',
 'Seattle': 'Mariners'}
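Like a lookup, `del` raises `KeyError` when the key is not present. A minimal sketch, using a small separate dict (`teams` is an illustrative name) so the `MLB_team` cells are unaffected; `.pop()` with a default, covered below, is one way to delete without risking the error.

```python
teams = {'Colorado': 'Rockies', 'Boston': 'Red Sox'}

# del on a missing key raises KeyError
try:
    del teams['Toronto']
except KeyError as err:
    print('KeyError:', err)   # KeyError: 'Toronto'

# pop() with a default removes the key if present and never raises
removed = teams.pop('Toronto', None)
print(removed)  # None
```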

### Operators and Built-in Functions

In [7]:
'Milwaukee' in MLB_team, 'Toronto' in MLB_team, 'Toronto' not in MLB_team

(True, False, True)

In [8]:
len(MLB_team)

5

#### Using short-circuit evaluation to avoid raising an error

In [9]:
'Toronto' in MLB_team and MLB_team['Toronto']

False
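When the key is present, the `and` expression evaluates to the value itself; when it is missing, the lookup on the right is never evaluated and the result is `False`. A caveat worth keeping in mind: a falsy value such as `0` cannot be told apart from "missing" in a truth test. The `scores` dict below is illustrative only.

```python
scores = {'Boston': 0, 'Seattle': 3}

# Key present: `and` returns its right-hand operand, i.e. the value
print('Seattle' in scores and scores['Seattle'])   # 3

# Key missing: `in` is False, so scores['Toronto'] is never evaluated
print('Toronto' in scores and scores['Toronto'])   # False

# Pitfall: a falsy value (0) looks like "missing" in a truth test
result = 'Boston' in scores and scores['Boston']
print(result, bool(result))   # 0 False
```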

### Built-in Methods

#### `d.clear()`

In [10]:
d = {'a': 10, 'b': 20, 'c': 30}
d

{'a': 10, 'b': 20, 'c': 30}

In [11]:
d.clear()
d

{}

#### `d.get(<key>[, <default>])`

In [12]:
d = {'a': 10, 'b': 20, 'c': 30}
d

{'a': 10, 'b': 20, 'c': 30}

In [13]:
d.get('b')

20

In [14]:
d.get('z')  # returns None, which is not displayed as output

In [15]:
d.get('z', -1)

-1

#### `d.items()`, `d.keys()`, and `d.values()`

In [16]:
d.items()

dict_items([('a', 10), ('b', 20), ('c', 30)])

In [17]:
for k, v in d.items():
    print(k, v)

a 10
b 20
c 30


In [18]:
d.keys()

dict_keys(['a', 'b', 'c'])

In [19]:
for v in d.values():
    print(v)

10
20
30
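`.items()`, `.keys()`, and `.values()` return *views*, not copies, so they reflect later changes to the dict. A short sketch (the name `stock` is illustrative); wrapping a view in `list()` takes a fixed snapshot instead.

```python
stock = {'a': 10, 'b': 20, 'c': 30}
keys = stock.keys()       # a live view, not a copy

stock['d'] = 40           # mutate the dict after creating the view
print(keys)               # dict_keys(['a', 'b', 'c', 'd'])

# list() copies the current values, so later changes do not show up
snapshot = list(stock.values())
stock['e'] = 50
print(snapshot)           # [10, 20, 30, 40]
```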


#### `d.pop(<key>[, <default>])`

In [20]:
d

{'a': 10, 'b': 20, 'c': 30}

In [21]:
d.pop('b')

20

In [22]:
d

{'a': 10, 'c': 30}

In [23]:
d.pop('z', -1)

-1

In [24]:
d

{'a': 10, 'c': 30}
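Without a default, `.pop()` behaves like a plain lookup and raises `KeyError` for a missing key. A minimal sketch on a separate copy (`d2` is an illustrative name):

```python
d2 = {'a': 10, 'c': 30}

# No default given: pop() raises KeyError and leaves the dict unchanged
try:
    d2.pop('z')
except KeyError as err:
    caught = err

print(repr(caught))  # KeyError('z')
print(d2)            # {'a': 10, 'c': 30}
```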

#### `d.popitem()`

In [25]:
d = {'a': 10, 'b': 20, 'c': 30}

In [26]:
d.popitem()

('c', 30)

In [27]:
d

{'a': 10, 'b': 20}

```python
# raises a KeyError exception: popitem() on an empty dict
d = {}
d.popitem()
```
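Since Python 3.7 dicts preserve insertion order, and `.popitem()` removes the most recently inserted pair (last in, first out). The sketch below, with illustrative names `stack` and `popped`, drains a dict in reverse insertion order.

```python
stack = {'a': 10, 'b': 20, 'c': 30}
popped = []

# popitem() removes the last-inserted pair; the loop ends when the dict is empty
while stack:
    popped.append(stack.popitem())

print(popped)  # [('c', 30), ('b', 20), ('a', 10)]
```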

#### `d.update(<obj>)`

In [28]:
d1 = {'a': 10, 'b': 20, 'c': 30}
d2 = {'b': 200, 'd': 400}

# notice what happens to `b`
d1.update(d2)

# happens in place
d1

{'a': 10, 'b': 200, 'c': 30, 'd': 400}

In [29]:
d1 = {'a': 10, 'b': 20, 'c': 30}
d2 = {'b': 200, 'd': 400}

# notice what happens to `b`
new_dict = d1.update(d2)

# happens in place
d1

{'a': 10, 'b': 200, 'c': 30, 'd': 400}

In [30]:
# update returns None
new_dict, type(new_dict)

(None, NoneType)

In [31]:
# `<obj>` can also be a sequence of key-value pairs
d1 = {'a': 10, 'b': 20, 'c': 30}
d1.update([('b', 2000), ('d', 4000)])
d1

{'a': 10, 'b': 2000, 'c': 30, 'd': 4000}

In [32]:
# the new entries can also be given as keyword arguments
d1 = {'a': 10, 'b': 20, 'c': 30}
d1.update(b=20000, d=40000)
d1

{'a': 10, 'b': 20000, 'c': 30, 'd': 40000}
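Unlike `.update()`, which mutates and returns `None`, Python 3.9 added the `|` operator, which returns a new merged dict, and `|=`, which updates in place. This sketch assumes Python 3.9 or later; the `{**d1, **d2}` unpacking form works on earlier Python 3 versions as well. In every form, the right-hand value wins when keys clash.

```python
d1 = {'a': 10, 'b': 20, 'c': 30}
d2 = {'b': 200, 'd': 400}

# `|` returns a NEW dict; d1 is left unchanged
merged = d1 | d2
print(merged)  # {'a': 10, 'b': 200, 'c': 30, 'd': 400}

# Unpacking also builds a new dict (works before 3.9 too)
merged_old_style = {**d1, **d2}

# `|=` updates d1 in place, like d1.update(d2)
d1 |= d2
print(d1)      # {'a': 10, 'b': 200, 'c': 30, 'd': 400}
```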

### Handling Missing Keys in `dict`

#### Four ways:

1. `.setdefault()`
2. `.get()`
3. Use `key in a_dict` idiom
4. `try ... except` block

#### `.setdefault()`

In [33]:
a_dict = {}

# a_dict['missing_key'] ## KeyError

a_dict.setdefault('missing_key', 'default value')
a_dict['missing_key']

'default value'

```python
# setdefault() only inserted 'missing_key'; any other missing key still raises KeyError
a_dict['another_missing_key'] # KeyError
```

In [34]:
# note that an entry was added
a_dict

{'missing_key': 'default value'}
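If the key already exists, `.setdefault()` returns the existing value and does not overwrite it; only a missing key gets the default inserted. A small sketch with an illustrative `config` dict. Note that the default argument is evaluated on every call, whether or not it ends up being used.

```python
config = {'color': 'blue'}

# Key present: the existing value is returned and nothing changes
print(config.setdefault('color', 'red'))   # blue

# Key missing: the default is inserted and returned
print(config.setdefault('size', 'large'))  # large

print(config)  # {'color': 'blue', 'size': 'large'}
```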

#### `.get()`

In [35]:
a_dict = {}

print(type(a_dict.get('a')))
print(a_dict.get('a', 'can provide a default here'))

<class 'NoneType'>
can provide a default here
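The key difference from `.setdefault()` is that `.get()` never modifies the dict: it returns the default but does not insert the key. A minimal comparison, with `settings` as an illustrative name:

```python
settings = {}

value = settings.get('theme', 'light')   # returns the default...
print(value)       # light
print(settings)    # {}  ...but does NOT insert the key

value = settings.setdefault('theme', 'light')  # setdefault() does insert it
print(settings)    # {'theme': 'light'}
```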


#### Use `key in a_dict` idiom

In [36]:
a_dict = {}

if 'key' in a_dict:
    print(a_dict['key'])
else:
    a_dict['key'] = 'missing'
    print(a_dict['key'])

missing


#### `try ... except` block

In [37]:
a_dict = {}

try:
    print(a_dict['key'])
except KeyError:
    print("error")

error
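The `try ... except` approach is often called EAFP ("easier to ask forgiveness than permission"), while the `key in a_dict` check is LBYL ("look before you leap"). The sketch below counts an illustrative list of words both ways to show they give the same result.

```python
words = ['red', 'blue', 'red', 'green', 'red']

# EAFP: attempt the update and handle KeyError on a word's first occurrence
eafp_counts = {}
for word in words:
    try:
        eafp_counts[word] += 1
    except KeyError:
        eafp_counts[word] = 1

# LBYL: check membership first with the `in` idiom
lbyl_counts = {}
for word in words:
    if word in lbyl_counts:
        lbyl_counts[word] += 1
    else:
        lbyl_counts[word] = 1

print(eafp_counts)                 # {'red': 3, 'blue': 1, 'green': 1}
print(eafp_counts == lbyl_counts)  # True
```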


## `defaultdict`

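`defaultdict` is a subclass of `dict` that does two things a plain `dict` does not:

1. It implements `.__missing__()`, which `dict.__getitem__()` calls when a key is not found
2. It adds `.default_factory`, a writable instance attribute that is normally passed as the first argument to the constructor
    - must be a callable or `None` (the default); with `None`, missing keys raise `KeyError` just as in a plain `dict`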
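To see how `.__missing__()` and `.default_factory` fit together, here is a simplified sketch of a `dict` subclass that mimics `defaultdict` for plain `d[key]` lookups. `MyDefaultDict` is a hypothetical name; this is an illustration, not CPython's actual implementation (which is written in C and also customizes `repr()`, `copy()`, and pickling).

```python
class MyDefaultDict(dict):
    """Hypothetical, simplified defaultdict look-alike (illustration only)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if default_factory is not None and not callable(default_factory):
            raise TypeError('first argument must be callable or None')
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when `key` is not found
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()   # the factory is called with no arguments
        self[key] = value                # store it so later lookups find it
        return value


groups = MyDefaultDict(list)
groups['a'].append(1)
print(groups)            # {'a': [1]}

no_factory = MyDefaultDict()
try:
    no_factory['x']      # default_factory is None, so KeyError is raised
except KeyError:
    print('KeyError')
```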

In [38]:
from collections import defaultdict

issubclass(defaultdict, dict)

True

### Example: using `list` as `.default_factory` 

In [39]:
def_dict = defaultdict(list)

def_dict['one'] = 1          # explicit assignment works as usual; the value need not be a list
def_dict['two']              # accessing a missing key adds it with an empty list as the value
def_dict['three'].append(4)  # adds 'three' with an empty list, then appends 4 to it

def_dict

defaultdict(list, {'one': 1, 'two': [], 'three': [4]})

In [40]:
def_dict['one'], def_dict['two'], def_dict['three']

(1, [], [4])
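Only `d[key]` lookups trigger `.default_factory`. Methods such as `.get()` and membership tests with `in` go through the normal `dict` machinery, so they neither call the factory nor add keys. A short sketch:

```python
from collections import defaultdict

dd = defaultdict(list)

print(dd.get('missing'))   # None: .get() does not call default_factory
print('missing' in dd)     # False: membership tests do not add keys either
print(dict(dd))            # {}

dd['missing']              # a d[key] lookup calls __missing__ and inserts []
print(dict(dd))            # {'missing': []}
```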

### Grouping with `defaultdict`

In [41]:
dd = defaultdict(list)

dd['a'].append(1)
dd['a'].append(2)
dd['a'].append(3)

dd

defaultdict(list, {'a': [1, 2, 3]})

In [42]:
dep = [('Sales', 'John Doe'),
       ('Sales', 'Martin Smith'),
       ('Accounting', 'Jane Doe'),
       ('Marketing', 'Elizabeth Smith'),
       ('Marketing', 'Adam Doe')]

dep

[('Sales', 'John Doe'),
 ('Sales', 'Martin Smith'),
 ('Accounting', 'Jane Doe'),
 ('Marketing', 'Elizabeth Smith'),
 ('Marketing', 'Adam Doe')]

In [43]:
dep_dd = defaultdict(list)

for department, employee in dep:
    dep_dd[department].append(employee)

dep_dd

defaultdict(list,
            {'Sales': ['John Doe', 'Martin Smith'],
             'Accounting': ['Jane Doe'],
             'Marketing': ['Elizabeth Smith', 'Adam Doe']})

In [44]:
# the same grouping can also be done with a plain `dict` and `.setdefault()`
dep_d = dict()
for department, employee in dep:
    dep_d.setdefault(department, []).append(employee)

dep_d

{'Sales': ['John Doe', 'Martin Smith'],
 'Accounting': ['Jane Doe'],
 'Marketing': ['Elizabeth Smith', 'Adam Doe']}

### Grouping Unique Items: using `set` as `.default_factory` 

In [45]:
dep = [('Sales', 'John Doe'),
       ('Sales', 'Martin Smith'),
       ('Accounting', 'Jane Doe'),
       ('Marketing', 'Elizabeth Smith'),
       ('Marketing', 'Elizabeth Smith'),
       ('Marketing', 'Adam Doe'),
       ('Marketing', 'Adam Doe'),
       ('Marketing', 'Adam Doe')]

dep_dd = defaultdict(set)

for department, employee in dep:
    dep_dd[department].add(employee)

dep_dd

defaultdict(set,
            {'Sales': {'John Doe', 'Martin Smith'},
             'Accounting': {'Jane Doe'},
             'Marketing': {'Adam Doe', 'Elizabeth Smith'}})

### Counting Items: using `int` as `.default_factory`

In [46]:
a_dd = defaultdict(int)

a_dd['test'] # this will be 0

a_dd 

defaultdict(int, {'test': 0})
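`int` works as a factory because calling `int()` with no arguments returns `0`; likewise `float()`, `list()`, and `set()` return `0.0`, `[]`, and an empty set. Any zero-argument callable is accepted, including a `lambda` for a custom default. The placeholder `'N/A'` and the name `status` below are arbitrary choices for illustration.

```python
from collections import defaultdict

# Built-in types return an "empty" value when called with no arguments
print(int(), float(), list(), set())   # 0 0.0 [] set()

# A lambda supplies a custom default value
status = defaultdict(lambda: 'N/A')
status['Sales'] = 'open'
print(status['Sales'], status['Marketing'])   # open N/A
print(dict(status))   # {'Sales': 'open', 'Marketing': 'N/A'}
```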

In [47]:
dep = [('Sales', 'John Doe'),
       ('Sales', 'Martin Smith'),
       ('Accounting', 'Jane Doe'),
       ('Marketing', 'Elizabeth Smith'),
       ('Marketing', 'Adam Doe')]

dd = defaultdict(int)

for department, _ in dep:
    dd[department] += 1

dd

defaultdict(int, {'Sales': 2, 'Accounting': 1, 'Marketing': 2})

In [48]:
s = 'mississippi'

dd = defaultdict(int)

for letter in s:
    dd[letter] += 1

dd

defaultdict(int, {'m': 1, 'i': 4, 's': 4, 'p': 2})

In [49]:
from collections import Counter

s = 'mississippi'
Counter(s)

Counter({'m': 1, 'i': 4, 's': 4, 'p': 2})
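`Counter` adds counting-specific helpers on top of `dict`, such as `.most_common()`. It also returns `0` for a missing item but, unlike `defaultdict(int)`, does not insert that item. A short sketch:

```python
from collections import Counter

letters = Counter('mississippi')

# most_common(n) returns the n highest counts as (item, count) pairs
print(letters.most_common(2))   # [('i', 4), ('s', 4)]

# Missing items count as 0 and are not added to the Counter
print(letters['z'])             # 0
print('z' in letters)           # False
```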

### Accumulating Values

In [50]:
incomes = [('Books', 1250.00),
           ('Books', 1300.00),
           ('Books', 1420.00),
           ('Tutorials', 560.00),
           ('Tutorials', 630.00),
           ('Tutorials', 750.00),
           ('Courses', 2500.00),
           ('Courses', 2430.00),
           ('Courses', 2750.00),]

incomes

[('Books', 1250.0),
 ('Books', 1300.0),
 ('Books', 1420.0),
 ('Tutorials', 560.0),
 ('Tutorials', 630.0),
 ('Tutorials', 750.0),
 ('Courses', 2500.0),
 ('Courses', 2430.0),
 ('Courses', 2750.0)]

In [51]:
dd = defaultdict(float)

for product, income in incomes:
    dd[product] += income

dd

defaultdict(float, {'Books': 3970.0, 'Tutorials': 1940.0, 'Courses': 7680.0})

In [52]:
for product, income in dd.items():
    print(f'Total income for {product}: ${income:,.2f}')

Total income for Books: $3,970.00
Total income for Tutorials: $1,940.00
Total income for Courses: $7,680.00
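Once accumulation is finished, converting with `dict()` gives a plain dict, so a later lookup of a misspelled key raises `KeyError` instead of silently adding a `0.0` entry. The sketch uses a shortened, illustrative copy of the `incomes` list and also sorts the totals from highest to lowest.

```python
from collections import defaultdict

incomes = [('Books', 1250.00), ('Books', 1300.00), ('Tutorials', 560.00),
           ('Courses', 2500.00), ('Courses', 2430.00)]

totals = defaultdict(float)
for product, income in incomes:
    totals[product] += income

# Freeze the result as a plain dict: missing keys now raise KeyError again
final = dict(totals)

# Sort products by total income, highest first
for product, total in sorted(final.items(), key=lambda item: item[1], reverse=True):
    print(f'{product}: ${total:,.2f}')

# Courses: $4,930.00
# Books: $2,550.00
# Tutorials: $560.00
```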
