# Manipulating Dictionaries

When working on data extraction and cleansing, we often deal with data in dictionary form, and there are different ways to handle it depending on the need. In this post I will cover several use cases I have met.

### Simple dictionary:

In [7]:
data = {
    'a': 1,
    'b': 1,
    'c': 1,
    'd': 1,
    'e': 1,
    'f': 3,
    'g': 1,
    'h': 1,
    'i': 1,
    'j': 1,
    'k': 1,
}

#### Delete items by condition:

If we need to scan every value in a dictionary and drop the items that meet some condition, we can loop over a copy of the dictionary while removing items from the original (a dictionary cannot change size while we are iterating over it!).

In [6]:
# dict.items() yields (key, value) pairs in a for loop.
for k, v in data.items():
    if k == 'e':
        data.pop(k)

RuntimeError: dictionary changed size during iteration

In [11]:
from copy import copy


loop_data = copy(data)
for k, v in loop_data.items():
    if v == 3:
        data.pop(k)

In [12]:
data

{'a': 1,
 'b': 1,
 'c': 1,
 'd': 1,
 'e': 1,
 'g': 1,
 'h': 1,
 'i': 1,
 'j': 1,
 'k': 1}
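An alternative that avoids the copy is to build a new dictionary with a dict comprehension and keep only the items we want. This is a minimal sketch; the names `original` and `filtered` are illustrative and do not come from the cells above.

```python
# Illustrative data; `original` and `filtered` are hypothetical names.
original = {'a': 1, 'b': 1, 'f': 3, 'g': 1}

# Keep every item whose value is not 3.
# `original` is never modified, so there is no "changed size" error.
filtered = {k: v for k, v in original.items() if v != 3}

print(filtered)
```

This suits cases where a new dictionary is acceptable. If other code holds a reference to the original dictionary and expects it to be updated in place, the copy-and-pop loop is still the way to go.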

Now suppose we have a list of dictionaries with different contents, and we know that one of the fields has to be dropped before the data is passed to the next function. (For example, the next function batch-uploads the data to a database, and we are writing a data cleansing function to screen out useless information.)

In [14]:
sub_data = {'a': 1, 'delete_field': 2, 'c': 3}
# copy sub_data for each entry; otherwise all four entries would be the same dict object
data = [sub_data.copy() for _ in range(4)]
print('before')
print(data)

for item in data:
    if 'delete_field' in item:
        item.pop('delete_field')
        
print('after')
print(data)

before
[{'a': 1, 'delete_field': 2, 'c': 3}, {'a': 1, 'delete_field': 2, 'c': 3}, {'a': 1, 'delete_field': 2, 'c': 3}, {'a': 1, 'delete_field': 2, 'c': 3}]
after
[{'a': 1, 'c': 3}, {'a': 1, 'c': 3}, {'a': 1, 'c': 3}, {'a': 1, 'c': 3}]
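The membership check can be folded into the `pop` call itself, because `dict.pop` accepts a default value that is returned when the key is missing. The sketch below wraps this in a small helper; `drop_field` and `records` are hypothetical names used only for illustration.

```python
def drop_field(records, field):
    """Remove `field` from every dict in `records`, modifying them in place."""
    for record in records:
        # The second argument is a default, so no KeyError is raised
        # when a record does not contain the field.
        record.pop(field, None)
    return records


records = [{'a': 1, 'delete_field': 2, 'c': 3}, {'a': 1, 'c': 3}]
drop_field(records, 'delete_field')
print(records)
```

Note that the helper mutates the dictionaries it receives, just like the loop above.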


Another possibility is to remove a field only if its value is `None`.

In [16]:
sub_data = {'a': 1, 'delete_field': None, 'c': 3}
# copy sub_data for each entry; otherwise all four entries would be the same dict object
data = [sub_data.copy() for _ in range(4)]
print('before')
print(data)

# now we have to loop through each dict to check every value
# use .items() to unpack key and value, and use "if v is None" to check the value
# if we used "if not v:" instead, fields holding an empty string or 0 would also be removed (both 0 and '' are falsy)
for item in data:
    for k, v in copy(item).items():
        if v is None:
            item.pop(k)
            
print('after')
print(data)

before
[{'a': 1, 'delete_field': None, 'c': 3}, {'a': 1, 'delete_field': None, 'c': 3}, {'a': 1, 'delete_field': None, 'c': 3}, {'a': 1, 'delete_field': None, 'c': 3}]
after
[{'a': 1, 'c': 3}, {'a': 1, 'c': 3}, {'a': 1, 'c': 3}, {'a': 1, 'c': 3}]
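The same cleanup can be written without mutating anything by rebuilding each dictionary with a comprehension. This is a sketch under the assumption that returning new dictionaries is acceptable; `without_none`, `records` and `cleaned` are illustrative names.

```python
def without_none(record):
    """Return a new dict with only the items whose value is not None."""
    return {k: v for k, v in record.items() if v is not None}


# 0 and '' are falsy but are not None, so they are kept.
records = [{'a': 1, 'delete_field': None, 'c': 0, 'd': ''}]
cleaned = [without_none(r) for r in records]
print(cleaned)
```

Because the input dictionaries are left untouched, no copy is needed inside the loop.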


We can also build a clean dataset from a messy one by picking only the fields we need.

In [21]:
sub_data = {'a': 1, 'delete_field': None, 'c': 3, 'd': 6, 'e': None, 'f': 3}
data = [sub_data for _ in range(4)] + [{'e': 'a'}] + [{'c': 0}]
print('before')
print(data)

# we only need 'a', 'c' and 'e'
# and we don't want fields whose value is None
output = []
for item in data:
    sub_output = {}
    for k in ['a', 'c', 'e']:
        if item.get(k) is not None:    # use "is not None" here, otherwise {'c': 0} would be dropped
            sub_output[k] = item[k]
    if sub_output:
        output.append(sub_output)
        
print('after')
print(output)

before
[{'a': 1, 'delete_field': None, 'c': 3, 'd': 6, 'e': None, 'f': 3}, {'a': 1, 'delete_field': None, 'c': 3, 'd': 6, 'e': None, 'f': 3}, {'a': 1, 'delete_field': None, 'c': 3, 'd': 6, 'e': None, 'f': 3}, {'a': 1, 'delete_field': None, 'c': 3, 'd': 6, 'e': None, 'f': 3}, {'e': 'a'}, {'c': 0}]
after
[{'a': 1, 'c': 3}, {'a': 1, 'c': 3}, {'a': 1, 'c': 3}, {'a': 1, 'c': 3}, {'e': 'a'}, {'c': 0}]
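For reuse, the selection logic can be pulled into a function that takes the wanted keys as a parameter. The following is only a sketch of one way to do it; `pick_fields`, `wanted` and `records` are hypothetical names.

```python
def pick_fields(records, wanted):
    """Keep only the `wanted` keys with non-None values; skip empty results."""
    output = []
    for record in records:
        # .get() returns None for a missing key, so missing and None-valued
        # fields are filtered out by the same check.
        picked = {k: record[k] for k in wanted if record.get(k) is not None}
        if picked:
            output.append(picked)
    return output


records = [{'a': 1, 'x': None, 'c': 3, 'e': None}, {'e': 'a'}, {'c': 0}, {'d': 6}]
result = pick_fields(records, ['a', 'c', 'e'])
print(result)
```

The last record, `{'d': 6}`, contains none of the wanted keys, so it produces an empty dictionary and is left out of the result.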
