# Find Missing Parents

## Setup
This just creates test data.

```python
# Set up a simple environment, starting with the suffixes.
suffixes = ["a", "b", "c", "d", "e", "f", "g"]
# Produce the parent names as a generator.
parent_names = ("biome_" + p for p in suffixes)
# Do the same for the children.
child_names = ("child_" + str(i) for i in range(100, 120))
# Don't create these; they are what we want to find.
missing = {"biome_c", "biome_e", "biome_f"}
# Create the parent nodes (IDs 1-7), except for the missing ones above.
# Each parent is its own parent: its "parent" equals its "id".
parent_nodes = {n: {"name": n, "id": i, "parent": i}
                for (n, i) in zip(parent_names, range(1, 8)) if n not in missing}
# Children only use parents in the ID range 1-4, so only "biome_c" (ID 3)
# will be missing-but-used.
child_nodes = {n: {"name": n, "id": i, "parent": (i % 4) + 1}
               for (n, i) in zip(child_names, range(100, 120))}
# Combine the two sets of nodes.
nodes = {**parent_nodes, **child_nodes}
```

## Our test data:
OK, now we have our test data set up. Let's look at it.

Note that no child has a parent of 5, 6, or 7: every child points at a parent ID from 1 to 4.

```python
nodes
```

```
{'biome_a': {'name': 'biome_a', 'id': 1, 'parent': 1},
 'biome_b': {'name': 'biome_b', 'id': 2, 'parent': 2},
 'biome_d': {'name': 'biome_d', 'id': 4, 'parent': 4},
 'biome_g': {'name': 'biome_g', 'id': 7, 'parent': 7},
 'child_100': {'name': 'child_100', 'id': 100, 'parent': 1},
 'child_101': {'name': 'child_101', 'id': 101, 'parent': 2},
 'child_102': {'name': 'child_102', 'id': 102, 'parent': 3},
 'child_103': {'name': 'child_103', 'id': 103, 'parent': 4},
 'child_104': {'name': 'child_104', 'id': 104, 'parent': 1},
 'child_105': {'name': 'child_105', 'id': 105, 'parent': 2},
 'child_106': {'name': 'child_106', 'id': 106, 'parent': 3},
 'child_107': {'name': 'child_107', 'id': 107, 'parent': 4},
 'child_108': {'name': 'child_108', 'id': 108, 'parent': 1},
 'child_109': {'name': 'child_109', 'id': 109, 'parent': 2},
 'child_110': {'name': 'child_110', 'id': 110, 'parent': 3},
 'child_111': {'name': 'child_111', 'id': 111, 'parent': 4},
 'child_112': {'name': 'child_112', 'id': 112, 'parent': 1},
 ...}
```
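A quick way to check the note about which parent IDs are used is to compare the IDs that exist with the IDs the children point at. This sketch restates the Setup cell's data more compactly (with `enumerate` and f-strings instead of generators), which should produce the same `nodes` dict; the names `existing` and `referenced` are just illustrative.

```python
# Compact restatement of the Setup cell's test data.
missing = {"biome_c", "biome_e", "biome_f"}
parent_nodes = {
    f"biome_{s}": {"name": f"biome_{s}", "id": i, "parent": i}
    for i, s in enumerate("abcdefg", start=1)
    if f"biome_{s}" not in missing
}
child_nodes = {
    f"child_{i}": {"name": f"child_{i}", "id": i, "parent": (i % 4) + 1}
    for i in range(100, 120)
}
nodes = {**parent_nodes, **child_nodes}

# Parent IDs that actually exist, and parent IDs the children point at.
existing = {n["id"] for n in parent_nodes.values()}
referenced = {n["parent"] for n in child_nodes.values()}

print(sorted(existing))    # [1, 2, 4, 7]
print(sorted(referenced))  # [1, 2, 3, 4]
```

Only 3 appears in `referenced` but not in `existing`, which is the answer the rest of this notebook works toward.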

## Find the missing parents.

Because we don't have an easy lookup from ID to node or name, we'll associate each ID with its node in a dict (a map). If we only needed the IDs themselves, sets would be enough.
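For comparison, here is a sketch of the sets-only version, rebuilding the same test data compactly. It finds the missing parent IDs just as well, but it keeps no nodes, so it can't say which child needs each missing parent; that is what the dicts below buy us. The names `parent_ids`, `used_ids`, and `missing_ids` are illustrative.

```python
# Compact restatement of the Setup cell's test data.
missing = {"biome_c", "biome_e", "biome_f"}
parent_nodes = {
    f"biome_{s}": {"name": f"biome_{s}", "id": i, "parent": i}
    for i, s in enumerate("abcdefg", start=1)
    if f"biome_{s}" not in missing
}
child_nodes = {
    f"child_{i}": {"name": f"child_{i}", "id": i, "parent": (i % 4) + 1}
    for i in range(100, 120)
}
nodes = {**parent_nodes, **child_nodes}

# With plain sets we keep only IDs, not the nodes they belong to.
parent_ids = {n["id"] for n in nodes.values() if n["id"] == n["parent"]}
used_ids = {n["parent"] for n in nodes.values() if n["id"] != n["parent"]}
missing_ids = used_ids - parent_ids
print(missing_ids)  # {3}
```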

### Find all the parents.

In this data, a parent is a node that is its own parent (its `id` equals its `parent`), so we select those and index them by `id`.

```python
parents = {node["id"]: node for node in nodes.values() if node["id"] == node["parent"]}
parents
```

```
{1: {'name': 'biome_a', 'id': 1, 'parent': 1},
 2: {'name': 'biome_b', 'id': 2, 'parent': 2},
 4: {'name': 'biome_d', 'id': 4, 'parent': 4},
 7: {'name': 'biome_g', 'id': 7, 'parent': 7}}
```

Now let's find the children the same way.

### Find all the children.

```python
children = {node["id"]: node for node in nodes.values() if node["id"] != node["parent"]}
children
```

```
{100: {'name': 'child_100', 'id': 100, 'parent': 1},
 101: {'name': 'child_101', 'id': 101, 'parent': 2},
 102: {'name': 'child_102', 'id': 102, 'parent': 3},
 103: {'name': 'child_103', 'id': 103, 'parent': 4},
 104: {'name': 'child_104', 'id': 104, 'parent': 1},
 105: {'name': 'child_105', 'id': 105, 'parent': 2},
 106: {'name': 'child_106', 'id': 106, 'parent': 3},
 107: {'name': 'child_107', 'id': 107, 'parent': 4},
 108: {'name': 'child_108', 'id': 108, 'parent': 1},
 109: {'name': 'child_109', 'id': 109, 'parent': 2},
 110: {'name': 'child_110', 'id': 110, 'parent': 3},
 111: {'name': 'child_111', 'id': 111, 'parent': 4},
 112: {'name': 'child_112', 'id': 112, 'parent': 1},
 113: {'name': 'child_113', 'id': 113, 'parent': 2},
 114: {'name': 'child_114', 'id': 114, 'parent': 3},
 115: {'name': 'child_115', 'id': 115, 'parent': 4},
 116: {'name': 'child_116', 'id': 116, 'parent': 1},
 117: {'name': 'child_117', 'id': 117, 'parent': 2},
 118: {'name': 'child_118', 'id': 118, 'parent': 3},
 ...}
```

So the question now is, what parents are in use? Let's construct that set. Again, we'll use a map rather than a set, so we remember an example child to go with each `in_use` parent.

```python
in_use = {node["parent"]: node for node in children.values()}
in_use
```

```
{1: {'name': 'child_116', 'id': 116, 'parent': 1},
 2: {'name': 'child_117', 'id': 117, 'parent': 2},
 3: {'name': 'child_118', 'id': 118, 'parent': 3},
 4: {'name': 'child_119', 'id': 119, 'parent': 4}}
```
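Because later entries in a dict comprehension overwrite earlier ones with the same key, `in_use` ends up holding the *last* child seen for each parent (child_116 through child_119 above). Any child will do here, but if you'd rather keep the first child, or all of them, a sketch using the standard library's `dict.setdefault` and `collections.defaultdict` could look like this. It rebuilds the test data compactly; `first_child` and `all_children` are illustrative names.

```python
from collections import defaultdict

# Compact restatement of the Setup cell's test data.
missing = {"biome_c", "biome_e", "biome_f"}
parent_nodes = {
    f"biome_{s}": {"name": f"biome_{s}", "id": i, "parent": i}
    for i, s in enumerate("abcdefg", start=1)
    if f"biome_{s}" not in missing
}
child_nodes = {
    f"child_{i}": {"name": f"child_{i}", "id": i, "parent": (i % 4) + 1}
    for i in range(100, 120)
}
nodes = {**parent_nodes, **child_nodes}
children = {n["id"]: n for n in nodes.values() if n["id"] != n["parent"]}

# Keep the FIRST child seen for each parent: setdefault ignores later ones.
first_child = {}
for node in children.values():
    first_child.setdefault(node["parent"], node)

# Or keep every child name, grouped by parent ID.
all_children = defaultdict(list)
for node in children.values():
    all_children[node["parent"]].append(node["name"])

print(first_child[3]["name"])  # child_102
print(all_children[3])  # ['child_102', 'child_106', 'child_110', 'child_114', 'child_118']
```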

So the next question is, which of these parents don't exist?

Here, we'll use set operations, removing the existing parents from the ones in use.

```python
need_to_create = set(in_use.keys()).difference(parents.keys())
need_to_create
```

```
{3}
```
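Since `dict.keys()` returns a set-like view in Python 3, the `set(...)` wrapper is optional: the `-` operator works directly on key views and returns an ordinary set. This sketch uses stand-in dicts with the same keys as `in_use` and `parents` above; only the keys matter for this step, so the values are just the names from the outputs.

```python
# Stand-ins with the same keys as the notebook's `in_use` and `parents`.
in_use = {1: "child_116", 2: "child_117", 3: "child_118", 4: "child_119"}
parents = {1: "biome_a", 2: "biome_b", 4: "biome_d", 7: "biome_g"}

# Key views support set operators, so no explicit set() is needed.
need_to_create = in_use.keys() - parents.keys()
print(need_to_create)  # {3}

# Same result as the explicit form used in the notebook.
assert need_to_create == set(in_use.keys()).difference(parents.keys())
```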

But we want child nodes, not just parent IDs, so we'll look each ID up in our `in_use` dict to get its example child.

```python
to_create_parents = [in_use[n] for n in need_to_create]
to_create_parents
```

```
[{'name': 'child_118', 'id': 118, 'parent': 3}]
```

And those are the ones (only 1 in this case) that we need to create a parent for!

## Recap

So to review what we did:

```python
def create_parent(node):
    # Hypothetical placeholder: create_parent is never defined in this notebook.
    print(f"would create parent {node['parent']} for {node['name']}")

parents = {node["id"]:node for node in nodes.values() if node["id"] == node["parent"]}  #1
children = {node["id"]:node for node in nodes.values() if node["id"] != node["parent"]} #2
in_use = {node["parent"]: node for node in children.values()}                           #3
need_to_create = set(in_use.keys()).difference(parents.keys())                          #4
to_create_parents = (in_use[n] for n in need_to_create)                                 #5
for node in to_create_parents:                                                          #6
    create_parent(node)
```

Or:
1. Find the parents, in a dict indexed by `id`.
2. Find the children, in a dict indexed by `id`.
3. Find the parents in use by scanning the children, in a dict indexed by parent ID (keeping one example child per parent).
4. Find the parents that are in use but don't exist, by removing the ones from #3 that already exist.
5. Find the selected child for each missing parent.
6. Create the parents.
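Steps 1 to 5 could also be bundled into one reusable function. This is a minimal sketch; the name `find_missing_parents` is hypothetical, it assumes the notebook's convention that a node whose `id` equals its `parent` is a parent, and it returns the example children instead of calling `create_parent`, leaving step 6 to the caller.

```python
# Compact restatement of the Setup cell's test data.
missing = {"biome_c", "biome_e", "biome_f"}
parent_nodes = {
    f"biome_{s}": {"name": f"biome_{s}", "id": i, "parent": i}
    for i, s in enumerate("abcdefg", start=1)
    if f"biome_{s}" not in missing
}
child_nodes = {
    f"child_{i}": {"name": f"child_{i}", "id": i, "parent": (i % 4) + 1}
    for i in range(100, 120)
}
nodes = {**parent_nodes, **child_nodes}


def find_missing_parents(nodes):
    """Return one example child for each parent ID that is used but never defined.

    A node whose "id" equals its "parent" is treated as a parent;
    every other node is treated as a child.
    """
    parents = {n["id"]: n for n in nodes.values() if n["id"] == n["parent"]}
    children = {n["id"]: n for n in nodes.values() if n["id"] != n["parent"]}
    in_use = {n["parent"]: n for n in children.values()}
    need_to_create = in_use.keys() - parents.keys()
    # Sort the IDs so the result order doesn't depend on set iteration order.
    return [in_use[pid] for pid in sorted(need_to_create)]


for child in find_missing_parents(nodes):
    print(child["parent"], child["name"])  # 3 child_118
```

Returning a list rather than acting on each node keeps the lookup logic testable on its own; the caller decides what "create a parent" means.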

### Done!
