In [43]:
friendships = [l.strip().split() for l in open('../../data/06-friendships.txt')]

In [12]:
friendships = [l.strip().split() for l in open('../../data/06-small.txt')]

In [18]:
friendships = [['Jacqueline', 'Edgar'],
 ['Ella-Louise', 'Raj'],
 ['Abby', 'Edgar'],
 ['Anita', 'Harlow'],
 ['Raj', 'Edgar'],
 ['Bronwyn', 'Sanjay'],
 ['Caiden', 'Anita'],
 ['Raj', 'Finlay'],
 ['Raj', 'Jacqueline'],
 ['Ella-Louise', 'Abby'],
 ['Samson', 'Sanjay'],
 ['Samson', 'Alessandra'],
 ['Edgar', 'Finlay'],
 ['Finlay', 'Jacqueline'],
 ['Bronwyn', 'Samson']]

This is a **union-find** problem. 

It's all about asking if two people are in the same connected set. Socially, looking at the example in the question, there are three "gangs": Edgar's gang, Harlow's gang, and Sanjay's gang. We label each gang with some representative person from that gang: the _exemplar_. For each person, we can ask which gang they're in by asking for their gang's exemplar. So long as each member of a gang uses the same exemplar, checking if people are in the same gang is easy: just check if they have the same exemplar.

That's the **find** part. 

The **union** part is for merging gangs. We can merge two gangs by setting the exemplar of one gang to be the exemplar of the other.

How to implement this?

The simplest approach is to use a lookup table (a `dict`) that maps each person to their exemplar. If someone _is_ an exemplar, their entry points to themself. For instance, we could add the first two friendships like this, arbitrarily making the second person in each pair the exemplar of the new group:

```
Jacqueline ----> Edgar
Ella-Louise ---> Raj

```

When we add the Abby-Edgar link, we see Edgar is already an exemplar, so we make Abby have Edgar as her exemplar. Anita-Harlow starts a new gang.

```
Jacqueline --+-> Edgar
Abby --------+

Ella-Louise ---> Raj

Anita ---------> Harlow
```
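As a minimal sketch, the table in the diagram above can be written as a plain Python `dict`. With a flat table like this, _find_ is a single lookup, and two people are in the same gang exactly when their lookups agree. The names `exemplar` and `same_gang` are hypothetical, chosen for illustration; the notebook code further down uses a different layout.

```python
# Hypothetical flat lookup table matching the diagram above:
# each person maps to their exemplar, and each exemplar maps to themself.
exemplar = {
    'Jacqueline': 'Edgar',
    'Abby': 'Edgar',
    'Edgar': 'Edgar',
    'Ella-Louise': 'Raj',
    'Raj': 'Raj',
    'Anita': 'Harlow',
    'Harlow': 'Harlow',
}

def same_gang(a, b):
    """Two people share a gang if they share an exemplar (one lookup each)."""
    return exemplar[a] == exemplar[b]

print(same_gang('Jacqueline', 'Abby'))   # True
print(same_gang('Abby', 'Ella-Louise'))  # False
```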

That does _find_. How about _union_?

For instance, what do we do in the above diagram when we find Edgar and Raj are friends?

To join two groups, we could change the exemplar of every member of the absorbed group to the absorbing group's exemplar. But that means touching every member of the absorbed group on every merge. Instead, we change only the absorbed exemplar so that it points to the absorbing exemplar (here, Raj's entry changes from Raj to Edgar). Looking up an exemplar then changes from a single lookup to a "chain" lookup. To find Ella-Louise's exemplar, we look her up in the table and find Raj. We then look up Raj and find Edgar. Finally, we look up Edgar and find he's his own exemplar, so Edgar is the answer.
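Here is a minimal sketch of the chain lookup and the cheap union, starting from the table after the first four friendships. The helper names `find` and `union` are hypothetical. In this sketch the caller says which gang is absorbed; the notebook code below instead decides by group size.

```python
# Table after the first four friendships; exemplars point to themselves.
parent = {
    'Jacqueline': 'Edgar', 'Abby': 'Edgar', 'Edgar': 'Edgar',
    'Ella-Louise': 'Raj', 'Raj': 'Raj',
    'Anita': 'Harlow', 'Harlow': 'Harlow',
}

def find(person):
    """Follow the chain of links until reaching someone who is their own exemplar."""
    while parent[person] != person:
        person = parent[person]
    return person

def union(absorbed, absorber):
    """Merge two gangs by re-pointing only the absorbed gang's exemplar."""
    absorbed_ex, absorber_ex = find(absorbed), find(absorber)
    if absorbed_ex != absorber_ex:
        parent[absorbed_ex] = absorber_ex

union('Raj', 'Edgar')          # only Raj's entry changes, to Edgar
print(parent['Ella-Louise'])   # Raj   (her own entry is untouched)
print(find('Ella-Louise'))     # Edgar (found via Raj -> Edgar)
```

Only one table entry changes per merge, whatever the size of the absorbed gang; the price is that `find` may now take several steps.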

Effectively, the table now has the structure below, and to find an exemplar we keep following the arrows to the right until we reach someone who points to themself.

```
Ella-Louise ---> Raj -+-> Edgar
Jacqueline -----------+
Abby -----------------+

Anita ---------> Harlow
```

The entire friendship group in the example will look like this:

```
Ella-Louise ---> Raj -+-> Edgar
Jacqueline -----------+
Abby -----------------+
Finlay ---------------+

Anita -------+-> Harlow
Caiden ------+

Bronwyn -----+-> Sanjay
Samson  -----+
Alessandra --+
```

To find the number of groups, we just count the exemplars in the table (the people who point to themselves).
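As a quick check, here are the parent links read off the final diagram above, with the exemplars counted. The variable names are illustrative only.

```python
# Parent links read off the final diagram for the example friendships.
parent = {
    'Ella-Louise': 'Raj', 'Raj': 'Edgar', 'Edgar': 'Edgar',
    'Jacqueline': 'Edgar', 'Abby': 'Edgar', 'Finlay': 'Edgar',
    'Anita': 'Harlow', 'Caiden': 'Harlow', 'Harlow': 'Harlow',
    'Bronwyn': 'Sanjay', 'Samson': 'Sanjay', 'Alessandra': 'Sanjay',
    'Sanjay': 'Sanjay',
}

# An exemplar points to themself, and each group has exactly one exemplar.
exemplars = [person for person, p in parent.items() if p == person]
print(len(exemplars))      # 3
print(sorted(exemplars))   # ['Edgar', 'Harlow', 'Sanjay']
```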

For part 2, the sizes of groups, we extend each value in the lookup table to hold a group size as well as a parent. An exemplar's size is the number of people in its group. When a group is absorbed, we add its size to the absorbing exemplar's size. The sizes also tell us which way to merge: the code below always absorbs the smaller group into the larger one, which keeps the chains short.
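A compact sketch of the size bookkeeping, run on the example friendships, follows. It keeps parents and sizes in two separate dicts (`parent` and `size`, hypothetical names) rather than the nested dicts used in the notebook code, but follows the same merge rule, including sending ties the same way (the first person's group is absorbed into the second's).

```python
friendships = [['Jacqueline', 'Edgar'], ['Ella-Louise', 'Raj'],
               ['Abby', 'Edgar'], ['Anita', 'Harlow'], ['Raj', 'Edgar'],
               ['Bronwyn', 'Sanjay'], ['Caiden', 'Anita'], ['Raj', 'Finlay'],
               ['Raj', 'Jacqueline'], ['Ella-Louise', 'Abby'],
               ['Samson', 'Sanjay'], ['Samson', 'Alessandra'],
               ['Edgar', 'Finlay'], ['Finlay', 'Jacqueline'],
               ['Bronwyn', 'Samson']]

parent = {}
size = {}  # only meaningful for exemplars

def find(person):
    """Follow parent links until reaching an exemplar."""
    while parent[person] != person:
        person = parent[person]
    return person

for this, that in friendships:
    for person in (this, that):
        if person not in parent:
            parent[person] = person  # a new gang of one
            size[person] = 1
    a, b = find(this), find(that)
    if a != b:
        # Absorb the smaller gang into the larger; on a tie, a is absorbed.
        if size[a] > size[b]:
            a, b = b, a
        parent[a] = b
        size[b] += size[a]

group_sizes = {p: size[p] for p in parent if parent[p] == p}
print(group_sizes)                 # {'Edgar': 6, 'Harlow': 3, 'Sanjay': 4}
print(max(group_sizes.values()))   # 6
```

The final links match the last diagram above: three groups, the largest being Edgar's with six people.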

In [3]:
def exemplar_of(person, groups):
    if person in groups:
        exemplar = person
        while groups[exemplar]['parent'] != exemplar:
            exemplar = groups[exemplar]['parent']
        return exemplar
    else:
        return None

In [35]:
def new_group(name, debug=False):
    if debug: print('adding new', name)
    return {'parent': name, 'size': 1}

In [44]:
debug = False

groups = {}
for this, that in friendships:
    # if need be, create a new group of size 1 for each person mentioned.
    if this not in groups:
        groups[this] = new_group(this, debug=debug)
    if that not in groups:
        groups[that] = new_group(that, debug=debug)
    # now we know we have two groups, merge them if necessary.
    # first find the two exemplars
    this_exemplar = exemplar_of(this, groups)
    that_exemplar = exemplar_of(that, groups)
    if debug: print('{} -> {} ; {} -> {}'.format(this, this_exemplar, that, that_exemplar))
    if this_exemplar != that_exemplar:
        # different groups, so need to merge
        # absorb the smaller into the larger, so find the sizes
        this_size = groups[this_exemplar]['size']
        that_size = groups[that_exemplar]['size']
        if this_size > that_size:
            # set the absorbed exemplar to be the absorbing exemplar
            groups[that_exemplar]['parent'] = this_exemplar
            # update the absorbing group's size
            groups[this_exemplar]['size'] = this_size + that_size
            if debug: print('merging {} <- {}'.format(this_exemplar, that_exemplar))
        else:
            groups[this_exemplar]['parent'] = that_exemplar
            groups[that_exemplar]['size'] = this_size + that_size
            if debug: print('merging {} -> {}'.format(this_exemplar, that_exemplar))

In [45]:
sum(1 for k, v in groups.items() if v['parent'] == k)

21

In [46]:
max(g['size'] for g in groups.values())

147