# Tutorial 5

In this tutorial we build a lookup table for a person's name, a string variable that consists of several components: first name, middle name, and last name.

We use the _metaphone_ library, which may not be installed in your environment. If you work in a conda Python environment, you may have to install the package with the _pip_ command, as described here:

* https://www.puzzlr.org/install-packages-pip-conda-environment/

## Introduction

Suppose that the person is called _Peter Alfred Escher_ and we got that name via an official API.

We use the standard library module _itertools_ to calculate our permutations, and we need the _doublemetaphone_ function from the _metaphone_ package.

In [1]:
import itertools
from metaphone import doublemetaphone

Suppose we have stored the fully qualified name in one of our variables, together with a unique identifier for this person.

In [2]:
person_qualified_name = "Peter Alfred Escher"
person_id = "A123"

As a refresher, we use the _doublemetaphone_ algorithm to generate a tuple of keys that can be used for the comparison. For background, refer to the tutorial __[Medium Towards Data Science Tutorial](https://towardsdatascience.com/python-tutorial-fuzzy-name-matching-algorithms-7a6f43322cc5)__.

In [3]:
tp = doublemetaphone("Peter Alfred Escher")
str(tp)

"('PTRLFRTXR', 'PTRLFRTSKR')"

We now want to calculate all permutations of this name that could be used in a Twitter account name. Our person may well omit some components in his Twitter account name: he could use the name "Peter Escher" or "Peter A. Escher".
As a first step we calculate all permutations of the three name components:

In [4]:
name_list = list( itertools.permutations(["peter", "alfred", "escher"]))
print(name_list)

[('peter', 'alfred', 'escher'), ('peter', 'escher', 'alfred'), ('alfred', 'peter', 'escher'), ('alfred', 'escher', 'peter'), ('escher', 'peter', 'alfred'), ('escher', 'alfred', 'peter')]
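
Instead of typing the components by hand, they could be derived from the qualified name itself. The following is a minimal sketch, assuming that the components are separated by whitespace and that lowercasing them is acceptable; the helper name `name_components` is hypothetical and not part of the original tutorial.

```python
import itertools

def name_components(qualified_name):
    """Split a full name into lowercase components (hypothetical helper)."""
    # str.split() without arguments splits on any run of whitespace
    return qualified_name.lower().split()

components = name_components("Peter Alfred Escher")
name_list = list(itertools.permutations(components))
print(len(name_list))  # 3! = 6 orderings of three components
```

The resulting list is the same as the one printed above, so the following cells work unchanged with it.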


## Build up a Lookup Dictionary
We now build up a lookup dictionary. We concatenate each name component combination into a single string, generate its metaphone tuple, and add the keys to our dictionary: the first key of the tuple goes into the first dictionary, the second key into the second dictionary. As the value we store our unique person identifier. For this we define a function.

In [5]:
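# lookup_dict[0] holds the first metaphone key, lookup_dict[1] the second one
lookup_dict = [{}, {}]

def addPermutationsToDictionary(name_list, person_id):
    for name_component in name_list:
        # Concatenate the components without a separator, e.g. 'peteralfredescher'
        name = ''.join(name_component)
        metaphone_tuple = doublemetaphone(name)
        # Store the identifier in a list so that a key can hold several persons later
        lookup_dict[0][metaphone_tuple[0]] = [person_id]
        lookup_dict[1][metaphone_tuple[1]] = [person_id]

addPermutationsToDictionary(name_list, person_id)
print("Our dictionary: " + str(lookup_dict))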

Our dictionary: [{'PTRLFRTXR': ['A123'], 'PTRXRLFRT': ['A123'], 'ALFRTPTRXR': ['A123'], 'ALFRTXRPTR': ['A123'], 'AXRPTRLFRT': ['A123'], 'AXRLFRTPTR': ['A123']}, {'PTRLFRTSKR': ['A123'], 'PTRSKRLFRT': ['A123'], 'ALFRTPTRSKR': ['A123'], 'ALFRTSKRPTR': ['A123'], 'ASKRPTRLFRT': ['A123'], 'ASKRLFRTPTR': ['A123']}]


As the output above shows, we store the value (the person identifier) in a list. We do this because a second name combination may map to the same key, so we have to make sure that we can store multiple person identifiers per key.

Perhaps our user left out a name component and registered on Twitter with the account name _Peter Escher_ or _Escher Peter_. So let's also generate all of these permutations. This is easily achieved by passing an additional argument (in our case 2) to the _permutations_ function, which specifies the number of elements in each permutation.

In [6]:
name_list = list( itertools.permutations(["peter", "alfred", "escher"],2))
print(name_list)

[('peter', 'alfred'), ('peter', 'escher'), ('alfred', 'peter'), ('alfred', 'escher'), ('escher', 'peter'), ('escher', 'alfred')]


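Let's add these permutations to our lookup dictionary. Note that some combinations produce an empty second key, which is why the second dictionary in the output contains the key ''.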

In [7]:
addPermutationsToDictionary(name_list, person_id)
print("Our dictionary: " + str(lookup_dict))

Our dictionary: [{'PTRLFRTXR': ['A123'], 'PTRXRLFRT': ['A123'], 'ALFRTPTRXR': ['A123'], 'ALFRTXRPTR': ['A123'], 'AXRPTRLFRT': ['A123'], 'AXRLFRTPTR': ['A123'], 'PTRLFRT': ['A123'], 'PTRXR': ['A123'], 'ALFRTPTR': ['A123'], 'ALFRTXR': ['A123'], 'AXRPTR': ['A123'], 'AXRLFRT': ['A123']}, {'PTRLFRTSKR': ['A123'], 'PTRSKRLFRT': ['A123'], 'ALFRTPTRSKR': ['A123'], 'ALFRTSKRPTR': ['A123'], 'ASKRPTRLFRT': ['A123'], 'ASKRLFRTPTR': ['A123'], '': ['A123'], 'PTRSKR': ['A123'], 'ALFRTSKR': ['A123'], 'ASKRPTR': ['A123'], 'ASKRLFRT': ['A123']}]


Finally, our user may have used just one of his name components when registering with Twitter.

In [8]:
name_list = list( itertools.permutations(["peter", "alfred", "escher"],1))
print(name_list)

[('peter',), ('alfred',), ('escher',)]


Let's also add these permutations to our dictionary.

In [9]:
addPermutationsToDictionary(name_list, person_id)
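
The three separate calls for lengths 3, 2 and 1 could also be combined into a single step. The following is a minimal sketch using only _itertools_; the helper name `all_name_permutations` is hypothetical and not part of the original tutorial.

```python
import itertools

def all_name_permutations(components):
    """Return every ordering of every non-empty selection of components."""
    # Lengths run from the full name down to a single component
    lengths = range(len(components), 0, -1)
    return list(itertools.chain.from_iterable(
        itertools.permutations(components, r) for r in lengths
    ))

all_permutations = all_name_permutations(["peter", "alfred", "escher"])
print(len(all_permutations))  # 6 + 6 + 3 = 15 tuples
```

The combined list can be passed to `addPermutationsToDictionary` in one call instead of three.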

Let's refactor our function: we now also have to check whether a lookup key already exists and, if it does, append the identifier instead of overwriting the list.

In [10]:
def addPermutationsToDictionary(name_list, person_id):
    for name_component in name_list:
        name = ''.join(name_component)
        metaphone_tuple = doublemetaphone(name)
        # First key: append to an existing entry, otherwise create a new list
        if metaphone_tuple[0] in lookup_dict[0]:
            lookup_dict[0][metaphone_tuple[0]].append(person_id)
        else:
            lookup_dict[0][metaphone_tuple[0]] = [person_id]
        # Second key: same logic in the second dictionary
        if metaphone_tuple[1] in lookup_dict[1]:
            lookup_dict[1][metaphone_tuple[1]].append(person_id)
        else:
            lookup_dict[1][metaphone_tuple[1]] = [person_id]


addPermutationsToDictionary(["Peter"], "A555")
print("Our dictionary: " + str(lookup_dict))

Our dictionary: [{'PTRLFRTXR': ['A123'], 'PTRXRLFRT': ['A123'], 'ALFRTPTRXR': ['A123'], 'ALFRTXRPTR': ['A123'], 'AXRPTRLFRT': ['A123'], 'AXRLFRTPTR': ['A123'], 'PTRLFRT': ['A123'], 'PTRXR': ['A123'], 'ALFRTPTR': ['A123'], 'ALFRTXR': ['A123'], 'AXRPTR': ['A123'], 'AXRLFRT': ['A123'], 'PTR': ['A123', 'A555'], 'ALFRT': ['A123'], 'AXR': ['A123']}, {'PTRLFRTSKR': ['A123'], 'PTRSKRLFRT': ['A123'], 'ALFRTPTRSKR': ['A123'], 'ALFRTSKRPTR': ['A123'], 'ASKRPTRLFRT': ['A123'], 'ASKRLFRTPTR': ['A123'], '': ['A123', 'A555'], 'PTRSKR': ['A123'], 'ALFRTSKR': ['A123'], 'ASKRPTR': ['A123'], 'ASKRLFRT': ['A123'], 'ASKR': ['A123']}]


As the example above shows, two keys now point to two person identifiers, ['A123', 'A555']: the key 'PTR' in the first dictionary and the empty key '' in the second.
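
The same bookkeeping could be written more compactly with `collections.defaultdict`, and the finished index can then be queried. The following is only a sketch under stated assumptions: it skips empty keys (such as the '' entry in the output above) and does not store the same identifier twice. To stay runnable without third-party packages, the key function is passed in as a parameter; `toy_keys` is a made-up stand-in that only mimics the two-element tuple returned by _doublemetaphone_ and is not a phonetic algorithm. The names `add_names` and `find_person_ids` are hypothetical as well.

```python
from collections import defaultdict

def toy_keys(name):
    """Stand-in for doublemetaphone: returns a (first, second) key tuple.

    This is NOT a phonetic algorithm; it only mimics the return shape.
    """
    return (name.upper(), "")

def add_names(index, name_list, person_id, key_func):
    """Register person_id under every non-empty key of every name."""
    for name_components in name_list:
        keys = key_func("".join(name_components))
        for slot, key in enumerate(keys):
            # Skip empty keys and avoid storing the same id twice
            if key and person_id not in index[slot][key]:
                index[slot][key].append(person_id)

def find_person_ids(index, name, key_func):
    """Return the ids stored under any key of the given name."""
    found = []
    for slot, key in enumerate(key_func(name)):
        # .get() avoids creating empty entries in the defaultdict
        for person_id in index[slot].get(key, []):
            if person_id not in found:
                found.append(person_id)
    return found

index = [defaultdict(list), defaultdict(list)]
add_names(index, [("peter",), ("peter", "escher")], "A123", toy_keys)
add_names(index, [("Peter",)], "A555", toy_keys)
print(find_person_ids(index, "peter", toy_keys))
```

In the setting of this tutorial one would pass `doublemetaphone` as `key_func` in place of the stand-in.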