# Name Screening

## Name Search

## Table of Contents <a class="anchor" id="toc"></a>

1. [Function Definitions](#func-defs)
    1. [Screening Solutions](#first-func-def)
2. [Loading dataset](#load-data)
3. [Querying dataset](#query-data)

## Libraries

In [None]:
from platform import python_version
print("Python Version:", python_version())

import warnings
#warnings.filterwarnings(action='once')
warnings.filterwarnings('ignore')

# pip install abydos python-Levenshtein

import re
import os
import time
import numpy as np
import pandas as pd
from datetime import datetime

from abydos import phonetic, distance
from Levenshtein import ratio as lev_ratio
from Levenshtein import setratio as lev_setratio

# 1. Function Definitions <a class="anchor" id="func-defs"></a>

Go to [Table of Contents](#toc)

## 1.1. Function Definition - All name screening solutions <a class="anchor" id="first-func-def"></a>

Go to [Table of Contents](#toc)

The function provides **THREE** options, selected with `sol_type`:
1. *(DEFAULT)* Using BOTH fuzzy matching on names AND the Levenshtein ratio on the phonemes of the names for comparison ***(Proposed approach 1)***
2. Using EITHER fuzzy matching on names OR the Levenshtein ratio on the phonemes of the names for comparison ***(Proposed approach 2)***
3. Using ONLY the Levenshtein ratio on the phonemes of the names for comparison
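
The way the two scores are combined under each `sol_type` value accepted by the function can be sketched in isolation. This is a minimal illustration rather than the notebook's own code: the helper name `passes` is hypothetical, and its default thresholds are simply copied from `THRES`.

```python
def passes(metric1, metric2, sol_type=1, thres=(0.56, 0.55)):
    """Hypothetical helper: apply the match rule for one option.

    metric1 is the name score, metric2 the phoneme score.
    """
    name_ok = metric1 >= thres[0]
    phoneme_ok = metric2 >= thres[1]
    if sol_type == 1:      # both scores must pass
        return name_ok and phoneme_ok
    if sol_type == 2:      # either score may pass
        return name_ok or phoneme_ok
    if sol_type == 3:      # only the phoneme score counts
        return phoneme_ok
    raise ValueError("sol_type must be 1, 2, or 3")


# A candidate that is close in spelling but weak phonetically
print(passes(0.60, 0.40, sol_type=1))  # False
print(passes(0.60, 0.40, sol_type=2))  # True
print(passes(0.60, 0.40, sol_type=3))  # False
```

Option 1 is the strictest rule and option 2 the most permissive, so option 2 returns at least as many candidates as either of the others for the same thresholds.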

In [None]:
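# Default screening option (1, 2, or 3)
SOL_TYPE = 1

# FUNC[0] scores the raw names; FUNC[1] scores pairs of phoneme codes
FUNC = [lev_setratio, lev_ratio]
# THRES[i] is the minimum score that FUNC[i] must reach to count as a match
THRES = [0.56, 0.55]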

In [None]:
def solutions(name, sol_type=SOL_TYPE, db=None, func=FUNC, thres=THRES):
    """Screen `name` against `db` and return the matching rows.

    sol_type 1: both metrics must pass; 2: either metric; 3: phoneme metric only.
    """
    print()
    if sol_type not in (1, 2, 3):
        print('Invalid Option! Choose from 1 to 3')
        return None

    # Resolve the default at call time: `names` is only loaded in Section 2,
    # so `db=names` in the signature would raise a NameError when this cell runs.
    if db is None:
        db = names

    start_time = time.time()

    # Primary Double Metaphone code of each token in the query name
    encoder = phonetic.DoubleMetaphone()
    pn = [encoder.encode(t)[0] for t in name.split()]

    matches = []
    for _, row in db.iterrows():
        # metric1: similarity of the lower-cased names
        metric1 = func[0](row['Name'].lower(), name.lower())
        # metric2: for each query phoneme take its best score against the
        # row's phonemes, then average over the query phonemes
        dist_score = [max((func[1](i, j) for j in row['Phonemes']), default=0.0)
                      for i in pn]
        metric2 = np.mean(dist_score) if dist_score else 0.0

        if sol_type == 1:
            condition = metric1 >= thres[0] and metric2 >= thres[1]
        elif sol_type == 2:
            condition = metric1 >= thres[0] or metric2 >= thres[1]
        else:  # sol_type == 3
            condition = metric2 >= thres[1]

        if condition:
            matches.append({'Name': row['Name'],
                            'Source': row['List'],
                            'Timestamp': row['Timestamp'],
                            'Lev_Set_Ratio': metric1,
                            'Lev_Ratio': metric2})

    fin_time = np.round(time.time() - start_time, 2)
    print(f"--- Execution Time: {fin_time:,} seconds ---")

    # DataFrame.append was removed in pandas 2.0, so build the frame in one go;
    # naming the columns keeps an empty result indexable.
    columns = ['Name', 'Source', 'Timestamp']
    results = pd.DataFrame(matches, columns=columns + ['Lev_Set_Ratio', 'Lev_Ratio'])
    results = results.sort_values(['Lev_Ratio', 'Lev_Set_Ratio'], ascending=False)
    results = results.reset_index(drop=True)

    print("Number of same/similar names:", results.shape[0])

    # regex=False: treat the query as literal text, not as a regular expression
    check = results[results['Name'].str.contains(name, regex=False)]

    if not check.empty:
        print("\nNAME FOUND!")
        display(check)
        print("\nAlso displaying other close matches!")
    else:
        print("\nEXACT NAME NOT FOUND!")
        print("Showing list of closest matches")
    display(results[columns])

    # Return the matches in both cases, not only when no exact match exists
    return results[columns]
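
The phoneme score in `solutions` is the mean, over the query's phoneme codes, of each code's best similarity to any phoneme code of the candidate. The sketch below reproduces only that aggregation step. It is an illustration under stated assumptions: `phoneme_score` and `similarity` are hypothetical names, `difflib.SequenceMatcher` stands in for `Levenshtein.ratio` (the two use different formulas, so the scores will not match the notebook's), and the phoneme codes are made up rather than computed with Double Metaphone.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for Levenshtein.ratio (assumption: a different formula)."""
    return SequenceMatcher(None, a, b).ratio()


def phoneme_score(query_codes, candidate_codes, sim=similarity):
    """Mean, over query codes, of the best similarity to any candidate code."""
    if not query_codes or not candidate_codes:
        return 0.0
    best = [max(sim(q, c) for c in candidate_codes) for q in query_codes]
    return sum(best) / len(best)


# Made-up phoneme codes, for illustration only
print(phoneme_score(["TF", "MSNK"], ["MSNK", "TF"]))  # 1.0
print(phoneme_score(["TF", "MSNK"], ["TF"]))          # 0.5
```

Because every query code is paired with its best-scoring candidate code, the score does not depend on the order of the tokens, so a name written surname-first can still score 1.0. The score is also not symmetric: codes in the candidate that match nothing in the query do not lower it.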



# 2. Load Data <a class="anchor" id="load-data"></a>

Go to [Table of Contents](#toc)

In [None]:
names = pd.read_pickle('Final_Names_wo_Random.pkl')
names
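
The screening function defined in Section 1.1 reads four columns from `names`: `Name`, `List`, `Timestamp`, and `Phonemes`. A quick check before querying could look like the sketch below; the helper `missing_columns` and the constant `REQUIRED_COLUMNS` are hypothetical names introduced only for this illustration.

```python
REQUIRED_COLUMNS = ("Name", "List", "Timestamp", "Phonemes")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Hypothetical helper: list the required columns absent from `columns`."""
    present = set(columns)
    return [c for c in required if c not in present]


print(missing_columns(["Name", "List", "Timestamp", "Phonemes"]))  # []
print(missing_columns(["Name", "List"]))  # ['Timestamp', 'Phonemes']
```

In the notebook this would be called as `missing_columns(names.columns)`; an empty list means the dataset has every column the function expects.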

# 3. Query Data <a class="anchor" id="query-data"></a>

Go to [Table of Contents](#toc)

In [None]:
# Sample names to try:
# Faris Chiabhi
# Dave Mazengo
# Vladmir Terentev
# Abd El Kader Sabra
# Andy Gimmatove

In [None]:
name = input("Enter name to be searched: ")
res = solutions(name)
