# Welcome to Xilinx Fuzzy Match Acceleration Demo
---
This notebook demonstrates how to use the Xilinx Fuzzy Match product and shows how Xilinx Alveo FPGA cards can accelerate fuzzy matching.

### The demo
In this demo, we will use the Xilinx Fuzzy Match python module (**xilFuzzyMatchPython**) to match new names against reference names and retrieve matches within a similarity threshold specified by the user.

>**NOTE**: The Xilinx Fuzzy Match module requires at least ***Python 3.6***.
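To sanity-check accelerated results on a CPU, the matching task described above can be sketched in plain Python. This is a minimal sketch under a stated assumption: it scores each pair with a Levenshtein (edit-distance) similarity scaled to 0-100, which is not necessarily the scoring used by the Xilinx library, so its scores may differ from the FPGA results. The helpers `levenshtein`, `similarity`, and `fuzzy_match` are hypothetical and are not part of **xilFuzzyMatchPython**.

```python
# Hypothetical CPU reference for fuzzy matching; NOT the xilFuzzyMatchPython algorithm.
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> int:
    """Assumed 0-100 score: 100 * (1 - distance / length of the longer string), rounded."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are identical
    return round(100 * (1 - levenshtein(a, b) / longest))


def fuzzy_match(queries: List[str], refs: List[str],
                threshold: int) -> List[List[Tuple[int, int]]]:
    """For each query, return (reference index, score) pairs whose score >= threshold."""
    results = []
    for q in queries:
        matches = []
        for idx, ref in enumerate(refs):
            score = similarity(q, ref)
            if score >= threshold:
                matches.append((idx, score))
        results.append(matches)
    return results
```

Each query is compared against every reference name, so the work grows with the product of the two list sizes. The returned `(reference index, score)` pairs have the same shape that the display cell in this notebook reads from `result_list`.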

### Setup
---
Let's start by importing the module and setting up the demo variables.

In [1]:
import xilFuzzyMatchPython as xfm
import os
import os.path
import sys
import argparse
import pandas as pd
import numpy as np
from timeit import default_timer as timer

The run is set up with the following options:
- xclbin_path: full path to the Alveo FPGA device executable
- deviceNames: full name of the Alveo board to run on (supported devices: Alveo U50, Alveo U55C, and AWS F1)
- in_dir: directory containing the data files

These variables are read from the environment variables set by the *run* script used to launch the Jupyter server.
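If the notebook is started without the *run* script, `os.environ.get` returns `None` and the problem only surfaces later, for example when `in_dir` is concatenated with a file name. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of **xilFuzzyMatchPython**.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail immediately if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            "Environment variable {} is not set; launch the notebook "
            "through the run script".format(name))
    return value
```

With it, the next cell could read `xclbin_path = require_env('XCLBIN_FILE')`, and likewise for `DEV_NAME` and `DATA_DIR`.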

In [2]:
xclbin_path = os.environ.get('XCLBIN_FILE')
deviceNames = os.environ.get('DEV_NAME')
in_dir = os.environ.get('DATA_DIR')

In [3]:
print(xclbin_path, deviceNames, in_dir)

/opt/xilinx/apps/graphanalytics/fuzzymatch/0.2/xclbin/fuzzy_xilinx_u50_gen3x16_xdma_201920_3.xclbin xilinx_u50_gen3x16_xdma_201920_3 /proj/gdba/ywu/ghe/graphanalytics/fuzzymatch/staging/examples/python/../data


In [4]:
# Create the FuzzyMatch options and point them at the xclbin and the target device
opt = xfm.options()
opt.xclbinPath = xfm.xString(xclbin_path)
opt.deviceNames = xfm.xString(deviceNames)

### Load Data and initialize compute

In [6]:
# Load data: reference names and the new names to match against them
peopleFile = in_dir + "/ref-names.csv"
trans_num = 100
test_input = in_dir + "/new-names.csv"
stats = pd.read_csv(test_input, delimiter=',', names=['Id', 'Name'])
peopleVecs = pd.read_csv(peopleFile, delimiter=',', names=['Id', 'Name'])

totalEntities = 10000000

# Drop the first row of each file (the CSV header line)
stats = stats.iloc[1:]
peopleVecs = peopleVecs.iloc[1:]
peopleVec = peopleVecs[['Name']]
data_vec = stats[['Name']]

# Reference names to load onto the card; inputId is left empty in this demo
inputVec = []
inputId = []
print(len(peopleVec['Name']))
# Row labels run from 1 to len(...), so range(1, len(...)) skips the last reference name
for idx in range(1, len(peopleVec['Name'])):
    inputVec.append(peopleVec['Name'][idx])

842
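The loading cell reads each CSV with explicit column names and then drops the first row with `.iloc[1:]`. Assuming each file starts with a single header line, which is what those `.iloc[1:]` calls suggest, the same step can be sketched more compactly with pandas. `load_names` is a hypothetical helper used only for illustration.

```python
import io

import pandas as pd


def load_names(source) -> list:
    """Read an Id,Name CSV whose first line is a header and return the Name column as a list."""
    # header=0 consumes the file's own header row and names= replaces it,
    # equivalent to names=[...] followed by .iloc[1:]
    df = pd.read_csv(source, header=0, names=["Id", "Name"])
    return df["Name"].tolist()


# Usage with an in-memory file standing in for ref-names.csv
sample = io.StringIO("Id,Name\n1,STRINGFELLOW\n2,COLLINSWORTH\n")
print(load_names(sample))
```

Unlike the `range(1, len(...))` loop above, `tolist()` keeps every row, including the last one.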


Now let's create the Xilinx FuzzyMatch object using the options we just created.

In [7]:
#create fuzzymatch object
mchecker = xfm.FuzzyMatch(opt)

Next, the Alveo card needs to be prepared for the Fuzzy Match run. This step establishes a connection with the Alveo device and, if the FPGA is not already programmed, loads the executable binary (the xclbin) onto it.

In [8]:
# initialize the FPGA device for Fuzzy Match run
stat_check=mchecker.startFuzzyMatch()

INFO: Found requested device: xilinx_u50_gen3x16_xdma_201920_3 ID=2
INFO: Start Fuzzy Match on xilinx_u50_gen3x16_xdma_201920_3
INFO: found device=xilinx_u50_gen3x16_xdma_201920_3
INFO: fuzzy_kernel has 2 CU(s)


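### Execute FuzzyMatch
---
Finally, the computation is executed through the API calls below: `fuzzyMatchLoadVec` loads the reference names, and `executefuzzyMatch` matches the input names against them using the similarity `threshold`. These calls transfer (DMA) data to the Alveo card's HBM and run the Fuzzy Match algorithm on the FPGA. A timer around `executefuzzyMatch` measures how quickly the Xilinx Fuzzy Match library returns its results, reported as the average time per input string.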
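The timer in the cell below brackets only the `executefuzzyMatch` call, so loading the reference names with `fuzzyMatchLoadVec` is not included, and the per-string figure is the batch time divided by the number of input strings rather than a per-call latency. The same measurement can be factored into small helpers; `timed_call` and `average_ms_per_item` are hypothetical names used only for this sketch.

```python
from timeit import default_timer as timer


def timed_call(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock milliseconds)."""
    start = timer()
    result = fn(*args)
    elapsed_ms = (timer() - start) * 1000
    return result, elapsed_ms


def average_ms_per_item(elapsed_ms: float, n_items: int) -> float:
    """Batch time divided by the number of items; returns 0.0 for an empty batch."""
    return elapsed_ms / n_items if n_items else 0.0
```

In the notebook, the timed lines would become `result_list, timeTaken = timed_call(mchecker.executefuzzyMatch, test_transaction, threshold)`.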

In [10]:
# Execute Fuzzy Match on the FPGA
threshold = 90  # similarity threshold for reporting a match

# Load the reference data
stat_check = mchecker.fuzzyMatchLoadVec(inputVec, inputId)

# Create the input patterns (as above, range(1, len(...)) skips the last row)
test_transaction = []
print('the size of data_vec', len(data_vec))

for idx in range(1, len(data_vec)):
    test_transaction.append(data_vec['Name'][idx])

# Run Fuzzy Match on all input patterns in one batch and time the call
start = timer()
result_list = mchecker.executefuzzyMatch(test_transaction, threshold)
end = timer()
timeTaken = (end - start) * 1000  # milliseconds


INFO: FuzzyMatchImpl::fuzzyMatchLoadVec vec_pattern size=841
the size of data_vec 259
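The matches can also be collected into a pandas table for filtering or export. This sketch assumes, as the display cell does, that `result_list[idx]` holds `(reference index, score)` items for `test_transaction[idx]`; `results_to_frame` is a hypothetical helper.

```python
import pandas as pd


def results_to_frame(queries, refs, results):
    """Flatten per-query match lists into one row per (query, match, score).

    Assumes results[i] is a sequence of items where item[0] indexes refs
    and item[1] is the score, mirroring how the display cell reads them.
    """
    rows = []
    for i, query in enumerate(queries):
        matches = results[i]
        if not matches:
            # keep unmatched queries so the table covers every input
            rows.append({"query": query, "match": None, "score": None})
            continue
        for item in matches:
            rows.append({"query": query, "match": refs[item[0]], "score": item[1]})
    return pd.DataFrame(rows, columns=["query", "match", "score"])
```

For example, `results_to_frame(test_transaction, inputVec, result_list).drop_duplicates()` collapses rows that repeat the same query, match, and score, such as the doubled entries in the printed output.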


### Display FuzzyMatch Result

In [11]:
for idx in range(len(test_transaction)):
    print(test_transaction[idx], "---> ", end=' ')
    if not result_list[idx]:
        print("no match")
    else:
        # each item is (index into inputVec, similarity score)
        for item in result_list[idx]:
            print('{', inputVec[item[0]], ':', item[1], '}', end=';  ')
        print()

print('Average time taken per string', '{:.3f}'.format(timeTaken / len(test_transaction)), '\n')

HJGGJNCOTHAM --->  no match
STRINGFELMOW --->  { STRINGFELLOW : 92 };  { STRINGFELLOW : 92 };  
VAOLBNDINGHAM --->  no match
COLLINSWQRTH --->  { COLLINSWORTH : 92 };  { COLLINSWORTH : 92 };  
OSHAUGHNETSY --->  { OSHAUGHNESSY : 92 };  { OSHAUGHNESSY : 92 };  
HOLLANESWORUH --->  no match
FLECKENTTEIO --->  no match
PAPADOPOVLOT --->  no match
FENSTERMACHFR --->  { FENSTERMACHER : 93 };  { FENSTERMACHER : 93 };  
XITHERINGTON --->  { WITHERINGTON : 92 };  { WITHERINGTON : 92 };  
PGANOENSTIEL --->  no match
FENSTFRNAKER --->  no match
WGISENBERGER --->  { WEISENBERGER : 92 };  { WEISENBERGER : 92 };  
SUEFFENSMEIES --->  no match
SCHSECKEOGOTT --->  no match
LICHUENBERGER --->  { LICHTENBERGER : 93 };  { LICHTENBERGER : 93 };  
VILLAVIDENDIO --->  no match
RICKENBBCKES --->  no match
HQLLINGWORUH --->  no match
HIGGEOBOTHAM --->  { HIGGENBOTHAM : 92 };  { HIGGENBOTHAM : 92 };  
COLMINGSWORTH --->  { COLLINGSWORTH : 93 };  { COLLINGSWORTH : 93 };  
BLANKENBBLER --->  no match
KNAQPENBER