# Welcome to the Xilinx Cosine Similarity Acceleration Demo
---
This notebook demonstrates how to use the Xilinx Cosine Similarity product and shows the power of Xilinx FPGAs to accelerate Cosine Similarity.

### The demo
In this demo, we use the Xilinx Cosine Similarity module (**xilCosineSim**) to set up a population against which the similarity of target vectors can be calculated. The population is generated randomly with integer elements, and the target vector is a copy of one randomly selected population vector. The demo is structured in two sections:

1. [**Create and load Population**](#load)

2. [**Match Target Vector**](#match)

>**NOTE**: The Xilinx Cosine Similarity module requires at least ***Python 3.6***. We can check the Jupyter kernel version using the following cell:

In [1]:
from platform import python_version
print(python_version())

3.6.13
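
The same check can also be made programmatically, so that a notebook run fails early on an older kernel. This is a minimal sketch using only the standard library; the name `meets_minimum` is illustrative and not part of xilCosineSim.

```python
import sys

# True when the running interpreter is Python 3.6 or newer
meets_minimum = sys.version_info >= (3, 6)

if not meets_minimum:
    raise RuntimeError("xilCosineSim requires at least Python 3.6")
```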


### Setup
---
Let's start by importing the module and setting up the demo variables.

In [2]:
import xilCosineSim as xcs
import random as rand

The following variables set the scale of the run:
- VectorLength: number of elements in each vector
- NumVectors: number of vectors (size of the population)
- MaxValue: width of the element value range; elements are drawn from about -MaxValue/2 to MaxValue/2

In [3]:
VectorLength = 200
NumVectors = 5000
MaxValue = 16383
testVec = []
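
As a rough sanity check on these settings, the sketch below works out the element range implied by the `randint` call used when the population is filled later in this notebook, and the raw payload size of the population assuming the 4-byte element size passed to `xcs.cosinesim`. The names `lo`, `hi`, `bytes_per_element` and `population_bytes` are illustrative only, and any padding or alignment applied by the library is not accounted for.

```python
VectorLength = 200
NumVectors = 5000
MaxValue = 16383

# int() truncates toward zero, so both bounds have the same magnitude
lo, hi = int(-MaxValue / 2), int(MaxValue / 2)

# Assumption: each element is stored in 4 bytes (signed 32-bit integer)
bytes_per_element = 4
fits_in_int32 = -2**31 <= lo and hi <= 2**31 - 1

# Raw payload only: vectors x elements x bytes per element
population_bytes = NumVectors * VectorLength * bytes_per_element

print("element range:", lo, "to", hi)
print("fits in 4 bytes:", fits_in_int32)
print("population payload:", population_bytes / 1e6, "MB")
```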

The CosineSim library supports the following options, which are set on the object returned by `xcs.options()`:
- `vecLength`: number of elements in each vector
- `numDevices`: number of FPGA devices available to perform the compute (more devices lead to higher speedup)
- `xclbinPath`: path to the FPGA binary image

Here, `numDevices` and `xclbinPath` are left at their defaults: 1 device and the location of the Xilinx product installation.

In [4]:
opt = xcs.options()
opt.vecLength = VectorLength
testVecIdx = rand.randint(0, NumVectors - 1) # select one index as the test vector

Now we create the cosinesim object to operate on.

In [5]:
cs = xcs.cosinesim(opt, 4) # (options, data size in bytes)

### Create and load Population <a id="load"></a>
---
We start by calling `startLoadPopulation` with the number of vectors in the population.

In [6]:
cs.startLoadPopulation(NumVectors)

Next we create random data and load it one vector at a time. Any numerical embeddings can be used to create this data, rendering the acceleration library useful in a wide variety of use cases.

The `getPopulationVectorBuffer` API returns a buffer that can be filled with the elements of one vector. This is followed by a `finishCurrentPopulationVector` call that marks that vector as finished. When all the vectors are created, `finishLoadPopulationVectors` is called to finish loading the population data into FPGA-accessible HBM.

This finishes the **one-time** population creation process.

In [7]:
# Create vector embeddings
for vecNum in range(NumVectors):
    # Get a vector
    vecBuf = cs.getPopulationVectorBuffer(vecNum)

    # Fill the vector
    for vecIdx in range(VectorLength):
        val = rand.randint(int(-MaxValue/2), int(MaxValue/2))
        vecBuf.fill(val)  # fill cpp managed memory
        if vecNum == testVecIdx:
            testVec.append(val)

    # Finish filling the vector
    cs.finishCurrentPopulationVector(vecBuf)

# Finishing embedding creation
cs.finishLoadPopulationVectors()
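
The population above is random, but real embeddings are often floating-point. One possible way to prepare them for an integer population is to scale each vector into the integer range used in this demo. The sketch below is an assumption, not something the library documents here: `quantize_embedding` and `max_abs_int` are hypothetical names, and no xilCosineSim call is involved. Because cosine similarity is unchanged when a vector is multiplied by a positive constant, per-vector scaling changes the similarity only through rounding error.

```python
def quantize_embedding(embedding, max_abs_int=8191):
    """Scale a float vector so its largest magnitude maps to max_abs_int,
    then round each element to the nearest integer."""
    peak = max(abs(x) for x in embedding)
    if peak == 0:
        # An all-zero vector has no direction; keep it as zeros
        return [0] * len(embedding)
    scale = max_abs_int / peak
    return [int(round(x * scale)) for x in embedding]

# Illustrative input only
example = quantize_embedding([0.5, -1.0, 0.25])
print(example)
```

The resulting integers could then be written into a population vector buffer in place of the random values generated above.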

### Match Target Vector <a id="match"></a>
---

In this part we show how a user can call the `matchTargetVector` API to get the most similar vectors/embeddings from the loaded population. Here we ask for the **10** best matches for our target vector.

In [8]:
result = cs.matchTargetVector(10, testVec)
print("Results:")
print("Similarity   Vector #")
print("----------   --------")
for item in result:
    print('{:.6f}'.format(item.similarity) + "       " + str(item.index))

Results:
Similarity   Vector #
----------   --------
1.000000       2833
0.279117       1149
0.257780       3174
0.248899       3345
0.232691       681
0.228679       3363
0.221258       2267
0.218737       1018
0.218032       3618
0.210160       2100
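
The top match has a similarity of 1.000000, which is what we would expect when the target vector is a copy of a population vector. To make the metric concrete, here is a minimal pure-Python reference for cosine similarity and top-k matching. It is a sketch for cross-checking on a small population, not the algorithm the FPGA implementation uses, and the names `cosine_similarity`, `top_k_matches` and the `ref_*` variables are hypothetical.

```python
import math
import random

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)

def top_k_matches(population, target, k):
    """Return the k (similarity, index) pairs with the highest similarity."""
    scored = [(cosine_similarity(vec, target), idx)
              for idx, vec in enumerate(population)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Small random population; the target is a copy of vector 7
rng = random.Random(0)
ref_population = [[rng.randint(-8191, 8191) for _ in range(16)]
                  for _ in range(50)]
ref_target = list(ref_population[7])
ref_top = top_k_matches(ref_population, ref_target, 3)

for similarity, index in ref_top:
    print('{:.6f}'.format(similarity) + "       " + str(index))
```

As in the accelerated run, the vector that the target was copied from comes back first with a similarity of 1, and the remaining matches follow in descending order.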
