# Build a 100% locally hosted Duplicate Document Finder
Using Faiss (Facebook AI Similarity Search), a library for efficient similarity search

## Use cases
- Duplicate invoice, expense, voucher, and receipt finder
    - Fraud detection
    - Automated document sorting and de-duplication
    - Automated data reconciliation
    - Three-way document matching for audits, to find exceptions (request, pay, voucher, invoice, receipt)

## Objective
In this notebook, we'll learn the fundamentals of Faiss and use it for vector search over a large dataset. In a later section, we'll implement vector search on a real-life production dataset.

Faiss is a library for efficient similarity search and clustering of dense vectors.

Meta's claim: "We've built nearest-neighbor search implementations for billion-scale data sets that are some 8.5x faster than the previous reported state-of-the-art, along with the fastest k-selection algorithm on the GPU known in the literature."
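To make "similarity search over dense vectors" concrete, here is a minimal sketch of an exact nearest-neighbor lookup. It assumes `faiss-cpu` is installed and uses `faiss.IndexFlatL2`; if the import fails, it falls back to an equivalent brute-force NumPy computation. The helper name `exact_search` and the random vectors are illustrative assumptions, not part of the notebook's dataset.

```python
import numpy as np

try:
    import faiss  # provided by the faiss-cpu package
except ImportError:
    faiss = None  # fall back to a NumPy brute-force search below


def exact_search(vectors, queries, k):
    """Return (squared L2 distances, row ids) of the k nearest stored vectors per query."""
    vectors = np.ascontiguousarray(vectors, dtype=np.float32)
    queries = np.ascontiguousarray(queries, dtype=np.float32)
    if faiss is not None:
        index = faiss.IndexFlatL2(vectors.shape[1])  # exact index, needs no training
        index.add(vectors)
        return index.search(queries, k)
    # Fallback: compare every query against every stored vector.
    diffs = queries[:, None, :] - vectors[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    ids = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(sq_dists, ids, axis=1), ids


rng = np.random.default_rng(0)
stored = rng.random((1000, 32), dtype=np.float32)  # synthetic stand-ins for document vectors
queries = stored[:3].copy()  # exact copies of stored rows, i.e. "duplicates"
distances, ids = exact_search(stored, queries, k=4)
```

Each row of `ids` lists the closest stored vectors for one query, nearest first. A duplicate shows up as a first-column distance at or very near zero.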

## Process flow

```mermaid
stateDiagram-v2
        direction LR
        [*] --> User_Query
        User_Query --> Conversation_AI_Agent
        Conversation_AI_Agent --> SQL_DB
        SQL_DB --> Conversation_AI_Agent
        Conversation_AI_Agent --> RAG_VectorDB
        RAG_VectorDB --> Conversation_AI_Agent
        [*] --> File_Drop
        File_Drop --> PyPDF
        File_Drop --> Tesseract
        File_Drop --> AzureDocementService
        File_Drop --> OracleVisionAI
        File_Drop --> OtherVisionAPI
        PyPDF --> QC
        Tesseract --> QC
        AzureDocementService --> QC
        OracleVisionAI --> QC
        OtherVisionAPI --> QC
        QC --> QC_AI_Agent
        QC_AI_Agent --> QC
        QC_AI_Agent --> RAG_VectorDB
        QC_AI_Agent --> SQL_DB
        QC_AI_Agent --> [*]
        Conversation_AI_Agent --> [*]

%% Define classes for coloring
    classDef red fill:#ff8,stroke:#333,stroke-width:2px;
    classDef green fill:#8fa,stroke:#333,stroke-width:2px;
    classDef blue fill:#8af,stroke:#333,stroke-width:2px;
    classDef orange fill:#f92,stroke:#333,stroke-width:2px;
    classDef brown fill:#e6f,stroke:#333,stroke-width:2px;
    classDef neil fill:#1ff,stroke:#333,stroke-width:2px;

    %% Apply classes to states
    class User_Query green
    class Conversation_AI_Agent orange
    class Tesseract brown
    class SQL_DB blue
    class File_Drop green
    class RAG_VectorDB blue
    class PyPDF brown
    class QC_AI_Agent orange
    class QC red
    class AzureDocementService brown
    class OracleVisionAI brown
    class OtherVisionAPI brown
```

# Code

```python
# !pip install faiss-cpu
```

```python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
```

# Create an app to handle file uploads and dynamically search the Faiss index

1. Create a chatbot-like app
2. Handle image uploads (crop and resize each image to the dimensionality the index expects)
3. Run the query search
4. Display the results
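Steps 2 and 3 above can be sketched as follows. This is a rough illustration under stated assumptions, not the app's actual pipeline: the 64x64 target size, the center crop, the grayscale conversion, the `0.05` threshold, and the helper names `image_to_vector` and `find_duplicates` are all hypothetical choices. The search here is plain NumPy; in the app it would be delegated to the Faiss index.

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (64, 64)  # hypothetical choice: every vector gets 64 * 64 = 4096 dimensions


def image_to_vector(img, size=TARGET_SIZE):
    """Center-crop to a square, resize, and flatten to a unit-length float32 vector."""
    gray = img.convert("L")  # grayscale, so one value per pixel
    width, height = gray.size
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    square = gray.crop((left, top, left + side, top + side))
    resized = square.resize(size)
    vec = np.asarray(resized, dtype=np.float32).ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def find_duplicates(query_vec, stored_vecs, threshold=0.05):
    """Return (row, squared L2 distance) pairs within the threshold, nearest first."""
    dists = ((stored_vecs - query_vec) ** 2).sum(axis=1)
    hits = np.flatnonzero(dists <= threshold)
    hits = hits[np.argsort(dists[hits])]
    return [(int(i), float(dists[i])) for i in hits]


# Synthetic "uploads": random pixels standing in for scanned documents.
rng = np.random.default_rng(0)
doc_a = Image.fromarray(rng.integers(0, 256, size=(80, 100), dtype=np.uint8))
doc_b = Image.fromarray(rng.integers(0, 256, size=(120, 120), dtype=np.uint8))
stored_vecs = np.stack([image_to_vector(doc_a), image_to_vector(doc_b)])
matches = find_duplicates(image_to_vector(doc_a.copy()), stored_vecs)
```

Because every image is reduced to the same fixed-length vector, uploads of any size can be compared against the same index. The threshold would need tuning on real documents.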

# Next steps: building a Pro Enterprise app

- Faster search through index tuning
- Running on GPUs
- Arranging data to optimize queries
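As one possible direction for index tuning, the sketch below builds an inverted-file (IVF) index, which groups vectors into `nlist` clusters and scans only the `nprobe` closest clusters per query. It assumes `faiss-cpu` is installed. The helper name `build_ivf_index` and the values `nlist=16` and `nprobe=4` are illustrative assumptions, not tuned recommendations.

```python
import numpy as np


def build_ivf_index(vectors, nlist=16, nprobe=4):
    """Train and fill an IVF index; nlist and nprobe are illustrative values."""
    import faiss  # lazy import: requires the faiss-cpu package

    vectors = np.ascontiguousarray(vectors, dtype=np.float32)
    dim = vectors.shape[1]
    index = faiss.index_factory(dim, f"IVF{nlist},Flat")  # L2 metric by default
    index.train(vectors)  # learns the nlist cluster centroids
    index.add(vectors)  # assigns each vector to its nearest cluster
    index.nprobe = nprobe  # number of clusters scanned per query
    return index
```

Raising `nprobe` toward `nlist` scans more clusters, which improves recall at the cost of speed. With `nprobe == nlist` every cluster is scanned and the results match an exact search.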