DataIntelligenceCrew/koios-semantic-search

KOIOS

KOIOS is an efficient and exact filter-verification framework for finding the top-k sets with the maximum bipartite matching score against a query set. Here, KOIOS is used for semantic overlap search, where the semantic overlap of two sets is the maximum bipartite matching score between the tokens of the query set and those of the candidate set.
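To make the definition concrete, here is a minimal Python sketch of semantic overlap as a maximum-weight bipartite matching, followed by a naive top-k ranking. It is an illustration only, not the KOIOS algorithm or its C++ implementation: it enumerates every matching by brute force, which is feasible only for tiny sets. The `TOY_SIM` table, the `sim`, `semantic_overlap`, and `top_k` names, and the rule that pairs below the similarity threshold contribute nothing are all assumptions made for this example. In the real system, token similarity would come from fastText embeddings.

```python
from itertools import permutations

# Hypothetical token similarities, standing in for embedding-based similarity.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "truck")): 0.6,
    frozenset(("city", "town")): 0.8,
}


def sim(a, b):
    """Toy similarity: 1.0 for identical tokens, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def semantic_overlap(query, candidate, threshold):
    """Maximum bipartite matching score between two token sets.

    Assumption: a pair whose similarity is below `threshold` scores 0,
    which is equivalent to leaving that pair unmatched.
    Brute force over all one-to-one assignments; exponential cost.
    """
    small, large = sorted((list(query), list(candidate)), key=len)
    best = 0.0
    for assignment in permutations(large, len(small)):
        score = 0.0
        for a, b in zip(small, assignment):
            s = sim(a, b)
            if s >= threshold:
                score += s
        best = max(best, score)
    return best


def top_k(query, lake, k, threshold):
    """Score every set in the lake and return the k best (score, name) pairs."""
    scored = [(semantic_overlap(query, tokens, threshold), name)
              for name, tokens in lake.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


QUERY = {"car", "city"}
LAKE = {
    "A": {"automobile", "town", "river"},
    "B": {"truck", "city"},
    "C": {"river"},
}

if __name__ == "__main__":
    for score, name in top_k(QUERY, LAKE, k=2, threshold=0.5):
        print(name, round(score, 2))
```

Scoring every candidate set this way is what an exact search must avoid at data-lake scale, which is the motivation for a filter-verification approach that prunes candidates before computing the matching.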

Installation

  • Clone the repository onto your local machine.
  • Download the fasttext-database from here, and save it in the root folder.
  • Make sure all paths in the Makefile are correct.
  • Run the following commands to initialize the environment and Intel oneAPI:
source bashrc
. /opt/intel/oneapi/setvars.sh --config=intel.config

Usage

For Syntactic Overlap Search

make koios-semantic
./build/koios-semantic <data-lake-path> <query> <result-location> <sim-threshold> <k> <number-of-partitions> 1

For Semantic Overlap Search using KOIOS

make koios-semantic
./build/koios-semantic <data-lake-path> <query> <result-location> <sim-threshold> <k> <number-of-partitions> 0

For Semantic Overlap Search using Baseline

make baseline-semantic
./build/baseline-semantic <data-lake-path> <query> <result-location> <sim-threshold> <k>

Dependencies

CMake version 3.18 (the exact version matters):

If an older version is installed, remove it first: apt remove --purge --auto-remove cmake

Faiss index by Facebook:

  • Refer to INSTALL.md for details.
  • For now, if you encounter any CUDA error, pass -DFAISS_ENABLE_GPU=OFF when compiling Faiss.
  • Step 3 (sudo make install) is not optional.

Sqlite3:

apt-get install sqlite3 libsqlite3-dev

FastText:

  • Use the following API to generate the FastTextDB: https://github.com/ekzhu/go-fasttext

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
