Skip to content

[VLDB 2024] Source code for FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data

License

Notifications You must be signed in to change notification settings

JunHao-Zhu/FusionQuery

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FusionQuery

Python implementation of FusionQuery in paper FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data.

Dependencies

  • Python 3.8
  • sentence-transformers 2.2.2
  • faiss-gpu 1.7.2
  • numpy 1.23.1
  • pytorch 1.12.1

Datasets

This repo contains two datasets, Movie and Book. We released KG version of these two datasets in the data. Each data source is stored in three files. Entities in source n, are stored in ent_ids_n, relations are stored in rel_ids_n and triples are stored in triples_n. The queries conducted on the datasets are stored in query.json.

More datasets can be found in this web

Run code

Perform the entire workflow of FusionQuery.

python main.py --data_root "./data/movie" \
--data_name movie \
--fusion_model FusionQuery \
--types JSON KG CSV \
--iters 20 \
--thres_for_query 0.9 \
--thres_for_fusion 0.4

The more detailed information about arguments is listed as follows.

Arguments Explainations Default
--data_root root path of data ../data/movie
--data_name data name used in the current experiment movie
--fusion_model data fusion methods used in the framework (e.g., FusionQuery, DART, CASE, etc.) FusionQuery
--types data types used in the current experiment (a list) JSON KG CSV
--iters maximum iterations for convergence 20
--thres_for_query initial matching threshold $\tau$ 0
--thres_for_fusion threshold for data veracity 0.5
--gpu the gpu device id 0
--seed random seed 2021

About

[VLDB 2024] Source code for FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages