GitHub - ZJU-DAILY/FeLeDetect: Source code for FeLeDetect: Federated Learning for Error Detection over Different Data Sources

FeLeDetect: Federated Learning for Error Detection over Different Data Sources

FeLeDetect is a federated-learning-based error detection approach that utilizes different data sources to improve the quality of error detection without privacy leakage. First, a graph-based error detection model, GEDM, is presented to capture sufficient data features from each data source for FeLeDetect. Then, an information-lossless federated learning mechanism is proposed to collaboratively train GEDM over different data sources without privacy leakage. Furthermore, we design a series of optimizations to reduce both the communication cost during federated learning and the manual labeling effort.

Requirements

Python 3.7
PyTorch 1.7.1
torch_scatter 2.0.7
NVIDIA GPU with CUDA 10.1

Please refer to the source code to install all required Python packages.
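
Before running the experiments, it may help to confirm that the local environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the names EXPECTED, installed_version, check_environment, and report are hypothetical, and the check only compares version prefixes.

```python
import importlib
import sys

# Versions taken from the Requirements list above.
EXPECTED = {
    "python": "3.7",
    "torch": "1.7.1",
    "torch_scatter": "2.0.7",
    "cuda": "10.1",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # ImportError, or a loader error from a mismatched binary build
        return None
    return getattr(module, "__version__", None)


def check_environment():
    """Collect the versions that are actually present on this machine."""
    found = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": installed_version("torch"),
        "torch_scatter": installed_version("torch_scatter"),
        "cuda": None,
    }
    if found["torch"] is not None:
        import torch

        # torch.version.cuda is None for CPU-only builds of PyTorch.
        found["cuda"] = torch.version.cuda
    return found


def report(found):
    """Return one status line per requirement (prefix match, e.g. '1.7.1+cu101')."""
    lines = []
    for name, expected in EXPECTED.items():
        actual = found.get(name)
        ok = actual is not None and str(actual).startswith(expected)
        status = "OK" if ok else "MISMATCH"
        lines.append("{} {}: expected {}, found {}".format(status, name, expected, actual))
    return lines


if __name__ == "__main__":
    print("\n".join(report(check_environment())))
```

A MISMATCH line does not necessarily mean the code will fail; it only flags a difference from the versions listed above.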

Datasets

We conduct experiments on three real-life datasets with different types of data errors, including substitute errors, missing values, violated attribute dependencies, and format issues.
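
To make the four error types concrete, the sketch below applies each one to a single made-up record and marks which cells differ from the clean version. The record, its attribute names, and the helper functions are hypothetical illustrations and are not taken from the repository's datasets or code. The interpretation of each error type is also an assumption.

```python
# A made-up clean record; not from the datasets used in the experiments.
CLEAN = {"flight": "AA-100", "dep_time": "07:30", "city": "Boston", "state": "MA"}

# One dirty variant per error type named above (illustrative interpretations).
DIRTY = {
    "substitute error": dict(CLEAN, flight="UA-200"),           # value replaced by another
    "missing value": dict(CLEAN, dep_time=""),                  # value absent
    "violated attribute dependency": dict(CLEAN, state="NY"),   # city no longer matches state
    "format issue": dict(CLEAN, dep_time="7:30 a.m."),          # same time, different format
}


def erroneous_cells(clean, dirty):
    """Return the attribute names whose values differ from the clean record."""
    return [attr for attr in clean if clean[attr] != dirty[attr]]


def label_row(clean, dirty):
    """Return a 0/1 label per cell: 1 marks an erroneous cell."""
    return [int(clean[attr] != dirty[attr]) for attr in clean]


if __name__ == "__main__":
    for error_type, record in DIRTY.items():
        print(error_type, erroneous_cells(CLEAN, record), label_row(CLEAN, record))
```

Comparing against a clean record is only possible when ground truth exists; an error detection model has to predict such per-cell labels from the dirty data alone.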

Run Experimental Case

To train FeLeDetect for error detection over different data sources in the federated scenario on DA_5:

python fed_main.py -dataset DA_5

To train GEDM for error detection over the single data source DA_5 in the centralized scenario:

python main.py -dataset DA_5 -whole true

To train GEDM for error detection over the single data source DA_5_1 in the local scenario:

python main.py -dataset DA_5_1
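
The three commands above can also be driven from one small script. The sketch below is not part of the repository: build_command, run_all, RUNS, and the scenario labels are hypothetical names, and it assumes the script is started from the repository root so that fed_main.py and main.py are found. It uses only the flags shown above, and runs the scripts with the current interpreter (sys.executable) rather than whatever python is on the PATH.

```python
import subprocess
import sys


def build_command(scenario, dataset):
    """Build the command line for one scenario, mirroring the commands above."""
    if scenario == "federated":
        return [sys.executable, "fed_main.py", "-dataset", dataset]
    if scenario == "centralized":
        return [sys.executable, "main.py", "-dataset", dataset, "-whole", "true"]
    if scenario == "local":
        return [sys.executable, "main.py", "-dataset", dataset]
    raise ValueError("unknown scenario: {!r}".format(scenario))


# The scenario/dataset pairs used in the examples above.
RUNS = [("federated", "DA_5"), ("centralized", "DA_5"), ("local", "DA_5_1")]


def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    commands = []
    for scenario, dataset in RUNS:
        command = build_command(scenario, dataset)
        commands.append(command)
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first run that exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

With dry_run=True the script only prints the commands, which is a cheap way to check them before starting a long training run.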

Acknowledgement

We use the code of Raha.

The original dataset DBLP-ACM is from https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md

The original dataset flights is from http://lunadong.com/fusionDataSets.htm

