FAMER_Clustering

FAMER is a research project for FAst Multi-source Entity Resolution. It is implemented on top of Apache Flink and the graph analytics tool Gradoop. The framework is still under heavy development, so the full FAMER code is not yet publicly available. This repository provides the new clustering algorithm CLIP as well as the cluster repair algorithm RLIP, which we presented in this paper at the Extended Semantic Web Conference (ESWC 2018) in June.

In this repository you can find the following modules of FAMER:

famer-clustering: contains the implementation of CLIP and of the baseline method Connected Components.
famer-clusterPostProcessing: contains the implementation of the overlapResolve algorithm. Although an entity shared between multiple clusters is meaningless in the context of entity resolution, some ER clustering algorithms produce overlapping clusters. The overlapResolve algorithm takes entities that are shared between several clusters and assigns each of them to exactly one cluster.
famer-common: contains APIs that are used by the other modules.
famer-example: contains example scripts for both the CLIP and RLIP algorithms, as well as for computing the quality of input graphs and of clustering output in terms of FMeasure.
inputGraphs: contains all input graphs generated by FAMER that we reported in our papers ([1] and [2]) for the three datasets that we list and make publicly available on the FAMER homepage.
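
To make the roles of the baseline clustering and of the FMeasure evaluation more concrete, here is a minimal sketch in plain Python. It is not FAMER's code, which runs on Apache Flink and Gradoop. It assumes that FMeasure means the pairwise F-measure over entity pairs placed in the same cluster, and the function names (`connected_components`, `matched_pairs`, `pairwise_quality`) and the toy data are hypothetical.

```python
from itertools import combinations


def connected_components(vertices, edges):
    """Group vertices into clusters using union-find over the similarity edges."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for v in vertices:
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())


def matched_pairs(clusters):
    """Return every unordered pair of entities that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs


def pairwise_quality(predicted, truth):
    """Compute pairwise precision, recall, and F-measure of a clustering."""
    predicted_pairs = matched_pairs(predicted)
    true_pairs = matched_pairs(truth)
    correct = len(predicted_pairs & true_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data: five records and three similarity links.
# The link ("c1", "a2") is a false match that chains two real-world entities.
vertices = ["a1", "b1", "c1", "a2", "b2"]
edges = [("a1", "b1"), ("b1", "c1"), ("c1", "a2")]
truth = [{"a1", "b1", "c1"}, {"a2", "b2"}]

predicted = connected_components(vertices, edges)
precision, recall, f_measure = pairwise_quality(predicted, truth)
```

In this toy graph a single false link is enough for Connected Components to merge `a2` into the cluster of `a1`, `b1`, and `c1`, which lowers pairwise precision. This kind of over-merging is why Connected Components is commonly treated as a baseline rather than as a final clustering.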