This is the Git repository of our experiment on reimplementing Babelfy. We presented this work at CLiN26 (the 26th Computational Linguistics in the Netherlands conference).
Steps 1-4 need to be executed only once. Steps 5-8 need to be executed once per test set.
1. Set up a local BabelNet endpoint: go to https://github.com/minhlab/babelnet-lookup and set up your own BabelNet API endpoint (useful for non-Java programs).
2. Run ./genrel.sh (make sure you adjust the settings to your local environment). This script generates a text file called relations.txt with all triples in BabelNet. It also creates the name_coll collection in MongoDB, which contains all names from BabelNet. This collection is essential for generating candidates using partial matching.
3. Populate your local database (we use MongoDB) so that all BabelNet data is easy to access and lookups on partial matches are possible. To do this, run semsig.sh phase1. Make sure the path to relations.txt is set correctly. Duration: 100 min.
4. Run semsig.sh phase2 to generate weights based on triangular relations. These weights are used later to build the semantic signatures. See Section 5 of the paper. Duration: 7.5 hours.
5. Generate candidates (run python candidates.py). See Section 6 of the paper. Duration: 17 min.
6. Generate the semantic signature database structure (run semsig.sh phase3). See Algorithm 1 in Section 5 of the paper. Duration: ~3 days.
7. Run the Babelfy disambiguation algorithm (run python disambiguate.py). See Algorithms 2 and 3 in Section 7 of the paper. Duration: 10 hours.
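
To give an intuition for the weights based on triangular relations (semsig.sh phase2), here is a minimal Python sketch. It is not the code in this repository. It assumes that each triple has been reduced to a directed (source, target) pair with the relation label dropped, and that the weight of an edge is one plus the number of directed triangles it takes part in, as in the original Babelfy paper. The function name triangle_weights and the toy node names are hypothetical.

```python
from collections import defaultdict


def triangle_weights(edges):
    """Map each directed edge (u, v) to 1 + the number of nodes w
    such that v -> w and w -> u are also edges (a directed triangle)."""
    edges = set(edges)
    successors = defaultdict(set)    # node -> nodes it points to
    predecessors = defaultdict(set)  # node -> nodes pointing to it
    for u, v in edges:
        successors[u].add(v)
        predecessors[v].add(u)

    weights = {}
    for u, v in edges:
        # w closes the triangle u -> v -> w -> u
        closing = successors.get(v, set()) & predecessors.get(u, set())
        # Ignore degenerate "triangles" caused by self-loops.
        closing = closing - {u, v}
        weights[(u, v)] = 1 + len(closing)
    return weights


# Toy graph (assumption): one triangle a -> b -> c -> a plus a dangling edge a -> d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
toy_weights = triangle_weights(toy_edges)
```

In this toy graph the three edges of the triangle get weight 2 and the dangling edge a -> d gets weight 1. On the full BabelNet graph the same computation is far more expensive, which is consistent with the multi-hour duration reported for phase2.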