GSoC 2017: Final Report
Unsupervised Learning of DBpedia Taxonomy
Mentors: Marco Fossati, Dimitris Kontokostas
About the project:
Wikipedia represents a comprehensive cross-domain source of knowledge with millions of contributors. The DBpedia project tries to extract structured information from Wikipedia and transform it into RDF.
The main classification system of DBpedia depends on human curation, which limits its coverage and leaves a large number of resources untyped. DBTax provides an unsupervised approach that automatically learns a taxonomy from the Wikipedia category system and extensively assigns types to DBpedia entities, through the combination of several NLP and interdisciplinary techniques. It provides a robust backbone for DBpedia knowledge and has the benefit of being easy to understand for end users.
The approach to unsupervised learning of the taxonomy was presented in the DBTax paper. The goal of this project was to streamline and improve the approach described in the paper and make it easy to run on a new DBpedia release.
- Fixed inconsistencies with the expected output and improved Stages 3 and 4, yielding a working version of the entire pipeline. PR7 (in review)
- Cycle removal code in Page type assignment PR6 (merged)
- Remaining steps in Stage 3 (T-Box generation) and Stage 4 (Page type assignment) PR5 (closed)
- Hierarchy Generation Attempt, integrated Logger, ported to Stanford NLP PR4 (closed)
- Stage 2: Prominent node discovery step, Automated Threshold calculation approach PR3 (merged)
- Stage 1: Leaf Extraction Step PR2 (closed)
- Scripts to download Wikidumps PR1 (merged)
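To illustrate the kind of cycle removal mentioned in the list above, here is a minimal sketch in Python. It is not the code from PR6, and this report does not describe how that PR handles cycles. The sketch assumes the category hierarchy is given as (child, parent) pairs and uses a generic technique: a depth-first traversal that drops every back edge, which leaves an acyclic graph. The function name `remove_cycles` and the edge format are hypothetical.

```python
from collections import defaultdict


def remove_cycles(edges):
    """Return the edges that remain after dropping DFS back edges.

    `edges` is a list of (child, parent) pairs (an assumed format).
    Which edge of a cycle is dropped depends on traversal order.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the DFS stack, finished
    color = defaultdict(int)       # every node starts WHITE
    kept = []

    for start in list(graph):
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if color[nxt] == GREY:
                    continue       # back edge: it would close a cycle, drop it
                kept.append((node, nxt))
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    break          # descend into the newly discovered node
            else:
                color[node] = BLACK  # all outgoing edges handled
                stack.pop()
    return kept


# Hypothetical category names, used only for illustration.
example = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(remove_cycles(example))
```

In this example the edge ("C", "A") closes the cycle A, B, C and is the one dropped. A real pipeline would likely need a more deliberate policy for choosing which edge of a cycle to remove.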
The entire pipeline works well for English.
- To enable faster testing, we plan to integrate automated testing and CI in the next few weeks.
- A few open challenges were encountered that may be pursued from a research perspective: Open Challenges
I would like to thank every member of the DBpedia community, especially my mentors, Marco Fossati and Dimitris Kontokostas, for being so kind and helpful. I have learnt a lot in the past 3 months, and it has been a great experience to be a part of this wonderful community. I would also like to thank DBpedia and Google for giving me this opportunity.