GSoC 2017: Final Report

Shashank Motepalli edited this page Aug 27, 2017 · 4 revisions

Unsupervised Learning of DBpedia Taxonomy

Student: Shashank Motepalli

Mentors: Marco Fossati, Dimitris Kontokostas

About the project

Wikipedia is a comprehensive, cross-domain source of knowledge with millions of contributors. The DBpedia project extracts structured information from Wikipedia and transforms it into RDF.

DBpedia's main classification system depends on human curation, so its coverage is limited and a large number of resources remain untyped. DBTax takes an unsupervised approach: it automatically learns a taxonomy from the Wikipedia category system and assigns types to DBpedia entities at scale, combining several NLP and interdisciplinary techniques. The result provides a robust backbone for DBpedia knowledge and is easy for end users to understand.

Goals

The approach to unsupervised taxonomy learning was presented in the DBTax paper. The goals of this project were to streamline and improve the approach described in the paper and to make it easy to run on each new DBpedia release.

My Work

Repository I contributed to: DBTax
Link to my Daily Progress page

Pull Requests

  • Fixed inconsistencies with the expected output and improved Stages 3 and 4, resulting in a working version of the entire pipeline. PR7 (in review)
  • Cycle removal code for page type assignment. PR6 (merged)
  • Remaining steps of Stage 3 (T-Box generation) and Stage 4 (page type assignment). PR5 (closed)
  • Hierarchy generation attempt, integrated logger, port to Stanford NLP. PR4 (closed)
  • Stage 2: prominent node discovery step and an automated threshold calculation approach. PR3 (merged)
  • Stage 1: leaf extraction step. PR2 (closed)
  • Scripts to download Wikipedia dumps. PR1 (merged)
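As a rough illustration of two of the steps listed above, leaf extraction (Stage 1) and cycle removal, here is a minimal Python sketch. It assumes the category hierarchy fits in memory as a dict mapping each category to its direct subcategories. The names `CategoryGraph`, `all_categories`, `leaf_categories` and `remove_cycles` and the toy graph are hypothetical; this is not the DBTax code, and the cycle removal shown (a depth-first search that drops back edges) may differ from the approach taken in PR6.

```python
from collections import defaultdict

# Hypothetical in-memory representation (not DBTax's actual data model):
# each category maps to the set of its direct subcategories (parent -> child).
CategoryGraph = dict[str, set[str]]


def all_categories(graph: CategoryGraph) -> set[str]:
    """Collect every category that appears as a parent or as a child."""
    nodes = set(graph)
    for children in graph.values():
        nodes |= children
    return nodes


def leaf_categories(graph: CategoryGraph) -> set[str]:
    """Return the categories that have no subcategories."""
    return {c for c in all_categories(graph) if not graph.get(c)}


def remove_cycles(graph: CategoryGraph) -> CategoryGraph:
    """Return an acyclic copy of the graph by dropping DFS back edges.

    An edge that points to a category still on the current DFS path closes a
    cycle; skipping every such edge leaves a directed acyclic graph.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: dict[str, int] = defaultdict(int)
    acyclic: CategoryGraph = {}

    for root in sorted(all_categories(graph)):  # sorted for deterministic output
        if color[root] != WHITE:
            continue
        color[root] = GREY
        acyclic[root] = set()
        stack = [(root, iter(sorted(graph.get(root, set()))))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:  # all children of this node explored
                color[node] = BLACK
                stack.pop()
            elif color[child] == GREY:
                continue  # back edge: it would close a cycle, so drop it
            else:
                acyclic[node].add(child)  # tree, forward or cross edge: keep it
                if color[child] == WHITE:
                    color[child] = GREY
                    acyclic[child] = set()
                    stack.append((child, iter(sorted(graph.get(child, set())))))
    return acyclic


if __name__ == "__main__":
    toy = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}}
    print(sorted(leaf_categories(toy)))  # ['D']
    print(remove_cycles(toy))  # C -> A is dropped; A -> B -> C -> D remains
```

With the toy graph, the only leaf is `D`, and the edge `C -> A` is dropped because it closes the cycle A -> B -> C -> A. Which edge of a cycle gets dropped depends on traversal order; sorting the categories keeps the result reproducible between runs.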

Open issues

The entire pipeline works well for English.

  • To enable faster testing, we plan to integrate automated testing and continuous integration (CI) over the next few weeks.
  • A few open challenges encountered during the project could be worked on from a research perspective: Open Challenges

Acknowledgement

I would like to thank every member of the DBpedia community, especially my mentors, Marco Fossati and Dimitris Kontokostas, for their kindness and help. I have learnt a lot in the past 3 months, and it has been a great experience to be part of this wonderful community. I would also like to thank DBpedia and Google for giving me this opportunity.
