This repository has been archived by the owner on Oct 20, 2018. It is now read-only.

Topic selection for topical classification

Ani Tumanyan edited this page Oct 28, 2013 · 3 revisions

Solution 1:

You can use the DBpedia Live extractor (http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework). You need to configure the appropriate extractors (e.g., the infobox properties extractor, the abstract extractor, etc.). It downloads the latest Wikipedia dumps and generates the DBpedia datasets.

You may need to make some code changes to extract only the data you need. One of my colleagues did this for the German datasets. You will still need a lot of disk space.

Solution 2 (I am not sure whether this is actually feasible):

Grep the datasets for the required properties. You need to know the exact URIs of the properties you want to extract.
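If you are not sure of the exact property URIs, one way to find them is to list the distinct predicates in a dump together with how often each occurs. The sketch below assumes the dump is valid N-Triples (one triple per line, so the predicate is always the second whitespace-separated field) and that you decompress it with `bzcat`; the function name `list_predicates` is hypothetical.

```shell
# Hypothetical helper: list the distinct predicates in an N-Triples stream,
# most frequent first. Reads the whole stream once.
# Usage: bzcat dbpedia_2013_03_04.nt.bz2 | list_predicates | head -n 50
list_predicates() {
  # Skip blank and comment lines; in N-Triples the predicate is field 2.
  awk 'NF >= 3 && $1 !~ /^#/ { count[$2]++ }
       END { for (p in count) print count[p], p }' |
    sort -k1,1nr -k2,2   # by count (descending), ties by predicate URI
}
```

Only one counter per distinct predicate is kept in memory, so this stays small even on a large dump, though it still has to read the entire file.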

Example: to get all the homepages, run

    bzgrep 'http://xmlns.com/foaf/0.1/homepage' dbpedia_2013_03_04.nt.bz2 > homepages.nt

This gives you all the N-Triples that contain the homepage property. You can then load them into your RDF store.
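A plain grep matches the URI anywhere on the line, so it can also pick up triples where that URI appears as a subject or object rather than as the property. A stricter variant, sketched below, keeps only triples whose predicate is exactly one of the URIs you pass, which also lets you pull several properties in a single pass over the dump. It assumes valid N-Triples (predicate in the second whitespace-separated field, written as `<URI>`); the function name `filter_predicates` is hypothetical.

```shell
# Hypothetical helper: keep only N-Triples whose predicate is one of the
# given property URIs (passed without angle brackets).
# Usage: bzcat dbpedia_2013_03_04.nt.bz2 |
#          filter_predicates http://xmlns.com/foaf/0.1/homepage > homepages.nt
filter_predicates() {
  awk -v preds="$*" '
    BEGIN {
      # Build a lookup table of wanted predicates in <URI> form.
      n = split(preds, list, " ")
      for (i = 1; i <= n; i++) wanted["<" list[i] ">"] = 1
    }
    # Print the line only if its predicate (field 2) is wanted.
    ($2 in wanted)
  '
}
```

Because the match is on the predicate position only, a triple such as `<X> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://xmlns.com/foaf/0.1/homepage> .` is not selected, whereas the plain grep would include it.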
