Index Metadata
mbauhardt edited this page Sep 13, 2010 · 6 revisions
To index the metadata from the URL metadata file you have uploaded (see Admin Url Upload), you have to configure ADMIN-GUI-INSTALLATION/plugins/index-metadata/plugin.xml of the index-metadata plugin.
In that file, edit the extension implementation with the id MetadataIndexingFilter and add a raw-fields parameter with the value foo. All values stored under the key foo are then indexed untokenized. If you use fields instead of raw-fields, all values will be indexed as tokenized fields.
```xml
<extension id="org.apache.nutch.indexer.metadata.index"
           name="Nutch Metadata Indexing Filter"
           point="org.apache.nutch.indexer.IndexingFilter">
   <implementation id="MetadataIndexingFilter"
                   class="org.apache.nutch.indexer.metadata.MetadataIndexingFilter">
      <!-- raw-fields: the values under the key "foo" are indexed untokenized -->
      <parameter name="raw-fields" value="foo"/>
   </implementation>
</extension>
```
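For comparison, here is a minimal sketch of the tokenized variant described above: swapping raw-fields for fields should make the filter index the values of foo as tokenized fields. It assumes the parameter is named exactly `fields`, which the text implies but does not spell out; everything else is copied unchanged from the raw-fields configuration.

```xml
<extension id="org.apache.nutch.indexer.metadata.index"
           name="Nutch Metadata Indexing Filter"
           point="org.apache.nutch.indexer.IndexingFilter">
   <implementation id="MetadataIndexingFilter"
                   class="org.apache.nutch.indexer.metadata.MetadataIndexingFilter">
      <!-- assumed parameter name "fields": the values under "foo" are indexed tokenized -->
      <parameter name="fields" value="foo"/>
   </implementation>
</extension>
```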
To run a query such as “http foo:0.1” or “http foo:0.9”, you also have to configure the MetadataQueryFilter in plugin.xml:
```xml
<extension id="org.apache.nutch.indexer.metadata.query"
           name="Nutch Metadata Query Filter"
           point="org.apache.nutch.searcher.QueryFilter">
   <implementation id="MetadataQueryFilter"
                   class="org.apache.nutch.indexer.metadata.MetadataQueryFilter">
      <!-- raw-fields: queries on "foo" are handled by a RawFieldQueryFilter -->
      <parameter name="raw-fields" value="foo"/>
   </implementation>
</extension>
```
If you use raw-fields, a RawFieldQueryFilter is used. If you use fields instead of raw-fields, the FieldQueryFilter is used.
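A matching sketch for the query side, again assuming the parameter is named `fields`: with this setting the MetadataQueryFilter should use the FieldQueryFilter rather than a RawFieldQueryFilter for foo. It is likely safest to make the same choice (raw-fields or fields) in both the indexing filter and the query filter, so that queries treat foo the same way it was indexed.

```xml
<extension id="org.apache.nutch.indexer.metadata.query"
           name="Nutch Metadata Query Filter"
           point="org.apache.nutch.searcher.QueryFilter">
   <implementation id="MetadataQueryFilter"
                   class="org.apache.nutch.indexer.metadata.MetadataQueryFilter">
      <!-- assumed parameter name "fields": queries on "foo" go through the FieldQueryFilter -->
      <parameter name="fields" value="foo"/>
   </implementation>
</extension>
```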