Admin Url Upload
With this plugin you can maintain your URLs. All URLs you want to upload must be packed into a zip archive. Four types of URL files are supported:
- Start Urls
- Limit Urls
- Exclude Urls
- Metadata Urls
Example: fetch all URLs under http://lucene.apache.org/nutch/ except http://lucene.apache.org/nutch/bot.html.
Start Urls
A start URL file contains a flat list of your start URLs. For our example we need only one start URL: “http://lucene.apache.org/nutch/index.html”.
Limit Urls
A limit URL file contains a flat list of limit URLs, the URLs we want to fetch. For our example we need only one limit URL: “http://lucene.apache.org/nutch/”.
Exclude Urls
An exclude URL file contains a flat list of exclude URLs, the URLs we do not want to fetch. For our example we need only one exclude URL: “http://lucene.apache.org/nutch/bot.html”.
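The interplay of limit and exclude URLs in the example can be sketched as follows. This is only an illustration, assuming that both lists are matched as plain string prefixes; the plugin's actual matching rules are not described here, and the function name `is_allowed` is hypothetical.

```python
def is_allowed(url, limit_urls, exclude_urls):
    """Return True if url lies under a limit URL and under no exclude URL.

    Assumption: both lists are treated as string prefixes.
    """
    in_limit = any(url.startswith(prefix) for prefix in limit_urls)
    excluded = any(url.startswith(prefix) for prefix in exclude_urls)
    return in_limit and not excluded


# Values taken from the example above.
limit_urls = ["http://lucene.apache.org/nutch/"]
exclude_urls = ["http://lucene.apache.org/nutch/bot.html"]

for candidate in (
    "http://lucene.apache.org/nutch/index.html",
    "http://lucene.apache.org/nutch/bot.html",
):
    print(candidate, is_allowed(candidate, limit_urls, exclude_urls))
```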
Metadata Urls
A metadata URL file contains a flat list of URLs with metadata. Each line has the format:
| url | tab | metaKey: | tab | metaValue | tab | metaValue | .. | tab | metaKey: | tab | metaValue | .. |
For our example we want to give the URL “http://lucene.apache.org/nutch/apidocs-1.0/” the metadata foo:1.0 and the URL “http://lucene.apache.org/nutch/apidocs-0.9/” the metadata foo:0.9. The third row shows a key (foobar) with two values.
| http://lucene.apache.org/nutch/apidocs-1.0 | foo: | 1.0 | |
| http://lucene.apache.org/nutch/apidocs-0.9 | foo: | 0.9 | |
| http://lucene.apache.org/nutch/apidocs-0.8 | foobar: | 0.9 | 1.0 |
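To make the tab-separated layout concrete, here is a minimal sketch that builds and reads back one line of a metadata URL file. The helper names `format_metadata_line` and `parse_metadata_line` are hypothetical, and the parser assumes that any field ending in a colon starts a new key, so a value ending in a colon would be misread.

```python
def format_metadata_line(url, metadata):
    """Build one line: the URL, then each 'key:' followed by its values, tab-separated."""
    fields = [url]
    for key, values in metadata.items():
        fields.append(key + ":")
        fields.extend(values)
    return "\t".join(fields)


def parse_metadata_line(line):
    """Split a line into (url, {key: [values]}).

    Assumption: a field ending in ':' starts a new key.
    """
    url, *rest = line.rstrip("\n").split("\t")
    metadata = {}
    key = None
    for field in rest:
        if field.endswith(":"):
            key = field[:-1]
            metadata[key] = []
        elif key is not None:
            metadata[key].append(field)
    return url, metadata


line = format_metadata_line(
    "http://lucene.apache.org/nutch/apidocs-0.8", {"foobar": ["0.9", "1.0"]}
)
print(repr(line))
print(parse_metadata_line(line))
```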
Upload Url Zip Files
Now zip each of these URL files, for example:
- startUrls.txt → startUrls.zip
- limitUrls.txt → limitUrls.zip
- excludeUrls.txt → excludeUrls.zip
- metadataUrls.txt → metadataUrls.zip
and upload these zip files.
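If you prefer to script the zipping step, one archive per text file can be created with Python's standard `zipfile` module. This is a sketch only; the helper name `zip_url_file` is hypothetical, and the upload itself still happens through the admin interface.

```python
import zipfile
from pathlib import Path


def zip_url_file(txt_path):
    """Pack txt_path into a zip archive with the same base name and return the archive path."""
    txt_path = Path(txt_path)
    zip_path = txt_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its bare name, without directory components.
        archive.write(txt_path, arcname=txt_path.name)
    return zip_path
```

Calling `zip_url_file("startUrls.txt")` would produce `startUrls.zip` next to the text file; repeat for the limit, exclude, and metadata files.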
Index Metadatas
If you want to index the metadata you have uploaded, read the section Index Metadatas.
Note: Metadata and black/white filtering are enabled by default. You can disable them in the configuration screen Admin Configuration.
