
NUTCH-1547 BasicIndexingFilter - Problem to index full title

git-svn-id: https://svn.apache.org/repos/asf/nutch/trunk@1462078 13f79535-47bb-0310-9956-ffa450edef68
lufeng committed on Mar 28, 2013 (commit 2f1ca3e7acc5f745f3bcb0c2efb79497b6c861fb, parent 6e42974)
2 CHANGES.txt
@@ -2,6 +2,8 @@ Nutch Change Log
(trunk): Current Development
+* NUTCH-1547 BasicIndexingFilter - Problem to index full title (Feng)
+
* NUTCH-1389 parsechecker and indexchecker to report truncated content (snagel)
* NUTCH-1419 parsechecker and indexchecker to report protocol status (snagel + lewismc)
2 conf/nutch-default.xml
@@ -897,7 +897,7 @@
<property>
<name>indexer.max.title.length</name>
<value>100</value>
- <description>The maximum number of characters of a title that are indexed.
+ <description>The maximum number of characters of a title that are indexed. A value of -1 disables this check.
</description>
</property>
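The sketch below shows one way the `indexer.max.title.length` value could be read and interpreted. It is a minimal, self-contained illustration under stated assumptions. `java.util.Properties` stands in for the Hadoop `Configuration` object that Nutch plugins normally receive. The class `TitleLengthConfig` and its methods are hypothetical names, not Nutch API. The default of 100 comes from the `nutch-default.xml` entry above. The "disabled" test mirrors the patched `MAX_TITLE_LENGTH > -1` condition, so any negative value, not only -1, effectively switches truncation off.

```java
import java.util.Properties;

// Hypothetical sketch: Properties is a stand-in for Hadoop's Configuration
public class TitleLengthConfig {

    static final String KEY = "indexer.max.title.length";
    static final int DEFAULT_MAX_TITLE_LENGTH = 100; // default shipped in nutch-default.xml

    // Reads the configured limit, falling back to the default when the key is absent
    static int maxTitleLength(Properties props) {
        String raw = props.getProperty(KEY, String.valueOf(DEFAULT_MAX_TITLE_LENGTH));
        return Integer.parseInt(raw.trim());
    }

    // True when the patched check (limit > -1) would skip truncation entirely
    static boolean truncationDisabled(int maxTitleLength) {
        return !(maxTitleLength > -1);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(maxTitleLength(props)); // prints 100

        props.setProperty(KEY, "-1"); // e.g. an override in nutch-site.xml
        int limit = maxTitleLength(props);
        System.out.println(limit + " disables truncation: " + truncationDisabled(limit));
    }
}
```

Treating the sentinel as "any negative value" matches what the Java condition in this commit actually does, even though the property description only mentions -1.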
2 src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
@@ -108,7 +108,7 @@ public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum
// title
String title = parse.getData().getTitle();
- if (title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
+ if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
title = title.substring(0, MAX_TITLE_LENGTH);
}
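To see why the extra guard matters, the standalone sketch below compares the old and new conditions. Before this change, setting the limit to -1 made `title.length() > MAX_TITLE_LENGTH` true for every title, so the code reached `title.substring(0, -1)` and threw a `StringIndexOutOfBoundsException` instead of indexing the full title. The class `TitleTruncation`, the methods `truncateTitle` / `truncateTitleOld`, and the sample title are hypothetical, chosen for illustration only. The logic is copied from the two versions of the `if` in the hunk above.

```java
// Hypothetical standalone sketch of the title-truncation rule in BasicIndexingFilter
public class TitleTruncation {

    // Patched condition: truncate only when a non-negative limit is configured
    static String truncateTitle(String title, int maxTitleLength) {
        if (maxTitleLength > -1 && title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    // Pre-patch condition: a limit of -1 still triggers substring(0, -1)
    static String truncateTitleOld(String title, int maxTitleLength) {
        if (title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        String title = "Apache Nutch - Highly extensible, highly scalable web crawler";

        System.out.println(truncateTitle(title, 10)); // prints "Apache Nut"
        System.out.println(truncateTitle(title, -1)); // prints the full title

        try {
            truncateTitleOld(title, -1);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("old check fails with -1: " + e.getClass().getSimpleName());
        }
    }
}
```

Checking the sentinel first also short-circuits the length comparison, so titles shorter than a positive limit are returned unchanged in both versions.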
