I've:
The output was:
[non-job]: 2016-06-07 16:47:20 INFO - Starting execution.
[non-job]: 2016-06-07 16:47:20 INFO - Version: Norconex HTTP Collector 2.5.0 (Norconex Inc.)
[non-job]: 2016-06-07 16:47:20 INFO - Version: Norconex Collector Core 1.5.0 (Norconex Inc.)
[non-job]: 2016-06-07 16:47:20 INFO - Version: Norconex Importer 2.5.2 (Norconex Inc.)
[non-job]: 2016-06-07 16:47:20 INFO - Version: Norconex JEF 4.0.7 (Norconex Inc.)
[non-job]: 2016-06-07 16:47:20 INFO - Version: Norconex Committer Core 2.0.3 (Norconex Inc.)
Norconex Minimum Test Page: 2016-06-07 16:47:20 INFO - Running Norconex Minimum Test Page: BEGIN (Tue Jun 07 16:47:20 CEST 2016)
Norconex Minimum Test Page: 2016-06-07 16:47:20 INFO - Norconex Minimum Test Page: RobotsTxt support: true
Norconex Minimum Test Page: 2016-06-07 16:47:20 INFO - Norconex Minimum Test Page: RobotsMeta support: true
Norconex Minimum Test Page: 2016-06-07 16:47:20 INFO - Norconex Minimum Test Page: Sitemap support: false
Norconex Minimum Test Page: 2016-06-07 16:47:20 INFO - Norconex Minimum Test Page: Canonical links support: true
Norconex Minimum Test Page: 2016-06-07 16:47:20 INFO - Norconex Minimum Test Page: User-Agent: <None specified>
Norconex Minimum Test Page: 2016-06-07 16:47:21 INFO - Norconex Minimum Test Page: Initializing sitemap store...
Norconex Minimum Test Page: 2016-06-07 16:47:21 INFO - Norconex Minimum Test Page: Done initializing sitemap store.
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - 1 start URLs identified.
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - CRAWLER_STARTED
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Norconex Minimum Test Page: Crawling references...
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - REJECTED_REDIRECTED: http://www.norconex.com/product/collector-http-test/minimum.php
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - REJECTED_FILTER: https://www.norconex.com/product/collector-http-test/minimum.php
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Norconex Minimum Test Page: Re-processing orphan references (if any)...
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Norconex Minimum Test Page: Reprocessed 0 orphan references...
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Norconex Minimum Test Page: Crawler finishing: committing documents.
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Norconex Minimum Test Page: 1 reference(s) processed.
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - CRAWLER_FINISHED
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Norconex Minimum Test Page: Crawler completed.
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Norconex Minimum Test Page: Crawler executed in 2 seconds.
Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - Running Norconex Minimum Test Page: END (Tue Jun 07 16:47:20 CEST 2016)
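For anyone hitting the same symptom, the relevant entries are the two `REJECTED_*` events in the log. A minimal sketch for pulling them out of a saved log follows; the helper name `rejected_events` and the regular expression are my own assumptions based on the log format shown here, not part of Norconex.

```python
import re

# Assumed log shape: "... INFO - REJECTED_<REASON>: <url>", as in the output above.
REJECTED = re.compile(r"INFO - (REJECTED_[A-Z_]+): (\S+)")


def rejected_events(log_text):
    """Return (event, url) pairs for every REJECTED_* entry in the log text."""
    return REJECTED.findall(log_text)


# Two entries copied from the log in this issue.
sample = (
    "Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - REJECTED_REDIRECTED: "
    "http://www.norconex.com/product/collector-http-test/minimum.php "
    "Norconex Minimum Test Page: 2016-06-07 16:47:22 INFO - REJECTED_FILTER: "
    "https://www.norconex.com/product/collector-http-test/minimum.php"
)

for event, url in rejected_events(sample):
    print(event, url)
```

Seeing `REJECTED_REDIRECTED` on the `http://` URL immediately followed by `REJECTED_FILTER` on its `https://` counterpart is what points at the redirect.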
OK, found the problem: the start URL begins with http://, which redirects to https:// when accessed, and the redirected https:// URL is then rejected by the reference filter.
Modifying the example did the trick for me: I changed the start URL to https://www.norconex.com/product/collector-http-test/minimum.php and made the reference filter accept both schemes:
<referenceFilters>
  <filter class="com.norconex.collector.core.filter.impl.RegexReferenceFilter" onMatch="include">
    https?://www\.norconex\.com/.*
  </filter>
</referenceFilters>
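To illustrate why the `https?` change matters, here is a small sketch that checks both URLs from the log against the fixed pattern and against an http-only pattern. The http-only pattern is an assumption about what the original sample contained (it is not quoted in this issue), and Python's `re` stands in for the Java regex engine Norconex uses; for a pattern this simple the two should behave the same. Matching the whole reference is also an assumption.

```python
import re

# Pattern from the fix above: accepts both http and https.
FIXED = re.compile(r"https?://www\.norconex\.com/.*")
# Hypothetical http-only pattern, assumed to resemble the original sample config.
HTTP_ONLY = re.compile(r"http://www\.norconex\.com/.*")

START = "http://www.norconex.com/product/collector-http-test/minimum.php"
REDIRECTED = "https://www.norconex.com/product/collector-http-test/minimum.php"


def accepted(pattern, url):
    """True if the whole URL matches the include pattern."""
    return pattern.fullmatch(url) is not None


for name, pattern in (("http-only", HTTP_ONLY), ("fixed", FIXED)):
    for url in (START, REDIRECTED):
        print(name, url, accepted(pattern, url))
```

Under these assumptions, the http-only pattern accepts the start URL but not its redirect target, which matches the `REJECTED_FILTER` entry in the log.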
I have updated the sample configuration files to now point to https instead of http (for the next release).
I have already updated the online copies to reflect this:
Thanks for reporting this.
Listing the output directory afterwards showed:
$ ls examples-output/minimum/
total 24K
drwx------ 6 gm gm 4.0K Jun 7 16:47 .
drwx------ 3 gm gm 4.0K Jun 7 16:47 ..
drwx------ 3 gm gm 4.0K Jun 7 16:47 crawlstore
drwx------ 3 gm gm 4.0K Jun 7 16:47 logs
drwx------ 3 gm gm 4.0K Jun 7 16:47 progress
drwx------ 3 gm gm 4.0K Jun 7 16:47 sitemaps