Dockerized Apache Nutch 2.3.1 configured for MongoDB

cicdteam/nutch-mongo

Apache Nutch with MongoDB as backend

Based on the official Apache Nutch release (the current Nutch version is 2.3.1).

Supported tags and respective Dockerfile links

Used technologies

  • Nutch 2.3.1
  • OpenJDK 8
  • Gora 0.6.1
  • Gora MongoDB 0.6.1

Start Nutch in development mode

Use the docker-compose.yml file to run MongoDB and Apache Nutch, then follow the Nutch logs:

docker-compose up -d
docker-compose logs -f nutch

Start Nutch in production mode

  • Create your own Dockerfile:
FROM pure/nutch-mongo:alpine

# Copy the local urls/ and conf/ directories into the image
ADD urls/ /urls/
ADD conf/ /nutch/conf/
  • Build the image:
docker build -t my-nutch .
  • Run your own Nutch container with the desired number of iterations:
docker run \
    -d \
    -e ITERATIONS=5 \
    --name my-crawler \
    my-nutch
  • Check the logs (use the container name given with --name, not the image name):
docker logs -f my-crawler
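
The Dockerfile in the production steps adds urls/ and conf/ from the build context, so both directories must exist before docker build is run. The following is a minimal sketch of preparing them, not the repository's own procedure. It assumes the usual Nutch convention of a plain-text seed list with one URL per line. The file name seed.txt and the example URL are illustrative assumptions, and conf/ is left empty because which configuration files the image expects is not stated here.

```shell
#!/bin/sh
set -eu

# Directory names match the ADD instructions in the Dockerfile.
mkdir -p urls conf

# Seed list: one URL per line.
# ASSUMPTION: the file name "seed.txt" and the URL below are placeholders
# chosen for illustration; replace them with the sites you want to crawl.
cat > urls/seed.txt <<'EOF'
https://nutch.apache.org/
EOF

# conf/ stays empty in this sketch. Files placed here are copied
# into /nutch/conf/ by the Dockerfile's ADD instruction.
ls -d urls conf
```

Run it in the directory that holds your Dockerfile, then continue with the docker build step.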
