README v1.0 / 2015-08-17

Crawler

Introduction

We needed a crawler to find files on our website that are not of the "usual" file types (pictures, slides and office documents). Anything that is not "usual" and might cause problems should be found and put on a list for review. This is exactly what this crawler does.
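The idea can be illustrated with a minimal sketch. It is not taken from Crawler.groovy: the allow-list of extensions, the names usualExtensions, extensionOf and needsReview, and the example.org URLs are all assumptions made for illustration only.

```groovy
// Hypothetical allow-list: which extensions the real crawler treats as
// "usual" is an assumption here, not taken from Crawler.groovy.
def usualExtensions = ['jpg', 'jpeg', 'png', 'gif',
                       'ppt', 'pptx', 'doc', 'docx', 'xls', 'xlsx'] as Set

// Return the lower-cased extension of the last path segment, or '' if none.
def extensionOf = { String url ->
    def path = new URI(url).path ?: ''
    def segments = path.tokenize('/')
    def name = segments ? segments.last() : ''
    def dot = name.lastIndexOf('.')
    dot >= 0 ? name.substring(dot + 1).toLowerCase() : ''
}

// A URL needs review if it has an extension that is not on the allow-list.
// URLs without an extension are treated as ordinary pages (an assumption).
def needsReview = { String url ->
    def ext = extensionOf(url)
    ext && !(ext in usualExtensions)
}

// Placeholder URLs standing in for what a crawl might discover.
def found = [
    'https://www.example.org/img/logo.png',
    'https://www.example.org/files/setup.exe',
    'https://www.example.org/about/'
]

def reviewList = found.findAll { needsReview(it) }
println reviewList
```

In this sketch only the .exe link ends up on the review list. Which file types count as "usual" in practice is a policy decision for each site.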

Special thanks to Johannes Lorenz for allowing us to reuse his code.

Usage

crawler$ groovy src/de/fau/rrze/pp/crawler/Crawler.groovy

Contributing

Issue a pull request. It will be evaluated and in all likelihood merged.

Help

Currently there is no help besides knowledge and understanding ... ☹

Installation

Requirements

Clone this repository

git clone https://github.com/RRZE-PP/crawler.git

Configuration

Set the seed URLs by editing the seedUrls.add("") calls in src/de/fau/rrze/pp/crawler/Crawler.groovy (starting at line 43).

Make sure to use proper URLs!
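As a rough sketch of what such a configuration might look like, the snippet below fills a seed list and checks each entry before crawling. The example.org URLs are placeholders, the helper isProperUrl is hypothetical, and reading "proper" as "absolute http(s) URL with a host" is an assumption. The actual declaration of seedUrls in Crawler.groovy may differ.

```groovy
// Hypothetical stand-in for the seed list in Crawler.groovy.
def seedUrls = []
seedUrls.add("https://www.example.org/")           // placeholder, use your own site
seedUrls.add("https://www.example.org/downloads/") // placeholder

// Assumed meaning of "proper": an absolute http or https URL with a host.
def isProperUrl = { String s ->
    try {
        def uri = new URI(s)
        return uri.scheme?.toLowerCase() in ['http', 'https'] && uri.host
    } catch (URISyntaxException ignored) {
        // Malformed input such as "http://exa mple.org" ends up here.
        return false
    }
}

// Collect every seed that fails the check and stop early if there are any.
def invalid = seedUrls.findAll { !isProperUrl(it) }
assert invalid.isEmpty() : "Fix these seed URLs before crawling: ${invalid}"
```

With a check like this, an empty string (as in the seedUrls.add("") template) or a URL without a scheme is rejected before the crawl starts.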

Credits

Contact

License

This project is licensed under the GNU GPL v3. See LICENSE for details.