Depado committed Apr 9, 2018
1 parent f0a8d77 commit 1ab766ffa817f10bf7e73e4708863b3e994878e5
Showing with 56 additions and 0 deletions.
@@ -1,2 +1,58 @@
# launeparser
Parsing newspapers every day to build a corpus

This program scrapes web pages twice a day and dumps the resulting text to a
directory with this format:

newspaper name/

You can specify as many sites to scrape as you want in the configuration file.
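
As a rough illustration of the output layout described above, here is a minimal Go sketch that writes one scraped text under `<output>/<newspaper name>/`. Only the per-newspaper directory level comes from this README; the file name pattern, the `.txt` extension, and the helper names `stampFor`, `outputPath` and `dump` are assumptions made for the example, not the project's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// stampFor turns a scrape time into a file-name-friendly stamp.
// The layout is an assumption; the README does not specify file names.
func stampFor(t time.Time) string {
	return t.UTC().Format("2006-01-02T15-04")
}

// outputPath builds <outDir>/<newspaper name>/<stamp>.txt.
// Only the "<newspaper name>/" directory level comes from the README.
func outputPath(outDir, name, stamp string) string {
	return filepath.Join(outDir, name, stamp+".txt")
}

// dump writes the scraped text, creating the newspaper directory if needed,
// and returns the path of the file it wrote.
func dump(outDir, name, text string, at time.Time) (string, error) {
	p := outputPath(outDir, name, stampFor(at))
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return "", err
	}
	return p, os.WriteFile(p, []byte(text), 0o644)
}

func main() {
	// "out" mirrors the default of the --output flag.
	p, err := dump("out", "example-newspaper", "scraped text", time.Now())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote", p)
}
```

Keeping one directory per newspaper means each scrape only adds a file, so the corpus for a given source can be read back with a single directory listing.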

## Usage

    Launeparser scrapes newspapers

      launeparser [command]

    Available Commands:
      help        Help about any command
      scrape      Instantly scrape
      start       Start the server and scraping
      version     Show build and version

      -h, --help                help for launeparser
          --log.format string   one of text or json (default "text")
          --log.level string    one of debug, info, warn, error or fatal (default "info")
          --log.line            enable filename and line in logs
          --output string       output directory (default "out")

    Use "launeparser [command] --help" for more information about a command.
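
To make the flag semantics above concrete, here is a small sketch that declares the same flags and defaults with Go's standard `flag` package. This is not how launeparser is implemented (the help output suggests a dedicated CLI library); the `options` type, the `parseArgs` helper, and the choice to reject values outside the documented lists are assumptions for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the global flags listed in the help output.
type options struct {
	logFormat string
	logLevel  string
	logLine   bool
	output    string
}

// parseArgs declares the flags with their documented defaults, then checks
// that log.format and log.level hold one of the documented values.
// Rejecting other values is an assumption, not documented behavior.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("launeparser", flag.ContinueOnError)
	fs.StringVar(&o.logFormat, "log.format", "text", "one of text or json")
	fs.StringVar(&o.logLevel, "log.level", "info", "one of debug, info, warn, error or fatal")
	fs.BoolVar(&o.logLine, "log.line", false, "enable filename and line in logs")
	fs.StringVar(&o.output, "output", "out", "output directory")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	switch o.logFormat {
	case "text", "json":
	default:
		return o, fmt.Errorf("invalid log.format %q", o.logFormat)
	}
	switch o.logLevel {
	case "debug", "info", "warn", "error", "fatal":
	default:
		return o, fmt.Errorf("invalid log.level %q", o.logLevel)
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

With this sketch, passing `--log.level=debug --output corpus` yields debug logging and a `corpus` output directory, while every flag left out keeps its documented default.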

## Configure

    server:
      port: 8012
      debug: true
    log:
      level: debug
      format: text
      line: true
    - url: http://...
      name: ...

Both the `server` and `log` sections are optional, since they have sane
defaults. The `server` section is ignored entirely when using the
`launeparser scrape` command.
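
One way to picture the optional sections is a configuration struct whose empty fields are filled with defaults after loading. The sketch below is hypothetical: the type names, the `Sites` field name, the `withDefaults` helper, and the fallback port are assumptions, while the `info` and `text` defaults come from the flag descriptions in the usage output.

```go
package main

import "fmt"

// ServerConfig and LogConfig mirror the optional `server` and `log` sections.
type ServerConfig struct {
	Port  int
	Debug bool
}

type LogConfig struct {
	Level  string
	Format string
	Line   bool
}

// Site is one entry to scrape (url and name, as in the configuration).
type Site struct {
	URL  string
	Name string
}

// Config groups the sections; the Sites field name is an assumption.
type Config struct {
	Server ServerConfig
	Log    LogConfig
	Sites  []Site
}

// withDefaults fills in values for sections left out of the file.
func withDefaults(c Config) Config {
	if c.Server.Port == 0 {
		c.Server.Port = 8080 // hypothetical default; the README does not state one
	}
	if c.Log.Level == "" {
		c.Log.Level = "info" // default shown for --log.level
	}
	if c.Log.Format == "" {
		c.Log.Format = "text" // default shown for --log.format
	}
	return c
}

func main() {
	// A configuration that only lists sites still ends up fully populated.
	c := withDefaults(Config{Sites: []Site{{URL: "http://...", Name: "..."}}})
	fmt.Printf("%+v\n", c)
}
```

Applying defaults in a single place keeps the file format forgiving: a configuration that lists only the sites to scrape is enough to run.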
