Antch, a fast, powerful and extensible web crawling & scraping framework for Go

Antch

Antch is inspired by Scrapy. If you're familiar with Scrapy, you can get started quickly.

Antch is a fast, powerful and extensible web crawling & scraping framework for Go, used to crawl websites and extract structured data from their pages.

Getting Started

Follow the Getting Started instructions to start your first spider.

Features

  • Polite, highly concurrent web crawler.
  • Powerful and customizable HTTP middleware.
  • Item data pipeline for the web spider.
  • Built-in proxy support (HTTP, HTTPS, SOCKS5).
  • Built-in XPath query support for HTML/XML documents.
  • Easy to use and integrate with your project.

Examples

BingWallpaper - Bing daily wallpaper.

Documentation

See https://github.com/antchfx/antch/wiki