Course project for Introduction of Golang on imooc.
- Golang 1.11
- Elasticsearch 6.5.4
- Docker
/crawler
/crawler-distributed
/front-end
docker run -d -p 9200:9200 elasticsearch:x.x.x (your es version)
(under project root directory) cd crawler
go run main.go
docker run -d -p 9200:9200 elasticsearch:x.x.x (your es version)
(under crawler-distributed) cd persist
go run itemSaver.go
(under crawler-distributed) cd worker/server
go run worker.go (start as many servers as you want, as long as you add a port configuration for each one and set them in config.go)
(under project root directory) cd crawler-distributed
go run main.go
(under project root directory) cd front-end
go run start.go
- Crawl more websites, with CSS selectors or XPath (instead of regular expressions).
- Handle anti-crawling mechanisms (QPS limits, encrypted cookies), or follow the robots.txt protocol.
- Login mechanism.
- Move de-duplication into a separate module (with Redis).
- Optimize ES search quality.
- A more user-friendly front-end page.
- Use these data to play with AI.
- Use Docker + Kubernetes to package and deploy.
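
As a starting point for the de-duplication item above, here is a minimal sketch of what a separate de-dup module could look like. The `Deduper` interface, the `memoryDeduper` type, and the example URLs are all hypothetical and not part of the project code. The in-memory map only stands in for Redis, so a Redis-backed type could later implement the same interface without touching the engine.

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper is a hypothetical interface for the de-dup module.
// The engine would depend only on this, not on a concrete store.
type Deduper interface {
	// Seen records url and reports whether it had already been recorded.
	Seen(url string) bool
}

// memoryDeduper is an in-memory stand-in for a Redis-backed store.
// It is safe for concurrent use by multiple goroutines.
type memoryDeduper struct {
	mu      sync.Mutex
	visited map[string]struct{}
}

// newMemoryDeduper returns an empty in-memory de-duplicator.
func newMemoryDeduper() *memoryDeduper {
	return &memoryDeduper{visited: make(map[string]struct{})}
}

// Seen checks and records the URL in one locked step, so two
// goroutines cannot both be told that the same URL is new.
func (d *memoryDeduper) Seen(url string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.visited[url]; ok {
		return true
	}
	d.visited[url] = struct{}{}
	return false
}

func main() {
	var d Deduper = newMemoryDeduper()
	// Placeholder URLs for illustration only.
	urls := []string{
		"http://example.com/a",
		"http://example.com/b",
		"http://example.com/a",
	}
	for _, u := range urls {
		if d.Seen(u) {
			fmt.Println("skip duplicate:", u)
			continue
		}
		fmt.Println("crawl:", u)
	}
}
```

Keeping the check and the write inside a single method matters for the distributed version: with Redis the same guarantee would have to come from one atomic command rather than a separate read followed by a write.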