Albert-W/crawler-project

Crawler-website

A crawler website written in Go.

This is the mainPage image

This is the homePage image

Features

  • Go
  • Docker
  • Elasticsearch
  • MVC pattern
  • Microservices
  • Singleton -> Concurrent -> Distributed

Installation and Go packages

  • Go
  • Docker
  • Elasticsearch
  • go get golang.org/x/text
  • go get -v github.com/gpmgo/gopm
  • gopm get -g -v golang.org/x/text
  • gopm get -g -v golang.org/x/net/html
  • go get gopkg.in/olivere/elastic.v5

Usage for the concurrent version

  • Start Docker.
  • Run "docker run -d -p 9200:9200 elasticsearch" to start Elasticsearch.
  • Run "src/crawler/main.go" to start the singleton crawler.
  • Run "src/crawler/frontend/starter.go" to view the results on the website.
  • Visit "http://localhost:8888/" in your browser.
  • Type a query string in REST format, such as "女 && Age>20" (女 means "female").

Usage for the distributed version

  • Start Docker.
  • Run "docker run -d -p 9200:9200 elasticsearch" to start Elasticsearch.
  • Open a terminal in src\crawler_distributed\persist\server and run: go run ItemSaver.go --port=1234
  • Open a terminal in src\crawler_distributed\worker\server and run: go run worker.go --port=9000
  • Open another terminal in src\crawler_distributed\worker\server and run: go run worker.go --port=9001
  • Open a terminal in src\crawler_distributed and run: go run main.go --itemsaver_host=":1234" --worker_hosts=":9000,:9001"
  • Run "src/crawler/frontend/starter.go" to view the results on the website.
  • Visit "http://localhost:8888/" in your browser.
  • Type a query string in REST format, such as "男 && 已购车" (男 means "male"; 已购车 means "has bought a car").

Architecture

image

Framework

image

Algorithm

image
