Spidy

A simple concurrent web crawler written in Go

Note that this crawler is a work in progress. Many core features are subject to change in the future. Please use with caution.

Quick start

From main.go

crawler := controller.Controller{
	&schedule.SimpleSchedule{}, // select the schedule
	10,                         // number of workers
}

crawler.Run(
	controller.Request{
		Url:       "www.google.com", // seed URL
		ParseFunc: parser.Parse,     // parser applied to the fetched content
	},
	controller.Request{ // multiple seed requests are supported
		Url:       "www.bing.com",
		ParseFunc: parser.Parse,
	},
)

Features to implement

  • More choices of User-Agent (UA) strings
  • Improve the performance of Schedule
  • Implement Parser
  • Implement Pipeline
