Spidy

A simple concurrent web crawler written in Go

Note that this crawler is a work in progress. Many core features may change in the future. Please use it with caution.

Quick start

From main.go

crawler := controller.Controller{
	&schedule.SimpleSchedule{}, // scheduler to use
	10,                         // number of workers
}

crawler.Run(
	controller.Request{
		Url:       "www.google.com", // seed URL
		ParseFunc: parser.Parse,     // parser applied to the content fetched from the URL
	},
	controller.Request{ // multiple seed requests are supported
		Url:       "www.bing.com",
		ParseFunc: parser.Parse,
	},
)
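
The second field of controller.Controller sets the number of workers. How Spidy distributes requests among them is not shown in this README, so the following is only a minimal sketch of one common worker-pool pattern, not Spidy's actual implementation. The names request and crawl are hypothetical, and the fetch step is stubbed out so the example runs without network access.

```go
package main

import (
	"fmt"
	"sync"
)

// request is a hypothetical stand-in for controller.Request.
type request struct {
	URL   string
	Parse func(body string) string
}

// crawl fans the seed requests out to `workers` goroutines and collects
// one parsed result per request. Result order is not guaranteed.
func crawl(workers int, seeds ...request) []string {
	jobs := make(chan request)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				body := "<html>" + r.URL + "</html>" // stub fetch, no network
				results <- r.Parse(body)
			}
		}()
	}

	// Feed the seeds, then signal the workers that no more jobs will arrive.
	go func() {
		for _, s := range seeds {
			jobs <- s
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	parse := func(body string) string {
		return fmt.Sprintf("parsed %d bytes", len(body))
	}
	seeds := []request{
		{URL: "www.google.com", Parse: parse},
		{URL: "www.bing.com", Parse: parse},
	}
	for _, line := range crawl(2, seeds...) {
		fmt.Println(line)
	}
}
```

Unbuffered channels keep the sketch short. A real scheduler would also need to accept new requests discovered while parsing, which this example leaves out.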

Features to implement

More User-Agent (UA) options
Improve the performance of Schedule
Implement Parser
Implement Pipeline
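
As an illustration of the first item, User-Agent selection could be done by rotating through a list when each request is built. This is a sketch under assumptions rather than Spidy's planned design: newRequest, userAgents, and the User-Agent strings themselves are hypothetical placeholders. It uses only the standard net/http package and builds requests without sending them.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// userAgents is an illustrative list; the strings are placeholders.
var userAgents = []string{
	"SpidyBot/0.1 (example)",
	"Mozilla/5.0 (compatible; ExampleCrawler/1.0)",
}

// next counts how many requests have been built so far.
var next uint64

// newRequest builds a GET request whose User-Agent header rotates
// round-robin through userAgents. It is safe for concurrent use.
func newRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	i := atomic.AddUint64(&next, 1) - 1
	req.Header.Set("User-Agent", userAgents[i%uint64(len(userAgents))])
	return req, nil
}

func main() {
	for _, u := range []string{"https://example.com/a", "https://example.com/b"} {
		req, err := newRequest(u)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(req.URL, "->", req.UserAgent())
	}
}
```

The atomic counter lets several workers share the rotation without a mutex.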

Olament/Spidy

Folders and files

Latest commit

History

Repository files navigation

Spidy

Quick start

Features to implement

About

Topics

Resources

Stars

Watchers

Forks

Languages