SPIDERSCOUT

Module Planning

URL Frontier Module

Manages the set of URLs waiting to be crawled and selects which one to crawl next.
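A minimal sketch of what such a frontier could look like, assuming a simple first-in, first-out selection policy and de-duplication by exact URL (fragments removed). The class name `URLFrontier` and its methods are hypothetical and are not taken from the project's code.

```python
from collections import deque
from urllib.parse import urldefrag, urlparse


class URLFrontier:
    """Hypothetical FIFO frontier that never hands out the same URL twice."""

    def __init__(self, seeds=()):
        self._queue = deque()   # URLs waiting to be crawled, oldest first
        self._seen = set()      # every URL ever accepted, for de-duplication
        for url in seeds:
            self.add(url)

    def add(self, url):
        """Queue a URL; return True if it was new and accepted."""
        url, _fragment = urldefrag(url.strip())  # drop any "#section" part
        if urlparse(url).scheme not in ("http", "https"):
            return False  # skip mailto:, javascript:, and similar links
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next_url(self):
        """Return the next URL to crawl, or None when the frontier is empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)
```

A priority queue (for example, ordered by link depth or page importance) could replace the deque if the selection policy needs to be smarter than first-come, first-served.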

Downloader Module

Downloads web pages while following the crawler's politeness policies.
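The README does not say which politeness policies are meant. The sketch below assumes one common policy, a fixed minimum delay between requests to the same host. `PolitenessGate`, `download`, and the user agent string are hypothetical names chosen for illustration; robots.txt handling is not covered here (the standard library's `urllib.robotparser` could be used for that).

```python
import time
import urllib.request
from urllib.parse import urlparse


class PolitenessGate:
    """Hypothetical helper enforcing a minimum delay between hits to one host."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock      # injectable so the gate can be tested without waiting
        self._sleep = sleep
        self._last_hit = {}      # host -> time of the most recent request

    def wait_for(self, url):
        """Block until the URL's host may be contacted; return seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, self.delay - (now - last))
            if waited > 0:
                self._sleep(waited)
        self._last_hit[host] = now + waited
        return waited


def download(url, gate, user_agent="SpiderScoutBot/0.1 (hypothetical)", timeout=10):
    """Fetch a page body as bytes after waiting for the host's delay to pass."""
    gate.wait_for(url)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping the delay logic separate from the network call makes the policy easy to test and to swap for something stricter, such as a per-host delay read from robots.txt.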

Parser Module

Parses the downloaded HTML and extracts metadata, images, links, etc.
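As an illustration, the extraction step could be sketched with the standard library's `html.parser`. The names `PageParser` and `parse_page` are hypothetical, and the choice to collect only the title, named `<meta>` tags, `<a href>` links, and `<img src>` images is an assumption about what "metadata, images, links" covers.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Hypothetical parser collecting title, meta tags, links, and images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}
        self.links = []
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html, base_url):
    """Parse an HTML string and return the extracted fields as a dict."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "links": parser.links,
        "images": parser.images,
    }
```

The extracted links would typically be handed back to the URL frontier, and the text and metadata to the indexer.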

Indexer Module

Maintains the inverted index of the retrieved and extracted data.
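An inverted index maps each term to the set of documents that contain it. The sketch below is an in-memory illustration only: the `InvertedIndex` class, its lowercase alphanumeric tokenizer, and the AND-style search are assumptions and do not describe how the project stores its index.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Hypothetical in-memory inverted index: term -> set of page URLs."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        """Split text into lowercase alphanumeric terms."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_document(self, url, text):
        """Record every term of the text as occurring in the given URL."""
        for term in self.tokenize(text):
            self._postings[term].add(url)

    def search(self, query):
        """Return the URLs that contain every term of the query."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

For example, after indexing two pages that both mention "crawler", `search("crawler")` returns both URLs, while a query with an additional term narrows the result to pages containing all of the terms.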

Scheduler Module

Acts as the manager: coordinates the overall crawling process and assigns tasks to the other modules.
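One way to picture this coordination is a single loop that pulls a URL, downloads it, parses it, indexes it, and queues the newly found links. The sketch below assumes a sequential, single-threaded scheduler; the function `crawl` and its `fetch`, `parse`, and `index` parameters are hypothetical stand-ins for the other modules, passed in as plain callables.

```python
from collections import deque


def crawl(seeds, fetch, parse, index, max_pages=10):
    """Hypothetical scheduler loop; returns the URLs crawled, in order.

    fetch(url) -> html              stands in for the Downloader
    parse(html, url) -> (text, links)   stands in for the Parser
    index(url, text)                stands in for the Indexer
    """
    queue = deque(seeds)   # a very small stand-in for the URL frontier
    seen = set(seeds)
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        text, links = parse(html, url)
        index(url, text)
        crawled.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled
```

Passing the modules in as callables keeps the loop easy to test with fakes; a real scheduler would likely add concurrency, retries, and a stop condition beyond a simple page limit.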

