
advanced373/WebCrawler


WebCrawler 🕸️

Overview

WebCrawler is a utility written in Java.

Features

Crawl using a configuration file with the following structure:

n_threads=4
delay=100
root_dir=C://root
log_level=3
depth=100

When crawling, it can optionally follow the rules in a site's robots.txt file.
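The configuration above is a set of key=value pairs. The following is a minimal sketch of how such a file could be read in Java. It assumes the pairs are separated by whitespace (spaces or newlines), that values contain no spaces, and that delay is in milliseconds, which the README does not state. `CrawlerConfigLoader` and `CrawlerConfig` are hypothetical names, not the project's actual classes, and the record syntax requires Java 16 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: reads the key=value configuration described in the README.
public class CrawlerConfigLoader {

    // Assumption: delay is in milliseconds; the README does not give the unit.
    public record CrawlerConfig(int nThreads, long delayMs, String rootDir, int logLevel, int depth) {}

    // Parses pairs separated by any whitespace, so one-per-line and single-line files both work.
    public static CrawlerConfig parse(String text) {
        Map<String, String> values = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // empty input yields a single empty token
            int eq = token.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + token);
            }
            values.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return new CrawlerConfig(
                Integer.parseInt(require(values, "n_threads")),
                Long.parseLong(require(values, "delay")),
                require(values, "root_dir"),
                Integer.parseInt(require(values, "log_level")),
                Integer.parseInt(require(values, "depth")));
    }

    // Reads and parses a configuration file from disk.
    public static CrawlerConfig load(Path file) throws IOException {
        return parse(Files.readString(file));
    }

    // Fails with a clear message when a required key is absent.
    private static String require(Map<String, String> values, String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing configuration key: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        CrawlerConfig cfg = parse("n_threads=4 delay=100 root_dir=C://root log_level=3 depth=100");
        System.out.println(cfg);
    }
}
```

Failing fast on a missing or malformed key keeps configuration mistakes from surfacing later as confusing crawl behavior.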

Filter a site's files by file type
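The README does not show how filtering works internally; the Filter command relies on an index.json whose structure is not documented. As a simplified, hypothetical sketch, the Java code below instead walks a directory of downloaded files and keeps those whose extension matches the requested type. `FileTypeFilter` and the idea that a "file type" is a file extension are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: list downloaded files whose extension matches a file type.
public class FileTypeFilter {

    // True if the file name ends with "." + fileType, ignoring case; accepts "pdf" or ".pdf".
    public static boolean matches(Path file, String fileType) {
        String type = fileType.startsWith(".") ? fileType.substring(1) : fileType;
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith("." + type.toLowerCase(Locale.ROOT));
    }

    // Walks siteDir recursively and collects the matching regular files in sorted order.
    public static List<Path> filter(Path siteDir, String fileType) throws IOException {
        try (Stream<Path> paths = Files.walk(siteDir)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> matches(p, fileType))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        String type = args.length > 1 ? args[1] : "html";
        filter(dir, type).forEach(System.out::println);
    }
}
```

The stream from `Files.walk` is closed with try-with-resources because it holds open directory handles.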

Search a site for a keyword
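How Search matches or reports results is not documented. A minimal sketch, assuming the pages' text is already loaded into a map from page name to content, counts case-insensitive, non-overlapping occurrences of the keyword and lists the matching pages with the most hits first. `KeywordSearch` is a hypothetical name, and the ranking rule is an illustrative choice rather than the project's behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: rank crawled pages by how often a keyword appears.
public class KeywordSearch {

    // Counts non-overlapping, case-insensitive occurrences of keyword in text.
    public static int countOccurrences(String text, String keyword) {
        if (keyword.isEmpty()) return 0;
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // skip past this match so matches do not overlap
        }
        return count;
    }

    // Returns the names of pages containing the keyword, most occurrences first.
    public static List<String> search(Map<String, String> pages, String keyword) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            int n = countOccurrences(page.getValue(), keyword);
            if (n > 0) hits.add(Map.entry(page.getKey(), n));
        }
        // List.sort is stable, so pages with equal counts keep their original order.
        hits.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> hit : hits) result.add(hit.getKey());
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("index.html", "Java crawler written in Java");
        pages.put("about.html", "About the team");
        pages.put("docs.html", "Using java.net");
        System.out.println(search(pages, "java"));
    }
}
```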

Generate a sitemap
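The README does not specify the sitemap format WebCrawler produces. One common choice is the XML format defined by the Sitemaps protocol (sitemaps.org, schema 0.9). The hypothetical `SitemapWriter` below assumes that format and a list of absolute URLs, and escapes the characters the protocol requires to be entity-encoded.

```java
import java.util.List;

// Hypothetical sketch: build an XML sitemap following the sitemaps.org 0.9 schema.
// Assumption: the README does not say which sitemap format WebCrawler actually writes.
public class SitemapWriter {

    // Entity-encodes the five characters the Sitemaps protocol requires to be escaped.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                case '\'' -> sb.append("&apos;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns a complete sitemap document listing each URL in a <url><loc> entry.
    public static String build(List<String> urls) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            xml.append("  <url><loc>").append(escapeXml(url)).append("</loc></url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(List.of("https://example.com/", "https://example.com/a?x=1&y=2")));
    }
}
```

A query string such as `?x=1&y=2` is a typical case where escaping matters, since a raw `&` makes the XML invalid.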

Tips

Before using Sitemap, create a Sitemaps folder. Filter and Search require an existing index.json file with a specific structure.
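These two prerequisites can be checked up front so a command fails with a clear message instead of partway through. A minimal sketch in Java, assuming both the Sitemaps folder and index.json sit directly under one base directory (the README does not say where they go). `Prerequisites` and its methods are hypothetical names, and the contents of index.json are not validated because its structure is not documented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: check the prerequisites listed in the Tips section.
// Assumption: Sitemaps/ and index.json live directly under the same base directory.
public class Prerequisites {

    // Creates the Sitemaps folder if missing; does nothing if it already exists.
    public static Path ensureSitemapsFolder(Path baseDir) throws IOException {
        return Files.createDirectories(baseDir.resolve("Sitemaps"));
    }

    // Filter and Search need an existing index.json; fail early if it is absent.
    public static Path requireIndex(Path baseDir) {
        Path index = baseDir.resolve("index.json");
        if (!Files.isRegularFile(index)) {
            throw new IllegalStateException("index.json not found in " + baseDir
                    + "; Filter and Search require it");
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Path base = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println("Sitemaps folder: " + ensureSitemapsFolder(base));
        System.out.println("Index file: " + requireIndex(base));
    }
}
```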

Commands

Search <site_name>
Filter <site_name> <file_type>
Help Search
Sitemap <absolute_path_to_site_file>
Crawl <use_robots_yes_or_no> <size_limit>
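The command syntax above can be validated before any work starts. A minimal sketch, assuming whitespace-separated arguments, case-insensitive command names, and that Help takes zero or one argument. `CommandParser` and its methods are hypothetical, and the unit of <size_limit> is not documented, so it is only checked to be an integer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: validate a command line against the README's command syntax.
// Command names come from the README; the parsing rules are assumptions.
public class CommandParser {

    public record Command(String name, List<String> args) {}

    // Minimum and maximum number of arguments per command.
    private static final Map<String, int[]> ARITY = Map.of(
            "search", new int[] {1, 1},
            "filter", new int[] {2, 2},
            "help", new int[] {0, 1},
            "sitemap", new int[] {1, 1},
            "crawl", new int[] {2, 2});

    public static Command parse(String line) {
        String[] parts = line.trim().split("\\s+");
        String name = parts[0].toLowerCase(Locale.ROOT);
        int[] range = ARITY.get(name);
        if (range == null) {
            throw new IllegalArgumentException("Unknown command: " + parts[0]);
        }
        List<String> args = Arrays.asList(parts).subList(1, parts.length);
        if (args.size() < range[0] || args.size() > range[1]) {
            throw new IllegalArgumentException("Wrong number of arguments for " + parts[0]);
        }
        if (name.equals("crawl")) {
            parseYesNo(args.get(0));     // validates <use_robots_yes_or_no>
            Long.parseLong(args.get(1)); // validates <size_limit>; throws NumberFormatException
        }
        return new Command(name, List.copyOf(args));
    }

    // Accepts "yes" or "no" (case-insensitive) for the robots.txt flag.
    public static boolean parseYesNo(String value) {
        switch (value.toLowerCase(Locale.ROOT)) {
            case "yes": return true;
            case "no": return false;
            default: throw new IllegalArgumentException("Expected yes or no, got: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Crawl yes 1000"));
        System.out.println(parse("Filter example.com pdf"));
    }
}
```

Keeping the argument counts in one table makes it easy to add a command without touching the parsing logic.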

Team

Stoica Mihai 👨‍🎓

Vlîjia Stefan 👨‍🎓

Rosca Stefan 👨‍🎓

Tănase Corina 👩🏼‍🎓

Teacher

Avram Dan 👨‍🏫
