Web Crawler

This is a web crawler written in PHP. Starting from a given web page, it follows links depth-first and crawls every page it reaches. Crawled data is collected in an array first and then written to the database in batches of 5; each record currently holds a URL and its page title. You can store anything else you find relevant, such as meta tags.

Parser.php is the main source file and the one you need to execute. It uses the Simple HTML DOM parser (simple_html_dom.php). sitemap.xml is the sitemap of a website, johnsonwatch.com.

You can run the crawler on any local server that supports PHP. I used XAMPP: place the files in C:\xampp\htdocs and you are ready to go. If a crawl runs longer than PHP allows, raise max_execution_time in the php.ini file.
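The depth-first crawl and the batching described above can be sketched roughly as follows. This is a minimal illustration, not the code in Parser.php: the function names `extractPage` and `crawlDepthFirst`, the `$fetch` and `$store` callbacks, and the in-memory example pages are all hypothetical. It uses PHP's built-in DOMDocument instead of simple_html_dom so that it runs standalone, and it assumes every link is already an absolute URL.

```php
<?php
// Hypothetical sketch; none of these names come from Parser.php.

/** Extract the page title and all href values from an HTML string. */
function extractPage(string $html): array
{
    libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $titles = $doc->getElementsByTagName('title');
    $title = $titles->length > 0 ? trim($titles->item(0)->textContent) : '';

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href; // assumption: links are absolute URLs
        }
    }
    return ['title' => $title, 'links' => $links];
}

/**
 * Crawl depth-first from $startUrl.
 * $fetch(string $url): ?string  returns the HTML, or null on failure.
 * $store(array $rows): void     receives batches of url/title rows.
 */
function crawlDepthFirst(string $startUrl, callable $fetch, callable $store,
                         int $batchSize = 5, int $maxPages = 100): void
{
    $stack = [$startUrl]; // explicit stack gives depth-first order
    $visited = [];
    $buffer = [];

    while ($stack !== [] && count($visited) < $maxPages) {
        $url = array_pop($stack);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if ($html === null || $html === '') {
            continue;
        }

        $page = extractPage($html);
        $buffer[] = ['url' => $url, 'title' => $page['title']];

        if (count($buffer) >= $batchSize) {
            $store($buffer); // flush one full batch
            $buffer = [];
        }

        // Push in reverse so the first link on the page is visited first.
        foreach (array_reverse($page['links']) as $link) {
            if (!isset($visited[$link])) {
                $stack[] = $link;
            }
        }
    }

    if ($buffer !== []) {
        $store($buffer); // flush the final partial batch
    }
}

// Example run against in-memory pages (made-up URLs), batch size 2.
$pages = [
    'http://a/' => '<html><head><title>A</title></head><body><a href="http://b/">b</a><a href="http://c/">c</a></body></html>',
    'http://b/' => '<html><head><title>B</title></head><body><a href="http://d/">d</a><a href="http://a/">a</a></body></html>',
    'http://c/' => '<html><head><title>C</title></head><body><a href="http://e/">e</a></body></html>',
    'http://d/' => '<html><head><title>D</title></head><body></body></html>',
    'http://e/' => '<html><head><title>E</title></head><body></body></html>',
];
$batches = [];
crawlDepthFirst(
    'http://a/',
    function (string $url) use ($pages): ?string { return $pages[$url] ?? null; },
    function (array $rows) use (&$batches): void { $batches[] = $rows; },
    2
);
```

In a real run, `$fetch` could wrap `file_get_contents` and `$store` could perform the database insert for each batch; both are left as callbacks here so the sketch needs neither a network connection nor a database.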

Gunjan-Satija/Web-Crawler
