Simple PHP scraper/crawler written for the PHP command line
Hopefully it will help anyone out there... Enjoy ;)
- PHP command-line interpreter (included in the PHP download; no web server needed)
- Add the php binary (or php.exe) path to the PATH environment variable
scrape() parameters:
- string $url: Starting URL
- string $crawl_regex: Regular expression used for link crawling
- string $scrape_regex: Regular expression used for data scraping
- integer $level: Used for recursion; use 0 when calling the function
- string $out_file: Name of the CSV file to export to
- integer $max_level: Maximum number of levels (depth) to crawl
- string $domain: (Optional) Used for recursion; use "" when calling the function
- integer $max_retries: (Optional) Number of HTTP retries when timeouts or errors occur (default 3)
- boolean $use_cache: (Optional) True to cache web pages so that re-running the script extracts data faster
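As a rough illustration of what a crawl_regex might look like, the sketch below pulls absolute http(s) links out of a page. The sample HTML, the pattern, and the assumption that the scraper applies it with preg_match_all and reads the first capture group are all for illustration only; php_scraper.php may handle matches differently.

```php
<?php
// Hypothetical sample page; in a real run this would be the HTML fetched from $url.
$html = '<a href="https://example.com/page1">One</a> <a href="/relative">Two</a> '
      . '<a href="https://example.com/page2">Three</a>';

// Assumed example crawl pattern: capture absolute http(s) URLs inside href="...".
// Using ~ as the delimiter avoids escaping the slashes in "://".
$crawl_regex = '~href="(https?://[^"]+)"~i';

// preg_match_all stores every match of the first capture group in $matches[1].
preg_match_all($crawl_regex, $html, $matches);

// Drop duplicate links and reindex the array.
$links = array_values(array_unique($matches[1]));

print_r($links);
```

This pattern deliberately skips relative links such as /relative, as well as href attributes written with single quotes.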
- Open a terminal (cmd, PowerShell or Git Bash on Windows)
- Change directory to the script directory
- Run php with no arguments to start scripting mode, where PHP reads code from standard input
- Run the scrape function with your required parameters:
<?php
include 'php_scraper.php';
scrape("https://www.google.com/", "test", "test", 0, "output.csv", 20);
?>
- Press Ctrl+Z, then Enter, to run the code (on Linux or macOS, press Ctrl+D instead)
To learn more about the regular expressions used, check my tutorial.
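Similarly, a scrape_regex with several capture groups could map each match to one CSV row. This is a sketch under assumptions: one row per match with one column per capture group (the column layout actually written by scrape() may differ), and an in-memory stream standing in for $out_file.

```php
<?php
// Hypothetical page fragment; real input would be each crawled page's HTML.
$html = '<li class="item"><span class="name">Widget</span><span class="price">9.99</span></li>'
      . '<li class="item"><span class="name">Gadget</span><span class="price">24.50</span></li>';

// Assumed example scrape pattern: two capture groups (name, price) per record.
$scrape_regex = '~<span class="name">([^<]+)</span><span class="price">([^<]+)</span>~';

// PREG_SET_ORDER groups results per match: $m[0] is the full match, $m[1]... are the groups.
preg_match_all($scrape_regex, $html, $sets, PREG_SET_ORDER);

// Write one CSV row per match to an in-memory stream (a real run would open $out_file).
$fh = fopen('php://memory', 'w+');
foreach ($sets as $m) {
    // Drop the full match and keep only the captured fields; the empty escape
    // argument disables PHP's non-standard CSV escape character.
    fputcsv($fh, array_slice($m, 1), ',', '"', '');
}
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo $csv;
```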
I encourage you all to contribute to this simple project to make it better and more usable.