A web-based application that scrapes data on all the engineering colleges in a given city from the website Shiksha.com. Given a city as input, the app builds the corresponding Shiksha.com URL and scrapes the data of every engineering college in that city using regular expressions (RegEx).
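Building a city URL and extracting fields with RegEx might look roughly like the PHP sketch below. It is illustrative only and is not the project's code. The functions `build_city_url` and `extract_college_names`, the URL path and the `college-name` class are all hypothetical, because the real Shiksha.com URL scheme and markup are not documented here and may change.

```php
<?php
// Hypothetical sketch: build a listing URL for a city, then pull college
// names out of the fetched HTML with a regular expression.

// Assumed URL pattern (hypothetical); the real Shiksha.com path may differ.
function build_city_url(string $city): string
{
    // Lowercase the city and replace runs of whitespace with hyphens,
    // e.g. "New Delhi" -> "new-delhi".
    $slug = preg_replace('/\s+/', '-', strtolower(trim($city)));
    return "https://www.shiksha.com/engineering/colleges/b-tech-colleges-" . rawurlencode($slug);
}

// Extract the text of <a> tags carrying an assumed class (hypothetical markup).
function extract_college_names(string $html): array
{
    $pattern = '/<a[^>]*class="college-name"[^>]*>(.*?)<\/a>/is';
    preg_match_all($pattern, $html, $matches);
    // Strip nested tags and decode HTML entities in each captured name.
    return array_map(
        fn ($name) => trim(html_entity_decode(strip_tags($name), ENT_QUOTES)),
        $matches[1]
    );
}
```

Regular expressions over HTML are brittle: a pattern like this has to be updated whenever the site's markup changes. PHP's `DOMDocument` and `DOMXPath` classes are a sturdier alternative if that becomes a problem.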
Before running the project, create its database by importing the `colleges.sql` file from the `model` folder.
- If you are using the CS50 IDE, run the following commands in the Cloud9 terminal:
  - `apache50 start ./project_name`
  - `mysql50 start`
- If you are using XAMPP, place the project in the `htdocs` folder.
If you are not behind a proxy server, update the arguments of the `curl` function calls in `city.php` accordingly. `curl` is called twice in `city.php`, both times with proxy and authentication arguments. It is implemented in `helpers.php`, where the parameters required to fetch a page are set:
```php
curl($pageUrl, "host:port", "username:password");
```
By default, all three arguments are used. Remove any argument other than `$pageUrl` that does not apply to your network.
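A helper with optional proxy arguments could be written roughly as follows. This is a sketch, not the `helpers.php` implementation. It assumes the third argument holds proxy credentials (passed via `CURLOPT_PROXYUSERPWD`). The separate `curl_options` function and the 30-second timeout are hypothetical additions.

```php
<?php
// Hypothetical sketch of a curl() helper like the one described for helpers.php.
// Proxy and credentials default to null, so the same helper works with or
// without a proxy.

// Build the cURL option array; kept separate so it can be inspected offline.
function curl_options(?string $proxy = null, ?string $auth = null): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true, // return the page body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 30,   // give up after 30 seconds (assumed value)
    ];
    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy;            // "host:port"
        if ($auth !== null) {
            $options[CURLOPT_PROXYUSERPWD] = $auth;  // "username:password" (assumed proxy auth)
        }
    }
    return $options;
}

// Fetch a page and return its HTML as a string, or false on failure.
// The cURL handle is released automatically when $ch goes out of scope.
function curl(string $pageUrl, ?string $proxy = null, ?string $auth = null)
{
    $ch = curl_init($pageUrl);
    curl_setopt_array($ch, curl_options($proxy, $auth));
    return curl_exec($ch);
}
```

Because the proxy parameters default to `null`, `curl($pageUrl)` fetches a page directly, while `curl($pageUrl, "host:port", "username:password")` routes the request through an authenticated proxy. Call sites then only need to drop trailing arguments.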
- `helpers.php`
  - commonly used functions
- `index.php`
  - home page HTML
- `city.php`
  - main file
  - fetches HTML using `curl`
  - scrapes the fetched HTML
  - stores the scraped data in the database
  - generates table HTML from the scraped data
- `scraper.js`
  - displays a loading icon and fetches data from the backend
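The storage and table-generation steps of `city.php` can be sketched as below. The `colleges` table with `name` and `location` columns, and the `store_colleges` and `render_table` functions, are hypothetical. Check `model/colleges.sql` for the actual schema.

```php
<?php
// Hypothetical sketch of the last two city.php steps: store rows, then render them.
// Each scraped row is assumed to look like ['name' => ..., 'location' => ...].

// Insert scraped rows with a prepared statement; returns the number inserted.
function store_colleges(PDO $db, array $rows): int
{
    $stmt = $db->prepare(
        "INSERT INTO colleges (name, location) VALUES (:name, :location)"
    );
    $count = 0;
    foreach ($rows as $row) {
        $stmt->execute([':name' => $row['name'], ':location' => $row['location']]);
        $count++;
    }
    return $count;
}

// Build an HTML table whose column headers are the keys of the first row.
function render_table(array $rows): string
{
    if ($rows === []) {
        return "<p>No colleges found.</p>";
    }
    // Escape every value so scraped text cannot inject HTML.
    $esc = fn ($value) => htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    $headers = array_keys(reset($rows));

    $html = "<table>\n<tr>";
    foreach ($headers as $header) {
        $html .= "<th>" . $esc($header) . "</th>";
    }
    $html .= "</tr>\n";
    foreach ($rows as $row) {
        $html .= "<tr>";
        foreach ($headers as $header) {
            $html .= "<td>" . $esc($row[$header] ?? "") . "</td>";
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>";
}
```

Prepared statements keep scraped text from being interpreted as SQL, and `htmlspecialchars` keeps it from being interpreted as markup when the generated table is displayed in the page.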
This project is licensed under the MIT License - see the LICENSE.md file for details.