This repository has been archived by the owner on Apr 17, 2021. It is now read-only.

5amyak/College-Scraper

College Scraper

A web-based application that scrapes data on all the engineering colleges in a given city from Shiksha.com. The app takes a city name as input, builds the corresponding Shiksha.com URL, and uses regular expressions to extract the data for every engineering college listed in that city.
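The URL-building step can be sketched as follows. This is only an illustration: the URL template points at example.com because the actual Shiksha.com path layout is not documented here, and the names `LISTING_URL_TEMPLATE`, `citySlug`, and `buildCityUrl` are hypothetical, not taken from city.php.

```php
<?php
// ASSUMPTION: placeholder template. The real Shiksha.com URL layout is not
// given in this README, so example.com stands in for it.
const LISTING_URL_TEMPLATE = 'https://example.com/engineering-colleges-in-%s';

// Turn free-form city input into a lowercase, hyphen-separated slug.
function citySlug(string $city): string
{
    $slug = strtolower(trim($city));
    // Collapse every run of non-alphanumeric characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

// Insert the slug into the (hypothetical) listing URL template.
function buildCityUrl(string $city): string
{
    return sprintf(LISTING_URL_TEMPLATE, rawurlencode(citySlug($city)));
}

echo buildCityUrl('New Delhi'), PHP_EOL;
// prints https://example.com/engineering-colleges-in-new-delhi
```

Normalising the input before it reaches the URL keeps stray spaces or punctuation in the city name from producing a malformed request.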

Getting Started

Before running the project, create its database by importing the colleges.sql file found in the model folder.

  1. If using CS50 IDE, run the following commands in the Cloud9 terminal:
    • apache50 start ./project_name
    • mysql50 start
  2. If using XAMPP, simply store the project in the htdocs folder.

Deployment

The curl function is implemented in helpers.php, where the parameters required to fetch a page are set. city.php calls it twice, both times with a proxy and authentication. If you are not behind a proxy server, update the arguments of those two calls accordingly.

curl($pageUrl, "host:port", "username:password")

By default all three arguments are passed. Remove any argument other than $pageUrl that does not apply to your network.
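For illustration, a fetch helper with an optional proxy might look roughly like the sketch below. It is not the code from helpers.php: `buildCurlOptions` and `fetchPage` are hypothetical names, the timeout and redirect settings are arbitrary choices, and it assumes the third argument holds the proxy credentials. It requires PHP's cURL extension.

```php
<?php
// Build the cURL options for one request. Proxy settings are added only when
// the caller supplies them, so the same helper works on a direct connection.
function buildCurlOptions(string $pageUrl, ?string $proxy = null, ?string $proxyAuth = null): array
{
    $options = [
        CURLOPT_URL            => $pageUrl,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 30,   // seconds; an arbitrary choice for this sketch
    ];
    if ($proxy !== null) {
        $options[CURLOPT_PROXY] = $proxy; // "host:port"
        if ($proxyAuth !== null) {
            // ASSUMPTION: the credentials authenticate against the proxy.
            $options[CURLOPT_PROXYUSERPWD] = $proxyAuth; // "username:password"
        }
    }
    return $options;
}

// Hypothetical wrapper, not the project's curl() function.
function fetchPage(string $pageUrl, ?string $proxy = null, ?string $proxyAuth = null): string
{
    $handle = curl_init();
    curl_setopt_array($handle, buildCurlOptions($pageUrl, $proxy, $proxyAuth));
    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($handle));
    }
    return $body; // the handle is released when it goes out of scope
}
```

With optional parameters like these, a direct connection would call `fetchPage($pageUrl)` and a proxied one `fetchPage($pageUrl, "host:port", "username:password")`.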

Important Files

  1. helpers.php
    • commonly used functions
  2. index.php
    • home page HTML
  3. city.php
    • main file
    • gets HTML using curl
    • scrapes it
    • stores scraped data in database
    • generates table HTML from scraped data
  4. scraper.js
    • displays a loading icon and fetches data from the backend
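The scrape-and-render steps listed for city.php can be sketched as below. The HTML markup, the class names in the pattern, and the functions `extractColleges` and `renderTable` are all invented for illustration; the real page structure and the project's actual regular expressions are not shown in this README, and the database step is left out.

```php
<?php
// Extract (name, location) pairs from listing HTML with a regular expression.
// ASSUMPTION: the class names in this pattern are hypothetical sample markup.
function extractColleges(string $html): array
{
    $pattern = '/<h2 class="college-name">\s*(.*?)\s*<\/h2>\s*'
             . '<span class="college-location">\s*(.*?)\s*<\/span>/s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $colleges = [];
    foreach ($matches as $match) {
        $colleges[] = [
            'name'     => trim(strip_tags($match[1])),
            'location' => trim(strip_tags($match[2])),
        ];
    }
    return $colleges;
}

// Render the scraped rows as an HTML table, escaping every cell value.
function renderTable(array $colleges): string
{
    $rows = '';
    foreach ($colleges as $college) {
        $rows .= sprintf(
            "<tr><td>%s</td><td>%s</td></tr>\n",
            htmlspecialchars($college['name']),
            htmlspecialchars($college['location'])
        );
    }
    return "<table>\n<tr><th>Name</th><th>Location</th></tr>\n" . $rows . "</table>";
}

// Made-up sample input standing in for a fetched listing page.
$sample = '<h2 class="college-name">Example Institute of Technology</h2>'
        . '<span class="college-location">Pune</span>'
        . '<h2 class="college-name">Sample College of Engineering</h2>'
        . '<span class="college-location">Pune</span>';

$colleges = extractColleges($sample);
echo renderTable($colleges), PHP_EOL;
```

Regex-based scraping of this kind is tightly coupled to the page markup, so the pattern would need updating whenever the source site changes its HTML.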

Built With

  • Regexr - Regular expression tester with syntax highlighting
  • XAMPP - PHP development environment

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
