
Python script to extract college names from the UGC, India website.


Extracting college names and addresses from the UGC site

Author: Karambir Singh Nain

This repository includes a Python script that I wrote to extract college names from the main UGC site. It uses regular expressions. It writes a file named colleges.txt containing every college name and address. I was able to extract 7758 of the 8000 colleges in the list. Most of the ones I couldn't extract were bad data entries on the UGC site.
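As a rough illustration of the regex approach, here is a minimal sketch. The row layout in `SAMPLE_HTML`, the pattern, and the helper names `extract_colleges` and `write_colleges` are assumptions made for this example; the real UGC markup and the actual code in college.py may well differ.

```python
import re

# Placeholder markup: the real UGC page layout is not shown in this README,
# so this three-column row format is an assumption.
SAMPLE_HTML = """
<table>
<tr><td>1</td><td>Example College A</td><td>Example Road, Example Town</td></tr>
<tr><td>2</td><td>Example College B</td><td>Sample Street, Sample City</td></tr>
<tr><td>3</td><td></td></tr>
</table>
"""

# One row = serial number, name, address. [^<]+ stops at the next tag,
# so a match can never spill over into a neighbouring cell or row.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>\s*\d+\s*</td>\s*<td>([^<]+)</td>\s*<td>([^<]+)</td>\s*</tr>",
    re.IGNORECASE,
)


def extract_colleges(html):
    """Return a list of (name, address) tuples for every well-formed row."""
    return [(name.strip(), address.strip())
            for name, address in ROW_PATTERN.findall(html)]


def write_colleges(colleges, path="colleges.txt"):
    """Write one 'name, address' line per college to path."""
    with open(path, "w") as out:
        for name, address in colleges:
            out.write("%s, %s\n" % (name, address))


colleges = extract_colleges(SAMPLE_HTML)
```

The third row in the sample has no name or address and is skipped, which is the kind of bad entry a strict pattern will miss.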

I wanted to practice regex a bit.

It can also be done with string find methods.
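A sketch of the string-find alternative, assuming the values sit between plain `<td>` and `</td>` tags (an assumption about the markup, not something taken from the script). The helper name `extract_cells` is hypothetical.

```python
def extract_cells(html, open_tag="<td>", close_tag="</td>"):
    """Return the text between each open_tag/close_tag pair using str.find."""
    cells = []
    pos = 0
    while True:
        start = html.find(open_tag, pos)
        if start == -1:
            break                      # no more opening tags
        start += len(open_tag)
        end = html.find(close_tag, start)
        if end == -1:
            break                      # unterminated cell, stop scanning
        cells.append(html[start:end].strip())
        pos = end + len(close_tag)
    return cells
```

For example, `extract_cells("<td>Example College A</td><td> Example Town </td>")` yields the two cell values with surrounding whitespace removed.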

Requirements:

  1. urllib2 - for downloading HTML files from the UGC website.

  2. re - regular expressions module.
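Both modules ship with Python 2's standard library. The download step might look roughly like the sketch below; the helper name `fetch_html` and the default encoding are assumptions, and the page URL is left as a parameter because this README does not give it. The fallback import is there because `urllib2` exists only in Python 2; Python 3 provides the same `urlopen` function in `urllib.request`.

```python
try:
    from urllib2 import urlopen          # Python 2, as listed above
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def fetch_html(url, encoding="utf-8"):
    """Download url and return the page body as text.

    The encoding is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    response = urlopen(url)
    try:
        raw = response.read()
    finally:
        response.close()
    return raw.decode(encoding, "replace")
```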

If you have any queries, send a pull request.
