
Python script to extract college names from the UGC, India website.


files
resultfiles
ugc_aksh
test-page.html
test-page2.html
test-page3.html

Extracting college names and addresses from the UGC site

Author: Karambir Singh Nain

This repository includes a Python script I wrote to extract college names from the main UGC site. It uses regular expressions and writes a file named colleges.txt containing every college name and address. I was able to extract 7758 of the 8000 colleges in the list; most of those I couldn't extract were bad data entries on UGC's site.

I wanted to practice regex a bit.
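A minimal sketch of the regex approach, written for Python 3. The actual layout of the UGC pages is not described here, so the markup it assumes (one college per `<tr>` row with exactly two cells, name first and address second) is hypothetical, as are the names `extract_colleges`, `write_colleges` and `clean`; the patterns would need adjusting to the real HTML. Rows with an empty name cell are skipped as a stand-in for filtering out bad entries.

```python
import html
import re

# ASSUMPTION: hypothetical markup. The real UGC pages may differ; here every
# college is one table row with exactly two cells: name, then address.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<name>.*?)</td>\s*<td>(?P<address>.*?)</td>\s*</tr>",
    re.IGNORECASE | re.DOTALL,
)
TAG_PATTERN = re.compile(r"<[^>]+>")
SPACE_PATTERN = re.compile(r"\s+")


def clean(fragment):
    """Drop leftover tags, decode HTML entities, collapse whitespace."""
    text = TAG_PATTERN.sub(" ", fragment)
    text = html.unescape(text)
    return SPACE_PATTERN.sub(" ", text).strip()


def extract_colleges(page):
    """Return (name, address) pairs found in one HTML page.

    Rows with an empty name are skipped, standing in for bad entries."""
    colleges = []
    for match in ROW_PATTERN.finditer(page):
        name = clean(match.group("name"))
        address = clean(match.group("address"))
        if name:
            colleges.append((name, address))
    return colleges


def write_colleges(colleges, path="colleges.txt"):
    """Write one 'name, address' line per college to path."""
    with open(path, "w", encoding="utf-8") as out:
        for name, address in colleges:
            out.write(f"{name}, {address}\n")
```

For example, `extract_colleges("<tr><td>Example College</td><td>Delhi</td></tr>")` returns `[('Example College', 'Delhi')]`. Tags are stripped before entities are decoded, so an escaped `&lt;` in a name is not mistaken for markup.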

The same extraction could also be done with string find methods.
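A sketch of the `str.find` alternative: scan for a start marker, then the matching end marker, and repeat from where the last match ended. The `<tr>`/`<td>` markers are a hypothetical page layout, and `extract_between` and `extract_colleges_with_find` are illustrative names, not part of the original script. Markers are matched case-sensitively and cell contents are only trimmed, not cleaned of tags or entities.

```python
def extract_between(text, start_marker, end_marker):
    """Return every substring found between start_marker and end_marker,
    scanning left to right with str.find instead of a regex."""
    results = []
    pos = 0
    while True:
        start = text.find(start_marker, pos)
        if start == -1:
            break
        start += len(start_marker)
        end = text.find(end_marker, start)
        if end == -1:
            break  # unterminated marker: stop rather than guess
        results.append(text[start:end].strip())
        pos = end + len(end_marker)
    return results


def extract_colleges_with_find(page):
    """Pair the first two cells of each row as (name, address).

    ASSUMPTION: a hypothetical <tr>/<td> layout; rows with fewer than two
    cells or an empty name are skipped."""
    colleges = []
    for row in extract_between(page, "<tr>", "</tr>"):
        cells = extract_between(row, "<td>", "</td>")
        if len(cells) >= 2 and cells[0]:
            colleges.append((cells[0], cells[1]))
    return colleges
```

This avoids regex entirely but is stricter about the exact marker text, which is the usual trade-off between the two approaches.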

Modules used:

  1. urllib2 - for downloading HTML files from the UGC website.

  2. re - regular expressions module.
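A sketch of the download step. `urllib2` exists only in Python 2; in Python 3 its functionality moved into `urllib.request`, which this version uses instead. `fetch_page` is a hypothetical helper name, and falling back to UTF-8 for pages that declare no charset is an assumption.

```python
import urllib.request


def fetch_page(url, timeout=30, fallback_encoding="utf-8"):
    """Download url and return its body as text.

    Uses Python 3's urllib.request, the successor of Python 2's urllib2.
    ASSUMPTION: pages that declare no charset are decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or fallback_encoding
        # errors="replace" keeps a badly encoded page from aborting the run
        return response.read().decode(charset, errors="replace")
```

The returned string can then be searched with `re`, as the script does.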

If you have any questions, feel free to send a pull request.
