# Capstone Project - Student Hostel Near Engineering Colleges in Hyderabad  

# Introduction / Business Problem

Hyderabad is the capital and largest city of the South Indian state of Telangana. It has a population of over 20 million people spread across 650 square kilometres. Being one of the country's top destinations for IT services companies and offering ample employment opportunities, Hyderabad has become home to 131 engineering colleges, attracting thousands of students from across the country.

With scores of students flocking to the city from across the country and from nearby towns and cities, the demand for student accommodation has steadily increased. Sensing a lucrative business opportunity, a friend of mine is interested in setting up a student hostel near engineering colleges and tasked me with zeroing in on the area with the highest density of engineering colleges, so that the hostel has a steady supply of occupants.

In this project, I explore the engineering colleges of Hyderabad to find the area/neighbourhood with the highest density of engineering colleges.

# Data Section

First, I obtained postal code details for all the localities of Hyderabad. The data was extracted using the following Python library:

pgeocode (https://github.com/symerio/pgeocode) is a Python library for high-performance offline querying of GPS coordinates, region names and municipality names from postal codes. It also supports distances between postal codes as well as general distance queries. The GeoNames database it uses includes postal codes for 83 countries.
From this library I used the `_index_postal_codes()` method of `pgeocode.Nominatim`, which creates a data frame of the unique postal codes of a given country. The leading underscore marks it as a private method, so its behaviour may change between pgeocode versions.
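
Distances between postal codes are typically computed as great-circle distances between their centroid coordinates. Below is a minimal stand-alone sketch of that calculation with the haversine formula, assuming a spherical Earth with a mean radius of 6,371 km. The helper `haversine_km` is hypothetical and not part of pgeocode's API, whose own distance code may differ in detail:

```python
import math

# Assumed mean Earth radius in kilometres (spherical-Earth approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Hypothetical helper: great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Centroids of postal codes 500001 and 500005, as listed in the pgeocode output further down.
d = haversine_km(17.3862, 78.462, 17.3102, 78.4997)
print(f"500001 -> 500005: {d:.1f} km")
```

For points a few kilometres apart within one city, the spherical approximation is adequate for comparing localities.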

The data frame consists of the following columns:

- `country code`: ISO country code, 2 characters
- `postal_code`: postal code
- `place_name`: place name (e.g. town, city, etc.)
- `state_name`: name of the 1st-order subdivision (state)
- `state_code`: code of the 1st-order subdivision (state)
- `county_name`: name of the 2nd-order subdivision (county/province)
- `county_code`: code of the 2nd-order subdivision (county/province)
- `community_name`: name of the 3rd-order subdivision (community)
- `community_code`: code of the 3rd-order subdivision (community)
- `latitude`: estimated latitude (WGS84)
- `longitude`: estimated longitude (WGS84)
- `accuracy`: accuracy of lat/lng, from 1 = estimated to 6 = centroid
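
Several places can share one postal code, which is why `_index_postal_codes()` returns a single row per code and why the `place_name` values in the output further down are comma-separated lists. Below is a plain-Python sketch of that kind of collapse, assuming place names are joined with commas and coordinates are averaged. The helper `index_postal_codes` and the sample rows are hypothetical, and pgeocode's internal implementation may differ:

```python
from collections import defaultdict

# Hypothetical raw rows: two places share postal code "000001".
rows = [
    {"postal_code": "000001", "place_name": "Place A", "latitude": 17.0, "longitude": 78.0},
    {"postal_code": "000001", "place_name": "Place B", "latitude": 17.5, "longitude": 78.5},
    {"postal_code": "000002", "place_name": "Place C", "latitude": 17.2, "longitude": 78.3},
]


def index_postal_codes(rows):
    """Hypothetical helper: collapse rows into one record per postal code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["postal_code"]].append(row)

    unique = []
    for code, members in groups.items():
        n = len(members)
        unique.append({
            "postal_code": code,
            # Join every place name sharing this code into one string.
            "place_name": ", ".join(m["place_name"] for m in members),
            # Assumed approach: average the coordinates of the grouped places.
            "latitude": sum(m["latitude"] for m in members) / n,
            "longitude": sum(m["longitude"] for m in members) / n,
        })
    return unique


unique_codes = index_postal_codes(rows)
for record in unique_codes:
    print(record)
```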

Since images are not displayed on GitHub, the screenshot is linked here: https://github.com/RamakanthDontula/Coursera_Capstone/blob/master/Hyd%20Loc%20Code.png

Second, I needed a list of engineering colleges in Hyderabad. A list of engineering colleges in Telangana, which includes those in Hyderabad, was available at https://en.wikipedia.org/wiki/List_of_engineering_colleges_in_Telangana

Lastly, I merged the locality data with the engineering college data on the `Locality` column to get latitude and longitude values for the engineering colleges.
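
A merge like this is an inner join by default (pandas' `merge` uses `how='inner'`). A college whose locality has no match is therefore dropped, and a locality listed under several postal codes produces one row per matching code, which is why each college appears twice in the merged output further down. Below is a plain-Python sketch of the join, with hypothetical sample data and a hypothetical `inner_merge` helper:

```python
# Hypothetical sample data: "Locality X" appears under two postal-code centroids.
colleges = [
    {"College": "College A", "Locality": "Locality X"},
    {"College": "College B", "Locality": "Locality X"},
    {"College": "College C", "Locality": "Locality Z"},  # no matching locality
]
localities = [
    {"Locality": "Locality X", "latitude": 17.35, "longitude": 78.24},
    {"Locality": "Locality X", "latitude": 17.39, "longitude": 78.47},
    {"Locality": "Locality Y", "latitude": 17.31, "longitude": 78.50},
]


def inner_merge(left, right, key):
    """Hypothetical helper: inner join two lists of dicts on `key`."""
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    merged = []
    for left_row in left:
        # One output row per matching right-hand row; no match means no row.
        for right_row in index.get(left_row[key], []):
            merged.append({**left_row, **right_row})
    return merged


merged = inner_merge(colleges, localities, "Locality")
for row in merged:
    print(row)
```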

Data wrangling steps such as dropping unnecessary columns, blank values and duplicate rows were performed to get the desired data frame.
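
Dropping duplicates on the `College` column keeps the first row for each college, so each college is assigned the coordinates of whichever matching postal code happened to come first. Below is a plain-Python sketch of that keep-first behaviour, using a hypothetical helper and sample rows:

```python
# Hypothetical merged rows: each college appears once per matching postal code.
merged = [
    {"College": "College A", "Locality": "Locality X", "latitude": 17.35, "longitude": 78.24},
    {"College": "College A", "Locality": "Locality X", "latitude": 17.39, "longitude": 78.47},
    {"College": "College B", "Locality": "Locality X", "latitude": 17.35, "longitude": 78.24},
    {"College": "College B", "Locality": "Locality X", "latitude": 17.39, "longitude": 78.47},
]


def drop_duplicates_keep_first(rows, key):
    """Hypothetical helper: keep the first row seen for each value of `key`, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


unique_colleges = drop_duplicates_keep_first(merged, "College")
# Renumber from 0, analogous to reset_index(drop=True).
for i, row in enumerate(unique_colleges):
    print(i, row["College"], row["latitude"], row["longitude"])
```

Because the choice is order-dependent, each college's position reflects one postal-code centroid for its locality rather than the college's actual address.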

## Methodology

Geolocation coordinates of all the localities of Hyderabad are pulled using the pgeocode library.

Data on engineering colleges in Hyderabad was collected from https://en.wikipedia.org/wiki/List_of_engineering_colleges_in_Telangana, then cleaned and processed into a dataframe.

Foursquare could be used to explore the areas of the city. Engineering college details, along with geolocation data, are added to the dataframe.

The data is then processed to calculate additional values, such as the density of engineering colleges in each locality (measured as the number of colleges per locality).
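
Here density is simply the number of colleges per locality, which is what `value_counts()` on the `Locality` column returns. Below is a plain-Python sketch of the same count using `collections.Counter`, with hypothetical sample localities. It also lists every locality tied for the top count, which reading only the first row of the sorted counts would hide:

```python
from collections import Counter

# Hypothetical sample: the Locality of each de-duplicated college.
college_localities = [
    "Locality X", "Locality X", "Locality Y",
    "Locality X", "Locality Z", "Locality Y",
]

# Colleges per locality, analogous to value_counts().
counts = Counter(college_localities)
top_count = max(counts.values())
# Every locality tied for the highest count, not just the first one.
densest = sorted(loc for loc, n in counts.items() if n == top_count)

print(counts.most_common())
print("Highest density:", densest, "with", top_count, "colleges")
```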

Finally, the data is plotted on a map using the folium library.

## Code

### 1. Importing Necessary Libraries

```python
!pip install lxml
!pip install pgeocode
!pip install geopy
import pandas as pd
import pgeocode
import requests
from geopy.geocoders import Nominatim
```



### 2. Scraping the web for the list of Engineering Colleges

```python
# Scraping the web page
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_engineering_colleges_in_Telangana")[0]
```

```python
df.head()
```

| | Short Name | Full Name | Website | ! Est. Year | Affiliated University | Revenue Division | District | Ownership | Additional information | Unnamed: 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACEEC | ACE Engineering College | https://www.aceec.ac.in | 2007 | JNTU, Hyderabad | Ankushapur | Medchal-Malkajgiri | [Private] | With a Difference in Excellence | |
| 1 | Mrits | Malla reddy institution of technology and science | https://www.mrits.ac.in | 2005 | JNTU, Hyderabad | masimmagudha,dhulapally | Medchal-Malkajgiri | [Private] | | |
| 2 | RGUKT-Basar | Rajiv Gandhi University of Knowledge Technologies | http://www.rgukt.ac.in/ | 2008 | Autonomous | Basar | Nirmal | Public | | |
| 3 | VBIT | Vignana Bharathi Institute of technology | http://vbithyd.ac.in// | 2004 | Autonomous | Ghatkesar | Medchal-Malkajgiri | [Private] | | |
| 4 | AURP | Aurora Group of Institutions | http://www.aurora.in/ | 1989 | JNTU, Hyderabad | Ghatkesar | Medchal-Malkajgiri | [Private] | | |
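
`pd.read_html` parses every `<table>` on the page into its own DataFrame and returns them as a list, so `[0]` selects the first table; it needs an HTML parser such as lxml, which is why lxml is installed above. To illustrate what that step does, here is a stripped-down sketch using only the standard library's `html.parser` on a hypothetical HTML snippet. The `TableParser` class is a hypothetical illustration, and unlike `read_html` it ignores `rowspan`/`colspan`, nested tables and unclosed cells:

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collect cell text from every <table>, row by row."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Hypothetical snippet shaped like the Wikipedia list (not real page content).
html = """
<table>
  <tr><th>Short Name</th><th>Full Name</th><th>Revenue Division</th></tr>
  <tr><td>EX</td><td>Example College of Engineering</td><td>Locality X</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
parser.close()
header, *body = parser.tables[0]
records = [dict(zip(header, row)) for row in body]
print(records)
```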


```python
df.drop(['Short Name', 'Website', '! Est. Year', 'Affiliated University',
         'Ownership', 'Additional information', 'Unnamed: 9'],
        axis=1, inplace=True)
```

```python
df.head()
```

| | Full Name | Revenue Division | District |
|---|---|---|---|
| 0 | ACE Engineering College | Ankushapur | Medchal-Malkajgiri |
| 1 | Malla reddy institution of technology and science | masimmagudha,dhulapally | Medchal-Malkajgiri |
| 2 | Rajiv Gandhi University of Knowledge Technologies | Basar | Nirmal |
| 3 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri |
| 4 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri |


```python
df.rename(columns={'Full Name': 'College', 'Revenue Division': 'Locality'}, inplace=True)
```

```python
df.head()
```

| | College | Locality | District |
|---|---|---|---|
| 0 | ACE Engineering College | Ankushapur | Medchal-Malkajgiri |
| 1 | Malla reddy institution of technology and science | masimmagudha,dhulapally | Medchal-Malkajgiri |
| 2 | Rajiv Gandhi University of Knowledge Technologies | Basar | Nirmal |
| 3 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri |
| 4 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri |


```python
# Get latitude and longitude values of localities in Hyderabad city
country = pgeocode.Nominatim('in')
Postcode = country._index_postal_codes()
Telangana = Postcode[Postcode.state_name == 'Telangana']
HydLoc = Telangana[Telangana.county_name == 'Hyderabad']
HydLoc.head()
```

| | country code | postal_code | place_name | state_name | state_code | county_name | county_code | community_name | community_code | latitude | longitude | accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8286 | IN | 500001 | Gandhi Bhawan (Hyderabad), Moazzampura, Hydera... | Telangana | 40 | Hyderabad | 536.0 | Nampally | | 17.3862 | 78.462 | 1 |
| 8287 | IN | 500002 | Hyderabad Jubilee H.O, Moghalpura | Telangana | 40 | Hyderabad | 536.0 | Charminar | | 17.3862 | 78.462 | 1 |
| 8288 | IN | 500003 | Kingsway, Secunderabad H.O | Telangana | 40 | Hyderabad | 536.0 | Secunderabad | | 17.3862 | 78.462 | 1 |
| 8289 | IN | 500004 | Bazarghat (Hyderabad), Khairatabad H.O, Parish... | Telangana | 40 | Hyderabad | 536.0 | Khairatabad | | 17.3872 | 78.4621 | 4 |
| 8290 | IN | 500005 | Jalapally, Crp Camp (Hyderabad), Keshogiri, Ba... | Telangana | 40 | Hyderabad | 536.0 | Hyderabad | | 17.3102 | 78.4997 | 3 |


```python
HydLoc.shape
```

(104, 12)

```python
# Drop unnecessary columns; assigning the result (instead of inplace=True on a
# filtered slice) avoids pandas' SettingWithCopyWarning
HydLoc = HydLoc.drop(['country code', 'postal_code', 'place_name', 'state_name', 'state_code',
                      'county_name', 'county_code', 'community_code', 'accuracy'], axis=1)
```


```python
HydLoc.head()
```

| | community_name | latitude | longitude |
|---|---|---|---|
| 8286 | Nampally | 17.3862 | 78.462 |
| 8287 | Charminar | 17.3862 | 78.462 |
| 8288 | Secunderabad | 17.3862 | 78.462 |
| 8289 | Khairatabad | 17.3872 | 78.4621 |
| 8290 | Hyderabad | 17.3102 | 78.4997 |


```python
# Assign the result rather than renaming in place, avoiding SettingWithCopyWarning
HydLoc = HydLoc.rename(columns={'community_name': 'Locality'})
```


```python
# Merge Hyderabad localities into the college list to get lat, lon values for colleges
HydMerge = pd.merge(df, HydLoc, on='Locality')
```

```python
HydMerge.head()
```

| | College | Locality | District | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 1 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri | 17.3898 | 78.4699 |
| 2 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 3 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri | 17.3898 | 78.4699 |
| 4 | Vignan Institute of management and technology ... | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |


```python
HydMerge.shape
```

(73, 5)

```python
HydMerge.head(30)
```

| | College | Locality | District | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 1 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri | 17.3898 | 78.4699 |
| 2 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 3 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri | 17.3898 | 78.4699 |
| 4 | Vignan Institute of management and technology ... | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 5 | Vignan Institute of management and technology ... | Ghatkesar | Medchal-Malkajgiri | 17.3898 | 78.4699 |
| 6 | Princeton Institute of engineering and technology | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 7 | Princeton Institute of engineering and technology | Ghatkesar | Medchal-Malkajgiri | 17.3898 | 78.4699 |
| 8 | Nalla Malla Reddy Engineering College | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 9 | Nalla Malla Reddy Engineering College | Ghatkesar | Medchal-Malkajgiri | 17.3898 | 78.4699 |


```python
# Drop duplicate rows, keeping the first row for each college
HydCollege = HydMerge.drop_duplicates(['College'])
```

```python
HydCollege.shape
```

(25, 5)

```python
HydCollege.head()
```

| | College | Locality | District | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 2 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 4 | Vignan Institute of management and technology ... | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 6 | Princeton Institute of engineering and technology | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 8 | Nalla Malla Reddy Engineering College | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |


```python
HydCollege_Final = HydCollege.reset_index(drop=True)
```

```python
HydCollege_Final.head()
```

| | College | Locality | District | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Vignana Bharathi Institute of technology | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 1 | Aurora Group of Institutions | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 2 | Vignan Institute of management and technology ... | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 3 | Princeton Institute of engineering and technology | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |
| 4 | Nalla Malla Reddy Engineering College | Ghatkesar | Medchal-Malkajgiri | 17.3535 | 78.2402 |


```python
address = 'Hyderabad, IN'

geolocator = Nominatim(user_agent="hyd_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Hyderabad are 17.38878595, 78.46106473453146.


```python
!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library

# create map of Hyderabad using latitude and longitude values
map_hyderabad = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(HydCollege_Final['latitude'], HydCollege_Final['longitude'], HydCollege_Final['Locality']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hyderabad)

map_hyderabad
```


```python
count = HydCollege_Final['Locality'].value_counts()
```

```python
print(count)
```

```text
Ghatkesar      6
Hyderabad      5
Bandlaguda     4
Saroornagar    3
Saidabad       2
Nampally       1
Khairatabad    1
Uppal          1
Shaikpet       1
Name: Locality, dtype: int64
```


## Results & Discussion

I scraped the web for a list of engineering colleges in Telangana and obtained an initial list of colleges. Since geolocation coordinates were not available for the colleges, I used the pgeocode library to get coordinates for Hyderabad's localities and merged them with the engineering college list. The pgeocode data contained 104 postal codes (localities) for Hyderabad, and after merging and removing duplicates, 25 engineering colleges were matched to Hyderabad localities.
Based on the analysis above, the locality of Ghatkesar has the highest density of engineering colleges, with 6 colleges. It is therefore suggested to set up the student hostel in this locality.

# Conclusion

The purpose of the project was to find the areas in Hyderabad with the highest density of engineering colleges for setting up a student hostel. Since geocodes were not directly available for the engineering colleges, the geographical coordinates of Hyderabad's localities were extracted first. The dataframe of Hyderabad localities was then merged with the college list on locality to get the colleges' geographical coordinates. The results were mapped using the folium library for visual representation. From the results, the Ghatkesar locality was found to have the highest college density.