In [1]:
import pandas as pd 

## 1. Download and Explore Dataset

To segment and explore the neighborhoods, we need a dataset that contains the place names along with the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset is freely available on the web from http://www.geonames.org/export/zip/. The IN.zip archive on that page contains IN.txt, a tab-separated file with roughly 154,800 rows, each describing a postal location in India. We download the archive and use IN.txt for this assignment.

### Load and explore the data

Next, let's load the data.

In [2]:
# IN.txt has no header row, so the column names are supplied explicitly
columns = ['country code','postal code','place name','admin name1','admin code1','admin name2',
           'admin code2','admin name3','admin code3','latitude','longitude','accuracy']
df = pd.read_csv("IN.txt", sep='\t', names=columns)
df

Unnamed: 0,country code,postal code,place name,admin name1,admin code1,admin name2,admin code2,admin name3,admin code3,latitude,longitude,accuracy
0,IN,744101,Marine Jetty,Andaman & Nicobar Islands,1,South Andaman,,Portblair,,11.6667,92.7500,3
1,IN,744101,Port Blair,Andaman & Nicobar Islands,1,South Andaman,,Port Blair,,11.6667,92.7500,4
2,IN,744101,N.S.Building,Andaman & Nicobar Islands,1,South Andaman,,Portblair,,11.6667,92.7500,3
3,IN,744102,Haddo,Andaman & Nicobar Islands,1,South Andaman,,Port Blair,,11.6833,92.7167,4
4,IN,744102,Chatham,Andaman & Nicobar Islands,1,South Andaman,,Portblair,,11.7000,92.6667,3
...,...,...,...,...,...,...,...,...,...,...,...,...
154804,IN,509412,Mustipally,Telangana,40,Mahabub Nagar,,Peddakothapally,,16.6514,78.0760,3
154805,IN,509412,Peddakarpamula,Telangana,40,Mahabub Nagar,,Peddakothapally,,16.6514,78.0760,3
154806,IN,509412,Gantaraopally,Telangana,40,Mahabub Nagar,,Peddakothapally,,16.6514,78.0760,3
154807,IN,509412,Ganyagula,Telangana,40,Mahabub Nagar,,Nagarkurnool,,16.6514,78.0760,3
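
Before cleaning, it can help to look at the shape of the table, the number of missing values per column, and the district names that occur. The following is a minimal sketch that uses three rows copied from the output above instead of the full IN.txt; the names `SAMPLE` and `sample_df` are illustrative and not part of the original notebook.

```python
import io
import pandas as pd

# Column names as used in the read_csv call above.
COLUMNS = ['country code', 'postal code', 'place name', 'admin name1',
           'admin code1', 'admin name2', 'admin code2', 'admin name3',
           'admin code3', 'latitude', 'longitude', 'accuracy']

# Three rows copied from the output above, tab-separated like IN.txt.
SAMPLE = (
    "IN\t744101\tMarine Jetty\tAndaman & Nicobar Islands\t1\tSouth Andaman\t\tPortblair\t\t11.6667\t92.7500\t3\n"
    "IN\t744101\tPort Blair\tAndaman & Nicobar Islands\t1\tSouth Andaman\t\tPort Blair\t\t11.6667\t92.7500\t4\n"
    "IN\t744102\tHaddo\tAndaman & Nicobar Islands\t1\tSouth Andaman\t\tPort Blair\t\t11.6833\t92.7167\t4\n"
)

sample_df = pd.read_csv(io.StringIO(SAMPLE), sep='\t', names=COLUMNS)

# Number of rows and columns.
n_rows, n_cols = sample_df.shape

# Empty fields are read as NaN, so this counts the gaps in each column.
missing_per_column = sample_df.isna().sum()

# How many rows belong to each district ('admin name2').
districts = sample_df['admin name2'].value_counts()

print(n_rows, n_cols)
print(missing_per_column)
print(districts)
```

On the full dataset the same three calls would be applied to `df` directly.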


### Clean the dataset: keep only the rows and columns needed for this assignment

In [3]:
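# Keep only the rows for Chennai; copy so that later edits do not act on a view of df
mask = (df['admin name2'] == 'Chennai')
df_place = df[mask].copy()
# Treat postal codes as text rather than numbers
df_place['postal code'] = df_place['postal code'].astype(str)
# Drop the columns that are not needed for this assignment
df_place = df_place.drop(columns=['admin code1','admin code2','admin name3','admin code3','accuracy'])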

In [4]:
df_place

Unnamed: 0,country code,postal code,place name,admin name1,admin name2,latitude,longitude
82499,IN,600001,Chennai G.P.O.,Tamil Nadu,Chennai,13.0933,80.2842
82500,IN,600001,MPT AO,Tamil Nadu,Chennai,13.0933,80.2842
82501,IN,600001,Flower Bazaar,Tamil Nadu,Chennai,13.0933,80.2842
82502,IN,600001,Sowcarpet,Tamil Nadu,Chennai,13.0897,80.2803
82503,IN,600001,Mannady (Chennai),Tamil Nadu,Chennai,13.0969,80.2881
...,...,...,...,...,...,...,...
82774,IN,600113,TTTI Taramani,Tamil Nadu,Chennai,13.0380,80.2301
82775,IN,600113,Tidel Park,Tamil Nadu,Chennai,13.0380,80.2301
82783,IN,600118,Erukkancheri,Tamil Nadu,Chennai,13.1256,80.2531
82784,IN,600118,Rv Nagar,Tamil Nadu,Chennai,13.1281,80.2557


In [5]:
def join_unique(values):
    # Join the distinct values of a group into one sorted, comma-separated string
    return ','.join(sorted(values.unique()))

# Merge rows that share the same coordinates into a single row
text_columns = ['country code','postal code','place name','admin name1','admin name2']
df_place = df_place.groupby(['latitude','longitude']).agg({col: join_unique for col in text_columns}).reset_index()

In [6]:
df_place

Unnamed: 0,latitude,longitude,country code,postal code,place name,admin name1,admin name2
0,12.9194,80.1697,IN,600078,Kalaignar Karunanidhi Nagar,Tamil Nadu,Chennai
1,12.9675,80.2598,IN,600041,Valmiki Nagar,Tamil Nadu,Chennai
2,12.9855,80.2604,IN,600041,Tiruvanmiyur,Tamil Nadu,Chennai
3,13.0156,80.2467,IN,600085,Kotturpuram,Tamil Nadu,Chennai
4,13.025,80.2575,IN,600028,"Raja Annamalaipuram,Ramakrishna Nagar (Chennai)",Tamil Nadu,Chennai
5,13.0269,80.2406,IN,600035,Nandanam,Tamil Nadu,Chennai
6,13.0292,80.2708,IN,600004,"Mandaveli,Mylapore,Vivekananda College Madras",Tamil Nadu,Chennai
7,13.038,80.2301,IN,600113,"TTTI Taramani,Tidel Park",Tamil Nadu,Chennai
8,13.0389,80.2258,IN,600033,"Mambalam R.S.,West Mambalam",Tamil Nadu,Chennai
9,13.0433,80.2528,IN,600018,"Pr. Accountant General,Teynampet",Tamil Nadu,Chennai
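
The grouping step merges all rows that share the same latitude and longitude into one row and joins their distinct text values with commas. The sketch below reproduces this on three rows taken from the Chennai table above; the helper name `join_unique` and the frame `rows` are illustrative assumptions.

```python
import pandas as pd


def join_unique(values):
    """Join the distinct values of a group into one sorted, comma-separated string."""
    return ','.join(sorted(values.unique()))


# Two places share one coordinate pair, the third has its own.
rows = pd.DataFrame({
    'postal code': ['600113', '600113', '600118'],
    'place name': ['TTTI Taramani', 'Tidel Park', 'Erukkancheri'],
    'latitude': [13.0380, 13.0380, 13.1256],
    'longitude': [80.2301, 80.2301, 80.2531],
})

# One output row per distinct (latitude, longitude) pair.
merged = (
    rows.groupby(['latitude', 'longitude'])
        .agg({'postal code': join_unique, 'place name': join_unique})
        .reset_index()
)

print(merged)
```

The two Taramani entries collapse into the single row `TTTI Taramani,Tidel Park`, as in row 7 of the output above, so each coordinate pair appears only once.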


In [7]:
df_place.to_csv('Chennai.csv')
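
When the saved file is read back later, two details matter: `to_csv` writes the row index as an unnamed first column, and pandas would parse the postal codes as integers unless told otherwise. The sketch below shows one way to handle both. It round-trips a one-row frame through an in-memory buffer instead of Chennai.csv, and the names `frame`, `buffer`, and `restored` are illustrative.

```python
import io
import pandas as pd

# A one-row stand-in for df_place, using values from the table above.
frame = pd.DataFrame({
    'latitude': [13.038],
    'longitude': [80.2301],
    'postal code': ['600113'],
    'place name': ['TTTI Taramani,Tidel Park'],
})

# Write with the default settings, so the index is included as the first column.
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the index instead of creating an 'Unnamed: 0' column;
# dtype keeps the postal codes as strings.
restored = pd.read_csv(buffer, index_col=0, dtype={'postal code': str})

print(restored.dtypes)
```

Alternatively, passing `index=False` to `to_csv` omits the index column from the file altogether.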