# The Battle Of Neighborhoods 

> Applied Data Science Capstone Project

# Project Title: <u> Custom Clustering of Indian Cities</u>


<b>Table of contents</b>

   > 1.Introduction <br>
   > 2.Data Source



### 1. Introduction

The objective of this project is to create a custom clustering interface that lets users cluster the top cities in India based on several factors. The Foursquare API is used to obtain venue details for each city. I've used KMeans clustering to cluster the dataset. Finally, I've provided a user interface where a user can enter the keyword for which they want to visualize the clusters.

### 2. Data Source

The main dataset (information on 500+ cities in India) is obtained from Kaggle. <br><br>
1. Kaggle dataset link: <a>https://www.kaggle.com/zed9941/top-500-indian-cities</a><br>
2. The geolocation of each city was obtained with Nominatim from geopy<br>
3. Trending venue data comes from the Foursquare API

<hr>

# <u> PART 1</u>

# Installing and importing libraries

In [1]:
!pip install geocoder
!pip install geopy
!pip install folium
from geopy.geocoders import Nominatim
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
import warnings 
warnings.filterwarnings('ignore')



In [79]:
df= pd.read_csv('cities_r2.csv')
df.head()

Unnamed: 0,name_of_city,state_code,state_name,dist_code,population_total,population_male,population_female,0-6_population_total,0-6_population_male,0-6_population_female,...,literates_female,sex_ratio,child_sex_ratio,effective_literacy_rate_total,effective_literacy_rate_male,effective_literacy_rate_female,location,total_graduates,male_graduates,female_graduates
0,Abohar,3,PUNJAB,9,145238,76840,68398,15870,8587,7283,...,44972,890,848,79.86,85.49,73.59,"30.1452928,74.1993043",16287,8612,7675
1,Achalpur,27,MAHARASHTRA,7,112293,58256,54037,11810,6186,5624,...,43086,928,909,91.99,94.77,89.0,"21.257584,77.5086754",8863,5269,3594
2,Adilabad,28,ANDHRA PRADESH,1,117388,59232,58156,13103,6731,6372,...,37660,982,947,80.51,88.18,72.73,"19.0809075,79.560344",10565,6797,3768
3,Adityapur,20,JHARKHAND,24,173988,91495,82493,23042,12063,10979,...,54515,902,910,83.46,89.98,76.23,"22.7834741,86.1576889",19225,12189,7036
4,Adoni,28,ANDHRA PRADESH,21,166537,82743,83794,18406,9355,9051,...,45089,1013,968,68.38,76.58,60.33,"15.6322227,77.2728368",11902,7871,4031


In [82]:
# Drop the demographic columns that are not needed for geolocation.
df.drop(columns=['state_code', 'state_name', 'dist_code',
                 'population_total', 'population_male', 'population_female',
                 '0-6_population_total', '0-6_population_male', '0-6_population_female',
                 'literates_female', 'sex_ratio', 'child_sex_ratio',
                 'effective_literacy_rate_total', 'effective_literacy_rate_male',
                 'effective_literacy_rate_female',
                 'total_graduates', 'male_graduates', 'female_graduates'],
        inplace=True)
df.head()

Unnamed: 0,name_of_city,literates_total,literates_male,location
0,Abohar,103319,58347,"30.1452928,74.1993043"
1,Achalpur,92433,49347,"21.257584,77.5086754"
2,Adilabad,83955,46295,"19.0809075,79.560344"
3,Adityapur,125985,71470,"22.7834741,86.1576889"
4,Adoni,101292,56203,"15.6322227,77.2728368"


In [83]:
df.drop(columns=['literates_total','literates_male'],axis=1,inplace=True)
df.head()

Unnamed: 0,name_of_city,location
0,Abohar,"30.1452928,74.1993043"
1,Achalpur,"21.257584,77.5086754"
2,Adilabad,"19.0809075,79.560344"
3,Adityapur,"22.7834741,86.1576889"
4,Adoni,"15.6322227,77.2728368"
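
The `location` column kept above already stores each city's coordinates as a single `"latitude,longitude"` string. As a minimal sketch (the helper name `parse_location` is hypothetical and not part of the original notebook), the string can be split into two floats, which is handy for cross-checking the geocoded values obtained in the next step:

```python
def parse_location(text):
    """Split a 'latitude,longitude' string into a pair of floats."""
    lat_text, lon_text = str(text).split(",")
    return float(lat_text), float(lon_text)


# Value taken from the Abohar row shown above.
sample_lat, sample_lon = parse_location("30.1452928,74.1993043")
print(sample_lat, sample_lon)
```

On the DataFrame itself, something like `df['location'].apply(parse_location)` would yield one `(latitude, longitude)` tuple per city, assuming every row follows the same format.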


<h1> Obtaining geolocation</h1>

In [84]:
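# Create one Nominatim client and reuse it for every lookup.
geolocator = Nominatim(user_agent="shital")

# Note: the query is the bare city name, so an ambiguous name
# can resolve to a place outside India.
def lat(city):
    """Return the geocoded latitude of a city, or NaN if it is not found."""
    location = geolocator.geocode(str(city))
    if location:
        print(str(city), location.latitude)
        return location.latitude
    else:
        print(str(city), 'NaN')
        return np.nan

def long(city):
    """Return the geocoded longitude of a city, or NaN if it is not found."""
    location = geolocator.geocode(str(city))
    if location:
        return location.longitude
    else:
        return np.nan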
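
The two helpers above query Nominatim twice per city and search worldwide. The following is a hedged sketch of an alternative, not the notebook's actual method: it looks each city up once, returns both coordinates, and restricts results to India through geopy's `country_codes` argument, with `RateLimiter` spacing out the requests. The names `build_india_geocoder`, `geocode_city`, and `fake_geocode`, the user agent string, and the one-second delay are all assumptions made for illustration.

```python
import math
from functools import partial
from types import SimpleNamespace


def build_india_geocoder(user_agent="indian-cities-notebook", min_delay_seconds=1.0):
    """Return a rate-limited geocode callable restricted to India (needs geopy)."""
    # Imported lazily so the rest of this sketch runs without geopy installed.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    limited = RateLimiter(geolocator.geocode, min_delay_seconds=min_delay_seconds)
    return partial(limited, country_codes="in")


def geocode_city(city, geocode):
    """Look a city up once and return (latitude, longitude), or (nan, nan)."""
    location = geocode(str(city))
    if location is None:
        return (math.nan, math.nan)
    return (location.latitude, location.longitude)


def fake_geocode(query):
    """Offline stand-in for a real geocoder, used only to demonstrate the flow."""
    known = {"Abohar": SimpleNamespace(latitude=30.145054, longitude=74.19566)}
    return known.get(query)


demo_hit = geocode_city("Abohar", fake_geocode)
demo_miss = geocode_city("Rajarhat Gopalpur", fake_geocode)
print(demo_hit, demo_miss)

# With network access, the real lookup could be wired in like this:
# geocode = build_india_geocoder()
# coords = df['name_of_city'].apply(lambda c: geocode_city(c, geocode))
# df[['latitude', 'longitude']] = pd.DataFrame(coords.tolist(), index=df.index)
```

Passing the geocoder in as an argument keeps `geocode_city` testable offline and makes it easy to swap in a different service.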
    

In [85]:
df['latitude'] = df['name_of_city'].apply(lat)

Abohar  30.1450543
Achalpur  21.241444700000002
Adilabad  19.5
Adityapur  22.7823546
Adoni  39.913451
Agartala  23.8312377
Agra  27.1752554
Ahmadabad  23.0216238
Ahmadnagar  19.25
Aizawl  23.7435236
Ajmer  26.4691
Akbarpur  26.4398744
Akola  20.7618624
Alandur  13.00282155
Alappuzha  9.5006651
Aligarh  27.87698975
Allahabad  25.4381302
Alwar  27.639077049999997
Ambala  30.3843674
Ambala Sadar  30.336571
Ambarnath  19.1436074
Ambattur  13.1128863
Ambikapur  23.1226343
Ambur  12.7929067
Amravati  21.15454115
Amreli  20.866667
Amritsar  31.6343083
Amroha  28.9233969
Anand  22.5584995
Anantapur  14.6546235
Anantnag  33.7368773
Arrah  25.62345725
Asansol  23.6871297
Ashoknagar Kalyangarh  22.8387354
Aurangabad  19.877263
Aurangabad  19.877263
Avadi  13.1254758
Azamgarh  26.02269675
Badlapur  25.895924450000003
Bagaha  27.05901115
Bagalkot  16.1853166
Bahadurgarh  28.660964800000002
Baharampur  24.1044927
Bahraich  27.7336958
Baidyabati  22.7949104
Baleshwar Town  21.49543405
Ballia  25.8779

Pithampur  22.6105576
Porbandar  21.6409
Port Blair  11.6645348
Proddatur  14.752265900000001
Puducherry  11.9340568
Pudukkottai  10.5
Pune  18.521428
Puri  19.8076083
Purnia  26.0
Puruliya  23.19989585
Rae Bareli  26.25
Raichur  16.083333
Raiganj  25.680653900000003
Raigarh  22.5
Raipur  21.2379469
Rajahmundry  17.0050454
Rajapalayam  9.403158399999999
Rajarhat Gopalpur  NaN
Rajkot  22.3051991
Rajnandgaon  20.9727404
Rajpur Sonarpur  22.4382026
Ramagundam  18.7615156
Rampur  28.79406825
Ranchi  23.3700354
Ranibennur  14.625888
Raniganj  25.799292299999998
Ratlam  23.4805919
Raurkela Industrial Township  NaN
Raurkela Town  NaN
Rewa  24.75926685
Rewari  28.1956468
Rishra  22.7261414
Robertson Pet  35.3587114
Rohtak  28.9010899
Roorkee  29.8693496
Rudrapur  28.974744
S.A.S. Nagar  30.676693999999998
Sagar  23.80961225
Saharanpur  29.9880774
Saharsa  25.83264165
Salem  44.9391565
Sambalpur  21.4
Sambhal  28.61875255
Sangli Miraj Kupwad  16.8502534
Santipur  23.259345949999997
Sasaram  24.

In [86]:
df['longitude'] = df['name_of_city'].apply(long)

In [87]:
df.head()

Unnamed: 0,name_of_city,location,latitude,longitude
0,Abohar,"30.1452928,74.1993043",30.145054,74.19566
1,Achalpur,"21.257584,77.5086754",21.241445,77.425757
2,Adilabad,"19.0809075,79.560344",19.5,78.5
3,Adityapur,"22.7834741,86.1576889",22.782355,86.159003
4,Adoni,"15.6322227,77.2728368",39.913451,9.185837
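
In the output above, the geocoded coordinates for Adoni (39.913451, 9.185837) are far from the dataset's own `location` value (15.6322227, 77.2728368), which suggests the bare city name matched a place outside India. A rough sanity check, sketched here under assumptions, is to compute the great-circle (haversine) distance between the two coordinate pairs and flag large gaps. The 50 km threshold and the names `haversine_km`, `rows`, and `suspect` are hypothetical choices for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# (city, dataset_lat, dataset_lon, geocoded_lat, geocoded_lon),
# copied from the five rows of df.head() shown above.
rows = [
    ("Abohar", 30.1452928, 74.1993043, 30.145054, 74.19566),
    ("Achalpur", 21.257584, 77.5086754, 21.241445, 77.425757),
    ("Adilabad", 19.0809075, 79.560344, 19.5, 78.5),
    ("Adityapur", 22.7834741, 86.1576889, 22.782355, 86.159003),
    ("Adoni", 15.6322227, 77.2728368, 39.913451, 9.185837),
]

MAX_DISTANCE_KM = 50.0  # hypothetical threshold; tune to taste

# Cities whose geocoded position is far from the dataset's position.
suspect = [
    city
    for city, la1, lo1, la2, lo2 in rows
    if haversine_km(la1, lo1, la2, lo2) > MAX_DISTANCE_KM
]
print(suspect)
```

Flagged rows could then be re-geocoded with a more specific query, or simply fall back to the coordinates in the `location` column.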


In [100]:
df.dropna(axis=0,inplace=True)

In [101]:
df.isnull().sum()

name_of_city    0
location        0
latitude        0
longitude       0
dtype: int64

<h1> Saving the dataset</h1>

In [102]:
df.to_csv('indian_cities.csv')