# Data Section

## The Location 
---------------
Kolkata is a popular destination for new immigrants to West Bengal. It is home to varied religious groups and places of worship. Although immigration has become a hot topic over the past few years, with more governments seeking tighter restrictions on immigrants and refugees, the overall trend of immigration into Kolkata has been upward.

## Location Data
---------------
Ward-wise location data for Kolkata, including postal codes, is not readily available in one place, so we use web scraping to get the data into the required format by iterating over 144 webpages, one per ward.

* Data Example:
[Kolkata wards](https://en.wikipedia.org/wiki/Wards_of_Kolkata_Municipal_Corporation)

## Integrating the Foursquare API
---------------
This project uses the Foursquare API as its primary data source because it features a database of many places. In particular, its Places API supports location search, location sharing, and details about businesses.

## Methodology
---------------
### 1. Workflow
Using Foursquare API credentials, we retrieve the features of places near each neighborhood. Because of HTTP request limits, the number of places returned per neighborhood is set to 100 and the search radius is set to 2000 meters.
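
The request described above can be sketched as a small URL builder. This is a minimal sketch, not the project's actual code: it assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters. The function name `build_explore_url`, the version date and the credential placeholders are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "venues/explore" endpoint (not confirmed by this project).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=2000, limit=100):
    """Build the query URL for places near one neighborhood centre."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # hypothetical API version date
        "ll": f"{lat},{lng}",            # latitude,longitude of the ward
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of places returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Coordinates of Ward 1, with placeholder credentials.
url = build_explore_url(22.617889, 88.370556, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The URL can then be passed to `requests.get(url).json()`; keeping the builder separate makes it easy to inspect the parameters before spending any of the request quota.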

### 2. Clustering Approach
To compare the similarities and dissimilarities between postal code neighborhoods, we segment them and group them into clusters, which lets us find similar neighborhoods in a big city like Kolkata. To do that, we use the k-means clustering algorithm, a form of unsupervised machine learning.
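
As an illustration of the clustering step, the sketch below runs scikit-learn's `KMeans` on a tiny, made-up feature matrix. The venue categories, the frequency values and the choice of two clusters are assumptions for demonstration only and are not results from this project.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical venue-category frequencies per neighborhood.
# Columns: cafe, temple, market (made-up values, each row sums to 1).
features = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
])

# Group the four neighborhoods into two clusters.
# random_state fixes the initialisation so reruns give the same grouping.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_

# Neighborhoods with similar venue profiles receive the same label.
print(labels)
```

The label numbers themselves are arbitrary. What matters is which neighborhoods share a label: here the two cafe-heavy rows and the two market-heavy rows.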

### 3. Libraries used
Pandas: For creating and manipulating data frames.
<br>Folium: Python visualization library used to show the distribution of neighborhood clusters on an interactive Leaflet map.
<br>Scikit-learn: For k-means clustering.
<br>JSON: Library to handle JSON files.
<br>XML: To separate data from presentation; XML stores data in plain text format.
<br>Geocoder: To retrieve location data.
<br>Beautiful Soup and Requests: To scrape web pages and to handle HTTP requests.
<br>Matplotlib: Python plotting module.


## Location data example

In [1]:
import numpy as np
import pandas as pd

import json
from bs4 import BeautifulSoup as bs
from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
import geocoder

print('Libraries imported.')

Libraries imported.


In [2]:
lat, lng, nei, bur, pin = [], [], [], [], []

for i in range(1, 145):
  # Each ward has its own Wikipedia page; the first table on the page is the infobox.
  url = 'https://en.wikipedia.org/wiki/Ward_No._' + str(i) + ',_Kolkata_Municipal_Corporation'
  info = pd.read_html(url)[0]

  # Row positions of the borough (j), PIN code (k), coordinates (l) and
  # neighbourhood (n) differ on a few pages, so handle those edge cases.
  ward = i  # ward number used in the infobox column header
  if i == 1:
    j, k, l, n = 11, 16, 3, 7
  elif i == 2:
    j, k, l, n = 11, 15, 3, 7
  elif i == 101:
    # This page's infobox is read under the header 'Ward No. 102' rather than 101.
    ward = i + 1
    j, k, l, n = 11, 17, 4, 8
  elif i == 114:
    j, k, l, n = 13, 17, 4, 9
  else:
    j, k, l, n = 10, 14, 3, 7

  key = 'Ward No. ' + str(ward)  # column holding the field names
  val = key + '.1'               # column holding the field values

  # The last two whitespace-separated tokens of the coordinate cell hold the
  # latitude and longitude; keep the part before the degree sign.
  s = info.loc[l, key].split()
  latnew = s[-2].replace('\ufeff', '').split('°')[0]
  lngnew = s[-1].split('°')[0]

  neinew = info.loc[n, val]
  burnew = info.loc[j, val]
  pinnew = 'NA' if 142 <= i <= 144 else info.loc[k, val]  # Handling edge cases for blank PIN codes

  lat.append(float(latnew))
  lng.append(float(lngnew))
  nei.append(str(neinew))
  bur.append(str(burnew))
  pin.append(str(pinnew))

df = pd.DataFrame({'Ward no': list(range(1, 145)), 'Borough': bur, 'PIN Code': pin, 'Neighbourhood': nei, 'Latitude': lat, 'Longitude': lng})
df.set_index('Ward no', inplace=True)
df.head(10)

| Ward no | Borough | PIN Code | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 1 | 1 | 700 003 | Cossipore | 22.617889 | 88.370556 |
| 2 | 1 | 700030, 700050 | Sinthee (Ramlila Bagan-Biswanath Colony-Roypar... | 22.628056 | 88.384444 |
| 3 | 1 | 700 037 | Belgachia, Duttabagan | 22.604444 | 88.383333 |
| 4 | 1 | 700 002/ 700 037 | Paikpara | 22.613056 | 88.379444 |
| 5 | 1 | 700 002 | Tala, Belgachia | 22.608889 | 88.379694 |
| 6 | 1 | 700 002 | Chitpur, Cossipore | 22.610863 | 88.371213 |
| 7 | 1 | 700 003 | Bagbazar | 22.603567 | 88.365806 |
| 8 | 1 | 700 003 | Bagbazar, Shobhabazar | 22.601806 | 88.3665 |
| 9 | 1 | 700 004 | Shobhabazar, Kumortuli | 22.595889 | 88.365306 |
| 10 | 2 | 700 004 | Shyambazar, Shobhabazar, Shyampukur, Hatibagan | 22.597889 | 88.364326 |
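
The coordinate parsing inside the scraping loop can be checked offline, without fetching any page. The sketch below isolates that step in a helper. The name `parse_coordinates` and the sample cell text are hypothetical; the sample is only shaped like the text the loop appears to expect (decimal degrees as the last two tokens, with a possible byte-order mark).

```python
def parse_coordinates(cell):
    """Return (latitude, longitude) as floats from an infobox coordinate cell."""
    tokens = cell.split()
    # Strip a possible byte-order mark, then keep the number before the degree sign.
    lat = tokens[-2].replace('\ufeff', '').split('°')[0]
    lng = tokens[-1].replace('\ufeff', '').split('°')[0]
    return float(lat), float(lng)


# Hypothetical cell text, not copied from a real page.
sample = '22°37′04″N 88°22′14″E\ufeff / \ufeff22.617889°N 88.370556°E'
print(parse_coordinates(sample))
```

Isolating the parsing this way makes it easier to spot pages whose infobox layout differs before running all 144 requests.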
