### Library Imports

In [1]:
import folium
import numpy as np
import pandas as pd
from datetime import datetime
import re

Initialize the map's center coordinates. Since we are producing a map of Uttar Pradesh, the coordinates point to Lucknow.

In [2]:
SF_COORDINATES = (26.8467, 80.9462)  # (latitude, longitude) of Lucknow

Set *from_year*, *to_year*, *from_month* and *to_month* (along with *from_day_of_month* and *to_day_of_month*). These variables are used to construct the date objects **from_date** and **to_date**, which are used to filter records from the dataset. Setting *to_day_of_month* to 0 selects the last day of *to_month*.

Only records within this time period will appear on the final map.

In [3]:
from calendar import monthrange

from_year = 2016
to_year = 2017
from_month = 10
to_month = 6
from_day_of_month = 1
to_day_of_month = 0  # 0 means "use the last day of to_month"

if to_day_of_month == 0:
    # monthrange returns (weekday of the 1st, number of days); it handles leap years
    to_day_of_month = monthrange(to_year, to_month)[1]

from_date = datetime(from_year, from_month, from_day_of_month)
to_date = datetime(to_year, to_month, to_day_of_month)
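Python's standard `calendar` module knows how many days each month has, including February in leap years (2016 was one). A minimal sketch of resolving the "day 0 means last day of the month" convention with it; the helper name `resolve_to_date` is hypothetical:

```python
from calendar import monthrange
from datetime import datetime


def resolve_to_date(year, month, day=0):
    """Hypothetical helper: build an end date, treating day=0 as 'last day of the month'."""
    if day == 0:
        # monthrange returns (weekday of the 1st, number of days in the month)
        day = monthrange(year, month)[1]
    return datetime(year, month, day)


print(resolve_to_date(2017, 6))  # 2017-06-30 00:00:00
print(resolve_to_date(2016, 2))  # 2016-02-29 00:00:00 (leap year)
```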

The sample data for this tutorial is stored in **sample_firdata.csv**.

We will use pandas to read the sample data.

In [4]:
crimedata = pd.read_csv('sample_firdata.csv')
crimedata.head()

| Unnamed: 0 | _id | FIR_NUM | FIR_REG_NUM | PS | RANGE_NAME | LONGITUDE | ZONE_NAME | DISTRICT | ACT_SECTION | LATITUDE | date | REG_MONTH | REG_YEAR | REG_D_W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ObjectId(5c7dd2219164b2149b56c6d8) | 755/2017 | 31630028170755 | HAFIZGANJ | BAREILLY | --- | BAREILLY | BAREILLY | [" IPC 1860-324"," IPC 1860-506"," IPC 1860-308"] | --- | 2017-08-20T23:00:00.000Z | 8 | 2017 | 1 |
| 1 | ObjectId(5c7dd22f9164b2149b599ffa) | 355/2017 | 31642012170355 | FATEHGARH KOTWALI | KANPUR | 27-35-56-0 | KANPUR | FATEHGARH | [" IPC 1860-279"," IPC 1860-304-A"] | 79-61-27-5 | 2017-06-16T09:30:00.000Z | 6 | 2017 | 6 |
| 2 | ObjectId(5c7dd20d9164b2149b52c615) | 18/2016 | 31909007160018 | SIDHPURA | ALIGARH | 78.9212 | AGRA | KASGANJ | [" IPC 1860-279"," IPC 1860-304-A"," IPC 1860-... | 27.5636 | 2016-01-26T06:30:00.000Z | 1 | 2016 | 3 |
| 3 | ObjectId(5c7dd2509164b2149b602096) | 402/2018 | 31658075180402 | TRANSPORT NAGAR | MEERUT | - | MEERUT | MEERUT | [" IPC 1860-352"," IPC 1860-323"," IPC 1860-325"] | - | 2018-06-26T07:10:00.000Z | 6 | 2018 | 3 |
| 4 | ObjectId(5c7dd2239164b2149b573974) | 782/2016 | 31631056160782 | DUBAULIA | BASTI | --- | GORAKHPUR | BASTI | [" IPC 1860-419"," IPC 1860-420"," IPC 1860-46... | --- | 2016-10-20T14:15:00.000Z | 10 | 2016 | 5 |


Next, we filter the DataFrame to keep only records whose **date** falls between **from_date** and **to_date**.

In [5]:
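# Parse ISO-8601 strings (UTC) into naive timestamps comparable with from_date/to_date
dates = pd.to_datetime(crimedata['date'], utc=True, errors='coerce').dt.tz_localize(None)
# Keep records from the start of from_date through the end of to_date
mask = (dates >= from_date) & (dates < to_date + pd.Timedelta(days=1))
crimedata = crimedata.loc[mask].copy()  # copy so later column assignments don't warn
crimedata.head()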

| Unnamed: 0 | _id | FIR_NUM | FIR_REG_NUM | PS | RANGE_NAME | LONGITUDE | ZONE_NAME | DISTRICT | ACT_SECTION | LATITUDE | date | REG_MONTH | REG_YEAR | REG_D_W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ObjectId(5c7dd22f9164b2149b599ffa) | 355/2017 | 31642012170355 | FATEHGARH KOTWALI | KANPUR | 27-35-56-0 | KANPUR | FATEHGARH | [" IPC 1860-279"," IPC 1860-304-A"] | 79-61-27-5 | 2017-06-16T09:30:00.000Z | 6 | 2017 | 6 |
| 4 | ObjectId(5c7dd2239164b2149b573974) | 782/2016 | 31631056160782 | DUBAULIA | BASTI | --- | GORAKHPUR | BASTI | [" IPC 1860-419"," IPC 1860-420"," IPC 1860-46... | --- | 2016-10-20T14:15:00.000Z | 10 | 2016 | 5 |
| 7 | ObjectId(5c7dd1f79164b2149b4e76b8) | 6/2017 | 31685006170006 | HATHINALA | MIRZAPUR | 83.1013 | VARANASI | SONBHADRA | [" IPC 1860-376"," SC AND THE ST (PREVENTION O... | 24.1622 | 2017-04-22T12:45:00.000Z | 4 | 2017 | 7 |
| 12 | ObjectId(5c7dd1f69164b2149b4e4f39) | 71/2017 | 31683055170071 | SARAIYLAKHANSI | AZAMGARH | --- | VARANASI | MAU | [" IPC 1860-323"," IPC 1860-506"," IPC 1860-325"] | --- | 2017-01-31T21:45:00.000Z | 1 | 2017 | 3 |
| 15 | ObjectId(5c7dd2569164b2149b614b5c) | 1624/2016 | 31661033161624 | NAI MANDI | SAHARANPUR | 77.7116 | MEERUT | MUZAFFAR NAGAR | [" IPC 1860-379"] | 29.4602 | 2016-11-13T20:30:00.000Z | 11 | 2016 | 1 |
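A note on comparing dates: the **date** column holds ISO-8601 strings such as `2017-06-16T09:30:00.000Z`, while `str(datetime(2017, 6, 30))` is `2017-06-30 00:00:00`. Compared as strings, the `T` separator sorts after the space, so a record from any time on the end date compares as greater than `str(to_date)` and is dropped by a `<=` test. A minimal standard-library sketch contrasting the string comparison with a comparison on parsed timestamps (the record timestamp is made up for illustration):

```python
from datetime import datetime, timedelta

# Format of the 'date' column, e.g. 2017-06-16T09:30:00.000Z (the Z is matched literally,
# so the parsed value is a naive datetime holding the UTC wall-clock time)
FMT = '%Y-%m-%dT%H:%M:%S.%fZ'

to_date = datetime(2017, 6, 30)
record = '2017-06-30T14:15:00.000Z'  # hypothetical record on the last day of the range

# Lexicographic comparison: 'T' (0x54) sorts after ' ' (0x20), so this prints False
print(record <= str(to_date))

# Parsed comparison: keep everything before midnight following to_date (prints True)
parsed = datetime.strptime(record, FMT)
print(parsed < to_date + timedelta(days=1))
```

Parsing the whole column once (for example with `pd.to_datetime`) and comparing timestamps avoids this edge case.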


In [6]:
crimedata.describe()

| | FIR_REG_NUM | REG_MONTH | REG_YEAR | REG_D_W |
|---|---|---|---|---|
| count | 2176.0 | 2176.0 | 2176.0 | 2176.0 |
| mean | 31686140000000.0 | 5.778033 | 2016.709099 | 4.028952 |
| std | 83151870000.0 | 3.685685 | 0.454282 | 1.968858 |
| min | 31621000000000.0 | 1.0 | 2016.0 | 1.0 |
| 25% | 31642030000000.0 | 3.0 | 2016.0 | 2.0 |
| 50% | 31657060000000.0 | 5.0 | 2017.0 | 4.0 |
| 75% | 31680040000000.0 | 10.0 | 2017.0 | 6.0 |
| max | 31949020000000.0 | 12.0 | 2017.0 | 7.0 |


The following function cleans the raw coordinate values and converts their varying formats into decimal degrees, for example:

- `78.2510-द` -> 78.251 (cleaning)
- `---` -> nan (handling missing values)
- `28-37-88` -> 28.6411 (DMS to decimal format)

The DMS-to-decimal conversion uses:

$$\text{decimal coordinate} = \text{degrees} + \frac{\text{minutes}}{60} + \frac{\text{seconds}}{3600}$$
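As a worked check of the formula, the sketch below converts the DMS example listed above, plus the four-part value `27-35-56-0` seen in the sample data, where only the first three parts are used (as `convert_coordinate` does). The helper name `dms_to_decimal` is hypothetical:

```python
def dms_to_decimal(degrees, minutes=0, seconds=0):
    """Hypothetical helper: decimal = degrees + minutes/60 + seconds/3600."""
    return degrees + minutes / 60.0 + seconds / 3600.0


# '28-37-88' -> 28 + 37/60 + 88/3600
print(round(dms_to_decimal(28, 37, 88), 4))  # 28.6411

# '27-35-56-0' -> only degrees, minutes and seconds are used
parts = [int(p) for p in '27-35-56-0'.split('-')][:3]
print(round(dms_to_decimal(*parts), 4))  # 27.5989
```

Note that the formula does not validate ranges: 88 seconds is accepted and simply added, which is also how the conversion below treats it.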

In [7]:
def convert_coordinate(coord):
    """Convert a raw coordinate value to decimal degrees, or NaN if it is unusable."""
    inp = str(coord)
    if '.' in inp:
        # Decimal format: keep ASCII digits and the first '.', drop stray characters
        num = ''
        for ch in inp:
            if ch in '0123456789' or (ch == '.' and '.' not in num):
                num += ch
        return np.nan if num == '.' else float(num)
    # DMS format such as '28-37-88': keep the purely numeric '-'-separated parts
    dms = [part for part in inp.split('-') if re.fullmatch(r'[0-9]+', part)]
    if not dms:
        # Covers 'nan', '---', '-' and other placeholders for missing values
        return np.nan
    # Pad missing minutes/seconds with 0, then apply D + M/60 + S/3600
    dms += [0] * (3 - len(dms))
    degrees, minutes, seconds = (int(x) for x in dms[:3])
    return degrees + minutes / 60.0 + seconds / 3600.0


Apply the function to clean and standardize both coordinate columns before plotting.

In [8]:
crimedata['LATITUDE'] = crimedata['LATITUDE'].apply(convert_coordinate)
crimedata['LONGITUDE'] = crimedata['LONGITUDE'].apply(convert_coordinate)
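In the sample rows shown earlier, the FATEHGARH KOTWALI record (FIR 355/2017) has `27-35-56-0` under LONGITUDE and `79-61-27-5` under LATITUDE, which looks like the two columns were swapped. Uttar Pradesh lies roughly between 23°N and 31°N and between 77°E and 85°E, so for any point in the state the latitude should be smaller than the longitude. A minimal heuristic sketch under that bounding-box assumption; the helper `fix_swapped` is hypothetical:

```python
import math


def fix_swapped(lat, lon):
    """Hypothetical helper: swap lat/lon when they look reversed for Uttar Pradesh.

    Assumes points lie roughly within 23-31 N and 77-85 E, where latitude is
    always smaller than longitude.
    """
    if math.isnan(lat) or math.isnan(lon):
        return lat, lon  # leave missing values for dropna to handle
    if lat > lon:
        return lon, lat
    return lat, lon


print(fix_swapped(80.0242, 27.5989))  # (27.5989, 80.0242)
print(fix_swapped(27.5636, 78.9212))  # (27.5636, 78.9212), unchanged
```

Applied row by row to the converted LATITUDE/LONGITUDE columns, this would move such records to their intended position; points that fall outside the state for other reasons would still need manual review.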

We will drop records with missing values for either Latitude or Longitude.

In [9]:
crimedata.dropna(subset=['LATITUDE','LONGITUDE'],inplace=True)

To create and visualize the map we use the Python library **folium**. Folium makes it easy to display data manipulated in Python on an interactive Leaflet map. It supports both binding data to a map for choropleth visualizations and passing rich vector, raster, or HTML visualizations as markers on the map.

For more information on folium, see https://python-visualization.github.io/folium/

In [10]:
m = folium.Map(location=SF_COORDINATES, zoom_start=7)

In [11]:
from folium.plugins import MarkerCluster
# Group nearby markers into clusters that split apart as you zoom in
mc = MarkerCluster().add_to(m)

We add a marker to the map for each record. The **popup** option sets the text displayed when a marker is clicked; here, each marker shows the name of the police station where the case was registered.

In [12]:
for _, row in crimedata.iterrows():
    # Popup shows the police station; append e.g. ": " + str(row['FIR_REG_NUM']) to add the FIR number
    folium.Marker([row['LATITUDE'], row['LONGITUDE']], popup=row['PS']).add_to(mc)
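To show more than the police station name in each popup, for example the FIR registration number as well, the label can be built from several columns. A minimal sketch; `popup_label` is a hypothetical helper, and the sample values come from the first row of the sample data:

```python
import math


def popup_label(ps, fir_reg_num=None):
    """Hypothetical helper: 'POLICE STATION: FIR_REG_NUM', or just the station name."""
    if fir_reg_num is None or (isinstance(fir_reg_num, float) and math.isnan(fir_reg_num)):
        return ps
    # Cast to int first so a float such as 31630028170755.0 prints as plain digits
    return f"{ps}: {int(fir_reg_num)}"


print(popup_label('HAFIZGANJ', 31630028170755))  # HAFIZGANJ: 31630028170755
print(popup_label('HAFIZGANJ'))                  # HAFIZGANJ
```

Inside the marker loop, the result would be passed as the `popup` argument in place of the station name alone.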

Save the map as an HTML file.

In [13]:
m.save('map_clustered.html')

Render the map inside the notebook.

In [14]:
m