# San Francisco Crime Dataset Use Case. Advanced Task
### Master Advanced Analytics on Big Data. Module 5. Use case I
#### Author: Alonso Andrade Blázquez (aandradeblazquez@gmail.com)

Created on:

In [1]:
from datetime import datetime

str(datetime.now())

'2018-04-14 13:31:33.871655'

## 0. Introduction

### Task statement

Taking advantage of **pymongo API**, perform an analysis of possible **incidents detected for a district** (e.g. SOUTHERN, MISSION). The following specifications should be considered:
1. Incidents detected in neighbourhoods should be selected by means of spatial queries, where the incident coordinates should intersect or be within the selected district
2. Resulting points and polygons must be properly mapped in San Francisco digital area. To this end, you should use the "folium" API.

For this task, you should prepare a Jupyter Notebook file with extension ".ipynb" containing each step of analysis, showing how you have performed the exercise, the outputs, plots, and comments (discussions/decisions adopted).

### Importing required packages

In [2]:
import json
import os
import re
import sys

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
# import seaborn as sns
from IPython.display import display, HTML
from matplotlib.colors import Normalize, rgb2hex

sys.path.insert(0, '..')
import folium
print(folium.__file__)
print(folium.__version__)
from folium import plugins
from folium.plugins import HeatMap, HeatMapWithTime, MarkerCluster

import pymongo
from pymongo import MongoClient, GEO2D

from datetime import datetime, date, time

C:\ProgramData\Anaconda3\lib\site-packages\folium\__init__.py
0.5.0


## 1. Cleaning the data to make it adequate for spatial queries

### Access MongoDB

Let's use the database previously obtained in the Use Case I Mandatory Task, ``UseCaseI``:

In [3]:
# Set MongoDB client
client = MongoClient('localhost', 27017)

# Set MongoDB database
db = client['UseCaseI']

### Prepare incidents collection for spatial queries

First, we can retrieve the documents in the ``SanFranciscoCrimes`` collection previously obtained in the Use Case I Mandatory Task. Let's see the structure of the documents in that collection:

In [4]:
# Set MongoDB collection
collection_incidents = db['SanFranciscoCrimes']

# Check if we can access the data from MongoDB
cursor_incidents = collection_incidents.find()
cursor_incidents[0]

{'Address': '18TH ST / VALENCIA ST',
 'Category': 'NON-CRIMINAL',
 'Date': '01/19/2015 12:00:00 AM',
 'DayOfWeek': 'Monday',
 'Days': 4401.0,
 'Descript': 'LOST PROPERTY',
 'IncidntNum': 150060275,
 'Location': '(37.7617007179518, -122.42158168137)',
 'PdDistrict': 'MISSION',
 'Resolution': 'NONE',
 'Time': '14:00',
 'X': -122.4215816814,
 'Y': 37.761700718,
 '_id': ObjectId('5ac9c30040bdcc3844650a54')}

As we can see in the document shown above, the spatial fields in the collection are not structured in a way that supports spatial queries in MongoDB, so let's modify the documents to follow the **GeoJSON structure**.

For that purpose, we will first build an array of documents holding the same information as the collection, with the following modifications:
- Remove the keys ``X``, ``Y`` and ``Location``
- Replace them with the following field in GeoJSON structure:
```json
        "loc": {
            "type": "Point",
            "coordinates": [doc['X'], doc['Y']]
        }
```

In [5]:
incidents_geojson = []

for doc in cursor_incidents:
    incident_geojson = {}
    
    incident_geojson = {
        'Address': doc['Address'],
        'Category': doc['Category'],
        'Date': doc['Date'],
        'DayOfWeek': doc['DayOfWeek'],
        'Days': doc['Days'],
        'Descript': doc['Descript'],
        'IncidntNum': doc['IncidntNum'],
        'PdDistrict': doc['PdDistrict'],
        'Resolution': doc['Resolution'],
        'Time': doc['Time'],
        "loc": {
            "type": "Point",
            "coordinates": [doc['X'], doc['Y']]
        }
    }
   
    incidents_geojson.append(incident_geojson)
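
The transformation in the loop above can also be written as a small reusable function. The following is a minimal sketch rather than part of the original notebook: the name `to_geojson_doc` is hypothetical, and it assumes every document carries numeric `X` (longitude) and `Y` (latitude) fields, as in the sample document shown earlier. GeoJSON stores coordinates in longitude, latitude order.

```python
def to_geojson_doc(doc):
    """Return a copy of doc with X, Y and Location replaced by a GeoJSON Point.

    Hypothetical helper for illustration. The original _id is also dropped,
    so that MongoDB assigns a fresh one when the copy is inserted.
    """
    dropped = {'X', 'Y', 'Location', '_id'}
    converted = {key: value for key, value in doc.items() if key not in dropped}
    converted['loc'] = {
        'type': 'Point',
        # GeoJSON order is [longitude, latitude]
        'coordinates': [doc['X'], doc['Y']],
    }
    return converted


# Subset of the sample document shown earlier in this notebook
sample = {
    'Address': '18TH ST / VALENCIA ST',
    'PdDistrict': 'MISSION',
    'Location': '(37.7617007179518, -122.42158168137)',
    'X': -122.4215816814,
    'Y': 37.761700718,
}
print(to_geojson_doc(sample))
```

With the real cursor, the whole array could then be built as `[to_geojson_doc(doc) for doc in cursor_incidents]`.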

Now that we have the array with GeoJSON structure, we can create a new collection that is valid for geospatial queries:

In [6]:
collection_incidents_geojson = db['SanFranciscoCrimes_geojson']

collection_incidents_geojson.drop()

for doc in incidents_geojson:
    collection_incidents_geojson.insert_one(doc)

In [7]:
cursor_incidents_geojson = collection_incidents_geojson.find()

# Check if we can access the data from MongoDB
cursor_incidents_geojson[0]

{'Address': '18TH ST / VALENCIA ST',
 'Category': 'NON-CRIMINAL',
 'Date': '01/19/2015 12:00:00 AM',
 'DayOfWeek': 'Monday',
 'Days': 4401.0,
 'Descript': 'LOST PROPERTY',
 'IncidntNum': 150060275,
 'PdDistrict': 'MISSION',
 'Resolution': 'NONE',
 'Time': '14:00',
 '_id': ObjectId('5ad1e6c940bdcc1b302ac3d6'),
 'loc': {'coordinates': [-122.4215816814, 37.761700718], 'type': 'Point'}}

### Prepare Districts collection for spatial queries

Firstly, we can retrieve the documents included in the collection previously obtained in Use Case I Mandatory Task, ``districts``.

In [8]:
collection_districts_in_one_document = db['districts']

However, the problem with this collection is that it stores a ``FeatureCollection``, that is, a single document containing all the districts. This makes it difficult to retrieve one district for our purposes. Let's split this feature collection into several documents and store them in a new collection, ``districts_multi_docs``:
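
To make the shape of the data concrete, here is a minimal sketch of the split on a toy feature collection. The function name and the toy coordinates are made up for illustration, and the sketch assumes the stored document follows the standard GeoJSON layout with a top-level `type` of `FeatureCollection` and a `features` list.

```python
def split_feature_collection(feature_collection):
    """Return the Feature documents contained in a GeoJSON FeatureCollection."""
    if feature_collection.get('type') != 'FeatureCollection':
        raise ValueError('expected a GeoJSON FeatureCollection')
    return list(feature_collection['features'])


# Toy data: names and coordinates are illustrative, not taken from the dataset
toy_collection = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature',
         'properties': {'nhood': 'Mission'},
         'geometry': {'type': 'Point', 'coordinates': [-122.42, 37.76]}},
        {'type': 'Feature',
         'properties': {'nhood': 'Example'},
         'geometry': {'type': 'Point', 'coordinates': [-122.40, 37.78]}},
    ],
}

for feature in split_feature_collection(toy_collection):
    print(feature['properties']['nhood'])
```

Each element of the returned list is an independent document, which is what makes a per-district query possible once the list is inserted into its own collection.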

In [9]:
cursor_districts_features = collection_districts_in_one_document.find_one()['features']

collection_districts_multi_docs = db['districts_multi_docs']
collection_districts_multi_docs.drop()
collection_districts_multi_docs.insert_many(cursor_districts_features)

cursor_districts_multi_docs = collection_districts_multi_docs.find()
cursor_districts_multi_docs[0]

{'_id': ObjectId('5ad1e89340bdcc1b3033eb96'),
 'geometry': {'coordinates': [[[[-122.38157774241415, 37.75307043091241],
     [-122.38156949251606, 37.753060959298274],
     [-122.38159239626694, 37.75309424492931],
     [-122.38155614326205, 37.753045901366754],
     [-122.38155137472305, 37.75304127677009],
     [-122.38154650687193, 37.753036719547374],
     [-122.3815415385967, 37.75303223061754],
     [-122.38153647334677, 37.75302781172802],
     [-122.38153131112232, 37.75302346287867],
     [-122.38152605423795, 37.753019185835115],
     [-122.38152070389688, 37.75301498328154],
     [-122.38151526236827, 37.75301085518192],
     [-122.38150973196667, 37.75300680330162],
     [-122.38150411158058, 37.75300282855954],
     [-122.38149840577098, 37.7529989317842],
     [-122.38149261460651, 37.75299511567804],
     [-122.38148674033339, 37.75299137930393],
     [-122.38148078528907, 37.75298772532827],
     [-122.38147475063121, 37.75298415463372],
     [-122.38146863865168, 37.75

## 2. Query one neighborhood: MISSION

### Create geoindex to perform spatial queries with districts collection

In [10]:
# Check indexes information before spatial indexes addition
collection_districts_multi_docs.index_information()

{'_id_': {'key': [('_id', 1)], 'ns': 'UseCaseI.districts_multi_docs', 'v': 2}}

In [11]:
# Creating spatial index in districts collection
collection_districts_multi_docs.create_index([("loc ", GEO2D )])

'loc _2d'

In [12]:
# Check indexes after adding geospatial index
collection_districts_multi_docs.index_information()

{'_id_': {'key': [('_id', 1)], 'ns': 'UseCaseI.districts_multi_docs', 'v': 2},
 'loc _2d': {'key': [('loc ', '2d')],
  'ns': 'UseCaseI.districts_multi_docs',
  'v': 2}}
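
Note that the index created above is a legacy `2d` index on a field named `loc ` (with a trailing space), while the GeoJSON data of the district documents lives in `geometry`. The `$geoIntersects` query used later in this section still works because that operator does not require a geospatial index; a `2dsphere` index on the GeoJSON field is the kind that can speed it up. The following is a minimal sketch of creating such an index. The helper name is hypothetical, and the stub class only stands in for a PyMongo collection so that the snippet runs without a database.

```python
def ensure_geojson_index(collection, field):
    """Create a 2dsphere index on a GeoJSON field of a PyMongo collection.

    "2dsphere" is the index type string that pymongo.GEOSPHERE stands for.
    """
    return collection.create_index([(field, "2dsphere")])


class FakeCollection:
    """Stand-in for a PyMongo collection (illustration only)."""

    def __init__(self):
        self.indexes = []

    def create_index(self, keys):
        self.indexes.append(keys)
        # Mimic PyMongo's default index name: field and type joined by "_"
        return "_".join(str(part) for key in keys for part in key)


fake = FakeCollection()
print(ensure_geojson_index(fake, "geometry"))
```

With the notebook's real collections this would be called as `ensure_geojson_index(collection_districts_multi_docs, "geometry")` and `ensure_geojson_index(collection_incidents_geojson, "loc")`.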

### Retrieve MISSION district

Now we can retrieve the MISSION district document in order to analyze the incidents within it later.

Let's retrieve the MISSION data using a geospatial query, just to practice with geospatial queries. For that purpose, we can first retrieve an incident that happened in the Mission district ...

In [13]:
query_incident_in_mission = {
    'PdDistrict': 'MISSION'
}

In [14]:
cursor_incident_in_mission = collection_incidents_geojson.find_one(query_incident_in_mission)

cursor_incident_in_mission

{'Address': '18TH ST / VALENCIA ST',
 'Category': 'NON-CRIMINAL',
 'Date': '01/19/2015 12:00:00 AM',
 'DayOfWeek': 'Monday',
 'Days': 4401.0,
 'Descript': 'LOST PROPERTY',
 'IncidntNum': 150060275,
 'PdDistrict': 'MISSION',
 'Resolution': 'NONE',
 'Time': '14:00',
 '_id': ObjectId('5ad1e6c940bdcc1b302ac3d6'),
 'loc': {'coordinates': [-122.4215816814, 37.761700718], 'type': 'Point'}}

... and use the location of that incident to query the Mission district data with **``$geoIntersects``**. The spatial field in ``collection_districts_multi_docs`` is ``geometry``:

In [15]:
query_mission_district_by_incident_location = {
    "geometry":{
        "$geoIntersects":{
            "$geometry": cursor_incident_in_mission["loc"]
        }
    }
}

In [16]:
cursor_mission_district_by_incident_location = collection_districts_multi_docs.find(query_mission_district_by_incident_location)

cursor_mission_district_by_incident_location[0]

{'_id': ObjectId('5ad1e89340bdcc1b3033eba8'),
 'geometry': {'coordinates': [[[[-122.41095899969652, 37.76943300044438],
     [-122.41093100020761, 37.769410999642986],
     [-122.41090799997778, 37.76942900029767],
     [-122.41088599991679, 37.76944700005473],
     [-122.40973600056184, 37.770349000005766],
     [-122.40903500016421, 37.769792999593676],
     [-122.40843600020258, 37.769317999707376],
     [-122.40800700007466, 37.76924399995716],
     [-122.40789500023209, 37.76934299991101],
     [-122.40773900022478, 37.76946600018427],
     [-122.40763999957679, 37.769544999700884],
     [-122.40762299961686, 37.76955799998237],
     [-122.40758699953828, 37.769586000260034],
     [-122.40700100015842, 37.77005000013585],
     [-122.40621300047536, 37.76946300013261],
     [-122.4059390005741, 37.769659000295114],
     [-122.40584399996843, 37.76966500002409],
     [-122.40558899994181, 37.76968000026003],
     [-122.40540800025224, 37.769689999772275],
     [-122.40520300037058, 

The output above shows that we retrieved the Mission district document.
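
For intuition about what a point-in-polygon test does, here is a planar ray-casting sketch. It is not how MongoDB evaluates `$geoIntersects` (MongoDB uses spherical geometry); it is a simplified stand-in that treats longitude and latitude as flat x and y. The function names and the toy square are made up for illustration, and the input is assumed to use GeoJSON MultiPolygon nesting, as the four-level nesting in the district output suggests.

```python
def point_in_ring(point, ring):
    """Ray casting: toggle 'inside' at each edge crossed by a ray going right."""
    x, y = point
    inside = False
    # GeoJSON rings are closed (first vertex repeated last), so pairs cover all edges
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does the edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_multipolygon(point, coordinates):
    """coordinates: list of polygons; each polygon is [exterior, hole, ...]."""
    for polygon in coordinates:
        exterior, holes = polygon[0], polygon[1:]
        in_hole = any(point_in_ring(point, hole) for hole in holes)
        if point_in_ring(point, exterior) and not in_hole:
            return True
    return False


# Toy MultiPolygon with a single 4 x 4 square
square = [[[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]]]
print(point_in_multipolygon((2, 2), square))  # True
print(point_in_multipolygon((5, 2), square))  # False
```

For an area as small as a city district the planar approximation is usually close to the spherical result, but the database query remains the reference.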

An easier way to do this, without a geospatial query, would have been to query by the ``properties.nhood`` field:

In [17]:
query_mission_district = {
    'properties.nhood': 'Mission'
}

In [18]:
cursor_mission_district = collection_districts_multi_docs.find_one(query_mission_district)

cursor_mission_district

{'_id': ObjectId('5ad1e89340bdcc1b3033eba8'),
 'geometry': {'coordinates': [[[[-122.41095899969652, 37.76943300044438],
     [-122.41093100020761, 37.769410999642986],
     [-122.41090799997778, 37.76942900029767],
     [-122.41088599991679, 37.76944700005473],
     [-122.40973600056184, 37.770349000005766],
     [-122.40903500016421, 37.769792999593676],
     [-122.40843600020258, 37.769317999707376],
     [-122.40800700007466, 37.76924399995716],
     [-122.40789500023209, 37.76934299991101],
     [-122.40773900022478, 37.76946600018427],
     [-122.40763999957679, 37.769544999700884],
     [-122.40762299961686, 37.76955799998237],
     [-122.40758699953828, 37.769586000260034],
     [-122.40700100015842, 37.77005000013585],
     [-122.40621300047536, 37.76946300013261],
     [-122.4059390005741, 37.769659000295114],
     [-122.40584399996843, 37.76966500002409],
     [-122.40558899994181, 37.76968000026003],
     [-122.40540800025224, 37.769689999772275],
     [-122.40520300037058, 

Let's store the Mission district geometry in a variable for later use in the spatial queries on the incidents:

In [19]:
mission_district_geometry = cursor_mission_district['geometry']

## 3. Query different incidents categories within MISSION district

### Create geoindex to perform spatial queries with incidents collection

In [20]:
# Check indexes information before spatial indexes addition
collection_incidents_geojson.index_information()

{'_id_': {'key': [('_id', 1)],
  'ns': 'UseCaseI.SanFranciscoCrimes_geojson',
  'v': 2}}

In [21]:
# Creating spatial index in points collection
collection_incidents_geojson.create_index([("loc ", GEO2D )])

'loc _2d'

In [22]:
# Check indexes after adding geospatial index
collection_incidents_geojson.index_information()

{'_id_': {'key': [('_id', 1)],
  'ns': 'UseCaseI.SanFranciscoCrimes_geojson',
  'v': 2},
 'loc _2d': {'key': [('loc ', '2d')],
  'ns': 'UseCaseI.SanFranciscoCrimes_geojson',
  'v': 2}}

- ### All incident categories within MISSION

Using **``geoWithin``** we can retrieve all the incidents within the previously obtained Mission district:

In [23]:
query_mission_incidents = {
    "loc":{
        "$geoWithin":{
            "$geometry": mission_district_geometry
        }
    }
}

In [24]:
cursor_mission_incidents = collection_incidents_geojson.find(query_mission_incidents)

# Check one incident to validate the query
cursor_mission_incidents[2]

{'Address': '1700 Block of HARRISON ST',
 'Category': 'DRUG/NARCOTIC',
 'Date': '02/01/2015 12:00:00 AM',
 'DayOfWeek': 'Sunday',
 'Days': 4414.0,
 'Descript': 'POSSESSION OF METH-AMPHETAMINE',
 'IncidntNum': 150098345,
 'PdDistrict': 'MISSION',
 'Resolution': 'ARREST, BOOKED',
 'Time': '14:00',
 '_id': ObjectId('5ad1e6c940bdcc1b302ac3e1'),
 'loc': {'coordinates': [-122.413354187, 37.7690748004], 'type': 'Point'}}

Let's store all mission incidents in a new collection, ``mission_incidents``:

In [25]:
collection_mission_incidents = db['mission_incidents']

collection_mission_incidents.drop()

collection_mission_incidents.insert_many(cursor_mission_incidents)

<pymongo.results.InsertManyResult at 0x19602756808>

In [26]:
cursor_mission_incidents = collection_mission_incidents.find()

In [27]:
mission_incidents_df = pd.DataFrame(list(cursor_mission_incidents))

In [28]:
display(mission_incidents_df.head(5))

| | Address | Category | Date | DayOfWeek | Days | Descript | IncidntNum | PdDistrict | Resolution | Time | _id | loc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18TH ST / VALENCIA ST | NON-CRIMINAL | 01/19/2015 12:00:00 AM | Monday | 4401.0 | LOST PROPERTY | 150060275 | MISSION | NONE | 14:00 | 5ad1e6c940bdcc1b302ac3d6 | {'type': 'Point', 'coordinates': [-122.4215816... |
| 1 | 1700 Block of HARRISON ST | LARCENY/THEFT | 02/01/2015 12:00:00 AM | Sunday | 4414.0 | PETTY THEFT SHOPLIFTING | 150098345 | MISSION | ARREST, BOOKED | 14:00 | 5ad1e6c940bdcc1b302ac3e0 | {'type': 'Point', 'coordinates': [-122.4133541... |
| 2 | 1700 Block of HARRISON ST | DRUG/NARCOTIC | 02/01/2015 12:00:00 AM | Sunday | 4414.0 | POSSESSION OF METH-AMPHETAMINE | 150098345 | MISSION | ARREST, BOOKED | 14:00 | 5ad1e6c940bdcc1b302ac3e1 | {'type': 'Point', 'coordinates': [-122.4133541... |
| 3 | 1700 Block of HARRISON ST | DRUG/NARCOTIC | 02/01/2015 12:00:00 AM | Sunday | 4414.0 | POSSESSION OF NARCOTICS PARAPHERNALIA | 150098345 | MISSION | ARREST, BOOKED | 14:00 | 5ad1e6c940bdcc1b302ac3e2 | {'type': 'Point', 'coordinates': [-122.4133541... |
| 4 | 1700 Block of HARRISON ST | WARRANTS | 02/01/2015 12:00:00 AM | Sunday | 4414.0 | WARRANT ARREST | 150098345 | MISSION | ARREST, BOOKED | 14:00 | 5ad1e6c940bdcc1b302ac3e3 | {'type': 'Point', 'coordinates': [-122.4133541... |


Some of the retrieved incidents have a district other than Mission in the ``PdDistrict`` field. This does not mean that we made a mistake in retrieving only Mission incidents; more probably, the limits used to assign a neighborhood in the incidents dataset were not exactly the same as those in the districts dataset. We will check later on the map that none of the retrieved points falls outside the Mission limits.
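
One way to quantify this mismatch is to count how often each ``PdDistrict`` value appears among the retrieved documents. The snippet below is a minimal sketch on toy documents; the function name is hypothetical and the toy values are illustrative, not results from the dataset.

```python
from collections import Counter


def district_counts(docs):
    """Count documents per PdDistrict value (None when the field is missing)."""
    return Counter(doc.get('PdDistrict') for doc in docs)


# Toy documents standing in for the result of the spatial query
toy_docs = [
    {'PdDistrict': 'MISSION'},
    {'PdDistrict': 'MISSION'},
    {'PdDistrict': 'SOUTHERN'},
]
print(district_counts(toy_docs).most_common())
```

Against the real data this could be called as `district_counts(collection_mission_incidents.find({}, {'PdDistrict': 1}))`, where the second argument is a projection that keeps only the field of interest.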

In [29]:
collection_mission_incidents.find().count()

71646

There are 71646 incident records within Mission district geometry.
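
The `find().count()` call used above relies on `Cursor.count()`, which was deprecated in PyMongo 3.7 and removed in PyMongo 4. On newer versions, `Collection.count_documents` is the replacement. The following is a minimal sketch; the helper name is hypothetical, and the stub class (which only supports equality filters) stands in for a real collection so that the snippet runs without a database.

```python
def count_matching(collection, query=None):
    """Count documents matching query via Collection.count_documents."""
    return collection.count_documents(query or {})


class FakeCollection:
    """Stand-in for a PyMongo collection; supports equality filters only."""

    def __init__(self, docs):
        self.docs = docs

    def count_documents(self, query):
        return sum(
            all(doc.get(key) == value for key, value in query.items())
            for doc in self.docs
        )


fake = FakeCollection([
    {'PdDistrict': 'MISSION'},
    {'PdDistrict': 'MISSION'},
    {'PdDistrict': 'SOUTHERN'},
])
print(count_matching(fake))
print(count_matching(fake, {'PdDistrict': 'MISSION'}))
```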

#### Divide Mission incidents by date

To analyze how crimes in the MISSION district are distributed over time, we can divide the incidents into two temporal groups:
- Before 2010
- From 2010 onwards

We will use the ``Days`` field for the split. As the minimum date in the original dataset is January 1st, 2003, the number of days elapsed until January 1st, 2010 is 2557, as checked below:

In [30]:
mission_incidents_df["Date"].min()

'01/01/2003 12:00:00 AM'

In [31]:
print((mission_incidents_df.loc[mission_incidents_df["Days"] == 2557.0])["Date"].head(1))

30004    01/01/2010 12:00:00 AM
Name: Date, dtype: object
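
The threshold can also be cross-checked with plain date arithmetic. This sketch assumes that ``Days`` counts the days elapsed since January 1st, 2003, which is consistent with the sample document shown earlier (01/19/2015 has ``Days`` equal to 4401). The constant and function names are hypothetical.

```python
from datetime import date

# Assumed origin of the Days field: first date in the dataset
FIRST_DAY = date(2003, 1, 1)


def days_since_start(day, start=FIRST_DAY):
    """Number of days elapsed between start and day."""
    return (day - start).days


print(days_since_start(date(2010, 1, 1)))   # 2557
print(days_since_start(date(2015, 1, 19)))  # 4401
```

The seven years from 2003 to 2009 contain two leap years (2004 and 2008), which gives 7 × 365 + 2 = 2557 days.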


Let's perform the queries with this consideration:
- Oldest crimes (before 2010):

In [32]:
query_old_incidents ={
    "Days":{
        "$lt": 2557.0
    }
}

In [33]:
cursor_old_incidents = collection_mission_incidents.find(query_old_incidents)

collection_old_incidents = db['old_incidents']
collection_old_incidents.drop()
collection_old_incidents.insert_many(cursor_old_incidents)

cursor_old_incidents = collection_old_incidents.find()

cursor_old_incidents[0]

{'Address': 'HOWARD ST / LAFAYETTE ST',
 'Category': 'ASSAULT',
 'Date': '01/07/2005 12:00:00 AM',
 'DayOfWeek': 'Friday',
 'Days': 737.0,
 'Descript': 'BATTERY',
 'IncidntNum': 50025371,
 'PdDistrict': 'SOUTHERN',
 'Resolution': 'NONE',
 'Time': '15:59',
 '_id': ObjectId('5ad1e7d540bdcc1b302e47ef'),
 'loc': {'coordinates': [-122.4163057233, 37.772455644], 'type': 'Point'}}

In [34]:
collection_old_incidents.find().count()

28307

- Newest crimes (from 2010 onwards):

In [35]:
query_new_incidents ={
    "Days":{
        "$gte": 2557.0
    }
}

In [36]:
cursor_new_incidents = collection_mission_incidents.find(query_new_incidents)

collection_new_incidents = db['new_incidents']
collection_new_incidents.drop()
collection_new_incidents.insert_many(cursor_new_incidents)

cursor_new_incidents = collection_new_incidents.find()

cursor_new_incidents[0]

{'Address': '18TH ST / VALENCIA ST',
 'Category': 'NON-CRIMINAL',
 'Date': '01/19/2015 12:00:00 AM',
 'DayOfWeek': 'Monday',
 'Days': 4401.0,
 'Descript': 'LOST PROPERTY',
 'IncidntNum': 150060275,
 'PdDistrict': 'MISSION',
 'Resolution': 'NONE',
 'Time': '14:00',
 '_id': ObjectId('5ad1e6c940bdcc1b302ac3d6'),
 'loc': {'coordinates': [-122.4215816814, 37.761700718], 'type': 'Point'}}

In [37]:
collection_new_incidents.find().count()

43339

There are considerably more incidents stored in our database from 2010 onwards (43339) than before 2010 (28307).

Now let's do the same but with specific incident categories:
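
Since the category queries that follow differ only in the ``Category`` value, they can be produced by one small builder. This is a minimal sketch, not the notebook's own code: the function name is hypothetical and the toy geometry is made up so that the snippet runs on its own.

```python
def within_geometry_query(geometry, category=None, field='loc'):
    """Build a $geoWithin filter, optionally restricted to one Category."""
    query = {field: {'$geoWithin': {'$geometry': geometry}}}
    if category is not None:
        query['Category'] = category
    return query


# Toy polygon standing in for mission_district_geometry
toy_geometry = {
    'type': 'Polygon',
    'coordinates': [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]],
}
print(within_geometry_query(toy_geometry, category='LARCENY/THEFT'))
```

In the notebook, `within_geometry_query(mission_district_geometry, 'LARCENY/THEFT')` would yield the same dictionary as the hand-written query for that category.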

- ### Larcenies / thefts within MISSION

In [38]:
query_mission_larcenies_thefts = {
    "Category": "LARCENY/THEFT",
    "loc":{
        "$geoWithin":{
            "$geometry": mission_district_geometry
        }
    }
}

In [39]:
cursor_mission_larcenies_thefts = collection_incidents_geojson.find(query_mission_larcenies_thefts)

# Check one incident to validate the query
cursor_mission_larcenies_thefts[2]

{'Address': '2900 Block of 16TH ST',
 'Category': 'LARCENY/THEFT',
 'Date': '02/01/2015 12:00:00 AM',
 'DayOfWeek': 'Sunday',
 'Days': 4414.0,
 'Descript': 'PETTY THEFT FROM A BUILDING',
 'IncidntNum': 150098981,
 'PdDistrict': 'MISSION',
 'Resolution': 'NONE',
 'Time': '19:00',
 '_id': ObjectId('5ad1e6c940bdcc1b302ac437'),
 'loc': {'coordinates': [-122.4192024594, 37.7650244301], 'type': 'Point'}}

In [40]:
collection_mission_larcenies_thefts = db['mission_larceny_theft']

collection_mission_larcenies_thefts.drop()

collection_mission_larcenies_thefts.insert_many(cursor_mission_larcenies_thefts)

cursor_mission_larcenies_thefts = collection_mission_larcenies_thefts.find()

collection_mission_larcenies_thefts.find().count()

10615

#### Divide Larceny/theft by date

- Oldest crimes (before 2010):

In [41]:
query_old_larceny_theft ={
    "Days":{
        "$lt": 2557.0
    }
}

In [42]:
cursor_old_larceny_theft = collection_mission_larcenies_thefts.find(query_old_larceny_theft)

collection_old_larceny_theft = db['old_larceny_theft']
collection_old_larceny_theft.drop()
collection_old_larceny_theft.insert_many(cursor_old_larceny_theft)

cursor_old_larceny_theft = collection_old_larceny_theft.find()

cursor_old_larceny_theft[0]

collection_old_larceny_theft.find().count()

3472

- Newest crimes (from 2010 onwards):

In [43]:
query_new_larceny_theft ={
    "Days":{
        "$gte": 2557.0
    }
}

In [44]:
cursor_new_larceny_theft = collection_mission_larcenies_thefts.find(query_new_larceny_theft)

collection_new_larceny_theft = db['new_larceny_theft']
collection_new_larceny_theft.drop()
collection_new_larceny_theft.insert_many(cursor_new_larceny_theft)

cursor_new_larceny_theft = collection_new_larceny_theft.find()

cursor_new_larceny_theft[0]

collection_new_larceny_theft.find().count()

7143

- ### Missing people in MISSION

In [45]:
query_mission_missing_person = {
    "Category": "MISSING PERSON",
    "loc":{
        "$geoWithin":{
            "$geometry": mission_district_geometry
        }
    }
}

In [46]:
cursor_mission_missing_person = collection_incidents_geojson.find(query_mission_missing_person)

# Check one incident to validate the query
cursor_mission_missing_person[2]

{'Address': '900 Block of POTRERO AV',
 'Category': 'MISSING PERSON',
 'Date': '02/05/2015 12:00:00 AM',
 'DayOfWeek': 'Thursday',
 'Days': 4418.0,
 'Descript': 'MISSING JUVENILE',
 'IncidntNum': 150110278,
 'PdDistrict': 'MISSION',
 'Resolution': 'NONE',
 'Time': '06:50',
 '_id': ObjectId('5ad1e6cb40bdcc1b302ac953'),
 'loc': {'coordinates': [-122.4066049195, 37.7571580432], 'type': 'Point'}}

In [47]:
collection_mission_missing_person = db['mission_missing_person']

collection_mission_missing_person.drop()

collection_mission_missing_person.insert_many(cursor_mission_missing_person)

cursor_mission_missing_person = collection_mission_missing_person.find()

collection_mission_missing_person.find().count()

2358

#### Missing people by date

- Oldest crimes (before 2010):

In [48]:
query_old_missing_person ={
    "Days":{
        "$lt": 2557.0
    }
}

In [49]:
cursor_old_missing_person = collection_mission_missing_person.find(query_old_missing_person)

collection_old_missing_person = db['old_missing_person']
collection_old_missing_person.drop()
collection_old_missing_person.insert_many(cursor_old_missing_person)

cursor_old_missing_person = collection_old_missing_person.find()

cursor_old_missing_person[0]

collection_old_missing_person.find().count()

708

- Newest crimes (from 2010 onwards):

In [50]:
query_new_missing_person ={
    "Days":{
        "$gte": 2557.0
    }
}

In [51]:
cursor_new_missing_person = collection_mission_missing_person.find(query_new_missing_person)

collection_new_missing_person = db['new_missing_person']
collection_new_missing_person.drop()
collection_new_missing_person.insert_many(cursor_new_missing_person)

cursor_new_missing_person = collection_new_missing_person.find()

cursor_new_missing_person[0]

collection_new_missing_person.find().count()

1650

Now let's represent these data in different maps for enhanced analysis.

- ### Drug/narcotic in MISSION

In [52]:
query_drug_narcotic = {
    "Category": "DRUG/NARCOTIC",
    "loc":{
        "$geoWithin":{
            "$geometry": mission_district_geometry
        }
    }
}

In [53]:
cursor_mission_drug_narcotic = collection_incidents_geojson.find(query_drug_narcotic)

# Check one incident to validate the query
cursor_mission_drug_narcotic[2]

collection_mission_drug_narcotic = db['mission_drug_narcotic']

collection_mission_drug_narcotic.drop()

collection_mission_drug_narcotic.insert_many(cursor_mission_drug_narcotic)

cursor_mission_drug_narcotic = collection_mission_drug_narcotic.find()

collection_mission_drug_narcotic.find().count()

4694

- ### Vehicle theft in MISSION

In [54]:
query_vehicle_theft = {
    "Category": "VEHICLE THEFT",
    "loc":{
        "$geoWithin":{
            "$geometry": mission_district_geometry
        }
    }
}

In [55]:
cursor_mission_vehicle_theft = collection_incidents_geojson.find(query_vehicle_theft)

# Check one incident to validate the query
cursor_mission_vehicle_theft[2]

collection_mission_vehicle_theft = db['mission_vehicle_theft']

collection_mission_vehicle_theft.drop()

collection_mission_vehicle_theft.insert_many(cursor_mission_vehicle_theft)

cursor_mission_vehicle_theft = collection_mission_vehicle_theft.find()

collection_mission_vehicle_theft.find().count()

3774

## 4. Animated map visualization of the incidents in MISSION district

In [56]:
# Set general coordinates of Mission neighborhood
MISSION_COORDINATES = (37.76, -122.42)

Now we will display several maps showing how different crime categories evolve over time, with the crimes grouped by year. Each map is animated to show the year-by-year evolution (the animation speed can be selected), and a fullscreen button is included at the top right.
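
``HeatMapWithTime`` takes its data as a list of time steps, each one a list of ``[latitude, longitude]`` points, plus an index with one label per time step, so both must be in the same order. The following is a minimal, dependency-free sketch of building that structure from documents shaped like the ones in this notebook. The function name and the toy documents are hypothetical, and returning the labels as strings is a choice made here for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Date layout used by the Date field, e.g. '01/19/2015 12:00:00 AM'
DATE_FORMAT = '%m/%d/%Y %I:%M:%S %p'


def heatmap_frames(docs):
    """Group [lat, lon] points by year; return (frames, labels) in year order."""
    by_year = defaultdict(list)
    for doc in docs:
        year = datetime.strptime(doc['Date'], DATE_FORMAT).year
        lon, lat = doc['loc']['coordinates']  # GeoJSON order is [lon, lat]
        by_year[year].append([lat, lon])
    years = sorted(by_year)
    return [by_year[year] for year in years], [str(year) for year in years]


# Toy documents with made-up coordinates
toy_docs = [
    {'Date': '01/19/2015 12:00:00 AM',
     'loc': {'type': 'Point', 'coordinates': [-122.42, 37.76]}},
    {'Date': '03/02/2004 12:00:00 AM',
     'loc': {'type': 'Point', 'coordinates': [-122.41, 37.77]}},
    {'Date': '06/30/2015 12:00:00 AM',
     'loc': {'type': 'Point', 'coordinates': [-122.43, 37.75]}},
]
frames, labels = heatmap_frames(toy_docs)
print(labels)
print(frames)
```

Sorting the years once and deriving both lists from that same sequence keeps every frame aligned with its label.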

### Map 0: Evolution of all Mission crimes in time

In [57]:
# ALL INCIDENTS
# Obtain data
cursor = collection_mission_incidents.find()
cursor_df = pd.DataFrame(list(cursor))

# Add year column 
cursor_df['Date'] = pd.to_datetime(cursor_df['Date'])
cursor_df['Year'] = cursor_df['Date'].apply(lambda time: time.year)

# Add columns Latitude and longitude
cursor_df['Latitude'] = [x['coordinates'][1] for x in cursor_df['loc']]
cursor_df['Longitude'] = [x['coordinates'][0] for x in cursor_df['loc']]

# Obtain a list of lists, including the coordinates of the incidents for each year.
# Years are sorted so that each list lines up with its label in the index below
years = sorted(cursor_df['Year'].unique())
coordinatesHeatMap = [cursor_df[cursor_df['Year'] == year]
                      [["Latitude", "Longitude"]].values.tolist()
                      for year in years]

# Index for scrolling the map over time (plain Python ints)
indexHeatMap = [int(year) for year in years]

# Heatmap with time
heatmaptime = plugins.HeatMapWithTime(data=coordinatesHeatMap, index=indexHeatMap)

# Mean position of the incidents (the map itself is centered on MISSION_COORDINATES)
meanlat = cursor_df['Latitude'].mean()
meanlon = cursor_df['Longitude'].mean()

# Initialize map
map0_Incidents = folium.Map(location = MISSION_COORDINATES, zoom_start = 14)

# Display neighborhood by polygon
folium.GeoJson(
    mission_district_geometry,
    name='Mission').add_to(map0_Incidents)

# Add option for fullscreen display
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True).add_to(map0_Incidents)

# Add the animated heatmap layer to the map
map0_Incidents.add_child(heatmaptime)

# Saving map to a file
map0_Incidents.save(outfile='map0_Incidents.html')

# # Show map
# map0_Incidents



In [58]:
%%HTML
<iframe width="100%" height="500" src="map0_Incidents.html?inline=true"></iframe>

The map above shows that no area of the district is free of crime in any year. Let's visualize individual crime categories to obtain more detailed information:

### Map 1: Evolution of Mission larcenies / thefts in time

In [59]:
# LARCENY/THEFT EVOLUTION
# Obtain data
cursor = collection_mission_larcenies_thefts.find()
cursor_df = pd.DataFrame(list(cursor))

# Add year column 
cursor_df['Date'] = pd.to_datetime(cursor_df['Date'])
cursor_df['Year'] = cursor_df['Date'].apply(lambda time: time.year)

# Add columns Latitude and longitude
cursor_df['Latitude'] = [x['coordinates'][1] for x in cursor_df['loc']]
cursor_df['Longitude'] = [x['coordinates'][0] for x in cursor_df['loc']]

# Obtain a list of lists, including the coordinates of the incidents for each year.
# Years are sorted so that each list lines up with its label in the index below
years = sorted(cursor_df['Year'].unique())
coordinatesHeatMap = [cursor_df[cursor_df['Year'] == year]
                      [["Latitude", "Longitude"]].values.tolist()
                      for year in years]

# Index for scrolling the map over time (plain Python ints)
indexHeatMap = [int(year) for year in years]

# Heatmap with time
heatmaptime = plugins.HeatMapWithTime(data=coordinatesHeatMap, index=indexHeatMap)

# Mean position of the incidents (the map itself is centered on MISSION_COORDINATES)
meanlat = cursor_df['Latitude'].mean()
meanlon = cursor_df['Longitude'].mean()

# Initialize map
map1_Larceny_Theft = folium.Map(location = MISSION_COORDINATES, zoom_start = 14)

# Display neighborhood by polygon
folium.GeoJson(
    mission_district_geometry,
    name='Mission').add_to(map1_Larceny_Theft)

# Add option for fullscreen display
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True).add_to(map1_Larceny_Theft)

# Add the animated heatmap layer to the map
map1_Larceny_Theft.add_child(heatmaptime)

# Saving map to a file
map1_Larceny_Theft.save(outfile='map1_Larceny_Theft.html')

# Show map
map1_Larceny_Theft



Larceny/theft is also very common in Mission. The areas around 16th and 24th Streets seem to be the most problematic (2012 was especially problematic around 16th Street). There is a certain decreasing tendency in this crime rate from 2012 onwards.
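
A trend read off an animated map can be cross-checked numerically by counting incidents per year. The snippet below is a minimal sketch on toy documents; the function name is hypothetical and the toy dates are made up, so the printed counts are not results from the dataset.

```python
from collections import Counter
from datetime import datetime


def incidents_per_year(docs, date_format='%m/%d/%Y %I:%M:%S %p'):
    """Return (year, count) pairs sorted by year, parsed from the Date field."""
    years = Counter(
        datetime.strptime(doc['Date'], date_format).year for doc in docs
    )
    return sorted(years.items())


# Toy documents with made-up dates
toy_docs = [
    {'Date': '05/10/2012 12:00:00 AM'},
    {'Date': '11/23/2012 12:00:00 AM'},
    {'Date': '02/14/2013 12:00:00 AM'},
]
print(incidents_per_year(toy_docs))
```

For the larceny/theft data this could be run as `incidents_per_year(collection_mission_larcenies_thefts.find())`.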

### Map 2: Evolution of Mission missing people in time

In [60]:
# MISSING PEOPLE EVOLUTION
# Obtain data
cursor = collection_mission_missing_person.find()
cursor_df = pd.DataFrame(list(cursor))

# Add year column 
cursor_df['Date'] = pd.to_datetime(cursor_df['Date'])
cursor_df['Year'] = cursor_df['Date'].apply(lambda time: time.year)

# Add columns Latitude and longitude
cursor_df['Latitude'] = [x['coordinates'][1] for x in cursor_df['loc']]
cursor_df['Longitude'] = [x['coordinates'][0] for x in cursor_df['loc']]

# Obtain a list of lists, including the coordinates of the incidents for each year.
# Years are sorted so that each list lines up with its label in the index below
years = sorted(cursor_df['Year'].unique())
coordinatesHeatMap = [cursor_df[cursor_df['Year'] == year]
                      [["Latitude", "Longitude"]].values.tolist()
                      for year in years]

# Index for scrolling the map over time (plain Python ints)
indexHeatMap = [int(year) for year in years]

# Heatmap with time
heatmaptime = plugins.HeatMapWithTime(data=coordinatesHeatMap, index=indexHeatMap)

# Mean position of the incidents (the map itself is centered on MISSION_COORDINATES)
meanlat = cursor_df['Latitude'].mean()
meanlon = cursor_df['Longitude'].mean()

# Initialize map
map2_Missing_People = folium.Map(location = MISSION_COORDINATES, zoom_start = 14)

# Display neighborhood by polygon
folium.GeoJson(
    mission_district_geometry,
    name='Mission').add_to(map2_Missing_People)

# Add option for fullscreen display
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True).add_to(map2_Missing_People)

# Add the animated heatmap layer to the map
map2_Missing_People.add_child(heatmaptime)

# Saving map to a file
map2_Missing_People.save(outfile='map2_Missing_People.html')

# Show map
map2_Missing_People



Missing-person incidents are less common in this neighborhood than other incident types. Again, 16th and 24th Streets, as well as South Van Ness Avenue, are the areas with the highest rates. The north-east area of the district does not seem very problematic concerning missing people.

### Map 3: Evolution of Mission drug/narcotic in time

In [61]:
# DRUG/NARCOTIC EVOLUTION
# Obtain data
cursor = collection_mission_drug_narcotic.find()
cursor_df = pd.DataFrame(list(cursor))

# Add year column 
cursor_df['Date'] = pd.to_datetime(cursor_df['Date'])
cursor_df['Year'] = cursor_df['Date'].apply(lambda time: time.year)

# Add columns Latitude and longitude
cursor_df['Latitude'] = [x['coordinates'][1] for x in cursor_df['loc']]
cursor_df['Longitude'] = [x['coordinates'][0] for x in cursor_df['loc']]

# Obtain a list of lists, including the coordinates of the incidents for each year.
# Years are sorted so that each list lines up with its label in the index below
years = sorted(cursor_df['Year'].unique())
coordinatesHeatMap = [cursor_df[cursor_df['Year'] == year]
                      [["Latitude", "Longitude"]].values.tolist()
                      for year in years]

# Index for scrolling the map over time (plain Python ints)
indexHeatMap = [int(year) for year in years]

# Heatmap with time
heatmaptime = plugins.HeatMapWithTime(data=coordinatesHeatMap, index=indexHeatMap)

# Mean position of the incidents (the map itself is centered on MISSION_COORDINATES)
meanlat = cursor_df['Latitude'].mean()
meanlon = cursor_df['Longitude'].mean()

# Initialize map
map3_Drug_Narcotic = folium.Map(location = MISSION_COORDINATES, zoom_start = 14)

# Display neighborhood by polygon
folium.GeoJson(
    mission_district_geometry,
    name='Mission').add_to(map3_Drug_Narcotic)

# Add option for fullscreen display
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True).add_to(map3_Drug_Narcotic)

# Add the animated heatmap layer to the map
map3_Drug_Narcotic.add_child(heatmaptime)

# Saving map to a file
map3_Drug_Narcotic.save(outfile='map3_Drug_Narcotic.html')

# Show map
map3_Drug_Narcotic



For this category, the area around 16th Street again shows a high rate and is the main drug/narcotic hotspot in the district. The 24th Street area was also very problematic in 2011.

### Map 4: Evolution of Mission vehicle thefts in time

In [62]:
# VEHICLE THEFTS EVOLUTION
# Obtain data
cursor = collection_mission_vehicle_theft.find()
cursor_df = pd.DataFrame(list(cursor))

# Add year column 
cursor_df['Date'] = pd.to_datetime(cursor_df['Date'])
cursor_df['Year'] = cursor_df['Date'].apply(lambda time: time.year)

# Add columns Latitude and longitude
cursor_df['Latitude'] = [x['coordinates'][1] for x in cursor_df['loc']]
cursor_df['Longitude'] = [x['coordinates'][0] for x in cursor_df['loc']]

# Obtain a list of lists, including the coordinates of the incidents for each year.
# Years are sorted so that each list lines up with its label in the index below
years = sorted(cursor_df['Year'].unique())
coordinatesHeatMap = [cursor_df[cursor_df['Year'] == year]
                      [["Latitude", "Longitude"]].values.tolist()
                      for year in years]

# Index for scrolling the map over time (plain Python ints)
indexHeatMap = [int(year) for year in years]

# Heatmap with time
heatmaptime = plugins.HeatMapWithTime(data=coordinatesHeatMap, index=indexHeatMap)

# Mean position of the incidents (the map itself is centered on MISSION_COORDINATES)
meanlat = cursor_df['Latitude'].mean()
meanlon = cursor_df['Longitude'].mean()

# Initialize map
map4_Vehicle_theft = folium.Map(location = MISSION_COORDINATES, zoom_start = 14)

# Display neighborhood by polygon
folium.GeoJson(
    mission_district_geometry,
    name='Mission').add_to(map4_Vehicle_theft)

# Add option for fullscreen display
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True).add_to(map4_Vehicle_theft)

# Add the animated heatmap layer to the map
map4_Vehicle_theft.add_child(heatmaptime)

# Saving map to a file
map4_Vehicle_theft.save(outfile='map4_Vehicle_Theft.html')

# Show map
map4_Vehicle_theft



2004 was an especially bad year for this crime, with the 16th Street and Bryant Street areas especially affected. In the remaining years this crime is common across the entire district, although the tendency seems to be a decrease over time.

<span style="color:blue">**Remark: if any of the maps is not correctly displayed on the notebook, please open html version included in the folder of this task.**</span>