# Introduction

The city of San Francisco has accepted proposals to build a new combined police and fire station, but it does not yet know where to place it. Because land is scarce and expensive, the city wants to identify the most effective location to serve the neighborhood and fulfill the proposal: a police department in the community of most need, and a fire department in the neighborhood of least non-violent crime. This would ensure the station is used to its best capability. The money for the project has been allocated, and the city is now seeking the area best suited to its needs.
Using data cleaning and analysis, this report first maps and clusters 1,000 non-violent incidents by neighborhood with geospatial mapping. It then uses Foursquare data and statistical K-Means to determine the best area for the new station, and presents those results to the city of San Francisco as part of a proposal.

## Table of Contents
- Abstract Summary
- Methodologies
- Data Summary
- Results
- Conclusion

## Abstract Summary

This is a high-level analysis to help the city of San Francisco rule out areas for its new police and fire station. Rather than focusing on violent crime indexes, it determines which police stations already serve historically non-violent neighborhoods. Those neighborhoods are the best candidates for the fire station, as most non-violent calls are for fire or accidents. The study combines dataframe cleaning and analysis, clustering of non-violent incidents, determining the best K-Means clustering through statistical analysis, and presenting the information to a non-technical audience.

## Methodologies

The methodologies below compare the highest-crime neighborhoods with those with the least crime, and use that data to cluster the police stations in order to find the best placement for the new combined station.

- Data source: San Francisco crime data CSV
- Data cleaning to pull the most relevant data
- Data analysis to identify and map the highest-crime neighborhoods
- K-Means statistical analysis
- Geolocation data clustering of current stations
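
The K-Means step listed above is not shown in this notebook. As a rough, dependency-free sketch of how a cluster count might be chosen, the block below runs plain Lloyd's algorithm on a handful of made-up coordinates and records the within-cluster sum of squares (inertia) for several values of k, which is the basis of the common elbow heuristic. The points, the `kmeans` helper and the treatment of raw latitude/longitude as Euclidean coordinates are all assumptions for illustration. scikit-learn's `KMeans` would be the more usual choice in practice.

```python
import random

# Hypothetical (latitude, longitude) points in three loose groups; not real data.
points = [
    (37.775, -122.403), (37.776, -122.404), (37.774, -122.402),
    (37.730, -122.389), (37.731, -122.388), (37.729, -122.390),
    (37.746, -122.478), (37.745, -122.477), (37.747, -122.479),
]

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # seed centroids from the data
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            mean_point(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Inertia for each candidate k; the "elbow" is where the drop levels off.
inertias = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 6))
```

Inertia can only be compared across values of k on the same data, so the same point set is reused for every run.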

## Results

## Conclusion

In [1]:
# import necessary libraries
!pip install geopy
!pip install folium
import pandas as pd
import folium
import requests
from geopy.geocoders import Nominatim
from IPython.display import Image, HTML
# json_normalize is exposed at the top level in pandas 1.0 and later
from pandas import json_normalize



In [2]:
# load the 2016 police department incidents CSV from its URL
url = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Police_Department_Incidents_-_Previous_Year__2016_.csv'
df_incidents = pd.read_csv(url)

print('Dataset downloaded')

Dataset downloaded


In [5]:
# clean the dataset: drop the columns not needed for the analysis
df_incidents = df_incidents.drop(['IncidntNum', 'Descript', 'DayOfWeek', 'Date', 'Time', 'Address', 'Location', 'PdId'], axis=1)
df_incidents.head()

Unnamed: 0,Category,PdDistrict,Resolution,X,Y
0,WEAPON LAWS,SOUTHERN,"ARREST, BOOKED",-122.403405,37.775421
1,WEAPON LAWS,SOUTHERN,"ARREST, BOOKED",-122.403405,37.775421
2,WARRANTS,BAYVIEW,"ARREST, BOOKED",-122.388856,37.729981
3,NON-CRIMINAL,TENDERLOIN,NONE,-122.412971,37.785788
4,NON-CRIMINAL,MISSION,NONE,-122.419672,37.76505


In [14]:
#Pulling only non-criminal incidents from data set
df = df_incidents.loc[df_incidents['Category'] == 'NON-CRIMINAL']
df.head()

Unnamed: 0,Category,PdDistrict,Resolution,X,Y
3,NON-CRIMINAL,TENDERLOIN,NONE,-122.412971,37.785788
4,NON-CRIMINAL,MISSION,NONE,-122.419672,37.76505
7,NON-CRIMINAL,TENDERLOIN,NONE,-122.411778,37.783981
11,NON-CRIMINAL,TARAVAL,NONE,-122.47796,37.745739
29,NON-CRIMINAL,INGLESIDE,NONE,-122.434609,37.709201


In [16]:
df.shape

(17866, 5)
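
With the non-criminal rows isolated, the districts can be ranked by incident count, which is the comparison the methodology relies on. A minimal sketch using pandas `value_counts`, run here on a few made-up rows rather than the real dataframe (the `sample` frame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical rows shaped like the cleaned incidents dataframe; not real data.
sample = pd.DataFrame({
    "Category": ["NON-CRIMINAL"] * 6,
    "PdDistrict": [
        "TENDERLOIN", "MISSION", "TENDERLOIN",
        "TARAVAL", "MISSION", "TENDERLOIN",
    ],
})

# Count incidents per police district, sorted from most to fewest.
district_counts = sample["PdDistrict"].value_counts()

busiest = district_counts.idxmax()   # district with the most incidents
quietest = district_counts.idxmin()  # district with the fewest incidents
print(district_counts)
print(busiest, quietest)
```

On the notebook's data, the equivalent call would be `df['PdDistrict'].value_counts()`.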

In [17]:
# limit the dataframe to the first 1000 non-criminal incidents
limit = 1000
df = df.iloc[0:limit, :]

In [6]:
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

In [7]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of San Francisco
sanfran_map

In [23]:
from folium import plugins

# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df.Y, df.X, df.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map