# Analysing San Francisco Crime dataset
## Introduction 
Crime in San Francisco has become a prominent topic after years of rising violence on the city's sidewalks. For example, a 2018 statistic shows a violent crime rate more than 150% above the average of other United States cities, which points to the potential threat people face while simply walking down the street. Some areas may be safe, while others, such as the Tenderloin, are extremely violent and carry a high chance of being robbed. Most of the crimes are thefts and other property crimes. The following analysis identifies which parts of the city are the most dangerous and describes the current problem.

In 2018 the city received 65 calls per day with complaints of theft, making it the second most frequent city for such complaints among United States cities. The frequency of the complaints gave birth to a "poop patrol" to deal with the problem amid homelessness.

Given the high volume of crime complaints in San Francisco, research on this dataset should provide insights into how frequent crime is in the city and support analysis aimed at combatting the high level of violence. Moreover, the timestamps are an additional source for analysis that helps explain when and where each crime occurs. The reshaped and sanitized results will be presented as infographics for different variables: time, location, and form of the crime.

### Sources
+ https://en.wikipedia.org/wiki/San_Francisco#Crime
+ https://time.com/5368610/san-francisco-poop-patrol-problem/
+ https://www.kaggle.com/roshansharma/sanfranciso-crime-dataset

## Dataset description and goals

### Preface
The dataset covers 2016, so it may be out of date. However, the statistical analysis does not lose its relevance: the results provide additional information for future projects and yearly evaluations, and serve as a baseline for comparison.

### Data Variables

These variables will be used in the data analysis.

+ Category - classification of each type of crime
+ Descript - brief description of the committed crime
+ DayOfWeek - day of the week
+ Date - date when the incident happened
+ Time - time of day when the crime was committed
+ PdDistrict - police department district where the crime happened
+ Address - the location given as a street address
+ X - longitude of the location
+ Y - latitude of the location
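
The X and Y columns appear to hold longitude and latitude respectively. A minimal sketch of how that reading could be sanity-checked is given here, using two rows that appear in the dataset preview further down. The bounding box values and the names `sample`, `LON_RANGE`, and `LAT_RANGE` are assumptions made for illustration, not part of the dataset.

```python
import pandas as pd

# Two rows copied from the dataset preview; the full file has many more.
sample = pd.DataFrame({
    "Category": ["WEAPON LAWS", "WARRANTS"],
    "PdDistrict": ["SOUTHERN", "BAYVIEW"],
    "X": [-122.403405, -122.388856],
    "Y": [37.775421, 37.729981],
})

# Assumed, deliberately generous bounding box around San Francisco.
LON_RANGE = (-123.0, -122.0)
LAT_RANGE = (37.0, 38.0)

# True for rows whose X lies in the longitude range and Y in the latitude range.
in_bounds = sample["X"].between(*LON_RANGE) & sample["Y"].between(*LAT_RANGE)
print(in_bounds.all())
```

If X and Y were swapped, every row would fall outside the box, so the check doubles as a quick test of the column order.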

### Research Questions
<ol>
    <li>Analyze the areas and locations of committed crimes on the San Francisco map</li>
    <li>Evaluate the weekday and time-of-day information using histogram analysis of frequency, and the possible time gaps between committed crimes</li>
    <li>Analyze the most frequent and least frequent committed crimes and sort them by relevance or occurrence</li>
    <li>Determine the time of committed crimes for each area and its frequency</li>
    <li>Classify each category and make a subset for each incident with its brief description</li>
</ol>

## Data Sanitization and cleaning

The dataset will be checked for completeness later, in the data validation part. In this step it is cleansed of unnecessary columns and column values.

In [1]:
import numpy as np
import pandas as pd
#initial used libraries
#additional libraries would be added as the project progresses

In [2]:
#load the 2016 incidents dataset (read_csv already returns a DataFrame)
tm = pd.read_csv('/Users/harmonyof/Desktop/Police_Department_Incidents_Previous_Year__2016_.csv', encoding='UTF-8')
#drop the columns that are not needed for the analysis
tm = tm.drop(columns=['Location', 'Resolution', 'PdId', 'IncidntNum'])
#work on a copy so later changes do not alter tm
df = tm.copy()
#preview the first five rows
df.head(5)

Unnamed: 0,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Address,X,Y
0,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,800 Block of BRYANT ST,-122.403405,37.775421
1,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,800 Block of BRYANT ST,-122.403405,37.775421
2,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,KEITH ST / SHAFTER AV,-122.388856,37.729981
3,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,JONES ST / OFARRELL ST,-122.412971,37.785788
4,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,16TH ST / MISSION ST,-122.419672,37.76505


In the Date column, the 12:00:00 AM part appears in every row. Since it carries no information, removing it does not affect the dataset.

In [3]:
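#remove the constant ' 12:00:00 AM' suffix as a literal string (rstrip strips a set of characters, not a suffix, and would leave a trailing space)
df['Date'] = df['Date'].str.replace(' 12:00:00 AM', '', regex=False)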
#Cleared Column
df.head(5)

Unnamed: 0,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Address,X,Y
0,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016,11:00,SOUTHERN,800 Block of BRYANT ST,-122.403405,37.775421
1,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016,11:00,SOUTHERN,800 Block of BRYANT ST,-122.403405,37.775421
2,WARRANTS,WARRANT ARREST,Monday,04/25/2016,14:59,BAYVIEW,KEITH ST / SHAFTER AV,-122.388856,37.729981
3,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016,23:50,TENDERLOIN,JONES ST / OFARRELL ST,-122.412971,37.785788
4,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016,00:30,MISSION,16TH ST / MISSION ST,-122.419672,37.76505
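
Once the suffix is gone, the Date and Time columns can be combined into a single timestamp, which makes later time-based analysis easier. The following is a minimal sketch of that idea on three rows taken from the preview; the frame name `sample` and the column name `Timestamp` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the raw preview (Date still carries the suffix).
sample = pd.DataFrame({
    "Date": ["01/29/2016 12:00:00 AM", "04/25/2016 12:00:00 AM", "01/01/2016 12:00:00 AM"],
    "Time": ["11:00", "14:59", "00:30"],
})

# Remove the constant suffix as a literal string, not as a regular expression.
sample["Date"] = sample["Date"].str.replace(" 12:00:00 AM", "", regex=False)

# Combine date and time into one datetime column (hypothetical name).
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%m/%d/%Y %H:%M"
)

# The weekday derived from the timestamp can be compared with DayOfWeek.
print(sample["Timestamp"].dt.day_name().tolist())
```

A literal replacement is used because `str.rstrip` interprets its argument as a set of characters rather than as a suffix, which makes it fragile for this kind of cleanup.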


#### Data division by research questions

1) Areas and locations of each committed crime; the resulting table is kept for later use

In [4]:
df1 = df[['PdDistrict', 'Address','X','Y','Category']]
df1.head(5)

Unnamed: 0,PdDistrict,Address,X,Y,Category
0,SOUTHERN,800 Block of BRYANT ST,-122.403405,37.775421,WEAPON LAWS
1,SOUTHERN,800 Block of BRYANT ST,-122.403405,37.775421,WEAPON LAWS
2,BAYVIEW,KEITH ST / SHAFTER AV,-122.388856,37.729981,WARRANTS
3,TENDERLOIN,JONES ST / OFARRELL ST,-122.412971,37.785788,NON-CRIMINAL
4,MISSION,16TH ST / MISSION ST,-122.419672,37.76505,NON-CRIMINAL
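
As a hedged illustration of how this subset might feed the first research question, the sketch below counts incidents per district and computes a mean coordinate per district that could serve as a rough map marker. It uses only the five preview rows; `df1_sample`, `district_counts`, and `district_centers` are hypothetical names.

```python
import pandas as pd

# The five preview rows of the location subset.
df1_sample = pd.DataFrame({
    "PdDistrict": ["SOUTHERN", "SOUTHERN", "BAYVIEW", "TENDERLOIN", "MISSION"],
    "X": [-122.403405, -122.403405, -122.388856, -122.412971, -122.419672],
    "Y": [37.775421, 37.775421, 37.729981, 37.785788, 37.76505],
    "Category": ["WEAPON LAWS", "WEAPON LAWS", "WARRANTS", "NON-CRIMINAL", "NON-CRIMINAL"],
})

# Number of incidents per district, most frequent first.
district_counts = df1_sample["PdDistrict"].value_counts()

# Mean longitude/latitude per district, usable as an approximate marker position.
district_centers = df1_sample.groupby("PdDistrict")[["X", "Y"]].mean()

print(district_counts)
print(district_centers)
```

On the full dataset the same two calls would apply to `df1` directly; the counts here only reflect the five sample rows.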


2) Weekday and time-of-day information; the resulting table is kept for later use

In [5]:
df2 = df[['DayOfWeek', 'Date','Time']]
df2.head(5)

Unnamed: 0,DayOfWeek,Date,Time
0,Friday,01/29/2016,11:00
1,Friday,01/29/2016,11:00
2,Monday,04/25/2016,14:59
3,Tuesday,01/05/2016,23:50
4,Friday,01/01/2016,00:30
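
One possible way to approach the weekday frequency and the time gaps between incidents is sketched below on the five preview rows. This is an illustration under assumptions, not the notebook's own method; `df2_sample`, `weekday_counts`, and `gaps` are hypothetical names.

```python
import pandas as pd

# The five preview rows of the time subset.
df2_sample = pd.DataFrame({
    "DayOfWeek": ["Friday", "Friday", "Monday", "Tuesday", "Friday"],
    "Date": ["01/29/2016", "01/29/2016", "04/25/2016", "01/05/2016", "01/01/2016"],
    "Time": ["11:00", "11:00", "14:59", "23:50", "00:30"],
})

# Incident count per weekday in calendar order; days with no incidents get 0.
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
weekday_counts = (
    df2_sample["DayOfWeek"].value_counts().reindex(weekday_order, fill_value=0)
)

# Sort incidents chronologically and measure the gap to the previous incident.
timestamps = pd.to_datetime(
    df2_sample["Date"] + " " + df2_sample["Time"], format="%m/%d/%Y %H:%M"
).sort_values()
gaps = timestamps.diff().dropna()

print(weekday_counts)
print(gaps)
```

The `weekday_counts` series could then be drawn as a bar chart, for example with `weekday_counts.plot(kind="bar")` when matplotlib is installed.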


3) Most frequent and least frequent committed crimes; the resulting table is kept for later use

In [6]:
df3 = df[['Category','Date','Time', 'Address']]
df3.head(5)

Unnamed: 0,Category,Date,Time,Address
0,WEAPON LAWS,01/29/2016,11:00,800 Block of BRYANT ST
1,WEAPON LAWS,01/29/2016,11:00,800 Block of BRYANT ST
2,WARRANTS,04/25/2016,14:59,KEITH ST / SHAFTER AV
3,NON-CRIMINAL,01/05/2016,23:50,JONES ST / OFARRELL ST
4,NON-CRIMINAL,01/01/2016,00:30,16TH ST / MISSION ST
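
Ranking categories by how often they occur can be sketched with `value_counts`, as below. The example runs on the five preview rows only, so the numbers are illustrative; `df3_sample`, `category_counts`, and `category_share` are hypothetical names.

```python
import pandas as pd

# The five preview rows of the category subset.
df3_sample = pd.DataFrame({
    "Category": ["WEAPON LAWS", "WEAPON LAWS", "WARRANTS", "NON-CRIMINAL", "NON-CRIMINAL"],
    "Date": ["01/29/2016", "01/29/2016", "04/25/2016", "01/05/2016", "01/01/2016"],
    "Time": ["11:00", "11:00", "14:59", "23:50", "00:30"],
    "Address": ["800 Block of BRYANT ST", "800 Block of BRYANT ST",
                "KEITH ST / SHAFTER AV", "JONES ST / OFARRELL ST",
                "16TH ST / MISSION ST"],
})

# Absolute counts per category, sorted from most to least frequent.
category_counts = df3_sample["Category"].value_counts()

# Relative share of each category among all incidents.
category_share = df3_sample["Category"].value_counts(normalize=True)

# Label of the least frequent category.
least_frequent = category_counts.idxmin()

print(category_counts)
print(least_frequent)
```

Passing `ascending=True` to `value_counts` would list the least frequent categories first instead.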


4) Time of committed crimes for each area; the resulting table is kept for later use

In [7]:
df4 = df[['Time', 'Category', 'X', 'Y']]
df4.head(5)

Unnamed: 0,Time,Category,X,Y
0,11:00,WEAPON LAWS,-122.403405,37.775421
1,11:00,WEAPON LAWS,-122.403405,37.775421
2,14:59,WARRANTS,-122.388856,37.729981
3,23:50,NON-CRIMINAL,-122.412971,37.785788
4,00:30,NON-CRIMINAL,-122.419672,37.76505
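
Since this subset carries coordinates rather than district names, one way to define an "area" is a coarse grid cell obtained by rounding the coordinates. The sketch below assumes two decimal places as the cell size, which is an arbitrary choice for illustration, and counts incidents per cell and hour. The names `df4_sample`, `Hour`, `CellX`, `CellY`, and `cell_hour_counts` are hypothetical.

```python
import pandas as pd

# The five preview rows of the time-and-coordinates subset.
df4_sample = pd.DataFrame({
    "Time": ["11:00", "11:00", "14:59", "23:50", "00:30"],
    "Category": ["WEAPON LAWS", "WEAPON LAWS", "WARRANTS", "NON-CRIMINAL", "NON-CRIMINAL"],
    "X": [-122.403405, -122.403405, -122.388856, -122.412971, -122.419672],
    "Y": [37.775421, 37.775421, 37.729981, 37.785788, 37.76505],
})

# Hour of day (0-23) extracted from the HH:MM string.
df4_sample["Hour"] = pd.to_datetime(df4_sample["Time"], format="%H:%M").dt.hour

# Assumed grid: coordinates rounded to two decimals define one cell.
df4_sample["CellX"] = df4_sample["X"].round(2)
df4_sample["CellY"] = df4_sample["Y"].round(2)

# Number of incidents per grid cell and hour of day.
cell_hour_counts = df4_sample.groupby(["CellX", "CellY", "Hour"]).size()

print(cell_hour_counts)
```

Grouping by `PdDistrict` instead of a grid would be an alternative if that column were included in the subset.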


5) Classify each category and make a subset for each incident with its brief description

In [8]:
df5 = df[['Category', 'Descript']]
df5.head(5)

Unnamed: 0,Category,Descript
0,WEAPON LAWS,POSS OF PROHIBITED WEAPON
1,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE"
2,WARRANTS,WARRANT ARREST
3,NON-CRIMINAL,LOST PROPERTY
4,NON-CRIMINAL,LOST PROPERTY
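
A possible way to build the per-category subsets is sketched below with `groupby`, again on the five preview rows. The names `df5_sample`, `descriptions_by_category`, and `subsets` are hypothetical and chosen only for this example.

```python
import pandas as pd

# The five preview rows of the category/description subset.
df5_sample = pd.DataFrame({
    "Category": ["WEAPON LAWS", "WEAPON LAWS", "WARRANTS", "NON-CRIMINAL", "NON-CRIMINAL"],
    "Descript": ["POSS OF PROHIBITED WEAPON",
                 "FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",
                 "WARRANT ARREST", "LOST PROPERTY", "LOST PROPERTY"],
})

# Distinct descriptions that occur within each category.
descriptions_by_category = df5_sample.groupby("Category")["Descript"].unique()

# One DataFrame per category, keyed by the category name.
subsets = {
    category: group.reset_index(drop=True)
    for category, group in df5_sample.groupby("Category")
}

print(descriptions_by_category)
print(subsets["WEAPON LAWS"])
```

Keeping the subsets in a dictionary lets each category be looked up by name without creating a separate variable for every category.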
