Analytics using SFO crime data from Kaggle



Graphical analysis & Data Exploration

(1) HEATMAPS:
The code for the heatmap and charts is in the program "hmap.R".
To view the images/charts or run the code online, use the Kaggle link below:
https://www.kaggle.com/anu2analytics/sf-crime/heatmap-for-crime-categories-and-area
One heatmap image is included in this folder as heatmap_SFO.png.
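
As a rough illustration of the kind of plot described above, the following sketch builds a category-by-district count matrix and draws it with base R's `heatmap()`. It is not the code from "hmap.R"; the data frame `crimes` and its values are synthetic stand-ins, and the column names `Category` and `PdDistrict` are taken from the model formulas in this README.

```r
# Synthetic stand-in for the Kaggle training data (hypothetical values).
set.seed(1)
crimes <- data.frame(
  Category   = sample(c("ASSAULT", "LARCENY/THEFT", "VANDALISM"), 300, replace = TRUE),
  PdDistrict = sample(c("MISSION", "NORTHERN", "SOUTHERN", "TENDERLOIN"), 300, replace = TRUE)
)

# Cross-tabulate incident counts: one row per category, one column per district.
# unclass() turns the table into a plain integer matrix with dimnames.
counts <- unclass(table(crimes$Category, crimes$PdDistrict))

# Draw the heatmap without clustering; scale = "row" standardizes each
# category so districts can be compared within that category.
heatmap(counts, Rowv = NA, Colv = NA, scale = "row",
        xlab = "PdDistrict", ylab = "Category")
```

When run non-interactively with Rscript, the plot is written to the default graphics device (usually Rplots.pdf).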

(2) CHI-SQ TEST & CORRELATION VISUALIZATION:
The program "relations.R" performs the following tasks:

    (a) chi-sq test to check whether a relationship exists between crime category (a categorical variable) and other factors such as district, x/y coordinates, month, year, etc.
    (b) corrgram function to visualize the correlation matrix and dependencies.
Related images include corr_categ_vars.jpg, corr_matrx_quant.jpg & cov_variables.jpg.
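
The chi-sq step in (a) can be sketched with base R's `chisq.test()` applied to a contingency table of two categorical variables. This is a minimal example on synthetic data, not the contents of "relations.R"; the vectors `districts` and `category` are hypothetical, and the association between them is built in on purpose so the test has something to detect.

```r
set.seed(2)
n <- 600

# Synthetic data (hypothetical): theft is made more likely in one district.
districts <- sample(c("MISSION", "NORTHERN", "SOUTHERN"), n, replace = TRUE)
category <- ifelse(
  districts == "SOUTHERN",
  sample(c("ASSAULT", "LARCENY/THEFT"), n, replace = TRUE, prob = c(0.2, 0.8)),
  sample(c("ASSAULT", "LARCENY/THEFT"), n, replace = TRUE, prob = c(0.6, 0.4))
)

# Pearson chi-squared test of independence on the category-by-district table.
result <- chisq.test(table(category, districts))
print(result)
```

A small p-value suggests that crime category and district are not independent. The same call can be repeated for the other factors (month, year, and so on).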

Prediction Algorithm:

Multinomial regression.

Model 1: simple formula with default options.

Formula: Category ~ DayOfWeek + PdDistrict
Score: 2.65
Output file: multinom.zip
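
A minimal sketch of how a model like Model 1 might be fitted, assuming the `multinom()` function from the nnet package (the function named in the Model 2 call in this README). The data frame below is synthetic, with hypothetical values; only the column names and the name `mytrain` come from this README.

```r
library(nnet)  # recommended package that ships with standard R installations

set.seed(3)
n <- 500

# Synthetic stand-in for the training data (hypothetical values).
mytrain <- data.frame(
  Category   = factor(sample(c("ASSAULT", "LARCENY/THEFT", "VANDALISM"), n, replace = TRUE)),
  DayOfWeek  = factor(sample(c("Monday", "Friday", "Saturday"), n, replace = TRUE)),
  PdDistrict = factor(sample(c("MISSION", "NORTHERN", "SOUTHERN"), n, replace = TRUE))
)

# Fit with default options; trace = FALSE only silences the iteration log.
fit1 <- multinom(Category ~ DayOfWeek + PdDistrict, data = mytrain, trace = FALSE)

# Per-class probabilities: one row per incident, one column per category.
probs <- predict(fit1, newdata = mytrain, type = "probs")
head(probs)
```

Each row of `probs` sums to 1, which is the shape a per-category probability submission would need.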

Model 2: extended formula with additional predictors and a higher iteration limit.

Formula: multinom(Category ~ DayOfWeek + year + mth + PdDistrict, data = mytrain, maxit = 500)
Score: 2.60
Output file: multinom_dates.zip
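
The sketch below shows one way the `year` and `mth` predictors could be derived and the Model 2 call run end to end. Several things here are assumptions rather than facts from this README: the timestamp column name `Dates`, treating `year` and `mth` as factors, and the helper `multi_logloss()`, which is hypothetical. The README does not say which metric the scores use; the helper assumes multi-class log loss, where lower is better. All data is synthetic.

```r
library(nnet)

set.seed(4)
n <- 400

# Synthetic stand-in for the training data; the Dates column name is an assumption.
mytrain <- data.frame(
  Dates      = as.POSIXct("2010-01-01", tz = "UTC") + sample.int(5 * 365 * 86400, n),
  Category   = factor(sample(c("ASSAULT", "LARCENY/THEFT", "VANDALISM"), n, replace = TRUE)),
  DayOfWeek  = factor(sample(c("Monday", "Friday", "Saturday"), n, replace = TRUE)),
  PdDistrict = factor(sample(c("MISSION", "NORTHERN", "SOUTHERN"), n, replace = TRUE))
)

# Derive the date predictors (treated as factors here, which is an assumption).
mytrain$year <- factor(format(mytrain$Dates, "%Y"))
mytrain$mth  <- factor(format(mytrain$Dates, "%m"))

# Same call as Model 2, with trace = FALSE added to silence the iteration log.
fit2 <- multinom(Category ~ DayOfWeek + year + mth + PdDistrict,
                 data = mytrain, maxit = 500, trace = FALSE)

probs <- predict(fit2, newdata = mytrain, type = "probs")

# Hypothetical helper: mean negative log of the probability given to the true class.
multi_logloss <- function(actual, probs, eps = 1e-15) {
  col_idx <- match(as.character(actual), colnames(probs))
  p <- probs[cbind(seq_along(actual), col_idx)]
  p <- pmin(pmax(p, eps), 1 - eps)  # clip to avoid log(0)
  -mean(log(p))
}

train_loss <- multi_logloss(mytrain$Category, probs)
print(train_loss)
```

The value printed here is a training-set figure on random data, so it says nothing about the scores reported above. It only illustrates the mechanics.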