# National Institute of Justice Crime Forecasting Challenge

This page is an entry for the [NIJ Real-Time Crime Forecasting Challenge](http://www.nij.gov/funding/Pages/fy16-crime-forecasting-challenge.aspx). The challenge asks entrants to use real historical crime data to predict crimes in Portland, Oregon from March 1st through May 31st.

### Team

This is a small team entry with the following team members:
* David Freiberg - PhD Student (Materials Science), Drexel University
* Mairen Hilary - Undergraduate Student (Criminology and Justice Studies), Drexel University
* Axel Munoz - Undergraduate Student (Cryptology and Intelligence Analysis), Drexel University
* Christine Palmer - Data Analyst, Public Sector
* Matt Wolf - Data Analyst, Public Sector

### The Data

NIJ provided data from March 1st, 2012 until the end of the competition, February 28th. The data came in several forms, such as db files and shapefiles, but the csv files were the most efficient to work with. A major speed bump was determining the geodetic projection scheme.

It turns out NIJ released the coordinates in a projected coordinate system rather than as latitude and longitude; the script below reads them as ESRI:102726. A script was created using the Python project PyProj to convert the coordinates to WGS 84 latitude and longitude (the system primarily used by DoD and counterparts). The script creates two files: a file containing all of the data, and a file containing only the first 50 lines for development purposes.

In [None]:
########################################################################################################################
#                                                  IMPORT LIBRARIES
#
#   pyproj - coordinate library for python
#   glob - detect files inside a directory for parsing
#   csv - edit CSV files
########################################################################################################################
from pyproj import Proj, transform
import glob
import csv

########################################################################################################################
#                                                  GLOBAL VARIABLES
#
#   inProj - the projected coordinate system of the NIJ data (ESRI:102726, units preserved)
#   outProj - lat/lon coordinate system (WGS 84, EPSG:4326)
########################################################################################################################
inProj = Proj(init='esri:102726', preserve_units=True)
outProj = Proj(init='epsg:4326')


########################################################################################################################
#                                                  FUNCTION: main
#   Drives the program (old habits die hard)
########################################################################################################################
def main():
    #Create the final document
    allData = open('lat-lon', 'w')
    sampleData = open('sample', 'w')

    # Get the files in the CSV folder
    csvFiles = glob.glob("csv/*.csv")

    # Loop through them and make a super CSV
    for filename in csvFiles:
        csvFile = open(filename, 'rt')
        reader = csv.reader(csvFile)
        for i, row in enumerate(reader):
            # Skip the header row; convert every data row using the global projections
            if "x_coordinate" not in row:
                x,y=transform(inProj,outProj,float(row[5]),float(row[6]))
                print(x, y)
                # Build the tab-separated output line once and reuse it for both files
                fields = [field.strip(' ') for field in row[:5]]
                fields += [str(x), str(y), row[7].strip(' ')]
                line = '\t'.join(fields) + '\n'
                allData.write(line)

                if i < 51:
                    sampleData.write(line)

    allData.close()
    sampleData.close()
main()
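
Because the projection was the main stumbling block, a quick sanity check on the converted output can catch a wrong projection code early. The following is a minimal sketch, not part of the original script: the bounding box is a rough, hand-picked assumption for Portland, and `in_portland` and `count_out_of_bounds` are hypothetical helper names. It assumes the tab-separated layout written above, with longitude in column 5 and latitude in column 6.

```python
# Rough bounding box around Portland, Oregon. These limits are an assumption
# chosen for illustration, not values taken from the NIJ data.
PORTLAND_BOUNDS = {
    "lon_min": -123.0, "lon_max": -122.3,
    "lat_min": 45.3, "lat_max": 45.8,
}


def in_portland(lon, lat, bounds=PORTLAND_BOUNDS):
    """Return True if (lon, lat) falls inside the rough bounding box."""
    return (bounds["lon_min"] <= lon <= bounds["lon_max"]
            and bounds["lat_min"] <= lat <= bounds["lat_max"])


def count_out_of_bounds(path, lon_col=5, lat_col=6):
    """Count rows of a tab-separated file whose coordinates fall outside the box."""
    bad = 0
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if not in_portland(float(fields[lon_col]), float(fields[lat_col])):
                bad += 1
    return bad
```

Running `count_out_of_bounds('sample')` on the small development file is a cheap first check: if most rows fall outside the box, the input or output projection is probably wrong.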

The next step I took was to write code to handle the data itself. That means sorting the data by date, extracting rows that contain certain keywords (e.g. only rows with 'Assault' or 'Gang Related' in them), or even mass manipulation of latitude and longitude.
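
As a rough illustration of these three operations, here is a minimal sketch that works on rows already split into lists of strings. It is not the code used for the entry: the function names are hypothetical, and the date column index (4) and date format (`%m/%d/%Y`) are assumptions about the data that may need adjusting. The coordinate columns (5 and 6) follow the layout written by the conversion script.

```python
from datetime import datetime


def load_rows(path):
    """Read a tab-separated file into a list of lists of strings."""
    with open(path) as handle:
        return [line.rstrip("\n").split("\t") for line in handle]


def sort_by_date(rows, date_col=4, date_format="%m/%d/%Y"):
    """Return the rows ordered by the date in date_col (assumed column and format)."""
    return sorted(rows, key=lambda row: datetime.strptime(row[date_col], date_format))


def filter_by_keywords(rows, keywords):
    """Keep rows in which any field contains any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [row for row in rows
            if any(keyword in field.lower() for field in row for keyword in lowered)]


def map_coordinates(rows, func, lon_col=5, lat_col=6):
    """Apply func(lon, lat) -> (lon, lat) to every row and return new rows."""
    result = []
    for row in rows:
        lon, lat = func(float(row[lon_col]), float(row[lat_col]))
        new_row = list(row)  # copy so the input rows are left untouched
        new_row[lon_col], new_row[lat_col] = str(lon), str(lat)
        result.append(new_row)
    return result
```

For example, `filter_by_keywords(sort_by_date(load_rows('sample')), ['Assault', 'Gang Related'])` would chain two of the steps, and `map_coordinates(rows, lambda lon, lat: (round(lon, 4), round(lat, 4)))` would round every coordinate in one pass.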

Originally I wrote a small script with a bunch of functions to do these separately, but then I realized it was probably better to write an 'engine' for efficiency and portability for future projects.