# Universidade Federal do Rio Grande do Norte
## Programa de Pós-graduação em Engenharia Elétrica e de Computação
### Course: Tópicos Especiais C
### Professor: Ivanovitch Silva
### Students: 
* Aguinaldo Bezerra Batista Júnior
* Pedro Klisley Ferreira da Silva
* Ycaro Ravel Dantas

#### Activity: Task 2, Project 1
#### Subject: Finding Patterns in Crime
#### Objectives: Perform data analysis of a given dataset following guidance and hints in the professor's notebook.
#### Dataset: MontgomeryCountyCrime2013.csv

### 1 Getting to know the proposed dataset

Loading the given dataset into a Pandas DataFrame and showing the first rows

In [1]:
import pandas as pd
crimes = pd.read_csv("MontgomeryCountyCrime2013.csv")
crimes.head()

Unnamed: 0,Incident ID,CR Number,Dispatch Date / Time,Class,Class Description,Police District Name,Block Address,City,State,Zip Code,...,Sector,Beat,PRA,Start Date / Time,End Date / Time,Latitude,Longitude,Police District Number,Location,Address Number
0,200939101,13047006,10/02/2013 07:52:41 PM,511,BURG FORCE-RES/NIGHT,OTHER,25700 MT RADNOR DR,DAMASCUS,MD,20872.0,...,,,,10/02/2013 07:52:00 PM,,,,OTHER,,25700.0
1,200952042,13062965,12/31/2013 09:46:58 PM,1834,CDS-POSS MARIJUANA/HASHISH,GERMANTOWN,GUNNERS BRANCH RD,GERMANTOWN,MD,20874.0,...,M,5M1,470.0,12/31/2013 09:46:00 PM,,,,5D,,
2,200926636,13031483,07/06/2013 09:06:24 AM,1412,VANDALISM-MOTOR VEHICLE,MONTGOMERY VILLAGE,OLDE TOWNE AVE,GAITHERSBURG,MD,20877.0,...,P,6P3,431.0,07/06/2013 09:06:00 AM,,,,6D,,
3,200929538,13035288,07/28/2013 09:13:15 PM,2752,FUGITIVE FROM JUSTICE(OUT OF STATE),BETHESDA,BEACH DR,CHEVY CHASE,MD,20815.0,...,D,2D1,11.0,07/28/2013 09:13:00 PM,,,,2D,,
4,200930689,13036876,08/06/2013 05:16:17 PM,2812,DRIVING UNDER THE INFLUENCE,BETHESDA,BEACH DR,SILVER SPRING,MD,20815.0,...,D,2D3,178.0,08/06/2013 05:16:00 PM,,,,2D,,


Listing the columns of the dataset and their types

In [2]:
crimes.columns

Index(['Incident ID', 'CR Number', 'Dispatch Date / Time', 'Class',
       'Class Description', 'Police District Name', 'Block Address', 'City',
       'State', 'Zip Code', 'Agency', 'Place', 'Sector', 'Beat', 'PRA',
       'Start Date / Time', 'End Date / Time', 'Latitude', 'Longitude',
       'Police District Number', 'Location', 'Address Number'],
      dtype='object')

In [3]:
crimes.dtypes

Incident ID                 int64
CR Number                   int64
Dispatch Date / Time       object
Class                       int64
Class Description          object
Police District Name       object
Block Address              object
City                       object
State                      object
Zip Code                  float64
Agency                     object
Place                      object
Sector                     object
Beat                       object
PRA                       float64
Start Date / Time          object
End Date / Time            object
Latitude                  float64
Longitude                 float64
Police District Number     object
Location                   object
Address Number            float64
dtype: object

By inspecting the column names and some cells, we identified relations between several columns. Columns **"Police District Name"** and **"Police District Number"** carry the same information: the identification of the Police District (name or code). Similarly, columns **"Class"** and **"Class Description"** hold the crime class codes and their descriptions. **"Location"** is clearly a combination of the **"Latitude"** and **"Longitude"** columns. A quick Google search clarified the meaning of the remaining columns that are not self-explanatory: **"CR Number"** is the Police Report Number, **"Beat"** is a police patrol area (a subset of a Sector) and **"PRA"**, the Police Response Area, is a subset of a Beat.
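The relation between the two district columns can also be checked programmatically rather than by eye. The snippet below is only a sketch on a small hand-built sample (the `sample` frame and the variable names are illustrative, the district values are taken from the rows shown above); on the real data the same check would run on `crimes`.

```python
import pandas as pd

# Small illustrative sample using district values visible in the first rows.
sample = pd.DataFrame({
    "Police District Name": ["OTHER", "GERMANTOWN", "MONTGOMERY VILLAGE", "BETHESDA", "BETHESDA"],
    "Police District Number": ["OTHER", "5D", "6D", "2D", "2D"],
})

# For each district name, count how many distinct codes it is paired with.
codes_per_name = sample.groupby("Police District Name")["Police District Number"].nunique()

# If every name maps to exactly one code, the name determines the code.
name_determines_code = bool((codes_per_name == 1).all())
print(codes_per_name)
print(name_determines_code)
```

If some name were paired with more than one code, `codes_per_name` would show a value above 1 for it, which would be a hint that the two columns are not fully redundant.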



The column **"Zip Code"**, as already pointed out in the guidance notebook, is stored with an unsuitable data type: it is loaded as float64 although zip codes are integer identifiers. Converting wrong or inadequate data types to more appropriate ones helps further analysis.

We could convert the data in the **"Zip Code"** and **"Address Number"** columns to the more suitable integer type. However, these columns are full of _NaN_ values, and the lack of _NaN_ and _NA_ representations in integer columns is a well-known Pandas "gotcha". That is probably why these columns were loaded as float data (which is compatible with NAs and NaNs). A workaround could be to define a sentinel integer value to represent missing data.
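The sentinel workaround mentioned above can be sketched as follows, next to the nullable `Int64` dtype that recent Pandas versions offer as an alternative. This is only an illustration on a toy series: the sentinel value `-1` and the variable names are assumptions, not choices made in this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the "Zip Code" column: float64 because of the missing value.
zip_codes = pd.Series([20872.0, np.nan, 20815.0], name="Zip Code")

# Option 1: replace missing values with a sentinel (hypothetical choice: -1)
# so the column can be stored as a plain int64.
MISSING_ZIP = -1
with_sentinel = zip_codes.fillna(MISSING_ZIP).astype("int64")

# Option 2: the nullable integer dtype keeps missing values as <NA>
# while storing the remaining values as integers.
nullable = zip_codes.astype("Int64")

print(with_sentinel.tolist())
print(nullable)
```

The sentinel keeps the column in a standard integer dtype but must be remembered and filtered out in every later computation, whereas the nullable dtype keeps missing values explicit.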

NaNs and NAs are likely to appear in a dataset during data acquisition, transmission and conversion steps.
They negatively affect the quality of the dataset and should be carefully spotted and evaluated, as they may
lead to weak or inaccurate conclusions.
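One simple way to spot missing values is to count them per column. The sketch below uses a small made-up frame (the `sample` values are illustrative only, not taken from the dataset); on the real data the same calls would be applied to `crimes`.

```python
import numpy as np
import pandas as pd

# Made-up sample with missing values in numeric and text columns.
sample = pd.DataFrame({
    "Zip Code": [20872.0, 20874.0, np.nan],
    "Latitude": [np.nan, np.nan, 39.1],
    "City": ["DAMASCUS", "GERMANTOWN", None],
})

# Number of missing cells per column.
missing_counts = sample.isna().sum()

# Fraction of missing cells per column, useful to compare columns of a large dataset.
missing_share = sample.isna().mean()

print(missing_counts.sort_values(ascending=False))
print(missing_share)
```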

### 2 Preparing Data

Converting some columns to more appropriate data types (see if it is really necessary)

Converting the Date/Time data in columns **"Dispatch Date / Time"**, **"Start Date / Time"** and **"End Date / Time"**
from generic Pandas objects (strings) to the more convenient Pandas datetime type. This will make it easy to extract date and time components later. The converted columns are added right next to the original data columns.

In [4]:
crimes.insert(3, "datetime_Dispatch", pd.to_datetime(crimes["Dispatch Date / Time"]))

In [5]:
crimes.insert(17, "datetime_Start", pd.to_datetime(crimes["Start Date / Time"]))

In [6]:
crimes.insert(19, "datetime_End", pd.to_datetime(crimes["End Date / Time"]))
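As a side note, the timestamps in this dataset appear to follow a single month/day/year 12-hour layout, so the format can be passed explicitly to `pd.to_datetime`. The sketch below assumes that layout holds for every row (it is inferred from the rows shown above, not verified on the whole file) and uses two sample strings plus a missing value.

```python
import pandas as pd

# Two timestamps copied from the first rows, plus a missing value.
raw = pd.Series(["10/02/2013 07:52:41 PM", "07/06/2013 09:06:24 AM", None])

# An explicit format avoids any day/month ambiguity; missing values become NaT.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p")

# Once parsed, date and time components are available through the .dt accessor.
print(parsed.dt.month.tolist())
print(parsed.dt.hour.tolist())
```

Passing the format also tends to be faster than letting Pandas infer it, which can matter for larger files.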

Verifying the new columns

In [7]:
crimes.head()

Unnamed: 0,Incident ID,CR Number,Dispatch Date / Time,datetime_Dispatch,Class,Class Description,Police District Name,Block Address,City,State,...,PRA,Start Date / Time,datetime_Start,End Date / Time,datetime_End,Latitude,Longitude,Police District Number,Location,Address Number
0,200939101,13047006,10/02/2013 07:52:41 PM,2013-10-02 19:52:41,511,BURG FORCE-RES/NIGHT,OTHER,25700 MT RADNOR DR,DAMASCUS,MD,...,,10/02/2013 07:52:00 PM,2013-10-02 19:52:00,,NaT,,,OTHER,,25700.0
1,200952042,13062965,12/31/2013 09:46:58 PM,2013-12-31 21:46:58,1834,CDS-POSS MARIJUANA/HASHISH,GERMANTOWN,GUNNERS BRANCH RD,GERMANTOWN,MD,...,470.0,12/31/2013 09:46:00 PM,2013-12-31 21:46:00,,NaT,,,5D,,
2,200926636,13031483,07/06/2013 09:06:24 AM,2013-07-06 09:06:24,1412,VANDALISM-MOTOR VEHICLE,MONTGOMERY VILLAGE,OLDE TOWNE AVE,GAITHERSBURG,MD,...,431.0,07/06/2013 09:06:00 AM,2013-07-06 09:06:00,,NaT,,,6D,,
3,200929538,13035288,07/28/2013 09:13:15 PM,2013-07-28 21:13:15,2752,FUGITIVE FROM JUSTICE(OUT OF STATE),BETHESDA,BEACH DR,CHEVY CHASE,MD,...,11.0,07/28/2013 09:13:00 PM,2013-07-28 21:13:00,,NaT,,,2D,,
4,200930689,13036876,08/06/2013 05:16:17 PM,2013-08-06 17:16:17,2812,DRIVING UNDER THE INFLUENCE,BETHESDA,BEACH DR,SILVER SPRING,MD,...,178.0,08/06/2013 05:16:00 PM,2013-08-06 17:16:00,,NaT,,,2D,,


Processing crime times to identify the part of the day in which they were committed:

In [10]:
from datetime import time

# Lower bound of each part of the day; each period ends where the next one begins.
dawn_start = time(0)
morning_start = time(6)
afternoon_start = time(12)
evening_start = time(18)


def part_of_day(t):
    """Return 'dawn', 'morning', 'afternoon' or 'evening' for a datetime.time value."""
    # Missing values (e.g. NaT) are not time objects and cannot be compared.
    if not isinstance(t, time):
        return 'unknown'
    if morning_start <= t < afternoon_start:
        return 'morning'
    if afternoon_start <= t < evening_start:
        return 'afternoon'
    # No upper bound here, so times after 23:59:00 are still labelled 'evening'.
    if t >= evening_start:
        return 'evening'
    if dawn_start <= t < morning_start:
        return 'dawn'

    return 'unknown'

In [11]:
# Extract the time-of-day component once. Calling crimes.datetime_Start.dt.time
# inside a loop rebuilds the whole column of time objects on every iteration,
# which makes an element-by-element loop very slow.
start_times = crimes["datetime_Start"].dt.time

# Label every start time with its part of the day.
crimes['crimes_period'] = start_times.apply(part_of_day)
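For larger datasets, the same classification can be done without calling a Python function per row. The sketch below is one possible vectorized alternative, assuming the same four six-hour periods as `part_of_day`; it bins the hour of each timestamp with `pd.cut`. The first three timestamps come from the rows shown above, the last one is made up to cover the 'dawn' case.

```python
import pandas as pd

starts = pd.Series(pd.to_datetime([
    "2013-10-02 19:52:00",
    "2013-07-06 09:06:00",
    "2013-08-06 17:16:00",
    "2013-01-01 00:30:00",  # made-up value, only to exercise the 'dawn' bin
]))

# Bins are left-closed: [0, 6), [6, 12), [12, 18), [18, 24).
period = pd.cut(
    starts.dt.hour,
    bins=[0, 6, 12, 18, 24],
    right=False,
    labels=["dawn", "morning", "afternoon", "evening"],
)

print(period.astype(str).tolist())
```

The result is a categorical column, which also keeps the periods in chronological order when grouping or plotting.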

In [134]:
crimes.head()


Unnamed: 0,Incident ID,CR Number,Dispatch Date / Time,datetime_Dispatch,Class,Class Description,Police District Name,Block Address,City,State,...,Start Date / Time,datetime_Start,End Date / Time,datetime_End,Latitude,Longitude,Police District Number,Location,Address Number,crimes_period
0,200939101,13047006,10/02/2013 07:52:41 PM,2013-10-02 19:52:41,511,BURG FORCE-RES/NIGHT,OTHER,25700 MT RADNOR DR,DAMASCUS,MD,...,10/02/2013 07:52:00 PM,2013-10-02 19:52:00,,NaT,,,OTHER,,25700.0,evening
1,200952042,13062965,12/31/2013 09:46:58 PM,2013-12-31 21:46:58,1834,CDS-POSS MARIJUANA/HASHISH,GERMANTOWN,GUNNERS BRANCH RD,GERMANTOWN,MD,...,12/31/2013 09:46:00 PM,2013-12-31 21:46:00,,NaT,,,5D,,,evening
2,200926636,13031483,07/06/2013 09:06:24 AM,2013-07-06 09:06:24,1412,VANDALISM-MOTOR VEHICLE,MONTGOMERY VILLAGE,OLDE TOWNE AVE,GAITHERSBURG,MD,...,07/06/2013 09:06:00 AM,2013-07-06 09:06:00,,NaT,,,6D,,,morning
3,200929538,13035288,07/28/2013 09:13:15 PM,2013-07-28 21:13:15,2752,FUGITIVE FROM JUSTICE(OUT OF STATE),BETHESDA,BEACH DR,CHEVY CHASE,MD,...,07/28/2013 09:13:00 PM,2013-07-28 21:13:00,,NaT,,,2D,,,evening
4,200930689,13036876,08/06/2013 05:16:17 PM,2013-08-06 17:16:17,2812,DRIVING UNDER THE INFLUENCE,BETHESDA,BEACH DR,SILVER SPRING,MD,...,08/06/2013 05:16:00 PM,2013-08-06 17:16:00,,NaT,,,2D,,,afternoon
