# 1. Introduction
---

# 2. Data description
---

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline

In [2]:
crimes_df = pd.read_csv("MontgomeryCountyCrime2013.csv")
crimes_df.head(5)

Unnamed: 0,Incident ID,CR Number,Dispatch Date / Time,Class,Class Description,Police District Name,Block Address,City,State,Zip Code,...,Sector,Beat,PRA,Start Date / Time,End Date / Time,Latitude,Longitude,Police District Number,Location,Address Number
0,200939101,13047006,10/02/2013 07:52:41 PM,511,BURG FORCE-RES/NIGHT,OTHER,25700 MT RADNOR DR,DAMASCUS,MD,20872.0,...,,,,10/02/2013 07:52:00 PM,,,,OTHER,,25700.0
1,200952042,13062965,12/31/2013 09:46:58 PM,1834,CDS-POSS MARIJUANA/HASHISH,GERMANTOWN,GUNNERS BRANCH RD,GERMANTOWN,MD,20874.0,...,M,5M1,470.0,12/31/2013 09:46:00 PM,,,,5D,,
2,200926636,13031483,07/06/2013 09:06:24 AM,1412,VANDALISM-MOTOR VEHICLE,MONTGOMERY VILLAGE,OLDE TOWNE AVE,GAITHERSBURG,MD,20877.0,...,P,6P3,431.0,07/06/2013 09:06:00 AM,,,,6D,,
3,200929538,13035288,07/28/2013 09:13:15 PM,2752,FUGITIVE FROM JUSTICE(OUT OF STATE),BETHESDA,BEACH DR,CHEVY CHASE,MD,20815.0,...,D,2D1,11.0,07/28/2013 09:13:00 PM,,,,2D,,
4,200930689,13036876,08/06/2013 05:16:17 PM,2812,DRIVING UNDER THE INFLUENCE,BETHESDA,BEACH DR,SILVER SPRING,MD,20815.0,...,D,2D3,178.0,08/06/2013 05:16:00 PM,,,,2D,,


In [3]:
crimes_df.dtypes

Incident ID                 int64
CR Number                   int64
Dispatch Date / Time       object
Class                       int64
Class Description          object
Police District Name       object
Block Address              object
City                       object
State                      object
Zip Code                  float64
Agency                     object
Place                      object
Sector                     object
Beat                       object
PRA                       float64
Start Date / Time          object
End Date / Time            object
Latitude                  float64
Longitude                 float64
Police District Number     object
Location                   object
Address Number            float64
dtype: object

In [30]:
# Null cells per column
crimes_df.isnull().sum()
# set(crimes_df['City'])

{'ASHTON',
 'BARNESVILLE',
 'BEALLSVILLE',
 'BETHESDA',
 'BOYDS',
 'BRINKLOW',
 'BROOKEVILLE',
 'BURTONSVILLE',
 'CABIN JOHN',
 'CHEVY CHASE',
 'CLARKSBURG',
 'DAMASCUS',
 'DERWOOD',
 'DICKERSON',
 'GAITHERSBURG',
 'GERMANTOWN',
 'GLEN ECHO',
 'HYATTSVILLE',
 'KENSINGTON',
 'KISSIMMEE',
 'LAUREL',
 'MONTGOMERY VILLAGE',
 'MOUNT AIRY',
 'OLNEY',
 'POOLESVILLE',
 'POTOMAC',
 'ROCKVILLE',
 'SANDY SPRING',
 'SILVER SPRING',
 'SPENCERVILLE',
 'TAKOMA PARK',
 'WASHINGTON GROVE'}

## Columns information
---

| Column name            | Description | Actual type | Expected type |
| :--------------------- | :---------- | ----------- | ------------- |
| Incident ID            | Unique identifier from database | int64 | - |
| CR Number              | Police report number | int64 | - |
| Dispatch Date / Time   | The actual date and time an officer was dispatched | object | datetime |
| Class                  | Four-digit code identifying the crime type of the incident | int64  | - |
| Class Description      | Common name description of the incident class type | object | - |
| Police District Name   | Name of district (Rockville, Wheaton, etc.) | object | - |
| Block Address          | Address at the 100-block level | object | - |
| City                   | City | object | - |
| State                  | State | object | - |
| Zip Code               | Zip code | float64 | int64 |
| Agency                 | Assigned police department | object | - |
| Place                  | Place description | object | - |
| Sector                 | Police sector name | object | - |
| Beat                   | Police patrol area subset within District | object | - |
| PRA                    | Police patrol area subset within Beat | float64 | int64 |
| Start Date / Time      | Occurred from date/time | object | datetime |
| End Date / Time        | Occurred to date/time | object | datetime |
| Latitude               | Latitude of the location of the occurrence | float64 | - |
| Longitude              | Longitude of the location of the occurrence | float64 | - |
| Police District Number | Major police boundary | object | - |
| Location               | GPS coordinates of the incident, stored as a string in the pattern `(<latitude>, <longitude>)` | object | - |
| Address Number         | The number extracted from the 'Block Address' column. Because some rows have no number in 'Block Address', those rows have a missing 'Address Number'. | float64 | int64/object |
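
The "Expected type" column can be reached with standard pandas conversions. The snippet below is a minimal sketch, not the conversion used in this notebook. It assumes a small hand-built `sample` frame (copied from the first rows shown earlier rather than loaded from the CSV) and assumes the timestamp format in `DATE_FORMAT` holds for every row. The nullable `Int64` dtype is used because 'Zip Code' and 'PRA' contain missing values, which plain `int64` cannot hold.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring a few columns of the first rows of the dataset.
sample = pd.DataFrame({
    "Dispatch Date / Time": ["10/02/2013 07:52:41 PM", "12/31/2013 09:46:58 PM"],
    "Start Date / Time": ["10/02/2013 07:52:00 PM", "12/31/2013 09:46:00 PM"],
    "Zip Code": [20872.0, 20874.0],
    "PRA": [np.nan, 470.0],
})

# Assumed format, inferred from the rows displayed above.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

converted = sample.copy()

# Parse the date/time strings into datetime64 columns.
for col in ["Dispatch Date / Time", "Start Date / Time"]:
    converted[col] = pd.to_datetime(converted[col], format=DATE_FORMAT)

# Convert float columns to nullable integers so missing values survive.
for col in ["Zip Code", "PRA"]:
    converted[col] = converted[col].astype("Int64")

print(converted.dtypes)
```

Passing an explicit `format` avoids per-row format guessing and makes a malformed timestamp raise an error instead of being silently misparsed.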

In [5]:
# Compare the 'Location' column against the 'Latitude' and 'Longitude' columns
def check_coordinates(row):
    if type(row['Location']) != float:
        loc_splited = pd.to_numeric(row['Location'][1:-1].split(','))
        return row['Latitude'] == loc_splited[0] and row['Longitude'] == loc_splited[1]
    else:
        return pd.isnull(row['Latitude']) and pd.isnull(row['Longitude'])

(~crimes_df.apply(check_coordinates, axis=1)).sum()

0

In [6]:
# Compare the 'Block Address' column against the 'Address Number' column
def check_addr_number(row):
    block = row['Block Address']
    numbers_in_block = [pd.to_numeric(s) for s in block.split() if s.isdigit()]
    if len(numbers_in_block) > 0: 
        addr_number = pd.to_numeric(numbers_in_block[0])
        # Same number at 'Address Number' and 'Block Address' columns
        return addr_number == row['Address Number']
    else:
        # When 'Block Address' has no number 'Address Number' is null.
        return pd.isnull(row['Address Number'])    
    
(~crimes_df.apply(check_addr_number, axis=1)).sum()

0

## Missing values
---
Almost all columns with missing values are location-related. The exception is "End Date / Time", where nearly 40% of the values are NaN. The location columns with missing values are:
- Zip Code
- Sector
- Beat
- PRA
- Latitude
- Longitude
- Location
- Address Number

It is important to note that the 'Location' column is a latitude/longitude pair, so the 'Latitude' and 'Longitude' columns hold the same information extracted from 'Location'. This implies that rows with a missing 'Location' also have missing 'Latitude' and 'Longitude'.

Another derived column is 'Address Number', which is the first number extracted from the 'Block Address' column, when it contains one. 'Block Address' has no missing values, but 'Address Number' does, because some rows have an address without a number in 'Block Address'.

This makes the missing values analysis focus on:
- Zip Code
- Sector
- Beat
- PRA

Apart from 'Zip Code', all of these columns relate to police sectorization. These values are probably missing because the address does not follow the 100-block level. There are other sectorization-related columns with no missing values:
- Police District Name
- Agency

These columns also cover larger areas than the columns with missing values.
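
The share of missing values per column, such as the roughly 40% quoted for "End Date / Time", can be computed with `isnull().mean()`. The snippet below is a minimal sketch on a hypothetical four-row `sample` frame: the column names come from the dataset, but the end dates are invented for illustration. On the real data the same expression would be applied to `crimes_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row sample; the end dates are illustrative only.
sample = pd.DataFrame({
    "Block Address": ["25700 MT RADNOR DR", "GUNNERS BRANCH RD",
                      "OLDE TOWNE AVE", "BEACH DR"],
    "Sector": [np.nan, "M", "P", "D"],
    "End Date / Time": [np.nan, np.nan,
                        "07/06/2013 10:00:00 AM", "07/28/2013 10:00:00 PM"],
})

# isnull() gives a boolean frame; the mean of booleans is the fraction of True values.
missing_share = sample.isnull().mean().sort_values(ascending=False)
print(missing_share)
```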

# 3. Analyzing the times of crimes
---

In [7]:
dispatch_time = pd.to_datetime(crimes_df['Dispatch Date / Time'])

In [10]:
# `weekday_name` was removed in pandas 1.0; `day_name()` is its replacement
dispatch_time.dt.day_name().value_counts()

Tuesday      3836
Monday       3734
Wednesday    3611
Friday       3594
Thursday     3404
Saturday     2807
Sunday       2383
Name: Dispatch Date / Time, dtype: int64

In [19]:
dispatch_time.dt.weekday.value_counts().sort_index()

0    3734
1    3836
2    3611
3    3404
4    3594
5    2807
6    2383
Name: Dispatch Date / Time, dtype: int64

In [20]:
dispatch_time.dt.hour.value_counts()

7     1278
9     1222
16    1211
15    1179
8     1174
14    1142
13    1132
18    1119
10    1116
17    1115
11    1105
6     1076
20    1065
12    1063
23    1039
19    1030
22    1022
21    1012
0      904
1      855
2      684
3      376
4      228
5      222
Name: Dispatch Date / Time, dtype: int64

In [21]:
dispatch_time.dt.hour.value_counts().sort_index()

0      904
1      855
2      684
3      376
4      228
5      222
6     1076
7     1278
8     1174
9     1222
10    1116
11    1105
12    1063
13    1132
14    1142
15    1179
16    1211
17    1115
18    1119
19    1030
20    1065
21    1012
22    1022
23    1039
Name: Dispatch Date / Time, dtype: int64

In [14]:
dispatch_time.dt.strftime('%B').value_counts()

October      4075
August       4002
November     3941
September    3927
December     3904
July         3520
Name: Dispatch Date / Time, dtype: int64

In [23]:
dispatch_time.dt.month.value_counts().sort_index()

7     3520
8     4002
9     3927
10    4075
11    3941
12    3904
Name: Dispatch Date / Time, dtype: int64

### What day of the week are the most crimes committed on? (ie Monday, Tuesday, etc)

It is easy to see that Tuesday is the day with the most crimes, but the difference from other weekdays such as Monday is small. Crimes are more frequent on weekdays: there is a gap of almost 600 between Thursday, the weekday with the fewest crimes, and Saturday, the weekend day with the most.
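
The weekday versus weekend comparison can be reproduced from the counts printed above. This is a small sketch that re-enters those counts by hand into a `weekday_counts` series (a name introduced here for illustration) instead of recomputing them from `crimes_df`.

```python
import pandas as pd

# Counts copied from the weekday output above.
weekday_counts = pd.Series(
    [3734, 3836, 3611, 3404, 3594, 2807, 2383],
    index=["Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday"],
)

is_weekend = weekday_counts.index.isin(["Saturday", "Sunday"])

# Quietest weekday versus busiest weekend day.
quietest_weekday = weekday_counts[~is_weekend].min()
busiest_weekend_day = weekday_counts[is_weekend].max()
gap = quietest_weekday - busiest_weekend_day

# Average number of dispatches per day in each group.
weekday_mean = weekday_counts[~is_weekend].mean()
weekend_mean = weekday_counts[is_weekend].mean()

print(gap, weekday_mean, weekend_mean)
```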

### During what time of day are the most crimes committed?

Again, it is easy to see that the most crimes occur at 7 in the morning, but the difference from other hours is small. We can draw more from this data, though: counts decrease steadily from hour 0 to hour 5, and the 3-5 window is very low compared with the rest of the day. Another interesting aspect is that the peaks (7, 9 and 16) fall at hours of heavy traffic, when people travel to and from school and work.
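
The peak hours and the quiet early-morning window can be checked directly against the hourly counts printed above. The sketch below re-enters those counts by hand. The names `hour_counts`, `top_three` and `early_share` are introduced here for illustration.

```python
import pandas as pd

# Hourly counts copied from the sorted output above (index = hour of day).
hour_counts = pd.Series(
    [904, 855, 684, 376, 228, 222, 1076, 1278, 1174, 1222, 1116, 1105,
     1063, 1132, 1142, 1179, 1211, 1115, 1119, 1030, 1065, 1012, 1022, 1039],
    index=range(24),
)

# The three busiest hours of the day.
top_three = hour_counts.nlargest(3).index.tolist()

# Share of all dispatches that fall in the 3-5 window (label slicing is inclusive).
early_share = hour_counts.loc[3:5].sum() / hour_counts.sum()

print(top_three, round(early_share, 3))
```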

### During what month are the most crimes committed?

October is the month with the most crimes in this dataset, but the monthly counts are similar. The quietest month is July. The dataset is not extensive enough to draw conclusions about seasonality, for two reasons: it covers only half of the months of 2013 (July to December), and it covers only one year. Extracting seasonal patterns would require data spread across the whole year.
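
Because months differ in length, one way to compare them more fairly is to divide each count by the number of days in the month. The sketch below uses the monthly counts printed above and assumes that every month from July to December is fully covered by the dataset, which was not verified here.

```python
import calendar

import pandas as pd

# Monthly counts copied from the output above (index = month number).
month_counts = pd.Series({7: 3520, 8: 4002, 9: 3927, 10: 4075, 11: 3941, 12: 3904})

# Number of days in each month of 2013.
days_in_month = pd.Series(
    {month: calendar.monthrange(2013, month)[1] for month in month_counts.index}
)

# Average dispatches per day; assumes each month is fully covered.
per_day = month_counts / days_in_month

print(per_day.round(1))
```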


# 4. Analyzing locations of crimes
---

In [32]:
crimes_df['City'].value_counts()

SILVER SPRING         8626
ROCKVILLE             3453
GAITHERSBURG          3403
GERMANTOWN            2170
BETHESDA              1736
MONTGOMERY VILLAGE     687
POTOMAC                527
CHEVY CHASE            498
OLNEY                  380
KENSINGTON             363
BURTONSVILLE           304
DERWOOD                270
DAMASCUS               230
CLARKSBURG             173
TAKOMA PARK            141
POOLESVILLE            105
BOYDS                   90
BROOKEVILLE             70
SANDY SPRING            43
DICKERSON               26
ASHTON                  19
CABIN JOHN              18
SPENCERVILLE             9
WASHINGTON GROVE         6
BRINKLOW                 5
BARNESVILLE              4
GLEN ECHO                4
MOUNT AIRY               3
BEALLSVILLE              2
LAUREL                   2
HYATTSVILLE              1
KISSIMMEE                1
Name: City, dtype: int64

In [33]:
crimes_df['Police District Name'].value_counts()

SILVER SPRING         5533
WHEATON               4375
MONTGOMERY VILLAGE    3812
ROCKVILLE             3480
BETHESDA              3383
GERMANTOWN            2755
TAKOMA PARK             23
OTHER                    8
Name: Police District Name, dtype: int64

In [35]:
crimes_df['Police District Number'].value_counts()

3D       5533
4D       4375
6D       3812
1D       3480
2D       3383
5D       2755
TPPD       23
OTHER       8
Name: Police District Number, dtype: int64

### In what area did the most crimes occur? What physical locations (like cities) does this area correspond to?

Analyzing by city, Silver Spring has by far the most crimes. Analyzing by police district, the distribution is less concentrated in one place, although Silver Spring is still the district with the most crimes. This can be confusing, but the city of Silver Spring lies on the border of different police districts, which is why the Silver Spring district has a different crime total than the city of Silver Spring.
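
A cross-tabulation of 'City' against 'Police District Name' is one way to see how a single city spreads over several districts. The sketch below uses a hypothetical four-row `sample`: only the column names and the Silver Spring / Bethesda pairing come from the rows displayed earlier, the other rows are illustrative. On the real data the same call would take the columns of `crimes_df`.

```python
import pandas as pd

# Hypothetical rows; the column names come from the dataset.
sample = pd.DataFrame({
    "City": ["SILVER SPRING", "SILVER SPRING", "SILVER SPRING", "ROCKVILLE"],
    "Police District Name": ["SILVER SPRING", "WHEATON", "BETHESDA", "ROCKVILLE"],
})

# Rows are cities, columns are police districts, cells are incident counts.
city_by_district = pd.crosstab(sample["City"], sample["Police District Name"])

# Number of distinct districts that recorded incidents for each city.
districts_per_city = (city_by_district > 0).sum(axis=1)

print(city_by_district)
print(districts_per_city)
```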

### Which area has the highest number of crimes per capita? You may be able to find population data per area online. For example, this annual report has per-district populations towards the bottom.
// Later


# 5. Analyzing types of crime
---

# 6. Combine Analysis
---

# 7. Posing and answering your own questions
---

# Analysis ideas
***
 - Distribution of the interval between 
 - <strike>Crime profile by locality</strike>
 - Analyze whether the null values in the location-related columns are related to each other

# References

[Class code descriptions][response_codes]: This page describes the classes of the incidents listed in the dataset and defines a grouping for the codes. These groupings were used to classify the incidents; some groupings had to be created for specific codes that have no group on the referenced page.

[Rockville police report][uniform_crime_stat_report]: Police report comparing the years 2013 and 2014. This report was used to gain some understanding of the police reporting system implemented in the United States.

[Uniform Crime Reporting handbook][ucr_handbook]: Uniform Crime Reporting is a US government initiative for the statistical analysis of crime. It was conceived in 1929, and in 1930 the FBI was tasked with collecting, publishing and archiving the data.

[Police map][mont_county_pol_map]: Interactive police map of Montgomery County, Maryland.

[Dataset source][dataset]: Link to the origin of the dataset, with definitions of the columns; it also provides an environment for analysis.

[response_codes]: http://wiki.radioreference.com/index.php/Montgomery_County_(MD)_Response_Codes "Montgomery County (MD) Response Codes"
[uniform_crime_stat_report]: http://www.rockvillemd.gov/DocumentCenter/View/10969 "Uniform Crime Statistics Report - Rockville City Police Department"
[ucr_handbook]: https://www2.fbi.gov/ucr/handbook/ucrhandbook04.pdf "Uniform Crime Report Handbook"
[mont_county_pol_map]: http://mcgov-gis.maps.arcgis.com/apps/Viewer/index.html?appid=4317830a05654b8f907e65515970a5ba "Montgomery County Police Map"

[dataset]: https://data.policefoundation.org/Incidents/Montgomery-County-MD-MCPD-Incidents-07-2003-/c2mn-zwn5 "Montgomery County, MD - MCPD Incidents "