# Austin crimes

This is an exploratory data analysis (EDA). I will explore which data is available in the City of Austin's 2015 crime reports.

Along the way I'll check what needs to be cleaned and take a quick glimpse at the dataset overall.

Data from: [Crime Reports 2015 | Open Data | City of Austin Texas][1]

[1]: https://data.austintexas.gov/Public-Safety/Crime-Reports-2015/g3bw-w7hh

In [1]:
# imports

import pandas as pd

In [2]:
loc = '../../data/raw/crime_reports_austin.csv'
df = pd.read_csv(loc)
df['Occurred Date Time'] = pd.to_datetime(df['Occurred Date Time'])
df['Clearance Date'] = pd.to_datetime(df['Clearance Date'])
df.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Incident Number | 20155006575 | 20151891608 | 20155050011 | 20158004086 | 20155054719 |
| Highest Offense Description | FORGERY AND PASSING | AUTO THEFT | THEFT | THEFT | DEBIT CARD ABUSE |
| Highest Offense Code | 1000 | 700 | 600 | 600 | 1108 |
| Family Violence | N | N | N | N | N |
| Occurred Date Time | 2015-02-11 15:29:00 | 2015-07-07 17:50:00 | 2015-11-02 12:00:00 | 2015-11-12 00:30:00 | 2015-12-13 12:00:00 |
| Occurred Date | 02/11/2015 | 07/07/2015 | 11/02/2015 | 11/12/2015 | 12/13/2015 |
| Occurred Time | 1529 | 1750 | 1200 | 30 | 1200 |
| Report Date Time | 02/15/2015 03:29:00 PM | 07/08/2015 10:01:00 PM | 11/12/2015 04:27:00 PM | 11/12/2015 03:18:00 AM | 12/15/2015 02:28:00 PM |
| Report Date | 02/15/2015 | 07/08/2015 | 11/12/2015 | 11/12/2015 | 12/15/2015 |
| Report Time | 1529 | 2201 | 1627 | 318 | 1428 |
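
The `Occurred Time` and `Report Time` columns look like HHMM integers that lost their leading zeros (`30` is presumably 00:30, which matches the `Occurred Date Time` value in that row). A minimal sketch of how they could be normalized, using a toy series rather than the real `df`; the names `toy_time`, `hhmm` and `as_time` are mine, not part of the dataset:

```python
import pandas as pd

# Toy values copied from the "Occurred Time" row of the preview
toy_time = pd.Series([1529, 1750, 1200, 30], name="Occurred Time")

# Left-pad to four digits so that 30 becomes "0030" (assumed to mean 00:30)
hhmm = toy_time.astype(str).str.zfill(4)

# Parse the padded strings as hour/minute and keep only the time part
as_time = pd.to_datetime(hhmm, format="%H%M").dt.time

print(hhmm.tolist())
print(as_time.tolist())
```

Since `Occurred Date Time` already carries the same information, this is probably only needed for `Report Time`.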


In [3]:
df.shape

# woah, almost 117k rows and 27 columns

(116879, 27)

In [4]:
# share of missing values per column, keeping only columns above 1%
pct_nan = df.isnull().mean()
pct_nan = pct_nan[pct_nan > 0.01]
pct_nan.name = "nan"
pct_nan.to_frame().style.format("{:.1%}")

| | nan |
|---|---|
| Council District | 1.1% |
| Clearance Status | 8.5% |
| Clearance Date | 8.5% |
| UCR Category | 67.0% |
| Category Description | 67.0% |
| Latitude | 2.1% |
| Longitude | 2.1% |
| Location | 2.1% |
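
If this check gets reused later, it could be wrapped in a small helper. This is only a sketch: `missing_share` and the toy frame are hypothetical names and made-up values, not part of the dataset.

```python
import pandas as pd


def missing_share(frame: pd.DataFrame, threshold: float = 0.01) -> pd.Series:
    """Fraction of missing values per column, keeping columns above `threshold`."""
    share = frame.isnull().mean()  # mean of a boolean mask = fraction of True
    return share[share > threshold].sort_values(ascending=False)


# Toy frame standing in for df (values are made up for illustration)
toy = pd.DataFrame({
    "Incident Number": [1, 2, 3, 4],
    "UCR Category": ["23H", None, None, None],
    "Latitude": [30.26, 30.27, None, 30.30],
})

toy_missing = missing_share(toy)
print(toy_missing)
```

Columns without missing values fall below the threshold and drop out, which keeps the table short on a 27-column frame.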


In [5]:
df['Incident Number'].nunique()

116879

In [6]:
df['Highest Offense Description'].value_counts().head(10)

THEFT                             10730
FAMILY DISTURBANCE                10409
BURGLARY OF VEHICLE                9439
CRIMINAL MISCHIEF                  5858
THEFT BY SHOPLIFTING               4082
DISTURBANCE - OTHER                3855
ASSAULT W/INJURY-FAM/DATE VIOL     3822
DWI                                3564
BURGLARY OF RESIDENCE              3205
WARRANT ARREST NON TRAFFIC         2823
Name: Highest Offense Description, dtype: int64

In [7]:
df['Highest Offense Code'].value_counts().head(10)

# almost the same as the description counts,
# but some codes (e.g. 900) seem to cover more than one description

600     10730
3400    10409
601      9439
900      6451
1400     5858
607      4082
3401     3855
2100     3564
500      3242
902      3070
Name: Highest Offense Code, dtype: int64

In [8]:
df.loc[df['Highest Offense Code'] == 900, 'Highest Offense Description'].value_counts()

ASSAULT W/INJURY-FAM/DATE VIOL    3822
ASSAULT WITH INJURY               2629
Name: Highest Offense Description, dtype: int64
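
Code 900 maps to two descriptions, so the same question can be asked for every code at once. A minimal sketch on a toy frame (the rows are made up, only the column names and labels come from the data); on the real `df` the same `groupby` should apply:

```python
import pandas as pd

# Toy rows: code 900 has two descriptions, code 600 has one
toy = pd.DataFrame({
    "Highest Offense Code": [900, 900, 900, 600, 600],
    "Highest Offense Description": [
        "ASSAULT W/INJURY-FAM/DATE VIOL",
        "ASSAULT WITH INJURY",
        "ASSAULT WITH INJURY",
        "THEFT",
        "THEFT",
    ],
})

# Number of distinct descriptions seen for each offense code
descriptions_per_code = (
    toy.groupby("Highest Offense Code")["Highest Offense Description"].nunique()
)

# Keep only the codes that are not one-to-one with a description
ambiguous_codes = descriptions_per_code[descriptions_per_code > 1]
print(ambiguous_codes)
```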

In [9]:
df['Family Violence'].value_counts()

N    109568
Y      7311
Name: Family Violence, dtype: int64

In [10]:
df['Occurred Date Time'].min()

# ooooh, some incidents occurred long before 2015

Timestamp('2003-01-01 00:01:00')

In [11]:
df['Occurred Date Time'].max()

Timestamp('2015-12-31 23:59:00')
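
Occurrences going back to 2003 suggest the dataset may be keyed on the report date rather than the occurrence date (my assumption, I have not checked the portal's documentation). A sketch of how to count the old incidents and measure the reporting delay, on toy rows; the first row's report timestamp is invented for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "Occurred Date Time": pd.to_datetime([
        "2003-01-01 00:01:00",
        "2015-02-11 15:29:00",
        "2015-07-07 17:50:00",
    ]),
    # Same string layout as the raw "Report Date Time" column
    "Report Date Time": [
        "01/05/2015 09:00:00 AM",  # made-up value
        "02/15/2015 03:29:00 PM",
        "07/08/2015 10:01:00 PM",
    ],
})

# 12-hour clock with AM/PM marker
reported = pd.to_datetime(toy["Report Date Time"], format="%m/%d/%Y %I:%M:%S %p")

# How many incidents happened before the reporting year?
occurred_before_2015 = (toy["Occurred Date Time"].dt.year < 2015).sum()

# Time elapsed between occurrence and report
report_delay = reported - toy["Occurred Date Time"]

print(occurred_before_2015)
print(report_delay)
```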

In [12]:
df['Location Type'].value_counts().head(10)

RESIDENCE / HOME                      47699
STREETS / HWY / ROAD / ALLEY          24661
PARKING LOTS / GARAGE                 12304
OTHER / UNKNOWN                        5988
COMMERCIAL / OFFICE BUILDING           3890
DEPARTMENT / DISCOUNT STORE            3260
GROCERY / SUPERMARKET                  2207
HOTEL / MOTEL / ETC.                   2109
RESTAURANTS                            1740
DRUG STORE / DR. OFFICE / HOSPITAL     1564
Name: Location Type, dtype: int64

In [13]:
df['Address'].nunique()

41507

In [14]:
df['Zip Code'].nunique()

55

In [15]:
df['Council District'].value_counts()

9.0     19005
3.0     18001
4.0     16522
1.0     14082
2.0     12241
7.0     10868
5.0      9359
6.0      5825
10.0     4907
8.0      4805
Name: Council District, dtype: int64
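
The districts show up as floats (`9.0`), most likely because the column has missing values (about 1.1%) and plain integer columns cannot hold them. One possible cleanup is pandas' nullable integer dtype; a sketch on a toy series with made-up values:

```python
import pandas as pd

# Toy column: floats because of the missing entry
toy_district = pd.Series([9.0, 3.0, None, 10.0], name="Council District")

# "Int64" (capital I) is the nullable integer dtype; missing values become <NA>
district_int = toy_district.astype("Int64")

print(district_int)
```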

In [16]:
df['Clearance Status'].value_counts()

N    74884
C    27962
O     4092
Name: Clearance Status, dtype: int64
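
The counts above leave out the roughly 8.5% of rows with no clearance status. To see shares that include the missing ones, `value_counts` accepts `normalize` and `dropna`; a sketch on a toy series (values made up):

```python
import pandas as pd

toy_status = pd.Series(["N", "N", "C", "O", None, "N"], name="Clearance Status")

# Relative frequencies, counting missing values as their own group
status_share = toy_status.value_counts(normalize=True, dropna=False)

print(status_share)
```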

In [17]:
df['Clearance Date'].min()

Timestamp('2015-01-01 00:00:00')

In [18]:
df['Clearance Date'].max()

# things are happening: cases were still being cleared in 2018

Timestamp('2018-09-25 00:00:00')
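
A natural follow-up is how long cases take to clear. A minimal sketch, assuming the clock starts at `Report Date` (that choice is mine) and using toy rows; the clearance dates here are invented:

```python
import pandas as pd

toy = pd.DataFrame({
    "Report Date": ["02/15/2015", "12/15/2015"],
    # Second case is still open, so its clearance date is missing
    "Clearance Date": pd.to_datetime(["2015-03-01", None]),
})

reported = pd.to_datetime(toy["Report Date"], format="%m/%d/%Y")

# Whole days between report and clearance; NaN where the case is not cleared
days_to_clear = (toy["Clearance Date"] - reported).dt.days

print(days_to_clear)
```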

In [19]:
# Uniform Crime Report
df['UCR Category'].value_counts()

23H    12559
23F    10028
220     4890
23C     4082
240     2004
13A     1860
120      889
23G      698
11A      539
23A      438
23D      294
11C      162
23E       89
09A       24
23B        8
11B        5
Name: UCR Category, dtype: int64

In [20]:
df['Category Description'].value_counts()

Theft                 28196
Burglary               4890
Auto Theft             2004
Aggravated Assault     1860
Robbery                 889
Rape                    706
Murder                   24
Name: Category Description, dtype: int64
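
The totals suggest that each description groups several UCR codes (for instance, the eight `23x` counts add up to the 28196 thefts). A cross-tabulation would confirm the mapping directly; a sketch on toy rows with made-up counts:

```python
import pandas as pd

toy = pd.DataFrame({
    "UCR Category": ["23H", "23F", "220", "23H", None],
    "Category Description": ["Theft", "Theft", "Burglary", "Theft", None],
})

# Rows: description, columns: UCR code; rows with missing values are dropped
table = pd.crosstab(toy["Category Description"], toy["UCR Category"])

# Number of distinct UCR codes that appear under each description
codes_per_description = (table > 0).sum(axis=1)

print(table)
print(codes_per_description)
```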

In [21]:
# how many rows have coordinates?
# (the unique counts in the next cells are far lower:
# various repeated localities, maybe, or quantized reporting)
df['X-coordinate'].notnull().sum()

115978

In [22]:
df['X-coordinate'].nunique()

27989

In [23]:
df['Y-coordinate'].nunique()

27989

In [24]:
df['Latitude'].nunique()

37929

In [25]:
df['Longitude'].nunique()

37914
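
Unique values per axis don't tell how many distinct points there are, or which ones repeat the most. Counting latitude/longitude pairs would; a sketch on toy coordinates (made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "Latitude": [30.26, 30.26, 30.30, None],
    "Longitude": [-97.71, -97.71, -97.75, None],
})

# Incidents per distinct point, most repeated first
pair_counts = (
    toy.dropna(subset=["Latitude", "Longitude"])
    .groupby(["Latitude", "Longitude"])
    .size()
    .sort_values(ascending=False)
)

print(pair_counts)
```

A handful of points with very large counts would hint at default or block-level coordinates rather than exact addresses.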

In [28]:
df['Location'][0]  # location is a lat/lon string

'(30.26167703, -97.71877525)'
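
Since `Location` is just a string, it has to be parsed before it can be used numerically (it probably duplicates `Latitude`/`Longitude`, but that would need checking). A sketch with a regular expression; the column names `lat` and `lon` are mine:

```python
import pandas as pd

# First value is the one shown above; the second stands in for a missing location
toy_location = pd.Series(["(30.26167703, -97.71877525)", None], name="Location")

# Two signed decimal numbers inside parentheses, separated by a comma
pattern = r"\(\s*(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)\s*\)"

# Named groups become column names; non-matching rows stay NaN
coords = toy_location.str.extract(pattern).apply(pd.to_numeric)

print(coords)
```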

In [26]:
df.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Incident Number | 20155006575 | 20151891608 | 20155050011 | 20158004086 | 20155054719 |
| Highest Offense Description | FORGERY AND PASSING | AUTO THEFT | THEFT | THEFT | DEBIT CARD ABUSE |
| Highest Offense Code | 1000 | 700 | 600 | 600 | 1108 |
| Family Violence | N | N | N | N | N |
| Occurred Date Time | 2015-02-11 15:29:00 | 2015-07-07 17:50:00 | 2015-11-02 12:00:00 | 2015-11-12 00:30:00 | 2015-12-13 12:00:00 |
| Occurred Date | 02/11/2015 | 07/07/2015 | 11/02/2015 | 11/12/2015 | 12/13/2015 |
| Occurred Time | 1529 | 1750 | 1200 | 30 | 1200 |
| Report Date Time | 02/15/2015 03:29:00 PM | 07/08/2015 10:01:00 PM | 11/12/2015 04:27:00 PM | 11/12/2015 03:18:00 AM | 12/15/2015 02:28:00 PM |
| Report Date | 02/15/2015 | 07/08/2015 | 11/12/2015 | 11/12/2015 | 12/15/2015 |
| Report Time | 1529 | 2201 | 1627 | 318 | 1428 |


There's no racial information :(

We will focus on the spatial information.