### Stanford Open Policing Project Data Analysis - Nashville, TN
On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.

### 1. Purpose of Analysis
This analysis is exploratory: a holistic look at what the data can tell us, rather than an answer to a single research question. I will add questions below as they come up and update this notebook regularly. The questions explored so far are:
#### 1.1 Does the likelihood of being arrested depend on gender?
#### 1.2 Does the likelihood of being arrested depend on race?
#### 1.3 How does the racial distribution of stops compare with the city's overall racial distribution?
#### 1.4 Does the weather increase or decrease ticket rates?
#### 1.5 What does the spatial distribution of tickets look like?
#### 1.6 How does the spatial distribution of tickets vary by gender?
#### 1.7 Do female drivers commit violations at particular times of day?
#### 1.8 How are violation types distributed around the city?
#### 1.9 Are there places in the city where speeding is the most common violation?

### 2. Data Acquisition
The data is distributed as a compressed (zip) file hosted online; the cells below download it and extract it into the project directory. If you host the data elsewhere, change the path in the `read_csv` line accordingly.
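As a sketch of making the data location configurable, the hypothetical helper `load_stops` below (not part of the original notebook) accepts either a local path or a URL. With `compression="infer"`, pandas picks zip decompression from the `.zip` suffix, so a single-file archive can be read without extracting it first. This assumes the archive holds exactly one CSV, which pandas requires for zip input; `DATA_SOURCE` is an assumed name for the default pointer.

```python
import pandas as pd

# Assumed default pointer: the archive downloaded in this notebook.
DATA_SOURCE = (
    "https://stacks.stanford.edu/file/druid:hp256wp2687/"
    "hp256wp2687_tn_nashville_2019_08_13.csv.zip"
)


def load_stops(source: str = DATA_SOURCE) -> pd.DataFrame:
    """Read the stops CSV from a local path or URL (hypothetical helper).

    compression="infer" selects zip decompression from the ".zip" suffix;
    the zip archive must contain exactly one file.
    """
    return pd.read_csv(source, compression="infer", low_memory=False)
```

Reading from the URL downloads the archive on every call, so for repeated runs a local copy such as `load_stops("data/tn_nashville.csv.zip")` (a hypothetical path) is cheaper.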

In [1]:
import datetime
import pandas as pd
import requests, zipfile, io

In [None]:
url="https://stacks.stanford.edu/file/druid:hp256wp2687/hp256wp2687_tn_nashville_2019_08_13.csv.zip"

In [None]:
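r = requests.get(url, timeout=300)
r.raise_for_status()  # stop early if the download failed
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()  # writes the extracted CSV into the current directory
tn_raw = pd.read_csv(z.namelist()[0], low_memory=False)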

In [2]:
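# Re-read the extracted file, combining 'date' and 'time' into a single 'date_time' column
tn_raw = pd.read_csv('tn_nashville_2019_08_13.csv', parse_dates=[['date', 'time']], low_memory=False)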
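Combining columns by passing nested lists to `parse_dates` is deprecated as of pandas 2.2, and values such as `2010-10-10 nan` in the `head()` output further down suggest that rows with a missing time leave the combined column as plain text. Below is a sketch of an alternative, assuming the raw file has separate `date` (`YYYY-MM-DD`) and `time` (`HH:MM:SS`) text columns, as that output suggests; `combine_date_time` is a hypothetical helper.

```python
import pandas as pd


def combine_date_time(df: pd.DataFrame) -> pd.Series:
    """Combine separate 'date' and 'time' text columns into one datetime Series.

    Unparseable dates or times become NaT. A missing time yields NaT rather
    than a made-up midnight, so hour-of-day counts are not distorted.
    """
    dates = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
    times = pd.to_timedelta(df["time"], errors="coerce")  # "10:00:00" -> 10 hours
    return dates + times  # NaT in either part propagates to the result
```

Usage would be `tn_raw = pd.read_csv('tn_nashville_2019_08_13.csv', low_memory=False)` followed by `tn_raw['date_time'] = combine_date_time(tn_raw)`. This appends `date_time` as the last column, so the positional `iloc` selection further down would need adjusting.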


In [4]:
tn=tn_raw.copy()

### 3. Data Exploration and Cleaning
In this step we select the columns of interest, drop the columns we won't need, and check the remaining columns for missing (NA) values so that rows with missing values can be dropped.

In [5]:
tn.head()

| | date_time | raw_row_number | location | lat | lng | precinct | reporting_area | zone | subject_age | subject_race | ... | raw_traffic_citation_issued | raw_misd_state_citation_issued | raw_suspect_ethnicity | raw_driver_searched | raw_passenger_searched | raw_search_consent | raw_search_arrest | raw_search_warrant | raw_search_inventory | raw_search_plain_view |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-10-10 nan | 232947 | DOMINICAN DR & ROSA L PARKS BLVD, NASHVILLE, T... | 36.187925 | -86.798519 | 6.0 | 4403.0 | 611.0 | 27.0 | black | ... | False | | N | False | False | False | False | False | False | False |
| 1 | 2010-10-10 10:00:00 | 237161 | 1122 LEBANON PIKE, NASHVILLE, TN, 37210 | 36.155521 | -86.735902 | 5.0 | 9035.0 | 513.0 | 18.0 | white | ... | True | | N | False | False | False | False | False | False | False |
| 2 | 2010-10-10 10:00:00 | 232902 | 898 DAVIDSON DR, , TN, 37205 | 36.11742 | -86.895593 | 1.0 | 5005.0 | 121.0 | 52.0 | white | ... | False | | N | False | False | False | False | False | False | False |
| 3 | 2010-10-10 22:00:00 | 233219 | MURFREESBORO PIKE & NASHBORO BLVD, ANTIOCH, TN... | 36.086799 | -86.648581 | 3.0 | 8891.0 | 325.0 | 25.0 | white | ... | False | | N | False | False | False | False | False | False | False |
| 4 | 2010-10-10 01:00:00 | 232780 | BUCHANAN ST, NORTH, TN, 37208 | 36.180038 | -86.809109 | | | | 21.0 | black | ... | False | | N | True | True | False | False | False | False | False |


In [6]:
tn.shape

(3092351, 41)

In [7]:
tn.columns

Index(['date_time', 'raw_row_number', 'location', 'lat', 'lng', 'precinct',
       'reporting_area', 'zone', 'subject_age', 'subject_race', 'subject_sex',
       'officer_id_hash', 'type', 'violation', 'arrest_made',
       'contraband_drugs', 'contraband_weapons', 'frisk_performed',
       'search_conducted', 'search_person', 'search_vehicle', 'search_basis',
       'reason_for_stop', 'vehicle_registration_state', 'notes',
       'raw_traffic_citation_issued', 'raw_misd_state_citation_issued',
       'raw_suspect_ethnicity', 'raw_driver_searched',
       'raw_passenger_searched', 'raw_search_consent', 'raw_search_arrest',
       'raw_search_warrant', 'raw_search_inventory', 'raw_search_plain_view'],
      dtype='object')

#### Removing unneeded columns

In [8]:
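# Keep only the columns used in the analysis (positions refer to the raw 41-column layout)
tn = tn.iloc[:, [0, 1, 2, 4, 5, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]]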
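Positional indices such as `iloc[:, [0, 1, 2, 4, ...]]` silently select the wrong columns if the file layout changes. A sketch of name-based selection follows; the column list is copied from the `head()` output shown after `date_time` is set as the index, and `select_columns` and `COLUMNS_OF_INTEREST` are hypothetical names.

```python
import pandas as pd

# Columns kept for the analysis, listed by name instead of position.
COLUMNS_OF_INTEREST = [
    "date_time", "raw_row_number", "location", "lng", "precinct",
    "subject_race", "subject_sex", "officer_id_hash", "violation",
    "arrest_made", "citation_issued", "warning_issued", "outcome",
    "contraband_found", "contraband_drugs", "contraband_weapons",
    "frisk_performed", "search_conducted", "search_person", "search_vehicle",
]


def select_columns(df: pd.DataFrame, columns=COLUMNS_OF_INTEREST) -> pd.DataFrame:
    """Return a copy with only the requested columns, failing loudly if any are missing."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return df.loc[:, list(columns)].copy()
```

`tn = select_columns(tn)` would replace the `iloc` call; because the list includes `date_time`, it has to run before the index is set.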

In [9]:
tn.columns

Index(['date_time', 'raw_row_number', 'location', 'lng', 'precinct',
       'subject_race', 'subject_sex', 'officer_id_hash', 'violation',
       'contraband_found', 'contraband_drugs', 'contraband_weapons',
       'frisk_performed', 'search_conducted', 'search_person',
       'search_vehicle'],
      dtype='object')

#### Checking for nulls in each column

In [10]:
tn.isnull().sum()

date_time                   0
raw_row_number              0
location                    0
lng                    187106
precinct               390222
subject_race             1850
subject_sex             12822
officer_id_hash            11
violation                8020
arrest_made                28
citation_issued           320
outcome                  1935
contraband_found      2964646
contraband_drugs      2964646
contraband_weapons    2964646
frisk_performed            22
search_conducted           39
search_person              43
search_vehicle             41
dtype: int64
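With about 3.1 million rows, raw null counts are hard to compare. A sketch of two hypothetical helpers: `missing_share` reports the fraction missing per column, and `drop_missing` drops rows only where the columns a given question needs are null. A blanket `dropna()` would be a poor fit here: the three `contraband_*` columns are missing in 2,964,646 of 3,092,351 rows (about 96%), so it would discard almost the whole dataset.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def drop_missing(df: pd.DataFrame, needed: list[str]) -> pd.DataFrame:
    """Drop rows that have a null in any of the `needed` columns only."""
    return df.dropna(subset=needed)
```

For question 1.1, for example, `drop_missing(tn, ["subject_sex", "arrest_made"])` keeps every stop where both sex and arrest outcome are recorded, whatever the contraband columns contain.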

#### Setting the date as the index

In [11]:
tn.set_index('date_time',inplace=True)

In [12]:
tn.head()

| date_time | raw_row_number | location | lng | precinct | subject_race | subject_sex | officer_id_hash | violation | arrest_made | citation_issued | warning_issued | outcome | contraband_found | contraband_drugs | contraband_weapons | frisk_performed | search_conducted | search_person | search_vehicle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-10-10 nan | 232947 | DOMINICAN DR & ROSA L PARKS BLVD, NASHVILLE, T... | -86.798519 | 6.0 | black | male | 80ed1b32eb | investigative stop | False | False | True | warning | | | | False | False | False | False |
| 2010-10-10 10:00:00 | 237161 | 1122 LEBANON PIKE, NASHVILLE, TN, 37210 | -86.735902 | 5.0 | white | male | a983204b21 | moving traffic violation | False | True | False | citation | | | | False | False | False | False |
| 2010-10-10 10:00:00 | 232902 | 898 DAVIDSON DR, , TN, 37205 | -86.895593 | 1.0 | white | male | f5d8fbd78b | vehicle equipment violation | False | False | True | warning | | | | False | False | False | False |
| 2010-10-10 22:00:00 | 233219 | MURFREESBORO PIKE & NASHBORO BLVD, ANTIOCH, TN... | -86.648581 | 3.0 | white | male | 4f1d028e45 | registration | False | False | True | warning | | | | False | False | False | False |
| 2010-10-10 01:00:00 | 232780 | BUCHANAN ST, NORTH, TN, 37208 | -86.809109 | | black | male | 0f0e6b7d67 | vehicle equipment violation | False | False | True | warning | False | False | False | False | True | True | True |
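Because some combined `date_time` values are strings such as `2010-10-10 nan`, the index is probably stored as text rather than as a `DatetimeIndex`. A sketch that converts it with an explicit format, drops entries that fail to parse, and counts stops per hour of day, which is the kind of breakdown question 1.7 asks for. `stops_by_hour` is a hypothetical helper, and the `YYYY-MM-DD HH:MM:SS` format is assumed from the output above.

```python
import pandas as pd


def stops_by_hour(df: pd.DataFrame) -> pd.Series:
    """Count stops per hour of day from 'YYYY-MM-DD HH:MM:SS' index values.

    Entries that don't match the format (e.g. "2010-10-10 nan") become NaT
    and are excluded before the hour is extracted.
    """
    when = pd.to_datetime(df.index, format="%Y-%m-%d %H:%M:%S", errors="coerce")
    hours = when.dropna().hour  # integer hour, 0-23
    return pd.Series(hours).value_counts().sort_index()
```

As a first look at question 1.7, `stops_by_hour(tn[tn.subject_sex == 'female'])` could be compared against `stops_by_hour(tn)`.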


#### 1.1 Does the likelihood of being arrested depend on gender?

In [None]:
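# Count stops by sex; .sum() on a text column would only concatenate the strings
tn.subject_sex.value_counts(dropna=False)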
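Counts by sex alone don't show whether arrests depend on sex; what matters is the arrest rate within each group. A minimal sketch using `pd.crosstab`, assuming `arrest_made` holds True/False values with a few nulls (the null counts above show 28 missing); `arrest_table` is a hypothetical helper.

```python
import pandas as pd


def arrest_table(df: pd.DataFrame, group: str = "subject_sex"):
    """Return (counts, rates): stops by group and arrest outcome, and
    the row-normalised share of each outcome within each group."""
    data = df.dropna(subset=[group, "arrest_made"])
    arrested = data["arrest_made"].astype(bool)
    counts = pd.crosstab(data[group], arrested)
    rates = pd.crosstab(data[group], arrested, normalize="index")
    return counts, rates
```

`counts, rates = arrest_table(tn)` gives the arrest share per sex in `rates[True]`, and the raw `counts` table can be passed to `scipy.stats.chi2_contingency` for a test of independence. With roughly 3 million stops, even tiny differences will be statistically significant, so the size of the gap in `rates` is the more informative number. The same call with `group="subject_race"` covers question 1.2.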