### The Stanford Data Project Analysis - Nashville, TN
On a typical day in the United States, police officers make more than 50,000 traffic stops. Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country. Our goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public.

### 1. Purpose of Analysis
This analysis takes a holistic, exploratory approach: rather than starting from a single question, it looks at what we can get out of the data. I will add questions below as they come to mind and will update this notebook frequently. The questions we are trying to answer so far are:
#### 1.1 Does being arrested depend on gender?
#### 1.2 Does being arrested depend on race?
#### 1.3 How does the race distribution in the data compare with the actual race distribution of the city?
#### 1.4 Is the weather a factor in increasing or decreasing ticket rates?
#### 1.5 What does the spatial distribution of the tickets look like?
#### 1.6 What does the spatial distribution of the tickets by gender look like?
#### 1.7 Do women commit violations at a specific time of day?
#### 1.8 How are the violation types distributed around the city?
#### 1.9 Are there any places in the city where speeding is the most common violation?

### 2. Data acquisition
The project data is a compressed file hosted online. It is downloaded and extracted into the project directory. You can host the data anywhere else and change the path in the read_csv line.

In [5]:
import pandas as pd
import requests, zipfile, io

In [6]:
url="https://stacks.stanford.edu/file/druid:hp256wp2687/hp256wp2687_tn_nashville_2019_08_13.csv.zip"

In [3]:
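r = requests.get(url)
r.raise_for_status()  # stop early if the download failed
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()  # extract into the current working directory
tn_raw = pd.read_csv(z.namelist()[0], low_memory=False)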
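
As an alternative to extracting the archive to disk, the CSV can be read straight from the in-memory zip. The following is a minimal sketch, not the notebook's actual method: `read_first_csv_from_zip` is a hypothetical helper, and the tiny archive built here uses made-up rows so the example runs without a network connection.

```python
import io
import zipfile

import pandas as pd


def read_first_csv_from_zip(zip_bytes: bytes) -> pd.DataFrame:
    """Load the first CSV member of an in-memory zip archive (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("archive contains no CSV file")
        with archive.open(csv_names[0]) as handle:
            return pd.read_csv(handle, low_memory=False)


# Tiny in-memory archive standing in for the downloaded file (made-up rows).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "sample.csv",
        "date,subject_sex\n2010-10-10,male\n2010-10-11,female\n",
    )

sample = read_first_csv_from_zip(buffer.getvalue())
print(sample.shape)
```

With the download cell above, the same helper could be called as `read_first_csv_from_zip(r.content)`, which avoids leaving an extracted copy in the project directory.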

In [12]:
tn_raw = pd.read_csv('tn_nashville_2019_08_13.csv', low_memory=False)


In [13]:
tn=tn_raw.copy()

### 3. Data Exploration and Cleaning
In this step we select the columns of interest and drop all the NA values in those columns. We also drop the columns that we won't need or use.
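
Dropping NA values only in the columns of interest can be sketched with `dropna(subset=...)`. This is an illustrative example on a made-up four-row frame. The choice of `subject_sex` and `violation` as the columns of interest is an assumption, not the notebook's final selection.

```python
import pandas as pd

# Made-up rows that mimic a few of the dataset's columns.
stops = pd.DataFrame(
    {
        "subject_sex": ["male", None, "female", "female"],
        "violation": ["speeding", "registration", None, "speeding"],
        "lat": [36.18, 36.15, 36.11, None],
    }
)

# Assumed columns of interest; rows missing either value are dropped.
columns_of_interest = ["subject_sex", "violation"]
cleaned = stops.dropna(subset=columns_of_interest)

print(cleaned)
```

Rows that are missing a value only in a column outside `columns_of_interest` (here `lat`) are kept, which avoids discarding more data than a given question requires.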

In [14]:
tn.head()

Unnamed: 0,raw_row_number,date,time,location,lat,lng,precinct,reporting_area,zone,subject_age,...,raw_traffic_citation_issued,raw_misd_state_citation_issued,raw_suspect_ethnicity,raw_driver_searched,raw_passenger_searched,raw_search_consent,raw_search_arrest,raw_search_warrant,raw_search_inventory,raw_search_plain_view
0,232947,2010-10-10,,"DOMINICAN DR & ROSA L PARKS BLVD, NASHVILLE, T...",36.187925,-86.798519,6.0,4403.0,611.0,27.0,...,False,,N,False,False,False,False,False,False,False
1,237161,2010-10-10,10:00:00,"1122 LEBANON PIKE, NASHVILLE, TN, 37210",36.155521,-86.735902,5.0,9035.0,513.0,18.0,...,True,,N,False,False,False,False,False,False,False
2,232902,2010-10-10,10:00:00,"898 DAVIDSON DR, , TN, 37205",36.11742,-86.895593,1.0,5005.0,121.0,52.0,...,False,,N,False,False,False,False,False,False,False
3,233219,2010-10-10,22:00:00,"MURFREESBORO PIKE & NASHBORO BLVD, ANTIOCH, TN...",36.086799,-86.648581,3.0,8891.0,325.0,25.0,...,False,,N,False,False,False,False,False,False,False
4,232780,2010-10-10,01:00:00,"BUCHANAN ST, NORTH, TN, 37208",36.180038,-86.809109,,,,21.0,...,False,,N,True,True,False,False,False,False,False


In [15]:
tn.shape

(3092351, 42)

In [18]:
pd.DataFrame(tn.columns)

Unnamed: 0,0
0,raw_row_number
1,date
2,time
3,location
4,lat
5,lng
6,precinct
7,reporting_area
8,zone
9,subject_age


#### Removing the unneeded columns

In [19]:
tn = tn.iloc[:, [1, 2, 4, 5, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]]

In [21]:
tn.columns

Index(['date', 'time', 'lat', 'lng', 'subject_age', 'subject_race',
       'subject_sex', 'type', 'violation', 'arrest_made', 'citation_issued',
       'contraband_weapons', 'frisk_performed', 'search_conducted',
       'search_person'],
      dtype='object')
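
Selecting by position is brittle: re-running the `iloc` cell on an already reduced frame, or loading a file with a different column order, silently picks different columns. A name-based selection is one possible alternative. In the sketch below, `select_columns` is a hypothetical helper, and the name list is taken from the `dtypes` output shown in this notebook, on the assumption that it reflects the intended selection.

```python
import pandas as pd

# Column names as they appear in the dtypes output of this notebook.
KEEP_COLUMNS = [
    "date", "time", "lat", "lng", "subject_age", "subject_race",
    "subject_sex", "type", "violation", "arrest_made", "citation_issued",
    "outcome", "contraband_found", "contraband_drugs", "contraband_weapons",
    "frisk_performed", "search_conducted", "search_person",
]


def select_columns(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy with only `columns`, failing loudly if any are missing."""
    missing = [name for name in columns if name not in frame.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return frame.loc[:, columns].copy()


# One-row stand-in for the raw data, with two extra columns to drop.
demo = pd.DataFrame(
    {name: [None] for name in ["raw_row_number", "location", *KEEP_COLUMNS]}
)
selected = select_columns(demo, KEEP_COLUMNS)
print(selected.columns.tolist())
```

Because the helper raises on missing names, running it twice on the same frame is harmless, whereas a repeated positional selection would shift the columns.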

#### Converting the date and time columns to the correct data type

In [25]:
tn['date'] = pd.to_datetime(tn['date'])
# Note: parsing time-only strings attaches a placeholder date to every value
tn['time'] = pd.to_datetime(tn['time'])
tn.dtypes

date                  datetime64[ns]
time                  datetime64[ns]
lat                          float64
lng                          float64
subject_age                  float64
subject_race                  object
subject_sex                   object
type                          object
violation                     object
arrest_made                   object
citation_issued               object
outcome                       object
contraband_found              object
contraband_drugs              object
contraband_weapons            object
frisk_performed               object
search_conducted              object
search_person                 object
dtype: object
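
Because `time` holds only a time of day, converting it with `pd.to_datetime` produces full timestamps with a placeholder date. One possible alternative, sketched here on made-up rows, is to parse the time as a timedelta and add it to the date. The `stop_datetime` and `hour` column names are assumptions introduced for illustration.

```python
import pandas as pd

# Made-up rows in the same string format as the raw date and time columns.
stops = pd.DataFrame(
    {
        "date": ["2010-10-10", "2010-10-10", "2010-10-11"],
        "time": ["10:00:00", None, "22:30:00"],
    }
)

stops["date"] = pd.to_datetime(stops["date"], format="%Y-%m-%d")

# Missing times become NaT, so the combined timestamp is NaT as well.
stops["stop_datetime"] = stops["date"] + pd.to_timedelta(stops["time"])

# Hour of day, useful for time-of-day questions; NaN where time is missing.
stops["hour"] = stops["stop_datetime"].dt.hour

print(stops)
```

Keeping a single combined timestamp makes later time-of-day grouping independent of whatever placeholder date the parser would otherwise attach.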

#### Checking for nulls in each column

In [27]:
tn.isnull().sum()

date                        0
time                     5467
lat                    187106
lng                    187106
subject_age               839
subject_race             1850
subject_sex             12822
type                        0
violation                8020
arrest_made                28
citation_issued           320
outcome                  1935
contraband_found      2964646
contraband_drugs      2964646
contraband_weapons    2964646
frisk_performed            22
search_conducted           39
search_person              43
dtype: int64
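
Raw counts are easier to judge as a share of all rows. In the output above, for instance, the three contraband columns are null in 2,964,646 of 3,092,351 rows, which is roughly 96%. A small sketch of such a summary follows, where `null_summary` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd


def null_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Null count and null share per column, most incomplete columns first."""
    summary = pd.DataFrame(
        {
            "null_count": frame.isnull().sum(),
            "null_share": frame.isnull().mean(),
        }
    )
    return summary.sort_values("null_share", ascending=False)


# Made-up frame with different amounts of missing data per column.
demo = pd.DataFrame(
    {
        "lat": [36.1, None, 36.2, None],
        "contraband_found": [None, None, None, True],
        "type": ["vehicular"] * 4,
    }
)
report = null_summary(demo)
print(report)
```

Applied to the notebook's frame, the call would be `null_summary(tn)`.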

#### Setting date as the index

In [28]:
tn.set_index('date',inplace=True)
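
With a datetime index, rows can be grouped by calendar attributes of the index. The sketch below uses made-up rows and the non-inplace form of `set_index`, which returns a new frame and leaves the original untouched. Counting stops per year is an assumed example of what the index enables, not a step taken in this notebook.

```python
import pandas as pd

# Made-up rows with already parsed dates.
demo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2010-10-10", "2010-12-01", "2011-01-15"]),
        "violation": ["speeding", "registration", "speeding"],
    }
)

# Non-inplace variant: `demo` keeps its date column, `indexed` gets the index.
indexed = demo.set_index("date")

# Count stops per calendar year using the DatetimeIndex.
stops_per_year = indexed.groupby(indexed.index.year).size()
print(stops_per_year)
```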