___

<p style='text-align: center;'><img src='https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV' class='img-fluid' alt='CLRSWY'></p>

___

<h1><p style='text-align: center;'>Traffic Police Stops<br>Part - 2</p></h1> - Exploring The Relationship Between Gender And Policing <img src='https://docs.google.com/uc?id=17CPCwi3_VvzcS87TOsh4_U8eExOhL6Ki' class='img-fluid' alt='CLRSWY' width='200' height='100'> 

Does the ``gender`` of a driver have an impact on police behavior during a traffic stop? **In this chapter**, you will explore that question while practicing filtering, grouping, method chaining, Boolean math, string methods, and more!

***

## Examining traffic violations

Before comparing the violations being committed by each gender, you should examine the ``violations`` committed by all drivers to get a baseline understanding of the data.

In this exercise, you'll count the unique values in the ``violation`` column, and then separately express those counts as proportions.

> Before starting this section, **repeat the data-preparation steps from the previous chapter** or **load the CSV file you created at the end of Chapter 1**. Continue from wherever you left off at the end of the previous chapter.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import math  # standard-library math (not a SciPy submodule)

In [2]:
df= pd.read_csv('RI_Part_1') 



In [3]:
df.head()

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_conducted,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,RI-2005-00001,Zone K1,600,M,1985.0,20.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,RI-2005-00002,Zone X4,500,M,1987.0,18.0,W,White,Speeding,...,False,,,False,Citation,False,16-30 Min,False,False,Zone X4
2,2005-01-04 12:55:00,RI-2005-00004,Zone X4,500,M,1986.0,19.0,W,White,Equipment/Inspection Violation,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X4
3,2005-01-06 01:30:00,RI-2005-00005,Zone X4,500,M,1978.0,27.0,B,Black,Equipment/Inspection Violation,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-12 08:05:00,RI-2005-00006,Zone X1,0,M,1973.0,32.0,B,Black,Call for Service,...,False,,,False,Citation,False,30+ Min,True,False,Zone X1


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 480584 entries, 0 to 480583
Data columns (total 21 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   stop_datetime       480584 non-null  object 
 1   id                  480584 non-null  object 
 2   location_raw        480584 non-null  object 
 3   police_department   480584 non-null  object 
 4   driver_gender       480584 non-null  object 
 5   driver_age_raw      480583 non-null  float64
 6   driver_age          478946 non-null  float64
 7   driver_race_raw     480584 non-null  object 
 8   driver_race         480584 non-null  object 
 9   violation_raw       480584 non-null  object 
 10  violation           480584 non-null  object 
 11  search_conducted    480584 non-null  bool   
 12  search_type_raw     17762 non-null   object 
 13  search_type         17762 non-null   object 
 14  contraband_found    480584 non-null  bool   
 15  stop_outcome        480584 non-nul

**INSTRUCTIONS**

*   Count the unique values in the ``violation`` column, to see what violations are being committed by all drivers.
*   Express the violation counts as proportions of the total.

In [18]:
df.violation.value_counts()

Speeding               268736
Moving violation        90228
Equipment               61250
Other                   24216
Registration/plates     19830
Seat belt               16324
Name: violation, dtype: int64

In [5]:
round(df.violation.value_counts(normalize = True), 2)

Speeding               0.56
Moving violation       0.19
Equipment              0.13
Other                  0.05
Registration/plates    0.04
Seat belt              0.03
Name: violation, dtype: float64
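The link between the two calls above can be checked on a small example. The sketch below uses a hypothetical five-row `violation` series (not the real data) and shows that `normalize=True` is the same as dividing the counts by the number of rows, assuming there are no missing values.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `violation` column
violation = pd.Series(
    ["Speeding", "Speeding", "Speeding", "Equipment", "Other"],
    name="violation",
)

counts = violation.value_counts()                     # absolute counts per category
proportions = violation.value_counts(normalize=True)  # shares that sum to 1

# With no missing values, dividing the counts by the row count
# reproduces what normalize=True returns
manual = counts / len(violation)

print(proportions.round(2))
```

Rounding is applied only when printing, so the underlying proportions keep full precision for later calculations.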

***

## Comparing violations by gender

The question we're trying to answer is whether male and female drivers tend to commit different types of traffic violations.

You'll first create a ``DataFrame`` for each gender, and then analyze the ``violations`` in each ``DataFrame`` separately.

**INSTRUCTIONS**

*   Create a ``DataFrame``, female, that only contains rows in which ``driver_gender`` is ``'F'``.
*   Create a ``DataFrame``, male, that only contains rows in which ``driver_gender`` is ``'M'``.
*   Count the ``violations`` committed by female drivers and express them as proportions.
*   Count the violations committed by male drivers and express them as proportions.

In [16]:
df.driver_gender.value_counts()

M    349446
F    131138
Name: driver_gender, dtype: int64

In [7]:
df_F = df[df['driver_gender'] == 'F']

In [8]:
df_M = df[df['driver_gender'] == 'M']

In [9]:
df_F

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_conducted,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
14,2005-02-24 01:20:00,RI-2005-00016,Zone X3,200,F,1983.0,22.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone X3
16,2005-03-14 10:00:00,RI-2005-00019,Zone K3,300,F,1984.0,21.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K3
23,2005-03-29 23:20:00,RI-2005-00026,Zone K3,300,F,1971.0,34.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone K3
32,2005-06-06 13:20:00,RI-2005-00035,Zone X4,500,F,1986.0,19.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X4
33,2005-06-18 16:30:00,RI-2005-00037,Zone X4,500,F,1964.0,41.0,W,White,Other Traffic Violation,...,False,,,False,Arrest Driver,True,30+ Min,False,False,Zone X4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
480570,2015-12-31 21:59:00,RI-2015-47051,Zone K3,300.0,F,1994.0,21.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K3
480572,2015-12-31 22:09:00,RI-2015-47054,Zone K3,300.0,F,1992.0,23.0,H,Hispanic,Equipment/Inspection Violation,...,False,,,False,Warning,False,0-15 Min,False,False,Zone K3
480573,2015-12-31 22:10:00,RI-2015-47055,Zone X4,300.0,F,1989.0,26.0,L,Hispanic,Other Traffic Violation,...,False,,,False,Warning,False,0-15 Min,False,False,Zone X4
480574,2015-12-31 22:10:00,RI-2015-47056,Zone X4,300.0,F,1989.0,26.0,L,Hispanic,Other Traffic Violation,...,False,,,False,Arrest Driver,True,0-15 Min,False,False,Zone X4


In [10]:
df_M

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_conducted,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,RI-2005-00001,Zone K1,600,M,1985.0,20.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,RI-2005-00002,Zone X4,500,M,1987.0,18.0,W,White,Speeding,...,False,,,False,Citation,False,16-30 Min,False,False,Zone X4
2,2005-01-04 12:55:00,RI-2005-00004,Zone X4,500,M,1986.0,19.0,W,White,Equipment/Inspection Violation,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X4
3,2005-01-06 01:30:00,RI-2005-00005,Zone X4,500,M,1978.0,27.0,B,Black,Equipment/Inspection Violation,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X4
4,2005-01-12 08:05:00,RI-2005-00006,Zone X1,0,M,1973.0,32.0,B,Black,Call for Service,...,False,,,False,Citation,False,30+ Min,True,False,Zone X1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
480579,2015-12-31 22:46:00,RI-2015-47061,Zone X1,0.0,M,1959.0,56.0,H,Hispanic,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone X1
480580,2015-12-31 22:47:00,RI-2015-47062,Zone X4,500.0,M,1988.0,27.0,W,White,Registration Violation,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X4
480581,2015-12-31 23:08:00,RI-2015-47063,Zone X3,200.0,M,1980.0,35.0,H,Hispanic,Equipment/Inspection Violation,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X3
480582,2015-12-31 23:44:00,RI-2015-47064,Zone K2,900.0,M,1984.0,31.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K2


In [11]:
round(df_F.violation.value_counts(normalize = True), 2)

Speeding               0.66
Moving violation       0.14
Equipment              0.11
Registration/plates    0.04
Other                  0.03
Seat belt              0.03
Name: violation, dtype: float64

In [12]:
round(df_M.violation.value_counts(normalize = True), 2)

Speeding               0.52
Moving violation       0.21
Equipment              0.14
Other                  0.06
Registration/plates    0.04
Seat belt              0.04
Name: violation, dtype: float64
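Comparing two separate outputs by eye is error-prone. As one possible alternative, `pd.crosstab` can place the per-gender proportions side by side. This is a minimal sketch on hypothetical toy data that reuses the column names `driver_gender` and `violation`; it is not the approach the exercise asks for.

```python
import pandas as pd

# Hypothetical toy data; the notebook's df has the same two column names
toy = pd.DataFrame({
    "driver_gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "violation": ["Speeding", "Speeding", "Speeding", "Equipment",
                  "Speeding", "Speeding", "Equipment", "Other"],
})

# normalize="index" divides each row by its row total,
# so the proportions for each gender sum to 1
by_gender = pd.crosstab(toy["driver_gender"], toy["violation"], normalize="index")

print(by_gender.round(2))
```

Each row of `by_gender` corresponds to one of the per-gender `value_counts(normalize=True)` results, with categories that are absent for a gender shown as 0.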

***

## Comparing speeding outcomes by gender

When a driver is pulled over for speeding, many people believe that gender has an impact on whether the driver will receive a ticket or a warning. Can you find evidence of this in the dataset?

First, you'll create two ``DataFrames`` of drivers who were stopped for ``speeding``: one containing ***females*** and the other containing ***males***.

Then, for each **gender**, you'll use the ``stop_outcome`` column to calculate what percentage of stops resulted in a ``'Citation'`` (meaning a ticket) versus a ``'Warning'``.

**INSTRUCTIONS**

*   Create a ``DataFrame``, ``female_and_speeding``, that only includes female drivers who were stopped for speeding.
*   Create a ``DataFrame``, ``male_and_speeding``, that only includes male drivers who were stopped for speeding.
*   Count the **stop outcomes** for the female drivers and express them as proportions.
*   Count the **stop outcomes** for the male drivers and express them as proportions.

In [13]:
female_and_speeding=df[(df.driver_gender == "F") & (df.violation == "Speeding")]

In [14]:
female_and_speeding

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_conducted,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
14,2005-02-24 01:20:00,RI-2005-00016,Zone X3,200,F,1983.0,22.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone X3
16,2005-03-14 10:00:00,RI-2005-00019,Zone K3,300,F,1984.0,21.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K3
23,2005-03-29 23:20:00,RI-2005-00026,Zone K3,300,F,1971.0,34.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone K3
32,2005-06-06 13:20:00,RI-2005-00035,Zone X4,500,F,1986.0,19.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X4
34,2005-07-06 11:22:00,RI-2005-00038,Zone X1,0,F,1973.0,32.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone X1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
480547,2015-12-31 19:44:00,RI-2015-47028,Zone K2,500.0,F,1969.0,46.0,W,White,Speeding,...,False,,,False,Warning,False,0-15 Min,False,False,Zone K2
480550,2015-12-31 20:05:00,RI-2015-47031,Zone K2,900.0,F,1996.0,19.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K2
480560,2015-12-31 20:42:00,RI-2015-47041,Zone K2,900.0,F,1978.0,37.0,H,Hispanic,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone K2
480569,2015-12-31 21:47:00,RI-2015-47050,Zone K3,300.0,F,1996.0,19.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone K3


In [15]:
male_and_speeding=df[(df.driver_gender == "M") & (df.violation == "Speeding")]

In [16]:
male_and_speeding

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_conducted,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district
0,2005-01-02 01:55:00,RI-2005-00001,Zone K1,600,M,1985.0,20.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K1
1,2005-01-02 20:30:00,RI-2005-00002,Zone X4,500,M,1987.0,18.0,W,White,Speeding,...,False,,,False,Citation,False,16-30 Min,False,False,Zone X4
5,2005-01-18 08:15:00,RI-2005-00007,Zone K3,300,M,1965.0,40.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone K3
7,2005-01-23 23:15:00,RI-2005-00009,Zone K3,300,M,1972.0,33.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone K3
8,2005-01-24 20:32:00,RI-2005-00010,Zone K1,600,M,1987.0,18.0,W,White,Speeding,...,True,Probable Cause,Probable Cause,True,Citation,False,0-15 Min,True,True,Zone K1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
480563,2015-12-31 20:50:00,RI-2015-47044,Zone K2,500.0,M,1994.0,21.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,False,False,Zone K2
480568,2015-12-31 21:42:00,RI-2015-47049,Zone X1,0.0,M,1993.0,22.0,W,White,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone X1
480576,2015-12-31 22:26:00,RI-2015-47058,Zone K3,300.0,M,1964.0,51.0,I,Asian,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone K3
480579,2015-12-31 22:46:00,RI-2015-47061,Zone X1,0.0,M,1959.0,56.0,H,Hispanic,Speeding,...,False,,,False,Citation,False,0-15 Min,True,False,Zone X1


In [19]:
female_and_speeding.stop_outcome.value_counts(normalize= True)

Citation            0.953247
Arrest Driver       0.005290
Arrest Passenger    0.001033
N/D                 0.000905
No Action           0.000522
Name: stop_outcome, dtype: float64

In [20]:
male_and_speeding.stop_outcome.value_counts(normalize= True)

Citation            0.944636
Arrest Driver       0.015767
Arrest Passenger    0.001265
N/D                 0.001183
No Action           0.001063
Name: stop_outcome, dtype: float64
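The two-condition filter used above can be broken into named steps to make the Boolean logic explicit. The sketch below runs on hypothetical toy rows, and the names `is_female`, `is_speeding`, and `female_and_speeding_toy` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook's df
toy = pd.DataFrame({
    "driver_gender": ["F", "F", "M", "M", "F"],
    "violation": ["Speeding", "Equipment", "Speeding", "Speeding", "Speeding"],
    "stop_outcome": ["Citation", "Warning", "Citation", "Warning", "Warning"],
})

# Each comparison yields a Boolean Series; & combines them element-wise.
# When the comparisons are written inline, each one needs its own
# parentheses because & binds more tightly than == in Python.
is_female = toy["driver_gender"] == "F"
is_speeding = toy["violation"] == "Speeding"
female_and_speeding_toy = toy[is_female & is_speeding]

# Share of each outcome among the filtered rows
outcome_share = female_and_speeding_toy["stop_outcome"].value_counts(normalize=True)
print(outcome_share)
```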

***

## Calculating the search rate

During a traffic stop, the police officer sometimes conducts a search of the vehicle. In this exercise, you'll calculate the percentage of all stops that result in a vehicle search, also known as the **search rate**.

**INSTRUCTIONS**

*   Check the data type of ``search_conducted`` to confirm that it's a ``Boolean Series``.
*   Calculate the search rate by counting the ``Series`` values and expressing them as proportions.
*   Calculate the search rate by taking the mean of the ``Series``. (It should match the proportion of ``True`` values calculated above.)

In [23]:
df.search_conducted.dtype

dtype('bool')

In [26]:
df.search_conducted.value_counts()

False    462822
True      17762
Name: search_conducted, dtype: int64

In [27]:
df.search_conducted.value_counts(normalize=True)

False    0.963041
True     0.036959
Name: search_conducted, dtype: float64

In [28]:
df.search_conducted.mean()

0.036959199640437465
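The mean matches the proportion of `True` values because pandas treats `True` as 1 and `False` as 0 in arithmetic. A small sketch with a hypothetical eight-element Boolean series makes the arithmetic visible.

```python
import pandas as pd

# Hypothetical Boolean series: 2 searches out of 8 stops
searched = pd.Series([True, False, False, False, True, False, False, False])

n_true = searched.sum()                    # True counts as 1, False as 0
rate_from_mean = searched.mean()           # sum divided by the number of values
rate_from_counts = n_true / len(searched)  # the same ratio, computed by hand

print(n_true, rate_from_mean, rate_from_counts)
```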

***

## Comparing search rates by gender

You'll compare the rates at which **female** and **male** drivers are searched during a traffic stop. Remember that the vehicle search rate across all stops is about **3.7%**.

First, you'll filter the ``DataFrame`` by gender and calculate the search rate for each group separately. Then, you'll perform the same calculation for both genders at once using a ``.groupby()``.

**INSTRUCTIONS 1/3**

*   Filter the ``DataFrame`` to only include **female** drivers, and then calculate the search rate by taking the mean of ``search_conducted``.

In [29]:
df_F.search_conducted.mean()

0.018751239152648355

In [34]:
df[df['driver_gender']== 'F'].search_conducted.mean()

0.018751239152648355

**INSTRUCTIONS 2/3**

*   Filter the ``DataFrame`` to only include **male** drivers, and then repeat the search rate calculation.

In [35]:
df[df['driver_gender']== 'M'].search_conducted.mean()

0.04379217389811301

**INSTRUCTIONS 3/3**

*   Group by driver gender to calculate the search rate for both groups simultaneously. (It should match the previous results.)

In [44]:
df.groupby('driver_gender').search_conducted.sum()

driver_gender
F     2459
M    15303
Name: search_conducted, dtype: int64

In [43]:
df.groupby('driver_gender').search_conducted.mean()

driver_gender
F    0.018751
M    0.043792
Name: search_conducted, dtype: float64
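The equivalence between filtering each group by hand and using `.groupby()` can be verified on a small example. The sketch below uses hypothetical toy data; the `summary` table is an optional extra that shows the numerator and denominator behind each rate.

```python
import pandas as pd

# Hypothetical toy data: 1 of 4 female stops and 2 of 4 male stops searched
toy = pd.DataFrame({
    "driver_gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "search_conducted": [False, False, False, True, True, False, True, False],
})

# One filter per group ...
rate_f = toy[toy["driver_gender"] == "F"]["search_conducted"].mean()
rate_m = toy[toy["driver_gender"] == "M"]["search_conducted"].mean()

# ... versus a single split-apply-combine pass over all groups
rates = toy.groupby("driver_gender")["search_conducted"].mean()

# sum = number of searches, count = number of stops, mean = search rate
summary = toy.groupby("driver_gender")["search_conducted"].agg(["sum", "count", "mean"])
print(summary)
```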

***

## Adding a second factor to the analysis

Even though the search rate for males is much higher than for females, it's possible that the difference is mostly due to a second factor.

For example, you might hypothesize that the search rate varies by violation type, and the difference in search rate between males and females is because they tend to commit different violations.

You can test this hypothesis by examining the search rate for each combination of gender and violation. If the hypothesis were true, you would find that males and females are searched at about the same rate for each violation. Find out below if that's the case!

**INSTRUCTIONS 1/2**

*   Use a ``.groupby()`` to calculate the search rate for each combination of gender and violation. Are males and females searched at about the same rate for each violation?

In [41]:
df.groupby(['driver_gender','violation']).search_conducted.mean()

driver_gender  violation          
F              Equipment              0.040245
               Moving violation       0.038021
               Other                  0.045898
               Registration/plates    0.054700
               Seat belt              0.017746
               Speeding               0.007738
M              Equipment              0.070916
               Moving violation       0.059156
               Other                  0.046120
               Registration/plates    0.103589
               Seat belt              0.031705
               Speeding               0.026630
Name: search_conducted, dtype: float64

**INSTRUCTIONS 2/2**

*   Reverse the ordering to group by violation before gender. The results may be easier to compare when presented this way.

In [42]:
df.groupby(['violation' , 'driver_gender']).search_conducted.mean()

violation            driver_gender
Equipment            F                0.040245
                     M                0.070916
Moving violation     F                0.038021
                     M                0.059156
Other                F                0.045898
                     M                0.046120
Registration/plates  F                0.054700
                     M                0.103589
Seat belt            F                0.017746
                     M                0.031705
Speeding             F                0.007738
                     M                0.026630
Name: search_conducted, dtype: float64
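A two-level result like the one above can also be pivoted so that each violation occupies one row and each gender one column. The following is a sketch on hypothetical toy data, and the `M_minus_F` column is an illustrative addition that is not part of the exercise.

```python
import pandas as pd

# Hypothetical toy data with two violations and two genders
toy = pd.DataFrame({
    "violation": ["Speeding"] * 4 + ["Equipment"] * 4,
    "driver_gender": ["F", "F", "M", "M", "F", "F", "M", "M"],
    "search_conducted": [False, False, True, False, True, False, True, True],
})

# Search rate for every (violation, gender) combination
rates = toy.groupby(["violation", "driver_gender"])["search_conducted"].mean()

# Move the gender level into columns so the rates sit side by side
wide = rates.unstack("driver_gender")
wide["M_minus_F"] = wide["M"] - wide["F"]

print(wide)
```

If the gap stays positive in every row, the difference between genders cannot be explained by violation type alone.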

***

## Counting protective frisks

During a vehicle search, the police officer may pat down the driver to check if they have a weapon. This is known as a ``'protective frisk.'``

You'll first check to see how many times 'Protective Frisk' was the only search type. Then, you'll use a string method to locate all instances in which the driver was frisked.

**INSTRUCTIONS**

*   Count the ``search_type`` values to see how many times ``'Protective Frisk'`` was the only search type.
*   Create a new column, frisk, that is ``True`` if ``search_type`` contains the string ``'Protective Frisk'`` and ``False`` otherwise.
*   Check the data type of frisk to confirm that it's a ``Boolean Series``.
*   Take the sum of frisk to count the total number of frisks.

In [47]:
df.search_type.value_counts()

Incident to Arrest                                          6998
Probable Cause                                              4989
Reasonable Suspicion                                        1141
Inventory                                                   1101
Protective Frisk                                             879
Incident to Arrest,Inventory                                 649
Incident to Arrest,Probable Cause                            552
Probable Cause,Reasonable Suspicion                          334
Probable Cause,Protective Frisk                              221
Incident to Arrest,Protective Frisk                          158
Incident to Arrest,Inventory,Probable Cause                  151
Inventory,Probable Cause                                     132
Protective Frisk,Reasonable Suspicion                         83
Incident to Arrest,Inventory,Protective Frisk                 77
Incident to Arrest,Probable Cause,Protective Frisk            74
Inventory,Protective Fris

In [75]:
# Note: == matches only rows where 'Protective Frisk' is the sole search type
df['frisk'] = df.search_type == 'Protective Frisk'

In [76]:
df.head(2)

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district,frisk
0,2005-01-02 01:55:00,RI-2005-00001,Zone K1,600,M,1985.0,20.0,W,White,Speeding,...,,,False,Citation,False,0-15 Min,False,False,Zone K1,False
1,2005-01-02 20:30:00,RI-2005-00002,Zone X4,500,M,1987.0,18.0,W,White,Speeding,...,,,False,Citation,False,16-30 Min,False,False,Zone X4,False


In [63]:
df.search_type== 'Protective Frisk'

0         False
1         False
2         False
3         False
4         False
          ...  
480579    False
480580    False
480581    False
480582    False
480583    False
Name: search_type, Length: 480584, dtype: bool

In [62]:
# Broader alternative: also matches combined search types such as 'Probable Cause,Protective Frisk'
df.search_type.str.contains("Protective Frisk", na=False)

0         False
1         False
2         False
3         False
4         False
          ...  
480579    False
480580    False
480581    False
480582    False
480583    False
Name: search_type, Length: 480584, dtype: bool

In [77]:
df['frisk'].value_counts()

False    479705
True        879
Name: frisk, dtype: int64

In [78]:
df['frisk'].value_counts(normalize= True)

False    0.998171
True     0.001829
Name: frisk, dtype: float64

In [79]:
df['frisk'].dtype

dtype('bool')

In [80]:
df['frisk'].sum()

879

In [84]:
df.search_type.isin(["Protective Frisk"]).mean()

0.0018290246866312654

In [83]:
df.search_type.isin(["Protective Frisk"]).sum()

879
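An exact comparison and a substring match answer different questions: the first finds stops where `'Protective Frisk'` was the only search type, the second finds every stop whose search type mentions it. The sketch below contrasts the two on a hypothetical five-row series modeled on the `search_type` values listed earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of search_type values, including a missing one
search_type = pd.Series([
    "Protective Frisk",
    "Probable Cause,Protective Frisk",
    "Incident to Arrest",
    np.nan,  # stop without a search
    "Protective Frisk,Reasonable Suspicion",
], name="search_type")

# Exact match: True only when 'Protective Frisk' is the whole value
only_frisk = search_type == "Protective Frisk"

# Substring match: True whenever the value contains 'Protective Frisk';
# na=False turns missing values into False instead of NaN
any_frisk = search_type.str.contains("Protective Frisk", na=False)

print(only_frisk.sum(), any_frisk.sum())
```

Without `na=False`, the result would contain missing values and would not be a clean Boolean series, which matters for later `.sum()` and `.mean()` calls.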

***

## Comparing frisk rates by gender

You'll compare the rates at which female and male drivers are frisked during a search. Are males frisked more often than females, perhaps because police officers consider them to be higher risk?

Before doing any calculations, it's important to filter the ``DataFrame`` to only include the relevant subset of data, namely stops in which a search was conducted.

**INSTRUCTIONS**

*   Create a ``DataFrame``, searched, that only contains rows in which ``search_conducted`` is ``True``.
*   Take the mean of the frisk column to find out what percentage of searches included a frisk.
*   Calculate the frisk rate for each gender using a ``.groupby()``.

In [65]:
df.search_conducted.value_counts()

False    462822
True      17762
Name: search_conducted, dtype: int64

In [89]:
search_cond = df[df.search_conducted]

In [90]:
search_cond.head(2)

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district,frisk
8,2005-01-24 20:32:00,RI-2005-00010,Zone K1,600,M,1987.0,18.0,W,White,Speeding,...,Probable Cause,Probable Cause,True,Citation,False,0-15 Min,True,True,Zone K1,False
9,2005-02-09 03:05:00,RI-2005-00011,Zone X4,500,M,1976.0,29.0,W,White,Registration Violation,...,"Probable Cause,Terry Frisk","Probable Cause,Protective Frisk",False,Citation,False,0-15 Min,False,False,Zone X4,False


In [91]:
search_cond.search_type.unique()

array(['Probable Cause', 'Probable Cause,Protective Frisk',
       'Incident to Arrest,Protective Frisk', 'Incident to Arrest',
       'Protective Frisk,Reasonable Suspicion',
       'Probable Cause,Reasonable Suspicion',
       'Incident to Arrest,Probable Cause',
       'Incident to Arrest,Inventory,Probable Cause', 'Inventory',
       'Incident to Arrest,Inventory', 'Protective Frisk',
       'Inventory,Probable Cause', 'Reasonable Suspicion',
       'Probable Cause,Protective Frisk,Reasonable Suspicion',
       'Inventory,Protective Frisk',
       'Incident to Arrest,Inventory,Protective Frisk',
       'Incident to Arrest,Reasonable Suspicion',
       'Incident to Arrest,Probable Cause,Protective Frisk',
       'Inventory,Probable Cause,Protective Frisk',
       'Inventory,Reasonable Suspicion',
       'Incident to Arrest,Inventory,Reasonable Suspicion',
       'Incident to Arrest,Probable Cause,Reasonable Suspicion',
       'Incident to Arrest,Protective Frisk,Reasonable Suspicion

In [92]:
search_cond['frisk'].mean()

0.04948767030739781

In [93]:
search_cond.search_type.isin(["Protective Frisk"]).mean()

0.04948767030739781

In [98]:
search_cond.groupby('driver_gender').frisk.value_counts(normalize= True)

driver_gender  frisk
F              False    0.957706
               True     0.042294
M              False    0.949356
               True     0.050644
Name: frisk, dtype: float64

In [99]:
search_cond.groupby('driver_gender').frisk.mean()

driver_gender
F    0.042294
M    0.050644
Name: frisk, dtype: float64
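Filtering to searched stops first matters because it changes the denominator of the rate. The sketch below uses hypothetical toy data with hand-set `frisk` values to show how the same frisks give different rates depending on whether all stops or only searches are counted.

```python
import pandas as pd

# Hypothetical toy data: each gender has 4 stops and 1 frisk,
# but only 2 of the female stops involved a search
toy = pd.DataFrame({
    "driver_gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "search_conducted": [True, True, False, False, True, True, True, True],
    "frisk": [True, False, False, False, True, False, False, False],
})

# Frisks per stop: the denominator is every stop
rate_all = toy.groupby("driver_gender")["frisk"].mean()

# Frisks per search: the denominator is only the stops with a search
searched_toy = toy[toy["search_conducted"]]
rate_searched = searched_toy.groupby("driver_gender")["frisk"].mean()

print(rate_all)
print(rate_searched)
```

In this toy example the per-stop rates are identical for both genders, while the per-search rates differ, so the choice of subset decides which question is being answered.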

In [104]:
df.shape

(480584, 22)

___

**- Save the edits made so far to a new CSV file named RI_Part_2 for use in the next step.**

**- Load the new CSV file.**

**- Check the first five rows.**

**- Check the shape of the DataFrame.**

In [108]:
df.to_csv('RI_Part_2', index= False)

In [109]:
df1= pd.read_csv('RI_Part_2') 



In [110]:
df1.head()

Unnamed: 0,stop_datetime,id,location_raw,police_department,driver_gender,driver_age_raw,driver_age,driver_race_raw,driver_race,violation_raw,...,search_type_raw,search_type,contraband_found,stop_outcome,is_arrested,stop_duration,out_of_state,drugs_related_stop,district,frisk
0,2005-01-02 01:55:00,RI-2005-00001,Zone K1,600,M,1985.0,20.0,W,White,Speeding,...,,,False,Citation,False,0-15 Min,False,False,Zone K1,False
1,2005-01-02 20:30:00,RI-2005-00002,Zone X4,500,M,1987.0,18.0,W,White,Speeding,...,,,False,Citation,False,16-30 Min,False,False,Zone X4,False
2,2005-01-04 12:55:00,RI-2005-00004,Zone X4,500,M,1986.0,19.0,W,White,Equipment/Inspection Violation,...,,,False,Citation,False,0-15 Min,False,False,Zone X4,False
3,2005-01-06 01:30:00,RI-2005-00005,Zone X4,500,M,1978.0,27.0,B,Black,Equipment/Inspection Violation,...,,,False,Citation,False,0-15 Min,False,False,Zone X4,False
4,2005-01-12 08:05:00,RI-2005-00006,Zone X1,0,M,1973.0,32.0,B,Black,Call for Service,...,,,False,Citation,False,30+ Min,True,False,Zone X1,False


In [111]:
df1.shape

(480584, 22)
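The save-and-reload step can be sanity-checked on a small frame before relying on it for the full dataset. The sketch below writes a hypothetical two-row DataFrame to a temporary directory; the file name `toy_part_2.csv` is illustrative only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy frame with one text column and one Boolean column
toy = pd.DataFrame({"driver_gender": ["F", "M"], "frisk": [False, True]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_part_2.csv")  # hypothetical file name
    # index=False keeps the row index out of the file, so the reload
    # does not gain an extra unnamed column
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.shape, list(reloaded.columns))
```

Comparing the shape and column names before and after the round trip is a quick way to confirm that nothing was added or lost.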