<h1>02 Pandas</h1>
$\newcommand{\Set}[1]{\{#1\}}$ 
$\newcommand{\Tuple}[1]{\langle#1\rangle}$ 
$\newcommand{\v}[1]{\pmb{#1}}$ 
$\newcommand{\cv}[1]{\begin{bmatrix}#1\end{bmatrix}}$ 
$\newcommand{\rv}[1]{[#1]}$ 
$\DeclareMathOperator{\argmax}{arg\,max}$ 
$\DeclareMathOperator{\argmin}{arg\,min}$ 
$\DeclareMathOperator{\dist}{dist}$
$\DeclareMathOperator{\abs}{abs}$

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

<h1>Possible solution</h1>

In [3]:
df = pd.read_csv('../datasets/dataset_stop_and_searchB.csv')

In [4]:
df.shape

(169427, 6)

In [6]:
(df['Gender'] == 'Gender').sum()

9

In [8]:
df = df[df['Gender'] != 'Gender'].copy()

In [10]:
df.shape

(169418, 6)
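
<p>
    The filter above drops the 9 rows whose Gender value is the literal string 'Gender'. My assumption is that
    these are header lines repeated inside the CSV (for example, because several extracts were concatenated);
    the notebook does not confirm this. The sketch below shows the same idea on a made-up two-column file.
    The column values and the names <code>toy</code>, <code>is_header</code> and <code>clean</code> are hypothetical.
</p>

```python
import io
import pandas as pd

# Hypothetical miniature file: two extracts concatenated, so the header line appears twice.
raw = io.StringIO(
    "Gender,Outcome\n"
    "Male,Arrest\n"
    "Female,A no further action disposal\n"
    "Gender,Outcome\n"
    "Male,A no further action disposal\n"
)
toy = pd.read_csv(raw)

# A row is a stray header if its 'Gender' cell holds the column name itself.
is_header = toy['Gender'] == 'Gender'
n_removed = int(is_header.sum())

# Keep everything else and renumber the index so it has no gaps.
clean = toy[~is_header].reset_index(drop=True)
print(n_removed, clean.shape)
```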

In [12]:
df.to_csv("tmp.csv", index=False)

In [14]:
df.duplicated().sum()

163879

In [15]:
169427/9

18825.222222222223

<p>
    First I want to look at the values in the Suspect-ethnicity column.
</p>

In [5]:
df['Suspect-ethnicity'].unique()

array(['Other ethnic group - Not stated',
       'White - Any other White background',
       'Asian/Asian British - Indian',
       'Other ethnic group - Any other ethnic group',
       'Black/African/Caribbean/Black British - Any other Black/African/Caribbean background',
       'Black/African/Caribbean/Black British - African',
       'Black/African/Caribbean/Black British - Caribbean',
       'Mixed/Multiple ethnic groups - White and Black African',
       'White - English/Welsh/Scottish/Northern Irish/British',
       'Asian/Asian British - Any other Asian background',
       'Mixed/Multiple ethnic groups - Any other Mixed/Multiple ethnic background',
       'Asian/Asian British - Bangladeshi',
       'Asian/Asian British - Pakistani',
       'Mixed/Multiple ethnic groups - White and Black Caribbean', nan,
       'White - Irish', 'Mixed/Multiple ethnic groups - White and Asian',
       'Asian/Asian British - Chinese', 'Self-defined ethnicity'],
      dtype=object)

<p>
    I can then define a mask for whiteness based on the above.
</p>

In [6]:
race_white = (df['Suspect-ethnicity'] == 'White - English/Welsh/Scottish/Northern Irish/British') | \
    (df['Suspect-ethnicity'] == 'White - Any other White background') | \
    (df['Suspect-ethnicity'] == 'White - Irish')

In [7]:
race_white

0         False
1         False
2         False
3         False
4         False
          ...  
169422    False
169423    False
169424    False
169425    False
169426    False
Name: Suspect-ethnicity, Length: 169427, dtype: bool

<p>
    So it is now easy to count how many stop-and-searches were of white people.
</p>

In [8]:
race_white.sum()

52039

<p>
    I can divide this by the number of rows to find out what % of stop-and-searches are of white people.
    But, in fact, the total includes some people whose ethnicity is not known.
    So, let's count the number of rows, excluding those.
</p>

In [9]:
race_known = (df['Suspect-ethnicity'] != 'Other ethnic group - Not stated') & \
   (~ df['Suspect-ethnicity'].isnull())

In [10]:
race_known.sum()

138300

<p>
    Now we can do the percentage.
</p>

In [11]:
race_white.sum() * 100 / race_known.sum()

37.62762111352133

<p>
    According to Wikipedia, the UK 2011 census had 81.9% of the population as white. According to https://www.indexmundi.com/united_kingdom/demographics_profile.html, it is 87.2%. According to https://en.wikipedia.org/wiki/Ethnic_groups_in_London, the 2011 census had 59.8% white in London.
</p>
<p>
    So, if stop-and-searches were proportional to population, somewhere between 60% and 87% of them would be of white people. But only 38% are.
    Therefore, the Metropolitan Police are racist.
</p>
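
<p>
    To make the comparison concrete, the observed percentage can be divided by each population figure quoted above.
    A ratio of 1 would mean white people are stopped in proportion to their share of the population. This is a rough
    sketch: <code>observed</code> is the result above rounded to two decimals, and the dictionary keys are just my labels
    for the three sources.
</p>

```python
# % of known-ethnicity stop-and-searches recorded as white (rounded from the result above).
observed = 37.63

# Population percentages quoted in the text; the labels are informal.
baselines = {
    'UK, 2011 census (Wikipedia)': 81.9,
    'UK (indexmundi)': 87.2,
    'London, 2011 census (Wikipedia)': 59.8,
}

# Ratio of observed share to population share for each baseline.
ratios = {name: observed / pct for name, pct in baselines.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```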

<p>
    What caveats would you place on your answer?
</p>

<p>
    We have made certain decisions about how to classify the mixed categories: this affects the results. If we had treated them
    as white, the gap might look less stark.
</p>
<p>
    Our data about London is old (2011); our newer data is not about London.
</p>
<p>
    We could have broken it down by region.
</p>
<p>
    We should exclude those who committed a crime. These were legitimate interventions...
</p>
<ul>
    <li>Was the data sample (one year) representative? (Maybe there was something special about 2018/19)
    </li>
    <li>Are the %s we compared against correct? If the true proportion of white people is lower, then 
        the evidence of racism is weaker; if it is higher, the disparity is even larger than it appears. 
        (Two of our figures are from 2011. The only recent figure is for the UK and not London.)
    </li>
    </li>
    <li>What was the true situation with the people we ignored? We ignored 169427 - 138300 = 31,127! How would they change
        things?
    </li>
    <li>Were we right to treat just three categories as white? If we had treated more as white, then white
        stop-and-searches might rise.
    </li>
</ul>
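
<p>
    One way to get a feel for the ignored rows is to bound the answer: assume first that none of them were white,
    then that all of them were. The sketch below uses only the counts already computed above; the variable names are mine.
</p>

```python
total_rows = 169427   # df.shape[0] when the masks were computed
known_rows = 138300   # race_known.sum()
white_rows = 52039    # race_white.sum()

# Rows whose ethnicity was missing or 'Not stated'.
ignored_rows = total_rows - known_rows

# Lower bound: none of the ignored rows were white.
lower = white_rows * 100 / total_rows
# Upper bound: every ignored row was white.
upper = (white_rows + ignored_rows) * 100 / total_rows

print(ignored_rows, round(lower, 1), round(upper, 1))
```

<p>
    Even in the extreme case where every ignored row was a white person, the percentage stays below the 59.8%
    quoted for London, so the ignored rows alone would not close the gap.
</p>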

In [None]:
df['Outcome'].unique()

In [None]:
unnec = df[df['Outcome'] == 'A no further action disposal'].copy()
race_white = (unnec['Suspect-ethnicity'] == 'White - English/Welsh/Scottish/Northern Irish/British') | \
    (unnec['Suspect-ethnicity'] == 'White - Any other White background') | \
    (unnec['Suspect-ethnicity'] == 'White - Irish')
race_known = (unnec['Suspect-ethnicity'] != 'Other ethnic group - Not stated') & \
   (~ unnec['Suspect-ethnicity'].isnull())

In [None]:
race_white.sum() * 100 / race_known.sum()

<p>
    No difference!
</p>