

# Is COMPAS fair? (48pt)

In [1]:
# Import necessary packages
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

    The first task is to analyze the fairness of the COMPAS algorithm. As the algorithm is proprietary, you cannot use it to make predictions. But you do not need to predict anything anyway: the COMPAS predictions are already done and included as the decile_score variable!
    Your tasks are the following:

    1. (1pt) Load the COMPAS data, and perform the basic checks.

In [2]:
# Load raw COMPAS data and perform basic checks
raw_df = pd.read_csv('/home/jovyan/PS/data/compas-score-data.csv', sep='\t')
raw_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6172 entries, 0 to 6171
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   age              6172 non-null   int64 
 1   c_charge_degree  6172 non-null   object
 2   race             6172 non-null   object
 3   age_cat          6172 non-null   object
 4   sex              6172 non-null   object
 5   priors_count     6172 non-null   int64 
 6   decile_score     6172 non-null   int64 
 7   two_year_recid   6172 non-null   int64 
dtypes: int64(4), object(4)
memory usage: 385.9+ KB


    2. (1pt) Filter the data to keep only Caucasians and African-Americans. There are just too few offenders of other races.

    COMPAS categorizes offenders into 10 different categories, from 1 (least likely to recidivate) to 10 (most likely). But for simplicity, we scale this down to only two categories (low risk/high risk).

In [3]:
# Filter the data to keep only Caucasian and African-American
compas_df = raw_df[(raw_df.race == "African-American") | (raw_df.race == "Caucasian")]
compas_df.head(5)

Unnamed: 0,age,c_charge_degree,race,age_cat,sex,priors_count,decile_score,two_year_recid
1,34,F,African-American,25 - 45,Male,0,3,1
2,24,F,African-American,Less than 25,Male,4,4,1
4,41,F,Caucasian,25 - 45,Male,14,6,1
6,39,M,Caucasian,25 - 45,Female,0,1,0
7,27,F,Caucasian,25 - 45,Male,0,4,0


    3. (2pt) Create a new dummy variable based off of COMPAS risk score (decile_score), which indicates if an individual was classified as low risk (score 1-4) or high risk (score 5-10).
    Hint: you can proceed in different ways, but for technical reasons related to the tasks below, the best way to do it is to create a variable “high score” that takes the values 1 (decile score 5 and above) and 0 (decile score 1-4).

In [4]:
# Create a copy to avoid warning
compas_df = compas_df.copy()
# Make a new variable with category of risk level with inequalities
# Low risk = 1-4 (labeled as 0), high risk = 5-10 (labeled as 1)
compas_df["risk_category"] = pd.cut(compas_df.decile_score,
                                    bins = [-np.inf, 5, np.inf],
                                    labels = [0, 1],
                                    right = False)

    4. (6pt) Now analyze the offenders across this new risk category:
    (a) What is the recidivism rate (percentage of offenders who re-commit the crime) for low-risk and high-risk individuals?

In [5]:
# Calculate the recidivism rate for low-risk and high-risk individuals
# (the mean of the 0/1 two_year_recid column within each risk category)
recidivism_risk = compas_df.groupby("risk_category")["two_year_recid"].mean()
recidivism_low_risk = recidivism_risk[0]
recidivism_high_risk = recidivism_risk[1]
# Print the result statements
print('Recidivism rate for low-risk individuals:', recidivism_low_risk)
print('Recidivism rate for high-risk individuals:', recidivism_high_risk)

Recidivism rate for low-risk individuals: 0.3200145296040683
Recidivism rate for high-risk individuals: 0.6344554455445545


    (b) What are the recidivism rates for African-Americans and Caucasians?

In [6]:
# Calculate the recidivism rate for African-Americans and Caucasians
recidivism_risk_race = compas_df.groupby("race")["two_year_recid"].mean()
# Look the groups up by label rather than by position
recidivism_african_american = recidivism_risk_race.loc["African-American"]
recidivism_caucasian = recidivism_risk_race.loc["Caucasian"]
# Print result statement
print('Recidivism rate for African-Americans:', recidivism_african_american)
print('Recidivism rate for Caucasians:', recidivism_caucasian)

Recidivism rate for African-Americans: 0.5231496062992126
Recidivism rate for Caucasians: 0.3908701854493581


    5. (8 pt) Now create a confusion matrix comparing COMPAS predictions for recidivism (low risk/high risk) and the actual two-year recidivism and interpret the results. In order to be on the same page, let’s call recidivists “positives”.
    Note: you do not have to predict anything here. COMPAS has made the prediction for you, this is the variable you created in 3 based on decile_score. See the referred articles about the controversy around COMPAS methodology.
    Note 2: Do not just output a confusion matrix with accompanying text like “accuracy = x%, precision = y%”. Interpret your results, such as “z% of recidivists were falsely classified as low-risk, COMPAS accurately classified k% of individuals, etc.”

In [7]:
# Create a confusion matrix comparing COMPAS predictions for recidivism  and the actual two-year recidivism.
cm = confusion_matrix(compas_df.two_year_recid,compas_df.risk_category)
cm

array([[1872,  923],
       [ 881, 1602]])

|        | Confusion Matrix |        |
| :----: | :--------: | :--------: |
|         | Low-Risk (-) | High-Risk (+) |
| non-recidivated (-) |      1872 (TN)    |    923 (FP)   |
| recidivated (+) |      881 (FN)    |    1602  (TP)   |



In [8]:
# Count the total number of cases
total_count = cm.sum()
# Count the actual negative (non-recidivist) and positive (recidivist) cases
non_recid, recid = cm.sum(axis = 1)
# Create the true positive, true negative, false negative, false positive variables
tp = cm[1,1]
tn = cm[0,0]
fn = cm[1,0]
fp = cm[0,1]
# Print interpretation
print(round((tp + tn)/ total_count * 100, 2), "% of individuals were accurately classified by COMPAS (accuracy).\n")
print(round((tp / (recid) * 100), 2), "% of recidivists were correctly classified as high-risk (recall).\n")
print(round((tp / (tp + fp) * 100), 2), "% of high-risk individuals actually recidivated (precision).\n")
print(round(((fn / recid) * 100), 2), "% of recidivists were falsely classified as low-risk (FNR).\n")
print(round(((fn / (tn + fn)) * 100), 2), "% of low-risk individuals actually recidivated (false omission rate).\n")
print(round(((fp / non_recid) * 100), 2), "% of non-recidivists were falsely classified as high-risk (FPR).\n")
print(round(((fp / (tp + fp)) * 100), 2), "% of high-risk individuals did not recidivate (false discovery rate).")

65.82 % of individuals were accurately classified by COMPAS (accuracy).

64.52 % of recidivists were correctly classified as high-risk (recall).

63.45 % of high-risk individuals actually recidivated (precision).

35.48 % of recidivists were falsely classified as low-risk (FNR).

32.0 % of low-risk individuals actually recidivated (false omission rate).

33.02 % of non-recidivists were falsely classified as high-risk (FPR).

36.55 % of high-risk individuals did not recidivate (false discovery rate).
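
The percentages above all follow from the four cells of the confusion matrix. As a minimal sketch (the `rates` helper and its key names are illustrative, not part of the assignment), the same figures can be recomputed from the raw counts without scikit-learn:

```python
# Counts copied from the overall confusion matrix above
# (rows: actual outcome, columns: COMPAS label; recidivism is the positive class).
tn, fp, fn, tp = 1872, 923, 881, 1602


def rates(tn, fp, fn, tp):
    """Return common confusion-matrix rates as fractions between 0 and 1."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),                  # share of recidivists flagged high-risk
        "precision": tp / (tp + fp),               # share of high-risk who recidivated
        "fnr": fn / (tp + fn),                     # share of recidivists rated low-risk
        "fpr": fp / (fp + tn),                     # share of non-recidivists rated high-risk
        "false_omission_rate": fn / (fn + tn),     # share of low-risk who recidivated
        "false_discovery_rate": fp / (fp + tp),    # share of high-risk who did not recidivate
    }


overall = rates(tn, fp, fn, tp)
for name, value in overall.items():
    print(f"{name}: {value:.4f}")
```

Note that recall and FNR share the same denominator (all actual recidivists), so they sum to 1; the same holds for precision and the false discovery rate.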


    6. (8pt) Find the accuracy of the COMPAS classification, and also how its errors were distributed. Would you feel comfortable having a judge use COMPAS to inform sentencing guidelines? At what point would the error/misclassification risk be acceptable for you? How well do you think judges can perform the same task without COMPAS’s help?
    Remember: human judges are not perfect either!

In [9]:
# Print the accuracy score of the COMPAS classification
print("Accuracy of the COMPAS classification:", (tp + tn)/ total_count)
print(round(((fp / non_recid) * 100), 2), "% of non-recidivists were falsely classified as high-risk. (FPR)")
print(round(((fn / recid) * 100), 2), "% of recidivists were falsely classified as low-risk. (FNR)")

Accuracy of the COMPAS classification: 0.6582038651004168
33.02 % of non-recidivists were falsely classified as high-risk. (FPR)
35.48 % of recidivists were falsely classified as low-risk. (FNR)


The accuracy of the COMPAS classification is about 0.66, which means that about 66% of the individuals were correctly classified by COMPAS.

Personally, I would not be comfortable having a judge use COMPAS to inform sentencing guidelines, because both error rates shown above exceed 30%: about 33% of non-recidivists were labeled high-risk (FPR) and about 35% of recidivists were labeled low-risk (FNR). In other words, roughly one in three individuals in each group was misclassified.

Ideally, I would want the misclassification risk to be close to 0%, but that is not realistic. If the COMPAS misclassification risk were below 0.05, or 5%, I would be more comfortable letting judges use COMPAS.

I think judges may currently have a better chance than the COMPAS classification. Although judges are not perfect and can be biased in many cases, I don't think they would do worse than misclassifying about a third of the individuals. That being said, if COMPAS could lower its misclassification risk, I would rather use COMPAS, as I expect it to be less biased than human judges.
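
One way to put the 66% accuracy in context is to compare it with a trivial rule that assigns everyone to the majority class. The sketch below uses only the counts from the overall confusion matrix; the baseline itself is an added illustration, not something the assignment asks for.

```python
# Counts copied from the overall confusion matrix (tn, fp, fn, tp).
tn, fp, fn, tp = 1872, 923, 881, 1602

non_recid = tn + fp   # individuals who did not recidivate within two years
recid = fn + tp       # individuals who did recidivate
total = non_recid + recid

# Accuracy of always predicting the larger class (here: "will not recidivate").
baseline_accuracy = max(non_recid, recid) / total
compas_accuracy = (tp + tn) / total

print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
print(f"COMPAS accuracy:                  {compas_accuracy:.4f}")
print(f"Improvement over baseline:        {compas_accuracy - baseline_accuracy:.4f}")
```

The gap between the two numbers, rather than the raw accuracy, indicates how much information the risk score adds over knowing the base rate alone.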

    7. (10pt) Now repeat your confusion matrix calculation and analysis from 5. But this time do it separately for African-Americans and for Caucasians:

In [10]:
# Filter the race for African American
compas_AA = compas_df[compas_df.race == "African-American"]
# Create a confusion matrix of recidivism and prediction for African American
# Recidivism is positive
cm_AA = confusion_matrix(compas_AA.two_year_recid, compas_AA.risk_category)
# Assign values to TN, TP, FP, FN, and total
tn_AA = cm_AA[0, 0]
fp_AA = cm_AA[0, 1]
fn_AA = cm_AA[1, 0]
tp_AA = cm_AA[1, 1]
total_AA = compas_AA.shape[0]
cm_AA

array([[ 873,  641],
       [ 473, 1188]])

|        | Confusion Matrix for African American |        |
| :----: | :--------: | :--------: |
|         | Low-Risk (-) | High-Risk (+) |
| non-recidivated (-) |      873 (TN)    |    641 (FP)   |
| recidivated (+) |      473 (FN)    |    1188  (TP)   |

In [11]:
# Filter the race for Caucasian
compas_Caucasian = compas_df[compas_df.race == "Caucasian"]
# Create a confusion matrix of recidivism and prediction for Caucasian
# Recidivism is positive
cm_Caucasian = confusion_matrix(compas_Caucasian.two_year_recid, compas_Caucasian.risk_category)
# Assign values to TN, TP, FP, FN, and total
tn_Caucasian = cm_Caucasian[0, 0]
fp_Caucasian = cm_Caucasian[0, 1]
fn_Caucasian = cm_Caucasian[1, 0]
tp_Caucasian = cm_Caucasian[1, 1]
total_Caucasian = compas_Caucasian.shape[0]
cm_Caucasian

array([[999, 282],
       [408, 414]])

|        | Confusion Matrix for Caucasian |        |
| :----: | :--------: | :--------: |
|         | Low-Risk (-) | High-Risk (+) |
| non-recidivated (-) |      999 (TN)    |    282 (FP)   |
| recidivated (+) |      408 (FN)    |    414  (TP)   |

    (a) How accurate is the COMPAS classification for African-American individuals? For Caucasians?

In [12]:
# Find the accuracy of the COMPAS classification for African-American and Caucasians
print("COMPAS classification accurately classifies", round((tn_AA + tp_AA) / total_AA * 100, 2), "% of African Americans.")
print("COMPAS classification accurately classifies", round(((tn_Caucasian + tp_Caucasian) / total_Caucasian) * 100, 2), "% of Caucasians.")

COMPAS classification accurately classifies 64.91 % of African Americans.
COMPAS classification accurately classifies 67.19 % of Caucasians.


    (b) What are the false positive rates (false recidivism rates) FPR=FP/N=FP/(FP+TN)?

In [13]:
# Calculate the FPR 
print("False Positive Rate (FPR) or False Recidivism Rate", round(fp_AA / (fp_AA + tn_AA) * 100, 2), "% of African Americans.")
print("False Positive Rate (FPR) or False Recidivism Rate", round(fp_Caucasian / (fp_Caucasian + tn_Caucasian) * 100, 2), "% of Caucasians.")

False Positive Rate (FPR) or False Recidivism Rate 42.34 % of African Americans.
False Positive Rate (FPR) or False Recidivism Rate 22.01 % of Caucasians.


    (c) The false negative rates (false no-recidivism rates) FNR=FN/P=FN/(FN+TP)?

    We did not talk about FPR and FNR in class, you can consult lecture notes, section 6.1.1.

In [14]:
# Calculate the FNR
print("False Negative Rate (FNR) or False No-Recidivism Rate", round(fn_AA / (fn_AA + tp_AA) * 100, 2), "% of African Americans.")
print("False Negative Rate (FNR) or False No-Recidivism Rate", round(fn_Caucasian / (fn_Caucasian + tp_Caucasian) * 100, 2), "% of Caucasians.")

False Negative Rate (FNR) or False No-Recidivism Rate 28.48 % of African Americans.
False Negative Rate (FNR) or False No-Recidivism Rate 49.64 % of Caucasians.


    8. (12pt) If you have done this correctly, you will find that COMPAS’s percentage of correctly categorized individuals (accuracy) is fairly similar for African-American and Caucasian individuals, but that false positive rates and false negative rates are different. Look again at the overal recidivism rates in the dataset for Black and White individuals. In your opinion, is the COMPAS algorithm fair? Justify your answer.
    Hint: This is not a trick question. If you read the first two recommended readings, you will find that people disagree how you define fairness. Your answer will not be graded on which side you take, but on your justification.

As mentioned, the COMPAS algorithm is already questionable in terms of accuracy, as it misclassifies over 30% of individuals. Beyond that, I believe that the COMPAS algorithm is not fair. Here I define fairness as having similar FPR and FNR across groups, and COMPAS does not meet that standard across race. As shown above, Caucasians have an FPR of approximately 22%, while African Americans have an FPR of roughly 42%, a gap of about 20 percentage points. This means that African Americans who did not recidivate were about 20 percentage points more likely to be falsely labeled high-risk, which can lead to unequal sentences and treatment in court. In terms of FNR, Caucasians have about 50% and African Americans about 28%, a gap of about 21 percentage points. This means that Caucasians who did recidivate were about 21 percentage points more likely to be labeled low-risk, which can lead to less harsh sentences.

There is a clear divide between the treatment of African Americans and Caucasians in the COMPAS algorithm. The overall recidivism rates of the two groups differ by only about 13 percentage points (about 52% versus 39%), but the differences in FPR and FNR are too large to say that this algorithm is fair.
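
The comparison above can be reproduced from the two per-group confusion matrices. The sketch below also reports precision (the share of high-risk individuals who actually recidivated) for each group, since the readings note that people disagree about which quantity should be equalized; including precision here is my own addition for illustration, and the helper names are hypothetical.

```python
# (tn, fp, fn, tp) for each group, copied from the confusion matrices in question 7.
groups = {
    "African-American": (873, 641, 473, 1188),
    "Caucasian": (999, 282, 408, 414),
}


def group_metrics(tn, fp, fn, tp):
    """Return FPR, FNR, and precision for one group as fractions."""
    return {
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "precision": tp / (tp + fp),
    }


metrics = {name: group_metrics(*counts) for name, counts in groups.items()}

for name, m in metrics.items():
    print(f"{name}: FPR={m['fpr']:.3f}  FNR={m['fnr']:.3f}  precision={m['precision']:.3f}")

# Differences between groups, in fractions (multiply by 100 for percentage points).
fpr_gap = metrics["African-American"]["fpr"] - metrics["Caucasian"]["fpr"]
fnr_gap = metrics["Caucasian"]["fnr"] - metrics["African-American"]["fnr"]
precision_gap = metrics["African-American"]["precision"] - metrics["Caucasian"]["precision"]
print(f"FPR gap: {fpr_gap:.3f}, FNR gap: {fnr_gap:.3f}, precision gap: {precision_gap:.3f}")
```

With these counts the precision gap is noticeably smaller than the FPR and FNR gaps, which is one reason different fairness definitions can lead to different verdicts on the same data.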

# Can you beat COMPAS? (40pt)    

    The COMPAS model has created quite a bit of controversy. One issue frequently brought up is that it is “closed source”, i.e. its inner workings are available neither to the public nor to the judges who are actually making the decisions. But is it a big problem? Maybe you can devise as good a model as COMPAS to predict recidivism? Maybe you can do even better? Let’s try!
    We proceed as follows:
    • Note that you should not use variable score_text that originates from COMPAS model. Do you see why?
    • First we devise a model that explicitly does not include gender and race. Your task is to use cross-validation to develop the best model you can do based on the available variables.
    • Thereafter we add gender and see if gender improves the model performance.
    • And finally we also add race and see if race has an additional explanatory effect, i.e. does race help to improve the performance of the model.

    1. (8pt) Before we start: what do you think, what is an appropriate model performance measure here? A, P, R, F, or something else? Maybe you want to report multiple measures? Explain!

Although all the performance measures are important, I think the most appropriate model performance measure here is precision. This is because I want to avoid false positives, that is, people who were predicted to recidivate but actually did not. A false positive can mean a harsher outcome for someone who would not have reoffended, which I consider the more costly error.
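
Since the question also mentions R and F, it may help to see how these relate to precision. As a minimal sketch, the following computes precision, recall, and the F1-score (their harmonic mean) for the COMPAS classification, using the counts from the overall confusion matrix in the first part; it is an illustration only and not part of the cross-validation below.

```python
# Counts for the COMPAS high-risk/low-risk classification (overall confusion matrix).
tp, fp, fn = 1602, 923, 881

precision = tp / (tp + fp)   # penalizes false positives
recall = tp / (tp + fn)      # penalizes false negatives

# F1 is the harmonic mean of precision and recall, so it is pulled down by either error type.
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.4f}")
print(f"recall    = {recall:.4f}")
print(f"F1        = {f1:.4f}")
```

Optimizing precision alone can be achieved by flagging very few people, so reporting recall or F1 alongside it guards against a model that looks precise only because it rarely predicts recidivism.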

    2. (6pt) Now it is time to do the modeling. Create a logistic regression model that contains all explanatory variables you have in data into the model. (Some of these you have to convert to dummies). Do not include the variables discussed above, do not include race and gender in this model either to avoid explicit gender/racial bias.
    Use 10-fold CV to compute its relevant performance measure(s) you discussed above.

In [15]:
# Create a logistic regression model that contains all explanatory variables
X_1 = compas_df[["age", "c_charge_degree", "priors_count"]]
X1 = pd.get_dummies(X_1, columns = ["c_charge_degree"], drop_first = True)
y = compas_df.two_year_recid
m = LogisticRegression(solver = "lbfgs")
cv = cross_val_score(m, X1, y, scoring = 'precision', cv = 10)
print('Average Cross-validation results:', cv.mean())

Average Cross-validation results: 0.6798087793453661


    3. (6pt) Experiment with different models to find the best model according to your performance indicator. (Include/exclude different variables; you may also do feature engineering, e.g. create different age groups, include variables like age², interaction effects, etc. But do not include race and gender.)
    Report what did you try (but no need to report the full results of all unsuccessful models you tried), and your best model’s performance. Is it better or worse than for the COMPAS model?
    Please do not spend too much on tiny differences, e.g. your accuracy is better by 0.001 and F-score worse by 0.0005. Cross-validation is a random process and these figures jump up and down a bit.

In [16]:
# Categorize the priors_count into different categories
compas_df["prior_category"] = pd.cut(compas_df.priors_count,
                      bins = [-np.inf, 1, 5, np.inf],
                      labels = ["0", "1-4", "5+"],
                      right = False)
# Build the explanatory variables (X) and the outcome (y)
# This model replaces priors_count with the new prior_category variable
X_2 = compas_df[["age", "c_charge_degree", "prior_category"]]
# Convert the categorical variables to dummies
X2 = pd.get_dummies(X_2, columns = ["c_charge_degree", "prior_category"], drop_first = True)
y = compas_df.two_year_recid
m = LogisticRegression(solver = "lbfgs")
cv = cross_val_score(m, X2, y, scoring = 'precision', cv = 10)
print('Average Cross-validation results:', cv.mean())

Average Cross-validation results: 0.6540072716116024


In [17]:
# Create the variable age_squared, the square of the age variable
compas_df["age_squared"] = compas_df["age"]**2
# Create another model
X_3 = compas_df[["age_squared", "c_charge_degree", "priors_count"]]
X3 = pd.get_dummies(X_3, columns = ["c_charge_degree"], drop_first = True).values
y = compas_df.two_year_recid
m = LogisticRegression(solver = "lbfgs")
cv = cross_val_score(m, X3, y, scoring = 'precision', cv = 10)
print('Average Cross-validation results:', cv.mean())

Average Cross-validation results: 0.6972982679085296


Out of all the models I tried, this one performs best in terms of precision. It includes three variables: age_squared, c_charge_degree, and priors_count. COMPAS had a precision of 0.6345, while this model's cross-validated precision is about 0.697, roughly 6 percentage points higher. Thus, I would say that my model performs better than the COMPAS model on this measure.

In [18]:
# Create another age category variable (bins are right-closed: (30, 40] is ages 31-40)
compas_df["age_cat1"] = pd.cut(compas_df.age,
                            bins = [-np.inf, 18, 21, 30, 40, np.inf],
                            labels = ["<=18", "19-21", "22-30", "31-40", "41+"])
# Create another variable that is based on priors_count 
compas_df["priorTest"] = pd.cut(compas_df.priors_count,  # binning priors_count into categories
                            bins = [-np.inf, 0, 5, 9, 11, 20, 30, np.inf],
                            labels = ["0", "1-5","6-9", "10-11", "12-20", "21-30", "31+"])
# Create another model 
X_4 = compas_df[["age_cat1", "c_charge_degree", "priorTest"]]
X4 = pd.get_dummies(X_4, columns = ["age_cat1", "c_charge_degree", "priorTest"], drop_first = True).values
y = compas_df.two_year_recid
m = LogisticRegression(solver = "lbfgs")
cv = cross_val_score(m, X4, y, scoring = 'precision', cv = 10)
print('Average Cross-validation results:', cv.mean())

Average Cross-validation results: 0.6405401891276197


For the other experiments, I tried making age and priors_count categorical by dividing them into different bins. It seems that the more complicated the model is, the worse it performs on precision.
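
When binning by hand it is easy to mislabel an interval, because `pd.cut` uses right-closed bins by default and left-closed bins when `right = False`. The sketch below emulates both conventions with the standard-library `bisect` module so the edge cases can be checked without pandas; the helper functions are hypothetical, and the label for the (30, 40] bin is written as "31-40" so that it matches the interval.

```python
from bisect import bisect_left, bisect_right


def cut_right_closed(x, edges, labels):
    """Label x using bins (-inf, e0], (e0, e1], ..., (e_last, inf)."""
    return labels[bisect_left(edges, x)]


def cut_left_closed(x, edges, labels):
    """Label x using bins (-inf, e0), [e0, e1), ..., [e_last, inf)."""
    return labels[bisect_right(edges, x)]


# Interior edges of the age bins (right-closed, as with pd.cut's default right=True).
age_edges = [18, 21, 30, 40]
age_labels = ["<=18", "19-21", "22-30", "31-40", "41+"]

# Interior edges of the three-level prior bins (left-closed, as with right=False).
prior_edges = [1, 5]
prior_labels = ["0", "1-4", "5+"]

for age in (18, 19, 30, 31, 40, 41):
    print(age, "->", cut_right_closed(age, age_edges, age_labels))

for priors in (0, 1, 4, 5):
    print(priors, "->", cut_left_closed(priors, prior_edges, prior_labels))
```

Checking the values that sit exactly on a bin edge (18, 30, 40 for age; 1 and 5 for priors) is usually enough to confirm that the labels describe the intervals actually produced.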

    4. (4pt) Now add sex to the model. Does it help to improve the performance?    

In [19]:
# Add sex to the model
X_5 = compas_df[["age_squared", "c_charge_degree", "priors_count", "sex"]]
X5 = pd.get_dummies(X_5, columns = ["c_charge_degree", "sex"], drop_first = True).values
y = compas_df.two_year_recid
m = LogisticRegression(solver = "lbfgs")
cv = cross_val_score(m, X5, y, scoring = 'precision', cv = 10)
print('Average Cross-validation results:', cv.mean())

Average Cross-validation results: 0.6841012507808305


Adding sex to the model does not improve performance: cross-validated precision is about 0.684, compared with about 0.697 without it.

    5. (4pt) And finally add race. Does the model improve? Again, let’s not talk about tiny differences here.

In [20]:
# Add race to the model
X_6 = compas_df[["age_squared", "c_charge_degree", "priors_count", "race", "sex"]]
X6 = pd.get_dummies(X_6, columns = ["c_charge_degree", "race", "sex"], drop_first = True).values
y = compas_df.two_year_recid
m = LogisticRegression(solver = "lbfgs")
cv = cross_val_score(m, X6, y, scoring = 'precision', cv = 10)
print('Average Cross-validation results:', cv.mean())

Average Cross-validation results: 0.6813486238036306


After adding both race and sex, the model did not improve.

    6. (12pt) Discuss the results. Did you manage to be equally good as COMPAS? Did you create a better model? Do gender and race help to improve your predictions? What should judges do when having access to such models? Should they use such models?

I managed to create a model that beats COMPAS when precision is used as the performance measure (about 0.70 cross-validated versus 0.63). Adding gender and race did not improve the predictions of my best model: precision was about 0.684 with sex and about 0.681 with sex and race, compared with about 0.697 without them. These differences are small, so they may simply reflect the randomness of cross-validation rather than a real effect.

I don't think either my model or the COMPAS model predicts well enough to be a useful tool for judges to predict recidivism. If judges decide to use these models, they should treat the results with caution. When the features that go into a model are not public information, it is almost impossible to tell what sort of biases are present in it, and those biases will be perpetuated if the model is used for substantial decisions such as sentence lengths.

# Is your model more fair? (12pt)

    Finally, is your model any better (or worse) than COMPAS in terms of fairness? Let’s use your model to predict recidivism for everyone (i.e. all data, ignore training-testing split), and see if FPR and FNR for African-Americans and Caucasians are now similar.

    1. (6pt) Replicate 1.7 using your best model: pick the best model from question 2.3, predict recidivism for everyone in data (ie only African-Americans and Caucasians), and compute FPR and FNR.

In [21]:
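# Predict y with the best-performing specification (X3)
# This fit is unpenalized, whereas the cross-validated models above used the default L2 penalty
# Note: penalty = 'none' is accepted by scikit-learn < 1.2; newer releases expect penalty = None
m_logistic = LogisticRegression(penalty = 'none', solver = 'newton-cg').fit(X3, y)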
compas_df["y_predicted"] = m_logistic.predict(X3)

In [22]:
# Filter the race for African American
compas_AA = compas_df[compas_df.race == "African-American"]
# Create a confusion matrix of recidivism and prediction for African American
# Recidivism is positive
cm_AA = confusion_matrix(compas_AA.two_year_recid, compas_AA.y_predicted)
# Assign values to TN, TP, FP, FN, and total
tn_AA = cm_AA[0, 0]
fp_AA = cm_AA[0, 1]
fn_AA = cm_AA[1, 0]
tp_AA = cm_AA[1, 1]
total_AA = compas_AA.shape[0]
cm_AA

array([[1080,  434],
       [ 613, 1048]])

|        | Confusion Matrix for African American |        |
| :----: | :--------: | :--------: |
|         | Low-Risk (-) | High-Risk (+) |
| non-recidivated (-) |      1080 (TN)    |    434 (FP)   |
| recidivated (+) |      613 (FN)    |    1048  (TP)   |

In [23]:
# Filter the race for Caucasian
compas_Caucasian = compas_df[compas_df.race == "Caucasian"]
# Create a confusion matrix of recidivism and prediction for Caucasian
# Recidivism is positive
cm_Caucasian = confusion_matrix(compas_Caucasian.two_year_recid, compas_Caucasian.y_predicted)
# Assign values to TN, TP, FP, FN, and total
tn_Caucasian = cm_Caucasian[0, 0]
fp_Caucasian = cm_Caucasian[0, 1]
fn_Caucasian = cm_Caucasian[1, 0]
tp_Caucasian = cm_Caucasian[1, 1]
total_Caucasian = compas_Caucasian.shape[0]
cm_Caucasian

array([[1099,  182],
       [ 498,  324]])

|        | Confusion Matrix for Caucasian |        |
| :----: | :--------: | :--------: |
|         | Low-Risk (-) | High-Risk (+) |
| non-recidivated (-) |      1099 (TN)    |    182 (FP)   |
| recidivated (+) |      498 (FN)    |    324  (TP)   |

In [24]:
# Find the accuracy of the created model for African-Americans and Caucasians
print("Created Model accurately classifies", round((tn_AA + tp_AA) / total_AA * 100, 2), "% of African Americans.")
print("Created Model accurately classifies", round(((tn_Caucasian + tp_Caucasian) / total_Caucasian) * 100, 2), "% of Caucasians.")

Created Model accurately classifies 67.02 % of African Americans.
Created Model accurately classifies 67.67 % of Caucasians.


In [25]:
# Calculate the FPR 
print("False Positive Rate (FPR) or False Recidivism Rate", round(fp_AA / (fp_AA + tn_AA) * 100, 2), "% of African Americans.")
print("False Positive Rate (FPR) or False Recidivism Rate", round(fp_Caucasian / (fp_Caucasian + tn_Caucasian) * 100, 2), "% of Caucasians.")

False Positive Rate (FPR) or False Recidivism Rate 28.67 % of African Americans.
False Positive Rate (FPR) or False Recidivism Rate 14.21 % of Caucasians.


In [26]:
# Calculate the FNR
print("False Negative Rate (FNR) or False No-Recidivism Rate", round(fn_AA / (fn_AA + tp_AA) * 100, 2), "% of African Americans.")
print("False Negative Rate (FNR) or False No-Recidivism Rate", round(fn_Caucasian / (fn_Caucasian + tp_Caucasian) * 100, 2), "% of Caucasians.")

False Negative Rate (FNR) or False No-Recidivism Rate 36.91 % of African Americans.
False Negative Rate (FNR) or False No-Recidivism Rate 60.58 % of Caucasians.


    2. (6pt) Explain what do you get. Are your results different from COMPAS in any significant way?

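In terms of accuracy, my model does slightly better than the COMPAS model: accuracy improves by about 2 percentage points for African Americans and by about 0.5 percentage points for Caucasians. In terms of the False Positive Rate, my model also performs better, as the FPR drops by about 14 and 8 percentage points for African Americans and Caucasians, respectively. In terms of FNR, my model performs worse than COMPAS, as the FNR increases by about 8 and 11 percentage points for African Americans and Caucasians. The gap between the two groups does not disappear, however: the FPR gap narrows from about 20 to about 14 percentage points, while the FNR gap widens from about 21 to about 24 percentage points, so my model is not clearly fairer than COMPAS by this definition.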
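
To make the comparison between the two classifiers concrete, the between-group gaps in FPR and FNR can be computed directly from the four confusion matrices reported in this notebook. This is a minimal sketch with hypothetical helper names; the counts are copied from the outputs above.

```python
def fpr_fnr(tn, fp, fn, tp):
    """Return (FPR, FNR) as fractions for one confusion matrix."""
    return fp / (fp + tn), fn / (fn + tp)


# (tn, fp, fn, tp) per group for the COMPAS labels and for the logistic model's predictions.
compas = {
    "African-American": (873, 641, 473, 1188),
    "Caucasian": (999, 282, 408, 414),
}
model = {
    "African-American": (1080, 434, 613, 1048),
    "Caucasian": (1099, 182, 498, 324),
}


def gaps(groups):
    """Return (FPR gap, FNR gap), each oriented so that it is positive in this data."""
    fpr_aa, fnr_aa = fpr_fnr(*groups["African-American"])
    fpr_c, fnr_c = fpr_fnr(*groups["Caucasian"])
    return fpr_aa - fpr_c, fnr_c - fnr_aa


compas_gaps = gaps(compas)
model_gaps = gaps(model)

print(f"COMPAS: FPR gap = {compas_gaps[0]:.3f}, FNR gap = {compas_gaps[1]:.3f}")
print(f"Model:  FPR gap = {model_gaps[0]:.3f}, FNR gap = {model_gaps[1]:.3f}")
```

Lower error rates for both groups do not by themselves imply a smaller gap between the groups, which is why the gaps are worth reporting separately.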