<font size="5"> The Concept of Odds and Probabilities</font>

In [1]:
library(dplyr)



In [2]:
# Read in the data and store the target as a factor (1 = loan approved, 0 = not approved)
df <- read.csv("loan_dataset.csv")
df$Loan_Status <- as.factor(df$Loan_Status)
head(df)

Row,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<int>,<int>,<chr>,<fct>
1,Male,Yes,1,Graduate,No,4583,128,360,1,Rural,0
2,Male,Yes,0,Graduate,Yes,3000,66,360,1,Urban,1
3,Male,Yes,0,Not Graduate,No,2583,120,360,1,Urban,1
4,Male,No,0,Graduate,No,6000,141,360,1,Urban,1
5,Male,Yes,2,Graduate,Yes,5417,267,360,1,Urban,1
6,Male,Yes,0,Not Graduate,No,2333,95,360,1,Urban,1


**Logistic Regression Model on selected features**

In [3]:
glm_loan = glm(
        formula = Loan_Status ~ Credit_History + Property_Area + Education, 
        data = df,
        family = binomial()
)
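
For reference, the model fitted above can be written on the log-odds scale. This is a sketch of the standard binomial-logit form, not output from the notebook. The coefficient symbols $\beta_j$ and the dummy indicators $D$ are notation introduced here, and the two area dummies assume that `Property_Area` has three levels, which is what its 2 degrees of freedom in the VIF table further down suggest.

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Credit\_History} + \beta_2\,D_{\text{area},1} + \beta_3\,D_{\text{area},2} + \beta_4\,D_{\text{edu}},
\qquad p = P(\text{Loan\_Status} = 1)
$$

Each coefficient is therefore a change in log-odds, and $e^{\beta_j}$ is the corresponding odds ratio.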

In [4]:
#Create columns for predicted probability and predicted class
df <- df %>% mutate(
            predicted_loan = predict(
                glm_loan,
                type = "response"
            ),
            predicted_class = if_else(
                condition = predicted_loan > 0.5,
                true = 1,
                false = 0
            ),
            predicted_class = as.factor(predicted_class)
        )
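
The 0.5 cut-off used above can also be read in terms of odds, which is the quantity this notebook is about. A short derivation, assuming $p$ denotes the predicted probability `predicted_loan`:

$$
p > 0.5 \iff \frac{p}{1-p} > 1 \iff \log\frac{p}{1-p} > 0
$$

So a record is classified as approved exactly when its predicted odds of approval exceed 1, that is, when its predicted log-odds are positive.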

**Confusion Matrix**

In [5]:
caret::confusionMatrix(
            data = df$predicted_class, 
            reference = df$Loan_Status
        )

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0  63   7
         1  85 325
                                          
               Accuracy : 0.8083          
                 95% CI : (0.7702, 0.8426)
    No Information Rate : 0.6917          
    P-Value [Acc > NIR] : 5.209e-09       
                                          
                  Kappa : 0.4738          
                                          
 Mcnemar's Test P-Value : 9.923e-16       
                                          
            Sensitivity : 0.4257          
            Specificity : 0.9789          
         Pos Pred Value : 0.9000          
         Neg Pred Value : 0.7927          
             Prevalence : 0.3083          
         Detection Rate : 0.1313          
   Detection Prevalence : 0.1458          
      Balanced Accuracy : 0.7023          
                                          
       'Positive' Class : 0               
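
Because the reported positive class is 0, sensitivity here measures how well the model detects non-approved loans, not approved ones. The headline statistics can be recomputed by hand from the four cells of the matrix above:

$$
\begin{aligned}
\text{Accuracy} &= \frac{63 + 325}{63 + 7 + 85 + 325} = \frac{388}{480} \approx 0.8083 \\
\text{Sensitivity} &= \frac{63}{63 + 85} = \frac{63}{148} \approx 0.4257 \\
\text{Specificity} &= \frac{325}{325 + 7} = \frac{325}{332} \approx 0.9789 \\
\text{Pos Pred Value} &= \frac{63}{63 + 7} = 0.9000 \\
\text{Neg Pred Value} &= \frac{325}{325 + 85} \approx 0.7927
\end{aligned}
$$

In words, the model recognises almost all approved loans (325 of 332) but fewer than half of the non-approved ones (63 of 148).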
                              

**Model Diagnostics: checking multicollinearity**

In [6]:
car::vif(
        mod = glm_loan
    )

Predictor,GVIF,Df,GVIF^(1/(2*Df))
Credit_History,1.019024,1,1.009467
Property_Area,1.026862,2,1.006649
Education,1.007761,1,1.003873
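
The last column rescales the generalized VIF so that terms with different degrees of freedom are comparable. As a check on the table above, using only its own numbers:

$$
\text{Credit\_History: } 1.019024^{1/2} \approx 1.009467,
\qquad
\text{Property\_Area: } 1.026862^{1/4} \approx 1.006649
$$

All adjusted values are very close to 1, which suggests little collinearity among these three predictors.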


**Compute the overall probability and odds of the target variable without considering any predictor variables**

In [7]:
# Share of approved loans (Loan_Status == 1), ignoring all predictors
target_probability <- mean(df$Loan_Status == "1")
cat("Target value probability = ", target_probability)

Target value probability =  0.6916667

In [8]:
table(df$Loan_Status)


  0   1 
148 332 

In [9]:
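# Odds = approved count / non-approved count, indexed by level name rather than position
loan_counts <- table(df$Loan_Status)
target_odds <- loan_counts[["1"]] / loan_counts[["0"]]
cat("Target values odds = ", target_odds)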

Target values odds =  2.243243
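
The probability and the odds computed above are two views of the same counts. With $p$ the overall approval probability, the relationship can be checked directly:

$$
\text{odds} = \frac{p}{1-p} = \frac{0.6916667}{1 - 0.6916667} = \frac{332/480}{148/480} = \frac{332}{148} \approx 2.243243
$$

That is, in this data set there are roughly 2.24 approved loans for every non-approved loan.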

**Compute the conditional probability and odds of the target variable for each of the two levels of the predictor Gender**

In [10]:
# Approval probability and odds within each level of Gender
gender_odds <- df %>%
    dplyr::group_by(Gender) %>%
    dplyr::summarise(
        Approval_Probability = mean(Loan_Status == 1, na.rm = TRUE)
    ) %>%
    dplyr::mutate(
        Approval_Odds = Approval_Probability / (1 - Approval_Probability)
    )
gender_odds

Gender,Approval_Probability,Approval_Odds
<chr>,<dbl>,<dbl>
Female,0.627907,1.6875
Male,0.7055838,2.396552
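
The same conversion applies within each group, and it can be inverted to recover a probability from odds. A worked check against the table above:

$$
\text{odds}_{\text{Female}} = \frac{0.627907}{1 - 0.627907} \approx 1.6875,
\qquad
\text{odds}_{\text{Male}} = \frac{0.7055838}{1 - 0.7055838} \approx 2.396552
$$

$$
p = \frac{\text{odds}}{1 + \text{odds}}, \qquad \text{e.g. } \frac{1.6875}{1 + 1.6875} \approx 0.627907
$$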


**Compute the odds ratio using the previous odds.**

In [11]:
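# Odds ratio of approval, female relative to male; rows are selected by name, not position
odds_by_gender <- setNames(gender_odds$Approval_Odds, gender_odds$Gender)
d_ratio <- odds_by_gender[["Female"]] / odds_by_gender[["Male"]]
cat("Odds ratio of loan approval for female to male = ", d_ratio)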

Odds ratio of loan approval for female to male =  0.7041367
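
As a reading aid, the odds ratio above is simply the quotient of the two conditional odds, and its reciprocal gives the comparison in the other direction:

$$
\text{OR}_{\text{Female vs Male}} = \frac{1.6875}{2.396552} \approx 0.7041,
\qquad
\text{OR}_{\text{Male vs Female}} = \frac{1}{0.7041} \approx 1.4202
$$

A value below 1 means that, in this sample, the odds of approval for female applicants are about 30% lower than for male applicants ($1 - 0.7041 \approx 0.296$). This is a descriptive comparison only: it does not adjust for other predictors and says nothing about statistical significance. If a logistic regression were fitted with `Gender` as its only predictor, the exponentiated `Gender` coefficient would reproduce this ratio, in whichever direction the reference level implies.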