# Probability Intuition

In [10]:
# dependencies
import pandas as pd

In [12]:
df = pd.read_csv('lending_club.csv')
df.head()

| | loan_amnt | term | int_rate | installment | grade | sub_grade | emp_title | emp_length | home_ownership | annual_inc | ... | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | application_type | mort_acc | pub_rec_bankruptcies | address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10000.0 | 36 months | 11.44 | 329.48 | B | B4 | Marketing | 10+ years | RENT | 117000.0 | ... | 16.0 | 0.0 | 36369.0 | 41.8 | 25.0 | w | INDIVIDUAL | 0.0 | 0.0 | 0174 Michelle Gateway\nMendozaberg, OK 22690 |
| 1 | 8000.0 | 36 months | 11.99 | 265.68 | B | B5 | Credit analyst | 4 years | MORTGAGE | 65000.0 | ... | 17.0 | 0.0 | 20131.0 | 53.3 | 27.0 | f | INDIVIDUAL | 3.0 | 0.0 | 1076 Carney Fort Apt. 347\nLoganmouth, SD 05113 |
| 2 | 15600.0 | 36 months | 10.49 | 506.97 | B | B3 | Statistician | < 1 year | RENT | 43057.0 | ... | 13.0 | 0.0 | 11987.0 | 92.2 | 26.0 | f | INDIVIDUAL | 0.0 | 0.0 | 87025 Mark Dale Apt. 269\nNew Sabrina, WV 05113 |
| 3 | 7200.0 | 36 months | 6.49 | 220.65 | A | A2 | Client Advocate | 6 years | RENT | 54000.0 | ... | 6.0 | 0.0 | 5472.0 | 21.5 | 13.0 | f | INDIVIDUAL | 0.0 | 0.0 | 823 Reid Ford\nDelacruzside, MA 00813 |
| 4 | 24375.0 | 60 months | 17.27 | 609.33 | C | C5 | Destiny Management Inc. | 9 years | MORTGAGE | 55000.0 | ... | 13.0 | 0.0 | 24584.0 | 69.8 | 43.0 | f | INDIVIDUAL | 1.0 | 0.0 | 679 Luna Roads\nGreggshire, VA 11650 |


In [3]:
df.columns

Index(['loan_amnt', 'term', 'int_rate', 'installment', 'grade', 'sub_grade',
       'emp_title', 'emp_length', 'home_ownership', 'annual_inc',
       'verification_status', 'issue_d', 'loan_status', 'purpose', 'title',
       'dti', 'earliest_cr_line', 'open_acc', 'pub_rec', 'revol_bal',
       'revol_util', 'total_acc', 'initial_list_status', 'application_type',
       'mort_acc', 'pub_rec_bankruptcies', 'address'],
      dtype='object')

In [4]:
df = df[['loan_amnt', 'revol_bal']] # keep only the two columns used in the examples below
df.head()

| | loan_amnt | revol_bal |
|---|---|---|
| 0 | 10000.0 | 36369.0 |
| 1 | 8000.0 | 20131.0 |
| 2 | 15600.0 | 11987.0 |
| 3 | 7200.0 | 5472.0 |
| 4 | 24375.0 | 24584.0 |


## Joint Probability

The probability of two events occurring **SIMULTANEOUSLY**, i.e. P(A and B).

**Example:** What is the probability of selecting a borrower with a loan amount at or below N10,000 and a revolving balance at or below N20,000?

In [5]:
desired_outcome = len(df[(df['loan_amnt'] <= 10000) & (df['revol_bal'] <= 20000)]) # borrowers with loan amounts <= 10,000 and revolving balances <= 20,000
possible_outcomes = len(df) # total number of borrowers in the dataset

In [6]:
desired_outcome / possible_outcomes

0.3708001919046537

There is a ~37% chance that a randomly selected borrower has both a loan amount at or below N10,000 and a revolving balance at or below N20,000.
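The count-and-divide step above can also be written as the mean of a boolean mask, because the mean of a True/False column is the fraction of rows that are True. A minimal sketch, assuming pandas and reusing only the five rows shown by `df.head()` earlier (so the result differs from the full-dataset 37%); the names `both`, `p_joint_count` and `p_joint_mean` are illustrative:

```python
import pandas as pd

# The five rows printed by df.head() above, reused so the example runs without the CSV.
df = pd.DataFrame({
    "loan_amnt": [10000.0, 8000.0, 15600.0, 7200.0, 24375.0],
    "revol_bal": [36369.0, 20131.0, 11987.0, 5472.0, 24584.0],
})

# Boolean mask: True where BOTH conditions hold for the same borrower (row).
both = (df["loan_amnt"] <= 10000) & (df["revol_bal"] <= 20000)

# Counting approach used in the notebook: matching rows / all rows.
p_joint_count = len(df[both]) / len(df)

# Equivalent shortcut: the mean of a boolean Series is the fraction of True values.
p_joint_mean = both.mean()

print(p_joint_count, p_joint_mean)  # 0.2 0.2 (only row 3 satisfies both conditions)
```

Both forms give the same number; the mask form avoids building an intermediate filtered DataFrame.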

## Marginal Probability

The probability of an event for one random variable **IRRESPECTIVE** of the outcome of another random variable; in other words, simply the probability of that event on its own, e.g. P(A).

**Example:** What is the probability of finding a borrower with a loan amount of N10,000 in the dataset *(regardless of their revolving balance)*?

In [7]:
desired_outcome = len(df[df['loan_amnt']==10000]) # total number of borrowers with loan amounts of 10,000
possible_outcomes = len(df) # total number of borrowers in the dataset

In [8]:
desired_outcome / possible_outcomes

0.06986339418730904

There is a ~7% chance of finding a borrower with a loan amount of N10,000 (regardless of their revolving balance) in the dataset.
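A marginal probability can also be recovered from a joint probability table by summing over every outcome of the other variable, which is what "irrespective of" means in practice. The sketch below is illustrative only: it uses `pd.crosstab` with `normalize=True` on the same five `df.head()` rows, and it uses the event "loan amount at or below 10,000" from the joint example rather than the exact-10,000 event above. The names `small_loan`, `low_revol` and `joint` are hypothetical.

```python
import pandas as pd

# The five rows printed by df.head() above, reused so the example runs without the CSV.
df = pd.DataFrame({
    "loan_amnt": [10000.0, 8000.0, 15600.0, 7200.0, 24375.0],
    "revol_bal": [36369.0, 20131.0, 11987.0, 5472.0, 24584.0],
})

# Two events, each either True or False for every borrower.
small_loan = df["loan_amnt"] <= 10000
low_revol = df["revol_bal"] <= 20000

# Joint probability table: each cell is P(small_loan = row value AND low_revol = column value).
joint = pd.crosstab(small_loan, low_revol, normalize=True)

# Marginal probability of small_loan: sum the joint probabilities across every
# outcome of low_revol, i.e. irrespective of the revolving balance.
p_small_loan = joint.sum(axis=1)

print(round(p_small_loan.loc[True], 4))  # 0.6
print(small_loan.mean())                 # 0.6, the same value computed directly
```

Rounding is used in the first `print` because adding the two table cells (0.4 + 0.2) leaves a tiny floating-point remainder.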

## Conditional Probability

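The probability of an event occurring **GIVEN** that another event has occurred, i.e. P(B | A).

**Example:** Among borrowers with a loan amount at or below N10,000, what is the probability of finding one with a revolving balance at or below N20,000?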

In [9]:
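df_2 = df[df['loan_amnt'] <= 10000] # restrict to borrowers with loan amounts <= 10,000 (the "given" event)
possible_outcomes = len(df_2) # number of borrowers in that restricted group
desired_outcome = len(df_2[df_2['revol_bal'] <= 20000]) # of those, borrowers with revolving balances <= 20,000
desired_outcome / possible_outcomes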

0.8935784394263009

Of all the borrowers with a loan amount at or below N10,000, the probability of a borrower having a revolving balance at or below N20,000 is 89%.
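Conditional probability is tied to the other two by the identity P(B | A) = P(A and B) / P(A). A minimal sketch, again on the five `df.head()` rows only, showing that filtering first (as above) and dividing the joint probability by the marginal give the same number; `a` and `b` are illustrative names for the two events:

```python
import pandas as pd

# The five rows printed by df.head() above, reused so the example runs without the CSV.
df = pd.DataFrame({
    "loan_amnt": [10000.0, 8000.0, 15600.0, 7200.0, 24375.0],
    "revol_bal": [36369.0, 20131.0, 11987.0, 5472.0, 24584.0],
})

a = df["loan_amnt"] <= 10000  # event A: loan amount at or below 10,000
b = df["revol_bal"] <= 20000  # event B: revolving balance at or below 20,000

# Approach 1 (as in the notebook): restrict to rows where A holds, then measure B.
subset = df[a]
p_b_given_a_filter = (subset["revol_bal"] <= 20000).mean()

# Approach 2: definition of conditional probability, P(B | A) = P(A and B) / P(A).
p_a_and_b = (a & b).mean()
p_a = a.mean()
p_b_given_a_formula = p_a_and_b / p_a

print(round(p_b_given_a_filter, 4), round(p_b_given_a_formula, 4))  # 0.3333 0.3333
```

Applied to the full-dataset results above, the same identity implies P(loan amount at or below N10,000) = 0.3708 / 0.8936 ≈ 0.415, i.e. roughly 41.5% of borrowers, since both results count the same group of borrowers who satisfy both conditions.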