# Conditional Probability Demonstration In Python

### Using Employee Churn Data

## Background :
A company has a comprehensive database of its past and present workforce, with information on their demographics, education, experience and hiring background, as well as their work profile. The management wishes to see whether this data can be used for predictive analysis to control attrition levels. In this case, however, we will use the data only to illustrate conditional probabilities.

- Sample size is 83
- Gender, Experience Level (<3, 3-5 and >5 years), Function (Marketing, Finance, Client Servicing (CS)) and Source (Internal or External) are independent variables
- Churn is the dependent variable (= 1 if the employee left within 18 months of the joining date)

## Data Description :

| Column Name | Description                                             |
| ----------- | ------------------------------------------------------- |
| `churn`     | 1 = Employee left within 18 months, 0 = Still retained  |
| `function`  | Department: Marketing / Finance / Client Servicing (CS) |
| `exp`       | Experience group: `<3`, `>=3 and <=5`, `>5` years       |
| `gender`    | `M` (Male) / `F` (Female)                               |
| `source`    | `internal` or `external` hiring                         |


### Import Libraries & Load Data

In [18]:
import pandas as pd

In [19]:
df = pd.read_csv('EMPLOYEE CHURN DATA.csv')
df.head()

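|     | sn  | churn | function | exp         | gender | source   |
| --- | --- | ----- | -------- | ----------- | ------ | -------- |
| 0   | 1   | 1     | CS       | <3          | M      | external |
| 1   | 2   | 1     | CS       | <3          | M      | external |
| 2   | 3   | 1     | CS       | >=3 and <=5 | M      | internal |
| 3   | 4   | 1     | CS       | >=3 and <=5 | F      | internal |
| 4   | 5   | 1     | CS       | <3          | M      | internal |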


### 1. Unconditional Probability of Churn

In [20]:
# Relative frequency of each churn value (0 = retained, 1 = left) across all employees
P_churn = df['churn'].value_counts(normalize=True).round(3)
print("Unconditional Probability of Churn:", P_churn)

Unconditional Probability of Churn: churn
0    0.602
1    0.398
Name: proportion, dtype: float64


#### Inference : Therefore, the probability of churn is 0.398

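Because `churn` is coded as 0/1, the same probability can also be read off as a single number. The sketch below uses a small hypothetical DataFrame named `toy` (made-up values, not the employee churn file) to show that the column mean and the normalized value count for label `1` agree.

```python
import pandas as pd

# Hypothetical toy data (NOT the employee churn file): 8 employees, 3 of whom left
toy = pd.DataFrame({"churn": [1, 1, 0, 0, 1, 0, 0, 0]})

# Mean of a 0/1 column = share of ones = P(churn = 1)
p_churn_mean = toy["churn"].mean()

# Same number taken from the normalized value counts, selecting the label 1
p_churn_vc = toy["churn"].value_counts(normalize=True).loc[1]

print(p_churn_mean, p_churn_vc)  # both are 3/8 = 0.375
```

On the real data, replacing `toy` with `df` should return the scalar 0.398 (after rounding) rather than a two-row Series.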
### 2. Conditional Probability using Pandas
#### Next, we will find the probability of employee churn given that the employee is 'Male'
#### P(churn = 1 | gender = M)

In [21]:
# Filter Male employees
male_df = df[df["gender"] == "M"]

P_churn_given_M = male_df["churn"].value_counts(normalize = True).round(3)
print("P(churn | Gender = M):", P_churn_given_M)

P(churn | Gender = M): churn
0    0.587
1    0.413
Name: proportion, dtype: float64


#### Inference : Therefore, the probability of churn given that the employee is 'Male' is 0.413

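Filtering to male employees and then measuring churn is one way of applying the definition P(A | B) = P(A and B) / P(B). As a rough check, the sketch below computes both versions on a hypothetical `toy` DataFrame (made-up values, not the employee churn file) and confirms they match.

```python
import pandas as pd

# Hypothetical toy data (NOT the employee churn file)
toy = pd.DataFrame({
    "churn":  [1, 1, 0, 0, 1, 0, 0, 0],
    "gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
})

is_male = toy["gender"] == "M"
left = toy["churn"] == 1

# Definition: P(churn = 1 | M) = P(churn = 1 and M) / P(M)
p_m = is_male.mean()                      # 4/8
p_churn_and_m = (left & is_male).mean()   # 2/8
p_by_definition = p_churn_and_m / p_m     # 0.5

# Filter approach: restrict to males, then take the share of churners
p_by_filter = toy.loc[is_male, "churn"].mean()  # 2/4 = 0.5

print(p_by_definition, p_by_filter)
```

The filter approach is usually the more convenient one in pandas; the definition is shown only to make the link to the formula explicit.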
### 3. Conditional Probability using pandas.crosstab

#### P(churn = 1 | gender = M)

In [22]:
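# normalize="columns" divides each column by its total, so each gender column holds P(churn | gender)
ct = pd.crosstab(df["churn"], df["gender"], normalize="columns").round(3)
ct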

| churn | gender = F | gender = M |
| ----- | ---------- | ---------- |
| 0     | 0.622      | 0.587      |
| 1     | 0.378      | 0.413      |

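In the crosstab, P(churn = 1 | gender = M) is the cell in row `1` of column `M` (0.413 in the output above). The sketch below, on a hypothetical `toy` DataFrame (made-up values, not the employee churn file), shows one way to pull out that single cell with `.loc`, and uses `margins=True` on the raw counts to see how many employees sit behind each column.

```python
import pandas as pd

# Hypothetical toy data (NOT the employee churn file)
toy = pd.DataFrame({
    "churn":  [1, 1, 0, 0, 1, 0, 0, 0],
    "gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
})

# Column-normalized crosstab: every gender column sums to 1
ct_toy = pd.crosstab(toy["churn"], toy["gender"], normalize="columns")

# Single conditional probability: row label 1 (churned), column label "M"
p_churn_given_m = ct_toy.loc[1, "M"]  # 2 of 4 males churned -> 0.5

# Raw counts with totals, to see the group sizes behind each probability
counts = pd.crosstab(toy["churn"], toy["gender"], margins=True)

print(p_churn_given_m)
print(counts)
```

With `normalize="index"` instead, each row would sum to 1 and the same cell would read as P(gender = M | churn = 1), which is a different quantity.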

### 4. Conditional Probability with Two Conditions using Pandas
#### Next, we will find the probability of churn given that the employee is 'Male' and has '>5' years of experience
#### P(churn = 1 | gender = M AND exp >5)

In [23]:
# Keep only male employees with more than 5 years of experience
male_exp_df = df[(df["gender"] == "M") & (df["exp"] == ">5")]

P_churn_given_M_exp5 = male_exp_df["churn"].value_counts(normalize=True).round(3)
print("P(churn | Gender = M & Exp >5):", P_churn_given_M_exp5)

P(churn | Gender = M & Exp >5): churn
0    0.938
1    0.062
Name: proportion, dtype: float64


#### Inference : Therefore, the probability of churn given that the employee is 'Male' with '>5' years of experience is 0.062

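Conditioning on two variables shrinks the subgroup, so it can help to see the group size next to each probability. The sketch below is one possible way to get every gender and experience combination at once with `groupby`; the `toy` DataFrame and the output column names `p_churn` and `n` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data (NOT the employee churn file)
toy = pd.DataFrame({
    "churn":  [1, 1, 0, 0, 1, 0, 0, 0],
    "gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
    "exp":    ["<3", "<3", ">5", ">5", "<3", ">5", "<3", ">5"],
})

# For each (gender, exp) group: share of churners and number of employees
summary = toy.groupby(["gender", "exp"])["churn"].agg(p_churn="mean", n="size")

# P(churn = 1 | gender = M and exp > 5) for the toy data
p_m_exp5 = summary.loc[("M", ">5"), "p_churn"]

print(summary)
print(p_m_exp5)
```

A probability estimated from a handful of employees is much less stable than one estimated from the full sample, which is why the `n` column is kept alongside `p_churn`.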

### Conditional Probability Summary Table

| **Sr. No.** | **Scenario**                        | **Method Used**                  | **Python Code Line**                                                                  | **Probability Output (example data)** |
|------------:|-------------------------------------|----------------------------------|---------------------------------------------------------------------------------------|--------------------------------------:|
| 1           | Unconditional probability of churn  | Normalized value counts          | `df['churn'].value_counts(normalize=True)`                                            | 0.398                                 |
| 2           | P(churn = 1 \| Gender = M)          | Subset + normalized value counts | `df[df['gender']=='M']['churn'].value_counts(normalize=True)`                         | 0.413                                 |
| 3           | P(churn = 1 \| Gender = M)          | Crosstab (gender in columns)     | `pd.crosstab(df['churn'], df['gender'], normalize='columns').loc[1, 'M']`             | 0.413                                 |
| 4           | P(churn = 1 \| Gender = M & Exp >5) | Subset + normalized value counts | `df[(df['gender']=='M') & (df['exp']=='>5')]['churn'].value_counts(normalize=True)`   | 0.062                                 |
