# **Table of Contents**
* [Exploration](#section_a)
    <br><br>
* [LoR - Stat approach](#section_b)
    * [Logit](#section_21)
    * [Drop insig cols](#section_2)
    * [Re-logit](#section_3)
    * [Coeff interpret](#section_4)

In [1]:
import numpy             as np
import pandas            as pd
import matplotlib.pyplot as plt
import seaborn           as sns

import warnings
warnings.simplefilter ("ignore")

<a id='section_a'></a>
# Part 1 - **Exploration**

In [2]:
df  =  pd.read_csv ('datasets/US Heart Patients.csv')

df.sample(4)

Unnamed: 0,male,age,education,currentSmoker,cigsPerDay,BPMeds,prevalentStroke,prevalentHyp,diabetes,totChol,sysBP,diaBP,BMI,heartRate,glucose,TenYearCHD
2678,0,47,3.0,1,9.0,0.0,0,0,0,250.0,98.0,73.0,24.39,60.0,88.0,0
506,1,57,4.0,0,0.0,0.0,0,1,0,303.0,160.5,98.5,25.84,81.0,100.0,0
1310,1,50,4.0,1,15.0,0.0,0,0,0,212.0,132.0,87.0,25.9,75.0,83.0,0
1555,0,38,2.0,1,12.0,0.0,0,0,0,209.0,122.5,76.5,24.51,90.0,73.0,0


> Note :
        
        * TenYearCHD is the target
        * CHD - Coronary Heart Disease

In [3]:
df.shape

(4240, 16)

In [4]:
df.isnull().sum()

male                 0
age                  0
education          105
currentSmoker        0
cigsPerDay          29
BPMeds              53
prevalentStroke      0
prevalentHyp         0
diabetes             0
totChol             50
sysBP                0
diaBP                0
BMI                 19
heartRate            1
glucose            388
TenYearCHD           0
dtype: int64

In [5]:
# Forward-fill missing values (fillna(method=...) is deprecated in recent pandas)
df.ffill(inplace=True)

<a id='section_b'></a>
# Part 2 - **LoR : Stat Approach**

In [6]:
x  =  df.drop('TenYearCHD', axis=1)
y  =  df['TenYearCHD']

<a id='section_21'></a>
## 1. **Logit**

In [7]:
import statsmodels.api as sm

x     =  sm.add_constant(x)

model =  sm.Logit(y, x).fit()

model.summary()


Optimization terminated successfully.
         Current function value: 0.378831
         Iterations 7


| | | | |
|---|---|---|---|
| Dep. Variable: | TenYearCHD | No. Observations: | 4240 |
| Model: | Logit | Df Residuals: | 4224 |
| Method: | MLE | Df Model: | 15 |
| Date: | Sun, 27 Jun 2021 | Pseudo R-squ.: | 0.1107 |
| Time: | 22:38:41 | Log-Likelihood: | -1606.2 |
| converged: | True | LL-Null: | -1806.1 |
| Covariance Type: | nonrobust | LLR p-value: | 7.904e-76 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -7.9749 | 0.659 | -12.106 | 0.000 | -9.266 | -6.684 |
| male | 0.5094 | 0.100 | 5.076 | 0.000 | 0.313 | 0.706 |
| age | 0.0620 | 0.006 | 9.972 | 0.000 | 0.050 | 0.074 |
| education | -0.0156 | 0.046 | -0.343 | 0.731 | -0.105 | 0.074 |
| currentSmoker | 0.0139 | 0.144 | 0.097 | 0.923 | -0.268 | 0.296 |
| cigsPerDay | 0.0209 | 0.006 | 3.681 | 0.000 | 0.010 | 0.032 |
| BPMeds | 0.2311 | 0.219 | 1.054 | 0.292 | -0.198 | 0.661 |
| prevalentStroke | 0.9706 | 0.441 | 2.199 | 0.028 | 0.106 | 1.836 |
| prevalentHyp | 0.2346 | 0.128 | 1.828 | 0.068 | -0.017 | 0.486 |


> Comments

        * LLR p-value = 7.9e-76, so the model as a whole is significant
        * Insignificant columns: education, currentSmoker, BPMeds, diabetes, diaBP, BMI, heartRate
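
The list of insignificant columns can also be derived from the `P>|z|` column instead of being read off by eye. The sketch below does this for the rows visible in the summary only. The cutoff `ALPHA = 0.10` is an assumption, since the notebook does not state one; it is picked here because `prevalentHyp` (p = 0.068) is kept in the next step.

```python
# P>|z| values copied from the rows visible in the summary above.
p_values = {
    "male": 0.000, "age": 0.000, "education": 0.731,
    "currentSmoker": 0.923, "cigsPerDay": 0.000, "BPMeds": 0.292,
    "prevalentStroke": 0.028, "prevalentHyp": 0.068,
}

ALPHA = 0.10  # assumed cutoff, not stated in the notebook

# Keep the names of predictors whose p-value exceeds the cutoff.
insig_cols = [col for col, p in p_values.items() if p > ALPHA]
print(insig_cols)  # ['education', 'currentSmoker', 'BPMeds']

# With the fitted model, the same filter can be applied to model.pvalues:
#   model.pvalues[model.pvalues > ALPHA].index.tolist()
```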

<a id='section_2'></a>
## 2. **Drop insig Cols**

In [8]:
x = x.drop(columns=['education', 'currentSmoker', 'BPMeds', 'diabetes',
                    'diaBP', 'BMI', 'heartRate'])

<a id='section_3'></a>
## 3. **Re-Logit**

In [9]:
model  =  sm.Logit(y, x).fit()

model.summary()


Optimization terminated successfully.
         Current function value: 0.379175
         Iterations 7


| | | | |
|---|---|---|---|
| Dep. Variable: | TenYearCHD | No. Observations: | 4240 |
| Model: | Logit | Df Residuals: | 4231 |
| Method: | MLE | Df Model: | 8 |
| Date: | Sun, 27 Jun 2021 | Pseudo R-squ.: | 0.1099 |
| Time: | 22:38:41 | Log-Likelihood: | -1607.7 |
| converged: | True | LL-Null: | -1806.1 |
| Covariance Type: | nonrobust | LLR p-value: | 9.05e-81 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -8.3476 | 0.479 | -17.437 | 0.000 | -9.286 | -7.409 |
| male | 0.5068 | 0.098 | 5.158 | 0.000 | 0.314 | 0.699 |
| age | 0.0632 | 0.006 | 10.583 | 0.000 | 0.051 | 0.075 |
| cigsPerDay | 0.0210 | 0.004 | 5.476 | 0.000 | 0.014 | 0.029 |
| prevalentStroke | 1.0220 | 0.438 | 2.334 | 0.020 | 0.164 | 1.880 |
| prevalentHyp | 0.2361 | 0.125 | 1.881 | 0.060 | -0.010 | 0.482 |
| totChol | 0.0017 | 0.001 | 1.702 | 0.089 | -0.000 | 0.004 |
| sysBP | 0.0135 | 0.003 | 5.060 | 0.000 | 0.008 | 0.019 |
| glucose | 0.0068 | 0.002 | 4.388 | 0.000 | 0.004 | 0.010 |


> Comments
    
        * LLR p-value decreases from 7.9e-76 to 9.05e-81
        * Overall model is more significant, with almost no loss of fit (Pseudo R-squ. 0.1107 vs 0.1099)
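
Each LLR p-value compares its own model with the intercept-only model, so the two values do not compare the full and reduced fits with each other directly. One way to check that the 7 dropped columns were not contributing is a likelihood-ratio test between the two nested models. The sketch below uses only the log-likelihoods and `Df Model` values printed in the two summaries. The critical value is the standard chi-square table entry for 7 degrees of freedom at the 5 % level, and the variable names are illustrative.

```python
# Values copied from the two summaries above.
ll_full, df_full = -1606.2, 15        # model with all predictors
ll_reduced, df_reduced = -1607.7, 8   # model after dropping 7 columns

# Likelihood-ratio statistic for nested models.
lr_stat = 2 * (ll_full - ll_reduced)
df_diff = df_full - df_reduced

# 95th percentile of the chi-square distribution with 7 df (table value).
CHI2_CRIT_7DF = 14.067

print(f"LR statistic = {lr_stat:.1f} on {df_diff} df")
print("dropped columns improve the fit significantly:", lr_stat > CHI2_CRIT_7DF)

# If SciPy is available, an exact p-value is scipy.stats.chi2.sf(lr_stat, df_diff).
```

The statistic (about 3.0) is well below the critical value, which is consistent with dropping the columns.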

<a id='section_4'></a>
## 4. **Coeff Interpret**

In [10]:
model.params

const             -8.347633
male               0.506822
age                0.063176
cigsPerDay         0.021029
prevalentStroke    1.021954
prevalentHyp       0.236061
totChol            0.001731
sysBP              0.013503
glucose            0.006779
dtype: float64

> Comments
    
        *  OBS for a Num Col
        
              * When age incs by 1 unit ..... Log (Odds) incs by 0.063 unit 
              * When age incs by 1 unit ..... Odds of having CHD are multiplied by exp(0.063) ≈ 1.065 | incs by about 6.5 %
              
        *  OBS for a Cat Col
        
               * When Gender = 'Male' ..... Log (Odds) incs by 0.51 unit
               * When Gender = 'Male' ..... Odds of having CHD are multiplied by exp(0.507) ≈ 1.66 | incs by about 66 %
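
A logit coefficient is a change in log-odds, so the effect on the odds themselves comes from exponentiating it, not from reading the coefficient as a percentage. A minimal sketch, using the coefficients printed by `model.params` above (the dictionary and variable names are illustrative):

```python
import math

# Coefficients copied from the model.params output above (const omitted).
params = {
    "male": 0.506822, "age": 0.063176, "cigsPerDay": 0.021029,
    "prevalentStroke": 1.021954, "prevalentHyp": 0.236061,
    "totChol": 0.001731, "sysBP": 0.013503, "glucose": 0.006779,
}

# Odds ratio = exp(coefficient): the factor by which the odds change
# for a 1-unit increase, holding the other predictors fixed.
odds_ratios = {name: math.exp(coef) for name, coef in params.items()}

for name, ratio in odds_ratios.items():
    pct_change = (ratio - 1) * 100
    print(f"{name:<16} OR = {ratio:.3f}  ({pct_change:+.1f} % odds)")

# With the fitted model this is simply np.exp(model.params).
```

For small coefficients such as `age`, the exponentiated value stays close to the raw coefficient (6.5 % vs 6.3 %); for larger ones such as `male`, the gap is substantial (about 66 % vs 50 %).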