The following analysis uses data from a maternity study to draw meaningful, statistically grounded conclusions.

Developed by Ahmed Kayal

#### Package imports

In [1]:
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

<br>

#### File read

In [2]:
maternity_info = pd.read_csv("/Users/ahmed/Desktop/lowbwt.csv")
maternity_info.head()

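|   | STUDYID | LOW | AGE | LWT | RACE | SMOKE | PTL | HT | UI | FTV | BWT |
|---|---------|-----|-----|-----|------|-------|-----|----|----|-----|------|
| 0 | 85 | 0 | 19 | 182 | 2 | 0 | 0 | 0 | 1 | 0 | 2523 |
| 1 | 86 | 0 | 33 | 155 | 3 | 0 | 0 | 0 | 0 | 3 | 2551 |
| 2 | 87 | 0 | 20 | 105 | 1 | 1 | 0 | 0 | 0 | 1 | 2557 |
| 3 | 88 | 0 | 21 | 108 | 1 | 1 | 0 | 0 | 1 | 2 | 2594 |
| 4 | 89 | 0 | 18 | 107 | 1 | 1 | 0 | 0 | 1 | 0 | 2600 |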


In [3]:
maternity_info.columns

Index(['STUDYID', 'LOW', 'AGE', 'LWT', 'RACE', 'SMOKE', 'PTL', 'HT', 'UI',
       'FTV', 'BWT'],
      dtype='object')

*Data dictionary*


* STUDYID: Subject ID number <br>
* LOW: Low birth weight indicator, 0=no, 1=yes <br>
* AGE: Age of mother, years <br>
* LWT: Weight of mother at last menstrual period, pounds <br>
* RACE: Race, 1=white, 2=black, 3=other <br>
* SMOKE: Mother's smoking status during pregnancy, 0=no, 1=yes <br>
* PTL: History of premature labor, 0=none, 1=one, etc. <br>
* HT: History of hypertension, 0=no, 1=yes <br>
* UI: Presence of uterine irritability, 0=no, 1=yes <br>
* FTV: Number of physician visits during first trimester, 0=none, 1=one, etc. <br>
* BWT: Birth weight, grams <br>
<br>


**Interest**

The goal of this analysis is to determine the risk factors that are associated with low birth weight. To do so, I'll look at each feature's odds ratio and its associated p-value.
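
As a reminder of the quantity being estimated, here is the single-feature logistic regression model in standard textbook notation (the symbols $p$, $x$, $\beta_0$ and $\beta_1$ are introduced here for illustration and do not appear elsewhere in this notebook):

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1 x, \qquad p = P(\text{LOW} = 1 \mid x), \qquad \text{OR} = e^{\beta_1}
$$

Under this model, a one-unit increase in $x$ multiplies the odds $p/(1-p)$ by $e^{\beta_1}$. An odds ratio above 1 therefore indicates higher odds of low birth weight, and an odds ratio below 1 indicates lower odds.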

#### Subsetting dataframe to the features of interest for my analysis

In [4]:
relevant_columns = ["AGE", "LWT", "SMOKE", "PTL", "HT", "UI", "FTV"]
target = maternity_info.LOW


In [5]:
# Fit a single-feature logistic regression per column and report its odds ratio and p-value

for col in relevant_columns:
    # Odds ratio: exponentiated scikit-learn coefficient.
    # Note: LogisticRegression applies L2 regularization by default (C=1.0),
    # so this estimate is slightly shrunk relative to an unpenalized fit.
    clf = LogisticRegression(solver='lbfgs').fit(maternity_info[[col]], target)
    print(f"{col} Odds ratio value: {round(np.exp(clf.coef_)[0][0], 5)}")

    # p-value: Wald test on the coefficient from an unpenalized binomial GLM
    model = sm.formula.glm(f"LOW ~ {col}", family=sm.families.Binomial(), data=maternity_info).fit()
    print(f"Associated p-value: {round(model.pvalues[col], 3)} \n")


AGE Odds ratio value: 0.95018
Associated p-value: 0.105 

LWT Odds ratio value: 0.98604
Associated p-value: 0.023 

SMOKE Odds ratio value: 1.89433
Associated p-value: 0.028 

PTL Odds ratio value: 2.07393
Associated p-value: 0.011 

HT Odds ratio value: 2.43446
Associated p-value: 0.046 

UI Odds ratio value: 2.24072
Associated p-value: 0.023 

FTV Odds ratio value: 0.87644
Associated p-value: 0.389 
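
For the binary features (SMOKE, HT, UI), an unpenalized single-feature odds ratio can be cross-checked by hand from a 2x2 table of feature against LOW. The following is a minimal sketch using only the standard library. The helper name `odds_ratio_2x2` and the cell counts are hypothetical and are not taken from `lowbwt.csv`; the real counts could be obtained with something like `pd.crosstab(maternity_info.SMOKE, maternity_info.LOW)`.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval and p-value.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    # Cross-product ratio: odds among exposed divided by odds among unexposed
    odds_ratio = (a * d) / (b * c)

    # Standard error of log(OR), Woolf's formula
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    # Wald interval is built on the log scale, then exponentiated
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)

    # Two-sided p-value from the standard normal distribution
    z_stat = log_or / se_log_or
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# Hypothetical counts for illustration only (NOT from the study data)
odds_ratio, lower, upper, p_value = odds_ratio_2x2(a=20, b=30, c=10, d=40)
print(f"OR = {odds_ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), p = {p_value:.3f}")
```

For a binary feature, the value from this cross-check should agree closely with the unpenalized GLM estimate, so a noticeable gap from the scikit-learn figure would point to the effect of its default regularization.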



**Conclusion**

From the above output, the risk factors that have a statistically significant association with low birth weight (p < 0.05) are:
* Weight of the mother at her last menstrual period
* Smoking during pregnancy
* History of premature labor
* History of hypertension
* Presence of uterine irritability

Age of the mother (p = 0.105) and the number of first-trimester physician visits (p = 0.389) do not show a statistically significant association.

Additionally, the odds ratios support several meaningful conclusions. Each one comes from a single-feature model, so it is not adjusted for the other features:
* Each one-pound increase in the mother's weight at her last menstrual period multiplies the odds of having a baby with a low birth weight by 0.986, a decrease of about 1.4%.
* The odds of giving birth to a baby with a low birth weight are 1.89 times as high for women who smoke during pregnancy as for women who do not.
* Each additional prior premature labor multiplies the odds of giving birth to a baby with a low birth weight by 2.07.
* The odds of having a baby with a low birth weight are 2.43 times as high for women who have a history of hypertension.
* Finally, the odds of giving birth to a baby with a low birth weight are 2.24 times as high for women with uterine irritability.
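
An odds ratio is often easier to read as a percent change in the odds. The short sketch below does that conversion for odds ratios printed in the output above; the helper name `percent_change_in_odds` is hypothetical, and the `units` argument assumes the log-odds are linear in the feature, as the logistic model does.

```python
def percent_change_in_odds(odds_ratio, units=1):
    """Percent change in the odds for a `units`-sized increase in the feature."""
    return (odds_ratio ** units - 1) * 100


lwt_or = 0.98604    # LWT odds ratio from the output above
smoke_or = 1.89433  # SMOKE odds ratio from the output above

# Per one-pound and per ten-pound increase in the mother's weight
print(f"LWT, +1 lb:  {percent_change_in_odds(lwt_or):.2f}%")
print(f"LWT, +10 lb: {percent_change_in_odds(lwt_or, units=10):.2f}%")

# Smokers versus non-smokers
print(f"SMOKE:       {percent_change_in_odds(smoke_or):.2f}%")
```

Note that these are changes in the odds, not in the probability, of low birth weight. The two only coincide approximately when the outcome is rare.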

<br>