<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Overview" data-toc-modified-id="Overview-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Overview</a></span></li><li><span><a href="#Data-Exploration" data-toc-modified-id="Data-Exploration-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Data Exploration</a></span></li><li><span><a href="#Data-Preprocessing" data-toc-modified-id="Data-Preprocessing-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data Preprocessing</a></span><ul class="toc-item"><li><span><a href="#Data-Visualization" data-toc-modified-id="Data-Visualization-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Data Visualization</a></span><ul class="toc-item"><li><span><a href="#Demographics-Visualization" data-toc-modified-id="Demographics-Visualization-3.1.1"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>Demographics Visualization</a></span></li><li><span><a href="#Ethnicity-and-Race-Visualziation" data-toc-modified-id="Ethnicity-and-Race-Visualziation-3.1.2"><span class="toc-item-num">3.1.2&nbsp;&nbsp;</span>Ethnicity and Race Visualization</a></span></li><li><span><a href="#Feeding-Type-Visualization" data-toc-modified-id="Feeding-Type-Visualization-3.1.3"><span class="toc-item-num">3.1.3&nbsp;&nbsp;</span>Feeding Type Visualization</a></span></li><li><span><a href="#Transforming-Feeding-Type-Data" data-toc-modified-id="Transforming-Feeding-Type-Data-3.1.4"><span class="toc-item-num">3.1.4&nbsp;&nbsp;</span>Transforming Feeding Type Data</a></span></li><li><span><a href="#Initial-Feeding-Visualization" data-toc-modified-id="Initial-Feeding-Visualization-3.1.5"><span class="toc-item-num">3.1.5&nbsp;&nbsp;</span>Initial Feeding Visualization</a></span></li></ul></li></ul></li><li><span><a href="#Initial-Feeding-ECDF" data-toc-modified-id="Initial-Feeding-ECDF-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Initial Feeding ECDF</a></span></li></ul></div>

# Data Exploration of Breastfeeding Data
Analysis done by: Brian Naoe
Contact Email: bp.naoe@me.com

A short analysis of breastfeeding data to gain statistical insights.

<a id='Overview'></a>
## Overview

Early initiation of breastfeeding, within one hour of birth, is recommended by the WHO. Although it is one of the core indicators for assessing infant and young child feeding practices, it is a far from universal practice. Data from 2002-2005 show that 46 low- and middle-income countries (LMIC) had included early initiation of breastfeeding in their Demographic and Health Surveys. Of these, 54% recorded that less than half of all newborns were put to the breast within an hour of birth. Furthermore, no country had more than 80% of babies breastfeeding within an hour of birth. Global estimates are that less than half (42%) of all newborns are put to the breast within the first hour of birth.

Reference: Shrimpton, R. (2017). Early initiation of breastfeeding. Retrieved from https://www.who.int/elena/titles/commentary/early_breastfeeding/en/


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

## Data Exploration

Data from 2017 were used for this analysis. The dataset contains information about the breastfeeding events recorded during each mother's admission.

In [None]:
raw_data = pd.read_csv('Breast_feed.csv')
df = raw_data.copy()
df.head()

## Data Preprocessing
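Rows with the same event date and time (`event_dt_tm`) for the same encounter (`pid`) are grouped into a single row; `first()` keeps the first non-null value of each column within the group. Each encounter has multiple rows, covering the multiple breastfeeding events recorded during the admission.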

In [None]:
df_group = df.groupby(['pid','event_dt_tm']).first().reset_index()
df_group.head()
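
To make the grouping step concrete, here is a minimal sketch on a hypothetical toy frame. The values are invented; only the column names `pid`, `event_dt_tm`, `newborn_feeding_type` and `birth_to_event_hrs` come from this notebook. It shows that `first()` merges rows sharing a (`pid`, `event_dt_tm`) key and keeps the first non-null value of each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: two documentation rows share the same pid and
# event_dt_tm, and each of them fills in a different column.
toy = pd.DataFrame({
    'pid': [1, 1, 1, 2],
    'event_dt_tm': ['2017-01-01 10:00', '2017-01-01 10:00',
                    '2017-01-01 13:00', '2017-01-02 08:00'],
    'newborn_feeding_type': [np.nan, 'Breast milk', 'Formula', 'Breast milk'],
    'birth_to_event_hrs': [0.5, np.nan, 3.5, 2.0],
})

# first() keeps, per group, the first non-null value of each column,
# so the two 10:00 rows for pid 1 collapse into one complete row.
toy_group = toy.groupby(['pid', 'event_dt_tm']).first().reset_index()
print(toy_group)
```

The result has three rows: the two 10:00 rows for pid 1 become one row with feeding type `Breast milk` and 0.5 hours.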

In [None]:
# Investigate null values in each column
df_group.info()

### Data Visualization

#### Demographics Visualization

In [None]:
# One row per pid (first non-null value of each column) for the demographic plots
df_group_mrn = df.groupby('pid').first().reset_index()
df_group_mrn

<a id='Ethnicity-and-Race-Visualziation'></a>
#### Ethnicity and Race Visualization

In [None]:
df_ethnicity = df_group_mrn[['ethnicity','race']]
df_ethnicity.head()

In [None]:
# Count patients by ethnicity, with bars split by race
plt.figure(figsize=(20,5))
sns.countplot(x='ethnicity', hue='race', data=df_ethnicity, edgecolor=(0,0,0), palette='pastel')
# Rotate x-axis tick labels so long category names do not overlap
plt.xticks(rotation=-45)
plt.show()

#### Feeding Type Visualization

In [None]:
# looking at unique values for feeding type
df_group['newborn_feeding_type'].unique()

In [None]:
# Number of non-null feeding-type entries
df_group['newborn_feeding_type'].count()

#### Transforming Feeding Type Data
The feeding-type column has many recurring unique values that can be consolidated into a few feeding types. This is because users document the different combinations of feeding types that mothers use during feeding.
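
The consolidation relies on case-insensitive substring matching, so the order of the keys matters: a shorter key such as `Breast milk, Other:` also matches `Fortified breast milk, Other:`. Below is a minimal sketch of the idea on hypothetical raw strings (not taken from the dataset). The helper `consolidate` is also hypothetical; unlike a loop that re-applies every key to already-mapped values, it lets each raw value match at most one key, the first in dict order.

```python
import pandas as pd

# Hypothetical raw feeding-type strings in the style of the documented values.
raw = pd.Series(['Breast milk, Other: donor', 'Fortified breast milk, Other: HMF',
                 'Formula, Fortified breast milk, Other: x', 'SIMILAC 20',
                 'Breast milk', None])

# Ordered mapping: a key that is a (case-insensitive) substring of another key
# must come after it, otherwise it would capture the longer value first.
mapping = {
    'Formula, Fortified breast milk, Other:': 'Formula, Fortified breast milk',
    'Fortified breast milk, Other:': 'Fortified breast milk',
    'Breast milk, Other:': 'Breast milk',
    'SIMILAC': 'Similac',
}

def consolidate(values, mapping):
    """Replace each value with the target of the first key it contains."""
    out = values.copy()
    done = pd.Series(False, index=values.index)
    for key, target in mapping.items():
        # na=False treats missing values as non-matching
        hit = values.str.contains(key, case=False, regex=False, na=False) & ~done
        out[hit] = target
        done |= hit
    return out

result = consolidate(raw, mapping).tolist()
print(result)
```

Values that match no key (such as plain `Breast milk`) and missing values pass through unchanged.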

In [None]:
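# Map recurring free-text feeding-type values onto consolidated categories.
# Keys are matched as case-insensitive substrings in dict order, so a key must
# come after any longer key that contains it (e.g. 'Fortified breast milk, Other:'
# before 'Breast milk, Other:'); otherwise the shorter key captures those values.
dic = {'Breast milk, Formula,':'Breast milk, Formula',
       'Formula, Fortified breast milk, Other:':'Formula, Fortified breast milk',
       'Fortified breast milk, Other:':'Fortified breast milk',
       'Breast milk, Other:':'Breast milk',
       'SIMILAC':'Similac', 'Formula, Other:':'Formula', 'Other':'Other'}

df_feed = df_group.copy()

for k, v in dic.items():
    # na=False keeps missing values from being counted as matches
    match = df_feed['newborn_feeding_type'].str.contains(k, case=False, regex=False, na=False)
    df_feed.loc[match, 'newborn_feeding_type'] = v

# Count how often each consolidated feeding type occurs
df_unique_feed = df_feed['newborn_feeding_type'].value_counts()
df_unique_feed = df_unique_feed.rename_axis('unique_values').reset_index(name='counts')
df_unique_feed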

In [None]:
plt.figure(figsize=(20,5))
sns.countplot(x='newborn_feeding_type', data=df_feed, edgecolor=(0,0,0), palette='pastel')
# Rotate xticks
plt.xticks(rotation=-45)
plt.show()

#### Initial Feeding Visualization
We want to see what percentage of newborns have their first feeding within 1 hour of birth.

In [None]:
# Use a (pid, event_dt_tm) MultiIndex sorted by time, so each pid's first feeding event comes first
df_sort_feed = df_feed.copy()
df_sort_feed['event_dt_tm'] = pd.to_datetime(df_sort_feed['event_dt_tm'])
df_sort_feed.set_index(['pid','event_dt_tm'], drop=True, append=False, inplace=True, verify_integrity=False)
df_sort_feed = df_sort_feed.sort_index()
df_sort_feed

In [None]:
# Take the first (earliest) row per pid to isolate the first feeding per newborn.
# head(1) keeps the original (pid, event_dt_tm) index, so no level needs dropping;
# copy() gives an independent frame for the in-place edits below.
df_first_feed = df_sort_feed.groupby(level=0).head(1).copy()
df_first_feed
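
Sorting a MultiIndex and taking the first row per `pid` is one way to isolate the first feeding. An alternative, sketched here on hypothetical toy events, uses `groupby(...).idxmin()` on `event_dt_tm` to find each pid's earliest row directly, without sorting. This sketch assumes `event_dt_tm` has already been parsed with `pd.to_datetime` and that every pid has at least one non-missing timestamp.

```python
import pandas as pd

# Hypothetical feeding events for two patients, deliberately out of time order.
events = pd.DataFrame({
    'pid': [1, 1, 2, 2, 2],
    'event_dt_tm': pd.to_datetime(['2017-03-01 09:00', '2017-03-01 07:30',
                                   '2017-03-02 12:00', '2017-03-02 10:15',
                                   '2017-03-02 11:00']),
    'birth_to_event_hrs': [3.0, 1.5, 4.0, 2.25, 3.0],
})

# idxmin returns, for each pid, the row label of its earliest event_dt_tm;
# .loc then pulls those rows, giving one first feeding per pid.
first_feed = events.loc[events.groupby('pid')['event_dt_tm'].idxmin()]
print(first_feed)
```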

In [None]:
df_first_feed.info()

In [None]:
# Categorize the first feeding event as <=1 hour, >1 to 4 hours, or >4 hours.
# The middle bin uses > 1 so that exactly 1 hour stays in '<=1 HR'.
df_first_feed.dropna(subset=['birth_to_event_hrs'], inplace=True)
df_first_feed.loc[df_first_feed['birth_to_event_hrs'] <= 1, 'FIRST_FEED'] = '<=1 HR'
df_first_feed.loc[(df_first_feed['birth_to_event_hrs'] > 1) & (df_first_feed['birth_to_event_hrs'] <= 4), 'FIRST_FEED'] = '>1 to 4 HRS'
df_first_feed.loc[df_first_feed['birth_to_event_hrs'] > 4, 'FIRST_FEED'] = '>4 HRS'

df_first_feed
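
The same three categories can also be built in one step with `pd.cut`, swapped in here as an alternative to the chained `.loc` assignments. With right-closed bins, a value of exactly 1 hour falls only in `<=1 HR`, and values of 0 or less fall outside every bin (NaN) instead of being labelled `<=1 HR`. The hours below are hypothetical.

```python
import numpy as np
import pandas as pd

hrs = pd.Series([0.5, 1.0, 1.2, 4.0, 6.5, 30.0])  # hypothetical hours from birth

# right=True (the default) makes each bin closed on the right:
# (0, 1], (1, 4], (4, inf). Values <= 0 are outside every bin and become NaN.
first_feed_cat = pd.cut(hrs, bins=[0, 1, 4, np.inf],
                        labels=['<=1 HR', '>1 to 4 HRS', '>4 HRS'])
print(first_feed_cat.tolist())
```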

In [None]:
# Remove rows where birth_to_event_hrs is zero or negative
df_first_feed = df_first_feed[df_first_feed['birth_to_event_hrs']>0]
df_first_feed.head()

In [None]:
# Visualization using swarmplot showing patient population first feeding in hours grouped by feeding type. 
plt.figure(figsize=(20,10))
sns.swarmplot(x="newborn_feeding_type", y="birth_to_event_hrs", data=df_first_feed)
plt.yticks([0,5,10,15,20,25,30,35,40])
plt.ylabel("FIRST FEED IN HRS")
plt.xticks(rotation=-45)

plt.show()

## Initial Feeding ECDF

In [None]:
# ECDF function
def ecdf(data):

    # Number of data points: n
    n = len(data)

    # x-data for the ECDF: x
    x = np.sort(data)

    # y-data for the ECDF: y
    y = np.arange(1, n+1) / n

    return x, y
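
Because `ecdf` returns the sorted values and their cumulative fractions, the ECDF height at a given number of hours is the fraction of first feedings at or before that time. A minimal sketch with a hypothetical helper `ecdf_at` (not part of the notebook) reads that height off directly with `np.searchsorted`, on made-up hours:

```python
import numpy as np

def ecdf_at(data, t):
    """Fraction of observations <= t, i.e. the ECDF evaluated at t."""
    x = np.sort(np.asarray(data, dtype=float))
    # side='right' counts every value equal to t as <= t
    return np.searchsorted(x, t, side='right') / len(x)

hrs = np.array([0.5, 1.0, 2.0, 3.0, 6.0])  # hypothetical hours from birth
print(ecdf_at(hrs, 1.0))    # 0.4
print(np.mean(hrs <= 1.0))  # 0.4, the same quantity computed directly
```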

In [None]:
# Compute ECDF
x, y = ecdf(df_first_feed['birth_to_event_hrs'])

# Generate plot
plt.figure(figsize=(20,10))
plt.yticks(np.linspace(0, 1, 21))  # y ticks every 0.05
_ = plt.plot(x, y, marker = '.', linestyle = 'none')

# Make the margins
plt.margins(0.02)

_ = plt.xlabel('birth_to_event_hrs')
_ = plt.ylabel('ECDF')


# Specify array of percentiles: percentiles
percentiles = np.array([10, 25, 50, 75, 97.5])

# Compute the corresponding percentiles of the data: ptiles
ptiles = np.percentile(df_first_feed['birth_to_event_hrs'], percentiles)

# Overlay percentiles as red diamonds.
_ = plt.plot(ptiles, percentiles / 100, marker='D', color='red',
             linestyle='none')


# zip pairs each percentile value (px) with its ECDF height (py);
# separate loop names avoid overwriting the ECDF arrays x and y
for px, py in zip(ptiles, percentiles / 100):

    label = "{:.2f}".format(py)

    plt.annotate(label, # this is the text
                 (px, py), # this is the point to label
                 textcoords="offset points", # how to position the text
                 xytext=(0,10), # offset of the text from the point, in points
                 ha='left') # horizontal alignment can be left, right or center
plt.show()

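The ECDF shows that in 2017, about 10% of newborns in this dataset had their first recorded feeding (of any feeding type) within 1 hour of birth, compared with the global estimate of 42% cited in the Overview.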
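
The share of first feedings in each `FIRST_FEED` category can also be read numerically instead of from the plot. Below is a minimal sketch using `value_counts(normalize=True)` on hypothetical labels; the 25% it prints is a property of the toy data, not a result from this dataset.

```python
import pandas as pd

# Hypothetical FIRST_FEED labels for eight newborns
first_feed = pd.Series(['<=1 HR'] * 2 + ['>1 to 4 HRS'] * 4 + ['>4 HRS'] * 2,
                       name='FIRST_FEED')

# normalize=True turns counts into fractions of all non-missing labels
shares = first_feed.value_counts(normalize=True)
print(shares)
print(f"Fed within 1 hour: {shares.get('<=1 HR', 0):.0%}")
```

On the real data, the same call on `df_first_feed['FIRST_FEED']` would give the proportion behind the ECDF reading.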