Vermont Oxford Network (VON) is a nonprofit voluntary collaboration of health care professionals working together as an interdisciplinary community to change the landscape of neonatal care. Founded in 1988, VON now comprises teams of health professionals representing neonatal intensive care units and level I and II care centers around the world. Its mission is to improve the quality and safety of medical care for newborn infants and their families through a coordinated program of research, education, and quality improvement projects. (ref: https://public.vtoxford.org/about-us/)

*Mortality Risk Amongst Very Low Birth Weight Infants Born in the Republic of Ireland* was published by the National Perinatal Epidemiology Centre in 2018 (ref: https://www.ucc.ie/en/media/research/nationalperinatalepidemiologycentre/annualreports/MortalityRiskAmongstVeryLowBirthWeightInfantsintheRepublicofIrelandReport2014-2016.pdf).

This notebook aims to synthesise data that reflects the findings of this report.

# The Variables - Explained

## Gestational Weeks
Gestational age is the common term used during pregnancy to describe how far along the pregnancy is. It is measured in weeks, from the first day of the woman's last menstrual cycle to the current date. A normal pregnancy can range from 38 to 42 weeks, and infants born before 37 weeks are considered premature. In this dataset, values are expected to fall mostly between 22 and 32 weeks.
(ref:https://medlineplus.gov/ency/article/002367.htm)
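The 37-week cutoff above is easy to apply to a column of gestational ages. The following is a minimal sketch, assuming ages are stored as numbers of weeks; `premature_share` is a hypothetical helper name, and the example values are made up for illustration, not taken from the report.

```python
import numpy as np

# Cutoff quoted above: births before 37 weeks are considered premature.
PREMATURE_CUTOFF_WEEKS = 37

def premature_share(weeks):
    """Return the fraction of gestational ages (in weeks) below the premature cutoff."""
    weeks = np.asarray(weeks, dtype=float)
    return float(np.mean(weeks < PREMATURE_CUTOFF_WEEKS))

# Hypothetical example values, not from the report
print(premature_share([24, 28, 31, 36, 38, 40]))
```

For a very low birth weight cohort almost every value should fall below the cutoff, so a share close to 1 is expected.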

## Birthweight
Birth weight is the first weight of a baby, taken just after birth. A low birth weight is less than 5.5 pounds (about 2,500 g), and a high birth weight is more than 8.8 pounds (about 4,000 g). In this dataset, most weights are expected to be well below 5.5 pounds.
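Because the thresholds above are given in pounds while the data is recorded in grams, a small conversion helps. This is a sketch; `pounds_to_grams` and `weight_category` are hypothetical helper names, and the three categories simply mirror the two thresholds quoted above.

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound

def pounds_to_grams(pounds):
    return pounds * GRAMS_PER_POUND

def weight_category(grams):
    """Classify a birth weight in grams using the 5.5 lb and 8.8 lb thresholds."""
    if grams < pounds_to_grams(5.5):
        return "low"
    if grams > pounds_to_grams(8.8):
        return "high"
    return "normal"

print(round(pounds_to_grams(5.5)))  # 2495
print(round(pounds_to_grams(8.8)))  # 3992
print(weight_category(1200))        # low
```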

## Congenital Anomaly 
Congenital anomalies are also known as birth defects, congenital disorders or congenital malformations. Coded as No (0) or Yes (1).
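The simulation in this notebook stores this variable as the strings "Yes" and "No". A minimal sketch of converting those labels to numeric codes with pandas `Series.map`, assuming the coding No = 0, Yes = 1 used in the Congenital Anomaly section of the synthesis; `CA_CODES` is a hypothetical name and the labels are illustrative.

```python
import pandas as pd

# Assumed coding: No = 0, Yes = 1
CA_CODES = {"No": 0, "Yes": 1}

labels = pd.Series(["No", "Yes", "No", "No"])  # hypothetical example values
codes = labels.map(CA_CODES)
print(codes.tolist())  # [0, 1, 0, 0]
```

Any label not in the dictionary becomes `NaN`, which makes typos in the source labels easy to spot.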

## Disposition at 1 Year
Whether the infant survived, recorded one year after birth. There are 4 possible outcomes:
- Died (0)
- Home (1)
- Still Hospitalised (2)
- Unknown (3)
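The simulation code later in this notebook uses the labels "At Home" and "Hospital" rather than "Home" and "Still Hospitalised". Below is a sketch of mapping those labels to the numeric codes listed above. The correspondence between the two sets of names is an assumption, and `DISPOSITION_CODES` is a hypothetical name.

```python
import pandas as pd

# Codes from the list above, keyed by the labels the simulation code uses.
# Assumption: "At Home" = Home, "Hospital" = Still Hospitalised.
DISPOSITION_CODES = {"Died": 0, "At Home": 1, "Hospital": 2, "Unknown": 3}

outcomes = pd.Series(["At Home", "Died", "Hospital", "Unknown", "At Home"])  # hypothetical
coded = outcomes.map(DISPOSITION_CODES)
print(coded.tolist())  # [1, 0, 2, 3, 1]
```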
 



In [1]:
# Create the DataFrame structure, starting with the column headings only
import pandas as pd
import numpy as np

df = pd.DataFrame(columns=["Gestational Weeks", "Birthweight", "Congenital Anomaly", "Disposition at 1 Year"])

In [2]:
df

| | Gestational Weeks | Birthweight | Congenital Anomaly | Disposition at 1 Year |
|---|---|---|---|---|


In [3]:
# Test populating the structure with three placeholder rows
# (illustrative values only, using the numeric codes described above)
data = [[28, 1050, 0, 1], [25, 720, 1, 0], [30, 1340, 0, 2]]
df = pd.DataFrame(data, columns=['Gestational Weeks', 'Birthweight', 'Congenital Anomaly', 'Disposition at 1 Year'])
df

| | Gestational Weeks | Birthweight | Congenital Anomaly | Disposition at 1 Year |
|---|---|---|---|---|
| 0 | 28 | 1050 | 0 | 1 |
| 1 | 25 | 720 | 1 | 0 |
| 2 | 30 | 1340 | 0 | 2 |


In [4]:
# The table above is the basic structure I will use as the foundation for synthesising my dataset

## Gestational Weeks
In this dataset, gestational age ranges from 21 to 33 weeks.


In [31]:
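# Gestational weeks: most babies are born at 28, 29 or 30 weeks.
# Draw 100 values from a normal distribution with mean 28 and standard deviation 3.
# The draw is unbounded, so a few values can fall outside the 21-33 week range.
gw = np.random.normal(28, 3, 100)
gw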

array([21.04099303, 31.86162443, 23.62483107, 28.35040014, 28.18910984,
       31.67013815, 31.0507699 , 26.52863348, 30.34872386, 30.66253344,
       24.85932132, 20.39988942, 26.4900063 , 29.90401596, 25.844784  ,
       31.52347509, 25.79245472, 23.07852362, 27.55198246, 25.64100304,
       27.80875377, 30.349714  , 27.67477326, 26.95906774, 31.63106736,
       28.30790119, 33.55997358, 25.90015051, 28.04582992, 25.7648078 ,
       31.16422338, 26.65238492, 26.98563927, 23.52439689, 26.50725472,
       26.48517865, 26.50079837, 28.26543681, 23.84915476, 26.2476226 ,
       29.16918026, 29.63104024, 31.0103263 , 29.50272221, 29.13432679,
       29.06870948, 29.54366562, 28.22147537, 29.88747809, 28.56165866,
       27.8068829 , 28.46494757, 27.54623039, 29.08595844, 16.96224617,
       29.42305442, 29.19260768, 24.35359976, 29.8641145 , 26.31254372,
       27.99167091, 28.31093776, 32.64644231, 28.52757294, 27.97670894,
       28.44605406, 25.02805231, 25.93805915, 28.00351563, 28.48
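`np.random.normal(28, 3, 100)` is unbounded, so the output above includes values such as 16.96 and 20.40 that lie outside the 21 to 33 week range. Re-running a cell without a seed also produces fresh values, which is why the Gestational Weeks column in the final table does not match this printed array. One option, sketched here as an assumption rather than the report's method, is a truncated normal obtained by rejection sampling (resampling out-of-range draws), using NumPy's `Generator` API with a fixed seed. `sample_gestational_weeks` is a hypothetical name and the seed 42 is arbitrary.

```python
import numpy as np

def sample_gestational_weeks(n, mean=28, sd=3, low=21, high=33, seed=None):
    """Draw n values from a normal distribution truncated to [low, high].

    The defaults mirror the parameters used in this notebook; the
    truncation by rejection sampling is an added assumption.
    """
    rng = np.random.default_rng(seed)
    out = np.empty(0)
    while out.size < n:
        draws = rng.normal(mean, sd, size=n)
        # keep only draws inside the range; the loop tops up the rest
        out = np.concatenate([out, draws[(draws >= low) & (draws <= high)]])
    return out[:n]

gw = sample_gestational_weeks(100, seed=42)
# optional: whole (completed) weeks instead of fractional weeks
gw_whole = np.floor(gw).astype(int)
print(gw.min(), gw.max(), gw.size)
```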

## Birthweight
The birth weights of very low birth weight infants ranged from 360 g to 2,640 g.

In [6]:
# Setting parameters for birthweight: low = 400 g, high = 1500 g
bw = np.random.randint(400, 1500, size=100)  # uniform integers from 400 to 1499 (the upper bound is excluded)
bw

array([ 849,  795, 1280,  559,  810, 1180,  780,  666,  778,  605,  949,
       1385,  534, 1146,  620, 1205,  543, 1371,  818, 1491,  971,  696,
       1200,  412,  437,  676,  466, 1368,  612,  609,  496,  904,  728,
       1459, 1435, 1067,  634, 1190,  449,  608,  701,  907,  786,  494,
        822, 1302, 1277,  778,  762, 1096,  745,  557,  650, 1113,  593,
       1035, 1336, 1375, 1173, 1416,  891,  648, 1165,  776, 1196, 1476,
        730,  775,  607,  878,  836,  584, 1264,  835,  782,  926, 1111,
       1259, 1309,  757, 1040,  500, 1352, 1168,  881,  444, 1479, 1258,
        622, 1040, 1145, 1186, 1261,  864, 1442, 1350, 1184,  897,  530,
        650])
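`np.random.randint` excludes its upper bound, so 1500 g itself can never be drawn above. A sketch using NumPy's `Generator.integers`, whose `endpoint=True` argument makes the upper bound inclusive, with a fixed (arbitrary) seed for reproducibility and a sanity check against the 360 g to 2,640 g range quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed so the draw can be reproduced

# Uniform integers from 400 to 1500 inclusive
bw = rng.integers(400, 1500, size=100, endpoint=True)

# Sanity check against the range quoted for very low birth weight infants
assert bw.min() >= 360 and bw.max() <= 2640
print(bw.min(), bw.max())
```

Note that a uniform draw treats every weight in the range as equally likely, which is a simplification of how real birth weights are distributed.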

## Congenital Anomaly
About 9% of babies have a congenital anomaly.  Does the baby have a congenital anomaly? No = 0, Yes = 1  
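The cell below draws "Yes" with probability 0.1, slightly above the roughly 9% quoted here. A sketch that uses 9% instead and draws the 0/1 codes directly with a binomial (Bernoulli) draw, then derives the string labels from them; the seed is arbitrary and the 0.09 rate is taken from the sentence above.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed

# 0/1 codes (No = 0, Yes = 1), each baby having a 9% chance of an anomaly
ca_codes = rng.binomial(n=1, p=0.09, size=100)
ca_labels = np.where(ca_codes == 1, "Yes", "No")

# Observed proportion; with only 100 babies it will vary around 0.09
print(ca_codes.mean())
```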

In [18]:
# Using np.random.choice with probabilities p to bias the draw towards real-world rates:
# each baby has a 10% chance of "Yes" and a 90% chance of "No"
ca = np.random.choice(["Yes", "No"], 100, p=[0.1, 0.9])
ca


array(['No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No',
       'No', 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No',
       'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'No', 'No',
       'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes',
       'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes',
       'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'No',
       'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'Yes', 'Yes',
       'No', 'No', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'No',
       'No', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No',
       'No', 'No'], dtype='<U3')

## Disposition at 1 Year


In [25]:
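# Disposition at 1 year, drawn with assumed probabilities:
# Died 20%, At Home 40%, Still in Hospital 30%, Unknown 10%
dip = np.random.choice(["Died", "At Home", "Hospital", "Unknown"], 100, p=[0.2, 0.4, 0.3, 0.1])
dip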

array(['Hospital', 'At Home', 'At Home', 'Hospital', 'Unknown', 'At Home',
       'Hospital', 'At Home', 'At Home', 'Hospital', 'Unknown', 'At Home',
       'At Home', 'At Home', 'Died', 'At Home', 'Died', 'Died', 'At Home',
       'Died', 'At Home', 'Died', 'Died', 'Hospital', 'Hospital',
       'Hospital', 'At Home', 'Unknown', 'At Home', 'At Home', 'Died',
       'At Home', 'Hospital', 'Unknown', 'At Home', 'At Home', 'Died',
       'Died', 'At Home', 'Hospital', 'Hospital', 'Died', 'At Home',
       'Died', 'At Home', 'At Home', 'At Home', 'At Home', 'At Home',
       'Hospital', 'Died', 'Hospital', 'At Home', 'At Home', 'Hospital',
       'Unknown', 'Unknown', 'At Home', 'Hospital', 'Hospital',
       'Hospital', 'Died', 'Died', 'At Home', 'At Home', 'At Home',
       'Died', 'At Home', 'Died', 'Died', 'Died', 'At Home', 'At Home',
       'Died', 'Died', 'At Home', 'Unknown', 'Died', 'At Home',
       'Hospital', 'At Home', 'At Home', 'Hospital', 'Died', 'At Home',
       'Hospita
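With only 100 draws, the observed shares can drift noticeably from the target probabilities. A sketch that checks the probabilities sum to 1 (which `choice` requires) and compares observed and target shares with pandas `value_counts(normalize=True)`; it uses NumPy's `Generator.choice` with an arbitrary seed rather than the legacy `np.random.choice` above.

```python
import numpy as np
import pandas as pd

labels = ["Died", "At Home", "Hospital", "Unknown"]
probs = [0.2, 0.4, 0.3, 0.1]  # the probabilities used in the cell above
assert np.isclose(sum(probs), 1.0)

rng = np.random.default_rng(2)  # arbitrary fixed seed
dip = rng.choice(labels, size=100, p=probs)

# Observed share of each outcome next to its target probability
summary = pd.DataFrame({
    "observed": pd.Series(dip).value_counts(normalize=True).reindex(labels, fill_value=0),
    "target": pd.Series(probs, index=labels),
})
print(summary)
```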

# Creating the Dataset

In [26]:
# Populate the dataset from the four simulated arrays
df = pd.DataFrame({"Gestational Weeks": gw, "Birthweight": bw, "Congenital Anomaly": ca, "Disposition at 1 Year": dip})
df

| | Gestational Weeks | Birthweight | Congenital Anomaly | Disposition at 1 Year |
|---|---|---|---|---|
| 0 | 30.639557 | 654 | No | Hospital |
| 1 | 31.218469 | 445 | No | At Home |
| 2 | 28.657310 | 1110 | No | At Home |
| 3 | 22.987491 | 1027 | No | Hospital |
| 4 | 30.043038 | 899 | Yes | Unknown |
| 5 | 27.054224 | 442 | No | At Home |
| 6 | 25.919980 | 1288 | No | Hospital |
| 7 | 28.024976 | 531 | No | At Home |
| 8 | 27.596233 | 1039 | No | At Home |
| 9 | 30.056321 | 567 | No | Hospital |
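Since the aim is to reflect the report's mortality findings, a natural first summary is mortality by subgroup. This sketch regenerates a comparable dataset with the same parameters as above (using NumPy's `Generator` and an arbitrary seed so it runs on its own), cross-tabulates anomaly status against outcome with `pd.crosstab`, and computes the share recorded as "Died" in each group.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # arbitrary fixed seed
n = 100
df = pd.DataFrame({
    "Gestational Weeks": rng.normal(28, 3, n),
    "Birthweight": rng.integers(400, 1500, size=n, endpoint=True),
    "Congenital Anomaly": rng.choice(["Yes", "No"], size=n, p=[0.1, 0.9]),
    "Disposition at 1 Year": rng.choice(
        ["Died", "At Home", "Hospital", "Unknown"], size=n, p=[0.2, 0.4, 0.3, 0.1]
    ),
})

# Counts of each outcome within each anomaly group
counts = pd.crosstab(df["Congenital Anomaly"], df["Disposition at 1 Year"])
print(counts)

# Share of infants recorded as "Died" within each anomaly group
mortality = (df["Disposition at 1 Year"] == "Died").groupby(df["Congenital Anomaly"]).mean()
print(mortality)
```

Because every column here is drawn independently, any difference in mortality between the groups is pure chance. Reproducing the report's risk patterns would require making the disposition probabilities depend on gestational age, birth weight or anomaly status.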
