# 1. INTRODUCTION
## Problem Statement
Birth weight is a critical indicator of newborn health and future developmental outcomes. Low birth weight (LBW) is linked to increased risks of infant mortality, chronic health conditions, and developmental challenges. Healthcare providers, public health agencies, and policymakers require data-driven insights to identify risk factors and implement effective interventions for improving neonatal health outcomes.

## Project Goal
This project seeks to:

- Analyze correlations between prenatal care, maternal health, and birth outcomes.

- Provide evidence-based recommendations for reducing LBW incidence and enhancing neonatal care.

## Key Questions
- What maternal factors (e.g., age, BMI, health conditions) most strongly correlate with birth weight?

- Are there patterns in birth weight related to prenatal care quality or lifestyle habits (e.g., smoking)?

- What interventions (e.g., nutritional programs, healthcare policies) show the greatest potential for improving birth weight?
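A minimal sketch of how the smoking question could be approached: group the rows by `smoke` and compare birth-weight summaries. It assumes `smoke` is coded 0 (non-smoker) and 1 (smoker), which is an assumption about this dataset's coding, and `bwt_by_smoking` is a hypothetical helper. The demo uses only the first five rows of `Dataset/babies.csv`, so its numbers illustrate the mechanics and say nothing about the real difference; on the full data the call would be `bwt_by_smoking(df)`.

```python
import pandas as pd


def bwt_by_smoking(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarize birth weight (bwt) for each smoking group."""
    # groupby drops rows whose `smoke` value is missing (dropna=True by default).
    return df.groupby("smoke")["bwt"].agg(["count", "mean", "median", "std"])


# First five rows of Dataset/babies.csv (only the two columns needed here).
sample = pd.DataFrame({
    "bwt": [120, 113, 128, 123, 108],
    "smoke": [0.0, 0.0, 1.0, 0.0, 1.0],
})

summary = bwt_by_smoking(sample)
print(summary)
```

Grouping keeps each group's sample size (`count`) next to its mean and spread, which matters when one group is much smaller than the other.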

## Target Audience
- Healthcare providers (OB-GYNs, midwives, pediatricians)

- Public health departments and policymakers

- Researchers in maternal and child health

- Health insurance providers and community health organizations

# 2. Data Understanding
We'll now systematically explore the dataset to understand its:

- Structure (columns and data types)

- Sample data (via .head())

- Missing values

- Duplicates

- Basic statistics (via .describe())
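The five checks above can be gathered into one helper. The sketch below (`data_overview` is a hypothetical name, not a pandas function) uses only standard pandas calls: `shape`, `dtypes`, `head()`, `isnull().sum()`, `duplicated()` and `describe()`. It assumes `case` is a per-row identifier, so duplicates are searched for in the remaining columns; otherwise every row would look unique.

```python
import pandas as pd


def data_overview(df: pd.DataFrame, id_col: str = "case") -> dict:
    """Hypothetical helper: run the structure, sample, missing-value,
    duplicate and summary-statistic checks in one call."""
    # An identifier column is unique per row, so look for duplicates in the other columns.
    content = df.drop(columns=[id_col]) if id_col in df.columns else df
    return {
        "shape": df.shape,
        "dtypes": df.dtypes,
        "head": df.head(),
        "missing": df.isnull().sum().sort_values(ascending=False),
        "duplicate_rows": int(content.duplicated().sum()),
        "describe": df.describe(),
    }


# First five rows of Dataset/babies.csv (three columns only).
sample = pd.DataFrame({
    "case": [1, 2, 3, 4, 5],
    "bwt": [120, 113, 128, 123, 108],
    "gestation": [284.0, 282.0, 279.0, None, 282.0],
})

overview = data_overview(sample)
print(overview["shape"], overview["duplicate_rows"])
print(overview["missing"])
```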

```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```


## LOAD DATA SET

```python
# Load the dataset and preview the first five rows
df = pd.read_csv("Dataset/babies.csv")
df.head()
```

```
   case  bwt  gestation  parity   age  height  weight  smoke
0     1  120      284.0       0  27.0    62.0   100.0    0.0
1     2  113      282.0       0  33.0    64.0   135.0    0.0
2     3  128      279.0       0  28.0    64.0   115.0    1.0
3     4  123        NaN       0  36.0    69.0   190.0    0.0
4     5  108      282.0       0  23.0    67.0   125.0    1.0
```
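Because the project is framed around low birth weight (LBW), a derived indicator is likely to be useful. The sketch below assumes `bwt` is recorded in ounces (values around 108 to 130 are plausible newborn weights in ounces, not in grams or pounds) and uses the common WHO cut-off of 2,500 g; both the unit and the threshold should be confirmed against the dataset's documentation. `add_lbw_flag` is a hypothetical helper.

```python
import pandas as pd

GRAMS_PER_OUNCE = 28.349523125
LBW_THRESHOLD_G = 2500  # WHO definition of low birth weight: under 2,500 g


def add_lbw_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add bwt in grams and a boolean low-birth-weight flag.
    Assumes `bwt` is recorded in ounces."""
    out = df.copy()
    out["bwt_g"] = out["bwt"] * GRAMS_PER_OUNCE
    out["lbw"] = out["bwt_g"] < LBW_THRESHOLD_G
    return out


# First five rows of Dataset/babies.csv (case and bwt only).
sample = pd.DataFrame({"case": [1, 2, 3, 4, 5], "bwt": [120, 113, 128, 123, 108]})
flagged = add_lbw_flag(sample)
print(flagged)
print("Share flagged as LBW:", flagged["lbw"].mean())
```

Under the ounce assumption the cut-off corresponds to roughly 88.2 oz, so none of these five babies is flagged; on the full data, `add_lbw_flag(df)["lbw"].mean()` would give the LBW proportion.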


# 3. EXPLORATORY DATA ANALYSIS

In this section we aim to understand the overall structure of the dataset in `Dataset/babies.csv` by summarizing its main characteristics with statistics and visualizations.

**Steps to achieve this:**

- Understand the structure and content of the data

- Identify missing or inconsistent values

- Explore relationships between features

- Generate insights that can guide further analysis or modeling
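For the "explore relationships" step, a first pass is to rank each numeric column by its correlation with `bwt`. This sketch uses Pearson correlation via `DataFrame.corr()`, which skips missing values pair by pair; for a 0/1 column such as `smoke`, the Pearson coefficient equals the point-biserial correlation. `correlations_with_target` is a hypothetical helper, excluding `case` assumes it is only an identifier, and correlation alone does not establish that a factor causes changes in birth weight.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "bwt",
                             exclude: tuple = ("case",)) -> pd.Series:
    """Hypothetical helper: Pearson correlation of every numeric column with
    `target`, ordered from strongest to weakest absolute value."""
    numeric = df.select_dtypes("number")
    numeric = numeric.drop(columns=[c for c in exclude if c in numeric.columns])
    # DataFrame.corr() uses pairwise-complete observations, so NaNs are skipped per pair.
    corr = numeric.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# First five rows of Dataset/babies.csv (a subset of columns).
sample = pd.DataFrame({
    "case": [1, 2, 3, 4, 5],
    "bwt": [120, 113, 128, 123, 108],
    "gestation": [284.0, 282.0, 279.0, None, 282.0],
    "age": [27.0, 33.0, 28.0, 36.0, 23.0],
    "smoke": [0.0, 0.0, 1.0, 0.0, 1.0],
})

result = correlations_with_target(sample)
print(result)
```

Five rows are far too few for meaningful estimates; the demo only shows the mechanics, and the real ranking comes from `correlations_with_target(df)`.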


### Step 1: Overview of Columns and Data Types

This shows which columns are numerical or categorical and which contain missing values.

```python
# Column names, non-null counts and data types
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1236 entries, 0 to 1235
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   case       1236 non-null   int64  
 1   bwt        1236 non-null   int64  
 2   gestation  1223 non-null   float64
 3   parity     1236 non-null   int64  
 4   age        1234 non-null   float64
 5   height     1214 non-null   float64
 6   weight     1200 non-null   float64
 7   smoke      1226 non-null   float64
dtypes: float64(5), int64(3)
memory usage: 77.4 KB
```
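The `info()` output shows every column stored as a number, but `parity` and `smoke` appear to hold 0/1 codes rather than quantities, and `smoke` is `float64` only because pandas cannot store `NaN` in an `int64` column. A minimal sketch, assuming that any column whose non-missing values are all 0 or 1 is a binary code (`binary_columns` and `encode_binary_columns` are hypothetical helpers), converts such columns to pandas' nullable `Int64` dtype so missing entries become `<NA>` instead of forcing floats. The `with_gap` frame blanks one `smoke` value artificially to mimic the missing entries in the real data.

```python
import numpy as np
import pandas as pd


def binary_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: columns whose non-missing values are all 0 or 1."""
    cols = []
    for col in df.columns:
        values = set(df[col].dropna().unique())
        if values and values <= {0, 1}:
            cols.append(col)
    return cols


def encode_binary_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 0/1 code columns stored as nullable integers (Int64)."""
    out = df.copy()
    for col in binary_columns(out):
        # Int64 (capital I) keeps whole numbers and represents gaps as <NA>.
        out[col] = out[col].astype("Int64")
    return out


# First five rows of Dataset/babies.csv, plus a copy with one smoke value blanked out.
sample = pd.DataFrame({
    "case": [1, 2, 3, 4, 5],
    "parity": [0, 0, 0, 0, 0],
    "age": [27.0, 33.0, 28.0, 36.0, 23.0],
    "smoke": [0.0, 0.0, 1.0, 0.0, 1.0],
})
with_gap = sample.assign(smoke=[0.0, np.nan, 1.0, 0.0, 1.0])

print(binary_columns(sample))
print(encode_binary_columns(with_gap).dtypes)
```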


### Step 2: Check for Missing Data

```python
# Missing values per column, most affected first
df.isnull().sum().sort_values(ascending=False)
```

```
weight       36
height       22
gestation    13
smoke        10
age           2
case          0
bwt           0
parity        0
dtype: int64
```
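The counts above are small relative to the 1,236 rows; `weight`, the worst case, is missing in 36 rows (about 2.9%). The sketch below (hypothetical helpers `missing_report`, `complete_cases` and `median_impute`) adds a percentage column and shows two common options: dropping incomplete rows, or filling numeric gaps with each column's median. Which option suits this analysis is a modelling choice the counts alone do not settle; note that `dropna()` removes a row if any column is missing, so on the full data it drops at least 36 rows.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = df.isnull().sum()
    report = pd.DataFrame({"missing": counts, "percent": (counts / len(df) * 100).round(2)})
    return report.sort_values("missing", ascending=False)


def complete_cases(df: pd.DataFrame) -> pd.DataFrame:
    """Listwise deletion: keep only rows with no missing values in any column."""
    return df.dropna()


def median_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill each numeric column's gaps with that column's median."""
    return df.fillna(df.median(numeric_only=True))


# First five rows of Dataset/babies.csv (gestation is missing for case 4).
sample = pd.DataFrame({
    "case": [1, 2, 3, 4, 5],
    "bwt": [120, 113, 128, 123, 108],
    "gestation": [284.0, 282.0, 279.0, None, 282.0],
})

print(missing_report(sample))
print(len(complete_cases(sample)), "complete rows out of", len(sample))
print(median_impute(sample))
```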