
Capstone Project 1: Working with NumPy Matrices (Multidimensional Data)
This project analyzes body measurement data for adult males and females from the National Health and Nutrition Examination Survey (NHANES) using Python libraries, including NumPy and Matplotlib. The sections below work through the tasks in order, presenting data visualizations and interpretations.

# Mounting Google Drive, Importing the Data Files, and Reading the Data as NumPy Matrices

First, we will load the datasets containing body measurements for adult males and females. The data will be read into two NumPy matrices named male and female.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import numpy as np

# Load the male and female datasets
male = np.genfromtxt('/content/drive/MyDrive/Corizo/nhanes_adult_male_bmx_2020.csv', delimiter=',', skip_header=1)
female = np.genfromtxt('/content/drive/MyDrive/Corizo/nhanes_adult_female_bmx_2020.csv', delimiter=',', skip_header=1)

# Check for NaN values
print("Any NaN in male data:", np.isnan(male).any())
print("Any NaN in female data:", np.isnan(female).any())

# Remove rows with NaN values
male = male[~np.isnan(male).any(axis=1)]
female = female[~np.isnan(female).any(axis=1)]


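# Display the shapes of the matrices, then the matrices themselves.
# A bare expression is only echoed when it is the last line of a cell,
# so the shapes are printed explicitly.
print("Shapes (male, female):", male.shape, female.shape)
display(male)
display(female)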

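The matrices are now loaded, with incomplete rows removed, and contain the body measurement data for each gender. Each matrix has seven columns. The analysis below uses column 0 (weight, kg), column 1 (height, cm), column 5 (hip circumference, cm), and column 6 (waist circumference, cm).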
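The NaN filtering used in the loading cell can be tried on a small matrix first. This is a minimal sketch on made-up numbers; the array `demo` and its values are assumptions for illustration, not NHANES data.

```python
import numpy as np

# Hypothetical 4x3 matrix standing in for the real data
demo = np.array([
    [70.0, 165.0, 90.0],
    [np.nan, 170.0, 95.0],
    [82.5, 180.0, np.nan],
    [60.0, 158.0, 80.0],
])

# True for every row that contains at least one NaN
rows_with_nan = np.isnan(demo).any(axis=1)

# Boolean indexing keeps only the complete rows
clean = demo[~rows_with_nan]

print(rows_with_nan)
print(clean.shape)
```

Dropping a whole row whenever any column is missing keeps the columns aligned, at the cost of discarding the valid values in that row.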

# Plot Histograms for Weights

We will create histograms for the weights of adult females and males to visualize the distributions.

In [None]:
import matplotlib.pyplot as plt

# Extract weights from the matrices
female_weights = female[:, 0]  # First column (weight) for females
male_weights = male[:, 0]      # First column (weight) for males

# Create histograms
plt.figure(figsize=(10, 8))

# Female weights histogram
plt.subplot(2, 1, 1)
plt.hist(female_weights, bins=20, color='pink', alpha=0.7)
plt.title('Female Weight Distribution')
plt.xlim(40, 150)
plt.xlabel('Weight (kg)')
plt.ylabel('Frequency')

# Male weights histogram
plt.subplot(2, 1, 2)
plt.hist(male_weights, bins=20, color='blue', alpha=0.7)
plt.title('Male Weight Distribution')
plt.xlim(40, 150)
plt.xlabel('Weight (kg)')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()

The histograms reveal that both male and female weight distributions are approximately normal with slight skewness. The female weights tend to concentrate more towards the lower end, while male weights are more evenly distributed.
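Skewness read off a histogram can also be put into a number. The sketch below computes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second) on small made-up samples. The helper name `skewness` and the sample values are assumptions for illustration; it could be applied to `female_weights` and `male_weights` in the same way.

```python
import numpy as np

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (population variance)
    m3 = np.mean(centered ** 3)  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]            # mirror-symmetric around 3
right_tailed = [50, 55, 60, 65, 120]   # one large value stretches the right tail

print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_tailed))  # positive for a right-skewed sample
```

A value near zero is consistent with a symmetric distribution, while a positive value indicates a longer right tail.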

# Box-and-Whisker Plot for Weights

Next, we will create a box-and-whisker plot to compare male and female weights side by side.

In [None]:
# Create a boxplot for comparing male and female weights
plt.figure(figsize=(8, 5))
# Plot the weight columns extracted earlier (column 0), not the height column
# On Matplotlib 3.9+, the labels= argument is named tick_labels=
plt.boxplot([female_weights, male_weights], labels=['Females', 'Males'])
plt.title('Weight Comparison (Box-and-Whisker Plot)')
plt.ylabel('Weight (kg)')
plt.grid()
plt.show()


The box-and-whisker plot illustrates the weight distributions for both genders clearly. Males generally have a higher median weight compared to females, and the spread (interquartile range) of weights is also wider among males.
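The quantities a box-and-whisker plot draws can be computed directly. The sketch below assumes Matplotlib's default whisker rule of 1.5 times the interquartile range and uses a small made-up sample; the name `sample` and its values are illustrative only.

```python
import numpy as np

# Hypothetical weights in kg, with one unusually large value
sample = np.array([50.0, 60.0, 70.0, 80.0, 90.0, 200.0])

q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1  # height of the box

# Points beyond these fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Outliers:", outliers)
```

Running the same lines on `female_weights` and `male_weights` gives the medians and interquartile ranges that the plot compares visually.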

# Basic Numerical Aggregates

We will compute basic statistical measures for male and female weights: mean, median, standard deviation, minimum, maximum, and the first and third quartiles.

In [None]:
male_stats = {
    'mean': np.mean(male_weights),
    'median': np.median(male_weights),
    'std': np.std(male_weights),
    'min': np.min(male_weights),
    'max': np.max(male_weights),
    'q1': np.percentile(male_weights, 25),
    'q3': np.percentile(male_weights, 75)
}

female_stats = {
    'mean': np.mean(female_weights),
    'median': np.median(female_weights),
    'std': np.std(female_weights),
    'min': np.min(female_weights),
    'max': np.max(female_weights),
    'q1': np.percentile(female_weights, 25),
    'q3': np.percentile(female_weights, 75)
}

print("Male Weight Stats:", male_stats)
print("Female Weight Stats:", female_stats)

Upon reviewing the statistical measures, it is evident that males have a higher average weight compared to females. The standard deviation for female weights is lower, indicating less variability in female weights compared to males.
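One detail matters when comparing standard deviations: `np.std` divides by n by default (`ddof=0`, the population formula), whereas the sample formula divides by n - 1 (`ddof=1`). The sketch below shows the difference on a small made-up sample; the values are illustrative only.

```python
import numpy as np

# Hypothetical sample with mean 5
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_std = np.std(data)             # ddof=0: divide by n
sample_std = np.std(data, ddof=1)  # ddof=1: divide by n - 1

print("Population std:", pop_std)
print("Sample std:", sample_std)
```

For datasets with thousands of rows the two values are nearly identical, so the default used in the cell above does not change the comparison between the groups.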

# Adding Body Mass Index (BMI)

We will calculate the BMI for female participants and append a new column to the female dataset.

In [None]:
# Calculate BMI = weight (kg) / (height (m))^2; height in meters is second column divided by 100
height_in_meters = female[:, 1] / 100
BMI = female_weights / (height_in_meters ** 2)

# Add the BMI column to the female matrix
female = np.column_stack((female, BMI))
print(female[:5])


Having computed the BMI values, the female matrix now offers a comprehensive view of health metrics, essential for examining body composition relative to weight.
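As a quick check of the formula, the sketch below computes BMI for two made-up participants and appends it as a new column, mirroring the steps in the cell above. The matrix `people` and its values are assumptions for illustration.

```python
import numpy as np

# Hypothetical participants: column 0 = weight (kg), column 1 = height (cm)
people = np.array([
    [70.0, 165.0],
    [55.0, 160.0],
])

height_m = people[:, 1] / 100            # centimeters to meters
bmi = people[:, 0] / height_m ** 2       # kg / m^2

# Append BMI as a third column
people_with_bmi = np.column_stack((people, bmi))

print(np.round(bmi, 2))
print(people_with_bmi.shape)
```

Forgetting the division by 100 would shrink every BMI by a factor of 10,000, which makes this conversion worth verifying on a known case.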

# Standardizing the Female Dataset

Standardizing the female dataset, based on z-scores, enhances the comparability of body measurements.

In [None]:
# Compute z-scores for each column
zfemale = (female - np.mean(female, axis=0)) / np.std(female, axis=0)
print(zfemale[:5])

In the standardized matrix zfemale, every column has mean 0 and standard deviation 1, so measurements recorded in different units and on different scales can be compared directly.
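The defining property of z-scores can be verified numerically. This is a minimal sketch on a made-up matrix; `demo` and its values are assumptions for illustration, and the same two checks could be run on `zfemale`.

```python
import numpy as np

# Hypothetical matrix: column 0 = weight (kg), column 1 = height (cm)
demo = np.array([
    [70.0, 165.0],
    [55.0, 160.0],
    [90.0, 180.0],
    [62.0, 158.0],
])

# Column-wise standardization; axis=0 aggregates down the rows
z = (demo - demo.mean(axis=0)) / demo.std(axis=0)

print(np.round(z.mean(axis=0), 10))  # each column mean is 0 up to rounding
print(np.round(z.std(axis=0), 10))   # each column standard deviation is 1
```

Broadcasting subtracts the row vector of column means from every row, so no explicit loop over columns is needed.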

# Scatterplot Matrix and Correlation Coefficients

Next, we will visualize the relationships between selected standardized measurements using a scatterplot matrix.

We will calculate and interpret Pearson’s and Spearman’s correlation coefficients for the chosen metrics.

In [None]:
import seaborn as sns
import pandas as pd
selected_columns = [0, 1, 6, 5, 7]  # weight, height, waist, hip, BMI
zfemale_selected = zfemale[:, selected_columns]
df_zfemale = pd.DataFrame(zfemale_selected, columns=['Weight', 'Height', 'Waist', 'Hip', 'BMI'])

sns.pairplot(df_zfemale)
plt.show()

corr_pearson = df_zfemale.corr(method='pearson')
corr_spearman = df_zfemale.corr(method='spearman')

print("Pearson Correlation:\n", corr_pearson)
print("Spearman Correlation:\n", corr_spearman)



The scatterplot matrix showcases distinct relationships, such as the notable correlation between weight and BMI. This visualization aids in understanding how these metrics relate within standardized criteria.

The correlation coefficients quantify these relationships. The strong correlation between weight and BMI is expected, because BMI is computed directly from weight and height.
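The two coefficients answer slightly different questions: Pearson's measures linear association, while Spearman's is Pearson's coefficient applied to ranks and so measures monotonic association. The sketch below illustrates this on made-up data using only NumPy. The double-`argsort` ranking is a shortcut that is only valid when there are no tied values, which is an assumption of this example.

```python
import numpy as np

# Hypothetical data: y grows monotonically with x, but not linearly
x = np.arange(1.0, 11.0)
y = x ** 3

# Pearson: linear correlation of the raw values
pearson = np.corrcoef(x, y)[0, 1]

# Spearman: Pearson correlation of the ranks (valid here because no ties)
rank_x = x.argsort().argsort()
rank_y = y.argsort().argsort()
spearman = np.corrcoef(rank_x, rank_y)[0, 1]

print("Pearson:", pearson)
print("Spearman:", spearman)
```

A Spearman value noticeably higher than the Pearson value for a pair of columns suggests a relationship that is monotonic but curved.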

# Waist Circumference Ratios

We will compute two ratios for each participant: waist-to-height and waist-to-hip.

In [None]:
# Calculate ratios using the waist circumference (column 6), not the weight (column 0)
male_waist_height_ratio = male[:, 6] / male[:, 1]
female_waist_height_ratio = female[:, 6] / female[:, 1]

male_waist_hip_ratio = male[:, 6] / male[:, 5]
female_waist_hip_ratio = female[:, 6] / female[:, 5]

# Add these ratios to the respective matrices.
# male gains columns 7 and 8; female (which already has BMI in column 7) gains columns 8 and 9.
male = np.column_stack((male, male_waist_height_ratio, male_waist_hip_ratio))
female = np.column_stack((female, female_waist_height_ratio, female_waist_hip_ratio))
print(male[:5])
print(female[:5])


The newly derived waist circumference ratios provide essential health indicators concerning fat distribution relative to height and hip circumference.
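Both ratios are simple quotients of circumferences and height, all in centimeters, so the results are unitless. The sketch below works them out for one made-up participant using the column layout from this notebook (1 = height, 5 = hip, 6 = waist). The row values are assumptions for illustration.

```python
import numpy as np

# Hypothetical row in the 7-column layout; only columns 1, 5 and 6 are used here
person = np.zeros((1, 7))
person[0, 1] = 176.0   # height (cm)
person[0, 5] = 100.0   # hip circumference (cm)
person[0, 6] = 88.0    # waist circumference (cm)

waist_height_ratio = person[:, 6] / person[:, 1]
waist_hip_ratio = person[:, 6] / person[:, 5]

print(waist_height_ratio)  # 88 / 176
print(waist_hip_ratio)     # 88 / 100
```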



# Box-and-Whisker Plot for Ratios

We will visualize the distributions of the newly computed waist-to-height and waist-to-hip ratios.

In [None]:
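# Create a boxplot for ratios
# female columns: 8 = waist-to-height, 9 = waist-to-hip (column 7 holds BMI)
# male columns:   7 = waist-to-height, 8 = waist-to-hip
plt.figure(figsize=(10, 6))
plt.boxplot([female[:, 8], female[:, 9], male[:, 7], male[:, 8]],
            labels=['Female Waist-Height', 'Female Waist-Hip', 'Male Waist-Height', 'Male Waist-Hip'])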
plt.title('Waist Circumference Ratios Box-and-Whisker Plot')
plt.ylabel('Ratio')
plt.grid()
plt.show()


This box-and-whisker plot indicates that females tend to have lower ratios on both metrics compared to males. The dispersion is notably wider among males, indicating greater variability in waist ratios.

# Advantages and Disadvantages of BMI, Waist-to-Height Ratio, and Waist-to-Hip Ratio

Finally, we will examine the advantages and disadvantages of BMI, waist-to-height ratio, and waist-to-hip ratio.

In [None]:
print('''
Advantages:

*BMI: Simple and widely accepted measure, easy to calculate using height and weight.

*Waist-to-Height Ratio: Considered a better indicator for assessing health risk compared to BMI, focusing on fat distribution.

*Waist-to-Hip Ratio: Useful for identifying abdominal fat levels, crucial for metabolic health.

Disadvantages:

*BMI: Fails to distinguish between muscle and fat mass, potentially misclassifying athletic individuals.

*Waist-to-Height Ratio: Less validated in certain populations; thresholds need further specification.

*Waist-to-Hip Ratio: Calculations may be subjective, influenced by measurement techniques, and population-specific norms.
''')


# Standardized Measurements for Extreme BMI

Lastly, we extract and print the standardized body measurements of individuals with the lowest and highest BMIs.

In [None]:
# Sort the row indices by BMI (column 7 of female) once, then take both ends
bmi_order = np.argsort(female[:, 7])
low_bmi_indices = bmi_order[:5]    # 5 lowest BMIs
high_bmi_indices = bmi_order[-5:]  # 5 highest BMIs

# Rows of zfemale line up with rows of female, so the same indices apply
extreme_bmi_measurements = zfemale[np.concatenate((low_bmi_indices, high_bmi_indices))]
print(extreme_bmi_measurements)


The extracted data reveals how individuals at the extremes of BMI differ significantly in terms of their measurements, accentuating the impact of BMI as a health indicator.
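The selection of extreme rows relies on `np.argsort`, which returns the indices that would sort an array rather than the sorted values. The sketch below shows the idea on a short made-up vector; `bmi_values` and its numbers are assumptions for illustration.

```python
import numpy as np

# Hypothetical BMI values for six participants
bmi_values = np.array([22.1, 35.4, 17.8, 41.0, 27.3, 19.5])

order = np.argsort(bmi_values)  # indices from smallest to largest BMI
lowest_two = order[:2]
highest_two = order[-2:]

print(lowest_two, bmi_values[lowest_two])
print(highest_two, bmi_values[highest_two])
```

Because the result is a set of row indices, it can be used to pull the matching rows from any other matrix with the same row order, which is how the standardized measurements are extracted above.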