# Capstone Project 1: Working with Numpy Matrices (Multidimensional Data)
This project analyzes body measurements of adult males and females from the NHANES dataset. We compare the weight distributions of the two groups, derive BMI and waist-based ratios, standardize the measurements, and examine the correlations among them.


2. Import Libraries
Start by importing the necessary libraries: numpy, pandas, matplotlib, and seaborn.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set seaborn style for better visuals (set_theme is the current name for sns.set)
sns.set_theme(style="whitegrid")


3. Load the Data
Load the two datasets (nhanes_adult_male_bmx_2020.csv and nhanes_adult_female_bmx_2020.csv) into numpy matrices.

In [None]:
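# Load the datasets
# comment='#' skips metadata lines starting with '#' (assumed to precede the header;
# the argument has no effect if the files contain no such lines)
male_df = pd.read_csv('nhanes_adult_male_bmx_2020.csv', comment='#')
female_df = pd.read_csv('nhanes_adult_female_bmx_2020.csv', comment='#')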

# Convert to numpy matrices
male = male_df.to_numpy()
female = female_df.to_numpy()

# Display the first few rows to check the data
male[:5], female[:5]
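
The later steps select columns by position. A minimal sketch of naming those positions, assuming the column order used for the labels further down in this notebook (weight, height, arm length, leg length, arm circumference, hip circumference, waist circumference). The `COL` mapping and the toy array are hypothetical and only stand in for the loaded matrices:

```python
import numpy as np

# Assumed column order, taken from the labels used later in this notebook.
COLUMNS = ['Weight', 'Height', 'Arm Length', 'Leg Length',
           'Arm Circumference', 'Hip Circumference', 'Waist Circumference']
COL = {name: i for i, name in enumerate(COLUMNS)}  # hypothetical helper

# Toy stand-in for a loaded matrix: two people, seven measurements each.
toy = np.array([
    [70.0, 175.0, 37.0, 41.0, 32.0, 100.0, 90.0],
    [60.0, 162.0, 35.0, 38.0, 29.0, 98.0, 80.0],
])

weights = toy[:, COL['Weight']]               # same as toy[:, 0]
waists = toy[:, COL['Waist Circumference']]   # same as toy[:, 6]
print(weights, waists)
```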


4. Plot Histograms of Weights
Create a subplot with two histograms to visualize the distribution of weights for males and females.

In [None]:
plt.figure(figsize=(12, 8))

# Shared x-axis limits so the two panels are directly comparable
x_limits = [min(female[:, 0].min(), male[:, 0].min()),
            max(female[:, 0].max(), male[:, 0].max())]

# Female weights
plt.subplot(2, 1, 1)
plt.hist(female[:, 0], bins=20, color='salmon', edgecolor='black')
plt.title('Histogram of Female Weights')
plt.xlabel('Weight (kg)')
plt.ylabel('Frequency')
plt.xlim(x_limits)

# Male weights
plt.subplot(2, 1, 2)
plt.hist(male[:, 0], bins=20, color='lightblue', edgecolor='black')
plt.title('Histogram of Male Weights')
plt.xlabel('Weight (kg)')
plt.ylabel('Frequency')
plt.xlim(x_limits)  # plt.xlim only affects the current subplot, so it is set in both

plt.tight_layout()
plt.show()
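
Identical axis limits make the two panels comparable only if the bars also cover the same intervals. One way to get that, sketched here on synthetic weights (the random samples and the choice of 20 bins are assumptions for illustration, not NHANES values), is to compute a single set of bin edges from the pooled data and reuse it for both groups:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for female[:, 0] and male[:, 0].
toy_female_w = rng.normal(75, 20, size=500)
toy_male_w = rng.normal(88, 20, size=500)

# One set of edges spanning both groups, so bar k covers the same range in each panel.
pooled = np.concatenate([toy_female_w, toy_male_w])
edges = np.histogram_bin_edges(pooled, bins=20)

female_counts, _ = np.histogram(toy_female_w, bins=edges)
male_counts, _ = np.histogram(toy_male_w, bins=edges)
print(edges[0], edges[-1])
```

Passing `bins=edges` to both `plt.hist` calls would apply the same edges in the plots.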


5. Boxplot Comparison of Weights
Draw a box-and-whisker plot to compare the distribution of weights between males and females.

In [None]:
plt.figure(figsize=(8, 6))

# Boxplot for male and female weights
plt.boxplot([female[:, 0], male[:, 0]], labels=['Female', 'Male'], patch_artist=True)

plt.title('Boxplot of Weights: Male vs Female')
plt.ylabel('Weight (kg)')
plt.show()
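
To read the boxplot, it helps to know what each element encodes. The sketch below computes the quartiles, the 1.5 × IQR fences, the whisker ends, and the outliers for a small made-up sample, following matplotlib's default `whis=1.5` rule:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)  # made-up sample

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations still inside the fences;
# anything beyond the fences is drawn as an individual outlier point.
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, median, q3, whisker_low, whisker_high, outliers)
```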

6. Compute Basic Numerical Aggregates
Calculate and compare the basic numerical aggregates (mean, median, standard deviation, skewness, kurtosis) for male and female weights.

In [None]:
from scipy.stats import skew, kurtosis

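# Compute aggregates
def compute_aggregates(data):
    return {
        'mean': np.mean(data),
        'median': np.median(data),
        'std': np.std(data),          # population standard deviation (ddof=0)
        'skewness': skew(data),       # > 0 indicates a longer right tail
        'kurtosis': kurtosis(data)    # excess kurtosis: 0 for a normal distribution
    }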

female_aggregates = compute_aggregates(female[:, 0])
male_aggregates = compute_aggregates(male[:, 0])

female_aggregates, male_aggregates
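
With their default arguments, `scipy.stats.skew` and `scipy.stats.kurtosis` return the moment-based (biased) skewness and the excess kurtosis. A sketch of the same formulas in plain NumPy, applied to two small made-up samples (the function name `moment_skew_kurtosis` is hypothetical):

```python
import numpy as np

def moment_skew_kurtosis(x):
    """Biased sample skewness and excess kurtosis from central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment (variance with ddof=0)
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

skew_sym, kurt_sym = moment_skew_kurtosis([1, 2, 3, 4, 5])       # symmetric sample
skew_right, kurt_right = moment_skew_kurtosis([1, 1, 1, 2, 10])  # long right tail
print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

The symmetric sample has zero skewness, while the sample with one large value has positive skewness.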


7. Add BMI Column to Female Matrix
Calculate the BMI and add it as an eighth column to the female matrix.

In [None]:
# Calculate BMI: weight (kg) / height (m)^2
female_bmi = female[:, 0] / (female[:, 1] / 100) ** 2
female = np.column_stack((female, female_bmi))

female[:5]  # Display the first few rows with the new BMI column
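
A worked example of the formula on two made-up individuals, with heights given in centimetres as the conversion above assumes:

```python
import numpy as np

weight_kg = np.array([70.0, 90.0])    # made-up values
height_cm = np.array([175.0, 160.0])

height_m = height_cm / 100            # centimetres to metres
bmi = weight_kg / height_m ** 2       # kg / m^2
print(np.round(bmi, 2))
```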


8. Standardize the Female Dataset
Create a standardized version of the female dataset using z-scores.

In [None]:
zfemale = (female - np.mean(female, axis=0)) / np.std(female, axis=0)

zfemale[:5]  # Display the first few rows of the standardized dataset
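
A quick way to check a standardization is to confirm that every column ends up with mean 0 and standard deviation 1. The sketch below does this on synthetic two-column data (the random values are an assumption for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in: 200 rows, two columns with different locations and scales.
toy = rng.normal(loc=[70, 165], scale=[15, 7], size=(200, 2))

# Column-wise z-scores; broadcasting subtracts each column's mean
# and divides by each column's standard deviation.
ztoy = (toy - toy.mean(axis=0)) / toy.std(axis=0)
print(ztoy.mean(axis=0), ztoy.std(axis=0))
```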


9. Scatterplot Matrix
Draw a scatterplot matrix for the standardized variables (height, weight, waist circumference, hip circumference, and BMI).

In [None]:
# Create a DataFrame for easier plotting
zfemale_df = pd.DataFrame(zfemale, columns=['Weight', 'Height', 'Arm Length', 'Leg Length', 'Arm Circumference', 'Hip Circumference', 'Waist Circumference', 'BMI'])

# Select columns for the scatterplot matrix
sns.pairplot(zfemale_df[['Weight', 'Height', 'Waist Circumference', 'Hip Circumference', 'BMI']])
plt.suptitle('Scatterplot Matrix for Standardized Female Measurements', y=1.02)
plt.show()

# Compute Pearson and Spearman correlations
pearson_corr = zfemale_df[['Weight', 'Height', 'Waist Circumference', 'Hip Circumference', 'BMI']].corr(method='pearson')
spearman_corr = zfemale_df[['Weight', 'Height', 'Waist Circumference', 'Hip Circumference', 'BMI']].corr(method='spearman')

pearson_corr, spearman_corr
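
The two coefficients answer different questions: Pearson measures linear association, while Spearman is the Pearson correlation of the ranks and so measures monotone association. A sketch on made-up data where the relationship is monotone but far from linear (the helper names `pearson` and `ranks` are hypothetical):

```python
import numpy as np

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def ranks(x):
    # Rank positions 0..n-1; valid as written only when there are no ties.
    return np.argsort(np.argsort(x))

x = np.linspace(0, 5, 50)
y = np.exp(x)                       # monotone but strongly non-linear

r_pearson = pearson(x, y)
r_spearman = pearson(ranks(x), ranks(y))
print(round(r_pearson, 3), round(r_spearman, 3))
```

Here the Spearman coefficient is 1 because the ordering of `y` matches the ordering of `x` exactly, while the Pearson coefficient is noticeably lower.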


10. Waist-to-Height and Waist-to-Hip Ratios
Calculate these ratios for both males and females, and add them as additional columns.

In [None]:
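# Calculate ratios (column 6: waist circumference, column 1: height, column 5: hip circumference)
male_whtr = male[:, 6] / male[:, 1]   # waist-to-height ratio (WHtR)
male_whr = male[:, 6] / male[:, 5]    # waist-to-hip ratio (WHR)

female_whtr = female[:, 6] / female[:, 1]
female_whr = female[:, 6] / female[:, 5]

# Add as new columns (WHtR first, then WHR)
male = np.column_stack((male, male_whtr, male_whr))
female = np.column_stack((female, female_whtr, female_whr))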

male[:5], female[:5]  # Display first few rows to check new columns
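
Because the BMI column was added to `female` only, the two matrices have different widths after the ratio columns are appended, so the same ratio sits at a different column index in each. A sketch with toy arrays of the same widths (the placeholder values are made up):

```python
import numpy as np

# Toy stand-ins: male has the 7 original columns, female has 7 + BMI = 8.
toy_male = np.ones((3, 7))
toy_female = np.ones((3, 8))

ratio_a = np.full(3, 0.5)   # placeholder for waist-to-height
ratio_b = np.full(3, 0.9)   # placeholder for waist-to-hip

toy_male = np.column_stack((toy_male, ratio_a, ratio_b))
toy_female = np.column_stack((toy_female, ratio_a, ratio_b))

print(toy_male.shape, toy_female.shape)
# Waist-to-height: column 7 for male, column 8 for female.
# Waist-to-hip:    column 8 for male, column 9 for female.
```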


11. Boxplot for Ratios
Draw a boxplot comparing the waist-to-height and waist-to-hip ratios for both males and females.

In [None]:
plt.figure(figsize=(10, 6))

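# Boxplot for the ratios. The female matrix has an extra BMI column,
# so its ratio columns (8, 9) sit one position after the male ones (7, 8).
plt.boxplot([female[:, 8], male[:, 7], female[:, 9], male[:, 8]],
            labels=['Female WHtR', 'Male WHtR', 'Female WHR', 'Male WHR'],
            patch_artist=True)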

plt.title('Boxplot Comparison: Waist-to-Height and Waist-to-Hip Ratios')
plt.ylabel('Ratio')
plt.show()


12. Discuss the Results of BMI and Ratios
Provide a discussion of the advantages and disadvantages of BMI, waist-to-height ratio, and waist-to-hip ratio.

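### Discussion
- **BMI:** Simple to compute from weight and height and widely used, but it does not distinguish muscle from fat and says nothing about where fat is carried.
- **Waist-to-Height Ratio:** Captures abdominal fat relative to body size and is often reported to track cardiovascular risk better than BMI, but it depends on a consistent waist measurement, which is harder to take than weight or height.
- **Waist-to-Hip Ratio:** Reflects how fat is distributed between the waist and hips, but it needs two circumference measurements and, being a ratio, can stay unchanged when waist and hips grow together.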


13. Standardized Measurements of Extreme BMI Individuals
Print the standardized body measurements for the five individuals with the lowest BMI and the five with the highest BMI.

In [None]:
# Get indices of the 5 lowest and 5 highest BMI
low_bmi_indices = np.argsort(zfemale[:, 7])[:5]
high_bmi_indices = np.argsort(zfemale[:, 7])[-5:]

# Print standardized measurements
print("Standardized measurements for the 5 individuals with the lowest BMI:")
print(zfemale[low_bmi_indices])

print("\nStandardized measurements for the 5 individuals with the highest BMI:")
print(zfemale[high_bmi_indices])
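
The selection relies on `np.argsort`, which returns row indices ordered from the smallest value to the largest. A small sketch with made-up z-scores shows how slicing that ordering picks out the extremes:

```python
import numpy as np

bmi_z = np.array([0.1, -0.8, 1.9, 0.6, -1.2, 2.5])  # made-up z-scores

order = np.argsort(bmi_z)      # row indices from lowest to highest value
lowest_two = order[:2]
highest_two = order[-2:]       # ascending order, so the maximum comes last
print(lowest_two, highest_two)
```

Since standardizing a column preserves the ordering of its values, sorting by the standardized BMI selects the same individuals as sorting by the raw BMI.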


14. Conclusion

This analysis compared male and female body measurements from the NHANES dataset. Histograms, boxplots, and numerical aggregates described the weight distributions of the two groups and highlighted the differences between them. A scatterplot matrix with Pearson and Spearman correlations examined the relationships among the standardized female measurements. Finally, BMI, waist-to-height ratio, and waist-to-hip ratio were computed and compared to assess the usefulness of each anthropometric measure.
