
# Biological Variables and Data Validation in Biostatistics

This notebook explores concepts such as biological variables, confounders, and data validation in biostatistics. It includes practical examples and Python-based methods for validation and analysis.

## Topics Covered
- Static vs. Dynamic Biological Variables
- Biological Experiments and Statistical Integration
- Confounders and Latent Variables
- Validation of Biological Data
- Visualizing Biological Data

### Prerequisites
Install the required Python packages:
```bash
pip install numpy pandas matplotlib seaborn
```
        


## Static vs. Dynamic Biological Variables

- **Static Variables:** Do not change over time (e.g., blood type, genetic traits).
- **Dynamic Variables:** Change over time or under different conditions (e.g., blood pressure, BMI).
        

In [None]:

import pandas as pd

# Example of static and dynamic variables
data = pd.DataFrame({
    "Subject": [1, 2, 3, 4],
    "BloodType": ["A", "B", "AB", "O"],  # Static variable
    "BloodPressure": [120, 130, 125, 140],  # Dynamic variable
    "BMI": [22.5, 24.8, 23.1, 27.0]  # Dynamic variable
})

print("Example Data:")
print(data)
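A static variable should take the same value every time it is recorded for a subject, so repeated-measures data gives a simple consistency check. The sketch below assumes a hypothetical long-format table (`visits`, one row per subject per visit). It flags any subject whose `BloodType` takes more than one value, and it summarizes the within-subject change of the dynamic `BloodPressure`. Subject 2's second blood type is deliberately inconsistent.

```python
import pandas as pd

# Hypothetical long-format data: one row per subject per visit.
# Subject 2's BloodType is deliberately inconsistent between visits.
visits = pd.DataFrame({
    "Subject": [1, 1, 2, 2],
    "Visit": [1, 2, 1, 2],
    "BloodType": ["A", "A", "B", "AB"],      # static: should not change
    "BloodPressure": [120, 126, 130, 122],   # dynamic: expected to change
})

# A static variable should take exactly one distinct value per subject.
n_types = visits.groupby("Subject")["BloodType"].nunique()
inconsistent = n_types[n_types > 1].index.tolist()
print("Subjects with conflicting BloodType:", inconsistent)

# For a dynamic variable, summarize within-subject change (last visit minus first).
bp_change = (
    visits.sort_values(["Subject", "Visit"])
    .groupby("Subject")["BloodPressure"]
    .agg(lambda s: s.iloc[-1] - s.iloc[0])
)
print("Change in BloodPressure by subject:")
print(bp_change)
```

A conflicting value for a static variable often points to a data-entry or record-linkage error rather than a real change.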
        


## Integrating Biological Experiments with Statistical Analysis

Biological experiments generate the data that statistical analysis evaluates. For example, comparing left-ventricular ejection fraction (EF) between a treatment group and a control group can indicate whether a cardiology treatment is effective.
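Ejection fraction is the share of the end-diastolic volume that the ventricle ejects per beat. This assumes the example table's `SystolicVolume` and `DiastolicVolume` are the end-systolic (ESV) and end-diastolic (EDV) volumes in mL. Under that assumption, the standard formula and its values for the two example groups are:

$$
\mathrm{EF} = \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}} \times 100\%
$$

$$
\mathrm{EF}_{\text{Treatment}} = \frac{120 - 50}{120} \times 100\% \approx 58.3\%, \qquad
\mathrm{EF}_{\text{Control}} = \frac{115 - 45}{115} \times 100\% \approx 60.9\%
$$

The numerator, $\mathrm{EDV} - \mathrm{ESV}$, is the stroke volume.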
        

In [None]:

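# Example: Ejection Fraction Calculation
# SystolicVolume / DiastolicVolume are the left-ventricular end-systolic (ESV)
# and end-diastolic (EDV) volumes (assumed to be in mL)
data = pd.DataFrame({
    "Group": ["Treatment", "Control"],
    "SystolicVolume": [50, 45],
    "DiastolicVolume": [120, 115]
})

# Ejection Fraction (%) = stroke volume / EDV * 100, with stroke volume = EDV - ESV
data["EjectionFraction"] = (
    (data["DiastolicVolume"] - data["SystolicVolume"]) / data["DiastolicVolume"]
) * 100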

print("Ejection Fraction Data:")
print(data)
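A single summary row per group has no variability to test. The sketch below uses hypothetical per-subject volumes, with four subjects per group and values invented purely for illustration. It shows how the group comparison could be quantified with Welch's t statistic, which does not assume equal group variances. The helper `welch_t` is illustrative. If SciPy is installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a p-value.

```python
import numpy as np
import pandas as pd

# Hypothetical per-subject end-diastolic (EDV) and end-systolic (ESV) volumes in mL.
subjects = pd.DataFrame({
    "Group": ["Treatment"] * 4 + ["Control"] * 4,
    "EDV": [120, 118, 125, 130, 115, 122, 119, 128],
    "ESV": [50, 45, 52, 55, 52, 58, 55, 60],
})

# Ejection fraction (%) = (EDV - ESV) / EDV * 100
subjects["EF"] = (subjects["EDV"] - subjects["ESV"]) / subjects["EDV"] * 100


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)  # squared standard error of mean a
    vb = b.var(ddof=1) / len(b)  # squared standard error of mean b
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


treat = subjects.loc[subjects["Group"] == "Treatment", "EF"]
ctrl = subjects.loc[subjects["Group"] == "Control", "EF"]
t, df = welch_t(treat, ctrl)

print(subjects.groupby("Group")["EF"].agg(["mean", "std"]))
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

Welch's version is a cautious default for small clinical groups, whose variances rarely match exactly.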
        


## Identifying Confounders in Biological Data

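A confounder is a variable associated with both the exposure (independent variable) and the outcome (dependent variable) that does not lie on the causal path between them. If ignored, it can create, mask, or distort an apparent association. Confounders can be controlled at the design stage (randomization, matching, restriction) or at the analysis stage (stratification, regression adjustment).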
        

In [None]:

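# Example: Confounder Analysis
data = pd.DataFrame({
    "Subject": [1, 2, 3, 4],
    "BMI": [22, 25, 28, 30],
    "ActivityLevel": ["High", "Medium", "Low", "Low"],
    "Outcome": [1, 0, 1, 0]  # Binary outcome (e.g., disease presence)
})

# Encode the ordinal ActivityLevel as numbers so it can be correlated
# (DataFrame.corr() raises an error on string columns in pandas >= 2.0)
data["ActivityScore"] = data["ActivityLevel"].map({"Low": 0, "Medium": 1, "High": 2})

# A candidate confounder should be associated with both the exposure (BMI)
# and the outcome; with only 4 subjects these values are purely illustrative
print("Correlations among BMI, activity, and outcome:")
print(data[["BMI", "ActivityScore", "Outcome"]].corr())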
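Correlations only hint at confounding. A more direct check is stratification: compare the exposure-outcome association overall (crude) and within each level of the suspected confounder. The counts below are hypothetical. They are constructed so that low activity raises both the chance of high BMI and the outcome risk. The crude risk difference for high BMI comes out at 0.24, yet it is 0 within each activity stratum. That is the pattern expected when activity level confounds the BMI-outcome relationship.

```python
import pandas as pd

# Hypothetical aggregated counts: N subjects and Cases (outcome = 1) per cell.
counts = pd.DataFrame({
    "ActivityLevel": ["Low", "Low", "High", "High"],
    "HighBMI": [1, 0, 1, 0],
    "N": [40, 10, 10, 40],
    "Cases": [20, 5, 1, 4],
})


def risk_by_bmi(table):
    """Outcome risk (Cases / N) for HighBMI = 1 and HighBMI = 0."""
    agg = table.groupby("HighBMI")[["Cases", "N"]].sum()
    return agg["Cases"] / agg["N"]


# Crude comparison, ignoring activity level.
crude = risk_by_bmi(counts)
crude_rd = crude.loc[1] - crude.loc[0]
print(f"Crude risk difference (HighBMI vs. not): {crude_rd:.2f}")

# Stratify by the suspected confounder and recompute within each stratum.
stratum_rd = {}
for level, stratum in counts.groupby("ActivityLevel"):
    r = risk_by_bmi(stratum)
    stratum_rd[level] = r.loc[1] - r.loc[0]
    print(f"Risk difference within ActivityLevel={level}: {stratum_rd[level]:.2f}")
```

When the stratum-specific estimates agree with each other but differ from the crude estimate, a pooled adjusted estimate (for example, Mantel-Haenszel) is usually reported instead of the crude one.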
        


## Validating Biological Data

Validation involves ensuring the accuracy and reliability of data by:
1. Checking for missing values.
2. Detecting duplicates.
3. Ensuring data consistency (for example, consistent units and coding, and values within physiologically plausible ranges).
        

In [None]:

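# Data validation example
data = pd.DataFrame({
    "Subject": [1, 2, 2, 4],
    "BloodPressure": [120, 130, None, 140],
    "BMI": [22.5, 24.8, 24.8, None]
})

# Check for missing values
print("Missing Values:")
print(data.isnull().sum())

# Check for duplicates: the two Subject 2 rows differ (130 vs. NaN), so
# data.duplicated() on whole rows would miss them; check the ID column instead
print("\nDuplicate Subject IDs:")
print(data[data.duplicated(subset="Subject", keep=False)])

# Keep the first record per subject (in practice, resolve duplicates against source records)
data = data.drop_duplicates(subset="Subject", keep="first").reset_index(drop=True)

# Fill remaining missing values with the column median (simple, but it ignores
# imputation uncertainty and shrinks the variance)
data["BloodPressure"] = data["BloodPressure"].fillna(data["BloodPressure"].median())
data["BMI"] = data["BMI"].fillna(data["BMI"].median())

print("\nCleaned Data:")
print(data)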
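The consistency check from the list above can include testing each value against a plausible range. This is a minimal sketch. The limits in `PLAUSIBLE_RANGES` (systolic blood pressure in mmHg, BMI in kg/m²) are illustrative assumptions and should be set for the actual study population. The hypothetical `records` table contains two deliberately implausible entries.

```python
import pandas as pd

# Hypothetical plausibility limits; adjust them to the study population.
PLAUSIBLE_RANGES = {
    "BloodPressure": (70, 250),  # systolic, mmHg
    "BMI": (10, 70),             # kg/m^2
}


def flag_out_of_range(df, ranges):
    """Return rows where any listed column falls outside its [low, high] range."""
    mask = pd.Series(False, index=df.index)
    for col, (low, high) in ranges.items():
        # between() is inclusive on both ends; NaN also counts as out of range here
        mask |= ~df[col].between(low, high)
    return df[mask]


records = pd.DataFrame({
    "Subject": [1, 2, 3, 4],
    "BloodPressure": [120, 1300, 125, 140],  # 1300 looks like a data-entry error
    "BMI": [22.5, 24.8, 2.31, 27.0],         # 2.31 looks like a misplaced decimal
})

print(flag_out_of_range(records, PLAUSIBLE_RANGES))
```

Flagged rows are best reviewed against source records rather than silently corrected or dropped.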
        


## Visualizing Biological Data

Histograms show the distribution of a single variable; scatter plots show how two variables vary together.
        

In [None]:

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram for BMI (uses `data` as cleaned in the validation cell above)
sns.histplot(data["BMI"], kde=True)
plt.title("BMI Distribution")
plt.xlabel("BMI")
plt.ylabel("Frequency")
plt.show()

# Scatter plot for Blood Pressure vs. BMI
sns.scatterplot(x=data["BMI"], y=data["BloodPressure"])
plt.title("Blood Pressure vs. BMI")
plt.xlabel("BMI")
plt.ylabel("Blood Pressure")
plt.show()
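Plots are easier to interpret when read alongside the numbers behind them. This sketch uses a hypothetical eight-subject sample (`sample`) and assumed bin edges. It first computes the bin counts a BMI histogram displays with `numpy.histogram`. It then summarizes the BMI-blood pressure relationship with Pearson's r and Spearman's rho. Spearman's rho is obtained here as the Pearson correlation of the ranks, which avoids a SciPy dependency.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of 8 subjects.
sample = pd.DataFrame({
    "BMI": [21.0, 22.5, 23.1, 24.8, 26.0, 27.0, 29.5, 31.2],
    "BloodPressure": [115, 120, 125, 130, 128, 140, 145, 150],
})

# What a histogram plots: counts of observations per bin.
# Note: numpy's last bin also includes its right edge.
counts, edges = np.histogram(sample["BMI"], bins=[20, 25, 30, 35])
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"BMI [{lo:.0f}, {hi:.0f}): {n}")

# Numbers to accompany the scatter plot.
pearson = sample["BMI"].corr(sample["BloodPressure"])
spearman = sample["BMI"].rank().corr(sample["BloodPressure"].rank())
print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```

Spearman's rho is less sensitive to outliers and only assumes a monotonic relationship, so reporting both can be informative when the scatter plot looks curved.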
        