
# Introduction to Python for Biostatistics

This notebook introduces Python programming concepts for biostatistics, including basic operations, data manipulation, and exploratory analysis.
        


## Dataset Information

Any dataset with both numerical and categorical variables works for practicing biostatistical techniques. This notebook suggests the **Diabetes dataset** for exploratory analysis, which is available from Kaggle:

- [Diabetes Dataset - Kaggle](https://www.kaggle.com/datasets/mathchi/diabetes-data)

### Dataset Attributes

- **Pregnancies**: Number of pregnancies.
- **Glucose**: Plasma glucose concentration.
- **BloodPressure**: Diastolic blood pressure (mm Hg).
- **SkinThickness**: Triceps skinfold thickness (mm).
- **Insulin**: 2-hour serum insulin (μU/mL).
- **BMI**: Body mass index (weight in kg/(height in m)^2).
- **DiabetesPedigreeFunction**: Diabetes pedigree function.
- **Age**: Age of the patient.
- **Outcome**: Class variable (0 = non-diabetic, 1 = diabetic).
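
The BMI definition in the list above can be written as a short function. This is a minimal sketch: the function name `body_mass_index` and the patient values are made up for illustration and do not come from the dataset.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Hypothetical patient used only to illustrate the formula
bmi = body_mass_index(weight_kg=70.0, height_m=1.75)
print(f"BMI: {bmi:.1f}")  # BMI: 22.9
```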
        


## Basic Python Operations

Learn the basics of Python: arithmetic such as addition, string manipulation with f-strings, and comments that document your code.
        

In [None]:

# Addition
print("Addition example:", 8 + 9)

# String manipulation
name = "Python"
print(f"Welcome to {name} programming!")

# Comments
# This is a comment and will not be executed
print("Comments make code readable.")
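
Beyond addition and f-strings, a few other built-in operators and string methods come up often. The sketch below uses arbitrary example values (`a`, `b`, and the variable names are chosen only for illustration).

```python
# Arithmetic operators beyond addition
a, b = 9, 2
difference = a - b        # subtraction
quotient = a / b          # true division always returns a float
floor_quotient = a // b   # floor division drops the fractional part
remainder = a % b         # modulo gives the remainder
power = a ** b            # exponentiation
print(difference, quotient, floor_quotient, remainder, power)  # 7 4.5 4 1 81

# Common string operations
name = "Python"
greeting = "Welcome to " + name   # concatenation with +
print(greeting)
print(name.upper())               # PYTHON
print(name.lower())               # python
print(len(name))                  # 6
print(name[:2])                   # slicing: Py
```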
        


## Variables and Data Types

Explore variables and data types, including integers, floats, and strings.
        

In [None]:

# Define variables
days_in_year = 365
pi_value = 3.14159
language = "Python"

# Display variables and their types
print(f"Days in a year: {days_in_year}, Type: {type(days_in_year)}")
print(f"Value of Pi: {pi_value}, Type: {type(pi_value)}")
print(f"Programming language: {language}, Type: {type(language)}")
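
Values read from files often arrive as text, so converting between the types shown above is a common next step. The following is a small sketch with made-up variable names (`days_text`, `pi_text`, `days_label`, `hours_in_year`).

```python
# Values often arrive as text, for example when read from a file
days_text = "365"
pi_text = "3.14159"

days_in_year = int(days_text)    # str -> int
pi_value = float(pi_text)        # str -> float
days_label = str(days_in_year)   # int -> str

print(type(days_in_year), type(pi_value), type(days_label))

# Mixing an int and a float in arithmetic yields a float
hours_in_year = days_in_year * 24.0
print(hours_in_year, type(hours_in_year))

# int() truncates toward zero rather than rounding
print(int(pi_value))  # 3
```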
        


## Loading Data

Learn how to load data using pandas, a powerful library for data manipulation.
        

In [None]:

import pandas as pd

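# Load the dataset (replace with your file path).
# A raw string (r'...') keeps Windows backslashes such as \t from being read as escape sequences.
# Example: data = pd.read_csv(r'C:\Path\to\diabetes.csv')
# The small DataFrame below is a stand-in so the notebook runs without the file.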
data = pd.DataFrame({
    "Age": [25, 30, 35],
    "BMI": [22.5, 24.3, 28.7],
    "Outcome": [0, 1, 0]
})

# Display the first few rows
print(data.head())
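
To try `pd.read_csv` without downloading anything, the same three rows can be parsed from CSV text held in memory. This is a sketch: `csv_text` and `data_from_csv` are illustrative names, and the rows repeat the stand-in values above rather than real patient records.

```python
import io

import pandas as pd

# CSV text with the same illustrative rows as the stand-in DataFrame
csv_text = """Age,BMI,Outcome
25,22.5,0
30,24.3,1
35,28.7,0
"""

# read_csv accepts a file path or any file-like object, such as StringIO
data_from_csv = pd.read_csv(io.StringIO(csv_text))

print(data_from_csv.head())
print(data_from_csv.dtypes)  # column types are inferred from the values
```

Replacing `io.StringIO(csv_text)` with a path to the downloaded file reads the real dataset in the same way.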
        


## Basic Data Exploration

Inspect the dataset's structure (column names, types, and non-null counts) and compute summary statistics for its numerical columns.
        

In [None]:

# Display data information (info() prints its report directly and returns None, so print() is not needed)
data.info()

# Display summary statistics
print(data.describe())
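
A few further checks are often useful at this stage: the table's dimensions, missing values per column, class balance, and group means. The sketch below rebuilds the same stand-in DataFrame so it runs on its own; the names `outcome_counts` and `mean_by_outcome` are chosen for illustration.

```python
import pandas as pd

# Same illustrative stand-in data as in the loading step
data = pd.DataFrame({
    "Age": [25, 30, 35],
    "BMI": [22.5, 24.3, 28.7],
    "Outcome": [0, 1, 0],
})

# Number of rows and columns
print(data.shape)

# Count of missing values in each column
print(data.isna().sum())

# How many records fall in each outcome class
outcome_counts = data["Outcome"].value_counts()
print(outcome_counts)

# Mean Age and BMI within each outcome class
mean_by_outcome = data.groupby("Outcome")[["Age", "BMI"]].mean()
print(mean_by_outcome)
```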
        


## Visualizing Data

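Create basic visualizations to explore the dataset, starting with a histogram and kernel density estimate (KDE) overlay that show how a single variable is distributed.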
        

In [None]:

import seaborn as sns
import matplotlib.pyplot as plt

# Histogram for Age
sns.histplot(data['Age'], kde=True)
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
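
To look at two variables together, a scatter plot coloured by outcome is one option. This sketch uses matplotlib alone and the same stand-in data; the choice of Age against BMI is only an illustration, not a recommended analysis.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same illustrative stand-in data as in the loading step
data = pd.DataFrame({
    "Age": [25, 30, 35],
    "BMI": [22.5, 24.3, 28.7],
    "Outcome": [0, 1, 0],
})

fig, ax = plt.subplots()

# Draw one set of points per outcome class so each gets its own colour and legend entry
for outcome, group in data.groupby("Outcome"):
    ax.scatter(group["Age"], group["BMI"], label=f"Outcome = {outcome}")

ax.set_title("BMI by Age")
ax.set_xlabel("Age")
ax.set_ylabel("BMI")
ax.legend()
```

In a notebook the figure renders when the cell finishes; in a standalone script, end with `plt.show()`.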
        