# Session 7: Statistical Analysis with Python

**Objective:** Compute descriptive statistics, run a two-sample t-test, and fit a simple linear regression using NumPy and SciPy.

## 1. Introduction to NumPy and SciPy

- **NumPy**: The core package for numerical computing in Python, providing N-dimensional arrays and vectorized math functions.
- **SciPy**: A library built on NumPy that adds scientific computing routines, including the `scipy.stats` module used in this session.

In [None]:
# Install libraries (if not already installed); pandas is needed for the activity at the end
# pip install numpy scipy pandas

In [None]:
# Import libraries
import numpy as np
from scipy import stats

## 2. Descriptive Statistics

In [None]:
# Mean, Median, and Standard Deviation
data = [12, 15, 14, 10, 18, 20, 15, 14, 17, 16]

mean_value = np.mean(data)
median_value = np.median(data)
std_dev = np.std(data)  # population standard deviation (ddof=0); pass ddof=1 for the sample estimate

print(f"Mean: {mean_value}, Median: {median_value}, Standard Deviation: {std_dev}")
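`np.std` defaults to the population formula, which divides by n. When the values are a sample drawn from a larger population, the sample estimate (dividing by n - 1) is usually the one wanted. A minimal sketch of the difference, reusing the `data` list from the cell above; the variable names `population_std` and `sample_std` are chosen here for illustration:

```python
import numpy as np

data = [12, 15, 14, 10, 18, 20, 15, 14, 17, 16]

# ddof=0 (the default) divides the sum of squared deviations by n
population_std = np.std(data)

# ddof=1 divides by n - 1, giving the sample standard deviation
sample_std = np.std(data, ddof=1)

print(f"Population std: {population_std:.4f}, Sample std: {sample_std:.4f}")
```

The sample value is always the larger of the two, and the gap shrinks as the number of observations grows.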

In [None]:
# Five-Number Summary (min, Q1, median, Q3, max)
# np.percentile interpolates linearly between data points by default
q1, q3 = np.percentile(data, [25, 75])
minimum, maximum = np.min(data), np.max(data)
print(f"Min: {minimum}, Q1: {q1}, Median: {median_value}, Q3: {q3}, Max: {maximum}")
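One common use of the quartiles is the interquartile range (IQR), together with the 1.5 × IQR rule of thumb for flagging potential outliers. The following is only an illustrative sketch building on the summary above; the names `iqr`, `lower_fence`, `upper_fence`, and `outliers` are assumptions made for this example.

```python
import numpy as np

data = [12, 15, 14, 10, 18, 20, 15, 14, 17, 16]

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # spread of the middle 50% of the data

# Conventional rule-of-thumb fences: 1.5 * IQR beyond each quartile
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are candidates for a closer look
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(f"IQR: {iqr}, Fences: ({lower_fence}, {upper_fence}), Outliers: {outliers}")
```

The 1.5 multiplier is a convention rather than a strict rule, so flagged values should be inspected, not removed automatically.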

## 3. Hypothesis Testing and Regression Analysis

In [None]:
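# Independent two-sample t-test (compares the means of two groups)
# Note: ttest_ind assumes equal variances in both groups by default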
group1 = [22, 24, 20, 23, 21, 19, 25]
group2 = [30, 32, 29, 31, 28, 35, 33]

stat, p_value = stats.ttest_ind(group1, group2)
print(f"T-test Statistic: {stat}, P-value: {p_value}")
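The p-value is usually compared against a significance level chosen in advance. The sketch below assumes a level of 0.05 (the name `alpha` and its value are choices made for this example) and uses Welch's variant of the test via `equal_var=False`, which does not assume the two groups share the same variance:

```python
from scipy import stats

group1 = [22, 24, 20, 23, 21, 19, 25]
group2 = [30, 32, 29, 31, 28, 35, 33]

alpha = 0.05  # assumed significance level for this example

# equal_var=False requests Welch's t-test instead of the pooled-variance test
welch_stat, welch_p = stats.ttest_ind(group1, group2, equal_var=False)

# Reject the null hypothesis of equal means when the p-value falls below alpha
reject_null = bool(welch_p < alpha)

print(f"Welch statistic: {welch_stat:.3f}, P-value: {welch_p:.6f}")
print("Reject null hypothesis" if reject_null else "Fail to reject null hypothesis")
```

A negative statistic here reflects that the mean of `group1` is lower than the mean of `group2`.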

In [None]:
# Linear Regression Analysis
from scipy.stats import linregress

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 5, 7])

# linregress returns the correlation coefficient r; squaring it gives R-squared
slope, intercept, r_value, p_value, std_err = linregress(x, y)
print(f"Slope: {slope}, Intercept: {intercept}, R-squared: {r_value**2}")
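Once the line is fitted, the slope and intercept can be used to predict `y` for a new `x`. A minimal sketch using the same data; the value `x_new = 7` and the names `result`, `x_new`, and `y_pred` are assumptions introduced for illustration:

```python
import numpy as np
from scipy.stats import linregress

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 5, 4, 5, 7])

# The result object exposes the fit parameters as named attributes
result = linregress(x, y)

# Predict y at a new x value using y = slope * x + intercept
x_new = 7  # hypothetical input, just beyond the observed range
y_pred = result.slope * x_new + result.intercept

print(f"Predicted y at x={x_new}: {y_pred:.2f}")
```

Predictions beyond the observed range of `x` are extrapolations and should be treated with caution.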

## Activity: Analyze a Dataset

In [None]:
# Task 1: Load a dataset and compute descriptive statistics
import pandas as pd

# Replace 'data.csv' with your actual file
df = pd.read_csv("data.csv")
print(df.describe())

In [None]:
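# Task 2: Perform hypothesis testing on a dataset
# Assumes the dataset has a 'group' column (with values 'A' and 'B') and a numeric 'value' column
group1 = df.loc[df['group'] == 'A', 'value']
group2 = df.loc[df['group'] == 'B', 'value']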

stat, p_value = stats.ttest_ind(group1, group2)
print(f"T-test P-value: {p_value}")
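If no CSV file is at hand, both tasks can be tried on a small in-memory DataFrame. The sketch below assumes the same column names the task code uses (`group` and `value`) and reuses the numbers from the t-test example in Section 3 as placeholder data; it is not a real dataset.

```python
import pandas as pd
from scipy import stats

# Placeholder data: the two groups from the Section 3 t-test example
df = pd.DataFrame({
    "group": ["A"] * 7 + ["B"] * 7,
    "value": [22, 24, 20, 23, 21, 19, 25, 30, 32, 29, 31, 28, 35, 33],
})

# Task 1: descriptive statistics, computed separately for each group
print(df.groupby("group")["value"].describe())

# Task 2: independent two-sample t-test between groups A and B
group_a = df.loc[df["group"] == "A", "value"]
group_b = df.loc[df["group"] == "B", "value"]
stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"T-test Statistic: {stat:.3f}, P-value: {p_value:.6f}")
```

Swapping the DataFrame construction for `pd.read_csv(...)` on a real file leaves the rest of the code unchanged, provided the column names match.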