# Daily Blog #52 - Descriptive & Inferential Statistics in Python
### June 21, 2025

## Python packages:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
```

If you haven’t installed them yet:

```bash
pip install numpy pandas matplotlib seaborn scipy
```

---

## 1. Descriptive Statistics

**Example Data**:

```python
# Generate some synthetic data
np.random.seed(42)
data = pd.DataFrame({
    "scores": np.random.normal(75, 10, 100),  # normally distributed scores
    "category": np.random.choice(["A", "B"], size=100)
})

data.head()
```

**Compute Summary Stats**:

```python
print(data['scores'].describe())  # count, mean, std, min, 25%, 50%, 75%, max
print(data.groupby('category')['scores'].mean())  # mean by group
print(data.groupby('category')['scores'].std())   # std dev by group
```
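The three calls above can also be collapsed into one grouped table. This is a minimal sketch that rebuilds the same synthetic data so it runs on its own; the name `summary` and the choice of statistics are illustrative assumptions, not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic data from the example above (same seed, same order of draws)
np.random.seed(42)
data = pd.DataFrame({
    "scores": np.random.normal(75, 10, 100),
    "category": np.random.choice(["A", "B"], size=100),
})

# One row per category, one column per statistic
summary = data.groupby("category")["scores"].agg(["count", "mean", "median", "std"])
print(summary.round(2))
```

Comparing `mean` and `median` side by side is a quick way to spot skew within a group.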

**Visualize the distribution**:

```python
sns.histplot(data['scores'], kde=True)
plt.title('Score Distribution')
plt.show()

sns.boxplot(x='category', y='scores', data=data)
plt.title('Score by Category')
plt.show()
```

The histogram shows the shape and center of the score distribution, while the box plot compares the median, interquartile range, and outliers across categories.

---

## 2. Inferential Statistics

### One-sample t-test

**Goal**: Test whether the sample mean differs significantly from a hypothesized population mean (e.g. 70).

```python
t_stat, p_val = stats.ttest_1samp(data['scores'], 70)
print('T-Statistic:', t_stat, '| P-Value:', p_val)

if p_val < 0.05:
    print("Significant difference from 70!")
else:
    print("No significant difference from 70.")
```
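To see what `ttest_1samp` computes, the statistic can be reproduced by hand. The sketch below regenerates the same synthetic scores so it is self-contained; `mu0`, `t_manual`, and `p_manual` are names introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Same synthetic scores as in section 1 (first draw after seeding with 42)
np.random.seed(42)
scores = np.random.normal(75, 10, 100)

mu0 = 70          # hypothesized population mean
n = len(scores)

# t = (sample mean - hypothesized mean) / standard error of the mean
std_err = scores.std(ddof=1) / np.sqrt(n)
t_manual = (scores.mean() - mu0) / std_err

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(scores, mu0)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

The two lines should agree up to floating-point error, which confirms that the test is just a standardized distance between the sample mean and 70.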

### Independent t-test

**Goal**: Compare mean scores between Category A and B.

```python
a_scores = data.loc[data['category']=='A', 'scores']
b_scores = data.loc[data['category']=='B', 'scores']

t_stat, p_val = stats.ttest_ind(a_scores, b_scores, equal_var=False)
print('T-Statistic:', t_stat, '| P-Value:', p_val)
```

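Interpretation: A p-value below your chosen significance level (e.g. 0.05) suggests that the two group means differ. Setting `equal_var=False` runs Welch's t-test, which does not assume the groups have equal variances. Because `category` is assigned at random here, independently of `scores`, you should usually not see a significant difference in this example.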
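The statistic reported by `ttest_ind(..., equal_var=False)` can be checked by hand using the Welch-Satterthwaite degrees of freedom. This is a sketch on two hypothetical groups with deliberately different spreads; the seed, group sizes, and distribution parameters are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical groups with unequal variances (illustrative values only)
np.random.seed(0)
a = np.random.normal(75, 10, 50)
b = np.random.normal(78, 15, 50)

# Per-group variance of the mean: s^2 / n
va = a.var(ddof=1) / len(a)
vb = b.var(ddof=1) / len(b)

# Welch's t statistic and Welch-Satterthwaite degrees of freedom
t_manual = (a.mean() - b.mean()) / np.sqrt(va + vb)
df_welch = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df_welch)

t_scipy, p_scipy = stats.ttest_ind(a, b, equal_var=False)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```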

---

### Chi-square Test

Use a chi-square test of independence to check whether two categorical variables are associated:

```python
# Dummy categorical data
category2 = np.random.choice(["X","Y"], size=100)
data['category2'] = category2

# Build contingency table
contingency = pd.crosstab(data['category'], data['category2'])

chi2, p, dof, expected = stats.chi2_contingency(contingency)
print('Chi-square:', chi2, '| P-Value:', p)
```
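The `expected` array returned by `chi2_contingency` holds the counts you would expect if the two variables were independent. Here is a minimal sketch of how those counts and the statistic come about, on regenerated dummy data (the frame `df` and the `*_manual` names are assumptions for this example). Note that for 2x2 tables SciPy applies Yates' continuity correction by default, so the sketch passes `correction=False` to match the uncorrected formula.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Dummy categorical data (illustrative; not the exact draws used above)
np.random.seed(42)
df = pd.DataFrame({
    "category": np.random.choice(["A", "B"], size=100),
    "category2": np.random.choice(["X", "Y"], size=100),
})
observed = pd.crosstab(df["category"], df["category2"]).to_numpy()

# Expected count per cell = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print("manual chi2:", chi2_manual, "| scipy chi2:", chi2, "| dof:", dof)
```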

---

## Going further

* **ANOVA** (`stats.f_oneway`)
* **Correlation tests** (`stats.pearsonr`, `stats.spearmanr`)
* **Confidence intervals**:

```python
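mean = data['scores'].mean()
std_err = stats.sem(data['scores'])  # standard error of the mean (uses ddof=1)

# 95% interval from the t distribution with n - 1 degrees of freedom
ci = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=std_err)
print('95% Confidence Interval:', ci)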
```
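The first two bullets can be sketched in the same style. The groups and the `hours` variable below are hypothetical synthetic data made up for this example; only `stats.f_oneway`, `stats.pearsonr`, and `stats.spearmanr` come from the list above.

```python
import numpy as np
from scipy import stats

np.random.seed(42)

# One-way ANOVA: do three hypothetical groups share the same mean?
g1 = np.random.normal(70, 10, 40)
g2 = np.random.normal(75, 10, 40)
g3 = np.random.normal(80, 10, 40)
f_stat, p_anova = stats.f_oneway(g1, g2, g3)
print("F-Statistic:", f_stat, "| P-Value:", p_anova)

# Correlation: hypothetical study hours vs. scores with a built-in linear trend
hours = np.random.uniform(0, 10, 40)
scores = 60 + 3 * hours + np.random.normal(0, 5, 40)

r, p_r = stats.pearsonr(hours, scores)       # linear association
rho, p_rho = stats.spearmanr(hours, scores)  # rank-based (monotonic) association
print("Pearson r:", r, "| P-Value:", p_r)
print("Spearman rho:", rho, "| P-Value:", p_rho)
```

Pearson's r measures linear association, while Spearman's rho works on ranks and is less sensitive to outliers and to non-linear but monotonic relationships.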

---

## Takeaway:
- **Descriptive statistics** → Summarize and visualize your data.
- **Inferential statistics** → Draw conclusions that go beyond your sample (t-tests, chi-square tests, p-values, confidence intervals).