# NEURO-105: Statistics and Probability using Python
## Lesson 1 - Friday 16/1/26

**Instructor:** Alexandros Pittis  
**Course:** MSc in Neurosciences, University of Crete

---

### Today's Objectives
1. Get familiar with Jupyter notebooks
2. Load and explore data with Pandas
3. Visualize data with Seaborn
4. Explore probability distributions

---

## Part 1: Jupyter Notebook Basics

A Jupyter notebook consists of **cells**:
- **Markdown cells**: Text and explanations (like this one)
- **Code cells**: Python code (executable)

### Essential Shortcuts

| Action | Shortcut |
|--------|----------|
| Run cell | `Shift + Enter` |
| Run cell, stay in place | `Ctrl + Enter` |
| Insert cell below | `B` (command mode) |
| Insert cell above | `A` (command mode) |
| Delete cell | `DD` (command mode) |
| Switch to Markdown | `M` (command mode) |
| Switch to Code | `Y` (command mode) |

**Command mode** = Press `Esc` (blue border)  
**Edit mode** = Press `Enter` (green border)

👉 **Try it:** Run the cell below with `Shift + Enter`

In [None]:
print("Hello, NEURO-105!")
2 + 2

---
## Part 2: Setup - Import Libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats

# Nice plot style
sns.set_theme(style="whitegrid")

print("Libraries loaded!")

---
## Part 3: Python Essentials (Quick Review)

Just the basics we'll need today.

In [None]:
# Variables
n_samples = 100
mean_value = 25.5
name = "experiment_1"

print(f"Running {name} with {n_samples} samples")

In [None]:
# Lists
values = [10, 20, 30, 40, 50]

print("First:", values[0])
print("Last:", values[-1])
print("Length:", len(values))

In [None]:
# Loops
for v in values:
    print(v * 2)

In [None]:
# Functions
def calculate_mean(data):
    return sum(data) / len(data)

calculate_mean(values)

---
## Part 4: Working with Data (Pandas)

Let's create a simple dataset: reaction times from a neuroscience experiment.

In [None]:
# Create dataset
np.random.seed(42)

data = pd.DataFrame({
    'subject': range(1, 31),
    'group': ['Control'] * 15 + ['Treatment'] * 15,
    'age': np.random.randint(22, 40, 30),
    'reaction_time': np.concatenate([
        np.random.normal(350, 40, 15),  # Control
        np.random.normal(310, 35, 15)   # Treatment (faster)
    ]).round(1)
})

data.head(10)

In [None]:
# Quick statistics
data.describe()

In [None]:
# Filter: only Control group
data[data['group'] == 'Control']

In [None]:
# Statistics by group
data.groupby('group')['reaction_time'].agg(['mean', 'std', 'count'])
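
To see what `groupby` does under the hood, the sketch below rebuilds the group means by hand and checks that both routes agree. The `toy` table and its values are made up for illustration; they are not the course dataset.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    'group': ['Control', 'Control', 'Treatment', 'Treatment'],
    'reaction_time': [340.0, 360.0, 300.0, 320.0],
})

# Route 1: let groupby split the rows and average each group
by_group = toy.groupby('group')['reaction_time'].mean()

# Route 2: select each group with a boolean mask and average it manually
manual = {
    g: toy.loc[toy['group'] == g, 'reaction_time'].mean()
    for g in toy['group'].unique()
}

print(by_group)
print(manual)
```

Both routes give the same numbers; `groupby` simply does the split, apply, and combine steps in one call.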

---
## Part 5: Visualization (Seaborn)

Seaborn makes statistical plots easy and beautiful.

In [None]:
# Histogram
sns.histplot(data=data, x='reaction_time', bins=12)
sns.despine()

In [None]:
# Histogram by group
sns.histplot(data=data, x='reaction_time', hue='group', bins=10, alpha=0.6)
sns.despine()

In [None]:
# Box plot - compare groups
sns.boxplot(data=data, x='group', y='reaction_time', hue='group', palette='Set2', legend=False)
sns.despine()

In [None]:
# Violin plot - shows full distribution
sns.violinplot(data=data, x='group', y='reaction_time', hue='group', palette='Set2', legend=False)
sns.despine()

In [None]:
# KDE - smooth density estimate
sns.kdeplot(data=data, x='reaction_time', hue='group', fill=True, alpha=0.4)
sns.despine()

In [None]:
# Scatter plot
sns.scatterplot(data=data, x='age', y='reaction_time', hue='group', s=80)
sns.despine()

---
## Part 6: Probability Distributions

Key distributions you'll encounter in biomedical research.

### 6.1 Normal (Gaussian) Distribution ⭐

The most important distribution - the "bell curve".

**Parameters:** μ (mean), σ (standard deviation)

In [None]:
from scipy.stats import norm

# Generate samples from Normal distribution
samples = norm.rvs(loc=0, scale=1, size=5000)  # μ=0, σ=1

# Plot
sns.histplot(samples, stat='density', alpha=0.5)
sns.kdeplot(samples, color='red', linewidth=2)
sns.despine()

In [None]:
# The 68-95-99.7 rule
samples = norm.rvs(0, 1, size=100000)

print("Percentage of data within:")
print(f"  ±1σ: {np.mean(np.abs(samples) <= 1) * 100:.1f}% (theoretical: 68.3%)")
print(f"  ±2σ: {np.mean(np.abs(samples) <= 2) * 100:.1f}% (theoretical: 95.4%)")
print(f"  ±3σ: {np.mean(np.abs(samples) <= 3) * 100:.1f}% (theoretical: 99.7%)")
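
The theoretical percentages quoted above can also be computed directly, without sampling. A minimal sketch using the cumulative distribution function `norm.cdf` for a standard Normal (μ=0, σ=1); the variable name `exact` is just for illustration:

```python
from scipy.stats import norm

# P(-k <= Z <= k) for a standard Normal, from the cumulative distribution function
exact = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in exact.items():
    print(f"  ±{k}σ: {prob * 100:.2f}%")
```

The simulated percentages fluctuate slightly from run to run, while these values are fixed properties of the distribution.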

### 6.2 Binomial Distribution

Number of successes in n independent trials, each with the same probability of success p.

**Parameters:** n (trials), p (probability of success)

In [None]:
from scipy.stats import binom

# Example: 20 coin flips, P(heads) = 0.5
samples = binom.rvs(n=20, p=0.5, size=5000)

sns.histplot(samples, discrete=True, stat='probability')
sns.despine()
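
Besides drawing random samples, `binom` can return exact probabilities. The sketch below reuses the coin-flip setting (n=20, p=0.5); the particular questions asked (exactly 10 heads, 15 or more heads) are chosen only as examples.

```python
from scipy.stats import binom

n, p = 20, 0.5

# Exact probability of exactly 10 heads in 20 fair flips
p_ten = binom.pmf(10, n, p)

# Probability of 15 or more heads: sf(k) gives P(X > k), so use k = 14
p_15_plus = binom.sf(14, n, p)

# Theoretical mean n*p and variance n*p*(1-p)
mean, var = binom.stats(n, p, moments='mv')

print(f"P(X = 10)  = {p_ten:.4f}")
print(f"P(X >= 15) = {p_15_plus:.4f}")
print(f"mean = {float(mean):.1f}, variance = {float(var):.1f}")
```

Comparing `p_ten` with the height of the bar at 10 in the histogram is a quick check that the simulation behaves as expected.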

### 6.3 Poisson Distribution

Number of events in a fixed interval of time or space, when events occur independently at a constant average rate.

**Parameter:** λ (average rate)

In [None]:
from scipy.stats import poisson

# Example: average 3 mutations per gene
samples = poisson.rvs(mu=3, size=5000)

sns.histplot(samples, discrete=True, stat='probability', color='forestgreen')
sns.despine()
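
A defining feature of the Poisson distribution is that its mean and variance both equal λ. A short sketch with λ=3, as in the example above (note that SciPy calls this parameter `mu`; the variable name `lam` is our own choice):

```python
from scipy.stats import poisson

lam = 3  # λ, the average number of events per interval

# Probability of observing no events at all; equals exp(-λ)
p_zero = poisson.pmf(0, lam)

# Probability of observing at most 5 events
p_le_5 = poisson.cdf(5, lam)

# Theoretical mean and variance are both λ
mean, var = poisson.stats(lam, moments='mv')

print(f"P(X = 0)  = {p_zero:.4f}")
print(f"P(X <= 5) = {p_le_5:.4f}")
print(f"mean = {float(mean):.1f}, variance = {float(var):.1f}")
```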

### 6.4 Student's t-Distribution

Like the Normal but with heavier tails; it approaches the Normal as df increases. Used in t-tests, where the difference matters most for small samples.

**Parameter:** df (degrees of freedom)

In [None]:
from scipy.stats import t

# Compare t vs Normal
x = np.linspace(-4, 4, 200)

df_plot = pd.DataFrame({
    'x': np.tile(x, 3),
    'density': np.concatenate([norm.pdf(x), t.pdf(x, df=3), t.pdf(x, df=10)]),
    'dist': ['Normal'] * len(x) + ['t (df=3)'] * len(x) + ['t (df=10)'] * len(x)
})

sns.lineplot(data=df_plot, x='x', y='density', hue='dist')
sns.despine()
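
The heavier tails have a practical consequence: cutoffs for "extreme" values are further out than for the Normal. As an illustration, the sketch below compares the two-sided 95% critical value of the Normal with that of the t-distribution for a few degrees of freedom (the df values are arbitrary choices).

```python
from scipy.stats import norm, t

# Two-sided 95% critical value: the point that leaves 2.5% in the upper tail
z_crit = norm.ppf(0.975)
t_crit = {df: t.ppf(0.975, df) for df in (3, 10, 30, 1000)}

print(f"Normal:      {z_crit:.3f}")
for df, value in t_crit.items():
    print(f"t (df={df:>4}): {value:.3f}")
```

The t cutoffs shrink toward the Normal value as df grows, which matches the curves converging in the plot.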

### 6.5 Chi-Square Distribution

Used for categorical data tests (goodness-of-fit, independence).

**Parameter:** df (degrees of freedom)

In [None]:
from scipy.stats import chi2

x = np.linspace(0, 15, 200)

chi_plot = pd.DataFrame({
    'x': np.tile(x, 3),
    'density': np.concatenate([chi2.pdf(x, df=2), chi2.pdf(x, df=5), chi2.pdf(x, df=10)]),
    'df': ['df=2'] * len(x) + ['df=5'] * len(x) + ['df=10'] * len(x)
})

sns.lineplot(data=chi_plot, x='x', y='density', hue='df')
sns.despine()
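
One way to build intuition for df: a chi-square variable with df degrees of freedom is the sum of df squared independent standard Normal variables. The simulation below is a rough check of that fact; the seed, sample size, and df=5 are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
df = 5

# Sum of squares of df independent standard Normal draws, repeated many times
z = rng.standard_normal(size=(100_000, df))
sum_sq = (z ** 2).sum(axis=1)

# Theory: mean = df, variance = 2 * df
print(f"simulated mean: {sum_sq.mean():.2f} (theory: {chi2.mean(df):.0f})")
print(f"simulated var:  {sum_sq.var():.2f} (theory: {chi2.var(df):.0f})")

# With df=1, the 95% cutoff is the square of the two-sided Normal cutoff
print(f"chi2 95% cutoff, df=1: {chi2.ppf(0.95, 1):.3f}")
print(f"Normal cutoff squared: {norm.ppf(0.975) ** 2:.3f}")
```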

---
## Part 7: Summary

In [None]:
# Quick reference table
reference = pd.DataFrame({
    'Distribution': ['Normal', 'Binomial', 'Poisson', 'Student-t', 'Chi-Square'],
    'Type': ['Continuous', 'Discrete', 'Discrete', 'Continuous', 'Continuous'],
    'scipy.stats': ['norm', 'binom', 'poisson', 't', 'chi2'],
    'Use Case': [
        'Natural measurements, errors',
        'Successes in n trials',
        'Events per interval',
        't-tests (small samples)',
        'Categorical tests'
    ]
})
reference

---
## Exercises

### Exercise 1
Using our `data` DataFrame, find the mean reaction time for subjects older than 30.

In [None]:
# YOUR CODE HERE


### Exercise 2
Create a histogram of ages in our dataset, colored by group.

In [None]:
# YOUR CODE HERE


### Exercise 3
Generate 1000 samples from a Normal distribution with μ=100 and σ=15 (like IQ scores). What percentage are above 130?

In [None]:
# YOUR CODE HERE


---
## Resources

- [Seaborn Tutorial](https://seaborn.pydata.org/tutorial.html)
- [Pandas Getting Started](https://pandas.pydata.org/docs/getting_started/index.html)
- [SciPy Stats](https://docs.scipy.org/doc/scipy/reference/stats.html)
- [Handbook of Biological Statistics](https://www.biostathandbook.com/)

---

**Next class (23/1/26):** Fitting models to data, correlation

---
*NEURO-105 - MSc in Neurosciences, University of Crete*