# NEURO-105: Statistics and Probability using Python
## Lesson 1 - Friday 16/1/26

**Instructor:** Alexandros Pittis  
**Course:** MSc in Neurosciences, University of Crete

---

## Part 1: Jupyter Notebook Basics

A Jupyter notebook consists of **cells**:
- **Markdown cells**: Text and explanations (like this one)
- **Code cells**: Python code (executable)

### Essential Shortcuts

| Action | Shortcut |
|--------|----------|
| Run cell | `Shift + Enter` |
| Run cell, stay in place | `Ctrl + Enter` |
| Insert cell below | `B` (command mode) |
| Insert cell above | `A` (command mode) |
| Delete cell | `DD` (command mode) |
| Switch to Markdown | `M` (command mode) |
| Switch to Code | `Y` (command mode) |

**Command mode** = Press `Esc` (blue border)  
**Edit mode** = Press `Enter` (green border)

👉 **Try it:** Run the cell below with `Shift + Enter`

In [None]:
print("Hello, NEURO-105!")

---
## Part 2: Python Basics

In [None]:
# Variables - storing values
x = 10
y = 3.14
name = "neuroscience"

print(x)
print(y)
print(name)

In [None]:
# Simple math
a = 5
b = 3

print(a + b)   # addition
print(a - b)   # subtraction
print(a * b)   # multiplication
print(a / b)   # division
print(a ** 2)  # power
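
One detail about the operators above: in Python 3, `/` always returns a float, even when both operands are integers. The sketch below also shows two related operators that are not used in the cell above, `//` (floor division) and `%` (remainder); they are included only as an illustrative aside.

```python
a = 5
b = 3

quotient = a / b     # true division: the result is always a float
whole = a // b       # floor division: rounds down to a whole number
remainder = a % b    # what is left over after floor division

print(quotient)
print(whole)
print(remainder)
print(type(a / 1))   # <class 'float'>, even though 5 / 1 is a whole number
```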

In [None]:
# Lists - collections of values
numbers = [10, 20, 30, 40, 50]

print(numbers)
print(numbers[0])   # first element (index starts at 0)
print(numbers[-1])  # last element
print(len(numbers)) # length
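
Beyond single elements, a list can be sliced to pull out several values at once. The following is a minimal sketch using the same `numbers` list; the variable names `first_three`, `last_two`, and `extended` are made up for this example.

```python
numbers = [10, 20, 30, 40, 50]

first_three = numbers[:3]   # indices 0, 1, 2 (the stop index is excluded)
last_two = numbers[-2:]     # the final two elements

extended = numbers.copy()   # work on a copy so the original list is unchanged
extended.append(60)         # add a value to the end

print(first_three)
print(last_two)
print(extended)
print(len(numbers), len(extended))
```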

In [None]:
# Simple loop
for n in numbers:
    print(n)
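
A loop becomes more useful when it accumulates a result. As a small sketch, the cell below computes the mean of the list by hand; the names `total` and `mean` are chosen for this example only.

```python
numbers = [10, 20, 30, 40, 50]

total = 0
for n in numbers:
    total = total + n        # add each value to the running total

mean = total / len(numbers)  # the mean is the sum divided by the count

print("Sum:", total)
print("Mean:", mean)
```

Library functions such as `np.mean` perform the same calculation, so the loop is mainly useful for seeing what happens underneath.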

---
## Part 3: Import Libraries

Libraries give us extra tools. We need:
- **numpy**: Numbers and random data
- **seaborn**: Plotting
- **scipy.stats**: Statistical distributions

In [None]:
import numpy as np
import seaborn as sns
from scipy import stats

sns.set_theme(style="whitegrid")

print("Libraries loaded!")

---
## Part 4: Random Numbers and Plotting

Let's generate random data and visualize it.

In [None]:
# Generate 1000 random numbers from a Normal distribution
# Normal distribution has: mean (center) and std (spread)

data = np.random.normal(loc=0, scale=1, size=1000)

print(data[:10])  # show first 10 values
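
Because the values are random, the cell above prints different numbers each time it runs. One way to get repeatable results is to fix a seed. The sketch below uses NumPy's `np.random.default_rng` generator interface, which is an alternative to the `np.random.seed` call that appears later in this lesson; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.normal(loc=0, scale=1, size=5)
sample_b = rng_b.normal(loc=0, scale=1, size=5)

print(sample_a)
print(sample_b)
print(np.array_equal(sample_a, sample_b))  # True: identical values
```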

In [None]:
# Plot a histogram
sns.histplot(data)
sns.despine()

In [None]:
# Basic statistics
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Std:", np.std(data))
print("Min:", np.min(data))
print("Max:", np.max(data))
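
A note on `np.std`: by default it divides by n (the "population" formula, `ddof=0`), whereas the pandas `.std()` method divides by n - 1 (the "sample" formula, `ddof=1`). The sketch below shows the difference on a small made-up array; the `values` array is illustrative and not part of the lesson data.

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up values

population_std = np.std(values)       # ddof=0: divide by n
sample_std = np.std(values, ddof=1)   # ddof=1: divide by n - 1

print("Population std (ddof=0):", population_std)
print("Sample std (ddof=1):", sample_std)
```

With 1000 values the two results are almost identical, but with small samples the difference is noticeable.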

---
## Part 5: Different Distributions

### Normal Distribution (Gaussian)
The famous "bell curve". Many natural measurements are approximately normally distributed.

In [None]:
# Normal distribution with different parameters
normal_1 = np.random.normal(loc=0, scale=1, size=1000)    # mean=0, std=1
normal_2 = np.random.normal(loc=5, scale=1, size=1000)    # mean=5, std=1
normal_3 = np.random.normal(loc=0, scale=2, size=1000)    # mean=0, std=2

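# Draw all three curves on the same axes so they can be compared
ax = sns.kdeplot(normal_1, label="mean=0, std=1")
sns.kdeplot(normal_2, label="mean=5, std=1", ax=ax)
sns.kdeplot(normal_3, label="mean=0, std=2", ax=ax)
ax.legend()  # display the labels set above
sns.despine()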
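
The `scipy.stats` module imported earlier can give exact probabilities for the Normal distribution. As a hedged sketch, the cell below compares the theoretical probability of falling within one standard deviation of the mean with the fraction observed in a simulated sample; the sample size and seed are arbitrary choices for this example.

```python
import numpy as np
from scipy import stats

# Theoretical probability that a standard Normal value lies between -1 and 1
theoretical = stats.norm.cdf(1, loc=0, scale=1) - stats.norm.cdf(-1, loc=0, scale=1)

# Fraction of simulated values that lie between -1 and 1
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0, scale=1, size=100_000)
empirical = np.mean(np.abs(sample) < 1)

print("Theoretical:", theoretical)
print("Empirical:", empirical)
```

Both numbers should be close to 0.68, the familiar "68% within one standard deviation" rule.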

### Uniform Distribution
All values between a minimum and a maximum are equally likely.

In [None]:
# Uniform distribution: random numbers between 0 and 10
uniform_data = np.random.uniform(low=0, high=10, size=1000)

sns.histplot(uniform_data, bins=20)
sns.despine()
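
For a Uniform distribution between `low` and `high`, the theoretical mean is (low + high) / 2 and the standard deviation is (high - low) / sqrt(12). The sketch below checks a simulated sample against these formulas; the sample size and seed are arbitrary.

```python
import numpy as np

low, high = 0, 10
rng = np.random.default_rng(seed=0)
sample = rng.uniform(low=low, high=high, size=100_000)

expected_mean = (low + high) / 2
expected_std = (high - low) / np.sqrt(12)

print("Sample mean:", sample.mean(), "expected:", expected_mean)
print("Sample std:", sample.std(), "expected:", expected_std)
```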

### Binomial Distribution
Number of successes in n independent trials that each succeed with the same probability p (like counting heads in repeated coin flips).

In [None]:
# Binomial: flip a coin 20 times, count heads
# n=20 flips, p=0.5 probability of heads
binomial_data = np.random.binomial(n=20, p=0.5, size=1000)

sns.histplot(binomial_data, discrete=True)
sns.despine()
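
Simulation gives approximate frequencies; `scipy.stats.binom` gives exact probabilities for the same coin-flip setup. The following is a minimal sketch; the variable names are chosen for this example only.

```python
from scipy import stats

n, p = 20, 0.5

prob_exactly_10 = stats.binom.pmf(10, n, p)  # P(exactly 10 heads)
prob_at_most_5 = stats.binom.cdf(5, n, p)    # P(5 heads or fewer)
expected_heads = stats.binom.mean(n, p)      # equals n * p

print("P(exactly 10 heads):", prob_exactly_10)
print("P(at most 5 heads):", prob_at_most_5)
print("Expected number of heads:", expected_heads)
```

Even the single most likely outcome, exactly 10 heads, happens in fewer than one out of five experiments.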

### Poisson Distribution
Number of events in a fixed interval, assuming events occur independently at a constant average rate (e.g., mutations per gene).

In [None]:
# Poisson: average of 3 events per interval
poisson_data = np.random.poisson(lam=3, size=1000)

sns.histplot(poisson_data, discrete=True, color="green")
sns.despine()
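
A distinctive property of the Poisson distribution is that its mean and variance are both equal to the rate `lam`. The sketch below checks this on a simulated sample and uses `scipy.stats.poisson` for one exact probability; the sample size and seed are arbitrary.

```python
import numpy as np
from scipy import stats

lam = 3
rng = np.random.default_rng(seed=0)
counts = rng.poisson(lam=lam, size=100_000)

prob_zero = stats.poisson.pmf(0, lam)  # P(no events), equal to exp(-lam)

print("Sample mean:", counts.mean())
print("Sample variance:", counts.var())
print("P(0 events):", prob_zero)
```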

### Compare Distributions

In [None]:
# Compare Normal vs Uniform
normal = np.random.normal(5, 1, 1000)
uniform = np.random.uniform(2, 8, 1000)

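# Draw both curves on the same axes so they can be compared
ax = sns.kdeplot(normal, label="Normal", fill=True, alpha=0.3)
sns.kdeplot(uniform, label="Uniform", fill=True, alpha=0.3, ax=ax)
ax.legend()  # display the labels set above
sns.despine()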

---
## Part 6: Introduction to Pandas

Pandas lets us work with tables (like Excel). A table is called a **DataFrame**.

In [None]:
import pandas as pd

In [None]:
# Create a simple table manually
patients = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Clara', 'David', 'Eva'],
    'age': [25, 32, 28, 45, 38],
    'weight': [62, 78, 55, 82, 68],
    'group': ['Control', 'Treatment', 'Control', 'Treatment', 'Control']
})

patients

In [None]:
# Access a column
patients['age']

In [None]:
# Calculate statistics on a column
print("Mean age:", patients['age'].mean())
print("Median age:", patients['age'].median())
print("Std age:", patients['age'].std())

In [None]:
# Quick summary of all numeric columns
patients.describe()

In [None]:
# Filter rows: only Control group
patients[patients['group'] == 'Control']
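
Filters can be combined. As a sketch, the cell below uses `&` (and) and `|` (or) on the same `patients` table; each condition needs its own parentheses. The names `older_controls` and `treated_or_light` are made up for this example.

```python
import pandas as pd

patients = pd.DataFrame({
    'name': ['Anna', 'Bob', 'Clara', 'David', 'Eva'],
    'age': [25, 32, 28, 45, 38],
    'weight': [62, 78, 55, 82, 68],
    'group': ['Control', 'Treatment', 'Control', 'Treatment', 'Control']
})

# Control group AND older than 30
older_controls = patients[(patients['group'] == 'Control') & (patients['age'] > 30)]

# Treatment group OR weight below 60
treated_or_light = patients[(patients['group'] == 'Treatment') | (patients['weight'] < 60)]

print(older_controls)
print(treated_or_light)
```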

---
## Part 7: Plotting with Pandas Data

Let's create a larger dataset and visualize it.

In [None]:
# Create a larger dataset
np.random.seed(42)  # for reproducibility

experiment = pd.DataFrame({
    'subject': range(1, 31),
    'group': ['Control'] * 15 + ['Treatment'] * 15,
    'score': np.concatenate([
        np.random.normal(50, 10, 15),   # Control: mean=50
        np.random.normal(60, 10, 15)    # Treatment: mean=60
    ])
})

experiment.head(10)

In [None]:
# Histogram of all scores
sns.histplot(data=experiment, x='score')
sns.despine()

In [None]:
# Histogram by group
sns.histplot(data=experiment, x='score', hue='group')
sns.despine()

In [None]:
# Box plot - compare groups
sns.boxplot(data=experiment, x='group', y='score')
sns.despine()
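
A box plot summarizes each group with its quartiles: the box spans the 25th to 75th percentile, the line inside marks the median, and by default points farther than 1.5 times the interquartile range from the box are drawn as outliers. The sketch below computes these numbers by hand for a small made-up array; `scores` here is illustrative and not the `experiment` data.

```python
import numpy as np

scores = np.array([42, 45, 48, 50, 51, 53, 55, 58, 60, 75])  # made-up values

q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1                    # interquartile range: the height of the box

lower_fence = q1 - 1.5 * iqr     # points below this count as outliers
upper_fence = q3 + 1.5 * iqr     # points above this count as outliers
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print("Q1:", q1, "Median:", median, "Q3:", q3)
print("IQR:", iqr)
print("Outliers:", outliers)
```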

In [None]:
# Violin plot
sns.violinplot(data=experiment, x='group', y='score')
sns.despine()

In [None]:
# Statistics by group
experiment.groupby('group')['score'].mean()

In [None]:
# More detailed statistics by group
experiment.groupby('group')['score'].describe()
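
When only a few statistics are needed, `.agg` lets you pick them by name. The sketch below uses a tiny made-up table rather than the `experiment` DataFrame so that the numbers are easy to check by hand; the names `scores`, `summary`, and `difference` are chosen for this example.

```python
import pandas as pd

# Small made-up table, for illustration only
scores = pd.DataFrame({
    'group': ['Control', 'Control', 'Control', 'Treatment', 'Treatment', 'Treatment'],
    'score': [48.0, 52.0, 50.0, 61.0, 59.0, 63.0],
})

# One row per group, one column per requested statistic
summary = scores.groupby('group')['score'].agg(['mean', 'std', 'count'])

# Difference between the two group means
difference = summary.loc['Treatment', 'mean'] - summary.loc['Control', 'mean']

print(summary)
print("Difference in means:", difference)
```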

---
## Part 8: Summary

Today we learned how to:
- Use Jupyter notebooks
- Write basic Python: variables, lists, loops
- Generate random data from distributions (Normal, Uniform, Binomial, Poisson)
- Plot with seaborn: `histplot`, `kdeplot`, `boxplot`, `violinplot`
- Work with tables using pandas
- Calculate the mean, median, and standard deviation

---
## Exercises

### Exercise 1
Generate 500 random numbers from a Normal distribution with mean=100 and std=15. Plot a histogram.

In [None]:
# YOUR CODE HERE


### Exercise 2
Using the `experiment` DataFrame, find the mean score for the Control group only.

In [None]:
# YOUR CODE HERE


### Exercise 3
Create your own small DataFrame with 5 people: name, height (in cm), and city. Then calculate the mean height.

In [None]:
# YOUR CODE HERE


---
## Resources

- [Seaborn Tutorial](https://seaborn.pydata.org/tutorial.html)
- [Pandas Getting Started](https://pandas.pydata.org/docs/getting_started/index.html)
- [NumPy for Beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)

---

**Next class (23/1/26):** Fitting models to data, correlation

---
*NEURO-105 - MSc in Neurosciences, University of Crete*