# Technical Introduction to Statistics 

This notebook covers fundamental statistical operations and shows how to apply them in a *'data science ready'* format.

In [3]:
# Import pandas (it must already be installed, e.g. with `pip install pandas`)
import pandas as pd

Sample data for testing purposes:

In [4]:
# Creating a simple dataset using a Python dictionary
data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 78, 92, 88]
}

# Convert the dictionary into a DataFrame (table)
df = pd.DataFrame(data)

# Display the table
print(df)

   Student  Score
0    Alice     85
1      Bob     90
2  Charlie     78
3    David     92
4      Eva     88


### Descriptive statistics

Descriptive statistics is the branch of statistics that summarizes the main features of a dataset, giving a quick picture of what the data looks like.

- **Mean** \
The arithmetic average: the sum of all scores divided by the number of scores.

In [5]:
# Mean

mean_score = df['Score'].mean()
print("Mean Score:", mean_score)

Mean Score: 86.6


- **Median** \
The middle value of the sorted scores (with an even number of scores, the average of the two middle values). It is less sensitive to outliers than the mean.

In [7]:
# Median

median_score = df['Score'].median()
print("Median Score:", median_score)

Median Score: 88.0
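
To illustrate why the median is useful when outliers are present, here is a minimal sketch that adds one hypothetical low score (5, not part of the original dataset) and compares how the mean and the median react. The variable names are illustrative only.

```python
import pandas as pd

# Original scores from the sample dataset
scores = pd.Series([85, 90, 78, 92, 88])

# Same scores plus one hypothetical outlier (assumed for illustration)
scores_with_outlier = pd.Series([85, 90, 78, 92, 88, 5])

# The mean is pulled strongly toward the outlier...
print("Mean without outlier:", scores.mean())
print("Mean with outlier:   ", scores_with_outlier.mean())

# ...while the median moves only slightly
print("Median without outlier:", scores.median())
print("Median with outlier:   ", scores_with_outlier.median())
```

In this sketch the mean drops by more than 13 points, while the median shifts by only 1.5.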


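- **Mode** \
The most frequently occurring value(s). Every score in this dataset appears exactly once, so `mode()` returns all five values.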

In [8]:
# Mode

mode_score = df['Score'].mode()
print("Mode Score:", mode_score.tolist())

Mode Score: [78, 85, 88, 90, 92]
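
Because no score repeats in the sample data, the result above is not very informative. As a small sketch, assuming a hypothetical set of scores in which 90 appears twice, `mode()` returns only the repeated value:

```python
import pandas as pd

# Hypothetical scores (not the original dataset): 90 occurs twice
repeated_scores = pd.Series([85, 90, 78, 90, 88])

# mode() returns a Series, because several values can tie for most frequent
mode_values = repeated_scores.mode()
print("Mode:", mode_values.tolist())
```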


- **Standard Deviation** \
Measures how spread out the values are around the mean.

  - A low standard deviation means most values are close to the mean → consistent, stable data.
  - A high standard deviation means values are spread out over a wider range → more variability, less predictable.

In [9]:
# Standard Deviation (pandas computes the sample version, ddof=1, by default)

std_score = df['Score'].std()
print("Standard Deviation:", std_score)

Standard Deviation: 5.458937625582473
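
The value above is the sample standard deviation, because pandas' `std()` divides by n - 1 (`ddof=1`) by default. The sketch below compares it with the population version (`ddof=0`) and recomputes the sample formula by hand as a check. The variable names are illustrative.

```python
import pandas as pd

scores = pd.Series([85, 90, 78, 92, 88])

# pandas default: sample standard deviation (divides by n - 1)
sample_std = scores.std()

# Population standard deviation (divides by n)
population_std = scores.std(ddof=0)

# Manual check of the sample formula: sqrt(sum((x - mean)^2) / (n - 1))
mean = scores.mean()
squared_deviations = (scores - mean) ** 2
manual_sample_std = (squared_deviations.sum() / (len(scores) - 1)) ** 0.5

print("Sample std (ddof=1):    ", sample_std)
print("Population std (ddof=0):", population_std)
print("Manual sample std:      ", manual_sample_std)
```

Note that NumPy's `np.std` defaults to `ddof=0`, so the two libraries give different results on the same data unless `ddof` is set explicitly.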


- **Min, Max, and Range** \
The smallest value, the largest value, and the difference between them.

In [10]:
# Minimum, Maximum and Range

min_score = df['Score'].min()
max_score = df['Score'].max()
range_score = max_score - min_score

print("Min:", min_score, "| Max:", max_score, "| Range:", range_score)

Min: 78 | Max: 92 | Range: 14
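
Most of the statistics computed above can also be obtained in a single call. As a sketch, assuming the same sample data, `describe()` returns the count, mean, sample standard deviation, minimum, quartiles (the 50% row is the median), and maximum:

```python
import pandas as pd

# Rebuild the sample dataset so this cell runs on its own
df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 78, 92, 88]
})

# Summary statistics for the Score column
summary = df['Score'].describe()
print(summary)

# Individual entries are accessed by label
print("Median (50% quantile):", summary['50%'])
```

The mode and the range are not part of `describe()` for numeric columns and still need to be computed separately.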
