
# Common Mathematical & Statistical Routines in Python



## What is an API reference?

An API reference in data science is a structured technical document that describes how to use a library, framework, or service programmatically. We've already seen one of these: the [NumPy reference](https://numpy.org/doc/stable/reference/).

It usually includes:

* Available functions, classes, and methods (e.g., `pandas.DataFrame.groupby`)

* Parameters/arguments (with their types, default values, and constraints)

* Return types and structures (e.g., NumPy arrays, Pandas Series)

* Usage examples (code snippets showing typical workflows)

* Error conditions and exceptions

Taken together, the items documented in the reference are the API (Application Programming Interface) itself.
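Much of what a reference documents can also be inspected from inside Python. The sketch below uses the standard-library `inspect` module on `statistics.quantiles` (picked only as an illustration) to read its parameters and defaults, then triggers one documented error condition.

```python
import inspect
import statistics

# Parameters and default values, as a reference would list them
sig = inspect.signature(statistics.quantiles)
print(sig)
for name, param in sig.parameters.items():
    print(name, param.kind.name, param.default)

# First line of the docstring: the short description
summary = inspect.getdoc(statistics.quantiles).splitlines()[0]
print(summary)

# Error conditions: mean() of empty data raises StatisticsError
error_message = None
try:
    statistics.mean([])
except statistics.StatisticsError as exc:
    error_message = str(exc)
    print("StatisticsError:", error_message)
```

In a notebook, `help(statistics.quantiles)` prints the same information in one call.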

Python’s [**Standard Library**](https://docs.python.org/3/library/index.html) is huge: it’s basically a toolbox that ships with Python.
Here are some of the most commonly used (and handy) modules, grouped by category:

## Math & Numbers
- [**math**](https://docs.python.org/3/library/math.html) → basic math functions (`sqrt`, `log`, `sin`, etc.)
- [**statistics**](https://docs.python.org/3/library/statistics.html) → `mean`, `median`, `stdev`, etc.
- **random** → pseudorandom numbers
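A minimal sketch of the three modules above, using only the standard library. The sample values are made up for illustration.

```python
import math
import random
import statistics

# math: scalar functions
root = math.sqrt(16)
log_e = math.log(math.e)            # natural log by default
sin_half_pi = math.sin(math.pi / 2)

# random: seeding makes the pseudorandom sequence reproducible
random.seed(42)
first_draw = [random.randint(1, 6) for _ in range(5)]
random.seed(42)
second_draw = [random.randint(1, 6) for _ in range(5)]

# statistics: summary of a small (made-up) sample
sample = [2, 4, 4, 5, 7]
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)

print(root, log_e, sin_half_pi)
print(first_draw == second_draw)
print(sample_mean, sample_median)
```

Reseeding with the same value replays the same draws, which is why the comparison prints `True`.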

## Files & OS
- **os** → operating system interfaces (paths, environment variables)
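As a quick illustration of `os`, the sketch below builds a path and reads an environment variable. The folder, file, and variable names are hypothetical.

```python
import os

# Build a path using the separator of the current operating system
# ("data" and "scores.csv" are hypothetical names)
csv_path = os.path.join("data", "scores.csv")
file_name = os.path.basename(csv_path)      # file name only
extension = os.path.splitext(csv_path)[1]   # extension, including the dot

# Read an environment variable, falling back to the working directory
# (DSCI_DATA_DIR is a hypothetical variable name)
data_dir = os.environ.get("DSCI_DATA_DIR", os.getcwd())

print(csv_path)
print(file_name, extension)
print(data_dir)
```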

## Dates & Time
- **datetime** → date/time objects and arithmetic
- **time** → timestamps, sleep, performance timing
- **calendar** → working with months, weekdays, leap years
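The sketch below touches each of the three modules once: date arithmetic, a leap-year check, and a simple timing measurement. The dates are arbitrary examples.

```python
import calendar
import time
from datetime import date, timedelta

# datetime: date arithmetic
start = date(2024, 2, 1)
end = date(2024, 3, 1)
days_between = (end - start).days        # February 2024 has 29 days
one_week_later = start + timedelta(weeks=1)

# calendar: leap years and weekdays (Monday is 0)
is_leap = calendar.isleap(2024)
weekday_index = calendar.weekday(2024, 1, 1)

# time: measure how long a piece of code takes
t0 = time.perf_counter()
total = sum(range(1_000_000))
elapsed = time.perf_counter() - t0

print(days_between, one_week_later, is_leap, calendar.day_name[weekday_index])
print(f"sum took {elapsed:.4f} s")
```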

## API Reference Categories for Math and Statistics
A Python API reference is like the official “instruction manual” for a library, module, or framework.

For math and statistics, we have:

- **Descriptive statistics** (mean, median, variance, etc.)
- **Mathematical transforms** (log, exp, trig)
- **Linear algebra** (dot products, matrix ops)
- **Probability distributions** (PDFs, CDFs, random sampling)
- **Hypothesis testing** (t-tests, chi-square, etc.)
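To make the categories concrete, here is one call per category using NumPy and SciPy. This is only a sketch: the two groups are made-up samples, and which function is "the" representative of each category is an arbitrary choice.

```python
import numpy as np
from scipy import stats

# Two hypothetical groups of exam scores
group_a = np.array([88, 92, 79, 93, 85])
group_b = np.array([90, 76, 95, 89, 84])

# Descriptive statistics
mean_a = np.mean(group_a)
var_a = np.var(group_a, ddof=1)   # sample variance

# Mathematical transforms (applied element-wise)
log_a = np.log(group_a)

# Linear algebra: dot product of two vectors
dot_ab = np.dot(group_a, group_b)

# Probability distributions: standard normal PDF and CDF at 0
pdf_at_zero = stats.norm.pdf(0)
cdf_at_zero = stats.norm.cdf(0)

# Hypothesis testing: two-sample t-test
result = stats.ttest_ind(group_a, group_b)

print(mean_a, var_a, dot_ab)
print(pdf_at_zero, cdf_at_zero)
print(result.statistic, result.pvalue)
```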

## Mathematical Functions
- **NumPy:** Vectorized math → `np.log`, `np.exp`, `np.sqrt`, `np.sin`, etc.
- **math:** Scalar math, same names → `math.log`, `math.exp`
- **Tip:** Use NumPy for arrays; `math` for single values.
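The tip above can be seen directly: `math.sqrt` takes one number, while `np.sqrt` applies to every element of an array. A minimal sketch with made-up values:

```python
import math
import numpy as np

# math works on one number at a time
single = math.sqrt(9)

# NumPy applies the same operation to every element of an array
values = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(values)

# Passing a list to math.sqrt raises a TypeError
try:
    math.sqrt([1.0, 4.0, 9.0])
    math_accepts_list = True
except TypeError:
    math_accepts_list = False

print(single)
print(roots)
print(math_accepts_list)
```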

## Descriptive Stats API Map

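| Task    | NumPy         | Pandas            | `statistics`          | SciPy                |
| ------- | ------------- | ----------------- | --------------------- | -------------------- |
| Mean    | `np.mean()`   | `Series.mean()`   | `statistics.mean()`   | —                    |
| Std Dev | `np.std()`    | `Series.std()`    | `statistics.stdev()`  | —                    |
| Median  | `np.median()` | `Series.median()` | `statistics.median()` | —                    |
| Mode    | —             | `Series.mode()`   | `statistics.mode()`   | `scipy.stats.mode()` |

The Pandas entries are methods on a `Series` or `DataFrame`, not functions on the `pd` module. Note that `np.std()` defaults to the population standard deviation (`ddof=0`), while `Series.std()` and `statistics.stdev()` return the sample standard deviation (`ddof=1`).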
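The overlap in the table is easiest to see by running the same data through each library. The sketch below uses a small list of exam scores; the point to watch is that the standard-deviation defaults differ between libraries.

```python
import statistics
import numpy as np
import pandas as pd

scores = [88, 92, 79, 93, 85, 90, 76, 95, 89, 84]
series = pd.Series(scores)

# Mean and median agree across libraries
print(np.mean(scores), series.mean(), statistics.mean(scores))
print(np.median(scores), series.median(), statistics.median(scores))

# Standard deviation: the defaults differ
np_population = np.std(scores)              # ddof=0 by default
np_sample = np.std(scores, ddof=1)
pd_sample = series.std()                    # ddof=1 by default
py_sample = statistics.stdev(scores)        # sample
py_population = statistics.pstdev(scores)   # population

print(np_population, py_population)
print(np_sample, pd_sample, py_sample)

# pandas mode() returns a Series because there may be ties
modes = series.mode().tolist()
print(modes)
```

Passing `ddof=1` to `np.std` is what brings NumPy in line with the other two.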


In [None]:
import math
import statistics
import numpy as np
import pandas as pd
from scipy import stats, optimize, integrate, linalg

# SciPy (Scientific Python) is a large library, so we typically import only the submodules we need
# We will build on SciPy later when we learn about Scikit-Learn

# Python → NumPy → SciPy → Scikit-Learn
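For a sense of what the imported SciPy submodules do, here is one small call to each of `integrate`, `optimize`, and `linalg`. The functions and the linear system are toy examples chosen because their answers are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# integrate: area under x**2 from 0 to 1 (exact value is 1/3)
area, abs_error = integrate.quad(lambda x: x**2, 0, 1)

# optimize: minimum of (x - 2)**2, which is at x = 2
minimum = optimize.minimize_scalar(lambda x: (x - 2) ** 2)

# linalg: solve the system 2x + y = 5, x + 3y = 10 (solution x = 1, y = 3)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = linalg.solve(A, b)

print(area)
print(minimum.x)
print(solution)
```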



## Use Cases for the Built-in `statistics` Module

In [None]:
# Example dataset: exam scores
scores = [88, 92, 79, 93, 85, 90, 76, 95, 89, 84]

# Central tendency
# Note: the variable names (mean_val, ...) are distinct from the function names (mean, ...)
mean_val = statistics.mean(scores)
median_val = statistics.median(scores)
mode_val = statistics.mode(scores)

# Spread
stdev_val = statistics.stdev(scores)
variance_val = statistics.variance(scores)

# Additional
harmonic_mean_val = statistics.harmonic_mean(scores)
geometric_mean_val = statistics.geometric_mean(scores)

print(f"Mean: {mean_val}")
print(f"Median: {median_val}")
print(f"Mode: {mode_val}")
print(f"Standard Deviation: {stdev_val}")
print(f"Variance: {variance_val}")
print(f"Harmonic Mean: {harmonic_mean_val}")
print(f"Geometric Mean: {geometric_mean_val}")

Mean: 87.1
Median: 88.5
Mode: 88
Standard Deviation: 6.118278625016463
Variance: 37.43333333333333
Harmonic Mean: 86.69653610492091
Geometric Mean: 86.90101449144211
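One detail in this output deserves a second look: no score repeats, yet `mode` still returns 88. The sketch below shows why, and also contrasts the sample and population versions of the spread functions. It assumes Python 3.8 or later, where `mode` returns the first of several tied values and `multimode` is available.

```python
import statistics

scores = [88, 92, 79, 93, 85, 90, 76, 95, 89, 84]

# Every score occurs once, so all values tie for most common.
# mode() returns the first one encountered.
first_mode = statistics.mode(scores)

# multimode() returns every value that ties for most common
all_modes = statistics.multimode(scores)

# With a repeated value the mode is unambiguous
with_repeat = scores + [92]
repeat_mode = statistics.mode(with_repeat)

# variance/stdev are sample statistics (divide by n - 1);
# pvariance/pstdev are the population versions (divide by n)
sample_var = statistics.variance(scores)
population_var = statistics.pvariance(scores)

print(first_mode, len(all_modes), repeat_mode)
print(sample_var, population_var)
```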


## Key Takeaways
* Know the overlapping APIs so you can choose the right tool for the job
* `statistics` is quick for lightweight scalar ops
* SciPy fills gaps in NumPy/Pandas
* We will build on SciPy later with Scikit-Learn

* Your toolkit is only as good as your familiarity with its API

## When to Use Which
- **`statistics` and `math` (stdlib):** Lightweight scalar operations
- **NumPy:** High-performance math/statistics on arrays
- **Pandas:** Descriptive stats directly on Series/DataFrames
- **SciPy:** Advanced statistics, probability models, hypothesis testing
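As an illustration of the Pandas entry, the sketch below computes descriptive statistics directly on a DataFrame column and per group. The `section` column is a made-up label added only so there is something to group by.

```python
import pandas as pd

# Hypothetical gradebook: ten exam scores plus a made-up section label
df = pd.DataFrame({
    "section": ["A", "B"] * 5,
    "score": [88, 92, 79, 93, 85, 90, 76, 95, 89, 84],
})

# One call summarizes count, mean, std, min, quartiles, and max
summary = df["score"].describe()
print(summary)

# Descriptive stats per group
section_means = df.groupby("section")["score"].mean()
print(section_means)
```

Doing the same with the `statistics` module would mean splitting the data by hand first, which is the kind of trade-off the list above describes.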
