# Python Statistics Fundamentals: How to Describe Your Data

In this tutorial, you’ll learn:

- What numerical quantities you can use to describe and summarize your datasets
- How to calculate descriptive statistics in pure Python
- How to get descriptive statistics with available Python libraries
- How to visualize your datasets

## Understanding Descriptive Statistics


Descriptive statistics is about describing and summarizing data. It uses two main approaches:

1. **The quantitative approach** describes and summarizes data numerically.
2. **The visual approach** illustrates data with charts, plots, histograms, and other graphs.

When you describe and summarize a single variable, you’re performing **univariate analysis**.

When you search for statistical relationships between a pair of variables, you’re performing **bivariate analysis**.

Similarly, **multivariate analysis** is concerned with multiple variables at once.

### Types of Measures


In this tutorial, you’ll learn about the following types of measures in descriptive statistics:

- **Central tendency** tells you about the centers of the data. Useful measures include the **mean, median, and mode**.
- **Variability** tells you about the spread of the data. Useful measures include **variance and standard deviation**.
- **Correlation or joint variability** tells you about the relation between a pair of variables in a dataset. Useful measures include **covariance and the correlation coefficient**.


### Population and Samples


- A **population** is the entire group of items or elements of interest. Analyzing the whole population is often impractical.
- A **sample** is a smaller, representative **subset** of the population that you analyze instead. The sample should ideally reflect the key statistical characteristics of the population, so that you can draw reliable conclusions about the larger group.
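Whether your data is a population or a sample changes how some measures are computed. As a minimal sketch (the data values are hypothetical), the `statistics` module provides separate functions for the two cases: `pvariance()` divides the sum of squared deviations by n, while `variance()` divides by n - 1 to estimate the population variance from a sample.

```python
import statistics

# Hypothetical data used purely for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat data as the entire population: divide by n
pop_var = statistics.pvariance(data)

# Treat data as a sample from a larger population: divide by n - 1
# (Bessel's correction), which gives a slightly larger value
sample_var = statistics.variance(data)

print(f"population variance: {pop_var}")
print(f"sample variance:     {sample_var:.3f}")
```

The same distinction applies to standard deviation through `statistics.pstdev()` and `statistics.stdev()`.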

### Outliers

An **outlier** is a data point that differs significantly from most of the data in a sample or population. Outliers can arise from various causes, including:

- **Natural variation in data**
- **Change in the behavior of the observed system**
- **Errors in data collection**
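Outliers matter because some measures are far more sensitive to them than others. The sketch below, built on hypothetical values where `28.0` plays the role of the outlier, compares how much the mean and the median move when that single point is dropped.

```python
import statistics

# Hypothetical measurements; the last value (28.0) is treated as the outlier
values = [8.0, 1, 2.5, 4, 28.0]
values_without_outlier = values[:-1]

mean_with = statistics.mean(values)
mean_without = statistics.mean(values_without_outlier)

median_with = statistics.median(values)
median_without = statistics.median(values_without_outlier)

# How far each measure shifts when the outlier is removed
mean_shift = abs(mean_with - mean_without)
median_shift = abs(median_with - median_without)

print(f"mean:   {mean_with} -> {mean_without} (shift {mean_shift:.3f})")
print(f"median: {median_with} -> {median_without} (shift {median_shift:.3f})")
```

The mean is pulled strongly toward the outlier, while the median barely changes, which is one reason the median is often preferred for data that may contain outliers.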



## Choosing Python Statistics Libraries


Here are some of the most popular Python libraries for statistical analysis and related tasks:

- **statistics**: A built-in Python library for descriptive statistics. Best suited for small datasets and when external libraries are unavailable.

- **NumPy**: A third-party library optimized for numerical computing and working with arrays (`ndarray`). It includes many routines for statistical analysis.

- **SciPy**: Built on NumPy, this library provides additional functionality for scientific computing, including `scipy.stats` for advanced statistical analysis.

- **pandas**: A third-party library designed for handling labeled data. It integrates seamlessly with NumPy and SciPy and uses:
  - `Series` for one-dimensional data
  - `DataFrame` for two-dimensional data

- **Matplotlib**: A third-party library for data visualization that works well with NumPy, SciPy, and pandas.
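To compare how these libraries feel in practice, here is a minimal sketch that computes the same mean three ways. It assumes NumPy and pandas are installed; SciPy and Matplotlib are left out to keep the example small, and the data values are hypothetical.

```python
import statistics

import numpy as np
import pandas as pd

# Hypothetical example data
x = [8.0, 1, 2.5, 4, 28.0]

# Built-in statistics module: works directly on a Python list
mean_stdlib = statistics.mean(x)

# NumPy: convert to an ndarray, then use np.mean()
y = np.array(x)
mean_numpy = np.mean(y)

# pandas: wrap the data in a Series, then call its .mean() method
z = pd.Series(x)
mean_pandas = z.mean()

print(mean_stdlib, mean_numpy, mean_pandas)
```

All three give the same result here; the differences show up in speed on large datasets and in extra features such as labels (pandas) or vectorized array operations (NumPy).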

### Integration Between Libraries
- `Series` and `DataFrame` objects from pandas can often replace NumPy arrays in statistical functions.
- You can extract raw data from pandas objects as NumPy arrays using `.values` or `.to_numpy()`.
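A short sketch of both points, assuming NumPy and pandas are installed and using hypothetical data: a `Series` is passed straight to a NumPy function, and its underlying data is then pulled out as an `ndarray`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data stored in a pandas Series
s = pd.Series([8.0, 1, 2.5, 4, 28.0])

# NumPy functions generally accept a Series in place of an ndarray
median_from_series = np.median(s)

# Extract the raw data as a NumPy array
arr_values = s.values        # older attribute, still available
arr_to_numpy = s.to_numpy()  # method recommended by the pandas documentation

print(median_from_series)
print(type(arr_to_numpy), arr_to_numpy)
```

Prefer `.to_numpy()` in new code, since its behavior is more explicit than `.values` for some pandas data types.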

## Getting Started With Python Statistics Libraries

https://realpython.com/python-statistics/


## Calculating Descriptive Statistics


```python
import math
import statistics
import numpy as np
import scipy.stats
import pandas as pd
```
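With the libraries imported, you need some data to work with. The following is a minimal sketch with hypothetical values, including a copy that contains a missing value (`math.nan`), to show that the libraries treat `nan` differently. It only needs `math`, `statistics`, NumPy, and pandas, so it does not import SciPy.

```python
import math
import statistics

import numpy as np
import pandas as pd

# Hypothetical sample data, plus a copy with one missing value
x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]

# The same data as NumPy arrays and pandas Series
y, y_with_nan = np.array(x), np.array(x_with_nan)
z, z_with_nan = pd.Series(x), pd.Series(x_with_nan)

# nan propagates through statistics.mean() and np.mean()
mean_list_nan = statistics.mean(x_with_nan)
mean_array_nan = np.mean(y_with_nan)

# np.nanmean() ignores nan, and pandas skips nan by default (skipna=True)
mean_array_skip = np.nanmean(y_with_nan)
mean_series_skip = z_with_nan.mean()

print(mean_list_nan, mean_array_nan, mean_array_skip, mean_series_skip)
```

Checking how each library handles missing values before computing any statistic helps avoid silently getting `nan` (or silently ignoring it) in your results.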
