## Descriptive Statistics

Descriptive statistics are used to summarize and describe the main features of a dataset. Here, we'll discuss several key measures:

### 1. Mean

The arithmetic average: the sum of all values divided by the number of values.
We use np.mean(data) to calculate the average value of the dataset.

In [None]:
import numpy as np
# `data` is the 1-D array generated in the Example section below
mean = np.mean(data)
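To check the definition, the sketch below computes the mean by hand for a small made-up array and compares it with `np.mean`. The array `values` is illustrative and is not part of the original dataset.

```python
import numpy as np

# Small illustrative sample (hypothetical values, not from the Example section)
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Mean by definition: sum of the values divided by how many there are (40 / 8)
manual_mean = values.sum() / values.size

numpy_mean = np.mean(values)

print(manual_mean, numpy_mean)  # 5.0 5.0
```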

### 2. Median

The middle value of a set of numbers when they are arranged in ascending or descending order. If the count of numbers is even, the median is the average of the two middle numbers.
We use np.median(data) to find the middle value of the dataset.

In [None]:
median = np.median(data)
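The odd/even rule above can be seen on two tiny hypothetical lists. `np.median` sorts the values internally, so the input order does not matter:

```python
import numpy as np

# Odd count: sorted as [1, 3, 8], the single middle value is 3
odd_median = np.median([8, 1, 3])

# Even count: sorted as [1, 3, 6, 8], the middle pair is 3 and 6,
# so the median is their average, (3 + 6) / 2 = 4.5
even_median = np.median([6, 1, 8, 3])

print(odd_median, even_median)  # 3.0 4.5
```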

### 3. Mode

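The most frequently occurring value in a set of numbers.
We use stats.mode(data) from the SciPy library to determine the most frequently occurring value. If several values tie for the highest count, SciPy returns only the smallest of them. For continuous data, such as the normally distributed sample in the Example below, almost every value is unique, so the reported mode is just the smallest value with a count of 1 and carries little information.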

In [None]:
from scipy import stats
mode = stats.mode(data, keepdims=False)  # scalar mode and count (keepdims requires SciPy >= 1.9)
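`stats.mode` reports a single value even when several values share the highest count. A minimal sketch, relying only on NumPy's `np.unique(..., return_counts=True)`, that returns every tied value instead; `all_modes` is a hypothetical helper name, not a NumPy or SciPy function:

```python
import numpy as np

def all_modes(values):
    """Return every value that occurs with the highest frequency (hypothetical helper)."""
    uniq, counts = np.unique(values, return_counts=True)  # sorted unique values and their counts
    return uniq[counts == counts.max()]

# Discrete data with a tie: 2 and 3 both appear twice
print(all_modes([1, 2, 2, 3, 3, 4]))  # [2 3]

# Continuous data: every draw is distinct, so every value "ties" with a count of 1
rng = np.random.default_rng(0)
continuous = rng.normal(loc=50, scale=15, size=5)
print(all_modes(continuous).size)  # 5
```

For the first list, SciPy's `stats.mode` would report only 2, the smallest of the tied values. The second call shows why the mode is uninformative for continuous samples: every value is its own mode.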

### 4. Variance

A measure of how spread out the numbers are: the average of the squared differences from the mean.
We use np.var(data) to compute it. By default NumPy divides by the number of values n (the population variance); passing ddof=1 divides by n - 1 instead and gives the sample variance.

In [None]:
variance = np.var(data)
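The difference between dividing by n and by n - 1 is easiest to see on a small illustrative array (hypothetical values with mean 5). The sketch computes both versions by hand and compares them with `np.var`:

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # illustrative sample, mean = 5

# Squared differences from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32)
squared_diffs = (values - values.mean()) ** 2

population_var = squared_diffs.sum() / values.size   # 32 / 8 = 4.0
sample_var = squared_diffs.sum() / (values.size - 1)  # 32 / 7, about 4.571

print(population_var, np.var(values))      # matches np.var (default ddof=0)
print(sample_var, np.var(values, ddof=1))  # matches np.var with ddof=1
```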

### 5. Standard Deviation

The square root of the variance, indicating how much the numbers typically deviate from the mean. Unlike the variance, it is expressed in the same units as the data. We use np.std(data) to calculate it; like np.var, it divides by n by default (ddof=0).

In [None]:
std_dev = np.std(data)
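Using a small illustrative array whose population variance is 4.0, the sketch below confirms that `np.std` is the square root of `np.var` and shows the `ddof=1` (sample) variant:

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # illustrative sample; population variance = 4.0

# Standard deviation is the square root of the variance
std_from_var = np.sqrt(np.var(values))  # sqrt(4.0) = 2.0
std_direct = np.std(values)             # population standard deviation (ddof=0)

# Sample standard deviation divides by n - 1 inside the square root: sqrt(32 / 7)
sample_std = np.std(values, ddof=1)

print(std_from_var, std_direct, sample_std)
```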

### 6. Percentiles 

Values below which a given percentage of observations fall. For example, the 25th percentile is the value below which 25% of the observations lie. We use np.percentile(data, 25) and np.percentile(data, 75) to find the 25th and 75th percentiles, respectively; when a percentile falls between two observations, NumPy interpolates linearly between them by default. The interquartile range (IQR) is the difference between these two percentiles and measures the spread of the middle 50% of the data.

In [None]:
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
iqr = percentile_75 - percentile_25
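A small worked case makes the interpolation visible. For the nine illustrative values 1 to 9, NumPy's default linear method places the p-th percentile at position p/100 * (n - 1) in the sorted data, so the 25th and 75th percentiles land exactly on positions 2 and 6. The sketch also passes both percentiles in one call:

```python
import numpy as np

values = np.arange(1, 10)  # 1, 2, ..., 9 (illustrative)

# Both percentiles in one call; with the default linear interpolation the
# p-th percentile sits at sorted position p/100 * (n - 1): here 2 and 6
q25, q75 = np.percentile(values, [25, 75])
iqr = q75 - q25  # spread of the middle 50% of the data

print(q25, q75, iqr)  # 3.0 7.0 4.0
```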

## Example

In [3]:
import numpy as np
from scipy import stats

# Generate some random data
np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)  # Normally distributed data

# Mean: The average of a set of numbers
mean = np.mean(data)
print(f'Mean: {mean}')
print('\n')

# Median: The middle value of a set of numbers
median = np.median(data)
print(f'Median: {median}')
print('\n')

# Mode: The most frequently occurring value in a set
# keepdims=False (SciPy >= 1.9) returns scalars and avoids the keepdims FutureWarning
mode = stats.mode(data, keepdims=False)
print(f'Mode: {mode.mode} with count {mode.count}')
print('\n')

# Variance: A measure of the spread of the numbers
variance = np.var(data)
print(f'Variance: {variance}')
print('\n')

# Standard Deviation: The square root of the variance
std_dev = np.std(data)
print(f'Standard Deviation: {std_dev}')
print('\n')

# Percentiles: Values below which a certain percent of observations fall
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
print(f'25th Percentile: {percentile_25}')
print(f'75th Percentile: {percentile_75}')
print('\n')

# Example of using percentiles to determine interquartile range (IQR)
iqr = percentile_75 - percentile_25
print(f'Interquartile Range (IQR): {iqr}')

Mean: 50.28998083733488


Median: 50.379509183523325


Mode: 1.380989898963911 with count 1


Variance: 215.52862268959137


Standard Deviation: 14.680893116210314


25th Percentile: 40.28614541806473
75th Percentile: 59.71915813209395


Interquartile Range (IQR): 19.433012714029225
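The Example cell computes each statistic separately. One way to collect them is a small helper that returns a dictionary. `summarize` and its keys are hypothetical names chosen for this sketch, the mode is left out because it carries little information for continuous data, and variance and standard deviation use NumPy's default `ddof=0` as in the cell above:

```python
import numpy as np

def summarize(values):
    """Collect this section's descriptive statistics in one dict (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    q25, q75 = np.percentile(values, [25, 75])
    return {
        "mean": np.mean(values),
        "median": np.median(values),
        "variance": np.var(values),  # population variance (ddof=0)
        "std_dev": np.std(values),   # population standard deviation (ddof=0)
        "percentile_25": q25,
        "percentile_75": q75,
        "iqr": q75 - q25,
    }

# Same sample as the Example cell
np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)
for name, value in summarize(data).items():
    print(f"{name}: {value:.4f}")
```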


