# Skewness & Kurtosis

<img src="Skewness%20and%20Kurtosis.png" alt="Alt text" style="width:600px;height:400px;">

Skewness and kurtosis are two statistical measures that describe the shape of a data distribution. SciPy is one of the commonly used libraries for computing them.

**stats.skew(data)** Skewness measures the asymmetry of the probability distribution of a real-valued random variable about its mean. The value can be positive, negative, or undefined. For a perfect normal distribution, or any other symmetric distribution, the skewness is zero.
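To make the sign convention concrete, the short sketch below applies `stats.skew` to a small sample with a long right tail and to its mirror image. The sample values are made up for illustration and are not data from this notebook.

```python
from scipy import stats

# Hypothetical sample with a long right tail (one large value).
right_tailed = [1, 1, 2, 2, 3, 10]

# Mirror the sample around 5.5 so the long tail points left instead.
left_tailed = [11 - x for x in right_tailed]

right_skew = stats.skew(right_tailed)
left_skew = stats.skew(left_tailed)

print(f"Right-tailed sample skewness: {right_skew:.3f}")  # positive
print(f"Left-tailed sample skewness:  {left_skew:.3f}")   # negative, same magnitude
```

Mirroring the data flips the sign of the skewness but leaves its magnitude unchanged.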

**stats.kurtosis(data)** Kurtosis measures the tailedness of a probability distribution. The standard reference point is the normal distribution, which has a kurtosis of three.

SciPy computes the excess kurtosis by default, so a perfect normal distribution has a value of zero: excess kurtosis = kurtosis - 3.
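As a quick illustration of the two conventions, the sketch below calls `scipy.stats.kurtosis` with `fisher=True` (excess kurtosis, the default) and `fisher=False` (plain kurtosis) on the nine-value list used later in this notebook. The variable names are chosen here for illustration only.

```python
from scipy.stats import kurtosis

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

excess = kurtosis(data, fisher=True)   # Fisher definition: normal distribution -> 0
plain = kurtosis(data, fisher=False)   # Pearson definition: normal distribution -> 3

print(f"Excess kurtosis: {excess:.2f}")         # -1.23
print(f"Plain kurtosis:  {plain:.2f}")          # 1.77
print(f"Difference:      {plain - excess:.2f}")  # 3.00
```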

In [5]:
import scipy.stats as stats
from scipy.stats import kurtosis
import pandas as pd

In [3]:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

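# Calculate skewness (SciPy defaults to the biased, population form)
skewness = stats.skew(data)
print(f'Skewness: {skewness}')

# Calculate excess kurtosis (fisher=True subtracts 3, so a normal distribution gives 0)
kurt_value = kurtosis(data, fisher=True)
print("Kurtosis value:", kurt_value)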

Skewness: 0.0
Kurtosis value: -1.2300000000000002


## Pandas

In [8]:
data = {
    'values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
}
df = pd.DataFrame(data)

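# Calculate skewness (pandas applies a sample-size bias correction)
skewness = df['values'].skew()
print(f'Skewness: {skewness}')

# Calculate excess kurtosis; the variable name avoids shadowing scipy's kurtosis function
kurt_value = df['values'].kurtosis()
print(f'Kurtosis: {kurt_value}')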

Skewness: 0.0
Kurtosis: -1.1999999999999997
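The two libraries use different default estimators: `scipy.stats.skew` and `scipy.stats.kurtosis` default to `bias=True` (moments divided by n), while the pandas methods apply a sample-size bias correction. The sketch below reuses the 15 values from the cell above and suggests that passing `bias=False` to SciPy should reproduce the pandas figure up to floating-point rounding.

```python
import pandas as pd
from scipy import stats

values = list(range(1, 16))  # same 15 values as the DataFrame above
series = pd.Series(values)

pandas_kurt = series.kurtosis()                       # bias-corrected excess kurtosis
scipy_biased = stats.kurtosis(values)                 # default bias=True
scipy_corrected = stats.kurtosis(values, bias=False)  # bias-corrected

print(f"pandas:             {pandas_kurt:.4f}")
print(f"scipy (bias=True):  {scipy_biased:.4f}")
print(f"scipy (bias=False): {scipy_corrected:.4f}")
```

When comparing results across the two libraries, it helps to settle on one estimator first, since the difference grows as the sample gets smaller.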


## Manual Check

In [7]:
def calculate_skewness_kurtosis(data):
    # Calculate the mean of the data
    mean = sum(data) / len(data)

    # Calculate the differences between data points and the mean
    diffs = [x - mean for x in data]

    # Raise the differences to the second, third, and fourth powers
    squared_diffs = [d ** 2 for d in diffs]
    cubed_diffs = [d ** 3 for d in diffs]
    fourth_diffs = [d ** 4 for d in diffs]

    # Calculate the variance
    variance = sum(squared_diffs) / len(data)

    # Calculate the standard deviation
    std_dev = variance ** 0.5

    # Calculate the skewness
    skewness = sum(cubed_diffs) / len(data) / (std_dev ** 3)

    # Calculate the kurtosis
    kurtosis = sum(fourth_diffs) / len(data) / (std_dev ** 4)

    return skewness, kurtosis - 3  # Subtracting 3 gives the excess kurtosis, consistent with scipy's definition

# Usage: same data as the SciPy example, so the results should match
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewness, excess_kurtosis = calculate_skewness_kurtosis(data)

print(f'Skewness: {skewness}')
print(f'Kurtosis: {excess_kurtosis}')

Skewness: 0.0
Kurtosis: -1.2299999999999998


## Skewness Calculation

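$$
\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^3
$$

where:
- $n$ is the number of observations,
- $X_i$ is each individual observation,
- $\bar{X}$ is the mean of the observations,
- $s$ is the sample standard deviation of the observations (computed with $n - 1$ in the denominator).

This is the bias-adjusted sample skewness, the form that pandas reports. The manual check above and SciPy's default instead use the population form, $\frac{1}{n}\sum (X_i - \bar{X})^3 / s^3$, with $s$ computed using $n$ in the denominator.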
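As a rough sketch, the skewness formula above can be implemented with the standard library alone, taking $s$ to be the sample standard deviation (`statistics.stdev`). The function names and the sample are hypothetical, and the population version is included only to show how the two estimators relate.

```python
import math
import statistics

def adjusted_skewness(data):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(data)
    if n < 3:
        raise ValueError("at least three observations are needed")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def population_skewness(data):
    """Population (biased) skewness: m3 / m2^(3/2), with moments divided by n."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

sample = [1, 1, 2, 2, 3, 10]  # hypothetical right-skewed sample
n = len(sample)
g1 = population_skewness(sample)
G1 = adjusted_skewness(sample)

print(f"Population skewness: {g1:.4f}")
print(f"Adjusted skewness:   {G1:.4f}")
# The two estimators are related by G1 = sqrt(n(n-1)) / (n-2) * g1.
print(math.isclose(G1, math.sqrt(n * (n - 1)) / (n - 2) * g1))
```

The adjustment factor is always greater than one, so the adjusted value is larger in magnitude than the population value, and the gap shrinks as $n$ grows.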

## Kurtosis Calculation

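$$
\text{Kurtosis} = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4}
$$

where:
- $X_i$ represents each individual data point,
- $\bar{X}$ is the mean of the data points,
- $s$ is the population standard deviation of the data points (computed with $n$ in the denominator),
- $n$ is the number of data points.

Subtracting 3 from this value gives the excess kurtosis that SciPy reports by default.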
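As a worked example, the formula above can be applied by hand to the data $1, 2, \dots, 9$ used in the earlier cells, taking $s$ as the population standard deviation (dividing by $n = 9$). The deviations from the mean run from $-4$ to $4$, so:

$$
\bar{X} = 5, \qquad s^2 = \frac{1}{9}\sum_{i=1}^{9} (X_i - 5)^2 = \frac{60}{9}, \qquad \frac{1}{9}\sum_{i=1}^{9} (X_i - 5)^4 = \frac{708}{9}
$$

$$
\text{Kurtosis} = \frac{708/9}{(60/9)^2} = \frac{708 \cdot 9}{3600} = 1.77, \qquad \text{Excess kurtosis} = 1.77 - 3 = -1.23
$$

This agrees with the SciPy and manual results shown earlier, up to floating-point rounding.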

## Additional Information

### Skewness

- **Understanding Symmetry:** Skewness measures how asymmetric a data distribution is. If an analysis assumes normally distributed data, check first whether the dataset is skewed, because skew affects many of the summary statistics and tests you might use.

- **Decision Making:** In finance, for instance, skewness helps investors gauge the balance of likely outcomes. Negative skewness (a long left tail) in investment returns indicates a greater chance of occasional large losses, while positive skewness (a long right tail) indicates a greater chance of occasional large gains.

- **Data Transformation:** If your data is skewed, consider a transformation to make it more symmetric (for example, a log transformation for positive, right-skewed data), especially before applying techniques that assume a normal distribution.
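The effect of a log transformation can be sketched with a small, made-up sample of strictly positive, right-skewed values. The helper below uses the same population formula as the manual check earlier; the data and names are illustrative assumptions.

```python
import math

def population_skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical strictly positive sample with a long right tail.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40, 120]
logged = [math.log(x) for x in raw]  # log requires every value to be > 0

raw_skew = population_skewness(raw)
log_skew = population_skewness(logged)

print(f"Skewness before log transform: {raw_skew:.2f}")
print(f"Skewness after log transform:  {log_skew:.2f}")
```

For this sample the transformed values are still right-skewed, only less so. A log transformation reduces right skew but does not guarantee symmetry.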

### Kurtosis

- **Tail Risk:** Kurtosis describes the extremes of the data by measuring “tailedness.” High kurtosis (‘leptokurtic’ data) indicates tails heavier than those of the normal distribution. Outliers are therefore more likely, which in financial contexts can mean higher risk.

- **Volatility Estimation:** In finance, kurtosis complements volatility measures. Higher kurtosis indicates that large market movements occur more often than a normal distribution would predict.

- **Statistical Inference:** Many estimation procedures and hypothesis tests assume roughly normal data. Knowing the kurtosis helps you judge how far to trust them, since heavy tails make estimates more sensitive to extreme observations.

- **Descriptive Analysis:** In a detailed analysis, kurtosis helps you judge how prone the data is to outliers. It reflects the weight of the tails rather than how ‘peaky’ the center of the distribution is.
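To illustrate how tail weight drives kurtosis, the sketch below compares an evenly spread sample with a made-up sample that is mostly near zero but has two extreme values. The helper uses the same population formula as the manual check earlier; the data and names are assumptions for illustration.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment over variance squared, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

# Evenly spread values: no extreme observations relative to the spread.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Hypothetical sample: mostly near zero, with two extreme values in the tails.
heavy_tailed = [-10, -1, -1, 0, 0, 0, 0, 0, 1, 1, 10]

flat_kurt = excess_kurtosis(flat)
heavy_kurt = excess_kurtosis(heavy_tailed)

print(f"Evenly spread sample: {flat_kurt:.2f}")   # negative: lighter tails than normal
print(f"Heavy-tailed sample:  {heavy_kurt:.2f}")  # positive: heavier tails than normal
```

The two extreme observations dominate the fourth-power sum, which is why the second sample scores well above zero.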


Both skewness and kurtosis are essential for understanding your data more deeply and are crucial for certain advanced statistical models and methodologies. They help diagnose the characteristics of your distribution and the validity of the assumptions you might make based on it.