# Comparison of Pearson and Spearman Correlation

| **Aspect**                | **Pearson Correlation**                                         | **Spearman Correlation**                                      |
|---------------------------|-----------------------------------------------------------------|---------------------------------------------------------------|
| **Definition**             | Measures the linear relationship between two continuous variables. | Measures the monotonic relationship between two variables, which can be either linear or non-linear. |
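| **Assumption of Data**     | The coefficient itself requires no particular distribution, but its standard significance tests and confidence intervals assume approximately bivariate normal data. | Makes no distributional assumption. Works on ranks, so it suits skewed or ordinal data. |
| **Type of Relationship**   | Captures only the linear (straight-line) component of a relationship. | Captures any monotonic relationship (consistently increasing or consistently decreasing), whether linear or not. |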
| **Data Type**              | Requires continuous data measured on an interval or ratio scale. | Can be used with ordinal, interval, or ratio data. Tied values are handled by assigning them their average rank. |
| **Sensitivity to Outliers**| Sensitive to outliers, as outliers can significantly affect the correlation. | Less sensitive to outliers because it only considers the ranks of the values, not their actual values. |
| **Range**                  | Ranges from -1 to +1, where -1 indicates perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates perfect positive linear correlation. | Ranges from -1 to +1, where -1 indicates perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates perfect positive monotonic correlation. |
| **Calculation Method**     | Covariance of the two variables divided by the product of their standard deviations. Uses the actual data values. | Pearson correlation computed on the ranks of the values. When there are no ties, this reduces to a formula based on the squared differences between ranks. |
| **Interpretation**         | The closer the value is to +1 or -1, the stronger the linear relationship. The sign gives the direction. | The closer the value is to +1 or -1, the stronger the monotonic relationship. The sign gives the direction (increasing or decreasing). |
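
The two calculation methods in the table can be sketched in plain Python. This is a minimal illustration, not a replacement for a statistics library: the function names `pearson_r`, `average_ranks`, and `spearman_rho` and the sample data are hypothetical, and ties are assumed to receive their average rank.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def average_ranks(values):
    """Rank from 1 to n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho: Pearson r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3))  # about 0.943: strong, but below 1 because the curve is not a line
print(rho)          # 1.0: the ordering of y matches the ordering of x exactly
```

Because the cubic curve is perfectly monotonic, the ranks of `x` and `y` coincide and Spearman reports a perfect relationship, while Pearson falls short of 1.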

## When to Use Pearson Correlation:
- **When the relationship is linear**: Pearson correlation is appropriate when you believe there is a straight-line (linear) relationship between the variables.
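- **When the data is continuous and approximately normal**: Pearson expects both variables to be measured on an interval or ratio scale. The coefficient can be computed for any such data, but the usual p-values and confidence intervals assume the variables are approximately bivariate normal.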
- **When you need a measure of linear dependence**: Pearson gives you a quantitative measure of the degree of linear relationship between two variables.

## When to Use Spearman Correlation:
- **When the relationship is monotonic**: Use Spearman when you suspect a monotonic relationship (i.e., as one variable increases, the other variable either consistently increases or decreases, but not necessarily in a linear fashion).
- **When the data is not normally distributed**: Spearman correlation does not require the data to be normally distributed, making it more appropriate for skewed data.
- **When the data is ordinal**: If the variables are measured on an ordinal scale (e.g., rankings), Spearman correlation is more appropriate because it works with ranked data rather than raw values.
- **When there are outliers**: Spearman correlation is less sensitive to outliers because it uses ranks instead of actual data values: an extreme value can move at most to the first or last rank. If your dataset contains pronounced outliers, Spearman is often the better choice, although it is not immune to them.

## Summary of Usage:

- **Use Pearson** when:
  - You need to measure the strength and direction of a linear relationship.
  - The data is continuous and approximately normally distributed (needed for the usual significance tests, not for the coefficient itself).
  
- **Use Spearman** when:
  - You are dealing with non-linear but monotonic relationships.
  - The data is not normally distributed or is ordinal in nature.
  - You want a correlation measure that is less sensitive to outliers.

## Conclusion:

- **Pearson** is powerful for analyzing linear relationships and is the preferred method when the relationship is linear and the assumptions behind its inference (such as approximate normality) are met.
- **Spearman** is a non-parametric, rank-based measure. It is more robust to outliers and skew, suits monotonic relationships that are not linear, and works with ordinal data.
