<a href="https://www.kaggle.com/code/hassaneskikri/choosing-the-right-statistical-test-types?scriptVersionId=168272562" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

%%html
<style>
    *{
        font-family: 'Arial', sans-serif;
        align-items: center;
        justify-content: center;
        max-width : 1000px;
    }
    h1{
        color: #FFD700;
        border-bottom: 3px solid #FFD700;
        text-align:center;
        padding-bottom: 0.3em;
        font-weight: bold;
    }
    h2{
        color:#2dd4bf;
        padding-bottom: 0.3em;
    }
    p, ol, ul {
        font-size: 18px;
        line-height: 1.5;
        color: #eee;
    }
    a {
        color: #d946ef;
        text-decoration: none;
    }
    a:hover {
        text-decoration: underline;
        color : #86198f;
    }
    img{
        display: flex;
        margin-left: auto;
        margin-right: auto;
        width: 700px;
        height: auto;
        text-align: center;
        border-radius: 15px;
    }
    
</style>


Statistical tests are used in hypothesis testing. They can be used to:

- determine whether a predictor variable has a statistically significant relationship with an outcome variable.
- estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.
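
As a minimal sketch of that logic, the permutation test below shuffles group labels to see how large a difference in means the null hypothesis of "no difference" would produce by chance. The function name `permutation_p_value`, the seed, and the two small data sets are hypothetical and chosen only for illustration; this is one of many possible tests, not a recommended default.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups (made up for illustration).
control = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.9]
treated = [6.8, 7.1, 6.5, 7.4, 6.9, 7.2, 6.6, 7.0]

p_value = permutation_p_value(control, treated)
print(f"p = {p_value:.4f}")
```

A small p-value means the observed difference would be unusual if the null hypothesis were true; comparing a group with itself gives a p-value of 1.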

If you already know what types of variables you’re dealing with, you can use the flowchart below to choose the right statistical test for your data.

![image.png](attachment:7d776e8b-a51e-412b-9b06-be5fbac2c47b.png)

# When to perform a statistical test

You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment, or through observations made using probability sampling methods.

For a statistical test to be valid, your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

- whether your data meets certain assumptions.
- the types of variables that you’re dealing with.


Parametric tests usually have stricter requirements than nonparametric tests and can make stronger inferences from the data. They can only be conducted with data that adhere to the common assumptions of statistical tests: independence of observations, homogeneity of variance, and normality of the data.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.


# Regression tests

![image.png](attachment:7be09cea-6dd3-48e6-82c2-4b966521b630.png)

# Comparison tests

Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of exactly two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).

![image.png](attachment:11bf960a-ce46-44ad-864f-9da0e088a234.png)
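
To make the two comparison tests concrete, here is a rough sketch that computes the test statistics by hand using only the standard library. It assumes equal variances (the pooled Student's t-test), and the height values are invented for illustration. In practice a statistics library would also supply the p-value; this sketch stops at the statistic.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def anova_f_statistic(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical heights in cm (made up for illustration).
men = [178, 182, 175, 180, 185]
women = [165, 170, 162, 168, 172]
children = [120, 125, 118, 130, 122]
teenagers = [160, 165, 158, 170, 162]
adults = [172, 178, 169, 175, 180]

t_two_groups = pooled_t_statistic(men, women)
f_two_groups = anova_f_statistic(men, women)
f_three_groups = anova_f_statistic(children, teenagers, adults)

print(f"t (men vs women) = {t_two_groups:.3f}")
print(f"F (children, teenagers, adults) = {f_three_groups:.3f}")
```

With exactly two groups the ANOVA F statistic equals the square of the pooled t statistic, which is why the t-test can be seen as the two-group special case.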

# Correlation tests

![image.png](attachment:3760dcdc-0f24-478d-9f31-a4fb4a1d7b26.png)

# Choosing a nonparametric test

Nonparametric tests don’t make as many assumptions about the data and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as those of parametric tests.

![image.png](attachment:2ff4706c-d4ff-42c0-8235-732613cb3bf0.png)
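
As one illustration of the trade-off, the sketch below compares Pearson's r (parametric) with Spearman's rank correlation (nonparametric) on a relationship that is monotonic but far from linear. Spearman's coefficient is computed here as Pearson's r applied to ranks; the helper names and the data are assumptions made for this example.

```python
from math import exp, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows exponentially with x (monotonic, not linear).
x = list(range(1, 9))
y = [exp(v) for v in x]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because it only uses the ordering of the values, the rank-based coefficient is unaffected by the curvature that pulls Pearson's r below 1.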

# Skewness

Skewness is a measure of the asymmetry of a distribution. A distribution is asymmetrical when its left and right sides are not mirror images.

A distribution can have right (or positive), left (or negative), or zero skewness. A right-skewed distribution is longer on the right side of its peak, and a left-skewed distribution is longer on the left side of its peak:

![image.png](attachment:ec769204-56cd-420d-8353-edffe3a2a881.png)

![image.png](attachment:b2f3dea1-9e95-4e3d-b14e-12821fc18124.png)

![image.png](attachment:5c861542-c8ed-469e-8797-c3b07ffc8889.png)

- `Right skew: mean > median`
- `Zero skew: mean = median`
- `Left skew: mean < median`


![image.png](attachment:ca297851-c6c6-41a9-868f-faaea6bf46e3.png)

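Pearson’s median skewness is 3 × (mean − median) / standard deviation, so it is positive for right skew and negative for left skew. Real observations rarely have a value of exactly 0. If your data has a value close to 0, you can consider it to have zero skew.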
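
A short sketch of Pearson's median skewness, computed as 3 × (mean − median) / standard deviation with the standard library. The three small data sets are hypothetical and only meant to show the sign of the result for each kind of skew.

```python
from statistics import mean, median, stdev


def pearson_median_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Hypothetical data sets (made up for illustration).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-v for v in right_skewed]         # mirror image of the right-skewed data

for name, data in [("right", right_skewed), ("zero", symmetric), ("left", left_skewed)]:
    print(f"{name:>5} skew: {pearson_median_skewness(data):+.3f}")
```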

# What to do if your data is skewed

One reason you might check if a distribution is skewed is to verify whether your data is appropriate for a certain statistical procedure. Many statistical procedures assume that variables or residuals are normally distributed. Skew is a common way that a distribution can differ from a normal distribution.

You generally have three choices if your statistical procedure requires a normal distribution and your data is skewed:

- `Do nothing.` Many statistical tests, including t-tests, ANOVAs, and linear regressions, aren’t very sensitive to skewed data. Especially if the skew is mild or moderate, it may be best to ignore it.
- `Use a different model.` You may want to choose a model that doesn’t assume a normal distribution. Nonparametric tests or generalized linear models could be more appropriate for your data.
- `Transform the variable.` Another option is to transform a skewed variable so that it’s less skewed. “Transform” means to apply the same function to all the observations of a variable.

![image.png](attachment:3f3989a0-2669-45a5-8316-e5fb0b199968.png)
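
The transformation option can be sketched as follows, using a log transform, which is one common choice for right-skewed data. The data are hypothetical values that double at each step, picked so that the effect is easy to see; real data will rarely become perfectly symmetric.

```python
from math import log
from statistics import mean, median, stdev


def pearson_median_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


# Hypothetical right-skewed variable (each value doubles the previous one).
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# "Transform" means applying the same function to every observation.
transformed = [log(v) for v in raw]

skew_before = pearson_median_skewness(raw)
skew_after = pearson_median_skewness(transformed)

print(f"skewness before log transform: {skew_before:.3f}")
print(f"skewness after log transform:  {skew_after:.3f}")
```

After the transform the values are evenly spaced, so the mean and median coincide and the skewness drops to essentially zero.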

# Kurtosis

Kurtosis is a measure of the tailedness of a distribution. Tailedness is how often outliers occur. Excess kurtosis is the tailedness of a distribution relative to a normal distribution.

- Distributions with medium kurtosis (medium tails, excess kurtosis close to 0) are `mesokurtic.`
- Distributions with low kurtosis (thin tails, negative excess kurtosis) are `platykurtic.`
- Distributions with high kurtosis (fat tails, positive excess kurtosis) are `leptokurtic.`

Tails are the tapering ends on either side of a distribution. They represent the probability or frequency of values that are extremely high or low compared to the mean. In other words, tails represent how often outliers occur.



![image.png](attachment:0543cef5-9d0c-486e-8cd3-295c3fca4622.png)

![image.png](attachment:48ad7458-d254-4bab-a77e-bbcd796d4d69.png)
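
A minimal sketch of excess kurtosis, computed as the fourth standardized moment minus 3 (the kurtosis of a normal distribution). The `tolerance` cutoff used to label a sample as mesokurtic is an arbitrary assumption for this example, as are the two data sets.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m4 = fmean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3


def describe_tails(values, tolerance=0.5):
    """Label a sample by its excess kurtosis (tolerance is an illustrative cutoff)."""
    k = excess_kurtosis(values)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical data sets (made up for illustration).
flat = list(range(1, 11))            # evenly spread values, thin tails
heavy_tailed = [0] * 18 + [-10, 10]  # mostly identical values plus two outliers

for name, data in [("flat", flat), ("heavy_tailed", heavy_tailed)]:
    print(f"{name}: excess kurtosis = {excess_kurtosis(data):.3f} ({describe_tails(data)})")
```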

# Correlation vs Causation

## Correlation:

- What It Is: Correlation indicates a relationship or association between two variables. When one variable changes, the other tends to change as well; however, this doesn't imply that one variable causes the change in the other.
- Measurement: Correlation is quantified by a correlation coefficient (such as Pearson's r), which ranges from -1 to 1. A value closer to 1 or -1 indicates a strong linear relationship, while 0 indicates no linear relationship.
- Example: There's a correlation between ice cream sales and drowning incidents. As ice cream sales increase, drowning incidents also tend to increase. However, buying more ice cream doesn't cause more drownings. They're both related to warmer weather.
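
The ice cream example can be sketched with Pearson's r computed by hand. All numbers below are invented: both series are constructed to rise with temperature, so they correlate strongly with each other even though neither causes the other.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical weekly observations (made up for illustration).
temperature = [15, 18, 21, 24, 27, 30, 33]      # degrees Celsius
ice_cream_sales = [31, 34, 42, 50, 53, 61, 65]  # units sold
drownings = [2, 3, 3, 5, 4, 6, 7]               # incidents

r_sales_drownings = pearson_r(ice_cream_sales, drownings)
r_temp_sales = pearson_r(temperature, ice_cream_sales)
r_temp_drownings = pearson_r(temperature, drownings)

print(f"ice cream vs drownings:   r = {r_sales_drownings:.2f}")
print(f"temperature vs ice cream: r = {r_temp_sales:.2f}")
print(f"temperature vs drownings: r = {r_temp_drownings:.2f}")
```

The coefficient alone cannot tell these situations apart: it is high for the pair that shares a common cause just as it is for the pairs involving temperature.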

## Types of correlation

![image.png](attachment:70761ed2-a634-46c9-b299-04630555091a.png)


## Causation:

- What It Is: Causation indicates that one event is the result of the occurrence of the other event; there is a cause-effect relationship between variables.
- Establishing Causation: To determine causation, researchers must use experimental designs where they can control for other variables and directly observe the effect of changing one variable on another.
- Example: Taking antibiotics can lead to a decrease in bacterial infection. Here, the intake of antibiotics causes the reduction in infection.
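
To illustrate why experimental designs support causal claims, here is a small simulation of a randomized experiment. Everything in it is an assumption made for the sketch: the "infection score" outcome, the true effect of -2.0, the sample size, and the seed. Because treatment is assigned at random, the baseline differences between participants balance out on average and the difference in group means estimates the effect of the treatment.

```python
import random
from statistics import mean


def simulate_randomized_trial(n=2000, true_effect=-2.0, seed=42):
    """Simulate random assignment and return the estimated treatment effect."""
    rng = random.Random(seed)
    # Each participant has a baseline infection score unrelated to treatment.
    baseline = [rng.gauss(10, 1) for _ in range(n)]
    # Randomly assign half of the participants to treatment.
    treated_flags = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(treated_flags)
    outcomes = [b + (true_effect if t else 0.0) for b, t in zip(baseline, treated_flags)]
    treated = [o for o, t in zip(outcomes, treated_flags) if t]
    control = [o for o, t in zip(outcomes, treated_flags) if not t]
    return mean(treated) - mean(control)


estimated_effect = simulate_randomized_trial()
print(f"estimated effect of treatment: {estimated_effect:.2f}")
```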

# Resources

- Test statistics: ![image.png](attachment:83825474-f0a0-45d1-aa50-dd9d25c2c9f6.png)
- [Skewness](https://www.scribbr.com/statistics/skewness/)
- [Kurtosis](https://www.scribbr.com/statistics/kurtosis/)
- [Correlation vs causation](https://www.scribbr.com/methodology/correlation-vs-causation/)
- [Correlation coefficient](https://www.scribbr.com/statistics/correlation-coefficient/)