# Section 4 - Measures of central tendency, asymmetry, and variability
-------------------------------------------------------------------------------------------------------------------------

# Univariate measures:
# The main measures of _central tendency_ - mean, median, and mode:
(No single measure among the mean, median, and mode is best; they should be used together rather than in isolation)

- ## _Mean_ - a simple average, but easily affected by _outliers_ and is _not_ enough to make definite conclusions
    - ## _Population_ mean:
        - ## $\mu = \frac{\sum_{i=1}^{N}x_i}{N}$
    - ## _Sample_ mean:
        - ## $\bar{x} = \frac{\sum_{i=1}^{n}x_i}{n}$


- ## _Median_ - the middle number in an ordered dataset
    - ## $Median =$ the number at position $\frac{n+1}{2}$ in an ordered list of $n$ observations; when $n$ is even, that position falls between two numbers, and the median is their average


- ## _Mode_ - the value that occurs most often
    - ## Sometimes there is _no mode_
    - ## Sometimes there is _more than one mode_ (up to 2-3 modes is tolerable; with more, the mode says little about the center)
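
The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch on a small made-up dataset (the values, including the outlier `100`, are assumptions chosen for illustration); it shows how a single outlier pulls the mean away from the median.

```python
from statistics import mean, median, multimode

# Hypothetical sample with one outlier (100).
data = [1, 2, 2, 3, 4, 5, 100]

sample_mean = mean(data)        # pulled upward by the outlier
sample_median = median(data)    # value at position (n + 1) / 2 of the sorted data
sample_modes = multimode(data)  # list of every most-frequent value

# With an even number of observations the median is the average
# of the two middle values.
even_median = median([1, 2, 3, 4])

# A dataset can have more than one mode.
two_modes = multimode([1, 1, 2, 2, 3])

print(sample_mean, sample_median, sample_modes)
print(even_median, two_modes)
```
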
-------------------------------------------------------------------------------------------------------------------------

# Measures of _asymmetry_ - measuring _skewness_:

- ## _Skewness_
    - ## Indicates whether the data is concentrated on one side
        - _**Positive (right) skew**_ - outliers / tail favors the right side; **median < mean**
        - _**Zero skew**_ - the distribution is symmetrical; **mean = median = mode**
        - _**Negative (left) skew**_ - outliers / tail favors the left side; **mean < median**
    - ## $\text{Sample skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)^3}$
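
The sample skewness formula above can be sketched directly in plain Python. The function name `sample_skewness` and the datasets are assumptions made for illustration; statistical libraries may use slightly different normalizations, so their results can differ from this formula.

```python
from math import sqrt


def sample_skewness(xs):
    """Skewness as written above: (1/n) * sum of cubed deviations,
    divided by the cube of the sample standard deviation."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    third_moment = sum((x - x_bar) ** 3 for x in xs) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    return third_moment / s ** 3


# Hypothetical datasets.
right_skewed = [1, 2, 2, 3, 4, 5, 100]     # long tail to the right
symmetric = [1, 2, 3, 4, 5]                # mirror image around 3
left_skewed = [-x for x in right_skewed]   # long tail to the left

print(sample_skewness(right_skewed))  # positive
print(sample_skewness(symmetric))     # zero
print(sample_skewness(left_skewed))   # negative
```
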
-------------------------------------------------------------------------------------------------------------------------

# Measures of _variability_ - variance, standard deviation, and coefficient of variation:
(There are different formulas for sample and population)

- ## _Variance_
    - ## _Population_ variance:
        - ## $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}$
    - ## _Sample_ variance:
        - ## $s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
            - ## Dividing by $n-1$ instead of $n$ (a smaller denominator than in the _Population_ variance formula) yields a slightly larger variance, which compensates for the dataset being only a sample from a larger population

- ## _Standard deviation_ - the most common measure of variability for a _single_ dataset
(Comparing the _**standard deviations**_ of two (2) datasets measured in different units or on different scales is **meaningless**)
    - ## _Population_ standard deviation:
        - ## $\sigma = \sqrt{\sigma^2}$
    - ## _Sample_ standard deviation:
        - ## $s = \sqrt{s^2}$

- ## _Coefficient of variation_ (CV) - (relative standard deviation) - does _not_ have a unit of measurement
(Comparing the _**coefficients of variation**_ of two (2) or more different datasets **has meaning**)
    - ## $CV = \frac{Stdev}{mean}$
    - ## _Population_ CV
        - ## $c_v = \frac{\sigma}{\mu}$
    - ## _Sample_ CV
        - ## $\hat{c}_v = \frac{s}{\bar{x}}$
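
As a rough illustration of the formulas above, the sketch below uses the standard `statistics` module, where `pvariance`/`pstdev` divide by $N$ and `variance`/`stdev` divide by $n-1$. The datasets and the helper `sample_cv` are assumptions made for this example. The second part expresses the same made-up prices in two units to show that the standard deviation changes with the unit while the coefficient of variation does not.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical data, chosen so the population figures come out round.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)   # divides by N
samp_var = variance(data)   # divides by n - 1, so it is larger
pop_std = pstdev(data)      # square root of the population variance
samp_std = stdev(data)      # square root of the sample variance


def sample_cv(xs):
    """Sample coefficient of variation: s / x_bar (has no unit)."""
    return stdev(xs) / mean(xs)


# The same hypothetical prices expressed in two units.
dollars = [1.0, 2.0, 3.0, 4.0, 5.0]
cents = [x * 100 for x in dollars]

print(pop_var, samp_var, pop_std, samp_std)
print(stdev(dollars), stdev(cents))          # differ by a factor of 100
print(sample_cv(dollars), sample_cv(cents))  # essentially equal
```
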
-------------------------------------------------------------------------------------------------------------------------

# Multivariate measures (measures of relationship):
# Calculating and understanding _covariance_:
(The main statistic for measuring how two (2) variables vary together is called _covariance_. It can be very small (close to zero (0)) or very large (in the millions, etc.), depending on the units of the variables.)

- ## _Covariance_
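    - ## Covariance may be:
        - ## > 0 (the two variables tend to move together)
        - ## = 0 (no linear relationship between the two variables)
        - ## < 0 (the two variables tend to move in opposite directions)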
    - ## _Population_ covariance
        - ## $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i-\mu_x)*(y_i-\mu_y)}{N}$
    - ## _Sample_ covariance
        - ## $s_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})*(y_i-\bar{y})}{n-1}$
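
The two covariance formulas above translate almost line for line into plain Python. This is a minimal sketch; the function names and the datasets are assumptions made for illustration. The last call rescales one variable to show why the magnitude of a covariance is hard to interpret on its own.

```python
def _sum_of_products(xs, ys):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))


def population_covariance(xs, ys):
    return _sum_of_products(xs, ys) / len(xs)        # divide by N


def sample_covariance(xs, ys):
    if len(xs) < 2:
        raise ValueError("need at least two paired observations")
    return _sum_of_products(xs, ys) / (len(xs) - 1)  # divide by n - 1


# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]            # rises with x
y_down = [10, 8, 6, 4, 2]          # falls as x rises
y_scaled = [v * 1000 for v in y_up]  # same relationship, different unit

print(sample_covariance(x, y_up))      # > 0
print(sample_covariance(x, y_down))    # < 0
print(population_covariance(x, y_up))  # smaller than the sample value
print(sample_covariance(x, y_scaled))  # 1000 times larger
```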


# _Correlation coefficient_:
(Correlation adjusts covariance, so that the relationship between the two (2) variables becomes easy and intuitive to interpret.)

- ## _Correlation coefficient_ - must be between -1 and 1 ( -1 $\leq$ Correlation coefficient $\leq$ 1 )
    - If Correlation coeff. = 1 (_**Perfect positive correlation**_)
    - If Correlation coeff. is in (0,1) (_**Imperfect positive correlation**_)
    - If Correlation coeff. = 0 (_**No linear relationship**_; this alone does not prove that the variables are independent)
    - If Correlation coeff. is in (-1,0) (_**Imperfect negative correlation**_)
    - If Correlation coeff. = -1 (_**Perfect negative correlation**_)

    - ## $corrcoeff = \frac{covariance(x,y)}{Stdev(x)*Stdev(y)}$ = $\frac{covariance(y,x)}{Stdev(y)*Stdev(x)}$
    - ## _Population_ correlation coefficient
        - ## $corrcoeff_{pop} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}$
    - ## _Sample_ correlation coefficient
        - ## $corrcoeff_{smpl} = \frac{s_{xy}}{s_xs_y}$
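
The sample correlation coefficient above can be sketched in plain Python as the sample covariance divided by the product of the two sample standard deviations. The function name `sample_correlation` and the datasets are assumptions made for illustration, and the sketch assumes neither variable is constant (otherwise the denominator is zero).

```python
from math import sqrt


def sample_correlation(xs, ys):
    """s_xy / (s_x * s_y), using n - 1 in every denominator."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # exact increasing line
y_down = [10, 8, 6, 4, 2]   # exact decreasing line
y_noisy = [2, 5, 5, 9, 10]  # rises with x, but not on a line

r_up = sample_correlation(x, y_up)        # close to 1
r_down = sample_correlation(x, y_down)    # close to -1
r_noisy = sample_correlation(x, y_noisy)  # between 0 and 1

# y depends entirely on x here (y = x squared), yet the
# coefficient is 0 because the relationship is not linear.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v * v for v in x_sym]
r_curved = sample_correlation(x_sym, y_sq)

# Swapping the arguments gives the same value.
r_swapped = sample_correlation(y_noisy, x)

print(r_up, r_down, r_noisy, r_curved)
```
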
        
# _Causality_:
- Important to understand the direction of _**causal relationships**_
- _Correlation_ _**does not**_ imply _causation_