### Hypothesis testing and A/B testing
- **Two possible hypotheses**
    - Null hypothesis ($H_{0}$): negative result to a test (e.g. A = B, no difference)
    - Alternative hypothesis ($H_{1}$): positive result to a test (e.g. A ≠ B, there is a difference)
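
As a rough illustration of testing $H_0$: A = B against $H_1$: A ≠ B, the sketch below runs a two-sided two-proportion z-test on made-up A/B conversion counts. The function name `two_proportion_z_test`, the counts, and the 0.05 threshold are assumptions chosen for illustration, and the normal approximation is only reasonable for fairly large samples.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for H0: rate_A == rate_B (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value

# Hypothetical counts: 200/1000 conversions for A, 260/1000 for B
z, p_value = two_proportion_z_test(200, 1000, 260, 1000)
alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
```

With these invented counts the p-value falls well below 0.05, so the null hypothesis of no difference would be rejected.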

### T-test
- **Assumptions**: The t-test assumes that the data follow a normal distribution, the variances of the groups being compared are equal (in the case of independent samples), and the observations are independent.
- **Limitations**: The t-test relies on assumptions regarding the data distribution and homogeneity of variances; when these are violated, its p-values can be misleading.
- T-statistic: in regression, the coefficient for a predictor divided by the standard error of the coefficient, giving a metric to compare the importance of variables in the model. In a two-sample test, it is the difference between the group means divided by the standard error of that difference, and it is used to compare the means of two groups of data
- T-distribution: It is similar in shape to the standard normal distribution (bell-shaped), but it has thicker tails, which means it allows for more variability in the data
- A t-statistic close to zero (low absolute value) is NOT significant
- Used when the population standard deviation is unknown
- Useful to test samples with size less than 30 (for 30+ the t-distribution is nearly indistinguishable from the normal distribution)
- Compares the means of two groups (it can be used instead of a permutation test)
- Used to compare means between two groups and determine if there is a significant difference. It is often used when the sample sizes are small and the data are approximately normally distributed.
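
A minimal sketch of the equal-variance (pooled) two-sample t-statistic described above, using only the standard library. The function name `pooled_t_statistic` and the two small samples are made up for illustration. The sketch stops at the statistic and compares it with a tabulated critical value instead of computing a p-value.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal group variances."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # n - 1 denominators
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Hypothetical measurements for two independent groups
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]
t_stat, df = pooled_t_statistic(group_a, group_b)

CRITICAL_T_10DF = 2.228  # two-sided 5% critical value for 10 degrees of freedom (t-table)
significant = abs(t_stat) > CRITICAL_T_10DF
```

Here the absolute t-statistic is far above the critical value, so the difference in means would be called significant at the 5% level under the stated assumptions.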

### F-test
- F-statistic: it is used in linear regression to compare the variation accounted for by the regression model to the overall variation in the data.
    - It is based on the ratio of the variance across group means to the variance due to residual error. The higher the ratio, the more statistically significant the result.
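
The ratio described above can be sketched for the one-way (several groups) case as follows. This is an illustrative from-scratch computation, not a regression F-test; the function name `one_way_f_statistic` and the toy groups are assumptions.

```python
import statistics

def one_way_f_statistic(groups):
    """F = (variance across group means) / (variance due to residual error)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: residual scatter around each group mean
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Toy data: three groups of three observations each
f_stat = one_way_f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For this toy data the between-group mean square is 13 and the within-group mean square is 1, so the ratio is 13.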

### Fisher's Exact-test
- When counts are extremely low, the chi-square approximation becomes unreliable; a resampling procedure, or an exact test that enumerates all possible rearrangements, will yield more accurate p-values
- Used to determine the significance of the association between two categorical variables in a 2x2 contingency table. It is typically employed when the sample sizes are small or when expected cell counts are low.
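
A small from-scratch sketch of a two-sided Fisher's exact test for a 2x2 table, built on the hypergeometric probability of each possible table with the same margins. The function name `fisher_exact_two_sided` and the example table are assumptions. The two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is at most as probable as the observed one
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical small-count table
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
```

For this table the p-value is 34/70 (about 0.486), so there is no evidence of an association.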

### Other tests
- Mann-Whitney U test: A non-parametric test used to compare the medians of two independent groups when the data are not normally distributed.
- Kruskal-Wallis test: A non-parametric test used to compare the medians among three or more independent groups when the data are not normally distributed.
- Wilcoxon signed-rank test: A non-parametric test used to compare two related (paired) groups when the data are not normally distributed.
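
To give a feel for how rank-based tests work, here is a minimal sketch of the Mann-Whitney U statistic computed by direct pair counting. The function name `mann_whitney_u` and the sample values are made up, and the sketch does not compute a p-value.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b): U_a counts pairs where a value from a exceeds one from b."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5  # ties count as half
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical samples from two independent groups
u_a, u_b = mann_whitney_u([1.2, 2.3, 3.1], [4.0, 5.5, 2.5, 6.1])
```

A U value near 0 (or near the number of pairs) means one group almost always ranks below the other; values near half the number of pairs suggest no difference.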

### ChiSquare (&chi;²)-test
- It can be used in feature selection
- Used to determine whether there is a significant association between two categorical variables
- It compares the observed frequencies of the data with the frequencies that would be expected if the variables were independent
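- Works better with large samples, and the observations must be independent
- **Assumptions**: the variables are categorical, the observations are independent, and the expected count in each cell is large enough (a common rule of thumb is at least 5)
- &chi;² statistic: measures the extent to which results depart from the null expectation of independence; under the null hypothesis it approximately follows the &chi;² distribution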
- &chi;² goodness-of-fit test: compares the observed frequencies in different categories with the expected frequencies under the null hypothesis. The test calculates a test statistic (chi-square statistic) and compares it to the critical value from the chi-square distribution to determine if the observed data deviate significantly from the expected distribution.
    - A low &chi;² value indicates that the observed frequencies closely follow the expected distribution
    - A high &chi;² value indicates that they differ substantially from what is expected
- &chi;² test of independence: primarily used to examine whether two categorical variables are independent. It determines if there is a significant difference between the observed and expected frequencies (using rows × columns contingency tables)
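
A minimal sketch of the &chi;² statistic for a contingency table, where the expected count of each cell is (row total × column total) / grand total. The function name `chi_square_independence` and the observed counts are assumptions, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2x2 table of observed counts
stat, df = chi_square_independence([[10, 20], [20, 10]])

CRITICAL_CHI2_1DF = 3.841  # 5% critical value for 1 degree of freedom (chi-square table)
significant = stat > CRITICAL_CHI2_1DF
```

Every expected count here is 15, giving a statistic of about 6.67, which exceeds the critical value for one degree of freedom.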

### Permutation-test
- **Assumptions**: The permutation test is a non-parametric test that does not make any assumptions about the underlying data distribution. It is based on the concept of randomly permuting the observed data to create a null distribution under the null hypothesis.
- **Limitations**: The permutation test can be computationally intensive, especially for large datasets or complex study designs. It also requires careful consideration of the permutation scheme and may not be as straightforward to implement as the t-test.
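
The idea of building a null distribution by shuffling can be sketched as below. This is a Monte Carlo approximation (random shuffles, not all possible permutations); the function name `permutation_test`, the number of shuffles, the seed, and the data are all assumptions for illustration.

```python
import random

def permutation_test(a, b, n_permutations=5000, seed=0):
    """Approximate two-sided p-value for a difference in means via shuffling."""
    rng = random.Random(seed)
    n_a = len(a)
    pooled = list(a) + list(b)

    def abs_mean_diff(values):
        left, right = values[:n_a], values[n_a:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    observed = abs_mean_diff(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs_mean_diff(pooled) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for two groups
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]
p_value = permutation_test(group_a, group_b)
```

Because the two made-up groups do not overlap at all, very few shuffles reproduce a difference as large as the observed one, and the estimated p-value is small.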

### Multi-arm Bandit Test

In a traditional A/B test, the traffic is evenly split between two or more variations, and the performance of each variation is compared to determine the best-performing option. However, this approach can be suboptimal in situations where there is uncertainty about the performance of different variations or if the goal is to maximize the overall reward or outcome.

A multi-arm bandit test addresses this challenge by dynamically adjusting the traffic allocation to the different variations based on real-time feedback during the test. The test starts with an initial allocation of traffic and gradually adapts it based on the observed performance of each variation.
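
One common way to implement this adaptive allocation is Thompson sampling, sketched below for binary rewards. The text above does not name a specific algorithm, so the choice of Thompson sampling, the function name `thompson_sampling`, the simulated conversion rates, and the uniform Beta(1, 1) priors are all assumptions.

```python
import random

def thompson_sampling(true_rates, n_rounds=2000, seed=0):
    """Simulate a Bernoulli bandit; return how often each arm was shown."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw a plausible conversion rate for each arm from its Beta posterior
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        arm = samples.index(max(samples))  # show the arm that looks best right now
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:  # simulated user feedback
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls

# Hypothetical true conversion rates for three variations
pulls = thompson_sampling([0.2, 0.5, 0.8])
```

As evidence accumulates, most of the traffic shifts toward the variation with the highest observed conversion rate, while weaker variations are still shown occasionally.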


- Continuous distributions: for a continuous variable $t$, the density $p(t)$ cannot be interpreted as a probability. The height of the curve at a particular value of $t$ can be interpreted as how likely it is that we would observe a value near that $t$, relative to other values
- Goodness of fit: is a statistical concept that measures how well an observed data set fits a theoretical or expected distribution. In a goodness-of-fit test, the null hypothesis assumes that the observed data conforms to the expected distribution. The alternative hypothesis suggests that there is a significant difference between the observed data and the expected distribution.
- Poisson distribution: models the number of events per time period (for events that occur at a constant rate).
- Bernoulli distribution: a single trial with two possible outcomes
- Binomial distribution: the frequency distribution of the number of successes (x) in a given number of trials (n) with a specified probability (p) of success in each trial (events that have two outcomes)
- Exponential distribution: frequency distribution of the time or distance from one event to the next
- Weibull distribution: a generalized version of the exponential distribution in which the event rate is allowed to shift over time. A changing event rate over time (an increasing probability of device failure)
- Uniform distribution: all the n number of possible outcomes are equally likely
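
Two of the discrete distributions above can be written directly from their probability mass functions. The function names `binomial_pmf` and `poisson_pmf` and the example parameters are assumptions for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(k, rate):
    """Probability of exactly k events in a period with the given average rate."""
    return rate**k * exp(-rate) / factorial(k)

prob_3_of_10 = binomial_pmf(3, 10, 0.5)   # 120 / 1024
prob_no_events = poisson_pmf(0, 2.0)      # exp(-2)
```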

- The beta density function: it can be used for continuous variables that are restricted to between 0 and 1 (&alpha; and &beta; are parameters to control the shape of the density function and must be positive)
- The Gaussian density function: it is used in many applications, and one reason is the ease with which the Gaussian pdf can be manipulated
- The effect of adding random variables to the model is that the output of the model, t, is now itself a random variable. In other words, there is no single value of $t_n$ for a particular $x_n$. As such, we cannot use the loss as means of optimizing W and $\sigma^2$
- **Framework for hypothesis test**
    - State clearly what your variables are
    - State the null and alternative hypotheses
    - Decide upon a significance level $\alpha$
    - Compute a test statistic (Z, T, $\chi^2$, or F test)
    - Find the p-value corresponding to your test statistic (left-, right-, or two-tailed)
    - Form a conclusion: if $p<\alpha$, reject the null hypothesis; otherwise, fail to reject it
- **Assumptions**
    - Minimum sample size (N): is the sample size big enough?
    - Normality: do your data follow a normal distribution?
    - Linearity: consistent relationship between the independent variable and the dependent variable
    - Homogeneity of variance: similar levels of variance for all variable groups
- **Notes**
    - $R^2$: the proportion of the variance in the dependent variable that the model explains; it is a good indicator of how well the model fits the dataset.
    - Variance: high variance -> values lie further away from the mean. When dealing with vectors the concept of variance is generalized to a covariance matrix
    - Covariance matrix: the diagonal elements correspond to the variance of the individual elements of x. The off-diagonal elements tell us to what extent different elements of x co-vary (how dependent they are on one another)
        - captures the linear relationship between two variables
        - the magnitude depends on the units of the variables, so it does NOT signify the strength of the relationship; only the sign matters, which tells whether the relationship is positive or negative
- Correlation: measures the relationship between two variables; unlike covariance, its magnitude also indicates the strength of the relationship
- Pearson correlation: measures linear relationship and fails to capture non-linear relationship
- Spearman's rank correlation: used for ordinal or continuous variables. It assesses the strength and direction of the monotonic relationship between two variables. It is robust to outliers and can capture associations that may not be detected by Pearson correlation.
- Kendall rank correlation: used for ordinal or continuous variables. It assesses the strength and direction of the monotonic relationship between two variables. It does not assume linearity or normality in the data.
- Point-biserial correlation: one variable binary, one variable continuous. It is a special case of Pearson correlation, so it measures linear association
- Residuals/errors: difference between observed values and estimates
- Standard deviation (STD): describes how spread out the values are. A low std means that most numbers are close to the mean, while a high std means the values are spread out over a wide range
- Mean absolute deviation (MAD): the average absolute distance of the values from the mean; more robust to outliers than the standard deviation
- Bias types
    - Sample bias: a sample that misrepresents the population
    - Bias: systematic error
    - Selection bias: selectively choosing data
- Deterministic process: a process where the outcome is entirely determined by the initial conditions and the rules that govern the evolution of the system
- Stochastic process: it involves some degree of randomness and uncertainty in the outcome
- Covariance: measures the linear dependence between two random variables
    - Auto-Covariance (ACF):
        - Is a measure of linear dependence between two observations of a time series that are separated by a specific lag $k$: $\mathrm{Cov}(x_t, x_{t-k})$
        - Is a useful measure in time series analysis as it provides information about the dependence structure of the series over time
- Auto-correlation: measures the overall correlation between a time series and its past values up to lag $k$, including indirect effects through intermediate lags, and helps identify the order of an MA(q) model
- Partial auto-correlation: measures the correlation between a time series and its past values at lag $k$ after accounting (controlling) for the effects of intermediate lags, and helps identify the order of an AR(p) model
- A non-stationary series can often be made stationary by differencing (taking the difference between consecutive points)
- Stationary: variance and mean remain constant over time
- Non-stationary: it has statistical properties that change over time, such as a trend, seasonality, or a change in variance
- Weak stationarity: a time series is considered weakly stationary if its mean, variance, and autocovariance structure do not depend on time. In other words, the mean and variance of the series remain constant over time, and the autocovariance between any two observations depends only on the time lag between them. A series with a trend or seasonality is therefore not weakly stationary until those components are removed (for example by detrending or differencing)
- Null hypothesis: it means that nothing is different. The confidence intervals would overlap because we are sampling to approximate the same value
- Alpha: the probability threshold of "unusualness" that chance results must surpass for actual outcomes to be deemed statistically significant.
- P-value:
    - Is the probability of seeing data at least this extreme under the null hypothesis
    - Given a chance model that embodies the null hypothesis, the p-value is the probability of obtaining results as unusual or extreme as the observed results
    - It is NOT the probability that the null hypothesis is true
    - e.g. a p-value of 0.308 means that we would expect to achieve a result as extreme as this, or more extreme, by random chance over 30% of the time
- Type I Error (False Positive): It is rejecting a true null hypothesis. In other words, concluding there is a significant effect or difference between groups when in fact there is none (mistakenly concluding an effect is real when it is due to chance)
    - Causes: small sample size, inadequate study design, biased sampling, confounding variables, measurement errors
- Type II Error (False Negative): the result of your analysis says that there is no difference between the groups when there actually is a difference (mistakenly concluding an effect is due to chance when it is real) 
    - Causes: small sample size (low statistical power), small effect size (small differences are hard to detect), high variability in the data
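- AIC (Akaike information criterion): helps assess goodness of fit while penalizing model complexity; when comparing models, lower is better
- BIC (Bayesian information criterion): helps assess goodness of fit with a stronger penalty on the number of parameters, which grows with the sample size; lower is better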
- Causality: the ability to establish a causal link between two variables. It explores the idea that changes in one variable directly influence changes in another variable. In order to establish causality, there are three key criteria that need to be met:
    1. Association: there must be a statistically significant association or correlation
    2. Temporal order: changes in the cause variable must happen before changes in the effect variable
    3. Absence of confounding factors: confounding variables shouldn't be driving the observed relationship between the cause and effect variables. Controlling for confounding factors helps ensure that any observed relationship is not spurious or coincidental
- Confounding variables: are factors that can affect the outcome of a study or experiment and create a false association between the variables being studied. These variables are often not the main focus of the study but can still influence the observed relationship between the independent variable and the dependent variable. Confounding variables can lead to incorrect or misleading conclusions if not properly accounted for in the analysis
    - to address confounding variables use various techniques such as randomization, matching, stratification, and statistical modeling to control or adjust for their effects.
- ANOVA:
    - It is based on the F-statistic
    - It is a statistical procedure for analyzing the results of an experiment with multiple groups
    - It is the extension of similar procedures for the A/B test, used to assess whether the overall variation among groups is within the range of chance variation
    - A useful outcome of ANOVA is the identification of variance components associated with group treatments, interactions, effects, and errors
    - Used to compare means among three or more groups. ANOVA determines if there are significant differences between the means of the groups.
- Practical Sampling
    - Probability Sampling: randomize sample selection (results that represent a whole population)
    - Simple random sampling: everyone in the population has an equal chance of being selected; the most basic form of random sampling, best used with small populations
    - Systematic sampling: used when you receive data in a series; select individuals for the sample at a regular interval
    - Stratified sampling: makes sure the proportions of the sample are the same as in the population; used when it is important to collect data on smaller groups as well as larger ones within a population.
    - Cluster sampling: used when the population is divided into similar subgroups (clusters); whole clusters are sampled, and it can be combined with other methods.
    - Non-probability sampling: sample selection methods without randomization (poor representation of the population)
- **Common distributions**
    - Normal distribution: 1) mean, mode, median are all the same 2) the curve is symmetrical
    - Uniform distribution: all values have equal chance of happening
    - Poisson distribution: 1) it tells you the probability of a given number of events happening in a fixed interval, based on the average rate at which they happen 2) example, the number of sales in a day among the people who click on your ad banner
    - Exponential distribution: 1) when a component of time is involved 2) example, how many minutes a person spends on a website compared to how likely they are to make a purchase 
- **Data shapes**
    - Skew is simply when the distribution is leaning one way or the other
    - Negative skew (skewed left)
    - Positive skew (skewed right)
    - Normal (symmetrical unimodal)
    - Transformations: 1) are ways in which you can try to normalize your data or remove skew 2) they can't fix kurtosis, in this case look for outliers 
- **Types of studies**
    - Observational (taking no action to influence responses before hand)
        - Simple surveys
        - Count data
        - Data mining
    - Experimental studies (you change something and see if it influences the results)
        - Repeated measures (the same subjects are measured multiple times, e.g. before and after something is given to them)
        - A/B testing (people are broken into two groups randomly and each group gets a different treatment)
        - Randomized control trial (similar to A/B testing but one group gets the treatment and the other doesn't) 
            - Experimental Design Steps
                1. Question
                2. Hypothesis
                3. Required variables (independent and dependent variable)
                4. Choose a measurement approach
                5. Select an analysis 
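
Returning to the correlation notes above (Pearson captures linear relationships, Spearman captures monotonic ones), the sketch below compares the two on a perfectly monotonic but non-linear relationship. The helper names `pearson`, `ranks`, and `spearman` and the data are assumptions, and the ranking helper assumes there are no ties.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Ranks from 1 to n; assumes no tied values for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [v**3 for v in x]  # monotonic but clearly non-linear

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
```

Spearman's coefficient is 1 because the ordering is perfectly preserved, while Pearson's coefficient is noticeably below 1 because the relationship is not a straight line.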

![image.png](attachment:image.png)

<h1>Hypothesis and A/B testing</h1>

- Minimizing the squared loss function is equivalent to maximum likelihood if the noise is assumed to be Gaussian
- $\sigma^2$ is the variance of the noise incorporated into the model to capture effects that the deterministic part of the model cannot
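
A short derivation sketch of why the two coincide, assuming a linear model with independent additive Gaussian noise (the linear form $\mathbf{w}^\top \mathbf{x}_n$ and the number of observations $N$ are assumptions introduced here for illustration):

$$
t_n = \mathbf{w}^\top \mathbf{x}_n + \epsilon_n, \qquad \epsilon_n \sim \mathcal{N}(0, \sigma^2)
$$

$$
\log L(\mathbf{w}, \sigma^2) = -\frac{N}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\left(t_n - \mathbf{w}^\top \mathbf{x}_n\right)^2
$$

Only the second term depends on $\mathbf{w}$, and it is the negative of the squared loss scaled by a positive constant, so maximizing the likelihood over $\mathbf{w}$ is the same as minimizing $\sum_{n}\left(t_n - \mathbf{w}^\top \mathbf{x}_n\right)^2$. Setting the derivative with respect to $\sigma^2$ to zero then gives the mean squared residual as the maximum likelihood estimate of the noise variance:

$$
\hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}\left(t_n - \hat{\mathbf{w}}^\top \mathbf{x}_n\right)^2
$$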